Generative AI - ICO position on web scraping

Following its recent consultation on the concerns that generative AI (GenAI) raises under UK data protection law, the Information Commissioner’s Office (ICO) has published its position on two key issues: web scraping for training data, and information requests from individuals.

As GenAI models are often trained on substantial amounts of data, developers sometimes collect publicly available information from the internet to use as training data (referred to as “web scraping”). This information might include personal data about individuals, meaning developers must show that they have a legitimate interest for the processing to be lawful:

  • The ICO clarified that developers must demonstrate a “specific and clear interest” to justify using scraped data; general commercial or societal interests are unlikely to be enough.
  • Developers need to show that web scraping is necessary to collect sufficient training data. The ICO encourages developers to find alternatives to scraping where possible, such as licensing data from publishers who collect personal data in a transparent manner.
  • If people do not know that their personal data has been scraped, they cannot exercise their rights in response. The ICO therefore expects GenAI developers to improve transparency so that people know when their data is being processed.

Additionally, the ICO has clarified its expectations for GenAI developers when handling information requests from individuals whose personal data has been used to train their model:

  • If developers cannot inform individuals of how their personal data has been used, then data processing may become unlawful, so effective transparency measures are essential.
  • Some developers have argued that UK data protection law may not require them to respond to a request if they have used personal data in a way which means that the individual to whom it belongs can no longer be identified (for example, compiling it into training data).
  • The ICO has cautioned against relying on this exception too broadly. If a developer cannot identify the personal data that belongs to an individual who has made an information request, then it should inform the individual and allow them to provide more information to help locate the correct data.

The ICO considers AI to be a priority area because of its wide-reaching implications for data protection and privacy, and this consultation is just one of several measures it is taking to examine and regulate those implications.

For further information, see the ICO’s full response here.

If you require further advice on how data protection law applies to your business, then please contact our data protection and IT team.
