The Dutch data protection authority has provided guidelines that are particularly pertinent for companies leveraging data scraping to train generative artificial intelligence (AI) systems. Here are the key takeaways:
1. Legitimate Interest: This is likely the only legal basis for data scraping, even if the data is publicly available and promptly removed after collection. Comment: The question is whether individuals currently have a legitimate expectation that their data will be scraped by AI, and how to ensure that they do.
2. New Processing Considerations: Data scraping is not a compatible purpose for further data collection and processing; it applies only to a new processing activity. Comment: This position is quite rigid and inconsistent with the first point. A different legal basis would have to apply to existing data processing, which might turn out to be unfeasible.
3. Commercial vs. Non-Commercial Interests: Purely commercial interests do not justify the use of legitimate interest as a legal basis. If scraping is for non-commercial purposes such as fraud prevention or improving security, it may be permissible. Comment: Companies process personal data for commercial purposes, and it could not be otherwise. We can demonstrate that such interests are balanced with those of data subjects, who might benefit as well, and that should be enough.
4. Ethical Implications: Prior to scraping, companies must consider the potential harms and whether individuals have reasonable expectations of their data being used in such a manner. Comment: The threshold for meeting the reasonable expectation standard should be clearly set. Legal design solutions might enable a higher level of transparency and strengthen the argument that such expectations exist.
5. Transparency and Data Management: Companies must be transparent about their data processing activities and strive to delete, pseudonymize, or anonymize data as soon as possible. Comment: Documenting the development process of the AI system and proving its compliance with the regulatory framework are crucial. Both require collaboration between companies' IT and legal departments, which we will hopefully see more and more.
6. Special Categories of Data: When dealing with special categories of data, it's crucial to consider whether the individual has actively made the data public. Comment: I wonder whether the regulatory framework is sufficiently mature to enable data scraping of special categories of data.
What is your view on the Dutch privacy authority's position on data scraping by AI systems? AI is the future; it requires proper legal guardrails and documented processes to protect the interests of businesses and enable its fully compliant exploitation.
You can read several articles on the legal implications of artificial intelligence and the impact of the AI Act HERE