The Hamburg Data Protection Authority’s position that the storage of data by LLMs does not involve the processing of personal data, combined with the CNIL’s recent views, might signal a substantial change in privacy authorities’ approach to data processing performed by generative artificial intelligence (AI).
Hamburg Data Protection Authority’s View on Data Storage by LLMs
In a previous article, I discussed the paper issued by the Hamburg Data Protection Authority. The paper argued that, unlike traditional data systems, LLMs process tokens and vector relationships (embeddings). Tokenization fragments the original information into parts so small that their storage does not constitute the processing of personal data.
According to the authority, tokens and embeddings in LLMs lack the direct and identifiable link to individuals that CJEU jurisprudence requires for classification as personal data. Furthermore, when LLMs respond to prompts, they generate new information that, because of this “creation” phase, cannot be considered a copy of the original.
While it may be possible to extract training data from LLMs, developers of artificial intelligence solutions must implement appropriate guardrails to ensure that outputs cannot be deemed copies, or even derivative works, of the original content.
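For illustration only, below is a minimal sketch of one such guardrail: an n-gram overlap filter that blocks outputs reproducing long verbatim spans of training text. Everything in it, from the function names to the choice of n = 8 and the whitespace tokenization, is an assumption made for the example; neither authority prescribes any particular technique, and production systems rely on far more sophisticated memorization and similarity checks.

```python
def ngrams(text: str, n: int = 8) -> set[tuple[str, ...]]:
    """Whitespace-token n-grams; a real system would use the model's own tokenizer."""
    tokens = text.split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}

def build_index(training_texts: list[str], n: int = 8) -> set[tuple[str, ...]]:
    """Index every n-gram that appears anywhere in the training corpus."""
    index: set[tuple[str, ...]] = set()
    for text in training_texts:
        index |= ngrams(text, n)
    return index

def violates_guardrail(output: str, index: set[tuple[str, ...]], n: int = 8) -> bool:
    """True if the output reproduces any n consecutive training tokens verbatim,
    i.e., the kind of output a copy-prevention guardrail would block or rewrite."""
    return not ngrams(output, n).isdisjoint(index)
```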
CNIL’s Innovative Interpretations on Privacy and Generative AI
The Hamburg authority’s position aligns closely with the views expressed by the French Data Protection Authority (CNIL) in its current consultation on applying the GDPR to AI models.
The CNIL has requested stakeholders “to shed light on the conditions under which AI models can be considered anonymous or must be regulated by the GDPR.”
The CNIL has also shown a more open approach to relying on legitimate interest as a legal basis for developing AI systems, which is crucial for the data collection phase necessary for AI training. It emphasizes that the legitimate interests underlying the processing must be clearly defined in a Legitimate Interest Assessment (LIA) and that the commercial purpose of developing an AI system does not preclude reliance on legitimate interest as a legal basis.
In any case, developers must ensure that data processing is essential for development and does not threaten individuals’ rights and freedoms.
Potential Convergence of Views Between Hamburg and CNIL on AI
Combining the positions of the Hamburg Data Protection Authority and the CNIL, developers and deployers might find major support for maintaining the GDPR compliance of data processing through generative artificial intelligence solutions. Specifically:
- Collected data could be processed based on legitimate interest, but with the most directly identifying personal data automatically removed immediately after collection to reinforce the LIA (a minimal filtering sketch follows this list);
- Only filtered data should be provided to the AI model for training, strengthening the argument that the tokens stored by LLMs do not qualify as personal data; and
- Guardrails should be in place to ensure that outputs cannot be copies or derivative works of any data used for training.
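To make the first two steps concrete, here is a minimal, purely illustrative sketch of what an automated post-collection filtering stage could look like. The scrub_record and build_training_corpus functions and the two regular expressions are hypothetical; a real pipeline would rely on dedicated PII-detection tooling, and whether this level of filtering meets the anonymity threshold the CNIL is consulting on remains an open question.

```python
import re

# Hypothetical patterns for directly identifying data; a real pipeline
# would use dedicated PII-detection tooling, not two regexes.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")

def scrub_record(text: str) -> str:
    """Remove the most directly identifying personal data from raw text
    immediately after collection, before it ever reaches training (step 1)."""
    text = EMAIL_RE.sub("[REDACTED_EMAIL]", text)
    text = PHONE_RE.sub("[REDACTED_PHONE]", text)
    return text

def build_training_corpus(raw_records: list[str]) -> list[str]:
    """Hand only filtered records to the model for training (step 2)."""
    return [scrub_record(r) for r in raw_records]

if __name__ == "__main__":
    sample = ["Contact Jane at jane.doe@example.com or +33 1 23 45 67 89."]
    print(build_training_corpus(sample))
```

The point of the sketch is only to show where the filtering sits in the pipeline: between collection under legitimate interest and training, so that the model never sees the most directly identifying data.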
This approach should be supported by a detailed Data Protection Impact Assessment (DPIA) and an LIA and could offer significant protection for companies developing and exploiting AI solutions.
Additionally, this approach could be valuable in defending against intellectual property challenges, as it aligns with the Text and Data Mining (TDM) copyright exception. On the topic, you can read the article “AI Act - What Is the Scope of the TDM Copyright Exception?”.