The Hamburg Data Protection Authority issued an insightful discussion paper on privacy risks and the use of LLMs, arguing that LLMs themselves do not process personal data.
Generative artificial intelligence systems such as large language models raise data protection concerns in relation to the personal data processed as part of their training, which is why the position of the Hamburg privacy authority is so relevant.
Here are some groundbreaking insights:
- LLM Processing & Data Storage: Unlike traditional data systems, LLMs process tokens and vector relationships (embeddings), which the Hamburg Data Protection Authority argues do not constitute “processing” or “storing” personal data under GDPR. If there is no processing of personal data, the GDPR is not applicable.
- Tokenization vs. Personal Data: Tokens and embeddings in LLMs lack the direct, identifiable link to individuals that CJEU case law requires for information to qualify as personal data, and this has to be demonstrated through the way the LLM actually operates (see the short sketch after this list).
- Memorization Attacks: While extracting training data from LLMs is possible, these attacks are often impractical and legally questionable, so identifying personal data is not always feasible under current legislation. As a consequence, no processing of personal data should occur as part of the model’s ordinary operation.
- Legality of LLM Usage: Even if personal data was mishandled during LLM development, it doesn’t necessarily make using the resulting model illegal, offering reassurance to those deploying third-party models.
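
To make the tokenization point more concrete, here is a minimal sketch (my own illustration, not part of the Hamburg paper) of how a sentence containing a name is reduced to integer token IDs before an LLM ever operates on it. It assumes the open-source tiktoken tokenizer library is installed and uses a hypothetical example sentence.

```python
# Illustrative only: show that an LLM pipeline turns text into integer token IDs,
# i.e. subword fragments rather than a structured record about a person.
# Assumes the open-source "tiktoken" library (pip install tiktoken).
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")  # tokenizer used by several OpenAI models

text = "Maria Rossi lives in Hamburg."      # hypothetical example sentence
token_ids = enc.encode(text)                # a list of integers, e.g. [X, Y, ...]
print(token_ids)

# Each ID maps back to a text fragment (a subword), not to an identified individual:
print([enc.decode([t]) for t in token_ids])
```

Whether such fragments, combined with the statistical relationships the model learns as embeddings, still amount to personal data is precisely the legal question the Hamburg paper addresses.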
This paper reflects a sophisticated, tech-savvy approach to the intersection of AI and privacy.
Will other EU privacy authorities follow the same path? That would be a groundbreaking change for the industry! On a similar topic, you can read the article “The Italian case on ChatGPT benchmarks generative AI’s privacy compliance?”.