The usage of copyright protected material for the training of artificial intelligence systems is a major issue under the AI Act, but what is the scope of the applicable TDM exception?
The EU AI Act requires providers of general-purpose AI models to put in place a policy to comply with Union copyright law and to draw up and make publicly available a sufficiently detailed summary of the content used for training the model, according to a template provided by the AI Office. This disclosure obligation might lead to major disputes if the perimeter of the exceptions to exclusive copyright rights is not sufficiently clear.
Unfortunately, the conflict between copyright and the usage of protected material by AI systems has not been resolved by the EU AI Act, which only contains a cross-reference to the provisions of the EU Copyright Directive that address the text and data mining copyright exception.
The Text and Data Mining Exception under the Copyright Directive
In a previous article, my DLA Piper colleagues Elena Varese and Carolina Battistella tackled the matter.
The Copyright Directive 2019/790/EU introduced the so-called text and data mining (TDM) exceptions, regulated in Articles 3 (Text and data mining for scientific research purposes) and 4 (Exceptions or limitations for the purposes of text and data mining) which apply to the training of AI systems. TDM is defined in Article 2 of the Copyright Directive as
any automated analysis technique aimed at analyzing text and data in digital format for the purpose of generating information, including, but not limited to, patterns, trends, and correlations.
Given the large amounts of data used by artificial intelligence systems to generate new content, the close relationship between generative AI and the TDM exception is evident: the text and data mining exception allows AI systems to access large amounts of data, which are used by generative artificial intelligence systems to create new content. Should these systems not be allowed to access such data, their ability to generate content would undoubtedly be limited.
Among the two TDM exceptions regulated by the EU Copyright Directive, the second one, which also allows mining for profit, deserves particular attention. Indeed, it exempts any text and data mining activity carried out on the intellectual work, including software or database protected material, regardless of the purpose or the qualification of the person exercising it.
This, however, is provided that:
- such person has had lawful access to the content for the purpose of text and data extraction; and
- the owner of the copyright and related rights and/or the database owner has not expressly reserved the extraction of text and data (the so-called opt-out mechanism), thereby bringing TDM activities under its exclusive control.
What shall rights holders do to exercise their opt-out under the TDM?
You may have noticed that many websites and even images on social media now carry a rights reservation statement to prevent the text and data mining exception from applying and to ensure that their content is not used for the training of artificial intelligence systems. But is that enough?
The scope of the opt-out mechanism depends on the manner in which the rights holder makes the reservation. It is Article 4(3) of the Copyright Directive itself that requires that the reservation be made “in an appropriate manner, for example, by means of tools enabling automated reading in the case of content made publicly available online.” This provision seems to require that the reservation statement be machine-readable when the work to which it refers is made available to the public on the Internet. The effects of an opt-out can also result from the inclusion of an appropriate clause in a contract, a reading confirmed by the Copyright Directive itself, which does not include Article 4 among its mandatory rules.
Moreover, the qualification of the reservation statement is independent of any assessment regarding the possible presence of computer mechanisms to prevent data extraction. This interpretation is based on the merely informative function of the reservation. Thus, it will be sufficient to include the reservation in the R&D of the website, even if it lacks protective measures.
Therefore, the reservation
- may be a “digital” declaration devoid of IT protection mechanisms, such as the exclusion protocols contained in robots.txt files; or
- may be achieved through a digital rights management system that not only has a digital protection function but also incorporates an automatically detectable digital declaration; but
- may not consist of the mere affixing of technical protection measures that do not include any declaration, and which are therefore merely tacit expressions of intent. Affixing technical measures does not make every TDM activity unlawful per se, but extractions that circumvent the measure adopted are prohibited, since copyright law prohibits circumventing technological protection measures.
As such, a reservation statement shall always be present, ideally accompanied by technical protection measures that need not affect the indexing of the content.
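As an illustration of how a crawler might read a machine-readable reservation expressed through a robots.txt exclusion rule, here is a minimal Python sketch using the standard library's `urllib.robotparser`. The crawler name `ExampleTDMBot`, the sample robots.txt content, and the helper `tdm_reserved` are hypothetical, and the sketch assumes that a disallow rule addressed to the crawler is treated as a reservation; whether such a rule is legally sufficient remains a case-by-case assessment.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content: one named crawler is excluded,
# while all other user agents (e.g. search engines) remain allowed.
ROBOTS_TXT = """\
User-agent: ExampleTDMBot
Disallow: /

User-agent: *
Allow: /
"""


def tdm_reserved(robots_txt: str, crawler_agent: str, url: str) -> bool:
    """Return True if robots.txt excludes this crawler from the URL.

    Assumption: an exclusion rule is read as a reservation statement.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(crawler_agent, url)


if __name__ == "__main__":
    page = "https://example.com/article"
    print(tdm_reserved(ROBOTS_TXT, "ExampleTDMBot", page))
    print(tdm_reserved(ROBOTS_TXT, "OtherBot", page))
```

Excluding only specific crawlers, as in this sample file, is one way a reservation could coexist with ordinary search indexing.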
What is the content used by the AI system covered by the copyright TDM?
There is considerable confusion about the scope of the TDM exception and the potential implications of rights holders exercising the opt-out. Indeed, the text and data mining exception applies only to copyright-protected material used for the training of the artificial intelligence system. As such, providers shall make sure that reproductions and extractions of copyright-protected material are retained only “for as long as necessary for the purpose of text and data extraction,” because a copy stops serving the extraction of text or data once the extraction is accomplished. Therefore, copies may not be retained for purposes beyond TDM, such as verifying and demonstrating results.
Under such a scenario, if the AI system is no longer using the copyright protected material since the training was already completed, the exercise of the opt-out by rights holders will not require any further activity by the provider.
Guardrails shall be put in place since the AI system shall not use the content as part of its computational analysis to generate outputs. Therefore, to limit the risk of challenges, the generative AI system shall be programmed in a way that all its outputs considerably deviate from the original content and cannot fall under the definition of derivative works.
This rationale applies not only to the generation of images, but also to any content such as summaries, analyses of material and even pieces of code created by the AI system.
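One possible technical guardrail is a filter that flags outputs reproducing long verbatim passages of a source work. The Python sketch below measures the share of an output's word 5-grams that also appear in a source text. The function names, the n-gram length, and the 0.5 threshold are assumptions chosen for illustration, and verbatim overlap is only a crude proxy: it is not the legal test for a derivative work.

```python
def word_ngrams(text: str, n: int) -> set:
    """Return the set of lowercase word n-grams found in the text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def overlap_ratio(output: str, source: str, n: int = 5) -> float:
    """Share of the output's n-grams that also occur in the source."""
    output_grams = word_ngrams(output, n)
    if not output_grams:
        # Output shorter than n words: nothing to compare.
        return 0.0
    return len(output_grams & word_ngrams(source, n)) / len(output_grams)


def flag_for_review(output: str, source: str, threshold: float = 0.5) -> bool:
    """Flag an output whose verbatim overlap exceeds the assumed threshold."""
    return overlap_ratio(output, source) > threshold


if __name__ == "__main__":
    source = "the quick brown fox jumps over the lazy dog near the river bank"
    print(flag_for_review(source, source))
    print(flag_for_review("an entirely different sentence about copyright law in Europe", source))
```

A flagged output would still need human or legal review; the filter only narrows down which outputs deserve attention.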
Recommendations for rights holders and providers of AI systems
A case-by-case assessment of the applicability of the TDM copyright exception shall be performed regardless of whether you are a rights holder or the provider of an AI system.
Rights holders shall find the right balance between exercising their opt-out through statements and technical measures and ensuring that the opt-out does not negatively impact the Google ranking of their webpages.
At the same time, providers of AI systems, as well as deployers when they provide the material to providers, shall:
- Obtain legitimate access to the content;
- Verify that the rights holders have not reserved the right to make reproductions for TDM purposes;
- Keep the copies made only as long as necessary for TDM purposes; and
- Ensure technical safeguards are in place so that AI-generated outputs cannot be challenged as derivative works.
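The first three items of this checklist can be sketched as simple checks in a data-ingestion pipeline. The Python example below is a minimal sketch under stated assumptions: the `SourceRecord` fields and the functions `may_mine` and `copy_must_be_deleted` are hypothetical names, and in practice the values would come from a legal review and from reading reservation statements rather than from hard-coded flags. The fourth item, concerning safeguards on outputs, is not covered here.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class SourceRecord:
    """Hypothetical record kept for each piece of content considered for TDM."""
    url: str
    lawful_access: bool        # was access to the content obtained legitimately?
    rights_reserved: bool      # has the rights holder opted out of TDM?
    tdm_completed_at: Optional[datetime] = None  # set once mining is finished


def may_mine(record: SourceRecord) -> bool:
    """Content may be mined only with lawful access and no reservation."""
    return record.lawful_access and not record.rights_reserved


def copy_must_be_deleted(record: SourceRecord) -> bool:
    """Copies are kept only as long as necessary for TDM purposes."""
    return record.tdm_completed_at is not None


if __name__ == "__main__":
    open_page = SourceRecord("https://example.com/a", True, False)
    reserved_page = SourceRecord("https://example.com/b", True, True)
    print(may_mine(open_page), may_mine(reserved_page))

    open_page.tdm_completed_at = datetime(2024, 1, 1)
    print(copy_must_be_deleted(open_page))
```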
The scenario set out above shall also take into account the additional legal tools that rights holders might have to protect their rights, including the broader protection granted by unfair competition rules.
Reach out to us if you want to know more, and read the page HERE for some other articles on the most relevant provisions of the AI Act.