AI models trained on copyrighted material create legal and regulatory risk for deploying organisations. The EU AI Act requires the AISDP to document the copyright status of training data, the legal basis for its use, and procedures for handling rights holder claims.
The training data used to develop AI models, particularly large language models and generative AI systems, may include copyrighted material. The legal landscape is evolving rapidly, with litigation in multiple jurisdictions challenging the legality of training on copyrighted content without a licence. For high-risk AI systems under the EU AI Act, the AISDP must document the copyright status of the training data used in models the organisation deploys.
The AISDP must record the copyright status of the training data for each model deployed. This includes identifying whether the training data includes copyrighted text, images, audio, or other works, and the legal basis relied upon for processing that material. Acceptable legal bases include licence, consent, the text and data mining exception under Directive (EU) 2019/790, or another recognised basis. The documentation must also cover the measures taken to identify and exclude material where the rights holder has exercised an opt-out under the Directive.
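The documentation fields described above can be captured in a structured record. The following is a minimal Python sketch; the class and field names are illustrative assumptions, not a prescribed AISDP schema.

```python
from dataclasses import dataclass
from enum import Enum

class LegalBasis(Enum):
    """Legal bases named in the text; labels are illustrative."""
    LICENCE = "licence"
    CONSENT = "consent"
    TDM_EXCEPTION = "text and data mining exception (Directive (EU) 2019/790)"
    OTHER = "other recognised basis"

@dataclass
class TrainingDataRecord:
    """Hypothetical AISDP entry for one deployed model's training data."""
    model_name: str
    contains_copyrighted_material: bool
    material_types: list[str]   # e.g. ["text", "images", "audio"]
    legal_basis: LegalBasis
    opt_out_measures: str       # how rights-holder opt-outs were identified and excluded

record = TrainingDataRecord(
    model_name="example-llm-v1",
    contains_copyrighted_material=True,
    material_types=["text", "images"],
    legal_basis=LegalBasis.TDM_EXCEPTION,
    opt_out_measures="Crawler honours machine-readable TDM reservations; flagged sources excluded.",
)
```

A record like this makes the basis relied upon, and the opt-out handling, explicit per model rather than leaving them to free-text narrative.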
The AISDP must document the procedures for responding to copyright claims from rights holders. This means establishing a clear process for receiving, assessing, and acting upon claims that copyrighted material has been used in the training data of a deployed model. Organisations should ensure that these procedures are proportionate to the scale and nature of the AI system's use of third-party content, and that they can demonstrate compliance with applicable copyright law when challenged.
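The receive–assess–act process described above can be expressed as a small state machine so that a claim cannot skip a stage. This is a sketch under assumed status names; an organisation's actual workflow stages may differ.

```python
from dataclasses import dataclass
from enum import Enum, auto

class ClaimStatus(Enum):
    RECEIVED = auto()
    UNDER_ASSESSMENT = auto()
    UPHELD = auto()
    REJECTED = auto()

# Permitted transitions: a claim must be assessed before it is decided.
TRANSITIONS = {
    ClaimStatus.RECEIVED: {ClaimStatus.UNDER_ASSESSMENT},
    ClaimStatus.UNDER_ASSESSMENT: {ClaimStatus.UPHELD, ClaimStatus.REJECTED},
}

@dataclass
class CopyrightClaim:
    claimant: str
    work_description: str
    status: ClaimStatus = ClaimStatus.RECEIVED

    def advance(self, new_status: ClaimStatus) -> None:
        if new_status not in TRANSITIONS.get(self.status, set()):
            raise ValueError(f"cannot move claim from {self.status.name} to {new_status.name}")
        self.status = new_status

claim = CopyrightClaim("Rights Holder Ltd", "news articles allegedly present in training corpus")
claim.advance(ClaimStatus.UNDER_ASSESSMENT)
claim.advance(ClaimStatus.REJECTED)
```

Enforcing the transition table gives a documented, auditable trail for each claim, which supports the demonstration of compliance mentioned above.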
For systems incorporating pre-trained models from third parties, the organisation should obtain contractual representations regarding the copyright status of the model's training data. These representations should cover the legal basis on which the training data was collected and processed, whether any rights holder opt-outs have been respected, and the provider's procedures for handling copyright claims. Contractual protections provide a documented chain of accountability that the AI System Assessor can reference in the AISDP.
Where contractual representations from the model provider are unavailable or qualified, the AI System Assessor records the risk in the risk register and assesses it for potential regulatory and reputational impact. A qualified representation — for example, one that disclaims liability for a subset of training data — indicates that the organisation cannot fully rely on the provider's assurances. The risk assessment should consider the severity of potential infringement, the likelihood of claims being brought, and the reputational consequences for the deploying organisation.
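The severity, likelihood, and reputational dimensions of the assessment above can be recorded as a structured register entry. The field names and the multiplicative score are illustrative assumptions; real registers may weight factors differently.

```python
from dataclasses import dataclass

@dataclass
class RiskEntry:
    """Hypothetical risk-register entry for a qualified or missing representation."""
    model: str
    description: str
    severity: int             # 1 (low) .. 5 (high): potential infringement impact
    likelihood: int           # 1 (rare) .. 5 (likely): chance of claims being brought
    reputational_impact: int  # 1 (low) .. 5 (high): consequence for the deploying organisation

    @property
    def score(self) -> int:
        # Simple severity x likelihood scoring; an assumed convention, not a standard.
        return self.severity * self.likelihood

entry = RiskEntry(
    model="third-party-llm-v2",
    description="Provider disclaims liability for a web-scraped subset of training data",
    severity=4,
    likelihood=3,
    reputational_impact=4,
)
```

Here the qualified representation yields a score of 12, which a register policy might map to a review or escalation threshold.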
Organisations can reduce copyright risk through several practical measures. Input filtering and output monitoring can detect and flag content that closely resembles known copyrighted works. Regular audits of the model provider's copyright compliance documentation help ensure that representations remain current. Where the model provider updates the training data or model version, the organisation should reassess the copyright risk and update the AISDP accordingly. These controls are particularly important where the legal basis for training data processing is uncertain or contested.
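Output monitoring of the kind described above can be approximated with a crude resemblance signal. The sketch below uses word n-gram overlap against a known work; the function name and threshold are assumptions for illustration, and production systems would more likely use content fingerprinting or embedding similarity.

```python
def ngram_overlap(candidate: str, reference: str, n: int = 8) -> float:
    """Fraction of the candidate's word n-grams that also appear in the reference."""
    def ngrams(text: str) -> set[tuple[str, ...]]:
        words = text.lower().split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

    cand = ngrams(candidate)
    if not cand:
        return 0.0
    return len(cand & ngrams(reference)) / len(cand)

FLAG_THRESHOLD = 0.3  # illustrative; tune per deployment

output = "the quick brown fox jumps over the lazy dog and runs away fast"
known_work = "the quick brown fox jumps over the lazy dog while birds watch"
if ngram_overlap(output, known_work) >= FLAG_THRESHOLD:
    print("flag for review")  # this pair shares 2 of 6 candidate 8-grams (~0.33)
```

A flagged output would feed into the claims-handling procedure rather than trigger automatic blocking, keeping a human in the assessment loop.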