AI models trained on copyrighted material create legal and regulatory risk for deploying organisations. The EU AI Act requires the AISDP to document the copyright status of training data, the legal basis for its use, and procedures for handling rights holder claims.
The training data used to develop AI models, particularly large language models and generative AI systems, may include copyrighted material. The legal landscape is evolving rapidly, with litigation in multiple jurisdictions challenging the legality of training on copyrighted content without a licence. For high-risk AI systems under the EU AI Act, the AISDP must document the copyright status of the training data used in models the organisation deploys.
The AISDP must record the copyright status of the training data for each model deployed. This includes identifying whether the training data includes copyrighted text, images, audio, or other works, and the legal basis relied upon for processing that material. Acceptable legal bases include licence, consent, the text and data mining exception under Directive (EU) 2019/790, or another recognised basis. The documentation must also cover the measures taken to identify and exclude material where the rights holder has exercised an opt-out under the Directive.
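The documentation fields described above can be captured in a structured record. The following is a minimal Python sketch; the class and field names are illustrative assumptions, not a prescribed AISDP schema.

```python
from dataclasses import dataclass
from enum import Enum

class LegalBasis(Enum):
    """Legal bases named in the text; labels are illustrative."""
    LICENCE = "licence"
    CONSENT = "consent"
    TDM_EXCEPTION = "text and data mining exception (Directive (EU) 2019/790)"
    OTHER = "other recognised basis"

@dataclass
class TrainingDataRecord:
    """Hypothetical AISDP entry for one deployed model's training data."""
    model_name: str
    contains_copyrighted_material: bool
    material_types: list[str]   # e.g. ["text", "images", "audio"]
    legal_basis: LegalBasis
    opt_out_measures: str       # how rights-holder opt-outs were identified and excluded

record = TrainingDataRecord(
    model_name="example-llm-v1",
    contains_copyrighted_material=True,
    material_types=["text", "images"],
    legal_basis=LegalBasis.TDM_EXCEPTION,
    opt_out_measures="Crawler honours machine-readable TDM reservations; flagged sources excluded.",
)
```

A record like this makes the basis relied upon, and the opt-out handling, explicit per model rather than leaving them to free-text narrative.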
The AISDP must document the procedures for responding to copyright claims from rights holders. This means establishing a clear process for receiving, assessing, and acting upon claims that copyrighted material has been used in the training data of a deployed model. Organisations should ensure that these procedures are proportionate to the scale and nature of the AI system's use of third-party content, and that they can demonstrate compliance with applicable copyright law when challenged.
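The receive–assess–act process described above can be expressed as a small state machine so that a claim cannot skip a stage. This is a sketch under assumed status names; an organisation's actual workflow stages may differ.

```python
from dataclasses import dataclass
from enum import Enum, auto

class ClaimStatus(Enum):
    RECEIVED = auto()
    UNDER_ASSESSMENT = auto()
    UPHELD = auto()
    REJECTED = auto()

# Permitted transitions: a claim must be assessed before it is decided.
TRANSITIONS = {
    ClaimStatus.RECEIVED: {ClaimStatus.UNDER_ASSESSMENT},
    ClaimStatus.UNDER_ASSESSMENT: {ClaimStatus.UPHELD, ClaimStatus.REJECTED},
}

@dataclass
class CopyrightClaim:
    claimant: str
    work_description: str
    status: ClaimStatus = ClaimStatus.RECEIVED

    def advance(self, new_status: ClaimStatus) -> None:
        if new_status not in TRANSITIONS.get(self.status, set()):
            raise ValueError(f"cannot move claim from {self.status.name} to {new_status.name}")
        self.status = new_status

claim = CopyrightClaim("Rights Holder Ltd", "news articles allegedly present in training corpus")
claim.advance(ClaimStatus.UNDER_ASSESSMENT)
claim.advance(ClaimStatus.REJECTED)
```

Enforcing the transition table gives a documented, auditable trail for each claim, which supports the demonstration of compliance mentioned above.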
For systems incorporating pre-trained models from third parties, the organisation should obtain contractual representations regarding the copyright status of the model's training data. These representations should cover the legal basis on which the training data was collected and processed, whether any rights holder opt-outs have been respected, and the provider's procedures for handling copyright claims. Contractual protections provide a documented chain of accountability that the AI System Assessor can reference in the AISDP.
Where contractual representations from the model provider are unavailable or qualified, the AI System Assessor records the risk in the risk register and assesses it for potential regulatory and reputational impact. A qualified representation — for example, one that disclaims liability for a subset of training data — indicates that the organisation cannot fully rely on the provider's assurances. The risk assessment should consider the severity of potential infringement, the likelihood of claims being brought, and the reputational consequences for the deploying organisation.
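The severity, likelihood, and reputational dimensions of the assessment above can be recorded as a structured register entry. The field names and the multiplicative score are illustrative assumptions; real registers may weight factors differently.

```python
from dataclasses import dataclass

@dataclass
class RiskEntry:
    """Hypothetical risk-register entry for a qualified or missing representation."""
    model: str
    description: str
    severity: int             # 1 (low) .. 5 (high): potential infringement impact
    likelihood: int           # 1 (rare) .. 5 (likely): chance of claims being brought
    reputational_impact: int  # 1 (low) .. 5 (high): consequence for the deploying organisation

    @property
    def score(self) -> int:
        # Simple severity x likelihood scoring; an assumed convention, not a standard.
        return self.severity * self.likelihood

entry = RiskEntry(
    model="third-party-llm-v2",
    description="Provider disclaims liability for a web-scraped subset of training data",
    severity=4,
    likelihood=3,
    reputational_impact=4,
)
```

Here the qualified representation yields a score of 12, which a register policy might map to a review or escalation threshold.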
Organisations can reduce copyright risk through several practical measures. Input filtering and output monitoring can detect and flag content that closely resembles known copyrighted works. Regular audits of the model provider's copyright compliance documentation help ensure that representations remain current. Where the model provider updates the training data or model version, the organisation should reassess the copyright risk and update the AISDP accordingly. These controls are particularly important where the legal basis for training data processing is uncertain or contested.
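Output monitoring of the kind described above can be approximated with a crude resemblance signal. The sketch below uses word n-gram overlap against a known work; the function name and threshold are assumptions for illustration, and production systems would more likely use content fingerprinting or embedding similarity.

```python
def ngram_overlap(candidate: str, reference: str, n: int = 8) -> float:
    """Fraction of the candidate's word n-grams that also appear in the reference."""
    def ngrams(text: str) -> set[tuple[str, ...]]:
        words = text.lower().split()
        return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}

    cand = ngrams(candidate)
    if not cand:
        return 0.0
    return len(cand & ngrams(reference)) / len(cand)

FLAG_THRESHOLD = 0.3  # illustrative; tune per deployment

output = "the quick brown fox jumps over the lazy dog and runs away fast"
known_work = "the quick brown fox jumps over the lazy dog while birds watch"
if ngram_overlap(output, known_work) >= FLAG_THRESHOLD:
    print("flag for review")  # this pair shares 2 of 6 candidate 8-grams (~0.33)
```

A flagged output would feed into the claims-handling procedure rather than trigger automatic blocking, keeping a human in the assessment loop.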