Article 10 of the EU AI Act requires data governance for training, validation, and testing datasets regardless of whether the data was collected first-hand or acquired from a third party. Organisations must extend their governance frameworks to cover every external data source with binding contractual provisions and independent technical verification.
Organisations deploying high-risk AI systems frequently rely on data they did not collect, curate, or control, yet they bear full compliance responsibility for that data under Article 10. Training datasets may be licensed from commercial data brokers. Pre-trained models may have been trained on web-scale corpora assembled by a general-purpose AI (GPAI) provider. Feature enrichment services may supply demographic, firmographic, or behavioural data from external sources.
In each of these scenarios, Article 10's data governance requirements apply to the training, validation, and testing datasets regardless of whether the data was collected first-hand or acquired from a third party. The organisation cannot delegate its regulatory obligations to a supplier, even where the supplier bears contractual responsibility for data quality. This creates a governance challenge: the organisation must satisfy the same data governance standards for externally sourced data as for data it collects directly, despite having limited visibility into the data's origins, collection methodology, and processing history.
Addressing this challenge requires a structured approach spanning contractual provisions, technical validation controls, liability arrangements, documentation requirements, and ongoing monitoring for silent changes in supplier practices. Data Governance and Management covers the full data governance framework within which third-party governance operates.
The data governance framework must extend beyond the organisation's own data operations to address every third-party data source through binding contractual requirements. Each data supplier relationship should be governed by provisions addressing provenance, quality, bias, change management, and audit access.
Provenance disclosure requires the supplier to reveal the data's original collection methodology, the lawful basis under which the data was collected (consent, legitimate interest, public interest, or other), the populations and geographies represented, any known limitations or biases in coverage, and any prior processing or filtering applied. Without this provenance information, the organisation cannot assess the data's suitability for training a high-risk AI system under Article 10(3), which requires that datasets be relevant, sufficiently representative, and to the best extent possible free of errors and complete.
Data quality specifications define measurable standards against which incoming data is validated. Contracts should set completeness thresholds specifying the maximum acceptable proportion of missing values per field, accuracy guarantees specifying error rate bounds verified through the supplier's quality assurance processes, timeliness requirements specifying the maximum age of records and update frequency, and consistency specifications covering format standards, schema compliance, and referential integrity. These specifications become the baseline against which incoming data is validated. Deliveries that fail to meet the specifications are rejected or flagged for remediation by the Technical SME before the data enters the training pipeline.
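As an illustration, the contractual quality specifications described above can be encoded as a machine-readable object and enforced on each incoming delivery. This is a minimal sketch under assumptions: the field names, thresholds, and record layout are hypothetical examples, not contract terms.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

@dataclass(frozen=True)
class QualitySpec:
    max_missing_rate: float    # completeness: max share of missing values per field
    max_record_age: timedelta  # timeliness: maximum acceptable age of any record
    required_fields: tuple     # consistency: fields every record must carry

def check_delivery(records: list, spec: QualitySpec, now: datetime) -> list:
    """Return a list of specification breaches for one delivery (empty = pass)."""
    breaches = []
    for field in spec.required_fields:
        missing = sum(1 for r in records if r.get(field) in (None, ""))
        rate = missing / len(records)
        if rate > spec.max_missing_rate:
            breaches.append(
                f"{field}: missing rate {rate:.2%} exceeds {spec.max_missing_rate:.2%}"
            )
    oldest_allowed = now - spec.max_record_age
    stale = sum(1 for r in records if r["collected_at"] < oldest_allowed)
    if stale:
        breaches.append(f"{stale} records older than contracted maximum age")
    return breaches
```

Because the specification is data rather than hard-coded logic, the same check can be reused across suppliers with different contracted thresholds.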
Contractual audit rights enable the organisation to verify a supplier's provenance disclosures, quality assurance processes, and bias management practices through direct inspection rather than relying on self-reported claims. These rights should cover on-site or remote inspection of the supplier's data collection and processing infrastructure, review of quality assurance records, access to the supplier's own bias and representativeness assessments, and verification that the supplier's data handling practices comply with GDPR and any applicable sector-specific data protection requirements.
The frequency of audits should be proportionate to the data's risk profile: annual audits for suppliers of high-volume or high-sensitivity data, biennial assessments for lower-risk sources. The Internal Audit Assurance Lead documents and retains the audit findings as part of the AISDP evidence pack for Module 4, ensuring that the evidence trail connects the supplier's practices to the organisation's compliance obligations.
Where a supplier refuses to grant audit rights, the organisation should assess whether alternative assurance mechanisms are available. These may include independent third-party audits commissioned by the supplier, SOC 2 or ISO 27001 certifications covering the data processing operations, or regulatory compliance certificates from the supplier's supervisory authority. The absence of any assurance mechanism represents a material data governance gap that the AI System Assessor records in the risk register. Conformity Assessment Documentation addresses how audit evidence feeds into the broader conformity assessment process.
Every data delivery from a third-party source should pass through an automated intake validation pipeline before entering the training data store, regardless of the supplier's contractual warranties. Contractual warranties set the expected standard; independent verification confirms that the standard is met in practice.
The validation pipeline should verify schema compliance covering field names, data types, and value formats, along with completeness covering missing value rates per field against contracted thresholds. It should also perform range and distribution checks using statistical tests that compare each delivery's distribution against the historical baseline for the same source. Anomaly detection identifies records or batches that are statistically unusual, which may indicate collection errors, processing failures, or silent methodology changes by the supplier. The data engineering team extends the pipeline with supplier-specific expectations, encoding the contractual quality specifications as automated checks so that every delivery is measured against the agreed standards.
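The schema-compliance step described above can be sketched as a per-record check of field names and types. The schema shown is a hypothetical example, not a mandated layout.

```python
# Hypothetical example schema for a supplier delivery.
SCHEMA = {"customer_id": str, "age": int, "region": str}

def schema_violations(records):
    """Yield (record_index, message) for every record that breaks the schema."""
    for i, record in enumerate(records):
        extra = set(record) - set(SCHEMA)
        missing = set(SCHEMA) - set(record)
        if extra or missing:
            yield i, f"field mismatch: extra={sorted(extra)} missing={sorted(missing)}"
            continue  # type checks are meaningless if the fields are wrong
        for field, expected in SCHEMA.items():
            if not isinstance(record[field], expected):
                yield i, (
                    f"{field}: expected {expected.__name__}, "
                    f"got {type(record[field]).__name__}"
                )
```

In practice a dedicated validation framework would typically carry this step, but the logic reduces to the same name-and-type comparison.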
Deliveries that fail validation are quarantined: they sit in a holding area, a notification is sent to both the data engineering team and the supplier, and the data does not enter the training pipeline until the failure is resolved. The quarantine log records each failed delivery, the nature of the failure, and the resolution, and serves as a Module 4 evidence artefact demonstrating that the organisation actively enforces its data quality standards on external sources.
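The quarantine log itself can be as simple as an append-only record capturing the fields named above: the failed delivery, the nature of the failure, and the eventual resolution. A minimal sketch, with illustrative keys and paths:

```python
import json
from datetime import datetime, timezone

def log_quarantine(log_path, supplier, delivery_id, failures, resolution=None):
    """Append one quarantine event to a JSON-lines log and return the entry."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "supplier": supplier,
        "delivery_id": delivery_id,
        "failures": failures,      # e.g. the output of the intake checks
        "resolution": resolution,  # recorded once the failure is resolved
    }
    with open(log_path, "a") as f:
        f.write(json.dumps(entry) + "\n")
    return entry
```

An append-only format preserves the evidence trail: resolved failures are logged as new entries rather than overwriting the original record.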
Beyond individual delivery validation, the organisation should periodically reassess whether the supplier's data remains suitable for the system's intended purpose. A dataset that was representative when initially licensed may become unrepresentative as the deployment population changes, as the supplier's collection methodology evolves, or as societal patterns shift. The Technical SME conducts this reassessment at least annually, or more frequently for high-sensitivity data sources. The reassessment should include updated representativeness analysis examining whether the data still reflects the deployment population, fairness impact testing assessing whether the current model's fairness profile changes when retrained on the latest supplier data, and comparison against any new data sources that have become available. This periodic review prevents the organisation from relying on stale vendor relationships when better options exist.
When third-party data causes a compliance failure, such as a fairness deficiency traceable to unrepresentative training data from an external vendor, the allocation of liability between the organisation and the supplier must be addressed contractually. The Legal and Regulatory Advisor is responsible for structuring these liability arrangements. Contracts should specify the supplier's liability for data quality breaches, including the remedy available to the organisation such as replacement data, reprocessing, or financial compensation.
The contract should also address the supplier's obligation to cooperate with regulatory investigations arising from data quality issues, and the indemnification arrangements for losses arising from the supplier's breach of data quality, provenance, or bias warranties. These provisions ensure that the organisation has contractual recourse when supplier data causes downstream compliance failures, though they serve a commercial rather than regulatory purpose.
However, contractual liability allocation does not diminish the organisation's own regulatory obligations. Under the AI Act, the provider or deployer remains responsible for compliance regardless of the data's source. Contractual remedies against the supplier serve as commercial protection, not a regulatory defence. This distinction is critical: an organisation cannot point to its supplier contract as a defence against regulatory enforcement when the data it used failed to meet Article 10 requirements.
AISDP Module 4 must document the third-party data governance framework with the same rigour as the first-party data governance framework. For each third-party data source, the AISDP must record the supplier identity and contractual reference, the data's purpose within the system (training, validation, testing, feature enrichment, or other), and the provenance information disclosed by the supplier.
The documentation must also capture the quality specifications and intake validation results, the representativeness assessment and any identified gaps, the audit rights and most recent audit findings, and the residual risk from any disclosure or quality gaps that could not be resolved. This documentation requirement ensures that the conformity assessment has a complete picture of the data's provenance chain, from original collection through to ingestion into the training pipeline.
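The per-source record the text calls for can be captured as a small structured object so that Module 4 entries stay consistent across suppliers. This is an illustrative serialisation under assumed field names, not a mandated format.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class ThirdPartySourceRecord:
    """One Module 4 documentation entry for a third-party data source."""
    supplier: str
    contract_ref: str
    data_purpose: str              # training / validation / testing / enrichment
    provenance_disclosed: dict     # lawful basis, methodology, populations covered
    quality_spec: dict             # contracted thresholds
    latest_intake_results: dict    # outcome of the most recent delivery validation
    representativeness_gaps: list
    audit_rights: str
    latest_audit_findings: str
    residual_risks: list           # unresolved disclosure or quality gaps

    def to_json(self) -> str:
        return json.dumps(asdict(self), indent=2)
```

Serialising the record to JSON keeps it machine-checkable, so a completeness check over all entries can flag sources with empty provenance or audit fields before the conformity assessment.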
Gaps in supplier disclosure are recorded as non-conformities in the Non-Conformity Register and escalated to the vendor through the procurement function. Where the vendor refuses to disclose adequate information, the organisation must assess whether it can compensate through its own evaluation of the model's outputs, specifically testing for bias on representative data from the deployment context. A gap that cannot be compensated through independent evaluation is a compliance risk that the AI System Assessor reflects in the risk register (Module 6) and communicates to the AI Governance Lead for a residual risk acceptance decision.
The mitigation strategy for third-party data operates on three layers: contractual, technical, and ongoing monitoring. The contractual layer establishes the baseline through supplier agreements addressing the five domains described above: provenance disclosure, quality specifications, bias and representativeness warranties, change notification, and audit rights.
The technical layer validates every delivery regardless of contractual promises. The data engineering team extends the intake validation pipeline with a dedicated expectation suite for each supplier, encoding the contractual quality specifications as automated checks covering schema compliance, completeness thresholds, distributional consistency, and anomaly detection. Every delivery passes through this suite before the data enters the training pipeline; deliveries that fail are quarantined as described above, and the quarantine log serves as a Module 4 evidence artefact.
The ongoing layer monitors for silent changes. Suppliers sometimes change their data collection or processing practices without notification, even when contractually obligated to provide it. The technical defence is statistical monitoring of incoming deliveries, comparing each delivery's distributional profile against the historical baseline for that supplier. A sudden shift in the distribution of a key feature, a change in the proportion of missing values, or an unexpected change in demographic composition all signal a potential methodology change. When a silent change is detected, the supplier is notified, the delivery is quarantined pending impact assessment, and the unnotified change is treated as a breach of the contractual notification obligation. This ongoing monitoring prevents the organisation from continuing to rely on a supplier whose underlying data characteristics have shifted.
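One common way to operationalise this comparison is a Population Stability Index (PSI) over a feature's category proportions, measuring how far a delivery's composition has drifted from the supplier's historical baseline. A minimal sketch; the 0.2 alert threshold is a widely used rule of thumb, not a regulatory figure.

```python
import math

def psi(baseline: dict, delivery: dict, eps: float = 1e-6) -> float:
    """Population Stability Index between baseline and delivery category counts."""
    total_b = sum(baseline.values())
    total_d = sum(delivery.values())
    score = 0.0
    for category in set(baseline) | set(delivery):
        p = max(baseline.get(category, 0) / total_b, eps)  # baseline share
        q = max(delivery.get(category, 0) / total_d, eps)  # delivery share
        score += (q - p) * math.log(q / p)
    return score

def silent_change_suspected(baseline, delivery, threshold=0.2):
    """Flag a delivery whose composition drifts beyond the alert threshold."""
    return psi(baseline, delivery) > threshold
```

The same pattern extends to missing-value rates and binned numeric features, so one monitoring routine can cover all of the drift signals listed above.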
Bias and representativeness warranties require the supplier to provide demographic composition statistics for the dataset, to the extent that disclosure is lawful under data protection obligations, and to warrant that the data does not systematically underrepresent or overrepresent any population group in a manner that would introduce bias into a downstream model. Where the supplier cannot provide demographic composition data because the data was not collected or because disclosure would breach the supplier's own data protection obligations, the AI System Assessor records this gap. The organisation then compensates through its own representativeness testing of the data against the deployment population.
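The compensating representativeness test described above can be sketched as a comparison of group shares in the supplier dataset against the deployment population, flagging groups whose share deviates beyond a tolerance. Group labels and the 5-percentage-point tolerance are illustrative assumptions.

```python
def representativeness_gaps(dataset_counts, population_shares, tolerance=0.05):
    """Return groups whose dataset share deviates from the deployment
    population share by more than the tolerance."""
    total = sum(dataset_counts.values())
    gaps = {}
    for group, expected in population_shares.items():
        observed = dataset_counts.get(group, 0) / total
        if abs(observed - expected) > tolerance:
            gaps[group] = {"expected": expected, "observed": round(observed, 3)}
    return gaps
```

A non-empty result is exactly the kind of gap the AI System Assessor would record, alongside the supplier's refusal or inability to disclose demographic composition.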
Change notification provisions require the supplier to notify the organisation before making material changes to the data's collection methodology, scope, or processing. A supplier that silently changes its data collection practices can introduce distribution shifts that propagate through the training pipeline without awareness. The notification obligation should specify a minimum lead time, typically 30 to 90 days, and require a description of the change's expected impact on the data's composition and quality.
Where a supplier refuses to grant adequate provenance disclosure, quality specifications, or audit rights, the organisation faces a documentation gap in Module 4. The AI System Assessor records the gap in the risk register, and the organisation must assess whether compensating controls, including its own independent quality testing, representativeness analysis, and bias evaluation, are sufficient to satisfy Article 10. If they are not, the organisation replaces the data source with one that provides adequate transparency for compliance purposes.