Article 25 requires downstream providers who build high-risk AI systems on general-purpose AI models to meet the full Chapter III compliance obligations, even for components they did not build. This section provides a complete compliance architecture for navigating the information asymmetry between GPAI providers and downstream system providers.
The AI Act creates a layered obligation structure where the downstream provider bears full responsibility for the complete system, even when its most consequential component was designed and operated by a third-party GPAI model provider. Article 25 is the critical provision: the GPAI provider's compliance with its own obligations does not reduce the downstream provider's obligations by a single requirement.
Three layers define the regulatory architecture. The GPAI model provider (OpenAI, Anthropic, Google, Mistral, Meta) bears obligations under Articles 51 to 56 for transparency, copyright compliance, and technical documentation, with additional safety obligations if classified as systemic risk under Article 51. The downstream system provider, the organisation building the application, bears full compliance with Chapter III requirements (Articles 8 to 20) including conformity assessment, Declaration of Conformity, CE marking, and EU database registration. The deployer bears Article 26 obligations for use in accordance with Instructions for Use, human oversight, monitoring, and incident reporting.
This creates a structural information asymmetry. The downstream provider needs detailed information about the GPAI model to complete its AISDP, but the GPAI provider controls that information and has commercial incentives to restrict disclosure. The majority of high-risk AI systems entering production in 2025 and 2026 incorporate a GPAI model as their primary inference component, making this challenge central to compliance.
Article 25 entitles the provider of a high-risk AI system to request from the GPAI model provider the information necessary to ensure compliance with the AI Act. This is a statutory right, not a courtesy. The request is submitted in writing, specifying the legal basis, information categories, intended use, and the downstream system's risk classification.
The structured information request covers six categories. Training data governance requests cover sources, collection methodology, geographic and demographic coverage, temporal scope, known biases, and copyright compliance measures. These feed AISDP Module 4. Model architecture and behaviour requests cover architecture family, parameter count, training methodology, alignment approach, known failure modes, and output constraints, feeding Module 3. Safety and security evaluation requests cover red-teaming results, adversarial testing, vulnerability disclosure policy, and incident response capabilities, feeding Module 9.
Versioning and change policy requests cover version scheme, deprecation policy, change notification commitments, and rollback availability, feeding Module 10. Data handling practice requests cover inference data retention, training data use, data processing agreements, and sub-processor disclosure, feeding Modules 4 and 9. Systemic risk documentation requests cover model evaluation results and risk assessments for Article 51 classified models, feeding Module 6.
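Where the organisation tracks many requests, the category-to-module mapping is worth keeping as a machine-readable artefact. A minimal sketch in Python, using the category names and module numbers above; the structure itself is an illustrative convention, not a prescribed format:

```python
# Mapping of GPAI information request categories to the AISDP modules
# they feed. Category names follow the text above; everything else
# (key naming, nesting) is an illustrative convention.
INFORMATION_REQUEST_CATEGORIES = {
    "training_data_governance": {"aisdp_modules": [4]},
    "model_architecture_and_behaviour": {"aisdp_modules": [3]},
    "safety_and_security_evaluations": {"aisdp_modules": [9]},
    "versioning_and_change_policy": {"aisdp_modules": [10]},
    "data_handling_practices": {"aisdp_modules": [4, 9]},
    "systemic_risk_documentation": {"aisdp_modules": [6]},
}
```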
Typical provider responses vary significantly. Training data disclosures are usually partial, with high-level categories disclosed but granular source lists withheld. Architecture family and parameter counts are usually available, but detailed failure mode analysis is rarely shared. Security disclosures range from published model cards to results shared under NDA to outright refusal. Version and change documentation is usually available, though many providers apply silent updates within a version identifier without notification.
GPAI providers refuse or restrict disclosure for three reasons: commercial confidentiality around training data and architecture, intellectual property protection where safety results may reveal vulnerabilities, and operational scalability where bespoke disclosures to every integrator are commercially impractical. The downstream provider follows a graduated four-level escalation.
Level 1 is negotiation. The request goes through the GPAI provider's enterprise or partnership channel. Many providers have dedicated compliance teams that provide information under NDA beyond standard API documentation. The AI System Assessor documents the negotiation: what was requested, offered, and refused.
Level 2 is escalation to the AI Office. Where the GPAI provider participates in the Code of Practice under Article 56, the downstream provider can refer the refusal to the AI Office. Participating providers who refuse to honour Code of Practice transparency commitments face enforcement under Article 88.
Level 3 is implementing compensating controls. Where disclosure cannot be obtained, the downstream provider fills information gaps through its own testing, evaluation, and monitoring. Level 4 is documenting the residual risk. Where compensating controls cannot fully substitute for missing information, the residual risk is recorded in the risk register (AISDP Module 6), assessed against the acceptability threshold, and escalated to the AI Governance Lead for a risk acceptance decision. A pattern of non-disclosure may itself influence model selection: choosing a more transparent GPAI provider reduces compliance risk.
The AI System Assessor maintains a structured GPAI disclosure register as a standing AISDP artefact that serves as both an audit trail and a change management trigger. Each row records the Code of Practice commitment area, the information category requested, the disclosure received or absence of disclosure, the date, the AISDP module it informs, any compensating control applied, and the residual risk assessment.
When the GPAI provider updates its disclosures, the Assessor reviews the register and determines whether any AISDP modules require corresponding updates. When the provider releases a new model version, the Assessor re-evaluates whether existing disclosures remain valid for the new version. The register links directly to AISDP Modules 1, 2, 3, 4, 6, 8, 9, 10, and 12.
This register transforms GPAI integration compliance from an ad hoc exercise into a systematic, auditable process. Without it, the organisation cannot demonstrate to a conformity assessment body that it made reasonable efforts to obtain the information necessary for compliance.
The downstream provider cannot inspect, audit, or reproduce the GPAI model's training data, so Article 10's requirements for training data documentation, completeness, representativeness, and bias assessment cannot be satisfied at the pre-training level. Two compensating approaches address this gap.
Behavioural proxy testing requires the Technical SME to design a sentinel evaluation dataset exercising the fairness, accuracy, and safety dimensions that training data disclosures do not address. The dataset includes balanced representation across protected characteristic subgroups relevant to the system's intended purpose. It includes adversarial examples testing for known bias patterns in the domain, such as gender bias in recruitment language, racial bias in credit-related terminology, and age bias in clinical descriptions. Edge cases probe the model's behaviour at the boundaries of its training distribution. Multilingual samples cover non-English populations where relevant.
The sentinel dataset is versioned, maintained, and re-evaluated whenever the GPAI model version changes. Results are documented in AISDP Module 4 alongside an explicit statement that the provider could not conduct Article 10 data documentation at the pre-training level and has substituted behavioural proxy testing.
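A minimal sketch of a versioned sentinel evaluation run, assuming an integrator-supplied call_model wrapper around the provider's API and a grade function encoding the pass/fail criteria; all names are illustrative:

```python
from datetime import datetime, timezone

SENTINEL_DATASET_VERSION = "2025.1"  # bumped whenever cases are added or changed

def run_sentinel_evaluation(cases, call_model, grade):
    """Run every sentinel case against the model and record graded results.

    `cases` is a list of dicts with `id`, `prompt`, `subgroup`, and
    `expected` keys; `call_model` and `grade` are supplied by the integrator.
    """
    results = []
    for case in cases:
        output = call_model(case["prompt"])
        results.append({
            "case_id": case["id"],
            "subgroup": case["subgroup"],
            "passed": grade(output, case["expected"]),
        })
    return {
        "dataset_version": SENTINEL_DATASET_VERSION,
        "run_at": datetime.now(timezone.utc).isoformat(),
        "pass_rate": sum(r["passed"] for r in results) / len(results),
        "results": results,  # persisted alongside AISDP Module 4 evidence
    }
```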
Output distribution analysis requires the Technical SME to analyse the GPAI model's output distribution across demographic subgroups within the system's operational context. This uses the downstream system's own operational data, not the GPAI provider's training data. The analysis tests whether the model produces systematically different outputs for different subgroups when presented with equivalent inputs. Results are documented in Module 4 and cross-referenced to the fairness evaluation in Module 5.
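A minimal sketch of the subgroup comparison, computing favourable-output rates from operational records and flagging disparities against a four-fifths-style threshold; the threshold choice is the integrator's, not prescribed by the Act:

```python
from collections import defaultdict

def subgroup_output_rates(records, threshold=0.8):
    """Compare favourable-output rates across subgroups on equivalent inputs.

    `records` is the system's own operational data: dicts with `subgroup`
    and `favourable` (bool) keys. Flags any subgroup whose rate falls below
    `threshold` times the best-performing subgroup's rate.
    """
    counts = defaultdict(lambda: [0, 0])  # subgroup -> [favourable, total]
    for r in records:
        counts[r["subgroup"]][0] += r["favourable"]
        counts[r["subgroup"]][1] += 1
    rates = {g: fav / total for g, (fav, total) in counts.items()}
    best = max(rates.values())
    flagged = {g: rate for g, rate in rates.items() if rate < threshold * best}
    return rates, flagged  # documented in Module 4, cross-referenced to Module 5
```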
Foundation models exhibit emergent behaviours not explicitly designed, including reasoning capabilities, tool-use abilities, and failure modes that appear at scale and were not present in smaller models. The downstream provider cannot predict or document these behaviours from first principles, making structured behavioural characterisation essential.
The Technical SME conducts systematic characterisation within the downstream system's operational context, covering five dimensions. Output determinism assesses whether the same input produces the same output and at what temperature settings. Sensitivity to prompt phrasing tests whether rephrasing the same question produces materially different outputs. Refusal behaviour maps conditions under which the model refuses output and whether refusal patterns affect any demographic subgroup disproportionately.
Hallucination rate measurement determines the proportion of outputs containing factual errors and whether the error rate varies by domain or input type. Boundary behaviour assessment examines how the model behaves when inputs fall outside the domain covered by the system's intended purpose.
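Two of these dimensions lend themselves to simple automated measurement. A minimal sketch, assuming an integrator-supplied call_model wrapper and, for sensitivity testing, an equivalence predicate appropriate to the domain; both functions are illustrative:

```python
def measure_determinism(call_model, prompt, n=10, temperature=0.0):
    """Estimate output determinism: the fraction of repeated calls that
    return the modal output for the same prompt at a fixed temperature."""
    outputs = [call_model(prompt, temperature=temperature) for _ in range(n)]
    modal_count = max(outputs.count(o) for o in set(outputs))
    return modal_count / n

def measure_prompt_sensitivity(call_model, paraphrases, equivalent):
    """Estimate sensitivity to phrasing: the fraction of paraphrase pairs of
    the same question that yield materially different outputs, as judged by
    the integrator-supplied `equivalent` predicate."""
    outputs = [call_model(p) for p in paraphrases]
    pairs = [(a, b) for i, a in enumerate(outputs) for b in outputs[i + 1:]]
    divergent = sum(1 for a, b in pairs if not equivalent(a, b))
    return divergent / len(pairs) if pairs else 0.0
```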
The characterisation is documented in AISDP Module 3 and updated whenever the GPAI model version changes. The AI System Assessor uses it to inform the risk assessment in Module 6. This work is not optional: without it, the downstream provider cannot demonstrate that it understands the behavioural properties of its most consequential system component.
GPAI providers update their models on their own schedule, and these updates can change behaviour in ways that affect the downstream system's compliance posture. Some providers apply changes within a version identifier without notification, meaning a model accessed under the same name today may behave differently from three months ago. Three controls address this risk.
Version pinning is the first defence. Where the provider's API supports requesting a specific model checkpoint rather than the latest version, the Technical SME pins the production system to a specific version. Version upgrades are treated as system changes that trigger the governance pipeline gates and the change classification logic, including potential substantial modification assessment.
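A minimal sketch of version pinning, shown here against the OpenAI Python client as one example; other providers expose comparable dated identifiers, and the exact naming scheme is provider-specific:

```python
# Pin production to a dated checkpoint rather than a floating alias.
# The identifier below is illustrative of OpenAI's dated-snapshot scheme.
PINNED_MODEL = "gpt-4o-2024-08-06"   # not "gpt-4o" or "latest"

def call_model(client, messages):
    response = client.chat.completions.create(
        model=PINNED_MODEL,          # upgrades go through the change pipeline
        messages=messages,
        temperature=0.0,
    )
    return response.choices[0].message.content
```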
Sentinel monitoring provides continuous assurance. The Technical SME maintains a monitoring pipeline that evaluates the GPAI model against the sentinel dataset on a defined schedule, daily for high-risk systems. If the evaluation detects a behavioural shift exceeding defined tolerances, including output distribution shift, fairness metric change, or hallucination rate increase, an automated alert triggers investigation. The alert is documented in the post-market monitoring records (AISDP Module 12).
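A minimal sketch of the drift check, comparing the latest sentinel run against a recorded baseline; the metrics and tolerances shown are illustrative and would be set per system in Module 6:

```python
TOLERANCES = {                    # illustrative thresholds, set per system
    "pass_rate": 0.02,            # max acceptable absolute shift
    "hallucination_rate": 0.01,   # max acceptable absolute shift
}

def detect_drift(baseline, latest, alert):
    """Compare the latest sentinel run against the recorded baseline and
    raise an alert for any metric whose shift exceeds its tolerance."""
    shifts = {}
    for metric, tolerance in TOLERANCES.items():
        delta = latest[metric] - baseline[metric]
        if abs(delta) > tolerance:
            shifts[metric] = delta
    if shifts:
        alert(f"Sentinel drift detected: {shifts}")  # logged in Module 12
    return shifts
```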
Contractual notification requirements form the third control. The procurement contract should require the GPAI provider to notify the downstream provider before making changes that could affect model behaviour, with a minimum notice period of 30 days recommended for high-risk systems.
The downstream provider cannot audit the GPAI provider's security infrastructure, training pipeline security, or model integrity controls, so Article 15's resilience requirements must be satisfied through the downstream system's own security architecture. Two primary strategies apply.
Prompt injection defence in depth implements multiple layers of protection against the most significant security threat to GPAI-based systems. Input sanitisation validates and constrains user inputs before they reach the GPAI model. System prompt isolation separates the system prompt from user content using the provider's recommended separation mechanisms. Output validation verifies that the model's output conforms to expected format and content boundaries. Action gating, for agentic systems, requires explicit approval before the system acts on model output. These controls are documented in AISDP Module 9 with test evidence from the red-teaming programme.
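A minimal sketch of the first three layers, with an illustrative deny-list and output vocabulary; a production system would use richer validation than shown here:

```python
import re

MAX_INPUT_CHARS = 4000
INJECTION_PATTERNS = [                       # illustrative deny-list, not exhaustive
    re.compile(r"ignore (all|previous) instructions", re.I),
    re.compile(r"system prompt", re.I),
]

def sanitise_input(user_text: str) -> str:
    """Layer 1: validate and constrain user input before it reaches the model."""
    if len(user_text) > MAX_INPUT_CHARS:
        raise ValueError("input exceeds documented length constraint")
    for pattern in INJECTION_PATTERNS:
        if pattern.search(user_text):
            raise ValueError("input matches known injection pattern")
    return user_text

def build_messages(system_prompt: str, user_text: str) -> list[dict]:
    """Layer 2: keep the system prompt in a separate role, never concatenated
    into user content (the provider's recommended separation mechanism)."""
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": sanitise_input(user_text)},
    ]

def validate_output(output: str, allowed: set[str]) -> str:
    """Layer 3: reject any response outside the expected output vocabulary."""
    if output.strip() not in allowed:
        raise ValueError("model output outside documented boundaries")
    return output.strip()
```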
Model integrity verification addresses the risk that a model accessed via API may have been tampered with or replaced without notification. The sentinel monitoring pipeline provides a behavioural proxy for integrity: unexpected changes in behaviour on the sentinel dataset may indicate an integrity issue as well as a version change. Where the GPAI model is deployed on-premises or in a private cloud, the Technical SME verifies model artefact integrity using cryptographic hashes at deployment time and periodically during operation.
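A minimal sketch of artefact verification for on-premises or private-cloud deployments, comparing SHA-256 digests against values recorded at deployment time:

```python
import hashlib
from pathlib import Path

def artefact_digest(path: str, chunk_size: int = 1 << 20) -> str:
    """SHA-256 digest of a model artefact file, computed in chunks so that
    multi-gigabyte weights files do not need to fit in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        while chunk := f.read(chunk_size):
            digest.update(chunk)
    return digest.hexdigest()

def verify_artefacts(expected: dict[str, str]) -> list[str]:
    """Compare current digests against those recorded at deployment time;
    return the paths of any artefacts that no longer match."""
    return [p for p, h in expected.items() if artefact_digest(p) != h]
```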
Article 25 treats any entity that makes a substantial modification to a high-risk AI system as its provider, and fine-tuning a GPAI model for a high-risk use case constitutes a substantial modification in almost all circumstances. The compliance consequence is unambiguous: the organisation becomes a provider and must prepare a full AISDP, conduct a conformity assessment, sign a Declaration of Conformity, and register in the EU database.
Full fine-tuning, parameter-efficient fine-tuning (LoRA, QLoRA, prefix tuning), reinforcement learning from human feedback applied to a pre-trained model, and distillation from a larger GPAI model into a smaller domain-specific model all constitute fine-tuning for the purposes of the provider boundary analysis. In-context learning, providing examples in the prompt without modifying parameters, does not constitute fine-tuning but may still trigger a substantial modification if it changes the system's intended purpose.
A grey zone exists for systems using a GPAI model via API with a carefully crafted system prompt, retrieval-augmented generation, and output post-processing but without modifying model parameters. Article 25(1)(a) applies if the organisation places the system on the market under its own name; Article 25(1)(c) applies if the system's intended purpose differs from the GPAI model's general purpose. In practice, most high-risk applications of GPAI models trigger provider status through one of these pathways regardless of whether fine-tuning occurs.
Where the integrated GPAI model has been classified as having systemic risk under Article 51, the downstream provider faces three additional considerations that go beyond the standard GPAI integration requirements.
Inherited systemic risk means the downstream system inherits the risk characteristics of the underlying model. The risk assessment in AISDP Module 6 must explicitly address whether the deployment context amplifies or constrains the systemic risk. A systemic risk model deployed in a closed, narrowly scoped recruitment screening system presents a different profile from the same model deployed in an open-ended customer interaction system.
Access to systemic risk documentation is a practical consideration. Article 55 requires systemic risk model providers to conduct model evaluations, assess and mitigate systemic risks, report serious incidents, and ensure cybersecurity. The downstream provider should request these evaluation results and risk assessments. Where available, they provide valuable input. Where unavailable, the compensating controls apply with heightened intensity.
Incident reporting operates in parallel chains. Article 55 requires systemic risk model providers to report serious incidents to the AI Office. The downstream provider's own Article 73 reporting obligation operates simultaneously. A serious incident involving a high-risk system built on a systemic risk model may trigger reporting by the downstream provider to the national market surveillance authority and by the GPAI provider to the AI Office. The Legal and Regulatory Advisor coordinates to ensure consistency between the two reports.
Each AISDP module requires specific content addressing the GPAI model component, creating a comprehensive documentation trail across the entire technical documentation package. Nine modules carry GPAI-specific requirements.
Module 1 (System Description) identifies the GPAI model, its provider, the API version or model identifier in use, the provider's EU registration status, and the intended purpose mapping from general-purpose to high-risk application. Module 2 (Development Process) documents the model selection rationale, fine-tuning methodology if applicable, prompt engineering approach, and integration architecture. Module 3 (Architecture) describes the GPAI model's position in the system architecture, data flows between the downstream system and the model API, output validation and filtering layers, and fallback mechanisms.
Module 4 (Data Governance) records the GPAI disclosure register's training data entries, behavioural proxy testing results, output distribution analysis, and the Article 10 gap statement. Module 6 (Risk Management) contains the inherited risk analysis, GPAI provider due diligence record, residual risk from information gaps, and contractual risk transfer analysis. Module 8 (Transparency) addresses how the GPAI model's involvement is disclosed to deployers and affected persons, the explanation methodology for GPAI-generated outputs, and the limitations of post-hoc explanations for foundation model reasoning.
Module 9 (Cybersecurity) documents the prompt injection defence architecture, GPAI-specific threat model, red-teaming results for GPAI attack surfaces, and security opacity compensating controls. Module 10 (Version Control) covers the version pinning policy, sentinel monitoring configuration, and change management process for GPAI model version upgrades. Module 12 (Post-Market Monitoring) records sentinel monitoring results, behavioural drift records, provider communication logs, and disclosure register update history.
The central compensating control for GPAI integration is the GPAI abstraction layer: a software component that sits between the downstream system and the GPAI model API, providing a controlled interface that enforces compliance constraints regardless of the model's behaviour. It is implemented as a dedicated service documented in AISDP Module 3 as a compliance-critical component.
The abstraction layer implements five functions. Input governance validates and logs every request sent to the GPAI model, enforcing input constraints documented in the AISDP. Output validation checks every response against the expected output schema, content safety rules, and domain-specific constraints before the response reaches the processing pipeline. Sentinel injection periodically inserts evaluation requests into production traffic, compares responses against expected baselines, and alerts on deviation.
Version tracking logs the exact model version or checkpoint used for each inference, enabling retrospective analysis when the GPAI provider changes the model. Cost and rate monitoring tracks API usage against budgeted limits, preventing runaway costs from recursive agentic loops or adversarial abuse.
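A minimal sketch of the layer's core request path, showing input governance, output validation, and version tracking; client, validate_input, and validate_output are integrator-supplied, and sentinel injection and rate monitoring would hang off the same class:

```python
import logging
from datetime import datetime, timezone

logger = logging.getLogger("gpai_abstraction_layer")

class GPAIAbstractionLayer:
    """Controlled interface between the downstream system and the model API.

    A sketch under stated assumptions: an OpenAI-style client object and
    integrator-supplied validators; not a definitive implementation.
    """

    def __init__(self, client, model_id, validate_input, validate_output):
        self.client = client
        self.model_id = model_id
        self.validate_input = validate_input
        self.validate_output = validate_output

    def complete(self, messages):
        self.validate_input(messages)                   # input governance
        response = self.client.chat.completions.create(
            model=self.model_id, messages=messages,
        )
        output = response.choices[0].message.content
        logger.info(                                    # version tracking
            "inference model=%s at=%s",
            getattr(response, "model", self.model_id),  # exact version served
            datetime.now(timezone.utc).isoformat(),
        )
        return self.validate_output(output)             # output validation
```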
Without the abstraction layer, compliance functions must be implemented within the application code and verified manually. Input and output validation are coded into request handlers and response processing logic. Sentinel evaluation runs as a scheduled batch job rather than inline with production traffic. Version tracking relies on API response headers. Cost monitoring uses the provider's billing dashboard. This procedural approach is feasible for single-system deployments with moderate traffic volumes. At scale, the abstraction layer becomes necessary for operational reliability and auditability.
Does using a GPAI model via API, without fine-tuning, avoid provider status under Article 25?
Not in practice. Article 25(1)(a) applies if the system is placed on the market under the organisation's own name. Article 25(1)(c) applies if the intended purpose differs from the GPAI model's general purpose. Most high-risk applications trigger provider status through one of these pathways regardless of whether fine-tuning occurs.
How should a downstream provider manage silent GPAI model updates?
Implement sentinel monitoring that evaluates the model against a baseline dataset on a daily schedule for high-risk systems. Pin to specific model checkpoints where the API supports it. Include contractual notification requirements with a minimum 30-day notice period for changes that could affect behaviour.
Is the GPAI abstraction layer mandatory?
Not strictly mandatory. The compliance functions can be implemented within application code and verified manually as a procedural alternative. However, this approach is only feasible for single-system deployments with moderate traffic. At scale, the abstraction layer becomes necessary for operational reliability and auditability.
What changes when the integrated GPAI model is classified as having systemic risk under Article 51?
The downstream system inherits the systemic risk characteristics. The risk assessment must address whether the deployment context amplifies or constrains the risk. The downstream provider should request access to the evaluation results and risk assessments the GPAI provider produces under Article 55, and apply compensating controls with heightened intensity where these are unavailable.
What recourse does a downstream provider have when the GPAI provider refuses disclosure?
A four-level escalation: negotiate through enterprise channels, escalate to the AI Office where the provider participates in the Article 56 Code of Practice, implement compensating controls through the downstream provider's own testing, and document residual risk in the risk register.
How can a downstream provider compensate for training data opacity?
Behavioural proxy testing with sentinel evaluation datasets covering fairness and safety dimensions, plus output distribution analysis across demographic subgroups using the downstream system's own operational data.
Does fine-tuning a GPAI model for a high-risk use case trigger provider status?
Almost always. Full fine-tuning, LoRA, QLoRA, RLHF, and distillation all trigger provider status under Article 25(1)(b). Even prompt-only systems typically trigger provider status through Article 25(1)(a) or (c).
What is the GPAI abstraction layer?
A dedicated software component between the downstream system and the GPAI model API that enforces input governance, output validation, sentinel injection, version tracking, and cost monitoring as compliance controls.
Who documents the Article 25 provider status analysis?
The Legal and Regulatory Advisor documents the provider status analysis for each GPAI-based system, including the specific Article 25 pathway, the reasoning, and the conclusion. The analysis is retained in AISDP Module 1 and reviewed whenever the system's architecture, model version, or deployment context changes.