Model selection under the EU AI Act is a compliance decision as much as an engineering one. The choice of model architecture directly determines the organisation's ability to satisfy requirements for transparency under Article 13, explainability supporting Article 14 human oversight, accuracy and robustness under Article 15, and data governance under Article 10. This pillar page covers model origin risk, copyright exposure, geopolitical considerations, the fine-tuning provider boundary, and the Model Selection Record.
Model selection under the EU AI Act acquires a compliance dimension that goes well beyond traditional performance metrics like accuracy, latency, and resource constraints. The choice of model architecture directly determines whether the organisation can satisfy requirements for transparency under Article 13, provide the explanations required for effective human oversight under Article 14, demonstrate accuracy and robustness per Article 15, and document the system to the standard required by Annex IV.
A model that achieves marginally higher accuracy at the cost of opacity may be a poor choice if the resulting system cannot produce the explanations required for oversight or the documentation required by Annex IV, which mandates describing the model architecture "in sufficient detail for a qualified technical reviewer to understand its structure and behaviour." This requirement fundamentally favours architectures that can be documented with precision.
The EU AI Act's definition of an AI system under Article 3(1) encompasses machine-based systems that infer from inputs to generate outputs such as predictions, content, recommendations, or decisions. This captures a broad range of approaches: traditional heuristic and rule-based systems, statistical models, ensemble methods, deep neural networks, foundation models, and hybrid architectures. The model selection process should evaluate the full spectrum, selecting the approach that best satisfies both the business requirements and the compliance obligations.
For traditional heuristic systems, the compliance advantage is transparency: every decision pathway is deterministic and documentable. The compliance disadvantage is that manually designed scoring models can embed designer biases without the statistical tools available to detect them. Statistical models such as logistic regression occupy a middle ground, offering transparent, interpretable parameters well-suited to domains with established regulatory expectations. Ensemble methods like gradient-boosted trees offer strong performance with SHAP values providing theoretically grounded feature attribution. Deep neural networks and foundation models achieve state-of-the-art performance on unstructured data but introduce explainability limitations and stochastic output challenges that require compensating controls.
The AISDP must document the rationale for selecting one approach over another, demonstrating that the compliance implications were assessed alongside performance. Risk assessment establishes the risk profile that shapes all subsequent model selection decisions.
Model origin risk varies materially across three categories, and the provenance of a model carries implications that the AI System Assessor must assess and document in AISDP Module 3.
Open-source models from repositories such as Hugging Face or GitHub offer accessibility and community validation but introduce several risks. Training data provenance may be unknown or poorly documented. The development process may not have included the bias testing, adversarial evaluation, or documentation records that the EU AI Act requires. Any organisation using an open-source model as a component of a high-risk system inherits these documentation gaps and must either fill them through its own due diligence or accept the resulting non-conformity risk. AISDP Module 3 must record which open-source components are incorporated, the due diligence performed on each, the licensing terms and their compatibility with the system's commercial and regulatory context, and the residual risks from provenance gaps.
Proprietary third-party models present different challenges. A commercial provider may refuse to disclose training data composition, architecture details, or fairness evaluation results, citing trade secrets. This creates an AISDP documentation gap that the organisation must address. Where vendor disclosures are insufficient, the AI System Assessor records the gaps as non-conformities, and the organisation assesses whether it can compensate through its own output-level testing and evaluation.
Models developed in-house offer the greatest control over documentation and governance. The risk centres on process discipline: whether the development team followed documented methodology, satisfied training data governance requirements, and conducted rigorous testing. Models developed informally, during hackathons or innovation sprints, may lack the documentation infrastructure the AISDP demands.
Organisations that fine-tune a general-purpose AI model for use in a high-risk system cross the provider boundary defined in Article 25(1)(b), assuming the full set of provider obligations under Article 16 in most cases.
Article 25(1)(b) provides that a deployer becomes a provider if it places on the market a high-risk AI system under its own name or trademark, or makes a substantial modification. Fine-tuning a general-purpose foundation model for a specific high-risk application, such as adapting a general LLM for credit assessment or recruitment screening, changes the model's intended purpose from "general purpose" to a specific high-risk use case. This triggers provider status with the complete set of obligations: conformity assessment, AISDP preparation, CE marking, Declaration of Conformity, and EU database registration.
The Legal and Regulatory Advisor should assess fine-tuning against three criteria. First, does it change the model's intended purpose as documented by the GPAI provider? If the provider describes the model as general-purpose text generation and the organisation fine-tunes it for medical triage, the intended purpose has changed. Second, does it alter the model's risk profile? Fine-tuning on domain-specific data may introduce new bias patterns, failure modes, or accuracy characteristics not present in the base model. Third, does it affect the model's compliance with the GPAI provider's own obligations under Articles 51 to 56? Fine-tuning may void safety evaluations or alignment testing conducted on the base model.
Training data for AI models, particularly large language models and generative systems, may include copyrighted material, and the legal landscape is evolving rapidly with litigation in multiple jurisdictions challenging the legality of training on copyrighted content without licence.
For high-risk AI systems, the AISDP must document the copyright status of training data comprehensively. This includes identifying whether the data includes copyrighted works (text, images, audio), the legal basis relied upon for processing that material (licence, consent, or the text and data mining exception under Directive (EU) 2019/790), the measures taken to identify and exclude material where rights holders have exercised opt-outs, and the procedures for responding to copyright claims.
For systems incorporating pre-trained third-party models, the organisation should obtain contractual representations regarding the copyright status of the model's training data. Where such representations are unavailable or qualified, the AI System Assessor records the risk and assesses potential regulatory and reputational impact.
The risk register should distinguish between three tiers of copyright confidence: models where training data copyright is documented and verified through licensing agreements, models where the provider asserts clean training data but cannot provide detailed provenance evidence, and models where training data composition is substantially unknown. Each tier carries different residual risk levels that the AI System Assessor quantifies in the risk assessment. For the third tier, the risk may be sufficient to disqualify the model for high-risk applications unless the organisation can demonstrate compensating controls such as output filtering for copyrighted content patterns.
Foundation models reflect the values, cultural norms, and policy perspectives embedded in their training data and alignment processes, creating compliance risks when deployed in EU contexts that the AISDP must document.
Training data geographic bias is the most common concern. Models trained predominantly on data from a particular jurisdiction may perform poorly on EU populations. A credit scoring model trained on US financial behaviour data may not generalise to European markets with different consumer protection frameworks. A natural language processing model trained on American English may handle EU-specific legal terminology, regulatory references, or cultural contexts poorly.
Alignment and safety tuning reflects the model provider's values and regulatory environment. A model aligned primarily to US free speech norms may not meet EU expectations regarding hate speech, disinformation, or content moderation. The assessment must evaluate the model's alignment against EU legal and ethical standards and document additional fine-tuning or guardrails applied to address gaps. Behavioural benchmarking provides quantified evidence: frameworks such as HELM with EU-specific thresholds, test suites derived from the Framework Decision on combating racism and xenophobia (2008/913/JHA), and EU-specific MMLU subsets.
Foreign government influence requires evaluating the model provider's ownership structure, governance, and known government relationships, particularly for sensitive domains like law enforcement, migration, or public administration. Data sovereignty risks arise when inference or training infrastructure is hosted outside the EU, exposing the organisation to foreign government access under domestic laws such as the US CLOUD Act or China's National Intelligence Law. The AISDP documents these risks.
Model selection for high-risk systems should be evaluated against six compliance criteria alongside traditional performance metrics, scored on a three-level scale (strong, adequate, weak) with evidence-based justification for each score.
Documentability asks whether the architecture and decision process can be described to Annex IV standard, such that a qualified reviewer could reproduce the training process from documentation alone. Linear models have strong documentability where every parameter is a named coefficient. Transformers with billions of parameters have weaker documentability; the architecture can be described but learned representations cannot be enumerated.
Testability asks whether accuracy, robustness, and fairness can be evaluated meaningfully per Article 15. Deterministic architectures such as decision trees and linear models simplify testing. Stochastic architectures like LLMs require statistical testing frameworks and the assessment must specify the methodology needed.
Auditability asks whether individual decisions can be reconstructed from logs per Article 12. Models requiring only the input and model version for output reconstruction are strongly auditable. Models depending on runtime conditions, session state, or RAG context require more sophisticated logging.
Bias detectability asks whether fairness metrics can be computed at subgroup level and proxy variable effects identified. Models producing calibrated probability scores are more amenable to fairness analysis than those producing only ranked outputs or categorical labels. SHAP and integrated gradients enable feature attribution analysis for proxy detection.
Agentic AI systems that take actions in the world beyond producing predictions introduce a distinct set of model selection considerations that the AISDP must address, as Article 3(1) encompasses behaviour where outputs trigger real-world actions.
Action space constraints are the primary concern. The model architecture must support constraining the action space to the set of actions that are safe and within the system's intended purpose. An agentic system for a high-risk domain must have a bounded action space that can be documented, tested, and monitored. Open-ended architectures that can theoretically take any action require extensive guardrails whose effectiveness must be demonstrated in the AISDP. In frameworks like LangChain and LangGraph, this is implemented through tool definitions with Pydantic schemas that validate every tool call before execution.
Reversibility classification adds a second layer. Each permitted action is classified as reversible (can be undone programmatically), partially reversible (can be undone with effort, such as a database rollback), or irreversible (cannot be undone, such as a sent communication or physical actuator command). Irreversible actions require a human-in-the-loop checkpoint: the agent proposes, a human reviews and approves, and only then does execution proceed. Agentic AI compliance covers bounded autonomy and tool-use governance in detail.
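The gating logic described above can be sketched as a bounded action registry with a reversibility check. This is a hedged illustration, not any framework's actual API: the action names, the `PERMITTED_ACTIONS` registry, and the `gate_action` helper are all hypothetical.

```python
from enum import Enum

class Reversibility(Enum):
    REVERSIBLE = "reversible"
    PARTIALLY_REVERSIBLE = "partially_reversible"
    IRREVERSIBLE = "irreversible"

# Hypothetical bounded action space: only actions listed here exist for the agent,
# each classified by how easily it can be undone.
PERMITTED_ACTIONS = {
    "update_draft":   Reversibility.REVERSIBLE,
    "write_database": Reversibility.PARTIALLY_REVERSIBLE,
    "send_email":     Reversibility.IRREVERSIBLE,
}

def gate_action(action: str, human_approved: bool = False) -> bool:
    """Return True if the action may execute now.

    Out-of-bounds actions raise; irreversible actions are parked
    (return False) until a human reviewer approves them.
    """
    if action not in PERMITTED_ACTIONS:
        raise PermissionError(f"Action '{action}' is outside the documented action space")
    if PERMITTED_ACTIONS[action] is Reversibility.IRREVERSIBLE and not human_approved:
        return False  # human-in-the-loop checkpoint: agent proposes, human approves
    return True
```

The key design point is that the registry is closed: an action absent from the mapping cannot be executed at all, which makes the mapping from intended purpose to permitted actions directly testable.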
Chain-of-action traceability requires the model to support logging the entire reasoning and action chain: the initial trigger, each intermediate observation and decision point, each tool call with parameters and return values, and the final outcome. This is essential for Article 12 compliance and post-incident investigation. The log is structured as JSON with a defined schema, written to append-only storage.
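A minimal trace writer along these lines could use JSON Lines, one event per line. The schema fields below are an assumed example; the append-only property here comes only from opening the file in append mode, whereas a production system would enforce immutability at the storage layer (e.g. WORM object storage).

```python
import json
import time
import uuid

def append_trace_event(log_path: str, run_id: str, event_type: str, payload: dict) -> dict:
    """Append one structured event to a JSON Lines trace file.

    event_type is one of the stages in the action chain,
    e.g. "trigger", "observation", "tool_call", "final_outcome".
    """
    event = {
        "event_id": str(uuid.uuid4()),   # unique, for cross-referencing incidents
        "run_id": run_id,                # groups all events of one agent execution
        "timestamp": time.time(),
        "event_type": event_type,
        "payload": payload,              # tool name, parameters, return value, etc.
    }
    with open(log_path, "a", encoding="utf-8") as f:
        f.write(json.dumps(event) + "\n")
    return event
```

Reconstructing a decision for an audit then reduces to filtering the file by `run_id` and replaying events in timestamp order.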
The choice of model architecture directly constrains the organisation's ability to satisfy Article 13 transparency and Article 14 human oversight requirements, and explainability properties must be assessed during selection rather than retrofitted after development.
Intrinsic explainability is provided by architectures where the decision process is transparent by construction: linear models expose coefficients, decision trees expose branching logic, rule-based systems expose rules. Post-hoc methods (SHAP, LIME, attention maps, counterfactual explanations) approximate reasoning after the fact with known limitations in fidelity and stability.
Three distinct audiences require different explanation formats. Operators need explanations enabling independent judgement; SHAP values decomposing predictions into per-feature contributions serve this well, with TreeExplainer providing exact, fast computation for tree-based models. Affected persons need plain-language explanations answering "why was this decision made"; Alibi Explain's counterfactual explanations are particularly useful, answering "what would need to be different for the outcome to change." Auditors need explanations consistent with documented system design, revealing whether the model actually behaves as described in the AISDP.
Model selection for a high-risk AI system is a compliance-significant decision that requires a defined governance process documented in the AISDP, not discretion left to individual data scientists.
The governance process involves multiple roles. The Technical SME proposes candidate architectures, evaluates them against the six compliance criteria, and recommends a selection. The AI Governance Lead reviews the recommendation against risk assessment findings. Legal counsel reviews intellectual property and GPAI provider implications. For commercially significant choices, the Business Owner is consulted. The AI Governance Lead approves the final selection. A formal objection mechanism allows any stakeholder to challenge the choice; objections are documented, discussed, and resolved with the resolution recorded.
The weighted decision matrix provides the evaluation structure. Each of the six criteria is scored for every candidate model, with weights reflecting the system's risk profile. Scoring must be evidence-based: "documentability: strong" means a qualified reviewer could reproduce training from documentation alone. Weights are documented and approved before evaluation begins.
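The matrix mechanics can be sketched as follows. The weights and candidate ratings are hypothetical examples for an employment-domain profile, not prescribed values; the point is that weights are fixed and approved before any candidate is scored.

```python
# Three-level scale mapped to numbers for aggregation.
SCALE = {"strong": 3, "adequate": 2, "weak": 1}

# Illustrative weights for an employment-domain system (hypothetical).
WEIGHTS = {
    "documentability": 0.20, "testability": 0.15, "auditability": 0.15,
    "bias_detectability": 0.25, "maintainability": 0.10, "determinism": 0.15,
}

def weighted_score(ratings: dict) -> float:
    """Combine evidence-based three-level ratings into one weighted score."""
    assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9, "weights must sum to 1"
    return sum(WEIGHTS[c] * SCALE[ratings[c]] for c in WEIGHTS)

# Hypothetical ratings for two candidates.
gbt = {"documentability": "adequate", "testability": "strong", "auditability": "strong",
       "bias_detectability": "strong", "maintainability": "strong", "determinism": "strong"}
llm = {"documentability": "weak", "testability": "adequate", "auditability": "adequate",
       "bias_detectability": "adequate", "maintainability": "adequate", "determinism": "weak"}
```

A candidate rated strong on every criterion scores 3.0; comparisons are only meaningful between candidates scored against the same approved weight set.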
Model cards following the Mitchell et al. (2019) format capture the standardised documentation output: intended use, disaggregated performance results, known limitations, training data summary, and ethical considerations. Google's Model Cards Toolkit automates generation from evaluation metrics, producing structured documents for AISDP Module 3.
The Model Selection Record must cover every learned component in the system architecture, not only the primary decision model. Many high-risk systems incorporate multiple components: embedding models for retrieval or similarity, re-ranking models, classification heads for routing, and auxiliary models for monitoring or safety. Each influences outputs and carries its own risk profile. A primary decision model warrants full evaluation against all six criteria. An embedding model in a retrieval pipeline warrants focused assessment covering provenance, linguistic performance, known biases, and version pinning. An auxiliary safety classifier warrants documentation of accuracy, failure modes, and consequences for the primary system's compliance profile.
Does a parameter-efficient adapter such as LoRA trigger provider obligations? Yes, if the adapter redirects the model toward a high-risk use case. The compliance boundary depends on whether the modification changes the model's intended purpose or risk profile, not on the volume of parameters modified. A LoRA adapter applied for high-risk recruitment screening triggers the same Article 25(1)(b) analysis as full fine-tuning.
Can open-source models be used as components of high-risk systems? Yes, but the organisation inherits all documentation gaps. Training data provenance, bias testing, and adversarial evaluation may be unknown. The AISDP must document due diligence performed, licensing compatibility, and residual risks from provenance gaps. Best practice is to download once, hash, store internally, and never re-download from the external hub.
Every learned component visible in the architecture diagram requires an entry proportionate to its influence. A primary decision model warrants full six-criteria evaluation. An embedding model warrants focused assessment of provenance, linguistic performance, biases, and version pinning. Any component without an entry is a documentation gap.
The AI System Assessor records the gaps as non-conformities in AISDP Module 3. The organisation must assess whether it can compensate through its own output-level testing and evaluation. The vendor due diligence questionnaire should capture the provider's willingness to provide Annex IV information before the model is selected.
Is the most accurate model always the right choice? No. A model achieving marginally higher accuracy at the cost of opacity may be a poor compliance choice if it cannot produce per-decision explanations for Article 14 oversight or if stochastic outputs complicate Article 12 logging. The weighted decision matrix ensures compliance criteria are evaluated alongside performance.
Under Article 25(1)(b), fine-tuning that changes the model's intended purpose or risk profile triggers full provider obligations including conformity assessment, AISDP preparation, and CE marking.
The six compliance criteria are documentability, testability, auditability, bias detectability, maintainability, and determinism, each scored on a three-level scale and weighted by the system's risk profile.
Agentic systems require bounded action spaces, reversibility classification for each permitted action, chain-of-action traceability, and infrastructure-enforced timeout and iteration limits.
The Model Selection Record must cover every learned component in the system architecture — primary decision models, embedding models, re-ranking models, and auxiliary classifiers — with evaluation rigour proportionate to each component's influence on outputs.
For models downloaded from public repositories such as Hugging Face, TensorFlow Hub, or PyTorch Hub, the provenance discipline requires additional rigour. Best practice is to download the model once, compute a cryptographic hash (SHA-256), store the model and its hash in the internal model registry, and ensure that all subsequent references use the internal copy. This prevents silent changes if the repository updates the model under the same identifier. Hugging Face's transformers library supports this through the revision parameter, which pins to a specific Git commit SHA.
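The hash-and-pin discipline can be sketched with the standard library alone. The registry lookup is assumed to exist elsewhere; `verify_against_registry` is a hypothetical helper name.

```python
import hashlib
from pathlib import Path

def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Stream a model artifact through SHA-256 without loading it into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def verify_against_registry(path: str, expected_hash: str) -> str:
    """Refuse to load an artifact whose hash differs from the registry entry.

    The expected hash is the one recorded at first download in the
    internal model registry; a mismatch means the artifact changed.
    """
    actual = sha256_of_file(path)
    if actual != expected_hash:
        raise ValueError(f"Hash mismatch for {path}: expected {expected_hash}, got {actual}")
    return actual
```

Running this check at every model load turns a silently updated upstream artifact into a hard, logged failure instead of an undetected behavioural change.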
For all model origins, provenance discipline is essential. SLSA (Supply-chain Levels for Software Artifacts) provides a framework adapted from software supply chain security. Level 2 is the minimum practical target: automated model training pipelines producing verifiable provenance metadata linking the trained model to its training code, data, hyperparameters, and execution environment. Model signing via Sigstore provides tamper evidence, and SBOM generation using SPDX or CycloneDX enables vulnerability scanning and licence compliance checking.
Where provider status is triggered, the AISDP must document the fine-tuning process with the same rigour applied to in-house development: fine-tuning data governance, methodology and hyperparameters, evaluation against performance and fairness thresholds, and a clear delineation between base model characteristics inherited from the GPAI provider and fine-tuned characteristics under the organisation's responsibility. The Model Selection Record documents the base model as a GPAI integration decision and the fine-tuning as a development decision, with separate risk assessments for each.
For parameter-efficient methods such as LoRA, QLoRA, prefix tuning, and adapters, the same analysis applies. The compliance boundary does not depend on the volume of parameters modified; it depends on whether the modification changes the intended purpose or risk profile. A LoRA adapter that redirects a general-purpose model toward a high-risk use case triggers the same Article 25(1)(b) analysis as full fine-tuning.
The information rights framework under Article 25(3) provides a structured mechanism for downstream providers to obtain the technical information they need from the GPAI model provider. Where the GPAI provider's disclosures are insufficient for the downstream provider to complete its own AISDP, the downstream provider may submit a structured information request covering training data composition, known limitations, evaluation results, and safety testing outcomes. The AI System Assessor documents each request, the response received, and any gaps that require compensating controls. The GPAI Code of Practice under Article 56 provides additional compliance pathways for GPAI providers; downstream organisations should verify whether their provider has committed to the Code and factor that into their risk assessment.
Provider data collection practices add another layer of intellectual property risk. Many cloud-based model providers collect inputs, outputs, and usage patterns from customer interactions. This data may be used to improve the provider's own models, incorporating the organisation's proprietary data and potentially personal data of affected individuals into the provider's training corpus. AISDP Module 3 must record the provider's data collection practices, the data processing agreement in place, measures to prevent personal data leakage (such as pseudonymisation before API calls), and residual risks.
For non-LLM systems, geographical bias testing focuses on whether performance varies across EU member state populations. A credit risk model influenced by US credit market data may assign systematically different risk scores to applicants whose profiles reflect European consumer behaviour. The evaluation dataset should be representative of the deployment population in each target member state, and performance metrics must be disaggregated by member state or region. Significant variations indicate geographical bias requiring additional training data, fine-tuning, or post-processing calibration.
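The disaggregation step can be sketched as a simple grouping over labelled evaluation records. The helper name and the 5-point tolerance are illustrative assumptions; real thresholds belong in the AISDP.

```python
from collections import defaultdict

def disaggregate_accuracy(records, tolerance: float = 0.05):
    """Compute accuracy per member state and flag groups deviating from overall.

    `records` is an iterable of (member_state, y_true, y_pred) tuples;
    a flagged state indicates potential geographical bias to investigate.
    """
    totals, correct = defaultdict(int), defaultdict(int)
    for state, y_true, y_pred in records:
        totals[state] += 1
        correct[state] += int(y_true == y_pred)
    overall = sum(correct.values()) / sum(totals.values())
    per_state = {s: correct[s] / totals[s] for s in totals}
    flagged = {s: acc for s, acc in per_state.items() if abs(acc - overall) > tolerance}
    return per_state, flagged
```

The same pattern generalises to any disaggregated metric (false positive rate, calibration error) by swapping the per-record correctness test.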
The geopolitical dimension extends beyond model behaviour to infrastructure governance. Under the US CLOUD Act, US authorities can compel US-based cloud providers to disclose data stored anywhere in the world. China's National Intelligence Law requires Chinese organisations to cooperate with state intelligence work. These risks may influence infrastructure decisions such as choosing EU-based cloud regions or self-hosting inference infrastructure within the EU.
The practical output is a documented position covering: the provider's jurisdiction and governance structure, alignment testing performed against EU standards, results and gaps, additional fine-tuning or guardrails applied, infrastructure hosting arrangements and data sovereignty analysis, and residual risks. This assessment must be repeated whenever the model provider releases a new version, because alignment and safety tuning can change between versions.
Maintainability asks whether the model can be retrained, fine-tuned, or recalibrated in response to post-market monitoring findings without triggering a substantial modification. Gradient-boosted trees and logistic regression produce stable, predictable changes when retrained; deep neural networks can exhibit large behavioural shifts from small data changes.
Determinism asks whether the same input produces the same output consistently. Stochastic models require temperature clamping, seed fixing for reproducibility in testing, and output logging for traceability. The performance cost of these controls must be assessed.
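Seed fixing for test reproducibility can be sketched as below. This is illustrative only: a real deployment would additionally seed numpy/torch where used and pin sampling parameters (e.g. temperature 0) at the inference API; the `reproducible_run` helper is hypothetical.

```python
import random

def reproducible_run(seed: int, inputs: list[float]) -> list[float]:
    """Run a stochastic component with a fixed seed so tests can replay it.

    Using an isolated random.Random instance avoids mutating global
    RNG state that other components may depend on.
    """
    rng = random.Random(seed)
    # Stand-in for stochastic inference: add seeded noise to each input.
    return [x + rng.gauss(0, 0.01) for x in inputs]
```

With the seed recorded alongside each test run, Article 15 evaluations can be re-executed bit-for-bit, and any divergence points to a genuine change rather than sampling noise.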
The criteria interact in practice. A model with strong determinism but weak documentability may satisfy Article 12 logging requirements but fail Annex IV documentation standards. A model with strong bias detectability but weak maintainability allows the organisation to identify fairness issues but not to remediate them without triggering a substantial modification assessment. The evaluation must consider these interactions, not just score each criterion independently.
Results predictability is a cross-cutting concern. As systems ingest new data, are retrained, or undergo algorithmic modifications, their behaviour can shift unpredictably. The AISDP must define quantitative thresholds for determining when model evolution constitutes a substantial modification under Article 3(23). Triggers might include a change in AUC-ROC exceeding a defined tolerance, any subgroup fairness metric breaching established thresholds, or changes to the model's top-five feature importance ranking. These thresholds are integrated into the CI/CD pipeline's automated quality gates documented in CI/CD pipelines.
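A CI/CD quality gate built on such thresholds might look like the sketch below. The threshold values, metric names, and the `substantial_modification_gate` helper are hypothetical; each organisation defines its own in the AISDP.

```python
def substantial_modification_gate(baseline: dict, candidate: dict,
                                  auc_tolerance: float = 0.02,
                                  fairness_floor: float = 0.80,
                                  top_k: int = 5) -> list[str]:
    """Return the list of triggered thresholds for a retrained model.

    An empty list means the candidate passes the automated gate; any
    trigger routes the change to a substantial-modification assessment.
    """
    triggers = []
    # 1. Discrimination performance drift beyond tolerance.
    if abs(candidate["auc_roc"] - baseline["auc_roc"]) > auc_tolerance:
        triggers.append("auc_roc_delta")
    # 2. Any subgroup fairness ratio below the established floor.
    for group, ratio in candidate["subgroup_parity"].items():
        if ratio < fairness_floor:
            triggers.append(f"fairness:{group}")
    # 3. Change in the top-k feature importance ranking.
    if candidate["feature_importance"][:top_k] != baseline["feature_importance"][:top_k]:
        triggers.append("top_feature_ranking")
    return triggers
```

Returning the full trigger list, rather than a boolean, gives the review board a ready-made record of exactly which thresholds the retrained model breached.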
The criteria matrix should be weighted by the system's risk profile. For employment domain systems where human oversight is paramount, explainability-adjacent criteria carry higher weight. For safety-critical systems, testability and determinism carry higher weight. Weights are documented and approved before evaluation begins to prevent post-hoc rationalisation.
The boundary between permitted and unpermitted actions requires careful calibration. Too narrow, and the agent cannot accomplish its intended purpose. Too broad, and the agent has excessive agency (OWASP LLM08: Excessive Agency), creating unnecessary risk surfaces. Boundary derivation starts from the system's documented intended purpose in AISDP Module 1; every action necessary for the intended purpose is permitted, and every action that is not is excluded. The AI System Assessor documents the mapping from intended purpose to permitted actions, and this mapping is reviewed during conformity assessment.
Failure mode complexity distinguishes agentic systems from prediction systems. An agentic system can fail by taking incorrect actions, acting at the wrong time, entering infinite loops, escalating permissions, or interacting with external systems in unintended ways. Timeout and iteration limits (typically 10 to 20 tool calls per execution) must be hard-enforced at infrastructure level, not within the agent's own code where adversarial inputs could override them.
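Infrastructure-level enforcement means the budget lives in the harness that drives the agent, not in code the agent can influence. A minimal sketch, with hypothetical names:

```python
class IterationBudgetExceeded(Exception):
    """Raised by the harness when the agent exhausts its tool-call budget."""

def run_agent_with_budget(agent_step, max_tool_calls: int = 15):
    """Drive an agent loop with a hard cap enforced outside the agent.

    `agent_step` performs one tool call and returns (done, result);
    because the loop counter lives in the harness, adversarial inputs
    to the agent cannot raise or reset the budget.
    """
    for _ in range(max_tool_calls):
        done, result = agent_step()
        if done:
            return result
    raise IterationBudgetExceeded(f"Agent exceeded {max_tool_calls} tool calls")
```

The same separation applies to wall-clock timeouts: enforce them with the orchestrator or container runtime, not with a timer the agent's own code manages.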
Fidelity testing is the quality assurance step most organisations skip. An explanation has high fidelity if it accurately reflects the model's actual reasoning. Testing involves systematically perturbing inputs according to explanation attributions and verifying the model's output changes as predicted. If removing the feature identified as most important does not meaningfully change the output, the explanation is unfaithful.
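The perturbation check can be sketched as follows. The `fidelity_check` helper, the baseline value, and the 0.05 minimum delta are illustrative assumptions; the test below uses a toy linear model so the expected sensitivity is known exactly.

```python
def fidelity_check(predict, x: dict, attributions: dict,
                   baseline: float = 0.0, min_delta: float = 0.05):
    """Ablate the feature the explanation ranks most important and verify
    the model's output actually moves by at least `min_delta`.

    Returns (top_feature, observed_delta, is_faithful). A small delta
    means the explanation overstates that feature's influence.
    """
    top_feature = max(attributions, key=lambda k: abs(attributions[k]))
    ablated = dict(x)
    ablated[top_feature] = baseline  # replace with a neutral baseline value
    delta = abs(predict(x) - predict(ablated))
    return top_feature, delta, delta >= min_delta
```

In production this runs over a sample of explained predictions, and the failure rate feeds the explanation quality metrics documented in the AISDP.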
Computational cost matters for production systems. SHAP values for complex models can require thousands of evaluations per explanation. Practical approaches include full explanations for a random sample (sufficient for monitoring and audit), lightweight top-3 feature explanations for all predictions, and pre-computed explanations for common patterns. The AISDP must document explanation coverage, method, and computational overhead.
For LLM-based systems, feature attribution methods do not apply meaningfully to generative models. Instead, explanations rely on three approaches: chain-of-thought prompting that instructs the model to articulate reasoning step by step, retrieval source citation that identifies the documents contributing to a RAG response, and structured output formats that require the model to separate its answer from its reasoning. These approaches are inherently less precise than SHAP values; the model's stated reasoning may not accurately reflect its internal computations. The AISDP must document this limitation and describe the compensating controls, including output quality monitoring and human evaluation sampling, that provide assurance about explanation quality.
Explanation quality monitoring in production tracks whether explanations remain faithful and useful over time. Fidelity testing should be run periodically on production predictions, comparing explanations against the model's actual sensitivity to feature perturbations. Explanation pattern monitoring, tracking the distribution of top features in explanations over time, provides an early warning signal for model behaviour drift. Human evaluation sampling, presenting a random sample of explanation-output pairs to domain experts for quality rating, provides ground truth against which automated quality metrics can be calibrated.
The rationale document covers: functional requirements, compliance requirements from the risk assessment, candidate architectures evaluated (including traditional approaches), evaluation methodology and results, the recommended selection with trade-offs accepted, and the governance approval record.
The governance record serves two audiences. The Technical SME reviews it for technical soundness, verifying that the evaluation methodology was appropriate and the results were correctly interpreted. The Classification Reviewer and any notified body review it for evidence that the organisation made a considered, risk-aware choice and did not simply default to the most complex available model. A well-constructed rationale demonstrates that simpler approaches were evaluated and that the selected architecture's compliance costs (in documentation, testing, and explanation infrastructure) were understood and accepted.
The selection process should also include a formal challenge mechanism. If any stakeholder, whether the Classification Reviewer, Legal and Regulatory Advisor, or Technical SME, believes the selected model does not adequately satisfy the compliance criteria, they must be able to raise a formal objection. The objection is documented, discussed, and resolved; the resolution, including any changes to the selection or additional compensating controls, is recorded in the rationale document. This governance trail is part of the AISDP evidence pack and must be retrievable if a competent authority asks why a particular architecture was chosen.
Any model component visible in the architecture diagram that lacks a corresponding entry in the Model Selection Record is a documentation gap that the conformity assessment process will identify.