Article 11 and Annex IV of the EU AI Act require providers of high-risk AI systems to document key design choices including model selection. This page explains the governance process, decision criteria, and documentation requirements that transform model selection from an informal technical decision into an auditable compliance activity.
Model selection for a high-risk AI system is a compliance-significant decision that must follow a defined governance process. It should not be left to the discretion of individual data scientists. The AISDP prescribes a process where the Technical SME proposes a candidate, the team evaluates it against documented criteria, and the results are reviewed by the AI Governance Lead before the selection decision is formally recorded with supporting analysis.
This governance record forms part of the AISDP evidence pack. It should be retrievable if a notified body or competent authority asks why a particular model architecture was chosen. Model Selection and Validation covers the broader framework within which this governance sits.
The Technical SME proposes candidate architectures, evaluates them against compliance criteria, and recommends a selection. The AI Governance Lead then reviews this recommendation against both the compliance criteria and the risk assessment findings. Legal counsel reviews the intellectual property and GPAI provider implications separately.
For systems where the model choice has significant commercial implications, such as selecting a costly proprietary model over a less expensive open-source alternative, the Business Owner should also be consulted. The AI Governance Lead holds final approval authority for the selection decision.
The governance process must include a mechanism for challenging the model choice. If the Classification Reviewer, Legal and Regulatory Advisor, or any other stakeholder believes the selected model does not adequately satisfy the compliance criteria, they should be able to raise a formal objection.
The objection is documented, discussed, and resolved through the governance process. How the objection was resolved, including any changes to the choice or additional compensating controls, is recorded in the model selection rationale document. This challenge mechanism ensures that compliance concerns are surfaced and addressed before a selection becomes final.
The weighted decision matrix brings structure to the trade-off between performance and compliance. Model selection in a compliance-regulated context differs fundamentally from selection in a purely performance-driven context: the best-performing model is not necessarily the most compliant model.
Six criteria form the basis of the matrix: documentability, testability, auditability, bias detectability, maintainability, and determinism. Each criterion is scored on a three-level scale (strong, adequate, weak) for every candidate model. The scoring must be evidence-based, not impressionistic. "Documentability: strong" means a qualified reviewer could reproduce the training process from the documentation alone.
A deep neural network that achieves marginally higher accuracy than a gradient-boosted ensemble may be a poor compliance choice if it cannot produce the per-decision explanations required for Article 14 human oversight, or if its stochastic outputs complicate Article 12 logging requirements.
The matrix should be weighted according to the system's risk profile. For a high-risk AI system in the employment domain, where Article 14 human oversight is paramount and affected persons have a right to explanation under Article 86, explainability-adjacent criteria (documentability, auditability, bias detectability) should carry higher weight. For a safety-critical system where Article 15 robustness is paramount, testability and determinism should carry higher weight. The weights must be documented and approved by the AI Governance Lead before the evaluation begins, to avoid post-hoc rationalisation of a preferred choice. A companion page explores the explainability requirements that influence this weighting.
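As a sketch, the weighted matrix can be reduced to a simple scoring function. The weights, candidate names, and ratings below are hypothetical, chosen to illustrate an employment-domain profile where explainability-adjacent criteria carry more weight:

```python
# Map the three-level scale to numeric scores.
SCORE = {"strong": 3, "adequate": 2, "weak": 1}

# Hypothetical weights for an employment-domain system; weights must be
# approved by the AI Governance Lead before evaluation begins.
CRITERIA_WEIGHTS = {
    "documentability": 0.25,
    "auditability": 0.20,
    "bias_detectability": 0.20,
    "testability": 0.15,
    "maintainability": 0.10,
    "determinism": 0.10,
}

def weighted_score(ratings: dict) -> float:
    """Combine strong/adequate/weak ratings into one weighted score."""
    return sum(CRITERIA_WEIGHTS[c] * SCORE[r] for c, r in ratings.items())

# Illustrative evidence-based ratings for two hypothetical candidates.
candidates = {
    "gradient_boosted_ensemble": {
        "documentability": "strong", "auditability": "strong",
        "bias_detectability": "strong", "testability": "strong",
        "maintainability": "adequate", "determinism": "strong",
    },
    "deep_neural_network": {
        "documentability": "adequate", "auditability": "weak",
        "bias_detectability": "adequate", "testability": "adequate",
        "maintainability": "adequate", "determinism": "weak",
    },
}

# Rank candidates by weighted compliance score, highest first.
ranked = sorted(candidates, key=lambda c: weighted_score(candidates[c]),
                reverse=True)
```

With this weighting, the ensemble outscores the marginally more accurate network, which is exactly the trade-off the matrix is designed to surface and record.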
Model cards are the standardised documentation format for capturing model selection outputs as compliance evidence. A model card, as defined by Mitchell et al. (2019), records the model's intended use, its performance evaluation results including disaggregated metrics across subgroups, its known limitations, its training data summary, and its ethical considerations.
Google's Model Cards Toolkit automates the generation of model cards from evaluation metrics, producing a structured HTML or Markdown document that can be stored as a Module 3 evidence artefact. Hugging Face's Model Card metadata schema extends this further with machine-readable fields that can be queried programmatically. Both formats support the evidence requirements of the AISDP.
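For illustration, a minimal generator in the spirit of these formats might look like the following. The helper function and field names are simplified assumptions for this sketch, not the actual Model Cards Toolkit or Hugging Face API:

```python
def render_model_card(meta: dict, sections: dict) -> str:
    """Emit a Markdown model card with YAML-style front matter
    (machine-readable fields up top, narrative sections below)."""
    front_matter = "\n".join(f"{k}: {v}" for k, v in meta.items())
    body = "\n\n".join(f"## {title}\n\n{text}"
                       for title, text in sections.items())
    return f"---\n{front_matter}\n---\n\n{body}\n"

# Hypothetical card for an employment-screening model.
card = render_model_card(
    meta={"license": "apache-2.0", "language": "en"},
    sections={
        "Intended use": "Screening support; final decisions remain "
                        "with a human reviewer.",
        "Limitations": "Not evaluated on applicants under 18.",
        "Evaluation": "Accuracy disaggregated by gender and age band.",
    },
)
```

The rendered document can then be versioned alongside the evaluation metrics it summarises and filed as a Module 3 evidence artefact.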
AISDP Module 3 requires describing "the key design choices and their rationale," making the model selection rationale document a core component. The document should cover the following areas: functional requirements, compliance criteria, candidate evaluations, evaluation methodology, comparison results, the recommended selection with its trade-offs, and governance approval records.
This documentation serves two audiences. The Technical SME reviews it for technical soundness. The Classification Reviewer and any notified body review it for evidence that the organisation made a considered, risk-aware choice and did not simply default to the most complex available model.
The selection rationale must cover every learned component in the system architecture, not only the primary decision-making model. Many high-risk AI systems incorporate multiple model components: embedding models used for retrieval or similarity matching, re-ranking models that refine search results, classification heads for routing or filtering, and auxiliary models used for monitoring or safety evaluation. Each component influences the system's outputs and carries its own risk profile.
A primary decision model warrants a full evaluation against all six compliance criteria. An embedding model used in a RAG pipeline warrants a focused assessment covering provenance, linguistic performance across the deployment languages, known biases in the representation space, and the version pinning strategy. An auxiliary safety classifier warrants documentation of its accuracy, its failure modes, and the consequences of those failures for the primary system's compliance profile.
The AI System Assessor verifies that the Model Selection Record is complete with respect to the system's architecture diagram. Any model component visible in the architecture that lacks a corresponding entry in the Model Selection Record is a documentation gap. Documentation and Evidence Management provides further guidance on structuring compliance documentation across multiple components.
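The completeness check the AI System Assessor performs reduces to a set difference between the architecture inventory and the record's entries. A minimal sketch, with hypothetical component names:

```python
# Learned components visible in the system architecture diagram.
architecture_components = {
    "primary_classifier",
    "embedding_model",
    "reranker",
    "safety_filter",
}

# Components that have an entry in the Model Selection Record.
model_selection_record = {
    "primary_classifier",
    "embedding_model",
    "reranker",
}

# Any component in the architecture but absent from the record
# is a documentation gap that must be closed before assessment.
documentation_gaps = architecture_components - model_selection_record
```

Here the auxiliary safety filter would be flagged as undocumented, even though it is not the primary decision-making model.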
No. The best-performing model is not necessarily the most compliant. A deep neural network with marginally higher accuracy may be a poor choice if it cannot produce per-decision explanations required for human oversight or if stochastic outputs complicate logging requirements.
The governance process must include a formal challenge mechanism. Any stakeholder can raise a documented objection, which is discussed and resolved. The resolution, including any changes or additional compensating controls, is recorded in the rationale document.
Not the same depth, but it does need documentation. An embedding model warrants a focused assessment covering provenance, linguistic performance, known biases, and version pinning strategy, proportionate to its influence on outputs.
A structured scoring tool evaluating candidate models against six compliance criteria (documentability, testability, auditability, bias detectability, maintainability, determinism) weighted by risk profile.
Functional requirements, compliance criteria, candidate evaluations, methodology, comparison results, recommended selection with trade-offs, and governance approval records.
Every learned component in the system architecture, including embedding models, re-ranking models, classification heads, and auxiliary safety classifiers, proportionate to influence on outputs.
Model cards capture intended use, disaggregated performance metrics, limitations, training data summary, and ethical considerations in a standardised format suitable for Module 3 evidence.