Annex IV requires detailed documentation of the system's architecture, design, and computational infrastructure. This section structures that documentation around an eight-layer reference architecture spanning data ingestion through monitoring, with per-layer controls against intent and outcome drift.
AISDP Module 3 requires a description of the system's architecture, model type, algorithmic approach, key design choices, inputs and outputs, and the human-machine interaction design. The Technical SME describes the architecture at a level of detail sufficient for a qualified technical reviewer to understand the system's structure and behaviour. If a competent external reviewer cannot reconstruct the system's design rationale and operational behaviour from the documentation, the documentation is insufficient.
Before any architectural work begins, the Business Owner grounds the development in a clear articulation of business intent, ethical commitment, and transparency principles. The statement of business intent must be precise: "to assist human recruiters in screening high-volume applications by ranking candidates against role-specific competency profiles" is adequate, whereas "to improve recruitment efficiency" is too vague to constrain design decisions. The ethical framework must translate principles into concrete design constraints: "the system must achieve a selection rate ratio of at least 0.90 across all measured protected characteristic subgroups" rather than "the system must not discriminate."
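A constraint stated this concretely can be encoded as an automated check. The sketch below is a minimal illustration of the 0.90 selection rate ratio from the example above; the function name and data shapes are hypothetical, not from any specific compliance toolkit.

```python
from collections import Counter

def selection_rate_ratio(decisions, groups):
    """Minimum ratio of subgroup selection rates (1.0 = perfectly equal).

    decisions: parallel list of booleans, True meaning "selected"
    groups: parallel list of protected-characteristic subgroup labels
    """
    selected, total = Counter(), Counter()
    for d, g in zip(decisions, groups):
        total[g] += 1
        selected[g] += int(d)
    rates = {g: selected[g] / total[g] for g in total}
    return min(rates.values()) / max(rates.values()), rates

# Illustrative data: group A selected 3/4, group B selected 1/4
ratio, rates = selection_rate_ratio(
    [True, False, True, True, False, False, True, False],
    ["A", "A", "A", "A", "B", "B", "B", "B"],
)
meets_constraint = ratio >= 0.90  # the design constraint from the text
```

Expressed this way, the ethical commitment becomes a testable gate rather than a policy statement.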
The architectural documentation should include multiple diagram types at different abstraction levels. System Context diagrams (C4 Level 1) establish the system boundary and external connections. Container diagrams (C4 Level 2) show major technical building blocks with technology choices. Component diagrams (C4 Level 3) show internal structure within complex containers. Data Flow diagrams trace the path of data from ingestion to output, essential for Article 12 traceability. Deployment diagrams show the physical or cloud infrastructure. Sequence diagrams illustrate critical interaction patterns including the human oversight workflow.
Interface specifications complete the documentation: API contracts via OpenAPI or Protocol Buffers, data contracts with schemas and value range expectations, and human interface specifications showing the information presented to operators and the workflow enforced. The Technical SME versions all diagrams alongside code and model artefacts to prevent documentation drift.
A high-risk AI system designed for EU AI Act compliance is structured as a layered architecture where each layer provides specific protections against intent drift, where the system's behaviour diverges from its stated purpose, and outcome drift, where the system's outputs change over time affecting fairness, accuracy, or safety. The eight layers are: data ingestion, feature engineering, model inference, post-processing, explainability, human oversight interface, logging and audit, and monitoring.
Each layer implements controls in two categories. Controls against intent drift prevent the system from deviating from its documented intended purpose through technical enforcement rather than policy instruction. Controls against outcome drift detect and alert when the system's behaviour is changing in ways that may affect compliance, enabling timely intervention. The layered approach ensures that failures at one layer are caught by controls at subsequent layers, creating defence in depth.
The architecture feeds into AISDP Module 3 (Architecture and Design) and Module 7 (Human Oversight). Architecture decisions made at design time also have implications for the system's eventual decommissioning; systems designed with clear infrastructure-as-code definitions, isolated credential namespaces, and modular data storage are substantially easier to decommission in a controlled and auditable manner.
The data ingestion layer receives, validates, and normalises input data from deployer systems. It is the system's first contact with the outside world and the point at which malformed, adversarial, or out-of-distribution data is intercepted. Controls against intent drift include schema validation rejecting non-conforming records with logged errors rather than silent coercion, input range enforcement checking numerical features against training data distributions, prohibited feature blocking as a hard technical control excluding features identified as proxies for protected characteristics, and data minimisation stripping personal data not required for the intended purpose.
Controls against outcome drift include distribution monitoring computing real-time summary statistics and comparing them against the training baseline, and comprehensive logging recording every data record with timestamp, source identifier, validation result, and content hash for Article 12 traceability.
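The ingestion controls above can be sketched in a few lines. Everything in this sketch is illustrative: the declarative `SCHEMA`, the `PROHIBITED` feature list, and the field names are assumptions, not part of the source.

```python
import hashlib
import json

# Hypothetical schema: field name -> (expected type, allowed range from training data)
SCHEMA = {"age": (int, (16, 80)), "years_experience": (float, (0.0, 60.0))}
PROHIBITED = {"postcode"}  # illustrative: flagged as a proxy for protected characteristics

def validate_record(record):
    """Validate one inbound record: reject with logged errors, never silently coerce.

    Returns (ok, errors, content_hash); the hash supports Article 12 traceability.
    """
    errors = []
    for field in PROHIBITED & record.keys():
        errors.append(f"prohibited feature present: {field}")
    for name, (ftype, bounds) in SCHEMA.items():
        if name not in record:
            errors.append(f"missing field: {name}")
            continue
        value = record[name]
        if not isinstance(value, ftype):
            errors.append(f"wrong type for {name}: {type(value).__name__}")
        elif not (bounds[0] <= value <= bounds[1]):
            errors.append(f"{name}={value} outside training range {bounds}")
    content_hash = hashlib.sha256(
        json.dumps(record, sort_keys=True).encode()
    ).hexdigest()
    return (not errors, errors, content_hash)
```

Rejection with an explicit error list, rather than coercion, is what makes the control auditable.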
The feature engineering layer transforms raw data into feature vectors consumed by the model. This is where proxy variable risks materialise and where undocumented transformations can introduce hidden bias. A central feature registry records each feature's name, source, transformation logic, expected distribution, business justification, and proxy variable risk assessment. Feature parity enforcement ensures features computed for inference are identical to those used during training, preventing training-serving skew. The proxy variable audit computes each feature's correlation with protected characteristics; features exceeding defined thresholds require documented justification from the Technical SME and AI Governance Lead.
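The proxy variable audit reduces to a correlation scan over the feature set. The sketch below uses Pearson correlation against a single binary protected attribute; the 0.3 threshold and feature names are hypothetical illustrations, and a production audit would use measures suited to categorical data as well.

```python
def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5

def proxy_audit(features, protected, threshold=0.3):
    """Flag features whose |correlation| with the protected attribute exceeds
    the threshold; flagged features require documented justification."""
    flagged = {}
    for name, values in features.items():
        r = pearson_r(values, protected)
        if abs(r) > threshold:
            flagged[name] = round(r, 3)
    return flagged
```

Flagged features are not automatically removed; per the text, they require documented justification from the Technical SME and AI Governance Lead.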
The model inference layer executes the trained model against the feature vector and produces a raw output. Model version pinning ensures the inference service serves a specific, immutable model version from the registry, with updates requiring a deployment event with human approval. Confidence thresholding routes predictions below a defined threshold to human review before being acted upon, preventing the system from acting on uncertain predictions. Output constraint enforcement prevents pathological model behaviour from propagating downstream by enforcing hard constraints on output ranges and classification sets.
Per-prediction feature attribution using SHAP or LIME records feature contributions for each prediction, supporting explainability requirements. Prediction distribution monitoring tracks output distributions in real time to detect shifts indicating evolving model behaviour.
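Confidence thresholding and output constraint enforcement can be sketched as a post-inference check. `PINNED_MODEL_VERSION`, `ALLOWED_LABELS`, and the 0.85 threshold below are illustrative assumptions.

```python
from dataclasses import dataclass

PINNED_MODEL_VERSION = "screening-model:2.4.1"  # hypothetical registry tag
ALLOWED_LABELS = {"approve", "review", "reject"}  # hypothetical output set
CONFIDENCE_THRESHOLD = 0.85  # illustrative; set per risk assessment

@dataclass
class InferenceResult:
    label: str
    confidence: float
    model_version: str
    route: str  # "auto" or "human_review"

def postcheck(label, confidence):
    """Enforce hard output constraints, then route low-confidence predictions
    to human review instead of acting on them."""
    if label not in ALLOWED_LABELS:
        raise ValueError(f"model emitted out-of-set label: {label!r}")
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence {confidence} outside [0, 1]")
    route = "auto" if confidence >= CONFIDENCE_THRESHOLD else "human_review"
    return InferenceResult(label, confidence, PINNED_MODEL_VERSION, route)
```

Raising on an out-of-set label, rather than clamping it, stops pathological outputs from propagating downstream silently.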
The post-processing layer applies thresholds, calibrations, business rules, and output formatting, shaping raw outputs into actionable results. Every business rule is documented in the AISDP with its rationale, effect on raw output, and interaction with fairness mitigations. Business rules can override model outputs in ways that affect fairness; the Technical SME assesses each rule for fairness impact. Calibration validation confirms that fairness adjustments achieve the intended improvement without unintended side effects. Override logging records every instance where a rule changes the model's raw output.
Threshold stability monitoring tracks the proportion of inputs crossing decision thresholds over time. Changes in crossing rates indicate score distribution shifts that may require threshold recalibration. Fairness metrics computed during development are periodically recomputed on production data passing through the post-processing layer, catching drift that affects final outputs rather than just raw predictions.
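Threshold stability monitoring amounts to comparing crossing rates between a baseline window and a recent window. The sketch below uses a hypothetical 5-percentage-point tolerance; a real deployment would set the tolerance from the risk assessment.

```python
def crossing_rate(scores, threshold):
    """Proportion of scores at or above the decision threshold."""
    return sum(s >= threshold for s in scores) / len(scores)

def threshold_stability_alert(baseline_scores, window_scores, threshold,
                              tolerance=0.05):
    """Alert when the share of inputs crossing the decision threshold shifts
    by more than `tolerance` from the baseline rate, which may indicate a
    score distribution shift requiring threshold recalibration."""
    base = crossing_rate(baseline_scores, threshold)
    cur = crossing_rate(window_scores, threshold)
    return abs(cur - base) > tolerance, base, cur
```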
The explainability layer generates human-readable explanations of individual predictions, supporting the Article 14 human oversight requirement by providing the information that oversight operators need to evaluate outputs. Explanation fidelity validation ensures explanations reflect the model's actual behaviour; an explanation attributing a decision to Feature A when the model relied on Feature B is worse than no explanation because it misleads the human overseer. Fidelity is tested by comparing feature attributions against the model's sensitivity to feature perturbations.
Explanations must be audience-appropriate. Technical operators receive precise feature contributions and confidence indicators. Affected persons receive plain-language explanations focusing on factors most relevant to their situation. Explanation consistency monitoring detects when the dominant explanation for a particular decision type changes without a corresponding model update, indicating potential instability.
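The perturbation-based fidelity test described above can be sketched as a rank-agreement check: the features with the largest attributions should also be the ones the model is most sensitive to. This is a crude illustration, not the SHAP/LIME fidelity methodology itself, and all names are hypothetical.

```python
def perturbation_sensitivity(predict, x, feature, delta=1.0):
    """Change in model output when a single feature is nudged by delta."""
    x_pert = dict(x)
    x_pert[feature] = x_pert[feature] + delta
    return predict(x_pert) - predict(x)

def fidelity_check(predict, x, attributions, top_k=2):
    """Pass if the top-k features by |attribution| match the top-k features
    by perturbation sensitivity; a mismatch suggests the explanation does
    not reflect the model's actual behaviour."""
    sens = {f: abs(perturbation_sensitivity(predict, x, f)) for f in attributions}
    top_attr = sorted(attributions, key=lambda f: abs(attributions[f]),
                      reverse=True)[:top_k]
    top_sens = sorted(sens, key=sens.get, reverse=True)[:top_k]
    return set(top_attr) == set(top_sens)
```

Run periodically on sampled production predictions, a check like this catches the failure mode the text warns about: an explanation attributing a decision to the wrong feature.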
The human oversight interface is the component through which operators review, accept, override, or reject outputs. A mandatory review workflow enforces a review step before any output is acted upon; auto-acceptance configurations are technically prevented for high-risk systems. Automation bias countermeasures include presenting underlying data before revealing the system's recommendation, requiring minimum review duration, displaying confidence indicators prominently, and periodically presenting calibration cases with known outcomes.
Override capability is mandatory, with every override logged with operator identity, original recommendation, override decision, and stated rationale. Override rate monitoring at aggregate, per-deployer, and per-operator levels tracks system health. Review time monitoring uses average review time as a proxy for thoroughness; operators consistently reviewing cases in under 60 seconds are unlikely to be performing meaningful oversight. Interface version control ensures that interface changes are tracked alongside model changes.
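The override logging and review time controls can be sketched together. The event schema, field names, and the `MIN_REVIEW_SECONDS` value below are hypothetical illustrations of the controls described above.

```python
import time

MIN_REVIEW_SECONDS = 60  # illustrative policy value from the text's heuristic

def record_review(log, operator_id, recommendation, decision, rationale,
                  review_seconds):
    """Append one oversight event; overrides require a stated rationale, and
    suspiciously fast reviews are flagged for quality follow-up."""
    event = {
        "ts": time.time(),
        "operator": operator_id,
        "recommendation": recommendation,
        "decision": decision,
        "override": decision != recommendation,
        "rationale": rationale,
        "review_seconds": review_seconds,
        "fast_review_flag": review_seconds < MIN_REVIEW_SECONDS,
    }
    if event["override"] and not rationale:
        raise ValueError("override requires a stated rationale")
    log.append(event)
    return event
```

Rejecting an unexplained override at write time makes the rationale requirement a technical control rather than a policy instruction.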
The logging and audit layer captures a comprehensive record of system operation supporting Article 12's automatic recording requirements. Immutable logging in append-only format with cryptographic hash chains ensures tamper evidence; no system component, user, or administrator can modify historical entries. Comprehensive event coverage captures every material event: data ingestion, feature computation, model inference, post-processing, explanation generation, operator actions, configuration changes, deployment events, and monitoring alerts.
Log-based drift detection feeds aggregated data to the monitoring layer's algorithms. Changes in inference patterns, error rates, or operator behaviour provide early warning of outcome drift. A regulatory export capability supports export in formats suitable for inspection within competent authority response timelines.
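The cryptographic hash chain behind the immutable log can be sketched in a few lines. This illustrates tamper evidence only; the full control also needs append-only storage, access controls, and retention management.

```python
import hashlib
import json

class HashChainedLog:
    """Append-only event log where each entry commits to its predecessor's
    hash, so any modification of a historical entry breaks verification."""

    def __init__(self):
        self._entries = []
        self._last_hash = "0" * 64  # genesis value

    def append(self, event: dict) -> str:
        payload = json.dumps({"prev": self._last_hash, "event": event},
                             sort_keys=True)
        digest = hashlib.sha256(payload.encode()).hexdigest()
        self._entries.append(
            {"prev": self._last_hash, "event": event, "hash": digest}
        )
        self._last_hash = digest
        return digest

    def verify(self) -> bool:
        """Recompute the chain from genesis; False means tampering."""
        prev = "0" * 64
        for entry in self._entries:
            payload = json.dumps({"prev": prev, "event": entry["event"]},
                                 sort_keys=True)
            expected = hashlib.sha256(payload.encode()).hexdigest()
            if entry["prev"] != prev or entry["hash"] != expected:
                return False
            prev = entry["hash"]
        return True
```

Because each hash covers the previous hash, an auditor can detect alteration of any historical entry without trusting the system operator.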
The monitoring layer continuously observes operational behaviour across performance, fairness, data quality, and anomalous patterns. Intent alignment dashboards display the system's current behaviour relative to documented intended purpose with clear indication of whether the system is within specification. Anomaly detection identifies unusual patterns in inputs, outputs, or operational metrics, triggering alerts and, above defined severity thresholds, automatic escalation.
Multi-dimensional drift monitoring tracks drift across input feature distributions, output score distributions, fairness metrics, error rates, and operator override rates simultaneously. Single-dimension monitoring may miss drift that manifests across multiple dimensions without crossing any individual threshold. Feedback loop detection includes specific checks for effects where the system's outputs influence data subsequently used to evaluate or retrain the system, requiring comparison of training and production distributions while controlling for the system's own influence. The monitoring outputs feed into the post-market monitoring framework.
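One common way to implement per-dimension drift scoring is the Population Stability Index (PSI); the source does not prescribe a metric, so the sketch below is an assumption, and the 0.2 alert threshold is a widely used rule of thumb rather than a regulatory value.

```python
import math

def psi(expected, actual, bins=10):
    """Population Stability Index between a baseline sample and a recent
    window of the same metric; larger values indicate stronger drift."""
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0

    def hist(xs):
        counts = [0] * bins
        for x in xs:
            idx = min(int((x - lo) / width), bins - 1)
            counts[idx] += 1
        # small floor avoids log(0) on empty bins
        return [max(c / len(xs), 1e-4) for c in counts]

    e, a = hist(expected), hist(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

def multi_dimensional_drift(baselines, windows, threshold=0.2):
    """Score every monitored dimension and flag those exceeding the threshold."""
    scores = {dim: psi(baselines[dim], windows[dim]) for dim in baselines}
    return {dim: round(s, 3) for dim, s in scores.items() if s > threshold}
```

Running the same scoring over scores, fairness metrics, error rates, and override rates gives the simultaneous multi-dimensional view the text calls for.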
Article 15 and Annex IV require documentation of the hardware and software environment. For cloud-hosted systems, the AISDP must specify the cloud provider, deployment region within the EU/EEA for high-risk systems processing personal data, specific services used across compute, databases, orchestration, model serving, and logging, and instance types with resource allocations including GPU/TPU specifications.
Cloud provider data processing agreements must be in place and referenced. Many organisations use managed AI services such as AWS SageMaker, Google Vertex AI, or Azure Machine Learning; the AISDP documents which services are used, what data flows through them, the provider's data handling practices, availability SLAs, and fallback strategies. For third-party model APIs, the documentation additionally covers the provider's model versioning policy, data retention practices, latency characteristics, and contractual commitments regarding behaviour stability.
Containerisation with Docker and orchestration with Kubernetes provide reproducible, versioned deployment environments. Module 3 captures the container image build process, private registry with access controls and image signing, orchestration configuration, and resource limits. Each container image is immutable, tagged with corresponding code and model versions, and stored in a private registry with access logging.
Disaster recovery and business continuity planning defines the recovery point objective and recovery time objective for the system. For high-risk AI systems, a model serving failure that forces deployers to make decisions without AI support may itself be a compliance concern. The disaster recovery plan covers model artefact backup and restoration, data pipeline failover, monitoring continuity during recovery, and communication to deployers during outages. Recovery procedures are tested at least annually with results documented in the AISDP.
Edge and on-premises deployments create compliance challenges distinct from cloud hosting. The system runs on infrastructure the provider does not control, making monitoring more difficult, updates harder to enforce, and incident response slower. The AISDP documents the edge deployment model including hardware specifications and minimum requirements, the model update mechanism covering how new model versions are delivered, validated, and activated, the monitoring approach defining what data is collected locally and what is transmitted to the provider, and the rollback procedure specifying how a faulty update is reversed.
For systems deployed across multiple EU member states, data sovereignty requirements add complexity. The AISDP documents the data residency policy for each data category, the mechanism ensuring personal data is processed within the declared region, the regulatory mapping showing which national competent authority has jurisdiction over which deployment, and the approach to language and localisation requirements. Some member states may impose additional requirements through national implementation measures; the Legal and Regulatory Advisor monitors the regulatory landscape in each deployment jurisdiction.
Multi-region architectures must ensure that monitoring data from all regions feeds into a unified post-market monitoring (PMM) framework. A system that is compliant in one region but drifting in another must be detected through cross-region analysis. Data transfer between regions for monitoring purposes must comply with GDPR transfer rules, which typically requires an adequacy decision, standard contractual clauses, or processing within the EEA.
For organisations at earlier maturity levels, infrastructure can be documented with standard diagrams and spreadsheets. A cloud resource inventory spreadsheet listing every service, its region, and its purpose provides the foundation. Architecture diagrams maintained in standard drawing tools and reviewed quarterly against the deployed environment prevent drift. The key principle is that documented infrastructure must match deployed infrastructure; any discrepancy is a non-conformity.
Feature stores such as Feast, Tecton, or Hopsworks centralise feature definitions with a single computation specification used for both training and serving, preventing the pernicious failure mode where separate training and serving pipelines compute features differently. Feature distribution monitoring runs continuously on production data, computing drift metrics per feature and alerting when thresholds are crossed.
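Feature parity enforcement can be sketched as a recomputation check: apply the single registered computation specification to raw data at serving time and compare against the values stored from training. The feature, row schema, and tolerance below are hypothetical.

```python
def feature_parity_check(compute_feature, stored_rows, tolerance=1e-9):
    """Recompute each feature with the serving-time specification and compare
    against the training-time value; any mismatch is training-serving skew."""
    mismatches = []
    for row in stored_rows:
        served = compute_feature(row["raw"])
        if abs(served - row["training_feature"]) > tolerance:
            mismatches.append(row["id"])
    return mismatches

# Illustrative single computation specification: salary in thousands
compute_salary_k = lambda raw: raw["salary"] / 1000

rows = [
    {"id": "r1", "raw": {"salary": 50000}, "training_feature": 50.0},
    {"id": "r2", "raw": {"salary": 60000}, "training_feature": 61.0},  # skewed
]
mismatches = feature_parity_check(compute_salary_k, rows)
```

A check like this, run on a sample of training records against the serving pipeline, turns "features must match exactly" into a verifiable test rather than an assumption.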