The EU AI Act requires high-risk AI systems to implement systematic controls against both intent drift and outcome drift. The eight-layer reference architecture structures these controls so that each layer can be independently validated, monitored, and audited, creating a defence-in-depth approach to compliance.
A high-risk AI system designed for EU AI Act compliance is structured as a layered architecture where each layer provides specific compensating protections against two categories of failure: intent drift and outcome drift. Intent drift occurs when the system's behaviour diverges from its stated business intent. Outcome drift occurs when the system's outputs change over time in ways that affect fairness, accuracy, or safety.
The eight-layer reference architecture separates concerns so that each layer can be independently validated, monitored, and audited. Data enters through the ingestion layer, passes through feature engineering and model inference, is shaped by post-processing, explained by the explainability layer, reviewed through the human oversight interface, recorded by the logging layer, and continuously observed by the monitoring layer. Each layer implements controls against both intent drift and outcome drift, creating a defence-in-depth structure where a failure at one layer is caught by controls at adjacent layers.
This layered approach supports the System Architecture and Technical Documentation requirements by making the system's internal structure transparent and auditable. It also provides the evidentiary foundation for conformity assessment: each layer generates the records needed to demonstrate compliance with specific articles of the regulation.
The data ingestion layer is the system's first contact with external data, and the point at which malformed, adversarial, or out-of-distribution data is intercepted. Every data source has a defined contract consisting of a schema (field names, types, formats), a quality specification (acceptable missing-value rates, value ranges, distributional properties), and a freshness requirement (maximum acceptable age of records). The ingestion pipeline enforces these contracts before data enters the system.
Schema validation ensures every incoming data record conforms to expected types, formats, and value ranges. Records that do not conform are rejected with a logged error, not silently coerced. Input range enforcement checks numerical features against expected ranges derived from the training data distribution, flagging values outside the range as potential anomalies. This prevents the system from generating outputs based on inputs outside the domain for which it was validated.
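A minimal sketch of this contract enforcement, assuming a simple per-field contract of type and range; the field names and bounds are illustrative, not taken from any real source system:

```python
from dataclasses import dataclass

@dataclass
class FieldContract:
    """Per-field contract: expected type plus an allowed numeric range."""
    dtype: type
    min_value: float = float("-inf")
    max_value: float = float("inf")

# Hypothetical contract for one data source; names and bounds are illustrative.
CONTRACT = {
    "age": FieldContract(int, 18, 120),
    "income": FieldContract(float, 0.0, 1e7),
}

def validate_record(record: dict) -> list[str]:
    """Return a list of violations; an empty list means the record conforms.
    Non-conforming records are rejected and logged, never silently coerced."""
    errors = []
    for name, contract in CONTRACT.items():
        if name not in record:
            errors.append(f"missing field: {name}")
            continue
        value = record[name]
        if not isinstance(value, contract.dtype):
            errors.append(f"{name}: expected {contract.dtype.__name__}")
        elif not (contract.min_value <= value <= contract.max_value):
            errors.append(f"{name}: {value} outside validated range")
    return errors
```

A record that fails validation carries its violation list into the quarantine log, giving the investigation trail described above a structured starting point.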
Prohibited feature blocking enforces the exclusion of features the organisation has determined should not be used, such as photographs, home addresses, or other features identified as proxies for protected characteristics. This is a hard technical control, not a policy instruction. Data minimisation strips personal data not required for the system's intended purpose at the ingestion layer, supporting GDPR data minimisation requirements and reducing the attack surface for data breaches.
The practical risk this addresses is intent drift at the data source. Upstream systems change: a CRM vendor modifies a field's enumeration values, a data provider changes date encoding, or a partner organisation alters how it computes a derived field. Without boundary validation, these changes enter the training pipeline and alter the model's learning signal. With boundary validation, they are caught at ingestion, quarantined, and investigated before they can cause harm. The investigation log becomes evidence for the compliance record.
For batch ingestion, expectation suites run against each incoming dataset to validate these contracts. For streaming ingestion, schema validation is enforced on every message, with messages that do not conform rejected to a dead-letter queue for investigation. Without these controls, failures propagate silently: a schema change in a source system or a subtle distributional shift may avoid triggering an error while still altering the model's behaviour in ways that surface only later as performance degradation or fairness drift.
For controls against outcome drift, the ingestion layer computes real-time summary statistics (mean, variance, quantile distributions) for incoming data and compares them against the training data baseline. Statistically significant shifts are reported to the monitoring layer. Every data record received is logged with a timestamp, source identifier, schema validation result, and a hash of the data content, supporting the Article 12 traceability requirement and enabling post-hoc reconstruction of the system's inputs for any given decision.
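As one minimal sketch of the baseline comparison, a mean-shift rule flags windows whose average drifts too far from the training mean; real deployments would also compare variance and quantiles, and the three-standard-deviation threshold here is illustrative:

```python
import statistics

def mean_shift_alert(baseline: list[float], window: list[float],
                     threshold_sd: float = 3.0) -> bool:
    """Flag when a live window's mean drifts more than `threshold_sd`
    baseline standard deviations from the training-data mean."""
    mu = statistics.mean(baseline)
    sd = statistics.stdev(baseline)
    return abs(statistics.mean(window) - mu) > threshold_sd * sd
```

An alert from this check is reported to the monitoring layer rather than blocking ingestion, since a distributional shift is a signal to investigate, not necessarily a malformed record.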
The feature engineering layer transforms raw data into the feature vectors consumed by the model, and is where proxy variable risks materialise, data quality issues compound, and undocumented transformations can introduce hidden bias. The primary control against intent drift at this layer is training-serving consistency, enforced through a centralised feature registry and shared transformation code.
Every feature is defined in a central feature store that records its name, source, transformation logic, data type, expected distribution, business justification for inclusion, and an assessment of its proxy variable risk. New features cannot be added to the production system without registry approval. Feature parity enforcement ensures the features computed for inference are identical to those used during training and validation. Feature parity violations, often called training-serving skew, cause the model to receive inputs it was not trained to process, leading to unpredictable outputs.
Feature stores such as Feast, Tecton, and Hopsworks centralise feature definitions so each feature has a single computation specification used for both training and serving. The store also versions feature values, making the exact features that trained a given model version retrievable. Feast is open-source and integrates with most cloud and on-premises data infrastructure. Tecton and Hopsworks are commercial offerings with additional capabilities around real-time feature computation and monitoring.
A pernicious failure mode occurs when the training feature pipeline and the serving feature pipeline are maintained by different teams, use different code paths, or run on different infrastructure. A model trained on features computed with one normalisation scheme that is served features computed with a slightly different scheme will produce silently degraded predictions. Feature stores eliminate this risk by providing a single computation specification for each feature.
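One lightweight way to detect this divergence, sketched here with an invented feature spec, is to fingerprint the single computation specification at training time and re-check it at serving time; the spec contents and the z-score transform are illustrative assumptions:

```python
import hashlib
import json

# Single computation specification, recorded once in the feature registry.
FEATURE_SPEC = {
    "name": "income_normalised",
    "transform": "zscore",
    "params": {"mean": 54000.0, "std": 21000.0},
    "version": 3,
}

def spec_fingerprint(spec: dict) -> str:
    """Stable hash of the spec; stored alongside the model version it trained."""
    return hashlib.sha256(json.dumps(spec, sort_keys=True).encode()).hexdigest()

def compute_feature(raw_income: float, spec: dict = FEATURE_SPEC) -> float:
    """Both training and serving call this one function with this one spec."""
    p = spec["params"]
    return (raw_income - p["mean"]) / p["std"]

TRAINED_AGAINST = spec_fingerprint(FEATURE_SPEC)

def check_parity(serving_spec: dict) -> None:
    """Refuse to serve if the loaded spec differs from the trained-against spec."""
    if spec_fingerprint(serving_spec) != TRAINED_AGAINST:
        raise RuntimeError("training-serving skew: feature spec diverged")
```

Feature stores implement the same idea at scale; the fingerprint check is useful as a cheap belt-and-braces control even when one is in place.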
The model inference layer executes the trained model against the feature vector and produces a raw output: a score, classification, generated text, or other result. The foundational control against intent drift is model version pinning, where the inference service loads a specific, immutable model version from the model registry. Switching to a different version is a deployment event requiring human approval and CI/CD pipeline validation.
The model registry's stage management (experimental, staging, production, archived) enforces this constraint. Only models in the production stage can be loaded by the inference service, and promotion to production requires approval. Model registries such as MLflow, SageMaker, and Vertex AI all support this stage-gated pattern.
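A minimal in-memory sketch of the stage-gated pattern; the class names and the approval rule are illustrative, and a production system would use one of the registry products named above rather than this toy:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ModelVersion:
    name: str
    version: int
    stage: str = "experimental"       # experimental | staging | production | archived
    approved_by: Optional[str] = None

class Registry:
    """Toy stage-gated registry: only approved, production-stage versions
    are ever visible to the inference service."""
    def __init__(self):
        self._versions: dict = {}

    def register(self, name: str, version: int) -> ModelVersion:
        mv = ModelVersion(name, version)
        self._versions[(name, version)] = mv
        return mv

    def promote(self, name: str, version: int, stage: str,
                approved_by: Optional[str] = None) -> None:
        mv = self._versions[(name, version)]
        if stage == "production" and approved_by is None:
            raise PermissionError("production promotion requires approval")
        mv.stage = stage
        mv.approved_by = approved_by

    def load_for_inference(self, name: str) -> ModelVersion:
        """Inference can only ever load a production-stage version."""
        prod = [v for v in self._versions.values()
                if v.name == name and v.stage == "production"]
        if not prod:
            raise LookupError(f"no production version of {name}")
        return max(prod, key=lambda v: v.version)
```

The important property is structural: there is no code path by which the inference service can load an unapproved version, so version pinning cannot be bypassed by configuration alone.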
Confidence thresholding adds a safety net for uncertain predictions. Every classifier produces some form of confidence estimate: a probability score, a softmax output, or a distance from the decision boundary. Predictions whose confidence falls below a defined threshold are routed to human review before being acted upon automatically. The Technical SME calibrates the threshold carefully: too high, and most predictions are sent to humans, defeating the purpose of automation; too low, and uncertain predictions slip through with potential adverse consequences. Calibration uses the validation dataset, setting the threshold at the confidence level below which the model's error rate becomes unacceptable. The threshold value, calibration methodology, and resulting human review volume are documented in the AI system description.
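The routing decision itself is simple; a sketch, with an illustrative threshold standing in for the calibrated value:

```python
def route_prediction(label: str, confidence: float,
                     threshold: float = 0.85) -> dict:
    """Route low-confidence predictions to human review instead of
    acting on them automatically. The default threshold is illustrative;
    in practice it comes from validation-set calibration."""
    if confidence < threshold:
        return {"label": label, "route": "human_review",
                "reason": f"confidence {confidence:.2f} below {threshold}"}
    return {"label": label, "route": "automatic"}
```

The returned routing record, including the reason string, belongs in the inference trace so the review volume claimed in the documentation can be verified against the logs.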
The post-processing layer applies thresholds, calibrations, business rules, and output formatting, shaping the raw model output into the actionable result that users and affected persons experience. The key control at this layer is transparent business rule application: every rule carries three documented elements: the rule itself, the rationale for its existence, and a fairness impact assessment showing how the rule affects different subgroups.
Business rules can override the model's outputs in ways that affect fairness. A rule that automatically rejects applicants without a university degree, for instance, may disproportionately affect certain demographic groups. The Technical SME assesses each rule for fairness impact and documents the assessment. Calibration validation ensures that fairness calibrations applied at this layer (threshold adjustments, score corrections) are validated on representative data to confirm they achieve the intended fairness improvement without introducing unintended side effects.
Override logging at this layer is essential. Every instance where a business rule or fairness calibration changes the model's raw output is logged with the original output, the modified output, and the specific rule that triggered the modification. This log enables retrospective analysis: if the fairness profile shifts in production, the organisation can determine whether the shift originates in the model's predictions or in the post-processing rules.
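A sketch of rule application with override logging; the sanctions-screening rule, its identifier, and the threshold are invented for illustration:

```python
from datetime import datetime, timezone

OVERRIDE_LOG: list = []

def apply_rules(raw_score: float, applicant: dict,
                reject_threshold: float = 0.5) -> str:
    """Apply documented business rules to a raw model score, logging every
    override with original output, modified output, and triggering rule."""
    decision = "accept" if raw_score >= reject_threshold else "reject"
    # Illustrative rule; each real rule carries a documented fairness
    # impact assessment as described above.
    if decision == "accept" and applicant.get("sanctions_listed"):
        OVERRIDE_LOG.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "original": decision,
            "modified": "reject",
            "rule": "R-017-sanctions-screening",
        })
        decision = "reject"
    return decision
```

Because the log stores both the original and modified outputs, the retrospective analysis described above can attribute a fairness shift to the model or to the rules without re-running either.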
For outcome drift, threshold stability monitoring tracks the proportion of inputs crossing decision thresholds over time. Changes in the crossing rate may indicate the model's score distribution has shifted, requiring threshold recalibration. The fairness metrics computed during development are periodically recomputed on production data passing through the post-processing layer, catching drift that affects final outputs rather than just raw predictions. Subgroup-specific threshold calibration tools can find thresholds that satisfy a fairness constraint while maximising accuracy, though optimised thresholds require periodic re-validation on production data because distributional shifts can make previously optimal thresholds sub-optimal.
The explainability and human oversight layers work in tandem: the explainability layer generates the information the human overseer needs, while the oversight interface presents it and enforces the review workflow. If either layer fails, the human oversight obligation under Article 14 is undermined.
At the explainability layer, the production concern differs from the development concern. During development, explanations debug and validate the model. In production, explanations must be generated for every prediction or a defined subset, at speed, without degrading inference latency. SHAP's TreeExplainer computes exact attributions for tree ensembles in polynomial time and is fast enough for per-prediction explanation at high throughput. KernelSHAP and DeepSHAP for neural networks are significantly more expensive, often requiring hundreds or thousands of model evaluations per explanation.
The Technical SME validates explanations against the model's actual behaviour. An explanation that attributes a decision to Feature A when the model actually relied on Feature B is worse than no explanation, because it misleads the human overseer. Fidelity can be tested by comparing the explanation's feature attributions against the model's sensitivity to feature perturbations. The AI system description must document the explanation formats for each audience and the validation performed to confirm comprehensibility. Explanations for technical operators provide precise feature contributions and confidence indicators, while explanations for affected persons use plain language focused on the factors most relevant to the individual's situation.
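The perturbation-based fidelity test can be sketched with a toy linear model standing in for the production model; the weights, the zero-ablation scheme, and the top-feature agreement criterion are all illustrative simplifications:

```python
def model(x: list) -> float:
    # Toy linear scorer standing in for the production model.
    weights = [0.8, 0.1, -0.4]
    return sum(w * v for w, v in zip(weights, x))

def perturbation_sensitivity(x: list) -> list:
    """Change in model output when each feature is ablated to zero."""
    base = model(x)
    sens = []
    for i in range(len(x)):
        x2 = list(x)
        x2[i] = 0.0
        sens.append(abs(base - model(x2)))
    return sens

def fidelity_check(attributions: list, x: list) -> bool:
    """An explanation passes if the feature it ranks most important is
    also the one the model is most sensitive to under perturbation."""
    sens = perturbation_sensitivity(x)
    top_attr = max(range(len(x)), key=lambda i: abs(attributions[i]))
    top_sens = max(range(len(x)), key=lambda i: sens[i])
    return top_attr == top_sens
```

A fuller test would compare rank correlation across all features rather than only the top feature, but even this coarse check catches the Feature A versus Feature B failure described above.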
The logging and audit layer captures a comprehensive record of system operation, supporting Article 12's automatic recording requirements. The word "automatic" is important: logging should be a structural property of the system, not something that depends on application code remembering to write a log entry. Logging and Traceability infrastructure instruments the application at the framework level, automatically capturing traces, spans, and structured log events.
Each inference request generates a trace that captures the input, feature values, model version, raw output, post-processing output, explanation, and operator action. The trace is exported to a logging backend in a structured format that can be queried. Immutability is enforced at the storage layer: logs are written in append-only format with cryptographic hash chains ensuring tamper evidence. No system component, user, or administrator can modify historical log entries.
Cloud storage configured in WORM mode (Write Once Read Many) prevents deletion or modification of log objects for a defined retention period. For organisations requiring the highest assurance, cryptographic hash chains add tamper evidence: each log entry includes a hash of the preceding entry, creating a chain that breaks visibly if any entry is modified or deleted. This is computationally inexpensive and can be implemented as a thin layer on top of any logging backend.
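The hash-chain layer is simple enough to sketch in full; this is a minimal in-memory version, with the event payloads invented for illustration:

```python
import hashlib
import json

class HashChainLog:
    """Append-only log with a cryptographic hash chain: each entry stores
    a hash over the previous entry's hash plus its own payload, so any
    modification or deletion breaks the chain visibly."""
    def __init__(self):
        self._entries: list = []

    def append(self, event: dict) -> None:
        prev = self._entries[-1]["hash"] if self._entries else "genesis"
        payload = json.dumps(event, sort_keys=True)
        digest = hashlib.sha256((prev + payload).encode()).hexdigest()
        self._entries.append({"event": event, "prev": prev, "hash": digest})

    def verify(self) -> bool:
        """Recompute the chain from the genesis marker; any tampering
        with an event or a link makes verification fail."""
        prev = "genesis"
        for entry in self._entries:
            payload = json.dumps(entry["event"], sort_keys=True)
            expected = hashlib.sha256((prev + payload).encode()).hexdigest()
            if entry["prev"] != prev or entry["hash"] != expected:
                return False
            prev = entry["hash"]
        return True
```

In practice the verification pass runs periodically, and the head hash can be anchored externally (for example, written to WORM storage) so truncation of the whole tail is also detectable.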
The comprehensiveness requirement means every material event is captured. The minimum event set for a high-risk AI system includes: data ingestion events recording source, timestamp, record count, and quality check result; feature computation events recording feature version and computation status; inference events recording input hash, model version, raw output, and confidence score; post-processing events recording rules applied with original and modified output; explanation events recording method and feature attributions; operator events recording review timestamp, decision, and override rationale; configuration change events recording what changed, who changed it, and when; and deployment events recording version deployed and approval evidence. Each event includes a correlation ID tying it to the specific inference request, enabling end-to-end trace retrieval. The logging layer must also support export of logs in formats suitable for regulatory inspection, available on demand within response timelines expected by national competent authorities.
The monitoring layer continuously observes the system's operational behaviour, tracking performance, fairness, data quality, and anomalous patterns. Its goal is to detect problems before they cause harm, serving two audiences: the engineering team (who need operational visibility) and the governance team (who need compliance visibility).
Intent alignment dashboards show the system's current behaviour relative to its documented intended purpose. If the system is intended to rank candidates with specified performance and fairness thresholds, the dashboard displays these metrics in real time with clear indication of whether the system is within specification. Statistical anomaly detection algorithms identify unusual patterns in the system's inputs, outputs, or operational metrics, triggering alerts and, above defined severity thresholds, automatic escalation to human reviewers.
Multi-dimensional drift monitoring tracks drift across multiple dimensions simultaneously: input feature distributions, output score distributions, fairness metrics, error rates, and operator override rates. Single-dimension monitoring may miss drift that manifests across multiple dimensions without crossing any individual threshold. The monitoring layer also includes specific checks for feedback loop effects, where the system's outputs influence the data subsequently used to evaluate or retrain the system. Detecting feedback loops requires comparing the training data distribution against production data distribution while controlling for the system's own influence on that distribution.
A layered monitoring approach works well in practice. At the base, infrastructure monitoring covers system health, latency, throughput, and error rates. Above it, the model monitoring layer adds ML-specific capabilities: feature drift, prediction drift, performance estimation, and fairness metrics. Above that, the governance layer aggregates model-layer metrics into compliance-relevant dashboards answering questions such as whether all declared thresholds are being met, when the last threshold breach occurred, and what the response was. This approach ensures operational teams see the technical detail they need while governance teams see the compliance summary, all drawn from the same underlying data.
Each feature's correlation with protected characteristics is computed and recorded in the feature registry. Features exceeding a defined correlation threshold are reviewed by the Technical SME and the AI Governance Lead. Approved features above the threshold must have a documented justification explaining why their predictive value outweighs the proxy risk.
Beyond consistency, the feature engineering layer monitors feature distributions against the training baseline. If a feature's distribution in production diverges from its distribution during training, the model is operating in a regime it was not trained for, and performance may degrade in subgroup-specific ways. Automated feature distribution monitoring computes drift metrics (PSI, KS test, Jensen-Shannon divergence) per feature and alerts when thresholds are crossed. This monitoring should run continuously on production data and feed into the post-market monitoring framework. Feature transformation logic is version-controlled alongside model code, with each model version linked to the specific transformation version that produced its training features.
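Of the drift metrics named above, PSI is the simplest to sketch; this version uses equal-width bins derived from the baseline and a small epsilon for empty buckets, and the common PSI > 0.2 rule of thumb mentioned in the docstring is a convention, not a regulatory threshold:

```python
import math

def psi(baseline: list, production: list, bins: int = 10) -> float:
    """Population Stability Index between a feature's training-baseline
    and production distributions. A common rule of thumb treats
    PSI > 0.2 as significant drift, but thresholds should be validated
    per feature."""
    lo, hi = min(baseline), max(baseline)
    width = (hi - lo) / bins or 1.0  # degenerate constant baseline -> width 1

    def bucket_fractions(values):
        counts = [0] * bins
        for v in values:
            i = min(int((v - lo) / width), bins - 1)
            counts[max(i, 0)] += 1
        # Small epsilon avoids log(0) for empty buckets.
        return [(c + 1e-6) / (len(values) + bins * 1e-6) for c in counts]

    b = bucket_fractions(baseline)
    p = bucket_fractions(production)
    return sum((pi - bi) * math.log(pi / bi) for bi, pi in zip(b, p))
```

Running this per feature on a schedule, and alerting into the post-market monitoring framework when the threshold is crossed, implements the continuous monitoring described above.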
Output constraint enforcement serves as the last-resort guard. The inference layer enforces hard bounds on what the model can output: scores must fall within a defined range, classifications must be drawn from a defined set, and generated text must conform to length and format constraints. This prevents pathological model behaviour (extreme score values from adversarial inputs, hallucinated classification labels, excessively long outputs) from propagating downstream. Schema validation is a clean implementation approach: define the output schema, validate every inference output against it, and reject outputs that do not conform.
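A sketch of that schema check, with an invented label set and score range; real constraints come from the model's documented output specification:

```python
ALLOWED_LABELS = {"approve", "refer", "reject"}  # illustrative label set

def validate_output(output: dict) -> dict:
    """Last-resort guard: enforce hard bounds on inference outputs before
    anything downstream can act on them. Non-conforming outputs are
    rejected, never clipped or coerced."""
    score = output.get("score")
    label = output.get("label")
    if not isinstance(score, float) or not 0.0 <= score <= 1.0:
        raise ValueError(f"score out of bounds: {score!r}")
    if label not in ALLOWED_LABELS:
        raise ValueError(f"unknown label: {label!r}")
    return output
```

Rejecting rather than clipping matters: a clipped extreme score would silently mask the pathological behaviour this guard exists to surface.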
For outcome drift controls, per-prediction feature attribution is computed where architectures support it (tree ensembles, linear models, or neural networks with SHAP/LIME). The inference layer records feature contributions for each prediction, supporting explainability requirements and enabling post-hoc analysis of drift patterns. The distribution of the model's outputs is monitored in real time; shifts in output distribution may indicate that the model's behaviour is evolving in response to input distribution changes.
Production-grade explainability infrastructure hooks into the serving pipeline, computes feature attributions per prediction, stores them alongside the inference log, and provides monitoring dashboards that track explanation patterns over time. This monitoring capability matters for explanation consistency: if the dominant features in explanations shift over time without a corresponding model update, it may indicate that the model's behaviour is changing in response to input distribution shifts. An alert on explanation pattern change is a valuable early warning signal for outcome drift.
For the human oversight interface, the critical design principle is that the interface must make independent human judgement possible, not merely require a click. Article 14 requires operators to properly understand the AI system's relevant capacities and limitations and to correctly interpret its output. Four specific automation bias countermeasures have evidence behind them. A data-before-recommendation display shows underlying case data before revealing the system's recommendation. A minimum dwell time (typically 15 to 60 seconds) prevents rapid bulk-acceptance without review. Confidence visualisation displays the system's confidence level prominently, with uncertainty highlighted. Calibration cases injected at random intervals present operators with cases where the correct answer is known, recording whether the operator agrees with the system on cases where the system is wrong.
Operators who agree with the system on cases where the system is wrong are exhibiting automation bias, and this signal feeds into operator training and oversight review. Average review time per case serves as a proxy for review thoroughness: operators consistently reviewing cases in under 60 seconds are unlikely to be performing meaningful oversight.
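The minimum dwell time countermeasure reduces to a server-side check; a sketch, with the 15-second floor taken from the low end of the range above and the function names invented:

```python
import time
from typing import Optional

MIN_DWELL_SECONDS = 15.0  # illustrative floor from the 15-60s range above

def accept_review(opened_at: float, decision: str,
                  now: Optional[float] = None) -> str:
    """Reject a review submission that arrives before the minimum dwell
    time has elapsed, sending the case back to the operator. Timestamps
    are monotonic-clock seconds."""
    current = now if now is not None else time.monotonic()
    if current - opened_at < MIN_DWELL_SECONDS:
        return "rejected: minimum dwell time not met"
    return f"recorded: {decision}"
```

Enforcing this on the server rather than in the browser matters, for the same reason the bypass-prevention constraint below is architectural rather than policy-based.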
The interface must technically prevent bypass. For High-Risk AI Systems there should be no API endpoint, configuration flag, or administrative override allowing outputs to be applied without human review. This is an architectural constraint, not a policy constraint: the deployment infrastructure is designed so that the only path from model inference to consequential action passes through the human review interface. Penetration testing should specifically test for human oversight bypass paths.
Override rate monitoring provides further signals of system health. The percentage of system recommendations that operators override is tracked at the aggregate level, per-deployer, and per-operator. Consistently low override rates may indicate automation bias. Suddenly increasing override rates may indicate that the system's outputs are degrading.
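A minimal sketch of that tracking at the per-operator and aggregate levels; class and method names are illustrative:

```python
from collections import defaultdict

class OverrideTracker:
    """Track override rates per operator and in aggregate. Consistently
    near-zero rates can indicate automation bias; a sudden rise can
    indicate degrading system outputs."""
    def __init__(self):
        self._seen = defaultdict(int)
        self._overridden = defaultdict(int)

    def record(self, operator: str, overridden: bool) -> None:
        self._seen[operator] += 1
        if overridden:
            self._overridden[operator] += 1

    def rate(self, operator: str) -> float:
        seen = self._seen[operator]
        return self._overridden[operator] / seen if seen else 0.0

    def aggregate_rate(self) -> float:
        total = sum(self._seen.values())
        return sum(self._overridden.values()) / total if total else 0.0
```

The per-deployer view mentioned above follows the same pattern, keyed on deployer identifier instead of operator.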