Article 13 transparency and Article 14 human oversight require that AI system outputs be understandable to those who rely on them. The choice of model architecture directly constrains an organisation's ability to satisfy these requirements, making explainability a selection criterion rather than a post-development concern.
These articles require that system outputs be understandable to the humans who rely on them and that oversight personnel have the information they need to exercise effective oversight. Explainability properties must therefore be assessed during model selection by the Technical SME, not retrofitted after development.
Model selection should therefore evaluate whether a candidate architecture can produce explanations that serve all three required audiences: operators, affected persons, and auditors. An architecture that generates technically precise but opaque explanations, such as raw attention weights from a transformer, may satisfy auditors while failing operators and affected persons entirely. Model Selection for EU AI Act Compliance covers the broader selection framework within which these explainability assessments sit.
The consequence of deferring explainability assessment is significant. Post-hoc explanation methods are approximations of the model's reasoning, and their fidelity varies substantially by architecture. Selecting a model without understanding its explainability properties creates a compliance risk that may only surface during conformity assessment or market surveillance. The model selection process must therefore evaluate explainability alongside accuracy, treating it as a hard constraint rather than a desirable property.
Some model architectures provide intrinsic explainability, where the decision process is transparent by construction. Linear models expose their coefficients directly. Decision trees expose their branching logic. Rule-based systems expose the rules that fired for each decision.
Other architectures require post-hoc explanation methods that approximate the model's reasoning after the fact. SHAP values, LIME, attention maps, and counterfactual explanations fall into this category. Each of these methods has known limitations in two critical dimensions: fidelity (how accurately the explanation reflects the model's actual reasoning) and stability (whether the explanation changes if the input is perturbed slightly).
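Stability can be checked mechanically: perturb an input slightly and compare the top-ranked features of the two explanations. The sketch below uses a toy linear attribution function as a stand-in; a real system would substitute SHAP or LIME attributions, and the feature names and weights are illustrative assumptions.

```python
# Hedged sketch: explanation stability check. `explain` stands in for any
# attribution method (SHAP, LIME, ...); here it is a toy linear one.
def top_features(attributions, k=3):
    """Features ranked by absolute attribution, largest first."""
    return sorted(attributions, key=lambda f: abs(attributions[f]), reverse=True)[:k]

def stability(explain, x, perturbed, k=3):
    """Fraction of top-k features shared by the two explanations (1.0 = stable)."""
    a = set(top_features(explain(x), k))
    b = set(top_features(explain(perturbed), k))
    return len(a & b) / k

# Toy attribution: contribution = weight * value, for illustration only.
weights = {"age": 0.5, "income": 1.2, "tenure": -0.7, "region": 0.05}
explain = lambda x: {f: weights[f] * x[f] for f in x}

x = {"age": 1.0, "income": 1.0, "tenure": 1.0, "region": 1.0}
nudged = {"age": 1.02, "income": 0.98, "tenure": 1.01, "region": 0.99}
score = stability(explain, x, nudged)  # 1.0 here: top features unchanged
```

A score well below 1.0 on small perturbations is a warning that the explanation method is unstable for that architecture.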
For simple models such as linear models, decision trees, and small rule sets, explanations can be generated manually by inspecting the model's internal state. The coefficient for each feature is documented; the explanation is the feature values multiplied by their coefficients. For decision trees, the explanation is the path from root to leaf for the specific prediction. For rule-based systems, the explanation is the triggered rule.
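For a linear model, the manual procedure above reduces to a few lines: each feature's contribution is its value multiplied by its coefficient. The coefficients and applicant values below are illustrative.

```python
# Minimal sketch: per-decision explanation for a linear model, derived
# directly from its documented coefficients (all values illustrative).
coefficients = {"income": 0.8, "debt_ratio": -1.2, "years_employed": 0.3}
intercept = -0.5

def explain_linear(features: dict) -> dict:
    """Contribution of each feature = coefficient * feature value."""
    return {name: coefficients[name] * value for name, value in features.items()}

applicant = {"income": 1.5, "debt_ratio": 0.9, "years_employed": 2.0}
contributions = explain_linear(applicant)
score = intercept + sum(contributions.values())

# Rank features by absolute contribution for an operator-facing summary.
ranked = sorted(contributions, key=lambda f: abs(contributions[f]), reverse=True)
```

The explanation is exact by construction: the contributions sum (with the intercept) to the model's output, with no approximation involved.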
Complex models, including neural networks, gradient-boosted ensembles, and LLMs, require post-hoc explanation tooling; without it, the organisation cannot generate per-decision explanations at all. SHAP and LIME are open-source and free, with the only cost being compute time, so the barrier to adoption is engineering effort rather than licensing expense.
The AISDP documents three distinct audiences for explanations, each with different needs that the explanation system must address simultaneously. Operators need to understand the system's recommendation well enough to exercise independent judgement. Affected persons need to understand, in non-technical terms, the factors that influenced a decision affecting them. Auditors need to verify that the system's behaviour is consistent with the documented design.
Operators need explanations that enable them to review a system recommendation critically. For a recruitment screening system, the operator needs to see which features drove the ranking, how confident the system is, and where the prediction sits relative to the decision boundary. SHAP values are particularly effective here: they decompose each prediction into the contribution of each input feature, expressed in the same units as the model's output. For tree-based models (XGBoost, LightGBM, CatBoost), SHAP values are exact and fast to compute via the TreeExplainer algorithm. For deep learning models, KernelSHAP or DeepSHAP provide approximations at higher computational cost.
Affected persons need explanations in plain language that help them understand why a decision went a particular way and what they could change. The Technical SME translates technical feature attributions into natural language. Alibi Explain's counterfactual explanations are particularly useful for this audience because they answer the question "what would need to be different for the outcome to change?" in concrete terms.
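The idea behind counterfactual explanations can be illustrated with a deliberately naive search. Alibi Explain uses gradient-based optimisation; this sketch only varies a single assumed feature by brute force, with an illustrative toy model and threshold.

```python
# Hedged sketch: brute-force counterfactual search on one feature.
# A real implementation (e.g. Alibi Explain) optimises over all features.
def counterfactual(model, x, threshold, feature, step, max_steps=100):
    """Increase `feature` until the score crosses `threshold`, if possible."""
    cf = dict(x)
    for _ in range(max_steps):
        if model(cf) >= threshold:
            return cf
        cf[feature] += step
    return None  # no counterfactual found within the search budget

# Toy scoring model and applicant, for illustration only.
model = lambda x: 0.5 * x["income"] - 0.8 * x["debt"]
applicant = {"income": 2.0, "debt": 1.5}  # score -0.2, below threshold 0.0

cf = counterfactual(model, applicant, threshold=0.0, feature="income", step=0.1)
```

The resulting counterfactual answers the affected person's question concretely: at what income level would the outcome have changed.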
Auditors reviewing explanations across a sample of predictions should see patterns that match the model's documented feature importance. If the AISDP states that the model primarily relies on features A, B, and C, but explanations consistently attribute decisions to feature D (a proxy variable not intended to be influential), the auditor has found a discrepancy that requires investigation. The explanation system must therefore serve all three audiences simultaneously, and the model selection process must assess whether the candidate architecture can do so.
Operator-facing explanation delivery is integrated into the oversight interface. Each case presented for review includes the system's recommendation, the confidence score, and the explanation. The Technical SME calibrates the explanation format to the operator's expertise: for a trained assessor reviewing credit applications, feature attributions expressed as percentage contributions may be appropriate; for a caseworker reviewing benefits eligibility, a ranked list of the top three factors in plain language is more useful. The format should be tested with actual operators during the interface design process and refined based on their feedback.
For affected-person explanations, the delivery depends on the deployment context. In some systems, the explanation is provided directly, such as a rejection letter explaining the factors behind the decision. In others, the explanation is provided on request, retrieved from the logging infrastructure when the affected person asks why a decision was made. Article 86 requires deployers to provide meaningful information about the basis for the decision, which the provider must support through the Instructions for Use and the explanation infrastructure.
Plain-language explanation templates bridge the gap between technical feature attributions and human-readable explanations. A template might read: "Your application was assessed based on [factor_1], [factor_2], and [factor_3]. The most significant factor was [factor_1_name], which [direction] the likelihood of [outcome]." The SHAP or LIME feature attributions populate the template, with the top N features by absolute attribution value inserted into the template slots and the attribution sign determining the direction language. These templates must be domain-specific (different language for recruitment, credit, benefits, and healthcare contexts) and reviewed by the AI Governance Lead for accuracy and clarity. The distinction matters because a recruitment rejection requires different explanatory language than a credit decision or a benefits eligibility assessment.
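A minimal sketch of the template-filling step, assuming SHAP-style attributions as input. The feature names, wording, and values are illustrative, not a fixed standard.

```python
# Hedged sketch: populating a plain-language template from feature
# attributions. Top-N selection uses absolute attribution value; the
# attribution sign determines the direction language.
TEMPLATE = (
    "Your application was assessed based on {f1}, {f2}, and {f3}. "
    "The most significant factor was {f1}, which {direction} the likelihood "
    "of {outcome}."
)

def render_explanation(attributions: dict, outcome: str) -> str:
    top = sorted(attributions, key=lambda f: abs(attributions[f]), reverse=True)[:3]
    direction = "increased" if attributions[top[0]] > 0 else "decreased"
    return TEMPLATE.format(f1=top[0], f2=top[1], f3=top[2],
                           direction=direction, outcome=outcome)

attribs = {
    "recent missed payments": -0.42,
    "income stability": 0.31,
    "debt-to-income ratio": -0.18,
    "account age": 0.05,
}
text = render_explanation(attribs, outcome="approval")
```

Domain-specific variants would swap the template string per context (recruitment, credit, benefits, healthcare) while reusing the same attribution-ranking logic.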
For LLM-based systems, feature attribution methods do not apply meaningfully to generative models. Explanations instead rely on three approaches: chain-of-thought prompting, which instructs the model to articulate its reasoning step by step; retrieval source citation, which identifies the retrieved documents that contributed to a RAG response; and structured output formats, which require the model to separate its answer from its reasoning. Grounding Verification: Hallucination Detection as Compliance Control addresses the specific challenge of verifying that retrieved sources actually support the generated output.
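A structured output format can be enforced by validating the model's response against the required fields before it reaches any downstream consumer. The JSON schema below is an illustrative assumption, not a standard, and the generation step is stubbed out.

```python
import json

# Hedged sketch: a real system would obtain `raw_response` from an LLM
# instructed to emit this (assumed) schema separating answer, reasoning,
# and retrieved-source citations.
raw_response = json.dumps({
    "answer": "The claim is eligible for expedited review.",
    "reasoning": "Step 1: policy section applies. Step 2: criteria met.",
    "sources": ["doc-014", "doc-221"],
})

def parse_structured(raw: str) -> dict:
    """Reject responses missing any required field so gaps are caught early."""
    parsed = json.loads(raw)
    missing = {"answer", "reasoning", "sources"} - set(parsed)
    if missing:
        raise ValueError(f"response missing fields: {sorted(missing)}")
    return parsed

result = parse_structured(raw_response)
```

Rejecting malformed responses at this boundary ensures every logged decision carries its reasoning and source citations, rather than discovering gaps at audit time.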
These explanations are inherently less precise than SHAP values. The model's stated reasoning may not accurately reflect its internal computations. This is a fundamental limitation of LLM-based explainability: the explanation is a generated output, not a mathematical decomposition of the decision process. The AISDP must document this limitation and describe the compensating controls, including output quality monitoring and human evaluation sampling, that provide assurance about explanation quality. Tool-Use Governance and Reasoning Trace Logging covers the related challenge of logging reasoning traces for agentic systems that invoke external tools.
Fidelity testing is the quality assurance step that most organisations skip, to their detriment. An explanation method has high fidelity if it accurately reflects what the model actually did, and low fidelity if it produces a plausible-sounding narrative that does not correspond to the model's internal computations. LIME, for instance, fits a local linear model around each prediction point and presents the linear model's coefficients as the explanation. If the true decision boundary is highly non-linear in that region, the LIME explanation may be misleading.
Fidelity testing involves systematically perturbing input features according to the explanation's attributions and verifying that the model's output changes as the explanation predicts. If removing the feature that the explanation identifies as most important does not meaningfully change the output, the explanation is unfaithful and cannot be relied upon for compliance purposes. The Technical SME automates this testing and runs it on a representative sample of predictions as part of the CI pipeline, ensuring that explanation quality is continuously validated rather than assessed once and assumed to hold.
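A minimal perturbation-based fidelity check might look like the following sketch. The baseline values and the change threshold are illustrative assumptions; a real test would use domain-appropriate baselines and run over a representative sample.

```python
# Hedged sketch: if "removing" (baselining out) the top-attributed feature
# barely moves the output, the explanation is flagged as unfaithful.
def is_faithful(model, x, attributions, baseline, min_delta=0.05):
    top = max(attributions, key=lambda f: abs(attributions[f]))
    perturbed = dict(x)
    perturbed[top] = baseline[top]
    return abs(model(x) - model(perturbed)) >= min_delta

# Toy model where feature "a" dominates and "b" is nearly irrelevant.
model = lambda x: 2.0 * x["a"] + 0.01 * x["b"]
x = {"a": 1.0, "b": 1.0}
baseline = {"a": 0.0, "b": 0.0}

good = is_faithful(model, x, {"a": 2.0, "b": 0.01}, baseline)  # honest attribution
bad = is_faithful(model, x, {"a": 0.01, "b": 2.0}, baseline)   # misattributes to "b"
```

The second check fails because the explanation names "b" as most important while the model is almost insensitive to it, which is exactly the condition that makes an explanation unusable for compliance.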
Explanation quality monitoring in production tracks whether explanations remain faithful and useful over time. Three complementary monitoring approaches are recommended. Fidelity testing should be run periodically on production predictions, comparing explanations against the model's actual sensitivity to feature perturbations. Explanation pattern monitoring, which tracks the distribution of top features in explanations over time, provides an early warning signal for model behaviour drift: if the features cited in explanations change without a corresponding model update, something has shifted. Human evaluation sampling, which presents a random sample of explanation-output pairs to domain experts for quality rating, provides a ground truth against which automated quality metrics can be calibrated.
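Explanation pattern monitoring can be sketched as a comparison between the top-feature distributions of a reference window and a recent window. The total variation distance and alert threshold below are illustrative choices, not prescribed values.

```python
from collections import Counter

# Hedged sketch: drift in which features dominate explanations over time.
def top_feature_distribution(explanations):
    """explanations: list of dicts mapping feature -> attribution."""
    tops = [max(e, key=lambda f: abs(e[f])) for e in explanations]
    return {f: c / len(tops) for f, c in Counter(tops).items()}

def drift(reference, recent):
    """Total variation distance between two top-feature distributions."""
    features = set(reference) | set(recent)
    return 0.5 * sum(abs(reference.get(f, 0.0) - recent.get(f, 0.0))
                     for f in features)

# Illustrative windows: feature "b" becomes dominant without a model update.
ref = [{"a": 0.9, "b": 0.1}] * 8 + [{"a": 0.1, "b": 0.9}] * 2
cur = [{"a": 0.9, "b": 0.1}] * 4 + [{"a": 0.1, "b": 0.9}] * 6

alert = drift(top_feature_distribution(ref), top_feature_distribution(cur)) > 0.2
```

An alert here does not prove a defect; it triggers the investigation that the human evaluation sampling then grounds.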
Post-hoc explanation methods impose significant computational cost that must be assessed during model selection. SHAP values for complex models can require thousands of model evaluations per explanation. For high-throughput systems, this cost may be prohibitive. The model selection should estimate the computational overhead of the explanation method required for the candidate architecture and assess whether it is compatible with the system's latency and throughput requirements.
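The overhead estimate can start as a back-of-envelope calculation: explanations per day, times model evaluations per explanation, times latency per evaluation. All figures below are illustrative assumptions, not benchmarks.

```python
# Illustrative capacity check (every figure here is an assumption to be
# replaced with measured values for the candidate architecture).
predictions_per_day = 50_000
evals_per_explanation = 2_000   # e.g. sampling-based SHAP on a complex model
seconds_per_eval = 0.002        # one model forward pass

compute_hours = (predictions_per_day * evals_per_explanation
                 * seconds_per_eval) / 3600
# If compute_hours exceeds the available daily budget, full per-prediction
# explanations are infeasible and a sampled or cached strategy is needed.
```

Even rough numbers like these surface infeasibility early, before the architecture is locked in.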
For a system processing thousands of predictions per day, generating SHAP explanations for every prediction may be infeasible. Practical approaches include: generating full explanations for a random sample, which is sufficient for monitoring and audit purposes; generating lightweight explanations covering the top three features only for all predictions; and pre-computing explanations for common input patterns and caching the results.
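These tiers can be combined in a simple dispatch function. The sampling rate, cache key, and tier names below are illustrative assumptions.

```python
import random

# Hedged sketch: tiered explanation strategy. Full explanations for a random
# sample, lightweight top-3 for everything else, and a cache for recurring
# input patterns.
explanation_cache = {}

def explanation_plan(features: dict, full_rate=0.05, rng=random.random):
    """Return (source, tier): source is 'cached' or 'fresh'."""
    key = tuple(sorted(features.items()))
    if key in explanation_cache:
        return ("cached", explanation_cache[key])
    tier = "full" if rng() < full_rate else "top3"
    explanation_cache[key] = tier  # remember the tier for this input pattern
    return ("fresh", tier)

# Deterministic rng stub for illustration: draw below full_rate -> full tier.
kind, tier = explanation_plan({"income": 3, "debt": 1}, rng=lambda: 0.01)
```

In production the cache would key on a coarser input signature and carry the explanation itself, not just the tier decision.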
The AISDP must document the explanation coverage (what proportion of predictions receive full explanations), the method used, and the computational overhead. This documentation is essential for conformity assessment, where the notified body will need to understand the trade-offs the organisation has made between explanation completeness and operational feasibility.
The right to explanation under Article 86 and the transparency requirements under Article 13 create an obligation to provide meaningful information to deployers and affected persons. The European Accessibility Act (Directive (EU) 2019/882), which applies to a broad range of digital products and services, imposes additional requirements on how that information is delivered. Where the high-risk AI system falls within the EAA's scope (products and services placed on the market after 28 June 2025), the explanations, the Instructions for Use, and the transparency disclosures must be accessible to persons with disabilities.
In practice, explanation interfaces, whether web-based dashboards for operators or notification letters for affected persons, must comply with the Web Content Accessibility Guidelines (WCAG) 2.1 at Level AA or the applicable harmonised standard (EN 301 549). Explanations delivered as text must be compatible with screen readers. Explanations that rely on visual elements such as feature attribution charts, SHAP waterfall plots, or colour-coded risk indicators must provide text alternatives that convey the same information. Interactive explanation interfaces must be navigable by keyboard alone.
For affected-person explanations delivered by letter or email, the language must be clear and plain, avoiding technical jargon that creates a comprehension barrier even for persons without disabilities. Where the deployment context involves a population with known accessibility needs (healthcare, social services, public administration), the AI Governance Lead should commission an accessibility review of the explanation templates during the interface design phase. The review should involve persons with disabilities or their representative organisations, consistent with the EAA's emphasis on user involvement in the design of accessible products and services.
Explainability assessment cannot be deferred until after development: explainability properties must be assessed during model selection by the Technical SME. Post-hoc explanation methods are approximations whose fidelity varies by architecture, and selecting a model without understanding its explainability properties creates compliance risk.
Practical approaches include generating full explanations for a random sample sufficient for monitoring and audit, generating lightweight top-three-feature explanations for all predictions, and pre-computing explanations for common input patterns. The AISDP must document the coverage and computational overhead.
Accessibility requirements do apply where the system falls within the European Accessibility Act's scope. Explanation interfaces must comply with WCAG 2.1 Level AA, provide text alternatives for visual elements such as SHAP waterfall plots, and be navigable by keyboard alone.
Explanations must serve three audiences: operators, who need feature attributions to exercise independent judgement; affected persons, who need plain-language summaries to understand decisions; and auditors, who need consistency checks against the documented system design.
Operator explanations are integrated into the oversight interface with format calibrated to expertise. Affected-person explanations use plain-language templates populated by feature attributions, with delivery via direct notification or on-request retrieval.
Feature attribution methods do not apply to generative models. Instead, LLM explanations rely on chain-of-thought prompting, retrieval source citation, and structured output formats, though these are inherently less precise than SHAP values.
Fidelity testing systematically perturbs input features according to explanation attributions and verifies the model output changes as predicted. If removing the most important feature does not change the output, the explanation is unfaithful.
The AISDP should document the accessibility measures applied to each explanation channel, the WCAG conformance level achieved, and any residual accessibility gaps with their justification. The post-market monitoring plan should include periodic accessibility audits, particularly after updates to the explanation templates or the oversight interface.