Hallucination in retrieval-augmented generation systems produces ungrounded outputs that directly engage the accuracy, transparency, and safety requirements of the EU AI Act. Grounding verification provides the technical control to detect and manage this risk at inference time.
Hallucination, the generation of outputs not supported by retrieved context, represents the central compliance risk for retrieval-augmented generation systems. A RAG system that hallucinates produces ungrounded outputs that may be incorrect, misleading, or harmful to the people affected by the system's decisions. For high-risk applications governed by the EU AI Act, hallucination is not merely an accuracy problem; it is a safety and fundamental rights risk that directly engages the requirements of Articles 9, 13, and 15.
The regulatory framework treats output accuracy as a core compliance obligation. RAG Pipeline Compliance establishes the broader context for retrieval-augmented systems, where the combination of a general-purpose AI model with an organisation's knowledge base creates a composite system subject to the full range of high-risk requirements. When that system produces outputs containing claims not grounded in the retrieved documents, it fails the accuracy requirements that underpin conformity assessment.
Grounding verification provides the technical mechanism to detect and manage hallucination at inference time, transforming an intractable model behaviour problem into a measurable, auditable compliance control. Without this layer, organisations deploying RAG systems in high-risk contexts have no systematic means to demonstrate that the system's outputs are traceable to authoritative source material.
The grounding verification layer checks whether each claim in the GPAI model's output is supported by the retrieved documents, producing a grounding score for every inference. This layer operates at inference time, sitting between the model's raw output and the response delivered to the user or downstream system. The verification produces a quantitative grounding score that determines whether an output is released, flagged for review, or suppressed.
Outputs that fall below the grounding threshold follow a defined escalation path. Depending on the system's risk profile and the severity of potential harm, flagged outputs may be held for human review, annotated with a disclaimer, or suppressed entirely. The AI Governance Lead defines the grounding policy, including the threshold values and escalation actions, while the Technical SME implements the verification pipeline.
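As an illustration, the threshold-and-escalation logic can be sketched as follows. The function name, two-threshold design, and numeric values are assumptions for the example, not values prescribed by the Act or by any grounding policy:

```python
from dataclasses import dataclass

@dataclass
class GroundingDecision:
    score: float   # fraction of output claims supported by retrieved documents
    action: str    # "pass", "flag", or "suppress"

def grounding_gate(score: float,
                   pass_threshold: float = 0.9,
                   suppress_threshold: float = 0.5) -> GroundingDecision:
    """Map a grounding score to an escalation action (illustrative thresholds)."""
    if score >= pass_threshold:
        return GroundingDecision(score, "pass")      # release the output
    if score >= suppress_threshold:
        return GroundingDecision(score, "flag")      # hold for human review
    return GroundingDecision(score, "suppress")      # withhold, return fallback
```

In practice the two thresholds would be set by the AI Governance Lead from the system's risk assessment and recorded in the grounding policy.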
The grounding verification results are logged for every inference and documented in the AI System Description Profile Module 5 as part of the system's accuracy evaluation. This creates a continuous record linking each output to its source evidence, supporting both operational monitoring and regulatory audit. Post-Market Monitoring Metrics covers how grounding metrics feed into broader post-market surveillance.
Three approaches are available for grounding verification, listed in increasing order of sophistication and implementation complexity. Each approach trades off auditability against detection coverage, and organisations may combine approaches for defence in depth.
Citation verification is the simplest and most auditable approach. The system requires the GPAI model to cite specific passages from the retrieved documents, and the verification layer checks that the cited passages exist and support the claim. This method produces transparent, easily reviewed evidence: auditors can trace each output claim to the exact source passage. The limitation is that it depends on the model's ability to generate accurate citations, which varies across providers and prompt configurations.
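A minimal sketch of citation verification might look like the following. The citation syntax (square-bracketed document identifiers) and the scoring rule are assumptions for the example; a production implementation would also check that the cited passage actually supports the claim, not merely that it exists:

```python
import re

def verify_citations(output: str, documents: dict[str, str]) -> float:
    """Return the fraction of citation markers [doc_id] in the output that
    resolve to a document actually present in the retrieved set."""
    cited = re.findall(r"\[([\w\-]+)\]", output)
    if not cited:
        return 0.0  # an answer with no citations cannot be verified
    valid = sum(1 for doc_id in cited if doc_id in documents)
    return valid / len(cited)
```

An output citing only retrieved documents scores 1.0; citations to documents outside the retrieved set, or an absence of citations, lower the score and would trip the grounding threshold.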
Entailment checking uses a separate natural language inference model to evaluate whether each claim in the output is entailed by the retrieved context. This approach operates independently of the GPAI model's citation behaviour, providing a second-opinion verification. The NLI model assesses the logical relationship between output claims and source passages, classifying each as entailed, contradicted, or neutral. This approach catches hallucinations that citation verification misses, particularly when the model generates plausible but unsupported paraphrases.
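The entailment check can be sketched independently of any particular NLI model by treating the model as a pluggable function. The `nli` callable below is a stand-in assumption for a real NLI classifier returning "entailment", "contradiction", or "neutral" for a (premise, hypothesis) pair:

```python
from typing import Callable

def entailment_score(claims: list[str],
                     context: str,
                     nli: Callable[[str, str], str]) -> float:
    """Fraction of output claims that the NLI model judges entailed by the
    retrieved context; 0.0 when there are no extractable claims."""
    if not claims:
        return 0.0
    entailed = sum(1 for claim in claims if nli(context, claim) == "entailment")
    return entailed / len(claims)
```

Because the NLI model operates on the claim text itself rather than on citations, this score catches plausible but unsupported paraphrases that citation checks miss.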
Fact extraction and matching is the most sophisticated approach. The system extracts discrete factual claims from the output and matches them against facts extracted from the retrieved documents. This method provides the most granular analysis, identifying exactly which claims lack source support. The trade-off is implementation complexity: maintaining robust fact extraction pipelines across diverse domains requires significant engineering investment.
The system's response to a grounding failure must be defined in the AI System Description Profile and implemented in the inference pipeline before deployment. The failure handling policy is a compliance artefact, not merely an engineering decision, because it determines how the system behaves when it cannot demonstrate output accuracy.
Options for handling grounding failures range from soft to hard interventions. At the softest end, the system appends a disclaimer to the output: "This response could not be fully verified against the knowledge base." This alerts users but still delivers the potentially ungrounded content. A middle option flags the output for human review before delivery, introducing a delay but maintaining a human-in-the-loop safeguard. At the hardest end, the system suppresses the output entirely and returns a fallback response.
For high-risk systems where ungrounded outputs could cause harm, suppression is the safer default. The choice between these options depends on the system's risk classification, the downstream consequences of an ungrounded output, and the availability of human reviewers. Human Oversight Mechanisms provides detailed guidance on structuring the human review pathway for flagged outputs.
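The soft-to-hard interventions described above can be sketched as a single policy handler. The function signature, action names, and disclaimer wording are illustrative assumptions, not a required interface:

```python
def handle_failure(output: str, action: str, fallback: str) -> tuple[str, bool]:
    """Apply the grounding-failure policy.

    Returns (response_text, needs_human_review). The three actions mirror
    the soft-to-hard options: disclaim, flag, suppress."""
    if action == "disclaim":
        note = ("This response could not be fully verified "
                "against the knowledge base.")
        return f"{output}\n\n{note}", False   # deliver, but warn the user
    if action == "flag":
        return output, True                   # deliver only after human review
    if action == "suppress":
        return fallback, False                # withhold the output entirely
    raise ValueError(f"unknown failure action: {action}")
```

Routing every grounding failure through one handler makes the deployed behaviour match the policy documented in the AISDP, rather than leaving the choice to individual call sites.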
The AI Governance Lead defines the grounding failure policy based on the system's risk assessment. The Technical SME implements the policy in the inference pipeline, ensuring that failure handling operates consistently across all deployment environments.
The grounding verification layer produces structured evidence that can be audited by both internal governance teams and external conformity assessment bodies. Each inference generates a grounding report that provides a complete chain of evidence from query through retrieval to output verification.
A grounding report contains six elements: the original query; the retrieved documents with their identifiers and relevance scores; the model's raw output; the grounding analysis showing which claims are supported and which are not; the overall grounding score; and the action taken, whether the output was passed, flagged, or suppressed. This structure ensures that any individual output can be fully reconstructed and evaluated during audit.
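The six elements translate naturally into a structured record. The field names below are assumptions for the sketch; the point is that each report serialises cleanly for the logging infrastructure and can be reconstructed at audit:

```python
from dataclasses import dataclass, asdict

@dataclass
class GroundingReport:
    """One record per inference; field names are illustrative."""
    query: str
    retrieved: list[dict]       # [{"doc_id": ..., "relevance": ...}, ...]
    raw_output: str
    claim_analysis: list[dict]  # [{"claim": ..., "supported": bool}, ...]
    grounding_score: float
    action: str                 # "pass" | "flag" | "suppress"

    def to_log_entry(self) -> dict:
        """Serialise for the inference logging pipeline."""
        return asdict(self)
```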
The engineering team stores grounding reports in the logging infrastructure alongside standard inference logs. Aggregate grounding metrics are tracked as post-market monitoring indicators in AISDP Module 12: mean grounding score, proportion of flagged outputs, and proportion of suppressed outputs. These aggregate metrics are monitored for drift over time. A sustained decline in mean grounding score, or an increase in the suppression rate, may indicate knowledge base degradation, model drift, or changes in query patterns that warrant investigation.
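Aggregating per-inference results into the three Module 12 indicators is straightforward; a sketch over a list of logged report summaries (field names assumed to match the logging format above, which is itself illustrative):

```python
def aggregate_metrics(reports: list[dict]) -> dict:
    """Compute the post-market monitoring indicators: mean grounding score,
    proportion of flagged outputs, proportion of suppressed outputs."""
    n = len(reports)
    if n == 0:
        return {"mean_score": None, "flag_rate": None, "suppress_rate": None}
    return {
        "mean_score": sum(r["score"] for r in reports) / n,
        "flag_rate": sum(r["action"] == "flag" for r in reports) / n,
        "suppress_rate": sum(r["action"] == "suppress" for r in reports) / n,
    }
```

Computed over a rolling window (daily or weekly), a sustained fall in `mean_score` or rise in `suppress_rate` is the drift signal described above.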
Where automated grounding verification is not yet implemented, human reviewers can assess the grounding of the system's outputs as a compensating control. This approach is feasible only for low-volume systems where every output is reviewed by a qualified person before delivery to the end user. The reviewer checks each output against the retrieved documents and records whether it is fully grounded, partially grounded, or ungrounded.
The review records are retained as Module 5 evidence, providing the same traceability as automated verification, albeit at significantly higher operational cost. Each review must document the reviewer's identity, the timestamp, the documents consulted, and the resulting grounding classification. This creates an audit trail equivalent to the automated grounding report, suitable for conformity assessment purposes.
For systems producing more than approximately fifty outputs per day, manual grounding verification becomes unsustainable without disproportionate staffing. Organisations relying on manual review should plan a transition to automated verification, with the manual process serving as an interim control during development. The timeline for this transition should be documented in the system's risk management plan and monitored by the AI Governance Lead.
RAG systems face a specific prompt injection attack vector distinct from direct prompt injection: indirect prompt injection through the knowledge base. An adversary who can insert or modify documents in the knowledge base can embed instructions that the GPAI model may follow when the document is retrieved as context. This attack exploits the trust boundary between the retrieval pipeline and the model's context window.
The knowledge base itself is the attack surface. Any pathway through which documents enter the knowledge base is a potential injection vector. These include user-uploaded documents, web-scraped content, partner data feeds, and automated document ingestion pipelines. A malicious document containing hidden instructions, such as directives to override the system's intended behaviour, may be retrieved by the RAG pipeline and processed by the GPAI model as context, producing outputs that violate the system's safety constraints.
The cybersecurity testing programme should include specific test cases for indirect prompt injection through the knowledge base. Red-teaming exercises should attempt to inject malicious documents through each ingestion pathway and verify that the defence layers detect and block the injection. Results are documented in AISDP Module 9 as part of the system's security assessment, establishing that the organisation has tested and mitigated this RAG-specific threat vector.
Four layers of defence protect against indirect prompt injection through the knowledge base, each addressing a different point in the attack chain. Implementing all four layers provides defence in depth; no single layer is sufficient on its own.
Input sanitisation screens documents for injection patterns before they enter the knowledge base. This first layer catches known injection techniques at the point of ingestion, preventing malicious content from reaching the retrieval pipeline. The sanitisation rules should be updated as new injection patterns are discovered through security research and red-teaming exercises.
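A minimal sanitisation screen might look like the following. The patterns listed are a tiny illustrative sample of known injection phrasings; a real deployment maintains a much larger, continuously updated rule set and typically combines pattern matching with model-based detection:

```python
import re

# Illustrative examples only; real rule sets evolve with red-teaming findings.
INJECTION_PATTERNS = [
    r"ignore (all )?(previous|prior) instructions",
    r"you are now",
    r"system prompt",
]

def screen_document(text: str) -> bool:
    """Return True if the document passes screening (no known pattern found)."""
    return not any(re.search(p, text, re.IGNORECASE) for p in INJECTION_PATTERNS)
```

Documents failing the screen would be quarantined for review rather than silently dropped, so that false positives in legitimate content can be recovered.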
Content isolation ensures the GPAI model's system prompt is protected from override by retrieved content. This layer uses the provider's recommended separation mechanisms to maintain a clear boundary between trusted instructions and retrieved context. The goal is to ensure that even if a malicious document is retrieved, it cannot override the model's core behavioural constraints.
Output validation checks the model's output against expected behaviour boundaries regardless of the retrieved context. This layer catches cases where injection has bypassed the earlier defences and influenced the model's output.
Retrieval filtering, the fourth layer, excludes documents that have been flagged as potentially adversarial from the retrieval results entirely, removing them from the candidate set before they can influence the model.
Whether a change to the knowledge base, without any change to the GPAI model or system architecture, constitutes a substantial modification under Article 3(23) is a critical compliance question without a definitive regulatory answer. The following analysis reflects the current interpretive position and the practical approach recommended for high-risk RAG deployments.
The case for treating knowledge base changes as substantial modifications is straightforward: a material change to the knowledge base changes the information available to the model, which changes outputs, which may change compliance with Articles 9 through 15. Updates that introduce new bias, contain incorrect information about a protected group, or remove previously available information may alter the system's fairness, accuracy, and safety profiles.
The case against is equally compelling: knowledge base updates are normal operational activity for any information system, and treating every document addition or removal as a substantial modification would make RAG systems operationally impractical for high-risk applications, because every update would trigger a new conformity assessment.
The practical approach is to define a materiality threshold for knowledge base changes. Changes below the threshold, such as routine document additions within the existing domain, corrections to factual errors, or removal of outdated documents, are treated as normal operational maintenance documented in AISDP Module 12. Changes above the threshold, such as introduction of a new domain, material change to demographic coverage, or significant shifts in source distribution, trigger the governance pipeline's risk gate. The materiality threshold should reference quantitative indicators: knowledge base size change exceeding twenty per cent, document source distribution change exceeding a defined divergence metric, or grounding score shift in sentinel evaluation exceeding a defined tolerance. The threshold is documented in the AISDP and reviewed quarterly.
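Two of the quantitative indicators named above can be sketched as a simple check. The 20 per cent size limit comes from the text; the grounding-shift tolerance of 0.05 is an assumed example value, and a complete implementation would also compare the document source distributions against a defined divergence metric:

```python
def exceeds_materiality(old_size: int, new_size: int,
                        old_mean_grounding: float, new_mean_grounding: float,
                        size_limit: float = 0.20,
                        grounding_tolerance: float = 0.05) -> bool:
    """Return True when a knowledge base change trips either indicator:
    size change above the limit, or a sentinel-evaluation grounding-score
    shift beyond tolerance. Threshold values are illustrative."""
    size_change = abs(new_size - old_size) / old_size
    grounding_shift = abs(new_mean_grounding - old_mean_grounding)
    return size_change > size_limit or grounding_shift > grounding_tolerance
```

A change tripping this check would be routed to the governance pipeline's risk gate rather than logged as routine maintenance.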
Citation verification is the simplest and most auditable approach, but it depends on the model's ability to generate accurate citations. Combining it with entailment checking provides defence in depth, catching hallucinations where the model generates plausible but unsupported paraphrases.
Suppression is the recommended default for high-risk systems where ungrounded outputs could cause harm. The output is withheld entirely and a fallback response is returned. The failure handling policy must be defined in the AISDP before deployment.
Implement four defensive layers: input sanitisation at ingestion, content isolation to protect system prompts, output validation against behaviour boundaries, and retrieval filtering to exclude flagged documents. Red-team each ingestion pathway and document results in AISDP Module 9.
Three approaches in increasing sophistication: citation verification, entailment checking with a separate NLI model, and fact extraction and matching against retrieved documents.
Options range from appending disclaimers to flagging for human review to suppressing the output entirely, with suppression recommended as the default for high-risk systems.
Adversaries can embed instructions in knowledge base documents that the model follows when retrieved as context, exploiting the trust boundary between retrieval and generation.
Organisations define a materiality threshold: routine updates are operational maintenance, while changes exceeding quantitative indicators trigger the governance risk gate.