Hallucination in retrieval-augmented generation systems produces ungrounded outputs that directly engage the accuracy, transparency, and safety requirements of the EU AI Act. Grounding verification provides the technical control to detect and manage this risk at inference time.
Hallucination, the generation of outputs not supported by retrieved context, represents the central compliance risk for retrieval-augmented generation systems. A RAG system that hallucinates produces ungrounded outputs that may be incorrect, misleading, or harmful to the people affected by the system's decisions. For high-risk applications governed by the EU AI Act, hallucination is not merely an accuracy problem; it is a safety and fundamental rights risk that directly engages the requirements of Articles 9, 13, and 15.
The regulatory framework treats output accuracy as a core compliance obligation. RAG Pipeline Compliance establishes the broader context for retrieval-augmented systems, where the combination of a general-purpose AI model with an organisation's knowledge base creates a composite system subject to the full range of high-risk requirements. When that system produces outputs containing claims not grounded in the retrieved documents, it fails the accuracy requirements that underpin conformity assessment.
Grounding verification provides the technical mechanism to detect and manage hallucination at inference time, transforming an intractable model behaviour problem into a measurable, auditable compliance control. Without this layer, organisations deploying RAG systems in high-risk contexts have no systematic means to demonstrate that the system's outputs are traceable to authoritative source material.
The grounding verification layer checks whether each claim in the GPAI model's output is supported by the retrieved documents, producing a grounding score for every inference. This layer operates at inference time, sitting between the model's raw output and the response delivered to the user or downstream system. The verification produces a quantitative grounding score that determines whether an output is released, flagged for review, or suppressed.
Outputs that fall below the grounding threshold follow a defined escalation path. Depending on the system's risk profile and the severity of potential harm, flagged outputs may be held for human review, annotated with a disclaimer, or suppressed entirely. The AI Governance Lead defines the grounding policy, including the threshold values and escalation actions, while the Technical SME implements the verification pipeline.
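As an illustration, the threshold-and-escalation logic can be sketched as follows. The function name, two-threshold design, and numeric values are assumptions for the example, not values prescribed by the Act or by any grounding policy:

```python
from dataclasses import dataclass

@dataclass
class GroundingDecision:
    score: float   # fraction of output claims supported by retrieved documents
    action: str    # "pass", "flag", or "suppress"

def grounding_gate(score: float,
                   pass_threshold: float = 0.9,
                   suppress_threshold: float = 0.5) -> GroundingDecision:
    """Map a grounding score to an escalation action (illustrative thresholds)."""
    if score >= pass_threshold:
        return GroundingDecision(score, "pass")      # release the output
    if score >= suppress_threshold:
        return GroundingDecision(score, "flag")      # hold for human review
    return GroundingDecision(score, "suppress")      # withhold, return fallback
```

In practice the two thresholds would be set by the AI Governance Lead from the system's risk assessment and recorded in the grounding policy.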
The grounding verification results are logged for every inference and documented in the AI System Description Profile Module 5 as part of the system's accuracy evaluation. This creates a continuous record linking each output to its source evidence, supporting both operational monitoring and regulatory audit. Post-Market Monitoring Metrics covers how grounding metrics feed into broader post-market surveillance.
Three approaches are available for grounding verification, listed in increasing order of sophistication and implementation complexity. Each approach trades off auditability against detection coverage, and organisations may combine approaches for defence in depth.
Citation verification is the simplest and most auditable approach. The system requires the GPAI model to cite specific passages from the retrieved documents, and the verification layer checks that the cited passages exist and support the claim. This method produces transparent, easily reviewed evidence: auditors can trace each output claim to the exact source passage. The limitation is that it depends on the model's ability to generate accurate citations, which varies across providers and prompt configurations.
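A minimal sketch of citation verification might look like the following. The citation syntax (square-bracketed document identifiers) and the scoring rule are assumptions for the example; a production implementation would also check that the cited passage actually supports the claim, not merely that it exists:

```python
import re

def verify_citations(output: str, documents: dict[str, str]) -> float:
    """Return the fraction of citation markers [doc_id] in the output that
    resolve to a document actually present in the retrieved set."""
    cited = re.findall(r"\[([\w\-]+)\]", output)
    if not cited:
        return 0.0  # an answer with no citations cannot be verified
    valid = sum(1 for doc_id in cited if doc_id in documents)
    return valid / len(cited)
```

An output citing only retrieved documents scores 1.0; citations to documents outside the retrieved set, or an absence of citations, lower the score and would trip the grounding threshold.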
Entailment checking uses a separate natural language inference model to evaluate whether each claim in the output is entailed by the retrieved context. This approach operates independently of the GPAI model's citation behaviour, providing a second-opinion verification. The NLI model assesses the logical relationship between output claims and source passages, classifying each as entailed, contradicted, or neutral. This approach catches hallucinations that citation verification misses, particularly when the model generates plausible but unsupported paraphrases.
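The entailment check can be sketched independently of any particular NLI model by treating the model as a pluggable function. The `nli` callable below is a stand-in assumption for a real NLI classifier returning "entailment", "contradiction", or "neutral" for a (premise, hypothesis) pair:

```python
from typing import Callable

def entailment_score(claims: list[str],
                     context: str,
                     nli: Callable[[str, str], str]) -> float:
    """Fraction of output claims that the NLI model judges entailed by the
    retrieved context; 0.0 when there are no extractable claims."""
    if not claims:
        return 0.0
    entailed = sum(1 for claim in claims if nli(context, claim) == "entailment")
    return entailed / len(claims)
```

Because the NLI model operates on the claim text itself rather than on citations, this score catches plausible but unsupported paraphrases that citation checks miss.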
Fact extraction and matching is the most sophisticated approach. The system extracts discrete factual claims from the output and matches them against facts extracted from the retrieved documents. This method provides the most granular analysis, identifying exactly which claims lack source support. The trade-off is implementation complexity: maintaining robust fact extraction pipelines across diverse domains requires significant engineering investment.
The system's response to a grounding failure must be defined in the AI System Description Profile and implemented in the inference pipeline before deployment. The failure handling policy is a compliance artefact, not merely an engineering decision, because it determines how the system behaves when it cannot demonstrate output accuracy.
Options for handling grounding failures range from soft to hard interventions. At the softest end, the system appends a disclaimer to the output: "This response could not be fully verified against the knowledge base." This alerts users but still delivers the potentially ungrounded content. A middle option flags the output for human review before delivery, introducing a delay but maintaining a human-in-the-loop safeguard. At the hardest end, the system suppresses the output entirely and returns a fallback response.
For high-risk systems where ungrounded outputs could cause harm, suppression is the safer default. The choice between these options depends on the system's risk classification, the downstream consequences of an ungrounded output, and the availability of human reviewers. Human Oversight Mechanisms provides detailed guidance on structuring the human review pathway for flagged outputs.
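The soft-to-hard interventions described above can be sketched as a single policy handler. The function signature, action names, and disclaimer wording are illustrative assumptions, not a required interface:

```python
def handle_failure(output: str, action: str, fallback: str) -> tuple[str, bool]:
    """Apply the grounding-failure policy.

    Returns (response_text, needs_human_review). The three actions mirror
    the soft-to-hard options: disclaim, flag, suppress."""
    if action == "disclaim":
        note = ("This response could not be fully verified "
                "against the knowledge base.")
        return f"{output}\n\n{note}", False   # deliver, but warn the user
    if action == "flag":
        return output, True                   # deliver only after human review
    if action == "suppress":
        return fallback, False                # withhold the output entirely
    raise ValueError(f"unknown failure action: {action}")
```

Routing every grounding failure through one handler makes the deployed behaviour match the policy documented in the AISDP, rather than leaving the choice to individual call sites.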
The AI Governance Lead defines the grounding failure policy based on the system's risk assessment. The Technical SME implements the policy in the inference pipeline, ensuring that failure handling operates consistently across all deployment environments.
The grounding verification layer produces structured evidence that can be audited by both internal governance teams and external conformity assessment bodies. Each inference generates a grounding report that provides a complete chain of evidence from query through retrieval to output verification.
A grounding report contains six elements: the original query; the retrieved documents with their identifiers and relevance scores; the model's raw output; the grounding analysis showing which claims are supported and which are not; the overall grounding score; and the action taken, whether the output was passed, flagged, or suppressed. This structure ensures that any individual output can be fully reconstructed and evaluated during audit.
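The six elements translate naturally into a structured record. The field names below are assumptions for the sketch; the point is that each report serialises cleanly for the logging infrastructure and can be reconstructed at audit:

```python
from dataclasses import dataclass, asdict

@dataclass
class GroundingReport:
    """One record per inference; field names are illustrative."""
    query: str
    retrieved: list[dict]       # [{"doc_id": ..., "relevance": ...}, ...]
    raw_output: str
    claim_analysis: list[dict]  # [{"claim": ..., "supported": bool}, ...]
    grounding_score: float
    action: str                 # "pass" | "flag" | "suppress"

    def to_log_entry(self) -> dict:
        """Serialise for the inference logging pipeline."""
        return asdict(self)
```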
The engineering team stores grounding reports in the logging infrastructure alongside standard inference logs. Aggregate grounding metrics are tracked as post-market monitoring indicators in AISDP Module 12: mean grounding score, proportion of flagged outputs, and proportion of suppressed outputs. These aggregate metrics are monitored for drift over time. A sustained decline in mean grounding score, or an increase in the suppression rate, may indicate knowledge base degradation, model drift, or changes in query patterns that warrant investigation.
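Aggregating per-inference results into the three Module 12 indicators is straightforward; a sketch over a list of logged report summaries (field names assumed to match the logging format above, which is itself illustrative):

```python
def aggregate_metrics(reports: list[dict]) -> dict:
    """Compute the post-market monitoring indicators: mean grounding score,
    proportion of flagged outputs, proportion of suppressed outputs."""
    n = len(reports)
    if n == 0:
        return {"mean_score": None, "flag_rate": None, "suppress_rate": None}
    return {
        "mean_score": sum(r["score"] for r in reports) / n,
        "flag_rate": sum(r["action"] == "flag" for r in reports) / n,
        "suppress_rate": sum(r["action"] == "suppress" for r in reports) / n,
    }
```

Computed over a rolling window (daily or weekly), a sustained fall in `mean_score` or rise in `suppress_rate` is the drift signal described above.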
Where automated grounding verification is not yet implemented, human reviewers can assess the grounding of the system's outputs as a compensating control. This approach is feasible only for low-volume systems where every output is reviewed by a qualified person before delivery to the end user. The reviewer checks each output against the retrieved documents and records whether it is fully grounded, partially grounded, or ungrounded.
The review records are retained as Module 5 evidence, providing the same traceability as automated verification, albeit at significantly higher operational cost. Each review must document the reviewer's identity, the timestamp, the documents consulted, and the resulting grounding classification. This creates an audit trail equivalent to the automated grounding report, suitable for conformity assessment purposes.
For systems producing more than approximately fifty outputs per day, manual grounding verification becomes unsustainable without disproportionate staffing. Organisations relying on manual review should plan a transition to automated verification, with the manual process serving as an interim control during development. The timeline for this transition should be documented in the system's risk management plan and monitored by the AI Governance Lead.
RAG systems face a specific prompt injection attack vector distinct from direct prompt injection: indirect prompt injection through the knowledge base. An adversary who can insert or modify documents in the knowledge base can embed instructions that the GPAI model may follow when the document is retrieved as context. This attack exploits the trust boundary between the retrieval pipeline and the model's context window.
The knowledge base itself is the attack surface. Any pathway through which documents enter the knowledge base is a potential injection vector. These include user-uploaded documents, web-scraped content, partner data feeds, and automated document ingestion pipelines. A malicious document containing hidden instructions, such as directives to override the system's intended behaviour, may be retrieved by the RAG pipeline and processed by the GPAI model as context, producing outputs that violate the system's safety constraints.
The cybersecurity testing programme should include specific test cases for indirect prompt injection through the knowledge base. Red-teaming exercises should attempt to inject malicious documents through each ingestion pathway and verify that the defence layers detect and block the injection. Results are documented in AISDP Module 9 as part of the system's security assessment, establishing that the organisation has tested and mitigated this RAG-specific threat vector.
Four layers of defence protect against indirect prompt injection through the knowledge base, each addressing a different point in the attack chain. Implementing all four layers provides defence in depth; no single layer is sufficient on its own.
Input sanitisation screens documents for injection patterns before they enter the knowledge base. This first layer catches known injection techniques at the point of ingestion, preventing malicious content from reaching the retrieval pipeline. The sanitisation rules should be updated as new injection patterns are discovered through security research and red-teaming exercises.
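A minimal sanitisation screen might look like the following. The patterns listed are a tiny illustrative sample of known injection phrasings; a real deployment maintains a much larger, continuously updated rule set and typically combines pattern matching with model-based detection:

```python
import re

# Illustrative examples only; real rule sets evolve with red-teaming findings.
INJECTION_PATTERNS = [
    r"ignore (all )?(previous|prior) instructions",
    r"you are now",
    r"system prompt",
]

def screen_document(text: str) -> bool:
    """Return True if the document passes screening (no known pattern found)."""
    return not any(re.search(p, text, re.IGNORECASE) for p in INJECTION_PATTERNS)
```

Documents failing the screen would be quarantined for review rather than silently dropped, so that false positives in legitimate content can be recovered.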
Content isolation ensures the GPAI model's system prompt is protected from override by retrieved content. This layer uses the provider's recommended separation mechanisms to maintain a clear boundary between trusted instructions and retrieved context. The goal is to ensure that even if a malicious document is retrieved, it cannot override the model's core behavioural constraints.
Output validation checks the model's output against expected behaviour boundaries regardless of the retrieved context. This layer catches cases where injection has bypassed the earlier defences and influenced the model's output.
Retrieval filtering, the fourth layer, excludes documents that have been flagged as potentially adversarial from the retrieval results entirely, removing them from the candidate set before they can influence the model.
Whether a change to the knowledge base, without any change to the GPAI model or system architecture, constitutes a substantial modification under Article 3(23) is a critical compliance question without a definitive regulatory answer. The following analysis reflects the current interpretive position and the practical approach recommended for high-risk RAG deployments.
The case for treating knowledge base changes as substantial modifications is straightforward: a material change to the knowledge base changes the information available to the model, which changes outputs, which may change compliance with Articles 9 through 15. Updates that introduce new bias, contain incorrect information about a protected group, or remove previously available information may alter the system's fairness, accuracy, and safety profiles.
The case against is equally compelling: knowledge base updates are normal operational activity for any information system, and treating every document addition or removal as a substantial modification would make RAG systems operationally impractical for high-risk applications, because every update would trigger a new conformity assessment.
The practical approach is to define a materiality threshold for knowledge base changes. Changes below the threshold, such as routine document additions within the existing domain, corrections to factual errors, or removal of outdated documents, are treated as normal operational maintenance documented in AISDP Module 12. Changes above the threshold, such as introduction of a new domain, material change to demographic coverage, or significant shifts in source distribution, trigger the governance pipeline's risk gate. The materiality threshold should reference quantitative indicators: knowledge base size change exceeding twenty per cent, document source distribution change exceeding a defined divergence metric, or grounding score shift in sentinel evaluation exceeding a defined tolerance. The threshold is documented in the AISDP and reviewed quarterly.
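Two of the quantitative indicators named above can be sketched as a simple check. The 20 per cent size limit comes from the text; the grounding-shift tolerance of 0.05 is an assumed example value, and a complete implementation would also compare the document source distributions against a defined divergence metric:

```python
def exceeds_materiality(old_size: int, new_size: int,
                        old_mean_grounding: float, new_mean_grounding: float,
                        size_limit: float = 0.20,
                        grounding_tolerance: float = 0.05) -> bool:
    """Return True when a knowledge base change trips either indicator:
    size change above the limit, or a sentinel-evaluation grounding-score
    shift beyond tolerance. Threshold values are illustrative."""
    size_change = abs(new_size - old_size) / old_size
    grounding_shift = abs(new_mean_grounding - old_mean_grounding)
    return size_change > size_limit or grounding_shift > grounding_tolerance
```

A change tripping this check would be routed to the governance pipeline's risk gate rather than logged as routine maintenance.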
Citation verification is the simplest and most auditable approach, but it depends on the model's ability to generate accurate citations. Combining it with entailment checking provides defence in depth, catching hallucinations where the model generates plausible but unsupported paraphrases.
Suppression is the recommended default for high-risk systems where ungrounded outputs could cause harm. The output is withheld entirely and a fallback response is returned. The failure handling policy must be defined in the AISDP before deployment.
Implement four defensive layers: input sanitisation at ingestion, content isolation to protect system prompts, output validation against behaviour boundaries, and retrieval filtering to exclude flagged documents. Red-team each ingestion pathway and document results in AISDP Module 9.
Three approaches in increasing sophistication: citation verification, entailment checking with a separate NLI model, and fact extraction and matching against retrieved documents.
Options range from appending disclaimers to flagging for human review to suppressing the output entirely, with suppression recommended as the default for high-risk systems.
Adversaries can embed instructions in knowledge base documents that the model follows when retrieved as context, exploiting the trust boundary between retrieval and generation.
Organisations define a materiality threshold: routine updates are operational maintenance, while changes exceeding quantitative indicators trigger the governance risk gate.