What is change classification and why does it matter?

It categorises every change as routine, significant, or a substantial modification. The category determines the approval authority, the stringency of governance, and whether conformity re-assessment is triggered.

How is change classification automated?

It is implemented as policy-as-code using Open Policy Agent (OPA) or an equivalent engine. The engine checks the change's characteristics against the classification rules. Its result is a recommendation that the AI System Assessor must confirm.

What pipeline response does each classification trigger?

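Routine: standard gates with Technical SME approval. Significant: full governance gate suite with AI Governance Lead approval. Substantial: full suite plus conformity re-assessment and a new Declaration of Conformity, with joint approval from the AI Governance Lead and the Legal and Regulatory Advisor.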

Can a single indicator make a change a substantial modification?

Yes. A single positive indicator is sufficient to presumptively classify the change as substantial. The AI System Assessor can override the presumption only with documented reasoning that the change does not affect compliance or intended purpose.

What PSI threshold indicates substantial dataset distribution shift?

A PSI above 0.25 against the assessed dataset is a presumptive substantial modification indicator. This is higher than the routine drift threshold of 0.20 used in the engineering pipeline's drift gate.

How are classification calibration exercises conducted?

Multiple assessors independently classify the same set of hypothetical changes, then compare results. Discrepancies are discussed and the classification criteria refined. This maintains consistency across assessors handling different systems.

The governance pipeline must classify every change to determine its regulatory significance. Three categories drive the downstream governance response: routine, significant, and substantial modification. Classification determines which approval authority is required and whether conformity re-assessment is triggered. This page covers the classification framework, substantial modification indicators, and policy-as-code automation.

Abstract

Change classification maps every proposed change to one of three categories that determine the governance response. Routine changes do not affect model behaviour and require only Technical SME approval. Significant changes affect model behaviour but fall below the substantial modification threshold, and require AI Governance Lead approval. Substantial modifications meeting the Article 3(23) criteria trigger conformity re-assessment and require joint approval from the AI Governance Lead and the Legal and Regulatory Advisor. Six indicators are presumptively substantial: model architecture change, intended purpose modification, material dataset distribution shift (PSI above 0.25), fairness tolerance breach, human oversight mechanism modification, and unforeseen deployment context change. Classification can be automated as policy-as-code using Open Policy Agent, though the automated result is only a recommendation that the AI System Assessor confirms. Manual classification is adequate for infrequent changes, with calibration exercises maintaining consistency across assessors. The classification directly determines the approval authority level and is documented as part of the quality management system.

How does the governance pipeline classify changes?

Regulatory Requirement

The governance pipeline must classify every change to determine its regulatory significance under the EU AI Act. Article 3(23) defines a substantial modification as a change to an AI system, made after it has been placed on the market or put into service, that was not foreseen in the initial conformity assessment and that either affects compliance with the Chapter 2 requirements or modifies the intended purpose. The pipeline implements a structured three-category classification.

Routine changes do not affect model behaviour, risk profile, fairness metrics, or intended purpose. Examples include infrastructure patches, logging improvements, and UI adjustments. These proceed through standard pipeline gates with automated evidence generation and require only Technical SME approval.

Significant changes affect model behaviour, fairness metrics, feature set, training data, or performance profile but do not meet the substantial modification threshold. These require the full governance gate suite, enhanced evidence generation, AISDP module updates, and AI Governance Lead approval.

Substantial modifications meet the Article 3(23) criteria: they affect Chapter 2 compliance or modify the intended purpose. These trigger the full governance gate suite, a conformity re-assessment, and a new Declaration of Conformity. They require joint approval from the AI Governance Lead and Legal and Regulatory Advisor.
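The three categories and their pipeline responses can be pictured as a simple lookup table. The following is a minimal sketch, not the pipeline's actual implementation: the names `Classification`, `GovernanceResponse`, `RESPONSES`, and `response_for` are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Classification(Enum):
    ROUTINE = "routine"
    SIGNIFICANT = "significant"
    SUBSTANTIAL = "substantial"


@dataclass(frozen=True)
class GovernanceResponse:
    gates: str                      # which gate suite the pipeline runs
    approvers: tuple[str, ...]      # roles that must all approve
    conformity_reassessment: bool   # triggered only by substantial modifications
    new_declaration: bool           # new Declaration of Conformity required


# Hypothetical table mirroring the three categories described in the text.
RESPONSES = {
    Classification.ROUTINE: GovernanceResponse(
        gates="standard",
        approvers=("Technical SME",),
        conformity_reassessment=False,
        new_declaration=False,
    ),
    Classification.SIGNIFICANT: GovernanceResponse(
        gates="full governance suite",
        approvers=("AI Governance Lead",),
        conformity_reassessment=False,
        new_declaration=False,
    ),
    Classification.SUBSTANTIAL: GovernanceResponse(
        gates="full governance suite",
        approvers=("AI Governance Lead", "Legal and Regulatory Advisor"),
        conformity_reassessment=True,
        new_declaration=True,
    ),
}


def response_for(classification: Classification) -> GovernanceResponse:
    """Return the governance response required for a classification."""
    return RESPONSES[classification]
```

Keeping the mapping as data rather than branching logic makes it easy to review against the written policy and to version alongside it.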

What indicators trigger a substantial modification classification?

Engineering Approach

The following six changes are presumptively substantial modifications unless the AI System Assessor documents a reasoned determination otherwise: a change to the model architecture, such as replacing one algorithm with another; a change to the intended purpose or target population; retraining on a dataset whose distribution differs materially from the assessed dataset, measured by a Population Stability Index (PSI) above 0.25; a fairness metric shift exceeding the declared tolerance for any protected characteristic subgroup; removal or material modification of a human oversight mechanism; and a change to the system's deployment context that was not foreseen in the initial conformity assessment.
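The PSI indicator is the one item in this list that reduces to arithmetic, so a worked sketch may help. It assumes both datasets have already been binned into the same buckets and expressed as proportions, and it uses the common PSI formula, the sum over bins of (actual - expected) * ln(actual / expected). The function name `psi`, the `eps` floor for empty bins, and the example distributions are illustrative assumptions; the 0.25 and 0.20 thresholds come from this page.

```python
import math
from typing import Sequence

SUBSTANTIAL_PSI = 0.25  # presumptive substantial modification indicator
DRIFT_GATE_PSI = 0.20   # routine drift gate in the engineering pipeline


def psi(expected: Sequence[float], actual: Sequence[float], eps: float = 1e-6) -> float:
    """Population Stability Index between two binned distributions.

    Both arguments are per-bin proportions over the same bins. The eps floor
    (an assumption of this sketch) avoids log(0) for empty bins.
    """
    if len(expected) != len(actual):
        raise ValueError("expected and actual must use the same bins")
    total = 0.0
    for e, a in zip(expected, actual):
        e = max(e, eps)
        a = max(a, eps)
        total += (a - e) * math.log(a / e)
    return total


assessed = [0.25, 0.25, 0.25, 0.25]      # distribution at conformity assessment
retrain_a = [0.10, 0.20, 0.30, 0.40]     # hypothetical retraining dataset
retrain_b = [0.05, 0.15, 0.30, 0.50]     # hypothetical, more heavily shifted

score_a = psi(assessed, retrain_a)       # roughly 0.228
score_b = psi(assessed, retrain_b)

# score_a trips the drift gate but stays below the substantial indicator;
# score_b exceeds 0.25 and would be flagged as presumptively substantial.
flag_a = score_a > SUBSTANTIAL_PSI
flag_b = score_b > SUBSTANTIAL_PSI
```

The first example shows why the two thresholds differ: a shift can be large enough to fail the engineering drift gate without reaching the presumptive substantial modification level.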

The Technical SME implements change classification as an automated pre-screening step in the governance pipeline. The classifier analyses the change's scope (which files, configurations, and artefacts are modified), computes the metrics delta from the evaluation stage, and applies the classification rules defined by the AI Governance Lead. The automated classification is a recommendation; the AI System Assessor reviews and confirms it before the pipeline proceeds.

The classification logic should be codified as policy-as-code using Open Policy Agent and evaluated automatically at the start of each pipeline execution. The policy examines the change's characteristics and returns the classification, the indicators that triggered it, whether re-assessment is required, and the approval authority.
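In practice such a policy would be written in OPA's own policy language, Rego. The Python sketch below only mirrors the decision logic so the four outputs are concrete. The `Change` and `Decision` types and all field names are hypothetical, and the single `affects_model_behaviour` flag is a simplification standing in for the fuller list of significant-change criteria.

```python
from dataclasses import dataclass, field


@dataclass
class Change:
    """Hypothetical change descriptor; field names are illustrative only."""
    affects_model_behaviour: bool = False
    architecture_changed: bool = False
    intended_purpose_changed: bool = False
    psi_vs_assessed_dataset: float = 0.0
    fairness_tolerance_breached: bool = False
    human_oversight_modified: bool = False
    unforeseen_deployment_context: bool = False


@dataclass
class Decision:
    classification: str
    indicators: list[str] = field(default_factory=list)
    reassessment_required: bool = False
    approval_authority: str = ""


def classify(change: Change, psi_threshold: float = 0.25) -> Decision:
    """Return a recommended classification; the Assessor still confirms it."""
    checks = {
        "model_architecture_change": change.architecture_changed,
        "intended_purpose_modification": change.intended_purpose_changed,
        "dataset_distribution_shift": change.psi_vs_assessed_dataset > psi_threshold,
        "fairness_tolerance_breach": change.fairness_tolerance_breached,
        "human_oversight_modification": change.human_oversight_modified,
        "unforeseen_deployment_context": change.unforeseen_deployment_context,
    }
    indicators = [name for name, hit in checks.items() if hit]

    # A single positive indicator is enough for a presumptive "substantial".
    if indicators:
        return Decision(
            "substantial",
            indicators,
            True,
            "AI Governance Lead and Legal and Regulatory Advisor (joint)",
        )
    if change.affects_model_behaviour:
        return Decision("significant", [], False, "AI Governance Lead")
    return Decision("routine", [], False, "Technical SME")


recommendation = classify(Change(affects_model_behaviour=True, psi_vs_assessed_dataset=0.31))
```

Returning the triggering indicators alongside the category gives the AI System Assessor what is needed to confirm the recommendation or to document a reasoned override.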

What is the procedural alternative for change classification?

Compensating Controls

Without policy-as-code automation, the AI System Assessor classifies each change manually before deployment. The Assessor completes a Change Classification Form recording the change description, the artefacts modified, whether the change affects model behaviour, whether it affects the intended purpose, and the classification determination with supporting reasoning. The AI Governance Lead reviews and approves the classification.
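Even when classification is manual, the form can be held as a structured record so that incomplete entries are caught before review. The sketch below is one possible shape, assuming the fields listed above; the class name `ChangeClassificationForm`, its field names, and the `problems` check are hypothetical.

```python
from dataclasses import dataclass

ALLOWED_CLASSIFICATIONS = ("routine", "significant", "substantial")


@dataclass
class ChangeClassificationForm:
    change_description: str
    artefacts_modified: list[str]
    affects_model_behaviour: bool
    affects_intended_purpose: bool
    classification: str
    reasoning: str
    approved_by_governance_lead: bool = False  # set after review

    def problems(self) -> list[str]:
        """List reasons the form is not yet ready for deployment."""
        issues = []
        if not self.change_description.strip():
            issues.append("change description is missing")
        if self.classification not in ALLOWED_CLASSIFICATIONS:
            issues.append("classification must be routine, significant, or substantial")
        if not self.reasoning.strip():
            issues.append("supporting reasoning is missing")
        if not self.approved_by_governance_lead:
            issues.append("AI Governance Lead approval is pending")
        return issues


form = ChangeClassificationForm(
    change_description="Upgrade logging library",
    artefacts_modified=["requirements.txt"],
    affects_model_behaviour=False,
    affects_intended_purpose=False,
    classification="routine",
    reasoning="",
)
outstanding = form.problems()
```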

Manual classification is adequate for systems that change infrequently, roughly quarterly or less often. The main risk is inconsistent classification across different assessors; the AI Governance Lead mitigates this through calibration exercises that use historical change examples to align assessor judgement.
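The outcome of a calibration exercise can be summarised by listing the changes on which assessors disagreed and the share on which they agreed. The following is a minimal sketch under the assumption that each assessor's classification is recorded per change; the function name `calibration_report`, the input layout, and the sample data are illustrative only.

```python
from collections import Counter


def calibration_report(votes: dict[str, dict[str, str]]) -> dict[str, object]:
    """Summarise a calibration exercise.

    votes maps a change id to {assessor name: classification}.
    """
    discrepancies = {}
    for change_id, by_assessor in votes.items():
        counts = Counter(by_assessor.values())
        if len(counts) > 1:  # assessors did not all choose the same category
            discrepancies[change_id] = dict(counts)
    agreed = len(votes) - len(discrepancies)
    rate = agreed / len(votes) if votes else 1.0
    return {"agreement_rate": rate, "discrepancies": discrepancies}


# Hypothetical exercise: three assessors, three example changes.
report = calibration_report({
    "CHG-1": {"assessor_a": "routine", "assessor_b": "routine", "assessor_c": "routine"},
    "CHG-2": {"assessor_a": "significant", "assessor_b": "substantial", "assessor_c": "significant"},
    "CHG-3": {"assessor_a": "substantial", "assessor_b": "substantial", "assessor_c": "substantial"},
})
```

The discrepancy list is the input to the discussion step: each disagreement points to a criterion that may need refining.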

Change Classification and Substantial Modification Detection in CI/CD
