Multi-model AI systems require cascading change management through dependency graphs, composite version manifests for ten-year reproducibility, stage-by-stage fairness evaluation, and three-phase conformity assessment. Automated cascade-aware testing in the CI/CD pipeline becomes essential for systems with five or more components.
A change to any component in a multi-model system can affect the outputs of every downstream component and, through them, the system's aggregate behaviour. The Technical SME maintains a cascade map: a directed graph showing which components consume the outputs of which other components. When a component changes, the cascade map identifies every downstream component that may be affected.
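The cascade map described above can be sketched as a plain adjacency list with a breadth-first traversal to find every affected downstream component. This is a minimal illustration, not the organisation's actual tooling; the component names are invented for the example.

```python
from collections import deque

# Hypothetical cascade map: each key is a component, each value lists the
# components that consume its outputs. Names are illustrative only.
CASCADE_MAP = {
    "ocr-model": ["entity-extractor"],
    "entity-extractor": ["risk-scorer"],
    "risk-scorer": ["decision-aggregator"],
    "decision-aggregator": [],
}

def downstream_of(component, cascade_map):
    """Return every component reachable downstream of `component`,
    i.e. every component that may be affected when it changes."""
    affected = set()
    queue = deque(cascade_map.get(component, []))
    while queue:
        node = queue.popleft()
        if node not in affected:
            affected.add(node)
            queue.extend(cascade_map.get(node, []))
    return affected
```

For instance, `downstream_of("ocr-model", CASCADE_MAP)` returns the full downstream closure, which is exactly the set of components the Technical SME must consider for re-evaluation.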
When a component is updated, retrained, reconfigured, or replaced, the Technical SME assesses the impact on all downstream components using the cascade map. The assessment determines three things: whether downstream components need re-evaluation because their inputs have changed, whether the aggregate evaluation thresholds still hold when the changed component's outputs flow through the pipeline, and whether the change triggers the substantial modification criteria under Article 3(23).
The governance pipeline implements cascade testing as a pipeline stage. When a component changes, the pipeline re-evaluates not only the changed component but every downstream component and the aggregate system. The pipeline fails if any component or the aggregate falls below its documented threshold. The cascade map is maintained as a machine-readable configuration file in YAML or JSON, version-controlled alongside the pipeline definition. Tools such as DVC for data and model lineage and MLflow for experiment tracking provide the infrastructure for tracking component relationships.
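A machine-readable cascade map of the kind described might take the following shape in YAML. The schema, component names, suite paths, and thresholds here are all hypothetical, shown only to indicate what such a version-controlled file could record.

```yaml
# Hypothetical cascade-map schema; names and thresholds are illustrative.
components:
  ocr-model:
    downstream: [entity-extractor]
    evaluation_suite: eval/ocr_suite.py
    thresholds: {character_accuracy: 0.98}
  entity-extractor:
    downstream: [risk-scorer]
    evaluation_suite: eval/entity_suite.py
    thresholds: {f1: 0.90}
aggregate:
  evaluation_suite: eval/end_to_end_suite.py
  thresholds: {end_to_end_accuracy: 0.92}
```

Because the file sits under version control next to the pipeline definition, a change to the system's topology is reviewed and audited the same way as a code change.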
The composite version identifier gains critical importance in multi-model systems because the system's behaviour at any point in time is determined by the specific combination of component versions deployed. The composite version must capture every component version, the configuration version, and the pipeline version.
Each deployment produces a composite version manifest listing every component, its version, its model registry reference, and the configuration applied. The manifest is stored in the governance artefact registry and referenced in AISDP Module 10.
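One way to make the composite version identifier deterministic is to derive it from the manifest contents, so that any change to any component, configuration, or pipeline version yields a new identifier. The scheme below is a sketch under that assumption, not a prescribed format.

```python
import hashlib
import json

def composite_version(components, config_version, pipeline_version):
    """Derive a deterministic composite version identifier from the
    component, configuration, and pipeline versions (hypothetical scheme:
    short SHA-256 digest of the canonicalised manifest)."""
    manifest = {
        "components": components,          # e.g. {"ocr-model": "1.3.2"}
        "config_version": config_version,
        "pipeline_version": pipeline_version,
    }
    canonical = json.dumps(manifest, sort_keys=True)
    return hashlib.sha256(canonical.encode()).hexdigest()[:12]
```

Two deployments with identical component, configuration, and pipeline versions produce the same identifier; changing any single version produces a different one.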
The Technical SME must be able to reproduce the system's behaviour at any historical point from the composite version manifest. This requires that every component version, every dataset version, and every configuration version referenced in the manifest is retrievable from the organisation's artefact stores. The ten-year retention obligation under Article 18 applies to the complete set of artefacts referenced by the manifest, not merely the manifest itself.
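The retrievability requirement can be checked mechanically: walk every reference in the manifest and confirm the artefact store can resolve it. The sketch below uses a toy in-memory store and an assumed manifest layout (`component_refs`, `dataset_refs`, `config_ref`); a real implementation would query the organisation's actual registries.

```python
class ArtefactStore:
    """Toy in-memory stand-in for the organisation's artefact stores."""
    def __init__(self, refs):
        self._refs = set(refs)

    def exists(self, ref):
        return ref in self._refs

def missing_artefacts(manifest, store):
    """Return every reference in the manifest that the store cannot
    resolve. An empty list means the deployment's behaviour can be
    reproduced from retained artefacts."""
    refs = (list(manifest["component_refs"].values())
            + list(manifest["dataset_refs"].values())
            + [manifest["config_ref"]])
    return [r for r in refs if not store.exists(r)]
```

Running this check on a schedule over the retention period surfaces missing artefacts while they can still be recovered, rather than at the moment a regulator asks for a reproduction.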
The conformity assessment must evaluate the composite system, not merely its components. A system that demonstrates per-component compliance but cannot demonstrate aggregate compliance fails the assessment.
The Conformity Assessment Coordinator structures the assessment in three phases. Phase 1 verifies each component against its individual requirements covering training data documentation, performance metrics, and fairness evaluation. Phase 2 verifies the integration covering data flows between components, version alignment, and cascade testing results. Phase 3 verifies the aggregate system covering end-to-end performance, disaggregated fairness, risk management, and human oversight effectiveness.
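The three phases above form an ordered gate: a failure in an earlier phase is located before later phases are credited. A minimal sketch, with check names paraphrased from the text and the data structure invented for illustration:

```python
# Hypothetical encoding of the three assessment phases as ordered checklists.
PHASES = {
    "phase_1_components": [
        "training_data_docs", "performance_metrics", "fairness_evaluation",
    ],
    "phase_2_integration": [
        "data_flows", "version_alignment", "cascade_testing",
    ],
    "phase_3_aggregate": [
        "end_to_end_performance", "disaggregated_fairness",
        "risk_management", "human_oversight",
    ],
}

def first_failing_phase(results):
    """Given a mapping of check name to pass/fail, return the earliest
    phase containing a failed (or missing) check, or None if all pass."""
    for phase, checks in PHASES.items():
        if not all(results.get(check, False) for check in checks):
            return phase
    return None
```

Locating the earliest failing phase tells the coordinator whether remediation belongs to a component, to an integration point, or to the aggregate system.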
Non-conformities identified at any phase are logged in the non-conformity register with the specific component or interaction responsible. Remediation may require changes to a single component, to the interaction between components, or to the system's aggregate architecture.
Ensemble-level monitoring in the PMM programme should track not only aggregate metrics but per-component contribution metrics: how much each component contributes to the aggregate output and whether that contribution is shifting over time. A component whose contribution decreases may be masking a degradation. A component whose contribution increases disproportionately may be introducing bias. Monitoring contribution metrics provides early warning of interaction effects before they manifest in aggregate performance.
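Contribution-shift detection can be as simple as comparing each component's share of the aggregate output between a baseline window and the current window and flagging shifts beyond a tolerance. The function below is a sketch; the shares, windows, and tolerance value are assumptions, and a production monitor would also apply statistical tests.

```python
def contribution_drift(baseline, current, tolerance=0.05):
    """Compare per-component contribution shares between a baseline window
    and the current window. Returns the signed shift for every component
    whose share moved by more than `tolerance` (hypothetical threshold)."""
    return {
        component: current[component] - baseline[component]
        for component in baseline
        if abs(current[component] - baseline[component]) > tolerance
    }
```

A negative shift flags a component that may be masking a degradation; a positive shift flags one whose growing influence warrants a bias review.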
For systems with two to three model components, composite governance can be managed manually. The component registry is a spreadsheet. The cascade map is a documented diagram. Change impact assessment is conducted through manual review. Cascade testing is executed manually by running downstream evaluation suites after each component change. For systems with five or more components, manual governance becomes unsustainable, and automated cascade testing with component contribution monitoring becomes necessary.