Post-market monitoring (PMM) alerts follow a three-tier severity framework: informational, warning, and critical. Escalation paths must be documented and rehearsed, and must account for out-of-hours scenarios and multi-jurisdiction incidents. Threshold calibration balances sensitivity against operational burden.
Three severity tiers structure the response to monitoring alerts, each with distinct notification requirements and action expectations. Informational alerts indicate a metric has shifted but remains within the established tolerance band. These are logged and reviewed at the next scheduled PMM review meeting with no immediate action required.
Warning alerts indicate a metric has breached its warning threshold, typically set to catch drift before the compliance threshold is reached. The Technical SME reviews the alert within five working days and initiates root cause analysis. If the cause is identified and benign, such as a known seasonal pattern, the alert is documented and closed. If the cause is unclear or concerning, the alert is escalated to the AI Governance Lead.
Critical alerts indicate a metric has breached its compliance threshold, a fundamental rights concern has been identified, or multiple warning-level alerts have occurred simultaneously or in rapid succession. Immediate investigation is initiated and the AI Governance Lead is notified within 24 hours. If the breach indicates potential harm, the break-glass procedure is considered. The serious incident reporting process is assessed for applicability.
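As an illustration of how these tiers might be encoded, the sketch below maps a metric reading to a severity; the schema, field names, and the two-warning trigger count are assumptions for illustration, not part of the framework itself.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    INFORMATIONAL = "informational"
    WARNING = "warning"
    CRITICAL = "critical"


@dataclass
class MetricReading:
    """A single monitored metric with its calibrated thresholds (illustrative)."""
    name: str
    value: float
    baseline: float
    warning_threshold: float      # allowed deviation before a warning fires
    compliance_threshold: float   # allowed deviation before a critical fires


def classify_alert(reading: MetricReading,
                   fundamental_rights_concern: bool = False,
                   concurrent_warnings: int = 0) -> Severity:
    """Map a metric reading to a severity tier per the framework above."""
    deviation = abs(reading.value - reading.baseline)
    # Critical: compliance threshold breached, a fundamental rights
    # concern identified, or multiple warning-level alerts active at once.
    if (deviation >= reading.compliance_threshold
            or fundamental_rights_concern
            or concurrent_warnings >= 2):
        return Severity.CRITICAL
    if deviation >= reading.warning_threshold:
        return Severity.WARNING
    return Severity.INFORMATIONAL
```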
The AI Governance Lead documents and rehearses the escalation path, ensuring it is accessible to every person who may need to initiate it. For each severity tier, the path defines who is notified, through which channel, within what timeframe, and what actions are expected of the recipient. Escalation paths must account for out-of-hours scenarios, key-person unavailability (with named alternates for every role), and multi-jurisdiction incidents in which the Legal and Regulatory Advisor coordinates notifications to different authorities in different time zones.
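Holding the path as structured data makes it reviewable and testable; the sketch below is one possible shape, with placeholder roles, channels, and timeframes to be replaced by the organisation's own.

```python
# Illustrative escalation path: per tier, who is notified, via which
# channel, within what timeframe, and the expected action. Named
# alternates cover key-person unavailability.
ESCALATION_PATH = {
    "informational": {
        "notify": ["Technical SME"],
        "alternates": {"Technical SME": "Deputy Technical SME"},
        "channel": "monitoring dashboard / meeting minutes",
        "timeframe_hours": None,  # reviewed at next scheduled PMM meeting
        "expected_action": "log and review at next PMM review meeting",
    },
    "warning": {
        "notify": ["Technical SME"],
        "alternates": {"Technical SME": "Deputy Technical SME"},
        "channel": "monitoring Slack channel + email",
        "timeframe_hours": 120,  # five working days, approximated as hours
        "expected_action": "root cause analysis; escalate if cause unclear",
    },
    "critical": {
        "notify": ["AI Governance Lead", "Legal and Regulatory Advisor"],
        "alternates": {"AI Governance Lead": "Deputy AI Governance Lead"},
        "channel": "incident Slack + on-call page + direct email",
        "timeframe_hours": 24,
        "expected_action": ("immediate investigation; assess serious "
                            "incident reporting and break-glass procedure"),
    },
}
```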
A common failure mode is the silent escalation, where an alert is acknowledged but no action follows. The alerting system should track not only acknowledgement but also the subsequent actions taken and their outcomes. An alert that is acknowledged but produces no root cause analysis, no documented decision, and no resolution is an indicator that the escalation framework has a gap.

The alerting infrastructure should route critical alerts through multiple channels simultaneously: a dedicated incident Slack channel, PagerDuty or equivalent on-call notification, and direct email to the AI Governance Lead and the Legal and Regulatory Advisor. Warning alerts should route to the monitoring Slack channel and by email to the Technical SME. This routing configuration ensures that critical alerts cannot be silently ignored.
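To make the silent-escalation check concrete, here is a sketch of one possible alert record schema; the field names and helper functions are assumptions, not a prescribed format.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class AlertRecord:
    """Tracks an alert beyond acknowledgement (illustrative schema)."""
    alert_id: str
    severity: str
    raised_at: datetime
    acknowledged_at: Optional[datetime] = None
    root_cause_analysis: Optional[str] = None
    documented_decision: Optional[str] = None
    resolution: Optional[str] = None

    def is_silent_escalation(self) -> bool:
        """Acknowledged, but no analysis, decision, or resolution followed."""
        return (self.acknowledged_at is not None
                and self.root_cause_analysis is None
                and self.documented_decision is None
                and self.resolution is None)


def audit_for_silent_escalations(records: list[AlertRecord]) -> list[str]:
    """Return IDs of alerts that indicate a gap in the escalation framework."""
    return [r.alert_id for r in records if r.is_silent_escalation()]
```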
Setting thresholds is a decision that balances sensitivity against operational burden. Thresholds set too tightly generate excessive false alerts, causing alert fatigue and desensitisation. Thresholds set too loosely allow compliance-relevant changes to pass undetected.
The Technical SME derives initial thresholds from the system's validation performance, using the validation metrics as the baseline and defining warning and critical thresholds as deviations from that baseline. For accuracy metrics, a warning threshold might be a two per cent degradation and a critical threshold a five per cent degradation, but these figures are system-specific and must reflect the system's risk profile and the consequences of degraded performance. For fairness metrics, the thresholds should reflect both statistical significance, to avoid alerting on random variation, and practical significance, the point at which a fairness deviation has real-world consequences for affected persons.
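A minimal sketch of the derivation, using the illustrative two and five per cent figures from above; the function name and defaults are assumptions, and the values must be recalibrated for each system.

```python
def derive_accuracy_thresholds(validation_accuracy: float,
                               warning_degradation: float = 0.02,
                               critical_degradation: float = 0.05) -> dict:
    """Derive alert thresholds as relative degradations from the
    validation baseline. The default percentages are illustrative only
    and must reflect the system's risk profile."""
    return {
        "baseline": validation_accuracy,
        "warning_floor": validation_accuracy * (1 - warning_degradation),
        "critical_floor": validation_accuracy * (1 - critical_degradation),
    }


# Example: a system validated at 91% accuracy.
thresholds = derive_accuracy_thresholds(0.91)
# {'baseline': 0.91, 'warning_floor': 0.8918, 'critical_floor': 0.8645}
```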
Rolling baseline computation compares current metrics against the declared AISDP baseline using rolling averages that smooth daily noise. A rolling seven-day fairness metric average filters out single-day anomalies while still detecting sustained drift. Inference volume tracking provides context: metrics computed on low-volume periods are statistically unreliable and should be flagged rather than treated as definitive threshold breaches.
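A sketch of the rolling comparison, assuming a daily, date-indexed pandas frame of the fairness metric and inference counts; the seven-day window matches the text, while the minimum-volume cutoff is an assumed placeholder.

```python
import pandas as pd


def rolling_fairness_check(daily: pd.DataFrame,
                           baseline: float,
                           warning_delta: float,
                           min_window_volume: int = 700) -> pd.DataFrame:
    """daily: date-indexed frame with 'fairness_metric' and
    'inference_count' columns; returns breach and reliability flags."""
    out = daily.copy()
    # A seven-day rolling mean filters single-day anomalies while
    # still surfacing sustained drift against the declared baseline.
    out["rolling_7d"] = out["fairness_metric"].rolling(window=7).mean()
    out["raw_breach"] = (out["rolling_7d"] - baseline).abs() > warning_delta
    # Metrics computed on low-volume windows are statistically
    # unreliable: flag them rather than treating a breach as definitive.
    out["volume_7d"] = out["inference_count"].rolling(window=7).sum()
    out["low_volume"] = out["volume_7d"] < min_window_volume
    out["definitive_breach"] = out["raw_breach"] & ~out["low_volume"]
    return out
```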
The Technical SME reviews thresholds quarterly as operational experience accumulates. A threshold that has never triggered may be too loose. A threshold that triggers weekly on benign variation is too tight. The Technical SME documents each threshold review as part of the PMM plan's continuous improvement cycle.
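The quarterly review can be grounded in simple trigger statistics; this sketch applies the "never triggered" and "triggers weekly" heuristics over a quarter, with the thirteen-week period and the wording of the findings as assumptions.

```python
from collections import Counter
from datetime import date


def review_threshold_activity(all_thresholds: list[str],
                              trigger_log: list[tuple[str, date]],
                              period_weeks: int = 13) -> dict[str, str]:
    """trigger_log: (threshold_name, trigger_date) pairs for the quarter.
    Flags thresholds that never fired (possibly too loose) or fired
    roughly weekly or more (possibly too tight)."""
    counts = Counter(name for name, _ in trigger_log)
    findings = {}
    for name in all_thresholds:
        n = counts.get(name, 0)
        if n == 0:
            findings[name] = "never triggered: review whether too loose"
        elif n >= period_weeks:
            findings[name] = "fires weekly or more: review whether too tight"
        else:
            findings[name] = f"{n} triggers this quarter"
    return findings
```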
For organisations without automated alerting infrastructure, the escalation framework can be implemented through documented procedures. A severity classification guide provides clear criteria for each tier with worked examples from the system's domain. A notification matrix lists the responsible person, communication channel, and response timeframe for each severity tier.
A weekly monitoring review meeting, chaired by the Technical SME, examines all monitoring dashboards and identifies any threshold breaches that occurred since the last meeting. Informational and warning-level alerts detected during the review are documented in meeting minutes with assigned follow-up actions. Critical alerts require immediate out-of-cycle notification using the notification matrix.
The procedural approach detects problems at the next weekly review rather than when they occur, creating a window during which compliance-relevant changes may go unaddressed. For high-risk systems where a week of undetected drift could affect thousands of individuals, automated alerting is strongly recommended. Prometheus and Alertmanager are both open-source and provide the automated alerting capability at zero licence cost.
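As a starting point on that route, here is a sketch that exposes the monitored metrics to Prometheus via the prometheus_client library; the metric names, port, and refresh cadence are assumptions, and the threshold rules themselves would be written as Prometheus alerting rules, with Alertmanager handling routing to Slack, PagerDuty, or email.

```python
import random
import time

from prometheus_client import Gauge, start_http_server

# Illustrative metric names; the warning and compliance thresholds are
# implemented server-side as Prometheus alerting rules against whatever
# names are exposed here.
ACCURACY = Gauge("pmm_model_accuracy", "Current model accuracy")
FAIRNESS_GAP = Gauge("pmm_fairness_gap_7d", "Rolling 7-day fairness gap")
INFERENCE_VOLUME = Gauge("pmm_inference_count_daily", "Daily inference count")


def read_pipeline_metrics() -> tuple[float, float, int]:
    """Placeholder: in a real deployment these values come from the
    monitoring pipeline, not random numbers."""
    return random.uniform(0.85, 0.92), random.uniform(0.0, 0.05), 1200


if __name__ == "__main__":
    start_http_server(9102)  # scrape endpoint; the port is arbitrary
    while True:
        accuracy, gap, volume = read_pipeline_metrics()
        ACCURACY.set(accuracy)
        FAIRNESS_GAP.set(gap)
        INFERENCE_VOLUME.set(volume)
        time.sleep(300)  # refresh every five minutes
```

One benefit of this route is that threshold changes become reviewable configuration diffs in the alerting rules rather than undocumented dashboard edits, which supports the quarterly threshold review described above.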