Performance Claims and Ongoing Monitoring for AI Software
Your CER is only as strong as your claims. Vague claims cannot be verified. Overstated claims will be rejected. For AI software, claims must also address the unique challenge of algorithmic systems: they can change over time. Here is how to write claims that hold up and monitoring that keeps them valid.
Performance claims are the backbone of your clinical evaluation. They define what you are proving. For AI software, claims must be specific enough to verify, realistic enough to achieve, and robust enough to survive post-market reality.
Writing Claims That Reviewers Accept
Every claim should contain these elements:
- Who uses it – the intended user population
- In what setting – the clinical environment
- For which population – the patient population
- For what decision – the clinical use case
- With what metric – the measurable endpoint
- At what threshold – the acceptance criterion
Strong claims use precise language. Instead of "the software detects disease accurately," write: "In radiologists using the software for screening mammography, the software achieves sensitivity of at least 90% for detecting malignant lesions in women aged 40-74, on external validation data with a defined case mix."
A claim without a metric is an opinion. A claim without a threshold is incomplete. Both will be rejected.
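One way to keep those six elements from drifting apart is to capture each claim as a structured record alongside the prose. The sketch below is purely illustrative: the field names and the mammography values are assumptions, not a prescribed format.

```python
from dataclasses import dataclass

@dataclass
class PerformanceClaim:
    """Illustrative claim structure -- field names are hypothetical, not a required schema."""
    user: str          # who uses it
    setting: str       # in what setting
    population: str    # for which population
    decision: str      # for what decision
    metric: str        # with what metric
    threshold: float   # at what threshold (acceptance criterion)

# Hypothetical record mirroring the mammography example above
claim = PerformanceClaim(
    user="radiologists",
    setting="screening mammography",
    population="women aged 40-74",
    decision="detection of malignant lesions",
    metric="sensitivity on external validation data",
    threshold=0.90,
)
print(claim)
```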
Claim Categories for AI Software
Clinical Association Claims. The model output is associated with the targeted condition as defined by the accepted reference standard. Primary endpoint is area under the ROC curve meeting target and floor from the CEP.
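As a rough illustration of how that endpoint could be checked, the sketch below estimates AUC with a bootstrap confidence interval and compares it against an assumed target and floor. The 0.85/0.80 values and the toy data are placeholders; the binding numbers come from your CEP.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(42)

def auc_with_ci(y_true, y_score, n_boot=2000):
    """Point estimate and bootstrap 95% CI for the area under the ROC curve."""
    y_true, y_score = np.asarray(y_true), np.asarray(y_score)
    auc = roc_auc_score(y_true, y_score)
    boot = []
    n = len(y_true)
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)
        if y_true[idx].min() == y_true[idx].max():   # resample must contain both classes
            continue
        boot.append(roc_auc_score(y_true[idx], y_score[idx]))
    lo, hi = np.percentile(boot, [2.5, 97.5])
    return auc, lo, hi

# Placeholder acceptance criteria -- the real target and floor come from your CEP
TARGET_AUC, FLOOR_AUC = 0.85, 0.80

# Toy data for illustration only
y_true = rng.integers(0, 2, 500)
y_score = 0.6 * y_true + 0.4 * rng.random(500)

auc, lo, hi = auc_with_ci(y_true, y_score)
print(f"AUC {auc:.3f} (95% CI {lo:.3f}-{hi:.3f}); "
      f"point estimate meets target: {auc >= TARGET_AUC}, "
      f"CI lower bound above floor: {lo >= FLOOR_AUC}")
```

One common convention, assumed here, is that the point estimate must meet the target while the lower confidence bound must stay above the floor; whatever rule you use, it should be the one written into your CEP.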
Analytical Performance Claims. The software processes input data and produces outputs within specified limits for accuracy and repeatability. Time to result is within workflow limits. Performance under degraded conditions remains within tolerance.
Clinical Performance Claims. In the intended users and setting, the device achieves sensitivity and specificity at or above targets on external validation data. Subgroup performance meets floors for predefined groups.
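A minimal sketch of a subgroup floor check, assuming the external-validation results are already tabulated per predefined group; the subgroup names and floor values here are invented for illustration:

```python
import pandas as pd

# Hypothetical external-validation results per predefined subgroup
results = pd.DataFrame({
    "subgroup":    ["age 40-54", "age 55-74", "dense breasts"],
    "sensitivity": [0.93, 0.91, 0.88],
    "specificity": [0.87, 0.89, 0.85],
})

# Placeholder floors -- the real values come from your CEP, not from this sketch
FLOORS = {"sensitivity": 0.85, "specificity": 0.80}

for metric, floor in FLOORS.items():
    failing = results.loc[results[metric] < floor, "subgroup"].tolist()
    status = "all subgroups meet floor" if not failing else f"below floor: {failing}"
    print(f"{metric}: {status}")
```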
Safety and Oversight Claims. Outputs are accompanied by information needed for user understanding. Human oversight procedures prevent or minimize risks. Post-market monitoring detects drift and new risks.
Mapping Claims to Evidence
For every claim, your CER must include:
- GSPR item from Annex I that the claim supports
- Acceptance criteria from CEP including target and floor
- Datasets and studies that test the claim
- Risk controls that depend on the claim
This creates traceability. Reviewers can follow any claim from requirement through evidence to conclusion.
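One lightweight way to represent that traceability is a simple record per claim that can be checked for missing links. The identifiers and field names below are hypothetical, not a mandated format:

```python
# Hypothetical traceability entry linking one claim to its requirement, acceptance
# criteria, evidence, and dependent risk controls. All identifiers are invented.
traceability = [
    {
        "claim_id": "CLM-001",
        "gspr": "Annex I, GSPR 17",                     # requirement the claim supports
        "acceptance_criteria": {"sensitivity_target": 0.90, "sensitivity_floor": 0.85},
        "evidence": ["external validation study VAL-02", "reader study RS-01"],
        "risk_controls": ["RC-12 human review of low-confidence outputs"],
    },
]

def untraced(entries):
    """Flag entries missing any link -- the gaps reviewers look for."""
    required = ("gspr", "acceptance_criteria", "evidence", "risk_controls")
    return [e["claim_id"] for e in entries if not all(e.get(k) for k in required)]

print(untraced(traceability))  # [] when every claim is fully traced
```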
Reviewers consistently flag the same gaps: claims that cannot be traced to specific evidence, generic conclusions that could apply to any device, and missing links between claims and risk controls.
Ongoing Monitoring Requirements
AI software can drift. Performance validated pre-market may degrade as patient populations change, clinical practice evolves, or data quality varies. The MDR, supported by MDCG guidance, requires post-market monitoring that addresses these risks.
Performance Logging. Maintain records of algorithm inputs, outputs, and outcomes where available. This data feeds drift detection.
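The exact schema depends on the device, but a minimal, hypothetical log record might capture the items below so drift analysis later has something to work with. Field names are assumptions, and nothing beyond what your data protection assessment allows should be logged.

```python
import json
from datetime import datetime, timezone

# Hypothetical log record -- field names are illustrative, not a required schema
record = {
    "timestamp": datetime.now(timezone.utc).isoformat(),
    "model_version": "2.3.1",
    "input_metadata": {"modality": "mammography", "site": "SITE-07"},  # no raw patient data
    "output": {"finding": "suspicious lesion", "score": 0.94},
    "outcome": None,  # filled in later when the reference outcome becomes available
}
print(json.dumps(record, indent=2))
```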
Drift Monitoring. Define metrics and thresholds for detecting performance degradation. For example: when rolling sensitivity drops below the predefined floor (85% in this illustration), trigger an investigation.
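A rough sketch of such a rolling check, using an assumed window size and the 85% figure purely as a placeholder floor:

```python
import numpy as np

SENSITIVITY_FLOOR = 0.85   # placeholder -- use the threshold defined in your PMCF plan
WINDOW = 200               # assumed rolling window of confirmed-positive cases

def rolling_sensitivity(detected: np.ndarray, window: int = WINDOW) -> np.ndarray:
    """Sensitivity over a sliding window of confirmed-positive cases.

    `detected` is 1 where the software flagged a confirmed-positive case, else 0.
    """
    kernel = np.ones(window) / window
    return np.convolve(detected, kernel, mode="valid")

# Toy data: detection flags over time, with simulated gradual degradation
rng = np.random.default_rng(0)
detected = (rng.random(1000) < np.linspace(0.95, 0.80, 1000)).astype(float)

sens = rolling_sensitivity(detected)
breaches = np.flatnonzero(sens < SENSITIVITY_FLOOR)
if breaches.size:
    print(f"Investigation trigger: rolling sensitivity fell below "
          f"{SENSITIVITY_FLOOR:.0%} at case index {breaches[0] + WINDOW - 1}")
```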
Update Validation. When the algorithm is updated, revalidate against acceptance criteria. Document that new versions maintain or improve performance.
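A minimal sketch of a revalidation gate, assuming the acceptance criteria are held as a simple floor table and that a small non-regression margin is acceptable; all numbers are placeholders:

```python
# Placeholder acceptance floors -- the real criteria come from your CEP
CRITERIA = {"sensitivity": 0.85, "specificity": 0.80, "auc": 0.80}

def release_gate(new_metrics: dict, old_metrics: dict, criteria: dict = CRITERIA) -> bool:
    """Pass only if the new version meets every floor and does not regress materially."""
    for name, floor in criteria.items():
        if new_metrics[name] < floor:
            print(f"FAIL: {name} {new_metrics[name]:.3f} below floor {floor:.3f}")
            return False
        if new_metrics[name] < old_metrics[name] - 0.02:  # assumed regression margin
            print(f"FAIL: {name} regressed from {old_metrics[name]:.3f} "
                  f"to {new_metrics[name]:.3f}")
            return False
    return True

# Hypothetical validation results for two algorithm versions
v1 = {"sensitivity": 0.91, "specificity": 0.86, "auc": 0.93}
v2 = {"sensitivity": 0.92, "specificity": 0.85, "auc": 0.94}
print("Release v2:", release_gate(v2, v1))
```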
PMCF Integration. Connect post-market clinical follow-up objectives to specific gaps and uncertainties from your clinical evaluation. Each gap should map to a monitoring activity.
Closing the Loop
Clinical evaluation for AI software is not a one-time exercise. It is a continuous process. Your CER represents your understanding at a point in time. Post-market activities either confirm that understanding or reveal where it needs updating.
Build this lifecycle thinking into your clinical evaluation from the start. Identify what you know, what you do not know, and how you will learn. This demonstrates maturity that reviewers recognize and respect.
This concludes our series on AI Medical Device Clinical Evaluation. The principles are straightforward: understand the three-pillar framework, plan systematically, set defensible criteria, appraise evidence rigorously, structure your CER clearly, and monitor continuously. Do these consistently, and your AI software CERs will pass review.
Frequently Asked Questions
How often should AI software performance be monitored?
Continuous monitoring is ideal. At minimum, define regular review intervals based on usage volume and risk level. High-risk devices may need monthly performance reviews; for lower-risk devices, quarterly or annual reviews may suffice.
What triggers an algorithm update?
Performance drift below thresholds, new safety signals, changes in clinical practice, and regulatory requirement changes. Define these triggers in your PMCF plan with specific criteria.
Part 6 of 6
Need Expert Help with Your Clinical Evaluation?
Get personalized guidance on MDR compliance, CER writing, and Notified Body preparation.
✌
Peace, Hatem
Your Clinical Evaluation Partner
Follow me for more insights and practical advice.
References
– MDR 2017/745, Article 83 and Annex XIV
– MDCG 2020-1: Guidance on Clinical Evaluation (MDR) / Performance Evaluation (IVDR) of Medical Device Software
– MDCG 2020-7: Post-Market Clinical Follow-Up (PMCF) Plan Template
– EU AI Act Requirements





