Building Your Clinical Evaluation Plan for AI Software
Your Clinical Evaluation Plan is not a checkbox exercise. It is the roadmap that determines whether your CER succeeds or fails. For AI software, the CEP must address specific requirements that traditional devices do not face. Get this wrong, and you will spend months responding to deficiencies.
In This Article
The Clinical Evaluation Plan defines what you will prove about your AI software, how you will prove it, and what triggers updates. MDR Annex XIV requires that you plan the evaluation against the state of the art, define the evidence you need, and maintain the plan throughout the device lifecycle.
For AI software, this means addressing the three evidence pillars from MDCG 2020-1 and the specific challenges that come with algorithmic medical devices.
Essential CEP Components
A complete CEP for AI software contains twelve key sections. Each section builds on the previous one to create a coherent evaluation strategy.
1. Device and Intended Purpose. Define exactly what your software does, who uses it, in what setting, for which patients, and for what clinical decision. Anchor this to your State of the Art summary with explicit claims.
2. Regulatory References. Cite MDR Article 61, Annex I, Annex XIV, MEDDEV 2.7/1 rev. 4, and MDCG 2020-1. State whether equivalence is claimed and, if so, for which claims.
3. SOTA Synopsis. Summarize the clinical context, alternatives, benchmark metrics, identified gaps, and implications for your device. Keep it tight: roughly six lines for the clinical context and a single table for the benchmark metrics.
4. Claims and Acceptance Criteria. List each claim with its metric, target value, and floor threshold. Map claims to GSPRs and to evidence pillars.
Claims without acceptance criteria are opinions. Every claim must have a measurable endpoint and a predefined threshold drawn from your SOTA analysis; a minimal structured sketch follows below.
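One practical way to enforce this is to hold each claim in a structured record where a missing metric, target, or floor is immediately visible. The sketch below is illustrative only: the field names, claim wording, GSPR references, and every number are hypothetical placeholders, not recommended values.

```python
# Hypothetical claims-to-acceptance-criteria records (structure only; the claim
# wording, metric, target, floor, and GSPR references are placeholders).
claims = [
    {
        "claim_id": "CL-01",
        "claim": "Detects the target condition with high sensitivity",
        "metric": "sensitivity",
        "target": 0.92,     # value the evidence is expected to show
        "floor": 0.88,      # prespecified threshold derived from the SOTA analysis
        "evidence_pillar": "clinical performance",   # MDCG 2020-1 pillar
        "gspr": ["GSPR 1"],
        "dataset": "independent external validation set",
    },
]

def has_acceptance_criteria(record: dict) -> bool:
    """A claim is actionable only if metric, target, and floor are all defined."""
    return all(record.get(k) is not None for k in ("metric", "target", "floor"))

assert all(has_acceptance_criteria(c) for c in claims)
```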
Evidence Sources and Study Designs
5. Evidence Sources. Describe your systematic literature review, internal verification and validation, external validation on independent sites, usability studies per IEC 62366, and real-world registries where relevant.
6. Literature Search Methods. Document databases, timeframes, keywords, inclusion and exclusion criteria, and screening procedures. Commit to capturing unfavorable findings.
7. Appraisal and Analysis Methods. Explain how you assess the scientific validity of each source and its relevance to your device. Include your appraisal matrix, prespecified statistics, subgroup analyses, and calibration approaches; a sketch of one common decision rule follows below.
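To make "prespecified statistics" concrete, one common decision rule is that the lower bound of a two-sided 95% confidence interval for the primary metric must clear the prespecified floor. The sketch below implements the Wilson score interval for that purpose; the case counts and the floor are hypothetical, and the choice of interval method is an assumption, not a requirement.

```python
import math

def wilson_lower_bound(successes: int, n: int, z: float = 1.96) -> float:
    """Lower bound of the two-sided 95% Wilson score interval for a proportion."""
    if n == 0:
        return 0.0
    p = successes / n
    denom = 1 + z**2 / n
    centre = p + z**2 / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2))
    return (centre - margin) / denom

# Hypothetical numbers: 230 of 250 positive cases correctly detected.
tp, positives = 230, 250
sensitivity = tp / positives                  # 0.92 point estimate
lower = wilson_lower_bound(tp, positives)     # roughly 0.88 lower 95% bound

floor = 0.85                                  # placeholder prespecified floor
print(f"sensitivity={sensitivity:.3f}, lower 95% bound={lower:.3f}, "
      f"passes floor: {lower >= floor}")
```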
AI-Specific Requirements
8. Data Governance. Address dataset representativeness, separation of training and test data, bias detection, robustness testing with degraded inputs, user transparency, and human oversight mechanisms. This section answers both MDR requirements and emerging AI Act expectations; a per-subgroup bias check is sketched at the end of this section.
9. Risk Linkage. Connect every claim and metric to specific risks in your risk management file and their corresponding controls. Explain how evidence validates control effectiveness per ISO 14971.
A common deficiency is a CEP that lists evidence sources without explaining how they map to claims. Every study and every dataset must trace to specific acceptance criteria.
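Returning to the bias-detection point in the Data Governance item above, one minimal way to operationalise it is to stratify the primary metric by predefined subgroups (site, sex, age band, scanner, and so on) and compare each against a prespecified floor. The sketch below is an illustration under stated assumptions: the subgroup labels, case data, and floor are placeholders, not a validated analysis.

```python
import pandas as pd

# Hypothetical external-validation results: one row per case, with the subgroup
# label, the reference-standard outcome, and the model output (illustrative only).
results = pd.DataFrame({
    "subgroup":   ["site_A"] * 4 + ["site_B"] * 4,
    "reference":  [1, 1, 1, 0, 1, 1, 0, 0],   # reference standard (1 = positive)
    "prediction": [1, 1, 0, 0, 1, 1, 0, 1],   # model output at the locked threshold
})

def subgroup_sensitivity(df: pd.DataFrame) -> pd.Series:
    """Sensitivity per subgroup: detected positives / reference positives."""
    positives = df[df["reference"] == 1]
    return positives.groupby("subgroup")["prediction"].mean()

floor = 0.80  # placeholder per-subgroup floor from the SOTA analysis
per_group = subgroup_sensitivity(results)
flags = per_group[per_group < floor]
print(per_group)
print("Subgroups below floor:", list(flags.index))
```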
Gap Analysis and Lifecycle
10. Gap Analysis. Identify SOTA gaps including missing subgroups, lack of external validation sites, and weak reference standards. Define pre-market actions versus PMCF commitments with escalation triggers.
11. PMCF Connections. Reference the MDCG 2020-7 and 2020-8 templates. Specify objectives, methods, endpoints, sites, intended sample sizes, and the mechanisms for capturing performance drift.
12. Update Rules. State the review frequency and the accountable parties. Define the triggers for an update: new literature, new software versions, changes to standards, safety signals, and performance drift; a minimal drift-monitoring sketch follows below.
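To make the performance-drift trigger tangible, here is a minimal rolling-window monitor that flags when a post-market performance rate falls below a floor over a full window of adjudicated cases. The window size, floor, and escalation step are assumptions for illustration; the real mechanism belongs in your PMCF plan.

```python
from collections import deque

class DriftMonitor:
    """Rolling-window check of a performance rate against a prespecified floor.

    Illustrative sketch only: the window size, floor, and the metric fed in are
    placeholders, not a prescribed PMCF method.
    """

    def __init__(self, floor: float, window: int = 200):
        self.floor = floor
        self.outcomes = deque(maxlen=window)   # 1 = correct, 0 = incorrect

    def add_case(self, correct: bool) -> None:
        self.outcomes.append(1 if correct else 0)

    def triggered(self) -> bool:
        """True when the rolling rate drops below the floor on a full window."""
        if len(self.outcomes) < self.outcomes.maxlen:
            return False
        return sum(self.outcomes) / len(self.outcomes) < self.floor

monitor = DriftMonitor(floor=0.85, window=200)
# Feed post-market cases as they are adjudicated, e.g.:
# monitor.add_case(correct=True)
# if monitor.triggered(): escalate per the CEP update rules
```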
Tables That Reviewers Expect
Your CEP should include three key tables:
Table A: Claims-to-Metrics Mapping. Shows claim, metric, target, floor, evidence pillar, GSPR reference, dataset, analysis method, and decision rule.
Table B: Evidence Inventory. Shows source type, study design, sites, sample size, reference standard, subgroups covered, external validation status, and usability tasks.
Table C: Gaps-to-Actions. Lists each gap, its impact on claims, pre-market mitigation, PMCF strategy, and escalation conditions.
In the next post, we cover how to set acceptance criteria that are both rigorous and achievable.
Peace,
Hatem
Your Clinical Evaluation Partner
Frequently Asked Questions
How detailed should the CEP be?
Detailed enough that another qualified person could execute it. Every claim should map to acceptance criteria, datasets, and analysis methods. Vague plans lead to vague evidence.
When should the CEP be updated?
MDR requires ongoing updates. Key triggers include new SOTA information, software version changes, safety signals, and PMCF findings. Document review cycles and responsible parties.
References
– MDR 2017/745: Article 61 and Annex XIV
– MEDDEV 2.7/1 rev. 4: Clinical Evaluation: A Guide for Manufacturers and Notified Bodies
– MDCG 2020-1: Guidance on Clinical Evaluation (MDR) / Performance Evaluation (IVDR) of Medical Device Software
– MDCG 2020-7: Post-Market Clinical Follow-up (PMCF) Plan Template
– MDCG 2020-8: Post-Market Clinical Follow-up (PMCF) Evaluation Report Template