AI validation data is not clinical evidence. Here’s why.
Three months before your submission deadline, the Notified Body sends back your clinical evaluation report with a major nonconformity. The issue? Your AI algorithm validation study was treated as clinical evidence supporting safety and performance claims. It wasn’t.
This happens more often than you’d think. Teams invest heavily in technical validation—accuracy metrics, confusion matrices, sensitivity and specificity on test datasets. The numbers look good. The AI performs as expected. The assumption follows naturally: this proves clinical safety and performance.
But that assumption creates a gap that Notified Bodies will identify immediately.
Understanding what counts as clinical evidence for AI-enabled medical devices requires distinguishing between technical validation and clinical validation. They are not the same thing. And regulatory reviewers know the difference.
What technical validation actually demonstrates
Technical validation answers a narrow question: does the algorithm perform its intended computational task correctly?
When you train and test an AI model on labeled datasets, you’re measuring how well the algorithm classifies, predicts, or detects based on input data. You calculate metrics like area under the curve, precision-recall, or mean absolute error. These metrics matter. They establish that your algorithm works as designed under controlled conditions.
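To make that layer of evidence concrete, here is a minimal sketch of what technical validation typically produces, assuming a labeled held-out test set and scikit-learn; the synthetic data, the 0.5 operating threshold, and the metric choices are all illustrative, not a prescribed protocol.

```python
import numpy as np
from sklearn.metrics import confusion_matrix, roc_auc_score

# Illustrative stand-ins: y_true is the test-set ground truth,
# y_score is the model's predicted probability for the positive class.
rng = np.random.default_rng(42)
y_true = rng.integers(0, 2, size=500)
y_score = np.clip(0.35 + 0.3 * y_true + rng.normal(0, 0.2, size=500), 0, 1)

auc = roc_auc_score(y_true, y_score)

# Binarize at an assumed operating threshold of 0.5.
y_pred = (y_score >= 0.5).astype(int)
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)   # true positive rate on this dataset
specificity = tn / (tn + fp)   # true negative rate on this dataset

print(f"AUC: {auc:.3f}")
print(f"Sensitivity: {sensitivity:.3f}  Specificity: {specificity:.3f}")
```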
But technical performance on a dataset is not clinical performance in use.
The dataset might be representative, well-annotated, and thoroughly curated. The algorithm might achieve 95% accuracy. That still doesn’t tell you what happens when a clinician uses the device in practice. It doesn’t tell you whether the device improves diagnosis, treatment decisions, or patient outcomes. It doesn’t establish clinical benefit or address clinical risk.
Clinical evaluation reports that present validation accuracy as proof of clinical performance invite a predictable challenge. The Notified Body asks: where is the evidence that this performance translates to safe and effective use by the intended users in the intended clinical setting?
The answer cannot be another accuracy table. It has to come from clinical data generated with the device in its intended context of use.
Why Technical Validation Alone Falls Short Under MDR
AI-based medical devices require three distinct layers of evidence under the MDR framework and MDCG 2020-1: technical validation, clinical validation, and clinical evaluation. Many manufacturers confuse technical validation (proving the algorithm works correctly) with clinical validation (proving the algorithm delivers clinically meaningful results in the intended patient population).
Technical validation data, such as accuracy metrics, sensitivity, specificity, and AUC scores measured on test datasets, demonstrates analytical performance. However, these metrics do not prove that the AI device improves patient outcomes, reduces clinical burden, or is safe for use in the intended clinical workflow. This gap is where clinical evidence becomes essential.
Building a Clinical Evidence Strategy for AI Medical Devices
To satisfy MDR requirements, AI device manufacturers should develop a layered evidence strategy:
- Level 1 – Technical validation: Algorithm performance on representative datasets, including subgroup analyses for different patient demographics
- Level 2 – Clinical validation: Prospective studies showing the AI output is clinically meaningful and actionable by healthcare professionals
- Level 3 – Clinical evaluation: Systematic assessment of all available clinical data, including literature on the clinical condition and the state of the art
- Level 4 – Post-market monitoring: PMCF plan with defined metrics for tracking real-world performance, algorithmic drift, and user interaction patterns (a minimal drift-check sketch follows this list)
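As one illustration of what Level 4 could involve, the sketch below flags input-distribution drift between the validation dataset and post-market data using a population stability index; the feature values, bin count, and 0.2 alert level are assumptions for illustration, not thresholds from any MDCG document.

```python
import numpy as np

def population_stability_index(expected, observed, bins=10):
    """Population stability index between a reference sample (e.g. the
    validation dataset) and a production sample for one input feature.
    Larger values indicate a bigger distribution shift."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    # Open the outer bins so out-of-range production values still count.
    edges[0], edges[-1] = -np.inf, np.inf
    exp_counts, _ = np.histogram(expected, bins=edges)
    obs_counts, _ = np.histogram(observed, bins=edges)
    # Convert counts to proportions; clip zeros so the log term is defined.
    exp_pct = np.clip(exp_counts / exp_counts.sum(), 1e-6, None)
    obs_pct = np.clip(obs_counts / obs_counts.sum(), 1e-6, None)
    return float(np.sum((obs_pct - exp_pct) * np.log(obs_pct / exp_pct)))

# Invented data: post-market inputs drift away from the reference.
rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, 5000)    # feature values at validation time
production = rng.normal(0.4, 1.2, 5000)   # feature values seen post-market

psi = population_stability_index(reference, production)
print(f"PSI = {psi:.3f}")
if psi > 0.2:  # common rule-of-thumb alert level, not a regulatory threshold
    print("Drift alert: trigger the investigation defined in the PMCF plan")
```

A real PMCF plan would go further and define which inputs and outputs are monitored, at what interval, and what action each threshold triggers.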
MDCG 2020-1 specifically states that the clinical evaluation of MDSW must demonstrate the clinical association between the software output and the targeted clinical condition. For AI devices, this means showing not just that the algorithm is accurate, but that its outputs lead to better clinical decisions compared to the current standard of care.
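To illustrate the kind of analysis that speaks to decisions rather than raw outputs, here is a sketch of an exact McNemar test on a hypothetical paired reader study, where the same cases are read with and without AI assistance; all counts are invented for illustration.

```python
from math import comb

# Hypothetical paired counts on the same 200 cases, each read twice:
b = 9   # decision correct without AI, incorrect with AI assistance
c = 27  # decision incorrect without AI, correct with AI assistance

# Exact McNemar test: under H0 the discordant pairs split 50/50.
n = b + c
p_one_tail = sum(comb(n, k) for k in range(0, min(b, c) + 1)) / 2**n
p_two_sided = min(1.0, 2 * p_one_tail)

print(f"Discordant pairs: {n} (improved with AI: {c}, worsened: {b})")
print(f"Exact McNemar p-value: {p_two_sided:.4f}")
```

The unit of analysis here is the clinical decision on each case, not the algorithm's prediction, which is the shift in evidence the guidance is pointing at.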
Frequently Asked Questions

What is a Clinical Evaluation Report (CER)?
A CER is a mandatory document under MDR 2017/745 that demonstrates the safety and performance of a medical device through systematic analysis of clinical data. It must be updated throughout the device lifecycle based on PMCF findings.

How often should the CER be updated?
The CER should be updated whenever significant new clinical data becomes available, after PMCF activities, when there are changes to the device or intended purpose, and at minimum during annual reviews as part of post-market surveillance.

What causes CER rejection by Notified Bodies?
Common reasons include inadequate equivalence demonstration, insufficient clinical data for claims, poorly structured SOTA analysis, missing gap analysis, and lack of clear benefit-risk determination. Structure and logical flow are as important as the data itself.

Which MDCG guidance documents are most relevant for clinical evaluation?
Key documents include MDCG 2020-5 (Equivalence), MDCG 2020-6 (Sufficient Clinical Evidence), MDCG 2020-13 (CEAR Template), MDCG 2020-7 (PMCF Plan), and MDCG 2020-8 (PMCF Evaluation Report).

Deepen Your Knowledge
Read Complete Guide to Clinical Evaluation under EU MDR for a comprehensive overview of the requirements under Regulation (EU) 2017/745.

✌
Peace, Hatem
Your Clinical Evaluation Partner
Follow me for more insights and practical advice.