Clinical validity of algorithms – what auditors reject first

Written by HATEM RABEH, MD, MSc Ing

Your Clinical Evaluation Expert And Partner

I reviewed a submission last month where the manufacturer provided validation accuracy from a controlled dataset and assumed clinical validity was demonstrated. The Notified Body stopped at page three. The reason? No evidence that the algorithm actually improves clinical outcomes when used by the intended user in the intended environment.

This is not an isolated case. It happens in most software algorithm reviews I see.

The confusion is systemic. Teams conflate technical performance with clinical validity. They submit analytical validation studies showing high sensitivity and specificity on a reference dataset. Then they claim the device is clinically valid.

But clinical validity under MDR is not about whether your algorithm can classify images correctly in a lab setting. It is about whether using that classification leads to a correct clinical decision that benefits the patient.

And that requires a completely different evidence structure.

What clinical validity actually means for software algorithms

MDCG 2020-1 defines clinical validity as the ability of a device to achieve its intended clinical benefit in the target population and setting. For algorithms, this means two things must be demonstrated:

First, the algorithm output must be clinically meaningful. A classification, score, or measurement must map to something actionable in clinical practice.

Second, acting on that output must lead to better patient outcomes compared to the alternative, whether that alternative is another device, clinical judgment, or doing nothing.

Most manufacturers stop after demonstrating the first part. They show that their algorithm detects a feature or produces a value. But they do not show that a clinician using that output makes better decisions.

Key Insight
Analytical validation proves your algorithm works under controlled conditions. Clinical validity proves it works when real clinicians use it in real clinical environments to make real decisions.

This distinction is critical because software algorithms do not treat patients. Clinicians treat patients. The algorithm is an input to a decision process.

If the clinician cannot interpret the output correctly, or if the output does not change the clinical pathway, then no clinical benefit occurs, regardless of how accurate the algorithm is on a test dataset.

Why technical performance data is not enough

I see this pattern repeatedly. A manufacturer submits a clinical evaluation report with extensive performance tables. Sensitivity, specificity, AUC, F1 scores, confusion matrices. Pages of statistical validation.
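
Every number in those tables derives from the same 2x2 confusion matrix, and none of them involves a clinician or a care pathway. A minimal sketch in Python, using hypothetical counts, shows how little is actually being claimed:

```python
def analytical_metrics(tp, fp, fn, tn):
    """Standard analytical-performance metrics from a 2x2 confusion matrix."""
    sensitivity = tp / (tp + fn)   # true positive rate (recall)
    specificity = tn / (tn + fp)   # true negative rate
    ppv = tp / (tp + fp)           # positive predictive value (precision)
    f1 = 2 * ppv * sensitivity / (ppv + sensitivity)
    return {"sensitivity": sensitivity, "specificity": specificity,
            "ppv": ppv, "f1": f1}

# Hypothetical counts from a retrospective test set
print(analytical_metrics(tp=87, fp=12, fn=13, tn=188))
```

Nothing in this computation touches the intended user or the intended environment, which is exactly where the auditor's question begins.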

Then the auditor asks: Where is the evidence that using this device in practice improves patient management?

The answer is often missing, or it is assumed. The reasoning goes like this: If our algorithm detects disease accurately, then clinicians will act on it, and patients will benefit.

But this assumption skips three critical steps:

First, can the clinician correctly interpret the algorithm output in the context of other clinical information?

Second, does the algorithm output change the clinical decision compared to standard practice?

Third, does that change in decision lead to better patient outcomes?

Without evidence addressing these steps, you have analytical performance, not clinical validity.
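
The second and third steps can be probed with a paired-reads design: the same clinician assesses each case first without and then with the algorithm output, and you tally how often the decision changes and whether those changes move toward the reference standard. A minimal sketch of that tally, with hypothetical case data and decision labels:

```python
# Each record: (decision_without, decision_with, correct_decision) -- hypothetical
cases = [
    ("discharge", "refer",     "refer"),
    ("refer",     "refer",     "refer"),
    ("discharge", "discharge", "discharge"),
    ("refer",     "discharge", "discharge"),
    ("discharge", "discharge", "refer"),
]

# Cases where seeing the algorithm output changed the decision
changed = [c for c in cases if c[0] != c[1]]
# Changes that moved an initially wrong decision onto the reference standard
improved = [c for c in changed if c[1] == c[2] and c[0] != c[2]]

print(f"decision change rate: {len(changed) / len(cases):.0%}")
print(f"changes toward the reference standard: {len(improved)}/{len(changed)}")
```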

Common Deficiency
Manufacturers submit retrospective studies showing algorithm accuracy on archived images or data. Auditors reject these because they do not demonstrate how the device is used in the clinical workflow or whether it changes clinician behavior.

The regulatory expectation is clear. MDCG 2020-1, the guidance on clinical evaluation of medical device software, states that clinical evidence must address the entire care pathway, not just the technical function of the algorithm.

That means understanding what the clinician does with the output, how it integrates into decision-making, and whether patients are better off as a result.

What evidence structure auditors expect

So what does acceptable evidence look like?

It depends on the intended purpose and risk class, but the structure is consistent across submissions that pass review.

You need three layers of evidence, and each one builds on the previous layer.

Layer one: Analytical validation. This proves the algorithm performs its technical function accurately under defined conditions. You show that when you input data, the output is correct according to a reference standard. This is necessary but not sufficient.

Layer two: Clinical performance. This proves the algorithm output is meaningful in a clinical context. You show that the classification, score, or measurement correlates with a clinical truth that matters for patient management. You demonstrate that clinicians can interpret the output correctly. Ideally, you show agreement between algorithm output and expert clinical judgment (a minimal sketch of this follows layer three below).

Layer three: Clinical benefit. This proves that using the algorithm in practice leads to better patient outcomes or equivalent outcomes with additional benefits such as reduced time, cost, or invasiveness. You show a change in clinical pathway or decision accuracy when the device is used.
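
On the agreement point in layer two: inter-rater statistics such as Cohen's kappa are a common way to quantify how closely algorithm output tracks expert reads. A minimal sketch, with hypothetical labels for both the algorithm and the expert panel:

```python
from collections import Counter

def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    assert len(rater_a) == len(rater_b)
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    expected = sum(
        (counts_a[label] / n) * (counts_b[label] / n)
        for label in set(counts_a) | set(counts_b)
    )
    return (observed - expected) / (1 - expected)

# Hypothetical example: algorithm classifications vs. expert panel reads
algorithm = ["malignant", "benign", "benign", "malignant", "benign", "benign"]
expert    = ["malignant", "benign", "malignant", "malignant", "benign", "benign"]
print(f"Cohen's kappa: {cohens_kappa(algorithm, expert):.2f}")
```

As a rough convention, kappa above 0.6 is often read as substantial agreement, though the threshold you defend should be pre-specified and justified against the intended purpose.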

Most deficiencies occur because manufacturers stop at layer one or layer two. They prove the algorithm works. They prove it produces clinically relevant information. But they do not prove it improves care.

Key Insight
The third layer is where auditors focus. If your clinical evaluation does not address how the device changes clinical decisions or patient outcomes, expect a major non-conformity.

The challenge is that demonstrating layer three often requires prospective clinical studies or strong real-world evidence showing the device in use. Retrospective algorithm testing is rarely sufficient unless combined with evidence of clinical utility.

The role of intended purpose in shaping evidence requirements

The evidence you need depends entirely on what you claim the device does.

If your algorithm provides a measurement that the clinician uses as one input among many, the evidence requirement may focus on measurement accuracy and clinical correlation. You show the measurement is reliable and that it adds value to clinical assessment.

If your algorithm provides a diagnostic conclusion or recommendation, the evidence requirement is higher. You must show that acting on that conclusion improves diagnostic accuracy or patient outcomes compared to standard practice.

If your algorithm automates a decision or triage, you must show that the automated pathway is non-inferior or superior to the manual pathway, with evidence that the patient is not harmed by removing human oversight at that step.
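
One common way to frame that comparison is a non-inferiority test on a pre-specified endpoint, for example triage sensitivity, against a pre-specified margin. A minimal sketch using a Wald confidence interval for the difference of two proportions; the counts and the 5-point margin are hypothetical, and a real study would pre-specify the endpoint, margin, and analysis in the protocol:

```python
import math

def noninferiority_check(x_new, n_new, x_std, n_std, margin, z=1.96):
    """Wald 95% CI for p_new - p_std; non-inferior if lower bound > -margin."""
    p_new, p_std = x_new / n_new, x_std / n_std
    se = math.sqrt(p_new * (1 - p_new) / n_new + p_std * (1 - p_std) / n_std)
    diff = p_new - p_std
    lower, upper = diff - z * se, diff + z * se
    return diff, (lower, upper), lower > -margin

# Hypothetical: automated vs. manual triage sensitivity, 5-point margin
diff, ci, non_inferior = noninferiority_check(
    x_new=522, n_new=600,   # automated pathway: 522/600 urgent cases flagged
    x_std=528, n_std=600,   # manual pathway: 528/600 urgent cases flagged
    margin=0.05,
)
print(f"difference={diff:.3f}, 95% CI=({ci[0]:.3f}, {ci[1]:.3f}), "
      f"non-inferior: {non_inferior}")
```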

The regulatory risk increases as the algorithm moves from providing information to driving decisions.

What I see in practice is that manufacturers often write an intended purpose that sounds modest but implies decision support. Then they submit evidence that only covers information provision. The gap is obvious to the reviewer.

Common Deficiency
Intended purpose states the device supports clinical decision-making, but the submitted evidence only demonstrates information provision, such as measurement accuracy. The mismatch between the claimed purpose and the evidence layer is an immediate finding.

Frequently Asked Questions

What is a Clinical Evaluation Report (CER)?

A CER is a mandatory document under MDR 2017/745 that demonstrates the safety and performance of a medical device through systematic analysis of clinical data. It must be updated throughout the device lifecycle based on PMCF findings.

How often should the CER be updated?

The CER should be updated whenever significant new clinical data becomes available, after PMCF activities, when there are changes to the device or intended purpose, and at minimum during annual reviews as part of post-market surveillance.

What causes CER rejection by Notified Bodies?

Common reasons include inadequate equivalence demonstration, insufficient clinical data for claims, poorly structured SOTA analysis, missing gap analysis, and lack of clear benefit-risk determination. Structure and logical flow are as important as the data itself.

Which MDCG guidance documents are most relevant for clinical evaluation?

Key documents include MDCG 2020-1 (Clinical Evaluation of Medical Device Software), MDCG 2020-5 (Equivalence), MDCG 2020-6 (Sufficient Clinical Evidence), MDCG 2020-7 (PMCF Plan Template), MDCG 2020-8 (PMCF Evaluation Report Template), and MDCG 2020-13 (CEAR Template).

Need Expert Help with Your Clinical Evaluation?

Get personalized guidance on MDR compliance, CER writing, and Notified Body preparation.

Peace, Hatem

Your Clinical Evaluation Partner

Follow me for more insights and practical advice.

Deepen Your Knowledge

Read the Complete Guide to Clinical Evaluation under EU MDR for a comprehensive overview of clinical evaluation under Regulation (EU) 2017/745.