Why software clinical evaluation is not just medical device lite

Written by Hatem Rabeh, MD, MSc Ing
Your Clinical Evaluation Expert and Partner

I reviewed a Class IIa medical device software clinical evaluation last month where the manufacturer argued they only needed usability testing because ‘the software doesn’t touch the patient.’ The Notified Body stopped the review on page 12. The fundamental misunderstanding was not about testing. It was about what clinical evaluation actually means for software.

Medical device software presents a conceptual challenge that many teams underestimate. The absence of physical contact does not reduce clinical risk. It shifts it. And this shift demands a different approach to clinical evaluation, not a lighter one.

MDCG 2020-1 was published precisely because reviewers kept seeing the same pattern. Manufacturers would submit clinical evaluation reports for software that read like hardware reports with software vocabulary swapped in. The structure was there. The references were there. But the reasoning was missing.

This is not about compliance theater. It is about understanding how software creates clinical benefit and clinical risk in ways that are fundamentally different from traditional devices.

The Core Misconception About Software Evidence

Most deficiencies I see in software clinical evaluations stem from a single conceptual error. Teams think software clinical evaluation is about proving the algorithm works. It is not.

Clinical evaluation is about proving the device, in its intended use environment, with its intended users, achieves its intended clinical benefit without unacceptable risk. For software, this means understanding the entire sociotechnical system.

An algorithm can be mathematically perfect and clinically dangerous. Why? Because clinical benefit depends on what the user does with the output. And clinical risk emerges from what happens when the output is wrong, incomplete, delayed, or misunderstood.

Key Insight
MDCG 2020-1 does not ask you to prove your software works. It asks you to prove your software, as used in practice, delivers the claimed clinical outcome without creating unacceptable risk to the patient.

This is why purely analytical performance data is never sufficient. Analytical validity tells you what the software does. Clinical validity tells you whether what it does matters clinically. Clinical utility tells you whether it actually improves patient outcomes when used in real conditions.

These are three different questions. Many clinical evaluation reports only answer the first one.
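To make the three questions concrete, here is a minimal sketch in Python, with hypothetical numbers and function names of my own choosing, of the kind of evidence each one draws on. It illustrates the logic, not a validation method.

    # Three different questions, three different kinds of evidence.
    # All values are hypothetical and purely illustrative.

    def analytical_validity(tp, fp, tn, fn):
        # What does the software do? Agreement with a reference standard.
        return {"sensitivity": tp / (tp + fn), "specificity": tn / (tn + fp)}

    def clinical_validity(sensitivity, specificity, prevalence):
        # Does the output mean something in the intended use population?
        # Example: positive predictive value at that population's prevalence.
        ppv = (sensitivity * prevalence) / (
            sensitivity * prevalence + (1 - specificity) * (1 - prevalence))
        return {"ppv": ppv}

    def clinical_utility(event_rate_with_software, event_rate_standard_care):
        # Do patients do better when the software is actually used?
        # A comparative, patient-relevant outcome from clinical data.
        return {"absolute_risk_reduction":
                event_rate_standard_care - event_rate_with_software}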

What MDCG 2020-1 Actually Requires

The guidance does not create new requirements. It interprets MDR Annex XIV for software. But the interpretation matters because it clarifies what evidence is acceptable and what is not.

Let me walk through the key areas where I see submissions fail.

Clinical Benefit Must Be Measurable and Patient-Relevant

For software, clinical benefit is often indirect. A diagnostic algorithm does not treat anyone. It provides information. The benefit comes from what clinicians do with that information.

This means your clinical evaluation must trace the chain from software output to clinical decision to patient outcome. If you cannot trace this chain with evidence, you do not have a clinical benefit claim. You have an assumption.

I see this often with AI-based diagnostic tools. The manufacturer shows the algorithm detects a condition with 95% sensitivity. They claim this improves patient outcomes. But they provide no evidence that detection leads to different treatment decisions or that those decisions improve outcomes.
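A quick, purely hypothetical calculation shows how far analytical performance sits from a benefit claim. Assume 95% sensitivity, 95% specificity, and a 2% prevalence in the intended use population:

    # Hypothetical numbers: 95% sensitivity, 95% specificity, 2% prevalence.
    sens, spec, prev = 0.95, 0.95, 0.02
    ppv = (sens * prev) / (sens * prev + (1 - spec) * (1 - prev))
    print(round(ppv, 2))  # ~0.28: roughly 7 in 10 positive outputs are false alarms

And even a strong predictive value is still only clinical validity. It says nothing about whether detection changes treatment, or whether that change improves outcomes.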

MDCG 2020-1 is explicit here. You must demonstrate the clinical benefit. Not assume it. Not infer it. Demonstrate it.

Common Deficiency
Claiming improved diagnostic accuracy as clinical benefit without evidence that improved accuracy changes clinical management or patient outcomes. Accuracy is analytical performance, not clinical benefit.

Risk Analysis Must Address Real-World Use Conditions

Software risk analysis often focuses on software failure modes. What happens if the algorithm crashes? What happens if the output is incorrect?

These are relevant risks. But they are incomplete.

MDCG 2020-1 requires you to consider risks that emerge from correct operation in suboptimal conditions. What happens when a tired physician interprets the output at 3 AM? What happens when the software is used on a patient population it was not validated for? What happens when the output contradicts clinical judgment?

These are use-related risks, not software-related risks. They do not appear in your software verification testing. They appear in clinical use.

This is why usability testing and human factors engineering are not optional extras. They are core components of clinical evaluation for software. You must show that users can correctly interpret and act on the software output in realistic conditions.

The State of the Art for Software Is Moving

State of the art analysis for medical device software is particularly challenging because the field evolves rapidly. What was state of the art three years ago may be obsolete now.

This creates a practical problem. You cannot write a state of the art analysis that remains valid throughout your product lifecycle. You must update it.

But there is a deeper issue. For software, state of the art is not just about feature parity with competitors. It is about the quality of evidence expected for similar claims.

If you are claiming your AI diagnostic tool is equivalent to clinical expert judgment, the state of the art question is not what other AI tools do. It is what evidence is required to validate clinical expert equivalence. And that standard is high.

Many manufacturers miss this. They benchmark against other software products when they should be benchmarking against the clinical standard they claim to match or replace.

Where Equivalence Falls Apart for Software

Equivalence is already difficult for traditional medical devices. For software, it is often impossible.

MDCG 2020-1 acknowledges this. Software changes rapidly. A software update can fundamentally alter how the device works. This means demonstrating equivalence to an existing device requires not just similar intended use and technical characteristics, but also evidence that the clinical performance remains comparable for your specific implementation.

Here is what this means in practice.

You cannot claim equivalence to another diagnostic software tool based purely on similar algorithms and similar performance metrics. You must show that your specific implementation, with your specific training data, your specific user interface, and your specific intended use environment, produces clinically equivalent outcomes.

This is a high bar. In most cases, it cannot be met without clinical performance data from your own device.

Key Insight
For software, technical equivalence is necessary but not sufficient. Small implementation differences can create large clinical performance differences. Your clinical evaluation must address your specific implementation, not a category of similar products.

This is why I rarely see successful equivalence claims for Class IIb or Class III software. The evidence burden is simply too high. You end up needing nearly the same clinical data you would need for a non-equivalence route.

What Sufficient Clinical Evidence Actually Looks Like

So what does a compliant clinical evaluation for medical device software contain?

First, it contains a clear statement of clinical benefit that is measurable and patient-relevant. Not improved accuracy. Not faster processing. A patient outcome.

Second, it contains a clinical validity analysis that shows the software output is clinically meaningful. This typically requires clinical performance data in the intended use population. Not just analytical validation. Clinical validation.

Third, it contains evidence of clinical utility. This means data showing that use of the software actually leads to better patient outcomes compared to current practice. This is often the missing piece.

Fourth, it contains human factors and usability evidence showing that intended users can correctly interpret and act on the software output in realistic conditions. This is not a separate workstream. It is part of clinical evaluation.

Fifth, it contains a risk-benefit analysis that considers not just software failures but also risks from correct operation in suboptimal conditions.

And finally, it contains a PMCF plan that addresses the specific uncertainties and assumptions in your clinical evidence. For software, this typically includes monitoring real-world performance, user behavior, and off-label use.

Common Deficiency
Treating clinical evaluation as a document deliverable rather than an ongoing process. For software, where updates are frequent, clinical evaluation must be a living process that updates with each significant change.

The PMCF Challenge for Software

Post-market clinical follow-up for software is conceptually straightforward and practically complex.

Conceptually, you need to monitor whether your software performs as expected in real-world use and whether it continues to deliver the claimed clinical benefit as clinical practice evolves.

Practically, this is difficult because software use often leaves minimal clinical documentation. A physician uses your diagnostic tool, sees the output, makes a decision. Where is the record of whether the tool influenced the decision? Where is the record of whether the decision was correct?

MDCG 2020-1 does not solve this problem, but it does clarify the expectation. You must define specific metrics that can be monitored and you must have a realistic plan for collecting that data.

For many software products, this means building data collection into the product itself. Not as a separate study. As a product feature. Usage analytics, outcome tracking, error logging. These are not just product improvements. They are regulatory requirements for PMCF.
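As an illustration, a minimal event record of this kind might look like the sketch below. The field names are hypothetical, and data protection, consent, and clinical governance are deliberately out of scope here.

    from dataclasses import dataclass
    from datetime import datetime
    from typing import Optional

    @dataclass
    class PMCFEvent:
        # One record per use of the software output, logged by the product itself.
        timestamp: datetime                      # when the output was shown to the user
        software_version: str                    # performance must be traceable per version
        pseudonymized_case_id: str               # allows later linkage to outcome, where permitted
        model_output: str                        # what the software told the user
        user_action: str                         # e.g. "accepted", "overridden", "ignored"
        error_flag: bool                         # malfunction or out-of-specification input
        outcome_followup: Optional[str] = None   # filled in later, e.g. confirmed diagnosis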

If you cannot monitor real-world performance, you cannot demonstrate ongoing safety and performance. And if you cannot demonstrate ongoing safety and performance, you cannot maintain compliance.

What This Means for Your Next Submission

If you are preparing a clinical evaluation for medical device software, the first question is not what evidence you have. It is what clinical outcome you are claiming.

Define the patient-relevant benefit clearly. Then work backward to identify what evidence would demonstrate that benefit. Then assess what evidence you have and what gaps remain.
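In sketch form, that backward-working exercise might look like this. The claim and entries are hypothetical; the point is the direction of reasoning, not the content.

    # Start from the claimed benefit, not from the data you happen to have.
    claimed_benefit = "earlier detection of condition X leads to fewer complications"

    evidence_needed_vs_held = {
        "clinical validity in the intended use population": "retrospective reader study only",
        "detection changes clinical management":            "no evidence yet",
        "changed management improves a patient outcome":    "no evidence yet",
        "users interpret and act on the output correctly":  "formative usability only",
    }

    # Every entry that is not yet demonstrated needs a plan:
    # a clinical investigation, a PMCF activity, or a narrower claim.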

Most teams work forward from available evidence. This leads to clinical evaluation reports that describe what the software does without proving what it achieves clinically.

The second question is whether you can realistically claim equivalence. For most software, the answer is no. Accept this early and plan for clinical performance data from your own device. This is not a failure. It is the expected pathway for software with meaningful clinical claims.

The third question is how you will monitor real-world performance post-market. If you do not have a realistic answer to this during development, you will not have one after launch. PMCF for software requires planning during design, not after CE marking.

MDCG 2020-1 is not a checklist. It is a framework for reasoning about clinical evidence for software. The reviewers using it are looking for that reasoning, not for box-checking.

They want to see that you understand how your software creates clinical benefit, what could go wrong, and how you will monitor whether the benefit is real and sustained. If your clinical evaluation report answers these questions with evidence, not assumptions, you are on the right path.

If it does not, no amount of reformatting will fix it. The problem is not the document. It is the evidence strategy.

Next in this series, I will address MDCG 2020-6 on sufficient clinical evidence for legacy devices, the guidance that forces manufacturers to confront what they actually know about devices that have been on the market for decades.

Peace,
Hatem
Clinical Evaluation Expert for Medical Devices
Follow me for more insights and practical advice.

References:
– Regulation (EU) 2017/745 (MDR), Annex XIV
– MDCG 2020-1, Guidance on Clinical Evaluation (MDR) / Performance Evaluation (IVDR) of Medical Device Software

Frequently Asked Questions

What is a Clinical Evaluation Report (CER)?

A CER is a mandatory document under MDR 2017/745 that demonstrates the safety and performance of a medical device through systematic analysis of clinical data. It must be updated throughout the device lifecycle based on PMCF findings.

How often should the CER be updated?

The CER should be updated whenever significant new clinical data becomes available, after PMCF activities, when there are changes to the device or its intended purpose, and at minimum at the interval defined in the post-market surveillance plan, which for higher-risk devices means at least annually.

What causes CER rejection by Notified Bodies?

Common reasons include inadequate equivalence demonstration, insufficient clinical data for claims, poorly structured SOTA analysis, missing gap analysis, and lack of clear benefit-risk determination. Structure and logical flow are as important as the data itself.

Which MDCG guidance documents are most relevant for clinical evaluation?

Key documents include MDCG 2020-5 (Equivalence), MDCG 2020-6 (Sufficient Clinical Evidence for Legacy Devices), MDCG 2020-13 (CEAR Template), MDCG 2020-7 (PMCF Plan), and MDCG 2020-8 (PMCF Evaluation Report). MDCG 2020-1 is the central reference for medical device software.

Deepen Your Knowledge

For a comprehensive overview of clinical evaluation under Regulation (EU) 2017/745, read the Complete Guide to Clinical Evaluation under EU MDR.