Why literature appraisal fails most clinical evaluations

Hatem Rabeh

I see this in almost every review: manufacturers present hundreds of papers in their clinical evaluation reports, run them through extraction tables, and assume the work is done. Then the Notified Body sends a major nonconformity on literature quality. The problem is not that the papers are bad. The problem is that the appraisal methodology was never rigorous enough to prove it.

The literature review is not a documentary exercise. It is not about collecting papers, extracting data, and moving on. It is an evaluation process with a critical purpose: to determine whether the data you selected is actually fit to support your clinical claims.

Most manufacturers treat appraisal as a checkbox. They mention bias. They mention study design. They list limitations in a table. But when the Notified Body reviewer reads the report, they do not see evidence of real critical thinking. They see template language. And that triggers questions.

What appraisal actually means in MDR context

Appraisal means assessing the quality, relevance, and weight of evidence for your specific device and intended use. It is the step where you decide if a study is reliable enough to include in your conclusions. It is also the step where you justify why you kept certain studies and excluded others.

Under MDR 2017/745 Annex XIV Part A, the clinical evaluation must be based on a “critical evaluation of the relevant scientific literature.” The word “critical” is not decorative. It means you must apply a structured methodology to assess the strength and limitations of each source.

MDCG 2020-13 reinforces this by stating that the appraisal must address internal validity, external validity, and relevance to the device under evaluation. These are not interchangeable terms. Each one requires separate analysis.

Key Insight
Internal validity asks: Is the study scientifically sound? External validity asks: Can the results generalize? Relevance asks: Does this apply to my device, my population, my claims?

If your appraisal does not answer all three questions for each study, the Notified Body will consider it incomplete. And incomplete appraisal means your conclusions lack foundation.

The structured appraisal framework reviewers expect

Notified Bodies do not expect you to invent your own appraisal criteria from scratch. They expect you to apply a recognized framework adapted to your device type and clinical context. The most common approach is based on study design hierarchy and bias assessment tools.

For clinical studies, that often means using frameworks like GRADE, the Cochrane Risk of Bias tools, or the Newcastle-Ottawa Scale for observational studies. For device-specific evaluations, it means considering how the study design matches the regulatory evidence requirements.

But here is where many manufacturers go wrong: they mention these tools in the methodology section, then never actually apply them. Or they apply them superficially without showing the reasoning.

Common Deficiency
Listing appraisal criteria in a table without demonstrating how each criterion was assessed, what evidence was examined, and how it influenced the study’s weight in your conclusions.

The reviewer needs to see the logic. If you rated a study as “low risk of bias,” explain why. If you accepted a single-arm study despite its design limitations, explain what made it acceptable for your specific gap. If you excluded a study due to population mismatch, show the comparison.

This is not about writing more. It is about writing with precision and transparency.

Internal validity: Can you trust the study results?

Internal validity addresses whether the study was conducted in a way that produces reliable results. This is where bias assessment lives. And this is where most appraisals become vague.

Saying “the study had low risk of bias” is not an appraisal. Saying “randomization was computer-generated, allocation concealment was maintained, blinding was verified for outcome assessors, and loss to follow-up was under 5%” is an appraisal. The first is a conclusion. The second is evidence.

You need to look at selection bias, performance bias, detection bias, attrition bias, and reporting bias. For each one, you need to identify what the authors did, whether it was adequate, and whether any residual risk affects your interpretation of the results.
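As a minimal sketch of that domain-by-domain approach — field names and the worst-domain rule are illustrative, not taken from any published risk-of-bias tool — each domain gets an explicit judgment plus a rationale cited from the paper, rather than a bare rating:

```python
from dataclasses import dataclass

# The five bias domains named above; labels are illustrative.
BIAS_DOMAINS = ["selection", "performance", "detection", "attrition", "reporting"]

@dataclass
class DomainJudgment:
    domain: str
    judgment: str   # "low", "some concerns", or "high"
    rationale: str  # what the authors actually did, citable from the source paper

def residual_risk(judgments: list[DomainJudgment]) -> str:
    """Overall internal-validity concern: the worst judgment across domains."""
    order = {"low": 0, "some concerns": 1, "high": 2}
    worst = max(judgments, key=lambda j: order[j.judgment])
    return worst.judgment

study = [
    DomainJudgment("selection", "low",
                   "computer-generated randomization, allocation concealment maintained"),
    DomainJudgment("detection", "high",
                   "subjective outcome assessed by an unblinded operator"),
]
print(residual_risk(study))  # "high": one unblinded subjective outcome dominates
```

The point of the structure is auditability: every judgment carries the evidence that produced it, so a reviewer can open the paper and verify the rationale.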

In device studies, detection bias is often critical. If the outcome was assessed by an unblinded operator, and the outcome is subjective, the results carry less weight. You do not discard the study. But you acknowledge the limitation and consider it when forming your conclusion.

The same applies to attrition bias. If 30% of patients dropped out and the analysis was not intention-to-treat, you cannot treat the results as definitive. You can still use the study, but you must adjust the level of confidence accordingly.

Key Insight
Internal validity is not pass/fail. It is a spectrum. Your appraisal must position each study on that spectrum and explain what it means for your conclusions.

External validity: Can the results generalize?

A study can be internally valid and still not apply to your device. External validity is about generalizability. It asks whether the study population, clinical setting, and intervention are close enough to your real-world use that the results can transfer.

This is where relevance starts to overlap with validity. But they are not the same thing. External validity is about the study design and conduct. Relevance is about your device specifically.

For external validity, you assess the inclusion and exclusion criteria of the study. Were the patients representative of your intended population? Or were they highly selected? A study conducted only in tertiary centers with experienced operators may not generalize to community settings.

You also assess the intervention. If the study used a strict protocol with frequent follow-ups and monitoring, the results may not reflect real-world use where adherence is lower and oversight is limited.

Notified Bodies pay attention to this because many clinical claims fail in post-market reality due to external validity gaps that were never acknowledged in the clinical evaluation.

If your device will be used by general practitioners, but all the literature comes from specialized centers, you have an external validity issue. You do not ignore the literature. But you must acknowledge the gap and address it, often through post-market surveillance or additional studies.

Relevance: Does this study support your specific device?

Relevance is the most device-specific part of appraisal. It is where you assess whether the study actually applies to the claims you are making for your product.

This is not just about equivalence. Even if you are using your own clinical data, relevance still matters. The study may have been conducted on an earlier version of your device. The population may have been narrower than your current indication. The endpoints may not match your performance claims.

You need to assess technical relevance, clinical relevance, and population relevance separately.

Technical relevance asks: Is the device in the study similar enough in design, material, mechanism of action, and performance characteristics? If you are claiming equivalence, this becomes a detailed comparison. If you are using your own data, this becomes a question of version control and design changes.

Clinical relevance asks: Does the study address the same clinical condition, at the same stage, with the same treatment goals? A study on acute management does not support claims about chronic use. A study on prevention does not support claims about treatment.

Population relevance asks: Are the patients in the study comparable to your intended users in terms of age, comorbidities, disease severity, and other factors that affect outcomes?
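One way to make the three-dimension comparison explicit is to record each dimension separately with its own gap statement — a sketch with hypothetical field names, not a prescribed format:

```python
# Hypothetical structured relevance assessment for one study; each of the
# three dimensions discussed above is judged separately, with any gap stated.
relevance = {
    "technical":  {"match": True,  "gap": None},
    "clinical":   {"match": True,  "gap": None},
    "population": {"match": False,
                   "gap": "study excluded patients over 75; our indication does not"},
}

def open_gaps(assessment: dict) -> list[str]:
    """Dimensions whose gaps still need justification or PMCF follow-up."""
    return [dim for dim, a in assessment.items() if not a["match"]]

print(open_gaps(relevance))  # ['population']
```

Any dimension this flags has to be resolved one of the three ways described below: justified, filled with additional evidence, or carried as a monitored limitation.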

Common Deficiency
Claiming that a study is relevant without addressing each dimension of relevance separately. Reviewers need to see a structured comparison, not a general statement.

If any dimension shows a gap, you need to either justify why the gap is acceptable, provide additional evidence to fill it, or acknowledge it as a limitation that will be monitored post-market.

How to document appraisal in a way that survives review

Documentation is where most appraisals collapse. The analysis may have been done mentally, but if it is not visible in the report, it does not exist for the Notified Body.

The appraisal must be traceable for each individual study. That means each paper you include in your analysis should have a documented appraisal. Not just a rating. A reasoning.

I structure this using appraisal tables with specific criteria in rows and evidence in cells. Each criterion has a judgment and a justification. The justification cites specific elements from the study: sample size, randomization method, blinding approach, loss to follow-up percentage, patient characteristics, device specifications.

This makes the appraisal auditable. A reviewer can open the source paper, check the claim, and verify that the appraisal is accurate. If they cannot do that, the appraisal is not sufficient.

For the overall quality rating, I use a consistent scale across all studies: high, moderate, or low, based on the combined assessment of validity and relevance. But the rating is not the appraisal. The rating is the conclusion of the appraisal. The appraisal itself is the detailed reasoning that led to that rating.
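To illustrate the "rating is the conclusion, not the appraisal" point, here is a sketch of one possible combination rule — the cap-at-the-weaker-assessment logic is an assumption for illustration, not a published method:

```python
LEVELS = ["low", "moderate", "high"]

def overall_rating(validity: str, relevance: str) -> str:
    """Overall quality is capped by the weaker of the two assessments:
    a study cannot be rated higher than its weakest dimension."""
    return LEVELS[min(LEVELS.index(validity), LEVELS.index(relevance))]

# A methodologically strong trial on a poorly matched population
# still carries little weight for this device's claims.
print(overall_rating("high", "low"))       # low
print(overall_rating("moderate", "high"))  # moderate
```

Whatever rule you use, it must be stated once in the methodology and applied identically to every study; the rule is only defensible because the detailed reasoning behind each input rating is documented.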

Key Insight
If you cannot trace each element of your appraisal back to a specific statement or data point in the source paper, your appraisal is not documented well enough for regulatory review.

What happens when appraisal is weak

Weak appraisal does not just delay your submission. It undermines your entire clinical evaluation. Because if the Notified Body cannot trust your appraisal methodology, they cannot trust your conclusions.

They will question your literature selection. They will question your equivalence claim. They will question your safety and performance conclusions. All of it becomes suspect because the foundation was not solid.

This often results in requests for additional studies, revised search strategies, or complete re-evaluation. Not because the data was wrong. But because the methodology was not rigorous enough to prove that the data was right.

I have seen manufacturers with strong clinical data receive major nonconformities simply because the appraisal was too shallow. The evidence was there. The reasoning was not visible.

The Notified Body is not trying to make your life difficult. They are trying to ensure that the conclusions in your clinical evaluation can be defended scientifically. And that requires an appraisal methodology that is structured, transparent, and applied consistently across all evidence.

Final thoughts

Literature appraisal is not a formality. It is the quality control step of your clinical evaluation. It is where you prove that your evidence base is solid enough to support the claims you are making and the risks you are accepting.

The manufacturers who get this right do not write more. They write with precision. They show their reasoning. They apply their methodology consistently. And when the Notified Body reviews their work, the appraisal becomes a strength, not a gap.

If your appraisal cannot withstand critical review, your clinical evaluation will not either.

Peace,
Hatem
Clinical Evaluation Expert for Medical Devices
Follow me for more insights and practical advice.

Frequently Asked Questions

What is a Clinical Evaluation Report (CER)?

A CER is a mandatory document under MDR 2017/745 that demonstrates the safety and performance of a medical device through systematic analysis of clinical data. It must be updated throughout the device lifecycle based on PMCF findings.

How often should the CER be updated?

The CER should be updated whenever significant new clinical data becomes available, after PMCF activities, when there are changes to the device or intended purpose, and at minimum during annual reviews as part of post-market surveillance.

What causes CER rejection by Notified Bodies?

Common reasons include inadequate equivalence demonstration, insufficient clinical data for claims, poorly structured SOTA analysis, missing gap analysis, and lack of clear benefit-risk determination. Structure and logical flow are as important as the data itself.

Which MDCG guidance documents are most relevant for clinical evaluation?

Key documents include MDCG 2020-5 (Equivalence), MDCG 2020-6 (Sufficient Clinical Evidence), MDCG 2020-13 (CEAR Template), MDCG 2020-7 (PMCF Plan), and MDCG 2020-8 (PMCF Evaluation Report).


References:
– Regulation (EU) 2017/745 (MDR), Annex XIV Part A
– MDCG 2020-13: Clinical Evaluation Assessment Report Template