Why your equivalence breaks down at the subgroup level

Hatem Rabeh

Written by HATEM RABEH, MD, MSc Ing

Your Clinical Evaluation Expert And Partner

Last month, I reviewed a CER where the manufacturer claimed clinical equivalence based on overall trial results. The device was approved. Then real-world data started coming in. Certain patient subgroups showed different outcomes. The equivalence argument collapsed because no one had analyzed whether it held across populations.

This is not a rare case. It happens often.

Manufacturers build equivalence arguments on aggregated data. They compare overall performance. They conclude the devices are similar. Then they submit.

But equivalence is not just about average results. It is about whether the clinical performance holds across the different types of patients who will actually use the device.

If your device performs differently in elderly patients, in diabetics, or in patients with comorbidities, your equivalence claim may not be valid for those groups. And if those groups represent a significant part of your intended population, your clinical evaluation has a gap.

This is where subgroup analysis becomes essential.

What the MDR Actually Requires

MDR Article 61 and Annex XIV require that clinical evaluation demonstrates safety and performance for the intended patient population. Not an average patient. Not a theoretical user. The actual population.

MDCG 2020-5 reinforces this. It states that clinical data must be sufficient to cover the intended purpose and the target population. If your population includes distinct subgroups with different risk profiles, you need to evaluate whether your clinical evidence applies to them.

This is not about being thorough for the sake of documentation. This is about whether your evidence actually supports your claims.

Key Insight
Equivalence demonstrated in a general population does not automatically mean equivalence holds in every subgroup of your intended users. Reviewers know this. Notified Bodies look for it.

When Subgroup Differences Matter

Not every difference requires a separate analysis. Some variation is expected. Some is clinically irrelevant.

But certain differences change how a device performs or how the body responds to it.

Age is one factor. Elderly patients may have reduced healing capacity, thinner skin, altered metabolism. A wound dressing that works well in young adults may behave differently on fragile elderly skin.

Comorbidities are another. Diabetes affects wound healing. Renal insufficiency affects drug metabolism. Immunosuppression affects infection risk. If your device interacts with any of these pathways, subgroup analysis is not optional.

Disease severity matters too. A monitoring device validated in stable patients may not perform the same in acute or critical cases. The physiological signals differ. The clinical context differs. The interpretation of results differs.

Then there is anatomical variation. Devices used in different anatomical sites, or in patients with different body habitus, may show different mechanical behavior or different complication rates.

The question is not whether differences exist. The question is whether those differences affect clinical outcomes in a way that challenges your equivalence or safety claims.

How Reviewers Identify the Gap

Notified Body reviewers do not start by asking whether you did subgroup analysis. They start by reading your intended use and your target population.

Then they look at your clinical data sources. They check the inclusion and exclusion criteria of the studies you cite. They compare the study populations to your claimed population.

If there is a mismatch, they flag it.

If your device is indicated for patients over 65, but all your data comes from trials with a mean age of 50, you have a gap.

If your device is intended for diabetic foot ulcers, but your equivalence data comes from general wound care studies that excluded diabetics, you have a problem.

If your device will be used across a range of disease severities, but your data only covers mild cases, your evidence base is incomplete.
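The reviewer's logic in these three checks is essentially a coverage comparison: which parts of the intended population are not represented in any cited study? As a toy illustration only, here is a minimal Python sketch of that comparison. The subgroup labels and study names are invented, and real gap analysis involves clinical judgment, not exact string matching.

```python
# Hypothetical comparison of the claimed population against the populations
# covered by cited studies -- labels are illustrative, not from a real CER.
intended_population = {"age >= 65", "diabetic foot ulcer", "severe cases"}

cited_studies = {
    "Study A": {"age 18-64", "general wound care", "mild cases"},
    "Study B": {"age >= 65", "general wound care", "mild cases"},
}

# Union of everything the cited studies cover, then subtract from the claim.
covered = set().union(*cited_studies.values())
gaps = intended_population - covered

print("uncovered subgroups:", sorted(gaps))
# Any subgroup printed here is a claim without supporting evidence.
```

In practice this mapping lives in a justification table in the CER, but the principle is the same: every claimed subgroup must trace to clinical data that actually included it.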

Common Deficiency
Manufacturers claim their device is safe and effective for a broad population, then submit clinical data from a narrow, well-controlled trial that excluded most real-world patients. The gap becomes obvious during review.

What Subgroup Analysis Actually Looks Like

Subgroup analysis is not just splitting data into categories and reporting means. It is asking whether the clinical conclusions you drew from the overall data still hold when you look at different patient groups separately.

Start with the key outcomes. Safety outcomes. Performance outcomes. The outcomes that define whether your device works.

Then stratify by the factors that could plausibly affect those outcomes. Age groups. Comorbidities. Disease severity. Anatomical location. Whatever is clinically relevant.

For each subgroup, evaluate whether:

  • The safety profile remains acceptable
  • The performance remains within expected range
  • The benefit-risk balance remains favorable
  • The equivalence claim (if applicable) still holds

If any subgroup shows a different pattern, you need to understand why. Is it a real difference or statistical noise? Is it clinically significant? Does it affect your claims?

Sometimes the answer is that certain subgroups should not be included in your intended use. That is a valid conclusion. Better to define your population accurately than to claim universal applicability without evidence.
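The stratification steps above can be sketched numerically. The following Python example uses entirely hypothetical adverse-event counts and a simplified Wald confidence interval; the subgroup labels, counts, and the flagging rule are assumptions for illustration, not a validated statistical method.

```python
import math

# Hypothetical adverse-event counts per subgroup: (events, patients).
# Invented numbers -- illustrative only.
subgroups = {
    "overall":      (30, 600),
    "age >= 65":    (14, 150),
    "diabetic":     (18, 100),
    "non-diabetic": (20, 500),
}

def event_rate_ci(events, n, z=1.96):
    """Event rate with a simplified Wald 95% confidence interval."""
    p = events / n
    half = z * math.sqrt(p * (1 - p) / n)
    return p, max(0.0, p - half), min(1.0, p + half)

overall_rate = subgroups["overall"][0] / subgroups["overall"][1]
for name, (events, n) in subgroups.items():
    p, lo, hi = event_rate_ci(events, n)
    # Crude screening rule: flag a subgroup whose entire CI sits above
    # the overall rate -- a signal to investigate, not a verdict.
    flag = "  <- review: exceeds overall rate" if lo > overall_rate else ""
    print(f"{name:13s} rate={p:.1%} (95% CI {lo:.1%}-{hi:.1%}){flag}")
```

With these invented numbers, the diabetic subgroup is flagged while the elderly subgroup is not, which is exactly the distinction the text draws: a flagged subgroup then needs clinical interpretation — real difference or noise, and does it affect the claims.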

The Equivalence Problem

Equivalence claims are particularly vulnerable to subgroup effects.

You demonstrate equivalence by showing that your device and the equivalent device perform similarly. But that similarity is based on data. And data comes from specific populations.

If the equivalent device was tested in a different population than yours, you are not actually comparing like with like.

If the equivalent device shows different performance in subgroups, and you have not assessed whether your device shows the same pattern, your equivalence argument is incomplete.

I have seen CERs where the manufacturer claimed equivalence based on a meta-analysis of multiple studies. But when you look closely, the studies included different patient populations. Some included diabetics, some excluded them. Some included elderly patients, some had upper age limits.

The manufacturer averaged across all studies and concluded equivalence. But equivalence in an average does not mean equivalence in each relevant subgroup.
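A short arithmetic example makes this concrete. The numbers below are invented for illustration: two devices with identical pooled success rates can still diverge sharply within a subgroup, depending on how the subgroups are weighted in each dataset.

```python
# Hypothetical (successes, patients) per subgroup for a device and its
# claimed equivalent -- invented numbers chosen to make the point.
device     = {"non-diabetic": (70, 100),  "diabetic": (50, 100)}
equivalent = {"non-diabetic": (110, 150), "diabetic": (10, 50)}

def pooled_rate(groups):
    """Overall success rate across all subgroups combined."""
    successes = sum(s for s, _ in groups.values())
    patients = sum(n for _, n in groups.values())
    return successes / patients

print(f"pooled: device {pooled_rate(device):.0%} "
      f"vs equivalent {pooled_rate(equivalent):.0%}")   # identical at 60%
for g in device:
    d = device[g][0] / device[g][1]
    e = equivalent[g][0] / equivalent[g][1]
    print(f"{g}: device {d:.0%} vs equivalent {e:.0%}")
```

Both devices pool to 60%, yet in the diabetic subgroup one succeeds in 50% of patients and the other in 20%. An equivalence argument built only on the pooled figure would never surface that difference.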

Notified Bodies catch this. They ask for subgroup-level evidence, and averaging across heterogeneous studies is not an answer.

Frequently Asked Questions

What is a Clinical Evaluation Report (CER)?

A CER is a mandatory document under MDR 2017/745 that demonstrates the safety and performance of a medical device through systematic analysis of clinical data. It must be updated throughout the device lifecycle based on PMCF findings.

How often should the CER be updated?

The CER should be updated whenever significant new clinical data becomes available, after PMCF activities, when there are changes to the device or intended purpose, and at minimum during annual reviews as part of post-market surveillance.

What causes CER rejection by Notified Bodies?

Common reasons include inadequate equivalence demonstration, insufficient clinical data for claims, poorly structured SOTA analysis, missing gap analysis, and lack of clear benefit-risk determination. Structure and logical flow are as important as the data itself.

Which MDCG guidance documents are most relevant for clinical evaluation?

Key documents include MDCG 2020-5 (Equivalence), MDCG 2020-6 (Sufficient Clinical Evidence), MDCG 2020-13 (CEAR Template), MDCG 2020-7 (PMCF Plan), and MDCG 2020-8 (PMCF Evaluation Report).

Need Expert Help with Your Clinical Evaluation?

Get personalized guidance on MDR compliance, CER writing, and Notified Body preparation.

Peace, Hatem

Your Clinical Evaluation Partner

Follow me for more insights and practical advice.