Why calibration traceability breaks your IVD clinical claim


Written by HATEM RABEH, MD, MSc Ing

Your Clinical Evaluation Expert And Partner


I reviewed a performance evaluation last month where the manufacturer demonstrated excellent clinical sensitivity and specificity. The Notified Body still flagged a critical gap. The problem was not the clinical data. The problem was calibration traceability. The entire claim collapsed because the reference standard could not be defended.

Reference standards are not administrative details. They are the foundation of your IVD’s clinical performance claim. If your calibration cannot be traced to a recognized reference system, your performance data means nothing under regulatory scrutiny.

This is not about documentation hygiene. This is about whether your clinical claim can survive review.

What reviewers actually look for in calibration documentation

When a Notified Body reviews your performance evaluation, they read calibration documentation differently than most manufacturers expect.

They do not just check if you used a reference standard. They assess whether that reference standard is fit for regulatory defense.

The question they ask is simple: if we challenge this claim in a market surveillance action, can the manufacturer prove that the measured values are traceable to an internationally recognized reference system?

If the answer is unclear, the entire clinical performance claim is at risk.

Key Insight
Calibration traceability is not a laboratory procedure. It is a regulatory argument. The question is whether your IVD’s output can be defended against an independent reference that authorities recognize.

Why reference standard choice determines performance validity

Most manufacturers understand that reference standards matter. What they underestimate is how deeply reference standard selection affects the defensibility of clinical performance data.

Here is what happens in practice.

You design a clinical performance study. You select samples. You define comparators. You measure sensitivity and specificity. Everything looks strong.

Then the Notified Body asks: what reference standard did you use to assign the true positive and true negative status of your samples?

If that reference standard is not recognized or traceable, your entire performance dataset becomes questionable.

This is not theoretical. I have seen performance evaluations where the manufacturer used an in-house method as the reference. The clinical data was robust. The study design was solid. But the performance claim could not be validated because the reference was not externally recognized.

Common Deficiency
Manufacturers often select reference standards based on availability or convenience. Reviewers assess reference standards based on regulatory traceability and external recognition. The gap between these two perspectives creates most calibration-related rejections.

What makes a reference standard defensible

A defensible reference standard is one that can be traced to an internationally recognized reference system or a widely accepted clinical reference method.

For quantitative IVDs, this typically means traceability to the International System of Units (SI) or to a reference material from a recognized organization such as the Joint Committee for Traceability in Laboratory Medicine (JCTLM) or the World Health Organization (WHO).

For qualitative IVDs, this often means using a validated clinical reference method or a comparator that is itself traceable to a recognized standard.

The critical element is external recognition. If your reference standard requires explanation or justification, it is already weak from a regulatory perspective.

How calibration gaps destroy clinical performance claims

Let me show you what calibration failure looks like in a real submission.

A manufacturer submitted a performance evaluation for a quantitative immunoassay. The clinical performance data showed excellent correlation with comparator devices. The analytical performance was well-documented. The clinical evidence appeared strong.

The Notified Body rejected the submission with a single question: how is your device calibrated, and what is the traceability chain to an internationally recognized reference standard?

The manufacturer had calibrated the device using an internal reference panel. That panel was characterized using the device itself. There was no external traceability.

This created a circular validation problem. The device was validated using a reference that was defined by the device. There was no independent anchor point.

The clinical performance claim collapsed because there was no way to demonstrate that the measured values corresponded to anything outside the manufacturer’s own measurement system.

Key Insight
Circular calibration is one of the most common and most fatal errors in IVD performance evaluation. If your reference standard is defined by your own device, you have not demonstrated performance. You have demonstrated internal consistency, which is not the same thing.

What reviewers see when calibration is missing

When calibration traceability is weak or absent, reviewers see several problems at once.

First, they cannot verify that your performance data reflects real clinical utility. If your device measures something in arbitrary units with no external reference, the clinical performance numbers are meaningless outside your laboratory.

Second, they cannot compare your device to other devices in the market. If two devices use different calibration systems with no common reference, their performance cannot be compared. This makes equivalence claims impossible.

Third, they cannot assess whether your device will remain accurate over time or across different production lots. Without traceability to a stable external reference, there is no anchor to detect drift or degradation.

All of these problems trace back to the same root cause: the absence of a defensible reference standard with clear traceability.

What IVDR Annex I and MDCG 2022-2 actually require

The IVDR does not leave calibration as an optional detail. Annex I, Section 9.2 explicitly requires manufacturers to establish traceability of values assigned to calibrators and control materials.

The regulation states that traceability shall be assured through available reference measurement procedures or available reference materials of a higher metrological order.

This is not guidance. This is a regulatory requirement.

MDCG 2022-2 reinforces this by emphasizing that performance evaluation must demonstrate how the device achieves its intended performance characteristics. That demonstration depends entirely on the validity of the reference system used during performance studies.

If your reference system is not traceable, your performance evaluation does not meet the regulation.

Common Deficiency
Many manufacturers address calibration only in the technical documentation. But calibration traceability must also be explicitly documented in the performance evaluation. Reviewers expect to see how the reference standard used in clinical studies connects to the device’s calibration system.

How to document calibration traceability in the performance evaluation

Documenting calibration traceability is not about listing standards. It is about demonstrating a logical chain from your device’s output back to an internationally recognized reference.

The documentation should answer these questions clearly:

What reference standard was used to establish the device’s calibration?

What is the metrological traceability chain from that reference standard to a higher-order reference material or measurement procedure?

How was the reference standard used in clinical performance studies, and how does it connect to the device’s routine calibration process?

What evidence supports the stability and validity of the reference standard over time?

If any of these questions cannot be answered with clear evidence and documented traceability, the performance evaluation is incomplete.
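The four questions above can be captured as a structured completeness check before the traceability chain goes into the performance evaluation. This is a hypothetical sketch — the class, field names, and example materials are my own illustration, not from any standard or guidance document:

```python
from dataclasses import dataclass

# Hypothetical sketch: each link transfers a value from a higher-order
# reference down toward the device calibrator (names are illustrative).
@dataclass
class ChainLink:
    material: str                # e.g. an international standard, a working calibrator
    level: str                   # "primary", "secondary", "manufacturer", "device"
    externally_recognized: bool  # listed by JCTLM/WHO or an equivalent body?

def chain_gaps(chain: list[ChainLink]) -> list[str]:
    """Return a list of obvious gaps; an empty list means none were found."""
    gaps = []
    if not chain or not chain[0].externally_recognized:
        gaps.append("Top of chain is not an externally recognized reference.")
    if chain and chain[-1].level != "device":
        gaps.append("Chain does not reach the device calibration system.")
    # Circular calibration: the device must not appear upstream of its own calibrator.
    if any(link.level == "device" for link in chain[:-1]):
        gaps.append("Device appears upstream of its own calibrator (circular).")
    return gaps

# Example: an in-house panel with no external anchor, as in the case above
weak_chain = [
    ChainLink("In-house panel", "manufacturer", externally_recognized=False),
    ChainLink("Device calibrator", "device", externally_recognized=False),
]
print(chain_gaps(weak_chain))
```

A check like this does not replace the documented traceability narrative, but it makes the reviewer's questions explicit and forces each link in the chain to be named.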

Why commutability matters for reference materials

Here is a calibration problem that most manufacturers discover too late: not all reference materials behave the same way in different measurement systems.

A reference material is commutable if it behaves like a real patient sample across different measurement procedures. If a reference material is not commutable, it may give accurate results in one system but inaccurate results in another.

This creates a hidden risk in performance evaluation. You may calibrate your device using a reference material that performs well in your system. But if that reference material is not commutable, your device may not perform accurately when measuring real patient samples.

Reviewers know this. When they see a performance evaluation based on reference materials, they assess whether those materials are commutable with the intended sample type.

If commutability is not addressed, the performance claim is questionable.

Key Insight
Commutability is the bridge between calibration and clinical performance. A reference material may be metrologically traceable but still fail to predict real-world performance if it does not behave like actual patient samples. This gap is a common source of post-market performance failures.
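The logic of a commutability assessment can be sketched numerically. The idea, greatly simplified from the formal CLSI and IFCC protocols: measure patient samples on the candidate and a reference procedure, fit the patient-sample relationship, and check whether the reference material falls within the prediction band of that relationship. The data and acceptance multiplier below are illustrative, not a validated protocol:

```python
import numpy as np

# Simulated patient-sample results on two measurement procedures
# (illustrative numbers only).
rng = np.random.default_rng(0)
x = np.linspace(5, 50, 30)                 # candidate procedure results
y = 1.02 * x + rng.normal(0, 0.8, x.size)  # reference procedure results

# Patient-sample relationship between the two procedures
slope, intercept = np.polyfit(x, y, 1)
resid_sd = np.std(y - (slope * x + intercept), ddof=2)

# Reference material measured on both procedures
rm_x, rm_y = 25.0, 31.0
predicted = slope * rm_x + intercept

# Rough ~95% prediction band; 2.0 is an approximate t-multiplier
commutable = abs(rm_y - predicted) <= 2.0 * resid_sd
print(f"predicted {predicted:.1f}, observed {rm_y:.1f}, commutable: {commutable}")
```

Here the reference material sits well off the patient-sample line, so it would be flagged as non-commutable: metrologically valid, but not representative of how patient samples behave across the two procedures.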

What happens when calibration is not aligned with clinical reality

Let me give you a real example of how calibration misalignment creates clinical risk.

A manufacturer developed a diagnostic test for a cardiac biomarker. The device was calibrated using a reference material from a recognized international organization. The analytical performance was excellent. The clinical performance study showed strong diagnostic accuracy.

After market launch, clinical users reported that the device systematically overestimated biomarker levels compared to other devices in the same clinical setting.

The root cause was calibration misalignment. The reference material used for calibration was valid but was not harmonized with the reference materials used by other manufacturers in the same clinical space.

The result was technically accurate measurements that were clinically incompatible with existing diagnostic thresholds and decision algorithms.

This is what happens when calibration traceability is technically correct but clinically disconnected.

Common Deficiency
Manufacturers often focus on metrological traceability without assessing clinical harmonization. A device can be perfectly traceable to an SI unit and still be clinically incompatible with existing diagnostic pathways if it is not harmonized with the reference systems used in the relevant clinical context.

How to defend your calibration strategy under review

When a Notified Body challenges your calibration strategy, they are not looking for perfection. They are looking for defensibility.

Here is what a defensible calibration strategy looks like in practice.

First, you must demonstrate that your reference standard is recognized outside your organization. This means using materials or methods from organizations like JCTLM, WHO, or other internationally recognized reference bodies.

Second, you must document the full traceability chain. This is not a one-line statement. It is a clear, stepwise explanation of how values are transferred from the highest-order reference down to your device’s calibration system.

Third, you must show that your calibration system is stable and reproducible. This means providing evidence that calibration can be maintained across production lots, over time, and in different laboratories.

Fourth, you must connect calibration to clinical performance. Reviewers want to see how the reference standard used in your performance studies relates to the calibration system in your commercial device.

If any of these elements is weak, the entire calibration strategy is at risk.
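The stepwise value transfer in the second element has a quantitative counterpart: each transfer adds measurement uncertainty, and under the usual independence assumption of the GUM, the combined standard uncertainty at the device level is the root-sum-of-squares of the per-step uncertainties. A minimal sketch with illustrative numbers (the step names and values are assumptions, not from any real chain):

```python
import math

# Illustrative standard uncertainties (in measurand units) introduced at
# each value-transfer step of a hypothetical traceability chain.
steps = {
    "primary reference material": 0.5,
    "secondary calibrator": 0.8,
    "manufacturer working calibrator": 1.0,
    "product calibrator (per lot)": 1.2,
}

# GUM-style combination for independent steps: root-sum-of-squares.
combined_u = math.sqrt(sum(u ** 2 for u in steps.values()))
print(f"combined standard uncertainty: {combined_u:.2f}")

# The expanded uncertainty (coverage factor k=2) is what gets compared
# against the clinically allowable error for the measurand.
print(f"expanded uncertainty (k=2): {2 * combined_u:.2f}")
```

This is why a one-line traceability statement is not enough: each added link in the chain enlarges the uncertainty budget, and the reviewer wants to see that the total still fits within clinically acceptable limits.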

What to include in the calibration section of the performance evaluation

The calibration section of your performance evaluation should not be an afterthought. It should be a structured argument that supports your clinical performance claim.

Include the following:

A clear statement of the reference standard or reference measurement procedure used for calibration.

A documented traceability chain showing how your calibration links to a higher-order reference material or measurement procedure.

Evidence of commutability if you used reference materials that differ from the intended sample type.

A demonstration of how the calibration system used in performance studies connects to the calibration system in your commercial device.

Data showing that calibration is stable and reproducible across production lots and over time.

If your performance evaluation includes this evidence clearly, you significantly reduce the risk of calibration-related deficiencies.
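The last item, lot-to-lot stability, has a simple operational form: each production lot measures the same traceable reference material, and any recovery outside a predefined acceptance band flags potential calibration drift. A minimal sketch with illustrative limits and data (the ±5% band and lot values are assumptions for illustration only):

```python
# Assigned value of the traceable reference material (illustrative)
target = 100.0
limit_pct = 5.0  # illustrative acceptance band of ±5% recovery

# Measured recovery of the reference material by each production lot
lot_results = {"lot-01": 99.2, "lot-02": 101.1, "lot-03": 106.3}

# Flag lots whose recovery drifts beyond the acceptance band
flagged = {
    lot: value
    for lot, value in lot_results.items()
    if abs(value - target) / target * 100 > limit_pct
}
print(flagged)
```

The external anchor is what makes this check meaningful: without a traceable assigned value, a drifting lot simply redefines "correct" and the drift is invisible.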

What comes next: integrating calibration into the broader performance argument

Calibration is not an isolated technical requirement. It is the foundation of your entire IVD performance argument.

Without defensible calibration, your analytical performance data has no anchor. Your clinical performance data cannot be compared to other devices. Your post-market surveillance cannot detect meaningful drift.

The next part of this series will address how to structure the entire performance evaluation so that calibration, analytical performance, and clinical performance form a coherent and defensible argument.

Because in regulatory review, coherence is not optional. It is the difference between approval and rejection.

Frequently Asked Questions

What is a Clinical Evaluation Report (CER)?

A CER is a mandatory document under MDR 2017/745 that demonstrates the safety and performance of a medical device through systematic analysis of clinical data. It must be updated throughout the device lifecycle based on PMCF findings.

How often should the CER be updated?

The CER should be updated whenever significant new clinical data becomes available, after PMCF activities, when there are changes to the device or intended purpose, and at minimum during annual reviews as part of post-market surveillance.

What causes CER rejection by Notified Bodies?

Common reasons include inadequate equivalence demonstration, insufficient clinical data for claims, poorly structured SOTA analysis, missing gap analysis, and lack of clear benefit-risk determination. Structure and logical flow are as important as the data itself.

Which MDCG guidance documents are most relevant for clinical evaluation?

Key documents include MDCG 2020-5 (Equivalence), MDCG 2020-6 (Sufficient Clinical Evidence), MDCG 2020-13 (CEAR Template), MDCG 2020-7 (PMCF Plan), and MDCG 2020-8 (PMCF Evaluation Report). For IVD performance evaluation, IVDR Annex I and MDCG 2022-2 are the central references.

Need Expert Help with Your Clinical Evaluation?

Get personalized guidance on MDR compliance, CER writing, and Notified Body preparation.

Peace, Hatem

Your Clinical Evaluation Partner

Follow me for more insights and practical advice.

References:
– Regulation (EU) 2017/746 (IVDR), Annex I, Section 9.2
– MDCG 2022-2: Guide on performance evaluation of in vitro diagnostic medical devices

Related Resources

Read our complete guide to CER under EU MDR: Clinical Evaluation Report (CER) under EU MDR

Or explore Complete Guide to Clinical Evaluation under EU MDR