Setting Acceptance Criteria From Your SOTA Analysis

Written by Hatem Rabeh, MD, MSc Ing
Your Clinical Evaluation Expert and Partner

Your device achieves 85% sensitivity. Is that good enough? Without SOTA-derived acceptance criteria, you cannot answer. You might celebrate results that reviewers reject. Or worse, you might abandon a device that actually exceeds current standards. Acceptance criteria are not arbitrary targets. They are evidence-based thresholds derived from clinical reality.

Acceptance criteria define the measurable thresholds your device must achieve. They appear in your Clinical Evaluation Plan and are validated in your Clinical Evaluation Report. They determine whether your evidence demonstrates conformity with safety and performance requirements. Get them wrong, and your entire clinical evaluation strategy fails.

The most common mistake is setting criteria without reference to SOTA. Teams pick round numbers that feel reasonable. They copy criteria from similar products without understanding the context. They set targets based on what they hope to achieve rather than what the clinical context requires.

What Regulatory Guidance Requires

MDCG 2020-1 requires software to demonstrate valid clinical association, analytical performance, and clinical performance, with evidence tailored to device risk and intended population. MDR Annex I mandates that acceptance and performance criteria are defined up front, including reliability, accuracy, robustness, and safety under normal and foreseeable misuse.

The key phrase is “defined up front.” You cannot evaluate evidence without knowing what success looks like. And you cannot define success without understanding the clinical context. This is why SOTA analysis must precede acceptance criteria definition.

Key Insight
Acceptance criteria are hypotheses about what performance level is clinically meaningful. Your SOTA analysis provides the evidence to support these hypotheses.

The Four-Step Criteria Method

Step 1: Identify the key performance metrics aligned with your intended use. What does success look like for this device? Sensitivity? Specificity? Time to result? Usability measures? List every metric that matters for your clinical claims.

Step 2: Benchmark against state-of-the-art competitors and published literature. What do current solutions achieve for each metric? What does the literature establish as clinically meaningful thresholds? Build a table of current performance levels.

Step 3: Set thresholds based on risk analysis and clinical requirements. Where does your device need to match current standards? Where does it need to exceed them? Where is acceptable performance lower because your device addresses a different need?

Step 4: Validate that criteria are measurable and achievable. Can you actually measure these metrics with available data? Is there a realistic path to achieving these thresholds? Criteria that cannot be measured or achieved are useless.
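The four steps above can be sketched as a small criteria table plus a pass/fail check. Every metric name, SOTA benchmark, and threshold in this sketch is an illustrative placeholder, not a recommendation:

```python
# Minimal sketch of Steps 1-4: a criteria table and a Step 4 check.
# All metrics, benchmarks, and thresholds are illustrative placeholders.

criteria = {
    # metric: (SOTA benchmark from Step 2, acceptance threshold from Step 3)
    "sensitivity": (0.88, 0.90),  # must exceed the current standard
    "specificity": (0.85, 0.85),  # must match the current standard
    "latency_s":   (3.0, 2.0),    # seconds per case; lower is better
}

LOWER_IS_BETTER = {"latency_s"}

def evaluate(results: dict, criteria: dict) -> dict:
    """Step 4 check: is every measured metric at or beyond its threshold?"""
    verdicts = {}
    for metric, (_benchmark, threshold) in criteria.items():
        value = results[metric]
        if metric in LOWER_IS_BETTER:
            verdicts[metric] = value <= threshold
        else:
            verdicts[metric] = value >= threshold
    return verdicts

# Hypothetical study results checked against the illustrative criteria
results = {"sensitivity": 0.91, "specificity": 0.86, "latency_s": 1.8}
print(evaluate(results, criteria))  # all three verdicts are True
```

Keeping the benchmark next to the threshold in one table makes the SOTA derivation visible: a reviewer can see at a glance where a threshold matches, exceeds, or relaxes the current standard.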

Example Acceptance Criteria Thresholds

– Sensitivity threshold: at least 90%
– Specificity threshold: at least 85%
– AUROC (benchmark data): at least 0.90
– AUROC (real-world data): at least 0.85
– Maximum performance degradation: no more than 5%

Real-World Examples

Strong acceptance criteria look like this:

– Sensitivity at least 90% and specificity at least 85% on external datasets with 500 or more patients.
– AUROC at least 0.90 on benchmark data and at least 0.85 on real-world data.
– Performance degradation no more than 5% moving from clean to noisy data.
– Subgroup thresholds met across demographic categories.
– Latency no more than 2 seconds per case.
– No more than 1 serious user error per 100 operations.

Notice the precision. Not just “high sensitivity” but a number with a dataset requirement. Not just “fast” but a measurable time threshold. Each criterion is tied to how it will be verified.

Common Rejection Reasons
Reviewers flag devices lacking defined criteria, featuring only lab metrics without real-world validation, missing subgroup analysis, or lacking post-market performance monitoring plans.

Risk-Based Scaling

CORE-MD recommends scaling evidence depth and criterion strictness according to harm risk. Higher risk requires tighter thresholds and more pre-market data. Lower risk may allow wider tolerances with post-market monitoring.

Your acceptance criteria should reflect this scaling. A diagnostic device for a life-threatening condition needs tighter sensitivity thresholds than a wellness device. A device used in emergency settings needs faster response time criteria than one used in routine screening.

Document the risk reasoning behind each threshold. Why is 90% sensitivity the right target? Because lower sensitivity in this clinical context could result in missed diagnoses with specific harm potential. This reasoning demonstrates that criteria are derived, not arbitrary.

Criteria for Different Evidence Types

Different evidence types require different criteria formats:

For clinical performance, define outcome thresholds with confidence intervals. For analytical performance, define technical specifications with tolerance ranges. For usability, define error rates and task completion metrics. For safety, define acceptable adverse event frequencies relative to clinical benefit.

Each criterion type connects to a specific evidence generation activity. Clinical performance criteria drive clinical study design. Analytical criteria drive verification testing. Usability criteria drive human factors studies.

Key Insight
Acceptance criteria are not just evaluation tools. They are study design drivers. Define criteria first, then design evidence generation to verify them.

Documentation Requirements

Your CEP should contain a claims-to-metrics mapping table showing each claim, its associated metric, target value, floor threshold, evidence pillar, GSPR reference, dataset, analysis method, and decision rule.

This table makes criteria traceable and auditable. Reviewers can see exactly what you will measure, how you will measure it, and what results you will accept. No ambiguity, no interpretation required.
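As a sketch, one row of such a claims-to-metrics mapping table can be captured as a plain record. The field names mirror the columns listed above; every value here is hypothetical:

```python
# One illustrative row of a claims-to-metrics mapping table.
# Field names follow the columns described in the text; all values are
# hypothetical examples, not recommended thresholds.
row = {
    "claim": "Detects condition X with high sensitivity",
    "metric": "sensitivity",
    "target": 0.92,                      # value the evidence aims for
    "floor": 0.90,                       # minimum acceptable value
    "evidence_pillar": "clinical performance",
    "gspr_reference": "GSPR 1",
    "dataset": "external validation set, n >= 500",
    "analysis_method": "point estimate with two-sided 95% CI",
    "decision_rule": "accept if the 95% CI lower bound >= floor",
}
assert row["floor"] <= row["target"]  # a floor above the target is a typo
print(row["decision_rule"])
```

One record per claim, with an explicit decision rule, is what makes the table auditable: a reviewer can replay each rule against the reported results.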

In the next post, we will address the often-neglected Step 4 of SOTA analysis: documenting risks, limitations, and uncertainties honestly and constructively.

Peace,
Hatem
Your Clinical Evaluation Partner

Frequently Asked Questions

What if my device cannot meet benchmark thresholds?

If your device offers other advantages (speed, cost, accessibility), thresholds may justifiably be lower for some metrics, provided they are higher for others. Document the clinical reasoning. A faster device with slightly lower sensitivity may be appropriate if it enables screening in settings where current devices are not available.

How do I handle metrics without established benchmarks?

For novel metrics, establish clinical meaningfulness from first principles. What level of performance would be clinically useful? What would physicians accept? Document this reasoning and plan PMCF activities to validate your thresholds over time.

Should criteria include confidence intervals?

Yes, when possible. A criterion of 90% sensitivity is ambiguous. A criterion of 90% sensitivity with 95% CI lower bound of 85% is precise. Include statistical requirements that reflect the uncertainty inherent in your evidence.
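As an illustration of that decision rule, the lower bound of a Wilson score interval can be computed with only the standard library. The study counts below are hypothetical:

```python
import math

def wilson_lower_bound(successes: int, n: int, z: float = 1.96) -> float:
    """Lower bound of the Wilson score interval for a proportion
    (z = 1.96 corresponds to a two-sided 95% CI)."""
    p = successes / n
    denom = 1 + z ** 2 / n
    centre = p + z ** 2 / (2 * n)
    margin = z * math.sqrt(p * (1 - p) / n + z ** 2 / (4 * n ** 2))
    return (centre - margin) / denom

# Hypothetical study: 460 true positives among 500 positive cases,
# i.e. 92% observed sensitivity
print(round(wilson_lower_bound(460, 500), 3))  # lower bound: 0.893
```

With a criterion of “sensitivity at least 90% with a 95% CI lower bound of at least 85%,” this hypothetical result passes: the point estimate is 0.92 and the lower bound of roughly 0.89 clears the 0.85 floor.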

Series: State of the Art Mastery

Part 3 of 5

Coming Soon

Documenting Risks and Gaps in Your SOTA

Need Expert Help with Your Clinical Evaluation?

Get personalized guidance on MDR compliance, CER writing, and Notified Body preparation.


Follow me for more insights and practical advice.

References:
– MDCG 2020-1: Guidance on Clinical Evaluation for Medical Device Software
– MDR 2017/745 Annex I GSPRs
– CORE-MD Framework (Nature Digital Medicine 2024)