Comparing Artificial Intelligence and Physicians: A Vignette-Based Study in Pediatric Clinical Decision-Making

Haseki Training and Research Hospital
30 enrolled

Overview

This study evaluates how well anonymized artificial-intelligence (AI) tools perform on standardized pediatric case vignettes and whether showing AI suggestions improves clinicians' answers. About 30 board-certified or board-eligible pediatric specialists at a single hospital complete a one-time session and are randomized to two groups:

* Group 1 (n≈15): physicians answer each vignette once.
* Group 2 (n≈15): physicians answer and rate their confidence (1-10), then review anonymized suggestions from five different AI tools (tool names not shown) and may keep or change their answer; changes and confidence are recorded.

Primary focus: measure AI performance (diagnostic accuracy, medication-dosing accuracy, interpretation accuracy) overall and by difficulty tier, and record AI response time. Secondary focus: quantify how AI suggestions affect physician performance (change in accuracy, direction of change, confidence shift, and time).

No patients or biospecimens are involved; risks are minimal (time and possible discomfort with performance review). Findings may inform safe, evidence-based ways to use AI alongside clinicians in pediatrics.

Study Type

OBSERVATIONAL

Enrollment

30

Conditions

* Artificial Intelligence (AI) in Diagnosis
* Decision Support Systems, Clinical
* Clinical Decision-making
* Pediatrics

Interventions

AI Suggestions (Anonymized 5-tool panel)
Type: OTHER

What: Display of AI-generated suggestions for each vignette, aggregated from five large language model tools (names not shown to participants).
When/Who: Shown only in Group 2, after the physician's initial answer and confidence score.
Purpose: Measure AI performance (primary) and quantify the effect of AI suggestions on physicians' answers (secondary).
Applies to: Group 2.

Confidence Rating Task (1-10 Likert)
Type: OTHER

What: Self-rated confidence for the initial answer on a 1-10 scale.
When/Who: Group 2, before viewing AI suggestions.
Purpose: Quantify confidence changes pre- vs post-AI and relate confidence to correctness.
Applies to: Group 2.

Eligibility

Sex: ALL
Min age: 28 Years
Max age: 40 Years
Healthy volunteers:

Inclusion Criteria:

* Board-certified or board-eligible pediatric specialist (general pediatrics) within the first 10 years of specialty practice.
* Actively practicing at the participating institution/network at the time of enrollment.
* Able and willing to complete all vignette items individually in a single session and to follow study instructions for the assigned cohort (direct answers, or confidence rating plus viewing anonymized AI suggestions).
* Fluent in Turkish and able to use a computer interface.
* Provides written informed consent.

Exclusion Criteria:

* Pediatric subspecialist practice as primary role (e.g., cardiology, infectious diseases, neurology, neonatology), to maintain a homogeneous general pediatrics cohort.
* Prior access to or participation in creating the study vignettes, answer keys, or scoring rubrics; direct involvement with the study team.
* Inability to complete the session without external help or use of non-protocol resources (internet/AI tools) during answering, other than the anonymized AI suggestions shown by the system in Group 2.
* Failure to complete ≥90% of items or major protocol deviation (e.g., discussion with others during the task).
* Any condition judged by investigators to interfere with valid participation (e.g., severe time constraints, inability to provide consent).

Outcomes

Primary Outcomes

AI Interpretation Accuracy (%)

Proportion of correct laboratory/imaging interpretations or appropriate next-test selections, per AI tool and pooled; stratified by difficulty tier. Unit: percent (0-100).

Time frame: Day 1

AI Diagnostic Accuracy (%)

Proportion of vignettes with a correct primary diagnosis produced by each anonymized AI tool and pooled across tools. Correctness is defined against a pre-specified reference answer key; results are also stratified by pre-defined difficulty tiers (easy/moderate/difficult/very difficult). Unit of measure: percent (0-100).

Time frame: Day 1
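
As an illustration only, the per-tool, pooled, and tier-stratified percentages described for the AI accuracy outcomes could be computed along the following lines. The record layout, the tool labels (`T1`, `T2`), and the helper `accuracy_pct` are assumptions made for this sketch; the registry entry does not specify an analysis implementation.

```python
from collections import defaultdict

def accuracy_pct(records, key):
    """Return percent correct (0-100) for each group defined by key(record)."""
    totals = defaultdict(int)
    correct = defaultdict(int)
    for r in records:
        k = key(r)
        totals[k] += 1
        correct[k] += int(r["correct"])
    return {k: 100.0 * correct[k] / totals[k] for k in totals}

# Hypothetical graded outputs: one record per (tool, vignette),
# already scored against the reference answer key.
records = [
    {"tool": "T1", "tier": "easy", "correct": True},
    {"tool": "T1", "tier": "difficult", "correct": False},
    {"tool": "T2", "tier": "easy", "correct": True},
    {"tool": "T2", "tier": "difficult", "correct": True},
]

per_tool = accuracy_pct(records, lambda r: r["tool"])        # by anonymized tool
per_tier = accuracy_pct(records, lambda r: r["tier"])        # by difficulty tier
pooled = accuracy_pct(records, lambda r: "pooled")["pooled"]  # all tools together
```

The same helper would apply to the interpretation and dosing outcomes, since each is reported as a proportion per tool, pooled, and by tier.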

AI Medication-Dosing Accuracy (%)

Proportion of dose recommendations meeting pediatric standards (weight- or BSA-based ranges, route, frequency) per reference rubric, per AI tool and pooled; stratified by difficulty tier. Unit: percent (0-100).

Time frame: Day 1
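
A minimal sketch of how a single dose recommendation might be checked against a weight-based rubric entry (range, route, frequency) follows. The class names, field names, and all numeric values are hypothetical placeholders chosen for illustration; they are not taken from the study rubric and are not dosing guidance. BSA-based ranges are not covered here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class RubricEntry:
    # Hypothetical rubric fields; the real rubric format is not described.
    min_mg_per_kg: float
    max_mg_per_kg: float
    route: str
    doses_per_day: int

@dataclass(frozen=True)
class DoseRecommendation:
    dose_mg: float
    route: str
    doses_per_day: int

def dose_is_acceptable(rec, weight_kg, rubric):
    """True if the per-kg dose, route, and frequency all match the rubric."""
    if weight_kg <= 0:
        raise ValueError("weight_kg must be positive")
    per_kg = rec.dose_mg / weight_kg
    in_range = rubric.min_mg_per_kg <= per_kg <= rubric.max_mg_per_kg
    return (
        in_range
        and rec.route == rubric.route
        and rec.doses_per_day == rubric.doses_per_day
    )

# Placeholder numbers for a fictional drug, not clinical values.
rubric = RubricEntry(min_mg_per_kg=10.0, max_mg_per_kg=15.0,
                     route="oral", doses_per_day=3)
ok = dose_is_acceptable(DoseRecommendation(250.0, "oral", 3), 20.0, rubric)
too_high = dose_is_acceptable(DoseRecommendation(400.0, "oral", 3), 20.0, rubric)
```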

Secondary Outcomes

Change in Physician Diagnostic Accuracy (percentage points) (Group 2 only)

Post-AI accuracy minus pre-AI accuracy per participant on the same case set; also categorized as beneficial (incorrect→correct), harmful (correct→incorrect), or no change. Accuracy is the proportion of cases with a correct final diagnosis according to a prespecified answer key.

Time frame: Day 1: Baseline (pre-AI) and immediate Post-AI within the same session (0-15 min after baseline).
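
The per-participant change and the beneficial/harmful/no-change categories described above can be sketched as follows. The function names and the example data are assumptions for illustration; each pair holds the pre-AI and post-AI correctness of one case for one participant.

```python
from collections import Counter

def classify(pre_correct, post_correct):
    """Label one case by how correctness moved after AI exposure."""
    if not pre_correct and post_correct:
        return "beneficial"   # incorrect -> correct
    if pre_correct and not post_correct:
        return "harmful"      # correct -> incorrect
    return "no change"

def accuracy_change_pp(pairs):
    """Post-AI accuracy minus pre-AI accuracy, in percentage points."""
    n = len(pairs)
    pre_count = sum(1 for pre, _ in pairs if pre)
    post_count = sum(1 for _, post in pairs if post)
    return 100.0 * (post_count - pre_count) / n

# Hypothetical results for one participant over five cases.
pairs = [(False, True), (True, True), (True, False),
         (False, True), (False, False)]

delta_pp = accuracy_change_pp(pairs)
categories = Counter(classify(pre, post) for pre, post in pairs)
```

Note that "no change" here refers to correctness, so an answer that was revised but stayed incorrect falls in that category.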

Confidence Shift (Δ on a 1-10 scale) (Group 2 only)

Post-AI self-rated confidence minus pre-AI confidence; association with correctness is examined. Unit: scale points (-9 to +9).

Time frame: Day 1: Baseline (pre-AI) and immediate Post-AI within the same session (0-15 min after baseline).
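
As a rough sketch, the confidence shift and a simple view of its relation to correctness might look like this. Summarizing by mean shift per correctness group is an assumption of this example; the record does not state which association measure the investigators use, and the response tuples are invented.

```python
from statistics import mean

def confidence_shift(pre, post):
    """Post-AI minus pre-AI confidence; both must lie on the 1-10 scale."""
    for value in (pre, post):
        if not 1 <= value <= 10:
            raise ValueError("confidence must be between 1 and 10")
    return post - pre  # ranges from -9 to +9

# Hypothetical responses: (pre-AI confidence, post-AI confidence, final answer correct?)
responses = [(6, 8, True), (7, 5, False), (4, 4, True), (9, 10, True)]

shifts_by_correctness = {True: [], False: []}
for pre, post, correct in responses:
    shifts_by_correctness[correct].append(confidence_shift(pre, post))

# Mean shift for correct vs incorrect final answers (empty groups skipped).
mean_shift = {k: mean(v) for k, v in shifts_by_correctness.items() if v}
```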

Answer-Change Frequency (%) (Group 2 only)

Proportion of vignettes for which physicians revised their initial answer after AI suggestions; reported overall and by difficulty tier. Unit: percent (0-100).

Time frame: Day 1

AI Response Time (seconds per vignette)

Time from vignette display to final AI output, reported per tool and pooled; also by difficulty tier. Unit: seconds.

Time frame: Day 1
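
One way to capture a per-vignette response time is to wrap each tool call with a monotonic timer, as in the sketch below. `timed_call` and `fake_tool` are hypothetical names; `fake_tool` merely stands in for whatever interface the study uses to query an AI tool, which the record does not describe.

```python
import time

def timed_call(query_tool, vignette_text):
    """Run one tool query and return (output, elapsed seconds)."""
    start = time.perf_counter()           # monotonic, high-resolution clock
    output = query_tool(vignette_text)
    elapsed_s = time.perf_counter() - start
    return output, elapsed_s

def fake_tool(vignette_text):
    # Stand-in for a real AI tool call (assumption for this example).
    time.sleep(0.01)
    return "suggested diagnosis"

output, elapsed_s = timed_call(fake_tool, "example vignette")
```

Collected per tool and per tier, these values could then be averaged or otherwise summarized in the same grouping used for the accuracy outcomes.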

Net Benefit Index of AI Exposure (percentage points) (Group 2 only)

Beneficial change rate (incorrect→correct) minus harmful change rate (correct→incorrect) for diagnostic items; sensitivity analyses for dosing/interpretation. Unit: percentage points.

Time frame: Day 1
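
The index reduces to a difference of two rates. The sketch below assumes both rates share the same denominator, namely all diagnostic items answered in Group 2; the record does not spell out the denominator, and the function name and counts are illustrative.

```python
def net_benefit_index(n_beneficial, n_harmful, n_items):
    """Beneficial change rate minus harmful change rate, in percentage points.

    Assumes both rates are taken over the same n_items diagnostic items.
    """
    if n_items <= 0:
        raise ValueError("n_items must be positive")
    return 100.0 * (n_beneficial - n_harmful) / n_items

# Hypothetical counts: 6 incorrect->correct and 2 correct->incorrect out of 40 items.
nbi = net_benefit_index(n_beneficial=6, n_harmful=2, n_items=40)
```

A positive value indicates that AI exposure corrected more answers than it spoiled; a negative value indicates the reverse.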
