This study evaluates how well anonymized artificial-intelligence (AI) tools perform on standardized pediatric case vignettes and whether showing AI suggestions can improve clinicians' answers. About 30 board-certified or board-eligible pediatric specialists at a single hospital complete a one-time session. Participants are randomized to two groups. Group 1 (n≈15): physicians answer each vignette once. Group 2 (n≈15): physicians answer and rate their confidence (1-10), then review anonymized suggestions from five different AI tools (tool names not shown) and may keep or change their answer; changes and post-review confidence are recorded. Primary focus: measure AI performance (diagnostic accuracy, medication-dosing accuracy, interpretation accuracy) overall and by difficulty tier, and record AI response time. Secondary focus: quantify how AI suggestions affect human performance (change in accuracy, direction of change, confidence shift, and time). No patients or biospecimens are involved; risks are minimal (time and possible discomfort with performance review). Findings may inform safe, evidence-based ways to use AI alongside clinicians in pediatrics.
Study Type
OBSERVATIONAL
Enrollment
30
What: Display of AI-generated suggestions for each vignette, aggregated from five large language model tools (names not shown to participants). When/Who: Shown only in Group 2, after the physician's initial answer and confidence score. Purpose: Measure AI performance (primary) and quantify the effect of AI suggestions on physicians' answers (secondary). Applies to: Group 2.
What: Self-rated confidence for the initial answer on a 1-10 scale. When/Who: Group 2 before viewing AI suggestions. Purpose: Quantify confidence changes pre- vs post-AI and relate confidence to correctness. Applies to: Group 2.
SBÜ Sultangazi Haseki Training and Research Hospital
Istanbul, Sultangazi, Turkey (Türkiye)
AI Interpretation Accuracy (%)
Proportion of correct laboratory/imaging interpretations or appropriate next-test selections, per AI tool and pooled; stratified by difficulty tier. Unit: percent (0-100).
Time frame: Day 1
AI Diagnostic Accuracy (%)
Proportion of vignettes with a correct primary diagnosis produced by each anonymized AI tool and pooled across tools. Correctness is defined against a pre-specified reference answer key; results are also stratified by pre-defined difficulty tiers (easy/moderate/difficult/very difficult). Unit of measure: percent (0-100).
Time frame: Day 1
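As an illustration of how the per-tool, pooled, and tier-stratified percentages described above might be tabulated, here is a minimal Python sketch. The record layout `(tool_id, tier, is_correct)`, the function name `accuracy_table`, and the sample data are assumptions made for the example; they are not taken from the study protocol.

```python
from collections import defaultdict

def accuracy_table(records):
    """Return percent-correct keyed by (tool, tier).

    records: iterable of (tool_id, tier, is_correct) tuples, one per AI answer
    (a hypothetical layout chosen for this sketch).
    The key "pooled" aggregates all tools; the key "all" aggregates all tiers.
    """
    counts = defaultdict(lambda: [0, 0])  # key -> [n_correct, n_total]
    for tool, tier, is_correct in records:
        for key in ((tool, tier), (tool, "all"), ("pooled", tier), ("pooled", "all")):
            counts[key][0] += int(is_correct)
            counts[key][1] += 1
    return {key: 100.0 * c / n for key, (c, n) in counts.items()}

# Made-up scoring results against the reference answer key.
records = [
    ("tool_1", "easy", True),
    ("tool_1", "difficult", False),
    ("tool_2", "easy", True),
    ("tool_2", "difficult", True),
]
table = accuracy_table(records)
print(table[("tool_1", "all")])       # 50.0
print(table[("pooled", "difficult")])  # 50.0
print(table[("pooled", "all")])       # 75.0
```

The same tabulation would apply to the dosing and interpretation outcomes, given a correctness flag scored against the corresponding rubric.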
AI Medication-Dosing Accuracy (%)
Proportion of dose recommendations meeting pediatric standards (weight- or BSA-based ranges, route, frequency) per reference rubric, per AI tool and pooled; stratified by difficulty tier. Unit: percent (0-100).
Time frame: Day 1
Change in Physician Diagnostic Accuracy (percentage points) (Group 2 only)
Post-AI accuracy minus pre-AI accuracy per participant on the same case set; also categorized as beneficial (incorrect→correct), harmful (correct→incorrect), or no change. Accuracy is the proportion of cases with a correct final diagnosis according to a prespecified answer key.
Time frame: Day 1: Baseline (pre-AI) and immediate Post-AI within the same session (0-15 min after baseline).
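The per-participant change and its three categories can be sketched as follows. This is a hedged example: the function name `accuracy_change`, the boolean-list inputs, and the sample data are assumptions, and "no change" is taken here to mean no change in correctness (an answer revised from one incorrect diagnosis to another would fall in this category).

```python
def accuracy_change(pre_correct, post_correct):
    """Return (change in percentage points, category counts) for one participant.

    pre_correct / post_correct: lists of booleans, one entry per vignette,
    in the same case order (hypothetical input format for this sketch).
    """
    if not pre_correct or len(pre_correct) != len(post_correct):
        raise ValueError("pre and post lists must be non-empty and the same length")
    n = len(pre_correct)
    pre_acc = 100.0 * sum(pre_correct) / n
    post_acc = 100.0 * sum(post_correct) / n
    categories = {"beneficial": 0, "harmful": 0, "no_change": 0}
    for pre, post in zip(pre_correct, post_correct):
        if not pre and post:
            categories["beneficial"] += 1  # incorrect -> correct
        elif pre and not post:
            categories["harmful"] += 1     # correct -> incorrect
        else:
            categories["no_change"] += 1   # correctness unchanged
    return post_acc - pre_acc, categories

# Made-up data for one participant over five vignettes.
pre = [True, False, False, True, False]
post = [True, True, True, False, False]
delta, cats = accuracy_change(pre, post)
print(delta)  # 20.0
print(cats)   # {'beneficial': 2, 'harmful': 1, 'no_change': 2}
```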
Confidence Shift (Δ on a 1-10 scale) (Group 2 only)
Post-AI self-rated confidence minus pre-AI confidence; association with correctness is examined. Unit: scale points (-9 to +9).
Time frame: Day 1: Baseline (pre-AI) and immediate Post-AI within the same session (0-15 min after baseline).
Answer-Change Frequency (%) (Group 2 only)
Proportion of vignettes for which physicians revised their initial answer after AI suggestions; reported overall and by difficulty tier. Unit: percent (0-100).
Time frame: Day 1
AI Response Time (seconds per vignette)
Time from vignette display to final AI output, reported per tool and pooled; also by difficulty tier. Unit: seconds.
Time frame: Day 1
Net Benefit Index of AI Exposure (percentage points) (Group 2 only)
Beneficial change rate (incorrect→correct) minus harmful change rate (correct→incorrect) for diagnostic items; sensitivity analyses for dosing/interpretation. Unit: percentage points.
Time frame: Day 1
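One possible reading of the Net Benefit Index is sketched below. The record does not state the denominator of the two rates; this example assumes both are computed over all diagnostic items answered, and the function name `net_benefit_index` and the sample data are likewise hypothetical.

```python
def net_benefit_index(pre_correct, post_correct):
    """Beneficial rate minus harmful rate, in percentage points.

    Assumption for this sketch: both rates use the total number of
    diagnostic items as the denominator.
    """
    if not pre_correct or len(pre_correct) != len(post_correct):
        raise ValueError("pre and post lists must be non-empty and the same length")
    n = len(pre_correct)
    pairs = list(zip(pre_correct, post_correct))
    beneficial = sum(1 for pre, post in pairs if not pre and post)  # incorrect -> correct
    harmful = sum(1 for pre, post in pairs if pre and not post)     # correct -> incorrect
    return 100.0 * (beneficial - harmful) / n

# Made-up data: 2 beneficial changes and 1 harmful change over 5 items.
pre = [True, False, False, True, False]
post = [True, True, True, False, False]
nbi = net_benefit_index(pre, post)
print(nbi)  # 20.0
```

Under this denominator the index equals the change in accuracy on the same items. A different choice, such as the beneficial rate among initially incorrect answers and the harmful rate among initially correct ones, would give a different value.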