The goal of this randomized, questionnaire-based study is to evaluate how different presentations of artificial intelligence (AI) decision support influence clinical judgment among medical doctors working in obstetrics and gynecology when they assess the risk of spontaneous preterm birth using clinical case vignettes with cervical ultrasound images. The study specifically compares two AI presentation formats: a binary classification (preterm vs. term birth) and an individualized risk estimate of preterm birth.

The main questions it aims to answer are:

* Which AI presentation format leads to better alignment between clinicians' confidence and decision accuracy (diagnostic calibration)?
* Do different AI presentation formats lead to helpful or harmful changes in clinical decisions?

Participants will complete an online questionnaire in which they review clinical cases, make diagnostic and management decisions, rate their diagnostic confidence before and after seeing the AI output, and report their trust in the AI.
Study Type
INTERVENTIONAL
Allocation
RANDOMIZED
Purpose
OTHER
Masking
SINGLE
Enrollment
125
AI decision support based on cervical ultrasound providing a binary classification (preterm birth before 37 weeks or term birth) in addition to standard clinical information.
AI decision support based on cervical ultrasound providing an estimate of preterm birth risk (%) in addition to standard clinical information.
Copenhagen University Hospital, Rigshospitalet
Copenhagen, Denmark
NOT_YET_RECRUITING
Herlev Hospital
Herlev, Denmark
RECRUITING
Copenhagen University Hospital, North Zealand
Hillerød, Denmark
RECRUITING
Holbæk Hospital
Holbæk, Denmark
NOT_YET_RECRUITING
Hvidovre Hospital
Hvidovre, Denmark
NOT_YET_RECRUITING
Zealand University Hospital, Roskilde
Roskilde, Denmark
RECRUITING
Slagelse Hospital
Slagelse, Denmark
NOT_YET_RECRUITING
Clinician diagnostic calibration (accuracy-confidence alignment) after AI exposure.
Agreement between post-AI decision correctness (0/1) and post-AI confidence rating (0-10) will be quantified using the Brier score. Confidence will be rescaled to 0-1 and squared differences between confidence and correctness will be averaged across cases to produce a participant-level score. Lower scores indicate better diagnostic calibration. Results will be compared between randomized arms.
Time frame: Immediately after AI exposure during a single questionnaire session (approximately 20 minutes).
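The participant-level Brier score described above can be sketched in Python as follows. This is a minimal illustration, not the study's analysis code: the `CaseResponse` structure and `participant_brier_score` function are hypothetical, and the sketch assumes the 0-10 confidence rating is rescaled linearly by dividing by 10.

```python
from dataclasses import dataclass


@dataclass
class CaseResponse:
    """One participant's post-AI answer to one vignette (hypothetical structure)."""
    correct: bool    # post-AI decision correctness (True = 1, False = 0)
    confidence: int  # post-AI confidence rating on the 0-10 scale


def participant_brier_score(responses: list[CaseResponse]) -> float:
    """Mean squared difference between rescaled confidence and correctness.

    Assumes confidence is rescaled linearly to 0-1 by dividing by 10.
    Lower values indicate better calibration (0 is perfect).
    """
    if not responses:
        raise ValueError("at least one case is required")
    total = 0.0
    for r in responses:
        if not 0 <= r.confidence <= 10:
            raise ValueError(f"confidence out of range: {r.confidence}")
        p = r.confidence / 10                # rescale 0-10 to 0-1
        outcome = 1.0 if r.correct else 0.0  # correctness coded 0/1
        total += (p - outcome) ** 2
    return total / len(responses)
```

For example, correct decisions at confidence 8 and 10 plus an incorrect decision at confidence 3 give (0.04 + 0 + 0.09) / 3 ≈ 0.043, whereas a single incorrect decision made at confidence 10 gives the worst possible score of 1.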
Helpful switch rate and harmful switch rate.
The proportions of cases with a helpful switch and with a harmful switch will be calculated for each participant and compared between study arms. A helpful switch is an incorrect pre-AI decision that changes to a correct post-AI decision; a harmful switch is a correct pre-AI decision that changes to an incorrect post-AI decision.
Time frame: Baseline (pre-AI) and immediately after AI exposure during a single questionnaire session (approximately 20 minutes).
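A minimal sketch of the per-participant switch rates, assuming each case is coded as correct or incorrect before and after AI exposure. The function name is hypothetical, and the registry does not state the denominator: here both rates are divided by the total number of cases answered, whereas an alternative design would divide the helpful rate by the number of incorrect pre-AI decisions and the harmful rate by the number of correct ones.

```python
def switch_rates(pre_correct: list[bool], post_correct: list[bool]) -> tuple[float, float]:
    """Return (helpful_rate, harmful_rate) for one participant.

    Each list holds one entry per case, True if the decision was correct.
    Assumption: both rates use the total number of cases as the denominator.
    """
    if not pre_correct or len(pre_correct) != len(post_correct):
        raise ValueError("need matching, non-empty pre- and post-AI lists")
    pairs = list(zip(pre_correct, post_correct))
    helpful = sum(1 for pre, post in pairs if not pre and post)  # incorrect -> correct
    harmful = sum(1 for pre, post in pairs if pre and not post)  # correct -> incorrect
    n = len(pairs)
    return helpful / n, harmful / n
```

Cases where the decision stays correct or stays incorrect count toward neither rate, so the two rates need not sum to 1.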
Change in decision accuracy, confidence, and diagnostic calibration from pre-AI to post-AI.
Within-participant change from pre-AI to post-AI in decision accuracy (proportion of correct decisions), confidence rating, and diagnostic calibration. Differences will be compared between randomized arms and stratified by AI correctness.
Time frame: Baseline (pre-AI) and immediately after AI exposure during a single questionnaire session (approximately 20 minutes).
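The within-participant changes, stratified by AI correctness, might be computed as below. This is a hypothetical sketch: the `Case` fields and function names are assumptions, diagnostic calibration is taken to be the Brier score with confidence divided by 10 (so a negative change means better calibration), and a stratum is omitted when a participant saw no cases of that type.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Case:
    """One vignette answered by one participant (hypothetical structure)."""
    pre_correct: bool
    post_correct: bool
    pre_confidence: int   # 0-10 rating before seeing the AI output
    post_confidence: int  # 0-10 rating after seeing the AI output
    ai_correct: bool      # whether the AI output matched the true outcome


def brier(correct: list[bool], confidence: list[int]) -> float:
    """Mean squared difference between confidence/10 and correctness (0/1)."""
    return mean((conf / 10 - (1.0 if ok else 0.0)) ** 2
                for ok, conf in zip(correct, confidence))


def change_by_ai_correctness(cases: list[Case]) -> dict[str, dict[str, float]]:
    """Post-AI minus pre-AI change in each AI-correctness stratum."""
    result: dict[str, dict[str, float]] = {}
    for label, flag in (("ai_correct", True), ("ai_incorrect", False)):
        stratum = [c for c in cases if c.ai_correct == flag]
        if not stratum:
            continue  # this participant saw no cases of this type
        pre_ok = [c.pre_correct for c in stratum]
        post_ok = [c.post_correct for c in stratum]
        pre_conf = [c.pre_confidence for c in stratum]
        post_conf = [c.post_confidence for c in stratum]
        result[label] = {
            # change in proportion of correct decisions
            "accuracy_change": mean(map(float, post_ok)) - mean(map(float, pre_ok)),
            # change in mean 0-10 confidence rating
            "confidence_change": mean(post_conf) - mean(pre_conf),
            # negative value = lower Brier score = better calibration
            "calibration_change": brier(post_ok, post_conf) - brier(pre_ok, pre_conf),
        }
    return result
```

Stratifying in this way separates the effect of correct AI output (where accuracy would be expected to rise) from that of incorrect AI output (where it may fall).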
Association between self-rated trust in AI and behavioral reliance on AI.
Self-rated trust in the AI output will be measured using a numeric rating scale (0-10) after AI exposure for each case. Behavioral reliance will be quantified as the proportion of post-AI decisions concordant with the AI output. The relationship between trust ratings and behavioral reliance, including concordance in cases where the AI output is correct and in cases where it is incorrect, will be evaluated at the participant level and compared between randomized arms.
Time frame: Immediately after AI exposure during a single questionnaire session (approximately 20 minutes).
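One way to derive the participant-level trust and reliance measures is sketched below. All names and decision labels are hypothetical. The registry does not say how a percentage risk estimate is judged concordant with a decision, so the sketch assumes each AI output has already been mapped to the same decision labels as the clinician's answer. It also does not name the association measure; the Pearson correlation shown is only an illustrative choice.

```python
from dataclasses import dataclass
from math import sqrt
from statistics import mean
from typing import Optional


@dataclass
class AIExposure:
    """One case after AI exposure (hypothetical structure)."""
    trust: int          # self-rated trust in the AI output, 0-10
    post_decision: str  # clinician's post-AI decision, e.g. "preterm" or "term"
    ai_output: str      # AI output, assumed already mapped to the same labels
    ai_correct: bool    # whether the AI output matched the true outcome


def concordance(cases: list[AIExposure]) -> Optional[float]:
    """Proportion of post-AI decisions that agree with the AI output."""
    if not cases:
        return None  # undefined when the subset contains no cases
    return mean(1.0 if c.post_decision == c.ai_output else 0.0 for c in cases)


def participant_summary(cases: list[AIExposure]) -> dict[str, Optional[float]]:
    """Mean trust and behavioral reliance, overall and by AI correctness."""
    if not cases:
        raise ValueError("at least one case is required")
    return {
        "mean_trust": mean(c.trust for c in cases),
        "reliance": concordance(cases),
        "reliance_ai_correct": concordance([c for c in cases if c.ai_correct]),
        "reliance_ai_incorrect": concordance([c for c in cases if not c.ai_correct]),
    }


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation across participants (illustrative choice only)."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return sxy / sqrt(sxx * syy)
```

High reliance when the AI is incorrect, combined with high trust, would point to overreliance; reporting the two conditional concordance values separately keeps that pattern visible.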
Follow-up cervical ultrasound planning.
Proportion of cases in which clinicians plan an additional cervical ultrasound (yes/no), summarized per participant and compared from pre-AI to post-AI and between randomized arms.
Time frame: Baseline (pre-AI) and immediately after AI exposure during a single questionnaire session (approximately 20 minutes).