Optimizing the interaction between the human and the machine is a major topic when deploying artificial intelligence (AI) at the bedside. The goal of this randomized clinical vignette study is to learn whether presenting AI model outputs via continuous Bayesian updates and/or uncertainty quantification can improve diagnostic accuracy and clinician trust among healthcare professionals (physicians, residents, fellows, physician assistants (PAs), and nurse practitioners (NPs)) from US academic institutions evaluating patients with chest pain or dyspnea.

The main questions it aims to answer are:

* Does presenting AI predictions as Bayesian-updated post-test probabilities improve diagnostic accuracy compared to standard predicted probabilities?
* Does the addition of uncertainty quantification (95% confidence intervals) to AI predictions improve diagnostic accuracy?
* Do these interventions (Bayesian updating and/or uncertainty quantification) help clinicians recover from the negative effects of intentionally misleading AI predictions?

Comparison: Researchers will compare standard AI predicted probabilities (presented without uncertainty) to Bayesian-updated post-test probabilities and/or outputs containing 95% confidence intervals to see if the interventions improve diagnostic accuracy, clinician confidence, and resilience against misleading AI.

Participants will:

* Review 8 clinical vignettes (simulated patient cases) focusing on chest pain or dyspnea.
* Provide an initial "pre-test" diagnostic probability for 5 possible diagnoses based on the clinical history alone.
* View AI model outputs that vary by experimental condition (standard probability vs. Bayesian update, with or without uncertainty intervals, and accurate vs. misleading).
* Provide an updated "post-test" diagnostic probability for the diagnoses after viewing the AI output.
* Select and rank diagnostic tests and therapeutic steps for each vignette.
* Complete a post-survey regarding their trust in the AI, comfort with the data presentation, and demographics.
Study Design: This is a 2x2 factorial within-subjects design. The two factors are (1) Bayesian updating via continuous likelihood ratios (CLR) vs. standard predicted probability, and (2) uncertainty quantification (95% confidence intervals) vs. point estimate only. AI prediction accuracy (accurate vs. intentionally misleading) is varied as a within-subjects stratification factor balanced across all 4 conditions, with half of each participant's vignettes receiving accurate predictions and half receiving misleading predictions. AI predictions are simulated (pre-programmed) for experimental control. Vignette order and condition assignment are independently randomized per participant.

Primary Analysis: Diagnostic accuracy is analyzed using a generalized linear mixed model (GLMM) with fixed effects for CLR, Uncertainty, Misleading, and vignette, and a participant random intercept. Pre-specified secondary analyses examine interactions of presentation format with misleading AI.

Sample Size: Simulation-based power analysis (1,000 Monte Carlo iterations per scenario) was conducted using the planned GLMM. Assuming 70% baseline diagnostic accuracy and within-participant ICC of 0.25, the study achieves 85.8% power for the CLR main effect and 85.7% for the Uncertainty main effect with N=100 at alpha=0.05 (two-tailed).
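The balanced assignment described under Study Design can be illustrated with a minimal Python sketch. It assumes that each of the 8 vignettes fills exactly one cell of the 2 x 2 x 2 layout (presentation format, uncertainty display, AI accuracy), which is one way to satisfy the stated balance; the record does not describe the actual randomization procedure. The label strings and the function name `assign_conditions` are hypothetical.

```python
import random

# Hypothetical labels; the record names the factors but not any identifiers.
FORMATS = ("standard", "clr")          # factor 1: presentation format
UNCERTAINTY = ("point", "interval")    # factor 2: uncertainty display
ACCURACY = ("accurate", "misleading")  # within-subjects stratification factor


def assign_conditions(vignette_ids, rng):
    """Return (presentation order, vignette -> design cell) for one participant.

    Assumption: one vignette per cell, so each of the 4 conditions gets one
    accurate and one misleading AI prediction.
    """
    cells = [(f, u, a) for f in FORMATS for u in UNCERTAINTY for a in ACCURACY]
    if len(vignette_ids) != len(cells):
        raise ValueError("expected one vignette per design cell")
    order = list(vignette_ids)
    rng.shuffle(order)   # randomize the order in which vignettes are shown
    rng.shuffle(cells)   # independently randomize which cell each vignette gets
    assignment = dict(zip(sorted(vignette_ids), cells))
    return order, assignment


order, assignment = assign_conditions(list(range(1, 9)), random.Random(2024))
for vignette in order:
    print(vignette, assignment[vignette])
```

Passing a seeded `random.Random` per participant keeps each allocation reproducible for auditing.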
Study Type
INTERVENTIONAL
Allocation
RANDOMIZED
Purpose
OTHER
Masking
SINGLE
Enrollment
100
Rather than presenting the AI model's raw predicted probability, the system takes the clinician's pre-test probability (entered before seeing AI output) and applies a continuous likelihood ratio (CLR) derived from the AI model to calculate a Bayesian-updated post-test probability. The output is displayed as a shift from the clinician's own assessment (e.g., "Your assessment: 45% -> Updated assessment: 72%"). The raw AI prediction is not shown. This approach mirrors how clinicians use diagnostic test results such as D-dimer to update pre-test probability of pulmonary embolism.
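The update from pre-test to post-test probability follows the standard odds form of Bayes' rule: post-test odds equal pre-test odds multiplied by the likelihood ratio. The sketch below shows that arithmetic only. How the study derives the CLR from the AI model is not described in this record, so the value 22/7 is a hypothetical likelihood ratio back-calculated to reproduce the 45% to 72% example, and the function name is an assumption.

```python
def post_test_probability(pre_test, likelihood_ratio):
    """Bayesian update of a probability (0-1 scale) with a likelihood ratio.

    Assumption: pre-test values of exactly 0 or 1 are rejected here because
    their odds are undefined; the record does not say how the study handles them.
    """
    if not 0.0 < pre_test < 1.0:
        raise ValueError("pre_test must be strictly between 0 and 1")
    if likelihood_ratio <= 0:
        raise ValueError("likelihood_ratio must be positive")
    pre_odds = pre_test / (1.0 - pre_test)      # probability -> odds
    post_odds = pre_odds * likelihood_ratio     # Bayes' rule in odds form
    return post_odds / (1.0 + post_odds)        # odds -> probability


# Hypothetical likelihood ratio chosen to match the example in the text.
example_lr = 22 / 7
updated = post_test_probability(0.45, example_lr)
print(f"Your assessment: 45% -> Updated assessment: {updated:.0%}")
```

A likelihood ratio of 1 leaves the clinician's estimate unchanged, values above 1 raise it, and values below 1 lower it.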
The AI model prediction is presented as a simple predicted probability (0-100%) for each of the possible diagnoses, together with the top 3 clinical features driving the prediction (e.g., "Acute Myocardial Infarction: 68% - Key factors: elevated troponin, ST-segment changes on ECG, chest pain radiation to left arm"). This represents the most common current approach to presenting AI-based diagnostic predictions in clinical settings.
The AI output (whether Bayesian-updated post-test probability or standard predicted probability) is presented together with a 95% confidence band displayed as error bars on probability bars. For accurate AI predictions, confidence interval width is approximately +/-12-15 percentage points. For misleading AI predictions, confidence intervals are widened by a factor of 1.5x (approximately +/-18-23 percentage points) to simulate reduced model confidence in unfamiliar or edge-case scenarios. Confidence intervals are constrained to the 0-100% range.
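The interval rules above (base half-width, 1.5x widening for misleading predictions, and constraint to the 0-100% range) can be sketched as follows. This assumes a symmetric interval around the point estimate before clipping, which the "+/-" wording suggests but the record does not state outright; the function and parameter names are hypothetical.

```python
def display_interval(point, half_width, misleading=False, widen=1.5):
    """Return (lower, upper) bounds in percentage points for a displayed estimate.

    point       -- point estimate on the 0-100% scale
    half_width  -- base half-width (the record cites roughly 12-15 points)
    misleading  -- widen the interval by `widen` for misleading AI predictions
    """
    if misleading:
        half_width = half_width * widen
    lower = max(0.0, point - half_width)     # constrain to the 0-100% range
    upper = min(100.0, point + half_width)
    return lower, upper


print(display_interval(68, 12))                   # accurate prediction
print(display_interval(68, 12, misleading=True))  # widened by 1.5x
print(display_interval(95, 15, misleading=True))  # upper bound clipped at 100
```

Clipping makes intervals near 0% or 100% asymmetric, so the displayed error bars will not always be centered on the point estimate.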
Clinician Diagnostic Accuracy
Proportion of correct diagnostic assessments across all vignettes and experimental conditions. For each vignette, participants rate 5 possible diagnoses on a 0-100% probability scale. The diagnosis assigned the highest probability is considered the participant's final diagnosis. Accuracy is determined by comparing the final diagnosis to the ground truth diagnosis established by expert panel consensus (minimum 4 of 5 board-certified physicians in agreement). Analyzed using a generalized linear mixed model (GLMM) with binary outcome (correct vs. incorrect), fixed effects for CLR, uncertainty quantification, misleading AI, and vignette, and a random intercept for participant.
Time frame: Day 1 during survey completion
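The scoring rule for the Clinician Diagnostic Accuracy outcome (highest-rated diagnosis taken as the final diagnosis, then compared with the expert-panel ground truth) can be sketched as below. The record does not say how ties for the highest probability are handled; this sketch assumes a tie yields no final diagnosis and is scored as incorrect. Function names and the example ratings are hypothetical.

```python
def final_diagnosis(ratings):
    """Return the diagnosis with the highest rated probability.

    ratings -- dict mapping diagnosis name to a 0-100% probability rating.
    Assumption: a tie for the top rating returns None (no final diagnosis).
    """
    top = max(ratings.values())
    leaders = [dx for dx, p in ratings.items() if p == top]
    return leaders[0] if len(leaders) == 1 else None


def accuracy(responses):
    """Proportion of responses whose final diagnosis matches ground truth.

    responses -- list of (ratings, ground_truth) pairs.
    """
    correct = [final_diagnosis(r) == truth for r, truth in responses]
    return sum(correct) / len(correct)


# Hypothetical post-test ratings for one vignette.
ratings = {
    "Acute Myocardial Infarction": 70,
    "Pulmonary Embolism": 10,
    "Pneumonia": 10,
    "Aortic Dissection": 5,
    "Pericarditis": 5,
}
print(final_diagnosis(ratings))
print(accuracy([(ratings, "Acute Myocardial Infarction"),
                (ratings, "Pulmonary Embolism")]))
```

The resulting per-vignette correct/incorrect indicator is the binary outcome that the GLMM described above would take as input.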
Change in Diagnostic Probability Estimates
Magnitude and direction of change in clinician-provided probability estimates from pre-test assessment (before AI output) to post-test assessment (after AI output) for each of 5 possible diagnoses per vignette. Measured on a 0-100% scale.
Time frame: Day 1 during survey completion
Diagnostic Accuracy Under Misleading AI Predictions
Proportion of correct final diagnoses when AI predictions are intentionally misleading vs. accurate, and whether the interventions (Bayesian updating, uncertainty quantification) mitigate the negative effect of misleading AI. Assessed via interaction terms (CLR x Misleading, Uncertainty x Misleading) in the primary GLMM.
Time frame: Day 1 during survey completion
Clinician Satisfaction With AI Decision Support (Exploratory)
Self-reported satisfaction with the AI-based clinical decision support, measured via question(s) in the post-survey questionnaire.
Time frame: Day 1 during survey completion