Impact of AI Feedback on Ultrasound Biometry Accuracy Across the Expertise Levels

N/A | Not Yet Recruiting | NCT07476638

Copenhagen Academy for Medical Education and Simulation | 75 enrolled

Overview

Objective: To evaluate the impact of real-time AI feedback on fetal biometry accuracy and to investigate the Expertise Reversal Effect, that is, whether the benefits of AI diminish as user experience increases.

Design: A stratified randomized trial of 75 participants (25 Novices, 25 Intermediates, 25 Experts). Within each tier, users are randomized 1:1 to either the AI-assisted or the manual measurement group.

Outcomes:
* Primary: EFW accuracy (MAPE) compared to actual birthweight.
* Secondary: Procedure time, image quality, error relative to baseline scans, and cognitive workload (NASA-TLX).

Study Overview: This study evaluates how real-time Artificial Intelligence (AI) feedback impacts the accuracy of fetal biometry measurements in obstetric ultrasound. While AI tools are designed to assist clinicians, their effectiveness may vary depending on the user's baseline skill level, a phenomenon known as the "Expertise Reversal Effect."

Research Aim: The primary objective is to determine whether AI-guided feedback significantly reduces measurement error in ultrasound fetal weight estimation compared to traditional manual methods. The study specifically investigates whether the benefit of AI is greater for novice and intermediate users than for experienced specialists.

Study Design: This is a stratified, randomized controlled trial involving 75 participants categorized into three expertise tiers:
* Novices (e.g., students or residents with minimal scan experience).
* Intermediate Users (e.g., physicians in mid-level training).
* Experts (e.g., senior specialists).

Participants within each tier will be randomized 1:1 to either the AI-Assisted Group (receiving real-time automated plane validation and calipers) or the Control Group (performing standard manual biometry).

Primary Outcome Measure:
* Accuracy of Estimated Fetal Weight (EFW): The Mean Absolute Percentage Error (MAPE) of the EFW relative to the actual birthweight, assessing the clinical impact of AI assistance on weight prediction.

Secondary Outcome Measures:
* Procedural Efficiency: Total procedure time (probe-to-skin) required to complete the biometry.
* Image Quality: Objective assessment of captured planes based on standardized Salomon criteria.
* Relative Measurement Error: Deviation of estimated fetal weight when compared to a standard (expert-validated) ultrasound scan.
* Subjective Workload: Evaluation of cognitive load and user effort using the NASA Task Load Index (NASA-TLX).
* Determination of Experience Threshold: Defining the "cutoff" in clinical experience (years and total scans) for significant AI-mediated accuracy gains.
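The record does not describe how the 1:1 allocation within each tier is generated. As a rough illustration only, a stratified allocation could be sketched as below; the function name, the participant IDs, and the seed are hypothetical and not taken from the study protocol. Because each tier has 25 participants, an exact 1:1 split is not possible, so this sketch gives one arm 13 and the other 12 per tier.

```python
import random


def stratified_randomize(participants, seed=0):
    """Assign arms as close to 1:1 as possible within each expertise tier.

    participants: list of (participant_id, tier) tuples.
    Returns a dict mapping participant_id -> "AI" or "Manual".
    """
    rng = random.Random(seed)  # fixed seed only so the sketch is reproducible

    # Group participant IDs by tier (the stratification variable).
    by_tier = {}
    for pid, tier in participants:
        by_tier.setdefault(tier, []).append(pid)

    allocation = {}
    for tier, ids in by_tier.items():
        rng.shuffle(ids)  # random order within the tier
        arms = ["AI", "Manual"]
        rng.shuffle(arms)  # random choice of which arm gets the extra person
        for i, pid in enumerate(ids):
            allocation[pid] = arms[i % 2]  # alternate arms down the shuffled list
    return allocation


# Hypothetical participant list: 25 per tier, 75 in total.
tiers = ["Novice", "Intermediate", "Expert"]
participants = [(f"{t}-{i:02d}", t) for t in tiers for i in range(1, 26)]
allocation = stratified_randomize(participants, seed=42)
```

A real trial would more likely use a concealed, centrally generated allocation sequence; the sketch only shows what "stratified 1:1" means in terms of group sizes.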

Study Type

INTERVENTIONAL

Eligibility

Sex: ALL

Healthy volunteers:


Clinical Target Population: Healthcare professionals and students, including but not limited to:
* Medical students (doing their master's).
* Resident physicians and Senior Consultants in Obstetrics and Gynecology.

Exclusion:
* Participants who do not understand and speak either Danish or English.

Pregnant women:

Inclusion Criteria:
* Pre-pregnancy BMI < 40
* Singleton pregnancy
* GA ≥ 37+0 at time of induction
* Intact membranes (to ensure consistent amniotic fluid index)

Exclusion Criteria:
* Major fetal anatomical anomaly
* Anhydramnios (DVP < 2 cm)
* CPR < 2.5th percentile

Outcomes

Primary Outcomes

To evaluate the sensitivity to change in ultrasound measurement accuracy when using AI feedback compared to standard scanning.

Mean absolute percentage error (MAPE), defined as the absolute difference between estimated fetal weight (EFW) and actual birth weight (ABW) divided by actual birth weight and expressed as a percentage, for AI-assisted and manual fetal biometry.

Time frame: The two scans will be performed within a timeframe of 14 days.
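The MAPE definition above can be written out as a short calculation. This is a minimal sketch that assumes weights are recorded in grams and that MAPE is averaged over scans; the function names and the example values are hypothetical and are not study data.

```python
def absolute_percentage_error(efw_g, abw_g):
    """Absolute percentage error of one estimate: |EFW - ABW| / ABW * 100."""
    if abw_g <= 0:
        raise ValueError("actual birth weight must be positive")
    return abs(efw_g - abw_g) / abw_g * 100.0


def mape(pairs):
    """Mean absolute percentage error over (EFW, ABW) pairs."""
    errors = [absolute_percentage_error(efw, abw) for efw, abw in pairs]
    if not errors:
        raise ValueError("at least one (EFW, ABW) pair is required")
    return sum(errors) / len(errors)


# Hypothetical (EFW, ABW) pairs in grams, for illustration only.
scans = [(3150, 3500), (3000, 3000), (4200, 4000)]
result = mape(scans)  # individual errors are 10%, 0% and 5%, so about 5.0
```

Computing the value separately for the AI-assisted and manual groups gives the two quantities the primary outcome compares.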

Secondary Outcomes

Procedural Efficiency

Scan duration (seconds) will be modeled as the dependent variable to assess the temporal impact of the AI feedback.

Time frame: The duration of the scan, maximum of 30 minutes

Image Quality

Salomon Quality Score on the 16-point fetal biometry plane quality scale, comparing AI-assisted and manual ultrasound acquisition. The scale ranges from 0 to 16, with higher scores indicating better anatomical plane quality.

Time frame: Through study completion, an average of 1 year.

Cognitive and Physiological Load

Both the NASA-TLX (subjective) and GSR (objective) data will be modeled as dependent variables. These analyses will determine if the AI intervention significantly alters the mental effort or autonomic stress response during the procedure.

Time frame: During the ultrasound procedure (GSR) and immediately following the procedure (NASA-TLX), approximately 30 minutes in total.

Measurement Deviation

The absolute difference between the participant's EFW and a baseline EFW obtained by an experienced clinician will be modeled to assess whether AI reduces inter-observer variability.

Time frame: The period between the pre-study scan and the study scan.

Experience Threshold for AI-Mediated Accuracy Gains

Estimated interaction effect between operator experience (continuous, experience level) and intervention (AI-assisted vs manual) on mean absolute percentage error (MAPE), and the corresponding experience level at which the adjusted difference in MAPE between groups is not statistically significant.

Time frame: Through study completion, an average of 1 year.
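The record does not specify the statistical model behind the experience-threshold outcome. One common way to express such an interaction, shown here purely as an assumed sketch, is a linear model in which the symbols below are illustrative: AI_i is 1 for the AI-assisted group and 0 for the manual group, and Exp_i is the operator's experience on a continuous scale.

```latex
\mathrm{MAPE}_i = \beta_0 + \beta_1\,\mathrm{AI}_i + \beta_2\,\mathrm{Exp}_i
                + \beta_3\,(\mathrm{AI}_i \times \mathrm{Exp}_i) + \varepsilon_i

\Delta(x) = \beta_1 + \beta_3\,x
```

Under this assumed model, \beta_3 is the interaction effect and \Delta(x) is the adjusted difference in MAPE between groups at experience level x. The threshold described above would then correspond to the value of x at which the confidence interval for \Delta(x) first includes zero.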
