This study will test whether artificial intelligence (AI) can help doctors diagnose a rare blood cancer called acute promyelocytic leukemia (APL) more quickly and accurately. Doctors usually examine bone marrow samples under a microscope to make this diagnosis, but doing so can be challenging and time-consuming. In this study, doctors will review bone marrow samples under three different conditions:

* Unaided Review: no AI assistance.
* AI as Double-Check: the AI-generated evaluation is shown after the doctor makes an initial decision.
* AI as First Look: the AI-generated evaluation is shown at the start of the review.

Doctors will be randomly assigned to different orders of these three conditions. This design will allow us to compare how AI support affects diagnostic accuracy, speed, and confidence.
This study aims to evaluate the effect of artificial intelligence (AI) assistance on clinicians' diagnostic performance in detecting acute promyelocytic leukemia (APL) using Wright-Giemsa-stained bone marrow whole-slide images (WSIs). The Leukemia End-to-End Analysis Platform (LEAP) will serve as the AI model under assessment. This is a single-session, within-reader study. Participants will be randomly assigned to one of two study arms, which differ in the order of diagnostic blocks:

* Arm 1 (X -> Y):
  * Block X (Unaided Review): Clinicians review WSIs without AI support. Diagnostic accuracy, time to decision, and confidence will be recorded.
  * Block Y (AI-Assisted Review): Comprises two sub-blocks presented in randomized order:
    * Y1 (AI as Double-Check): Clinicians provide an initial diagnosis and confidence score without the aid of AI. AI predictions are then revealed, and clinicians may revise their diagnosis. Both pre-AI and post-AI decisions will be recorded.
    * Y2 (AI as First Look): Clinicians review WSIs with AI-predicted diagnoses visible from the beginning.
* Arm 2 (Y -> X):
  * Block Y (AI-Assisted Review): Sub-blocks Y1 and Y2 presented in randomized order.
  * Block X (Unaided Review): As described above.

Each clinician will review 102 de-identified WSIs. For each reader, slides will be randomly divided into three disjoint subsets (e.g., 34/34/34), stratified by APL status, and assigned to Block X (Unaided), Block Y1 (AI as Double-Check), or Block Y2 (AI as First Look). No slide will be shown to the same reader in more than one block. In addition, the AI system will independently generate diagnostic predictions for all WSIs to enable benchmarking; however, this does not constitute a participant arm. Ground-truth diagnoses will be determined by molecular confirmation and expert consensus.
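As an illustration only (not part of the registered protocol), the per-reader allocation described above can be sketched in Python. The function name, arguments, and round-robin dealing scheme below are assumptions; any stratified procedure that yields three disjoint, APL-balanced subsets per reader would satisfy the design.

```python
import random

def split_slides(slide_ids, apl_status, seed):
    """Split one reader's 102 slides into three disjoint 34-slide subsets
    (X: unaided, Y1: AI as double-check, Y2: AI as first look),
    stratified by APL status."""
    rng = random.Random(seed)  # per-reader seed for reproducibility
    subsets = {"X": [], "Y1": [], "Y2": []}
    # Shuffle within each stratum, then deal round-robin so each subset
    # receives an equal share of APL and non-APL slides.
    for status in (True, False):
        stratum = [s for s in slide_ids if apl_status[s] == status]
        rng.shuffle(stratum)
        for i, slide in enumerate(stratum):
            subsets[("X", "Y1", "Y2")[i % 3]].append(slide)
    return subsets
```

Because the subsets are disjoint per reader, no slide appears in more than one block for the same clinician, as the protocol requires.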
Study Type
INTERVENTIONAL
Allocation
RANDOMIZED
Purpose
DIAGNOSTIC
Masking
TRIPLE
Enrollment
10
Readers first complete Block X (Unaided) on their assigned subset SX (34 slides). They then complete Block Y (AI-Assisted) on two separate subsets: SY1 (34 slides; AI as Double-Check) and SY2 (34 slides; AI as First Look). Within Block Y, the order of Y1 and Y2 is randomized. For each reader, SX, SY1, and SY2 are disjoint and stratified by APL status.
Readers first complete Block Y (AI-Assisted) on two assigned subsets: SY1 (34 slides; AI as Double-Check) and SY2 (34 slides; AI as First Look), with the order of Y1 and Y2 randomized. They then complete Block X (Unaided) on subset SX (34 slides). For each reader, SX, SY1, and SY2 are disjoint and stratified by APL status.
Harvard Medical School
Boston, Massachusetts, United States
Diagnostic performance of APL detection
Performance of clinicians (unaided and AI-assisted) in detecting APL, measured by accuracy, sensitivity, specificity, positive predictive value, and negative predictive value.
Time frame: Periprocedural (at the time of slide review)
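For reference (illustrative only, not part of the protocol), the five listed metrics follow directly from the per-reader confusion counts, where "positive" means an APL call; the helper below is a minimal sketch.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard binary diagnostic performance metrics from confusion counts
    (tp/fp/tn/fn = true/false positives and negatives)."""
    total = tp + fp + tn + fn
    return {
        "accuracy": (tp + tn) / total,
        "sensitivity": tp / (tp + fn),  # true positive rate
        "specificity": tn / (tn + fp),  # true negative rate
        "ppv": tp / (tp + fp),          # positive predictive value
        "npv": tn / (tn + fn),          # negative predictive value
    }
```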
Time to diagnosis
Average time (seconds per case) required to finalize a diagnosis.
Time frame: Periprocedural (at the time of slide review)
Inter-observer variability
Agreement among clinicians across conditions, measured using inter-rater reliability metrics (e.g., kappa statistics).
Time frame: Periprocedural (at the time of slide review)
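As an illustrative sketch (the protocol names kappa statistics as an example metric but does not prescribe an implementation), Cohen's kappa for two raters' binary APL calls can be computed as:

```python
def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters' binary (1 = APL, 0 = non-APL) calls
    on the same set of cases."""
    n = len(ratings_a)
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Expected chance agreement from each rater's marginal positive rate.
    pa = sum(ratings_a) / n
    pb = sum(ratings_b) / n
    expected = pa * pb + (1 - pa) * (1 - pb)
    return (observed - expected) / (1 - expected)
```

For more than two readers, a multi-rater statistic such as Fleiss' kappa would be the analogous choice.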
Concordance between AI predictions and clinicians' diagnoses
The proportion of cases in which AI predictions match clinicians' decisions in each study condition.
Time frame: Periprocedural (at the time of slide review)
Decision-change rates
The proportion of cases in which a clinician's initial diagnosis is revised after exposure to AI assistance.
Time frame: Periprocedural (at the time of slide review)
Net benefit after AI exposure
The overall change in diagnostic accuracy attributable to AI assistance, i.e., the rate of incorrect-to-correct revisions minus the rate of correct-to-incorrect revisions after AI exposure.
Time frame: Periprocedural (at the time of slide review)
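Illustratively (function name and interface are assumptions, not protocol specifications), a net-benefit rate of this kind can be computed from per-case pre-AI and post-AI correctness flags:

```python
def net_benefit(pre_correct, post_correct):
    """Net accuracy change after AI exposure: the rate of
    incorrect->correct revisions minus the rate of
    correct->incorrect revisions, over all reviewed cases."""
    n = len(pre_correct)
    gained = sum((not a) and b for a, b in zip(pre_correct, post_correct))
    lost = sum(a and (not b) for a, b in zip(pre_correct, post_correct))
    return (gained - lost) / n
```

A positive value indicates that AI assistance corrected more diagnoses than it degraded.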
Clinician confidence level
Self-reported diagnostic confidence recorded for each case on a 5-point scale: 5 - Absolutely Certain; 4 - Mostly Certain; 3 - Unsure; 2 - Very Doubtful; 1 - Random Guess (5 = highest confidence, 1 = lowest).
Time frame: Periprocedural (at the time of slide review)