A parallel-group randomized controlled trial using a superiority framework. Clinical vignettes will be used to assess the impact of a large language model on physicians' clinical reasoning. Quantitative analyses will be performed on graded vignette responses.
This study is a multi-country, parallel-group randomized controlled trial designed to evaluate whether access to a large language model (LLM) improves physician clinical decision-making. The trial uses a superiority framework and compares physicians randomized to complete standardized clinical vignettes either with access to GPT-4o or without any AI assistance. The clinical vignettes simulate common primary care presentations, including cardiovascular, respiratory, musculoskeletal, fatigue-related, and infectious conditions. Each vignette includes multiple steps in the clinical reasoning process, from initial history-taking to diagnosis, treatment, and follow-up. Physician responses are graded using rubrics developed from evidence-based, context-specific best-practice guidelines. The study is conducted across three countries (Indonesia, Kenya, and the Netherlands), representing different income levels and health system contexts. The primary outcome is performance on clinical vignettes, defined as adherence to best-practice guidelines. Secondary objectives include examining cross-country variation in physician performance, variation in performance distributions, and the role of engagement with the LLM in shaping outcomes.
Study Type
INTERVENTIONAL
Allocation
RANDOMIZED
Purpose
DIAGNOSTIC
Masking
SINGLE
Enrollment
249
GPT-4o provided via an iFrame in the online Qualtrics environment
Universitas Indonesia
Jakarta, Indonesia
Aga Khan University Hospital
Nairobi, Kenya
Maastricht University
Maastricht, Netherlands
Percentage Correct Score
Following Peabody et al. (2000), the primary outcome is a percentage correct score across all steps in a vignette. It is generated by dividing the weighted sum of rubric items assessed as present by the total number of rubric items possible in the vignette. Rubric items are weighted with regard to their relevance by our expert panel.
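The scoring rule above can be sketched in a few lines. This is a minimal illustration, not the study's actual grading code; the representation of rubric items as (weight, present) pairs and the use of the item count as the denominator are assumptions based on the description above.

```python
def percentage_correct(items):
    """Percentage correct score for one vignette.

    items: list of (weight, present) tuples, one per rubric item,
    where `present` indicates the item was assessed as present.
    Assumption: denominator is the number of rubric items possible.
    """
    weighted_present = sum(w for w, present in items if present)
    return 100 * weighted_present / len(items)

# Hypothetical vignette with three rubric items, two marked present:
score = percentage_correct([(1, True), (0.5, True), (0.33, False)])
# 100 * (1 + 0.5) / 3 = 50.0
```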
Time frame: During evaluation
Quality Per Answer
This outcome is generated as the average weight of rubric items assessed as present across vignettes. As each item is assigned a weight (0.33, 0.5, or 1), the average weight is the sum of the weights of items marked as present divided by the number of answers marked as present.
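As a hedged sketch of this secondary outcome, using the same hypothetical (weight, present) representation as above:

```python
def quality_per_answer(items):
    """Average weight of rubric items assessed as present.

    items: list of (weight, present) tuples for one vignette.
    Assumption: only items marked present enter numerator and denominator.
    """
    present_weights = [w for w, present in items if present]
    return sum(present_weights) / len(present_weights)

# Hypothetical example: weights 1 and 0.5 marked present, 0.33 absent:
quality = quality_per_answer([(1, True), (0.5, True), (0.33, False)])
# (1 + 0.5) / 2 = 0.75
```

A higher value indicates that the answers given were, on average, the more relevant ones as judged by the expert panel's weights.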
Time frame: During evaluation
Number of Answers
This outcome is generated as the count of the total number of answers assessed as present by reviewers per vignette.
Time frame: During evaluation
Less obvious answers
This outcome is generated as the number of answers given that are less obvious, i.e., mentioned less frequently by the control group. An answer is considered less obvious if it is mentioned by 25% or fewer of control-group physicians.
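The 25% threshold rule can be illustrated as follows. The mapping from answers to control-group mention frequencies, and the inclusive treatment of exactly 25%, are assumptions for this sketch rather than details from the protocol.

```python
def less_obvious_count(physician_answers, control_frequency, threshold=0.25):
    """Count answers mentioned by <= `threshold` of the control group.

    physician_answers: list of answer labels given by one physician.
    control_frequency: dict mapping answer label -> fraction of the
    control group that mentioned it (unseen answers default to 0.0).
    """
    return sum(
        1 for answer in physician_answers
        if control_frequency.get(answer, 0.0) <= threshold
    )

# Hypothetical frequencies: "pericarditis" is rarely mentioned (10%),
# "GERD" is common (60%), so only the first counts as less obvious:
count = less_obvious_count(
    ["pericarditis", "GERD"],
    {"pericarditis": 0.10, "GERD": 0.60},
)
# count == 1
```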
Time frame: During evaluation