This is a randomized controlled trial (RCT) of 284 outpatient physicians at a large academic health system, randomized 1:1 to an electronic health record (EHR)-produced generative AI outpatient chart summarization tool or a usual-care control group. The 90-day study will observe the effects of the tool before its system-wide rollout.
The primary aim of this study is to evaluate the impact of an EHR-developed generative AI outpatient chart summarization tool on the self-reported physician task load (PTL) score, comparing the tool to a control group. Exploratory outcomes include EHR-derived time metrics (Caboodle and Signal), the Professional Fulfillment Index (PFI), usability (System Usability Scale, SUS), provider satisfaction and productivity, and patient experience item results from CG-CAHPS. We will also evaluate whether AI literacy modifies adoption and the effect of the tool, using the short-form Meta AI Literacy Scale (MAILS). On an exploratory basis, we will also perform adjustments based on provider specialty, access to an ambient-listening AI scribe, panel complexity, provider age group, provider sex, and time-varying effects by month over the study period. Enrolled participants are randomized to one of two groups. Randomization will be stratified by whether the participant has an active AI scribe license, and covariate-constrained randomization will be performed within strata to improve balance on baseline PTL (NASA-TLX-adapted score) and a modified baseline chart review time (Caboodle-derived). Due to the nature of the intervention, participants cannot be blinded to group assignment. The primary purpose of the initiative is to improve quality, efficiency, and business operations at University of California, Los Angeles (UCLA) Health, and it will inform the operational implementation of the tool across all providers within the UCLA Health System. Nevertheless, the UCLA study team plans to rigorously examine and publish the impact of this intervention across the health system, which is why the study team pre-registered the initiative.
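The stratified, covariate-constrained randomization described above can be illustrated with a minimal Python sketch. This is not the study team's procedure: the imbalance score (sum of squared standardized mean differences), the number of candidate allocations, the fraction retained, and the field names `ai_scribe`, `ptl`, and `chart_min` are all assumptions made for illustration.

```python
import random
from statistics import mean, pstdev


def imbalance(alloc, covariates):
    """Sum of squared standardized mean differences between the two arms."""
    score = 0.0
    for values in covariates:
        sd = pstdev(values) or 1.0  # guard against a constant covariate
        tool = [v for v, a in zip(values, alloc) if a == 1]
        ctrl = [v for v, a in zip(values, alloc) if a == 0]
        score += ((mean(tool) - mean(ctrl)) / sd) ** 2
    return score


def constrained_allocation(covariates, rng, n_candidates=1000, keep_fraction=0.1):
    """Draw candidate 1:1 allocations, keep the best-balanced ones, pick one at random."""
    n = len(covariates[0])
    base = [1] * (n // 2) + [0] * (n - n // 2)
    candidates = []
    for _ in range(n_candidates):
        alloc = base[:]
        rng.shuffle(alloc)
        candidates.append((imbalance(alloc, covariates), alloc))
    candidates.sort(key=lambda c: c[0])
    keep = max(1, int(n_candidates * keep_fraction))  # assumed acceptance threshold
    return rng.choice(candidates[:keep])[1]


def randomize(participants, seed=0):
    """Return {participant id: arm}, with arm 1 = tool and arm 0 = usual care."""
    rng = random.Random(seed)
    arms = {}
    for has_scribe in (True, False):  # stratify by AI scribe license
        stratum = [p for p in participants if p["ai_scribe"] == has_scribe]
        if len(stratum) < 2:
            # Too small to balance; fall back to a simple coin flip.
            for p in stratum:
                arms[p["id"]] = rng.randint(0, 1)
            continue
        covariates = [
            [p["ptl"] for p in stratum],        # baseline PTL score
            [p["chart_min"] for p in stratum],  # baseline chart review time
        ]
        alloc = constrained_allocation(covariates, rng)
        for p, a in zip(stratum, alloc):
            arms[p["id"]] = a
    return arms
```

Selecting at random from the retained set, rather than always taking the single best-balanced allocation, keeps the assignment random while still constraining imbalance on the baseline covariates.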
Study Type
INTERVENTIONAL
Allocation
RANDOMIZED
Purpose
HEALTH_SERVICES_RESEARCH
Masking
SINGLE
Enrollment
284
Epic's generative AI chart summarization tool summarizes a subset of a patient's notes. Use of the tool is optional; it is intended solely to provide a summary for providers and does not provide clinical decision support. The system automatically selects recent notes, or a provider can manually select specific notes of interest. The number of notes summarized is limited by the character constraints of the EHR to 24,000 English characters or 30 notes. The system uses AI to generate a short summary of relevant information. The summaries are meant to be used as a tool to aid providers and are not intended to be placed in clinical notes. The summaries created are currently not stored in the patient's chart.
University of California, Los Angeles
Los Angeles, California, United States
Change from Baseline Physician Task Load
Physician task load (PTL), adapted from the NASA Task Load Index (TLX), is a validated tool for assessing EHR-related cognitive task load in four subscales (mental demand, temporal demand, physical demand, and effort). This outcome is adapted to capture the task of pre-charting, defined for this study as the practice of reviewing patient information in the EHR before a patient visit to prepare for the encounter. Each subscale is rated from 0 (low) to 100 (high), and the ratings are aggregated into a total score from 0 to 400. No patient-level information will be collected for this outcome measure.
Time frame: Baseline and 90 days after initial exposure to the intervention
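As a minimal sketch of the PTL scoring described above, the following Python assumes the 0-400 total is the simple sum of the four 0-100 subscale ratings and that change is computed as follow-up minus baseline. The function and key names are hypothetical.

```python
SUBSCALES = ("mental_demand", "temporal_demand", "physical_demand", "effort")


def ptl_total(ratings):
    """Aggregate four 0-100 subscale ratings into a 0-400 PTL total (assumed sum)."""
    for name in SUBSCALES:
        if name not in ratings:
            raise ValueError(f"missing subscale: {name}")
        if not 0 <= ratings[name] <= 100:
            raise ValueError(f"{name} must be between 0 and 100")
    return sum(ratings[name] for name in SUBSCALES)


def ptl_change(baseline, followup):
    """Change from baseline; negative values mean lower task load at follow-up."""
    return ptl_total(followup) - ptl_total(baseline)


baseline = {"mental_demand": 70, "temporal_demand": 80, "physical_demand": 20, "effort": 60}
day_90 = {"mental_demand": 55, "temporal_demand": 65, "physical_demand": 20, "effort": 50}
print(ptl_total(baseline), ptl_change(baseline, day_90))  # 230 -40
```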
Change in Modified Total Chart Time Per Encounter
Using Caboodle, Epic's enterprise data warehouse, we will use a customized metric to measure clinician time spent reviewing the patient's chart. Based on internal validation where clinicians could access the chart summarization tool, the metric includes Caboodle Tier 1 activities corresponding to Clinical Review (all activities), Documentation limited to pre-charting activity only, and Other limited to Navigator-related activity, as well as the activity dedicated to the chart summarization tool. All note-writing, order entry, and encounter-signing activities are excluded. Time post checkout is also excluded. Change will be assessed relative to a 6-month retrospective baseline. No patient level information will be collected for this outcome measure.
Time frame: Baseline, after 60 days of exposure to the intervention, and after 90 days of exposure to the intervention.
Change from Baseline Professional Fulfillment Index Score
The Professional Fulfillment Index (PFI) is a validated 16-item instrument that uses a 5-point Likert scale (0-4) to measure professional fulfillment, work exhaustion, and interpersonal disengagement. Burnout is reported based on the combined results of the work exhaustion and interpersonal disengagement subscales. A higher score indicates a greater level of exhaustion. No patient-level information will be collected for this outcome measure.
Time frame: Baseline and 90 days after initial exposure to the intervention
Change from Baseline Self-Reported Pre-Charting Effectiveness
Providers will answer questions regarding self-reported effectiveness and efficiency in pre-charting. The questions use a 5-point Likert scale to measure these metrics, with a higher score indicating greater self-reported efficiency and effectiveness.
Time frame: Baseline and 90 days from initial exposure to the intervention
Provider Satisfaction Scores
A self-reported satisfaction survey that includes the physician-reported effect of the chart summarization tool on pre-charting efficiency and effectiveness, as well as potential unintended consequences. No patient-level information will be collected for this outcome.
Time frame: 90 days after initial exposure to the intervention
System Usability Scale
The system usability scale (SUS) is a ten-item questionnaire that uses a 5-point Likert scale to measure different aspects of system usability. The total scale ranges from 0-100. A higher SUS score indicates higher usability.
Time frame: 90 days from initial exposure to the intervention
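The registry entry does not state how the 0-100 SUS total is derived. A minimal sketch follows, assuming the conventional SUS scoring: responses coded 1 to 5, odd-numbered items contributing (response - 1), even-numbered items contributing (5 - response), and the sum multiplied by 2.5.

```python
def sus_score(responses):
    """Convert ten 1-5 Likert responses (item 1 first) into a 0-100 SUS score."""
    if len(responses) != 10:
        raise ValueError("SUS requires exactly 10 item responses")
    total = 0
    for item, response in enumerate(responses, start=1):
        if response not in (1, 2, 3, 4, 5):
            raise ValueError(f"item {item}: response must be an integer from 1 to 5")
        # Odd items are positively worded, even items negatively worded.
        total += (response - 1) if item % 2 == 1 else (5 - response)
    return total * 2.5


print(sus_score([4, 2, 4, 2, 4, 2, 4, 2, 4, 2]))  # 75.0
```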
Safety Events
Clinician-reported safety of AI-generated chart summaries, including perceived frequency of clinically significant errors and occurrence of major safety events during the intervention period. No patient-level information will be collected for this outcome.
Time frame: Over the 90-day period following initial exposure to the intervention
Qualitative Tool-Specific EHR Feedback
Aggregated real-time feedback submitted by clinicians through the EHR-native summarization interface during the intervention period, including ratings of accuracy, completeness, and hallucinations. No patient-level information will be collected for this outcome.
Time frame: Over the 90 days following initial exposure to the intervention
Change from Baseline Consumer Assessment of Healthcare Providers and Systems Clinician & Group Survey (CG-CAHPS) Metric
The Consumer Assessment of Healthcare Providers and Systems Clinician & Group Survey (CG-CAHPS) is a standardized patient feedback survey measuring experiences with providers. We will examine changes in the CG-CAHPS mean scores compared to 6 months prior to the intervention for the question "In the last 6 months, how often did this doctor seem to know the important information about your medical history?"
Time frame: Baseline and 90 days after initial exposure to the intervention
Change from Baseline Epic Signal (Activity) Metric: Time Outside Scheduled Hours
We will examine change from a retrospective baseline (6 months prior to enrollment) in Signal metrics, including time outside scheduled hours per scheduled day. These data will be used to determine how a provider's time is utilized in the EHR. No patient-level information will be collected for this outcome measure.
Time frame: Baseline and 90 days after the initial exposure to the intervention
Clinician Relative Value Units (RVUs) per Week
Average relative value units (RVUs) per week during intervention months 2 and 3, adjusted for baseline (6-month pre-intervention period).
Time frame: 60 days after exposure to the intervention, and 90 days after exposure to the intervention