This is a three-arm pragmatic randomized controlled trial (RCT) of 238 outpatient physicians at a large academic health system, randomized 1:1:1 to one of two AI scribe tools or a usual-care control group. The two-month study will observe and compare the effects of each tool before the system-wide rollout of the selected tool (anticipated Spring 2025). We will use covariate-constrained randomization to balance the arms on physician baseline time in notes, survey-measured level of burnout, and clinic days per week. The primary purpose of the initiative is to improve quality, efficiency, and business operations at University of California, Los Angeles (UCLA) Health; it is not being done for research purposes. The results of this operational initiative will inform the widespread rollout of AI scribe tools to all providers within the UCLA Health System. Nevertheless, the UCLA study team plans to rigorously examine and publish the impact of this intervention across the health system, which is why the study team pre-registered the initiative.
This study will assess operationally oriented outcomes across all groups. Notably, all groups will eventually receive all interventions in this observational study of a randomized rollout of a quality improvement (QI) initiative. The primary purpose of the initiative is operational: based on its results, one of these tools will eventually be selected and deployed widely across the health system. Enrolled participants are randomized to one of three groups. Randomization was needed to account for secular trends, seasonal and holiday effects in December, and other factors that could confound the relationship between exposure to the AI tools and the outcomes. The primary aim of this study is to evaluate the impact of two ambient AI scribe technologies on the change from baseline in clinicians' time spent on EHR documentation, comparing each scribe to a control group. Secondary objectives include assessing the AI scribes' impact on clinician metrics such as burnout, physician satisfaction, and productivity. Additionally, the study team intends to perform an economic evaluation of the tools to guide business decision making. The study team will also analyze physician-reported effects of the AI tools on patient safety, equity, and any unintended consequences of the initiative.
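The covariate-constrained randomization named above can be illustrated with a minimal sketch. It is not the study team's actual procedure. The balance metric, the number of candidate allocations, the fraction retained, and the synthetic covariate values are all assumptions made for illustration. The general idea is to generate many candidate 1:1:1 allocations, score each for imbalance on the three baseline covariates, keep the best-balanced subset, and pick one allocation at random from that subset.

```python
import random
import statistics


def imbalance(assignment, covariates, n_arms=3):
    """Hypothetical balance metric: for each covariate, sum the squared
    standardized differences between each arm mean and the overall mean."""
    score = 0.0
    for values in covariates:
        sd = statistics.pstdev(values) or 1.0  # guard against zero spread
        overall = statistics.fmean(values)
        for arm in range(n_arms):
            arm_vals = [v for v, a in zip(values, assignment) if a == arm]
            score += ((statistics.fmean(arm_vals) - overall) / sd) ** 2
    return score


def constrained_randomize(covariates, n_arms=3, n_candidates=500,
                          keep_fraction=0.1, seed=0):
    """Draw one allocation at random from the best-balanced candidates.
    n_candidates and keep_fraction are illustrative, not from the protocol."""
    rng = random.Random(seed)
    n = len(covariates[0])
    base = [i % n_arms for i in range(n)]  # near-equal arm sizes (1:1:1)
    scored = []
    for _ in range(n_candidates):
        alloc = base[:]
        rng.shuffle(alloc)
        scored.append((imbalance(alloc, covariates, n_arms), alloc))
    scored.sort(key=lambda pair: pair[0])  # best balance first
    acceptable = scored[: max(1, int(n_candidates * keep_fraction))]
    return rng.choice(acceptable)[1]


# Synthetic stand-ins for the three balancing covariates (not study data).
data_rng = random.Random(1)
n_physicians = 238
time_in_notes = [data_rng.uniform(2, 12) for _ in range(n_physicians)]
burnout_score = [data_rng.randint(10, 50) for _ in range(n_physicians)]
clinic_days = [data_rng.randint(1, 5) for _ in range(n_physicians)]

allocation = constrained_randomize([time_in_notes, burnout_score, clinic_days])
arm_sizes = [allocation.count(arm) for arm in range(3)]
```

Choosing at random from a set of acceptable allocations, rather than always taking the single best one, keeps the assignment random while still limiting chance imbalance.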
Study Type
INTERVENTIONAL
Allocation
RANDOMIZED
Purpose
HEALTH_SERVICES_RESEARCH
Masking
SINGLE
Enrollment
238
AI scribe technologies capture physician-patient conversations to create a transcript, then summarize the transcript in the form of a clinical note. These tools are integrated into the EHR and automatically add the generated text to the provider note. All physicians must inform patients about the recording and obtain their verbal consent, and instances of patients declining to consent are tracked. Nabla uses its proprietary speech-to-text to transform the conversation into a written context, in combination with HIPAA-compliant large language models (LLMs) such as Azure OpenAI's GPT-4. Nabla does not store any audio.
AI scribe technologies capture physician-patient conversations to create a transcript, then summarize the transcript in the form of a clinical note. These tools are integrated into the EHR and automatically add the generated text to the provider note. All physicians must inform patients about the recording and obtain their verbal consent, and instances of patients declining to consent are tracked.
UCLA Health System
Los Angeles, California, United States
Change in the time in notes per note
The primary outcome measure is the change in the time in notes per note in the second month of the trial from a retrospective baseline six months prior to enrollment. This change will be computed on the natural log scale. No patient level information will be collected for this outcome measure.
Time frame: Study month 2
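As a minimal illustration of a change computed on the natural log scale, the sketch below assumes the change is the log of the month-2 value minus the log of the baseline value. The record does not state the exact formula. The function name and the example values are hypothetical.

```python
import math


def log_scale_change(baseline_time_per_note, month2_time_per_note):
    """Assumed definition: ln(month 2) - ln(baseline), i.e. the log ratio."""
    return math.log(month2_time_per_note) - math.log(baseline_time_per_note)


# Hypothetical physician: 6.0 minutes per note at baseline, 4.5 in month 2.
change = log_scale_change(6.0, 4.5)

# Exponentiating a log change recovers the relative (percent) change.
percent_change = (math.exp(change) - 1) * 100
```

A log change is symmetric and multiplicative: a physician going from 6.0 to 4.5 minutes and one going from 12.0 to 9.0 minutes both show the same change, which corresponds to a 25% reduction.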
Provider Burnout Score
The Mini Z 2.0 Survey is a validated 10-item instrument designed to measure key factors influencing workplace satisfaction and burnout among healthcare professionals. Each item is scored on a Likert scale (1-5), with higher scores generally indicating more positive outcomes: greater job satisfaction, sufficient time for electronic medical record documentation, and lower levels of stress. For negatively framed items (e.g., stress due to the job or frustration with the electronic medical record), higher scores indicate lower levels of dissatisfaction. The total score ranges from 10 to 50, with scores ≥40 representing a joyful workplace. No patient level information will be collected for this outcome measure.
Time frame: Study month 2
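The scoring described above can be sketched as follows. This assumes each of the 10 item responses is already oriented so that higher is better, as the description states, so the total is a plain sum with no reverse coding. The function names are hypothetical and this is not the instrument's official scoring code.

```python
def mini_z_total(item_scores):
    """Sum ten item scores (each 1-5) into a total between 10 and 50."""
    if len(item_scores) != 10:
        raise ValueError("expected exactly 10 item scores")
    if any(not 1 <= score <= 5 for score in item_scores):
        raise ValueError("each item score must be between 1 and 5")
    return sum(item_scores)


def is_joyful_workplace(total_score):
    """Apply the threshold from the description: total score of 40 or more."""
    return total_score >= 40


# Hypothetical responses from one physician.
responses = [4, 5, 4, 3, 4, 5, 4, 4, 3, 4]
total = mini_z_total(responses)
joyful = is_joyful_workplace(total)
```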
Provider Task Load Score
Provider task load is measured with an adaptation of the NASA Task Load Index (TLX), a validated tool for assessing perceived workload across six sub-scales: mental demand, physical demand, temporal demand, performance, effort, and frustration. For this study, we adapted the TLX to focus on note-writing workload, including four sub-scales (mental demand, temporal demand, physical demand, and effort) as done previously. Each sub-scale is rated from 0 (low) to 100 (high). No patient level information will be collected for this outcome measure.
Time frame: Study month 2
Provider Professional Fulfillment
The Professional Fulfillment Index (PFI) is a validated 16-item instrument that uses a 5-point Likert scale (0-4) to measure professional fulfillment, work exhaustion, and interpersonal disengagement. For this study, we use the 4-item work exhaustion subscale, where a higher score indicates a greater level of exhaustion. No patient level information will be collected for this outcome measure.
Time frame: Study month 2
Provider Satisfaction Scores
A self-reported satisfaction survey that includes physician-reported effects on note accuracy, patient safety, equity, and other potential unintended consequences. No patient level information will be collected for this outcome measure.
Time frame: Study month 2
Change in clinic turnover rate
The study team will use clinic turnover rate to determine physicians' change in productivity from a retrospective baseline six months prior to enrollment. No patient level information will be collected for this outcome measure.
Time frame: Study month 2
Change in provider RVU
The study team will use physician-level billing information, measured in relative value units (RVUs), to determine physicians' change in productivity from a retrospective baseline six months prior to enrollment. No patient level information will be collected for this outcome measure.
Time frame: Study month 2
Change in EHR Signal (Activity) Data - Pajama Time
We will examine the change from a retrospective baseline six months prior to enrollment in Signal metrics, including pajama time per scheduled day. These data will show how a provider's time is used in the EHR. No patient level information will be collected for this outcome measure.
Time frame: Study month 2
Change in EHR Signal (Activity) Data - Time outside scheduled hours
We will examine the change from a retrospective baseline six months prior to enrollment in Signal metrics, including time outside scheduled hours per scheduled day. These data will show how a provider's time is used in the EHR. No patient level information will be collected for this outcome measure.
Time frame: Study month 2
Change in EHR Signal (Activity) Data - Number of unscheduled days
We will examine the change from a retrospective baseline six months prior to enrollment in Signal metrics, including the number of unscheduled days on which time is spent in the system. These data will show how a provider's time is used in the EHR. No patient level information will be collected for this outcome measure.
Time frame: Study month 2