This trial tests whether AI can help make medical information clear and readable. Many patients struggle to find medical information from verified sources that is easy to read and understand. The study tests whether an AI tool can help health providers write clear text for patients faster than they do now. Health providers are split at random into two groups: one uses the AI tool and one does not. The trial measures how clear the text is, how correct it is, and how much time is saved. The aim is to see whether AI can close the gap between complex research and what patients can grasp.
This study evaluates whether a generative artificial intelligence (AI) tool can improve the readability and accessibility of lay summaries derived from scientific medical abstracts. Many patients encounter difficulty understanding medical literature due to technical language and complexity, which can limit informed decision-making and engagement with healthcare information. The BRIDGE-AI (Provider Perspective) initiative aims to address this gap by enabling healthcare professionals and researchers to generate patient-friendly summaries of scientific content using AI-assisted tools. The intervention leverages a generative AI framework (pub2people) designed to translate complex medical terminology into language that is understandable to a general audience. In this randomized controlled study, participants with experience in scientific publishing will be assigned to either an AI-assisted group or a control group using conventional methods. Participants will be asked to transform scientific abstracts into layperson-friendly summaries. The study compares AI-assisted and manually generated outputs in terms of readability, accuracy, and efficiency. The primary objective is to determine whether AI-assisted generation improves the readability of lay summaries compared to standard approaches. Secondary objectives include evaluating the accuracy of AI-generated summaries relative to source material and assessing potential time savings associated with AI use. This study contributes to ongoing efforts to improve health communication by evaluating scalable tools that may enhance the translation of complex medical information into patient-accessible formats.
Study Type
INTERVENTIONAL
Allocation
RANDOMIZED
Purpose
HEALTH_SERVICES_RESEARCH
Masking
SINGLE
Enrollment
120
Pub2Post, a generative artificial intelligence agent that helps draft layperson abstracts and summaries
University of Southern California
Los Angeles, California, United States
Readability Change
Flesch Reading Ease Score Description: Measures text readability based on sentence length and word syllables. Scale: 0 to 100 Interpretation: Higher scores indicate easier readability (better outcome).
Time frame: The assessment will be conducted immediately after the study closes, which will occur 4 weeks after enrollment.
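For reference, the standard Flesch Reading Ease formula can be sketched as below. The record does not say which software or syllable counter the study uses, so this function takes raw counts as inputs; the function and parameter names are illustrative assumptions. Note that the raw formula is not strictly bounded to 0 to 100, so very simple or very dense text can fall outside the stated scale.

```python
def flesch_reading_ease(total_words: int, total_sentences: int, total_syllables: int) -> float:
    """Standard Flesch Reading Ease from raw counts (higher = easier to read)."""
    if total_words <= 0 or total_sentences <= 0:
        raise ValueError("text must contain at least one word and one sentence")
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word


# Hypothetical counts for one lay summary: 100 words, 5 sentences, 150 syllables.
score = flesch_reading_ease(100, 5, 150)
```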
Readability Change
Flesch-Kincaid Grade Level Description: Estimates U.S. school grade level required to understand the text. Scale: Typically ranges from ~0 to 18+ Interpretation: Lower scores indicate easier readability (better outcome).
Time frame: The assessment will be conducted immediately after the study closes, which will occur 4 weeks after enrollment.
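A minimal sketch of the standard Flesch-Kincaid Grade Level formula, assuming word, sentence, and syllable counts are already available. The record does not name the scoring tool, and the function name here is hypothetical.

```python
def flesch_kincaid_grade(total_words: int, total_sentences: int, total_syllables: int) -> float:
    """Standard Flesch-Kincaid Grade Level from raw counts (lower = easier to read)."""
    if total_words <= 0 or total_sentences <= 0:
        raise ValueError("text must contain at least one word and one sentence")
    words_per_sentence = total_words / total_sentences
    syllables_per_word = total_syllables / total_words
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


# Hypothetical counts: 100 words, 5 sentences, 150 syllables.
grade = flesch_kincaid_grade(100, 5, 150)
```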
Readability Change
Gunning Fog Index Description: Estimates years of formal education needed to understand the text on first reading. Scale: Typically 0 to 20+ Interpretation: Lower scores indicate easier readability (better outcome).
Time frame: The assessment will be conducted immediately after the study closes, which will occur 4 weeks after enrollment.
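The Gunning Fog Index is conventionally computed from average sentence length and the share of complex words (usually words of three or more syllables). The sketch below assumes those counts are supplied by the caller; how the study identifies complex words is not stated in the record.

```python
def gunning_fog(total_words: int, total_sentences: int, complex_words: int) -> float:
    """Standard Gunning Fog Index from raw counts (lower = easier to read)."""
    if total_words <= 0 or total_sentences <= 0:
        raise ValueError("text must contain at least one word and one sentence")
    words_per_sentence = total_words / total_sentences
    percent_complex = 100 * complex_words / total_words
    return 0.4 * (words_per_sentence + percent_complex)


# Hypothetical counts: 100 words, 5 sentences, 10 complex words.
fog = gunning_fog(100, 5, 10)
```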
Readability Change
SMOG Index (Simple Measure of Gobbledygook) Description: Estimates years of education required to comprehend the text. Scale: Typically 0 to 20+ Interpretation: Lower scores indicate easier readability (better outcome).
Time frame: The assessment will be conducted immediately after the study closes, which will occur 4 weeks after enrollment.
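The commonly cited SMOG grade formula normalizes the polysyllabic word count (words of three or more syllables) to a 30-sentence sample. The sketch below assumes the counts are given; the record does not describe the sampling or tooling used, and the function name is illustrative.

```python
import math


def smog_index(polysyllable_count: int, total_sentences: int) -> float:
    """SMOG grade from the polysyllabic word count and sentence count (lower = easier)."""
    if total_sentences <= 0:
        raise ValueError("text must contain at least one sentence")
    # Scale the polysyllable count to a 30-sentence sample before taking the root.
    normalized = polysyllable_count * (30 / total_sentences)
    return 1.0430 * math.sqrt(normalized) + 3.1291


# Hypothetical counts: 16 polysyllabic words across 30 sentences.
smog = smog_index(16, 30)
```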
Readability Change
Coleman-Liau Index Description: Readability formula based on characters per word and sentence length. Scale: Typically 0 to 18+ (grade level equivalent) Interpretation: Lower scores indicate easier readability (better outcome).
Time frame: The assessment will be conducted immediately after the study closes, which will occur 4 weeks after enrollment.
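A sketch of the standard Coleman-Liau formula, which uses letters and sentences per 100 words rather than syllables. Counts are passed in directly because the record does not specify how text is tokenized; the names below are assumptions for illustration.

```python
def coleman_liau_index(total_letters: int, total_words: int, total_sentences: int) -> float:
    """Standard Coleman-Liau Index from raw counts (lower = easier to read)."""
    if total_words <= 0:
        raise ValueError("text must contain at least one word")
    letters_per_100_words = 100 * total_letters / total_words
    sentences_per_100_words = 100 * total_sentences / total_words
    return 0.0588 * letters_per_100_words - 0.296 * sentences_per_100_words - 15.8


# Hypothetical counts: 450 letters, 100 words, 5 sentences.
cli = coleman_liau_index(450, 100, 5)
```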
Readability Change
Automated Readability Index (ARI) Description: Estimates grade level required for comprehension using characters and word counts. Scale: Typically 0 to 14+ Interpretation: Lower scores indicate easier readability (better outcome).
Time frame: The assessment will be conducted immediately after the study closes, which will occur 4 weeks after enrollment.
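The standard ARI formula combines characters per word with words per sentence. The sketch below assumes character, word, and sentence counts are precomputed; the record does not state which implementation the study relies on.

```python
def automated_readability_index(total_characters: int, total_words: int, total_sentences: int) -> float:
    """Standard Automated Readability Index from raw counts (lower = easier to read)."""
    if total_words <= 0 or total_sentences <= 0:
        raise ValueError("text must contain at least one word and one sentence")
    characters_per_word = total_characters / total_words
    words_per_sentence = total_words / total_sentences
    return 4.71 * characters_per_word + 0.5 * words_per_sentence - 21.43


# Hypothetical counts: 450 characters, 100 words, 5 sentences.
ari = automated_readability_index(450, 100, 5)
```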
Time Saving
This outcome evaluates the time savings achieved by using generative AI (GAI) compared with traditional methods for generating layperson abstracts and summaries. Time will be recorded in hours, minutes, and seconds and reported in minutes. We will collect and compare the total time spent drafting the complete layperson abstract and summaries, as well as the time spent on each individual section: background, methods, results, conclusion, and short summaries. The comparison will be made between summaries created by humans alone and those created with GAI assistance.
Time frame: The assessment will be conducted immediately after the study closes, which will occur 4 weeks after enrollment.
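As an illustration of the time-saving measure, the sketch below converts a recorded duration to minutes and compares group averages. The record does not specify the statistical comparison, so the difference in mean minutes is only an assumed summary; all names and sample values are hypothetical.

```python
from statistics import mean


def to_minutes(hours: int, minutes: int, seconds: int) -> float:
    """Convert a recorded duration (hours, minutes, seconds) to minutes."""
    return hours * 60 + minutes + seconds / 60


def mean_time_saved(manual_minutes: list[float], ai_minutes: list[float]) -> float:
    """Mean drafting time without AI minus mean drafting time with AI, in minutes."""
    return mean(manual_minutes) - mean(ai_minutes)


# Hypothetical drafting times for two participants per group.
manual = [to_minutes(0, 30, 0), to_minutes(0, 40, 0)]
assisted = [to_minutes(0, 20, 0), to_minutes(0, 25, 0)]
saved = mean_time_saved(manual, assisted)
```

The same comparison could be repeated per section (background, methods, results, conclusion, short summary) by passing section-level times instead of totals.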
Correctness and meaning retention
Accuracy Score of Layperson Abstract Sections Description: Degree to which each section (Background, Methods, Results, Conclusion, Short Summary) reflects key information from the original scientific abstract. Scale: 5-point Likert scale (1 = very inaccurate, 5 = highly accurate) Assessment Method: Two independent reviewers score each section Interpretation: Higher scores indicate better accuracy (better outcome).
Time frame: The assessment will be conducted immediately after the study closes, which will occur 4 weeks after enrollment.
Correctness and meaning retention
Completeness Score of Layperson Abstract Sections Description: Extent to which essential information from the original abstract is included in each section. Scale: 5-point Likert scale (1 = very incomplete, 5 = fully complete) Assessment Method: Two independent reviewers evaluate each section. Interpretation: Higher scores indicate greater completeness (better outcome).
Time frame: The assessment will be conducted immediately after the study closes, which will occur 4 weeks after enrollment.
Correctness and meaning retention
Clarity Score for Layperson Readability Description: Evaluates simplicity, avoidance of jargon, and coherence for lay audiences. Scale: 5-point Likert scale (1 = very unclear, 5 = very clear and understandable) Assessment Method: Two independent reviewers evaluate each section. Interpretation: Higher scores indicate better clarity (better outcome).
Time frame: The assessment will be conducted immediately after the study closes, which will occur 4 weeks after enrollment.
Correctness and meaning retention
Section-Level Correctness Rate Description: Proportion of sections rated as "correct," defined as receiving scores ≥4 from both reviewers on accuracy, completeness, and clarity simultaneously. Scale: 0 to 1 (proportion) or 0% to 100% Interpretation: Higher values indicate better overall section quality.
Time frame: The assessment will be conducted immediately after the study closes, which will occur 4 weeks after enrollment.
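The correctness definition above (scores of 4 or higher from both reviewers on accuracy, completeness, and clarity) can be sketched as follows. The data layout and names are assumptions for illustration, not the study's actual scoring code.

```python
from dataclasses import dataclass


@dataclass
class SectionRatings:
    """Likert scores (1-5) from reviewer 1 and reviewer 2 for one section."""
    accuracy: tuple[int, int]
    completeness: tuple[int, int]
    clarity: tuple[int, int]


def is_correct(ratings: SectionRatings, threshold: int = 4) -> bool:
    """A section is correct only if every score from both reviewers meets the threshold."""
    pairs = (ratings.accuracy, ratings.completeness, ratings.clarity)
    return all(score >= threshold for pair in pairs for score in pair)


def correctness_rate(sections: list[SectionRatings]) -> float:
    """Proportion of sections (0 to 1) rated correct."""
    if not sections:
        raise ValueError("at least one rated section is required")
    return sum(is_correct(s) for s in sections) / len(sections)


# Hypothetical ratings for three sections; the second fails on one clarity score.
rated = [
    SectionRatings((5, 4), (4, 4), (5, 5)),
    SectionRatings((4, 4), (5, 4), (3, 4)),
    SectionRatings((4, 5), (4, 4), (4, 4)),
]
rate = correctness_rate(rated)
```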
Correctness and meaning retention
Hallucination Rate Description: Frequency of false or misleading content in generated layperson abstracts, defined as information not supported by the original abstract. Scale: Proportion of sections or documents containing hallucinations (0 to 1 or %) Assessment Method: Evaluated separately by reviewers using predefined criteria. Interpretation: Lower values indicate better performance (fewer hallucinations).
Time frame: The assessment will be conducted immediately after the study closes, which will occur 4 weeks after enrollment.
Perceived Task Difficulty
Description: Participant-reported difficulty of completing the lay abstract summarization task. Scale: 5-point Likert scale (1 = very easy, 5 = very difficult) (adjust anchors if different in your instrument) Assessment Timing: Immediately after task completion Interpretation: Lower scores indicate less perceived difficulty (better outcome)
Time frame: The assessment will be conducted immediately after the study closes, which will occur 4 weeks after enrollment
Perceived Task Duration
Description: Participant perception of time required to complete the task. Scale: 5-point Likert scale (1 = very short, 5 = very long) Assessment Timing: Post-task Interpretation: Lower scores indicate shorter perceived duration (better outcome)
Time frame: The assessment will be conducted immediately after the study closes, which will occur 4 weeks after enrollment
Perceived Helpfulness of the intervention
Description: Participant-reported usefulness of the generative AI tool in assisting lay abstract creation. Scale: 5-point Likert scale (1 = not helpful at all, 5 = extremely helpful) Assessment Timing: Post-task Interpretation: Higher scores indicate greater perceived helpfulness (better outcome)
Time frame: The assessment will be conducted immediately after the study closes, which will occur 4 weeks after enrollment
System Usability Scale (SUS) Score
Description: Standardized assessment of system usability using the System Usability Scale. Scale: 0 to 100 Interpretation: Higher scores indicate better usability (70: acceptable usability; 90: superior usability).
Time frame: Immediately after completing the system/task (post-use assessment)
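For context, the conventional scoring of the 10-item SUS questionnaire yields the 0 to 100 range described above. The sketch below assumes the standard instrument (ten items answered 1 to 5, with odd items positively worded and even items negatively worded); the record does not confirm the exact version used.

```python
def sus_score(responses: list[int]) -> float:
    """Conventional SUS score (0-100) from ten item responses on a 1-5 scale."""
    if len(responses) != 10 or any(r < 1 or r > 5 for r in responses):
        raise ValueError("SUS needs exactly ten responses, each between 1 and 5")
    total = 0
    for item_number, response in enumerate(responses, start=1):
        if item_number % 2 == 1:
            total += response - 1  # odd items: positively worded
        else:
            total += 5 - response  # even items: negatively worded
    return total * 2.5


# Hypothetical participant who answers 3 (neutral) to every item.
neutral = sus_score([3] * 10)
```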
Perceived Usefulness (Technology Acceptance Model)
Description: Degree to which participants believe the GAI tool enhances task performance. Scale: Likert scale (typically 1-5 or 1-7; specify exact instrument version) Interpretation: Higher scores indicate greater perceived usefulness (better outcome)
Time frame: Immediately after completing the system/task (post-use assessment)
Perceived Ease of Use (Technology Acceptance Model)
Description: Degree to which participants find the GAI tool easy to use. Scale: Likert scale (typically 1-5 or 1-7; must match instrument used) Interpretation: Higher scores indicate greater ease of use (better outcome)
Time frame: Immediately after completing the system/task (post-use assessment)