This study investigates the use of Generative AI (GAI) to support primary care practices in delivering accurate, accessible patient education. With the rise of health misinformation, increasingly complex patient needs, and a strained healthcare workforce, primary care must find new ways to communicate trusted health information effectively. Leveraging the Canadian Primary Care Information Network (CPIN), this study will generate patient education messages on key health topics using both GAI and human content experts. Diverse review panels of patients and providers will assess the messages on quality of information, adaptability, and relevance and usefulness, with special attention to socioeconomic factors that may impact message accessibility. CPIN will recruit a diverse sample of participants to evaluate both GAI- and human-generated messages. Review panels will provide structured feedback via surveys, aiming to identify differences in content quality and effectiveness. The study's goal is to determine whether GAI can produce high-quality health information that meets primary care standards. Results will reveal how GAI tools can support primary care in reducing misinformation and administrative burdens, fostering patient-provider relationships, and improving health equity. Findings will inform best practices for integrating GAI in primary care to ensure accessible, timely patient education across Canada.
Background. The increasing prevalence of health misinformation, complex patient needs, and a strained healthcare workforce necessitate innovative approaches to patient education in primary care. Generative AI (GAI) offers the potential to deliver accurate, accessible health information while reducing administrative burdens. This study explores the use of GAI to support primary care practices in producing trusted, high-quality patient education materials.
Objective. The investigators propose to leverage advances in GAI and their experience with CPIN to provide timely and accurate health information for primary care practices across Canada. The goal is to determine whether GAI can produce education material for primary care that is non-inferior to material produced by experts in primary care and public health.
Methods. The content team for this study will consist of experts specializing in primary care, public health, or health communication. Team members will create digital health messages in two formats: a short text-messaging format (850 characters or fewer) and a one-page handout for patients. A generative AI system will generate messages in the same two formats. The same topics and prompts for message writing will be provided to both the content team and the GAI. Messages will be available in English and French.
To evaluate the generated content, two review panels (one of 25 providers and one of 25 patients) will assess messages created by both human experts and generative AI over the course of 12 months. Each month, panelists will review a total of 16 messages (four topics × four messages per topic), using an evaluation grid that assesses the quality of information, adaptability, and relevance and usefulness of each message. Panelists will be blinded to the message generation source (AI or human). Short messages will be shown first, so that the detailed information in the longer messages does not bias their assessment and the clarity and completeness of the short messages can be judged on their own.
Both providers and patients on the review panels will complete assessments via REDCap surveys. The evaluation grid will be the same for providers and patients and will use a Likert scale from 1 to 4 (1: Strongly disagree; 4: Strongly agree). Specifically, there will be statements on Adaptability (subcategories: Clarity and understandability, Appropriate emotional appeal, Appropriate rational appeal, Tone, and Inclusivity) and on Relevance and Usefulness. Statements on Quality of Information (subcategories: Accuracy, Reliability, and Completeness) will be presented only to providers. Patients will instead be asked at the end whether they noticed any inaccuracies in the message.
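The monthly review procedure described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the study's actual implementation: it assumes the four messages per topic are the two formats (short, long) crossed with the two sources (AI, human), and that order within each format is randomized. The function names, field names, and topic labels are hypothetical.

```python
import random
from itertools import product

SOURCES = ("ai", "human")    # hidden from panelists (blinding)
FORMATS = ("short", "long")  # short messages are shown before long ones


def monthly_review_set(topics, seed=None):
    """Build one panelist's review list for a month.

    Assumption: each topic contributes 2 formats x 2 sources = 4 messages.
    """
    rng = random.Random(seed)
    messages = [
        {"topic": t, "format": f, "source": s}
        for t, f, s in product(topics, FORMATS, SOURCES)
    ]
    # Randomize so that source and topic order carry no pattern ...
    rng.shuffle(messages)
    # ... then stable-sort so every short message precedes every long one.
    messages.sort(key=lambda m: FORMATS.index(m["format"]))
    return messages


def blinded_view(messages):
    """Drop the source field: what a panelist would actually see."""
    return [
        {"position": i + 1, "topic": m["topic"], "format": m["format"]}
        for i, m in enumerate(messages)
    ]


if __name__ == "__main__":
    topics = ["topic A", "topic B", "topic C", "topic D"]  # placeholder labels
    review = monthly_review_set(topics, seed=1)
    print(len(review), "messages this month")
    print(len(review) * 12, "messages per panelist over 12 months")
```

With four topics this yields the 16 messages per month stated in the description, or 192 per panelist over the 12-month study period.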
Study Type
INTERVENTIONAL
Allocation
RANDOMIZED
Purpose
HEALTH_SERVICES_RESEARCH
Masking
DOUBLE
Enrollment
50
Short (up to 850 characters) and long (one-page) messages on different health-related topics will be generated by a generative AI system (ChatGPT 4.0).
Short (up to 850 characters) and long (one-page) messages on different health-related topics will be written by a human expert in primary care and/or public health.
Institut du Savoir Montfort
Ottawa, Ontario, Canada
Clarity and Understandability score
Message score on three statements related to Clarity and Understandability, completed after the participant has read the message. Each statement is rated on a Likert scale from 1 to 4 (1: Strongly disagree; 4: Strongly agree), giving a total score between 3 and 12 (low score: poorly rated message; high score: well rated message).
Time frame: 12 months
Overall message score
Overall message score, completed after the participant has read the message. The scores from statements on Adaptability (subcategories: Clarity and understandability, Appropriate emotional appeal, Appropriate rational appeal, Tone, and Inclusivity) and on Relevance and Usefulness will be used to measure the overall message score. Statements on Quality of Information (subcategories: Accuracy, Reliability, and Completeness) will be presented only to providers. A Likert scale from 1 to 4 will be used (1: Strongly disagree; 4: Strongly agree), giving a total score between 22 and 88 (low score: poorly rated message; high score: well rated message).
Time frame: 12 months
Message category and subcategory scores
Message score for each main category (quality of information, adaptability, and relevance and usefulness) and its subcategories after the participant has read the message. A Likert scale from 1 to 4 will be used (1: Strongly disagree; 4: Strongly agree).
* Quality of information: total score between 5 and 20 (low score: poorly rated message; high score: well rated message).
* Adaptability: total score between 13 and 52 (low score: poorly rated message; high score: well rated message).
* Relevance and usefulness: total score between 3 and 12 (low score: poorly rated message; high score: well rated message).
Time frame: 12 months
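The score ranges in the outcome measures follow from summing statements rated 1 to 4: a scale with n statements ranges from n to 4n. The sketch below illustrates that arithmetic. The statement counts are not given in the record; they are inferred here from the stated ranges, and all function names are hypothetical.

```python
LIKERT_MIN, LIKERT_MAX = 1, 4  # 1: Strongly disagree, 4: Strongly agree


def score_range(n_statements):
    """Lowest and highest possible total for n statements rated 1 to 4."""
    return n_statements * LIKERT_MIN, n_statements * LIKERT_MAX


def implied_statements(low, high):
    """Infer the number of statements from a stated total-score range."""
    n = low // LIKERT_MIN
    if score_range(n) != (low, high):
        raise ValueError(f"range {low}-{high} does not fit a 1-4 Likert sum")
    return n


def total_score(responses):
    """Sum one participant's ratings, rejecting out-of-scale values."""
    for r in responses:
        if not LIKERT_MIN <= r <= LIKERT_MAX:
            raise ValueError(f"rating {r} is outside the 1-4 scale")
    return sum(responses)


if __name__ == "__main__":
    # Ranges as stated in the outcome measures.
    stated = {
        "Clarity and Understandability": (3, 12),
        "Quality of information": (5, 20),
        "Adaptability": (13, 52),
        "Relevance and usefulness": (3, 12),
        "Overall message score": (22, 88),
    }
    for name, (low, high) in stated.items():
        print(f"{name}: {implied_statements(low, high)} statements")
```

For example, a participant who rates the three Clarity and Understandability statements 4, 3, and 2 gives the message a score of 9 out of a possible 12.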