This randomized controlled trial evaluates the effectiveness of a generative artificial intelligence (AI)-based simulation program in improving diagnostic communication skills among medical students. The study is conducted at the Faculty of Higher Studies Iztacala, National Autonomous University of Mexico (UNAM). A total of 120 medical students are randomized to either an intervention group using the DIALOGUE-DM2 AI simulation platform or a control group following traditional educational methods. Participants complete a pre-test, receive training according to group assignment, and then undergo a post-test evaluation. The primary outcome is improvement in diagnostic communication skills, measured by standardized patient scenarios and validated rubrics. Secondary outcomes include self-reported confidence, domain-specific communication scores, and agreement between faculty evaluators and AI scoring. This trial aims to provide high-quality evidence on the potential of generative AI to enhance communication training in medical education, specifically in the context of type 2 diabetes diagnosis.
This study builds on a prior pilot trial (published in 2024) that demonstrated the feasibility of using generative artificial intelligence (AI) to train medical students in diagnostic communication. The current trial extends that work with a randomized, blinded, controlled design and a larger sample size.

Design: The study is a randomized, blinded, parallel-group, controlled trial conducted at the Faculty of Higher Studies Iztacala (FES Iztacala), UNAM. A total of 120 medical students are enrolled and randomized (1:1) into either the intervention group (AI-based simulation training) or the control group (traditional training with standardized patients and faculty feedback).

Intervention:
* Intervention group: Students interact with the DIALOGUE-DM2 platform, which provides generative AI-driven simulated patients. They complete multiple diagnostic disclosure scenarios and receive immediate feedback on performance, based on standardized communication rubrics.
* Control group: Students receive standard training, including lectures and supervised practice with peer role-play and faculty-guided feedback.

Assessments:
* Pre-test: All students complete one standardized patient scenario with faculty and AI evaluation prior to intervention.
* Training phase: Participants complete their assigned training (AI vs. standard).
* Post-test: Students complete a standardized diagnostic disclosure scenario. Independent faculty evaluators (blinded to group assignment) and the AI platform score performance.

Outcomes:
* Primary outcome: Change in diagnostic communication performance score from pre-test to post-test, measured by validated rubrics (Kalamazoo framework, MCRS).
* Secondary outcomes:
  * Student self-assessment of communication confidence.
  * Domain-specific improvements (information delivery, empathy, risk explanation, shared decision-making).
  * Agreement between human evaluators and AI scoring.
Ethics and Oversight: The study has been reviewed and approved by the Research Ethics Committee of FES Iztacala, UNAM (Approval Number CE/FESI/042025/1915). Risks are minimal, as the intervention is educational and non-invasive.

Significance: This is the first randomized controlled trial in Mexico to evaluate a generative AI-based simulation for diagnostic communication. Results will inform the integration of AI-driven training tools into medical education curricula and could contribute to scalable innovations in the training of healthcare professionals for chronic disease management, starting with type 2 diabetes.
Study Type
INTERVENTIONAL
Allocation
RANDOMIZED
Purpose
HEALTH_SERVICES_RESEARCH
Masking
TRIPLE
Enrollment
120
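As an illustration only, the 1:1 allocation of 120 students could be generated along the following lines. The record does not describe the actual randomization procedure, so the simple shuffle-and-split approach, the seed, the function name `allocate_one_to_one`, and the participant identifiers are all assumptions made for this sketch.

```python
import random


def allocate_one_to_one(participant_ids, seed=2025):
    """Shuffle participants and split them into two equal arms.

    Assumption: simple (unstratified) randomization with a fixed seed so
    the allocation list can be reproduced. The trial's real procedure is
    not described in the record.
    """
    rng = random.Random(seed)
    shuffled = list(participant_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        "intervention": sorted(shuffled[:half]),
        "control": sorted(shuffled[half:]),
    }


# Hypothetical identifiers S001..S120 standing in for the enrolled students.
participants = [f"S{n:03d}" for n in range(1, 121)]
arms = allocate_one_to_one(participants)
print(len(arms["intervention"]), len(arms["control"]))
```

With 120 participants this yields two arms of 60. A stratified or blocked scheme would need a different implementation.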
Medical students interact with the DIALOGUE-DM2 platform, a generative AI-based simulation system. The platform delivers virtual patient encounters focused on type 2 diabetes diagnostic disclosure. Students complete multiple simulated scenarios and receive immediate AI-generated feedback aligned with standardized communication rubrics (Kalamazoo, MCRS). Training aims to enhance diagnostic communication skills prior to post-test evaluation.
Medical students receive traditional training in diagnostic communication. This includes lectures, peer role-play, and faculty-supervised feedback sessions covering diagnostic disclosure in type 2 diabetes. The training duration and number of sessions are matched to the intervention group.
Universidad Nacional Autónoma de México, Faculty of Higher Studies Iztacala (FES Iztacala)
Tlalnepantla, Mexico
Change in Diagnostic Communication Performance Score
Improvement in diagnostic communication skills, measured using validated rubrics - the Kalamazoo Essential Elements Communication Checklist and the Medical Communication Rating Scale (MCRS) - applied to standardized patient scenarios. Independent blinded faculty evaluators and AI scoring will be used. Scores range from 0 to 100, with higher values indicating better diagnostic communication performance.
Time frame: Approximately 12 weeks (from pre-test to post-test per participant).
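A minimal sketch of how a pre-test to post-test change score might be computed and summarized by arm is shown below. It assumes one 0 to 100 score per participant at each time point; the record does not say how faculty and AI scores are combined, and the function names and all values here are invented for illustration.

```python
from statistics import mean


def change_scores(pre, post):
    """Post-test minus pre-test for every participant with both scores."""
    return {pid: post[pid] - pre[pid] for pid in pre if pid in post}


def mean_change_by_arm(changes, arm_of):
    """Average the change scores within each study arm."""
    by_arm = {}
    for pid, delta in changes.items():
        by_arm.setdefault(arm_of[pid], []).append(delta)
    return {arm: mean(deltas) for arm, deltas in by_arm.items()}


# Made-up scores on the 0-100 scale, not trial data.
pre = {"S001": 55, "S002": 60, "S003": 48, "S004": 62}
post = {"S001": 75, "S002": 70, "S003": 58, "S004": 66}
arm_of = {
    "S001": "intervention",
    "S002": "intervention",
    "S003": "control",
    "S004": "control",
}

changes = change_scores(pre, post)
summary = mean_change_by_arm(changes, arm_of)
print(summary)
```

This only illustrates the arithmetic of a change score. It is not the trial's statistical analysis plan, which the record does not describe.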
Change in Student Self-Reported Confidence in Diagnostic Communication
Change in students' self-reported confidence when disclosing a diagnosis of type 2 diabetes, measured through a structured questionnaire using a 5-point Likert scale (1 = very low confidence, 5 = very high confidence). Higher scores indicate greater self-perceived confidence in diagnostic communication.
Time frame: Approximately 12 weeks (from pre-test to post-test per participant).
Change in Domain-Specific Diagnostic Communication Scores (Kalamazoo Framework and Medical Communication Rating Scale)
Improvement in specific communication domains - information delivery, empathy, risk explanation, and shared decision-making - evaluated using the Kalamazoo Essential Elements Communication Checklist and the Medical Communication Rating Scale (MCRS). Each domain is scored from 0 to 100, with higher scores indicating better performance.
Time frame: Approximately 12 weeks (from pre-test to post-test per participant).
Agreement Between Human Evaluators and AI Scoring
Level of concordance between blinded human evaluators and AI-based scoring of diagnostic communication performance, assessed using Cohen's kappa coefficient (κ). κ ranges from -1.0 to +1.0, where values closer to +1.0 indicate stronger agreement between human and AI scoring.
Time frame: Assessed at post-test, approximately 12 weeks after baseline per participant.
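For reference, Cohen's kappa compares observed agreement with the agreement expected by chance: κ = (p_o - p_e) / (1 - p_e). The sketch below computes it for two raters over categorical labels. Because kappa needs categories, it assumes the 0 to 100 scores are first grouped into bands; the cut-offs in `band`, the function names, and the example scores are hypothetical and do not come from the record.

```python
from collections import Counter


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two equal-length lists of labels."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of cases")
    n = len(rater_a)
    # Observed proportion of cases on which the two raters agree.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement from each rater's marginal label frequencies.
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    expected = sum(count_a[c] * count_b[c] for c in count_a) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement equals 1")
    return (observed - expected) / (1 - expected)


def band(score):
    """Hypothetical cut-offs turning a 0-100 score into a category."""
    if score < 50:
        return "low"
    if score < 75:
        return "medium"
    return "high"


# Made-up scores for six students, not trial data.
faculty_scores = [42, 68, 80, 55, 90, 71]
ai_scores = [45, 70, 72, 58, 88, 79]

kappa = cohens_kappa(
    [band(s) for s in faculty_scores],
    [band(s) for s in ai_scores],
)
print(round(kappa, 2))
```

In this invented example the raters agree on four of six cases, giving κ of about 0.45. A weighted kappa may suit ordered bands better, but the record does not specify which variant is used.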
Student Satisfaction With the Assigned Training Method
Satisfaction with the assigned training method (AI-based simulation vs. traditional training), measured using a structured 5-point Likert satisfaction survey (1 = very dissatisfied; 5 = very satisfied). Higher scores indicate greater satisfaction with the training method.
Time frame: Assessed immediately after completion of the post-test, approximately 12 weeks after baseline per participant.