Cluster-assigned trial to assess the effect of a digital clinical simulation (PainTrain-AI) on clinical adequacy (0-100 rubric) across T0, T1, and T2 in Primary Care. Secondary endpoints: system usability (SUS), adherence, low-value practice indicators, and latency/friction. The intervention is educational/behavioral; the platform is non-diagnostic and its retrieval-augmented generation (RAG) layer is fenced to validated content. Analysis per protocol: Difference-in-Differences (DiD) and linear/generalized linear mixed models (LMM/GLMM).
This study is a pragmatic, cluster-assigned implementation trial designed to evaluate the impact of the digital educational intervention PainTrain-AI on clinical decision-making in Primary Care. The intervention trains health professionals through standardized virtual patient simulations and brief interactive micromodules, using a retrieval-augmented generation (RAG) safety architecture that restricts all outputs to validated clinical content. PainTrain-AI does not generate new clinical information and is used exclusively for educational purposes.
The trial corresponds to Phase 3 of a broader research program described in the project protocol. In this phase, entire Primary Care centers are allocated to either the intervention arm (PainTrain-AI training) or the control arm (usual training/standard practice). A natural comparative setting in specialized care is included for analytical purposes, without access to master datasets and without receiving any financial support.
The primary outcome is clinical adequacy, assessed through a validated 0-100 rubric applied to standardized clinical cases at three time points: baseline (T0), immediate post-training (T1), and follow-up (T2). Double-blind scoring and a third-reviewer arbitration process are used to ensure reliability (target kappa ≥0.70). Secondary outcomes include changes in validated psychometric measures (NPQ-12, PABS-PT, HC-PAIRS, FABQ), adherence to PainTrain-AI micromodules, system usability (SUS), latency/friction metrics derived from platform logs, and indicators of low-value clinical practices.
The sex/gender presented in the clinical vignettes is randomized so that potential gender bias in clinical decisions can be identified and analyzed. All data collected refer to professionals, not patients. No identifiable clinical data are used; all datasets are pseudonymized according to the approved Data Management Plan (FAIR).
Assignment by clusters limits contamination between arms, and all centers follow standard operating procedures (SOPs; PNT in the Spanish acronym) for recruitment, training, and evaluation. The platform maintains audit logs, human-in-the-loop oversight, and security measures compliant with the Spanish National Security Framework (ENS) and the General Data Protection Regulation (GDPR; RGPD in Spanish). Analyses will use mixed-effects models and Difference-in-Differences approaches to account for clustering, repeated measures, and baseline differences. Results will inform the feasibility, acceptability, and preliminary effectiveness of a digital simulation-based educational intervention intended to reduce the competency gap, decrease low-value practices, and mitigate gender bias in the management of chronic musculoskeletal pain in Primary Care.
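As an illustration of the Difference-in-Differences contrast mentioned above, the following minimal sketch computes the change in mean adequacy score from T0 to T1 in the intervention arm minus the same change in the control arm. It is not the trial's analysis code: the function name, record layout, and scores are hypothetical, and the plain difference of means ignores clustering and repeated measures, which the mixed-effects models are intended to handle.

```python
from statistics import mean


def did_estimate(records):
    """Difference-in-Differences on mean scores.

    records: iterable of (arm, timepoint, score) tuples, with arm in
    {"intervention", "control"} and timepoint in {"T0", "T1"}.
    (Hypothetical layout, chosen only for this illustration.)
    """
    cells = {}
    for arm, timepoint, score in records:
        cells.setdefault((arm, timepoint), []).append(score)
    m = {key: mean(values) for key, values in cells.items()}
    change_intervention = m[("intervention", "T1")] - m[("intervention", "T0")]
    change_control = m[("control", "T1")] - m[("control", "T0")]
    return change_intervention - change_control


# Made-up illustrative scores, not trial data.
records = [
    ("intervention", "T0", 50.0), ("intervention", "T0", 54.0),
    ("intervention", "T1", 66.0), ("intervention", "T1", 70.0),
    ("control", "T0", 51.0), ("control", "T0", 53.0),
    ("control", "T1", 55.0), ("control", "T1", 57.0),
]
did = did_estimate(records)
print(did)  # 12.0 (intervention +16, control +4)
```

The same contrast appears in a mixed model as the arm-by-time interaction term, with random effects for center and participant.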
Study Type
INTERVENTIONAL
Allocation
NON_RANDOMIZED
Purpose
HEALTH_SERVICES_RESEARCH
Masking
NONE
Enrollment
185
PainTrain-AI is a behavioral educational intervention delivered through a digital clinical simulation platform. The system uses a retrieval-augmented generation (RAG) safety architecture that restricts all outputs to validated clinical content; the platform is non-diagnostic and does not generate new clinical information. Participants complete simulated consultations with virtual standardized patients and a series of brief micromodules designed to train biopsychosocial clinical reasoning for chronic musculoskeletal pain. The intervention is used exclusively for professional training and does not involve patient data. All participants in the intervention arm complete assessments at T0, T1, and T2.
Participants continue with usual training and standard practice available at their institution. No exposure to PainTrain-AI occurs. This arm serves as the active comparator.
Institut Català de la Salut (ICS) - Primary Care Network
Lleida, Lleida, Spain
Specialized Care Comparative Setting
Lleida, Lleida, Spain
Clinical adequacy score (0-100 rubric)
Clinical adequacy will be assessed using a validated 0-100 rubric applied to standardized clinical cases. Scores range from 0 (lowest adequacy) to 100 (highest adequacy); higher scores indicate better clinical adequacy. The rubric evaluates three domains: (1) evaluation/triage (appropriate tests and red-flag assessment), (2) therapeutic recommendation (prioritizing active strategies and evidence-based pain education), and (3) clinical communication (avoiding iatrogenic messages and establishing functional goals). Each case is scored by two blinded assessors, with adjudication by a third reviewer in case of disagreement (target kappa ≥0.70). Change will be analyzed from T0 to T1 and from T0 to T2.
Time frame: Baseline (T0), immediate post-training (T1), and 3-month follow-up (T2)
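The inter-rater reliability target (kappa ≥0.70) can be illustrated with a minimal sketch of Cohen's kappa for two assessors. The record does not state how the 0-100 scores are categorized for the kappa calculation, so the `band` function and its cut-offs (50 and 75) are assumptions made only for this example, and the scores are made up.

```python
from collections import Counter


def cohen_kappa(labels_a, labels_b):
    """Cohen's kappa for two raters assigning categorical labels."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("raters must label the same non-empty set of cases")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    categories = count_a.keys() | count_b.keys()
    # Agreement expected by chance from each rater's marginal frequencies.
    expected = sum(count_a[c] * count_b[c] for c in categories) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used a single identical category
    return (observed - expected) / (1.0 - expected)


def band(score):
    """Hypothetical banding of a 0-100 score (cut-offs are assumptions)."""
    if score < 50:
        return "low"
    if score < 75:
        return "mid"
    return "high"


# Made-up illustrative scores, not trial data.
rater_a = [40, 60, 80, 90, 30, 70]
rater_b = [45, 65, 70, 85, 35, 72]
kappa = cohen_kappa([band(s) for s in rater_a], [band(s) for s in rater_b])
meets_target = kappa >= 0.70
print(round(kappa, 2), meets_target)  # 0.75 True
```

For continuous scores, a weighted kappa or an intraclass correlation would be a common alternative; which statistic the protocol uses is not specified here.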
System Usability Scale (SUS, 0-100)
Usability of the PainTrain-AI intervention will be assessed with the System Usability Scale (SUS), a validated 10-item questionnaire that produces a score from 0 (lowest usability) to 100 (highest usability). Higher scores indicate better usability. The feasibility threshold is SUS ≥70, with an optimal target ≥80.
Time frame: Immediately after training
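For reference, the standard SUS scoring rule can be sketched as follows: odd-numbered items contribute (response - 1), even-numbered items contribute (5 - response), and the sum is multiplied by 2.5 to give a 0-100 score. The function names and the labels returned by `classify_sus` are illustrative; only the thresholds (≥70 feasible, ≥80 optimal) come from this record.

```python
def sus_score(responses):
    """Score one SUS questionnaire.

    responses: list of 10 integers in 1..5, in item order 1..10.
    """
    if len(responses) != 10 or any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("SUS needs 10 responses, each an integer from 1 to 5")
    total = 0
    for item, response in enumerate(responses, start=1):
        if item % 2 == 1:
            total += response - 1  # odd items are positively worded
        else:
            total += 5 - response  # even items are negatively worded
    return total * 2.5


def classify_sus(score):
    """Apply the thresholds stated in this record (labels are illustrative)."""
    if score >= 80:
        return "optimal"
    if score >= 70:
        return "feasible"
    return "below threshold"


example = sus_score([4, 2, 4, 2, 4, 2, 4, 2, 4, 2])
print(example, classify_sus(example))  # 75.0 feasible
```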
Adherence to PainTrain-AI micromodules (% completed)
Adherence will be measured as the percentage of scheduled micromodules or simulation sessions completed by each participant within the defined training period. Scores range from 0% (no modules completed) to 100% (all assigned modules completed). Higher percentages indicate greater engagement with the behavioral intervention.
Time frame: During the training period (up to 4 weeks)
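A minimal sketch of the adherence calculation, assuming each participant has a list of assigned module identifiers and a list of completed ones. The identifiers and the function name are hypothetical; completions of modules that were not assigned are ignored so the result stays within 0-100%.

```python
def adherence_pct(assigned, completed):
    """Percentage of assigned modules that the participant completed."""
    assigned_set = set(assigned)
    if not assigned_set:
        raise ValueError("participant has no assigned modules")
    done = assigned_set & set(completed)  # ignore unassigned completions
    return 100.0 * len(done) / len(assigned_set)


# Hypothetical module identifiers, for illustration only.
pct = adherence_pct(["m1", "m2", "m3", "m4"], ["m1", "m3", "m4", "extra"])
print(pct)  # 75.0
```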
Low-value clinical practice indicators (count per case)
Low-value practice indicators will be measured in the standardized clinical cases scored with the 0-100 adequacy rubric. For this outcome, low-value practices are counted as discrete events: counts start at 0 (no low-value decisions in a case), and higher counts indicate more inappropriate decisions. Indicators include inappropriate diagnostic imaging, unnecessary pharmacologic escalation, and early referral to specialized care when not clinically indicated. The measure is the number of low-value decisions per case at each time point.
Time frame: Baseline, immediately after training, and 3 months after training
System latency (seconds)
Latency will be measured as the average system response time (in seconds) between user input and platform output during PainTrain-AI interactions. Values range from 0 seconds (best possible performance) upward; higher values indicate greater technological friction and lower technical performance. Latency is recorded automatically in system logs.
Time frame: During the training period (up to 4 weeks)
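A minimal sketch of how the average latency could be derived from logged timestamps, assuming each log entry holds the time the user input was received and the time the platform output was returned, both in seconds. The log layout and function name are hypothetical; the platform's actual log format is not described in this record.

```python
from statistics import mean


def mean_latency_seconds(events):
    """Average response time over (input_time, output_time) pairs, in seconds."""
    latencies = []
    for input_time, output_time in events:
        delta = output_time - input_time
        if delta < 0:
            raise ValueError("output logged before input")
        latencies.append(delta)
    return mean(latencies)  # raises StatisticsError if there are no events


# Made-up illustrative log entries, not platform data.
log = [(0.0, 1.5), (10.0, 12.0), (20.0, 22.5)]
avg_latency = mean_latency_seconds(log)
print(avg_latency)  # 2.0
```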