Primary Goal: This study aims to evaluate the diagnostic and therapeutic accuracy of GPT-4 (an advanced AI language model) compared with three orthopedic surgeons of varying experience levels in cases of failed or painful total hip arthroplasty.

Key Research Questions:
Diagnostic Accuracy: Does GPT-4 provide correct, partially correct, or incorrect diagnoses compared to human orthopedic surgeons?
Diagnostic Completeness: Are GPT-4's diagnostic suggestions complete, partially complete, or incomplete compared to those of orthopedic surgeons?
Treatment Accuracy: Does GPT-4 recommend correct, partially correct, or incorrect treatments for failed hip arthroplasty?
Treatment Completeness: Are GPT-4's treatment recommendations complete, partially complete, or incomplete compared to those of orthopedic surgeons?

Study Design:
Participants: 20 anonymized patient cases (ages 18-80) with failed or painful hip arthroplasties, treated at IRCCS Istituto Ortopedico Rizzoli (Bologna, Italy) between 2004 and 2024. Cases were selected based on clear diagnostic and treatment records (no ambiguous or incomplete data).
Comparison Groups: GPT-4 (via the ChatGPT interface) and three orthopedic doctors with different experience levels (resident, specialist, senior surgeon).
Method: Each case (clinical summary + X-ray image) is presented to GPT-4 and the three doctors, who must each provide a diagnosis and treatment recommendations. Two independent evaluators (the principal investigator and the department head) blindly assess the responses for correctness and completeness using a 3-point scale (0 = incorrect/incomplete, 1 = imprecise/partially complete, 2 = correct/complete). Statistical analysis compares GPT-4 with human performance.

Expected Outcomes: Determine whether AI can match or outperform doctors in diagnosing and treating hip arthroplasty failures. Assess whether GPT-4 could serve as a supplementary tool in orthopedic decision-making.

Ethical & Privacy Considerations: No real-time patient data is used; only anonymized past cases are included. No personal or sensitive data is shared with OpenAI (GPT-4 is used via a standard web interface). The study complies with GDPR, HIPAA, and ethical AI guidelines.

Timeline: Study duration is approximately 8 months (from ethics approval to final analysis). Results will be published regardless of outcome.

Why This Study Matters: First study evaluating GPT-4's role in complex orthopedic diagnostics. It could influence future AI-assisted clinical decision-making in joint replacement surgeries.
Study Type
OBSERVATIONAL
Enrollment
20
Diagnostic/prognostic evaluation of each case, provided by AI (GPT-4). GPT-4 provides diagnosis and treatment recommendations via standardized prompts.
Diagnostic/prognostic evaluation of each case, provided by a human expert
Diagnostic/prognostic evaluation of each case, provided by a human expert
Diagnostic/prognostic evaluation of each case, provided by a human expert
SC Ortopedia e Traumatologia e Chirurgia Protesica e dei Reimpianti di Anca e Ginocchio, IRCCS Istituto Ortopedico Rizzoli
Bologna, Italy
Diagnostic correctness
Proportion of fully correct diagnoses (score=2) by each rater. Scale 0 (worst outcome) - 2 (best outcome). 0: incorrect, 1: imprecise, 2: correct
Time frame: Immediate (post-case evaluation)
Diagnostic completeness
Proportion of fully complete diagnoses (score=2). Scale 0 (worst outcome) - 2 (best outcome). 0: incomplete, 1: partially complete, 2: complete
Time frame: Immediate (post-case evaluation)
Treatment recommendation correctness
Proportion of fully correct treatments (score=2) by each rater. Scale 0 (worst outcome) - 2 (best outcome). 0: incorrect, 1: imprecise, 2: correct
Time frame: Immediate (post-case evaluation)
Treatment recommendation completeness
Proportion of fully complete treatments (score=2). Scale 0 (worst outcome) - 2 (best outcome). 0: incomplete, 1: partially complete, 2: complete
Time frame: Immediate (post-case evaluation)
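The four outcome measures above share one calculation: the share of cases that received the top score (2) on the 0-2 scale. A minimal Python sketch of that arithmetic follows. The function name `proportion_fully_scored` and the score lists are hypothetical and for illustration only; they are not study data, and the record does not specify how the investigators actually compute or compare the proportions.

```python
def proportion_fully_scored(scores, top_score=2):
    """Return the fraction of cases that received the top score.

    `scores` is a list of per-case scores on the 0-2 scale
    (0 = incorrect/incomplete, 1 = imprecise/partially complete,
    2 = correct/complete).
    """
    if not scores:
        raise ValueError("scores must not be empty")
    if any(s not in (0, 1, 2) for s in scores):
        raise ValueError("each score must be 0, 1, or 2")
    return sum(1 for s in scores if s == top_score) / len(scores)


# Hypothetical scores for five cases; NOT real study results.
example_scores = {
    "GPT-4": [2, 1, 2, 0, 2],
    "Resident": [2, 2, 1, 2, 2],
}

for name, scores in example_scores.items():
    print(f"{name}: {proportion_fully_scored(scores):.2f}")
```

The same function would apply unchanged to each of the four measures (diagnostic correctness, diagnostic completeness, treatment correctness, treatment completeness), since each is reported as the proportion of cases with score=2.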