Trapeziometacarpal osteoarthritis (TMC OA) is a common condition affecting the base of the thumb that causes pain, weakness, and difficulty with daily hand use. Current clinical assessment often focuses on physical findings alone, without considering psychological and social factors that also influence patient outcomes. This study has three objectives organized as interrelated work packages:
OBJECTIVE 1 (Clinical Assessment): To comprehensively assess individuals with TMC OA using the International Classification of Functioning, Disability and Health (ICF) framework. This includes evaluating pain, joint mobility, grip strength, daily activity limitations, social participation, psychological factors (anxiety, depression, fear of movement, pain beliefs), and environmental factors (family support, ergonomic adaptations).
OBJECTIVE 2 (AI Knowledge Evaluation): To compare the medical knowledge performance of four large language models (Claude, ChatGPT, Gemini, LLaMA) in answering clinical questions about TMC OA, using criteria such as accuracy, reproducibility, comprehensiveness, clinical relevance, and readability.
OBJECTIVE 3 (AI-Based Prediction): To analyze whether the best-performing large language model can predict multidimensional ICF-based patient profiles using only a limited set of core clinical parameters.
This research consists of three interrelated work packages, each with its own methods and targets.
Work Package 1 (Clinical Data Collection and ICF-Based Profile Analysis): Participants with TMC OA will undergo a single face-to-face comprehensive assessment using a cross-sectional design. The assessment battery is structured according to the ICF framework and covers five domains: (a) Body Structure/Function: pain, joint mobility, grip and pinch strength, joint stability, and OA staging; (b) Activity: daily activity limitations and pain-activity patterns (avoidance, overdoing, pacing); (c) Participation: social, domestic, and occupational participation; (d) Personal Factors: pain beliefs, coping strategies, kinesiophobia, anxiety, and depression; (e) Environmental Factors: family support and ergonomic adaptations.
Work Package 2 (Comparison of Large Language Models' Clinical Knowledge Performance): Four large language models (Claude, ChatGPT, Gemini, LLaMA) will be queried with questions frequently encountered in the TMC OA domain. Responses will be evaluated by subject matter experts using five criteria: accuracy, reproducibility (each question asked twice), comprehensiveness, clinical relevance, and readability (health literacy appropriateness).
Work Package 3 (LLM-Based Predictive Profile Modeling): The best-performing LLM identified in WP2 will be provided with core clinical predictors from WP1 data. The model's predictions for multidimensional ICF-based patient profiles will be compared against actual assessment results using established agreement and performance metrics.
Sample size: Based on an a priori power analysis (alpha=0.05, power=0.80, effect size=0.131), a minimum of 93 participants is required.
Study Type
OBSERVATIONAL
Enrollment
93
Hacettepe University, Faculty of Physical Therapy and Rehabilitation, Hand Surgery Rehabilitation Unit
Ankara, Turkey (Türkiye)
Grip Strength
Measured using a Jamar dynamometer. The participant performs the test in a seated position with the elbow flexed at 90 degrees. Unit of Measure: Kilograms
Time frame: Baseline (single assessment at enrollment)
Pinch Strength
Measured using a pinchmeter to assess tip-to-tip and key pinch strength. Unit of Measure: Kilograms
Time frame: Baseline (single assessment at enrollment)
Thumb Opposition (Kapandji Score)
Assessment of thumb opposition using the Kapandji score, which ranges from 0 to 10. Higher scores indicate better thumb opposition and mobility.
Time frame: Baseline (single assessment at enrollment)
Pain Intensity
Measured using a Visual Analog Scale (VAS) ranging from 0 (no pain) to 10 (worst imaginable pain). Higher scores indicate greater pain intensity.
Time frame: Baseline (single assessment at enrollment)
Pain Duration
Total duration of thumb pain reported by the participant.
Time frame: Baseline (single assessment at enrollment)
Radiographic Severity (Eaton-Littler Stage)
Evaluation of the trapeziometacarpal joint osteoarthritis stage based on the Eaton-Littler classification (Stages I through IV).
Time frame: Baseline (single assessment at enrollment)
Radial Subluxation Ratio
Radiographic measurement of the radial subluxation of the metacarpal base on the trapezium.
Time frame: Baseline (single assessment at enrollment)
Upper Extremity Disability (QuickDASH)
Measured using the Quick Disabilities of the Arm, Shoulder and Hand (QuickDASH) questionnaire. The score ranges from 0 to 100, where higher scores indicate greater disability and symptoms.
Time frame: Baseline (single assessment at enrollment)
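As a rough illustration of how the 0 to 100 score is obtained, the sketch below applies the commonly published QuickDASH scoring rule: the mean of the answered items (each rated 1 to 5) minus 1, multiplied by 25, with at most one of the 11 items left unanswered. The function name and the use of None for an unanswered item are assumptions made for this example and are not taken from the study protocol.

```python
def quickdash_score(responses):
    """Return a QuickDASH score (0-100) from 11 item responses.

    Each response is an integer from 1 to 5, or None if unanswered
    (None as the missing-value marker is an assumption of this sketch).
    """
    if len(responses) != 11:
        raise ValueError("QuickDASH has 11 items")
    answered = [r for r in responses if r is not None]
    if len(answered) < 10:
        raise ValueError("at least 10 of 11 items must be answered")
    if any(r not in (1, 2, 3, 4, 5) for r in answered):
        raise ValueError("each response must be an integer from 1 to 5")
    # Mean of answered items, shifted to start at 0, scaled to 0-100.
    return (sum(answered) / len(answered) - 1) * 25
```

A participant who answers every item with 1 scores 0 (no disability), and one who answers every item with 5 scores 100.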
Hand Disability (Turkish Thumb Disability Index - TDX)
Assessment of thumb-related disability. Scores range from 0 to 100, with higher scores indicating greater functional impairment.
Time frame: Baseline (single assessment at enrollment)
Joint Hypermobility (Beighton Score)
Assessment of generalized joint laxity using the Beighton score. The total score ranges from 0 to 9, where higher scores indicate greater hypermobility.
Time frame: Baseline (single assessment at enrollment)
Thumb Joint Range of Motion
Active range of motion of the thumb joints measured using a goniometer. Unit of Measure: Degrees
Time frame: Baseline (single assessment at enrollment)
Provocative Tests
Clinical assessment using metacarpal adduction and extension tests to provoke symptoms. The outcome is recorded as presence or absence of pain (binary: yes/no).
Time frame: Baseline (single assessment at enrollment)
Environmental Factors: Social Support and Ergonomic Adaptations
Qualitative assessment of the participant's family support and the presence of ergonomic adaptations in their daily environment.
Time frame: Baseline (single assessment at enrollment)
Emotional Status (Hospital Anxiety and Depression Scale)
Measured using the Hospital Anxiety and Depression Scale (HADS), which consists of two subscales: Anxiety (HADS-A) and Depression (HADS-D). Each subscale ranges from 0 to 21, where higher scores indicate greater levels of anxiety or depression (worse outcome).
Time frame: Baseline (single assessment at enrollment)
Kinesiophobia Level (Tampa Scale of Kinesiophobia)
Measured using the 17-item Tampa Scale of Kinesiophobia (TSK-17) to assess the fear of movement or re-injury. Total scores range from 17 to 68, where higher scores indicate greater kinesiophobia (worse outcome).
Time frame: Baseline (single assessment at enrollment)
Pain-Activity Patterns (Patterns of Activity Measure-Pain)
Measured using the Patterns of Activity Measure-Pain (POAM-P) questionnaire, which characterizes three pain-related activity patterns: avoidance, overdoing, and pacing. Each pattern has its own subscale, and higher subscale scores indicate more frequent use of that pattern.
Time frame: Baseline (single assessment at enrollment)
Pain Beliefs Profile (Pain Beliefs Questionnaire)
Assessed using the Pain Beliefs Questionnaire (PBQ), which evaluates two dimensions: Organic and Psychological pain beliefs. Scores range from 1 to 6 for each subscale, where higher scores indicate a stronger belief in that specific dimension (e.g., higher organic scores mean a stronger belief that pain is due to physical damage).
Time frame: Baseline (single assessment at enrollment)
Pain Coping Strategies (Pain Coping Questionnaire)
Measured using the Pain Coping Questionnaire (PCQ) to assess the frequency of different coping strategies (e.g., information seeking, problem solving, distraction). Higher scores indicate a more frequent use of the respective coping strategy.
Time frame: Baseline (single assessment at enrollment)
Large Language Model Clinical Knowledge Accuracy
Performance comparison of four large language models (Claude, ChatGPT, Gemini, LLaMA) on 40 clinical questions. Evaluated by two independent blinded experts using a 4-point Likert scale (1: Completely Incorrect, 4: Completely Correct). Higher scores indicate better accuracy.
Time frame: Baseline (single assessment during the data collection period)
Large Language Model Response Reproducibility
Assessment of content consistency between two repeated queries of the same questions. Evaluated based on the percentage of agreement between the two sets of responses.
Time frame: Within 24 hours after initial query
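This record does not state how agreement between the two query rounds is coded. A minimal sketch, assuming each question's two responses are reduced by the evaluators to a comparable rating or label, and that agreement means the two values are identical (the function name and inputs are hypothetical):

```python
def percent_agreement(first_run, second_run):
    """Percentage of questions whose two responses received the same rating.

    first_run and second_run are equal-length sequences of ratings or labels,
    one entry per question (this coding scheme is an assumption).
    """
    if len(first_run) != len(second_run):
        raise ValueError("both runs must cover the same questions")
    if not first_run:
        raise ValueError("at least one question is required")
    matches = sum(a == b for a, b in zip(first_run, second_run))
    return 100.0 * matches / len(first_run)
```

Simple percentage agreement does not correct for agreement expected by chance, which is worth keeping in mind when comparing models.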
Large Language Model Content Comprehensiveness
Evaluation of how thoroughly the model covers the necessary clinical details. Measured on a 5-point scale (1: Very Poor, 5: Very Good). Higher scores indicate more comprehensive answers.
Time frame: Baseline (single assessment at enrollment)
Large Language Model Clinical Relevance
Assessment of the practical utility of the responses for clinical practice. Measured on a 5-point scale (1: Not Relevant, 5: Highly Relevant). Higher scores indicate greater clinical utility.
Time frame: Baseline (single assessment)
Large Language Model Readability Score
The readability of the generated responses will be calculated using the Flesch Reading Ease Score. Scores typically range from 0 to 100, where higher scores indicate that the text is easier to read.
Time frame: Baseline (calculated immediately after response generation)
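For reference, the Flesch Reading Ease Score is 206.835 - 1.015 x (words per sentence) - 84.6 x (syllables per word). The sketch below applies that formula; the sentence splitter and the vowel-group syllable counter are rough heuristics assumed for illustration, so its values may differ from those of the readability tool actually used in the study.

```python
import re

def count_syllables(word):
    """Rough heuristic: count vowel groups and drop a trailing silent 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def flesch_reading_ease(text):
    """Flesch Reading Ease for English text (higher means easier to read)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
```

Very simple text can score above 100 and very dense text below 0, which is why the range is described as typical rather than strict.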
LLM Prediction Accuracy for Continuous ICF Profiles
Prediction accuracy of the best-performing LLM (identified in WP2) in estimating continuous clinical scores (e.g., Grip Strength, QuickDASH scores) from core clinical predictors. Accuracy will be measured using the Intraclass Correlation Coefficient (ICC) to evaluate the agreement between LLM-predicted values and actual clinical assessment results.
Time frame: Within 3 months after the completion of clinical data collection.
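The ICC form is not specified in this record. The sketch below assumes ICC(2,1), that is, a two-way random-effects model with absolute agreement and single measurements, treating the LLM prediction and the clinical measurement as two "raters" of each participant. The function name is hypothetical, and a different ICC form would use a different formula.

```python
def icc_2_1(predicted, actual):
    """ICC(2,1): two-way random effects, absolute agreement, single measure.

    predicted and actual are equal-length sequences of numeric scores,
    one pair per participant (choice of ICC form is an assumption).
    """
    if len(predicted) != len(actual) or len(predicted) < 2:
        raise ValueError("need at least two paired observations")
    rows = list(zip(predicted, actual))
    n, k = len(rows), 2
    grand = (sum(predicted) + sum(actual)) / (n * k)
    row_means = [(p + a) / k for p, a in rows]
    col_means = [sum(predicted) / n, sum(actual) / n]

    # Two-way ANOVA sums of squares: participants, raters, residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in rows for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    denominator = ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    if denominator == 0:
        raise ValueError("ICC is undefined when there is no variance")
    return (ms_rows - ms_error) / denominator
```

Because this form measures absolute agreement, a model that ranks participants perfectly but is systematically off by a constant amount will score below 1.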
LLM Prediction Accuracy for Categorical ICF Profiles
Prediction accuracy of the best-performing LLM in estimating categorical patient profiles (e.g., Eaton-Littler Stage, POAM-P activity patterns). Accuracy will be measured using Cohen's Kappa coefficient to evaluate the agreement between LLM-predicted categories and the actual categories obtained from the clinical assessment.
Time frame: Within 3 months after the completion of clinical data collection.
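Cohen's Kappa is (p_o - p_e) / (1 - p_e), where p_o is the observed proportion of agreement and p_e is the agreement expected by chance from each rater's category frequencies. A minimal sketch of the unweighted coefficient follows; the function name is hypothetical, and the record does not say whether a weighted variant will be used for ordinal categories such as Eaton-Littler stage.

```python
from collections import Counter

def cohens_kappa(predicted, actual):
    """Unweighted Cohen's kappa between two equal-length label sequences."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need non-empty sequences of equal length")
    n = len(predicted)
    # Observed agreement: share of participants with identical labels.
    observed = sum(p == a for p, a in zip(predicted, actual)) / n
    # Chance agreement: product of marginal frequencies, summed over labels.
    pred_counts = Counter(predicted)
    actual_counts = Counter(actual)
    expected = sum(pred_counts[c] * actual_counts[c] for c in pred_counts) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when only one category is used")
    return (observed - expected) / (1 - expected)
```

For example, with predicted stages II, II, III, III and assessed stages II, III, III, III, observed agreement is 0.75, chance agreement is 0.5, and kappa is 0.5.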
Correlations Between Pain-Activity Patterns and Clinical Variables
This measure evaluates the correlation between the subscale scores of the Patterns of Activity Measure-Pain (POAM-P) questionnaire (Avoidance, Overdoing, and Pacing) and clinical parameters, including pain intensity (VAS), grip strength (kg), upper extremity disability (QuickDASH), and emotional status (HADS). Correlation will be analyzed using Spearman's or Pearson's correlation coefficients depending on the data distribution. Unit of Measure: Correlation Coefficient
Time frame: Baseline (single assessment at enrollment)
Kinesiophobia Level (Tampa Kinesiophobia Scale)
Level of fear of movement or re-injury will be assessed using the unabbreviated 17-item Tampa Scale of Kinesiophobia (TSK-17). Total scores range from a minimum of 17 to a maximum of 68. Higher scores indicate a greater level of kinesiophobia (worse outcome).
Time frame: Baseline (single assessment at enrollment)
Pain Beliefs Profile (Pain Beliefs Questionnaire)
This measure assesses the patients' beliefs about the cause of their pain using the unabbreviated Pain Beliefs Questionnaire (PBQ). The PBQ consists of two subscales: Organic Beliefs and Psychological Beliefs. Each subscale score is evaluated, and their correlation with coping strategies and emotional status is analyzed. For each subscale, scores range from 1 to 6 (calculated as an average of items), where higher scores indicate a stronger belief in that specific dimension (e.g., higher organic scores mean a stronger belief that pain has a physical cause).
Time frame: Baseline (single assessment at enrollment)
Anxiety and Depression (Hospital Anxiety and Depression Scale)
Emotional status will be assessed using the Hospital Anxiety and Depression Scale (HADS). The scale consists of two subscales: HADS-Anxiety (HADS-A) and HADS-Depression (HADS-D). Each subscale ranges from 0 to 21, where higher scores indicate greater levels of anxiety or depression (worse outcome).
Time frame: Baseline (single assessment at enrollment)
Pain Coping Strategies (Pain Coping Questionnaire)
Evaluation of the various methods used by participants to manage their pain using the unabbreviated Pain Coping Questionnaire (PCQ). The questionnaire assesses different subscales such as Information Seeking, Problem Solving, and Distraction. Each subscale is scored, and higher scores indicate a more frequent use of that specific coping strategy (higher scores generally represent better or more active coping, depending on the specific subscale).
Time frame: Baseline (single assessment at enrollment)