Given the limited evidence on digital cognitive behavioral therapy (CBT) for chronic subjective tinnitus, particularly interventions supported by large language models (LLMs), this two-arm, 30-day randomized controlled trial will aim to evaluate the efficacy, safety, and usability of an LLM-based conversational CBT intervention compared with a digital education control. Participants with chronic subjective tinnitus will be randomly assigned to either the intervention group, which will receive daily AI-supported CBT sessions through the Fudan Tinnitus Doctor (FTD) system, or the control group, which will receive static tinnitus education materials matched for duration and platform interface. The FTD system will be powered by a multi-agent large language model and will deliver personalized CBT-style dialogues, including psychoeducation, cognitive restructuring, relaxation, and mindfulness guidance. Outcomes will include tinnitus severity, sleep quality, mood symptoms, global improvement, and user adherence and experience.
Chronic subjective tinnitus is a common and often distressing auditory condition that frequently co-occurs with insomnia, anxiety, and depression. Cognitive behavioral therapy (CBT) has the strongest evidence base among treatments for reducing tinnitus-related distress, but traditional face-to-face CBT is limited by accessibility, cost, and the availability of trained professionals. Digital CBT interventions have demonstrated potential benefits, yet their efficacy varies, and engagement and personalization remain major challenges.
Recent advances in large language models (LLMs) have enabled the development of interactive conversational systems capable of delivering psychologically informed support. The Fudan Tinnitus Doctor (FTD) is a multi-agent conversational AI platform specifically designed to provide CBT-based tinnitus management. It integrates psychoeducation, cognitive restructuring, relaxation, and mindfulness guidance within natural language conversations. The system will be powered by a multi-agent large language model with retrieval-augmented generation (RAG) and safety moderation to ensure content validity, consistency, and user safety.
This single-centre, parallel-group, open-label randomized controlled trial will evaluate the efficacy, safety, and usability of the FTD system compared with a digital education control. Participants with chronic subjective tinnitus who meet the inclusion criteria will be randomly assigned (1:1) to either the FTD intervention group or the control group. The intervention group will receive 30 days of AI-supported CBT sessions through the secure hospital web platform, while the control group will receive static tinnitus education materials matched for platform and duration but without interactive AI components.
The primary outcome will be the change in tinnitus severity, measured by the Tinnitus Handicap Inventory (THI), from baseline to Day 30.
Secondary outcomes will include sleep quality (Pittsburgh Sleep Quality Index, PSQI), anxiety (GAD-2), depression (PHQ-2), and overall perceived improvement (Patient Global Impression of Change, PGIC). Exploratory outcomes will assess usability (System Usability Scale, SUS), satisfaction (Net Promoter Score, NPS), and engagement metrics. All assessments will use validated Chinese versions of the respective questionnaires. Safety monitoring will be conducted throughout the study period. Adverse events and any AI-related performance errors will be automatically logged and reviewed by the research team. This study will provide preliminary evidence on the efficacy and safety of an LLM-based conversational CBT system for tinnitus management, offering a scalable and patient-centered approach that could enhance accessibility and adherence in digital mental health care.
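The 1:1 assignment described above can be illustrated with a short sketch. The record does not state the randomization method, so permuted blocks, the block size of 4, the seed, and the function name `permuted_block_allocation` are all assumptions made for illustration only; the arm labels and the enrollment of 172 come from this record.

```python
import random


def permuted_block_allocation(n_participants, block_size=4, seed=2024):
    """Return a 1:1 allocation list built from shuffled, balanced blocks.

    Assumption: permuted-block randomization with a fixed block size.
    The trial record only states that assignment is random and 1:1.
    """
    if block_size % 2 != 0:
        raise ValueError("block_size must be even for a 1:1 ratio")
    rng = random.Random(seed)  # seeded so the list is reproducible
    allocation = []
    while len(allocation) < n_participants:
        half = block_size // 2
        block = ["FTD"] * half + ["control"] * half
        rng.shuffle(block)  # random order within each balanced block
        allocation.extend(block)
    return allocation[:n_participants]


allocation = permuted_block_allocation(172)
print(allocation.count("FTD"), allocation.count("control"))
```

Because 172 is a multiple of the assumed block size, the two arms end up with equal numbers of participants under this scheme.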
Study Type
INTERVENTIONAL
Allocation
RANDOMIZED
Purpose
TREATMENT
Masking
SINGLE
Enrollment
172
The FTD system will deliver personalized CBT-style dialogues including tinnitus psychoeducation, cognitive restructuring, relaxation training, sleep hygiene, and mindfulness guidance. The system will operate through a secure hospital web platform, accessible via personal devices, and will incorporate retrieval-augmented generation and safety moderation to ensure evidence-based and safe interaction.
Educational modules will include information on tinnitus mechanisms, common coping strategies, and general lifestyle recommendations. The materials will be accessed via the same secure platform as the intervention group to control for exposure time and digital interface effects.
Fudan University Eye Ear Nose and Throat Hospital, Otorhinolaryngology Department
Shanghai, Shanghai Municipality, China
Tinnitus Handicap Inventory (THI)
The THI is a questionnaire widely used in research that comprises functional, emotional, and catastrophic subscales. It consists of 25 questions, each answered "yes" (4 points), "sometimes" (2 points), or "no" (0 points). The total score is the sum of the scores for all questions and classifies tinnitus severity as no handicap (0-16), mild handicap (18-36), moderate handicap (38-56), or severe handicap (58-100).
Time frame: Baseline, Day 14, and Day 30
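The THI scoring rule and severity bands described above can be sketched as follows. The function names (`thi_total`, `thi_grade`, `thi_change`) are hypothetical and not part of the study software; the point values and cut-offs are taken from this record, and the change score follows the stated primary outcome (baseline to Day 30).

```python
# Points per answer, as stated in the THI description above.
THI_POINTS = {"yes": 4, "sometimes": 2, "no": 0}


def thi_total(answers):
    """Sum the points for the 25 THI answers."""
    if len(answers) != 25:
        raise ValueError("the THI has exactly 25 questions")
    return sum(THI_POINTS[a] for a in answers)


def thi_grade(total):
    """Map a THI total to the severity band given in this record."""
    if not 0 <= total <= 100 or total % 2 != 0:
        raise ValueError("THI totals are even numbers from 0 to 100")
    if total <= 16:
        return "no handicap"
    if total <= 36:
        return "mild handicap"
    if total <= 56:
        return "moderate handicap"
    return "severe handicap"


def thi_change(baseline_total, day30_total):
    """Change from baseline to Day 30; negative values mean improvement."""
    return day30_total - baseline_total


answers = ["yes"] * 5 + ["sometimes"] * 10 + ["no"] * 10
total = thi_total(answers)
print(total, thi_grade(total))  # 40 moderate handicap
```

Since every answer contributes 0, 2, or 4 points, totals are always even, which is why the bands in the record skip the odd boundary values.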
Pittsburgh Sleep Quality Index (PSQI)
The PSQI will be used to evaluate subjective sleep quality over the past month. It comprises 19 items yielding seven component scores (subjective sleep quality, latency, duration, efficiency, disturbances, use of sleep medication, and daytime dysfunction). Higher total scores indicate poorer sleep quality.
Time frame: Baseline, Day 14, and Day 30
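As a rough illustration of how the seven PSQI component scores combine into a total, the sketch below sums them. It assumes each component has already been scored from 0 to 3, as in the standard PSQI scoring scheme (this range is not stated in the record), and it does not show how the 19 items are converted into components. The function name `psqi_global` is hypothetical.

```python
PSQI_COMPONENTS = (
    "subjective sleep quality",
    "latency",
    "duration",
    "efficiency",
    "disturbances",
    "use of sleep medication",
    "daytime dysfunction",
)


def psqi_global(component_scores):
    """Sum seven component scores (assumed 0-3 each) into a total.

    Higher totals indicate poorer sleep quality.
    """
    if len(component_scores) != len(PSQI_COMPONENTS):
        raise ValueError("expected seven component scores")
    if any(score not in (0, 1, 2, 3) for score in component_scores):
        raise ValueError("each component score must be 0, 1, 2, or 3")
    return sum(component_scores)


print(psqi_global([1, 2, 1, 0, 1, 0, 2]))  # 7
```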
Generalized Anxiety Disorder Scale (GAD-2)
The GAD-2 will assess anxiety symptoms over the past two weeks. Each of the two items is rated from 0 ("not at all") to 3 ("nearly every day"). The total score ranges from 0 to 6, with higher scores indicating greater anxiety. A score ≥3 suggests clinically significant anxiety.
Time frame: Baseline, Day 14, and Day 30
Patient Health Questionnaire-2 (PHQ-2)
The PHQ-2 will measure depressive symptoms over the past two weeks. Each of the two items is rated from 0 ("not at all") to 3 ("nearly every day"). The total score ranges from 0 to 6, with higher scores indicating greater depressive symptoms. A score ≥3 suggests clinically significant depression.
Time frame: Baseline, Day 14, and Day 30
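The GAD-2 and PHQ-2 entries above share the same scoring rule: two items rated 0 to 3, a total from 0 to 6, and a threshold of 3. A minimal sketch, with hypothetical function names, could look like this:

```python
def two_item_total(item1, item2):
    """Total for a two-item screener (GAD-2 or PHQ-2), range 0-6."""
    for rating in (item1, item2):
        if rating not in (0, 1, 2, 3):
            raise ValueError("each item is rated 0, 1, 2, or 3")
    return item1 + item2


def meets_cutoff(total, cutoff=3):
    """True when the total reaches the threshold of 3 given in this record."""
    return total >= cutoff


gad2 = two_item_total(2, 1)
phq2 = two_item_total(1, 1)
print(gad2, meets_cutoff(gad2))  # 3 True
print(phq2, meets_cutoff(phq2))  # 2 False
```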
Patient Global Impression of Change (PGIC)
The PGIC will assess the participant's overall perception of improvement since starting the intervention. It is rated on a 7-point Likert scale from 1 ("very much improved") to 7 ("very much worse"), with lower scores representing greater perceived improvement.
Time frame: Day 14 and Day 30