This research study aims to evaluate the effect of treatment delivery method on voice outcomes over 12 months in people whose primary complaint is a voice problem and who have been diagnosed with either non-phonotraumatic vocal hyperfunction, also known as primary muscle tension dysphonia (MTD), or phonotraumatic vocal hyperfunction, also known as benign vocal fold lesions (lesions). The secondary objectives are:
* To evaluate acoustic correlates of clear speech and their relationship to vocal acoustic and patient-reported voice outcomes.
* To determine the association between overall dysphonia outcomes and adoption of clear speech.
About 23 million Americans, roughly 1 in 13 people, suffer from voice problems at any given time. These issues can make it hard to speak clearly, lead to throat pain or fatigue, and affect daily life, work, and emotional well-being. The two most common types of voice problems are:
* Muscle tension dysphonia (MTD): when muscles in the throat are too tight during speaking.
* Benign vocal fold lesions: such as nodules or swelling on the vocal cords due to overuse or strain.
The most common treatment for these conditions is behavioral voice therapy, which involves working with a speech-language pathologist (SLP) to learn new ways to use the voice. However, over a third of patients drop out, and long-term success is uncertain. One major challenge is helping patients apply what they learn in therapy to their real-life conversations, a step often saved for the end of treatment or skipped entirely.
Traditional voice therapies often follow a strict step-by-step ("hierarchical") approach. Patients start with basic sounds or exercises and only work up to everyday speech later. But this method may not be the most effective, and many people struggle to use the new techniques outside the clinic.
To solve this problem, the research team developed a new method called Conversation Training Therapy (CTT). CTT flips the traditional approach: it begins with practicing clear, intentional speech in real conversation from the first session. This helps patients immediately apply new voice skills in real-life situations, which may lead to faster, more lasting results. Studies have shown that CTT leads to meaningful improvements in voice-related quality of life both immediately and up to three months after therapy. It is now being used in national research studies and has gained recognition as a promising, evidence-based therapy.
The current research will compare CTT to traditional methods over a full year, helping to answer important questions about what makes voice therapy work and how to help more people benefit from it long-term.
Study Type
INTERVENTIONAL
Allocation
RANDOMIZED
Purpose
TREATMENT
Masking
NONE
Enrollment
120
Participants in the hierarchical version of Conversation Training Therapy (CTTH) will receive four weekly sessions of voice therapy. This approach gradually increases the difficulty of speaking tasks, from simple sounds to full conversations, based on the participant's progress. The therapy begins with basic awareness and speech sounds (e.g., consonant-vowel pairs), then progresses through words, phrases, and sentences, culminating in natural conversation. Each level must be completed with at least 80% accuracy before proceeding to the next one. The structure is modeled after traditional voice therapies like resonant voice and aims to help participants succeed early and reduce mental fatigue. Daily homework includes seven short (2.5-minute) practice sessions, aligned with prior research showing this is a realistic and effective amount of practice.
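The progression rule described above can be sketched in a few lines of Python. This is only an illustration, not the study's software: the level names paraphrase the description, and the idea that each trial is scored as simply correct or incorrect is an assumption, as are all function names.

```python
from typing import Sequence

# Level names paraphrase the progression described above (illustrative only).
LEVELS = [
    "awareness and speech sounds",
    "words",
    "phrases",
    "sentences",
    "conversation",
]
PASS_THRESHOLD = 0.80  # "at least 80% accuracy" from the description


def level_accuracy(trial_results: Sequence[bool]) -> float:
    """Fraction of trials judged correct (assumes binary scoring per trial)."""
    if not trial_results:
        raise ValueError("at least one trial is required")
    return sum(trial_results) / len(trial_results)


def next_level_index(current: int, trial_results: Sequence[bool]) -> int:
    """Advance one level when the threshold is met; otherwise stay put."""
    passed = level_accuracy(trial_results) >= PASS_THRESHOLD
    if passed and current < len(LEVELS) - 1:
        return current + 1
    return current


def daily_homework_minutes(sessions: int = 7, minutes_per_session: float = 2.5) -> float:
    """Total daily practice time implied by seven 2.5-minute sessions."""
    return sessions * minutes_per_session
```

With the default values, the daily homework adds up to 17.5 minutes of practice.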
The therapy includes four weekly sessions and several key techniques:
* Clear Speech: Speaking clearly, like leaving an important voicemail.
* Awareness Training: Paying attention to how the voice sounds and feels in the mouth and face.
* Negative Practice: Switching between the "bad" voice and the "good" therapy voice to recognize and improve differences.
* Embedded Gestures: Briefly holding certain speech sounds to reduce vocal strain and boost clarity.
* Prosody and Projection: Working on pitch, rhythm, and speaking louder through better technique.
Participants practice these skills throughout the day using a mobile app to track their progress and record a weekly sample. Unlike hierarchical models, components in CTT can be introduced in any order based on individual needs, making it flexible and personalized.
Emory Voice Center at Emory University Hospital Midtown
Atlanta, Georgia, United States
RECRUITING
Change in Voice Handicap Index-10 (VHI-10) score
The Voice Handicap Index-10 (VHI-10) is a 10-question survey used to measure how much a voice problem affects a person's daily life. Each item is rated from 0 ("never") to 4 ("always"). Total scores range from 0 to 40, with higher scores indicating greater voice-related disability as perceived by the patient.
Time frame: During intervention (4-week period of active treatment: Week 1, Week 2, Week 3, Week 4), immediately post treatment (Week 5), and 3 months, 6 months, and 12 months post treatment
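As a rough sketch of how a VHI-10 total and its change between two time points could be computed from the item ratings described above: the function names are hypothetical, and the sign convention (follow-up minus baseline) is an assumption, since the record does not state how change is calculated.

```python
from typing import Sequence

N_ITEMS = 10
MIN_RATING, MAX_RATING = 0, 4  # 0 = "never", 4 = "always"


def vhi10_total(ratings: Sequence[int]) -> int:
    """Sum the ten item ratings into a total score between 0 and 40."""
    if len(ratings) != N_ITEMS:
        raise ValueError(f"expected {N_ITEMS} ratings, got {len(ratings)}")
    for rating in ratings:
        if not MIN_RATING <= rating <= MAX_RATING:
            raise ValueError(f"rating {rating} is outside {MIN_RATING}-{MAX_RATING}")
    return sum(ratings)


def vhi10_change(baseline: Sequence[int], follow_up: Sequence[int]) -> int:
    """Follow-up minus baseline (assumed convention).

    Under this convention a negative value means a lower perceived handicap.
    """
    return vhi10_total(follow_up) - vhi10_total(baseline)
```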
Change in vowel space
Change in vowel space refers to alterations in the acoustic range of vowel production, specifically how far apart vowels are from each other in the formant frequency space (typically plotted as F1 vs. F2, the first and second formants). It reflects the clarity, precision, and distinctiveness of vowel articulation during speech. Vowel space will be measured through acoustic analysis of participants' spoken sentences from the Sentence Intelligibility Test (SIT). These sentences include both corner vowels (e.g., heed, had, hod, who'd) and non-corner vowels (e.g., hid, head, hut, hood), allowing for detailed tracking of articulatory patterns.
Time frame: During intervention (4-week period of active treatment: Week 1, Week 2, Week 3, Week 4), immediately post treatment (Week 5), and 3 months, 6 months, and 12 months post treatment
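One common way to summarize vowel space is the area of the quadrilateral formed by the four corner vowels in the F1-F2 plane. The record does not say which vowel space metric the study uses, so the sketch below is an assumption: it applies the shoelace formula to (F1, F2) pairs in Hz, and the function names and corner ordering are illustrative.

```python
from typing import Dict, Sequence, Tuple

Formants = Tuple[float, float]  # (F1, F2) in Hz

# Corner vowels listed in order around the quadrilateral (assumed ordering).
CORNER_ORDER = ("heed", "had", "hod", "who'd")


def polygon_area(points: Sequence[Formants]) -> float:
    """Shoelace formula; points must be given in order around the polygon."""
    n = len(points)
    if n < 3:
        raise ValueError("a polygon needs at least three points")
    twice_area = 0.0
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def vowel_space_area(formants: Dict[str, Formants]) -> float:
    """Area in Hz^2 enclosed by the four corner vowels."""
    return polygon_area([formants[word] for word in CORNER_ORDER])
```

Comparing this area across time points would give one possible "change in vowel space" value; a larger area suggests more distinct vowel articulation.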
Auditory-perceptual severity
Auditory-perceptual severity is measured using the CAPE-V (Consensus Auditory-Perceptual Evaluation of Voice) scale, a standardized tool. Raters (blinded speech-language pathologists) will score overall voice severity on a 0-100 mm visual analog scale, with 0 indicating no perceived disorder and 100 indicating extremely severe voice abnormality. This rating captures a listener's perception of qualities like roughness, breathiness, strain, and overall voice quality.
Time frame: Baseline, 3 month, 6 month, 12 month
Stroboscopic changes: Glottal Closure
Visual inspection during phonation. Described as: complete, anterior gap, posterior gap, hourglass, spindle, irregular, incomplete. Assessments will be made using a standardized tool called VALI (Voice-Vibratory Assessment with Laryngeal Imaging), which scores multiple vibratory features.
Time frame: Baseline, Post intervention (1 week), 3 month, 6 month, 12 month
Stroboscopic changes: Amplitude
Observed lateral movement of vocal folds. Subjectively rated as reduced, normal, or excessive. Assessments will be made using a standardized tool called VALI (Voice-Vibratory Assessment with Laryngeal Imaging), which scores multiple vibratory features.
Time frame: Baseline, Post intervention (1 week), 3 month, 6 month, 12 month
Stroboscopic changes: Mucosal Wave
Magnitude of mucosal membrane movement during vibration. Scored on a 0-10 scale based on visibility and extent. Assessments will be made using a standardized tool called VALI (Voice-Vibratory Assessment with Laryngeal Imaging), which scores multiple vibratory features.
Time frame: Baseline, Post intervention (1 week), 3 month, 6 month, 12 month
Stroboscopic changes: Free edge contour
Shape of vocal fold edges during vibration. Rated as: normal, convex, concave, irregular, or rough. Assessments will be made using a standardized tool called VALI (Voice-Vibratory Assessment with Laryngeal Imaging), which scores multiple vibratory features.
Time frame: Baseline, Post intervention (1 week), 3 month, 6 month, 12 month
Stroboscopic changes: Phase closure
Duration of open vs. closed phase during each vibratory cycle. Rated as: open phase dominates, equal, or closed phase dominates. Assessments will be made using a standardized tool called VALI (Voice-Vibratory Assessment with Laryngeal Imaging), which scores multiple vibratory features.
Time frame: Baseline, Post intervention (1 week), 3 month, 6 month, 12 month
Stroboscopic changes: Phase symmetry
Synchrony of left and right vocal fold movement. Rated as: symmetric or asymmetric. Assessments will be made using a standardized tool called VALI (Voice-Vibratory Assessment with Laryngeal Imaging), which scores multiple vibratory features.
Time frame: Baseline, Post intervention (1 week), 3 month, 6 month, 12 month
Stroboscopic changes: Regularity percentage
Consistency of vibratory cycles over time. Expressed as the percentage of time that vibration is regular. Assessments will be made using a standardized tool called VALI (Voice-Vibratory Assessment with Laryngeal Imaging), which scores multiple vibratory features.
Time frame: Baseline, Post intervention (1 week), 3 month, 6 month, 12 month
Aerodynamic changes: Average airflow in speech
Average airflow in speech measures how much air passes through the vocal folds per second during phonation (typically in liters/second or mL/second). Abnormal airflow can indicate vocal fold dysfunction, such as incomplete closure or excessive strain. This data will be collected using the Phonatory Aerodynamic System 6600 (PAS) by PENTAX, a specialized tool for assessing voice aerodynamics. The PAS system uses a mask and microphone setup to capture airflow and sound pressure during structured speech tasks.
Time frame: Baseline, Post intervention (1 week), 3 month, 6 month, 12 month
Aerodynamic changes: Average number of breaths
The average number of breaths is a count of how many breaths a speaker takes during a speech task. Higher breath counts may reflect inefficient voice use or reduced respiratory support. This data will be collected using the Phonatory Aerodynamic System 6600 (PAS) by PENTAX, a specialized tool for assessing voice aerodynamics. The PAS system uses a mask and microphone setup to capture airflow and sound pressure during structured speech tasks.
Time frame: Baseline, Post intervention (1 week), 3 month, 6 month, 12 month
Aerodynamic changes: Average speaking duration
Speaking duration is how long it takes the person to speak the entire passage. This data will be collected using the Phonatory Aerodynamic System 6600 (PAS) by PENTAX, a specialized tool for assessing voice aerodynamics. The PAS system uses a mask and microphone setup to capture airflow and sound pressure during structured speech tasks.
Time frame: Baseline, Post intervention (1 week), 3 month, 6 month, 12 month
Adherence
Adherence will be measured as the frequency of practice per week.
Time frame: Baseline, end of study (12 month)
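A minimal sketch of a weekly practice frequency count, assuming practice sessions are available as a list of calendar dates (for example, exported from a practice log). The data format, the use of ISO calendar weeks, and the function name are all assumptions; the record does not describe how the frequency is derived.

```python
from collections import Counter
from datetime import date
from typing import Dict, Iterable, Tuple


def practice_sessions_per_week(session_dates: Iterable[date]) -> Dict[Tuple[int, int], int]:
    """Count logged practice sessions per ISO (year, week) pair."""
    counts: Counter = Counter()
    for day in session_dates:
        iso = day.isocalendar()  # (ISO year, ISO week number, ISO weekday)
        counts[(iso[0], iso[1])] += 1
    return dict(counts)
```

Weeks with no logged sessions do not appear in the result, so any summary that averages across weeks would need to fill those in with zero.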
Practice fidelity
Practice fidelity will be assessed via analysis of audio recordings for the presence or absence of negative practice and the type of practice (i.e., conversation, monologue, recitation, reading aloud).
Time frame: Baseline, end of study (12 month)