Participants received a bilateral pure-tone hearing screen administered by the research team. All potential participants who failed the hearing screen were provided with information about its meaning and a referral for further audiological testing. Participants who passed the hearing screen and met the other inclusion criteria were divided into 6 groups, each of which was presented with 144 stimuli equally distributed among processing conditions. Listeners chose a comfortable listening level using supplied headphones and were able to control the rate of presentation. Following a short practice session, listeners were asked to transcribe each target sentence. The intelligibility of each stimulus was estimated by determining the mean percentage of content words correctly transcribed. After transcription, listeners were asked for two qualitative judgments: (1) the "clarity" of the stimulus, and (2) the "listening effort" involved. The quality of each stimulus was estimated by the median quality judgment, and the listening effort by the median effort judgment. Listening sessions took place in a quiet room, and presentation was controlled by the Superlab presentation software program. The stimuli consisted of audio recordings of target spondaic words embedded in a carrier sentence, produced by a male and a female native speaker of American English and recorded under quiet conditions. Each stimulus presented to the listeners for identification was either unmasked pristine speech or speech that had been processed in one of five ways with different mixtures of noise and sensor movement. The latter are identified as QoS Levels 1-5. Collectively, the estimates of word intelligibility, clarity, and listening effort under the different conditions shed light on how effectively the tested algorithm preserves intelligibility for listeners with acceptable effort and quality.
Participants received a bilateral pure-tone hearing screen administered by a clinically trained member of the research team. The threshold criteria were 20 dB SPL at 250 and 500 Hz, and 25 dB SPL at 1000, 2000, 4000, and 8000 Hz. All potential participants who failed the hearing screen were provided with information about its meaning and referred for further audiological testing. Participants who passed the hearing screen and met the other inclusion criteria were divided into 6 groups, each of which was presented with 144 stimuli equally distributed among processing conditions (Pristine Non-Moving Speech plus QoS Levels 1-5). Listeners self-selected a comfortable listening level using supplied headphones and were able to control the rate of presentation. Following a short practice session, listeners were asked to transcribe the target sentences, and the intelligibility of each stimulus was estimated by determining the mean percentage of content words correctly transcribed. After transcription, listeners were also asked for two qualitative judgments using a visual analog scale: (1) the "clarity" of the stimulus, and (2) the "listening effort" involved. These measures are sensitive to situations where listeners manage to extract the uttered words from the signal, but with increasing difficulty. The quality of each stimulus was estimated by the median quality judgment, and the listening effort by the median effort judgment. Listening sessions took place in a quiet room, and presentation was controlled by the Superlab presentation software program. The stimuli consisted of audio recordings of target spondaic words embedded in a carrier sentence, produced by a male and a female native speaker of American English and recorded under quiet conditions. Each stimulus presented to the listeners for identification was either unmasked pristine speech or speech that had been processed in one of five ways with different mixtures of noise and sensor movement. The latter are identified as QoS Levels 1-5.
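The screening criteria stated above can be expressed as a simple check. The following is a minimal sketch, not the study's procedure: the `thresholds` layout and the `passes_screen` helper are hypothetical, and it assumes that a participant passes only if the measured level at every listed frequency, in both ears, is at or below the stated criterion.

```python
# Criteria from the description: frequency (Hz) -> maximum level (dB SPL).
CRITERIA_DB_SPL = {250: 20, 500: 20, 1000: 25, 2000: 25, 4000: 25, 8000: 25}


def passes_screen(thresholds):
    """Return True if both ears meet every criterion.

    thresholds: {"left": {freq_hz: level_db}, "right": {...}} (hypothetical layout).
    A missing frequency is treated as a failure (assumption).
    """
    for ear in ("left", "right"):
        for freq, limit in CRITERIA_DB_SPL.items():
            measured = thresholds[ear].get(freq)
            if measured is None or measured > limit:
                return False
    return True
```

For example, a participant whose right ear measures 30 dB at 4000 Hz would be flagged as failing and, per the description, referred for further audiological testing.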
Each type of processing was expected to have a different effect on the underlying probability that the listener would be able to correctly identify the spoken word, the effort required to do so, and the quality of the presented recording. Data on familiarity ratings for the target spondaic words and the relative intelligibility of the two speakers under different conditions of noise masking have been previously reported. Collectively, the estimates of word intelligibility, clarity, and listening effort under the different conditions are expected to shed light on how effectively the tested algorithm preserves intelligibility for listeners with acceptable effort and quality.
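The per-stimulus estimates described above (mean percentage of content words correctly transcribed, and median clarity and effort judgments) could be computed along the following lines. This is an illustrative sketch only: the function names and the `responses` record layout are hypothetical, and it assumes exact, case-insensitive word matching with no tolerance for misspellings, which may differ from how the researchers graded transcriptions.

```python
from statistics import mean, median


def percent_content_words_correct(target_words, transcript):
    """Percentage of target content words that appear in one transcript."""
    typed = set(transcript.lower().split())
    hits = sum(1 for word in target_words if word.lower() in typed)
    return 100.0 * hits / len(target_words)


def summarize_stimulus(target_words, responses):
    """Summarize one stimulus across listeners.

    responses: list of dicts with 'transcript', 'clarity', and 'effort'
    keys (hypothetical layout; ratings on the visual analog scale).
    """
    scores = [
        percent_content_words_correct(target_words, r["transcript"])
        for r in responses
    ]
    return {
        "intelligibility": mean(scores),  # mean % of content words correct
        "clarity": median([r["clarity"] for r in responses]),
        "effort": median([r["effort"] for r in responses]),
    }
```

Using the median for the ratings, as the description specifies, keeps a single extreme slider response from dominating the estimate for a stimulus.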
Study Type
INTERVENTIONAL
Allocation
RANDOMIZED
Purpose
BASIC_SCIENCE
Masking
SINGLE
Enrollment
72
Speech stimuli recorded using non-moving speakers and mics. No masking sources present. No blind source separation (BSS) applied to multi-channel recordings. Very high output QoS values.
Speech stimuli recorded using non-moving speakers and mics. All masking sources present. No speech separation or extraction methods applied to multi-channel recordings. Very low output QoS values.
Speech stimuli recorded using non-moving speakers and mics. All masking sources present. Joint ACES scrubbing of both noise sources applied to multi-channel recordings. Very high output QoS values.
Speech stimuli recorded using linearly moving speech source and stationary masking sources and mics. All masking sources present. Joint ACES scrubbing of both noise sources applied to multi-channel recordings. Moderately high output QoS values.
Mixed speech and noise sources recorded using a stationary speech source, a stationary noise source, and a linearly moving noise source. A valid source hypothesis of the speech source is used to extract the speech source. High output QoS values.
Mixed speech and noise sources recorded using all stationary sources and a linearly moving microphone (mic 1). Joint ACES scrubbing of both noise sources is used to reduce the response of mic 1 to a residue of speech. Low output QoS values.
University of Cincinnati
Cincinnati, Ohio, United States
Target Word Transcription Accuracy
For each audio stimulus of a target word in a carrier phrase, the subject transcribed the target word they heard by typing that word in a field on the display. Their transcription was later graded as correct or incorrect by a researcher.
Time frame: The subject transcribed the target word immediately after hearing the carrier phrase.
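As a rough illustration of how the graded transcriptions might be summarized, the sketch below computes the percentage of target words graded correct within each processing condition. The `(condition, is_correct)` pair layout and the condition labels are hypothetical and are not taken from the study's records.

```python
from collections import defaultdict


def accuracy_by_condition(graded_trials):
    """Percentage of trials graded correct, per condition.

    graded_trials: iterable of (condition_label, is_correct) pairs
    (hypothetical layout).
    """
    totals = defaultdict(lambda: [0, 0])  # condition -> [correct, presented]
    for condition, is_correct in graded_trials:
        totals[condition][0] += int(is_correct)
        totals[condition][1] += 1
    return {c: 100.0 * correct / n for c, (correct, n) in totals.items()}
```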
Listening Effort
Subject-reported effort required to identify the target word in the carrier phrase, reported by moving a slider on a 0% to 100% scale.
Time frame: Immediately after hearing the carrier phrase.
Naturalness of Speech
Subject-reported perceived naturalness of the target word in the carrier phrase, reported by moving a slider on a 0% to 100% scale.
Time frame: Immediately after hearing the carrier phrase.