Recording and evaluation of a multiple-talker version of the WAKO rhyme test
Human speech recognition is remarkably robust to variability in speech production across talkers, speaking rates, and dialects. Nevertheless, word recognition accuracy decreases and response latency increases when this variability is introduced in word recognition tests, compared to a single-talker condition (Mullennix et al., 1988; Kirk et al., 1997). Using a single professional speaker in a commercially available speech test underestimates the real difficulties that hearing-impaired listeners face in daily life and limits how far findings can be generalized to speech perception at large (Clark, 1973).
A multiple-talker version of the WAKO rhyme test (von Wallenberg & Kollmeier, 1989) was recorded by four non-professional talkers (two male and two female native German speakers with either a German or a Swiss accent) for this experiment. The quality of the recordings was first evaluated by ten normal-hearing subjects.
Nine hearing aid wearers were tested in aided and unaided conditions with the original and the newly recorded versions of the WAKO test. Word recognition, response time and subjective sound quality ratings were measured simultaneously for each test condition. The word recognition scores were analyzed with a logistic mixed-effects regression model, and the response times and sound quality ratings with linear mixed-effects regression models. Test condition and test version were treated as fixed effects, and their p-values were obtained by likelihood ratio tests comparing the full model against the model without the effect in question. Introducing variability in the test material led to a significant decrease in performance for word recognition accuracy, response times and sound quality ratings. These results suggest that speech material closer to that experienced in daily life makes word recognition more difficult than material intended for clinical use. Applications of these findings to test design are suggested for researchers whose results are difficult to interpret because of ceiling effects.
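The likelihood ratio test used to obtain p-values for the fixed effects can be illustrated with a minimal sketch. It assumes that the full and reduced models have already been fitted by maximum likelihood and that only their log-likelihoods are needed; the function name and the example log-likelihood values are hypothetical and are not taken from the study, which does not state the software used.

```python
import math


def likelihood_ratio_test(loglik_full, loglik_reduced, df=1):
    """Compare two nested models fitted by maximum likelihood.

    loglik_full    -- log-likelihood of the model including the effect
    loglik_reduced -- log-likelihood of the model without the effect
    df             -- number of parameters dropped in the reduced model
    Returns (statistic, p_value), using the chi-square approximation.
    """
    # The test statistic is twice the gain in log-likelihood.
    statistic = max(2.0 * (loglik_full - loglik_reduced), 0.0)

    # Closed-form chi-square survival functions for 1 and 2 degrees
    # of freedom; other values would need a statistics library.
    if df == 1:
        p_value = math.erfc(math.sqrt(statistic / 2.0))
    elif df == 2:
        p_value = math.exp(-statistic / 2.0)
    else:
        raise ValueError("this sketch only supports df = 1 or df = 2")
    return statistic, p_value


# Hypothetical log-likelihoods, for illustration only.
stat, p = likelihood_ratio_test(loglik_full=-100.0, loglik_reduced=-102.0)
print(f"LR statistic = {stat:.2f}, p = {p:.4f}")
```

In practice, the log-likelihoods would come from the fitted mixed-effects models, and a fixed effect with more than two levels would require the general chi-square distribution from a statistics package.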