Identification of oral and nasal segments in band-vocoded speech: Specific difficulties associated with nasal vowels in French.
Recognition of French oral and nasal segments (consonants and vowels) was investigated in six normal-hearing listeners using noise-band vocoded speech derived from naturally produced VCV and V tokens. Our aim was to determine whether nasal segments behave differently from oral sounds under spectral degradation, in order to investigate the influence of band-limited envelope modulations on their perceptual classification. This issue was addressed in two parallel N-alternative forced-choice experiments in which participants were required to classify either consonants (VCV sequences, 18 consonants in 3 different vowel contexts) or vowels (natural steady-state extracts, 13 vowels) within the full French segment inventory.
Naturally produced VCV sequences and steady-state vowels were processed through a noise-band vocoder to produce various degrees of spectral (number of frequency bands among {1, 2, 4, 6, 8}) and temporal (low-pass envelope modulation cutoff among {4, 16, 128} Hz) resolution. Performance was compared to chance level using binomial tests for each combination of number of bands and modulation frequency cutoff in each experiment (consonant identification vs. vowel identification), specifically comparing participants' performance between oral and nasal segments.
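The vocoding procedure can be sketched as follows. This is a minimal illustration of a generic noise-band vocoder, assuming log-spaced analysis bands, Hilbert-envelope extraction, and Butterworth filters; the band edges and filter orders are illustrative assumptions, not the parameters of the processing used in the study.

```python
import numpy as np
from scipy.signal import butter, sosfilt, hilbert

def noise_vocode(signal, fs, n_bands=4, env_cutoff=16.0,
                 f_lo=80.0, f_hi=7000.0):
    """Generic noise-band vocoder sketch: split the signal into n_bands
    log-spaced frequency bands, extract each band's amplitude envelope
    (low-pass filtered at env_cutoff Hz), and use it to modulate
    band-limited noise. All parameter choices here are illustrative."""
    edges = np.geomspace(f_lo, f_hi, n_bands + 1)
    env_sos = butter(2, env_cutoff, btype="low", fs=fs, output="sos")
    rng = np.random.default_rng(0)
    out = np.zeros_like(signal)
    for lo, hi in zip(edges[:-1], edges[1:]):
        band_sos = butter(4, [lo, hi], btype="band", fs=fs, output="sos")
        band = sosfilt(band_sos, signal)
        # Smoothed Hilbert envelope, limited to env_cutoff Hz of modulation
        env = sosfilt(env_sos, np.abs(hilbert(band)))
        env = np.clip(env, 0.0, None)
        # Band-limited noise carrier, modulated by the envelope
        carrier = sosfilt(band_sos, rng.standard_normal(len(signal)))
        out += env * carrier
    return out
```

Reducing `n_bands` degrades spectral resolution, while lowering `env_cutoff` degrades temporal resolution, matching the two degradation axes manipulated in the experiments.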
As expected, classification performance increased gradually with both spectral and temporal resolution. In the consonant classification experiment, the performance pattern for nasal segments closely mimicked that of oral consonants. In the vowel classification experiment, however, a strong discrepancy was observed between oral and nasal segments: whereas oral vowels showed performance patterns relatively similar to those obtained for consonant identification, nasal vowels showed very poor performance in all conditions of acoustic degradation. For nasal vowels, performance never significantly exceeded chance level.
We are now addressing the interpretation of this observation along two lines. First, confusion matrices were extracted from the observed data to provide information on the strategies and feature confusions involved. Although there is a small under-representation of nasal responses, this difference is (1) not massive (9% observed vs. 11% expected for consonants, 17% observed vs. 23% expected for vowels) and (2) present for both consonants and vowels. Further investigations will help identify tendencies in the classification errors. In parallel, acoustic analyses and control experiments are being performed in order to determine which cues (static or dynamic) may be missing, preventing nasal vowels from being correctly identified.
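The confusion-matrix analysis can be sketched as follows, assuming the expected nasal-response rate is taken to be the proportion of nasal stimuli presented (one simple baseline consistent with the observed vs. expected percentages above). The labels and nasal subset below are placeholders, not the actual French inventory.

```python
import numpy as np

def confusion_and_nasal_rates(stimuli, responses, labels, nasal):
    """Build a confusion matrix from (stimulus, response) pairs and
    compare the observed rate of nasal responses with the rate that
    would match the nasal-stimulus proportion. Inputs are placeholder
    label sequences for illustration."""
    idx = {lab: i for i, lab in enumerate(labels)}
    cm = np.zeros((len(labels), len(labels)), dtype=int)
    for s, r in zip(stimuli, responses):
        cm[idx[s], idx[r]] += 1  # rows: stimuli, columns: responses
    nasal_idx = [idx[lab] for lab in nasal]
    observed = cm[:, nasal_idx].sum() / cm.sum()  # nasal-response rate
    expected = cm[nasal_idx, :].sum() / cm.sum()  # nasal-stimulus rate
    return cm, observed, expected

# Placeholder example with one oral ("a") and one nasal ("an") category:
cm, obs, exp = confusion_and_nasal_rates(
    stimuli=["a", "an", "a", "an"],
    responses=["a", "a", "a", "an"],
    labels=["a", "an"],
    nasal=["an"],
)
```

An observed rate below the expected rate, as in the data reported above, indicates that listeners under-use nasal response categories relative to how often nasal stimuli occur.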