Phase modification for increasing the loudness of telephone speech in adverse noise conditions
Near-end noise conditions, in which the background noise is present in the listener's environment, are common in mobile communications. In such conditions, post-processing in the receiving mobile device can be used to improve the intelligibility of the speech signal and reduce listening effort by increasing the prominence, clarity, and loudness of speech. While many intelligibility enhancement methods have been proposed for this scenario, they typically modify either the magnitude spectrum or the time-domain signal directly.
In this study, two loudness-increasing methods based on modifying the phase spectrum are investigated. The first aims to reduce the dynamic range of the signal and exploit the energy gain resulting from amplitude normalization to increase loudness. The second is designed to sharpen the high-amplitude peaks that the periodic glottal excitation produces in the time-domain signal, making the speech sound clearer. Both methods first modify only the phase spectrum and then compute the time-domain signal using the inverse Fourier transform. Finally, the time-domain signal is amplitude-normalized by scaling its sample values so that they occupy the original amplitude range of the processed frame. The proposed methods were compared with unprocessed speech using subjective loudness and quality evaluations as well as objective quality measures. The results suggest that the phase modification methods increase both the loudness and the clarity of telephone speech, thus reducing listening effort while generally maintaining subjective speech quality.
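The processing chain described above can be illustrated with a minimal sketch: keep a frame's magnitude spectrum, replace its phase, return to the time domain with the inverse FFT, and rescale the result to the frame's original peak amplitude. The sketch covers only the dynamic-range-reduction route. The replacement phase used here, a Schroeder-style quadratic phase phi_k = -pi k^2 / N (the hypothetical helper `quadratic_phase`), is an assumption chosen to show the principle. It is not the phase modification used by the authors. Dispersing the phase lowers the frame's crest factor, so rescaling to the same peak raises its RMS level. This raised RMS level is the energy gain the first method exploits.

```python
import numpy as np


def quadratic_phase(num_bins, frame_len):
    """Hypothetical dispersive phase (assumption, not the paper's method).

    phi_k = -pi * k^2 / N gives a group delay of k samples at bin k,
    which smears impulse-like peaks across the frame.
    """
    k = np.arange(num_bins)
    phase = -np.pi * k**2 / frame_len
    phase[0] = 0.0                  # DC bin of a real signal must stay real
    if frame_len % 2 == 0:
        phase[-1] = 0.0             # so must the Nyquist bin for even N
    return phase


def phase_modify_frame(frame, phase_fn=quadratic_phase):
    """Replace the phase spectrum of one frame while keeping its magnitude,
    return to the time domain, and rescale to the original peak amplitude."""
    n = len(frame)
    spectrum = np.fft.rfft(frame)
    new_phase = phase_fn(len(spectrum), n)
    modified = np.fft.irfft(np.abs(spectrum) * np.exp(1j * new_phase), n=n)

    # Amplitude normalization: map the output onto the input's peak amplitude.
    peak_in = np.max(np.abs(frame))
    peak_out = np.max(np.abs(modified))
    if peak_out == 0.0:
        return modified
    return modified * (peak_in / peak_out)


def crest_factor_db(x):
    """Peak-to-RMS ratio in dB."""
    return 20 * np.log10(np.max(np.abs(x)) / np.sqrt(np.mean(x**2)))


def energy_gain_db(original, processed):
    """RMS level change in dB introduced by the processing."""
    return 10 * np.log10(np.mean(processed**2) / np.mean(original**2))


# Toy "voiced" frame: an impulse train standing in for glottal excitation.
frame = np.zeros(512)
frame[::64] = 1.0
out = phase_modify_frame(frame)
print(f"crest factor: {crest_factor_db(frame):.1f} dB -> {crest_factor_db(out):.1f} dB")
print(f"energy gain after normalization: {energy_gain_db(frame, out):.1f} dB")
```

Because the peak is held fixed, the energy gain in dB equals the drop in crest factor. A peak-sharpening variant would, under the same assumptions, plug a different `phase_fn` into the same pipeline. A complete system would also process windowed, overlapping frames and resynthesize them with overlap-add, which this single-frame sketch omits.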