Impact of Vocal Tract Length Normalization on the Speech Recognition Performance of an English Vowel Phoneme Recognizer for the Recognition of Children Voices

International Journal of Computer Trends and Technology (IJCTT)          
 
© 2016 by IJCTT Journal
Volume-39 Number-2
Year of Publication : 2016
Authors : Swapnanil Gogoi, Utpal Bhattacharjee
DOI : 10.14445/22312803/IJCTT-V39P118

MLA

Swapnanil Gogoi, Utpal Bhattacharjee "Impact of Vocal Tract Length Normalization on the Speech Recognition Performance of an English Vowel Phoneme Recognizer for the Recognition of Children Voices". International Journal of Computer Trends and Technology (IJCTT) V39(2):105-109, September 2016. ISSN:2231-2803. www.ijcttjournal.org. Published by Seventh Sense Research Group.

Abstract -
Differences in the vocal tract lengths of human speakers cause inter-speaker acoustic variability: the same text spoken by different speakers produces acoustically different speech signals, and these variations reduce the robustness of a speaker-independent (SI) speech recognition system. Speaker normalization using vocal tract length normalization (VTLN) is an effective approach to reducing the effect of this type of variability in speech signals. In this paper, the impact of VTLN on the recognition performance of an English vowel phoneme recognizer is investigated with both noise-free and noisy speech signals spoken by children. The English vowel phoneme recognizer has been developed using a pattern recognition approach based on the Hidden Markov Model (HMM). The training phase of the automatic speech recognition (ASR) system uses speech signals spoken by adult male and female speakers, while the testing phase uses speech signals spoken by children. In this investigation, it has been observed that the use of VTLN can effectively improve the robustness of the English vowel phoneme recognizer in both noise-free and noisy conditions.
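To make the normalization step concrete, the following is a minimal Python sketch of how VTLN is commonly realized; it is not the authors' implementation. It assumes a two-segment piecewise-linear frequency warping function (break frequency assumed at 80% of the Nyquist frequency, with the Nyquist frequency mapped onto itself) combined with a maximum-likelihood grid search over candidate warping factors, in the spirit of the frequency-warping procedure of Lee and Rose. For brevity, the features are assumed to be formant frequencies scored by a single diagonal Gaussian; a real recognizer would instead recompute MFCCs with a warped mel filterbank and score them against the trained HMMs. The function names, the 0.80 to 1.20 grid with step 0.02, and all model numbers are illustrative assumptions.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple


def warp_frequency(f: float, alpha: float, f_nyq: float, cut_ratio: float = 0.8) -> float:
    """Assumed two-segment piecewise-linear VTLN warp.

    Below the break frequency f0 the axis is scaled by alpha; above it a
    straight line joins (f0, alpha * f0) to (f_nyq, f_nyq), so the band edge
    is preserved and the mapping stays continuous and monotonic.
    """
    f0 = cut_ratio * f_nyq  # assumption: break frequency at 80% of Nyquist
    if not 0.0 < alpha * f0 < f_nyq:
        raise ValueError("warping factor incompatible with break frequency")
    if f <= f0:
        return alpha * f
    slope = (f_nyq - alpha * f0) / (f_nyq - f0)
    return alpha * f0 + slope * (f - f0)


def diag_gaussian_loglik(x: Sequence[float], means: Sequence[float],
                         variances: Sequence[float]) -> float:
    """Log-likelihood of one feature vector under a diagonal-covariance Gaussian."""
    return sum(
        -0.5 * (math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, means, variances)
    )


def alpha_grid(lo: float = 0.80, hi: float = 1.20, step: float = 0.02) -> List[float]:
    """Candidate warping factors (assumed grid); rounding removes float drift."""
    n = int(round((hi - lo) / step))
    return [round(lo + i * step, 4) for i in range(n + 1)]


def select_warp_factor(
    frames: Sequence[Sequence[float]],
    score: Callable[[Sequence[float]], float],
    f_nyq: float,
    alphas: Sequence[float],
) -> Tuple[float, float]:
    """Maximum-likelihood grid search: warp every frame with each candidate
    alpha and keep the alpha whose total log-likelihood is highest."""
    best_alpha: Optional[float] = None
    best_total = -math.inf
    for alpha in alphas:
        total = sum(
            score([warp_frequency(f, alpha, f_nyq) for f in frame]) for frame in frames
        )
        if total > best_total:
            best_alpha, best_total = alpha, total
    if best_alpha is None:
        raise ValueError("empty warping-factor grid")
    return best_alpha, best_total


if __name__ == "__main__":
    # Hypothetical "adult" model of (F1, F2) in Hz: placeholder numbers only.
    means, variances = [700.0, 1200.0], [50.0 ** 2, 100.0 ** 2]
    # Synthetic "child" frames: adult-like values divided by 0.9 (higher formants).
    child = [[700.0 / 0.9, 1200.0 / 0.9], [710.0 / 0.9, 1190.0 / 0.9]]
    alpha, _ = select_warp_factor(
        child, lambda x: diag_gaussian_loglik(x, means, variances), 8000.0, alpha_grid()
    )
    print(f"selected warping factor: {alpha:.2f}")  # prints: selected warping factor: 0.90
```

The grid search is kept deliberately simple: the warping factor is estimated once per speaker (or utterance) by exhaustive search over a small discrete grid, which mirrors the usual practice of choosing the factor that best matches the features of a test speaker, here a child, to models trained on adult speech.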

Keywords
Automatic speech recognition, speaker independent, vocal tract length, vocal tract length normalization, Hidden Markov Model.