Large Vocabulary in Continuous Speech Recognition Using HMM and Normal Fit

International Journal of Computer Trends and Technology (IJCTT)          
 
© 2016 by IJCTT Journal
Volume-42 Number-2
Year of Publication : 2016
Authors : Hemakumar G, Punithavalli M, Thippeswamy K
DOI :  10.14445/22312803/IJCTT-V42P117

MLA

Hemakumar G, Punithavalli M, Thippeswamy K. "Large Vocabulary in Continuous Speech Recognition Using HMM and Normal Fit." International Journal of Computer Trends and Technology (IJCTT) V42(2):102-107, December 2016. ISSN:2231-2803. www.ijcttjournal.org. Published by Seventh Sense Research Group.

Abstract -
This paper addresses large-vocabulary, speaker-independent continuous speech recognition using phonemes, a Hidden Markov Model (HMM) and the normal fit method. First, the voiced parts of the speech signal are detected by computing a dynamic threshold in each frame. Real cepstrum coefficients are then extracted as features from the voiced frames, and the Baum–Welch algorithm is applied to train the model on those features. Next, the normal fit technique is applied and the resulting values are labelled with the corresponding phoneme or syllable. The model is tested on five languages: English, Kannada, Hindi, Tamil and Telugu. Automatic segmentation of the speech signals achieves an average accuracy of 95.42%, with a miss rate of about 4.58%. On the large vocabulary, the average Word Recognition Rate (WRR) is 85.16% and the average Word Error Rate (WER) is 14.84%. All computations are done in MATLAB.
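The front end described above (voiced-frame detection followed by real cepstrum extraction) might look roughly like the following. This is a minimal, dependency-free Python sketch under stated assumptions, not the authors' MATLAB implementation: the frame length (64 samples), hop (32 samples), Hamming window, 13 coefficients and the threshold rule E_min + alpha·(E_max − E_min) are all assumptions, because the abstract does not give them, and the helper names (`frame_signal`, `voiced_mask`, `real_cepstrum`, `extract_features`) are hypothetical. The real cepstrum itself is the standard definition: the inverse DFT of log|DFT(x)|.

```python
import cmath
import math


def frame_signal(signal, frame_len, hop):
    """Split a 1-D signal into overlapping frames; a trailing partial frame is dropped."""
    return [signal[i:i + frame_len]
            for i in range(0, len(signal) - frame_len + 1, hop)]


def short_time_energy(frame):
    """Sum of squared samples in one frame."""
    return sum(s * s for s in frame)


def voiced_mask(frames, alpha=0.1):
    """Flag frames whose energy exceeds a signal-dependent threshold.

    ASSUMPTION: threshold = E_min + alpha * (E_max - E_min) over the utterance.
    The paper's exact dynamic-threshold rule is not given in the abstract.
    """
    energies = [short_time_energy(f) for f in frames]
    if not energies:
        return []
    e_min, e_max = min(energies), max(energies)
    threshold = e_min + alpha * (e_max - e_min)
    return [e > threshold for e in energies]


def dft(x):
    """Naive O(N^2) discrete Fourier transform (dependency-free, slow)."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def real_cepstrum(frame, n_coeffs=13, eps=1e-10):
    """Real cepstrum c[t] = IDFT(log|DFT(x)|)[t], keeping t = 0 .. n_coeffs-1."""
    n = len(frame)
    # Hamming window (ASSUMPTION) to reduce spectral leakage; requires n > 1.
    windowed = [frame[t] * (0.54 - 0.46 * math.cos(2 * math.pi * t / (n - 1)))
                for t in range(n)]
    # eps avoids log(0) for spectral bins with zero magnitude.
    log_mag = [math.log(abs(X) + eps) for X in dft(windowed)]
    # For real input, log|X[k]| is real and even in k, so its inverse DFT is
    # real; .real discards round-off imaginary parts.
    return [sum(log_mag[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n_coeffs)]


def extract_features(signal, frame_len=64, hop=32, n_coeffs=13):
    """Return one real-cepstrum vector per voiced frame (hypothetical front end)."""
    frames = frame_signal(signal, frame_len, hop)
    mask = voiced_mask(frames)
    return [real_cepstrum(f, n_coeffs) for f, voiced in zip(frames, mask) if voiced]
```

For example, a 512-sample signal made of 128 samples of silence, a 256-sample sine burst and 128 more samples of silence gives 15 frames, of which the 9 that overlap the burst are flagged as voiced and turned into 13-coefficient vectors. Those vectors would then feed the Baum–Welch training and normal fit stages, which are not sketched here. The naive DFT keeps the example self-contained; a real system would use an FFT (for example NumPy's `numpy.fft`) instead.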


Keywords
Automatic Speech Recognition (ASR), Speech Enhancement, Speech Perception, HMM and Normal fit method.