An Improvised Word Recognition System using CNN in a Non-Isolated Environment
MLA Style: Neethu Mohan, Arul V H, "An Improvised Word Recognition System using CNN in a Non-Isolated Environment", International Journal of Computer Trends and Technology 67.4 (2019): 76-78.
APA Style: Neethu Mohan, Arul V H (2019). An Improvised Word Recognition System using CNN in a Non-Isolated Environment. International Journal of Computer Trends and Technology, 67(4), 76-78.
Abstract
This paper focuses on developing a word recognition system using a CNN architecture. Several advancements in Automatic Speech Recognition (ASR) technology have made it easier for machines to understand natural language. The main constraint arises from the nature of the input speech signal, which makes it difficult to retain the original information. The noisy speech signal is first passed through a pre-processing stage and converted into a spectrogram. These spectrograms are then fed to the layers of the CNN, which extract the features used to train the model. In the testing phase, the feature vectors are cross-matched, and the class with the maximum weighted value at the fully connected layers determines the output. The system performs with an accuracy of 88.20% in a non-isolated environment.
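A minimal sketch of the spectrogram-to-CNN pipeline described in the abstract is given below. The 8 kHz sample rate, window sizes, layer widths, and the ten-word vocabulary are illustrative assumptions, not the authors' exact configuration; the fully connected softmax layer at the end plays the role of selecting the maximally weighted word class.

```python
# Hypothetical sketch of the abstract's pipeline: noisy waveform -> spectrogram -> CNN -> word scores.
import numpy as np
from scipy.signal import spectrogram
from tensorflow.keras import layers, models

def to_spectrogram(waveform, fs=8000):
    """Convert a 1-D speech waveform into a log-magnitude spectrogram."""
    _, _, sxx = spectrogram(waveform, fs=fs, nperseg=256, noverlap=128)
    return np.log(sxx + 1e-10)  # log-compress to stabilize the dynamic range

def build_model(input_shape, num_words=10):
    """Small CNN: convolutional layers extract features; the final
    fully connected softmax layer scores each candidate word."""
    return models.Sequential([
        layers.Input(shape=input_shape),
        layers.Conv2D(16, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(num_words, activation="softmax"),
    ])

# Example: one second of speech at 8 kHz (random placeholder data).
wave = np.random.randn(8000).astype(np.float32)
spec = to_spectrogram(wave)[..., np.newaxis]    # add a channel axis for Conv2D
model = build_model(spec.shape)
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
scores = model.predict(spec[np.newaxis, ...])   # highest score = predicted word
```

In practice the model would be trained on labelled, noise-corrupted word utterances before prediction; the snippet only illustrates the data flow from spectrogram to the fully connected output.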
Reference
[1] Ossama Abdel-Hamid, Abdel-rahman Mohamed, Hui Jiang, Li Deng, Gerald Penn, and Dong Yu, "Convolutional Neural Networks for Speech Recognition", IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 22, no. 10, October 2014.
[2] Niko Moritz, Jörn Anemüller, Birger Kollmeier, "Amplitude Modulation Spectrogram Based Features for Robust Speech Recognition in Noisy and Reverberant Environments", IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), May 2011.
[3] Jui-Ting Huang, Jinyu Li, and Yifan Gong, “An Analysis of Convolutional Neural Networks for Speech Recognition”, Microsoft Corporation, One Microsoft Way, Redmond, WA 98052.
[4] Ossama Abdel-Hamid, Abdel-rahman Mohamed, Hui Jiang, Li Deng, Gerald Penn, and Dong Yu, "Convolutional Neural Networks for Speech Recognition", IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 22, no. 10, October 2014.
[5] Osisanwo F.Y, Akinsola J.E.T, Awodele O, Hinmikaiye J.O, Olakanmi O, Akinjobi J, "Supervised Machine Learning Algorithms: Classification and Comparison", International Journal of Computer Trends and Technology (IJCTT), Volume 48, Number 3, June 2017.
[6] https://www.quora.com/How-is-convolutional-neural-network-algorithm-better-as-compared-to-other-imageclassification-algorithms.
[7] Geoffrey Hinton, Li Deng, Dong Yu, George Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Brian Kingsbury, Tara Sainath, "Deep Neural Networks for Acoustic Modeling in Speech Recognition", IEEE Signal Processing Magazine, vol. 29, pp. 82-97, November 2012.
[8] Harsh Pokarana, “Explanation of Convolutional Neural Network”, IIT Kanpur.
[9] Tao Wang, David J. Wu, Adam Coates, Andrew Y. Ng, "End-to-End Text Recognition with Convolutional Neural Networks", Stanford University, 353 Serra Mall, Stanford, CA 94305.
Keywords
ASR, CNN, spectrogram