An Improvised Word Recognition System by Hybridizing CNN and SIFT

International Journal of Computer Trends and Technology (IJCTT)          
© 2019 by IJCTT Journal
Volume-67 Issue-4
Year of Publication : 2019
Authors : Neethu Mohan, Arul V H
DOI :  10.14445/22312803/IJCTT-V67I4P109


MLA Style: Neethu Mohan, Arul V H "An Improvised Word Recognition System by Hybridizing CNN and SIFT" International Journal of Computer Trends and Technology 67.4 (2019): 40-43.

APA Style: Neethu Mohan, Arul V H (2019). An Improvised Word Recognition System by Hybridizing CNN and SIFT. International Journal of Computer Trends and Technology, 67(4), 40-43.

This paper focuses on developing an efficient word recognition system by combining the strong invariance properties of SIFT with a CNN architecture. Several advances in Automatic Speech Recognition (ASR) technology have made it easier for machines to understand natural language. The main constraint arises from the nature of the input speech signal, which makes it difficult to retain the original information. This can be overcome by hybridizing the Scale Invariant Feature Transform (SIFT) with a Convolutional Neural Network (CNN) architecture. The noisy speech signal is first passed through a pre-processing stage and converted to a spectrogram for feature extraction. The extracted features are then fed to the layers of the CNN to train the model. In the testing phase, the feature vectors are cross-matched, and the maximum close-weighted value from the fully connected layers determines the output. The system achieves an efficiency of 94.78% in a non-isolated environment.
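The front end of the pipeline described above — converting a (possibly noisy) speech signal into a spectrogram image on which SIFT-style keypoint features can then be computed — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the frame length, hop size, and synthetic test signal are assumptions, and in practice the resulting image would be passed to a SIFT detector (e.g. OpenCV's `cv2.SIFT_create`) before CNN training.

```python
import numpy as np

def log_spectrogram(signal, frame_len=256, hop=128):
    """Frame the signal, apply a Hann window, and take the
    log-magnitude of the FFT of each frame, yielding a 2-D
    (frequency x time) image suitable for keypoint extraction."""
    window = np.hanning(frame_len)
    n_frames = 1 + (len(signal) - frame_len) // hop
    frames = np.stack([signal[i * hop : i * hop + frame_len] * window
                       for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1))  # shape: (frames, freq bins)
    return np.log1p(mag).T                     # shape: (freq bins, frames)

# Toy one-second "speech" signal at 8 kHz: two tones plus additive noise,
# standing in for the noisy input discussed in the abstract.
fs = 8000
t = np.arange(fs) / fs
x = (np.sin(2 * np.pi * 440 * t)
     + 0.5 * np.sin(2 * np.pi * 880 * t)
     + 0.1 * np.random.randn(fs))

spec = log_spectrogram(x)
print(spec.shape)  # (129, 61): 129 frequency bins x 61 time frames
```

The log compression (`log1p`) mimics the dynamic-range reduction typical of spectrogram images; the SIFT stage would then detect scale-invariant keypoints on this image, and their descriptors would feed the CNN's input layers.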

[1] Ossama Abdel-Hamid, Abdel-rahman Mohamed, Hui Jiang, Li Deng, Gerald Penn, and Dong Yu, “Convolutional Neural Networks for Speech Recognition”, IEEE/ACM Transactions on Audio, Speech, and Language Processing, vol. 22, no. 10, October 2014.
[3] Quang Trung Nguyen, The Duy Bui, “Speech classification using SIFT features on spectrogram images”, Vietnam Journal of Computer Science, 3(4), 247-257.
[5] Osisanwo F.Y, Akinsola J.E.T, Awodele O, Hinmikaiye J.O, Olakanmi O, Akinjobi J, “Supervised Machine Learning Algorithms: Classification and Comparison”, International Journal of Computer Trends and Technology (IJCTT), vol. 48, no. 3, June 2017.
[6] Geoffrey Hinton, Li Deng, Dong Yu, George Dahl, Abdel-rahman Mohamed, Navdeep Jaitly, Andrew Senior, Vincent Vanhoucke, Patrick Nguyen, Brian Kingsbury, Tara Sainath, “Deep Neural Networks for Acoustic Modeling in Speech Recognition”, IEEE Signal Processing Magazine, vol. 29, pp. 82-97, November 2012.
[8] Xiu Zhang, Bilei Zhu, Linwei Li, Wei Li, Xiaoqiang Li, Wei Wang, Peizhong Lu, and Wenqiang Zhang, “SIFT-based local spectrogram image descriptor: a novel feature for robust music identification”, EURASIP Journal on Audio, Speech, and Music Processing, 2015:6, 2015.
[9] Jui-Ting Huang, Jinyu Li, and Yifan Gong, “An Analysis of Convolutional Neural Networks for Speech Recognition”, Microsoft Corporation, One Microsoft Way, Redmond, WA 98052.
[10] Tomoaki Yamazaki, Tetsuya Fujikawa, Jiro Katto, “Improving the performance of SIFT using bilateral filter and its application to generic object recognition”, 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP 2012), Kyoto.
[11] Harsh Pokarana, “Explanation of Convolutional Neural Network”, IIT Kanpur.
[12] Tao Wang, David J. Wu, Adam Coates, Andrew Y. Ng, “End-to-End Text Recognition with Convolutional Neural Networks”, Stanford University, 353 Serra Mall, Stanford, CA 94305.

Keywords : ASR, SIFT, CNN, spectrogram