Low-Resource Constraints for Speech Recognition using HDNN

S. Kousar Bhanu

Research Article | Open Access

Volume 61 | Number 1 | Year 2018 | Article Id. IJCTT-V61P113 | DOI: https://doi.org/10.14445/22312803/IJCTT-V61P113

Citation:

S. Kousar Bhanu, "Low-Resource Constraints for Speech Recognition using HDNN," International Journal of Computer Trends and Technology (IJCTT), vol. 61, no. 1, pp. 70-73, 2018. Crossref, https://doi.org/10.14445/22312803/IJCTT-V61P113

Abstract

In speech recognition, an acoustic model represents the relationship between an audio signal and the units that make up speech. In many approaches, neural networks provide attractive acoustic models, for example through speaker adaptation, in which the model is adapted to the acoustic features of a speaker. This study examines the Highway Deep Neural Network (HDNN) for speech recognition, which contains two gate units connected to all the hidden layers to control the flow of information through the highway network. These gate units are shared across all the hidden layers to reduce the number of model parameters, and all the model parameters are updated in sequence training to improve the results. In this paper, the gate functions of the HDNN are implemented using a Stacked Autoencoder (SAE), a layer-wise approach to training deep neural networks, based on analysis of a speech repository. The autoencoders learn a low-dimensional representation of the model parameters obtained from Direct Voice Input (DVI). DVI is intended for voice command-and-control and is used to read the parameters from the speech utterances of each speaker.
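To make the gate sharing described in the abstract concrete, the following is a minimal sketch of a highway forward pass in which one transform gate and one carry gate are reused by every hidden layer. It follows the general highway-network formulation and is not the paper's implementation: the function names (`init_hdnn`, `hdnn_forward`, `gate_param_count`), the sigmoid activations, the equal layer widths, and the random initialisation are all assumptions made for illustration, and neither SAE training nor sequence training is shown.

```python
import math
import random


def sigmoid(v):
    """Element-wise logistic function over a list of floats."""
    return [1.0 / (1.0 + math.exp(-x)) for x in v]


def affine(W, b, h):
    """Compute W @ h + b for a list-of-rows matrix W."""
    return [sum(w * x for w, x in zip(row, h)) + bi for row, bi in zip(W, b)]


def rand_matrix(rng, dim):
    """Square matrix with small Gaussian entries (illustrative init only)."""
    scale = 1.0 / math.sqrt(dim)
    return [[rng.gauss(0.0, scale) for _ in range(dim)] for _ in range(dim)]


def init_hdnn(n_layers, dim, seed=0):
    """Hypothetical parameter layout: per-layer weights plus ONE shared
    transform gate and ONE shared carry gate for the whole network."""
    rng = random.Random(seed)
    return {
        "layers": [(rand_matrix(rng, dim), [0.0] * dim) for _ in range(n_layers)],
        "transform": (rand_matrix(rng, dim), [0.0] * dim),
        "carry": (rand_matrix(rng, dim), [0.0] * dim),
    }


def hdnn_forward(params, x):
    """Each layer mixes its own non-linear output with its input,
    weighted by the two shared gates."""
    h = list(x)
    for W, b in params["layers"]:
        t = sigmoid(affine(*params["transform"], h))  # shared transform gate
        c = sigmoid(affine(*params["carry"], h))      # shared carry gate
        z = sigmoid(affine(W, b, h))                  # layer-specific output
        h = [zi * ti + hi * ci for zi, ti, hi, ci in zip(z, t, h, c)]
    return h


def gate_param_count(params):
    """Number of gate parameters; independent of network depth."""
    total = 0
    for W, b in (params["transform"], params["carry"]):
        total += sum(len(row) for row in W) + len(b)
    return total
```

Under these assumptions, `gate_param_count(init_hdnn(2, 4))` and `gate_param_count(init_hdnn(8, 4))` return the same value, which illustrates why sharing the gates keeps the model small as hidden layers are added.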

Keywords

Stacked Auto-Encoders, Automatic Speech Recognition, Speech Recognition, HDNN, Acoustic Models
