Key Word searching in Speech using QbE and RNN

International Journal of Computer Trends and Technology (IJCTT)          
 
© 2019 by IJCTT Journal
Volume-67 Issue-9
Year of Publication: 2019
Authors: M. Mamatha
DOI: 10.14445/22312803/IJCTT-V67I9P109

MLA Style: M. Mamatha. "Key Word searching in Speech using QbE and RNN." International Journal of Computer Trends and Technology 67.9 (2019): 50-54.

APA Style: M. Mamatha. (2019). Key Word searching in Speech using QbE and RNN. International Journal of Computer Trends and Technology, 67(9), 50-54.

Abstract
Modeling textual queries as sequences of embeddings, so that search can be carried out by similarity matching within speech features, has recently been shown to improve keyword search (KWS) performance, especially for out-of-vocabulary (OOV) terms. This approach uses a dynamic time warping (DTW) based search, converting the KWS problem into a pattern search problem by artificially modeling the text queries as pronunciation-based embedding sequences. The query model is built by concatenating and repeating frame representations for each phoneme in the keyword's pronunciation. In this letter, we propose a query model that incorporates temporal context information using recurrent neural networks (RNNs) trained to generate realistic query representations. In experiments on the Turkish and Zulu datasets of the IARPA Babel program, we show that the proposed RNN-based query generation yields significant improvements over the statistical query models of prior work and performs comparably to state-of-the-art techniques for OOV KWS.
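The search procedure described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: it builds the baseline "statistical" query by repeating a per-phoneme mean posterior vector for a fixed number of frames (`frames_per_phone` is a hypothetical fixed duration), then scores the query against an utterance's posteriorgram with subsequence DTW, so the keyword may start and end anywhere in the utterance. The phoneme table, the toy 3-class posteriors, and the `-log` inner-product frame distance (a common choice for posteriorgram matching) are all assumptions. The proposed RNN generator, which would replace `build_query` with learned, context-dependent frames, is not shown.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]


def build_query(pronunciation: Sequence[str],
                phone_means: Dict[str, Vector],
                frames_per_phone: int = 3) -> List[Vector]:
    """Baseline query model (assumed form): concatenate each phoneme's
    mean frame representation, repeated a fixed, hypothetical number of times."""
    query: List[Vector] = []
    for phone in pronunciation:
        query.extend([phone_means[phone]] * frames_per_phone)
    return query


def frame_distance(q: Vector, x: Vector, eps: float = 1e-10) -> float:
    """-log of the inner product of two posterior vectors (assumed distance)."""
    dot = sum(a * b for a, b in zip(q, x))
    return -math.log(max(dot, eps))


def subsequence_dtw(query: List[Vector],
                    utterance: List[Vector]) -> Tuple[float, int]:
    """Return (length-normalised cost, end frame) of the best match of the
    whole query against any contiguous region of the utterance."""
    n, m = len(query), len(utterance)
    D = [[float("inf")] * m for _ in range(n)]
    # Free start: the first query frame may align with any utterance frame.
    for j in range(m):
        D[0][j] = frame_distance(query[0], utterance[j])
    for i in range(1, n):
        for j in range(m):
            best_prev = D[i - 1][j]                      # vertical step
            if j > 0:
                best_prev = min(best_prev,
                                D[i][j - 1],             # horizontal step
                                D[i - 1][j - 1])         # diagonal step
            D[i][j] = frame_distance(query[i], utterance[j]) + best_prev
    # Free end: take the cheapest cell in the last query row.
    end = min(range(m), key=lambda j: D[n - 1][j])
    return D[n - 1][end] / n, end


if __name__ == "__main__":
    # Toy posteriorgram over three hypothetical classes: a, b, silence.
    phone_means = {"a": [0.8, 0.1, 0.1], "b": [0.1, 0.8, 0.1]}
    sil = [0.1, 0.1, 0.8]
    query = build_query(["a", "b"], phone_means, frames_per_phone=2)
    utterance = [sil, sil] + [phone_means["a"]] * 3 + [phone_means["b"]] * 2 + [sil]
    print(subsequence_dtw(query, utterance))
```

A lower score indicates a better match. In a full KWS system, per-utterance scores like these would still need thresholding or score normalization before being reported as detections; dividing by query length is only one simple way to keep scores comparable across keywords of different lengths.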


Keywords
keyword search, out-of-vocabulary terms, query modeling, recurrent neural networks