An Efficient Speaker Diarization using Privacy Preserving Audio Features Based on Speech/Non Speech Detection

 
International Journal of Computer Trends and Technology (IJCTT)          
 
© 2014 by IJCTT Journal
Volume-9 Number-4                          
Year of Publication : 2014
Authors: S. Sathyapriya, A. Indhumathi
DOI: 10.14445/22312803/IJCTT-V9P136

MLA

S. Sathyapriya and A. Indhumathi. "An Efficient Speaker Diarization using Privacy Preserving Audio Features Based on Speech/Non Speech Detection." International Journal of Computer Trends and Technology (IJCTT) V9(4):184-187, March 2014. ISSN: 2231-2803. www.ijcttjournal.org. Published by Seventh Sense Research Group.

Abstract -
Privacy-sensitive audio features for speaker diarization in multiparty conversations, i.e., sets of audio features that carry little linguistic information, remain a challenging research topic for both single and multiple distant microphone scenarios. The existing system derives privacy-sensitive audio features within a supervised framework based on a deep neural architecture. In the proposed system, patterns from speech/nonspeech detection (SND) are exploited as privacy-sensitive audio features for capturing real-world audio; SND and diarization can then be used to analyze social interactions. Instead of recording and storing raw audio, this work investigates privacy-preserving audio features that respect privacy by minimizing the amount of linguistic information retained, while still achieving state-of-the-art performance in conversational speech processing tasks. Indeed, the main contribution of the proposed system is the achievement of state-of-the-art performance in speech/nonspeech detection and speaker diarization using such features, which we refer to as privacy-sensitive. In addition, a comprehensive analysis of these features is provided for the two tasks in a variety of conditions, such as (predominantly) indoor and outdoor audio. To objectively evaluate the notion of privacy, the proposed system uses automatic speech recognition tests, with higher recognition accuracy interpreted as lower privacy.
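The keywords list the LP residual among the features considered. As a minimal Python sketch of one way such a low-linguistic-content representation can be computed (not the authors' implementation; the frame length, LPC order, and function names are illustrative assumptions), each frame is inverse-filtered with its own LPC coefficients and only the log energy of the residual is kept:

import numpy as np
from scipy.linalg import solve_toeplitz
from scipy.signal import lfilter

def lpc_coefficients(frame, order=12):
    # Autocorrelation method: solve the Toeplitz normal equations R a = r.
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    r[0] += 1e-8                        # regularize to avoid a singular system
    a = solve_toeplitz(r[:order], r[1:order + 1])
    return np.concatenate(([1.0], -a))  # inverse filter A(z) = 1 - sum a_k z^-k

def residual_energy_features(signal, sr, frame_ms=32, order=12):
    # Per-frame log energy of the LP residual; the spectral envelope that
    # carries most of the linguistic (formant) content is removed by the
    # inverse filter, leaving mainly excitation information.
    frame_len = int(sr * frame_ms / 1000)
    window = np.hamming(frame_len)
    feats = []
    for start in range(0, len(signal) - frame_len, frame_len):
        frame = signal[start:start + frame_len] * window
        residual = lfilter(lpc_coefficients(frame, order), [1.0], frame)
        feats.append(np.log(np.sum(residual ** 2) + 1e-10))
    return np.array(feats)

Frame-level features of this kind could then be passed to a speech/nonspeech detector and a diarization back-end in place of the raw recordings, so that no intelligible speech needs to be stored.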


Keywords
Speech/Nonspeech Detection, Diarization, Privacy-sensitive features, deep neural networks, LP residual.