Speaker Diarization

International Journal of Computer Trends and Technology (IJCTT)          
 
© 2019 by IJCTT Journal
Volume-67 Issue-9
Year of Publication : 2019
Authors : Ms. Apoorva Iyer, Ms. Deepika Kini, Mrs. Shanthi Therese
DOI : 10.14445/22312803/IJCTT-V67I9P110

MLA Style: Ms. Apoorva Iyer, Ms. Deepika Kini, Mrs. Shanthi Therese. "Speaker Diarization." International Journal of Computer Trends and Technology 67.9 (2019): 50-54.

APA Style: Ms. Apoorva Iyer, Ms. Deepika Kini, Mrs. Shanthi Therese. Speaker Diarization. International Journal of Computer Trends and Technology, 67(9), 50-54.

Abstract
Speaker diarization is the task of determining ‘who spoke when?’. It uses unsupervised as well as supervised approaches to detect changes of speaker in the temporal dimension. This paper primarily describes the implementation of speaker diarization using neural networks (a supervised method). First, a summary of the clustering algorithms is given. Then three approaches using neural networks are described: speaker diarization using Artificial Neural Networks, Recurrent Neural Networks, and Adaptive Long Short-Term Memory (or multiple LSTMs). Finally, the accuracy of each approach is calculated and the results are compared.
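
To make the ‘who spoke when?’ output concrete, the following is a minimal sketch of how per-frame speaker labels might be collapsed into speaker turns and scored against a reference. It is not the authors' implementation. The 10 ms frame step, the function names, and the toy label sequences are all assumptions for illustration, and frame-level accuracy is only one plausible reading of the accuracy measure mentioned above. It also assumes predicted and reference labels already use the same speaker names.

```python
from itertools import groupby

FRAME_MS = 10  # assumed frame step in milliseconds; not taken from the paper


def frames_to_segments(labels, frame_ms=FRAME_MS):
    """Collapse per-frame speaker labels into (start_ms, end_ms, speaker) turns."""
    segments = []
    start = 0  # index of the first frame of the current run
    for speaker, run in groupby(labels):
        length = sum(1 for _ in run)
        segments.append((start * frame_ms, (start + length) * frame_ms, speaker))
        start += length
    return segments


def frame_accuracy(predicted, reference):
    """Fraction of frames whose predicted speaker matches the reference."""
    if len(predicted) != len(reference):
        raise ValueError("label sequences must have the same length")
    if not reference:
        return 0.0
    hits = sum(p == r for p, r in zip(predicted, reference))
    return hits / len(reference)


# Hypothetical toy data: six frames, two speakers.
reference = ["A", "A", "A", "B", "B", "A"]
predicted = ["A", "A", "B", "B", "B", "A"]

segments = frames_to_segments(predicted)
accuracy = frame_accuracy(predicted, reference)
```

Here `segments` lists each predicted speaker turn with its start and end time, and `accuracy` counts the share of frames attributed to the correct speaker. For an unsupervised system, whose cluster labels are arbitrary, the predicted labels would first need to be mapped onto the reference speakers.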

Keywords
Artificial Neural Network, Recurrent Neural Networks, LSTM, MFCC