Environmental Audio Tagging: Trends and Techniques

 
International Journal of Computer Trends and Technology (IJCTT)          
 
© 2018 by IJCTT Journal
Volume-62 Number-1
Year of Publication: 2018
Authors: Dr. Jayasudha J. S., Mrs. Sangeetha M. S.
DOI: 10.14445/22312803/IJCTT-V62P101


MLA Style: Dr. Jayasudha J. S., Mrs. Sangeetha M. S. "Environmental Audio Tagging: Trends and Techniques." International Journal of Computer Trends and Technology 62.1 (2018): 1-13.

APA Style: Dr. Jayasudha J. S., Mrs. Sangeetha M. S. (2018). Environmental Audio Tagging: Trends and Techniques. International Journal of Computer Trends and Technology, 62(1), 1-13.

Abstract
Real-life environments contain many kinds of sounds other than speech and music. These sounds carry information about our everyday surroundings, and each has its own characteristic features. To categorize different kinds of sounds and study them separately, tagging has been introduced into the area of sound analysis. Environmental audio tagging predicts the presence or absence of certain acoustic events in an acoustic scene of interest, and it forms the backbone of sound recognition and classification work. The task consists of extracting relevant features from the input audio and using those features to identify the set of classes into which the sound most likely fits. Existing work on this task largely uses conventional classifiers, which lack the feature abstraction found in deeper models. A deep learning framework is used here for unsupervised feature learning and classification.
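To make the pipeline described in the abstract concrete, the sketch below extracts a clip-level log-mel feature vector and feeds it to a small network with one sigmoid output per tag, the usual setup for multi-label audio tagging. It is only an illustrative sketch under assumed settings: the librosa feature parameters, the PyTorch network shape, and the tag list are assumptions for illustration, not the system evaluated in the paper.

# Minimal multi-label audio tagging sketch: log-mel features + small neural
# network with sigmoid outputs.  Feature settings, network shape, and the tag
# set below are illustrative assumptions, not the authors' model.
import librosa
import torch
import torch.nn as nn

TAGS = ["child speech", "car", "television", "percussion"]  # hypothetical tag set

def extract_features(path, sr=16000, n_mels=64):
    """Load an audio clip and return one fixed-length log-mel feature vector."""
    y, sr = librosa.load(path, sr=sr, mono=True)
    mel = librosa.feature.melspectrogram(y=y, sr=sr, n_mels=n_mels)
    log_mel = librosa.power_to_db(mel)            # shape: (n_mels, frames)
    # Average over time so every clip maps to a vector of length n_mels.
    return torch.tensor(log_mel.mean(axis=1), dtype=torch.float32)

# Small fully connected network; each output is an independent tag logit.
model = nn.Sequential(
    nn.Linear(64, 128), nn.ReLU(),
    nn.Linear(128, len(TAGS)),
)
criterion = nn.BCEWithLogitsLoss()  # standard training loss for multi-label targets

def predict(path, threshold=0.5):
    """Return the tags whose predicted presence probability exceeds the threshold."""
    with torch.no_grad():
        probs = torch.sigmoid(model(extract_features(path)))
    return [tag for tag, p in zip(TAGS, probs.tolist()) if p > threshold]

# Example call (weights are untrained here, so predictions are meaningless
# until the model is fitted on labelled or weakly labelled clips):
# print(predict("street_recording.wav"))

The sigmoid-per-tag output is what distinguishes tagging from single-label scene classification: several acoustic events may be present in the same clip, so each tag is scored independently rather than competing in a softmax.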


Keywords
deep learning, environmental audio tagging, unsupervised feature learning, multi-label classification.