Hinglish Profanity Filter and Hate Speech Detection

Nirali Arora; Aartem Singh; Laik Shaikh; Mawrah Khan; Yash Devadiga

doi:https://doi.org/10.14445/22312803/IJCTT-V71I2P101

Research Article | Open Access

Volume 71 | Issue 2 | Year 2023 | Article Id. IJCTT-V71I2P101 | DOI : https://doi.org/10.14445/22312803/IJCTT-V71I2P101

Received	Revised	Accepted	Published
15 Dec 2022	19 Jan 2023	01 Feb 2023	11 Feb 2023

Citation :

Nirali Arora, Aartem Singh, Laik Shaikh, Mawrah Khan, Yash Devadiga, "Hinglish Profanity Filter and Hate Speech Detection," International Journal of Computer Trends and Technology (IJCTT), vol. 71, no. 2, pp. 1-7, 2023. Crossref, https://doi.org/10.14445/22312803/IJCTT-V71I2P101

Abstract

Freedom of speech is highly valued on the Internet, yet it is also frequently abused there. Social media applications have become a necessity rather than a luxury. Many children and young teenagers are introduced to this content at a tender age and are prone to verbal abuse or exposed to illegitimate content or deadlines. There are no constraints or regulations to prevent the flow of hateful and violent content; this nature of the Internet inevitably gives rise to social problems such as cyberbullying and cybercrime, which can affect the minds of children and young teenagers. A profanity filter censors such content, while a hate filter recognizes hate speech and blocks hateful material, making an application suitable for kids [2]. This paper proposes a hate speech detector together with a profanity filter algorithm. One simulation finding shows that when profanity is treated as noise input in sentiment classification of review data, accuracy decreases by roughly 2% [10].

Keywords

Censorship, Corpus, Filtering, Profanity filtering, Tokenization.
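
As a rough illustration of the tokenization and corpus-based filtering named in the abstract and keywords, the following is a minimal sketch of a word-list profanity filter. It is not the authors' algorithm: the `BLOCKLIST` entries are harmless placeholders, and the function names `censor` and `contains_profanity` are assumptions made for this example. A real Hinglish filter would load a curated corpus and would also need to handle spelling variants.

```python
import re

# Hypothetical placeholder entries standing in for a curated Hinglish/English
# profanity corpus. These words are illustrative and not taken from the paper.
BLOCKLIST = frozenset({"badword", "gandaword"})

# Split text into alternating runs of word and non-word characters so that
# joining the tokens reproduces the original spacing and punctuation.
TOKEN_RE = re.compile(r"\w+|\W+")


def censor(text, blocklist=BLOCKLIST, mask="*"):
    """Return text with each blocklisted word replaced by mask characters."""
    out = []
    for token in TOKEN_RE.findall(text):
        if token.lower() in blocklist:
            out.append(mask * len(token))  # keep the original word length
        else:
            out.append(token)
    return "".join(out)


def contains_profanity(text, blocklist=BLOCKLIST):
    """Return True if any word token of text is in the blocklist."""
    return any(tok.lower() in blocklist for tok in re.findall(r"\w+", text))


if __name__ == "__main__":
    print(censor("Yeh BadWord hai!"))
```

Exact matching after lowercasing is the simplest design choice; it will miss obfuscated or newly coined spellings, which is the problem that approximate string matching approaches aim to address.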

References

[1] Elisabeth Métais et al., “Natural Language Processing and Information Systems,” 26th International Conference on Applications of Natural Language to Information Systems, vol. 12801, 2021.
[2] “Profanity Filters: Everything You Need to Know + Our Top 5 Picks,” 2021. [Online]. Available: https://vpnoverview.com/internet-safety/kids-online/profanity-filters/
[3] A. D. Moore, “Python GUI Programming with Tkinter,” 2021.
[4] Sanjana Kumar, Srikrishna Veturi, and Varun Sreedhar, “Profanity Filter and Safe Chat Application using Deep Learning,” International Research Journal of Engineering and Technology, vol. 8, no. 7, 2021.
[5] MoungHo Yi et al., “Method of Profanity Detection Using Word Embedding and LSTM,” Mobile Information Systems, vol. 2021, pp. 1-9, 2021. Crossref, https://doi.org/10.1155/2021/6654029
[6] Nur Chamidah, and Reiza Sahawaly, “Comparison Support Vector Machine and Naive Bayes Methods for Classifying Cyberbullying in Twitter,” Jurnal Ilmiah Teknik Elektro Komputer dan Informatika, vol. 7, no. 2, pp. 338. Crossref, https://doi.org/10.26555/jiteki.v7i2.21175
[7] Sean MacAvaney et al., “Hate Speech Detection: Challenges and Solutions,” PLOS ONE, 2019. Crossref, https://doi.org/10.1371/journal.pone.0221152
[8] F. Razali et al., “Implementation of Anti-Profanity Words in Mobile Application Platform,” International Colloquium on Computational & Experimental Mechanics, vol. 1062. Crossref, https://doi.org/10.1088/1757-899X/1062/1/012026
[9] Raktim Chatterjee, Sukanya Bhattacharya, and Soumyajeet Kabi, “Profanity Detection in Social Media Text using a Hybrid Approach of NLP and Machine Learning,” International Journal of Advance Research, Ideas and Innovations in Technology, vol. 7, no. 1, 2021.
[10] Cheong-Ghil Kim, Young-Jun Hwang, and Chayapol Kamyod, “A Study of Profanity Effect in Sentiment Analysis on Natural Language Processing Using ANN,” Journal of Web Engineering, vol. 21, no. 3, 2022. Crossref, https://doi.org/10.13052/jwe1540-9589.2139
[11] Taijin Yoon, Sun-Young Park, and Hwan-Gue Cho, “A Smart Filtering System for Newly Coined Profanities by Using Approximate String Alignment,” 10th IEEE International Conference on Computer and Information Technology, pp. 643-650, 2010. Crossref, https://doi.org/10.1109/CIT.2010.129
[12] Abdulrehman A. Mohamed, George O. Okeyo, and Michael W. Kimwele, “Literature Survey: Data-driven Approach for Selection of an Ensemble Model of Profane Words Detection in Social Media,” International Journal of Scientific & Engineering Research, vol. 9, no. 10, 2018.
[13] Zeerak Waseem, and Dirk Hovy, “Hateful Symbols or Hateful People? Predictive Features for Hate Speech Detection on Twitter,” In Proceedings of the NAACL Student Research Workshop, Association for Computational Linguistics, pp. 88–93, 2016. Crossref, https://doi.org/10.18653/v1/N16-2013
[14] Sourya Dipta Das, Soumil Mandal, and Dipankar Das, “Language Identification of Bengali-English Code-Mixed Data Using Character & Phonetic Based LSTM Models,” In Proceedings of the 11th Forum for Information Retrieval Evaluation, pp. 60–64, 2019. Crossref, https://doi.org/10.1145/3368567.3368578
[15] Shervin Malmasi, and Marcos Zampieri, “Challenges in Discriminating Profanity from Hate Speech,” Journal of Experimental & Theoretical Artificial Intelligence, vol. 30, no. 2, pp. 187–202, 2018. Crossref, https://doi.org/10.1080/0952813X.2017.1409284
[16] Prashanth Kannadaguli, and Vidya Bhat, "Phoneme Modeling for Speech Recognition in Kannada using Multivariate Bayesian Classifier," SSRG International Journal of Electronics and Communication Engineering, vol. 1, no. 9, pp. 1-4, 2014. Crossref, https://doi.org/10.14445/23488549/IJECE-V1I9P101
[17] Sara Sood, Judd Antin, and Elizabeth F. Churchill, "Profanity use in Online Communities," Conference on Human Factors in Computing Systems - Proceedings, pp. 1481-1490, 2012. Crossref, https://doi.org/10.1145/2207676.2208610
[18] Geetika Gautam, and Divakar Yadav, “Sentiment Analysis of Twitter Data Using Machine Learning Approaches and Semantic Analysis,” Seventh International Conference on Contemporary Computing, pp. 437-442, 2014. Crossref, https://doi.org/10.1109/IC3.2014.6897213
[19] Hate Speech - ABA Legal Fact Check - American Bar Association, [Online]. Available: https://abalegalfactcheck.com/articles/hate-speech.html.
[20] What are Profanity Filters? How to Implement Them? [Online]. Available: https://caseguard.com/articles/what-are-profanity-filters/
[21] NoSwearing.com. Noswearing.com - List of Swear Words, Bad Words, & Curse Words. 2019. [Online]. Available: https://www.noswearing.com/dictionary
[22] Ekaterina Chernyak, “Comparison of String Similarity Measures for Obscenity Filtering”, aclanthology, vol. 04 no.06, 4 April 2017.
[23] Tobias Renwick, and Denilson Barbosa, “Detection and Identification of Obfuscated Obscene Language with Character Level Transformers,” The 34th Canadian Conference on Artificial Intelligence, pp. 1–8, 2021. [Online]. Available: https://caiac.pubpub.org/pub/5uqi2h7k/
[24] Pushkar Mishra, “Author Profiling for Abuse Detection,” 27th International Conference on Computational Linguistics, pp. 1088–1098, 2018. [Online]. Available: https://aclanthology.org/C18-1093
[25] Yi Chang et al., “Abusive Language Detection in Online User Content,” 25th International Conference on World Wide Web, pp. 145–153, 2016. Crossref, https://doi.org/10.1145/2872427.2883062
[26] Sood S O, Antin J and Churchill E 2012 Conference on Human Factors in Computing Systems ACM 978-1-4503-1015
[27] Abdulrehman A Mohamed, Dr George O Okeyo and Dr Michael W Kimwele 2018 International Journal of Scientific & Engineering Research 9 (10) 2229-5518
[28] A. Abitha, and K. Lincy, "A Faster RCNN Based Image Text Detection and Text to Speech Conversion," SSRG International Journal of Electronics and Communication Engineering, vol. 5, no. 5, pp. 11-14, 2018. Crossref, https://doi.org/10.14445/23488549/IJECE-V5I5P103
[29] Kate Knibbs, “Curses! People swear a lot on Twitter, and here are the most popular words,” 2014. [Online]. Available: http://www.digitaltrends.com/socialmedia/popular-curse-words-twitter/
[30] C. J. Hutto, and Eric Gilbert, “VADER: A Parsimonious Rule-Based Model for Sentiment Analysis of Social Media Text,” The Eighth International AAAI Conference on Weblogs and Social Media, vol. 8, no. 1, pp. 216–225, 2014. Crossref, https://doi.org/10.1609/icwsm.v8i1.14550
[31] N. D. Gitari, Z. Zuping, H. Damien, & J. Long.
[32] Hugo Rosa et al., “A ‘Deeper’ look at Detecting Cyberbullying in Social Networks,” International Joint Conference on Neural Networks, pp. 1–8, 2018. Crossref, https://doi.org/10.1109/IJCNN.2018.8489211
[33] Tin Van Huynh et al., “Hate Speech Detection on Vietnamese Social Media Text using the Bi-GRU-LSTM-CNN Model,” Computation and Language, 2019. Crossref, https://doi.org/10.48550/arXiv.1911.03644
[34] Bjorn Gambäck, and Utpal Kumar Sikdar, “Using Convolutional Neural Networks to Classify Hate-Speech,” The First Workshop on Abusive Language Online, Association for Computational Linguistics, pp. 85–90, 2017. Crossref, https://doi.org/10.18653/v1/W17-3013
[35] Tomas Mikolov et al., “Efficient Estimation of Word Representations in Vector Space,” Computation and Language, 2013. [Online]. Available: http://arxiv.org/abs/1301.3781.
[36] Tom Young et al., “Recent Trends in Deep Learning Based Natural Language Processing,” Computation and Language, 2017. [Online]. Available: http://arxiv.org/abs/1708.02709.
[37] Ayush Jain et al., "Detection of Sarcasm through Tone Analysis on Video and Audio Files: A Comparative Study on AI Models Performance," SSRG International Journal of Computer Science and Engineering, vol. 8, no. 12, pp. 1-5, 2021. Crossref, https://doi.org/10.14445/23488387/IJCSE-V8I12P101
[38] Jeffrey Pennington, Richard Socher, and Christopher D. Manning, “Global Vectors for Word Representation,” Conference on Empirical Methods in Natural Language Processing, pp. 1532-1543, 2014. Crossref, https://doi.org/10.3115/v1/D14-116
[39] Piotr Bojanowski et al., “Enriching Word Vectors with Subword Information,” 2017. [Online]. Available: http://arxiv.org/abs/1607.04606.
[40] Armand Joulin et al., “Bag of Tricks for Efficient Text Classification,” 2016. [Online]. Available: http://arxiv.org/abs/1607.01759.
[41] Armand Joulin et al., “Compressing Text Classification Models,” 2016. [Online]. Available: http://arxiv.org/abs/1612.03651.
[42] Tomas Mikolov et al., “Advances in Pre-Training Distributed Word Representations,” 2017. [Online]. Available: http://arxiv.org/abs/1712.09405.
[43] ZENG Runhua, and ZHANG Shuqun, "Improving Speech Emotion Recognition Method of Convolutional Neural Network,” International Journal of Recent Engineering Science, vol. 5, no. 3, pp. 1-7, 2018. Crossref, https://doi.org/10.14445/23497157/IJRES-V5I3P101
[44] Mike King, “Types of Profanity Filters for Online Safety,” 2013. [Online]. Available: https://cleanspeak.com/blog/2013/03/28/types-of-profanity-filters-for-online-safety
[45] Ng Wai Foong, “Profanity Filtering in Speech,” 2022. [Online]. Available: https://levelup.gitconnected.com/profanity-filtering-in-speech-41ae4fd6cccf
[46] Wikidocs, “Introduction to natural language processing using deep learning.”, 2020. [Online]. Available: https://wikidocs.net/33520