Classification of Spam Categorization on Hindi Documents using Bayesian Classifier
International Journal of Computer Trends and Technology (IJCTT)
© 2018 by IJCTT Journal
Year of Publication: 2018
Authors: Mr. Ishaan Tamhankar, Dr. Ashysh Chaturvedi
DOI: 10.14445/22312803/IJCTT-V66P102
MLA Style: Mr. Ishaan Tamhankar, Dr. Ashysh Chaturvedi. "Classification of Spam Categorization on Hindi Documents using Bayesian Classifier." International Journal of Computer Trends and Technology 66.1 (2018): 8-13.
APA Style: Mr. Ishaan Tamhankar, Dr. Ashysh Chaturvedi (2018). Classification of Spam Categorization on Hindi Documents using Bayesian Classifier. International Journal of Computer Trends and Technology, 66(1), 8-13.
In today's e-world, most transactions and business communication take place through e-mail. E-mail has become a powerful communication tool because it saves time, paper, and cost. However, because of social networking sites and advertisers, many e-mails contain unwanted information, known as spam. Spam e-mails may contain text in any language, and some documents on the web that are written in Indian languages may be spam e-mails. Because many languages are used in India, identifying spam e-mail is a challenging task owing to linguistic variation and language barriers. Having reviewed many research papers on e-mail spam categorization, I found that classifiers are available for other Indian languages, but no document classifier is available for Hindi. This research therefore focuses on a document classifier for Hindi spam e-mail categorization.
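To make the idea of a Bayesian spam classifier for Hindi text concrete, the following is a minimal sketch only, not the authors' implementation. It assumes a multinomial Naïve Bayes model with Laplace smoothing (the abstract does not say which variant is used). The class `NaiveBayes`, the helper `tokenize`, and the four training sentences are hypothetical and were invented purely for illustration; they are not taken from the paper's data.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Split on whitespace and common punctuation, including the Devanagari
    # danda. Splitting is used instead of r"\w+" because Python's re module
    # does not treat Devanagari vowel signs as word characters.
    return [tok for tok in re.split(r"[\s।,.!?]+", text) if tok]


class NaiveBayes:
    """Hypothetical multinomial Naive Bayes with Laplace smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                      # smoothing constant (assumed)
        self.class_docs = Counter()             # documents seen per class
        self.word_counts = defaultdict(Counter) # token counts per class
        self.vocab = set()

    def fit(self, docs, labels):
        for text, label in zip(docs, labels):
            self.class_docs[label] += 1
            for tok in tokenize(text):
                self.word_counts[label][tok] += 1
                self.vocab.add(tok)
        return self

    def log_scores(self, text):
        total_docs = sum(self.class_docs.values())
        vocab_size = len(self.vocab)
        scores = {}
        for label, n_docs in self.class_docs.items():
            score = math.log(n_docs / total_docs)  # log prior
            total_words = sum(self.word_counts[label].values())
            for tok in tokenize(text):
                if tok not in self.vocab:
                    continue  # ignore tokens never seen in training
                count = self.word_counts[label][tok]
                score += math.log(
                    (count + self.alpha) / (total_words + self.alpha * vocab_size)
                )
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)


# Toy training set, invented for illustration only.
train_docs = [
    "मुफ्त इनाम जीतें अभी क्लिक करें",
    "लॉटरी में मुफ्त पैसा जीतें",
    "कल बैठक सुबह दस बजे है",
    "कृपया रिपोर्ट कल तक भेजें",
]
train_labels = ["spam", "spam", "ham", "ham"]

model = NaiveBayes().fit(train_docs, train_labels)
print(model.predict("मुफ्त इनाम जीतें"))
print(model.predict("कल बैठक है"))
```

A real system would likely add Hindi-specific preprocessing such as stop-word removal and stemming, and would need a labelled corpus far larger than this toy set.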
Keywords: Hindi Language, Naïve Bayes (NB), Document Categorization, Support Vector Machines (SVM), K-Nearest Neighbors (K-NN).