Classifying Unwanted Emails using Naïve Bayes Classifier

© 2021 by IJCTT Journal
Volume-69 Issue-9
Year of Publication : 2021
Authors : Victoria Oluwatoyin Oyekunle, Edward E. Ogheneovo
DOI :  10.14445/22312803/IJCTT-V69I9P103

How to Cite?

Victoria Oluwatoyin Oyekunle, Edward E. Ogheneovo, "Classifying Unwanted Emails using Naïve Bayes Classifier," International Journal of Computer Trends and Technology, vol. 69, no. 9, pp. 12-16, 2021.

In recent years, the growing use of electronic mail for fast, cheap personal, official, and academic communication, and for electronic commerce, has led to the emergence and spread of problems caused by unsolicited, unwanted bulk e-mail messages. The objective of this study is to improve the classification of incoming e-mails, using the Naïve Bayes classifier, into unwanted and ham (legitimate) based on features in both the subject text and the body of the email. The system segments the input email body into tokens and analyses its structure. The dataset is cleaned, and the unique words are extracted, counted, and compared with the unwanted words already learned and stored in the database. If an email is classified as ‘Unwanted with very high degree’ or ‘Unwanted with high degree’, the user is notified and advised to block it. Some emails were classified as Ham, which means that users can treat them as legitimate messages.
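
The pipeline described above (tokenise the subject and body, count words per class, score a new email against the learned counts) can be illustrated with a minimal multinomial Naïve Bayes sketch. It is not the authors' implementation: the class and function names, the Laplace smoothing, and the 0.9 and 0.7 probability thresholds used to map a score to the degree labels are all assumptions made for illustration, since the abstract does not specify them.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesFilter:
    """Hypothetical multinomial Naive Bayes filter with Laplace smoothing."""

    def __init__(self):
        self.word_counts = {"unwanted": Counter(), "ham": Counter()}
        self.doc_counts = Counter()
        self.vocab = set()

    def train(self, subject, body, label):
        """Add one labelled email (subject plus body) to the learned counts."""
        tokens = tokenize(subject + " " + body)
        self.word_counts[label].update(tokens)
        self.doc_counts[label] += 1
        self.vocab.update(tokens)

    def prob_unwanted(self, subject, body):
        """Return the posterior probability that the email is unwanted."""
        tokens = tokenize(subject + " " + body)
        total_docs = sum(self.doc_counts.values())
        log_scores = {}
        for label, counts in self.word_counts.items():
            # Smoothed class prior, so an unseen class never yields log(0).
            score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
            denom = sum(counts.values()) + len(self.vocab)
            for tok in tokens:
                # Laplace (add-one) smoothed word likelihood.
                score += math.log((counts[tok] + 1) / denom)
            log_scores[label] = score
        # Normalise in log space to avoid underflow on long emails.
        top = max(log_scores.values())
        weights = {k: math.exp(v - top) for k, v in log_scores.items()}
        return weights["unwanted"] / sum(weights.values())


def degree(p_unwanted, very_high=0.9, high=0.7):
    """Map a probability to a label; both thresholds are assumed values."""
    if p_unwanted >= very_high:
        return "Unwanted with very high degree"
    if p_unwanted >= high:
        return "Unwanted with high degree"
    return "Ham"


# Tiny made-up training set, for illustration only.
model = NaiveBayesFilter()
model.train("Win money now", "claim your free prize money", "unwanted")
model.train("Free offer", "win a free prize today", "unwanted")
model.train("Meeting agenda", "please review the project report", "ham")
model.train("Lunch tomorrow", "see you at the meeting", "ham")

print(degree(model.prob_unwanted("free prize", "win money")))
print(degree(model.prob_unwanted("project meeting", "please review the report")))
```

In practice the thresholds would be tuned on held-out data, and the training emails would come from the cleaned dataset rather than from a hand-written list.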

Electronic Mail, Machine Learning, Artificial Intelligence, Spam Filtering, Unwanted Emails, Ham, Phishing, Junk Email.

