An Ensemble Model Based on Multinomial Naïve Bayes and Lexicon for Sentiment Classification of Product Reviews

Gabriel V. Oliko; Calvins Otieno; Titus M. Muhamb

doi:https://doi.org/10.14445/22312803/IJCTT-V73I3P101

Research Article | Open Access

Volume 73 | Issue 3 | Year 2025 | Article Id. IJCTT-V73I3P101 | DOI: https://doi.org/10.14445/22312803/IJCTT-V73I3P101

Received 08 Jan 2025 | Revised 18 Feb 2025 | Accepted 02 Mar 2025 | Published 15 Mar 2025

Citation:

Gabriel V. Oliko, Calvins Otieno, Titus M. Muhamb, "An Ensemble Model Based on Multinomial Naïve Bayes and Lexicon for Sentiment Classification of Product Reviews," International Journal of Computer Trends and Technology (IJCTT), vol. 73, no. 3, pp. 1-15, 2025. Crossref, https://doi.org/10.14445/22312803/IJCTT-V73I3P101

Abstract

Product developers and their customers increasingly rely on online reviews as a primary tool for evaluating products. Online communities, blogs, and public review websites provide a wealth of data about customers' viewpoints, experiences, and opinions about goods. Product developers can harvest data on users' perceptions of product features and use that information to boost revenue and profit by planning and monitoring business strategies and improving overall product quality. Reviews also help prospective purchasers make informed decisions about a product's suitability and pricing while saving time and effort. Machine learning algorithms are used to identify and categorize product evaluations. This paper presents an ensemble machine learning approach that integrates the results of two base learners to improve classification accuracy, defined as the percentage of correctly classified product evaluations. Multinomial Naïve Bayes (MNB) and an unsupervised lexicon-based classifier were the base learners used to build the ensemble, which classifies consumer reviews as positive, neutral, or negative. Features were extracted using N-grams, Part-of-Speech tags, and the lexical library TextBlob. The proposed model was evaluated on real datasets for two items: the "Samsung Galaxy A12" smartphone and the "Nissan Sentra" automobile series. The experimental results indicate that the MNB Lexicon Pooled Ensemble outperformed the individual MNB and Lexicon classifiers in rating prediction, achieving accuracy, precision, recall, and F1 scores of 0.8250, 0.8932, 0.7970, and 0.8325, respectively.
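
The abstract does not say how the outputs of the two base learners are pooled, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a weighted average of class probabilities from a small hand-written Multinomial Naïve Bayes (unigram and bigram counts, add-one smoothing) and a toy word-polarity lexicon. The names `TinyMNB`, `LEXICON`, `lexicon_proba`, `pooled_predict`, the `weight` parameter, and the training sentences are all hypothetical; the paper itself uses TextBlob and Part-of-Speech features, which are not reproduced here.

```python
import math
from collections import Counter, defaultdict

LABELS = ("negative", "neutral", "positive")

# Hypothetical toy lexicon (assumption); stands in for a real lexical resource.
LEXICON = {"good": 1.0, "great": 1.0, "love": 1.0,
           "bad": -1.0, "poor": -1.0, "slow": -0.5}


def ngrams(text, n_max=2):
    """Return unigrams plus n-grams up to n_max for a whitespace-split text."""
    words = text.lower().split()
    feats = list(words)
    for n in range(2, n_max + 1):
        feats += [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]
    return feats


class TinyMNB:
    """Multinomial Naive Bayes over n-gram counts with add-one smoothing."""

    def fit(self, texts, labels):
        self.class_counts = Counter(labels)
        self.feat_counts = defaultdict(Counter)
        for text, label in zip(texts, labels):
            self.feat_counts[label].update(ngrams(text))
        self.vocab = {f for c in self.feat_counts.values() for f in c}
        return self

    def predict_proba(self, text):
        total_docs = sum(self.class_counts.values())
        log_scores = {}
        for label in LABELS:
            # Smoothed prior so a class missing from training never yields log(0).
            prior = (self.class_counts[label] + 1) / (total_docs + len(LABELS))
            denom = sum(self.feat_counts[label].values()) + len(self.vocab)
            score = math.log(prior)
            for feat in ngrams(text):
                if feat in self.vocab:  # features unseen in training are skipped
                    score += math.log((self.feat_counts[label][feat] + 1) / denom)
            log_scores[label] = score
        # Normalise log scores into probabilities (shifted for numerical stability).
        top = max(log_scores.values())
        exp = {k: math.exp(v - top) for k, v in log_scores.items()}
        z = sum(exp.values())
        return {k: v / z for k, v in exp.items()}


def lexicon_proba(text):
    """Map the mean polarity of lexicon words in [-1, 1] to three class scores."""
    hits = [LEXICON[w] for w in text.lower().split() if w in LEXICON]
    polarity = sum(hits) / len(hits) if hits else 0.0
    return {"negative": max(-polarity, 0.0),
            "neutral": 1.0 - abs(polarity),
            "positive": max(polarity, 0.0)}


def pooled_predict(mnb, text, weight=0.5):
    """Weighted average of MNB and lexicon class scores (assumed pooling rule)."""
    p_mnb, p_lex = mnb.predict_proba(text), lexicon_proba(text)
    pooled = {c: weight * p_mnb[c] + (1 - weight) * p_lex[c] for c in LABELS}
    return max(pooled, key=pooled.get), pooled


# Made-up training reviews, for illustration only.
train_texts = ["great phone love the camera", "battery is bad and slow",
               "it is a phone", "good value great screen",
               "poor build bad speaker", "arrived on monday"]
train_labels = ["positive", "negative", "neutral",
                "positive", "negative", "neutral"]
model = TinyMNB().fit(train_texts, train_labels)

label, probs = pooled_predict(model, "great camera but slow")
print(label, probs)
```

The `weight` parameter controls how much the supervised learner counts relative to the lexicon. A value of 0.5 treats them equally, and in practice it would be tuned on held-out reviews.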

Keywords

Product, Reviews, Sentiment analysis, Multinomial Naïve Bayes, Lexicon.
