Sentiment Analysis using Naive Bayes Classifier and Information Gain Feature Selection over Twitter

© 2020 by IJCTT Journal
Volume-68 Issue-5
Year of Publication: 2020
Authors: Manjit Singh, Swati Gupta
DOI: 10.14445/22312803/IJCTT-V68I5P117

How to Cite?

Manjit Singh, Swati Gupta, "Sentiment Analysis using Naive Bayes Classifier and Information Gain Feature Selection over Twitter," International Journal of Computer Trends and Technology, vol. 68, no. 5, pp. 84-91, 2020. Crossref, https://doi.org/10.14445/22312803/IJCTT-V68I5P117

Abstract
The rapid growth of the internet has encouraged the creation of personal, sentiment-bearing web content such as blogs, tweets, web forums, and other social media. People often make decisions based on input from friends, relatives, colleagues, and others, and this behavior is supported by the growing availability and popularity of opinion-rich resources, such as online reviews of e-commerce products and personal blogs. These resources let users express personal feelings, discuss everyday problems, exchange political views, and evaluate services and products such as smartphones and smart TVs. This research applies an opinion mining method that combines a Naïve Bayes classifier with Information Gain-based feature selection. The method is tested on an e-commerce tweet dataset downloaded from the Twitter Cloud Repository. The purpose of this study is to improve the accuracy of the Naïve Bayes algorithm in classifying documents by using Information Gain to select features. The approach achieved an accuracy of 88.80%, which is adequate for evaluating sentiment.
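The abstract names the two components but not their exact configuration. The Python sketch below illustrates one common way to combine them, under assumptions that are not stated in the source: tokens are lowercase letter strings, Information Gain is computed from binary term presence as IG(t) = H(C) - P(t)·H(C|t) - P(not t)·H(C|not t), the top k terms are kept, and classification uses a multinomial Naive Bayes with Laplace smoothing (alpha = 1). All function names, the value of k, and the six-tweet toy dataset are hypothetical; they are not the paper's code or data.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Lowercase and keep runs of letters/apostrophes (a simplifying assumption).
    return re.findall(r"[a-z']+", text.lower())


def entropy(labels):
    # Shannon entropy H(C), in bits, of a list of class labels.
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(docs, labels, term):
    # IG(t) = H(C) - P(t) H(C|t) - P(not t) H(C|not t), using term presence only.
    with_t = [y for d, y in zip(docs, labels) if term in d]
    without_t = [y for d, y in zip(docs, labels) if term not in d]
    n = len(labels)
    return (entropy(labels)
            - len(with_t) / n * entropy(with_t)
            - len(without_t) / n * entropy(without_t))


def select_features(docs, labels, k):
    # Rank every vocabulary term by IG (ties broken alphabetically) and keep the top k.
    vocab = {t for d in docs for t in d}
    ranked = sorted(vocab, key=lambda t: (-information_gain(docs, labels, t), t))
    return set(ranked[:k])


class MultinomialNB:
    """Multinomial Naive Bayes with add-alpha (Laplace) smoothing."""

    def fit(self, docs, labels, vocab, alpha=1.0):
        self.vocab, self.alpha = vocab, alpha
        self.classes = sorted(set(labels))
        n = len(labels)
        self.log_prior = {c: math.log(labels.count(c) / n) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for d, y in zip(docs, labels):
            # Only the selected features contribute to the term likelihoods.
            self.counts[y].update(t for t in d if t in vocab)
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def log_score(self, doc, c):
        # Unnormalized log posterior: log P(c) + sum over selected terms of log P(t|c).
        denom = self.totals[c] + self.alpha * len(self.vocab)
        return self.log_prior[c] + sum(
            math.log((self.counts[c][t] + self.alpha) / denom)
            for t in doc if t in self.vocab)

    def predict(self, doc):
        return max(self.classes, key=lambda c: self.log_score(doc, c))


def accuracy(model, docs, labels):
    correct = sum(model.predict(d) == y for d, y in zip(docs, labels))
    return correct / len(labels)


# Toy data invented for illustration (not the paper's Twitter dataset).
train_texts = [
    "great phone love it", "love the smart tv", "great service fast delivery",
    "bad phone hate it", "hate the smart tv", "bad service slow delivery",
]
train_labels = ["pos", "pos", "pos", "neg", "neg", "neg"]
test_texts = ["I love this great TV", "bad delivery, I hate it"]
test_labels = ["pos", "neg"]

train_docs = [tokenize(t) for t in train_texts]
features = select_features(train_docs, train_labels, k=4)
model = MultinomialNB().fit(train_docs, train_labels, features)
print(sorted(features))  # prints ['bad', 'great', 'hate', 'love']
print(accuracy(model, [tokenize(t) for t in test_texts], test_labels))  # prints 1.0
```

On the toy data, the four highest-ranked terms are the sentiment words, while topic words such as "phone" and "tv", which occur equally often in both classes, have an Information Gain of zero and are discarded. The perfect score on two hand-written test tweets only demonstrates the mechanics; it says nothing about the 88.80% accuracy reported for the paper's dataset.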

Keywords
Machine Learning, Sentiment Analysis, Information Gain, Naïve Bayes Classifier
