Multi-Class Tweet Categorization Using Map Reduce Paradigm

International Journal of Computer Trends and Technology (IJCTT)
© 2014 by IJCTT Journal
Volume-9 Number-2
Year of Publication : 2014
Authors : Mohit Tare, Indrajit Gohokar, Jayant Sable, Devendra Paratwar, Rakhi Wajgi
DOI : 10.14445/22312803/IJCTT-V9P117


Mohit Tare, Indrajit Gohokar, Jayant Sable, Devendra Paratwar, Rakhi Wajgi. "Multi-Class Tweet Categorization Using Map Reduce Paradigm". International Journal of Computer Trends and Technology (IJCTT) V9(2):78-81, March 2014. ISSN:2231-2803. Published by Seventh Sense Research Group.

Abstract -
Twitter is one of the most popular micro-blogging websites in today's globalized world, and Twitter messages can be mined to gain valuable information. Although Twitter provides a real-time list of the most popular topics people tweet about, known as Trending Topics, it is often hard to understand what these trending topics are about. Therefore, various efforts are being made to classify these topics into general categories with high accuracy for better information retrieval. This paper proposes and discusses the use of the Naïve Bayes classification algorithm for the categorization of tweets. It then proposes how the Map-Reduce paradigm can be applied to the existing Naïve Bayes algorithm to handle a large number of tweets.
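
To make the idea of combining Naïve Bayes with Map-Reduce concrete, the following is a minimal single-machine sketch in Python. It is an illustration under stated assumptions, not the authors' implementation. The toy training set, the whitespace tokenizer, the add-one (Laplace) smoothing, and the names `map_tweet`, `reduce_counts`, `train` and `classify` are all hypothetical. The map step emits a count of 1 for every (category, word) pair, the reduce step sums the counts per key, and classification uses the summed counts.

```python
import math
from collections import Counter

# Hypothetical toy training set: (category, tweet text) pairs.
TRAIN = [
    ("sports", "great goal in the match tonight"),
    ("sports", "the team won the match"),
    ("tech", "new phone launch tonight"),
    ("tech", "the phone has a great camera"),
]

# Key component used for per-category tweet counts (assumption: None is
# never a real token, so it cannot collide with a word).
DOC = None


def map_tweet(record):
    """Map step: emit ((category, word), 1) per word, plus one tweet count."""
    category, text = record
    pairs = [((category, word), 1) for word in text.lower().split()]
    pairs.append(((category, DOC), 1))
    return pairs


def reduce_counts(pairs):
    """Reduce step: sum the emitted values for each key."""
    totals = Counter()
    for key, value in pairs:
        totals[key] += value
    return totals


def train(records):
    """Run the map step over all records, then reduce the emitted pairs."""
    emitted = [pair for record in records for pair in map_tweet(record)]
    return reduce_counts(emitted)


def classify(text, counts):
    """Pick the category with the highest smoothed log-probability."""
    categories = sorted({c for (c, w) in counts if w is DOC})
    vocab = {w for (c, w) in counts if w is not DOC}
    n_tweets = sum(counts[(c, DOC)] for c in categories)
    best, best_score = None, -math.inf
    for c in categories:
        n_words = sum(n for (cat, w), n in counts.items()
                      if cat == c and w is not DOC)
        score = math.log(counts[(c, DOC)] / n_tweets)  # log prior
        for word in text.lower().split():
            # Add-one smoothing so unseen words do not zero out the score.
            score += math.log((counts[(c, word)] + 1) / (n_words + len(vocab)))
        if score > best_score:
            best, best_score = c, score
    return best


counts = train(TRAIN)
print(classify("phone camera launch", counts))
```

Because the reduce step only sums counts, partial results computed on separate partitions of the tweets can be added together to give the same totals as a single pass. This is the property that lets the counting phase be spread across many workers.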

Keywords -
Categorization, Map-Reduce, Trending Topics, Tweet, Twitter