Achieving High Quality Tweet Segmentation using the HybridSeg Framework

International Journal of Computer Trends and Technology (IJCTT)          
© 2016 by IJCTT Journal
Volume-41 Number-1
Year of Publication : 2016
Authors : Dr Ilaiah Kavati, Dayakar P, E. Amarnath Reddy, Vinay Kumar Thumu
DOI :  10.14445/22312803/IJCTT-V41P107


Dr Ilaiah Kavati, Dayakar P, E. Amarnath Reddy, Vinay Kumar Thumu "Achieving High Quality Tweet Segmentation using the HybridSeg Framework". International Journal of Computer Trends and Technology (IJCTT) V41(1):37-41, November 2016. ISSN:2231-2803. Published by Seventh Sense Research Group.

Abstract -
Twitter, a social networking site, has attracted many users who share and disseminate up-to-date information, producing large volumes of data every day. Many Information Retrieval (IR) applications built on tweets suffer severely from the noisy and short nature of tweets. This paper uses HybridSeg, a framework that segments tweets in batch mode. By splitting tweets into meaningful segments, HybridSeg preserves their semantic content. HybridSeg finds the optimal segmentation of each tweet by maximizing the sum of the stickiness scores of its candidate segments. HybridSeg is further designed to learn iteratively from confident segments as pseudo feedback. Experiments show that tweet segmentation quality is significantly improved.
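The segmentation objective described above (pick consecutive, non-overlapping segments s_1, ..., s_m that cover the tweet and maximize C(s_1) + ... + C(s_m), where C is a stickiness score) can be solved exactly with a dynamic program over tweet prefixes. The sketch below is illustrative only and is not the authors' implementation: the function names segment_tweet and toy_stickiness, the maximum segment length max_len, and the toy score table are all hypothetical stand-ins, and HybridSeg's actual stickiness function is not reproduced here.

```python
from typing import Callable, Dict, List, Tuple

Segment = Tuple[str, ...]


def segment_tweet(
    words: List[str],
    stickiness: Callable[[Segment], float],
    max_len: int = 5,  # hypothetical cap on segment length (assumption)
) -> Tuple[float, List[Segment]]:
    """Return (best total score, segmentation) maximizing the summed stickiness."""
    n = len(words)
    # best[i] holds the best (score, segments) for the prefix words[:i].
    best: List[Tuple[float, List[Segment]]] = [(0.0, [])] + [(float("-inf"), [])] * n
    for i in range(1, n + 1):
        # Try every segment words[j:i] that ends at position i.
        for j in range(max(0, i - max_len), i):
            seg = tuple(words[j:i])
            score = best[j][0] + stickiness(seg)
            if score > best[i][0]:
                # Build a new list so earlier entries are never mutated.
                best[i] = (score, best[j][1] + [seg])
    return best[n]


# Hypothetical toy scores standing in for a real stickiness function.
TOY_SCORES: Dict[Segment, float] = {
    ("new", "york"): 2.0,
    ("new", "york", "times"): 3.5,
}


def toy_stickiness(seg: Segment) -> float:
    """Toy scorer: known phrases score high, unknown multi-word segments are penalized."""
    if seg in TOY_SCORES:
        return TOY_SCORES[seg]
    return 0.5 if len(seg) == 1 else -1.0


if __name__ == "__main__":
    tweet = ["reading", "new", "york", "times", "today"]
    print(segment_tweet(tweet, toy_stickiness))
    # -> (4.5, [('reading',), ('new', 'york', 'times'), ('today',)])
```

With a cap of max_len words per segment, the program evaluates the stickiness function O(n * max_len) times for an n-word tweet, so the quality of the result depends almost entirely on how well the stickiness function separates real phrases from accidental word sequences.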

Keywords - HybridSeg, Named Entity Recognition, Twitter, Tweet Segmentation.