Efficient Preprocessing and Patterns Identification Approach for Text Mining

International Journal of Computer Trends and Technology (IJCTT)          
 
© - December Issue 2013 by IJCTT Journal
Volume-6 Issue-2                           
Year of Publication : 2013
Authors: Pattan Kalesha, M. Babu Rao, Ch. Kavitha

MLA

Pattan Kalesha, M. Babu Rao, Ch. Kavitha. "Efficient Preprocessing and Patterns Identification Approach for Text Mining." International Journal of Computer Trends and Technology (IJCTT), V6(2):124-129, December Issue 2013. ISSN 2231-2803. www.ijcttjournal.org. Published by Seventh Sense Research Group.

Abstract:- Due to the rapid expansion of digital data, knowledge discovery and data mining have attracted significant attention as ways of turning such data into useful information and knowledge. Text categorization remains one of the most researched NLP problems because of the ever-increasing volume of electronic documents and digital libraries. We present a novel text categorization method that combines decisions over multiple attributes. Since most existing text mining methods adopt term-based approaches, they suffer from the problems of polysemy and synonymy. Existing pattern discovery techniques include the processes of pattern deploying and pattern evolving, which improve the effectiveness of using and updating discovered patterns to find relevant and interesting information. However, current association rule methods have two shortcomings when applied to pattern classification. One is that they ignore information about word frequency in a text. The other is that they require rule pruning whenever a large number of rules is generated. In the proposed work, documents are preprocessed before pattern discovery is applied. The document dataset is preprocessed using tokenization, stemming, and probability filtering. The proposed approach yields better decision rules than the existing approach.
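The abstract names three preprocessing steps but does not specify them. The following is a minimal sketch of how such a pipeline might look, under assumptions that are not taken from the paper: the function names are hypothetical, the stemmer is a crude suffix stripper rather than the authors' stemmer, and "probability filtering" is interpreted here as keeping only terms whose relative frequency in the corpus falls between two illustrative thresholds.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def naive_stem(token):
    """Strip a few common English suffixes.

    Assumption: a crude stand-in for a real stemmer such as Porter's;
    the paper does not say which stemmer it uses.
    """
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip when at least three characters of stem remain.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def preprocess(documents, min_prob=0.01, max_prob=0.5):
    """Tokenize, stem, and filter terms by relative corpus frequency.

    Assumption: "probability" here means term count divided by the
    total number of term occurrences; the thresholds are illustrative.
    """
    stemmed_docs = [[naive_stem(t) for t in tokenize(d)] for d in documents]
    counts = Counter(t for doc in stemmed_docs for t in doc)
    total = sum(counts.values())
    if total == 0:
        return stemmed_docs
    keep = {t for t, c in counts.items() if min_prob <= c / total <= max_prob}
    return [[t for t in doc if t in keep] for doc in stemmed_docs]


docs = ["Mining patterns in texts", "Pattern mining finds patterns"]
filtered = preprocess(docs, min_prob=0.2, max_prob=1.0)
```

In this toy corpus only the two most frequent stems survive the filter; the resulting term lists would then be passed to whatever pattern discovery step follows.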

Keywords:- Patterns, Rules, Stemming, Probability.