Discriminative Features Selection in Text Mining Using TF-IDF Scheme

 
International Journal of Computer Trends and Technology (IJCTT)          
 
© July to Aug Issue 2011 by IJCTT Journal
Volume-1 Issue-3                           
Year of Publication : 2011
Authors : Ms. Vaishali Bhujade, Prof. N. J. Janwe, Ms. Chhaya Meshram.

MLA

Ms. Vaishali Bhujade, Prof. N. J. Janwe, Ms. Chhaya Meshram. "Discriminative Features Selection in Text Mining Using TF-IDF Scheme." International Journal of Computer Trends and Technology (IJCTT), V1(3):277-280, July to Aug Issue 2011. ISSN 2231-2803. www.ijcttjournal.org. Published by Seventh Sense Research Group.

Abstract: This paper describes a technique for discriminative feature selection in text mining. 'Text mining' is the discovery by computer of new, previously unknown information. Discriminative features are the most important keywords or terms in a document collection: those that describe the informative news the collection contains. The generated keyword set is used to discover association rules among the keywords labeling the documents. For feature extraction, the information retrieval scheme TF-IDF (term frequency-inverse document frequency) is used. The system builds on previous work covering the text preprocessing phases (filtration and stemming), and the present work in turn serves as the basis for the association rule mining phase. Association rule mining is a text mining technique whose goal is to find interesting associations or correlations among a large set of data items. With massive amounts of data continuously being collected and stored in databases, many companies have become interested in mining association rules from their databases to increase their profits. Knowledge discovery in databases (KDD) is the process of finding useful information and patterns in data.
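The abstract names the preprocessing and TF-IDF stages only at a high level. As a minimal sketch in Python, assuming a toy stop-word list and a crude suffix-stripping stemmer (both hypothetical stand-ins, not the authors' filtration and stemming phases and not the Porter stemmer), discriminative keyword selection could look like the code below. It uses the common weighting tf-idf(t, d) = tf(t, d) * log(N / df(t)) with raw counts as tf; the exact TF-IDF variant used in the paper is not stated here. All names (STOP_WORDS, stem, preprocess, tfidf, top_keywords) and the sample corpus are illustrative.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately small stop-word list used for the filtration step.
STOP_WORDS = {"a", "an", "the", "and", "or", "of", "in", "on", "to", "is",
              "are", "as", "for", "by", "with", "after"}


def stem(word: str) -> str:
    """Crude suffix stripping; an illustrative stand-in, NOT the Porter stemmer."""
    for suffix in ("ing", "es", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> list[str]:
    """Filtration (lower-case, keep alphabetic tokens, drop stop words), then stemming."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [stem(tok) for tok in tokens if tok not in STOP_WORDS]


def tfidf(docs: list[list[str]]) -> list[dict[str, float]]:
    """Weight each term as tf(t, d) * log(N / df(t)) using raw term counts for tf."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))  # documents containing each term
    return [
        {term: count * math.log(n / df[term]) for term, count in Counter(doc).items()}
        for doc in docs
    ]


def top_keywords(weights: dict[str, float], k: int) -> list[str]:
    """Return up to k highest-weighted terms, ties broken alphabetically.

    Terms that occur in every document get weight 0 and are never selected.
    """
    ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, w in ranked[:k] if w > 0]


if __name__ == "__main__":
    corpus = [
        "Oil markets fell as oil traders sold oil contracts",
        "Oil markets recovered after the storm",
        "The football team won the league match",
    ]
    weights = tfidf([preprocess(doc) for doc in corpus])
    for doc_weights in weights:
        print(top_keywords(doc_weights, k=3))
```

Run as a script, this prints ['oil', 'contract', 'fell'], then ['recover', 'storm', 'market'], then ['football', 'league', 'match']. 'oil' appears in two of the three documents, yet it still ranks first in the first one because it occurs there three times. Breaking ties alphabetically is only a choice made to keep the output deterministic.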
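The association rule mining phase treats each document's keyword set as a transaction. A minimal sketch, assuming an Apriori-style level-wise search, works in two steps. First, it finds keyword sets whose support (the fraction of documents containing all of them) reaches a threshold. Second, it keeps rules X => Y whose confidence, support(X ∪ Y) / support(X), reaches a second threshold. The function names, keyword sets and thresholds below are hypothetical and do not come from the paper.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search for keyword sets with support >= min_support.

    Returns a dict mapping each frequent frozenset to its support
    (fraction of transactions that contain every item in the set).
    """
    sets = [frozenset(t) for t in transactions]
    n = len(sets)

    def support(items):
        return sum(1 for t in sets if items <= t) / n

    singletons = sorted({item for t in sets for item in t})
    current = [frozenset([i]) for i in singletons if support(frozenset([i])) >= min_support]
    frequent = {}
    k = 1
    while current:
        for itemset in current:
            frequent[itemset] = support(itemset)
        k += 1
        # Join step: unions of two frequent (k-1)-sets that contain exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset must already be frequent (Apriori property).
        current = [
            c for c in candidates
            if all(frozenset(s) in frequent for s in combinations(c, k - 1))
            and support(c) >= min_support
        ]
    return frequent


def association_rules(frequent, min_confidence):
    """Rules X => Y with confidence support(X | Y) / support(X) >= min_confidence."""
    rules = []
    for itemset, sup in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                confidence = sup / frequent[lhs]  # lhs is frequent by downward closure
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, sup, confidence))
    # Sort for stable, readable output: highest confidence first.
    rules.sort(key=lambda rule: (-rule[3], sorted(rule[0]), sorted(rule[1])))
    return rules


if __name__ == "__main__":
    # Hypothetical keyword sets, one per document.
    keyword_sets = [
        {"oil", "price", "market"},
        {"oil", "price", "storm"},
        {"oil", "market"},
        {"football", "match"},
    ]
    frequent = frequent_itemsets(keyword_sets, min_support=0.5)
    for lhs, rhs, sup, conf in association_rules(frequent, min_confidence=0.7):
        print(f"{sorted(lhs)} => {sorted(rhs)}  support={sup:.2f} confidence={conf:.2f}")
```

With these inputs the sketch reports two rules, market => oil and price => oil, each with support 0.50 and confidence 1.00. The reverse rule oil => market is rejected because its confidence is only 0.5 / 0.75 ≈ 0.67.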

Keywords: Data Mining, Text Mining, Knowledge Data Discovery, Association Rules.