Discriminative Features Selection in Text Mining Using TF-IDF Scheme

Ms. Vaishali Bhujade; Prof. N. J. Janwe; Ms. Chhaya Meshram

doi:10.14445/22312803/IJCTT-V1I3P107

Research Article | Open Access

Volume 1 | Issue 3 | Year 2011 | Article Id. IJCTT-V1I3P107 | DOI: https://doi.org/10.14445/22312803/IJCTT-V1I3P107

Citation:

Ms. Vaishali Bhujade, Prof. N. J. Janwe, Ms. Chhaya Meshram, "Discriminative Features Selection in Text Mining Using TF-IDF Scheme," International Journal of Computer Trends and Technology (IJCTT), vol. 1, no. 3, pp. 277-280, 2011. Crossref, https://doi.org/10.14445/22312803/IJCTT-V1I3P107

Abstract

This paper describes a technique for discriminative feature selection in text mining. "Text mining" is the discovery of new, previously unknown information by computer. Discriminative features are the most important keywords or terms in a document collection, those that describe the informative news the collection contains. The generated keyword set is used to discover association rules among the keywords labeling the documents. For feature extraction, an information retrieval scheme, TF-IDF, is used. The system builds on previous work, which covers the text preprocessing phases (filtration and stemming). This work serves as the basis for the association rule mining phase. Association rule mining is a text mining technique whose goal is to find interesting association or correlation relationships among a large set of data items. With massive amounts of data continuously being collected and stored in databases, many companies are becoming interested in mining association rules from their databases to increase their profits. Knowledge discovery in databases (KDD) is the process of finding useful information and patterns in data.
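
The pipeline outlined in the abstract (TF-IDF keyword selection followed by association rule discovery) might be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the weighting tf × log(N/df), the `top_k` cutoff, the support and confidence thresholds, the function names, and the toy documents are all hypothetical, and the rule search covers keyword pairs only rather than a full frequent-itemset algorithm. The input is assumed to be already filtered and stemmed.

```python
import math
from collections import Counter
from itertools import permutations


def tfidf_keywords(docs, top_k=3):
    """Return the top_k highest-weighted terms of each tokenized document."""
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    keywords = []
    for doc in docs:
        tf = Counter(doc)
        # Assumed weighting: tf * log(N / df). A term that occurs in every
        # document gets weight 0 and is never selected.
        weights = {t: tf[t] * math.log(n_docs / df[t]) for t in tf}
        ranked = sorted(weights, key=lambda t: (-weights[t], t))
        keywords.append({t for t in ranked[:top_k] if weights[t] > 0})
    return keywords


def pair_rules(keyword_sets, min_support=0.5, min_confidence=0.6):
    """Return (a, b, support, confidence) for every pair rule a -> b that passes both thresholds."""
    n = len(keyword_sets)
    vocab = sorted(set().union(*keyword_sets))
    rules = []
    for a, b in permutations(vocab, 2):
        count_a = sum(1 for s in keyword_sets if a in s)
        count_ab = sum(1 for s in keyword_sets if a in s and b in s)
        support = count_ab / n            # share of documents with both keywords
        confidence = count_ab / count_a   # share of documents with a that also have b
        if support >= min_support and confidence >= min_confidence:
            rules.append((a, b, support, confidence))
    return rules


# Hypothetical, already preprocessed documents (illustrative data only).
docs = [
    ["market", "stock", "stock", "price", "news"],
    ["market", "stock", "price", "news"],
    ["virus", "outbreak", "health", "news"],
    ["virus", "health", "vaccine", "news"],
]

keywords = tfidf_keywords(docs, top_k=3)
rules = pair_rules(keywords)

for kw in keywords:
    print(sorted(kw))
for a, b, support, confidence in rules:
    print(f"{a} -> {b} (support={support:.2f}, confidence={confidence:.2f})")
```

In this toy collection the term "news" appears in every document, so its inverse document frequency is zero and it is never chosen as a keyword, which is the discriminative effect the TF-IDF scheme is meant to provide.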

Keywords

Data Mining, Text Mining, Knowledge Data Discovery, Association Rules.
