Web Content Mining Techniques Tools & Algorithms – A Comprehensive Study

International Journal of Computer Trends and Technology (IJCTT)          
© - August Issue 2013 by IJCTT Journal
Volume-4 Issue-8                           
Year of Publication : 2013
Authors : R. Malarvizhi, K. Saraswathi


R. Malarvizhi, K. Saraswathi, "Web Content Mining Techniques Tools & Algorithms – A Comprehensive Study", International Journal of Computer Trends and Technology (IJCTT), V4(8):2940-2945, August Issue 2013. ISSN 2231-2803. www.ijcttjournal.org. Published by Seventh Sense Research Group.

Abstract:- The World Wide Web has grown far beyond expectations. Large amounts of text documents, multimedia files, and images are available on the web, and both the volume and the variety of this content continue to increase. Data mining is the process of extracting useful information from large volumes of data, such as the data available on the internet. Web mining is a part of data mining that draws on several research communities, including information retrieval, database management systems, and artificial intelligence. The information in these forms is well structured from the ground up. Web mining adopts many data mining techniques to discover potentially useful information from web content. This paper discusses the concepts of web mining and its categories. It focuses mainly on web content mining tasks, along with their techniques and algorithms.



Keywords : Mining tools, techniques, structured data mining.