International Journal of Computer Trends and Technology

Research Article | Open Access

Volume 4 | Issue 8 | Year 2013 | Article Id. IJCTT-V4I8P200 | DOI : https://doi.org/10.14445/22312803/IJCTT-V4I8P200

Web Content Mining Techniques Tools & Algorithms – A Comprehensive Study


R. Malarvizhi, K. Saraswathi

Citation:

R. Malarvizhi, K. Saraswathi, "Web Content Mining Techniques Tools & Algorithms – A Comprehensive Study," International Journal of Computer Trends and Technology (IJCTT), vol. 4, no. 8, pp. 2940-2945, 2013. Crossref, https://doi.org/10.14445/22312803/IJCTT-V4I8P200

Abstract

The World Wide Web has grown far beyond early expectations. Large numbers of text documents, multimedia files, and images are available on the Web, and they continue to grow in volume and variety. Data mining is the process of extracting useful information from large collections of data, such as the data available on the Internet. Web mining is a branch of data mining that draws on several research communities, including information retrieval, database management systems, and artificial intelligence, each of which rests on well-established foundational principles for organizing information. Web mining adopts many data mining techniques to discover potentially useful information from web contents. This paper discusses the concepts of web mining and its categories, focusing mainly on Web content mining tasks together with their techniques and algorithms.

Keywords

Mining tools, techniques, structured data mining.
