A Survey on Research trends & approaches used for structuring Web server log files data

International Journal of Computer Trends and Technology (IJCTT)          
 
© 2015 by IJCTT Journal
Volume-24 Number-1
Year of Publication : 2015
Authors : Rupesh Sendre
DOI : 10.14445/22312803/IJCTT-V24P103

MLA

Rupesh Sendre "A Survey on Research trends & approaches used for structuring Web server log files data". International Journal of Computer Trends and Technology (IJCTT) V24(1):11-16, June 2015. ISSN:2231-2803. www.ijcttjournal.org. Published by Seventh Sense Research Group.

Abstract -
A web server access log is a plain text file that records information about every visit to the pages hosted on a server: when each page was requested, the Internet Protocol (IP) address of the request, the error code, the number of bytes sent to the user, and the type of browser used. Web servers can also capture referrer logs, which show the page from which a visitor makes the next request. Because web server log files contain a huge amount of information that may be unstructured or unorganized, the server information must be structured before any analysis is performed. This paper discusses the various approaches that can be used for this purpose and elaborates on the research trends in this field.
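
As an illustration of what structuring a log entry can look like, the following minimal Python sketch turns one raw access-log line into a record with the fields listed above. It is not a method taken from the paper. It assumes the widely used NCSA Combined Log Format, which the abstract does not name, and the pattern `COMBINED`, the function `parse_line`, and the sample line are hypothetical names and data chosen for this example.

```python
import re
from datetime import datetime

# Assumption: entries follow the NCSA Combined Log Format
# (host, identity, user, [time], "request", status, bytes, "referrer", "agent").
# The referrer and user-agent fields are optional so that plain
# Common Log Format lines also match.
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)


def parse_line(line):
    """Return a dict of typed fields for one log line, or None if it does not match."""
    m = COMBINED.match(line)
    if m is None:
        return None
    rec = m.groupdict()
    # Convert text fields to typed values so later analysis can sort and aggregate.
    rec["time"] = datetime.strptime(rec["time"], "%d/%b/%Y:%H:%M:%S %z")
    rec["status"] = int(rec["status"])
    # A "-" in the size field means no body was sent.
    rec["bytes"] = 0 if rec["bytes"] == "-" else int(rec["bytes"])
    return rec


# Hypothetical sample entry, used only to exercise the parser.
sample = (
    '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
    '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
    '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"'
)
record = parse_line(sample)
```

Returning `None` for lines that do not match lets a caller count or set aside malformed entries instead of stopping, which is one simple way to handle the unstructured or unorganized entries the abstract mentions.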

Keywords
Web usage mining, Web log, Web server access log file, big data.