Hybrid Data Preprocessing: User Sessions Identification through Hadoop

International Journal of Computer Trends and Technology (IJCTT)
© 2015 by IJCTT Journal
Volume-28 Number-4
Year of Publication : 2015
Authors : Vikram Singh Chauhan, B.L Pal


Vikram Singh Chauhan, B.L Pal "Hybrid Data Preprocessing: User Sessions Identification through Hadoop". International Journal of Computer Trends and Technology (IJCTT) V28(4):200-202, October 2015. ISSN:2231-2803. www.ijcttjournal.org. Published by Seventh Sense Research Group.

Abstract -
As businesses grow, they need knowledge of customer behavior and trends to make crucial decisions about policies that depend on varied and complex parameters. This knowledge benefits both the business and its end users, and analysis of user metadata plays a vital role in obtaining it. User sessions are derived from the logs maintained by application or web servers. Access logs record fields such as access time, IP address, requested URL, and response status, from which useful results can be derived. Session identification is a common strategy for developing web-analytics metrics and for behavioral analysis of user-facing systems, and it is further used for pattern identification and analysis. A powerful way to handle large volumes of data is HDFS, the Hadoop Distributed File System, which distributes data across several networked machines that together form a cluster. Map-Reduce allows such queries to run on all nodes through mappers, whose individual results are collected and combined into a whole by reducers. This research proposes implementing each step of the sessionization process [1] with Hadoop Map-Reduce to improve processing performance. User session identification can be made more effective and accurate by combining the right available techniques, and running it on a distributed processing system such as Hadoop can speed up the overall processing considerably.
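The sessionization approach summarized above can be sketched as one map function and one reduce function. The sketch below is not the authors' implementation: it simulates Hadoop's map, shuffle, and reduce phases locally in Python, and it rests on several assumptions that are not taken from the paper. Access logs are assumed to follow the Apache Common Log Format, the client IP address is assumed to identify a user, requests with a status of 400 or higher are assumed to be discarded during cleaning, and a 30-minute inactivity gap is assumed to start a new session. The names `mapper`, `reducer`, `run`, and `TIMEOUT` are hypothetical.

```python
"""Local simulation of Map-Reduce-style sessionization (illustrative sketch only)."""
from __future__ import annotations

import re
from collections import defaultdict
from datetime import datetime, timedelta
from typing import Iterable, Iterator

# Assumption: log lines use the Apache Common Log Format, e.g.
# 10.0.0.1 - - [10/Oct/2015:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326
CLF = re.compile(r'(\S+) \S+ \S+ \[([^\]]+)\] "\S+ (\S+)[^"]*" (\d{3}) \S+')

# Assumption: a gap longer than 30 minutes between requests starts a new session.
TIMEOUT = timedelta(minutes=30)


def mapper(line: str) -> Iterator[tuple[str, tuple[datetime, str]]]:
    """Parse one log line and emit (ip, (timestamp, url)).

    Malformed lines and failed requests (status >= 400, an assumed cleaning
    rule) produce no output.
    """
    match = CLF.match(line)
    if match is None:
        return
    ip, stamp, url, status = match.groups()
    if int(status) >= 400:
        return
    when = datetime.strptime(stamp, "%d/%b/%Y:%H:%M:%S %z")
    yield ip, (when, url)


def reducer(ip: str, hits: list[tuple[datetime, str]]) -> Iterator[tuple[str, list[str]]]:
    """Sort one user's hits by time and split them wherever the gap exceeds TIMEOUT."""
    sessions: list[list[str]] = []
    current: list[str] = []
    last: datetime | None = None
    for when, url in sorted(hits):
        if last is not None and when - last > TIMEOUT:
            sessions.append(current)
            current = []
        current.append(url)
        last = when
    if current:
        sessions.append(current)
    for index, urls in enumerate(sessions):
        yield f"{ip}#{index}", urls


def run(lines: Iterable[str]) -> dict[str, list[str]]:
    """Run map, a local stand-in for Hadoop's shuffle/group phase, then reduce."""
    groups: dict[str, list[tuple[datetime, str]]] = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)
    result: dict[str, list[str]] = {}
    for key in sorted(groups):
        for session_id, urls in reducer(key, groups[key]):
            result[session_id] = urls
    return result


if __name__ == "__main__":
    sample = [
        '10.0.0.1 - - [10/Oct/2015:13:00:00 +0000] "GET /a HTTP/1.1" 200 100',
        '10.0.0.1 - - [10/Oct/2015:13:10:00 +0000] "GET /b HTTP/1.1" 200 100',
        '10.0.0.1 - - [10/Oct/2015:14:00:00 +0000] "GET /c HTTP/1.1" 200 100',
    ]
    print(run(sample))  # {'10.0.0.1#0': ['/a', '/b'], '10.0.0.1#1': ['/c']}
```

On a real cluster, the grouping that `run` performs here would be done by Hadoop's shuffle and sort phase. With Hadoop Streaming, for example, the mapper and reducer would instead read lines from standard input and write tab-separated key/value pairs to standard output. Sorting all of one user's hits inside the reducer is the simplest choice; for very active users, a secondary sort on the timestamp would let the reducer stream the hits instead of holding them all in memory.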

[1] Yan Li and Boqin Feng, “The Construction of Transactions for Web Usage Mining”, International Conference on Computational Intelligence and Natural Computing, IEEE, 2009.
[2] Dilip Singh Sisodia and Shirish Verma, “Web Usage Pattern Analysis through Web Logs: A Review”, IEEE, 2012.
[3] Sheetal A. Raiyani and Shailendra Jain, “Efficient Preprocessing Technique Using Web Log Mining”, International Journal of Advancements in Research & Technology, Volume 1, Issue 6, November 2012.
[4] Apache Software Foundation, “HDFS Architecture”, Apache Hadoop 2.7.1 documentation, 2015.
[5] V. Chitraa and Antony Selvadoss Thanamani, “Web Log Data Cleaning for Enhancing Mining Process”, IJCCTS, Volume 01, No. 11, Issue 03, December 2012.
[6] L. Shaily, B. Mehul and M. Darshak, “Pre-processing: Procedure on Web Log File for Web Usage Mining”, International Journal of Emerging Technology and Advanced Engineering (IJETAE), Volume 2, Issue 12, p. 419, December 2012.
[7] Jeffrey Dean and Sanjay Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters”, OSDI, 2004.

Keywords - Web Mining, Data Preprocessing, Pattern Analysis, Hadoop, Distributed File System.