Data Analysis using Mapper and Reducer with Optimal Configuration in Hadoop

International Journal of Computer Trends and Technology (IJCTT)

© 2013 by IJCTT Journal
Volume-4 Issue-3
Year of Publication: 2013
Authors: Sasiniveda.G, Revathi.N

MLA

Sasiniveda.G, Revathi.N. "Data Analysis using Mapper and Reducer with Optimal Configuration in Hadoop." International Journal of Computer Trends and Technology (IJCTT), V4(3):264-268, Issue 2013. ISSN 2231-2803. www.ijcttjournal.org. Published by Seventh Sense Research Group.

Abstract: Data analysis is an important functionality in cloud computing that allows huge amounts of data to be processed over very large clusters. Hadoop is a software framework for large-scale data analysis. It provides the Hadoop Distributed File System (HDFS), and the analysis and transformation of very large data sets is performed using the MapReduce paradigm. MapReduce is a popular way to handle data in the cloud environment because of its excellent scalability and good fault tolerance; it is a programming model widely used for processing large data sets, and HDFS is designed to stream those data sets. The Hadoop MapReduce system was often unfair in its allocation, and a dramatic improvement is achieved through the Elastic Mapper Reducer system. The proposed Mapper Reducer function allows us to analyze the data set and achieve better performance in executing a job by using an optimal configuration of mappers and reducers based on the size of the data sets; it also helps users view the status of a job and localize errors in scheduled jobs. This efficiently utilizes the performance properties of optimized scheduled jobs, so the system achieves substantially lower cost, energy usage and management complexity while increasing overall performance.
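The abstract describes tuning the number of mappers and reducers to the size of the input, but the page carries no source code. The sketch below is only an illustration, written against the standard org.apache.hadoop.mapreduce API, of how a job driver could derive its reduce-task count from the total input size; the class name SizeTunedJob and the roughly one-gigabyte-per-reducer target are assumptions made for illustration, not the authors' actual configuration.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SizeTunedJob {

    // Simple word-count mapper: emits (word, 1) for every token in a line.
    public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private static final IntWritable ONE = new IntWritable(1);
        private final Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Context context)
                throws IOException, InterruptedException {
            for (String token : value.toString().split("\\s+")) {
                if (!token.isEmpty()) {
                    word.set(token);
                    context.write(word, ONE);
                }
            }
        }
    }

    // Reducer sums the counts emitted for each word.
    public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            context.write(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Path input = new Path(args[0]);
        Path output = new Path(args[1]);

        // Illustrative heuristic (an assumption, not taken from the paper):
        // size the reduce phase so that each reducer handles roughly 1 GB of input.
        long inputBytes = FileSystem.get(conf).getContentSummary(input).getLength();
        long bytesPerReducer = 1L << 30; // assumed ~1 GB target per reducer
        int reducers = (int) Math.max(1, inputBytes / bytesPerReducer);

        Job job = Job.getInstance(conf, "size-tuned word count");
        job.setJarByClass(SizeTunedJob.class);
        job.setMapperClass(TokenMapper.class);
        job.setCombinerClass(SumReducer.class);
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        job.setNumReduceTasks(reducers); // map-task count still follows the input splits

        FileInputFormat.addInputPath(job, input);
        FileOutputFormat.setOutputPath(job, output);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}

Such a job would be submitted in the usual way (for example, hadoop jar size-tuned.jar SizeTunedJob /input /output). Note that the number of map tasks is still decided by the input splits HDFS exposes, so only the reduce side is tuned explicitly in this sketch.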

Keywords— Cloud Computing, Hadoop Distributed File System, Performance Paradigm.