Hadoop Mapreduce Framework in Big Data Analytics
Vidyullatha Pellakuri, Dr. D. Rajeswara Rao. "Hadoop Mapreduce Framework in Big Data Analytics". International Journal of Computer Trends and Technology (IJCTT) V8(3):115-119, February 2014. ISSN:2231-2803. www.ijcttjournal.org. Published by Seventh Sense Research Group.
Abstract -
Hadoop is a large-scale, open source software framework dedicated to scalable, distributed, data-intensive computing. Hadoop [1] MapReduce is a programming framework for easily writing applications that process vast amounts of data (multi-terabyte data sets) in parallel on large clusters (many nodes) of commodity hardware in a reliable, fault-tolerant manner. A MapReduce [6] framework consists of two parts, the "mapper" and the "reducer", both of which are examined in this paper. The paper focuses on the MapReduce programming model, the scheduling of tasks, and the monitoring and re-execution of failed tasks. The workflow of MapReduce is also presented in this discussion.
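To make the mapper and reducer roles concrete, the following is a minimal single-process sketch of a word count job in Python. It is not Hadoop's API: the function names (`mapper`, `shuffle`, `reducer`, `run_task`, `run_job`) and the `max_attempts` parameter are hypothetical and chosen only for illustration. It assumes a job in which the mapper emits (word, 1) pairs, a shuffle step groups the values by key, the reducer sums each group, and a failed task is simply run again.

```python
from collections import defaultdict


def mapper(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word, 1) for word in line.lower().split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Reduce step: combine all values of one key into a single count."""
    return key, sum(values)


def run_task(task, *args, max_attempts=3):
    """Re-execute a failed task (hypothetical retry policy, not Hadoop's)."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task(*args)
        except Exception:
            if attempt == max_attempts:
                raise  # give up after the last allowed attempt


def run_job(lines):
    """Run map, shuffle, and reduce in sequence on a list of input lines."""
    pairs = []
    for line in lines:
        pairs.extend(run_task(mapper, line))
    groups = shuffle(pairs)
    return dict(run_task(reducer, key, values)
                for key, values in sorted(groups.items()))


counts = run_job(["big data on hadoop", "hadoop runs mapreduce", "big clusters"])
print(counts)
```

In a real cluster the map and reduce tasks run in parallel on different nodes and the shuffle moves data across the network; here everything runs in sequence in one process, which keeps only the data flow visible.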
References
[1] White, Tom (10 May 2012). Hadoop: The Definitive Guide. O'Reilly Media. p. 3. ISBN 978-1-4493-3877-0.
[2] "Applications and organizations using Hadoop". Wiki.apache.org. 2013-06-19. Retrieved 2013-10-17.
[3] "HDFS User Guide". Hadoop.apache.org. Retrieved 2012-05-23.
[4] "HDFS Architecture". Retrieved 1 September 2013.
[5] "Improving MapReduce performance through data placement in heterogeneous Hadoop Clusters" (PDF). Eng.auburn.ed. April 2010.
[6] "HDFS Users Guide - Rack Awareness". Hadoop.apache.org. Retrieved 2013-10-17.
[7] "Cloud analytics: Do we really need to reinvent the storage stack?". IBM. June 2009.
[8] "HADOOP-6330: Integrating IBM General Parallel File System implementation of Hadoop File system interface". IBM. 2009-10-23.
[9] "Refactor the scheduler out of the JobTracker". Hadoop Common. Apache Software Foundation. Retrieved 9 June 2012.
[10] M. Tim Jones (6 December 2011). "Scheduling in Hadoop". ibm.com. IBM. Retrieved 20 November 2013.
[11] "Under the Hood: Hadoop Distributed File system reliability with Namenode and Avatarnode". Facebook. Retrieved 2012-09-13.
[12] "Under the Hood: Scheduling MapReduce jobs more efficiently with Corona". Facebook. Retrieved 2012-11-09.
[13] "Zettaset Launches Version 4 of Big Data Management Solution, Delivering New Stability for Hadoop Systems and Productivity Boosting Features". Zettaset.com. 2011-12-06. Retrieved 2012-05-23.
[14] Curt Monash. "More patent nonsense — Google MapReduce". dbms2.com. Retrieved 2010-03-07.
[15] D. Wegener, M. Mock, D. Adranale, and S. Wrobel, “Toolkit-Based High-Performance Data Mining of Large Data on MapReduce Clusters,” Proc. Int’l Conf. Data Mining Workshops (ICDMW ’09), pp. 296-301, 2009.
[16] J. Dean and S. Ghemawat, “MapReduce: simplified data processing on large clusters,” in Proceedings of the 6th conference on Symposium on Operating Systems Design & Implementation - Volume 6, ser. OSDI’04.
[17] http://www.slideshare.net/mcsrivas/design-scale-and-performance-of-maprs-distribution-for-hadoop
[18] http://www.mapr.com/products/mapr-editions
[19] http://aws.amazon.com/elasticmapreduce/mapr/
[20] http://aws.amazon.com/elasticmapreduce/
Keywords
Framework, HDFS, Mapreduce, Shuffle, Workflow.