A Big Data Hadoop building blocks comparative study

International Journal of Computer Trends and Technology (IJCTT)          
© 2017 by IJCTT Journal
Volume-48 Number-1
Year of Publication : 2017
Authors : Allae Erraissi, Abdessamad Belangour, Abderrahim Tragha
DOI :  10.14445/22312803/IJCTT-V48P109


Allae Erraissi, Abdessamad Belangour, Abderrahim Tragha "A Big Data Hadoop building blocks comparative study". International Journal of Computer Trends and Technology (IJCTT) V48(1):36-40, June 2017. ISSN:2231-2803. www.ijcttjournal.org. Published by Seventh Sense Research Group.

Abstract -
In recent years, new technologies have been producing large quantities of data every day. Companies face the problems of collecting, storing, analyzing, and exploiting these large volumes of data in order to create added value. The central issue, for companies and administrations alike, is not to overlook valuable information drowned in the mass. This is where "Big Data" technology comes in. It is based on very fine-grained analysis of masses of data. Several vendors offer ready-to-use distributions for managing a Big Data system, namely HortonWorks [1], Cloudera [2], MapR [3], IBM InfoSphere BigInsights [4], Pivotal HD [5], Microsoft HDInsight [6], etc. These distributions differ in their approach and positioning with respect to their vision of a Hadoop platform. The underlying solutions are Apache projects and are therefore freely available. Yet the value of a complete package lies in the compatibility between components, ease of installation, support, etc. In this article, we discuss the world of Big Data by defining its characteristics and its architecture. We then present several Hadoop distributions, and we conclude with a comparative study of the top five providers of Hadoop distributions for Big Data.

[1] HortonWorks. (2015). HortonWorks Data Platform: New Book.
[2] Menon, R. (2014). Cloudera Administration Handbook.
[3] Dunning, T., & Friedman, E. (2015). Real-World Hadoop.
[4] Quintero, D. (n.d.). Implementing an IBM InfoSphere BigInsights Cluster using Linux on Power.
[5] Pivotal Software, Inc. (2014). Pivotal HD Enterprise Installation and Administrator Guide.
[6] Sarkar, D. (2014). Pro Microsoft HDInsight. Berkeley, CA: Apress.
[7] Thibaud Chardonnens, “Big Data analytics on high velocity streams: specific use cases with Storm”, Software Engineering Group, Department of Informatics, University of Fribourg, Switzerland, 2013.
[8] McKinsey Global Institute. Big data: The next frontier for innovation, competition, and productivity. Paper, June 2011.
[9] Nauman Sheikh, “Big Data, Hadoop, and Cloud Computing, Implementing Analytics”, Morgan Kaufmann, 2013.
[10] C. Dobre and F. Xhafa, “Intelligent services for Big Data science”, Future Generation Computer Systems, Volume 37, 2014, pp. 267-281.
[11] Sawant, N., & Shah, H. (2013). Big Data Application Architecture Q&A: A Problem-Solution Approach. Apress.
[12] Lenovo. (2015, August). Lenovo Big Data Reference Architecture for Cloudera Distribution for Hadoop.
[13] Forrester Research. (2016). The Forrester Wave™: Big Data Hadoop Distributions, Q1 2016.
[14] Gates, Alan, and Daniel Dai. Programming Pig: Dataflow Scripting with Hadoop. 2nd edition. O’Reilly Media, 2016.
[15] Capriolo, Edward, Dean Wampler, and Jason Rutherglen. Programming Hive: Data Warehouse and Query Language for Hadoop. 1st edition. Sebastopol, CA: O’Reilly Media, 2012.
[16] Ting, Kathleen, and Jarek Jarcec Cecho. Apache Sqoop Cookbook: Unlocking Hadoop for Your Relational Database. 1st edition. Sebastopol, CA: O’Reilly Media, 2013.
[17] Murthy, Arun, Vinod Vavilapalli, Douglas Eadline, Joseph Niemiec, and Jeff Markham. Apache Hadoop YARN: Moving beyond MapReduce and Batch Processing with Apache Hadoop 2. 1st edition. Upper Saddle River, NJ: Addison-Wesley Professional, 2014.

Keywords -
Big Data, 5 V’s, Hadoop distribution, comparison.