A Big Data Hadoop building blocks comparative study

Allae Erraissi; Abdessamad Belangour; Abderrahim Tragha

doi:https://doi.org/10.14445/22312803/IJCTT-V48P109

Research Article | Open Access

Volume 48 | Number 1 | Year 2017 | Article Id. IJCTT-V48P109 | DOI: https://doi.org/10.14445/22312803/IJCTT-V48P109

Citation :

Allae Erraissi, Abdessamad Belangour, Abderrahim Tragha, "A Big Data Hadoop building blocks comparative study," International Journal of Computer Trends and Technology (IJCTT), vol. 48, no. 1, pp. 36-40, 2017. Crossref, https://doi.org/10.14445/22312803/IJCTT-V48P109

Abstract

In recent years, new technologies have been producing large quantities of data every day. Companies face problems in collecting, storing, analyzing and exploiting these large volumes of data in order to create added value. The whole challenge, for companies and administrations, is not to miss valuable information buried in the mass. This is where "Big Data" technology comes in: it relies on a fine-grained analysis of large masses of data. Several vendors offer ready-to-use distributions for managing a Big Data system, namely Hortonworks [1], Cloudera [2], MapR [3], IBM InfoSphere BigInsights [4], Pivotal HD [5], Microsoft HDInsight [6], etc. Each distribution takes its own approach and positioning with respect to the vision of a Hadoop platform. These solutions are based on Apache projects and are therefore openly available. Yet the value of a complete package lies in the compatibility between its components, the simplicity of installation, support, etc. In this article, we discuss the world of Big Data by defining its characteristics and its architecture. We then present some Hadoop distributions, and finally conclude with a comparative study of the top five suppliers of Big Data Hadoop distributions.

Keywords

Big Data, 5 V’s, Hadoop distribution, comparison.

References

[1] Hortonworks Data Platform. (2015). Hortonworks Data Platform: New Book.
[2] Menon, R. (2014). Cloudera Administration Handbook.
[3] Dunning, T., & Friedman, E. (2015). Real-World Hadoop.
[4] Quintero, D. (n.d.). Implementing an IBM InfoSphere BigInsights Cluster using Linux on Power.
[5] Pivotal Software, Inc. (2014). Pivotal HD Enterprise Installation and Administrator Guide.
[6] Sarkar, D. (2014). Pro Microsoft HDInsight. Berkeley, CA: Apress.
[7] Chardonnens, T. (2013). Big Data analytics on high velocity streams: Specific use cases with Storm. Software Engineering Group, Department of Informatics, University of Fribourg, Switzerland.
[8] McKinsey Global Institute. (2011, June). Big data: The next frontier for innovation, competition, and productivity.
[9] Sheikh, N. (2013). Big Data, Hadoop, and Cloud Computing. In Implementing Analytics. Morgan Kaufmann.
[10] Dobre, C., & Xhafa, F. (2014). Intelligent services for Big Data science. Future Generation Computer Systems, 37, 267-281.
[11] Sawant, N., & Shah, H. (2013). Big Data Application Architecture Q&A: A Problem-Solution Approach. Apress.
[12] Lenovo. (2015, August). Lenovo Big Data Reference Architecture for Cloudera Distribution for Hadoop.
[13] Forrester Research. (2016). The Forrester Wave™: Big Data Hadoop Distributions, Q1 2016.
[14] Gates, A., & Dai, D. (2016). Programming Pig: Dataflow Scripting with Hadoop (2nd ed.). O’Reilly Media.
[15] Capriolo, E., Wampler, D., & Rutherglen, J. (2012). Programming Hive: Data Warehouse and Query Language for Hadoop (1st ed.). Sebastopol, CA: O’Reilly Media.
[16] Ting, K., & Cecho, J. J. (2013). Apache Sqoop Cookbook: Unlocking Hadoop for Your Relational Database (1st ed.). Sebastopol, CA: O’Reilly Media.
[17] Murthy, A., Vavilapalli, V., Eadline, D., Niemiec, J., & Markham, J. (2014). Apache Hadoop YARN: Moving beyond MapReduce and Batch Processing with Apache Hadoop 2 (1st ed.). Upper Saddle River, NJ: Addison-Wesley Professional.