Clustering-Based Data Analysis With Hadoop

Ms. Shweta Bhonde; Prof. Mirza Baig

doi:10.14445/22312803/IJCTT-V67I5P113

Research Article | Open Access

Volume 67 | Issue 5 | Year 2019 | Article Id. IJCTT-V67I5P113 | DOI : https://doi.org/10.14445/22312803/IJCTT-V67I5P113

Citation:

Ms. Shweta Bhonde, Prof. Mirza Baig, "Clustering-Based Data Analysis With Hadoop," International Journal of Computer Trends and Technology (IJCTT), vol. 67, no. 5, pp. 78-81, 2019. Crossref, https://doi.org/10.14445/22312803/IJCTT-V67I5P113

Abstract

Large collections of data include structured, unstructured, and semi-structured data. Such data is categorized as “Big Data” because of its sheer volume, variety, and velocity. Traditional data management, warehousing, and analysis systems lack the tools to analyze it: Big Data exceeds the capability of traditional databases to capture, manage, and process such voluminous data. Because of this specific nature of Big Data, in this paper we first introduce how Big Data is stored in distributed file system architectures. Apache Hadoop and its distributed file system, HDFS, are widely used for storing and managing Big Data, and the data processing is done by the MapReduce system. Processing or analyzing this huge amount of data and extracting meaningful information from it is a challenging task.
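
To make the MapReduce processing model mentioned above concrete, the following is a minimal single-machine sketch in Python. It is not the authors' implementation and it does not use the Hadoop API; it only imitates the map, shuffle, and reduce phases in memory. The word-count task, the sample records, and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are all assumptions chosen for illustration.

```python
from collections import defaultdict


def map_phase(records):
    """Map: emit a (word, 1) pair for every word in every input record."""
    for record in records:
        for word in record.lower().split():
            yield word, 1


def shuffle_phase(pairs):
    """Shuffle: group emitted values by key (done by the framework in Hadoop)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the list of values for each key into one count."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(records):
    """Run the three phases in sequence on an in-memory list of records."""
    return reduce_phase(shuffle_phase(map_phase(records)))


if __name__ == "__main__":
    # Hypothetical sample input; a real job would read its records from HDFS.
    sample = ["big data on hadoop", "hadoop stores big data in hdfs"]
    print(word_count(sample))
```

In an actual Hadoop cluster the map and reduce functions would run in parallel on different nodes, with the input split across HDFS blocks; the sketch keeps everything in one process only to show how data flows between the phases.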

Keywords

Big Data, HDFS, MapReduce, Cluster
