Improving Performance of Map Reduce using DLAJS Algorithm

Balaji Siva Jyothi; Dr. P. Radhika Raju; Dr. A. Ananda Rao

doi:10.14445/22312803/IJCTT-V61P104

Research Article | Open Access

Volume 61 | Number 1 | Year 2018 | Article Id. IJCTT-V61P104

Citation:

Balaji Siva Jyothi, Dr. P. Radhika Raju, Dr. A. Ananda Rao, "Improving Performance of Map Reduce using DLAJS Algorithm," International Journal of Computer Trends and Technology (IJCTT), vol. 61, no. 1, pp. 21-25, 2018. Crossref, https://doi.org/10.14445/22312803/IJCTT-V61P104

Abstract

Cloud computing provides users with a range of data-processing services, and big data and big data analysis are among its central concepts. The Hadoop framework processes big data in parallel, and job scheduling and optimized resource allocation can improve its performance. In an existing system, the Hadoop architecture was enhanced to reduce computational complexity when processing big data; that system also addresses efficient resource allocation and the processing of textual data such as DNA sequences. Its authors named the architecture H2Hadoop; it improves the NameNode's ability to assign jobs to the TaskTrackers (DataNodes) in a cluster. By adding control features to the NameNode, H2Hadoop can intelligently assign tasks to the DataNodes that already hold the required data, thereby reducing resource usage such as CPU time and the number of read operations. However, this approach can be made more focused by incorporating data locality awareness into the job scheduling process. This paper therefore proposes a data-locality-aware job scheduling algorithm, named the Data Locality Aware Job Scheduling (DLAJS) algorithm. The algorithm explores how far data locality awareness can make job scheduling more efficient, so that fewer cloud resources such as CPU, memory, and execution time are consumed.
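
The abstract describes assigning tasks to the DataNodes that already hold the required data. The DLAJS algorithm itself is not specified on this page, so the following is only a generic, hypothetical sketch of locality-preferring task assignment, assuming Hadoop's usual preference order (node-local, then rack-local, then off-rack). The names `Task`, `schedule`, `free_slots` and `rack_of`, and the "fewest replica choices first" ordering, are illustrative assumptions, not the paper's method.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """A map task that reads one input block (hypothetical model)."""
    task_id: str
    replica_nodes: frozenset[str]  # DataNodes storing a replica of the block


def schedule(tasks, free_slots, rack_of):
    """Greedily assign tasks, preferring node-local, then rack-local, then off-rack.

    tasks:      list of Task
    free_slots: dict node -> number of free task slots (copied, not mutated)
    rack_of:    dict node -> rack id
    Returns a dict task_id -> (node, locality level); tasks that cannot be
    placed because no slots remain are omitted.
    """
    slots = dict(free_slots)
    plan = {}
    # Assumed heuristic: tasks with the fewest replica choices go first,
    # so they are less likely to lose their only local node.
    for task in sorted(tasks, key=lambda t: (len(t.replica_nodes), t.task_id)):
        choice = None
        # 1. Node-local: a node that stores the block and has a free slot.
        for node in sorted(task.replica_nodes):
            if slots.get(node, 0) > 0:
                choice = (node, "node-local")
                break
        # 2. Rack-local: a free node on the same rack as some replica.
        if choice is None:
            replica_racks = {rack_of[n] for n in task.replica_nodes if n in rack_of}
            for node in sorted(slots):
                if slots[node] > 0 and rack_of.get(node) in replica_racks:
                    choice = (node, "rack-local")
                    break
        # 3. Off-rack: any node with a free slot; the block must cross racks.
        if choice is None:
            for node in sorted(slots):
                if slots[node] > 0:
                    choice = (node, "off-rack")
                    break
        if choice is not None:
            slots[choice[0]] -= 1
            plan[task.task_id] = choice
    return plan


if __name__ == "__main__":
    tasks = [
        Task("m1", frozenset({"dn1", "dn2"})),
        Task("m2", frozenset({"dn1"})),
        Task("m3", frozenset({"dn3"})),
    ]
    free = {"dn1": 1, "dn2": 1, "dn3": 0, "dn4": 1}
    racks = {"dn1": "r1", "dn2": "r1", "dn3": "r2", "dn4": "r2"}
    print(schedule(tasks, free, racks))
    # {'m2': ('dn1', 'node-local'), 'm3': ('dn4', 'rack-local'), 'm1': ('dn2', 'node-local')}
```

Handling tasks with the fewest replica choices first is a simple design choice: in the example, m2 (whose only replica is on dn1) claims dn1, and m1 falls back to its other replica on dn2, so both stay node-local. Processing the tasks in their listed order would instead give dn1 to m1 and push m2 to a rack-local slot.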

Keywords

Cloud computing, Big data, Hadoop, MapReduce framework, Data locality, Job scheduling
