Improving Performance of Map Reduce using DLAJS Algorithm

International Journal of Computer Trends and Technology (IJCTT)
© 2018 by IJCTT Journal
Volume-61 Number-1
Year of Publication : 2018
Authors : Balaji Siva Jyothi, Dr. P. Radhika Raju, Dr. A. Ananda Rao
DOI : 10.14445/22312803/IJCTT-V61P104


MLA Style: Balaji Siva Jyothi, Dr. P. Radhika Raju, Dr. A. Ananda Rao. "Improving Performance of Map Reduce using DLAJS Algorithm." International Journal of Computer Trends and Technology 61.1 (2018): 21-25.

APA Style: Balaji Siva Jyothi, Dr. P. Radhika Raju, Dr. A. Ananda Rao (2018). Improving Performance of Map Reduce using DLAJS Algorithm. International Journal of Computer Trends and Technology, 61(1), 21-25.

Cloud computing provides a range of data-processing services to users, and big data and big data analysis are among its central concerns. The Hadoop framework processes big data in parallel, and its performance can be improved through job scheduling and optimized resource allocation. In the existing system, the Hadoop architecture was enhanced to reduce computational complexity when processing big data; it also addresses efficient resource allocation and the processing of textual data such as DNA sequences. That architecture, named H2Hadoop, improves the ability of the NameNode to assign jobs to the TaskTrackers (DataNodes) in a given cluster. By adding control features to the NameNode, it can intelligently assign tasks to the DataNodes where the required data is present, thus reducing resource utilization in terms of CPU time, number of read operations, etc. However, the existing system can be made more focused by bringing data locality awareness into the job scheduling process. This paper proposes an algorithm for data-locality-aware job scheduling, named the Data Locality Aware Job Scheduling (DLAJS) algorithm. The algorithm explores how far data locality awareness can improve the efficiency of job scheduling, with the aim of consuming fewer cloud resources such as CPU, memory and execution time.
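The general idea of assigning tasks to the nodes that already hold their input data can be illustrated with a minimal sketch. This is not the DLAJS algorithm itself, whose details are not given in the abstract; it is a generic greedy heuristic written for illustration only. The function name schedule_tasks, the block_locations and free_slots inputs, and the node and task names are all hypothetical and do not correspond to any Hadoop API.

```python
def schedule_tasks(block_locations, free_slots):
    """Greedy locality-aware assignment (illustrative only, not DLAJS).

    block_locations: dict mapping task id -> list of nodes storing a replica
                     of that task's input block (hypothetical input format)
    free_slots:      dict mapping node name -> number of free task slots
    Returns (assignment, local_count), where assignment maps task id -> node
    and local_count is the number of tasks placed on a node holding their data.
    """
    slots = dict(free_slots)  # work on a copy; the caller's dict is untouched
    assignment = {}
    local_count = 0
    deferred = []

    # First pass: place each task on a node that holds its data, if one is free.
    for task, replicas in block_locations.items():
        candidates = [n for n in replicas if slots.get(n, 0) > 0]
        if candidates:
            node = max(candidates, key=lambda n: slots[n])  # most free slots
            assignment[task] = node
            slots[node] -= 1
            local_count += 1
        else:
            deferred.append(task)

    # Second pass: remaining tasks run on any free node and read data remotely.
    for task in deferred:
        available = [n for n, s in slots.items() if s > 0]
        if not available:
            break  # no capacity left; the task stays unscheduled
        node = max(available, key=lambda n: slots[n])
        assignment[task] = node
        slots[node] -= 1

    return assignment, local_count


if __name__ == "__main__":
    blocks = {"t1": ["n1", "n2"], "t2": ["n1"], "t3": ["n3"]}
    slots = {"n1": 1, "n2": 2, "n3": 0, "n4": 1}
    plan, local = schedule_tasks(blocks, slots)
    print(plan, local)
```

In this sketch a task that cannot be placed on a node holding its data falls back to a remote node, which is the case a locality-aware scheduler tries to minimize because remote reads add network transfer and extra read operations.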

Cloud computing, Big data, Hadoop, MapReduce framework, Data-locality, Job scheduling