Concepts and Technologies of Big Data Management and Hadoop File System

International Journal of Computer Trends and Technology (IJCTT)          
 
© 2017 by IJCTT Journal
Volume-44 Number-2
Year of Publication : 2017
Authors : Balu Srinivasulu, Andemariam Mebrahtu
DOI :  10.14445/22312803/IJCTT-V44P114

MLA

Balu Srinivasulu, Andemariam Mebrahtu; "FiLeD: File Level Deduplication Approach". International Journal of Computer Trends and Technology (IJCTT) V44(2):80-88, February 2017. ISSN:2231-2803. www.ijcttjournal.org. Published by Seventh Sense Research Group.

Abstract -
In the digital era, uncontrolled data growth is a major problem. This paper surveys the data storage media that end users rely on for their personal data and the backup patterns they follow. For an individual user, storage-space problems grow in direct proportion to the growth of personal data, so we focus on an implementation of file-level deduplication, which eliminates duplicate files. Removing duplicates increases the effective storage capacity and makes room for new data. The paper also compares compression, deduplication, and deduplication combined with compression. We conclude that data will continue to grow and that users should adopt intelligent methods to reduce the storage space it consumes.
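File-level deduplication, as summarized above, stores each distinct file content only once and keeps a per-file reference to it. The following is a minimal sketch of that idea and of the compression comparison. It is not the paper's FiLeD implementation. It assumes whole-file SHA-256 hashing to detect identical content and zlib to stand in for "compression". The function names `deduplicate`, `restore` and `storage_report` are hypothetical and chosen only for illustration.

```python
import hashlib
import zlib


def deduplicate(files: dict[str, bytes]) -> tuple[dict[str, bytes], dict[str, str]]:
    """Hypothetical file-level deduplication (assumption: whole-file SHA-256).

    Returns (store, index): `store` maps a content digest to the single
    stored copy of that content; `index` maps every original file name to
    the digest of its content, so every file can still be restored.
    """
    store: dict[str, bytes] = {}
    index: dict[str, str] = {}
    for name, data in files.items():
        digest = hashlib.sha256(data).hexdigest()
        store.setdefault(digest, data)  # keep only the first copy of identical content
        index[name] = digest
    return store, index


def restore(store: dict[str, bytes], index: dict[str, str], name: str) -> bytes:
    """Rebuild one original file from the deduplicated store."""
    return store[index[name]]


def storage_report(files: dict[str, bytes]) -> dict[str, int]:
    """Bytes needed under four strategies (index/metadata overhead ignored)."""
    store, _ = deduplicate(files)
    return {
        "raw": sum(len(d) for d in files.values()),
        "compression": sum(len(zlib.compress(d)) for d in files.values()),
        "deduplication": sum(len(d) for d in store.values()),
        "dedup+compression": sum(len(zlib.compress(d)) for d in store.values()),
    }


if __name__ == "__main__":
    report_text = b"quarterly report " * 200
    photo = bytes(range(256)) * 16
    files = {
        "docs/report.txt": report_text,
        "backup/report_copy.txt": report_text,  # exact duplicate of a document
        "photos/img.raw": photo,
        "backup/img.raw": photo,                # exact duplicate of a photo
    }
    for strategy, size in storage_report(files).items():
        print(f"{strategy:>18}: {size} bytes")
```

In this example the raw data needs 14992 bytes and deduplication alone needs 7496, because each file is stored twice. Deduplicating first and then compressing the unique contents never needs more space than compressing every copy separately. A production system would also have to persist the index and might compare bytes on a hash match rather than trust the digest alone.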

Keywords
Big Data, Hadoop, MapReduce