A Survey on Data Aggregation in Big Data and Cloud Computing

International Journal of Computer Trends and Technology (IJCTT)
© 2014 by IJCTT Journal
Volume-17 Number-1
Year of Publication : 2014
Authors : N.Karthick, X.Agnes Kalarani


N.Karthick, X.Agnes Kalarani. "A Survey on Data Aggregation in Big Data and Cloud Computing". International Journal of Computer Trends and Technology (IJCTT) V17(1):28-32, Nov 2014. ISSN:2231-2803. www.ijcttjournal.org. Published by Seventh Sense Research Group.

Abstract -
Cloud computing, rapidly emerging as a new computing paradigm, offers agile and scalable resource access in a utility-like fashion, particularly for the processing of big data. An important open problem is how to move data effectively, from various geographical locations over time, into a cloud for efficient processing. Big Data refers to datasets whose sizes are beyond the capability of typical database software tools to capture, store, manage and analyze. Big Data is not just about the size of data; it also involves data variety and data velocity. Together, these three attributes (volume, velocity and variety) form the three Vs of Big Data. The application of Big Data differs across verticals because of the different challenges that give rise to the various use cases. The premise is that data aggregation is the answer to keeping up with the ever-increasing demands of big data. Data aggregation is a kind of data and information mining process in which data is searched, gathered and presented in a report-based, summarized format to achieve specific business objectives or processes and/or to support human analysis. Such information aggregation comes with inherent issues, such as the provision of poor-quality, incorrect, irrelevant or fraudulent information. In this survey we discuss various methods of data aggregation in big data and cloud computing.

Keywords - Big Data, Cloud Computing, Data Management, Data Aggregation