Literature Review: An Efficient Clustering Approach to Big Data

Satish S. Banait; Tanuja B. Kaklij; Gauri K. Bankar; Srushti B. Hire; Digvijay B. Wagh

doi:10.14445/22312803/IJCTT-V71I2P105

Research Article | Open Access

Volume 71 | Issue 2 | Year 2023 | Article Id. IJCTT-V71I2P105 | DOI : https://doi.org/10.14445/22312803/IJCTT-V71I2P105

Received: 24 Dec 2022 | Revised: 25 Jan 2023 | Accepted: 05 Feb 2023 | Published: 17 Feb 2023

Citation:

Satish S. Banait, Tanuja B. Kaklij, Gauri K. Bankar, Srushti B. Hire, Digvijay B. Wagh, "Literature Review: An Efficient Clustering Approach to Big Data," International Journal of Computer Trends and Technology (IJCTT), vol. 71, no. 2, pp. 25-31, 2023. Crossref, https://doi.org/10.14445/22312803/IJCTT-V71I2P105

Abstract

Data generated by scientific applications and corporate environments has grown rapidly, not only in volume but also in variety, which makes such big data difficult to collect, store, transform, and analyze. A major issue is that traditional algorithms take too long to execute on very large datasets. Clustering is a popular data mining task used in many domains, and K-Means is a well-known unsupervised machine learning algorithm for it, valued for its simple concept, easy implementation, and good results. However, as the Internet expanded rapidly, the number of data collection points also increased, leading to the era of big data and an information explosion. This research work proposes the IK-ABC (Improved K-Means - Artificial Bee Colony) algorithm to address the weaknesses of K-Means clustering, such as weak global search ability, sensitivity to the selection of cluster centers, and random initialization, as well as the premature convergence and slow convergence speed of the original artificial bee colony algorithm. A fitness function adapted to K-Means clustering and a position update formula based on global guidance are designed and implemented with MapReduce to speed up computation and make the iterative optimization process more effective.
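
The abstract names two ingredients without giving their formulas: a fitness function adapted to K-Means and a globally guided position update. The sketch below illustrates one plausible reading and is not the authors' method. It assumes the fitness is 1 / (1 + SSE), where SSE is the within-cluster sum of squared errors, and it borrows the widely used global-best-guided ABC update v_j = x_j + phi (x_j - k_j) + psi (g_j - x_j), where k is a randomly chosen neighbor food source and g is the best source found so far. All function names and the constant c = 1.5 are illustrative assumptions, and the MapReduce distribution of the work is omitted.

```python
import random


def sse(points, centers):
    """Sum of squared distances from each point to its nearest center."""
    total = 0.0
    for p in points:
        total += min(sum((a - b) ** 2 for a, b in zip(p, c)) for c in centers)
    return total


def fitness(points, centers):
    """Assumed fitness (not from the paper): higher is better, 1 / (1 + SSE)."""
    return 1.0 / (1.0 + sse(points, centers))


def unflatten(flat, dim):
    """Split a flat coordinate list into centers of `dim` coordinates each."""
    return [flat[i:i + dim] for i in range(0, len(flat), dim)]


def guided_update(x, neighbor, gbest, rng, c=1.5):
    """Assumed global-best-guided move on one randomly chosen coordinate.

    x, neighbor, gbest are food sources: flat lists of center coordinates.
    """
    j = rng.randrange(len(x))
    phi = rng.uniform(-1.0, 1.0)  # exploration term toward/away from neighbor
    psi = rng.uniform(0.0, c)     # attraction toward the global best
    v = list(x)
    v[j] = x[j] + phi * (x[j] - neighbor[j]) + psi * (gbest[j] - x[j])
    return v


def greedy_step(points, x, neighbor, gbest, dim, rng):
    """Keep the candidate only if it improves the fitness (greedy selection)."""
    v = guided_update(x, neighbor, gbest, rng)
    if fitness(points, unflatten(v, dim)) > fitness(points, unflatten(x, dim)):
        return v
    return x


if __name__ == "__main__":
    rng = random.Random(0)
    data = [(0.0, 0.0), (0.0, 1.0), (10.0, 10.0), (10.0, 11.0)]
    source = [1.0, 1.0, 8.0, 8.0]      # two 2-D centers, flattened
    other = [2.0, 0.0, 9.0, 12.0]      # a neighbor food source
    best = [0.0, 0.5, 10.0, 10.5]      # best source found so far
    for _ in range(50):
        source = greedy_step(data, source, other, best, 2, rng)
    print(unflatten(source, 2), fitness(data, unflatten(source, 2)))
```

Because of the greedy selection, the fitness of a food source never decreases from one step to the next. In a MapReduce setting, the per-point nearest-center distances inside `sse` are the natural part to compute in mappers and sum in a reducer.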

Keywords

Big Data, Clustering Algorithms, MapReduce, Swarm Optimization Techniques.
