Literature Review: An Efficient Clustering Approach to Big Data

© 2023 by IJCTT Journal
Volume-71 Issue-2
Year of Publication : 2023
Authors : Satish S. Banait, Tanuja B. Kaklij, Gauri K. Bankar, Srushti B. Hire, Digvijay B. Wagh
DOI : 10.14445/22312803/IJCTT-V71I2P105

How to Cite?

Satish S. Banait, Tanuja B. Kaklij, Gauri K. Bankar, Srushti B. Hire, Digvijay B. Wagh, "Literature Review: An Efficient Clustering Approach to Big Data," International Journal of Computer Trends and Technology, vol. 71, no. 2, pp. 25-31, 2023. Crossref, https://doi.org/10.14445/22312803/IJCTT-V71I2P105

Data generated by scientific applications and corporate environments has grown rapidly, not only in size but also in variety, which makes such big data difficult to collect, store, transform, and analyze. A major issue is that traditional algorithms take too long to execute on very large datasets. Clustering is a popular data mining task used across many domains, and K-means is a well-known unsupervised machine learning method for it, valued for its simple concept, easy implementation, and good results. As the Internet expanded rapidly, however, the number of data collection points also increased, leading to the era of big data and information explosion. This research work proposes the IK-ABC (Improved K-Means - Artificial Bee Colony) algorithm to address the weaknesses of K-means clustering, such as low global search ability, sensitivity to the selection of cluster centers, and initialization randomness, along with the early development and slow convergence of the original artificial bee colony algorithm. A fitness function adapted to the K-means clustering method and a position update formula based on global guidance were created, with MapReduce used to speed up computation and increase the effectiveness of the iterative optimization process.
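
The ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' IK-ABC implementation: the abstract gives no formulas, so everything below is an assumption made for illustration. The fitness is taken to be the sum of squared errors (SSE), the "global guidance" update is assumed to be the common gbest-guided artificial bee colony rule v_j = x_j + phi * (x_j - neighbour_j) + psi * (best_j - x_j), and the MapReduce step is simulated in-process with two plain functions. All function names (`map_phase`, `reduce_phase`, `sse`, `guided_update`) are hypothetical.

```python
import random
from collections import defaultdict


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def map_phase(points, centers):
    """Map: emit (index of nearest center, point) for every point."""
    return [
        (min(range(len(centers)), key=lambda i: sq_dist(p, centers[i])), p)
        for p in points
    ]


def reduce_phase(pairs, centers):
    """Reduce: average the points of each cluster.

    A cluster that received no points keeps its previous center.
    """
    groups = defaultdict(list)
    for idx, p in pairs:
        groups[idx].append(p)
    new_centers = []
    for i, old in enumerate(centers):
        members = groups.get(i)
        if members:
            new_centers.append(
                tuple(sum(col) / len(members) for col in zip(*members))
            )
        else:
            new_centers.append(old)
    return new_centers


def sse(points, centers):
    """Assumed fitness: sum of squared distances to the nearest center."""
    return sum(min(sq_dist(p, c) for c in centers) for p in points)


def guided_update(x, neighbour, best, rng, c=1.5):
    """Assumed gbest-guided ABC move on one randomly chosen dimension.

    x, neighbour and best are flat lists of center coordinates.
    phi is drawn from [-1, 1] and psi from [0, c].
    """
    j = rng.randrange(len(x))
    phi = rng.uniform(-1.0, 1.0)
    psi = rng.uniform(0.0, c)
    v = list(x)
    v[j] = x[j] + phi * (x[j] - neighbour[j]) + psi * (best[j] - x[j])
    return v


if __name__ == "__main__":
    pts = [(0, 0), (0, 2), (10, 0), (10, 2)]
    start = [(1, 1), (9, 1)]
    updated = reduce_phase(map_phase(pts, start), start)
    print(updated, sse(pts, start), sse(pts, updated))
    print(guided_update([0.0, 0.0], [1.0, 1.0], [2.0, 2.0], random.Random(0)))
```

In a real deployment the map and reduce functions would run as distributed tasks over data partitions, and each bee's candidate set of centers would be scored with the fitness function before a greedy selection keeps the better of the old and new positions.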

Keywords : Big Data, Clustering Algorithms, MapReduce, Swarm Optimization Techniques.


