Comparative Analysis of Clustering Approaches for Big Data Analysis

© 2022 by IJCTT Journal
Volume-70 Issue-3
Year of Publication : 2022
Authors : Satish S. Banait, and Shrish S. Sane
DOI :  10.14445/22312803/IJCTT-V70I3P105

How to Cite?

Satish S. Banait, and Shrish S. Sane, "Comparative Analysis of Clustering Approaches for Big Data Analysis," International Journal of Computer Trends and Technology, vol. 70, no. 3, pp. 27-33, 2022. Crossref, https://doi.org/10.14445/22312803/IJCTT-V70I3P105

Abstract
This paper presents a comparative study of the most popular big data clustering techniques. Clustering is the unsupervised classification of patterns (observations, data items or feature vectors) into groups (clusters). The clustering problem has been addressed in many contexts by researchers in many disciplines, which reflects its broad appeal and usefulness as one of the steps in exploratory data analysis. The k-means clustering algorithm falls under the category of centroid-based clustering. Hierarchical clustering is a cluster analysis technique that seeks to build a hierarchy of clusters. Agglomerative clustering is a form of hierarchical clustering that uses the bottom-up approach. Density-Based Spatial Clustering of Applications with Noise (DBSCAN) is a clustering algorithm that groups together points that lie close to each other, based on a distance measure (Euclidean distance) and a minimum number of points. MapReduce is a programming paradigm in which huge datasets are processed quickly by distributing the work across a cluster of machines running in parallel. This paper compares k-means, hierarchical agglomerative clustering, DBSCAN and k-means with MapReduce as strategies for clustering big data.
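To make the combination of k-means and the map-reduce paradigm concrete, the following is a minimal single-machine sketch in Python. It is not the authors' implementation; the function names (`map_point`, `reduce_cluster`, `kmeans_iteration`, `kmeans`) and the toy data are assumptions made for illustration. The map step assigns each point to its nearest centroid by Euclidean distance, and the reduce step averages the points that share a centroid.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def map_point(point, centroids):
    """Map step: emit (index of the nearest centroid, point)."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, point


def reduce_cluster(points):
    """Reduce step: average all points assigned to one centroid."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


def kmeans_iteration(points, centroids):
    """One k-means pass: map every point, group by key, reduce each group."""
    groups = defaultdict(list)
    for p in points:
        key, value = map_point(p, centroids)
        groups[key].append(value)
    # A centroid that attracted no points keeps its previous position.
    return [
        reduce_cluster(groups[i]) if groups[i] else tuple(centroids[i])
        for i in range(len(centroids))
    ]


def kmeans(points, centroids, max_iter=100):
    """Repeat map/reduce passes until the centroids stop changing."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(max_iter):
        updated = kmeans_iteration(points, centroids)
        if updated == centroids:
            break
        centroids = updated
    return centroids


if __name__ == "__main__":
    # Hypothetical toy data: two well-separated groups of 2-D points.
    data = [(0, 0), (0, 2), (10, 10), (10, 12)]
    print(kmeans(data, [(0, 0), (10, 10)]))
```

In this sketch the map calls run sequentially in one process. In a distributed setting the map step would presumably run on separate data partitions and the grouping by centroid index would be handled by the framework's shuffle phase, but those details are outside what the abstract specifies.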

Keywords
Big Data, Clustering Strategies, Density-Based Spatial Clustering, Hierarchical Agglomerative Clustering, K-Means.
