Performance Comparison of Two Streaming Data Clustering Algorithms

International Journal of Computer Trends and Technology (IJCTT)
© 2014 by IJCTT Journal
Volume 12, Number 2
Year of Publication: 2014
Authors: Chandrakant Mahobiya, Dr. M. Kumar
DOI: 10.14445/22312803/IJCTT-V12P111


Chandrakant Mahobiya, Dr. M. Kumar. "Performance Comparison of Two Streaming Data Clustering Algorithms". International Journal of Computer Trends and Technology (IJCTT) V12(2):56-59, June 2014. ISSN: 2231-2803. Published by Seventh Sense Research Group.

Abstract -
The weighted fuzzy c-means clustering algorithm (WFCM) and weighted fuzzy c-means with adaptive cluster number (WFCM-AC) extend the traditional fuzzy c-means algorithm to the clustering of streaming data. WFCM generates clusters by iteratively updating the centers of weighted clusters. WFCM-AC, on the other hand, generates clusters by applying WFCM to the data and selecting the best K± initial centers. In this paper we compare these two methods on the KDD-CUP'99 data set with respect to the number of valid clusters, computational time, and mean standard error.
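The abstract describes both methods in a single sentence each. The following is a minimal Python sketch, under stated assumptions, of how weighted fuzzy c-means can be applied chunk by chunk to a stream, and of one possible reading of the "K±" selection step. It is not the authors' implementation: the farthest-first initialization, the rule that carries cluster weights into the next chunk, the Xie-Beni validity criterion, and the K-1, K, K+1 candidate range are all assumptions, and every function name (`memberships`, `weighted_fcm`, `cluster_stream`, `xie_beni`, `select_cluster_number`) is hypothetical.

```python
import math


def memberships(points, centers, m):
    """Standard FCM membership: u[i][j] = 1 / sum_k (d_ij / d_kj)^(2/(m-1))."""
    p = 2.0 / (m - 1.0)
    u = [[0.0] * len(points) for _ in centers]
    for j, x in enumerate(points):
        # Floor distances so a point sitting on a center does not divide by zero.
        d = [max(math.dist(x, v), 1e-12) for v in centers]
        for i in range(len(centers)):
            u[i][j] = 1.0 / sum((d[i] / dk) ** p for dk in d)
    return u


def weighted_fcm(points, weights, c, m=2.0, iters=100, tol=1e-6):
    """Weighted FCM: point x_j pulls center i with strength w_j * u_ij^m."""
    # Assumed initialization: farthest-first selection of c input points.
    centers = [list(points[0])]
    while len(centers) < c:
        far = max(points, key=lambda x: min(math.dist(x, v) for v in centers))
        centers.append(list(far))
    for _ in range(iters):
        u = memberships(points, centers, m)
        new_centers = []
        for i in range(c):
            coef = [w * uij ** m for w, uij in zip(weights, u[i])]
            total = sum(coef)
            new_centers.append([sum(a * x[t] for a, x in zip(coef, points)) / total
                                for t in range(len(points[0]))])
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if shift < tol:  # stop once no center moves more than tol
            break
    return centers, memberships(points, centers, m)


def cluster_stream(chunks, c, m=2.0):
    """Chunk-wise WFCM: previous centers re-enter the next chunk as weighted points."""
    centers, center_w = [], []
    for chunk in chunks:
        pts = centers + [list(p) for p in chunk]
        w = center_w + [1.0] * len(chunk)  # new raw points get weight 1
        centers, u = weighted_fcm(pts, w, c, m)
        # Assumed weighting scheme: a cluster's weight is sum_j w_j * u_ij,
        # so the total weight always equals the number of points seen.
        center_w = [sum(wj * uij for wj, uij in zip(w, u[i])) for i in range(c)]
    return centers, center_w


def xie_beni(points, weights, centers, u, m=2.0):
    """Weighted Xie-Beni index (lower is better); an assumed validity criterion."""
    compact = sum(weights[j] * u[i][j] ** m * math.dist(points[j], v) ** 2
                  for i, v in enumerate(centers) for j in range(len(points)))
    sep = min(math.dist(a, b) ** 2
              for i, a in enumerate(centers) for b in centers[i + 1:])
    return compact / (sum(weights) * max(sep, 1e-12))


def select_cluster_number(points, weights, k, m=2.0):
    """Hypothetical K± step: run WFCM with k-1, k and k+1 clusters, keep the best."""
    best = None
    for c in range(max(2, k - 1), k + 2):
        centers, u = weighted_fcm(points, weights, c, m)
        score = xie_beni(points, weights, centers, u, m)
        if best is None or score < best[0]:
            best = (score, c, centers)
    return best[1], best[2]
```

For example, feeding two chunks of 2-D points drawn from two well-separated groups through `cluster_stream(chunks, c=2)` should yield one center near each group. The returned cluster weights sum to the number of points seen, because each point's memberships sum to one; this is what lets old centers stand in for the raw points of earlier chunks.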


Keywords - Streaming data, weighted fuzzy c-means, weighted fuzzy c-means adaptive clustering.