An Efficient Hubness Clustering Model For High Dimensional Data

International Journal of Computer Trends and Technology (IJCTT)          
© 2015 by IJCTT Journal
Volume-30 Number-2
Year of Publication : 2015
Authors : V.Geetha, G.Bharathi


V.Geetha, G.Bharathi "An Efficient Hubness Clustering Model For High Dimensional Data". International Journal of Computer Trends and Technology (IJCTT) V30(2):81-86, December 2015. ISSN:2231-2803. Published by Seventh Sense Research Group.

Abstract -
Clustering of high-dimensional data is now needed in nearly every field and is becoming a very tedious process. The main difficulty with high-dimensional data is the curse of dimensionality. As the dimensionality of a data set grows, the data points become sparse and the density of any region decreases, which makes the data difficult to cluster and degrades the performance of traditional clustering algorithms. Organizations maintain customer or product information in different forms, which makes clustering difficult. Data points differ in size and properties, yet they must be clustered in a meaningful and efficient way to extract knowledge from them. Many strategies have been proposed for clustering high-dimensional data, but they suffer from overlapping clusters and poor retrieval efficiency. The proposed algorithm aims to improve the efficiency and accuracy of clustering.
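
The sparsity effect described in the abstract can be illustrated with a minimal sketch. It is not the proposed algorithm: it assumes points drawn uniformly at random from the unit hypercube, and the function name `mean_nearest_neighbor_distance` is hypothetical. For a fixed number of points, the average distance to the nearest neighbour should grow as dimensions are added, which is one way to see why density-based notions of a cluster weaken.

```python
import math
import random


def mean_nearest_neighbor_distance(n_points, n_dims, seed=0):
    """Average Euclidean distance from each point to its closest other point.

    Assumption for illustration only: points are sampled uniformly
    from the unit hypercube [0, 1]^n_dims.
    """
    rng = random.Random(seed)
    points = [[rng.random() for _ in range(n_dims)] for _ in range(n_points)]
    total = 0.0
    for i, p in enumerate(points):
        # Brute-force search over all other points (fine for small n_points).
        nearest = min(math.dist(p, q) for j, q in enumerate(points) if j != i)
        total += nearest
    return total / n_points


if __name__ == "__main__":
    # Same sample size, increasing dimensionality.
    for d in (2, 10, 50, 200):
        print(d, round(mean_nearest_neighbor_distance(200, d), 3))
```

With the sample size held fixed, the printed distances are expected to increase with the number of dimensions; real data sets with correlated attributes may show a weaker effect than this uniform model.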

Keywords -
Data clustering, Sparsity, Hubness, Nearest neighbours.