An Efficient Clustering and Distance Based Approach for Outlier Detection

International Journal of Computer Trends and Technology (IJCTT)
© July Issue 2013 by IJCTT Journal
Volume 4, Issue 7
Year of Publication: 2013
Authors: Garima Singh, Vijay Kumar


Garima Singh, Vijay Kumar. "An Efficient Clustering and Distance Based Approach for Outlier Detection". International Journal of Computer Trends and Technology (IJCTT), V4(7):2067-2072, July Issue 2013. ISSN Published by Seventh Sense Research Group.

Abstract: Outlier detection is a substantial research problem in data mining that aims to uncover objects which are significantly different from, exceptional to, or inconsistent with the rest of the data. It has been widely researched and is used in application domains including tax fraud detection, network robustness analysis, network intrusion detection and medical diagnosis. In this paper we propose an efficient clustering and distance based outlier detection technique. The clustering algorithms employed for this task are PAM, CLARA and CLARANS, and a novel clustering algorithm, I-CLARANS, is proposed. The process of outlier detection is divided into two stages: clustering is performed in the first stage and outlier detection in the second. The purpose is to perform clustering and outlier mining together in a single process. The experimental results show that the proposed method is effective and promising in practice. We also compare the proposed algorithm with existing algorithms to validate its advantage in outlier detection.
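The two-stage idea can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not describe I-CLARANS or the distance criterion, so the sketch assumes a plain PAM-style swap search for stage one and a hypothetical rule for stage two (a point is flagged when its distance to its medoid exceeds `factor` times the mean distance in its cluster). The function names, the naive initialisation, the `factor` parameter and the sample data are all illustrative assumptions.

```python
import math


def total_cost(points, medoids):
    """Sum of distances from every point to its nearest medoid."""
    return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)


def pam(points, k):
    """Stage 1 (assumed): PAM-style swap search over medoid indices."""
    medoids = list(range(k))  # naive initialisation: the first k points
    best = total_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for cand in range(len(points)):
                if cand in medoids:
                    continue
                # Try replacing medoid i with a non-medoid candidate.
                trial = medoids[:i] + [cand] + medoids[i + 1:]
                cost = total_cost(points, trial)
                if cost < best - 1e-12:  # accept only strict improvements
                    medoids, best, improved = trial, cost, True
    return medoids


def detect_outliers(points, medoids, factor=2.0):
    """Stage 2 (hypothetical rule): flag points far from their medoid."""
    nearest = [min(medoids, key=lambda m: math.dist(p, points[m]))
               for p in points]
    outliers = []
    for m in medoids:
        members = [i for i, n in enumerate(nearest) if n == m]
        dists = [math.dist(points[i], points[m]) for i in members]
        mean_d = sum(dists) / len(dists)
        outliers += [i for i, d in zip(members, dists) if d > factor * mean_d]
    return sorted(outliers)


# Illustrative data: two tight groups and one isolated point (index 8).
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (5, 20)]
medoids = pam(points, 2)
print(detect_outliers(points, medoids))
```

On this toy data the isolated point at (5, 20) is the only one flagged. The exhaustive swap search costs on the order of k(n-k) cost evaluations per pass, which is the scaling problem that sampling-based variants such as CLARA and CLARANS are designed to reduce.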


Keywords: Outlier detection, Data Mining, Clustering, PAM, CLARA, CLARANS.