International Journal of Computer Trends and Technology

Research Article | Open Access

Volume 4 | Issue 7 | Year 2013 | Article Id. IJCTT-V4I7P117 | DOI : https://doi.org/10.14445/22312803/IJCTT-V4I7P117

An Efficient Clustering and Distance Based Approach for Outlier Detection


Garima Singh, Vijay Kumar

Citation :

Garima Singh, Vijay Kumar, "An Efficient Clustering and Distance Based Approach for Outlier Detection," International Journal of Computer Trends and Technology (IJCTT), vol. 4, no. 7, pp. 2067-2072, 2013. Crossref, https://doi.org/10.14445/22312803/IJCTT-V4I7P117

Abstract

Outlier detection is a substantial research problem in data mining that aims to uncover objects that are significantly different from, exceptional relative to, or inconsistent with the rest of the data. It has been widely researched and is used in application domains including tax fraud detection, network robustness analysis, network intrusion detection and medical diagnosis. In this paper we propose an efficient clustering and distance based outlier detection technique. The clustering algorithms employed for this task are PAM, CLARA and CLARANS, and a novel clustering algorithm, I-CLARANS, is proposed. The process of outlier detection is divided into two stages: in the first stage clustering is performed, and in the second stage outlier detection is performed. The purpose is to perform clustering and outlier mining simultaneously. The experimental results show that the proposed method is effective and promising in practice. We also present a comparison of the proposed algorithm with existing algorithms to validate its advantage in outlier detection.
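
The two-stage idea described above (cluster first, then apply a distance test) can be illustrated with a minimal Python sketch. It is not the authors' I-CLARANS algorithm, whose details are not given in this abstract. Stage one uses a simple alternating k-medoids routine as a stand-in for PAM. Stage two assumes a hypothetical rule: a point is flagged when its distance to its cluster medoid exceeds `factor` times the mean member-to-medoid distance of that cluster. The function names, the `factor` parameter and the sample data are all illustrative assumptions.

```python
import math


def assign(points, medoids):
    """Label each point with the position (0..k-1) of its nearest medoid."""
    return [
        min(range(len(medoids)), key=lambda c: math.dist(p, points[medoids[c]]))
        for p in points
    ]


def k_medoids(points, init_medoids, max_iter=100):
    """Stage 1 (assumed): alternating k-medoids, a simplified stand-in for PAM."""
    medoids = list(init_medoids)
    for _ in range(max_iter):
        labels = assign(points, medoids)
        new_medoids = []
        for c, old in enumerate(medoids):
            members = [i for i, lab in enumerate(labels) if lab == c]
            if not members:  # keep the old medoid if a cluster empties
                new_medoids.append(old)
                continue
            # New medoid: the member with the smallest total distance to the others.
            new_medoids.append(min(
                members,
                key=lambda i: sum(math.dist(points[i], points[j]) for j in members),
            ))
        if new_medoids == medoids:
            break
        medoids = new_medoids
    return medoids, assign(points, medoids)


def distance_outliers(points, medoids, labels, factor=2.0):
    """Stage 2 (assumed rule): flag points farther from their medoid than
    factor * the mean member-to-medoid distance of their cluster."""
    outliers = []
    for c, m in enumerate(medoids):
        members = [i for i, lab in enumerate(labels) if lab == c]
        if not members:
            continue
        dists = {i: math.dist(points[i], points[m]) for i in members}
        threshold = factor * sum(dists.values()) / len(members)
        outliers.extend(i for i in members if dists[i] > threshold)
    return sorted(outliers)


if __name__ == "__main__":
    # Illustrative data: two tight groups plus one stray point at index 10.
    demo_points = [
        (0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (0.5, 0.5),
        (10.0, 10.0), (10.0, 11.0), (11.0, 10.0), (11.0, 11.0), (10.5, 10.5),
        (4.0, 4.0),
    ]
    medoids, labels = k_medoids(demo_points, init_medoids=[0, 5])
    print(distance_outliers(demo_points, medoids, labels))  # prints [10]
```

The initial medoids are passed in explicitly so the run is reproducible. With a random start, k-medoids can settle on a clustering in which the stray point becomes its own medoid and is never flagged, which is one reason the choice of clustering algorithm matters for this kind of detector.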

Keywords

Outlier detection, Data Mining, Clustering, PAM, CLARA, CLARANS.
