Partition Based with Outlier Detection
Saswati Bhattacharyya, Rakesh K. Das, Nilutpol Sonowal, Aloron Bezbaruah, Rabinder K. Prasad, "Partition Based with Outlier Detection". International Journal of Computer Trends and Technology (IJCTT) V59(1):63-67, May 2018. ISSN: 2231-2803. www.ijcttjournal.org. Published by Seventh Sense Research Group.
Abstract
Partition-based clustering is widely used across science and technology. It can detect spherical clusters but cannot identify noisy objects present in a data set. In this paper, we propose a partition-based method with an outlier detection feature that produces good-quality clusters and also identifies the outliers in the data in optimal time. We address the outlier detection problem in general and design algorithms that accurately identify anomalies while keeping time complexity as low as possible. We compute the degree of outlier of each data object and incorporate it into an existing partition-based clustering technique to obtain good-quality clusters along with the anomalies. Additionally, using a real-world data set, we show that our approach avoids flagging false anomalies and discovers genuine outliers overlooked by existing techniques.
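The idea of scoring each object and feeding that score into a partition-based method can be illustrated with a minimal sketch. The abstract does not define the degree of outlier, so the code below assumes one plausible choice: the mean distance from an object to its k nearest neighbours, with an object flagged when its degree exceeds a multiple of the median degree. The function names, the `threshold` parameter, and the toy data are all hypothetical and are not taken from the paper; the clustering step is plain k-means run on the remaining objects.

```python
import math
import random


def knn_outlier_degree(points, k):
    """Assumed definition: mean distance to the k nearest neighbours."""
    degrees = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        degrees.append(sum(dists[:k]) / k)
    return degrees


def kmeans(points, n_clusters, n_iter=50, seed=0):
    """Plain Lloyd's k-means on a list of coordinate tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, n_clusters)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [
            min(range(n_clusters), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each centroid becomes the mean of its members.
        new_centroids = []
        for c in range(n_clusters):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append(
                    tuple(sum(x) / len(members) for x in zip(*members))
                )
            else:
                new_centroids.append(centroids[c])  # keep an empty cluster in place
        if new_centroids == centroids:
            break
        centroids = new_centroids
    return labels, centroids


def cluster_with_outliers(points, n_clusters, k=3, threshold=2.0, seed=0):
    """Flag high-degree objects as outliers (label -1), then cluster the rest."""
    degrees = knn_outlier_degree(points, k)
    median = sorted(degrees)[len(degrees) // 2]
    is_outlier = [d > threshold * median for d in degrees]
    inliers = [p for p, out in zip(points, is_outlier) if not out]
    inlier_labels, centroids = kmeans(inliers, n_clusters, seed=seed)
    remaining = iter(inlier_labels)
    labels = [-1 if out else next(remaining) for out in is_outlier]
    return labels, centroids


# Toy data (hypothetical): two compact groups and one distant object.
points = [(0, 0), (0, 1), (1, 0), (1, 1),
          (10, 10), (10, 11), (11, 10), (11, 11),
          (50, 50)]
labels, centroids = cluster_with_outliers(points, n_clusters=2)
print(labels)
```

Removing flagged objects before clustering keeps them from pulling centroids away from the dense regions, which is one common way to combine the two steps; the paper's own integration may differ.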
Keywords
Partition-based clustering, Outlier detection, Degree of outlier, k-Means