Nearest Neighbour Based Outlier Detection Techniques

Dr. Shuchita Upadhyaya; Karanjit Singh

doi:https://doi.org/10.14445/22312803/IJCTT-V3I2P119

Research Article | Open Access | Download PDF

Volume 3 | Issue 2 | Year 2012 | Article Id. IJCTT-V3I2P119 | DOI : https://doi.org/10.14445/22312803/IJCTT-V3I2P119

Nearest Neighbour Based Outlier Detection Techniques

Dr. Shuchita Upadhyaya, Karanjit Singh

Citation :

Dr. Shuchita Upadhyaya, Karanjit Singh, "Nearest Neighbour Based Outlier Detection Techniques," International Journal of Computer Trends and Technology (IJCTT), vol. 3, no. 2, pp. 295-299, 2012. Crossref, https://doi.org/10.14445/22312803/IJCTT-V3I2P119

Abstract

Outlier detection is an important research area forming part of many application domains. Specific application domains call for specific detection techniques, while the more generic ones can be applied in a large number of scenarios with good results. This survey tries to provide a structured and comprehensive overview of the research on Nearest Neighbor Based Outlier Detection listing out various techniques as applicable to our area of research. We have focused on the underlying approach adopted by each technique. We have identified key assumptions, which are used by the techniques to differentiate between normal and Outlier behavior. When applying a given technique to a particular domain, these assumptions can be used as guidelines to assess the effectiveness of the technique in that domain. We provide a basic outlier detection technique, and then show how the different existing techniques in that category are variants of this basic technique. This template provides an easier and succinct understanding of the Nearest Neighbor based techniques. Further we identify the advantages and disadvantages of various Nearest Neighbor based techniques. We also provide a discussion on the computational complexity of the techniques since it is an important issue in our application domain. We hope that this survey will provide a better understanding of the different directions in which research has been done on this topic, and how techniques developed in this area can be applied in other domains for which they were not intended to begin with.

Keywords

Outlier, Outlier Detection, Nearest Neighbour Concept, Multivariate, Algorithms, Data Mining, Nearest Neighbor based Outlier Detection, K-Nearest Neighbor, LOF Neighborhood, COF Neighborhood.

References

[1] Tan, P.-N., Steinbach, M., and Kumar, V. 2005. Introduction to Data Mining. Addison-Wesley.
[2] Boriah, S., Chandola, V., and Kumar, V. 2008. Similarity measures for categorical data: A comparative evaluation. In Proceedings of the eighth SIAM International Conference on Data Mining. 243 - 254.
[3] Chandola, V., Eilertson, E., Ertoz, L., Simon, G., and Kumar, V. 2006. Data mining for cyber security. In Data Warehousing and Data Mining Techniques for Computer Security, A. Singhal, Ed. Springer.
[4] Byers, S. D. and Raftery, A. E. 1998. Nearest neighbor clutter removal for estimating features in spatial point processes. Journal of the American Statistical Association 93, 577 - 584.
[5] Guttormsson, S., II, R. M., and El-Sharkawi, M. 1999. Elliptical novelty grouping for on-line short-turn detection of excited running rotors. IEEE Transactions on Energy Conversion 14, 1 (March).
[6] Ramaswamy, S., Rastogi, R., and Shim, K. 2000. Efficient algorithms for mining outliers from large data sets. In Proceedings of the 2000 ACM SIGMOD international conference on Management of data. ACM Press, 427 - 438.
[7] Eskin, E., Arnold, A., Prerau, M., Portnoy, L., and Stolfo, S. 2002. A geometric frame-work for unsupervised outlier detection. In Proceedings of Applications of Data Mining in Computer Security. Kluwer Academics, 78 - 100.
[8] Angiulli, F. and Pizzuti, C. 2002. Fast outlier detection in high dimensional spaces. In Proceed-ings of the 6th European Conference on Principles of Data Mining and Knowledge Discovery. SpringerVerlag, 15 - 26.
[9] Zhang, J. and Wang, H. 2006. Detecting outlying subspaces for highdimensional data: the new task, algorithms, and performance. Knowledge and Information Systems 10, 3, 333 - 355.
[10] Bolton, R. and Hand, D. 1999. Unsupervised profiling methods for fraud detection. In Credit Scoring and Credit Control VII.
[11] Knorr, E. M. and Ng, R. T. 1997. A unified approach for mining outliers. In Proceedings of the 1997 conference of the Centre for Advanced Studies on Collaborative research. IBM Press, 11.
[12] Knorr, E. M. and Ng, R. T. 1998. Algorithms for mining distancebased outliers in large datasets. In Proceedings of the 24rd International Conference on Very Large Data Bases. Morgan Kaufmann Publishers Inc., 392 - 403.
[13] Knorr, E. M. and Ng, R. T. 1999. Finding intensional knowledge of distance-based outliers. In The VLDB Journal. 211 - 222.
[14]Knorr, E. M., Ng, R. T., and Tucakov, V. 2000. Distance-based outliers: algorithms and applications. The VLDB Journal 8, 3-4, 237 - 253.
[15] Wei, L., Qian, W., Zhou, A., and Jin, W. 2003. Hot: Hypergraph-based outlier test for categori-cal data. In Proceedings of the 7th Pacific-Asia Conference on Knowledge and Data Discovery. 399 - 410.
16] Otey, M. E., Ghoting, A., and Parthasarathy, S. 2006. Fast distributed outlier detection in mixed-attribute data sets. Data Mining and Knowledge Discovery 12, 2-3, 203 - 228.
[17] Palshikar, G. K. 2005. Distance-based outliers in sequences. Lecture Notes in Computer Science 3816, 547 - 552.
[18] Kou, Y., Lu, C.-T., and Chen, D. 2006. Spatial weighted outlier detection. In Proceedings of SIAM Conference on Data Mining.
[19] Bay, S. D. and Schwabacher, M. 2003. Mining distance-based outliers in near linear time with randomization and a simple pruning rule. In Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining. ACM Press, 29 - 38.
[20] Ramaswamy, S., Rastogi, R., and Shim, K. 2000. Efficient algorithms for mining outliers from large data sets. In Proceedings of the 2000 ACM SIGMOD international conference on Management of data. ACM Press, 427 - 438.
[21] Wu, M. and Jermaine, C. 2006. Outlier detection by sampling with accuracy guarantees. In Proceedings of the 12th ACM SIGKDD international conference on Knowledge discovery and data mining. ACM, New York, NY, USA, 767 - 772.
[22] Knorr, E. M. and Ng, R. T. 1998. Algorithms for mining distancebased outliers in large datasets. In Proceedings of the 24rd International Conference on Very Large Data Bases. Morgan Kaufmann Publishers Inc., 392 - 403.
[23] Breunig, M. M., Kriegel, H.-P., Ng, R. T., and Sander, J. 1999. Opticsof: Identifying local outliers. In Proceedings of the Third European Conference on Principles of Data Mining and Knowledge Discovery. Springer-Verlag, 262 - 270.
[24] Breunig, M. M., Kriegel, H.-P., Ng, R. T., and Sander, J. 2000. Lof: identifying density-based local outliers. In Proceedings of 2000 ACM SIGMOD International Conference on Management of Data. ACM Press, 93 - 104.
[25] Tang, J., Chen, Z., chee Fu, A. W., and W.Cheung, D. 2002. Enhancing effectiveness of outlier detections for low density patterns. In Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining. 535 - 548.
[26] Hautamaki, V., Karkkainen, I., and Franti, P. 2004. Outlier detection using k-nearest neigh-bour graph. In Proceedings of 17th International Conference on Pattern Recognition. Vol. 3. IEEE Computer Society, Washington, DC, USA, 430 - 433.
[27] Papadimitriou, S., Kitagawa, H., Gibbons, P. B., and Faloutsos, C. 2002. Loci: Fast out-lier detection using the local correlation integral. Tech. Rep. IRP-TR-02-09, Intel Research Laboratory, Pittsburgh, PA. July.
[28] Sun, P. and Chawla, S. 2004. On local spatial outliers. In Proceedings of 4th IEEE International Conference on Data Mining. 209 - 216.
[29] Yu, J. X., Qian, W., Lu, H., and Zhou, A. 2006. Finding centric local outliers in categorical/ numerical spaces. Knowledge and Information Systems 9, 3, 309 - 338.
[30] Sun, P. and Chawla, S. 2006. Slom: a new measure for local spatial outliers. Knowledge and Information Systems 9, 4, 412 - 429.
[31] Pokrajac, D., Lazarevic, A., and Latecki, L. J. 2007. Incremental local outlier detection for data streams. In Proceedings of IEEE Symposium on Computational Intelligence and Data Mining.
[32] Jin, W., Tung, A. K. H., and Han, J. 2001. Mining top-n local outliers in large databases. In Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining. ACM Press, 293 - 298.
[33] Chiu, A. and chee Fu, A. W. 2003. Enhancements on local outlier detection. In Proceedings of 7th International Database Engineering and Applications Symposium. 298 - 307.
[34] Bentley, J. L. 1975. Multidimensional binary search trees used for associative searching. Com-munications of the ACM 18, 9, 509 - 517.
[35] Roussopoulos, N., Kelley, S., and Vincent, F. 1995. Nearest neighbor queries. In Proceedings of ACM-SIGMOD International Conference on Management of Data.