Classification of Efficient Imputation Method for Analyzing Missing Values
S. Kanchana, Dr. Antony Selvadoss Thanamani. "Classification of Efficient Imputation Method for Analyzing Missing Values". International Journal of Computer Trends and Technology (IJCTT) V12(4):193-195, June 2014. ISSN: 2231-2803. www.ijcttjournal.org. Published by Seventh Sense Research Group.
Abstract -
In statistical analysis, missing data is a common data-quality problem, and many real datasets contain missing values. Imputation preserves all cases by replacing each missing value with a plausible value estimated from the other available information. Once all missing values have been imputed, the dataset can be analyzed using standard techniques for complete data. This paper aims to describe efficient imputation methods such as Mean, Median, Refined Mean, Standard Deviation, Linear Regression, and a Discretization-based method, together with clustering techniques such as K-Mean and KNN, which are used to impute missing values in a dataset. The datasets are taken from the UCI ML repository. The results are compared in terms of accuracy.
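As a minimal illustration of the two simplest methods named above, Mean and Median imputation, the sketch below fills missing entries in a single numeric column. The function name `impute`, the use of `None` to mark a missing value, and the sample `ages` column are assumptions made for this example; they are not taken from the paper, and the sketch does not reproduce the paper's experiments.

```python
from statistics import mean, median


def impute(column, strategy="mean"):
    """Replace None entries with the mean or median of the observed values.

    Hypothetical helper for illustration only; None marks a missing value.
    """
    observed = [v for v in column if v is not None]
    if not observed:
        raise ValueError("column has no observed values to impute from")
    if strategy == "mean":
        fill = mean(observed)
    elif strategy == "median":
        fill = median(observed)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    # Observed values are kept as-is; only the gaps receive the fill value.
    return [fill if v is None else v for v in column]


# Made-up sample column with two missing entries and one large value.
ages = [20.0, None, 30.0, 100.0, None]
mean_imputed = impute(ages, "mean")      # gaps filled with 50.0
median_imputed = impute(ages, "median")  # gaps filled with 30.0
```

In this made-up column the single large value pulls the mean well above the median, which is one reason the two methods can give different imputed values on skewed data.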
Keywords
Clustering Techniques, Discretization, K-Mean, KNN, Mean, Median, Refined Mean, Standard Deviation.