Imputation Framework for Missing Values

International Journal of Computer Trends and Technology (IJCTT)
© 2012 by IJCTT Journal
Volume-3 Issue-2
Year of Publication: 2012
Authors: K. Raja, G. Tholkappia Arasu, Chitra S. Nair


K. Raja, G. Tholkappia Arasu, Chitra S. Nair. "Imputation Framework for Missing Values". International Journal of Computer Trends and Technology (IJCTT), V3(2):215-219, Issue 2012. ISSN Published by Seventh Sense Research Group.

Abstract: Missing values degrade the quality of data and can arise for several reasons, such as malfunctioning measurement equipment, changes in experimental design during data collection, the collation of several similar but not identical datasets, and survey respondents refusing to answer certain questions, for example about age or income. Missing values are therefore a common problem in statistical analysis. This paper first analyzes widely used methods for treating missing values, whether continuous or discrete. It then advocates an estimator that imputes both continuous and discrete missing target values. The proposed method is evaluated to demonstrate that it outperforms existing methods in terms of classification accuracy.
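
To make the distinction between continuous and discrete missing values concrete, the following is a minimal Python sketch of a common baseline treatment: mean imputation for a continuous attribute and mode imputation for a discrete one. It is not the estimator advocated in the paper, whose details are not given in this abstract. The function name `impute_column`, the `kind` argument, and the sample data are hypothetical and chosen only for illustration.

```python
from collections import Counter
from statistics import mean


def impute_column(values, kind):
    """Fill None entries with the mean (continuous) or the mode (discrete).

    Baseline illustration only; this is NOT the paper's proposed estimator.
    """
    observed = [v for v in values if v is not None]
    if not observed:
        raise ValueError("cannot impute a column with no observed values")
    if kind == "continuous":
        fill = mean(observed)
    elif kind == "discrete":
        # Most frequent observed category.
        fill = Counter(observed).most_common(1)[0][0]
    else:
        raise ValueError("kind must be 'continuous' or 'discrete'")
    return [fill if v is None else v for v in values]


# Hypothetical survey data with missing answers marked as None.
ages = [23, None, 31, 40, None]
answers = ["yes", "no", None, "yes", None]

imputed_ages = impute_column(ages, "continuous")
imputed_answers = impute_column(answers, "discrete")
```

Baselines of this kind ignore the relationship between the attribute with missing entries and the other attributes, which is one reason more elaborate estimators are studied.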



Keywords: Classification, data mining, methodologies.