Nearest neighbour imputation and variance estimation methods.
Degree Grantor: University of Canterbury
Degree Name: Doctor of Philosophy
In large-scale surveys, non-response is a common phenomenon. It can be of two types: unit non-response and item non-response. In this thesis we deal with item non-response, where other responses from the survey unit can be used for adjustment. Non-response is usually handled in one of three ways: weighting, imputation, or no adjustment. Imputation is the most commonly used adjustment method, either as single imputation or as multiple imputation. In this thesis we study single imputation, in particular nearest neighbour methods, and we have developed a new method. Our method is based on dissimilarity measures, is nonparametric, and handles categorical and continuous covariates without requiring any transformations. One drawback of this method is that it is relatively computer intensive, so we investigated data reduction methods. For data reduction we developed a new method that uses propensity scores. The propensity score is used because it has properties that suggest it would be a good basis for matching respondents and non-respondents. We also looked at subset selection of the covariates using graphical modelling and principal component analysis. We found that the data reduction methods gave results as good as those obtained using all variables, with a considerable reduction in computation time, especially for the propensity score method. Because the imputed values are not true values, estimating the variance of the estimate of the parameter of interest using standard methods underestimates the variance unless allowance is made for the extra uncertainty due to the use of imputed data. We examined various existing methods of variance estimation, particularly the bootstrap method, because both nearest neighbour imputation and the bootstrap are nonparametric. The bootstrap is also a unified method for estimating smooth as well as non-smooth parameters. Shao and Sitter (1996) proposed a bootstrap method, but in some extreme situations this method has problems.
We have modified the bootstrap method of Shao and Sitter to overcome this problem, and simulations indicate that both methods give good results. The conclusions from the study are that our new multivariate nearest neighbour method is at least as good as regression-based nearest neighbour imputation and is often better. For large data sets, data reduction may be desirable, and we recommend our propensity score method, as it was observed to be the fastest of the subset selection methods and has some other advantages over the others. Imputing using any of the subset methods we looked at appears to give results similar to imputing using all covariates. To estimate the variance when imputed data are used, we recommend the method proposed by Shao and Sitter or our modification of Shao and Sitter's method.
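As an illustration of the general idea of dissimilarity-based nearest neighbour imputation with mixed covariates, the following minimal Python sketch may help. It is not the method developed in the thesis: it assumes a Gower-style dissimilarity (range-scaled absolute differences for continuous covariates, simple mismatch for categorical ones), and the function names `gower_dissimilarity` and `nn_impute` and the toy data are hypothetical.

```python
import numpy as np

def gower_dissimilarity(x_num, x_cat, donors_num, donors_cat, ranges):
    """Gower-style dissimilarity between one unit and each donor (an assumed measure)."""
    # Continuous covariates: absolute difference scaled by the observed range.
    num_part = np.abs(donors_num - x_num) / ranges
    # Categorical covariates: 0 if the categories match, 1 otherwise.
    cat_part = (donors_cat != x_cat).astype(float)
    # Average the per-covariate contributions for each donor.
    return np.hstack([num_part, cat_part]).mean(axis=1)

def nn_impute(num, cat, y):
    """Replace each missing y (NaN) with the y of the least dissimilar respondent."""
    num = np.asarray(num, dtype=float)
    cat = np.asarray(cat)
    y = np.asarray(y, dtype=float)
    missing = np.isnan(y)
    donors = np.flatnonzero(~missing)          # indices of respondents
    ranges = num.max(axis=0) - num.min(axis=0)
    ranges[ranges == 0] = 1.0                  # guard against constant covariates
    imputed = y.copy()
    for i in np.flatnonzero(missing):
        d = gower_dissimilarity(num[i], cat[i], num[donors], cat[donors], ranges)
        imputed[i] = y[donors[np.argmin(d)]]   # copy the nearest donor's value
    return imputed

# Toy data: one continuous covariate (age), one categorical covariate (sex).
age = [[20.0], [30.0], [31.0], [60.0]]
sex = [["m"], ["f"], ["f"], ["m"]]
income = [10.0, 20.0, np.nan, 40.0]
completed = nn_impute(age, sex, income)
```

In this toy example the third unit has a missing value and is matched to the second unit, which has the same category and the closest age. Each non-respondent is compared with every respondent on every covariate, which is where the computational cost of this kind of matching comes from on large data sets.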