Nonparametric Tests of Lack of Fit for Multivariate Data

12-18-2022

2020

Degree Name

Doctor of Philosophy (PhD)

Document Type

Doctoral Dissertation

College

Arts and Sciences

Department/School/Program

Statistics

Dr. Solomon W. Harrar

Abstract

A common problem in regression analysis (linear or nonlinear) is assessing lack of fit. Existing methods make parametric or semi-parametric assumptions to model the conditional mean or covariance matrices. In this dissertation, we propose fully nonparametric methods that make only additive error assumptions. Our approach relies on ideas from nonparametric smoothing to reduce the test of association (lack-of-fit) problem to a nonparametric multivariate analysis of variance. A major problem that arises in this approach is that the key assumptions of independence and a constant covariance matrix among the groups are violated. As a result, the standard asymptotic theory is not applicable. Furthermore, the appropriate asymptotic framework differs from the usual requirement of a large group sample size (replication size). The asymptotics involved require both the group size and the number of groups to increase, at different rates. We develop our methods and theory in three separate steps.

First, we extend existing multivariate results for a large number of factor levels. We investigate the asymptotic properties of a carefully chosen test in the high-dimensional setting where both the sample size and the number of factor levels tend to infinity. The asymptotic results are distribution-free in the sense that the tests do not require knowledge of the population distributions. Based on this observation, a finite-sample approximation is constructed for the distribution of a Wilks' Lambda-type statistic. The tests are further extended to a more elaborate nested design. An important consequence of the large factor and sample size asymptotics is that they naturally yield a distribution-free asymptotic test for lack of fit in Multivariate Analysis of Variance (MANOVA). In the second step, a nonparametric test of association between a multivariate response and predictors is developed by exploiting the lack-of-fit test developed in the first step. Essentially, the association test is reduced to a nonparametric lack-of-fit test by binning the multivariate responses on the values of the predictor variable. A global test and multiple contrast test procedures are proposed, and their asymptotics are explored in detail. In the last step, the lack-of-fit test is extended for use in testing the presence of a covariate effect in a Multivariate Analysis of Covariance (MANCOVA) model. The MANCOVA test is built upon the nonparametric nested MANOVA design investigated in the first step. Numerical investigations show favorable performance of the methods in all three steps. The applications of the methods are illustrated with financial and college smoking data.
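The binning idea in the second step can be illustrated with a minimal sketch. It is not the dissertation's procedure: it assumes equal-count bins and uses the classical Wilks' Lambda, det(E)/det(E + H), purely to show how an association question becomes a comparison of many small groups. The function names `bin_by_predictor` and `wilks_lambda`, the simulated data, and the choice of 20 bins are all hypothetical, and the distribution-free calibration developed in the thesis is not reproduced here.

```python
import numpy as np


def bin_by_predictor(x, n_bins):
    """Assign observations to n_bins groups of near-equal size, ordered by x.

    Equal-count binning is an assumption made for this sketch.
    """
    order = np.argsort(x, kind="stable")
    labels = np.empty(len(x), dtype=int)
    # np.array_split returns consecutive chunks of the sorted indices.
    for g, idx in enumerate(np.array_split(order, n_bins)):
        labels[idx] = g
    return labels


def wilks_lambda(Y, labels):
    """Classical Wilks' Lambda, det(E) / det(E + H), for grouped data.

    H is the between-group and E the within-group sum-of-squares-and-
    cross-products matrix. Values near 1 indicate little group separation.
    """
    grand_mean = Y.mean(axis=0)
    p = Y.shape[1]
    H = np.zeros((p, p))
    E = np.zeros((p, p))
    for g in np.unique(labels):
        Yg = Y[labels == g]
        mean_g = Yg.mean(axis=0)
        d = (mean_g - grand_mean)[:, None]
        H += len(Yg) * (d @ d.T)
        R = Yg - mean_g
        E += R.T @ R
    return np.linalg.det(E) / np.linalg.det(E + H)


# Simulated illustration (hypothetical data): a bivariate response with and
# without dependence on a scalar predictor x.
rng = np.random.default_rng(0)
n = 200
x = rng.uniform(0.0, 1.0, n)
noise = rng.normal(size=(n, 2))
Y_null = noise                                     # no association with x
Y_alt = noise + np.column_stack([3 * x, -3 * x])   # mean depends on x

labels = bin_by_predictor(x, n_bins=20)            # many groups, few replicates
lam_null = wilks_lambda(Y_null, labels)
lam_alt = wilks_lambda(Y_alt, labels)
```

With many bins and few observations per bin, the classical large-replication theory for this statistic is not appropriate, which is the regime the abstract describes: the number of groups grows along with the group size.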

In summary, the methods developed in this thesis make mild assumptions. Both the mean and covariance matrices are allowed to depend on the predictor variable. No parametric model is assumed for the relationship between the response variables and the predictor variables, nor for the conditional means or covariance matrices. Furthermore, no distributional assumption is required for the response variables.

Digital Object Identifier (DOI)

https://doi.org/10.13023/etd.2020.524
