Author ORCID Identifier

https://orcid.org/0000-0001-5108-8637

Year of Publication

2019

Degree Name

Doctor of Philosophy (PhD)

Document Type

Doctoral Dissertation

College

Arts and Sciences

Department

Statistics

First Advisor

Dr. Solomon W. Harrar

Abstract

This dissertation focuses on the problem of making high-dimensional inference for two or more groups. High-dimensional means both the sample size (n) and dimension (p) tend to infinity, possibly at different rates. Classical approaches for group comparisons fail in the high-dimensional situation, in the sense that they have incorrect sizes and low powers. Much has been done in recent years to overcome these problems. However, these recent works make restrictive assumptions in terms of the number of treatments to be compared and/or the distribution of the data. This research aims to (1) propose and investigate refined small-sample approaches for high-dimension data in the multi-group setting (2) propose and study a fully-nonparametric approach, and (3) conduct an extensive comparison of the proposed methods with some existing ones in a simulation.

When treatment effects can meaningfully be formulated in terms of means, a semiparametric approach under equal and unequal covariance assumptions is investigated. Composites of F-type statistics are used to construct two tests. One test is a moderate-p version – the test statistic is centered by asymptotic mean – and the other test is a large-p version asymptotic-expansion based finite-sample correction for the mean of the test statistic. These tests do not make any distributional assumptions and, therefore, they are nonparametric in a way. The theory for the tests only requires mild assumptions to regulate the dependence. Simulation results show that, for moderately small samples, the large-p version yields substantial gain in the size with a small power tradeoff.

In some situations mean-based inference is not appropriate, for example, for data that is in ordinal scale or heavy tailed. For these situations, a high-dimensional fully-nonparametric test is proposed. In the two-sample situation, a composite of a Wilcoxon-Mann-Whitney type test is investigated. Assumptions needed are weaker than those in the semiparametric approach. Numerical comparisons with the moderate-p version of the semiparametric approach show that the nonparametric test has very similar size but achieves superior power, especially for skewed data with some amount of dependence between variables.

Finally, we conduct an extensive simulation to compare our proposed methods with other nonparametric test and rank transformation methods. A wide spectrum of simulation settings is considered. These simulation settings include a variety of heavy tailed and skewed data distributions, homoscedastic and heteroscedastic covariance structures, various amounts of dependence and choices of tuning (smoothing window) parameter for the asymptotic variance estimators. The fully-nonparametric and the rank transformation methods behave similarly in terms of type I and type II errors. However, the two approaches fundamentally differ in their hypotheses. Although there are no formal mathematical proofs for the rank transformations, they have a tendency to provide immunity against effects of outliers. From a theoretical standpoint, our nonparametric method essentially uses variable-by-variable ranking which naturally arises from estimating the nonparametric effect of interest. As a result of this, our method is invariant against application of any monotone marginal transformations. For a more practical comparison, real-data from an Encephalogram (EEG) experiment is analyzed.

Digital Object Identifier (DOI)

https://doi.org/10.13023/etd.2019.339

Available for download on Saturday, August 01, 2020

Share

COinS