In clinical trials with continuous multiple endpoints, we model the outcome variables as a mixture of multivariate normal distributions to account for the effect of misclassification errors. We propose two methods for estimating and testing treatment effects. When the misclassification errors are known from previous studies, we develop moment-based tests and confidence interval procedures which are accurate in finite samples. When the misclassification errors are unknown, we propose likelihood-based procedures for estimation and testing via the EM algorithm. In addition, methods for sample size and power calculations are developed. The moment-based methods can also be used when the misclassification rates are unknown if validation samples are available. In this case, consistent estimators of the misclassification error rates are derived using a novel distance-based criterion.
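As a rough illustration of the likelihood-based route, the following minimal sketch fits a two-component normal mixture by EM. It is deliberately simplified and is not the procedure developed in the dissertation: it is univariate rather than multivariate, it has no treatment-effect parameter, and the function names, the starting values, and the fixed iteration count are all assumptions made for the example.

```python
import math
import random


def normal_pdf(x, mean, sd):
    """Density of a univariate normal distribution."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def em_two_normals(data, n_iter=200):
    """Fit a two-component univariate normal mixture by EM.

    Hypothetical helper for illustration only. Returns (weight, mu, sd),
    where weight is the mixing proportion of component 0.
    """
    data = sorted(data)
    n = len(data)
    half = n // 2
    # Crude starting values (an assumption): split the sorted data in half.
    mu = [sum(data[:half]) / half, sum(data[half:]) / (n - half)]
    overall_mean = sum(data) / n
    overall_sd = math.sqrt(sum((x - overall_mean) ** 2 for x in data) / n)
    sd = [overall_sd, overall_sd]
    weight = 0.5

    for _ in range(n_iter):
        # E-step: posterior probability that each point came from component 0.
        resp = []
        for x in data:
            p0 = weight * normal_pdf(x, mu[0], sd[0])
            p1 = (1.0 - weight) * normal_pdf(x, mu[1], sd[1])
            resp.append(p0 / (p0 + p1))

        # M-step: weighted updates of the means, standard deviations, weight.
        n0 = sum(resp)
        n1 = n - n0
        mu[0] = sum(r * x for r, x in zip(resp, data)) / n0
        mu[1] = sum((1.0 - r) * x for r, x in zip(resp, data)) / n1
        sd[0] = math.sqrt(
            sum(r * (x - mu[0]) ** 2 for r, x in zip(resp, data)) / n0
        )
        sd[1] = math.sqrt(
            sum((1.0 - r) * (x - mu[1]) ** 2 for r, x in zip(resp, data)) / n1
        )
        weight = n0 / n

    return weight, mu, sd


# Simulated example: 30% of observations from N(0, 1), 70% from N(5, 1).
random.seed(0)
sample = [random.gauss(0.0, 1.0) for _ in range(300)]
sample += [random.gauss(5.0, 1.0) for _ in range(700)]
weight, mu, sd = em_two_normals(sample)
```

In this toy setting the mixing proportion plays the role of an unknown misclassification rate: it is estimated jointly with the component parameters instead of being supplied from previous studies.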

When the data are measured on a nonmetric scale or when the distribution of the data is heavy-tailed or skewed, the normality assumption is not valid. In this case, we develop a fully nonparametric method to assess treatment effects. We model the distribution of the outcomes as a nonparametric mixture of unknown distributions. To overcome identifiability problems, we assume that training data from the component distributions are available. In the nonparametric setting, functionals of these distribution functions are used to characterize treatment effects. We provide consistent estimators of the misclassification error rates and of the treatment effect, together with their asymptotic distributions. We do not require the existence of moments of any order.

Typically, clinical trials involve the collection of baseline covariates that are associated with both the misclassification of a patient and the treatment outcomes. In this situation, we propose a nonparametric finite mixture of regression models to approximate the distribution of outcomes. We establish identifiability conditions and derive an estimation procedure using kernel methods and the EM algorithm.

Simulation results show a significant advantage of the proposed methods in terms of bias reduction, coverage probability, and power. The applications of the methods are illustrated with datasets from sleep deprivation and electroencephalogram (EEG) studies.

]]>In this dissertation, four models for handling non-proportional hazards (NPH) patterns are discussed. First, a piecewise proportional hazards model is proposed to incorporate a delayed treatment effect into the trial design. Second, we consider a piecewise proportional hazards model with a cure rate to handle both a delayed treatment effect and a cure rate. Third, we extend the second model to a general random delayed cure rate model for the design of cancer immunotherapy trials. Fourth, we propose a piecewise proportional hazards responder rate model to handle both a delayed treatment effect and a responder rate. Sample size formulas are derived for weighted log-rank tests under a fixed alternative hypothesis for each of these models. The accuracy of sample size calculations based on the new formulas is assessed and compared with existing methods via simulation studies. The sensitivity to misspecification of the random delay time is also studied through simulations. In addition, a real immunotherapy trial is used to illustrate the study design under the second model, along with the practical balance between sample size and follow-up time.
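To make the delayed-effect idea concrete, the sketch below evaluates the treatment-arm survival function under a piecewise proportional hazards model. It assumes an exponential control arm with constant hazard, a hazard ratio of 1 before the delay, and a constant hazard ratio afterwards. These choices, and the function and parameter names, are illustrative assumptions and are not taken from the dissertation.

```python
import math


def survival_delayed_effect(t, base_hazard, hazard_ratio, delay):
    """Treatment-arm survival under a piecewise proportional hazards model.

    Assumed model (for illustration only):
      hazard(t) = base_hazard                  for t <= delay
      hazard(t) = hazard_ratio * base_hazard   for t >  delay
    The survival function is exp(-cumulative hazard).
    """
    if t <= delay:
        return math.exp(-base_hazard * t)
    cumulative = base_hazard * delay + hazard_ratio * base_hazard * (t - delay)
    return math.exp(-cumulative)


def survival_control(t, base_hazard):
    """Control-arm survival under a constant (exponential) hazard."""
    return math.exp(-base_hazard * t)


# Before the delay the two arms coincide; afterwards the curves separate.
early_gap = survival_delayed_effect(1.0, 0.1, 0.5, 3.0) - survival_control(1.0, 0.1)
late_gap = survival_delayed_effect(10.0, 0.1, 0.5, 3.0) - survival_control(10.0, 0.1)
```

Because the hazards are proportional only within each interval, the overall hazard ratio is not constant over follow-up, which is why the abstract turns to weighted log-rank tests.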

]]>The first project proposes an approximate finite-sample test using modified sums of squares matrices to make them insensitive to the heterogeneity in MANOVA. The modification corrects the associated quadratic forms of the two sums of squares for the effect of heterogeneity. The distribution of the proposed test statistic is invariant to the original data distribution. The proposed approximation method can be used in various experimental designs, for example, factorial design and crossover design. Under various simulation settings, the proposed method outperforms the classical Doubly Multivariate Model and Multivariate Mixed Model, especially for unbalanced sample sizes. The applications of the proposed method are illustrated with ophthalmology data in a factorial design and in a 2 × 2 crossover design.

In the semiparametric situation, parametric and nonparametric bootstraps are known to have satisfactory finite-sample performance in general factorial designs. In this regard, the second project provides resampling-based tests for multivariate growth curve data. Such tests are useful in situations where data are not necessarily exchangeable under the null hypothesis of interest and with small sample sizes. Simulation studies are conducted to evaluate the finite-sample performance of the proposed test procedures under various practical scenarios. Data from an optometry study are used to illustrate the benefits of the nonparametric methods proposed.

For multivariate growth curve data measured on ordered categorical scales, the usual mean- and covariance-based inferences are no longer appropriate. The third project deals with general nonparametric methods for multivariate growth curve data in factorial designs. Treatment effects are characterized in terms of functionals of distribution functions with the sole assumption of nondegenerate marginal distributions. This model accommodates binary, discrete, ordered categorical, and continuous data in a unified manner. Hypotheses are formulated in terms of meaningful nonparametric measures of treatment effects. In this project, a Wald-type statistic is proposed and its asymptotic properties are investigated. In addition, an ANOVA-type statistic and a modified Wilks' Lambda statistic under the nonparametric framework are presented. The theory can be used to construct confidence intervals for the nonparametric treatment effects. Simulation studies are conducted to show the finite-sample performance of the proposed methods in comparison with other parametric and nonparametric methods. Data from a study of infantile nystagmus syndrome (INS) are analyzed to illustrate the application of the proposed methods.

]]>First, we extend existing multivariate results to the case of a large number of factor levels. We investigate the asymptotic properties of a carefully chosen test in the high-dimensional setting where both the sample size and the number of factor levels tend to infinity. The asymptotic results are distribution-free in the sense that the tests do not require knowledge of the population distributions. Based on this observation, a finite-sample approximation is constructed for the distribution of a Wilks' Lambda type statistic. The tests are further extended to a more elaborate nested design. An important consequence of the large factor and sample size asymptotics is that they naturally yield a distribution-free asymptotic test for lack-of-fit in Multivariate Analysis of Variance (MANOVA). In the second step, a nonparametric test of association between a multivariate response and predictors is developed by exploiting the lack-of-fit test developed in the first step. Essentially, the association test is reduced to a nonparametric lack-of-fit test by binning the multivariate responses on the values of the predictor variable. A global test and multiple contrast test procedures are proposed and their asymptotics are explored in detail. In the last step, the lack-of-fit test is extended to test for the presence of a covariate effect in the Multivariate Analysis of Covariance (MANCOVA) model. The MANCOVA test is built upon the nonparametric nested MANOVA design investigated in the first step. Numerical investigations show favorable performance of the methods in all three steps. The applications of the methods are illustrated with financial and college smoking data.

In summary, the methods developed in this thesis make mild assumptions. Both the mean and the covariance matrix are allowed to depend on the predictor variable. No parametric model is assumed for the relationship between the response and predictor variables, for the conditional means, or for the conditional covariance matrices. Furthermore, no distributional assumption is required for the response variables.

]]>In the first project, we propose a nonparametric approach for one-sample clustered data in a pre-post intervention design. In particular, we consider the situation where, for some clusters, all members are observed only at pre-intervention or only at post-intervention, but not both. We refer to this type of data as partially complete clustered data. Unlike most of its parametric counterparts, our approach does not assume specific models for the data distributions, the intra-cluster dependence structure, or the variability, in effect addressing the so-called nonparametric Behrens-Fisher problem. A nonparametric measure of effect size is proposed. Because the hypotheses are formulated in terms of this effect size measure, the proposed test provides meaningful and interpretable probabilistic comparisons of treatments. The method accommodates continuous, ordered categorical and ordinal data seamlessly.

The second project focuses on nonparametric methods for multivariate data in a pre-post design where some of the samples are partially complete. This type of data can also be viewed as missing data in which all the variables are missing at either pre- or post-intervention. Here, too, we derive asymptotic theory for estimating the vector of nonparametric effect size measures. A Wald-type statistic is proposed for large sample sizes and an ANOVA-type statistic for small sample sizes. Beyond the asymptotic evaluations, a simulation study shows that the Wald-type and ANOVA-type statistics have good finite-sample performance across a variety of settings and missing patterns. The methods are further extended to arbitrary missing patterns.

The third project is motivated by the Asthma Randomized Trial of Indoor Wood Smoke (ARTIS). The study involved a three-arm placebo-controlled randomized trial on homes with wood-burning stoves in a pre-post intervention design. The active treatments were aimed at improving functional, emotional and activity symptoms of children with asthma. In this project, nonparametric procedures are developed for general pre-post clustered data collected in a factorial layout. Compared with the first project, the methods in this project can make comparisons across multiple treatments and between intervention periods. Here, too, both complete and partially complete clusters are allowed. Simulation studies provide evidence that our methods perform reasonably well in both large-sample and small-sample settings. The proposed nonparametric methods therefore provide a new way of analyzing non-metric clustered data in factorial designs with repeated measures and are also a powerful competitor to the parametric mixed effects model for continuous outcomes.

Simulation studies provide evidence that our methods perform well in a wide variety of settings that involve small samples. Real datasets from two randomized trials are used to illustrate the application of the methods.

]]>Theoretical assumptions are applied to this sepsis score to investigate distributional properties of the measure for applicable inferences. Finally, a new approximation to the degrees of freedom of a t-distribution, 𝜈𝑠, is proposed. This new approximation is investigated and compared to the Satterthwaite approximation.
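For reference, the Satterthwaite approximation used as the comparator can be sketched as follows. The example shows only its familiar two-sample (Welch) form, which is an assumption about the setting, since the abstract does not say which linear combination of variances is involved. The proposed approximation 𝜈𝑠 is not reproduced here because the abstract does not define it, and the function name is hypothetical.

```python
def satterthwaite_df(var1, n1, var2, n2):
    """Satterthwaite degrees of freedom for the two-sample Welch setting.

    var1, var2: sample variances of the two groups
    n1, n2:     sample sizes of the two groups
    """
    a = var1 / n1  # estimated variance of the first sample mean
    b = var2 / n2  # estimated variance of the second sample mean
    return (a + b) ** 2 / (a ** 2 / (n1 - 1) + b ** 2 / (n2 - 1))


# Equal variances and equal sample sizes recover the pooled value n1 + n2 - 2.
df_equal = satterthwaite_df(4.0, 10, 4.0, 10)
# Strongly unequal variances and sample sizes.
df_unequal = satterthwaite_df(1.0, 5, 100.0, 50)
```

The approximation always lies between min(n1 - 1, n2 - 1) and n1 + n2 - 2, which gives a quick sanity check for any competing degrees-of-freedom formula.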

]]>In phase II randomized trials, due to the limited sample size, randomization may not be able to balance the distributions of all baseline characteristics. As a result, some factors may be correlated with the treatment assignment, which could cause the estimated treatment effect to deviate from the true underlying effect. Another important issue in treatment effect evaluation is to identify factors that are associated with the endpoint and include them in the model, which can reduce the residual variation and lead to more precise estimation of the treatment effect. In this dissertation, we present a Bayesian model selection method to identify and adjust for both unbalanced confounding factors and factors associated with the endpoint. Our method extends the Bayesian adjustment for confounding (BAC) method, which was designed for observational studies, to randomized clinical trials. Simulation studies and real data analysis demonstrate that our method provides an unbiased estimate of the treatment effect and reduces the variability of that estimate.

]]>When treatment effects can meaningfully be formulated in terms of means, a semiparametric approach under equal and unequal covariance assumptions is investigated. Composites of F-type statistics are used to construct two tests. One is a moderate-*p* version, in which the test statistic is centered by its asymptotic mean; the other is a large-*p* version, which applies an asymptotic-expansion-based finite-sample correction to the mean of the test statistic. These tests do not make any distributional assumptions and are, in that sense, nonparametric. The theory for the tests requires only mild assumptions to regulate the dependence. Simulation results show that, for moderately small samples, the large-*p* version substantially improves size control at the cost of a small loss in power.

In some situations mean-based inference is not appropriate, for example, for data that are on an ordinal scale or heavy-tailed. For these situations, a high-dimensional fully nonparametric test is proposed. In the two-sample situation, a composite of Wilcoxon-Mann-Whitney type tests is investigated. The assumptions needed are weaker than those in the semiparametric approach. Numerical comparisons with the moderate-*p* version of the semiparametric approach show that the nonparametric test has very similar size but achieves superior power, especially for skewed data with some amount of dependence between variables.

Finally, we conduct an extensive simulation study to compare our proposed methods with other nonparametric tests and rank transformation methods. A wide spectrum of simulation settings is considered, including a variety of heavy-tailed and skewed data distributions, homoscedastic and heteroscedastic covariance structures, various amounts of dependence, and several choices of the tuning (smoothing window) parameter for the asymptotic variance estimators. The fully nonparametric and the rank transformation methods behave similarly in terms of type I and type II errors. However, the two approaches differ fundamentally in their hypotheses. Although there are no formal mathematical proofs for the rank transformations, they tend to provide protection against the effects of outliers. From a theoretical standpoint, our nonparametric method essentially uses variable-by-variable ranking, which arises naturally from estimating the nonparametric effect of interest. As a result, our method is invariant under any monotone marginal transformation. For a more practical comparison, real data from an electroencephalogram (EEG) experiment are analyzed.
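The invariance property can be checked numerically with a small sketch. It uses the Wilcoxon-Mann-Whitney relative effect P(X < Y) + 0.5 P(X = Y), estimated separately for each variable, as a stand-in for the nonparametric effect of interest. The exact functional used in the dissertation is not stated in the abstract, so this choice, the function names, and the toy data are assumptions.

```python
import math


def relative_effect(x, y):
    """Estimate P(X < Y) + 0.5 * P(X = Y) from two independent samples."""
    total = 0.0
    for xi in x:
        for yj in y:
            if xi < yj:
                total += 1.0
            elif xi == yj:
                total += 0.5  # ties contribute one half
    return total / (len(x) * len(y))


def effects_by_variable(group1, group2):
    """Apply the relative effect to each variable (column) separately."""
    n_vars = len(group1[0])
    return [
        relative_effect([row[j] for row in group1], [row[j] for row in group2])
        for j in range(n_vars)
    ]


# Toy two-sample data with two variables per subject (made up for the example).
group1 = [[1.0, 0.2], [2.0, 0.5], [3.0, 0.1]]
group2 = [[2.0, 0.4], [4.0, 0.9], [5.0, 0.3], [6.0, 0.8]]

effects = effects_by_variable(group1, group2)

# A strictly increasing transformation applied to every variable (here exp)
# preserves all pairwise orderings and ties, so the estimates do not change.
exp1 = [[math.exp(v) for v in row] for row in group1]
exp2 = [[math.exp(v) for v in row] for row in group2]
effects_transformed = effects_by_variable(exp1, exp2)
```

Because the estimator depends on the data only through pairwise comparisons within each variable, any strictly increasing transformation applied to a variable in both groups leaves it unchanged, whereas a mean-based statistic would generally change.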

]]>