Author ORCID Identifier

Year of Publication


Degree Name

Doctor of Philosophy (PhD)

Document Type

Doctoral Dissertation


Public Health


Epidemiology and Biostatistics

First Advisor

Dr. Philip M. Westgate

Second Advisor

Dr. David Fardo


Data arising from Cluster Randomized Trials (CRTs) and longitudinal studies are correlated and generalized estimating equations (GEE) are a popular analysis method for correlated data. Previous research has shown that analyses using GEE could result in liberal inference due to the use of the empirical sandwich covariance matrix estimator, which can yield negatively biased standard error estimates when the number of clusters or subjects is not large. Many techniques have been presented to correct this negative bias; However, use of these corrections can still result in biased standard error estimates and thus test sizes that are not consistently at their nominal level. Therefore, there is a need for an improved correction such that nominal type I error rates will consistently result.

First, GEEs are becoming a popular choice for the analysis of data arising from CRTs. We study the use of recently developed corrections for empirical standard error estimation and the use of a combination of two popular corrections. In an extensive simulation study, we find that nominal type I error rates can be consistently attained when using an average of two popular corrections developed by Mancl and DeRouen (2001, Biometrics 57, 126-134) and Kauermann and Carroll (2001, Journal of the American Statistical Association 96, 1387-1396) (AVG MD KC). Use of this new correction was found to notably outperform the use of previously recommended corrections.

Second, data arising from longitudinal studies are also commonly analyzed with GEE. We conduct a simulation study, finding two methods to attain nominal type I error rates more consistently than other methods in a variety of settings: First, a recently proposed method by Westgate and Burchett (2016, Statistics in Medicine 35, 3733-3744) that specifies both a covariance estimator and degrees of freedom, and second, AVG MD KC with degrees of freedom equaling the number of subjects minus the number of parameters in the marginal model.

Finally, stepped wedge trials are an increasingly popular alternative to traditional parallel cluster randomized trials. Such trials often utilize a small number of clusters and numerous time intervals, and these components must be considered when choosing an analysis method. A generalized linear mixed model containing a random intercept and fixed time and intervention covariates is the most common analysis approach. However, the sole use of a random intercept applies assumptions that will be violated in practice. We show, using an extensive simulation study based on a motivating example and a more general design, alternative analysis methods are preferable for maintaining the validity of inference in small-sample stepped wedge trials with binary outcomes. First, we show the use of generalized estimating equations, with an appropriate bias correction and a degrees of freedom adjustment dependent on the study setting type, will result in nominal type I error rates. Second, we show the use of a cluster-level summary linear mixed model can also achieve nominal type I error rates for equal cluster size settings.

Digital Object Identifier (DOI)