Year of Publication
Doctor of Philosophy (PhD)
Arts and Sciences
Dr. Katherine Thompson
Dr. Arnold Stromberg
In recent years, statistical analyses, algorithms, and modeling of big data have been constrained due to computational complexity. Further, the added complexity of relationships among response and explanatory variables, such as higher-order interaction effects, make identifying predictors using standard statistical techniques difficult. These difficulties are only exacerbated in the case of small sample sizes in some studies. Recent analyses have targeted the identification of interaction effects in big data, but the development of methods to identify higher-order interaction effects has been limited by computational concerns. One recently studied method is the Feasible Solutions Algorithm (FSA), a fast, flexible method that aims to find a set of statistically optimal models via a stochastic search algorithm. Although FSA has shown promise, its current limits include that the user must choose the number of times to run the algorithm. Here, statistical guidance is provided for this number iterations by deriving a lower bound on the probability of obtaining the statistically optimal model in a number of iterations of FSA. Moreover, logistic regression is severely limited when two predictors can perfectly separate the two outcomes. In the case of small sample sizes, this occurs quite often by chance, especially in the case of a large number of predictors. Bhattacharyya distance is proposed as an alternative method to address this limitation. However, little is known about the theoretical properties or distribution of B-distance. Thus, properties and the distribution of this distance measure are derived here. A hypothesis test and confidence interval are developed and tested on both simulated and real data.
Digital Object Identifier (DOI)
Janse, Sarah A., "INFERENCE USING BHATTACHARYYA DISTANCE TO MODEL INTERACTION EFFECTS WHEN THE NUMBER OF PREDICTORS FAR EXCEEDS THE SAMPLE SIZE" (2017). Theses and Dissertations--Statistics. 30.