Theses and Dissertations--Statistics

Deriving the Distributions and Developing Methods of Inference for R²-type Measures, with Applications to Big Data Analysis

Gregory S. Hawk, University of Kentucky

Author ORCID Identifier

https://orcid.org/0000-0001-8935-9965

Date Available

5-24-2023

Year of Publication

2022

Degree Name

Doctor of Philosophy (PhD)

Document Type

Doctoral Dissertation

College

Arts and Sciences

Department/School/Program

Statistics

First Advisor

Dr. Katherine L. Thompson

Abstract

As computing capabilities and cloud-enhanced data sharing have accelerated exponentially in the 21st century, our access to Big Data has revolutionized the way we see data around the world, from healthcare to investments to manufacturing to retail and supply chain. In many areas of research, however, the cost of obtaining each data point makes collecting more than a few observations impossible. While machine learning and artificial intelligence (AI) are improving our ability to make predictions from datasets, we need better statistical methods to improve our ability to understand and translate models into meaningful and actionable insights.

A central goal in the world of statistics and data science is the construction of linear regression models for continuous variables of interest. Often, our objective is to examine the impact of one or more explanatory variables, after adjusting for demographic variables or some other known/relevant covariate(s). While the traditional methodology uses a combination of partial F-tests and individual t-tests to determine statistical significance, we know that p-values obtained from such methods are heavily dependent on sample size. This is particularly problematic for large datasets or "overpowered" studies, where even the tiniest of effects will appear to be highly significant, or for extremely small datasets, where real effects may not reach statistical significance. The coefficient of partial determination (also known as partial R²) is widely used in the applied sciences to supplement hypothesis testing, but little work has been done to understand its statistical properties. In this dissertation, the exact, complete distribution of partial R² is derived, accompanied by simulation studies and real-world data examples to show the advantages of adding coefficients of determination to the analysis of quantitative data models, regardless of sample size. Additionally, two novel inference methods are proposed for both R² and partial R², which build on these distributional results to provide better coverage and more focused intervals for models built using small- and medium-sized datasets.
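The sample-size issue described in the abstract can be illustrated with a small simulation. The sketch below is not taken from the dissertation: the data-generating model, the effect size of 0.05, and the helper names (`sse`, `partial_r2_and_f`, `simulate`) are all assumptions made for illustration. It uses the standard definitions, partial R² = (SSE_reduced − SSE_full) / SSE_reduced and the partial F-statistic for nested linear models, and holds a tiny true effect fixed while the sample size grows.

```python
import numpy as np

def sse(X, y):
    """Residual sum of squares from an ordinary least-squares fit of y on X."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def partial_r2_and_f(X_reduced, X_full, y):
    """Partial R^2 and partial F-statistic for nested linear models.

    X_reduced holds the adjustment covariates (plus intercept);
    X_full adds the explanatory variable(s) of interest.
    """
    n = len(y)
    q = X_full.shape[1] - X_reduced.shape[1]   # number of added predictors
    df_full = n - X_full.shape[1]              # residual df of the full model
    sse_r = sse(X_reduced, y)
    sse_f = sse(X_full, y)
    partial_r2 = (sse_r - sse_f) / sse_r
    f_stat = ((sse_r - sse_f) / q) / (sse_f / df_full)
    return partial_r2, f_stat

def simulate(n, effect=0.05, seed=0):
    """Hypothetical model: one covariate, one predictor with a tiny effect."""
    rng = np.random.default_rng(seed)
    covariate = rng.normal(size=n)
    predictor = rng.normal(size=n)
    y = 1.0 + 0.5 * covariate + effect * predictor + rng.normal(size=n)
    ones = np.ones(n)
    X_reduced = np.column_stack([ones, covariate])
    X_full = np.column_stack([ones, covariate, predictor])
    return partial_r2_and_f(X_reduced, X_full, y)

if __name__ == "__main__":
    for n in (100, 10_000, 1_000_000):
        pr2, f = simulate(n)
        print(f"n={n:>9,}  partial R^2={pr2:.4f}  partial F={f:.1f}")
```

Under these assumptions the partial F-statistic grows roughly in proportion to the sample size, so the same small effect eventually looks highly significant, while partial R² settles near its small population value. This only illustrates the motivation; the exact distribution and the inference methods the dissertation proposes are not reproduced here.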

Digital Object Identifier (DOI)

https://doi.org/10.13023/etd.2022.211

Recommended Citation

Hawk, Gregory S., "Deriving the Distributions and Developing Methods of Inference for R²-type Measures, with Applications to Big Data Analysis" (2022). Theses and Dissertations--Statistics. 63.
https://uknowledge.uky.edu/statistics_etds/63

Included in

Applied Statistics Commons, Statistical Methodology Commons, Statistical Models Commons
