Year of Publication

2006

Document Type

Dissertation

College

Arts and Sciences

Department

Statistics

First Advisor

Arnold J. Stromberg

Second Advisor

Constance L. Wood

Abstract

Receiver operating characteristic (ROC) curves are widely used in medical decision making. It has been recognized in the last decade that only a specific region of the ROC curve is of clinical interest; this region can be summarized by the partial area under the ROC curve (partial AUC). Early statistical methods for evaluating the partial AUC assume that the data come from a specified underlying distribution. Nonparametric estimators of the partial AUC have emerged recently, but theoretical issues remain to be addressed. In this dissertation, we propose two new nonparametric statistics, the partially integrated ROC and the partially integrated weighted ROC, for estimating the partial AUC. We show that our partially integrated ROC statistic is a consistent estimator of the partial AUC, and we derive its asymptotic distribution, which is distribution free under the null hypothesis. When the ROC curve crosses the uniform cumulative distribution function (CDF) and the partial area evaluated contains the crossing point, or when there are multiple crossings, the partially integrated ROC statistic might not perform well. To address this issue, we propose the partially integrated weighted ROC statistic. This statistic evaluates the partially weighted AUC, where larger weight is given when the ROC curve is above the uniform CDF and smaller weight is given when the ROC curve is below the uniform CDF. We show that our partially integrated weighted ROC statistic is a consistent estimator of the partially weighted AUC, and we derive its asymptotic distribution, which is distribution free under the null hypothesis. We propose to apply our two nonparametric statistics to functional category analysis in microarray experiments. We define functional category analysis as the statistical identification of over-represented functional gene categories in a microarray experiment based on differential gene expression. We compare our statistics with existing methods for functional category analysis, both via a simulation study and via application to a real microarray data set, and demonstrate that our two statistics are effective for identifying over-represented functional gene categories. We also emphasize the essential role of empirical distribution function plots and ROC curves in functional category analysis.
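For readers unfamiliar with the quantity being estimated, the following is a minimal sketch of a plain empirical partial AUC: the area under the empirical ROC curve restricted to a false-positive-rate interval. It is not the partially integrated ROC statistic or the weighted variant proposed in the dissertation, whose definitions are not given in this abstract. The function names (`roc_points`, `partial_auc`, `chance_partial_area`), the convention that larger scores indicate the positive group, and the toy data are all assumptions made for illustration.

```python
def roc_points(neg, pos):
    """Empirical ROC points (FPR, TPR), assuming larger scores mean 'positive'."""
    thresholds = sorted(set(neg) | set(pos), reverse=True)
    pts = [(0.0, 0.0)]
    for c in thresholds:
        fpr = sum(x >= c for x in neg) / len(neg)
        tpr = sum(y >= c for y in pos) / len(pos)
        pts.append((fpr, tpr))
    return pts  # the last point is always (1.0, 1.0)


def partial_auc(neg, pos, lo=0.0, hi=1.0):
    """Area under the empirical ROC curve for FPR in [lo, hi].

    Points are joined linearly, so tied scores across the two groups
    contribute a diagonal segment (trapezoidal rule).
    """
    pts = roc_points(neg, pos)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        a, b = max(x0, lo), min(x1, hi)
        if b <= a:
            continue  # vertical segment, or segment outside [lo, hi]
        slope = (y1 - y0) / (x1 - x0)
        ya = y0 + slope * (a - x0)
        yb = y0 + slope * (b - x0)
        area += 0.5 * (ya + yb) * (b - a)
    return area


def chance_partial_area(lo=0.0, hi=1.0):
    """Area under the diagonal ROC(t) = t (the uniform CDF) on [lo, hi]."""
    return 0.5 * (hi * hi - lo * lo)


# Hypothetical toy scores, not data from the dissertation.
neg = [1, 2, 3, 4]
pos = [2.5, 3.5, 4.5, 5]
print(partial_auc(neg, pos, 0.0, 0.5), chance_partial_area(0.0, 0.5))
```

Comparing the partial area with the area under the diagonal over the same interval reflects the role of the uniform CDF in the abstract: when the two groups share the same distribution, the ROC curve coincides with the diagonal.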
