Date Available
12-14-2011
Year of Publication
2007
Document Type
Dissertation
College
Arts and Sciences
Department
Chemistry
First Advisor
Robert A. Lodder
Abstract
A new era of chemical analysis is upon us. In the past, a small number of samples were selected from a population for use as a statistical representation of the entire population. More recently, advancements in data collection rate, computer memory, and processing speed have allowed entire populations to be sampled and analyzed. The result is massive amounts of data that convey relatively little information, even though they may contain a great deal of it. These large quantities of data have already begun to cause bottlenecks in areas such as genetics, drug development, and chemical imaging. The problem is straightforward: condense a large quantity of data into only the useful portions without ignoring or discarding anything important. Performing the condensation in the hardware of the instrument, before the data ever reach a computer, is even better. The proposed research tests the hypothesis that clusters of data may be rapidly identified by linear fitting of quantile-quantile plots produced from each principal component obtained from principal component analysis. Integrated Sensing and Processing (ISP) is tested as a means of generating clusters of principal component scores from samples in a hyperspectral near-field scanning optical microscope. Distances from these multidimensional cluster centers to all other points in hyperspace can be calculated. The result is a novel digital staining technique for identifying anomalies in hyperspectral microscopic and nanoscopic imaging of human atherosclerotic tissue. This general method can be applied to other analytical problems as well.
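The abstract does not give the algorithm in detail, so the following is only a minimal sketch of one possible reading of it, not the dissertation's implementation. It assumes that the quantile-quantile plot compares sorted principal component scores against standard normal quantiles, that the quality of the linear fit is summarized by a correlation coefficient, and that distances are measured in standard-deviation units from the mean of a cluster. The function names, the synthetic two-cluster data, and the split at zero on the first component are all hypothetical.

```python
import numpy as np
from statistics import NormalDist


def pca_scores(X, n_components):
    """Project mean-centered rows of X onto the leading principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def qq_linearity(scores):
    """Correlation between sorted scores and standard normal quantiles.

    A value near 1 means the quantile-quantile plot is close to a straight
    line (one roughly normal population). A clearly lower value suggests
    that the component mixes more than one cluster.
    """
    n = len(scores)
    probs = (np.arange(1, n + 1) - 0.5) / n  # plotting positions (assumed)
    theoretical = np.array([NormalDist().inv_cdf(p) for p in probs])
    return np.corrcoef(theoretical, np.sort(scores))[0, 1]


def distances_from_center(scores, members):
    """Distance of every point from the center of one cluster, per-axis scaled."""
    center = scores[members].mean(axis=0)
    spread = scores[members].std(axis=0, ddof=1)
    return np.sqrt((((scores - center) / spread) ** 2).sum(axis=1))


# Synthetic stand-in for spectra: two groups that differ in one variable.
rng = np.random.default_rng(0)
group_a = rng.normal(0.0, 1.0, size=(200, 5))
group_b = rng.normal(0.0, 1.0, size=(200, 5)) + np.array([8.0, 0, 0, 0, 0])
X = np.vstack([group_a, group_b])

scores = pca_scores(X, n_components=2)
r_pc1 = qq_linearity(scores[:, 0])  # component that separates the groups
r_pc2 = qq_linearity(scores[:, 1])  # component with no group structure

# Hypothetical cluster assignment: split the first component at zero.
members = scores[:, 0] < 0
d = distances_from_center(scores, members)
```

In this sketch the first component should give a visibly poorer linear fit than the second, which is the cue that it contains more than one cluster. The distances `d` could then be mapped to a color scale per pixel, which is one way to picture the "digital staining" the abstract describes.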
Recommended Citation
Harris, Justin Clay, "NEW BIOINFORMATIC TECHNIQUES FOR THE ANALYSIS OF LARGE DATASETS" (2007). University of Kentucky Doctoral Dissertations. 544.
https://uknowledge.uky.edu/gradschool_diss/544