Author ORCID Identifier
https://orcid.org/0000-0002-7169-796
Date Available
12-20-2024
Year of Publication
2024
Document Type
Doctoral Dissertation
Degree Name
Doctor of Philosophy (PhD)
College
Engineering
Department/School/Program
Electrical and Computer Engineering
Advisor
Dr. Luis G. Sánchez Giraldo
Abstract
Information theory provides tools to quantify uncertainty, dependence, and similarity between probability distributions, which are crucial for addressing various machine-learning problems. However, estimating these quantities is challenging because data distributions are usually unknown and only observations are available for analysis. In this dissertation, we advance the field of information-theoretic learning by developing a comprehensive kernel-methods framework for analyzing probability distributions in reproducing kernel Hilbert spaces (RKHS). By leveraging covariance operators in this representation space, we propose approaches to estimate a set of fundamental information-theoretic quantities that, because of their resemblance to conventional quantities in information theory, we call representation entropy, representation mutual information, and representation divergence, enabling non-parametric, distribution-free estimation from empirical data.
We introduce novel kernel-based estimators for these quantities that do not require an explicit mapping to the RKHS, demonstrate their convergence to population values, and show the connection of the proposed framework to kernel density estimation. To address the computational complexity of these estimators, we employ techniques such as random Fourier features and power-series expansions of the matrix logarithm, which significantly improve the efficiency of computing the proposed information-theoretic measures. Furthermore, we propose the difference of matrix-based entropies (DiME), an alternative estimator of representation mutual information that reduces bias and exhibits superior performance in tasks such as independence testing and multiview learning.
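To make the estimators described above concrete, the following is a minimal sketch of the matrix-based Rényi entropy of a kernel Gram matrix and the entropy-difference form of mutual information on which DiME-style estimators build. The Gaussian kernel, bandwidth, entropy order, and sample sizes are illustrative assumptions, not the dissertation's exact settings.

```python
import numpy as np

def gram_matrix(X, sigma=1.0):
    """RBF (Gaussian) kernel Gram matrix of the rows of X, normalized to unit trace."""
    sq_dists = np.sum(X**2, 1)[:, None] + np.sum(X**2, 1)[None, :] - 2 * X @ X.T
    K = np.exp(-sq_dists / (2 * sigma**2))
    return K / np.trace(K)

def matrix_entropy(A, alpha=1.01):
    """Matrix-based Renyi entropy of order alpha from a unit-trace Gram matrix."""
    eigvals = np.linalg.eigvalsh(A)
    eigvals = eigvals[eigvals > 0]  # discard numerical zeros
    return np.log2(np.sum(eigvals**alpha)) / (1 - alpha)

def matrix_mutual_information(X, Y, alpha=1.01, sigma=1.0):
    """Mutual information as marginal entropies minus the joint entropy,
    with the joint Gram matrix formed by a Hadamard product."""
    A, B = gram_matrix(X, sigma), gram_matrix(Y, sigma)
    AB = A * B
    AB = AB / np.trace(AB)
    return matrix_entropy(A, alpha) + matrix_entropy(B, alpha) - matrix_entropy(AB, alpha)

# Toy usage: dependent pairs yield large MI, independent pairs near zero.
rng = np.random.default_rng(0)
x = rng.normal(size=(200, 1))
print(matrix_mutual_information(x, x + 0.1 * rng.normal(size=(200, 1))))
print(matrix_mutual_information(x, rng.normal(size=(200, 1))))
```

This naive sketch costs O(n^3) per eigendecomposition; the random Fourier features and power-series expansions of the matrix logarithm mentioned above are precisely the tools the dissertation uses to avoid that cost at scale.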
Additionally, we introduce new divergence measures, including the representation Jensen-Rényi divergence (RJRD) and the representation Jensen-Shannon divergence (RJSD). We demonstrate that RJSD is closely related to the widely used Maximum Mean Discrepancy (MMD) while capturing higher-order information about the data, making it a more powerful tool for tasks such as two-sample testing, domain adaptation, and generative modeling.
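For readers unfamiliar with the baseline against which RJSD is compared, here is a short sketch of the standard biased empirical MMD statistic between two samples (textbook MMD à la Gretton et al., not the dissertation's RJSD estimator); the kernel and bandwidth choices are again illustrative assumptions.

```python
import numpy as np

def rbf_kernel(X, Y, sigma=1.0):
    """Gaussian kernel matrix between rows of X and rows of Y."""
    sq = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-sq / (2 * sigma**2))

def mmd2_biased(X, Y, sigma=1.0):
    """Biased estimate of squared Maximum Mean Discrepancy between samples X and Y."""
    Kxx = rbf_kernel(X, X, sigma)
    Kyy = rbf_kernel(Y, Y, sigma)
    Kxy = rbf_kernel(X, Y, sigma)
    return Kxx.mean() + Kyy.mean() - 2 * Kxy.mean()

# Toy two-sample check: same distribution vs. a mean-shifted one.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 2))
print(mmd2_biased(X, rng.normal(size=(300, 2))))           # near zero
print(mmd2_biased(X, rng.normal(loc=1.0, size=(300, 2))))  # clearly positive
```

MMD compares only the kernel mean embeddings of the two samples; RJSD, as described above, is built from covariance operators and therefore captures higher-order structure beyond the means.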
The information-theoretic learning framework developed in this work offers a versatile and robust approach to solving diverse machine-learning challenges. By allowing the use of multiple kernel functions and different entropy orders, this framework provides a broad set of tools for characterizing distributions, quantifying information, capturing complex dependencies, and detecting distributional differences. These capabilities make it a valuable resource for learning representations that are generalizable, informative, and robust, which is crucial for modern machine-learning tasks.
Digital Object Identifier (DOI)
https://doi.org/10.13023/etd.2024.519
Funding Information
This material is based upon work supported by the Office of the Under Secretary of Defense for Research and Engineering under award number FA9550-21-10227.
Recommended Citation
Hoyos Osorio, Jhoan Keider, "INFORMATION-THEORETIC LEARNING FRAMEWORK BASED ON COVARIANCE OPERATORS ON REPRODUCING KERNEL HILBERT SPACES" (2024). Theses and Dissertations--Electrical and Computer Engineering. 208.
https://uknowledge.uky.edu/ece_etds/208