Author ORCID Identifier

https://orcid.org/0000-0002-7169-796

Date Available

12-20-2024

Year of Publication

2024

Document Type

Doctoral Dissertation

Degree Name

Doctor of Philosophy (PhD)

College

Engineering

Department/School/Program

Electrical and Computer Engineering

Advisor

Dr. Luis G. Sánchez Giraldo

Abstract

Information theory provides tools to quantify uncertainty, dependence, and similarity between probability distributions, which are crucial for addressing various machine-learning problems. However, estimating these quantities is challenging because data distributions are usually unknown and only observations are available for analysis. In this dissertation, we advance the field of information-theoretic learning by developing a comprehensive kernel-based framework for analyzing probability distributions in reproducing kernel Hilbert spaces (RKHS). By leveraging covariance operators in this representation space, we propose approaches to estimate a set of fundamental information-theoretic quantities, which, because of their resemblance to conventional quantities in information theory, we call representation entropy, representation mutual information, and representation divergence, enabling non-parametric and distribution-free estimation from empirical data.
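As a rough illustration of the idea (not necessarily the dissertation's exact formulation), an entropy of this kind can be computed from the eigenvalues of a trace-normalized Gram matrix, whose spectrum coincides with that of the empirical covariance operator in the RKHS. The Gaussian kernel, bandwidth sigma, and entropy order alpha below are illustrative choices, not values prescribed by the dissertation.

```python
import numpy as np
from scipy.spatial.distance import cdist

def gaussian_gram(X, sigma=1.0):
    """Gram matrix of a Gaussian (RBF) kernel evaluated on the samples in X."""
    sq_dists = cdist(X, X, metric="sqeuclidean")
    return np.exp(-sq_dists / (2 * sigma**2))

def representation_entropy(X, alpha=2.0, sigma=1.0):
    """Matrix-based Renyi entropy of order alpha, computed from the eigenvalues
    of the trace-normalized Gram matrix (same spectrum as the empirical
    covariance operator in the RKHS)."""
    K = gaussian_gram(X, sigma)
    A = K / np.trace(K)                 # normalize so eigenvalues sum to one
    eigvals = np.linalg.eigvalsh(A)
    eigvals = eigvals[eigvals > 1e-12]  # discard numerical zeros
    return (1.0 / (1.0 - alpha)) * np.log2(np.sum(eigvals**alpha))

# Example: entropy estimate for 200 samples from a 2-D standard normal
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))
print(representation_entropy(X, alpha=2.0, sigma=1.0))
```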

We introduce novel kernel-based estimators for these quantities that do not require an explicit mapping to the RKHS, demonstrate their convergence to population values, and show how the proposed framework connects to kernel density estimation. To address the computational complexity of these estimators, we employ techniques such as random Fourier features and power-series expansions of the matrix logarithm, which significantly improve the efficiency of computing information-theoretic measures. Furthermore, we propose the difference of matrix-based entropies (DiME), an alternative estimator of representation mutual information that reduces bias and exhibits superior performance in tasks such as independence testing and multiview learning.
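The random Fourier feature technique mentioned above can be sketched as an explicit finite-dimensional feature map whose inner products approximate a Gaussian kernel, so that spectral quantities can be obtained from an n x D feature matrix instead of an n x n Gram matrix. The feature count and bandwidth below are illustrative; the dissertation's specific estimators may differ in detail.

```python
import numpy as np

def rff_features(X, n_features=256, sigma=1.0, seed=0):
    """Random Fourier feature map z(x) such that z(x) @ z(y) approximates the
    Gaussian kernel exp(-||x - y||^2 / (2 sigma^2)) (Rahimi & Recht, 2007)."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    W = rng.normal(scale=1.0 / sigma, size=(d, n_features))  # spectral samples
    b = rng.uniform(0.0, 2 * np.pi, size=n_features)          # random phases
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

# The approximate Gram matrix Z @ Z.T can replace the exact kernel matrix when
# computing matrix-based information quantities at lower cost.
rng = np.random.default_rng(1)
X = rng.standard_normal((500, 3))
Z = rff_features(X, n_features=512, sigma=1.0)
K_approx = Z @ Z.T
```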

Additionally, we introduce new divergence measures, including the representation Jensen-Rényi divergence (RJRD) and the representation Jensen-Shannon divergence (RJSD). We demonstrate that RJSD is closely related to the widely used Maximum Mean Discrepancy (MMD) while capturing higher-order information about the data, making it a more powerful tool for tasks such as two-sample testing, domain adaptation, and generative modeling.
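For reference, the Maximum Mean Discrepancy against which RJSD is compared admits a simple plug-in estimator. The sketch below shows the standard biased (V-statistic) estimate of squared MMD under a Gaussian kernel with illustrative data and bandwidth; RJSD itself is defined in the dissertation and is not reproduced here.

```python
import numpy as np
from scipy.spatial.distance import cdist

def mmd2_biased(X, Y, sigma=1.0):
    """Biased (V-statistic) estimate of squared MMD between samples X and Y
    under a Gaussian kernel."""
    k = lambda A, B: np.exp(-cdist(A, B, "sqeuclidean") / (2 * sigma**2))
    return k(X, X).mean() + k(Y, Y).mean() - 2 * k(X, Y).mean()

# Example: two-sample comparison of Gaussians with shifted means
rng = np.random.default_rng(2)
X = rng.standard_normal((300, 2))
Y = rng.standard_normal((300, 2)) + 0.5
print(mmd2_biased(X, Y, sigma=1.0))
```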

The information-theoretic learning framework developed in this work offers a versatile and robust approach to solving diverse machine-learning challenges. By allowing the use of multiple kernel functions and different entropy orders, this framework provides a broad set of tools for characterizing distributions, quantifying information, capturing complex dependencies, and detecting distributional differences. These capabilities make it a valuable resource for learning representations that are generalizable, informative, and robust, which is crucial for modern machine-learning tasks.

Digital Object Identifier (DOI)

https://doi.org/10.13023/etd.2024.519

Funding Information

This material is based upon work supported by the Office of the Under Secretary of Defense for Research and Engineering under award number FA9550-21-10227.
