Author ORCID Identifier

https://orcid.org/0000-0002-4160-8392

Date Available

12-11-2025

Year of Publication

2025

Document Type

Doctoral Dissertation

Degree Name

Doctor of Philosophy (PhD)

College

Engineering

Department/School/Program

Computer Science

Faculty

Luis Gonzalo Sanchez Giraldo

Faculty

Simone Silvestri

Abstract

Self-supervised learning (SSL) has become a cornerstone of modern machine learning, offering a scalable alternative to costly human annotation by constructing pretext tasks directly from raw data. While SSL has delivered strong results across vision, language, and multimodal domains, two major limitations persist: (1) SSL methods are often significantly slower to train than their supervised counterparts, and (2) evaluation protocols remain narrow, with most studies relying on linear probing accuracy on the pretraining dataset. These challenges are particularly acute for large language models (LLMs), where training costs and the interpretability of intermediate representations are critical concerns.

In this work, we propose a unified framework for analyzing and improving SSL, grounded in information-theoretic principles. Central to our approach is matrix-based entropy, a tractable surrogate for Rényi’s entropy that captures the information content of high-dimensional neural representations. We show that this framework underpins a broad class of evaluation metrics, including fully unsupervised methods, and introduce DiME, a debiased mutual information estimator that robustly measures invariance and dependence in representations. Applying these metrics to LLMs, we reveal that intermediate layers frequently contain richer information than final-layer embeddings for downstream tasks, offering new insights into representation learning dynamics. Notably, we observe a compression phenomenon in which intermediate layers reduce information more than either early or final layers.
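To make the construction concrete, below is a minimal NumPy sketch of matrix-based Rényi entropy and of the mutual information built from it (the quantity that DiME debiases; the debiasing step itself is not reproduced here). The Gaussian kernel, the bandwidth `sigma`, the order `alpha`, and the function names are illustrative assumptions, not the dissertation's exact choices.

```python
import numpy as np

def _gram(Z, sigma=1.0):
    """Unit-trace Gaussian Gram matrix of representations Z (n_samples x dim).
    Kernel and bandwidth are illustrative assumptions."""
    sq = np.sum(Z**2, axis=1, keepdims=True)
    K = np.exp(-(sq + sq.T - 2.0 * Z @ Z.T) / (2.0 * sigma**2))
    return K / np.trace(K)

def matrix_entropy(A, alpha=2.0):
    """Matrix-based Renyi entropy S_alpha(A) = log2(sum_i lam_i**alpha) / (1 - alpha),
    read directly off the eigenvalue spectrum of the unit-trace Gram matrix A."""
    lam = np.clip(np.linalg.eigvalsh(A), 0.0, None)
    lam = lam[lam > 1e-12]
    if np.isclose(alpha, 1.0):  # von Neumann (Shannon) limit
        return float(-np.sum(lam * np.log2(lam)))
    return float(np.log2(np.sum(lam ** alpha)) / (1.0 - alpha))

def matrix_mutual_information(Zx, Zy, alpha=2.0, sigma=1.0):
    """I(X; Y) = S(Ax) + S(Ay) - S(joint), where the joint Gram matrix is the
    trace-normalized Hadamard product Ax * Ay (PSD by the Schur product theorem)."""
    Ax, Ay = _gram(Zx, sigma), _gram(Zy, sigma)
    J = Ax * Ay
    J = J / np.trace(J)
    return matrix_entropy(Ax, alpha) + matrix_entropy(Ay, alpha) - matrix_entropy(J, alpha)
```

Because the entropy is computed from the eigenvalue spectrum of a Gram matrix, no explicit density estimation over the representation space is required, which is what makes the estimator tractable for high-dimensional neural embeddings.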

On the efficiency side, we survey SSL pretraining methods in the computer vision domain and introduce FroSSL, an information-theoretic algorithm rooted in matrix-based entropy that accelerates convergence by reshaping the spectrum of embedding covariance matrices. FroSSL achieves competitive downstream performance with reduced training time, highlighting structural factors that govern SSL efficiency.
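As an illustration of this spectral mechanism, the following sketch shows a FroSSL-style two-view objective: for a trace-normalized covariance matrix, minimizing the log squared Frobenius norm is equivalent to maximizing the order-2 matrix-based entropy, which flattens the embedding spectrum. The exact normalization, term weighting, and multi-view handling in FroSSL may differ; the function name and defaults here are assumptions.

```python
import numpy as np

def frossl_style_loss(Z1, Z2, eps=1e-8):
    """Sketch of a FroSSL-style objective for two embedding views Z1, Z2
    of shape (n_samples, dim): a spectrum-flattening term per view plus
    a cross-view invariance term."""
    def entropy_term(Z):
        Zc = Z - Z.mean(axis=0)              # center each embedding dimension
        C = (Zc.T @ Zc) / Zc.shape[0]        # dim x dim covariance matrix
        A = C / (np.trace(C) + eps)          # unit trace, like a density matrix
        # log ||A||_F^2 = -S_2(A) up to constants: minimizing it spreads the spectrum
        return np.log(np.sum(A * A) + eps)
    # Invariance: matched samples from the two augmented views should agree.
    invariance = np.mean(np.sum((Z1 - Z2) ** 2, axis=1))
    return entropy_term(Z1) + entropy_term(Z2) + invariance
```

The entropy terms penalize covariance spectra dominated by a few directions, discouraging dimensional collapse, while the invariance term ties the two views together; the abstract attributes FroSSL's faster convergence to this reshaping of the covariance spectrum.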

Our contributions aim to clarify how pretext tasks, objective functions, and architectures shape both training dynamics and the information content of representations. The overarching goal is to chart a principled path toward efficient training and robust evaluation of self-supervised representations, enabling more reliable and generalizable unsupervised learning systems.

Digital Object Identifier (DOI)

https://doi.org/10.13023/etd.2025.584

Funding Information

This material is based upon work supported by the Office of the Under Secretary of Defense for Research and Engineering under award number FA9550-21-10227.
