Archived

This content is available here for research, reference, and/or recordkeeping.

Author ORCID Identifier

https://orcid.org/0000-0002-7746-4247

Date Available

6-1-2026

Year of Publication

2026

Document Type

Doctoral Dissertation

Degree Name

Doctor of Philosophy (PhD)

College

Engineering

Department/School/Program

Computer Science

Faculty

Qiang Cheng

Faculty

Simone Silvestri

Abstract

The effective utilization of structured data is fundamental to modern machine learning, yet it presents distinct challenges in both predictive analysis and generative modeling. Traditional deep learning architectures, particularly Transformers, often suffer from quadratic computational complexity when processing long sequences. This dissertation addresses these limitations by introducing novel architectures based on State-Space Models (SSMs) and Diffusion Models.

In the area of predictive analysis, we focus on overcoming the computational bottlenecks of attention mechanisms for tabular and time-series data. First, we introduce MambaTab, a selective state-space architecture designed for efficient tabular classification. By leveraging the linear complexity of SSMs, MambaTab significantly reduces memory overhead while maintaining high accuracy. Second, for temporal data, we propose TimeMachine, a scalable architecture for long-term time-series forecasting. This model captures extended temporal dependencies efficiently, addressing the scalability issues inherent in Transformer-based forecasters. Additionally, we present TSCMamba, a specialized framework for multi-view time-series classification that optimizes feature extraction across diverse temporal benchmarks.

Transitioning to generative modeling, we address the challenge of synthesizing strictly constrained structured data. We introduce RefiDiff, a diffusion-based framework for missing data imputation. RefiDiff employs a progressive refinement strategy that bridges predictive initialization with generative synthesis to recover lost information in tabular datasets accurately. Finally, we propose Mol-CADiff, a causality-aware autoregressive diffusion model for molecular graph generation. Unlike standard generative models that often produce similar structures, Mol-CADiff ensures chemical diversity by learning through a causality-integrated diffusion process.

Collectively, these contributions demonstrate that specialized State-Space and Diffusion architectures provide superior efficiency and accuracy for structured data tasks.

Digital Object Identifier (DOI)

https://doi.org/10.13023/etd.2026.313

Archival?

Archival
