Archived

This content is available here for research, reference, and/or recordkeeping.

Author ORCID Identifier

https://orcid.org/0000-0001-6829-1463

Date Available

5-8-2026

Year of Publication

2026

Document Type

Doctoral Dissertation

Degree Name

Doctor of Philosophy (PhD)

College

Engineering

Department/School/Program

Computer Science

Faculty

Xin Liang

Faculty

Simone Silvestri

Abstract

Scientific simulations and instruments now produce data at rates that overwhelm the storage, memory, and network subsystems of modern high-performance computing (HPC) facilities. Error-bounded lossy compression reduces data movement costs while bounding reconstruction error, yet three barriers limit its adoption in mission-critical workflows: existing compressors cannot guarantee the accuracy of domain-specific quantities of interest (QoIs) derived from compressed data; compression-induced artifacts such as posterization, blocking, and interpolation banding erode user confidence in decompressed fields; and significant compressibility in the quantization index arrays of interpolation-based pipelines remains unexploited. This dissertation addresses all three barriers through four contributions, with the artifact barrier tackled by two complementary sub-contributions.

First, to address the QoI barrier, we develop a modular QoI-preservation layer atop SZ3 grounded in rigorous error-propagation theory for polynomial, product, composition, and bounded-linear QoI families. The framework guarantees that domain metrics remain within user-specified tolerances and achieves up to 4× higher compression ratios than state-of-the-art compressors under equivalent QoI accuracy constraints.
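To make the idea of error propagation for a polynomial QoI concrete, the following is a minimal sketch for the single QoI f(x) = x², not the dissertation's SZ3-based framework. It derives a pointwise error bound eps from a QoI tolerance tau using |(x + e)² − x²| ≤ 2|x|·eps + eps² ≤ tau. The function names, sample data, and tolerance are assumptions introduced here for illustration only.

```python
import math

def pointwise_bound_for_square(x: float, tau: float) -> float:
    """Largest eps such that |(x + e)**2 - x**2| <= tau whenever |e| <= eps.

    Solves 2*|x|*eps + eps**2 = tau for eps. The form below is algebraically
    equal to sqrt(x*x + tau) - |x| but avoids cancellation for large |x|.
    """
    return tau / (math.sqrt(x * x + tau) + abs(x))

def quantize(x: float, eps: float) -> float:
    """Snap x to the nearest multiple of 2*eps, so |x - result| <= eps."""
    return 2.0 * eps * round(x / (2.0 * eps))

# Hypothetical sample values and QoI tolerance, chosen only for illustration.
data = [0.5, -1.25, 3.0, 10.0]
tau = 1e-3

# Each value gets its own pointwise bound derived from the QoI tolerance.
reconstructed = [quantize(x, pointwise_bound_for_square(x, tau)) for x in data]

# Error measured on the QoI (x squared), not on the raw value.
qoi_errors = [abs(r * r - x * x) for x, r in zip(data, reconstructed)]
```

In this sketch, values with larger magnitude receive a tighter pointwise bound, since the same perturbation changes x² more when |x| is large.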

Second, we tackle the artifact barrier through two complementary sub-contributions. (a) We present the first systematic characterization of compression artifacts across both transform-based and prediction-based compressor families and develop compressor-agnostic detection algorithms that provide actionable severity diagnostics without requiring access to the original data. (b) Building on this characterization, we develop a post-decompression quantization-aware interpolation algorithm for GPU-based pre-quantization compressors (cuSZ, cuSZp, cuSZp2, FZ-GPU, SZp) that mitigates posterization by recovering missing gradient information from quantization index patterns, achieving up to 108% improvement in structural similarity and 1.17×–1.34× gains in effective compression ratio at equivalent quality targets.
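The posterization artifact mentioned above can be reproduced with a small experiment. The sketch below assumes plain uniform pre-quantization with bin width 2·eps applied to a synthetic ramp; it illustrates the artifact only and does not implement the detection or mitigation algorithms of the dissertation. All names and parameter values are hypothetical.

```python
def prequantize(values, eps):
    """Map each value to an integer quantization index; bin width is 2*eps."""
    return [round(v / (2.0 * eps)) for v in values]

def dequantize(indices, eps):
    """Reconstruct each value as the center of its quantization bin."""
    return [2.0 * eps * q for q in indices]

def longest_plateau(values):
    """Length of the longest run of identical consecutive values."""
    best = run = 1
    for prev, cur in zip(values, values[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best

# A smooth synthetic ramp from 0 to 1 (illustrative data, not a real dataset).
ramp = [i / 99 for i in range(100)]
eps = 0.1  # hypothetical absolute error bound

indices = prequantize(ramp, eps)
recon = dequantize(indices, eps)

max_error = max(abs(a - b) for a, b in zip(ramp, recon))
distinct_levels = len(set(indices))   # how many flat levels survive
plateau = longest_plateau(indices)    # width of the widest flat region
```

The reconstruction respects the error bound, yet the smooth gradient collapses into a handful of flat plateaus. The quantization indices still record where each plateau begins and ends, which is the kind of information a post-decompression method can draw on.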

Third, to address the coding-efficiency barrier, we introduce an adaptive quantization index prediction algorithm for interpolation-based compressors (MGARD, SZ3, QoZ, HPEZ) that reduces quantization index entropy by up to 98% and improves compression ratios by up to 95% on real scientific datasets.
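The link between index prediction and entropy can be illustrated with a deliberately simple stand-in. The sketch below uses previous-index (delta) prediction, which is an assumption made here and not the adaptive prediction algorithm of the dissertation, and compares the Shannon entropy of a synthetic index array before and after prediction. The function names and the index array are hypothetical.

```python
import math
from collections import Counter

def shannon_entropy(symbols):
    """Empirical Shannon entropy of a symbol sequence, in bits per symbol."""
    counts = Counter(symbols)
    n = len(symbols)
    return -sum(c / n * math.log2(c / n) for c in counts.values())

def delta_residuals(indices):
    """Predict each index by its predecessor (the first is predicted as 0)."""
    residuals = []
    prev = 0
    for q in indices:
        residuals.append(q - prev)
        prev = q
    return residuals

def undo_delta(residuals):
    """Invert delta_residuals with a running sum, so the step is lossless."""
    indices = []
    prev = 0
    for r in residuals:
        prev += r
        indices.append(prev)
    return indices

# Synthetic staircase of quantization indices (illustrative only).
indices = [i // 4 for i in range(64)]
residuals = delta_residuals(indices)

entropy_before = shannon_entropy(indices)
entropy_after = shannon_entropy(residuals)
```

Because the residuals are mostly zero, their entropy is lower than that of the raw indices, so an entropy coder needs fewer bits per symbol. The prediction step is invertible, so no additional error is introduced.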

Together, these contributions improve the accuracy, efficiency, and reliability of lossy compression for scientific data, broadening its applicability in production HPC workflows.

Digital Object Identifier (DOI)

https://doi.org/10.13023/etd.2026.276

Archival?

Archival

Funding Information

This dissertation was supported by the following National Science Foundation awards: CRII: OAC: Enabling Quantities-of-Interest Error Control for Trust-Driven Lossy Compression (no. OAC-2330367), 2023 to 2025; Collaborative Research: Elements: ProDM: Developing A Unified Progressive Data Management Library for Exascale Computational Science (nos. OAC-2313122 and OAC-2311756), 2023 to 2026; CAREER: Data Polymorphism: Enabling Fast and Adaptable Scientific Data Retrieval with Progressive Representations (no. OAC-2442627), 2025 to 2026; and Collaborative Research: OAC Core: Mitigating Artifacts in Scientific Data Compressors with a Learning-Driven Framework (no. OAC-2504255), 2025 to 2026. The author also gratefully acknowledges the University of Kentucky Center for Computational Sciences and Information Technology Services Research Computing for support and use of the Lipscomb Compute Cluster, Morgan Compute Cluster, and associated research computing resources from 2022 to 2026.
