Archived

This content is available here for research, reference, and/or recordkeeping.

Author ORCID Identifier

https://orcid.org/0009-0000-5151-0420

Date Available

4-20-2026

Year of Publication

2026

Document Type

Doctoral Dissertation

Degree Name

Doctor of Philosophy (PhD)

College

Engineering

Department/School/Program

Computer Science

Faculty

Hana Khamfroush

Faculty

Simone Silvestri

Abstract

The rapid proliferation of Internet of Things (IoT) devices and cyber–physical systems (CPS) in domains such as smart healthcare, intelligent transportation, and industrial automation has produced unprecedented volumes of heterogeneous data. While advances in deep learning and large-scale foundation models have enabled powerful data-driven decision making, their deployment in real-world distributed environments remains fundamentally constrained by limited computation, communication bandwidth, energy resources, and data privacy requirements. This dissertation addresses these challenges by developing novel federated learning frameworks and efficient federated data-processing pipelines that make large-scale artificial intelligence practical, scalable, and trustworthy in resource-constrained settings.

The dissertation identifies data preprocessing as a critical yet underexplored bottleneck in federated learning systems and introduces the first comprehensive framework for federated data engineering, with particular emphasis on federated feature selection for the high-dimensional, multi-label data common in CPS and IoT applications. Within this framework, several novel federated multi-label feature selection methods are developed, leveraging information theory, reinforcement learning, fuzzy similarity measures, and graph-based ranking techniques to reduce data dimensionality, communication overhead, and model complexity without sacrificing predictive performance. These methods are further extended to semi-supervised federated settings, where clients possess only unlabeled data, enabling effective, privacy-preserving collaboration under realistic labeling constraints. Beyond data preprocessing, the dissertation investigates efficient federated model training and the adaptation of foundation models in distributed environments with limited labeled data.
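As one illustration of the information-theoretic direction described above, the following is a minimal sketch of federated feature ranking, not the dissertation's actual algorithm: each client scores its features locally by empirical mutual information with its labels, and a server aggregates only the scores, so raw data never leaves a client. All function names here are hypothetical.

```python
import math
from collections import Counter

def mutual_information(xs, ys):
    # Empirical mutual information between two discrete sequences.
    n = len(xs)
    pxy = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), c in pxy.items():
        # p(x,y) * log( p(x,y) / (p(x) p(y)) ), simplified to c*n / (px*py)
        mi += (c / n) * math.log(c * n / (px[x] * py[y]))
    return mi

def client_feature_scores(features, labels):
    # features: list of feature columns (each a list of discrete values).
    # Runs entirely on one client; only the scores are communicated.
    return [mutual_information(col, labels) for col in features]

def federated_rank_aggregation(all_client_scores):
    # Server side: average each feature's score across clients.
    n_features = len(all_client_scores[0])
    n_clients = len(all_client_scores)
    return [sum(s[i] for s in all_client_scores) / n_clients
            for i in range(n_features)]
```

In this toy setting, a feature that is perfectly aligned with the labels on every client receives a strictly higher aggregated score than an uninformative one, and the server never sees the underlying samples.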
To address the prohibitive computational and communication costs of deploying large models on edge devices, the dissertation proposes Federated Reprogramming Knowledge Distillation (FedRD), a framework that keeps foundation models centralized while enabling lightweight client-side student models through task-aware reprogramming and feature alignment. This allows resource-constrained clients to benefit from powerful foundation models without incurring the cost of local model deployment.

Finally, the dissertation presents an embedded dynamic sparse federated feature selection method that tightly integrates feature selection with federated model training. By dynamically pruning and regrowing model connections during training, the method jointly optimizes prediction accuracy, communication efficiency, and computational cost, yielding compact, efficient models suited to large-scale, resource-constrained federated environments.

Extensive evaluations on real-world datasets spanning healthcare, IoT, and multimodal learning tasks demonstrate that the proposed methods significantly reduce communication overhead, computational demands, and inference latency while maintaining or improving model performance. Overall, this dissertation contributes a unified, scalable framework for federated intelligence, advancing the deployment of privacy-preserving, efficient, and interpretable AI systems in real-world cyber–physical environments.
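The prune-and-regrow step mentioned above can be sketched as follows. This is an illustrative magnitude-prune / random-regrow update on a flat weight vector under a binary connection mask, in the spirit of dynamic sparse training, and is not the dissertation's implementation; all names are hypothetical.

```python
import random

def prune_and_regrow(weights, mask, prune_frac, rng):
    # weights: flat list of weights; mask[i] = 1 if connection i is active.
    # One sparse-topology update: drop the weakest active connections,
    # then reactivate the same number of inactive ones at random,
    # so overall sparsity stays constant.
    active = [i for i, m in enumerate(mask) if m]
    k = max(1, int(len(active) * prune_frac))

    # Prune: remove the k smallest-magnitude active connections.
    active.sort(key=lambda i: abs(weights[i]))
    for i in active[:k]:
        mask[i] = 0
        weights[i] = 0.0

    # Regrow: reactivate k currently inactive connections at random;
    # they restart from zero, since pruned weights were zeroed above.
    inactive = [i for i, m in enumerate(mask) if m == 0]
    for i in rng.sample(inactive, k):
        mask[i] = 1
    return weights, mask
```

Keeping the number of active connections fixed is the property that makes the per-round communication and computation budget predictable in a federated setting: clients exchange only the active subset of parameters each round.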

Digital Object Identifier (DOI)

https://doi.org/10.13023/etd.2026.96

Archival?

Archival
