Author ORCID Identifier

Date Available


Year of Publication


Degree Name

Master of Science in Electrical Engineering (MSEE)

Document Type

Master's Thesis




Electrical Engineering

First Advisor

Dr. Biyun Xie

Second Advisor

Dr. Michael Sama


This research designs a sensor data aggregation system and a centralized, sensor-driven trajectory planning algorithm for fixed-wing aircraft, with the goal of helping atmospheric simulators map the local environment in real time. The intended application is a hazardous contaminant leak into the atmosphere, where a fleet of sensing unmanned aerial vehicles (UAVs) could provide valuable information for evacuation measures. The data aggregation system was designed using a state-of-the-art networking protocol and radio, DigiMesh, together with a process and data management system built on the ROS2 DDS. In testing, the system consistently operated within the latencies and distances tolerated for the project while remaining highly extensible to different sensor configurations. The problem of optimal trajectory planning for exploration has been modelled accurately using partially observable Markov decision processes (POMDPs). Deep reinforcement learning (DRL) is commonly applied to approximate optimal solutions of a POMDP, because exact solutions can be analytically intractable for complex state spaces. This research formulates a POMDP that describes the exploration problem and applies the state-of-the-art soft actor-critic (SAC) reinforcement learning algorithm to learn a policy that produces near-optimal trajectories within this new POMDP. During training, a spatially relevant subset of the input (a windowed input) is used instead of the complete state, and a turn-taking sequential planner is designed for multiple UAVs to help mitigate the scalability problems that come with multi-UAV coordination. The policy learned with SAC can outperform greedy and fixed trajectories with 1, 2, and 3 UAVs by a 30% margin on average. The turn-taking strategy provides small but repeatable scaling benefits, while the windowed input yields a 50%-60% increase in reward over networks trained without it. The proposed planning algorithm is effective in dynamic map exploration and has the potential to increase UAV effectiveness in atmospheric contaminant leak monitoring as it is extended for integration on real-world UAVs.
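The windowed input and the turn-taking planner described in the abstract can be illustrated with a minimal sketch. This is not the thesis implementation: the grid-world map, the four-direction moves (a strong simplification of fixed-wing motion), and the names `local_window`, `greedy_policy`, and `plan_turn_taking` are all assumptions made for illustration, and a simple greedy rule stands in for the learned SAC policy.

```python
# Illustrative sketch only: grid world, move set, and all names are assumptions.
MOVES = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def local_window(grid, center, half):
    """Return the (2*half+1) x (2*half+1) patch around `center`, zero-padded off-map."""
    rows, cols = len(grid), len(grid[0])
    r0, c0 = center
    return [
        [grid[r][c] if 0 <= r < rows and 0 <= c < cols else 0.0
         for c in range(c0 - half, c0 + half + 1)]
        for r in range(r0 - half, r0 + half + 1)
    ]


def greedy_policy(window):
    """Stand-in for a learned policy: move toward the most uncertain adjacent cell."""
    h = len(window) // 2  # index of the window center
    scores = [window[h + dr][h + dc] for dr, dc in MOVES]
    best = max(range(len(MOVES)), key=lambda i: scores[i])  # first maximum wins ties
    return MOVES[best]


def plan_turn_taking(uncertainty, positions, half=2):
    """Plan one step per UAV in sequence; later UAVs see cells claimed by earlier ones."""
    working = [row[:] for row in uncertainty]  # copy so the caller's map is untouched
    rows, cols = len(working), len(working[0])
    new_positions = []
    for r, c in positions:
        dr, dc = greedy_policy(local_window(working, (r, c), half))
        nr = min(max(r + dr, 0), rows - 1)  # clamp to the map bounds
        nc = min(max(c + dc, 0), cols - 1)
        working[nr][nc] = 0.0  # mark the claimed cell as explored
        new_positions.append((nr, nc))
    return new_positions


if __name__ == "__main__":
    grid = [[0.0] * 5 for _ in range(5)]
    grid[2][3] = 1.0  # a single high-uncertainty cell
    print(plan_turn_taking(grid, [(2, 2), (2, 2)]))
```

In this sketch each UAV's policy input has a fixed size regardless of map size, and planning cost grows linearly with the number of UAVs because each one plans in turn against a shared working map rather than over a joint action space.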

Digital Object Identifier (DOI)

Funding Information

This work was supported by the National Science Foundation (grant no. 1932105) in 2019.