Author ORCID Identifier
Year of Publication
Doctor of Philosophy (PhD)
We introduce a novel approach for learning behaviors from human-provided feedback that is subject to systematic bias. Our method, BASIL, models the feedback signal as the combination of a heuristic evaluation of an action's utility and a probabilistically drawn bias value whose distribution has unknown parameters. We present both the general framework for our technique and specific algorithms for biases drawn from a normal distribution. We evaluate our approach across various environments and tasks, comparing it to interactive and non-interactive machine learning methods, including deep learning techniques. Feedback comes from human trainers and from a synthetic oracle whose feedback is distorted to varying degrees. We demonstrate that our algorithm can learn rapidly even in the presence of normally distributed bias, a setting in which other methods struggle, while also exhibiting some resistance to other types of distortion.
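The feedback model described in the abstract can be illustrated with a minimal sketch. This is not the BASIL algorithm itself, whose details are given in the dissertation; it only simulates a synthetic oracle whose feedback is an action's utility plus a normally distributed bias. The action names, utility values, bias parameters, and function names below are all hypothetical and chosen purely for illustration.

```python
import random
import statistics


def biased_feedback(utility, bias_mean, bias_std, rng):
    """Return one feedback sample: the action's utility plus a normal bias.

    Assumption: the bias is additive and drawn independently per sample.
    """
    return utility + rng.gauss(bias_mean, bias_std)


def estimate_action_means(utilities, bias_mean, bias_std, n_samples, seed=0):
    """Average many biased feedback samples for each action."""
    rng = random.Random(seed)
    means = {}
    for action, utility in utilities.items():
        samples = [
            biased_feedback(utility, bias_mean, bias_std, rng)
            for _ in range(n_samples)
        ]
        means[action] = statistics.fmean(samples)
    return means


# Hypothetical utilities for two actions; the oracle is biased strongly negative.
utilities = {"left": 0.2, "right": 0.8}
means = estimate_action_means(
    utilities, bias_mean=-1.5, bias_std=0.5, n_samples=2000
)

# Every action's mean feedback is shifted by the same bias mean, so the
# ranking of actions and the gap between them are approximately preserved.
best_action = max(means, key=means.get)
estimated_gap = means["right"] - means["left"]
```

In this toy setting the raw feedback values are negative for both actions, so a learner that reads the sign of feedback literally would treat both as bad. Comparing feedback across actions, or estimating the bias parameters explicitly, is one way such a systematic shift might be accounted for.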
Digital Object Identifier (DOI)
This research was supported by a scholarship provided by the University of Kentucky from 2016 to 2023.
Watson, Jonathan Indigo, "The BASIL technique: Bias Adaptive Statistical Inference Learning Agents for Learning from Human Feedback" (2023). Theses and Dissertations--Computer Science. 134.