Year of Publication
Doctor of Philosophy (PhD)
Dr. Brent Harrison
Dr. Stephen Ware
As more machine learning agents interact with humans, it becomes increasingly likely that an agent trained to perform a task optimally, using only a measure of task performance as feedback, will violate societal norms for acceptable behavior or cause harm. Consequently, it becomes necessary both to prioritize task performance and to ensure that AI actions do not have detrimental effects. Value alignment is a property of intelligent agents wherein they pursue only goals and activities that are non-harmful and beneficial to humans. Current approaches to value alignment depend largely on imitation learning or learning-from-demonstration methods. However, the dynamic nature of values makes them difficult to learn through imitation learning-based approaches.
To overcome the limitations of imitation learning-based approaches, in this work we introduce a complementary technique in which a value-aligned prior is learned from naturally occurring stories that embody societal norms. This value-aligned prior can detect normative and non-normative behavior as well as describe the underlying social norms associated with these behaviors. To train our models, we sourced data from the children’s educational comic strip Goofus & Gallant. We also built a second dataset using a crowdsourcing platform, created specifically to identify the norms or principles exhibited in the actions depicted in the comic strips. To build a normative prior model, we trained multiple machine learning models to classify natural language descriptions and visual demonstrations of situations found in the comic strip, both as normative or non-normative and by the social norm involved.
Finally, to train a value-aligned agent, we introduce a reinforcement learning-based method in which the agent is trained with two reward signals: a standard task performance reward and a normative behavior reward. The test environment provides the task performance reward, while the normative behavior reward is derived from the value-aligned prior model. We show how variations on a policy shaping technique can balance these two sources of reward and produce policies that are both effective and perceived as more normative. We test our value-alignment technique on several interactive text-based worlds, each designed to challenge agents with a task while also providing opportunities to deviate from the task and engage in normative and/or altruistic behavior.
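As a rough illustration of how policy shaping can balance the two signals, the sketch below multiplies a task-driven action distribution by the normative prior's per-action probabilities and renormalizes. It is a minimal sketch under stated assumptions, not the method used in the dissertation: the function names, the `norm_weight` exponent, the softmax over Q-values, and the example numbers are all hypothetical.

```python
import math


def softmax(values, temperature=1.0):
    """Turn task Q-values into an action distribution (assumed task policy)."""
    peak = max(values)  # subtract the max for numerical stability
    exps = [math.exp((v - peak) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def shape_policy(q_values, normative_probs, norm_weight=1.0, temperature=1.0):
    """Combine task and normative signals by multiplying distributions.

    normative_probs[i] is the (hypothetical) prior model's probability that
    action i is normative. norm_weight is an assumed knob: 0 ignores the
    prior, larger values lean harder on it.
    """
    task_probs = softmax(q_values, temperature)
    combined = [p * (n ** norm_weight) for p, n in zip(task_probs, normative_probs)]
    total = sum(combined)
    if total == 0.0:
        # The prior ruled out every action; fall back to the task policy.
        return task_probs
    return [c / total for c in combined]


# Illustrative numbers only: action 0 is best for the task but is judged
# unlikely to be normative; action 1 is slightly worse but normative.
q_values = [2.0, 1.0, 0.5]
normative_probs = [0.1, 0.9, 0.6]
shaped = shape_policy(q_values, normative_probs)
unshaped = shape_policy(q_values, normative_probs, norm_weight=0.0)
```

With these illustrative inputs the shaped policy prefers the second action, while setting `norm_weight` to 0 recovers the task-only preference for the first. Exposing the weight as a parameter is one simple way to trade task effectiveness against normative behavior.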
Digital Object Identifier (DOI)
Nahian, Md Sultan Al, "Practical AI Value Alignment Using Stories" (2023). Theses and Dissertations--Computer Science. 139.