Date Available

9-6-2023

Year of Publication

2023

Degree Name

Doctor of Philosophy (PhD)

Document Type

Doctoral Dissertation

College

Engineering

Department/School/Program

Computer Science

First Advisor

Dr. Brent Harrison

Second Advisor

Dr. Stephen Ware

Abstract

As more machine learning agents interact with humans, it becomes increasingly likely that an agent trained to perform a task optimally, using only a measure of task performance as feedback, will violate societal norms for acceptable behavior or cause harm. Consequently, it becomes necessary not only to prioritize task performance but also to ensure that an agent's actions do not have detrimental effects. Value alignment is a property of intelligent agents wherein they pursue only goals and activities that are non-harmful and beneficial to humans. Current approaches to value alignment largely depend on imitation learning or learning-from-demonstration methods. However, the dynamic nature of values makes it difficult to learn them through such imitation-based approaches.

To overcome the limitations of imitation learning-based approaches, in this work we introduced a complementary technique in which a value-aligned prior is learned from naturally occurring stories that embody societal norms. This value-aligned prior can detect normative and non-normative behavior in human society as well as describe the underlying social norms associated with these behaviors. To train our models, we sourced data from the children's educational comic strip Goofus & Gallant. We also built a second dataset using a crowdsourcing platform, created specifically to identify the norms or principles exhibited in the actions depicted in the comic strips. To build a normative prior model, we trained multiple machine learning models to classify natural language descriptions and visual demonstrations of situations from the comic strip as normative or non-normative, and to categorize them into different social norms.
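As a minimal sketch of the kind of normative-prior classifier described above, the snippet below fits a simple text classifier on labeled behavior descriptions. The example sentences, labels, and the TF-IDF plus logistic regression model are illustrative assumptions, not the dissertation's actual data or architecture.

```python
# Minimal sketch: train a text classifier as a normative prior.
# The example descriptions and labels below are hypothetical placeholders,
# not the actual Goofus & Gallant data used in the dissertation.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Hypothetical labeled descriptions: 1 = normative, 0 = non-normative.
texts = [
    "He thanks his friend for helping him carry the books.",
    "He grabs the last cookie without asking anyone.",
    "He waits his turn in line at the water fountain.",
    "He interrupts the teacher while she is speaking.",
]
labels = [1, 0, 1, 0]

# TF-IDF features and logistic regression stand in for the stronger
# language models one would likely use to build such a prior.
normative_prior = make_pipeline(TfidfVectorizer(), LogisticRegression())
normative_prior.fit(texts, labels)

# The trained prior can then score new behavior descriptions.
print(normative_prior.predict_proba(
    ["He shares his umbrella with a classmate in the rain."]
)[0][1])  # probability the described action is normative
```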

Finally, to train a value-aligned agent, we introduced a reinforcement learning-based method in which the agent is trained with two reward signals: a standard task performance reward and a normative behavior reward. The environment provides the task performance reward, while the normative behavior reward is derived from the value-aligned prior model. We show how variations on a policy shaping technique can balance these two sources of reward and produce policies that are both effective and perceived as more normative. We test our value-alignment technique on several interactive text-based worlds, each designed to challenge the agent with a task while also providing opportunities to deviate from that task and engage in normative and/or altruistic behavior.
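As a rough illustration of how a normative prior could be combined with task-driven behavior in the policy-shaping spirit described above, the sketch below blends a task policy with prior action probabilities. The multiplicative mixing rule, parameter names, and example numbers are assumptions for illustration, not the dissertation's exact formulation.

```python
import numpy as np

def shaped_action_distribution(task_q_values, prior_probs, beta=1.0, weight=0.5):
    """Blend a task-driven policy with a normative prior over actions.

    task_q_values: Q-values learned from the task performance reward.
    prior_probs:   action probabilities from the normative prior model.
    beta:          softmax temperature for the task policy.
    weight:        strength of the prior's influence
                   (0 = ignore the prior, 1 = full shaping).
    """
    # Softmax over Q-values gives the task-only policy.
    task_policy = np.exp(beta * task_q_values)
    task_policy /= task_policy.sum()

    # Multiplicative combination in the style of policy shaping:
    # actions the prior judges non-normative are down-weighted.
    combined = task_policy * (prior_probs ** weight)
    return combined / combined.sum()

# Example: three actions, where the third scores well on the task
# but is judged unlikely to be normative by the prior.
q = np.array([0.1, 0.2, 0.9])
prior = np.array([0.45, 0.45, 0.10])
print(shaped_action_distribution(q, prior, weight=0.8))
```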

Digital Object Identifier (DOI)

https://doi.org/10.13023/etd.2023.404
