Computer Science and Engineering

Faculty Candidate Seminar

The Value Alignment Problem in Artificial Intelligence

Dylan Hadfield-Menell
Ph.D. Candidate
University of California, Berkeley

3725 Beyster Building

Abstract: Much of our success in artificial intelligence stems from the adoption of a simple paradigm: specify an objective or goal, and then use optimization algorithms to identify a behavior (or predictor) that optimally achieves this goal. This has been true since the early days of AI (e.g., search algorithms such as A* that aim to find the optimal path to a goal state), and this paradigm is common to AI, statistics, control theory, operations research, and economics. Loosely speaking, the field has evaluated the intelligence of an AI system by how efficiently and effectively it optimizes for its objective. This talk will provide an overview of my thesis work, which proposes and explores the consequences of a simple, but consequential, shift in perspective: we should measure the intelligence of an AI system by its ability to optimize for our objectives.
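The objective-then-optimize paradigm described above can be illustrated with a minimal A* sketch. This is a textbook version written for illustration, not code from the talk; the names `a_star`, `neighbors`, and `h` are my own. The intelligence of the system, in the traditional sense, is measured entirely by how well it optimizes the objective it was given.

```python
import heapq

def a_star(start, goal, neighbors, h):
    """Minimal A*: expand the node with lowest g + h until the goal is reached.
    Returns an optimal path as a list of states, assuming h is admissible
    (never overestimates the remaining cost)."""
    # Frontier entries: (f = g + h, g, node, path so far).
    frontier = [(h(start), 0, start, [start])]
    best_g = {start: 0}
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return path
        for nxt, step_cost in neighbors(node):
            g2 = g + step_cost
            if g2 < best_g.get(nxt, float("inf")):  # found a cheaper route
                best_g[nxt] = g2
                heapq.heappush(frontier, (g2 + h(nxt), g2, nxt, path + [nxt]))
    return None  # goal unreachable

# Example: shortest path on a 4x4 grid with unit step costs and a
# Manhattan-distance heuristic.
def grid_neighbors(p):
    x, y = p
    return [((x + dx, y + dy), 1)
            for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))
            if 0 <= x + dx < 4 and 0 <= y + dy < 4]

path = a_star((0, 0), (3, 3), grid_neighbors,
              lambda p: abs(3 - p[0]) + abs(3 - p[1]))
```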

In an ideal world, these measurements would be the same: all we have to do is write down the correct objective! This is easier said than done. Misalignment between the behavior a system designer actually wants and the behavior incentivized by the reward or loss function they specify is routine, observed across a wide variety of practical applications, and fundamental, a consequence of limited human cognitive capacity. This talk will build up a formal model of this value alignment problem as a cooperative human-robot interaction: an assistance game of partial information between a human principal and an autonomous agent. It will begin with a simple instantiation of this game in which the human designer takes a single action, writing down a proxy objective, and the robot attempts to optimize for the true objective by treating the observed proxy as evidence about the intended goal. Next, I will generalize this model to introduce Cooperative Inverse Reinforcement Learning, a general and formal model of the assistance game, and discuss the design of efficient algorithms to solve it. The talk will conclude with directions for further research, including applications to content recommendation and home robotics, the development of reliable and robust design environments for AI objectives, and the theoretical study of AI regulation by society as a value alignment problem with multiple human principals.
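The one-action instantiation, in which the robot treats the observed proxy objective as evidence about the true goal, can be sketched as a small Bayesian inference. This is a toy illustration under my own assumptions, not the talk's actual algorithm: the terrain features, the candidate rewards, and the rationality parameter `beta` are all hypothetical. The modeling assumption is that the designer probably chose a proxy whose optimal behavior scores well under the true reward.

```python
import math

# Hypothetical toy domain: a robot chooses one terrain type to drive over.
# Each terrain is a feature vector; a reward is a weight vector over features.
FEATURES = {
    "grass": (1.0, 0.0, 0.0),
    "dirt":  (0.0, 1.0, 0.0),
    "lava":  (0.0, 0.0, 1.0),
}

def dot(weights, features):
    return sum(w * f for w, f in zip(weights, features))

def best_terrain(weights):
    """The behavior a rational agent picks when literally optimizing weights."""
    return max(FEATURES, key=lambda t: dot(weights, FEATURES[t]))

def posterior_over_true_reward(proxy, candidates, beta=5.0):
    """P(true | proxy) ∝ exp(beta * value, under each candidate true reward,
    of the proxy-optimal behavior): the designer is modeled as noisily
    choosing a proxy that works well under the true reward."""
    behavior = best_terrain(proxy)
    scores = [math.exp(beta * dot(c, FEATURES[behavior])) for c in candidates]
    z = sum(scores)
    return [s / z for s in scores]

# Usage: the proxy rewards only grass; the robot infers the true reward
# is probably one under which driving on grass is good.
candidates = [(1.0, 1.0, -1.0), (-1.0, 1.0, 1.0)]
posterior = posterior_over_true_reward((1.0, 0.0, 0.0), candidates)
```

Rather than optimizing the proxy directly, a robot holding this posterior can hedge against candidate true rewards that disagree with the proxy, which is the shift in perspective the abstract describes.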

Bio: Dylan is a final-year Ph.D. student at UC Berkeley, advised by Anca Dragan, Pieter Abbeel, and Stuart Russell. His research focuses on the value alignment problem in artificial intelligence. His goal is to design algorithms that learn about and pursue the intended goals of their users, designers, and society in general. His recent work has focused on algorithms for human-robot interaction with unknown preferences and on reliability engineering for learning systems.

11:30-Noon: Round table with graduate students following the seminar, 3725 Beyster Building


Cindy Estell

Faculty Host: Satinder Baveja