Causal Foundations for Safe AI


Speaker: Tom Everitt

Affiliation: DeepMind

Date: Friday, 26 August 2022

Time: 12:00–13:00

Location: Function Space, UCL Centre for Artificial Intelligence, 1st Floor, 90 High Holborn, London WC1V 6BH

Link: https://ucl.zoom.us/j/91411912764

Event series: DeepMind/ELLIS CSML Seminar Series

Abstract

With great power comes great responsibility. Human-level+ artificial general intelligence (AGI) may become humanity’s best friend or worst enemy, depending on whether we manage to align its behavior with human interests or not. To overcome this challenge, we must identify the potential pitfalls and develop effective mitigation strategies. In this talk, I’ll argue that (Pearlian) causality offers a useful formal framework for reasoning about AI risk, and describe some of our recent work on this topic. In particular, I’ll cover causal definitions of incentives, agents, side effects, generalization, and preference manipulation, and discuss how techniques like recursion, interpretability, impact measures, incentive design, and path-specific effects can combine to address AGI risks.
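As a purely illustrative sketch (not code from the talk), the snippet below shows one simplified reading of the causal incentive definitions the abstract alludes to: in a causal influence diagram, a decision D can have an instrumental control incentive on a variable X when X lies on a directed path from D to a utility node U. The graph, node names, and helper functions here are hypothetical toy examples, and the criterion is a simplification of the graphical definitions in the literature.

    # Illustrative sketch only: toy check for a control incentive in a
    # causal influence diagram, given as directed edges over node labels.
    from collections import defaultdict

    def has_directed_path(edges, src, dst):
        """Depth-first search for a directed path src -> ... -> dst."""
        adj = defaultdict(list)
        for a, b in edges:
            adj[a].append(b)
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(adj[node])
        return False

    def control_incentive(edges, decision, var, utility):
        """Simplified criterion: var sits on a directed path
        decision -> ... -> var -> ... -> utility."""
        return (has_directed_path(edges, decision, var)
                and has_directed_path(edges, var, utility))

    # Hypothetical diagram: decision D influences a user's opinion X,
    # which in turn influences the reward U the agent is trained on.
    edges = [("D", "X"), ("X", "U"), ("D", "U")]
    print(control_incentive(edges, "D", "X", "U"))  # True: an incentive to manipulate X

In this toy graph the check returns True, mirroring the kind of preference-manipulation incentive the abstract mentions: the agent can raise its utility by steering X rather than only acting on it directly.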

Biography

Tom Everitt is a senior researcher at DeepMind, where he leads a small team working on causal approaches to AGI safety. He holds a PhD from the Australian National University, where, under the supervision of Prof. Marcus Hutter, he wrote the first PhD thesis fully focused on AGI safety.