Causal Foundations for Safe AI


Speaker: Tom Everitt

Affiliation: DeepMind

Date: Friday, 26 August 2022

Time: 12:00–13:00

Location: Function Space, UCL Centre for Artificial Intelligence, 1st Floor, 90 High Holborn, London WC1V 6BH

Link: https://ucl.zoom.us/j/91411912764

Event series: DeepMind/ELLIS CSML Seminar Series

Abstract

With great power comes great responsibility. Human-level+ artificial general intelligence (AGI) may become humanity’s best friend or worst enemy, depending on whether we manage to align its behavior with human interests or not. To overcome this challenge, we must identify the potential pitfalls and develop effective mitigation strategies. In this talk, I’ll argue that (Pearlian) causality offers a useful formal framework for reasoning about AI risk, and describe some of our recent work on this topic. In particular, I’ll cover causal definitions of incentives, agents, side effects, generalization, and preference manipulation, and discuss how techniques like recursion, interpretability, impact measures, incentive design, and path-specific effects can combine to address AGI risks.
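As a purely illustrative sketch (not code from the talk), the snippet below shows one simplified reading of the causal incentive definitions the abstract alludes to: in a causal influence diagram, a decision D can have an instrumental control incentive on a variable X when X lies on a directed path from D to a utility node U. The graph, node names, and helper functions here are hypothetical toy examples, and the criterion is a simplification of the graphical definitions in the literature.

    # Illustrative sketch only: toy check for a control incentive in a
    # causal influence diagram, given as directed edges over node labels.
    from collections import defaultdict

    def has_directed_path(edges, src, dst):
        """Depth-first search for a directed path src -> ... -> dst."""
        adj = defaultdict(list)
        for a, b in edges:
            adj[a].append(b)
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node not in seen:
                seen.add(node)
                stack.extend(adj[node])
        return False

    def control_incentive(edges, decision, var, utility):
        """Simplified criterion: var sits on a directed path
        decision -> ... -> var -> ... -> utility."""
        return (has_directed_path(edges, decision, var)
                and has_directed_path(edges, var, utility))

    # Hypothetical diagram: decision D influences a user's opinion X,
    # which in turn influences the reward U the agent is trained on.
    edges = [("D", "X"), ("X", "U"), ("D", "U")]
    print(control_incentive(edges, "D", "X", "U"))  # True: an incentive to manipulate X

In this toy graph the check returns True, mirroring the kind of preference-manipulation incentive the abstract mentions: the agent can raise its utility by steering X rather than only acting on it directly.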

Biography

Tom Everitt is a senior researcher at DeepMind, where he leads a small team working on causal approaches to AGI safety. He holds a PhD from the Australian National University, where, under the supervision of Prof. Marcus Hutter, he wrote the first PhD thesis fully focused on AGI safety.