Value functions are a core component of reinforcement learning (RL) systems. The main idea is to construct a single function approximator that estimates the long-term reward from any state. We introduce universal value function approximators (UVFAs) that generalise not just over states but also over goals. We develop an efficient technique for supervised learning of UVFAs, by factoring observed values into separate embedding vectors for state and goal, and then learning a mapping from state and goal to these factored embedding vectors. We show how this technique may be incorporated into a reinforcement learning algorithm that updates the UVFA solely from observed rewards. Finally, we demonstrate that a UVFA can successfully generalise to previously unseen goals, and can be scaled to complex RL problems such as learning to play Ms Pac-Man from pixels.
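The factorisation idea above can be sketched in a few lines: observed values for (state, goal) pairs form a matrix, which is factored into separate state and goal embeddings whose dot product reconstructs the value. The sketch below is a minimal illustration of that two-stage idea, not the paper's full pipeline (which also learns a mapping from raw state and goal features to the embeddings); the dimensions, the rank, and the synthetic value table are toy assumptions.

```python
import numpy as np

# Toy sketch of the UVFA factorisation: V(s, g) ~ phi(s) . psi(g).
# Stage 1: factor a table of observed values into rank-k embeddings via SVD.
# Stage 2: estimate the value of any (state, goal) pair by a dot product.

rng = np.random.default_rng(0)
n_states, n_goals, rank = 20, 10, 3

# Synthetic low-rank value table standing in for observed returns (assumption).
true_phi = rng.normal(size=(n_states, rank))
true_psi = rng.normal(size=(n_goals, rank))
V = true_phi @ true_psi.T

# Stage 1: truncated SVD recovers factored embeddings from the value table.
U, S, Vt = np.linalg.svd(V, full_matrices=False)
phi = U[:, :rank] * S[:rank]   # state embeddings, shape (n_states, rank)
psi = Vt[:rank].T              # goal embeddings, shape (n_goals, rank)

# Stage 2: values for every (state, goal) pair come from the dot product.
V_hat = phi @ psi.T
print(np.max(np.abs(V - V_hat)))  # near zero, since V is exactly rank 3
```

In the full method, a second regression stage maps state and goal observations to phi and psi respectively, which is what lets the approximator generalise to goals never seen during training.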
Tom Schaul is a senior researcher at Google DeepMind in London, interested in robust, general-purpose learning algorithms. He believes that progress is possible on general AI, and that games are the perfect benchmark domain for it. Tom did his PhD with Jürgen Schmidhuber at IDSIA and his postdoc with Yann LeCun at NYU. Since 2008, he has published 40 papers on reinforcement learning, neural networks, artificial curiosity, evolution, and other optimization algorithms.