25 February 2021
We are pleased to announce a new joint venture between UCL and Inria, one of the leading French research institutes in computer science and applied mathematics. Inria has launched ‘The Inria London Programme’, an initiative to establish a presence in the UK.
To be announced
Abstract: As its width tends to infinity, a deep neural network's behavior under gradient descent can become simplified and predictable (e.g. given by the Neural Tangent Kernel (NTK)), if it is parametrized appropriately (e.g. with the NTK parametrization). However, we show that the standard and NTK parametrizations of a neural network do not admit infinite-width limits that can learn representations (i.e. features), which is crucial for pretraining and transfer learning such as with BERT. We propose simple modifications to the standard parametrization to allow for feature learning in the limit. Using the Tensor Programs technique, we derive explicit formulas for such limits. On Word2Vec and few-shot learning on Omniglot via MAML, two canonical tasks that rely crucially on feature learning, we compute these limits exactly. We find that they outperform both NTK baselines and finite-width networks, with the latter approaching the infinite-width feature learning performance as width increases.
More generally, we classify a natural space of neural network parametrizations that generalizes standard, NTK, and Mean Field parametrizations. We show 1) any parametrization in this space either admits feature learning or has an infinite-width training dynamics given by kernel gradient descent, but not both; 2) any such infinite-width limit can be computed using the Tensor Programs technique.
This work is based on https://arxiv.org/abs/2011.14522.
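To make the notion of a "parametrization" concrete: the NTK parametrization draws weight entries from N(0, 1) and rescales each layer's output by 1/sqrt(fan_in), so activations and outputs stay O(1) as the width grows. The sketch below is a minimal NumPy illustration of this scaling for a one-hidden-layer network; it is not the paper's method, and the specific widths and seed are arbitrary choices for the example.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 16                                # input dimension
x = rng.standard_normal(d)

for n in (256, 4096, 65536):          # increasing hidden widths
    # NTK parametrization: weight entries are N(0, 1), and each matrix
    # multiply is rescaled by 1 / sqrt(fan_in).
    W1 = rng.standard_normal((n, d))
    w2 = rng.standard_normal(n)
    h = np.tanh(W1 @ x / np.sqrt(d))  # hidden features: entries stay O(1)
    out = w2 @ h / np.sqrt(n)         # output stays O(1) as n grows
    print(n, float(out))
```

Under this scaling the output neither vanishes nor blows up as n increases, which is what makes a well-defined infinite-width limit possible; the paper's point is that which limit you get (kernel dynamics vs. feature learning) depends on exactly how these multipliers and the learning rate scale with width.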
Abstract: While we have seen immense progress in machine learning, a critical shortcoming of current methods lies in handling distribution shift between training and deployment. Distribution shift is pervasive in real-world problems ranging from natural variation in the distribution over locations or domains, to shift in the distribution arising from different decision making policies, to shifts over time as the world changes. In this talk, I’ll discuss three general principles for tackling these forms of distribution shift: pessimism, adaptation, and anticipation. I’ll present the most general form of each principle before providing concrete instantiations of using each in practice. This will include a simple method for substantially improving robustness to spurious correlations, a framework for quickly adapting a model to a new user or domain with only unlabeled data, and an algorithm that enables robots to anticipate and adapt to shifts caused by other agents.
Bio: Chelsea Finn is an Assistant Professor in Computer Science and Electrical Engineering at Stanford University. Finn's research interests lie in the capability of robots and other agents to develop broadly intelligent behavior through learning and interaction. To this end, her work has included deep learning algorithms for concurrently learning visual perception and control in robotic manipulation skills, inverse reinforcement learning methods for scalable acquisition of nonlinear reward functions, and meta-learning algorithms that can enable fast, few-shot adaptation in both visual perception and deep reinforcement learning. Finn received her Bachelor's degree in Electrical Engineering and Computer Science at MIT and her PhD in Computer Science at UC Berkeley. Her research has been recognized through the ACM Doctoral Dissertation Award, the Microsoft Research Faculty Fellowship, the C.V. Ramamoorthy Distinguished Research Award, and the MIT Technology Review 35 Under 35 Award. Her work has been covered by various media outlets, including the New York Times, Wired, and Bloomberg. Throughout her career, she has sought to increase the representation of underrepresented minorities within CS and AI by developing an AI outreach camp at Berkeley for underprivileged high school students, creating a mentoring program for underrepresented undergraduates across four universities, and leading efforts within the WiML and Berkeley WiCSE communities of women researchers.
Abstract: We show how to do gradient-based stochastic variational inference in stochastic differential equations (SDEs), in a way that allows the use of adaptive SDE solvers. This allows us to scalably fit a new family of richly-parameterized distributions over irregularly-sampled time series. We apply latent SDEs to motion capture data and use them to demonstrate infinitely-deep Bayesian neural networks. We also discuss the pros and cons of this barely-explored model class, comparing it to Gaussian processes and neural processes.
Some technical details are in this paper: https://arxiv.org/abs/2001.01328
And code is available at: https://github.com/google-research/torchsde
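The torchsde library linked above provides differentiable (including adaptive) SDE solvers; as background for what such a solver computes, the sketch below simulates one sample path of an Ornstein-Uhlenbeck process dy = θ(μ − y) dt + σ dW with fixed-step Euler-Maruyama in plain NumPy. It is an illustrative toy, not the latent SDE model from the talk, and the parameter values are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(42)

# Ornstein-Uhlenbeck process: dy = theta * (mu - y) dt + sigma dW
theta, mu, sigma = 1.0, 0.0, 0.5

def euler_maruyama(y0, t0, t1, steps):
    """Simulate one sample path with fixed-step Euler-Maruyama."""
    dt = (t1 - t0) / steps
    y = np.empty(steps + 1)
    y[0] = y0
    for i in range(steps):
        dW = rng.normal(0.0, np.sqrt(dt))  # Brownian increment ~ N(0, dt)
        y[i + 1] = y[i] + theta * (mu - y[i]) * dt + sigma * dW
    return y

path = euler_maruyama(y0=2.0, t0=0.0, t1=5.0, steps=1000)
# The drift pulls the path from y0 = 2.0 toward mu = 0, while the
# diffusion term keeps it fluctuating around the mean.
```

A latent SDE replaces the hand-written drift and diffusion here with neural networks and fits them by stochastic variational inference; torchsde's `sdeint` plays the role of the integration loop above, with gradient support and adaptive step sizes.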
Bio: David Duvenaud is an assistant professor in computer science at the University of Toronto. His research focuses on continuous-time models, latent-variable models, and deep learning. He completed his postdoc at Harvard University and his Ph.D. at the University of Cambridge. David also co-founded Invenia, an energy forecasting company.