I recently proposed the lottery ticket hypothesis: the dense neural networks we typically train contain much smaller subnetworks capable of reaching full accuracy from early in training. This hypothesis raises (1) scientific questions about the nature of overparameterization in neural network optimization and (2) practical questions about our ability to accelerate training. In this talk, I will discuss established results and the latest developments in my line of work on the lottery ticket hypothesis, including the empirical evidence for these claims on small vision tasks, the changes necessary to scale these ideas to practical settings, and the relationship between these subnetworks and their “stability” to the noise of stochastic gradient descent. I will also describe my vision for the future of research on this topic.
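These subnetworks are typically found by magnitude pruning: removing the smallest-magnitude weights and retraining what remains. As a rough, hypothetical illustration (not the exact procedure from the talk), a single pruning step over a flat weight vector might look like:

```python
def magnitude_prune_mask(weights, sparsity):
    """Toy sketch of one magnitude-pruning step.

    Returns a binary mask that drops the smallest-magnitude
    fraction of weights; in lottery-ticket experiments the
    surviving weights would then be rewound to their
    early-training values and retrained.
    """
    k = int(len(weights) * sparsity)  # number of weights to remove
    # Indices sorted by absolute magnitude, smallest first
    order = sorted(range(len(weights)), key=lambda i: abs(weights[i]))
    dropped = set(order[:k])
    return [0 if i in dropped else 1 for i in range(len(weights))]

w = [0.1, -0.9, 0.5, -0.05]
mask = magnitude_prune_mask(w, 0.5)  # keeps -0.9 and 0.5
```

In practice this step is applied iteratively, pruning a small fraction per round rather than all at once.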
Convolutional networks are the workhorses of modern computer vision, thanks to their efficiency on hardware accelerators and their inductive biases suitable for processing and generating images. However, ConvNets distribute compute uniformly across the input, which makes them convenient to implement and train but can be extremely computationally inefficient, especially on high-dimensional inputs such as video or 3D data. Moreover, representations extracted by ConvNets lack interpretability and systematic generalization. In this talk, I will present our recent work towards models that aim to avoid these shortcomings by respecting the sparse structure of the real world. On the image recognition front, we are investigating two directions: 1) architectures for learning object-centric representations either with or without supervision (Slot Attention); 2) large-scale non-convolutional models applied to real-world image recognition tasks (Vision Transformer). For image generation, we scale a recent implicit-3D-based neural rendering approach, Neural Radiance Fields, from controlled small-scale datasets to noisy large-scale real-world data (NeRF in the Wild).