ELLIS Unit London

Non-convolutional architectures for recognition and generation

Speaker	Alexey Dosovitskiy
Affiliation	Google Brain
Date	Friday, 27 November 2020
Time	14:00-15:00
Location	Zoom
Link	Zoom
Event series	Jump Trading/ELLIS CSML Seminar Series
Abstract	Slides: https://www.dropbox.com/s/0s3sjpefn9e2pl9/Alexey_Dosovitskiy_Non_convolutional_architectures.pdf?dl=0

Convolutional networks are the workhorses of modern computer vision, thanks to their efficiency on hardware accelerators and the inductive biases suitable for processing and generating images. However, ConvNets distribute compute uniformly across the input, which makes them convenient to implement and train, but can be extremely computationally inefficient, especially on high-dimensional inputs such as video or 3D data. Moreover, representations extracted by ConvNets lack interpretability and systematic generalization.

In this talk, I will present our recent work towards models that aim to avoid these shortcomings by respecting the sparse structure of the real world. On the image recognition front, we are investigating two directions: 1) architectures for learning object-centric representations either with or without supervision (Slot Attention); 2) large-scale non-convolutional models applied to real-world image recognition tasks (Vision Transformer). For image generation, we scale a recent implicit-3D-based neural rendering approach, Neural Radiance Fields, from controlled small-scale datasets to noisy large-scale real-world data (NeRF in the Wild).

Join Zoom Meeting: https://ucl.zoom.us/j/97094846920?pwd=MlYvNVZTN2llM2dZZVRpRFh5a1JHZz09
Biography