\dm_csml_event_details
| Speaker | Alexey Dosovitskiy | 
|---|---|
| Affiliation | Google Brain | 
| Date | Friday, 27 November 2020 | 
| Time | 14:00-15:00 | 
| Location | Zoom | 
| Link | Zoom | 
| Event series | Jump Trading/ELLIS CSML Seminar Series | 
| Abstract | Convolutional networks are the workhorses of modern computer vision, thanks to their efficiency on hardware accelerators and the inductive biases suitable for processing and generating images. However, ConvNets distribute compute uniformly across the input, which makes them convenient to implement and train, but can be extremely computationally inefficient, especially on high-dimensional inputs such as video or 3D data. Moreover, representations extracted by ConvNets lack interpretability and systematic generalization. In this talk, I will present our recent work towards models that aim to avoid these shortcomings by respecting the sparse structure of the real world. On the image recognition front, we are investigating two directions: 1) architectures for learning object-centric representations either with or without supervision (Slot Attention); 2) large-scale non-convolutional models applied to real-world image recognition tasks (Vision Transformer). For image generation, we scale a recent implicit-3D-based neural rendering approach, Neural Radiance Fields, from controlled small-scale datasets to noisy large-scale real-world data (NeRF in the Wild). | 
| Biography |