| Speaker | Aleksandar Botev |
| --- | --- |
| Affiliation | Google DeepMind |
| Date | Friday, 22 March 2024 |
| Time | 12:00–13:00 |
| Location | Function Space, UCL Centre for Artificial Intelligence, 1st Floor, 90 High Holborn, London WC1V 6BH |
| Link | https://ucl.zoom.us/j/97245943682 |
| Event series | Jump Trading/ELLIS CSML Seminar Series |
| Abstract | In the last few years, transformers have become the default architecture for sequential modelling tasks such as language modelling. However, a new family of models, state space models, is challenging the status quo. In this talk we will survey recent progress on this class of models and provide context and different perspectives on them, from both a theoretical and a practical point of view. We will argue that it is not only the choice of recurrent layer that matters: the whole block design and architecture play a large role in their success. Building on this, we will present Griffin, a hybrid of the Real-Gated Linear Recurrent Unit (RG-LRU) and local attention that achieves performance comparable to state-of-the-art Transformers while being significantly faster at inference, in both latency and throughput. We will also show that these models can leverage much longer contexts than they were trained on, and we will discuss interesting implications of this. |
| Biography | Alex is a Research Scientist in Machine Learning at Google DeepMind. He has worked on generative models, second-order optimization, Bayesian methods, and machine learning applied to physics, and is now finally dabbling in LLMs. He previously completed his PhD at UCL under the supervision of David Barber. |
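For readers unfamiliar with the gated linear recurrences the abstract refers to, the JAX sketch below illustrates the basic idea. It is a minimal illustration under assumed shapes, with the gates produced in an arbitrary way for the example; the function name and gating details are illustrative and not the RG-LRU's exact formulation from the talk.

```python
import jax
import jax.numpy as jnp

def gated_linear_recurrence(x, a):
    """Scan h_t = a_t * h_{t-1} + (1 - a_t) * x_t over time.

    x: (T, D) inputs; a: (T, D) per-step gates in (0, 1).
    Unlike attention, each step touches only a fixed-size state,
    so inference runs in O(T) time with O(1) memory per step.
    """
    def step(h, inputs):
        x_t, a_t = inputs
        h = a_t * h + (1.0 - a_t) * x_t
        return h, h

    h0 = jnp.zeros(x.shape[-1])
    _, hs = jax.lax.scan(step, h0, (x, a))
    return hs  # (T, D) hidden states

# Example usage; the gates here are random, purely for illustration.
T, D = 8, 4
x = jax.random.normal(jax.random.PRNGKey(0), (T, D))
a = jax.nn.sigmoid(jax.random.normal(jax.random.PRNGKey(1), (T, D)))
h = gated_linear_recurrence(x, a)
print(h.shape)  # (8, 4)
```

Because the hidden state is a fixed-size vector rather than a growing cache of past tokens, generation needs constant memory per step, which is the source of the latency and throughput advantage over attention that the abstract mentions.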