
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models


Speaker

Aleksandar Botev

Affiliation

Google DeepMind

Date

Friday, 22 March 2024

Time

12:00-13:00

Location

Function Space, UCL Centre for Artificial Intelligence, 1st Floor, 90 High Holborn, London WC1V 6BH

Link

https://ucl.zoom.us/j/97245943682

Event series

Jump Trading/ELLIS CSML Seminar Series

Abstract

In the last few years, transformers have become the default architecture for sequence modelling tasks such as language modelling. However, a new family of models, state space models, is challenging this status quo. In this talk we will investigate recent progress on this class of models and provide context and different perspectives on them, from both a theoretical and a practical point of view. We will argue that it is not only the choice of recurrent layer that matters: the whole block design and architecture play a huge role in their success. With this we will present Griffin, a hybrid of the Real-Gated Linear Recurrent Unit (RG-LRU) and local attention, which matches the performance of state-of-the-art Transformers while being significantly faster at inference, in both latency and throughput. We will also show that these models can leverage much longer contexts than they were trained on, and we will discuss interesting implications of this.
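
To make the abstract concrete, below is a minimal JAX sketch of a gated (diagonal) linear recurrence of the general form the talk discusses, h_t = a_t * h_{t-1} + b_t * x_t with input-dependent gates. The function name and gate parameterization (gated_linear_recurrence, w_a, w_b, sigmoid gates) are illustrative assumptions, not the exact RG-LRU block from the Griffin paper.

# Illustrative sketch of a gated diagonal linear recurrence in JAX.
# NOT the exact RG-LRU from the Griffin paper; gates and names are assumed
# for illustration. Recurrence: h_t = a_t * h_{t-1} + b_t * x_t.
import jax
import jax.numpy as jnp

def gated_linear_recurrence(x, w_a, w_b):
    """Scan a gated linear recurrence over a sequence.

    x:   (seq_len, dim) input sequence
    w_a: (dim, dim) weights producing the forget gate a_t
    w_b: (dim, dim) weights producing the input gate b_t
    """
    def step(h, x_t):
        a_t = jax.nn.sigmoid(x_t @ w_a)  # input-dependent decay in (0, 1)
        b_t = jax.nn.sigmoid(x_t @ w_b)  # input-dependent input gate
        h = a_t * h + b_t * x_t          # elementwise (diagonal) recurrence
        return h, h

    h0 = jnp.zeros(x.shape[-1])
    _, hs = jax.lax.scan(step, h0, x)
    return hs                            # (seq_len, dim) hidden states

# Usage: a toy sequence of length 16 with hidden dimension 8.
key = jax.random.PRNGKey(0)
k1, k2, k3 = jax.random.split(key, 3)
x = jax.random.normal(k1, (16, 8))
w_a = jax.random.normal(k2, (8, 8)) * 0.1
w_b = jax.random.normal(k3, (8, 8)) * 0.1
print(gated_linear_recurrence(x, w_a, w_b).shape)  # (16, 8)

Roughly speaking, such a recurrence carries a fixed-size state from step to step, which is what makes inference cheap in latency and throughput; Griffin interleaves blocks of this kind with local (sliding-window) attention, which keeps the attention cost bounded by the window size rather than the full sequence length.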

Biography

Alex is a Research Scientist in Machine Learning at Google DeepMind. He has worked on generative models, second-order optimization, Bayesian methods, and machine learning applied to physics, and is now finally dabbling in LLMs. Previously, he completed his PhD at UCL under the supervision of David Barber.