Markov chain Monte Carlo (MCMC) provides the dominant methodology for inference over statistical models with non-conjugate priors. Despite a wealth of theoretical characterisation of mixing times, geometric ergodicity, and asymptotic step-sizes, the design and implementation of MCMC methods remains something of an engineering art-form. An attempt to address this issue in a systematic manner leads one to consider the geometry of probability distributions, as has been the case previously in the study of, e.g., higher-order efficiency in statistical estimators. By considering the natural Riemannian geometry of probability distributions, MCMC proposal mechanisms based on Langevin diffusions, characterised by the metric tensor and associated manifold connections, are proposed and studied. Furthermore, optimal proposals that follow the geodesic paths related to the metric are defined via the Hamilton-Jacobi approach, and these are empirically evaluated on some challenging modern-day inference tasks. Finally, the exploitation of foliations in defining proposal mechanisms for hierarchical Bayesian models provides a tantalising glimpse of a potential general methodology for efficient sampling in these notoriously challenging problems.
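A minimal sketch of a Langevin proposal preconditioned by a position-dependent metric, in the spirit of a simplified manifold MALA, may help fix ideas. It assumes a one-dimensional toy target N(2, 0.5²), a made-up metric `metric(x)` standing in for the metric tensor, and a step size `eps`; the manifold connection terms are deliberately dropped, and the Metropolis-Hastings correction keeps the target invariant regardless. It is an illustration, not the speaker's implementation.

```python
import math
import random

# Hypothetical 1-D target: unnormalised N(2, 0.5^2)
MU, SD = 2.0, 0.5

def log_target(x):
    return -0.5 * ((x - MU) / SD) ** 2

def grad_log_target(x):
    return -(x - MU) / SD ** 2

def metric(x):
    # Hypothetical position-dependent metric G(x) > 0 (a scalar in 1-D); in the
    # manifold methods this role is typically played by the expected Fisher information.
    return 4.0 * (1.0 + 0.1 * x * x)

def proposal_params(x, eps):
    # Drift and noise are both preconditioned by G(x)^{-1};
    # the connection (Christoffel-symbol) terms of the full scheme are omitted.
    g_inv = 1.0 / metric(x)
    mean = x + 0.5 * eps ** 2 * g_inv * grad_log_target(x)
    var = eps ** 2 * g_inv
    return mean, var

def log_normal_pdf(y, mean, var):
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * (y - mean) ** 2 / var

def simplified_mmala(n_samples, x0=0.0, eps=0.8, seed=0):
    rng = random.Random(seed)
    x, samples, accepted = x0, [], 0
    for _ in range(n_samples):
        mean_fwd, var_fwd = proposal_params(x, eps)
        y = rng.gauss(mean_fwd, math.sqrt(var_fwd))
        mean_bwd, var_bwd = proposal_params(y, eps)
        # Metropolis-Hastings correction for the asymmetric, position-dependent proposal
        log_alpha = (log_target(y) + log_normal_pdf(x, mean_bwd, var_bwd)
                     - log_target(x) - log_normal_pdf(y, mean_fwd, var_fwd))
        if rng.random() < math.exp(min(0.0, log_alpha)):
            x, accepted = y, accepted + 1
        samples.append(x)
    return samples, accepted / n_samples
```

Calling `simplified_mmala(20000)` returns the chain and its acceptance rate; with a constant metric the scheme reduces to ordinary preconditioned MALA.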

Slides for the talk: PDF

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20120427T123000 DTEND;TZID=/Europe/London:20120427T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20120511123000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Tom Furmston DESCRIPTION:Tom Furmston (UCL): Gradient-based algorithms for policy search\n\nLocation: Zoom\n\nLink: Wilkins Haldane Room, Wilkins Building\n\nAbstract:Gradient-based algorithms are among the methods of choice for optimising Markov Decision Processes. In this talk we will present a novel approximate Newton algorithm for the optimisation of such models. The algorithm has several desirable properties compared with the naive application of Newton's method. Firstly, the approximate Hessian is guaranteed to be negative-semidefinite over the entire parameter space whenever the controller is $\log$-concave in the control parameters. Additionally, the inference required for our approximate Newton method is often the same as that required for first-order methods, such as steepest gradient ascent. The approximate Hessian also has useful sparsity properties that are not present in the Hessian and that make its inversion efficient in many situations of interest. We also provide an analysis that highlights a relationship between our approximate Newton method and both Expectation Maximisation and natural gradient ascent. Empirical results suggest that the algorithm has excellent convergence and robustness properties. Time permitting, we will then go on to the problem of performing inference in gradient-based algorithms, where we shall focus on model-based inference.
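As a toy illustration (our assumption of the setting, not the talk's algorithm), take a one-step problem with a Gaussian policy a ~ N(θ, σ²), which is log-concave in θ, and use an approximate Hessian of the reward-weighted form E[R(a) ∂²/∂θ² log π(a; θ)]. With non-negative rewards this equals -E[R]/σ², so it is never positive, and the Newton step collapses to a reward-weighted average of the sampled actions, loosely echoing the EM connection mentioned in the abstract. The reward function and all constants are hypothetical.

```python
import math
import random

def reward(a):
    # Hypothetical non-negative reward, peaked at a = 3
    return math.exp(-(a - 3.0) ** 2)

def approx_newton_step(theta, sigma, n_samples, rng):
    """One approximate Newton step for a Gaussian policy a ~ N(theta, sigma^2)."""
    grad = hess = 0.0
    for _ in range(n_samples):
        a = rng.gauss(theta, sigma)
        r = reward(a)
        grad += r * (a - theta) / sigma ** 2   # R(a) * d/dtheta log pi(a; theta)
        hess += r * (-1.0 / sigma ** 2)        # R(a) * d2/dtheta2 log pi(a; theta), never positive
    grad /= n_samples
    hess /= n_samples
    # Newton update theta - g / H; here it equals the reward-weighted mean of the sampled actions
    return theta - grad / hess

def optimise(theta0=0.0, sigma=1.0, iters=20, n_samples=5000, seed=1):
    rng = random.Random(seed)
    theta = theta0
    for _ in range(iters):
        theta = approx_newton_step(theta, sigma, n_samples, rng)
    return theta
```

Starting from θ = 0, `optimise()` moves the policy mean towards the reward peak at 3 in a handful of iterations.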

Slides for the talk: PDF

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20120511T123000 DTEND;TZID=/Europe/London:20120511T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20120525123000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Gabi Teodoru DESCRIPTION:Gabi Teodoru (UCL): Spectral Learning of Latent Variable Models and its Interpretation as an Optimization Problem\n\nLocation: Zoom\n\nLink: Darwin B15 Biochemistry LT\n\nAbstract:Spectral learning is a novel method for learning latent variable models (e.g. hidden Markov models, Kalman filters). In the limit of infinite data, the spectral learning algorithm is able to identify the true model parameters. This is unlike the more popular Expectation Maximization (EM) algorithm, which typically optimizes a non-convex cost function and can therefore fail to identify the true parameters even in the limit of infinite data, because the optimization can get stuck in local optima.

Previous work has applied spectral learning to HMMs and Kalman filters, as well as to tree-structured graphs. It has also established the consistency of the algorithm and provided finite-sample bounds. We take a step further and reinterpret the spectral learning algorithm as an optimization problem. This provides several advantages: it allows more efficient use of the data, makes it possible to add regularizers to the cost function or to use the generalized method of moments cost function instead, and allows us to extend the method to other models for which no convex cost function exists.

Slides for the talk: PDF

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20120525T123000 DTEND;TZID=/Europe/London:20120525T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20120611120000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Larry Wasserman DESCRIPTION:Larry Wasserman (Carnegie Mellon University): Discussion\n\nLocation: Zoom\n\nLink: Malet Place Eng 1.20\n\nAbstract:A great opportunity to interact closely with a researcher of Larry's caliber and ask him lots of questions. Since he is already giving various talks and lectures, there will be no talk, just lunch, so prepare the questions you would like to ask. We are giving you a heads-up now so that you have time to prepare and perhaps read about his current research. We will send another email closer to the date. We look forward to seeing you all there.

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20120611T120000 DTEND;TZID=/Europe/London:20120611T133000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20120612123000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Shivani Lamba (Founder/CEO of Chechako) and Marshall Levine (Wise Counsel for Chechako Ltd) DESCRIPTION:Shivani Lamba (Founder/CEO of Chechako) and Marshall Levine (Wise Counsel for Chechako Ltd) (Multiple): Startup Pitch\n\nLocation: Zoom\n\nLink: Darwin Biochemistry LT\n\nAbstract:**Startup Pitch**: An online competition platform that intelligently identifies the best new (cross-industry) talent using social data points and predictive analytics

**The Concept**

Talent on social media is mostly latent but potentially lucrative. Companies across film, music, publishing and sport use current social networks, or create their own, to source untapped talent from around the world.

As yet, there is no single dedicated interface which connects organisations to users in a way that creates tangible assets, or uses analytics to propel this process forward.

We are currently negotiating with several high-visibility "scout" organisations to become part of our platform and use our technology, including: constituent members of the Independent Publishers Guild; Warner Music Group; and Film Tank.

**Potential Partnership with CSML**

When the platform launches in late 2012, it will integrate innovative technology to help companies recruit talent more intelligently.

As we have received interest from multiple investors (including NESTA) and will be applying for grants from the UK Technology Strategy Board, we are looking to build a partnership with CSML and recruit a team from the Centre.

Please come along if you'd like to become an integral part of a startup with tremendous potential.

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20120612T123000 DTEND;TZID=/Europe/London:20120612T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20120622123000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Adam Sykulski DESCRIPTION:Adam Sykulski (UCL): Statistical modelling and estimation of physical phenomena in ocean surface trajectories\n\nLocation: Zoom\n\nLink: Darwin B15 Biochemistry LT\n\nAbstract:We have a large data set from 10,000 or so floating devices that drift around the Earth's oceans. Summarising these 'ocean surface trajectories' is best done using time series analysis. In this talk we provide a general introduction to time series analysis, with a focus on complex-valued time series and spectral domain analysis, which is particularly useful for ocean data (and indeed many other applications). We then construct a semi-parametric statistical model for ocean surface trajectories, composed of a combination of stochastic processes. We demonstrate the usefulness of our model for detecting and summarising changes in the behaviour of the data, which helps oceanographers understand how ocean movement changes over time and space.
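A common first step with complex-valued trajectory data is to write the horizontal velocity as z_t = u_t + i v_t and compute its periodogram over both positive and negative frequencies. For a real-valued series the periodogram is symmetric, so this rotary view is what separates clockwise from anticlockwise motion. The sketch below uses a naive O(N²) DFT on a synthetic circular oscillation; the signal, its frequency and the sign convention (negative frequency corresponds to clockwise rotation under this DFT definition) are assumptions for illustration.

```python
import cmath
import math

def periodogram(z):
    """|sum_t z_t exp(-2 pi i f t)|^2 / N at the Fourier frequencies f = k/N in [-1/2, 1/2)."""
    n = len(z)
    result = []
    for k in range(-(n // 2), n - n // 2):
        f = k / n
        dft = sum(z[t] * cmath.exp(-2j * math.pi * f * t) for t in range(n))
        result.append((f, abs(dft) ** 2 / n))
    return result

# Synthetic clockwise circular motion: u_t = cos(2 pi f0 t), v_t = -sin(2 pi f0 t)
n, f0 = 64, 0.125
z = [complex(math.cos(2 * math.pi * f0 * t), -math.sin(2 * math.pi * f0 * t)) for t in range(n)]
peak_f, peak_power = max(periodogram(z), key=lambda pair: pair[1])
print(peak_f)  # -0.125: all the energy sits at a single negative frequency
```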

Slides for the talk: PDF

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20120622T123000 DTEND;TZID=/Europe/London:20120622T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20120702123000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Yuan (Alan) Qi DESCRIPTION:Yuan (Alan) Qi (Purdue University): Bayesian learning with big data: virtual vector machines and Gaussian processes with sparse eigenvalues\n\nLocation: Zoom\n\nLink: Darwin B15 Biochemistry LT\n\nAbstract:Title: Bayesian learning with big data: virtual vector machines and Gaussian processes with sparse eigenvalues

Abstract:

In this talk I will cover two topics that have become increasingly important given big data: online learning and sparse Gaussian process models. First, in a typical online learning scenario, a learner is required to process a large data stream using a small memory buffer. Such a requirement is usually in conflict with a learner’s primary pursuit of prediction accuracy. To address this dilemma, we introduce a novel Bayesian online classification algorithm, called the Virtual Vector Machine. The virtual vector machine allows you to smoothly trade off prediction accuracy against memory size. The virtual vector machine summarizes the information contained in the preceding data stream by a Gaussian distribution over the classification weights plus a constant number of virtual data points. The extra information provided by the virtual points leads to improved predictive accuracy over previous online classification algorithms. Second, we propose a sparse Gaussian process model, EigenGP, based on Karhunen-Loeve (KL) expansions of a GP prior. We use the Nystrom approximation to obtain eigenfunctions of the covariance function and use an empirical Bayesian approach to select these eigenfunctions. By selecting eigenfunctions of Gaussian kernels that are associated with data clusters, EigenGP is also suitable for semi-supervised learning. Our experimental results demonstrate improved predictive performance of EigenGP over several state-of-the-art sparse GP and semi-supervised learning methods for regression, classification, and semi-supervised classification.
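The Nystrom step can be sketched as follows: eigen-decompose the kernel matrix on a small set of m inducing points, then extend each eigenvector u with eigenvalue λ to a function of x via φ(x) = (√m / λ) Σ_j k(x, x_j) u_j. The Gaussian kernel, its length-scale, the inducing points and the use of power iteration for only the leading eigenpair are assumptions chosen to keep the example dependency-free; this is not the EigenGP implementation, and the empirical Bayesian selection of eigenfunctions is not shown.

```python
import math

def rbf(x, y, ell=1.0):
    return math.exp(-0.5 * (x - y) ** 2 / ell ** 2)

def matvec(K, v):
    return [sum(kij * vj for kij, vj in zip(row, v)) for row in K]

def leading_eigenpair(K, iters=500):
    # Power iteration: adequate for the top eigenpair of a small symmetric PSD matrix
    u = [1.0] * len(K)
    for _ in range(iters):
        w = matvec(K, u)
        norm = math.sqrt(sum(x * x for x in w))
        u = [x / norm for x in w]
    lam = sum(ui * wi for ui, wi in zip(u, matvec(K, u)))  # Rayleigh quotient
    return lam, u

def nystrom_eigenfunction(x, inducing, lam, u, ell=1.0):
    # Nystrom extension of the eigenvector u to an arbitrary input x
    m = len(inducing)
    return math.sqrt(m) / lam * sum(rbf(x, z, ell) * uj for z, uj in zip(inducing, u))

inducing = [-2.0, -1.0, 0.0, 1.0, 2.0]   # hypothetical inducing inputs
K = [[rbf(a, b) for b in inducing] for a in inducing]
lam, u = leading_eigenpair(K)
values = [nystrom_eigenfunction(x, inducing, lam, u) for x in (-1.5, 0.0, 0.5)]
```

At the inducing points the extension reproduces √m times the eigenvector entries, which is a handy sanity check.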

Slides for the talk: PDF

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20120702T123000 DTEND;TZID=/Europe/London:20120702T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20120713123000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Vinayak Rao DESCRIPTION:Vinayak Rao (UCL): Efficient MCMC for Continuous Time Discrete State Systems\n\nLocation: Zoom\n\nLink: TBA\n\nAbstract:A variety of phenomena are best described using dynamical models that operate on a discrete state space and in continuous time. Examples include Markov jump processes, continuous time Bayesian networks, renewal processes and other point processes, with applications in systems biology, genetics, computer networks and human-computer interaction. Posterior computations typically involve approximations such as time discretization and can be computationally intensive.

In this talk I will describe recent work on a class of Markov chain Monte Carlo methods that allow efficient computation while remaining exact. The core idea is to use an auxiliary variable Gibbs sampler based on uniformization, a representation of a continuous time dynamical system as a Markov chain operating over a discrete set of points drawn from a Poisson process.

Joint work with Yee Whye Teh.
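Uniformization itself is easy to state: pick a rate Ω ≥ max_i |A_ii| for the jump-process generator A, draw candidate times from a Poisson process of rate Ω on [0, T], and run a discrete-time chain with transition matrix B = I + A/Ω at those times, where self-transitions are "virtual" events. The sketch below only simulates a path this way; the Gibbs sampler described above additionally resamples the candidate times and the states given observations, which is not shown. The generator and the choice Ω = 2 max_i |A_ii| are illustrative assumptions.

```python
import random

def uniformized_transition_matrix(A, omega):
    # B = I + A / omega; rows sum to one because the rows of a generator sum to zero
    n = len(A)
    return [[(1.0 if i == j else 0.0) + A[i][j] / omega for j in range(n)] for i in range(n)]

def sample_mjp_path(A, T, x0, rng, omega=None):
    """Simulate a Markov jump path on [0, T] via uniformization; returns (jump time, state) pairs."""
    n = len(A)
    if omega is None:
        omega = 2.0 * max(-A[i][i] for i in range(n))   # any omega >= max_i |A_ii| is valid
    B = uniformized_transition_matrix(A, omega)
    # Candidate event times from a homogeneous Poisson process of rate omega
    times, t = [], rng.expovariate(omega)
    while t < T:
        times.append(t)
        t += rng.expovariate(omega)
    # Discrete-time Markov chain with transition matrix B run over the candidate times
    state, path = x0, [(0.0, x0)]
    for t in times:
        new_state = rng.choices(range(n), weights=B[state])[0]
        if new_state != state:    # self-transitions are virtual jumps and are dropped
            path.append((t, new_state))
        state = new_state
    return path

A = [[-1.0, 1.0], [2.0, -2.0]]   # hypothetical generator of a two-state chain
path = sample_mjp_path(A, 10.0, 0, random.Random(0))
```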

Slides for the talk: PDF

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20120713T123000 DTEND;TZID=/Europe/London:20120713T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20120928123000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Janaina Mourao-Miranda, Jane Maryam Rondina, Maria Joao Rosa DESCRIPTION:Janaina Mourao-Miranda, Jane Maryam Rondina, Maria Joao Rosa (UCL): Machine learning approaches for clinical neuroimaging data\n\nLocation: Zoom\n\nLink: Cruciform B404 - LT2\n\nAbstract:Our group is interested in developing mathematical models and tools for applying novel machine learning techniques to the analysis of brain imaging data. We focus on the diagnosis and prognosis of psychiatric disorders and on understanding affective processing in healthy and patient groups. In this talk we will describe the machine learning framework we use for pattern recognition analysis of neuroimaging data. We will introduce the relevant clinical questions that can be addressed with this framework and briefly describe our current methodological developments for investigating these questions, such as new feature selection, multivariate brain mapping and generative embedding approaches. We will also present our recently developed software, the Pattern Recognition for Neuroimaging Toolbox (PRoNTo).

Slides for the talk: Part 1, Part 2, Part 3

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20120928T123000 DTEND;TZID=/Europe/London:20120928T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20121012123000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Jan Gasthaus DESCRIPTION:Jan Gasthaus (UCL): Hierarchical Bayesian Nonparametric Models for Sequences\n\nLocation: Zoom\n\nLink: Cruciform B404 - LT2\n\nAbstract:Hierarchical Bayesian nonparametric models based on the Dirichlet process (DP) or the Pitman-Yor process (PYP) have recently become popular because they provide a flexible framework for expressing prior beliefs over sets of related probability measures. One area where this approach has been particularly effective is sequence modeling in general and language modeling (i.e. modeling sequences of words in natural language text) in particular, where the dependencies between context-dependent probability distributions can naturally be modeled using a context tree hierarchy, and the power-law properties of the PYP prior match those found in natural language data. I will present the basic hierarchical PYP model, its extension to infinitely deep context trees (dubbed the "Sequence Memoizer"), and recent developments for modeling multi-domain data and non-stationary sequences.
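The building block of these models is the Pitman-Yor Chinese restaurant predictive rule. With discount d, concentration θ, c_w customers and t_w tables serving word w in a context (c and t the totals), P(w) = (c_w - d t_w)/(θ + c) + (θ + d t)/(θ + c) · P_parent(w), where the parent is the distribution for the shortened context. The sketch hard-codes a two-level hierarchy over a four-word vocabulary with a uniform base distribution and assumes the seating arrangements (table counts) are given; sampling seatings, which inference requires, is omitted, and all counts are hypothetical.

```python
def pyp_predictive(word, customers, tables, discount, concentration, parent_prob):
    """Predictive probability of `word` in one restaurant of a hierarchical Pitman-Yor process.

    customers/tables: dicts mapping word -> counts in this restaurant (assumed given).
    parent_prob: probability of `word` under the parent (shorter-context) distribution.
    """
    c = sum(customers.values())
    t = sum(tables.values())
    if c == 0:
        return parent_prob   # an empty restaurant defers entirely to its parent
    c_w = customers.get(word, 0)
    t_w = tables.get(word, 0)
    return ((c_w - discount * t_w) + (concentration + discount * t) * parent_prob) / (concentration + c)

VOCAB = ["a", "b", "c", "d"]
BASE = 1.0 / len(VOCAB)   # uniform base distribution (assumption)
D, THETA = 0.5, 1.0       # hypothetical discount and concentration

# Hypothetical seating arrangements for the empty context and for one longer context
unigram_customers, unigram_tables = {"a": 3, "b": 1}, {"a": 1, "b": 1}
context_customers, context_tables = {"a": 1}, {"a": 1}

def prob_in_context(word):
    p_parent = pyp_predictive(word, unigram_customers, unigram_tables, D, THETA, BASE)
    return pyp_predictive(word, context_customers, context_tables, D, THETA, p_parent)

print({w: round(prob_in_context(w), 3) for w in VOCAB})
# {'a': 0.7, 'b': 0.15, 'c': 0.075, 'd': 0.075}
```

Unseen words in the longer context still receive probability through the parent, which is the back-off behaviour that makes deep context trees usable.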

Slides for the talk: PDF

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20121012T123000 DTEND;TZID=/Europe/London:20121012T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20121026123000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Dino Sejdinovic DESCRIPTION:Dino Sejdinovic (UCL): Equivalence of distance-based and RKHS-based statistics in hypothesis testing\n\nLocation: Zoom\n\nLink: Cruciform B404 - LT2\n\nAbstract:We provide a unifying framework linking two classes of statistics used in two-sample and independence testing: on the one hand, the energy distances and distance covariances from the statistics literature; on the other, Maximum Mean Discrepancies (MMD), i.e., distances between embeddings of distributions into reproducing kernel Hilbert spaces (RKHS), as established in machine learning. In the case where the energy distance is computed with a semimetric of negative type, a positive definite kernel, termed the distance kernel, may be defined such that the MMD corresponds exactly to the energy distance. Conversely, for any positive definite kernel, we can interpret the MMD as an energy distance with respect to some negative-type semimetric. This equivalence readily extends to the case of independence testing using kernels on the product space. We determine the class of probability distributions for which the test statistics are consistent against all alternatives. Finally, we investigate the performance of the family of distance kernels in two-sample and independence tests.
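The equivalence can be checked numerically. A sketch, assuming 1-D data, the Euclidean distance ρ as the negative-type semimetric, and the distance-induced kernel k(x, y) = ½(ρ(x, z₀) + ρ(y, z₀) − ρ(x, y)) with an arbitrary centre z₀: with V-statistic (biased) estimators on both sides, the empirical energy distance equals exactly twice the empirical MMD², whatever z₀ is. The Gaussian samples and their sizes are made up.

```python
import random

def dist(x, y):
    return abs(x - y)   # Euclidean distance in 1-D, a semimetric of negative type

def energy_distance(xs, ys):
    exy = sum(dist(x, y) for x in xs for y in ys) / (len(xs) * len(ys))
    exx = sum(dist(a, b) for a in xs for b in xs) / len(xs) ** 2
    eyy = sum(dist(a, b) for a in ys for b in ys) / len(ys) ** 2
    return 2.0 * exy - exx - eyy

def distance_kernel(x, y, z0=0.0):
    # Distance-induced kernel centred at an arbitrary point z0
    return 0.5 * (dist(x, z0) + dist(y, z0) - dist(x, y))

def mmd2_biased(xs, ys, kernel):
    # V-statistic estimate of MMD^2
    kxx = sum(kernel(a, b) for a in xs for b in xs) / len(xs) ** 2
    kyy = sum(kernel(a, b) for a in ys for b in ys) / len(ys) ** 2
    kxy = sum(kernel(x, y) for x in xs for y in ys) / (len(xs) * len(ys))
    return kxx + kyy - 2.0 * kxy

rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(200)]
ys = [rng.gauss(0.5, 1.0) for _ in range(150)]
print(energy_distance(xs, ys), 2.0 * mmd2_biased(xs, ys, distance_kernel))
```

The two printed numbers agree up to floating-point rounding, and changing `z0` leaves the MMD unchanged.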

Slides for the talk: PDF

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20121026T123000 DTEND;TZID=/Europe/London:20121026T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20121109123000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Steffen Grunewalder DESCRIPTION:Steffen Grunewalder (UCL): Conditional Expectation Estimates for Discrete Control\n\nLocation: Zoom\n\nLink: Cruciform B404 - LT2\n\nAbstract:I will talk about estimates of conditional expectations and their application to control problems in discrete time and space. I will show the empirical performance of the approach and sketch a proof of convergence to the best possible strategy. If time permits, I will discuss the setting with discrete time and continuous space and show how the proof can be extended to this case.

Joint work with Guy Lever, Luca Baldassarre, Massi Pontil & Arthur Gretton.

In the first half of this talk, Ben Calderhead will give an introduction to Markov chain Monte Carlo methods, in which he will demonstrate the main challenges encountered when performing inference over many of the complex statistical models that are of interest in current biological scientific research. He will then introduce the connection between statistical models and Riemannian geometry, and show how this allows far more efficient MCMC algorithms to be developed. Finally he will discuss his very recent NIPS paper, which presents a sampling scheme based on a Langevin type diffusion that approximates the local Riemannian geometry. This work extends differential geometric MCMC methods to statistical models where the metric tensor (given by the Expected Fisher Information) is analytically intractable.

In the second part, Simon Byrne will talk about MCMC methods over embedded manifolds. Embedded manifolds, such as simplices and hyperspheres, arise in a variety of statistical models, but can often be difficult to work with computationally. He will describe the Hamiltonian Monte Carlo algorithm and how it may be modified to operate on these complicated spaces by exploiting their unique geometric structure. Applications include dimension reduction models such as mixture models and latent factor models.
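For the sphere, the geodesic flow is available in closed form, which is what makes this kind of HMC practical. A minimal sketch on the unit sphere S² in R³, assuming a von Mises-Fisher target with log-density κ μᵀx (κ and μ are made up): momenta are kept in the tangent space by projecting with I − x xᵀ, position updates follow great circles exactly, and a Metropolis test on the Hamiltonian corrects the discretisation error. This is an illustration in the spirit of the approach described above, not the speakers' code.

```python
import math
import random

KAPPA = 5.0               # hypothetical concentration
MU = (0.0, 0.0, 1.0)      # hypothetical mean direction

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def add_scaled(x, alpha, y):
    # Returns x + alpha * y
    return [xi + alpha * yi for xi, yi in zip(x, y)]

def project(x, v):
    # Tangent-space projection (I - x x^T) v for a unit vector x
    return add_scaled(v, -dot(x, v), x)

def log_target(x):
    return KAPPA * dot(MU, x)

def grad_log_target(x):
    return [KAPPA * m for m in MU]

def geodesic_flow(x, v, t):
    # Exact great-circle motion for time t from x with tangent velocity v
    speed = math.sqrt(dot(v, v))
    if speed == 0.0:
        return list(x), list(v)
    c, s = math.cos(speed * t), math.sin(speed * t)
    x_new = [xi * c + vi / speed * s for xi, vi in zip(x, v)]
    v_new = [-xi * speed * s + vi * c for xi, vi in zip(x, v)]
    norm = math.sqrt(dot(x_new, x_new))   # guard against floating-point drift
    return [xi / norm for xi in x_new], v_new

def gmc_step(x, eps, n_steps, rng):
    v = project(x, [rng.gauss(0.0, 1.0) for _ in range(3)])
    h_old = -log_target(x) + 0.5 * dot(v, v)
    x_new, v_new = x, v
    for _ in range(n_steps):
        v_new = project(x_new, add_scaled(v_new, 0.5 * eps, grad_log_target(x_new)))
        x_new, v_new = geodesic_flow(x_new, v_new, eps)
        v_new = project(x_new, add_scaled(v_new, 0.5 * eps, grad_log_target(x_new)))
    h_new = -log_target(x_new) + 0.5 * dot(v_new, v_new)
    if rng.random() < math.exp(min(0.0, h_old - h_new)):
        return x_new, True
    return x, False

def sample(n_iter, eps=0.2, seed=0):
    rng = random.Random(seed)
    x, draws, accepted = [1.0, 0.0, 0.0], [], 0
    for _ in range(n_iter):
        # Randomising the number of steps avoids near-periodic trajectories
        x, ok = gmc_step(x, eps, rng.randint(5, 15), rng)
        accepted += ok
        draws.append(x)
    return draws, accepted / n_iter
```

Every draw stays exactly on the sphere, so no reparametrisation or constraint handling is needed.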

Slides for the talk: Ben Calderhead (PDF), Simon Byrne (PDF)

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20121123T123000 DTEND;TZID=/Europe/London:20121123T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20121129130000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Juan Carlos Martinez-Ovando DESCRIPTION:Juan Carlos Martinez-Ovando (Banco de México): Non- and semi-parametric construction of stationary dependent models\n\nLocation: Zoom\n\nLink: Bedford Way LG04\n\nAbstract:In this talk we present a procedure for constructing stationary dependent models based on latent probability measures. The idea focuses on developing first-order dependent models from particular conditional independence structures. We address the idea by introducing a fully non-parametric Markov model in discrete time, as well as some other extensions incorporating exogenous interventions and multivariate versions in a semi-parametric setting. We also sketch some ideas for performing inference and prediction within the Bayesian framework.

(Joint work with Stephen G. Walker, U. Kent)
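A classic parametric example of building a stationary first-order model through a latent variable is the Gamma-Poisson chain: given x_{t-1}, draw y ~ Poisson(c x_{t-1}), then x_t | y ~ Gamma(a + y, rate b + c). Because (x_{t-1}, y) and (x_t, y) share the same joint distribution, the Gamma(a, b) marginal is preserved at every step and the lag-one autocorrelation is c / (b + c). The talk's construction is non-parametric; this sketch, with made-up a, b and c, only illustrates the conditional-independence idea.

```python
import math
import random

A, B, C = 2.0, 1.0, 1.0   # hypothetical shape, rate and coupling parameters

def poisson(lam, rng):
    # Knuth's multiplication method; adequate for the small rates used here
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def simulate_chain(n, seed=0):
    rng = random.Random(seed)
    x = rng.gammavariate(A, 1.0 / B)   # start in the stationary law (gammavariate takes a scale)
    path = []
    for _ in range(n):
        y = poisson(C * x, rng)                       # latent link: y | x ~ Poisson(C x)
        x = rng.gammavariate(A + y, 1.0 / (B + C))    # x' | y ~ Gamma(A + y, rate B + C)
        path.append(x)
    return path

path = simulate_chain(20000)
```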

Slides for the talk: PDF

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20121129T130000 DTEND;TZID=/Europe/London:20121129T143000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20130111123000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Ed Challis DESCRIPTION:Ed Challis (UCL): Variational approximate inference in linear latent variable models\n\nLocation: Zoom\n\nLink: Cruciform B404 - LT2\n\nAbstract:Linear latent variable models (such as factor analysis and probabilistic principal components analysis) and Bayesian generalized linear models (such as logistic regression and noise robust linear regression) are used widely throughout Machine Learning and Statistics. However, in all but the simplest cases inference remains computationally intractable.

This talk will focus on parametric Kullback-Leibler approximate inference methods as applied to such models. Parametric Kullback-Leibler approximate inference provides both a parametric approximation to the intractable posterior and a lower bound on its normalisation constant. I will present my work on developing Gaussian KL approximate inference methods and introduce a new flexible approximating density class for which parametric KL inference is tractable and efficient.
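For a one-dimensional toy posterior the bound can be evaluated directly. The sketch assumes a hypothetical Bayesian logistic regression with a single weight, a N(0, 1) prior and three made-up data points. It computes log Z by brute-force quadrature on a grid and maximises the Gaussian KL bound E_q[log p(w, D)] + H[q] over q = N(m, s²) by a crude grid search; practical implementations would use gradients and multivariate Gaussians.

```python
import math

XS = [1.0, -0.5, 2.0]   # hypothetical inputs
YS = [1, -1, 1]         # hypothetical labels in {-1, +1}

def log_sigmoid(z):
    # Numerically stable log(1 / (1 + exp(-z)))
    return -math.log1p(math.exp(-z)) if z >= 0 else z - math.log1p(math.exp(z))

def log_joint(w):
    # log N(w; 0, 1) + sum_n log sigmoid(y_n x_n w): the unnormalised log posterior
    log_prior = -0.5 * w * w - 0.5 * math.log(2.0 * math.pi)
    return log_prior + sum(log_sigmoid(y * x * w) for x, y in zip(XS, YS))

STEP = 0.01
GRID = [-10.0 + STEP * i for i in range(2001)]
LOG_JOINT = [log_joint(w) for w in GRID]

def log_evidence():
    # Brute-force log Z by quadrature on the grid (only feasible in very low dimension)
    return math.log(sum(math.exp(v) for v in LOG_JOINT) * STEP)

def gaussian_kl_bound(m, s):
    # E_q[log p(w, D)] + H[q] for q = N(m, s^2); the expectation uses the same grid
    weights = [math.exp(-0.5 * ((w - m) / s) ** 2) / (s * math.sqrt(2.0 * math.pi)) for w in GRID]
    expected_log_joint = sum(q * lj for q, lj in zip(weights, LOG_JOINT)) * STEP
    entropy = 0.5 * math.log(2.0 * math.pi * math.e * s * s)
    return expected_log_joint + entropy

# Grid search over the variational mean m and standard deviation s
best = max((gaussian_kl_bound(m / 10, s / 10), m / 10, s / 10)
           for m in range(-20, 21) for s in range(1, 21))
print(best, log_evidence())
```

The best bound sits just below log Z; the gap is the KL divergence from the Gaussian to the true posterior.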

Slides for the talk: PDF

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20130111T123000 DTEND;TZID=/Europe/London:20130111T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20130125123000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Andriy Mnih DESCRIPTION:Andriy Mnih (UCL): A fast and simple algorithm for training neural probabilistic language models\n\nLocation: Zoom\n\nLink: Cruciform B404 - LT2\n\nAbstract:In spite of their superior performance, neural probabilistic language models (NPLMs) remain far less widely used than n-gram models due to their notoriously long training times, which are measured in weeks even for moderately-sized datasets. Training NPLMs is computationally expensive because they are explicitly normalized, which leads to having to consider all words in the vocabulary when computing the log-likelihood gradients.

We propose a fast and simple algorithm for training NPLMs based on noise-contrastive estimation, a newly-introduced procedure for estimating unnormalized continuous distributions. We investigate the behaviour of the algorithm on the Penn Treebank corpus and show that it reduces the training times by more than an order of magnitude without affecting the quality of the resulting models. The algorithm is also more efficient and much more stable than importance sampling because it requires far fewer noise samples to perform well.

We demonstrate the scalability of the proposed approach by training several neural language models on a 47M-word corpus with an 80K-word vocabulary, obtaining state-of-the-art results on the Microsoft Research Sentence Completion Challenge dataset.
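The key property of noise-contrastive estimation can be seen on a context-free toy problem. Each word gets a free score s(w), exp(s(w)) is treated as the model probability without any normaliser, and a logistic classifier separates data words from k noise words with logit s(w) − log(k p_noise(w)). The sketch below follows the exact expected gradient rather than sampling noise words, so it is deterministic; in practice the noise term is estimated with k sampled noise words and the scores come from a neural network. The word frequencies, the uniform noise distribution, k and the learning rate are assumptions.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def nce_expected_gradient(scores, p_data, p_noise, k):
    """Gradient of the expected NCE objective
        sum_w p_data(w) log sig(D(w)) + k * sum_w p_noise(w) log(1 - sig(D(w))),
    where D(w) = score(w) - log(k * p_noise(w)) and exp(score) is an unnormalised model."""
    grads = []
    for s, pd, pn in zip(scores, p_data, p_noise):
        sig = sigmoid(s - math.log(k * pn))
        grads.append(pd * (1.0 - sig) - k * pn * sig)
    return grads

def train(p_data, p_noise, k=5, lr=1.0, steps=5000):
    scores = [0.0] * len(p_data)   # one free score per word, no partition function anywhere
    for _ in range(steps):
        grads = nce_expected_gradient(scores, p_data, p_noise, k)
        scores = [s + lr * g for s, g in zip(scores, grads)]
    return scores

p_data = [0.5, 0.3, 0.15, 0.05]   # hypothetical word frequencies
p_noise = [0.25] * 4              # uniform noise distribution (assumption)
scores = train(p_data, p_noise)
print([round(math.exp(s), 4) for s in scores])  # [0.5, 0.3, 0.15, 0.05]
```

At the optimum exp(s(w)) matches the data distribution, so the learned model comes out self-normalised even though no sum over the vocabulary is ever computed.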

Slides for the talk: PDF

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20130125T123000 DTEND;TZID=/Europe/London:20130125T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20130208123000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Gary Macindoe DESCRIPTION:Gary Macindoe (UCL): A hybrid Cholesky decomposition algorithm for multicore CPUs with GPU accelerators\n\nLocation: Zoom\n\nLink: Cruciform B404 - LT2\n\nAbstract:The Cholesky decomposition appears throughout computational statistics and is often the performance bottleneck of the algorithms that use it. As the number of cores available in a processor increases, algorithms need to be redesigned to extract performance by running operations in parallel rather than relying on increases in clock speed. In addition, graphics processing units are capable of executing tens of thousands of operations in parallel and are no longer restricted to graphical calculations.

We have developed a Cholesky decomposition algorithm for multi-core CPUs and GPUs. We introduce a new method of copying submatrices and use it to have the GPU and CPU calculate the matrix in parallel. We add a new level of dynamic blocking that matches the workload to the compute device at each iteration and also exploit the differences between SIMD and SIMT programming to have multiple functions execute simultaneously on older classes of GPU that do not have this capability built into the hardware.

Our methods are generally applicable to blocked algorithms for linear algebra such as those in the LAPACK library.
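As a rough picture of what a blocked linear algebra algorithm looks like, here is one common (right-looking) variant of blocked Cholesky in pure Python. Each block column goes through three steps: factor the diagonal block, solve for the panel below it, and update the trailing submatrix. The trailing update is the bulk of the work in a real implementation and is the natural candidate for offloading to an accelerator. Device scheduling, dynamic block sizing and the submatrix copying scheme from the talk are not shown; the block size and test matrix are arbitrary.

```python
import math

def factor_diagonal_block(L, lo, hi):
    # Unblocked Cholesky of the (already updated) diagonal block L[lo:hi, lo:hi]
    for j in range(lo, hi):
        L[j][j] = math.sqrt(L[j][j] - sum(L[j][k] ** 2 for k in range(lo, j)))
        for i in range(j + 1, hi):
            L[i][j] = (L[i][j] - sum(L[i][k] * L[j][k] for k in range(lo, j))) / L[j][j]

def blocked_cholesky(A, block=2):
    """Right-looking blocked Cholesky: returns lower-triangular L with A = L L^T."""
    n = len(A)
    L = [row[:] for row in A]
    for k0 in range(0, n, block):
        k1 = min(k0 + block, n)
        # 1. Factor the diagonal block
        factor_diagonal_block(L, k0, k1)
        # 2. Triangular solve for the panel below the diagonal block
        for i in range(k1, n):
            for j in range(k0, k1):
                L[i][j] = (L[i][j] - sum(L[i][k] * L[j][k] for k in range(k0, j))) / L[j][j]
        # 3. Symmetric update of the trailing submatrix (lower triangle only)
        for i in range(k1, n):
            for j in range(k1, i + 1):
                L[i][j] -= sum(L[i][k] * L[j][k] for k in range(k0, k1))
    for i in range(n):          # clear the untouched upper triangle
        for j in range(i + 1, n):
            L[i][j] = 0.0
    return L

# Hypothetical symmetric positive definite test matrix A = M M^T + 5 I
M = [[1.0, 2.0, 0.0, 1.0, 3.0],
     [0.5, 1.0, 2.0, 0.0, 1.0],
     [2.0, 0.0, 1.0, 1.0, 0.0],
     [1.0, 1.0, 1.0, 3.0, 2.0],
     [0.0, 2.0, 1.0, 1.0, 1.0]]
A = [[sum(M[i][k] * M[j][k] for k in range(5)) + (5.0 if i == j else 0.0)
      for j in range(5)] for i in range(5)]
L = blocked_cholesky(A, block=2)
```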

Slides for the talk: PDF

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20130208T123000 DTEND;TZID=/Europe/London:20130208T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20130222123000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Matthew Higgs DESCRIPTION:Matthew Higgs (UCL): A Population Approach to Ubicomp System Design (APAUSD)\n\nLocation: Zoom\n\nLink: Cruciform B404 - LT2\n\nAbstract:The emergence of the “app store” market as a way to distribute software applications for mobile devices has provided a means for researchers to run worldwide trials of ubiquitous computing (ubicomp) applications with very large numbers of users. The state of the app, the action of the user, and any contextual information the phone can detect, are all values that can be timestamped, logged, and used for analysis. The APAUSD project is a collaboration between UCL and Glasgow University, and aims to utilise this abundance of new information to develop analysis tools that will hopefully aid in the design of future software. UCL’s role in the project is to develop statistical models admitting a natural extension to the domain of application. In this talk I will discuss ongoing work within the project and the related ideas that make the future work so exciting. The talk will be low in mathematical content and should appeal to a wide CSML audience.

Slides for the talk: PDF

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20130222T123000 DTEND;TZID=/Europe/London:20130222T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20130301123000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Tamara Broderick DESCRIPTION:Tamara Broderick (University of California, Berkeley): Feature allocations, probability functions, and paintboxes\n\nLocation: Zoom\n\nLink: Roberts 421\n\nAbstract:The problem of inferring a clustering of a data set has been the subject of much research in Bayesian analysis, and there currently exists a solid mathematical foundation for Bayesian approaches to clustering. In particular, the class of probability distributions over partitions of a data set has been characterized in a number of ways, including via exchangeable partition probability functions (EPPFs) and the Kingman paintbox. Here, we develop a generalization of the clustering problem, called feature allocation, where we allow each data point to belong to an arbitrary, non-negative integer number of groups, now called features or topics. We define and study an "exchangeable feature probability function" (EFPF)---analogous to the EPPF in the clustering setting---for certain types of feature models. Moreover, we introduce a "feature paintbox" characterization---analogous to the Kingman paintbox for clustering---of the class of exchangeable feature models. We use this feature paintbox construction to provide a further characterization of the subclass of feature allocations that have EFPF representations.
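The Indian buffet process is a canonical exchangeable feature allocation model. The sketch below samples from its sequential description, assuming the one-parameter version with mass `gamma`: data point n takes each existing feature k with probability m_k / n, where m_k counts earlier takers, and then adds Poisson(gamma / n) brand-new features. Each data point's number of features is then marginally Poisson(gamma), and the expected total number of features after N points is gamma times the N-th harmonic number. The constants are arbitrary.

```python
import math
import random

def poisson(lam, rng):
    # Knuth's multiplication method; adequate for small rates
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1

def indian_buffet(n_points, gamma, rng):
    """Sample a feature allocation: one set of feature indices per data point."""
    counts = []        # counts[k] = number of data points that have taken feature k
    allocation = []
    for n in range(1, n_points + 1):
        chosen = set()
        for k, m_k in enumerate(counts):          # existing features, with probability m_k / n
            if rng.random() < m_k / n:
                chosen.add(k)
        for _ in range(poisson(gamma / n, rng)):  # brand-new features
            counts.append(0)
            chosen.add(len(counts) - 1)
        for k in chosen:
            counts[k] += 1
        allocation.append(chosen)
    return allocation

rng = random.Random(0)
GAMMA, N, RUNS = 2.0, 10, 2000
per_point, totals = [], []
for _ in range(RUNS):
    allocation = indian_buffet(N, GAMMA, rng)
    per_point.extend(len(features) for features in allocation)
    totals.append(len(set().union(*allocation)))
```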

Slides for the talk: PDF

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20130301T123000 DTEND;TZID=/Europe/London:20130301T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20130308123000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: David Silver DESCRIPTION:David Silver (UCL): Reinforcement Learning and Simulation-Based Search\n\nLocation: Zoom\n\nLink: Cruciform B404 - LT2\n\nAbstract:Simulation-based search is a highly successful paradigm for planning in challenging search spaces. Intuitively, the idea is to repeatedly imagine how the future might play out, and to learn from this imagined experience. Simulation-based search methods typically play out millions of sequences, and build up a large search tree of possible futures. By applying reinforcement learning (i.e. trial-and-error learning) to these sequences, it is possible to identify a near-optimal strategy in a computationally efficient manner. In this talk I will outline the relationship between reinforcement learning and simulation-based search, and show how reinforcement learning methods can be turned into powerful planning algorithms. Highlights of this approach include i) the world's first master-level computer Go program, ii) a program that convincingly defeated the built-in AI in Civilization II, and iii) the winning algorithm for the international POMDP planning competition (problems with hidden state).
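A central ingredient of UCT-style Monte Carlo tree search is the UCB1 rule for choosing which action to simulate next: pick the action maximising its mean simulated return plus c·sqrt(ln t / n_a). The sketch applies the rule only at the root of a one-step problem with hypothetical win/loss simulators; a full tree search would apply the same rule recursively at every node and back up returns along the simulated path.

```python
import math
import random

def ucb1_root_search(simulate, n_actions, n_simulations, rng, c=math.sqrt(2.0)):
    """Allocate simulations among root actions with UCB1; return the most-visited action."""
    counts = [0] * n_actions
    totals = [0.0] * n_actions
    for t in range(1, n_simulations + 1):
        if t <= n_actions:
            action = t - 1          # try each action once before using the bound
        else:
            action = max(range(n_actions),
                         key=lambda a: totals[a] / counts[a] + c * math.sqrt(math.log(t) / counts[a]))
        totals[action] += simulate(action, rng)
        counts[action] += 1
    return max(range(n_actions), key=lambda a: counts[a]), counts

WIN_PROB = [0.3, 0.5, 0.7, 0.45]   # hypothetical win rates of four moves

def simulate(action, rng):
    # Hypothetical playout: win (1) or loss (0) with the move's win probability
    return 1.0 if rng.random() < WIN_PROB[action] else 0.0

best, counts = ucb1_root_search(simulate, len(WIN_PROB), 5000, random.Random(0))
```

Most simulations end up concentrated on the strongest move, while weaker moves are still sampled occasionally in case their early estimates were unlucky.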

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20130308T123000 DTEND;TZID=/Europe/London:20130308T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20130322123000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Vladimir Krylov DESCRIPTION:Vladimir Krylov (UCL): Extraction of geometrical objects from images with MCMC methods\n\nLocation: Zoom\n\nLink: Cruciform B404 - LT2\n\nAbstract:Numerous image processing applications require automatic extraction of geometrical structure information: from lines in mammographic images (for cancer detection) to buildings and roads in satellite imagery (for coregistration and mapping). In this talk I am going to present several applications of conventional Markov chain Monte Carlo (MCMC) and reversible jump MCMC to the extraction of such geometrical structure from 2D images. More specifically, each of the extracted objects is modeled by a geometrical element (e.g., line segments, rectangles, circles, ellipses) whose location and parameters are adjusted throughout the iterative MCMC process in order to fit the data accurately.
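A stripped-down version of the fixed-dimension case can be sketched with a single circle fitted to 2D points by random-walk Metropolis. The sketch assumes flat priors on the centre and on a positive radius, and Gaussian radial noise with a known standard deviation. It uses synthetic points and a data-driven starting value (centroid and mean distance to it). The reversible jump moves that add or remove objects, and any image-based likelihood, are not shown.

```python
import math
import random

def log_likelihood(params, points, noise_sd=0.05):
    # Points are assumed to lie on the circle up to Gaussian radial noise
    cx, cy, r = params
    if r <= 0.0:
        return -math.inf
    return -0.5 * sum(((math.hypot(x - cx, y - cy) - r) / noise_sd) ** 2 for x, y in points)

def fit_circle(points, n_iter=20000, step=0.01, seed=0):
    rng = random.Random(seed)
    # Data-driven starting point: centroid and mean distance to it
    cx = sum(x for x, _ in points) / len(points)
    cy = sum(y for _, y in points) / len(points)
    r = sum(math.hypot(x - cx, y - cy) for x, y in points) / len(points)
    current = (cx, cy, r)
    current_ll = log_likelihood(current, points)
    kept = []
    for it in range(n_iter):
        proposal = tuple(v + rng.gauss(0.0, step) for v in current)   # random-walk proposal
        proposal_ll = log_likelihood(proposal, points)
        if rng.random() < math.exp(min(0.0, proposal_ll - current_ll)):
            current, current_ll = proposal, proposal_ll
        if it >= n_iter // 2:
            kept.append(current)
    # Posterior mean over the second half of the chain
    return tuple(sum(p[i] for p in kept) / len(kept) for i in range(3))

# Synthetic data: 50 noisy points on a circle with centre (1, -0.5) and radius 2
data_rng = random.Random(42)
points = []
for k in range(50):
    angle = 2.0 * math.pi * k / 50
    radius = 2.0 + data_rng.gauss(0.0, 0.05)
    points.append((1.0 + radius * math.cos(angle), -0.5 + radius * math.sin(angle)))
```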

Dr. Vladimir Krylov is a Research Associate working with Dr. James Nelson in the Department of Statistical Science at UCL. His research interests are statistical image and signal processing, in particular pattern recognition in medical and remotely sensed imagery.

Slides for the talk: PDF

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20130322T123000 DTEND;TZID=/Europe/London:20130322T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20130405123000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Robert Jenssen DESCRIPTION:Robert Jenssen (University of Tromso, Norway): Entropy-Relevant Dimensions in Kernel Feature Space\n\nLocation: Zoom\n\nLink: Malet Place Eng 1.03\n\nAbstract:The non-linear mapping to feature space is a very important concept in kernel-based machine learning for signal processing, within the framework of positive semi-definite (psd) kernels. Given labeled data, algorithms such as support vector machines or projection methods such as Fisher discriminant analysis may be executed in feature space. For unsupervised dimensionality reduction in feature space, the most common approach is to perform principal component analysis (PCA) in that space, thus maximally capturing the variability of the feature space data, however without necessarily capturing any cluster structure in the data. In this talk, the theory behind the feature space mapping is considered and recent advances are reviewed which broaden the understanding and interpretability of the mapping in terms of a key input space quantity, namely the quadratic Renyi entropy of the data, via the eigenvalues and eigenfunctions of a psd convolution operator. Focusing on the unsupervised case, the identification of entropy-relevant dimensions in feature space is described. Recent results showing that these dimensions capture structure in the data in the form of clusters are reviewed, and it is shown that they are in general different from the kernel PCA dimensions. Differences between these approaches to dimensionality reduction for visualization and clustering are illustrated.

The talk is based on the paper R. Jenssen, "Entropy-Relevant Dimensions in Kernel Feature Space," to appear in the IEEE Signal Processing Magazine, special issue on advances in kernel-based learning for signal processing, July 2013.
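The link between the quadratic Renyi entropy and the kernel eigen-decomposition can be illustrated with a short numerical sketch. It assumes a Gaussian kernel whose Gram matrix K plays the role of the Parzen window (bandwidth conventions vary, and `sigma` is a hand-picked, hypothetical value). The information potential V = (1/N^2) 1^T K 1 decomposes as (1/N^2) sum_i lambda_i (1^T e_i)^2 over the eigen-pairs (lambda_i, e_i) of K, and the quadratic Renyi entropy estimate is -log V. Ranking eigen-pairs by their term in this sum picks out the entropy-relevant dimensions. Kernel PCA ranks by eigenvalue alone, usually of the centred kernel.

```python
import numpy as np

def gaussian_kernel(X, sigma):
    # Gram matrix of a Gaussian kernel over the rows of X
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma**2))

def entropy_contributions(K):
    """Contribution of each eigen-pair of K to the information potential V."""
    n = K.shape[0]
    lam, E = np.linalg.eigh(K)                     # eigenvalues ascending, eigenvectors in columns
    contrib = lam * E.sum(axis=0) ** 2 / n**2      # lambda_i * (1^T e_i)^2 / N^2
    return lam, contrib

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(-3.0, 0.5, (30, 2)), rng.normal(3.0, 0.5, (30, 2))])
K = gaussian_kernel(X, sigma=1.0)                  # hypothetical bandwidth
lam, contrib = entropy_contributions(K)

V = K.sum() / K.shape[0] ** 2                      # (1/N^2) 1^T K 1
order = np.argsort(contrib)[::-1]                  # entropy-relevant ranking
print("quadratic Renyi entropy estimate:", -np.log(V))
print("top entropy-relevant eigen-pairs:", order[:2])
print("top eigenvalue eigen-pairs:", np.argsort(lam)[::-1][:2])
```

The contributions sum exactly to V, which is a quick sanity check on the decomposition.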

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20130405T123000 DTEND;TZID=/Europe/London:20130405T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20130426123000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Chris Bracegirdle DESCRIPTION:Chris Bracegirdle (UCL (CS)): Probabilistic Inference for Changepoints and Cointegration\n\nLocation: Zoom\n\nLink: Malet Place Eng 1.03\n\nAbstract:In this talk I will present my PhD work - first on Bayesian generative models for time-series with changepoints, showing the complexity of exact inference. Second, I will describe work on cointegrated time series, including an approach to estimating the regression coefficients of cointegrated series based on a Bayesian generative model, which seeks to overcome the bias of the traditional OLS estimator for these non-stationary series. I show how the cointegration model can be effective for detecting simple cointegration between series and moreover, by application of changepoint inference, also for intermittent cointegration - allowing for periods when cointegration is "switched off".

Slides for the talk: PDF

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20130426T123000 DTEND;TZID=/Europe/London:20130426T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20130510123000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Isadora Antoniano-Villalobos DESCRIPTION:Isadora Antoniano-Villalobos (Department of Decision Sciences, Bocconi University, Italy): Bayesian inference for nonparametric mixture models with intractable normalizing constants\n\nLocation: Zoom\n\nLink: Malet Place Eng 1.03\n\nAbstract:Since the advent of Bayesian posterior inference via simulation techniques, it has been possible to estimate Bayesian nonparametric models. While the mixture of Dirichlet process (MDP) model remains one of the most popular, the advances in MCMC methods have now allowed models to move away from standard setups involving independent and identically distributed observations, to cover more complex data structures, such as regression models and time series models.

In this talk, we discuss some models for which the normalizing constant for the likelihood function involves an infinite sum, making it intractable. In such cases, it is not possible to directly apply the variety of MCMC schemes currently available for simulation from the posterior distributions of infinite mixture models. We propose a latent variable extension for the intractable models, involving auxiliary variables which are themselves infinite-dimensional. We then discuss inference for such extended models, via simulation techniques which combine the now popular slice sampling method for infinite mixture models with trans-dimensional MCMC ideas.

Slides for the talk: PDF

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20130510T123000 DTEND;TZID=/Europe/London:20130510T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20131101130000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Sam Livingstone DESCRIPTION:Sam Livingstone (UCL, Statistics): Diffusions with position-dependent volatility and the Metropolis-adjusted Langevin algorithm\n\nLocation: Zoom\n\nLink: Malet Place Engineering 1.03\n\nAbstract:The Metropolis-adjusted Langevin algorithm (MALA) and its manifold variant (MMALA) are two Markov chain Monte Carlo methods based on diffusions. While theoretical properties of the former are better understood, the latter has appeared more effective in practice, producing more efficient estimates for the same computational budget in many experiments (e.g. Girolami & Calderhead, 2011). The focus of this talk will be to highlight two properties of the diffusion on which MMALA is based, which suggest that a slightly different diffusion would prove a better basis for MCMC, both in terms of proposal choice and speed of computation.

The talk will be in two parts. In the first half I’ll review the motivation for diffusion-based MCMC methods like MALA, and use this motivation to derive a diffusion with position-dependent volatility which would seem to be a good choice in this respect. After this I’ll explain why the diffusion underlying previous position-dependent Langevin algorithms (such as MMALA and a similar algorithm suggested in Roberts & Stramer, 2002) differs from this one; doing so involves introducing some simple concepts from differential geometry. To add some weight to the claim that the new algorithm is in fact a more suitable choice for MCMC, I’ll then show some experimental results from a range of statistical models.

This is joint work with Chris Sherlock & Tatiana Xifara (Lancaster), and Simon Byrne & Mark Girolami (UCL).
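As background for readers new to MALA, here is a minimal sketch of the standard algorithm with identity preconditioning. Each proposal is one Euler-Maruyama step of the Langevin diffusion, y = x + (h/2) grad log pi(x) + sqrt(h) xi. A Metropolis-Hastings test then corrects for the asymmetric Gaussian proposal. The target (a 2-D standard Gaussian), the step size `h` and the function names are illustrative assumptions. MMALA and the position-dependent diffusion discussed in the talk would replace the identity with a position-dependent matrix, which is not shown here.

```python
import numpy as np

def mala(log_target, grad_log_target, x0, h, n_steps, rng):
    """Metropolis-adjusted Langevin algorithm with identity preconditioning."""
    x = np.asarray(x0, dtype=float)
    samples, accepted = [], 0

    def log_q(to, frm):
        # log density (up to a constant) of proposing `to` from `frm`
        mean = frm + 0.5 * h * grad_log_target(frm)
        return -np.sum((to - mean) ** 2) / (2.0 * h)

    for _ in range(n_steps):
        # Euler-Maruyama step of dX = 0.5 * grad log pi(X) dt + dW
        y = x + 0.5 * h * grad_log_target(x) + np.sqrt(h) * rng.standard_normal(x.shape)
        # Metropolis-Hastings ratio pi(y) q(x | y) / (pi(x) q(y | x)), in logs
        log_alpha = (log_target(y) + log_q(x, y)) - (log_target(x) + log_q(y, x))
        if np.log(rng.uniform()) < log_alpha:
            x, accepted = y, accepted + 1
        samples.append(x.copy())
    return np.array(samples), accepted / n_steps

# Hypothetical example target: a 2-D standard Gaussian
log_pi = lambda x: -0.5 * np.sum(x**2)
grad_log_pi = lambda x: -x
rng = np.random.default_rng(1)
samples, acc_rate = mala(log_pi, grad_log_pi, x0=np.zeros(2), h=0.5, n_steps=5000, rng=rng)
print("acceptance rate:", acc_rate)
print("sample mean:", samples.mean(axis=0), "sample variance:", samples.var(axis=0))
```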

Slides for the talk: PDF

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20131101T130000 DTEND;TZID=/Europe/London:20131101T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20131115130000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Thore Graepel DESCRIPTION:Thore Graepel (Microsoft Research Cambridge and Chair of Machine Learning, Department of Computer Science, UCL): Private traits and attributes are predictable from digital records of human behavior\n\nLocation: Zoom\n\nLink: Malet Place Engineering 1.03\n\nAbstract:We show that easily accessible digital records of behavior, Facebook Likes, can be used to automatically and accurately predict a range of highly sensitive personal attributes including: sexual orientation, ethnicity, religious and political views, personality traits, intelligence, happiness, use of addictive substances, parental separation, age, and gender. The analysis presented is based on a dataset of over 58,000 volunteers who provided their Facebook Likes, detailed demographic profiles, and the results of several psychometric tests. The proposed model uses dimensionality reduction for preprocessing the Likes data, which are then entered into logistic/linear regression to predict individual psycho-demographic profiles from Likes. The model correctly discriminates between homosexual and heterosexual men in 88% of cases, African Americans and Caucasian Americans in 95% of cases, and between Democrat and Republican in 85% of cases. For the personality trait "Openness," prediction accuracy is close to the test-retest accuracy of a standard personality test. We give examples of associations between attributes and Likes and discuss implications for online personalization and privacy. This is joint work with Michal Kosinski and David Stillwell at the University of Cambridge and is based on a PNAS paper of the same title.

Slides for the talk: PDF

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20131115T130000 DTEND;TZID=/Europe/London:20131115T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20131122130000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Mario Marchand DESCRIPTION:Mario Marchand (Universite Laval): Risk Bounds and Learning Algorithms for the Regression Approach to Structured Output Prediction\n\nLocation: Zoom\n\nLink: Malet Place Engineering 1.03\n\nAbstract:In contrast with classification and regression, where the learner tries to predict a scalar output, structured output prediction attempts to predict outputs which can be very complex, such as sequences, parse trees, and graphs. However, most learning algorithms, such as structured SVM and Max Margin Markov Networks, require a prohibitive training time because they have to solve an (often NP-hard) pre-image problem for each training example and for each update of the predictor produced by the learner. Here, we provide some conditions under which a vector-valued regression approach, which avoids the pre-image problem during learning, is justified and has rigorous guarantees. More precisely, we show that the quadratic regression loss is a convex surrogate of the structured prediction loss when the output kernel satisfies some condition with respect to the structured prediction loss. We provide two upper bounds of the prediction risk that depend on the empirical quadratic risk of the predictor. The minimizer of the first bound is the regularized least-square predictor proposed by Cortes, Mohri and Weston (2007), while the minimizer of the second bound is a predictor that has not been proposed before. Both predictors are compared on practical tasks.

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20131122T130000 DTEND;TZID=/Europe/London:20131122T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20131129130000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Alfredo Kalaitzis Bernadino Romero Paredes Dino Sejdinovic DESCRIPTION:Alfredo Kalaitzis, Bernadino Romero Paredes, Dino Sejdinovic (UCL): NIPS preview talks\n\nLocation: Zoom\n\nLink: Malet Place Engineering 1.03\n\nAbstract:An opportunity for those with papers accepted at NIPS to practice their talks, and for those not attending to get a preview! Talks will be in NIPS format: 17 mins + 3 mins questions.

Talk 1: Alfredo Kalaitzis (with Ricardo Silva), Flexible sampling of discrete data correlations without the marginal distributions

arXiv link: http://arxiv.org/abs/1306.2685

Learning the joint dependence of discrete variables is a fundamental problem in machine learning, with many applications including prediction, clustering and dimensionality reduction. More recently, the framework of copula modeling has gained popularity due to its modular parametrization of joint distributions. Among other properties, copulas provide a recipe for combining flexible models for univariate marginal distributions with parametric families suitable for potentially high dimensional dependence structures. More radically, the extended rank likelihood approach of Hoff (2007) bypasses learning marginal models completely when such information is ancillary to the learning task at hand as in, e.g., standard dimensionality reduction problems or copula parameter estimation. The main idea is to represent data by their observable rank statistics, ignoring any other information from the marginals. Inference is typically done in a Bayesian framework with Gaussian copulas, and it is complicated by the fact that this implies sampling within a space where the number of constraints increases quadratically with the number of data points. The result is slow mixing when using off-the-shelf Gibbs sampling. We present an efficient algorithm based on recent advances in constrained Hamiltonian Markov chain Monte Carlo that is simple to implement and does not require paying a quadratic cost in sample size.

Slides for the talk: PDF

Talk 2: Bernadino Romero Paredes (with Massi Pontil), A New Convex Relaxation for Tensor Completion

Tensors can be successfully employed to model the relationships between more than two entities, such as users, products, aspects, and time. Because of this, tensor completion has recently received a lot of interest in several fields, such as computer vision, recommendation systems and natural language processing, as the natural extension of matrix completion. A prominent methodology for matrix completion is low rank matrix learning by way of trace norm regularization. A generalization of this framework to tensor completion has been studied by several recent works. In this talk, I will highlight some limitations of this approach and propose an alternative convex relaxation on the Euclidean ball. I will then describe a technique to solve the associated regularization problem, which builds upon the alternating direction method of multipliers. Experiments on one synthetic dataset and two real datasets indicate that the proposed method improves significantly over tensor trace norm regularization in terms of estimation error, while remaining computationally tractable.
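The proposed relaxation is not reproduced here. The sketch below illustrates the trace-norm baseline it is compared against. It shows the proximal operator of the matrix trace norm (singular value soft-thresholding), which ADMM-style solvers for tensor trace norm regularization typically apply to each mode-n unfolding. The threshold `tau`, the tensor shape and the helper names `svt`, `unfold` and `fold` are illustrative assumptions.

```python
import numpy as np

def svt(M, tau):
    """Proximal operator of tau * ||.||_* : soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    s_shrunk = np.maximum(s - tau, 0.0)
    return (U * s_shrunk) @ Vt               # scales column k of U by s_shrunk[k]

def unfold(T, mode):
    """Mode-n unfolding: move axis `mode` to the front and flatten the rest."""
    return np.moveaxis(T, mode, 0).reshape(T.shape[mode], -1)

def fold(M, mode, shape):
    """Inverse of `unfold` for a tensor of the given shape."""
    full = [shape[mode]] + [d for i, d in enumerate(shape) if i != mode]
    return np.moveaxis(M.reshape(full), 0, mode)

rng = np.random.default_rng(0)
T = rng.standard_normal((4, 5, 6))
# Shrink the mode-1 unfolding, then fold it back into a tensor
T_shrunk = fold(svt(unfold(T, 1), tau=1.0), 1, T.shape)
print(np.linalg.matrix_rank(unfold(T_shrunk, 1)), "<=", np.linalg.matrix_rank(unfold(T, 1)))
```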

Slides for the talk: PDF

Talk 3: Dino Sejdinovic (with Arthur Gretton, Wicher Bergsma): A Kernel Test for Three-Variable Interactions

We introduce kernel nonparametric tests for Lancaster three-variable interaction and for total independence, using embeddings of signed measures into a reproducing kernel Hilbert space. The resulting test statistics are straightforward to compute, and are used in powerful interaction tests, which are consistent against all alternatives for a large family of reproducing kernels. We show the Lancaster test to be sensitive to cases where two independent causes individually have weak influence on a third dependent variable, but their combined effect has a strong influence. This makes the Lancaster test especially suited to finding structure in directed graphical models, where it outperforms competing nonparametric tests in detecting such V-structures.
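Below is a minimal sketch of the empirical Lancaster interaction statistic. It assumes the V-statistic form (1/n^2) sum_ij [(HKH) o (HLH) o (HMH)]_ij, where K, L and M are Gram matrices of the three variables, H is the centring matrix and o is the elementwise product. The Gaussian kernel width, the sample size and the XOR-like V-structure data are hypothetical choices. The permutation-based null distribution needed for an actual test is omitted.

```python
import numpy as np

def gaussian_gram(x, sigma):
    # Gram matrix of a 1-D sample under a Gaussian kernel of width sigma
    d2 = (x[:, None] - x[None, :]) ** 2
    return np.exp(-d2 / (2.0 * sigma**2))

def lancaster_statistic(K, L, M):
    """Empirical Lancaster interaction statistic (1/n^2) * sum(HKH o HLH o HMH)."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n       # centring matrix
    Kc, Lc, Mc = H @ K @ H, H @ L @ H, H @ M @ H
    return np.sum(Kc * Lc * Mc) / n**2        # '*' is the elementwise (Hadamard) product

rng = np.random.default_rng(0)
n = 200
x, y = rng.standard_normal(n), rng.standard_normal(n)
z_vstruct = np.sign(x * y) + 0.1 * rng.standard_normal(n)   # hypothetical V-structure X -> Z <- Y
z_indep = rng.standard_normal(n)

K, L = gaussian_gram(x, 1.0), gaussian_gram(y, 1.0)
print("V-structure:", lancaster_statistic(K, L, gaussian_gram(z_vstruct, 1.0)))
print("independent:", lancaster_statistic(K, L, gaussian_gram(z_indep, 1.0)))
```

In the V-structure data, Z is pairwise independent of X and of Y, so pairwise tests see nothing. Only the joint three-way term carries the dependence.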

Slides for the talk: PDF

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20131129T130000 DTEND;TZID=/Europe/London:20131129T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20140110130000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Stephen Pasteris DESCRIPTION:Stephen Pasteris (UCL, Computer Science): Online Similarity Prediction of Networked Data\n\nLocation: Zoom\n\nLink: Malet Place Engineering Building 1.02\n\nAbstract:We consider online similarity prediction problems over networked data. We begin by relating this task to the more standard class prediction problem, showing that, given an arbitrary algorithm for class prediction, we can construct an algorithm for similarity prediction with "nearly" the same mistake bound, and vice versa. After noticing that this general construction is computationally infeasible, we focus our study on feasible similarity prediction algorithms on networked data. We initially assume that the network structure is known to the learner. Here we observe that Matrix Winnow has a near-optimal mistake guarantee, at the price of cubic prediction time per round. This motivates our effort toward an efficient implementation of a Perceptron algorithm with a weaker mistake guarantee but with only poly-logarithmic prediction time. Our focus then turns to the challenging case of networks whose structure is initially unknown to the learner. In this novel setting, where the network structure is only incrementally revealed, we obtain a mistake-bounded algorithm with a quadratic prediction time per round.

We present dynamic spatial modelling and computational methods for the analysis of collections of objects moving in a spatially inhomogeneous force field under the influence of covariates. Core motivating examples come from movement ecology and cell motility, where multiple animals are tracked moving in 2-D or 3-D, largely driven by the characteristics of the external environment. Interest lies in identifying the role of different covariates in guiding the motion, both in terms of the shape of their implied field and in terms of the overall presence or absence of their influence. Models are based on discrete-time, dynamic state-space models for the locations and directional velocities of each of a set of animals, combined with a latent force field over the temporal domain that drives changes in velocities. We extend models for the force fields using dynamic Bayesian radial basis function regression to define a potential surface varying in space but also in the space of covariates, with the force field given by the gradient of the potential in 3-D. Corresponding variable selection priors allow us to detect which covariates play a role in shaping the motion, and provide a basis for understanding their precise functional form. We exemplify the work on two examples: a 3-D dataset from in-vivo immune cell motility, and a GPS tracking dataset from toucans in Central America.
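To make the idea of a force field given by the gradient of a radial-basis-function potential concrete, here is a minimal static sketch. The potential is a weighted sum of Gaussian RBFs over 3-D space, and the force is its analytic gradient. Following the abstract's wording, the force is taken as the gradient itself; physical conventions often use the negative gradient. The centres, weights and length-scale `ell` are hypothetical. The dynamic, covariate-dependent and Bayesian parts of the model, including variable selection, are not represented.

```python
import numpy as np

def rbf_potential(x, centres, weights, ell):
    """phi(x) = sum_j w_j exp(-||x - c_j||^2 / (2 ell^2)) at a single point x."""
    d = x - centres                               # (J, D) displacements from each centre
    basis = np.exp(-np.sum(d**2, axis=1) / (2.0 * ell**2))
    return weights @ basis

def rbf_force(x, centres, weights, ell):
    """Analytic gradient of rbf_potential with respect to x."""
    d = x - centres
    basis = np.exp(-np.sum(d**2, axis=1) / (2.0 * ell**2))
    return -(weights * basis) @ d / ell**2        # sum_j w_j b_j * (-(x - c_j) / ell^2)

rng = np.random.default_rng(0)
centres = rng.uniform(-1, 1, size=(10, 3))        # hypothetical RBF centres in 3-D
weights = rng.standard_normal(10)                 # hypothetical weights
x = np.array([0.1, -0.2, 0.3])
print(rbf_force(x, centres, weights, ell=0.5))
```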

Slides for the talk: PDF

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20140124T130000 DTEND;TZID=/Europe/London:20140124T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20140214130000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Peter Forbes DESCRIPTION:Peter Forbes (Oxford, Department of Statistics): Quantifying Fingerprint Evidence using Bayesian Alignment\n\nLocation: Zoom\n\nLink: Malet Place Engineering 1.02\n\nAbstract:Fingerprint evidence has long been considered infallible by courtrooms worldwide. However, subjective human judgement plays a large role in determining whether or not two fingerprints match, especially when dealing with the blurry prints typical of crime scenes. Despite this uncertainty, courtroom fingerprint evidence is always presented categorically as a match or non-match. This leads to inflated confidence in the forensic evidence, and sometimes to false convictions. These false convictions have instigated a push within the forensics community to present courtroom evidence as a likelihood ratio rather than a categorical match. Before this is possible, a standardized method for quantifying the strength of fingerprint evidence needs to be developed. I am developing a Bayesian hierarchical model where the feature points of fingerprints (called minutiae) are represented using spatial Poisson point processes. Determining how well two fingerprints match reduces to identifying a matching (a bipartite graph which determines which minutiae are common to both fingerprints), and a rigid motion between the fingerprint images such that the common minutiae are spatially close. An MCMC algorithm has been developed to sum over all possible matchings and rigid motions to determine the likelihood ratio between the prosecution hypothesis (the two observed fingerprints originate from the same finger) and the defense hypothesis (the two observed fingerprints originate from independent fingers). The model has been tested on a small database of 258 forensic fingerprints provided by the American Forensic Bureau of Investigation.

Joint work with Steffen Lauritzen (Oxford) and Jesper Møller (Aalborg).

Slides for the talk: PDF

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20140214T130000 DTEND;TZID=/Europe/London:20140214T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20140221130000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Marc Deisenroth DESCRIPTION:Marc Deisenroth (Imperial College, London): Statistical Machine Learning for Autonomous Systems\n\nLocation: Zoom\n\nLink: Malet Place Engineering 1.20\n\nAbstract:Autonomous learning has been a promising direction in control and robotics for more than a decade, since learning models and controllers from data allows us to reduce the amount of engineering knowledge that is otherwise required. Due to their flexibility, autonomous reinforcement learning (RL) approaches typically require many interactions with the system to learn controllers. However, in real systems, such as robots, many interactions can be impractical and time consuming. To address this problem, current learning approaches typically require task-specific knowledge in the form of expert demonstrations, pre-shaped policies, or specific knowledge about the underlying dynamics.

In the first part of the talk, we follow a different approach and speed up learning by efficiently extracting information from sparse data. In particular, we learn a probabilistic, non-parametric Gaussian process dynamics model. By explicitly incorporating model uncertainty into long-term planning and controller learning, our approach reduces the effects of model errors, a key problem in model-based learning. Compared to state-of-the-art RL, our model-based policy search method achieves an unprecedented speed of learning. We demonstrate its applicability to autonomous learning in real robot and control tasks.

In the second part of my talk, we will discuss an alternative method for learning controllers based on Bayesian optimization, for settings where it is no longer possible to learn models of the underlying dynamics. We successfully applied Bayesian optimization to learning controller parameters for a bipedal robot, where modeling the dynamics is very difficult due to ground contacts. Using Bayesian optimization, we sidestep this modeling issue and directly optimize the controller parameters without the need to model the robot's dynamics.
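The abstract does not say which acquisition function was used. As an assumption, the sketch below uses expected improvement (EI), one common choice in Bayesian optimization. Given a Gaussian-process predictive mean mu and standard deviation sigma at a candidate controller setting, and the best value found so far f*, EI = (mu - f*) Phi(z) + sigma phi(z) with z = (mu - f*) / sigma, for maximization. The optional margin `xi`, the candidate names and the predicted values are hypothetical.

```python
import math

def expected_improvement(mu, sigma, best, xi=0.0):
    """Expected improvement over `best` (maximisation) under N(mu, sigma^2)."""
    if sigma <= 0.0:
        # No predictive uncertainty: improvement is deterministic
        return max(mu - best - xi, 0.0)
    z = (mu - best - xi) / sigma
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))          # standard normal CDF
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)   # standard normal PDF
    return (mu - best - xi) * cdf + sigma * pdf

# Hypothetical GP predictions (mean, std) at three candidate controller parameters
candidates = {"a": (0.9, 0.05), "b": (0.7, 0.40), "c": (1.0, 0.0)}
best_so_far = 0.95
scores = {k: expected_improvement(m, s, best_so_far) for k, (m, s) in candidates.items()}
print(max(scores, key=scores.get), scores)
```

In this toy case, the uncertain candidate "b" scores higher than the confident but marginal "c". This is how EI trades exploration against exploitation.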

Gaussian Processes are non-parametric Bayesian models which support flexible kernels and Bayesian reasoning under uncertainty. Despite their strengths and growing adoption in machine learning, they have had very few applications to language processing.

In this talk I will outline my recent work, which represents some of the first applications of GPs to language, and show significant improvements over the state of the art in several language processing tasks. The first task I consider is machine translation evaluation, a task made difficult by individual annotators bringing different biases, interpretations of the task and levels of consistency. I show how this problem can be framed as regression using a multi-task GP prior, such that individual models are learned while explicitly learning correlations between these models.

I will also present applications to social media, including user impact prediction and identification of temporal periodicities in text usage. In both cases Gaussian Processes allow for much more accurate and flexible modelling than alternative methods, raising questions about the near-universal use of linear models in NLP.
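One common way to build a multi-task GP prior of the kind described is the intrinsic coregionalisation model. In it, the covariance between input x for task s and input x' for task t is B[s, t] k(x, x'). Whether the talk used exactly this form is an assumption. In the sketch, the task-correlation matrix `B`, the noise level and the two-annotator toy data are hypothetical. `B` is fixed here, whereas in practice it would be learned from data.

```python
import numpy as np

def rbf(a, b, ell=1.0):
    # Squared-exponential kernel between 1-D input arrays a and b
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2.0 * ell**2))

def icm_kernel(xa, ta, xb, tb, B, ell=1.0):
    """Multi-task kernel B[task_a, task_b] * k(x_a, x_b)."""
    return B[np.ix_(ta, tb)] * rbf(xa, xb, ell)

def gp_predict(x, t, y, x_star, t_star, B, noise=0.1):
    """GP posterior mean and variance at test points under the ICM kernel."""
    K = icm_kernel(x, t, x, t, B) + noise**2 * np.eye(len(x))
    Ks = icm_kernel(x_star, t_star, x, t, B)
    Kss = icm_kernel(x_star, t_star, x_star, t_star, B)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y))   # K^{-1} y via Cholesky factors
    v = np.linalg.solve(L, Ks.T)
    return Ks @ alpha, np.diag(Kss) - np.sum(v**2, axis=0)

# Hypothetical data: two annotators (tasks 0 and 1) scoring the same underlying function
x = np.array([0.0, 1.0, 2.0, 0.5, 1.5])
t = np.array([0, 0, 0, 1, 1])
y = np.sin(x) + np.where(t == 1, 0.2, 0.0)          # annotator 1 is biased upwards
B = np.array([[1.0, 0.9], [0.9, 1.0]])              # assumed (not learned) task correlation
mean, var = gp_predict(x, t, y, np.array([1.0, 1.0]), np.array([0, 1]), B)
print(mean, var)
```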

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20140228T130000 DTEND;TZID=/Europe/London:20140228T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20140307130000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Robert Stojnic DESCRIPTION:Robert Stojnic (Cambridge Systems Biology Centre): Bayesian Molecular LEGO\n\nLocation: Zoom\n\nLink: Malet Place Engineering 1.03\n\nAbstract:The assembly of cellular components has been traditionally modelled using differential equations. In this work I will present a new formalism that yields a model similar to Bayesian Networks where every node represents one molecule. The intuition behind the model can be captured with an analogy to LEGO building blocks where blocks (=molecules) are successively added to create the final molecular structure. I will show that the structure can be reverse-engineered from measurements of molecular abundance under perturbation. I will discuss the Bayesian approach to structure inference, and derive an efficient maximum a-posteriori inference scheme with uniform priors. I will discuss how the choice of prior has crucial influence on the success of inference.

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20140307T130000 DTEND;TZID=/Europe/London:20140307T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20140314130000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Srini Turaga DESCRIPTION:Srini Turaga (Gatsby Unit & Wolfson Institute for Biomedical Research): Using ConvNets, MALIS and crowd-sourcing to map the retinal connectome.\n\nLocation: Zoom\n\nLink: Malet Place Engineering 1.03\n\nAbstract:Neural circuits in the brain are formed from neurons connecting to one another in highly structured ways. However, technological limitations have prevented us from knowing much about the nature of neural connectivity and how it relates to neural computation. We have developed new technology based on 3d electron microscopy, computational image analysis and crowd-sourcing to reconstruct complete wiring diagrams for all the neurons in a piece of brain tissue.

We have densely reconstructed 950 neurons in the inner plexiform layer of the mouse retina using a combination of machine learning algorithms and human proof-reading. These reconstructions yield hints of the principles underlying neural connectivity and neural computation in the retina. I will briefly describe these results and present the computational methods leading to this work. Our machine learning method for image segmentation is a deep convolutional neural network (ConvNet), which, when combined with the novel global cost function for image segmentation (MALIS), yields neuron tracing accuracy approaching that of a single human expert (tracings from multiple human experts are usually combined and proofread to increase tracing accuracy).

Joint work with Viren Jain, Moritz Helmstaedter, Kevin Briggman, Winfried Denk and Sebastian Seung.

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20140314T130000 DTEND;TZID=/Europe/London:20140314T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20140321130000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Lloyd Elliott DESCRIPTION:Lloyd Elliott (UCL, Gatsby): Bayesian nonparametric dynamic-clustering and genetic imputation\n\nLocation: Zoom\n\nLink: Malet Place Engineering 1.03\n\nAbstract:I will describe new approaches to dynamic-clustering based on Bayesian nonparametric (BNP) hidden Markov models (HMMs). I will apply these approaches to genotype imputation problems and illustrate the practical benefits of BNP. Genetic similarity within a population is a function of chromosome position, and dynamic-clustering methods based on parametric HMMs are popular models of genetic structure. BNP priors are well suited as extensions of, or as competitors to, these HMMs because many aspects of genetic processes (such as allele sampling) arise naturally from BNP models. In addition, BNP priors provide several practical benefits over parametric HMMs. First, by defining probability distributions on the set of partitions, BNP priors avoid label switching problems. Second, costly model selection and ad-hoc methods to determine the number of latent clusters are also avoided. Finally, the flexibility of BNP often provides state-of-the-art imputation accuracy. I will conclude with directions for future work, including the abstraction of auxiliary Gibbs schemes (used for inference in these models) to probabilistic programming for BNP models.

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20140321T130000 DTEND;TZID=/Europe/London:20140321T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20140404130000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Sarah Chisholm DESCRIPTION:Sarah Chisholm (UCL, Computer Science): Statistical Methods for Analysing Time Series Data of Animal Movement\n\nLocation: Zoom\n\nLink: MPEB 1.02\n\nAbstract:Collecting data to help understand the behaviour and interactions of animals has changed dramatically over the last two decades. Ecologists used to follow animals exclusively by car or on foot, observing locations and behaviours in person and recording them manually; it is now possible to do much of this work automatically.

Technologies to collect data on animal movement have improved immensely in recent years. GPS units are becoming more and more accurate, lighter and longer-lasting. These devices, including Inertial Measurement Units (IMUs), record not only the location of the individual but also data from accelerometers, gyroscopes and many other interesting sensors. They are small and light enough to fit on animals as small as pigeons.

Whilst the amount and quality of data exceed those previously available, the methods to analyse these data are still lagging behind. For example, there is no method to detect whether individuals or groups are in close proximity to each other more or less often than expected by chance that does not rely on assumptions about the shape and size of the individuals' territories and boundaries.

Moreover, methods to identify a relationship between the movements of individuals whose movements are non-stationary (i.e. have no constant mean and variance) still produce spurious results (identifying cointegration when none exists), or are restricted to first-order integrated series.

This talk covers two new mathematical methods that allow ecologists and behaviourists to answer questions related to these two key aspects of behaviour and interaction. The methods rely on well-established mathematical theorems; they have been tested on synthetic data and applied to data collected on leopard, wild dog and sheep movements.

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20140404T130000 DTEND;TZID=/Europe/London:20140404T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20140411130000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Dimitrios Athanasakis DESCRIPTION:Dimitrios Athanasakis (UCL): Principled Non-Linear Feature Selection (with applications in representation learning)\n\nLocation: Zoom\n\nLink: Malet Place Engineering 1.03\n\nAbstract:Following recent work in non-linear feature selection, we propose a novel method for assessing the contribution of a feature by estimating its expected impact on the alignment or HSIC measure. A theoretical analysis of this approach is included, showing that for appropriate polynomial sample sizes influential features can be distinguished from irrelevant ones. We present experimental evidence which confirms the analysis, including applications in representation learning.

The method was used to obtain third place in the 2013 ICML black box learning challenge, as well as competitive results in signal peptide prediction, an important bioinformatics application.
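The expected-impact estimator from the talk is not reproduced here. The sketch uses a simpler, hypothetical proxy instead. It computes the biased empirical HSIC, trace(KHLH)/(n-1)^2, between a Gaussian-kernel Gram matrix on the features and a linear kernel on +/-1 labels. Each feature is then scored by how much HSIC drops when that feature is removed. The kernel width and the synthetic data are illustrative assumptions.

```python
import numpy as np

def gaussian_gram(X, sigma=1.0):
    # Gaussian-kernel Gram matrix for the rows of X (n x d)
    sq = np.sum(X**2, axis=1)
    d2 = np.maximum(sq[:, None] + sq[None, :] - 2.0 * X @ X.T, 0.0)
    return np.exp(-d2 / (2.0 * sigma**2))

def hsic(K, L):
    """Biased empirical HSIC: trace(K H L H) / (n - 1)^2."""
    n = K.shape[0]
    H = np.eye(n) - np.ones((n, n)) / n          # centring matrix
    return np.trace(K @ H @ L @ H) / (n - 1) ** 2

def leave_one_out_scores(X, y, sigma=1.0):
    """Drop in HSIC(features, labels) when each feature is removed (hypothetical proxy)."""
    L = np.outer(y, y).astype(float)             # linear kernel on +/-1 labels
    full = hsic(gaussian_gram(X, sigma), L)
    return np.array([full - hsic(gaussian_gram(np.delete(X, j, axis=1), sigma), L)
                     for j in range(X.shape[1])])

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
y = np.where(np.abs(X[:, 0]) > 1.0, 1.0, -1.0)   # labels depend non-linearly on feature 0 only
scores = leave_one_out_scores(X, y, sigma=2.0)
print(scores)
```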

Generating sequential data is the closest computers get to dreaming. Digital dreams are likely to play a crucial role in the future of AI, by helping agents to simulate, predict and interpret their surroundings. This talk shows how Long Short-Term Memory recurrent neural networks can be used to generate complex sequences with large-scale structure, simply by predicting one step at a time. The method is demonstrated for character-level language modelling (where the data are discrete) and speech and handwriting generation (where the data are real-valued). A novel extension allows the network to condition its predictions on an auxiliary input sequence, making it possible to speak or write specific texts.

Bag-of-features (BoF) representations are omnipresent in machine learning; for example, an image can be described by a bag of visual features, a document might be considered as a bag of words, or a molecule can be handled as a bag of its different configurations. Set kernels (also called multi-instance or ensemble kernels; Gaertner 2002), which define the similarity of two bags as the average of the pairwise point similarities between the sets, are among the most widely applied tools for handling problems based on such BoF representations. Despite the wide applicability of set kernels, even the most fundamental theoretical questions, such as their consistency in specific learning tasks, remain open.

In my talk, I am going to focus on the distribution regression problem: regressing from a probability distribution to a real-valued response. Viewed through the mean embeddings of the distributions, this problem is a natural generalization of set kernels to the infinite-sample limit: the bags can be seen as i.i.d. (independent and identically distributed) samples from a distribution. We will propose an algorithmically simple, ridge-regression-based solution for distribution regression and prove its consistency under fairly mild conditions (for probability distributions defined on locally compact Polish spaces). As a special case, we give a positive answer to a 12-year-old open question, the consistency of set kernels in regression. We demonstrate the efficiency of the studied ridge regression technique on (i) supervised entropy learning and (ii) aerosol prediction based on satellite images. [preprint, code]
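The pipeline described above (embed each bag, then run ridge regression on the embeddings) can be sketched in a few lines. This is a minimal illustration, not the authors' code. It assumes 1-D samples and a Gaussian base kernel with bandwidth `sigma`. It uses the set kernel, the average pairwise kernel value, as the inner product between empirical mean embeddings, together with a hypothetical regularisation parameter `lam`. The toy task, regressing the mean of N(mu, 1) from a bag of its samples, is invented for illustration.

```python
import numpy as np

def set_kernel(A, B, sigma=1.0):
    """Average pairwise Gaussian kernel between two bags of 1-D samples, i.e. the
    inner product of their empirical mean embeddings."""
    d2 = (A[:, None] - B[None, :]) ** 2
    return float(np.mean(np.exp(-d2 / (2.0 * sigma ** 2))))

def fit_predict(train_bags, y, test_bags, lam=1e-3, sigma=1.0):
    """Kernel ridge regression on bags, with the set kernel as the Gram matrix."""
    n = len(train_bags)
    K = np.array([[set_kernel(a, b, sigma) for b in train_bags] for a in train_bags])
    alpha = np.linalg.solve(K + lam * n * np.eye(n), y)
    K_test = np.array([[set_kernel(t, b, sigma) for b in train_bags] for t in test_bags])
    return K_test @ alpha

# Hypothetical toy task: each bag is drawn from N(mu, 1) and the response is mu.
rng = np.random.default_rng(1)
mus = rng.uniform(-2.0, 2.0, size=50)
train_bags = [rng.normal(mu, 1.0, size=100) for mu in mus]
test_bags = [rng.normal(0.5, 1.0, size=100)]
pred = fit_predict(train_bags, mus, test_bags)
```

Because the set kernel is linear in the empirical embeddings, each prediction costs one pass over all pairs of points between the test bag and the training bags.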

Slides can be downloaded here

Slides can be downloaded here

Abstract:

Founded in 2011 in London, Google DeepMind is a unique environment in which ambitious long-term research can flourish. This talk will share how its world-class interdisciplinary team has made a number of high-profile breakthroughs towards general AI by combining the best techniques from deep learning, reinforcement learning and systems neuroscience to build powerful general-purpose learning algorithms.

Bio:

Demis Hassabis is a neuroscientist and leading expert on the neural basis of memory and imagination. He is the Founder/CEO of DeepMind Technologies, which was recently acquired by Google. Demis is a former child chess prodigy who finished his A-levels early, at 16, before going on to co-create the multi-million-selling video game Theme Park for Bullfrog Productions. After graduating from Cambridge University with a Double First in Computer Science, he founded the high-profile video games company Elixir Studios, which he grew to 60 people, producing pioneering games for Microsoft and Vivendi Universal. After successfully selling the IP and technology rights, Demis returned to academia to complete a PhD in cognitive neuroscience at UCL, focusing on the hippocampus and amnesia. His research, which for the first time systematically connected memory with imagination, was listed among the top ten scientific breakthroughs of 2007 by the journal Science. Subsequently he was a visiting scientist jointly at MIT and Harvard, before securing a Sir Henry Wellcome Fellowship as a Research Fellow at the Gatsby Computational Neuroscience Unit at UCL.

Abstract:

Classic models from population genetics can be adapted to give evolutionary algorithms that are MCMC methods of a type widely used in Bayesian inference. Breeding is modelled using a generalisation of the Moran process; selection is modelled as a Metropolis acceptance. The result is a family of finite-population algorithms, for which the Markov chain of populations satisfies detailed balance, and the stationary distribution factorises exactly into a simple form. These algorithms are closely analogous to Gibbs-within-Metropolis algorithms for Bayesian inference.

We will consider a range of such probability models for both sexual and asexual evolution. Some basic information-theoretic differences between sexual and asexual reproduction become obvious using this approach.

Initial results on optimising the movement of a robot arm will be described.

From the point of view of evolutionary computation, we propose a new type of 'genetic algorithm' with known, good statistical properties.

Evolution is perhaps the natural world's number one learning algorithm: this talk presents computational models of evolution that are examples of a standard MCMC approach widely used in machine learning.

Bio:

Dr. Chris Watkins is a reader in Computer Science at Royal Holloway, University of London. His research interests have been in computational finance, kernel methods, evolutionary theory, and behavioural learning. He obtained his PhD from Cambridge University. In his thesis he proposed that behavioural learning could be viewed as using experience for incremental policy optimisation in a Markov decision process (MDP), and he introduced the Q-learning algorithm. This work was influential and became one of the standard models of reinforcement learning.

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20140516T130000 DTEND;TZID=/Europe/London:20140516T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20140523130000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Remi Bardenet DESCRIPTION:Remi Bardenet (Department of Statistics, Oxford): Scaling up MCMC: a subsampling approach\n\nLocation: Zoom\n\nLink: MPEB 1.03\n\nAbstract:Markov chain Monte Carlo (MCMC) methods are often deemed far too computationally intensive to be of any practical use for large datasets. In this talk, I will describe a methodology that aims to scale up the Metropolis-Hastings (MH) algorithm in this context. We propose an approximate implementation of the accept/reject step of MH based on concentration inequalities. It only requires evaluating the likelihood of a random subset of the data, yet it is guaranteed to coincide with the accept/reject step based on the full dataset with a probability above a user-specified tolerance level. This adaptive subsampling technique is an alternative to the recent approach developed by Korattikara et al. (to appear in ICML'14), and it allows one to establish rigorously that the resulting approximate MH algorithm samples from a perturbed version of the target distribution of interest. Furthermore, the total variation distance between this perturbed target and the target of interest is controlled explicitly. I will demonstrate the benefits and limitations of this scheme on several examples.

Joint work with Arnaud Doucet and Chris Holmes, ICML'14.

Paper link
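The core of the approximate accept/reject step can be sketched as follows. This is a simplified illustration, not the authors' algorithm. It assumes:

- a N(theta, sigma^2) likelihood with known `sigma`, a flat prior and a symmetric proposal, so that MH accepts exactly when the mean per-point log-likelihood ratio exceeds `log(u) / N`;
- a plain Hoeffding bound with a union bound over the batch "looks", whereas the paper uses sharper concentration inequalities and its own batch schedule.

The function and parameter names (`batch`, `delta`) are hypothetical.

```python
import math
import numpy as np

def subsampled_mh_decision(x, theta, theta_prop, sigma=1.0, delta=0.01,
                           batch=100, rng=None):
    """Approximate MH accept/reject for a N(theta, sigma^2) likelihood (flat prior,
    symmetric proposal). Returns (accept, number_of_points_used)."""
    rng = np.random.default_rng() if rng is None else rng
    N = x.size
    psi = math.log(rng.uniform()) / N  # exact MH accepts iff mean log-ratio > psi

    def loglik_ratio(xi):
        return ((xi - theta) ** 2 - (xi - theta_prop) ** 2) / (2 * sigma ** 2)

    # The per-point log-ratio is affine in x_i, so its range over the data set
    # follows from min(x) and max(x).
    ends = loglik_ratio(np.array([x.min(), x.max()]))
    C = abs(ends[1] - ends[0])
    n_looks = math.ceil(N / batch)
    perm = rng.permutation(N)  # subsample without replacement
    total, n = 0.0, 0
    while n < N:
        idx = perm[n:n + batch]
        total += loglik_ratio(x[idx]).sum()
        n += idx.size
        lam = total / n
        # Hoeffding half-width, with the failure probability split over the looks.
        c = C * math.sqrt(math.log(2 * n_looks / delta) / (2 * n))
        if abs(lam - psi) > c:
            break  # confident enough that the sign of lam - psi will not change
    return lam > psi, n

rng = np.random.default_rng(0)
x = rng.normal(1.0, 1.0, size=10_000)
accept, n_used = subsampled_mh_decision(x, theta=1.0, theta_prop=3.0, rng=rng)
```

For a clearly bad proposal like this one, the decision is reached after a few hundred points rather than all 10,000. For proposals close to the current state, the loop tends to run to the full dataset, where the decision is exact.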

A chance to hear previews of ICML 2014 talks from CSML researchers.

Talk 1

Gaussian Processes for Bayesian Estimation in Ordinary Differential Equations

Yali Wang and David Barber

Bayesian parameter estimation in coupled ordinary differential equations (ODEs) is challenging due to the high computational cost of numerical integration. In gradient matching, a separate data model is introduced with the property that its gradient may be calculated easily. Parameter estimation is then achieved by requiring consistency between the gradients computed from the data model and those specified by the ODE. We propose a Gaussian process model that directly links state derivative information with system observations, simplifying previous approaches and improving estimation accuracy.
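The classical two-stage form of gradient matching described above can be sketched as follows. First, a GP is fitted to the observations and its posterior mean is differentiated. Then the ODE parameter is chosen so that the ODE right-hand side matches those gradients. This is not the model proposed in the talk, which links derivatives and observations directly. The toy ODE dx/dt = -theta x, the RBF length-scale `ell` and the noise level are assumptions for illustration.

```python
import numpy as np

def rbf(a, b, ell):
    """Squared-exponential kernel matrix with unit signal variance."""
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * ell ** 2))

def gp_mean_and_derivative(t_obs, y_obs, t_eval, ell=1.0, noise=1e-2):
    """Posterior mean of a zero-mean GP and its time derivative at t_eval."""
    K = rbf(t_obs, t_obs, ell) + noise ** 2 * np.eye(t_obs.size)
    alpha = np.linalg.solve(K, y_obs)
    Ks = rbf(t_eval, t_obs, ell)
    # d/dt k(t, t_i) = -(t - t_i) / ell^2 * k(t, t_i)
    dKs = -(t_eval[:, None] - t_obs[None, :]) / ell ** 2 * Ks
    return Ks @ alpha, dKs @ alpha

# Hypothetical toy ODE dx/dt = -theta * x with true theta = 0.7.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 5.0, 30)
y = np.exp(-0.7 * t) + 0.01 * rng.normal(size=t.size)
t_grid = np.linspace(0.2, 4.8, 50)
x_hat, dx_hat = gp_mean_and_derivative(t, y, t_grid)
# Gradient matching: least-squares fit of dx_hat ≈ -theta * x_hat.
theta_hat = -np.sum(dx_hat * x_hat) / np.sum(x_hat ** 2)
```

No numerical integration of the ODE is needed. That is the computational appeal the abstract refers to.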

Talk 2

A Kernel Independence Test for Random Processes

Kacper Chwiałkowski and Arthur Gretton

A non-parametric approach to the problem of testing the independence of two random processes will be presented. The test statistic is the Hilbert-Schmidt Independence Criterion (HSIC), which was used previously in testing independence for i.i.d. pairs of variables. The asymptotic behaviour of HSIC will be established when computed from samples drawn from random processes. We will show that earlier bootstrap procedures which worked in the i.i.d. case will fail for random processes, and an alternative consistent estimate of the p-values will be proposed. Tests on artificial data and real-world forex data indicate that the new test procedure discovers dependence which is missed by linear approaches, while the earlier bootstrap procedure returns an elevated number of false positives.

Talk 3

Kernel adaptive Metropolis-Hastings

D. Sejdinovic, H. Strathmann, M. Lomeli Garcia, C. Andrieu and A. Gretton

Abstract: A Kernel Adaptive Metropolis-Hastings algorithm is introduced, for the purpose of sampling from a target distribution with strongly nonlinear support. The algorithm embeds the trajectory of the Markov chain into a reproducing kernel Hilbert space (RKHS), such that the feature space covariance of the samples informs the choice of proposal. The procedure is computationally efficient and straightforward to implement, since the RKHS moves can be integrated out analytically: our proposal distribution in the original space is a normal distribution whose mean and covariance depend on where the current sample lies in the support of the target distribution, and adapts to its local covariance structure. Furthermore, the procedure requires neither gradients nor any other higher order information about the target, making it particularly attractive for contexts such as Pseudo-Marginal MCMC. Kernel Adaptive Metropolis-Hastings outperforms competing fixed and adaptive samplers on multivariate, highly nonlinear target distributions, arising in both real-world and synthetic examples.

Code: https://github.com/karlnapf/kameleon-mcmc
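The state-dependent Gaussian proposal can be sketched as follows, based on our reading of the paper rather than on the linked repository. The covariance at the current point y is assumed to be gamma^2 I + nu^2 M H M^T. Here column i of M is 2 times the gradient of k(x, z_i) at x = y, and H is the centring matrix. The Gaussian kernel k(x, z) = exp(-||x - z||^2 / sigma^2) and the values of `gamma`, `nu` and `sigma` are assumptions. The subsample `Z` is held fixed here, and the adaptation schedule is omitted. Because the proposal is not symmetric, the MH ratio includes the Hastings correction.

```python
import numpy as np

def kamh_covariance(y, Z, sigma=1.0, gamma=0.2, nu=1.0):
    """Proposal covariance gamma^2 I + nu^2 M H M^T at the point y, with the
    (assumed) Gaussian kernel k(x, z) = exp(-||x - z||^2 / sigma^2)."""
    n, d = Z.shape
    diffs = Z - y                                       # rows: z_i - y
    k = np.exp(-np.sum(diffs ** 2, axis=1) / sigma ** 2)
    grads = (2.0 / sigma ** 2) * k[:, None] * diffs     # rows: grad_x k(x, z_i) at x = y
    M = 2.0 * grads.T                                   # shape (d, n)
    H = np.eye(n) - np.ones((n, n)) / n                 # centring matrix (idempotent)
    MH = M @ H
    return gamma ** 2 * np.eye(d) + nu ** 2 * (MH @ MH.T)  # M H M^T = (MH)(MH)^T

def gaussian_logpdf(x, mean, cov):
    """Log-density of N(mean, cov) at x via a Cholesky factor."""
    L = np.linalg.cholesky(cov)
    z = np.linalg.solve(L, x - mean)
    return -0.5 * z @ z - np.log(np.diag(L)).sum() - 0.5 * x.size * np.log(2 * np.pi)

def kamh_step(x, log_target, Z, rng, **kw):
    """One Metropolis-Hastings step with the state-dependent Gaussian proposal."""
    cov_x = kamh_covariance(x, Z, **kw)
    x_new = rng.multivariate_normal(x, cov_x)
    cov_new = kamh_covariance(x_new, Z, **kw)
    # The proposal is not symmetric, so the Hastings correction is required.
    log_alpha = (log_target(x_new) + gaussian_logpdf(x, x_new, cov_new)
                 - log_target(x) - gaussian_logpdf(x_new, x, cov_x))
    return x_new if np.log(rng.uniform()) < log_alpha else x

def log_target(v):
    """Standard normal log-density (up to a constant), for illustration only."""
    return -0.5 * v @ v

rng = np.random.default_rng(0)
Z = rng.normal(size=(50, 2))  # stand-in for a fixed subsample of the chain history
x = np.zeros(2)
for _ in range(100):
    x = kamh_step(x, log_target, Z, rng)
```

Only kernel evaluations against `Z` are needed. No gradient of the target is used, which matches the abstract's point about pseudo-marginal settings.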

One big aim in robotics is to learn modular control policies that can synthesize complex behaviour out of simpler elemental movements, often called movement primitives. Such a structured control policy promises to break complex learning problems down into simpler tasks and to ease the learning of new but similar tasks. To learn modular control policies efficiently, both the underlying learning algorithm and the movement primitive representation have to fulfil several requirements. There need to be simple mechanisms for adapting a primitive to new situations. We also need to learn how to sequence primitives and how to combine them simultaneously, so that complex behaviour can be synthesized from a compact set of movement primitives.

In this talk I will introduce our recent work on learning such a modular control policy with information theoretic policy search.

Information-theoretic policy search uses an information-theoretic bound to determine the step-size of the policy update. It exhibits several beneficial properties, such as a smooth and stable learning process and a fast learning speed. We extended information-theoretic policy search methods so that we can efficiently generalize elemental movements to new situations, learn to select between several elemental movements and learn how to sequence them. Furthermore, I will present a new probabilistic movement primitive (ProMP) representation that is particularly well suited to such a modular control approach. ProMPs allow the use of new probabilistic operators that provide a principled way to generalize and co-activate movement primitives.
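The probabilistic view of a movement primitive makes generalization to a new via-point a matter of Gaussian conditioning. Here is a minimal sketch under stated assumptions. It uses a scalar trajectory y(t) = phi(t)^T w, weights w ~ N(mu_w, Sigma_w), hypothetical normalised Gaussian basis functions over a phase t in [0, 1], and a small observation noise `sigma_y`. The parameter values are illustrative, not taken from the talk.

```python
import numpy as np

def rbf_features(t, n_basis=10, width=0.02):
    """Normalised Gaussian basis functions over phase t in [0, 1]; shape (len(t), n_basis)."""
    centers = np.linspace(0.0, 1.0, n_basis)
    phi = np.exp(-(t[:, None] - centers[None, :]) ** 2 / (2 * width))
    return phi / phi.sum(axis=1, keepdims=True)

def condition_promp(mu_w, Sigma_w, t_star, y_star, sigma_y=1e-6):
    """Condition N(mu_w, Sigma_w) on the trajectory passing through y_star at t_star."""
    phi = rbf_features(np.array([t_star]), n_basis=mu_w.size)[0]
    s = phi @ Sigma_w @ phi + sigma_y        # predictive variance at t_star
    gain = Sigma_w @ phi / s                 # Kalman-style gain
    mu_new = mu_w + gain * (y_star - phi @ mu_w)
    Sigma_new = Sigma_w - np.outer(gain, phi @ Sigma_w)
    return mu_new, Sigma_new

n_basis = 10
mu_w, Sigma_w = np.zeros(n_basis), np.eye(n_basis)
mu_new, Sigma_new = condition_promp(mu_w, Sigma_w, t_star=0.5, y_star=1.0)
phi_star = rbf_features(np.array([0.5]), n_basis=n_basis)[0]
```

After conditioning, the mean trajectory passes (almost exactly) through the via-point, and the variance there collapses to roughly `sigma_y`.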

Short Bio:

Gerhard Neumann is currently a post-doctoral fellow at the Intelligent Autonomous Systems (IAS) Lab of Prof. Jan Peters at TU Darmstadt, where he leads the Machine Learning for Control group. He finished his PhD in 2012 at the Technical University Graz. His research interests are Bayesian Machine Learning, Hierarchical and Structured Learning for Robotics, Reinforcement Learning, Information Theoretic Policy Search, Kernel Embeddings and Movement Primitive Representations.

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20140619T110000 DTEND;TZID=/Europe/London:20140619T120000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20140912130000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Remi Munos DESCRIPTION:Remi Munos (INRIA Lille): Two generic principles in modern bandits: the optimistic principle and Thompson sampling\n\nLocation: Zoom\n\nLink: Roberts G08 (Sir David Davies LT)\n\nAbstract: I will describe two principles considered in multi-armed bandits, namely the optimistic principle and Thompson sampling, and illustrate how they extend to structured bandit settings, such as in linear bandits and bandits in graphs.

Bio: Remi Munos received his PhD in Cognitive Science from EHESS, France, in 1997, and did a postdoc at CMU from 1998 to 2000. He was then an Assistant Professor in the Department of Applied Mathematics at Ecole Polytechnique. In 2006 he joined the French public research institute INRIA as a Senior Researcher and co-led the project-team SequeL (Sequential Learning), which now gathers approximately 25 people. His research interests cover several fields of Statistical Learning, including Reinforcement Learning, Optimization, and Bandit Theory.
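The two principles named in this talk can be contrasted on a Bernoulli bandit in a few lines. This is a textbook sketch, not material from the talk. The optimistic principle is illustrated with UCB1, which uses the empirical mean plus a sqrt(2 ln t / n) bonus. Thompson sampling uses independent Beta(1, 1) priors on each arm's success probability. The arm probabilities and horizon in the example are arbitrary.

```python
import numpy as np

def run_bandit(probs, horizon, policy, rng):
    """Play a Bernoulli bandit and return the number of pulls of each arm."""
    k = len(probs)
    pulls = np.zeros(k, dtype=int)
    wins = np.zeros(k)
    for t in range(1, horizon + 1):
        if policy == "ucb":
            if t <= k:
                arm = t - 1  # play each arm once to initialise the estimates
            else:
                # Optimism in the face of uncertainty: mean plus confidence bonus.
                bonus = np.sqrt(2 * np.log(t) / pulls)
                arm = int(np.argmax(wins / pulls + bonus))
        elif policy == "thompson":
            # Draw one plausible mean per arm from its Beta posterior; play the best.
            arm = int(np.argmax(rng.beta(1 + wins, 1 + pulls - wins)))
        else:
            raise ValueError(f"unknown policy: {policy}")
        reward = rng.uniform() < probs[arm]
        pulls[arm] += 1
        wins[arm] += reward
    return pulls
```

For example, `run_bandit([0.2, 0.5, 0.8], 2000, "thompson", np.random.default_rng(0))` concentrates most pulls on the third arm. The UCB1 variant does the same but explores suboptimal arms somewhat more.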

Slides for the talk: PDF

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20140912T130000 DTEND;TZID=/Europe/London:20140912T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20141017160000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Peter Flach DESCRIPTION:Peter Flach (University of Bristol): Comparing apples and oranges -- reinterpreting common evaluation metrics in classification\n\nLocation: Zoom\n\nLink: Malet Place Engineering Building 1.02\n\nAbstract:

A wide range of evaluation metrics exist in supervised learning, including accuracy, area under the ROC curve (AUC) and Brier score. At first sight these metrics assess different aspects of a predictive model's performance: accuracy measures classification performance (ability to assign the correct class), AUC measures ranking performance (ability to score positives higher than negatives) and Brier score assesses scoring performance (ability to assign probabilities close to the 'ideal' 0/1 values). While it thus appears that these measures are not directly comparable, in this talk I will discuss recent results that demonstrate how each measure can be directly related to expected misclassification loss under varying operating conditions, utilising the notion of a threshold selection method. Among these results is a new interpretation -- and rehabilitation -- of AUC in terms of expected misclassification loss under a novel rate-driven threshold selection method. I will also demonstrate how each evaluation metric can be visualised in cost space, and discuss the importance and effect of classifier calibration. Finally, I will describe ongoing work that investigates how these results can be related to different cost models such as the F-measure.
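To make the three metrics concrete, here is a minimal sketch of their standard definitions for binary labels. Accuracy is computed at a fixed threshold of 0.5 (an assumption). AUC is computed as the probability that a random positive is scored above a random negative, with ties counted as one half. The Brier score is the mean squared error of the probabilities. The small example data are invented.

```python
import numpy as np

def accuracy(y, scores, threshold=0.5):
    """Fraction of correct 0/1 decisions after thresholding the scores."""
    return float(np.mean((scores >= threshold) == (y == 1)))

def auc(y, scores):
    """Probability that a random positive outscores a random negative (ties count 1/2)."""
    diff = scores[y == 1][:, None] - scores[y == 0][None, :]
    return float(np.mean((diff > 0) + 0.5 * (diff == 0)))

def brier(y, scores):
    """Mean squared difference between predicted probabilities and 0/1 labels."""
    return float(np.mean((scores - y) ** 2))

y = np.array([1, 0, 1, 0])
sharp = np.array([0.9, 0.2, 0.6, 0.4])
hedged = np.array([0.55, 0.45, 0.52, 0.48])
```

Both score vectors have accuracy 1.0 and AUC 1.0, but their Brier scores are 0.0925 and 0.21645 respectively. This is a small instance of the point that the metrics measure different aspects of performance.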

Most of the talk is based on joint work with José Hernández-Orallo and Cèsar Ferri; some recent publications are accessible here:

http://www.icml-2011.org/papers/366_icmlpaper.pdf

http://www.icml-2011.org/papers/385_icmlpaper.pdf

http://jmlr.csail.mit.edu/papers/v13/hernandez-orallo12a.html

http://link.springer.com/article/10.1007%2Fs10994-013-5328-9

Bio:

Peter Flach has been Professor of Artificial Intelligence at the University of Bristol since 2003. An internationally leading researcher in the areas of mining highly structured data and the evaluation and improvement of machine learning models using ROC analysis, he has also published on the logic and philosophy of machine learning, and on the combination of logic and probability. He is author of Simply Logical: Intelligent Reasoning by Example (John Wiley, 1994) and Machine Learning: the Art and Science of Algorithms that Make Sense of Data (Cambridge University Press, 2012).

Prof Flach is the Editor-in-Chief of the Machine Learning journal, one of the two top journals in the field that has been published for over 25 years by Kluwer and now Springer. He was Programme Co-Chair of the 1999 International Conference on Inductive Logic Programming, the 2001 European Conference on Machine Learning, the 2009 ACM Conference on Knowledge Discovery and Data Mining, and the 2012 European Conference on Machine Learning and Knowledge Discovery in Databases in Bristol.

Slides for the talk: PDF

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20141017T160000 DTEND;TZID=/Europe/London:20141017T170000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20141107130000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Chris Williams DESCRIPTION:Chris Williams (Edinburgh University): Switching Linear Dynamical Systems for Condition Monitoring in the Intensive Care Unit\n\nLocation: Zoom\n\nLink: Malet Place Engineering Building 1.02\n\nAbstract:

Data drawn from an observed system is often usefully described by a number of hidden (or latent) factors. Given a sequence of observations, the task is to infer which latent factors are active at each time frame. In this talk I will describe the application of a switching linear dynamical model to monitoring the condition of a patient receiving intensive care. The state of health of a patient cannot be observed directly, but different underlying factors are associated with particular patterns of measurements, e.g. in the heart rate, blood pressure and temperature.

I will describe two recent developments for this framework:

1) A Hierarchical Switching Linear Dynamical System (HSLDS) has been developed for the detection of sepsis in neonates in an intensive care unit (ICU). This adds a higher-level discrete switch variable with semantics sepsis/non-sepsis above the factors in the Factorial Switching LDS (FSLDS) of Quinn et al. (2009).

2) The FSLDS is a generative model for the observations. We present a Discriminative Switching Linear Dynamical System (DSLDS) applied to patient monitoring in ICUs. Our approach is based on identifying the state-of-health of a patient given their observed vital signs using a discriminative classifier, and then inferring their underlying physiological values conditioned on this status. We demonstrate on two real-world datasets that the DSLDS is able to outperform the FSLDS in most cases of interest, and that a combination of the two models achieves higher performance than either of the two models separately.

Joint work with Yvonne Freer, Konstantinos Georgatzis and Ioan Stanculescu.

Speaker Bio:

Chris Williams is Professor of Machine Learning in the School of Informatics, University of Edinburgh. He is interested in a wide range of theoretical and practical issues in machine learning, statistical pattern recognition, probabilistic graphical models and computer vision. This includes theoretical foundations, the development of new models and algorithms, and applications. His main areas of research are in visual object recognition and image understanding, models for understanding time-series, unsupervised learning, and Gaussian processes.

He obtained his MSc (1990) and PhD (1994) at the University of Toronto, under the supervision of Geoff Hinton. He was a member of the Neural Computing Research Group at Aston University from 1994 to 1998, and has been at the University of Edinburgh since 1998. He was program co-chair of NIPS in 2009, and is on the editorial boards of the Journal of Machine Learning Research and Proceedings of the Royal Society A.

http://homepages.inf.ed.ac.uk/ckiw/

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20141107T130000 DTEND;TZID=/Europe/London:20141107T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20141114130000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Ata Kaban DESCRIPTION:Ata Kaban (University of Birmingham): Learning with random projections\n\nLocation: Zoom\n\nLink: Roberts G08 (Sir David Davies lecture theatre)\n\nAbstract: Since the impressive advances in the area of compressed sensing, dimensionality reduction by random projections for machine learning and data mining has gained renewed interest. In direct analogy, compressive learning means carrying out learning tasks efficiently on cheaply compressed versions of massive, high-dimensional data sets that have a sparse representation. This talk will discuss conditions and guarantees for compressive learning to succeed that do not require the data to have a sparse representation, but instead exploit the natural structure of the learning problem. In particular, we give tight risk bounds in classification and regression settings, which have a clear interpretation and reveal meaningful structural properties of the problem that make it effectively solvable in a low-dimensional random subspace. We will also demonstrate that performance gains are achievable by combining several compressive learners into an ensemble.
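The basic operation behind compressive learning is a random linear map to a much lower dimension. Here is a minimal sketch that shows pairwise Euclidean distances being approximately preserved, in the spirit of the Johnson-Lindenstrauss lemma. It assumes a Gaussian projection matrix with entries N(0, 1/k). It does not reproduce the talk's risk bounds, and the dimensions are arbitrary.

```python
import numpy as np

def random_projection(X, k, rng):
    """Project the rows of X (n x d) to k dimensions with a Gaussian matrix scaled by 1/sqrt(k)."""
    R = rng.normal(size=(X.shape[1], k)) / np.sqrt(k)
    return X @ R

def pairwise_dists(X):
    """Euclidean distance matrix between the rows of X."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((diff ** 2).sum(-1))

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 1000))
Y = random_projection(X, 300, rng)
iu = np.triu_indices(20, k=1)
ratios = pairwise_dists(Y)[iu] / pairwise_dists(X)[iu]  # concentrated around 1
```

A learner trained on `Y` therefore sees nearly the same geometry as one trained on `X`, at a fraction of the dimension.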

Speaker Bio: Dr. Ata Kaban is currently a senior lecturer in Computer Science at the University of Birmingham. She received her PhD in Computer Science from the University of Paisley, supervised by Mark Girolami. She also holds a PhD in Musicology. Her research interests are: statistical machine learning, data mining (with emphasis on high-dimensional data spaces); algorithmic learning theory; probabilistic modelling of data, and Bayesian inference; high-dimensional phenomena, measure concentration, random matrix theory; dimensionality reduction, random projections; large-scale heuristic black-box optimisation.

Speaker's Webpage: http://www.cs.bham.ac.uk/~axk/

Slides for the talk: PDF

Video of the talk here.

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20141114T130000 DTEND;TZID=/Europe/London:20141114T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20141121130000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Amos Storkey DESCRIPTION:Amos Storkey (Edinburgh University): Series Expansion Methods for Approximate Learning, Filtering and Smoothing in Diffusions\n\nLocation: Zoom\n\nLink: Roberts G08 (Sir David Davies lecture theatre)\n\nAbstract:Many systems in science, engineering and finance are described using known forms of parameterised differential equations. Often there are unknown random influences on such systems, and so a stochastic differential system is an appropriate model. However, inference and learning in general nonlinear stochastic differential systems are notoriously hard, primarily because the finite-time transition probability cannot be explicitly represented.

I will discuss the series expansion approach for approximating a diffusion. I will demonstrate examples of the method applied directly to parameter estimation in diffusion processes, and used via nonlinear Kalman filters and the unscented particle filter.

This talk describes joint work with Simon Lyons and Simo Sarkka, and was funded by an MSR Cambridge PhD fellowship.

Bio:

Amos Storkey is a reader (associate professor) at the School of Informatics, Edinburgh University. He did his PhD in Neural Networks at the Neural Systems Group, Imperial College, London. His research interests include: machine learning markets; Bayesian methods for brain imaging; continuous time/depth systems; dynamical Boltzmann machine models; scalable deep learning.

Video of the talk here.

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20141121T130000 DTEND;TZID=/Europe/London:20141121T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20141128130000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Stephen Roberts DESCRIPTION:Stephen Roberts (Oxford University): Planets, Pulsars, People and Petabytes: Explorations of Machine Learning in Astronomy\n\nLocation: Zoom\n\nLink: Roberts G08 (Sir David Davies lecture theatre)\n\nAbstract:Astronomy has become a data-intensive science. In this talk we highlight our recent work in using scalable machine learning methods for astronomical data analysis and exploration. We consider the role of scalable Bayesian inference for inferring and removing systematic corruptions in data, detection of astrophysical transients & large-scale aggregation of information in a citizen science project.

Bio: Stephen's main area of research lies in machine learning approaches to data analysis. He has particular interests in the development of machine learning theory for problems in time series analysis and decision theory. Current research applies Bayesian statistics, graphical models and information theory to diverse problem domains including astronomy, mathematical biology, finance and sensor networks. He leads the Machine Learning Research Group, is a Professorial Fellow of Somerville College and a faculty member of the Oxford-Man Institute.

Speaker's webpage: http://www.robots.ox.ac.uk/~sjrob/

Video of the talk here.

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20141128T130000 DTEND;TZID=/Europe/London:20141128T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20141205130000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Andrew McDonald Kacper Chwialkowski Balaji Lakshminarayanan DESCRIPTION:Andrew McDonald, Kacper Chwialkowski, Balaji Lakshminarayanan (UCL/Gatsby): NIPS Previews\n\nLocation: Zoom\n\nLink: Roberts G08 (Sir David Davies lecture theatre)\n\nAbstract:A presentation of several NIPS 2014 papers from UCL researchers.

Talk 1

Speaker: Andrew McDonald

Title: Spectral k-Support Norm Regularization

Abstract: The k-support norm has successfully been applied to sparse vector prediction problems. We observe that it belongs to a wider class of norms, which we call the box-norms. Within this framework we derive an efficient algorithm to compute the proximity operator of the squared norm, improving upon the original method for the k-support norm. We extend the norms from the vector to the matrix setting and we introduce the spectral k-support norm. We study its properties and show that it is closely related to the multitask learning cluster norm. We apply the norms to real and synthetic matrix completion datasets. Our findings indicate that spectral k-support norm regularization gives state-of-the-art performance, consistently improving over trace norm regularization and the matrix elastic net. (Joint work with Massimiliano Pontil and Dimitris Stamos)
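For readers unfamiliar with the k-support norm itself, here is a minimal sketch that evaluates it for a vector. It uses the closed form given by Argyriou, Foygel and Srebro (2012), not the box-norm algorithm from this talk. It interpolates between the L1 norm (k = 1) and the L2 norm (k = d). The function name is our own.

```python
import numpy as np

def k_support_norm(w, k):
    """k-support norm via the closed form of Argyriou, Foygel and Srebro (2012)."""
    z = np.sort(np.abs(np.asarray(w, dtype=float)))[::-1]  # |w| in decreasing order
    d = z.size
    if not 1 <= k <= d:
        raise ValueError("k must satisfy 1 <= k <= len(w)")
    # Find the unique r in {0, ..., k-1} with (1-indexed, z_0 = +inf):
    #   z_{k-r-1} > (1 / (r+1)) * sum_{i >= k-r} z_i >= z_{k-r}
    for r in range(k):
        head = z[:k - r - 1]                # z_1, ..., z_{k-r-1}
        tail_sum = z[k - r - 1:].sum()      # z_{k-r} + ... + z_d
        upper = np.inf if k - r - 1 == 0 else z[k - r - 2]
        avg = tail_sum / (r + 1)
        if upper > avg >= z[k - r - 1]:
            return float(np.sqrt(np.sum(head ** 2) + tail_sum ** 2 / (r + 1)))
    raise RuntimeError("no valid r found (possibly floating-point ties)")
```

For `w = [3, -4, 1]` the norm is 8 (the L1 norm) for k = 1, sqrt(32) for k = 2 and sqrt(26) (the L2 norm) for k = 3.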

Talk 2

Speaker: Kacper Chwialkowski

Title: A Wild Bootstrap for Degenerate Kernel Tests

Abstract: A wild bootstrap method for nonparametric hypothesis tests based on kernel distribution embeddings is proposed. This bootstrap method is used to construct provably consistent tests that apply to random processes, for which the naive permutation-based bootstrap fails. It applies to a large group of kernel tests based on V-statistics, which are degenerate under the null hypothesis, and non-degenerate elsewhere. To illustrate this approach, we construct a two-sample test, an instantaneous independence test and a multiple lag independence test for time series. In experiments, the wild bootstrap gives strong performance on synthetic examples, on audio data, and in performance benchmarking for the Gibbs sampler.

(Joint work with Dino Sejdinovic and Arthur Gretton)
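The central idea of the wild bootstrap for a degenerate V-statistic fits in a short function. The statistic is recomputed with random multipliers W_t that are themselves autocorrelated, so that the bootstrap respects the temporal dependence in the data. Here is a minimal sketch for a two-sample (marginal) MMD test on 1-D time series. It assumes a Gaussian kernel and AR(1) multipliers W_t = a W_{t-1} + sqrt(1 - a^2) eps_t with a = exp(-1/ell). It also centres the multipliers, which as we understand it is one of the variants in the paper. `ell`, `n_boot` and the AR(1) test data are illustrative choices, not the authors' code.

```python
import numpy as np

def gaussian_kernel(a, b, sigma=1.0):
    """Gaussian kernel matrix between two 1-D samples."""
    return np.exp(-(a[:, None] - b[None, :]) ** 2 / (2 * sigma ** 2))

def wild_bootstrap_mmd(x, y, n_boot=500, ell=20.0, rng=None):
    """p-value for equal marginals of x_t and y_t: n times the MMD V-statistic,
    compared against a wild bootstrap with autocorrelated, centred multipliers."""
    rng = np.random.default_rng() if rng is None else rng
    n = x.size
    Kxy = gaussian_kernel(x, y)
    # H[i, j] = k(x_i, x_j) + k(y_i, y_j) - k(x_i, y_j) - k(x_j, y_i)
    H = gaussian_kernel(x, x) + gaussian_kernel(y, y) - Kxy - Kxy.T
    stat = n * H.mean()
    a = np.exp(-1.0 / ell)
    boot = np.empty(n_boot)
    for b in range(n_boot):
        eps = rng.normal(size=n)
        w = np.empty(n)
        w[0] = eps[0]
        for t in range(1, n):  # stationary AR(1) multiplier process
            w[t] = a * w[t - 1] + np.sqrt(1.0 - a ** 2) * eps[t]
        w -= w.mean()
        boot[b] = w @ H @ w / n
    return float(np.mean(boot >= stat))

def ar1(n, phi, rng):
    """Hypothetical AR(1) series, used only to illustrate the test."""
    x = np.zeros(n)
    for t in range(1, n):
        x[t] = phi * x[t - 1] + rng.normal()
    return x

rng = np.random.default_rng(0)
p_shifted = wild_bootstrap_mmd(ar1(200, 0.5, rng), ar1(200, 0.5, rng) + 2.0, rng=rng)
```

A naive permutation bootstrap would shuffle time indices and break the autocorrelation. The slowly varying multipliers avoid this.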

Talk 3

Speaker: Balaji Lakshminarayanan

Title: Mondrian Forests: Efficient Online Random Forests

Abstract: Ensembles of randomized decision trees, usually referred to as random forests, are widely used for classification and regression tasks in machine learning and statistics. Random forests achieve competitive predictive performance and are computationally efficient to train and test, making them excellent candidates for real-world prediction tasks. The most popular random forest variants (such as Breiman's random forest and extremely randomized trees) operate on batches of training data. Online methods are now in greater demand. Existing online random forests, however, require more training data than their batch counterpart to achieve comparable predictive performance. In this work, we use Mondrian processes (Roy and Teh, 2009) to construct ensembles of random decision trees we call Mondrian forests. Mondrian forests can be grown in an incremental/online fashion and, remarkably, the distribution of online Mondrian forests is the same as that of batch Mondrian forests. Mondrian forests achieve predictive performance comparable with existing online random forests and periodically re-trained batch random forests, while being more than an order of magnitude faster, thus offering a better computation-versus-accuracy trade-off. (Joint work with Daniel M. Roy and Yee Whye Teh)
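The Mondrian process that underlies these forests has a simple generative description. Cuts arrive after an exponential waiting time whose rate is the sum of the box's side lengths. Each cut picks a dimension with probability proportional to its side length and a uniform location along it. The process stops when the lifetime `budget` is exceeded. Here is a minimal sketch that samples one such partition of a box. It shows neither the online extension nor prediction with Mondrian forests.

```python
import numpy as np

def sample_mondrian(lower, upper, budget, rng, time=0.0):
    """Sample a Mondrian partition of the box [lower, upper] with lifetime `budget`.
    Returns the leaf cells as a list of (lower, upper) array pairs."""
    lengths = upper - lower
    rate = lengths.sum()  # linear dimension of the box
    # The next cut arrives after an exponential waiting time with this rate.
    split_time = time + rng.exponential(1.0 / rate) if rate > 0 else np.inf
    if split_time > budget:
        return [(lower, upper)]
    # Cut dimension chosen proportionally to side length, location uniform.
    dim = rng.choice(lengths.size, p=lengths / rate)
    cut = rng.uniform(lower[dim], upper[dim])
    left_upper, right_lower = upper.copy(), lower.copy()
    left_upper[dim] = cut
    right_lower[dim] = cut
    return (sample_mondrian(lower, left_upper, budget, rng, split_time)
            + sample_mondrian(right_lower, upper, budget, rng, split_time))

rng = np.random.default_rng(0)
leaves = sample_mondrian(np.zeros(2), np.ones(2), budget=3.0, rng=rng)
```

Larger budgets give finer partitions. Because each cell's cuts depend only on that cell, the sample can later be extended, which is the property online Mondrian forests exploit.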

Video of the talks here.

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20141205T130000 DTEND;TZID=/Europe/London:20141205T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20150116130000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Kamil Ciosek DESCRIPTION:Kamil Ciosek (UCL, Computer Science): Combining state abstraction and temporal abstraction in MDP solving\n\nLocation: Zoom\n\nLink: Roberts G08 (Sir David Davies lecture theatre)\n\nAbstract:The talk presents a way of solving Markov Decision Processes that combines state abstraction and temporal abstraction. Specifically, we combine state aggregation with the options framework and demonstrate that they work well together; indeed, only when the two are combined is the full benefit of each realized. We introduce a hierarchical value iteration algorithm in which we first coarsely solve subgoals and then use these approximate solutions to solve the MDP exactly. This algorithm solves several problems faster than vanilla value iteration.

About the speaker: Kamil Ciosek (ciosek.net) is a PhD student at CSML specialising in approximate approaches to solving MDPs.

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20150116T130000 DTEND;TZID=/Europe/London:20150116T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20150123130000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Vladimir Vovk DESCRIPTION:Vladimir Vovk (Royal Holloway University of London): Probabilistic prediction in machine learning\n\nLocation: Zoom\n\nLink: Roberts G08 (Sir David Davies lecture theatre)\n\nAbstract:The topic of this talk will be probabilistic prediction using standard machine learning algorithms. The advantage of probabilities over alternative methods of quantifying uncertainty (such as prediction sets) is that they can be easily combined with losses and utilities for the purpose of decision making. For simplicity I will concentrate on problems of classification with two labels, 0 or 1. Most machine learning algorithms are scoring algorithms, in that they output not only a prediction but also a score intuitively reflecting the algorithm's confidence that the label is 1. One way of obtaining probabilistic predictions is to calibrate the scores. I will briefly review traditional calibration methods and describe a new method that is both computationally efficient and guaranteed to produce well-calibrated predictions.
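As an illustration of one traditional calibration method, here is a minimal sketch of isotonic regression computed with the pool-adjacent-violators algorithm (PAVA). It fits the non-decreasing map from scores to label frequencies that is closest in squared error. It does not show the new method described in the talk, and tied scores are not pooled explicitly in this simplified version.

```python
import numpy as np

def isotonic_calibrate(scores, labels):
    """Fit a non-decreasing map from scores to probabilities with PAVA and return
    the calibrated probabilities for the input points, in their original order."""
    order = np.argsort(scores, kind="stable")
    y = np.asarray(labels, dtype=float)[order]
    # Each block stores [sum of labels, count]; merge while block means decrease.
    blocks = []
    for v in y:
        blocks.append([v, 1])
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fitted = np.concatenate([np.full(c, s / c) for s, c in blocks])
    out = np.empty_like(fitted)
    out[order] = fitted  # undo the sort
    return out
```

For scores `[0.1, 0.2, 0.3, 0.4]` with labels `[0, 1, 0, 1]`, the middle two points violate monotonicity and are pooled, giving calibrated probabilities `[0, 0.5, 0.5, 1]`.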

About the speaker:

Vladimir Vovk graduated from Moscow State University, where he specialized in mathematical logic and did his PhD in algorithmic randomness and Kolmogorov complexity. Since 1999 he has been Professor of Computer Science at Royal Holloway, University of London. His research interests include machine learning and the foundations of probability.

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20150123T130000 DTEND;TZID=/Europe/London:20150123T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20150206130000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Peter Tino DESCRIPTION:Peter Tino (University of Birmingham): Learning from Temporal Data Using Dynamical Feature Space\n\nLocation: Zoom\n\nLink: Roberts G08 (Sir David Davies lecture theatre)\n\nAbstract:In learning from "static" data (where the order of data presentation carries no useful information), one framework is to transform the input items non-linearly into a (usually high-dimensional) feature space that is "rich" enough for linear techniques to be sufficient. However, data such as EEG signals or biological sequences naturally come with a sequential structure. I will present a general dynamical filter that effectively acts as a dynamical feature space for representing temporally ordered samples. I will then outline a framework for learning on sets of sequential data by building kernels based on such temporal filters. The methodology will be demonstrated on a series of sequence classification tasks and on an incremental temporal "regime" detection task.

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20150206T130000 DTEND;TZID=/Europe/London:20150206T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20150227130000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Matt Hoffman DESCRIPTION:Matt Hoffman (University of Cambridge): Predictive Entropy Search\n\nLocation: Zoom\n\nLink: Roberts G08 (Sir David Davies lecture theatre)\n\nAbstract:Bayesian optimization provides a principled, probabilistic approach to global optimization. In this talk I will give a brief overview of Bayesian optimization and then describe novel, information-theoretic approaches to this problem. In particular, I will detail an algorithm we have developed called Predictive Entropy Search (PES), which at every iteration maximizes the expected information gained about the location of the global maximum. This reformulation allows PES to obtain approximations that are both more accurate and more efficient than those of alternative methods. Finally, this approach makes it easy to incorporate additional constraints that are much more problematic for alternative methods.

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20150227T130000 DTEND;TZID=/Europe/London:20150227T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20150304130000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Jason Weston DESCRIPTION:Jason Weston (Facebook, New York): Memory Networks\n\nLocation: Zoom\n\nLink: Malet Place Engineering Building 1.03\n\nAbstract:We describe a new class of learning models called memory networks. Memory networks reason with inference components combined with a long-term memory component; they learn how to use these jointly. The long-term memory can be read and written to, with the goal of using it for prediction. We investigate these models in the context of question answering (QA) where the long-term memory effectively acts as a (dynamic) knowledge base, and the output is a textual response. We evaluate them on a large-scale QA task, and a set of smaller, but more complex, toy tasks generated from a simulated world. In the latter, we show the reasoning power of such models by chaining multiple supporting sentences to answer questions that require understanding the intension of verbs.

This is joint work with Sumit Chopra, Antoine Bordes and Tomas Mikolov.

About the speaker:

Jason Weston has been a research scientist at Facebook, NY, since February 2014. He earned his PhD in machine learning at Royal Holloway, University of London and at AT&T Research in Red Bank, NJ (advisors: Alex Gammerman, Volodya Vovk and Vladimir Vapnik) in 2000. From 2000 to 2002, he was a researcher at Biowulf Technologies, New York. From 2002 to 2003 he was a research scientist at the Max Planck Institute for Biological Cybernetics, Tuebingen, Germany. From 2003 to 2009 he was a research staff member at NEC Labs America, Princeton. From 2009 to 2014 he was a research scientist at Google, NY. His interests lie in statistical machine learning and its application to text and images. Jason has published over 90 papers, including best paper awards at ICML and ECML. He was also part of the YouTube team that won a National Academy of Television Arts & Sciences Emmy Award for Technology and Engineering for Personalized Recommendation Engines for Video Discovery.

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20150304T130000 DTEND;TZID=/Europe/London:20150304T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20150306130000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Heiko Strathmann (covering co-author Mark Girolami who cannot make it) DESCRIPTION:Heiko Strathmann (covering co-author Mark Girolami who cannot make it) (Gatsby Unit, UCL): Unbiased Bayes for Big Data: Paths of Partial Posteriors\n\nLocation: Zoom\n\nLink: Roberts G06 (Sir Ambrose Fleming lecture theatre)\n\nAbstract:Key quantities of interest in Bayesian inference are expectations of functions with respect to a posterior distribution. Markov chain Monte Carlo is a fundamental tool for consistently computing these expectations by averaging samples drawn from an approximate posterior. However, its feasibility is being challenged in the era of so-called Big Data, as all data need to be processed in every iteration. Realising that such simulation is an unnecessarily hard problem if the goal is estimation, we construct a computationally scalable methodology that allows unbiased estimation of the required expectations, without explicit simulation from the full posterior. The scheme's variance is finite by construction and straightforward to control, leading to algorithms that are provably unbiased and naturally arrive at a desired error tolerance. This is achieved at an average computational complexity that is sub-linear in the size of the dataset, and the scheme's free parameters are easy to tune. We demonstrate the utility and generality of the methodology on a range of common statistical models applied to large-scale benchmark and real-world datasets.
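The abstract does not spell out how unbiasedness is obtained. One standard device turns a convergent sequence of biased approximations phi(1), phi(2), ... into an unbiased estimate of its limit: Rhee and Glynn's random truncation. The sketch below assumes that device, which is our choice for illustration. It uses a toy deterministic sequence in place of expectations under partial posteriors. The function name `debiased_estimate` is hypothetical.

```python
import random


def debiased_estimate(phi, p=0.5, rng=random):
    """One sample of a random-truncation (Rhee-Glynn style) estimator of
    lim_{t -> inf} phi(t), assuming phi(0) == 0.

    A random level T >= 1 is drawn with P(T >= t) = p**(t - 1); each
    increment phi(t) - phi(t - 1) up to T is reweighted by 1 / P(T >= t),
    so the expectation of the sum equals the full (infinite) telescoping sum.
    """
    T = 1
    while rng.random() < p:  # geometric number of extra levels
        T += 1
    total = 0.0
    for t in range(1, T + 1):
        survival = p ** (t - 1)  # P(T >= t)
        total += (phi(t) - phi(t - 1)) / survival
    return total
```

Averaging many samples of `debiased_estimate(lambda t: 1 - 0.5 ** t)` approaches the limit 1. Each sample only evaluates the sequence up to a random level, so the expected cost stays bounded. The variance is finite here because the squared increments shrink faster than P(T >= t).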

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20150306T130000 DTEND;TZID=/Europe/London:20150306T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20150313130000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Peter Sollich DESCRIPTION:Peter Sollich (King's College, University of London): Gaussian process regression on graphs\n\nLocation: Zoom\n\nLink: Roberts G06 (Sir Ambrose Fleming lecture theatre)\n\nAbstract:I will give an overview of our work over the last few years on understanding Gaussian process learning of functions on graphs, including kernel properties on locally treelike graphs and exact learning curve predictions for large graphs. Time permitting, I will describe ongoing work on mismatched problems, which are best tackled with replicated belief propagation, and on multi-task learning.
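As a concrete illustration of Gaussian process regression on a graph, the sketch below uses the random-walk kernel K ∝ ((1 - 1/a) I + (1/a) D^{-1/2} A D^{-1/2})^p, one common choice of graph kernel. The kernel choice, its normalisation (average prior variance 1) and the function names are our assumptions, not details taken from the talk.

```python
import numpy as np


def random_walk_kernel(A, a=2.0, p=10):
    """Random-walk kernel on a graph with adjacency matrix A:
    K proportional to ((1 - 1/a) I + (1/a) D^{-1/2} A D^{-1/2})^p,
    rescaled so that the average diagonal entry (prior variance) is 1.
    a >= 2 keeps K positive semi-definite."""
    d = A.sum(axis=1)                       # vertex degrees
    d_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    n = A.shape[0]
    M = (1 - 1 / a) * np.eye(n) + (1 / a) * d_inv_sqrt @ A @ d_inv_sqrt
    K = np.linalg.matrix_power(M, p)
    return K / np.mean(np.diag(K))


def gp_posterior_mean(K, train_idx, y, noise=0.1):
    """GP regression posterior mean at every vertex, given noisy
    observations y at the vertices listed in train_idx."""
    K_tt = K[np.ix_(train_idx, train_idx)]
    K_st = K[:, train_idx]
    alpha = np.linalg.solve(K_tt + noise ** 2 * np.eye(len(train_idx)), y)
    return K_st @ alpha
```

On a 6-vertex cycle with observations at two opposite vertices, the posterior mean interpolates the observed values and decays smoothly in between. Larger `p` spreads the kernel's correlations over more hops of the graph.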

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20150313T130000 DTEND;TZID=/Europe/London:20150313T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20150320130000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Iain Murray DESCRIPTION:Iain Murray (Edinburgh University): Flexible and deep models for density estimation\n\nLocation: Zoom\n\nLink: Roberts G06 (Sir Ambrose Fleming lecture theatre)\n\nAbstract:We bring the empirical success of deep neural network models in regression and classification to high-dimensional density estimation. Using the product rule, density estimation in D dimensions can be reduced to D regression tasks. We tie these tasks together to improve computational and statistical efficiency, obtaining state-of-the-art fits across a wide range of benchmark tasks. I'll give an example Bayesian data analysis application from cosmology.

Work with Benigno Uria and Hugo Larochelle.
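Written out, the product-rule reduction mentioned in the abstract is the identity below. Each term in the sum is one of the D regression (conditional density estimation) tasks. How the tasks are tied together is not specified here.

$$
\log p(\mathbf{x}) \;=\; \sum_{d=1}^{D} \log p\left(x_d \mid \mathbf{x}_{<d}\right), \qquad \mathbf{x}_{<d} = (x_1, \dots, x_{d-1}).
$$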

Belief propagation is widely used for approximate inference in undirected graphical models, and rapidly returns an exact solution for a model with no cycles. If cycles are present, however, ‘loopy belief propagation’ (LBP) still often performs well but may not converge at all. It was shown previously that stable fixed points of LBP correspond to local minima of a function termed the Bethe free energy. The global minimum of the Bethe free energy defines the Bethe partition function.

In this seminar, we shall cover two aspects of recent work on the Bethe approximation, focusing on the class of binary pairwise (Ising) models:

(i) It was proved using graph covers (Ruozzi, 2012) that the Bethe partition function is upper bounded by the true partition function for a binary pairwise model that is attractive. Here we provide a new, arguably simpler proof from first principles. We make use of the idea of clamping a variable to a particular value. For an attractive model, we show that summing over the Bethe partition functions for each sub-model obtained after clamping any variable can only raise (and hence improve) the approximation. In fact, we derive a stronger result that may have other useful implications. Repeatedly clamping until we obtain a model with no cycles, where the Bethe approximation is exact, yields the result. We also provide a related lower bound on a broad class of approximate partition functions of general pairwise multi-label models that depends only on the topology. We demonstrate that clamping a few wisely chosen variables can be of practical value by dramatically reducing approximation error.

(ii) We describe a method that is guaranteed to return an epsilon-approximation to the (global optimum) Bethe partition function - to our knowledge, the first such method. For an attractive model, we demonstrate a fully polynomial-time approximation scheme (FPTAS). This addresses an open theoretical question, has practical value for small problems and allows the merits of other approaches to be tested.

Slides and papers are available on the speaker's website.

Part (i) relates to Weller and Jebara, Clamping variables and approximate inference, NIPS 2014 (oral).

Part (ii) relates to Weller and Jebara, Approximating the Bethe partition function, UAI 2014.
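The Python sketch below makes the idea of clamping concrete. It computes the exact partition function of a tiny binary pairwise model by enumeration. It then checks that summing the partition functions of the two sub-models obtained by clamping one variable recovers it. This identity is exact for the true partition function; the results above concern the Bethe approximation, which the sketch does not compute. The model parameters and function names are illustrative assumptions.

```python
import itertools
import math


def ising_weight(x, theta, W):
    """Unnormalised weight exp(sum_i theta_i x_i + sum_{i<j} W_ij x_i x_j)
    for a binary configuration x in {0, 1}^n."""
    n = len(x)
    s = sum(t * xi for t, xi in zip(theta, x))
    s += sum(W[i][j] * x[i] * x[j] for i in range(n) for j in range(i + 1, n))
    return math.exp(s)


def partition_function(theta, W, clamp=None):
    """Exact partition function by enumerating all 2^n configurations.
    If clamp=(i, v), only configurations with x_i == v are summed, giving
    the partition function of the clamped sub-model."""
    Z = 0.0
    for x in itertools.product((0, 1), repeat=len(theta)):
        if clamp is not None and x[clamp[0]] != clamp[1]:
            continue
        Z += ising_weight(x, theta, W)
    return Z


# A triangle (one cycle) with positive couplings, i.e. an attractive model.
theta = [0.1, -0.2, 0.3]
W = [[0.0, 0.5, 0.7],
     [0.0, 0.0, 0.4],
     [0.0, 0.0, 0.0]]
Z = partition_function(theta, W)
Z0 = partition_function(theta, W, clamp=(0, 0))
Z1 = partition_function(theta, W, clamp=(0, 1))
```

In this triangle, clamping any single vertex leaves a model whose remaining graph is a single edge, i.e. a tree. As the abstract notes, the Bethe approximation is exact on such a model.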

Dr Adrian Weller completed a PhD in 2014 at Columbia University, advised by Prof Tony Jebara, and is now a researcher in the Machine Learning Group at the University of Cambridge, working with Prof Zoubin Ghahramani.

Analysing trace logs of how a heterogeneous and diverse population of consumers interacts with digital software on mobile devices provides unprecedented opportunities to understand how software is actually used. It also makes it possible to find recurring usage patterns across the population, which each individual user exhibits to a greater or lesser degree. In this work, we consider an elementary mobile game played by a population of mobile gamers and collect fragments of game sessions over an extended period, resulting in a collection of users’ trace logs for multiple sessions. We develop a simple, yet flexible, non-parametric Bayes approach to infer the playing strategies adopted in the population from the logged traces of game interactions. We demonstrate that our approach finds interpretable strategies and provides good predictive performance compared with alternative modelling assumptions within a non-parametric Bayes framework.

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20150410T130000 DTEND;TZID=/Europe/London:20150410T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20150417110000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Csaba Szepesvari DESCRIPTION:Csaba Szepesvari (University of Alberta, Canada): Optimistic Algorithms for Online Learning in Structured Decision Problems\n\nLocation: Zoom\n\nLink: Malet Place Engineering Building 1.03\n\nAbstract:I will describe two online stochastic learning problems that are highly structured. In the case of both problems, the structure will allow us to derive effective optimistic algorithms.

In the first setting, the problem is to learn when to stop waiting for the arrival of some recurring event, such as learning the optimal disk spin-down time for mobile computers. This is a partial-information feedback problem with a continuous unbounded action space and a discontinuous loss function. Yet, the loss has other properties which can be used to design effective algorithms.

In the second problem, the learning agent must distribute available resources among some jobs to maximize the number of completed jobs. Allocating more resources to a given job increases the probability that the job completes, but with a cut-off. The difficulty of each job is unknown initially. Again, I show that the problem's structure allows for an efficient and effective algorithm, which adapts to the actual difficulty of the problem (which ranges from polylogarithmic to polynomial regret).

About the speaker:

Csaba Szepesvari gained his PhD in 1999 from "Jozsef Attila" University, Szeged, Hungary and is currently an Associate Professor at the Department of Computing Science of the University of Alberta and a principal investigator of the Alberta Ingenuity Center for Machine Learning, with extensive experience in the software industry. He is the coauthor of a book on nonlinear approximate adaptive controllers and the author of a book on reinforcement learning, has published over 80 peer reviewed journal and conference papers, serves as the Associate Editor of IEEE Transactions on Adaptive Control and AI Communications, and as a member of the program committee at various machine learning and AI conferences. Areas of expertise include statistical machine learning, Markovian decision processes, reinforcement learning and nonlinear control.

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20150417T110000 DTEND;TZID=/Europe/London:20150417T120000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20150508130000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Patrick Conrad DESCRIPTION:Patrick Conrad (University of Warwick): Probability Measures on Numerical Solutions of ODEs and PDEs for Uncertainty Quant. and Inference\n\nLocation: Zoom\n\nLink: Roberts G06 (Sir Ambrose Fleming lecture theatre)\n\nAbstract:Deterministic ODE and PDE solvers are widely used, but characterizing the error in numerical solutions within a coherent statistical framework is challenging. We successfully address this problem by constructing a probability measure over functions consistent with the solution that provably contracts to a Dirac measure on the unique solution at rates determined by an underlying deterministic solver. The measure straightforwardly derives from important classes of numerical solvers and is illustrated on uncertainty quantification and inverse problems.
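One simple way to turn a deterministic one-step solver into a probability measure over solutions, in the spirit of the abstract, is to add an independent Gaussian perturbation at each step. The perturbation's variance shrinks with the step size fast enough not to degrade the solver's convergence order. The sketch below assumes forward Euler (order p = 1) with perturbation variance proportional to h^{2p+1} = h^3. The scaling, the constant `sigma` and the function name are our assumptions for illustration.

```python
import numpy as np


def perturbed_euler(f, y0, t1, h, sigma, rng):
    """One random solution of dy/dt = f(t, y) on [0, t1] using forward Euler,
    with an additive N(0, (sigma * h**1.5)**2) perturbation after each step
    (variance proportional to h^3 for a first-order method)."""
    n_steps = int(round(t1 / h))
    y, t = float(y0), 0.0
    for _ in range(n_steps):
        y = y + h * f(t, y) + sigma * h ** 1.5 * rng.standard_normal()
        t += h
    return y


# Ensemble of perturbed solutions of dy/dt = -y, y(0) = 1, evaluated at t = 1.
rng = np.random.default_rng(0)
samples = np.array([perturbed_euler(lambda t, y: -y, 1.0, 1.0, 0.01, 1.0, rng)
                    for _ in range(500)])
```

The ensemble mean sits near exp(-1), and its spread reflects the discretisation uncertainty. Halving `h` shrinks that spread, which is the sense in which such a measure contracts towards the exact solution as the underlying solver is refined.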

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20150508T130000 DTEND;TZID=/Europe/London:20150508T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20150515130000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Zhenwen Dai DESCRIPTION:Zhenwen Dai (University of Sheffield): Variational Hierarchical Community of Experts\n\nLocation: Zoom\n\nLink: Roberts 508 (different than usual)\n\nAbstract:Deep latent variable models are promising for unsupervised and semi-supervised learning; however, the development of models with continuous latent variables has lagged behind. We scale up a deep continuous latent variable model called a hierarchical community of experts (HCE). It contains a hierarchy of linear-Gaussian units and a mechanism for dynamically selecting a subset of these units. We derive a new variational lower bound that only requires estimating the variational posterior at the top layer. We approximate this variational posterior with a probabilistic generative model that directly generates samples given the inputs, so that variance reduction techniques are not necessary. We verify the new variational bound and the generative inference model by applying them to sigmoid belief networks (SBN), and compare the performance with results in the literature on the MNIST dataset. By training an HCE on MNIST, we show that it is able to capture sophisticated variances of characters in the generated covariance matrices.

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20150515T130000 DTEND;TZID=/Europe/London:20150515T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20150529130000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Javier Gonzalez DESCRIPTION:Javier Gonzalez (University of Sheffield): Batch Bayesian Optimization via Local Penalization\n\nLocation: Zoom\n\nLink: Roberts G08\n\nAbstract:The popularity of Bayesian optimization methods for efficient exploration of parameter spaces has led to a series of papers applying Gaussian processes as surrogates in the optimization of functions. However, most proposed approaches only allow the exploration of the parameter space to occur sequentially. Often, it is desirable to simultaneously propose batches of parameter values to explore. This is particularly the case when large parallel processing facilities are available, whether computational or physical facets of the process being optimized. Batch methods, however, require modelling of the interaction between the evaluations in the batch, which can be expensive in complex scenarios. We investigate this issue and propose a simple heuristic, based on an estimate of the Lipschitz constant of the function, that captures the most important aspect of this interaction, i.e., local repulsion, at negligible computational overhead. The resulting algorithm compares well, in running time, with much more elaborate alternatives. A penalized acquisition function is used to collect batches of points of a given size while minimizing the non-parallelizable computational effort. The speed-up of our method with respect to previous approaches is significant in a set of computationally expensive experiments.
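The local-repulsion idea can be sketched as follows. The sketch assumes a maximisation problem, one-dimensional inputs, a strictly positive acquisition function, and a Gaussian belief N(mu(x_j), sigma(x_j)^2) about f at each point already in the batch. If M is the maximum of f and L its Lipschitz constant, no point within distance (M - f(x_j)) / L of x_j can be the maximiser. Each chosen x_j therefore multiplies the acquisition by the probability that x lies outside that ball. All functions and numbers in the example (the toy acquisition, mu, sigma, L and M) are hypothetical.

```python
import math


def local_penalizer(x, xj, mu_j, sigma_j, L, M):
    """P(x lies outside the ball around xj that cannot contain the maximiser),
    with f(xj) ~ N(mu_j, sigma_j^2), i.e. P(L*|x - xj| - M + f(xj) > 0)."""
    z = (L * abs(x - xj) - M + mu_j) / math.sqrt(2.0 * sigma_j ** 2)
    return 0.5 * math.erfc(-z)


def select_batch(candidates, acq, mu, sigma, L, M, batch_size):
    """Greedy batch construction: each new point maximises the (positive)
    acquisition multiplied by the penalizers of the points already chosen."""
    batch = []
    for _ in range(batch_size):
        def penalised(x):
            value = acq(x)
            for xj in batch:
                value *= local_penalizer(x, xj, mu(xj), sigma(xj), L, M)
            return value
        batch.append(max(candidates, key=penalised))
    return batch


candidates = [i / 100 for i in range(101)]
batch = select_batch(candidates,
                     acq=lambda x: math.exp(-50 * (x - 0.5) ** 2),  # toy acquisition
                     mu=lambda x: 0.0, sigma=lambda x: 0.1,         # toy GP belief
                     L=5.0, M=0.5, batch_size=3)
```

Here the first point is the acquisition peak at 0.5. The later picks are pushed away from it instead of piling up at the same location. Because the penalizers are cheap closed-form factors, the batch is built without refitting the surrogate model between picks.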

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20150529T130000 DTEND;TZID=/Europe/London:20150529T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20150629120000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Manik Varma DESCRIPTION:Manik Varma (Microsoft Research India): Extreme Classification: A New Paradigm for Ranking & Recommendation\n\nLocation: Zoom\n\nLink: Malet Place Engineering Building 1.02\n\nAbstract:The objective in extreme multi-label classification is to learn a classifier that can automatically tag a data point with the most relevant subset of labels from a large label set. Extreme multi-label classification is an important research problem since not only does it enable the tackling of applications with many labels but it also allows the reformulation of ranking and recommendation problems with certain advantages over existing formulations.

Our objective, in this talk, is to develop an extreme multi-label classifier that is faster to train and more accurate at prediction than the state-of-the-art Multi-label Random Forest (MLRF) algorithm [Agrawal et al. WWW 13] and the Label Partitioning for Sub-linear Ranking (LPSR) algorithm [Weston et al. ICML 13]. MLRF and LPSR learn a hierarchy to deal with the large number of labels, but optimize task-independent measures, such as the Gini index or clustering error, in order to learn the hierarchy. Our proposed FastXML algorithm achieves significantly higher accuracies by directly optimizing an nDCG-based ranking loss function. We also develop an alternating minimization algorithm for efficiently optimizing the proposed formulation. Experiments reveal that FastXML can be trained on problems with more than a million labels on a standard desktop in eight hours using a single core, and in an hour using multiple cores.
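For reference, the nDCG@k measure that FastXML's ranking loss is based on can be computed as below for a single data point with binary label relevance. This shows only the metric, not how FastXML optimises it while learning its hierarchy. The function name is ours.

```python
import math


def ndcg_at_k(ranked_labels, relevant, k):
    """nDCG@k with binary relevance: DCG of the predicted ranking divided by
    the DCG of an ideal ranking that lists all relevant labels first."""
    # Position 0 gets discount 1/log2(2) = 1, position 1 gets 1/log2(3), ...
    dcg = sum(1.0 / math.log2(pos + 2)
              for pos, label in enumerate(ranked_labels[:k]) if label in relevant)
    ideal = sum(1.0 / math.log2(pos + 2) for pos in range(min(k, len(relevant))))
    return dcg / ideal if ideal > 0 else 0.0
```

For example, ranking `['a', 'b', 'c']` when only `{'a', 'c'}` are relevant gives DCG 1 + 1/2 = 1.5, against an ideal of 1 + 1/log2(3). Putting both relevant labels on top gives exactly 1.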

Brief Bio:

Manik Varma is a researcher at Microsoft Research India. Manik received a bachelor's degree in Physics from St. Stephen's College, University of Delhi in 1997 and another one in Computation from the University of Oxford in 2000 on a Rhodes Scholarship. He then stayed on at Oxford on a University Scholarship and obtained a DPhil in Engineering in 2004. Before joining Microsoft Research, he was a Post-Doctoral Fellow at the Mathematical Sciences Research Institute Berkeley. He has been an Adjunct Professor at the Indian Institute of Technology (IIT) Delhi in the Computer Science and Engineering Department since 2009 and jointly in the School of Information Technology since 2011. His research interests lie in the areas of machine learning, computational advertising and computer vision. He has served as an Area Chair for machine learning and computer vision conferences such as CVPR, ICCV, ICML and NIPS. He has been awarded the Microsoft Gold Star award and has won the PASCAL VOC Object Detection Challenge.

This talk gives an overview of discriminative and reconstructive classification methods for remote sensing images. In the first part, the most commonly used remote sensing sensors and their value for geoscientific applications are introduced. The second part of the presentation explains the discriminative and reconstructive model components of classifiers. While reconstructive methods are able to provide valuable posterior probabilities and are especially suitable for incremental/sequential learning, discriminative models mostly achieve a higher classification accuracy. This talk will present some advantages that arise when both components are combined and used for classification. Applications mainly focus on land-cover classification of multispectral and hyperspectral satellite images.

The presentation addresses joint works of the Remote Sensing and Geoinformatics Research Group of FU Berlin and the Institute of Geodesy and Geoinformation of University of Bonn.

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20150724T130000 DTEND;TZID=/Europe/London:20150724T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20150828130000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Elad Hazan DESCRIPTION:Elad Hazan (Princeton University): Classification with Low Rank and Missing Data\n\nLocation: Zoom\n\nLink: Roberts G08 (Sir David Davies lecture theatre)\n\nAbstract:We consider classification and regression tasks where we have missing data and assume that the (clean) data resides in a low-rank subspace. We describe an efficient algorithm with provable guarantees for this setting, as well as a general technique for circumventing computational hardness via non-reconstructive learning.

Based on joint work with Roi Livni and Yishay Mansour.

Bio:

Elad Hazan is researching the automation of the mechanism of learning and its efficient algorithmic implementation. He is a member of the faculty of Princeton University, Department of Computer Science.

Abstract:

The use of robots in our everyday life is hindered by the complexity necessary to design and tune appropriate controllers to execute the desired tasks.

In this talk, I will show how Bayesian modelling can help to substantially reduce such complexity by providing effective tools.

In the first part of my talk, I will discuss the learning of dynamical models required for accurate control and planning of the robot's movement, with a special emphasis on discontinuities deriving from contacts with the environment.

Next, I will discuss the use of Bayesian optimization to efficiently optimize the parameters of existing controllers. As a demonstration, I will present results obtained on a dynamic bipedal walker.

Short Bio:

Roberto Calandra is a PhD Candidate in the Autonomous Intelligent Systems Lab at TU Darmstadt, Germany. Previously, he received a B.Sc. in Computer Science with an emphasis on control from the University of Palermo, Italy, and an M.Sc. in Machine Learning and Data Mining from Aalto University (formerly Helsinki University of Technology), Finland.

His research interests lie at the convergence of robotics and machine learning.

Bayesian models are rooted in Bayesian statistics, and easily benefit from the vast literature in the field. In contrast, deep learning lacks a solid mathematical grounding. Instead, empirical developments in deep learning are often justified by metaphors, evading the unexplained principles at play. These two fields are perceived as fairly antipodal to each other in their respective communities. It is perhaps astonishing then that most modern deep learning models can be cast as performing approximate inference in a Bayesian setting. The implications of this statement are profound: we can use the rich Bayesian statistics literature with deep learning models, explain away many of the curiosities with these, combine results from deep learning into Bayesian modelling, and much more.

In this talk I will explore the new theory linking Bayesian modelling and deep learning. The practical impact of the framework will be demonstrated with a range of real-world applications: from uncertainty modelling in deep learning, through training on small datasets, to new state-of-the-art results in image processing. I will finish by surveying open problems to research, problems which stand at the forefront of a new and exciting field combining modern deep learning and Bayesian techniques.

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20151113T130000 DTEND;TZID=/Europe/London:20151113T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20151120130000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Shakir Mohamed DESCRIPTION:Shakir Mohamed (Google Deepmind): Memory-based Bayesian Reasoning with Deep Learning\n\nLocation: Zoom\n\nLink: Roberts G06 (Sir Ambrose Fleming lecture theatre)\n\nAbstract:Deep learning and Bayesian machine learning are currently two of the most active areas of machine learning research. Deep learning provides a powerful class of models and an easy framework for learning that now provides state-of-the-art methods for applications ranging from image classification to speech recognition. Bayesian reasoning provides a powerful approach for knowledge integration, inference, and decision making that has established it as the key tool for data-efficient learning, uncertainty quantification and robust model composition, widely used in applications ranging from information retrieval to large-scale ranking. Each of these research areas has shortcomings that can be effectively addressed by the other, pointing towards a needed convergence of these two areas of machine learning that would enhance our machine learning practice. One powerful outcome of this convergence is our ability to develop systems for probabilistic inference with memory. Memory-based inference amortises the cost of probabilistic reasoning by cleverly reusing prior computations. To explore this, we shall take a statistical tour of deep learning, re-examine latent variable models and approximate Bayesian inference, and make connections to denoising auto-encoders and other stochastic encoder-decoder systems. In this way, we will make sense of what memory in inference might mean, and highlight the use of amortised inference in many other parts of machine learning.

---- Bio ----

Shakir's research focuses on exploring and incorporating probabilistic reasoning in all aspects of machine learning, towards the goal of building principled, scalable and general-purpose probabilistic decision-making systems. His current research interests lie at the intersection of variational inference, deep learning and reinforcement learning. Shakir is a senior research scientist at Google DeepMind in London. Before moving to London, he held a junior research fellowship from the Canadian Institute for Advanced Research (CIFAR) as part of the programme on Neural Computation and Adaptive Perception, at the University of British Columbia with Nando de Freitas. He completed his PhD with Zoubin Ghahramani at the University of Cambridge, as a Commonwealth Scholar to the United Kingdom and a member of St John's College. He is from South Africa, and completed his prior degrees in Electrical and Information Engineering at the University of the Witwatersrand, Johannesburg.

Abstract:

Value functions are a core component of reinforcement learning (RL) systems. The main idea is to construct a single function approximator that estimates the long-term reward from any state. We introduce universal value function approximators (UVFAs) that generalise not just over states but also over goals. We develop an efficient technique for supervised learning of UVFAs, by factoring observed values into separate embedding vectors for state and goal, and then learning a mapping from state and goal to these factored embedding vectors. We show how this technique may be incorporated into a reinforcement learning algorithm that updates the UVFA solely from observed rewards. Finally, we demonstrate that a UVFA can successfully generalise to previously unseen goals, and can be scaled to complex RL problems such as learning to play Ms Pac-Man from pixels.
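The factorisation step described in the abstract can be sketched with a truncated SVD of a fully observed state-by-goal value matrix. This gives one embedding vector per state and one per goal, whose inner products reproduce the values. The sketch is a simplification for illustration. In practice the matrix may be only partially observed, and the second stage (learning mappings from states and goals to these embeddings) is not shown. The value matrix below is synthetic, and the function name is ours.

```python
import numpy as np


def factorise_values(V, rank):
    """Low-rank factorisation V ~= Phi @ Psi.T via truncated SVD.
    Rows of Phi are state embeddings; rows of Psi are goal embeddings."""
    U, s, Vt = np.linalg.svd(V, full_matrices=False)
    root = np.sqrt(s[:rank])          # split singular values between factors
    Phi = U[:, :rank] * root          # shape (n_states, rank)
    Psi = Vt[:rank].T * root          # shape (n_goals, rank)
    return Phi, Psi


# Synthetic rank-2 value table for 6 states and 4 goals.
rng = np.random.default_rng(0)
V = rng.normal(size=(6, 2)) @ rng.normal(size=(4, 2)).T
Phi, Psi = factorise_values(V, rank=2)
```

Each entry of `V` is recovered as `Phi[s] @ Psi[g]`. A learned model could then predict these embeddings for unseen states or goals and combine them with an inner product.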

Bio:

Tom Schaul is a senior researcher at Google DeepMind in London, interested in robust, general-purpose learning algorithms. He thinks that progress is possible on general AI, and that games are the perfect benchmark domain for that. Tom did his PhD with Jürgen Schmidhuber at IDSIA and his Postdoc with Yann LeCun at NYU. Since 2008, he has published 40 papers on reinforcement learning, neural networks, artificial curiosity, evolution and other optimization algorithms.

Talk 1: Stephen Pasteris will present

Online Prediction at the Limit of Zero Temperature

by Mark Herbster and Stephen Pasteris

Abstract:

We design an online algorithm to classify the vertices of a graph. Underpinning the algorithm is the probability distribution of an Ising model isomorphic to the graph. Each classification is based on predicting the label with maximum marginal probability in the limit of zero-temperature with respect to the labels and vertices seen so far. Computing these classifications is unfortunately based on a\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20151204T130000 DTEND;TZID=/Europe/London:20151204T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20160219130000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Thore Graepel DESCRIPTION:Thore Graepel (DeepMind, University College London): DeepMind's Quest for Artificial General Intelligence: From Atari to AlphaGo and beyond\n\nLocation: Zoom\n\nLink: Roberts G06 Sir Ambrose Fleming LT\n\nAbstract:

Abstract [NOTE slight change in the program]:

I will give an overview of the ambitious research program at DeepMind, including some of our latest advances. I will also discuss some of the key challenges we are currently tackling in the quest to build Artificial General Intelligence, and the approaches we are taking to solve them. One example will be our new approach to computer Go, which combines Monte-Carlo tree search with deep neural networks, resulting in AlphaGo, the first computer program to defeat a human professional Go player.

Bio:

Thore Graepel is a senior researcher at Google DeepMind and an affiliated professor in Machine Learning at University College London. He is broadly interested in understanding intelligence and building it into general artificial systems. Throughout his career, he has made many outstanding contributions to the field of machine learning & probabilistic modelling, most prominently in the areas of recommender and ranking systems, social analytics, probabilistic programming, online advertising and the development of game AI. He was previously a Principal Researcher at Microsoft Research Cambridge and head of the research group on Online Services and Advertising. Before joining Microsoft, he received his PhD from TU Berlin and conducted post-doctoral research at ETH Zurich and Royal Holloway, University of London.

Gaussian process models are widely used in statistics and machine learning. There are three key challenges to inference that might be tackled using variational methods: inference over the latent function values when the likelihood is non-Gaussian; scaling the computation to large datasets; and inference over the kernel parameters. I’ll show how the variational framework can be used to tackle all of these. In particular, I’ll share recent insights which allow us to interpret the approximation in an elegant and straightforward way, using variational Bayes over stochastic processes. Finally, I’ll outline how this technology can be used to help tackle contemporary problems in biostatistics.

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20160304T130000 DTEND;TZID=/Europe/London:20160304T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20160311130000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Francois-Xavier Briol DESCRIPTION:Francois-Xavier Briol (University of Warwick): Probabilistic Numerics Approaches to Integration\n\nLocation: Zoom\n\nLink: Roberts G08 Sir David Davies LT (TBC)\n\nAbstract:Probabilistic numerical methods aim to model numerical error as a source of epistemic uncertainty that is subject to probabilistic analysis and reasoning, enabling the principled propagation of numerical uncertainty through a computational pipeline. This talk will present probabilistic numerical integrators based on Markov chain and Quasi Monte Carlo and prove asymptotic results on the coverage of the associated probability models for numerical integration error. The performance of probabilistic integrators is guaranteed to be no worse than non-probabilistic integrators and is, in many cases, asymptotically superior. These probabilistic integrators therefore enjoy the "best of both worlds", leveraging the sampling efficiency of advanced Monte Carlo methods whilst being equipped with valid probabilistic models for uncertainty quantification. Several applications and illustrations will be provided, including examples from computer vision and system modelling using non-linear differential equations.
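Probabilistic integrators of this kind are often built on Bayesian quadrature. The sketch below makes several assumptions: an RBF kernel, the uniform measure on [0, 1] (so the kernel mean has a closed form in terms of `erf`), and equally spaced nodes in place of the Markov chain or QMC point sets discussed in the talk. The posterior over the integral is Gaussian, and the function returns its mean and variance.

```python
import numpy as np
from scipy.special import erf

def bq_uniform(f, nodes, lengthscale=0.2, jitter=1e-10):
    # Bayesian quadrature for the integral of f over [0, 1] under a GP prior with an RBF kernel.
    ell = lengthscale
    d = nodes[:, None] - nodes[None, :]
    K = np.exp(-0.5 * (d / ell) ** 2) + jitter * np.eye(len(nodes))
    s = np.sqrt(2.0) * ell
    # Kernel mean z_i = integral of k(x, nodes_i) over x in [0, 1].
    z = ell * np.sqrt(np.pi / 2.0) * (erf((1.0 - nodes) / s) + erf(nodes / s))
    # Double integral of k(x, x') over [0, 1]^2.
    zz = ell * np.sqrt(2.0 * np.pi) * erf(1.0 / s) + 2.0 * ell**2 * (np.exp(-1.0 / (2.0 * ell**2)) - 1.0)
    weights = np.linalg.solve(K, z)          # quadrature weights K^{-1} z
    mean = weights @ f(nodes)                # posterior mean of the integral
    var = zz - z @ weights                   # posterior variance of the integral
    return mean, var

bq_mean, bq_var = bq_uniform(lambda x: np.sin(np.pi * x), np.linspace(0.0, 1.0, 10))
# The exact integral of sin(pi x) over [0, 1] is 2 / pi.
```

Few nodes and a moderate lengthscale keep the Gram matrix well conditioned. Packing many more nodes at this lengthscale would make `K` numerically singular.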

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20160311T130000 DTEND;TZID=/Europe/London:20160311T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20160318130000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Emtiyaz Khan DESCRIPTION:Emtiyaz Khan (EPFL): Approximate Bayesian Inference: Bringing Statistics, Optimization, and Machine Learning Together.\n\nLocation: Zoom\n\nLink: Roberts G08 Sir David Davies LT (TBC)\n\nAbstract:

Machine learning relies heavily on data to design computers that can learn autonomously, but dealing with noisy, unreliable, heterogeneous, high-dimensional, and missing data is a big challenge in itself. Surprisingly, living beings, even young ones, are very good at dealing with such data. This raises the question: how do they do it, and how can we design computers that can learn like them?

Bayesian methods are promising in answering such questions, but they are computationally challenging, especially when data are large and models are complex. In this talk, I will start by showing a few example applications where this is the case. I will then discuss my work which solves many computational challenges associated with Bayesian methods by converting the "Bayesian integration" problem into an optimization problem. I will outline some of my future plans to design linear-time algorithms for Bayesian inference. Overall, I will argue that, by combining ideas from statistics, optimization, and machine learning, we might be able to design computers that can learn autonomously, just like us.

A short biography:

Mohammad Emtiyaz Khan is a scientist in the School of Computer and Communication Sciences at the École polytechnique fédérale de Lausanne (EPFL). He obtained his PhD from the University of British Columbia (UBC) in 2012 under the supervision of Dr. Kevin Murphy, and later worked as a post-doctoral fellow at EPFL under Dr. Matthias Seeger. His research lies at the intersection of machine learning, statistics, and optimization, and their applications to a wide variety of areas such as health, education, sensor networks, social networks, and biomedicine. He has been actively involved in teaching, especially large courses in machine learning, for which he has received several teaching awards and prizes.

Abstract:

The game of Go has long been viewed as the most challenging of classic games for artificial intelligence owing to its enormous search space and the difficulty of evaluating board positions and moves. Here we introduce a new approach to computer Go that uses ‘value networks’ to evaluate board positions and ‘policy networks’ to select moves. These deep neural networks are trained by a novel combination of supervised learning from human expert games, and reinforcement learning from games of self-play. We also introduce a new search algorithm that combines Monte Carlo simulation with value and policy networks. Using this search algorithm, our program AlphaGo achieved a 99.8% winning rate against other Go programs, and defeated the human European Go champion by 5 games to 0. This is the first time that a computer program has defeated a human professional player in the full-sized game of Go, a feat previously thought to be at least a decade away.

NOTE: This will include the upcoming results and a discussion of the match in Seoul against world champion Lee Sedol!
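The search combines policy-network priors with visit statistics when choosing which move to explore next. Below is a sketch of a PUCT-style selection rule of the kind described for AlphaGo, which picks the action maximising Q(s,a) + c·P(s,a)·sqrt(Σ_b N(s,b)) / (1 + N(s,a)). The constant `c_puct` and the node statistics are hypothetical.

```python
import numpy as np

def puct_select(prior, visit_count, mean_value, c_puct=1.0):
    # Exploration bonus is large for actions the policy network favours
    # but the search has rarely visited; it shrinks as visits accumulate.
    prior, visit_count, mean_value = map(np.asarray, (prior, visit_count, mean_value))
    bonus = c_puct * prior * np.sqrt(visit_count.sum()) / (1.0 + visit_count)
    return int(np.argmax(mean_value + bonus))

# Hypothetical node: action 0 has been searched often, actions 1 and 2 not at all.
choice = puct_select(prior=[0.2, 0.5, 0.3], visit_count=[10, 0, 0], mean_value=[0.1, 0.0, 0.0])
# choice == 1: the unvisited action with the highest prior wins the bonus.
```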

In 1972, Charles Stein published a central limit theorem for correlated variables. The mathematical approach used in the proof has since become known as Stein’s Method. This talk provides an introduction to Stein’s Method and describes a formal generalisation, based on Stein Operators. A characterisation of the action of Stein Operators on Hilbert spaces offers considerable potential for applications in kernel-based machine learning. One such application is presented, in the context of numerical integration for Bayesian posterior computation.
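Stein's method rests on operators whose expectation vanishes exactly under the target distribution. For the standard normal, the Langevin-Stein operator is (A f)(z) = f'(z) - z f(z), and E[(A f)(Z)] = 0 for suitable test functions f. The small Monte Carlo check below uses an illustrative test function and sample size. The expectation is close to zero under N(0, 1) but not under a uniform distribution on [-2, 2], where it equals cos 2. This is the characterisation property the talk builds on.

```python
import numpy as np

def stein_operator(f, df, z):
    # Langevin-Stein operator for the standard normal: (A f)(z) = f'(z) - z * f(z).
    return df(z) - z * f(z)

rng = np.random.default_rng(0)
normal_samples = rng.standard_normal(1_000_000)
uniform_samples = rng.uniform(-2.0, 2.0, 1_000_000)

# Test function f = sin, whose derivative is cos.
under_normal = stein_operator(np.sin, np.cos, normal_samples).mean()    # close to 0
under_uniform = stein_operator(np.sin, np.cos, uniform_samples).mean()  # close to cos(2)
```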

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20160415T130000 DTEND;TZID=/Europe/London:20160415T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20160422130000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Ingmar Schuster DESCRIPTION:Ingmar Schuster (Université Paris-Dauphine): Kernel Sequential Monte Carlo\n\nLocation: Zoom\n\nLink: Roberts G08 Sir David Davies LT\n\nAbstract:Bayesian posterior inference with Monte Carlo methods has a fundamental role in statistics and probabilistic machine learning. Target posterior distributions arising in increasingly complex models often exhibit high degrees of nonlinearity and multimodality and pose substantial challenges to traditional samplers.

We propose the Kernel Sequential Monte Carlo (KSMC) framework for building emulator models of the current particle system in a Reproducing Kernel Hilbert Space and use the emulator's geometry to inform local proposals. KSMC is applicable when gradients are unknown or prohibitively expensive and inherits the superior performance of SMC on multi-modal targets and its ability to estimate model evidence. Strengths of the proposed methodology are demonstrated on a series of challenging synthetic and real-world examples.

Joint work with Heiko Strathmann, Brooks Paige, Dino Sejdinovic.

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20160422T130000 DTEND;TZID=/Europe/London:20160422T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20160506130000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Ted Meeds DESCRIPTION:Ted Meeds (University of Amsterdam): Likelihood-free Inference by Controlling Simulator Noise\n\nLocation: Zoom\n\nLink: Roberts G08 Sir David Davies LT\n\nAbstract:Likelihood-free inference, or approximate Bayesian computation (ABC), is a general framework for performing Bayesian inference in simulation-based science. In this talk I will discuss two new approaches to likelihood-free inference that involve explicit control over a simulation’s randomness. By rewriting simulation code with two sets of arguments, the simulation parameters and its random numbers, many algorithmic options open up. The first approach, called Optimisation Monte Carlo, is an algorithm that efficiently and independently samples parameters from the posterior by first sampling a set of random numbers from a prior distribution, then running an optimisation algorithm (with the random numbers held fixed) to match simulation statistics with observed statistics. The second approach is recent and ongoing research on a variational ABC algorithm written in an auto-differentiation language, which allows the gradients of the variational parameters to be computed through the simulation code itself.
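Here is a toy sketch of the Optimisation Monte Carlo idea. It assumes a hypothetical one-dimensional simulator `simulator(theta, u) = theta + u` with standard-normal noise `u` and a Gaussian prior. For each frozen draw of `u`, the parameter is optimised so that the simulated statistic matches the observation. The result is then weighted by the prior density divided by the absolute Jacobian of the simulator with respect to the parameter. This is a simplified scalar version for illustration, not the authors' implementation.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import norm

def simulator(theta, u):
    # Hypothetical toy simulator: summary statistic = parameter + fixed noise u.
    return theta + u

def optimisation_monte_carlo(y_obs, n_samples=2000, prior_sd=2.0, seed=0):
    rng = np.random.default_rng(seed)
    thetas, weights = [], []
    for _ in range(n_samples):
        u = rng.standard_normal()  # draw the simulator's random numbers once, then freeze them
        # Optimise theta so that the simulated statistic matches the observed one.
        theta = minimize_scalar(lambda t: (simulator(t, u) - y_obs) ** 2).x
        # Weight: prior density over |d simulator / d theta| at the optimum (finite difference).
        h = 1e-5
        jac = (simulator(theta + h, u) - simulator(theta - h, u)) / (2 * h)
        thetas.append(theta)
        weights.append(norm.pdf(theta, loc=0.0, scale=prior_sd) / abs(jac))
    w = np.asarray(weights)
    return np.asarray(thetas), w / w.sum()

thetas, w = optimisation_monte_carlo(y_obs=1.5)
post_mean = float(np.sum(w * thetas))
# For this conjugate toy model the exact posterior mean is 1.5 * 4 / 5 = 1.2.
```

Each sample is produced independently of the others, so the loop parallelises trivially. This is one of the practical attractions of the approach described above.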

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20160506T130000 DTEND;TZID=/Europe/London:20160506T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20160601130000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Yee-Whye Teh DESCRIPTION:Yee-Whye Teh (University of Oxford): Distributed Bayesian Learning\n\nLocation: Zoom\n\nLink: Roberts G06 Sir Ambrose Fleming LT\n\nAbstract:Full title: Distributed Bayesian Learning with Stochastic Natural-gradient Expectation Propagation and the Posterior Server

We make two contributions to Bayesian machine learning algorithms. Firstly, we propose stochastic natural gradient expectation propagation (SNEP), a novel alternative to expectation propagation (EP), a popular variational inference algorithm. SNEP is a black box variational algorithm, in that it does not require any simplifying assumptions on the distribution of interest beyond the existence of some Monte Carlo sampler for estimating the moments of the EP tilted distributions. Further, unlike EP, which has no guarantee of convergence, SNEP can be shown to be convergent, even when using Monte Carlo moment estimates.

Secondly, we propose a novel architecture for distributed Bayesian learning which we call the posterior server. The posterior server allows scalable and robust Bayesian learning in cases where a dataset is stored in a distributed manner across a cluster, with each compute node containing a disjoint subset of data. An independent Markov chain Monte Carlo (MCMC) sampler is run on each compute node, with direct access only to the local data subset, but which targets an approximation to the global posterior distribution given all data across the whole cluster. This is achieved by using a distributed asynchronous implementation of SNEP to pass messages across the cluster. We demonstrate SNEP and the posterior server on distributed Bayesian learning of logistic regression and neural networks.

Authors: Yee Whye Teh, Leonard Hasenclever, Thibaut Lienart, Sebastian Vollmer, Stefan Webb, Balaji Lakshminarayanan, Charles Blundell

Abstract:

In vision and machine learning, from 3D reconstruction to recommender systems, it is common to see optimization problems of the form

$\min_x \sum_i \min_u f_i(x,u)$

There are a few main strategies for minimizing these problems: block coordinate descent (a.k.a. alternation, “EM-style”, or ICP), joint optimization (a.k.a. lifting or bundle-style), variable projection (VarPro), and the various SGD techniques. For years I have been using lifting to great effect, and I will show examples where it dramatically improves convergence rates and wall-clock speed. Recently, new light has been cast on these alternatives, and I will show examples where VarPro wins hands down. Ultimately, I’ll try to give intuitions that allow you to know into which case your problem falls and when it matters; that is, when it’s important to use the more advanced strategies rather than ICP or SGD.

Joint work with John Hong, Cambridge University, and many others.
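Here is a small illustration of the structure $\min_x \sum_i \min_u f_i(x,u)$. It assumes a hypothetical separable problem in which several signals share one decay rate `x` but each has its own amplitude `u_i`, so $f_i(x, u_i) = \|y_i - u_i e^{-x t}\|^2$. Variable projection eliminates each $u_i$ in closed form and then optimises the reduced cost over `x` alone. Alternation instead switches between the two blocks. The data, search bounds and iteration count are illustrative only.

```python
import numpy as np
from scipy.optimize import minimize_scalar

t = np.linspace(0.0, 4.0, 30)   # shared sample times (hypothetical)
BOUNDS = (0.01, 5.0)            # search interval for the shared decay rate x

def basis(x):
    return np.exp(-x * t)

def best_u(x, Y):
    # Closed-form inner minimum: optimal amplitude u_i for every row y_i, given x.
    a = basis(x)
    return Y @ a / (a @ a)

def total_cost(x, U, Y):
    # sum_i f_i(x, u_i) with f_i = ||y_i - u_i * exp(-x t)||^2
    return float(np.sum((Y - np.outer(U, basis(x))) ** 2))

def varpro(Y):
    # Variable projection: minimise the reduced cost with every u_i eliminated.
    res = minimize_scalar(lambda x: total_cost(x, best_u(x, Y), Y), bounds=BOUNDS, method="bounded")
    return res.x, best_u(res.x, Y)

def alternation(Y, x0=0.2, iters=50):
    # Block coordinate descent: all u_i with x fixed, then x with all u_i fixed.
    x = x0
    for _ in range(iters):
        U = best_u(x, Y)
        x = minimize_scalar(lambda z: total_cost(z, U, Y), bounds=BOUNDS, method="bounded").x
    return x, best_u(x, Y)

x_true, U_true = 1.3, np.array([1.0, -2.0, 0.5])
Y = np.outer(U_true, basis(x_true))        # noiseless synthetic data
x_vp, U_vp = varpro(Y)
x_alt, U_alt = alternation(Y)
```

On this toy problem the inner minimisation is cheap and exact, which is the situation where projecting it out tends to pay off.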

ABSTRACT:

Performing inference over large uncertain data sets is becoming a central data management problem. Recent large knowledge bases, such as Yago, Nell or DeepDive, have millions to billions of uncertain tuples. Because general reasoning under uncertainty is highly intractable, many state-of-the-art systems today perform approximate inference by resorting to sampling. This talk shows an alternative approach that allows ranking answers to hard probabilistic queries in guaranteed polynomial time, using only basic operators of existing database management systems (i.e. no sampling required).

(1) The first part of this talk develops upper and lower bounds for the probability of Boolean functions by treating multiple occurrences of variables as independent and assigning them new individual probabilities. We call this approach dissociation and give an exact characterization of optimal oblivious bounds, i.e. bounds in which the new probabilities are chosen independently of the probabilities of all other variables. Our new bounds shed light on the connection between previous relaxation-based and model-based approximations and unify them as concrete choices in a larger design space.
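A brute-force check of the dissociation idea on one small formula, chosen for illustration: in $\varphi = (x \wedge y) \vee (x \wedge z)$ the variable x occurs twice, and replacing its occurrences by independent copies x1 and x2 gives a formula whose probability brackets the original. For this particular example, giving both copies probability p yields an upper bound, and choosing $(1-p_1)(1-p_2) = 1-p$ yields a lower bound. The code verifies this numerically rather than restating the general theorem from the talk.

```python
from itertools import product
from math import sqrt

def prob(formula, probs):
    # Exact probability of a Boolean formula over independent variables, by enumeration.
    names = list(probs)
    total = 0.0
    for bits in product([False, True], repeat=len(names)):
        world = dict(zip(names, bits))
        if formula(world):
            weight = 1.0
            for name, value in world.items():
                weight *= probs[name] if value else 1.0 - probs[name]
            total += weight
    return total

# Original formula: x occurs twice, so its two occurrences are perfectly correlated.
f = lambda w: (w["x"] and w["y"]) or (w["x"] and w["z"])
# Dissociated formula: each occurrence of x gets its own independent copy.
f_diss = lambda w: (w["x1"] and w["y"]) or (w["x2"] and w["z"])

p, q, r = 0.6, 0.5, 0.7
exact = prob(f, {"x": p, "y": q, "z": r})                      # 0.6 * (1 - 0.5 * 0.3) = 0.51
upper = prob(f_diss, {"x1": p, "x2": p, "y": q, "z": r})       # 1 - 0.7 * 0.58 = 0.594
p_low = 1.0 - sqrt(1.0 - p)                                     # so that (1 - p1)(1 - p2) = 1 - p
lower = prob(f_diss, {"x1": p_low, "x2": p_low, "y": q, "z": r})
```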

(2) The second part then draws the connection to lifted inference and shows how the problem of approximate probabilistic inference can be entirely reduced to a standard query evaluation problem with aggregates. There are no iterations and no exponential blow-ups. All benefits of relational engines (such as cost-based optimizations, multi-core query processing, shared-nothing parallelization) are directly available to queries over probabilistic databases. To achieve this, we compute approximate rather than exact probabilities, with a one-sided guarantee: The probabilities are guaranteed to be upper bounds to the true probabilities, which we show is sufficient to rank the top query answers with high precision. We give experimental evidence on synthetic TPC-H data that this approach can be orders of magnitude faster and also more accurate than sampling-based approaches.
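Reusing relational engines can be illustrated with an aggregate that computes the probability that at least one of several independent derivations holds, $1 - \prod_j (1 - p_j)$. The minimal sketch below uses Python's `sqlite3` module and its `create_aggregate` hook. The table schema and probabilities are hypothetical, derivations are assumed independent, and this is not the talk's full query-rewriting scheme.

```python
import sqlite3

class IndependentOr:
    """SQLite aggregate: probability that at least one of several independent events holds."""
    def __init__(self):
        self.prob_none = 1.0
    def step(self, p):
        self.prob_none *= 1.0 - p
    def finalize(self):
        return 1.0 - self.prob_none

def answer_probabilities(rows):
    # rows: (answer, p) pairs, one per independent derivation of that answer (hypothetical schema).
    conn = sqlite3.connect(":memory:")
    conn.create_aggregate("ior", 1, IndependentOr)
    conn.execute("CREATE TABLE derivations (answer TEXT, p REAL)")
    conn.executemany("INSERT INTO derivations VALUES (?, ?)", rows)
    ranked = conn.execute(
        "SELECT answer, ior(p) AS prob FROM derivations GROUP BY answer ORDER BY prob DESC"
    ).fetchall()
    conn.close()
    return ranked

ranked = answer_probabilities([("a", 0.5), ("a", 0.5), ("b", 0.2), ("c", 0.6)])
# Answer "a" has two derivations: 1 - 0.5 * 0.5 = 0.75, so it ranks above "c" (0.6) and "b" (0.2).
```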

(Talk based on joint work with Dan Suciu from TODS 2014 and VLDB 2015: http://arxiv.org/pdf/1409.6052, http://arxiv.org/pdf/1412.1069)

Abstract:

Over the last few years Convolutional Neural Networks (CNNs) have been shown to deliver excellent results in a broad range of low- and high-level vision tasks, spanning effectively the whole spectrum of computer vision problems.

In this talk we will present recent research progress along two complementary directions.

In the first part we will present research efforts on integrating established computer vision ideas with CNNs, thereby allowing us to incorporate task-specific domain knowledge in CNNs. We will present CNN-based adaptations of structured prediction techniques that use discrete (DenseCRF - Deeplab) and continuous energy-based formulations (Deep Gaussian CRF), and will also present methods to incorporate ideas from multi-scale processing, Multiple-Instance Learning and Spectral Clustering into CNNs.

In the second part of the talk we will turn to designing a generic architecture that can tackle a multitude of tasks jointly, aiming at a ‘Swiss army knife’ for computer vision. We call this network an ‘UberNet’ to underline its overarching nature. We will introduce techniques that allow us to train an UberNet using datasets with diverse annotations while also handling the memory limitations of current hardware. The proposed architecture is able to jointly address (a) boundary detection, (b) saliency detection, (c) normal estimation, (d) semantic segmentation, (e) human part segmentation, (f) human boundary detection, and (g) region proposal generation and object detection in 0.7 seconds per frame, with a level of performance that is comparable to the current state of the art on these tasks.
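One simple way to train a single multi-task network on datasets that each annotate only some tasks is to mask out the loss of any task that lacks ground truth for a given example. The sketch below assumes this masking scheme for illustration. It is not necessarily the exact procedure used for UberNet, which also addresses memory limits.

```python
import numpy as np

def masked_multitask_loss(per_task_losses, has_label):
    # per_task_losses: (batch, tasks) array of losses; has_label: (batch, tasks) boolean mask.
    mask = has_label.astype(float)
    counts = mask.sum(axis=0)
    # Average each task's loss over labelled examples only; unlabelled tasks contribute 0.
    per_task = (per_task_losses * mask).sum(axis=0) / np.maximum(counts, 1.0)
    return float(per_task.sum())

losses = np.array([[1.0, 100.0], [3.0, 100.0]])
has_label = np.array([[True, False], [True, False]])
total = masked_multitask_loss(losses, has_label)   # 2.0: task 1 has no labels in this batch
```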

Links:

UberNet demo:

http://cvn.ecp.fr/ubernet/

Deeplab:

https://bitbucket.org/aquariusjay/deeplab-public-ver2

Boundary Detection:

http://cvn.ecp.fr/personnel/iasonas/deepboundaries.html

References:

I. Kokkinos, UberNet: Training a ‘Universal’ CNN for Low-, Mid-, and High- Level Vision using Diverse Datasets and Limited Memory, arxiv, 2016

S. Chandra and I. Kokkinos, Fast, Exact and Multi-Scale Inference for Semantic Image Segmentation with Deep Gaussian CRFs, Proc. European Conf. on Computer Vision (ECCV), 2016

L.-C. Chen, G. Papandreou, I. Kokkinos, K. Murphy, A. L. Yuille, DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs v1: ICLR 2015, v2: arxiv, 2016

I. Kokkinos, Pushing the Boundaries of Boundary Detection using Deep Learning, Int. Conf. on Learning Representations (ICLR), 2016.

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20161019T130000 DTEND;TZID=/Europe/London:20161019T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20161026130000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Hrishi Aradhye DESCRIPTION:Hrishi Aradhye (Google Research/ Google Play): Personalized app/games recommendations on Google Play using machine learning\n\nLocation: Zoom\n\nLink: Roberts Building G06 Sir Ambrose Fleming LT\n\nAbstract:This informal, loosely technical talk will give a quick overview of personalization and discovery on the Play Store and our recent achievements and focus. Google Play is one of the most used mobile applications today with over one billion active users. We are actively engaged with research collaborations towards better personalized recommendations using scalable machine learning. I will briefly discuss a new ML paradigm called 'Wide & Deep learning'—jointly trained wide linear models and deep neural networks—to combine the benefits of memorization and generalization for recommender systems. Online experiment results show that Wide & Deep significantly increased app acquisitions compared with wide-only and deep-only models.
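Here is a forward-pass sketch of the wide & deep idea. A linear model over raw and cross-product features is summed with a deep network's output before a single sigmoid, so both parts can be trained jointly. Layer sizes, feature pairs and the random parameters are all hypothetical, and the training loop is omitted.

```python
import numpy as np

def sigmoid(t):
    return 1.0 / (1.0 + np.exp(-t))

def cross_product(binary, pairs):
    # Cross-product transformation: 1 only if both binary features in a pair are 1.
    return np.stack([binary[:, i] * binary[:, j] for i, j in pairs], axis=1)

def wide_and_deep(binary, dense, params, pairs):
    # Wide part: linear model over raw binary features plus their crosses (memorisation).
    wide_in = np.concatenate([binary, cross_product(binary, pairs)], axis=1)
    wide_logit = wide_in @ params["w_wide"]
    # Deep part: one ReLU hidden layer over dense features (generalisation).
    hidden = np.maximum(0.0, dense @ params["W1"] + params["b1"])
    deep_logit = hidden @ params["w_deep"]
    # Joint model: both logits are summed before one sigmoid, so gradients reach both parts.
    return sigmoid(wide_logit + deep_logit + params["bias"])

rng = np.random.default_rng(0)
binary = rng.integers(0, 2, size=(4, 3)).astype(float)
dense = rng.normal(size=(4, 5))
pairs = [(0, 1), (1, 2)]
params = {
    "w_wide": rng.normal(size=5),      # 3 raw + 2 crossed features
    "W1": rng.normal(size=(5, 8)),
    "b1": np.zeros(8),
    "w_deep": rng.normal(size=8),
    "bias": 0.1,
}
probs = wide_and_deep(binary, dense, params, pairs)
```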

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20161026T130000 DTEND;TZID=/Europe/London:20161026T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20161028130000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Shakir Mohamed DESCRIPTION:Shakir Mohamed (Google DeepMind): Building Machines that Imagine and Reason: Principles and Applications of Deep Generative Models\n\nLocation: Zoom\n\nLink: Roberts Building 508\n\nAbstract:Deep generative models provide a solution to the problem of unsupervised learning, in which a machine learning system is required to discover the structure hidden within unlabelled data streams. Because they are generative, such models can form a rich imagery of the world in which they are used: an imagination that can be harnessed to explore variations in data, to reason about the structure and behaviour of the world, and ultimately, for decision-making. This tutorial looks at how we can build machine learning systems with a capacity for imagination using deep generative models, the types of probabilistic reasoning that they make possible, and the ways in which they can be used for decision making and acting.

Deep generative models have widespread applications including those in density estimation, image de-noising and in-painting, data compression, scene understanding, representation learning, 3D scene construction, semi-supervised classification, and hierarchical control, amongst many others. After exploring these applications, we'll sketch a landscape of generative models, drawing out three groups of models: fully-observed models, transformation models, and latent variable models. Different models require different principles for inference and we'll explore the different options available. Different combinations of model and inference give rise to different algorithms, including auto-regressive distribution estimators, variational auto-encoders, and generative adversarial networks. Although we will emphasise deep generative models, and the latent-variable class in particular, the intention of the tutorial will be to explore the general principles, tools and tricks that can be used throughout machine learning. These reusable topics include Bayesian deep learning, variational approximations, memoryless and amortised inference, and stochastic gradient estimation. We'll end by highlighting the topics that were not discussed, and imagine the future of generative models.
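Stochastic gradient estimation is one of the reusable tools listed above. The sketch below compares the score-function (likelihood-ratio) and reparameterisation estimators of $\nabla_\mu \mathbb{E}_{z \sim \mathcal{N}(\mu,1)}[z^2]$, whose true value is $2\mu$. The toy objective and sample sizes are assumptions. Both estimators are unbiased, but on this example the reparameterisation estimator has far lower variance.

```python
import numpy as np

def f(z):
    return z ** 2          # toy objective; the exact gradient of E[z^2] w.r.t. mu is 2 * mu

def score_function_samples(mu, n, rng):
    # Likelihood-ratio estimator: f(z) * d/dmu log N(z | mu, 1) = f(z) * (z - mu).
    z = rng.normal(mu, 1.0, size=n)
    return f(z) * (z - mu)

def reparam_samples(mu, n, rng):
    # Pathwise estimator: z = mu + eps, so d/dmu f(z) = f'(z) = 2 z.
    eps = rng.standard_normal(n)
    return 2.0 * (mu + eps)

rng = np.random.default_rng(0)
mu = 1.5
sf = score_function_samples(mu, 200_000, rng)
rp = reparam_samples(mu, 200_000, rng)
# Both sample means should be near 2 * mu = 3.0; compare sf.var() with rp.var().
```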

In this work, we take a fresh look at some old and new algorithms for off-policy, return-based reinforcement learning. Expressing these in common form, we derive a novel algorithm, Retrace(λ), with three desired properties: (1) low variance; (2) safety, as it safely uses samples collected from any behaviour policy, whatever its degree of "off-policyness"; and (3) efficiency, as it makes the best use of samples collected from near on-policy behaviour policies. We analyse the contractive nature of the related operator under both off-policy policy evaluation and control settings and derive online sample-based algorithms. To our knowledge, this is the first return-based off-policy control algorithm converging a.s. to Q* without the GLIE assumption (Greedy in the Limit with Infinite Exploration). As a corollary, we prove the convergence of Watkins' Q(λ), which was still an open problem. We illustrate the benefits of Retrace(λ) on a standard suite of Atari 2600 games.
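The Retrace(λ) update for a state-action pair can be written as $\Delta Q(x_t,a_t) = \sum_{s \ge t} \gamma^{s-t} \big(\prod_{i=t+1}^{s} c_i\big)\,[r_s + \gamma\, \mathbb{E}_\pi Q(x_{s+1},\cdot) - Q(x_s,a_s)]$, with truncated traces $c_i = \lambda \min(1, \pi(a_i|x_i)/\mu(a_i|x_i))$. The minimal sketch below computes these corrections for one finite trajectory using a backward recursion. The array layout is an assumption.

```python
import numpy as np

def retrace_corrections(rewards, q_taken, v_next, ratios, gamma=0.99, lam=1.0):
    """Retrace(lambda) corrections Delta Q(x_t, a_t) for one finite trajectory.

    rewards[s] : r_s
    q_taken[s] : Q(x_s, a_s)
    v_next[s]  : E_{a ~ pi} Q(x_{s+1}, a)  (0 after a terminal state)
    ratios[s]  : pi(a_s | x_s) / mu(a_s | x_s)
    """
    rewards, q_taken, v_next, ratios = map(np.asarray, (rewards, q_taken, v_next, ratios))
    T = len(rewards)
    c = lam * np.minimum(1.0, ratios)              # truncated importance weights c_s
    delta = rewards + gamma * v_next - q_taken     # one-step TD errors under pi
    out = np.zeros(T)
    acc = 0.0
    for t in reversed(range(T)):
        c_next = c[t + 1] if t + 1 < T else 0.0
        acc = delta[t] + gamma * c_next * acc      # Delta_t = delta_t + gamma * c_{t+1} * Delta_{t+1}
        out[t] = acc
    return out

# On-policy (ratios = 1), lambda = 1, Q = 0: corrections are discounted reward sums.
on_policy = retrace_corrections([1.0, 2.0, 3.0], [0, 0, 0], [0, 0, 0], [1.0, 1.0, 1.0], gamma=0.5)
# lambda = 0 cuts every trace, leaving plain one-step TD errors.
one_step = retrace_corrections([1.0, 0.0], [0.5, 0.2], [0.3, 0.0], [2.0, 0.1], gamma=0.9, lam=0.0)
```

Truncating each ratio at 1 is what keeps the product of traces from exploding when the behaviour policy is far from the target policy.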

Bio: Remi Munos is currently a research scientist at Google DeepMind, on leave from Inria. He has worked on topics related to reinforcement learning, bandit theory, optimisation, and statistical learning.

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20161104T130000 DTEND;TZID=/Europe/London:20161104T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20161111130000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Daniel Tarlow DESCRIPTION:Daniel Tarlow (Microsoft Research Cambridge): Learning to Code: Machine Learning for Program Induction\n\nLocation: Zoom\n\nLink: Roberts Building 508\n\nAbstract:I'll present two of our recent works on using machine learning to induce computer programs from input-output examples. The first system is TerpreT, which enables comparison of machine learning-based program synthesis techniques to programming languages (PL)-based techniques. Based on our learnings from TerpreT, we develop the second system, DeepCoder, which induces programs from input-output examples using a neural network to guide PL-based search techniques. DeepCoder achieves an order of magnitude speedup over optimized search-based techniques, and it can solve problems of difficulty comparable to the simplest problems on programming competition websites.
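Here is a toy sketch of neural-guided enumerative search in the spirit of DeepCoder. Programs are compositions of functions from a tiny list DSL, and candidates are tried in order of the product of per-function probabilities. Those probabilities are hard-coded stand-ins for a network's predictions. The DSL, the probabilities and the input-output examples are all hypothetical.

```python
from itertools import product

# Hypothetical mini-DSL of list-to-list functions.
DSL = {
    "sort": sorted,
    "reverse": lambda xs: list(reversed(xs)),
    "double": lambda xs: [2 * x for x in xs],
    "drop_neg": lambda xs: [x for x in xs if x >= 0],
}

def run(program, xs):
    for name in program:
        xs = list(DSL[name](xs))
    return xs

def guided_search(examples, predicted_probs, max_len=3):
    # Enumerate programs up to max_len and order them by the product of predicted probabilities.
    candidates = []
    for length in range(1, max_len + 1):
        for program in product(DSL, repeat=length):
            score = 1.0
            for name in program:
                score *= predicted_probs[name]
            candidates.append((score, program))
    candidates.sort(key=lambda c: -c[0])
    for _, program in candidates:
        if all(run(program, inp) == out for inp, out in examples):
            return program
    return None

examples = [([3, 1, 2], [6, 4, 2]), ([5, 0], [10, 0])]
predicted = {"sort": 0.9, "reverse": 0.8, "double": 0.7, "drop_neg": 0.1}  # stand-in for network output
found = guided_search(examples, predicted)
```

A good predictor pushes consistent programs towards the front of the queue, which is where the reported speedups over unguided search come from.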

Bio: Danny Tarlow is a Researcher in the Machine Intelligence and Perception group at Microsoft Research in Cambridge, UK. His research interests are in the application of machine learning to problems involving highly structured data, with a specific interest in the intersection of machine learning and programming languages. He is an editor of the forthcoming MIT Press book on Perturbations, Optimization, and Statistics, and his work has won paper awards at UAI (Best Student Paper, Runner Up), the ICML Workshop on Constructive Machine Learning (Best Paper), and NIPS (Best Paper). He holds a Ph.D. from the Machine Learning group at the University of Toronto (2013) and was previously a Research Fellow at Darwin College, University of Cambridge (2013-2016).

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20161111T130000 DTEND;TZID=/Europe/London:20161111T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20161118130000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Ryota Tomioka DESCRIPTION:Ryota Tomioka (Microsoft Research Cambridge): f-GAN: Training Generative Neural Samplers using Variational Divergence Minimization\n\nLocation: Zoom\n\nLink: Roberts Building 508\n\nAbstract:Generative neural samplers are probabilistic models that implement sampling using feedforward neural networks: they take a random input vector and produce a sample from a probability distribution defined by the network weights. These models are expressive and allow efficient computation of samples and derivatives, but cannot be used for computing likelihoods or for marginalization. The generative-adversarial training method allows such models to be trained through the use of an auxiliary discriminative neural network. We show that the generative-adversarial approach is a special case of an existing, more general variational divergence estimation approach. We show that any f-divergence can be used for training generative neural samplers. We discuss the benefits of various choices of divergence functions on training complexity and the quality of the obtained generative models.
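The variational divergence estimation behind f-GAN uses the bound $D_f(P\|Q) \ge \sup_T \mathbb{E}_P[T(x)] - \mathbb{E}_Q[f^*(T(x))]$, where $f^*$ is the convex conjugate of f. Below is a Monte Carlo sketch for the KL case ($f(u) = u \log u$, $f^*(t) = e^{t-1}$) with P = N(0, 1) and Q = N(1, 1), for which KL(P‖Q) = 0.5. The two critics are hand-picked for illustration, whereas in f-GAN the critic is a trained network. The optimal critic $T(x) = 1 + \log p(x)/q(x) = 1.5 - x$ attains the bound, and any other critic gives a smaller value.

```python
import numpy as np

def f_star_kl(t):
    # Convex conjugate of f(u) = u log u.
    return np.exp(t - 1.0)

def variational_kl_bound(critic, n=200_000, seed=0):
    rng = np.random.default_rng(seed)
    x_p = rng.normal(0.0, 1.0, n)   # samples from P = N(0, 1)
    x_q = rng.normal(1.0, 1.0, n)   # samples from Q = N(1, 1)
    return critic(x_p).mean() - f_star_kl(critic(x_q)).mean()

optimal = variational_kl_bound(lambda x: 1.5 - x)        # close to KL(P || Q) = 0.5
suboptimal = variational_kl_bound(lambda x: 1.5 - 0.5 * x)
```

In training, maximising this bound over the critic while minimising it over the generator gives the saddle-point objective that generalises the original GAN loss.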

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20161118T130000 DTEND;TZID=/Europe/London:20161118T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20161125130000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Theo Trouillon DESCRIPTION:Theo Trouillon (Xerox Research, Univ. Grenoble Alpes): Complex-Valued Embeddings for Knowledge Base Completion\n\nLocation: Zoom\n\nLink: Roberts Building 508\n\nAbstract:

In statistical relational learning, knowledge base completion deals with automatically understanding the structure of large knowledge bases (labeled directed graphs) and predicting missing relationships (labeled edges). State-of-the-art embedding models propose different trade-offs between modeling expressiveness, and time and space complexity. We reconcile both expressiveness and complexity through the use of complex-valued embeddings and explore the link between such complex-valued embeddings and unitary diagonalization. We corroborate our approach theoretically and show that all real square matrices, and thus all possible relation/adjacency matrices, are the real part of some unitarily diagonalizable matrix. This result opens the door to many other applications of square matrix factorization. Our approach based on complex embeddings is arguably simple, as it only involves a Hermitian dot product, the complex counterpart of the standard dot product between real vectors, whereas other methods resort to more and more complicated composition functions to increase their expressiveness. The proposed complex embeddings are scalable to large data sets, as they remain linear in both space and time, while consistently outperforming alternative approaches on standard link prediction benchmarks.
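The scoring function at the heart of the complex-valued approach is the real part of a trilinear Hermitian product, $\mathrm{Re}(\langle w_r, e_s, \bar e_o\rangle)$. The small sketch below uses random embeddings of arbitrary dimension. Conjugating the object embedding makes the score asymmetric in subject and object, whereas with real-valued embeddings the same product is symmetric and cannot distinguish the direction of a relation.

```python
import numpy as np

def complex_score(e_s, w_r, e_o):
    # Re(<w_r, e_s, conj(e_o)>) = Re(sum_k w_r[k] * e_s[k] * conj(e_o[k]))
    return float(np.real(np.sum(w_r * e_s * np.conj(e_o))))

rng = np.random.default_rng(0)
dim = 4
e_s = rng.normal(size=dim) + 1j * rng.normal(size=dim)   # subject embedding
e_o = rng.normal(size=dim) + 1j * rng.normal(size=dim)   # object embedding
w_r = rng.normal(size=dim) + 1j * rng.normal(size=dim)   # relation embedding
forward = complex_score(e_s, w_r, e_o)
backward = complex_score(e_o, w_r, e_s)                   # generally differs from forward

# Real-valued embeddings for comparison: the score becomes symmetric.
r_s, r_o, r_w = (rng.normal(size=dim) for _ in range(3))
```

The cost is one element-wise product and sum per triple, which is consistent with the linear time and space claim above.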

Bio:

After graduating from ENSIMAG, Théo started his PhD at Univ. Grenoble Alpes and at Xerox Research Centre Europe. He is currently a visiting PhD student in the UCL Machine Reading team. His main research topic is statistical relational learning, focusing on complex-valued embedding models.

With NIPS coming up the following week, we'll have some of the people at UCL present their accepted work at the conference.

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20161202T130000 DTEND;TZID=/Europe/London:20161202T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20170127130000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Mijung Park DESCRIPTION:Mijung Park (Amsterdam Machine Learning Lab): Variational Bayes In Private Settings (VIPS)\n\nLocation: Zoom\n\nLink: Roberts Building G08 Sir David Davies LT\n\nAbstract:Bayesian methods are frequently used for analysing privacy-sensitive datasets, including medical records, emails, and educational data, and there is a growing need for practical Bayesian inference algorithms that protect the privacy of individuals' data.

To this end, we provide a general framework for privacy-preserving variational Bayes (VB) for a large class of probabilistic models, called the conjugate exponential (CE) family. Our primary observation is that when models are in the CE family, we can privatise the variational posterior distributions simply by perturbing the expected sufficient statistics of the complete-data likelihood. For widely used non-CE models with binomial likelihoods (e.g., logistic regression), we exploit the Polya-Gamma data augmentation scheme to bring such models into the CE family, such that inferences in the modified model resemble the original (private) variational Bayes algorithm as closely as possible. The iterative nature of variational Bayes presents a further challenge for privacy preservation, as each iteration increases the amount of noise needed. We overcome this challenge by combining: (1) a relaxed notion of differential privacy, called concentrated differential privacy, which provides a tight bound on the privacy cost of multiple VB iterations and thus significantly decreases the amount of additive noise; and (2) the privacy amplification effect of subsampling mini-batches from large-scale data in stochastic learning. We empirically demonstrate the effectiveness of our method in CE and non-CE models including latent Dirichlet allocation (LDA), Bayesian logistic regression, and Sigmoid Belief Networks (SBNs), evaluated on real-world datasets.
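Here is a minimal sketch of the core observation (privatise the posterior by perturbing expected sufficient statistics) for the simplest conjugate-exponential case, a Beta-Bernoulli model, where the sufficient statistic is a count. As a swapped-in simplification, it uses the Laplace mechanism for ε-differential privacy (sensitivity 1 when one record is replaced). It does not use the concentrated differential privacy and subsampling described in the work. The prior and data are hypothetical. Clipping the noisy count is post-processing, so it does not weaken the guarantee.

```python
import numpy as np

def private_beta_posterior(data, epsilon, a0=1.0, b0=1.0, seed=None):
    # Beta(a0, b0) prior, Bernoulli likelihood: the sufficient statistic is the number of ones.
    rng = np.random.default_rng(seed)
    n = len(data)
    count = float(np.sum(data))
    # Laplace mechanism: replacing one record changes the count by at most 1.
    noisy = count + rng.laplace(0.0, 1.0 / epsilon)
    noisy = min(max(noisy, 0.0), float(n))      # clip to a valid count (post-processing)
    return a0 + noisy, b0 + n - noisy

data = np.array([1, 0, 1, 1, 0, 1, 1, 1])
a, b = private_beta_posterior(data, epsilon=1.0, seed=0)
# With a huge epsilon the noise vanishes and the exact posterior Beta(7, 3) is recovered.
a_hi, b_hi = private_beta_posterior(data, epsilon=1e9, seed=0)
```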

Gaussian processes (GPs) are a good choice for function approximation as they are flexible, robust to over-fitting, and provide well-calibrated predictive uncertainty. Deep Gaussian processes (DGPs) are multi-layer generalisations of GPs, but inference in these models has proved challenging. Existing approaches to inference in DGP models assume approximate posteriors that force independence between the layers, and do not work well in practice. We present a doubly stochastic variational inference algorithm, which does not force independence between layers. With our method of inference we demonstrate that a DGP model can be used effectively on data ranging in size from hundreds to a billion points. We provide strong empirical evidence that our inference scheme for DGPs works well in practice in both classification and regression.

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20171020T130000 DTEND;TZID=/Europe/London:20171020T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20171103130000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Aapo Hyvarinen DESCRIPTION:Aapo Hyvarinen (UCL, Gatsby): Nonlinear ICA using temporal structure: a principled framework for unsupervised deep learning\n\nLocation: Zoom\n\nLink: Roberts Building G08 Sir David Davies LT\n\nAbstract:Unsupervised learning, in particular learning general nonlinear representations, is one of the deepest problems in machine learning. Estimating latent quantities in a generative model provides a principled framework, and has been successfully used in the linear case, e.g. with independent component analysis (ICA) and sparse coding. However, extending ICA to the nonlinear case has proven to be extremely difficult: a straightforward extension is unidentifiable, i.e. it is not possible to recover those latent components that actually generated the data. Here, we show that this problem can be solved by using temporal structure. We formulate two generative models in which the data is an arbitrary but invertible nonlinear transformation of time series (components) which are statistically independent of each other. Drawing from the theory of linear ICA, we formulate two distinct classes of temporal structure of the components which enable identification, i.e. recovery of the original independent components. We show that in both cases, the actual learning can be performed by ordinary neural network training where only the input is defined in an unconventional manner, making software implementations trivial. We can rigorously prove that after such training, the units in the last hidden layer will give the original independent components. [With Hiroshi Morioka, published at NIPS 2016 and AISTATS 2017.]

Gaussian processes are models that are equivalent to neural networks with infinitely many hidden units, and have many desirable properties, such as tractable Bayesian inference and sensible estimates. In recent years, there has been much progress on approximate inference for large datasets and non-conjugate likelihoods. However, the model structure of Gaussian processes has remained simple, especially compared to deep models. In this talk, we show how convolutional structure can be embedded in a Gaussian process and how to construct a tailored variational inference scheme for practical and accurate inference. We show that this structure significantly improves performance on classification tasks, as was seen in neural networks. We hope that this work will inspire work on more interesting Gaussian process models, where we obtain the benefits of both accurate inference and complex model structure.
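Here is a sketch of one way to build a convolutional kernel: a base RBF kernel on image patches, averaged over all pairs of patches from the two images, $k(x, x') = \frac{1}{P^2}\sum_p \sum_{p'} k_g(x^{[p]}, x'^{[p']})$. Patch size, lengthscale and the toy images are assumptions, and the talk's variational inference scheme is not shown. Because the kernel only sees the multiset of patches, shifting a small pattern away from the image border leaves it unchanged.

```python
import numpy as np

def patches(images, size):
    # images: (n, H, W) -> (n, P, size*size), all size x size patches with stride 1.
    n, H, W = images.shape
    out = []
    for i in range(H - size + 1):
        for j in range(W - size + 1):
            out.append(images[:, i:i + size, j:j + size].reshape(n, -1))
    return np.stack(out, axis=1)

def conv_kernel(x1, x2, size=3, lengthscale=1.0):
    p1, p2 = patches(x1, size), patches(x2, size)
    # Squared distances between every patch of every image pair: shape (n1, n2, P, P).
    d2 = ((p1[:, None, :, None, :] - p2[None, :, None, :, :]) ** 2).sum(-1)
    kg = np.exp(-0.5 * d2 / lengthscale**2)      # RBF kernel between patches
    return kg.mean(axis=(2, 3))                  # average over all patch pairs

rng = np.random.default_rng(0)
images = np.zeros((3, 6, 6))
images[0, 2, 2] = 1.0                  # a single bright pixel...
images[1, 3, 3] = 1.0                  # ...shifted by one pixel diagonally
images[2] = rng.normal(size=(6, 6))
K = conv_kernel(images, images)
```

Averaging a valid kernel over patch pairs keeps the result positive semi-definite, so `K` can serve directly as a GP covariance.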

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20171117T130000 DTEND;TZID=/Europe/London:20171117T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20171124120000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Various DESCRIPTION:Various (UCL): NIPS Accepted Papers\n\nLocation: Zoom\n\nLink: Roberts Building Room 421\n\nAbstract:The following UCL researchers will present their NIPS 2017 accepted papers.

1. Nicolò Colombo: Tomography of the London Underground: a Scalable Model for Origin-Destination Data

2. Zhen He: Wider and Deeper, Cheaper and Faster: Tensorized LSTMs for Sequence Learning (https://arxiv.org/abs/1711.01577)

3. Wittawat Jitkrittum: A Linear-Time Kernel Goodness-of-Fit Test

4. Jamie Hayes: Generating steganographic images via adversarial training

5. Carlo Ciliberto: Consistent Multitask Learning with Nonlinear Output Relations

6. Thomas Anthony: Thinking Fast and Slow with Deep Learning and Tree Search (https://arxiv.org/abs/1705.08439)

This talk describes very recent efforts on developing approximate inference algorithms that enable approximations of arbitrary form. I will start by revisiting fundamental tractability issues of Bayesian computation and argue that density evaluation of the approximate posterior is mostly unnecessary. Then I will present four different categories of wild approximate inference methods that have been explored recently, with a focus on two of them developed by myself and colleagues. I will briefly cover: 1. the amortised MCMC algorithm, which improves the approximate posterior by following the particle update of a valid MCMC sampler; and 2. a gradient estimation method that allows variational inference to be applied to approximate distributions without a tractable density.

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20171201T130000 DTEND;TZID=/Europe/London:20171201T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20171208130000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Relja Arandjelovic DESCRIPTION:Relja Arandjelovic (DeepMind): Look, Listen and Learn\n\nLocation: Zoom\n\nLink: Roberts Building G08 Sir David Davies LT\n\nAbstract:We consider the question: what can be learnt by looking at and listening to a large number of unlabelled videos? There is a valuable, but so far untapped, source of information contained in the video itself: the correspondence between the visual and the audio streams. We introduce a novel "Audio-Visual Correspondence" (AVC) learning task that makes use of this. Training visual and audio networks from scratch, without any additional supervision other than the raw unconstrained videos themselves, is shown to successfully solve this task and, more interestingly, to result in good visual and audio representations. These features set a new state of the art on two sound classification benchmarks, and perform on par with the state-of-the-art self-supervised approaches on ImageNet classification. We also design a network that can learn to embed audio and visual inputs into a common space that is suitable for cross-modal retrieval, and a network that can localize the object that sounds in an image, given the audio signal. We achieve all of these objectives by training from unlabelled video using only audio-visual correspondence (AVC) as the objective function.

The infinite-dimensional exponential family is a rich generalization of the standard exponential family, going beyond finite sufficient statistic functions to allow for very complex models. In particular, we study the kernel exponential family, where the natural parameter lies in a reproducing kernel Hilbert space. Computing the normalization constant in this class of models is difficult, but efficient estimation is possible via score matching. This approach, however, has cubic computational complexity in both the number of sampled points and their dimension. We thus propose estimation with a low-rank, Nyström-like approximation. The new solution retains essentially the same convergence rate as the full-rank solution, with substantially less computational effort and storage. We demonstrate the applicability of the method both to density estimation and to approximating Hamiltonian Monte Carlo when gradients are available.
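To make the phrase "low-rank, Nyström-like approximation" concrete, here is a minimal sketch of the generic Nyström approximation of a kernel Gram matrix, K ≈ K_nm K_mm^+ K_mn, built from m landmark points. It is not the score-matching estimator from the talk. The RBF kernel, uniform landmark sampling and the function names are assumptions made for illustration only.

```python
import numpy as np


def rbf_kernel(A, B, lengthscale=1.0):
    """Gaussian (RBF) kernel matrix between the rows of A and B."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * lengthscale**2))


def nystrom_features(X, m, rng, lengthscale=1.0, rel_tol=1e-10):
    """Return Phi (n x r, r <= m) with Phi @ Phi.T = K_nm K_mm^+ K_mn, a low-rank approximation of K(X, X).

    Landmarks are m rows of X drawn uniformly without replacement (an assumption;
    other landmark choices are possible)."""
    idx = rng.choice(X.shape[0], size=m, replace=False)
    Z = X[idx]
    K_nm = rbf_kernel(X, Z, lengthscale)
    K_mm = rbf_kernel(Z, Z, lengthscale)
    # Eigendecompose K_mm and drop numerically negligible eigenvalues (truncated pseudo-inverse).
    evals, evecs = np.linalg.eigh(K_mm)
    keep = evals > rel_tol * evals.max()
    return K_nm @ (evecs[:, keep] / np.sqrt(evals[keep]))


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
Phi = nystrom_features(X, 20, rng)
K = rbf_kernel(X, X)
print(Phi.shape, np.linalg.norm(K - Phi @ Phi.T) / np.linalg.norm(K))
```

Working with the n x r factor Phi instead of the full n x n matrix is what reduces storage and the cost of the linear algebra. That general trade-off is the one the abstract describes.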

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20171215T130000 DTEND;TZID=/Europe/London:20171215T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20180112130000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Tamara Fernandez DESCRIPTION:Tamara Fernandez (UCL Gatsby): A Gaussian process model for survival analysis\n\nLocation: Zoom\n\nLink: Roberts Building G08 Sir David Davies LT\n\nAbstract:We introduce a novel Bayesian non-parametric model for survival data. The model is based on using a positive map of a Gaussian process with stationary covariance function as a prior over the so-called hazard function. This model is thoroughly studied in terms of prior behaviour and posterior consistency. Alternatives for incorporating covariates are discussed, as well as an exact and tractable inference scheme.

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20180112T130000 DTEND;TZID=/Europe/London:20180112T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20180126130000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Edouard Oyallon DESCRIPTION:Edouard Oyallon (CentraleSupelec): Invariance & invertibility in CNNs\n\nLocation: Zoom\n\nLink: Roberts Building G08 Sir David Davies LT\n\nAbstract:The outstanding supervised classification performance of CNNs indicates that they are able to create relevant invariants for classification. We show that this can be achieved through progressive invariance incorporation as well as via perfectly invertible architectures. Illustrations are given through Hybrid Scattering Networks, based on a geometric representation, and $i$-RevNets, a class of invertible CNNs. We make explicit several empirical properties, such as progressive linear separability, in order to shed light on the inner mechanisms implemented by CNNs.

In the modern age, rankings data are ubiquitous and useful for a variety of applications such as recommender systems, multi-object tracking and preference learning. However, most rankings data encountered in the real world are incomplete, which prevents the direct application of existing modelling tools for complete rankings. In this talk, we present a novel way to extend kernel methods for complete rankings to partial rankings, via consistent Monte Carlo estimators of Gram matrices. These Monte Carlo kernel estimators are given by extending kernel mean embeddings to the embedding of a set of full rankings consistent with an observed partial ranking. They form a computationally tractable alternative to previous approaches for partial rankings data. We also present a variance reduction scheme based on an antithetic variate construction between permutations to get an improved Monte Carlo estimator. Once the Gram matrix estimators are obtained, they can be used for supervised and unsupervised machine learning kernel methods. In particular, we present comparative simulation results demonstrating the efficacy of the proposed estimators for an MMD hypothesis test and a Gaussian process task by extending some of the existing methods in the GPy framework.
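The core Monte Carlo idea can be sketched in a few lines. Assume the partial rankings are top-k lists and that every full ranking consistent with one is equally likely. The kernel between two partial rankings, taken as the inner product of the mean embeddings of their consistent sets, then equals the expected full-ranking kernel under independent uniform draws, and it can be estimated by sampling. The Kendall kernel, the top-k model and the names below are assumptions for illustration. This is plain Monte Carlo only: the antithetic variance-reduction scheme from the talk is not shown.

```python
import itertools
import numpy as np


def kendall_kernel(r1, r2):
    """Kendall kernel between two full rankings given as rank vectors (r[i] = position of item i).
    Returns (#concordant - #discordant) / (n choose 2), a value in [-1, 1]."""
    n = len(r1)
    total = sum(np.sign(r1[i] - r1[j]) * np.sign(r2[i] - r2[j])
                for i, j in itertools.combinations(range(n), 2))
    return float(total) / (n * (n - 1) / 2)


def sample_consistent(top, n_items, rng):
    """Draw uniformly a full ranking consistent with a top-k partial ranking.

    `top` lists the ranked items best-first; the remaining items are treated as
    unordered and appended in a uniformly random order (an assumed model of partiality)."""
    top_set = set(top)
    rest = [i for i in range(n_items) if i not in top_set]
    order = list(top) + [rest[k] for k in rng.permutation(len(rest))]
    ranks = np.empty(n_items, dtype=int)
    ranks[order] = np.arange(n_items)  # ranks[item] = position of that item
    return ranks


def mc_partial_kernel(top_a, top_b, n_items, n_samples, rng):
    """Plain Monte Carlo estimate of E[k(s, s')], with s and s' drawn independently from the
    full rankings consistent with each partial ranking (no antithetic variates)."""
    return float(np.mean([kendall_kernel(sample_consistent(top_a, n_items, rng),
                                         sample_consistent(top_b, n_items, rng))
                          for _ in range(n_samples)]))


rng = np.random.default_rng(0)
print(mc_partial_kernel([0], [0], 3, 20000, rng))
```

As a check, take three items where both partial rankings place item 0 first. The two pairs involving item 0 always agree, while the pair (1, 2) agrees or disagrees with equal probability. The exact value is therefore 2/3, and the estimate should land close to it.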

Slides from the talk: http://www.nowozin.net/sebastian/talks/nowozin-london-2018-02-09.pptx

Generative Adversarial Networks (GANs) have breathed new life into research on generative models. Generative models promise to be able to learn rich structural representations from unsupervised data, enabling data-efficient modelling in complex domains. The talk is divided into three parts.

The first part introduces the basic GAN approach, understanding it both on the statistical level in terms of minimizing a divergence between probability distributions and algorithmically in terms of a smooth two-player game.

The second part discusses problems in the GAN approach and consolidates recent research by highlighting problems both in the statistical viewpoint (existence of divergences) and in the algorithmic viewpoint (convergence of the GAN game), making recommendations for practical use of GAN models.

The third part discusses the relationship to other generative modelling approaches, potential applications of GANs and GAN-type approximations, and raises open problems for future research.

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20180209T130000 DTEND;TZID=/Europe/London:20180209T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20180216130000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Zbigniew Wojna DESCRIPTION:Zbigniew Wojna (UCL): Architectures for big scale 2D imagery\n\nLocation: Zoom\n\nLink: Roberts Building G08 Sir David Davies LT\n\nAbstract:I will present research that I conducted during my Ph.D. at University College London and in collaboration with Google. My primary interest lies in the development of neural architectures for large-scale 2D imagery problems. I will present the recently published analysis of different upsampling methods in the decoder part of visual architectures, together with a very recent, ongoing extension to GANs. I will discuss an attention mechanism for text recognition and review the kinds of applications for which it can be useful (e.g. automatically updating Google Maps based on Google Street View imagery). I will explain the idea behind Inception, what we changed in Inception-v3 to make it the best single model on ImageNet 2015, and how it compares to the ResNet architecture, which was published two weeks later. Alongside Inception, I will present our winning submission to the MS COCO 2016 detection challenge and the extensive analysis of the different models and backbone architectures inside it. Finally, I will briefly review our UCL effort working with 4096x4096 images in The Digital Mammography DREAM Challenge for breast cancer recognition, where we placed 9th among 1375 teams worldwide and 2nd in the community phase.

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20180216T130000 DTEND;TZID=/Europe/London:20180216T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20180223130000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Ricardo Silva DESCRIPTION:Ricardo Silva (UCL Statistical Science): Some Machine Learning Tools to Aid Causal Inference\n\nLocation: Zoom\n\nLink: Roberts Building G08 Sir David Davies LT\n\nAbstract:Causal inference from observational data requires untestable assumptions. As assumptions may fail, it is important to be able to understand how conclusions vary under different premises. Machine learning methods are particularly good at searching for hypotheses, but they do not always provide ways of expressing a continuum of assumptions from which causal estimands can be proposed. We introduce one family of assumptions and algorithms that can be used to provide alternative explanations for treatment effects. If we have time, I will also discuss some other developments on the integration of observational and interventional data using a nonparametric Bayesian approach.

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20180223T130000 DTEND;TZID=/Europe/London:20180223T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20180309130000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Sam Livingstone DESCRIPTION:Sam Livingstone (UCL Statistical Science): What we talk about when we talk about non-reversible MCMC\n\nLocation: Zoom\n\nLink: Roberts Building G08 Sir David Davies LT\n\nAbstract:There has been much recent interest in designing MCMC methods that exploit some form of non-reversibility. It has been known for some time that non-reversible Markov chains/processes can mix more quickly than reversible counterparts, and so it is believed that harnessing non-reversibility could lead to faster MCMC algorithms for Bayesian computation.

I’ll spend some time at the beginning of the talk discussing what is known about non-reversible processes, and building intuition. Then I will aim to draw several connections between many non-reversible MCMC methods in the literature, showing that each shares a common structure, which can be thought of as a particular type of non-reversibility, and can be reduced to simple expressions relating to the generator of the process. Using this structure we can compare different non-reversible processes, establishing simple Peskun-type orderings between them, which in turn prove some conjectures and strengthen some earlier results.

This is joint work with Christophe Andrieu.

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20180309T130000 DTEND;TZID=/Europe/London:20180309T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20180413130000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Seth Flaxman DESCRIPTION:Seth Flaxman (Imperial College): Predictor Variable Prioritization in Nonlinear Models: A Genetic Association Case Study\n\nLocation: Zoom\n\nLink: Roberts Building G06 Sir Ambrose Fleming LT\n\nAbstract:We address variable selection questions in nonlinear and nonparametric regression. Motivated by statistical genetics, where nonlinear interactions are of particular interest, we introduce a novel, interpretable, and computationally efficient way to summarize the relative importance of predictor variables. Methodologically, we develop the “RelATive cEntrality” (RATE) measure to prioritize candidate genetic variants that are not just marginally important, but whose associations also stem from significant covarying relationships with other variants in the data. We illustrate RATE through Bayesian Gaussian process regression, but the methodological innovations apply to other nonlinear methods. It is known that nonlinear models often exhibit greater predictive accuracy than linear models, particularly for phenotypes generated by complex genetic architectures. With detailed simulations and an Arabidopsis thaliana QTL mapping study, we show that applying RATE enables an explanation for this improved performance.

Bio:

Seth Flaxman is a lecturer in the statistics section of the Department of Mathematics at Imperial College London, joint with the Data Science Institute. His research is on scalable methods and flexible models for spatiotemporal statistics and Bayesian machine learning, applied to public policy and social science. He has worked on application areas that include public health, crime, voting patterns, filter bubbles / echo chambers in media, the regulation of machine learning algorithms, and emotion.

Probabilistic generative models can be used for compression, denoising, inpainting, texture synthesis, semi-supervised learning, unsupervised feature learning, and other tasks. Given this wide range of applications, it is not surprising that a lot of heterogeneity exists in the way these models are formulated, trained, and evaluated. As a consequence, direct comparison between models is often difficult. In this talk, we are going to take a look at some of the metrics which have been used to evaluate generative models. In particular, we will see that three popular criteria – average log-likelihood, Parzen window estimates, and visual fidelity of samples – are largely independent of each other when the data is high-dimensional. Good performance with respect to one criterion therefore need not imply good performance with respect to the other criteria. We conclude that generative models need to be evaluated directly with respect to the application(s) they were intended for.
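For readers unfamiliar with the second criterion, a "Parzen window estimate" fits an isotropic Gaussian kernel density estimate to samples drawn from the model. The model is then scored by the average log-density that this KDE assigns to held-out test points. A minimal sketch follows. The function name and the choice of a single shared bandwidth sigma are assumptions for illustration.

```python
import numpy as np


def parzen_log_likelihood(test, samples, sigma):
    """Average log-density of `test` points under an isotropic Gaussian Parzen window
    (kernel density estimate) centred on model `samples`; both arrays are (n, d)."""
    d = test.shape[1]
    diffs = test[:, None, :] - samples[None, :, :]               # (n_test, n_samples, d)
    log_k = -np.sum(diffs**2, axis=2) / (2.0 * sigma**2)         # unnormalised log-kernels
    m = log_k.max(axis=1, keepdims=True)                         # log-mean-exp for stability
    log_mean = m[:, 0] + np.log(np.mean(np.exp(log_k - m), axis=1))
    log_norm = -0.5 * d * np.log(2.0 * np.pi * sigma**2)         # Gaussian normalising constant
    return float(np.mean(log_mean + log_norm))


rng = np.random.default_rng(0)
model_samples = rng.normal(size=(1000, 2))
test_points = rng.normal(size=(100, 2))
print(parzen_log_likelihood(test_points, model_samples, sigma=0.5))
```

In practice sigma is commonly chosen by maximising this score on validation data, which is an assumption about protocol here. In high dimensions the resulting number can differ greatly from the model's true log-likelihood, which is consistent with the talk's point that the criteria can disagree.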

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20180420T130000 DTEND;TZID=/Europe/London:20180420T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20180427130000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Stefanos Zafeiriou DESCRIPTION:Stefanos Zafeiriou (Imperial College London): Discovering correlations in the modern era: robust and deep learning approaches\n\nLocation: Zoom\n\nLink: Roberts Building 309\n\nAbstract:Discovering correlations in signals is a very important problem at the intersection of statistics and machine learning. Arguably the most widely used tool to this end is Canonical Correlation Analysis (CCA). CCA has certain limitations when it is used to model correlations in real-world signals. First, it discovers only the most correlated spaces, ignoring the individual (signal-specific) spaces. Second, it is a linear method that is optimal under Gaussian noise; hence (a) it fails when gross outliers are present in the signals and (b) it cannot model non-linear correlations. In this talk, I will present recent advancements in CCA, as well as methods for discovering both the individual and the most correlated components that are robust to gross outliers and can model non-linear correlations. I will demonstrate applications in computer vision and signal processing.
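As a reference point for the limitations listed above, here is a minimal sketch of classical linear CCA, the baseline that the robust and deep variants in the talk extend. Both data blocks are whitened, and the singular values of the whitened cross-covariance are the canonical correlations. The small ridge term `reg` and the function names are assumptions added for numerical stability and readability. Being linear and covariance-based, this baseline has exactly the sensitivity to outliers and non-linearity that the abstract describes.

```python
import numpy as np


def _inv_sqrt(C, reg):
    """Inverse matrix square root of a symmetric PSD matrix, with ridge `reg` for stability."""
    evals, evecs = np.linalg.eigh(C + reg * np.eye(C.shape[0]))
    return (evecs / np.sqrt(evals)) @ evecs.T


def cca(X, Y, reg=1e-8):
    """Classical linear CCA. Returns canonical correlations (descending) and
    projection matrices A (p x k), B (q x k) to apply to the centred data."""
    n = X.shape[0]
    Xc, Yc = X - X.mean(axis=0), Y - Y.mean(axis=0)
    Cxx, Cyy, Cxy = Xc.T @ Xc / (n - 1), Yc.T @ Yc / (n - 1), Xc.T @ Yc / (n - 1)
    Wx, Wy = _inv_sqrt(Cxx, reg), _inv_sqrt(Cyy, reg)
    # Singular values of the whitened cross-covariance are the canonical correlations.
    U, s, Vt = np.linalg.svd(Wx @ Cxy @ Wy, full_matrices=False)
    return s, Wx @ U, Wy @ Vt.T


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
Y = X @ rng.normal(size=(3, 3)) + rng.normal(size=(200, 3))
corrs, A, B = cca(X, Y)
print(corrs)
```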

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20180427T130000 DTEND;TZID=/Europe/London:20180427T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20180525130000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Marco Cuturi DESCRIPTION:Marco Cuturi (CREST-ENSAE/Université Paris-Saclay): Regularization for Optimal Transport and Dynamic Time Warping Distances\n\nLocation: Zoom\n\nLink: Roberts Building G08 Sir David Davies LT\n\nAbstract:Machine learning deals with objects that are structured. Two common structures arising in applications are point clouds / histograms, as well as time series. Early progress in optimization (linear and dynamic programming) has provided powerful families of distances between these structures, namely Wasserstein distances and dynamic time warping scores. Because both rely on the minimization of a linear functional, over a polyhedral set of couplings and over a (discrete) space of alignments respectively, both result in non-differentiable quantities. We show how two distinct smoothing strategies result in quantities that are better behaved and more suitable for machine learning, with applications to several tasks arising in ML (clustering, structured prediction).
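Two standard smoothing strategies of this kind are sketched below. The first is entropic regularisation of optimal transport, solved by Sinkhorn's alternating scalings. The second is soft-DTW, which replaces the hard minimum in the DTW recursion with a soft minimum -gamma * log(sum(exp(-a / gamma))). That these are exactly the strategies covered in the talk is an assumption. The squared-difference costs, the regularisation strengths and the function names are illustrative choices.

```python
import numpy as np


def sinkhorn(a, b, C, eps, n_iter=1000):
    """Entropy-regularised OT between histograms a and b with cost matrix C."""
    K = np.exp(-C / eps)              # Gibbs kernel
    u, v = np.ones_like(a), np.ones_like(b)
    for _ in range(n_iter):
        u = a / (K @ v)               # match row marginals
        v = b / (K.T @ u)             # match column marginals
    P = u[:, None] * K * v[None, :]   # transport plan with marginals approximately (a, b)
    return P, float(np.sum(P * C))


def soft_min(values, gamma):
    """Smooth minimum -gamma * log(sum(exp(-v / gamma))), computed stably; tends to min as gamma -> 0."""
    m = min(values)
    return m - gamma * np.log(sum(np.exp(-(v - m) / gamma) for v in values))


def _dtw_table(x, y, combine):
    """Fill the DTW dynamic-programming table with squared-difference costs and a given min operator."""
    n, m = len(x), len(y)
    R = np.full((n + 1, m + 1), np.inf)
    R[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            prev = [R[i - 1, j - 1], R[i - 1, j], R[i, j - 1]]
            R[i, j] = (x[i - 1] - y[j - 1]) ** 2 + combine(prev)
    return float(R[n, m])


def dtw(x, y):
    return _dtw_table(x, y, min)


def soft_dtw(x, y, gamma):
    return _dtw_table(x, y, lambda prev: soft_min(prev, gamma))


grid = np.linspace(0.0, 1.0, 5)
P, cost = sinkhorn(np.full(5, 0.2), np.array([0.1, 0.2, 0.3, 0.2, 0.2]),
                   (grid[:, None] - grid[None, :]) ** 2, eps=0.1, n_iter=2000)
s1 = np.array([0.0, 1.0, 2.0, 1.0, 0.0])
s2 = np.array([0.0, 0.0, 1.0, 2.0, 1.0, 0.0])
print(cost, dtw(s1, s2), soft_dtw(s1, s2, gamma=0.1))
```

Both smoothed quantities are differentiable in their inputs. As eps or gamma shrinks towards zero they approach the original non-differentiable OT cost and DTW score, at the price of worse numerical conditioning.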

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20180525T130000 DTEND;TZID=/Europe/London:20180525T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20180601130000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Piotr Mirowski DESCRIPTION:Piotr Mirowski (DeepMind): Learning to Navigate\n\nLocation: Zoom\n\nLink: Torrington (1-19) G12\n\nAbstract:Navigation is an important cognitive task that enables humans and animals to traverse, with or without maps, over long distances in the complex world. Such long-range navigation can simultaneously support self-localisation (“I am here”) and a representation of the goal (“I am going there”). For this reason, studying navigation is fundamental to the study and development of artificial intelligence, and trying to replicate navigation in artificial agents can also help neuroscientists understand its biological underpinnings.

This talk will cover our own journey to understand navigation by building deep reinforcement learning agents, starting from learning to control a simple agent that can explore and memorise large 3D mazes, to building agents that can learn to read and write to memory in order to generalise goal acquisition skills to previously unseen environments. I will show how these artificial agents relate to navigation in the real world, both through the study of the emergence of grid cell representations in neural networks -- akin to those found in the mammalian entorhinal cortex -- and by demonstrating that these agents can navigate in Street View-based real world photographic environments.

Reinforcement Learning (RL) generally presupposes the availability of a possibly sparse, but primarily correct, reward signal from the environment, with which to reward an agent for behaving appropriately within the context of a task. Teaching agents to follow instructions using RL is a quintessentially multi-task problem: each instruction in a possibly combinatorially rich language corresponds to a specific task for which there must be a reward function against which the agent will learn. This has largely limited the RL community, thus far, to forms of instruction languages (e.g. templated instructions) where families of reward functions can be specified, and individual reward functions can be generated. In this talk, I discuss a new method which will allow us to take a step towards RL "in the wild", exploring a richer set of instruction languages, and enabling us to expose agents to a rich variety of tasks without needing to perpetually design reward functions over environment states.

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20180608T130000 DTEND;TZID=/Europe/London:20180608T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20181019130000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Kai Arulkumaran DESCRIPTION:Kai Arulkumaran (Imperial College): Tutorial on Deep RL\n\nLocation: Zoom\n\nLink: Roberts G08\n\nAbstract:Deep reinforcement learning is one of the hottest topics in machine learning research, and is thought to be a plausible route to Artificial General Intelligence. The idea behind this is that reinforcement learning is a formal way of training goal-directed agents, and can be combined with deep learning to train agents directly from raw, high-dimensional data. In this talk I will go through a quick introduction to deep learning and reinforcement learning, and then focus on the deep Q-network for playing Atari video games, as well as the asynchronous advantage actor-critic algorithm. To finish off, I will discuss more specific topics of research in deep reinforcement learning, highlighting the broad spectrum of work still to be done.

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20181019T130000 DTEND;TZID=/Europe/London:20181019T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20181116130000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Kayvan Sadeghi DESCRIPTION:Kayvan Sadeghi (UCL): Probabilistic Independence, Graphs, and Random Networks\n\nLocation: Zoom\n\nLink: Roberts G08\n\nAbstract:The main purpose of this talk is to explore the relationship between the set of conditional independence statements induced by a probability distribution and the set of separations induced by graphs as studied in graphical models. I introduce the concepts of Markov property and faithfulness, and provide conditions under which a given probability distribution is Markov or faithful to a graph in a general setting. I discuss the implications of these conditions in devising structural learning algorithms, in understanding exchangeable vectors, and in random network analysis.

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20181116T130000 DTEND;TZID=/Europe/London:20181116T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20181123130000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Various DESCRIPTION:Various (UCL, DeepMind, Imperial): NIPS Previews\n\nLocation: Zoom\n\nLink: Anatomy G29\n\nAbstract:Researchers will present their NIPS 2018 accepted papers.

This event requires registration beforehand.

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20181123T130000 DTEND;TZID=/Europe/London:20181123T160000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20181130130000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Artur Garcez DESCRIPTION:Artur Garcez (City University): Logic Tensor Networks: A System for Deep Learning with Symbolic Reasoning\n\nLocation: Zoom\n\nLink: Roberts G08\n\nAbstract:Deep learning has achieved great success at image and audio analysis, language translation and multimodal learning. Recent results, however, indicate that deep networks are susceptible to adversarial examples, and are neither robust nor capable of extrapolation. To address this problem, much of the research has turned to neural Artificial Intelligence systems capable of harnessing knowledge as well as learning from large data sets. Neural-symbolic computing has sought to benefit from such a combination of symbolic AI and neural computation for many years. In a neural-symbolic system, neural networks offer a machinery for efficient learning and computation, while symbolic knowledge representation and reasoning offer an ability to benefit from prior knowledge, transfer learning and extrapolation, and to produce explainable neural models. Neural-symbolic computing has found application in many areas including software specification evolution, training and assessment in simulators, and the prediction and explanation of the pathways to harm in gambling. In this talk, Professor Garcez will introduce the principles of neural-symbolic computing and will exemplify its use with defeasible knowledge representation, temporal logic reasoning and relational learning. He will then focus on Logic Tensor Networks (LTN), a neural-symbolic system capable of combining deep networks with first-order many-valued logics. LTNs were implemented in TensorFlow and have been applied successfully to semantic image interpretation and knowledge completion tasks, achieving state-of-the-art performance. Time permitting, he will also outline the main challenges for research in AI and neural-symbolic integration going forward.

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20181130T130000 DTEND;TZID=/Europe/London:20181130T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20181214130000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Ricardo Pio Monti DESCRIPTION:Ricardo Pio Monti (UCL (Gatsby)): Causal discovery with general non-linear relationships using non-linear ICA\n\nLocation: Zoom\n\nLink: Roberts G08\n\nAbstract:We consider the bivariate causal discovery problem, which corresponds to inferring the causal relationship between two passively observed variables. While this problem has been extensively studied, the majority of current methods assume a linear causal relationship, and the few methods which consider non-linear dependencies usually make the assumption of additive noise. Here, we propose a framework through which we can perform causal discovery in the presence of general non-linear relationships. The proposed method exploits a correspondence between a piecewise stationary non-linear ICA model and non-linear causal models. We show that in the case of bivariate causal discovery, non-linear ICA can be used to infer the causal direction via a series of independence tests. A series of experiments on simulated data demonstrates the capabilities of the proposed method. Extensions to multivariate causal discovery are also discussed.

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20181214T130000 DTEND;TZID=/Europe/London:20181214T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20190111130000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Jean-Baptiste Alayrac DESCRIPTION:Jean-Baptiste Alayrac (DeepMind): Weakly Supervised Learning from Videos\n\nLocation: Zoom\n\nLink: Roberts G08\n\nAbstract:In this talk, I will introduce and motivate the importance of weak supervision for computer vision, especially in the context of video understanding. I will then illustrate it on two challenging video tasks. The first one aims to learn the sequence of actions required to achieve complex human tasks (such as 'changing a car tire') only from narrated instructional videos [1,2]. The second one concerns jointly modeling manipulation actions with their effects on the state of objects (such as 'full/empty cup') [3]. Finally, I will conclude my talk by discussing some open challenges associated with weakly supervised learning, including learning from large-scale datasets [4,5] and how to use weak supervision in the context of deep learning.

References:

[1] Alayrac et al., "Unsupervised Learning from Narrated Instruction Videos", CVPR 2016.

[2] Alayrac et al., "Learning from Narrated Instruction Videos", TPAMI 2017.

[3] Alayrac et al., "Joint Discovery of Object States and Manipulation Actions", ICCV 2017.

[4] Miech, Alayrac et al., "Learning from Video and Text via Large-Scale Discriminative Clustering", ICCV 2017.

[5] Alayrac et al., "Minding the Gaps for Block Frank-Wolfe Optimization of Structured SVMs", ICML 2016.

[6] Bach and Harchaoui, "DIFFRAC: A Discriminative and Flexible Framework for Clustering", NIPS 2007.

Abstract. Differential privacy is concerned with prediction quality while measuring the privacy impact on individuals whose information is contained in the data. We consider differentially private risk minimization problems with regularizers that induce structured sparsity. These regularizers are known to be convex, but they are often non-differentiable. We analyze the standard differentially private algorithms, such as output perturbation and objective perturbation. Output perturbation is a differentially private algorithm that is known to perform well for minimizing risks that are strongly convex. Previous works have derived dimensionality-independent excess risk bounds for these cases. In this paper, we assume a particular class of convex but non-smooth regularizers that induce structured sparsity, and loss functions for generalized linear models. We derive an excess risk bound for output perturbation that is independent of the dimensionality of the problem. We also show that the existing analysis for objective perturbation may be extended to these risk minimization problems.
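To illustrate output perturbation in the strongly convex setting the abstract contrasts with, here is a minimal sketch. It fits L2-regularised logistic regression and then adds Gaussian noise scaled to the minimiser's sensitivity. For the objective (1/n) * sum_i loss_i(w) + (lam/2) * ||w||^2, where each loss gradient has norm at most 1 (true for logistic loss when every ||x_i|| <= 1), replacing one example moves the minimiser by at most 2 / (n * lam) in L2 norm. The Gaussian-mechanism scale below assumes 0 < epsilon < 1. The structured-sparsity regularisers studied in the talk are not covered, and all names are hypothetical.

```python
import numpy as np


def fit_l2_logistic(X, y, lam, steps=2000, lr=0.5):
    """Minimise (1/n) * sum_i log(1 + exp(-y_i <w, x_i>)) + (lam / 2) * ||w||^2 by gradient descent.
    Labels y are in {-1, +1}."""
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(steps):
        margins = y * (X @ w)
        # the derivative of log(1 + exp(-m)) with respect to m is -1 / (1 + exp(m))
        grad = -(X.T @ (y / (1.0 + np.exp(margins)))) / n + lam * w
        w -= lr * grad
    return w


def output_perturbation(X, y, lam, epsilon, delta, rng):
    """Non-private fit plus Gaussian noise calibrated to the L2 sensitivity 2 / (n * lam).

    Assumes every ||x_i|| <= 1 (so each loss gradient has norm <= 1) and 0 < epsilon < 1,
    the range in which this Gaussian-mechanism noise scale gives (epsilon, delta)-DP."""
    n, d = X.shape
    w = fit_l2_logistic(X, y, lam)
    sensitivity = 2.0 / (n * lam)
    sigma = sensitivity * np.sqrt(2.0 * np.log(1.25 / delta)) / epsilon
    return w + rng.normal(0.0, sigma, size=d), w


rng = np.random.default_rng(0)
n, d, lam = 500, 5, 0.1
X = rng.normal(size=(n, d))
X /= np.maximum(1.0, np.linalg.norm(X, axis=1, keepdims=True))  # enforce ||x_i|| <= 1
y = np.sign(X @ rng.normal(size=d) + 0.1 * rng.normal(size=n))
w_priv, w = output_perturbation(X, y, lam, epsilon=0.5, delta=1e-5, rng=rng)
print(w, w_priv)
```

Because the noise scale depends only on n, lam, epsilon and delta, and not on how the model was fit, output perturbation is simple to bolt onto any exact solver. The talk asks what happens to the resulting excess risk when the strongly convex L2 term is replaced by non-smooth sparsity-inducing regularisers.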

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20190118T130000 DTEND;TZID=/Europe/London:20190118T140000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20190125130000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Dino Sejdinovic DESCRIPTION:Dino Sejdinovic (University of Oxford): Learning on Aggregate Outputs with Kernels\n\nLocation: Zoom\n\nLink: Roberts 421\n\nAbstract:While a typical supervised learning framework assumes that the inputs and the outputs are measured at the same levels of granularity, many applications, including global mapping of disease, only have access to outputs at a much coarser level than that of the inputs. Aggregation of outputs makes generalization to new inputs much more difficult. We consider an approach to this problem based on variational learning with a model of output aggregation and Gaussian processes, where aggregation leads to intractability of the standard evidence lower bounds. We propose new bounds and tractable approximations, leading to improved prediction accuracy and scalability to large datasets, while explicitly taking uncertainty into account. We develop a framework which extends to several types of likelihoods, including the Poisson model for aggregated count data. We apply our framework to a challenging and important problem, the fine-scale spatial modelling of malaria incidences.

Joint work with Ho Chung Leon Law, Ewan Cameron, Tim CD Lucas, Seth Flaxman, Katherine Battle, Kenji Fukumizu

https://papers.nips.cc/paper/7847-variational-learning-on-aggregate-outputs-with-gaussian-processes
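To make the aggregation concrete, here is a minimal sketch of a bag-level Poisson likelihood of the kind the abstract mentions for aggregated count data: each fine-scale input carries a latent log-rate, and only the summed count per bag is observed. The function name, the population weights and the exponential link are illustrative assumptions; the Gaussian process prior and the variational bounds from the talk are not shown.

```python
import numpy as np
from math import lgamma

def aggregated_poisson_loglik(f_values, weights, bag_ids, counts):
    """Log-likelihood of bag-level counts when only aggregates are observed.

    f_values : latent log-rate f(x_i) of every fine-scale input (e.g. a pixel)
    weights  : non-negative population / exposure weight of each input
    bag_ids  : integer bag index (0..B-1) of each input
    counts   : observed count y_b for each bag b
    Assumed model: y_b ~ Poisson(sum_{i in bag b} w_i * exp(f_i)).
    """
    f_values = np.asarray(f_values, dtype=float)
    weights = np.asarray(weights, dtype=float)
    bag_ids = np.asarray(bag_ids, dtype=int)
    counts = np.asarray(counts, dtype=float)
    # Sum the fine-scale intensities into one Poisson rate per bag.
    rates = np.bincount(bag_ids, weights=weights * np.exp(f_values), minlength=len(counts))
    log_factorials = np.array([lgamma(c + 1.0) for c in counts])
    return float(np.sum(counts * np.log(rates) - rates - log_factorials))
```

The key point is that the likelihood only sees the summed rate per bag, so many different fine-scale intensity maps explain the same data equally well; the prior over f has to resolve that ambiguity.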

The talk will outline recent approaches for using (deep) convolutional neural networks to solve a wide range of inverse problems, such as tomographic image reconstruction. Emphasis is on learned iterative schemes that use a neural network architecture for reconstruction that includes physics-based models for how data is generated. The talk will show how such reconstruction methods can be integrated with elements of decision making, leading to task-adapted reconstruction. It will also survey a recent development in using generative adversarial networks for uncertainty quantification relevant to solving inverse problems.

Link to slides: https://people.kth.se/~ozan/UCL_slides.pdf
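The learned iterative schemes described above unroll a physics-based update and let a trained network decide how to use it at each step. The sketch below shows only that unrolled skeleton under this assumption: the hand-crafted Landweber step stands in where a trained per-iteration network would go, and all names are hypothetical.

```python
import numpy as np

def unrolled_reconstruction(A, y, update, n_iter=20):
    """Unrolled iterative reconstruction x_{k+1} = update(x_k, grad_k, k).

    grad_k = A^T (A x_k - y) is the physics-based data-consistency gradient,
    with the matrix A standing in for the forward model. In a learned scheme,
    `update` would be a small trained network per iteration.
    """
    x = np.zeros(A.shape[1])
    for k in range(n_iter):
        grad = A.T @ (A @ x - y)  # gradient of 0.5 * ||A x - y||^2
        x = update(x, grad, k)
    return x

def landweber_update(A):
    """Hand-crafted baseline update: a fixed gradient step of size 1 / ||A||_2^2."""
    step = 1.0 / np.linalg.norm(A, 2) ** 2
    return lambda x, grad, k: x - step * grad
```

Swapping `landweber_update(A)` for a trained callable is where learning enters; keeping `A` explicit in the loop is what makes the scheme physics-informed rather than a black-box image-to-image map.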

Time: 1 February 2019, 13:00-14:00 (Europe/London)

DeepMind/ELLIS CSML Seminar, 22 February 2019, 13:00

Tengyao Wang (UCL): Sparse PCA: statistical and computational trade-offs

Location: Zoom

Link: Roberts G08

Abstract: In recent years, Sparse Principal Component Analysis has emerged as an extremely popular dimension reduction technique for high-dimensional data. The theoretical challenge, in the simplest case, is to estimate the leading eigenvector of a population covariance matrix under the assumption that this eigenvector is sparse. An impressive range of estimators have been proposed; some of these are fast to compute, while others are known to achieve the minimax optimal rate over certain Gaussian or subgaussian classes. In this paper we show that, under a widely-believed assumption from computational complexity theory, there is a fundamental trade-off between statistical and computational performance in this problem. More precisely, working with new, larger classes satisfying a Restricted Covariance Concentration condition, we show that there is an effective sample size regime in which no randomised polynomial time algorithm can achieve the minimax optimal rate. We also study the theoretical performance of a (polynomial time) variant of the well-known semidefinite relaxation estimator, revealing a subtle interplay between statistical and computational efficiency.
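For a concrete feel of a fast sparse PCA estimator, the sketch below implements the truncated power method of Yuan and Zhang. Note that this is a different, well-known polynomial-time estimator, not the semidefinite relaxation variant analysed in the talk; the function name and the spiked-covariance check are illustrative.

```python
import numpy as np

def truncated_power_method(Sigma, k, n_iter=100, v0=None):
    """Estimate a k-sparse leading eigenvector of a covariance matrix Sigma.

    Each iteration multiplies by Sigma, keeps the k largest-magnitude entries,
    zeroes the rest and renormalises to unit length.
    """
    p = Sigma.shape[0]
    v = np.ones(p) / np.sqrt(p) if v0 is None else v0 / np.linalg.norm(v0)
    for _ in range(n_iter):
        w = Sigma @ v
        keep = np.argsort(np.abs(w))[-k:]  # indices of the k largest |w_j|
        v = np.zeros(p)
        v[keep] = w[keep]
        v /= np.linalg.norm(v)
    return v
```

Each iteration costs one matrix-vector product, which is why such methods are fast; the statistical question the talk addresses is when fast methods like this can, or provably cannot, match the minimax rate.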

Time: 22 February 2019, 13:00-14:00 (Europe/London)

DeepMind/ELLIS CSML Seminar, 1 March 2019, 13:00

Sander Dieleman (DeepMind): Generating music in the raw audio domain

Location: Zoom

Link: Roberts 421

Abstract: Realistic music generation is a challenging task. When machine learning is used to build generative models of music, typically high-level representations such as scores, piano rolls or MIDI sequences are used that abstract away the idiosyncrasies of a particular performance. But these nuances are very important for our perception of musicality and realism, so we embark on modelling music in the raw audio domain. I will discuss some of the advantages and disadvantages of this approach, and the challenges it entails.

Time: 1 March 2019, 13:00-14:00 (Europe/London)

DeepMind/ELLIS CSML Seminar, 8 March 2019, 13:00

Sanjeevan Ahilan (UCL, Gatsby): Feudal Multi-Agent Hierarchies for Cooperative Reinforcement Learning

Location: Zoom

Link: Roberts 421

Abstract: We investigate how reinforcement learning agents can learn to cooperate. Drawing inspiration from human societies, in which successful coordination of many individuals is often facilitated by hierarchical organisation, we introduce Feudal Multi-agent Hierarchies (FMH). In this framework, a 'manager' agent, which is tasked with maximising the environmentally-determined reward function, learns to communicate subgoals to multiple, simultaneously-operating, 'worker' agents. Workers, which are rewarded for achieving managerial subgoals, take concurrent actions in the world. We outline the structure of FMH and demonstrate its potential for decentralised learning and control. We find that, given an adequate set of subgoals from which to choose, FMH performs, and particularly scales, substantially better than cooperative approaches that use a shared reward function.

Time: 8 March 2019, 13:00-14:00 (Europe/London)

DeepMind/ELLIS CSML Seminar, 26 April 2019, 13:00

Nicolas Anastassacos (UCL): Investigating the Emergence of Cooperative Behaviour for Artificial Societies with RL

Location: Zoom

Link: Roberts G06

Abstract: The human ability to coordinate and cooperate has been vital to the development of societies for thousands of years, yet it is not fully clear how this behaviour arises. Mathematical and computational models have been used to gain insight, especially into the underlying individual decision-making mechanisms.

In this talk I will discuss our current work on the emergence of cooperation in societies using Reinforcement Learning and social dilemma environments that highlight the tensions between individual goals and the collective interests of a group. In particular we explore social norms and how the success of norms may be attributed to certain dynamics that are key to developing cooperative behaviour. I will present our initial findings and outline the open challenges in this area.

https://arxiv.org/abs/1902.03185

https://arxiv.org/abs/1809.10007

Building on recent theory that establishes a connection between implicit generative modeling and optimal transport, in this talk I will present a novel parameter-free algorithm for learning the underlying distributions of complicated datasets and sampling from them. The proposed algorithm is based on a functional optimization problem, which aims at finding a measure that is 'as close to the data distribution as possible' and also 'expressive enough' for generative modeling purposes. The problem will be formulated as a gradient flow in the space of probability measures. The connections between gradient flows and stochastic differential equations will let us develop a computationally efficient algorithm for solving the optimization problem, where the resulting algorithm will resemble recent dynamics-based Markov Chain Monte Carlo algorithms. I will then present finite-time error guarantees for the proposed algorithm. Finally, I will present some experimental results, which support our theory and show that our algorithm is able to capture the structure of challenging distributions.
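The link between gradient flows and SDEs that the abstract relies on is easiest to see in its simplest instance: the Langevin diffusion, whose law follows the Wasserstein gradient flow of the KL divergence to the target. A minimal Euler-Maruyama sketch (the unadjusted Langevin algorithm) is below; it is not the speaker's algorithm, and the names and step size are illustrative.

```python
import numpy as np

def unadjusted_langevin(grad_log_density, x0, step, n_steps, rng):
    """Euler-Maruyama discretisation of dX = grad log pi(X) dt + sqrt(2) dW.

    Returns the whole trajectory; after burn-in, the iterates are approximate
    samples from pi (with a bias that shrinks as `step` goes to zero).
    """
    x = np.array(x0, dtype=float)
    samples = np.empty((n_steps,) + x.shape)
    for k in range(n_steps):
        noise = rng.standard_normal(x.shape)
        x = x + step * grad_log_density(x) + np.sqrt(2.0 * step) * noise
        samples[k] = x
    return samples
```

For a Gaussian target N(3, 1), `grad_log_density` is simply `lambda x: -(x - 3.0)`; the drift pulls particles toward high density while the noise keeps them spread out, which is the particle-level picture of minimising KL.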

Time: 9 May 2019, 13:00-14:00 (Europe/London)

DeepMind/ELLIS CSML Seminar, 30 May 2019, 13:00

Quentin Berthet (University of Cambridge): Optimal transport methods in statistics and machine learning: theory and applications

Location: Zoom

Link: Roberts 106

Abstract: Optimal transport is one of the foundational problems of optimization, and a very important topic in analysis. It asks how mass distributed according to one measure can be transported so that it matches another measure, at minimal global transport cost. The associated Wasserstein distance is a useful tool to compare distributions, taking into account geometric properties of the data.

In this presentation, I will talk about two recent projects on this topic. In the first one, we propose a novel approach for unsupervised embedding alignment, and show applications to natural language processing. It is based on a new approach for Wasserstein loss minimization (joint work with E. Grave and A. Joulin, AISTATS 2019). In the second one, we provide new methods and guarantees for estimation of distributions with smooth densities, in Wasserstein distance. We show that these tools, inspired by techniques in nonparametric statistics, yield information-theoretically optimal results. We also develop ideas to handle our proposed estimators in a computationally efficient manner, and explore some of the associated computational trade-offs (joint work with J. Weed, COLT 2019).
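As a small illustration of the Wasserstein distance mentioned above: on the real line, between two empirical measures with the same number of equally weighted points, the optimal transport plan simply matches sorted samples to sorted samples. The sketch assumes equal sample sizes and an illustrative function name; it does not cover the estimators or the embedding-alignment method from the talk.

```python
import numpy as np

def wasserstein_1d(x, y, p=1):
    """p-Wasserstein distance between two equal-size, equally weighted samples on R.

    In one dimension the optimal coupling is the monotone rearrangement,
    i.e. the i-th smallest x is sent to the i-th smallest y.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    return float(np.mean(np.abs(x - y) ** p) ** (1.0 / p))
```

Unlike pointwise density comparisons, this distance grows smoothly as one sample is shifted away from the other, which is the "geometric" property the abstract refers to. In higher dimensions no such sorting shortcut exists, which is where the computational trade-offs begin.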

I will be talking about the three main components of next-gen AR that the Active Vision Lab and 6D.ai are working on. First is dense 3D geometric reconstruction from depth, stereo and mono, where I will present some of our work towards hyper-real time 3D fusion, our latest GA-Net framework for stereo matching (CVPR 2019, oral) and show several demos. Next, I will talk about the AR Cloud, and the various methods we've developed for geometric relocalisation, including our RelocNet system (ECCV 2018 oral) and Grove (CVPR 2018 oral, T-PAMI 2019). Finally, I will talk about semantic 3D reconstruction, and, among others, showcase our upcoming work on very fast instance segmentation, Mobile RCNN.

High-impact areas of machine learning and AI, such as personalized healthcare, autonomous robots, or environmental science, share some practical challenges: they are either small-data problems or a small collection of big-data problems. Therefore, learning algorithms need to be data/sample efficient, i.e., they need to be able to learn in complex domains, but only from fairly small datasets. Approaches for data-efficient learning include probabilistic modeling and inference, Bayesian deep learning, meta learning, Bayesian optimization, few-shot learning, etc.

In this talk, Marc will give a brief overview of some approaches to tackle the data-efficiency challenge. First, he will discuss a data-efficient reinforcement learning algorithm, which highlights the necessity for probabilistic models in RL. He will then present a meta-learning method for generalizing knowledge across tasks. Finally, he will motivate deep Gaussian processes: richer probabilistic models composed of relatively simple building blocks. He will briefly discuss the model, inference and some potential extensions, which can be valuable for modeling complex relationships while providing uncertainty estimates that are useful in any downstream decision-making process.

*Key references*

- Marc P. Deisenroth, Dieter Fox, Carl E. Rasmussen, Gaussian Processes for Data-Efficient Learning in Robotics and Control, IEEE Transactions on Pattern Analysis and Machine Intelligence, volume 37, pp. 408–423, 2015
- Steindór Sæmundsson, Katja Hofmann, Marc P. Deisenroth, Meta Reinforcement Learning with Latent Variable Gaussian Processes, Proceedings of the Conference on Uncertainty in Artificial Intelligence (UAI), 2018
- Hugh Salimbeni, Marc P. Deisenroth, Doubly Stochastic Variational Inference for Deep Gaussian Processes, Advances in Neural Information Processing Systems (NIPS), 2017
- Hugh Salimbeni, Vincent Dutordoir, James Hensman, Marc P. Deisenroth, Deep Gaussian Processes with Importance-Weighted Variational Inference, International Conference on Machine Learning (ICML), 2019
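Gaussian processes are the building block shared by all the referenced works (the dynamics models in the first, the latent-variable model in the second, and the layers of the deep GPs in the last two). A minimal sketch of exact GP regression with a squared-exponential kernel is below, assuming 1-D inputs and a zero prior mean; the names and hyperparameters are illustrative, and none of the sparse or variational approximations used in those papers are included.

```python
import numpy as np

def rbf_kernel(a, b, lengthscale=1.0, variance=1.0):
    """Squared-exponential kernel matrix between 1-D input arrays a and b."""
    sq = (a[:, None] - b[None, :]) ** 2
    return variance * np.exp(-0.5 * sq / lengthscale ** 2)

def gp_posterior(x_train, y_train, x_test, noise=1e-2, lengthscale=1.0, variance=1.0):
    """Exact GP posterior mean and marginal variance at x_test (zero prior mean)."""
    K = rbf_kernel(x_train, x_train, lengthscale, variance) + noise * np.eye(len(x_train))
    L = np.linalg.cholesky(K)                                   # K = L L^T
    alpha = np.linalg.solve(L.T, np.linalg.solve(L, y_train))   # K^{-1} y
    K_s = rbf_kernel(x_train, x_test, lengthscale, variance)
    mean = K_s.T @ alpha
    v = np.linalg.solve(L, K_s)
    var = np.diag(rbf_kernel(x_test, x_test, lengthscale, variance)) - np.sum(v ** 2, axis=0)
    return mean, var
```

The variance returned alongside the mean is the uncertainty estimate the abstract emphasises: it is small near the training data and reverts to the prior variance far from it, which is what a downstream planner or decision-maker can exploit.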

Time: 21 June 2019, 13:00-14:00 (Europe/London)

DeepMind/ELLIS CSML Seminar, 28 June 2019, 13:00

Benjamin Guedj (UCL - INRIA): A primer on PAC-Bayesian learning with applications to deep neural networks

Location: Zoom

Link: Gatsby Computational Neuroscience Unit Ground Floor

Abstract:

PAC-Bayes is a generic and flexible framework to address the generalisation abilities of machine learning algorithms. It leverages the power of Bayesian inference and allows one to derive new learning strategies. Benjamin will briefly present the key concepts of PAC-Bayes and illustrate how it can be used to study generalization properties of deep neural networks.

Joint work with Gaël Letarte, Pascal Germain, François Laviolette (https://arxiv.org/abs/1905.13367) and John Shawe-Taylor (see our ICML 2019 tutorial https://bguedj.github.io/icml2019/index.html)
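To make the PAC-Bayes idea concrete, the sketch below evaluates one classical bound, in the form usually credited to McAllester with Maurer's refinement, for a loss in [0, 1]: with probability at least $1 - \delta$ over an i.i.d. sample of size $n$, simultaneously for all posteriors $Q$, $R(Q) \le r(Q) + \sqrt{(\mathrm{KL}(Q\|P) + \ln(2\sqrt{n}/\delta))/(2n)}$. The factorised-Gaussian KL is the usual choice when $Q$ and $P$ are distributions over network weights. Function names are illustrative, and this is not the bound derived in the linked paper.

```python
import numpy as np

def kl_diag_gaussians(mu_q, sigma_q, mu_p, sigma_p):
    """KL(Q || P) for factorised Gaussians Q = N(mu_q, diag sigma_q^2), P = N(mu_p, diag sigma_p^2)."""
    mu_q, sigma_q, mu_p, sigma_p = (np.asarray(a, dtype=float) for a in (mu_q, sigma_q, mu_p, sigma_p))
    return float(np.sum(np.log(sigma_p / sigma_q)
                        + (sigma_q ** 2 + (mu_q - mu_p) ** 2) / (2.0 * sigma_p ** 2)
                        - 0.5))

def mcallester_bound(empirical_risk, kl, n, delta=0.05):
    """Upper bound on the true risk of Q for a [0, 1]-valued loss:
    r(Q) + sqrt((KL(Q||P) + ln(2 sqrt(n) / delta)) / (2 n))."""
    return empirical_risk + np.sqrt((kl + np.log(2.0 * np.sqrt(n) / delta)) / (2.0 * n))
```

The trade-off the framework exposes is visible in the formula: moving the posterior far from the prior can lower the empirical risk but pays for it through the KL term, and that penalty shrinks only like $1/\sqrt{n}$.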

Time: 28 June 2019, 13:00-14:00 (Europe/London)

DeepMind/ELLIS CSML Seminar, 11 October 2019, 13:00

Arthur Mensch (Ecole Normale Superieure (ENS) Paris): Geometric Losses for Distributional Learning

Location: Zoom

Link: 1.03 Engineering Building (Malet Place)

Abstract: Building upon recent advances in entropy-regularized optimal transport, and upon Fenchel duality between measures and continuous functions, we propose a generalization of the logistic loss that incorporates a metric or cost between classes. Unlike previous attempts to use optimal transport distances for learning, our loss results in unconstrained convex objective functions, supports infinite (or very large) class spaces, and naturally defines a geometric generalization of the softmax operator. The geometric properties of this loss make it suitable for predicting sparse and singular distributions, for instance supported on curves or hyper-surfaces. We study the theoretical properties of our loss and showcase its effectiveness on two applications: ordinal regression and drawing generation.

Arthur Mensch is a post-doctoral researcher at École Normale Supérieure, Paris, in the laboratory of Gabriel Peyré. He holds a Ph.D. in machine learning from the Inria Parietal team. He is currently working in structured prediction, optimal transport and game theory.

Time: 11 October 2019, 13:00-14:00 (Europe/London)

DeepMind/ELLIS CSML Seminar, 15 November 2019, 13:00

Varun Kanade (University of Oxford): Implicit Regularization for Optimal Sparse Recovery

Location: Zoom

Link: Roberts Building G06 Sir Ambrose Fleming LT

Abstract: We present an implicit regularization scheme for gradient descent methods applied to unpenalized least squares regression to solve the problem of reconstructing a sparse signal from an underdetermined system of linear measurements under the restricted isometry assumption. For a given parameterization yielding a non-convex optimization problem, we show that prescribed choices of initialization, step size and stopping time yield a statistically and computationally optimal algorithm that achieves the minimax rate with the same cost required to read the data up to poly-logarithmic factors. Beyond minimax optimality, we show that our algorithm adapts to instance difficulty and yields a dimension-independent rate when the signal-to-noise ratio is high enough. We validate our findings with numerical experiments and compare our algorithm against explicit $\ell_{1}$ penalization. Going from hard instances to easy ones, our algorithm is seen to undergo a phase transition, eventually matching least squares with oracle knowledge of the true support.

(based on joint work with Patrick Rebeschini and Tomas Vaskevicius)
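A minimal sketch of the idea, assuming the Hadamard-product parameterization $w = u \odot u - v \odot v$ (one common choice for this kind of implicit regularization; the abstract only speaks of "a given parameterization"): plain gradient descent on the unpenalized least squares loss, started from a small initialization and stopped after a fixed number of steps. The function name, step size, initialization scale and iteration count are illustrative, not the prescribed choices from the talk.

```python
import numpy as np

def hadamard_gd(X, y, alpha=1e-3, step=0.05, n_steps=3000):
    """Gradient descent on (1/2n)||X w - y||^2 with w = u*u - v*v and no penalty.

    The small initialisation `alpha`, the step size and the number of steps
    (early stopping) are what bias the iterates towards sparse solutions.
    """
    n, d = X.shape
    u = np.full(d, alpha)
    v = np.full(d, alpha)
    for _ in range(n_steps):
        w = u * u - v * v
        g = X.T @ (X @ w - y) / n  # gradient of the loss with respect to w
        # Chain rule: dw/du = 2u and dw/dv = -2v.
        u, v = u - step * 2.0 * u * g, v + step * 2.0 * v * g
    return u * u - v * v
```

Coordinates whose correlation with the residual is large grow multiplicatively and quickly, while the rest stay near their tiny starting value, so the stopping time acts much like the threshold in an explicit $\ell_1$ method.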

Varun Kanade is an associate professor in the Department of Computer Science at the University of Oxford. He has been a Simons Postdoctoral Fellow at the University of California, Berkeley and an FSMP postdoctoral fellow at ENS, Paris. He obtained his Ph.D. from Harvard University in 2012. His research interests are in machine learning and theoretical computer science.

Time: 15 November 2019, 13:00-14:00 (Europe/London)

DeepMind/ELLIS CSML Seminar, 28 November 2019, 13:00

Claire Vernade (DeepMind): NeurIPS Previews 2019

Location: Zoom

Link: Gordon Street 25, E28 Harrie Massey Lecture Theatre

Abstract:

* Claire Vernade (DeepMind) --- Weighted Linear Bandits for Non-Stationary Environments

(Joint work with Yoan Russac (ENS), Olivier Cappé (CNRS))

We consider a stochastic linear bandit model in which the available actions correspond to arbitrary context vectors whose associated rewards follow a non-stationary linear regression model. In this setting, the unknown regression parameter is allowed to vary in time. To address this problem, we propose D-LinUCB, a novel optimistic algorithm based on discounted linear regression, where exponential weights are used to smoothly forget the past. This involves studying the deviations of the sequential weighted least-squares estimator under generic assumptions. As a by-product, we obtain novel deviation results that can be used beyond non-stationary environments. We provide theoretical guarantees on the behavior of D-LinUCB in both slowly-varying and abruptly-changing environments. We obtain an upper bound on the dynamic regret that is of order $d^{2/3} B_T^{1/3} T^{2/3}$, where $B_T$ is a measure of non-stationarity ($d$ and $T$ being, respectively, dimension and horizon). This rate is known to be optimal. We also illustrate the empirical performance of D-LinUCB and compare it with recently proposed alternatives in simulated environments.
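The discounted least-squares estimator at the core of D-LinUCB can be sketched as follows: each past observation is down-weighted by a power of a discount factor $\gamma < 1$, so the estimate tracks a drifting parameter. This is a minimal sketch assuming a plain ridge penalty $\lambda I$; the exploration bonus and the exact regularization schedule of D-LinUCB are not reproduced, and the function name is hypothetical.

```python
import numpy as np

def discounted_ridge(actions, rewards, gamma, lam=1.0):
    """Exponentially weighted ridge estimate of a (possibly drifting) regression parameter.

    With t observations, observation s (0-based) gets weight gamma**(t-1-s):
    the most recent one has weight 1 and older ones are smoothly forgotten.
    """
    A = np.asarray(actions, dtype=float)   # shape (t, d): played context vectors
    r = np.asarray(rewards, dtype=float)   # shape (t,): observed rewards
    t, d = A.shape
    w = gamma ** np.arange(t - 1, -1, -1)  # [gamma^(t-1), ..., gamma, 1]
    V = (A * w[:, None]).T @ A + lam * np.eye(d)
    b = (A * w[:, None]).T @ r
    return np.linalg.solve(V, b)
```

With $\gamma = 1$ this reduces to ordinary ridge regression, which after an abrupt change averages the old and new parameters; with $\gamma < 1$ the effective memory is roughly $1/(1-\gamma)$ rounds, the knob that trades bias from stale data against variance from too few samples.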

* Giulia Luise (UCL) --- Sinkhorn Barycenters with Free Support via Frank-Wolfe Algorithm

(Joint work with Saverio Salzo (IIT), Carlo Ciliberto (Imperial), Massimiliano Pontil (UCL))

We present a novel algorithm to estimate the barycenter of arbitrary probability distributions with respect to the Sinkhorn divergence. Based on a Frank-Wolfe optimization strategy, our approach proceeds by populating the support of the barycenter incrementally, without requiring any pre-allocation. We consider discrete as well as continuous distributions, proving convergence rates of the proposed algorithm in both settings. Key elements of our analysis are a new result showing that the Sinkhorn divergence on compact domains has Lipschitz continuous gradient with respect to the Total Variation and a characterization of the sample complexity of Sinkhorn potentials. Experiments validate the effectiveness of our method in practice.

* Michael Arbel (Gatsby Unit, UCL) --- Maximum Mean Discrepancy Gradient Flow

(Joint work with Anna Korba (Gatsby Unit, UCL), Adil Salim (KAUST), Arthur Gretton (Gatsby Unit, UCL))

We construct a Wasserstein gradient flow of the maximum mean discrepancy (MMD) and study its convergence properties. The MMD is an integral probability metric defined for a reproducing kernel Hilbert space (RKHS), and serves as a metric on probability measures for a sufficiently rich RKHS. We obtain conditions for convergence of the gradient flow towards a global optimum, which can be related to particle transport when optimizing neural networks. We also propose a way to regularize this MMD flow, based on an injection of noise in the gradient. This algorithmic fix is supported by both theoretical and empirical evidence. The practical implementation of the flow is straightforward, since both the MMD and its gradient have simple closed-form expressions, which can be easily estimated with samples.
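Because the MMD and its gradient have closed forms, a particle version of the flow is short to write. The sketch below assumes a Gaussian kernel and a forward-Euler time discretization: each particle moves against the gradient of the witness function $f(z) = \frac{1}{N}\sum_j k(z, x_j) - \frac{1}{M}\sum_m k(z, y_m)$, which is, up to a factor $2/N$, the gradient of the squared-MMD estimate with respect to that particle. The noise-injection regularization from the talk is omitted and all names are illustrative.

```python
import numpy as np

def gaussian_gram(a, b, bandwidth):
    """Gaussian kernel matrix k(a_i, b_j) for point sets a (N, d) and b (M, d)."""
    sq = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * bandwidth ** 2))

def mmd2(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of MMD^2 between samples x and y."""
    return float(gaussian_gram(x, x, bandwidth).mean() + gaussian_gram(y, y, bandwidth).mean()
                 - 2.0 * gaussian_gram(x, y, bandwidth).mean())

def witness_grad(x, y, bandwidth=1.0):
    """Gradient of the witness function f at every particle x_i."""
    def grad_mean_k(z, pts):
        k = gaussian_gram(z, pts, bandwidth)        # (N, P)
        diff = z[:, None, :] - pts[None, :, :]      # (N, P, d)
        return -(k[:, :, None] * diff).sum(axis=1) / (bandwidth ** 2 * len(pts))
    return grad_mean_k(x, x) - grad_mean_k(x, y)

def mmd_flow(x, y, step=0.5, n_steps=200, bandwidth=1.0):
    """Forward-Euler particle discretisation of the MMD gradient flow (no noise injection)."""
    x = np.array(x, dtype=float)
    for _ in range(n_steps):
        x -= step * witness_grad(x, y, bandwidth)
    return x
```

The first term of the witness gradient pushes particles apart and the second pulls them toward the target samples; the convergence conditions and the noise-based fix discussed in the abstract concern cases where these forces balance before the target is reached.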

* Marcel Hirt (UCL) --- Copula-like Variational Inference

(Joint work with Petros Dellaportas, Alain Durmus (ENS Cachan))

This paper considers a new family of variational distributions motivated by Sklar’s theorem. This family is based on new copula-like densities on the hypercube with non-uniform marginals, which can be sampled efficiently, i.e. with a complexity linear in the dimension $d$ of the state space. The proposed variational densities can then be seen as arising from these copula-like densities, used as base distributions on the hypercube, combined with Gaussian quantile functions and sparse rotation matrices as normalizing flows. The latter correspond to a rotation of the marginals with complexity $O(d \log d)$. We provide some empirical evidence that such a variational family can also approximate non-Gaussian posteriors and can be beneficial compared to Gaussian approximations. Our method performs largely comparably to state-of-the-art variational approximations on standard regression and classification benchmarks for Bayesian Neural Networks.

Time: 28 November 2019, 13:00-15:00 (Europe/London)

DeepMind/ELLIS CSML Seminar, 10 January 2020, 13:00

Catalina Cangea (University of Cambridge): Question Answering in Realistic Visual Environments: Challenges and Approaches

Location: Zoom

Link: Malet Place Engineering Building 1.03

Abstract: The Embodied Question Answering (EQA) and Interactive Question Answering (IQA) tasks were recently introduced as a means to study the capabilities of agents in rich, realistic 3D environments, requiring both navigation and reasoning to achieve success. Each of these skills typically needs a different approach, which should nevertheless be smoothly integrated with the rest of the system leveraged by the agent. However, initial approaches either suffer from potentially weaker performance than when using a language-only model or are preceded by additional hand-engineered steps. This talk will provide an overview of the existing work on this thread and describe in more detail our recent study (published at BMVC 2019, spotlight talk at ViGIL@NeurIPS 2019), VideoNavQA: Bridging the Gap between Visual and Embodied Question Answering. Here, we investigate the feasibility of EQA-type tasks by building a novel benchmark, which contains pairs of questions and videos generated in the House3D environment. While removing the navigation and action selection requirements from EQA, we increase the difficulty of the visual reasoning component via a much larger question space, tackling the sort of complex reasoning questions that make QA tasks challenging.
By designing and evaluating several VQA-style models on the dataset, we establish a novel way of evaluating EQA feasibility given existing methods, while highlighting the difficulty of the problem even in the most ideal setting.

Bio:

Cătălina Cangea is a second-year PhD student at the Department of Computer Science and Technology, University of Cambridge; her research focuses on multimodal, visual reasoning and relational learning tasks. She was Aaron Courville's intern at Mila last summer and an AI Resident at (Google) X, the moonshot factory, this summer. Her work has been presented at venues including the British Machine Vision Conference (BMVC), NeurIPS workshops (ViGIL, R2L) and ICLR workshops (RLGM, AISG). Before starting the PhD, Cătălina obtained her BA and MPhil degrees in Computer Science from the University of Cambridge.

Time: 10 January 2020, 13:00-14:00 (Europe/London)

DeepMind/ELLIS CSML Seminar, 27 February 2020, 13:00

Aude Genevay (MIT): Learning with entropy-regularized optimal transport

Location: Zoom

Link: Roberts 106

Abstract: Entropy-regularized OT (EOT) was first introduced by Cuturi in 2013 as a solution to the computational burden of OT for machine learning problems. In this talk, after studying the properties of EOT, we will introduce a new family of losses between probability measures called Sinkhorn Divergences. Based on EOT, this family of losses actually interpolates between OT (no regularization) and MMD (infinite regularization). We will illustrate these theoretical claims on a set of learning problems formulated as minimizations over the space of measures.
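A minimal sketch of a Sinkhorn divergence, assuming discrete measures, squared Euclidean ground cost and the debiased form $S_\varepsilon(\alpha,\beta) = \mathrm{OT}_\varepsilon(\alpha,\beta) - \tfrac12\mathrm{OT}_\varepsilon(\alpha,\alpha) - \tfrac12\mathrm{OT}_\varepsilon(\beta,\beta)$. Here $\mathrm{OT}_\varepsilon$ is the entropic OT value (transport cost plus $\varepsilon$ times the KL term), computed from the dual potentials of log-domain Sinkhorn iterations; some papers use the transport cost alone instead, so treat this choice as an assumption. Names and iteration counts are illustrative.

```python
import numpy as np

def _logsumexp(m, axis):
    # Numerically stable log(sum(exp(m))) along one axis.
    mx = m.max(axis=axis, keepdims=True)
    return (mx + np.log(np.exp(m - mx).sum(axis=axis, keepdims=True))).squeeze(axis)

def entropic_ot(a, x, b, y, eps, n_iter=500):
    """Entropic OT value between sum_i a_i delta_{x_i} and sum_j b_j delta_{y_j}.

    Runs log-domain Sinkhorn updates on the dual potentials (f, g) and returns
    <a, f> + <b, g>, which at convergence equals <P, C> + eps * KL(P | a x b).
    """
    C = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)  # squared Euclidean cost
    log_a, log_b = np.log(a), np.log(b)
    f, g = np.zeros(len(a)), np.zeros(len(b))
    for _ in range(n_iter):
        f = -eps * _logsumexp(log_b[None, :] + (g[None, :] - C) / eps, axis=1)
        g = -eps * _logsumexp(log_a[:, None] + (f[:, None] - C) / eps, axis=0)
    return float(a @ f + b @ g)

def sinkhorn_divergence(a, x, b, y, eps):
    """Debiased Sinkhorn divergence S_eps(alpha, beta); zero when alpha == beta."""
    return (entropic_ot(a, x, b, y, eps)
            - 0.5 * entropic_ot(a, x, a, x, eps)
            - 0.5 * entropic_ot(b, y, b, y, eps))
```

A handy sanity check with squared Euclidean cost: translating a measure by a vector $t$ gives a divergence of exactly $\|t\|^2$ for every $\varepsilon$, because the extra cost terms are separable and are absorbed by the dual potentials. Subtracting the two self-transport terms is what removes the entropic bias that makes $\mathrm{OT}_\varepsilon(\alpha,\alpha)$ nonzero.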

Bio: Aude Genevay is a postdoctoral researcher in the Geometric Data Processing group at MIT, working with Justin Solomon. Prior to that, she obtained a PhD in Mathematics from Ecole Normale Supérieure under the supervision of Gabriel Peyré.

Time: 27 February 2020, 13:00-14:00 (Europe/London)

DeepMind/ELLIS CSML Seminar, 9 April 2020, 13:00

Maurice Weiler (University of Amsterdam): Equivariant Neural Networks

Location: Zoom

Link: TBA

Abstract: TBA

Time: 9 April 2020, 13:00-14:00 (Europe/London)

DeepMind/ELLIS CSML Seminar, 27 November 2020, 14:00

Alexey Dosovitskiy (Google Brain): Non-convolutional architectures for recognition and generation

Location: Zoom

Link: Zoom

Slides:

https://www.dropbox.com/s/0s3sjpefn9e2pl9/Alexey_Dosovitskiy_Non_convolutional_architectures.pdf?dl=0

Convolutional networks are the workhorses of modern computer vision, thanks to their efficiency on hardware accelerators and the inductive biases suitable for processing and generating images. However, ConvNets distribute compute uniformly across the input, which makes them convenient to implement and train, but can be extremely computationally inefficient, especially on high-dimensional inputs such as video or 3D data. Moreover, representations extracted by ConvNets lack interpretability and systematic generalization. In this talk, I will present our recent work towards models that aim to avoid these shortcomings by respecting the sparse structure of the real world. On the image recognition front, we are investigating two directions: 1) architectures for learning object-centric representations either with or without supervision (Slot Attention); 2) large-scale non-convolutional models applied to real-world image recognition tasks (Vision Transformer). For image generation, we scale a recent implicit-3D-based neural rendering approach, Neural Radiance Fields, from controlled small-scale datasets to noisy large-scale real-world data (NeRF in the Wild).
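The first step of a Vision-Transformer-style model, turning an image into a sequence of tokens, is easy to show in isolation. A minimal sketch, assuming non-overlapping square patches and an (H, W, C) array layout; the linear projection, position embeddings and the Transformer itself are omitted, and the function name is hypothetical.

```python
import numpy as np

def image_to_patch_tokens(image, patch_size):
    """Split an (H, W, C) image into non-overlapping patches and flatten each one.

    Returns shape (num_patches, patch_size * patch_size * C), patches in
    row-major order; each row would then be linearly projected to the model width.
    """
    H, W, C = image.shape
    p = patch_size
    if H % p or W % p:
        raise ValueError("image height and width must be multiples of patch_size")
    # (H, W, C) -> (H/p, p, W/p, p, C) -> (H/p, W/p, p, p, C)
    patches = image.reshape(H // p, p, W // p, p, C).transpose(0, 2, 1, 3, 4)
    return patches.reshape(-1, p * p * C)
```

For a 224x224 RGB image and 16x16 patches this gives 196 tokens of length 768. Unlike a convolution, nothing after this step assumes locality, so the model must learn spatial structure from data, which is where the large-scale training mentioned in the abstract comes in.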

Join Zoom Meeting

https://ucl.zoom.us/j/97094846920?pwd=MlYvNVZTN2llM2dZZVRpRFh5a1JHZz09

https://ucl.zoom.us/s/99166798620

Abstract: I recently proposed the lottery ticket hypothesis: that the dense neural networks we typically train have much smaller subnetworks capable of reaching full accuracy from early in training. This hypothesis raises (1) scientific questions about the nature of overparameterization in neural network optimization and (2) practical questions about our ability to accelerate training. In this talk, I will discuss established results and the latest developments in my line of work on the lottery ticket hypothesis, including the empirical evidence for these claims on small vision tasks, changes necessary to scale these ideas to practical settings, and the relationship between these subnetworks and their “stability” to the noise of stochastic gradient descent. I will also describe my vision for the future of research on this topic.
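The basic procedure behind the hypothesis can be sketched for a single weight tensor: train, keep the largest-magnitude weights, and reset the survivors to their values from initialization (or, in variants, from early in training). The sketch below covers only the one-shot magnitude-pruning and reset steps, assuming a fixed pruning fraction; training is left out and the names are illustrative.

```python
import numpy as np

def lottery_ticket_mask(trained_weights, prune_fraction):
    """Binary mask keeping the (1 - prune_fraction) largest-magnitude trained weights."""
    flat = np.abs(trained_weights).ravel()
    n_prune = int(round(prune_fraction * flat.size))
    mask = np.ones(flat.size, dtype=bool)
    if n_prune > 0:
        mask[np.argsort(flat)[:n_prune]] = False  # drop the smallest |w|
    return mask.reshape(trained_weights.shape)

def rewind_to_init(initial_weights, mask):
    """Candidate subnetwork: surviving weights reset to their initial values, others zeroed."""
    return np.where(mask, initial_weights, 0.0)
```

The candidate is then retrained with the mask held fixed; the hypothesis is that this sparse subnetwork reaches the accuracy of the dense one, whereas the same mask applied to a fresh random initialization typically does not.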

Bio: Jonathan Frankle is a fifth year PhD student at MIT, where he empirically studies deep learning with Prof. Michael Carbin. His current research focus is on the properties of sparse networks that allow them to train effectively as embodied by his “Lottery Ticket Hypothesis” (ICLR 2019 best paper award). Jonathan also has an interest in technology policy: he has worked closely with lawyers, journalists, and policymakers on topics in AI policy and has taught at the Georgetown University Law Center. He earned his BSE and MSE in computer science at Princeton and has previously spent time at Google, Facebook, and Microsoft.

Time: 4 December 2020, 14:00-15:00 (Europe/London)

DeepMind/ELLIS CSML Seminar, 18 December 2020, 14:00

Luigi Gresele & Giancarlo Fissore (MPI for Intelligent Systems & Inria Paris-Saclay): Relative gradient optimization of the Jacobian term in unsupervised deep learning

Location: Zoom

Link: Zoom

Abstract: Learning expressive probabilistic models correctly describing the data is a ubiquitous problem in machine learning. A popular approach for solving it is mapping the observations into a representation space with a simple joint distribution, which can typically be written as a product of its marginals -- thus drawing a connection with the field of nonlinear independent component analysis. Deep density models have been widely used for this task, but their maximum likelihood based training requires estimating the log-determinant of the Jacobian and is computationally expensive, thus imposing a trade-off between computation and expressive power. In this work, we propose a new approach for exact training of such neural networks. Based on relative gradients, we exploit the matrix structure of neural network parameters to compute updates efficiently even in high-dimensional spaces; the computational cost of the training is quadratic in the input size, in contrast with the cubic scaling of naive approaches. This allows fast training with objective functions involving the log-determinant of the Jacobian, without imposing constraints on its structure.
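A minimal sketch of why relative gradients avoid the expensive term, assuming the simplest case of a single invertible linear layer $z = Wx$ with a standard Gaussian base density: the Euclidean gradient of the negative log-likelihood contains $W^{-\top}$, but right-multiplying it by $W^\top W$ turns it into $(zz^\top - N I)W$, which needs only matrix products. Deep networks with nonlinearities, as treated in the talk, are not covered, and the function name is illustrative.

```python
import numpy as np

def relative_gradient_linear_flow(W, x):
    """Relative gradient of the NLL of z = W x under a standard Gaussian base density.

    x has shape (D, N) (one column per sample); the NLL summed over the batch is
    0.5 * ||W x||_F^2 - N * log|det W| + const.
    Euclidean gradient: G = z x^T - N * W^{-T}   (needs an inverse, O(D^3))
    Relative gradient:  G W^T W = z z^T W - N * W (matrix products only)
    """
    z = W @ x
    # Evaluate z (z^T W) right-to-left: O(N D^2) per batch, no D x D inverse.
    return z @ (z.T @ W) - x.shape[1] * W
```

Updating with this direction, `W -= lr * relative_gradient_linear_flow(W, x)`, is still a descent direction because $W^\top W$ is positive definite for invertible $W$; the talk's contribution is making the same trick work through the layers of a deep network.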

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20201218T140000 DTEND;TZID=/Europe/London:20201218T150000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20210108170000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Sergey Levine DESCRIPTION:Sergey Levine (UC Berkeley): Data-Driven Reinforcement Learning: Deriving Common Sense from Past Experience\n\nLocation: Zoom\n\nLink: Zoom\n\nAbstract:https://ucl.zoom.us/j/99166798620

Reinforcement learning affords autonomous agents, such as robots, the ability to acquire behavioral skills through their own experience. However, a central challenge for machine learning systems deployed in real-world settings is generalization, and generalization has received comparatively less attention in recent research in reinforcement learning, with many methods focusing on optimization performance and relying on hand-designed simulators or closed-world domains such as games. In domains where generalization has been studied successfully -- computer vision, natural language processing, speech recognition, etc. -- good generalization invariably stems from access to large, diverse, and representative datasets. Put another way, data drives generalization. Can we transplant this lesson into the world of reinforcement learning? What does a data-driven reinforcement learning system look like, and what types of algorithmic and conceptual challenges must be overcome to devise such a system? In this talk, I will discuss how data-driven methods that utilize past experience can enable wider generalization for reinforcement learning agents, particularly as applied to challenging problems in robotic manipulation and navigation in open-world environments. I will show how robotic systems trained on large and diverse datasets can attain state-of-the-art results for robotic grasping, acquire a kind of "common sense" that allows them to generalize to new situations, learn flexible skills that allow users to set new goals at test time, and even enable a ground robot to navigate sidewalks in the city of Berkeley with an entirely end-to-end learned model.

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20210108T170000 DTEND;TZID=/Europe/London:20210108T180000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20210115160000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Jakob Foerster DESCRIPTION:Jakob Foerster (Facebook): Zero-Shot (Human-AI) Coordination (in Hanabi) and Ridge Rider\n\nLocation: Zoom\n\nLink: Zoom\n\nAbstract:https://ucl.zoom.us/j/99166798620

Abstract:

In recent years we have seen fast progress on a number of zero-sum benchmark problems in AI, e.g. Go, Poker and Dota. In contrast, success in the real world requires humans to collaborate and communicate with others, in settings that are, at least partially, cooperative. Recently, the card game Hanabi has been established as a new benchmark environment to fill this gap. In particular, Hanabi is interesting to humans since it is entirely focused on theory of mind, i.e., the ability to reason over the intentions, beliefs and point of view of other agents when observing their actions. This is particularly important in applications such as communication, assistive technologies and autonomous driving.

We start out by introducing the zero-shot coordination setting as a new frontier for multi-agent research, which is partially addressed by Other-Play, a novel learning algorithm which biases learning towards more human compatible policies.

Lastly, we introduce Ridge Rider, our brand-new algorithm, which addresses both zero-shot coordination and other optimization problems where the objective we care about cannot, by definition, be evaluated at training time.

Bio:

Jakob Foerster (Facebook AI Research / University of Toronto & Vector Institute (incoming))

Jakob Foerster received a CIFAR AI chair in 2019 and is starting as an Assistant Professor at the University of Toronto and the Vector Institute in the academic year 20/21. During his PhD at the University of Oxford, he helped bring deep multi-agent reinforcement learning to the forefront of AI research and interned at Google Brain, OpenAI, and DeepMind. He has since been working as a research scientist at Facebook AI Research in California, where he will continue advancing the field up to his move to Toronto. He was the lead organizer of the first Emergent Communication (EmeCom) workshop at NeurIPS in 2017, which he has helped organize ever since.

https://ucl.zoom.us/j/99166798620

Abstract:

Deep neural networks excel at function approximation, yet they are typically trained from scratch for each new function. On the other hand, Bayesian methods, such as Gaussian Processes (GPs), exploit prior knowledge to quickly infer the shape of a new function at test time. Yet GPs are computationally expensive, and it can be hard to design appropriate priors. We propose a family of neural models, Conditional Neural Processes (CNPs), that combine the benefits of both. CNPs are inspired by the flexibility of stochastic processes such as GPs, but are structured as neural networks and trained via gradient descent. CNPs make accurate predictions after observing only a handful of training data points, yet scale to complex functions and large datasets. In this talk we will introduce CNPs and their latent variable version ‘Neural Processes’ through the lens of meta-learning and discuss how they relate to a variety of existing models from this ML area.

Bio:

Marta is a senior research scientist at DeepMind, where she has primarily worked on deep generative models and meta-learning. In this context, she was involved in developing Generative Query Networks and led the work on Neural Processes. Recently, her research interests have expanded to include multi-agent systems and game theory.

We investigate gradient descent training of wide neural networks and the corresponding implicit bias in function space. For 1D regression, we show that the solution of training a width-$n$ shallow ReLU network is within $n^{-1/2}$ of the function which fits the training data and whose difference from initialization has the smallest 2-norm of its second derivative, weighted by $1/\zeta$. The curvature penalty function $1/\zeta$ is expressed in terms of the probability distribution that is used to initialize the network parameters, and we compute it explicitly for various common initialization procedures. For instance, asymmetric initialization with a uniform distribution yields a constant curvature penalty, and hence the solution function is the natural cubic spline interpolant of the training data. While similar results have been obtained in previous works, our analysis clarifies important details and allows us to obtain significant generalizations. In particular, the result generalizes to multivariate regression and different activation functions. Moreover, we show that the training trajectories are captured by trajectories of spatially adaptive smoothing splines with decreasing regularization strength. This is joint work with Hui Jin (UCLA).
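The limiting function in the constant-curvature-penalty case, the natural cubic spline through the training data, can be computed directly. The plain-Python sketch below assumes that the network output at initialization is zero (which is what an asymmetric initialization is typically used for), so the spline interpolates the data themselves; it computes the spline, not the trained network. It solves the standard tridiagonal system for the knot second derivatives $M_i$ with the natural boundary conditions $M_0 = M_n = 0$.

```python
from bisect import bisect_right


def natural_cubic_spline(xs, ys):
    """Return the natural cubic spline interpolant of (xs, ys); xs must be strictly increasing."""
    n = len(xs) - 1                      # number of intervals
    h = [xs[i + 1] - xs[i] for i in range(n)]
    M = [0.0] * (n + 1)                  # second derivatives at knots; natural: M[0] = M[n] = 0
    m = n - 1                            # number of interior unknowns M[1..n-1]
    if m > 0:
        # Row for knot i: h[i-1] M[i-1] + 2(h[i-1] + h[i]) M[i] + h[i] M[i+1] = rhs
        diag = [2.0 * (h[i - 1] + h[i]) for i in range(1, n)]
        rhs = [6.0 * ((ys[i + 1] - ys[i]) / h[i] - (ys[i] - ys[i - 1]) / h[i - 1])
               for i in range(1, n)]
        # Thomas algorithm, forward elimination: row k couples to row k-1 through h[k]
        for k in range(1, m):
            w = h[k] / diag[k - 1]
            diag[k] -= w * h[k]
            rhs[k] -= w * rhs[k - 1]
        # Back substitution: row k couples to row k+1 through h[k+1]
        sol = [0.0] * m
        sol[m - 1] = rhs[m - 1] / diag[m - 1]
        for k in range(m - 2, -1, -1):
            sol[k] = (rhs[k] - h[k + 1] * sol[k + 1]) / diag[k]
        M[1:n] = sol

    def spline(t):
        # Locate the interval [xs[i], xs[i+1]] containing t, clamped at the ends
        i = min(max(bisect_right(xs, t) - 1, 0), n - 1)
        a, b = xs[i + 1] - t, t - xs[i]
        return ((M[i] * a ** 3 + M[i + 1] * b ** 3) / (6.0 * h[i])
                + (ys[i] / h[i] - M[i] * h[i] / 6.0) * a
                + (ys[i + 1] / h[i] - M[i + 1] * h[i] / 6.0) * b)

    return spline


s = natural_cubic_spline([0.0, 1.0, 2.0], [0.0, 1.0, 0.0])
print(s(0.5))  # 0.6875
```

Among all twice-differentiable interpolants, this spline minimizes $\int f''(x)^2\,dx$, which matches the unweighted (constant $1/\zeta$) penalty described in the abstract.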

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20210129T140000 DTEND;TZID=/Europe/London:20210129T150000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20210205140000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Mihaela van der Schaar DESCRIPTION:Mihaela van der Schaar (University of Cambridge): Why medicine is creating exciting new frontiers for machine learning\n\nLocation: Zoom\n\nLink: Zoom\n\nAbstract:Link: https://ucl.zoom.us/j/99166798620

Meeting-ID: 99166798620

Abstract:

Medicine stands apart from other areas where machine learning can be applied. While we have seen advances in other fields with lots of data, it is not the volume of data that makes medicine so hard; it is the challenge of extracting actionable information from complex data. It is these challenges that make medicine the most exciting area for anyone who is really interested in the frontiers of machine learning – giving us real-world problems whose solutions are societally important and potentially impact us all. Think COVID-19! In this talk, I will show how machine learning is transforming medicine and how medicine is driving new advances in machine learning, including new methodologies in automated machine learning, interpretable and explainable machine learning, dynamic forecasting, and causal inference.

Bio:

Professor van der Schaar is John Humphrey Plummer Professor of Machine Learning, Artificial Intelligence and Medicine at the University of Cambridge and a Turing Faculty Fellow at The Alan Turing Institute in London, where she leads the effort on data science and machine learning for personalized medicine. She is also a Chancellor's Professor at UCLA. She was elected an IEEE Fellow in 2009. She has received numerous awards, including the Oon Prize on Preventative Medicine from the University of Cambridge (2018), an NSF Career Award (2004), three IBM Faculty Awards, the IBM Exploratory Stream Analytics Innovation Award, the Philips Make a Difference Award and several best paper awards, including the IEEE Darlington Award. She holds 35 granted US patents. In 2019, she was identified by the National Endowment for Science, Technology and the Arts as the female researcher based in the UK with the most publications in the field of AI. She was also named a 2019 'Star in Computer Networking and Communications'. Her research expertise spans signal and image processing, communication networks, network science, multimedia, game theory, distributed systems and machine learning. Her current research focus is on machine learning, AI and operations research for healthcare and medicine. For more details, see her website: http://www.vanderschaar-lab.com

Abstract: We show how to do gradient-based stochastic variational inference in stochastic differential equations (SDEs), in a way that allows the use of adaptive SDE solvers. This allows us to scalably fit a new family of richly-parameterized distributions over irregularly-sampled time series. We apply latent SDEs to motion capture data, and to demonstrate infinitely-deep Bayesian neural networks. We also discuss the pros and cons of this barely-explored model class, comparing it to Gaussian processes and neural processes.

Some technical details are in this paper: https://arxiv.org/abs/2001.01328

And code is available at: https://github.com/google-research/torchsde
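For readers new to SDEs, sample paths of a model such as $dX_t = \mu(t, X_t)\,dt + \sigma(t, X_t)\,dW_t$ are generated by numerical integration. The plain-Python sketch below uses the simplest fixed-step scheme, Euler-Maruyama. It is not the adaptive solver discussed in the talk and does not use the torchsde API. In a latent SDE the drift and diffusion would be learned networks; here they are a hypothetical Ornstein-Uhlenbeck process chosen for illustration.

```python
import math
import random


def euler_maruyama(drift, diffusion, x0, t0, t1, n_steps, rng):
    """Simulate one path of the scalar SDE dX = drift(t, X) dt + diffusion(t, X) dW."""
    dt = (t1 - t0) / n_steps
    t, x = t0, x0
    path = [x0]
    for _ in range(n_steps):
        dw = rng.gauss(0.0, math.sqrt(dt))  # Brownian increment ~ N(0, dt)
        x = x + drift(t, x) * dt + diffusion(t, x) * dw
        t += dt
        path.append(x)
    return path


# Hypothetical example: Ornstein-Uhlenbeck process dX = -theta * X dt + sigma dW
theta, sigma = 1.0, 0.3
path = euler_maruyama(lambda t, x: -theta * x, lambda t, x: sigma,
                      x0=1.0, t0=0.0, t1=1.0, n_steps=1000, rng=random.Random(0))
print(len(path), path[-1])
```

With `diffusion` set to zero the scheme reduces to explicit Euler for an ODE, which is a convenient sanity check.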

Bio: David Duvenaud is an assistant professor in computer science at the University of Toronto. His research focuses on continuous-time models, latent-variable models, and deep learning. He did his postdoc at Harvard University and his Ph.D. at the University of Cambridge. David also co-founded Invenia, an energy forecasting company.

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20210212T140000 DTEND;TZID=/Europe/London:20210212T150000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20210219170000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Chelsea Finn DESCRIPTION:Chelsea Finn (Stanford University): Principles for Tackling Distribution Shift: Pessimism, Adaptation, and Anticipation\n\nLocation: Zoom\n\nLink: https://ucl.zoom.us/j/99166798620\n\nAbstract: While we have seen immense progress in machine learning, a critical shortcoming of current methods lies in handling distribution shift between training and deployment. Distribution shift is pervasive in real-world problems, ranging from natural variation in the distribution over locations or domains, to shift in the distribution arising from different decision-making policies, to shifts over time as the world changes. In this talk, I’ll discuss three general principles for tackling these forms of distribution shift: pessimism, adaptation, and anticipation. I’ll present the most general form of each principle before describing concrete instantiations of each in practice. This will include a simple method for substantially improving robustness to spurious correlations, a framework for quickly adapting a model to a new user or domain with only unlabeled data, and an algorithm that enables robots to anticipate and adapt to shifts caused by other agents.

Bio: Chelsea Finn is an Assistant Professor in Computer Science and Electrical Engineering at Stanford University. Finn's research interests lie in the capability of robots and other agents to develop broadly intelligent behavior through learning and interaction. To this end, her work has included deep learning algorithms for concurrently learning visual perception and control in robotic manipulation skills, inverse reinforcement learning methods for scalable acquisition of nonlinear reward functions, and meta-learning algorithms that can enable fast, few-shot adaptation in both visual perception and deep reinforcement learning. Finn received her Bachelor's degree in Electrical Engineering and Computer Science from MIT and her PhD in Computer Science from UC Berkeley. Her research has been recognized through the ACM doctoral dissertation award, the Microsoft Research Faculty Fellowship, the C.V. Ramamoorthy Distinguished Research Award, and the MIT Technology Review 35 under 35 Award, and her work has been covered by various media outlets, including the New York Times, Wired, and Bloomberg. Throughout her career, she has sought to increase the representation of underrepresented minorities within CS and AI by developing an AI outreach camp at Berkeley for underprivileged high school students, a mentoring program for underrepresented undergraduates across four universities, and leading efforts within the WiML and Berkeley WiCSE communities of women researchers.

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20210219T170000 DTEND;TZID=/Europe/London:20210219T180000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20210226170000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Greg Yang DESCRIPTION:Greg Yang (Microsoft Research): Feature Learning in Infinite-Width Neural Networks\n\nLocation: Zoom\n\nLink: https://ucl.zoom.us/s/99166798620\n\nAbstract: As its width tends to infinity, a deep neural network's behavior under gradient descent can become simplified and predictable (e.g. given by the Neural Tangent Kernel (NTK)), if it is parametrized appropriately (e.g. the NTK parametrization). However, we show that the standard and NTK parametrizations of a neural network do not admit infinite-width limits that can learn representations (i.e. features), which is crucial for pretraining and transfer learning such as with BERT. We propose simple modifications to the standard parametrization to allow for feature learning in the limit. Using the *Tensor Programs* technique, we derive explicit formulas for such limits. On Word2Vec and few-shot learning on Omniglot via MAML, two canonical tasks that rely crucially on feature learning, we compute these limits exactly. We find that they outperform both NTK baselines and finite-width networks, with the latter approaching the infinite-width feature learning performance as width increases.

More generally, we classify a natural space of neural network parametrizations that generalizes standard, NTK, and Mean Field parametrizations. We show 1) any parametrization in this space either admits feature learning or has an infinite-width training dynamics given by kernel gradient descent, but not both; 2) any such infinite-width limit can be computed using the Tensor Programs technique.

This work is based on https://arxiv.org/abs/2011.14522.

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20210226T170000 DTEND;TZID=/Europe/London:20210226T180000 DTSTAMP;TZID=/Europe/London:20200301T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20211112160000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Mark Herbster DESCRIPTION:Mark Herbster (University College London): Online Multitask Learning with Long-Term Memory\n\nLocation: Zoom\n\nLink: https://ucl.zoom.us/j/92007296671\n\nAbstract: We introduce a novel online multitask setting. In this setting, each task is partitioned into a sequence of segments that is unknown to the learner. Associated with each segment is a hypothesis from some hypothesis class. We give algorithms that are designed to exploit the scenario where there are many such segments but significantly fewer associated hypotheses. We prove regret bounds that hold for any segmentation of the tasks and any association of hypotheses to the segments. In the single-task setting, this is equivalent to switching with long-term memory in the sense of Bousquet & Warmuth (2003). We provide an algorithm that predicts on each trial in time linear in the number of hypotheses when the hypothesis class is finite. We also consider infinite hypothesis classes from reproducing kernel Hilbert spaces, for which we give an algorithm whose per-trial time complexity is cubic in the number of cumulative trials.
In the single-task special case, this is the first example of an efficient regret-bounded switching algorithm with long-term memory for a non-parametric hypothesis class.\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20211112T160000 DTEND;TZID=/Europe/London:20211112T170000 DTSTAMP;TZID=/Europe/London:20211112T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20211203150000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Ting Chen DESCRIPTION:Ting Chen (Google Brain): Contrastive Self-Supervised Learning and Potential Limitations\n\nLocation: Zoom\n\nLink: https://ucl.zoom.us/j/96937626478\n\nAbstract: Contrastive learning has recently achieved impressive results in learning visual representations from images without human supervision. For example, SimCLR (and much subsequent work) is able to learn representations that rival or outperform supervised learning on ImageNet without using any labels. In this talk, I will cover a few topics on contrastive self-supervised learning, including an overview of a few basic contrastive methods, important factors in contrastive learning, simple approaches for semi-supervised learning (with lots of unlabeled images and a few labeled images), and some intriguing properties and potential limitations of existing contrastive learning methods.
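The contrastive objective popularized by SimCLR is usually written as the NT-Xent (normalized temperature-scaled cross-entropy) loss: each embedding must identify the embedding of its other augmented view among all other embeddings in the batch. The NumPy sketch below is a minimal version of that loss only; the encoder and augmentations that would produce `z1` and `z2` are omitted, and the function name and default temperature are illustrative assumptions.

```python
import numpy as np


def nt_xent(z1, z2, temperature=0.5):
    """NT-Xent loss for N positive pairs (z1[i], z2[i]); z1 and z2 have shape (N, d)."""
    n = z1.shape[0]
    z = np.concatenate([z1, z2], axis=0)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit vectors -> cosine similarity
    sim = z @ z.T / temperature                         # (2N, 2N) matrix of logits
    np.fill_diagonal(sim, -np.inf)                      # an embedding is never its own candidate
    # Row i's positive is the other view of the same example
    pos = np.concatenate([np.arange(n, 2 * n), np.arange(0, n)])
    # Numerically stable log-softmax over each row
    row_max = sim.max(axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.exp(sim - row_max).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos].mean()


rng = np.random.default_rng(0)
aligned = nt_xent(np.eye(4), np.eye(4))                  # views agree, examples orthogonal
random_pairs = nt_xent(rng.standard_normal((8, 16)), rng.standard_normal((8, 16)))
print(aligned, random_pairs)
```

Every other example in the batch acts as a negative, which is one reason batch size matters in this family of methods.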

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20211203T150000 DTEND;TZID=/Europe/London:20211203T160000 DTSTAMP;TZID=/Europe/London:20210101T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20211217100000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Makoto Yamada DESCRIPTION:Makoto Yamada (Kyoto University and RIKEN AIP center): Selective inference with Kernels\n\nLocation: Zoom\n\nLink: https://ucl.zoom.us/j/92007296671\n\nAbstract: Finding a set of statistically significant features from complex data (e.g., nonlinear and/or multi-dimensional output data) is important for scientific discovery and has many practical applications, including biomarker discovery. In this talk, I introduce kernel-based selective inference frameworks that can be used to find a set of statistically significant features from non-linearly related data without splitting the data for selection and inference. Specifically, I introduce selective variants of hypothesis tests based on post-selection inference: a two-sample test with the Maximum Mean Discrepancy (MMD), an independence test with the Hilbert-Schmidt Independence Criterion (HSIC), and a goodness-of-fit test with the Kernel Stein Discrepancy (KSD). For example, for the selective independence test, we propose the hsicInf algorithm, which can handle non-linearity and/or multivariate/multi-class outputs through kernels. Then, I show applications of kernel-based selective inference algorithms and discuss potential future work. The talk will be an overview of our recent ICML, NeurIPS, and AISTATS publications.
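As background for the MMD-based two-sample test, the NumPy sketch below computes the standard unbiased estimate of the squared MMD with a Gaussian kernel. It shows only the test statistic, with a hand-picked bandwidth; it does not implement the selective (post-selection) inference that is the subject of the talk, and the synthetic data are illustrative.

```python
import numpy as np


def gaussian_kernel(A, B, bandwidth=1.0):
    """k(a, b) = exp(-||a - b||^2 / (2 * bandwidth^2)) for all row pairs of A and B."""
    sq_dists = (A ** 2).sum(axis=1)[:, None] + (B ** 2).sum(axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * bandwidth ** 2))


def mmd2_unbiased(X, Y, bandwidth=1.0):
    """Unbiased estimate of MMD^2 between samples X (m x d) and Y (n x d)."""
    m, n = len(X), len(Y)
    Kxx = gaussian_kernel(X, X, bandwidth)
    Kyy = gaussian_kernel(Y, Y, bandwidth)
    Kxy = gaussian_kernel(X, Y, bandwidth)
    # Exclude the diagonal k(x_i, x_i) terms so the within-sample averages are unbiased
    term_xx = (Kxx.sum() - np.trace(Kxx)) / (m * (m - 1))
    term_yy = (Kyy.sum() - np.trace(Kyy)) / (n * (n - 1))
    return term_xx + term_yy - 2.0 * Kxy.mean()


rng = np.random.default_rng(0)
X = rng.standard_normal((200, 2))
Y_same = rng.standard_normal((200, 2))          # same distribution as X
Y_shift = rng.standard_normal((200, 2)) + 2.0   # mean shifted by 2 in each coordinate
print(mmd2_unbiased(X, Y_same), mmd2_unbiased(X, Y_shift))
```

The estimate fluctuates around zero when both samples come from the same distribution and is clearly positive under the shift; a test would compare it against a null distribution (for example, from permutations).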

\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20211217T100000 DTEND;TZID=/Europe/London:20211217T110000 DTSTAMP;TZID=/Europe/London:20211214T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20220114140000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Richard Samworth DESCRIPTION:Richard Samworth (Cambridge University): Optimal Subgroup Selection\n\nLocation: Zoom\n\nLink: https://ucl.zoom.us/j/93936841636\n\nAbstract: In clinical trials and other applications, we often see regions of the feature space that appear to exhibit interesting behaviour, but it is unclear whether these observed phenomena are reflected at the population level. Focusing on a regression setting, we consider the subgroup selection challenge of identifying a region of the feature space on which the regression function exceeds a pre-determined threshold. We formulate the problem as one of constrained optimisation, where we seek a low-complexity, data-dependent selection set on which, with a guaranteed probability, the regression function is uniformly at least as large as the threshold; subject to this constraint, we would like the region to contain as much mass under the marginal feature distribution as possible. This leads to a natural notion of regret, and our main contribution is to determine the minimax optimal rate for this regret in both the sample size and the Type I error probability. The rate involves a delicate interplay between parameters that control the smoothness of the regression function, as well as exponents that quantify the extent to which the optimal selection set at the population level can be approximated by families of well-behaved subsets. 
Finally, we expand the scope of our previous results by illustrating how they may be generalised to a treatment and control setting, where interest lies in the heterogeneous treatment effect.\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20220114T140000 DTEND;TZID=/Europe/London:20220114T150000 DTSTAMP;TZID=/Europe/London:20211223T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20220204140000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Stéphanie Allassonnière DESCRIPTION:Stéphanie Allassonnière (Paris Descartes University): Data Augmentation in High Dimensional Low Sample Size Setting Using a Geometry-Based Variational Autoencoder\n\nLocation: Zoom\n\nLink: https://ucl.zoom.us/j/97245943682\n\nAbstract: In this presentation, we propose a new method to reliably perform data augmentation in the High Dimensional Low Sample Size (HDLSS) setting using a geometry-based variational autoencoder. Our approach combines a proper modeling of the VAE latent space, seen as a Riemannian manifold, with a new generation scheme that produces more meaningful samples, especially in the context of small data sets. The proposed method is tested through a wide experimental study that stresses its robustness to data sets, classifiers and training sample size. It is also validated on a medical imaging classification task on the challenging ADNI database, where a small number of 3D brain MRIs are considered and augmented using the proposed VAE framework. In each case, the proposed method allows for a significant and reliable gain in the classification metrics.
For instance, balanced accuracy jumps from 66.3% to 74.3% for a state-of-the-art CNN classifier trained with 50 MRIs of cognitively normal (CN) and 50 Alzheimer's disease (AD) patients, and from 77.7% to 86.3% when trained with 243 CN and 210 AD patients, while greatly improving sensitivity and specificity.\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20220204T140000 DTEND;TZID=/Europe/London:20220204T150000 DTSTAMP;TZID=/Europe/London:20220110T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20220225170000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Hossein Mobahi DESCRIPTION:Hossein Mobahi (Google): Sharpness-Aware Minimization (SAM): Current Method and Future Directions\n\nLocation: Zoom\n\nLink: https://ucl.zoom.us/j/96768812815\n\nAbstract: In today’s heavily overparameterized models, the value of the training loss provides few guarantees on model generalization ability. Indeed, optimizing only the training loss value, as is commonly done, can easily lead to suboptimal model quality. Motivated by prior work connecting the geometry of the loss landscape and generalization, we introduce a new and effective procedure for simultaneously minimizing loss value and loss sharpness. In particular, our procedure, Sharpness-Aware Minimization (SAM), seeks parameters that lie in neighborhoods having uniformly low loss; this formulation results in a min-max optimization problem on which gradient descent can be performed efficiently. We present empirical results showing that SAM improves model generalization across a variety of benchmark datasets (e.g., CIFAR-10, CIFAR-100, ImageNet, finetuning tasks) and models, yielding novel state-of-the-art performance for several. Additionally, we find that SAM natively provides robustness to label noise on par with that provided by state-of-the-art procedures that specifically target learning with noisy labels.
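The SAM step described in this abstract can be sketched in a few lines. First take an ascent step of radius $\rho$ along the normalized gradient, a first-order approximation of the worst-case weights in an L2 neighborhood; then apply, at the original weights, the gradient evaluated at that perturbed point. The NumPy sketch below follows that recipe; the toy quadratic loss, learning rate and $\rho$ are illustrative assumptions, not settings from the talk.

```python
import numpy as np


def sam_step(w, grad_fn, lr=0.05, rho=0.05):
    """One Sharpness-Aware Minimization step with an L2 neighborhood of radius rho."""
    g = grad_fn(w)
    eps = rho * g / (np.linalg.norm(g) + 1e-12)   # approximate worst-case perturbation
    g_sam = grad_fn(w + eps)                      # gradient at the perturbed weights
    return w - lr * g_sam                         # ...applied at the original weights


# Toy quadratic loss L(w) = 0.5 * w^T A w with one flat and one sharp direction
A = np.diag([1.0, 10.0])


def loss(w):
    return 0.5 * w @ A @ w


def grad(w):
    return A @ w


w = np.array([1.0, 1.0])
for _ in range(200):
    w = sam_step(w, grad)
print(loss(np.array([1.0, 1.0])), loss(w))
```

Each step costs two gradient evaluations instead of one, which is the main computational overhead of SAM relative to plain gradient descent.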
Finally, we will discuss possible directions for further research around SAM.\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20220225T170000 DTEND;TZID=/Europe/London:20220225T180000 DTSTAMP;TZID=/Europe/London:20220209T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20220311100000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Emtiyaz Khan DESCRIPTION:Emtiyaz Khan (Tokyo RIKEN center for Advanced Intelligence Project (AIP)): The Bayesian Learning Rule for Adaptive AI\n\nLocation: Zoom\n\nLink: https://ucl.zoom.us/j/97245943682\n\nAbstract: Humans and animals have a natural ability to autonomously learn and quickly adapt to their surroundings. How can we design AI systems that do the same? In this talk, I will present Bayesian principles to bridge such gaps between humans and AI. I will show that a wide variety of machine-learning algorithms are instances of a single learning rule called the Bayesian learning rule. The rule reveals a dual perspective, yielding new adaptive mechanisms for machine-learning-based AI systems. My hope is to convince the audience that Bayesian principles are indispensable for an AI that learns as efficiently as we do.\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20220311T100000 DTEND;TZID=/Europe/London:20220311T110000 DTSTAMP;TZID=/Europe/London:20211222T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20220318140000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Alexandre Gramfort DESCRIPTION:Alexandre Gramfort (Inria Parietal Team): Machine Learning without human supervision on neuroscience signals\n\nLocation: Zoom\n\nLink: https://ucl.zoom.us/j/97245943682\n\nAbstract: The revolution of artificial intelligence over the last decade has been made possible by statistical machine learning, and in particular by supervised learning, where algorithms are given the labels associated with each observation.
Although very efficient, this approach faces several difficulties in neuroscience, and more broadly in medicine: one needs enough labels, one needs good labels, and sometimes simply defining what the labels are is problematic. While pure unsupervised learning can be tempting, it leads to other kinds of difficulties, namely model selection, validation and interpretation of results, which are often challenging beyond computer vision and natural language processing. In my presentation, I will discuss recent strategies my team has explored to bring AI and neuroscience together by leveraging large EEG and fMRI datasets without relying on tedious or costly human annotations. I will first present how self-supervised learning can reveal structure in EEG data [1], before explaining how fMRI and pretrained language models can help us decipher language processing in the brain [2, 3]. Finally, I will present how old ideas from latent factor models with independence assumptions can help us make sense of neuroimaging data collected when subjects are exposed to uncontrolled naturalistic stimuli [4, 5]. References: [1] Banville, H., Chehab, O., Hyvärinen, A., Engemann, D., Gramfort, A. (2020), Uncovering the structure of clinical EEG signals with self-supervised learning, J. Neural Engineering. [2] Caucheteux, C., Gramfort, A., King, J.-R. (2021), Model-based analysis of brain activity reveals the hierarchy of language in 305 subjects, Proc. EMNLP Findings. [3] Caucheteux, C., Gramfort, A., King, J.-R. (2021), Disentangling Syntax and Semantics in the Brain with Deep Networks, Proc. ICML. [4] Richard, H., Gresele, L., Hyvärinen, A., Thirion, B., Gramfort, A., Ablin, P. (2020), Modeling Shared Responses in Neuroimaging Studies through MultiView ICA, Proc. NeurIPS. [5] Richard, H., Ablin, P., Thirion, B., Gramfort, A., Hyvärinen, A. (2021), Shared Independent Component Analysis for Multi-Subject Neuroimaging, Proc.
NeurIPS\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20220318T140000 DTEND;TZID=/Europe/London:20220318T150000 DTSTAMP;TZID=/Europe/London:20211226T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20220325160000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Stefano Ermon DESCRIPTION:Stefano Ermon (Stanford University): Utilitarian Information Theory\n\nLocation: Zoom\n\nLink: https://ucl.zoom.us/j/97245943682\n\nAbstract: Shannon’s information theory, which lies at the foundation of AI and machine learning, provides a conceptual framework to characterize information in a mathematically rigorous sense. However, important computational aspects are not considered, as it does not account for how much information can actually be used by a computationally bounded decision maker. This limits its utility in several practical real-world scenarios. I will discuss generalizations of Shannon’s entropy, information and related divergences that account for how information will be used by a (computationally bounded) decision maker, as well as their applications in representation learning, structure learning, Bayesian optimization and fairness, among others.\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20220325T160000 DTEND;TZID=/Europe/London:20220325T170000 DTSTAMP;TZID=/Europe/London:20220201T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20220401140000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Julien Mairal DESCRIPTION:Julien Mairal (Inria Grenoble, Thoth team): Lucas-Kanade Reloaded: End-to-End Super-Resolution from Raw Image Bursts\n\nLocation: Zoom\n\nLink: https://ucl.zoom.us/j/97245943682\n\nAbstract: This presentation addresses the problem of reconstructing a high-resolution image from multiple lower-resolution snapshots captured from slightly different viewpoints in space and time.
Key challenges for solving this super-resolution problem include (i) aligning the input pictures with sub-pixel accuracy, (ii) handling raw (noisy) images for maximal faithfulness to native camera data, and (iii) designing and learning an image prior (regularizer) well suited to the task. We address these three challenges with a hybrid algorithm building on the insight that aliasing is an ally in this setting, with parameters that can be learned end to end, while retaining the interpretability of classical approaches to inverse problems. The effectiveness of our approach is demonstrated on synthetic and real image bursts, setting a new state of the art on several benchmarks and delivering excellent qualitative results on real raw bursts captured by smartphones and prosumer cameras. This is joint work with Bruno Lecouat and Jean Ponce.\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20220401T140000 DTEND;TZID=/Europe/London:20220401T150000 DTSTAMP;TZID=/Europe/London:20220215T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20220408170000@ucl-ellis.github.io LOCATION:Zoom SUMMARY:DeepMind/ELLIS CSML Seminar: Manfred Warmuth DESCRIPTION:Manfred Warmuth (Google Brain (formerly University of California at Santa Cruz)): The blessing and the curse of the multiplicative updates - discusses connections between evolution and the multiplicative updates of online learning\n\nLocation: Zoom\n\nLink: https://ucl.zoom.us/j/97245943682\n\nAbstract: Multiplicative updates multiply the parameters by nonnegative factors. These updates are motivated by a Maximum Entropy Principle, and they are prevalent in evolutionary processes, where the parameters are, for example, concentrations of species and the factors are survival rates. The simplest such update is Bayes' rule, and we give an in vitro selection algorithm for RNA strands that implements this rule in the test tube, where each RNA strand represents a different model.
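To make the connection concrete, Bayes' rule can be written as a multiplicative update: each model's weight is multiplied by a nonnegative factor (its likelihood on the new observation) and the weights are renormalized. The plain-Python sketch below also shows the "curse" of such updates, and one standard remedy from online learning, Fixed Share (Herbster and Warmuth, 1998), which mixes a small fraction of uniform weight back in after every update. The two-model data stream and the factor values are hypothetical, and Fixed Share is only one example of such a remedy, not necessarily one of the methods presented in the talk.

```python
def multiplicative_update(weights, factors, share=0.0):
    """Bayes' rule as a multiplicative update, with optional Fixed-Share mixing."""
    unnorm = [w * f for w, f in zip(weights, factors)]   # multiply by nonnegative factors
    z = sum(unnorm)
    posterior = [u / z for u in unnorm]                  # renormalize
    if share > 0.0:
        # Fixed Share: redistribute a fraction `share` uniformly so that
        # no weight is ever driven all the way to zero
        k = len(posterior)
        posterior = [(1.0 - share) * p + share / k for p in posterior]
    return posterior


def run(rounds_a, rounds_b, share):
    """Hypothetical stream: model 0 explains the first phase, model 1 the second."""
    w = [0.5, 0.5]
    for _ in range(rounds_a):
        w = multiplicative_update(w, [0.9, 0.1], share)
    for _ in range(rounds_b):
        w = multiplicative_update(w, [0.1, 0.9], share)
    return w


print(run(50, 10, share=0.0))   # model 1 is still negligible after the switch
print(run(50, 10, share=0.01))  # with sharing, model 1 recovers within a few rounds
```

Without sharing, the weight of the model that was poor in the first phase shrinks exponentially and needs about as many rounds to recover as it spent losing; the small uniform mixture caps how far any weight can fall.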
In one liter of the RNA soup there are approximately 10^15 different strands and therefore this is a rather high-dimensional implementation of Bayes rule. We investigate multiplicative updates for the purpose of learning online while processing a stream of examples. The "blessing" of these updates is that they learn very fast in the short term because the good parameters grow exponentially. However, their "curse" is that they learn too fast and wipe out parameters too quickly. This can have a negative effect in the long term. We describe a number of methods developed in the realm of online learning that ameliorate the curse of the multiplicative updates. The methods make the algorithm robust against data that changes over time and prevent the currently good parameters from taking over. We also discuss how the curse is circumvented by nature. Surprisingly, some of nature's methods parallel the ones developed in machine learning, but nature also has some additional tricks.\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20220408T170000 DTEND;TZID=/Europe/London:20220408T180000 DTSTAMP;TZID=/Europe/London:20220422T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20220429120000@ucl-ellis.github.io LOCATION:Ground Floor Lecture Theatre UCL Gatsby Computational Neuroscience Unit 25 Howland St London W1T 4JG SUMMARY:DeepMind/ELLIS CSML Seminar: Siu Lun (Alan) Chau DESCRIPTION:Siu Lun (Alan) Chau (Oxford University): Explaining Kernel Methods with RKHS-SHAP\n\nLocation: Ground Floor Lecture Theatre, UCL Gatsby Computational Neuroscience Unit, 25 Howland St, London W1T 4JG\n\nLink: https://ucl.zoom.us/j/97245943682\n\nAbstract: Feature attribution for kernel methods is often heuristic and not individualised for each prediction. To address this, we turn to the concept of Shapley values, a coalitional game-theoretic framework that has previously been applied to different machine learning model interpretation tasks.
By analysing Shapley values from a functional perspective, we propose RKHS-SHAP, an attribution method for kernel machines that can efficiently compute both Interventional and Observational Shapley values using kernel mean embeddings of distributions. In this talk, we will start by introducing Shapley values and how they are used to interpret models such as linear models, trees and deep nets, and finally we will present RKHS-SHAP as the latest member of this family of model-specific SHAP methods.\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20220429T120000 DTEND;TZID=/Europe/London:20220429T130000 DTSTAMP;TZID=/Europe/London:20220429T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20220506120000@ucl-ellis.github.io LOCATION:Function Space UCL Centre for Artificial Intelligence 1st Floor 90 High Holborn London WC1V 6BH SUMMARY:DeepMind/ELLIS CSML Seminar: Alexander Terenin DESCRIPTION:Alexander Terenin (Cambridge University): Non-Euclidean Matérn Gaussian Processes\n\nLocation: Function Space, UCL Centre for Artificial Intelligence, 1st Floor, 90 High Holborn, London WC1V 6BH\n\nLink: https://ucl.zoom.us/j/97245943682\n\nAbstract: In recent years, the machine learning community has become increasingly interested in learning in settings where data lives in non-Euclidean spaces, for instance in applications to physics and engineering, or other settings where it is important that symmetries are enforced. In this talk, we will develop a class of Gaussian process models defined on Riemannian manifolds and graphs, and show how to effectively perform all computations needed to train these models using standard automatic-differentiation-based methods. This gives an effective framework to deploy data-efficient interactive decision-making systems such as Bayesian optimization to settings with symmetries and invariances.\n\nBiography: Alexander Terenin is a Postdoctoral Research Associate at the University of Cambridge.
He is interested in statistical machine learning, particularly in settings where the data is not fixed, but is gathered interactively by the learning machine. This leads naturally to Gaussian processes and data-efficient interactive decision-making systems such as Bayesian optimization, to areas such as multi-armed bandits and reinforcement learning, and to techniques for incorporating inductive biases and prior information such as symmetries into machine learning models.\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20220506T120000 DTEND;TZID=/Europe/London:20220506T130000 DTSTAMP;TZID=/Europe/London:20220215T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20220513120000@ucl-ellis.github.io LOCATION:G08 Sir David Davies LT UCL Roberts Building London WC1E 7JE SUMMARY:DeepMind/ELLIS CSML Seminar: Harita Dellaporta DESCRIPTION:Harita Dellaporta (University of Warwick): Robust Bayesian Inference for Simulator-based Models via the MMD Posterior Bootstrap\n\nLocation: G08 Sir David Davies LT, UCL Roberts Building, London WC1E 7JE\n\nLink: https://ucl.zoom.us/j/97245943682\n\nAbstract: Simulator-based models are models for which the likelihood is intractable but simulation of synthetic data is possible. They are often used to describe complex real-world phenomena, and as such can often be misspecified in practice. In this talk, I will present a novel algorithm based on the posterior bootstrap and maximum mean discrepancy estimators. This leads to a highly-parallelisable Bayesian inference algorithm with strong robustness properties. This is demonstrated through an in-depth theoretical study which includes generalisation bounds, frequentist consistency and robustness of our posterior guarantees. 
The approach is then illustrated on a range of examples including a g-and-k distribution and a toggle-switch model.\n\nBiography: Harita is a second-year PhD student at the Warwick CDT in Mathematics & Statistics under the supervision of Prof. Theo Damoulas. Prior to this, Harita obtained an MSc in Computational Statistics & Machine Learning from UCL. Her research focuses on generalised notions of Bayesian inference with emphasis on robustness and model misspecification.\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20220513T120000 DTEND;TZID=/Europe/London:20220513T130000 DTSTAMP;TZID=/Europe/London:20220110T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20220527120000@ucl-ellis.github.io LOCATION:Function Space UCL Centre for Artificial Intelligence 1st Floor 90 High Holborn London WC1V 6BH SUMMARY:DeepMind/ELLIS CSML Seminar: Daniel Paulin DESCRIPTION:Daniel Paulin (University of Edinburgh): Efficient MCMC Sampling with Dimension-Free Convergence Rate using ADMM-type Splitting\n\nLocation: Function Space, UCL Centre for Artificial Intelligence, 1st Floor, 90 High Holborn, London WC1V 6BH\n\nLink: https://ucl.zoom.us/j/97245943682\n\nAbstract: Performing exact Bayesian inference for complex models is computationally intractable. Markov chain Monte Carlo (MCMC) algorithms can provide reliable approximations of the posterior distribution but are expensive for large data sets and high-dimensional models. A standard approach to mitigate this complexity consists in using subsampling techniques or distributing the data across a cluster. However, these approaches are typically unreliable in high-dimensional scenarios. We focus here on a recent alternative class of MCMC schemes exploiting a splitting strategy akin to the one used by the celebrated alternating direction method of multipliers (ADMM) optimization algorithm. 
These methods appear to provide state-of-the-art empirical performance, but their theoretical behaviour in high dimensions is currently unknown. In this work, we propose a detailed theoretical study of one of these algorithms, known as the split Gibbs sampler. Under regularity conditions, we establish explicit convergence rates for this scheme using Ricci curvature and coupling ideas. We support our theory with numerical illustrations. This is joint work with Maxime Vono (Criteo AI Lab) and Arnaud Doucet (Oxford).\n\nBiography: Daniel Paulin obtained his PhD in mathematics at the National University of Singapore in 2014. He then held postdoctoral positions at NUS, working with Alexandre Thiery and Ajay Jasra, and at Oxford, working with Arnaud Doucet and George Deligiannidis. Since 2019, he has been a Lecturer at the University of Edinburgh. His research interests are mainly in applied probability, computational statistics, and optimization.\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20220527T120000 DTEND;TZID=/Europe/London:20220527T130000 DTSTAMP;TZID=/Europe/London:20220114T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20220610120000@ucl-ellis.github.io LOCATION:Function Space UCL Centre for Artificial Intelligence 1st Floor 90 High Holborn London WC1V 6BH SUMMARY:DeepMind/ELLIS CSML Seminar: Badr-Eddine Chérief-Abdellatif DESCRIPTION:Badr-Eddine Chérief-Abdellatif (University of Oxford): Robust Estimation via Maximum Mean Discrepancy\n\nLocation: Function Space, UCL Centre for Artificial Intelligence, 1st Floor, 90 High Holborn, London WC1V 6BH\n\nLink: https://ucl.zoom.us/j/97245943682\n\nAbstract: In this talk, we will study the properties of a minimum distance estimator based on the Maximum Mean Discrepancy (MMD). We will show that this estimator is universal in the i.i.d.
setting: even in the case of misspecification, it converges to the best approximation of the data-generating process in the model, without any assumption on this model. We will also show that these results remain valid when the data are not independent, but rather satisfy a weak-dependence condition. This condition is based on a new dependence coefficient, which is itself defined using the MMD. We will argue with examples that this new notion of dependence is in fact quite general.\n\nBiography: Badr-Eddine Chérief-Abdellatif currently holds a postdoctoral research position in the Department of Statistics at the University of Oxford, working with Arnaud Doucet. Prior to that, he received a PhD in statistics from Institut Polytechnique de Paris, prepared at CREST (Center for Research in Economics and Statistics), Paris, under the supervision of Pierre Alquier, currently a research scientist at RIKEN AIP in Tokyo. His research covers the fundamental aspects of statistics and machine learning, with a particular focus on the development of tractable and efficient learning methods, and on understanding their statistical properties and their ability to generalize.
He is particularly interested in variational inference and in PAC-Bayes theory, and more generally in robust statistics, high-dimensional statistics, online learning and optimization.\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20220610T120000 DTEND;TZID=/Europe/London:20220610T130000 DTSTAMP;TZID=/Europe/London:20220210T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20220617120000@ucl-ellis.github.io LOCATION:Function Space UCL Centre for Artificial Intelligence 1st Floor 90 High Holborn London WC1V 6BH SUMMARY:DeepMind/ELLIS CSML Seminar: Maurice Weiler DESCRIPTION:Maurice Weiler (University of Amsterdam): Equivariant and coordinate independent convolutional networks\n\nLocation: Function Space, UCL Centre for Artificial Intelligence, 1st Floor, 90 High Holborn, London WC1V 6BH\n\nLink: https://ucl.zoom.us/j/97245943682\n\nAbstract: The classical convolutional network architecture can be derived solely from the requirement for translational equivariance. Steerable CNNs generalize this idea to affine symmetry groups, resulting in network architectures that are equivariant under additional symmetries of Euclidean spaces, including for instance rotations, reflections, scaling or shearing. In the second part of this talk, we take an alternative viewpoint, considering passive gauge transformations of the labeling/coordinatization of data instead of active transformations of the data itself. This viewpoint allows us to generalize convolutions to Riemannian manifolds, which do not admit a canonical choice of reference frames (gauges) and thus require gauge equivariant convolution kernels. Although such coordinate independent convolutions are designed only to be locally gauge equivariant, we show that they are in fact equivariant w.r.t. the isometries of the manifold.\n\nBiography: Maurice Weiler is a machine learning researcher with a focus on geometric and equivariant deep learning.
He studied computational and theoretical physics at Heidelberg University and is now a fourth-year PhD student with Max Welling at AMLab, University of Amsterdam.\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20220617T120000 DTEND;TZID=/Europe/London:20220617T130000 DTSTAMP;TZID=/Europe/London:20220428T000000 END:VEVENT BEGIN:VEVENT UID:seminar-20220624120000@ucl-ellis.github.io LOCATION:Function Space UCL Centre for Artificial Intelligence 1st Floor 90 High Holborn London WC1V 6BH SUMMARY:DeepMind/ELLIS CSML Seminar: Rebecca Lewis DESCRIPTION:Rebecca Lewis (Imperial College London): Inference in high-dimensional logistic regression models with separated data\n\nLocation: Function Space, UCL Centre for Artificial Intelligence, 1st Floor, 90 High Holborn, London WC1V 6BH\n\nLink: https://ucl.zoom.us/j/97245943682\n\nAbstract: Existence of the maximum likelihood estimate of logistic regression coefficients requires that the observed sequence of covariate and response values is not linearly separable. Even when the maximum likelihood estimator exists, it can suffer from considerable bias when the number of independent observations is not large relative to the dimension. We propose an alternative approach to inference on the logistic regression coefficients based on a corrected ordinary least squares estimator. Consistency and asymptotic normality of this estimator are established under a high-dimensional regime in which the number p of covariates and the sample size n both tend to infinity with p < n, under weak conditions on the design matrix. Validity of Wald-based inference through this route is thereby established, even when maximum likelihood is infeasible.\n\nBiography: Rebecca is a third-year PhD student working in the area of high-dimensional statistics under the supervision of Dr Heather Battey.
Her research focuses on the construction of a confidence set of models, a set that includes a small number of models that fit the data essentially equally well, and other topics derived from this, including the one given in this talk.\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20220624T120000 DTEND;TZID=/Europe/London:20220624T130000 DTSTAMP;TZID=/Europe/London: END:VEVENT BEGIN:VEVENT UID:seminar-20220826120000@ucl-ellis.github.io LOCATION:Function Space UCL Centre for Artificial Intelligence 1st Floor 90 High Holborn London WC1V 6BH SUMMARY:DeepMind/ELLIS CSML Seminar: Tom Everitt DESCRIPTION:Tom Everitt (DeepMind): Causal Foundations for Safe AI\n\nLocation: Function Space, UCL Centre for Artificial Intelligence, 1st Floor, 90 High Holborn, London WC1V 6BH\n\nLink: https://ucl.zoom.us/j/91411912764\n\nAbstract: With great power comes great responsibility. Human-level+ artificial general intelligence (AGI) may become humanity’s best friend or worst enemy, depending on whether we manage to align its behavior with human interests or not. To overcome this challenge, we must identify the potential pitfalls and develop effective mitigation strategies. In this talk, I’ll argue that (Pearlian) causality offers a useful formal framework for reasoning about AI risk, and describe some of our recent work on this topic. In particular, I’ll cover causal definitions of incentives, agents, side effects, generalization, and preference manipulation, and discuss how techniques like recursion, interpretability, impact measures, incentive design, and path-specific effects can combine to address AGI risks.\n\nBiography: Tom Everitt is a senior researcher at DeepMind, leading a small team on causal approaches to AGI safety. He holds a PhD from Australian National University, where he wrote the first PhD thesis fully focused on AGI safety under the supervision of Prof. 
Marcus Hutter.\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20220826T120000 DTEND;TZID=/Europe/London:20220826T130000 DTSTAMP;TZID=/Europe/London: END:VEVENT BEGIN:VEVENT UID:seminar-20221007120000@ucl-ellis.github.io LOCATION:Function Space UCL Centre for Artificial Intelligence 1st Floor 90 High Holborn London WC1V 6BH SUMMARY:DeepMind/ELLIS CSML Seminar: Joshua David Robinson DESCRIPTION:Joshua David Robinson (MIT): Sign and Basis Invariant Networks for Spectral Graph Representation Learning\n\nLocation: Function Space, UCL Centre for Artificial Intelligence, 1st Floor, 90 High Holborn, London WC1V 6BH\n\nLink: https://ucl.zoom.us/j/97245943682\n\nAbstract: Eigenvectors computed from data arise in various scenarios, including principal component analysis and matrix factorizations. Another key example is the eigenvectors of the graph Laplacian, which encode information about the structure of a graph or manifold. An important recent application of Laplacian eigenvectors is to graph positional encodings, which have been used to develop more powerful graph architectures. However, eigenvectors have symmetries that should be respected by models taking eigenvector inputs: (i) sign flips, since if v is an eigenvector then so is -v; and (ii) more general basis symmetries, which occur in higher-dimensional eigenspaces with infinitely many choices of basis eigenvectors. We introduce SignNet and BasisNet, new neural network architectures that are sign and basis invariant. We prove that our networks are universal, i.e., they can approximate any continuous function of eigenvectors with the desired invariances. Moreover, when used with Laplacian eigenvectors, our architectures are provably expressive for graph representation learning: they can approximate, and go beyond, any spectral graph convolution, and can compute spectral invariants that go beyond message passing neural networks.
Experiments show the strength of our networks for molecular graph regression, learning expressive graph representations, and more.\n\nBiography: Josh Robinson is a fifth-year PhD student at MIT CSAIL working with Stefanie Jegelka and Suvrit Sra. His interests include self-supervised learning and models such as graph neural networks that operate on discrete structured data.\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20221007T120000 DTEND;TZID=/Europe/London:20221007T130000 DTSTAMP;TZID=/Europe/London: END:VEVENT BEGIN:VEVENT UID:seminar-20221014120000@ucl-ellis.github.io LOCATION:Function Space UCL Centre for Artificial Intelligence 1st Floor 90 High Holborn London WC1V 6BH SUMMARY:DeepMind/ELLIS CSML Seminar: Wenkai Xu DESCRIPTION:Wenkai Xu (University of Oxford): AgraSSt: Approximate Graph Stein Statistics for Interpretable Assessment of Implicit Graph Generators\n\nLocation: Function Space, UCL Centre for Artificial Intelligence, 1st Floor, 90 High Holborn, London WC1V 6BH\n\nLink: https://ucl.zoom.us/j/97245943682\n\nAbstract: We propose and analyse a novel statistical procedure, coined AgraSSt, to assess the quality of graph generators which may not be available in explicit form. In particular, AgraSSt can be used to determine whether a learned graph generating process is capable of generating graphs which resemble a given input graph. Inspired by Stein operators for random graphs, the key idea of AgraSSt is the construction of a kernel discrepancy based on an operator obtained from the graph generator. AgraSSt can provide interpretable criticisms for a graph generator training procedure and help identify reliable sample batches for downstream tasks. We give theoretical guarantees for a broad class of random graph models.
We provide empirical results on both synthetic input graphs with known graph generation procedures and real-world input graphs on which state-of-the-art (deep) generative models for graphs are trained.\n\nBiography: Wenkai is a postdoctoral research associate in statistics at the Department of Statistics, University of Oxford. His research interests include Stein’s method, kernel methods, hypothesis testing and statistical inference beyond Euclidean data. His current work focuses on characterising and assessing random graph models and deep generative models via Stein’s method. He completed his Ph.D. in the Gatsby Computational Neuroscience Unit under the supervision of Prof. Arthur Gretton and Prof. Aapo Hyvarinen.\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20221014T120000 DTEND;TZID=/Europe/London:20221014T130000 DTSTAMP;TZID=/Europe/London: END:VEVENT BEGIN:VEVENT UID:seminar-20221021120000@ucl-ellis.github.io LOCATION:Function Space UCL Centre for Artificial Intelligence 1st Floor 90 High Holborn London WC1V 6BH SUMMARY:DeepMind/ELLIS CSML Seminar: Samory Kpotufe DESCRIPTION:Samory Kpotufe (Columbia University): Adaptivity in Domain Adaptation and Friends\n\nLocation: Function Space, UCL Centre for Artificial Intelligence, 1st Floor, 90 High Holborn, London WC1V 6BH\n\nLink: https://ucl.zoom.us/j/97245943682\n\nAbstract: Domain adaptation, transfer, multitask, meta, few-shot, representation, or lifelong learning … these are all important recent directions in ML that touch the core of what we might mean by ‘AI’. As these directions all concern learning in heterogeneous and ever-changing environments, they all share a central question: what information a data distribution may carry about another, critically, in the context of a given estimation problem, e.g., classification, regression, bandits, etc. Our understanding of these problems is still rather fledgling.
We plan to present both some recent positive results and some negative ones. On the one hand, recent measures of discrepancy between distributions, fine-tuned to given estimation problems (classification, bandits, etc.), offer a more optimistic picture than existing probability metrics (e.g. Wasserstein, TV) or divergences (KL, Renyi, etc.) in terms of achievable rates. On the other hand, when considering seemingly simple extensions to choices between multiple datasets (as in multitask), or multiple prediction models (as in Structural Risk Minimization), it turns out that minimax oracle rates are not always adaptively achievable, i.e., using just the available data without side information. These negative results suggest that domain adaptation is more structured in practice than captured by common invariants considered in the literature. The talk will be based on joint work with collaborators over the last few years, namely G. Martinet, S. Hanneke and J. Suk.\n\nBiography: Samory Kpotufe is an Associate Professor in the Department of Statistics of Columbia University. He graduated (Sept 2010) in Computer Science from the University of California, San Diego, advised by Sanjoy Dasgupta. He was then a researcher at the Max Planck Institute for Intelligent Systems. At the MPI he worked in the department of Bernhard Schoelkopf, in the learning theory group of Ulrike von Luxburg. Following this, he spent a couple of years as an Assistant Research Professor at the Toyota Technological Institute at Chicago. He then spent 4 years at ORFE, Princeton University, as an Assistant Professor.
He was a visiting member of the Institute for Advanced Study from January to July 2020.\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20221021T120000 DTEND;TZID=/Europe/London:20221021T130000 DTSTAMP;TZID=/Europe/London: END:VEVENT BEGIN:VEVENT UID:seminar-20221028120000@ucl-ellis.github.io LOCATION:Function Space UCL Centre for Artificial Intelligence 1st Floor 90 High Holborn London WC1V 6BH SUMMARY:DeepMind/ELLIS CSML Seminar: Nick Pawlowski & Wenbo Gong DESCRIPTION:Nick Pawlowski & Wenbo Gong (Microsoft Research): Rhino: Deep Causal Temporal Relationship Learning with history-dependent noise\n\nLocation: Function Space, UCL Centre for Artificial Intelligence, 1st Floor, 90 High Holborn, London WC1V 6BH\n\nLink: https://ucl.zoom.us/j/97245943682\n\nAbstract: Discovering causal relationships between different variables from time series data has been a long-standing challenge for many domains. Given the complexity of real-world relationships and the nature of observation in discrete time, causal discovery methods need to consider non-linear relations between variables, instantaneous effects and history-dependent noise. However, previous works do not offer a solution addressing all these problems together. In the first part of this talk, we will set the scene by covering the basic concepts of causality, together with an end-to-end deep-learning-based causal inference model called DECI. In the second part, we will present our solution to the aforementioned challenges in real-world time series data, which extends DECI. This extension, called Rhino, can model non-linear relationships with instantaneous effects while allowing the noise distribution to be modulated by historical observations.\n\nBiography: Nick Pawlowski is a senior researcher at Microsoft Research Cambridge.
His research interests include causality, variational inference and probabilistic reasoning, and are currently focused on causal machine learning methods aiming to improve decision making from observational data. Before joining MSR, Nick completed his PhD at Imperial College London under the supervision of Ben Glocker. Wenbo Gong is a researcher at Microsoft Research Cambridge. He is interested in causality, approximate inference and deep generative models. Currently, he focuses on developing causal models for time series data and improving posterior inference over DAGs. Before joining Microsoft, he completed his PhD at the University of Cambridge under the supervision of Jose Miguel Hernandez Lobato.\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20221028T120000 DTEND;TZID=/Europe/London:20221028T130000 DTSTAMP;TZID=/Europe/London: END:VEVENT BEGIN:VEVENT UID:seminar-20221104120000@ucl-ellis.github.io LOCATION:Function Space UCL Centre for Artificial Intelligence 1st Floor 90 High Holborn London WC1V 6BH SUMMARY:DeepMind/ELLIS CSML Seminar: Siyuan Guo DESCRIPTION:Siyuan Guo (University of Cambridge & Max Planck Institute for Intelligent Systems): Causal de Finetti: On the Identification of Invariant Causal Structure in Exchangeable Data\n\nLocation: Function Space, UCL Centre for Artificial Intelligence, 1st Floor, 90 High Holborn, London WC1V 6BH\n\nLink: https://ucl.zoom.us/j/97245943682\n\nAbstract: Learning causal structure from observational data often assumes that we observe independent and identically distributed (i.i.d.) data. It aims to find a graphical representation that encodes the same set of conditional independence relationships as those present in the observed distribution. It is known that even with unlimited data, there is a limit to how fine-grained a causal structure we can identify. To overcome this limitation of the i.i.d.
setting, recent work has explored using data originating from different, related environments to learn richer causal structures. These approaches implicitly rely on the independent causal mechanisms (ICM) principle, which postulates that the mechanism giving rise to an effect given its causes and the mechanism which generates the causes do not inform or influence each other. Thus, components of the causal model can independently change from environment to environment. Despite its wide application in machine learning and causal inference, there is a lack of statistical formalization of the ICM principle and of how it enables the identification of richer causal structures from grouped data. Here we present new Causal de Finetti theorems which offer the first statistical formalization of the ICM principle and show how causal structure identification is possible from exchangeable data.\n\nBiography: Siyuan is a PhD student with Ferenc Huszár at the University of Cambridge and Bernhard Schölkopf at the Max Planck Institute for Intelligent Systems. She is a fellow under the Cambridge-Tübingen fellowship and is funded by a Premium Research Studentship. Her research interest lies at the intersection of causal inference and machine learning. She is interested in developing both theoretical frameworks and methodologies to enable the transfer of ML algorithms from traditional i.i.d. regimes to non-i.i.d. tasks.
Previously, she studied Machine Learning (MSc) at UCL and Mathematics (BA + MMath) at Cambridge.\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20221104T120000 DTEND;TZID=/Europe/London:20221104T130000 DTSTAMP;TZID=/Europe/London: END:VEVENT BEGIN:VEVENT UID:seminar-20221111120000@ucl-ellis.github.io LOCATION:Function Space UCL Centre for Artificial Intelligence 1st Floor 90 High Holborn London WC1V 6BH SUMMARY:DeepMind/ELLIS CSML Seminar: Ilija Bogunovic DESCRIPTION:Ilija Bogunovic (UCL): Robust Design Discovery and Exploration in Bayesian Optimization\n\nLocation: Function Space, UCL Centre for Artificial Intelligence, 1st Floor, 90 High Holborn, London WC1V 6BH\n\nLink: https://ucl.zoom.us/j/97245943682\n\nAbstract: Whether in biological design, causal discovery, material production, or physical sciences, one often faces decisions regarding which new data to collect or which experiments to perform. There is thus a pressing need for adaptive algorithms and sampling strategies that make intelligent decisions about data collection processes and allow for data-efficient and robust learning. In this talk, I will discuss some of the core questions related to these requirements. For instance, how can we use data-driven methods to quantify uncertainty in our optimization objective and efficiently learn and discover robust designs? How can we design learning-based decision-making methods that are robust against input perturbations, data shifts, and adversarial attacks? How can we exploit the problem structure for efficient learning (e.g., how to deal with graph data and permutation-invariant reward functions and provably scale to large domains and graphs)? In the context of the previous questions, I will discuss the key statistical and robustness challenges through the lens of Bayesian optimization and neural bandits. 
I will show that existing Bayesian optimization and bandit approaches fail to achieve robustness and data efficiency simultaneously, and discuss algorithms that effectively overcome these challenges. These algorithms are robust and data-efficient, and they come with rigorous theoretical guarantees. I will also demonstrate their robust performance in several applications using real-world data sets and popular benchmarks.\n\nBiography: Ilija Bogunovic is a Lecturer in the Electrical Engineering Department at University College London. Before that, he was a postdoctoral researcher in the Machine Learning Institute and Learning and Adaptive Systems group at ETH Zurich. He received a Ph.D. in Computer and Communication Sciences from EPFL and an MSc in Computer Science from ETH Zurich. His research interests are centered around data-efficient interactive machine learning, sequential decision-making under uncertainty, reliability and robustness considerations in data-driven algorithms, experimental design and active learning methods, and are motivated by a range of emerging real-world applications.
He co-founded a recurring ICML workshop on 'Adaptive Experimental Design and Active Learning in the Real World'.\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20221111T120000 DTEND;TZID=/Europe/London:20221111T130000 DTSTAMP;TZID=/Europe/London: END:VEVENT BEGIN:VEVENT UID:seminar-20221118120000@ucl-ellis.github.io LOCATION:Function Space UCL Centre for Artificial Intelligence 1st Floor 90 High Holborn London WC1V 6BH SUMMARY:DeepMind/ELLIS CSML Seminar: Eugenio Clerico DESCRIPTION:Eugenio Clerico (University of Oxford): A PAC-Bayesian bound for deterministic classifiers\n\nLocation: Function Space, UCL Centre for Artificial Intelligence, 1st Floor, 90 High Holborn, London WC1V 6BH\n\nLink: https://ucl.zoom.us/j/97245943682\n\nAbstract: We establish a disintegrated PAC-Bayesian bound for classifiers that are trained via continuous-time (non-stochastic) gradient descent. Contrary to what is standard in the PAC-Bayesian setting, our result applies to a training algorithm that is deterministic, conditioned on a random initialisation, without requiring any de-randomisation step. We provide a broad discussion of the main features of the bound that we propose, and we study its behaviour analytically and empirically on linear models, finding promising results.\n\nBiography: Eugenio Clerico is a final-year DPhil student, supervised by Arnaud Doucet and George Deligiannidis. Before arriving in Oxford, he obtained a Bachelor’s degree in Physics at the University of Pavia (Italy) and a Master’s degree in theoretical Physics at the École Normale Supérieure in Paris. His current research lies in statistical learning theory and its applications to modern deep learning algorithms.
More precisely, he has been working mostly on generalisation bounds in the PAC-Bayesian and information-theoretic frameworks, and on the Gaussian behaviour of neural networks in the limit of infinite width.\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20221118T120000 DTEND;TZID=/Europe/London:20221118T130000 DTSTAMP;TZID=/Europe/London: END:VEVENT BEGIN:VEVENT UID:seminar-20221125120000@ucl-ellis.github.io LOCATION:Function Space UCL Centre for Artificial Intelligence 1st Floor 90 High Holborn London WC1V 6BH SUMMARY:DeepMind/ELLIS CSML Seminar: Xiaowen Dong DESCRIPTION:Xiaowen Dong (University of Oxford): On the stability of spectral graph filters and beyond\n\nLocation: Function Space, UCL Centre for Artificial Intelligence, 1st Floor, 90 High Holborn, London WC1V 6BH\n\nLink: https://ucl.zoom.us/j/97245943682\n\nAbstract: Data collected in network domains, hence supported by an irregular graph rather than a regular grid-like structure, are becoming pervasive. Typical examples include gene expression data associated with a protein-protein interaction graph, or behaviours of a group of individuals in a social network. Graph-based signal processing and machine learning are recent techniques that have been developed to handle such graph-structured data and have seen applications in such diverse fields as drug discovery, fake news detection, and traffic prediction. However, a theoretical understanding of the robustness of these models against perturbation to the input graph domain has been lacking. 
In this talk, I will present our results on the stability bounds of spectral graph filters as well as other recent work on the robustness of graph machine learning models, which together will contribute to the deployment of these models in real-world scenarios.\n\nBiography: Xiaowen Dong is an Associate Professor in the Department of Engineering Science at the University of Oxford, where he is an academic member of both the Machine Learning Research Group and the Oxford-Man Institute. Prior to joining Oxford, he was a postdoctoral associate in the MIT Media Lab, where he remains as a research affiliate, and received his PhD degree from the Swiss Federal Institute of Technology (EPFL), Lausanne, Switzerland. His main research interests concern signal processing and machine learning techniques for analysing network data, and their applications in studying questions across social and economic sciences.\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20221125T120000 DTEND;TZID=/Europe/London:20221125T130000 DTSTAMP;TZID=/Europe/London: END:VEVENT BEGIN:VEVENT UID:seminar-20221209120000@ucl-ellis.github.io LOCATION:Function Space UCL Centre for Artificial Intelligence 1st Floor 90 High Holborn London WC1V 6BH SUMMARY:DeepMind/ELLIS CSML Seminar: Lionel Riou-Durand DESCRIPTION:Lionel Riou-Durand (University of Warwick): Adaptive Tuning for Metropolis Adjusted Langevin Trajectories\n\nLocation: Function Space, UCL Centre for Artificial Intelligence, 1st Floor, 90 High Holborn, London WC1V 6BH\n\nLink: https://ucl.zoom.us/j/97245943682\n\nAbstract: Hamiltonian Monte Carlo (HMC) is a widely used sampler for continuous probability distributions. In many cases, the underlying Hamiltonian dynamics exhibit a phenomenon of resonance which decreases the efficiency of the algorithm and makes it very sensitive to hyperparameter values. 
This issue can be tackled efficiently, either by randomizing the trajectory length (RHMC) or by partially refreshing the momentum. The second approach is connected to the kinetic Langevin diffusion and has mostly been investigated through Generalized HMC (GHMC). However, GHMC induces momentum flips upon rejection, causing the sampler to backtrack and waste computational resources. In this work we focus on a recent algorithm that bypasses this issue, Metropolis Adjusted Langevin Trajectories (MALT). We build upon recent strategies for tuning the hyperparameters of RHMC, which target a bound on the Effective Sample Size (ESS), and adapt them to MALT, thereby enabling the first user-friendly deployment of this algorithm. We construct a method to optimize a sharper bound on the ESS and reduce the estimator variance. Easily compatible with parallel implementation, the resulting Adaptive MALT algorithm is competitive in terms of ESS rate and offers useful trade-offs in memory usage compared with GHMC, RHMC and NUTS.\n\nBiography: Lionel Riou-Durand has been a postdoctoral fellow in Statistics at the University of Warwick since September 2019. He works on the CoSInES project (Computational Statistical Inference for Engineering and Security), whose principal investigator is Gareth Roberts. He defended his PhD thesis in July 2019; he did his PhD at the Center for Research in Economics and Statistics (CREST), under the supervision of Nicolas Chopin and Arnak Dalalyan. His research themes are connected to computational methods for statistics and machine learning. His primary focus is on sampling algorithms, which are widely used tools for the numerical approximation of statistical estimators. He is particularly interested in measuring their accuracy, evaluating their computational complexity and studying their robustness.
The approximation of statistical estimators involves two interdependent challenges, since the aim is to guarantee a negligible numerical error compared to the statistical uncertainty, while controlling the computational burden of the algorithm. These algorithms are widely used to approximate statistical estimators based upon high dimensional integrals, omnipresent in Bayesian statistics. Their construction and study are connected to several mathematical fields, such as stochastic processes, optimal transport, numerical analysis and optimisation. An overall objective is to develop efficient and reliable algorithms, while making these easily accessible by practitioners.\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20221209T120000 DTEND;TZID=/Europe/London:20221209T130000 DTSTAMP;TZID=/Europe/London: END:VEVENT BEGIN:VEVENT UID:seminar-20230120120000@ucl-ellis.github.io LOCATION:Function Space UCL Centre for Artificial Intelligence 1st Floor 90 High Holborn London WC1V 6BH SUMMARY:DeepMind/ELLIS CSML Seminar: Ricardo Silva DESCRIPTION:Ricardo Silva (University College London): Stochastic Causal Programming for Bounding Treatment Effects\n\nLocation: Function Space, UCL Centre for Artificial Intelligence, 1st Floor, 90 High Holborn, London WC1V 6BH\n\nLink: https://ucl.zoom.us/j/97245943682\n\nAbstract: Causal effect estimation is important for many tasks in the natural and social sciences. We design algorithms for the continuous partial identification problem: bounding the effects of multivariate, continuous treatments when unmeasured confounding makes identification impossible. Specifically, we cast causal effects as objective functions within a constrained optimization problem, and minimize/maximize these functions to obtain bounds. We combine flexible learning algorithms with Monte Carlo methods to implement a family of solutions under the name of stochastic causal programming. 
In particular, we show how the generic framework can be efficiently formulated in settings where auxiliary variables are clustered into pre-treatment and post-treatment sets and no fine-grained causal graph can be specified. Compared with other generic approaches, this greatly simplifies the problem and makes it easier to encode structural knowledge without explicitly constructing latent common causes. Joint work with Kirtan Padh, Jakob Zeitler, David Watson, Matt Kusner and Niki Kilbertus.\n\nBiography: Ricardo Silva is a Professor of Statistical Machine Learning and Data Science at the Department of Statistical Science, UCL, a Faculty Fellow at the Alan Turing Institute, and a recipient of an EPSRC Open Fellowship (2023-2027). Ricardo obtained a PhD in Machine Learning from Carnegie Mellon University in 2005, followed by postdoctoral positions at the Gatsby Unit and at the Statistical Laboratory, University of Cambridge. His main interests are in causal inference, latent variable models, and probabilistic machine learning. His research has received funding from organisations such as EPSRC, Innovate UK, the Office of Naval Research, Winton Research and Adobe Research. Ricardo has also served on the senior program committees of several machine learning conferences, including as Senior Area Chair at NeurIPS and ICML, and as Program Chair for the Uncertainty in Artificial Intelligence conference.\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20230120T120000 DTEND;TZID=/Europe/London:20230120T130000 DTSTAMP;TZID=/Europe/London: END:VEVENT BEGIN:VEVENT UID:seminar-20230120120000@ucl-ellis.github.io LOCATION:Function Space UCL Centre for Artificial Intelligence 1st Floor 90 High Holborn London WC1V 6BH SUMMARY:DeepMind/ELLIS CSML Seminar: Aldo Pacchiano DESCRIPTION:Aldo Pacchiano (Microsoft Research NYC): Learning Systems in Adaptive Environments.
Theory, Algorithms and Design\n\nLocation: Function Space, UCL Centre for Artificial Intelligence, 1st Floor, 90 High Holborn, London WC1V 6BH\n\nLink: https://ucl.zoom.us/j/97245943682\n\nAbstract: Recent years have seen great successes in the development of learning algorithms for static predictive and generative tasks, where the objective is to learn a model that performs well on a single test deployment and in applications with abundant data. Less success has been achieved in designing algorithms for adaptive scenarios, where the data distribution may be influenced by the choices of the algorithm itself, the algorithm needs to learn adaptively from human feedback, or the environment is changing rapidly. These are some of the most important challenges in developing ML-driven solutions for technologies such as internet social systems, ML-driven scientific experimentation, and robotics. To realize the potential of these technologies, we will need better ways of designing algorithms for adaptive learning. In this talk I propose the following algorithm design considerations for adaptive environments: 1) sample-efficient and tractable learning, 2) generalization to unseen domains via effective knowledge transfer, and 3) adaptive learning from human feedback. I will give an overview of my work along each of these axes and introduce a variety of open problems and research directions inspired by this conceptual framing.\n\nBiography: Aldo is a Postdoctoral Researcher at Microsoft Research NYC. He obtained his PhD at UC Berkeley, where he was advised by Peter Bartlett and Michael Jordan. His research lies in the areas of Reinforcement Learning, Online Learning, Bandits and Algorithmic Fairness.
He is particularly interested in furthering our statistical understanding of learning phenomena in adaptive environments and in using these theoretical insights and techniques to design efficient and safe algorithms for scientific, engineering, and large-scale societal applications.\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20230120T120000 DTEND;TZID=/Europe/London:20230120T130000 DTSTAMP;TZID=/Europe/London: END:VEVENT BEGIN:VEVENT UID:seminar-20230203120000@ucl-ellis.github.io LOCATION:Function Space UCL Centre for Artificial Intelligence 1st Floor 90 High Holborn London WC1V 6BH SUMMARY:DeepMind/ELLIS CSML Seminar: Jong Chul Ye DESCRIPTION:Jong Chul Ye (Graduate School of Artificial Intelligence, KAIST, Daejeon, Korea): Manifold-constrained diffusion models for inverse problems in imaging\n\nLocation: Function Space, UCL Centre for Artificial Intelligence, 1st Floor, 90 High Holborn, London WC1V 6BH\n\nLink: https://ucl.zoom.us/j/97245943682\n\nAbstract: Recently, diffusion models have been used to solve various inverse problems in an unsupervised manner with appropriate modifications to the sampling process. However, current solvers, which recursively apply a reverse diffusion step followed by a projection-based measurement consistency step, often produce suboptimal results. By studying the generative sampling path, we show that current solvers throw the sample path off the data manifold, so that errors accumulate. To address this, we propose an additional correction term inspired by the manifold constraint, which can be used synergistically with previous solvers to keep the iterates close to the manifold. The proposed manifold constraint is straightforward to implement in a few lines of code, yet boosts performance by a surprisingly large margin.
We show, both theoretically and through extensive experiments, that our method is superior to previous methods, producing promising results in many applications such as image inpainting, colorization, and sparse-view computed tomography. We then extend diffusion solvers to efficiently handle general noisy (non)linear inverse problems by approximating posterior sampling. Interestingly, the resulting posterior sampling scheme is a blend of diffusion sampling and the manifold-constrained gradient without a strict measurement consistency projection step, yielding a more desirable generative path in noisy settings than previous approaches. Our method demonstrates that diffusion models can incorporate various measurement noise statistics, such as Gaussian and Poisson, and can also efficiently handle noisy nonlinear inverse problems such as Fourier phase retrieval and non-uniform deblurring.\n\nBiography: Jong Chul Ye is a Professor at the Graduate School of Artificial Intelligence (AI) of the Korea Advanced Institute of Science and Technology (KAIST), Korea. He received the B.Sc. and M.Sc. degrees from Seoul National University, Korea, and the Ph.D. from Purdue University, West Lafayette. Before joining KAIST, he worked at Philips Research and GE Global Research in New York. He has served as an associate editor of IEEE Trans. on Image Processing and as an editorial board member for Magnetic Resonance in Medicine. He is currently an associate editor for IEEE Trans. on Medical Imaging and a Senior Editor of IEEE Signal Processing Magazine. He is an IEEE Fellow, was the Chair of the IEEE SPS Computational Imaging TC, and was an IEEE EMBS Distinguished Lecturer. He was a General Co-chair (with Mathews Jacob) for IEEE Symp.
on Biomedical Imaging (ISBI) 2020, and will be a program chair for the IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP) 2024.\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20230203T120000 DTEND;TZID=/Europe/London:20230203T130000 DTSTAMP;TZID=/Europe/London: END:VEVENT BEGIN:VEVENT UID:seminar-20230217120000@ucl-ellis.github.io LOCATION:Function Space UCL Centre for Artificial Intelligence 1st Floor 90 High Holborn London WC1V 6BH SUMMARY:DeepMind/ELLIS CSML Seminar: Sam Power DESCRIPTION:Sam Power (University of Bristol): Explicit convergence bounds for Metropolis Markov chains\n\nLocation: Function Space, UCL Centre for Artificial Intelligence, 1st Floor, 90 High Holborn, London WC1V 6BH\n\nLink: https://ucl.zoom.us/j/97245943682\n\nAbstract: Markov chain Monte Carlo (MCMC) algorithms are a widely used tool for approximate simulation from probability measures in structured, high-dimensional spaces, with a variety of applications. A key ingredient of their success is their ability to converge rapidly to equilibrium at a rate that depends acceptably on the ‘difficulty’ of the sampling problem at hand, as captured by the dimension of the problem and the concentration and smoothness properties of the target distribution. In this talk, I will present recent work with C. Andrieu, A. Lee and A. Wang on the convergence analysis of Metropolis-type MCMC algorithms on R^d. In particular, we provide a detailed study of the Random Walk Metropolis (RWM) Markov chain with arbitrary proposal variances and in any dimension, obtaining interpretable estimates of its convergence behaviour under suitable assumptions. These estimates have a provably sharp dependence on the dimension of the problem, thus providing theoretical validation for the use of these algorithms in complex settings. Our positive results are quite generally applicable.
We also study the preconditioned Crank–Nicolson Markov chain as applied to simulation from Gaussian process posterior models, obtaining dimension-independent complexity bounds under suitable assumptions.\n\nBiography: Sam Power is a researcher in Statistics. He is currently a Postdoctoral Research Associate at the University of Bristol, working with Prof. Christophe Andrieu on the Bayes4Health grant. He also works closely with Prof. Anthony Lee at Bristol. Prior to this role, he was a PhD student at the University of Cambridge, working with Dr. Sergio Bacallado. His research interests center on the design and analysis of stochastic algorithms, with applications mainly to statistics. He is particularly interested in Monte Carlo methods, such as Markov chain Monte Carlo and Sequential Monte Carlo, and in how the implementation of these methods can be made automatic, robust, and efficient.\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20230217T120000 DTEND;TZID=/Europe/London:20230217T130000 DTSTAMP;TZID=/Europe/London: END:VEVENT BEGIN:VEVENT UID:seminar-20230303120000@ucl-ellis.github.io LOCATION:Function Space UCL Centre for Artificial Intelligence 1st Floor 90 High Holborn London WC1V 6BH SUMMARY:DeepMind/ELLIS CSML Seminar: Robin Evans DESCRIPTION:Robin Evans (University of Oxford): Parameterizing and Simulating from Causal Models\n\nLocation: Function Space, UCL Centre for Artificial Intelligence, 1st Floor, 90 High Holborn, London WC1V 6BH\n\nLink: https://ucl.zoom.us/j/97245943682\n\nAbstract: Many statistical problems in causal inference involve a probability distribution other than the one from which data are actually observed; as an additional complication, the object of interest is often a marginal quantity of this other probability distribution. This creates many practical complications for statistical inference, even where the problem is non-parametrically identified.
In particular, it is difficult to perform likelihood-based inference, or even to simulate from the model in a general way. We introduce the 'frugal parameterization', which places the causal effect of interest at its centre and builds the rest of the model around it. We do this in a way that provides a recipe for constructing a regular, non-redundant parameterization using causal quantities of interest. In the case of discrete variables we can use odds ratios to complete the parameterization, while in the continuous case copulas are the natural choice; other possibilities are also discussed. Our methods allow us to construct and simulate from models with parametrically specified causal distributions, and to fit them using likelihood-based methods, including fully Bayesian approaches. Our proposal includes parameterizations for the average causal effect and the effect of treatment on the treated, as well as other common quantities of interest. I will also discuss some other applications of the frugal parameterization, including survival analysis, parameterizing nested Markov models, and ‘Many Data’: combining randomized and observational datasets in a single parametric model. This is joint work with Vanessa Didelez (University of Bremen and Leibniz Institute for Prevention Research and Epidemiology - BIPS).\n\nBiography: Robin Evans is an Associate Professor in Statistics and a fellow of Jesus College. He received his PhD in Statistics from the University of Washington in 2011, and was a Postdoctoral Research Fellow at the Statistical Laboratory in Cambridge from 2011 to 2013.
His research interests include graphical models, causal inference, latent variable models and algebraic and semi-parametric statistics.\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20230303T120000 DTEND;TZID=/Europe/London:20230303T130000 DTSTAMP;TZID=/Europe/London: END:VEVENT BEGIN:VEVENT UID:seminar-20230324120000@ucl-ellis.github.io LOCATION:Function Space UCL Centre for Artificial Intelligence 1st Floor 90 High Holborn London WC1V 6BH SUMMARY:DeepMind/ELLIS CSML Seminar: Tom Rainforth DESCRIPTION:Tom Rainforth (University of Oxford): Modern Bayesian Experimental Design\n\nLocation: Function Space, UCL Centre for Artificial Intelligence, 1st Floor, 90 High Holborn, London WC1V 6BH\n\nLink: https://ucl.zoom.us/j/97245943682\n\nAbstract: Bayesian experimental design (BED) provides a powerful and general framework for optimizing the design of experiments. However, its deployment often poses substantial computational challenges that can undermine its practical use. In this talk, I will outline how recent advances have transformed our ability to overcome these challenges and thus utilize BED effectively, before discussing some key areas for future development in the field. Related review paper: https://arxiv.org/abs/2302.14545\n\nBiography: I am a Senior Research Fellow in Machine Learning and a faculty member of the OxCSML Group in the Department of Statistics at the University of Oxford, where I run the RainML Research Lab (https://rainml.uk/). 
My research covers a wide range of topics in and around machine learning and experimental design, with areas of particular interest including Bayesian experimental design, deep learning, representation learning, generative models, Monte Carlo methods, active learning, probabilistic programming, and variational inference.\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20230324T120000 DTEND;TZID=/Europe/London:20230324T130000 DTSTAMP;TZID=/Europe/London: END:VEVENT BEGIN:VEVENT UID:seminar-20230331120000@ucl-ellis.github.io LOCATION:Function Space UCL Centre for Artificial Intelligence 1st Floor 90 High Holborn London WC1V 6BH SUMMARY:DeepMind/ELLIS CSML Seminar: Pierre Alquier DESCRIPTION:Pierre Alquier (ESSEC Business School Singapore): New deviation inequalities for Markov chains, with applications to stochastic optimization and empirical risk minimization\n\nLocation: Function Space, UCL Centre for Artificial Intelligence, 1st Floor, 90 High Holborn, London WC1V 6BH\n\nLink: https://ucl.zoom.us/j/97245943682\n\nAbstract: Many deviation inequalities for Markov chains have recently been proven using martingale techniques. However, such inequalities rely heavily on the assumption that the chain is homogeneous and contractive. This assumption is not satisfied in many practical situations, a typical example being the iterates of SGD. In this work, we extend these techniques to prove deviation inequalities for a class of non-homogeneous Markov chains. I will introduce these inequalities and then focus on two applications: empirical risk minimization for time series, and stochastic optimization.
This is based on joint work with Xiequan Fan (Tianjin University) and Paul Doukhan (Université de Cergy-Pontoise): https://linkinghub.elsevier.com/retrieve/pii/S0304414922001600 (published version), https://arxiv.org/abs/2102.08685 (open-access version)\n\nBiography: Pierre Alquier is a professor of statistics at ESSEC Business School's Asia-Pacific campus in Singapore. He was previously a researcher at RIKEN AIP in Tokyo. Before that, he held various academic positions in Europe, including Professor of Statistics at ENSAE Paris and Senior Lecturer in Statistics at UCD Dublin. He is currently a senior member of the PC for COLT 2023 and an editor for the Journal of Machine Learning Research and Transactions on Machine Learning Research.\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20230331T120000 DTEND;TZID=/Europe/London:20230331T130000 DTSTAMP;TZID=/Europe/London: END:VEVENT BEGIN:VEVENT UID:seminar-20230421120000@ucl-ellis.github.io LOCATION:Function Space UCL Centre for Artificial Intelligence 1st Floor 90 High Holborn London WC1V 6BH SUMMARY:DeepMind/ELLIS CSML Seminar: James A. Landay DESCRIPTION:James A. Landay (Stanford University): 'AI For Good' Isn’t Good Enough: A Call for Human-Centered AI (joint UCLIC Seminar)\n\nLocation: Function Space, UCL Centre for Artificial Intelligence, 1st Floor, 90 High Holborn, London WC1V 6BH\n\nLink: https://ucl.zoom.us/j/92614118102\n\nAbstract: AI for Good initiatives recognize the potential impacts of AI systems on humans and societies. However, simply recognizing these impacts is not enough. To be truly Human-Centered, AI development must be user-centered, community-centered, and societally-centered. User-centered design integrates techniques that consider the needs and abilities of end users, while also improving designs through iterative user testing. Community-centered design engages communities in the early stages of design through participatory techniques.
Societally-centered design forecasts and mediates potential impacts on a societal level throughout a project. Successful Human-Centered AI requires the early engagement of multidisciplinary teams beyond technologists, including experts in design, the social sciences and humanities, and domains of interest such as medicine or law, as well as community members. In this talk I will elaborate on my argument for an authentic Human-Centered AI.\n\nBiography: James Landay is a Professor of Computer Science and the Anand Rajaraman and Venky Harinarayan Professor in the School of Engineering at Stanford University. He co-founded and is Vice Director of the Stanford Institute for Human-Centered Artificial Intelligence (HAI). Landay previously was a tenured faculty member at Cornell Tech, the University of Washington, and UC Berkeley. He was also Director of Intel Labs Seattle and co-founder of NetRaker. Landay received his BS in EECS from UC Berkeley, and MS and PhD in Computer Science from Carnegie Mellon University. He is a member of the ACM SIGCHI Academy and an ACM Fellow. He served on the NSF CISE Advisory Committee for six years.\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20230421T120000 DTEND;TZID=/Europe/London:20230421T130000 DTSTAMP;TZID=/Europe/London: END:VEVENT BEGIN:VEVENT UID:seminar-20230428120000@ucl-ellis.github.io LOCATION:Function Space UCL Centre for Artificial Intelligence 1st Floor 90 High Holborn London WC1V 6BH SUMMARY:DeepMind/ELLIS CSML Seminar: Seth Flaxman DESCRIPTION:Seth Flaxman (University of Oxford): Deep generative modelling with πVAE and PriorVAE to enable scalable MCMC inference on stochastic processes\n\nLocation: Function Space, UCL Centre for Artificial Intelligence, 1st Floor, 90 High Holborn, London WC1V 6BH\n\nLink: https://ucl.zoom.us/j/97245943682\n\nAbstract: Bayesian inference of models where the prior is a stochastic process, e.g. 
Gaussian process models, is ubiquitous in applied fields where both model flexibility and accurate uncertainty quantification are important. Decades of research have attempted to alleviate well-known computational bottlenecks, with varying degrees of success. We describe two new, related approaches to encoding Gaussian process priors or their finite realisations using deep generative models (VAEs). In our πVAE/PriorVAE framework, trained decoders replace the original prior during Markov chain Monte Carlo (MCMC) inference, conveniently enabling any probabilistic programming framework to sample from complex, nonparametric priors. This approach enables fast and highly efficient inference, with orders-of-magnitude speedups in MCMC efficiency after paying a one-off cost to train a deep neural network. We will describe recent work enabling the recovery of interpretable hyperparameters for these models, and applications to spatiotemporal disease modelling. Relevant papers: πVAE (Mishra et al., 2022; https://link.springer.com/article/10.1007/s11222-022-10151-w), PriorVAE (Semenova et al., 2022; https://royalsocietypublishing.org/doi/full/10.1098/rsif.2022.0094), PriorCVAE (Semenova et al., 2023; https://arxiv.org/abs/2304.04307).\n\nBiography: Seth Flaxman is an Associate Professor in the Department of Computer Science at Oxford. His PhD, in machine learning and public policy, is from Carnegie Mellon University. He was part of the Imperial College COVID-19 Response Team and has published widely on computational statistics and statistical machine learning. He helps run the Machine Learning & Global Health Network (MLGH.net). He received the Samsung AI Researcher of the Year Award in 2020 and the SPI-M-O Award for Modelling and Data Support (SAMDS) in recognition of epidemiological and modelling advice provided to the UK government during the COVID-19 pandemic.
His research is supported by an EPSRC fellowship.\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20230428T120000 DTEND;TZID=/Europe/London:20230428T130000 DTSTAMP;TZID=/Europe/London: END:VEVENT BEGIN:VEVENT UID:seminar-20230505120000@ucl-ellis.github.io LOCATION:Function Space UCL Centre for Artificial Intelligence 1st Floor 90 High Holborn London WC1V 6BH SUMMARY:DeepMind/ELLIS CSML Seminar: Mike Walmsley DESCRIPTION:Mike Walmsley (University of Manchester): Practical Deep Learning at Galaxy Zoo\n\nLocation: Function Space, UCL Centre for Artificial Intelligence, 1st Floor, 90 High Holborn, London WC1V 6BH\n\nLink: https://ucl.zoom.us/j/97245943682\n\nAbstract: Modern telescopes take far more images than astronomers could ever review – but these images are only useful if we can quantify the appearance of the galaxies they capture. Galaxy Zoo (www.galaxyzoo.org) is a citizen science project recruiting hundreds of thousands of volunteers to label the appearance of millions of galaxies. This talk will describe how we support our volunteers in this task with a pragmatic mix of deep learning methods. I will introduce the challenges and opportunities that come from gathering labels from volunteers rather than paid workers. I will also describe how we use our trained models to create practical tools for them – similarity search, cluster detection, personalised anomaly recommendation – by exploring the models’ learned representations. Our final models will run as part of the pipeline for the Euclid space telescope and help astronomers understand why galaxies look the way they do.\n\nBiography: Mike Walmsley is an astrophysics postdoc at the University of Manchester and the lead data scientist for Galaxy Zoo. He is currently on secondment to the Private Office of the Secretary of State for Health and Social Care, using data science to inform policy responses to NHS winter pressures.
He will be moving to the University of Toronto as a Dunlap Fellow in September.\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20230505T120000 DTEND;TZID=/Europe/London:20230505T130000 DTSTAMP;TZID=/Europe/London: END:VEVENT BEGIN:VEVENT UID:seminar-20230512120000@ucl-ellis.github.io LOCATION:Function Space UCL Centre for Artificial Intelligence 1st Floor 90 High Holborn London WC1V 6BH SUMMARY:DeepMind/ELLIS CSML Seminar: Alexander Terenin DESCRIPTION:Alexander Terenin (University of Cambridge): Physically Structured Neural Networks for Smooth and Contact Dynamics\n\nLocation: Function Space, UCL Centre for Artificial Intelligence, 1st Floor, 90 High Holborn, London WC1V 6BH\n\nLink: https://ucl.zoom.us/j/97245943682\n\nAbstract: A neural network’s architecture encodes key information and inductive biases that are used to guide its predictions. In this talk, we discuss recent work which leverages the perspective of neural ordinary differential equations to design network architectures that encode the structures of classical mechanics. We examine the cases of both smooth dynamics and non-smooth contact dynamics. The architectures obtained are easy to understand, show excellent performance and data-efficiency on simple benchmark tasks, and are a promising emerging tool for use in robot learning and related areas.\n\nBiography: Alexander Terenin is a Postdoctoral Research Associate at the University of Cambridge. He is interested in statistical machine learning, particularly in settings where the data is not fixed, but is gathered interactively by the learning machine. 
This leads naturally to Gaussian processes and data-efficient interactive decision-making systems such as Bayesian optimization, to areas such as multi-armed bandits and reinforcement learning, and to techniques for incorporating inductive biases and prior information such as symmetries into machine learning models.\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20230512T120000 DTEND;TZID=/Europe/London:20230512T130000 DTSTAMP;TZID=/Europe/London: END:VEVENT BEGIN:VEVENT UID:seminar-20230526120000@ucl-ellis.github.io LOCATION:Function Space UCL Centre for Artificial Intelligence 1st Floor 90 High Holborn London WC1V 6BH SUMMARY:DeepMind/ELLIS CSML Seminar: Daniel Mannion DESCRIPTION:Daniel Mannion (University College London): Dendritic Computation: The What, The Why & The How?\n\nLocation: Function Space, UCL Centre for Artificial Intelligence, 1st Floor, 90 High Holborn, London WC1V 6BH\n\nLink: https://ucl.zoom.us/j/97245943682\n\nAbstract: Are we missing a crucial component of neural networks? The power consumption of today's machine learning hardware is increasing exponentially, often limiting ML models to cloud platforms. In contrast, the brain can operate at significantly lower power, on the order of 20 W. In this talk, we will explore the role of dendrites within biological neural networks and whether they could be key to increasing the computational power of neural networks while keeping network sizes and power consumption low. We will explore the properties of biological dendrites, how these might be used to achieve more power-efficient ML hardware, and finally outline different approaches to constructing this maybe-missing link. If you're developing ML models which may one day migrate to portable battery-operated devices, then this talk will cover topics that should be of interest.\n\nBiography: I am a researcher within the Electronic & Electrical department at UCL. 
My research aims to build hardware that can implement sophisticated machine learning algorithms at low power. Much of this research is inspired by nature and biological neural networks. In particular, my main focus is on replicating dendritic trees and assessing their impact on neural network performance. I am always on the lookout to apply our hardware to new application domains, so if you are interested in migrating your ML models onto portable hardware then please feel free to get in touch.\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20230526T120000 DTEND;TZID=/Europe/London:20230526T130000 DTSTAMP;TZID=/Europe/London: END:VEVENT BEGIN:VEVENT UID:seminar-20230602120000@ucl-ellis.github.io LOCATION:Function Space UCL Centre for Artificial Intelligence 1st Floor 90 High Holborn London WC1V 6BH SUMMARY:DeepMind/ELLIS CSML Seminar: Mark van der Wilk DESCRIPTION:Mark van der Wilk (Imperial College London): Bivariate Causal Discovery using Bayesian Model Selection\n\nLocation: Function Space, UCL Centre for Artificial Intelligence, 1st Floor, 90 High Holborn, London WC1V 6BH\n\nLink: https://ucl.zoom.us/j/97245943682\n\nAbstract: With only observational data on two variables, and without other assumptions, it is not possible to infer which one causes the other. Much of the causal literature has focused on guaranteeing identifiability of causal direction, in statistical models for datasets where strong assumptions hold, such as additive noise or restrictions on parameter count. These methods are then tested on realistic datasets, most of which violate the assumptions and therefore cannot be fit properly. We show how to use causal assumptions within the Bayesian framework. This allows us to specify a model that does not artificially restrict the datasets it can fit, while also encoding independent causal mechanisms, leading to an asymmetry between the causal directions. 
Identifying causal direction then becomes a Bayesian model selection problem. This flexibility does imply that some ambiguous datasets exist for which causality cannot be identified. To demonstrate our approach, we construct a Bayesian non-parametric model that can flexibly model the joint distribution. While making few choices in constructing our model, we outperform previous methods on a wide range of benchmark datasets.\n\nBiography: Mark van der Wilk is a senior lecturer (associate professor) at Imperial College London. Currently, his research mainly focusses on finding training procedures that can adjust the connectivity structure in neural networks to 1) reduce reliance on human design by automatically improving inductive biases, 2) improve efficiency by removing unnecessary circuits, and 3) increase adaptivity to new or changing data. He believes that some embodiment of Occam's razor is needed to do this, whether explicit (through e.g. Bayes), or implicit (through meta-learning). He is also interested in applications with decision-making aspects, and advises start-ups and businesses on improving their methods.\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20230602T120000 DTEND;TZID=/Europe/London:20230602T130000 DTSTAMP;TZID=/Europe/London: END:VEVENT BEGIN:VEVENT UID:seminar-20230609120000@ucl-ellis.github.io LOCATION:Function Space UCL Centre for Artificial Intelligence 1st Floor 90 High Holborn London WC1V 6BH SUMMARY:DeepMind/ELLIS CSML Seminar: Fabio De Sousa Ribeiro DESCRIPTION:Fabio De Sousa Ribeiro (Imperial College London): High Fidelity Image Counterfactuals with Probabilistic Causal Models\n\nLocation: Function Space, UCL Centre for Artificial Intelligence, 1st Floor, 90 High Holborn, London WC1V 6BH\n\nLink: https://ucl.zoom.us/j/97245943682\n\nAbstract: The ability to generate plausible counterfactuals has wide scientific applicability and is particularly valuable in fields like medical imaging, wherein data 
are scarce and underrepresentation of subgroups is prevalent. Answering counterfactual queries like 'why?' and 'what if...?', expressed in the language of causality, could greatly benefit several important research areas, such as (i) explainability; (ii) data augmentation; (iii) robustness to spurious correlations; and (iv) fairness notions in both observed and counterfactual outcomes. Despite recent progress, accurate estimation of interventional and counterfactual queries for high-dimensional structured variables (e.g. images) remains an open problem. Few previous works have attempted to address all three rungs of Pearl’s ladder of causation (association, intervention and counterfactuals) in a principled manner using deep models. Moreover, evaluating counterfactuals poses inherent challenges, as they are by definition counter-to-fact and unobservable. In contrast to preceding studies, which focus primarily on identifiability guarantees in the limit of infinite data, we take a pragmatic approach to counterfactuals. We focus on exploring the practical limits and possibilities of estimating and empirically evaluating high-fidelity image counterfactuals of real-world data. To this end, we introduce a specific system and method which leverages ideas from causal mediation analysis and advances in generative modelling to engineer deep causal mechanisms for structured variables. Our experiments illustrate the ability of our proposed mechanisms to perform accurate abduction and to produce plausible estimates of direct, indirect and total effects, as measured by axiomatic soundness of counterfactuals.\n\nBiography: Fabio De Sousa Ribeiro is a postdoctoral research associate in the BioMedIA group at Imperial College London working under Professor Ben Glocker. His primary research interests lie at the intersection of causality and deep generative modelling for medical imaging and healthcare applications. 
His work bolsters the ongoing effort by the machine learning community to combine the central ideas behind causality and deep representation learning to help tackle several challenging research areas such as explainability, robustness and fairness.\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20230609T120000 DTEND;TZID=/Europe/London:20230609T130000 DTSTAMP;TZID=/Europe/London: END:VEVENT BEGIN:VEVENT UID:seminar-20230616120000@ucl-ellis.github.io LOCATION:Function Space UCL Centre for Artificial Intelligence 1st Floor 90 High Holborn London WC1V 6BH SUMMARY:DeepMind/ELLIS CSML Seminar: Abhin Shah DESCRIPTION:Abhin Shah (MIT): On counterfactual inference with unobserved confounding\n\nLocation: Function Space, UCL Centre for Artificial Intelligence, 1st Floor, 90 High Holborn, London WC1V 6BH\n\nLink: https://ucl.zoom.us/j/97245943682\n\nAbstract: Given an observational study with n independent but heterogeneous units, our goal is to learn the counterfactual distribution for each unit using only one p-dimensional sample per unit containing covariates, interventions, and outcomes. Specifically, we allow for unobserved confounding that introduces statistical biases between interventions and outcomes and that exacerbates the heterogeneity across units. Modeling the underlying joint distribution as an exponential family, we reduce learning the unit-level counterfactual distributions to learning n exponential family distributions with heterogeneous parameters and only one sample per distribution. We introduce a convex objective that pools all n samples to jointly learn all n parameter vectors, and provide a unit-wise mean squared error bound that scales linearly with the metric entropy of the parameter space. For example, when the parameters are s-sparse linear combinations of k known vectors, the error is $O(s \log k / p)$. 
En route, we derive sufficient conditions for compactly supported distributions to satisfy the logarithmic Sobolev inequality. As an application of the framework, our results enable consistent imputation of sparsely missing covariates.\n\nBiography: Abhin Shah is a fifth-year Ph.D. student in the EECS department at MIT, advised by Prof. Devavrat Shah and Prof. Greg Wornell. He is a recipient of MIT’s Jacobs Presidential Fellowship. He interned at Google Research in 2021 and at IBM Research in 2020. Prior to MIT, he graduated from IIT Bombay with a Bachelor’s degree in Electrical Engineering. His research interests include theoretical and applied aspects of trustworthy machine learning with a focus on causality, fairness, and privacy.\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20230616T120000 DTEND;TZID=/Europe/London:20230616T130000 DTSTAMP;TZID=/Europe/London: END:VEVENT BEGIN:VEVENT UID:seminar-20230623120000@ucl-ellis.github.io LOCATION:Function Space UCL Centre for Artificial Intelligence 1st Floor 90 High Holborn London WC1V 6BH SUMMARY:DeepMind/ELLIS CSML Seminar: Sattar Vakili DESCRIPTION:Sattar Vakili (MediaTek Research): Kernel-based Reinforcement Learning\n\nLocation: Function Space, UCL Centre for Artificial Intelligence, 1st Floor, 90 High Holborn, London WC1V 6BH\n\nLink: https://ucl.zoom.us/j/97245943682\n\nAbstract: Reinforcement Learning (RL) has shown great empirical success in various settings with complex models and large state-action spaces. However, the existing analytical results typically focus on settings with a small number of state-actions or simple models, such as linearly modeled state-action value functions. To derive RL policies that efficiently handle large state-action spaces with more general value functions, some recent works have explored nonlinear function approximation using kernel ridge regression. 
In this talk, we examine existing results in this RL setting, the analytical tools involved, their limitations and some open problems. Moreover, we introduce a kernel-based optimistic least-squares value iteration policy that achieves order-optimal regret bounds for a common class of kernels.\n\nBiography: Sattar Vakili is a senior AI researcher at MediaTek Research. He specializes in problems involving sequential decision-making in uncertain environments, with a focus on optimization, bandit and reinforcement learning, kernel-based modeling, and neural networks. Before joining MediaTek Research, Sattar worked at Secondmind.ai, a research lab in Cambridge, UK, led by Professor Carl Rasmussen of the University of Cambridge. There, he gained expertise in kernel-based and Gaussian process models. Prior to that, he was a postdoc at Princeton University, and he earned his PhD under the supervision of Professor Qing Zhao at Cornell University in 2017.\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20230623T120000 DTEND;TZID=/Europe/London:20230623T130000 DTSTAMP;TZID=/Europe/London: END:VEVENT BEGIN:VEVENT UID:seminar-20230929120000@ucl-ellis.github.io LOCATION:Function Space UCL Centre for Artificial Intelligence 1st Floor 90 High Holborn London WC1V 6BH SUMMARY:DeepMind/ELLIS CSML Seminar: Marina Riabiz DESCRIPTION:Marina Riabiz (King's College London): Optimal Thinning of MCMC Output\n\nLocation: Function Space, UCL Centre for Artificial Intelligence, 1st Floor, 90 High Holborn, London WC1V 6BH\n\nLink: https://ucl.zoom.us/j/97245943682\n\nAbstract: The use of heuristics to assess the convergence and compress the output of Markov chain Monte Carlo (MCMC) can be sub-optimal in terms of the empirical approximations that are produced. Typically, a number of the initial states are attributed to “burn in” and removed, whilst the remainder of the chain is “thinned” if compression is also required. 
In this talk, I consider the problem of retrospectively selecting a subset of states, of fixed cardinality, from the sample path such that the approximation provided by their empirical distribution is close to optimal. A novel class of methods is proposed, based on minimisation of a kernel Stein discrepancy (KSD), that is suitable when the gradient of the log-target can be evaluated and an approximation using a small number of states is required. To minimise the KSD, we consider greedily scanning the entire MCMC output to select one point at a time, as well as selecting more than one point at a time (making the algorithm non-myopic), and mini-batching the candidate set (making the algorithm non-greedy). Theoretical results guarantee consistency of these methods, and their effectiveness is demonstrated in the challenging context of parameter inference for ordinary differential equations.\n\nBiography: Marina Riabiz obtained her undergraduate and master’s degrees in Mathematical Engineering from Politecnico di Milano, Italy, specialising in Applied Statistics. She completed her PhD in the Signal Processing Group, Information Engineering, at the University of Cambridge, UK, working on latent variable models for Bayesian inference with stable distributions and processes. She then joined King’s College London in 2018 for her postdoc in the Cardiac Electro-Mechanics Research Group (School of Biomedical Engineering and Imaging Sciences), working on uncertainty quantification for cardiac myocyte models. During this time, she was also a visiting researcher at the Alan Turing Institute. 
In 2021 Marina joined the Department of Mathematics (KCL) as a Lecturer in Statistics.\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20230929T120000 DTEND;TZID=/Europe/London:20230929T130000 DTSTAMP;TZID=/Europe/London: END:VEVENT BEGIN:VEVENT UID:seminar-20231006120000@ucl-ellis.github.io LOCATION:Function Space UCL Centre for Artificial Intelligence 1st Floor 90 High Holborn London WC1V 6BH SUMMARY:DeepMind/ELLIS CSML Seminar: Peter Orbanz DESCRIPTION:Peter Orbanz (Gatsby Computational Neuroscience Unit UCL): Learning functions with crystallographic symmetries\n\nLocation: Function Space, UCL Centre for Artificial Intelligence, 1st Floor, 90 High Holborn, London WC1V 6BH\n\nLink: https://ucl.zoom.us/j/97245943682\n\nAbstract: The symmetries of crystals, periodic tilings and similar repetitive geometries are described by a class of groups called crystallographic groups. Motivated by problems in materials science, I will explain how to obtain a representation of continuous functions invariant under such a group, (1) by factoring through a structure that geometers call an orbifold and (2) by a certain generalization of the Fourier transform. This representation is constructive, can be implemented algorithmically, and allows us to construct machine learning models with crystallographic symmetries, such as neural networks, kernel machines, and Gaussian processes. \n\nBiography: Peter Orbanz is a Professor of Machine Learning in the Gatsby Unit. He moved here from Columbia University, where he was associate professor of statistics. 
He has also been a postdoc at Cambridge, an office mate of Marc Deisenroth, a Microsoft employee, and a PhD student at ETH Zurich, in no particular order.\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20231006T120000 DTEND;TZID=/Europe/London:20231006T130000 DTSTAMP;TZID=/Europe/London: END:VEVENT BEGIN:VEVENT UID:seminar-20231013120000@ucl-ellis.github.io LOCATION:Function Space UCL Centre for Artificial Intelligence 1st Floor 90 High Holborn London WC1V 6BH SUMMARY:DeepMind/ELLIS CSML Seminar: Anastasia Mantziou DESCRIPTION:Anastasia Mantziou (Alan Turing Institute): Bayesian model-based clustering for populations of network data\n\nLocation: Function Space, UCL Centre for Artificial Intelligence, 1st Floor, 90 High Holborn, London WC1V 6BH\n\nLink: https://ucl.zoom.us/j/97245943682\n\nAbstract: There is increasing appetite for analysing populations of network data due to the fast-growing body of applications demanding such methods. While methods exist to provide readily interpretable summaries of heterogeneous network populations, these are often descriptive or ad hoc, lacking any formal justification. In contrast, principled analysis methods often provide results difficult to relate back to the applied problem of interest. Motivated by two complementary applied examples, we develop a Bayesian framework to appropriately model complex heterogeneous network populations, whilst also allowing analysts to gain insights from the data, and make inferences most relevant to their needs. The first application involves a study in Computer Science measuring human movements across a University. The second analyses data from Neuroscience investigating relationships between different regions of the brain. While both applications entail analysis of a heterogeneous population of networks, network sizes vary considerably. 
We focus on the problem of clustering the elements of a network population, where each cluster is characterised by a network representative. We take advantage of the Bayesian machinery to simultaneously infer the cluster membership, the representatives, and the community structure of the representatives, thus allowing intuitive inferences to be made. The implementation of our method on the human movement study reveals interesting movement patterns of individuals in clusters, readily characterised by their network representative. For the brain networks application, our model reveals a cluster of individuals with different network properties of particular interest in Neuroscience. The performance of our method is additionally validated in extensive simulation studies.\n\nBiography: Anastasia is a Postdoctoral Research Associate at The Alan Turing Institute supervised by Gesine Reinert and Mihai Cucuringu from the University of Oxford. Prior to that, she was a Research Assistant in statistical cyber-security at Imperial College London. She completed her PhD in Statistics at Lancaster University under the supervision of Dr Simon Lunagomez, Dr Robin Mitra and Professor Paul Fearnhead. Her research interests include network analysis, Bayesian methods and topic modelling. Her research has been applied to networks emerging from various scientific fields such as neuroscience, ecology and computer science (human tracking systems). 
Anastasia is currently working on network time series data with applications in economics, as part of the economic networks and transaction data project at The Alan Turing Institute.\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20231013T120000 DTEND;TZID=/Europe/London:20231013T130000 DTSTAMP;TZID=/Europe/London: END:VEVENT BEGIN:VEVENT UID:seminar-20231020120000@ucl-ellis.github.io LOCATION:Function Space UCL Centre for Artificial Intelligence 1st Floor 90 High Holborn London WC1V 6BH SUMMARY:DeepMind/ELLIS CSML Seminar: Petros Dellaportas DESCRIPTION:Petros Dellaportas (University College London): Can independent Metropolis samplers beat Monte Carlo?\n\nLocation: Function Space, UCL Centre for Artificial Intelligence, 1st Floor, 90 High Holborn, London WC1V 6BH\n\nLink: https://ucl.zoom.us/j/97245943682\n\nAbstract: Assume that we would like to estimate the expected value of a function $f$ with respect to a density $\pi$ by using an importance density function $q$. We prove that if $\pi$ and $q$ are close enough under KL divergence, an independent Metropolis sampler estimator that obtains samples from $\pi$ with proposal density $q$, enriched with a variance reduction computational strategy based on control variates, achieves a smaller asymptotic variance than crude Monte Carlo. We illustrate our results in challenging option pricing problems that require Monte Carlo estimation. 
Furthermore, we propose an automatic sampling methodology based on adaptive independent Metropolis and we demonstrate its applicability in option pricing and Bayesian inference problems.\n\nBiography: Petros is a professor at UCL and Athens University of Economics and Business.\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20231020T120000 DTEND;TZID=/Europe/London:20231020T130000 DTSTAMP;TZID=/Europe/London: END:VEVENT BEGIN:VEVENT UID:seminar-20231027120000@ucl-ellis.github.io LOCATION:Function Space UCL Centre for Artificial Intelligence 1st Floor 90 High Holborn London WC1V 6BH SUMMARY:DeepMind/ELLIS CSML Seminar: Yu Luo DESCRIPTION:Yu Luo (King's College London): Bayesian estimation using loss functions\n\nLocation: Function Space, UCL Centre for Artificial Intelligence, 1st Floor, 90 High Holborn, London WC1V 6BH\n\nLink: https://ucl.zoom.us/j/97245943682\n\nAbstract: In the usual Bayesian setting, a full probabilistic model is required to link the data and parameters, and the form of this model and the inference and prediction mechanisms are specified via de Finetti's representation. In general, such a formulation is not robust to model mis-specification of its component parts. An alternative approach is to draw inference based on loss functions, where the quantity of interest is defined as a minimizer of some expected loss, and to construct posterior distributions based on the loss-based formulation; this strategy underpins the construction of the Gibbs posterior. We develop a Bayesian non-parametric approach; specifically, we generalize the Bayesian bootstrap, and specify a Dirichlet process model for the distribution of the observables. We implement this using direct prior-to-posterior calculations, but also using predictive sampling. The two updating frameworks yield the same posterior distribution under the exchangeability assumption and guarantee consistent estimation under mild conditions. 
We also study the assessment of posterior validity for non-standard Bayesian calculations. The methodology is demonstrated via the semi-parametric linear model.\n\nBiography: Yu is currently a Lecturer in Statistics at King’s College London. He finished his PhD at McGill University under the supervision of Prof. David Stephens and Dr. David Buckeridge. His principal research focus has been on developing methodology and computational tools to solve emerging problems in biomedicine and science more generally, especially in fully Bayesian settings. Through his training and collaborations, he has developed interests in biostatistics, mixture models, hidden Markov models and causal inference.\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20231027T120000 DTEND;TZID=/Europe/London:20231027T130000 DTSTAMP;TZID=/Europe/London: END:VEVENT BEGIN:VEVENT UID:seminar-20231103120000@ucl-ellis.github.io LOCATION:Function Space UCL Centre for Artificial Intelligence 1st Floor 90 High Holborn London WC1V 6BH SUMMARY:DeepMind/ELLIS CSML Seminar: José Miguel Hernández Lobato DESCRIPTION:José Miguel Hernández Lobato (University of Cambridge): Normalizing Flows for Molecular Modeling\n\nLocation: Function Space, UCL Centre for Artificial Intelligence, 1st Floor, 90 High Holborn, London WC1V 6BH\n\nLink: https://ucl.zoom.us/j/97245943682\n\nAbstract: Normalizing flows are tractable density models that can approximate complicated target distributions, e.g. Boltzmann distributions of physical systems. However, current methods for training flows either suffer from mode-seeking behavior, use samples from the target generated beforehand by expensive MCMC methods, or use stochastic losses that have high variance. To avoid these problems, we augment flows with annealed importance sampling (AIS) and minimize the mass-covering α-divergence with α=2, which minimizes importance weight variance. 
Our method, Flow AIS Bootstrap (FAB), uses AIS to generate samples in regions where the flow is a poor approximation of the target, facilitating the discovery of new modes. We apply FAB to multimodal targets and show that we can approximate them very accurately where previous methods fail. To the best of our knowledge, we are the first to learn the Boltzmann distribution of the alanine dipeptide molecule using only the unnormalized target density, without access to samples generated via Molecular Dynamics (MD) simulations: FAB produces better results than training via maximum likelihood on MD samples while using 100 times fewer target evaluations. After reweighting the samples, we obtain unbiased histograms of dihedral angles that are almost identical to the ground truth.\n\nBiography: José Miguel is Professor of Machine Learning in the Department of Engineering at the University of Cambridge, UK. Before joining Cambridge as faculty, he was a postdoctoral fellow in the Harvard Intelligent Probabilistic Systems group at Harvard University, working with Ryan Adams, and before this, a postdoctoral research associate in the Machine Learning Group at the University of Cambridge (UK), working with Zoubin Ghahramani. José Miguel completed his Ph.D. and M.Phil. in Computer Science at the Computer Science Department of Universidad Autónoma de Madrid (Spain), where he also obtained a B.Sc. in Computer Science, with a special prize for the best academic record on graduation. 
José Miguel's research focuses on probabilistic machine learning, with a particular interest in deep generative models, Bayesian optimization, approximate inference, Bayesian neural networks and applications of these methods to real-world problems such as automatic molecular design.\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20231103T120000 DTEND;TZID=/Europe/London:20231103T130000 DTSTAMP;TZID=/Europe/London: END:VEVENT BEGIN:VEVENT UID:seminar-20231110120000@ucl-ellis.github.io LOCATION:Function Space UCL Centre for Artificial Intelligence 1st Floor 90 High Holborn London WC1V 6BH SUMMARY:DeepMind/ELLIS CSML Seminar: Jonas Peters DESCRIPTION:Jonas Peters (ETH Zurich): Instrumental Time Series and Effect-Invariance for Policy Generalization\n\nLocation: Function Space, UCL Centre for Artificial Intelligence, 1st Floor, 90 High Holborn, London WC1V 6BH\n\nLink: https://ucl.zoom.us/j/97245943682\n\nAbstract: We present two recent works that have helped us analyze real-world data: the first paper (https://arxiv.org/abs/2203.06056) considers the problem of instrumental variables in a vector auto-regressive setting; the second paper (https://arxiv.org/abs/2306.10983) introduces effect-invariance, which can help to learn policies that generalize better between subjects.\n\nBiography: Jonas is interested in using different types of data to predict the effect of interventions and to build statistical methods that are robust with respect to distributional shifts. He seeks to combine theory and methodology and tries to let real-world applications guide his research. His work relates to areas such as causal inference, distribution generalization, dynamical systems, policy learning, graphical models, and independence testing. Since 2023, Jonas has been a professor of statistics at ETH Zurich. 
Previously, he was a professor in the Department of Mathematical Sciences at the University of Copenhagen and a group leader at the Max-Planck-Institute for Intelligent Systems in Tuebingen. He studied Mathematics at the University of Heidelberg and the University of Cambridge and obtained his PhD jointly from MPI and ETH.\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20231110T120000 DTEND;TZID=/Europe/London:20231110T130000 DTSTAMP;TZID=/Europe/London: END:VEVENT BEGIN:VEVENT UID:seminar-20231117120000@ucl-ellis.github.io LOCATION:Function Space UCL Centre for Artificial Intelligence 1st Floor 90 High Holborn London WC1V 6BH SUMMARY:DeepMind/ELLIS CSML Seminar: Rajen Shah DESCRIPTION:Rajen Shah (University of Cambridge): Rank-transformed subsampling: Inference for multiple data splitting and exchangeable p-values\n\nLocation: Function Space, UCL Centre for Artificial Intelligence, 1st Floor, 90 High Holborn, London WC1V 6BH\n\nLink: https://ucl.zoom.us/j/97245943682\n\nAbstract: Many testing problems are readily amenable to randomised tests such as those employing data splitting, which divide the data into disjoint parts for separate purposes. However, despite their usefulness in principle, randomised tests have obvious drawbacks. Firstly, two analyses of the same dataset may lead to different results. Secondly, the test typically loses power because it does not fully utilise the entire sample. As a remedy to these drawbacks, we study how to combine the test statistics or p-values resulting from multiple random realisations, such as random data splits. We introduce rank-transformed subsampling as a general method for delivering large-sample inference about the combined statistic or p-value under mild assumptions. 
We apply our methodology to a range of problems, including testing unimodality in high-dimensional data, testing goodness-of-fit of parametric quantile regression models, testing for no direct effect in a sequentially randomised trial, and calibrating cross-fit double machine learning confidence intervals. For the latter, our method improves coverage in finite samples; for the testing problems, it is able to derandomise and improve power. Moreover, in contrast to existing p-value aggregation schemes that can be highly conservative, our method enjoys type-I error control that asymptotically approaches the nominal level. This is joint work with Richard Guo. Underlying paper: https://arxiv.org/pdf/2301.02739.pdf\n\nBiography: Rajen Shah is currently a Professor of Statistics at the University of Cambridge, having obtained his PhD there in 2014. His research interests include high-dimensional and nonparametric statistics and causal inference.\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20231117T120000 DTEND;TZID=/Europe/London:20231117T130000 DTSTAMP;TZID=/Europe/London: END:VEVENT BEGIN:VEVENT UID:seminar-20231124120000@ucl-ellis.github.io LOCATION:Function Space UCL Centre for Artificial Intelligence 1st Floor 90 High Holborn London WC1V 6BH SUMMARY:DeepMind/ELLIS CSML Seminar: Alberto Caron DESCRIPTION:Alberto Caron (Alan Turing Institute): Bayesian Structure Learning with Random Neighbourhood Samplers\n\nLocation: Function Space, UCL Centre for Artificial Intelligence, 1st Floor, 90 High Holborn, London WC1V 6BH\n\nLink: https://ucl.zoom.us/j/97245943682\n\nAbstract: Structure learning is of interest in many disciplines (e.g., genomics, biology, ecology), where the aim is to reconstruct a graphical model, in the form of a Directed Acyclic Graph (DAG), underlying a set of random variables. 
Bayesian methods have demonstrated superiority, particularly in low-data regimes, for their ability to learn a distribution over possible DAGs rather than just a maximum a posteriori (MAP) estimate. After briefly introducing the problem of (Bayesian) structure learning and reviewing some of the popular MCMC-based approaches, we propose a novel sampler, PARNI-DAG, that performs efficient sampling from the posterior on DAGs via a locally informed, adaptive random neighbourhood proposal that results in better mixing properties. We demonstrate PARNI-DAG's mixing properties and accuracy in DAG learning on a series of experimental setups.\n\nBiography: Alberto is a Research Associate at The Alan Turing Institute, affiliated with the AI for Cyber-Defence team, where he currently works on projects involving causality and sequential decision making under uncertainty. Prior to that, he completed his PhD studies on Bayesian Causal Inference at the UCL Department of Statistical Science, under the supervision of Prof. Ioanna Manolopoulou and Prof. Gianluca Baio.\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20231124T120000 DTEND;TZID=/Europe/London:20231124T130000 DTSTAMP;TZID=/Europe/London: END:VEVENT BEGIN:VEVENT UID:seminar-20240308120000@ucl-ellis.github.io LOCATION:Function Space UCL Centre for Artificial Intelligence 1st Floor 90 High Holborn London WC1V 6BH SUMMARY:DeepMind/ELLIS CSML Seminar: Qingyuan Zhao DESCRIPTION:Qingyuan Zhao (University of Cambridge): Design: The Missing Concept in AI\n\nLocation: Function Space, UCL Centre for Artificial Intelligence, 1st Floor, 90 High Holborn, London WC1V 6BH\n\nLink: https://ucl.zoom.us/j/97245943682\n\nAbstract: Experimental design was introduced nearly a century ago and embodied the dawn of modern statistics. Today, design is more broadly understood as the process of data collection/preparation and is intimately related to the concept of causal identification. 
Strikingly, design is largely missing in the current development and discussion of AI. I will share some stories based on research from my group and collaborations, in the hope that they will help to promote awareness of design in the interdisciplinary field of data science. The research works I will discuss range from applied analyses of COVID-19, policing, and biodiversity conservation to theoretical foundations of randomized experiments and causal graphical models, but they share one common theme: design trumps analysis.\n\nBiography: Qingyuan Zhao is an Associate Professor of Statistics in the Statistical Laboratory, Department of Pure Mathematics and Mathematical Statistics (DPMMS) at the University of Cambridge, a Fellow of Corpus Christi College, and an Associate Faculty of the Cambridge Centre for AI in Medicine (CCAIM). He is interested in improving the quality and appraisal of statistical research, including new methodology and a better understanding of causal inference, novel study designs, sensitivity analysis, multiple testing, and selective inference.\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20240308T120000 DTEND;TZID=/Europe/London:20240308T130000 DTSTAMP;TZID=/Europe/London: END:VEVENT BEGIN:VEVENT UID:seminar-20240315120000@ucl-ellis.github.io LOCATION:Function Space UCL Centre for Artificial Intelligence 1st Floor 90 High Holborn London WC1V 6BH SUMMARY:DeepMind/ELLIS CSML Seminar: Francesco Quinzan DESCRIPTION:Francesco Quinzan (University of Oxford): Trustworthy AI: Exploring Causality and Generative Models for Better-Informed Predictions\n\nLocation: Function Space, UCL Centre for Artificial Intelligence, 1st Floor, 90 High Holborn, London WC1V 6BH\n\nLink: https://ucl.zoom.us/j/97245943682\n\nAbstract: Recent successes of AI and Machine Learning have ignited a fast transfer of technology from research into products and government services. 
This phenomenon has created a range of problems, which can be broadly attributed to the interaction between technology and society. Examples of these problems are bias and unfairness, lack of robustness, and lack of transparency. In this talk, I will discuss some of the main challenges in Trustworthy AI, focusing on various applications, including data-driven health care and offline RL. I will argue that it is possible to design AI systems that are robust and capable of generalizing effectively by uncovering the causal mechanisms of the underlying data-generating process. I will also discuss how state-of-the-art generative models can be used on top of these techniques to further enhance generalization performance. I will illustrate recent advancements in this field and discuss possible future directions.\n\nBiography: Francesco is an associate researcher at the CS Department at the University of Oxford, hosted by Marta Kwiatkowska. He is also an ELSA Research Associate. Previously, Francesco was a Postdoc at the Division of Decision and Control Systems at KTH, where he worked with Stefan Bauer and Cristian Rojas. He obtained his Ph.D. in Computer Science from the Hasso Plattner Institute in Germany. Francesco visited various institutes and research groups, including the Max Planck Institute for Intelligent Systems, where he was hosted by Bernhard Schölkopf, and the Learning & Adaptive Systems Group at ETH. 
Francesco studied mathematics at the University of Roma Tre, where he graduated with honours.\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20240315T120000 DTEND;TZID=/Europe/London:20240315T130000 DTSTAMP;TZID=/Europe/London: END:VEVENT BEGIN:VEVENT UID:seminar-20240322120000@ucl-ellis.github.io LOCATION:Function Space UCL Centre for Artificial Intelligence 1st Floor 90 High Holborn London WC1V 6BH SUMMARY:DeepMind/ELLIS CSML Seminar: Aleksandar Botev DESCRIPTION:Aleksandar Botev (Google DeepMind): Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models\n\nLocation: Function Space, UCL Centre for Artificial Intelligence, 1st Floor, 90 High Holborn, London WC1V 6BH\n\nLink: https://ucl.zoom.us/j/97245943682\n\nAbstract: In the last few years, transformers have become the default architecture for sequential modelling tasks like language modelling. However, a new family of models - state space models - is challenging the status quo. In this talk we will investigate the recent progress on this class of models and provide context and different perspectives on them from both theoretical and practical points of view. We will argue that not only does the choice of recurrent layer matter, but the whole block design and architecture also play a huge role in their success. With this we will present Griffin - a hybrid of the Real-Gated Linear Recurrent Unit and Local Attention that achieves state-of-the-art performance similar to Transformers, but is significantly faster at inference, in both latency and throughput. We will also show that these models can leverage much longer contexts than they were trained on and will discuss interesting implications of this.\n\nBiography: Alex is a Research Scientist in Machine Learning at Google DeepMind. He has worked on generative models, second-order optimization, Bayesian methods and ML applied to physics, and is now finally dabbling in LLMs. 
Previously, he completed his PhD at UCL under the supervision of David Barber.\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20240322T120000 DTEND;TZID=/Europe/London:20240322T130000 DTSTAMP;TZID=/Europe/London: END:VEVENT BEGIN:VEVENT UID:seminar-20240503120000@ucl-ellis.github.io LOCATION:Function Space UCL Centre for Artificial Intelligence 1st Floor 90 High Holborn London WC1V 6BH SUMMARY:DeepMind/ELLIS CSML Seminar: Kostas Margellos DESCRIPTION:Kostas Margellos (University of Oxford): Optimization under the lens of compression learning: Trading feasibility to performance\n\nLocation: Function Space, UCL Centre for Artificial Intelligence, 1st Floor, 90 High Holborn, London WC1V 6BH\n\nLink: https://ucl.zoom.us/j/97245943682\n\nAbstract: In this talk we consider convex optimization problems affected by uncertainty, where uncertainty is represented by means of samples/scenarios. We show how finite-sample complexity bounds for the generalization properties of the resulting solutions can be obtained using tools from statistical learning theory based on probably approximately correct learning. Specifically, we view this problem under a compression learning lens that allows for sharper bounds compared to Vapnik-Chervonenkis results. We next discuss how to trade (probabilistic) feasibility for optimality by introducing a sample discarding procedure. Existing results in this direction are not tight, often leading to conservative behaviour as far as performance is concerned. We show how to overcome this and achieve a tight quantification of the feasibility-performance trade-off using a sequential methodology for sample discarding. Moreover, we discuss certain aspects of applying such methodology in a multi-agent setting, with each agent having access to a private set of samples. 
\n\nBiography: Kostas Margellos received the Diploma degree in electrical engineering from the University of Patras, Patras, Greece, in 2008, and the Ph.D. degree in control engineering from ETH Zürich, Zürich, Switzerland, in 2012. He spent 2013–2015 as a Postdoctoral Researcher with ETH Zürich; UC Berkeley, Berkeley, CA, USA; and Politecnico di Milano, Milan, Italy, respectively. In 2016, he joined the Control Group, Department of Engineering Science, University of Oxford, Oxford, U.K., where he is currently an Associate Professor. He is also a Fellow of Reuben College, Oxford, U.K., and a Lecturer with Worcester College, Oxford, U.K. His research interests include optimization and control of complex uncertain systems, with applications to energy and transportation networks. He is an Associate Editor for Automatica and IEEE Control Systems Letters, and has been general co-chair of L4DC 2024.\n\nhttps://ucl-ellis.github.io/dm_csml_seminar_home CLASS:PUBLIC DTSTART;TZID=/Europe/London:20240503T120000 DTEND;TZID=/Europe/London:20240503T130000 DTSTAMP;TZID=/Europe/London: END:VEVENT END:VCALENDAR