UCL ELLIS

NeurIPS Previews 2019


Claire Vernade




Thursday, 28 November 2019






Gordon Street 25, E28 Harrie Massey Lecture Theatre

Event series

DeepMind/ELLIS CSML Seminar Series


* Claire Vernade (DeepMind) --- Weighted Linear Bandits for Non-Stationary Environments
(Joint work with Yoan Russac (ENS), Olivier Cappé (CNRS))

We consider a stochastic linear bandit model in which the available actions correspond to arbitrary context vectors whose associated rewards follow a non-stationary linear regression model. In this setting, the unknown regression parameter is allowed to vary in time. To address this problem, we propose D-LinUCB, a novel optimistic algorithm based on discounted linear regression, where exponential weights are used to smoothly forget the past. This involves studying the deviations of the sequential weighted least-squares estimator under generic assumptions. As a by-product, we obtain novel deviation results that can be used beyond non-stationary environments. We provide theoretical guarantees on the behavior of D-LinUCB in both slowly-varying and abruptly-changing environments. We obtain an upper bound on the dynamic regret that is of order d^{2/3} B_T^{1/3} T^{2/3}, where B_T is a measure of non-stationarity (d and T being, respectively, dimension and horizon). This rate is known to be optimal. We also illustrate the empirical performance of D-LinUCB and compare it with recently proposed alternatives in simulated environments.
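
As a rough illustration of the discounted linear regression idea in this abstract (a minimal sketch, not the authors' implementation), the estimator below down-weights past observations by a factor gamma at every step. The class name, the recursive form of the update, the regularization parameter lam, and the toy change-point simulation are all assumptions made for this example. The confidence bonus that makes D-LinUCB optimistic is not included.

```python
import numpy as np


class DiscountedLeastSquares:
    """Exponentially weighted ridge regression (illustrative names, not from the paper)."""

    def __init__(self, dim, gamma=0.95, lam=1.0):
        self.gamma = gamma            # discount factor in (0, 1]; 1 means no forgetting
        self.lam = lam                # ridge regularization strength (assumed)
        self.V = lam * np.eye(dim)    # weighted design matrix plus lam * I
        self.b = np.zeros(dim)        # weighted sum of reward * action

    def update(self, action, reward):
        g = self.gamma
        # Discount the past, add the new observation, and top the regularizer
        # back up so that V stays equal to (weighted sum of a a^T) + lam * I.
        self.V = g * self.V + np.outer(action, action) \
            + (1.0 - g) * self.lam * np.eye(len(action))
        self.b = g * self.b + reward * action

    def estimate(self):
        # Weighted least-squares estimate of the regression parameter.
        return np.linalg.solve(self.V, self.b)


def tracking_error(gamma, seed=0, horizon=400):
    """Toy experiment: the parameter switches abruptly halfway through."""
    rng = np.random.default_rng(seed)
    theta_before = np.array([1.0, 0.0, 0.0])
    theta_after = np.array([0.0, 1.0, 0.0])
    est = DiscountedLeastSquares(dim=3, gamma=gamma)
    for t in range(horizon):
        theta = theta_before if t < horizon // 2 else theta_after
        action = rng.normal(size=3)
        reward = action @ theta + 0.1 * rng.normal()
        est.update(action, reward)
    # Distance between the final estimate and the current parameter.
    return np.linalg.norm(est.estimate() - theta_after)
```

In this toy setting, comparing tracking_error(gamma=0.95) with tracking_error(gamma=1.0) shows the effect of forgetting: without a discount the estimate stays close to an average of the old and new parameters.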

* Giulia Luise (UCL) --- Sinkhorn Barycenters with Free Support via Frank-Wolfe Algorithm
(Joint work with Saverio Salzo (IIT), Carlo Ciliberto (Imperial), Massimiliano Pontil (UCL))

We present a novel algorithm to estimate the barycenter of arbitrary probability distributions with respect to the Sinkhorn divergence. Based on a Frank-Wolfe optimization strategy, our approach proceeds by populating the support of the barycenter incrementally, without requiring any pre-allocation. We consider discrete as well as continuous distributions, proving convergence rates of the proposed algorithm in both settings. Key elements of our analysis are a new result showing that the Sinkhorn divergence on compact domains has Lipschitz continuous gradient with respect to the Total Variation and a characterization of the sample complexity of Sinkhorn potentials. Experiments validate the effectiveness of our method in practice.
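
For readers unfamiliar with the Sinkhorn divergence that this barycenter is defined with respect to, the sketch below computes it between two discrete distributions. It is only an illustration under stated assumptions: the squared Euclidean cost, the regularization eps, the fixed iteration count, and all function names are choices made here, and the Frank-Wolfe barycenter procedure itself is not shown.

```python
import numpy as np


def _logsumexp(m, axis):
    # Numerically stable log-sum-exp along one axis.
    mx = m.max(axis=axis, keepdims=True)
    return (mx + np.log(np.exp(m - mx).sum(axis=axis, keepdims=True))).squeeze(axis)


def ot_eps(x, a, y, b, eps=0.5, n_iter=500):
    """Entropic OT value between sum_i a_i delta_{x_i} and sum_j b_j delta_{y_j}.

    Runs log-domain Sinkhorn updates on the dual potentials (f, g) and returns
    the dual value <a, f> + <b, g>. Weights a and b must be strictly positive.
    """
    cost = np.sum((x[:, None, :] - y[None, :, :]) ** 2, axis=-1)  # assumed cost
    f = np.zeros(len(a))
    g = np.zeros(len(b))
    for _ in range(n_iter):
        f = -eps * _logsumexp((g[None, :] - cost) / eps + np.log(b)[None, :], axis=1)
        g = -eps * _logsumexp((f[:, None] - cost) / eps + np.log(a)[:, None], axis=0)
    return a @ f + b @ g


def sinkhorn_divergence(x, a, y, b, eps=0.5, n_iter=500):
    """Entropic OT value with its two self-terms subtracted."""
    return (ot_eps(x, a, y, b, eps, n_iter)
            - 0.5 * ot_eps(x, a, x, a, eps, n_iter)
            - 0.5 * ot_eps(y, b, y, b, eps, n_iter))
```

Subtracting the self-terms is what makes the divergence vanish when the two distributions coincide, which plain entropic optimal transport does not do.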

* Michael Arbel (Gatsby Unit, UCL) --- Maximum Mean Discrepancy Gradient Flow
(Joint work with Anna Korba (Gatsby Unit, UCL), Adil Salim (KAUST), Arthur Gretton (Gatsby Unit, UCL))

We construct a Wasserstein gradient flow of the maximum mean discrepancy (MMD) and study its convergence properties. The MMD is an integral probability metric defined for a reproducing kernel Hilbert space (RKHS), and serves as a metric on probability measures for a sufficiently rich RKHS. We obtain conditions for convergence of the gradient flow towards a global optimum, which can be related to particle transport when optimizing neural networks. We also propose a way to regularize this MMD flow, based on an injection of noise in the gradient. This algorithmic fix comes with theoretical and empirical evidence. The practical implementation of the flow is straightforward, since both the MMD and its gradient have simple closed-form expressions, which can be easily estimated with samples.
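
To make the closing remark about closed-form expressions concrete, here is a minimal sample-based sketch, assuming a Gaussian kernel with bandwidth sigma (the kernel choice, the function names, and the plain Euler step described afterwards are assumptions of this example, not details from the talk). It computes the biased (V-statistic) estimate of the squared MMD and the gradient of the witness function at each particle. The noise-injection regularization mentioned in the abstract is not included.

```python
import numpy as np


def gaussian_kernel(a, b, sigma=1.0):
    # Pairwise Gaussian kernel matrix between rows of a and rows of b.
    sq = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq / (2.0 * sigma**2))


def mmd2(x, y, sigma=1.0):
    """Biased (V-statistic) estimate of the squared MMD between samples x and y."""
    return (gaussian_kernel(x, x, sigma).mean()
            + gaussian_kernel(y, y, sigma).mean()
            - 2.0 * gaussian_kernel(x, y, sigma).mean())


def witness_gradient(x, y, sigma=1.0):
    """Gradient, at each particle x_i, of f(z) = mean_j k(z, x_j) - mean_j k(z, y_j)."""

    def grad_mean_kernel(z, pts):
        diff = z[:, None, :] - pts[None, :, :]                       # shape (n, m, d)
        k = np.exp(-np.sum(diff**2, axis=-1) / (2.0 * sigma**2))     # shape (n, m)
        # d/dz exp(-|z - p|^2 / (2 sigma^2)) = -(z - p) / sigma^2 * k(z, p)
        return -(k[:, :, None] * diff).mean(axis=1) / sigma**2

    return grad_mean_kernel(x, x) - grad_mean_kernel(x, y)
```

A discretized flow then moves the particles against this gradient, for example x = x - 0.1 * witness_gradient(x, y) repeated for a number of steps, with y a fixed sample from the target distribution.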

* Marcel Hirt (UCL) --- Copula-like Variational Inference
(Joint work with Petros Dellaportas, Alain Durmus (ENS Cachan))

This paper considers a new family of variational distributions motivated by Sklar’s theorem. This family is based on new copula-like densities on the hypercube with non-uniform marginals which can be sampled efficiently, i.e. with a complexity linear in the dimension d of the state space. Then, the proposed variational densities that we suggest can be seen as arising from these copula-like densities used as base distributions on the hypercube with Gaussian quantile functions and sparse rotation matrices as normalizing flows. The latter correspond to a rotation of the marginals with complexity O(d log d). We provide some empirical evidence that such a variational family can also approximate non-Gaussian posteriors and can be beneficial compared to Gaussian approximations. Our method performs largely comparably to state-of-the-art variational approximations on standard regression and classification benchmarks for Bayesian Neural Networks.