UCL ELLIS

Bayesian learning with big data: virtual vector machines and Gaussian processes with sparse eigenvalues


Yuan (Alan) Qi


Purdue University


Monday, 02 July 2012






Darwin B15 Biochemistry LT

Event series

DeepMind/ELLIS CSML Seminar Series


Title: Bayesian learning with big data: virtual vector machines and Gaussian processes with sparse eigenvalues


In this talk I will cover two topics that have become increasingly important with big data: online learning and sparse Gaussian process models.

First, in a typical online learning scenario, a learner must process a large data stream using a small memory buffer. This requirement usually conflicts with the learner's primary goal of prediction accuracy. To address this dilemma, we introduce a novel Bayesian online classification algorithm, the Virtual Vector Machine, which smoothly trades off prediction accuracy against memory size. The virtual vector machine summarizes the information contained in the preceding data stream by a Gaussian distribution over the classification weights plus a constant number of virtual data points. The extra information provided by the virtual points leads to improved predictive accuracy over previous online classification algorithms.

Second, we propose a sparse Gaussian process model, EigenGP, based on the Karhunen-Loève (KL) expansion of a GP prior. We use the Nyström approximation to obtain eigenfunctions of the covariance function and an empirical Bayesian approach to select among them. By selecting eigenfunctions of Gaussian kernels that are associated with data clusters, EigenGP is also suitable for semi-supervised learning. Our experimental results demonstrate improved predictive performance of EigenGP over several state-of-the-art sparse GP and semi-supervised learning methods for regression, classification, and semi-supervised classification.
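To make the online-learning half of the abstract concrete, here is a minimal sketch of the Gaussian-posterior component it describes: a Bayesian classifier that keeps a Gaussian distribution over the weights and updates it one example at a time by moment matching (assumed-density filtering with a probit likelihood). This is a generic textbook construction for illustration only, not the Virtual Vector Machine itself, which additionally maintains a buffer of virtual points; all class and variable names below are hypothetical.

```python
import numpy as np
from scipy.stats import norm

class ADFProbitClassifier:
    """Online Bayesian linear classifier: maintains a Gaussian posterior
    N(m, V) over the weights and updates it per example by moment matching
    (assumed-density filtering with a probit likelihood).
    Illustrative sketch only; the Virtual Vector Machine in the talk adds
    a constant-size buffer of virtual points on top of such a Gaussian."""

    def __init__(self, dim, prior_var=10.0):
        self.m = np.zeros(dim)            # posterior mean of the weights
        self.V = prior_var * np.eye(dim)  # posterior covariance

    def update(self, x, y):
        # y in {-1, +1}; probit likelihood Phi(y * w^T x)
        Vx = self.V @ x
        s = 1.0 + x @ Vx                  # predictive variance of w^T x plus unit noise
        z = y * (self.m @ x) / np.sqrt(s)
        alpha = norm.pdf(z) / norm.cdf(z)
        # Standard moment-matching updates for the probit model
        self.m = self.m + (y * alpha / np.sqrt(s)) * Vx
        self.V = self.V - (alpha * (z + alpha) / s) * np.outer(Vx, Vx)

    def predict(self, X):
        return np.sign(X @ self.m)

# Toy data stream: two 2D Gaussian blobs, with a bias feature appended
rng = np.random.default_rng(1)
n = 500
y = rng.choice([-1, 1], size=n)
X = y[:, None] * 1.5 + rng.normal(size=(n, 2))
X = np.hstack([X, np.ones((n, 1))])

clf = ADFProbitClassifier(dim=3)
for xi, yi in zip(X, y):                  # one pass over the stream
    clf.update(xi, yi)
acc = np.mean(clf.predict(X) == y)
```

After one pass over the stream the classifier's Gaussian posterior summarizes all past examples in a fixed amount of memory; the trade-off the talk addresses is what such a summary loses relative to keeping (virtual) data points alongside it.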
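For the second topic, the Nyström step that EigenGP builds on can be sketched directly: approximate the top eigenfunctions of a kernel from a small set of inducing points, then use them as a low-rank basis. The code below is a generic illustration of the Nyström eigenfunction approximation for a squared-exponential kernel, not the EigenGP model or its empirical-Bayes selection; function names and parameters are assumptions for the sketch.

```python
import numpy as np

def rbf_kernel(X, Y, lengthscale=1.0):
    # Squared-exponential (Gaussian) kernel matrix between rows of X and Y.
    d2 = np.sum(X**2, 1)[:, None] + np.sum(Y**2, 1)[None, :] - 2 * X @ Y.T
    return np.exp(-0.5 * d2 / lengthscale**2)

def nystrom_eigenfunctions(X, Z, k, lengthscale=1.0):
    """Approximate the top-k kernel eigenfunctions, evaluated at X,
    from m inducing points Z (Nystrom method)."""
    m = Z.shape[0]
    Kzz = rbf_kernel(Z, Z, lengthscale)   # m x m kernel on inducing points
    Kxz = rbf_kernel(X, Z, lengthscale)   # n x m cross-kernel
    evals, evecs = np.linalg.eigh(Kzz)    # ascending order
    idx = np.argsort(evals)[::-1][:k]     # keep the top-k eigenpairs
    lam, U = evals[idx], evecs[:, idx]
    # Nystrom extension: phi_j(x) ~ (sqrt(m) / lam_j) * k(x, Z) u_j
    Phi = (Kxz @ U) / lam * np.sqrt(m)
    return Phi, lam / m                   # eigenfunction values and eigenvalue estimates

# Usage: low-rank reconstruction of the full kernel matrix
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))
Z = X[rng.choice(200, 20, replace=False)]  # inducing points from the data
Phi, lam = nystrom_eigenfunctions(X, Z, k=10)
K_approx = (Phi * lam) @ Phi.T             # rank-10 approximation of K
K_true = rbf_kernel(X, X)
rel_err = np.linalg.norm(K_true - K_approx) / np.linalg.norm(K_true)
```

A GP prior expanded in these eigenfunctions (the KL expansion) costs O(nk) per prediction instead of O(n^2), which is the source of the sparsity the abstract refers to; EigenGP's contribution is choosing *which* eigenfunctions to keep via empirical Bayes.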

Slides for the talk: PDF