
Rank-transformed subsampling: Inference for multiple data splitting and exchangeable p-values


Speaker

Rajen Shah

Affiliation

University of Cambridge

Date

Friday, 17 November 2023

Time

12:00-13:00

Location

Function Space, UCL Centre for Artificial Intelligence, 1st Floor, 90 High Holborn, London WC1V 6BH

Link

https://ucl.zoom.us/j/97245943682

Event series

DeepMind/ELLIS CSML Seminar Series

Abstract

Many testing problems are readily amenable to randomised tests such as those employing data splitting, which divide the data into disjoint parts for separate purposes. However, despite their usefulness in principle, randomised tests have obvious drawbacks. Firstly, two analyses of the same dataset may lead to different results. Secondly, the test typically loses power because it does not fully utilise the entire sample. As a remedy to these drawbacks, we study how to combine the test statistics or p-values resulting from multiple random realisations, such as those obtained through random data splits. We introduce rank-transformed subsampling as a general method for delivering large-sample inference about the combined statistic or p-value under mild assumptions. We apply our methodology to a range of problems, including testing unimodality in high-dimensional data, testing goodness-of-fit of parametric quantile regression models, testing no direct effect in a sequentially randomised trial, and calibrating cross-fit double machine learning confidence intervals. For the last of these, our method improves coverage in finite samples, and for the testing problems, our method is able to derandomise and improve power. Moreover, in contrast to existing p-value aggregation schemes that can be highly conservative, our method enjoys type-I error control that asymptotically approaches the nominal level. This is joint work with Richard Guo. Underlying paper: https://arxiv.org/pdf/2301.02739.pdf
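To make the setting concrete, here is a minimal Python sketch of a randomised data-splitting test and of one existing conservative aggregation rule, "twice the median". It is not the rank-transformed subsampling method from the talk. The toy testing problem, the unit-variance assumption and the names `split_p_value` and `twice_median` are assumptions chosen purely for illustration.

```python
import math
import random
import statistics


def split_p_value(data, rng):
    """One randomised test of H0: every coordinate mean is zero.

    Illustrative assumption: observations have unit variance.
    Half A selects the coordinate with the largest absolute sample mean;
    half B runs a two-sided z-test on that coordinate only.
    """
    idx = list(range(len(data)))
    rng.shuffle(idx)  # the random split is the source of randomness in the test
    half = len(idx) // 2
    part_a = [data[i] for i in idx[:half]]
    part_b = [data[i] for i in idx[half:]]
    dim = len(data[0])
    means_a = [statistics.fmean([row[j] for row in part_a]) for j in range(dim)]
    j_star = max(range(dim), key=lambda j: abs(means_a[j]))
    z = statistics.fmean([row[j_star] for row in part_b]) * math.sqrt(len(part_b))
    return math.erfc(abs(z) / math.sqrt(2))  # two-sided standard normal p-value


def twice_median(p_values):
    """Conservative aggregate that remains valid under arbitrary dependence."""
    return min(1.0, 2.0 * statistics.median(p_values))


rng = random.Random(0)
# Simulated data generated under the null hypothesis: 200 observations, 5 coordinates.
data = [[rng.gauss(0.0, 1.0) for _ in range(5)] for _ in range(200)]
p_values = [split_p_value(data, rng) for _ in range(50)]
print("single-split p-values range from", min(p_values), "to", max(p_values))
print("twice-the-median aggregate:", twice_median(p_values))
```

Repeating the split on the same dataset gives a spread of p-values, which is the first drawback described above. The factor of two in the aggregate is the price paid for validity under arbitrary dependence between the splits, and it is this kind of conservativeness that the abstract contrasts with the proposed method.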

Biography

Rajen Shah is currently a Professor of Statistics at the University of Cambridge, having obtained his PhD there in 2014. His research interests include high-dimensional and nonparametric statistics and causal inference.