UCL ELLIS

Comparing apples and oranges -- reinterpreting common evaluation metrics in classification


Peter Flach


University of Bristol


Friday, 17 October 2014






Malet Place Engineering Building 1.02

Event series

DeepMind/ELLIS CSML Seminar Series



A wide range of evaluation metrics exists in supervised learning, including accuracy, area under the ROC curve (AUC) and Brier score. At first sight these metrics assess different aspects of a predictive model's performance: accuracy measures classification performance (ability to assign the correct class), AUC measures ranking performance (ability to score positives higher than negatives) and Brier score assesses scoring performance (ability to assign probabilities close to the 'ideal' 0/1 values). While it thus appears that these measures are not directly comparable, in this talk I will discuss recent results that demonstrate how each measure can be directly related to expected misclassification loss under varying operating conditions, utilising the notion of a threshold selection method. Among these results is a new interpretation, and rehabilitation, of AUC in terms of expected misclassification loss under a novel rate-driven threshold selection method. I will also demonstrate how each evaluation metric can be visualised in cost space, and discuss the importance and effect of classifier calibration. Finally, I will describe ongoing work that investigates how these results can be related to different cost models such as the F-measure.
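For readers unfamiliar with the three metrics named in the abstract, the following is a minimal Python sketch of how each might be computed for a binary classifier. It is an illustration only and is not taken from the talk or the associated papers: the function names, the fixed 0.5 decision threshold, the toy labels and scores, and the choice of the mean-squared-error form of the Brier score are all assumptions made for this example.

```python
def accuracy(labels, scores, threshold=0.5):
    """Classification performance: fraction of correct class assignments.
    The threshold of 0.5 is an assumption; other operating conditions
    would call for a different threshold."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(predictions, labels)) / len(labels)


def auc(labels, scores):
    """Ranking performance: fraction of (positive, negative) pairs in which
    the positive receives the higher score; ties count as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def brier_score(labels, scores):
    """Scoring performance: mean squared difference between the predicted
    probability of the positive class and the 0/1 label."""
    return sum((s - y) ** 2 for s, y in zip(scores, labels)) / len(labels)


# Hypothetical toy data: 1 = positive, 0 = negative; scores are
# estimated probabilities of the positive class.
labels = [1, 1, 0, 1, 0, 0]
scores = [0.9, 0.8, 0.7, 0.4, 0.3, 0.1]

print("accuracy:", accuracy(labels, scores))
print("AUC:", auc(labels, scores))
print("Brier score:", brier_score(labels, scores))
```

On this toy data the three numbers differ because each responds to a different property of the same scores: accuracy depends on the chosen threshold, AUC only on the ordering of the scores, and the Brier score on their actual values.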

Most of the talk is based on joint work with José Hernández-Orallo and Cèsar Ferri; some recent publications are accessible here:



Peter Flach has been Professor of Artificial Intelligence at the University of Bristol since 2003. An internationally leading researcher in the areas of mining highly structured data and the evaluation and improvement of machine learning models using ROC analysis, he has also published on the logic and philosophy of machine learning, and on the combination of logic and probability. He is the author of Simply Logical: Intelligent Reasoning by Example (John Wiley, 1994) and Machine Learning: the Art and Science of Algorithms that Make Sense of Data (Cambridge University Press, 2012).

Prof Flach is the Editor-in-Chief of the Machine Learning journal, one of the two top journals in the field, which has been published for over 25 years, first by Kluwer and now by Springer. He was Programme Co-Chair of the 1999 International Conference on Inductive Logic Programming, the 2001 European Conference on Machine Learning, the 2009 ACM Conference on Knowledge Discovery and Data Mining, and the 2012 European Conference on Machine Learning and Knowledge Discovery in Databases in Bristol.

Slides for the talk: PDF