Ranking earthquake forecasts using proper scoring rules: Binary events in a low probability environment

Francesco Serafini, Mark Naylor, Finn Lindgren, Maximilian Werner, Ian Main

Research output: Contribution to journal › Article › peer-review

Abstract

Operational earthquake forecasting for risk management and communication during seismic sequences depends on our ability to select an optimal forecasting model. To do this, we need to compare the performance of competing models in prospective experiments, and to rank their performance according to the outcome using a fair, reproducible, and reliable method, usually in a low-probability environment. The Collaboratory for the Study of Earthquake Predictability (CSEP) conducts prospective earthquake forecasting experiments around the globe. In this framework, it is crucial that the metrics employed to rank the competing forecasts are ‘proper’, meaning that, on average, they prefer the data-generating model. We prove that the Parimutuel Gambling score, proposed, and in some cases applied, as a metric for comparing probabilistic seismicity forecasts, is in general ‘improper’. In the special case where it is proper, we show it can still be used improperly. We demonstrate the conclusions both analytically and graphically, providing a set of simulation-based techniques that can be used to assess whether a score is proper. These techniques require only a data-generating model and at least two forecasts to be compared. We compare the Parimutuel Gambling score’s performance with two commonly used proper scores (the Brier and logarithmic scores), using confidence intervals to account for the uncertainty around the observed score difference. We suggest that using confidence intervals enables a rigorous approach to distinguishing between the predictive skills of candidate forecasts, in addition to their rankings. Our analysis shows that the Parimutuel Gambling score is biased, and the direction of the bias depends on the forecasts taking part in the experiment. Our findings suggest that the Parimutuel Gambling score should not be used to distinguish between multiple competing forecasts, and that care should be taken in the case where only two are being compared.
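
The notion of a 'proper' score described in the abstract can be illustrated with a minimal Python sketch for a single binary event. It is not the authors' code, and it computes exact expectations rather than using the simulation-based techniques of the paper. The true probability `p_true`, the three forecast values, and all function names are hypothetical choices made for illustration. The per-forecaster parimutuel return used here (a unit bet per forecaster, with the pot shared in proportion to the probability each assigned to the observed outcome) is an assumed form of the score and should be checked against the article.

```python
import math


def expected_brier(p, p_true):
    """Expected Brier score (lower is better) of forecast p for a Bernoulli(p_true) event."""
    return p_true * (1.0 - p) ** 2 + (1.0 - p_true) * p ** 2


def expected_log(p, p_true):
    """Expected logarithmic score (higher is better) of forecast p."""
    return p_true * math.log(p) + (1.0 - p_true) * math.log(1.0 - p)


def expected_gambling(forecasts, p_true):
    """Expected parimutuel return of each forecaster for one bin.

    Assumed form: every forecaster bets one unit; the pot of k units is
    shared in proportion to the probability each gave the observed outcome.
    """
    k = len(forecasts)
    total_yes = sum(forecasts)   # total probability placed on "event occurs"
    total_no = k - total_yes     # total probability placed on "no event"
    return [
        p_true * (k * p / total_yes - 1.0)
        + (1.0 - p_true) * (k * (1.0 - p) / total_no - 1.0)
        for p in forecasts
    ]


# Hypothetical setup: the first forecast equals the data-generating probability.
p_true = 0.1
forecasts = [0.1, 0.05, 0.9]

best_brier = min(forecasts, key=lambda p: expected_brier(p, p_true))
best_log = max(forecasts, key=lambda p: expected_log(p, p_true))

gamble3 = expected_gambling(forecasts, p_true)
best_gamble3 = forecasts[gamble3.index(max(gamble3))]

gamble2 = expected_gambling(forecasts[:2], p_true)
best_gamble2 = forecasts[gamble2.index(max(gamble2))]

print("Brier prefers:", best_brier)
print("Log score prefers:", best_log)
print("Gambling score, three forecasts, prefers:", best_gamble3)
print("Gambling score, two forecasts, prefers:", best_gamble2)
```

Under these assumptions, the Brier and logarithmic scores both prefer the forecast that matches the data-generating probability. The assumed gambling return prefers the true forecast in the pairwise comparison, but a different forecast once the third competitor is added, which is consistent with the abstract's statement that the bias depends on the forecasts taking part.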
Original language: English
Journal: Stochastic Environmental Research and Risk Assessment
Publication status: Published - 28 Mar 2022
