Abstract
The assessment of variant effect predictor (VEP) performance is fraught with biases introduced by
benchmarking against clinical observations. In this study, building on our previous work, we use
independently generated measurements of protein function from deep mutational scanning (DMS)
experiments for 26 human proteins to benchmark 55 different VEPs, while introducing minimal data
circularity. Many top‐performing VEPs are unsupervised methods including EVE, DeepSequence and
ESM‐1v, a protein language model that ranked first overall. However, the strong performance of
recent supervised VEPs, in particular VARITY, shows that developers are taking data circularity and
bias issues seriously. We also assess the performance of DMS and unsupervised VEPs for
discriminating between known pathogenic and putatively benign missense variants. Our findings are
mixed, demonstrating that some DMS datasets perform exceptionally well at variant classification, while
others perform poorly. Notably, we observe a striking correlation between VEP agreement with DMS data
and performance in identifying clinically relevant variants, strongly supporting the validity of our
rankings and the utility of DMS for independent benchmarking.
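
As one illustration of the benchmarking approach described in the abstract, the sketch below scores a single VEP against a single protein's DMS map using a rank-based agreement measure. Spearman's rank correlation is assumed here as the agreement metric (a common choice for this task), and the variant notation, file layout, and column names are hypothetical, not the paper's exact pipeline.

```python
# Minimal sketch of DMS-based VEP benchmarking, assuming Spearman's rank
# correlation as the agreement metric. Variant identifiers and column names
# below are illustrative, not taken from the study's actual data.
import pandas as pd
from scipy.stats import spearmanr

def score_vep_against_dms(dms: pd.DataFrame, vep: pd.DataFrame) -> float:
    """Correlate one VEP's predictions with DMS functional scores for one protein.

    Both frames are expected to carry a 'variant' column (e.g. 'p.A1V') and a
    'score' column; only variants present in both datasets are compared.
    """
    merged = dms.merge(vep, on="variant", suffixes=("_dms", "_vep"))
    rho, _ = spearmanr(merged["score_dms"], merged["score_vep"])
    # Use the absolute value: VEPs differ in whether high scores mean damaging.
    return abs(rho)

if __name__ == "__main__":
    # Toy data standing in for one protein's DMS map and one VEP's predictions.
    dms = pd.DataFrame({"variant": ["p.A1V", "p.A1G", "p.L2P", "p.L2F"],
                        "score":   [0.9,      0.7,     0.1,     0.5]})
    vep = pd.DataFrame({"variant": ["p.A1V", "p.A1G", "p.L2P", "p.L2F"],
                        "score":   [0.1,      0.3,     0.95,    0.4]})
    print(f"|Spearman rho| = {score_vep_against_dms(dms, vep):.2f}")
```

Repeating this per protein and averaging across the DMS datasets would give one simple way to rank many VEPs; the clinical-variant comparison described in the abstract would instead use classification metrics over known pathogenic and putatively benign variants.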
| Original language | English |
|---|---|
| Journal | Molecular Systems Biology |
| DOIs | |
| Publication status | Published - 13 Jun 2023 |
Keywords
- Benchmark
- Circularity
- DMS
- MAVE
- VEP