Abstract
Background: Diagnostic accuracy studies of administrative epilepsy data are hampered by the lack of a reliable way to rank
case-ascertainment algorithms in order of accuracy, because it is difficult to know how to prioritise positive
predictive value (PPV) against sensitivity (Sens). The large numbers of true negative (TN) instances typically found in
epilepsy studies make it difficult to discriminate algorithm accuracy on the basis of negative predictive value (NPV)
and specificity (Spec), because these become inflated (usually >90%). This study demonstrates the complementary value
of the critical success index (CSI), used in weather forecasting, and the F measure, used in machine learning, as
unitary metrics combining PPV and sensitivity. We reanalyse data published in a diagnostic accuracy study of
administrative epilepsy mortality data in Scotland.
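As an illustration of the background point, the following minimal Python sketch (illustrative counts chosen by us, not taken from the study) shows how a large TN count inflates NPV and specificity towards 1 while leaving PPV and sensitivity untouched:

```python
# Hypothetical confusion-matrix counts (not from the study): many true negatives.
tp, fp, fn, tn = 50, 30, 60, 100_000

ppv  = tp / (tp + fp)   # 0.625   -- does not involve TN
sens = tp / (tp + fn)   # 0.455   -- does not involve TN
npv  = tn / (tn + fn)   # 0.9994  -- inflated by the large TN count
spec = tn / (tn + fp)   # 0.9997  -- inflated by the large TN count

print(ppv, sens, npv, spec)
```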
Method: CSI was calculated as 1/[(1/PPV) + (1/Sens) − 1]. F measure was calculated as (2 × PPV × Sens)/(PPV +
Sens). CSI and F values range from 0 to 1, where 0 indicates an inaccurate prediction and 1 indicates perfect accuracy. The
published algorithms were reanalysed using these metrics and re-ranked by CSI to allow
comparison with the original rankings.
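The two formulas above translate directly into code. The sketch below (function names are ours, not from the paper) reproduces the worked values reported in the Results that follow:

```python
# Sketch of the two unitary metrics, computed from PPV and sensitivity
# expressed as proportions (0-1).

def csi(ppv: float, sens: float) -> float:
    """Critical success index: 1 / (1/PPV + 1/Sens - 1)."""
    return 1.0 / ((1.0 / ppv) + (1.0 / sens) - 1.0)

def f_measure(ppv: float, sens: float) -> float:
    """F measure: harmonic mean of PPV and sensitivity."""
    return 2.0 * ppv * sens / (ppv + sens)

# Reproducing the two examples reported in the Results:
print(round(csi(1.00, 0.02), 3), round(f_measure(1.00, 0.02), 3))  # 0.02  0.039
print(round(csi(0.90, 0.91), 3), round(f_measure(0.90, 0.91), 3))  # 0.826 0.905
```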
Results: CSI scores were conservative (range 0.02–0.826), always less than or equal to the lower of the corresponding
PPV (range 39–100%) and sensitivity (range 2–93%). F values were less conservative (range
0.039–0.905), sometimes exceeding one of PPV or sensitivity, but were always higher than the corresponding CSI. Low CSI and F
values occurred when there was a large difference between PPV and sensitivity, e.g. CSI was 0.02 and F was
0.039 in an instance where PPV was 100% and sensitivity was 2%. Algorithms with both high PPV and high sensitivity
performed best on CSI and F measure, e.g. CSI was 0.826 and F was 0.905 in an instance where PPV was
90% and sensitivity was 91%.
Conclusion: CSI or F measure combines PPV and sensitivity into a convenient single metric that is easier
to interpret and rank than the two measures considered separately. Both prioritise instances where PPV and
sensitivity are jointly high over instances with large differences between them (even if one is very high),
allowing diagnostic accuracy thresholds based on combined PPV and sensitivity to be set. CSI or F measure may
therefore be helpful complementary metrics to report alongside PPV and sensitivity in diagnostic accuracy studies of
administrative epilepsy data.
| Original language | English |
| --- | --- |
| Pages (from-to) | 107275 |
| Journal | Epilepsy research |
| Volume | 199 |
| Early online date | 12 Dec 2023 |
| DOIs | |
| Publication status | Published - 1 Jan 2024 |