Abstract

Motivation: A common experimental output in biomedical science is a list of genes implicated in a given biological process or disease. The gene lists resulting from a group of studies answering the same, or similar, questions can be combined by ranking aggregation methods to find a consensus or a more reliable answer. Because the properties of a dataset can influence the performance of an algorithm, a ranking aggregation method should be evaluated on the relevant type of data before it is used. Such evaluation on gene lists is usually based on simulated data because real data lack a known truth. However, simulated datasets tend to be too small compared to experimental data and neglect key features, including heterogeneity of quality, relevance and the inclusion of unranked lists.
Results: In this study, a group of existing methods and their variations that are suitable for meta-analysis of gene lists are compared using simulated and real data. Simulated data was used to explore the performance of the aggregation methods under common scenarios of real genomic data, with varying heterogeneity of quality, noise level, and a mix of unranked and ranked data using 20000 possible entities. In addition to the evaluation with simulated data, a comparison using real genomic data on the SARS-CoV-2 virus, cancer (NSCLC), and bacteria (macrophage apoptosis) was performed. We summarise the results of our evaluation in a simple flowchart to select a ranking aggregation method for genomic data, and in an automated implementation using the meta-analysis by information content (MAIC) algorithm to infer heterogeneity of data quality across input data sets.
Availability: The code for simulated data generation and for running the edited versions of the algorithms is available at https://github.com/baillielab/comparison_of_RA_methods
Original language: English
Journal: Bioinformatics
Early online date: 12 Sep 2022
Publication status: E-pub ahead of print - 12 Sep 2022

Title: Systematic comparison of ranking aggregation methods for gene lists in experimental results