TY - JOUR
T1 - pyRBDome: a comprehensive computational platform for enhancing RNA-binding proteome data
AU - Chu, Liang-Cui
AU - Christopoulou, Niki
AU - McCaughan, Hugh
AU - Winterbourne, Sophie
AU - Cazzola, Davide
AU - Wang, Shichao
AU - Litvin, Ulad
AU - Brunon, Salomé
AU - Harker, Patrick JB
AU - McNae, Iain
AU - Granneman, Sander
N1 - We would like to thank Shaun Webb and Rasna Walia for help with RNA-BindRPlus, Songling Li and Daron Standley for help writing Python code for automatically submitting aaRNA jobs, Yun Zhou for helping with the analysis of PDB files, and Guido Sanguinetti, Andrea Weiße, and Alfredo Castello for critically reading the article.
PY - 2024/7/30
Y1 - 2024/7/30
N2 - High-throughput proteomics approaches have revolutionised the identification of RNA-binding proteins (RBPome) and RNA-binding sequences (RBDome) across organisms. Yet, the extent of noise, including false positives, associated with these methodologies, is difficult to quantify as experimental approaches for validating the results are generally low throughput. To address this, we introduce pyRBDome, a pipeline for enhancing RNA-binding proteome data in silico. It aligns the experimental results with RNA-binding site (RBS) predictions from distinct machine-learning tools and integrates high-resolution structural data when available. Its statistical evaluation of RBDome data enables quick identification of likely genuine RNA-binders in experimental datasets. Furthermore, by leveraging the pyRBDome results, we have enhanced the sensitivity and specificity of RBS detection through training new ensemble machine-learning models. pyRBDome analysis of a human RBDome dataset, compared with known structural data, revealed that although UV–cross-linked amino acids were more likely to contain predicted RBSs, they infrequently bind RNA in high-resolution structures. This discrepancy underscores the limitations of structural data as benchmarks, positioning pyRBDome as a valuable alternative for increasing confidence in RBDome datasets. All the code and data analysis results are available from our GitLab repository (https://git.ecdf.ed.ac.uk/sgrannem) without restrictions. All the prediction and ground truth analysis results can be found on the repositories starting with pyRBDome-Notebooks. The pyRBDome-Core repository contains all the code required to run the pyRBDome-Notebooks Jupyter notebook files. The results of all the analyses are also available as Microsoft Excel spreadsheets in Tables S2, S3, S4, and S5.
AB - High-throughput proteomics approaches have revolutionised the identification of RNA-binding proteins (RBPome) and RNA-binding sequences (RBDome) across organisms. Yet, the extent of noise, including false positives, associated with these methodologies, is difficult to quantify as experimental approaches for validating the results are generally low throughput. To address this, we introduce pyRBDome, a pipeline for enhancing RNA-binding proteome data in silico. It aligns the experimental results with RNA-binding site (RBS) predictions from distinct machine-learning tools and integrates high-resolution structural data when available. Its statistical evaluation of RBDome data enables quick identification of likely genuine RNA-binders in experimental datasets. Furthermore, by leveraging the pyRBDome results, we have enhanced the sensitivity and specificity of RBS detection through training new ensemble machine-learning models. pyRBDome analysis of a human RBDome dataset, compared with known structural data, revealed that although UV–cross-linked amino acids were more likely to contain predicted RBSs, they infrequently bind RNA in high-resolution structures. This discrepancy underscores the limitations of structural data as benchmarks, positioning pyRBDome as a valuable alternative for increasing confidence in RBDome datasets. All the code and data analysis results are available from our GitLab repository (https://git.ecdf.ed.ac.uk/sgrannem) without restrictions. All the prediction and ground truth analysis results can be found on the repositories starting with pyRBDome-Notebooks. The pyRBDome-Core repository contains all the code required to run the pyRBDome-Notebooks Jupyter notebook files. The results of all the analyses are also available as Microsoft Excel spreadsheets in Tables S2, S3, S4, and S5.
U2 - 10.26508/lsa.202402787
DO - 10.26508/lsa.202402787
M3 - Article
C2 - 39079742
SN - 2575-1077
VL - 7
JO - Life Science Alliance
JF - Life Science Alliance
IS - 10
M1 - e202402787
ER -