Edinburgh Research Explorer

Nano Random Forests to mine protein complexes and their relationships in quantitative proteomics data

Research output: Contribution to journal (Article)

Open Access permissions: Open

Original language: English
Journal: Molecular Biology of the Cell
Early online date: 5 Jan 2017
State: E-pub ahead of print - 5 Jan 2017

Abstract

Ever-increasing numbers of quantitative proteomics datasets constitute a currently underexploited resource for investigating protein function. Multi-protein complexes often follow consistent trends in these experiments, which could provide insights about their biology. Yet, as more experiments are considered, a complex's signature may become conditional and less identifiable. Previously, we successfully distinguished the general proteomic signature of genuine chromosomal proteins from hitchhikers using the Random Forests (RFs) machine learning algorithm. In this technical note, we tested whether small protein complexes could define distinguishable signatures of their own, despite the assumption that machine learning needs large training sets. We show, with simulated and real proteomics data, that RFs can detect small protein complexes and relationships between them. We identified several complexes in quantitative proteomics results of wild-type and knock-out mitotic chromosomes. Other proteins covaried strongly with these complexes, suggesting novel functional links for later study. Integrating the RF analysis for several complexes revealed known interdependencies among kinetochore subunits, and a novel dependency between the inner kinetochore and condensin. Ribosomal proteins, although identified, remained independent of kinetochore subcomplexes. Together, these results show that this complex-oriented RF (NanoRF) approach can integrate proteomics data to uncover subtle protein relationships.
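
The central idea of the abstract, that a Random Forest can pick out a small complex from a much larger background, can be illustrated with a toy simulation. The sketch below is not the authors' NanoRF pipeline: the dataset dimensions, noise levels, the scikit-learn classifier settings, and the use of out-of-bag votes as a per-protein score are all assumptions chosen for illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# All sizes and noise levels below are illustrative assumptions,
# not values taken from the paper.
rng = np.random.default_rng(0)
n_proteins, n_experiments, n_members = 500, 20, 10

# Background proteins: independent log-ratio-like noise in every experiment.
X = rng.normal(0.0, 1.0, size=(n_proteins, n_experiments))

# A small simulated "complex": its members share one signature across
# experiments, each with some protein-specific noise added on top.
signature = rng.normal(0.0, 1.5, size=n_experiments)
X[:n_members] = signature + rng.normal(0.0, 0.5, size=(n_members, n_experiments))

# Label the complex members as the (very small) positive class.
y = np.zeros(n_proteins, dtype=int)
y[:n_members] = 1

# class_weight="balanced" compensates for the tiny positive class;
# oob_score=True makes the forest keep out-of-bag predictions.
forest = RandomForestClassifier(
    n_estimators=500,
    oob_score=True,
    class_weight="balanced",
    random_state=0,
)
forest.fit(X, y)

# Out-of-bag class-1 probability: each protein is scored only by trees
# that did not see it during training.
scores = forest.oob_decision_function_[:, 1]
member_mean = scores[:n_members].mean()
background_mean = scores[n_members:].mean()

print(f"mean OOB score, complex members: {member_mean:.3f}")
print(f"mean OOB score, background:      {background_mean:.3f}")
```

In a setup like this, background proteins that receive unusually high out-of-bag scores would be the simulated analogue of the proteins the abstract describes as covarying strongly with a complex.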

ID: 30858208