Nano Random Forests to mine protein complexes and their relationships in quantitative proteomics data

Luis F. Montaño-Gutierrez, Shinya Ohta, Georg Kustatscher, William C. Earnshaw, Juri Rappsilber, Kerry S. Bloom (Editor)

Research output: Contribution to journal (article, peer-reviewed)


Ever-increasing numbers of quantitative proteomics datasets constitute a currently underexploited resource for investigating protein function. Multi-protein complexes often follow consistent trends in these experiments, which could provide insights about their biology. Yet, as more experiments are considered, a complex's signature may become conditional and less identifiable. Previously, we successfully distinguished the general proteomic signature of genuine chromosomal proteins from hitchhikers using the Random Forests (RFs) machine learning algorithm. In this technical note, we tested whether small protein complexes could define distinguishable signatures of their own, despite the assumption that machine learning needs large training sets. We show, with simulated and real proteomics data, that RFs can detect small protein complexes and relationships between them. We identified several complexes in quantitative proteomics results of wild-type and knock-out mitotic chromosomes. Other proteins covaried strongly with these complexes, suggesting novel functional links for later study. Integrating the RF analysis for several complexes revealed known interdependencies among kinetochore subunits, and a novel dependency between the inner kinetochore and condensin. Ribosomal proteins, although identified, remained independent of kinetochore subcomplexes. Together, these results show that this complex-oriented RF (NanoRF) approach can integrate proteomics data to uncover subtle protein relationships.
Original language: English
Journal: Molecular Biology of the Cell
Early online date: 5 Jan 2017
Publication status: Published - 5 Jan 2017

