TY - JOUR
T1 - EnGens: a computational framework for generation and analysis of representative protein conformational ensembles
AU - Conev, Anja
AU - Rigo, Mauricio Menegatti
AU - Devaurs, Didier
AU - Fonseca, André Faustino
AU - Kalavadwala, Hussain
AU - de Freitas, Martiela Vaz
AU - Clementi, Cecilia
AU - Zanatta, Geancarlo
AU - Antunes, Dinler Amaral
AU - Kavraki, Lydia E.
N1 - Funding Information:
Work on this project by A.C. and L.E.K. has been supported in part by the National Institutes of Health (NIH) [U01CA258512]. Other support included: University of Edinburgh and Medical Research Council [MC_UU_00009/2 to D.D.]; Computational Cancer Biology Training Program fellowship [RP170593 to M.M.R.]; The Brazilian National Council for Scientific and Technological Development [CNPq no. 440412/2022-6 to G.Z.]; University of Houston Funds and Rice University Funds.
Publisher Copyright:
© The Author(s) 2023. Published by Oxford University Press.
PY - 2023/7/7
Y1 - 2023/7/7
N2 - Proteins are dynamic macromolecules that perform vital functions in cells. A protein structure determines its function, but this structure is not static, as proteins change their conformation to achieve various functions. Understanding the conformational landscapes of proteins is essential to understand their mechanism of action. Sets of carefully chosen conformations can summarize such complex landscapes and provide better insights into protein function than single conformations. We refer to these sets as representative conformational ensembles. Recent advances in computational methods have led to an increase in the number of available structural datasets spanning conformational landscapes. However, extracting representative conformational ensembles from such datasets is not an easy task and many methods have been developed to tackle it. Our new approach, EnGens (short for ensemble generation), collects these methods into a unified framework for generating and analyzing representative protein conformational ensembles. In this work, we: (1) provide an overview of existing methods and tools for representative protein structural ensemble generation and analysis; (2) unify existing approaches in an open-source Python package, and a portable Docker image, providing interactive visualizations within a Jupyter Notebook pipeline; (3) test our pipeline on a few canonical examples from the literature. Representative ensembles produced by EnGens can be used for many downstream tasks such as protein–ligand ensemble docking, Markov state modeling of protein dynamics and analysis of the effect of single-point mutations.
KW - clustering
KW - conformational ensembles
KW - crystal structure analysis
KW - dimensionality reduction
KW - molecular dynamics (MD)
KW - proteins
UR - https://www.scopus.com/pages/publications/85165521714
U2 - 10.1093/bib/bbad242
DO - 10.1093/bib/bbad242
M3 - Article
C2 - 37418278
AN - SCOPUS:85165521714
SN - 1467-5463
VL - 24
JO - Briefings in Bioinformatics
JF - Briefings in Bioinformatics
IS - 4
M1 - bbad242
ER -