TY - JOUR
T1 - Deep learning for optimization of protein expression
AU - Nikolados, Evangelos Marios
AU - Oyarzún, Diego A.
N1 - Funding Information:
EMN was supported by a doctoral studentship from the Darwin Trust of Edinburgh. DAO was supported by the United Kingdom Research and Innovation (Grant EP/S02431X/1).
Publisher Copyright:
© 2023 The Author(s)
PY - 2023/4/21
Y1 - 2023/4/21
AB - Recent progress in high-throughput DNA synthesis and sequencing has enabled the development of massively parallel reporter assays for strain characterization. These datasets map a large number of DNA sequences to protein expression levels, sparking increased interest in data-driven methods for sequence-to-expression modeling. Here, we highlight advances in deep learning models of protein expression and their potential for optimizing strains engineered to produce recombinant proteins. We review recent works that built highly accurate models and discuss challenges that hinder adoption by end users. There is a need to better align this technology with the constraints encountered in strain engineering, particularly the cost of acquiring large amounts of data and the requirement for interpretable models that generalize beyond the training data. Overcoming these barriers will help to incentivize academic and industrial laboratories to tap into a new era of data-centric strain engineering.
U2 - 10.1016/j.copbio.2023.102941
DO - 10.1016/j.copbio.2023.102941
M3 - Review article
AN - SCOPUS:85153039715
SN - 0958-1669
VL - 81
JO - Current Opinion in Biotechnology
JF - Current Opinion in Biotechnology
M1 - 102941
ER -