Deep learning for optimization of protein expression

Evangelos Marios Nikolados, Diego A. Oyarzún

Research output: Contribution to journal, Review article, peer-review

Abstract / Description of output

Recent progress in high-throughput DNA synthesis and sequencing has enabled the development of massively parallel reporter assays for strain characterization. These datasets map a large number of DNA sequences to protein expression levels, sparking increased interest in data-driven methods for sequence-to-expression modeling. Here, we highlight advances in deep learning models of protein expression and their potential for optimizing strains engineered to produce recombinant proteins. We review recent works that built highly accurate models and discuss challenges that hinder adoption by end users. There is a need to better align this technology with the constraints encountered in strain engineering, particularly the cost of acquiring large amounts of data and the requirement for interpretable models that generalize beyond the training data. Overcoming these barriers will help to incentivize academic and industrial laboratories to tap into a new era of data-centric strain engineering.

Original language: English
Article number: 102941
Number of pages: 7
Journal: Current Opinion in Biotechnology
Publication status: Published - 21 Apr 2023

