Abstract / Description of output
Ribosome profiling quantifies translation genome-wide by sequencing ribosome-protected fragments, or footprints. Its single-codon resolution allows identification of translation regulation, such as ribosome stalls or pauses, on individual genes. However, enzyme preferences during library preparation lead to pervasive sequence artifacts that obscure translation dynamics. Widespread over- and under-representation of ribosome footprints can dominate local footprint densities and skew estimates of elongation rates by up to five fold. To address these biases and uncover true patterns of translation, we present choros, a computational method that models ribosome footprint distributions to provide bias-corrected footprint counts. choros uses negative binomial regression to accurately estimate two sets of parameters: (i) biological contributions from codon-specific translation elongation rates; and (ii) technical contributions from nuclease digestion and ligation efficiencies. We use these parameter estimates to generate bias correction factors that eliminate sequence artifacts. Applying choros to multiple ribosome profiling datasets, we are able to accurately quantify and attenuate ligation biases to provide more faithful measurements of ribosome distribution. We show that a pattern interpreted as pervasive ribosome pausing near the beginning of coding regions is likely to arise from technical biases. Incorporating choros into standard analysis pipelines will improve biological discovery from measurements of translation.