Edinburgh Research Explorer

Inferring the frequency spectrum of derived variants to quantify adaptive molecular evolution in protein-coding genes of Drosophila melanogaster

Research output: Contribution to journal › Article

Original language: English
Pages (from-to): 975–984
Number of pages: 22
Issue number: 1
Early online date: 20 Apr 2016
Publication status: Published - 1 Jun 2016


Many approaches for inferring adaptive molecular evolution analyze the unfolded site frequency spectrum (SFS), a vector of counts of sites with different numbers of copies of derived alleles in a sample of alleles from a population. Accurate inference of the high copy number elements of the SFS is difficult, however, because of misassignment of alleles as derived versus ancestral. This is a known problem with parsimony using outgroup species. Here, we show that the problem is particularly serious if there is variation in the substitution rate among sites brought about by variation in selective constraint levels. We present a new method for inferring the SFS using one or two outgroups, which attempts to overcome the problem of misassignment. We show that two outgroups are required for accurate estimation of the SFS if there is substantial variation in selective constraints, which is expected to be the case for nonsynonymous sites in protein-coding genes. We apply the method to estimate unfolded SFSs for synonymous and nonsynonymous sites in a population of Drosophila melanogaster from Phase 2 of the Drosophila Population Genomics Project. We use the unfolded spectra to estimate the frequency and strength of advantageous and deleterious mutations, and estimate that ~50% of amino acid substitutions are positively selected, but that less than 0.5% of new amino acid mutations are beneficial, with a scaled selection strength of Nₑs ≈ 12.
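
As a rough illustration of the quantities defined in the abstract, the Python sketch below builds an unfolded SFS from per-site derived-allele counts and shows how misassigning the ancestral state at one site moves it from the low-copy to the high-copy end of the spectrum. It is a toy example only, not the inference method described in the paper: the function names (`unfolded_sfs`, `flip_polarity`, `fold_sfs`) and the input data are hypothetical, and no outgroup model is involved.

```python
def unfolded_sfs(derived_counts, n):
    """Return a list `sfs` of length n + 1, where sfs[k] is the number of
    sites carrying k copies of the derived allele in a sample of n alleles."""
    sfs = [0] * (n + 1)
    for k in derived_counts:
        if not 0 <= k <= n:
            raise ValueError(f"derived count {k} outside 0..{n}")
        sfs[k] += 1
    return sfs


def flip_polarity(k, n):
    """Derived count observed when the ancestral and derived states are
    swapped by mistake: a site with k derived copies is scored as n - k."""
    return n - k


def fold_sfs(sfs):
    """Fold an unfolded SFS by pooling classes k and n - k (minor-allele
    counts). The folded spectrum does not depend on polarization."""
    n = len(sfs) - 1
    folded = [0] * (n // 2 + 1)
    for k, count in enumerate(sfs):
        folded[min(k, n - k)] += count
    return folded


# Hypothetical data: derived-allele counts at 10 sites, sample of n = 6 alleles.
n = 6
true_counts = [1, 1, 1, 2, 1, 3, 5, 0, 6, 1]
true_sfs = unfolded_sfs(true_counts, n)

# Misassign the ancestral state at the first site (a singleton, k = 1).
observed_counts = [flip_polarity(true_counts[0], n)] + true_counts[1:]
observed_sfs = unfolded_sfs(observed_counts, n)

print(true_sfs)      # spectrum with correct polarization
print(observed_sfs)  # one singleton now appears in the k = 5 class
```

Because low-copy derived variants are typically far more numerous than high-copy ones, even a small misassignment rate can inflate the sparse high-copy classes considerably, which is consistent with the difficulty the abstract describes. Folding the spectrum removes the dependence on polarization, but it also discards the high-frequency derived classes that the unfolded SFS is meant to capture.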

    Research areas

  • Drosophila, adaptation, distribution of fitness effects, site frequency spectrum (SFS)

ID: 25119441