Estimating selection on amino-acid sequence polymorphisms in Drosophila

Project Details


Investigation of selection on protein sequences in the fruitfly Drosophila

Layman's description

The diversity of life on earth has arisen by the process of evolution, which involves the transformation of variability caused by mutations into differences between species. Mutations are the result of extremely rare errors in the replication of DNA, which encodes the genetic information that specifies the properties of organisms. In order to understand evolution, we need to know what processes can cause new mutations to spread through a whole population. Two major such processes are known. One is natural selection, which involves differences in reproductive success or survival among individuals with different genetic make-up. The other is genetic drift, which involves random changes in the frequencies of genetic types, due to the fact that populations are always limited in size, so that the genetic make-up of the individuals present in one generation is simply a sample from that of the population in the previous generation. [...] models of the effects of mutation, selection and drift. We use two closely related species of fruitfly, Drosophila miranda and D. pseudoobscura, for this purpose. They live in the mountain forests of Western North America, in habitats that are undisturbed by human activity, unlike many species of Drosophila used in evolutionary studies. The genome of D. pseudoobscura has recently been sequenced, reflecting its status as a classic organism for genetic and evolutionary studies. We have collected data on DNA sequence variability within these species for a set of 76 genes, and compared these sequences with those in a more distant relative, D. affinis. This allowed us to ask questions about the extent to which natural selection caused changes in the proteins encoded by these genes to accumulate between species, versus the extent to which such changes simply reflect the chance accumulation of mutations with little or no effect on the fitness of their carriers.
In addition, we estimated the strength of selection acting on variation in protein sequences within these species. Such variation is likely to reflect mutations with harmful effects on fitness; these are likely to contribute to human diseases with complex genetic causes, so it is important to have information on their effects.

Key findings

We have generated polymorphism data on 39 X-linked and 37 autosomal coding sequence loci (about 500 bp long, on average) in these two species, as well as sequencing the same loci from the more distant species, D. affinis. We have examined the properties of the distribution of selection coefficients (s) against deleterious amino-acid mutations, by fitting the data on diversity at synonymous and nonsynonymous sites to models that assume either a gamma or a log-normal distribution. Two different methods were used. They agree in demonstrating that the width of these distributions is large, i.e. the distribution of s is highly leptokurtic. The average value of s for an amino-acid mutation that is segregating in the population is very small, of the order of 1/100,000, whereas the majority of new amino-acid mutations have selection coefficients much greater than this, reflecting the rapid elimination of relatively strongly selected mutations from the population. Only a small fraction of new amino-acid mutations are neutral. These results agree with recent studies of other species, but this is the first time that the two different approaches have been directly compared. The study establishes that natural populations carry a wealth of weakly selected, deleterious nonsynonymous mutations, which cause a substantial variance in fitness among individuals.
We used three different methods to estimate the proportion of amino-acid sequence differences between D. pseudoobscura and D. affinis that were fixed by strong positive selection, as opposed to genetic drift at sites subject to purifying selection. The results for the three methods are broadly similar, indicating that a substantial proportion (>50%) of amino-acid fixations are driven by positive selection. Theoretical analysis shows that the differences between the results of the three methods are in line with the possible biases caused by violations of the assumptions that they make. Again, this is the first time that these methods have been compared.
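As an illustration of the kind of calculation involved, the sketch below computes a simple McDonald-Kreitman-type estimate of the proportion of amino-acid fixations driven by positive selection, alpha = 1 - (Ds x Pn) / (Dn x Ps). This summary does not say which three methods were used, so this widely used estimator is an assumption made for illustration only, and the function name and all counts are hypothetical rather than project data.

```python
def mk_alpha(dn, ds, pn, ps):
    """Simple McDonald-Kreitman-type estimate of alpha.

    dn, ds: nonsynonymous / synonymous differences fixed between species
    pn, ps: nonsynonymous / synonymous polymorphisms within a species
    All arguments are counts. This is an illustrative estimator, not
    necessarily any of the three methods used in the project.
    """
    if dn <= 0 or ps <= 0:
        raise ValueError("dn and ps must be positive for alpha to be defined")
    # Under neutrality dn/ds is expected to equal pn/ps, giving alpha = 0.
    # An excess of nonsynonymous divergence gives alpha > 0.
    return 1.0 - (ds * pn) / (dn * ps)


# Hypothetical counts, chosen only to show the arithmetic.
alpha = mk_alpha(dn=60, ds=100, pn=20, ps=80)
print(f"alpha = {alpha:.3f}")
```

Weakly deleterious nonsynonymous polymorphisms inflate Pn and so tend to bias this simple estimate downwards, which is one example of how violations of a method's assumptions can produce differences between estimators.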
Our data show no evidence for strong sexual selection, acting to reduce the effective population size of males relative to females, and hence causing lower synonymous variation on the X chromosome than the autosomes. This is in striking contrast to the results for D. melanogaster and its relatives, raising interesting questions concerning their reproductive biology.
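The comparison between X-linked and autosomal diversity can be made concrete with the standard textbook expressions for effective population size when the numbers of breeding males and females differ. The sketch below is not taken from the project's analysis; the function names and the numbers of breeding individuals are hypothetical, and it assumes neutral synonymous diversity proportional to effective population size with equal mutation rates on the X chromosome and the autosomes.

```python
def ne_autosome(n_males, n_females):
    """Effective size for autosomal loci with unequal breeding sex ratio."""
    return 4.0 * n_males * n_females / (n_males + n_females)


def ne_x(n_males, n_females):
    """Effective size for X-linked loci with unequal breeding sex ratio."""
    return 9.0 * n_males * n_females / (4.0 * n_males + 2.0 * n_females)


def x_to_a_ratio(n_males, n_females):
    """Expected X/autosome ratio of neutral diversity (assumes equal mutation rates)."""
    return ne_x(n_males, n_females) / ne_autosome(n_males, n_females)


# Hypothetical breeding numbers, for illustration only.
print(x_to_a_ratio(1000, 1000))  # equal numbers of breeding males and females
print(x_to_a_ratio(100, 1000))   # far fewer breeding males than females
```

With equal numbers of breeding males and females the expected ratio is 3/4; as the effective number of males falls relative to females, the ratio rises towards its upper limit of 9/8.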
We also estimated the strength of selection on codon usage in these species, using a new method developed in the Charlesworth lab, and showed that it is comparable with results on other Drosophila species.
Finally, a previous method for estimating the extent to which polymorphisms present in the two focal species are derived from their ancestral species was improved, and applied to the data, with the conclusion that about 10%-10% of current polymorphisms are derived from the ancestral species.
Effective start/end date: 1/04/06 to 30/06/09


  • NERC: £274,735.00

