Edinburgh Research Explorer

Pig genome annotation and analysis

Project: Research

Status: Finished
Effective start/end date: 1/05/08 to 31/05/11
Total award: £264,014.00
Funding organisation: BBSRC
Funder project reference: BB/E10520/2
Period: 1/05/08 to 31/05/11

Description

The genome represents a complete description of an organism. However, to understand the functioning of the genes and regulatory elements, and to design sensible molecular biological experiments to test hypotheses, the genome sequence must be related to the extant functional data for that organism. We propose to annotate and analyse the sequence being generated by the International Pig Genome Sequencing Project. We will use the well-established Ensembl system as the main tool for storage, management and dissemination of pig genome data. Pig genome sequencing is currently funded to 3-4x coverage from mapped clones, with two chromosomes at higher coverage. Experience from other low-coverage genomes, such as cow, rabbit and armadillo, is that this coverage will at a minimum provide an effective representation of exons, which can then be assembled into genes using a guide genome. By definition, this approach cannot resolve lineage-specific expansions in the pig genome. However, with this more clone-based strategy there will be new opportunities for combining assembly and annotation strategies to extract more information from a 3x assembly. We will integrate the pig genome sequence with diverse pre-existing data sets, including SNPs, ESTs and quantitative trait loci (QTL). We will integrate the sequence with maps (genetic, physical) and physical resources (clones, microarrays), providing a seamless route for interrogation and for the development of experimental tools. Finally, computational approaches that integrate the above resources and exploit the comparative genomics potential of the mammalian clade will be used to analyse and present the genome in a user-friendly format. An annotated pig genome sequence will dramatically accelerate research on the pig as an important animal for agriculture and human biology. Our aim is to make the pig genome sequence maximally useful by delivering an annotated sequence of the highest quality in a user-friendly manner.
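As a rough illustration of what 3-4x coverage implies, the sketch below uses the standard idealised model of random shotgun sequencing, in which read depth at a base is Poisson-distributed and the expected fraction of bases covered at least once is 1 - e^(-c) for mean coverage c. This is an assumption made only for illustration: the project sequences mapped clones rather than reads placed uniformly at random, and the function name `expected_fraction_covered` is hypothetical, not part of any project software.

```python
import math


def expected_fraction_covered(coverage: float) -> float:
    """Expected fraction of bases covered by at least one read.

    Assumes reads land uniformly at random, so depth per base is
    Poisson with mean `coverage` and P(depth = 0) = exp(-coverage).
    """
    if coverage < 0:
        raise ValueError("coverage must be non-negative")
    return 1.0 - math.exp(-coverage)


if __name__ == "__main__":
    # Coverage levels quoted in the description above.
    for c in (3.0, 4.0):
        print(f"{c:g}x coverage: {expected_fraction_covered(c):.3f} of bases covered")
```

Under this simplified model, roughly 95% of bases are expected to be covered at 3x and about 98% at 4x, which is consistent with the expectation that most exons are represented even though gaps remain.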

Layman's description

We propose to provide state-of-the-art analysis and annotation of the pig genome sequence being generated by the International Pig Genome Sequencing Project. We will make the annotated genome sequence accessible on the Web through the Ensembl site at http://www.ensembl.org.



The pig genome is the entire DNA sequence of the pig, which defines all the biological molecules that make up a pig. Acquiring, managing and annotating the pig genome sequence accelerates research in both pig biology and mammalian biology.



Impact on pig biology: Because of the extensive selective breeding that has occurred during domestication, there are a considerable number of breed-specific or line-specific features, ranging from fat/muscle ratios and litter size to skin colour. These features can be mapped genetically to broad regions of the genome, but the final identification of the genes responsible and of the causal genetic variation is very complex. The availability of a well-annotated pig genome sequence with links to other data sources, especially those on phenotypes such as growth, carcass composition or responses to infectious disease, would provide a dramatic boost to the identification of these causative genes.



Impact on human biology: The pig genome, like that of all mammals, has diverged relatively recently from the human lineage. This allows us to look for the effect of evolution in the genome. In comparing genome sequences, both the similarities and the differences are informative. Sequences that are conserved across multiple species probably represent essential coding or regulatory sequences. Sequences that differ across species and show evidence of rapid evolutionary change can be important determinants of species survival, including reproductive fitness and the ability to respond to infectious disease. Thus, the comparative genome sequence analyses that we will perform will help us understand mammalian, and hence human, biology, including disease processes.

Key findings

1. Assemble the low-coverage pig genome sequence to provide an effective platform for annotation and research

The current assembly (Sscrofa9) was generated from a data freeze in April 2009 by the Sanger partner and has proved to be an effective platform for annotation and research. A new assembly (Sscrofa10), which will form the basis of the pig genome sequence paper, will integrate Illumina short-read data (~30x coverage) to improve contiguity and coverage; it is due in September 2010.



2. Annotate the pig genome sequence to provide a high-quality protein-coding gene set and a high-quality RNA gene set

In addition to the annotation of the Sscrofa9 assembly using the Ensembl automatic gene prediction pipelines, manual annotation has also been undertaken. Sets of both protein-coding and non-coding RNA genes have been generated. The manual annotation is accessible as a DAS track.



3. Compute comparative genomics alignments between pig and the major mammals (human, mouse, cow) and other key vertebrates

See reports of Ensembl partners at Sanger and EBI.



4. Associate pig specific resources, including QTLs, microarray probes, SNPs and clones with both the genome and genes

Cross-references to the Affymetrix Porcine microarray have been generated. SNPs from dbSNP have been remapped from the NCBI Sus scrofa build 1.1 to make them available.

The sequences of the SNPs on the Illumina Porcine 60K SNP chip have been mapped to the Sscrofa9 assembly and are accessible as a DAS track.



5. Provide a user friendly web site of this information

While the sequencing was in progress, Ensembl Pre sites containing alignments of proteins from other species were made available. Once the gene set was completed on the Sscrofa9 assembly, pig was made available in the main Ensembl site (http://www.ensembl.org/Sus_scrofa). This provides detailed genomic location, gene, comparative genomic and variation displays of the data available for pig.

Training workshops on manual annotation using the Otterlace system were organised (two in the UK at Hinxton and one in the US).

Research outputs