Edinburgh Research Explorer

Pedigree reconstruction from SNP data: Parentage assignment, sibship clustering, and beyond.

Research output: Contribution to journal, Article

Open Access permissions



    Final published version, 686 KB, PDF document

    Licence: Creative Commons: Attribution (CC-BY)

Original language: English
Journal: Molecular Ecology Resources
Early online date: 7 Mar 2017
Publication status: Published - 6 Apr 2017


Data on hundreds or thousands of Single Nucleotide Polymorphisms (SNPs) provide detailed information about the relationships between individuals, but currently few tools can turn this information into a multi-generational pedigree. I present the R package sequoia, which assigns parents, clusters half-siblings sharing an unsampled parent, and assigns grandparents to half-sibships. Assignments are made after consideration of the likelihoods of all possible first, second and third degree relationships between the focal individuals, as well as the traditional alternative of being unrelated. This careful exploration of the local likelihood surface is implemented in a fast, heuristic hill-climbing algorithm. Distinction between the various categories of second degree relatives is possible when likelihoods are calculated conditional on at least one parent of each focal individual. Performance was tested on simulated datasets with realistic genotyping error rate and missingness, based on three different large pedigrees (N = 1000–2000). This included a complex pedigree with overlapping generations, occasional close inbreeding and some unknown birth years. Parentage assignment was highly accurate down to about 100 independent SNPs (error rate < 0.1%), and fast (< 1 minute) as most pairs can be excluded from being parent-offspring based on opposite homozygosity. For full pedigree reconstruction, 40% of parents were assumed non-genotyped. Reconstruction resulted in low error rates (< 0.3%) and high assignment rates (> 99%) in limited computation time (typically < 1 hour) when at least 200 independent SNPs were used. In three empirical datasets, relatedness estimated from the inferred pedigree was strongly correlated with genomic relatedness.
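The exclusion step mentioned in the abstract can be illustrated with a minimal Python sketch. It is not the sequoia implementation and does not use its API. It assumes genotypes are coded as the count of one allele (0, 1 or 2) with -1 for a missing call, and the names `opposite_homozygous_count`, `candidate_parents` and `max_mismatch` are hypothetical. A true parent and offspring share at least one allele at every locus, so a locus where one individual is 0 and the other is 2 can only arise from a genotyping error or mutation.

```python
MISSING = -1  # assumed code for a missing genotype call


def opposite_homozygous_count(geno_a, geno_b):
    """Count loci where one individual is 0 and the other is 2."""
    count = 0
    for a, b in zip(geno_a, geno_b):
        if a == MISSING or b == MISSING:
            continue  # a missing call carries no information at this locus
        if {a, b} == {0, 2}:
            count += 1
    return count


def candidate_parents(offspring, candidates, max_mismatch=1):
    """Keep candidates with at most max_mismatch opposite-homozygous loci.

    max_mismatch is a hypothetical tolerance for genotyping error.
    """
    return [
        name
        for name, geno in candidates.items()
        if opposite_homozygous_count(offspring, geno) <= max_mismatch
    ]


offspring = [0, 1, 2, 2, 0, 1]
candidates = {
    "dam": [0, 1, 1, 2, 1, MISSING],
    "unrelated": [2, 1, 0, 0, 2, 1],
}
print(candidate_parents(offspring, candidates))
```

A filter of this kind only removes impossible pairs cheaply. Choosing among the remaining candidates still requires comparing relationship likelihoods, as the abstract describes.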

    Research areas

  • Pedigree, Single Nucleotide Polymorphism, parentage assignment, sibship clustering, Sequoia

ID: 31968544