Computational comparison of human genomic sequence assemblies for a region of chromosome 4

Research output: Contribution to journal, Article, peer-review


Much of the available human genomic sequence data exist in a fragmentary draft state following the completion of the initial high-volume sequencing performed by the International Human Genome Sequencing Consortium (IHGSC) and Celera Genomics (CG). We compared six draft genome assemblies over a region of chromosome 4p (D4S394-D4S403): two consecutive releases by the IHGSC at the University of California, Santa Cruz (UCSC), two consecutive releases from the National Center for Biotechnology Information (NCBI), the public release from CG, and a hybrid assembly we have produced using IHGSC and CG sequence data. This region presents particular problems for genomic sequence assembly algorithms as it contains a large tandem repeat and is sparsely covered by draft sequences. The six assemblies differed both in their relative coverage of sequence data from the region and in their estimated rates of misassembly. The CG assembly method attained the lowest level of misassembly, whereas the NCBI and UCSC assemblies had the highest levels of coverage. All assemblies examined included
Original language: English
Pages (from-to): 424-9
Number of pages: 6
Journal: Genome Research
Issue number: 3
Publication status: Published - 2002


  • Chromosomes, Human, Pair 4
  • Computational Biology
  • Contig Mapping
  • Genetic Markers
  • Human Genome Project
  • Humans

