DupMasker: A tool for annotating primate segmental duplications

Zhaoshi Jiang1, Robert Hubley2, Arian Smit2, and Evan E. Eichler1,3,4
1 Department of Genome Sciences, University of Washington School of Medicine, Seattle, Washington 98195, USA;
2 Institute for Systems Biology, Seattle, Washington 98103, USA;
3 Howard Hughes Medical Institute, Seattle, Washington 98195, USA

Abstract

Segmental duplications (SDs) play an important role in genome rearrangement, evolution, and the copy-number variation (CNV) of primate genomes. Such sequences are difficult to detect, a priori, because they share no defining sequence features that distinguish them from unique portions of the genome. Current sequence annotation of segmental duplications requires computationally intensive, genome-wide self-comparisons that cannot be easily implemented on new data sets. Based on the successful implementation of RepeatMasker, we developed a new genome annotation tool, DupMasker. The program uses a library of nonredundant consensus sequences of human segmental duplications, wherein a majority of the ancestral origins have been determined based on comparisons to mammalian outgroup genomes. Using DupMasker, new human and nonhuman primate (NHP) sequences may be readily queried to provide details on the origin and degree of sequence identity of each duplicon. This program can be applied to delineate the order and orientation of duplicons within complex duplication blocks and used to characterize structural variation differences between sequenced human haplotypes. We predict this tool will be valuable in the annotation of large-insert sequence clones, allowing putative unique and duplicated regions of the genomes to be annotated prior to whole genome assembly comparisons.
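
As an illustration of what delineating the order and orientation of duplicons might look like in practice, the following minimal Python sketch sorts a set of duplicon annotations by query position and reports each with its strand. The record layout (`DupliconHit` and its fields), the duplicon names, and the coordinates are hypothetical and chosen only for illustration; they are not the actual DupMasker output format.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class DupliconHit:
    """One hypothetical duplicon annotation on a query sequence."""
    start: int       # 1-based start coordinate on the query
    end: int         # inclusive end coordinate on the query
    strand: str      # "+" or "-" relative to the library consensus
    duplicon: str    # hypothetical duplicon identifier
    identity: float  # fraction of identical bases, between 0 and 1


def duplicon_architecture(hits: List[DupliconHit]) -> str:
    """Return the duplicons in query order, each tagged with its orientation."""
    ordered = sorted(hits, key=lambda h: (h.start, h.end))
    return " ".join(f"{h.duplicon}({h.strand})" for h in ordered)


# Made-up annotations for a single large-insert clone, given out of order.
hits = [
    DupliconHit(52_001, 80_000, "-", "dupB", 0.97),
    DupliconHit(1, 30_000, "+", "dupA", 0.99),
    DupliconHit(30_001, 52_000, "+", "dupC", 0.95),
]

print(duplicon_architecture(hits))  # prints: dupA(+) dupC(+) dupB(-)
```

Comparing such strings between two sequenced haplotypes of the same region would be one simple way to flag candidate structural differences, under the assumption that both were annotated against the same duplicon library.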
