Haplotype-Phased de novo Genome Assembly of the Marine Annelid Capitella teleta Using a Three-Generation Long-Read Binning Approach


Meeting Abstract

P3-85  Monday, Jan. 6
GONZALEZ, P*; SEAVER, EC; BAXEVANIS, AD
NHGRI/NIH; University of Florida; NHGRI/NIH
paul.gonzalez@nih.gov

Sequencing the genomes of organisms that combine high heterozygosity with small body size is challenging and, as a result, the genomic diversity of many branches of the metazoan tree remains unexplored. Heterozygosity is problematic because sequence variation between alleles must be reconciled into a single consensus sequence, often producing spurious “mosaic” sequences that do not exist in nature. In small organisms the problem is amplified, because many individuals must be pooled to obtain sufficient DNA for sequencing. The issue is widespread, particularly among marine invertebrates, which typically have body sizes in the millimeter range as well as large effective population sizes, resulting in high levels of heterozygosity. Here, we present a new sequencing strategy based on the trio binning method that addresses these issues, and we apply it to the sequencing and assembly of the genome of the marine annelid worm Capitella teleta. We sequenced a pooled sample of siblings using long-read PacBio SMRT sequencing. We then used Illumina short-read data from their four grandparents to identify the sequences inherited from each of the four parental haplotypes present in the sibling pool. Finally, we assembled each of the four haplotypes separately before scaffolding with Hi-C data. While this method is limited to organisms for which a single family can be sampled over three generations, it has the potential to greatly expand the range of species for which a highly contiguous and accurate genome assembly can be obtained, a prerequisite for future comparative genomics studies.
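The binning step can be illustrated with a toy sketch. Trio binning generally works by finding k-mers that occur in one ancestor's short reads and in no other, then assigning each long read to the ancestor whose private k-mers it carries most often. The Python below extends that idea to four grandparents. It is not the authors' pipeline: the function names, the grandparent labels, the tie-handling rule, and the tiny k are all assumptions made for illustration, and it ignores reverse complements and sequencing errors. Real tools use much longer k-mers and count both strands.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of k-mers in seq (forward strand only, for simplicity)."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def private_kmers(grandparent_reads, k):
    """Map each grandparent to the k-mers seen in it and in no other grandparent."""
    per_gp = {}
    for name, reads in grandparent_reads.items():
        seen = set()
        for read in reads:
            seen |= kmers(read, k)
        per_gp[name] = seen

    private = {}
    for name, seen in per_gp.items():
        # Union of k-mers from every other grandparent.
        others = set().union(*(s for n, s in per_gp.items() if n != name))
        private[name] = seen - others
    return private


def bin_read(read, private, k):
    """Assign a long read to the grandparent with the most private k-mer hits.

    Returns None when no private k-mer is hit or when the top two bins tie
    (a hypothetical rule; such reads would be left unassigned).
    """
    read_kmers = kmers(read, k)
    hits = Counter({name: len(read_kmers & s) for name, s in private.items()})
    ranked = hits.most_common(2)
    best_name, best_count = ranked[0]
    if best_count == 0:
        return None
    if len(ranked) > 1 and ranked[1][1] == best_count:
        return None
    return best_name


# Toy short-read data for four hypothetical grandparents.
grandparents = {
    "GP1": ["AAAAACCC"],
    "GP2": ["AAAAAGGG"],
    "GP3": ["AAAAATTT"],
    "GP4": ["AAAAACGC"],
}
private = private_kmers(grandparents, k=3)

# Toy long reads from the pooled sibling sample.
for long_read in ["AAACCC", "AAAGGG", "AAAA", "CCCGGG"]:
    print(long_read, bin_read(long_read, private, k=3))
```

In this toy data the read `AAAA` contains only k-mers shared by all four grandparents, and `CCCGGG` hits one private k-mer each from two grandparents, so both stay unassigned. Each resulting bin would then be assembled on its own, as the abstract describes.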

The Society for Integrative & Comparative Biology