Meeting Abstract
Sequencing and assembling the genomes of heterozygous organisms has proven difficult with short-read approaches. Repetitive DNA, structural variation between haplotypes, and large genome sizes all limit assembly contiguity. A high-quality reference also depends on the quality and size distribution of the starting genomic DNA (gDNA), which is often difficult to obtain from non-model organisms because metabolites co-purify with the DNA. Here we demonstrate the utility of long DNA reads for generating a high-quality de novo reference sequence from a sperm sample isolated from an individual of a highly polymorphic species. High-quality gDNA isolated from hemichordate sperm circumvented many common DNA isolation pitfalls. The extracted gDNA was used to generate large-insert (>30 kb) libraries for SMRT Sequencing. The DNA was sequenced on the Pacific Biosciences Sequel System and assembled into a genome of approximately 1.6 Gb with a contig N50 of ~739 kb. After two rounds of polishing with the Arrow consensus-calling algorithm, 949 of 978 (97%) BUSCO orthologs were detected, and 693 (70.9% of the 978) were detected in duplicate, indicating that different haplotypes were assembled and resolved in the primary contigs. Animals in the phylum Hemichordata have provided key insights into the origins of the vertebrate body plan. Here we present the highly contiguous de novo assembly and preliminary annotation of the genome of an indirect-developing hemichordate, Schizocardium californicum.
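
For readers less familiar with the two assembly metrics quoted above, the following is a minimal Python sketch of how a contig N50 and the BUSCO percentages are typically computed. It is illustrative only and is not the pipeline used for this assembly: the function names `contig_n50` and `busco_summary` and the toy contig lengths are hypothetical, and only the BUSCO counts (978, 949, 693) come from the abstract.

```python
def contig_n50(lengths):
    """Return the contig N50: the length L of the contig at which the
    running total, taken from longest to shortest, first reaches at
    least half of the total assembly size."""
    if not lengths:
        raise ValueError("no contig lengths given")
    half_total = sum(lengths) / 2
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= half_total:
            return length


def busco_summary(total, detected, duplicated):
    """Express BUSCO counts as percentages, rounded to one decimal place."""
    return {
        "detected_pct": round(100 * detected / total, 1),
        "duplicated_pct_of_total": round(100 * duplicated / total, 1),
        "duplicated_pct_of_detected": round(100 * duplicated / detected, 1),
    }


# Toy contig lengths (made up for illustration, not from this assembly).
toy_lengths = [80, 70, 50, 40, 30, 20, 10]
print(contig_n50(toy_lengths))  # 70: 80 + 70 = 150, half of the 300 total

# BUSCO counts as reported in the abstract.
print(busco_summary(total=978, detected=949, duplicated=693))
# {'detected_pct': 97.0, 'duplicated_pct_of_total': 70.9,
#  'duplicated_pct_of_detected': 73.0}
```

The summary shows that the reported 70.9% is the share of all 978 searched orthologs found in duplicate; as a share of the 949 detected orthologs it is 73.0%.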