RNAseq on draft genomes; perils and pitfalls


Meeting Abstract

S4-1.1, Saturday, Jan. 5
JONES, Corbin; University of North Carolina; cdjones@email.unc.edu

High-throughput genomic sequencing is revolutionizing biological research and is rapidly expanding the number of organisms with genomic and transcriptomic (RNAseq) data. These new sequencing technologies produce large numbers of short (<100 bp) reads, which are best suited for assembling unique regions of a genome or transcriptome. Short reads also have inherent technical weaknesses: they perform poorly when assembling repetitive regions of the genome and are problematic for measuring gene expression across members of gene families. These problems are compounded when short reads are used to assemble a genome, to annotate the genes within it, and then to measure the expression of those genes. Using a combination of synthetic and experimental data, we illustrate some common pitfalls of measuring the transcriptome using a draft genome. Not surprisingly, gene families are particularly problematic. Our data also suggest that isoform prediction, one of the strengths of RNAseq over microarrays, can be erroneous when applied to draft genomes. Based on these data, we define a set of “good practices” that can improve the quality of inference from RNAseq experiments on draft genomes. However, polishing and closing draft genomes will ultimately be the critical step in preparing them for highly accurate RNAseq analysis.
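The gene-family problem can be made concrete with a toy calculation. The minimal Python sketch below is not the authors' synthetic-data pipeline. It assumes error-free reads, exact matching, and one read per start position, and the function name `unique_read_fraction` and the two paralog sequences are hypothetical. For each gene it reports the fraction of reads whose sequence occurs in no other gene, i.e. reads that an exact-match aligner could assign unambiguously.

```python
def unique_read_fraction(genes: dict[str, str], read_len: int) -> dict[str, float]:
    """For each gene, return the fraction of its error-free reads of length
    read_len (one read per start position) whose sequence occurs in no other gene.

    Hypothetical illustration only: assumes exact matching and no sequencing errors.
    """
    # Index every read-length substring by the set of genes it occurs in.
    occurs_in: dict[str, set[str]] = {}
    for name, seq in genes.items():
        for i in range(len(seq) - read_len + 1):
            occurs_in.setdefault(seq[i:i + read_len], set()).add(name)

    fractions: dict[str, float] = {}
    for name, seq in genes.items():
        reads = [seq[i:i + read_len] for i in range(len(seq) - read_len + 1)]
        if not reads:
            raise ValueError(f"{name} is shorter than read_len={read_len}")
        # A read is unambiguous only if its sequence is found in this gene alone.
        unique = sum(1 for r in reads if occurs_in[r] == {name})
        fractions[name] = unique / len(reads)
    return fractions


if __name__ == "__main__":
    # Hypothetical toy paralogs: a shared 7 bp core with different 2 bp 5' flanks.
    paralogs = {"gene_a": "CCGATTACA", "gene_b": "TTGATTACA"}
    for read_len in (4, 7, 8):
        print(read_len, unique_read_fraction(paralogs, read_len))
```

With 4 bp reads only a third of each paralog's reads are unique; at 7 bp two thirds are; at 8 bp every read overlaps a distinguishing flank and is unique. Reads that fall entirely within sequence shared by gene-family members cannot be attributed to one copy, which is one reason expression estimates for such genes are unreliable.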

The Society for Integrative & Comparative Biology