Meeting Abstract
Building accurate phylogenies is the first step in understanding the evolution of complexity and novelty. Genome-scale datasets offer great power to test phylogenetic hypotheses, but questions remain about best practices, particularly when using whole-transcriptome assemblies derived from heterogeneous sources. While some aspects of whole-transcriptome sequencing and assembly have been empirically evaluated (e.g., sequencing depth, read trimming), there are currently no data on how user-defined aspects of the assembly process influence phylogenetically relevant factors, including orthogroup composition, branch length estimation and topology. Here we construct comprehensive phylogenomic datasets from transcriptomes assembled using the Oyster River Protocol, a multi-assembler, multi-kmer approach. This method allows us to create datasets of both good and poor quality and to use them to test the effects of assembly quality on phylogenomic reconstruction. We find that good-quality transcriptome assemblies produce richer phylogenomic datasets, with many more viable partitions, than poor-quality assemblies. This difference in data richness produces pronounced topological artifacts in the poor-quality relative to the good-quality datasets and has the potential to affect any downstream analyses or inferences based on the tree itself. Our findings demonstrate the importance of sound transcriptome assembly techniques in phylogenomic analyses and suggest best practices for building accurate phylogenies.