You Aren’t What You Eat: The Impact of Sequence Contamination on Phylogenomics of Blood Flukes (Platyhelminthes; Schistosomatoidea)


Meeting Abstract

P3-93  Monday, Jan. 6  You Aren’t What You Eat: The Impact of Sequence Contamination on Phylogenomics of Blood Flukes (Platyhelminthes; Schistosomatoidea) WAITS, DS*; RIBEIRO, R; KOCOT, KM; BULLARD, SA; HALANYCH, KM; Auburn University, Auburn, AL; University of Alabama, Tuscaloosa, AL; Auburn University, Auburn, AL; Auburn University, Auburn, AL dsw0002@auburn.edu

Sequence contamination occurs regularly in high-throughput sequencing. Whether it arises from mistakes at the bench or is intrinsic to the samples, contamination has the potential not only to waste sequencing effort but also to lead investigators to erroneous conclusions. Given that most organisms carry large amounts of foreign DNA, and given the prevalence of high-throughput sequence data, contamination-screening approaches are a necessary part of any bioinformatics pipeline. In pipelines used for phylogenomics, contamination can introduce confounding signal (“noise”), especially when determining orthologous groups of genes. Blood flukes are ideal candidates for studying the impact of contamination because they mature in the blood of vertebrates and probably ingest erythrocytes and plasma components. Here we investigated how post-assembly contamination-screening approaches affected the final topology of a phylogenomic analysis. We employed a standard phylogenomic pipeline on 34 blood fluke transcriptomes, which resulted in 761 orthology groups (OGs). We used two contamination-screening methods based on BLAST sequence comparisons (with varying levels of stringency), resulting in ten datasets that differed in OG content. These ten datasets were additionally pruned of paralogs, yielding 17 to 308 final OGs. Six tree topologies were inferred from these datasets, but the differences among them were minor. Interestingly, the two least stringent datasets and the second-most stringent dataset resulted in identical topologies. Our results suggest that standard paralog detection is sufficient for contamination screening in phylogenomic pipelines.
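
As an illustration of what BLAST-based screening at varying stringency can look like, the sketch below flags contigs whose best hit is to a host sequence at or below a chosen e-value cutoff. It is a minimal example and not the pipeline used in this study: the function name, the cutoffs, and the contig and subject identifiers are all hypothetical, and the input is assumed to be standard BLAST tabular output (-outfmt 6), in which columns 11 and 12 hold the e-value and bit score.

```python
import csv
import io


def flag_contaminants(blast_tab, contaminant_subjects, max_evalue):
    """Return query IDs whose best hit (highest bit score) is to a
    contaminant subject with an e-value at or below max_evalue.

    Hypothetical helper for illustration only.
    """
    best = {}  # query ID -> (bit score, subject ID, e-value)
    reader = csv.reader(io.StringIO(blast_tab), delimiter="\t")
    for row in reader:
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if query not in best or bitscore > best[query][0]:
            best[query] = (bitscore, subject, evalue)
    return {
        query
        for query, (_, subject, evalue) in best.items()
        if subject in contaminant_subjects and evalue <= max_evalue
    }


# Made-up hits in -outfmt 6 column order; all identifiers are invented.
EXAMPLE_HITS = "\n".join([
    "contig1\thost_hbb\t98.0\t300\t6\t0\t1\t300\t1\t300\t1e-150\t540",
    "contig1\tfluke_g1\t70.0\t120\t36\t0\t1\t120\t1\t120\t1e-10\t60",
    "contig2\tfluke_g2\t95.0\t400\t20\t0\t1\t400\t1\t400\t1e-180\t700",
    "contig3\thost_alb\t75.0\t150\t37\t0\t1\t150\t1\t150\t1e-8\t55",
])
HOST_SUBJECTS = {"host_hbb", "host_alb"}

# A smaller cutoff flags only very strong host matches; a larger one flags more.
flagged_at_1e50 = flag_contaminants(EXAMPLE_HITS, HOST_SUBJECTS, max_evalue=1e-50)
flagged_at_1e5 = flag_contaminants(EXAMPLE_HITS, HOST_SUBJECTS, max_evalue=1e-5)
```

In this toy input the two cutoffs flag different sets of contigs, which is how a single screening method can yield several datasets that differ in OG content once the flagged sequences are removed.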

Society for Integrative & Comparative Biology