Meeting Abstract
Dayhoff, JTT, and LG matrices are 20-state amino acid replacement models used to score amino acid substitutions in phylogenetic analyses. Recently, recoding amino acids into six groups based on substitution frequencies in these models has been proposed as a solution to problems associated with substitution saturation and compositional heterogeneity. While these strategies have some theoretical appeal, they have never been empirically tested. To test the performance of Dayhoff-6 and S&R-6 recoding, we used simulations to determine whether recoding appropriately addresses saturation and compositional heterogeneity. If it does, recoded datasets should outperform non-recoded datasets as saturation or compositional heterogeneity increases. On two separate trees that include a wide range of animals and a few closely related outgroups, we simulated 1,000 datasets of 1,000 amino acids under the Dayhoff and JTT models, increasing branch lengths from 1 to 20 in increments of 1. We show that this increase in branch lengths corresponds with saturation. For each dataset, we reconstructed trees using both recoded and non-recoded models. In both cases, trees produced under recoding strategies were consistently worse than those produced under non-recoded matrices, as measured by Robinson-Foulds distances. Similar simulations to test compositional heterogeneity are ongoing. Our preliminary results suggest that these flavors of recoding do not improve the accuracy of phylogenetic reconstruction and that results based on these schemes should be reevaluated.
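As a rough illustration of what six-state recoding does to a sequence, the sketch below collapses the 20 amino acids into six numbered groups. The Dayhoff-6 grouping used here is the one commonly given in the recoding literature and is an assumption, since the abstract does not list the groups; the function names and the choice to map unknown characters to a gap symbol are hypothetical and are not taken from the study.

```python
# Dayhoff-6 groups as commonly listed in the literature (assumption:
# the abstract itself does not spell them out).
DAYHOFF6_GROUPS = ("AGPST", "DENQ", "HKR", "ILMV", "FWY", "C")


def build_recoding_table(groups):
    """Map each amino acid letter to the index of its group, as a string."""
    table = {}
    for state, group in enumerate(groups):
        for amino_acid in group:
            table[amino_acid] = str(state)
    return table


def recode(sequence, table, missing="-"):
    """Recode one protein sequence; characters not in the table become `missing`."""
    return "".join(table.get(aa, missing) for aa in sequence.upper())


DAYHOFF6 = build_recoding_table(DAYHOFF6_GROUPS)

if __name__ == "__main__":
    # Each residue is replaced by the number of the group it belongs to.
    print(recode("ACDHIF", DAYHOFF6))
```

Because every substitution within a group becomes invisible after recoding, the recoded alignment carries fewer distinguishable changes per site, which is the information loss the simulations above weigh against the intended reduction in saturation.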