A BLAST-free clustering method for classifying orthogroups


Meeting Abstract

32-5  Monday, Jan. 4 14:30
BOND, S.R.*; BAXEVANIS, A.D.; National Human Genome Research Institute, NIH
steve.bond@nih.gov https://github.com/biologyguy

Inferred orthology (i.e., homology via speciation events) among genes is commonly used as a predictor of gene product function. Orthology is also a crucial consideration when classifying genes coherently and consistently across taxa, but many popular ortholog prediction tools are too coarse-grained to resolve multiple clusters of closely related sequences in large gene families. Classification is therefore often left to the discretion of curators following manual inspection of gene trees. In this work, we present a new effort to automate the classification of orthogroups from predefined sets of homologous sequences. Unlike common ortholog prediction methods, our approach uses AlignMe scores rather than BLASTP E-values as the similarity metric between pairs of sequences. This provides a more refined input for Markov clustering (MCL), a popular method for grouping genes into orthogroups via weighted random walks through an all-by-all similarity graph. One issue with MCL, however, is its sensitivity to user-defined parameters. It is difficult to know a priori which parameters to apply and, if different groups of genes have undergone varying degrees of evolution, it may not be possible to select a single set of parameters appropriate for the entire dataset. To overcome this, we have devised a method for scoring MCL results, which makes parameter optimization possible. Furthermore, recursive analysis of clusters through subsequent rounds of parameter optimization accounts for varying evolutionary rates among groups of genes. We have named this approach Recursive Dynamic Markov Clustering (RD-MCL), and it shows improved performance over established methods. RD-MCL will be of particular interest to those studying gene family expansion, as it provides an easy and objective mechanism for classifying likely orthogroups.
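For readers unfamiliar with MCL, the following is a minimal sketch of the basic algorithm on an all-by-all similarity matrix; it is not the RD-MCL implementation, and it omits the scoring and recursion steps described above. The function name `mcl`, its defaults, and the toy similarity values (hypothetical stand-ins for pairwise alignment scores scaled to [0, 1]) are assumptions made for illustration. The `inflation` argument corresponds to the main user-defined MCL parameter: higher values generally yield finer-grained clusters.

```python
import numpy as np


def mcl(similarity, inflation=2.0, expansion=2, max_iter=100, tol=1e-6):
    """Basic Markov clustering of a symmetric similarity matrix (illustrative)."""
    m = np.asarray(similarity, dtype=float).copy()
    np.fill_diagonal(m, 1.0)               # add self-loops (assumes scores in [0, 1])
    m /= m.sum(axis=0, keepdims=True)      # make each column sum to 1
    for _ in range(max_iter):
        prev = m
        m = np.linalg.matrix_power(m, expansion)  # expansion: take random-walk steps
        m = m ** inflation                        # inflation: favor strong edges
        m /= m.sum(axis=0, keepdims=True)         # re-normalize columns
        if np.abs(m - prev).max() < tol:          # stop once the matrix is stable
            break
    # Each non-empty row belongs to an attractor; its non-zero columns are
    # the members of that attractor's cluster.
    clusters = set()
    for row in m:
        members = tuple(int(i) for i in np.nonzero(row > 1e-5)[0])
        if members:
            clusters.add(members)
    return sorted(clusters)


# Toy example: two groups of three sequences, with high similarity within
# each group and low similarity between groups (values are made up).
sim = np.full((6, 6), 0.05)
sim[:3, :3] = 0.9
sim[3:, 3:] = 0.9

clusters = mcl(sim, inflation=2.0)
print(clusters)
```

Rerunning the sketch on real data with different `inflation` values is a quick way to see the parameter sensitivity noted above, which is the problem the scoring and recursive optimization steps are meant to address.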

the Society for Integrative & Comparative Biology