Gene duplications and phylogenetic inference of vertebrates: the perils of paralogy

MARTIN, A. P. ; University of Colorado: Gene duplications and phylogenetic inference of vertebrates: the perils of paralogy

Genes are often members of multigene families whose members are related by descent. The size of a gene family depends on the rates of birth and death of individual genes. Genes are born by duplication and are lost either by deletion or through the accumulation of mutations. Comparative analyses of metazoan genes have revealed that rates of gene duplication can be on the order of the average gene substitution rate (10^-6 to 10^-7 per gene per generation). In addition, duplicate genes can be retained for relatively long periods (i.e., millions of years). The continuous remodeling of gene families through the birth and death of genes has important implications for organismal phylogenetic inference based on gene trees. In particular, phylogenetic analysis of nuclear genes may suffer from hidden paralogy: genes sampled from different taxa may be related through gene duplication rather than speciation. Paralogy of sampled genes will cause an overestimation of the divergence time between species and can confound accurate inference of relationships among species. The problem of hidden paralogy is illustrated with the HSP70 gene from sharks of the order Lamniformes. Because HSP70 has been widely used for inferring organismal phylogenies, the results have implications for published trees. Moreover, the results from the HSP70 analysis underscore the distinction between gene trees and species trees and highlight an under-appreciated source of discordance between gene trees and organismal phylogeny: unrecognized paralogy of sampled genes.
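The effect of hidden paralogy on divergence times and inferred relationships can be illustrated with a toy calculation. The sketch below is not the analysis behind the abstract. The three species (A, B, C), the split times, the duplication time, and the copy names alpha and beta are all hypothetical. It also assumes that the duplication predates every speciation event and that ancestral polymorphism can be ignored.

```python
from itertools import combinations

# Hypothetical species tree ((A,B),C); split times in millions of years ago.
SPECIES_SPLIT = {
    frozenset(("A", "B")): 10.0,
    frozenset(("A", "C")): 30.0,
    frozenset(("B", "C")): 30.0,
}
# Hypothetical duplication that created copies "alpha" and "beta";
# assumed to be older than every speciation event above.
DUPLICATION_TIME = 50.0


def gene_divergence(gene_x, gene_y):
    """Divergence time of two sampled genes, each given as (species, copy)."""
    (sp_x, copy_x), (sp_y, copy_y) = gene_x, gene_y
    if copy_x == copy_y:
        # Orthologs: the genes diverged when the species diverged.
        return SPECIES_SPLIT[frozenset((sp_x, sp_y))]
    # Paralogs: the genes diverged at the older duplication event.
    return DUPLICATION_TIME


def closest_pair(sample):
    """Species pair whose sampled genes diverged most recently."""
    pair = min(combinations(sample, 2), key=lambda p: gene_divergence(*p))
    return tuple(sorted(gene[0] for gene in pair))


# Same copy sampled from every species: the gene tree tracks the species tree.
orthologs = [("A", "alpha"), ("B", "alpha"), ("C", "alpha")]
# A different copy unknowingly sampled from species B: hidden paralogy.
hidden_paralogy = [("A", "alpha"), ("B", "beta"), ("C", "alpha")]

print(closest_pair(orthologs))
print(closest_pair(hidden_paralogy))
print(gene_divergence(("A", "alpha"), ("B", "beta")))
```

With these assumed values, the orthologous sample groups A with B, matching the species tree. The sample with hidden paralogy groups A with C, and it dates the A-B divergence at the duplication event rather than at the much more recent speciation event.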

Society for Integrative & Comparative Biology