MBE Advance Access originally published online on March 2, 2005
Molecular Biology and Evolution 2005 22(5):1161-1164; doi:10.1093/molbev/msi123
Letter
Likelihood, Parsimony, and Heterogeneous Evolution


* Department of Mathematics and Statistics and
Department of Biochemistry and Molecular Biology, Dalhousie University, Halifax, Nova Scotia, Canada
E-mail: matts{at}mathstat.dal.ca.
| Abstract |
|---|
Evolutionary rates vary among sites and across the phylogenetic tree (heterotachy). A recent analysis suggested that parsimony can be better than standard likelihood at recovering the true tree given heterotachy. The authors recommended that results from parsimony, which they consider to be nonparametric, be reported alongside likelihood results. They also proposed a mixture model, which was inconsistent but better than either parsimony or standard likelihood under heterotachy. We show that their main conclusion is limited to a special case for the type of model they study. Their mixture model was inconsistent because it was incorrectly implemented. A useful nonparametric model should perform well over a wide range of possible evolutionary models, but parsimony does not have this property. Likelihood-based methods are therefore the best way to deal with heterotachy.
Key Words: heterotachy, mixture models, likelihood, consistency, simulation
| Introduction |
|---|
Heterotachy is a general term for within-site rate variation over time (Lopez, Casane, and Philippe 2002).
Kolaczkowski and Thornton (2004; hereafter K&T) studied four-taxon trees with two long and two short terminal edges in each partition. In this setting of simple heterotachy models, there are 6 ways to assign two long and two short terminal edges on a labeled four-taxon tree and 15 combinations of two different edge-length partitions. K&T described one such combination (patterns 1 and 5 in fig. 1). Over all combinations (fig. 1), there are nine where both standard likelihood and parsimony perform well. In two cases, both methods perform poorly, but parsimony does slightly better. In four cases, likelihood does better by roughly the same margin. Therefore, likelihood is as good as or better than parsimony in 13 of the 15 combinations for the type of mixture model studied by K&T.
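The counts of 6 edge-length patterns and 15 pattern pairs can be checked by direct enumeration. The sketch below is only illustrative: the taxon labels are hypothetical, and the order in which patterns are generated is not meant to match the pattern numbering of fig. 1.

```python
from itertools import combinations

# Hypothetical labels for the four terminal edges of a labeled four-taxon tree.
TAXA = ("A", "B", "C", "D")

# An edge-length pattern is identified by which two terminal edges are long;
# the remaining two are short.
patterns = list(combinations(TAXA, 2))

# A simple heterotachy model pairs two different patterns (order is irrelevant).
pattern_pairs = list(combinations(patterns, 2))

print(len(patterns), len(pattern_pairs))

# Tallies reported in the text: both methods good, parsimony slightly better,
# likelihood better.
tallies = {"both_good": 9, "parsimony_better": 2, "likelihood_better": 4}
print(sum(tallies.values()) == len(pattern_pairs))
```

The three tallies account for every pair, so likelihood is at least as good as parsimony in 9 + 4 = 13 of the 15 combinations.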
K&T simulated evolution using the Jukes-Cantor model (Swofford et al. 1996) [...] model (Swofford et al. 1996) [...] (fig. 3b), performance differences are negligible and statistically insignificant.
K&T attribute the poor performance of likelihood-based methods to the nonidentical pattern distribution resulting from assigning edge-length partitions to sites. This attribution is misleading. Edge-length partitions were assigned to sites in a deterministic fashion (edge-length partition b1 to the first half of sites, b2 to the rest), but a randomly selected site is equally likely to have come from either partition. Thus, an appropriate marginal distribution model at a site is a mixture model that assigns probabilities to partitions.
K&T deserve credit for proposing a mixture model, Bayesian Markov chain Monte Carlo with heterotachy (BMCMChetero), that improved on standard likelihood and parsimony methods. K&T weighted likelihood contributions by the posterior probability that the site was in the partition. In their model, the likelihood contribution for pattern $x_i$ at site $i$ was

$$\omega_{i,1}\,P(x_i \mid t, b_1) + (1 - \omega_{i,1})\,P(x_i \mid t, b_2), \tag{1}$$

where $P(x_i \mid t, b_k)$ is the probability of pattern $x_i$ on tree $t$ with edge-length partition $b_k$, and $\omega_{i,1}$ is the posterior probability that $b_1$ is the edge-length partition for $i$. However, this model remained inconsistent. K&T therefore claim that "violating the identical distribution assumption can cause inconsistency, even when the true evolutionary model is used."
This is false. K&T's model is not a correct likelihood model. The likelihood for the parameters should be the probability of the data given these parameters, so the likelihood contribution for a site is the marginal probability of pattern $x_i$ at $i$,

$$\pi\,P(x_i \mid t, b_1) + (1 - \pi)\,P(x_i \mid t, b_2), \tag{2}$$

where $P(x_i \mid t, b_k)$ is the pattern probability on tree $t$ under edge-length partition $b_k$, and $\pi$ is the probability that a randomly selected site has edge-length partition $b_1$ (constant across sites). The overall likelihood is obtained by multiplying equation (2) over $i$. This method (fig. 4) performs almost as well as the best-possible case, in which the site partitions and edge-length parameters are known a priori (MLtrue and BMCMCtrue, Kolaczkowski and Thornton 2004).
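To make the marginal likelihood of equation (2) concrete, here is a minimal sketch for a single four-taxon tree under the Jukes-Cantor model. It is not the authors' implementation: the function names, the fixed topology ((1,2),(3,4)), the edge lengths, and the toy site patterns are all assumptions chosen for illustration. Each site contributes a weighted sum of its pattern probabilities under the two edge-length sets, with one mixing probability shared by all sites.

```python
import math
from itertools import product

STATES = "ACGT"

def jc_prob(a, b, d):
    """Jukes-Cantor probability of ending in state b after an edge of length d from a."""
    e = math.exp(-4.0 * d / 3.0)
    return 0.25 + 0.75 * e if a == b else 0.25 - 0.25 * e

def pattern_prob(x, b):
    """Probability of site pattern x on the tree ((1,2),(3,4)).

    x is a four-letter pattern; b = (d1, d2, d3, d4, d_internal).
    The unobserved states u, v at the two internal nodes are summed out.
    """
    d1, d2, d3, d4, d5 = b
    total = 0.0
    for u, v in product(STATES, repeat=2):
        total += (0.25 * jc_prob(u, x[0], d1) * jc_prob(u, x[1], d2)
                  * jc_prob(u, v, d5)
                  * jc_prob(v, x[2], d3) * jc_prob(v, x[3], d4))
    return total

def mixture_log_likelihood(sites, b1, b2, mix):
    """Sum over sites of the log of the two-component mixture in equation (2)."""
    return sum(
        math.log(mix * pattern_prob(x, b1) + (1.0 - mix) * pattern_prob(x, b2))
        for x in sites
    )

# Hypothetical edge-length sets: two long and two short terminal edges each.
b1 = (0.75, 0.05, 0.75, 0.05, 0.05)
b2 = (0.05, 0.75, 0.05, 0.75, 0.05)
sites = ["AAAA", "ACAC", "AACC", "AGTA"]  # toy data, not from the paper

print(mixture_log_likelihood(sites, b1, b2, 0.5))
```

Because the mixing probability does not depend on the site, the mixture is itself a probability distribution over the 256 site patterns, which is what makes it a proper likelihood. In practice this quantity would be maximized over the edge lengths and the mixing probability for each candidate tree.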
In the case that sites are independent and have identical distributions, maximum likelihood (ML) estimation will be consistent provided the mixed model satisfies the identifiability condition that incorrect trees do not give the same probabilities of site patterns as the true tree (Chang 1996).
![]() | (3) |
converge upon their true values. If the identifiability condition holds, this is the only way it can happen, and ML is consistent. If instead a set of trees $\mathcal{T}$ gives the same probabilities for all patterns, the only inference about the tree that could ever be drawn from sequence data is that it is in $\mathcal{T}$. Because the limiting likelihoods are maximized by the true pattern probabilities, statistical tests would be able to make this inference.
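The statement that the limiting likelihoods are maximized by the true pattern probabilities is an instance of Gibbs' inequality: for true probabilities p and fitted probabilities q, the expected per-site log-likelihood, the sum over patterns of p(x) log q(x), is largest when q = p. The sketch below checks this numerically on a toy three-pattern distribution; the numbers and names are hypothetical and stand in for the 256 site patterns of a four-taxon alignment.

```python
import math

def expected_log_likelihood(p_true, q_fitted):
    """Limiting per-site log-likelihood: sum over patterns of p(x) * log q(x)."""
    return sum(p * math.log(q) for p, q in zip(p_true, q_fitted) if p > 0)

# Hypothetical true pattern probabilities and some candidate fitted values.
p_true = [0.5, 0.3, 0.2]
candidates = {
    "true": [0.5, 0.3, 0.2],
    "near": [0.45, 0.35, 0.2],
    "uniform": [1 / 3, 1 / 3, 1 / 3],
}

scores = {name: expected_log_likelihood(p_true, q) for name, q in candidates.items()}
best = max(scores, key=scores.get)
print(best, scores)
```

Any tree and parameter set that reproduces the true pattern probabilities attains this maximum, which is why identifiability is the condition that decides whether the maximizer is unique.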
K&T state that "non-parametric statistical methods are often applied when the assumptions of parametric techniques are violated" (see also Sanderson and Kim 2000). This is true, but most such methods perform well under almost all parametric assumptions. Simply not requiring a parametric model is not a sufficient criterion for a satisfactory nonparametric method. For example, 0.2 is an estimator of the mean of a distribution, requiring no parametric assumptions. If the true mean is 0.2, the estimator 0.2 will be unbeatable. With small samples, this estimator will also do well for true means close to 0.2. Nevertheless, it will often do very badly. Parsimony performs badly in many cases (e.g., Felsenstein 2004, pp. 107-121). Thus, finding particular situations in which it does less badly than other methods is not a recommendation for its general use.
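The behaviour of the constant estimator 0.2 can be made explicit by comparing mean squared errors. The fixed guess has no variance, so its error is the squared distance from the true mean, whereas the sample mean of n independent draws has error equal to the variance divided by n. The variance, sample size, and true means below are hypothetical values chosen only to illustrate the argument.

```python
def mse_constant(true_mean, guess=0.2):
    """Mean squared error of a fixed guess: squared bias only, no variance."""
    return (guess - true_mean) ** 2

def mse_sample_mean(variance, n):
    """Mean squared error of the sample mean of n independent draws."""
    return variance / n

variance, n = 1.0, 5  # hypothetical: unit variance, small sample

for true_mean in (0.2, 0.3, 1.0, 3.0):
    print(true_mean, mse_constant(true_mean), mse_sample_mean(variance, n))
```

With these values the fixed guess wins when the true mean is 0.2 or 0.3 and loses clearly when it is 1.0 or 3.0, and its error does not shrink as more data are collected.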
Likelihood methods allow comparisons of different models. For example, likelihood ratio tests show that for ribosomal RNA and protein-coding genes, covarion models are better descriptions of the data than models in which rates at sites are constant over time (Galtier 2001; Huelsenbeck 2002). Similar analyses may allow us to test more sophisticated models of heterotachy. In contrast, because parsimony does not use explicit models, it cannot answer mechanistic questions of this kind.
In summary, more thorough explorations of edge-length partition combinations and evolutionary models show that under heterotachy, standard likelihood outperforms parsimony overall. The exceptions occur in special cases with oversimplified models, where both methods perform poorly but parsimony is the least bad. Correct likelihood implementation of heterotachy models is the most promising approach.
| Methods |
|---|
We used Seq-Gen 1.3 (Rambaut and Grassly 1997) [...], the transition-transversion ratio was 2, and we used a continuous gamma distribution with shape parameter 1. We used PAUP* 4 beta 10 for UNIX (Swofford 2003) [...] (with continuous gamma-rate variation), we fitted K2P with a four-category discrete gamma approximation (shape parameter and transition-transversion ratio estimated from the data). For the mixed model, we maximized the likelihood (eq. 2) for each tree t over edge-length sets b1, b2 and the mixing probability. We used the general constrained optimization algorithm VE11 with default settings, available from the Harwell Subroutine Library Archive (http://hsl.rl.ac.uk/archive/hslarchive). Edge lengths were constrained to be nonnegative, and the mixing probability was constrained to be between 0 and 1.
| Acknowledgements |
|---|
This work was supported by the Genome Atlantic/Genome Canada Prokaryotic Evolution and Diversity Project. We are very grateful to Bryan Kolaczkowski and Joe Thornton for extensive discussions of their work. We are also grateful to David Bryant, Peter Cordes, Chris Field, Yuji Inagaki, Jessica Leigh, Hervé Philippe, and Alastair Simpson for help and comments and to two anonymous referees for constructive criticism.
| Footnotes |
|---|
Peter Lockhart, Associate Editor
| References |
|---|
Ané, C., J. G. Burleigh, M. M. McMahon, and M. J. Sanderson. 2005. Covarion structure in plastid genome evolution: a new statistical test. Mol. Biol. Evol. (in press).
Chang, J. T. 1996. Full reconstruction of Markov models on evolutionary trees: identifiability and consistency. Math. Biosci. 137:51-73.
Felsenstein, J. 2004. Inferring phylogenies. Sinauer Associates, Sunderland, Mass.
Fitch, W. M., and E. Markowitz. 1970. An improved method for determining codon variability in a gene and its application to the rate of fixation of mutations in evolution. Biochem. Genet. 4:579-593.
Galtier, N. 2001. Maximum-likelihood phylogenetic analysis under a covarion-like model. Mol. Biol. Evol. 18:866-873.
Huelsenbeck, J. P. 2002. Testing a covariotide model of DNA substitution. Mol. Biol. Evol. 19:698-707.
Inagaki, Y., E. Susko, N. M. Fast, and A. J. Roger. 2004. Covarion shifts cause a long-branch attraction artifact that unites microsporidia and archaebacteria in EF1-α phylogenies. Mol. Biol. Evol. 21:1340-1349.
Kolaczkowski, B., and J. W. Thornton. 2004. Performance of maximum parsimony and likelihood phylogenetics when evolution is heterogeneous. Nature 431:980-984.
Lockhart, P. J., M. A. Steel, A. C. Barbrook, D. H. Huson, M. A. Charleston, and C. J. Howe. 1998. A covariotide model explains apparent phylogenetic structure of oxygenic photosynthetic lineages. Mol. Biol. Evol. 15:1183-1188.
Lopez, P., D. Casane, and H. Philippe. 2002. Heterotachy, an important process of protein evolution. Mol. Biol. Evol. 19:1-7.
Pagel, M., and A. Meade. 2004. A phylogenetic mixture model for detecting pattern-heterogeneity in gene sequence or character state data. Syst. Biol. 53:571-581.
Rambaut, A., and N. C. Grassly. 1997. Seq-Gen: an application for the Monte Carlo simulation of DNA sequence evolution along phylogenetic trees. Comput. Appl. Biosci. 13:235-238.
Sanderson, M. J., and J. Kim. 2000. Parametric phylogenetics? Syst. Biol. 49:817-829.
Steel, M., D. Huson, and P. J. Lockhart. 2000. Invariable sites models and their use in phylogeny reconstruction. Syst. Biol. 49:225-232.
Steel, M. A., and L. A. Székely. 2002. Inverting random functions II: explicit bounds for discrete maximum likelihood estimation, with applications. SIAM J. Discrete Math. 15:562-575.
Susko, E., Y. Inagaki, and A. J. Roger. 2004. On inconsistency of the neighbor-joining, least squares, and minimum evolution estimation when substitution processes are incorrectly modeled. Mol. Biol. Evol. 21:1629-1642.
Swofford, D. L. 2003. PAUP*: phylogenetic analysis using parsimony (*and other methods). Version 4 beta 10. Sinauer Associates, Sunderland, Mass.
Swofford, D. L., G. J. Olsen, P. J. Waddell, and D. M. Hillis. 1996. Phylogenetic inference. Pp. 407-514 in D. Hillis, C. Moritz, and B. Mable, eds. Molecular systematics. 2nd edition. Sinauer Associates, Sunderland, Mass.
Uzzell, T., and K. W. Corbin. 1971. Fitting discrete probability distributions to evolutionary events. Science 172:1089-1096.