Ancestral reconstruction

This is a draft in progress that has not yet been submitted to PLoS Computational Biology.

Jeffrey B. Joy1, Rosemary M. McCloskey1, Richard H. Liang1, Art F. Y. Poon1

1BC Centre for Excellence in HIV/AIDS, Vancouver, Canada.

Ancestral reconstruction is the extrapolation back in time from measured characteristics of individuals (or populations) to their common ancestors. It is an important application of phylogenetics, the reconstruction and study of the evolutionary relationships among individuals, populations or species to their ancestors. Ancestral reconstruction can be used to recover different kinds of ancestral character states, including: the genetic sequence (ancestral sequence reconstruction); amino acid sequence of a protein; the composition of a genome, e.g., gene order; a measurable characteristic of an organism (phenotype); and the geographic range of an ancestral population or species (ancestral range reconstruction).

Ancestral reconstruction relies on a sufficiently realistic model of evolution to accurately recover ancestral states. No matter how well the model approximates the actual evolutionary history, however, one's ability to accurately reconstruct an ancestor deteriorates with increasing evolutionary time between that ancestor and its observed descendants. Additionally, more realistic models of evolution are inevitably more complex and difficult to calculate. Progress in the field of ancestral reconstruction has relied heavily on the exponential growth of computing power and the concomitant development of efficient computational algorithms (e.g., ). Methods of ancestral reconstruction are often applied to a given phylogenetic tree that has already been inferred from the same data. While convenient, this approach has the disadvantage that its results are contingent on the accuracy of a single phylogenetic tree. In contrast, some researchers advocate a more computationally-intensive Bayesian approach that accounts for the uncertainty in tree reconstruction by evaluating ancestral reconstructions over many trees.

History
The concept of ancestral reconstruction is often credited to Emile Zuckerkandl and Linus Pauling. Motivated by the development of techniques for determining the primary (amino acid) sequence of proteins by Frederick Sanger in 1955, Zuckerkandl and Pauling postulated that such sequences could be used to infer not only the phylogeny relating the observed protein sequences, but also the ancestral protein sequence at the earliest point (root) of this tree. However, the idea of reconstructing ancestors from measurable biological characteristics had already been developing in the field of cladistics, one of the precursors of modern phylogenetics. Cladistic methods, which appeared as early as 1901, infer the evolutionary relationships of species on the basis of the distribution of shared characteristics, of which some are inferred to be descended from common ancestors.

Methods and algorithms
Any attempt at ancestral reconstruction begins with a phylogeny. In general, a phylogeny is a tree-based hypothesis about the order in which populations (referred to as taxa) are related by descent from common ancestors. Observed taxa are represented by the tips or terminal nodes of the tree that are progressively connected by branches to their common ancestors, which are represented by the branching points of the tree that are usually referred to as the ancestral or internal nodes. Eventually, all lineages converge to the most recent common ancestor of the entire sample of taxa. In the context of ancestral reconstruction, a phylogeny is often treated as though it were a known quantity (with Bayesian approaches being an important exception). Because there can be an enormous number of phylogenies that are nearly equally effective at explaining the data, reducing the subset of phylogenies supported by the data to a single representative, or point estimate, can be a convenient and sometimes necessary simplifying assumption.

Ancestral reconstruction can be thought of as the direct result of applying a hypothetical model of evolution to a given phylogeny. When the model contains one or more free parameters, the overall objective is to estimate these parameters on the basis of measured characteristics among the observed taxa (sequences) that descended from common ancestors. Parsimony is an important exception to this paradigm, since the underlying model has no free parameters.

Maximum parsimony
Parsimony, known colloquially as "Occam's razor", refers to the principle of selecting the simplest of competing hypotheses. In the context of ancestral reconstruction, parsimony endeavours to find the distribution of ancestral states within a given tree which minimizes the total number of character state changes that would be necessary to explain the states observed at the tips of the tree. This method of maximum parsimony, is one of the earliest formalized algorithms for reconstructing ancestral states.

Maximum parsimony can be implemented by one of several algorithms. One of the earliest examples is Fitch's method, which assigns ancestral character states by parsimony via two traversals of a rooted binary tree. The first stage is a post-order traversal proceeds from the tips toward the root of a tree by visiting child nodes before parents. Initially, we are determining the set of possible character states Si to the i-th ancestor based on the observed character states of their descendants. Each assignment is the set intersection of the character states of the ancestor's descendants; if the intersection is the empty set, then it is the set union. In the latter case, it is implied that a character state change has occurred between the ancestor and one of its two immediate descendants. Each such event counts towards the algorithm's cost function which may be used to discriminate among alternative trees on the basis of maximum parsimony. Next, we perform a pre-order traversal of the tree, proceeding from the root towards the tips. Character states are then assigned to each descendant based on the intersection of its set of possible character states and the character state of its parent. Since the root has no parent node, one may be required to select a character state arbitrarily, specifically when more than one possible state has been reconstructed at the root.

For example, consider the three species "human", "cat", and "dog", and the characteristic "number of legs". One possible scenario giving rise to these three species is that a common ancestor, "A", possessing two legs, diverged into the species "cat" and another species, "B", which also had two legs. Later, "B" diverged into "human" and "dog". This scenario involves a total of two evolutionary changes in the characteristic "number of legs" -- one on the branch from "A" to "cat", and one on the branch from "B" to "dog". An alternative scenario swaps the positions of "cat" and "human", and supposes "B" had four legs. This reduces the total number of evolutionary changes to one -- namely, the transition from four legs to two along the branch from "A" to "B". This scenario is more parsimonious than the first, since it involves fewer evolutionary changes. Note, however, that parsimony does not tell us whether "A" had two or four legs, as both possibilities involve the same number of changes.

Fitch's method assumes that changes between all character states are equally likely to occur; thus, any change incurs the same cost for a given tree. This assumption is often unrealistic. For example, transitions tend to occur more often than [[wp:Transversion|transversions] in the evolution of nucleic acids. This assumption can be relaxed by assigning differential costs to specific character state changes, resulting in a weighted parsimony algorithm .  Additionally, maximum parsimony assumes that the same amount of evolutionary time has passed along every branch of the tree.  This shortcoming is addressed by maximum likelihood methods that directly model the stochastic process of evolution as it unfolds along each branch of a tree.

Maximum likelihood
Maximum likelihood (ML) methods of ancestral state reconstruction treat the character states at internal nodes of the tree as parameters, and attempts to find the parameter values that maximize the probability of the data (the observed character states) given the hypothesis (a model of evolution and phylogeny relating the observed sequences or taxa). Some of the earliest ML approaches to ancestral reconstruction was developed in the context of genetic sequence evolution ; similar models were also developed for the analogous case of discrete character evolution.

Generally speaking, there are two approaches to ancestral reconstruction by maximum likelihood. First, one may work upwards from the descendants of a tree to progressively assign the most likely character state to each ancestor taking into consideration only its immediate descendants. This approach is referred to as marginal reconstruction. It is akin to a greedy algorithm that makes the locally optimal choice at each stage of the optimization problem. While it can be highly efficient, it is not guaranteed to attain a globally optimal solution to the problem. Second, one may instead attempt to find the joint combination of ancestral character states throughout the tree which jointly maximizes the likelihood of the data. Thus, this approach is referred to as joint reconstruction. While it is not as rapid as marginal reconstruction, it is also less likely to be caught in the local optima in non-convex objective functions that modern optimization methods and heuristics are designed to avoid. In the context of ancestral reconstruction, this means that a marginal reconstruction may assign a character state to the immediate ancestor that is locally optimal, but deflects the joint distribution of ancestral character states away from the global optimum (see for further examples). Not surprisingly, joint reconstruction is more computationally complex than marginal reconstruction. Nevertheless, efficient algorithms for joint reconstruction have been developed with a time complexity that is generally linear with the number of observed taxa or sequences.

ML-based methods of ancestral reconstruction tend to provide greater accuracy than MP methods in the presence of variation in rates of evolution among characters (or across sites in a genome). Since ML requires the investigator to define a model of evolution, however, its accuracy may be affected by the use of a grossly incorrect model (model misspecification).

Bayesian inference
Bayesian methods of ancestral state reconstruction seek to determine the set of ancestral states while considering not only the full phylogeny, branch lengths, and evolutionary model but also combining information about the uncertainty of the phylogenetic tree and uncertainty in the estimate of the ancestral state. These methods tend to use Markov chain Monte Carlo techniques for sampling phylogenetic trees and for investigating the parameters of a statistical model of trait evolution.

Models
Parsimony methods are intuitive and appealing, but they suffer from several flaws. For one, the upshot of the Occam's razor heuristic underlying such methods is that such methods assume that changes are rare, and thus are inappropriate in cases where change is the norm rather than the exception. Another flaw is that they do not take branch lengths into account, making the technique liable to infer that one change occurred on a very short branch rather than multiple changes occurring on a very long branch, for example. Also, they are only partially statistically justified. Without a statistical model underlying the method, its estimates do not have well-defined uncertainties.

Many models have been developed to estimate ancestral states of discrete and continuous characters from phylogenetic data and character states of extant descendants. Such models assume that the evolution of a trait through time may be modelled by a well-understood random process. For discrete-valued traits (such as "number of legs"), this random process is typically taken to be a Markov chain; for continuous-valued traits (such as "brain mass"), the process is frequently taken to be a Brownian motion or an Ornstein-Uhlenbeck process. Using this model as the basis for statistical inference, one can now use maximum likelihood methods or Bayesian inference to estimate the ancestral states.

Discrete-state models
Suppose the trait in question may fall into one of $$k$$ states, labelled $$1,\ldots, k$$. The typical means of modelling evolution of this trait is via a continuous-time Markov chain, which may be briefly described as follows. Each state has associated to it rates of transition to all of the other states. The trait is modelled as stepping between the $$k$$ states; when it reaches a given state, it starts an exponential "clock" for each of the other states that it can step to. It then "races" the clocks against each other, and it takes a step to the state whose clock is the first to ring. In such a model, the parameters are the transition rates $$\mathbf{q} = \{q_{ij}: 1\leq i, j\leq k, i\not= j\}$$, which can be estimated using, for example, maximum likelihood methods, where one maximizes over the set of all possible configurations of states of the ancestral nodes.

If one wishes to recover the state of a given ancestral node in the phylogeny (call this node $$\alpha$$) by maximum likelihood, the procedure is: find the maximum likelihood estimate $$\hat{\mathbf{q}}$$ of $$\mathbf{q}$$; then compute the likelihood of each possible state for $$\alpha$$ conditioning on $$\mathbf{q} = \hat{\mathbf{q}}$$; finally, choose the ancestral state which maximizes this. One may also use this substitution model as the basis for a Bayesian inference procedure, which would consider the posterior belief in the state of an ancestral node given some user-chosen prior.

Because such models may have as many as $$k(k-1)$$ parameters, overfitting may be an issue. Some common choices that reduce the parameter space are:


 * Markov $$k$$-state 1 parameter model: this model is the reverse-in-time $$k$$-state counterpart of the Jukes-Cantor model. In this model all transitions have the same probability $$q$$ regardless of their start and end states.  Some transitions may be disallowed by declaring that their probabilities are simply 0; this may be the case, for example, if certain states cannot be reached from other states in a single transition.


 * Asymmetrical Markov $$k$$-state 2 parameter model: in this model, the state space is ordered (so that, for example, state 1 is smaller than state 2, which is smaller than state 3), and transitions may only occur between adjacent states. This model contains two parameters $$q_\mbox{inc}$$ and $$q_\mbox{dec}$$: one for the rate of increase of state (e.g. 0 to 1, 1 to 2, etc.); and one for the rate of decrease in state (e.g. from 2 to 1, 1 to 0, etc.).

Binary state speciation and extinction model
The binary state speciation and extinction model (BiSSE) is a discrete-space model that does not directly follow the framework of those mentioned above. It allows estimation of ancestral binary character states jointly with diversification rates associated with different character states; it may also be straightforwardly extended to a more general multiple discrete-state model. In its most basic form, this model involves 6 parameters: 2 speciation rates (one each for lineages in states 0 and 1); similarly, 2 extinction rates; and 2 rates of character change. This model allows for hypothesis testing on the rates of speciation/extinction/character change, at the cost of increasing the number of parameters.

Continuous-state models
In the case where the trait instead takes non-discrete values, one must instead turn to a model where the trait evolves as some continuous process. Inference of ancestral states by maximum likelihood (or by Bayesian methods) would proceed as above, but with the likelihoods of transitions in state between adjacent nodes given by some other continuous probability distribution.


 * Brownian motion: in this case, if nodes $$\alpha$$ and $$\beta>$$ are adjacent in the phylogeny (say $$\alpha$$ is the ancestor of $$\beta$$) and separated by a branch of length $$t$$, the likelihood of a transition from $$\alpha$$ being in state $$x$$ to $$\beta$$ being in state $$y$$ is given by a Gaussian density with mean $$0$$ and variance $$\sigma^2 t$$. In this case, there is only one parameter ($$\sigma^2$$), and the model assumes that the trait evolves freely without a bias toward increase or decrease, and that the rate of change is constant throughout the branches of the phylogenetic tree.


 * Ornstein-Uhlenbeck process: in brief, an Ornstein-Uhlenbeck process is a continuous stochastic process that behaves like a Brownian motion, but attracted toward some central value, where the strength of the attraction increases with the distance from that value. This is useful for modelling scenarios where the trait is subject to stabilizing selection around a certain value (say $$0$$).  Under this model the above-described transition of $$\alpha$$ being in state $$x$$ to $$\beta$$ being in state $$y$$ would have likelihood defined by the transition density of an Ornstein-Uhlenbeck process with two parameters: $$\sigma^2$$, which describes the variance of the driving Brownian motion, and $$\alpha$$, which describes the strength of its attraction to $$0$$.  As $$\alpha$$ tends to $$0$$, the process is less and less constrained by its attraction to $$0$$ and the process becomes a Brownian motion.  Because of this, the models may be nested, and log-likelihood ratio tests discerning which of the two models is appropriate may be carried out.


 * Stable models of continuous character evolution: though Brownian motion is appealing and tractable as a model of continuous evolution, it does not permit non-neutrality in its basic form, nor does it provide for any variation in the rate of evolution over time. Instead, one may use a stable process, one whose values at fixed times are distributed as stable distributions, to model the evolution of traits.  Stable processes, roughly speaking, behave as Brownian motions that also incorporate discontinuous jumps.  This allows one to appropriately model scenarios in which short bursts of fast trait evolution are expected.  In this setting, maximum likelihood methods are poorly suited due to a rugged likelihood surface and because the likelihood may be made arbitrarily large, so Bayesian methods are more appropriate.

Migration
Ancestral reconstruction is not limited to biological traits. Spatial location is also a trait, and ancestral reconstruction methods can infer the locations of ancestors of the individuals under consideration. Such techniques were used by Lemey et al. to geographically trace the ancestors of 192 Avian influenza A-H5N1 strains sampled from 20 localities and for 101 Africa rabies virus sequences sampled across 12 African countries.

Treating locations as discrete states (countries, cities, etc.) allows for the application of the discrete-state models described above. However, unlike in a model where the state space for the trait is small, there may be many locations, and transitions between certain pairs of states may rarely or never occur; for example, migration between distant locales may never happen directly if air travel between the two places does not exist, so such migrations must pass through intermediate locales first. This means that there could be many parameters in the model which are zero or close to zero. To this end, Lemey et al. use a Bayesian procedure to not only estimate the parameters and ancestral states, but also to select which migration parameters are not zero; their work suggests that this procedure does lead to more efficient use of the data. They also explore the use of prior distributions that incorporate geographical structure or hypotheses about migration dynamics, finding that those they considered had little effect on the findings.

Using this analysis, Lemey et al. find that the most likely hub of diffusion of A-H5N1 is Guangdong, with Hong Kong also receiving posterior support. Further, their results support the hypothesis of longstanding presence of African rabies in West Africa.

Diet reconstruction in Galapagos finches
Both phylogenetic and character data are available for the radiation of finches inhabiting the Galapagos Islands. These data allow testing of hypotheses concerning the timing and ordering of character state changes through time via ancestral state reconstruction. During the dry season the diets of the 13 species of Galapagos finches may be assorted into three broad diet categories, first those that consume grain like foods are considered “granivores”, those that ingest arthropods are termed “insectivores” and those that consume vegetation are classified as “folivores”. Dietary ancestral state reconstruction using maximum parsimony recover 2 major shifts from an insectivorous state: one to granivory, and one to folivory (Figure x). Maximum-likelihood ancestral state reconstruction recovers broadly similar results with one significant difference: the common ancestor of the tree finch (Camarhynchus) and ground finch (Geospiza) clades is most likely granivorous rather than insectivorous (as judged by parsimony; Figure x). In this case this difference between ancestral states returned by maximum parsimony and maximum likelihood likely occurs as a result of the fact that ML estimates consider branch lengths of the phylogenetic tree.

Mammalian body mass
In an analysis of the body mass of 1,679 placental mammal species comparing stable models of continuous character evolution to Brownian motion models, Elliot and Mooers showed that the evolutionary process describing mammalian body mass evolution is best characterized by a stable model of continuous character evolution. Under a stable model ancestral mammals retained a low body mass through early diversification with large increases in body mass coincident with the origin of several Orders of large body massed species (e.g. ungulates). By contrast simulation under a Brownian motion model recovered, a less realistic, order of magnitude larger body mass among ancestral mammals requiring significant reductions in body size prior to the evolution of Orders exhibiting small body size (e.g. Rodentia). Thus stable models recover a more realistic picture of mammalian body mass evolution by permitting large transformations to occur on a small subset of branches.

Ancestral reconstruction of species ranges
Inferring historical biogeographic patterns often requires reconstructing ancestral ranges of species on phylogenetic trees. Clark et al. used a well-resolved phylogeny of plant species in the genus Cyrtandra together with information of their geographic ranges to compare 4 methods of ancestral range reconstruction. They compared Fitch parsimony (FP; parsimony), stochastic mapping (SM; maximum likelihood), dispersal-vicariance analysis (DIVA; parsimony), and dispersal-extinction-cladogenesis (DEC; maximum-likelihood). Results showed that both parsimony methods performed poorly likely due to a the fact that parsimony methods do not consider branch lengths. Both maximum-likelihood methods performed better but DEC analyses which additionally allow incorporation of geological priors gave more realistic inferences about range evolution in Cyrtandra relative to other methods.

Another maximum likelihood method recovers the phylogeographic history of a gene by reconstructing the ancestral locations of the sampled taxa assuming a spatially explicit random walk model of migration given the geographic coordinates of the individuals represented by the tips of the phylogenetic tree. When applied to a phylogenetic tree of chorus frogs Pseudacris feriarum this method recovers recent northward expansion, higher per-generation dispersal distance in the recently colonized region, a noncentral ancestral location and directional migration.

Uncertainty in tree reconstruction
The first step in a comparative analysis often involves ancestral state reconstruction. Such analyses are often conducted using a single best estimate of the phylogenetic relationships among the taxa under study or a single consensus (summary of a set of trees) tree. The underlying assumption of this strategy is that the single tree is the truest depiction of the relationships among the taxa and the lengths of the branches on the tree. In fact, it is vanishingly rare to know the true phylogenetic tree and branch lengths with a high degree of confidence. Since ancestral state reconstructions performed on varying topologies and/or varying branch lengths on a phylogenetic tree can often yield radically different results this is a serious complication for ancestral state reconstruction.

Heterotachy
Heterotachy - variation in the rate at which a given site in a gene sequence alignment evolves over time - biases inferences from models (such as those oft employed in ancestral state reconstruction) which assume rates of evolution are constant.

Model misspecification
Another consequence of heterotachy and heterogeneous rates of character state transitions is that the use of uniform-rate models in ancestral state reconstruction may often result in inaccurate inferences of ancestral states.

Software
There are many software packages available which can perform ancestral state reconstruction a selection is described below:

First there are a number of packages within the R statistical language which are capable of inferring ancestral states notably:

APE implements a variety of ancestral state reconstruction methods for both discrete and continuous characters.

Diversitree implements ancestral state reconstruction under Mk2 and BiSSE models.

HyPhy is a modular software package for hypothesis testing using phylogenies in a maximum likelihood framework. It features a graphical user interface and a versatile batch language for customizing models of sequence evolution. HyPhy implements a fast joint likelihood method of ancestral sequence reconstruction that can be readily adapted to reconstructing discrete ancestral character states or geographic ranges.

Mesquite implements parsimony, maximum likelihood, and Bayesian methods of ancestral state reconstruction for both discrete and continuous characters and has several display methods for resulting reconstructions.

Bayes Traits is a computer package which performs analyses on discrete or continuous characters in a Bayesian framework and allows testing of hypotheses about models of evolution, ancestral states, and correlations among pairs of traits.

BEAST Also performs ancestral state and ancestral sequence reconstruction analyses.

Lagrange is an application which allows analyses reconstruction of geographic range evolution on phylogenetic trees available from http://www.reelab.net/home/software/.

Phylomapper implements a likelihood-based statistical framework for estimating historical patterns of gene flow and ancestral geographic locations. Available from: http://www.evotutor.org/LemmonLab/PhyloMapper1.html.