1 Department of Mathematical Sciences, Faculty of Science, Aarhus University
2 Bioinformatics Research Centre (BiRC), Faculty of Science, Aarhus University
3 Bioinformatics Research Centre (BiRC), Science and Technology, Aarhus University
4 Bioinformatics Research Centre (BiRC), Science and Technology, Aarhus University
The evolution of DNA sequences can be described by discrete-state, continuous-time Markov processes on a phylogenetic tree. We consider neighbor-dependent evolutionary models, in which the instantaneous rate of substitution at a site depends on the states of the neighboring sites. Neighbor-dependent substitution models are analytically intractable and must be analyzed using either approximate or simulation-based methods. We describe statistical inference for neighbor-dependent models using a Markov chain Monte Carlo expectation-maximization (MCMC-EM) algorithm. In the MCMC-EM algorithm, the high-dimensional integrals required by the EM algorithm are estimated using MCMC sampling. The MCMC sampler requires simulation of sample paths from a continuous-time Markov process, conditional on the beginning and ending states and on the paths of the neighboring sites. An exact path sampling algorithm is developed for this purpose.
Journal of Computational and Graphical Statistics, 2008, Vol 17, Issue 1, p. 138-162
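
To make the path sampling problem in the abstract concrete, the following is a minimal sketch of drawing a continuous-time Markov process path conditional on its beginning and ending states. It is not the exact path sampling algorithm developed in the paper: it uses plain rejection sampling, and it assumes a single site with a fixed rate matrix, so the dependence on neighboring sites is ignored. The names `simulate_path`, `sample_conditioned_path`, and `Q_JC`, as well as the Jukes-Cantor-style rate matrix, are illustrative assumptions and do not come from the source.

```python
import random

# Illustrative 4-state rate matrix (A, C, G, T) with equal off-diagonal rates.
# This is an assumption for the example, not a model taken from the paper.
Q_JC = [[-1.0 if i == j else 1.0 / 3.0 for j in range(4)] for i in range(4)]


def simulate_path(Q, start, T, rng):
    """Forward-simulate a CTMC path on [0, T] starting in state `start`.

    Returns a list of (jump_time, new_state) pairs, beginning with (0.0, start).
    """
    t, state = 0.0, start
    path = [(0.0, start)]
    while True:
        rate = -Q[state][state]          # total rate of leaving `state`
        if rate <= 0.0:
            return path                  # absorbing state: no more jumps
        t += rng.expovariate(rate)       # exponential waiting time
        if t >= T:
            return path                  # next jump falls outside [0, T]
        # Choose the next state proportionally to the off-diagonal rates.
        targets = [j for j in range(len(Q)) if j != state]
        weights = [Q[state][j] for j in targets]
        state = rng.choices(targets, weights=weights)[0]
        path.append((t, state))


def sample_conditioned_path(Q, start, end, T, rng, max_tries=100_000):
    """Rejection sampler: keep the first forward path that ends in `end`."""
    for _ in range(max_tries):
        path = simulate_path(Q, start, T, rng)
        if path[-1][1] == end:
            return path
    raise RuntimeError("no accepted path; the end state may be too unlikely")


if __name__ == "__main__":
    rng = random.Random(0)
    for time, state in sample_conditioned_path(Q_JC, start=0, end=2, T=1.0, rng=rng):
        print(f"t = {time:.3f}: state {'ACGT'[state]}")
```

Rejection sampling yields exact draws from the conditional path distribution, but it becomes slow when the ending state is unlikely given the beginning state, for example over short time intervals. That inefficiency is one reason a purpose-built conditional sampler is attractive in this setting.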