
Mathematical models of molecular evolution, fall 2009


Jukka Corander


5 cu.


Advanced studies. Evolutionary models at the molecular level are the backbone of molecular biology and bioinformatics across a wide range of research fields. Course aims: the course introduces the central characteristics of mathematical models of molecular evolution and shows how such models can be estimated from data. The course is primarily targeted at students in mathematics, statistics and computer science, but it is also useful for graduate students in biology who wish to gain a deeper understanding of evolutionary models. To see how our current understanding, based on methods for studying molecular evolution, links humans to our close relatives, check the Tree of Life webpages, here.


Probability calculus and stochastic processes are central elements of the course material. Algebra and calculus skills are also useful, but they do not play a central role in this course.


Weeks 45-51, Monday 12-14 and Friday 12-14 in room B120 (Exactum). The course starts on Monday Nov 2nd. NB: The following lectures are cancelled: Mon 30.11., Fri 4.12., Fri 11.12., Mon 14.12. They are replaced by additional lectures on Tue 8.12. 12-14 and on Wed 9.12. 12-16.

We have considered the following pages of the book by T. Koski during weeks 45-47:
1-35, 38-48, 49-51, 61-66, 71-72, 75-79, 82-87, 91-98. On Friday Nov 20th we cover pp. 99-104, 110-116.
The last lectures will be given during week 50. On Monday we considered the Neighbor-Joining (NJ) algorithm, the UPGMA algorithm, the bootstrap, and examples with the MEGA software. You may also find the original NJ paper helpful, as well as the review on hierarchical clustering algorithms. We concluded the course by looking at stochastic algorithms for finding optimal trees and at model selection between different substitution models. The articles mentioned in the bibliography below were considered; in addition, you may find these slides useful.


There will be no written examination, but the participants are required to complete a set of home assignments. The submission should contain a detailed report of the work done and of the results obtained. The deadline for returning the assignments is May 7th 2010. Participants may work in pairs on the assignment problems if they wish.

The assignments consist of the following tasks:

1. Simulation of DNA sequence data under various evolutionary models (4 in total).
2. Fitting of each considered model type to each generated dataset and investigation of the accuracy of the results.
3. Analysis of a real dataset using the same four models as above. The real dataset is available here. The file contains 20 DNA sequences for Bacillus strains and is in the data format accepted by the MEGA software. You can also view and edit the contents with any text editor.
4. Reporting the work and the findings.

The four models to be considered are:

Jukes-Cantor model with homogeneous rate over sites
Jukes-Cantor model with Gamma-distributed heterogeneous rates over sites
Kimura 2-parameter model with homogeneous rate over sites
Kimura 2-parameter model with Gamma-distributed heterogeneous rates over sites
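
For reference, the transition probabilities of the two substitution models can be written down in closed form. The block below is a sketch in one common parameterization; the symbols are notational assumptions and not taken from the course material: d is the branch length in expected substitutions per site, \alpha and \beta are the transition and transversion rates of the Kimura model, t is time, and a is the shape parameter of a Gamma distribution of site rates with mean 1.

```latex
% Jukes-Cantor: probability of observing the same base / a given different base
P_{\text{same}}(d) = \tfrac{1}{4} + \tfrac{3}{4} e^{-4d/3}, \qquad
P_{\text{diff}}(d) = \tfrac{1}{4} - \tfrac{1}{4} e^{-4d/3}

% Kimura 2-parameter, with d = (\alpha + 2\beta) t
P_{\text{same}}(t) = \tfrac{1}{4} + \tfrac{1}{4} e^{-4\beta t} + \tfrac{1}{2} e^{-2(\alpha+\beta) t}
P_{\text{transition}}(t) = \tfrac{1}{4} + \tfrac{1}{4} e^{-4\beta t} - \tfrac{1}{2} e^{-2(\alpha+\beta) t}
P_{\text{transversion}}(t) = \tfrac{1}{4} - \tfrac{1}{4} e^{-4\beta t}
  \quad \text{(for each of the two transversions)}

% Gamma-distributed rates: a site with rate r evolves along length r d,
% and averaging over r \sim \mathrm{Gamma}(a, 1/a) uses
\mathbb{E}\left[ e^{-s r} \right] = \left( 1 + \frac{s}{a} \right)^{-a}
```

Setting \alpha = \beta in the Kimura formulas recovers the Jukes-Cantor case, since then d = 3\beta t.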

Details of the simulation are as follows:

The parameter of the Gamma distribution of rate heterogeneity (the shape parameter) should be set equal to 0.5. Seq-Gen or any other software, or your own code in Matlab, R, etc., can be used to produce the datasets. Substitution rate parameters can be freely chosen as long as they are assigned sensible values.

Choose any rooted binary tree topology you like with 5 leaf nodes, and simulate DNA sequence data of length 500 bases for each leaf node under that topology for each of the four models defined above.

Use both MrBayes and MEGA to fit the four models to each dataset. In MEGA, use the Neighbor-Joining method to learn the topology. Compare the bootstrap support values and the posterior probabilities of the internal nodes of the estimated tree, and compare the estimated tree with the true generating model. In the report, pay attention to how the model estimation performs when the fitted model and the data-generating model are different.
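
For those who prefer to write their own simulator, the following is a minimal Python sketch of the simulation step, not a reference solution. It assumes a nested-tuple tree representation, hypothetical leaf names s1 to s5, illustrative branch lengths, and a transition/transversion rate ratio kappa (kappa = 1 gives Jukes-Cantor); none of these choices come from the assignment text. Only the Gamma shape of 0.5, the 5 leaves and the 500 sites follow the description above.

```python
import math
import random

BASES = "ACGT"
# Transition partners: purine <-> purine, pyrimidine <-> pyrimidine.
TRANSITION = {"A": "G", "G": "A", "C": "T", "T": "C"}


def k2p_probs(d, kappa):
    """Return (p_same, p_transition, p_each_transversion) for a branch of
    length d (expected substitutions per site) under the Kimura 2-parameter
    model with rate ratio kappa = alpha / beta. kappa = 1 is Jukes-Cantor."""
    beta_t = d / (kappa + 2.0)          # because d = (alpha + 2*beta) * t
    alpha_t = kappa * beta_t
    e1 = math.exp(-4.0 * beta_t)
    e2 = math.exp(-2.0 * (alpha_t + beta_t))
    p_ts = 0.25 + 0.25 * e1 - 0.5 * e2
    p_tv = 0.25 - 0.25 * e1
    return 1.0 - p_ts - 2.0 * p_tv, p_ts, p_tv


def evolve_base(base, d, kappa, rng):
    """Draw the descendant base after evolving along a branch of length d."""
    p_same, p_ts, _ = k2p_probs(d, kappa)
    u = rng.random()
    if u < p_same:
        return base
    if u < p_same + p_ts:
        return TRANSITION[base]
    transversions = [b for b in BASES if b not in (base, TRANSITION[base])]
    return rng.choice(transversions)


def simulate(tree, n_sites=500, kappa=1.0, gamma_shape=None, seed=0):
    """Simulate leaf sequences on a rooted tree.

    A node is (content, branch_length); content is a leaf name (str) or a
    tuple of child nodes. gamma_shape=None means one common rate for all
    sites; otherwise site rates are Gamma(shape, 1/shape), which has mean 1."""
    rng = random.Random(seed)
    if gamma_shape is None:
        rates = [1.0] * n_sites
    else:
        rates = [rng.gammavariate(gamma_shape, 1.0 / gamma_shape)
                 for _ in range(n_sites)]
    root_seq = [rng.choice(BASES) for _ in range(n_sites)]
    leaves = {}

    def descend(node, parent_seq):
        content, length = node
        seq = [evolve_base(b, length * r, kappa, rng)
               for b, r in zip(parent_seq, rates)]
        if isinstance(content, str):
            leaves[content] = "".join(seq)
        else:
            for child in content:
                descend(child, seq)

    descend(tree, root_seq)
    return leaves


# Hypothetical topology ((s1,s2),(s3,(s4,s5))) with illustrative branch lengths.
TREE = (
    (
        ((("s1", 0.1), ("s2", 0.1)), 0.1),
        ((("s3", 0.15), ((("s4", 0.05), ("s5", 0.05)), 0.1)), 0.05),
    ),
    0.0,
)

# One dataset per model; kappa = 2.0 for the Kimura model is an arbitrary choice.
datasets = {
    "JC": simulate(TREE, kappa=1.0, seed=1),
    "JC+G": simulate(TREE, kappa=1.0, gamma_shape=0.5, seed=2),
    "K2P": simulate(TREE, kappa=2.0, seed=3),
    "K2P+G": simulate(TREE, kappa=2.0, gamma_shape=0.5, seed=4),
}
```

The resulting sequences would still need to be written out in a format that MEGA and MrBayes accept before the fitting step. Drawing the site rates once and reusing them on every branch is what makes the rate heterogeneity a property of the site rather than of the branch.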


The core of the lecture material is the free e-book by Professor Timo Koski, available for download here. A number of scientific articles (paper1, paper2, paper3, paper4) and popular modeling software (MrBayes, MEGA) will also be considered. Another recommended book in this field is 'Computational molecular evolution' by Ziheng Yang (Oxford University Press, 2006).


Did you forget to register? What to do.

Exercise groups










