Last modified by varvio@helsinki_fi on 2024/03/27 10:06

1 = (% style="color: rgb(0,51,102);" %)Phylogeny inference and data analysis, spring 2011(%%) =
2
3 === (% style="color: rgb(0,51,102);" %)Lecturer(%%) ===
4
5 [[Sirkka-Liisa Varvio>>doc:mathstatHenkilokunta.Varvio, Sirkka-Liisa]]
6
7 === (% style="color: rgb(0,51,102);" %)Scope(%%) ===
8
9 4-12 cu.
10
11 === (% style="color: rgb(0,51,102);" %)Time schedule, prerequisites, content(%%) ===
12
13 Basic probability and statistical inference are assumed as prerequisites.
14 The course is tailored for the Bioinformatics Master program (MBI), so some knowledge of sequence analysis, sequence databases, sequence alignment, and distance-matrix-based phylogenies with the MEGA software is assumed.
15 The course is, however, open to all students: additional sessions will be arranged for students who are not familiar with these basics (introduced in other MBI courses).
16
17 === (% style="color: rgb(0,51,102);" %)**Part I**(%%) ===
18
19 (% style="color: rgb(0,51,102);" %)**III period**:(%%)
20 Tuesdays 15-17 and Thursdays 14-16 in B120 and in computer class C128.
21 The time schedule for extra sessions (for non-MBI students) in the computer class during the first three course weeks is negotiable (for example, before or after lecture times).
22
23 The aim of the course is to elucidate both biological and statistical aspects of phylogenies, i.e. evolutionary trees and networks, which are elementary structures of differences among biological entities (species, individuals, genes, sequences in general), amenable to statistical inference. The major categories of phylogeny inference methods are distance matrix, parsimony, maximum likelihood, Bayesian and network approaches.
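
As an illustration of the distance matrix category named above, the following is a minimal Python sketch, not part of the course material. It computes pairwise p-distances from aligned sequences and applies the Jukes-Cantor correction d = -(3/4) ln(1 - 4p/3). The function names and the toy alignment are hypothetical, and skipping sites with gaps or ambiguous bases is an assumption; MEGA offers several ways of treating such sites.

```python
import math


def p_distance(seq_a, seq_b):
    """Proportion of differing sites between two aligned sequences.

    Sites where either sequence has a gap or an ambiguous base are skipped
    (an assumption made for this sketch).
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    valid = set("ACGT")
    compared = 0
    differing = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in valid and b in valid:
            compared += 1
            if a != b:
                differing += 1
    if compared == 0:
        raise ValueError("no comparable sites")
    return differing / compared


def jukes_cantor(p):
    """Jukes-Cantor corrected distance; infinite when p >= 0.75 (saturation)."""
    if p >= 0.75:
        return math.inf
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def distance_matrix(alignment):
    """Return {(name_x, name_y): corrected distance} for all ordered pairs."""
    names = list(alignment)
    return {
        (x, y): jukes_cantor(p_distance(alignment[x], alignment[y]))
        for x in names
        for y in names
    }


# Hypothetical toy alignment, for illustration only.
toy = {
    "taxonA": "ACGTACGTAC",
    "taxonB": "ACGTACGTAT",
    "taxonC": "ACGAAC-TAT",
}
matrix = distance_matrix(toy)
```

A matrix like this is the input that neighbor-joining or UPGMA would cluster into a tree; the corrected distance is always at least as large as the raw p-distance, because it accounts for multiple substitutions at the same site.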
24
25 //Exam (40 points) and homework assignments (20 points): 4 cr; essay (statistically or biologically focused): 2 cr.//
26 //50-60 points: grade 5, 30 points: grade 1.//
27 //The course is focused on practical work, so the exam is held in the computer class.//
28
29 (% style="color: rgb(0,51,102);" %)**Week 1**(%%) The program is for non-MBI (Bioinformatics Master program) students. Lectures and practicals in the computer class overlap with topics that have been covered in Introduction to bioinformatics / Molecules for bioinformatics: basic phylogeny concepts, basic practical computer work, and getting familiar with molecular sequence databases.
30 Tue 18.01, 15-17, computer class C128.
31
32 (% style="color: rgb(0,51,102);" %)**Week 2**(%%)
33 Tue 25.01, 15-17, B120, Description of the course, assignments, how to proceed with data collection.
34 Thu 27.01, 14-16, B120, Alignment with Clustal, editing with Genedoc, MEGA
35
36 (% style="color: rgb(0,51,102);" %)**Week 3**(%%)
37 Tue 01.02, 15-16, C128, Practical session with MEGA, [[attach:for training simple phylo with MEGA.txt]], [[attach:for training simple phylo FASTA-format.txt]]
38 Wed 02.02, 12-20, C128, Extra help for those who need it; data collection and alignments for datasets 1 and 2.
39 Thu 03.02, 14-16, D340, Phylogeny books and programs, Distance matrix methods (this is given as a paper copy), Nucleotide substitution modelling.
40
41 (% style="color: rgb(0,51,102);" %)**Week 4**(%%)
42 Tue 08.02, 15-16, C128, MrBayes demo, checking that everybody can start with the program. Use this NEXUS-file for training: [[attach:training.nex]].
43 Wed 09.02, 12-14, C128, Extra help and checking of results. Dataset 1 should be ready for MrBayes analyses (i.e. you have already done the MEGA analyses) and dataset 2 collected.
44 Thu 10.02, 14-16, B120, Parsimony, Maximum likelihood.
45
46 (% style="color: rgb(0,51,102);" %)**Week 5**(%%)
47 Tue 15.02, 15-16, C128, Datasets 3 and 4, Splitstree and Network.
48 Thu 17.02, 14-16, B120, Lectures on methods continue.
49
50 (% style="color: rgb(0,51,102);" %)**Week 6**(%%)
51 Tue 22.02, 15-17, C128, (% style="color: rgb(0,0,0);" %)Results from datasets 1 and 2 + lectures on methods(%%) (the 2 h session is probably not enough; we continue until everything is done).
52 Thu 24.02, 14-16, B120, (% style="color: rgb(0,0,0);" %)Results from dataset 4 + lectures on networks and phylogeny examples ((%%)phylodynamics, phylogeography, phylogenomics, phyloprofiling).
53
54 (% style="color: rgb(0,51,102);" %)**Week 7**(%%)
55 Thu 03.03, 16-20, C128, EXAM.
56
57 |=(((
58 (% style="color: rgb(0,51,102);" %)Lectures, links, note also the material in "software tools"
59 )))|=(((
60 (% style="color: rgb(0,51,102);" %)Recommended review (and other) papers(%%) \\
61 )))
62 |(((
63 [[attach:Phylogeny books and programs..pdf]],
64 all phylogeny programs from here:
65 [[http:~~/~~/evolution.genetics.washington.edu/phylip/software.html>>url:http://evolution.genetics.washington.edu/phylip/software.html||shape="rect"]]
66 \\
67 )))|(((
68 //Before phylogenies, seq collection and alignments~://
69 [[attach:NCBI BLAST.pdf]],  [[attach:Multiple sequence alignment review.pdf]],
70 [[attach:Collection of tools.pdf]], includes links to many aligning,
71 editing, and phylogeny tree visualization facilities.\\
72 )))
73 |(((
74 Distance matrix methods in phylogeny inference
75 (paper copy in lecture Thu 03.02)
76 [[attach:Maximum parsimony phylogeny inference..pdf]]
77 \\[[attach:Maximum likelihood phylogeny inference..pdf]]
78 \\[[attach:Bayesian phylogeny inference..pdf]]
79 [[attach:Description of Bayesian inference relevant in MrBayes..pdf]]
80 \\Networks\\
81 )))|(((
82 [[attach:Short tutorial article.pdf]]
83 .
84 .
85 [[attach:Traditional and Bayesian phylogeny estimation.pdf]]
86 \\[[attach:Original MrBayes paper.pdf]], [[attach:MCMCMC.pdf]]
87 .
88 .
89 .
90 [[attach:Phylogenetic networks review.pdf]],[[attach:Median-joining networks.pdf]]\\
91 )))
92 |(((
93 [[attach:Nucleotide substitution modelling..pdf]],
94 Practical tool for model choice:
95 [[http:~~/~~/www.hiv.lanl.gov/content/sequence/findmodel/findmodel.html,>>url:http://www.hiv.lanl.gov/content/sequence/findmodel/findmodel.html||shape="rect"]]
96 (see also [[attach:Modeltest.pdf]])
97 \\\\\\
98 )))|(((
99 [[attach:Model selection review.pdf]]
100 [[attach:Bootstrap confidence in detail by Efron.pdf]]
101 [[attach:Bootstrap vs posterior prob 1.pdf]],[[attach:Bootstrap vs posterior prob 2.pdf]],
102 [[attach:Bootstrap vs posterior prob 3.pdf]], [[attach:Posterior prob how meaningful.pdf]],
103 [[attach:Bayesian inference of character evolution.pdf]]..
104 [[attach:The Bayesian revolution in genetics.pdf]]
105 )))
106
107 |=(((
108 (% style="color: rgb(0,51,102);" %)Examples(%%)\\
109 )))
110 |(((
111 Phylogenies from non-sequence material, [[attach:Worms with legs_Nature240211.pdf]], [[attach:Suppl240211.pdf]],[[attach:Language phylogenies.pdf]]
112 [[http:~~/~~/www.timetree.org/>>url:http://www.timetree.org/||shape="rect"]] , [[http:~~/~~/tolweb.org/tree/>>url:http://tolweb.org/tree/||shape="rect"]], [[attach:History of molecular clock.pdf]], [[attach:Origins of eukaryotes.pdf]], [[attach:Origins of mitochondria.pdf]], [[attach:The ring of life.pdf]]
113 Human origins, [[attach:A new view of the birth of Homo sapiens.pdf]], [[attach:Finger from Siberia April2010.pdf]],[[attach:Tooth from Siberia December2010.pdf]]
114 [[attach:Phylogenomics.pdf]], [[attach:Pharmacophylogenomics.pdf]]
115 Phylodynamics, [[attach:Phylodynamics review.pdf]], [[attach:Phylodynamics of viruses.pdf]], [[attach:Phylodynamics of influenza viruses.pdf]],
116 [[attach:Microbiome phylotyping.pdf]], [[http:~~/~~/www.mlst.net/>>url:http://www.mlst.net/||shape="rect"]]\\
117 )))
118
119 |=(((
120 (% style="color: rgb(0,51,102);" %)**Assignments (homework)**
121 )))
122
123 //Work in groups of 2-4 students. The assignments give 20 of the total 60 points (40 come from the exam). Note that the course is focused on practicals and you will need these skills in the exam (reasonable aspects, not sophisticated details).//
124
125 |(((
126 (% style="color: rgb(0,51,102);" %)**Dataset 1**(%%)
127 [[attach:Assignment 1 - Instructions for data collection and alignment.pdf]], results from neighbor-joining, UPGMA and parsimony phylogenies during weeks (3) and 4 and Bayesian results during weeks (4) and 5; whole work ready Tuesday 22.02.
128 [[attach:Geneseq not aligned FASTA.txt]], [[attach:Geneseq aligned FASTA.txt]], [[attach:Geneseq in MEGAformat.txt]] (% style="color: rgb(0,0,0);" %)IN CASE YOU WOULD LIKE TO ANALYSE YOUR FLU-VIRUS DATA BY NETWORK-SOFTWARE, SEND YOUR FASTA-FILE TO SIRU TO GET IT IN RDF-FORMAT.(%%)\\
129 )))
130 |(((
131 (% style="color: rgb(0,51,102);" %)**Dataset 2**(%%)
132 [[attach:Assignment 2 - Instructions for data collection.pdf]], checking of data collection and preliminary results in weeks (3) and 4. Whole work ready Tuesday 22.02.
133 Data to be collected from [[http:~~/~~/www.ncbi.nlm.nih.gov/genomes/FLU/FLU.html>>url:http://www.ncbi.nlm.nih.gov/genomes/FLU/FLU.html||shape="rect"]]\\
134 )))
135 |(((
136 (% style="color: rgb(0,51,102);" %)**Dataset 3**(%%)
137 Human gene and its alleles. Change of plan: the 20 points will be given for assignments 1, 2 and 4. The human examples are illustrated during the lecture (you need not do anything in practice).\\
138 )))
139 |(((
140 (% style="color: rgb(0,51,102);" %)**Dataset 4**(%%)
141 Data from three bacteria. Select one of them for phylogeny analysis. Compare neighbor-joining (by MEGA) and network phylogenies by Splitstree4 (FASTA-file) and Network4.6 (rdf-file). You will also get the rdf-files by email, as they are probably not OK here.
142 (% style="color: rgb(0,0,0);" %)See your email for instructions. Additional instructions here:(%%) [[attach:Practical instructions for using the Network-software.pdf]]
143 [[attach:Borrelia_ClpAFASTA.fasta]],[[attach:Borrelia_ClpA.rdf]]
144 [[attach:pneumococcusGkiFASTA.txt]]
145 [[attach:StaphylococcusarccFASTA.txt]]
146 \\
147 )))
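
The assignment data are distributed as FASTA files, while MrBayes reads NEXUS files. The following minimal Python sketch, which is not part of the course material, shows one way to convert an aligned FASTA file into a NEXUS data block. The function names are hypothetical, and two choices are assumptions of the sketch: taxon names are reduced to the first word of the FASTA header with non-word characters replaced by underscores, and the sequences are declared as DNA.

```python
import re


def parse_fasta(text):
    """Parse FASTA text into {name: sequence}, joining wrapped sequence lines."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            if not header:
                raise ValueError("empty FASTA header")
            # Keep the first word only and make it safe as a NEXUS taxon name.
            name = re.sub(r"\W", "_", header[0])
            if name in records:
                raise ValueError(f"duplicate sequence name: {name}")
            records[name] = []
        elif name is None:
            raise ValueError("sequence data before the first header")
        else:
            records[name].append(line)
    return {n: "".join(parts) for n, parts in records.items()}


def to_nexus(alignment):
    """Format an aligned {name: sequence} mapping as a NEXUS data block."""
    if not alignment:
        raise ValueError("no sequences given")
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("sequences differ in length; align them first")
    nchar = lengths.pop()
    width = max(len(name) for name in alignment)
    lines = [
        "#NEXUS",
        "begin data;",
        f"  dimensions ntax={len(alignment)} nchar={nchar};",
        "  format datatype=dna missing=? gap=-;",
        "  matrix",
    ]
    for name, seq in alignment.items():
        lines.append(f"  {name.ljust(width)}  {seq}")
    lines += ["  ;", "end;"]
    return "\n".join(lines) + "\n"


# Hypothetical two-sequence example, for illustration only.
example_fasta = ">seq1 some description\nACGT\nAC-T\n>seq2\nACGTACGT\n"
example_alignment = parse_fasta(example_fasta)
example_nexus = to_nexus(example_alignment)
```

The length check matters because a NEXUS matrix declares a single `nchar` for all taxa; an unaligned FASTA file is rejected here rather than written out as an invalid matrix.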
148
149 |=(((
150 (% style="color: rgb(0,51,102);" %)**Software tools**
151 )))
152
153 |(((
154 (% style="color: rgb(0,51,102);" %)**//Sequence alignment//**(%%)
155 The necessary step before phylogeny inference is to align the sequence data, inspect the alignment and correct it. Sequence alignment, like working with sequence databases, is covered in other MBI courses, and its theoretical basics are not part of this course.\\
156
157 * The most widely used aligning program, **ClustalX**, [[http:~~/~~/www.clustal.org/>>url:http://www.clustal.org/||shape="rect"]], is installed in computer class C128, as well as **GeneDoc**, [[http:~~/~~/www.nrbsc.org/gfx/genedoc/>>url:http://www.nrbsc.org/gfx/genedoc/||shape="rect"]] for editing (correcting) an alignment suggested by Clustal.
158 * You can also use Clustal through [[http:~~/~~/www.ebi.ac.uk/Tools/msa/clustalw2/>>url:http://www.ebi.ac.uk/Tools/msa/clustalw2/||shape="rect"]]; look at "Similar applications" from this EBI-website and find more facilities, for example this [[http:~~/~~/www.jalview.org/>>url:http://www.jalview.org/||shape="rect"]] and this [[http:~~/~~/pbil.univ-lyon1.fr/software/seaview.html>>url:http://pbil.univ-lyon1.fr/software/seaview.html||shape="rect"]]. Note also this:[[attach:Collection of tools.pdf]]
159 )))
160 |(((
161 (% style="color: rgb(0,51,102);" %)**//Phylogeny software//**(%%)
162 These program packages have been installed in computer class C128. All of them will be used during the course.
163 Note the links from which you can download the programs to your own computer if you prefer working somewhere other than C128.
164 //Note also the manuals!//\\
165
166 * **MEGA4**, [[http:~~/~~/www.megasoftware.net/>>url:http://www.megasoftware.net/||shape="rect"]], Manual:[[attach:MEGA4 Manual.pdf]]
167 ** NOTE: Since 25 Jan 2011 the MEGA link goes to the newly released version MEGA5. As this happened so recently, work in classroom C128 will continue with MEGA4; the administrator will not be asked to install the new version.
168 * **MrBayes3.1**, [[http:~~/~~/mrbayes.csit.fsu.edu/>>url:http://mrbayes.csit.fsu.edu/||shape="rect"]], Manual: [[http:~~/~~/mrbayes.csit.fsu.edu/mb3.1_manual.pdf>>url:http://mrbayes.csit.fsu.edu/mb3.1_manual.pdf||shape="rect"]],[[attach:MrBayes Commands.pdf]]
169 * **Network4.6**, [[http:~~/~~/www.fluxus-engineering.com/sharenet.htm>>url:http://www.fluxus-engineering.com/sharenet.htm||shape="rect"]], Manual: [[http:~~/~~/www.fluxus-engineering.com/Network4600_user_guide.pdf>>url:http://www.fluxus-engineering.com/Network4600_user_guide.pdf||shape="rect"]]
170 * **Splitstree4**, [[http:~~/~~/www.splitstree.org/>>url:http://www.splitstree.org/||shape="rect"]],[[attach:Splitsree4 Manual.pdf]]
171 )))
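
The inspection step described under sequence alignment above can be partly automated before opening the alignment in an editor such as GeneDoc. The following minimal Python sketch, which is not part of the course material, reports the alignment length, columns that are mostly gaps, and variable columns. The function name, the 50 % gap threshold, and the toy alignment are all hypothetical choices made for illustration.

```python
from collections import Counter


def summarize_alignment(alignment, gap_chars="-"):
    """Summarize an aligned {name: sequence} mapping column by column.

    Column positions in the result are 1-based. A column counts as gappy
    when more than half of the sequences have a gap there (an arbitrary
    threshold chosen for this sketch).
    """
    seqs = [seq.upper() for seq in alignment.values()]
    if not seqs:
        raise ValueError("no sequences given")
    length = len(seqs[0])
    if any(len(seq) != length for seq in seqs):
        raise ValueError("sequences differ in length; not a valid alignment")

    gappy_columns = []
    variable_columns = []
    for i in range(length):
        column = [seq[i] for seq in seqs]
        bases = Counter(c for c in column if c not in gap_chars)
        gap_count = len(column) - sum(bases.values())
        if gap_count > len(column) / 2:
            gappy_columns.append(i + 1)
        if len(bases) > 1:
            variable_columns.append(i + 1)
    return {
        "length": length,
        "gappy_columns": gappy_columns,
        "variable_columns": variable_columns,
    }


# Hypothetical toy alignment, for illustration only.
toy_alignment = {"a": "ACG-T", "b": "ACA-T", "c": "AC-GT"}
summary = summarize_alignment(toy_alignment)
```

Columns flagged as gappy are natural candidates for a closer look by eye; the sketch only points at them and does not change the alignment.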
172
173 ----
174
175 === (% style="color: rgb(0,51,102);" %)**Part II**(%%) ===
176
177 (% style="color: rgb(0,51,102);" %)**IV period**:(%%) Practical project work in the computer class and at home; time schedule negotiable (4-6 cr), topics to be negotiated at the end of Part I.
178 The project is based on sequence data collected from databases.
179
180 ----
181
182 === [[Registration>>url:https://oodi-www.it.helsinki.fi/hy/opintjakstied.jsp?html=1&Tunniste=57729||shape="rect"]] ===
183
184 This registration is for Part I. At the end of period III you should decide whether to continue to Part II.
185
186 Did you forget to register? [[What to do>>doc:mathstatOpiskelu.Kysymys4]].\\