Statistical genetics, fall 2012


Sirkka-Liisa Varvio


2-10 cu + 3cr seminar
The course is a set of one-week modules.
The program consists of lectures with associated weekly home-exercises, computer work (data analysis), and discussions of scientific papers.

More information here.pdf


Basic probability and an interest in biological applications of statistics.
The course is suitable for students planning to specialize in biometry or bioinformatics, and for other students interested in biological applications of statistical inference.
Compulsory course in the Bioinformatics Master's program (MBI, degree requirements 2012-2014).

Lectures and computer class sessions

Weeks 45-50, Tuesdays 14-18, Wednesdays 14-18, C128.


Content scheme

Module I, Intro

Wed 7.11, 14-18, D123
• Course synopsis
Statistical concepts in a nutshell.pdf
Basic concepts in genetics and home-exercise 1.pdf
• Review: Linkage disequilibrium.pdf

Module II, Mutations

(Tue 13.11), lecture postponed to Mon 19.11 16-18, C128
Modelling mutations and home-exercise 2.pdf, For home-exe 2.2.pdf
Mutation analysis from genome data.pdf
Universal trend in amino acid gain and loss.pdf

Wed 14.11, 14-18, C128
NOT FOR EXAM (This is 1cr in addition to the 4cr from the course, cf. this: More information here.pdf)
Teacher: Virginia Brilhante from Biomedicum (Research Program for Molecular Neurology).
In silico evolutionary diagnosis of function-altering amino acid mutations, using the software tools
SIFT (Sorting Intolerant From Tolerant) and PolyPhen (prediction of functional effects of human nsSNPs).
Work to be done during the computer session:
cs1PatientExomeVariantDataSample.xlsx, cs1Guidelines.pdf
cs2PatientExomeVariantDataSample.xlsx, cs2Guidelines.pdf
cs2-data: corrected version added
Virginia Brilhante lecture 141112.pdf

Assignment SIFT_PolyPhen:
The work started during the computer session is part A of the assignment.
SIFT_assignment Part B.pdf, cowhaplo1_20.txt, cowSNP_and_reference.txt, cowSNP_and reference_AA.txt
Submission to MOODLE

Module III, Populations

Tue 20.11, 14-18, C128
• Review: Estimating population genetic structures.pdf
• Review: Data analysis methods.pdf
Home-exercise 3.pdf, Appendix to Home-exercise 3.pdf
Home-exercise 3 was done during the session. More background theory follows in the session on Wed 28.11.

Data to be analysed:
Tajima D calculator
Human population HLA-gene data: Zulu.txt, Toroko.txt, Sioux.txt, South Indian.txt, Mexican.txt, Irish.txt, Filipino.txt, Zulu.txt, Czech.txt, Finn 90.txt, Turk.txt
Bacteria data: USA_before_vacc.txt, USA_after_vacc.txt, Skand_before_vacc.txt, Skand_after_vacc.txt

Selecton server, Selecton tutorial.pdf
Growth hormone gene.txt, BDNF-gene.txt, HLA gene.txt

Wed 21.11, 14-18, C128.
NOT FOR EXAM (This is 1cr in addition to the 4cr from the course, cf. this: More information here.pdf)
Guidance to practical work in computer class.
Data analysis with the software Arlequin 3.5, Manual
Example data for training during the session, which is also part A of the assignment:
HLA_DRB1_freqtable.xlsx, HLA_DRB1_seqs.txt
The script: convertToArlequin.R.txt, Completed data_file to Arlequin.txt, Script comments.txt
These are geographical population data from one highly polymorphic human gene (very many alleles, given as sequences), collected from here

Arlequin assignment.pdf, Example 1.pdf, Example 2.pdf

Note added 3.12: I have now corrected the data file, which contained mistakes from the data-collection step. The input file works fine once you rename it from .txt back to .arp.

Module IV, Coalescence, haplotypes

Wed 28.11, 14-18, C128
Lecture topics: more on the previous week's theme, statistical tests of neutrality etc. Introduction to the concept and theory of coalescence (retrospective population genetics).
Selection tests and coalescence theory.pdf, Kingman's Coalescence paper.pdf, Human genome Tajima D

(Wed 28.11, 14-18, C128). See your email for info about this.
NOT FOR EXAM (This is 1cr in addition to the 4cr from the course, cf. this: More information here.pdf)
Guidance to practical work in computer class.
• Haplotype networks using the software Network, Network_Manual.pdf.
The program is installed on the C128 computers. It is also freely available for your own computer (although hosted on a company's web page). Note that the installation expires at the end of each year, so install it anew at the beginning of a new year.
Haplotype assignment.pdf, Article_1.pdf, Article_2.pdf


Module V, GWAS minicourse

Tue 04.12, 14-18, CK112
Teachers: Samuli Ripatti, Matti Pirinen, FIMM

Human Genetics and Biostatistics
Heritability - Linkage studies - Association studies - Genotyping technologies - Quality control
Statistics of association mapping (significance and power vs. priors and evidence)
Examples of real studies - Recent topics in GWAS world (e.g. missing heritability, synthetic associations, ...)
Why do we think that a certain phenotype has a genetic component?
How does gene mapping work with family data and how does it work with population data?
What kind of data do modern genotyping technologies produce, and how are they analysed?
What have we learned during the last five years from genome-wide association studies?
And what have we not learned so far?

• Lecture: GWAS_041212.pdf
• Reviews: Anderson_et al_2010.pdf, Risch_2000.pdf

Wed 05.12, 14-18, C128
NOT FOR EXAM (This is 1cr in addition to the 4cr from the course, cf. this: More information here.pdf)
• Lecture: GWAS_1_051212.pdf, GWAS_2_051212.pdf
R_matrix_80_Europeans.txt, GWAS_examples.R
• Literature here:


Tue 11.12 or Wed 12.12, 14-18, C128.

Indicate your choice of exam date here: exam-DOODLE

Journal club

Journal club.pdf
History of statistics + genetics: History_Muller.pdf
• Reserved: Bianca: History_Fisher.pdf and History_Fisher2.pdf; Teemu: History_Galton.pdf and History_Galton2.pdf; Sara: History_HW.pdf; Sophie: History_Haldane.pdf

The code origins:
• Reserved: Anna K: Origins_2.pdf; Filippo: Origins_1.pdf and Origins_3.pdf
Disease mutations with an evolutionary view: Dis_mut_4.pdf, Dis_mut_9.pdf
• Reserved: Jimmy: Dis_mut_6.pdf; Anju: Dis_mut_3.pdf and Dis_mut_10.pdf; Johanna: Dis_mut_5.pdf; Yujuan: Dis_mut_1.pdf and Dis_mut_2.pdf; Anna K: Dis_mut_7.pdf; Sophie: Dis_mut_8.pdf

Statistical genetic studies of populations of various organisms: Pop_1.pdf, Pop_2.pdf, Pop_3.pdf, Pop_4.pdf, Pop_5.pdf, Pop_6.pdf, Pop_8.pdf, Pop_9.pdf, Pop_10.pdf
• Reserved: Ahmed: Pop_7.pdf



Send an email to sirkka-liisa.varvio at if you want to give a seminar. Presentations during the last course week and written essays in January.

Agreed topics:


Your choice for seminar session: seminar-DOODLE  


Did you forget to register? What to do. Roman_b.pdf

