Genome-wide association studies, Autumn 2015
II period, 14-17 in C128, Tuesdays 3.11, 10.11, 17.11, 24.11.
- Pearson & Manolio (2008). How to interpret a GWAS.
- Slatkin (2008). Linkage disequilibrium - understanding the evolutionary past and mapping the medical future.
- Sham & Purcell (2014). Statistical power and significance testing in large-scale genetic studies.
- Price et al. (2010). New approaches to population stratification in genome-wide association studies.
- Gibson (2012). Rare and common variants: Twenty arguments.
- Zuk et al. (2012). The mystery of missing heritability: Genetic interactions create phantom heritability.
- Lawlor et al. (2008). Mendelian randomization: Using genes as instruments for making causal inferences in epidemiology.
Return each week's assignments as a single PDF file that contains the R scripts, the output of the scripts and the figures. For example, you can use MS Word and save or export the document as a PDF. The assignments are returned through Moodle. (If you do not have a UH student account for the Moodle system, you can email your answers as a PDF to matti.pirinen'at'helsinki.fi.)
The exam answers must be returned by 14.15 on Tue 8.12. Return your answers as a single PDF through Moodle. (If you do not have a UH student account for the Moodle system, you can email your answers as a PDF to matti.pirinen'at'helsinki.fi.) Sufficient material for complete exam answers can be found above in the 'Course material' and 'Assignments' sections.
How to answer the exam questions?
- Guiding principle: The idea of the exam is to verify that you yourself have the knowledge of GWAS as taught in this course. Therefore you should understand and have processed everything that is included in your answers.
- Formulate the answers in your own words rather than copy-pasting from somewhere.
- You can use short definition-like pieces of text from the source materials on this webpage or from elsewhere you have discovered yourself.
- If you copy anything other than definition-like text, you should mark clearly where the material is taken from. The only reason to use such longer citations would be to give examples of the topic from the literature. In general, there is no need for long citations in this exam.
- You are free to make figures that mimic figures you have seen during the course or elsewhere. If a figure is an almost direct copy of the original, add a note "Adapted from --reference--" to the figure legend.
- Read each question carefully and answer the question being asked. It does not help to include irrelevant pieces of information in the answers, no matter how good an answer they would be to some other question.
- IMPORTANT: Every student does the home exam alone: do not share your answers with others, do not include any material that you haven't processed yourself.
Passing the course
The course is passed when a student has at least half of the exercise points and at least half of the exam points. The course will be graded from 1 to 5.
For students who also complete the project work (see below), there will still be a single grade for the course. An excellent project work can increase the grade determined by the exam and assignments.
After successful completion of the lecture course (home assignments and home exam), students have the option to do a project work of 2 cr. Return your project report as a single PDF through Moodle. (If you do not have a UH student account for the Moodle system, you can email your report as a PDF to matti.pirinen'at'helsinki.fi.) The deadline is Sun 31.1.2016 at 23.59.
Structure of the report:
- Start with a compact Abstract that tells what is included in the report and why it is important. In practice, Abstract may be the last thing you write/finish for the report, after you know exactly what is included.
- Have a short Introduction that puts the topic in its context with respect to GWA studies. With more statistical topics, you may also refer to a more general formulation of the problem in statistics or to some other fields of science that tackle similar problems.
- Use your own judgment on how best to present the main content of the project from your own angle. If you use R or other software to demonstrate your topic, you should include the code at the end of the report as an Appendix. Short and compact pieces of code (a few lines in easily readable form) can also be included in the main report. The choice is yours; think about what is clearest.
- End the report with a compact Conclusion section that describes your own conclusion on the topic.
- Add References at the end, for example, by numbering them and referring with ('number') in the text. Or you can refer by ('Surname, year') in the text, in which case use alphabetical order in the reference section.
- You may include Appendices.
As a guideline for the amount of work expected for the credits, read carefully and with a good understanding at least two scientific publications, and report what you have learned in your own words. Note that in some projects you may spend most of your time on simulations and data analysis rather than on reading papers, and that is completely OK.
Possible topics include the following. Your own topic is not only possible but encouraged!
- GWAS essay. Write a report on what a GWAS is by going carefully through one recent large disease GWAS and one quantitative trait GWAS and reporting how each step of the GWAS process was completed in those two studies. For each GWAS step, explain both why that step is important in GWAS in general and how it was conducted in your two example studies.
Quantitative traits: Height, BMI, Lipids.
Diseases: Schizophrenia, Coronary artery disease, Multiple sclerosis, Inflammatory bowel disease, Rheumatoid arthritis, Type 2 Diabetes.
- Statistical imputation in GWAS. Explain how hidden Markov models are used for genotype imputation and how the multivariate normal distribution is used for z-score imputation in GWAS data.
Marchini & Howie (2010) Genotype imputation for genome-wide association studies.
Pasaniuc et al. (2014) Fast and accurate imputation of summary statistics enhances evidence of functional enrichment.
- Phenotype prediction. Explain the current methods for predicting an individual's phenotype from his/her genome data and how well they work.
Vilhjalmsson et al. (2015) Modeling Linkage Disequilibrium Increases Accuracy of Polygenic Risk Scores.
Speed & Balding (2014) MultiBLUP: improved SNP-based prediction for complex traits.
- Bayes factors in GWAS. Explain the difference between the Bayesian approach and traditional P-value-based association testing. Demonstrate the differences using R and simulated data. See also the 'Bayes factor' part of practicals1.R.
Stephens and Balding (2009) Bayesian statistical methods for genetic association studies.
Wakefield (2009) Bayes Factors for Genome-Wide Association Studies: Comparison with P-values
- Covariates in logistic regression. Explain the surprising effects of covariate adjustment in case-control studies and how covariates should be used in those studies. Demonstrate with R.
Mefford & Witte (2012) The Covariate's Dilemma.
Pirinen et al. (2012) Including known covariates can reduce power to detect genetic effects in case-control studies.
Zaitlen et al. (2012) Informed Conditioning on Clinical Covariates Increases Power in Case-Control Association Studies.
- Significance levels in GWAS. Explain the simulation experiments that have been carried out to determine a relevant significance threshold in GWAS. Explain also, more generally, the connection between significance thresholds, statistical power and the probability that a statistically significant association is not a false positive.
Sham & Purcell. (2014) Statistical power and significance testing in large-scale genetic studies.
Pe'er et al. (2008) Estimation of the multiple testing burden for genomewide association studies of nearly all common variants.
Dudbridge & Gusnanto (2008) Estimation of significance thresholds for genomewide association scans.
Hoggart et al. (2008) Genome-Wide Significance for Dense SNP and Resequencing Data.
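The connection named in the 'Significance levels in GWAS' topic can be illustrated numerically. The following is a minimal sketch in Python (the course itself uses R), assuming a two-sided z-test and a simple two-class model in which each tested variant is either null or has one fixed expected z-score. The function names, the prior probability of 1e-4 and the expected z-score of 5 are hypothetical values chosen for illustration only, not values from the course material.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal distribution N(0, 1)


def power_two_sided(expected_z, alpha):
    """Power of a two-sided z-test at significance level alpha.

    expected_z is the mean of the z-score under the alternative
    (the non-centrality parameter); the z-score has unit variance.
    """
    z_crit = STD_NORMAL.inv_cdf(1 - alpha / 2)
    # Probability that the z-score falls above z_crit or below -z_crit.
    return STD_NORMAL.cdf(expected_z - z_crit) + STD_NORMAL.cdf(-expected_z - z_crit)


def prob_true_given_significant(prior, power, alpha):
    """Bayes' rule: P(non-null | significant) in the two-class model.

    prior is the assumed proportion of tested variants that are non-null.
    """
    true_positives = prior * power
    false_positives = (1 - prior) * alpha
    return true_positives / (true_positives + false_positives)


# Illustrative (assumed) values: 1 in 10,000 variants non-null, expected z = 5.
PRIOR = 1e-4
EXPECTED_Z = 5.0

for alpha in (0.05, 5e-8):
    pw = power_two_sided(EXPECTED_Z, alpha)
    pp = prob_true_given_significant(PRIOR, pw, alpha)
    print(f"alpha={alpha:g}  power={pw:.3f}  P(true | significant)={pp:.4f}")
```

Under these assumptions, lowering the threshold costs power but greatly increases the probability that a significant association is a true one, because the false positive term (1 - prior) * alpha shrinks much faster than the power does.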
- Article series on GWAS in Nature Reviews Genetics.
- Anderson et al. (2010). Data quality control in genetic case-control association studies.
- Vukcevic. (2009). Bayesian and Frequentist Methods and Analyses of Genome-Wide Association Studies
- Pirinen et al. (2012) Including known covariates can reduce power to detect genetic effects in case-control studies
- Yang et al. (2010) Common SNPs explain a large proportion of the heritability for human height.
- Lääperi (2015) MSc thesis. Linear mixed models for estimating heritability and testing genetic association in family data
- Lee et al. (2014) Rare-Variant Association Analysis: Study Designs and Statistical Tests.
- Voight et al. (2012) Plasma HDL cholesterol and risk of myocardial infarction: a mendelian randomisation study
- Do et al. (2013) Common variants associated with plasma triglycerides and risk for coronary artery disease
- Video: "Genome-Wide Association Studies - Karen Mohlke (2012)"
- Video: "Understanding and Interpreting GWAS Data" by Paul de Bakker.
Did you forget to register? What to do?
Course feedback can be given at any point during the course.