Statistical Methods for Association Mapping
Statistical Methods for Association Mapping, spring 2008
This course will introduce association mapping and the statistical methods and computational techniques it requires. The course will be based on chapters 7 and 8 of the book Association Mapping in Plants, together with case studies of recent large-scale case-control studies of human disease.
Association mapping is a gene mapping method based on detecting and utilising population-level associations (i.e. non-independence, or 'linkage disequilibrium') between genetic loci, e.g. between DNA markers and traits of interest. There is interest in using association mapping to find genetic loci associated with variation in complex traits and diseases.
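To make the idea of non-independence between loci concrete, the following is a minimal Python sketch of two standard linkage disequilibrium measures for a pair of biallelic loci: the coefficient D and the squared correlation r². The function name `ld_stats` and the example frequencies are hypothetical and chosen only for illustration; they are not taken from the course materials.

```python
from typing import Tuple


def ld_stats(p_ab: float, p_a: float, p_b: float) -> Tuple[float, float]:
    """Return (D, r2) for two biallelic loci.

    p_ab : frequency of the haplotype carrying allele A and allele B
    p_a  : frequency of allele A at the first locus
    p_b  : frequency of allele B at the second locus
    """
    if not (0.0 < p_a < 1.0 and 0.0 < p_b < 1.0):
        raise ValueError("allele frequencies must lie strictly between 0 and 1")
    # D is the departure of the haplotype frequency from independence.
    d = p_ab - p_a * p_b
    # r2 rescales D squared by the allele-frequency variances at both loci.
    r2 = d * d / (p_a * (1.0 - p_a) * p_b * (1.0 - p_b))
    return d, r2


# Hypothetical frequencies: independent loci versus complete association.
print(ld_stats(0.25, 0.5, 0.5))
print(ld_stats(0.5, 0.5, 0.5))
```

When the haplotype frequency equals the product of the allele frequencies, D and r² are both zero; values of r² near one indicate that a marker is a good proxy for the other locus.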
The advent of dense maps of SNP markers covering the genome (e.g. 500,000 markers or more), and of technologies for screening large numbers of markers per individual, is leading to the generation of vast amounts of genomic data. However, obtaining useful information from the data is non-trivial, and many published associations are spurious. Statistical methods for analysing the data will be presented. Experimental designs with sufficient power to overcome the low prior odds for genomic associations are equally vital. The course will introduce methods and software for ensuring that designs have sufficient power to obtain reasonable posterior odds for associations.
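The link between prior odds, strength of evidence and posterior odds can be sketched with the standard identity: posterior odds = Bayes factor × prior odds. The Python below is an illustration only; the function names are hypothetical, and the prior odds of 1 in 50,000 are an assumed figure chosen to show the scale of the problem, not a value from the course.

```python
def posterior_odds(prior_odds: float, bayes_factor: float) -> float:
    """Posterior odds for an association, given prior odds and a Bayes factor."""
    return prior_odds * bayes_factor


def odds_to_probability(odds: float) -> float:
    """Convert odds in favour of a hypothesis to a probability."""
    return odds / (1.0 + odds)


def required_bayes_factor(prior_odds: float, target_posterior_odds: float) -> float:
    """Bayes factor a study must deliver to reach the target posterior odds."""
    return target_posterior_odds / prior_odds


# Assumed prior odds of 1:50,000 that a given marker is truly associated.
prior = 1.0 / 50_000

# Evidence needed just to reach even posterior odds (probability 0.5).
print(required_bayes_factor(prior, 1.0))

# A Bayes factor of 100 still leaves the association very improbable.
print(odds_to_probability(posterior_odds(prior, 100.0)))
```

Under these assumed numbers, evidence that would look strong for a single pre-specified hypothesis barely moves the posterior probability, which is why power calculations need to target a Bayes factor large enough to offset the prior odds.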
The basic concepts of Bayesian statistics, and how to use them for testing scientific hypotheses, will be introduced. This will enable computation of posterior probabilities for scientific hypotheses. A range of techniques, including analytical approximation methods, conjugate prior distributions and MCMC sampling, will be introduced. Bayesian computations for case studies will be demonstrated and compared with classical 'frequentist' inference based on p-values, which will be shown to be particularly problematic in a genomics context.
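As a small illustration of a conjugate prior distribution, the sketch below updates a Beta prior on a proportion (for example an allele frequency) with binomial count data, where the posterior is again a Beta distribution. This is a generic textbook example written in Python; the function names and the counts are hypothetical and are not drawn from the course case studies.

```python
from typing import Tuple


def beta_binomial_update(a: float, b: float, successes: int, failures: int) -> Tuple[float, float]:
    """Update a Beta(a, b) prior with binomial data.

    Conjugacy means the posterior is Beta(a + successes, b + failures),
    so no numerical integration or sampling is needed.
    """
    if a <= 0 or b <= 0:
        raise ValueError("Beta parameters must be positive")
    if successes < 0 or failures < 0:
        raise ValueError("counts must be non-negative")
    return a + successes, b + failures


def beta_mean(a: float, b: float) -> float:
    """Mean of a Beta(a, b) distribution."""
    return a / (a + b)


# Uniform Beta(1, 1) prior; hypothetical data: 7 copies of the allele in 10 sampled.
post_a, post_b = beta_binomial_update(1.0, 1.0, successes=7, failures=3)
print(post_a, post_b, beta_mean(post_a, post_b))
```

For models without a conjugate form, the same posterior would have to be approximated analytically or sampled, for instance by MCMC.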
The R system for data analysis and graphics and the BUGS system will be introduced, and the required computations demonstrated. R functions and libraries (ldDesign) will be provided.
The course will start from first principles of Bayesian statistics. Knowledge of the basics of calculus (differentiation, integration), matrix algebra, and probability theory will be an advantage. Basic knowledge of genetics (e.g. Mendelian inheritance, heritability) will also be an advantage. The course will aim to cater for both biologically and statistically oriented students.