Intensive course on Bayesian inference for the Nordic biostatistics network, spring 2013
Lecturer
Scope
7.5 cu. This course is primarily intended for participants from the Nordic biostatistics network, excluding University of Helsinki. Statistics students at other Finnish universities are also welcome. The course explains the fundamental issues in Bayesian inference: the role of prior probabilities, predictive modeling, hierarchical models, model selection, and asymptotics. These concepts are also put into several application contexts to demonstrate how and why Bayesian inference works, and what its benefits and potential pitfalls are. No prior knowledge of Bayesian inference is expected. The aim is to equip participants with good intuition about how the Bayesian machinery works, rather than to focus on exact mathematical formalism.
Type
Advanced level course.
Prerequisites
A first course in calculus, linear algebra, and probability.
Lectures
First part: 4.3.-8.3., every day 10-12 and 13-16, in room B120 of the Exactum building.
Second part: 13.5.-17.5., every day 10-12 and 13-16, in room C124 of the Exactum building.
Exams
The participants must solve exercises and complete home projects to gain the credits for the course. Sets of exercises: 1, 2, 3, 4, 5; return your solutions by email to the lecturer by May 31st. During the break between parts 1 and 2 of the course, participants should familiarize themselves with the WinBUGS software for Bayesian inference (many tutorials are available from the website, and there is even a YouTube video introducing it). Using WinBUGS, fit these models from the Bayes course of Jukka Ranta, summarize the results in a joint report, and send it to the lecturer before the second part starts. The final part of the examination involves completing any two of these small project alternatives and sending the reports to the lecturer by August 31st.
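During the WinBUGS self-study it may help to remember that the acronym stands for Bayesian inference Using Gibbs Sampling. The following plain-Python sketch, with invented data and hyperparameters, shows the kind of conditional-update loop the software automates, here for a conjugate normal model; it is only an illustration, not one of the assigned models.

# Hand-written Gibbs sampler for a normal model with unknown mean mu and
# precision tau, under the conjugate prior
#   mu | tau ~ Normal(mu0, 1/(kappa0 * tau)),   tau ~ Gamma(a0, b0).
import numpy as np

rng = np.random.default_rng(1)
y = np.array([4.2, 5.1, 3.8, 4.9, 5.5, 4.4])    # invented observations
n, ybar = len(y), y.mean()
mu0, kappa0, a0, b0 = 0.0, 0.01, 0.1, 0.1       # weakly informative prior

mu, tau = ybar, 1.0
draws = []
for it in range(5000):
    # full conditional of mu is normal
    kappa_n = kappa0 + n
    mean_n = (kappa0 * mu0 + n * ybar) / kappa_n
    mu = rng.normal(mean_n, 1.0 / np.sqrt(kappa_n * tau))
    # full conditional of tau is gamma
    a_n = a0 + 0.5 * (n + 1)
    b_n = b0 + 0.5 * (np.sum((y - mu) ** 2) + kappa0 * (mu - mu0) ** 2)
    tau = rng.gamma(a_n, 1.0 / b_n)
    draws.append(mu)

print("posterior mean of mu: %.3f" % np.mean(draws[1000:]))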
Registration
To register for the course, send an email to the lecturer (first dot last at helsinki dot fi).
Preliminary lecture diary - Part 1
Many of these lecture slides will be used during the course. Introduction to the subjective/epistemic vs. physical perspectives on probability; Bayes' theorem (see this simple eye-opener on our perception of probabilities and information); dynamic revision of uncertainty using Bayes' theorem (see the example on perception and sensory integration); the Search & Rescue game and the usefulness of the systematic use of prior information in the context of infant mortality and SIDS (see this article by Gilbert et al. 2005). The potential consequences of seemingly innocent and vague priors are illustrated by this example.
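To make the dynamic revision of uncertainty concrete, here is a minimal Python sketch of sequential Bayesian updating for a diagnostic test; the prevalence, sensitivity, and specificity figures are invented for illustration and are not taken from the lecture material.

# Sequential Bayesian updating: a rare condition (prior 1%) and a test
# with 90% sensitivity and 95% specificity (all numbers invented).
# The posterior from one test result becomes the prior for the next.
sens, spec = 0.90, 0.95

def update(prior, positive):
    # Bayes' theorem: P(condition | result) =
    #   P(result | condition) P(condition) / P(result)
    like_pos = sens if positive else 1.0 - sens        # P(result | condition)
    like_neg = (1.0 - spec) if positive else spec      # P(result | no condition)
    num = like_pos * prior
    return num / (num + like_neg * (1.0 - prior))

p = 0.01
for result in (True, True):                            # two positive tests in a row
    p = update(p, result)
    print("posterior after positive test: %.3f" % p)

A single positive result lifts the probability only to about 0.15, because the 1% base rate dominates; a second independent positive pushes it to about 0.77. This is exactly the base-rate effect that trips up unaided intuition.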
Hierarchical models - a vanilla introduction: exchangeability, de Finetti's representation theorem, prior and posterior predictive distributions, with illustrations via probabilistic classification of documents; in addition to the lecture slides, see also the following two papers about predictive classification: paper 1, paper 2. Example of inference and utility. More about hierarchical models - a solid frozen vanilla cracker example of a hierarchical model. The cracker example uses advanced importance sampling; a nice introduction to importance sampling can be found here (see also the sketch below). About ABC (approximate Bayesian computation) inference, see this introduction. More details on choosing priors through formal rules are found in this review. Gu, L., Notes on Dirichlet distribution with relatives, provides a concise recapitulation of some of the central formulas around the Dirichlet distribution. Example of Bayesian meta-analysis from the biostatistics book of George Woodworth. Finite mixture models and the EM algorithm, Bayesian learning for Markov chains, and an introduction to hidden Markov models (HMMs) with recursions for various posterior probabilities in HMMs, all from the HMM book by prof. Timo Koski at KTH. A biological example of the use of HMMs is here. HMMs are also relevant for a multitude of engineering applications, such as dynamic tracking; an excellent technical review of this field by Arnaud Doucet is here, another excellent review is by Cyrill Stachniss, and an excellent short introduction by Bryan Minor is here.
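Since the cracker example leans on importance sampling, a bare-bones illustration may help fix the idea: below is self-normalized importance sampling for a posterior mean in a beta-binomial model, with data and prior chosen by me only so the answer can be checked against the exact Beta(9, 5) posterior.

# Self-normalized importance sampling for a posterior expectation.
# Target: posterior of theta under a Beta(2, 2) prior with 7 successes
# in 10 Bernoulli trials (invented data); proposal: Uniform(0, 1).
import numpy as np

rng = np.random.default_rng(0)
k, n = 7, 10
theta = rng.uniform(0.0, 1.0, size=100_000)    # draws from the proposal

# weights proportional to the unnormalized posterior density,
# theta^(1+k) * (1-theta)^(1+n-k), since the proposal is flat
w = theta ** (1 + k) * (1.0 - theta) ** (1 + n - k)
w /= w.sum()                                   # self-normalization

print("IS estimate : %.4f" % np.sum(w * theta))
print("exact value : %.4f" % (9.0 / 14.0))     # mean of Beta(9, 5)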
A nice tutorial on Bayesian non-parametric models is available here; see also these slides on mixture models by Christopher Bishop.
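As a companion to the mixture-model slides, here is a compact EM iteration for a two-component univariate Gaussian mixture; the data are simulated inside the script purely to keep the example self-contained.

# EM for a two-component Gaussian mixture in one dimension.
import numpy as np

rng = np.random.default_rng(42)
y = np.concatenate([rng.normal(-2.0, 1.0, 150), rng.normal(3.0, 1.0, 100)])

pi = np.array([0.5, 0.5])          # mixing weights
mu = np.array([-1.0, 1.0])         # component means
var = np.array([1.0, 1.0])         # component variances

def normal_pdf(x, m, v):
    return np.exp(-0.5 * (x - m) ** 2 / v) / np.sqrt(2.0 * np.pi * v)

for it in range(100):
    # E-step: responsibilities r[i, k] = P(component k | y_i)
    dens = np.stack([pi[k] * normal_pdf(y, mu[k], var[k]) for k in range(2)], axis=1)
    r = dens / dens.sum(axis=1, keepdims=True)
    # M-step: re-estimate the parameters from the weighted data
    nk = r.sum(axis=0)
    pi = nk / len(y)
    mu = (r * y[:, None]).sum(axis=0) / nk
    var = (r * (y[:, None] - mu) ** 2).sum(axis=0) / nk

print("weights:", np.round(pi, 3), "means:", np.round(mu, 3))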
Preliminary lecture diary - Part 2
Model comparison and selection, asymptotic behavior of model selection procedures; see this proof of asymptotic consistency for the discrete case and this article by Ziheng Yang. Formal rules for choosing priors in the context of model comparison; the Occam's razor principle, see this primer on Occam's razor and Bayesian model comparison for Markov chains; Occam's razor in curve fitting - a demo; a nice review of information-theoretic criteria for model selection; a paper on cross-validation and predictive inference; predictive evaluation of forecasts was considered in this article; the information-theoretic book by D. MacKay, where Ch. 28 contains a detailed explanation of the Occam's razor principle and Bayesian model comparison; Bayesian learning of the order of a discrete-time Markov chain (see this excerpt from the book: Timo Koski. Hidden Markov models for bioinformatics. Kluwer, 2001; a small numerical illustration follows below). Model selection under improper priors with fractional marginal likelihood (see the course slides and these articles: paper1, paper2, paper3); model choice using the Bayesian entropy criterion, with an application to structural learning of time-series dynamics; effects of proper priors in model selection, a case with clustering of cancer genome data. What happens in Bayesian inference when the null hypothesis should not be favored by Occam's razor? This is a general problem in forensic applications, and a recent solution to it is presented (see the paper: Blomstedt P, Corander J. (2012). Posterior predictive comparisons for the two-sample problem. Communications in Statistics – Theory and Methods, in press). Other forensics-related problems that were discussed are crime linking and GSR evidence (see Romain Gauriot, Lawrence Gunaratnam, Rossana Moroni, Tapani Reinikainen, Jukka Corander. (2012). Statistical Challenges in the Quantification of Gunshot Residue Evidence. Journal of Forensic Sciences, in press). For a general introduction to Bayesian networks and expert systems, see this paper. Introduction to the minimum description length principle and an application of it to clustering genetic data. Non-reversible MCMC and its application to graphical model learning.
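To illustrate the Occam's razor effect in the Markov-chain order-selection setting mentioned above, the following sketch compares the closed-form Dirichlet-multinomial marginal likelihoods of a 0th-order (iid) and a 1st-order model on a simulated binary sequence; the uniform Dirichlet priors and the simulated data are my own choices for the demonstration.

# Bayesian order selection for a binary Markov chain via exact marginal
# likelihoods under Dirichlet(1, 1) priors.
import numpy as np
from scipy.special import gammaln

rng = np.random.default_rng(7)
x = [0]
for _ in range(400):                  # sticky two-state chain, P(stay) = 0.9
    x.append(x[-1] if rng.random() < 0.9 else 1 - x[-1])
x = np.array(x)

def log_marglik(counts, alpha=1.0):
    # log of the Dirichlet-multinomial integral for one count vector
    a = np.full(len(counts), alpha)
    return (gammaln(a.sum()) - gammaln(a.sum() + counts.sum())
            + np.sum(gammaln(a + counts) - gammaln(a)))

# both models are scored on x[1:], conditioning on the first symbol,
# so that the comparison uses exactly the same data
c0 = np.bincount(x[1:], minlength=2)              # order 0: symbol counts
logml0 = log_marglik(c0)

trans = np.zeros((2, 2))                          # order 1: transition counts
for s, t in zip(x[:-1], x[1:]):
    trans[s, t] += 1.0
logml1 = sum(log_marglik(trans[s]) for s in range(2))

print("log marginal likelihood, order 0: %.1f" % logml0)
print("log marginal likelihood, order 1: %.1f" % logml1)

Because each extra conditioning state brings its own Dirichlet integral, the higher-order model pays an automatic complexity penalty; it wins here only because the simulated chain really is sticky.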
Bibliography
Examples of useful books on Bayesian theory and modeling are Bernardo & Smith (1994), O'Hagan (1994), Schervish (1995), and Gelman et al. (2004); see also the lecture slides collection.