Software development

Last modified by varvio@helsinki_fi on 2024/02/14 06:58

Computers and algorithms have always been of core importance in data analysis. Today the computer is no longer a human but a digital machine, and the algorithms have moved from paper to implementations in sophisticated software. Developing and maintaining software for modern data analysis has become a difficult and resource-hungry task. Two approaches to data analysis software are possible. In the first, special-purpose software is written for a particular problem or group of problems. In the second, general-purpose software is written to handle a wide range of problems.

BUGS (Bayesian inference Using Gibbs Sampling) is a flexible and "easy to use" Markov Chain Monte Carlo based package for Bayesian data analysis. The latest version of the package, OpenBUGS, has been released as open source software and has its own web site. One aim of making BUGS open source is to encourage members of the statistical community to extend and improve the software. A major design change from earlier versions of BUGS is a more general way of attaching sampling algorithms to nodes or groups of nodes in the Bayesian model. Contact: Bob O'Hara
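To illustrate the Gibbs sampling idea that BUGS is named after, here is a minimal Python sketch. It is not BUGS or OpenBUGS code, and it does not reflect how those packages are implemented. The target distribution (a standard bivariate normal with correlation rho) and the function names gibbs_bivariate_normal and sample_correlation are assumptions chosen only for illustration. Each sweep draws one variable from its conditional distribution given the current value of the other.

```python
import math
import random


def gibbs_bivariate_normal(rho, n_samples, burn_in=500, seed=0):
    """Illustrative Gibbs sampler for a standard bivariate normal.

    Assumed toy target: zero means, unit variances, correlation rho.
    The full conditionals are x | y ~ N(rho * y, 1 - rho^2) and
    y | x ~ N(rho * x, 1 - rho^2).
    """
    rng = random.Random(seed)
    cond_sd = math.sqrt(1.0 - rho * rho)
    x, y = 0.0, 0.0
    samples = []
    for i in range(burn_in + n_samples):
        # Update each coordinate in turn from its full conditional.
        x = rng.gauss(rho * y, cond_sd)
        y = rng.gauss(rho * x, cond_sd)
        if i >= burn_in:
            samples.append((x, y))
    return samples


def sample_correlation(samples):
    """Pearson correlation of the sampled (x, y) pairs."""
    n = len(samples)
    mean_x = sum(s[0] for s in samples) / n
    mean_y = sum(s[1] for s in samples) / n
    cov = sum((s[0] - mean_x) * (s[1] - mean_y) for s in samples)
    var_x = sum((s[0] - mean_x) ** 2 for s in samples)
    var_y = sum((s[1] - mean_y) ** 2 for s in samples)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    draws = gibbs_bivariate_normal(rho=0.8, n_samples=20000)
    print("estimated correlation:", round(sample_correlation(draws), 3))
```

With enough sweeps the estimated correlation should settle close to the rho passed in. In a package such as BUGS the user specifies the model and the software selects the sampling algorithms, so hand-written conditionals like these are not needed.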

BAPS (versions 2 & 3) is software for Bayesian inference of the genetic structure in a population using molecular marker genes. BAPS treats both the allele frequencies of the molecular markers and the number of genetically diverged groups in the population as random variables. In BAPS 2, inference is based on a novel non-reversible Markov Chain Monte Carlo algorithm, whereas BAPS 3 uses stochastic optimization to infer the posterior mode of the genetic structure and to quantify the statistical uncertainty about the mode estimate. BAPS 3 also enables analyses of admixture and the use of prior information about the allele frequencies in the underlying population. Contact: Jukka Siren

BAMA (Bayesian Analysis of Multilocus Association) is software, written in C, for selecting a trait-associated subset of markers from a large number of candidates. It is equally applicable to wide chromosomal segments and small candidate regions. The software handles quantitative and binary traits as well as multiallelic genotype (haplotype) data, and tolerates some degree of missing values. Supported data designs are a random population sample or a case-control design, where data points have been collected from individuals who are unrelated (or equally related) to each other. The number of associated markers, their positions and their strengths of association are all estimated simultaneously using Markov Chain Monte Carlo estimation. An extended version of the above method, following an alternative modelling practice, has been implemented in WinBUGS. The extended version uses genetic/physical distance information between markers to exclude fluctuating effects of confounders (e.g., mutation, selection, genetic drift, population structure, and variations in allele frequencies) from the association signal. The code of this implementation and the above software are freely available at http://www.rni.helsinki.fi/~mjs/. Contact: Mikko Sillanpää

Multimapper (Bayesian QTL mapping software for inbred and outbred lines) is software, written in C, that performs multiple Quantitative Trait Locus (QTL) mapping analysis for backcross and F2 crossing designs of inbred or outbred plant/animal lines. More specifically, the version of the software for outbred lines is called Multimapper/OUTBRED. The Multimapper software uses a multiple-QTL model for one chromosome and accounts for genetic background effects of other chromosomes by including a preselected set of marker cofactors in the QTL model. The software can handle missing data, including unknown haplotypes in the case of outbred populations. The number of QTLs, their positions and genotype effects are all estimated simultaneously using reversible jump MCMC estimation. The software is freely available at http://www.rni.helsinki.fi/~mjs/. Contact: Mikko Sillanpää