Introduction to Open Data Science, spring 2017

Last modified by Xwiki VePa on 2025/01/08 07:17

Motivation: Our era of data - larger than ever and complex like chaos - requires several skills from statisticians and other data scientists. We must discover the patterns hidden behind numbers in matrices and arrays. We are not afraid of coding, recoding, programming, or modelling. We want to visualize, analyze, interpret, understand, and communicate. These are the core themes of Open Data Science (Open Data - Open Science - Data Science). And this course is THE course for learning these skills.

The above text is modified from: https://www.crcpress.com/Correspondence-Analysis-in-Practice-Third-Edition/Greenacre/p/book/9781498731775?tab=rev

Teacher

Kimmo Vehkalahti, University Lecturer, Adj.Prof., D.Soc.Sci (Statistics)
Fellow of the Teachers' Academy

Assistant teachers

Emma Kämäräinen, Tuomo Nieminen, Petteri Mäntymaa (students of Statistics/Data Science)

Thursday 19 January 2017

THE COURSE HAS STARTED, AND THIS Wiki PAGE WILL NOT BE NEEDED OR UPDATED ANYMORE.

See our video "Welcome to the course!"

and check the newest information (published just before the course started) from:

https://courses.helsinki.fi/78995/115961424 (those pages replace these Wiki pages)

You may enroll until 25 January 2017.

If you study at the University of Helsinki, register for the course in WebOodi.
Otherwise, just enroll on the MOOC platform (see the info at the above link).

*******************************************************************************************************************************************

General learning objective

After completing this course, you will understand the principles and advantages of using open research tools with open data, as well as the possibilities of reproducible research. You will know how to use R, RStudio, R Markdown, and GitHub for these tasks, and how to learn more about these open software tools. You will also know how to apply certain statistical methods of data science, that is, data-driven statistics.

Practical info

WebOodi: 78995 (5 credits), language: English, campus: City Centre.
Period III, starting 19 Jan 2017, ending 2 Mar 2017.
Weekly workshop: Thu 8-10, Unioninkatu 35, lecture room
We recommend that you bring a laptop computer (Mac/Windows/Linux) to the workshops.
Please be prepared to work hard for several hours each week.

New course for everyone interested in Open Data Science!

Primarily intended for doctoral students in the (Computational) Social Sciences and (Digital) Humanities.
Master's students are also welcome, and the course is suitable even for Bachelor's studies (at least in Statistics).
We learn to use open software tools of Data Science and to analyze openly available data sets.

R, RStudio, R Markdown, and GitHub will be learned and used throughout the course.

We will also make use of the platforms mooc.helsinki.fi and DataCamp.

Contents

The course consists of 7 chapters, one for each week of the teaching period.

Chapter 1 introduces the tools (DataCamp, R, RStudio, GitHub) and the weekly working methods (reports, peer reviews, strict deadlines) of the course.

Chapters 2-6 introduce various topics to be worked on with the tools and methods of Chapter 1, using different data sets.

Chapter 7 introduces a special assignment that wraps up the course after the six-week phase of weekly work.

1 Tools and methods for open and reproducible research

R
RStudio
R Markdown
GitHub

2 Regression and model validation

Simple regression
Multiple regression
Regression diagnostics

3 Logistic regression

Logistic regression
Cross-validation: training set and test set

4 Clustering and classification

K-means clustering (KMC)
Discriminant analysis (DA)

5 Dimensionality reduction techniques

Principal component analysis (PCA)
Correspondence analysis (CA, MCA)

6 Multivariate statistical modelling

Confirmatory factor analysis (CFA)
Structural equation models (SEM)

7 Final assignment

Doctoral/Master's level: using a new data set (perhaps your own)
Bachelor level: using a data set that has been used earlier on the course
