Last modified by kvehkala@helsinki_fi on 2024/02/07 06:37

= **Introduction to Open Data Science, spring 2017** =

(% style="color: rgb(255,0,0);" %)**Motivation:**(%%) Our era of data - larger than ever and complex like chaos - requires several skills from statisticians and other data scientists. We must discover the patterns hidden behind numbers in matrices and arrays. We are not afraid of coding, recoding, programming, or modelling. We want to visualize, analyze, interpret, understand, and communicate. These are the core themes of **Open Data Science** **(**(% style="color: rgb(255,0,0);" %)**//Open Data - Open Science - Data Science//**(%%)**).** And this course is THE course for learning these skills.

//The above text is modified from~:// [[https:~~/~~/www.crcpress.com/Correspondence-Analysis-in-Practice-Third-Edition/Greenacre/p/book/9781498731775?tab=rev>>url:https://www.crcpress.com/Correspondence-Analysis-in-Practice-Third-Edition/Greenacre/p/book/9781498731775?tab=rev||shape="rect"]]

=== (% style="color: rgb(255,0,0);" %)Teacher(%%) ===

[[Kimmo Vehkalahti>>url:http://wiki.helsinki.fi/display/SocStats/Vehkalahti%2C+Kimmo||shape="rect"]], University Lecturer, Adj.Prof., D.Soc.Sci (Statistics)
Fellow of the [[Teachers' Academy>>url:http://www.helsinki.fi/teachersacademy||shape="rect"]]

=== (% style="color: rgb(255,0,0);" %)Assistant teachers(%%) ===

**Emma Kämäräinen**, **Tuomo Nieminen**, **Petteri Mäntymaa** (students of Statistics/Data Science)

(% style="color: rgb(255,0,0);" %)**Thursday 19 January 2017**

(% style="color: rgb(255,0,0);" %)**THE COURSE HAS STARTED, AND THIS Wiki PAGE WILL NOT BE NEEDED OR UPDATED ANYMORE.**

=== (% style="color: rgb(255,0,0);" %)**See our video [["Welcome to the course!">>url:https://vimeo.com/195829801||shape="rect"]]**(%%) ===

(% style="color: rgb(255,0,0);" %)**and check the newest information (published just before the course started) at:**

(% style="color: rgb(255,0,0);" %)**[[https:~~/~~/courses.helsinki.fi/78995/115961424>>url:https://courses.helsinki.fi/78995/115961424||shape="rect"]] (that site replaces these Wiki pages)**

(% style="color: rgb(255,0,0);" %)**You may enroll until 25 January 2017.**

(% style="color: rgb(255,0,0);" %)**If you study at the University of Helsinki, **(% style="color: rgb(0,0,255);" %)**[[register for the course>>url:https://weboodi.helsinki.fi/hy/opettaptied.jsp?html=1&OpetTap=115961424||shape="rect"]]**(% style="color: rgb(255,0,0);" %)** in WebOodi. Otherwise, just enroll on the MOOC platform (see the information behind the link above).**

(% style="color: rgb(255,0,0);" %)**~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~***

=== (% style="color: rgb(255,0,0);" %)**General learning objective**(%%) ===

**After completing this course, you will understand the principles and advantages of using open research tools with open data, as well as the possibilities of reproducible research. You will know how to use R, RStudio, R Markdown, and GitHub for these tasks, and how to learn more about these open software tools. You will also know how to apply certain statistical methods of data science, that is, data-driven statistics.**

=== (% style="color: rgb(255,0,0);" %)**Practical info**(%%) ===

* **WebOodi: 78995 (5 credits), language: English, campus: City Centre.**
* **Period III, starting 19 Jan 2017, ending 2 Mar 2017.**
* **Weekly workshop: Thu 8-10, Unioninkatu 35, lecture room**
* **We recommend that you bring a laptop computer (Mac/Windows/Linux) to the workshops.**
* **Please be prepared to work hard for several hours each week.**

=== (% style="color: rgb(255,0,0);" %)New course for everyone interested in Open Data Science!(%%) ===

* Primarily intended for doctoral students of the **(Computational) Social Sciences** and **(Digital) Humanities**.
* Master's students are also welcome, and the course is suitable even for Bachelor's studies (at least in Statistics).
* We learn to use open software tools of **Data Science** and to analyze openly available data sets.
* R, RStudio, R Markdown and GitHub will be learnt and used throughout the course.
* We will also make use of the platforms [[mooc.helsinki.fi>>url:http://mooc.helsinki.fi||shape="rect"]] and [[DataCamp>>url:https://www.datacamp.com/||shape="rect"]].

== (% style="color: rgb(255,0,0);" %)**Contents**(%%) ==

The course consists of **7 chapters**, one for each week of the teaching period.

Chapter 1 introduces the tools (DataCamp, R, RStudio, GitHub) and the **weekly working methods** (reports, peer reviews, strict deadlines) of the course.

Chapters 2-6 introduce various topics, each worked through with the tools and methods of Chapter 1 using different data sets.

Chapter 7 introduces a special assignment that wraps up the course after the six weeks of weekly work.

=== 1 Tools and methods for open and reproducible research ===

* R
* RStudio
* R Markdown
* GitHub

=== 2 Regression and model validation ===

* Simple regression
* Multiple regression
* Regression diagnostics

=== 3 Logistic regression ===

* Logistic regression
* Cross-validation: training set and test set

=== 4 Clustering and classification ===

* K-means clustering (KMC)
* Discriminant analysis (DA)

=== 5 Dimensionality reduction techniques ===

* Principal component analysis (PCA)
* Correspondence analysis (CA, MCA)

=== 6 Multivariate statistical modelling ===

* Confirmatory factor analysis (CFA)
* Structural equation models (SEM)

=== 7 Final assignment ===

* Doctoral/Master's level: using a new data set (perhaps your own)
* Bachelor's level: using a data set that has been used earlier in the course

== (% style="color: rgb(0,0,255);" %)**[[Register for the course>>url:https://weboodi.helsinki.fi/hy/opettaptied.jsp?html=1&OpetTap=115961424||shape="rect"]]**(%%) ==

== [[Statistics – it’s not what you think it is.>>url:http://thisisstatistics.org/||shape="rect"]] ==

[[~[~[image:attach:Rlogo.png~]~]>>url:https://www.r-project.org/||shape="rect"]]
