Wiki source code of Introduction to Open Data Science, spring 2017
Last modified by kvehkala@helsinki_fi on 2024/02/07 06:37
author | version | line-number | content |
---|---|---|---|
1 | = (% class="confluence-link confluence-link" %)**Introduction to Open Data Science, spring 2017**(%%) = | ||
2 | |||
6 | (% style="color: rgb(255,0,0);" %)**Motivation:**(%%) Our era of data - larger than ever and complex like chaos - requires several skills from statisticians and other data scientists. We must discover the patterns hidden behind numbers in matrices and arrays. We are not afraid of coding, recoding, programming, or modelling. We want to visualize, analyze, interpret, understand, and communicate. These are the core themes of **Open Data Science** **(**(% style="color: rgb(255,0,0);" %)**//Open Data - Open Science - Data Science//**(%%)**).** And this course is THE course for learning these skills. | ||
7 | |||
8 | (% style="color: rgb(0,0,0);text-decoration: none;" %)//The above text is modified from~://(%%) [[(% style="color: rgb(17,85,204);text-decoration: underline;" %)https:~~/~~/www.crcpress.com/Correspondence-Analysis-in-Practice-Third-Edition/Greenacre/p/book/9781498731775?tab=rev>>url:https://www.crcpress.com/Correspondence-Analysis-in-Practice-Third-Edition/Greenacre/p/book/9781498731775?tab=rev||style="text-decoration: none;" shape="rect"]] | ||
9 | |||
10 | === (% style="color: rgb(255,0,0);" %)Teacher(%%) === | ||
11 | |||
12 | [[Kimmo Vehkalahti>>url:http://wiki.helsinki.fi/display/SocStats/Vehkalahti%2C+Kimmo||shape="rect"]], University Lecturer, Adj.Prof., D.Soc.Sci (Statistics) | ||
13 | Fellow of the [[Teachers' Academy>>url:http://www.helsinki.fi/teachersacademy||shape="rect"]] | ||
14 | |||
15 | === (% style="color: rgb(255,0,0);" %)Assistant teachers(%%) === | ||
16 | |||
17 | **Emma Kämäräinen**, **Tuomo Nieminen**, **Petteri Mäntymaa** (students of Statistics/Data Science) | ||
18 | |||
19 | |||
20 | |||
21 | (% style="color: rgb(255,0,0);" %)**Thursday 19 January 2017** | ||
23 | |||
27 | (% style="color: rgb(255,0,0);" %)**THE COURSE HAS STARTED, AND THIS Wiki PAGE WILL NOT BE NEEDED OR UPDATED ANYMORE.** | ||
31 | |||
32 | === (% style="color: rgb(255,0,0);" %)**See our video [["Welcome to the course!">>url:https://vimeo.com/195829801||shape="rect"]]**(%%) === | ||
33 | |||
34 | (% style="color: rgb(255,0,0);" %)**and check the latest information (published just before the course started) at:** | ||
36 | |||
39 | (% style="color: rgb(255,0,0);" %)**[[https:~~/~~/courses.helsinki.fi/78995/115961424>>url:https://courses.helsinki.fi/78995/115961424||shape="rect"]] (those pages replace these Wiki pages)** | ||
43 | |||
44 | (% style="color: rgb(255,0,0);" %)**You may enroll until 25 January 2017.** | ||
45 | |||
46 | (% style="color: rgb(255,0,0);" %)**If you study at the University of Helsinki, **(% style="color: rgb(0,0,255);" %)**[[(% style="color: rgb(0,0,255);" %)register for the course>>url:https://weboodi.helsinki.fi/hy/opettaptied.jsp?html=1&OpetTap=115961424||shape="rect"]](%%) **(% style="color: rgb(255,0,0);" %)**in WebOodi.** | ||
47 | (% style="color: rgb(255,0,0);" %)**Otherwise, just enroll on the MOOC platform (see the information at the link above).** | ||
53 | |||
54 | (% style="color: rgb(255,0,0);" %)**~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~*~***(%%) | ||
63 | |||
64 | === (% style="color: rgb(255,0,0);" %)**General learning objective**(%%) === | ||
65 | |||
66 | (% style="color: rgb(0,0,0);" %)**After completing this course, you will understand the principles and advantages of using open research tools with open data, as well as the possibilities of reproducible research. You will know how to use R, RStudio, R Markdown, and GitHub for these tasks, and how to learn more about these open software tools. You will also know how to apply certain statistical methods of data science, that is, data-driven statistics.** | ||
67 | |||
68 | === (% style="color: rgb(255,0,0);" %)**Practical info**(%%) === | ||
70 | |||
71 | * **WebOodi: 78995 (5 credits), language: English, campus: City Centre.** | ||
72 | * **Period III, starting 19 Jan 2017, ending 2 Mar 2017.** | ||
73 | * **Weekly workshop: Thu 8-10, Unioninkatu 35, lecture room** | ||
74 | * **We recommend bringing a laptop computer (Mac/Windows/Linux) to the workshops.** | ||
76 | * **Please be prepared to work hard for several hours each week.** | ||
77 | |||
78 | === (% style="color: rgb(255,0,0);" %)New course for everyone interested in Open Data Science!(%%) === | ||
80 | |||
81 | * (% class="confluence-link confluence-link" %)Primarily intended for doctoral students in the **(Computational) Social Sciences** and **(Digital) Humanities**. | ||
82 | * (% class="confluence-link confluence-link" %)Master's students are also welcome, and the course is suitable even for Bachelor's studies (at least in Statistics). | ||
83 | * (% class="confluence-link confluence-link" %)We learn to use open software tools of **Data Science** and to analyze openly available data sets. | ||
84 | |||
85 | * (% class="confluence-link confluence-link" %)R, RStudio, R Markdown, and GitHub will be learned and used throughout the course. | ||
86 | |||
87 | * (% class="confluence-link confluence-link" %)We will also make use of the platforms [[mooc.helsinki.fi>>url:http://mooc.helsinki.fi||shape="rect"]] and [[DataCamp>>url:https://www.datacamp.com/||shape="rect"]]. | ||
88 | |||
89 | == (% style="color: rgb(255,0,0);" %)**Contents**(%%) == | ||
90 | |||
91 | The course consists of **7 chapters**, one for each week of the teaching period. | ||
92 | |||
93 | Chapter 1 introduces the tools (DataCamp, R, RStudio, GitHub) and the **weekly working methods** (reports, peer reviews, strict deadlines) of the course. | ||
94 | |||
95 | Chapters 2-6 introduce various topics, each worked on with the tools and methods of Chapter 1 using different data sets. | ||
96 | |||
97 | Chapter 7 introduces a final assignment that wraps up the course after the six weeks of weekly work. | ||
98 | |||
99 | === (% class="confluence-link confluence-link" style="color: rgb(0,0,0);text-decoration: none;" %)1 Tools and methods for open and reproducible research(%%) === | ||
100 | |||
101 | * R | ||
102 | * RStudio | ||
103 | * R Markdown | ||
104 | * GitHub | ||
105 | |||
106 | === (% style="color: rgb(0,0,0);text-decoration: none;" %)2 Regression and model validation(%%) === | ||
108 | |||
109 | * ((( | ||
110 | (% style="color: rgb(0,0,0);text-decoration: none;" %)Simple regression | ||
111 | ))) | ||
112 | * ((( | ||
113 | (% style="color: rgb(0,0,0);text-decoration: none;" %)Multiple regression | ||
114 | ))) | ||
115 | * ((( | ||
116 | (% style="color: rgb(0, 0, 0); text-decoration: none; color: rgb(0, 0, 0); text-decoration: none" %)Regression diagnostics | ||
117 | ))) | ||
118 | |||
119 | === (% style="color: rgb(0,0,0);text-decoration: none;" %)3 Logistic regression(%%) === | ||
120 | |||
121 | * (% style="color: rgb(0,0,0);text-decoration: none;" %)Logistic regression | ||
122 | * (% style="color: rgb(0,0,0);text-decoration: none;" %)Cross-validation: training set and test set | ||
123 | |||
124 | === (% style="color: rgb(0,0,0);text-decoration: none;" %)4 Clustering and classification(%%) === | ||
126 | |||
127 | * ((( | ||
128 | (% style="color: rgb(0,0,0);text-decoration: none;" %)K-means clustering (KMC) | ||
129 | |||
130 | ))) | ||
131 | * ((( | ||
132 | (% style="color: rgb(0,0,0);text-decoration: none;" %)Discriminant analysis (DA) | ||
133 | |||
134 | ))) | ||
135 | |||
136 | === (% style="color: rgb(0,0,0);text-decoration: none;" %)5 (% style="color: rgb(0, 0, 0); text-decoration: none; color: rgb(0, 0, 0); text-decoration: none" %)Dimensionality reduction techniques(%%) === | ||
137 | |||
138 | * ((( | ||
139 | (% style="color: rgb(0,0,0);text-decoration: none;" %)Principal component analysis (PCA) | ||
140 | ))) | ||
141 | * ((( | ||
142 | (% style="color: rgb(0,0,0);text-decoration: none;" %)Correspondence analysis (CA, MCA) | ||
143 | ))) | ||
144 | |||
145 | === (% style="color: rgb(0,0,0);text-decoration: none;" %)6 Multivariate statistical modelling(%%) === | ||
146 | |||
147 | * ((( | ||
148 | (% style="color: rgb(0, 0, 0); text-decoration: none; color: rgb(0, 0, 0); text-decoration: none" %)Confirmatory factor analysis (CFA) | ||
149 | ))) | ||
150 | * (% style="color: rgb(0, 0, 0); text-decoration: none; color: rgb(0, 0, 0); text-decoration: none" %)Structural equation models (SEM) | ||
151 | |||
152 | |||
153 | === 7 Final assignment === | ||
154 | |||
155 | * Doctoral/Master's level: using a new data set (perhaps your own) | ||
156 | * Bachelor's level: using a data set that has been used earlier in the course | ||
157 | |||
158 | == (% style="color: rgb(0,0,255);" %)**[[(% style="color: rgb(0, 0, 255); color: rgb(0, 0, 255)" %)Register for the course>>url:https://weboodi.helsinki.fi/hy/opettaptied.jsp?html=1&OpetTap=115961424||shape="rect"]](%%)**(%%) == | ||
159 | |||
160 | |||
161 | |||
162 | == [[Statistics – it’s not what you think it is.>>url:http://thisisstatistics.org/||shape="rect"]] == | ||
163 | |||
164 | [[~[~[image:attach:Rlogo.png~]~]>>url:https://www.r-project.org/||shape="rect"]] | ||
165 | |||