This page contains resources for the workshop on digital humanities methods at HDHD, convened by Tanja Säily and Reijo Sund.
Task: study change over time in a word of interest using one of these resources:
- Google Ngram Viewer (English & some other languages)
- EEBO N-gram Browser (Early English Books Online)
- Korp (Finnish or Finland Swedish)
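For those working from their own texts rather than one of the browsers above, the same kind of trend can be approximated by hand. The following is a minimal sketch, not the method used by any of the listed tools: it counts a word per decade and normalises by the number of tokens in that decade. The `DOCS` toy corpus, the simple regex tokenizer and the function names are all illustrative assumptions.

```python
from collections import Counter
import re

# Hypothetical toy corpus: (year, text) pairs standing in for real data.
DOCS = [
    (1700, "the coach was fine and the road was long"),
    (1705, "a fine coach and a fine horse"),
    (1712, "the road was long"),
    (1718, "fine weather and fine company and a fine coach"),
]


def tokenize(text):
    """Lowercase the text and split it into simple alphabetic tokens."""
    return re.findall(r"[a-z']+", text.lower())


def per_million_by_decade(docs, word):
    """Return {decade: frequency of `word` per million tokens}."""
    hits = Counter()    # occurrences of the word in each decade
    totals = Counter()  # all tokens in each decade
    for year, text in docs:
        decade = year // 10 * 10
        tokens = tokenize(text)
        totals[decade] += len(tokens)
        hits[decade] += tokens.count(word.lower())
    # Normalising by corpus size keeps decades of different sizes comparable.
    return {d: hits[d] / totals[d] * 1_000_000 for d in sorted(totals)}


freqs = per_million_by_decade(DOCS, "fine")
for decade, freq in freqs.items():
    print(decade, round(freq))
```

Normalised frequencies matter because the amount of surviving text usually differs a great deal from one period to the next, so raw counts alone can suggest a change that is only a change in corpus size.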
Points to think about
- How to get started: what data preparation does the analysis need, and what sort of ready-to-go data is out there?
- Do we start with a research question or do we just explore the data and hope that something emerges?
- Different materials need to be handled differently: a single novel vs. a corpus of correspondence/forum data/whatever
- How to ensure replicability (if not reproducibility)?
- Where to get help? Need to network with computer scientists and the like
Examples
- Self-organising maps [Klami & Honkela, 2007]
- Google Ngrams: 18th-century swearing?
- Keyword analysis: Civil-War effect? [Lijffijt et al., 2012]
- Topic modeling [Honkela et al., 2012]
- Visualisation of text corpora [Siirtola et al., 2011]
- Sentiment analysis [Honkela et al., 2014]
Tools and methods
- AntConc (concordances, keywords, collocations)
- Voyant Tools (a web-based reading and analysis environment for digital texts; documentation available)
- MALLET (topic modeling)
- Google Charts, Google Maps, Google Ngram Viewer (visualisation)
- Text Variation Explorer (text visualisation)
- Wordle, Tagxedo (word clouds)
- SOMbrero Web User Interface (self-organising maps)
- GIS tools and data (geographic information systems)
- DIRT Digital Research Tools, Mapping
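To give a feel for what a concordancer does, here is a minimal keyword-in-context (KWIC) sketch. It is an illustration only and does not reproduce how AntConc or any other tool listed above works; the `kwic` function, its `width` parameter and the `SAMPLE` text are assumptions made for this example.

```python
import re


def kwic(text, keyword, width=3):
    """Return (left context, keyword, right context) for each hit.

    `width` is the number of tokens shown on each side of the keyword.
    """
    tokens = re.findall(r"\w+", text.lower())
    target = keyword.lower()
    rows = []
    for i, token in enumerate(tokens):
        if token == target:
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            rows.append((left, token, right))
    return rows


# Hypothetical sample text, used only to demonstrate the function.
SAMPLE = "The old coach stood by the road. A new coach arrived at noon."

rows = kwic(SAMPLE, "coach", width=2)
for left, hit, right in rows:
    # Right-align the left context so the keyword lines up in a column.
    print(f"{left:>12} | {hit} | {right}")
```

Lining the keyword up in a single column is what makes recurring patterns to its left and right easy to spot by eye.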
R (statistics, text mining)
- R and Data Mining: Text Mining
- Natural Language Processing in R
- Text Mining Infrastructure in R [Feinerer et al., 2008]
- Media Corpora, Text Mining, and the Sociological Imagination [Bastin & Bouchet-Valat, 2014]
- Introduction to the tm Package – Text Mining in R
- Graphical Integrated Text Mining Solution in R
- Text Mining Handbook
Data sources
- Kielipankki/FIN-CLARIN (Finnish, Finland Swedish)
- Turku Dependency Treebank (Finnish)
- Early English Books Online (EEBO-TCP)
- Eighteenth Century Collections Online (ECCO-TCP)
- Linguistic Data Consortium
- Oxford Text Archive
- Corpora at Lancaster (including EEBO)
- Corpora by Mark Davies