Corpus linguistics usually refers to the study of linguistic phenomena through statistical analysis of large collections of machine-readable texts, i.e. corpora.
Within discourse analysis, the corresponding approach is known as corpus-assisted discourse studies (CADS) (Baker 2006).
Methods
- Frequency lists
- Keywords (e.g. Gabrielatos 2018)
- Collocation (e.g. Gablasova, Brezina, and McEnery 2017) - words that tend to occur close to a node word that the researcher is interested in
- Word sketch (in Sketch Engine) - a variation of collocation analysis in which collocates are grouped by their grammatical relation to the node word
- Diachronic trends (e.g. Kilgarriff, Busta, and Rychlý 2015)
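
To make the methods above more concrete, here is a minimal Python sketch of a frequency list, a window-based collocate count, a keyness score, and a crude per-period trend. It is an illustration under stated assumptions, not how Sketch Engine or any other tool listed here implements them: the tokenizer is deliberately naive, the keyness score is the two-cell log-likelihood statistic (one commonly used metric among several), the trend is a plain relative frequency per 1,000 tokens, and all function names and example sentences are hypothetical.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters (a deliberately naive tokenizer)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def frequency_list(tokens):
    """Return (word, count) pairs, most frequent first."""
    return Counter(tokens).most_common()


def collocates(tokens, node, span=2):
    """Count words occurring within `span` tokens to the left or right of `node`."""
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            window = tokens[max(0, i - span):i] + tokens[i + 1:i + 1 + span]
            counts.update(window)
    return counts


def log_likelihood(freq_study, size_study, freq_ref, size_ref):
    """Two-cell log-likelihood keyness score for one word (study vs. reference corpus)."""
    combined = freq_study + freq_ref
    total = size_study + size_ref
    expected_study = size_study * combined / total
    expected_ref = size_ref * combined / total
    score = 0.0
    for observed, expected in ((freq_study, expected_study), (freq_ref, expected_ref)):
        if observed > 0:  # a zero count contributes nothing to the sum
            score += observed * math.log(observed / expected)
    return 2 * score


def relative_frequency_by_period(texts_by_period, word):
    """Frequency of `word` per 1,000 tokens in each period (a crude diachronic trend)."""
    trend = {}
    for period, text in sorted(texts_by_period.items()):
        tokens = tokenize(text)
        trend[period] = 1000 * tokens.count(word) / len(tokens) if tokens else 0.0
    return trend


# Hypothetical toy corpora, far too small for real analysis.
study = tokenize("The cat sat on the mat. The cat saw the dog.")
reference = tokenize("A dog ran in the park. A bird sat in a tree.")

print(frequency_list(study)[:3])
print(collocates(study, "cat", span=1).most_common())
print(log_likelihood(study.count("the"), len(study),
                     reference.count("the"), len(reference)))
```

On real corpora the tokenizer would normally be replaced by proper tokenization and lemmatization, and raw collocate counts by an association measure; Brezina (2018) covers the statistical choices in detail.
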
Software
- Sketch Engine
  - A web-based corpus management toolkit
  - Provides access to a large selection of corpora in various languages
  - Beginner-friendly
  - User guide: https://www.sketchengine.eu/guide/
  - Select institutional login to see if your institution provides access!
- AntConc
  - A basic, free toolkit for corpus analysis
- Voyant
  - Web-based reading and analysis environment for digital texts
- Kielipankki KORP
  - Korp is a web-based tool for searching for keywords in text corpora (typically grammatically parsed) and generating concordances.
  - Korp gives its users access to extensive collections of texts in Finnish and Finland Swedish.
- #LancsBox
  - #LancsBox is a free, new-generation software package for corpus analysis developed at Lancaster University
  - Innovative visualization methods
  - User guide: http://corpora.lancs.ac.uk/lancsbox/help.php
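
Several of the tools above can generate concordances (the Korp description mentions this explicitly). As a rough illustration of what a concordance is, the following Python sketch prints keyword-in-context (KWIC) lines for a node word. It is a hypothetical stand-alone example and does not use or reproduce the API of Korp or any other tool listed here; the function name `kwic` and the `width` parameter are assumptions made for illustration.

```python
import re


def kwic(text, node, width=30):
    """Return keyword-in-context lines for every whole-word occurrence of `node`."""
    lines = []
    pattern = re.compile(r"\b" + re.escape(node) + r"\b", re.IGNORECASE)
    for match in pattern.finditer(text):
        # Take up to `width` characters of context on each side of the hit.
        left = text[max(0, match.start() - width):match.start()]
        right = text[match.end():match.end() + width]
        # Right-align the left context so the node words line up in a column.
        lines.append(f"{left:>{width}} [{match.group()}] {right:<{width}}")
    return lines


# Hypothetical sample text.
sample = "The cat sat on the mat. Later the cat saw a dog, and the dog saw the cat."
for line in kwic(sample, "cat", width=20):
    print(line)
```

Reading the aligned lines side by side is what lets a researcher move from frequency counts back to the actual contexts of use.
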
References and readings
- Baker, Paul. 2006. Using Corpora in Discourse Analysis. London: Continuum.
- Brezina, Vaclav. 2018. Statistics in Corpus Linguistics: A Practical Guide. Cambridge: Cambridge University Press.
- Gablasova, Dana, Vaclav Brezina, and Tony McEnery. 2017. “Collocations in Corpus-Based Language Learning Research: Identifying, Comparing, and Interpreting the Evidence.” Language Learning 67 (S1): 155–79. https://doi.org/10.1111/lang.12225.
- Gabrielatos, Costas. 2018. “Keyness Analysis: Nature, Metrics and Techniques.” In Corpus Approaches to Discourse: A Critical Review, edited by Charlotte Taylor and Anna Marchi, 225–58. Milton, UK: Routledge.
- Kilgarriff, Adam, Vít Baisa, Jan Bušta, Miloš Jakubíček, Vojtěch Kovář, Jan Michelfeit, Pavel Rychlý, and Vít Suchomel. 2014. “The Sketch Engine: Ten Years On.” Lexicography 1: 7–36. https://doi.org/10.1007/s40607-014-0009-9.
- Kilgarriff, Adam, Jan Busta, and Pavel Rychlý. 2015. “DIACRAN: A Framework for Diachronic Analysis.” Corpus Linguistics (CL2015). Lancaster, UK.
- Kyröläinen, Aki, and Veronika Laippala. 2020. “Määrällinen Korpuslingvistiikka.” In Kielentutkimuksen Menetelmiä, edited by M Luodonpää-Manni, M Hamunen, R Konstenius, M Miestamo, U Nikanne, and K Sinnemäki. Helsinki: Suomalaisen Kirjallisuuden Seura.
- Lillqvist, Ella. 2019. “Korpusavusteinen Diskurssianalyysi Kuluttajatutkimuksen Menetelmänä: Pikavippikeskustelun Synty, Nousu Ja Arkipäiväistyminen Suomi24-Keskustelufoorumilla.” Kulutustutkimus.Nyt 13 (1): 5–30. https://journal.fi/kulutustutkimus/article/view/84608.
- Mautner, Gerlinde. 2016. “Checks and Balances: How Corpus Linguistics Can Contribute to CDA.” In Methods of Critical Discourse Studies, edited by R Wodak and M Meyer, 3rd ed., 122–43. London: SAGE Publications.