1.4 Corpus Linguistics

Last modified by smlaakso@helsinki_fi on 2024/01/16 08:08

Corpus linguistics usually refers to the study of linguistic phenomena through statistical analysis of large collections of machine-readable texts, i.e. corpora.

Within discourse analysis, the term corpus-assisted discourse studies (CADS) is used (Baker 2006).


Methods

  • Frequency lists
  • Keywords (e.g. Gabrielatos 2018)
  • Collocation (e.g. Gablasova, Brezina, and McEnery 2017) - words that tend to occur close to a node word that the researcher is interested in
  • Word sketch (in Sketch Engine) - a variation of collocation analysis in which collocates are grouped according to their grammatical relation to the node word
  • Diachronic trends (e.g. Kilgarriff, Busta, and Rychlý 2015)
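
The first three methods above can be illustrated with a minimal Python sketch. It is not how Sketch Engine, AntConc, or any other tool listed on this page computes its results. The tokenizer, the function names, the window size of 2, and the toy text are all assumptions made for illustration, and log-likelihood is only one of several keyness metrics discussed in the literature.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and return runs of letters as tokens (crude on purpose)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def frequency_list(tokens):
    """Return (word, count) pairs sorted from most to least frequent."""
    return Counter(tokens).most_common()


def collocates(tokens, node, window=2):
    """Count words occurring within `window` tokens to the left or right of `node`.

    Sentence boundaries are ignored here; this is a simplification.
    """
    counts = Counter()
    for i, tok in enumerate(tokens):
        if tok == node:
            counts.update(tokens[max(0, i - window):i])
            counts.update(tokens[i + 1:i + 1 + window])
    return counts


def log_likelihood(freq_study, freq_ref, size_study, size_ref):
    """Log-likelihood (G2) keyness score for one word, study vs. reference corpus."""
    total = freq_study + freq_ref
    expected_study = size_study * total / (size_study + size_ref)
    expected_ref = size_ref * total / (size_study + size_ref)
    g2 = 0.0
    if freq_study > 0:
        g2 += freq_study * math.log(freq_study / expected_study)
    if freq_ref > 0:
        g2 += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * g2


# Toy corpus, invented for illustration only.
text = "The cat sat on the mat. The dog sat on the log. The cat saw the dog."
tokens = tokenize(text)

print(frequency_list(tokens)[:3])                    # most frequent words
print(collocates(tokens, "sat").most_common(3))      # words near the node "sat"
print(round(log_likelihood(20, 5, 1000, 1000), 2))   # hypothetical counts
```

Raw co-occurrence counts like these favour frequent function words such as "the", which is why collocation analysis normally ranks candidates with an association measure rather than by count alone (see Gablasova, Brezina, and McEnery 2017).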

Software

  1. Sketch Engine
    • A web-based corpus management toolkit
    • Provides access to a large selection of corpora in various languages
    • Beginner-friendly
    • User guide: https://www.sketchengine.eu/guide/
    • Select institutional login to see if your institution provides access!
  2. AntConc
    • A basic, free toolkit for corpus analysis
  3. Voyant
    • Web-based reading and analysis environment for digital texts
  4. Kielipankki KORP
    • Korp is a web-based tool that lets users search for keywords in text corpora (typically grammatically parsed) and generate concordances.
    • Korp gives its users access to extensive collections of texts in Finnish and Finland Swedish.
  5. #LancsBox

References and readings

  • Baker, Paul. 2006. Using Corpora in Discourse Analysis. London: Continuum.
  • Brezina, Vaclav. 2018. Statistics in Corpus Linguistics: A Practical Guide. Cambridge: Cambridge University Press.
  • Gablasova, Dana, Vaclav Brezina, and Tony McEnery. 2017. “Collocations in Corpus-Based Language Learning Research: Identifying, Comparing, and Interpreting the Evidence.” Language Learning 67 (S1): 155–79. https://doi.org/10.1111/lang.12225.
  • Gabrielatos, Costas. 2018. “Keyness Analysis: Nature, Metrics and Techniques.” In Corpus Approaches to Discourse: A Critical Review, edited by Charlotte Taylor and Anna Marchi, 225–58. Milton, UK: Routledge.
  • Kilgarriff, Adam, Vít Baisa, Jan Bušta, Miloš Jakubíček, Vojtěch Kovář, Jan Michelfeit, Pavel Rychlý, and Vít Suchomel. 2014. “The Sketch Engine: Ten Years On.” Lexicography 1: 7–36. https://doi.org/10.1007/s40607-014-0009-9.
  • Kilgarriff, Adam, Jan Busta, and Pavel Rychlý. 2015. “DIACRAN: A Framework for Diachronic Analysis.” Corpus Linguistics (CL2015). Lancaster, UK.
  • Kyröläinen, Aki, and Veronika Laippala. 2020. “Määrällinen Korpuslingvistiikka.” In Kielentutkimuksen Menetelmiä, edited by M Luodonpää-Manni, M Hamunen, R Konstenius, M Miestamo, U Nikanne, and K Sinnemäki. Helsinki: Suomalaisen Kirjallisuuden Seura.
  • Lillqvist, Ella. 2019. “Korpusavusteinen Diskurssianalyysi Kuluttajatutkimuksen Menetelmänä: Pikavippikeskustelun Synty, Nousu Ja Arkipäiväistyminen Suomi24-Keskustelufoorumilla.” Kulutustutkimus.Nyt 13 (1): 5–30. https://journal.fi/kulutustutkimus/article/view/84608.
  • Mautner, Gerlinde. 2016. “Checks and Balances: How Corpus Linguistics Can Contribute to CDA.” In Methods of Critical Discourse Studies, edited by R Wodak and M Meyer, 3rd ed., 122–43. London: SAGE Publications.