En - Digi.kansalliskirjasto.fi Data
The ready-made export packages from Digi can be downloaded from https://digi.kansalliskirjasto.fi/opendata/submit .
The export packages contain the page-specific XML of the digitized newspapers and journals of the National Library of Finland (1771–1910). The packages are divided by year range, language, and year into a folder structure with one XML file per page. Each custom XML file (example) contains:
- The metadata of the page
- ALTO XML, which contains the words of the page and their coordinates. An example ALTO file can be viewed on any page in Digi: http://digi.kansalliskirjasto.fi (open the ALTO format via the A icon to see both the page text and the ALTO XML).
- Raw text of the page
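As a rough illustration of the ALTO part, the sketch below collects each word and its coordinates in Python. It relies only on the standard ALTO convention that words are `String` elements with `CONTENT`, `HPOS`, `VPOS`, `WIDTH` and `HEIGHT` attributes. The `<page>` wrapper element and the namespace version in the sample are assumptions made for the example, not the actual layout defined in kk-ocr.xsd, which is why the code matches elements by local name only.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample: the <page> wrapper and the ALTO v2 namespace are
# assumptions for illustration; real export files may be laid out differently.
SAMPLE = """<page>
  <alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
    <Layout><Page><PrintSpace><TextBlock><TextLine>
      <String CONTENT="Suomen" HPOS="100" VPOS="200" WIDTH="80" HEIGHT="20"/>
      <SP/>
      <String CONTENT="Sanomat" HPOS="190" VPOS="200" WIDTH="95" HEIGHT="20"/>
    </TextLine></TextBlock></PrintSpace></Page></Layout>
  </alto>
</page>"""


def local_name(tag):
    """Strip an XML namespace prefix, e.g. '{ns}String' -> 'String'."""
    return tag.rsplit("}", 1)[-1]


def to_float(value):
    """Convert a coordinate attribute to float, or None if it is missing."""
    return float(value) if value is not None else None


def alto_words(xml_text):
    """Return (word, hpos, vpos, width, height) for every ALTO String element."""
    root = ET.fromstring(xml_text)
    words = []
    for elem in root.iter():
        if isinstance(elem.tag, str) and local_name(elem.tag) == "String":
            words.append((
                elem.get("CONTENT", ""),
                to_float(elem.get("HPOS")),
                to_float(elem.get("VPOS")),
                to_float(elem.get("WIDTH")),
                to_float(elem.get("HEIGHT")),
            ))
    return words


if __name__ == "__main__":
    for word in alto_words(SAMPLE):
        print(word)
```

For a real page file, read its contents and pass the text to `alto_words`.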
Terms of use
Please refer to http://digi.kansalliskirjasto.fi/terms . Users may not redistribute in-copyright digitized material without permission from the rights holder.
The structure of the data exports
Length Date Time Name
--------- ---------- ----- ----
0 02-29-2016 09:19 1771-1870/
0 02-29-2016 09:19 1771-1870/fin/
0 02-29-2016 09:12 1771-1870/fin/1861/
4682 02-06-2016 13:02 1771-1870/fin/1861/kk-ocr.xsd
5716 02-06-2016 13:02 1771-1870/fin/1861/1457-4519_1861-01-01_0_001.xml
133732 02-06-2016 13:02 1771-1870/fin/1861/1457-4519_1861-01-01_0_002.xml
199296 02-06-2016 13:02 1771-1870/fin/1861/1457-4519_1861-01-01_0_003.xml
38717 02-06-2016 13:02 1771-1870/fin/1861/1457-4519_1861-01-01_0_004.xml
The structure of the filename
1771-1870/fin/1775/1457-4683_1775-09-01_0_001.xml
YearRange/language/year/ISSN_publicationdate_issuenumber_pagenumber
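The naming scheme above can be split into its parts with a few lines of Python. This is a minimal sketch: the `PageFile` class and the `parse_page_path` function are hypothetical names, and the code assumes that every file name has exactly four underscore-separated fields and a complete YYYY-MM-DD publication date, as in the example path. The issue number is kept as a string because its exact meaning is not described here.

```python
from dataclasses import dataclass
from datetime import date
from pathlib import PurePosixPath


@dataclass(frozen=True)
class PageFile:
    """Fields encoded in an export path (hypothetical helper class)."""
    year_range: str
    language: str
    year: int
    issn: str
    published: date
    issue: str
    page: int


def parse_page_path(path):
    """Parse 'YearRange/language/year/ISSN_date_issue_page.xml' into a PageFile.

    Raises ValueError if the path does not follow the assumed pattern.
    """
    p = PurePosixPath(path)
    # The last three directories are the year range, language and year.
    year_range, language, year = p.parts[-4:-1]
    # The file stem holds ISSN, publication date, issue number and page number.
    issn, published, issue, page = p.stem.split("_")
    return PageFile(
        year_range=year_range,
        language=language,
        year=int(year),
        issn=issn,
        published=date.fromisoformat(published),
        issue=issue,
        page=int(page),
    )


if __name__ == "__main__":
    print(parse_page_path("1771-1870/fin/1775/1457-4683_1775-09-01_0_001.xml"))
```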
Download tips
You can resume a download if it was interrupted:
wget -c URL
curl -C - -O URL
Feedback
If you have any questions or feedback, please send them via the Feedback function in the upper right corner of Digi.
Citing
You can cite the open data, for example, in the following way:
The National Library of Finland (2019), The open data exports of digitized newspapers and journals, 2019. The National Library of Finland. https://digi.kansalliskirjasto.fi/opendata/submit
Frequently asked questions
Q: How do I get all page files of one particular binding? How do I know in which packages it is located?
A: Check the publication years of the binding in the Titles view of Digi, then download the packages that cover those years. You can also contact us via the feedback link.
Q: I lost my download link, or my download link expired. What should I do?
A: Please fill in the question form again and you will get a new download link.
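For the first question, once the relevant packages have been downloaded and extracted, the page files can be picked out by file name. The sketch below is one possible approach in Python. It assumes that a binding can be identified by its ISSN and publication date as they appear in the file names; the function name `find_pages` and the parameter names are hypothetical.

```python
from pathlib import Path


def find_pages(root, issn, published=None):
    """Return the sorted page files for one ISSN under an extracted package.

    root      -- directory where the export packages were extracted
    issn      -- ISSN as it appears in the file names, e.g. '1457-4519'
    published -- optional publication date 'YYYY-MM-DD' to narrow the search
    """
    # File names follow ISSN_publicationdate_issuenumber_pagenumber.xml
    date_part = published if published is not None else "*"
    pattern = f"{issn}_{date_part}_*.xml"
    # Page numbers are zero-padded, so a plain sort keeps the page order.
    return sorted(Path(root).rglob(pattern))


if __name__ == "__main__":
    for page in find_pages(".", "1457-4519", "1861-01-01"):
        print(page)
```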
Literature
Pääkkönen, T., Kervinen, J., Nivala, A., Kettunen, K., & Mäkelä, E. (2016). Exporting Finnish Digitized Historical Newspaper Contents for Offline Use. D-Lib Magazine, 22(7/8). http://doi.org/10.1045/july2016-paakkonen
Kettunen, K., Pääkkönen, T., & Koistinen, M. (2016). Kansalliskirjaston digitoitu historiallinen lehtiaineisto 1771–1910: sanatason laatu, kokoelmien käyttö ja laadun parantaminen. Informaatiotutkimus, 35(3), 3–14. http://ojs.tsv.fi/index.php/inf/article/view/59433 (in Finnish; also available at http://journal.fi/inf/article/view/59433)
Applications
A couple of small scripts are available on GitHub, including one that extracts just the text from an XML file.
Java and PHP versions of the scripts can be found here: java, php.
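The same idea can be sketched in Python. The example below is not a port of the GitHub scripts. It rebuilds plain text from the ALTO part of a file by joining the `CONTENT` of the `String` elements inside each `TextLine`, both of which are standard ALTO elements. The `<page>` wrapper and the namespace in the sample are assumptions for illustration, and words hyphenated across lines are not rejoined.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document; only TextLine and String are relied upon.
SAMPLE_PAGE = """<page>
  <alto xmlns="http://www.loc.gov/standards/alto/ns-v2#">
    <Layout><Page><PrintSpace><TextBlock>
      <TextLine>
        <String CONTENT="Suomen"/><SP/><String CONTENT="Sanomat"/>
      </TextLine>
      <TextLine>
        <String CONTENT="Helsinki"/>
      </TextLine>
    </TextBlock></PrintSpace></Page></Layout>
  </alto>
</page>"""


def strip_ns(tag):
    """Drop the namespace from a tag name, e.g. '{ns}TextLine' -> 'TextLine'."""
    return tag.rsplit("}", 1)[-1]


def alto_plain_text(xml_text):
    """Return the page text with one output line per ALTO TextLine."""
    root = ET.fromstring(xml_text)
    lines = []
    for elem in root.iter():
        if not isinstance(elem.tag, str) or strip_ns(elem.tag) != "TextLine":
            continue
        # Direct String children carry the recognized words of the line.
        words = [child.get("CONTENT", "") for child in elem
                 if isinstance(child.tag, str) and strip_ns(child.tag) == "String"]
        lines.append(" ".join(words))
    return "\n".join(lines)


if __name__ == "__main__":
    print(alto_plain_text(SAMPLE_PAGE))
```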