En - Digi.kansalliskirjasto.fi Data

Last modified by tzpaakko@helsinki_fi on 2024/01/16 07:58

The ready-made export packages from Digi can be downloaded from https://digi.kansalliskirjasto.fi/opendata/submit



The export packages contain the page-specific XML of the digitized newspapers and journals of the National Library of Finland (1771–1910). The packages are divided by decade and year into a folder structure with one XML file per page. Each custom XML file contains:

  • The metadata of the page
  • ALTO XML, which contains the words of the page and their coordinates. An example ALTO file can be viewed on any page at http://digi.kansalliskirjasto.fi (open the ALTO format via the A icon to see both the page text and the ALTO XML).
  • Raw text of the page
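
As an illustration of how the words and coordinates in the ALTO part could be read, here is a minimal Python sketch. It relies only on the ALTO convention that each word is a String element with CONTENT, HPOS and VPOS attributes. The wrapper element names in the sample are hypothetical, since the custom schema (kk-ocr.xsd) is not reproduced here, so the code matches on the local element name and ignores namespaces.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip any '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def alto_words(xml_text):
    """Return (word, hpos, vpos) tuples for every ALTO String element."""
    root = ET.fromstring(xml_text)
    words = []
    for elem in root.iter():
        if local_name(elem.tag) == "String":
            words.append((elem.get("CONTENT"), elem.get("HPOS"), elem.get("VPOS")))
    return words


# Hypothetical, heavily simplified sample; real export files are larger and
# wrap the ALTO part in the custom page format.
SAMPLE = """<page>
  <alto><Layout><Page><PrintSpace><TextBlock><TextLine>
    <String CONTENT="Sample" HPOS="100" VPOS="200" WIDTH="300" HEIGHT="40"/>
    <SP/>
    <String CONTENT="text" HPOS="420" VPOS="200" WIDTH="120" HEIGHT="40"/>
  </TextLine></TextBlock></PrintSpace></Page></Layout></alto>
</page>"""

if __name__ == "__main__":
    for word, hpos, vpos in alto_words(SAMPLE):
        print(word, hpos, vpos)
```

Coordinates are returned as strings exactly as they appear in the file; convert them to numbers only once you know the measurement unit used in the export.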


Terms of use

Please refer to http://digi.kansalliskirjasto.fi/terms for the terms of use. Users may not redistribute in-copyright digitized material without permission from the rights holder.


The structure of the data exports


Archive:  nlf_ocrdump_v0-2_newspapers_1771-1870.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
        0  02-29-2016 09:19   1771-1870/
        0  02-29-2016 09:19   1771-1870/fin/
        0  02-29-2016 09:12   1771-1870/fin/1861/
     4682  02-06-2016 13:02   1771-1870/fin/1861/kk-ocr.xsd
     5716  02-06-2016 13:02   1771-1870/fin/1861/1457-4519_1861-01-01_0_001.xml
   133732  02-06-2016 13:02   1771-1870/fin/1861/1457-4519_1861-01-01_0_002.xml
   199296  02-06-2016 13:02   1771-1870/fin/1861/1457-4519_1861-01-01_0_003.xml
    38717  02-06-2016 13:02   1771-1870/fin/1861/1457-4519_1861-01-01_0_004.xml


The structure of the filename

1771-1870/fin/1775/1457-4683_1775-09-01_0_001.xml  

YearRange/language/year/ISSN_publicationdate_issuenumber_pagenumber
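
The path pattern above can be split into its parts programmatically. The following Python sketch is one possible way to do it, assuming every page file follows the pattern shown and that the publication date is always a full YYYY-MM-DD date. The names PageFile and parse_page_path are hypothetical and not part of the export.

```python
import re
from datetime import date
from typing import NamedTuple


class PageFile(NamedTuple):
    year_range: str
    language: str
    year: int
    issn: str
    published: date
    issue: str
    page: int


# YearRange/language/year/ISSN_publicationdate_issuenumber_pagenumber.xml
PATTERN = re.compile(
    r"^(?P<year_range>\d{4}-\d{4})/(?P<language>[^/]+)/(?P<year>\d{4})/"
    r"(?P<issn>[^_/]+)_(?P<published>\d{4}-\d{2}-\d{2})_"
    r"(?P<issue>[^_/]+)_(?P<page>\d+)\.xml$"
)


def parse_page_path(path):
    """Split an export path into its components, or raise ValueError."""
    match = PATTERN.match(path.strip())
    if match is None:
        raise ValueError(f"not a page file path: {path!r}")
    return PageFile(
        year_range=match["year_range"],
        language=match["language"],
        year=int(match["year"]),
        issn=match["issn"],
        published=date.fromisoformat(match["published"]),
        issue=match["issue"],  # kept as text; its numbering scheme is not documented here
        page=int(match["page"]),
    )


if __name__ == "__main__":
    print(parse_page_path("1771-1870/fin/1775/1457-4683_1775-09-01_0_001.xml"))
```

Files that do not match, such as kk-ocr.xsd, raise ValueError, so callers can skip them explicitly.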


Download tips

You can resume a download that was interrupted:

wget -c URL
curl -C - -O URL


Feedback

If you have any questions or feedback, please send them via the Feedback function in the upper right corner of Digi.


Citing

You can cite the open data, for example, as follows:

The National Library of Finland (2019), The open data exports of digitized newspapers and journals, 2019. The National Library of Finland. https://digi.kansalliskirjasto.fi/opendata/submit


Frequently asked questions

Q: How do I get all the page files of one particular binding? How do I know which packages contain it?

A: Check the publication years of the binding in the Titles view of Digi, then download the packages that cover those years. You can also contact us via the feedback link.
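
Once the right packages are downloaded, the page files of one title can be picked out of a zip archive without unpacking everything. The Python sketch below assumes that the title is identified by the ISSN at the start of each file name, as in the filename structure described earlier. The function names and the demo archive are hypothetical.

```python
import io
import zipfile


def pages_for_issn(archive, issn):
    """Return the sorted names of page XML files whose file name starts with `issn`."""
    with zipfile.ZipFile(archive) as zf:
        return sorted(
            name
            for name in zf.namelist()
            if name.endswith(".xml")
            and name.rsplit("/", 1)[-1].startswith(issn + "_")
        )


def make_demo_archive():
    """Build a tiny in-memory zip that mimics the export layout (demo only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as zf:
        zf.writestr("1771-1870/fin/1861/kk-ocr.xsd", "")
        zf.writestr("1771-1870/fin/1861/1457-4519_1861-01-01_0_002.xml", "<page/>")
        zf.writestr("1771-1870/fin/1861/1457-4519_1861-01-01_0_001.xml", "<page/>")
        zf.writestr("1771-1870/fin/1775/1457-4683_1775-09-01_0_001.xml", "<page/>")
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    for name in pages_for_issn(make_demo_archive(), "1457-4519"):
        print(name)
```

For a real package, pass the path of the downloaded zip file instead of the demo archive. To narrow the result to a single issue, filter the returned names further by publication date.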



Q: I lost my download link, or it expired. What should I do?

A: Please fill in the question form again and you will receive a new download link.


Literature


Pääkkönen, T., Kervinen, J., Nivala, A., Kettunen, K., & Mäkelä, E. (2016). Exporting Finnish Digitized Historical Newspaper Contents for Offline Use. D-Lib Magazine, 22(7/8). http://doi.org/10.1045/july2016-paakkonen

Kettunen, K., Pääkkönen, T., & Koistinen, M. (2016). Kansalliskirjaston digitoitu historiallinen lehtiaineisto 1771–1910: sanatason laatu, kokoelmien käyttö ja laadun parantaminen. Informaatiotutkimus, 35(3), 3–14. http://ojs.tsv.fi/index.php/inf/article/view/59433

Kettunen, Pääkkönen and Koistinen (2016) is also available at http://journal.fi/inf/article/view/59433 (in Finnish).

 


Applications

A couple of small scripts are available on GitHub, including one that extracts just the text from an XML file.

Java and PHP versions of the scripts are also available: java, php.


