Interfaces of digi.kansalliskirjasto.fi

Last modified by Tuula Pääkkönen on 2024/02/05 14:03

This page briefly documents the interfaces currently available at digi.kansalliskirjasto.fi.

Additional information on the interfaces of the National Library can be found at https://data.nationallibrary.fi

Access

Digi interfaces are available for the metadata of material that is openly available online, i.e. newspapers and journals up to the year 1949 (as digitization progresses).

Interfaces can be used according to the terms of use of Digi.nationallibrary.fi.

Data content

Metadata of the digitized works (newspapers, journals, books, sheet music, etc.). For newspapers, the publication time and ISSN of each digitized binding are available. For books, all metadata obtained from the description work is available.

Each book has both a digital record (detailing the digitization) and a physical record (mainly describing the original work). The digital records of books have been specified, and Digi offers them for harvesting by Melinda so that links to the digitized items reach Finna. NB! The process of getting links to digitized items into Finna is still under way and progresses collection by collection. The first set for this has been the Clandestine collection.

OpenURL

OpenURL links you to a page image based on the date information in the URL parameters. The available parameters can be seen in the example below:

http://digi.kansalliskirjasto.fi/openurl/query.html?genre=journal&date=1888-01-03&issn=0355-6913&spage=1

  • genre (journal; no need to change)
  • date (YYYY-MM-DD)
  • issn (the identifier of the newspaper or journal)
  • spage (page number)

The above url returns the page image of Aamulehti 3.1.1888 page 2.

https://digi.kansalliskirjasto.fi/sanomalehti/binding/379687/thumbnail/2

Applicability

  • Use OpenURL if you want a permanent reference to the newspaper without using the binding id.
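
The OpenURL parameters listed in this section map directly onto a query string. As a minimal sketch, the example link can be assembled with Python's standard library. The function name build_openurl is illustrative and not part of Digi; only the base address and the parameter names come from the example on this page.

```python
from urllib.parse import urlencode

OPENURL_BASE = "http://digi.kansalliskirjasto.fi/openurl/query.html"


def build_openurl(issn: str, date: str, spage: int = 1, genre: str = "journal") -> str:
    """Return an OpenURL link for one page of a newspaper or journal issue.

    date is expected in YYYY-MM-DD form, as in the documented example.
    """
    # Keep the parameter order of the documented example URL.
    params = {"genre": genre, "date": date, "issn": issn, "spage": spage}
    return f"{OPENURL_BASE}?{urlencode(params)}"


print(build_openurl("0355-6913", "1888-01-03", spage=1))
```

Building the link from an ISSN and a date keeps a reference independent of the binding id, which is the use case named above.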

Getting started with OAI-PMH

OAI-PMH is a harvesting interface with which you can get the metadata records of a specific service. Most often you will want to harvest metadata for a specific collection, which in OAI-PMH vocabulary is a set. In Digi, the OAI-PMH sets correspond to collection identifiers. The collection identifier can be seen in the address of the collection's home page within Digi, for example https://digi.kansalliskirjasto.fi/collections?id=681 for collection 681. The OAI-PMH set for that collection is col-681, so you can access all metadata of the collection at: https://digi.kansalliskirjasto.fi/interfaces/OAI-PMH?verb=ListRecords&set=col-681&metadataPrefix=oai_dc
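
The step from a collection page to a harvesting URL can be sketched as follows. The helper names are hypothetical; only the URL patterns and the 'col-' prefix come from this page.

```python
from urllib.parse import parse_qs, urlencode, urlparse

OAI_BASE = "https://digi.kansalliskirjasto.fi/interfaces/OAI-PMH"


def set_spec_from_collection_url(collection_url: str) -> str:
    """Turn a collection page URL (…/collections?id=681) into an OAI-PMH set name."""
    query = parse_qs(urlparse(collection_url).query)
    return "col-" + query["id"][0]


def list_records_url(set_spec: str, metadata_prefix: str = "oai_dc") -> str:
    """Build the ListRecords request for one set."""
    params = {"verb": "ListRecords", "set": set_spec, "metadataPrefix": metadata_prefix}
    return f"{OAI_BASE}?{urlencode(params)}"


spec = set_spec_from_collection_url("https://digi.kansalliskirjasto.fi/collections?id=681")
print(spec)
print(list_records_url(spec))
```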

Quite often you can use an OAI-PMH library for your programming language; Sickle for Python is one option. There you set up the OAI-PMH server connection details and the collection, and you get back a handle that gives you the next batch of records.

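from sickle import Sickle

URL = 'https://digi.kansalliskirjasto.fi/interfaces/OAI-PMH'
colid = 'col-681'  # set name: 'col-' followed by the collection identifier

sickle = Sickle(URL)

# ListRecords returns an iterator that fetches the following batches on demand.
records = sickle.ListRecords(
    **{'metadataPrefix': 'oai_dc', 'set': colid})

for record in records:
    # Do something with each record, e.g. print its OAI identifier.
    print(record.header.identifier)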

OAI-PMH

Basic information about the service

Formats offered: Dublin Core (oai_dc), qdc_finna, marc21

Sets offered:

The collections of Digi can be retrieved by prepending 'col-' to the collection number, which can be seen in the URL of the collection. For example, for the collection https://digi.kansalliskirjasto.fi/collections?id=41 the OAI-PMH set is col-41.

Return the first 100 records; the resumption token for getting the next batch is at the end

The resumption token for continuing after the first batch is given at the end of that batch, and so on until the last batch.
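
Following the batches by hand can be sketched as below. This assumes the server follows the OAI-PMH 2.0 convention that a follow-up request carries only the verb and the resumptionToken, and that a missing or empty token marks the last batch. The function name is illustrative, the sample responses are heavily trimmed, and the token value is made up.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

OAI_BASE = "https://digi.kansalliskirjasto.fi/interfaces/OAI-PMH"
OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def next_batch_url(response_xml: str, verb: str = "ListRecords"):
    """Return the URL of the next batch, or None when there is no further batch."""
    root = ET.fromstring(response_xml)
    token = root.find(".//oai:resumptionToken", OAI_NS)
    if token is None or not (token.text or "").strip():
        return None  # missing or empty token: this was the last batch
    params = {"verb": verb, "resumptionToken": token.text.strip()}
    return f"{OAI_BASE}?{urlencode(params)}"


# Trimmed sample responses; the token value is invented for illustration.
FIRST_BATCH = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords><resumptionToken>example-token-1</resumptionToken></ListRecords>
</OAI-PMH>"""
LAST_BATCH = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords><resumptionToken/></ListRecords>
</OAI-PMH>"""

print(next_batch_url(FIRST_BATCH))
print(next_batch_url(LAST_BATCH))
```

A harvesting library such as Sickle performs this loop internally, so the sketch is mainly useful when working with plain HTTP requests.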

Returns different material types:

Returns just identifiers and datestamp:

Return one specific record (the identifier is created with the help of the binding id):

Usage of the richer qdc_finna metadata format:

https://digi.kansalliskirjasto.fi/interfaces/OAI-PMH?verb=GetRecord&metadataPrefix=qdc_finna&identifier=oai:digi.kansalliskirjasto.fi:1901222
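
The example above suggests that a record identifier is the prefix oai:digi.kansalliskirjasto.fi: followed by the binding id. Under that assumption, a GetRecord request for any binding can be sketched like this (the helper names are illustrative):

```python
from urllib.parse import urlencode

OAI_BASE = "https://digi.kansalliskirjasto.fi/interfaces/OAI-PMH"


def oai_identifier(binding_id: int) -> str:
    """Build the OAI identifier from a binding id (pattern inferred from the example)."""
    return f"oai:digi.kansalliskirjasto.fi:{binding_id}"


def get_record_url(binding_id: int, metadata_prefix: str = "qdc_finna") -> str:
    """Build a GetRecord request for one binding in the given metadata format."""
    params = {
        "verb": "GetRecord",
        "metadataPrefix": metadata_prefix,
        "identifier": oai_identifier(binding_id),
    }
    # safe=":" keeps the colons of the identifier readable, as in the example URL.
    return f"{OAI_BASE}?{urlencode(params, safe=':')}"


print(get_record_url(1901222))
print(get_record_url(1901222, metadata_prefix="marc21"))
```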

Filter records by date

Applicability

  • Use OAI-PMH if you want the basic binding-level metadata. Querying all records will take some time, as the batch size via OAI-PMH is limited.

OAI-PMH and Datestamp (changed 10/2023)

Digi started to use the datestamp field to also show metadata changes of the binding. Previously the datestamp gave the original import date of the binding, which suited some use cases but not all. The modified date should help, for example, when it is important to know whether a binding has updates to its metadata, so that reharvesting can target those bindings specifically.
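
A targeted reharvest could be requested roughly as follows. This is a sketch that assumes the standard OAI-PMH from argument with day granularity (YYYY-MM-DD); the function name and the chosen date are illustrative only.

```python
from datetime import date
from urllib.parse import urlencode

OAI_BASE = "https://digi.kansalliskirjasto.fi/interfaces/OAI-PMH"


def changed_since_url(since: date, set_spec: str = None, metadata_prefix: str = "oai_dc") -> str:
    """Build a ListIdentifiers request for bindings whose datestamp is on or after `since`."""
    params = {
        "verb": "ListIdentifiers",
        "metadataPrefix": metadata_prefix,
        "from": since.isoformat(),  # day granularity, e.g. 2023-10-01
    }
    if set_spec:
        params["set"] = set_spec  # optionally restrict to one collection
    return f"{OAI_BASE}?{urlencode(params)}"


print(changed_since_url(date(2023, 10, 1), set_spec="col-681"))
```

Storing the date of the previous harvest and passing it as `since` limits the next run to bindings that were added or modified after it.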

OAI-PMH and Checksums (new 05/2023)

As part of the FIN-CLARIAH project, OAI-PMH offers the current checksums of digitized materials. With a checksum, you can verify that you have downloaded the offered content fully and unchanged.

You can see the checksums in the qdc_finna metadata format. Each checksum is given with the file type, page number and checksum value.

Disclaimer: due to the history of digitization activities, or corrections made after the original digitization, some material currently has invalid checksums. These will be fixed in the future.

For example, a snippet from a book, where you can see the checksums in action:

Checksum example
<dc:rights>Tekijänoikeuden alainen</dc:rights>
<kk:file bundle="THUMBNAIL" href="https://digi.kansalliskirjasto.fi/teos/binding/1928249/thumbnail/1" type="image/jpeg" length="154184"/>
<kk:file bundle="ORIGINAL" href="https://digi.kansalliskirjasto.fi/teos/binding/1928249/pdf/" name="Aakkoset - maalattavia kuvia = ABC-bok - målbilder - V. Soldan-Brofeldt._1_1_1932.pdf" type="application/pdf" length="2945075"/>
<kk:checksum filetype="application/xml+alto" pagenumber="1" value="0e4e0ce3cbe3a073fa166aadf024d82c"/>
<kk:checksum filetype="application/xml+alto" pagenumber="2" value="ce330667e2a5300aa1a38340f7936930"/>
<kk:checksum filetype="application/xml+alto" pagenumber="3" value="a2326e7ea1bf58a77fa564704a94aec6"/>
<kk:checksum filetype="application/xml+alto" pagenumber="4" value="e6239fc6e2d2eeaae2df6ce28919cb9d"/>
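
Checking a downloaded file against such a value could look like the sketch below. Two assumptions are made that this page does not confirm: the 32-character hexadecimal values are taken to be MD5 digests, and the namespace URI bound to the kk prefix in the sample is a placeholder. The code therefore matches elements by local name only.

```python
import hashlib
import xml.etree.ElementTree as ET


def read_checksums(record_xml: str) -> dict:
    """Map (filetype, page number) to the checksum value found in a record."""
    checksums = {}
    for element in ET.fromstring(record_xml).iter():
        # Compare the local name only, so the namespace URI does not matter.
        if element.tag.rsplit("}", 1)[-1] == "checksum":
            key = (element.get("filetype"), int(element.get("pagenumber")))
            checksums[key] = element.get("value")
    return checksums


def matches_checksum(content: bytes, expected_hex: str) -> bool:
    """Compare downloaded bytes with an expected digest (assumed here to be MD5)."""
    return hashlib.md5(content).hexdigest() == expected_hex.lower()


# Trimmed sample; the namespace URI is a placeholder, the values are from the example above.
SAMPLE = """<record xmlns:kk="http://example.org/kk">
  <kk:checksum filetype="application/xml+alto" pagenumber="1" value="0e4e0ce3cbe3a073fa166aadf024d82c"/>
  <kk:checksum filetype="application/xml+alto" pagenumber="2" value="ce330667e2a5300aa1a38340f7936930"/>
</record>"""

expected = read_checksums(SAMPLE)[("application/xml+alto", 1)]
print(expected)
```

In use, `content` would be the bytes of the downloaded ALTO file for the same page; a mismatch may also be caused by the invalid checksums mentioned in the disclaimer.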

OAI-PMH and Deleted works (new 08/2023)

Earlier, information about deleted items was not available via OAI-PMH. This changed on 23.8.2023, when support for deleted items was added. If a binding is removed, e.g. because it is replaced with a better digitization, OAI-PMH now returns this information. As per the OAI-PMH specification, a deleted record has the header information but no other metadata. You can access just the deleted works via the new 'deleted' set:

https://digi.kansalliskirjasto.fi/interfaces/OAI-PMH?verb=ListRecords&metadataPrefix=qdc_finna&set=deleted

In addition, if the set of a collection now includes deleted items, those are returned with the collection. However, if a work is detached from a collection for another reason, it is no longer visible via that OAI-PMH set. This should be quite rare, and should only occur during the initial creation of collections.

Works deleted earlier are not visible via the 'deleted' set, but works deleted since 23.8.2023 are visible via the different OAI-PMH verbs. The deleted items are typically at the end of the list.

Via a collection:
https://digi.kansalliskirjasto.fi/interfaces/OAI-PMH?verb=ListIdentifiers&metadataPrefix=qdc_finna&set=col-21

And an individual deleted record:

https://digi.kansalliskirjasto.fi/interfaces/OAI-PMH?verb=GetRecord&metadataPrefix=qdc_finna&identifier=oai:digi.kansalliskirjasto.fi:1906699
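
In the OAI-PMH specification a deleted record is marked with status="deleted" on its header element. Assuming Digi's responses follow that form, the deleted bindings in a ListIdentifiers or ListRecords response can be picked out as sketched here. The sample is trimmed to the identifiers only, and the function name is illustrative.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def deleted_identifiers(response_xml: str) -> list:
    """Return the identifiers of all headers flagged as deleted."""
    root = ET.fromstring(response_xml)
    return [
        header.findtext("oai:identifier", namespaces=OAI_NS)
        for header in root.iterfind(".//oai:header", OAI_NS)
        if header.get("status") == "deleted"
    ]


# Trimmed sample response; real headers also carry a datestamp and setSpec elements.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListIdentifiers>
    <header><identifier>oai:digi.kansalliskirjasto.fi:1901222</identifier></header>
    <header status="deleted"><identifier>oai:digi.kansalliskirjasto.fi:1906699</identifier></header>
  </ListIdentifiers>
</OAI-PMH>"""

print(deleted_identifiers(SAMPLE))
```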

OAI-PMH for Books

Books work in the same way as the newspaper and journal materials. For books, the set is a collection of books; you can find the collection id either from the collection page or from the results of the ListSets verb.
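
Reading the set names out of a ListSets response can be sketched as follows. The set names and titles in the sample are invented for illustration; the colon-separated hierarchy follows the general OAI-PMH convention for setSpec values.

```python
import xml.etree.ElementTree as ET

OAI_NS = {"oai": "http://www.openarchives.org/OAI/2.0/"}


def list_sets(response_xml: str) -> dict:
    """Map each setSpec in a ListSets response to its human-readable setName."""
    root = ET.fromstring(response_xml)
    return {
        item.findtext("oai:setSpec", namespaces=OAI_NS): item.findtext("oai:setName", namespaces=OAI_NS)
        for item in root.iterfind(".//oai:set", OAI_NS)
    }


def set_path(set_spec: str) -> list:
    """Split a hierarchical setSpec into its levels, from collection to subcollection."""
    return set_spec.split(":")


# Hypothetical sample; real set names and titles will differ.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListSets>
    <set><setSpec>col-1</setSpec><setName>Example collection</setName></set>
    <set><setSpec>col-1:col-2</setSpec><setName>Example subcollection</setName></set>
  </ListSets>
</OAI-PMH>"""

for spec, name in list_sets(SAMPLE).items():
    print(set_path(spec), name)
```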

Getting all collections of books:

Getting the 'Geography and Travel' collection:

Getting a subcollection (subcollections are separated by a colon, ':'):

Getting a particular record:

(A quick way is to replace the binding id in the example above with the id of the desired binding.)

Getting a particular record in marc21 format:

JSON

Available for the newspaper and journal title information (core metadata):

https://digi.kansalliskirjasto.fi/api/newspaper/titles?language=fi

SFX (Deprecated, not in use)

See the instructions for use via SFX: https://www.kiwi.fi/pages/viewpage.action?pageId=103187594

Subcomponents available for an individual binding

The different subcomponents produced by post-processing can be accessed under https://digi.kansalliskirjasto.fi with URLs of the form shown below, where <bindingid> (1426186 in these examples) is the unique local id of a particular issue.

https://digi.kansalliskirjasto.fi/sanomalehti/binding/1426186/image/1  the access image of the page (.jpg)

https://digi.kansalliskirjasto.fi/sanomalehti/binding/1426186/thumbnail/1  the thumbnail of the page (.jpg)

https://digi.kansalliskirjasto.fi/sanomalehti/binding/1426186/pdf  the whole binding as PDF

https://digi.kansalliskirjasto.fi/sanomalehti/binding/1426186/page-1.txt  the plain text of the page

https://digi.kansalliskirjasto.fi/sanomalehti/binding/1426186/page-1.xml  the ALTO XML of the page, which also contains the layout information

https://digi.kansalliskirjasto.fi/sanomalehti/binding/1426186/image/100  sample of an error page, returned if the binding doesn't have that many pages

https://digi.kansalliskirjasto.fi/sanomalehti/binding/1426186/mets.xml?full=true  the METS file for the whole binding; full=true is unfiltered, full=false filters away some amdSec parts
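
The URL patterns above can be generated for any binding and page. The following is a small sketch with an illustrative function name; it covers only the newspaper and journal path (sanomalehti) shown in the examples.

```python
BINDING_BASE = "https://digi.kansalliskirjasto.fi/sanomalehti/binding"


def binding_urls(binding_id: int, page: int = 1, full_mets: bool = True) -> dict:
    """Return the subcomponent URLs of one binding, keyed by component name."""
    root = f"{BINDING_BASE}/{binding_id}"
    return {
        "image": f"{root}/image/{page}",          # access image of the page (.jpg)
        "thumbnail": f"{root}/thumbnail/{page}",  # thumbnail of the page (.jpg)
        "pdf": f"{root}/pdf",                     # whole binding as PDF
        "text": f"{root}/page-{page}.txt",        # plain text of the page
        "alto": f"{root}/page-{page}.xml",        # ALTO XML with layout information
        "mets": f"{root}/mets.xml?full={'true' if full_mets else 'false'}",
    }


for name, url in binding_urls(1426186, page=1).items():
    print(name, url)
```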

Utilizing METS (draft)

The METS file contains information about the digitized package and the files included in it. It is possible to get the filenames from it.

You can also use the METS file as a starting point for downloading material. Use e.g. the file group:

<mets:fileGrp ID="ALTOGRP" USE="alto">

In books the attribute is USE=alto, and in newspapers and journals USE=Text.

Digi supports downloading ALTO files with zero-padded page numbers, so it is possible to download e.g. https://digi.kansalliskirjasto.fi/sanomalehti/binding/1426186/page-00001.xml using the filename as it appears in METS.
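
Collecting the ALTO file references from a METS file might look like the sketch below. It relies on the standard METS and XLink namespaces and on the USE values mentioned above. The sample document is made up and far smaller than a real METS file, so the exact form of the file references in Digi's METS is an assumption.

```python
import xml.etree.ElementTree as ET

NS = {"mets": "http://www.loc.gov/METS/"}
XLINK_HREF = "{http://www.w3.org/1999/xlink}href"
ALTO_USES = {"alto", "Text"}  # books use "alto", newspapers and journals "Text"


def alto_file_refs(mets_xml: str) -> list:
    """Return the file references listed in the ALTO file group(s) of a METS document."""
    root = ET.fromstring(mets_xml)
    refs = []
    for group in root.iterfind(".//mets:fileGrp", NS):
        if group.get("USE") in ALTO_USES:
            for locator in group.iterfind(".//mets:FLocat", NS):
                refs.append(locator.get(XLINK_HREF))
    return refs


def alto_url(binding_id: int, page: int, width: int = 5) -> str:
    """Build the download URL of one ALTO page with a zero-padded page number."""
    return (
        "https://digi.kansalliskirjasto.fi/sanomalehti/binding/"
        f"{binding_id}/page-{page:0{width}d}.xml"
    )


# Made-up, minimal METS sample for illustration only.
SAMPLE = """<mets:mets xmlns:mets="http://www.loc.gov/METS/" xmlns:xlink="http://www.w3.org/1999/xlink">
  <mets:fileSec>
    <mets:fileGrp ID="ALTOGRP" USE="alto">
      <mets:file ID="ALTO00001"><mets:FLocat LOCTYPE="URL" xlink:href="page-00001.xml"/></mets:file>
      <mets:file ID="ALTO00002"><mets:FLocat LOCTYPE="URL" xlink:href="page-00002.xml"/></mets:file>
    </mets:fileGrp>
    <mets:fileGrp ID="IMGGRP" USE="image">
      <mets:file ID="IMG00001"><mets:FLocat LOCTYPE="URL" xlink:href="image-00001.jpg"/></mets:file>
    </mets:fileGrp>
  </mets:fileSec>
</mets:mets>"""

print(alto_file_refs(SAMPLE))
print(alto_url(1426186, 1))
```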

References

For all questions or comments, please use the Feedback functionality of https://digi.kansalliskirjasto.fi/etusivu
