Specimen Identifiers

Last modified by Anniina Kuusijärvi on 2024/02/12 14:08

This page describes what museum specimen identifiers are, what kinds of identifiers are created for specimens entered into Kotka, and how this is done.

Briefly: in Kotka, each specimen gets its own unique HTTP URI identifier, such as http://tun.fi/JAA.123. These identifiers can be created in several ways to suit different situations. In addition, the traditional identifier (e.g. an H-number or similar) is entered in a separate field, Original catalogue number. Specimens can be searched with either identifier, but when referring to a specimen it is best to give the HTTP URI, or both.

All other resources, like organisations and collections, also get a similar kind of unique identifier in Kotka.

Kotka admins will help with any questions related to Specimen identifiers (kotka(at)luomus.fi).

General

Every specimen that is saved into Kotka will have a globally unique, persistent and stable identifier. Identifiers are supposed to be "dumb": each is a string with no information encoded in it. Identifiers are used to identify specimens and to connect specimens to their corresponding data in Kotka. They can also be used to refer to specimens in e.g. a scientific article or external databases (such as GBIF). Identifiers can be printed on labels as text and as a barcode or QR code.

HTTP URI identifiers

Kotka uses so-called HTTP URI identifiers, which provide a standardized method for creating globally unique identifiers.

An example of this kind of identifier:
http://id.luomus.fi/GP.92636

where

http:// is the scheme identifier, which indicates that this is an HTTP URI
id.luomus.fi is a domain name controlled by Luomus. This guarantees that no one else can create the same identifiers and makes the identifiers globally unique (as long as everybody follows the standard)
GP is a namespace identifier. Always uppercase.
92636 is an object identifier. Usually numbers, but sometimes may contain letters.
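
The parts listed above can be separated programmatically. The following is a minimal sketch in Python, not Kotka's own code: the names `SpecimenId` and `parse_identifier` are hypothetical, and the pattern assumes that the namespace consists of uppercase letters only and the object identifier of letters and digits only.

```python
import re
from typing import NamedTuple


class SpecimenId(NamedTuple):
    scheme: str
    domain: str
    namespace: str
    object_id: str


# Assumed shape: http://<domain>/<NAMESPACE>.<object identifier>
_ID_PATTERN = re.compile(
    r"^(?P<scheme>http)://(?P<domain>[^/]+)/"
    r"(?P<namespace>[A-Z]+)\.(?P<object_id>[A-Za-z0-9]+)$"
)


def parse_identifier(uri: str) -> SpecimenId:
    """Split a full HTTP URI identifier into its four parts."""
    match = _ID_PATTERN.match(uri.strip())
    if match is None:
        raise ValueError(f"not a full HTTP URI identifier: {uri!r}")
    return SpecimenId(**match.groupdict())


parts = parse_identifier("http://id.luomus.fi/GP.92636")
print(parts.domain, parts.namespace, parts.object_id)
```

A shortened form such as GP.92636 is rejected with a ValueError, because it lacks the scheme and domain.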

These identifiers look like web addresses and can usually be used as such to retrieve more information about the specimen. Ultimately, however, they are just strings of characters that carry no meaning by themselves: one cannot tell what a specimen is like just by looking at the identifier. The "dumbness" of the identifier has some benefits, at least in theory:

The identifier does not need to be changed if something in the specimen information or the specimen itself changes (for example if the specimen is donated to another museum)
- this helps to maintain the connection between the identifier and the specimen
Identifiers are easy to generate, as no information about the specimen, the collection it belongs to or the owner is needed

Background information about identifiers in biodiversity informatics

The Trouble with Triplets in Biodiversity Informatics: A Data-Driven Case against Current Identifier Practices. http://dx.doi.org/10.1371/journal.pone.0114069
Community Next Steps for Making Globally Unique Identifiers Work for Biocollections Data. http://dx.doi.org/10.3897/zookeys.494.9352

Principles and rules for identifiers

Globally unique: no other object in the world has the same identifier
- The same identifier cannot be shared by several specimens (exception: the special case of several specimens in one document), nor can an identifier be reused.
Persistent and stable: identifier given to a specimen always refers to the same specimen and the identifier is never changed (the target of the identifier and the identifier itself are both persistent). Persistent identifiers are sometimes referred to as PIDs.
- Identifier must not be abbreviated by removing any part of it.
Resolvable: the identifier can be used to automatically fetch more information about the specimen.
Dumb: Identifiers carry no meaning (they are so-called “dumb identifiers”). Technically an identifier is just a string of letters, numbers and other symbols.

Domain part of the identifier

For all HTTP URI identifiers in Kotka, the default domain is http://tun.fi. This is used to generate IDs for all users for all resources (collections, organisations, transactions, specimens). Luomus is responsible for the maintenance of tun.fi and makes sure the identifiers always resolve (that is, that more information can be found using the identifier). Luomus also maintains another domain used in Kotka, http://id.luomus.fi, which is used e.g. for Luomus specimen identifiers.

Other organisations using Kotka can maintain their own domains:

University of Turku: http://mus.utu.fi
University of Oulu herbarium: http://id.herb.oulu.fi
University of Oulu zoology: http://id.zmuo.oulu.fi

These organisations are themselves responsible for maintaining their domains. For the identifiers to resolve, traffic to these domains has to be directed to the server specified by Kotka admins.

Each Kotka user gets a default domain, which is used for the specimens they create. Users can also enter specimens under other domains, but then they need to prefix the namespace with the required domain when entering data. This is why it is not wise to have different domains for different collections inside a collection.

The format and maintenance of domains is mainly a technical question and does not affect the ownership or access/edit rights of specimens.

Domains:

Domain abbreviation	Domain	Organisation
tun:	http://tun.fi/	Others
luomus:	http://id.luomus.fi/	Luomus
utu:	http://mus.utu.fi/	University of Turku
zmuo:	http://id.zmuo.oulu.fi/	University of Oulu, zoology
herbo:	http://id.herb.oulu.fi/	University of Oulu, botany
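
The table above can be read as a lookup from a domain abbreviation to the base of the identifier. The following Python sketch shows one way to assemble a full identifier from it. The function name `build_identifier` and the dictionary are illustrative assumptions, not part of Kotka; only the abbreviations and domains are taken from the table.

```python
# Domain abbreviations and domains copied from the table above.
DOMAINS = {
    "tun": "http://tun.fi/",
    "luomus": "http://id.luomus.fi/",
    "utu": "http://mus.utu.fi/",
    "zmuo": "http://id.zmuo.oulu.fi/",
    "herbo": "http://id.herb.oulu.fi/",
}


def build_identifier(domain_abbr: str, namespace: str, object_id: str) -> str:
    """Return the full HTTP URI for a domain abbreviation, namespace and object ID."""
    base = DOMAINS[domain_abbr.rstrip(":")]  # accepts "luomus" as well as "luomus:"
    return f"{base}{namespace}.{object_id}"


print(build_identifier("luomus:", "GP", "92636"))  # http://id.luomus.fi/GP.92636
```

An unknown abbreviation raises a KeyError, which is deliberate in this sketch: silently guessing a domain would produce an identifier that does not resolve.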

Creating identifiers

When generating and creating identifiers, the most important thing is to make sure that each identifier is unique and that there are no errors (for example, a different identifier written on the label than in the database). Identifiers can be created in two ways in Kotka.

A) Manually

Each user or user group needs its own namespace to be able to generate identifiers for specimens. Namespaces are maintained here: https://triplestore.luomus.fi/namespaces. The person in charge of a namespace is responsible for keeping the numbering of specimens unique within it. This way of generating identifiers is useful when specimens are labelled before the data is entered into Kotka.

1. Get a namespace identifier for yourself from Kotka administrators (kotka(at)luomus.fi). Usually this consists of 2 to 3 capital letters. When requesting the namespace identifier, tell whether it is going to be used for
- zoological, botanical, palaeontological or microbial specimens, botanic garden accessions, or all of these
- identifiers under id.luomus.fi, tun.fi, mus.utu.fi, id.herb.oulu.fi or id.zmuo.oulu.fi.
2. Write this namespace identifier in the Namespace ID field (either on the Excel sheet or on the entry form).
- NOTE: If you are making identifiers under many domains, prefix the namespace identifier with one of these: "luomus:", "tun:", "utu:", "herbo:" (Oulu herbarium) or "zmuo:" (Oulu zoological)
3. Give each specimen an object identifier, which is usually a running number starting from 1. Write this in the Object ID field (either on the Excel sheet or on the entry form).

It’s your responsibility to make sure that the same running number is not given to several specimens. When you import the Excel file or save the entry form, Kotka checks the identifiers and prevents saving several specimens with the same number.
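
Before importing, it can help to check a sheet for repeated numbers yourself. The following Python sketch is an illustration only, not Kotka's own validation: the function name `find_duplicates` is hypothetical, and the rows are assumed to have already been read from the sheet as (Namespace ID, Object ID) pairs. It checks only the rows it is given, not the identifiers already stored in Kotka.

```python
from collections import Counter


def find_duplicates(rows):
    """Return the (namespace, object_id) pairs that occur more than once."""
    counts = Counter((ns.strip(), str(obj).strip()) for ns, obj in rows)
    return sorted(pair for pair, n in counts.items() if n > 1)


# Example rows as they might appear in the Namespace ID and Object ID columns.
rows = [("GP", 1), ("GP", 2), ("GP", 2), ("GV", 2)]
print(find_duplicates(rows))  # [('GP', '2')]
```

Note that GV.2 is not reported: the same running number may appear in different namespaces, because uniqueness is only required within a namespace.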

Responsibility for a namespace ID can be transferred to someone else by informing Kotka admins (kotka(at)luomus.fi) of the change.

Examples:

http://id.luomus.fi/GV.123
http://id.luomus.fi/GP.123

B) Automatically when data is saved using entry form

When creating a new specimen using Kotka’s entry form, if you leave the Namespace ID and Object ID fields empty, Kotka will automatically generate a unique identifier in the namespace HT. In most cases you should let Kotka create the identifier automatically. Kotka takes care of uniqueness; the responsibility of the person digitising is to attach the correct label/identifier to the correct specimen.

Example: http://id.luomus.fi/HT.5181

Note: IDs can't be generated automatically in Excel import, because it would then be too easy to enter the same data into Kotka more than once with different identifiers. If a specimen gets a new identifier, Kotka can't check whether the same specimen already exists in Kotka. Identical or similar specimen data can't be used for validation either, as there may be genuine duplicates.

What if specimen already has an identifier?

If the specimen being digitized already has an identifier other than an HTTP URI, it must be written into the Additional IDs field or the Original Specimen ID (original catalogue number) field. Then a new HTTP URI identifier is created for the specimen as described above.

If printing new labels for the specimens would be too much work, the old identifier could be used as the object identifier part of the new identifier. However, please contact Kotka administrators before giving specimens identifiers of this kind, as such cases are considered individually. Keeping the old identifier as part of the new identifier can cause several problems and side effects.

If you are using Luomus Botanical Museum’s H-number as the object identifier, use “HA” as the namespace identifier.
E.g. http://id.luomus.fi/HA.H0003706

Using specimen identifiers

When a specimen is referred to (e.g. in an article, in GBIF or in a specimen list), the full, official HTTP URI identifier should be used, not only its latter part: not just JA.123 but http://id.luomus.fi/JA.123. An identifier can be shortened if space requires it, but each label should always carry the full identifier as text and not just as a barcode or QR code. Shortening identifiers can lead to:

The specimen being mixed up with other specimens
The identifier being unusable for finding more information about the specimen in other publications
Specimen information not being linked together, or more information not being fetched automatically (e.g. through GBIF, FinBIF, BOLD etc.)

Abbreviating an identifier is like leaving the postal code and city out of a street address and expecting the reader to know where "Keskustie 13" is. An experienced professional can work it out from the context, but other users and computers can't.

Publications have different ways of printing and using identifiers. If the full URI takes up too much space in the publication itself, some add the identifiers to supplementary materials and some embed them in the article as links, for example.

Original identifiers have traditionally been and are sometimes still used when referring to specimens. This is not recommended, as old identifiers are not always unique and are rarely machine readable or understandable to non-experts.

Notes to admins

Every user can have their default domain prefix set into MA.defaultQNamePrefix on Triplestore. If none is set, the default is "luomus".

If a user prefixes the Namespace ID with a domain abbreviation when saving specimens, the prefix overrides MA.defaultQNamePrefix.
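
The precedence described above (explicit prefix first, then the user's default, then "luomus") can be sketched in Python as follows. This is an illustration under stated assumptions, not Kotka's implementation: the function name `resolve_domain_prefix` is hypothetical, and the `user_default` parameter stands in for the value stored in MA.defaultQNamePrefix.

```python
def resolve_domain_prefix(namespace_field, user_default=None):
    """Return (domain prefix, namespace) for a Namespace ID field value."""
    value = namespace_field.strip()
    prefix, sep, namespace = value.partition(":")
    if sep:
        # An explicit prefix such as "utu:AB" overrides the user's default.
        return prefix, namespace
    # No explicit prefix: use the user's default, or "luomus" if none is set.
    return user_default or "luomus", value


print(resolve_domain_prefix("utu:AB", user_default="tun"))  # ('utu', 'AB')
print(resolve_domain_prefix("AB", user_default="tun"))      # ('tun', 'AB')
print(resolve_domain_prefix("AB"))                          # ('luomus', 'AB')
```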

Namespaces are maintained in the Triplestore editor.