Entering collection metadata
Page in Finnish: Aineistojen metatiedon tallennus
The metadata (descriptive data) of Finland’s natural history collections and data is recorded in Kotka. These instructions provide guidelines on how to record different types of collections, data etc. The instructions can and should be updated based on what proves useful in practice. Improving metadata quality is a continuous process.
When entering collection metadata, keep in mind who uses the data and for what purpose. The metadata of collections and other data is maintained because:
- Collection owners and outside researchers should know what kind of data exists: name, description, contents (taxa, time, place), how to use it (access rights) and who to contact for more information (people in charge). Information about the data advances its utilisation and impact.
- The publicity law requires that the public be informed about the data held by public authorities and about the access rights to it.
- The information systems of the Finnish Biodiversity Information Facility (FinBIF) need information about the features and owners of data in order to present the data correctly to the public, researchers and officials in the laji.fi portal.
- The FinBIF Restricted data requests portal (Aineistopyyntöjärjestelmä PYHA) uses the metadata, for example, to relay data requests to the correct person responsible.
- Sharing data with other portals and data aggregators, like GBIF, requires up-to-date metadata to accompany the shared data.
How are the collections grouped/categorized
The principle in Kotka is that every specimen belongs to exactly one collection. This is why you should not create collections that partially overlap. For example, do not create collections called “Finnish bugs” and “Finnish database of creepy crawlies” if the same specimens could belong to both of them.
You should avoid creating a collection based on an information system. Instead, create a collection “Finnish bugs” and state in its metadata that “80% of specimens have been saved to Finnish database of creepy crawlies”.
Because the collections are arranged hierarchically, you should always aim to place the specimens in the collection at the lowest possible level of the hierarchy. E.g. if the collection hierarchy is as follows:
- 1) Museum of Taka Hikiä collections
  - 2) Zoological collections of Taka Hikiä museum
    - 3) Insect collections of Taka Hikiä museum
      - 4) Lepidoptera collection of Taka Hikiä museum
...then the Finnish bug specimens should not be attached to the higher collections (1-3); instead, a new “Bug collection” should be created under the insect collections.
In the lower-level collections you can record information that specifies or overrides the information of the higher collections. If the information (e.g. access rights or person responsible) conflicts between the higher and lower collections, the information of the lower collection is used.
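The override rule can be sketched in code. The following is a minimal illustration only, not how Kotka implements it: the `Collection` class, the `effective_value` function and the metadata keys are hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Collection:
    """Hypothetical model of a collection in a hierarchy (not Kotka's data model)."""
    name: str
    parent: Optional["Collection"] = None
    # Only the values recorded on this collection itself.
    metadata: dict = field(default_factory=dict)


def effective_value(collection, key):
    """Return the value recorded closest to `collection`, walking up the hierarchy."""
    current = collection
    while current is not None:
        if key in current.metadata:
            return current.metadata[key]
        current = current.parent
    return None  # not recorded anywhere in the chain


museum = Collection(
    "Museum of Taka Hikiä collections",
    metadata={"person_responsible": "Curator, Karen", "license": "CC-BY"},
)
insects = Collection("Insect collections of Taka Hikiä museum", parent=museum)
bugs = Collection(
    "Bug collection",
    parent=insects,
    metadata={"person_responsible": "Meikäläinen, Matti"},
)

# The lower collection overrides the person responsible but inherits the license.
print(effective_value(bugs, "person_responsible"))
print(effective_value(bugs, "license"))
```

In this sketch a value only needs to be recorded once, at the highest level where it applies, and changed lower down only where it actually differs.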
You should create a separate collection for a set of specimens that fulfills at least one of the following:
- It has been historically and continuously treated as a collection also outside the organisation, e.g. in scientific publications.
- It has its own official and established abbreviation or some other identifier.
- It is managed by its own team (e.g. no collections that are managed by several teams).
- It is preserved permanently and distinctly apart from other specimens (e.g. in its own storage room).
- The specimens have a numbering system of their own (e.g. if two sets have used overlapping numbering, you should organize them into two separate collections).
- It has been organized in a clearly different way compared to other specimen sets.
- It has been more or less permanently recorded in a different information system. (However, please don't turn an information system itself into a collection, but rather the data in the information system.)
- Its access rights/restrictions clearly differ from others. (Note however that single specimens can have access restrictions, so e.g. endangered species should not be organized as their own collection.)
You should keep specimens together in the same collection, if:
- Recording and maintaining separate collections is too laborious (e.g. if a person in charge is changed, their information shouldn't have to be changed to the metadata of ten different collections).
- Specimen sets are stored in the same place (e.g. sets donated to the museum that will be merged into the museum's collection should be recorded to that collection, and the donation can be recorded for example as a tag).
- The boundary between collections is unstable or fuzzy, and it's not clear to which collection each specimen should go (e.g. does a possible type specimen go to the type collection or the general collection?).
- The delimitation of the collection changes over time (e.g. don't create a collection for endangered species).
- The only reason to compile a collection is to be able to retrieve the specimen data easily from the database (for this you should use tags).
Besides collections, specimens can be sorted by assigning tags to them. A specimen can have several different tags, e.g. "The donated collection of Matti Meikäläinen" or "FiRI-project digitized specimens". Specimens can also be searched based on other features like country, order or type specimen status, so collections should not be created just for search purposes.
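The difference between collections and tags can be illustrated with a small data model. This is only a sketch under stated assumptions: the `Specimen` class, the `with_tag` function and the specimen identifiers are hypothetical and do not reflect Kotka's actual structures.

```python
from dataclasses import dataclass, field


@dataclass
class Specimen:
    """Hypothetical specimen record used for illustration only."""
    identifier: str
    collection: str                          # exactly one collection per specimen
    tags: set = field(default_factory=set)   # any number of tags


def with_tag(specimens, tag):
    """Retrieve specimens by tag, without needing a collection for the purpose."""
    return [s for s in specimens if tag in s.tags]


specimens = [
    Specimen("EX.1", "Bug collection",
             {"The donated collection of Matti Meikäläinen",
              "FiRI-project digitized specimens"}),
    Specimen("EX.2", "Bug collection", {"FiRI-project digitized specimens"}),
    Specimen("EX.3", "Lepidoptera collection of Taka Hikiä museum"),
]

donated = with_tag(specimens, "The donated collection of Matti Meikäläinen")
print([s.identifier for s in donated])
```

Because the donation is a tag rather than a collection, the donated specimens stay in the museum's own collection and can still be retrieved as a group.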
How to enter collection metadata
Mandatory fields have been marked purple.
Basic information
Owner of record: Organisation that owns the metadata record and can edit the information.
Name of the collection in Finnish and English (Swedish optional): A descriptive, distinctive name that separates the collection from other possible collections. The name should be understandable to others as well as to the team maintaining the collection. (Thus the collection name cannot be "Northern Fennoscandia", unless it actually consists of a collection of Northern Fennoscandias. "Lichens of Northern Fennoscandia" or something similar would be more correct.) Note that Kotka records describe the collections themselves, so this field should contain the name of the collection and not the name of a database that describes a collection.
Even though the collection is a part of other collections higher in the hierarchy, the name of the organisation holding the collection should also be added to the names of the lower-level collections to keep the names distinctive. For example, "Lichens of Fennoscandia of Taka Hikiä natural history museum" is distinct from other "Lichens of Fennoscandia" collections in other organisations using Kotka.
Collection code: Use this only if the collection already has an official and established abbreviation. New ones don't need to be made up.
In laji.fi, the collection code is automatically added to the end of the collection name in brackets, if it has been filled in in the Kotka metadata.
Collection Type: Whether it is a museum specimen collection, a living garden collection or something else.
Description: The Finnish name and description need to be understandable to the public. The English name and description can be written from a more scientific point of view, e.g. by using established professional terms, if a description in general terms doesn't work.
Is part of: If the collection is part of another collection, choose the parent collection here.
(Publisher short name: This is an admin field, only visible for admin users. Used when sharing data to the public.)
Coverage and methods
Taxonomic coverage: Give the lowest possible taxon rank name, e.g. family or order. You can enter e.g. "Biota" as coverage, if the data includes observations or specimens of many kinds of taxa.
Temporal coverage: Give simple, machine-readable years in the form "beginning year - ending year" (e.g. "1860 - 1910"). If there's no specific information, give an educated guess. Here you don't have to explain that the data is not necessarily very specific. If the collection accumulation is ongoing, leave the ending year open (e.g. "1970 - ").
Geographic coverage: Give a name that most accurately describes the smallest possible area that the collection covers. Preferably use terms that are used in the collection or that are contemporary with it (e.g. for a collection whose temporal coverage is "1950 - 1970", the geographic coverage could be "the Soviet Union"). If the collection includes observations/specimens from around the world, the geographic coverage could be "World".
Coverage basis: Give a concise definition for the collection, if not apparent from the collection name. For example "Winter birds of Finland".
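As an illustration of what "machine readable" means here, the sketch below parses a temporal coverage value. It is not part of Kotka: the function name `parse_temporal_coverage` is hypothetical, and the assumption that years are written with four digits is ours.

```python
import re
from typing import Optional, Tuple

# Assumption: four-digit years separated by a hyphen; the ending year may be left open.
_COVERAGE = re.compile(r"^\s*(\d{4})\s*-\s*(\d{4})?\s*$")


def parse_temporal_coverage(text: str) -> Tuple[int, Optional[int]]:
    """Return (beginning year, ending year); the ending year is None if left open."""
    match = _COVERAGE.match(text)
    if match is None:
        raise ValueError(f"not in the form 'beginning year - ending year': {text!r}")
    start = int(match.group(1))
    end = int(match.group(2)) if match.group(2) else None
    if end is not None and end < start:
        raise ValueError(f"ending year precedes beginning year: {text!r}")
    return start, end


print(parse_temporal_coverage("1860 - 1910"))  # closed range
print(parse_temporal_coverage("1970 - "))      # ongoing accumulation
```

A value such as "around the 1900s" would be rejected by this check, which is why explanations belong in the notes rather than in this field.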
Methods: Different kinds of standardized methods used when creating this collection, at any stage of the process. E.g. sampling method, census method, instruments, tools, software.
Copyrights and permissions for use
Publisher name (en): Give the name of the organisation or institution publishing the data (used when data is published secondarily outside the primary system). If this is not filled in for the collection or one of its parents, FinBIF is shown as the publisher of the collection in data sharing (CETAF RDF). (This field used to be called copyright owner.)
License for use: The license under which the data can be used. All data is open data according to FinBIF's data policy. The license is usually Creative Commons Attribution 4.0. An exception can be made if an agreement on the use of another license has been made with FinBIF. See FinBIF's data policy for further information.
The options to choose from are:
- Creative Commons Zero (CC0): Use is free and the data doesn't have to be cited, which makes it easier to use large amounts of data.
- Creative Commons Attribution (CC-BY): Use is free, but the data has to be cited in a manner the data owner has defined.
- Public domain: This concerns data whose copyright no longer exists, so the use of a license can no longer be required.
- All rights reserved: This is used in a few special cases, when there's no agreement with the owner on the use of licenses. It must be considered whether to include such data in the FinBIF systems. (For example: specimens that are owned by other museums or private people and that were determined in Luomus and saved to Kotka at the time.)
More information about Creative Commons licenses.
Secure level: Determines how tightly the data will be concealed, unless they have been concealed on other grounds such as the list of sensitive species or by concealing single specimens/observations. The levels are compatible with the security levels of sensitive observations. No security level is used by default.
The choice has to be justified in the "Basis for concealment or quarantine" field. The basis for concealment has to concern the whole collection. "The data includes some observations of endangered species" is not enough to conceal the entire collection.
Quarantine in years: This defines the quarantine time in years. During quarantine the data is treated as if it were secured to the 100 km × 100 km level. Public authorities (viranomaiset) can see the full, unconcealed observations in the public authority portal. Others can make a data request (aineistopyyntö) that the data owner approves or rejects.
The choice has to be justified in the "Basis for concealment or quarantine" field. The data policy of Luomus restricts the quarantine time to a maximum of four years.
Basis for concealment or quarantine: The basis for concealment and/or quarantine (e.g. the name of a research project), or the basis for possible exceptions to the data policy. Must be filled in if Secure level or Quarantine is used.
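The rules for these three fields can be summarised as a simple check. The sketch below is an assumption-laden illustration, not Kotka's validation logic: the function `check_concealment`, its parameter names and the placeholder secure level value are hypothetical, and the four-year limit is the one stated above for the Luomus data policy.

```python
from typing import List, Optional

MAX_QUARANTINE_YEARS = 4  # maximum stated for the data policy of Luomus


def check_concealment(secure_level: Optional[str],
                      quarantine_years: Optional[int],
                      basis: str) -> List[str]:
    """Return a list of problems; an empty list means the fields are consistent."""
    problems = []
    restricted = bool(secure_level) or bool(quarantine_years)
    if restricted and not basis.strip():
        problems.append("Basis for concealment or quarantine must be filled in.")
    if quarantine_years is not None and quarantine_years > MAX_QUARANTINE_YEARS:
        problems.append("Quarantine exceeds the maximum of four years.")
    return problems


# No restriction used: nothing to justify.
print(check_concealment(None, None, ""))
# Quarantine without a basis: one problem reported.
print(check_concealment(None, 2, ""))
# "example level" is a placeholder, not a real Kotka secure level value.
print(check_concealment("example level", None, "Ongoing research project"))
```

Such a check cannot judge whether the basis is adequate; as noted above, a basis that concerns only some observations is not sufficient for concealing the whole collection.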
Special terms for data use: Free text description for any special terms for data use, for example restrictions for commercial use etc.
Accessibility to public: This field is used for garden areas. It tells whether a garden area can be accessed and its plants viewed by the public or not.
(Allowed for DW statistics: This is an admin field, only visible for admin users. Tells whether the collection can be used for data warehouse/statistics endpoints.)
(Download request handler: This is an admin field, only visible for admin users. Admins fill in the MA-identifier of the person who is responsible for handling restricted data requests to this collection. Typically the same person as Person responsible. If not filled in, it is inherited from the parent collection. Repeatable field.)
(Collection DOI from GBIF: This is an admin field, only visible for admin users. DOI received from GBIF after publishing the data in GBIF portal.)
(Share to GBIF: This is an admin field, only visible for admin users. Tells whether the data can be shared to GBIF or not. A given collection id means that the data is shared under that collection.)
Size
Size (approx.): Estimate of the number of specimens, observations or records in the collection, as a number. Explanations can be added to the notes field. Used to calculate the Luomus digitisation statistics (digistat.luomus.fi).
% digitised (approx.): Estimate of the proportion of the collection that is digitised/databased in a system or a file. Explanations can be added to the notes field.
Amount of type specimens (approx.): Estimate of the number of type specimens in the collection. Explanations can be added to the notes field.
Additional information
Location of data and backups: Here it is most useful to give the name of the information system that contains the digital data (e.g. Kotka, Vihko). If the data is located on a PC hard drive, the description is only useful if the data can be found based on it. E.g. "table on my floppy disc" isn't useful, but "Excel file collection.xls in Karen Curator's file folder in network drive" is already a lot more useful.
Location of collection: The physical location of the collection specimens. The building or room number should be accurate enough.
Person responsible: The person who is responsible for the collection and has the power to decide, for example, on the collection's accumulation, preservation and information publication. In the form "Lastname, Firstname".
Contact email: Email of the contact person. It can be a personal or a general address through which the contact person can be reached. Enter a working address that includes the @ character.
Citation recommendation: Recommendation for the kind of citation that ought to be used when citing the data in e.g. a scientific paper. This will be shown to the data users.
Language: The language the data is mainly written in.
URL: If there's more information about the collection online, add the address here. (e.g. Winter bird monitoring website, where methods are described and linked to results, or the history of Mannerheim's collection.)
Collection quality: Data quality assessment for the collection using three categories. More information: Collection quality rating
Data Quality description: Free text description for the data quality, reasons and explanations, known errors etc.
Notes about the data: Diary style notes about the data belonging to the collection, for example corrections done, etc. Notice that there is a separate field for notes about this metadata.
Metadata on this metadata
Status of this metadata: Filled in manually for now (not yet automated), according to how much information has been recorded.
- preliminary - needs improvement = only mandatory fields have been filled in
- satisfactory - could be improved = in addition to mandatory fields, some more fields have been filled in
- comprehensive = most fields are filled in
- hidden - this metadata should not be used = this option is chosen when the metadata is not in use and should not be publicly visible, but needs to be retained and can't be entirely deleted.
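Since the status is chosen by hand, the following sketch only illustrates how the first three options relate to the number of filled fields. Everything in it is an assumption made for this example: the function `suggest_status`, the field names, and especially the reading of "most fields" as more than half of the optional fields. The "hidden" option is a deliberate manual choice and is not derived here.

```python
def suggest_status(record, mandatory, optional):
    """Suggest a status from filled fields (illustrative thresholds, not Kotka's)."""
    def filled(name):
        value = record.get(name)
        return value is not None and str(value).strip() != ""

    if not all(filled(name) for name in mandatory):
        return None  # mandatory fields missing: none of the statuses applies yet
    extra = sum(1 for name in optional if filled(name))
    if extra == 0:
        return "preliminary"
    # Assumption: "most fields" is read as more than half of the optional fields.
    if extra > len(optional) / 2:
        return "comprehensive"
    return "satisfactory"


# Hypothetical field names for illustration only.
mandatory = ["name", "owner"]
optional = ["description", "url", "methods"]
record = {"name": "Bug collection", "owner": "Taka Hikiä museum",
          "url": "https://example.org"}
print(suggest_status(record, mandatory, optional))
```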
Notes: Any additional information to any of the fields on the metadata form. Note that there is a separate notes field for notes about the collection data itself.