Data transfer checklist

Last modified by Anniina Kuusijärvi on 2025/02/04 06:58

Page in Finnish: Datansiirron checklist

This is a checklist of things that need to be done or decided when new data is being cleaned and imported into Kotka. It can, for example, be printed and filled in as the work progresses.

You should first agree on responsibilities together with the customer. There are more instructions about these in Starting to use Kotka.

Check that the data was correctly transferred to OpenRefine

Save the original data to a network drive or other place where you intend to manage versions

Check that the ä's, ö's, gender symbols, accents, Cyrillic letters, dates etc. are displayed correctly

Check that any carriage returns/new lines (rivinvaihto) in the data do not split one specimen across several rows

Check that the file includes all data rows and none are left out
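The row and line-break checks above can also be run outside OpenRefine. The following is a minimal sketch, assuming the data is a delimited text file with a header row and quoted fields; the function name `check_csv_text` and the returned tuple are hypothetical, not part of Kotka or OpenRefine.

```python
import csv
import io


def check_csv_text(text, delimiter=","):
    """Count data records and flag records that need a closer look.

    Returns (record_count, multiline_records, damaged_records), where the
    two lists hold 1-based record numbers (header excluded).
    """
    # newline="" keeps line breaks inside quoted fields intact for the parser.
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    next(reader, None)  # skip the header row
    record_count = 0
    multiline_records = []
    damaged_records = []
    for row in reader:
        record_count += 1
        # A line break inside a cell is fine here, but it may split the
        # specimen over several rows in tools that read line by line.
        if any("\n" in cell or "\r" in cell for cell in row):
            multiline_records.append(record_count)
        # U+FFFD is the replacement character left behind by a failed decode.
        if any("\ufffd" in cell for cell in row):
            damaged_records.append(record_count)
    return record_count, multiline_records, damaged_records
```

For a real file, the text could be read with `open(path, encoding="utf-8", newline="").read()`; the record count can then be compared with the row count shown in OpenRefine and in the source system.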

General

Create a collection with the correct organisation as the owner of record

 

Enter this CollectionID to the specimen data

Create a tag for the data transformation process (e.g. "Data transferred from Selma to Kotka in November 2015")

 

Enter this tag ID to the specimen data

Create a tag for the source system (e.g. "Data from Selma")

 

Enter this tag ID to the specimen data

Create a tag for the original collection, if it is not also recorded as a collection in Kotka (e.g. "Robert von Bonsdorff's lepidoptera collection")

 

Enter this tag ID to the specimen data

Create a column for the record basis and fill it in accordingly (it is often missing from the original data)

For data digitised by Digitarium

 

In the field MYEditor: Digitarium, the names of the digitisers and any abbreviations for their job titles

Enter the tag GX.270 for the specimens ("Specimens digitized by digitarium")

In the field MYEntered: the date when the specimen was digitised at Digitarium

Copy the original data to verbatim fields, at least if the data is going to be interpreted or cleaned a lot

 

Verbatim leg

Verbatim date

Verbatim coordinates

Verbatim locality

Taxon verbatim

Det verbatim

Verbatim labels

Store the data also in the actual data fields

Check Triplestore for the MA-identifiers of the most important creators and editors of the data, and create identifiers for those who do not have one yet (at least for those still working at the organisation)

 

Enter the MA-identifiers in the fields MZCreator and MZEditor

Note that if these fields are filled in, admins need to import the data

Pick the original creation dates from the data and decide what to do if they are missing from the original data (for example, will you use the last day of the previous year so that the import does not affect the statistics for the current year?)

 

Enter these as datetimes in the field MZDateCreated (for example 2018-12-31T00:00:00+0200)

MZDateEdited is filled in automatically with the datetime of the import

Note that if this field is filled in, admins need to import the data
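The fallback value for MZDateCreated mentioned above (the last day of the year before the import) can be generated in the same datetime format as the example. This is a minimal sketch; the function name `fallback_date_created` is hypothetical, and the fixed UTC+2 offset is an assumption taken from the example value.

```python
from datetime import datetime, timedelta, timezone


def fallback_date_created(import_year, utc_offset_hours=2):
    """Return midnight on 31 December of the year before the import.

    The default offset of +2 hours is an assumption based on the example
    value in the checklist (Finnish winter time).
    """
    tz = timezone(timedelta(hours=utc_offset_hours))
    last_day = datetime(import_year - 1, 12, 31, tzinfo=tz)
    # %z renders the offset without a colon, e.g. +0200.
    return last_day.strftime("%Y-%m-%dT%H:%M:%S%z")
```

For an import carried out in 2019, `fallback_date_created(2019)` gives `2018-12-31T00:00:00+0200`, which matches the example above.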

Checks

Check whether the data contains information on sensitive species (FinBIF taxonomy database takes care of the concealments for Finnish sensitive species but not for foreign sensitive species, whose exact collection locality should not be revealed)

 

Use the MZPublicityRestrictions field to conceal the exact locality information of these foreign specimens

Empty fields that contain no actual data (e.g. "tyhjä", "-", "unknown", "null", a space, etc.)

Check the use of question marks and move the question marks that indicate uncertainty to the beginning of the field (e.g. to municipality "?Helsinki")

Check the use of square brackets and leave interpretations to verbatim fields, not actual data fields.

Check the use of brackets and remove synonyms

Transform dates

 

Date begin to the format dd.mm.yyyy (if no date end is filled in, this can also be mm.yyyy or yyyy, and Kotka automatically fills in the month and year, or the year, to both begin and end)

Date end to the format dd.mm.yyyy

Det date to the format dd.mm.yyyy or yyyy
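The date formats above can be produced with a small conversion function. This is a minimal sketch, assuming the source dates are ISO-like (`yyyy-mm-dd`, `yyyy-mm` or `yyyy`); the function name `to_dotted_date` is hypothetical, and other source formats would need their own parsing.

```python
import re
from datetime import date

# Assumed source format: yyyy, yyyy-mm or yyyy-mm-dd.
ISO_LIKE = re.compile(r"(\d{4})(?:-(\d{1,2})(?:-(\d{1,2}))?)?")


def to_dotted_date(value):
    """Convert an ISO-like date to dd.mm.yyyy, mm.yyyy or yyyy.

    Returns None when the value cannot be converted safely, so that it
    can be reviewed by hand instead of being guessed.
    """
    match = ISO_LIKE.fullmatch(value.strip())
    if match is None:
        return None
    year, month, day = match.groups()
    if month is None:
        return year
    if not 1 <= int(month) <= 12:
        return None
    if day is None:
        return f"{int(month):02d}.{year}"
    try:
        date(int(year), int(month), int(day))  # rejects e.g. 30 February
    except ValueError:
        return None
    return f"{int(day):02d}.{int(month):02d}.{year}"
```

Note that the partial results are only valid where the checklist allows them: `mm.yyyy` or `yyyy` for date begin when date end is empty, and `yyyy` for det date.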

Standardise fields

 

Biogeographical province: Finnish abbreviation for Finnish provinces and Latin abbreviation for provinces on the Russian side

Country name?

Municipality name?

From repeating fields, remove semicolons from places where the value is not supposed to be split into separate fields

Person names: transform to the format "Lastname, Firstname; Lastname, Firstname"

Spell out names if necessary and supplement (for example Matti & Maija Meikäläinen → Meikäläinen, Matti; Meikäläinen, Maija)

Add the leading zero to the YKJ east coordinates (if this can be done without a risk of error)
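Three of the checks above (emptying placeholder values, moving uncertainty question marks to the start of a field, and reformatting person names) can be sketched as simple functions. This is a minimal sketch under stated assumptions: the function names and the placeholder list are hypothetical, and the name logic only covers plain "Firstname Lastname" values joined with "&". Multi-word surnames (e.g. "von Bonsdorff") and abbreviated names still need manual handling.

```python
import re

# Assumed placeholder list, based on the examples in the checklist.
PLACEHOLDERS = {"", "-", "tyhjä", "unknown", "null"}


def empty_placeholder(value):
    """Return '' for placeholder-only values, otherwise the trimmed value."""
    stripped = value.strip()
    return "" if stripped.lower() in PLACEHOLDERS else stripped


def move_question_mark(value):
    """Move a trailing question mark to the front: 'Helsinki?' -> '?Helsinki'."""
    stripped = value.strip()
    if stripped.endswith("?") and not stripped.startswith("?"):
        return "?" + stripped.rstrip("?").rstrip()
    return stripped


def normalise_people(value):
    """Rewrite 'First & First Last' as 'Last, First; Last, First'.

    Returns None when no surname can be found, so the value can be
    reviewed by hand.
    """
    stripped = value.strip()
    if "," in stripped or ";" in stripped:
        return stripped  # assumed to be in the target format already
    parts = [p.split() for p in re.split(r"\s*&\s*", stripped) if p.strip()]
    if not parts or len(parts[-1]) < 2:
        return None
    shared_surname = parts[-1][-1]
    people = []
    for tokens in parts:
        if len(tokens) == 1:
            # A lone first name borrows the surname of the last person.
            first, last = tokens[0], shared_surname
        else:
            first, last = " ".join(tokens[:-1]), tokens[-1]
        people.append(f"{last}, {first}")
    return "; ".join(people)
```

Question marks in the middle of a value are left untouched on purpose, because the checklist asks for their use to be checked before anything is moved.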

Data transfer

Get approval from the data owner as agreed

Use the correct organisation MOS-identifier as the owner of record

More information in other documents: