Data transfer checklist
Page in Finnish: Datansiirron checklist
This is a checklist of things that need to be done or decided when new data is cleaned and imported into Kotka. You can, for example, print it out and fill it in as the work progresses.
You should first agree on responsibilities with the customer. More instructions on this are given in Starting to use Kotka.
Check that the data was correctly transferred to OpenRefine
Save the original data on a network drive or in another location where you intend to manage versions
Check that characters such as ä and ö, gender symbols, accented and Cyrillic letters, dates etc. are displayed correctly
Check that any carriage returns or line breaks within the data do not split a single specimen across several rows
Check that the file includes all data rows and none are left out
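The checks above are easiest to do by eye in OpenRefine, but some of them can also be scripted. The following Python sketch is only an illustration, not part of the Kotka or OpenRefine tooling: it assumes the source data has been exported as UTF-8 CSV text with a header row and that you know the expected row count from the source system. It counts the rows, lists fields that contain line breaks, and flags values with character sequences typical of UTF-8 text decoded with the wrong encoding (e.g. "Ã¤" instead of "ä"). The function name `check_export` and the marker list are hypothetical.

```python
import csv
import io

# Sequences that typically appear when UTF-8 text has been decoded with the
# wrong encoding (e.g. "ä" shown as "Ã¤"), plus the Unicode replacement character.
MOJIBAKE_MARKERS = ("Ã¤", "Ã¶", "Ã¥", "â€", "\ufffd")


def check_export(text: str, expected_rows: int) -> dict:
    """Run basic sanity checks on CSV text exported from the source system."""
    rows = list(csv.DictReader(io.StringIO(text, newline="")))
    multiline = []   # (row number, column) where a value contains a line break
    suspicious = []  # (row number, column) where a value looks mis-decoded
    for number, row in enumerate(rows, start=1):
        for column, value in row.items():
            # Skip surplus fields (key None) and missing fields (value None).
            if column is None or value is None:
                continue
            if "\n" in value or "\r" in value:
                multiline.append((number, column))
            if any(marker in value for marker in MOJIBAKE_MARKERS):
                suspicious.append((number, column))
    return {
        "row_count": len(rows),
        "row_count_ok": len(rows) == expected_rows,
        "multiline_fields": multiline,
        "suspicious_encoding": suspicious,
    }
```

A line break inside a quoted field is still read as one row here and is only reported; if the export does not quote such fields, the specimen is split instead and the row count check fails.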
General
Create a collection with the correct organisation as the owner of record
Enter this CollectionID into the specimen data
Create a tag for the data transfer process (e.g. "Data transferred from Selma to Kotka in November 2015")
Enter this tag ID into the specimen data
Create a tag for the source system (e.g. "Data from Selma")
Enter this tag ID into the specimen data
Create a tag for the original collection if it is not also recorded as a collection in Kotka (e.g. "Robert von Bonsdorff's lepidoptera collection")
Enter this tag ID into the specimen data
Create a column for the record basis and fill it in accordingly (it is often missing from the original data)
For data digitised by Digitarium
Store the data in the actual data fields as well
Check Triplestore for the MA-identifiers of the most important creators and editors of the data, and create identifiers for those who do not have one yet (at least for those still working at the organisation)
Pick the original creation dates from the data and decide what to do if they are missing (for example, will you use the last day of the previous year so that the import does not affect the statistics for the current year?)
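If the specimen data is prepared as rows before import, the constant columns and the rule for missing creation dates can be applied in one pass. The Python sketch below is hypothetical: the column names (`collectionID`, `transferTag`, `sourceTag`, `originalCollectionTag`, `recordBasis`, `dateCreated`) and the placeholder values are assumptions, not Kotka's actual import template. It also writes the same record basis on every row, which only fits if all specimens in the transfer share it. Missing creation dates are set to the last day of the year before the import, following the example in the checklist item.

```python
from datetime import date

# HYPOTHETICAL column names and placeholder values: replace them with the real
# CollectionID, tag IDs and record basis used for this transfer in Kotka.
CONSTANT_COLUMNS = {
    "collectionID": "COLLECTION-ID-PLACEHOLDER",
    "transferTag": "TRANSFER-TAG-ID-PLACEHOLDER",
    "sourceTag": "SOURCE-TAG-ID-PLACEHOLDER",
    "originalCollectionTag": "ORIGINAL-COLLECTION-TAG-ID-PLACEHOLDER",
    "recordBasis": "RECORD-BASIS-PLACEHOLDER",
}


def default_creation_date(value: str, import_year: int) -> str:
    """Keep an existing creation date; otherwise use 31 December of the previous year."""
    value = value.strip()
    if value:
        return value
    return date(import_year - 1, 12, 31).isoformat()


def prepare_rows(rows: list[dict], constants: dict, date_column: str, import_year: int) -> list[dict]:
    """Return copies of the rows with constant columns filled in and missing dates defaulted."""
    prepared = []
    for row in rows:
        new_row = dict(row)        # copy so the original rows stay untouched
        new_row.update(constants)  # same value for every specimen in this transfer
        new_row[date_column] = default_creation_date(row.get(date_column) or "", import_year)
        prepared.append(new_row)
    return prepared
```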
Checks
Check whether the data contains information on sensitive species. The FinBIF taxonomy database handles the concealment of Finnish sensitive species but not of foreign sensitive species, whose exact collection localities should not be revealed
Use the MZPublicityRestrictions field to conceal the exact locality information of these foreign specimens
Empty fields that contain no actual data (e.g. "tyhjä" ("empty"), "-", "unknown", "null", a space etc.)
Check the use of question marks and move question marks that indicate uncertainty to the beginning of the field (e.g. municipality "?Helsinki")
Check the use of square brackets and keep interpretations in the verbatim fields, not in the actual data fields
Check the use of brackets and remove synonyms
In repeating fields, remove semicolons where the value is not supposed to be split into separate values
Person names: transform to the format "Lastname, Firstname; Lastname, Firstname"
Spell out and complete names where necessary (for example Matti & Maija Meikäläinen → Meikäläinen, Matti; Meikäläinen, Maija)
Add the leading zero to YKJ east coordinates (if this can be done without risk of error)
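Some of the checks above can be partly automated before the results are reviewed by hand. The following Python helpers are a hypothetical sketch, not an existing Kotka or OpenRefine feature: they clear placeholder values, move a trailing uncertainty question mark to the start of a value, and convert simple name lists such as "Matti & Maija Meikäläinen" into the "Lastname, Firstname; Lastname, Firstname" format. Anything they cannot interpret (e.g. "Matti & Maija" with no surname) is returned unchanged, and surnames with particles such as "von" are not handled correctly, so the output still needs manual review.

```python
# Placeholder values that mean "no data"; extend with whatever the source data uses.
EMPTY_MARKERS = {"tyhjä", "-", "unknown", "null"}


def clear_placeholder(value: str) -> str:
    """Return "" for whitespace-only values and known 'no data' markers."""
    stripped = value.strip()
    if stripped == "" or stripped.lower() in EMPTY_MARKERS:
        return ""
    return value


def move_uncertainty_mark(value: str) -> str:
    """Move a trailing question mark to the start: "Helsinki?" -> "?Helsinki"."""
    stripped = value.strip()
    if stripped.endswith("?") and not stripped.startswith("?"):
        return "?" + stripped.rstrip("?").rstrip()
    return value  # question marks elsewhere are left for manual review


def format_person_names(value: str) -> str:
    """Convert simple "Firstname & Firstname Lastname" lists to "Lastname, Firstname; ..."."""
    if not value.strip() or "," in value:
        return value  # empty, or already in "Lastname, Firstname" form
    parts = [part.split() for part in value.split("&") if part.strip()]
    if len(parts[-1]) < 2:
        return value  # no surname to share, so leave for manual review
    shared_last = parts[-1][-1]
    names = []
    for tokens in parts:
        if len(tokens) == 1:
            last, first = shared_last, tokens[0]  # "Matti" inherits the shared surname
        else:
            last, first = tokens[-1], " ".join(tokens[:-1])
        names.append(f"{last}, {first}")
    return "; ".join(names)
```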
Data transfer
Get approval from the data owner as agreed
Use the correct organisation's MOS-identifier as the owner of record
More information in other documents: