Data transfer checklist

Last modified by Anniina Kuusijärvi on 2024/02/12 13:43

Page in Finnish: Datansiirron checklist

This is a checklist of things that need to be done or decided when new data is being cleaned and imported into Kotka. It can, for example, be printed and filled in as the work progresses.

First agree on responsibilities with the customer. More instructions on this are in Starting to use Kotka.

Check that the data was correctly transferred to OpenRefine

#3340

Save the original data to a network drive or other place where you intend to manage versions

#3341

Check that the ä's, ö's, gender symbols, accents, Cyrillic letters, dates etc. are displayed correctly

#3342

Check that any carriage returns or line breaks (Finnish: rivinvaihto) in the data do not split one specimen across several rows

#3343

Check that the file includes all data rows and none are left out
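The checks above on character encoding, embedded line breaks and row counts can also be scripted outside OpenRefine. The following is a minimal sketch, assuming the original data is a comma-separated text file that has already been read into a string as UTF-8. The function name summarise_csv and the sample data are hypothetical and only illustrate the idea.

```python
import csv
import io


def summarise_csv(text, delimiter=","):
    """Count records and flag fields that contain embedded line breaks.

    Assumes the first record is a header row. The Unicode replacement
    character is counted as a rough sign of a wrongly decoded file.
    """
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    header = next(reader)
    records = 0
    multiline_fields = []  # (record number, column name) pairs
    for number, row in enumerate(reader, start=1):
        records += 1
        for name, value in zip(header, row):
            if "\n" in value or "\r" in value:
                multiline_fields.append((number, name))
    return {
        "records": records,
        "multiline_fields": multiline_fields,
        "replacement_chars": text.count("\ufffd"),
    }


# Hypothetical sample: the locality of record 1 contains a line break.
sample = 'id,locality\n1,"Helsinki\nKumpula"\n2,Espoo\n'
summary = summarise_csv(sample)
```

Comparing the record count with the number of rows shown in OpenRefine and in the source system gives a quick indication of whether rows were lost or split during the transfer.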

General

#3344

Create a collection with the correct organisation as the owner of record

 

#3345

Enter this CollectionID into the specimen data

#3346

Create a tag for the data transformation process (e.g. "Data transferred from Selma to Kotka in November 2015")

 

#3347

Enter this tag ID into the specimen data

#3348

Create a tag for the source system (e.g. "Data from Selma")

 

#3349

Enter this tag ID into the specimen data

#3350

Create a tag for the original collection, if it is not also recorded as a collection in Kotka (e.g. "Robert von Bonsdorff's lepidoptera collection")

 

#3351

Enter this tag ID into the specimen data

#3352

Create a column for record basis and fill it in accordingly (often missing in the original data)

#3353

For data digitised by Digitarium

 

#3354

To the field MYEditor: Digitarium, names of the digitisers and possible abbreviations for their job titles

#3355

Enter the tag GX.270 for the specimens ("Specimens digitized by digitarium")

#3356

To the field MYEntered: the date when the specimen was digitised at Digitarium

#3357

Copy the original data to verbatim fields, at least if the data is going to be interpreted or cleaned a lot

 

#3358

Verbatim leg

#3359

Verbatim date

#3360

Verbatim coordinates

#3361

Verbatim locality

#3362

Taxon verbatim

#3363

Det verbatim

#3364

Verbatim labels

#3365

Store the data also in the actual data fields

#3366

Check Triplestore for the MA-identifiers of the most important creators and editors of the data, and create identifiers for those who do not have one yet (at least for those still working at the organisation)

 

#3367

Enter the MA-identifiers into the fields MZCreator and MZEditor

#3368

Note that if these fields are filled in, admins need to import the data

#3369

Pick the original creation dates from the data and decide what to do if they are missing from the original data (for example, will you use the last day of the previous year so that the import does not affect the statistics for the current year)

 

#3370

Enter these as datetime values into the field MZDateCreated (for example 2018-12-31T00:00:00+0200)

#3371

MZDateEdited is filled in automatically to the datetime of the import

#3372

Note that if this field is filled in, admins need to import the data
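The datetime format shown for MZDateCreated can be produced with a short script. This is a minimal sketch, assuming a fixed UTC offset of +2 hours as in the example value above (Finnish summer time would be +3). The function names to_kotka_datetime and fallback_created are hypothetical, and using the last day of the previous year as the fallback is only one of the options mentioned above.

```python
from datetime import datetime, timedelta, timezone


def to_kotka_datetime(year, month, day, utc_offset_hours=2):
    """Format a date as midnight local time, e.g. 2018-12-31T00:00:00+0200.

    The fixed offset is an assumption; adjust it if the original
    timestamps were recorded in another time zone or in summer time.
    """
    tz = timezone(timedelta(hours=utc_offset_hours))
    moment = datetime(year, month, day, tzinfo=tz)
    return moment.strftime("%Y-%m-%dT%H:%M:%S%z")


def fallback_created(import_year):
    """Fallback for rows with no creation date: last day of the previous year."""
    return to_kotka_datetime(import_year - 1, 12, 31)


created = to_kotka_datetime(2018, 12, 31)
fallback = fallback_created(2019)
```

Here both calls produce the example value given for MZDateCreated, because the fallback for an import made in 2019 is the last day of 2018.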

Checks

#3373

Check whether the data contains information on sensitive species (FinBIF taxonomy database takes care of the concealments for Finnish sensitive species but not for foreign sensitive species, whose exact collection locality should not be revealed)

 

#3374

Use the MZPublicityRestrictions field to conceal the exact locality information of these foreign specimens

#3375

Empty fields that contain no data (e.g. "tyhjä", "-", "unknown", "null", a space etc.)

#3376

Check the use of question marks and move question marks that indicate uncertainty to the beginning of the field (e.g. "?Helsinki" in the municipality field)

#3377

Check the use of square brackets and leave interpretations to verbatim fields, not actual data fields.

#3378

Check the use of brackets and remove synonyms

#3379

Transform dates

 

#3380

Date begin to the format dd.mm.yyyy (if no date end is given, this can also be mm.yyyy or yyyy, and Kotka automatically fills in the month and year, or the year, for both begin and end)

#3381

Date end to the format dd.mm.yyyy

#3382

Det date to the format dd.mm.yyyy or yyyy
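The date transformations above can be sketched in a few lines. This example assumes that the original data stores dates in ISO style (yyyy-mm-dd, yyyy-mm or yyyy), which may not match your source. The function name iso_to_kotka_date is hypothetical.

```python
import re
from datetime import date

# Year, with optional month and optional day (ISO-style input is an assumption).
_ISO_DATE = re.compile(r"^(\d{4})(?:-(\d{1,2}))?(?:-(\d{1,2}))?$")


def iso_to_kotka_date(value):
    """Convert yyyy-mm-dd, yyyy-mm or yyyy to dd.mm.yyyy, mm.yyyy or yyyy."""
    match = _ISO_DATE.match(value.strip())
    if match is None:
        raise ValueError(f"Unrecognised date: {value!r}")
    year, month, day = match.groups()
    # Validate the calendar date; raises ValueError for e.g. 2015-02-30.
    date(int(year), int(month or 1), int(day or 1))
    parts = [f"{int(part):02d}" for part in (day, month) if part is not None]
    return ".".join(parts + [year])


full = iso_to_kotka_date("2015-11-03")
month_only = iso_to_kotka_date("2015-11")
year_only = iso_to_kotka_date("2015")
```

Values that do not match the assumed pattern raise an error instead of being guessed, so they can be reviewed by hand and kept in the verbatim date field.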

#3383

Standardise fields

 

#3384

Biogeographical province: Finnish abbreviation for Finnish provinces and Latin abbreviation for provinces on the Russian side

#3385

Country name?

#3386

Municipality name?

#3387

From repeating fields, remove semicolons from places where the value is not supposed to be split into separate fields

#3388

Person names: transform to the format "Lastname, Firstname; Lastname, Firstname"

#3389

Spell out names if necessary and supplement (for example Matti & Maija Meikäläinen → Meikäläinen, Matti; Meikäläinen, Maija)

#3390

Add the leading zero to the YKJ east coordinates (if this can be done without risk of error)
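Two of the checks above, emptying placeholder values and reformatting person names, lend themselves to scripting. The sketch below is illustrative only: the placeholder list comes from the examples in this checklist, the helper names are hypothetical, and the name conversion assumes that the last word of each name is the surname. That assumption fails for surnames such as "von Bonsdorff", and cases with a shared surname (Matti & Maija Meikäläinen) need to be spelled out first.

```python
# Placeholder values taken from the examples in this checklist.
PLACEHOLDERS = {"tyhjä", "-", "unknown", "null"}


def blank_placeholder(value):
    """Return an empty string for placeholder or whitespace-only values."""
    stripped = value.strip()
    if not stripped or stripped.casefold() in PLACEHOLDERS:
        return ""
    return value


def to_lastname_first(name):
    """Turn "Firstname Lastname" into "Lastname, Firstname".

    Assumes the last word is the surname; check multi-part surnames by hand.
    """
    parts = name.split()
    if len(parts) < 2:
        return name.strip()
    return f"{parts[-1]}, {' '.join(parts[:-1])}"


def format_people(names):
    """Join already separated names with semicolons for a repeating field."""
    return "; ".join(to_lastname_first(name) for name in names)


people = format_people(["Matti Meikäläinen", "Maija Meikäläinen"])
cleaned = [blank_placeholder(v) for v in ["Tyhjä", " - ", "null", "Helsinki"]]
```

Reviewing the changed values before import is still advisable, since a value such as "-" can occasionally be meaningful in the original data.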

Data transfer

#3391

Get approval from the data owner as agreed

#3392

Use the correct organisation MOS-identifier as the owner of record

More information in other documents: