Starting to use Kotka

Last modified by akuusija@helsinki_fi on 2024/02/07 06:51

Page in Finnish: Kotkan käyttöönotto

This page describes the process of transferring an existing set of data to Kotka when an organisation starts using the system.

(FinBIF = Finnish Biodiversity Information Facility/Luomus, customer = team/organisation that is going to start using Kotka)

General description of the process

  1. Meet, get to know the data and plan. Write down and agree on the responsibilities of each party.
  2. The customer takes care of its responsibilities (data cleaning, documentation and transformations). The old system is 'frozen' and data is no longer updated there.
  3. FinBIF takes care of their responsibilities (possible new features to Kotka, giving advice, data cleaning, documentation and transformations).
  4. The customer checks the data at an intermediate stage, if this was agreed.
  5. FinBIF imports the data to Kotka and archives it in its raw format into FinBIF data archive. FinBIF delivers a report to the customer about the data transfer.
  6. The customer shuts down their old system and takes an archive copy of the data to store themselves. The customer starts using Kotka.

More detailed description of the process

1) Agreeing on the data transfer

First FinBIF and the customer meet (face to face or online), get to know the data and agree how to proceed.

In the beginning it is useful if the customer delivers the data or a representative part of it to FinBIF to familiarise with. Together with the data it is good to deliver a description of the data and a documentation on the data structure: what different data fields mean, what kind of values they include, potential pitfalls and things that are difficult to understand and possible plans and wishes. FinBIF gets to know the data based on this documentation before things are progressed.

  • Name a contact person for the customer (a person who ensures, for example, that the data owner/team/organisation does not clean the data at the same time as FinBIF).
  • Find out what tools, systems and storage places the customer currently has for different types of information and which parts are replaced with Kotka and other FinBIF services. Get to know the data structure and quality.
    • Specimens
    • Transactions
    • Taxonomy
    • Other data (e.g. research projects)
  • Agree on what data is transferred and whether the whole collection/former system is transferred in one go or in smaller sections.
    • All data in one go: transferring the data takes a lot of work and time (a longer period without any system), but there is no need to run two systems in parallel
    • In smaller sections: features can be improved based on user experience and Kotka can be taken into use straight away, but two systems must be run in parallel for a while and it must be kept clear which data is primary and where.
  • Agree on general issues
    • Agree which domain is used in the URI-identifiers (id.luomus.fi, tun.fi or organisation's own domain)
    • Agree to which organisation or suborganisation the collection and the specimens are connected to, who owns the data
    • Find out what new features are needed for Kotka and how critical these are (must, should, nice to have)
      • Fields and options to fields
      • Labels (Label designer)
      • Reports
      • Concealments
      • Other possible issues that Kotka could solve
      • Something else?
  • Agree on the responsibilities for each party
    • Extracting the data from the original system/data source (default: customer)
    • Data cleaning, improving the data quality, unifying the data, and how thoroughly this is done (default: customer)
    • Data transformation to Kotka import format (default: customer)
    • Whether the customer wants to check the data before the final import to Kotka (default: no, if the customer has done the cleaning; yes, if FinBIF has done the cleaning)
    • Importing the data to Kotka (default: FinBIF, at least if there are >20 000 specimens and if the original creator and editor names and dates need to be stored in Kotka)
    • Documenting the data transfer process (default both do their part, FinBIF compiles together)
    • Implementing the new features to Kotka (default: FinBIF)
    • Maintaining the taxonomy (default: FinBIF/named experts)

2) and 3) Data transformation and cleaning

Data is transformed to the format of Kotka Excel import as precisely as possible before saving the data to Kotka. If the customer imports the data, the data has to be exactly in the required Kotka format. If there are exceptions to the data format that have been agreed on (e.g. old editor or creator names or dates need to be kept), FinBIF will import the data to Kotka using admin tools.

Before data transformation it pays off to decide how much the data will be cleaned: whether spelling is harmonised (especially locality and person names), data is split into separate fields (especially from notes fields) and apparent typos or errors are fixed (e.g. in dates). It is worth aiming for a smooth data transfer in the beginning and cleaning only the larger and more common problems. Data quality can be improved later, and Kotka has search and statistics tools to support this.
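As an example of the kind of cleaning meant above, a small sketch that normalises dates into ISO format. The input formats handled here are assumptions for illustration, not a statement of what the source data or Kotka actually require:

```python
import re

def normalise_date(value):
    """Normalise a date string to ISO YYYY-MM-DD, or return None if unclear.

    Assumed inputs: Finnish-style "7.2.2024" or already-ISO "2024-02-07".
    Anything else is left for manual checking (returned as None).
    """
    value = value.strip()
    # Finnish-style day.month.year
    m = re.fullmatch(r"(\d{1,2})\.(\d{1,2})\.(\d{4})", value)
    if m:
        day, month, year = m.groups()
        return f"{year}-{int(month):02d}-{int(day):02d}"
    # Already ISO-formatted
    if re.fullmatch(r"\d{4}-\d{2}-\d{2}", value):
        return value
    # Unrecognised format: leave untouched and list it for manual review
    return None

print(normalise_date("7.2.2024"))    # → 2024-02-07
print(normalise_date("2024-02-07"))  # → 2024-02-07
```

In line with the advice above, the function deliberately leaves anything it does not recognise alone, so unclear values surface for manual review instead of being silently mangled.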

Irrespective of who takes care of the data transformation, the process is roughly the same. Different tools and their combinations can be used (tools in the original database, OpenRefine, scripts (R, Python, ...) or Excel). In any case it is very important to ensure that only one person at a time edits the data!
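As a scripted alternative to the tools above, the field mapping could be sketched in Python like this. All column names on both sides are invented placeholders, not the actual Kotka Excel import columns; check the real import template before use:

```python
import csv
import io

# Hypothetical mapping from source columns to Kotka-style import columns.
FIELD_MAP = {
    "laji": "taxon",
    "paikka": "locality",
    "pvm": "dateBegin",
}

def transform_rows(reader):
    """Rename mapped columns; collect unmapped ones for manual review."""
    for row in reader:
        out = {}
        unmapped = {}
        for field, value in row.items():
            if field in FIELD_MAP:
                out[FIELD_MAP[field]] = value
            else:
                unmapped[field] = value  # decide later: drop, merge, or map
        yield out, unmapped

# Synthetic example data standing in for an export from the old system
source = io.StringIO("laji;paikka;pvm;huomiot\nParus major;Helsinki;7.2.2024;nest\n")
for out, unmapped in transform_rows(csv.DictReader(source, delimiter=";")):
    print(out)       # {'taxon': 'Parus major', 'locality': 'Helsinki', 'dateBegin': '7.2.2024'}
    print(unmapped)  # {'huomiot': 'nest'}
```

Collecting unmapped columns instead of dropping them matches the checklist below: fields that do not exist in Kotka as such are written down and resolved with FinBIF, not silently discarded.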

Check also the general principles for data cleaning and the data transfer checklist, summarized here:

  1. Get to know the data (OpenRefine etc.), the documentation about the data and the things agreed earlier, and write down:
    1. Which field/value in the original data is mapped to which field/value in Kotka
    2. Unclear issues
    3. Fields that do not exist in Kotka as such
  2. Ask questions when needed; the goal is to:
    1. Solve unclear issues
    2. Find out whether new fields are needed or whether the existing fields suffice (what would the new fields be used for and how often)
    3. Find out whether the data needs to be supplemented with new specimens or more detailed information
  3. Clean the data and transform to Kotka format
    1. See the more detailed data transfer checklist
    2. Write down as a list, what was done to the data
    3. Write down what amendments and exceptions FinBIF may have to do when the data is imported to Kotka
    4. While cleaning the data, take a backup every once in a while
  4. Make a test import to the Kotka test environment (with a few hundred random specimens) to see whether the data imports successfully and how it looks in Kotka. Return to edit the data if necessary.
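Step 4 above, drawing a few hundred random specimens for the test import, could be done for example like this; the rows here are synthetic stand-ins for the cleaned dataset:

```python
import random

# Synthetic rows standing in for the cleaned dataset
rows = [{"id": i} for i in range(20000)]

# Fixed seed so the exact same test sample can be re-drawn later if needed
random.seed(42)
sample = random.sample(rows, k=300)

print(len(sample))  # → 300
```

Using a random sample rather than, say, the first few hundred rows makes it more likely that the test import exercises the full variety of the data.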

4) Customer checks the data

FinBIF delivers the data to the customer, if this was agreed. The customer checks the data and approves it; if there are errors, we go back to steps 2 and 3 to fix them.

5) Saving the data to Kotka

The data is imported to Kotka. If FinBIF does the import and there are exceptions to the normal procedure, FinBIF takes care of these as agreed earlier. Finally the raw data is archived and documented so that those viewing the data later can understand it and track possible errors that may have happened during the data transfer.

  1. FinBIF archives the original and the transformed data (and possible transformation scripts) to the FinBIF data archive.
  2. FinBIF and the customer compile a final report on the transfer.
    1. The report is delivered both to the customer and FinBIF.
    2. FinBIF archives the report to the data archive together with the data.
  3. FinBIF writes down any lessons learned and applies them in future data transfers (for example by improving these instructions).

6) Start to use Kotka

  • Customer starts to use Kotka
  • Customer shuts down the old system and archives it (e.g. installation package, scripts etc.)
  • Customer archives the data in the old system in a suitable manner (e.g. MySQL export file, Excel files)

About transformations

Remember that each specimen needs:

  • recordType: usually PreservedSpecimen
  • Some kind of locality information (at minimum the country or higher geography)
  • The collection the specimens belong to
  • A tag that identifies the imported data as a whole
  • (Specimens digitised by Digitarium are attached to the tag GX.270)
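The minimum-field check above can be sketched like this; the field names are illustrative placeholders rather than the exact Kotka column names:

```python
# Hypothetical field names standing in for the required Kotka columns
REQUIRED = ("recordType", "country", "collection", "tag")

def missing_fields(specimen):
    """Return the required fields that are empty or absent in a specimen row."""
    return [f for f in REQUIRED if not specimen.get(f)]

specimen = {
    "recordType": "PreservedSpecimen",
    "country": "Finland",
    "collection": "example-collection-id",  # placeholder, not a real identifier
    "tag": "",
}
print(missing_fields(specimen))  # → ['tag']
```

Running a check like this over the whole transformed file before the import surfaces incomplete specimens while they are still easy to fix.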

Managing dates and person names

Moved to -> Entering specimen data.

Transformation tools