MfN Berlin
ETL workflows
Frederik.Berger@mfn.berlin
Tue 2022-02-01 00:49
At MfN we have different workflows, which I will try to summarize below. Please let me know if you need further information, so that I can put you in contact with the respective colleagues.

Best wishes
Frederik
Automated workflows for high-speed 2D imaging:
- The systems perform all processing steps and deliver two image files (TIFF/RAW and PNG). All technical and administrative metadata related to the images are delivered in a JSON sidecar file (XML in METS format for library and archival material).
- Image files and related metadata are imported automatically into the digital asset management system (DAM); see the first sketch below.
- Object-related metadata are acquired in different ways. In one case they are delivered together with the images in the JSON sidecar file and parsed by the database management team (this process is not yet fully established). In most cases object-related metadata are acquired in Excel spreadsheets and imported into the respective CMS.

Automated workflows for data acquisition with mobile devices (vertebrate collections and assessments):
- We use the app ODK Collect. Data are uploaded to a central ODK server. The process for integration into the media repository and the CMS is also automated. If you are interested, I can put you in contact with one of our database developers, who can certainly give you more details.

Manual workflow for 2D imaging on demand:
- Image capturing with the Capture One software. DNG and PNG files are stored in a structured file system.
- Manual upload to the DAM system after a quality check and basic post-processing. Post-processing can include color corrections and rendering of scale bars. We plan to move towards automated uploads as soon as we can establish automated quality control (see the third sketch below).
- In the case of multi-focus imaging, image acquisition and rendering of the multi-focus images are separate steps. The rendering process includes the above-mentioned post-processing steps.

3D imaging:
- At present mostly CT images from scientific projects. Processing is mostly done by the requesters and/or student helpers. Raw and processed files are stored in the file system and managed by the lab technicians. Upload routines for long-term archiving and publication are not established yet.

Object-related metadata:
- Metadata are mostly acquired with Excel spreadsheets, which are designed to enable bulk uploads into the CMS (see the second sketch below).
- Direct acquisition in the CMS (Specify 6) turned out to be very slow in larger digitization projects.

Analytical metadata:
- At present completely decentralised, following the (niche) standards of the respective community.
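To make the sidecar step concrete, here is a minimal Python sketch of how one delivery could be collected for the automated DAM import. The directory layout and the metadata keys (object_id, technical, administrative) are assumptions for illustration only; the actual schema is defined by our imaging systems' sidecar specification.

```python
import json
from pathlib import Path

def read_sidecar(image_path: Path) -> dict:
    """Load the JSON sidecar delivered next to an image file."""
    with image_path.with_suffix(".json").open(encoding="utf-8") as fh:
        return json.load(fh)

def collect_delivery(delivery_dir: Path) -> list[dict]:
    """Pair each TIFF master with its PNG derivative and sidecar metadata."""
    records = []
    for master in sorted(delivery_dir.glob("*.tif")):
        meta = read_sidecar(master)
        records.append({
            "master": master,                        # TIFF/RAW master file
            "derivative": master.with_suffix(".png"),
            # The keys below are hypothetical; the real schema comes from
            # the imaging systems' sidecar specification.
            "object_id": meta.get("object_id"),
            "technical": meta.get("technical", {}),
            "administrative": meta.get("administrative", {}),
        })
    return records
```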
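For the spreadsheet route, a bulk upload is essentially a column-mapping and validation step. The sketch below uses pandas and a made-up header mapping; the real templates are the ones the collection teams designed to match the CMS import format.

```python
import pandas as pd

# Hypothetical header-to-field mapping; the real Excel templates are
# designed by the collection teams to match the CMS import format.
COLUMN_MAP = {
    "Catalogue number": "catalog_number",
    "Taxon": "taxon_full_name",
    "Collector": "collector",
    "Collection date": "collection_date",
}

def load_spreadsheet(path: str) -> list[dict]:
    """Read one metadata spreadsheet and normalise it for a bulk upload."""
    df = pd.read_excel(path, dtype=str).rename(columns=COLUMN_MAP)
    # Rows without a catalogue number cannot be matched to objects.
    df = df.dropna(subset=["catalog_number"])
    return df.to_dict(orient="records")
```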
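And for the automated quality control we still need to establish, a first building block could be as simple as rejecting corrupt or undersized files before they enter the upload queue. This sketch uses Pillow; the size threshold is an assumption, not an agreed acceptance criterion.

```python
from pathlib import Path
from PIL import Image

MIN_LONG_EDGE = 4000  # assumed threshold; real acceptance criteria are still open

def passes_basic_qc(path: Path) -> bool:
    """Reject corrupt or undersized images before the automated upload."""
    try:
        with Image.open(path) as img:
            img.verify()  # detects truncated or corrupt files
    except Exception:
        return False
    # verify() leaves the image object unusable, so reopen to read dimensions.
    with Image.open(path) as img:
        return max(img.size) >= MIN_LONG_EDGE
```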
We consider the media repository (EasyDB) as the destination system for digitization processes (a hand-over sketch follows below). Publishing data online is handled by a different pipeline, which draws on data from the media repository and the CMS.
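To illustrate the hand-over into the media repository, here is a deliberately generic sketch: the endpoint, payload shape and authentication are placeholders and not the actual EasyDB API, which has its own client and schema.

```python
import json
import requests

REPO_URL = "https://media.example.org/api"  # placeholder, not the real EasyDB endpoint

def deposit_asset(token: str, record: dict, master_file: str) -> None:
    """Hand one finished asset and its metadata over to the media repository."""
    with open(master_file, "rb") as fh:
        resp = requests.post(
            f"{REPO_URL}/assets",
            headers={"Authorization": f"Bearer {token}"},
            data={"metadata": json.dumps(record)},
            files={"master": fh},
            timeout=60,
        )
    resp.raise_for_status()  # fail loudly so the batch can be retried
```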