Nano automation
Motivation
I noticed that I end up spending a lot of time preparing material for Nano users' CellProfiler trainings. Lately there have been two main things: 1) nuclei segmentation with StarDist and 2) image quality measurements with CellProfiler. I have prepared a Fiji script and a CP pipeline that users can use for these steps, and before each training I run them on the user's first data set, just to have something to show. This got a bit repetitive, so how about automating the whole thing?
Plan
lmu-airflow server for managing workflows
- scan lmu_active1/instruments/Nano
- store plate list in db
- status = [new, processing, ready] ?
- scan db for new plates
- create file list with date/plate/well/site metadata
- copy data to turso $WRKDIR
- file system mount? check /home/ad/turso on Cubbli
- CIFS it is: https://version.helsinki.fi/it-for-science/hpc/-/issues/121
- mount both lmu_active1 and wrk-vakka
- authorized_keys can be used to open limited passwordless access for scp. How about rsync?
- schedule MeasureImageQuality
- Prepare batch script template, depending on number of images
- results to database
- schedule MeasureImageFocus https://www.researchgate.net/publication/323782068_Assessing_microscope_image_focus_quality_with_deep_learning
- schedule CorrectIlluminationCalculate and CorrectIlluminationApply
- schedule StarDist
- copy data from turso $WRKDIR to lmu_active1
- prepare file lists with filepath mapping
- Airflow secrets (Fernet?) https://airflow.apache.org/docs/apache-airflow/stable/administration-and-deployment/security/secrets/index.html
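A minimal sketch of the plate list and its new / processing / ready status, to make the "scan, store, pick up new plates" steps above concrete. It uses an in-memory SQLite database purely as a stand-in for the real database server; the table name `plates`, the column names, and the helper functions are all hypothetical.

```python
import sqlite3


def init_db(conn):
    """Create the (hypothetical) plate table with a constrained status column."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS plates (
            path   TEXT PRIMARY KEY,
            status TEXT NOT NULL DEFAULT 'new'
                   CHECK (status IN ('new', 'processing', 'ready'))
        )
        """
    )
    conn.commit()


def register_plates(conn, plate_paths):
    """Store plates found by a scan; plates already known keep their status."""
    conn.executemany(
        "INSERT OR IGNORE INTO plates (path) VALUES (?)",
        [(p,) for p in plate_paths],
    )
    conn.commit()


def claim_new_plates(conn):
    """Return all plates with status 'new' and mark them 'processing'."""
    rows = conn.execute(
        "SELECT path FROM plates WHERE status = 'new' ORDER BY path"
    ).fetchall()
    paths = [row[0] for row in rows]
    conn.executemany(
        "UPDATE plates SET status = 'processing' WHERE path = ?",
        [(p,) for p in paths],
    )
    conn.commit()
    return paths


def mark_ready(conn, path):
    """Mark a plate as fully processed."""
    conn.execute("UPDATE plates SET status = 'ready' WHERE path = ?", (path,))
    conn.commit()
```

In an Airflow setup, one task could call `register_plates` after scanning the instrument folder and a later task could call `claim_new_plates`, so that a rescan never resets a plate that is already being processed.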
CellProfiler on Turso
- Use singularity
- export APPTAINER_CACHEDIR=/proj/group/lmu/software/singularity
- singularity pull docker://cellprofiler/cellprofiler:4.2.5
- singularity pull /proj/group/lmu/software/singularity/cellprofiler-4.2.5-hj.sif docker://hajaalin/cellprofiler:master
- singularity exec --bind /run/user/1028227 --bind /etc/machine-id cellprofiler.v4.2.1.sif cellprofiler
- singularity exec cellprofiler.v4.2.1.sif cellprofiler -c -r -p tmp/cptest/pipeline.cppipe --data-file=tmp/cptest/images.csv -f 1 -l 2
- /opt/apptainer/bin/apptainer exec --bind /mnt/lmu_active1/instruments/Nano:/mnt/lmu_active1/instruments/Nano:ro cellprofiler_4.2.5.sif cellprofiler -c -p git/airflow-deploy/cellprofiler/test1.cppipe --data-file git/airflow-deploy/imagesets_test1.csv -o git/airflow-deploy/cellprofiler/output/
- Prepare pipeline
- use LoadData module (read .csv)
- use ExportToDatabase module → MySQL database
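A sketch of how the .csv for LoadData could be written, including the file path mapping between the instrument share and the path seen inside the container. The column names follow CellProfiler's LoadData convention (`Image_FileName_<name>`, `Image_PathName_<name>`, `Metadata_<key>`); the channel name `DNA`, the dictionary keys, and the example roots are assumptions for illustration.

```python
import csv
from pathlib import PurePosixPath


def map_path(path, src_root, dst_root):
    """Rewrite a path under src_root so that it lives under dst_root.

    Raises ValueError if path is not located under src_root.
    """
    relative = PurePosixPath(path).relative_to(src_root)
    return PurePosixPath(dst_root) / relative


def write_loaddata_csv(out, images, src_root, dst_root, channel="DNA"):
    """Write one LoadData row per image to the open text file `out`.

    `images` is a list of dicts with the (hypothetical) keys
    path, date, plate, well and site.
    """
    fieldnames = [
        f"Image_FileName_{channel}",
        f"Image_PathName_{channel}",
        "Metadata_Date",
        "Metadata_Plate",
        "Metadata_Well",
        "Metadata_Site",
    ]
    writer = csv.DictWriter(out, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    for img in images:
        mapped = map_path(img["path"], src_root, dst_root)
        writer.writerow(
            {
                f"Image_FileName_{channel}": mapped.name,
                f"Image_PathName_{channel}": str(mapped.parent),
                "Metadata_Date": img["date"],
                "Metadata_Plate": img["plate"],
                "Metadata_Well": img["well"],
                "Metadata_Site": img["site"],
            }
        )
```

Keeping the mapping in one function means the same image list can be written once with the lmu_active1 paths and once with the paths used on the cluster.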
srun --interactive -n 1 --mem=4G -t 02:00:00 -p short -M ukko --pty bash
# on compute node:
singularity run --bind /wrk-vakka/group/lmu/nano:/mnt /proj/group/lmu/software/singularity/cellprofiler-4.2.5 -c -p /mnt/Nano/pipeline.cppipe --data-file /mnt/Nano/images2_turso.csv
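For larger plates, the image sets could be split into batches using the same -f / -l (first and last image set) options as in the test run earlier in this section. A minimal sketch, assuming 1-based inclusive image set numbers; the function names and the batch size are hypothetical.

```python
def batch_ranges(n_image_sets, batch_size):
    """Yield (first, last) image set numbers, 1-based and inclusive."""
    if n_image_sets < 0 or batch_size < 1:
        raise ValueError("n_image_sets must be >= 0 and batch_size >= 1")
    for first in range(1, n_image_sets + 1, batch_size):
        yield first, min(first + batch_size - 1, n_image_sets)


def cellprofiler_args(pipeline, data_file, first, last):
    """Build the headless CellProfiler argument list for one batch."""
    return [
        "cellprofiler", "-c", "-r",
        "-p", pipeline,
        f"--data-file={data_file}",
        "-f", str(first),
        "-l", str(last),
    ]
```

Each (first, last) pair could then fill one line of a batch script template or one task of a job array, with the argument list appended to the singularity or apptainer exec command.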
Build .sif from local Docker image:
sudo docker build -t cellprofiler:v4.2.5 - < git/cellprofiler-docker/v4.2.5/Dockerfile
sudo APPTAINER_TMPDIR=/home/hajaalin/tmp /opt/apptainer/bin/apptainer build cellprofiler_4.2.5-hajaalin.sif docker-daemon://cellprofiler:v4.2.5
/opt/apptainer/bin/apptainer run --bind /mnt/lmu_active1/instruments/Nano:/mnt/lmu_active1/instruments/Nano:ro cellprofiler_4.2.5-hajaalin.sif
StarDist on Turso
- CellProfiler plugin runstardist...
- ... but it seems so fast even on CPU that sending the task to Turso might not make sense anymore
StarDist on Puhti
- module load tensorflow; export PYTHONUSERBASE=...; pip install --user stardist
- use dask for parallel processing
Todo:
- currently using paramiko in Airflow operators; how to use rsync instead? (SSH key, authorized_keys with command=)
- schedule a cleanup job to remove data from /scratch
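One possible answer to the rsync question is to build the rsync command line in Python and run it with subprocess from an Airflow task, using a dedicated SSH key. The sketch below only builds the argument list; the function name and all paths, users and hosts are hypothetical. On the remote side, the key could be limited with a command= entry in authorized_keys (for example pointing at the rrsync helper that ships with rsync), which is an assumption about the setup and not something tested here.

```python
import shlex


def rsync_command(src, dest, key_file, user, host):
    """Return an argv list that pushes src to user@host:dest over SSH.

    BatchMode=yes makes ssh fail instead of prompting for a password,
    which is what an unattended scheduler task needs.
    """
    ssh = " ".join(
        [
            "ssh",
            "-i", shlex.quote(key_file),  # rsync's -e understands simple quoting
            "-o", "BatchMode=yes",
            "-o", "IdentitiesOnly=yes",
        ]
    )
    return ["rsync", "-a", "-e", ssh, src, f"{user}@{host}:{dest}"]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`, so a failed transfer raises an exception and fails the task, which lets the scheduler's retry handling take over.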
CellProfiler on Puhti
https://docs.csc.fi/computing/containers/creating/#converting-a-docker-container
mkdir /projappl/tanhuanp/software/apptainer
cd /projappl/tanhuanp/software/apptainer/
export APPTAINER_TMPDIR=$LOCAL_SCRATCH
export APPTAINER_CACHEDIR=$LOCAL_SCRATCH
unset XDG_RUNTIME_DIR
apptainer pull cellprofiler_4.2.5-hajaalin.sif docker://hajaalin/cellprofiler:v4.2.5
/projappl/tanhuanp/software/apptainer/cellprofiler_4.2.5-hajaalin.sif -h
apptainer exec -B/scratch/:/scratch /projappl/tanhuanp/software/apptainer/cellprofiler_4.2.5-hajaalin.sif cellprofiler -h
MySQL database server
- test VM with Vagrant / VirtualBox
- db for Nano plate list
- db for per image QC data
ERROR 2013 (HY000): Lost connection to MySQL server at 'reading initial communication packet', system error: 0
$ nc 127.0.0.1 3306
Host '10.0.2.2' is not allowed to connect to this MariaDB server
sshd 903 root 5u IPv4 23052 0t0 TCP *:22 (LISTEN)
sshd 903 root 7u IPv6 23054 0t0 TCP *:22 (LISTEN)
mysqld 6065 mysql 22u IPv4 51155 0t0 TCP *:3306 (LISTEN)
IDM / Open IRIS
- OK scheduling
- OK retry
- OK compare the automatic list with the current customers
- OK list expired projects → email to staff
- OK folder creation
- OK list folders, check for inactive users (e.g. a participant who left during the project)
- OK copy user list to the lmu_servers drive
- OK users in IDM?