Nano automation

Last modified by Harri Tapio Jäälinoja on 2024/03/20 11:24

Motivation

I noticed that I end up spending a lot of time preparing material for Nano users' CellProfiler trainings. Lately there have been two main tasks: 1) nuclei segmentation with StarDist and 2) image quality measurements with CellProfiler. I have prepared a Fiji script and a CP pipeline that users can use for these steps, and before each training I run them on the user's first data set, just to have something to show. This got a bit repetitive, so how about automating the whole thing?

Plan

lmu-airflow server for managing workflows

CellProfiler on Turso

  • Use singularity
    • export APPTAINER_CACHEDIR=/proj/group/lmu/software/singularity
    • singularity pull docker://cellprofiler/cellprofiler:4.2.5
    • singularity pull /proj/group/lmu/software/singularity/cellprofiler-4.2.5-hj.sif docker://hajaalin/cellprofiler:master
    • singularity exec --bind /run/user/1028227 --bind /etc/machine-id cellprofiler.v4.2.1.sif cellprofiler
    • singularity exec cellprofiler.v4.2.1.sif cellprofiler -c -r -p tmp/cptest/pipeline.cppipe --data-file=tmp/cptest/images.csv -f 1 -l 2
    • /opt/apptainer/bin/apptainer exec --bind /mnt/lmu_active1/instruments/Nano:/mnt/lmu_active1/instruments/Nano:ro cellprofiler_4.2.5.sif cellprofiler -c -p git/airflow-deploy/cellprofiler/test1.cppipe --data-file git/airflow-deploy/imagesets_test1.csv -o git/airflow-deploy/cellprofiler/output/
  • Prepare pipeline
    • use LoadData module (read .csv)
    • ExportToDatabase module → MySQL database
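
The LoadData step needs a CSV that lists the images. A minimal sketch of generating one in the shell, assuming single-channel .tif files and a channel named DNA (both are assumptions; only the FileName_/PathName_ column prefixes follow LoadData's naming convention). make_loaddata_csv is a hypothetical helper name.

```shell
# Hypothetical helper: print a LoadData-style CSV for all .tif files under a directory.
# Assumptions: one channel called "DNA", no commas or newlines in file names.
make_loaddata_csv() {
    image_dir=$1
    # LoadData pairs a FileName_<channel> column with a PathName_<channel> column.
    echo "FileName_DNA,PathName_DNA"
    find "$image_dir" -type f -name '*.tif' | sort | while IFS= read -r path; do
        printf '%s,%s\n' "$(basename "$path")" "$(dirname "$path")"
    done
}
```

Typical use would be something like make_loaddata_csv /path/to/plate > images.csv, with the path as seen from inside the container, since CellProfiler resolves PathName_ values at run time.
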
# on turso login node:
srun --interactive -n 1 --mem=4G -t 02:00:00 -p short -M ukko --pty bash

# on compute node:
singularity run --bind /wrk-vakka/group/lmu/nano:/mnt /proj/group/lmu/software/singularity/cellprofiler-4.2.5 -c -p /mnt/Nano/pipeline.cppipe --data-file /mnt/Nano/images2_turso.csv
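
The -f and -l options used earlier select the first and last image set, which makes it possible to split one images.csv over several jobs. A minimal sketch of the arithmetic, assuming a 0-based task index such as a Slurm array ID; batch_range and its arguments are hypothetical names, not part of CellProfiler.

```shell
# Hypothetical helper: print "FIRST LAST" (1-based, inclusive) for one batch.
# usage: batch_range TASK_ID BATCH_SIZE TOTAL_IMAGE_SETS   (TASK_ID starts at 0)
batch_range() {
    task_id=$1
    batch_size=$2
    total=$3
    first=$(( task_id * batch_size + 1 ))
    last=$(( first + batch_size - 1 ))
    # Nothing left for this task: signal failure so the caller can skip it.
    if [ "$first" -gt "$total" ]; then
        return 1
    fi
    # The final batch may be shorter than the others.
    if [ "$last" -gt "$total" ]; then
        last=$total
    fi
    echo "$first $last"
}

# Example use inside an array job script (batch size and total are placeholders):
#   range=$(batch_range "$SLURM_ARRAY_TASK_ID" 100 2500) || exit 0
#   set -- $range
#   singularity exec cellprofiler.v4.2.1.sif cellprofiler -c -r \
#       -p pipeline.cppipe --data-file=images.csv -f "$1" -l "$2"
```
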

Build .sif from local Docker image:

rm cellprofiler_4.2.5-hajaalin.sif
sudo docker build -t cellprofiler:v4.2.5 - < git/cellprofiler-docker/v4.2.5/Dockerfile
sudo APPTAINER_TMPDIR=/home/hajaalin/tmp /opt/apptainer/bin/apptainer build cellprofiler_4.2.5-hajaalin.sif docker-daemon://cellprofiler:v4.2.5
/opt/apptainer/bin/apptainer run --bind /mnt/lmu_active1/instruments/Nano:/mnt/lmu_active1/instruments/Nano:ro cellprofiler_4.2.5-hajaalin.sif

StarDist on Turso

  • CellProfiler plugin RunStarDist...
  • ... but it seems so fast even on CPU that sending the task to Turso might not make sense anymore

StarDist on Puhti

  • module load tensorflow; export PYTHONUSERBASE=...; pip install --user stardist
  • use dask for parallel processing

Todo:

  • Airflow operators currently use paramiko; how to use rsync? (SSH key, authorized_keys with command= option)
  • schedule a cleanup job to remove data from /scratch
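
Both todo items can be sketched in shell. For rsync, one common pattern is a dedicated key that is restricted on the receiving side with a forced command in ~/.ssh/authorized_keys; for cleanup, find can select run directories by age. Everything named below (key handling, directory layout, function names) is an assumption for illustration.

```shell
# Receiving side, one line in ~/.ssh/authorized_keys (directory and key are placeholders):
#   command="rrsync /scratch/<project>/airflow",restrict ssh-ed25519 AAAA... airflow
# rrsync is the helper script distributed with rsync; it confines the key to one directory.

# Hypothetical wrapper: push a directory with rsync using a dedicated key.
# usage: push_dir KEY_FILE SRC_DIR USER@HOST:DEST   (KEY_FILE must not contain spaces)
push_dir() {
    rsync -a -e "ssh -i $1 -o IdentitiesOnly=yes" "$2/" "$3"
}

# Hypothetical cleanup: list top-level run directories whose modification time
# is more than DAYS days old. It only prints; nothing is deleted.
# usage: old_run_dirs BASE_DIR DAYS
old_run_dirs() {
    find "$1" -mindepth 1 -maxdepth 1 -type d -mtime +"$2" | sort
}
```

Printing rather than deleting keeps the first runs safe. Note that a directory's modification time changes only when entries are added or removed directly inside it, so a run that is still writing into subdirectories can look old. Once the output looks right, the listing can be turned into a removal step and scheduled from Airflow or cron.
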

CellProfiler on Puhti

https://docs.csc.fi/computing/containers/creating/#converting-a-docker-container

mkdir -p /projappl/tanhuanp/software/apptainer

cd /projappl/tanhuanp/software/apptainer/

export APPTAINER_TMPDIR=$LOCAL_SCRATCH
export APPTAINER_CACHEDIR=$LOCAL_SCRATCH
unset XDG_RUNTIME_DIR

apptainer pull cellprofiler_4.2.5-hajaalin.sif docker://hajaalin/cellprofiler:v4.2.5

/projappl/tanhuanp/software/apptainer/cellprofiler_4.2.5-hajaalin.sif -h

apptainer exec -B/scratch/:/scratch /projappl/tanhuanp/software/apptainer/cellprofiler_4.2.5-hajaalin.sif cellprofiler -h

MySQL database server

  • test VM with Vagrant / VirtualBox
  • db for Nano plate list
  • db for per image QC data
hajaalin@pangolin-18:~$ mysql -h dx5-biotek4.biocenter.helsinki.fi -u cellprofiler nano_qc
ERROR 2013 (HY000): Lost connection to MySQL server at 'reading initial communication packet', system error: 0

$ nc 127.0.0.1 3306
Host '10.0.2.2' is not allowed to connect to this MariaDB server
[root@cpdb my.cnf.d]# lsof -i -P -n|grep LISTEN
sshd       903    root    5u  IPv4  23052      0t0  TCP *:22 (LISTEN)
sshd       903    root    7u  IPv6  23054      0t0  TCP *:22 (LISTEN)
mysqld    6065   mysql   22u  IPv4  51155      0t0  TCP *:3306 (LISTEN)
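
The lsof output shows mysqld listening on all interfaces, so the "Host '10.0.2.2' is not allowed to connect" message points at MariaDB accounts rather than the network: no account matches a client coming from that address (10.0.2.2 is typically how the host machine appears to a VirtualBox NAT guest). A minimal sketch of statements that would add such an account, wrapped in a shell function so they can be reviewed before being piped to mysql as root; the function name, the privilege level and the password handling are assumptions.

```shell
# Hypothetical helper: print SQL that lets USER connect from HOST to database DB.
# usage: grant_sql USER HOST DB PASSWORD
grant_sql() {
    cat <<EOF
CREATE USER IF NOT EXISTS '$1'@'$2' IDENTIFIED BY '$4';
GRANT ALL PRIVILEGES ON $3.* TO '$1'@'$2';
EOF
}

# Example, run on the database VM (password taken from an environment variable):
#   grant_sql cellprofiler 10.0.2.2 nano_qc "$DB_PASSWORD" | mysql -u root -p
```

ALL PRIVILEGES is the broadest choice; a narrower grant may be enough for a pipeline that only writes measurement tables.
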

IDM / Open IRIS

  • OK scheduling
  • OK retry
  • OK compare the automatic list with the current customers
  • OK list expired projects → email to staff
  • OK folder creation
  • OK list folders, check for inactive users (e.g. a participant who left during the project)
  • OK copy the user list to the lmu_servers disk
  • OK users in IDM?