HPC Cheat Sheet


Working examples of bash commands and snippets, as well as summary information on the HPC environment.



Reporting Bugs & Issues

https://version.helsinki.fi/it-for-science/hpc


HPC Templates

https://version.helsinki.fi/it-for-science/hpc/-/tree/main



Federations

Vakka

subclusters Vorna, Ukko and Carrington

mounted to /wrk-vakka on Turso login nodes

/wrk is a symbolic link to /wrk-vakka on Vorna, Ukko and Carrington compute nodes

Kappa

subcluster Kale

mounted to /wrk-kappa on Turso login nodes

/wrk is a symbolic link to /wrk-kappa on Kale compute nodes
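
A quick way to check which scratch mount a node provides (a minimal sketch using the paths listed above):

ls -ld /wrk-vakka /wrk-kappa    # on a Turso login node: both federation mounts
ls -ld /wrk                     # on a compute node: symlink to the local federation's scratch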


All systems run on x86 RHEL 8.x; verify with:

uname -a



Turso login (within University domain)  [set your username]

ssh -YA username@turso.cs.helsinki.fi


Turso login (outside University domain)  [take care to append with >> , not overwrite with > ]  [set your username]

echo -e "\nHost turso.cs.helsinki.fi\n    ProxyCommand ssh username@melkinpaasi.cs.helsinki.fi -W %h:%p\n" >> ~/.ssh/config

ssh -YA username@turso.cs.helsinki.fi 


Turso file transfer (from outside University domain)  [set your username]

rsync -av --progress -e "ssh -A username@pangolin.it.helsinki.fi ssh" /my/path username@turso.cs.helsinki.fi:/wrk/users/username/dest



Dirs mounted on compute nodes

$HOME    # home dir

$PROJ      # user apps, no data

/wrk                 # scratch data

/wrk-vakka    # scratch data

/wrk-kappa    # scratch data


Change to local working dir in batch script

--chdir=/wrk/users/$USER
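
For example, a minimal sketch (job.sh is a hypothetical script name; $USER expands on the sbatch command line, whereas #SBATCH directive lines are not shell-expanded):

sbatch --chdir=/wrk/users/$USER job.sh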



Slurm


interactive session (first command), or run a command xxx directly with srun (second command)

srun --interactive -n 4 --mem=4G -t 00:10:00 -p short -M [ukko|vorna|kale|carrington] --pty bash

srun -n4 --mem=1G -M [ukko|vorna|kale|carrington] xxx


interactive session with X11 forwarding

srun --interactive --x11 -n 4 --mem=4G -t 00:10:00 -p short -M [ukko|vorna|kale|carrington] --pty bash


interactive GPU dev

srun --interactive -c4 --mem=4G -t04:00:00 -pgpu-oversub -Mukko --export="ALL,CUDA_VISIBLE_DEVICES=0" --pty bash

srun --interactive --mem-per-gpu=4G --cpus-per-gpu=4 -t04:00:00 -pgpu-oversub -G 1 -Mukko --pty bash
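
Inside the interactive GPU session, a quick sanity check of the allocation (a sketch; assumes nvidia-smi is available on the GPU nodes):

nvidia-smi                   # GPU(s) visible to this job
echo $CUDA_VISIBLE_DEVICES   # device index/indices granted by Slurm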



Lustre


quota                   lfs quota -hu $USER /wrk

find                       lfs find <dir> [find-style options]   (see example below)

usage per OST    lfs df -h optional_dir
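
Example of lfs find with find-style options (a sketch; path and thresholds are only illustrative):

lfs find /wrk/users/$USER -type f -size +1G -mtime +30    # files over 1 GB not modified in the last 30 days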



Modules


command help    man module


installed        module avail

load               module load Python

loaded          module list


info                module help Python

search           module spider int


unload          module unload Python

unload all     module purge


save loaded to bundle    module save bundle_name

restore bundle                 module restore bundle_name
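
A typical workflow combining the commands above (bundle_name is just an example name):

module purge
module load Python
module save bundle_name       # save the currently loaded set
# later, e.g. at the top of a batch script
module restore bundle_name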



Python virtual env

cd /proj/$USER

module purge

module load Python/3.5.2-foss-2016b

python3 -m venv venv

source venv/bin/activate

pip install tensorflow
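
To use the same environment in a batch job, a minimal sketch (my_script.py is a hypothetical script; cluster and partition follow the earlier examples):

#!/bin/bash
#SBATCH -M ukko
#SBATCH -p short
#SBATCH -c 1
#SBATCH --mem=2G
#SBATCH -t 00:10:00

module purge
module load Python/3.5.2-foss-2016b
source /proj/$USER/venv/bin/activate
srun python3 my_script.py     # my_script.py is a placeholder for your own code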



Singularity


Convert Docker images to Singularity – https://www.nas.nasa.gov/hecc/support/kb/converting-docker-images-to-singularity-for-use-on-pleiades_643.html
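
A common pattern (a general sketch, not specific to this cluster) is to pull a Docker image straight into a Singularity image file:

singularity pull docker://python:3.11-slim                  # produces python_3.11-slim.sif
singularity exec python_3.11-slim.sif python3 --version     # run a command inside the image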



SBATCH


Module xthi – process/thread placement utility

Save the script below as xthi.sh and submit it with sbatch xthi.sh


#!/bin/bash
#SBATCH  -M vorna
#SBATCH --partition=short 
#SBATCH --nodes=1 
#SBATCH -c 16 
#SBATCH -n 1 
#SBATCH --ntasks-per-node=1 
#SBATCH --mem=10G 

module purge
module load xthi
ml                       # shorthand for "module list": show the loaded modules

export OMP_PLACES=cores 
export OMP_PROC_BIND=spread

srun --ntasks-per-node=1 -n1 -c16 --mpi=pmix_v3 xthi



MPI


Map-Reduce – Pi estimation example


Compile with e.g. mpicc -o pi pi.c -lm

For a 4-process run, each process maps 5 of the 20 intervals as below before reducing (summing up).


#include "mpi.h"
#include <math.h>
#include <stdio.h>

/* Adapted from https://github.com/Leon3cs/mpi-samples */

int main(int argc, char *argv[])
{
        int n_intervals = 20;
        int rank, n_procs, i;
        double PI = 3.141592653589793238462643;

        double pi_interval, pi_approx, h, sum, x;

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &n_procs);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);


        /* broadcast */
        MPI_Bcast(&n_intervals, 1, MPI_INT, 0, MPI_COMM_WORLD);


        /* map */
        h = 1.0 / (double)n_intervals;
        sum = 0.0;
        for (i = rank + 1; i <= n_intervals; i += n_procs) {
                x = h * ((double)i - 0.5);
                sum += 4.0 / (1.0 + x * x);
        }
        pi_interval = h * sum;


        /* reduce */
        MPI_Reduce(&pi_interval, &pi_approx, 1, MPI_DOUBLE, MPI_SUM, 0,
                   MPI_COMM_WORLD);

        if (rank == 0) {
                printf("pi_approx %.12f \nerror %.5E \nrel_error %.5E\n",
                       pi_approx, fabs(pi_approx - PI),
                       100.0 * fabs(pi_approx - PI) / PI);
        }

        MPI_Finalize();
        return 0;
}
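
To run the compiled binary under Slurm, a sketch following the xthi example above (the MPI module to load depends on the toolchain used with mpicc, so it is left as a comment):

#!/bin/bash
#SBATCH -M vorna
#SBATCH -p short
#SBATCH -n 4
#SBATCH --mem=1G
#SBATCH -t 00:10:00

module purge
# module load <the MPI toolchain used to compile pi.c>

srun -n4 --mpi=pmix_v3 ./pi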