HPC Cheat Sheet


Working examples of bash commands and snippets, as well as summary information on the HPC environment.



Reporting Bugs & Issues

https://version.helsinki.fi/it-for-science/hpc


HPC Templates

https://version.helsinki.fi/it-for-science/hpc/-/tree/main



Federations

Vakka

subclusters Vorna, Ukko and Carrington

mounted to /wrk-vakka on Turso login nodes

/wrk is a symbolic link to /wrk-vakka on Vorna, Ukko and Carrington compute nodes

Kappa

subcluster Kale

mounted to /wrk-kappa on Turso login nodes

/wrk is a symbolic link to /wrk-kappa on Kale compute nodes
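
A quick way to check which scratch mount a node provides (a minimal sketch using the paths listed above):

ls -ld /wrk-vakka /wrk-kappa    # on a Turso login node: both federation mounts
ls -ld /wrk                     # on a compute node: symlink to the local federation's scratch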


All systems run on x86 RHEL 8.x; verify with:

uname -a



Turso login (within University domain)  [set your username]

ssh -YA username@turso.cs.helsinki.fi


Turso login (outside University domain)  [take care to append with >> , not overwrite with > ]  [set your username]

echo -e "\nHost turso.cs.helsinki.fi\n    ProxyCommand ssh username@melkinpaasi.cs.helsinki.fi -W %h:%p\n" >> ~/.ssh/config

ssh -YA username@turso.cs.helsinki.fi 


Turso file transfer (from outside University domain)  [set your username]

rsync -av --progress -e "ssh -A username@pangolin.it.helsinki.fi ssh" /my/path username@turso.cs.helsinki.fi:/wrk/users/username/dest



Dirs mounted on compute nodes

$HOME    # home dir

$PROJ      # user apps, no data

/wrk                 # scratch data

/wrk-vakka    # scratch data

/wrk-kappa    # scratch data


Change to local working dir in batch script

--chdir=/wrk/users/$USER
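
For example, a minimal sketch (job.sh is a hypothetical script name; $USER expands on the sbatch command line, whereas #SBATCH directive lines are not shell-expanded):

sbatch --chdir=/wrk/users/$USER job.sh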



Slurm


interactive session (first command), or run a command xxx directly with srun (second command)

srun --interactive -n 4 --mem=4G -t 00:10:00 -p short -M [ukko|vorna|kale|carrington] --pty bash

srun -n4 --mem=1G -M [ukko|vorna|kale|carrington] xxx


interactive session with X11 forwarding

srun --interactive --x11 -n 4 --mem=4G -t 00:10:00 -p short -M [ukko|vorna|kale|carrington] --pty bash


interactive GPU dev

srun --interactive -c4 --mem=4G -t04:00:00 -pgpu-oversub -Mukko --export="ALL,CUDA_VISIBLE_DEVICES=0" --pty bash

srun --interactive --mem-per-gpu=4G --cpus-per-gpu=4 -t04:00:00 -pgpu-oversub -G 1 -Mukko --pty bash
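
Inside the interactive GPU session, a quick sanity check of the allocation (a sketch; assumes nvidia-smi is available on the GPU nodes):

nvidia-smi                   # GPU(s) visible to this job
echo $CUDA_VISIBLE_DEVICES   # device index/indices granted by Slurm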



Lustre


quota                   lfs quota -hu $USER /wrk

find                       lfs find <dir> [find-style options]   (see example below)

usage per OST    lfs df -h optional_dir
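
Example of lfs find with find-style options (a sketch; path and thresholds are only illustrative):

lfs find /wrk/users/$USER -type f -size +1G -mtime +30    # files over 1 GB not modified in the last 30 days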



Modules


command help    man module


installed        module avail

load               module load Python

loaded          module list


info                module help Python

search           module spider int


unload          module unload Python

unload all     module purge


save loaded to bundle    module save bundle_name

restore bundle                 module restore bundle_name
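
A typical workflow combining the commands above (bundle_name is just an example name):

module purge
module load Python
module save bundle_name       # save the currently loaded set
# later, e.g. at the top of a batch script
module restore bundle_name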



Python virtual env

cd /proj/$USER

module purge

module load Python/3.5.2-foss-2016b

python3 -m venv venv

source venv/bin/activate

pip install tensorflow
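
To use the same environment in a batch job, a minimal sketch (my_script.py is a hypothetical script; cluster and partition follow the earlier examples):

#!/bin/bash
#SBATCH -M ukko
#SBATCH -p short
#SBATCH -c 1
#SBATCH --mem=2G
#SBATCH -t 00:10:00

module purge
module load Python/3.5.2-foss-2016b
source /proj/$USER/venv/bin/activate
srun python3 my_script.py     # my_script.py is a placeholder for your own code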



Singularity


Convert Docker images to Singularity – https://www.nas.nasa.gov/hecc/support/kb/converting-docker-images-to-singularity-for-use-on-pleiades_643.html
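
A common pattern (a general sketch, not specific to this cluster) is to pull a Docker image straight into a Singularity image file:

singularity pull docker://python:3.11-slim                  # produces python_3.11-slim.sif
singularity exec python_3.11-slim.sif python3 --version     # run a command inside the image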



SBATCH


Module xthi – process/thread placement utility

Save the script below as xthi.sh and submit it with sbatch xthi.sh


#!/bin/bash
#SBATCH  -M vorna
#SBATCH --partition=short 
#SBATCH --nodes=1 
#SBATCH -c 16 
#SBATCH -n 1 
#SBATCH --ntasks-per-node=1 
#SBATCH --mem=10G 

module purge
module load xthi
ml                       # shorthand for "module list": show the loaded modules

export OMP_PLACES=cores 
export OMP_PROC_BIND=spread

srun --ntasks-per-node=1 -n1 -c16 --mpi=pmix_v3 xthi



MPI


Map-Reduce – Pi estimation example


Compile with e.g. mpicc -o pi pi.c -lm

For a 4-process run, each process maps 5 of the 20 intervals as below before reducing (summing up).


#include "mpi.h"
#include <math.h>
#include <stdio.h>

/* Adapted from https://github.com/Leon3cs/mpi-samples */

int main(int argc, char *argv[])
{
        int n_intervals = 20;
        int rank, n_procs, i;
        double PI = 3.141592653589793238462643;

        double pi_interval, pi_approx, h, sum, x;

        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &n_procs);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);


        /* broadcast */
        MPI_Bcast(&n_intervals, 1, MPI_INT, 0, MPI_COMM_WORLD);


        /* map */
        h = 1.0 / (double)n_intervals;
        sum = 0.0;
        for (i = rank + 1; i <= n_intervals; i += n_procs) {
                x = h * ((double)i - 0.5);
                sum += 4.0 / (1.0 + x * x);
        }
        pi_interval = h * sum;


        /* reduce */
        MPI_Reduce(&pi_interval, &pi_approx, 1, MPI_DOUBLE, MPI_SUM, 0,
                   MPI_COMM_WORLD);

        if (rank == 0) {
                printf("pi_approx %.12f \nerror %.5E \nrel_error %.5E\n",
                       pi_approx, fabs(pi_approx - PI),
                       100.0 * fabs(pi_approx - PI) / PI);
        }

        MPI_Finalize();
        return 0;
}
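
To run the compiled binary under Slurm, a sketch following the xthi example above (the MPI module to load depends on the toolchain used with mpicc, so it is left as a comment):

#!/bin/bash
#SBATCH -M vorna
#SBATCH -p short
#SBATCH -n 4
#SBATCH --mem=1G
#SBATCH -t 00:10:00

module purge
# module load <the MPI toolchain used to compile pi.c>

srun -n4 --mpi=pmix_v3 ./pi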