HPC 2022/02 Summer Kickstart

Last modified by juhaheli@helsinki_fi on 2024/02/08 06:49

Part of the Scientific Computing in Practice lecture series at Aalto University.

Audience: All FGCI consortium members looking for an HPC crash course.

About the course

The FGCI-wide kickstart for all FGCI consortium members. We’ll have support representatives from several universities. Most of the material will be common to all participants, and in addition we organize breakout rooms for different sites (= sort of parallel sessions) when needed.

Overall, it is a three-day kickstart for researchers to get started with the available computational resources at FGCI and CSC. The course is modular, see the tentative schedule below. On day one we start with the basic HPC intro, go through the available resources at CSC and then switch to the practicalities of the FGCI sites. On days two and three we cover, step by step, how to get started on the local computational clusters. Learning by doing.

By the end of the course you will have hints, ready solutions and copy/paste examples on how to find, run and monitor your applications, and manage your data. You will also know how to optimize your workflow in terms of filesystem traffic, memory usage etc.

Aalto users note: the course is obligatory for all new Triton users and recommended to all interested in the field.

Times

Time, date: Wed 2.2, Thu 3.2, Fri 4.2 11:50-16:00 EET

Place: Online: Zoom link is TBA

Lecturers: Aalto Science IT and CSC people

Registration: registration link

The daily schedule is flexible; below is the tentative plan. There will be frequent breaks. You will be given time to try things out and ask questions: it’s more like an informal help session to get you started with the computing resources.

BTW, HPC stands for High Performance Computing.

Day #1 (Wed 2nd Feb)

11:50 – 15:00 - Feb 2022 / Getting started with scientific computing
15:00 – 15:45 (especially important) Help connecting to Triton (Aalto), Zoom link by email to registered participants.

Day #2 (Thu 3rd Feb)

All times approximate, breaks every hour.

11:50 – 12:30: What can you do with a computational cluster?
About clusters and your work
Real example 1: Large-scale computing with array jobs
Real example 2: Large-scale parallel computing
12:30 – 15:00: Running your first jobs in the queue
Interactive jobs
Serial Jobs
Monitoring job progress and job efficiency
15:00 – 15:30: Other things you should know about the HPC environment
Software modules
Data storage
Remote access to data
15:30 – 16:00: Questions to presenters

Day #3 (Fri 4th Feb)

All times approximate, breaks every hour.

11:50 – 13:00: Simple parallelization with array jobs
Array jobs
13:00 – 14:00: Using more than one CPU at the same time
Parallel computing
14:00 – 14:30: Laptops to Lumi, Jussi Enkovaara, CSC
You now know the basics of using a computing cluster. What if you need more than what a university can provide? CSC (and other national computing centers) have even more resources, and this is a tour of them.
14:40 – 15:30: Running jobs that can utilize GPU hardware
GPU computing
15:30 – 16:00: Questions to presenters

Cost: Free of charge for FGCI consortium members including University of Helsinki employees and students.

Course prerequisite requirements and other details specific to University of Helsinki

Participants will be provided with access to Kale & Turso for running examples. Participants are expected to have an SSH client installed. You can use VDI as a convenient access point.

If you do not yet have access to Kale / Turso, request an account now. See Kale User Guide for instructions.
Then log in and verify that you have access (see HPC Environment User Guide).
- To access Kale, you have to be within the university firewall, via VDI, a jumphost, or VPN. Eduroam on University premises is also within our firewall when accessed with a University account, but eduroam in other organizations is not. Examples of jumphosts are markka.it.helsinki.fi, pangolin.it.helsinki.fi, melkki.cs.helsinki.fi, melkinkari.cs.helsinki.fi, login.physics.helsinki.fi. There are many, because at any given point in time, some of them are in a bad mood due to the University AD implementation.
- You get access with the command

ssh kale.grid.helsinki.fi

- Or, if your username on Kale differs from the one on the machine where you run the ssh client (possible e.g. when using VPN), with the command

ssh username@kale.grid.helsinki.fi

- If you are connecting from Windows 10, you should be able to install an ssh client from the software store. In earlier Windows versions, you need to install PuTTY (Side note: putty dot org is an advertisement site trying to get you to install something else). For using graphical programs, you might need to install an X server on Windows. It is far easier to just use VDI in that case.
- There are some differences in the directory structure compared to e.g. Aalto. At University of Helsinki:

$PROJ points to a project directory /proj/$USER/
$WRKDIR points to the working directory /wrk/users/$USER/
$HOME is a home directory intended for only profile files. Cache and other data should be redirected to $WRKDIR or $PROJ as appropriate.

- Make sure that you have the Aalto software repositories available. You can add the repositories by loading the fgci-common module with the command:

module load fgci-common
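The connection step in the list above can be sketched as a small helper that prints the ssh command to use, with or without a jumphost. This is only an illustration: kale_ssh_command is a hypothetical function name, "username" is a placeholder, and the -J (ProxyJump) option assumes a reasonably recent OpenSSH client (7.3 or newer). The helper prints the command instead of running it, so you can check it before connecting.

```shell
#!/usr/bin/env bash
# Sketch: print the ssh command for reaching Kale, optionally via a jumphost.
# kale_ssh_command is a hypothetical helper, not part of any university tooling.

kale_ssh_command() {
    local user="$1"          # your university username (placeholder)
    local jumphost="${2:-}"  # optional jumphost, e.g. melkki.cs.helsinki.fi
    if [ -n "$jumphost" ]; then
        # -J (ProxyJump) connects to the jumphost first, then on to Kale
        echo "ssh -J ${user}@${jumphost} ${user}@kale.grid.helsinki.fi"
    else
        echo "ssh ${user}@kale.grid.helsinki.fi"
    fi
}

# Inside the university firewall (VDI, VPN, eduroam on campus):
kale_ssh_command username
# From outside the firewall, through one of the jumphosts listed above:
kale_ssh_command username melkki.cs.helsinki.fi
```

Once logged in, running echo "$PROJ" "$WRKDIR" followed by module load fgci-common is a quick way to confirm that the directories and the Aalto repositories are set up as described.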

In the interactive exercises, the Triton user guide instructs you to use the 'interactive' queue. Kale has no special queue for that; use the 'short' queue for the interactive course jobs instead.
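As a rough sketch of what this means in practice, the helper below prints an srun command line for an interactive shell, defaulting to the 'short' queue. It assumes the cluster uses Slurm, as the Triton exercises do; interactive_command is a hypothetical name, and the time and memory values are illustrative defaults, not site recommendations.

```shell
#!/usr/bin/env bash
# Sketch: build an srun command line for an interactive shell.
# Assumes Slurm; interactive_command is a hypothetical helper and the
# default time and memory values are only illustrative.

interactive_command() {
    local partition="${1:-short}"   # 'short' on Kale, 'interactive' on Triton
    local time="${2:-01:00:00}"     # wall-clock limit, HH:MM:SS
    local mem="${3:-1G}"            # memory request
    echo "srun -p ${partition} --time=${time} --mem=${mem} --pty bash"
}

# What to run on Kale where the Triton guide says '-p interactive':
interactive_command
# The same request as it would look on Triton, for comparison:
interactive_command interactive
```

In other words, when an exercise uses -p interactive, substituting -p short should be the only change needed on Kale.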

Storage

$WRKDIR is based on Lustre, is optimized for high-throughput, low-latency workloads, and has a large capacity. See the Lustre User Guide for more details. $WRKDIR is always local to a cluster. It is not backed up, and it is intended for storing the temporary files of your analysis. You should remove the files at the end of each run, if possible. Whenever there is a shortage of space, the admins remove old files, so $WRKDIR is neither long-term data storage nor safe by any means.
$PROJ is based on NFS and is therefore much slower, but it is a good place for binaries, source code and other files that need backup. $PROJ has much higher latency and much lower throughput, and is not suitable for runtime datasets. $PROJ is shared between all clusters.
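One way to follow this division of labour is sketched below: runtime files go to a per-run directory under $WRKDIR, only the final result is copied to $PROJ, and the scratch directory is removed when the script ends. The directory and file names are made up for illustration, the seq/wc lines stand in for a real analysis, and the fallback to temporary directories exists only so that the sketch also runs on a machine where $WRKDIR and $PROJ are not set.

```shell
#!/usr/bin/env bash
# Sketch: runtime files under $WRKDIR, only results that need backup in $PROJ.
set -euo pipefail

# Fall back to temporary directories so the sketch also runs off-cluster.
WRKDIR="${WRKDIR:-$(mktemp -d)}"
PROJ="${PROJ:-$(mktemp -d)}"

# Per-run scratch directory on the fast file system (name is illustrative).
run_dir="$(mktemp -d "${WRKDIR}/run.XXXXXX")"

# Remove the scratch directory when the script exits, even after a failure.
trap 'rm -rf "$run_dir"' EXIT

# Placeholder for the real analysis: an intermediate file and a small result.
seq 1 1000 > "${run_dir}/intermediate.txt"
wc -l < "${run_dir}/intermediate.txt" | tr -d ' ' > "${run_dir}/result.txt"

# Copy only what needs to be kept to the backed-up project directory.
mkdir -p "${PROJ}/results"
cp "${run_dir}/result.txt" "${PROJ}/results/result.txt"
cat "${PROJ}/results/result.txt"
```

The trap keeps $WRKDIR tidy without relying on the cleanup being the last line that happens to run.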

Other

If you aren’t familiar with the Linux shell, watch the video.
More specific remote access instructions are at Remote access to University resources.