HPC 2024/06 Summer Kickstart
JUN2024 / FGCI Summer KICKSTART
Part of the Scientific Computing in Practice lecture series at Aalto University.
Audience: All FGCI consortium members looking for the HPC crash course.
About the course
The FGCI-wide kickstart for all FGCI consortium members. We’ll have support representatives from several universities. Most of the material will be common to all participants; in addition, we organize breakout rooms for different sites (a sort of parallel session) when needed.
Overall, it is a three-day kickstart for researchers to get started with the computational resources available at FGCI and CSC. The course is modular; see the tentative schedule below. On day one we start with a basic HPC intro, go through the resources available at CSC, and then switch to the practicalities of the FGCI sites. On days two and three we cover, step by step, how to get started on the local computational clusters. Learning by doing.
By the end of the course you will have hints, ready-made solutions, and copy/paste examples for finding, running, and monitoring your applications and managing your data, as well as for optimizing your workflow in terms of filesystem traffic, memory usage, etc.
Aalto users note: the course is obligatory for all new Triton users and recommended for everyone interested in the field.
Times
Time, date: Tue 6. Jun - Thu 8. Jun 11:50-16:00 EEST
Place: Online: Zoom link is TBA
Lecturing by: Aalto Science IT and CSC people
Registration: registration link
The daily schedule is flexible; below is the tentative plan. There will be frequent breaks. You will be given time to try things out and ask questions; it’s more like an informal help session to get you started with the computing resources.
BTW, HPC stands for High Performance Computing.
Day #1 (Tue 6.jun): Basics and background
11:50–12:00: Joining time/icebreaker
12:00–12:10: Introduction, about the course Richard Darst and other staff Materials: Summer Kickstart intro
12:10–12:25: From data storage to your science Enrico Glerean and Simo Tuomisto
Data is how most computational work starts, whether it is externally collected, simulation code, or generated. These days you can even work on data remotely, but these workflows aren’t obvious. We discuss how data storage choices lead to computational workflows. Materials: SciComp Intro
12:25–12:50: What is parallel computing? An analogy with cooking Enrico Glerean and Thomas Pfau
In workshops such as this, you will hear a lot about parallel computing and how you need it, but rarely get an understandable introduction to how the different approaches relate and which are right for you. Here, we give an understandable metaphor based on preparing large meals. Slides
13:00–13:25: How big is my calculation? Measuring your needs. Simo Tuomisto and Thomas Pfau
People often wonder how many resources their job needs, either on their own computer or on the cluster. When should you move to a cluster? How many resources to request? We’ll go over how we think about these problems.
13:25–13:50: Behind the scenes: the humans of scientific computing Richard Darst and Teemu Ruokolainen
Who are we that teach this course and provide SciComp support? What makes it such a fascinating career? Learn about what goes on behind the scenes and how you could join us.
14:00–14:45: Connecting to an HPC cluster Thomas Pfau and Simo Tuomisto
Required if you are attending the Triton/HPC tutorials the following days, otherwise the day is done.
14:00–14:20?: Livestream introduction to connecting
14:??–15:00: Individual help time in Zoom (links sent to registered participants)
Break until 15:00 once you get connected.
Material: Connecting to Triton
15:00–15:25: Using the cluster from the shell (files and directories) Richard Darst and Teemu Ruokolainen
Once we connect, what can we do? We’ll get a tour of the shell, files and directories, and how we copy basic data to the cluster. Material: Using the cluster from a shell.
15:25–15:50: What can you do with a computational cluster?
See several real examples of how people use the cluster (what you can do at the end of the course): 1) Large-scale computing with array jobs, 2) Large-scale parallel computing. Demo.
Preparation for day 2:
Remember to read/watch the “shell crash course” (see “Preparation” below) if you are not yet confident with the command line. This will be useful for tomorrow.
Day #2 (Wed 7.jun): Basic use of a cluster (Richard Darst, Simo Tuomisto)
11:50–12:00: Joining time/icebreaker
12:00–12:05: Introduction to days 2-3
12:05–12:30: Structure of a cluster: The Slurm queueing system
12:30–15:00: Running your first jobs in the queue
Monitoring job progress and job efficiency
15:00–15:30: Other things you should know about the HPC environment
15:30–16:00: Q&A
Day #3 (Thu 8.jun): Advanced cluster use (Simo Tuomisto, Richard Darst)
11:50–12:00: Joining time/icebreaker
12:00–12:30: What does “parallel” mean?
12:30–14:00: Forms of parallelization
doc: /triton/tut/parallel-shared
doc: /triton/tut/parallel-mpi
14:00–14:30: Laptops to Lumi
You now know the basics of using a computing cluster. What if you need more than what a university can provide? CSC (and other national computing centers) have even more resources, and this is a tour of them. Slides from 2022 here.
14:40–15:30: Running jobs that can utilize GPU hardware
15:30–16:00: Ask us anything
Cost: Free of charge for FGCI consortium members including University of Helsinki employees and students.
Course prerequisite requirements and other details specific to University of Helsinki
Participants will be provided with access to Kale & Turso for running examples. Participants are expected to have an SSH client installed. You can use VDI as a convenient access point.
- If you do not yet have access to Turso, request an account now. See HPC Environment Guide - Access to cluster for instructions.
- Then, log in to HPC Environment User Guide and verify that you have access.
- To access Turso, you have to be inside the university firewall, either through VDI, a jumphost, or VPN. Eduroam on University premises is also inside our firewall when accessed with a University account, but eduroam at other organizations is not. Examples of jumphosts are markka.it.helsinki.fi, pangolin.it.helsinki.fi, melkki.cs.helsinki.fi, melkinkari.cs.helsinki.fi, login.physics.helsinki.fi. There are many because, at any given time, some of them are in a bad mood due to the University AD implementation.
- You get access with the command
ssh turso.cs.helsinki.fi
Or, if your username on kale is not the same as on the machine where you are running the ssh client (possible e.g. when using VPN), with the command
ssh username@turso.cs.helsinki.fi
- If you are connecting from Windows 10, you should be able to install an ssh client from the software store. In earlier Windows versions, you need to install PuTTY (side note: putty dot org is an advertisement site trying to get you to install something else). To use graphical programs, you might need to install an X server on Windows. It is far easier to just use VDI in that case.
- There are some differences in the directory structure compared to, e.g., Aalto. At University of Helsinki:
$PROJ points to a project directory /proj/$USER/
$WRKDIR points to the working directory /wrk/users/$USER/
$HOME is a home directory intended for only profile files. Cache and other data should be redirected to $WRKDIR or $PROJ as appropriate.
- Make sure that you have the Aalto software repositories available. You can add them by loading the fgci-common module with the command:
module load fgci-common
- In the interactive exercises, the Triton user guide instructs you to use the 'interactive' queue. Kale has no special queue for that; instead, use the 'short' queue for the interactive course jobs.
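The connection and queue notes in the list above can be sketched as two command lines. This is only an illustration under stated assumptions: `username` is a placeholder, the jumphost is one arbitrary pick from the examples listed, the 'short' queue is assumed to be a Slurm partition of that name, and the time and memory values are made up rather than site recommendations. The sketch only prints the commands so they can be checked before use.

```shell
# Placeholder values: replace with your own university username and
# whichever jumphost from the list above happens to work for you.
UH_USER="username"
JUMPHOST="melkki.cs.helsinki.fi"

# Run on your own machine when you are outside the university firewall.
# -J (ProxyJump) is available in OpenSSH 7.3 and newer.
CONNECT_CMD="ssh -J ${UH_USER}@${JUMPHOST} ${UH_USER}@turso.cs.helsinki.fi"

# Run on the cluster after logging in. Assumes the 'short' queue is a
# Slurm partition called "short"; time and memory are illustrative only.
INTERACTIVE_CMD="srun --partition=short --time=00:30:00 --mem=1G --pty bash"

# Print both commands for inspection instead of executing them.
printf '%s\n' "$CONNECT_CMD" "$INTERACTIVE_CMD"
```

Inside the university firewall (VDI, VPN, or eduroam on University premises) the `-J` part is unnecessary and the plain `ssh` commands above are enough.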
Storage
- $WRKDIR is based on Lustre; it is optimized for high-throughput, low-latency workloads and has a large capacity. See the Lustre User Guide for more details. $WRKDIR is always local to a cluster. It is not backed up, and it is intended for storing the temporary files of your analysis. You should remove the files at the end of each run, if possible. Whenever there is a shortage of space, the admins will remove old files, so $WRKDIR is not long-term data storage, nor safe by any means.
- $PROJ is based on NFS and is therefore much slower; however, it is a good place for binaries, source code, etc. that need backup. $PROJ has much higher latency and much lower throughput, and is not suitable for runtime datasets. $PROJ is shared between all clusters.
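One possible way to follow the storage advice above in a job script is sketched below: temporary files go to a per-run directory under $WRKDIR that is removed when the script exits, and caches are pointed away from $HOME. The names here are assumptions for illustration: the `myrun` prefix and the `cache` directory are made up, `XDG_CACHE_HOME` is respected by many but not all tools, and the `/tmp` fallback exists only so the sketch can be tried outside the cluster.

```shell
# WRKDIR is set on the cluster; the fallback is only for trying this elsewhere.
WRKDIR="${WRKDIR:-${TMPDIR:-/tmp}}"

# Per-run scratch directory under $WRKDIR ("myrun" is a made-up prefix).
RUN_DIR="$(mktemp -d "${WRKDIR}/myrun.XXXXXX")"

# Remove the temporary files when the script exits, as recommended above.
trap 'rm -rf "$RUN_DIR"' EXIT

# Keep caches out of $HOME. Many tools honour XDG_CACHE_HOME, but not all.
export XDG_CACHE_HOME="${WRKDIR}/cache"
mkdir -p "$XDG_CACHE_HOME"

# Stand-in for the real analysis: write intermediate data to the scratch dir.
echo "intermediate data" > "${RUN_DIR}/step1.tmp"

# Copy only results worth keeping to backed-up storage, for example:
# cp "${RUN_DIR}/result.dat" "${PROJ}/"
```

The final copy is left commented out because what counts as a result, and where it belongs under $PROJ, depends on the project.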
Other
- If you aren’t familiar with the Linux shell, watch the video.
- More specific remote access instructions are at Remote access to University resources.