Jan 2021 / FGCI Winter Kickstart 2021
Part of the Scientific Computing in Practice lecture series at Aalto University.
Audience: All FGCI consortium members looking for an HPC crash course.
About the course
This is the first FGCI-wide kickstart for all FGCI consortium members. We’ll have support representatives from several universities. Most of the material will be common to all participants; in addition, we will organize breakout rooms for different sites (a sort of parallel session) when needed.
Overall, it is a three-day kickstart for researchers to get started with the computational resources available at FGCI and CSC. The course is modular; see the tentative schedule below. On day one we start with a basic HPC intro, go through the resources available at CSC, and then switch to the practicalities of the FGCI sites. On days two and three we cover, step by step, how to get started on the local computational clusters. Learning by doing.
In addition, on the last day we will have an HTCondor introduction for all interested (UH note: HTCondor is not applicable to the University of Helsinki, but you are free to attend).
By the end of the course you will have hints, ready-made solutions and copy/paste examples on how to find, run and monitor your applications and manage your data, as well as on how to optimize your workflow in terms of filesystem traffic, memory usage, etc.
Aalto users note: the course is obligatory for all new Triton users and recommended to all interested in the field.
Time, date: Fri 29.1, Mon 2.2, Wed 3.2, 12:00-16:00 EET
Place: Online: Zoom link is TBA
Lecturers: Aalto Science IT and CSC people
Registration: registration link
The daily schedule is flexible; below is the tentative plan. There will be frequent breaks. You will be given time to try things out and ask questions; it is more like an informal help session to get you started with the computing resources.
BTW, HPC stands for High Performance Computing.
Day #1 (Mon 8.jun):
Module #1.1 (15m): Welcome, course details
Module #1.2 (1h): HPC crash course: what is behind the front-end // lecture // HPC fundamentals: terminology, architectures, interconnects, infrastructure behind, as well as MPI vs shared memory // Ivan Degtyarenko
Module #1.3 (1h): CSC resources overview // lecture with demos // An overview of CSC computing environment and services including Puhti supercomputer, Allas data management solution, Cloud services, notebooks, containers, etc // Jussi Enkovaara and Henrik Nortamo
Module #1.4 (1h): Gallery of computing workflows // There are more options than just Triton by ssh, as we will learn later. We’ll give an overview of all the ways you can work. // Enrico Glerean
- Aalto: Remote workflows at Aalto
Module #1.5 (.5h): Connecting to the cluster // tutorial // Get connected to Triton in preparation for day 2 // Enrico Glerean
- Aalto: Connecting to Triton tutorial – if you can ssh to Triton and run hostname, you are ready for tomorrow.
Day #2 (Tue 9.jun):
Module #2.1 (4h): Getting started on the cluster // tutorial // SLURM basics, software, and storage. Workflow, running and monitoring serial jobs on Triton, interactively and in batch mode. Modules and toolchains, special resources like GPUs // Richard Darst
Day #3 (Wed 10.jun):
Module #3.1 (2h): Advanced SLURM // tutorial // Running in parallel with MPI and OpenMP, array jobs, running on GPU with --gres, local drives, constraints // Simo Tuomisto
Module #3.2 (1.5h): HTCondor (not applicable to the University of Helsinki) // lecture with demos // Did you know that department workstations can be used for distributed computing? HTCondor lets you // Matthew West
Cost: Free of charge for FGCI consortium members including University of Helsinki employees and students.
Course prerequisite requirements and other details specific to University of Helsinki
Participants will be provided with access to Kale & Ukko2 for running examples. Participants are expected to have an SSH client installed. You can use VDI as a convenient access point.
- If you do not yet have access to Kale / Ukko2, request an account now. See the Kale User Guide for instructions.
- Then, log in to Kale / Ukko2 and verify that you have access.
- To access Kale, you have to be within the university firewall, either via VDI, a jumphost, or VPN. Eduroam on University premises is also within our firewall when accessed with a University account, but eduroam at other organizations is not. Examples of jumphosts are markka.it.helsinki.fi, pangolin.it.helsinki.fi, melkki.cs.helsinki.fi, melkinkari.cs.helsinki.fi and login.physics.helsinki.fi. There are many, because at any given point in time some of them are in a bad mood due to the University AD implementation.
- You'll get access with the command
Or, if your username on Kale differs from the one on the machine where you run the ssh client (possible e.g. when using VPN), with the command
- If you are connecting from Windows 10, you should be able to install an ssh client from the software store. In earlier Windows versions, you need to install PuTTY (side note: putty dot org is an advertisement site trying to get you to install something else). To use graphical programs, you might need to install an X server on Windows. It is far easier to just use VDI in that case.
- There are some differences in the directory structure compared to e.g. Aalto. At the University of Helsinki:
$PROJ points to a project directory /proj/$USER/
$WRKDIR points to the working directory /wrk/users/$USER/
$HOME is a home directory intended for only profile files. Cache and other data should be redirected to $WRKDIR or $PROJ as appropriate.
- Make sure that you have the Aalto software repositories. You can add the repositories by loading the fgci-common module with the command:
module load fgci-common
- In the interactive exercises, the Triton user guide instructs you to use the 'interactive' queue. On Kale there is no special queue for that; instead you should use the 'short' queue for the interactive course jobs.
- $WRKDIR is based on Lustre; it is optimized for high-throughput, low-latency workloads and has a large capacity. See the Lustre User Guide for more details. $WRKDIR is always local to a cluster.
- $PROJ is based on NFS and is therefore much slower, but it is a good place for binaries, source code, etc. that need backup. $PROJ has much greater latency and much lower throughput, and is not suitable for runtime datasets. $PROJ is shared between all clusters.
- If you aren’t familiar with the Linux shell, watch the video.
- More specific remote access instructions are at Remote access to University resources.
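The directory conventions in the list above ($HOME for profile files only, caches and other data redirected to $WRKDIR) could be applied in a login script along the following lines. This is a minimal sketch, not an official University of Helsinki recipe: the `.cache` location is a hypothetical choice, it assumes that the tools you use honour the `XDG_CACHE_HOME` convention or pip's `PIP_CACHE_DIR` variable, and it falls back to a temporary directory when $WRKDIR is not set so that it can be tried off-cluster.

```shell
#!/usr/bin/env bash
# Sketch: keep caches out of $HOME, which is intended for profile files only.
# ASSUMPTION: on the cluster $WRKDIR is already set (to /wrk/users/$USER).
# Off-cluster we fall back to a throwaway directory so the sketch still runs.
WRKDIR="${WRKDIR:-$(mktemp -d)}"

# Hypothetical cache location under the working directory.
CACHE_ROOT="$WRKDIR/.cache"
mkdir -p "$CACHE_ROOT/pip"

# Many tools follow the XDG convention; pip also reads PIP_CACHE_DIR.
export XDG_CACHE_HOME="$CACHE_ROOT"
export PIP_CACHE_DIR="$CACHE_ROOT/pip"

echo "Caches now go to: $XDG_CACHE_HOME"
```

Tools that ignore both variables need their own setting; check each tool's documentation before relying on this.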
Logical View of HY Clusters
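The access path described above (your own machine, then a jumphost, then Kale) can be sketched with OpenSSH's `-J` (ProxyJump) option. This is an illustration only: `kale.example.invalid` is a placeholder rather than the real login address, `alice` is a hypothetical username, and the `run` helper is introduced here so that the command is printed instead of executed unless `DRY_RUN=0` is set. Take the actual host name from the Kale User Guide.

```shell
#!/usr/bin/env bash
# Sketch: reach the cluster through a jumphost when outside the firewall.
UNI_USER="alice"                      # hypothetical university username
JUMP_HOST="markka.it.helsinki.fi"     # one of the jumphosts listed above
KALE_HOST="kale.example.invalid"      # PLACEHOLDER, not the real address

# Print the command by default; execute it only when DRY_RUN=0.
run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Log in via the jumphost and run `hostname` as a quick access check.
run ssh -J "${UNI_USER}@${JUMP_HOST}" "${UNI_USER}@${KALE_HOST}" hostname
```

Inside the university firewall (VDI, VPN, or eduroam on University premises) the `-J` part should not be needed, and giving the username explicitly as `user@host` covers the case where the local and cluster usernames differ.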
Additional course info at: it4sci <at> helsinki <dot> fi