HPC 2024/06  Summer Kickstart

Last modified by Sami Maisala on 2024/06/04 10:00

JUN2024 / FGCI Summer KICKSTART

Part of the Scientific Computing in Practice lecture series at Aalto University.

Audience: All FGCI consortium members looking for an HPC crash course.

About the course

This is the FGCI-wide kickstart for all FGCI consortium members. We’ll have support representatives from several universities. Most of the material will be common to all participants; in addition, we organize breakout rooms for different sites (a sort of parallel session) when needed.

Overall, it is a three-day kickstart for researchers to get started with the available computational resources at FGCI and CSC. The course is modular; see the tentative schedule below. On day one we start with a basic HPC intro, go through the available resources at CSC, and then switch to the practicalities of the FGCI sites. On days two and three we cover, step by step, how to get started on the local computational clusters. Learning by doing.

By the end of the course you will have hints, ready-made solutions and copy/paste examples for finding, running and monitoring your applications and managing your data, as well as for optimizing your workflow in terms of filesystem traffic, memory usage, etc.

Aalto users note: the course is obligatory for all new Triton users and recommended for everyone interested in the field.

Times

Time, date: Tue 6. Jun - Thu 8. Jun, 11:50-16:00 EEST

Place: Online: Zoom link is TBA

Lecturers: Aalto Science IT and CSC people

Registration: registration link

The daily schedule is flexible; below is the tentative plan. There will be frequent breaks. You will be given time to try things out and ask questions: it is more like an informal help session to get you started with the computing resources.

BTW, HPC stands for High Performance Computing.

  • Day #1 (Tue 6.jun): Basics and background

    • 11:50–12:00: Joining time/icebreaker
    • 12:00–12:10: Introduction, about the course. Richard Darst and other staff. Materials: Summer Kickstart intro
    • 12:10–12:25: From data storage to your science. Enrico Glerean and Simo Tuomisto
      • Data is how most computational work starts, whether it is externally collected, simulation code, or generated. These days you can even work on data remotely, and these workflows aren’t obvious. We discuss how data storage choices lead to computational workflows. Materials: SciComp Intro
    • 12:25–12:50: What is parallel computing? An analogy with cooking. Enrico Glerean and Thomas Pfau
      • In workshops such as this, you will hear lots about parallel computing and how you need it, but rarely get an understandable introduction to how the different approaches relate and which are right for you. Here, we give an understandable metaphor based on preparing large meals. Slides
    • 13:00–13:25: How big is my calculation? Measuring your needs. Simo Tuomisto and Thomas Pfau
      • People often wonder how many resources their job needs, either on their own computer or on the cluster. When should you move to a cluster? How many resources should you request? We’ll go over how we think about these problems.
    • 13:25–13:50: Behind the scenes: the humans of scientific computing. Richard Darst and Teemu Ruokolainen
      • Who are we that teach this course and provide SciComp support? What makes it such a fascinating career? Learn about what goes on behind the scenes and how you could join us.
    • 14:00–14:45: Connecting to an HPC cluster. Thomas Pfau and Simo Tuomisto
      • Required if you are attending the Triton/HPC tutorials the following days, otherwise the day is done.
      • 14:00–14:20?: Livestream introduction to connecting
      • 14:??–15:00: Individual help time in Zoom (links sent to registered participants)
      • Break until 15:00 once you get connected.
      • Material: Connecting to Triton
    • 15:00–15:25: Using the cluster from the shell (files and directories). Richard Darst and Teemu Ruokolainen
      • Once we connect, what can we do? We’ll get a tour of the shell, files and directories, and how we copy basic data to the cluster. Material: Using the cluster from a shell.
    • 15:25–15:50: What can you do with a computational cluster?
      • See several real examples of how people use the cluster (what you can do at the end of the course): 1) Large-scale computing with array jobs, 2) Large-scale parallel computing. Demo.
    • Preparation for day 2:
      • Remember to read/watch the “shell crash course” (see “Preparation” below) if you are not yet confident with the command line. This will be useful for tomorrow.

  • Day #2 (Wed 7.jun): Basic use of a cluster (Richard Darst, Simo Tuomisto)

    • 11:50–12:00: Joining time/icebreaker
    • 12:00–12:05: Introduction to days 2-3
      • About clusters and your work
    • 12:05–12:30: Structure of a cluster: The Slurm queueing system
      • Slurm: the queuing system
    • 12:30–15:00: Running your first jobs in the queue
      • Interactive jobs
      • Serial Jobs
      • Monitoring job progress and job efficiency
    • 15:00–15:30: Other things you should know about the HPC environment
      • Software modules
      • Data storage
      • Remote access to data
    • 15:30–16:00: Q&A

  • Day #3 (Thu 8.jun): Advanced cluster use (Simo Tuomisto, Richard Darst)

    • 11:50–12:00: Joining time/icebreaker
    • 12:00–12:30: What does “parallel” mean?
      • Parallel computing
    • 12:30–14:00: Forms of parallelization
      • Array jobs
      • Shared-memory parallelism (doc: /triton/tut/parallel-shared)
      • MPI parallelism (doc: /triton/tut/parallel-mpi)
    • 14:00–14:30: Laptops to Lumi
      • You now know the basics of using a computing cluster. What if you need more than what a university can provide? CSC (and other national computing centers) have even more resources, and this is a tour of them. Slides from 2022 here.
    • 14:40–15:30: Running jobs that can utilize GPU hardware
      • GPU computing
    • 15:30–16:00: Ask us anything

Cost: Free of charge for FGCI consortium members including University of Helsinki employees and students.

Course prerequisite requirements and other details specific to University of Helsinki

Participants will be provided with access to Kale and Turso for running the examples. Participants are expected to have an SSH client installed. You can use VDI as a convenient access point. Connect with the command:

ssh turso.cs.helsinki.fi

    • Or, if your username on Kale differs from the one on the machine where you run the SSH client (possible e.g. when using VPN), with the command:

ssh username@turso.cs.helsinki.fi

    • If you are connecting from Windows 10, you should be able to install an SSH client from the software store. On earlier Windows versions you need to install PuTTY (side note: putty dot org is an advertisement site trying to get you to install something else). For graphical programs you might need to install an X server on Windows; it is far easier to just use VDI in that case.
    • There are some differences in the directory structure compared to e.g. Aalto. At the University of Helsinki:

      • $PROJ points to a project directory /proj/$USER/
      • $WRKDIR points to the working directory /wrk/users/$USER/
      • $HOME is a home directory intended for only profile files. Cache and other data should be redirected to $WRKDIR or $PROJ as appropriate.

    • Make sure that you have the Aalto software repositories available. You can add them by loading the fgci-common module with the command:

module load fgci-common
    • In the interactive exercises, the Triton user guide instructs you to use the 'interactive' queue. Kale has no special queue for that; use the 'short' queue for the interactive course jobs instead.
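
The site-specific notes above can be sketched as a small shell setup. This is only an illustration, not an official configuration: the function names are hypothetical, the time and memory values are arbitrary placeholders, and it assumes that your tools honour the standard XDG_CACHE_HOME and PIP_CACHE_DIR variables and that $WRKDIR is already set by the cluster environment.

```shell
#!/usr/bin/env bash
# Sketch only: helper names are hypothetical, not part of any cluster tooling.

# Redirect caches away from $HOME into $WRKDIR.
# Assumption: the tools you use honour XDG_CACHE_HOME / PIP_CACHE_DIR.
setup_cache_dirs() {
    local base="${WRKDIR:?WRKDIR is not set}"
    export XDG_CACHE_HOME="$base/.cache"
    export PIP_CACHE_DIR="$XDG_CACHE_HOME/pip"
    mkdir -p "$PIP_CACHE_DIR"
}

# Print (do not run) an srun command for an interactive shell.
# The partition defaults to 'short', as advised above for Kale;
# the --time and --mem values are illustrative assumptions.
interactive_cmd() {
    local partition="${1:-short}"
    printf 'srun --partition=%s --time=00:30:00 --mem=1G --pty bash\n' "$partition"
}
```

After sourcing the file, calling setup_cache_dirs once per login session moves the caches, and interactive_cmd prints a command you can review and then paste. Printing instead of executing keeps the sketch safe to try on a machine without Slurm.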

Storage

  • $WRKDIR is based on Lustre and is optimized for high-throughput, low-latency workloads, with a large capacity. See the Lustre User Guide for more details. $WRKDIR is always local to a cluster. It is not backed up, and it is intended for storing the temporary files of your analysis. You should remove the files at the end of each run, if possible. Whenever there is a shortage of space, the admins will remove old files, so $WRKDIR is not long-term storage, nor safe by any means.
  • $PROJ is based on NFS and is therefore much slower, but it offers a good place for binaries, source code, etc. that need backup. $PROJ has much higher latency and much lower throughput, and is not suitable for runtime datasets. $PROJ is shared between all clusters.
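
One way to follow the advice about removing temporary files at the end of each run is to give every run its own scratch directory under $WRKDIR and delete it afterwards. The sketch below assumes $WRKDIR is set as described above; run_in_scratch and the job.XXXXXX naming are hypothetical choices, not a site convention.

```shell
#!/usr/bin/env bash
# Sketch only: run_in_scratch is a hypothetical helper, not a cluster command.

# Run a command inside a fresh temporary directory under $WRKDIR and
# remove that directory afterwards, whether the command succeeds or fails.
# Note: the directory is not removed if the script itself is killed.
run_in_scratch() {
    local scratch status
    scratch="$(mktemp -d "${WRKDIR:?WRKDIR is not set}/job.XXXXXX")" || return 1
    # Subshell: the caller's working directory stays unchanged.
    ( cd "$scratch" && "$@" )
    status=$?
    rm -rf "$scratch"
    # Pass the command's exit status back to the caller.
    return "$status"
}
```

For example, `run_in_scratch ./my_analysis.sh` (a placeholder script name) would run the analysis in its own directory. Anything worth keeping has to be copied elsewhere, for instance to $PROJ, by the command itself before it finishes.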

Other

Logical View of HY Clusters

Additional course info at: it4sci <at> helsinki <dot> fi