
1.0 Access

The new HPC cluster ukko2 is now in production. To be able to use the cluster, you need to either

  • be a member of the CS staff, or
  • belong to the IDM group grp-cs-ukko2

Please note that it might take up to two hours after you have been added to either group before your home directory is created.


Remote connection

Connections are only allowed from the helsinki.fi domain. VPN or eduroam is not sufficient. To access ukko2 from outside the domain (e.g. from home), add the following to your ~/.ssh/config:

Host ukko2.cs.helsinki.fi
   ProxyCommand ssh username@melkinpaasi.cs.helsinki.fi -W %h:%p
Note: with OpenSSH 7.3 and newer, you can substitute the ProxyCommand with the more human-readable line: ProxyJump melkinpaasi.cs.helsinki.fi
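For example, with OpenSSH 7.3 or newer the whole proxy setup could be written in ~/.ssh/config as follows (a minimal sketch; replace username with your own university account):

Host ukko2.cs.helsinki.fi
   User username
   ProxyJump melkinpaasi.cs.helsinki.fi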

To access a login node:

ssh <username>@ukko2.cs.helsinki.fi

The batch scheduling system is the most prominent difference between ukko and ukko2. Instead of logging directly into a compute node and executing jobs there interactively, users now log in to a login node and submit jobs via the batch scheduler. The login node is for batch job management and the compiler/development environment only; do not execute any production jobs there. Slurm handles the resource requests and optimises the resource allocation.

Another change is the Module System. Modules are used to manage software packages, compiler environments, etc. Users can load and unload modules freely.
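As a quick illustration, typical module commands look like the following (the module name gcc is only an example; the modules actually available on ukko2 may differ):

module avail          # list available modules
module load gcc       # load a module into the environment
module list           # show currently loaded modules
module unload gcc     # unload a single module
module purge          # unload all loaded modules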

1.1 Ukko 1 Resources

    • At this time a single Ukko 1 Cubbli Linux node is available, see the instructions. The node has 4 cores and 32 GB of RAM.

1.2 Ukko 2 Resources

    • A single login node, ukko2, serves logins, compiler environments and all batch scheduling functions
    • 31 regular compute nodes (ukko-02 - ukko-32): 28 cores, 2 threads per core and 256 GB RAM
    • Big memory nodes (ukko2-pekka, ukko2-paavo): 96 cores, 2 threads per core and 3 TB RAM
    • GPU nodes (ukko2-g01, ukko2-g02): 28 cores, 2 threads per core, 512 GB RAM and 4 Tesla P100 GPU cards

1.3 I/O, disks and filesystems

The home directory is the same as on the old ukko: all user files are available on Ukko2. Users can also access the ukko files from department computers using the path /cs/work/home/username. The local work directory is /cs/work/scratch. Please do not use the local drives of the nodes; if you need local drives as a resource, please consider using Kale instead.
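For example, a common pattern is to create a personal directory under the work area and run jobs from there (a sketch only; the exact layout under /cs/work/scratch is an assumption, so check the existing structure first):

mkdir -p /cs/work/scratch/$USER/myproject
cd /cs/work/scratch/$USER/myproject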

2.0 Scientific software

Most development tools and software packages are available through modules, but not all. Here are references to those which you might need to install locally at this time.

3.0 Queuing system

Jobs are managed by SLURM, and the runtime environment is managed with the aforementioned modules. Jobs are submitted from the ukko2 login node (ukko2.cs.helsinki.fi). There are six production queues and one short test queue. Some queues share resources for better system utilisation:

Queue name   Wall Time limit   Cores   Memory per core   Notes
short        24h               1032    8GB - 32GB
long         14 days           868     8GB
extralong    60 days           28      8GB               Single node
bigmem       7 days            192     32GB              Two nodes
gpu          7 days            56      18GB              Have to reserve with #SBATCH -p gpu and #SBATCH --gres=gpu:<number of GPUs>
test         1h                112     8GB               Have to reserve with #SBATCH -p test
cubbli       1 day             4       8GB               Have to reserve with -p cubbli

3.1 Creating a Batch job

The simplest way to submit a job into the system is as a simple serial job. The following example requests 1 core and 100 MB of memory for 10 minutes, and placement in the test queue. At the end of the script, the srun command is used to start the program. You can think of the scheduler as an elaborate time and resource reservation system. The #SBATCH lines describe the required resources and the rest is just regular Linux: you can execute pretty much any Linux script or command you would be able to execute on the login node. There are a few exceptions, such as setting up daemons, but nothing you would ordinarily encounter.

The batch script needs to start with a shebang (#!/bin/bash), and the batch parameters have to be set in the script before the actual program.

Time conventions

The sbatch syntax for time may not be obvious. Times are normally given as DD-hh:mm:ss, where DD=days, hh=hours, mm=minutes and ss=seconds. Shorter forms are also accepted: 2-0 represents 2 days, while 10:00 means 10 minutes.
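For illustration, the following time strings are all valid values for #SBATCH -t:

10:00          means 10 minutes
1:00:00        means 1 hour
2-0            means 2 days
1-12:00:00     means 1 day and 12 hours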


#!/bin/bash
#SBATCH --job-name=test
#SBATCH -o result.txt
#SBATCH -p test
#SBATCH -c 1
#SBATCH -t 10:00
#SBATCH --mem-per-cpu=100

srun hostname
srun sleep 60

The following command submits test.job into the system, and the scheduler takes care of the job placement (sbatch also accepts additional options on the command line):

sbatch test.job
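Options given on the command line override the corresponding #SBATCH lines in the script, so the same script can be reused with different resources. For example (a sketch):

sbatch -t 30:00 --mem-per-cpu=500 test.job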

3.1.1 Environment Variables and Exit Codes

If you need only some environment variables to be propagated from your session, or none at all, you can use the --export option (the default is ALL; see the special case for Cubbli Linux nodes):

 --export=<environment variables | ALL | NONE>

When a batch job is submitted and launched, Slurm sets a number of environment variables which can be used for job control. Standard Linux exit codes are used at job exit. Please see this page for a full compendium of the variables and exit codes.
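As a small sketch of how these can be used, the script below prints the job ID and node list, and sizes the thread count of an OpenMP-style program to the allocated cores (SLURM_JOB_ID, SLURM_JOB_NODELIST and SLURM_CPUS_PER_TASK are standard Slurm variables; my_program is only a placeholder):

#!/bin/bash
#SBATCH -c 4
#SBATCH -t 10:00
#SBATCH --mem-per-cpu=100

echo "Running job $SLURM_JOB_ID on $SLURM_JOB_NODELIST"
# Match the thread count to the allocated cores
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
srun ./my_program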

3.1.2 Serial - Consumable resources

Below are some of the most common batch job options for serial jobs. If no values are given, system defaults are used. These values are used to determine the job priority.

Job Wall Time limit:

#SBATCH -t <Wall Time limit>

Job CPU count (equal to the number of cores):

#SBATCH -c <CPU count>

Job memory limit (Please Note: memory reservation is per core):

#SBATCH --mem-per-cpu=<MB>
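Put together, a minimal serial batch script using these options might look like the following sketch (the values and program name are examples only):

#!/bin/bash
#SBATCH -t 02:00:00
#SBATCH -c 1
#SBATCH --mem-per-cpu=2048

# 2 hours of wall time, one core and 2 GB of memory per core
srun ./my_serial_program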

3.1.3 Further job control

When a job is submitted, you can use the following commands to view the status of the queues and change the job status if needed.

Show queue information:

sinfo -l

If you want to cancel your job:

scancel   <jobID>   

To check the status of your jobs that are in the queue:

squeue -l -u yourusername

For information about a job that is running:

scontrol show jobid -dd <jobID>

For information about a completed job's efficiency, use seff. Its output is automatically included in the end-of-job mail notifications, if notifications are enabled in the batch script.

Seff

Note that seff produces meaningful output only for successfully completed jobs.
seff <jobID>

Summary of less common job control commands:

Command    Description
sacct      Displays accounting data for all jobs.
scontrol   View SLURM configuration and state.
sjstat     Display statistics of jobs under control of SLURM (combines data from sinfo, squeue and scontrol).
sprio      Display the priorities of the pending jobs. Jobs with higher priorities are launched first.
smap       Graphically view information about SLURM jobs, partitions and configuration parameters.

A comprehensive Slurm Quick Reference & Cheat Sheet is available and can be printed out.

See the PBS Command Wrappers page for additional information if you prefer PBS-like job control.

3.2 Serial or Parallel Job?

A serial job is any program that runs on a single machine. In the case of Ukko2, it means a program running on a single core.

A parallel job is composed of multiple processes which run on multiple cores or machines. The simplest case would be a job that uses two CPUs running related processes. The processes talk to each other through a medium shared between the CPUs, such as a local memory space.

For more about parallel processing and the related options, please see the Parallel Processing page. If you are interested in Spark deployment, please see the Spark User Guide. A minimal parallel example is sketched below.
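As a minimal sketch of what a parallel job looks like to the scheduler, the script below requests two tasks; srun then launches the given command once per task (a real MPI program would be started through srun in the same way):

#!/bin/bash
#SBATCH -n 2
#SBATCH -t 10:00
#SBATCH --mem-per-cpu=100

# hostname runs once per task, so this prints two lines
srun hostname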

3.2.1 Testing and Development

Before running jobs in the production queues, the resource requirements and, in the case of MPI jobs, the scalability should be tested. A 1-hour test queue is available for this purpose. The "-p" parameter is mandatory for test jobs.

Queues

Non-GPU production jobs do not require the -p option. Leaving it out allows greater flexibility in job placement.
#SBATCH -p test

When submitting a job to the test queue, the following parameters can be set in the batch file. The mail parameters can be used with any job; they are not limited to test jobs:

#!/bin/bash
# Job name to be displayed in the queue
#SBATCH --job-name=test
# Job output file at completion
#SBATCH --output=foobar.out
# Request the test partition
#SBATCH -p test
# Request a single core
#SBATCH -c 1
# Define the error file
#SBATCH -e foobar.err
# Send a mail notification at the END of the job
#SBATCH --mail-type=END
# Mail recipient
#SBATCH --mail-user=user@address.mail

# Commands to be run
srun hostname
srun sleep 60

3.2.2 How to request GPUs

Below is an example script for GPU usage, requesting a single GPU (note that for this you need to use --gres=gpu:1), two cores and 100 MB of memory per core for the default time:

#!/bin/bash
#SBATCH --job-name=test
#SBATCH -o result.txt
#SBATCH -p gpu
#SBATCH -c 2
#SBATCH --gres=gpu:1
#SBATCH --mem-per-cpu=100

srun hostname
srun sleep 60
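To confirm that the requested GPU is actually visible inside the job, you can for example add an nvidia-smi step (assuming the usual NVIDIA driver tools are installed on the GPU nodes):

srun nvidia-smi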

Nvidia GPU Grant Program

You may not have heard of the Nvidia GPU Grant Program. Professors and researchers may be eligible to participate in the program. If in doubt, please check the requirements and eligibility.

3.2.3 Setting up e-mail notifications

You can set up e-mail notifications for batch jobs. If set, changes in the job status will be sent to the specified user. The default is the user who submits the job.

E-mail Notification

END and ALL notifications include the seff output, which is very useful for finding out the actual job resource utilisation.

The most commonly chosen mail options are NONE, BEGIN, END, FAIL and ALL. To set an option, the following line is needed in the batch script. Multiple options can be given as a comma-separated list:

#SBATCH --mail-type=<option>,<option>

You may also specify a mail address other than the default:

#SBATCH --mail-user=<mail@address>

3.3 Interactive use

There is no need for direct access to a node to start an interactive session. Slurm allows interactive sessions to be started with srun, which is a great way to do testing and debugging. After entering the srun command, the interactive job request is sent to the normal queue to wait for resources to become available. Once resources are available, the session starts on a compute node and you are put into the directory from which you launched the session. You can then run commands.

The environment you get on the compute node is determined by:

    1. The environment as set in your session from which you launch the srun command.
    2. Any extra variables set by Slurm
    3. Settings from your .bashrc file

Example of starting a 1-core, 1-task interactive session with the bash shell. If values are not set, they are inherited from the system or queue defaults.

srun -c 1 --ntasks-per-node=1 --pty bash

To show the Slurm variables when the session starts:

export | grep SLURM
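For instance, a somewhat larger interactive session with an explicit time limit and memory request could be started like this (a sketch; the values are examples only):

srun -c 4 --mem-per-cpu=1024 -t 1:00:00 --pty bash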

Interactive node availability

At this time, there are no dedicated nodes for pure interactive use. The wait time can be long if you request large amounts of memory or CPUs; single-core requests usually fit in nicely.

3.4 Advance Reservations

Slurm supports Advance Reservations: you or your group may ask for specific resources for a dedicated time slot. However, because Advance Reservations are not an ordinary user option, a specific request needs to be submitted to it4science@helsinki.fi to enable the reservation. Advance Reservations are disruptive to system operation (jobs need to be drained from the system to free an empty slot at the given time) and the resource requirements have to be justified.

3.5 Actual Resource Utilisation

Slurm features a simple utility to provide job utilisation details for any job that has completed. Using this utility helps to determine the job's actual resource needs for future reference. You can run the job once with much higher resource requests, and then use seff to find out the actual use, which you can then apply to later runs:

seff <completed job ID>

Resource Requests

Accurate resource requests (memory, CPU and time) will expedite the execution of your jobs and lead to better system utilisation. The less you request, the sooner your job can be scheduled.

3.5.1 Job Accounting Data 

Slurm has a powerful accounting feature with a myriad of options to choose from. Below is a command featuring some of the more useful fields:

sacct -oJobID,JobName,ExitCode,NNodes,NCPUS,MaxRSS,Elapsed,End

This provides easy-to-read, list-formatted output, where the fields are:

JobID: Job identification number

JobName: Job name given in the Slurm batch script

ExitCode: Exit code when the job terminated

NNodes: Node count

NCPUS: CPUs (cores) reserved by the job

MaxRSS: Peak memory usage during job execution; the value is reported once the job has finished. It can be used to adjust the requested memory value in the batch script accordingly.

Elapsed: Time the batch job was in execution

End: End time of batch job

Other examples:

List details when the JobID is known:

sacct -j <jobID> -oJobID,JobName,ExitCode,NNodes,NCPUS,MaxRSS,Elapsed,End

Jobs listed by UserID:

sacct -u <userID> -oJobID,JobName,ExitCode,NNodes,NCPUS,MaxRSS,Elapsed,End
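By default sacct only shows fairly recent jobs; to look further back, a start time can be added with -S (and an end time with -E), for example:

sacct -u <userID> -S <YYYY-MM-DD> -oJobID,JobName,ExitCode,NNodes,NCPUS,MaxRSS,Elapsed,End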


sacct man page

Comprehensive accounting options and parameters can be found in the sacct man page.


3.5.2 Scheduling Policy

Job execution priorities depend on the user's resource requests. If no resource limits are requested in the batch script, the system and queue defaults are used. Job priority and scheduling decisions are based on the available system resources, and Fair Share is applied to allocate everyone a near-equal share of the system. Below is a list of the resources considered, with the most "expensive" resource at the top:

      1. GPU requested
      2. Memory requested
      3. Wall Time requested
      4. CPU's requested


System Defaults

System defaults: job placement in the short queue, 512 MB of memory per CPU (core), 1 hour of Wall Time and 1 CPU (core). The defaults can be overridden by user-specified resource requests in the job batch script.

4.0 Further Reading

Aalto University Triton User Guide

Technical Specifications of Ukko2

Parallel Processing

GDB Debugger Cheat Sheet

PBS Command Wrappers

Module System

Documentation for CSC's Taito cluster may be useful

CSC SLURM instructions

Slurm Quick Reference & Cheat Sheet

Ukko Cubbli Linux Instructions

Spark User Guide

GCC Optimization Guide