The new HPC cluster ukko2 is now in production. To use the cluster, you need to either
- be a member of the CS staff, or
- belong to the IDM group grp-cs-ukko2.
Please note that it may take up to 2 hours after you have been added to either group before your home directory is created.
Connections are only allowed from the helsinki.fi domain; VPN or eduroam is not sufficient. To access ukko2 from outside the domain (e.g. from home), add the following to your ~/.ssh/config:
ProxyCommand ssh firstname.lastname@example.org -W %h:%p
Note: with OpenSSH version 7.3 and newer, you can substitute the ProxyCommand with a more human-readable line: ProxyJump melkinpaasi.cs.helsinki.fi
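Put together, a complete ~/.ssh/config entry would look something like the sketch below (the Host pattern is an assumption on my part; the proxy login is the same placeholder as above and must be replaced with your own):

```shell
Host ukko2.cs.helsinki.fi
    ProxyCommand ssh firstname.lastname@example.org -W %h:%p
```

With this in place, a plain `ssh ukko2.cs.helsinki.fi` from outside the domain is tunnelled through the proxy host automatically.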
To access a login node:
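Assuming a standard SSH setup (replace `username` with your university username; the hostname is taken from section 3.0 below):

```shell
ssh username@ukko2.cs.helsinki.fi
```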
The batch scheduling system is the most prominent difference between ukko and ukko2. Instead of logging directly into a compute node and executing jobs there interactively, users now log in to a login node and submit jobs via the batch scheduler. The login node is for batch job management and the compiler/development environment only; do not execute any production jobs there. Slurm handles the resource requests and optimises the resource allocation.
Another change is the module system. Modules are used to manage software packages, compiler environments, etc. Users can load and unload modules freely.
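Typical module commands follow standard Environment Modules/Lmod usage (the package name `gcc` below is only an illustration; use `module avail` to see what is actually installed):

```shell
module avail        # list available modules
module load gcc     # load a module into your environment
module list         # show currently loaded modules
module unload gcc   # remove a single module
module purge        # remove all loaded modules
```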
1.1 Ukko 1 Resources
- At this time there is a single Ukko 1 Cubbli Linux node available; see the instructions. The node has 4 cores and 32 GB of RAM.
1.2 Ukko 2 Resources
- A single login node (ukko2) serves logins, compiler environments and all batch scheduling functions.
- 31 regular compute nodes (ukko2-02 - ukko2-32): 28 cores, 2 threads per core and 256 GB RAM.
- 2 big memory nodes (ukko2-pekka, ukko2-paavo): 96 cores, 2 threads per core and 3 TB RAM.
- 2 GPU nodes (ukko2-g01, ukko2-g02): 28 cores, 2 threads per core, 512 GB RAM and 4 Tesla P100 GPU cards.
1.3 I/O, disks and filesystems
The home directory is the same as on the old ukko: all user files will be available on Ukko2. Users can also access their ukko files from the department computers using the path /cs/work/home/username. The local work directory is /cs/work/scratch. Please do not use the local drives of the nodes; if you need local drives as a resource, please consider using Kale instead.
2.0 Scientific software
Most development tools and software packages are available through modules, but not all. Below are references to those which you might need to install locally at this time.
3.0 Queuing system
Jobs are managed by Slurm, and the runtime environment is managed with the aforementioned modules. Jobs are submitted from the ukko2 login node (ukko2.cs.helsinki.fi). There are six production queues and one short queue for testing jobs. Some queues share resources for better system utilisation:
|Queue||Wall Time limit||Cores||Memory per core||Notes|
|short||24h||1032||8GB - 32GB|||
|extralong||60 days||28||8GB||Single node|
|bigmem||7 days||192||32GB||Two nodes|
|gpu||7 days||56||18GB||Have to reserve with #SBATCH -p gpu and --gres=gpu:<number of GPUs>|
|test||1h||112||8GB||Have to reserve with #SBATCH -p test|
|cubbli||1 day||4||8GB||Have to reserve with -p cubbli|
3.1 Creating a Batch job
The simplest way to submit a job into the system is as a simple serial job. The following example requests 1 core and 100 MB of memory for 10 minutes, with placement in the test queue. At the end of the script, the srun command is used to start the program. You can think of the scheduler as an elaborate time and resource reservation system: the #SBATCH sections describe the required resources, and the rest is just regular Linux. You can execute pretty much any Linux script or command you would be able to execute on the login node. There are a few exceptions, such as setting up daemons, but nothing you would ordinarily encounter.
The batch script needs to start with a shebang (#!/bin/bash), and the batch parameters have to be set in the script before the actual program.
The sbatch syntax for time may not be obvious. Normally times are set as DD-hh:mm:ss, where DD=days, hh=hours, mm=minutes and ss=seconds. However, shorthand is also accepted: 2-0 represents 2 days, while 10:00 indicates 10 minutes.
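For example, the following -t values (alternative lines; use one per script) all follow this syntax:

```shell
#SBATCH -t 10:00       # 10 minutes
#SBATCH -t 1:00:00     # 1 hour
#SBATCH -t 2-0         # 2 days
#SBATCH -t 1-12:00:00  # 1 day and 12 hours
```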
#!/bin/bash
#SBATCH --job-name=test
#SBATCH -o result.txt
#SBATCH -p test
#SBATCH -c 1
#SBATCH -t 10:00
#SBATCH --mem-per-cpu=100
srun hostname
srun sleep 60
The following command submits test.job into the system, and the scheduler takes care of the job placement (sbatch accepts additional options on the command line):
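On the login node:

```shell
sbatch test.job
```

sbatch prints the assigned job ID on success, which you can use with the job control commands described below.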
3.1.1 Environment Variables and Exit Codes
If you need only some environment variables to be propagated from your session, or none, you can use the export option (the default is ALL; see the special case for Cubbli Linux nodes):
--export=<environment variables | ALL | NONE>
When a batch job is submitted and launched, Slurm sets a number of environment variables which can be used for job control. Standard Linux exit codes are used at job exit. Please see this page for a full compendium of the variables and error codes.
3.1.2 Serial - Consumable resources
Below are some of the most common batch job options for serial jobs. If no values are given, system defaults are used. These values are used to determine the job priority.
Job Wall Time limit:
#SBATCH -t <Wall Time limit>
Job CPU count (equals the number of cores):
#SBATCH -c <CPU count>
Job memory limit (please note: the memory reservation is per core):
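As in the example scripts above, the option is --mem-per-cpu, with the value in megabytes:

```shell
#SBATCH --mem-per-cpu=<memory in MB>
```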
3.1.3 Further job control
When a job is submitted, you can use the following commands to view the status of the queues and change the job status if needed.
Show queue information:
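A standard Slurm command for this is sinfo (shown here with the long output format; other flags may suit your needs better):

```shell
sinfo -l
```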
If you want to cancel your job:
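Use scancel with the job ID reported by sbatch or squeue:

```shell
scancel <jobID>
```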
To check the status of your jobs that are in the queue:
squeue -l -u yourusername
For information about a job that is running:
scontrol show jobid -dd <jobID>
For information about a completed job's efficiency, use seff. The output of seff is automatically included at the end of job mail notifications, if notifications are set to be sent in the batch script.
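As also shown in section 3.5:

```shell
seff <completed job ID>
```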
Job control summary of less common commands:
|sacct||Displays accounting data for all jobs.|
|scontrol||View Slurm configuration and state.|
|sjstat||Display statistics of jobs under the control of Slurm (combines data from sinfo, squeue and scontrol).|
|sprio||Display the priorities of pending jobs. Jobs with higher priorities are launched first.|
|smap||Graphically view information about Slurm jobs, partitions, and set configuration parameters.|
A comprehensive Slurm Quick Reference & Cheat Sheet is available and can be printed out.
3.2 Serial or Parallel Job?
A serial job is any program that runs on a single machine; in the case of Ukko2, this means a program running on a single core.
A parallel job is composed of multiple processes which run on multiple cores or machines. The simplest case would be a job that uses two CPUs and a set of related processes. The processes talk to each other through a medium shared between the CPUs, such as a local memory space.
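As an illustrative sketch of a parallel job, the batch script below requests several tasks instead of a single core (the program name my_mpi_program is hypothetical, and this assumes an appropriate MPI module has been loaded):

```shell
#!/bin/bash
#SBATCH --job-name=partest
#SBATCH -p test
#SBATCH -n 4               # request 4 tasks (processes)
#SBATCH --mem-per-cpu=100
#SBATCH -t 10:00
srun ./my_mpi_program      # srun launches one process per task
```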
3.2.1 Testing and Development
Before running jobs on the production queues, the resource requirements, and in the case of MPI jobs the scalability, should be tested. The 1h test queue is available for this purpose. The "-p" parameter is mandatory for test jobs.
#SBATCH -p test
When submitting a job to the test queue, the following parameters could be set in the batch file. The mail parameters can be used with any job; they are not limited to test jobs:
#!/bin/bash
#SBATCH --job-name=test                # Job name to be displayed in queue
#SBATCH --output=foobar.out            # Job output at completion
#SBATCH -p test                        # Request test partition
#SBATCH -c 1                           # Request a single core
#SBATCH -e foobar.err                  # Define error file
#SBATCH --mail-type=END                # Define END of job mail notification
#SBATCH --mail-user=email@example.com  # Mail recipient
srun hostname                          # commands to be run
srun sleep 60
3.2.2 How to request GPUs
Below is an example script for GPU usage, requesting a single GPU (note that for this you need to use --gres=gpu:1), two cores and 100 MB of memory for the default time:
#!/bin/bash
#SBATCH --job-name=test
#SBATCH -o result.txt
#SBATCH -p gpu
#SBATCH -c 2
#SBATCH --gres=gpu:1
#SBATCH --mem-per-cpu=100
srun hostname
srun sleep 60
Running as a daemon
You may not have heard of the Nvidia GPU Grant Program. Professors and researchers may be eligible to participate in the program. If in doubt, please do check the requirements and eligibility.
3.2.3 Setting up e-mail notifications
You can set up e-mail notifications for batch jobs. If set, changes in the job status will be sent to the specified user. The default is the user who submits the job.
The most commonly chosen mail options are NONE, BEGIN, END, FAIL and ALL. To set the option, the following line is needed in the batch script. Multiple options can be set as a comma-separated list:
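For example, to be notified both when the job ends and if it fails (the --mail-type option is the same one used in the test queue example above):

```shell
#SBATCH --mail-type=END,FAIL
```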
The user may also specify a mail address other than the default:
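The standard Slurm option for this is --mail-user (the address below is a placeholder):

```shell
#SBATCH --mail-user=email@example.com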
3.3 Interactive use
There is no need for direct access to a node to start an interactive session. Slurm allows interactive sessions to be started with srun, and this is a great way to do testing and debugging. After entering the srun command, the interactive job request is sent to the normal queue to wait for resources to become available. Once resources are available, the session starts on a compute node, and you are put into the directory from which you launched the session. You can then run commands.
The environment you get on the compute node is determined by:
- The environment as set in your session from which you launch the srun command.
- Any extra variables set by Slurm
- Settings from your .bashrc file
Example of starting a 1 core, 1 task interactive session with a bash shell. If values are not set, they are inherited from the system or queue defaults.
srun -c 1 --ntasks-per-node=1 --pty bash
To show slurm variables when session starts:
export | grep SLURM
Interactive node availability
3.4 Advance Reservations
Slurm supports Advance Reservations: you or your group may request specific resources for a dedicated time slot. However, because Advance Reservations are not an ordinary user option, a specific request needs to be submitted to firstname.lastname@example.org to enable the reservation. Advance Reservations are disruptive to system operation (jobs need to be drained from the system to create an empty slot at the given time), so the resource requirements have to be justified.
3.5 Actual Resource Utilisation
Slurm features a simple utility to provide job utilisation details for any job that has completed. Using this utility helps to determine the actual resource needs of a job for future reference. You can run a job once with much higher resource requests, and then use seff to find out the actual use, which you can then apply to later runs:
seff <completed job ID>
3.5.1 Job Accounting Data
Slurm has a powerful accounting feature with myriad options to choose from. Below is a command line featuring some of the more useful details.
It provides easy-to-read, list-formatted output, where the fields are:
JobID: Job identification number
JobName: Job name given in the Slurm batch script
ExitCode: Exit code once job was terminated
NNodes: Node count
NCPUS: CPUs (cores) reserved by the job
MaxRSS: Peak memory usage during job execution; a value is returned once the job has finished. This value can be used to adjust the requested memory in the batch script accordingly.
Elapsed: Time batch job was in execution
End: End time of batch job
Lists details when JobID is known:
sacct -j <jobID> -oJobID,JobName,ExitCode,NNodes,NCPUS,MaxRSS,Elapsed,End
Jobs listed by UserID:
sacct -u <userID> -oJobID,JobName,ExitCode,NNodes,NCPUS,MaxRSS,Elapsed,End
sacct man page
3.5.2 Scheduling Policy
Job execution priorities depend upon user resource requests. If no resource limits are requested in the batch script, system and queue defaults are used. Job priority and scheduling decisions are based on available system resources. Fair Share is applied to allocate everyone a near-equal share of the system. Below is a list of the resources considered, with the most "expensive" resource at the top:
- GPU requested
- Memory requested
- Wall Time requested
- CPU's requested
4.0 Further Reading
Aalto University Triton User Guide
CSC's Taito cluster documentation may also be useful