This page is a quick reference for Slurm environment variables and job exit codes.

1.0 INPUT ENVIRONMENT VARIABLES

Upon startup, sbatch will read and handle the options set in the following environment variables. Note that environment variables will override any options set in a batch script, and command line options will override any environment variables.
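The precedence rule above can be modeled in a few lines of Python. This is only an illustrative sketch, not sbatch's implementation; the function name resolve_option and its three parameters are hypothetical.

```python
def resolve_option(script_value=None, env_value=None, cli_value=None):
    """Model of the precedence described above (hypothetical helper).

    Command line beats environment variable, which beats the batch script.
    A value of None means "not set at that level".
    """
    for value in (cli_value, env_value, script_value):
        if value is not None:
            return value
    return None


# The batch script asks for one partition, SBATCH_PARTITION for another,
# and nothing is given on the command line: the environment variable wins.
effective = resolve_option(script_value="debug", env_value="batch")
```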

Variable Name

Description

SBATCH_ACCOUNT

Same as -A, --account

SBATCH_ACCTG_FREQ

Same as --acctg-freq

SBATCH_ARRAY_INX

Same as -a, --array

SBATCH_BLRTS_IMAGE

Same as --blrts-image

SBATCH_CHECKPOINT

Same as --checkpoint

SBATCH_CHECKPOINT_DIR

Same as --checkpoint-dir

SBATCH_CLUSTERS or SLURM_CLUSTERS

Same as --clusters

SBATCH_CNLOAD_IMAGE

Same as --cnload-image

SBATCH_CONN_TYPE

Same as --conn-type

SBATCH_CONSTRAINT

Same as -C, --constraint

SBATCH_CORE_SPEC

Same as --core-spec

SBATCH_DEBUG

Same as -v, --verbose

SBATCH_DELAY_BOOT

Same as --delay-boot

SBATCH_DISTRIBUTION

Same as -m, --distribution

SBATCH_EXCLUSIVE

Same as --exclusive

SBATCH_EXPORT

Same as --export

SBATCH_GEOMETRY

Same as -g, --geometry

SBATCH_GET_USER_ENV

Same as --get-user-env

SBATCH_GRES_FLAGS

Same as --gres-flags

SBATCH_HINT or SLURM_HINT

Same as --hint

SBATCH_IGNORE_PBS

Same as --ignore-pbs

SBATCH_IMMEDIATE

Same as -I, --immediate

SBATCH_IOLOAD_IMAGE

Same as --ioload-image

SBATCH_JOBID

Same as --jobid

SBATCH_JOB_NAME

Same as -J, --job-name

SBATCH_LINUX_IMAGE

Same as --linux-image

SBATCH_MEM_BIND

Same as --mem-bind

SBATCH_MLOADER_IMAGE

Same as --mloader-image

SBATCH_NETWORK

Same as --network

SBATCH_NO_REQUEUE

Same as --no-requeue

SBATCH_NO_ROTATE

Same as -R, --no-rotate

SBATCH_OPEN_MODE

Same as --open-mode

SBATCH_OVERCOMMIT

Same as -O, --overcommit

SBATCH_PARTITION

Same as -p, --partition

SBATCH_POWER

Same as --power

SBATCH_PROFILE

Same as --profile

SBATCH_QOS

Same as --qos

SBATCH_RAMDISK_IMAGE

Same as --ramdisk-image

SBATCH_RESERVATION

Same as --reservation

SBATCH_REQ_SWITCH

When a tree topology is used, this defines the maximum count of switches desired for the job allocation and optionally the maximum time to wait for that number of switches. See --switches.

SBATCH_REQUEUE

Same as --requeue

SBATCH_SIGNAL

Same as --signal

SBATCH_SPREAD_JOB

Same as --spread-job

SBATCH_THREAD_SPEC

Same as --thread-spec

SBATCH_TIMELIMIT

Same as -t, --time

SBATCH_USE_MIN_NODES

Same as --use-min-nodes

SBATCH_WAIT

Same as -W, --wait

SBATCH_WAIT_ALL_NODES

Same as --wait-all-nodes

SBATCH_WAIT4SWITCH

Maximum time to wait for the requested switches. See --switches.

SBATCH_WCKEY

Same as --wckey

SLURM_CONF

The location of the Slurm configuration file.

SLURM_EXIT_ERROR

Specifies the exit code generated when a Slurm error occurs (e.g. invalid options). This can be used by a script to distinguish application exit codes from various Slurm error conditions.

SLURM_STEP_KILLED_MSG_NODE_ID=ID

If set, only the specified node will log when the job or step is killed by a signal.
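
Since most input variables mirror an sbatch option, a wrapper can supply defaults through the environment instead of editing each batch script. A minimal Python sketch, assuming only the five mappings copied from the table above; OPTION_TO_VARIABLE and sbatch_environment are hypothetical helper names, not part of Slurm.

```python
import os

# Subset of the table above: long option name -> input environment variable.
# The names are not always derivable from the option (e.g. --time), so the
# mapping is spelled out explicitly rather than computed.
OPTION_TO_VARIABLE = {
    "account": "SBATCH_ACCOUNT",
    "job-name": "SBATCH_JOB_NAME",
    "partition": "SBATCH_PARTITION",
    "qos": "SBATCH_QOS",
    "time": "SBATCH_TIMELIMIT",
}


def sbatch_environment(options, base_env=None):
    """Return a copy of the environment with SBATCH_* defaults added.

    Raises KeyError for an option that is not in OPTION_TO_VARIABLE.
    """
    env = dict(os.environ if base_env is None else base_env)
    for option, value in options.items():
        env[OPTION_TO_VARIABLE[option]] = str(value)
    return env


env = sbatch_environment({"partition": "debug", "time": 10},
                         base_env={"PATH": "/usr/bin"})
```

The returned dictionary could then be passed as the env argument of subprocess.run when invoking sbatch. Any option given on the command line would still take precedence over these values.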

2.0 OUTPUT ENVIRONMENT VARIABLES

The Slurm controller will set the following variables in the environment of the batch script.
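
Programs started from the batch script can read these variables like any other environment variable. The sketch below is one possible way to collect a few of them in Python. The function name read_job_environment is hypothetical, and falling back to 1 when SLURM_CPUS_PER_TASK is unset is an assumption of this example, not documented Slurm behavior.

```python
import os


def read_job_environment(env=None):
    """Collect a few Slurm output variables into a dictionary.

    `env` defaults to os.environ; passing a plain dict makes the function
    usable outside a running job.
    """
    env = os.environ if env is None else env
    array_task = env.get("SLURM_ARRAY_TASK_ID")
    cpus = env.get("SLURM_CPUS_PER_TASK")
    return {
        # SLURM_JOBID is the backwards-compatible name for SLURM_JOB_ID.
        "job_id": env.get("SLURM_JOB_ID", env.get("SLURM_JOBID")),
        # Only present for job arrays.
        "array_task_id": int(array_task) if array_task is not None else None,
        # Only set if --cpus-per-task was given; defaulting to 1 is an assumption.
        "cpus_per_task": int(cpus) if cpus is not None else 1,
        "submit_dir": env.get("SLURM_SUBMIT_DIR"),
    }


context = read_job_environment({"SLURM_JOB_ID": "128", "SLURM_ARRAY_TASK_ID": "0"})
```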

Variable Name

Description

BASIL_RESERVATION_ID

The reservation ID on Cray systems running ALPS/BASIL only.

MPIRUN_NOALLOCATE

Do not allocate a block on Blue Gene L/P systems only.

MPIRUN_NOFREE

Do not free a block on Blue Gene L/P systems only.

MPIRUN_PARTITION

The block name on Blue Gene systems only.

SBATCH_MEM_BIND

Set to value of the --mem-bind option.

SBATCH_MEM_BIND_LIST

Set to bit mask used for memory binding.

SBATCH_MEM_BIND_PREFER

Set to "prefer" if the --mem-bind option includes the prefer option.

SBATCH_MEM_BIND_TYPE

Set to the memory binding type specified with the --mem-bind option. Possible values are "none", "rank", "map_mem", "mask_mem" and "local".

SBATCH_MEM_BIND_VERBOSE

Set to "verbose" if the --mem-bind option includes the verbose option. Set to "quiet" otherwise.

SLURM_*_PACK_GROUP_#

For a heterogeneous job allocation, the environment variables are set separately for each component.

SLURM_ARRAY_TASK_COUNT

Total number of tasks in a job array.

SLURM_ARRAY_TASK_ID

Job array ID (index) number.

SLURM_ARRAY_TASK_MAX

Job array's maximum ID (index) number.

SLURM_ARRAY_TASK_MIN

Job array's minimum ID (index) number.

SLURM_ARRAY_TASK_STEP

Job array's index step size.

SLURM_ARRAY_JOB_ID

Job array's master job ID number.

SLURM_CHECKPOINT_IMAGE_DIR

Directory into which checkpoint images should be written if specified on the execute line.

SLURM_CLUSTER_NAME

Name of the cluster on which the job is executing.

SLURM_CPUS_ON_NODE

Number of CPUs on the allocated node.

SLURM_CPUS_PER_TASK

Number of CPUs requested per task. Only set if the --cpus-per-task option is specified.

SLURM_DISTRIBUTION

Same as -m, --distribution

SLURM_GTIDS

Global task IDs running on this node. Zero origin and comma separated.

SLURM_JOB_ACCOUNT

Account name associated with the job allocation.

SLURM_JOB_ID (and SLURM_JOBID for backwards compatibility)

The ID of the job allocation.

SLURM_JOB_CPUS_PER_NODE

Count of processors available to the job on this node. Note the select/linear plugin allocates entire nodes to jobs, so the value indicates the total count of CPUs on the node. The select/cons_res plugin allocates individual processors to jobs, so this number indicates the number of processors on this node allocated to the job.

SLURM_JOB_DEPENDENCY

Set to value of the --dependency option.

SLURM_JOB_NAME

Name of the job.

SLURM_JOB_NODELIST (and SLURM_NODELIST for backwards compatibility)

List of nodes allocated to the job.

SLURM_JOB_NUM_NODES (and SLURM_NNODES for backwards compatibility)

Total number of nodes in the job's resource allocation.

SLURM_JOB_PARTITION

Name of the partition in which the job is running.

SLURM_JOB_QOS

Quality Of Service (QOS) of the job allocation.

SLURM_JOB_RESERVATION

Advanced reservation containing the job allocation, if any.

SLURM_LOCALID

Node local task ID for the process within a job.

SLURM_MEM_PER_CPU

Same as --mem-per-cpu

SLURM_MEM_PER_NODE

Same as --mem

SLURM_NODE_ALIASES

Sets of node name, communication address and hostname for nodes allocated to the job from the cloud. Each element in the set is colon separated and each set is comma separated. For example: SLURM_NODE_ALIASES=ec0:1.2.3.4:foo,ec1:1.2.3.5:bar

SLURM_NODEID

ID of the nodes allocated.

SLURM_NTASKS (and SLURM_NPROCS for backwards compatibility)

Same as -n, --ntasks

SLURM_NTASKS_PER_CORE

Number of tasks requested per core. Only set if the --ntasks-per-core option is specified.

SLURM_NTASKS_PER_NODE

Number of tasks requested per node. Only set if the --ntasks-per-node option is specified.

SLURM_NTASKS_PER_SOCKET

Number of tasks requested per socket. Only set if the --ntasks-per-socket option is specified.

SLURM_PACK_SIZE

Set to count of components in heterogeneous job.

SLURM_PRIO_PROCESS

The scheduling priority (nice value) at the time of job submission. This value is propagated to the spawned processes.

SLURM_PROCID

The MPI rank (or relative process ID) of the current process.

SLURM_PROFILE

Same as --profile

SLURM_RESTART_COUNT

If the job has been restarted due to system failure or has been explicitly requeued, this will be set to the number of times the job has been restarted.

SLURM_SUBMIT_DIR

The directory from which sbatch was invoked.

SLURM_SUBMIT_HOST

The hostname of the computer from which sbatch was invoked.

SLURM_TASKS_PER_NODE

Number of tasks to be initiated on each node. Values are comma separated and in the same order as SLURM_JOB_NODELIST. If two or more consecutive nodes are to have the same task count, that count is followed by "(x#)" where "#" is the repetition count. For example, "SLURM_TASKS_PER_NODE=2(x3),1" indicates that the first three nodes will each execute two tasks and the fourth node will execute one task.

SLURM_TASK_PID

The process ID of the task being started.

SLURM_TOPOLOGY_ADDR

This is set only if the system has the topology/tree plugin configured. The value will be set to the names of the network switches which may be involved in the job's communications, from the system's top level switch down to the leaf switch and ending with the node name. A period is used to separate each hardware component name.

SLURM_TOPOLOGY_ADDR_PATTERN

This is set only if the system has the topology/tree plugin configured. The value will be set to the component types listed in SLURM_TOPOLOGY_ADDR. Each component will be identified as either "switch" or "node". A period is used to separate each hardware component type.

SLURMD_NODENAME

Name of the node running the job script.
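
Two of the variables above use compact encodings that usually need unpacking before use: SLURM_TASKS_PER_NODE with its "(x#)" repetition counts, and SLURM_NODE_ALIASES with its colon and comma separators. The following Python sketch shows one way to decode both. The function names are hypothetical, and the alias parser assumes addresses without colons, as in the example given for that variable.

```python
import re


def expand_tasks_per_node(value):
    """Expand a SLURM_TASKS_PER_NODE value into one task count per node."""
    counts = []
    for item in value.split(","):
        # Each item is either "N" or "N(xR)", where R is the repetition count.
        match = re.fullmatch(r"(\d+)(?:\(x(\d+)\))?", item)
        if match is None:
            raise ValueError(f"unexpected item: {item!r}")
        tasks = int(match.group(1))
        repeat = int(match.group(2)) if match.group(2) else 1
        counts.extend([tasks] * repeat)
    return counts


def parse_node_aliases(value):
    """Split SLURM_NODE_ALIASES into (name, address, hostname) tuples.

    Assumes the address itself contains no colon.
    """
    aliases = []
    for entry in value.split(","):
        name, address, hostname = entry.split(":")
        aliases.append((name, address, hostname))
    return aliases


print(expand_tasks_per_node("2(x3),1"))  # [2, 2, 2, 1]
print(parse_node_aliases("ec0:1.2.3.4:foo,ec1:1.2.3.5:bar"))
```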


2.1 Filename patterns

sbatch allows for a filename pattern to contain one or more replacement symbols, which are a percent sign "%" followed by a letter (e.g. %j).

%% The character "%".

%A Job array's master job allocation number.

%a Job array ID (index) number.

%J jobid.stepid of the running job. (e.g. "128.0")

%j jobid of the running job.

%N short hostname. This will create a separate IO file per node.

%n Node identifier relative to current job (e.g. "0" is the first node of the running job). This will create a separate IO file per node.

%s stepid of the running job.

%t task identifier (rank) relative to current job. This will create a separate IO file per task.

%u User name.

%x Job name.

Some examples of how the format string may be used for a 4 task job step with a Job ID of 128 and step id of 0 are included below:

job%J.out
    job128.0.out
job%4j.out
    job0128.out
job%j-%2t.out
    job128-00.out, job128-01.out, ...
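
The replacement rules can be reproduced outside Slurm, for instance to predict output file names when post-processing results. The Python sketch below is a simplified re-implementation, not Slurm's own code. The names expand_filename and values are hypothetical, and the zero-padding behavior for a number placed between "%" and the letter is inferred from the examples above.

```python
import re

# "%" followed by an optional width and one of the letters listed above.
_SYMBOL = re.compile(r"%(\d*)([%AaJjNnstux])")


def expand_filename(pattern, values):
    """Replace the symbols in `pattern` using `values` (letter -> value).

    Raises KeyError if the pattern uses a letter missing from `values`.
    """
    def substitute(match):
        width, letter = match.group(1), match.group(2)
        if letter == "%":
            return "%"
        text = str(values[letter])
        # A width such as the 4 in "%4j" zero-pads the value on the left.
        return text.zfill(int(width)) if width else text

    return _SYMBOL.sub(substitute, pattern)


# Job ID 128, step 0, task rank 1, matching the examples above.
values = {"j": 128, "J": "128.0", "s": 0, "t": 1}
print(expand_filename("job%J.out", values))      # job128.0.out
print(expand_filename("job%4j.out", values))     # job0128.out
print(expand_filename("job%j-%2t.out", values))  # job128-01.out
```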

3.0 JOB EXIT CODES

The exit code from a batch job follows the standard Unix convention. Exit code 0 means successful completion. Codes 1-127 are generated from the job calling exit() with a non-zero value to indicate an error. Codes 129-255 represent jobs terminated by Unix signals; the signal number is the exit code minus 128.
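
The ranges above can be turned into a small decoder. The Python sketch below subtracts 128 from codes in the 129-255 range to recover the signal number, which is the usual Unix shell convention, and looks up the name with the standard signal module. The function name describe_exit_code and the returned wording are hypothetical.

```python
import signal


def describe_exit_code(code):
    """Classify a batch job exit code using the ranges described above."""
    if code == 0:
        return "success"
    if 1 <= code <= 127:
        return f"application error (exit status {code})"
    if 129 <= code <= 255:
        number = code - 128
        try:
            name = signal.Signals(number).name
        except ValueError:
            # Not a signal known on this platform.
            name = f"signal {number}"
        return f"terminated by {name}"
    return "outside the ranges described above"


for code in (0, 1, 137, 139):
    print(code, describe_exit_code(code))
```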

Signal Name  Signal Number  Exit Type  Reason
SIGHUP       1              Term       Hangup detected on controlling terminal or death of controlling process
SIGINT       2              Term       Interrupt from keyboard
SIGQUIT      3              Core       Quit from keyboard
SIGILL       4              Core       Illegal Instruction
SIGABRT      6              Core       Abort signal from abort(3)
SIGFPE       8              Core       Floating point exception
SIGKILL      9              Term       Kill signal
SIGSEGV      11             Core       Invalid memory reference
SIGPIPE      13             Term       Broken pipe: write to pipe with no readers
SIGALRM      14             Term       Timer signal from alarm(2)
SIGTERM      15             Term       Termination signal


Exit Code  Reason
9          CPU time limit.
64         The job ran out of CPU time. Allocate more resources, e.g. a higher CPU time limit.
125        An ErrMsg(severe) was reached.
127        Possible system problem; contact the administrators.
130        Ran out of CPU or swap time. If you suspect swap time, check for memory leaks.
131        Ran out of CPU or swap time. If you suspect swap time, check for memory leaks.
134        The job was killed with an abort signal and probably dumped core. Possible causes: assert() or an ErrMsg(fatal) hit, or a run-time bug. Use a debugger to find out what went wrong.
137        The job was killed because it exceeded the time limit.
139        Segmentation violation. Usually indicates a pointer error.
140        The job exceeded the "wall clock" time limit (as opposed to the CPU time limit).