Priority for pending jobs

Usage history

The order in which queued jobs are executed is determined by a priority system. Three parameters are taken into account:

  • The length of time your job has been pending.
  • The recent job history of the user and of the other members of their account (fairshare).
  • The resources requested by the job (nodes and walltime).

The squeue command lists your pending jobs starting from the highest priority.

The priority of a job is expressed as a value between 0 and 1. This value combines the three parameters mentioned above, each of which has a defined weight. The closer the value is to 1, the higher the job's priority. To display the priority of your pending jobs, use the command:

sprio -n

To compare the priorities of specific jobs (here 10345 and 10346):

sprio -n --jobs=10345,10346
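As an illustration, assuming the cluster uses Slurm's multifactor priority plugin (this page does not say so explicitly), a job's priority is a weighted sum of factors that each lie between 0 and 1. Restricted to the three parameters listed above, it can be written as below; the full Slurm formula has further terms (partition, QOS, TRES, nice) omitted here. The weights W are set by the administrators and the numbers in the worked example are hypothetical; running `sprio -w` prints the weights actually configured.

```latex
\begin{aligned}
P &= W_{\text{age}}\, f_{\text{age}} + W_{\text{fs}}\, f_{\text{fs}} + W_{\text{size}}\, f_{\text{size}},
   \qquad f_{\text{age}},\ f_{\text{fs}},\ f_{\text{size}} \in [0, 1] \\[4pt]
&\text{Hypothetical weights: } W_{\text{age}} = 1000,\ W_{\text{fs}} = 10000,\ W_{\text{size}} = 1000 \\[4pt]
P_A &= 1000 \times 0.5 + 10000 \times 0.2 + 1000 \times 0.1 = 500 + 2000 + 100 = 2600 \\
P_B &= 1000 \times 0.1 + 10000 \times 0.8 + 1000 \times 0.1 = 100 + 8000 + 100 = 8200
\end{aligned}
```

In this hypothetical setup, job B (short wait, good fairshare) is scheduled before job A (longer wait, poor fairshare), because the fairshare factor carries the largest weight.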

To display your current fairshare value and that of your Slurm account (which also reflects your colleagues' usage):

sshare -A your_account_name

If your fairshare value (FairShare column) is low, it will gradually increase if you do not submit any jobs for several days.
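This recovery comes from the decay of recorded usage. Assuming the cluster sets Slurm's `PriorityDecayHalfLife` parameter (its value is not given on this page; the Slurm default is 7 days, and `scontrol show config | grep PriorityDecayHalfLife` shows the actual setting), past usage U_0 is halved every half-life T_{1/2}. The worked line uses the default value purely as an example:

```latex
\begin{aligned}
U(t) &= U_0 \cdot 2^{-t / T_{1/2}} \\
T_{1/2} = 7\ \text{days},\ t = 14\ \text{days}: \quad U &= U_0 \cdot 2^{-2} = 0.25\, U_0
\end{aligned}
```

Under these assumptions, after two weeks without jobs only a quarter of the earlier usage still counts, so the FairShare value rises; by how much depends on the fairshare algorithm configured and on how much your colleagues use the same account.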

Low priority partitions

Some machines funded by a laboratory or a company are also available to all users, but with lower priority. If several jobs are pending (PENDING state in Slurm) for the same type of node, the job submitted to the higher-priority partition runs first. Preemption is not enabled on the cluster: no running job is ever stopped to make way for a job submitted to a higher-priority partition.

A100 within the gpu partition

To explicitly request an A100 card, add the following Slurm options:

#SBATCH --gres=gpu:a100:1
#SBATCH --partition=gpu

You can use up to 2 A100s simultaneously.
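For reference, a complete job script using these options might look like the sketch below. The job name, walltime and final command are hypothetical placeholders; the script requests two A100s, the maximum stated above. It assumes Slurm exports `CUDA_VISIBLE_DEVICES` for the allocated GPUs, which is the usual behaviour when GPUs are configured as GRES.

```shell
#!/bin/bash
#SBATCH --job-name=a100-example   # hypothetical job name
#SBATCH --partition=gpu
#SBATCH --gres=gpu:a100:2         # two A100s, the maximum allowed
#SBATCH --time=02:00:00           # hypothetical walltime (HH:MM:SS)

# Show which GPUs Slurm assigned to this job ("none" outside a Slurm job)
echo "Allocated GPUs: ${CUDA_VISIBLE_DEVICES:-none}"

# ./my_gpu_program   # hypothetical: replace with your own command
```

Submit it with `sbatch a100_job.sh` (hypothetical file name).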

Requests for A100 GPUs have low priority. If no A100 is free when you submit your job, it may remain pending for a long time, because other users have priority on these cards. It is therefore advisable to check the state of the gpu03 node (equipped with the A100s) first:

sinfo -n gpu03 -o %T

If the state is idle, your job should be able to start right away. If the state is allocated or mixed, the node is already at least partially in use.

If you do not specify a GPU type, your job can run on either a V100 or an A100, depending on availability:

#SBATCH --gres=gpu:1
#SBATCH --partition=gpu
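The check-then-choose logic above can be sketched as a small shell function. `choose_gpu_gres` is a hypothetical helper, not a tool provided on the cluster; it assumes the A100 node is still named gpu03 and that the GRES names are the ones used on this page. With `-o %T`, sinfo prints the extended state (idle, mixed, allocated, ...).

```shell
#!/usr/bin/env bash
# Hypothetical helper: pick a --gres value from the current state of gpu03.
choose_gpu_gres() {
    local state
    # -h: no header line; -o %T: node state in extended form.
    # head -n 1 keeps one line in case the node is listed under several partitions.
    state=$(sinfo -h -n gpu03 -o %T | head -n 1)
    case "$state" in
        idle) echo "gpu:a100:1" ;;  # A100 node free right now: request an A100 explicitly
        *)    echo "gpu:1" ;;       # busy, drained or unknown: accept any GPU type
    esac
}
```

Usage: `sbatch --partition=gpu --gres="$(choose_gpu_gres)" job.sh`, where `job.sh` is a hypothetical script name. The state is only a snapshot: another job may take the node between the check and the submission.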

More information about GPU jobs is available on this page.