Priority for pending jobs

Usage history

The order in which queued jobs are executed is determined by a system of priorities. Three parameters are taken into account:

  • The length of time your job has been pending.
  • Recent job history of the user and of the other members of their account (fairshare).
  • The requested resources for this job (nodes and walltime).

The squeue command lists your pending jobs starting with the highest-priority one.

The priority of a job is expressed as a value between 0 and 1. This value is computed from the three parameters mentioned above, each of which has a defined weight. The closer the value is to 1, the higher the priority of the job. To see the priority of your pending jobs, use the command:

sprio -n
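Conceptually, the normalized priority is a weighted sum of the (normalized) factors. The sketch below illustrates that combination with purely hypothetical weights and factor values; the real weights are part of the site configuration (they can be displayed with sprio -w):

```shell
# Hypothetical sketch of how a normalized priority is combined from
# three factors, each already scaled to [0, 1]. All numbers below are
# made up for illustration; real weights come from the cluster config.
age=0.8        # how long the job has been pending (normalized)
fairshare=0.5  # fairshare factor of the account
size=0.2       # requested-resources factor (nodes, walltime)

# Weighted sum with hypothetical weights 0.3 / 0.5 / 0.2
priority=$(awk -v a="$age" -v f="$fairshare" -v s="$size" \
    'BEGIN { print 0.3*a + 0.5*f + 0.2*s }')
echo "$priority"
```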

To compare the priorities of different jobs (here 10345 and 10346):

sprio -n --jobs=10345,10346

To see your current fairshare value and that of your Slurm account (which also reflects the fairshare of your colleagues):

sshare -A your_account_name

If your fairshare value is low (FairShare column), it will increase again if you do not submit any jobs for several days.
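This recovery happens because Slurm decays recorded usage over time (governed by the site's PriorityDecayHalfLife setting). A rough sketch of the idea, assuming a purely hypothetical 7-day half-life and arbitrary usage units:

```shell
# Hypothetical illustration: recorded usage is halved once per
# half-life, so an idle user's fairshare factor gradually recovers.
# The half-life and usage values below are made up.
halflife_days=7
usage=1000      # arbitrary recorded usage units
idle_days=14    # days without submitting jobs

# usage * 0.5^(idle_days / halflife_days)
decayed=$(awk -v u="$usage" -v t="$idle_days" -v h="$halflife_days" \
    'BEGIN { print u * 0.5 ^ (t / h) }')
echo "$decayed"
```

After two half-lives of inactivity, the recorded usage has dropped to a quarter of its initial value.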

Low priority partitions

Some machines, funded by a laboratory or a company, are also available to all users, but with a lower priority. This means that if several jobs are pending (Pending state in Slurm) for the same type of node, a job submitted to a higher-priority partition will run before the others. Job preemption is not applied on the cluster: no running job is stopped to make way for a job submitted to a higher-priority partition.

A100 within the gpu partition

To explicitly request an A100 card, add the following Slurm options:

#SBATCH --gres=gpu:a100:1
#SBATCH --partition=gpu
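Put together, a minimal batch script requesting one A100 might look like the following sketch; the job name, time limit, and final command are illustrative placeholders to adapt to your own job:

```shell
#!/bin/bash
#SBATCH --job-name=a100-test      # illustrative name
#SBATCH --partition=gpu
#SBATCH --gres=gpu:a100:1         # one A100 card
#SBATCH --time=01:00:00           # illustrative walltime; request only
                                  # what you need

# Illustrative command: show which GPU was allocated
nvidia-smi
```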

Requests for A100 GPUs have low priority. If no A100 is free when you submit your job, be aware that it may remain pending for a long time because other users have priority. It is therefore advisable to first check the status of the gpu03 node (the node equipped with A100s). To do so:

sinfo --Format Partition,NodeList,NodeAI,CPUsState -p gpu

If CPUS(A) is 12 or more and NODES(A) is 3, all the GPUs are busy.
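That check can be automated by parsing the NODES(A/I) and CPUS(A/I/O/T) columns (allocated/idle nodes, allocated/idle/other/total CPUs). The sketch below uses a made-up sample line standing in for real sinfo output, so the exact column layout is an assumption:

```shell
# Hypothetical sample line standing in for `sinfo --Format
# Partition,NodeList,NodeAI,CPUsState -p gpu` output:
# partition, node list, nodes A/I, CPUs A/I/O/T
sample="gpu gpu03 3/0 12/28/0/40"

# Extract the allocated counts (first value of each slash-separated field)
alloc_cpus=$(printf '%s\n' "$sample" | awk '{ split($4, c, "/"); print c[1] }')
alloc_nodes=$(printf '%s\n' "$sample" | awk '{ split($3, n, "/"); print n[1] }')

if [ "$alloc_cpus" -ge 12 ] && [ "$alloc_nodes" -eq 3 ]; then
    echo "all A100 GPUs are likely busy"
fi
```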

If you do not specify the type of GPU, your job can run on either a V100 or an A100, depending on availability:

#SBATCH --gres=gpu:1
#SBATCH --partition=gpu

More information about GPU jobs can be read on this page.