The order in which queued jobs are executed is determined by a priority system. Three parameters are taken into account:
- The length of time your job has been pending.
- The recent job history of the user and of the other members of their account (fairshare).
- The requested resources for this job (nodes and walltime).
The squeue command lists the pending jobs starting from the highest priority.
The priority of a job is expressed as a value between 0 and 1. This value is computed from the three parameters listed above, each of which has a defined weight. The closer the value is to 1, the higher the job's priority. To see the priority of your pending jobs, use the command:
To compare the priorities of different jobs (here 10345 and 10346):
sprio -n --jobs=10345,10346
To see your current fairshare value and that of your Slurm account (which therefore takes the fairshare of your colleagues into account):
sshare -A your_account_name
If your fairshare value (FairShare column) is low, it will increase if you do not submit any jobs for several days.
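To read only the FairShare column, the sshare call can be narrowed down. This is a sketch, assuming the `sshare` options `--noheader`, `--parsable2` and `--format`; the function name `fairshare_of` is hypothetical:

```shell
# Hypothetical helper: print "user|fairshare" rows for one account and one user.
# $1 = Slurm account name, $2 = login (defaults to $USER).
fairshare_of() {
  sshare -A "$1" -u "${2:-$USER}" --noheader --parsable2 --format=User,FairShare
}
```

For example, `fairshare_of your_account_name` prints one row per association; a row with an empty user field typically describes the account itself.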
Low priority partitions
Some machines, funded by a laboratory or a company, are also available to all users, but with a lower priority. This means that when several jobs are pending (Pending state in Slurm) for the same type of node, the jobs submitted to a higher priority queue are executed before the others. Job preemption is not applied on the cluster, i.e. no running job will be stopped to give way to a job submitted to a higher priority queue.
The amdcourt partition is a low priority queue. If the number of nodes you want to reserve is not available, be aware that your job may remain pending for a long time, because the other queues using AMD nodes have a higher priority than amdcourt.
If your job requires between 1 and 256 cores (8 AMD nodes), it is strongly recommended that you submit it to the amd partition.
Using the amdcourt partition is recommended in the following cases:
- You need 257 to 512 cores simultaneously for less than 36 hours.
- There are not enough free resources on the amd partition and there are free nodes on the amdcourt partition. You can see how many nodes are idle on each partition with the following command:
sinfo --state=idle -p amd,amdcourt
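The recommendations above can be sketched as two small shell helpers. Both function names are hypothetical, and `suggest_partition` only encodes the core and walltime thresholds stated in this section; it does not look at the current load. The `idle_nodes` helper assumes the `sinfo` options `-h` (no header) and `-o "%P %D"` (partition name and node count):

```shell
# Hypothetical helper: suggest a partition from the thresholds stated above.
# $1 = number of cores, $2 = walltime in whole hours.
suggest_partition() {
  cores=$1
  hours=$2
  if [ "$cores" -ge 1 ] && [ "$cores" -le 256 ]; then
    echo "amd"
  elif [ "$cores" -ge 257 ] && [ "$cores" -le 512 ] && [ "$hours" -lt 36 ]; then
    echo "amdcourt"
  else
    echo "no recommendation"
  fi
}

# Hypothetical helper: print "partition idle_node_count", one line per partition.
# The awk step strips the "*" that sinfo appends to the default partition
# and sums the counts if a partition appears on several lines.
idle_nodes() {
  sinfo --state=idle -p amd,amdcourt -h -o "%P %D" |
    awk '{ sub(/\*$/, "", $1); n[$1] += $2 } END { for (p in n) print p, n[p] }' |
    sort
}
```

For example, `suggest_partition 400 24` prints `amdcourt`, while `suggest_partition 128 10` prints `amd`. Running `idle_nodes` before submitting helps decide whether the second case applies, i.e. amd is full while amdcourt has free nodes.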