Usage history
The order in which queued jobs are executed is determined by a priority system. Three parameters are taken into account:
- The length of time your job has been pending.
- The recent job history of the user and of the other members of their account (fairshare).
- The resources requested for this job (nodes and walltime).
The squeue command lists your pending jobs, starting with the highest priority.
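For example, to list only your own pending jobs sorted by decreasing priority (a sketch assuming your username is available in the $USER variable):
squeue -u $USER -t PENDING --sort=-p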
The priority of a job is expressed as a value between 0 and 1. This value is computed from the three parameters mentioned previously, each of which has a defined weight. The closer the value is to 1, the higher the job's priority. To see the priority of your pending jobs, use the command:
sprio -n
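To restrict the listing to your own jobs (again assuming $USER holds your username):
sprio -n -u $USER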
To compare priorities between different jobs (here 10345 and 10346):
sprio -n --jobs=10345,10346
To see your current fairshare value and that of your Slurm account (which also takes your colleagues' fairshare into account):
sshare -A your_account_name
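To also display the share of every user in the account (the -a/--all option is standard in recent Slurm versions, but check your site's installation):
sshare -A your_account_name -a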
If your fairshare value is low (FairShare column), it will rise again if you do not submit any jobs for several days.
Low-priority partitions
Some machines, funded by a laboratory or a company, are also available to all users, but with a lower priority. This means that if several jobs are pending (Pending state in Slurm) for the same type of node, the job submitted to the higher-priority queue will be executed first. Job preemption is not applied on the cluster, i.e. no running job will be stopped to make way for a job submitted to a higher-priority queue.
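To see how the partitions are ranked on your cluster (a sketch; the relevant fields in the output are PriorityTier and PriorityJobFactor):
scontrol show partition | grep -E 'PartitionName|Priority'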
A100 within the gpu partition
To explicitly request an A100 card, add the following Slurm options:
#SBATCH --gres=gpu:a100:1
#SBATCH --partition=gpu
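A complete minimal job script could look as follows (the job name, time limit, and the nvidia-smi check are illustrative assumptions, not site requirements):
#!/bin/bash
#SBATCH --job-name=a100_test    # hypothetical job name
#SBATCH --partition=gpu
#SBATCH --gres=gpu:a100:1
#SBATCH --time=01:00:00         # adjust to your needs
# Quick sanity check: print the allocated GPU
nvidia-smi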
Requests for A100 GPUs are served with low priority. If no A100 is free when you submit your job, be aware that it may remain pending for a long time because other users have priority. It is therefore advisable to check the status of the gpu03 node (equipped with the A100s) first. To do so:
sinfo --Format Partition,NodeList,NodeAI,CPUsState -p gpu
If CPUS(A) is 12 or more and NODES(A) = 3, all GPUs are busy.
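Alternatively, assuming your Slurm version supports the GresUsed field of sinfo, you can read the GPU occupancy directly:
sinfo -p gpu --Format NodeList,Gres,GresUsed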
If you do not specify the type of GPU you want, your job can run either on a V100 or on an A100, depending on availability:
#SBATCH --gres=gpu:1
#SBATCH --partition=gpu
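Since the scheduler then chooses the GPU type, it can be useful to log which model was actually allocated at the start of your job script (nvidia-smi's query options shown here are standard; their use in your script is an illustrative assumption):
nvidia-smi --query-gpu=name --format=csv,noheader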
More information about GPU jobs can be found on this page.