Configure your resource reservation
Reservations on the cluster are made via the Slurm scheduler, which places submitted jobs in queues (also called partitions) until resources are allocated.
Important: when submitting a job, you must provide the following information:
- the name of the project assigned to you by the scientific committee (with --account=<project name>)
- the resources you wish to reserve (by default, 1 core). In particular, you can use:
- -n or --ntasks= for the number of tasks (defaults to 1 if not specified). By default, Slurm assumes one core per task, so the number of reserved cores equals the number of tasks.
- --ntasks-per-node= for the number of tasks to run on the same node.
- --ntasks-per-socket= for the number of tasks to run on the same processor (socket). As a reminder, the nodes of the cpucourt and cpulong queues each have 2 processors of 20 cores.
- -N or --nodes= for the number of nodes. If not specified, the allocation is based on the other resource options. If you specify only the number of nodes, the cpucourt queue (which is not exclusive) reserves by default 1 core per reserved node. Since the cpulong queue is exclusive, reserving 1 node reserves all of its cores, even if you do not use them all.
- a walltime after which your job will be stopped by Slurm (with --time=dd-hh:mm:ss). The walltime of your job must be less than or equal to the maximum walltime defined for the partition on which you submit your job (see the description of the partitions here). By default, if no partition is specified, the job is sent to the cpucourt queue.
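As a worked example of how task counts map onto the 2-socket, 20-cores-per-socket nodes described above, the arithmetic can be sketched in plain shell (the node geometry comes from this page; the script itself is only illustrative):

```shell
#!/bin/sh
# Illustrative arithmetic: how a Slurm-style task count maps onto
# the 2-socket, 20-cores-per-socket nodes of cpucourt/cpulong.
CORES_PER_SOCKET=20
SOCKETS_PER_NODE=2
CORES_PER_NODE=$((CORES_PER_SOCKET * SOCKETS_PER_NODE))   # 40 cores per node

NTASKS=100   # e.g. sbatch --ntasks=100, one core per task (the Slurm default)
# Minimum number of full nodes needed: ceiling division of tasks by cores/node.
NODES=$(( (NTASKS + CORES_PER_NODE - 1) / CORES_PER_NODE ))
echo "ntasks=$NTASKS needs at least $NODES nodes of $CORES_PER_NODE cores"
```

Here 100 tasks need at least 3 nodes, since each node provides 40 cores.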
Warning: even if your job has not finished executing when the walltime is reached, it will be stopped automatically. It is therefore strongly recommended to estimate your job's execution time carefully and to introduce checkpoints in your code.
Submit a job
In a script
Use the command: sbatch <file containing your script>
The full list of options when submitting a job can be found here.
Example of a script named “myJob” which executes 5 tasks (here one task uses 1 core). Resources will be reserved for a maximum of 2 hours and the job will be sent to the short-job queue (cpucourt).
#!/bin/bash
#SBATCH --ntasks=5
#SBATCH --time=0-02:00:00
#SBATCH --partition=cpucourt
#SBATCH --account=projectname

module load ...
my commands to run the job ...
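The script can then be submitted and monitored with the standard Slurm commands (the job ID shown by sbatch is a placeholder here):

```shell
sbatch myJob        # submits the script; prints "Submitted batch job <jobid>"
squeue -u $USER     # lists your pending and running jobs
scancel <jobid>     # cancels the job if needed, using the ID from sbatch
```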
Although optional, the --mail-user and --mail-type options allow you to receive email notifications about the status of your job. Modules let you dynamically modify your environment variables (essentially PATH, LD_LIBRARY_PATH or MANPATH) depending on the module you load. For more information on using modules, click here. The complete list of modules installed on the cluster is available here.
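For instance, the notification options can be added to the script header as follows (the email address is a placeholder, and the exact module names available on the cluster should be checked with module avail):

```shell
#SBATCH --mail-user=user@example.com      # address to notify (placeholder)
#SBATCH --mail-type=BEGIN,END,FAIL        # events that trigger an email

module avail          # list the modules installed on the cluster
module load gcc       # example module name: an assumption, check module avail
```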
For jobs using the GPU node, you must add the two options below, --gres specifying the number of V100 cards to reserve (between 1 and 4).
Since you specify the number of cards with the --gres option, you do not need to modify the CUDA_VISIBLE_DEVICES variable (everything is managed by Slurm), and your jobs should only have access to the reserved GPUs.
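As a sketch, a GPU job header could look like the following (the partition name gpu is an assumption; check the partition description on this page for the actual name):

```shell
#SBATCH --partition=gpu     # GPU partition name: an assumption, check locally
#SBATCH --gres=gpu:2        # reserve 2 of the node's 4 V100 cards
```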
If your software requires user interaction rather than batch-mode scripts, you may use interactive mode. This launches an interactive shell on a compute node so that you can work directly on that node.
For example, to launch an interactive bash shell for one hour on the cpucourt partition:
srun -A my_account_Slurm -p cpucourt -t 01:00:00 --pty bash -i