Monitoring your jobs

Track a job in progress

Once you have submitted a job, you can follow its status with the command squeue:

  • To display your current or pending jobs: squeue -u <username> 
  • To display the current or pending jobs for a project (your Slurm account): squeue -A <account name>
The possible states of a job (column ST) are: CA (canceled), CD (completed), CF (configuring), CG (completing), F (failed), NF (node fail), PD (pending), R (running), TO (timeout).
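The state codes above can be spelled out in a script. The following is a minimal sketch; the function name state_label is hypothetical and only the codes listed above are covered.

```shell
# Hypothetical helper: translate an squeue state code (column ST) into words.
state_label() {
    case "$1" in
        CA) echo "canceled" ;;
        CD) echo "completed" ;;
        CF) echo "configuring" ;;
        CG) echo "completing" ;;
        F)  echo "failed" ;;
        NF) echo "node fail" ;;
        PD) echo "pending" ;;
        R)  echo "running" ;;
        TO) echo "timeout" ;;
        *)  echo "unknown ($1)" ;;   # any code not listed in this section
    esac
}

# Example call
state_label PD
```

On the cluster, the code to translate can be taken from the output of squeue -u <username> -h -o "%i %t", where %i is the job ID and %t the compact state.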

Your job can remain pending (state PD) for the following reasons:

  • There are currently not enough resources available on the cluster to satisfy your request.
  • You are already using resources on the desired queue (specified with the partition option) and your new job would make you exceed the authorized limit (see the limits for each queue here).

In both cases, simply wait for the running jobs to finish; your pending job will then be launched automatically. See also: Priority for pending jobs.
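To see which of the two situations applies, squeue can print the reason Slurm reports for each pending job. The sketch below assumes a hypothetical wrapper named pending_reasons; the exact reason names depend on the cluster configuration, but "Resources" or "Priority" usually indicate a lack of free resources, while names mentioning a limit (often starting with QOS or Assoc) point to an exceeded quota.

```shell
# Hypothetical helper: list the pending jobs of a user with the reason given by Slurm.
pending_reasons() {
    # -t PD : pending jobs only; -h : no header line
    # %i = job ID, %r = reason why the job is waiting
    squeue -u "$1" -t PD -h -o "%i %r"
}

# Usage on the cluster: pending_reasons <username>
```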

To get the full details of a job in progress, use the command:

 scontrol show job <JOBID> 
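The output of scontrol show job is a series of Key=Value pairs, so a single field can be extracted with standard text tools. A minimal sketch, assuming the JobState field appears as JobState=<STATE> on a space-separated line (the function name job_state is hypothetical):

```shell
# Hypothetical helper: print only the state of a job (for example RUNNING or PENDING).
job_state() {
    # Put each Key=Value pair on its own line, then keep the JobState value.
    scontrol show job "$1" | tr -s ' ' '\n' | grep '^JobState=' | cut -d= -f2
}

# Usage on the cluster: job_state <JOBID>
```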

For jobs submitted with sbatch, standard output and standard error are written by default to a file named slurm-<JOBID>.out. You can specify another file with the --output option.

If you provided your e-mail address (see the sbatch submission example above), you will receive an e-mail at each stage listed in mail-type (the full list of options is available here).
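As an illustration, the output file and e-mail notifications can be set in the header of a batch script. This is a sketch only: the job name, file names and e-mail address are placeholders, and the mail-type values shown are just one possible choice.

```shell
#!/bin/bash
# Hypothetical job name
#SBATCH --job-name=demo
# Output file; %j is replaced by the job ID
#SBATCH --output=demo-%j.out
# Optional: send standard error to a separate file
#SBATCH --error=demo-%j.err
# Placeholder address; replace with your own
#SBATCH --mail-user=user@example.org
# Send an e-mail when the job begins, ends or fails
#SBATCH --mail-type=BEGIN,END,FAIL

# SLURM_JOB_ID is set by Slurm inside a job; "unknown" is used elsewhere.
job_id="${SLURM_JOB_ID:-unknown}"
echo "Job ${job_id} started"
```

Submitted with sbatch, this script would write its output to demo-<JOBID>.out instead of slurm-<JOBID>.out.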

Track the consumption of CPU hours for my projects

The command usage_info shows, for each of your projects, the CPU time already consumed against the total time allocated by the scientific committee. The consumed time includes the usage of all project members who have a user account on the cluster. The counter is updated each time a job finishes.

View my job history

sacct -u <username> --format=JobID,JobName,partition,alloccpus,state,elapsed,maxrss,totalcpu,start,end -S <MM/DD/YY>
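In this command, -S gives the start date of the period to display, elapsed is the wall-clock duration of the job, maxrss the peak memory used by a task, and totalcpu the CPU time actually consumed. The command can be wrapped in a small function to avoid retyping the field list; the name job_history and the date used below are hypothetical.

```shell
# Hypothetical helper: job history of a user since a given date (MM/DD/YY).
job_history() {
    sacct -u "$1" -S "$2" \
        --format=JobID,JobName,partition,alloccpus,state,elapsed,maxrss,totalcpu,start,end
}

# Usage on the cluster: job_history <username> 01/15/24
```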

Cancel a job

scancel <JOBID>

To cancel all my pending jobs:

scancel -u <username> --state=pending

To cancel all my jobs (running and pending):

scancel -u <username>
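Because scancel acts immediately, it can help to display the affected jobs first. A minimal sketch, assuming a hypothetical function named cancel_pending that lists the pending jobs of a user and then cancels them:

```shell
# Hypothetical helper: show the pending jobs of a user, then cancel them.
cancel_pending() {
    user="$1"
    # -t PD : pending jobs only; -h : no header; %i = job ID, %j = job name
    squeue -u "$user" -t PD -h -o "%i %j"
    scancel -u "$user" --state=pending
}

# Usage on the cluster: cancel_pending <username>
```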