Monitoring your jobs

Track a job in progress

Once you have submitted a job, you can follow its status with the command squeue:

[Screenshot: squeue output, column ST]
The possible states of a job (column ST) are: CA (canceled), CD (completed), CF (configuring), CG (completing), F (failed), NF (node fail), PD (pending), R (running), TO (timeout)

Your job can be pending for several reasons listed on this page.
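
As a rough illustration of reading the ST column, the sketch below maps each code listed above to its name and, for pending jobs, appends the reason reported by squeue. The function names describe_state and annotate_jobs are made up for this example; the squeue options in the usage comment (-h, -u, -o with %i, %t and %r) are standard, but what you see depends on your jobs and your site.

```shell
# Translate a squeue ST code into the state name listed above.
describe_state() {
    case "$1" in
        CA) echo "canceled" ;;
        CD) echo "completed" ;;
        CF) echo "configuring" ;;
        CG) echo "completing" ;;
        F)  echo "failed" ;;
        NF) echo "node fail" ;;
        PD) echo "pending" ;;
        R)  echo "running" ;;
        TO) echo "timeout" ;;
        *)  echo "unknown ($1)" ;;
    esac
}

# Read "JOBID ST REASON" lines on stdin and print one readable line per job.
# The reason is only shown for pending jobs.
annotate_jobs() {
    while read -r id st reason; do
        if [ -z "$id" ]; then continue; fi
        if [ "$st" = "PD" ]; then
            echo "$id: $(describe_state "$st") (reason: $reason)"
        else
            echo "$id: $(describe_state "$st")"
        fi
    done
}

# Usage on the cluster (requires Slurm):
#   squeue -h -u "$USER" -o "%i %t %r" | annotate_jobs
```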

To get the full details of a job in progress, use the command:

 scontrol show job <JOBID> 
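
The output of scontrol show job is a long list of Key=Value pairs. If you only need one of them, a small filter can help. The sketch below is one possible way to do it; job_field is a hypothetical helper name, and it assumes the value you want contains no spaces (a value with spaces would be cut at the first one).

```shell
# Print the value of one Key=Value field from "scontrol show job" output.
#   $1    = field name, for example JobState
#   stdin = output of scontrol show job <JOBID>
job_field() {
    # Put every Key=Value pair on its own line, then keep the first match.
    tr -s ' \t' '\n\n' |
        awk -F= -v key="$1" '$1 == key && !seen { print substr($0, length(key) + 2); seen = 1 }'
}

# Usage on the cluster (requires Slurm):
#   scontrol show job <JOBID> | job_field JobState
```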

For jobs submitted with sbatch, standard output and standard error are written by default to a single file named slurm-<JOBID>.out. You can choose another file with the --output option.
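
A minimal sketch of a batch script that renames the output file and sends standard error to a separate file. The job name and file names are placeholders chosen for this example; %j is the sbatch filename pattern that expands to the job ID.

```shell
#!/bin/bash
# Hypothetical job name, used only for this example.
#SBATCH --job-name=demo
# Standard output goes to demo-<JOBID>.out (%j expands to the job ID).
#SBATCH --output=demo-%j.out
# Standard error goes to its own file instead of being merged.
#SBATCH --error=demo-%j.err

# SLURM_JOB_ID is set by Slurm inside a job; the fallback allows a dry run.
job_id="${SLURM_JOB_ID:-dry-run}"
echo "job ${job_id}: this line goes to standard output"
echo "job ${job_id}: this line goes to standard error" >&2
```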

If you provided your e-mail address (see the sbatch submission example above), you will receive an e-mail at each stage specified with the --mail-type option (the full list of options is available here).

Track the consumption of CPU hours for my projects

The command usage_info shows, for each of your projects, the CPU time already consumed compared with the total time allocated by the scientific committee. The consumed time includes the usage of all project members who have a user account on the cluster. The counter is updated each time a job finishes.
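
The output format of usage_info is not reproduced here, so the sketch below does not try to parse it. It only shows the arithmetic for turning the two figures (consumed and allocated CPU hours) into a percentage; percent_used is a hypothetical helper name and the numbers in the comment are invented.

```shell
# Print the share of an allocation that has been consumed, as a percentage.
#   $1 = CPU hours consumed, $2 = CPU hours allocated
percent_used() {
    awk -v c="$1" -v a="$2" 'BEGIN {
        if (a <= 0) { print "n/a"; exit }
        printf "%.1f\n", 100 * c / a
    }'
}

# Example with made-up figures: 2500 hours used out of 10000 allocated.
#   percent_used 2500 10000
```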

View my job history

sacct -u <username> --format=JobID,JobName,partition,alloccpus,state,elapsed,maxrss,totalcpu,start,end -S <MM/DD/YY>
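
For a quick overview of a long history, you could count jobs per final state. The sketch below assumes sacct is run with -P (pipe-separated output), -n (no header) and -X (one line per job, without job steps), and that the state is the second field; count_states is a hypothetical helper name. A canceled job is reported as "CANCELLED by <uid>", so only the first word of the state is kept.

```shell
# Count jobs per state from pipe-separated "JobID|State" lines on stdin.
count_states() {
    awk -F'|' 'NF >= 2 { split($2, w, " "); n[w[1]]++ }
               END { for (s in n) print s, n[s] }' | sort
}

# Usage on the cluster (requires Slurm):
#   sacct -u <username> -X -n -P --format=JobID,State -S <MM/DD/YY> | count_states
```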

Cancel a job

scancel <JOBID>

To cancel all my pending jobs:

scancel -u <username> --state=pending

To cancel all my jobs (running and pending):

scancel -u <username>
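
Before cancelling many jobs at once, it can be useful to preview what would be cancelled. The sketch below is one way to do that under stated assumptions: preview_cancel is a hypothetical helper that reads "JOBID NAME" lines and prints, without running anything, a scancel command for each job whose name starts with a given prefix. The prefix test_ in the comment is only an example.

```shell
# Print (do not run) a scancel command for each job whose name starts
# with the given prefix.
#   $1    = job name prefix
#   stdin = "JOBID NAME" lines
preview_cancel() {
    prefix="$1"
    while read -r id name; do
        case "$name" in
            "$prefix"*) echo "scancel $id   # $name" ;;
        esac
    done
}

# Usage on the cluster (requires Slurm), here for pending jobs only:
#   squeue -h -u "$USER" -t PD -o "%i %j" | preview_cancel test_
```

Once the printed list looks right, the same lines can be run by hand.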