Submitting and cancelling SLURM jobs

Submit a job script called my_script.sh requesting

  • 5GB RAM per cpu
  • 20 CPUs on a single node
  • use the scheduler queue $MYQUEUE
  • use the job name $MYJOB_NAME
  • provide a time ceiling of 3 HRs
  • write STDOUT to file $MYLOG.log
sbatch --mem-per-cpu 5GB -c 20 -p $MYQUEUE -J $MYJOB_NAME -t 0-3:00:00 -o $MYLOG.log my_script.sh

# Cancel a task with the related JOB_ID
scancel $MYJOB_ID

Submit an interactive job

  • 2GB RAM per cpu
  • 4 CPUs on a single node
  • use the scheduler queue $MYQUEUE
  • provide a time ceiling of 1 HR
  • execute task zero in pseudo terminal mode. The option “pty” is important because it allows an interactive terminal mode. Without “pty” every command issued would be run 4 times (-c 4)
srun --mem-per-cpu 2GB -c 4 -p $MYQUEUE -t 0-01:00:00 --pty /bin/bash

# Use srun for any long jobs, even cp or rsync
# DONT USE THE LOGIN NODE
srun -p $MYQUEUE cp my_file my_new_file

Jobs can be submitted by passing all SLURM parameters through bash script

#!/usr/bin/env bash
#SBATCH --mem-per-cpu 5GB
#SBATCH -c 20
#SBATCH -p $MYQUEUE
#SBATCH -J $MYJOB
#SBATCH -t 0-3:00:00
#SBATCH -o $MYLOG.log my_script.sh

raxmlHPC-PTHREADS-AVX -T 20 \
-m GTRGAMMA \
-p 82748 \
-# 20 \
-s $MY_PHYLIP_FILE \
-n $MY_NAME \
-o $MY_OUTGROUP \
-w $MY_OUTDIR

Run the RAxML analysis as

sbatch my_raxml_script.sh

Find information about partitions and jobs

# Display submitted jobs for a given user
squeue -u $USER

List job information

  • The sacct command displays job accounting data stored in the job accounting log file or Slurm database in a variety of forms for your analysis. The sacct command displays information on jobs, job steps, status, and exitcodes by default. For the non-root user, the sacct command limits the display of job accounting data to jobs that were launched with their own user identifier (UID) by default. Data for other users can be displayed with the –allusers, –user, or –uid options.
sacct --format="CPUTime,MaxRSS,AveRSS,JobName,Timelimit,Start,Elapsed"

# Display available partitions on the cluster
sinfo

# List jobs that ran since Dec 1st 2016
sacct -S 2016-12-01

# Get help about sacct command
sacct --helpformat