INTRODUCTION

Slurm can parallelize jobs neatly when each input file has a number in its name. For example, if your files are named “MYPREFIX0004.txt, MYPREFIX0005.txt, MYPREFIX0006.txt, etc.”, you can build the file name from the array task ID:

mylib=MYPREFIX$(printf '%04d' "$SLURM_ARRAY_TASK_ID")

Then use the $mylib variable in your script and submit the script like so:

sbatch --array 5-6 myscript.sbatch

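This has the Slurm scheduler launch one task per ID in the range (here 5 and 6, i.e. MYPREFIX0005.txt and MYPREFIX0006.txt), each with its own $SLURM_ARRAY_TASK_ID. The tasks can run in parallel when resources allow, so the whole batch finishes sooner.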
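As an illustration, a minimal sketch of what myscript.sbatch might contain. The job name, the padded_name helper, and the fallback task ID of 5 are assumptions added so the snippet can be tried outside Slurm; the echo stands in for whatever command actually processes the file.

```shell
#!/bin/bash
#SBATCH --job-name=prefix_array   # hypothetical job name

# Build the zero-padded name for a given task ID, e.g. 5 -> MYPREFIX0005.
padded_name() {
    printf 'MYPREFIX%04d' "$1"
}

# Slurm sets SLURM_ARRAY_TASK_ID in each array task; the fallback value 5
# is an assumption so the script can also be tried outside Slurm.
mylib=$(padded_name "${SLURM_ARRAY_TASK_ID:-5}")

# Replace this echo with the real command that reads the file.
echo "Processing ${mylib}.txt"
```

Each array task runs the same script and differs only in the value of $SLURM_ARRAY_TASK_ID, so the padding step is all that is needed to map a task to its file.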

If your files aren’t named with a number, you can use this workaround.

Glob the filenames you want to process from the directory that holds them:

FILES=(/mypath/*)

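Create a variable NUMFILES holding the number of files, which sets the size of the array job to submit. Bash arrays are zero-indexed, so the task IDs should run from 0 to NUMFILES-1: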

NUMFILES=${#FILES[@]}
echo $NUMFILES
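The file count can be turned into a matching --array range at submission time. The following is a sketch only: the array_range helper is hypothetical, /mypath is the placeholder directory used above, and the final line prints the sbatch command rather than running it. Because Bash arrays are zero-indexed, the range runs from 0 to the file count minus one.

```shell
#!/bin/bash
# Print the --array range (0 to count-1) covering every file in a directory.
array_range() {
    local dir=$1
    shopt -s nullglob            # an empty or missing directory yields an empty array
    local files=("$dir"/*)
    shopt -u nullglob
    local n=${#files[@]}
    if (( n == 0 )); then
        echo "no files found in $dir" >&2
        return 1
    fi
    echo "0-$((n - 1))"          # Bash arrays are zero-indexed
}

# /mypath is the placeholder directory from the text above.
if range=$(array_range /mypath); then
    echo "sbatch --array=$range myscript.sbatch"
fi
```

With three files in the directory, the helper prints 0-2, which gives one task per file.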

In your script, create a variable that picks this task's file by using $SLURM_ARRAY_TASK_ID as the array index:

entry=$(basename "${FILES[$SLURM_ARRAY_TASK_ID]}" .tab) # the second argument is the extension to strip; omit it to keep the extension
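Put together inside the job script, the lookup might look like the sketch below. The entry_for_task helper and the fallback index of 0 are assumptions for trying it outside Slurm; /mypath and the .tab extension are the placeholders used above.

```shell
#!/bin/bash
# Print the base name (minus the .tab extension) of the file at a given index.
entry_for_task() {
    local dir=$1 index=$2
    local files=("$dir"/*)       # glob results come back sorted
    basename "${files[$index]}" .tab
}

# /mypath is the placeholder directory from the text; the fallback index 0
# is an assumption for trying the script outside Slurm.
entry=$(entry_for_task /mypath "${SLURM_ARRAY_TASK_ID:-0}")
echo "Task ${SLURM_ARRAY_TASK_ID:-0} handles $entry"
```

Bash sorts glob results, so every task sees the files in the same order as long as the directory contents do not change while the array job is running.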