Date: 2024-11-23 10:45:14

I have not been able to find a satisfying answer on submitting tasks across multiple nodes using job steps. However, I found that in my case (multiple identical runs) what works really well is to submit a single job step split into many tasks. The batch script then looks like:

#!/bin/sh

#SBATCH --partition parallel
#SBATCH --ntasks=100
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=100M
#SBATCH --job-name test
#SBATCH --output test.out

srun -n100 -u exec.sh

with the executable script exec.sh using the variable $SLURM_PROCID to differentiate between the tasks. For example:

#!/bin/sh

echo $SLURM_PROCID
sleep 1200

This results in the desired behavior, but from what I understand it has some drawbacks compared to submitting separate job steps when it comes to independently controlling each task. However, until a better alternative is found, this is the only approach that seems to work for this use case.
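To go a bit beyond echoing the task ID, $SLURM_PROCID can also be used to pick a task-specific input. A minimal sketch of such an exec.sh (the file names here are hypothetical, just to show the indexing):

```shell
#!/bin/sh

# Each task launched by srun gets a distinct SLURM_PROCID (0 .. ntasks-1).
# Use it to select a task-specific input (file names are hypothetical).
INPUTS="config0.txt config1.txt config2.txt"

# cut fields are 1-based, so shift the 0-based SLURM_PROCID by one.
i=$((SLURM_PROCID + 1))
input=$(echo "$INPUTS" | cut -d' ' -f"$i")

echo "task $SLURM_PROCID processing $input"
```

The same pattern works for selecting a line from a parameter file (e.g. with sed -n "${i}p" params.txt), so all 100 tasks can run the same script with different arguments.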

Posted by: Christos