79153244

Date: 2024-11-03 17:16:50
Score: 0.5
Natty:

You should almost certainly not have sleep in your job script. All it's doing is occupying the job's resources without getting any work done - waste.

Job arrays are just a submission shorthand: the members of the array have the same overhead as standalone jobs. The only difference is that the array job sits in the queue sort of like a Python "generator", so every time the scheduler considers the queue and there are resources available, another array member is budded off as a standalone job.

That's why the sleep makes no sense: it's in the job, not in the submission. Slurm doesn't have a syntax for throttling a job array by time (only by max running).
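For reference, the "max running" throttle is the `%` suffix on sbatch's `--array` option (standard Slurm syntax; `script.sh` is a placeholder for your job script):

```shell
# Submit a 60-member array, but allow at most 5 members to run
# concurrently. The %5 throttles how many run at once; it does not
# pace the rate at which members are started.
sbatch --array=1-60%5 script.sh
```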

But why not just "for n in {1..60}; do sbatch script $n; sleep 10; done"?

I'm a cluster admin, and I'm fine with this. You're trying to be kind to the scheduler, which is good. Every 10 seconds is overkill, though - the scheduler can probably take a job per second without breaking a sweat. I'd want you to think more carefully about whether the "shape" of each job makes sense: GPU jobs can often use more than one CPU core, and is the job efficient in the first place? There are also lots of ways to tune for cases where your program (Matlab) can't keep a GPU busy, such as MPS and MIG.
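Putting that advice together, a cleaned-up version of the submission loop looks like this ("script" is a placeholder for your actual job script; the 1-second pause reflects the job-per-second estimate above):

```shell
#!/bin/bash
# Submit 60 standalone jobs, pacing submissions to one per second.
# Note the sleep lives here, in the submission loop on the login node,
# not inside the job script, where it would burn allocated resources.
for n in {1..60}; do
    sbatch script "$n"
    sleep 1
done
```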

Reasons:
  • Long answer (-1):
  • No code block (0.5):
  • Contains question mark (0.5):
  • Low reputation (0.5):
Posted by: markhahn