As a user of Snakemake on SLURM, I found that the job grouping feature is not really designed for, or useful for, what you are trying to do. In a standard SLURM setup, if you have a cluster with, say, 4 nodes of 16 cores each, then you can submit 100 single-core jobs and SLURM will run 64 of them immediately (16 on each node) and start each of the remaining 36 as soon as a core frees up. This is how Snakemake expects to interact with SLURM: it submits one job per task and lets SLURM allocate the individual CPU cores.
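For illustration, this is roughly what that looks like with the traditional --cluster style of submission (the sbatch options here are only an example; newer Snakemake versions also have dedicated SLURM support, and your site may provide a profile instead):

```
# Submit up to 100 jobs at once; each job asks SLURM for a single core,
# and SLURM packs them onto nodes as cores become free.
snakemake --jobs 100 --cluster "sbatch --cpus-per-task=1 --mem=2G"
```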
It seems like your SLURM setup is locked into --exclusive mode, such that each job is allocated a full node even if it only needs a single core. Is this correct? Is there any way you can alter the SLURM configuration, or add a new partition that allows 1-core jobs? Or is that not a possibility?
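If it is an option, the change on the admin side normally lives in slurm.conf. The following is only a sketch with made-up node and partition names, and it assumes the exclusivity is enforced at the partition level rather than by a job_submit plugin or site policy:

```
# Hypothetical slurm.conf excerpt: schedule by core rather than by node,
# and define a partition that does not force whole-node allocation.
SelectType=select/cons_tres
SelectTypeParameters=CR_Core
PartitionName=shared Nodes=node[01-04] OverSubscribe=NO MaxTime=24:00:00 State=UP
```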
My own reason for grouping the jobs was to cut down the number of individual short tasks sent to SLURM, and so reduce the load on the SLURM controller. In the end I re-wrote my Snakefile to explicitly process the inputs in batches. To do this, I made a Python script that runs before the Snakemake workflow, determines the number of batches and which inputs belong to which batch, and saves the result as JSON. Then I use this information within a Snakemake input function to assign multiple input files to each batch job. It's effective, but complex, and not workable as a general solution. A rough sketch of the idea is below.
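This is only a minimal sketch of the approach, not my actual workflow: it assumes a separate pre-processing script has already written batches.json mapping batch names to lists of input files, and the file paths, rule name and shell command are all made up.

```python
# Hypothetical Snakefile fragment. Assumes a pre-processing script has already
# written batches.json, e.g. {"batch0": ["data/a.fastq", "data/b.fastq"], ...}
import json

with open("batches.json") as fh:
    BATCHES = json.load(fh)

rule all:
    input:
        expand("results/{batch}.done", batch=BATCHES.keys())

def batch_inputs(wildcards):
    # Input function: hand each batch job its own list of input files
    return BATCHES[wildcards.batch]

rule process_batch:
    input:
        batch_inputs
    output:
        touch("results/{batch}.done")
    threads: 1
    # Placeholder command; the real per-batch processing is workflow-specific
    shell:
        "my_tool {input}"
```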