That's fair. I was considering this only given the fact that we know the user doesn't have access to a partition (this isn't the surprise here) and that Slurm communicates that as the reason pretty clearly. I can see how, if a user is submitting against multiple partitions, they might hope that when a job can't run in a given partition, the scheduler would consider all of the others provided *before* dying outright at the first rejection.

On Thu, Sep 21, 2023 at 10:28 AM Bernstein, Noam CIV USN NRL (6393) Washington DC (USA) <noam.bernst...@nrl.navy.mil> wrote:

> On Sep 21, 2023, at 9:46 AM, David <dr...@umich.edu> wrote:
>
> > Slurm is working as it should. From your own examples you proved that; by not submitting to b4 the job works. However, looking at man sbatch:
> >
> >     -p, --partition=<partition_names>
> >         Request a specific partition for the resource allocation. If not specified, the default behavior is to allow the slurm controller to select the default partition as designated by the system administrator. If the job can use more than one partition, specify their names in a comma separate list and the one offering earliest initiation will be used with no regard given to the partition name ordering (although higher priority partitions will be considered first). When the job is initiated, the name of the partition used will be placed first in the job record partition string.
> >
> > In your example, the job can NOT use more than one partition (given the restrictions defined on the partition itself precluding certain accounts from using it). This, to me, seems either like a user education issue (i.e. don't have them submit to every partition), or you can try the job submit lua route - or perhaps the hidden partition route (which I've not tested).
>
> That's not at all how I interpreted this man page description. By "If the job can use more than..." I thought it was completely obvious (although perhaps wrong, if your interpretation is correct, but it never crossed my mind) that it referred to whether the _submitting user_ is OK with it using more than one partition. The partition where the user is forbidden (because of the partition's allowed account) should just be _not_ the earliest initiation (because it'll never initiate there), and therefore not run there, but still be able to run on the other partitions listed in the batch script.
>
> I think it's completely counter-intuitive that submitting a job saying it's OK to run on one of a few partitions, with one of those partitions happening to be forbidden to the submitting user, means that it won't run at all. What if you list multiple partitions and increase the number of nodes so that there aren't enough in one of the partitions, but don't realize this problem? Would you expect that to prevent the job from ever running on any partition?
>
> Noam

--
David Rhey
---------------
Advanced Research Computing
University of Michigan
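For anyone weighing the job submit lua route mentioned in this thread, here is a minimal sketch of the pruning idea behind it: drop the requested partitions whose account restrictions exclude the job's account and keep the rest, instead of rejecting the whole submission. It is written in Python purely as a model of the logic, under the assumption that each partition either has no account restriction or a fixed set of allowed accounts (loosely standing in for a partition's AllowAccounts setting). A real implementation would be a Lua script for Slurm's job_submit/lua plugin. The `Partition` class, the `prune_partitions` function, and every partition and account name other than b4 are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Partition:
    """Hypothetical stand-in for a partition's configuration."""
    name: str
    # None models "no account restriction"; otherwise the set of permitted accounts.
    allow_accounts: Optional[frozenset] = None


def prune_partitions(requested: str, account: str,
                     partitions: Dict[str, Partition]) -> str:
    """Return the comma-separated partition list with forbidden entries removed.

    Models the job_submit idea: remove partitions the job's account may not
    use rather than failing the whole job. Names not found in `partitions`
    are kept unchanged and left for Slurm itself to validate.
    """
    kept = []
    for name in requested.split(","):
        name = name.strip()
        if not name:
            continue  # tolerate stray commas such as "b1,,b2"
        part = partitions.get(name)
        if part is None or part.allow_accounts is None or account in part.allow_accounts:
            kept.append(name)
    if not kept:
        # Nothing usable remains, so rejecting the job is the honest outcome.
        raise ValueError(f"account {account!r} may not use any of: {requested}")
    return ",".join(kept)
```

With b4 restricted to a hypothetical `special` account, `prune_partitions("b1,b2,b4", "physics", parts)` returns `"b1,b2"`, so the job would remain eligible for the partitions the user can actually use, which is the behavior Noam expected from the man page wording.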