[slurm-users] question on sbatch --prefer
Hello all,

I'm somewhat new to Slurm, but a long-time user of other batch systems. Assume we have a simple cluster of uniform racks of systems with no special resources, and our jobs are all single-CPU tasks.

Let's say I have a long-running job in the cluster which needs to spawn a helper process into the cluster. We have a strong preference for this helper to run on the same cluster node as the original job, but if that node is already scheduled full, then we want this new task to be scheduled on another system without any delay.

The problem I have is that --nodelist doesn't solve this, and, as far as I can tell, there's no way with --prefer to specify a node name as a resource without creating a gres for every hostname in the cluster.

It seems like what I'm trying to do should be achievable, but having read through the documentation and searched the archives of this list, I'm not seeing a solution. I'm hoping someone here has some experience with this and can point me in the right direction.

Sincerely,

Alan
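One possible workaround, sketched here rather than taken from the thread: --prefer matches node Features (a soft constraint, so the job still starts elsewhere without delay if the preferred node is full), and a feature can be named after each hostname. The node names, feature names, and script name below are illustrative only.

    # slurm.conf: tag each node with a feature matching its hostname
    NodeName=node01 Features=node01 ...
    NodeName=node02 Features=node02 ...

    # from inside the parent job, submit the helper preferring this node;
    # SLURMD_NODENAME is set in the job environment to the local hostname
    sbatch --prefer="$SLURMD_NODENAME" helper.sh

This still means one entry per host, much like the per-hostname gres Alan wants to avoid, but features are a lighter-weight mechanism than gres.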
[slurm-users] Re: [INTERNET] Re: question on sbatch --prefer
Chip,

Thank you for your prompt response. We could do that, but the helper is optional, and at times the job might need additional helpers depending on the inputs to the problem being solved, so we don't know a priori how many helpers will be needed.

Alan

On 2/9/24 10:59, Chip Seraphine wrote:
> Normally I'd address this by having an sbatch script allocate enough
> resources for both jobs (specifying one node), and then kick off the helper
> as a separate step (assuming I am understanding your issue correctly).
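For reference, the pattern Chip describes might look something like the following sketch; the task counts and program names are made up, and --exact asks each step to take only the resources it requests so the two steps can run side by side.

    #!/bin/bash
    #SBATCH --nodes=1      # keep the whole job on one node
    #SBATCH --ntasks=2     # one CPU for the main task, one for the helper

    # run the main task and the helper as separate steps in one allocation
    srun --ntasks=1 --exact ./main_task &
    srun --ntasks=1 --exact ./helper_task &
    wait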
[slurm-users] Running slurm on alternate ports
Hello all,

For testing purposes, we would like to run Slurm on ports different from the default values. No problems in setting this up. But how does one tell srun/sbatch/etc. what the different port numbers are? I see no command-line options to specify a port or an alternate configuration file.

Thank you,

Alan
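One approach, assuming a side-by-side test install at a made-up path: the client commands take their port numbers from slurm.conf rather than from command-line options, so pointing them at the test instance's configuration file via the SLURM_CONF environment variable should be enough.

    # slurm.conf for the test instance (port numbers are examples)
    SlurmctldPort=7817
    SlurmdPort=7818

    # tell srun/sbatch/sinfo/etc. to use the test configuration
    export SLURM_CONF=/opt/slurm-test/etc/slurm.conf
    sinfo
    sbatch job.sh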
[slurm-users] cpu distribution question
All,

I have a very simple Slurm cluster. It's just a single system with 2 sockets and 16 cores in each socket. I would like to be able to submit a simple task into this cluster and have the CPUs assigned to that task allocated round robin across the two sockets. Everything I try is putting all the CPUs for this single task on the same socket.

I have not specified any CpuBind options in the slurm.conf file. For example, if I try

    $ srun -c 4 --pty bash

I get a shell prompt on the system, and can run

    $ taskset -cp $$
    pid 12345 current affinity list: 0,2,4,6

and I get this same set of CPUs no matter what options I try (the cluster is idle with no tasks consuming slots).

I've tried various srun command line options like:

    --hint=compute_bound
    --hint=memory_bound
    various --cpu-bind options
    -B 2:2
    -m block:cyclic and block:fcyclic

Note that if I try to allocate 17 CPUs, then I do get the 17th CPU allocated on the 2nd socket.

What magic incantation is needed to get an allocation where the CPUs are chosen round robin across the sockets?

Thank you!

Alan
[slurm-users] Re: [INTERNET] Re: cpu distribution question
Thank you for this note; the comments on the issue you raised have saved me a lot of time.

I agree that the documentation around this issue of CPU assignment/allocation is confusing and part of the problem. Of particular concern is the changing behavior of CPU allocation when adding a gres that has nothing to do with CPU assignment. The non-deterministic behavior here is unsettling. I guess I will need to have a look at a plugin so that we can ensure deterministic behavior. We have already modified the Slurm source to allow regular users to create their own reservations, so perhaps I might get annoyed enough to 'fix' this in the Slurm code base.

Thank you!

Alan

On 6/7/24 18:36, Juergen Salk wrote:
> Hi Alan,
>
> unfortunately, process placement in Slurm is kind of black magic for
> sub-node jobs, i.e. jobs that allocate only a small number of CPUs of
> a node.
>
> I have recently raised a similar question here:
>
> https://support.schedmd.com/show_bug.cgi?id=19236
>
> And the bottom line was that to "really have control over task placement
> you really have to allocate the node in --exclusive manner".
>
> Best regards
> Jürgen
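To illustrate the workaround quoted above, a sketch only: with an exclusive allocation, the task can be bound to an explicit CPU mask. The mask below assumes socket 0 holds CPUs 0-15 and socket 1 holds CPUs 16-31, which may not match the actual numbering on a given machine; ./my_app is a placeholder.

    # take the whole node so Slurm's own CPU packing cannot interfere
    salloc -N 1 --exclusive

    # one task, 4 CPUs: mask 0x30003 selects CPUs 0,1,16,17,
    # i.e. two cores on each socket
    srun -n 1 -c 4 --cpu-bind=mask_cpu:0x30003 ./my_app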