Re: [slurm-users] Help with preemtion based on licenses

2019-11-06 Thread Oytun Peksel
Thank you all for your input. Being a newbie in this, my impression from what you write is that for most commercial software a suspend/release_license/reacquire mechanism is not feasible. (Answer to Mark) What we are using here is an engineering software package called Abaqus. In Abaqus you can use token

Re: [slurm-users] Help with preemtion based on licenses

2019-11-06 Thread Chris Samuel
On Wednesday, 6 November 2019 7:36:57 AM PST Oytun Peksel wrote:
> GPU part of the discussion is beyond my knowledge so I assumed it would be
> possible to release it.
If you simply suspend a job then the application does not exit, it will just get stopped and so will be holding various resource

Re: [slurm-users] Help with preemtion based on licenses

2019-11-06 Thread Reuti
> On 06.11.2019 at 16:36, Oytun Peksel wrote:
>
> Thanks for the information Mark.
>
> I understand. GPU part of the discussion is beyond my knowledge so I assumed
> it would be possible to release it.
>
> But as for the licenses it is always possible to leave it to the system
> admin. It

Re: [slurm-users] Help with preemtion based on licenses

2019-11-06 Thread Oytun Peksel
Thanks for the information Mark. I understand. GPU part of the discussion is beyond my knowledge so I assumed it would be possible to release it. But as for the licenses it is always possible to leave it to the system admin. It is possible to take care of license release and reacquire using scr

Re: [slurm-users] Help with preemtion based on licenses

2019-11-06 Thread Mark Hahn
> This does not make sense to me. If gpu is my generic resource why would it not release the gpu resources if a job is suspended?
how would that be implemented? how would the scheduler reach into the application and cause the license to be released and reacquired? after all, the license server

Re: [slurm-users] Replacement for FastSchedule since 19.05.3

2019-11-06 Thread Paul Edmon
I would submit a bug on this just to ask whether there is any replacement or option for FastSchedule 0. They may have suggestions.
-Paul Edmon-
On 11/6/19 3:16 AM, Taras Shapovalov wrote:
Hi Chris, Thanks for the answer. Does this mean that there is no way to get rid of annoying error

Re: [slurm-users] Help with preemtion based on licenses

2019-11-06 Thread Oytun Peksel
Ok, I found out it is possible to preempt on licenses if you define the license as a generic resource. Such as:
GresTypes=license
NodeName=SomeNode Gres=license:someSoftware:100
And submit the jobs with --gres=license:someSoftware:20
But this does not work with PreemptMode=Suspend. It would requ

Re: [slurm-users] Running job using our serial queue

2019-11-06 Thread Marcus Wagner
Hi David, if I remember right (we have had swap disabled for years now), swapping out processes seems to slow down the system overall. But I know that if the oom_killer does its job (killing over-memory processes), the whole system is stalled until it has done its work. This might be the issue, yo

Re: [slurm-users] Replacement for FastSchedule since 19.05.3

2019-11-06 Thread Taras Shapovalov
Hi Chris, Thanks for the answer. Does this mean that there is no way to get rid of the annoying error messages in the logs if we need the hardware autodetection (FastSchedule=0)?
error: FastSchedule will be removed in 20.02, as will the FastSchedule=0 functionality. Please consider removing this from

Re: [slurm-users] Help with preemtion based on licenses

2019-11-06 Thread Oytun Peksel
Yes, of course no one would expect the resource manager to control the job's application in order to release licenses. Sometimes licenses are released automatically, or the release can be done by scripts. The desired behavior here while using '--license someSoftware@someserver:x': if there are not enough lic