> On 06.11.2019 at 16:36, Oytun Peksel <oytun.pek...@semcon.com> wrote:
> 
> Thanks for the information Mark.
> 
> I understand. GPU part of the discussion is beyond my knowledge so I assumed 
> it would be possible to release it.
> 
> But as for the licenses it is always possible to leave it to the system 
> admin. It is possible to take care of license release and reacquire using 
> scripts instead of assuming it is not possible. At least there should be an 
> easy configuration option to configure generic or trackable resources to be 
> releasable.

To add some obstacles to the ones Mark noted:

In the interaction between any queuing system and the license tracking
mechanism inside each application, many things can surely be improved. But it
starts already with the constraint that, to my knowledge, no license daemon
offers a mechanism to "check and reserve/acquire a license if available" as one
atomic operation, so that the queuing system would know a license is available
and could schedule a job to use it. What might come close is to borrow a
license during a scheduling run and use it for an upcoming job. But even here
the limitations may differ from vendor to vendor: some allow a borrowed license
to be released prematurely, while others don't, and one has to wait for the
specified timeframe to elapse.
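
As a toy illustration of why the missing atomic operation matters, here is a
minimal sketch in Python. It uses an in-memory stand-in, not the API of any real
license daemon; `ToyLicensePool`, `schedule_non_atomic` and the `interloper`
callback are made-up names for this example only.

```python
import threading


class ToyLicensePool:
    """In-memory stand-in for a license daemon (hypothetical, illustration only)."""

    def __init__(self, total):
        self._free = total
        self._lock = threading.Lock()

    def available(self):
        # Read-only query: roughly what a scheduler can ask from the outside.
        with self._lock:
            return self._free

    def checkout(self):
        # What the application does at startup; fails if no license is free.
        with self._lock:
            if self._free == 0:
                return False
            self._free -= 1
            return True


def schedule_non_atomic(pool, interloper=None):
    """Check first, start the job later; the count may change in between."""
    if pool.available() == 0:
        return "not scheduled"
    if interloper is not None:
        interloper()  # e.g. an interactive user outside the queue takes a license
    return "job running" if pool.checkout() else "job failed: no license"
```

With one license in the pool, `schedule_non_atomic(pool, interloper=pool.checkout)`
passes the check but the job still fails to get its license, because the check
and the checkout are two separate steps.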

Then there is the application itself: when does it check for an available
license? Only at startup, periodically after a certain amount of elapsed time,
or on each iteration while it's running? Or will it hold the license for the
whole run and release it only when it finishes? What will happen if the
application was suspended for some time and, when it continues, discovers that
there were X minutes without a license daemon response? It might simply quit.
If one is lucky, the results achieved up to this point can still be saved.

To make things worse: what type of license does a particular application use?
One license per core/thread, per CPU, per job, per machine; or per machine per
user, or per group on that machine?

One favorable case could be a job that consists of several instances of a
program, like a compiler building a large application: the job could be
stopped exactly when no compiler instance is active but just the job script,
so that presumably no license is held at that moment.

Sure, for some applications it might be possible to script this in some way. So
in my opinion the first goal for such a proposal would be to get this working
outside of any queuing system: stop the application on a local machine with
SIGSTOP and try to use the license with another instance of the application,
be it on the same or another machine. Often the state of the license daemon can
be checked, and the stopped application should allow the counter of available
licenses to increment again in the license daemon's state output.
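
Such a manual test could be sketched roughly as follows (Python, POSIX only).
This is only a sketch under assumptions: `probe_while_stopped` is a made-up
helper, and `count_free_licenses` is a callable you would have to write
yourself, for example by parsing the status output of whatever license daemon
is in use.

```python
import os
import signal
import subprocess
import sys
import time


def probe_while_stopped(pid, count_free_licenses, settle=0.5):
    """Stop `pid`, check whether the free-license count rises, then resume it.

    `count_free_licenses` is a caller-supplied (hypothetical) callable that
    returns the number of free licenses reported by the license daemon.
    """
    before = count_free_licenses()
    os.kill(pid, signal.SIGSTOP)
    try:
        time.sleep(settle)  # give the daemon time to notice, if it ever does
        during = count_free_licenses()
    finally:
        os.kill(pid, signal.SIGCONT)  # always resume the application
    return before, during, during > before
```

If the last value of the returned tuple is False, the stopped application kept
its license, and suspending it inside a queuing system would not free the
license either.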

-- Reuti


> After all software licenses might be the most expensive resource to utilize  
> where preemption might sometimes be inevitable.
> 
> For now I have no better plan than to dig in the source code to find an easy 
> way to change this behavior.
> 
> Oytun Peksel
> oytun.pek...@semcon.com
> Mobile   +46739205917
> 
> 
> -----Original Message-----
> From: slurm-users <slurm-users-boun...@lists.schedmd.com> On Behalf Of Mark 
> Hahn
> Sent: 6 November 2019 16:23
> To: Slurm User Community List <slurm-users@lists.schedmd.com>
> Subject: Re: [slurm-users] Help with preemtion based on licenses
> 
>> This does not make sense to me. If gpu is my generic resource why would it 
>> not release the gpu resources if a job is suspended?
> 
> how would that be implemented?  how would the scheduler reach into the 
> application and cause the license to be released and reacquired?
> after all, the license server is otherwise oblivious to whether the job it 
> has granted a license to has been suspended or resumed.
> this applies to other gres as well - for instance GPUs, since there's no 
> mechanism to free up GPU resources allocated to a suspended process.
> 
> *that* is the problem - merely adding and subtracting is not.
> 
> regards, mark hahn.
> 

