Hi Davide,

In your slurmctld log you see an entry "error: job_submit/lua: /opt/slurm/job_submit.lua".

What I think happens is that when slurmctld encounters an error in job_submit.lua, it will revert to the last known good script cached by slurmctld and ignore the file on disk from now on, even if it has been corrected. An "scontrol reconfig" may make slurmctld reread the job_submit.lua; please try it.
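
A minimal sketch of that sequence, assuming luac and scontrol are on your PATH. The function name and the LUAC and SCONTROL variables are only illustrative and not part of Slurm. Note that "luac -p" only parses the file, so it catches syntax errors but not runtime errors such as a bad argument to string.format:

```shell
# Sketch only: syntax-check job_submit.lua, then ask slurmctld to reread it.
# reload_job_submit, LUAC and SCONTROL are hypothetical names for illustration.
reload_job_submit() {
    script=${1:-/opt/slurm/job_submit.lua}   # path taken from the log entry above
    luac_cmd=${LUAC:-luac}
    scontrol_cmd=${SCONTROL:-scontrol}

    # luac -p parses the file without writing a compiled output file.
    if ! "$luac_cmd" -p "$script"; then
        echo "syntax check failed for $script, not reconfiguring" >&2
        return 1
    fi

    # Only reconfigure once the file on disk is known to parse.
    "$scontrol_cmd" reconfigure
}
```

Calling reload_job_submit with no argument uses the path from your log; whether the reconfigure really makes slurmctld drop its cached copy is exactly what remains to be tested.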

I believe that this slurmctld behavior is undocumented at present. Please see https://bugs.schedmd.com/show_bug.cgi?id=14472#c15 for a description:

And, if after the reconfigure, the job_submit.lua is wrong formatted (or 
missing), it will use the previous version of the script (which we have stored 
backup previously):

/Ole


On 9/7/22 14:21, Davide DelVento wrote:
Thanks Ole, your wiki page sheds some light on this mystery.
Very frustrating that even the simple example provided in the release
fails, and it fails at the most basic logging functionality.

Note that "my" job_submit.lua is now the unmodified, slurm-provided
one.... and that the luac command returns nothing in my case (this is
Lua 5.3.4) so syntax seems correct?

Yet the logs report the problem I mentioned rather than the actual
content that the plugin is attempting to log.

On Wed, Sep 7, 2022 at 2:13 AM Ole Holm Nielsen
<ole.h.niel...@fysik.dtu.dk> wrote:

Hi Davide,

I suggest that you check your job_submit.lua script with the Lua compiler:

luac -p /etc/slurm/job_submit.lua

I have written some more details in my Wiki page
https://wiki.fysik.dtu.dk/niflheim/Slurm_configuration#job-submit-plugins

Best regards,
Ole

On 9/7/22 01:51, Davide DelVento wrote:
Thanks again to both of you.

I actually did not build Slurm myself, otherwise I'd keep extensive
logs of what I did. Other people did, so I don't know. However, I get
the same grep'ing results as yours.

Looking at the logs reveals some info, but it's cryptic.

[2022-09-06T17:33:56.513] debug3: job_submit/lua:
slurm_lua_loadscript: skipping loading Lua script:
/opt/slurm/job_submit.lua
[2022-09-06T17:33:56.513] error: job_submit/lua:
/opt/slurm/job_submit.lua: [string "slurm.user_msg
(string.format(table.unpack({...})))"]:1: bad argument #2 to 'format'
(no value)

As you can see, there is no line number and there is nothing like
user_msg in this code. There is indeed an "unpack" which is used in
the SchedMD-defined logging helper function which has a comment
"Implicit definition of arg was removed in Lua 5.2" and that's where I
speculate the error occurs.

I should stress, this is with their own example, not my code. I guess
I could forgo the logging and move forward, but that probably won't
get me very far.

I am contemplating submitting a GitHub issue about it. I did check
that the version of the job_submit.lua I have is the same currently in
the repo at 
https://github.com/SchedMD/slurm/blob/master/etc/job_submit.lua.example

On Thu, Sep 1, 2022 at 11:55 PM Ole Holm Nielsen
<ole.h.niel...@fysik.dtu.dk> wrote:

Did you install all prerequisite packages (including lua) on the server
where you built the Slurm packages?

On my system I get:

$ strings `which slurmctld ` | grep HAVE_LUA
HAVE_LUA 1

/Ole

https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#install-prerequisites

On 9/2/22 05:15, Davide DelVento wrote:
Thanks.

I did try a lua script as soon as I got your first email, but that
never worked (yes, I enabled it in slurm.conf and ran "scontrol
reconfigure" after). Slurm simply acted as if there was no job_submit script.

After various tests, all unsuccessful, today I found that link which I
mentioned saying that lua might not be compiled in, hence all my most
recent messages of this thread.

That file is indeed there, so that's good news that I don't need to recompile.
However I'm puzzled on what might be missing...


On Thu, Sep 1, 2022 at 6:33 PM Brian Andrus <toomuc...@gmail.com> wrote:

lua is the language you can use with the job_submit plugin.

I was showing a quick way to see that job_submit capability is indeed in
there.

You can see if lua support is there by checking whether the
job_submit_lua.so file exists.
It would be part of the slurm rpm (not the slurm-slurmctld rpm)

Usually it would be found at /usr/lib64/slurm/job_submit_lua.so

If that is there, you should be good with trying out a job_submit lua
script.
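
As a rough sketch of that check (the helper name is made up for illustration, and the default directory is just the usual location mentioned above, which may differ on your install):

```shell
# Sketch only: report whether the Lua job_submit plugin exists in a plugin directory.
# has_job_submit_lua is a hypothetical helper, not a Slurm command.
has_job_submit_lua() {
    plugin_dir=${1:-/usr/lib64/slurm}   # assumed default; pass another directory if needed
    if [ -f "$plugin_dir/job_submit_lua.so" ]; then
        echo "found $plugin_dir/job_submit_lua.so"
    else
        echo "job_submit_lua.so not found in $plugin_dir" >&2
        return 1
    fi
}
```

If your plugins live elsewhere, pass that directory as the argument; I believe "scontrol show config | grep PluginDir" reports where slurmctld looks.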

Brian Andrus

On 9/1/2022 1:24 PM, Davide DelVento wrote:
Thanks again, Brian, indeed that grep returns many hits, but none of
them includes lua, i.e.

     strings `which slurmctld ` | grep -i job_submit | grep -i lua

returns nothing. So I should use the C interface rather than the more
convenient lua one, unless I recompile? Or am I missing something?

On Thu, Sep 1, 2022 at 12:30 PM Brian Andrus <toomuc...@gmail.com> wrote:
I would be surprised if it were compiled without the support. However,
you could check and run something like:

strings /sbin/slurmctld | grep job_submit

(or wherever your slurmctld binary is). There should be quite a few
lines with that in it.

Brian Andrus

On 9/1/2022 10:54 AM, Davide DelVento wrote:
Thanks Brian for the suggestion, which I am now exploring.

The documentation is a bit cryptic for me, but exploring a few things
and checking 
https://funinit.wordpress.com/2018/06/07/how-to-use-job_submit_lua-with-slurm/
I suspect my slurm install (provided by cluster vendor) was not
compiled with the lua plugin installed. Do you know how to verify if
that is the case or if it's something else? I don't see a way to show
if the plugin is actually being "seen" by slurm, and I suspect it's
not.

Does anyone else have other suggestions or comment on either the
plugin or the prolog workaround?

Thanks!


On Tue, Aug 30, 2022 at 3:01 PM Brian Andrus <toomuc...@gmail.com> wrote:
Not sure if you can do all the things you intend, but the job_submit
script is precisely where you want to check submission options.

https://slurm.schedmd.com/job_submit_plugins.html

Brian Andrus

On 8/30/2022 12:58 PM, Davide DelVento wrote:
Hi,

I would like to soft-enforce license utilization only when the -L is
set. My idea: check in the prolog if the license was requested and
only if it were, set the environmental variables needed for the
license.

I looked at all environmental variables set by slurm and did not find
any related to the license as I was hoping.

As a workaround, I could check

scontrol show job $SLURM_JOB_ID | grep License

and that would work, but (as discussed in other messages in this list)
the documentation at https://slurm.schedmd.com/prolog_epilog.html says:

Prolog and Epilog scripts should be designed to be as short as possible
and should not call Slurm commands (e.g. squeue, scontrol, sacctmgr,
etc). [...] Slurm commands in these scripts can potentially lead to performance
issues and should not be used.

This is a bit of a concern, since the prolog would be invoked for
every job on the cluster, and it's a prolog (rather than the epilog,
as discussed in earlier messages).

So two questions:

1) is there a better workaround to check in the prolog if the current
job requested a license, and/or
2) would this kind of use of scontrol be okay, or is it indeed a concern?
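
For concreteness, the scontrol workaround I have in mind would look roughly like the sketch below. The helper name is made up, and the "Licenses=<value>" layout with "(null)" meaning "none requested" is my assumption about the scontrol output, not something I have verified across versions:

```shell
# Sketch only: pull the value of the Licenses= field out of
# "scontrol show job" output supplied on stdin.
# job_licenses is a hypothetical helper; the Licenses=<value> layout and the
# "(null)" placeholder are assumptions about the output format.
job_licenses() {
    awk '{
        # Each whitespace-separated field is expected to be a KEY=VALUE pair.
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^Licenses=/) {
                sub(/^Licenses=/, "", $i)
                if ($i != "(null)") print $i
            }
        }
    }'
}

# In the prolog this would be fed from the command above, for example:
#   lic=$(scontrol show job "$SLURM_JOB_ID" | job_licenses)
#   if [ -n "$lic" ]; then
#       : # set whatever the license needs here
#   fi
```

This still runs scontrol once per job, so it does not remove the performance concern from the documentation; it only makes the check explicit.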
