[slurm-users] Nodes shown by sinfo in partitions

2019-05-17 Thread Verzelloni Fabio
Hello,
I have a question about the cloud feature, or any feature that could solve an 
issue I have with my cluster. To keep it simple, let's say I have a set of 
nodes (say 10), and when needed I move one or more nodes from cluster A to 
cluster B. In each slurm.conf I define all the possible available nodes:

Cluster A
NodeName=clusterA-[001-010]

Cluster B
NodeName=clusterB-[001-010]

In normal operation I have 5 nodes in cluster A and 5 in cluster B, but in 
case of need I reboot a node from cluster B into cluster A, so the result 
is 4 nodes in cluster B and 6 in cluster A.
The "issue" is that since I specified all possible nodes in slurm.conf, when I 
run sinfo what I see is:

Cluster A
Normal up 1-00:00:00 5 up clusterA-[001-005]
Normal up 1-00:00:00 5 down* clusterA-[006-010]
 
Cluster B
Normal up 1-00:00:00 5 up clusterB-[006-010]
Normal up 1-00:00:00 5 down* clusterB-[001-005]

And in both slurmctld.log files I have the message:

error: Unable to resolve "clusterA-006": Unknown host

or 

error: Unable to resolve "clusterB-001": Unknown host

Since I have a lot of partitions and a lot of nodes, the sinfo output is much 
harder to read because of the DOWN nodes that are not actually present in the 
system. Is there a way/feature/option so that sinfo won't display nodes that 
are NOT actually present and reachable by the slurmctld, i.e. the ones 
producing the error "Unable to resolve "clusterA-006": Unknown host"?
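
What I had in mind was something along these lines (untested sketch: the CPU 
and memory values are placeholders, and my understanding is that State=CLOUD 
keeps a node hidden from sinfo until it registers with the slurmctld):
-
NodeName=clusterA-[001-010] State=CLOUD CPUs=36 RealMemory=64000
PartitionName=normal Nodes=clusterA-[001-010] Default=YES MaxTime=1-00:00:00 State=UP
-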

Basically I'd like to keep all the possible nodes in both slurm.conf files, 
but sinfo should show:

Cluster A
Normal up 1-00:00:00 5 up clusterA-[001-005]

Cluster B
Normal up 1-00:00:00 5 up clusterB-[006-010]

And if I move a node, once the node is actually reachable:

Cluster A
Normal up 1-00:00:00 6 up clusterA-[001-006]

Cluster B
Normal up 1-00:00:00 4 up clusterB-[007-010]
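
(As a stopgap I can filter on the sinfo side, for example:
-
# show only nodes in a responsive state, hiding the down* ones
sinfo -t idle,allocated,mixed
-
but I would prefer a configuration-level solution.)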

Thanks
Fabio

--
- Fabio Verzelloni - CSCS - Swiss National Supercomputing Centre
via Trevano 131 - 6900 Lugano, Switzerland
Tel: +41 (0)91 610 82 04
 



[slurm-users] RPM build error - accounting_storage_mysql.so

2019-07-19 Thread Verzelloni Fabio
Hi Everyone,
I'm trying to rebuild the Slurm RPMs for 18.08.7 on RHEL7.6 with this command:

rpmbuild -tb --define "_prefix /opt/slurm/18.08.07"   --define "_sysconfdir 
/etc/slurm"   --define "_slurm_sysconfdir /etc/slurm" slurm-18.08.7.tar.bz2

But no matter what, I keep getting this error:

...
Provides: slurm-slurmd = 18.08.7-1.el7 slurm-slurmd(x86-64) = 18.08.7-1.el7
Requires(interp): /bin/sh /bin/sh /bin/sh
Requires(rpmlib): rpmlib(FileDigests) <= 4.6.0-1 rpmlib(PayloadFilesHavePrefix) 
<= 4.0-1 rpmlib(CompressedFileNames) <= 3.0.4-1
Requires(post): /bin/sh
Requires(preun): /bin/sh
Requires(postun): /bin/sh
Requires: libc.so.6()(64bit) libc.so.6(GLIBC_2.14)(64bit) 
libc.so.6(GLIBC_2.2.5)(64bit) libc.so.6(GLIBC_2.3.2)(64bit) 
libc.so.6(GLIBC_2.3.4)(64bit) libc.so.6(GLIBC_2.3)(64bit) 
libc.so.6(GLIBC_2.4)(64bit) libc.so.6(GLIBC_2.7)(64bit) libdl.so.2()(64bit) 
libpam_misc.so.0()(64bit) libpam_misc.so.0(LIBPAM_MISC_1.0)(64bit) 
libpam.so.0()(64bit) libpam.so.0(LIBPAM_1.0)(64bit) libpthread.so.0()(64bit) 
libpthread.so.0(GLIBC_2.2.5)(64bit) libpthread.so.0(GLIBC_2.3.2)(64bit) 
libslurmfull.so()(64bit) libutil.so.1()(64bit) libutil.so.1(GLIBC_2.2.5)(64bit) 
libz.so.1()(64bit) libz.so.1(ZLIB_1.2.0)(64bit)
Processing files: slurm-slurmdbd-18.08.7-1.el7.x86_64
error: File not found: 
/root/rpmbuild/BUILDROOT/slurm-18.08.7-1.el7.x86_64/opt/slurm/18.08.07/lib64/slurm/accounting_storage_mysql.so


RPM build errors:
File not found: 
/root/rpmbuild/BUILDROOT/slurm-18.08.7-1.el7.x86_64/opt/slurm/18.08.07/lib64/slurm/accounting_storage_mysql.so

It seems that I have all the dependencies in place, and on RHEL7.5 I did not 
get this issue; suggestions are appreciated. Also, is my rpmbuild command 
enough to also get sview, or do I need to pass some other option?
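
In case it matters, these are the devel packages I believe are relevant here 
(RHEL 7 package names; an assumption on my part, not an exhaustive list):
-
# As far as I know, accounting_storage_mysql.so is only built when the
# MariaDB/MySQL development files are found at configure time.
rpm -q mariadb-devel
# My understanding is that sview is only built when the GTK+ 2
# development files are present.
rpm -q gtk2-devel
rpm -q munge-devel pam-devel readline-devel
-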

Thanks
Fabio

--
- Fabio Verzelloni - CSCS - Swiss National Supercomputing Centre
via Trevano 131 - 6900 Lugano, Switzerland
Tel: +41 (0)91 610 82 04
 



[slurm-users] Job error when using --job-name=`basename $PWD`

2019-07-28 Thread Verzelloni Fabio
Hi Everyone, 
I'm experiencing a weird issue when submitting a job like this:
-
#!/bin/bash
#SBATCH --job-name=`basename $PWD`
#SBATCH --ntasks=2
srun -n 2 hostname
-
Output:
srun: error: Unable to create step for job 15387: More processors requested 
than permitted

If I submit a job like this:
-
#!/bin/bash
#SBATCH --job-name=myjob
#SBATCH --ntasks=2
srun -n 2 hostname
-
Output:
Mynode-001
Mynode-001

If I decrease the number of tasks, it works fine:
-
#!/bin/bash
#SBATCH --job-name=`basename $PWD`
#SBATCH --ntasks=1
srun -n 1 hostname
-
Output:
Mynode-001

The Slurm version is 18.08.8; is this a bug in Slurm?

Thanks
Fabio

--
- Fabio Verzelloni - CSCS - Swiss National Supercomputing Centre
via Trevano 131 - 6900 Lugano, Switzerland
Tel: +41 (0)91 610 82 04
 



Re: [slurm-users] Job error when using --job-name=`basename $PWD`

2019-07-29 Thread Verzelloni Fabio
Dear Sven, thanks for the clarification.

Fabio

On 29.07.19, 11:44, "slurm-users on behalf of Sven Hansen" wrote:

Hi Fabio,

Slurm does not perform the usual shell evaluation (command substitution, 
variable expansion, and so on) within #SBATCH directives. If the option 
parser comes across a directive it cannot parse, it tends to abort and 
fall back to defaults for all options that are still unset. This can be 
nasty for newcomers because it happens silently.

If you need to set job options dynamically, rewriting a basic job script 
template with a tool of your choice is the go-to option. As so often, 
sed does the trick very well; see the sketch below.
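
A minimal untested sketch (the template file name and the __JOBNAME__ 
placeholder are made up):
-
# job.sh.template contains the line: #SBATCH --job-name=__JOBNAME__
sed "s/__JOBNAME__/$(basename "$PWD")/" job.sh.template | sbatch

# Or skip the template entirely: command-line options are expanded by
# the shell before sbatch ever sees them, so this works where the
# in-script backticks did not.
sbatch --job-name="$(basename "$PWD")" job.sh
-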

Best,
Sven

On 29.07.2019 at 08:22, Marcus Boden wrote:
> Hi Fabio,
>
> are you sure that command substitution works in the #SBATCH part of the
> job script? I don't think that Slurm actually evaluates it, though I
> might be wrong.
>
> It seems like the #SBATCH lines after the --job-name line are not
> evaluated anymore, so you can't start srun with two tasks (since Slurm
> only allocates one).
> Best regards,
> Marcus