Sorry. I really don't want this to be a flame war, but...
...I would say having SLURM rpms in EPEL could be very helpful for a lot
of people.
I get that this took you by surprise, but that's not a reason to not
have them in the repository. I, for one, will happily test if they work
for me, and if they do, that means that I can stop having to build them.
I agree it's not hard to do, but if I don't have to do it I'll be very
happy about that.
There are surely other packages in public repositories where 'this would
break things if upgraded' applies. Take, for example, the nvidia driver
RPMs: I'm immensely grateful to the people maintaining those packages. It
makes my life setting up GPU nodes much easier. However, it would break
my production clusters if I pulled upgrades to those without proper
planning. That does not make me say that nvidia drivers ought not to be
in a public repository; it makes me configure version locking for them,
so I control what version is installed & when they are upgraded.
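As a rough illustration of the version locking mentioned above, here is a minimal sketch that assumes a CentOS/RHEL 7 host where the yum versionlock plugin (package yum-plugin-versionlock) is available. The helper names versionlock_commands and apply_versionlock and the pattern "slurm*" are made up for this example; the same idea would apply to whatever driver package names a site actually uses. By default it only prints the commands, so nothing changes until dry_run is switched off.

```python
import subprocess


def versionlock_commands(pattern="slurm*"):
    """Return the yum commands that pin installed packages matching `pattern`.

    Assumption: an EL7-style system using yum and yum-plugin-versionlock.
    """
    return [
        ["yum", "-y", "install", "yum-plugin-versionlock"],
        ["yum", "versionlock", "add", pattern],
        ["yum", "versionlock", "list"],
    ]


def apply_versionlock(pattern="slurm*", dry_run=True):
    """Print the commands (dry run) or execute them, stopping on the first failure."""
    for cmd in versionlock_commands(pattern):
        if dry_run:
            print(" ".join(cmd))
        else:
            subprocess.run(cmd, check=True)
```

Calling apply_versionlock() prints the three commands; apply_versionlock(dry_run=False) would need root and actually locks the currently installed versions until the lock is removed again with "yum versionlock delete" or "yum versionlock clear".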
I mean, surely you do test your upgrade sets before you apply them to your
production machines?
Along the same lines, if I (and a lot of other sites) were to upgrade
to the latest kernel version in the Red Hat repos, that'd break my file
system. Doesn't mean I think Red Hat should stop packaging kernels.
I would very much say the same thing applies to SLURM packages.
Tina
On 25/01/2021 14:36, Ole Holm Nielsen wrote:
On 1/25/21 2:59 PM, Andy Riebs wrote:
Several things to keep in mind...
1. Slurm, as a product undergoing frequent, incompatible revisions, is
not well-suited for provisioning from a stable public repository! On
the other hand, it's just not that hard to build a stable version
where you can directly control the upgrade timing.
I agree that Slurm is probably not well suited for a public repository
because of the special care that *must* be taken when upgrading between
major versions.
When I use EPEL for a lot of nice software (Munge, Lmod, ...) AND
build my own Slurm RPMs, slurm RPMs suddenly appearing in EPEL upset
this stable scenario.
2. If you really want a closely managed source for your Slurm RPMs, get
them from the SchedMD website.
All of us get the Slurm source from the SchedMD website. And all of us
have to build our own RPMs from that source (a simple one-liner).
SchedMD doesn't provide any RPMs.
3. "You could have solicited advice..." -- while this is certainly true,
for many of us in the open source world, the standard is "release
something quickly, and then improve it, based in part on feedback,
over time."
I don't think this trial-and-error-like approach is suitable for Slurm.
We're running production HPC clusters that need to stay very stable.
4. Slurm packages (and other contributions, including suggestions on
this mailing list) that haven't been provided by SchedMD have probably
been provisioned and tested by a volunteer -- be sure to keep the
conversation civil!
We all have to build our own Slurm RPMs, and we should not get them from
a volunteer. IMHO, building Slurm RPMs is very simple. It's the
deployment and upgrading which is the hard part of the equation.
I think my points quoted below deserve careful consideration by the EPEL
volunteer, because the results could be potentially harmful.
Thanks,
Ole
Andy Riebs
On 1/25/2021 2:47 AM, Ole Holm Nielsen wrote:
On 1/23/21 9:43 PM, Philip Kovacs wrote:
I can assure you it was easier for you to filter slurm from your
repos than it was for me to make them available to both epel7 and
epel8.
No good deed goes unpunished I guess.
I do sympathize with your desire to make the Slurm installation a bit
easier by providing RPMs via the EPEL repo. I do not underestimate
the amount of work it takes to add software to EPEL.
However, I have several issues with your approach:
1. Breaking existing Slurm installations could cause big time
problems at a lot of sites! The combined work to repair broken
installations at many sites might be substantial. Sites who are more
than two releases behind 20.11 could end up with dysfunctional
clusters. You are undoubtedly aware that 20.11.3 fixes a major
problem in 20.11.2 wrt. OpenMPI, so the upgrade from 20.02 to 20.11.2
may cause problems.
2. Your EPEL RPMs *must not* upgrade between major Slurm releases,
like the 20.02 to 20.11 upgrade that almost happened at our site! I
refer again to the delicate upgrade procedure described in
https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#upgrading-slurm
3. You could have solicited advice from the slurm-users list before
planning your EPEL Slurm packages.
4. How do you plan to keep updating future Slurm minor versions on
EPEL in a timely fashion?
5. How did you build your RPM packages? The built-in options may be
important, for example, this might be recommended:
$ rpmbuild -ta slurm-xxx.tar.bz2 --with mysql --with slurmrestd
6. Building Slurm RPM packages is actually a tiny part of what it
takes to install Slurm from scratch. There are quite a number of
prerequisites and other things to set up besides the RPMs, see
https://wiki.fysik.dtu.dk/niflheim/Slurm_installation
plus configuration of Slurm itself and its database.
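To make the concern in point 1 concrete, here is a small sketch of a check a site could run before letting any Slurm update through. It assumes the rule of thumb stated above (sites more than two major releases behind the target are at risk), and it hard-codes a list of major releases up to 20.11; the names SLURM_RELEASES, major, releases_behind and direct_upgrade_ok are hypothetical and not part of Slurm or any packaging tool.

```python
# Major Slurm releases in chronological order, up to the one discussed here.
# Assumption: this list is maintained by hand and extended as releases appear.
SLURM_RELEASES = ["17.11", "18.08", "19.05", "20.02", "20.11"]


def major(version):
    """Return the major release of a version string, e.g. '20.02' for '20.02.6-1.el7'."""
    return ".".join(version.split(".")[:2])


def releases_behind(installed, target):
    """Number of major releases between the installed and the target version."""
    return SLURM_RELEASES.index(major(target)) - SLURM_RELEASES.index(major(installed))


def direct_upgrade_ok(installed, target, max_gap=2):
    """True if the target is at most `max_gap` major releases ahead (and not older)."""
    gap = releases_behind(installed, target)
    return 0 <= gap <= max_gap
```

With these assumptions, direct_upgrade_ok("20.02.6", "20.11.2") is true, while a site still on 18.08 would be flagged. Passing the check says nothing about whether the upgrade procedure itself (database first, then controller, then nodes) has been followed.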
In conclusion, I would urge you to ensure that your EPEL packages
won't mess up existing Slurm installations! I agree with Ryan
Novosielski that you should rename your RPMs so that they don't
overwrite packages built by SchedMD's rpmbuild system.
I propose that you add the major version 20.11 right after the
"slurm" name so that your EPEL RPMs would be named "slurm-20.11-*"
like in:
slurm-20.11-20.11.2-2.el7.x86_64
People with more knowledge of RPM than I have could help you ensure
that no unwarranted upgrades or double Slurm installations can take
place.
Thanks,
Ole
On Saturday, January 23, 2021, 07:03:08 AM EST, Ole Holm Nielsen
<ole.h.niel...@fysik.dtu.dk> wrote:
We use the EPEL yum repository on our CentOS 7 nodes. Today EPEL
surprisingly delivers Slurm 20.11.2 RPMs, and the daily yum updates
(luckily) fail with some errors:
--> Running transaction check
---> Package slurm.x86_64 0:20.02.6-1.el7 will be updated
--> Processing Dependency: slurm(x86-64) = 20.02.6-1.el7 for package:
slurm-libpmi-20.02.6-1.el7.x86_64
--> Processing Dependency: libslurmfull.so()(64bit) for package:
slurm-libpmi-20.02.6-1.el7.x86_64
---> Package slurm.x86_64 0:20.11.2-2.el7 will be an update
--> Processing Dependency: pmix for package: slurm-20.11.2-2.el7.x86_64
--> Processing Dependency: libfreeipmi.so.17()(64bit) for package:
slurm-20.11.2-2.el7.x86_64
--> Processing Dependency: libipmimonitoring.so.6()(64bit) for package:
slurm-20.11.2-2.el7.x86_64
--> Processing Dependency: libslurmfull-20.11.2.so()(64bit) for package:
slurm-20.11.2-2.el7.x86_64
---> Package slurm-contribs.x86_64 0:20.02.6-1.el7 will be updated
---> Package slurm-contribs.x86_64 0:20.11.2-2.el7 will be an update
---> Package slurm-devel.x86_64 0:20.02.6-1.el7 will be updated
---> Package slurm-devel.x86_64 0:20.11.2-2.el7 will be an update
---> Package slurm-perlapi.x86_64 0:20.02.6-1.el7 will be updated
---> Package slurm-perlapi.x86_64 0:20.11.2-2.el7 will be an update
---> Package slurm-slurmdbd.x86_64 0:20.02.6-1.el7 will be updated
---> Package slurm-slurmdbd.x86_64 0:20.11.2-2.el7 will be an update
--> Running transaction check
---> Package freeipmi.x86_64 0:1.5.7-3.el7 will be installed
---> Package pmix.x86_64 0:1.1.3-1.el7 will be installed
---> Package slurm.x86_64 0:20.02.6-1.el7 will be updated
--> Processing Dependency: slurm(x86-64) = 20.02.6-1.el7 for package:
slurm-libpmi-20.02.6-1.el7.x86_64
--> Processing Dependency: libslurmfull.so()(64bit) for package:
slurm-libpmi-20.02.6-1.el7.x86_64
---> Package slurm-libs.x86_64 0:20.11.2-2.el7 will be installed
--> Finished Dependency Resolution
Error: Package: slurm-libpmi-20.02.6-1.el7.x86_64
(@/slurm-libpmi-20.02.6-1.el7.x86_64)
Requires: libslurmfull.so()(64bit)
Removing: slurm-20.02.6-1.el7.x86_64
(@/slurm-20.02.6-1.el7.x86_64)
libslurmfull.so()(64bit)
Updated By: slurm-20.11.2-2.el7.x86_64 (epel)
Not found
Error: Package: slurm-libpmi-20.02.6-1.el7.x86_64
(@/slurm-libpmi-20.02.6-1.el7.x86_64)
Requires: slurm(x86-64) = 20.02.6-1.el7
Removing: slurm-20.02.6-1.el7.x86_64
(@/slurm-20.02.6-1.el7.x86_64)
slurm(x86-64) = 20.02.6-1.el7
Updated By: slurm-20.11.2-2.el7.x86_64 (epel)
slurm(x86-64) = 20.11.2-2.el7
You could try using --skip-broken to work around the problem
You could try running: rpm -Va --nofiles --nodigest
We still run Slurm 20.02 and don't want EPEL to introduce any Slurm
updates!! Slurm must be upgraded with some care, see for example
https://wiki.fysik.dtu.dk/niflheim/Slurm_installation#upgrading-slurm
Therefore we must disable EPEL's slurm RPMs permanently. The fix is to
add to the file /etc/yum.repos.d/epel.repo an "exclude=slurm*" line like
the last line in:
[epel]
name=Extra Packages for Enterprise Linux 7 - $basearch
#baseurl=http://download.fedoraproject.org/pub/epel/7/$basearch
metalink=https://mirrors.fedoraproject.org/metalink?repo=epel-7&arch=$basearch&infra=$infra&content=$contentdir
failovermethod=priority
enabled=1
gpgcheck=1
gpgkey=file:///etc/pki/rpm-gpg/RPM-GPG-KEY-EPEL-7
exclude=slurm*
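For sites that manage many nodes, the edit above could be scripted. The following is only a sketch: it works on the text of a repo file, adds the pattern to the [epel] section if it is not already there, and leaves comments and the other sections alone. The function name add_exclude is invented for this example, and reading and writing /etc/yum.repos.d/epel.repo is left to the caller.

```python
import re


def add_exclude(repo_text, section="epel", pattern="slurm*"):
    """Return repo_text with `pattern` listed in the exclude= line of [section].

    Calling it again on its own output changes nothing.
    """
    lines = repo_text.splitlines()
    header = "[" + section + "]"
    starts = [i for i, line in enumerate(lines) if line.strip() == header]
    if not starts:
        raise ValueError("section %s not found" % header)
    start = starts[0]
    # The section ends at the next "[...]" header or at the end of the file.
    end = len(lines)
    for i in range(start + 1, len(lines)):
        if lines[i].lstrip().startswith("["):
            end = i
            break
    # If an exclude= line already exists in the section, extend it when needed.
    for i in range(start + 1, end):
        key, sep, value = lines[i].partition("=")
        if sep and key.strip() == "exclude":
            if pattern not in re.split(r"[,\s]+", value.strip()):
                lines[i] = lines[i].rstrip() + " " + pattern
            return "\n".join(lines) + "\n"
    # Otherwise insert a new line after the last non-blank line of the section.
    insert_at = end
    while insert_at > start + 1 and not lines[insert_at - 1].strip():
        insert_at -= 1
    lines.insert(insert_at, "exclude=" + pattern)
    return "\n".join(lines) + "\n"
```

A caller would read the repo file, pass its contents through add_exclude, and write the result back (after keeping a backup). Working on plain lines rather than a config parser is a deliberate choice here, so that commented-out entries such as #baseurl and the $basearch variables survive untouched.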
/Ole
--
Tina Friedrich, Advanced Research Computing Snr HPC Systems Administrator
Research Computing and Support Services
IT Services, University of Oxford
http://www.arc.ox.ac.uk http://www.it.ox.ac.uk