OK, so OpenMPI works fine. That means SLURM, OFED and the hardware are fine.
Which mvapich2 package are you using, a home-built one or one provided by
Bright?
Regards,
--
Jan-Albert
Jan-Albert van Ree | Linux System Administrator | Digital Services
MARIN | T +31 317 49 35 48 | j.a.v@marin.n
The latest MariaDB packaging is different: a third RPM is needed in addition
to the client and developer packages. I'm away from my desk, but the info is
on the MariaDB site.
William
On Wed, 11 Dec 2019, 05:23 Chris Samuel wrote:
> On Tuesday, 10 December 2019 1:57:59 PM PST Dean Schulze wrote:
>
> > This
On Tuesday, 10 December 2019 1:57:59 PM PST Dean Schulze wrote:
> This bug report from a couple of years ago indicates a source code issue:
>
> https://bugs.schedmd.com/show_bug.cgi?id=3278
>
> This must have been fixed by now, though.
>
> I built using slurm-19.05.2. Does anyone know if this
Hi Chris,
On Tuesday, 10 December 2019 11:49:44 AM PST Chris Woelkers - NOAA Federal
wrote:
> Test jobs, submitted via sbatch, are able to run on one node with no problem
> but will not run on multiple nodes. The jobs are using mpirun and mvapich2
> is installed.
Is there a reason why you aren'
Hi Chris,
Your issue sounds similar to a case I ran into once, where I could run jobs
on a few nodes, but once a job spanned more than a handful of nodes it would
fail. In that particular case, we figured out that it was due to broadcast
storm protection being enabled on the cluster switch. When the first n
Thanks for the reply and the things to try. Here are the answers to your
questions/tests in order:
- I tried mpiexec and the same issue occurred.
- While the job is listed as running, I checked all the nodes; none of them
have processes spawned. I have no idea about the hydra process.
- I have version
There's a problem with the accounting_storage/mysql plugin:
$ sudo slurmdbd -D -
slurmdbd: debug: Log file re-opened
slurmdbd: pidfile not locked, assuming no running daemon
slurmdbd: debug3: Trying to load plugin /usr/lib/slurm/auth_munge.so
slurmdbd: debug: Munge authentication plugin loaded
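For anyone comparing setups: a minimal slurmdbd.conf for the
accounting_storage/mysql plugin usually looks something like the sketch
below. Host, user, password and database name are placeholders, not values
taken from this thread.

# slurmdbd.conf sketch (placeholder values)
AuthType=auth/munge
DbdHost=localhost
SlurmUser=slurm
LogFile=/var/log/slurm/slurmdbd.log
PidFile=/var/run/slurmdbd.pid
# accounting database connection
StorageType=accounting_storage/mysql
StorageHost=localhost
StorageUser=slurm
StoragePass=changeme
StorageLoc=slurm_acct_db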
We're running multiple clusters using Bright 8.x with Scientific Linux 7 (and
have run Scientific Linux releases 5 and 6 with Bright 5.0 and higher in the
past without issues on many different pieces of hardware) and never experienced
this. But some things to test:
- some implementations pref
$ systemctl status slurmdbd
● slurmdbd.service - Slurm DBD accounting daemon
   Loaded: loaded (/etc/systemd/system/slurmdbd.service; enabled; vendor preset: enabled)
   Active: failed (Result: exit-code) since Tue 2019-12-10 13:33:28 MST; 40min ago
  Process: 787 ExecStart=/usr/sbin/slurmdbd $SLUR
What do you get from
systemctl status slurmdbd
systemctl status slurmctld
I’m assuming at least slurmdbd isn’t running.
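If either unit shows as failed, the systemd journal usually tells you why.
A quick check, assuming the services run under those unit names:

systemctl status slurmdbd
systemctl status slurmctld
# recent log lines for the unit that failed
journalctl -u slurmdbd -n 50 --no-pager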
> On Dec 10, 2019, at 3:05 PM, Dean Schulze wrote:
>
I'm trying to set up my first slurm installation following these
instructions:
https://github.com/nateGeorge/slurm_gpu_ubuntu
I've had to deviate a little bit because I'm using virtual machines that
don't have GPUs, so I don't have a gres.conf file and in
/etc/slurm/slurm.conf I don't have an ent
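For reference, a node definition in slurm.conf for machines without GPUs
simply leaves out any Gres settings. A rough sketch, with made-up hostnames,
CPU counts and memory sizes:

# no gres.conf and no Gres= needed when the nodes have no GPUs
NodeName=vm[01-02] CPUs=2 RealMemory=2000 State=UNKNOWN
PartitionName=debug Nodes=vm[01-02] Default=YES MaxTime=INFINITE State=UP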
I have a 16-node HPC that is in the process of being upgraded from CentOS 6
to 7. All nodes are diskless and connected via 1Gbps Ethernet and FDR
InfiniBand. I am using Bright Cluster Manager to manage it, and their
support has not found a solution to this problem.
For the most part the cluster i
Hi Angelines,
we create a job-specific scratch directory in the prolog script but
use the task_prolog script to set the environment variable.
In prolog:
scratch_dir=/your/path
/bin/mkdir -p ${scratch_dir}
/bin/chmod 700 ${scratch_dir}
/bin/chown ${SLURM_JOB_USER} ${scratch_dir}
In task_prolog:
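(The task_prolog lines were cut off in the archive; the sketch below is an
assumption based on the description above, with SCRATCH_DIR as a placeholder
variable name. Slurm applies lines that a task_prolog prints in the form
"export NAME=value" to the task's environment.)

#!/bin/bash
# printed "export ..." lines are picked up by slurmstepd and added
# to the task environment; the path matches the prolog placeholder
echo "export SCRATCH_DIR=/your/path"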