Hi,
what is the output of the command:
slurmd -C rocks7
Best regards
Werner
On 05/05/2018 06:56 PM, Mahmood Naderan wrote:
Quick follow up.
I see that Sockets for the head node is 1 while for the compute nodes
it is 32. And I think that is the reason why slurm only sees one CPU
(CPUTot=1).
M
On Sunday, 6 May 2018 2:56:51 AM AEST Mahmood Naderan wrote:
> May I ask what is the difference between CPUs and Sockets in slurm.conf?
CPUs are processor cores (on a package) and the socket is the package that
goes into the motherboard (which these days usually has its own memory
controller wh
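To illustrate the relationship (made-up values, not from the original message): in slurm.conf, CPUs should work out to Sockets * CoresPerSocket * ThreadsPerCore, e.g.

  NodeName=node01 Sockets=2 CoresPerSocket=16 ThreadsPerCore=1 CPUs=32 RealMemory=64000 State=UNKNOWN

for a dual-socket box with 16 cores per socket and hyper-threading disabled.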
On Sunday, 6 May 2018 2:00:44 AM AEST Eric F. Alemany wrote:
> Working on weekends, hey?
[...]
This isn't my work. ;-)
> It seems as if the commands give different results (?) - What do you think?
Very very interesting - both slurmd and lscpu report 32 cores, but with
differing interpretations
On Sunday, 6 May 2018 11:02:25 AM AEST Kenneth Russell wrote:
> If you find that you have the same problem as me, you can use the script
> file below to automate the reinstall process. As I said in my original
> note, this is a very inefficient way to run slurm
That script looks very very weird.
On 06/05/18 11:26, Will Dennis wrote:
1) I am not sure Slurm can run “all-in-one” with
controller/worker/acctg-db all on one host… If anyone else knows whether
this is doable, please chime in (I actually have a request to do this
for a single machine at work, where the researchers want to have many
fo
A few thoughts…
1) I am not sure Slurm can run “all-in-one” with controller/worker/acctg-db all
on one host… If anyone else knows whether this is doable, please chime in (I actually
have a request to do this for a single machine at work, where the researchers
want to have many folks share a single GP
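For what it's worth, a minimal single-host sketch of slurm.conf (hostname and hardware numbers are placeholders, not a tested recipe) would be roughly:

  ClusterName=test
  SlurmctldHost=node01
  AccountingStorageType=accounting_storage/slurmdbd
  AccountingStorageHost=node01
  NodeName=node01 CPUs=32 RealMemory=64000 State=UNKNOWN
  PartitionName=debug Nodes=node01 Default=YES MaxTime=INFINITE State=UP

with slurmctld, slurmd, slurmdbd and MySQL/MariaDB all running on node01.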
Eric, I had already installed the latest version of slurm (V17.11.5). I
followed your advice and upgraded Ubuntu Server to V 18.04. That didn't solve
the problem.
To install slurm I used the instructions on the web site:
https://github.com//mknoxnv/ubuntu-slurm/blob/master/REEADME.md
Hi Kenneth,
The pidfile is just a record of the PID of slurmctld or slurmdbd
or whatever daemon. It is used by systemd and gets created automatically. The
only thing you might need to worry about is the parent directory of the pidfile, but
not having a pidfile doesn't block the daemon fro
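If you do want to check, the paths are configurable; the values below are just the usual defaults and yours may differ:

  SlurmctldPidFile=/var/run/slurmctld.pid   (slurm.conf)
  SlurmdPidFile=/var/run/slurmd.pid         (slurm.conf)
  PidFile=/var/run/slurmdbd.pid             (slurmdbd.conf)

The parent directory only has to exist and be writable by the user the daemon runs as.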
Hi Ken
I am in the same boat as you are, meaning that I am also new to SLURM.
This is what I've done, based on a good recommendation.
Install Ubuntu 18.04, which was just released last week, on your servers.
Apparently the Ubuntu 16.04 package of SLURM is outdated.
Install slurm-llnl on the headnode/master
Ins
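(For reference, and from memory, so double-check with "apt search slurm": on 18.04 the packages are roughly

  sudo apt install slurm-wlm                 # head node: slurmctld + slurmd
  sudo apt install slurmd slurm-client       # compute nodes
  sudo apt install slurmdbd mariadb-server   # only if you want accounting

and the old slurm-llnl name may still exist as a transitional package.)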
I am a new slurm user and am trying to set up a single node test system. I have
spent endless hours trying to get slurm services to start. I am running Ubuntu
Server V16.04 and slurm 17.11.5. My motherboard has an AMD 8-core processor. When I try
to start slurmdbd or slurmctld services I get messages say
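(When a unit fails to start, the usual way to get at the real error is something like

  systemctl status slurmctld
  journalctl -u slurmctld -e
  sudo slurmctld -D -vvv    # run in the foreground with verbose logging

and the same for slurmdbd.)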
Quick follow up.
I see that Sockets for the head node is 1 while for the compute nodes
it is 32. And I think that is the reason why slurm only sees one CPU
(CPUTot=1).
May I ask what is the difference between CPUs and Sockets in slurm.conf?
Regards,
Mahmood
On Sat, May 5, 2018 at 9:24 PM, Mahmood
Hi,
I also have the same problem. I think by default, slurm won't add the
head node as a compute node. I manually set the state to resume.
However, the number of cores is still low (1) and not what I specified
in slurm.conf.
[root@rocks7 mahmood]# scontrol show node rocks7
NodeName=rocks7 Arch=x86
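(Presumably the resume step was something along the lines of

  scontrol update NodeName=rocks7 State=RESUME

but that only changes the state; the core count comes from the NodeName line in slurm.conf, and slurmctld generally needs a restart to pick up changes to a node definition.)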
Hi Chris,
Working on weekends, hey?
When I do "slurmd -C" on one of my execute nodes, I get:
eric@radonc01:~$ slurmd -C
slurmd: Considering each NUMA node as a socket
NodeName=radonc01 CPUs=32 Boards=1 SocketsPerBoard=4 CoresPerSocket=8 ThreadsPerCore=1
RealMemory=64402
UpTime=2-17:35:12
Al
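Since slurmd -C prints the node description in slurm.conf format, the matching line in slurm.conf would presumably be

  NodeName=radonc01 CPUs=32 Boards=1 SocketsPerBoard=4 CoresPerSocket=8 ThreadsPerCore=1 RealMemory=64402 State=UNKNOWN

(State added here only for illustration).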
On Thursday, 26 April 2018 3:28:19 AM AEST Cory Holcomb wrote:
> It appears that I have a configuration that only takes into account the
> allocated memory before dispatching.
With batch systems the idea is for the users to set constraints for their jobs
so the scheduler can backfill other jobs
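To illustrate (a generic example, not Cory's actual setup): each job asks for what it really needs, e.g.

  sbatch --time=2:00:00 --mem=8G --cpus-per-task=4 job.sh

(job.sh standing in for whatever the user runs), so the scheduler knows how much memory and wall time to budget and can slot smaller jobs in around bigger ones.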
On Wednesday, 2 May 2018 11:04:34 PM AEST R. Paul Wiegand wrote:
> When I set "--gres=gpu:1", the slurmd log does have encouraging lines such
> as:
>
> [2018-05-02T08:47:04.916] [203.0] debug: Allowing access to device
> /dev/nvidia0 for job
> [2018-05-02T08:47:04.916] [203.0] debug: Not allowi
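For anyone following along, the usual moving parts for this are (names and paths below are only an example, not Paul's actual config):

  GresTypes=gpu                       (slurm.conf)
  NodeName=gpunode01 Gres=gpu:1 ...   (slurm.conf)
  Name=gpu File=/dev/nvidia0          (gres.conf on the node)
  ConstrainDevices=yes                (cgroup.conf)

which is where those "Allowing/Not allowing access to device" debug lines come from.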
On Thursday, 3 May 2018 1:23:44 PM AEST Brendan Moloney wrote:
> I upgraded somewhat recently from 17.02 to 17.11, but I am not positive if
> this bug is new or just went unnoticed previously.
There is a known deadlock bug in 17.11.x which can happen for certain
workloads, hopefully fixed in 17.
On Thursday, 3 May 2018 10:59:50 PM AEST John DeSantis wrote:
> So, has anyone else run into a similar issue?
No, but...
> I'm using slurm 16.05.10-2 and slurmdbd 16.05.10-2.
...you're on a very old version of Slurm with a known security problem in its
slurmdbd, and you can't even download tha
On Saturday, 5 May 2018 2:45:19 AM AEST Eric F. Alemany wrote:
> With Ray's suggestion I got an error message for each node. Here I am giving
> you only one error message from a node.
> sacct: error: NodeNames=radonc01 CPUs=32 doesn't match
> Sockets*CoresPerSocket*ThreadsPerCore (16), resetting CP
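Given the slurmd -C output from radonc01 earlier in the thread (4 sockets x 8 cores x 1 thread), the node line presumably needs the full topology spelled out, e.g.

  NodeName=radonc01 Sockets=4 CoresPerSocket=8 ThreadsPerCore=1 RealMemory=64402

rather than whatever is configured now, which only multiplies out to 16.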
On Thursday, 3 May 2018 10:28:46 AM AEST Matt Hohmeister wrote:
> …and it looks good, except for the drain on my server/compute node:
I think if you've had the config wrong at some point in the past then slurmctld
will remember the error and you'll need to manually clear it with:
scontrol updat
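(presumably completing to something like

  scontrol update NodeName=<nodename> State=RESUME

with RESUME, or UNDRAIN, clearing the drain flag.)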
On Saturday, 5 May 2018 12:43:32 AM AEST Benjamin Rampe wrote:
> I haven't found anything in the documentation that talks about
> limitations regarding job accounting.
Yeah, the documentation is pretty poor on this. :-(
The best I can find is this email to the old slurm-dev list from 6 years ago