From what I know of how this works, no, it's not getting it from a local file or from the master node. I don't believe slurmd -C even makes a network connection, nor does it require a slurm.conf in order to run. If you can run it fresh on a node with no config and that's what it comes up with, it's probably getting the number from the VM itself somehow.
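One quick way to sanity-check that (nothing Slurm-specific here beyond slurmd -C itself; the lscpu grep is just one way to trim the output): run these on the resized VM and see whether the counts agree with what the kernel reports.

$ slurmd -C     # prints the CPUs/Boards/Sockets/Cores/Threads line slurmd detects
$ lscpu | grep -E '^(CPU\(s\)|Socket\(s\)|Core\(s\) per socket|Thread\(s\) per core)'
$ nproc         # CPUs visible to the kernel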
--
 ____
|| \\UTGERS,     |---------------------------*O*---------------------------
||_// the State  |         Ryan Novosielski - novos...@rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
||  \\    of NJ  | Office of Advanced Research Computing - MSB C630, Newark
     `'

> On Mar 11, 2020, at 10:26 AM, mike tie <m...@carleton.edu> wrote:
>
> Yep, slurmd -C is obviously getting the data from somewhere, either a local
> file or from the master node. Hence my email to the group; I was hoping
> that someone would just say: "yeah, modify file xxxx". But oh well. I'll
> start playing with strace and gdb later this week; looking through the
> source might also be helpful.
>
> I'm not cloning existing virtual machines with Slurm. I have access to a
> VMware system that from time to time isn't running at full capacity; usage
> is stable for blocks of a month or two at a time, so my thought/plan was to
> spin up a Slurm compute node on it and resize it appropriately every few
> months (why not put it to work?). I started with 10 cores, and it looks like
> I can up it to 16 cores for a while, and that's when I ran into the problem.
>
> -mike
>
>
> Michael Tie
> Technical Director
> Mathematics, Statistics, and Computer Science
>
> One North College Street    phn: 507-222-4067
> Northfield, MN 55057        cel: 952-212-8933
> m...@carleton.edu            fax: 507-222-4312
>
>
> On Wed, Mar 11, 2020 at 1:15 AM Kirill 'kkm' Katsnelson <k...@pobox.com> wrote:
>
> On Tue, Mar 10, 2020 at 1:41 PM mike tie <m...@carleton.edu> wrote:
>
> Here is the output of lstopo:
>
> $ lstopo -p
> Machine (63GB)
>   Package P#0 + L3 (16MB)
>     L2 (4096KB) + L1d (32KB) + L1i (32KB) + Core P#0 + PU P#0
>     L2 (4096KB) + L1d (32KB) + L1i (32KB) + Core P#1 + PU P#1
>     L2 (4096KB) + L1d (32KB) + L1i (32KB) + Core P#2 + PU P#2
>     L2 (4096KB) + L1d (32KB) + L1i (32KB) + Core P#3 + PU P#3
>   Package P#1 + L3 (16MB)
>     L2 (4096KB) + L1d (32KB) + L1i (32KB) + Core P#0 + PU P#4
>     L2 (4096KB) + L1d (32KB) + L1i (32KB) + Core P#1 + PU P#5
>     L2 (4096KB) + L1d (32KB) + L1i (32KB) + Core P#2 + PU P#6
>     L2 (4096KB) + L1d (32KB) + L1i (32KB) + Core P#3 + PU P#7
>   Package P#2 + L3 (16MB)
>     L2 (4096KB) + L1d (32KB) + L1i (32KB) + Core P#0 + PU P#8
>     L2 (4096KB) + L1d (32KB) + L1i (32KB) + Core P#1 + PU P#9
>     L2 (4096KB) + L1d (32KB) + L1i (32KB) + Core P#2 + PU P#10
>     L2 (4096KB) + L1d (32KB) + L1i (32KB) + Core P#3 + PU P#11
>   Package P#3 + L3 (16MB)
>     L2 (4096KB) + L1d (32KB) + L1i (32KB) + Core P#0 + PU P#12
>     L2 (4096KB) + L1d (32KB) + L1i (32KB) + Core P#1 + PU P#13
>     L2 (4096KB) + L1d (32KB) + L1i (32KB) + Core P#2 + PU P#14
>     L2 (4096KB) + L1d (32KB) + L1i (32KB) + Core P#3 + PU P#15
>
> There is no sane way to derive the number 10 from this topology. Obviously:
> it has a prime factor of 5, but everything in the lstopo output is sized in
> powers of 2 (4 packages, a.k.a. sockets, with 4 single-threaded CPU cores
> each).
>
> I responded yesterday but somehow managed to plop my signature into the
> middle of it, so maybe you missed the inline replies?
>
> It's very, very likely that the number is stored *somewhere*. First to
> eliminate is the hypothesis that the number is acquired from the control
> daemon. That's the simplest step and the largest land grab in the
> divide-and-conquer analysis plan. Then just look where it comes from on the
> VM; strace(1) will reveal all the files slurmd reads.
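>
> Something along these lines should do it, run on the VM itself (the trace
> file name and the grep filters are only a convenience, pick your own):
>
> $ strace -f -e trace=open,openat -o /tmp/slurmd-C.trace slurmd -C
> $ grep -v ENOENT /tmp/slurmd-C.trace | grep -v '\.so'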
>
> You are not rolling out the VMs from an image, are you? I'm wondering why
> you need to tweak an existing VM that is already in a weird state. Is
> simply setting its snapshot aside and creating a new one from an image
> hard/impossible? I haven't touched VMware in more than 10 years, so I may
> be a bit naive; on the platform I'm working with now (GCE), the
> create-use-drop pattern of VM use is much more common and simpler than
> creating a VM and maintaining it *ad infinitum* or *ad nauseam*, whichever
> is reached first. But I don't know anything about VMware; maybe that's not
> possible or feasible with it.
>
> -kkm
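For what it's worth, the create-use-drop pattern kkm describes looks roughly like this on GCE (the instance name, machine type, and image below are only placeholders): create a node sized for the current capacity, use it for the month or two, then delete it instead of resizing it in place.

$ gcloud compute instances create slurm-node-01 \
    --machine-type=n1-standard-16 \
    --image-family=debian-11 --image-project=debian-cloud
# ... use it as a compute node while the capacity is available ...
$ gcloud compute instances delete slurm-node-01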