Re: [slurm-users] GPUs not available after making use of all threads?

Diego Zuccato Mon, 13 Feb 2023 21:59:14 -0800

I think that's incorrect:
> The concept of hyper-threading is not doubling cores. It is a single
> core that can 'instantly' switch work from one process to another.
> Only one is being worked on at any given time.

A core can have multiple (usually 2) independent execution pipelines, so that multiple instructions from different threads run concurrently. It does not switch from one to the other. But it does have some shared resources, like the MMU and sometimes the FPU (maybe only on older AMD processors). Having a single MMU means that all the instructions running on a core must have the same "view" of the memory space, and that means that they must come from a single process. IOW, they're multiple threads of a single process.

If the sw you're going to run makes good use of multithreading, having hyperthreading can be a great boost. If the sw only uses multitasking, then hyperthreading is a net loss (not only can you not use half the available threads, you also usually get slower clock speeds).


Diego

On 13/02/2023 15:29, Brian Andrus wrote:

Hermann makes a good point.
The concept of hyper-threading is not doubling cores. It is a single core that can 'instantly' switch work from one process to another. Only one is being worked on at any given time.
So if I request a single core on a hyper-threaded system, I would not be pleased to find you are giving it to someone else 1/2 the time. I would need to have the actual core assigned. If I request multiple cores and my app is only going to affect itself, then I _may_ benefit from hyper-threading.
In general, enabling hyper-threading is not the best practice for efficient HPC jobs. The goal is that every process is utilizing the CPU as close to 100% as possible, which would render hyper-threading moot.
Brian Andrus

On 2/13/2023 12:15 AM, Hermann Schwärzler wrote:
Hi Sebastian,

I am glad I could help (although not exactly as expected :-).
With your node-configuration you are "circumventing" how Slurm behaves when using "CR_Core": if you read the respective part in
https://slurm.schedmd.com/slurm.conf.html

it says:

"CR_Core
[...] On nodes with hyper-threads, each thread is counted as a CPU to satisfy a job's resource requirement, but multiple jobs are not allocated threads on the same core."
That's why you got a full core (both threads) when allocating a single CPU, or e.g. four threads when allocating three CPUs, and so forth.
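To make that rounding concrete, here is a minimal Python sketch of the arithmetic. It is only an illustration, not Slurm's actual allocation code: the function name threads_allocated is made up for this example, and it assumes requests are simply rounded up to whole cores, as the quoted CR_Core description implies.

```python
import math


def threads_allocated(requested_cpus: int, threads_per_core: int = 2) -> int:
    """Hypothetical helper: hardware threads a job ends up holding when
    its request is rounded up to whole cores (assumed CR_Core behaviour)."""
    if requested_cpus < 1 or threads_per_core < 1:
        raise ValueError("both arguments must be positive")
    # Round the request up to a whole number of cores ...
    cores = math.ceil(requested_cpus / threads_per_core)
    # ... and the job then holds every thread of those cores.
    return cores * threads_per_core


for n in (1, 2, 3, 4):
    print(n, "->", threads_allocated(n))
```

Under the same assumption, declaring ThreadsPerCore=1 makes the result equal to the request (threads_allocated(1, 1) gives 1), which is consistent with single-thread requests working after the node definition was changed.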
"Lying" to Slurm about the actual hardware setup helps to avoid this behaviour, but are you really comfortable with potentially running two different jobs on the hyper-threads of the same core?
Regards,
Hermann

On 2/12/23 22:04, Sebastian Schmutzhard-Höfler wrote:
Hi Hermann,

Using your suggested settings did not work for us.
When trying to allocate a single thread with --cpus-per-task=1, it still reserved a whole core (two threads). On the other hand, when requesting an even number of threads, it does what it should.
However, I could make it work by using

SelectTypeParameters=CR_Core
NodeName=nodename Sockets=2 CoresPerSocket=128 ThreadsPerCore=1

instead of

SelectTypeParameters=CR_Core
NodeName=nodename Sockets=2 CoresPerSocket=64 ThreadsPerCore=2

So your suggestion pointed me in the right direction. Thanks!

If anyone thinks this is complete nonsense, please let me know!

Best wishes,

Sebastian

On 11.02.23 11:13, Hermann Schwärzler wrote:
Hi Sebastian,

we did a similar thing just recently.

We changed our node settings from
NodeName=DEFAULT CPUs=64 Boards=1 SocketsPerBoard=2 CoresPerSocket=32 ThreadsPerCore=2
to
NodeName=DEFAULT Boards=1 SocketsPerBoard=2 CoresPerSocket=32 ThreadsPerCore=2
in order to make the use of individual hyper-threads possible (we use this in combination with
SelectTypeParameters=CR_Core_Memory).
This works as expected: after this, when e.g. asking for --cpus-per-task=4 you will get 4 hyper-threads (2 cores) per task (unless you also specify e.g. "--hint=nomultithread").
So you might try to remove the "CPUs=256" part of your node-specification to let Slurm do that calculation of the number of CPUs itself.
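As a quick sanity check of that calculation, the product of the topology values can be worked out by hand. The sketch below is illustrative Python, not part of Slurm: derived_cpus is a hypothetical name, and it assumes that an omitted CPUs value defaults to Boards x SocketsPerBoard x CoresPerSocket x ThreadsPerCore.

```python
def derived_cpus(sockets_per_board: int, cores_per_socket: int,
                 threads_per_core: int, boards: int = 1) -> int:
    """Hypothetical helper: logical CPU count implied by a node's topology."""
    return boards * sockets_per_board * cores_per_socket * threads_per_core


# Node definition from this message: 2 sockets x 32 cores x 2 threads
print(derived_cpus(2, 32, 2))   # 128, versus the CPUs=64 configured before

# Sebastian's node: 2 sockets x 64 cores x 2 threads
print(derived_cpus(2, 64, 2))   # 256
```

In the first case the explicit CPUs=64 matched the number of cores rather than the number of threads, so dropping it changes what Slurm counts as a CPU on those nodes.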
BTW, on a side note: as most of our users do not bother to use hyper-threads, or do not even want to because their programs might suffer from doing so, we made "--hint=nomultithread" the default in our installation by adding
CliFilterPlugins=cli_filter/lua
to our slurm.conf and creating a cli_filter.lua file in the same directory as slurm.conf that contains this:
function slurm_cli_setup_defaults(options, early_pass)
        options['hint'] = 'nomultithread'

        return slurm.SUCCESS
end
(see also https://github.com/SchedMD/slurm/blob/master/etc/cli_filter.lua.example).
So if users really want to use hyper-threads, they have to add "--hint=multithread" to their job/allocation options.
Regards,
Hermann

On 2/10/23 00:31, Sebastian Schmutzhard-Höfler wrote:
Dear all,
we have a node with 2 x 64 CPUs (with two threads each) and 8 GPUs, running Slurm 22.05.5.
In order to make use of individual threads, we changed

SelectTypeParameters=CR_Core
NodeName=nodename CPUs=256 Sockets=2 CoresPerSocket=64 ThreadsPerCore=2

to

SelectTypeParameters=CR_CPU
NodeName=nodename CPUs=256

We are now able to allocate individual threads to jobs, despite the following error in slurmd.log:
error: Node configuration differs from hardware: CPUs=256:256(hw) Boards=1:1(hw) SocketsPerBoard=256:2(hw) CoresPerSocket=1:64(hw) ThreadsPerCore=1:2(hw)
However, it appears that since this change, we can only make use of 4 out of the 8 GPUs.
The output of "sinfo -o %G" might be relevant.

In the first situation it was

$ sinfo -o %G
GRES
gpu:A100:8(S:0,1)

Now it is:

$ sinfo -o %G
GRES
gpu:A100:8(S:0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70,72,74,76,78,80,82,84,86,88,90,92,94,96,98,100,102,104,106,108,110,112,114,116,118,120,122,124,126)
Has anyone faced this or a similar issue and can give me some directions?
Best wishes

Sebastian



--
Diego Zuccato
DIFA - Dip. di Fisica e Astronomia
Servizi Informatici
Alma Mater Studiorum - Università di Bologna
V.le Berti-Pichat 6/2 - 40127 Bologna - Italy
tel.: +39 051 20 95786
