[Bug libgomp/115367] New: The implementation of OMP_DYNAMIC is not dynamic

mail+gcc at nh2 dot me via Gcc-bugs Wed, 05 Jun 2024 19:00:02 -0700

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115367

Bug ID: 115367
Summary: The implementation of OMP_DYNAMIC is not dynamic
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libgomp
Assignee: unassigned at gcc dot gnu.org
Reporter: mail+gcc at nh2 dot me
CC: jakub at gcc dot gnu.org
Target Milestone: ---

Please see:

"Why does my OpenMP app sometimes use only 1 thread, sometimes 3, sometimes all
cores?"

https://stackoverflow.com/questions/78584145/why-does-my-openmp-app-sometimes-use-only-1-thread-sometimes-3-sometimes-all-c/78584146

OMP_DYNAMIC is implemented like this (on Linux, likely other platforms):

https://github.com/gcc-mirror/gcc/blob/10cb3336ba1ac89b258f627222e668b023a6d3d4/libgomp/config/linux/proc.c#L180-L188

/* When OMP_DYNAMIC is set, at thread launch determine the number of
threads we should spawn for this team. */
/* ??? I have no idea what best practice for this is. Surely some
function of the number of processors that are *still* online and
the load average. Here I use the number of processors online
minus the 15 minute load average. */

unsigned
gomp_dynamic_max_threads (void)
{
  // ...
  return n_onln - loadavg;
}

### `OMP_DYNAMIC` (of `libgomp`) is really bad

* Because of this logic, your app will use only 1 thread, even though the
  system is completely idle _now_, just because it was busy 10 minutes ago.
* The dynamic limit is determined _at process start_ (loading time), and fixed
  forever.
    * So started programs stay slow _forever_ if they were started 5 minutes
      after the system was busy.
* It means a server can never achieve full utilisation when working down a
  queue of jobs.
    * Say you have 8 cores, and a queue of N jobs to process, each of which
      takes 15 minutes at full CPU.
        * The first job starts at a 15-minute load average of 0, thus using
          all cores.
        * The next job starts while the 15-minute load average is still about
          8, using only 1 core.
        * The next job starts at a load average of about 1, using only 7
          cores.
        * The next job starts, using only 1 core.
        * The next job starts, using only 7 cores.
        * ...
    * In the long run, the **server uses only half of its cores on average**.
* It makes performance behaviour completely irreproducible.
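
The job-queue arithmetic above can be replayed with a small model. This is only a sketch under a strong simplification: it assumes the 15-minute load average seen when a job starts is exactly the thread count of the previous job, and it counts cores per job rather than weighting by job duration. The names `model_dynamic_threads` and `simulate_queue` are hypothetical and are not part of libgomp.

```c
#include <assert.h>
#include <stddef.h>

/* Same shape as the quoted heuristic: online processors minus load average,
   clamped so that at least one thread is always used. */
unsigned model_dynamic_threads(unsigned n_onln, unsigned loadavg)
{
    assert(n_onln > 0);
    return loadavg >= n_onln ? 1u : n_onln - loadavg;
}

/* Run `jobs` jobs back to back. ASSUMPTION: the load average a job sees at
   startup equals the number of threads the previous job used.
   Stores each job's thread count in threads_out and returns the sum. */
unsigned simulate_queue(unsigned n_onln, size_t jobs, unsigned *threads_out)
{
    unsigned loadavg = 0; /* the system starts idle */
    unsigned total = 0;

    for (size_t i = 0; i < jobs; i++) {
        unsigned t = model_dynamic_threads(n_onln, loadavg);
        threads_out[i] = t;
        total += t;
        loadavg = t; /* this job's threads become the next job's load */
    }
    return total;
}
```

Calling `simulate_queue(8, 5, out)` fills `out` with 8, 1, 7, 1, 7, the same sequence as in the list. After the first job the model alternates between 1 and 7 threads, so the per-job average tends towards 4 of 8 cores.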

None of this behaviour is documented

* in [`libgomp`](https://gcc.gnu.org/onlinedocs/libgomp/OMP_005fDYNAMIC.html)
* or in the [OpenMP spec for
`OMP_DYNAMIC`](https://www.openmp.org/spec-html/5.0/openmpse51.html).

Those docs make the behaviour sound nicely "runtime-dynamic", when in fact it
is fixed across the process's lifetime and based on ultra-slow rolling
averages.

I argue that libgomp does not implement the OpenMP spec well here.

The spec says

> OpenMP implementation may adjust the number of threads to use for executing
> parallel regions in order to optimize the use of system resources

This suggests that the OpenMP implementation may do something sensible to
adjust the number of threads "DYNAMIC"ally.

Nowhere does it say that it should determine this at the start of the process,
and never adjust it again.

That's the opposite of "dynamic"!

On top of that, it is combined with a very-much-not-dynamic 15-minute delay.

I read the spec text as "do something sensible like GNU make, which checks the
(short-term!) loadavg() of the current system periodically and adjusts its
parallelism accordingly".
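
As a rough illustration of that reading, the sketch below derives a thread budget from the 1-minute load average and is meant to be re-evaluated before every parallel region, not once per process. It is not a proposed libgomp patch: `budget_from_load` and `current_thread_budget` are hypothetical names, the round-to-nearest policy is an assumption, and `getloadavg()` and `sysconf(_SC_NPROCESSORS_ONLN)` are the glibc/BSD-style calls assumed to be available.

```c
#define _DEFAULT_SOURCE /* exposes getloadavg() in glibc's <stdlib.h> */
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper: map a load average to a thread budget.
   Always returns at least 1 and at most n_onln. */
unsigned budget_from_load(unsigned n_onln, double load)
{
    assert(n_onln > 0);
    if (load >= (double)n_onln)
        return 1u;
    if (load < 0.0)
        load = 0.0;
    unsigned busy = (unsigned)(load + 0.5); /* round to nearest (assumed policy) */
    return busy >= n_onln ? 1u : n_onln - busy;
}

/* Hypothetical helper: intended to be called before each parallel region,
   so the budget follows the current 1-minute load average. */
unsigned current_thread_budget(void)
{
    long n = sysconf(_SC_NPROCESSORS_ONLN);
    unsigned n_onln = n > 0 ? (unsigned)n : 1u;
    double load[1];

    /* Request only the first sample, which is the 1-minute average. */
    if (getloadavg(load, 1) != 1)
        return n_onln; /* no load information available: assume idle */
    return budget_from_load(n_onln, load[0]);
}
```

An application could pass the result of `current_thread_budget()` to `omp_set_num_threads()` ahead of each parallel region as a workaround; whether a runtime should behave like this by default is exactly the question this report raises.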
