https://gcc.gnu.org/bugzilla/show_bug.cgi?id=115367
Bug ID: 115367
Summary: The implementation of OMP_DYNAMIC is not dynamic
Product: gcc
Version: 15.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: libgomp
Assignee: unassigned at gcc dot gnu.org
Reporter: mail+gcc at nh2 dot me
CC: jakub at gcc dot gnu.org
Target Milestone: ---

Please see: "Why does my OpenMP app sometimes use only 1 thread, sometimes 3, sometimes all cores?"
https://stackoverflow.com/questions/78584145/why-does-my-openmp-app-sometimes-use-only-1-thread-sometimes-3-sometimes-all-c/78584146

OMP_DYNAMIC is implemented like this (on Linux, likely other platforms):

https://github.com/gcc-mirror/gcc/blob/10cb3336ba1ac89b258f627222e668b023a6d3d4/libgomp/config/linux/proc.c#L180-L188

    /* When OMP_DYNAMIC is set, at thread launch determine the number of
       threads we should spawn for this team.  */
    /* ??? I have no idea what best practice for this is.  Surely some
       function of the number of processors that are *still* online and
       the load average.  Here I use the number of processors online
       minus the 15 minute load average.  */

    unsigned
    gomp_dynamic_max_threads (void)
    {
      // ...
      return n_onln - loadavg;

### `OMP_DYNAMIC` (of `libgomp`) is really bad

* Because of this logic, your app will use only 1 thread, even though the system is completely idle _now_, just because it was busy 10 minutes ago.
* The dynamic limit is determined _at process start_ (loading time), and fixed forever.
    * So programs stay slow _forever_ if they were started 5 minutes after the system was busy.
* It means a server can never achieve full utilisation when working down a queue of jobs.
    * Say you have 8 cores, and a queue of N jobs to process, each of which takes 15 minutes at full CPU.
        * The first job starts at a 15-minute load average of 0, thus using all cores.
        * The next job starts, using only 1 core.
        * The next job starts, using only 7 cores.
        * The next job starts, using only 1 core.
        * The next job starts, using only 7 cores.
        * ...
    * In the long run, the **server uses only half of its cores on average**.
* It makes performance behaviour completely irreproducible.

None of this behaviour is documented

* in [`libgomp`](https://gcc.gnu.org/onlinedocs/libgomp/OMP_005fDYNAMIC.html)
* or in the [OpenMP spec for `OMP_DYNAMIC`](https://www.openmp.org/spec-html/5.0/openmpse51.html).

Those docs make the behaviour sound nicely "runtime-dynamic", when in fact it is fixed across the process's lifetime and based on ultra-slow rolling averages.

I argue that libgomp does not implement the OpenMP spec well here. It says

> OpenMP implementation may adjust the number of threads to use for executing
> parallel regions in order to optimize the use of system resources

This suggests that the OpenMP implementation may do something sensible to adjust the number of threads "DYNAMIC"ally.

Nowhere does it say that it should determine this at the start of the process and never adjust it again. That's the opposite of "dynamic"! And on top of that comes a very-much-not-dynamic 15-minute delay.

I read the spec text as "do something sensible like GNU make, which checks the (short-term!) loadavg() of the current system periodically and adjusts its parallelism accordingly".