> On Thu, Sep 4, 2025 at 6:07 PM Sebastian Pop <[email protected]> wrote:
>
> This patch adds runtime thread count detection to auto-parallelization.
> The -ftree-parallelize-loops=0 option generates parallelized loops without
> specifying a fixed thread count, deferring this decision to program execution
> time where it is controlled by the OMP_NUM_THREADS environment variable.
>
> The patch changes:
>
> 1. Flag semantics:
>    - Default (-1): auto-parallelization disabled.
>    - 0: runtime thread detection via OMP_NUM_THREADS.
>    - N>1: fixed thread count (no change to previous behavior).
>
> 2. Gate condition: allow pass execution for flag == 0 || flag > 1.
>
> 3. OpenMP builtin enablement: enable for flag >= 0 instead of > 1.
>
> 4. Thread count handling: when flag == 0, set n_threads=0 and omit the
>    num_threads clause, letting the OpenMP runtime determine the thread count.
>
> 5. Profitability checks: bypass thread-count-dependent checks when
>    n_threads=0.
>
> 6. Driver integration: automatically link libgomp and enable pthread
>    support when -ftree-parallelize-loops=0 is used.
> diff --git a/gcc/builtins.def b/gcc/builtins.def
> index f6f3e104f6a..c4d86654aeb 100644
> --- a/gcc/builtins.def
> +++ b/gcc/builtins.def
> @@ -223,7 +223,7 @@ along with GCC; see the file COPYING3. If not see
>                 false, true, true, ATTRS, false, \
>                 (flag_openacc \
>                  || flag_openmp \
> -                || flag_tree_parallelize_loops > 1))
> +                || flag_tree_parallelize_loops >= 0))

This changes behavior for == 1.

I must say I find =0 a bad "special case" value as a user interface.
Can you instead add an alias for -ftree-parallelize-loops with
auto-detection semantics and leave the new special value undocumented?

If you chose -1U as the "hidden" special value for autodetection, you
would not have to alter any code checking flag_tree_parallelize_loops
as a "flag", nor change its default.

Richard.
>
>
> Bootstrap and regression tested on aarch64-linux. Compiled SPEC HPC pot3d
> https://www.spec.org/hpc2021/docs/benchmarks/628.pot3d_s.html with
> -ftree-parallelize-loops=0 and tested without having OMP_NUM_THREADS set in
> the environment and with OMP_NUM_THREADS set to different values.
>
>
>
> gcc/ChangeLog:
>
>         * builtins.def (DEF_GOMP_BUILTIN): Enable OpenMP builtins for
>         flag_tree_parallelize_loops >= 0.
>         * common.opt (ftree-parallelize-loops): Change initial value to -1.
>         * gcc/doc/invoke.texi (ftree-parallelize-loops=n): Document possible
>         values for variable n.
>         * gcc.cc (LINK_SPEC): Add automatic libgomp linking for
>         -ftree-parallelize-loops=0.
>         (GOMP_SELF_SPECS): Add automatic pthread linking for
>         -ftree-parallelize-loops=0.
>         * tree-parloops.cc (create_parallel_loop): Generate a "#pragma omp
>         parallel" without num_threads(x) clause when n_threads is zero.
>         (gen_parallel_loop): Use a conservative value of 2 for the
>         auto-parallelization cost model in case it is a runtime check.
>         (parallelize_loops): Handle flag_tree_parallelize_loops == 0 as
>         n_threads = 0.
>         (gate): Execute the pass when flag_tree_parallelize_loops >= 0.
>
>
>
> gcc/testsuite/ChangeLog:
>
>         * gcc.dg/autopar/runtime-threads-1.c: New test.
>
>
>
> Signed-off-by: Sebastian Pop [email protected]
>
>