[committed] openmp: Improve #pragma omp simd vectorization
Hi!

As mentioned earlier, the vectorizer punts on vectorization of loops with non-constant steps. For OpenMP loops, the language restrictions guarantee that the number of loop iterations can always be computed before the loop, so this change helps those cases by computing it and using an alternate IV that iterates from 0 to < niterations with a step of 1, next to the normal IV, which is then just linear in the alternate one.

List of functions where, compared to current trunk, we now vectorize some loops that we previously didn't (for c-c++-common only the C function names are listed; both C and C++ are affected, though):

gcc/testsuite/gcc.dg/vect/vect-simd-17.c doit
gcc/testsuite/gcc.dg/vect/vect-simd-18.c foo
gcc/testsuite/gcc.dg/vect/vect-simd-19.c foo
gcc/testsuite/gcc.dg/vect/vect-simd-20.c foo
libgomp/testsuite/libgomp.c-c++-common/for-2.c f3_f_simd_auto
libgomp/testsuite/libgomp.c-c++-common/for-2.c f3_f_simd_guided32
libgomp/testsuite/libgomp.c-c++-common/for-2.c f3_f_simd_runtime
libgomp/testsuite/libgomp.c-c++-common/for-2.c f3_f_simd_static
libgomp/testsuite/libgomp.c-c++-common/for-2.c f3_f_simd_static32
libgomp/testsuite/libgomp.c-c++-common/for-2.c f3_pf_simd_auto._omp_fn.0
libgomp/testsuite/libgomp.c-c++-common/for-2.c f3_pf_simd_guided32._omp_fn.0
libgomp/testsuite/libgomp.c-c++-common/for-2.c f3_pf_simd_runtime._omp_fn.0
libgomp/testsuite/libgomp.c-c++-common/for-2.c f3_pf_simd_static32._omp_fn.0
libgomp/testsuite/libgomp.c-c++-common/for-2.c f3_pf_simd_static._omp_fn.0
libgomp/testsuite/libgomp.c-c++-common/for-2.c f3_simd_normal
libgomp/testsuite/libgomp.c-c++-common/for-2.c f5_simd_normal
libgomp/testsuite/libgomp.c-c++-common/for-2.c f6_simd_normal
libgomp/testsuite/libgomp.c-c++-common/for-3.c f3_dpfs_auto._omp_fn.0
libgomp/testsuite/libgomp.c-c++-common/for-3.c f3_dpfs_ds128_auto._omp_fn.0
libgomp/testsuite/libgomp.c-c++-common/for-3.c f3_dpfs_ds128_guided32._omp_fn.0
libgomp/testsuite/libgomp.c-c++-common/for-3.c f3_dpfs_ds128_runtime._omp_fn.0
libgomp/testsuite/libgomp.c-c++-common/for-3.c f3_dpfs_ds128_static32._omp_fn.0
libgomp/testsuite/libgomp.c-c++-common/for-3.c f3_dpfs_ds128_static._omp_fn.0
libgomp/testsuite/libgomp.c-c++-common/for-3.c f3_dpfs_guided32._omp_fn.0
libgomp/testsuite/libgomp.c-c++-common/for-3.c f3_dpfs_runtime._omp_fn.0
libgomp/testsuite/libgomp.c-c++-common/for-3.c f3_dpfs_static32._omp_fn.0
libgomp/testsuite/libgomp.c-c++-common/for-3.c f3_dpfs_static._omp_fn.0
libgomp/testsuite/libgomp.c-c++-common/for-3.c f3_ds_ds128_normal
libgomp/testsuite/libgomp.c-c++-common/for-3.c f3_ds_normal
libgomp/testsuite/libgomp.c-c++-common/for-4.c f3_taskloop_simd_normal._omp_fn.0
libgomp/testsuite/libgomp.c-c++-common/for-5.c f3_tpf_simd_auto._omp_fn.1
libgomp/testsuite/libgomp.c-c++-common/for-5.c f3_tpf_simd_guided32._omp_fn.1
libgomp/testsuite/libgomp.c-c++-common/for-5.c f3_tpf_simd_runtime._omp_fn.1
libgomp/testsuite/libgomp.c-c++-common/for-5.c f3_tpf_simd_static32._omp_fn.1
libgomp/testsuite/libgomp.c-c++-common/for-5.c f3_tpf_simd_static._omp_fn.1
libgomp/testsuite/libgomp.c-c++-common/for-5.c f3_t_simd_normal._omp_fn.0
libgomp/testsuite/libgomp.c-c++-common/for-5.c f3_ttdpfs_auto._omp_fn.1
libgomp/testsuite/libgomp.c-c++-common/for-5.c f3_ttdpfs_ds128_auto._omp_fn.1
libgomp/testsuite/libgomp.c-c++-common/for-5.c f3_ttdpfs_ds128_guided32._omp_fn.1
libgomp/testsuite/libgomp.c-c++-common/for-5.c f3_ttdpfs_ds128_runtime._omp_fn.1
libgomp/testsuite/libgomp.c-c++-common/for-5.c f3_ttdpfs_ds128_static32._omp_fn.1
libgomp/testsuite/libgomp.c-c++-common/for-5.c f3_ttdpfs_ds128_static._omp_fn.1
libgomp/testsuite/libgomp.c-c++-common/for-5.c f3_ttdpfs_guided32._omp_fn.1
libgomp/testsuite/libgomp.c-c++-common/for-5.c f3_ttdpfs_runtime._omp_fn.1
libgomp/testsuite/libgomp.c-c++-common/for-5.c f3_ttdpfs_static32._omp_fn.1
libgomp/testsuite/libgomp.c-c++-common/for-5.c f3_ttdpfs_static._omp_fn.1
libgomp/testsuite/libgomp.c-c++-common/for-5.c f3_ttds_ds128_normal._omp_fn.0
libgomp/testsuite/libgomp.c-c++-common/for-5.c f3_ttds_normal._omp_fn.0
libgomp/testsuite/libgomp.c-c++-common/for-5.c f5_t_simd_normal._omp_fn.0
libgomp/testsuite/libgomp.c-c++-common/for-5.c f6_t_simd_normal._omp_fn.0
libgomp/testsuite/libgomp.c-c++-common/for-6.c f3_tdpfs_auto._omp_fn.1
libgomp/testsuite/libgomp.c-c++-common/for-6.c f3_tdpfs_ds128_auto._omp_fn.1
libgomp/testsuite/libgomp.c-c++-common/for-6.c f3_tdpfs_ds128_guided32._omp_fn.1
libgomp/testsuite/libgomp.c-c++-common/for-6.c f3_tdpfs_ds128_runtime._omp_fn.1
libgomp/testsuite/libgomp.c-c++-common/for-6.c f3_tdpfs_ds128_static32._omp_fn.1
libgomp/testsuite/libgomp.c-c++-common/for-6.c f3_tdpfs_ds128_static._omp_fn.1
libgomp/testsuite/libgomp.c-c++-common/for-6.c f3_tdpfs_guided32._omp_fn.1
libgomp/testsuite/libgomp.c-c++-common/for-6.c f3_tdpfs_runtime._omp_fn.1
libgomp/testsuite/libgomp.c-c++-common/for-6.c f3_tdpfs_static32._omp_fn.1
libgomp/testsuite/libgomp.c-c++-common/for-6.c f3_tdpfs_static._omp_fn.1
libgomp/testsuite/libgomp.c-c++-common/for-6.c f3_tds_ds128_n
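
For readers who want a picture of the alternate-IV idea from the message above, here is a hand-written source-level sketch. It is not the code GCC generates; the function names `scale_strided` and `scale_strided_counted` are made up for illustration, and a positive step is assumed.

```cpp
#include <cstddef>
#include <vector>

// Original form: the step is only known at run time, which is the
// case the message says the vectorizer used to give up on.
void scale_strided(std::vector<float>& a, long n, long step) {
  #pragma omp simd
  for (long i = 0; i < n; i += step)
    a[static_cast<std::size_t>(i)] *= 2.0f;
}

// Hand-written equivalent (assumes step > 0): compute the iteration
// count before the loop, iterate a unit-step counter k from 0 to
// niter, and derive the original induction variable linearly from k.
void scale_strided_counted(std::vector<float>& a, long n, long step) {
  long niter = n > 0 ? (n + step - 1) / step : 0;
  for (long k = 0; k < niter; ++k) {
    long i = k * step;  // the "normal" IV, linear in k
    a[static_cast<std::size_t>(i)] *= 2.0f;
  }
}
```

Both functions touch the same elements; the second form simply makes the trip count and the unit-step counter explicit.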
Implement iterative dataflow in modref to track parameters
Hi,
this patch finishes the parameter tracking by implementing the iterative dataflow in the propagation stage. This is necessary since we can now propagate how the pointers are passed around recursive calls (as done in a testcase) and thus it is no longer safe to simply merge all summaries in one SCC component of the call-graph.

cc1plus stats are now:

Alias oracle query stats:
  refs_may_alias_p: 62971744 disambiguations, 73160711 queries
  ref_maybe_used_by_call_p: 141176 disambiguations, 63867883 queries
  call_may_clobber_ref_p: 23573 disambiguations, 29322 queries
  nonoverlapping_component_refs_p: 0 disambiguations, 37720 queries
  nonoverlapping_refs_since_match_p: 19432 disambiguations, 55659 must overlaps, 75860 queries
  aliasing_component_refs_p: 54724 disambiguations, 753570 queries
  TBAA oracle: 24124230 disambiguations 56228428 queries
               16058141 are in alias set 0
               10338303 queries asked about the same object
               125 queries asked about the same alias set
               0 access volatile
               3919230 are dependent in the DAG
               1788399 are aritificially in conflict with void *

Modref stats:
  modref use: 10408 disambiguations, 46993 queries
  modref clobber: 1418549 disambiguations, 1951251 queries
  4898707 tbaa queries (2.510547 per modref query)
  396878 base compares (0.203397 per modref query)

PTA query stats:
  pt_solution_includes: 975364 disambiguations, 13604284 queries
  pt_solutions_intersect: 1026606 disambiguations, 13181198 queries

So compared to
https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554692.html
we get 25% more use disambiguations and 91% more clobber disambiguations.

Tramp3d is

Alias oracle query stats:
  refs_may_alias_p: 2056905 disambiguations, 2317461 queries
  ref_maybe_used_by_call_p: 7137 disambiguations, 2093762 queries
  call_may_clobber_ref_p: 234 disambiguations, 234 queries
  nonoverlapping_component_refs_p: 0 disambiguations, 4313 queries
  nonoverlapping_refs_since_match_p: 329 disambiguations, 10200 must overlaps, 10616 queries
  aliasing_component_refs_p: 858 disambiguations, 34600 queries
  TBAA oracle: 894996 disambiguations 1695991 queries
               138346 are in alias set 0
               470668 queries asked about the same object
               0 queries asked about the same alias set
               0 access volatile
               191666 are dependent in the DAG
               315 are aritificially in conflict with void *

Modref stats:
  modref use: 842 disambiguations, 2265 queries
  modref clobber: 14833 disambiguations, 28900 queries
  34884 tbaa queries (1.207059 per modref query)
  5041 base compares (0.174429 per modref query)

PTA query stats:
  pt_solution_includes: 313372 disambiguations, 525724 queries
  pt_solutions_intersect: 130374 disambiguations, 415138 queries

So about twice as many use disambiguations and 40% more clobber disambiguations.

Bootstrapped/regtested x86_64-linux, I plan to commit it later today after more testing.

2020-09-26 Jan Hubicka

	* ipa-inline-transform.c: Include ipa-modref-tree.h and ipa-modref.h.
	(inline_call): Call ipa_merge_modref_summary_after_inlining.
	* ipa-inline.c (ipa_inline): Do not free summaries.
	* ipa-modref.c (dump_records): Fix formating.
	(merge_call_side_effects): Break out from ...
	(analyze_call): ... here; record recursive calls.
	(analyze_stmt): Add new parameter RECURSIVE_CALLS.
	(analyze_function): Do iterative dataflow on recursive calls.
	(compute_parm_map): New function.
	(ipa_merge_modref_summary_after_inlining): New function.
	(collapse_loads): New function.
	(modref_propagate_in_scc): Break out from ...
	(pass_ipa_modref::execute): ... here; Do iterative dataflow.
	* ipa-modref.h (ipa_merge_modref_summary_after_inlining): Declare.

gcc/testsuite/ChangeLog:

2020-09-26 Jan Hubicka

	* gcc.dg/lto/modref-1_0.c: New test.
	* gcc.dg/lto/modref-1_1.c: New test.
	* gcc.dg/tree-ssa/modref-2.c: New test.

diff --git a/gcc/ipa-inline-transform.c b/gcc/ipa-inline-transform.c
index 5e37e612bfd..af2c2856aaa 100644
--- a/gcc/ipa-inline-transform.c
+++ b/gcc/ipa-inline-transform.c
@@ -48,6 +48,8 @@ along with GCC; see the file COPYING3. If not see
 #include "cfg.h"
 #include "basic-block.h"
 #include "ipa-utils.h"
+#include "ipa-modref-tree.h"
+#include "ipa-modref.h"

 int ncalls_inlined;
 int nfunctions_inlined;
@@ -487,6 +489,7 @@ inline_call (struct cgraph_edge *e, bool update_original,
   gcc_assert (curr->callee->inlined_to == to);

   old_size = ipa_size_summaries->get (to)->size;
+  ipa_merge_modref_summary_after_inlining (e);
   ipa_merge_fn_summary_after_inlining (e);
   if (e->in_polymorphic_cdtor)
     mark_all_inlined_calls_cdtor (e->callee);
diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
index c667de2a97c..225a0140725 100644
--- a/gcc/ipa-inline.c
+++ b/gcc/ipa-inline.c
@@ -2770,9 +2770,6 @@ ip
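
As an illustration of why recursive calls that permute their pointer parameters need iteration rather than a single merge, here is a toy fixed-point propagation. It is only a sketch under simplifying assumptions, not the ipa-modref implementation: `CallEdge`, `propagate_in_scc`, and the one-bit-per-parameter "may write through" summary are all hypothetical, and callee parameters that do not come from a caller parameter are simply ignored.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical call edge: parm_map[j] is the index of the caller
// parameter passed as callee parameter j, or -1 if it is something else.
struct CallEdge {
  int caller;
  int callee;
  std::vector<int> parm_map;
};

// writes[f] has bit i set if function f may write through its
// pointer parameter i.  Repeats passes over all edges until no
// summary changes; returns the number of passes performed
// (including the final pass that detects the fixed point).
int propagate_in_scc(std::vector<std::uint32_t>& writes,
                     const std::vector<CallEdge>& edges) {
  int passes = 0;
  bool changed = true;
  while (changed) {
    changed = false;
    ++passes;
    for (const CallEdge& e : edges) {
      // Translate the callee's summary into the caller's parameters.
      std::uint32_t translated = 0;
      for (std::size_t j = 0; j < e.parm_map.size(); ++j)
        if (((writes[e.callee] >> j) & 1u) && e.parm_map[j] >= 0)
          translated |= 1u << e.parm_map[j];
      std::uint32_t merged = writes[e.caller] | translated;
      if (merged != writes[e.caller]) {
        writes[e.caller] = merged;
        changed = true;
      }
    }
  }
  return passes;
}
```

With a single function that writes through its first parameter and calls itself with its parameters rotated, each pass discovers one more written parameter, so the summary only stabilizes after several passes.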
Re: Implement iterative dataflow in modref to track parameters
On September 26, 2020 10:20:20 AM GMT+02:00, Jan Hubicka wrote:
>Hi,
>this patchs finishes the parameter tracking by implementing the
>iterative
>dataflow in propagation stage. This is necessary since we now can
>propagate how the pointers are passed around recursive calls (as done
>in
>a testcase) and thus it is no-longer safe to simply merge all summaries
>in one SCC component of the call-graph.
>
>cc1plus stats are now:
>
>Alias oracle query stats:
> refs_may_alias_p: 62971744 disambiguations, 73160711 queries
> ref_maybe_used_by_call_p: 141176 disambiguations, 63867883 queries
> call_may_clobber_ref_p: 23573 disambiguations, 29322 queries
> nonoverlapping_component_refs_p: 0 disambiguations, 37720 queries
>nonoverlapping_refs_since_match_p: 19432 disambiguations, 55659 must
>overlaps, 75860 queries
> aliasing_component_refs_p: 54724 disambiguations, 753570 queries
> TBAA oracle: 24124230 disambiguations 56228428 queries
> 16058141 are in alias set 0
> 10338303 queries asked about the same object
> 125 queries asked about the same alias set
> 0 access volatile
> 3919230 are dependent in the DAG
> 1788399 are aritificially in conflict with void *
>
>Modref stats:
> modref use: 10408 disambiguations, 46993 queries
> modref clobber: 1418549 disambiguations, 1951251 queries
> 4898707 tbaa queries (2.510547 per modref query)
> 396878 base compares (0.203397 per modref query)
>
>PTA query stats:
> pt_solution_includes: 975364 disambiguations, 13604284 queries
> pt_solutions_intersect: 1026606 disambiguations, 13181198 queries
>
>So compared to
>https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554692.html we
>get 25%
>use disambiguations and 91% more clobber disambiguations.
>
>Tramp3d is
>
>Alias oracle query stats:
> refs_may_alias_p: 2056905 disambiguations, 2317461 queries
> ref_maybe_used_by_call_p: 7137 disambiguations, 2093762 queries
> call_may_clobber_ref_p: 234 disambiguations, 234 queries
> nonoverlapping_component_refs_p: 0 disambiguations, 4313 queries
>nonoverlapping_refs_since_match_p: 329 disambiguations, 10200 must
>overlaps, 10616 queries
> aliasing_component_refs_p: 858 disambiguations, 34600 queries
> TBAA oracle: 894996 disambiguations 1695991 queries
> 138346 are in alias set 0
> 470668 queries asked about the same object
> 0 queries asked about the same alias set
> 0 access volatile
> 191666 are dependent in the DAG
> 315 are aritificially in conflict with void *
>
>Modref stats:
> modref use: 842 disambiguations, 2265 queries
> modref clobber: 14833 disambiguations, 28900 queries
> 34884 tbaa queries (1.207059 per modref query)
> 5041 base compares (0.174429 per modref query)
>
>PTA query stats:
> pt_solution_includes: 313372 disambiguations, 525724 queries
> pt_solutions_intersect: 130374 disambiguations, 415138 queries
>
>So about twice many use and 40% clobber disambiguations.
>
>Bootstrapped/regtested x86_64-linux, I plan to commit it later today
>after
>more testing.

Any compile time figures for this? Firefox?

>
>2020-09-26 Jan Hubicka
>
> * ipa-inline-transform.c: Include ipa-modref-tree.h and ipa-modref.h.
> (inline_call): Call ipa_merge_modref_summary_after_inlining.
> * ipa-inline.c (ipa_inline): Do not free summaries.
> * ipa-modref.c (dump_records): Fix formating.
> (merge_call_side_effects): Break out from ...
> (analyze_call): ... here; record recursive calls.
> (analyze_stmt): Add new parameter RECURSIVE_CALLS.
> (analyze_function): Do iterative dataflow on recursive calls.
> (compute_parm_map): New function.
> (ipa_merge_modref_summary_after_inlining): New function.
> (collapse_loads): New function.
> (modref_propagate_in_scc): Break out from ...
> (pass_ipa_modref::execute): ... here; Do iterative dataflow.
> * ipa-modref.h (ipa_merge_modref_summary_after_inlining): Declare.
>
>gcc/testsuite/ChangeLog:
>
>2020-09-26 Jan Hubicka
>
> * gcc.dg/lto/modref-1_0.c: New test.
> * gcc.dg/lto/modref-1_1.c: New test.
> * gcc.dg/tree-ssa/modref-2.c: New test.
>
>diff --git a/gcc/ipa-inline-transform.c b/gcc/ipa-inline-transform.c
>index 5e37e612bfd..af2c2856aaa 100644
>--- a/gcc/ipa-inline-transform.c
>+++ b/gcc/ipa-inline-transform.c
>@@ -48,6 +48,8 @@ along with GCC; see the file COPYING3. If not see
> #include "cfg.h"
> #include "basic-block.h"
> #include "ipa-utils.h"
>+#include "ipa-modref-tree.h"
>+#include "ipa-modref.h"
>
> int ncalls_inlined;
> int nfunctions_inlined;
>@@ -487,6 +489,7 @@ inline_call (struct cgraph_edge *e, bool
>update_original,
> gcc_assert (curr->callee->inlined_to == to);
>
> old_size = ipa_size_summaries->get (to)->size;
>+ ipa_merge_modref_summary_after_inlining (e);
> ipa_merge_fn_summary_after_inlining (e);
> if (e->in_polymorphic_
Re: Fix handling of gimple_clobber in ipa_modref
> On September 26, 2020 12:04:24 AM GMT+02:00, Jan Hubicka wrote:
> >Hi,
> >while adding check for gimple_clobber I reversed the return value
> >so instead of ignoring the statement ipa-modref gives up. Fixed thus.
> >This explains the drop between originally reported disambinguations
> >stats and ones I got later.
>
> I don't think you can ignore clobbers. They are barriers for code motion.

modref is (before and after the patch) about 1.4% of the WPA time (2s). This is pretty much the cost of a single pass over the symbol table (other non-busy IPA passes take about the same; ipa-comdat is faster with 0.7%). The iteration of dataflow happens only on non-trivial strongly connected components and, at least for GCC, always terminates in 3 iterations (to trigger more, one needs a function with many params and an operation like shifting every param right).

>
> Richard.
> >
> >Bootstrapped/regtested x86_64-linux.
> >
> >gcc/ChangeLog:
> >
> >2020-09-25 Jan Hubicka
> >
> > * ipa-modref.c (analyze_stmt): Fix return value for gimple_clobber.
> >
> >diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
> >index aa6929ff010..44b844b90db 100644
> >--- a/gcc/ipa-modref.c
> >+++ b/gcc/ipa-modref.c
> >@@ -658,7 +658,7 @@ analyze_stmt (modref_summary *summary, gimple
> >*stmt, bool ipa)
> > {
> > /* There is no need to record clobbers. */
> > if (gimple_clobber_p (stmt))
> >-return false;
> >+return true;
> > /* Analyze all loads and stores in STMT. */
> > walk_stmt_load_store_ops (stmt, summary,
> > analyze_load, analyze_store);
>
Re: [PATCH] optabs: Don't reuse target for multi-word expansions if it overlaps operand(s) [PR97073]
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk and
> release branches?
>
> 2020-09-18 Jakub Jelinek
>
> PR middle-end/97073
> * optabs.c (expand_binop, expand_absneg_bit, expand_unop,
> expand_copysign_bit): Check reg_overlap_mentioned_p between target
> and operand(s) and if it returns true, force a pseudo as target.
>
> * gcc.c-torture/execute/pr97073.c: New test.

This looks good to me.

--
Eric Botcazou
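
Why an overlapping target is a hazard for word-by-word expansion can be shown with a toy model in plain C++. This is not GCC's RTL code and does not reproduce the PR's testcase; `Regs`, `not_words_in_place`, and `not_words_via_temp` are hypothetical names, and the array merely stands in for consecutive word-sized registers.

```cpp
#include <array>
#include <cstdint>

// Three consecutive word-sized "registers".  A two-word operand
// lives in r[op], r[op + 1]; a two-word target in r[tgt], r[tgt + 1].
using Regs = std::array<std::uint32_t, 3>;

// Writes each result word straight into the target.  If the target
// partially overlaps the operand (tgt == op + 1), the first store
// clobbers an operand word before it has been read.
void not_words_in_place(Regs& r, int tgt, int op) {
  r[tgt] = ~r[op];
  r[tgt + 1] = ~r[op + 1];  // may read a word overwritten just above
}

// Computes into fresh temporaries first, mirroring the idea of
// forcing a new pseudo as the target when an overlap is detected.
void not_words_via_temp(Regs& r, int tgt, int op) {
  std::uint32_t t0 = ~r[op];
  std::uint32_t t1 = ~r[op + 1];
  r[tgt] = t0;
  r[tgt + 1] = t1;
}
```

With the operand in words 0..1 and the target in words 1..2, the in-place version produces a wrong high word, while the temporary-based version does not.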
Re: dg-options after board/cflags
On Wed, 2 Sep 2020, Jose E. Marchesi via Gcc-patches wrote:

> Your patch dealt with board/multilib_flags, but the same problem exists
> for board/cflags and many other flag-containing options.

What's the use case for that? IIUC board flags are supposed to be ones that are absolutely required for executables to run with a given board, such as multilib selection, special linker scripts, non-standard run-time library paths, etc. These must not be overridden by test cases or they will inevitably fail.

  Maciej
[PATCH] AIX: collect2 visibility
The code that collect2 generates, compiles and links into applications and shared libraries to initialize constructors and register DWARF tables is built with the compiler options used to invoke the linker. If the compiler options change the visibility from default, the library initialization routines will not be visible and this can prevent initialization.

This patch checks if the command line sets visibility and then adds GCC pragmas to the initialization code generated by collect2 if necessary to define the visibility on global, exported functions as default.

Bootstrapped on powerpc-ibm-aix7.2.0.0

Thanks, David

gcc/ChangeLog:

2020-09-26 David Edelsohn
           Clement Chigot

	* collect2.c (visibility_flag): New.
	(main): Detect -fvisibility.
	(write_c_file_stat): Push and pop default visibility.

0001-aix-collect2-visibility.patch
Description: Binary data
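
The push/pop mechanism mentioned in the ChangeLog can be sketched as follows. This is not the code collect2 emits; the function names `example_init_hook` and `example_internal_helper` are invented for illustration. Only the `#pragma GCC visibility push(default)` / `pop` pair is the documented GCC feature being shown.

```cpp
// Declarations between push(default) and pop keep default visibility
// even if the file is compiled with -fvisibility=hidden.
#pragma GCC visibility push(default)
extern "C" int example_init_hook(void) { return 1; }  // hypothetical name
#pragma GCC visibility pop

// Outside the pragma region, visibility follows the command line,
// so with -fvisibility=hidden this symbol would not be exported.
extern "C" int example_internal_helper(void) { return 2; }  // hypothetical name
```

When such a file is built into a shared object with `-fvisibility=hidden`, one would expect only the first function to appear among the exported dynamic symbols.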
[Patch][nvptx] return true in libc_has_function for function_sincos
Found when looking at PR97203 (but having no effect there).

With -O1 (or higher), the GCC ME optimizes

  a = sinf(x)
  b = cosf(x)

to

  __builtin_cexpi(x, &a, &b)

(...i as in internal; like cexp(z) but with __real__ z == 0)

In expand_builtin_cexpi, that is handled as:

  if (optab_handler (sincos_optab, mode) != CODE_FOR_nothing)
    ...
  else if (targetm.libc_has_function (function_sincos))
    ...
  else
    fn = builtin_decl_explicit (BUILT_IN_CEXPF);

And the latter is done. As newlib's cexpf does not know that __real__ z == 0, it calculates 'r = expf (__real__ z)' before invoking sinf and cosf on __imag__ z.

Thus, it is much faster to call 'sincosf', which also exists in newlib.

Solution: Return true for targetm.libc_has_function (function_sincos).

NOTE: With -funsafe-math-optimizations (-O0 or higher), sinf/cosf and sincosf invoke .sin.approx/.cos.approx instead of doing a library call.

OK?

Tobias

-
Mentor Graphics (Deutschland) GmbH, Arnulfstraße 201, 80634 München / Germany
Registergericht München HRB 106955, Geschäftsführer: Thomas Heurung, Alexander Walter

[nvptx] return true in libc_has_function for function_sincos

gcc/ChangeLog:

	* config/nvptx/nvptx.c (nvptx_libc_has_function): New.
	(TARGET_LIBC_HAS_FUNCTION): Redefine to new func.

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 54b1fdf669b..d4b0de30ff1 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -6536,6 +6536,21 @@ nvptx_set_current_function (tree fndecl)
   oacc_bcast_partition = 0;
 }

+/* By default we assume that c99 functions are present at the runtime,
+   including sincos which is excluded in default_libc_has_function.  */
+bool
+nvptx_libc_has_function (enum function_class fn_class)
+{
+  if (fn_class == function_c94
+      || fn_class == function_c99_misc
+      || fn_class == function_c99_math_complex
+      || fn_class == function_sincos)
+    return true;
+
+  return false;
+}
+
+
 #undef TARGET_OPTION_OVERRIDE
 #define TARGET_OPTION_OVERRIDE nvptx_option_override

@@ -6681,6 +6696,9 @@ nvptx_set_current_function (tree fndecl)
 #undef TARGET_SET_CURRENT_FUNCTION
 #define TARGET_SET_CURRENT_FUNCTION nvptx_set_current_function

+#undef TARGET_LIBC_HAS_FUNCTION
+#define TARGET_LIBC_HAS_FUNCTION nvptx_libc_has_function
+
 struct gcc_target targetm = TARGET_INITIALIZER;

 #include "gt-nvptx.h"
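
The cost argument in the message follows from Euler's formula: cexp(z) = exp(Re z) * (cos(Im z) + i sin(Im z)), so for a purely imaginary argument the exp factor is 1 and only the sine and cosine are actually needed. The following host-side C++ sketch (not nvptx or newlib code; `sin_cos_via_cexp` and `sin_cos_direct` are hypothetical names) shows the two routes agreeing:

```cpp
#include <cmath>
#include <complex>
#include <utility>

// Route taken when falling back to cexp: evaluates a full complex
// exponential of (0 + i*x), although the real part is known to be 0.
std::pair<float, float> sin_cos_via_cexp(float x) {
  std::complex<float> r = std::exp(std::complex<float>(0.0f, x));
  return {r.imag(), r.real()};  // {sin(x), cos(x)}
}

// Direct route: only the sine and cosine are computed, which is
// what a sincos-style library call provides.
std::pair<float, float> sin_cos_direct(float x) {
  return {std::sin(x), std::cos(x)};
}
```

Both functions return the same pair up to rounding; the first one just does extra work for the exponential of a zero real part.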
[committed] libstdc++: Use __libc_single_threaded to optimise atomics [PR 96817]
Glibc 2.32 adds a global variable that says whether the process is single-threaded. We can use this to decide whether to elide atomic operations, as a more precise and reliable indicator than __gthread_active_p.

This means that guard variables for statics and reference counting in shared_ptr can use less expensive, non-atomic ops even in processes that are linked to libpthread, as long as no threads have been created yet. It also means that we switch to using atomics if libpthread gets loaded later via dlopen (this still isn't supported in general, for other reasons).

We can't use __libc_single_threaded to replace __gthread_active_p everywhere. If we replaced the uses of __gthread_active_p in std::mutex then we would elide the pthread_mutex_lock in the code below, but not the pthread_mutex_unlock:

  std::mutex m;
  m.lock();            // pthread_mutex_lock
  std::thread t([]{}); // __libc_single_threaded = false
  t.join();
  m.unlock();          // pthread_mutex_unlock

We need the lock and unlock to use the same "is threading enabled" predicate, and similarly for init/destroy pairs for mutexes and condition variables, so that we don't try to release resources that were never acquired.

There are other places that could use __libc_single_threaded, such as _Sp_locker in src/c++11/shared_ptr.cc and locale init functions, but they can be changed later.

libstdc++-v3/ChangeLog:

	PR libstdc++/96817
	* include/ext/atomicity.h (__gnu_cxx::__is_single_threaded()): New
	function wrapping __libc_single_threaded if available.
	(__exchange_and_add_dispatch, __atomic_add_dispatch): Use it.
	* libsupc++/guard.cc (__cxa_guard_acquire, __cxa_guard_abort)
	(__cxa_guard_release): Likewise.
	* testsuite/18_support/96817.cc: New test.

Tested powerpc64le-linux, with glibc 2.31 and 2.32. Committed to trunk.
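
The dispatch idea described above can be sketched in user code roughly as follows. This is an illustrative sketch, not the libstdc++ implementation: `is_single_threaded` and `exchange_and_add_dispatch` are hypothetical names, the fallback when `<sys/single_threaded.h>` is missing conservatively assumes threads exist (which differs from the library, which falls back to `__gthread_active_p`), and `__atomic_fetch_add` is the GCC/Clang builtin.

```cpp
#if __has_include(<sys/single_threaded.h>)
# include <sys/single_threaded.h>  // declares __libc_single_threaded (glibc 2.32+)
#endif

// Hypothetical helper: true only when the C library reports that
// no second thread has been created so far.
inline bool is_single_threaded() {
#if __has_include(<sys/single_threaded.h>)
  return ::__libc_single_threaded;
#else
  return false;  // assumption for this sketch: be conservative, always use atomics
#endif
}

// Adds val to *mem and returns the previous value, using a plain
// read-modify-write when single-threaded and an atomic one otherwise.
inline int exchange_and_add_dispatch(int* mem, int val) {
  if (is_single_threaded()) {
    int old = *mem;
    *mem += val;
    return old;
  }
  return __atomic_fetch_add(mem, val, __ATOMIC_ACQ_REL);
}
```

This pattern suits operations such as reference-count updates, where each call is self-contained. As the message explains, it would not be safe for paired operations such as lock/unlock, because the predicate can change between the two calls.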
commit e6923541fae5081b646f240d54de2a32e17a0382
Author: Jonathan Wakely
Date: Sat Sep 26 20:32:36 2020

    libstdc++: Use __libc_single_threaded to optimise atomics [PR 96817]

    Glibc 2.32 adds a global variable that says whether the process is
    single-threaded. We can use this to decide whether to elide atomic
    operations, as a more precise and reliable indicator than
    __gthread_active_p.

    This means that guard variables for statics and reference counting in
    shared_ptr can use less expensive, non-atomic ops even in processes that
    are linked to libpthread, as long as no threads have been created yet.
    It also means that we switch to using atomics if libpthread gets loaded
    later via dlopen (this still isn't supported in general, for other
    reasons).

    We can't use __libc_single_threaded to replace __gthread_active_p
    everywhere. If we replaced the uses of __gthread_active_p in std::mutex
    then we would elide the pthread_mutex_lock in the code below, but not
    the pthread_mutex_unlock:

      std::mutex m;
      m.lock();            // pthread_mutex_lock
      std::thread t([]{}); // __libc_single_threaded = false
      t.join();
      m.unlock();          // pthread_mutex_unlock

    We need the lock and unlock to use the same "is threading enabled"
    predicate, and similarly for init/destroy pairs for mutexes and
    condition variables, so that we don't try to release resources that were
    never acquired.

    There are other places that could use __libc_single_threaded, such as
    _Sp_locker in src/c++11/shared_ptr.cc and locale init functions, but
    they can be changed later.

    libstdc++-v3/ChangeLog:

            PR libstdc++/96817
            * include/ext/atomicity.h (__gnu_cxx::__is_single_threaded()): New
            function wrapping __libc_single_threaded if available.
            (__exchange_and_add_dispatch, __atomic_add_dispatch): Use it.
            * libsupc++/guard.cc (__cxa_guard_acquire, __cxa_guard_abort)
            (__cxa_guard_release): Likewise.
            * testsuite/18_support/96817.cc: New test.

diff --git a/libstdc++-v3/include/ext/atomicity.h b/libstdc++-v3/include/ext/atomicity.h
index 813ceb0bbf8..2d3e5fb0904 100644
--- a/libstdc++-v3/include/ext/atomicity.h
+++ b/libstdc++-v3/include/ext/atomicity.h
@@ -34,11 +34,27 @@
 #include <bits/c++config.h>
 #include <bits/gthr.h>
 #include <bits/atomic_word.h>
+#if __has_include(<sys/single_threaded.h>)
+# include <sys/single_threaded.h>
+#endif

 namespace __gnu_cxx _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION

+  __attribute__((__always_inline__))
+  inline bool
+  __is_single_threaded() _GLIBCXX_NOTHROW
+  {
+#ifndef __GTHREADS
+    return true;
+#elif __has_include(<sys/single_threaded.h>)
+    return ::__libc_single_threaded;
+#else
+    return !__gthread_active_p();
+#endif
+  }
+
   // Functions for portable atomic access.
   // To abstract locking primitives across all thread policies, use:
   // __exchange_and_add_dispatch
@@ -79,25 +95,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   __attribute__ ((__always_inline__))
   __exchange_and_add_dispatch(_Atomic_word* __mem, int __val)
   {
-#ifdef __G
Re: Fix handling of gimple_clobber in ipa_modref
> On September 26, 2020 12:04:24 AM GMT+02:00, Jan Hubicka wrote:
> >Hi,
> >while adding check for gimple_clobber I reversed the return value
> >so instead of ignoring the statement ipa-modref gives up. Fixed thus.
> >This explains the drop between originally reported disambinguations
> >stats and ones I got later.
>
> I don't think you can ignore clobbers. They are barriers for code motion.

Hi,
this is the fix I have installed after lto-bootstrapping/regtesting. The statistics for cc1plus are almost unchanged, which is sort of expected given that I only measure late optimization by getting the dump from LTO.

Thanks for pointing this out, it may have triggered a nasty wrong code bug :)

Honza

Alias oracle query stats:
  refs_may_alias_p: 63013346 disambiguations, 73204989 queries
  ref_maybe_used_by_call_p: 141350 disambiguations, 63909728 queries
  call_may_clobber_ref_p: 23597 disambiguations, 29430 queries
  nonoverlapping_component_refs_p: 0 disambiguations, 37763 queries
  nonoverlapping_refs_since_match_p: 19444 disambiguations, 55671 must overlaps, 75884 queries
  aliasing_component_refs_p: 54749 disambiguations, 753947 queries
  TBAA oracle: 24159888 disambiguations 56277876 queries
               16064485 are in alias set 0
               10340953 queries asked about the same object
               125 queries asked about the same alias set
               0 access volatile
               3920604 are dependent in the DAG
               1791821 are aritificially in conflict with void *

Modref stats:
  modref use: 10444 disambiguations, 46994 queries
  modref clobber: 1421468 disambiguations, 1954304 queries
  4907798 tbaa queries (2.511277 per modref query)
  396785 base compares (0.203031 per modref query)

PTA query stats:
  pt_solution_includes: 976073 disambiguations, 13607833 queries
  pt_solutions_intersect: 1026016 disambiguations, 13185678 queries

	* ipa-modref.c (analyze_stmt): Do not skip clobbers in early pass.
	* ipa-pure-const.c (analyze_stmt): Update comment.

diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
index 73a7900883a..728c6c1523d 100644
--- a/gcc/ipa-modref.c
+++ b/gcc/ipa-modref.c
@@ -676,13 +676,16 @@ static bool
 analyze_stmt (modref_summary *summary, gimple *stmt, bool ipa,
 	      vec *recursive_calls)
 {
-  /* There is no need to record clobbers.  */
-  if (gimple_clobber_p (stmt))
+  /* In general we can not ignore clobbers because they are barries for code
+     motion, however after inlining it is safe to do becuase local optimization
+     passes do not consider clobbers from other functions.
+     Similar logic is in ipa-pure-consts.  */
+  if ((ipa || cfun->after_inlining) && gimple_clobber_p (stmt))
     return true;
+
   /* Analyze all loads and stores in STMT.  */
   walk_stmt_load_store_ops (stmt, summary,
 			    analyze_load, analyze_store);
-  /* or call analyze_load_ipa, analyze_store_ipa */

   switch (gimple_code (stmt))
     {
@@ -705,7 +708,7 @@ analyze_stmt (modref_summary *summary, gimple *stmt, bool ipa,
     }
 }

-/* Analyze function F.  IPA indicates whether we're running in tree mode (false)
+/* Analyze function F.  IPA indicates whether we're running in local mode (false)
    or the IPA mode (true).  */

 static void
diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c
index bdbccd010dc..1af3206056e 100644
--- a/gcc/ipa-pure-const.c
+++ b/gcc/ipa-pure-const.c
@@ -742,6 +742,8 @@ check_stmt (gimple_stmt_iterator *gsip, funct_state local, bool ipa)
   /* Do consider clobber as side effects before IPA, so we rather inline
      C++ destructors and keep clobber semantics than eliminate them.

+     Similar logic is in ipa-modref.
+
      TODO: We may get smarter during early optimizations on these and let
      functions containing only clobbers to be optimized more.  This is a common
      case of C++ destructors.  */
Re: [PATCH v7] genemit.c (main): split insn-emit.c for compiling parallelly
Hi,

Has this patch been merged?

Jojo

On September 15, 2020 at 5:16 PM +0800, Jojo R wrote:
> gcc/ChangeLog:
>
> * genemit.c (main): Print 'split line'.
> * Makefile.in (insn-emit.c): Define split count and file
>
> ---
> gcc/Makefile.in | 19 +
> gcc/genemit.c | 104 +---
> 2 files changed, 83 insertions(+), 40 deletions(-)
>
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index 79e854aa938..a7fcc7d5949 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1258,6 +1258,21 @@ ANALYZER_OBJS = \
> # We put the *-match.o and insn-*.o files first so that a parallel make
> # will build them sooner, because they are large and otherwise tend to be
> # the last objects to finish building.
> +
> +# target overrides
> +-include $(tmake_file)
> +
> +INSN-GENERATED-SPLIT-NUM ?= 0
> +
> +insn-generated-split-num = $(shell i=1; j=`expr $(INSN-GENERATED-SPLIT-NUM)
> + 1`; \
> + while test $$i -le $$j; do \
> + echo $$i; i=`expr $$i + 1`; \
> + done)
> +
> +insn-emit-split-c := $(foreach o, $(shell for i in
> $(insn-generated-split-num); do echo $$i; done), insn-emit$(o).c)
> +insn-emit-split-obj = $(patsubst %.c,%.o, $(insn-emit-split-c))
> +$(insn-emit-split-c): insn-emit.c
> +
> OBJS = \
> gimple-match.o \
> generic-match.o \
> @@ -1265,6 +1280,7 @@ OBJS = \
> insn-automata.o \
> insn-dfatab.o \
> insn-emit.o \
> + $(insn-emit-split-obj) \
> insn-extract.o \
> insn-latencytab.o \
> insn-modes.o \
> @@ -2365,6 +2381,9 @@ $(simple_generated_c:insn-%.c=s-%): s-%:
> build/gen%$(build_exeext)
> $(RUN_GEN) build/gen$*$(build_exeext) $(md_file) \
> $(filter insn-conditions.md,$^) > tmp-$*.c
> $(SHELL) $(srcdir)/../move-if-change tmp-$*.c insn-$*.c
> + $*v=$$(echo $$(csplit insn-$*.c /parallel\ compilation/ -k -s
> {$(INSN-GENERATED-SPLIT-NUM)} -f insn-$* -b "%d.c" 2>&1));\
> + [ ! "$$$*v" ] || grep "match not found" <<< $$$*v
> + [ -s insn-$*0.c ] || (for i in $(insn-generated-split-num); do touch
> insn-$*$$i.c; done && echo "" > insn-$*.c)
> $(STAMP) s-$*
>
> # gencheck doesn't read the machine description, and the file produced
> diff --git a/gcc/genemit.c b/gcc/genemit.c
> index 84d07d388ee..54a0d909d9d 100644
> --- a/gcc/genemit.c
> +++ b/gcc/genemit.c
> @@ -847,24 +847,13 @@ handle_overloaded_gen (overloaded_name *oname)
> }
> }
>
> -int
> -main (int argc, const char **argv)
> -{
> - progname = "genemit";
> -
> - if (!init_rtx_reader_args (argc, argv))
> - return (FATAL_EXIT_CODE);
> -
> -#define DEF_INTERNAL_OPTAB_FN(NAME, FLAGS, OPTAB, TYPE) \
> - nofail_optabs[OPTAB##_optab] = true;
> -#include "internal-fn.def"
> -
> - /* Assign sequential codes to all entries in the machine description
> - in parallel with the tables in insn-output.c. */
> -
> - printf ("/* Generated automatically by the program `genemit'\n\
> -from the machine description file `md'. */\n\n");
> +/* Print include header. */
>
> +static void
> +printf_include (void)
> +{
> + printf ("/* Generated automatically by the program `genemit'\n"
> + "from the machine description file `md'. */\n\n");
> printf ("#define IN_TARGET_CODE 1\n");
> printf ("#include \"config.h\"\n");
> printf ("#include \"system.h\"\n");
> @@ -900,35 +889,70 @@ from the machine description file `md'. */\n\n");
> printf ("#include \"tm-constrs.h\"\n");
> printf ("#include \"ggc.h\"\n");
> printf ("#include \"target.h\"\n\n");
> +}
>
> - /* Read the machine description. */
> +/* Generate the `gen_...' function from GET_CODE(). */
>
> - md_rtx_info info;
> - while (read_md_rtx (&info))
> - switch (GET_CODE (info.def))
> - {
> - case DEFINE_INSN:
> - gen_insn (&info);
> - break;
> +static void
> +gen_md_rtx (md_rtx_info *info)
> +{
> + switch (GET_CODE (info->def))
> + {
> + case DEFINE_INSN:
> + gen_insn (info);
> + break;
>
> - case DEFINE_EXPAND:
> - printf ("/* %s:%d */\n", info.loc.filename, info.loc.lineno);
> - gen_expand (&info);
> - break;
> + case DEFINE_EXPAND:
> + printf ("/* %s:%d */\n", info->loc.filename, info->loc.lineno);
> + gen_expand (info);
> + break;
>
> - case DEFINE_SPLIT:
> - printf ("/* %s:%d */\n", info.loc.filename, info.loc.lineno);
> - gen_split (&info);
> - break;
> + case DEFINE_SPLIT:
> + printf ("/* %s:%d */\n", info->loc.filename, info->loc.lineno);
> + gen_split (info);
> + break;
>
> - case DEFINE_PEEPHOLE2:
> - printf ("/* %s:%d */\n", info.loc.filename, info.loc.lineno);
> - gen_split (&info);
> - break;
> + case DEFINE_PEEPHOLE2:
> + printf ("/* %s:%d */\n", info->loc.filename, info->loc.lineno);
> + gen_split (info);
> + break;
>
> - default:
> - break;
> - }
> + default:
> + break;
> + }
> +}
> +
> +int
> +main (int argc, const char **argv)
> +{
> + progname = "genemit";
> +
> + if (!init_rtx_reader_args (argc, argv))
> + return (FATAL_EXIT_CODE);
> +
> +#define DEF_INTERNAL_OPTAB_FN(NAME, FLAGS, OPTAB, TYPE) \
> + nofail_optabs[OPTAB##_optab] = true;
> +#include "internal-fn.def"
> +
> + /* Assign sequential codes to all entries in the machine description
> + in parallel with the tabl
Re: [PATCH v4 1/3] IFN: Implement IFN_VEC_SET for ARRAY_REF with VIEW_CONVERT_EXPR
On 2020/9/25 21:28, Richard Sandiford wrote:
> xionghu luo writes:
>> @@ -2658,6 +2659,45 @@ expand_vect_cond_mask_optab_fn (internal_fn, gcall
>> *stmt, convert_optab optab)
>>
>> #define expand_vec_cond_mask_optab_fn expand_vect_cond_mask_optab_fn
>>
>> +/* Expand VEC_SET internal functions. */
>> +
>> +static void
>> +expand_vec_set_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
>> +{
>> + tree lhs = gimple_call_lhs (stmt);
>> + tree op0 = gimple_call_arg (stmt, 0);
>> + tree op1 = gimple_call_arg (stmt, 1);
>> + tree op2 = gimple_call_arg (stmt, 2);
>> + rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
>> + rtx src = expand_normal (op0);
>> +
>> + machine_mode outermode = TYPE_MODE (TREE_TYPE (op0));
>> + scalar_mode innermode = GET_MODE_INNER (outermode);
>> +
>> + rtx value = expand_expr (op1, NULL_RTX, VOIDmode, EXPAND_NORMAL);
>> + rtx pos = expand_expr (op2, NULL_RTX, VOIDmode, EXPAND_NORMAL);
>
> These two can just use expand_normal. Might be easier to read if
> they come immediately after the expand_normal (op0).
>
> LGTM with that change for the internal-fn.c stuff, thanks.
>

Thank you, updated and committed as r11-3486. Tested and confirmed that Power/X86/ARM still do not support vec_set with a register index, so there are no ICE regressions caused by generating IFN VEC_SET that is then not properly expanded.

Thanks,
Xionghu