[committed] openmp: Improve #pragma omp simd vectorization

2020-09-26 Thread Jakub Jelinek via Gcc-patches
Hi!

As mentioned earlier, the vectorizer punts on vectorization of loops with
non-constant steps.  Because the OpenMP language restrictions make it always
possible to compute the number of loop iterations before the loop, this change
helps those cases by computing that count and using an alternate IV that
iterates from 0 to < niterations with a step of 1, next to the normal IV,
which is then just a linear function of it.
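
To give a hypothetical example of the kind of loop this helps (not one of the
testcases listed below), consider a simd loop whose step is only known at run
time:

  int
  foo (int n, int step)
  {
    int r = 0;
    /* STEP is loop invariant as OpenMP requires, but not a compile time
       constant, so the vectorizer used to punt on this loop.  */
    #pragma omp simd reduction (+:r)
    for (int i = 0; i < n; i += step)
      r += i;
    return r;
  }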

List of functions where, compared to current trunk, we now vectorize some loops
that we previously didn't (for c-c++-common tests only the C function names are
listed, though both C and C++ are affected):

gcc/testsuite/gcc.dg/vect/vect-simd-17.c doit
gcc/testsuite/gcc.dg/vect/vect-simd-18.c foo
gcc/testsuite/gcc.dg/vect/vect-simd-19.c foo
gcc/testsuite/gcc.dg/vect/vect-simd-20.c foo
libgomp/testsuite/libgomp.c-c++-common/for-2.c f3_f_simd_auto
libgomp/testsuite/libgomp.c-c++-common/for-2.c f3_f_simd_guided32
libgomp/testsuite/libgomp.c-c++-common/for-2.c f3_f_simd_runtime
libgomp/testsuite/libgomp.c-c++-common/for-2.c f3_f_simd_static
libgomp/testsuite/libgomp.c-c++-common/for-2.c f3_f_simd_static32
libgomp/testsuite/libgomp.c-c++-common/for-2.c f3_pf_simd_auto._omp_fn.0
libgomp/testsuite/libgomp.c-c++-common/for-2.c f3_pf_simd_guided32._omp_fn.0
libgomp/testsuite/libgomp.c-c++-common/for-2.c f3_pf_simd_runtime._omp_fn.0
libgomp/testsuite/libgomp.c-c++-common/for-2.c f3_pf_simd_static32._omp_fn.0
libgomp/testsuite/libgomp.c-c++-common/for-2.c f3_pf_simd_static._omp_fn.0
libgomp/testsuite/libgomp.c-c++-common/for-2.c f3_simd_normal
libgomp/testsuite/libgomp.c-c++-common/for-2.c f5_simd_normal
libgomp/testsuite/libgomp.c-c++-common/for-2.c f6_simd_normal
libgomp/testsuite/libgomp.c-c++-common/for-3.c f3_dpfs_auto._omp_fn.0
libgomp/testsuite/libgomp.c-c++-common/for-3.c f3_dpfs_ds128_auto._omp_fn.0
libgomp/testsuite/libgomp.c-c++-common/for-3.c f3_dpfs_ds128_guided32._omp_fn.0
libgomp/testsuite/libgomp.c-c++-common/for-3.c f3_dpfs_ds128_runtime._omp_fn.0
libgomp/testsuite/libgomp.c-c++-common/for-3.c f3_dpfs_ds128_static32._omp_fn.0
libgomp/testsuite/libgomp.c-c++-common/for-3.c f3_dpfs_ds128_static._omp_fn.0
libgomp/testsuite/libgomp.c-c++-common/for-3.c f3_dpfs_guided32._omp_fn.0
libgomp/testsuite/libgomp.c-c++-common/for-3.c f3_dpfs_runtime._omp_fn.0
libgomp/testsuite/libgomp.c-c++-common/for-3.c f3_dpfs_static32._omp_fn.0
libgomp/testsuite/libgomp.c-c++-common/for-3.c f3_dpfs_static._omp_fn.0
libgomp/testsuite/libgomp.c-c++-common/for-3.c f3_ds_ds128_normal
libgomp/testsuite/libgomp.c-c++-common/for-3.c f3_ds_normal
libgomp/testsuite/libgomp.c-c++-common/for-4.c f3_taskloop_simd_normal._omp_fn.0
libgomp/testsuite/libgomp.c-c++-common/for-5.c f3_tpf_simd_auto._omp_fn.1
libgomp/testsuite/libgomp.c-c++-common/for-5.c f3_tpf_simd_guided32._omp_fn.1
libgomp/testsuite/libgomp.c-c++-common/for-5.c f3_tpf_simd_runtime._omp_fn.1
libgomp/testsuite/libgomp.c-c++-common/for-5.c f3_tpf_simd_static32._omp_fn.1
libgomp/testsuite/libgomp.c-c++-common/for-5.c f3_tpf_simd_static._omp_fn.1
libgomp/testsuite/libgomp.c-c++-common/for-5.c f3_t_simd_normal._omp_fn.0
libgomp/testsuite/libgomp.c-c++-common/for-5.c f3_ttdpfs_auto._omp_fn.1
libgomp/testsuite/libgomp.c-c++-common/for-5.c f3_ttdpfs_ds128_auto._omp_fn.1
libgomp/testsuite/libgomp.c-c++-common/for-5.c f3_ttdpfs_ds128_guided32._omp_fn.1
libgomp/testsuite/libgomp.c-c++-common/for-5.c f3_ttdpfs_ds128_runtime._omp_fn.1
libgomp/testsuite/libgomp.c-c++-common/for-5.c f3_ttdpfs_ds128_static32._omp_fn.1
libgomp/testsuite/libgomp.c-c++-common/for-5.c f3_ttdpfs_ds128_static._omp_fn.1
libgomp/testsuite/libgomp.c-c++-common/for-5.c f3_ttdpfs_guided32._omp_fn.1
libgomp/testsuite/libgomp.c-c++-common/for-5.c f3_ttdpfs_runtime._omp_fn.1
libgomp/testsuite/libgomp.c-c++-common/for-5.c f3_ttdpfs_static32._omp_fn.1
libgomp/testsuite/libgomp.c-c++-common/for-5.c f3_ttdpfs_static._omp_fn.1
libgomp/testsuite/libgomp.c-c++-common/for-5.c f3_ttds_ds128_normal._omp_fn.0
libgomp/testsuite/libgomp.c-c++-common/for-5.c f3_ttds_normal._omp_fn.0
libgomp/testsuite/libgomp.c-c++-common/for-5.c f5_t_simd_normal._omp_fn.0
libgomp/testsuite/libgomp.c-c++-common/for-5.c f6_t_simd_normal._omp_fn.0
libgomp/testsuite/libgomp.c-c++-common/for-6.c f3_tdpfs_auto._omp_fn.1
libgomp/testsuite/libgomp.c-c++-common/for-6.c f3_tdpfs_ds128_auto._omp_fn.1
libgomp/testsuite/libgomp.c-c++-common/for-6.c f3_tdpfs_ds128_guided32._omp_fn.1
libgomp/testsuite/libgomp.c-c++-common/for-6.c f3_tdpfs_ds128_runtime._omp_fn.1
libgomp/testsuite/libgomp.c-c++-common/for-6.c f3_tdpfs_ds128_static32._omp_fn.1
libgomp/testsuite/libgomp.c-c++-common/for-6.c f3_tdpfs_ds128_static._omp_fn.1
libgomp/testsuite/libgomp.c-c++-common/for-6.c f3_tdpfs_guided32._omp_fn.1
libgomp/testsuite/libgomp.c-c++-common/for-6.c f3_tdpfs_runtime._omp_fn.1
libgomp/testsuite/libgomp.c-c++-common/for-6.c f3_tdpfs_static32._omp_fn.1
libgomp/testsuite/libgomp.c-c++-common/for-6.c f3_tdpfs_static._omp_fn.1
libgomp/testsuite/libgomp.c-c++-common/for-6.c f3_tds_ds128_n

Implement iterative dataflow in modref to track parameters

2020-09-26 Thread Jan Hubicka
Hi,
this patch finishes the parameter tracking by implementing the iterative
dataflow in the propagation stage. This is necessary since we can now
propagate how pointers are passed around recursive calls (as done in
a testcase) and thus it is no longer safe to simply merge all summaries
of one SCC component of the call graph.
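
A hypothetical example (not the actual testcase) of the pattern that requires
the iteration: a self-recursive function that swaps its pointer parameters, so
the set of parameters a store can reach is only settled once the per-parameter
summaries stop changing, not by merging everything in the SCC at once:

  void
  recurse (int *a, int *b, int n)
  {
    if (!n)
      return;
    *a = n;                 /* store through the first parameter */
    recurse (b, a, n - 1);  /* parameters swapped around the recursion */
  }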

cc1plus stats are now:

Alias oracle query stats:
  refs_may_alias_p: 62971744 disambiguations, 73160711 queries
  ref_maybe_used_by_call_p: 141176 disambiguations, 63867883 queries
  call_may_clobber_ref_p: 23573 disambiguations, 29322 queries
  nonoverlapping_component_refs_p: 0 disambiguations, 37720 queries
  nonoverlapping_refs_since_match_p: 19432 disambiguations, 55659 must overlaps, 75860 queries
  aliasing_component_refs_p: 54724 disambiguations, 753570 queries
  TBAA oracle: 24124230 disambiguations 56228428 queries
   16058141 are in alias set 0
   10338303 queries asked about the same object
   125 queries asked about the same alias set
   0 access volatile
   3919230 are dependent in the DAG
   1788399 are aritificially in conflict with void *

Modref stats:
  modref use: 10408 disambiguations, 46993 queries
  modref clobber: 1418549 disambiguations, 1951251 queries
  4898707 tbaa queries (2.510547 per modref query)
  396878 base compares (0.203397 per modref query)

PTA query stats:
  pt_solution_includes: 975364 disambiguations, 13604284 queries
  pt_solutions_intersect: 1026606 disambiguations, 13181198 queries

So compared to
https://gcc.gnu.org/pipermail/gcc-patches/2020-September/554692.html we get 25%
more use disambiguations and 91% more clobber disambiguations.

The tramp3d stats are:

Alias oracle query stats:
  refs_may_alias_p: 2056905 disambiguations, 2317461 queries
  ref_maybe_used_by_call_p: 7137 disambiguations, 2093762 queries
  call_may_clobber_ref_p: 234 disambiguations, 234 queries
  nonoverlapping_component_refs_p: 0 disambiguations, 4313 queries
  nonoverlapping_refs_since_match_p: 329 disambiguations, 10200 must overlaps, 10616 queries
  aliasing_component_refs_p: 858 disambiguations, 34600 queries
  TBAA oracle: 894996 disambiguations 1695991 queries
   138346 are in alias set 0
   470668 queries asked about the same object
   0 queries asked about the same alias set
   0 access volatile
   191666 are dependent in the DAG
   315 are aritificially in conflict with void *

Modref stats:
  modref use: 842 disambiguations, 2265 queries
  modref clobber: 14833 disambiguations, 28900 queries
  34884 tbaa queries (1.207059 per modref query)
  5041 base compares (0.174429 per modref query)

PTA query stats:
  pt_solution_includes: 313372 disambiguations, 525724 queries
  pt_solutions_intersect: 130374 disambiguations, 415138 queries

So about twice as many use disambiguations and 40% more clobber disambiguations.

Bootstrapped/regtested x86_64-linux, I plan to commit it later today after
more testing.

2020-09-26  Jan Hubicka  

* ipa-inline-transform.c: Include ipa-modref-tree.h and ipa-modref.h.
(inline_call): Call ipa_merge_modref_summary_after_inlining.
* ipa-inline.c (ipa_inline): Do not free summaries.
* ipa-modref.c (dump_records): Fix formatting.
(merge_call_side_effects): Break out from ...
(analyze_call): ... here; record recursive calls.
(analyze_stmt): Add new parameter RECURSIVE_CALLS.
(analyze_function): Do iterative dataflow on recursive calls.
(compute_parm_map): New function.
(ipa_merge_modref_summary_after_inlining): New function.
(collapse_loads): New function.
(modref_propagate_in_scc): Break out from ...
(pass_ipa_modref::execute): ... here; Do iterative dataflow.
* ipa-modref.h (ipa_merge_modref_summary_after_inlining): Declare.

gcc/testsuite/ChangeLog:

2020-09-26  Jan Hubicka  

* gcc.dg/lto/modref-1_0.c: New test.
* gcc.dg/lto/modref-1_1.c: New test.
* gcc.dg/tree-ssa/modref-2.c: New test.

diff --git a/gcc/ipa-inline-transform.c b/gcc/ipa-inline-transform.c
index 5e37e612bfd..af2c2856aaa 100644
--- a/gcc/ipa-inline-transform.c
+++ b/gcc/ipa-inline-transform.c
@@ -48,6 +48,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "cfg.h"
 #include "basic-block.h"
 #include "ipa-utils.h"
+#include "ipa-modref-tree.h"
+#include "ipa-modref.h"
 
 int ncalls_inlined;
 int nfunctions_inlined;
@@ -487,6 +489,7 @@ inline_call (struct cgraph_edge *e, bool update_original,
   gcc_assert (curr->callee->inlined_to == to);
 
   old_size = ipa_size_summaries->get (to)->size;
+  ipa_merge_modref_summary_after_inlining (e);
   ipa_merge_fn_summary_after_inlining (e);
   if (e->in_polymorphic_cdtor)
 mark_all_inlined_calls_cdtor (e->callee);
diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
index c667de2a97c..225a0140725 100644
--- a/gcc/ipa-inline.c
+++ b/gcc/ipa-inline.c
@@ -2770,9 +2770,6 @@ ip

Re: Implement iterative dataflow in modref to track parameters

2020-09-26 Thread Richard Biener
On September 26, 2020 10:20:20 AM GMT+02:00, Jan Hubicka  wrote:

Any compile time figures for this?  Firefox?


Re: Fix handling of gimple_clobber in ipa_modref

2020-09-26 Thread Jan Hubicka
> On September 26, 2020 12:04:24 AM GMT+02:00, Jan Hubicka wrote:
> >Hi,
> >while adding a check for gimple_clobber I reversed the return value,
> >so instead of ignoring the statement ipa-modref gives up.  Fixed thus.
> >This explains the drop between the originally reported disambiguation
> >stats and the ones I got later.
> 
> I don't think you can ignore clobbers. They are barriers for code motion.

modref is (before and after the patch) about 1.4% of the WPA time (2s).  This
is pretty much the cost of a single pass over the symbol table (other
non-busy IPA passes take about the same; ipa-comdat is faster at 0.7%).

The iteration of the dataflow happens only on non-trivial strongly connected
components and, at least for GCC, always terminates in 3 iterations (to
trigger more iterations one needs a function with many parameters and an
operation like shifting every parameter right).
> 
> Richard. 
> 
> 
> >Bootstrapped/regtested x86_64-linux.
> >
> >gcc/ChangeLog:
> >
> >2020-09-25  Jan Hubicka  
> >
> > * ipa-modref.c (analyze_stmt): Fix return value for gimple_clobber.
> >
> >diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
> >index aa6929ff010..44b844b90db 100644
> >--- a/gcc/ipa-modref.c
> >+++ b/gcc/ipa-modref.c
> >@@ -658,7 +658,7 @@ analyze_stmt (modref_summary *summary, gimple
> >*stmt, bool ipa)
> > {
> >   /* There is no need to record clobbers.  */
> >   if (gimple_clobber_p (stmt))
> >-return false;
> >+return true;
> >   /* Analyze all loads and stores in STMT.  */
> >   walk_stmt_load_store_ops (stmt, summary,
> > analyze_load, analyze_store);
> 


Re: [PATCH] optabs: Don't reuse target for multi-word expansions if it overlaps operand(s) [PR97073]

2020-09-26 Thread Eric Botcazou
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk and
> release branches?
> 
> 2020-09-18  Jakub Jelinek  
> 
>   PR middle-end/97073
>   * optabs.c (expand_binop, expand_absneg_bit, expand_unop,
>   expand_copysign_bit): Check reg_overlap_mentioned_p between target
>   and operand(s) and if it returns true, force a pseudo as target.
> 
>   * gcc.c-torture/execute/pr97073.c: New test.

This looks good to me.

-- 
Eric Botcazou




Re: dg-options after board/cflags

2020-09-26 Thread Maciej W. Rozycki
On Wed, 2 Sep 2020, Jose E. Marchesi via Gcc-patches wrote:

> Your patch dealt with board/multilib_flags, but the same problem exists
> for board/cflags and many other flag-containing options.

 What's the use case for that?  IIUC board flags are supposed to be ones 
that are absolutely required for executables to run with a given board, 
such as multilib selection, special linker scripts, non-standard run-time 
library paths, etc.  These must not be overridden by test cases or they 
will inevitably fail.

  Maciej


[PATCH] AIX: collect2 visibility

2020-09-26 Thread David Edelsohn via Gcc-patches
The code that collect2 generates, compiles and links into applications
and shared libraries to initialize constructors and register DWARF tables
is built with the compiler options used to invoke the linker.  If the
compiler options change the visibility from default, the library
initialization routines will not be visible and this can prevent
initialization.

This patch checks whether the command line sets the visibility and, if
necessary, adds GCC pragmas to the initialization code generated by collect2
to define the visibility of the global, exported functions as default.
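
A rough, purely illustrative sketch of the effect on the generated glue code
(the routine name below is made up; the real code is emitted by
write_c_file_stat):

  #pragma GCC visibility push(default)

  /* collect2-generated constructor/DWARF registration glue; it must stay
     visible even when the link was driven with e.g. -fvisibility=hidden.  */
  void
  __do_global_ctors (void)
  {
    /* ... run static constructors, register EH frame tables ... */
  }

  #pragma GCC visibility pop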

Bootstrapped on powerpc-ibm-aix7.2.0.0

Thanks, David

gcc/ChangeLog:

2020-09-26  David Edelsohn  
Clement Chigot  

* collect2.c (visibility_flag): New.
(main): Detect -fvisibility.
(write_c_file_stat): Push and pop default visibility.


0001-aix-collect2-visibility.patch
Description: Binary data


[Patch][nvptx] return true in libc_has_function for function_sincos

2020-09-26 Thread Tobias Burnus

Found when looking at PR97203 (but having no effect there).

The GCC middle end optimizes with -O1 (or higher) the
  a = sinf(x)
  b = cosf(x)
to __builtin_cexpi(x, &a, &b)
(the ...i is as in internal; like cexp(z) but with __real__ z == 0).


In expand_builtin_cexpi, that is handled as:
  if (optab_handler (sincos_optab, mode) != CODE_FOR_nothing)
    ...
  else if (targetm.libc_has_function (function_sincos))
    ...
  else
    fn = builtin_decl_explicit (BUILT_IN_CEXPF);

And the latter is done. As newlib's cexpf does not know that
__real__ z == 0, it calculates 'r = expf (__real__ z)' before
invoking sinf and cosf on __imag__ z.

Thus, it is much faster to call 'sincosf', which also exists
in newlib.
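
As a purely illustrative example (not taken from PR97203), the pattern that
benefits is a sinf/cosf pair on the same argument:

  #include <math.h>

  void
  polar_to_cart (float phi, float *x, float *y)
  {
    /* The middle end folds this pair into __builtin_cexpi; once
       function_sincos is reported as available it can expand to a single
       sincosf (phi, y, x) call instead of going through cexpf.  */
    *x = cosf (phi);
    *y = sinf (phi);
  }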

Solution: Return true for targetm.libc_has_function (function_sincos).


NOTE: With -funsafe-math-optimizations (-O0 or higher),
sinf/cosf and sincosf emit the .sin.approx/.cos.approx instructions instead of
doing a library call.

OK?

Tobias

[nvptx] return true in libc_has_function for function_sincos

gcc/ChangeLog:

	* config/nvptx/nvptx.c (nvptx_libc_has_function): New.
	(TARGET_LIBC_HAS_FUNCTION): Redefine to new func.

diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 54b1fdf669b..d4b0de30ff1 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -6536,6 +6536,21 @@ nvptx_set_current_function (tree fndecl)
   oacc_bcast_partition = 0;
 }
 
+/* By default we assume that c99 functions are present at the runtime,
+   including sincos which is excluded in default_libc_has_function.  */
+bool
+nvptx_libc_has_function (enum function_class fn_class)
+{
+  if (fn_class == function_c94
+      || fn_class == function_c99_misc
+      || fn_class == function_c99_math_complex
+      || fn_class == function_sincos)
+    return true;
+
+  return false;
+}
+
+
 #undef TARGET_OPTION_OVERRIDE
 #define TARGET_OPTION_OVERRIDE nvptx_option_override
 
@@ -6681,6 +6696,9 @@ nvptx_set_current_function (tree fndecl)
 #undef TARGET_SET_CURRENT_FUNCTION
 #define TARGET_SET_CURRENT_FUNCTION nvptx_set_current_function
 
+#undef TARGET_LIBC_HAS_FUNCTION
+#define TARGET_LIBC_HAS_FUNCTION nvptx_libc_has_function
+
 struct gcc_target targetm = TARGET_INITIALIZER;
 
 #include "gt-nvptx.h"


[committed] libstdc++: Use __libc_single_threaded to optimise atomics [PR 96817]

2020-09-26 Thread Jonathan Wakely via Gcc-patches
Glibc 2.32 adds a global variable that says whether the process is
single-threaded. We can use this to decide whether to elide atomic
operations, as a more precise and reliable indicator than
__gthread_active_p.

This means that guard variables for statics and reference counting in
shared_ptr can use less expensive, non-atomic ops even in processes that
are linked to libpthread, as long as no threads have been created yet.
It also means that we switch to using atomics if libpthread gets loaded
later via dlopen (this still isn't supported in general, for other
reasons).

We can't use __libc_single_threaded to replace __gthread_active_p
everywhere. If we replaced the uses of __gthread_active_p in std::mutex
then we would elide the pthread_mutex_lock in the code below, but not
the pthread_mutex_unlock:

  std::mutex m;
  m.lock();// pthread_mutex_lock
  std::thread t([]{}); // __libc_single_threaded = false
  t.join();
  m.unlock();  // pthread_mutex_unlock

We need the lock and unlock to use the same "is threading enabled"
predicate, and similarly for init/destroy pairs for mutexes and
condition variables, so that we don't try to release resources that were
never acquired.

There are other places that could use __libc_single_threaded, such as
_Sp_locker in src/c++11/shared_ptr.cc and locale init functions, but
they can be changed later.
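
A minimal, self-contained sketch of the dispatch pattern this enables (the
helper name and body here are illustrative, not the atomicity.h sources; only
__libc_single_threaded and <sys/single_threaded.h> come from glibc 2.32):

  #if __has_include(<sys/single_threaded.h>)
  # include <sys/single_threaded.h>
  #endif

  inline int
  exchange_and_add_dispatch (int *mem, int val)
  {
  #if __has_include(<sys/single_threaded.h>)
    if (__libc_single_threaded)
      {
        int old = *mem;   /* plain load/store, no atomic RMW needed */
        *mem += val;
        return old;
      }
  #endif
    return __atomic_fetch_add (mem, val, __ATOMIC_ACQ_REL);
  }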

libstdc++-v3/ChangeLog:

PR libstdc++/96817
* include/ext/atomicity.h (__gnu_cxx::__is_single_threaded()):
New function wrapping __libc_single_threaded if available.
(__exchange_and_add_dispatch, __atomic_add_dispatch): Use it.
* libsupc++/guard.cc (__cxa_guard_acquire, __cxa_guard_abort)
(__cxa_guard_release): Likewise.
* testsuite/18_support/96817.cc: New test.

Tested powerpc64le-linux, with glibc 2.31 and 2.32. Committed to trunk.

commit e6923541fae5081b646f240d54de2a32e17a0382
Author: Jonathan Wakely 
Date:   Sat Sep 26 20:32:36 2020

libstdc++: Use __libc_single_threaded to optimise atomics [PR 96817]


diff --git a/libstdc++-v3/include/ext/atomicity.h 
b/libstdc++-v3/include/ext/atomicity.h
index 813ceb0bbf8..2d3e5fb0904 100644
--- a/libstdc++-v3/include/ext/atomicity.h
+++ b/libstdc++-v3/include/ext/atomicity.h
@@ -34,11 +34,27 @@
 #include 
 #include 
 #include 
+#if __has_include(<sys/single_threaded.h>)
+# include <sys/single_threaded.h>
+#endif
 
 namespace __gnu_cxx _GLIBCXX_VISIBILITY(default)
 {
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
+  __attribute__((__always_inline__))
+  inline bool
+  __is_single_threaded() _GLIBCXX_NOTHROW
+  {
+#ifndef __GTHREADS
+    return true;
+#elif __has_include(<sys/single_threaded.h>)
+    return ::__libc_single_threaded;
+#else
+    return !__gthread_active_p();
+#endif
+  }
+
   // Functions for portable atomic access.
   // To abstract locking primitives across all thread policies, use:
   // __exchange_and_add_dispatch
@@ -79,25 +95,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   __attribute__ ((__always_inline__))
   __exchange_and_add_dispatch(_Atomic_word* __mem, int __val)
   {
-#ifdef __G

Re: Fix handling of gimple_clobber in ipa_modref

2020-09-26 Thread Jan Hubicka
> On September 26, 2020 12:04:24 AM GMT+02:00, Jan Hubicka wrote:
> >Hi,
> >while adding a check for gimple_clobber I reversed the return value,
> >so instead of ignoring the statement ipa-modref gives up.  Fixed thus.
> >This explains the drop between the originally reported disambiguation
> >stats and the ones I got later.
> 
> I don't think you can ignore clobbers. They are barriers for code motion. 

Hi,
this is the fix I have installed after lto-bootstrapping/regtesting.
The statistics for cc1plus are almost unchanged, which is sort of expected
given that I only measure the late optimizations by getting the dump from LTO.

Thanks for pointing this out, it could have triggered a nasty wrong-code
bug :)

Honza

Alias oracle query stats:
  refs_may_alias_p: 63013346 disambiguations, 73204989 queries
  ref_maybe_used_by_call_p: 141350 disambiguations, 63909728 queries
  call_may_clobber_ref_p: 23597 disambiguations, 29430 queries
  nonoverlapping_component_refs_p: 0 disambiguations, 37763 queries
  nonoverlapping_refs_since_match_p: 19444 disambiguations, 55671 must overlaps, 75884 queries
  aliasing_component_refs_p: 54749 disambiguations, 753947 queries
  TBAA oracle: 24159888 disambiguations 56277876 queries
   16064485 are in alias set 0
   10340953 queries asked about the same object
   125 queries asked about the same alias set
   0 access volatile
   3920604 are dependent in the DAG
   1791821 are aritificially in conflict with void *

Modref stats:
  modref use: 10444 disambiguations, 46994 queries
  modref clobber: 1421468 disambiguations, 1954304 queries
  4907798 tbaa queries (2.511277 per modref query)
  396785 base compares (0.203031 per modref query)

PTA query stats:
  pt_solution_includes: 976073 disambiguations, 13607833 queries
  pt_solutions_intersect: 1026016 disambiguations, 13185678 queries

* ipa-modref.c (analyze_stmt): Do not skip clobbers in early pass.
* ipa-pure-const.c (analyze_stmt): Update comment.

diff --git a/gcc/ipa-modref.c b/gcc/ipa-modref.c
index 73a7900883a..728c6c1523d 100644
--- a/gcc/ipa-modref.c
+++ b/gcc/ipa-modref.c
@@ -676,13 +676,16 @@ static bool
 analyze_stmt (modref_summary *summary, gimple *stmt, bool ipa,
  vec <gimple *> *recursive_calls)
 {
-  /* There is no need to record clobbers.  */
-  if (gimple_clobber_p (stmt))
+  /* In general we can not ignore clobbers because they are barriers for code
+     motion, however after inlining it is safe to do because local optimization
+     passes do not consider clobbers from other functions.
+     Similar logic is in ipa-pure-const.  */
+  if ((ipa || cfun->after_inlining) && gimple_clobber_p (stmt))
 return true;
+
   /* Analyze all loads and stores in STMT.  */
   walk_stmt_load_store_ops (stmt, summary,
analyze_load, analyze_store);
-  /* or call analyze_load_ipa, analyze_store_ipa */
 
   switch (gimple_code (stmt))
{
@@ -705,7 +708,7 @@ analyze_stmt (modref_summary *summary, gimple *stmt, bool ipa,
}
 }
 
-/* Analyze function F.  IPA indicates whether we're running in tree mode (false)
+/* Analyze function F.  IPA indicates whether we're running in local mode (false)
    or the IPA mode (true).  */
 
 static void
diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c
index bdbccd010dc..1af3206056e 100644
--- a/gcc/ipa-pure-const.c
+++ b/gcc/ipa-pure-const.c
@@ -742,6 +742,8 @@ check_stmt (gimple_stmt_iterator *gsip, funct_state local, bool ipa)
   /* Do consider clobber as side effects before IPA, so we rather inline
  C++ destructors and keep clobber semantics than eliminate them.
 
+ Similar logic is in ipa-modref.
+
  TODO: We may get smarter during early optimizations on these and let
  functions containing only clobbers to be optimized more.  This is a common
  case of C++ destructors.  */


Re: [PATCH v7] genemit.c (main): split insn-emit.c for compiling parallelly

2020-09-26 Thread Jojo R
Hi,

Has this patch been merged?

Jojo
On 15 Sep 2020 at 5:16 PM +0800, Jojo R wrote:
> gcc/ChangeLog:
>
> * genemit.c (main): Print 'split line'.
> * Makefile.in (insn-emit.c): Define split count and file
>
> ---
> gcc/Makefile.in | 19 +
> gcc/genemit.c | 104 +---
> 2 files changed, 83 insertions(+), 40 deletions(-)
>
> diff --git a/gcc/Makefile.in b/gcc/Makefile.in
> index 79e854aa938..a7fcc7d5949 100644
> --- a/gcc/Makefile.in
> +++ b/gcc/Makefile.in
> @@ -1258,6 +1258,21 @@ ANALYZER_OBJS = \
> # We put the *-match.o and insn-*.o files first so that a parallel make
> # will build them sooner, because they are large and otherwise tend to be
> # the last objects to finish building.
> +
> +# target overrides
> +-include $(tmake_file)
> +
> +INSN-GENERATED-SPLIT-NUM ?= 0
> +
> +insn-generated-split-num = $(shell i=1; j=`expr $(INSN-GENERATED-SPLIT-NUM) 
> + 1`; \
> + while test $$i -le $$j; do \
> + echo $$i; i=`expr $$i + 1`; \
> + done)
> +
> +insn-emit-split-c := $(foreach o, $(shell for i in 
> $(insn-generated-split-num); do echo $$i; done), insn-emit$(o).c)
> +insn-emit-split-obj = $(patsubst %.c,%.o, $(insn-emit-split-c))
> +$(insn-emit-split-c): insn-emit.c
> +
> OBJS = \
> gimple-match.o \
> generic-match.o \
> @@ -1265,6 +1280,7 @@ OBJS = \
> insn-automata.o \
> insn-dfatab.o \
> insn-emit.o \
> + $(insn-emit-split-obj) \
> insn-extract.o \
> insn-latencytab.o \
> insn-modes.o \
> @@ -2365,6 +2381,9 @@ $(simple_generated_c:insn-%.c=s-%): s-%: 
> build/gen%$(build_exeext)
> $(RUN_GEN) build/gen$*$(build_exeext) $(md_file) \
> $(filter insn-conditions.md,$^) > tmp-$*.c
> $(SHELL) $(srcdir)/../move-if-change tmp-$*.c insn-$*.c
> + $*v=$$(echo $$(csplit insn-$*.c /parallel\ compilation/ -k -s 
> {$(INSN-GENERATED-SPLIT-NUM)} -f insn-$* -b "%d.c" 2>&1));\
> + [ ! "$$$*v" ] || grep "match not found" <<< $$$*v
> + [ -s insn-$*0.c ] || (for i in $(insn-generated-split-num); do touch 
> insn-$*$$i.c; done && echo "" > insn-$*.c)
> $(STAMP) s-$*
>
> # gencheck doesn't read the machine description, and the file produced
> diff --git a/gcc/genemit.c b/gcc/genemit.c
> index 84d07d388ee..54a0d909d9d 100644
> --- a/gcc/genemit.c
> +++ b/gcc/genemit.c
> @@ -847,24 +847,13 @@ handle_overloaded_gen (overloaded_name *oname)
> }
> }
>
> -int
> -main (int argc, const char **argv)
> -{
> - progname = "genemit";
> -
> - if (!init_rtx_reader_args (argc, argv))
> - return (FATAL_EXIT_CODE);
> -
> -#define DEF_INTERNAL_OPTAB_FN(NAME, FLAGS, OPTAB, TYPE) \
> - nofail_optabs[OPTAB##_optab] = true;
> -#include "internal-fn.def"
> -
> - /* Assign sequential codes to all entries in the machine description
> - in parallel with the tables in insn-output.c. */
> -
> - printf ("/* Generated automatically by the program `genemit'\n\
> -from the machine description file `md'. */\n\n");
> +/* Print include header. */
>
> +static void
> +printf_include (void)
> +{
> + printf ("/* Generated automatically by the program `genemit'\n"
> + "from the machine description file `md'. */\n\n");
> printf ("#define IN_TARGET_CODE 1\n");
> printf ("#include \"config.h\"\n");
> printf ("#include \"system.h\"\n");
> @@ -900,35 +889,70 @@ from the machine description file `md'. */\n\n");
> printf ("#include \"tm-constrs.h\"\n");
> printf ("#include \"ggc.h\"\n");
> printf ("#include \"target.h\"\n\n");
> +}
>
> - /* Read the machine description. */
> +/* Generate the `gen_...' function from GET_CODE(). */
>
> - md_rtx_info info;
> - while (read_md_rtx (&info))
> - switch (GET_CODE (info.def))
> - {
> - case DEFINE_INSN:
> - gen_insn (&info);
> - break;
> +static void
> +gen_md_rtx (md_rtx_info *info)
> +{
> + switch (GET_CODE (info->def))
> + {
> + case DEFINE_INSN:
> + gen_insn (info);
> + break;
>
> - case DEFINE_EXPAND:
> - printf ("/* %s:%d */\n", info.loc.filename, info.loc.lineno);
> - gen_expand (&info);
> - break;
> + case DEFINE_EXPAND:
> + printf ("/* %s:%d */\n", info->loc.filename, info->loc.lineno);
> + gen_expand (info);
> + break;
>
> - case DEFINE_SPLIT:
> - printf ("/* %s:%d */\n", info.loc.filename, info.loc.lineno);
> - gen_split (&info);
> - break;
> + case DEFINE_SPLIT:
> + printf ("/* %s:%d */\n", info->loc.filename, info->loc.lineno);
> + gen_split (info);
> + break;
>
> - case DEFINE_PEEPHOLE2:
> - printf ("/* %s:%d */\n", info.loc.filename, info.loc.lineno);
> - gen_split (&info);
> - break;
> + case DEFINE_PEEPHOLE2:
> + printf ("/* %s:%d */\n", info->loc.filename, info->loc.lineno);
> + gen_split (info);
> + break;
>
> - default:
> - break;
> - }
> + default:
> + break;
> + }
> +}
> +
> +int
> +main (int argc, const char **argv)
> +{
> + progname = "genemit";
> +
> + if (!init_rtx_reader_args (argc, argv))
> + return (FATAL_EXIT_CODE);
> +
> +#define DEF_INTERNAL_OPTAB_FN(NAME, FLAGS, OPTAB, TYPE) \
> + nofail_optabs[OPTAB##_optab] = true;
> +#include "internal-fn.def"
> +
> + /* Assign sequential codes to all entries in the machine description
> + in parallel with the tabl

Re: [PATCH v4 1/3] IFN: Implement IFN_VEC_SET for ARRAY_REF with VIEW_CONVERT_EXPR

2020-09-26 Thread xionghu luo via Gcc-patches



On 2020/9/25 21:28, Richard Sandiford wrote:
> xionghu luo  writes:
>> @@ -2658,6 +2659,45 @@ expand_vect_cond_mask_optab_fn (internal_fn, gcall 
>> *stmt, convert_optab optab)
>>   
>>   #define expand_vec_cond_mask_optab_fn expand_vect_cond_mask_optab_fn
>>   
>> +/* Expand VEC_SET internal functions.  */
>> +
>> +static void
>> +expand_vec_set_optab_fn (internal_fn, gcall *stmt, convert_optab optab)
>> +{
>> +  tree lhs = gimple_call_lhs (stmt);
>> +  tree op0 = gimple_call_arg (stmt, 0);
>> +  tree op1 = gimple_call_arg (stmt, 1);
>> +  tree op2 = gimple_call_arg (stmt, 2);
>> +  rtx target = expand_expr (lhs, NULL_RTX, VOIDmode, EXPAND_WRITE);
>> +  rtx src = expand_normal (op0);
>> +
>> +  machine_mode outermode = TYPE_MODE (TREE_TYPE (op0));
>> +  scalar_mode innermode = GET_MODE_INNER (outermode);
>> +
>> +  rtx value = expand_expr (op1, NULL_RTX, VOIDmode, EXPAND_NORMAL);
>> +  rtx pos = expand_expr (op2, NULL_RTX, VOIDmode, EXPAND_NORMAL);
> 
> These two can just use expand_normal.  Might be easier to read if
> they come immediately after the expand_normal (op0).
> 
> LGTM with that change for the internal-fn.c stuff, thanks.
> 

Thank you, updated and committed as r11-3486.  Tested and confirmed that
Power/X86/ARM still don't support vec_set with a register index, so there are
no ICE regressions caused by generating IFN VEC_SET without expanding it
properly.


Thanks,
Xionghu