[PATCH] Fix PR92704

2019-11-29 Thread Richard Biener


Bootstrapped / tested on x86_64-unknown-linux-gnu, applied.

Richard.

2019-11-29  Richard Biener  

PR tree-optimization/92704
* tree-if-conv.c (combine_blocks): Deal with virtual PHIs
in loops performing only loads.

* gcc.dg/torture/pr92704.c: New testcase.

Index: gcc/tree-if-conv.c
===
--- gcc/tree-if-conv.c  (revision 278807)
+++ gcc/tree-if-conv.c  (working copy)
@@ -2624,6 +2624,11 @@ combine_blocks (class loop *loop)
   vphi = get_virtual_phi (bb);
   if (vphi)
{
+ /* When there's just loads inside the loop a stray virtual
+PHI merging the uses can appear, update last_vdef from
+it.  */
+ if (!last_vdef)
+   last_vdef = gimple_phi_arg_def (vphi, 0);
  imm_use_iterator iter;
  use_operand_p use_p;
  gimple *use_stmt;
@@ -2655,6 +2660,10 @@ combine_blocks (class loop *loop)
  if (gimple_vdef (stmt))
last_vdef = gimple_vdef (stmt);
}
+ else
+   /* If this is the first load we arrive at update last_vdef
+  so we handle stray PHIs correctly.  */
+   last_vdef = gimple_vuse (stmt);
  if (predicated[i])
{
  ssa_op_iter i;
Index: gcc/testsuite/gcc.dg/torture/pr92704.c
===
--- gcc/testsuite/gcc.dg/torture/pr92704.c  (nonexistent)
+++ gcc/testsuite/gcc.dg/torture/pr92704.c  (working copy)
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-fexceptions -fnon-call-exceptions -fno-tree-dce -ftree-loop-if-convert" } */
+int zr, yx;
+
+void __attribute__ ((simd))
+oj (int rd, int q7)
+{
+  int wo = (__UINTPTR_TYPE__)&rd;
+
+  while (q7 < 1)
+{
+  int kv;
+  short int v3;
+
+  for (v3 = 0; v3 < 82; v3 += 3)
+{
+}
+
+  kv = zr ? 0 : v3;
+  yx = kv < rd;
+  zr = zr && yx;
+  ++q7;
+}
+}


Don't pass booleans as mask types to simd clones (PR 92710)

2019-11-29 Thread Richard Sandiford
In this PR we assigned a vector mask type to the result of a comparison
and then tried to pass that mask type to a simd clone, which expected
a normal (non-mask) type instead.

This patch simply punts on call arguments that have a mask type.
A better fix would be to pattern-match the comparison to a COND_EXPR,
like we would if the comparison was stored to memory, but doing that
isn't gcc 9 or 10 material.

Note that this doesn't affect x86_64-linux-gnu because the ABI promotes
bool arguments to ints.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK for trunk
and gcc-9-branch?

Richard


2019-11-30  Richard Sandiford  

gcc/
PR tree-optimization/92710
* tree-vect-stmts.c (vectorizable_simd_clone_call): Reject
vector mask arguments.

gcc/testsuite/
PR tree-optimization/92710
* gcc.dg/vect/pr92710.c: New test.

Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   2019-11-22 09:57:59.194224976 +
+++ gcc/tree-vect-stmts.c   2019-11-29 08:28:12.015121876 +
@@ -3925,7 +3925,16 @@ vectorizable_simd_clone_call (stmt_vec_i
  || thisarginfo.dt == vect_external_def)
gcc_assert (thisarginfo.vectype == NULL_TREE);
   else
-   gcc_assert (thisarginfo.vectype != NULL_TREE);
+   {
+ gcc_assert (thisarginfo.vectype != NULL_TREE);
+ if (VECTOR_BOOLEAN_TYPE_P (thisarginfo.vectype))
+   {
+ if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"vector mask arguments are not supported\n");
+ return false;
+   }
+   }
 
   /* For linear arguments, the analyze phase should have saved
 the base and step in STMT_VINFO_SIMD_CLONE_INFO.  */
Index: gcc/testsuite/gcc.dg/vect/pr92710.c
===
--- /dev/null   2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.dg/vect/pr92710.c 2019-11-29 08:28:12.011121905 +
@@ -0,0 +1,12 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-fopenmp-simd" } */
+
+#pragma omp declare simd
+_Bool foo (_Bool) __attribute__((const));
+
+void
+f (_Bool *restrict x, char *restrict y, char *restrict z)
+{
+  for (int i = 0; i < 128; ++i)
+x[i] = foo (y[i] == z[i]);
+}


Re: [SVE] PR89007 - Implement generic vector average expansion

2019-11-29 Thread Prathamesh Kulkarni
On Fri, 22 Nov 2019 at 17:09, Prathamesh Kulkarni
 wrote:
>
> On Wed, 20 Nov 2019 at 16:54, Richard Biener  wrote:
> >
> > On Wed, 20 Nov 2019, Richard Sandiford wrote:
> >
> > > Hi,
> > >
> > > Thanks for doing this.  Adding Richard on cc:, since the SVE subject
> > > tag might have put him off.  There's not really anything SVE-specific
> > > here apart from the testcase.
> >
> > Ah.
> >
> > > > 2019-11-19  Prathamesh Kulkarni  
> > > >
> > > > PR tree-optimization/89007
> > > > * tree-vect-patterns.c (vect_recog_average_pattern): If there is no
> > > > > target support available, generate code to distribute rshift over plus
> > > > and add one depending upon floor or ceil rounding.
> > > >
> > > > testsuite/
> > > > * gcc.target/aarch64/sve/pr89007.c: New test.
> > > >
> > > > > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr89007-1.c b/gcc/testsuite/gcc.target/aarch64/sve/pr89007-1.c
> > > > new file mode 100644
> > > > index 000..32095c63c61
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr89007-1.c
> > > > @@ -0,0 +1,29 @@
> > > > +/* { dg-do assemble { target aarch64_asm_sve_ok } } */
> > > > > +/* { dg-options "-O -ftree-vectorize -march=armv8.2-a+sve --save-temps" } */
> > > > +/* { dg-final { check-function-bodies "**" "" } } */
> > > > +
> > > > +#define N 1024
> > > > +unsigned char dst[N];
> > > > +unsigned char in1[N];
> > > > +unsigned char in2[N];
> > > > +
> > > > +/*
> > > > +**  foo:
> > > > +** ...
> > > > +** lsr (z[0-9]+\.b), z[0-9]+\.b, #1
> > > > +** lsr (z[0-9]+\.b), z[0-9]+\.b, #1
> > > > +** add (z[0-9]+\.b), \1, \2
> > > > +** orr (z[0-9]+)\.d, z[0-9]+\.d, z[0-9]+\.d
> > > > +** and (z[0-9]+\.b), \4\.b, #0x1
> > > > +** add z0.b, \3, \5
> > >
> > > It'd probably be more future-proof to allow (\1, \2|\2, \1) and
> > > (\3, \5|\5, \3).  Same for the other testcase.
> > >
> > > > +** ...
> > > > +*/
> > > > +void
> > > > +foo ()
> > > > +{
> > > > +  for( int x = 0; x < N; x++ )
> > > > +dst[x] = (in1[x] + in2[x] + 1) >> 1;
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-assembler-not {\tuunpklo\t} } } */
> > > > +/* { dg-final { scan-assembler-not {\tuunpkhi\t} } } */
> > > > > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr89007-2.c b/gcc/testsuite/gcc.target/aarch64/sve/pr89007-2.c
> > > > new file mode 100644
> > > > index 000..cc40f45046b
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr89007-2.c
> > > > @@ -0,0 +1,29 @@
> > > > +/* { dg-do assemble { target aarch64_asm_sve_ok } } */
> > > > > +/* { dg-options "-O -ftree-vectorize -march=armv8.2-a+sve --save-temps" } */
> > > > +/* { dg-final { check-function-bodies "**" "" } } */
> > > > +
> > > > +#define N 1024
> > > > +unsigned char dst[N];
> > > > +unsigned char in1[N];
> > > > +unsigned char in2[N];
> > > > +
> > > > +/*
> > > > +**  foo:
> > > > +** ...
> > > > +** lsr (z[0-9]+\.b), z[0-9]+\.b, #1
> > > > +** lsr (z[0-9]+\.b), z[0-9]+\.b, #1
> > > > +** add (z[0-9]+\.b), \1, \2
> > > > +** and (z[0-9]+)\.d, z[0-9]+\.d, z[0-9]+\.d
> > > > +** and (z[0-9]+\.b), \4\.b, #0x1
> > > > +** add z0.b, \3, \5
> > > > +** ...
> > > > +*/
> > > > +void
> > > > +foo ()
> > > > +{
> > > > +  for( int x = 0; x < N; x++ )
> > > > +dst[x] = (in1[x] + in2[x]) >> 1;
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-assembler-not {\tuunpklo\t} } } */
> > > > +/* { dg-final { scan-assembler-not {\tuunpkhi\t} } } */
> > > > diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
> > > > index 8ebbcd76b64..7025a3b4dc2 100644
> > > > --- a/gcc/tree-vect-patterns.c
> > > > +++ b/gcc/tree-vect-patterns.c
> > > > @@ -2019,22 +2019,59 @@ vect_recog_average_pattern (stmt_vec_info 
> > > > last_stmt_info, tree *type_out)
> > >
> > > >/* Check for target support.  */
> > > >tree new_vectype = get_vectype_for_scalar_type (vinfo, new_type);
> > > > -  if (!new_vectype
> > > > -  || !direct_internal_fn_supported_p (ifn, new_vectype,
> > > > - OPTIMIZE_FOR_SPEED))
> > > > +
> > > > +  if (!new_vectype)
> > > >  return NULL;
> > > >
> > > > +  bool ifn_supported
> > > > > += direct_internal_fn_supported_p (ifn, new_vectype, OPTIMIZE_FOR_SPEED);
> > > > +
> > > > >/* The IR requires a valid vector type for the cast result, even though
> > > >   it's likely to be discarded.  */
> > > >*type_out = get_vectype_for_scalar_type (vinfo, type);
> > > >if (!*type_out)
> > > >  return NULL;
> > >  >
> > > > -  /* Generate the IFN_AVG* call.  */
> > > >tree new_var = vect_recog_temp_ssa_var (new_type, NULL);
> > > >tree new_ops[2];
> > > >vect_convert_inputs (last_stmt_info, 2, new_ops, new_type,
> > > >unprom, new_vectype);
> > > > +
> > > > +  if (!ifn_supported)
> > >
> > > I guess this is personal preference, but I'm not sure there's much
> > > 

RE: [PATCH][GCC][SLP][testsuite] Turn off vect-epilogue-nomask for slp-reduc-3

2019-11-29 Thread Richard Biener
On Thu, 28 Nov 2019, Tamar Christina wrote:

> Hi Richi,
> 
> > >
> > > This patch turns off vect-epilogue-nomask for slp-reduc-3 as it seems
> > > that the epilogue in this loop is vectorizable using SLP and smaller
> > > VF.  Since this test expects there to be no SLP vectorization at all
> > > the testcase then fails for arm targets.
> > 
> > Actually we do expect SLP vectorization, just the counting might go wrong.
> > 
> > What's the actual FAIL for arm?
> 
> I should have worded this better considering the testcase literally contains 
> SLP in the name...
> 
> The failure is for the XFAIL 
> 
> /* { dg-final { scan-tree-dump-times "vectorizing stmts using SLP" 1 "vect" { xfail { vect_widen_sum_hi_to_si_pattern || { ! vect_unpack } } } } } */
> 
> And my understanding as to what is happening is that without epilogue nomask
> it would only try HI modes, but thanks to the epilogue nomask
> it tries QI mode as well, which succeeds.  The xfail then generates an xpass
> since the condition on it checks for HI to SI and not QI.
> 
> So I disabled the epilogue nomask since it seems to violate the conditions the
> test actually wanted to test for.
> 
> Not quite sure why it's failing only on Arm though.

No idea.

I agree about the resolution so the patch is fine.

Thanks,
Richard.


> Regards,
> Tamar
> 
> > 
> > Disabling epilogue vect is of course OK if it simplifies things.
> > 
> > > Regtested on arm-none-eabi and no issues.
> > >
> > > Ok for trunk?
> > 
> > > Thanks,
> > > Tamar
> > >
> > > gcc/testsuite/ChangeLog:
> > >
> > > 2019-11-28  Tamar Christina  
> > >
> > >   * gcc.dg/vect/slp-reduc-3.c: Turn off epilogue-nomask.
> > >
> > >
> > 
> > --
> > Richard Biener 
> > SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409
> > Nuernberg, Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

Re: Don't pass booleans as mask types to simd clones (PR 92710)

2019-11-29 Thread Richard Biener
On Fri, Nov 29, 2019 at 9:30 AM Richard Sandiford
 wrote:
>
> In this PR we assigned a vector mask type to the result of a comparison
> and then tried to pass that mask type to a simd clone, which expected
> a normal (non-mask) type instead.
>
> This patch simply punts on call arguments that have a mask type.
> A better fix would be to pattern-match the comparison to a COND_EXPR,
> like we would if the comparison was stored to memory, but doing that
> isn't gcc 9 or 10 material.
>
> Note that this doesn't affect x86_64-linux-gnu because the ABI promotes
> bool arguments to ints.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK for trunk
> and gcc-9-branch?

OK.

Richard.

> Richard
>
>
> 2019-11-30  Richard Sandiford  
>
> gcc/
> PR tree-optimization/92710
> * tree-vect-stmts.c (vectorizable_simd_clone_call): Reject
> vector mask arguments.
>
> gcc/testsuite/
> PR tree-optimization/92710
> * gcc.dg/vect/pr92710.c: New test.
>
> Index: gcc/tree-vect-stmts.c
> ===
> --- gcc/tree-vect-stmts.c   2019-11-22 09:57:59.194224976 +
> +++ gcc/tree-vect-stmts.c   2019-11-29 08:28:12.015121876 +
> @@ -3925,7 +3925,16 @@ vectorizable_simd_clone_call (stmt_vec_i
>   || thisarginfo.dt == vect_external_def)
> gcc_assert (thisarginfo.vectype == NULL_TREE);
>else
> -   gcc_assert (thisarginfo.vectype != NULL_TREE);
> +   {
> + gcc_assert (thisarginfo.vectype != NULL_TREE);
> + if (VECTOR_BOOLEAN_TYPE_P (thisarginfo.vectype))
> +   {
> + if (dump_enabled_p ())
> +   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
> +"vector mask arguments are not supported\n");
> + return false;
> +   }
> +   }
>
>/* For linear arguments, the analyze phase should have saved
>  the base and step in STMT_VINFO_SIMD_CLONE_INFO.  */
> Index: gcc/testsuite/gcc.dg/vect/pr92710.c
> ===
> --- /dev/null   2019-09-17 11:41:18.176664108 +0100
> +++ gcc/testsuite/gcc.dg/vect/pr92710.c 2019-11-29 08:28:12.011121905 +
> @@ -0,0 +1,12 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-fopenmp-simd" } */
> +
> +#pragma omp declare simd
> +_Bool foo (_Bool) __attribute__((const));
> +
> +void
> +f (_Bool *restrict x, char *restrict y, char *restrict z)
> +{
> +  for (int i = 0; i < 128; ++i)
> +x[i] = foo (y[i] == z[i]);
> +}


[PATCH v2 0/2] Fix run-time handling of `libgcc_s' in testing

2019-11-29 Thread Maciej W. Rozycki
Hi,

 This is a follow-up to the original proposal of a change posted here:



to address a catastrophic libgo testsuite failure in cross-compilation 
where shared `libgcc_s' library cannot be found by the loader at run time 
in build-tree testing and consequently no test case executes.

 In the course of discussion it has turned out that the culprit is in a 
generic GCC test framework helper and in an attempt to handle this I have 
come across a nonsensical warning message produced by the GCC driver 
interfering with the solution I have come up with.

 Consequently I have prepared this small patch series where in 1/2 I 
propose to drop the warning message in the affected case and then in 2/2 I 
address the actual problem.  See individual change descriptions for 
details.

 These two changes have been regression-tested with `make check' using the 
`x86_64-linux-gnu' native system as well as the `x86_64-linux-gnu' host 
and the `riscv64-linux-gnu' target, with RISC-V QEMU in the Linux user 
emulation mode as the target board.

 OK to apply?

  Maciej


[PATCH v2 1/2] driver: Do not warn about ineffective `-x' option if no inputs were given

2019-11-29 Thread Maciej W. Rozycki
Fix an issue with the GCC driver and the `-x' option where a warning is 
issued in an invocation like:

$ riscv64-linux-gnu-gcc -print-multi-directory -x c++
riscv64-linux-gnu-gcc: warning: '-x c++' after last input file has no effect
lib64/lp64d
$ 

where no inputs were given and hence the use of `-x' is irrelevant.  
The statement printed is also untrue as the `-x' does not come after the 
last input file given that none was given.  Do not print it then if no 
inputs were supplied.

gcc/
* gcc.c (process_command): Only warn about an ineffective `-x' 
option if any input files have actually been supplied.
---
Hi,

 This warning interferes with 2/2 and libgomp testing where `-x c++' is 
included with compiler invocation for a subset of tests.  This could 
probably be dealt with in the test suite, but I have concluded that the 
warning makes no sense in the first place and would better be simply 
removed instead.

  Maciej

New change in v2.
---
 gcc/gcc.c |2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

gcc-spec-lang-warn-no-input.diff
Index: gcc/gcc/gcc.c
===
--- gcc.orig/gcc/gcc.c
+++ gcc/gcc/gcc.c
@@ -4741,7 +4741,7 @@ process_command (unsigned int decoded_op
   /* More prefixes are enabled in main, after we read the specs file
  and determine whether this is cross-compilation or not.  */
 
-  if (n_infiles == last_language_n_infiles && spec_lang != 0)
+  if (n_infiles != 0 && n_infiles == last_language_n_infiles && spec_lang != 0)
 warning (0, "%<-x %s%> after last input file has no effect", spec_lang);
 
   /* Synthesize -fcompare-debug flag from the GCC_COMPARE_DEBUG
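
The effect of the one-line change can be modelled in isolation.  The sketch below is a hypothetical stand-alone predicate, not driver code: it assumes, for illustration, that `last_language_n_infiles' holds the number of input files seen when the last `-x' option was processed, and it only mirrors the old and new conditions from the hunk above.

```c
#include <stddef.h>

/* Old condition: warn whenever no input file followed the last `-x',
   even if there were no input files at all.  */
int
warn_ineffective_x_old (int n_infiles, int last_language_n_infiles,
                        const char *spec_lang)
{
  return n_infiles == last_language_n_infiles && spec_lang != NULL;
}

/* New condition: additionally require that at least one input file
   has been supplied.  */
int
warn_ineffective_x_new (int n_infiles, int last_language_n_infiles,
                        const char *spec_lang)
{
  return n_infiles != 0
         && n_infiles == last_language_n_infiles
         && spec_lang != NULL;
}
```

Under that assumption, the `-print-multi-directory -x c++' invocation corresponds to (0, 0, "c++"): the old predicate fires and the new one does not, while a trailing `-x' after one real input, (1, 1, "c++"), still warns in both.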


[PATCH v2 2/2] testsuite: Fix run-time tracking down of `libgcc_s'

2019-11-29 Thread Maciej W. Rozycki
Fix a catastrophic libgo testsuite failure in cross-compilation where 
the shared `libgcc_s' library cannot be found by the loader at run time 
in build-tree testing and consequently all test cases fail the execution 
stage, giving output (here with the `x86_64-linux-gnu' host and the 
`riscv64-linux-gnu' target, with RISC-V QEMU in the Linux user emulation 
mode as the target board) like:

spawn qemu-riscv64 -E LD_LIBRARY_PATH=.:.../riscv64-linux-gnu/lib64/lp64d/libgo/.libs ./a.exe
./a.exe: error while loading shared libraries: libgcc_s.so.1: cannot open shared object file: No such file or directory
FAIL: archive/tar

To do so rework `gcc-set-multilib-library-path' so as not to rely on the 
`rootme' TCL variable to have been preset in testsuite invocation, which 
only works for the GCC test suites and not for library test suites, and 
also use `remote_exec host' rather than `exec' to invoke the compiler in 
determination of `libgcc_s' locations, so that the solution works in 
remote testing as well while also avoiding the hardcoded limit of the 
executable's path length imposed by `exec'.

This is based on an observation that the multilib root directory can be 
determined by stripping out the multilib directory in effect as printed 
with the `-print-multi-directory' option from the path produced by the 
`-print-file-name=' option.  And then individual full multilib paths can 
be assembled for the other multilibs by appending their respective 
multilib directories to the multilib root directory.
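
The path arithmetic described above can be sketched as follows.  The actual change is in Tcl in gcc-defs.exp; this C version is only an illustration of the idea, and the function names and the example paths used with it are hypothetical.

```c
#include <stdio.h>
#include <string.h>

/* Strip MULTI_DIR (as printed by `-print-multi-directory') from the end of
   LIBGCC_DIR (the directory part of the `-print-file-name=' result) to get
   the multilib root.  Returns 0 on success, -1 if MULTI_DIR is not a suffix
   of LIBGCC_DIR or ROOT is too small.  */
int
multilib_root (const char *libgcc_dir, const char *multi_dir,
               char *root, size_t root_size)
{
  size_t dir_len = strlen (libgcc_dir);
  size_t multi_len = strlen (multi_dir);

  if (multi_len > dir_len
      || strcmp (libgcc_dir + dir_len - multi_len, multi_dir) != 0)
    return -1;

  size_t root_len = dir_len - multi_len;
  if (root_len + 1 > root_size)
    return -1;

  memcpy (root, libgcc_dir, root_len);
  root[root_len] = '\0';
  return 0;
}

/* Append another multilib directory to ROOT to form its full path.
   Returns 0 on success, -1 if OUT is too small.  */
int
multilib_path (const char *root, const char *multi_dir,
               char *out, size_t out_size)
{
  int n = snprintf (out, out_size, "%s%s", root, multi_dir);
  return (n >= 0 && (size_t) n < out_size) ? 0 : -1;
}
```

For instance, with a made-up build directory, stripping `lib64/lp64d' from `/build/gcc/lib64/lp64d' gives the root `/build/gcc/', and appending `lib32/ilp32' gives `/build/gcc/lib32/ilp32'.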

Unlike with the old solution the full multilib paths are not checked for 
the presence of the shared `libgcc_s' library there, but that is 
supposed to be harmless.  Also the full multilib path for the multilib 
used with the compiler used for testing will now come first, which 
should reduce run-time processing in the usual case.

With this change in place test output instead looks like:

spawn qemu-riscv64 -E LD_LIBRARY_PATH=.:.../riscv64-linux-gnu/lib64/lp64d/libgo/.libs:..././gcc/lib64/lp64d:..././gcc/.:..././gcc/lib32/ilp32:..././gcc/lib32/ilp32d:..././gcc/lib64/lp64 ./a.exe
PASS
PASS: archive/tar

No summary comparison, because the libgo testsuite does not provide one 
in this configuration for some reason, however this change improves 
overall results from 0 PASSes and 159 FAILs to 133 PASSes and 26 FAILs.

gcc/testsuite/
* lib/gcc-defs.exp (gcc-set-multilib-library-path): Use 
`-print-file-name=' to determine the multilib root directory.  
Use `remote_exec host' rather than `exec' to invoke the 
compiler.
---
Hi,

 As PR testsuite/40699, PR testsuite/40707 and PR testsuite/40709 and 
resulting r149508, a revert of r149113 ("Tidy up testsuite handling of 
LD_LIBRARY_PATH"), 
 and some other 
commits, have indicated non-selected multilib directories have to be 
included in the dynamic loader's library path for some targets for some 
reason.  So I have decided to preserve the approach, even though it 
appears odd to me.

  Maciej

Changes from v1:

- Resolve the issue globally in `gcc-set-multilib-library-path' in 
  gcc-defs.exp rather than for libgo only in `go_link_flags' in go.exp.
---
 gcc/testsuite/lib/gcc-defs.exp |   43 +++--
 1 file changed, 29 insertions(+), 14 deletions(-)

Index: gcc/gcc/testsuite/lib/gcc-defs.exp
===
--- gcc.orig/gcc/testsuite/lib/gcc-defs.exp
+++ gcc/gcc/testsuite/lib/gcc-defs.exp
@@ -324,29 +324,44 @@ proc dg-additional-files-options { optio
 # for COMPILER, including multilib directories.
 
 proc gcc-set-multilib-library-path { compiler } {
-global rootme
+set shlib_ext [get_shlib_extension]
+set options [lrange $compiler 1 end]
+set compiler [lindex $compiler 0]
 
-# ??? rootme will not be set when testing an installed compiler.
-# In that case, we should perhaps use some other method to find
-# libraries.
-if {![info exists rootme]} {
+set libgcc_s_x [remote_exec host "$compiler" \
+   "$options -print-file-name=libgcc_s.${shlib_ext}"]
+if { [lindex $libgcc_s_x 0] == 0 \
+&& [set libgcc_s_dir [file dirname [lindex $libgcc_s_x 1]]] != "" } {
+   set libpath ":${libgcc_s_dir}"
+} else {
return ""
 }
 
-set libpath ":${rootme}"
-set options [lrange $compiler 1 end]
-set compiler [lindex $compiler 0]
-if { [is_remote host] == 0 && [which $compiler] != 0 } {
-   foreach i "[eval exec $compiler $options --print-multi-lib]" {
+set multi_dir_x [remote_exec host "$compiler" \
+"$options -print-multi-directory"]
+set multi_lib_x [remote_exec host "$compiler" \
+"$options -print-multi-lib"]
+if { [lindex $multi_dir_x 0] == 0 && [lindex $multi_lib_x 0] == 0 } {
+   set multi_dir [string trim [lindex $multi_dir_x 1]]
+   set mul

Re: [PATCH] PR90838: Support ctz idioms

2019-11-29 Thread Richard Biener
On Fri, Nov 15, 2019 at 4:24 PM Wilco Dijkstra  wrote:
>
> Hi Richard,
>
> > Uh.  Well.  I think that the gimple-match-head.c hunk isn't something we 
> > want.  Instead,
> > since this optimizes a memory access, the handling should move
> > to tree-ssa-forwprop.c where you _may_ use a (match ...)
> > match.pd pattern to do the (rshift (mult (bit_and (negate @1) @1)
> > matching.  It might be the first to use that feature, you need to
> > declare the function to use it from tree-ssa-forwprop.c.  So
>
> OK, I've moved it to fwprop, and it works just fine there while still
> using match.pd to do the idiom matching. Here is the updated version:
>
> [PATCH v2] PR90838: Support ctz idioms
>
> v2: Use fwprop pass rather than match.pd
>
> Support common idioms for count trailing zeroes using an array lookup.
> The canonical form is array[((x & -x) * C) >> SHIFT] where C is a magic
> constant which when multiplied by a power of 2 contains a unique value
> in the top 5 or 6 bits.  This is then indexed into a table which maps it
> to the number of trailing zeroes.  When the table is valid, we emit a
> sequence using the target defined value for ctz (0):
>
> int ctz1 (unsigned x)
> {
>   static const char table[32] =
> {
>   0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
>   31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
> };
>
>   return table[((unsigned)((x & -x) * 0x077CB531U)) >> 27];
> }
>
> Is optimized to:
>
> rbit w0, w0
> clz w0, w0
> and w0, w0, 31
> ret
>
> Bootstrapped on AArch64. OK for commit?
>
> ChangeLog:
>
> 2019-11-15  Wilco Dijkstra  
>
> PR tree-optimization/90838
> * tree-ssa-forwprop.c (optimize_count_trailing_zeroes):
> Add new function.
> (simplify_count_trailing_zeroes): Add new function.
> (pass_forwprop::execute): Try ctz simplification.
> * match.pd: Add matching for ctz idioms.
> * testsuite/gcc.target/aarch64/pr90838.c: New test.
> ---
>
> diff --git a/gcc/match.pd b/gcc/match.pd
> index 6edf54b80012d87dbe7330f5ee638cdba2f9c099..479e9076f0d4deccda54425e93ee4567b85409aa 100644
> --- a/gcc/match.pd
> +++ b/gcc/match.pd
> @@ -6060,3 +6060,11 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  (simplify
>   (vec_perm vec_same_elem_p@0 @0 @1)
>   @0)
> +
> +/* Match count trailing zeroes for simplify_count_trailing_zeroes in fwprop.
> +   The canonical form is array[((x & -x) * C) >> SHIFT] where C is a magic
> +   constant which when multiplied by a power of 2 contains a unique value
> +   in the top 5 or 6 bits.  This is then indexed into a table which maps it
> +   to the number of trailing zeroes.  */
> +(match (ctz_table_index @1 @2 @3)
> +  (rshift (mult (bit_and (negate @1) @1) INTEGER_CST@2) INTEGER_CST@3))

You need a :c on the bit_and
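
(In match.pd, :c marks the operation as commutative, so the pattern also matches with the operands of the bit_and swapped.)  As a sanity check of why that matters at the source level, here is a small stand-alone C sketch, not part of the patch, using the 32-entry table and constant from ctz1 in the testcase.  It evaluates the lookup with both operand orders; the helper names are made up for the example.

```c
#include <stdint.h>

/* Table and magic constant taken from ctz1 in the testcase.  */
static const unsigned char ctz_table[32] = {
  0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
  31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
};

/* The idiom as written in the testcase: x & -x isolates the lowest set bit.  */
int
ctz_lookup (uint32_t x)
{
  return ctz_table[(uint32_t) ((x & -x) * 0x077CB531U) >> 27];
}

/* Same computation with the bit_and operands swapped.  */
int
ctz_lookup_swapped (uint32_t x)
{
  return ctz_table[(uint32_t) ((-x & x) * 0x077CB531U) >> 27];
}

/* Hypothetical self-check: for every bit position, both forms must return
   that position, also when higher bits are set.  Returns 1 on success.  */
int
ctz_table_is_valid (void)
{
  for (int bit = 0; bit < 32; bit++)
    {
      uint32_t x = (uint32_t) 1 << bit;
      if (ctz_lookup (x) != bit || ctz_lookup_swapped (x) != bit)
        return 0;
      if (ctz_lookup (x | 0x80000000U) != bit)
        return 0;
    }
  return 1;
}
```

Both spellings are equally natural in user code, so a pattern that only matches (bit_and (negate @1) @1) in one order would miss half of them.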

> diff --git a/gcc/testsuite/gcc.target/aarch64/pr90838.c b/gcc/testsuite/gcc.target/aarch64/pr90838.c
> new file mode 100644
> index ..bff3144c0d1b3984016e5a404e986eae785c73ed
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/aarch64/pr90838.c
> @@ -0,0 +1,64 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +int ctz1 (unsigned x)
> +{
> +  static const char table[32] =
> +{
> +  0, 1, 28, 2, 29, 14, 24, 3, 30, 22, 20, 15, 25, 17, 4, 8,
> +  31, 27, 13, 23, 21, 19, 16, 7, 26, 12, 18, 6, 11, 5, 10, 9
> +};
> +
> +  return table[((unsigned)((x & -x) * 0x077CB531U)) >> 27];
> +}
> +
> +int ctz2 (unsigned x)
> +{
> +  const int u = 0;
> +  static short table[64] =
> +{
> +  32, 0, 1,12, 2, 6, u,13, 3, u, 7, u, u, u, u,14,
> +  10, 4, u, u, 8, u, u,25, u, u, u, u, u,21,27,15,
> +  31,11, 5, u, u, u, u, u, 9, u, u,24, u, u,20,26,
> +  30, u, u, u, u,23, u,19,29, u,22,18,28,17,16, u
> +};
> +
> +  x = (x & -x) * 0x0450FBAF;
> +  return table[x >> 26];
> +}
> +
> +int ctz3 (unsigned x)
> +{
> +  static int table[32] =
> +{
> +  0, 1, 2,24, 3,19, 6,25, 22, 4,20,10,16, 7,12,26,
> +  31,23,18, 5,21, 9,15,11,30,17, 8,14,29,13,28,27
> +};
> +
> +  if (x == 0) return 32;
> +  x = (x & -x) * 0x04D7651F;
> +  return table[x >> 27];
> +}
> +
> +static const unsigned long long magic = 0x03f08c5392f756cdULL;
> +
> +static const char table[64] = {
> + 0,  1, 12,  2, 13, 22, 17,  3,
> +14, 33, 23, 36, 18, 58, 28,  4,
> +62, 15, 34, 26, 24, 48, 50, 37,
> +19, 55, 59, 52, 29, 44, 39,  5,
> +63, 11, 21, 16, 32, 35, 57, 27,
> +61, 25, 47, 49, 54, 51, 43, 38,
> +10, 20, 31, 56, 60, 46, 53, 42,
> + 9, 30, 45, 41,  8, 40,  7,  6,
> +};
> +
> +int ctz4 (unsigned long x)
> +{
> +  unsigned long lsb = x & -x;
> +  return table[(lsb * magic) >> 58];
> +}
> +
> +/* { dg-final { scan-assembler-times "clz\t" 4 } } */
> +/* { dg-final { scan-assembler-times "and\t" 2 } } */
> +/* { dg-final { scan-assembler-not "cmp\t.*0" } } */
> diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
> index 
> fe

[PATCH] Fix PR92715

2019-11-29 Thread Richard Biener


Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2019-11-29  Richard Biener  

PR tree-optimization/92715
* tree-ssa-forwprop.c (simplify_vector_constructor): Bail
out for uniform vectors and source vectors with less elements
than the destination.

* gcc.dg/torture/pr92715.c: New testcase.

Index: gcc/tree-ssa-forwprop.c
===
--- gcc/tree-ssa-forwprop.c (revision 278827)
+++ gcc/tree-ssa-forwprop.c (working copy)
@@ -2038,13 +2038,13 @@
   constructor_elt *elt;
   bool maybe_ident;
 
-  gcc_checking_assert (gimple_assign_rhs_code (stmt) == CONSTRUCTOR);
-
   op = gimple_assign_rhs1 (stmt);
   type = TREE_TYPE (op);
-  gcc_checking_assert (TREE_CODE (type) == VECTOR_TYPE);
+  gcc_checking_assert (TREE_CODE (op) == CONSTRUCTOR
+  && TREE_CODE (type) == VECTOR_TYPE);
 
-  if (!TYPE_VECTOR_SUBPARTS (type).is_constant (&nelts))
+  if (!TYPE_VECTOR_SUBPARTS (type).is_constant (&nelts)
+  || uniform_vector_p (op))
 return false;
   elem_type = TREE_TYPE (type);
   elem_size = TREE_INT_CST_LOW (TYPE_SIZE (elem_type));
@@ -2136,6 +2136,9 @@
   || ! VECTOR_TYPE_P (TREE_TYPE (orig[0])))
 return false;
   refnelts = TYPE_VECTOR_SUBPARTS (TREE_TYPE (orig[0])).to_constant ();
+  /* We currently do not handle larger destination vectors.  */
+  if (refnelts < nelts)
+return false;
 
   if (maybe_ident)
 {
Index: gcc/testsuite/gcc.dg/torture/pr92715.c
===
--- gcc/testsuite/gcc.dg/torture/pr92715.c  (nonexistent)
+++ gcc/testsuite/gcc.dg/torture/pr92715.c  (working copy)
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-mavx2" { target x86_64-*-* i?86-*-* } } */
+
+typedef double v4si __attribute__((vector_size(32)));
+typedef double v2si __attribute__((vector_size(16)));
+
+void foo (v4si *dstp, v2si *srcp)
+{
+  v2si src = *srcp;
+  *dstp = (v4si) { src[0], src[1], src[0], src[1] };
+}
+
+void bar (v4si *dstp, v2si *srcp)
+{
+  v2si src = *srcp;
+  *dstp = (v4si) { src[0], src[0], src[0], src[0] };
+}


[committed] Fix lambda handling in OpenMP declare reduction (PR c++/60228)

2019-11-29 Thread Jakub Jelinek
Hi!

When OpenMP declare reduction appears outside of block scope (i.e. namespace
scope or class scope), an artificial function that is actually never called
and should be never emitted is created to hold the statements of the
combiner and initializer from which then OpenMP reduction clause handling
picks the statements.  Unfortunately, with lambdas there can be various
issues as the testcases show.
One is in templates, where lambda parsing can add extra DECL_EXPRs and
popscope doesn't add a BIND_EXPR around so we can end up with multiple
statements returned from finish_omp_structured_block, while the reduction
handling code requires exactly one to be able to figure out what is what.
The rest of the fixes make sure that the artificial functions aren't
actually genericized/gimplified, which with the lambdas apparently happened
in some cases, and that they don't show up in the mangled names (the functions
are artificially added and furthermore their names contain spaces etc.,
so assemblers aren't happy about that either).  Some of the changes handle
those like consteval functions, but something more is needed in two spots.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk.

2019-11-29  Jakub Jelinek  

PR c++/60228
* parser.c (cp_parser_omp_declare_reduction_exprs): If
processing_template_decl, wrap the combiner or initializer
into EXPR_STMT.
* decl.c (start_preparsed_function): Don't start a lambda scope
for DECL_OMP_DECLARE_REDUCTION_P functions.
(finish_function): Don't finish a lambda scope for
DECL_OMP_DECLARE_REDUCTION_P functions, nor cp_fold_function
them nor cp_genericize them.
* mangle.c (decl_mangling_context): Look through
DECL_OMP_DECLARE_REDUCTION_P functions.
* semantics.c (expand_or_defer_fn_1): For DECL_OMP_DECLARE_REDUCTION_P
functions, use tentative linkage, don't keep their bodies with
-fkeep-inline-functions and return false at the end.

* g++.dg/gomp/openmp-simd-2.C: Don't expect bodies for
DECL_OMP_DECLARE_REDUCTION_P functions.

* testsuite/libgomp.c++/udr-20.C: New test.
* testsuite/libgomp.c++/udr-21.C: New test.

--- gcc/cp/parser.c.jj  2019-11-28 09:02:26.931819871 +0100
+++ gcc/cp/parser.c 2019-11-28 11:30:22.273670201 +0100
@@ -41244,6 +41244,8 @@ cp_parser_omp_declare_reduction_exprs (t
   combiner = cp_parser_expression (parser);
   finish_expr_stmt (combiner);
   block = finish_omp_structured_block (block);
+  if (processing_template_decl)
+block = build_stmt (input_location, EXPR_STMT, block);
   add_stmt (block);
 
   if (!cp_parser_require (parser, CPP_CLOSE_PAREN, RT_CLOSE_PAREN))
@@ -41348,6 +41350,8 @@ cp_parser_omp_declare_reduction_exprs (t
 
   block = finish_omp_structured_block (block);
   cp_walk_tree (&block, cp_remove_omp_priv_cleanup_stmt, omp_priv, NULL);
+  if (processing_template_decl)
+   block = build_stmt (input_location, EXPR_STMT, block);
   add_stmt (block);
 
   if (ctor)
--- gcc/cp/decl.c.jj2019-11-28 09:02:26.470826954 +0100
+++ gcc/cp/decl.c   2019-11-28 13:09:24.748799556 +0100
@@ -16318,7 +16318,8 @@ start_preparsed_function (tree decl1, tr
   && !implicit_default_ctor_p (decl1))
 cp_ubsan_maybe_initialize_vtbl_ptrs (current_class_ptr);
 
-  start_lambda_scope (decl1);
+  if (!DECL_OMP_DECLARE_REDUCTION_P (decl1))
+start_lambda_scope (decl1);
 
   return true;
 }
@@ -16703,7 +16704,8 @@ finish_function (bool inline_p)
   if (fndecl == NULL_TREE)
 return error_mark_node;
 
-  finish_lambda_scope ();
+  if (!DECL_OMP_DECLARE_REDUCTION_P (fndecl))
+finish_lambda_scope ();
 
   if (c_dialect_objc ())
 objc_finish_function ();
@@ -16845,7 +16847,9 @@ finish_function (bool inline_p)
 invoke_plugin_callbacks (PLUGIN_PRE_GENERICIZE, fndecl);
 
   /* Perform delayed folding before NRV transformation.  */
-  if (!processing_template_decl && !DECL_IMMEDIATE_FUNCTION_P (fndecl))
+  if (!processing_template_decl
+  && !DECL_IMMEDIATE_FUNCTION_P (fndecl)
+  && !DECL_OMP_DECLARE_REDUCTION_P (fndecl))
 cp_fold_function (fndecl);
 
   /* Set up the named return value optimization, if we can.  Candidate
@@ -16958,7 +16962,9 @@ finish_function (bool inline_p)
 do_warn_unused_parameter (fndecl);
 
   /* Genericize before inlining.  */
-  if (!processing_template_decl && !DECL_IMMEDIATE_FUNCTION_P (fndecl))
+  if (!processing_template_decl
+  && !DECL_IMMEDIATE_FUNCTION_P (fndecl)
+  && !DECL_OMP_DECLARE_REDUCTION_P (fndecl))
 cp_genericize (fndecl);
 
  cleanup:
--- gcc/cp/mangle.c.jj  2019-11-05 08:40:43.705298108 +0100
+++ gcc/cp/mangle.c 2019-11-28 13:22:35.489587821 +0100
@@ -873,7 +873,16 @@ decl_mangling_context (tree decl)
   else if (template_type_parameter_p (decl))
  /* template type parms have no mangling context.  */
   return NULL_TREE;
-  return CP_DECL_CONTEXT (decl);
+
+  tconte

[PING][PATCH] doc: Correct `--enable-version-specific-runtime-libs' support information

2019-11-29 Thread Maciej W. Rozycki
On Wed, 20 Nov 2019, Maciej W. Rozycki wrote:

> The `--enable-version-specific-runtime-libs' configuration option is now 
> supported throughout all of our target library subdirectories, so update 
> installation documentation accordingly and also mention that the default 
> for the option is `yes' for libada and `no' for the remaining libraries.

 Ping for:



  Maciej


[C++ PATCH] (temporarily) undefine __cpp_consteval

2019-11-29 Thread Jakub Jelinek
Hi!

When submitting the P1902R1 patch for missing feature macros, I
completely forgot that we can't claim consteval support, because we have
the
  /* FIXME: For now.  */
  if (virtualp && (inlinep & 8) != 0)
{
  sorry_at (DECL_SOURCE_LOCATION (decl),
"% % method %qD not supported yet",
decl);
  inlinep &= ~8;
}
limitation in consteval support.  I've tried to make some progress on it
in PR88335, but am stuck, so this patch instead comments out the
__cpp_consteval definition and updates cxx-status.html to explain the
partial support.
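
For readers wondering what the missing macro means for user code, here is a minimal sketch of the usual feature-test pattern.  SQUARE_CONSTEVAL and square are made-up names for illustration only; nothing below is part of the patch.

```cpp
#include <cassert>

// Hypothetical user code: SQUARE_CONSTEVAL and square () are illustrative
// names.  If the compiler does not advertise __cpp_consteval, fall back
// to constexpr.
#if defined (__cpp_consteval) && __cpp_consteval >= 201811L
#  define SQUARE_CONSTEVAL consteval
#else
#  define SQUARE_CONSTEVAL constexpr
#endif

SQUARE_CONSTEVAL int square (int x) { return x * x; }

// Holds with either definition of the macro.
static_assert (square (4) == 16, "square is usable in constant expressions");
```

With the macro undefined, code written this way simply takes the constexpr branch, which is the conservative outcome while virtual consteval methods remain unsupported.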

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk and
wwwdocs?

2019-11-29  Jakub Jelinek  

* c-cppbuiltin.c (c_cpp_builtins): Don't define __cpp_consteval for
now.

* g++.dg/cpp2a/feat-cxx2a.C: Don't test __cpp_consteval for now.

--- gcc/c-family/c-cppbuiltin.c.jj  2019-11-28 09:02:23.705869433 +0100
+++ gcc/c-family/c-cppbuiltin.c 2019-11-28 18:59:20.407918255 +0100
@@ -999,7 +999,7 @@ c_cpp_builtins (cpp_reader *pfile)
  cpp_define (pfile, "__cpp_designated_initializers=201707L");
  cpp_define (pfile, "__cpp_constexpr_in_decltype=201711L");
  cpp_define (pfile, "__cpp_conditional_explicit=201806L");
- cpp_define (pfile, "__cpp_consteval=201811L");
+ /* cpp_define (pfile, "__cpp_consteval=201811L"); */
  cpp_define (pfile, "__cpp_constinit=201907L");
  cpp_define (pfile, "__cpp_deduction_guides=201907L");
  cpp_define (pfile, "__cpp_nontype_template_parameter_class=201806L");
--- gcc/testsuite/g++.dg/cpp2a/feat-cxx2a.C.jj  2019-11-28 09:02:25.331844453 +0100
+++ gcc/testsuite/g++.dg/cpp2a/feat-cxx2a.C 2019-11-28 19:00:59.459400136 +0100
@@ -520,11 +520,13 @@
 #  error "__cpp_constexpr_in_decltype != 201711"
 #endif
 
+/* Not supported fully yet:
 #ifndef __cpp_consteval
 #  error "__cpp_consteval"
 #elif __cpp_consteval != 201811
 #  error "__cpp_consteval != 201811"
 #endif
+*/
 
 #ifndef __cpp_concepts
 #  error "__cpp_concepts"

Jakub
diff --git a/htdocs/projects/cxx-status.html b/htdocs/projects/cxx-status.html
index c6ff78e1..c8655a17 100644
--- a/htdocs/projects/cxx-status.html
+++ b/htdocs/projects/cxx-status.html
@@ -288,7 +288,8 @@
 
Immediate functions (consteval) 
   http://wg21.link/p1073r3";>P1073R3
-   10 
+   10
+(partial, no consteval virtual support) 
__cpp_consteval >= 201811 
 
 


[0/5] Don't defer vector type choice for bools (PR 92596)

2019-11-29 Thread Richard Sandiford
When vectorising a comparison between N-bit integers or N-bit floats,
we want the boolean result to use the vector mask type for N-bit elements.
On most targets this is a vector of N-bit integers, but for SVE it's
a vector predicate and on AVX512 it's a scalar integer mask.

On the other hand, when loading or storing an M-byte boolean, we want to
treat it like any other M-byte integer type.

This difference leads to some complicated handling.  E.g. boolean logic
ops fed by two N-bit comparisons should use a vector mask for N-bit
elements.  But boolean logic ops fed by two M-byte data loads should
use normal M-byte integer vectors.  Boolean logic ops fed by an N-bit
comparison and an M-bit comparison need to convert one of the inputs
first (handled via pattern stmts).  Boolean logic ops fed by an N-bit
comparison and a load are not yet supported.  Etc.
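
To make the two situations concrete, here is a small scalar sketch of the kind of loops involved.  The function names are made up, and the comments only restate the preference described above; they are not a statement of what any particular target will generate.

```cpp
#include <cassert>
#include <cstdint>

// Boolean AND fed by two comparisons of 32-bit integers.  When this is
// vectorised, the natural choice for the boolean values is the vector
// mask type for 32-bit elements.
void and_of_comparisons (const int32_t *a, const int32_t *b,
                         int32_t *out, int n)
{
  for (int i = 0; i < n; ++i)
    out[i] = ((a[i] < 0) & (b[i] > 0)) ? 1 : 0;
}

// Boolean AND fed by two one-byte bool loads.  Here the booleans are
// data, so the natural choice is an ordinary vector of one-byte integers.
void and_of_loads (const bool *x, const bool *y, bool *out, int n)
{
  for (int i = 0; i < n; ++i)
    out[i] = x[i] & y[i];
}
```

Both loops compute a logical AND, but only the first one has comparison results as its inputs.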

Historically we've tried to make this choice on the fly.  This has
two major downsides:

(a) search_type_for_mask has to use a worklist to find the mask type for
a particular operation.  The results are not cached between calls,
so this is a potential source of quadratic behavior.

(b) we can only choose the vector type for a boolean result once
we know the vector types of the inputs.  So both the loop and
SLP vectorisers make another pass for boolean types.

The second example in PR 92596 is another case in which (b) causes
problems.  I tried various non-invasive ways of working around it,
but although they worked for the testcase and testsuite, it was easy
to see that they were flaky and would probably cause problems later.
In the end I think the best fix is to stop trying to make this decision
on the fly and record it in the stmt_vec_info instead.

Obviously it's not ideal to be doing something like this in stage 3,
but it is a bug fix and I think it will make bool-related problems
easier to handle in future.

Each patch tested individually on aarch64-linux-gnu and the series
as a whole on x86_64-linux-gnu.  OK to install?

Richard


Re: [SVE] PR89007 - Implement generic vector average expansion

2019-11-29 Thread Richard Biener
On Fri, Nov 22, 2019 at 12:40 PM Prathamesh Kulkarni
 wrote:
>
> On Wed, 20 Nov 2019 at 16:54, Richard Biener  wrote:
> >
> > On Wed, 20 Nov 2019, Richard Sandiford wrote:
> >
> > > Hi,
> > >
> > > Thanks for doing this.  Adding Richard on cc:, since the SVE subject
> > > tag might have put him off.  There's not really anything SVE-specific
> > > here apart from the testcase.
> >
> > Ah.
> >
> > > > 2019-11-19  Prathamesh Kulkarni  
> > > >
> > > > PR tree-optimization/89007
> > > > * tree-vect-patterns.c (vect_recog_average_pattern): If there is no
> > > > target support available, generate code to distribute rshift over 
> > > > plus
> > > > and add one depending upon floor or ceil rounding.
> > > >
> > > > testsuite/
> > > > * gcc.target/aarch64/sve/pr89007.c: New test.
> > > >
> > > > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr89007-1.c 
> > > > b/gcc/testsuite/gcc.target/aarch64/sve/pr89007-1.c
> > > > new file mode 100644
> > > > index 000..32095c63c61
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr89007-1.c
> > > > @@ -0,0 +1,29 @@
> > > > +/* { dg-do assemble { target aarch64_asm_sve_ok } } */
> > > > +/* { dg-options "-O -ftree-vectorize -march=armv8.2-a+sve 
> > > > --save-temps" } */
> > > > +/* { dg-final { check-function-bodies "**" "" } } */
> > > > +
> > > > +#define N 1024
> > > > +unsigned char dst[N];
> > > > +unsigned char in1[N];
> > > > +unsigned char in2[N];
> > > > +
> > > > +/*
> > > > +**  foo:
> > > > +** ...
> > > > +** lsr (z[0-9]+\.b), z[0-9]+\.b, #1
> > > > +** lsr (z[0-9]+\.b), z[0-9]+\.b, #1
> > > > +** add (z[0-9]+\.b), \1, \2
> > > > +** orr (z[0-9]+)\.d, z[0-9]+\.d, z[0-9]+\.d
> > > > +** and (z[0-9]+\.b), \4\.b, #0x1
> > > > +** add z0.b, \3, \5
> > >
> > > It'd probably be more future-proof to allow (\1, \2|\2, \1) and
> > > (\3, \5|\5, \3).  Same for the other testcase.
> > >
> > > > +** ...
> > > > +*/
> > > > +void
> > > > +foo ()
> > > > +{
> > > > +  for( int x = 0; x < N; x++ )
> > > > +dst[x] = (in1[x] + in2[x] + 1) >> 1;
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-assembler-not {\tuunpklo\t} } } */
> > > > +/* { dg-final { scan-assembler-not {\tuunpkhi\t} } } */
> > > > diff --git a/gcc/testsuite/gcc.target/aarch64/sve/pr89007-2.c 
> > > > b/gcc/testsuite/gcc.target/aarch64/sve/pr89007-2.c
> > > > new file mode 100644
> > > > index 000..cc40f45046b
> > > > --- /dev/null
> > > > +++ b/gcc/testsuite/gcc.target/aarch64/sve/pr89007-2.c
> > > > @@ -0,0 +1,29 @@
> > > > +/* { dg-do assemble { target aarch64_asm_sve_ok } } */
> > > > +/* { dg-options "-O -ftree-vectorize -march=armv8.2-a+sve 
> > > > --save-temps" } */
> > > > +/* { dg-final { check-function-bodies "**" "" } } */
> > > > +
> > > > +#define N 1024
> > > > +unsigned char dst[N];
> > > > +unsigned char in1[N];
> > > > +unsigned char in2[N];
> > > > +
> > > > +/*
> > > > +**  foo:
> > > > +** ...
> > > > +** lsr (z[0-9]+\.b), z[0-9]+\.b, #1
> > > > +** lsr (z[0-9]+\.b), z[0-9]+\.b, #1
> > > > +** add (z[0-9]+\.b), \1, \2
> > > > +** and (z[0-9]+)\.d, z[0-9]+\.d, z[0-9]+\.d
> > > > +** and (z[0-9]+\.b), \4\.b, #0x1
> > > > +** add z0.b, \3, \5
> > > > +** ...
> > > > +*/
> > > > +void
> > > > +foo ()
> > > > +{
> > > > +  for( int x = 0; x < N; x++ )
> > > > +dst[x] = (in1[x] + in2[x]) >> 1;
> > > > +}
> > > > +
> > > > +/* { dg-final { scan-assembler-not {\tuunpklo\t} } } */
> > > > +/* { dg-final { scan-assembler-not {\tuunpkhi\t} } } */
> > > > diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
> > > > index 8ebbcd76b64..7025a3b4dc2 100644
> > > > --- a/gcc/tree-vect-patterns.c
> > > > +++ b/gcc/tree-vect-patterns.c
> > > > @@ -2019,22 +2019,59 @@ vect_recog_average_pattern (stmt_vec_info 
> > > > last_stmt_info, tree *type_out)
> > >
> > > >/* Check for target support.  */
> > > >tree new_vectype = get_vectype_for_scalar_type (vinfo, new_type);
> > > > -  if (!new_vectype
> > > > -  || !direct_internal_fn_supported_p (ifn, new_vectype,
> > > > - OPTIMIZE_FOR_SPEED))
> > > > +
> > > > +  if (!new_vectype)
> > > >  return NULL;
> > > >
> > > > +  bool ifn_supported
> > > > += direct_internal_fn_supported_p (ifn, new_vectype, 
> > > > OPTIMIZE_FOR_SPEED);
> > > > +
> > > >/* The IR requires a valid vector type for the cast result, even 
> > > > though
> > > >   it's likely to be discarded.  */
> > > >*type_out = get_vectype_for_scalar_type (vinfo, type);
> > > >if (!*type_out)
> > > >  return NULL;
> > >  >
> > > > -  /* Generate the IFN_AVG* call.  */
> > > >tree new_var = vect_recog_temp_ssa_var (new_type, NULL);
> > > >tree new_ops[2];
> > > >vect_convert_inputs (last_stmt_info, 2, new_ops, new_type,
> > > >unprom, new_vectype);
> > > > +
> > > > +  if (!ifn_supported)
> > >
> > > I guess this is personal preference, but I'm not sure there's much
> >
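
For reference, the arithmetic behind "distribute rshift over plus and add one depending upon floor or ceil rounding" can be checked in scalar form.  This is only a sketch of the identity suggested by the expected assembly in the quoted tests (lsr, lsr, add, then orr or and with #1); the function names are made up and it is not the pattern code itself.

```cpp
#include <cassert>
#include <cstdint>

// Reference forms: the addition is done in int, so it cannot overflow.
uint8_t avg_floor_wide (uint8_t a, uint8_t b) { return (a + b) >> 1; }
uint8_t avg_ceil_wide (uint8_t a, uint8_t b) { return (a + b + 1) >> 1; }

// Narrow forms: shift each input first, then add the carry that the
// two discarded low bits would have produced.
uint8_t avg_floor_narrow (uint8_t a, uint8_t b)
{
  return (a >> 1) + (b >> 1) + ((a & b) & 1);
}

uint8_t avg_ceil_narrow (uint8_t a, uint8_t b)
{
  return (a >> 1) + (b >> 1) + ((a | b) & 1);
}

// Exhaustively compare the narrow and wide forms over all byte pairs.
bool narrow_forms_agree ()
{
  for (int a = 0; a < 256; ++a)
    for (int b = 0; b < 256; ++b)
      if (avg_floor_narrow (a, b) != avg_floor_wide (a, b)
          || avg_ceil_narrow (a, b) != avg_ceil_wide (a, b))
        return false;
  return true;
}
```

The narrow forms never need more than 8 bits for an intermediate value, which is why no widening (uunpklo/uunpkhi) should be required.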

[1/5] Improve tree-vect-patterns.c handling of boolean comparisons

2019-11-29 Thread Richard Sandiford
vect_recog_bool_pattern assumed that a comparison between two booleans
should always become a comparison of vector mask types (implemented as an
XOR_EXPR).  But if the booleans in question are generated as data values
(e.g. because they're loaded directly from memory), we should treat them
like ordinary integers instead, just as we do for boolean logic ops whose
operands are loaded from memory.  vect_get_mask_type_for_stmt already
handled this case:

  /* We may compare boolean value loaded as vector of integers.
 Fix mask_type in such case.  */
  if (mask_type
  && !VECTOR_BOOLEAN_TYPE_P (mask_type)
  && gimple_code (stmt) == GIMPLE_ASSIGN
  && TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == tcc_comparison)
mask_type = truth_type_for (mask_type);

and not handling it here complicated later patches.
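
As a scalar illustration of the two views of a boolean comparison, the sketch below compares bytes holding 0 or 1 as ordinary integers and, alternatively, with XOR.  It uses != because that is the comparison for which plain XOR gives the same answer; the names are made up and this is not the vectoriser's code.

```cpp
#include <cassert>
#include <cstdint>

// "Data" view: the booleans are bytes loaded from memory and compared
// like any other integers.
void ne_as_integers (const uint8_t *x, const uint8_t *y, uint8_t *out, int n)
{
  for (int i = 0; i < n; ++i)
    out[i] = (x[i] != y[i]) ? 1 : 0;
}

// "Mask" view: for values restricted to 0 and 1, inequality is XOR.
void ne_as_xor (const uint8_t *x, const uint8_t *y, uint8_t *out, int n)
{
  for (int i = 0; i < n; ++i)
    out[i] = x[i] ^ y[i];
}
```

The two agree on 0/1 inputs, so the choice between them is about which vector type suits the operands, not about the result.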

The initial list of targets for vect_bool_cmp is deliberately conservative.


2019-11-30  Richard Sandiford  

gcc/
* doc/sourcebuild.texi (vect_bool_cmp): Document.
* tree-vect-patterns.c (search_type_for_mask_1): If neither
operand to a boolean comparison is a natural vector mask,
handle both operands like normal integers instead.

gcc/testsuite/
* gcc.dg/vect/vect-bool-cmp-2.c: New test.
* lib/target-supports.exp (check_effective_target_vect_bool_cmp): New
effective target procedure.

Index: gcc/doc/sourcebuild.texi
===
--- gcc/doc/sourcebuild.texi 2019-11-20 21:11:59.065472803 +
+++ gcc/doc/sourcebuild.texi 2019-11-29 09:11:21.365130870 +
@@ -1522,6 +1522,10 @@ Target does not support a vector add ins
 @item vect_no_bitwise
 Target does not support vector bitwise instructions.
 
+@item vect_bool_cmp
+Target supports comparison of @code{bool} vectors for at least one
+vector length.
+
 @item vect_char_add
 Target supports addition of @code{char} vectors for at least one
 vector length.
Index: gcc/tree-vect-patterns.c
===
--- gcc/tree-vect-patterns.c 2019-11-16 10:29:21.207212217 +
+++ gcc/tree-vect-patterns.c 2019-11-29 09:11:21.389130702 +
@@ -3944,7 +3944,8 @@ search_type_for_mask_1 (tree var, vec_in
 vinfo, cache);
  if (!res || (res2 && TYPE_PRECISION (res) > TYPE_PRECISION (res2)))
res = res2;
- break;
+ if (res)
+   break;
}
 
  comp_vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (rhs1));
Index: gcc/testsuite/gcc.dg/vect/vect-bool-cmp-2.c
===
--- /dev/null   2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.dg/vect/vect-bool-cmp-2.c 2019-11-29 09:11:21.373130815 +
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+
+void
+f (_Bool *restrict x, _Bool *restrict y)
+{
+  for (int i = 0; i < 128; ++i)
+x[i] = x[i] == y[i];
+}
+
+/* { dg-final { scan-tree-dump "loop vectorized" "vect" { target vect_bool_cmp } } } */
Index: gcc/testsuite/lib/target-supports.exp
===
--- gcc/testsuite/lib/target-supports.exp   2019-11-26 22:11:24.494545152 +
+++ gcc/testsuite/lib/target-supports.exp   2019-11-29 09:11:21.373130815 +
@@ -5749,6 +5749,16 @@ proc check_effective_target_vect_bswap {
 || [istarget amdgcn-*-*] }}]
 }
 
+# Return 1 if the target supports comparison of bool vectors for at
+# least one vector length.
+
+proc check_effective_target_vect_bool_cmp { } {
+return [check_cached_effective_target_indexed vect_bool_cmp {
+  expr { [istarget i?86-*-*] || [istarget x86_64-*-*]
+|| [istarget aarch64*-*-*]
+|| [is-effective-target arm_neon] }}]
+}
+
 # Return 1 if the target supports addition of char vectors for at least
 # one vector length.
 


[2/5] Make vectorizable_operation punt early on codes it doesn't handle

2019-11-29 Thread Richard Sandiford
vectorizable_operation returned false for codes that are handled by
vectorizable_shift, but only after it had already done a lot of work.
Checking earlier should be more efficient and avoid polluting the logs
with duplicate info.

Also, there was no such early-out for comparisons or COND_EXPRs.
Fixing that avoids a false scan-tree-dump hit with a later patch.
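
The shape of the change can be modelled in a few lines.  This is a toy sketch only: the Code enum, handles_operation and the work_done counter are hypothetical stand-ins and have nothing to do with the real tree codes or dump machinery.

```cpp
#include <cassert>

// Hypothetical stand-in for the tree codes mentioned in the patch.
enum class Code { Plus, Minus, LShift, RShift, LRotate, RRotate, Lt, Eq, Cond };

bool is_shift_or_rotate (Code c)
{
  return c == Code::LShift || c == Code::RShift
         || c == Code::LRotate || c == Code::RRotate;
}

bool is_comparison (Code c)
{
  return c == Code::Lt || c == Code::Eq;
}

// Return true if this toy handler goes on to analyse the statement.
// *work_done stands in for the expensive checks and the dump output.
bool handles_operation (Code code, int *work_done)
{
  // Punt before doing any work on codes that other handlers own.
  if (is_shift_or_rotate (code) || is_comparison (code) || code == Code::Cond)
    return false;

  ++*work_done;
  return true;
}
```

The point is just that rejected codes leave work_done untouched, which is the "no wasted analysis, no duplicate log noise" property described above.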


2019-11-29  Richard Sandiford  

gcc/
* tree-vect-stmts.c (vectorizable_operation): Punt early
on codes that are handled elsewhere.

Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   2019-11-29 08:28:12.015121876 +
+++ gcc/tree-vect-stmts.c   2019-11-29 09:11:24.553108756 +
@@ -5999,6 +5999,21 @@ vectorizable_operation (stmt_vec_info st
 
   orig_code = code = gimple_assign_rhs_code (stmt);
 
+  /* Shifts are handled in vectorizable_shift.  */
+  if (code == LSHIFT_EXPR
+  || code == RSHIFT_EXPR
+  || code == LROTATE_EXPR
+  || code == RROTATE_EXPR)
+   return false;
+
+  /* Comparisons are handled in vectorizable_comparison.  */
+  if (TREE_CODE_CLASS (code) == tcc_comparison)
+return false;
+
+  /* Conditions are handled in vectorizable_condition.  */
+  if (code == COND_EXPR)
+return false;
+
   /* For pointer addition and subtraction, we should use the normal
  plus and minus for the vector operation.  */
   if (code == POINTER_PLUS_EXPR)
@@ -6123,11 +6138,6 @@ vectorizable_operation (stmt_vec_info st
 
   gcc_assert (ncopies >= 1);
 
-  /* Shifts are handled in vectorizable_shift ().  */
-  if (code == LSHIFT_EXPR || code == RSHIFT_EXPR || code == LROTATE_EXPR
-  || code == RROTATE_EXPR)
-   return false;
-
   /* Supportable by target?  */
 
   vec_mode = TYPE_MODE (vectype);


[3/5] Make vect_get_mask_type_for_stmt take a group size

2019-11-29 Thread Richard Sandiford
This patch makes vect_get_mask_type_for_stmt and
get_mask_type_for_scalar_type take a group size instead of
the SLP node, so that later patches can call it before an
SLP node has been built.
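
As a rough model of the interface change, the sketch below takes a plain group size instead of a node, so it can be called before any node exists.  Everything here is hypothetical: pick_nunits, the halving rule and the SlpNode struct are illustrative assumptions, not the logic of get_vectype_for_scalar_type.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for an SLP node: just its scalar statements.
struct SlpNode { std::vector<int> scalar_stmts; };

// Choose a number of vector elements.  If GROUP_SIZE is nonzero and we
// are doing BB vectorisation, make sure the result is no bigger than
// GROUP_SIZE (here by repeated halving, which is an assumption).
unsigned int pick_nunits (unsigned int natural_nunits, bool bb_vectorization,
                          unsigned int group_size = 0)
{
  unsigned int n = natural_nunits;
  if (bb_vectorization && group_size != 0)
    while (n > group_size && n > 1)
      n /= 2;
  return n;
}

// A caller that does have a node simply passes its length.
unsigned int pick_nunits_for_node (unsigned int natural_nunits,
                                   const SlpNode &node)
{
  return pick_nunits (natural_nunits, true, node.scalar_stmts.size ());
}
```

A caller without a node can pass whatever group size it already knows, or leave the default of 0 for "no limit".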


2019-11-29  Richard Sandiford  

gcc/
* tree-vectorizer.h (get_mask_type_for_scalar_type): Replace
the slp_tree parameter with a group size parameter.
(vect_get_mask_type_for_stmt): Likewise.
* tree-vect-stmts.c (get_mask_type_for_scalar_type): Likewise.
(vect_get_mask_type_for_stmt): Likewise.
* tree-vect-slp.c (vect_slp_analyze_node_operations_1): Update
call accordingly.

Index: gcc/tree-vectorizer.h
===
--- gcc/tree-vectorizer.h   2019-11-16 10:40:08.422638677 +
+++ gcc/tree-vectorizer.h   2019-11-29 09:11:27.781086362 +
@@ -1640,7 +1640,7 @@ extern tree get_related_vectype_for_scal
 poly_uint64 = 0);
 extern tree get_vectype_for_scalar_type (vec_info *, tree, unsigned int = 0);
 extern tree get_vectype_for_scalar_type (vec_info *, tree, slp_tree);
-extern tree get_mask_type_for_scalar_type (vec_info *, tree, slp_tree = 0);
+extern tree get_mask_type_for_scalar_type (vec_info *, tree, unsigned int = 0);
 extern tree get_same_sized_vectype (tree, tree);
 extern bool vect_chooses_same_modes_p (vec_info *, machine_mode);
 extern bool vect_get_loop_mask_type (loop_vec_info);
@@ -1693,7 +1693,7 @@ extern gcall *vect_gen_while (tree, tree
 extern tree vect_gen_while_not (gimple_seq *, tree, tree, tree);
 extern opt_result vect_get_vector_types_for_stmt (stmt_vec_info, tree *,
  tree *, unsigned int = 0);
-extern opt_tree vect_get_mask_type_for_stmt (stmt_vec_info, slp_tree = 0);
+extern opt_tree vect_get_mask_type_for_stmt (stmt_vec_info, unsigned int = 0);
 
 /* In tree-vect-data-refs.c.  */
 extern bool vect_can_force_dr_alignment_p (const_tree, poly_uint64);
Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   2019-11-29 09:11:24.553108756 +
+++ gcc/tree-vect-stmts.c   2019-11-29 09:11:27.781086362 +
@@ -11362,14 +11362,15 @@ get_vectype_for_scalar_type (vec_info *v
 
Returns the mask type corresponding to a result of comparison
of vectors of specified SCALAR_TYPE as supported by target.
-   NODE, if nonnull, is the SLP tree node that will use the returned
-   vector type.  */
+   If GROUP_SIZE is nonzero and we're performing BB vectorization,
+   make sure that the number of elements in the vector is no bigger
+   than GROUP_SIZE.  */
 
 tree
 get_mask_type_for_scalar_type (vec_info *vinfo, tree scalar_type,
-  slp_tree node)
+  unsigned int group_size)
 {
-  tree vectype = get_vectype_for_scalar_type (vinfo, scalar_type, node);
+  tree vectype = get_vectype_for_scalar_type (vinfo, scalar_type, group_size);
 
   if (!vectype)
 return NULL;
@@ -12229,11 +12230,12 @@ vect_get_vector_types_for_stmt (stmt_vec
 
 /* Try to determine the correct vector type for STMT_INFO, which is a
statement that produces a scalar boolean result.  Return the vector
-   type on success, otherwise return NULL_TREE.  NODE, if nonnull,
-   is the SLP tree node that will use the returned vector type.  */
+   type on success, otherwise return NULL_TREE.  If GROUP_SIZE is nonzero
+   and we're performing BB vectorization, make sure that the number of
+   elements in the vector is no bigger than GROUP_SIZE.  */
 
 opt_tree
-vect_get_mask_type_for_stmt (stmt_vec_info stmt_info, slp_tree node)
+vect_get_mask_type_for_stmt (stmt_vec_info stmt_info, unsigned int group_size)
 {
   vec_info *vinfo = stmt_info->vinfo;
   gimple *stmt = stmt_info->stmt;
@@ -12245,7 +12247,8 @@ vect_get_mask_type_for_stmt (stmt_vec_in
   && !VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (gimple_assign_rhs1 (stmt))))
 {
   scalar_type = TREE_TYPE (gimple_assign_rhs1 (stmt));
-  mask_type = get_mask_type_for_scalar_type (vinfo, scalar_type, node);
+  mask_type = get_mask_type_for_scalar_type (vinfo, scalar_type,
+group_size);
 
   if (!mask_type)
return opt_tree::failure_at (stmt,
Index: gcc/tree-vect-slp.c
===
--- gcc/tree-vect-slp.c 2019-11-26 22:04:58.099362339 +
+++ gcc/tree-vect-slp.c 2019-11-29 09:11:27.777086392 +
@@ -2757,7 +2757,8 @@ vect_slp_analyze_node_operations_1 (vec_
   bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info);
   if (bb_vinfo && STMT_VINFO_VECTYPE (stmt_info) == boolean_type_node)
 {
-  tree vectype = vect_get_mask_type_for_stmt (stmt_info, node);
+  unsigned int group_size = SLP_TREE_SCALAR_STMTS (node).length ();
+  tree vectype = vect_get_mask_type_for_stmt (stmt_info, group_size);
   if (!vec

[4/5] Record the vector mask precision in stmt_vec_info

2019-11-29 Thread Richard Sandiford
search_type_for_mask uses a worklist to search a chain of boolean
operations for a natural vector mask type.  This patch instead does
that in vect_determine_stmt_precisions, where we also look for
overpromoted integer operations.  We then only need to compute
the precision once and can cache it in the stmt_vec_info.

The new function vect_determine_mask_precision is supposed
to handle exactly the same cases as search_type_for_mask_1,
and in the same way.  There's a lot we could improve here,
but that's not stage 3 material.

I wondered about sharing mask_precision with other fields like
operation_precision, but in the end that seemed too dangerous.
We have patterns to convert between boolean and non-boolean
operations and it would be very easy to get mixed up about
which case the fields are describing.
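
The three-way encoding of the new field (documented in the tree-vectorizer.h hunk of this patch) can be tried out in isolation.  StmtInfo below is a cut-down hypothetical stand-in for stmt_vec_info; only the mask_precision field and the predicate mirror the patch.

```cpp
#include <cassert>

// Cut-down stand-in for stmt_vec_info: only the new field.
struct StmtInfo
{
  // N    : use the vector mask type associated with N-bit integers.
  // ~0U  : a mask type was considered, but the boolean is treated as a
  //        normal integer instead.
  // 0    : not a candidate for a mask type, or no choice made yet.
  unsigned int mask_precision = 0;
};

// Same test as vect_use_mask_type_p in the patch.
bool use_mask_type_p (const StmtInfo &stmt_info)
{
  return stmt_info.mask_precision && stmt_info.mask_precision != ~0U;
}
```

Keeping this in its own field, rather than reusing operation_precision, means a reader never has to guess whether a stored precision describes a mask or an ordinary integer operation.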


2019-11-29  Richard Sandiford  

gcc/
* tree-vectorizer.h (stmt_vec_info::mask_precision): New field.
(vect_use_mask_type_p): New function.
* tree-vect-patterns.c (vect_init_pattern_stmt): Copy the
mask precision to the pattern statement.
(append_pattern_def_seq): Add a scalar_type_for_mask parameter
and use it to initialize the new stmt's mask precision.
(search_type_for_mask_1): Delete.
(search_type_for_mask): Replace with...
(integer_type_for_mask): ...this new function.  Use the information
cached in the stmt_vec_info.
(vect_recog_bool_pattern): Update accordingly.
(build_mask_conversion): Pass the scalar type associated with the
mask type to append_pattern_def_seq.
(vect_recog_mask_conversion_pattern): Likewise.  Call
integer_type_for_mask instead of search_type_for_mask.
(vect_convert_mask_for_vectype): Call integer_type_for_mask instead
of search_type_for_mask.
(possible_vector_mask_operation_p): New function.
(vect_determine_mask_precision): Likewise.
(vect_determine_stmt_precisions): Call it.

Index: gcc/tree-vectorizer.h
===
--- gcc/tree-vectorizer.h   2019-11-29 09:11:27.781086362 +
+++ gcc/tree-vectorizer.h   2019-11-29 09:11:31.277062112 +
@@ -1089,6 +1089,23 @@ typedef struct data_reference *dr_p;
   unsigned int operation_precision;
   signop operation_sign;
 
+  /* If the statement produces a boolean result, this value describes
+ how we should choose the associated vector type.  The possible
+ values are:
+
+ - an integer precision N if we should use the vector mask type
+   associated with N-bit integers.  This is only used if all relevant
+   input booleans also want the vector mask type for N-bit integers,
+   or if we can convert them into that form by pattern-matching.
+
+ - ~0U if we considered choosing a vector mask type but decided
+   to treat the boolean as a normal integer type instead.
+
+ - 0 otherwise.  This means either that the operation isn't one that
+   could have a vector mask type (and so should have a normal vector
+   type instead) or that we simply haven't made a choice either way.  */
+  unsigned int mask_precision;
+
   /* True if this is only suitable for SLP vectorization.  */
   bool slp_vect_only_p;
 };
@@ -1245,6 +1262,15 @@ nested_in_vect_loop_p (class loop *loop,
  && (loop->inner == (gimple_bb (stmt_info->stmt))->loop_father));
 }
 
+/* Return true if STMT_INFO should produce a vector mask type rather than
+   a normal nonmask type.  */
+
+static inline bool
+vect_use_mask_type_p (stmt_vec_info stmt_info)
+{
+  return stmt_info->mask_precision && stmt_info->mask_precision != ~0U;
+}
+
 /* Return TRUE if a statement represented by STMT_INFO is a part of a
pattern.  */
 
Index: gcc/tree-vect-patterns.c
===
--- gcc/tree-vect-patterns.c 2019-11-29 09:11:21.389130702 +
+++ gcc/tree-vect-patterns.c 2019-11-29 09:11:31.277062112 +
@@ -112,7 +112,12 @@ vect_init_pattern_stmt (gimple *pattern_
   STMT_VINFO_DEF_TYPE (pattern_stmt_info)
 = STMT_VINFO_DEF_TYPE (orig_stmt_info);
   if (!STMT_VINFO_VECTYPE (pattern_stmt_info))
-STMT_VINFO_VECTYPE (pattern_stmt_info) = vectype;
+{
+  gcc_assert (VECTOR_BOOLEAN_TYPE_P (vectype)
+ == vect_use_mask_type_p (orig_stmt_info));
+  STMT_VINFO_VECTYPE (pattern_stmt_info) = vectype;
+  pattern_stmt_info->mask_precision = orig_stmt_info->mask_precision;
+}
   return pattern_stmt_info;
 }
 
@@ -131,17 +136,25 @@ vect_set_pattern_stmt (gimple *pattern_s
 
 /* Add NEW_STMT to STMT_INFO's pattern definition statements.  If VECTYPE
is nonnull, record that NEW_STMT's vector type is VECTYPE, which might
-   be different from the vector type of the final pattern statement.  */
+   be different from the vector type of the final pattern statement.
+   If VECTYPE is a mask type, SCALAR_TYPE_FOR_MASK is the scalar type
+   from 

[PATCH] ipa-cp: Avoid ICEs when looking at expanded thunks and unoptimized functions (PR 92476)

2019-11-29 Thread Martin Jambor
Hi,

the patch below fixes the i686 failures reported in PR 92476.  Newly
expanded "artificial" thunks need to be analyzed when expanded so that
we create necessary function summaries and jump functions for them.
They still don't get IPA-CP lattices, so I looked at all accesses to
those and verified that only the functions saving IPA-VR and IPA-bits
analyses could try to access non-existing lattices.

After that, Martin's testcase in comment 4 of the bug also revealed two
places where we try to access summaries of unoptimized functions and
segfault, so I fixed those too.  Unfortunately it seems our testsuite
cannot optimize different LTO compilation units with different options
and so I could not add the testcase there.  But it no longer ICEs.
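
The fixes all follow the same defensive pattern: look the summary up, and only dereference it if it exists.  Here is a minimal sketch of that pattern, assuming a hypothetical map from node ids to summaries; Summary, SummaryMap and the function bodies are illustrative and are not the IPA data structures.

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical per-function summary.
struct Summary
{
  bool node_dead = false;
  bool calling_single_call = false;
};

using SummaryMap = std::unordered_map<int, Summary>;

// Return the summary for NODE_ID, or nullptr if the function has none
// (e.g. it was not optimized).  Never creates an entry.
Summary *get_summary (SummaryMap &summaries, int node_id)
{
  auto it = summaries.find (node_id);
  return it == summaries.end () ? nullptr : &it->second;
}

// Check the flag only if the summary exists.
bool caller_is_dead (SummaryMap &summaries, int node_id)
{
  Summary *s = get_summary (summaries, node_id);
  return s && s->node_dead;
}

// Set the flag only if the summary exists; report whether we did.
bool set_single_call_flag (SummaryMap &summaries, int node_id)
{
  Summary *s = get_summary (summaries, node_id);
  if (!s)
    return false;
  s->calling_single_call = true;
  return true;
}
```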

Bootstrapped and LTO-profile-bootstrapped and tested on x86_64-linux and
I also verified the -m32 testsuite failures are all gone.  OK for trunk?

Thanks,

Martin


2019-11-28  Martin Jambor  
Jan Hubicka  

PR ipa/92476
* ipa-cp.c (set_single_call_flag): Set node_calling_single_call in
the summary only if the summary exists.
(find_more_scalar_values_for_callers_subset): Check node_dead in
the summary only if the summary exists.
(ipcp_store_bits_results): Ignore nodes without lattices.
(ipcp_store_vr_results): Likewise.
* cgraphclones.c: Include ipa-fnsummary.h and ipa-prop.h and the
header files required by them.
(cgraph_node::expand_all_artificial_thunks): Analyze expanded thunks.
---
 gcc/cgraphclones.c |  7 +++
 gcc/ipa-cp.c   | 10 --
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/gcc/cgraphclones.c b/gcc/cgraphclones.c
index ac5c57a47aa..8e86e82a226 100644
--- a/gcc/cgraphclones.c
+++ b/gcc/cgraphclones.c
@@ -80,6 +80,11 @@ along with GCC; see the file COPYING3.  If not see
 #include "tree-inline.h"
 #include "dumpfile.h"
 #include "gimple-pretty-print.h"
+#include "alloc-pool.h"
+#include "symbol-summary.h"
+#include "tree-vrp.h"
+#include "ipa-prop.h"
+#include "ipa-fnsummary.h"
 
 /* Create clone of edge in the node N represented by CALL_EXPR
the callgraph.  */
@@ -267,6 +272,8 @@ cgraph_node::expand_all_artificial_thunks ()
  {
thunk->thunk.thunk_p = false;
thunk->analyze ();
+   ipa_analyze_node (thunk);
+   inline_analyze_function (thunk);
  }
thunk->expand_all_artificial_thunks ();
   }
diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index 31a98a3d98a..7fb9f30f709 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -1165,7 +1165,7 @@ set_single_call_flag (cgraph_node *node, void *)
   /* Local thunks can be handled transparently, skip them.  */
   while (cs && cs->caller->thunk.thunk_p && cs->caller->local)
 cs = cs->next_caller;
-  if (cs)
+  if (cs && IPA_NODE_REF (cs->caller))
 {
   IPA_NODE_REF (cs->caller)->node_calling_single_call = true;
   return true;
@@ -4411,7 +4411,7 @@ find_more_scalar_values_for_callers_subset (struct cgraph_node *node,
  struct ipa_jump_func *jump_func;
  tree t;
 
- if (IPA_NODE_REF (cs->caller)->node_dead)
+ if (IPA_NODE_REF (cs->caller) && IPA_NODE_REF (cs->caller)->node_dead)
continue;
 
  if (!IPA_EDGE_REF (cs)
@@ -5416,6 +5416,9 @@ ipcp_store_bits_results (void)
 
   if (info->ipcp_orig_node)
info = IPA_NODE_REF (info->ipcp_orig_node);
+  if (!info->lattices)
+   /* Newly expanded artificial thunks do not have lattices.  */
+   continue;
 
   unsigned count = ipa_get_param_count (info);
   for (unsigned i = 0; i < count; i++)
@@ -5489,6 +5492,9 @@ ipcp_store_vr_results (void)
 
   if (info->ipcp_orig_node)
info = IPA_NODE_REF (info->ipcp_orig_node);
+  if (!info->lattices)
+   /* Newly expanded artificial thunks do not have lattices.  */
+   continue;
 
   unsigned count = ipa_get_param_count (info);
   for (unsigned i = 0; i < count; i++)
-- 
2.24.0



[5/5] Don't defer choice of vector type for bools (PR 92596)

2019-11-29 Thread Richard Sandiford
Now that stmt_vec_info records the choice between vector mask
types and normal nonmask types, we can use that information in
vect_get_vector_types_for_stmt instead of deferring the choice
of vector type till later.

vect_get_mask_type_for_stmt used to check whether the boolean inputs
to an operation:
(a) consistently used mask types or consistently used nonmask types; and
(b) agreed on the number of elements.

(b) shouldn't be a problem when (a) is met.  If the operation
consistently uses mask types, tree-vect-patterns.c will have corrected
any mismatches in mask precision.  (This is because we only use mask
types for a small well-known set of operations and tree-vect-patterns.c
knows how to handle any that could have different mask precisions.)
And if the operation consistently uses normal nonmask types, there's
no reason why booleans should need extra vector compatibility checks
compared to ordinary integers.

So the potential difficulties all seem to come from (a).  Now that
we've chosen the result type ahead of time, we also have to consider
whether the outputs and inputs consistently use mask types.

Taking each vectorizable_* routine in turn:

- vectorizable_call

vect_get_vector_types_for_stmt only handled booleans specially
for gassigns, so vect_get_mask_type_for_stmt never had a chance to
handle calls.  I'm not sure we support any calls that operate on
booleans, but as things stand, a boolean result would always have
a nonmask type.  Presumably any vector argument would also need to
use nonmask types, unless it corresponds to internal_fn_mask_index
(which is already a special case).

For safety, I've added a check for mask/nonmask combinations here
even though we didn't check this previously.

- vectorizable_simd_clone_call

Again, vect_get_mask_type_for_stmt never had a chance to handle calls.
The result of the call will always be a nonmask type and the patch
for PR 92710 rejects mask arguments.  So all booleans should
consistently use nonmask types here.

- vectorizable_conversion

The function already rejects any conversion between booleans in which
one type isn't a mask type.

- vectorizable_operation

This function definitely needs a consistency check, e.g. to handle
& and | in which one operand is loaded from memory and the other is
a comparison result.  Ideally we'd handle this via pattern stmts
instead (like we do for the all-mask case), but that's future work.

- vectorizable_assignment

VECT_SCALAR_BOOLEAN_TYPE_P requires single-bit precision, so the
current code already rejects problematic cases.

- vectorizable_load

Loads always produce nonmask types and there are no relevant inputs
to check against.

- vectorizable_store

vect_check_store_rhs already rejects mask/nonmask combinations
via useless_type_conversion_p.

- vectorizable_reduction
- vectorizable_lc_phi

PHIs always have nonmask types.  After the change above, attempts
to combine the PHI result with a mask type would be rejected by
vectorizable_operation.  (Again, it would be better to handle
this using pattern stmts.)

- vectorizable_induction

We don't generate inductions for booleans.

- vectorizable_shift

The function already rejects boolean shifts via type_has_mode_precision_p.

- vectorizable_condition

The function already rejects mismatches via useless_type_conversion_p.

- vectorizable_comparison

The function already rejects comparisons between mask and nonmask types.
The result is always a mask type.
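
To make the vectorizable_operation case above concrete, here is a
minimal C sketch of the kind of loop meant by "one operand is loaded
from memory and the other is a comparison result".  The function and
parameter names are hypothetical; they are not taken from the patch or
its tests, and the sketch only shows the scalar source form, not what
the vectorizer does with it.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical example.  flags[i] is a boolean loaded from memory,
   i.e. a data value for which a normal nonmask vector type is natural.
   a[i] < b[i] is a comparison result, for which a vector mask type is
   natural.  The & therefore mixes the two kinds of boolean.  */
void
hybrid_and (bool *restrict out, const bool *restrict flags,
            const int *restrict a, const int *restrict b, int n)
{
  for (int i = 0; i < n; ++i)
    out[i] = flags[i] & (a[i] < b[i]);
}

/* Small self-check of the scalar semantics.  */
void
hybrid_and_demo (void)
{
  bool out[4];
  const bool flags[4] = { true, true, false, false };
  const int a[4] = { 1, 5, 1, 5 };
  const int b[4] = { 2, 2, 2, 2 };

  hybrid_and (out, flags, a, b, 4);
  assert (out[0] && !out[1] && !out[2] && !out[3]);
}
```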


2019-11-29  Richard Sandiford  

gcc/
PR tree-optimization/92596
* tree-vect-stmts.c (vectorizable_call): Punt on hybrid mask/nonmask
operations.
(vectorizable_operation): Likewise, instead of relying on
vect_get_mask_type_for_stmt to do this.
(vect_get_vector_types_for_stmt): Always return a vector type
immediately, rather than deferring the choice for boolean results.
Use a vector mask type instead of a normal vector if
vect_use_mask_type_p.
(vect_get_mask_type_for_stmt): Delete.
* tree-vect-loop.c (vect_determine_vf_for_stmt_1): Remove
mask_producers argument and special boolean_type_node handling.
(vect_determine_vf_for_stmt): Remove mask_producers argument and
update calls to vect_determine_vf_for_stmt_1.  Remove doubled call.
(vect_determine_vectorization_factor): Update call accordingly.
* tree-vect-slp.c (vect_build_slp_tree_1): Remove special
boolean_type_node handling.
(vect_slp_analyze_node_operations_1): Likewise.

gcc/testsuite/
PR tree-optimization/92596
* gcc.dg/vect/bb-slp-pr92596.c: New test.
* gcc.dg/vect/bb-slp-43.c: Likewise.

Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   2019-11-29 09:13:43.0 +
+++ gcc/tree-

Re: [2/5] Make vectorizable_operation punt early on codes it doesn't handle

2019-11-29 Thread Richard Biener
On Fri, Nov 29, 2019 at 11:13 AM Richard Sandiford
 wrote:
>
> vectorizable_operation returned false for codes that are handled by
> vectorizable_shift, but only after it had already done a lot of work.
> Checking earlier should be more efficient and avoid polluting the logs
> with duplicate info.
>
> Also, there was no such early-out for comparisons or COND_EXPRs.
> Fixing that avoids a false scan-tree-dump hit with a later patch.

OK.

>
> 2019-11-29  Richard Sandiford  
>
> gcc/
> * tree-vect-stmts.c (vectorizable_operation): Punt early
> on codes that are handled elsewhere.
>
> Index: gcc/tree-vect-stmts.c
> ===
> --- gcc/tree-vect-stmts.c   2019-11-29 08:28:12.015121876 +
> +++ gcc/tree-vect-stmts.c   2019-11-29 09:11:24.553108756 +
> @@ -5999,6 +5999,21 @@ vectorizable_operation (stmt_vec_info st
>
>orig_code = code = gimple_assign_rhs_code (stmt);
>
> +  /* Shifts are handled in vectorizable_shift.  */
> +  if (code == LSHIFT_EXPR
> +  || code == RSHIFT_EXPR
> +  || code == LROTATE_EXPR
> +  || code == RROTATE_EXPR)
> +   return false;
> +
> +  /* Comparisons are handled in vectorizable_comparison.  */
> +  if (TREE_CODE_CLASS (code) == tcc_comparison)
> +return false;
> +
> +  /* Conditions are handled in vectorizable_condition.  */
> +  if (code == COND_EXPR)
> +return false;
> +
>/* For pointer addition and subtraction, we should use the normal
>   plus and minus for the vector operation.  */
>if (code == POINTER_PLUS_EXPR)
> @@ -6123,11 +6138,6 @@ vectorizable_operation (stmt_vec_info st
>
>gcc_assert (ncopies >= 1);
>
> -  /* Shifts are handled in vectorizable_shift ().  */
> -  if (code == LSHIFT_EXPR || code == RSHIFT_EXPR || code == LROTATE_EXPR
> -  || code == RROTATE_EXPR)
> -   return false;
> -
>/* Supportable by target?  */
>
>vec_mode = TYPE_MODE (vectype);
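
The reordering in the quoted patch can be pictured with a toy model.
This is only a sketch under stated assumptions: the enum, the counter
and every toy_* name are hypothetical stand-ins, not GCC's tree codes
or the real vectorizable_operation.  It shows codes owned by other
routines being rejected before any costly analysis runs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for a handful of tree codes.  */
enum toy_code
{
  TOY_PLUS, TOY_LSHIFT, TOY_RSHIFT, TOY_LROTATE, TOY_RROTATE,
  TOY_LT, TOY_COND
};

/* Counts how often the (imaginary) expensive analysis is reached.  */
int toy_expensive_checks_run;

static bool
toy_comparison_p (enum toy_code code)
{
  return code == TOY_LT;
}

bool
toy_vectorizable_operation (enum toy_code code)
{
  /* Shifts and rotates belong to another routine.  */
  if (code == TOY_LSHIFT || code == TOY_RSHIFT
      || code == TOY_LROTATE || code == TOY_RROTATE)
    return false;

  /* So do comparisons and conditions.  */
  if (toy_comparison_p (code) || code == TOY_COND)
    return false;

  /* Only now do the costly work.  */
  ++toy_expensive_checks_run;
  return true;
}

void
toy_early_out_demo (void)
{
  assert (!toy_vectorizable_operation (TOY_RSHIFT));
  assert (toy_expensive_checks_run == 0);
  assert (toy_vectorizable_operation (TOY_PLUS));
  assert (toy_expensive_checks_run == 1);
}
```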


Re: [3/5] Make vect_get_mask_type_for_stmt take a group size

2019-11-29 Thread Richard Biener
On Fri, Nov 29, 2019 at 11:14 AM Richard Sandiford
 wrote:
>
> This patch makes vect_get_mask_type_for_stmt and
> get_mask_type_for_scalar_type take a group size instead of
> the SLP node, so that later patches can call it before an
> SLP node has been built.

OK.

>
> 2019-11-29  Richard Sandiford  
>
> gcc/
> * tree-vectorizer.h (get_mask_type_for_scalar_type): Replace
> the slp_tree parameter with a group size parameter.
> (vect_get_mask_type_for_stmt): Likewise.
> * tree-vect-stmts.c (get_mask_type_for_scalar_type): Likewise.
> (vect_get_mask_type_for_stmt): Likewise.
> * tree-vect-slp.c (vect_slp_analyze_node_operations_1): Update
> call accordingly.
>
> Index: gcc/tree-vectorizer.h
> ===
> --- gcc/tree-vectorizer.h   2019-11-16 10:40:08.422638677 +
> +++ gcc/tree-vectorizer.h   2019-11-29 09:11:27.781086362 +
> @@ -1640,7 +1640,7 @@ extern tree get_related_vectype_for_scal
>  poly_uint64 = 0);
>  extern tree get_vectype_for_scalar_type (vec_info *, tree, unsigned int = 0);
>  extern tree get_vectype_for_scalar_type (vec_info *, tree, slp_tree);
> -extern tree get_mask_type_for_scalar_type (vec_info *, tree, slp_tree = 0);
> +extern tree get_mask_type_for_scalar_type (vec_info *, tree, unsigned int = 0);
>  extern tree get_same_sized_vectype (tree, tree);
>  extern bool vect_chooses_same_modes_p (vec_info *, machine_mode);
>  extern bool vect_get_loop_mask_type (loop_vec_info);
> @@ -1693,7 +1693,7 @@ extern gcall *vect_gen_while (tree, tree
>  extern tree vect_gen_while_not (gimple_seq *, tree, tree, tree);
>  extern opt_result vect_get_vector_types_for_stmt (stmt_vec_info, tree *,
>   tree *, unsigned int = 0);
> -extern opt_tree vect_get_mask_type_for_stmt (stmt_vec_info, slp_tree = 0);
> +extern opt_tree vect_get_mask_type_for_stmt (stmt_vec_info, unsigned int = 0);
>
>  /* In tree-vect-data-refs.c.  */
>  extern bool vect_can_force_dr_alignment_p (const_tree, poly_uint64);
> Index: gcc/tree-vect-stmts.c
> ===
> --- gcc/tree-vect-stmts.c   2019-11-29 09:11:24.553108756 +
> +++ gcc/tree-vect-stmts.c   2019-11-29 09:11:27.781086362 +
> @@ -11362,14 +11362,15 @@ get_vectype_for_scalar_type (vec_info *v
>
> Returns the mask type corresponding to a result of comparison
> of vectors of specified SCALAR_TYPE as supported by target.
> -   NODE, if nonnull, is the SLP tree node that will use the returned
> -   vector type.  */
> +   If GROUP_SIZE is nonzero and we're performing BB vectorization,
> +   make sure that the number of elements in the vector is no bigger
> +   than GROUP_SIZE.  */
>
>  tree
>  get_mask_type_for_scalar_type (vec_info *vinfo, tree scalar_type,
> -  slp_tree node)
> +  unsigned int group_size)
>  {
> -  tree vectype = get_vectype_for_scalar_type (vinfo, scalar_type, node);
> +  tree vectype = get_vectype_for_scalar_type (vinfo, scalar_type, group_size);
>
>if (!vectype)
>  return NULL;
> @@ -12229,11 +12230,12 @@ vect_get_vector_types_for_stmt (stmt_vec
>
>  /* Try to determine the correct vector type for STMT_INFO, which is a
> statement that produces a scalar boolean result.  Return the vector
> -   type on success, otherwise return NULL_TREE.  NODE, if nonnull,
> -   is the SLP tree node that will use the returned vector type.  */
> +   type on success, otherwise return NULL_TREE.  If GROUP_SIZE is nonzero
> +   and we're performing BB vectorization, make sure that the number of
> +   elements in the vector is no bigger than GROUP_SIZE.  */
>
>  opt_tree
> -vect_get_mask_type_for_stmt (stmt_vec_info stmt_info, slp_tree node)
> +vect_get_mask_type_for_stmt (stmt_vec_info stmt_info, unsigned int group_size)
>  {
>vec_info *vinfo = stmt_info->vinfo;
>gimple *stmt = stmt_info->stmt;
> @@ -12245,7 +12247,8 @@ vect_get_mask_type_for_stmt (stmt_vec_in
>&& !VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (gimple_assign_rhs1 (stmt
>  {
>scalar_type = TREE_TYPE (gimple_assign_rhs1 (stmt));
> -  mask_type = get_mask_type_for_scalar_type (vinfo, scalar_type, node);
> +  mask_type = get_mask_type_for_scalar_type (vinfo, scalar_type,
> +group_size);
>
>if (!mask_type)
> return opt_tree::failure_at (stmt,
> Index: gcc/tree-vect-slp.c
> ===
> --- gcc/tree-vect-slp.c 2019-11-26 22:04:58.099362339 +
> +++ gcc/tree-vect-slp.c 2019-11-29 09:11:27.777086392 +
> @@ -2757,7 +2757,8 @@ vect_slp_analyze_node_operations_1 (vec_
>bb_vec_info bb_vinfo = STMT_VINFO_BB_VINFO (stmt_info);
>if (bb_vinfo && STMT_VINFO_VECTYPE (stmt_info) == boolean

Re: [patch] follow up on the aarch64 r18 story

2019-11-29 Thread Olivier Hainque
Hi Richard,

> On 21 Nov 2019, at 23:44, Olivier Hainque  wrote:

>> +/* The pair of scratch registers used for stack probing during prologue.  */
>> +#define PROBE_STACK_FIRST_REG   R10_REGNUM
>> +#define PROBE_STACK_SECOND_REG  R11_REGNUM
>> +
>> 
>> These should be moved to the define_constant in aarch64.md that defines all 
>> the register numbers (add them near the end of the list, where the
>> other aliases are defined).
> 
> Sure, will adjust and retest. Thanks.

Here's an updated version of the patch with your suggestion
incorporated, bootstrapped and regression checked for languages=all
on a native aarch64-linux host.

I replaced the _REG suffix by _REGNUM to match all the other
definitions in aarch64.md, just for consistency.

With Kind Regards,

Olivier

2019-11-07  Olivier Hainque  

* config/aarch64/aarch64.md: Define PROBE_STACK_FIRST_REGNUM
and PROBE_STACK_SECOND_REGNUM constants, designating r10/r11.
Replacements for the PROBE_STACK_FIRST/SECOND_REG constants in
aarch64.c.
* config/aarch64/aarch64.h (TARGET_OS_USES_R18): New macro,
default value 0 that target OS configuration files may redefine.
(STATIC_CHAIN_REGNUM): r9 if TARGET_OS_USES_R18, r18 otherwise.
* config/aarch64/aarch64.c (PROBE_STACK_FIRST_REG): Remove.
(PROBE_STACK_SECOND_REG): Remove.
(aarch64_emit_probe_stack_range): Adjust to the _REG -> _REGNUM
suffix update for PROBE_STACK register numbers.
(aarch64_conditional_register_usage): Preserve r18 if the target
OS uses it, and check that the static chain selection wouldn't
conflict.
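
As a rough illustration of the aarch64.h and aarch64.c entries above,
the selection and the conflict check could be modelled as follows.
This is a standalone sketch, not the patch itself: the register values
are assumed to follow the x9/x18 numbering, and every TOY_ or toy_
name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Register numbers, assumed to match the x9/x18 numbering.  */
enum { R9_REGNUM = 9, R18_REGNUM = 18 };

/* Default value 0; an OS configuration file may redefine it.  */
#ifndef TARGET_OS_USES_R18
#define TARGET_OS_USES_R18 0
#endif

/* r9 if the OS reserves r18, r18 otherwise (per the ChangeLog).  */
#define TOY_STATIC_CHAIN_REGNUM \
  (TARGET_OS_USES_R18 ? R9_REGNUM : R18_REGNUM)

/* True if REGNO must be left alone because the OS uses it.  */
bool
toy_os_reserves_reg_p (int regno)
{
  return TARGET_OS_USES_R18 && regno == R18_REGNUM;
}

/* True if the chosen static chain register is not reserved.  */
bool
toy_static_chain_ok_p (void)
{
  return !toy_os_reserves_reg_p (TOY_STATIC_CHAIN_REGNUM);
}

/* Mirrors the idea of checking the selection for conflicts.  */
void
toy_check_static_chain (void)
{
  assert (toy_static_chain_ok_p ());
}
```

Compiling the sketch with -DTARGET_OS_USES_R18=1 switches the toy
static chain register to r9 and marks r18 as reserved.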



aarch64-os-r18.diff
Description: Binary data


Re: [PATCH] ipa-cp: Avoid ICEs when looking at expanded thunks and unoptimized functions (PR 92476)

2019-11-29 Thread Jan Hubicka
> Hi,
> 
> the patch below fixes the i686 failures reported in PR 92476.  Newly
> expanded "artificial" thunks need to be analyzed when expanded so that
> we create necessary function summaries and jump functions for them.
> They still don't get IPA-CP lattices, so I looked at all accesses to
> those and verified that only the functions saving IPA-VR and IPA-bits
> analyses could try to access non-existing lattices.
> 
> After that, Martin's testcase in comment 4 of the bug also revealed two
> places where we try to access summaries of unoptimized functions and
> segfault, so I fixed those too.  Unfortunately it seems our testsuite
> cannot optimize different LTO compilation units with different options
> and so I could not add the testcase there.  But it no longer ICEs.
I think you can simply add different flags to the different testcase files:
20090210_1.c:/* { dg-options "-fPIC" { target { ! sparc*-*-* } } } */
20090218-1_1.c:/* { dg-options "-fgnu89-inline" } */
20090218-2_1.c:/* { dg-options { -fgnu89-inline } } */
20111207-1_1.c:/* { dg-options "-fno-lto" } */

> 
> Bootstrapped and LTO-profile-bootstrapped and tested on x86_64-linux and
> I also verified the -m32 testsuite failures are all gone.  OK for trunk?
> 
> Thanks,
> 
> Martin
> 
> 
> 2019-11-28  Martin Jambor  
> Jan Hubicka  
> 
>   PR ipa/92476
>   * ipa-cp.c (set_single_call_flag): Set node_calling_single_call in
>   the summary only if the summary exists.
>   (find_more_scalar_values_for_callers_subset): Check node_dead in
>   the summary only if the summary exists.
>   (ipcp_store_bits_results): Ignore nodes without lattices.
>   (ipcp_store_vr_results): Likewise.
>   * cgraphclones.c: Include ipa-fnsummary.h and ipa-prop.h and the
>   header files required by them.
>   (cgraph_node::expand_all_artificial_thunks): Analyze expanded thunks.

OK, thanks
Honza
> ---
>  gcc/cgraphclones.c |  7 +++
>  gcc/ipa-cp.c   | 10 --
>  2 files changed, 15 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/cgraphclones.c b/gcc/cgraphclones.c
> index ac5c57a47aa..8e86e82a226 100644
> --- a/gcc/cgraphclones.c
> +++ b/gcc/cgraphclones.c
> @@ -80,6 +80,11 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-inline.h"
>  #include "dumpfile.h"
>  #include "gimple-pretty-print.h"
> +#include "alloc-pool.h"
> +#include "symbol-summary.h"
> +#include "tree-vrp.h"
> +#include "ipa-prop.h"
> +#include "ipa-fnsummary.h"
>  
>  /* Create clone of edge in the node N represented by CALL_EXPR
> the callgraph.  */
> @@ -267,6 +272,8 @@ cgraph_node::expand_all_artificial_thunks ()
> {
>   thunk->thunk.thunk_p = false;
>   thunk->analyze ();
> + ipa_analyze_node (thunk);
> + inline_analyze_function (thunk);
> }
>   thunk->expand_all_artificial_thunks ();
>}
> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
> index 31a98a3d98a..7fb9f30f709 100644
> --- a/gcc/ipa-cp.c
> +++ b/gcc/ipa-cp.c
> @@ -1165,7 +1165,7 @@ set_single_call_flag (cgraph_node *node, void *)
>/* Local thunks can be handled transparently, skip them.  */
>while (cs && cs->caller->thunk.thunk_p && cs->caller->local)
>  cs = cs->next_caller;
> -  if (cs)
> +  if (cs && IPA_NODE_REF (cs->caller))
>  {
>IPA_NODE_REF (cs->caller)->node_calling_single_call = true;
>return true;
> @@ -4411,7 +4411,7 @@ find_more_scalar_values_for_callers_subset (struct cgraph_node *node,
> struct ipa_jump_func *jump_func;
> tree t;
>  
> -   if (IPA_NODE_REF (cs->caller)->node_dead)
> +   if (IPA_NODE_REF (cs->caller) && IPA_NODE_REF (cs->caller)->node_dead)
>   continue;
>  
> if (!IPA_EDGE_REF (cs)
> @@ -5416,6 +5416,9 @@ ipcp_store_bits_results (void)
>  
>if (info->ipcp_orig_node)
>   info = IPA_NODE_REF (info->ipcp_orig_node);
> +  if (!info->lattices)
> + /* Newly expanded artificial thunks do not have lattices.  */
> + continue;
>  
>unsigned count = ipa_get_param_count (info);
>for (unsigned i = 0; i < count; i++)
> @@ -5489,6 +5492,9 @@ ipcp_store_vr_results (void)
>  
>if (info->ipcp_orig_node)
>   info = IPA_NODE_REF (info->ipcp_orig_node);
> +  if (!info->lattices)
> + /* Newly expanded artificial thunks do not have lattices.  */
> + continue;
>  
>unsigned count = ipa_get_param_count (info);
>for (unsigned i = 0; i < count; i++)
> -- 
> 2.24.0
> 


Re: [4/5] Record the vector mask precision in stmt_vec_info

2019-11-29 Thread Richard Biener
On Fri, Nov 29, 2019 at 11:14 AM Richard Sandiford
 wrote:
>
> search_type_for_mask uses a worklist to search a chain of boolean
> operations for a natural vector mask type.  This patch instead does
> that in vect_determine_stmt_precisions, where we also look for
> overpromoted integer operations.  We then only need to compute
> the precision once and can cache it in the stmt_vec_info.
>
> The new function vect_determine_mask_precision is supposed
> to handle exactly the same cases as search_type_for_mask_1,
> and in the same way.  There's a lot we could improve here,
> but that's not stage 3 material.
>
> I wondered about sharing mask_precision with other fields like
> operation_precision, but in the end that seemed too dangerous.
> We have patterns to convert between boolean and non-boolean
> operations and it would be very easy to get mixed up about
> which case the fields are describing.
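
The three-way encoding that the quoted patch documents for
mask_precision (N, ~0U or 0) can be exercised in isolation.  The
sketch below is a simplified model: struct toy_stmt_info and
toy_use_mask_type_p are hypothetical stand-ins that copy only the
predicate's logic from vect_use_mask_type_p, nothing else from the
vectorizer.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical cut-down stand-in for stmt_vec_info.  */
struct toy_stmt_info
{
  /* N   : use the vector mask type associated with N-bit integers.
     ~0U : a mask type was considered, but the boolean is treated as
           a normal integer instead.
     0   : not a mask candidate, or no choice has been made.  */
  unsigned int mask_precision;
};

/* Same test as the patch's vect_use_mask_type_p.  */
static inline bool
toy_use_mask_type_p (const struct toy_stmt_info *info)
{
  return info->mask_precision && info->mask_precision != ~0U;
}

void
toy_mask_precision_demo (void)
{
  struct toy_stmt_info mask32 = { 32 };
  struct toy_stmt_info as_integer = { ~0U };
  struct toy_stmt_info undecided = { 0 };

  assert (toy_use_mask_type_p (&mask32));
  assert (!toy_use_mask_type_p (&as_integer));
  assert (!toy_use_mask_type_p (&undecided));
}
```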

OK.

>
> 2019-11-29  Richard Sandiford  
>
> gcc/
> * tree-vectorizer.h (stmt_vec_info::mask_precision): New field.
> (vect_use_mask_type_p): New function.
> * tree-vect-patterns.c (vect_init_pattern_stmt): Copy the
> mask precision to the pattern statement.
> (append_pattern_def_seq): Add a scalar_type_for_mask parameter
> and use it to initialize the new stmt's mask precision.
> (search_type_for_mask_1): Delete.
> (search_type_for_mask): Replace with...
> (integer_type_for_mask): ...this new function.  Use the information
> cached in the stmt_vec_info.
> (vect_recog_bool_pattern): Update accordingly.
> (build_mask_conversion): Pass the scalar type associated with the
> mask type to append_pattern_def_seq.
> (vect_recog_mask_conversion_pattern): Likewise.  Call
> integer_type_for_mask instead of search_type_for_mask.
> (vect_convert_mask_for_vectype): Call integer_type_for_mask instead
> of search_type_for_mask.
> (possible_vector_mask_operation_p): New function.
> (vect_determine_mask_precision): Likewise.
> (vect_determine_stmt_precisions): Call it.
>
> Index: gcc/tree-vectorizer.h
> ===
> --- gcc/tree-vectorizer.h   2019-11-29 09:11:27.781086362 +
> +++ gcc/tree-vectorizer.h   2019-11-29 09:11:31.277062112 +
> @@ -1089,6 +1089,23 @@ typedef struct data_reference *dr_p;
>unsigned int operation_precision;
>signop operation_sign;
>
> +  /* If the statement produces a boolean result, this value describes
> + how we should choose the associated vector type.  The possible
> + values are:
> +
> + - an integer precision N if we should use the vector mask type
> +   associated with N-bit integers.  This is only used if all relevant
> +   input booleans also want the vector mask type for N-bit integers,
> +   or if we can convert them into that form by pattern-matching.
> +
> + - ~0U if we considered choosing a vector mask type but decided
> +   to treat the boolean as a normal integer type instead.
> +
> + - 0 otherwise.  This means either that the operation isn't one that
> +   could have a vector mask type (and so should have a normal vector
> +   type instead) or that we simply haven't made a choice either way.  */
> +  unsigned int mask_precision;
> +
>/* True if this is only suitable for SLP vectorization.  */
>bool slp_vect_only_p;
>  };
> @@ -1245,6 +1262,15 @@ nested_in_vect_loop_p (class loop *loop,
>   && (loop->inner == (gimple_bb (stmt_info->stmt))->loop_father));
>  }
>
> +/* Return true if STMT_INFO should produce a vector mask type rather than
> +   a normal nonmask type.  */
> +
> +static inline bool
> +vect_use_mask_type_p (stmt_vec_info stmt_info)
> +{
> +  return stmt_info->mask_precision && stmt_info->mask_precision != ~0U;
> +}
> +
>  /* Return TRUE if a statement represented by STMT_INFO is a part of a
> pattern.  */
>
> Index: gcc/tree-vect-patterns.c
> ===
> --- gcc/tree-vect-patterns.c2019-11-29 09:11:21.389130702 +
> +++ gcc/tree-vect-patterns.c2019-11-29 09:11:31.277062112 +
> @@ -112,7 +112,12 @@ vect_init_pattern_stmt (gimple *pattern_
>STMT_VINFO_DEF_TYPE (pattern_stmt_info)
>  = STMT_VINFO_DEF_TYPE (orig_stmt_info);
>if (!STMT_VINFO_VECTYPE (pattern_stmt_info))
> -STMT_VINFO_VECTYPE (pattern_stmt_info) = vectype;
> +{
> +  gcc_assert (VECTOR_BOOLEAN_TYPE_P (vectype)
> + == vect_use_mask_type_p (orig_stmt_info));
> +  STMT_VINFO_VECTYPE (pattern_stmt_info) = vectype;
> +  pattern_stmt_info->mask_precision = orig_stmt_info->mask_precision;
> +}
>return pattern_stmt_info;
>  }
>
> @@ -131,17 +136,25 @@ vect_set_pattern_stmt (gimple *pattern_s
>
>  /* Add NEW_STMT to STMT_INFO's pattern definition statements.  If VECTYPE
> is nonnull, record that NEW_ST

Re: [1/5] Improve tree-vect-patterns.c handling of boolean comparisons

2019-11-29 Thread Richard Biener
On Fri, Nov 29, 2019 at 11:12 AM Richard Sandiford
 wrote:
>
> vect_recog_bool_pattern assumed that a comparison between two booleans
> should always become a comparison of vector mask types (implemented as an
> XOR_EXPR).  But if the booleans in question are generated as data values
> (e.g. because they're loaded directly from memory), we should treat them
> like ordinary integers instead, just as we do for boolean logic ops whose
> operands are loaded from memory.  vect_get_mask_type_for_stmt already
> handled this case:
>
>   /* We may compare boolean value loaded as vector of integers.
>  Fix mask_type in such case.  */
>   if (mask_type
>   && !VECTOR_BOOLEAN_TYPE_P (mask_type)
>   && gimple_code (stmt) == GIMPLE_ASSIGN
>   && TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == tcc_comparison)
> mask_type = truth_type_for (mask_type);
>
> and not handling it here complicated later patches.
>
> The initial list of targets for vect_bool_cmp is deliberately conservative.

OK.

Richard.

>
> 2019-11-30  Richard Sandiford  
>
> gcc/
> * doc/sourcebuild.texi (vect_bool_cmp): Document.
> * tree-vect-patterns.c (search_type_for_mask_1): If neither
> operand to a boolean comparison is a natural vector mask,
> handle both operands like normal integers instead.
>
> gcc/testsuite/
> * gcc.dg/vect/vect-bool-cmp-2.c: New test.
> * lib/target-supports.exp (check_effective_target_vect_bool_cmp): New
> effective target procedure.
>
> Index: gcc/doc/sourcebuild.texi
> ===
> --- gcc/doc/sourcebuild.texi2019-11-20 21:11:59.065472803 +
> +++ gcc/doc/sourcebuild.texi2019-11-29 09:11:21.365130870 +
> @@ -1522,6 +1522,10 @@ Target does not support a vector add ins
>  @item vect_no_bitwise
>  Target does not support vector bitwise instructions.
>
> +@item vect_bool_cmp
> +Target supports comparison of @code{bool} vectors for at least one
> +vector length.
> +
>  @item vect_char_add
>  Target supports addition of @code{char} vectors for at least one
>  vector length.
> Index: gcc/tree-vect-patterns.c
> ===
> --- gcc/tree-vect-patterns.c2019-11-16 10:29:21.207212217 +
> +++ gcc/tree-vect-patterns.c2019-11-29 09:11:21.389130702 +
> @@ -3944,7 +3944,8 @@ search_type_for_mask_1 (tree var, vec_in
>  vinfo, cache);
>   if (!res || (res2 && TYPE_PRECISION (res) > TYPE_PRECISION (res2)))
> res = res2;
> - break;
> + if (res)
> +   break;
> }
>
>   comp_vectype = get_vectype_for_scalar_type (vinfo, TREE_TYPE (rhs1));
> Index: gcc/testsuite/gcc.dg/vect/vect-bool-cmp-2.c
> ===
> --- /dev/null   2019-09-17 11:41:18.176664108 +0100
> +++ gcc/testsuite/gcc.dg/vect/vect-bool-cmp-2.c 2019-11-29 09:11:21.373130815 +
> @@ -0,0 +1,10 @@
> +/* { dg-do compile } */
> +
> +void
> +f (_Bool *restrict x, _Bool *restrict y)
> +{
> +  for (int i = 0; i < 128; ++i)
> +x[i] = x[i] == y[i];
> +}
> +
> +/* { dg-final { scan-tree-dump "loop vectorized" "vect" { target vect_bool_cmp } } } */
> Index: gcc/testsuite/lib/target-supports.exp
> ===
> --- gcc/testsuite/lib/target-supports.exp   2019-11-26 22:11:24.494545152 +
> +++ gcc/testsuite/lib/target-supports.exp   2019-11-29 09:11:21.373130815 +
> @@ -5749,6 +5749,16 @@ proc check_effective_target_vect_bswap {
>  || [istarget amdgcn-*-*] }}]
>  }
>
> +# Return 1 if the target supports comparison of bool vectors for at
> +# least one vector length.
> +
> +proc check_effective_target_vect_bool_cmp { } {
> +return [check_cached_effective_target_indexed vect_bool_cmp {
> +  expr { [istarget i?86-*-*] || [istarget x86_64-*-*]
> +|| [istarget aarch64*-*-*]
> +|| [is-effective-target arm_neon] }}]
> +}
> +
>  # Return 1 if the target supports addition of char vectors for at least
>  # one vector length.
>
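
For readers following the quoted description, this small C sketch
spells out the scalar semantics involved.  It uses the same loop shape
as vect-bool-cmp-2.c and checks that, on 0/1 values, boolean equality
matches an inverted exclusive OR.  The function names are
hypothetical, and the sketch says nothing about the code the
vectorizer actually generates.

```c
#include <assert.h>
#include <stdbool.h>

/* Same loop shape as the new test: both operands are booleans
   loaded from memory, i.e. ordinary data values.  */
void
bool_eq (bool *restrict x, const bool *restrict y, int n)
{
  for (int i = 0; i < n; ++i)
    x[i] = x[i] == y[i];
}

/* The same result expressed through XOR: for 0/1 values,
   equality is the inverse of exclusive OR.  */
void
bool_eq_via_xor (bool *restrict x, const bool *restrict y, int n)
{
  for (int i = 0; i < n; ++i)
    x[i] = !(x[i] ^ y[i]);
}

void
bool_eq_demo (void)
{
  bool a[4] = { false, false, true, true };
  bool b[4] = { false, false, true, true };
  const bool y[4] = { false, true, false, true };

  bool_eq (a, y, 4);
  bool_eq_via_xor (b, y, 4);
  for (int i = 0; i < 4; ++i)
    assert (a[i] == b[i]);
  assert (a[0] && !a[1] && !a[2] && a[3]);
}
```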


Fix DR_GROUP_GAP for strided accesses (PR 92677)

2019-11-29 Thread Richard Sandiford
When dissolving an SLP-only group of accesses, we should only set
the gap to group_size - 1 for normal non-strided groups.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


2019-11-29  Richard Sandiford  

gcc/
PR tree-optimization/92677
* tree-vect-loop.c (vect_dissolve_slp_only_groups): Set the gap
to zero when dissolving a group of strided accesses.

gcc/testsuite/
PR tree-optimization/92677
* gcc.dg/vect/pr92677.c: New test.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c2019-11-29 09:13:43.764143091 +
+++ gcc/tree-vect-loop.c2019-11-29 10:52:30.475476141 +
@@ -1829,7 +1829,10 @@ vect_dissolve_slp_only_groups (loop_vec_
  DR_GROUP_FIRST_ELEMENT (vinfo) = vinfo;
  DR_GROUP_NEXT_ELEMENT (vinfo) = NULL;
  DR_GROUP_SIZE (vinfo) = 1;
- DR_GROUP_GAP (vinfo) = group_size - 1;
+ if (STMT_VINFO_STRIDED_P (first_element))
+   DR_GROUP_GAP (vinfo) = 0;
+ else
+   DR_GROUP_GAP (vinfo) = group_size - 1;
  vinfo = next;
}
}
Index: gcc/testsuite/gcc.dg/vect/pr92677.c
===
--- /dev/null   2019-09-17 11:41:18.176664108 +0100
+++ gcc/testsuite/gcc.dg/vect/pr92677.c 2019-11-29 10:52:30.475476141 +
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-additional-options "-O3" } */
+
+int a, c;
+int *b;
+long d;
+double *e;
+
+void fn1() {
+  long f;
+  double g, h;
+  while (c) {
+if (d) {
+  g = *e;
+  *(b + 4) = g;
+}
+if (f) {
+  h = *(e + 2);
+  *(b + 6) = h;
+}
+e += a;
+b += 8;
+c--;
+d += 2;
+  }
+}
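
The effect of the tree-vect-loop.c hunk can be modelled outside GCC.
The following is a sketch under stated assumptions: struct toy_access
and toy_dissolve_group are hypothetical, and only the gap rule (zero
for strided groups, group_size - 1 otherwise) comes from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the per-access grouping data.  */
struct toy_access
{
  unsigned int group_size;
  unsigned int group_gap;
};

/* Dissolve a group of GROUP_SIZE accesses into single-element groups.
   STRIDED_P plays the role of STMT_VINFO_STRIDED_P (first_element).  */
void
toy_dissolve_group (struct toy_access *accesses, unsigned int group_size,
                    bool strided_p)
{
  for (unsigned int i = 0; i < group_size; ++i)
    {
      accesses[i].group_size = 1;
      accesses[i].group_gap = strided_p ? 0 : group_size - 1;
    }
}

void
toy_dissolve_demo (void)
{
  struct toy_access normal[4], strided[4];

  toy_dissolve_group (normal, 4, false);
  toy_dissolve_group (strided, 4, true);
  for (unsigned int i = 0; i < 4; ++i)
    {
      assert (normal[i].group_size == 1 && normal[i].group_gap == 3);
      assert (strided[i].group_size == 1 && strided[i].group_gap == 0);
    }
}
```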


Re: [C] Add a target hook that allows targets to verify type usage

2019-11-29 Thread Richard Sandiford
Ping

Richard Sandiford  writes:
> This patch adds a new target hook to check whether there are any
> target-specific reasons why a type cannot be used in a certain
> source-language context.  It works in a similar way to existing
> hooks like TARGET_INVALID_CONVERSION and TARGET_INVALID_UNARY_OP.
>
> The reason for adding the hook is to report invalid uses of SVE types.
> Throughout a TU, the SVE vector and predicate types represent values
> that can be stored in an SVE vector or predicate register.  At certain
> points in the TU we might be able to generate code that assumes the
> registers have a particular size, but often we can't.  In some cases
> we might even make multiple different assumptions in the same TU
> (e.g. when implementing an ifunc for multiple vector lengths).
>
> But SVE types themselves are the same type throughout.  The register
> size assumptions change how we generate code, but they don't change
> the definition of the types.
>
> This means that the types do not have a fixed size at the C level
> even when -msve-vector-bits=N is in effect.  It also means that the
> size does not work in the same way as for C VLAs, where the abstract
> machine evaluates the size at a particular point and then carries that
> size forward to later code.
>
> The SVE ACLE deals with this by making it invalid to use C and C++
> constructs that depend on the size or layout of SVE types.  The spec
> refers to the types as "sizeless" types and defines their semantics as
> edits to the standards.  See:
>
>   https://gcc.gnu.org/ml/gcc-patches/2018-10/msg00868.html
>
> for a fuller description and:
>
>   https://gcc.gnu.org/ml/gcc/2019-11/msg00088.html
>
> for a recent update on the status.
>
> However, since all current sizeless types are target-specific built-in
> types, there's no real reason for the frontends to handle them directly.
> They can just hand off the checks to target code instead.  It's then
> possible for the errors to refer to "SVE types" rather than "sizeless
> types", which is likely to be more meaningful to users.
>
> There is a slight overlap between the new tests and the ones for
> gnu_vector_type_p in r277950, but here the emphasis is on testing
> sizelessness.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>
> Richard
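
The hand-off described in the quoted message can be sketched as a
plain function-pointer hook.  This is a toy model only: the enum lists
just the context kinds visible in the target.h excerpt of this
message, while struct toy_type, the hook typedef and all toy_*
functions are hypothetical and do not reflect the real hook's
signature.

```c
#include <assert.h>
#include <stdbool.h>

/* Context kinds visible in the patch excerpt; the real enum may
   contain more entries.  */
enum type_context_kind
{
  TCTX_SIZEOF,
  TCTX_ALIGNOF,
  TCTX_STATIC_STORAGE,
  TCTX_THREAD_STORAGE,
  TCTX_FIELD
};

/* Hypothetical type descriptor.  */
struct toy_type
{
  const char *name;
  bool sizeless_p;
};

typedef bool (*toy_verify_hook) (enum type_context_kind,
                                 const struct toy_type *);

/* Default behaviour: the target has no objections.  */
bool
toy_default_verify (enum type_context_kind context,
                    const struct toy_type *type)
{
  (void) context;
  (void) type;
  return true;
}

/* Toy target hook: reject sizeless types in the contexts above.  */
bool
toy_sve_verify (enum type_context_kind context,
                const struct toy_type *type)
{
  if (!type->sizeless_p)
    return true;
  switch (context)
    {
    case TCTX_SIZEOF:
    case TCTX_ALIGNOF:
    case TCTX_STATIC_STORAGE:
    case TCTX_THREAD_STORAGE:
    case TCTX_FIELD:
      return false;
    }
  return true;
}

/* Front-end side: ask the hook before accepting sizeof (TYPE).  */
bool
toy_sizeof_allowed_p (toy_verify_hook hook, const struct toy_type *type)
{
  return hook (TCTX_SIZEOF, type);
}

void
toy_verify_demo (void)
{
  const struct toy_type sve = { "__SVInt8_t", true };
  const struct toy_type plain = { "int", false };

  assert (!toy_sizeof_allowed_p (toy_sve_verify, &sve));
  assert (toy_sizeof_allowed_p (toy_sve_verify, &plain));
  assert (toy_sizeof_allowed_p (toy_default_verify, &sve));
}
```

Keeping the decision behind a hook is what lets the target word its
diagnostics in terms of "SVE types", as the message notes.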

2019-11-12  Richard Sandiford  

gcc/
* target.h (type_context_kind): New enum.
(verify_type_context): Declare.
* target.def (verify_type_context): New target hook.
* doc/tm.texi.in (TARGET_VERIFY_TYPE_CONTEXT): Likewise.
* doc/tm.texi: Regenerate.
* tree.c (verify_type_context): New function.
* config/aarch64/aarch64-protos.h (aarch64_sve::verify_type_context):
Declare.
* config/aarch64/aarch64-sve-builtins.cc (verify_type_context):
New function.
* config/aarch64/aarch64.c (aarch64_verify_type_context): Likewise.
(TARGET_VERIFY_TYPE_CONTEXT): Define.

gcc/c-family/
* c-common.c (pointer_int_sum): Use verify_type_context to check
whether the target allows pointer arithmetic for the types involved.
(c_sizeof_or_alignof_type, c_alignof_expr): Use verify_type_context
to check whether the target allows sizeof and alignof operations
for the types involved.

gcc/c/
* c-decl.c (start_decl): Allow initialization of variables whose
size is a POLY_INT_CST.
(finish_decl): Use verify_type_context to check whether the target
allows variables with a particular type to have static or thread-local
storage duration.  Don't raise a second error if such variables do
not have a constant size.
(grokdeclarator): Use verify_type_context to check whether the
target allows fields or array elements to have a particular type.
* c-typeck.c (pointer_diff): Use verify_type_context to test whether
the target allows pointer difference for the types involved.
(build_unary_op): Likewise for pointer increment and decrement.

gcc/testsuite/
* gcc.target/aarch64/sve/acle/general-c/sizeless-1.c: New test.
* gcc.target/aarch64/sve/acle/general-c/sizeless-2.c: Likewise.

Index: gcc/target.h
===
--- gcc/target.h2019-11-08 08:31:17.0 +
+++ gcc/target.h2019-11-12 16:01:45.643584681 +
@@ -218,6 +218,35 @@ enum omp_device_kind_arch_isa {
   omp_device_isa
 };
 
+/* The contexts in which the use of a type T can be checked by
+   TARGET_VERIFY_TYPE_CONTEXT.  */
+enum type_context_kind {
+  /* Directly measuring the size of T.  */
+  TCTX_SIZEOF,
+
+  /* Directly measuring the alignment of T.  */
+  TCTX_ALIGNOF,
+
+  /* Creating objects of type T with static storage duration.  */
+  TCTX_STATIC_STORAGE,
+
+  /* Creating objects of type T with thread-local storage duration.  */
+  TCTX_THREAD_STORAGE,
+
+  /* Creating a field of type T.  */
+  TCTX_FIELD,
+
+  /* C

Ping: [C++ PATCH] Opt out of GNU vector extensions for built-in SVE types

2019-11-29 Thread Richard Sandiford
Ping

Richard Sandiford  writes:
> This is the C++ equivalent of r277950, which prevented the
> use of the GNU vector extensions with SVE vector types for C.
> [https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=277950].
> I've copied the rationale below for reference.
>
> The changes here are very similar to the C ones.  Perhaps the only
> noteworthy thing (that I know of) is that the patch continues to treat
> !gnu_vector_type_p vector types as literal types/potential constexprs.
> Disabling the GNU vector extensions shouldn't in itself stop the types
> from being literal types, since whatever the target provides instead
> might be constexpr material.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>
> Richard
>
> -
> The AArch64 port defines built-in SVE types at start-up under names
> like __SVInt8_t.  These types are represented in the front end and
> gimple as normal VECTOR_TYPEs and are code-generated as normal vectors.
> However, we'd like to stop the frontends from treating them in the
> same way as GNU-style ("vector_size") vectors, for several reasons:
>
> (1) We allowed the GNU vector extensions to be mixed with Advanced SIMD
> vector types and it ended up causing a lot of confusion on big-endian
> targets.  Although SVE handles big-endian vectors differently from
> Advanced SIMD, there are still potential surprises; see the block
> comment near the head of aarch64-sve.md for details.
>
> (2) One of the SVE vectors is a packed one-bit-per-element boolean vector.
> That isn't a combination the GNU vector extensions have supported
> before.  E.g. it means that vectors can no longer decompose to
> arrays for indexing, and that not all elements are individually
> addressable.  It also makes it less clear which order the initialiser
> should be in (lsb first, or bitfield ordering?).  We could define
> all that of course, but it seems a bit weird to go to the effort
> for this case when, given all the other reasons, we don't want the
> extensions anyway.
>
> (3) The GNU vector extensions only provide full-vector operations,
> which is a very artificial limitation on a predicated architecture
> like SVE.
>
> (4) The set of operations provided by the GNU vector extensions is
> relatively small, whereas the SVE intrinsics provide many more.
>
> (5) It makes it easier to ensure that (with default options) code is
> portable between compilers without the GNU vector extensions having
> to become an official part of the SVE intrinsics spec.
>
> (6) The length of the SVE types is usually not fixed at compile time,
> whereas the GNU vector extension is geared around fixed-length
> vectors.
>
> It's possible to specify the length of an SVE vector using the
> command-line option -msve-vector-bits=N, but in principle it should
> be possible to have functions compiled for different N in the same
> translation unit.  This isn't supported yet but would be very useful
> for implementing ifuncs.  Once mixing lengths in a translation unit
> is supported, the SVE types should represent the same type throughout
> the translation unit, just as GNU vector types do.
>
> However, when -msve-vector-bits=N is in effect, we do allow conversions
> between explicit GNU vector types of N bits and the corresponding SVE
> types.  This doesn't undermine the intent of (5) because in this case
> the use of GNU vector types is explicit and intentional.  It also doesn't
> undermine the intent of (6) because converting between the types is just
> a conditionally-supported operation.  In other words, the types still
> represent the same types throughout the translation unit, it's just that
> conversions between them are valid in cases where a certain precondition
> is known to hold.  It's similar to the way that the SVE vector types are
> defined throughout the translation unit but can only be used in functions
> for which SVE is enabled.
> -

2019-11-08  Richard Sandiford  

gcc/cp/
* cp-tree.h (CP_AGGREGATE_TYPE_P): Check for gnu_vector_type_p
instead of VECTOR_TYPE.
* call.c (build_conditional_expr_1): Restrict vector handling
to vectors that satisfy gnu_vector_type_p.  Don't treat the
"then" and "else" types as equivalent if they have the same
vector shape but differ in whether they're GNU vectors.
* cvt.c (ocp_convert): Only allow vectors to be converted
to bool if they satisfy gnu_vector_type_p.
(build_expr_type_conversion): Only allow conversions from
vectors if they satisfy gnu_vector_type_p.
* typeck.c (cp_build_binary_op): Only allow binary operators to be
applied to vectors if they satisfy gnu_vector_type_p.
(cp_build_unary_op): Likewise unary operators.
   

[PATCH] Stream memory access types in IPA ICF.

2019-11-29 Thread Martin Liška

Hello.

The patch streams at most 3 tree types that are
used for memory references in IPA ICF. That substantially reduces the
number of function bodies loaded in the WPA phase. Based on numbers for
Firefox we go from:

Init called for 87557 items (23.30%).
...
Totally needed symbols: 40580, fraction of loaded symbols: 46.35%

to:

Init called for 55844 items (14.86%).
...
Totally needed symbols: 40580, fraction of loaded symbols: 72.67%

Peak memory drops from 5.7GB to 4.8GB and WPA is faster by 5 seconds.
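
A minimal sketch of the idea described above, not the patch itself: each function records at most three memory access types, and those types are folded into the function's hash, so that functions with equal statement hashes but different access types fall into different classes before any body is loaded. The names FunctionSummary, record_access_type and refined_hash, and the integer type ids, are hypothetical stand-ins for the tree types and canonical-type pointers used in GCC.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a canonical type; GCC would use TYPE_CANONICAL.
using TypeId = std::uint32_t;

constexpr std::size_t kMaxAccessTypes = 3;

struct FunctionSummary {
  std::uint64_t stmt_hash = 0;       // hash of the statements alone
  std::vector<TypeId> access_types;  // at most kMaxAccessTypes entries
};

// Record the type of one memory access, keeping only the first three.
void record_access_type(FunctionSummary &fn, TypeId type) {
  if (fn.access_types.size() < kMaxAccessTypes)
    fn.access_types.push_back(type);
}

// Fold the recorded access types into the statement hash.
std::uint64_t refined_hash(const FunctionSummary &fn) {
  std::uint64_t h = fn.stmt_hash;
  for (TypeId t : fn.access_types)
    h = h * 31 + t;  // simple polynomial mix, for illustration only
  return h;
}
```

Two template instances with identical statement hashes but different access types then get different refined hashes, which is the case the numbers above are about.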

The patch bootstraps on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

gcc/ChangeLog:

2019-11-27  Martin Liska  

PR ipa/92535
* ipa-icf.c (sem_function::sem_function): Initialize
memory_access_types.
(record_memory_op_type): New function.
(sem_function::init): Walk memory accesses for GIMPLE
statements.
(sem_item_optimizer::write_summary): Stream memory_access_types.
(sem_item_optimizer::read_section): Read memory_access_types
and hash pointer to canonical types.
(sem_item_optimizer::execute): Update hash by memory
access type.
(sem_item_optimizer::update_hash_by_memory_access_type):
New.
* ipa-icf.h (memory_access_types): New.
(m_canonical_types_hash): Likewise.
(update_hash_by_memory_access_type): Likewise.
---
 gcc/ipa-icf.c | 60 +--
 gcc/ipa-icf.h | 10 +
 2 files changed, 68 insertions(+), 2 deletions(-)


diff --git a/gcc/ipa-icf.c b/gcc/ipa-icf.c
index bec7cbc7201..0a219821bc6 100644
--- a/gcc/ipa-icf.c
+++ b/gcc/ipa-icf.c
@@ -66,6 +66,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "coverage.h"
 #include "gimple-pretty-print.h"
 #include "data-streamer.h"
+#include "tree-streamer.h"
 #include "fold-const.h"
 #include "calls.h"
 #include "varasm.h"
@@ -84,6 +85,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "stor-layout.h"
 #include "dbgcnt.h"
 #include "tree-vector-builder.h"
+#include "gimple-walk.h"
 
 using namespace ipa_icf_gimple;
 
@@ -225,14 +227,16 @@ hash_map<const_tree, hashval_t> sem_item::m_type_hash_cache;
 /* Semantic function constructor that uses STACK as bitmap memory stack.  */
 
 sem_function::sem_function (bitmap_obstack *stack)
-: sem_item (FUNC, stack), m_checker (NULL), m_compared_func (NULL)
+  : sem_item (FUNC, stack), memory_access_types (), m_canonical_types_hash (0),
+m_checker (NULL), m_compared_func (NULL)
 {
   bb_sizes.create (0);
   bb_sorted.create (0);
 }
 
 sem_function::sem_function (cgraph_node *node, bitmap_obstack *stack)
-: sem_item (FUNC, node, stack), m_checker (NULL), m_compared_func (NULL)
+  : sem_item (FUNC, node, stack), memory_access_types (),
+m_canonical_types_hash (0), m_checker (NULL), m_compared_func (NULL)
 {
   bb_sizes.create (0);
   bb_sorted.create (0);
@@ -1332,6 +1336,22 @@ sem_function::merge (sem_item *alias_item)
   return true;
 }
 
+/* Callback for walk_stmt_load_store_addr_ops.  */
+
+static bool
+record_memory_op_type (gimple *, tree t, tree, void *data)
+{
+  t = get_base_address (t);
+  if (t != NULL_TREE)
+{
+  vec<tree> *access_types = (vec<tree> *) data;
+  if (access_types->length () < 3)
+	access_types->safe_push (TREE_TYPE (t));
+}
+
+  return false;
+}
+
 /* Semantic item initialization function.  */
 
 void
@@ -1385,6 +1405,10 @@ sem_function::init (ipa_icf_gimple::func_checker *checker)
 	  {
 		hash_stmt (stmt, hstate);
 		nondbg_stmt_count++;
+
+		walk_stmt_load_store_addr_ops (stmt, &memory_access_types,
+	   record_memory_op_type,
+	   record_memory_op_type, NULL);
 	  }
 	  }
 
@@ -2134,6 +2158,14 @@ sem_item_optimizer::write_summary (void)
 	  streamer_write_uhwi_stream (ob->main_stream, node_ref);
 
 	  streamer_write_uhwi (ob, (*item)->get_hash ());
+
+	  if ((*item)->type == FUNC)
+	{
+	  sem_function *fn = static_cast<sem_function *> (*item);
+	  streamer_write_uhwi (ob, fn->memory_access_types.length ());
+	  for (unsigned i = 0; i < fn->memory_access_types.length (); i++)
+		stream_write_tree (ob, fn->memory_access_types[i], true);
+	}
 	}
 }
 
@@ -2185,6 +2217,14 @@ sem_item_optimizer::read_section (lto_file_decl_data *file_data,
 	  cgraph_node *cnode = dyn_cast <cgraph_node *> (node);
 
 	  sem_function *fn = new sem_function (cnode, &m_bmstack);
+	  unsigned count = streamer_read_uhwi (&ib_main);
+	  inchash::hash hstate (0);
+	  for (unsigned i = 0; i < count; i++)
+	{
+	  tree type = stream_read_tree (&ib_main, data_in);
+	  hstate.add_ptr (TYPE_CANONICAL (type));
+	}
+	  fn->m_canonical_types_hash = hstate.end ();
 	  fn->set_hash (hash);
 	  m_items.safe_push (fn);
 	}
@@ -2381,6 +2421,7 @@ sem_item_optimizer::execute (void)
 
   build_graph ();
   update_hash_by_addr_refs ();
+  update_hash_by_memory_access_type ();
   build_hash_based_classes ();
 
   if (dump_file)
@@ -2518,6 +2559,21 @@ sem_item_o

Re: [Patch][OpenMP] Fix use_device_… with absent optional arg

2019-11-29 Thread Tobias Burnus
Revised patch after some reconsideration (and finding tons of issues
with VALUE + OPTIONAL in gfortran itself). What the patch does [all
related to use_device_{addr,ptr} and 'optional' arguments]:


For assumed-shape arrays, the compiler generates code like "if (arg)
arg.0 = arg->data;". Hence, arg.0 was always available, but possibly
pointing to uninitialized memory. Likewise, for arguments passed by
value, 'arg' (i.e. &arg) is always available. But, in the absent
case, if arg->data is not NULL or the per-value decl is not mapped (cf.
test case), that's not the best idea.


I thought that I didn't need a condition in those cases, but it turns
out that the offloading plugin might (rightly!) complain that the
address is not mapped. Hence, I now add the is-present condition
for those cases as well; as none remain, the do_optional_check is now gone.


However, after the library call, omp_arr.arg is known to be initialized.
In this case, I keep the do_optional_check check. This avoids code
which boils down to "x0 = arg ? omp_arr.arg : NULL" and keeps
generating conditions only for complex code such as: if (present) {
tmp.data = omp_arr.arg; arg = &tmp; } else {arg = NULL;}.
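
As a rough model of the is-present guard discussed above (written in C++ rather than Fortran or GIMPLE), the sketch below only consults the host-to-device table when the optional argument is present. DeviceTable, lookup and use_device_addr are hypothetical names for illustration and are not libgomp API; an absent argument is modelled as a null pointer.

```cpp
#include <cassert>
#include <map>

// Hypothetical model of a host-to-device address table.
struct DeviceTable {
  std::map<const void *, void *> mapping;
  int lookups = 0;  // counts how often the table was consulted

  // Models a runtime lookup that must only see mapped addresses.
  void *lookup(const void *host) {
    ++lookups;
    auto it = mapping.find(host);
    assert(it != mapping.end() && "address is not mapped");
    return it->second;
  }
};

// Guard the lookup with the is-present check: an absent optional argument
// (modelled as a null pointer) yields null without touching the table.
void *use_device_addr(DeviceTable &table, const void *optional_arg) {
  if (optional_arg == nullptr)
    return nullptr;
  return table.lookup(optional_arg);
}
```

The point of the guard is that the lookup never sees an address that was not mapped, which is what the offloading plugin would otherwise complain about.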


Finally, while testing/exploring value+optional bugs, I stumbled over
'type(c_ptr),value' which is 'void *'. In particular, it is a pointer, but
the is-present status is still passed as a hidden argument. This patch
fixes both mapping and the is-present check.


Built on x86-64-gnu-linux; tested once on a system without offloading
support and once with nvptx offloading.

OK?

Tobias

PS: Regarding the issues with OPTIONAL and VALUE, see PR fortran/92703
and PR fortran/92702. Found issues: const-len character strings are
passed by value w/o a hidden is-present arg, but the function call passes one;
those and derived types/the outer class container are passed by value,
but the 'present()' check assumes pointers (hence: ICE); if absent,
null_ptr_node is passed, and I fear that this will give stack issues, at
least on some platforms. Additionally, deferred-length character
strings and arrays are permitted since F2008 but not yet supported.


	gcc/fortran/
	* trans-openmp.c (gfc_omp_is_optional_argument,
	gfc_omp_check_optional_argument): Handle type(c_ptr),value which uses a
	hidden argument for the is-present check.

	gcc/
	* omp-low.c (lower_omp_target): For use_device_ptr/use_device_addr
	and Fortran's optional arguments, unconditionally add the is-present
	condition before the libgomp call.

	libgomp/
	* testsuite/libgomp.fortran/use_device_ptr-optional-2.f90: Add
	'type(c_ptr), value' test case. Conditionally map the per-value
	passed arguments.

diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index d9dfcabc65e..f21785fa8c3 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -60,7 +60,8 @@ gfc_omp_is_allocatable_or_ptr (const_tree decl)
 
 /* True if the argument is an optional argument; except that false is also
returned for arguments with the value attribute (nonpointers) and for
-   assumed-shape variables (decl is a local variable containing arg->data).  */
+   assumed-shape variables (decl is a local variable containing arg->data).
+   Note that pvoid_type_node is for 'type(c_ptr), value.  */
 
 static bool
 gfc_omp_is_optional_argument (const_tree decl)
@@ -68,6 +69,7 @@ gfc_omp_is_optional_argument (const_tree decl)
   return (TREE_CODE (decl) == PARM_DECL
 	  && DECL_LANG_SPECIFIC (decl)
 	  && TREE_CODE (TREE_TYPE (decl)) == POINTER_TYPE
+	  && TREE_TYPE (decl) != pvoid_type_node
 	  && GFC_DECL_OPTIONAL_ARGUMENT (decl));
 }
 
@@ -99,9 +101,12 @@ gfc_omp_check_optional_argument (tree decl, bool for_present_check)
   || !GFC_DECL_OPTIONAL_ARGUMENT (decl))
 return NULL_TREE;
 
-  /* For VALUE, the scalar variable is passed as is but a hidden argument
- denotes the value.  Cf. trans-expr.c.  */
-  if (TREE_CODE (TREE_TYPE (decl)) != POINTER_TYPE)
+   /* Scalars with VALUE attribute which are passed by value use a hidden
+  argument to denote the present status.  They are passed as nonpointer type
+  with one exception: 'type(c_ptr), value' as '*void'.  */
+   /* Cf. trans-expr.c's gfc_conv_expr_present.  */
+   if (TREE_CODE (TREE_TYPE (decl)) != POINTER_TYPE
+   || TREE_TYPE (decl) == pvoid_type_node)
 {
   char name[GFC_MAX_SYMBOL_LEN + 2];
   tree tree_name;
diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 19132f76da2..0e66a68ff36 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -11981,8 +11981,6 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 	  case OMP_CLAUSE_USE_DEVICE_PTR:
 	  case OMP_CLAUSE_USE_DEVICE_ADDR:
 	  case OMP_CLAUSE_IS_DEVICE_PTR:
-	bool do_optional_check;
-	do_optional_check = false;
 	ovar = OMP_CLAUSE_DECL (c);
 	var = lookup_decl_in_outer_ctx (ovar, ctx);
 
@@ -12004,10 +12002,7 @@ lower_omp_target (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 	  }
 	type = 

Re: [Patch][OpenMP/OpenACC/Fortran] Fix mapping of optional (present|absent) arguments

2019-11-29 Thread Tobias Burnus

Early *PING*.

Tobias Burnus wrote:
This patch does two things regarding explicit and automatic variable
mapping to offloaded devices:


* Fixes bugs with optional arguments, which are present. They were 
mapped but the mapping had issues causing run-time failures.

* It now also handles absent optional arguments.

Compared to the previous patch set,** I added several OpenMP test 
cases – and fixed the fallout.


Except for trivial changes to libgomp/oacc-mem.c and omp-low.c, all 
changes are in fortran/trans-openmp.c and only affect optional arguments.


The patch was bootstrapped and tested on x86_64-gnu-linux w/o 
offloading-support configured and with nvptx offloading.


Tobias

** Included in the attached patch are the following previously posted 
patches: [1] the trivial libgomp/oacc-mem.c change, [2] only the 
remaining single-line change in omp-low.c, [3] the trans-openmp.c 
changes (which had to be modified+extended), and [5] the test cases. 
([2] and [4] are already in GCC 10.) See: 
https://gcc.gnu.org/ml/gcc-patches/2019-07/threads.html#00960 for the 
original patches.


PS: For full OpenMP support, (absent) optional arguments also need
to be handled for data-sharing clauses.




Re: [PATCH] Stream memory access types in IPA ICF.

2019-11-29 Thread Richard Biener
On Fri, Nov 29, 2019 at 12:37 PM Martin Liška  wrote:
>
> Hello.
>
> The patch is about streaming of at maximum 3 tree types that are
> used for memory references in IPA ICF. That helps rapidly to reduce
> number of function bodies loaded in WPA phase. Based on numbers for
> Firefox we get from:
>
> Init called for 87557 items (23.30%).
> ...
> Totally needed symbols: 40580, fraction of loaded symbols: 46.35%
>
> to:
>
> Init called for 55844 items (14.86%).
> ...
> Totally needed symbols: 40580, fraction of loaded symbols: 72.67%
>
> Where memory peak drops from 5.7GB to 4.8GB and WPA is faster by 5 seconds.
>
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>
> Ready to be installed?

It looks like a hack.  First of all, you do quite some useless work, since
the caller of walk_stmt_* could already check the vector length.
Second, you are streaming a quite random type here with not much
semantic value; plus, 't' is already the base of the memory reference.

To me it looks like the stmt hash is incredibly weak, for example
not considering a GIMPLE_COND comparison code,
doing weird stuff for commutative ops (the idea is probably to
hash in both orders?  but the implementation is completely off),
but not considering the same for commutative comparison ops
(or swapped ops).

To me it looks like if we want to hash types we could at least
hash basic type properties and, for aggregates, hash the ODR
name if one exists.
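
To make the hashing points above concrete, here is a small sketch under stated assumptions: it hashes a binary statement so that the operation code always contributes, commutative operands are order-insensitive, and a swapped comparison hashes like its mirror image. The Op enum, mix and hash_binary are hypothetical and unrelated to the actual ipa-icf code; the polynomial mix is chosen for readability only.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical operation codes; GCC uses tree codes instead.
enum class Op : std::uint64_t { Plus = 1, Minus = 2, Lt = 3, Gt = 4 };

// Simple polynomial mix, chosen for readability rather than quality.
constexpr std::uint64_t mix(std::uint64_t h, std::uint64_t v) {
  return h * 31 + v;
}

constexpr bool is_commutative(Op op) { return op == Op::Plus; }

// Hash "lhs op rhs" given the hashes of the two operands.
std::uint64_t hash_binary(Op op, std::uint64_t lhs, std::uint64_t rhs) {
  // Canonicalize "a > b" to "b < a" so swapped comparisons agree.
  if (op == Op::Gt) {
    op = Op::Lt;
    std::swap(lhs, rhs);
  }
  // For commutative codes, order the operand hashes before mixing.
  if (is_commutative(op) && rhs < lhs)
    std::swap(lhs, rhs);
  return mix(mix(mix(0, static_cast<std::uint64_t>(op)), lhs), rhs);
}
```

Sorting the operand hashes is one way to get order insensitivity; hashing both orders and combining them symmetrically would be another.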

> Thanks,
> Martin
>
> gcc/ChangeLog:
>
> 2019-11-27  Martin Liska  
>
> PR ipa/92535
> * ipa-icf.c (sem_function::sem_function): Initialize
> memory_access_types.
> (record_memory_op_type): New function.
> (sem_function::init): Walk memory accesses for GIMPLE
> statements.
> (sem_item_optimizer::write_summary): Stream memory_access_types.
> (sem_item_optimizer::read_section): Read memory_access_types
> and hash pointer to canonical types.
> (sem_item_optimizer::execute): Update hash by memory
> access type.
> (sem_item_optimizer::update_hash_by_memory_access_type):
> New.
> * ipa-icf.h (memory_access_types): New.
> (m_canonical_types_hash): Likewise.
> (update_hash_by_memory_access_type): Likewise.
> ---
>   gcc/ipa-icf.c | 60 +--
>   gcc/ipa-icf.h | 10 +
>   2 files changed, 68 insertions(+), 2 deletions(-)
>
>


Re: [PATCH] Stream memory access types in IPA ICF.

2019-11-29 Thread Richard Biener
On Fri, Nov 29, 2019 at 1:17 PM Richard Biener
 wrote:
>
> On Fri, Nov 29, 2019 at 12:37 PM Martin Liška  wrote:
> >
> > Hello.
> >
> > The patch is about streaming of at maximum 3 tree types that are
> > used for memory references in IPA ICF. That helps rapidly to reduce
> > number of function bodies loaded in WPA phase. Based on numbers for
> > Firefox we get from:
> >
> > Init called for 87557 items (23.30%).
> > ...
> > Totally needed symbols: 40580, fraction of loaded symbols: 46.35%
> >
> > to:
> >
> > Init called for 55844 items (14.86%).
> > ...
> > Totally needed symbols: 40580, fraction of loaded symbols: 72.67%
> >
> > Where memory peak drops from 5.7GB to 4.8GB and WPA is faster by 5 seconds.
> >
> > Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> >
> > Ready to be installed?
>
> It looks like a hack.  First of all you do quite some useless work since
> already the caller of walk_stmt_* could check for the vector length.
> second you are streaming a quite random type here with not much
> semantic value, plus 't' is already the base of the memory reference.
>
> To me it looks like the stmt hash is incredibly weak, for example
> not considering a GIMPLE_COND comparison code,
> doing weird stuff for commutative ops (the idea is probably to
> hash in both orders?

Oh, and it doesn't consider dependences at all but seems to
hash a function's stmts as independent?  I'd have hashed SSA
names as the hash of their def stmt to factor that in.
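
The difference can be sketched on a toy IR (everything here is hypothetical and only illustrates the suggestion; it is not how ipa-icf represents statements). hash_independent ignores which definitions feed a statement, while hash_with_defs represents each SSA name by the hash of its defining statement.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical miniature IR: each statement defines one SSA name and
// refers to earlier statements by index.
struct Stmt {
  std::uint64_t code;         // operation code
  std::vector<int> operands;  // indices of defining statements
};

constexpr std::uint64_t mix(std::uint64_t h, std::uint64_t v) {
  return h * 31 + v;  // illustrative polynomial mix
}

// Hash that treats statements as independent: operands contribute only
// their count, so the data flow between statements is invisible.
std::uint64_t hash_independent(const std::vector<Stmt> &body) {
  std::uint64_t h = 0;
  for (const Stmt &s : body)
    h = mix(mix(h, s.code), s.operands.size());
  return h;
}

// Hash where each SSA name is represented by the hash of its defining
// statement, so the data flow is factored in.
std::uint64_t hash_with_defs(const std::vector<Stmt> &body) {
  std::vector<std::uint64_t> def_hash;
  std::uint64_t h = 0;
  for (const Stmt &s : body) {
    // Leaves (no operands) are told apart by their position.
    std::uint64_t sh = mix(s.code, s.operands.empty() ? def_hash.size() : 0);
    for (int op : s.operands) {
      assert(op >= 0 && static_cast<std::size_t>(op) < def_hash.size());
      sh = mix(sh, def_hash[op]);
    }
    def_hash.push_back(sh);
    h = mix(h, sh);
  }
  return h;
}
```

For two bodies that differ only in whether an addition uses the same parameter twice or two different parameters, the first hash collides and the second does not.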

> but the implementation is completely off),
> but not considering the same for communative comparison ops
> (or swapped ops).
>
> To me it looks like if we want to hash types we could at least
> hash basic type properties plus for aggregates hash the ODR
> name if any such exists.
>
> > Thanks,
> > Martin
> >
> > gcc/ChangeLog:
> >
> > 2019-11-27  Martin Liska  
> >
> > PR ipa/92535
> > * ipa-icf.c (sem_function::sem_function): Initialize
> > memory_access_types.
> > (record_memory_op_type): New function.
> > (sem_function::init): Walk memory accesses for GIMPLE
> > statements.
> > (sem_item_optimizer::write_summary): Stream memory_access_types.
> > (sem_item_optimizer::read_section): Read memory_access_types
> > and hash pointer to canonical types.
> > (sem_item_optimizer::execute): Update hash by memory
> > access type.
> > (sem_item_optimizer::update_hash_by_memory_access_type):
> > New.
> > * ipa-icf.h (memory_access_types): New.
> > (m_canonical_types_hash): Likewise.
> > (update_hash_by_memory_access_type): Likewise.
> > ---
> >   gcc/ipa-icf.c | 60 +--
> >   gcc/ipa-icf.h | 10 +
> >   2 files changed, 68 insertions(+), 2 deletions(-)
> >
> >


Re: Fix DR_GROUP_GAP for strided accesses (PR 92677)

2019-11-29 Thread Richard Biener
On Fri, Nov 29, 2019 at 11:53 AM Richard Sandiford
 wrote:
>
> When dissolving an SLP-only group of accesses, we should only set
> the gap to group_size - 1 for normal non-strided groups.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

OK.  Probably also broken on branch(es).

Richard.

> Richard
>
>
> 2019-11-29  Richard Sandiford  
>
> gcc/
> PR tree-optimization/92677
> * tree-vect-loop.c (vect_dissolve_slp_only_groups): Set the gap
> to zero when dissolving a group of strided accesses.
>
> gcc/testsuite/
> PR tree-optimization/92677
> * gcc.dg/vect/pr92677.c: New test.
>
> Index: gcc/tree-vect-loop.c
> ===
> --- gcc/tree-vect-loop.c2019-11-29 09:13:43.764143091 +
> +++ gcc/tree-vect-loop.c2019-11-29 10:52:30.475476141 +
> @@ -1829,7 +1829,10 @@ vect_dissolve_slp_only_groups (loop_vec_
>   DR_GROUP_FIRST_ELEMENT (vinfo) = vinfo;
>   DR_GROUP_NEXT_ELEMENT (vinfo) = NULL;
>   DR_GROUP_SIZE (vinfo) = 1;
> - DR_GROUP_GAP (vinfo) = group_size - 1;
> + if (STMT_VINFO_STRIDED_P (first_element))
> +   DR_GROUP_GAP (vinfo) = 0;
> + else
> +   DR_GROUP_GAP (vinfo) = group_size - 1;
>   vinfo = next;
> }
> }
> Index: gcc/testsuite/gcc.dg/vect/pr92677.c
> ===
> --- /dev/null   2019-09-17 11:41:18.176664108 +0100
> +++ gcc/testsuite/gcc.dg/vect/pr92677.c 2019-11-29 10:52:30.475476141 +
> @@ -0,0 +1,26 @@
> +/* { dg-do compile } */
> +/* { dg-additional-options "-O3" } */
> +
> +int a, c;
> +int *b;
> +long d;
> +double *e;
> +
> +void fn1() {
> +  long f;
> +  double g, h;
> +  while (c) {
> +if (d) {
> +  g = *e;
> +  *(b + 4) = g;
> +}
> +if (f) {
> +  h = *(e + 2);
> +  *(b + 6) = h;
> +}
> +e += a;
> +b += 8;
> +c--;
> +d += 2;
> +  }
> +}


[PATCH][amdgcn] Fix ICE in re-simplification of VEC_COND_EXPR

2019-11-29 Thread Harwath, Frederik
Hi,
currently, on trunk, the tests gcc.dg/vect/vect-cond-reduc-1.c and
gcc.dg/pr68286.c fail when compiling for amdgcn-unknown-amdhsa.
The reason seems to lie in the interaction between the changes
introduced by revision r276659 of trunk
("Allow COND_EXPR and VEC_COND_EXPR condtions to trap" by Ilya Leoshkevich)
and the vectorized code that is generated for amdgcn.

If the function maybe_resimplify_conditional_op from gimple-match-head.c is
called on a conditional operation without an "else" part, it makes the
operation unconditional, but only if the operation cannot trap. To check
this, it uses operation_could_trap_p. That triggers an assertion failure in
the latter function if maybe_resimplify_conditional_op is called on a
COND_EXPR or VEC_COND_EXPR:

 /* This function cannot tell whether or not COND_EXPR and VEC_COND_EXPR could
 trap, because that depends on the respective condition op.  */
  gcc_assert (op != COND_EXPR && op != VEC_COND_EXPR);

A related issue has been resolved by the patch that was committed as r276915
("PR middle-end/92063" by Jakub Jelinek).

In our case, the error is triggered by the simplification rule at line 3450 of
gcc/match.pd:

/* A + (B vcmp C ? 1 : 0) -> A - (B vcmp C ? -1 : 0), since vector comparisons
   return all -1 or all 0 results.  */
/* ??? We could instead convert all instances of the vec_cond to negate,
   but that isn't necessarily a win on its own.  */
(simplify
 (plus:c @3 (view_convert? (vec_cond:s @0 integer_each_onep@1 integer_zerop@2)))
 (if (VECTOR_TYPE_P (type)
  && known_eq (TYPE_VECTOR_SUBPARTS (type),
   TYPE_VECTOR_SUBPARTS (TREE_TYPE (@1)))
  && (TYPE_MODE (TREE_TYPE (type))
  == TYPE_MODE (TREE_TYPE (TREE_TYPE (@1)))))
  (minus @3 (view_convert (vec_cond @0:0 (negate @1) @2))))
)

It seems that this rule is not invoked when compiling for x86_64, where the
generated code for vect-cond-reduc-1.c does not contain anything that would
match this rule. Could it be that there is no test covering this rule for
commonly tested architectures?

I have changed maybe_resimplify_conditional_op to check whether a COND_EXPR or
VEC_COND_EXPR could trap by checking whether the condition can trap using
generic_expr_could_trap_p. Judging from the comment above the assertion and the
code changes of r276659, it seems that this is both necessary and
sufficient to determine whether those expressions can trap.
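
The dispatch described above can be modelled in a few lines. This is a simplified sketch, not the GCC code: the Code enum is a hypothetical subset of tree codes, operation_could_trap stands in for operation_could_trap_p (which in GCC takes more arguments), and the condition's trap status is passed in as a plain bool instead of being computed by generic_expr_could_trap_p.

```cpp
#include <cassert>

// Hypothetical subset of operation codes, for illustration only.
enum class Code { Plus, TruncDiv, CondExpr, VecCondExpr };

// Stand-in for operation_could_trap_p: answers from the code alone and
// must not be asked about conditional expressions.
bool operation_could_trap(Code code) {
  assert(code != Code::CondExpr && code != Code::VecCondExpr);
  return code == Code::TruncDiv;  // simplified: division may trap
}

// A conditional expression can trap exactly when its condition can;
// every other code is decided by the generic predicate.
bool op_could_trap(Code code, bool condition_could_trap) {
  if (code == Code::CondExpr || code == Code::VecCondExpr)
    return condition_could_trap;
  return operation_could_trap(code);
}
```

With this split, the assertion inside the generic predicate can no longer be reached for the two conditional codes.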

Does that sound reasonable and can the patch be included in trunk?

The patch fixes the failing tests for me and does not cause any visible
regressions in the results of "make check", which I have executed for the
targets amdgcn-unknown-amdhsa and x86_64-pc-linux-gnu.

Best regards,
Frederik



2019-11-28  Frederik Harwath  

gcc/
* gimple-match-head.c (maybe_resimplify_conditional_op): Use
generic_expr_could_trap_p to check if the condition of COND_EXPR or
VEC_COND_EXPR can trap.
---
 gcc/gimple-match-head.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/gcc/gimple-match-head.c b/gcc/gimple-match-head.c
index 2996bade301..4da6c4d7458 100644
--- a/gcc/gimple-match-head.c
+++ b/gcc/gimple-match-head.c
@@ -144,9 +144,17 @@ maybe_resimplify_conditional_op (gimple_seq *seq, gimple_match_op *res_op,
   /* Likewise if the operation would not trap.  */
   bool honor_trapv = (INTEGRAL_TYPE_P (res_op->type)
  && TYPE_OVERFLOW_TRAPS (res_op->type));
-  if (!operation_could_trap_p ((tree_code) res_op->code,
-  FLOAT_TYPE_P (res_op->type),
-  honor_trapv, res_op->op_or_null (1)))
+  tree_code op_code = (tree_code) res_op->code;
+  /* COND_EXPR and VEC_COND_EXPR will trap if, and only if, the condition
+traps and hence we have to check this. For all other operations, we
+don't need to consider the operands. */
+  bool op_could_trap = op_code == COND_EXPR || op_code == VEC_COND_EXPR ?
+   generic_expr_could_trap_p (res_op->ops[0]) :
+   operation_could_trap_p ((tree_code) res_op->code,
+   FLOAT_TYPE_P (res_op->type),
+   honor_trapv, res_op->op_or_null (1));
+
+  if (!op_could_trap)
{
  res_op->cond.cond = NULL_TREE;
  return false;
-- 
2.17.1



Re: [PATCH] Stream memory access types in IPA ICF.

2019-11-29 Thread Jan Hubicka
> > It looks like a hack.  First of all you do quite some useless work since
> > already the caller of walk_stmt_* could check for the vector length.
> > second you are streaming a quite random type here with not much
> > semantic value, plus 't' is already the base of the memory reference.
> >
> > To me it looks like the stmt hash is incredibly weak, for example
> > not considering a GIMPLE_COND comparison code,
> > doing weird stuff for commutative ops (the idea is probably to
> > hash in both orders?
> 
> Oh, and it doesn't consider dependences at all but seems to
> hash a functions stmts as independent?  I'd have hashed SSA
> names as the hash of their def stmt to factor that in.

I agree that hash calculation needs fine tuning (and we want to be sure
we do have checking that hasher agrees with comparer).

I did not look in detail into Martin's patch. The idea of hashing
access type at WPA time came from our discussion.
If you gather statistics, the dominating reason for giving up the compare
after reading the body is a mismatch in memory operands.  This is extremely
common for instances of templates, where functions often have identical
bodies except that the basetypes of all accesses are different.

Here we get same hash and eventually call operand_equal_p which walks
whole path and calls types_compatible_p that eventually finds the
basetype and returns false comparing TYPE_CANONICAL of both.

Because TYPE_CANONICAL is not known at compile time I was considering
two options: break out canonical type hashes out of lto-common.c or
simply update the hash at compile time.


Well, this all is aimed at solving the compile time/memory use problem caused
by not merging these functions.  Of course, we ought to merge.  I think
we should not use operand_equal_p for memory references and instead implement
a separate comparison that does two separate things:
 1) compare semantics of the access path (i.e. how big a chunk of memory
with what alignment is read)
For constant offset addresses this is basically
get_ref_base_and-offset
 2) what additional info tree-ssa-alias can use.
(dependence clique, ref alias type, base alias type and the chain of
refs used by nonoverlapping_refs_p and friends).
I have a prototype patch for that somewhere.
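
A minimal sketch of the two-part comparison outlined above, assuming a flattened description of each memory reference. The MemRef struct and all three functions are hypothetical, and the chain of refs mentioned in 2) is left out for brevity; this is not the prototype patch referred to.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical description of one memory reference.
struct MemRef {
  // (1) Semantics of the access path.
  std::int64_t offset_bits;
  std::int64_t size_bits;
  unsigned align_bits;
  // (2) Extra information an alias oracle could exploit.
  int dependence_clique;
  std::string ref_alias_type;
  std::string base_alias_type;
};

// Part 1: do both references read or write the same chunk of memory?
bool same_access_semantics(const MemRef &a, const MemRef &b) {
  return a.offset_bits == b.offset_bits && a.size_bits == b.size_bits
         && a.align_bits == b.align_bits;
}

// Part 2: would alias analysis treat both references alike?
bool same_alias_info(const MemRef &a, const MemRef &b) {
  return a.dependence_clique == b.dependence_clique
         && a.ref_alias_type == b.ref_alias_type
         && a.base_alias_type == b.base_alias_type;
}

// Two references are interchangeable only if both parts agree.
bool mergeable(const MemRef &a, const MemRef &b) {
  return same_access_semantics(a, b) && same_alias_info(a, b);
}
```

Keeping the two parts separate makes it visible when two accesses touch the same memory in the same way and differ only in alias information.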

For ADDR_EXPR we are basically interested in base and offset only.

Honza
> 
>   but the implementation is completely off),
> > but not considering the same for communative comparison ops
> > (or swapped ops).
> >
> > To me it looks like if we want to hash types we could at least
> > hash basic type properties plus for aggregates hash the ODR
> > name if any such exists.
> >
> > > Thanks,
> > > Martin
> > >
> > > gcc/ChangeLog:
> > >
> > > 2019-11-27  Martin Liska  
> > >
> > > PR ipa/92535
> > > * ipa-icf.c (sem_function::sem_function): Initialize
> > > memory_access_types.
> > > (record_memory_op_type): New function.
> > > (sem_function::init): Walk memory accesses for GIMPLE
> > > statements.
> > > (sem_item_optimizer::write_summary): Stream memory_access_types.
> > > (sem_item_optimizer::read_section): Read memory_access_types
> > > and hash pointer to canonical types.
> > > (sem_item_optimizer::execute): Update hash by memory
> > > access type.
> > > (sem_item_optimizer::update_hash_by_memory_access_type):
> > > New.
> > > * ipa-icf.h (memory_access_types): New.
> > > (m_canonical_types_hash): Likewise.
> > > (update_hash_by_memory_access_type): Likewise.
> > > ---
> > >   gcc/ipa-icf.c | 60 +--
> > >   gcc/ipa-icf.h | 10 +
> > >   2 files changed, 68 insertions(+), 2 deletions(-)
> > >
> > >


Re: [PATCH][amdgcn] Fix ICE in re-simplification of VEC_COND_EXPR

2019-11-29 Thread Richard Biener
On Fri, Nov 29, 2019 at 1:24 PM Harwath, Frederik
 wrote:
>
> Hi,
> currently, on trunk, the tests gcc.dg/vect/vect-cond-reduc-1.c and 
> gcc.dg/pr68286.c fail when compiling for amdgcn-unknown-amdhsa.
> The reason seems to lie in the interaction of the changes that have been 
> introduced by revision r276659
> ("Allow COND_EXPR and VEC_COND_EXPR condtions to trap" by Ilya Leoshkevich) 
> of trunk and the vectorized code that is generated for amdgcn.
>
> If the function maybe_resimplify_conditional_op from gimple-match-head.c gets 
> called on a conditional operation without an "else" part, it
> makes the operation unconditional, but only if the operation cannot trap. To 
> check this, it uses operation_could_trap_p.
> This ends up in a violated assertion in the latter function if 
> maybe_resimplify_conditional_op is called on a COND_EXPR or VEC_COND_EXPR:
>
>  /* This function cannot tell whether or not COND_EXPR and VEC_COND_EXPR could
>  trap, because that depends on the respective condition op.  */
>   gcc_assert (op != COND_EXPR && op != VEC_COND_EXPR);
>
> A related issue has been resolved by the patch that was committed as r276915 
> ("PR middle-end/92063" by Jakub Jelinek).
>
> In our case, the error is triggered by the simplification rule at line 3450 
> of gcc/match.pd:
>
> /* A + (B vcmp C ? 1 : 0) -> A - (B vcmp C ? -1 : 0), since vector comparisons
>return all -1 or all 0 results.  */
> /* ??? We could instead convert all instances of the vec_cond to negate,
>but that isn't necessarily a win on its own.  */
> (simplify
>  (plus:c @3 (view_convert? (vec_cond:s @0 integer_each_onep@1 
> integer_zerop@2)))
>  (if (VECTOR_TYPE_P (type)
>   && known_eq (TYPE_VECTOR_SUBPARTS (type),
>TYPE_VECTOR_SUBPARTS (TREE_TYPE (@1)))
>   && (TYPE_MODE (TREE_TYPE (type))
>   == TYPE_MODE (TREE_TYPE (TREE_TYPE (@1)))))
>   (minus @3 (view_convert (vec_cond @0:0 (negate @1) @2))))
> )
>
> It seems that this rule is not invoked when compiling for x86_64 where the 
> generated code for vect-cond-reduc-1.c does not contain anything that would
> match this rule. Could it be that there is no test covering this rule for 
> commonly tested architectures?

This was all added for aarch64 SVE.  So it looks like the outer plus
was conditional and we end up inheriting the
condition for the inner vec_cond.  Your fix looks reasonable but is
very badly formatted.  Can you instead do

 if (op_code == COND_EXPR || op_code == VEC_COND_EXPR)
   op_could_trap = generic_expr_could_trap_p (...);
 else
   op_could_trap = operation_could_trap_p (...);

Thanks,
Richard.

> I have changed maybe_resimplify_conditional_op to check if a COND_EXPR or 
> VEC_COND_EXPR could trap by checking whether the condition can trap using
> generic_expr_could_trap_p. Judging from the comment above the assertion and 
> the code changes of r276659, it seems that this is both necessary and
> sufficient to verify if those expressions can trap.
>
> Does that sound reasonable and can the patch be included in trunk?
>
> The patch fixes the failing tests for me and does not cause any visible 
> regressions in the results of "make check" which I have executed for targets 
> amdgcn-unknown-amdhsa
> and x86_64-pc-linux-gnu.
>
> Best regards,
> Frederik
>
>
>
> 2019-11-28  Frederik Harwath  
>
> gcc/
> * gimple-match-head.c (maybe_resimplify_conditional_op): use
> generic_expr_could_trap_p to check if the condition of COND_EXPR or
> VEC_COND_EXPR can trap.
> ---
>  gcc/gimple-match-head.c | 14 +++---
>  1 file changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/gimple-match-head.c b/gcc/gimple-match-head.c
> index 2996bade301..4da6c4d7458 100644
> --- a/gcc/gimple-match-head.c
> +++ b/gcc/gimple-match-head.c
> @@ -144,9 +144,17 @@ maybe_resimplify_conditional_op (gimple_seq *seq, 
> gimple_match_op *res_op,
>/* Likewise if the operation would not trap.  */
>bool honor_trapv = (INTEGRAL_TYPE_P (res_op->type)
>   && TYPE_OVERFLOW_TRAPS (res_op->type));
> -  if (!operation_could_trap_p ((tree_code) res_op->code,
> -  FLOAT_TYPE_P (res_op->type),
> -  honor_trapv, res_op->op_or_null (1)))
> +  tree_code op_code = (tree_code) res_op->code;
> +  /* COND_EXPR and VEC_COND_EXPR will trap if, and only if, the condition
> +traps and hence we have to check this. For all other operations, we
> +don't need to consider the operands. */
> +  bool op_could_trap = op_code == COND_EXPR || op_code == VEC_COND_EXPR ?
> +   generic_expr_could_trap_p (res_op->ops[0]) :
> +   operation_could_trap_p ((tree_code) res_op->code,
> +   FLOAT_TYPE_P (res_op->type),
> +   honor_trapv, res_op->op_or_null (1));
> +
> +  if (!op_could_trap)
> {
>   res_op->cond.cond = NULL_TREE;
>   return fa

Re: [PATCH] Stream memory access types in IPA ICF.

2019-11-29 Thread Richard Biener
On Fri, Nov 29, 2019 at 1:33 PM Jan Hubicka  wrote:
>
> > > It looks like a hack.  First of all you do quite some useless work since
> > > already the caller of walk_stmt_* could check for the vector length.
> > > second you are streaming a quite random type here with not much
> > > semantic value, plus 't' is already the base of the memory reference.
> > >
> > > To me it looks like the stmt hash is incredibly weak, for example
> > > not considering a GIMPLE_COND comparison code,
> > > doing weird stuff for commutative ops (the idea is probably to
> > > hash in both orders?
> >
> > Oh, and it doesn't consider dependences at all but seems to
> > hash a functions stmts as independent?  I'd have hashed SSA
> > names as the hash of their def stmt to factor that in.
>
> I agree that hash calculation needs fine tuning (and we want to be sure
> we do have checking that hasher agrees with comparer).
>
> I did not look in detail into the Martin's patch. The idea of hashing
> access type at WPA time came from our discussion.
> If you make statistics, the dominating reason for giving up compare
> after reading body is mismatch in memory operands.  This is extremely
> common for instances of templates where functions often have identical
> body except that basetypes of all accesses are different.

Is it?

> Here we get same hash and eventually call operand_equal_p which walks
> whole path and calls types_compatible_p that eventually finds the
> basetype and returns false comparing TYPE_CANONICAL of both.
>
> Because TYPE_CANONICAL is not known at compile time I was considering
> two options: break out canonical type hashes out of lto-common.c or
> simply update the hash at compile time.

But this is done at WPA time, after we merged types.  We can use ODR
types at compile-time already.

We could also use the canonical type hash for all types I guess.

> Well, this all is aimed to solve compile time/memory use problem caused
> by not merging these functions.  Of course, we ought to merge.  I think
> we should not use operand_equal_p for memory references and implement
> separate comparison that does two separate things:
>  1) compare semantics of the access path (i.e. how big chunk of memory
> with what alignment is read)
> For constant offset addresses this is basically
> get_ref_base_and-offset
>  2) what additional info tree-ssa-alias can use.
> (dependence clique, ref alias type, base alias type and the chain of
> refs used by nonoverlapping_refs_p and friends).
> I have prototype patch for that somewhere.

Sure, I think this is what the old code did (or tried to do).

I see we're quite elaborately hashing memory ref trees but do almost nothing
for regular stmt.  That seems backwards since we'd want to do almost nothing
for memory refs to be able to do the above fancy at compare stage.
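
To make the point about the statement hash concrete, here is a minimal
sketch of a hash that does include the operation code and treats
commutative or swapped operands alike.  It is a toy model, not the
ipa-icf.c code: `Opcode`, `Stmt`, `mix` and `hash_stmt` are hypothetical
names, and the operand hashes stand in for "the hash of the def stmt"
of an SSA name.

```cpp
#include <cassert>
#include <cstdint>
#include <utility>

// Hypothetical operation codes for a toy two-operand statement.
enum class Opcode : std::uint32_t { Plus, Minus, Lt, Gt };

struct Stmt {
  Opcode code;
  std::uint64_t lhs_hash;  // hash of operand 0 (e.g. of its defining stmt)
  std::uint64_t rhs_hash;  // hash of operand 1
};

// Only Plus is commutative in this toy model.
bool is_commutative(Opcode c) { return c == Opcode::Plus; }

// FNV-1a style mixing step: xor in the value, multiply by an odd prime.
std::uint64_t mix(std::uint64_t h, std::uint64_t v) {
  return (h ^ v) * 0x100000001b3ULL;
}

std::uint64_t hash_stmt(Stmt s) {
  // Canonicalize "a > b" to "b < a" so swapped comparisons hash alike.
  if (s.code == Opcode::Gt) {
    s.code = Opcode::Lt;
    std::swap(s.lhs_hash, s.rhs_hash);
  }
  assert(s.code != Opcode::Gt);

  // For commutative operations, order the operand hashes so that
  // "a + b" and "b + a" produce the same result.
  if (is_commutative(s.code) && s.rhs_hash < s.lhs_hash)
    std::swap(s.lhs_hash, s.rhs_hash);

  std::uint64_t h = 0xcbf29ce484222325ULL;
  h = mix(h, static_cast<std::uint64_t>(s.code));  // the code is hashed too
  h = mix(h, s.lhs_hash);
  h = mix(h, s.rhs_hash);
  return h;
}
```

With this shape, `hash_stmt({Opcode::Plus, 1, 2})` and
`hash_stmt({Opcode::Plus, 2, 1})` agree, while the two operand orders of
a non-commutative `Minus` are kept apart.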

> For ADDR_EXPR we are basically interested in base and offset only.
>
> Honza
> >
> >   but the implementation is completely off),
> > > but not considering the same for commutative comparison ops
> > > (or swapped ops).
> > >
> > > To me it looks like if we want to hash types we could at least
> > > hash basic type properties plus for aggregates hash the ODR
> > > name if any such exists.
> > >
> > > > Thanks,
> > > > Martin
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > 2019-11-27  Martin Liska  
> > > >
> > > > PR ipa/92535
> > > > * ipa-icf.c (sem_function::sem_function): Initialize
> > > > memory_access_types.
> > > > (record_memory_op_type): New function.
> > > > (sem_function::init): Walk memory accesses for GIMPLE
> > > > statements.
> > > > (sem_item_optimizer::write_summary): Stream memory_access_types.
> > > > (sem_item_optimizer::read_section): Read memory_access_types
> > > > and hash pointer to canonical types.
> > > > (sem_item_optimizer::execute): Update hash by memory
> > > > access type.
> > > > (sem_item_optimizer::update_hash_by_memory_access_type):
> > > > New.
> > > > * ipa-icf.h (memory_access_types): New.
> > > > (m_canonical_types_hash): Likewise.
> > > > (update_hash_by_memory_access_type): Likewise.
> > > > ---
> > > >   gcc/ipa-icf.c | 60 +--
> > > >   gcc/ipa-icf.h | 10 +
> > > >   2 files changed, 68 insertions(+), 2 deletions(-)
> > > >
> > > >


Re: [PATCH] Stream memory access types in IPA ICF.

2019-11-29 Thread Jan Hubicka
> On Fri, Nov 29, 2019 at 1:33 PM Jan Hubicka  wrote:
> >
> > > > It looks like a hack.  First of all you do quite some useless work since
> > > > already the caller of walk_stmt_* could check for the vector length.
> > > > second you are streaming a quite random type here with not much
> > > > semantic value, plus 't' is already the base of the memory reference.
> > > >
> > > > To me it looks like the stmt hash is incredibly weak, for example
> > > > not considering a GIMPLE_COND comparison code,
> > > > doing weird stuff for commutative ops (the idea is probably to
> > > > hash in both orders?
> > >
> > > Oh, and it doesn't consider dependences at all but seems to
> > > hash a functions stmts as independent?  I'd have hashed SSA
> > > names as the hash of their def stmt to factor that in.
> >
> > I agree that hash calculation needs fine tuning (and we want to be sure
> > we do have checking that hasher agrees with comparer).
> >
> > I did not look in detail into the Martin's patch. The idea of hashing
> > access type at WPA time came from our discussion.
> > If you make statistics, the dominating reason for giving up compare
> > after reading body is mismatch in memory operands.  This is extremely
> > common for instances of templates where functions often have identical
> > body except that basetypes of all accesses are different.
> 
> Is it?

It seems to me that most of the unmerged functions are like that.
Martin has some stats; I have started -fdump-ipa-icf-details on Firefox
now and will send some examples.
> 
> > Here we get same hash and eventually call operand_equal_p which walks
> > whole path and calls types_compatible_p that eventually finds the
> > basetype and returns false comparing TYPE_CANONICAL of both.
> >
> > Because TYPE_CANONICAL is not known at compile time I was considering
> > two options: break out canonical type hashes out of lto-common.c or
> > simply update the hash at compile time.
> 
> But this is done at WPA time, after we merged types.  We can use ODR
> types at compile-time already.

Yep, hashing ODR names if available was also something I thought of.
The problem here is that only types with linkage have ODR names.
> 
> We could also use the canonical type hash for all types I guess.
> 
> > Well, this all is aimed to solve compile time/memory use problem caused
> > by not merging these functions.  Of course, we ought to merge.  I think
> > we should not use operand_equal_p for memory references and implement
> > separate comparison that does two separate things:
> >  1) compare semantics of the access path (i.e. how big chunk of memory
> > with what alignment is read)
> > For constant offset addresses this is basically
> > get_ref_base_and-offset
> >  2) what additional info tree-ssa-alias can use.
> > (dependence clique, ref alias type, base alias type and the chain of
> > refs used by nonoverlapping_refs_p and friends).
> > I have prototype patch for that somewhere.
> 
> Sure, I think this is what the old code did (or tried to do).

Yep, I more or less restored the original comparison logic, except for
adding the missing access path comparison.
I will finish the patch.

Honza
> 
> I see we're quite elaborately hashing memory ref trees but do almost nothing
> for regular stmt.  That seems backwards since we'd want to do almost nothing
> for memory refs to be able to do the above fancy at compare stage.
> 
> > For ADDR_EXPR we are basically interested in base and offset only.
> >
> > Honza
> > >
> > >   but the implementation is completely off),
> > > > but not considering the same for commutative comparison ops
> > > > (or swapped ops).
> > > >
> > > > To me it looks like if we want to hash types we could at least
> > > > hash basic type properties plus for aggregates hash the ODR
> > > > name if any such exists.
> > > >
> > > > > Thanks,
> > > > > Martin
> > > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > > 2019-11-27  Martin Liska  
> > > > >
> > > > > PR ipa/92535
> > > > > * ipa-icf.c (sem_function::sem_function): Initialize
> > > > > memory_access_types.
> > > > > (record_memory_op_type): New function.
> > > > > (sem_function::init): Walk memory accesses for GIMPLE
> > > > > statements.
> > > > > > (sem_item_optimizer::write_summary): Stream memory_access_types.
> > > > > (sem_item_optimizer::read_section): Read memory_access_types
> > > > > and hash pointer to canonical types.
> > > > > (sem_item_optimizer::execute): Update hash by memory
> > > > > access type.
> > > > > (sem_item_optimizer::update_hash_by_memory_access_type):
> > > > > New.
> > > > > * ipa-icf.h (memory_access_types): New.
> > > > > (m_canonical_types_hash): Likewise.
> > > > > (update_hash_by_memory_access_type): Likewise.
> > > > > ---
> > > > >   gcc/ipa-icf.c | 60 
> > > > > +--
> > > > >   gcc/ipa-icf.h | 10 +++

Re: [PATCH][amdgcn] Fix ICE in re-simplification of VEC_COND_EXPR

2019-11-29 Thread Harwath, Frederik
Hi Richard,

On 29.11.19 13:37, Richard Biener wrote:
> On Fri, Nov 29, 2019 at 1:24 PM Harwath, Frederik
>  wrote:
> [...]
>> It seems that this rule is not invoked when compiling for x86_64 where the 
>> generated code for vect-cond-reduc-1.c does not contain anything that would
>> match this rule. Could it be that there is no test covering this rule for 
>> commonly tested architectures?
> 
> This was all added for aarch64 SVE.  So it looks like the outer plus
> was conditional and we end up inheriting the
I should have mentioned this, it was indeed a COND_ADD.

> condition for the inner vec_cond.  Your fix looks reasonable but is
> very badly formatted.  Can you instead do
> 
>  if (op_code == COND_EXPR || op_code == VEC_COND_EXPR)
>    op_could_trap = generic_expr_could_trap_p (...);
>  else
>    op_could_trap = operation_could_trap_p (...);
> 

Sorry, sure!

Thanks,
Frederik



Re: [5/5] Don't defer choice of vector type for bools (PR 92596)

2019-11-29 Thread Richard Biener
On Fri, Nov 29, 2019 at 11:16 AM Richard Sandiford
 wrote:
>
> Now that stmt_vec_info records the choice between vector mask
> types and normal nonmask types, we can use that information in
> vect_get_vector_types_for_stmt instead of deferring the choice
> of vector type till later.
>
> vect_get_mask_type_for_stmt used to check whether the boolean inputs
> to an operation:
> (a) consistently used mask types or consistently used nonmask types; and
> (b) agreed on the number of elements.
>
> (b) shouldn't be a problem when (a) is met.  If the operation
> consistently uses mask types, tree-vect-patterns.c will have corrected
> any mismatches in mask precision.  (This is because we only use mask
> types for a small well-known set of operations and tree-vect-patterns.c
> knows how to handle any that could have different mask precisions.)
> And if the operation consistently uses normal nonmask types, there's
> no reason why booleans should need extra vector compatibility checks
> compared to ordinary integers.
>
> So the potential difficulties all seem to come from (a).  Now that
> we've chosen the result type ahead of time, we also have to consider
> whether the outputs and inputs consistently use mask types.
>
> Taking each vectorizable_* routine in turn:
>
> - vectorizable_call
>
> vect_get_vector_types_for_stmt only handled booleans specially
> for gassigns, so vect_get_mask_type_for_stmt never had a chance to
> handle calls.  I'm not sure we support any calls that operate on
> booleans, but as things stand, a boolean result would always have
> a nonmask type.  Presumably any vector argument would also need to
> use nonmask types, unless it corresponds to internal_fn_mask_index
> (which is already a special case).
>
> For safety, I've added a check for mask/nonmask combinations here
> even though we didn't check this previously.
>
> - vectorizable_simd_clone_call
>
> Again, vect_get_mask_type_for_stmt never had a chance to handle calls.
> The result of the call will always be a nonmask type and the patch
> for PR 92710 rejects mask arguments.  So all booleans should
> consistently use nonmask types here.
>
> - vectorizable_conversion
>
> The function already rejects any conversion between booleans in which
> one type isn't a mask type.
>
> - vectorizable_operation
>
> This function definitely needs a consistency check, e.g. to handle
> & and | in which one operand is loaded from memory and the other is
> a comparison result.  Ideally we'd handle this via pattern stmts
> instead (like we do for the all-mask case), but that's future work.
>
> - vectorizable_assignment
>
> VECT_SCALAR_BOOLEAN_TYPE_P requires single-bit precision, so the
> current code already rejects problematic cases.
>
> - vectorizable_load
>
> Loads always produce nonmask types and there are no relevant inputs
> to check against.
>
> - vectorizable_store
>
> vect_check_store_rhs already rejects mask/nonmask combinations
> via useless_type_conversion_p.
>
> - vectorizable_reduction
> - vectorizable_lc_phi
>
> PHIs always have nonmask types.  After the change above, attempts
> to combine the PHI result with a mask type would be rejected by
> vectorizable_operation.  (Again, it would be better to handle
> this using pattern stmts.)
>
> - vectorizable_induction
>
> We don't generate inductions for booleans.
>
> - vectorizable_shift
>
> The function already rejects boolean shifts via type_has_mode_precision_p.
>
> - vectorizable_condition
>
> The function already rejects mismatches via useless_type_conversion_p.
>
> - vectorizable_comparison
>
> The function already rejects comparisons between mask and nonmask types.
> The result is always a mask type.

OK.

Thanks for cleaning this up!
Richard.
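
For readers following the vectorizable_operation case quoted above, the
"punt on hybrid mask/nonmask operations" check can be pictured with a
small toy model.  This is only a sketch under stated assumptions:
`VecKind` and `consistent_mask_use` are hypothetical names, not the
tree-vect-stmts.c interface, and the model only covers boolean operands
of operations such as & and |.

```cpp
#include <cassert>
#include <vector>

// Whether a boolean value is vectorized as a vector mask type or as a
// normal (nonmask) vector type.
enum class VecKind { Mask, NonMask };

// Returns true when the result and every boolean operand agree on
// mask vs. nonmask.  A caller would give up on vectorizing the
// statement when this returns false.
bool consistent_mask_use(VecKind result, const std::vector<VecKind> &operands) {
  for (VecKind k : operands)
    if (k != result)
      return false;
  return true;
}
```

For example, an & whose first operand is a comparison result (mask) and
whose second operand was loaded from memory (nonmask) is rejected:
`consistent_mask_use (VecKind::Mask, {VecKind::Mask, VecKind::NonMask})`
is false.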

>
> 2019-11-29  Richard Sandiford  
>
> gcc/
> PR tree-optimization/92596
> * tree-vect-stmts.c (vectorizable_call): Punt on hybrid mask/nonmask
> operations.
> (vectorizable_operation): Likewise, instead of relying on
> vect_get_mask_type_for_stmt to do this.
> (vect_get_vector_types_for_stmt): Always return a vector type
> immediately, rather than deferring the choice for boolean results.
> Use a vector mask type instead of a normal vector if
> vect_use_mask_type_p.
> (vect_get_mask_type_for_stmt): Delete.
> * tree-vect-loop.c (vect_determine_vf_for_stmt_1): Remove
> mask_producers argument and special boolean_type_node handling.
> (vect_determine_vf_for_stmt): Remove mask_producers argument and
> update calls to vect_determine_vf_for_stmt_1.  Remove doubled call.
> (vect_determine_vectorization_factor): Update call accordingly.
> * tree-vect-slp.c (vect_build_slp_tree_1): Remove special
> boolean_type_node handling.
> (vect_slp_analyze_node_operations_1): Likewise.
>
> gcc/testsui

Re: [PATCH][AArch64] Enable CLI for Armv8.6-a: armv8.6-a, i8mm and bf16

2019-11-29 Thread Richard Sandiford
Hi Dennis,

Sorry for the slow response.

Dennis Zhang  writes:
> Hi all,
>
> This patch is part of a series adding support for Armv8.6-A features.
> It enables options including -march=armv8.6-a, +i8mm and +bf16.
> The +i8mm and +bf16 features are mandatory for Armv8.6-a and optional 
> for Armv8.2-a and onward.
> Documents are at https://developer.arm.com/docs/ddi0596/latest
>
> Regtested for aarch64-none-linux-gnu.
>
> Please help to check if it's ready for trunk.
>
> Many thanks!
> Dennis
>
> gcc/ChangeLog:
>
> 2019-11-26  Dennis Zhang  
>
>   * config/aarch64/aarch64-arches.def (armv8.6-a): New.
>   * config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Define
>   __ARM_FEATURE_MATMUL_INT8, __ARM_FEATURE_BF16_VECTOR_ARITHMETIC and
>   __ARM_FEATURE_BF16_SCALAR_ARITHMETIC when enabled.
>   * config/aarch64/aarch64-option-extensions.def (i8mm, bf16): New.
>   * config/aarch64/aarch64.h (AARCH64_FL_V8_6): New macro.
>   (AARCH64_FL_I8MM, AARCH64_FL_BF16, AARCH64_FL_FOR_ARCH8_6): Likewise.
>   (AARCH64_ISA_V8_6, AARCH64_ISA_I8MM, AARCH64_ISA_BF16): Likewise.
>   (TARGET_I8MM, TARGET_BF16_FP, TARGET_BF16_SIMD): Likewise.
>   * doc/invoke.texi (armv8.6-a, i8mm, bf16): Document new options.
>
> gcc/testsuite/ChangeLog:
>
> 2019-11-26  Dennis Zhang  
>
>   * gcc.target/aarch64/pragma_cpp_predefs_2.c: Add tests for i8mm
>   and bf16 features.
>
> diff --git a/gcc/config/aarch64/aarch64-arches.def b/gcc/config/aarch64/aarch64-arches.def
> index d258bd49244..e464d329c1a 100644
> --- a/gcc/config/aarch64/aarch64-arches.def
> +++ b/gcc/config/aarch64/aarch64-arches.def
> @@ -36,5 +36,6 @@ AARCH64_ARCH("armv8.2-a", generic,   8_2A,  8,  AARCH64_FL_FOR_ARCH8_2)
>  AARCH64_ARCH("armv8.3-a", generic,8_3A,  8,  AARCH64_FL_FOR_ARCH8_3)
>  AARCH64_ARCH("armv8.4-a", generic,8_4A,  8,  AARCH64_FL_FOR_ARCH8_4)
>  AARCH64_ARCH("armv8.5-a", generic,8_5A,  8,  AARCH64_FL_FOR_ARCH8_5)
> +AARCH64_ARCH("armv8.6-a", generic,8_6A,  8,  AARCH64_FL_FOR_ARCH8_6)
>  
>  #undef AARCH64_ARCH
> diff --git a/gcc/config/aarch64/aarch64-c.c b/gcc/config/aarch64/aarch64-c.c
> index f3da07fd28a..20d1e00552b 100644
> --- a/gcc/config/aarch64/aarch64-c.c
> +++ b/gcc/config/aarch64/aarch64-c.c
> @@ -165,6 +165,12 @@ aarch64_update_cpp_builtins (cpp_reader *pfile)
>aarch64_def_or_undef (TARGET_RNG, "__ARM_FEATURE_RNG", pfile);
>aarch64_def_or_undef (TARGET_MEMTAG, "__ARM_FEATURE_MEMORY_TAGGING", pfile);
>  
> +  aarch64_def_or_undef (TARGET_I8MM, "__ARM_FEATURE_MATMUL_INT8", pfile);
> +  aarch64_def_or_undef (TARGET_BF16_SIMD,
> + "__ARM_FEATURE_BF16_VECTOR_ARITHMETIC", pfile);
> +  aarch64_def_or_undef (TARGET_BF16_FP,
> + "__ARM_FEATURE_BF16_SCALAR_ARITHMETIC", pfile);
> +
>/* Not for ACLE, but required to keep "float.h" correct if we switch
>   target between implementations that do or do not support ARMv8.2-A
>   16-bit floating-point extensions.  */
> diff --git a/gcc/config/aarch64/aarch64-option-extensions.def b/gcc/config/aarch64/aarch64-option-extensions.def
> index d3ae1b2431b..5b7c3b8a213 100644
> --- a/gcc/config/aarch64/aarch64-option-extensions.def
> +++ b/gcc/config/aarch64/aarch64-option-extensions.def
> @@ -198,4 +198,14 @@ AARCH64_OPT_EXTENSION("sve2-bitperm", AARCH64_FL_SVE2_BITPERM, AARCH64_FL_SIMD |
>  /* Enabling or disabling "tme" only changes "tme".  */
>  AARCH64_OPT_EXTENSION("tme", AARCH64_FL_TME, 0, 0, false, "")
>  
> +/* Enabling "i8mm" also enables "simd".
> +   Disabling "i8mm" only disables "i8mm".  */
> +AARCH64_OPT_EXTENSION("i8mm", AARCH64_FL_I8MM, AARCH64_FL_SIMD, \
> +   0, false, "i8mm")

We have to maintain the transitive closure of features by hand,
so anything that enables AARCH64_FL_SIMD also needs to enable
AARCH64_FL_FP.

We should also add i8mm to the list of things that +nosimd and +nofp
disable.

(It would be better to do this automatically, but that's future work.)
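
As a rough picture of what "doing this automatically" could mean, the
sketch below derives both directions of the closure from a table of
direct requirements.  It is a toy model only: the `FL_*` bit values, the
`kDirectDeps` table and both helper functions are hypothetical, and the
dependency table is a simplification rather than the real contents of
aarch64-option-extensions.def.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical feature bits; the real AARCH64_FL_* values differ.
constexpr std::uint64_t FL_FP   = std::uint64_t{1} << 0;
constexpr std::uint64_t FL_SIMD = std::uint64_t{1} << 1;
constexpr std::uint64_t FL_I8MM = std::uint64_t{1} << 2;
constexpr std::uint64_t FL_BF16 = std::uint64_t{1} << 3;

constexpr int kNumFeatures = 4;

// Direct requirements only: entry i lists what feature bit i needs.
constexpr std::uint64_t kDirectDeps[kNumFeatures] = {
  0,        // fp needs nothing
  FL_FP,    // simd needs fp
  FL_SIMD,  // i8mm needs simd
  FL_SIMD,  // bf16 needs simd (assumed here for simplicity)
};

// Everything that has to be on when FLAGS is switched on: iterate to a
// fixed point so that indirect requirements are picked up as well.
std::uint64_t enable_closure(std::uint64_t flags) {
  assert((flags >> kNumFeatures) == 0);  // only known feature bits
  std::uint64_t prev;
  do {
    prev = flags;
    for (int i = 0; i < kNumFeatures; ++i)
      if (flags & (std::uint64_t{1} << i))
        flags |= kDirectDeps[i];
  } while (flags != prev);
  return flags;
}

// Everything that has to be off when FLAGS is switched off: any
// feature whose enable closure needs one of the disabled bits.
std::uint64_t disable_closure(std::uint64_t flags) {
  std::uint64_t result = flags;
  for (int i = 0; i < kNumFeatures; ++i) {
    std::uint64_t bit = std::uint64_t{1} << i;
    if (enable_closure(bit) & flags)
      result |= bit;
  }
  return result;
}
```

In this model enabling i8mm pulls in both simd and fp, and disabling fp
also turns off simd, i8mm and bf16, which is the by-hand bookkeeping
requested in the review comments.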

> +/* Enabling "bf16" also enables "simd" and "fp".
> +   Disabling "bf16" only disables "bf16".  */
> +AARCH64_OPT_EXTENSION("bf16", AARCH64_FL_BF16, AARCH64_FL_SIMD | AARCH64_FL_FP,
> +   0, false, "bf16")

Similarly here we should add bf16 to the list of things that +nofp disables.

> @@ -308,6 +323,13 @@ extern unsigned aarch64_architecture_version;
>  /* Memory Tagging instructions optional to Armv8.5 enabled through +memtag.  */
>  #define TARGET_MEMTAG (AARCH64_ISA_V8_5 && AARCH64_ISA_MEMTAG)
>  
> +/* I8MM instructions are enabled through +i8mm.  */
> +#define TARGET_I8MM (TARGET_SIMD && AARCH64_ISA_I8MM)

This should then just be AARCH64_ISA_I8MM (i.e. no need to test
TARGET_SIMD).

> +
> +/* BF16 instructions are enabled through +bf16.  */
> +#define TARGET_BF16_FP (AARCH64_ISA_BF16 && TARGET_FLOAT)

Similarly here we don't need a test for TARGET_FLOAT.

> +#

Re: [PATCH] Stream memory access types in IPA ICF.

2019-11-29 Thread Richard Biener
On Fri, Nov 29, 2019 at 2:02 PM Richard Biener
 wrote:
>
> On Fri, Nov 29, 2019 at 1:57 PM Jan Hubicka  wrote:
> >
> > Hi,
> > this is an example (I just copied first few comparisons on libxul
> > builds)
> > The operators== seems all instances of nsCOMPtr.
> > nsCOMPtr::operator=(nsCOMPtr&&)
>
> Oh, yes - this means we do not merge enough.  But the issue Martin tries
> to fix is we have equal hash functions too often but then fail to merge.
> So you say Martin fixes this not by actually merging where we could but
> to fail "cheaper" but in the same bogus way?  That would be sad...

Btw, did you consider merging more aggressively by considering the merge result
only for the offline copies but continue to use the original unmerged copies for
inlining only?  Most semantic issues like access path come up in the context
of inlining only (any TBAA kinds).  That would mean creating a new "alias"
that is "noinline" and mark all original "aliases" as inline clones with body.

Richard.


[C++ Patch] A few more cp_expr_loc_or_input_loc and a diagnostic regression fix

2019-11-29 Thread Paolo Carlini

Hi,

a few more rather straightforward uses for cp_expr_loc_or_input_loc.

Additionally, while working on the latter, I noticed that, compared to 
say, gcc-7, lately the code we have in cp_build_addr_expr_1 to diagnose 
taking the address of 'main' often doesn't work anymore, when the 
argument is wrapped in a location_wrapper. The below fixes that by using 
tree_strip_any_location_wrapper in the DECL_MAIN_P check, which works 
fine, but I can imagine various other ways to solve the issue... Tested 
x86_64-linux.
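
To illustrate why the DECL_MAIN_P check stopped firing, here is a
minimal toy model of a predicate applied to a wrapped argument.  It is
only a sketch: `Node`, `strip_location_wrapper`, `decl_main_p` and
`takes_address_of_main` are hypothetical stand-ins, not the front-end
tree API.

```cpp
#include <cassert>

// Toy tree node: either a function declaration or a wrapper that only
// adds location information around another node.
struct Node {
  enum Kind { FunctionDecl, LocationWrapper } kind;
  bool is_main;         // meaningful for FunctionDecl only
  const Node *wrapped;  // meaningful for LocationWrapper only
};

// Stand-in for stripping a location wrapper: return the wrapped node
// if N is a wrapper, otherwise N itself.
const Node *strip_location_wrapper(const Node *n) {
  assert(n != nullptr);
  if (n->kind == Node::LocationWrapper)
    return n->wrapped;
  return n;
}

// Stand-in for the predicate: true only for the declaration of main,
// so it is false when handed the wrapper node directly.
bool decl_main_p(const Node *n) {
  return n->kind == Node::FunctionDecl && n->is_main;
}

// The fixed check: strip the wrapper first, then test the predicate.
bool takes_address_of_main(const Node *arg) {
  return decl_main_p(strip_location_wrapper(arg));
}
```

Applied to a wrapper around main, `decl_main_p` alone returns false
while `takes_address_of_main` returns true, which mirrors the missed
pedwarn described above.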


Thanks, Paolo.

//

/cp
2019-11-29  Paolo Carlini  

* typeck.c (cp_build_addr_expr_1): Use cp_expr_loc_or_input_loc in a
few additional places.
(check_return_expr): Likewise.

* typeck.c (cp_build_addr_expr_1): Use tree_strip_any_location_wrapper
for address of main pedwarn.

/testsuite
2019-11-29  Paolo Carlini  

* g++.dg/diagnostic/inconsistent-deduction-1.C: New.
* g++.dg/diagnostic/returning-a-value-1.C: Likewise.
* g++.dg/cpp0x/decltype3.C: Check location(s) too.
* g++.dg/cpp0x/decltype4.C: Likewise.
* g++.dg/cpp0x/lambda/lambda-deduce-ext-neg.C: Likewise.
* g++.dg/cpp2a/consteval13.C: Likewise.
* g++.dg/expr/pmf-1.C: Likewise.
* g++.dg/other/ptrmem2.C: Likewise.
* g++.dg/template/ptrmem17.C: Likewise.
* g++.old-deja/g++.bugs/900213_03.C: Likewise.
* g++.old-deja/g++.other/pmf7.C: Likewise.
* g++.old-deja/g++.other/ptrmem7.C: Likewise.

* g++.dg/diagnostic/main2.C: New.
Index: cp/typeck.c
===
--- cp/typeck.c (revision 278834)
+++ cp/typeck.c (working copy)
@@ -6103,12 +6103,14 @@ cp_build_addr_expr_1 (tree arg, bool strict_lvalue
  else if (current_class_type
   && TREE_OPERAND (arg, 0) == current_class_ref)
/* An expression like &memfn.  */
-   permerror (input_location, "ISO C++ forbids taking the address of an unqualified"
+   permerror (cp_expr_loc_or_input_loc (arg),
+  "ISO C++ forbids taking the address of an unqualified"
   " or parenthesized non-static member function to form"
   " a pointer to member function.  Say %<&%T::%D%>",
   base, name);
  else
-   permerror (input_location, "ISO C++ forbids taking the address of a bound member"
+   permerror (cp_expr_loc_or_input_loc (arg),
+  "ISO C++ forbids taking the address of a bound member"
   " function to form a pointer to member function."
   "  Say %<&%T::%D%>",
   base, name);
@@ -6154,13 +6156,13 @@ cp_build_addr_expr_1 (tree arg, bool strict_lvalue
   arg = build1 (CONVERT_EXPR, type, arg);
   return arg;
 }
-  else if (pedantic && DECL_MAIN_P (arg))
+  else if (pedantic && DECL_MAIN_P (tree_strip_any_location_wrapper (arg)))
 {
   /* ARM $3.4 */
   /* Apparently a lot of autoconf scripts for C++ packages do this,
 so only complain if -Wpedantic.  */
   if (complain & (flag_pedantic_errors ? tf_error : tf_warning))
-   pedwarn (input_location, OPT_Wpedantic,
+   pedwarn (cp_expr_loc_or_input_loc (arg), OPT_Wpedantic,
 "ISO C++ forbids taking address of function %<::main%>");
   else if (flag_pedantic_errors)
return error_mark_node;
@@ -6218,7 +6220,8 @@ cp_build_addr_expr_1 (tree arg, bool strict_lvalue
if (TYPE_REF_P (TREE_TYPE (t)))
  {
if (complain & tf_error)
- error ("cannot create pointer to reference member %qD", t);
+ error_at (cp_expr_loc_or_input_loc (arg),
+   "cannot create pointer to reference member %qD", t);
return error_mark_node;
  }
 
@@ -6254,8 +6257,9 @@ cp_build_addr_expr_1 (tree arg, bool strict_lvalue
  || !DECL_IMMEDIATE_FUNCTION_P (current_function_decl)))
{
  if (complain & tf_error)
-   error ("taking address of an immediate function %qD",
-  stripped_arg);
+   error_at (cp_expr_loc_or_input_loc (arg),
+ "taking address of an immediate function %qD",
+ stripped_arg);
  return error_mark_node;
}
   if (TREE_CODE (stripped_arg) == FUNCTION_DECL
@@ -9689,7 +9693,8 @@ check_return_expr (tree retval, bool *no_warning)
   if (DECL_DESTRUCTOR_P (current_function_decl))
 {
   if (retval)
-   error ("returning a value from a destructor");
+   error_at (cp_expr_loc_or_input_loc (retval),
+ "returning a value from a destructor");
   return NULL_TREE;
 }
   else if (DECL_CONSTRUCTOR_P (current_function_decl))
@@ -9700,7 +9705,8 @@ check_return_expr (tree retval, bool *no_warning)
error ("cannot return from a handler of a func

Re: [PATCH] PR85678: Change default to -fno-common

2019-11-29 Thread Martin Liška

Hello.

I've noticed a significant number of package failures caused by the revision.
Would you please consider documenting this change in porting_to.html
(and in changes.html) for the GCC 10 release?

Thank you


Re: [PATCH] ipa-cp: Avoid ICEs when looking at expanded thunks and unoptimized functions (PR 92476)

2019-11-29 Thread Martin Jambor
Hi,

On Fri, Nov 29 2019, Jan Hubicka wrote:
>> Hi,
>> 
>> the patch below fixes the i686 failures reported in PR 92476.  Newly
>> expanded "artificial" thunks need to be analyzed when expanded so that
>> we create necessary function summaries and jump functions for them.
>> They still don't get IPA-CP lattices, so I looked at all accesses to
>> those and verified that only the functions saving IPA-VR and IPA-bits
>> analyses could try to access non-existing lattices.
>> 
>> After that, Martin's testcase in comment 4 of the bug also revealed two
>> places where we try to access summaries of unoptimized functions and
>> segfault, so I fixed those too.  Unfortunately it seems our testsuite
>> cannot optimize different LTO compilation units with different options
>> and so I could not add the testcase there.  But it no longer ICEs.
> I think you can simply add different flag into different testcases:
> 20090210_1.c:/* { dg-options "-fPIC" { target { ! sparc*-*-* } } } */
> 20090218-1_1.c:/* { dg-options "-fgnu89-inline" } */
> 20090218-2_1.c:/* { dg-options { -fgnu89-inline } } */
> 20111207-1_1.c:/* { dg-options "-fno-lto" } */
>

Ah, serves me right for only grepping for dg-lto-options.

I have committed the fix and will commit the following testcase addition
as a follow-up.

Thanks,

Martin


2019-11-29  Martin Jambor  

PR ipa/92476
* g++.dg/lto/pr92476_[01].C: New test.
---
 gcc/testsuite/g++.dg/lto/pr92476_0.C | 20 
 gcc/testsuite/g++.dg/lto/pr92476_1.C | 13 +
 2 files changed, 33 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/lto/pr92476_0.C
 create mode 100644 gcc/testsuite/g++.dg/lto/pr92476_1.C

diff --git a/gcc/testsuite/g++.dg/lto/pr92476_0.C b/gcc/testsuite/g++.dg/lto/pr92476_0.C
new file mode 100644
index 000..5bbc9236f4d
--- /dev/null
+++ b/gcc/testsuite/g++.dg/lto/pr92476_0.C
@@ -0,0 +1,20 @@
+// { dg-lto-do link }
+// { dg-lto-options { { -O0 -flto -shared -fPIC -fvisibility=hidden } } }
+// { dg-require-effective-target fpic }
+// { dg-require-effective-target shared }
+// { dg-extra-ld-options "-shared" }
+
+namespace Passenger {
+namespace Json {
+class Value {};
+} // namespace Json
+namespace ConfigKit {
+class Translator {};
+} // namespace ConfigKit
+namespace LoggingKit {
+void initialize(const Json::Value & = Json::Value(),
+const ConfigKit::Translator & = ConfigKit::Translator());
+void init_module() { initialize(); }
+} // namespace LoggingKit
+} // namespace Passenger
+
diff --git a/gcc/testsuite/g++.dg/lto/pr92476_1.C b/gcc/testsuite/g++.dg/lto/pr92476_1.C
new file mode 100644
index 000..cd29613b808
--- /dev/null
+++ b/gcc/testsuite/g++.dg/lto/pr92476_1.C
@@ -0,0 +1,13 @@
+// { dg-options { -O2 -flto -shared -fPIC -fvisibility=hidden } }
+
+namespace Passenger {
+namespace Json {
+class Value;
+}
+namespace ConfigKit {
+class Translator;
+}
+namespace LoggingKit {
+void initialize(const Json::Value &, const ConfigKit::Translator &) {}
+} // namespace LoggingKit
+} // namespace Passenger
-- 
2.24.0



[PATCH] Fix ICE in re-simplification of VEC_COND_EXPR (was: Re: [PATCH][amdgcn] Fix ICE in re-simplification of VEC_COND_EXPR)

2019-11-29 Thread Harwath, Frederik
Hi,

On 29.11.19 13:51, Harwath, Frederik wrote:

>> condition for the inner vec_cond.  Your fix looks reasonable but is
>> very badly formatted.  Can you instead do

I hope the formatting looks better now. I have also removed the [amdgcn]
from the subject line since the fact that this has been discovered in the
context of amdgcn is not really essential.

Best regards,
Frederik


2019-11-29  Frederik Harwath  

gcc/
* gimple-match-head.c (maybe_resimplify_conditional_op): use
generic_expr_could_trap_p to check if the condition of COND_EXPR or
VEC_COND_EXPR can trap.
---
 gcc/gimple-match-head.c | 18 +++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/gcc/gimple-match-head.c b/gcc/gimple-match-head.c
index 2996bade301..c763a80a6d1 100644
--- a/gcc/gimple-match-head.c
+++ b/gcc/gimple-match-head.c
@@ -144,9 +144,21 @@ maybe_resimplify_conditional_op (gimple_seq *seq, gimple_match_op *res_op,
   /* Likewise if the operation would not trap.  */
   bool honor_trapv = (INTEGRAL_TYPE_P (res_op->type)
  && TYPE_OVERFLOW_TRAPS (res_op->type));
-  if (!operation_could_trap_p ((tree_code) res_op->code,
-  FLOAT_TYPE_P (res_op->type),
-  honor_trapv, res_op->op_or_null (1)))
+  tree_code op_code = (tree_code) res_op->code;
+  bool op_could_trap;
+
+  /* COND_EXPR and VEC_COND_EXPR will trap if, and only if, the condition
+ traps and hence we have to check this. For all other operations, we
+ don't need to consider the operands. */
+  if (op_code == COND_EXPR || op_code == VEC_COND_EXPR)
+   op_could_trap = generic_expr_could_trap_p (res_op->ops[0]);
+  else
+   op_could_trap = operation_could_trap_p ((tree_code) res_op->code,
+   FLOAT_TYPE_P (res_op->type),
+   honor_trapv,
+   res_op->op_or_null (1));
+
+  if (!op_could_trap)
{
  res_op->cond.cond = NULL_TREE;
  return false;
-- 
2.17.1
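
For readers without the GCC sources at hand, the dispatch in the patch
above can be sketched with a self-contained toy model.  The enum `Code`,
the struct `Op` and both functions are hypothetical stand-ins for
`tree_code`, `gimple_match_op`, `operation_could_trap_p` and the patched
check; the assert only models the assertion mentioned earlier in the
thread and is not copied from GCC.

```cpp
#include <cassert>

// Toy stand-ins for a few tree codes (hypothetical).
enum class Code { Plus, TruncDiv, Cond, VecCond };

// Toy operation: a code plus the facts needed to decide trapping.
struct Op {
  Code code;
  bool condition_may_trap;   // meaningful for Cond / VecCond only
  bool divisor_may_be_zero;  // meaningful for TruncDiv only
};

// Stand-in for the generic per-operation query.  It must not be asked
// about conditional expressions, which is what the assert models.
bool operation_could_trap(const Op &op) {
  assert(op.code != Code::Cond && op.code != Code::VecCond);
  return op.code == Code::TruncDiv && op.divisor_may_be_zero;
}

// Mirrors the structure of the patched check: a conditional expression
// can trap if and only if its condition can trap; everything else is
// answered by the generic query.
bool op_could_trap(const Op &op) {
  if (op.code == Code::Cond || op.code == Code::VecCond)
    return op.condition_may_trap;
  return operation_could_trap(op);
}
```

In this model a `VecCond` with a non-trapping condition is reported as
non-trapping without ever reaching the asserting helper.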




Re: [PATCH] Fix ICE in re-simplification of VEC_COND_EXPR (was: Re: [PATCH][amdgcn] Fix ICE in re-simplification of VEC_COND_EXPR)

2019-11-29 Thread Jakub Jelinek
On Fri, Nov 29, 2019 at 02:38:34PM +0100, Harwath, Frederik wrote:
> 2019-11-29  Frederik Harwath  
> 
> gcc/
>   * gimple-match-head.c (maybe_resimplify_conditional_op): use

s/use/Use/

>   generic_expr_could_trap_p to check if the condition of COND_EXPR or
>   VEC_COND_EXPR can trap.
> ---
>  gcc/gimple-match-head.c | 18 +++---
>  1 file changed, 15 insertions(+), 3 deletions(-)
> 
> diff --git a/gcc/gimple-match-head.c b/gcc/gimple-match-head.c
> index 2996bade301..c763a80a6d1 100644
> --- a/gcc/gimple-match-head.c
> +++ b/gcc/gimple-match-head.c
> @@ -144,9 +144,21 @@ maybe_resimplify_conditional_op (gimple_seq *seq, gimple_match_op *res_op,
>/* Likewise if the operation would not trap.  */
>bool honor_trapv = (INTEGRAL_TYPE_P (res_op->type)
> && TYPE_OVERFLOW_TRAPS (res_op->type));
> -  if (!operation_could_trap_p ((tree_code) res_op->code,
> -FLOAT_TYPE_P (res_op->type),
> -honor_trapv, res_op->op_or_null (1)))
> +  tree_code op_code = (tree_code) res_op->code;
> +  bool op_could_trap;
> +
> +  /* COND_EXPR and VEC_COND_EXPR will trap if, and only if, the condition
> +   traps and hence we have to check this. For all other operations, we

s/. /.  /

> +   don't need to consider the operands. */

Likewise.

Jakub



[PATCH] Fix VN segfault

2019-11-29 Thread Richard Biener


This probably fixes a buffer overrun reported by Honza.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2019-11-29  Richard Biener  

* tree-ssa-sccvn.c (vn_walk_cb_data::push_partial_def): Bail
out early for too large objects.

Index: gcc/tree-ssa-sccvn.c
===
--- gcc/tree-ssa-sccvn.c(revision 278832)
+++ gcc/tree-ssa-sccvn.c(working copy)
@@ -1753,6 +1753,12 @@ void *
 vn_walk_cb_data::push_partial_def (const pd_data &pd, tree vuse,
   HOST_WIDE_INT maxsizei)
 {
+  const HOST_WIDE_INT bufsize = 64;
+  /* We're using a fixed buffer for encoding so fail early if the object
+ we want to interpret is bigger.  */
+  if (maxsizei > bufsize * BITS_PER_UNIT)
+return (void *)-1;
+
   if (partial_defs.is_empty ())
 {
   partial_defs.safe_push (pd);
@@ -1823,16 +1829,17 @@ vn_walk_cb_data::push_partial_def (const
   /* Now simply native encode all partial defs in reverse order.  */
   unsigned ndefs = partial_defs.length ();
   /* We support up to 512-bit values (for V8DFmode).  */
-  unsigned char buffer[64];
+  unsigned char buffer[bufsize];
   int len;
 
   while (!partial_defs.is_empty ())
 {
   pd_data pd = partial_defs.pop ();
+  gcc_checking_assert (pd.offset < bufsize);
   if (TREE_CODE (pd.rhs) == CONSTRUCTOR)
/* Empty CONSTRUCTOR.  */
memset (buffer + MAX (0, pd.offset),
-   0, MIN ((HOST_WIDE_INT)sizeof (buffer) - MAX (0, pd.offset),
+   0, MIN (bufsize - MAX (0, pd.offset),
pd.size + MIN (0, pd.offset)));
   else
{
@@ -1847,7 +1854,7 @@ vn_walk_cb_data::push_partial_def (const
  pad = GET_MODE_SIZE (mode) - pd.size;
}
  len = native_encode_expr (pd.rhs, buffer + MAX (0, pd.offset),
-   sizeof (buffer) - MAX (0, pd.offset),
+   bufsize - MAX (0, pd.offset),
MAX (0, -pd.offset) + pad);
  if (len <= 0 || len < (pd.size - MAX (0, -pd.offset)))
{

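The early bail-out in the patch above can be illustrated outside of GCC.
The following is only a minimal sketch under stated assumptions, not the
tree-ssa-sccvn.c code: BUFSIZE, BITS_PER_BYTE and encode_partial are
hypothetical names, and the function merely zero-fills a range the way the
empty-CONSTRUCTOR case does.  The point is that the size check happens
before the fixed buffer is touched, and that the written length is clamped
to the buffer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the constants used in the patch.  */
#define BUFSIZE 64
#define BITS_PER_BYTE 8

/* Zero-fill SIZE bytes at OFFSET in a fixed BUFSIZE-byte scratch buffer.
   Returns -1 if an object of MAXSIZE_BITS bits cannot fit, otherwise the
   number of bytes actually written (clamped to the buffer).  */
static long
encode_partial (unsigned char *buffer, long maxsize_bits,
                long offset, long size)
{
  /* Fail early, before touching the buffer, for too large objects.  */
  if (maxsize_bits > (long) BUFSIZE * BITS_PER_BYTE)
    return -1;

  long start = offset > 0 ? offset : 0;
  if (start >= BUFSIZE)
    return -1;

  /* A negative offset means the first -OFFSET bytes lie before the
     buffer, so they are skipped rather than written.  */
  long len = size + (offset < 0 ? offset : 0);
  if (len > BUFSIZE - start)
    len = BUFSIZE - start;
  if (len < 0)
    len = 0;

  memset (buffer + start, 0, (size_t) len);
  return len;
}
```

A caller would treat the -1 result as "give up on this lookup", which is
the role the (void *)-1 return plays in the patch.
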

Re: [PATCH] Fix ICE in re-simplification of VEC_COND_EXPR (was: Re: [PATCH][amdgcn] Fix ICE in re-simplification of VEC_COND_EXPR)

2019-11-29 Thread Harwath, Frederik
Hi Jakub,

On 29.11.19 14:41, Jakub Jelinek wrote:

> s/use/Use/
>
> [...]
>
> s/. /.  /

Right, thanks. Does that look ok for inclusion in trunk now?

Best regards,
Frederik


2019-11-29  Frederik Harwath  

gcc/
* gimple-match-head.c (maybe_resimplify_conditional_op): Use
generic_expr_could_trap_p to check if the condition of COND_EXPR or
VEC_COND_EXPR can trap.
---
 gcc/gimple-match-head.c | 18 +++---
 1 file changed, 15 insertions(+), 3 deletions(-)

diff --git a/gcc/gimple-match-head.c b/gcc/gimple-match-head.c
index 2996bade301..9010f11621e 100644
--- a/gcc/gimple-match-head.c
+++ b/gcc/gimple-match-head.c
@@ -144,9 +144,21 @@ maybe_resimplify_conditional_op (gimple_seq *seq, gimple_match_op *res_op,
   /* Likewise if the operation would not trap.  */
   bool honor_trapv = (INTEGRAL_TYPE_P (res_op->type)
  && TYPE_OVERFLOW_TRAPS (res_op->type));
-  if (!operation_could_trap_p ((tree_code) res_op->code,
-  FLOAT_TYPE_P (res_op->type),
-  honor_trapv, res_op->op_or_null (1)))
+  tree_code op_code = (tree_code) res_op->code;
+  bool op_could_trap;
+
+  /* COND_EXPR and VEC_COND_EXPR will trap if, and only if, the condition
+ traps and hence we have to check this.  For all other operations, we
+ don't need to consider the operands.  */
+  if (op_code == COND_EXPR || op_code == VEC_COND_EXPR)
+   op_could_trap = generic_expr_could_trap_p (res_op->ops[0]);
+  else
+   op_could_trap = operation_could_trap_p ((tree_code) res_op->code,
+   FLOAT_TYPE_P (res_op->type),
+   honor_trapv,
+   res_op->op_or_null (1));
+
+  if (!op_could_trap)
{
  res_op->cond.cond = NULL_TREE;
  return false;
-- 
2.17.1


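The decision the patch makes can be sketched with a toy model.  This is an
illustration only, assuming a made-up operation representation: op_sketch,
OP_PLUS, OP_DIV and OP_COND are hypothetical and are not GCC's tree codes or
its gimple_match_op.  It shows the shape of the check: a conditional
operation may trap only if its condition may trap, while every other
operation is classified by its code alone.

```c
#include <stdbool.h>

/* Hypothetical operation codes for the sketch.  */
enum op_code_sketch { OP_PLUS, OP_DIV, OP_COND };

struct op_sketch
{
  enum op_code_sketch code;
  /* Whether the condition operand may trap; only meaningful for
     OP_COND.  */
  bool cond_could_trap;
};

/* Return true if OP may trap.  For a conditional operation the answer
   depends on its condition operand; for all other operations the code
   is enough (in this toy model only division may trap).  */
static bool
op_could_trap (const struct op_sketch *op)
{
  if (op->code == OP_COND)
    return op->cond_could_trap;
  return op->code == OP_DIV;
}
```
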

[PATCH] Add changes that I made in GCC 10 development cycle.

2019-11-29 Thread Martin Liška

Hello.

I'm sending entries for the changes.html file for GCC 10.

Martin

---
 htdocs/gcc-10/changes.html | 38 ++
 1 file changed, 38 insertions(+)


diff --git a/htdocs/gcc-10/changes.html b/htdocs/gcc-10/changes.html
index f0f0d312..67c5234d 100644
--- a/htdocs/gcc-10/changes.html
+++ b/htdocs/gcc-10/changes.html
@@ -42,6 +42,10 @@ a work-in-progress.
 library required for building GCC has been increased to version
 3.1.0 (released 2011-10-03).
   
+  
+The automatic template instantiation at link time (-frepo)
+has been removed.
+  
 
 
 
@@ -53,6 +57,40 @@ a work-in-progress.
   __builtin_roundeven for the corresponding function from
 ISO/IEC TS 18661.
   
+  
+A new option, <a href="https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#index-fno-allocation-dce">-fallocation-dce</a>
+has been added. The option removes unneeded pairs of new
+and delete operators.
+  
+  
+Profile driven optimization improvements:
+
+  
+Using <a href="https://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html#index-fprofile-values">-fprofile-values</a>,
+an instrumented binary can track multiple
+values (up to 4) for e.g. indirect calls and provide more precise profile information.
+  
+
+  
+  
+Link-time optimization improvements:
+
+  
+A new binary <a href="https://gcc.gnu.org/onlinedocs/gcc/lto-dump.html">lto-dump</a>
+has been added.  The program can dump various
+information about a LTO bytecode object file.
+  
+  
+Parallel phase of the LTO can automatically detect a running make's jobserver
+or can fall back to number of available cores.
+  
+  
+The LTO bytecode can be compressed with
+<a href="https://facebook.github.io/zstd/">zstd</a>
+algorithm.  Configure script can automatically detect the zstd support.
+  
+
+  
 
 
 



[PR92726] OpenACC: 'NULL'-in -> no-op, and/or 'NULL'-out (was: [PATCH 1/5, OpenACC] Allow NULL as an argument to OpenACC 2.6 directives)

2019-11-29 Thread Thomas Schwinge
Hi Tobias!

Reviewing your
<http://mid.mail-archive.com/8be82276-81b1-817c-fcd2-51f24f5fe2d2@codesourcery.com>
"[Patch][OpenMP/OpenACC/Fortran] Fix mapping of optional (present|absent)
arguments" reminded me that the behavioral change cited below, which you
described as "trivial", still has not been split out.

I've just filed PR92726 "OpenACC: 'NULL'-in ->
no-op, and/or 'NULL'-out", so please reference that one in the ChangeLog
updates.

So, eventually that'll be more than just
'libgomp/oacc-mem.c:update_dev_host', but I understand we need that one
now, for Fortran optional arguments support.  Any other changes can then
be handled later (once the OpenACC specification changes have been
completed).

Please also add a new test case 'libgomp.oacc-c-c++-common/null-1.c' with
a "Test 'NULL'-in -> no-op, and/or 'NULL'-out" header, executing things
like 'acc_update_device (NULL, [...])' etc. for everything that calls
'update_dev_host': 'acc_update_device', 'acc_update_device_async',
'acc_update_self', 'acc_update_self_async'.  These functions are also
called for OpenACC 'update' directives
('libgomp/oacc-parallel.c:GOACC_update'), but I suppose it's not possible
to construct an OpenACC 'update' directive conveying a 'NULL' pointer,
that is, something that would result in 'hostaddrs[i] == NULL'?  Likewise
for Fortran, I suppose.

In 'libgomp/libgomp.texi' then add a note to 'acc_update_device' (and
other relevant functions) like this:

@@ -2586,6 +2586,9 @@ This function updates the device copy from the previously mapped host memory.
 The host memory is specified with the host address @var{a} and a length of
 @var{len} bytes.

+If @var{a} is the @code{NULL} pointer, this is a no-op.
+
 [...]

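The "'NULL'-in -> no-op" behavior requested above can be sketched in
isolation.  This is a minimal illustration and not libgomp code:
update_device_sketch is a hypothetical stand-in that copies into a plain
buffer, whereas the real update_dev_host takes different arguments and
does far more.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a host-to-device update routine.  Returns 1
   if data was copied and 0 if the call was a no-op.  */
static int
update_device_sketch (void *device, const void *host, size_t len)
{
  /* A NULL host address has nothing to update: treat it as a no-op.  */
  if (host == NULL)
    return 0;

  memcpy (device, host, len);
  return 1;
}
```

A test along the lines of the proposed null-1.c would then check that the
NULL call returns without touching the destination.
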

As for 'libgomp/oacc-mem.c:gomp_acc_insert_pointer', that's only called
for OpenACC 'enter data' directives
('libgomp/oacc-parallel.c:GOACC_enter_exit_data'), and specifically only
for 'GOMP_MAP_POINTER', 'GOMP_MAP_TO_PSET'.  Is there a way to construct
a test case that will result in a 'NULL' pointer there, other than via
Fortran optional arguments?  If not, then that hunk should be removed
here, and move into/stay in the Fortran optional arguments patch that
you've posted.

(And that said, Julian's got a patch pending review that gets rid of
'gomp_acc_insert_pointer' and other such black magic, yay.)


Grüße
 Thomas


On 2019-07-12T12:35:05+0100, Kwok Cheung Yeung  wrote:
> Fortran pass-by-reference optional arguments behave much like normal 
> Fortran arguments when lowered to GENERIC/GIMPLE, except they can be 
> null (representing a non-present argument).
>
> Some parts of libgomp (those dealing with updating mappings) currently 
> do not expect to take a null address and fail. These need to be changed 
> to deal with the null appropriately, by turning the operation into a 
> no-op (as you never need to update a non-present argument).
>
>   libgomp/
>   * oacc-mem.c (update_dev_host): Return early if the host address
>   is NULL.
>   (gomp_acc_insert_pointer): Likewise.
>   * testsuite/libgomp.oacc-c-c++-common/lib-43.c: Remove.
>   * testsuite/libgomp.oacc-c-c++-common/lib-47.c: Likewise.
> ---
>   libgomp/oacc-mem.c |  9 
>   .../testsuite/libgomp.oacc-c-c++-common/lib-43.c   | 51 --
>   .../testsuite/libgomp.oacc-c-c++-common/lib-47.c   | 49 -
>   3 files changed, 9 insertions(+), 100 deletions(-)
>   delete mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/lib-43.c
>   delete mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/lib-47.c
>
> diff --git a/libgomp/oacc-mem.c b/libgomp/oacc-mem.c
> index 2f27100..8cc5120 100644
> --- a/libgomp/oacc-mem.c
> +++ b/libgomp/oacc-mem.c
> @@ -831,6 +831,12 @@ update_dev_host (int is_dev, void *h, size_t s, int async)
> if (acc_dev->capabilities & GOMP_OFFLOAD_CAP_SHARED_MEM)
>   return;
>
> +  /* Fortran optional arguments that are non-present result in a
> + null host address here.  This can safely be ignored as it is
> + not possible to 'update' a non-present optional argument.  */
> +  if (h == NULL)
> +return;
> +
> acc_prof_info prof_info;
> acc_api_info api_info;
> bool profiling_p = GOACC_PROFILING_SETUP_P (thr, &prof_info, &api_info);
> @@ -901,6 +907,9 @@ gomp_acc_insert_pointer (size_t mapnum, void **hostaddrs, size_t *sizes,
> struct goacc_thread *thr = goacc_thread ();
> struct gomp_device_descr *acc_dev = thr->dev;
>
> +  if (*hostaddrs == NULL)
> +return;
> +
> if (acc_is_present (*hostaddrs, *sizes))
>   {
> splay_tree_key n;
> diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-43.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-43.c
> deleted file mode 100644
> index 5db2912..000
> --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-43.c
> +++ /dev/null
> 

Re: [C++ PATCH] (temporarily) undefine __cpp_consteval

2019-11-29 Thread Marek Polacek
On Fri, Nov 29, 2019 at 10:30:13AM +0100, Jakub Jelinek wrote:
> --- a/htdocs/projects/cxx-status.html
> +++ b/htdocs/projects/cxx-status.html
> @@ -288,7 +288,8 @@
>  
> Immediate functions (consteval) 
> <a href="http://wg21.link/p1073r3">P1073R3</a>
> -   10 
> 
> +   10
> +(partial, no consteval virtual support) 
> __cpp_consteval >= 201811 

You can use class="partial" which I added for use in the DR table.

--
Marek Polacek • Red Hat, Inc. • 300 A St, Boston, MA



Re: [PATCH] PR85678: Change default to -fno-common

2019-11-29 Thread Wilco Dijkstra
Hi Martin,

> I've noticed quite significant package failures caused by the revision.

How significant? Is it mostly the common mistake of forgetting extern?

> Would you please consider documenting this change in porting_to.html
> (and in changes.html) for GCC 10 release?

Sure, I already had a patch for changes.html - I've added an initial porting_to
as well:

[wwwdocs] Document -fcommon default change

Add an entry for the default change. Passes the W3 validator.

--
diff --git a/htdocs/gcc-10/changes.html b/htdocs/gcc-10/changes.html
index f0f0d312171a54afede176f06ce76f9c8abaebc4..980e4e591781d04aa888ba5988981006bd30dd1f 100644
--- a/htdocs/gcc-10/changes.html
+++ b/htdocs/gcc-10/changes.html
@@ -47,6 +47,13 @@ a work-in-progress.
 
 
 General Improvements
+The following GCC command line options have been introduced or improved.
+
+  GCC now defaults to -fno-common.  In C, global variables with
+  multiple tentative definitions will result in linker errors.
+  Global variable accesses are also more efficient on various targets.
+  
+
 
 The following built-in functions have been introduced.
 
diff --git a/htdocs/gcc-10/porting_to.html b/htdocs/gcc-10/porting_to.html
new file mode 100644
index ..2e652f6aa4bd3259a316af0c72ab7eb96bab53b7
--- /dev/null
+++ b/htdocs/gcc-10/porting_to.html
@@ -0,0 +1,65 @@
+
+
+
+
+
+Porting to GCC 10
+https://gcc.gnu.org/gcc.css"; />
+
+
+
+Porting to GCC 10
+
+
+The GCC 10 release series differs from previous GCC releases in
+a number of ways. Some of these are a result
+of bug fixing, and some old behaviors have been intentionally changed
+to support new standards, or relaxed in standards-conforming ways to
+facilitate compilation or run-time performance.
+
+
+
+Some of these changes are user visible and can cause grief when
+porting to GCC 10. This document is an effort to identify common issues
+and provide solutions. Let us know if you have suggestions for improvements!
+
+
+
+
+
+C language issues
+
+Default to -fno-common
+
+
+  A common mistake in C is omitting extern when declaring a global
+  variable in a header file.  If the header is included by several files it
+  results in multiple definitions of the same variable.  In previous GCC
+  versions this error is ignored.  GCC 10 defaults to -fno-common,
+  which means a linker error will now be reported.
+  To fix this, use extern in header files when declaring global
+  variables, and ensure each global is defined in exactly one C file.
+  As a workaround, legacy C code can be compiled with -fcommon.
+
+  
+  int x;  // tentative definition - avoid in header files
+
+  extern int y;  // correct declaration in a header file
+  
+
+
+
+
+ 
+
+
+
+


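The fix described in the porting_to entry can be sketched as follows.  This
is an illustration only: the file names counter.h and counter.c and the
names counter and bump_counter are hypothetical, and the header and source
file are collapsed into one block so that it compiles on its own.

```c
/* counter.h (hypothetical header): declaration only, so it is safe to
   include from any number of source files.  */
extern int counter;

/* counter.c (hypothetical source file): the single definition.  */
int counter = 0;

/* Any other file that includes counter.h can use the variable without
   defining it again.  */
int
bump_counter (void)
{
  return ++counter;
}
```

With this layout the link succeeds with both -fcommon and -fno-common,
because only one object file defines the variable.
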

Re: [PATCH] Fix ICE in re-simplification of VEC_COND_EXPR

2019-11-29 Thread Richard Sandiford
"Harwath, Frederik"  writes:
> Hi Jakub,
>
> On 29.11.19 14:41, Jakub Jelinek wrote:
>
>> s/use/Use/
>>
>> [...]
>>
>> s/. /.  /
>
> Right, thanks. Does that look ok for inclusion in trunk now?
>
> Best regards,
> Frederik
>
>
> 2019-11-29  Frederik Harwath  
>
> gcc/
> * gimple-match-head.c (maybe_resimplify_conditional_op): Use
> generic_expr_could_trap_p to check if the condition of COND_EXPR or
> VEC_COND_EXPR can trap.

Thanks for doing this, looks good to me FWIW.  I was seeing the same
failure for SVE but hadn't found time to look at it.

Richard

> ---
>  gcc/gimple-match-head.c | 18 +++---
>  1 file changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/gcc/gimple-match-head.c b/gcc/gimple-match-head.c
> index 2996bade301..9010f11621e 100644
> --- a/gcc/gimple-match-head.c
> +++ b/gcc/gimple-match-head.c
> @@ -144,9 +144,21 @@ maybe_resimplify_conditional_op (gimple_seq *seq, gimple_match_op *res_op,
>/* Likewise if the operation would not trap.  */
>bool honor_trapv = (INTEGRAL_TYPE_P (res_op->type)
>   && TYPE_OVERFLOW_TRAPS (res_op->type));
> -  if (!operation_could_trap_p ((tree_code) res_op->code,
> -  FLOAT_TYPE_P (res_op->type),
> -  honor_trapv, res_op->op_or_null (1)))
> +  tree_code op_code = (tree_code) res_op->code;
> +  bool op_could_trap;
> +
> +  /* COND_EXPR and VEC_COND_EXPR will trap if, and only if, the condition
> + traps and hence we have to check this.  For all other operations, we
> + don't need to consider the operands.  */
> +  if (op_code == COND_EXPR || op_code == VEC_COND_EXPR)
> +   op_could_trap = generic_expr_could_trap_p (res_op->ops[0]);
> +  else
> +   op_could_trap = operation_could_trap_p ((tree_code) res_op->code,
> +   FLOAT_TYPE_P (res_op->type),
> +   honor_trapv,
> +   res_op->op_or_null (1));
> +
> +  if (!op_could_trap)
> {
>   res_op->cond.cond = NULL_TREE;
>   return false;


[PATCH] libstdc++: improve how pretty printers find node types (PR 91997)

2019-11-29 Thread Jonathan Wakely

This fixes two related problems.

The iterators for node-based containers use nested typedefs such as
std::list::iterator::_Node to denote their node types. As reported in
https://bugzilla.redhat.com/show_bug.cgi?id=1053438 those typedefs are
not always present in the debug info. That means the pretty printers
cannot find them using gdb.lookup_type (via the find_type helper).
Instead of looking up the nested typedefs this patch makes the printers
look up the actual class templates directly.

A related problem (and the original topic of PR 91997) is that GDB fails
to find types via gdb.lookup_type when printing a backtrace from a
non-C++ function: https://sourceware.org/bugzilla/show_bug.cgi?id=25234
That is also solved by not looking up the nested typedef.

PR libstdc++/91997
* python/libstdcxx/v6/printers.py (find_type): Fail more gracefully
if we run out of base classes to look at.
(lookup_templ_spec, lookup_node_type): New utilities to find node
types for node-based containers.
(StdListPrinter.children, NodeIteratorPrinter.__init__)
(NodeIteratorPrinter.to_string, StdSlistPrinter.children)
(StdSlistIteratorPrinter.to_string, StdRbtreeIteratorPrinter.__init__)
(StdMapPrinter.children, StdSetPrinter.children)
(StdForwardListPrinter.children): Use lookup_node_type instead of
find_type.
(StdListIteratorPrinter.__init__, StdFwdListIteratorPrinter.__init__):
Pass name of node type to NodeIteratorPrinter constructor.
(Tr1HashtableIterator.__init__): Rename argument.
(StdHashtableIterator.__init__): Likewise. Use lookup_templ_spec
instead of find_type.
* testsuite/libstdc++-prettyprinters/59161.cc: Remove workaround for
_Node typedef not being present in debuginfo.
* testsuite/libstdc++-prettyprinters/91997.cc: New test.

Tested powerpc64le-linux, committed to trunk.

I plan to backport this to the release branches too.

commit 85e0abed67eaf9e10382d8688dfa3260d11c1b7a
Author: Jonathan Wakely 
Date:   Fri Nov 29 13:13:56 2019 +

libstdc++: improve how pretty printers find node types (PR 91997)

This fixes two related problems.

The iterators for node-based containers use nested typedefs such as
std::list::iterator::_Node to denote their node types. As reported in
https://bugzilla.redhat.com/show_bug.cgi?id=1053438 those typedefs are
not always present in the debug info. That means the pretty printers
cannot find them using gdb.lookup_type (via the find_type helper).
Instead of looking up the nested typedefs this patch makes the printers
look up the actual class templates directly.

A related problem (and the original topic of PR 91997) is that GDB fails
to find types via gdb.lookup_type when printing a backtrace from a
non-C++ function: https://sourceware.org/bugzilla/show_bug.cgi?id=25234
That is also solved by not looking up the nested typedef.

PR libstdc++/91997
* python/libstdcxx/v6/printers.py (find_type): Fail more gracefully
if we run out of base classes to look at.
(lookup_templ_spec, lookup_node_type): New utilities to find node
types for node-based containers.
(StdListPrinter.children, NodeIteratorPrinter.__init__)
(NodeIteratorPrinter.to_string, StdSlistPrinter.children)
(StdSlistIteratorPrinter.to_string, StdRbtreeIteratorPrinter.__init__)
(StdMapPrinter.children, StdSetPrinter.children)
(StdForwardListPrinter.children): Use lookup_node_type instead of
find_type.
(StdListIteratorPrinter.__init__, StdFwdListIteratorPrinter.__init__):
Pass name of node type to NodeIteratorPrinter constructor.
(Tr1HashtableIterator.__init__): Rename argument.
(StdHashtableIterator.__init__): Likewise. Use lookup_templ_spec
instead of find_type.
* testsuite/libstdc++-prettyprinters/59161.cc: Remove workaround for
_Node typedef not being present in debuginfo.
* testsuite/libstdc++-prettyprinters/91997.cc: New test.

diff --git a/libstdc++-v3/python/libstdcxx/v6/printers.py b/libstdc++-v3/python/libstdcxx/v6/printers.py
index cd79a1fa6e6..869a8286675 100644
--- a/libstdc++-v3/python/libstdcxx/v6/printers.py
+++ b/libstdc++-v3/python/libstdcxx/v6/printers.py
@@ -94,13 +94,78 @@ def find_type(orig, name):
 # The type was not found, so try the superclass.  We only need
 # to check the first superclass, so we don't bother with
 # anything fancier here.
-field = typ.fields()[0]
-if not field.is_base_class:
+fields = typ.fields()
+if len(fields) and fields[0].is_base_class:
+typ = fields[0].type
+else:
 raise ValueError("Cannot find type %s::%s" % (str(orig), name))
-typ = field.type

Re: [C++ PATCH] (temporarily) undefine __cpp_consteval

2019-11-29 Thread Jakub Jelinek
On Fri, Nov 29, 2019 at 09:42:06AM -0500, Marek Polacek wrote:
> On Fri, Nov 29, 2019 at 10:30:13AM +0100, Jakub Jelinek wrote:
> > --- a/htdocs/projects/cxx-status.html
> > +++ b/htdocs/projects/cxx-status.html
> > @@ -288,7 +288,8 @@
> >  
> > Immediate functions (consteval) 
> > > <a href="http://wg21.link/p1073r3">P1073R3</a>
> > -   10 
> > 
> > +   10
> > +(partial, no consteval virtual support) 
> > __cpp_consteval >= 201811 
> 
> You can use class="partial" which I added for use in the DR table.

We have other partial ones (__VA_OPT__), so then:

diff --git a/htdocs/projects/cxx-status.html b/htdocs/projects/cxx-status.html
index c6ff78e1..00ca61e4 100644
--- a/htdocs/projects/cxx-status.html
+++ b/htdocs/projects/cxx-status.html
@@ -87,7 +87,7 @@
__VA_OPT__ for preprocessor comma elision 
   <a href="http://wg21.link/p0306r4">P0306R4</a>
   <a href="http://wg21.link/p1042r1">P1042R1</a>
-   8
+   8
 (partial, no #__VA_OPT__ support) 

 
@@ -288,7 +288,8 @@
 
Immediate functions (consteval) 
   <a href="http://wg21.link/p1073r3">P1073R3</a>
-   10 
+   10
+(partial, no consteval virtual support) 
__cpp_consteval >= 201811 
 
 


Jakub



BountySource campaign for the cc0 transition of the AVR backend

2019-11-29 Thread John Paul Adrian Glaubitz
Hi!

Since we were successful with the BountySource campaign to convert the M68K
backend from cc0 to MODE_CC [1], I have now started a second BountySource
campaign which aims to achieve the same thing for the AVR backend [2]. The
corresponding bug in Bugzilla is #92729 [3].

Let's see how it goes :).

Adrian

[1] https://www.bountysource.com/issues/80706251-m68k-convert-the-backend-to-mode_cc-so-it-can-be-kept-in-future-releases
[2] https://www.bountysource.com/issues/84630749-avr-convert-the-backend-to-mode_cc-so-it-can-be-kept-in-future-releases
[3] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92729

-- 
 .''`.  John Paul Adrian Glaubitz
: :' :  Debian Developer - glaub...@debian.org
`. `'   Freie Universitaet Berlin - glaub...@physik.fu-berlin.de
  `-GPG: 62FF 8A75 84E0 2956 9546  0006 7426 3B37 F5B5 F913


Re: Fix DR_GROUP_GAP for strided accesses (PR 92677)

2019-11-29 Thread Richard Sandiford
Richard Biener  writes:
> On Fri, Nov 29, 2019 at 11:53 AM Richard Sandiford
>  wrote:
>>
>> When dissolving an SLP-only group of accesses, we should only set
>> the gap to group_size - 1 for normal non-strided groups.
>>
>> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>
> OK.  Probably also broken on branch(es).

It looks like it's trunk only -- we didn't dissolve DR groups here
until the support for SLP masked loads (r271704).

Thanks,
Richard

>
> Richard.
>
>> Richard
>>
>>
>> 2019-11-29  Richard Sandiford  
>>
>> gcc/
>> PR tree-optimization/92677
>> * tree-vect-loop.c (vect_dissolve_slp_only_groups): Set the gap
>> to zero when dissolving a group of strided accesses.
>>
>> gcc/testsuite/
>> PR tree-optimization/92677
>> * gcc.dg/vect/pr92677.c: New test.
>>
>> Index: gcc/tree-vect-loop.c
>> ===
>> --- gcc/tree-vect-loop.c2019-11-29 09:13:43.764143091 +
>> +++ gcc/tree-vect-loop.c2019-11-29 10:52:30.475476141 +
>> @@ -1829,7 +1829,10 @@ vect_dissolve_slp_only_groups (loop_vec_
>>   DR_GROUP_FIRST_ELEMENT (vinfo) = vinfo;
>>   DR_GROUP_NEXT_ELEMENT (vinfo) = NULL;
>>   DR_GROUP_SIZE (vinfo) = 1;
>> - DR_GROUP_GAP (vinfo) = group_size - 1;
>> + if (STMT_VINFO_STRIDED_P (first_element))
>> +   DR_GROUP_GAP (vinfo) = 0;
>> + else
>> +   DR_GROUP_GAP (vinfo) = group_size - 1;
>>   vinfo = next;
>> }
>> }
>> Index: gcc/testsuite/gcc.dg/vect/pr92677.c
>> ===
>> --- /dev/null   2019-09-17 11:41:18.176664108 +0100
>> +++ gcc/testsuite/gcc.dg/vect/pr92677.c 2019-11-29 10:52:30.475476141 +
>> @@ -0,0 +1,26 @@
>> +/* { dg-do compile } */
>> +/* { dg-additional-options "-O3" } */
>> +
>> +int a, c;
>> +int *b;
>> +long d;
>> +double *e;
>> +
>> +void fn1() {
>> +  long f;
>> +  double g, h;
>> +  while (c) {
>> +if (d) {
>> +  g = *e;
>> +  *(b + 4) = g;
>> +}
>> +if (f) {
>> +  h = *(e + 2);
>> +  *(b + 6) = h;
>> +}
>> +e += a;
>> +b += 8;
>> +c--;
>> +d += 2;
>> +  }
>> +}


Re: [C++ PATCH] (temporarily) undefine __cpp_consteval

2019-11-29 Thread Marek Polacek
On Fri, Nov 29, 2019 at 03:49:37PM +0100, Jakub Jelinek wrote:
> On Fri, Nov 29, 2019 at 09:42:06AM -0500, Marek Polacek wrote:
> > On Fri, Nov 29, 2019 at 10:30:13AM +0100, Jakub Jelinek wrote:
> > > --- a/htdocs/projects/cxx-status.html
> > > +++ b/htdocs/projects/cxx-status.html
> > > @@ -288,7 +288,8 @@
> > >  
> > > Immediate functions (consteval) 
> > > <a href="http://wg21.link/p1073r3">P1073R3</a>
> > > -   10 
> > > 
> > > +   10
> > > +(partial, no consteval virtual support) 
> > > __cpp_consteval >= 201811 
> > 
> > You can use class="partial" which I added for use in the DR table.
> 
> We have other partial ones (__VA_OPT__), so then:
> 
> diff --git a/htdocs/projects/cxx-status.html b/htdocs/projects/cxx-status.html
> index c6ff78e1..00ca61e4 100644
> --- a/htdocs/projects/cxx-status.html
> +++ b/htdocs/projects/cxx-status.html
> @@ -87,7 +87,7 @@
> __VA_OPT__ for preprocessor comma elision 
> <a href="http://wg21.link/p0306r4">P0306R4</a>
> <a href="http://wg21.link/p1042r1">P1042R1</a>
> -   8
> +   8
>  (partial, no #__VA_OPT__ support) 
> 
>  
> @@ -288,7 +288,8 @@
>  
> Immediate functions (consteval) 
> <a href="http://wg21.link/p1073r3">P1073R3</a>
> -   10 
> 
> +   10
> +(partial, no consteval virtual support) 
> __cpp_consteval >= 201811 
>  
>  

Sure, this is fine.

Marek



Re: [PATCH] PR85678: Change default to -fno-common

2019-11-29 Thread Martin Liška

On 11/29/19 3:43 PM, Wilco Dijkstra wrote:

> How significant? Is it mostly the common mistake of forgetting extern?


Likely, I see it in at least 400 packages out of 11000 which we have
in openSUSE:Factory. Plus there are many 'nm -B' configure script
defects:
https://lists.gnu.org/archive/html/bug-autoconf/2019-11/msg1.html

Thank you for the porting_to entry!

Martin


Re: [PATCH] Fix ICE in re-simplification of VEC_COND_EXPR

2019-11-29 Thread Harwath, Frederik
On 29.11.19 15:46, Richard Sandiford wrote:

> Thanks for doing this, looks good to me FWIW.  I was seeing the same
> failure for SVE but hadn't found time to look at it.

Thank you all for the review. Committed as r278853.

Frederik



Re: [PATCH][RFC] Add new ipa-reorder pass

2019-11-29 Thread Martin Liška

Hi.

I'm sending v3 of the patch where I changed:
- function.cold sections are properly put into .text.unlikely and
  not into a .text.sorted.XYZ section

I've just finished measurements and I still have the original speed up
for tramp3d:
Total runs: 10, before: 13.92, after: 13.82, cmp: 99.219%

Thoughts?
Martin

From f3fd85c29a4a2746555294bd30e3c31129030074 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Thu, 5 Sep 2019 13:32:41 +0200
Subject: [PATCH] Add new ipa-reorder pass.

gcc/ChangeLog:

2019-11-25  Martin Liska  

	* Makefile.in: Add ipa-reorder.o.
	* cgraph.c (cgraph_node::dump): Dump
	text_sorted_order.
	(cgraph_node_cmp_by_text_sorted):
	New function that sorts functions based
	on text_sorted_order.
	* cgraph.h (cgraph_node): Add text_sorted_order.
	(cgraph_node_cmp_by_text_sorted): New.
	* cgraphclones.c (cgraph_node::create_clone):
	Clone also text_sorted_order.
	* cgraphunit.c (node_cmp): Remove.
	(expand_all_functions): Use new function
	cgraph_node_cmp_by_text_sorted.
	* common.opt: Add new option reorder_functions_algorithm.
	* flag-types.h (enum reorder_functions_algorithm):
	New enum.
	* ipa-reorder.c: New file.
	* lto-cgraph.c (lto_output_node): Stream in and out
	text_sorted_order.
	(input_node): Likewise.
	* passes.def: Add pass_ipa_reorder.
	* timevar.def (TV_IPA_REORDER): New.
	* tree-pass.h (make_pass_ipa_reorder): New.
	* varasm.c (default_function_section): Assign text.sorted.X
	section.

gcc/lto/ChangeLog:

2019-11-25  Martin Liska  

	* lto-partition.c (node_cmp): Remove.
	(lto_balanced_map): Use new cgraph_node_cmp_by_text_sorted.
---
 gcc/Makefile.in |   1 +
 gcc/cgraph.c|  20 +++
 gcc/cgraph.h|   3 +
 gcc/cgraphclones.c  |   1 +
 gcc/cgraphunit.c|  21 +--
 gcc/common.opt  |  14 ++
 gcc/flag-types.h|   8 +
 gcc/ipa-reorder.c   | 383 
 gcc/lto-cgraph.c|   2 +
 gcc/lto/lto-partition.c |  33 +---
 gcc/passes.def  |   1 +
 gcc/timevar.def |   1 +
 gcc/tree-pass.h |   1 +
 gcc/varasm.c|   9 +
 14 files changed, 448 insertions(+), 50 deletions(-)
 create mode 100644 gcc/ipa-reorder.c

diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 7d3c13230e4..163d47c47f1 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1369,6 +1369,7 @@ OBJS = \
 	init-regs.o \
 	internal-fn.o \
 	ipa-cp.o \
+	ipa-reorder.o \
 	ipa-sra.o \
 	ipa-devirt.o \
 	ipa-fnsummary.o \
diff --git a/gcc/cgraph.c b/gcc/cgraph.c
index 180d21e4796..9c09d63e4ee 100644
--- a/gcc/cgraph.c
+++ b/gcc/cgraph.c
@@ -1951,6 +1951,8 @@ cgraph_node::dump (FILE *f)
 }
   if (tp_first_run > 0)
 fprintf (f, " first_run:%i", tp_first_run);
+  if (text_sorted_order > 0)
+fprintf (f, " text_sorted_order:%i", text_sorted_order);
   if (origin)
 fprintf (f, " nested in:%s", origin->asm_name ());
   if (gimple_has_body_p (decl))
@@ -3718,6 +3720,24 @@ cgraph_edge::possibly_call_in_translation_unit_p (void)
   return node->get_availability () >= AVAIL_INTERPOSABLE;
 }
 
+/* Sort cgraph_nodes by text_sorted_order if available, or by order.  */
+
+int
+cgraph_node_cmp_by_text_sorted (const void *pa, const void *pb)
+{
+  const cgraph_node *a = *(const cgraph_node * const *) pa;
+  const cgraph_node *b = *(const cgraph_node * const *) pb;
+
+  /* Functions with text_sorted_order should be before these
+ without profile.  */
+  if (a->text_sorted_order == 0 || b->text_sorted_order == 0)
+return a->text_sorted_order - b->text_sorted_order;
+
+  return a->text_sorted_order != b->text_sorted_order
+	 ? b->text_sorted_order - a->text_sorted_order
+	 : b->order - a->order;
+}
+
 /* A stashed copy of "symtab" for use by selftest::symbol_table_test.
This needs to be a global so that it can be a GC root, and thus
prevent the stashed copy from being garbage-collected if the GC runs
diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index 0d2442c997c..039819db358 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -1435,6 +1435,8 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cgraph_node : public symtab_node
   unsigned int profile_id;
   /* Time profiler: first run of function.  */
   int tp_first_run;
+  /* Order in .text.sorted.* section.  */
+  int text_sorted_order;
 
   /* Set when decl is an abstract function pointed to by the
  ABSTRACT_DECL_ORIGIN of a reachable function.  */
@@ -2451,6 +2453,7 @@ bool cgraph_function_possibly_inlined_p (tree);
 
 const char* cgraph_inline_failed_string (cgraph_inline_failed_t);
 cgraph_inline_failed_type_t cgraph_inline_failed_type (cgraph_inline_failed_t);
+int cgraph_node_cmp_by_text_sorted (const void *pa, const void *pb);
 
 /* In cgraphunit.c  */
 void cgraphunit_c_finalize (void);
diff --git a/gcc/cgraphclones.c b/gcc/cgraphclones.c
index a79491e0b88..f624cbee185 100644
--- a/gcc/cgraphclones.c
+++ b/gcc/cgraphclones.c
@@ -372,6 +372,7 @@ cgraph_node::create_clone (tree new_decl, profile_count prof_count,
   new_node->rtl = rtl;
   ne

Re: [PATCH v3] PR92398: Fix testcase failure of pr72804.c

2019-11-29 Thread Segher Boessenkool
Hi Xiong Hu,

On Mon, Nov 25, 2019 at 10:24:35AM +0800, luoxhu wrote:
> P9LE generated instruction is not worse than P8LE.
> mtvsrdd;xxlnot;stxv vs. not;not;std;std.

To be clear: it can have longer latency, but latency via memory is not
so critical, and this does save decode and other resources.  It's hard
to choose which is best :-)

>   * gcc.target/powerpc/pr72804.c: Split the store function to...
>   * gcc.target/powerpc/pr92398.h: ... this one.  New.

I wanted to say that splitting one single function to a header file is a
bit overkill, but it gives a nice place to discuss the differences in
generated code on different CPUs, so okay, it's useful :-)

> +   store generates difference instructions as below:
> +   P9+: mtvsrdd;xxlnot;stxv.
> +   P8/P7/P6 LE: not;not;std;std.
> +   P8 BE: mtvsrd;mtvsrd;xxpermdi;xxlnor;stxvd2x.
> +   P7/P6 BE: std;std;addi;lxvd2x;xxlnor;stxvd2x.
> +   P9+ and P9- LE are expected, P6/P7/P8 BE are unexpected.  */

Great overview, thanks.

> diff --git a/gcc/testsuite/gcc.target/powerpc/pr92398.p9+.c 
> b/gcc/testsuite/gcc.target/powerpc/pr92398.p9+.c
> new file mode 100644
> index 000..2ebe2025cef
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr92398.p9+.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile { target { lp64 && p9+ } } } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +/* { dg-options "-O2 -mvsx" } */
> +
> +/* { dg-final { scan-assembler-times {\mmtvsrdd\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mxxlnor\M} 1 } } */
> +/* { dg-final { scan-assembler-times {\mstxv\M} 1 } } */

Maybe add scan-assembler-not for "not" and "std", and in the < p9 testcase
for these three?

> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/powerpc/pr92398.p9-.c
> @@ -0,0 +1,10 @@
> +/* { dg-do compile { target { lp64 && {! p9+} } } } */
> +/* { dg-require-effective-target powerpc_vsx_ok } */
> +/* { dg-options "-O2 -mvsx" } */
> +
> +/* { dg-final { scan-assembler-times {\mnot\M} 2 { xfail be } } } */
> +/* { dg-final { scan-assembler-times {\mstd\M} 2 { xfail { {p8} && {be} } } 
> } } */

I think you can write that as just

/* { dg-final { scan-assembler-times {\mstd\M} 2 { xfail { p8 && be } } } } */

Okay for trunk with or without such tweaks.  Thanks, and sorry the review
took a while!


Segher


[PATCH][AArch64] Add support for fused compare and branch

2019-11-29 Thread Wilco Dijkstra
Hi,

Add support for fused compare with branch.  Rename the existing
AARCH64_FUSE_CMP_BRANCH to ALU_BRANCH, and AARCH64_FUSE_ALU_BRANCH
to ALU_CBZ to make it clear what is being fused.

AArch64 bootstrap OK, OK to commit?
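
For readers who have not looked at how per-CPU fusion flags are consumed, the following is a minimal self-contained sketch of the general pattern: a list of fusion pairs is expanded into one bit per pair, a tuning structure ORs the bits together, and a predicate tests a single bit. All names here (FUSION_PAIRS, FUSE_*, tune_params, fusion_enabled_p) are hypothetical stand-ins chosen for illustration; this is not the code in aarch64.c.

```cpp
// Hypothetical stand-in for a .def file: one entry per fusion pair.
#define FUSION_PAIRS(ENTRY)        \
  ENTRY ("cmp+branch", CMP_BRANCH) \
  ENTRY ("alu+branch", ALU_BRANCH) \
  ENTRY ("alu+cbz", ALU_CBZ)

// First expansion: give every pair a consecutive index.
enum fusion_index
{
#define X(NAME, ID) FUSE_##ID##_index,
  FUSION_PAIRS (X)
#undef X
  FUSE_index_end
};

// Second expansion: turn each index into a single-bit flag.
enum fusion_flag : unsigned
{
#define X(NAME, ID) FUSE_##ID = 1u << FUSE_##ID##_index,
  FUSION_PAIRS (X)
#undef X
  FUSE_NOTHING = 0
};

// Hypothetical tuning record: the set of fusions a CPU supports.
struct tune_params
{
  unsigned fusible_ops;
};

// True if fusion OP is enabled in tuning T.
inline bool
fusion_enabled_p (const tune_params &t, unsigned op)
{
  return (t.fusible_ops & op) != 0;
}
```

With flags laid out like this, a tuning that lists only ALU_BRANCH and ALU_CBZ never enables a check guarded by CMP_BRANCH.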

ChangeLog:

2019-11-29  Wilco Dijkstra  

* config/aarch64/aarch64.c
(thunderxt88_tunings): Use AARCH64_FUSE_ALU_BRANCH.
(thunderx_tunings): Likewise.
(tsv110_tunings): Use AARCH64_FUSE_ALU_BRANCH and AARCH64_FUSE_ALU_CBZ.
(thunderx2t99_tunings): Likewise.
(aarch_macro_fusion_pair_p): Add support for AARCH64_FUSE_CMP_BRANCH.
* config/aarch64/aarch64-fusion-pairs.def: Add ALU_CBZ fusion.
--

diff --git a/gcc/config/aarch64/aarch64-fusion-pairs.def 
b/gcc/config/aarch64/aarch64-fusion-pairs.def
index 
ce4bb92d5c9d1f187c026b1a714e485a2b9f1a74..051009b42b2db4e79a8b302fd3f1b65dedfdba8f
 100644
--- a/gcc/config/aarch64/aarch64-fusion-pairs.def
+++ b/gcc/config/aarch64/aarch64-fusion-pairs.def
@@ -35,5 +35,6 @@ AARCH64_FUSION_PAIR ("adrp+ldr", ADRP_LDR)
 AARCH64_FUSION_PAIR ("cmp+branch", CMP_BRANCH)
 AARCH64_FUSION_PAIR ("aes+aesmc", AES_AESMC)
 AARCH64_FUSION_PAIR ("alu+branch", ALU_BRANCH)
+AARCH64_FUSION_PAIR ("alu+cbz", ALU_CBZ)
 
 #undef AARCH64_FUSION_PAIR
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 
94e664af52fb0e3404e76d4d0b67a618023571ba..f74134ea2e080c361be0facc274575c09fbb7a82
 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -915,7 +915,7 @@ static const struct tune_params thunderxt88_tunings =
   SVE_NOT_IMPLEMENTED, /* sve_width  */
   6, /* memmov_cost  */
   2, /* issue_rate  */
-  AARCH64_FUSE_CMP_BRANCH, /* fusible_ops  */
+  AARCH64_FUSE_ALU_BRANCH, /* fusible_ops  */
   "8", /* function_align.  */
   "8", /* jump_align.  */
   "8", /* loop_align.  */
@@ -941,7 +941,7 @@ static const struct tune_params thunderx_tunings =
   SVE_NOT_IMPLEMENTED, /* sve_width  */
   6, /* memmov_cost  */
   2, /* issue_rate  */
-  AARCH64_FUSE_CMP_BRANCH, /* fusible_ops  */
+  AARCH64_FUSE_ALU_BRANCH, /* fusible_ops  */
   "8", /* function_align.  */
   "8", /* jump_align.  */
   "8", /* loop_align.  */
@@ -968,8 +968,8 @@ static const struct tune_params tsv110_tunings =
   SVE_NOT_IMPLEMENTED, /* sve_width  */
   4,/* memmov_cost  */
   4,/* issue_rate  */
-  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_CMP_BRANCH
-   | AARCH64_FUSE_ALU_BRANCH), /* fusible_ops  */
+  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_ALU_BRANCH
+   | AARCH64_FUSE_ALU_CBZ), /* fusible_ops  */
   "16", /* function_align.  */
   "4",  /* jump_align.  */
   "8",  /* loop_align.  */
@@ -1103,8 +1103,8 @@ static const struct tune_params thunderx2t99_tunings =
   SVE_NOT_IMPLEMENTED, /* sve_width  */
   4, /* memmov_cost.  */
   4, /* issue_rate.  */
-  (AARCH64_FUSE_CMP_BRANCH | AARCH64_FUSE_AES_AESMC
-   | AARCH64_FUSE_ALU_BRANCH), /* fusible_ops  */
+  (AARCH64_FUSE_ALU_BRANCH | AARCH64_FUSE_AES_AESMC
+   | AARCH64_FUSE_ALU_CBZ), /* fusible_ops  */
   "16",/* function_align.  */
   "8", /* jump_align.  */
   "16",/* loop_align.  */
@@ -20363,7 +20363,14 @@ aarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn 
*curr)
 }
 }
 
+  /* Fuse compare (CMP/CMN/TST/BICS) and conditional branch.  */
   if (aarch64_fusion_enabled_p (AARCH64_FUSE_CMP_BRANCH)
+  && prev_set && curr_set && any_condjump_p (curr)
+  && reg_referenced_p (SET_DEST (curr_set), PATTERN (curr)))
+return true;
+
+  /* Fuse flag-setting ALU instructions and conditional branch.  */
+  if (aarch64_fusion_enabled_p (AARCH64_FUSE_ALU_BRANCH)
   && any_condjump_p (curr))
 {
   unsigned int condreg1, condreg2;
@@ -20387,9 +20394,10 @@ aarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn 
*curr)
}
 }
 
+  /* Fuse ALU instructions and CBZ/CBNZ.  */
   if (prev_set
   && curr_set
-  && aarch64_fusion_enabled_p (AARCH64_FUSE_ALU_BRANCH)
+  && aarch64_fusion_enabled_p (AARCH64_FUSE_ALU_CBZ)
   && any_condjump_p (curr))
 {
   /* We're trying to match:


Re: [PATCH][AArch64] Add support for fused compare and branch

2019-11-29 Thread Richard Sandiford
Wilco Dijkstra  writes:
> Hi,
>
> Add support for fused compare with branch.  Rename the existing
> AARCH64_FUSE_CMP_BRANCH to ALU_BRANCH, and AARCH64_FUSE_ALU_BRANCH
> to ALU_CBZ to make it clear what is being fused.

This might have been easier to review as three patches:

(1) rename ALU_BRANCH to ALU_CBZ
(2) rename CMP_BRANCH to ALU_BRANCH
(3) add back CMP_BRANCH with more accurate semantics

But what uses CMP_BRANCH after the patch?  It looked like you renamed
all existing uses and didn't add any new ones.

> @@ -20363,7 +20363,14 @@ aarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn 
> *curr)
>  }
>  }
>
> +  /* Fuse compare (CMP/CMN/TST/BICS) and conditional branch.  */
>if (aarch64_fusion_enabled_p (AARCH64_FUSE_CMP_BRANCH)
> +  && prev_set && curr_set && any_condjump_p (curr)
> +  && reg_referenced_p (SET_DEST (curr_set), PATTERN (curr)))

Looks like this should be prev_set.  But this condition will trigger
for any prev_insn that is only being kept around for its effect on the
flags, not just CMP/CMN/TST/BICS.  If it's only supposed to be those
four insns then I think we should test for them explicitly.

Thanks,
Richard

> +return true;
> +
> +  /* Fuse flag-setting ALU instructions and conditional branch.  */
> +  if (aarch64_fusion_enabled_p (AARCH64_FUSE_ALU_BRANCH)
>&& any_condjump_p (curr))
>  {
>unsigned int condreg1, condreg2;


[wwwdocs] Make gcc-9/ and gcc-10/ a bit more uniform (wrt. )

2019-11-29 Thread Gerald Pfeifer
...which is another way of saying that there actually were no style 
sheets applied to some of those (in addition to inconsistent formatting).

Committed.

Gerald


commit 553b5e98de2dc90183773b0c0d750db62d6ad8db
Author: Gerald Pfeifer 
Date:   Fri Nov 29 17:40:34 2019 +0100

diff --git a/htdocs/gcc-10/changes.html b/htdocs/gcc-10/changes.html
index f0f0d31..5cca097 100644
--- a/htdocs/gcc-10/changes.html
+++ b/htdocs/gcc-10/changes.html
@@ -1,8 +1,10 @@
 
 
+
 
 
 GCC 10 Release Series — Changes, New Features, and Fixes
+https://gcc.gnu.org/gcc.css"; />
 
 
 

[PATCH committed] wwwdocs: Update simulator in backends.html (i386, m68k, s390, tilegx)

2019-11-29 Thread Segher Boessenkool
All of i386, m68k, s390, and tilegx are supported in QEMU (see
https://wiki.qemu.org/Documentation/Platforms for example), so they
should not have "S" (or "?") in that column.
---
 htdocs/backends.html | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/htdocs/backends.html b/htdocs/backends.html
index a392d5e..2de21fe 100644
--- a/htdocs/backends.html
+++ b/htdocs/backends.html
@@ -84,13 +84,13 @@ fr30   | ??FI B  pb mgs
 frv| ??   B   b   i   s
 gcn|   S C D  qa e
 h8300  |   FI B cgs
-i386   |   ? Qq   b   ia
+i386   | Qq   b   ia
 ia64   |   ? Q   Cqr  b m i
 iq2000 | ???   FICB   b  g  t
 lm32   |   F g
 m32c   |L  FIlb  gs
 m32r   |   FI b   s
-m68k   |   ? pb   i
+m68k   | pb   i
 mcore  |  ?FIpb mgs
 mep|   F Cb  g  t s
 microblaze | CB   i   s
@@ -110,11 +110,11 @@ riscv  | Q   Cqr gia
 rl78   |L  F l   gs
 rs6000 | Q   Cqr pb   ia
 rx |  s
-s390   |   ? Qqr gia e
+s390   | Qqr gia e
 sh | Q   CB   qr pi
 sparc  | Q   CB   qr  b   ia
 stormy16   | ???L  FIC D lb   i
-tilegx |   S Q   Cq  gi  e
+tilegx | Q   Cq  gi  e
 tilepro|   S   F C   gi  e
 v850   | g a  s
 vax|  M I   c b   i  e
-- 
1.8.3.1



Re: [PATCH 2/4] The main m68k cc0 conversion

2019-11-29 Thread Segher Boessenkool
On Mon, Nov 25, 2019 at 03:44:44PM +0100, John Paul Adrian Glaubitz wrote:
> On 11/25/19 3:40 PM, Segher Boessenkool wrote:
> > On Mon, Nov 25, 2019 at 01:38:53PM +0100, Tobias Burnus wrote:
> >> Thanks for the m68k work! Can you also update 
> >> https://gcc.gnu.org/backends.html ?
> > 
> >> PS: I wonder whether some other archs also should be updated on that web 
> >> page.
> > 
> > Possibly.  Probably?
> > 
> > But, do you have any particular suggestions?
> 
> For m68k, it should be added that a free simulator is available as qemu
> has gained full emulation support for m68k as compared to the previous
> ColdFire-only emulation, i.e. the question mark in the "S" column should
> be removed.
> 
> Same applies to i386, s390 and tilegx. These are all supported by qemu.

I've done this now (my first commit to the new git repo, wow that was
easy to do :-) )  Thanks for the suggestion!


Segher


Re: [Patch][Fortran] OpenACC – permit common blocks in some clauses

2019-11-29 Thread Tobias Burnus

Hi Thomas,

On 11/28/19 6:01 PM, Thomas Schwinge wrote:
Definition (3.32.1 in F2018): "blank common" = "unnamed common block". 


I just want to add the following, which came into my mind after thinking 
more about device_resident (the other email in this thread). Fortran 
(here: 2018, 8.10.2.5) has:


"Named common blocks of the same name shall be of the same size in all 
scoping units of a program in which they appear, but blank common blocks 
may be of different sizes."


* * *

Depending on the use of a common block (see other email in the same 
thread, to be sent shortly), that's fine or not. If the common block 
only exists on the device (i.e. in a device routine / 'target' 
procedure) or only on the host, everything is fine. — In this case, the 
connection between host and target is done by single variables – and no 
one cares whether they are in a common block or not.


It only becomes interesting if the same(-named) common block is known to 
both the host and the device – in that case, it is important that the 
size matches, otherwise either the copying to the device or (via 
'update') from the device to the host will write beyond the static 
variable! — Also in the latter case, it makes sense that 
'copy(/block_name/)' will map the whole common block and not only the 
directly used variables (which might be none).


* * *

OpenACC: Does one need device_resident to allocate 'static' global 
memory on the device? If not, then its only use would be for same-named 
common blocks, existing on both the device and the host. If it is 
needed, then one needs to think about the semantics – will it declare a 
common block which exists only on the device or one which exists on both 
device and host with the same name. — I think that needs to be spelt out 
in the spec clearly; at the moment, it is ambiguous. In any case, it 
influences how copy(/block_name/) acts.


For OpenMP, my impression is that the spec is completely silent on 
device-located common blocks. And if a common block is only on the host, 
copy(/block/) just maps the used (common-block) variables to the target 
– which is fine. — Seems as if some spec work is needed as well.


* * *


I go by the assumption that everything contained in the base
languages of OpenACC/OpenMP ([…] C, C++, Fortran
standards), should also work in an OpenACC/OpenMP context in a sensible
manner […]

I concur.
ACK. Instead of "burying" such things in long emails, I like to see 
GCC PRs filed, which can then be actioned on individually.


Well, I think one first needs to understand what's supposed to be in the 
standard. Having said this, I have now filed PR 92728 + PR 92730.


[blank commons]

assumption would thus be: yes, ought to be supported -- but I haven't 
thought through whether that makes sense, so...


By itself, using blank commons makes sense if one maps variables from a 
common block but not if one maps the whole common block.


[Blank commons + PGI]

I will later play around with the PGI compiler; but I think it is really 
a spec issue and I care less what a specific compiler does. (Even 
though, with OpenACC, it is kind of the reference compiler.)



determine whether that makes sense to support in an OpenACC context


I think that needs discussion about what one wants to achieve instead of 
directly patching the spec.



* I have now a new test case
libgomp/testsuite/libgomp.oacc-fortran/common-block-3.f90 which looks at
omplower.
Actually, I think this should be: 
gcc/testsuite/gfortran.dg/goacc/common-block-3.f90

Thanks. Curious: why 'omplower' instead of 'gimple' dump?

[…]

My rationale is that your code changes are in 'gcc/gimplify.c', so you'd
test for that stuff in the 'gimple' dump (which is between 'original' and
'omplower').

[switching to gimple]

I think it does (but please argue if it doesn't to you), but that's not
high priority, of course.


Hmm, that might be more specific to other parts – but with optional 
arguments, I had constantly to look at what has been passed on to 
libgomp via omp_arr – even though the code was produced directly by the 
front end. The 'pragma' simply didn't tell the whole story – omp-low.c 
added and removed some '*' and '&' which were crucial.


Probably, 'map' as parsed here doesn't change any more between 
gimplify.c and the early stages of omp-low.c, but I feel safer if the 
wanted result survived until the end of omp-low.c and does not get 
modified in an unintended way later on.



Note that I said 'dg-additional-options', not 'dg-options', so please
re-consider.


Oops. Yes, dg-additional-options should work :-)

[oacc/pr84963.f90]

Good find. […] to change 'dg-options "-O2"' to 'dg-additional-options 
"-O2"
Please verify, and then commit that to trunk, gcc-8-branch, gcc-7-branch, 


Done so – except I committed to GCC 9 instead of GCC 7, which is now 
closed :-)

[Question about a test case]


Then let's please document that in the test case sources, for that's not
quite obvious.


I have now committed

Re: [PATCH 0/4]: C++ P1423R3 char8_t remediation implementation

2019-11-29 Thread Jonathan Wakely

On 15/09/19 15:39 -0400, Tom Honermann wrote:
This series of patches provides an implementation of the changes for 
C++ proposal P1423R3 [1].


These changes do not impact default libstdc++ behavior for C++17 and 
earlier; they are only active for C++2a or when the -fchar8_t option 
is specified.


Tested x86_64-linux.

Patch 1: Decouple constraints for u8path from path constructors.
Patch 2: Update __cpp_lib_char8_t feature test macro value, add 
deleted operators, update u8path.

Patch 3: Updates to existing tests.
Patch 4: New tests.

Tom.

[1]: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1423r3.html


It took a while, but I've committed these four patches, with just some
minor whitespace changes and changelog tweaks.

I'm also following it up with this patch, which corrects some
pre-existing problems that got worse with the new deleted operator<<
overloads.

Tested powerpc64le-linux, committed to trunk.
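
As background on why deleted operator<< overloads matter, here is a minimal sketch of the technique on a toy stream type. Without a deleted overload, a char16_t argument would be promoted to int and printed as a number; the deleted overload is an exact match, so the insertion becomes ill-formed instead. The names sink and is_insertable are hypothetical and exist only for this illustration; this is not the libstdc++ code.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical toy stream, standing in for std::basic_ostream.
struct sink
{
  std::string text;
};

// Ordinary insertion operators.
inline sink &
operator<< (sink &s, char c)
{
  s.text += c;
  return s;
}

inline sink &
operator<< (sink &s, int i)
{
  s.text += std::to_string (i);
  return s;
}

// Deleted overloads: these are better matches than the int overload,
// so inserting a char16_t or char32_t is rejected at compile time
// rather than silently printing the code unit's numeric value.
sink &operator<< (sink &, char16_t) = delete;
sink &operator<< (sink &, char32_t) = delete;

// Detects whether "sink << T" is well-formed.
template<typename T, typename = void>
  struct is_insertable : std::false_type { };

template<typename T>
  struct is_insertable<T,
		       decltype (void (std::declval<sink &> ()
				       << std::declval<T> ()))>
  : std::true_type { };
```

Selecting a deleted function inside the decltype is a substitution failure, so is_insertable reports false for the deleted cases without producing a hard error.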


commit 81c954850d8422eaeefe3c9e833b75f8dbeb905f
Author: Jonathan Wakely 
Date:   Fri Nov 29 17:15:25 2019 +

libstdc++: Adjust some function templates for coding conventions

* include/bits/fs_path.h (path::operator/=): Change template-head to
use typename instead of class.
* include/experimental/bits/fs_path.h (path::operator/=): Likewise.
* include/std/ostream (operator<<): Likewise.

diff --git a/libstdc++-v3/include/bits/fs_path.h b/libstdc++-v3/include/bits/fs_path.h
index 643478292cd..b129372447b 100644
--- a/libstdc++-v3/include/bits/fs_path.h
+++ b/libstdc++-v3/include/bits/fs_path.h
@@ -272,7 +272,7 @@ namespace __detail
 
 path& operator/=(const path& __p);
 
-template 
+template
   __detail::_Path<_Source>&
   operator/=(_Source const& __source)
   {
diff --git a/libstdc++-v3/include/experimental/bits/fs_path.h b/libstdc++-v3/include/experimental/bits/fs_path.h
index b924fbfd5f6..91202e5b008 100644
--- a/libstdc++-v3/include/experimental/bits/fs_path.h
+++ b/libstdc++-v3/include/experimental/bits/fs_path.h
@@ -278,7 +278,7 @@ namespace __detail
 
 path& operator/=(const path& __p) { return _M_append(__p._M_pathname); }
 
-template 
+template
   __detail::_Path<_Source>&
   operator/=(_Source const& __source)
   { return append(__source); }
diff --git a/libstdc++-v3/include/std/ostream b/libstdc++-v3/include/std/ostream
index 771c28db7b7..895e4d7ab4e 100644
--- a/libstdc++-v3/include/std/ostream
+++ b/libstdc++-v3/include/std/ostream
@@ -512,18 +512,18 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 { return (__out << __out.widen(__c)); }
 
   // Specialization
-  template 
+  template
 inline basic_ostream&
 operator<<(basic_ostream& __out, char __c)
 { return __ostream_insert(__out, &__c, 1); }
 
   // Signed and unsigned
-  template
+  template
 inline basic_ostream&
 operator<<(basic_ostream& __out, signed char __c)
 { return (__out << static_cast(__c)); }
 
-  template
+  template
 inline basic_ostream&
 operator<<(basic_ostream& __out, unsigned char __c)
 { return (__out << static_cast(__c)); }
@@ -533,37 +533,37 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // numeric values.
 
 #ifdef _GLIBCXX_USE_WCHAR_T
-  template
+  template
 basic_ostream&
 operator<<(basic_ostream&, wchar_t) = delete;
 #endif // _GLIBCXX_USE_WCHAR_T
 
 #ifdef _GLIBCXX_USE_CHAR8_T
-  template
+  template
 basic_ostream&
 operator<<(basic_ostream&, char8_t) = delete;
 #endif
 
-  template
+  template
 basic_ostream&
 operator<<(basic_ostream&, char16_t) = delete;
 
-  template
+  template
 basic_ostream&
 operator<<(basic_ostream&, char32_t) = delete;
 
 #ifdef _GLIBCXX_USE_WCHAR_T
 #ifdef _GLIBCXX_USE_CHAR8_T
-  template
+  template
 basic_ostream&
 operator<<(basic_ostream&, char8_t) = delete;
 #endif // _GLIBCXX_USE_CHAR8_T
 
-  template
+  template
 basic_ostream&
 operator<<(basic_ostream&, char16_t) = delete;
 
-  template
+  template
 basic_ostream&
 operator<<(basic_ostream&, char32_t) = delete;
 #endif // _GLIBCXX_USE_WCHAR_T
@@ -601,7 +601,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 operator<<(basic_ostream<_CharT, _Traits>& __out, const char* __s);
 
   // Partial specializations
-  template
+  template
 inline basic_ostream&
 operator<<(basic_ostream& __out, const char* __s)
 {
@@ -614,12 +614,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 }
 
   // Signed and unsigned
-  template
+  template
 inline basic_ostream&
 operator<<(basic_ostream& __out, const signed char* __s)
 { return (__out << reinterpret_cast(__s)); }
 
-  template
+  template
 inline basic_ostream &
 operator<<(basic_ostream& __out, const unsigned char* __s)
 { return (__out << reinterpret_cast(__s)); }
@@ -629,37 +629,37 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
// pointer values.
 
 #ifdef _GLIBCXX_USE_WCHAR_T
-  template
+  template
 basic_ostream&
 operator<<(basic_ostream&, const wchar_t*) = dele

Re: [Patch][Fortran] OpenACC – permit common blocks in some clauses

2019-11-29 Thread Tobias Burnus

Hi Thomas,

I have started with this email – and then stopped and replied to the 
other email in this thread: 
https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02678.html – which covers 
parts which otherwise would belong in this email.


On 11/28/19 6:02 PM, Thomas Schwinge wrote:
[Test case which uses common blocks in device_resident.]

If you'd like to, please commit that, to document the status quo.  (I
have not reviewed.)


Did so as r278845 with a slightly updated comment.


Observations:
* !$acc declare has to come after the declaration of the common block.
That's now tracked in PR fortran/92728 for OpenMP/OpenACC – together 
with blank commons.
Good find -- purely a Fortran front end issue […] is a reason 
(implementation complexity?)


Having any order makes it feel more Fortran like; the complexity comes 
from splitting matching and checking the clauses, but it shouldn't be 
rocket science.



* If I just use '!$acc parallel', the used variables are copied in
according to OpenMP 4.0 semantics, i.e. without a defaultmap clause (of
OpenMP 4.5+; not yet in gfortran), scalars are firstprivate and arrays
are map(fromto:). – Does this behaviour match the spec or should this
automatically mapped to, e.g., no_create as the 'device_resident' is
known? [Side remark: the module file does contain
"OACC_DECLARE_DEVICE_RESIDENT".]

Not sure at this point.


s/OpenMP 4.0/OpenMP 4.5/

Regarding the mapping: Both OpenACC and OpenMP agree and it is fine (cf. 
OpenACC 2.7 last 'Description' paragraph in parallel/kernels, 2.5.1 + 
2.5.2). And OpenMP 4.5, Sect. 2.15.5 (esp. last three bullet points) or 
OpenMP 5, Sect. 2.19.7. (Missing omp bits see PR fortran/92568.).


Regarding device_resident: It is not fully clear to me what the intent 
is – and "The host may not be able to access variables in a 
device_resident clause." does not make it clearer.


In terms of the spec, the mapping with firstprivate/[copy alias tofrom] 
is fine – as is the explicit use of present. However, if commons exist 
on both device + host, 'copy(/block/)' should work and also copy 
common-block variables which are not referenced in the 
parallel/kernels block – which currently does not work.



* If I explicitly use '!$acc parallel present(/block/)' that fails
because present() does not permit common blocks.
(OpenACC 2.7, p36, l.1054: "For all clauses except deviceptr and
present, the list argument may include a Fortran common block name
enclosed within slashes").

Do you understand the rationale behind that restriction, by the way?  I'm
not sure I do.


Regarding 'present', I don't: If copy/no_create is fine, why should 
present be a problem? (And vice versa.)


For 'deviceptr', it kind of does make sense – unless one wants to store 
the pointer as 'intptr_t' in an integer variable or wants to have a 
pointer (i.e. Fortran attribute) in 'common' which will cause mapping 
problems for the common block. — In any case, the 'dummy argument' 
constraint prevents common blocks. – BTW: Those constraints do not make 
sense but seem to be same as for OpenMP's is_device_ptr. (They are both 
too loose and too strict; I miss type(c_ptr) [as local var + as dummy w/ 
value attribute].)


Cheers,

Tobias



[COMMITTED][GCC8] Backport driver/89014 Use-after-free in aarch64 -march=native

2019-11-29 Thread Wilco Dijkstra
Hi,

I've backported r268189 to GCC8:

aarch64: fix use-after-free in -march=native (PR driver/89014)

Running:
  $ valgrind ./xgcc -B. -c test.c -march=native
on aarch64 shows a use-after-free in host_detect_local_cpu due
to the std::string result of aarch64_get_extension_string_for_isa_flags
only living until immediately after a c_str call.

This leads to corrupt "-march=" values being passed to cc1.

This patch fixes the use-after-free, though it appears to also need
Tamar's patch here:
  https://gcc.gnu.org/ml/gcc-patches/2018-12/msg01302.html
in order to generate valid values for cc1.  This may have worked by
accident in the past, if the corrupt "-march=" value happened to be
0-terminated in the "right" place; with this patch it now appears
to reliably break without Tamar's patch.
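
The lifetime problem described above can be sketched in isolation as follows. This is an illustration only: extension_string and concat2 are hypothetical stand-ins (for a function returning std::string by value and for a malloc-based concatenation helper), and the flag bits and extension names are made up for the example.

```cpp
#include <cstdlib>
#include <cstring>
#include <string>

// Hypothetical stand-in for a helper that returns std::string by value.
std::string
extension_string (unsigned long flags)
{
  std::string s;
  if (flags & 1)
    s += "+crc";
  if (flags & 2)
    s += "+crypto";
  return s;
}

// Hypothetical two-argument concatenation: returns a malloc'ed copy of
// A followed by B, or NULL on allocation failure.
char *
concat2 (const char *a, const char *b)
{
  std::size_t la = std::strlen (a), lb = std::strlen (b);
  char *r = static_cast<char *> (std::malloc (la + lb + 1));
  if (!r)
    return NULL;
  std::memcpy (r, a, la);
  std::memcpy (r + la, b, lb + 1);	/* Copies the terminating NUL too.  */
  return r;
}

char *
march_for (const char *base, unsigned long flags)
{
  /* Buggy shape (kept as a comment on purpose): the temporary
     std::string dies at the end of the full expression, so the
     pointer dangles by the time it is read.

       const char *ext = extension_string (flags).c_str ();
       return concat2 (base, ext);  */

  /* Fixed shape: a named local keeps the buffer alive until after
     the concatenation has copied it.  */
  std::string ext = extension_string (flags);
  return concat2 (base, ext.c_str ());
}
```

The fixed shape mirrors the patch below: the std::string lives in a local scope that encloses the call consuming its c_str () result.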

Backport from mainline
2019-01-23  David Malcolm  

PR driver/89014
* config/aarch64/driver-aarch64.c (host_detect_local_cpu): Fix
use-after-free of the result of
aarch64_get_extension_string_for_isa_flags.

Modified:
branches/gcc-8-branch/gcc/ChangeLog
branches/gcc-8-branch/gcc/config/aarch64/driver-aarch64.c
--
--- branches/gcc-8-branch/gcc/config/aarch64/driver-aarch64.c   2019/11/29 
15:02:35 278853
+++ branches/gcc-8-branch/gcc/config/aarch64/driver-aarch64.c   2019/11/29 
17:22:30 278854
@@ -179,7 +179,6 @@
   unsigned int variants[2] = { ALL_VARIANTS, ALL_VARIANTS };
   unsigned int n_variants = 0;
   bool processed_exts = false;
-  const char *ext_string = "";
   unsigned long extension_flags = 0;
   unsigned long default_flags = 0;
 
@@ -357,11 +356,12 @@
   if (tune)
 return res;
 
-  ext_string
-= aarch64_get_extension_string_for_isa_flags (extension_flags,
- default_flags).c_str ();
-
-  res = concat (res, ext_string, NULL);
+  {
+std::string extension
+  = aarch64_get_extension_string_for_isa_flags (extension_flags,
+   default_flags);
+res = concat (res, extension.c_str (), NULL);
+  }
 
   return res;


[PATCH] [libiberty] Fix write buffer overflow in cplus_demangle

2019-11-29 Thread Tim Rühsen
* cplus-dem.c (ada_demangle): Correctly calculate the demangled
  size by using two passes.

Fixes #92453
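
To illustrate the two-pass idea outside the demangler, here is a minimal sketch: the same loop runs twice, first only counting output characters and then writing into a buffer of exactly the counted size. The rewrite rule is hypothetical ("__" becomes "." and '&' becomes "and") and is not the Ada demangling grammar; rewrite_two_pass is an illustrative name.

```cpp
#include <cstddef>
#include <string>

// Hypothetical rewrite: "__" shrinks to "." and '&' grows to "and", so
// the output length cannot be bounded by a simple function of the input.
std::string
rewrite_two_pass (const std::string &in)
{
  std::string out;
  std::size_t len = 0;

  for (int run = 0; run <= 1; run++)
    {
      if (run == 1)
	out.resize (len);	// Exact size measured by run 0.

      std::size_t d = 0;	// Output position (run 1) or count (run 0).
      for (std::size_t p = 0; p < in.size (); )
	{
	  const char *piece;
	  std::size_t piece_len;

	  if (in.compare (p, 2, "__") == 0)
	    {
	      piece = ".";
	      piece_len = 1;
	      p += 2;
	    }
	  else if (in[p] == '&')
	    {
	      piece = "and";
	      piece_len = 3;
	      p += 1;
	    }
	  else
	    {
	      piece = &in[p];
	      piece_len = 1;
	      p += 1;
	    }

	  // Only the second run touches the buffer.
	  if (run == 1)
	    out.replace (d, piece_len, piece, piece_len);
	  d += piece_len;
	}
      len = d;
    }
  return out;
}
```

Because both runs share one loop body, the size computation cannot drift out of sync with the writes, which is the property a hand-estimated upper bound lacks.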
---
 libiberty/ChangeLog   |   5 +
 libiberty/cplus-dem.c | 408 +++---
 2 files changed, 226 insertions(+), 187 deletions(-)

diff --git a/libiberty/ChangeLog b/libiberty/ChangeLog
index 95cb1525f2..085b7b3fbf 100644
--- a/libiberty/ChangeLog
+++ b/libiberty/ChangeLog
@@ -1,3 +1,8 @@
+2019-11-16  Tim Ruehsen  
+
+   * cplus-dem.c (ada_demangle): Correctly calculate the demangled
+   size by using two passes. Fixes #92453.
+
 2019-08-08  Martin Liska  

PR bootstrap/91352
diff --git a/libiberty/cplus-dem.c b/libiberty/cplus-dem.c
index a39e2bf2ed..be76e691b0 100644
--- a/libiberty/cplus-dem.c
+++ b/libiberty/cplus-dem.c
@@ -230,7 +230,7 @@ rust_demangle (const char *mangled, int options)
 char *
 ada_demangle (const char *mangled, int option ATTRIBUTE_UNUSED)
 {
-  int len0;
+  int len0 = 0, run;
   const char* p;
   char *d;
   char *demangled = NULL;
@@ -243,236 +243,270 @@ ada_demangle (const char *mangled, int option 
ATTRIBUTE_UNUSED)
   if (!ISLOWER (mangled[0]))
 goto unknown;

-  /* Most of the demangling will trivially remove chars.  Operator names
- may add one char but because they are always preceeded by '__' which is
- replaced by '.', they eventually never expand the size.
- A few special names such as '___elabs' add a few chars (at most 7), but
- they occur only once.  */
-  len0 = strlen (mangled) + 7 + 1;
-  demangled = XNEWVEC (char, len0);
-
-  d = demangled;
-  p = mangled;
-  while (1)
+  for (run = 0; run <= 1; run++)
 {
-  /* An entity names is expected.  */
-  if (ISLOWER (*p))
+  if (run == 1)
 {
-  /* An identifier, which is always lower case.  */
-  do
-*d++ = *p++;
-  while (ISLOWER(*p) || ISDIGIT (*p)
- || (p[0] == '_' && (ISLOWER (p[1]) || ISDIGIT (p[1];
+  demangled = XNEWVEC (char, len0 + 1);
 }
-  else if (p[0] == 'O')
+
+  p = mangled;
+  d = demangled;
+
+  while (1)
 {
-  /* An operator name.  */
-  static const char * const operators[][2] =
-{{"Oabs", "abs"},  {"Oand", "and"},{"Omod", "mod"},
- {"Onot", "not"},  {"Oor", "or"},  {"Orem", "rem"},
- {"Oxor", "xor"},  {"Oeq", "="},   {"One", "/="},
- {"Olt", "<"}, {"Ole", "<="},  {"Ogt", ">"},
- {"Oge", ">="},{"Oadd", "+"},  {"Osubtract", "-"},
- {"Oconcat", "&"}, {"Omultiply", "*"}, {"Odivide", "/"},
- {"Oexpon", "**"}, {NULL, NULL}};
-  int k;
-
-  for (k = 0; operators[k][0] != NULL; k++)
+  /* An entity names is expected.  */
+  if (ISLOWER (*p))
 {
-  size_t slen = strlen (operators[k][0]);
-  if (strncmp (p, operators[k][0], slen) == 0)
+  /* An identifier, which is always lower case.  */
+  do
+run ? (*d++ = *p++) : (p++, len0++);
+  while (ISLOWER (*p) || ISDIGIT (*p)
+ || (p[0] == '_' && (ISLOWER (p[1]) || ISDIGIT (p[1];
+}
+  else if (p[0] == 'O')
+{
+  /* An operator name.  */
+  static const char * const operators[][2] ={
+{"Oabs", "abs"},
+{"Oand", "and"},
+{"Omod", "mod"},
+{"Onot", "not"},
+{"Oor", "or"},
+{"Orem", "rem"},
+{"Oxor", "xor"},
+{"Oeq", "="},
+{"One", "/="},
+{"Olt", "<"},
+{"Ole", "<="},
+{"Ogt", ">"},
+{"Oge", ">="},
+{"Oadd", "+"},
+{"Osubtract", "-"},
+{"Oconcat", "&"},
+{"Omultiply", "*"},
+{"Odivide", "/"},
+{"Oexpon", "**"},
+{NULL, NULL}};
+  int k;
+
+  for (k = 0; operators[k][0] != NULL; k++)
 {
-  p += slen;
-  slen = strlen (operators[k][1]);
-  *d++ = '"';
-  memcpy (d, operators[k][1], slen);
-  d += slen;
-  *d++ = '"';
-  break;
+  size_t slen = strlen (operators[k][0]);
+  if (strncmp (p, operators[k][0], slen) == 0)
+{
+  p += slen;
+  slen = strlen (operators[k][1]);
+  if (run)
+{
+  *d++ = '"';
+  memcpy (d, operators[k][1], slen);
+  d += slen;
+  *d++ = '"';
+}
+  else
+len0 += slen + 2;
+  

Re: [PATCH 0/4]: C++ P1423R3 char8_t remediation implementation

2019-11-29 Thread Jonathan Wakely

On 29/11/19 17:45 +, Jonathan Wakely wrote:

On 15/09/19 15:39 -0400, Tom Honermann wrote:
This series of patches provides an implementation of the changes for 
C++ proposal P1423R3 [1].


These changes do not impact default libstdc++ behavior for C++17 and 
earlier; they are only active for C++2a or when the -fchar8_t option 
is specified.


Tested x86_64-linux.

Patch 1: Decouple constraints for u8path from path constructors.
Patch 2: Update __cpp_lib_char8_t feature test macro value, add 
deleted operators, update u8path.

Patch 3: Updates to existing tests.
Patch 4: New tests.

Tom.

[1]: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1423r3.html


It took a while, but I've committed these four patches, with just some
minor whitespace changes and changelog tweaks.


Running the new tests revealed a latent bug on Windows, where
experimental::filesystem::u8path(const Source&) assumed the input
was an iterator over an NTCTS. That worked for a const char* but not a
std::string or experimental::string_view.

The attached patch fixes that (and simplifies the #if and if-constexpr
conditions for Windows) but there's a remaining bug. Constructing a
experimental::filesystem::path from a char8_t string doesn't do the
right thing on Windows, so these cases fails:

 fs::path p(u8"\xf0\x9d\x84\x9e");
 VERIFY( p.u8string() == u8"\U0001D11E" );

 p = fs::u8path(u8"\xf0\x9d\x84\x9e");
 VERIFY( p.u8string() == u8"\U0001D11E" );

It works correctly for std::filesystem::path, just not the TS version.

I plan to commit the attached patch after some more testing. I'll
backport something like it too.


commit f9ff56e4624c34be0615aab62ed4a2ce559dd567
Author: Jonathan Wakely 
Date:   Fri Nov 29 19:36:39 2019 +

libstdc++: Fix experimental::filesystem::u8path(const Source&) for Windows

This function failed to compile when called with a std::string.

* include/bits/fs_path.h (u8path(InputIterator, InputIterator))
(u8path(const Source&)) [_GLIBCXX_FILESYSTEM_IS_WINDOWS]: Simplify
conditions.
* include/experimental/bits/fs_path.h (__u8path(const Source&, char)):
Add overloads for std::string and types convertible to std::string.

diff --git a/libstdc++-v3/include/bits/fs_path.h b/libstdc++-v3/include/bits/fs_path.h
index b129372447b..20ec42da57d 100644
--- a/libstdc++-v3/include/bits/fs_path.h
+++ b/libstdc++-v3/include/bits/fs_path.h
@@ -691,14 +691,8 @@ namespace __detail
 u8path(_InputIterator __first, _InputIterator __last)
 {
 #ifdef _GLIBCXX_FILESYSTEM_IS_WINDOWS
-#ifdef _GLIBCXX_USE_CHAR8_T
-  if constexpr (is_same_v<_CharT, char8_t>)
+  if constexpr (is_same_v<_CharT, char>)
 	{
-	  return path{ __first, __last };
-	}
-  else
-	{
-#endif
 	  // XXX This assumes native wide encoding is UTF-16.
 	  std::codecvt_utf8_utf16 __cvt;
 	  path::string_type __tmp;
@@ -710,16 +704,16 @@ namespace __detail
 	  else
 	{
 	  const std::string __u8str{__first, __last};
-	  const char* const __ptr = __u8str.data();
-	  if (__str_codecvt_in_all(__ptr, __ptr + __u8str.size(), __tmp, __cvt))
+	  const char* const __p = __u8str.data();
+	  if (__str_codecvt_in_all(__p, __p + __u8str.size(), __tmp, __cvt))
 		return path{ __tmp };
 	}
 	  _GLIBCXX_THROW_OR_ABORT(filesystem_error(
 	  "Cannot convert character sequence",
 	  std::make_error_code(errc::illegal_byte_sequence)));
-#ifdef _GLIBCXX_USE_CHAR8_T
 	}
-#endif
+  else
+	return path{ __first, __last };
 #else
   // This assumes native normal encoding is UTF-8.
   return path{ __first, __last };
@@ -737,14 +731,8 @@ namespace __detail
 u8path(const _Source& __source)
 {
 #ifdef _GLIBCXX_FILESYSTEM_IS_WINDOWS
-#ifdef _GLIBCXX_USE_CHAR8_T
-  if constexpr (is_same_v<_CharT, char8_t>)
+  if constexpr (is_same_v<_CharT, char>)
 	{
-	  return path{ __source };
-	}
-  else
-	{
-#endif
 	  if constexpr (is_convertible_v)
 	{
 	  const std::string_view __s = __source;
@@ -755,9 +743,9 @@ namespace __detail
 	  std::string __s = path::_S_string_from_iter(__source);
 	  return filesystem::u8path(__s.data(), __s.data() + __s.size());
 	}
-#ifdef _GLIBCXX_USE_CHAR8_T
 	}
-#endif
+  else
+	return path{ __source };
 #else
   return path{ __source };
 #endif
diff --git a/libstdc++-v3/include/experimental/bits/fs_path.h b/libstdc++-v3/include/experimental/bits/fs_path.h
index 91202e5b008..796458b37b0 100644
--- a/libstdc++-v3/include/experimental/bits/fs_path.h
+++ b/libstdc++-v3/include/experimental/bits/fs_path.h
@@ -644,8 +644,22 @@ namespace __detail
 
   /// Create a path from a UTF-8-encoded sequence of char
 #ifdef _GLIBCXX_FILESYSTEM_IS_WINDOWS
+  inline path
+  __u8path(const string& __s, char)
+  {
+return filesystem::u8path(__s.data(), __s.data() + __s.size());
+  }
+
   template
-inline path
+inline __enable_if_t::value, path>
+__u8path(const _Source& __source, char)
+  

[PATCH] Fix PR91003

2019-11-29 Thread Richard Biener


Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2019-11-29  Richard Biener  

PR tree-optimization/91003
* tree-vect-slp.c (vect_mask_constant_operand_p): Pass in the
operand number, avoid handling the non-condition operands of
COND_EXPRs as comparisons.
(vect_get_constant_vectors): Pass down the operand number.
(vect_get_slp_defs): Likewise.

* gfortran.dg/pr91003.f90: New testcase.

Index: gcc/tree-vect-slp.c
===
--- gcc/tree-vect-slp.c (revision 278841)
+++ gcc/tree-vect-slp.c (working copy)
@@ -3477,7 +3477,7 @@
 /* Return 1 if vector type STMT_VINFO is a boolean vector.  */
 
 static bool
-vect_mask_constant_operand_p (stmt_vec_info stmt_vinfo)
+vect_mask_constant_operand_p (stmt_vec_info stmt_vinfo, unsigned op_num)
 {
   enum tree_code code = gimple_expr_code (stmt_vinfo->stmt);
   tree op, vectype;
@@ -3502,9 +3502,17 @@
   tree cond = gimple_assign_rhs1 (stmt);
 
   if (TREE_CODE (cond) == SSA_NAME)
-   op = cond;
+   {
+ if (op_num > 0)
+   return VECTOR_BOOLEAN_TYPE_P (STMT_VINFO_VECTYPE (stmt_vinfo));
+ op = cond;
+   }
   else
-   op = TREE_OPERAND (cond, 0);
+   {
+ if (op_num > 1)
+   return VECTOR_BOOLEAN_TYPE_P (STMT_VINFO_VECTYPE (stmt_vinfo));
+ op = TREE_OPERAND (cond, 0);
+   }
 
   if (!vect_is_simple_use (op, stmt_vinfo->vinfo, &dt, &vectype))
gcc_unreachable ();
@@ -3635,9 +3643,10 @@
operands.  */
 
 static void
-vect_get_constant_vectors (slp_tree op_node, slp_tree slp_node,
+vect_get_constant_vectors (slp_tree slp_node, unsigned op_num,
vec *vec_oprnds)
 {
+  slp_tree op_node = SLP_TREE_CHILDREN (slp_node)[op_num];
   stmt_vec_info stmt_vinfo = SLP_TREE_SCALAR_STMTS (slp_node)[0];
   vec_info *vinfo = stmt_vinfo->vinfo;
   unsigned HOST_WIDE_INT nunits;
@@ -3659,7 +3668,7 @@
   /* Check if vector type is a boolean vector.  */
   tree stmt_vectype = STMT_VINFO_VECTYPE (stmt_vinfo);
   if (VECT_SCALAR_BOOLEAN_TYPE_P (TREE_TYPE (op))
-  && vect_mask_constant_operand_p (stmt_vinfo))
+  && vect_mask_constant_operand_p (stmt_vinfo, op_num))
 vector_type = truth_type_for (stmt_vectype);
   else
 vector_type = get_vectype_for_scalar_type (vinfo, TREE_TYPE (op), op_node);
@@ -3892,7 +3901,7 @@
  vect_get_slp_vect_defs (child, &vec_defs);
}
   else
-   vect_get_constant_vectors (child, slp_node, &vec_defs);
+   vect_get_constant_vectors (slp_node, i, &vec_defs);
 
   vec_oprnds->quick_push (vec_defs);
 }
Index: gcc/testsuite/gfortran.dg/pr91003.f90
===
--- gcc/testsuite/gfortran.dg/pr91003.f90   (nonexistent)
+++ gcc/testsuite/gfortran.dg/pr91003.f90   (working copy)
@@ -0,0 +1,33 @@
+! { dg-do compile }
+! { dg-options "-Ofast" }
+  SUBROUTINE FOO(N, A, B, C, D, E, F, G)
+  COMPLEX A(*)
+  LOGICAL H
+  INTEGER G
+  REAL I, C, J, F, F1, F2, K, E, L, M, B, D
+  DO JC = 1, N
+K = F*REAL(A(JC))
+Z = F*AIMAG(A(JC))
+H = .FALSE.
+L = G
+IF(ABS(Z).LT.D .AND. I.GE. MAX(D, B*C, B*J)) THEN
+  H = .TRUE.
+  L = (D / F1) / MAX(D, F2*I)
+END IF
+IF(ABS(K).LT.D .AND. C.GE. MAX(D, B*I, B*J)) THEN
+  L = MAX(L, (D / F1) / MAX(D, F2*C))
+END IF
+IF(ABS(E).LT.D .AND. J.GE. MAX(D, B*C, B*I)) THEN
+  H = .TRUE.
+  L = MAX(L, (D / BNRM1) / MAX(D, BNRM2*J))
+END IF
+IF(H) THEN
+  M = (L*D)*MAX(ABS(K), ABS(Z), ABS(E))
+END IF
+IF(H) THEN
+  K = (L*REAL(A(JC)))*F
+  Z = (L*AIMAG(A(JC)))*F
+END IF
+A(JC) = CMPLX(K, Z)
+  END DO
+  END


*ping* [patch, fortran] Fix PR 91783

2019-11-29 Thread Thomas Koenig

Am 24.11.19 um 18:09 schrieb Thomas Koenig:

Hello world,

this patch fixes a 10 regression in dependency checking. The
approach is simple - if gfc_dep_resolver is handed references
with _data, remove that.

Regression-tested. OK for trunk?


Ping?


Fix libdecnumber handling of non-canonical BID significands (PR middle-end/91226)

2019-11-29 Thread Joseph Myers
As reported in bug 91226, the libdecnumber code used on the host to
interpret DFP values in the BID encoding fails, for _Decimal64 and
_Decimal128, to check for the case where a significand is too large
and so specified in IEEE 754 to be a non-canonical encoding of the
zero significand.  This patch adds the required handling of that case,
together with tests both using -O2 (testing this host code) and -O0
(testing libgcc code, which already worked before the patch); the
tests also cover _Decimal32, which already had the required check.

In the _Decimal128 case, where the code previously completely ignored
the case where the first four bits of the combination field are 1100,
1101 or 1110, the logic for determining the correct quantum exponent
in that case is also newly added by this patch, so tests are added for
that as well (again, libgcc already handled it correctly when the
conversion was done at runtime rather than at compile time).
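
As a rough illustration of the _Decimal64 rule described above (this is a
sketch, not the libdecnumber code; the names Bid64Fields and decode_bid64 are
hypothetical), a finite BID64 value can be decoded and its significand
treated as zero when it exceeds 10^16 - 1:

```cpp
#include <cstdint>

// Hypothetical field container for illustration only.
struct Bid64Fields
{
  bool sign;
  int biased_exponent;
  std::uint64_t significand;
};

// Decode a finite BID-encoded decimal64 value.  Returns false for
// infinities and NaNs (bits 62..59 all set).
bool
decode_bid64(std::uint64_t bits, Bid64Fields& out)
{
  const std::uint64_t max_coeff = 9999999999999999ULL; // 10^16 - 1
  out.sign = (bits >> 63) != 0;
  if (((bits >> 59) & 0xF) == 0xF)
    return false;
  if (((bits >> 61) & 0x3) == 0x3)
    {
      // Large form: 10 exponent bits at 60..51, and an implicit "100"
      // prefix in front of the 51 stored significand bits.
      out.biased_exponent = static_cast<int>((bits >> 51) & 0x3FF);
      out.significand = (std::uint64_t{1} << 53)
                        | (bits & ((std::uint64_t{1} << 51) - 1));
    }
  else
    {
      // Small form: 10 exponent bits at 62..53, 53 significand bits.
      out.biased_exponent = static_cast<int>((bits >> 53) & 0x3FF);
      out.significand = bits & ((std::uint64_t{1} << 53) - 1);
    }
  // A significand above the maximum is a non-canonical encoding of zero.
  if (out.significand > max_coeff)
    out.significand = 0;
  return true;
}
```

Only the large form can hold a value above 10^16 - 1, since the small form
tops out at 2^53 - 1.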

Bootstrapped with no regressions for x86_64-pc-linux-gnu.  OK to
commit (to trunk)?  (Note 1: we don't have a maintainer for
libdecnumber.  Note 2: as a wrong-code fix, this could be considered
later for backporting to release branches if no problems appear with
it on trunk.  Note 3: presumably binutils-gdb will pick this up at
some point through a merge of libdecnumber from the GCC repository.)

libdecnumber:
2019-11-29  Joseph Myers  

PR middle-end/91226
* bid/bid2dpd_dpd2bid.c (_bid_to_dpd64): Handle non-canonical
significands.
(_bid_to_dpd128): Likewise.  Check for case where combination
field starts 1100, 1101 or 1110.

gcc/testsuite:
2019-11-29  Joseph Myers  

PR middle-end/91226
* gcc.dg/dfp/bid-non-canonical-d128-1.c,
gcc.dg/dfp/bid-non-canonical-d128-2.c,
gcc.dg/dfp/bid-non-canonical-d128-3.c,
gcc.dg/dfp/bid-non-canonical-d128-4.c,
gcc.dg/dfp/bid-non-canonical-d32-1.c,
gcc.dg/dfp/bid-non-canonical-d32-2.c,
gcc.dg/dfp/bid-non-canonical-d64-1.c,
gcc.dg/dfp/bid-non-canonical-d64-2.c: New tests.

Index: gcc/testsuite/gcc.dg/dfp/bid-non-canonical-d128-1.c
===
--- gcc/testsuite/gcc.dg/dfp/bid-non-canonical-d128-1.c (nonexistent)
+++ gcc/testsuite/gcc.dg/dfp/bid-non-canonical-d128-1.c (working copy)
@@ -0,0 +1,30 @@
+/* Test non-canonical BID significands: _Decimal128.  Bug 91226.  */
+/* { dg-do run { target { { i?86-*-* x86_64-*-* } && lp64 } } } */
+/* { dg-options "-std=gnu2x -O2" } */
+
+extern void abort (void);
+extern void exit (int);
+
+union u
+{
+  _Decimal128 d128;
+  unsigned __int128 u128;
+};
+
+#define U128(hi, lo) (((unsigned __int128) lo) \
+ | (((unsigned __int128) hi) << 64))
+
+int
+main (void)
+{
+  unsigned __int128 i = U128 (0x3041ed09bead87c0ULL, 0x378d8e640001ULL);
+  union u x;
+  _Decimal128 d128;
+  x.u128 = i;
+  d128 = x.d128;
+  volatile double d = d128;
+  if (d == 0)
+exit (0);
+  else
+abort ();
+}
Index: gcc/testsuite/gcc.dg/dfp/bid-non-canonical-d128-2.c
===
--- gcc/testsuite/gcc.dg/dfp/bid-non-canonical-d128-2.c (nonexistent)
+++ gcc/testsuite/gcc.dg/dfp/bid-non-canonical-d128-2.c (working copy)
@@ -0,0 +1,42 @@
+/* Test non-canonical BID significands: _Decimal128, case where
+   combination field starts 11.  Bug 91226.  */
+/* { dg-do run { target { { i?86-*-* x86_64-*-* } && lp64 } } } */
+/* { dg-options "-std=gnu2x -O2" } */
+
+extern void abort (void);
+extern void exit (int);
+
+union u
+{
+  _Decimal128 d128;
+  unsigned __int128 u128;
+};
+
+#define U128(hi, lo) (((unsigned __int128) lo) \
+ | (((unsigned __int128) hi) << 64))
+
+int
+main (void)
+{
+  unsigned __int128 i = U128 (0x6e79ULL, 0x1ULL);
+  union u x;
+  _Decimal128 d128;
+  x.u128 = i;
+  d128 = x.d128;
+  volatile double d = d128;
+  if (d != 0)
+abort ();
+  /* The above number should have quantum exponent 1234.  */
+  _Decimal128 t1233 = 0.e1233DL, t1234 = 0.e1234DL, t1235 = 0.e1235DL;
+  _Decimal128 dx;
+  dx = d128 + t1233;
+  if (__builtin_memcmp (&dx, &t1233, 16) != 0)
+abort ();
+  dx = d128 + t1234;
+  if (__builtin_memcmp (&dx, &t1234, 16) != 0)
+abort ();
+  dx = d128 + t1235;
+  if (__builtin_memcmp (&dx, &t1234, 16) != 0)
+abort ();
+  exit (0);
+}
Index: gcc/testsuite/gcc.dg/dfp/bid-non-canonical-d128-3.c
===
--- gcc/testsuite/gcc.dg/dfp/bid-non-canonical-d128-3.c (nonexistent)
+++ gcc/testsuite/gcc.dg/dfp/bid-non-canonical-d128-3.c (working copy)
@@ -0,0 +1,5 @@
+/* Test non-canonical BID significands: _Decimal128.  Bug 91226.  */
+/* { dg-do run { target { { i?86-*-* x86_64-*-* } && lp64 } } } */
+/* { dg-options "-std=gnu2x -O0" } */
+
+#include "bid-non-canonical-d128-1.c"
Index: gcc/testsuite/gcc.dg/dfp/bid-non-canonical-d128-4.c
=

Re: [PATCH v2 1/2] driver: Do not warn about ineffective `-x' option if no inputs were given

2019-11-29 Thread Joseph Myers
On Fri, 29 Nov 2019, Maciej W. Rozycki wrote:

> Fix an issue with the GCC driver and the `-x' option where a warning is 
> issued in an invocation like:
> 
> $ riscv64-linux-gnu-gcc -print-multi-directory -x c++
> riscv64-linux-gnu-gcc: warning: '-x c++' after last input file has no effect
> lib64/lp64d
> $ 
> 
> where no inputs were given and hence the use of `-x' is irrelevant.  
> The statement printed is also untrue as the `-x' does not come after the 
> last input file given that none was given.  Do not print it then if no 
> inputs were supplied.
> 
>   gcc/
>   * gcc.c (process_command): Only warn about an ineffective `-x' 
>   option if any input files have actually been supplied.

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH v2][MSP430] Add msp430-elfbare target

2019-11-29 Thread Jozef Lawrynowicz
The attached patch consolidates some configuration tweaks I previously submitted
as modifications to the msp430-elf target into a new target called
"msp430-elfbare" i.e. "bare-metal".

MSP430: Disable TM clone registry by default
  https://gcc.gnu.org/ml/gcc-patches/2019-11/msg00550.html
MSP430: Disable __cxa_atexit
  https://gcc.gnu.org/ml/gcc-patches/2019-11/msg00552.html

The patches tweak the CRT code to achieve the smallest possible code size, 
and rely on some additional generic tweaks to crtstuff.c.

I submitted these tweaks a while ago but didn't get any feedback; even if
they are acceptable, I suspect it is too late for GCC 10 anyway:
libgcc: Dont define __do_global_dtors_aux if it will be empty
  https://gcc.gnu.org/ml/gcc-patches/2019-11/msg00417.html
libgcc: Implement TARGET_LIBGCC_REMOVE_DSO_HANDLE
  https://gcc.gnu.org/ml/gcc-patches/2019-11/msg00418.html

(The second one is a bit hacky, but without some way of removing the
__dso_handle declaration, we end up with 150 bytes of unnecessary code in some
programs.)

So for this patch crtstuff.c was copied to the msp430 subdirectory and the
changes were made to that target specific version.

Tiny program size can now be achieved by configuring gcc for msp430-elfbare.

For example in an "empty main" program which loops forever:
  msp430-elfbare @ -Os:
     text    data     bss     dec     hex filename
       14       0       0      14       e a.out
  msp430-elf @ -Os:
     text    data     bss     dec     hex filename
      270       6       2     278     116 a.out

Successfully regtested msp430-elfbare vs msp430-elf.

Ok to apply?

P.S. This patch relies on the -fno-exceptions multilib patch submitted here:
https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02523.html

P.P.S. This requires some minor configury tweaks to Newlib and GDB of the form:
-  msp430*-*-elf)
+  msp430-*-elf*)
I'll apply these changes if the patch is accepted.
From cff4611855d838315e793d45256de5fc8eeefafe Mon Sep 17 00:00:00 2001
From: Jozef Lawrynowicz 
Date: Mon, 25 Nov 2019 19:41:05 +
Subject: [PATCH] MSP430: Add new msp430-elfbare target

contrib/ChangeLog:

2019-11-29  Jozef Lawrynowicz  

	* config-list.mk: Add msp430-elfbare.

gcc/ChangeLog:

2019-11-29  Jozef Lawrynowicz  

	* config.gcc: s/msp430*-*-*/msp430-*-*.
	Handle msp430-*-elfbare.
	* config/msp430/msp430-devices.c (TARGET_SUBDIR): Define.
	(_MSPMKSTR): Define.
	(__MSPMKSTR): Define.
	(rest_of_devices_path): Use TARGET_SUBDIR value in string.
	* config/msp430/msp430.c (msp430_option_override): Error if
	-fuse-cxa-atexit is used when it has been disabled at configure time.
	* config/msp430/t-msp430: Define TARGET_SUBDIR when building
	msp430-devices.o.
	* doc/install.texi: Document msp430-*-elf and msp430-*-elfbare.
	* doc/invoke.texi: Update documentation about which path devices.csv is
	searched for.

gcc/testsuite/ChangeLog:

2019-11-29  Jozef Lawrynowicz  

	* g++.dg/init/dso_handle1.C: Require cxa_atexit support.
	* g++.dg/init/dso_handle2.C: Likewise.
	* g++.dg/other/cxa-atexit1.C: Likewise.
	* gcc.target/msp430/msp430.exp: Update csv-using-installed.c test to
	handle msp430-elfbare configuration.

libgcc/ChangeLog:

2019-11-29  Jozef Lawrynowicz  

	* config.host: Use t-msp430-elfbare-crtstuff Makefile fragment when GCC
	is configured for the msp430-elfbare target.
	* config/msp430/msp430-elfbare-crtstuff.c: New file.
	* config/msp430/t-msp430: Remove Makefile rules for object files
	built from crtstuff.c
	* config/msp430/t-msp430-crtstuff: New file.
	* config/msp430/t-msp430-elfbare-crtstuff: New file.
	* configure: Regenerate.
	* configure.ac: Disable TM clone registry by default for
	msp430-elfbare.
---
 contrib/config-list.mk|   2 +-
 gcc/config.gcc|  14 +-
 gcc/config/msp430/msp430-devices.c|  16 +-
 gcc/config/msp430/msp430.c|  10 +
 gcc/config/msp430/t-msp430|   2 +-
 gcc/doc/install.texi  |  16 +-
 gcc/doc/invoke.texi   |   4 +-
 gcc/testsuite/g++.dg/init/dso_handle1.C   |   1 +
 gcc/testsuite/g++.dg/init/dso_handle2.C   |   1 +
 gcc/testsuite/g++.dg/other/cxa-atexit1.C  |   1 +
 gcc/testsuite/gcc.target/msp430/msp430.exp|   8 +-
 libgcc/config.host|  20 +-
 .../config/msp430/msp430-elfbare-crtstuff.c   | 761 ++
 libgcc/config/msp430/t-msp430 |   6 -
 libgcc/config/msp430/t-msp430-crtstuff|  29 +
 .../config/msp430/t-msp430-elfbare-crtstuff   |  43 +
 libgcc/configure  |   9 +
 libgcc/configure.ac   |   8 +
 18 files changed, 933 insertions(+), 18 deletions(-)
 create mode 100644 libgcc/config/msp430/msp430-elfbare-crtstuff.c
 create mode 100644 libgcc/config/msp430/t-msp430-crtstuff
 create mode 100644 libgcc/config/msp430/t-msp430-elfbare-crtstuff

diff --git a/contrib/config-list.mk b/contrib/c

Re: [PING][PATCH] doc: Correct `--enable-version-specific-runtime-libs' support information

2019-11-29 Thread Joseph Myers
On Fri, 29 Nov 2019, Maciej W. Rozycki wrote:

> On Wed, 20 Nov 2019, Maciej W. Rozycki wrote:
> 
> > The `--enable-version-specific-runtime-libs' configuration option is now 
> > supported throughout all of our target library subdirectories, so update 
> > installation documentation accordingly and also mention that the default 
> > for the option is `yes' for libada and `no' for the remaining libraries.
> 
>  Ping for:
> 
> 

This patch is OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [C] Add a target hook that allows targets to verify type usage

2019-11-29 Thread Joseph Myers
On Fri, 29 Nov 2019, Richard Sandiford wrote:

> Ping
> 
> Richard Sandiford  writes:
> > This patch adds a new target hook to check whether there are any
> > target-specific reasons why a type cannot be used in a certain
> > source-language context.  It works in a similar way to existing
> > hooks like TARGET_INVALID_CONVERSION and TARGET_INVALID_UNARY_OP.

This patch is OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH 0/4]: C++ P1423R3 char8_t remediation implementation

2019-11-29 Thread Jonathan Wakely

On 29/11/19 19:48 +, Jonathan Wakely wrote:

On 29/11/19 17:45 +, Jonathan Wakely wrote:

On 15/09/19 15:39 -0400, Tom Honermann wrote:
This series of patches provides an implementation of the changes 
for C++ proposal P1423R3 [1].


These changes do not impact default libstdc++ behavior for C++17 
and earlier; they are only active for C++2a or when the -fchar8_t 
option is specified.


Tested x86_64-linux.

Patch 1: Decouple constraints for u8path from path constructors.
Patch 2: Update __cpp_lib_char8_t feature test macro value, add 
deleted operators, update u8path.

Patch 3: Updates to existing tests.
Patch 4: New tests.

Tom.

[1]: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1423r3.html


It took a while, but I've committed these four patches, with just some
minor whitespace changes and changelog tweaks.


Running the new tests revealed a latent bug on Windows, where
experimental::filesystem::u8path(const Source&) assumed the input
was an iterator over a NTCTS. That worked for a const char* but not a
std::string or experimental::string_view.

The attached patch fixes that (and simplifies the #if and if-constexpr
conditions for Windows) but there's a remaining bug. Constructing an
experimental::filesystem::path from a char8_t string doesn't do the
right thing on Windows, so these cases fail:

fs::path p(u8"\xf0\x9d\x84\x9e");
VERIFY( p.u8string() == u8"\U0001D11E" );

p = fs::u8path(u8"\xf0\x9d\x84\x9e");
VERIFY( p.u8string() == u8"\U0001D11E" );

It works correctly for std::filesystem::path, just not the TS version.


I think this is the fix needed for the TS code:

--- a/libstdc++-v3/include/experimental/bits/fs_path.h
+++ b/libstdc++-v3/include/experimental/bits/fs_path.h
@@ -765,7 +765,14 @@ namespace __detail
  {
#ifdef _GLIBCXX_USE_CHAR8_T
   if constexpr (is_same<_CharT, char8_t>::value)
- return _S_wconvert((const char*)__f, (const char*)__l, true_type());
+ {
+   const char* __f2 = (const char*)__f;
+   const char* __l2 = (const char*)__l;
+   std::wstring __wstr;
+   std::codecvt_utf8_utf16 __wcvt;
+   if (__str_codecvt_in_all(__f2, __l2, __wstr, __wcvt))
+ return __wstr;
+ }
   else
#endif
 {

The current code uses std::codecvt but when
we know the input is UTF-8 encoded we should use codecvt_utf8_utf16
(which is what the C++17 code already does for char8_t input).

I'll add that to the patch I'm testing.
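
For reference, the conversion the failing tests depend on can be sketched by
hand. This is not the codecvt-based libstdc++ code; utf8_to_utf16 is a
hypothetical helper, and it assumes well-formed input apart from the basic
checks shown (it does not reject overlong forms or encoded surrogates):

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

// Hypothetical helper for illustration only; not part of libstdc++.
std::u16string
utf8_to_utf16(const std::string& in)
{
  std::u16string out;
  std::size_t i = 0;
  while (i < in.size())
    {
      const unsigned char lead = static_cast<unsigned char>(in[i]);
      std::uint32_t cp;
      std::size_t extra;
      if (lead < 0x80)                { cp = lead;        extra = 0; }
      else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
      else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
      else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
      else
        throw std::invalid_argument("invalid UTF-8 lead byte");
      if (in.size() - i <= extra)
        throw std::invalid_argument("truncated UTF-8 sequence");
      for (std::size_t k = 1; k <= extra; ++k)
        {
          const unsigned char cont = static_cast<unsigned char>(in[i + k]);
          if ((cont & 0xC0) != 0x80)
            throw std::invalid_argument("invalid UTF-8 continuation byte");
          cp = (cp << 6) | (cont & 0x3F);
        }
      i += extra + 1;
      if (cp < 0x10000)
        out.push_back(static_cast<char16_t>(cp));
      else
        {
          // Code points above the BMP become a surrogate pair.
          cp -= 0x10000;
          out.push_back(static_cast<char16_t>(0xD800 + (cp >> 10)));
          out.push_back(static_cast<char16_t>(0xDC00 + (cp & 0x3FF)));
        }
    }
  return out;
}
```

The bytes f0 9d 84 9e from the tests decode to U+1D11E, which needs two
UTF-16 code units, so a conversion that treats the input as anything other
than UTF-8 cannot round-trip it.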



Patch to fix PR92283

2019-11-29 Thread Vladimir Makarov

The following patch fixes

   https://gcc.gnu.org/bugzilla/show_bug.cgi?id=92283

The patch has no test because it is very hard to reproduce the PR and check 
the patch even on a specific GCC revision.


The patch was successfully bootstrapped and tested on x86-64.

Committed as r278865


Index: ChangeLog
===
--- ChangeLog	(revision 278864)
+++ ChangeLog	(working copy)
@@ -1,3 +1,9 @@
+2019-11-29  Vladimir Makarov  
+
+	PR rtl-optimization/92283
+	* lra.c (lra): Update reg notes after inheritance sub-pass and
+	before constraint sub-pass.
+
 2019-11-29  Richard Biener  
 
 	PR tree-optimization/91003
Index: lra.c
===
--- lra.c	(revision 278864)
+++ lra.c	(working copy)
@@ -2473,7 +2473,7 @@ lra (FILE *f)
 		 But don't remove dead insns or change global live
 		 info as we can undo inheritance transformations after
 		 inheritance pseudo assigning.  */
-	  lra_create_live_ranges (true, false);
+	  lra_create_live_ranges (true, !lra_simple_p);
 	  live_p = true;
 	  /* If we don't spill non-reload and non-inheritance
 		 pseudos, there is no sense to run memory-memory move
@@ -2514,6 +2514,11 @@ lra (FILE *f)
 		}
 	}
 	  while (fails_p);
+	  if (! live_p) {
+	/* We need the correct reg notes for work of constraint sub-pass.  */
+	lra_create_live_ranges (true, true);
+	live_p = true;
+	  }
 	}
   /* Don't clear optional reloads bitmap until all constraints are
 	 satisfied as we need to differ them from regular reloads.  */


Re: [PATCH] Fix attribute((section)) for templates

2019-11-29 Thread Strager Neds
I discovered an issue with my patch. I need help resolving it.

Take the following code for example:

template<typename>
struct s
{
  static inline int __attribute__((section(".testsection"))) var = 1;
};

struct public_symbol {};

namespace {
struct internal_symbol {};
}

int *
f(bool which)
{
  if (which)
    return &s<public_symbol>::var;
  else
    return &s<internal_symbol>::var;
}

With my patch, compiling this code fails with the following error:

example.C:4:62: error: 's<{anonymous}::internal_symbol>::var'
causes a section type conflict with 's<public_symbol>::var'
example.C:4:62: note: 's<public_symbol>::var' was declared here

The error is reported by gcc/varasm.c (get_section) because
s<{anonymous}::internal_symbol>::var has the following section flags:

SECTION_NAMED | SECTION_NOTYPE | SECTION_WRITE
(flags == 0x280200)

but s<public_symbol>::var has the following section flags:

SECTION_NAMED | SECTION_LINKONCE | SECTION_WRITE
(sect->common.flags == 0x200a00)

and a section can't have both of these flags at the same time. In
particular, SECTION_LINKONCE conflicts with not-SECTION_LINKONCE.
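
To make the mismatch concrete, here is a small sketch. The constant values
are assumptions chosen to be consistent with the two hexadecimal values
quoted above, and the comparison is a simplification, not the actual
get_section logic:

```cpp
#include <cstdint>

// Assumed flag values, consistent with 0x280200 and 0x200a00 above.
constexpr std::uint32_t SECTION_WRITE    = 0x000200;
constexpr std::uint32_t SECTION_LINKONCE = 0x000800;
constexpr std::uint32_t SECTION_NOTYPE   = 0x080000;
constexpr std::uint32_t SECTION_NAMED    = 0x200000;

// Simplified model: two requests for the same named section conflict
// when their flag sets differ at all.
constexpr bool
section_flags_conflict(std::uint32_t existing, std::uint32_t requested)
{
  return existing != requested;
}

// The bits present in exactly one of the two flag sets.
constexpr std::uint32_t
differing_bits(std::uint32_t existing, std::uint32_t requested)
{
  return existing ^ requested;
}
```

Under these assumed values the two flag sets differ in the SECTION_LINKONCE
and SECTION_NOTYPE bits.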

How can we solve this problem? Some ideas (none of which I like):

* Disallow this code, possibly with an improved diagnostic.
* Silently make the section SECTION_LINKONCE if there is a conflict.
* Silently make the section not-SECTION_LINKONCE if there is a conflict.
* Silently make the section not-SECTION_LINKONCE unconditionally (even
  if there is no conflict).
* Make two sections with the same name, one with SECTION_LINKONCE and
  one with not-SECTION_LINKONCE. This is what Clang does. Clang seems to
  Do What I Mean for ELF; the .o file has one COMDAT section and another
  non-COMDAT section.
* Extend attribute((section())) to allow specifying different section
  names for different section flags.

Thanks in advance for your feedback!

On Fri, Nov 22, 2019 at 12:09 PM Strager Neds  wrote:
>
> Here's a revised version of the patch. This revised version is ready for 
> review.
>
> When GCC encounters __attribute__((section("foo"))) on a function or
> variable declaration, it adds an entry in the symbol table for the
> declaration to remember its desired section. The symbol table is
> separate from the declaration's tree node.
>
> When instantiating a template, GCC copies the tree of the template
> recursively. GCC does *not* copy symbol table entries when copying
> function and variable declarations.
>
> Combined, these two details mean that section attributes on function and
> variable declarations in a template have no effect.
>
> Fix this issue by copying the section name (in the symbol table) when
> copying a tree node for template instantiation. This addresses PR
> c++/70435 and PR c++/88061.
>
> Originally, I tried copying section names in copy_node. This caused
> problems for reasons I do not understand. The patch in this email
> avoids those problems by copying section names only in the callers of
> copy_node relevant to template instantiation.
>
> Known unknowns (questions for the audience):
>
> * For all targets which support the section attribute, are functions and
>   variables deduplicated (comdat) when using a custom section? It seems
>   to work with GNU ELF on Linux and with Mach-O on macOS (i.e. I end up
>   with only one copy), but I'm unsure about other platforms. Richard
>   Biener raised this concern in PR c++/88061. Is this something I should
>   worry much about?
> * Do we need to check or copy implicit_section or alias? I don't know
>   what these properties mean, but they look related to section_name.
>
> Note: This patch depends on the following unmerged patches (but could be
> changed to not depend on them):
>
> * Simplify testing symbol sections:
>   https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02062.html
> * Fix attribute((section)) with -flto:
>   https://gcc.gnu.org/ml/gcc-patches/2019-11/msg02063.html
> * Refactor copying decl section names:
>   https://gcc.gnu.org/ml/gcc-patches/2019-11/msg00979.html
>
> Testing:
>
> * Bootstrap on x86_64-linux-gnu with --disable-multilib
>   --enable-checking=release --enable-languages=c,c++. Observe no change
>   in test results (aside from the added tests).
> * Bootstrap on macOS x86_64-apple-darwin16.7.0 with --disable-multilib
>   --enable-checking=release --enable-languages=c,c++. Observe no change
>   in test results (aside from the added tests).
>
> 2019-11-20  Matthew Glazar 
>
> * gcc/cp/pt.c (tsubst_function_decl): Copy the section name from the
> original function.
> (tsubst_decl): Copy the section name from the original variable (if the
> variable is global).
> ---
>  gcc/cp/pt.c   |  5 +++
>  ...section-class-template-function-template.C | 25 +++
>  ...ass-template-specialized-static-variable.C | 29 +
>  ...template-static-inline-variable-template.C | 19 
>  ...on-class-template-static-inline-variable.C | 19 
>  .../section-class-template-static-variable.C  | 20 +
>  ...on-

[C++ PATCH] Fix ICE in build_new_op_1 (PR c++/92705)

2019-11-29 Thread Jakub Jelinek
Hi!

The changed code in build_new_op_1 ICEs on the following testcase,
because conv is user_conv_p with kind == ck_ambig, for which next_conversion
returns NULL.  It seems in other spots where for user_conv_p we are walking
the conversion chain we also don't assume there must be ck_user, so this
patch just uses the first conv if ck_user is not found (so that the previous
diagnostic about the ambiguous conversion is emitted).
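
The lookup described above can be sketched outside the compiler as follows.
The types are a reduced, hypothetical model of the C++ front end's conversion
chain (only four kinds, and next_conversion is assumed to return null for
ck_identity and ck_ambig); user_conversion_or_first is not a GCC function:

```cpp
// Reduced model of the conversion chain, for illustration only.
enum conversion_kind { ck_identity, ck_std, ck_user, ck_ambig };

struct conversion
{
  conversion_kind kind;
  conversion *next;
};

// Model of next_conversion: terminal kinds have no successor.
conversion *
next_conversion(conversion *conv)
{
  if (conv->kind == ck_identity || conv->kind == ck_ambig)
    return nullptr;
  return conv->next;
}

// Walk the chain looking for ck_user; fall back to the first
// conversion when the chain ends without one.
conversion *
user_conversion_or_first(conversion *conv)
{
  for (conversion *t = conv; t; t = next_conversion(t))
    if (t->kind == ck_user)
      return t;
  return conv;
}
```

The old loop form, while (conv->kind != ck_user), would dereference a null
pointer in this model as soon as it stepped past a ck_ambig node.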

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2019-11-29  Jakub Jelinek  

PR c++/92705
* call.c (build_new_op_1): For user_conv_p, if there is no
ck_user conversion, use the first one.

* g++.dg/conversion/ambig4.C: New test.

--- gcc/cp/call.c.jj	2019-11-29 12:50:09.664608879 +0100
+++ gcc/cp/call.c   2019-11-29 14:09:54.311718859 +0100
@@ -6370,8 +6370,12 @@ build_new_op_1 (const op_location_t &loc
  conv = cand->convs[0];
  if (conv->user_conv_p)
{
- while (conv->kind != ck_user)
-   conv = next_conversion (conv);
+ for (conversion *t = conv; t; t = next_conversion (t))
+   if (t->kind == ck_user)
+ {
+   conv = t;
+   break;
+ }
  arg1 = convert_like (conv, arg1, complain);
}
 
@@ -6380,8 +6384,12 @@ build_new_op_1 (const op_location_t &loc
  conv = cand->convs[1];
  if (conv->user_conv_p)
{
- while (conv->kind != ck_user)
-   conv = next_conversion (conv);
+ for (conversion *t = conv; t; t = next_conversion (t))
+   if (t->kind == ck_user)
+ {
+   conv = t;
+   break;
+ }
  arg2 = convert_like (conv, arg2, complain);
}
}
@@ -6391,8 +6399,12 @@ build_new_op_1 (const op_location_t &loc
  conv = cand->convs[2];
  if (conv->user_conv_p)
{
- while (conv->kind != ck_user)
-   conv = next_conversion (conv);
+ for (conversion *t = conv; t; t = next_conversion (t))
+   if (t->kind == ck_user)
+ {
+   conv = t;
+   break;
+ }
  arg3 = convert_like (conv, arg3, complain);
}
}
--- gcc/testsuite/g++.dg/conversion/ambig4.C.jj	2019-11-29 14:11:35.239183848 +0100
+++ gcc/testsuite/g++.dg/conversion/ambig4.C	2019-11-29 14:11:07.006613238 +0100
@@ -0,0 +1,14 @@
+// PR c++/92705
+// { dg-do compile }
+
+struct A {};
+struct B {};
+struct C { operator B * (); }; // { dg-message "candidate" }
+struct D { operator B * (); }; // { dg-message "candidate" }
+struct E : C, D { operator A * (); };
+
+void
+foo (E e, int B::* pmf)
+{
+  int i = e->*pmf; // { dg-error "is ambiguous" }
+}

Jakub



[C++ PATCH] Fix nsdmi handling for bitfields (PR c++/92732)

2019-11-29 Thread Jakub Jelinek
Hi!

As the second testcase shows, we shouldn't be calling convert_for_*
with TREE_TYPE (decl) for bitfields, we need DECL_BIT_FIELD_TYPE
in that case instead (unlowered_expr_type doesn't work here,
as that wants a COMPONENT_REF which we don't have).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2019-11-29  Jakub Jelinek  

PR c++/92732
* typeck2.c (digest_nsdmi_init): For bitfields, use
DECL_BIT_FIELD_TYPE instead of TREE_TYPE.

* g++.dg/cpp2a/bitfield3.C: Don't expect narrowing conversion
warnings.
* g++.dg/cpp2a/bitfield4.C: New test.

--- gcc/cp/typeck2.c.jj 2019-11-25 22:44:24.911346688 +0100
+++ gcc/cp/typeck2.c	2019-11-29 21:48:58.825713516 +0100
@@ -1335,6 +1335,9 @@ digest_nsdmi_init (tree decl, tree init,
   gcc_assert (TREE_CODE (decl) == FIELD_DECL);
 
   tree type = TREE_TYPE (decl);
+  if (DECL_BIT_FIELD_TYPE (decl))
+type = cp_build_qualified_type (DECL_BIT_FIELD_TYPE (decl),
+   cp_type_quals (type));
   int flags = LOOKUP_IMPLICIT;
   if (DIRECT_LIST_INIT_P (init))
 {
--- gcc/testsuite/g++.dg/cpp2a/bitfield3.C.jj	2017-09-29 19:49:17.684827064 +0200
+++ gcc/testsuite/g++.dg/cpp2a/bitfield3.C	2019-11-29 21:52:56.636122855 +0100
@@ -15,11 +15,9 @@ const int b = 0;
 struct S {
   int c : 5 = 2 * a;   // { dg-warning "default member initializers for bit-fields only available with" "" { target c++17_down } }
   int d : 6 { c + a }; // { dg-warning "default member initializers for bit-fields only available with" "" { target c++17_down } }
-   // { dg-warning "narrowing conversion of" "" { target *-*-* } .-1 }
   int e : true ? 7 : a = 3;
   int f : (true ? 8 : b) = d + a;  // { dg-warning "default member initializers for bit-fields only available with" "" { target c++17_down } }
   int g : (true ? 9 : b) { f + a };// { dg-warning "default member initializers for bit-fields only available with" "" { target c++17_down } }
-   // { dg-warning "narrowing conversion of" "" { target *-*-* } .-1 }
   int h : 1 || new int { 0 };
   int i = g + a;
 };
@@ -28,11 +26,9 @@ template 
 struct U {
   int j : W = 3 * a;   // { dg-warning "default member initializers for bit-fields only available with" "" { target c++17_down } }
   int k : W { j + a }; // { dg-warning "default member initializers for bit-fields only available with" "" { target c++17_down } }
-   // { dg-warning "narrowing conversion of" "" { target *-*-* } .-1 }
   int l : V ? 7 : a = 3;
   int m : (V ? W : b) = k + a; // { dg-warning "default member initializers for bit-fields only available with" "" { target c++17_down } }
   int n : (V ? W : b) { m + a };   // { dg-warning "default member initializers for bit-fields only available with" "" { target c++17_down } }
-   // { dg-warning "narrowing conversion of" "" { target *-*-* } .-1 }
   int o : 1 || new int { 0 };
   int p = n + a;
 };
--- gcc/testsuite/g++.dg/cpp2a/bitfield4.C.jj   2019-11-29 21:48:28.752167687 +0100
+++ gcc/testsuite/g++.dg/cpp2a/bitfield4.C  2019-11-29 21:52:40.548365714 +0100
@@ -0,0 +1,12 @@
+// PR c++/92732
+// { dg-do compile { target c++17 } }
+// { dg-options "" }
+
+enum class byte : unsigned char { };
+using uint8_t = unsigned char;
+
+struct T
+{
+  byte a : 2 = byte{0};// { dg-warning "default member initializers for bit-fields only available with" "" { target c++17_down } }
+  uint8_t b : 2 = 0;   // { dg-warning "default member initializers for bit-fields only available with" "" { target c++17_down } }
+} t;

Jakub



[PATCH] Improve A*B+-A -> A*(B+-1) and A+-A*B -> A*(1+-B) match.pd optimization

2019-11-29 Thread Jakub Jelinek
Hi!

As discussed in the PR, we can't optimize e.g.
  int a = t - 1;
  int b = a * v;
  return b + v;
into return t * v; for signed non-wrapv arithmetic.  This can be done
by the match.pd (A * B) +- A -> (B +- 1) * A or
A +- (A * B) -> (1 +- B) * A canonicalizations.  Being a lazy guy,
I wrote the attached brute-force proglet to look for all the cases (for
signed char) where there is no UB in the original and the transformation
would introduce UB.  For the simple cases with just A and B, rather than
A, B and C, the only problematic cases for signed char are:
A*B+A -> (B+1)*A A==-1 && B==127
A*B+A -> (B+1)*A A==0 && B==127
A*B-A -> (B-1)*A A==0 && B==-128
A-A*B -> (1-B)*A A==-1 && B==-127
A-A*B -> (1-B)*A A==0 && B==-128
A-A*B -> (1-B)*A A==0 && B==-127
The current patterns already use VRP (tree_expr_nonzero_p and
expr_not_equal_to) to do the transformation only if A is known not to be 0
or -1.  But as the above problematic cases show, for A*B-A the -1
case is actually not problematic (the transformation doesn't introduce UB;
if A is -1, -1*B+1 has UB only for minimum and minimum+1, and the
replacement (B-1)*-1 also has UB for those two cases only), and even when we
know nothing about A's value range, if we know something about B's value
range, we could still optimize.  So, for the
  int a = t - 1;
  int b = a * v;
  return b + v;
case, a = t - 1 has a value range that doesn't include the maximum, so
we can conclude it is OK to transform it into return ((t - 1) + 1) * v
and thus t * v.
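
The attached proglet is not reproduced in this message. A minimal sketch of the same kind of brute-force search, under stated assumptions, might look like the following. The names (Form, fits, original_ok, transformed_ok, newly_undefined) are hypothetical, and the sketch models arithmetic carried out at signed char precision by doing the math in int and range-checking every intermediate result against [-128, 127].

```cpp
#include <vector>

// Value range of the modelled type (signed char on typical targets).
constexpr int kMin = -128;
constexpr int kMax = 127;

// The three forms listed in the mail: A*B+A, A*B-A and A-A*B.
enum class Form { MulPlusA, MulMinusA, AMinusMul };

struct Case { int a; int b; };

// True if x is representable in the modelled type.
constexpr bool fits(int x) { return x >= kMin && x <= kMax; }

// True if every intermediate result of the original expression fits,
// i.e. the original would not overflow at the modelled precision.
bool original_ok(Form f, int a, int b) {
  const int prod = a * b;  // computed in int, so this cannot overflow here
  if (!fits(prod)) return false;
  switch (f) {
    case Form::MulPlusA:  return fits(prod + a);
    case Form::MulMinusA: return fits(prod - a);
    case Form::AMinusMul: return fits(a - prod);
  }
  return false;
}

// True if every intermediate result of the canonicalized expression fits:
// (B+1)*A, (B-1)*A or (1-B)*A respectively.
bool transformed_ok(Form f, int a, int b) {
  int factor = 0;
  switch (f) {
    case Form::MulPlusA:  factor = b + 1; break;
    case Form::MulMinusA: factor = b - 1; break;
    case Form::AMinusMul: factor = 1 - b; break;
  }
  return fits(factor) && fits(factor * a);
}

// All (A, B) pairs where the original is fine but the replacement overflows.
std::vector<Case> newly_undefined(Form f) {
  std::vector<Case> out;
  for (int a = kMin; a <= kMax; ++a)
    for (int b = kMin; b <= kMax; ++b)
      if (original_ok(f, a, b) && !transformed_ok(f, a, b))
        out.push_back({a, b});
  return out;
}
```

Under this model, newly_undefined(Form::MulPlusA) yields the two A*B+A cases listed above, Form::MulMinusA the single A*B-A case, and Form::AMinusMul the three A-A*B cases.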

Unfortunately, the patch "broke" a few loop_versioning_*.f90 tests (CCing
author), where there are small differences in the code reaching the
lversion pass, e.g. in loop_versioning_1.f90 (f1):
-  # RANGE ~[0, 0]
-  _4 = iftmp.11_6 * S.4_19;
-  _11 = _4 - iftmp.11_6;
+  # RANGE [0, 9223372036854775806] NONZERO 9223372036854775807
+  _4 = S.4_19 + -1;
+  _11 = _4 * iftmp.11_6;
and the lversion pass then emits just one message instead of two, but in the
end the assembly is identical.  In loop_versioning_6.f90 (though with -m32
only), the code before the lversion pass actually looks better in f1:
-  # i_35 = PHI <1(8), i_28(9)>
-  _9 = iftmp.33_15 * i_35;
-  _10 = _9 * 2;
-  _21 = _10 - iftmp.33_15;
-  (*x.0_23)[_21] = 1.0e+2;
-  _11 = i_35 * 2;
-  _12 = _11 + 1;
-  _13 = _12 * iftmp.33_15;
-  _22 = _13 - iftmp.33_15;
-  (*x.0_23)[_22] = 1.01e+2;
-  i_28 = i_35 + 1;
-  if (iftmp.36_25 < i_28)
+  # i_31 = PHI <1(8), i_26(9)>
+  _10 = iftmp.33_13 * i_31;
+  _11 = _10 * 2;
+  _19 = _11 - iftmp.33_13;
+  (*x.0_21)[_19] = 1.0e+2;
+  (*x.0_21)[_11] = 1.01e+2;
+  i_26 = i_31 + 1;
+  if (iftmp.36_23 < i_26)
where due to the new canonicalizations we managed to avoid some
multiplications.  One index was iftmp*i*2-iftmp and the other
was iftmp*(i*2+1)-iftmp, and with the patch we managed to simplify
the latter into iftmp*i*2 and reuse for it the temporary computed for
the first expression.  The assembly for f1 is actually smaller because of
this.  The lp64 vs. !lp64 distinction is just a wild guess; I guess testing
on further targets will show which target property matters.
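
The index simplification described above can be checked with a small sketch. The function names are hypothetical; stride stands in for iftmp.33_* and i for the loop counter in the dump.

```cpp
// Second store index as computed before the patch: iftmp*(i*2+1) - iftmp.
constexpr long index_before(long stride, long i) {
  return stride * (i * 2 + 1) - stride;
}

// Second store index after the new canonicalization: iftmp*i*2.
constexpr long index_after(long stride, long i) {
  return stride * i * 2;
}

// First store index, unchanged by the patch: iftmp*i*2 - iftmp.
// It shares the iftmp*i*2 temporary with index_after.
constexpr long first_index(long stride, long i) {
  return stride * i * 2 - stride;
}

static_assert(index_before(3, 5) == index_after(3, 5),
              "both forms address the same element");
```

The two stores therefore end up exactly one stride apart, which is why a single multiplication chain can serve both.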

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2019-11-29  Jakub Jelinek  

PR tree-optimization/92712
* match.pd ((A * B) +- A -> (B +- 1) * A,
A +- (A * B) -> (1 +- B) * A): Allow optimizing signed integers
even when we don't know anything about range of A, but do know
something about range of B and the simplification won't introduce
new UB.

* gcc.dg/tree-ssa/pr92712-1.c: New test.
* gcc.dg/tree-ssa/pr92712-2.c: New test.
* gcc.dg/tree-ssa/pr92712-3.c: New test.
* gfortran.dg/loop_versioning_1.f90: Adjust expected number of
likely to be innermost dimension messages.
* gfortran.dg/loop_versioning_10.f90: Likewise.
* gfortran.dg/loop_versioning_6.f90: Likewise.

--- gcc/match.pd.jj 2019-11-05 14:59:22.546873967 +0100
+++ gcc/match.pd	2019-11-29 18:17:27.472002727 +0100
@@ -2480,18 +2480,42 @@ (define_operator_list COND_TERNARY
 (plusminus @0 (mult:c@3 @0 @2))
 (if ((!ANY_INTEGRAL_TYPE_P (type)
  || TYPE_OVERFLOW_WRAPS (type)
+ /* For @0 + @0*@2 this transformation would introduce UB
+(where there was none before) for @0 in [-1,0] and @2 max.
+For @0 - @0*@2 this transformation would introduce UB
+for @0 0 and @2 in [min,min+1] or @0 -1 and @2 min+1.  */
  || (INTEGRAL_TYPE_P (type)
- && tree_expr_nonzero_p (@0)
- && expr_not_equal_to (@0, wi::minus_one (TYPE_PRECISION (type)
+ && ((tree_expr_nonzero_p (@0)
+  && expr_not_equal_to (@0,
+   wi::minus_one (TYPE_PRECISION (type
+ || (plusminus == PLUS_EXPR
+ ? expr_not_equal_to (@2,
+   wi::max_value (TYPE_PRECISION (type), SIGNED))
+ /* Let's ignore the @0 -1 and @2 min case.  */
+

Re: [PATCH 0/4]: C++ P1423R3 char8_t remediation implementation

2019-11-29 Thread Jonathan Wakely

On 29/11/19 21:45 +, Jonathan Wakely wrote:

On 29/11/19 19:48 +, Jonathan Wakely wrote:

On 29/11/19 17:45 +, Jonathan Wakely wrote:

On 15/09/19 15:39 -0400, Tom Honermann wrote:
This series of patches provides an implementation of the changes 
for C++ proposal P1423R3 [1].


These changes do not impact default libstdc++ behavior for C++17 
and earlier; they are only active for C++2a or when the 
-fchar8_t option is specified.


Tested x86_64-linux.

Patch 1: Decouple constraints for u8path from path constructors.
Patch 2: Update __cpp_lib_char8_t feature test macro value, add 
deleted operators, update u8path.

Patch 3: Updates to existing tests.
Patch 4: New tests.

Tom.

[1]: http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2019/p1423r3.html


It took a while, but I've committed these four patches, with just some
minor whitespace changes and changelog tweaks.


Running the new tests revealed a latent bug on Windows, where
experimental::filesystem::u8path(const Source&) assumed the input
was an iterator over a NTCTS. That worked for a const char* but not a
std::string or experimental::string_view.

The attached patch fixes that (and simplifies the #if and if-constexpr
conditions for Windows) but there's a remaining bug. Constructing an
experimental::filesystem::path from a char8_t string doesn't do the
right thing on Windows, so these cases fail:

fs::path p(u8"\xf0\x9d\x84\x9e");
VERIFY( p.u8string() == u8"\U0001D11E" );

p = fs::u8path(u8"\xf0\x9d\x84\x9e");
VERIFY( p.u8string() == u8"\U0001D11E" );

It works correctly for std::filesystem::path, just not the TS version.


I think this is the fix needed for the TS code:

--- a/libstdc++-v3/include/experimental/bits/fs_path.h
+++ b/libstdc++-v3/include/experimental/bits/fs_path.h
@@ -765,7 +765,14 @@ namespace __detail
 {
#ifdef _GLIBCXX_USE_CHAR8_T
  if constexpr (is_same<_CharT, char8_t>::value)
- return _S_wconvert((const char*)__f, (const char*)__l, true_type());
+ {
+   const char* __f2 = (const char*)__f;
+   const char* __l2 = (const char*)__l;
+   std::wstring __wstr;
+   std::codecvt_utf8_utf16 __wcvt;
+   if (__str_codecvt_in_all(__f2, __l2, __wstr, __wcvt))
+ return __wstr;
+ }
  else
#endif
{

The current code uses std::codecvt but when
we know the input is UTF-8 encoded we should use codecvt_utf8_utf16
(which is what the C++17 code already does for char8_t input).
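
To illustrate what a UTF-8 to UTF-16 conversion does with the test string above, here is a minimal sketch. It does not use the libstdc++ internal helper __str_codecvt_in_all from the patch; instead it assumes the public std::wstring_convert wrapper (deprecated since C++17) with char16_t, and utf8_to_utf16 is a hypothetical name.

```cpp
#include <codecvt>
#include <locale>
#include <string>

// Hypothetical helper: decode UTF-8 bytes into UTF-16 code units.
// std::wstring_convert::from_bytes throws std::range_error on invalid input.
std::u16string utf8_to_utf16(const std::string& bytes) {
  std::wstring_convert<std::codecvt_utf8_utf16<char16_t>, char16_t> conv;
  return conv.from_bytes(bytes);
}
```

For the four bytes "\xf0\x9d\x84\x9e" (U+1D11E) this produces the surrogate pair 0xD834 0xDD1E, which is the encoding a Windows path needs to hold for u8string() to round-trip.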

I'll add that to the patch I'm testing.


Here's the final patch I'm committing. Tested powerpc64le-linux and
x86_64-w64-mingw, committed to trunk.


commit cb232cf3d475d51fe5e550680bff64dfd32f57a5
Author: Jonathan Wakely 
Date:   Fri Nov 29 19:36:39 2019 +

libstdc++: Fix experimental::filesystem::u8path(const Source&) for Windows

This function failed to compile when called with a std::string.

Also, constructing a path with a char8_t string did not correctly treat
the string as already UTF-8 encoded.

* include/bits/fs_path.h (u8path(InputIterator, InputIterator))
(u8path(const Source&)) [_GLIBCXX_FILESYSTEM_IS_WINDOWS]: Simplify
conditions.
* include/experimental/bits/fs_path.h [_GLIBCXX_FILESYSTEM_IS_WINDOWS]
(__u8path(const Source&, char)): Add overloads for std::string and
types convertible to std::string.
(_Cvt::_S_wconvert): Add a new overload for char8_t strings and use
codecvt_utf8_utf16 to do the correct conversion.

diff --git a/libstdc++-v3/include/bits/fs_path.h b/libstdc++-v3/include/bits/fs_path.h
index b129372447b..20ec42da57d 100644
--- a/libstdc++-v3/include/bits/fs_path.h
+++ b/libstdc++-v3/include/bits/fs_path.h
@@ -691,14 +691,8 @@ namespace __detail
 u8path(_InputIterator __first, _InputIterator __last)
 {
 #ifdef _GLIBCXX_FILESYSTEM_IS_WINDOWS
-#ifdef _GLIBCXX_USE_CHAR8_T
-  if constexpr (is_same_v<_CharT, char8_t>)
+  if constexpr (is_same_v<_CharT, char>)
 	{
-	  return path{ __first, __last };
-	}
-  else
-	{
-#endif
 	  // XXX This assumes native wide encoding is UTF-16.
 	  std::codecvt_utf8_utf16 __cvt;
 	  path::string_type __tmp;
@@ -710,16 +704,16 @@ namespace __detail
 	  else
 	{
 	  const std::string __u8str{__first, __last};
-	  const char* const __ptr = __u8str.data();
-	  if (__str_codecvt_in_all(__ptr, __ptr + __u8str.size(), __tmp, __cvt))
+	  const char* const __p = __u8str.data();
+	  if (__str_codecvt_in_all(__p, __p + __u8str.size(), __tmp, __cvt))
 		return path{ __tmp };
 	}
 	  _GLIBCXX_THROW_OR_ABORT(filesystem_error(
 	  "Cannot convert character sequence",
 	  std::make_error_code(errc::illegal_byte_sequence)));
-#ifdef _GLIBCXX_USE_CHAR8_T
 	}
-#endif
+  else
+	return path{ __first, __last };
 #else
   // This assumes native normal encoding is UTF-8.
   return path{ __first, __last };
@@ -737,14 +731,8 @@ namespace __detail

[PATCH] Default to --enable-libstdcxx-filesystem-ts for *-*-mingw*

2019-11-29 Thread Jonathan Wakely

* acinclude.m4 (GLIBCXX_ENABLE_FILESYSTEM_TS): Enable by default for
mingw targets.
* configure: Regenerate.

Tested powerpc64le-linux and mingw-w64, committed to trunk.

commit 60168e315f0e1533d30847006b207e138dc54b3d
Author: redi 
Date:   Sat Nov 30 01:03:40 2019 +

libstdc++: Default to --enable-libstdcxx-filesystem-ts for *-*-mingw*

* acinclude.m4 (GLIBCXX_ENABLE_FILESYSTEM_TS): Enable by default for
mingw targets.
* configure: Regenerate.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@278870 138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/libstdc++-v3/acinclude.m4 b/libstdc++-v3/acinclude.m4
index ad2cb01d94f..73e07e513bc 100644
--- a/libstdc++-v3/acinclude.m4
+++ b/libstdc++-v3/acinclude.m4
@@ -4436,6 +4436,9 @@ AC_DEFUN([GLIBCXX_ENABLE_FILESYSTEM_TS], [
   solaris*)
 enable_libstdcxx_filesystem_ts=yes
 ;;
+  mingw*)
+enable_libstdcxx_filesystem_ts=yes
+;;
   *)
 enable_libstdcxx_filesystem_ts=no
 ;;


Re: [PATCH] [libiberty] Fix read buffer overflow in split_directories

2019-11-29 Thread Ian Lance Taylor via gcc-patches
On Thu, Nov 28, 2019 at 1:11 PM Tim Rühsen  wrote:
>
> An empty name param leads to read buffer overflow in
> function split_directories.
>
> * libiberty/make-relative-prefix.c (split_directories):
>   Return early on empty name.
> ---
>  libiberty/ChangeLog  | 7 +++
>  libiberty/make-relative-prefix.c | 3 +++
>  2 files changed, 10 insertions(+)
>
> diff --git a/libiberty/ChangeLog b/libiberty/ChangeLog
> index b516903d94..b7e24d11ef 100644
> --- a/libiberty/ChangeLog
> +++ b/libiberty/ChangeLog
> @@ -1,3 +1,10 @@
> +2019-11-28  Tim Ruehsen  
> +
> +   Fix read buffer overflow in split_directories
> +
> +   * make-relative-prefix.c (split_directories):
> +   Return early on empty 'name'
> +

This is OK.

Thanks.

Ian