Re: [PATCH][PR89341]Fix ICE on function definition with weakref/alias attributes attached

2019-06-27 Thread JunMa

在 2019/6/19 上午4:38, Jeff Law 写道:

On 3/26/19 5:40 AM, JunMa wrote:

Hi

According to gnu document of function attributes, neither weakref nor alias
could be attached to a function defined in current translation unit.
Although GCC checks the attributes under some circumstances, it still fails
on some cases and even causes ICE.

This patch checks whether alias/weakref attribute attaches on a function
declaration which has body, and removes the attribute later.
This also avoid the ICE.

Bootstrapped/regtested on x86_64-linux, ok for GCC10?

Regards
JunMa


gcc/ChangeLog

2019-03-26  Jun Ma 

     PR89341
     * cgraphunit.c (handle_alias_pairs): Check whether alias attribute
attaches
     on a function declaration which has body.

gcc/testsuite/ChangeLog

2019-03-26  Jun Ma 

     PR89341
     * gcc.dg/attr-alias-6.c: New test.
     * gcc.dg/attr-weakref-5.c: Likewise.

Based on my reading of the BZ, this should result in a hard error,
rather than an "attribute ignored" warning.

Sorry for the late reply.

Yes, It should error out with message like "'foo' defined both normally
and as 'alias' attribute". However, this patch just tries to fix ICE, and
keeps remain unchanged.

I do have a patch to fix the message and the testcases, I'll send it later.

Regards
JunMa

Jeff





Re: [PATCH][Tree-optimization/PR89772]fold memchr builtins for character not in constant nul-padded string

2019-06-27 Thread JunMa

在 2019/6/18 上午6:46, Jeff Law 写道:

On 5/9/19 2:01 AM, JunMa wrote:

在 2019/5/9 上午10:22, JunMa 写道:

在 2019/5/9 上午3:02, Bernd Edlinger 写道:

On 5/8/19 4:31 PM, Richard Biener wrote:

On Tue, May 7, 2019 at 4:34 AM JunMa  wrote:

在 2019/5/6 下午7:58, JunMa 写道:

在 2019/5/6 下午6:02, Richard Biener 写道:

On Thu, Mar 21, 2019 at 5:57 AM JunMa 
wrote:

Hi
For now, gcc can not fold code like:

const char a[5] = "123"
__builtin_memchr (a, '7', sizeof a)

It tries to avoid folding out of string length although length of a
is 5.
This is a bit conservative, it's safe to folding memchr/bcmp/memcmp
builtins when constant string stores in array with some trailing
nuls.

This patch folds these cases by exposing additional length of
trailing nuls in c_getstr().
Bootstrapped/regtested on x86_64-linux, ok for trunk?

It's probably better if gimple_fold_builtin_memchr uses
string_constant
directly instead?

We can use string_constant in gimple_fold_builtin_memchr, however
it is a
bit complex to use it in memchr/memcmp constant folding.

You are changing memcmp constant folding but only have a testcase
for memchr.

I'll add later.

If we decide to amend c_getstr I would rather make it return the
total length instead of the number of trailing zeros.

I think return the total length is safe in other place as well.
I used the argument in patch since I do not want any impact on
other part at all.


Although it is safe to use total length, I found that function
inline_expand_builtin_string_cmp() which used c_getstr() may emit
redundant rtls for trailing null chars when total length is returned.

Also it is trivial to handle constant string with trailing null chars.

So this updated patch follow richard's suggestion: using
string_constant
directly.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

Doesn't this fold to NULL for the case where seaching for '0' and it
doesn't occur in the string constant but only the zero-padding?

So I'm not sure if JunMa addressed this question specifically.  What
happens is we'll get back a NULL terminated string from c_getstr, but
the returned length will include the NUL terminator.  Then we call
memchr on the result with a length that would include that NUL
terminator.  So I'm pretty sure the current patch will DTRT in that case.

It'd be better to have a stronger test which verified not only that the
call was folded away, but what the resultant value was and whether or
not it was the right value.

That would include testing for NUL in the string as well as testing for
NUL in the tail padding.

I'll ack the change to gimple-fold, but please improve the testcase a
bit and commit the change to gimple-fold and the improved testcase together.

Thanks, and sorry for the delays.

Thanks, I'll update the testcase and check in after pass the regtest.

Regards
Jun

jeff





Re: [RFC PATCH, i386]: Autovectorize 8-byte vectors

2019-06-27 Thread Richard Biener
On Thu, 27 Jun 2019, Uros Bizjak wrote:

> On Thu, Jun 27, 2019 at 8:05 AM Jakub Jelinek  wrote:
> >
> > On Wed, Jun 26, 2019 at 12:19:28PM +0200, Uros Bizjak wrote:
> > > Yes, the patch works OK. I'll regression test it and push it later today.
> >
> > I think it caused
> > +FAIL: gcc.dg/tree-ssa/pr84512.c scan-tree-dump optimized "return 285;"
> > which admittedly already is xfailed on various targets.
> > We now newly vectorize those loops and there is no FRE or similar pass
> > after vectorization to clean it up, in particular optimize the
> > a[8] and a[9] loads given the MEM  [(int *)&a + 32B]
> > store:
> >   MEM  [(int *)&a + 32B] = { 64, 81 };
> >   _13 = a[8];
> >   res_6 = _13 + 140;
> >   _18 = a[9];
> >   res_15 = res_6 + _18;
> >   a ={v} {CLOBBER};
> >   return res_15;
> 
> Yes, I have seen pr84512.c, but the failure is benign. It is caused by
> the fact that we now vectorize the loops of the test.
> 
> > Shall we xfail it, or is there a plan to enable FRE after vectorization,
> > or similar pass that would be able to do similar memory optimizations?
> > Note, the RTL passes are able to optimize it in the end in this testcase.
> 
> The testcase failure could be solved by -fno-tree-vectorize, but I
> think that the value should be propagated through vectors, and tree
> optimizers should optimize the vectorized function in the same way as
> scalar function.

FRE needs a simple fix (oops) to handle this case though.

Bootstrap / regtest running on x86_64-unknown-linux-gnu.

And yes, I think ultimatively we want a late FRE...

Richard.

2019-06-27  Richard Biener  

* tree-ssa-sccvn.c (vn_reference_lookup_3): Encode valueized RHS.

* gcc.dg/tree-ssa/ssa-fre-67.c: New testcase.

Index: gcc/tree-ssa-sccvn.c
===
--- gcc/tree-ssa-sccvn.c(revision 272732)
+++ gcc/tree-ssa-sccvn.c(working copy)
@@ -2242,7 +2242,7 @@ vn_reference_lookup_3 (ao_ref *ref, tree
  tree rhs = gimple_assign_rhs1 (def_stmt);
  if (TREE_CODE (rhs) == SSA_NAME)
rhs = SSA_VAL (rhs);
- len = native_encode_expr (gimple_assign_rhs1 (def_stmt),
+ len = native_encode_expr (rhs,
buffer, sizeof (buffer),
(offseti - offset2) / BITS_PER_UNIT);
  if (len > 0 && len * BITS_PER_UNIT >= maxsizei)
Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-67.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-67.c  (revision 272732)
+++ gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-67.c  (working copy)
@@ -1,16 +1,32 @@
 /* { dg-do compile } */
-/* { dg-options "-O2 -fno-tree-ccp -fdump-tree-fre1-stats" } */
+/* { dg-options "-fgimple -O1 -fdump-tree-fre1" } */
 
-int foo()
+int a[10];
+typedef int v2si __attribute__((vector_size(__SIZEOF_INT__*2)));
+int __GIMPLE (ssa,guessed_local(97603132),startwith("fre1"))
+ foo ()
 {
-  int i = 0;
-  do
-{
-  i++;
-}
-  while (i != 1);
-  return i;
+  int i;
+  int _59;
+  int _44;
+  int _13;
+  int _18;
+  v2si _80;
+  v2si _81;
+  int res;
+
+  __BB(2,guessed_local(97603132)):
+  _59 = 64;
+  i_61 = 9;
+  _44 = i_61 * i_61;
+  _80 = _Literal (v2si) {_59, _44};
+  _81 = _80;
+  __MEM  ((int *)&a + _Literal (int *) 32) = _81;
+  i_48 = 9;
+  _13 = a[8];
+  _18 = a[i_48];
+  res_15 = _13 + _18;
+  return res_15;
 }
 
-/* { dg-final { scan-tree-dump "RPO iteration over 3 blocks visited 3 blocks" 
"fre1" } } */
-/* { dg-final { scan-tree-dump "return 1;" "fre1" } } */
+/* { dg-final { scan-tree-dump "return 145;" "fre1" } } */


allow EH to escape from GIMPLE_EH_ELSE ELSE block

2019-06-27 Thread Alexandre Oliva
The only preexisting use of GIMPLE_EH_ELSE, for transactional memory
commits, did not allow exceptions to escape from the ELSE path.  The
trick it uses to allow the ELSE path to see the propagating exception
does not work very well if the exception cleanup raises further
exceptions: the ELSE block is configured to handle exceptions in
itself.  This confuses the heck out of CFG and EH cleanups.

Basing the lowering context for the ELSE block on outer_state, rather
than this_state, gets us the expected enclosing handler.

Regstrapped on x86_64-linux-gnu.  Ok to install?


for  gcc/ChangeLog

* tree-eh.c (honor_protect_cleanup_actions): Use outer_
rather than this_state as the lowering context for the ELSE
seq in a GIMPLE_EH_ELSE.
---
 gcc/tree-eh.c |   13 -
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/gcc/tree-eh.c b/gcc/tree-eh.c
index 23c56b5661a1..4de09d1bf7b5 100644
--- a/gcc/tree-eh.c
+++ b/gcc/tree-eh.c
@@ -1001,11 +1001,14 @@ honor_protect_cleanup_actions (struct leh_state 
*outer_state,
   gimple_try_set_cleanup (tf->top_p, gimple_eh_else_n_body (eh_else));
   finally = gimple_eh_else_e_body (eh_else);
 
-  /* Let the ELSE see the exception that's being processed.  */
-  eh_region save_ehp = this_state->ehp_region;
-  this_state->ehp_region = this_state->cur_region;
-  lower_eh_constructs_1 (this_state, &finally);
-  this_state->ehp_region = save_ehp;
+  /* Let the ELSE see the exception that's being processed, but
+since the cleanup is outside the try block, process it with
+outer_state, otherwise it may be used as a cleanup for
+itself, and Bad Things (TM) ensue.  */
+  eh_region save_ehp = outer_state->ehp_region;
+  outer_state->ehp_region = this_state->cur_region;
+  lower_eh_constructs_1 (outer_state, &finally);
+  outer_state->ehp_region = save_ehp;
 }
   else
 {

-- 
Alexandre Oliva, freedom fighter  he/him   https://FSFLA.org/blogs/lxo
Be the change, be Free! FSF Latin America board member
GNU Toolchain EngineerFree Software Evangelist
Hay que enGNUrecerse, pero sin perder la terGNUra jamás - Che GNUevara


introduce EH_ELSE tree and gimplifier

2019-06-27 Thread Alexandre Oliva
I found GIMPLE_EH_ELSE offered exactly the semantics I needed for some
Ada changes yet to be contributed, but GIMPLE_EH_ELSE was only built
by GIMPLE passes, and I needed to build earlier something that
eventually became GIMPLE_EH_ELSE.

This patch does that, introducing an EH_ELSE tree, and logic to dump
it and to gimplify it.

Regstrapped on x86_64-linux-gnu.  Ok to install?


for  gcc/ChangeLog

* doc/generic.texi (Cleanups): Document EH_ELSE.
* except.c: Likewise.
* expr.c (expand_expr_real_1): Reject it.
* gimplify.c (gimplify_expr): Gimplify it, within
TRY_FINALLY_EXPR.
* tree-dump.c (dequeue_and_dump): Dump it.
* tree-pretty-print.c (dump_generic_node): Likewise.
* tree.c (block_may_fallthru): Handle it.
* tree.def (EH_ELSE): Introduce.
---
 gcc/doc/generic.texi|5 +
 gcc/except.c|   12 ++--
 gcc/expr.c  |1 +
 gcc/gimplify.c  |   18 +-
 gcc/tree-dump.c |1 +
 gcc/tree-pretty-print.c |   11 +++
 gcc/tree.c  |3 +++
 gcc/tree.def|7 +++
 8 files changed, 51 insertions(+), 7 deletions(-)

diff --git a/gcc/doc/generic.texi b/gcc/doc/generic.texi
index 67f7ad53af6b..1d0e3cfec1d6 100644
--- a/gcc/doc/generic.texi
+++ b/gcc/doc/generic.texi
@@ -2180,6 +2180,11 @@ After the second sequence is executed, if it completes 
normally by
 falling off the end, execution continues wherever the first sequence
 would have continued, by falling off the end, or doing a goto, etc.
 
+If the second sequence is an @code{EH_ELSE} selector, then the sequence
+in its first operand is used when the first sequence completes normally,
+and that in its second operand is used for exceptional cleanups, i.e.,
+when an exception propagates out of the first sequence.
+
 @code{TRY_FINALLY_EXPR} complicates the flow graph, since the cleanup
 needs to appear on every edge out of the controlled block; this
 reduces the freedom to move code across these edges.  Therefore, the
diff --git a/gcc/except.c b/gcc/except.c
index edaeeb4cfd1b..76cd099749f3 100644
--- a/gcc/except.c
+++ b/gcc/except.c
@@ -27,14 +27,14 @@ along with GCC; see the file COPYING3.  If not see
the compilation process:
 
In the beginning, in the front end, we have the GENERIC trees
-   TRY_CATCH_EXPR, TRY_FINALLY_EXPR, WITH_CLEANUP_EXPR,
+   TRY_CATCH_EXPR, TRY_FINALLY_EXPR, EH_ELSE, WITH_CLEANUP_EXPR,
CLEANUP_POINT_EXPR, CATCH_EXPR, and EH_FILTER_EXPR.
 
-   During initial gimplification (gimplify.c) these are lowered
-   to the GIMPLE_TRY, GIMPLE_CATCH, and GIMPLE_EH_FILTER nodes.
-   The WITH_CLEANUP_EXPR and CLEANUP_POINT_EXPR nodes are converted
-   into GIMPLE_TRY_FINALLY nodes; the others are a more direct 1-1
-   conversion.
+   During initial gimplification (gimplify.c) these are lowered to the
+   GIMPLE_TRY, GIMPLE_CATCH, GIMPLE_EH_ELSE, and GIMPLE_EH_FILTER
+   nodes.  The WITH_CLEANUP_EXPR and CLEANUP_POINT_EXPR nodes are
+   converted into GIMPLE_TRY_FINALLY nodes; the others are a more
+   direct 1-1 conversion.
 
During pass_lower_eh (tree-eh.c) we record the nested structure
of the TRY nodes in EH_REGION nodes in CFUN->EH->REGION_TREE.
diff --git a/gcc/expr.c b/gcc/expr.c
index c78bc74c0d9f..70fb721a9621 100644
--- a/gcc/expr.c
+++ b/gcc/expr.c
@@ -11292,6 +11292,7 @@ expand_expr_real_1 (tree exp, rtx target, machine_mode 
tmode,
 case CATCH_EXPR:
 case EH_FILTER_EXPR:
 case TRY_FINALLY_EXPR:
+case EH_ELSE:
   /* Lowered by tree-eh.c.  */
   gcc_unreachable ();
 
diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 0b25e7100cde..de4cb66e2b65 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -13074,7 +13074,22 @@ gimplify_expr (tree *expr_p, gimple_seq *pre_p, 
gimple_seq *post_p,
input_location = UNKNOWN_LOCATION;
eval = cleanup = NULL;
gimplify_and_add (TREE_OPERAND (*expr_p, 0), &eval);
-   gimplify_and_add (TREE_OPERAND (*expr_p, 1), &cleanup);
+   if (TREE_CODE (*expr_p) == TRY_FINALLY_EXPR
+   && TREE_CODE (TREE_OPERAND (*expr_p, 1)) == EH_ELSE)
+ {
+   gimple_seq n = NULL, e = NULL;
+   gimplify_and_add (TREE_OPERAND (TREE_OPERAND (*expr_p, 1),
+   0), &n);
+   gimplify_and_add (TREE_OPERAND (TREE_OPERAND (*expr_p, 1),
+   1), &e);
+   if (!gimple_seq_empty_p (n) && !gimple_seq_empty_p (e))
+ {
+   geh_else *stmt = gimple_build_eh_else (n, e);
+   gimple_seq_add_stmt (&cleanup, stmt);
+ }
+ }
+   else
+ gimplify_and_add (TREE_OPERAND (*expr_p, 1), &cleanup);
/* Don't create bogus GIMPLE_TRY with empty cleanup.  */
if (gimple_seq_empty_p (cleanup))
  {
@@ -13632,6 +13647,7 @@ g

RFC: Include in for C++17

2019-06-27 Thread Jonathan Wakely

I wonder if we want this patch:

--- a/libstdc++-v3/include/std/utility
+++ b/libstdc++-v3/include/std/utility
@@ -74,6 +74,9 @@
#include 
#include 
#include 
+#if __cplusplus == 201703L && defined __STRICT_ANSI__
+# include 
+#endif

namespace std _GLIBCXX_VISIBILITY(default)
{

In the published C++17 standard, std::to_chars and std::from_chars are
meant to be in . The new  header was only
introduced post-C++17 by a defect report (specifically, by P0682R1).

But maybe this isn't needed, because there's no code in the wild that
was written before  was invented, so everybody should
already be using that.




[PATCH] x86: fix/improve vgf2p8affine*qb insns

2019-06-27 Thread Jan Beulich
- the affine transformations are not commutative (the two source
  operands have entirely different meaning)
- there's no need for three alternatives
- the nonimmediate_operand/Bm combination can better be vector_operand/m

gcc/
2019-06-27  Jan Beulich  

* config/i386/sse.md (vgf2p8affineinvqb_,
vgf2p8affineqb_): Drop % constraint modifier.
Eliminate redundant alternative.  Use vector_operand plus "m"
constraint.

gcc/testsuite/
2019-06-27  Jan Beulich  

* gcc.target/i386/gfni-5.c: New.

---
On top of this (in further patches) it looks like the Bm constraint could be
dropped altogether. At the very least its combination with vector_operand is
pointless, because both resolve to the same vector_memory_operand. Same goes
for round{,_saeonly}_nimm_predicate, which too resolves to vector_operand.

--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -22072,56 +22072,53 @@
   "vpopcnt\t{%1, %0|%0, %1}")
 
 (define_insn "vgf2p8affineinvqb_"
-  [(set (match_operand:VI1_AVX512F 0 "register_operand" "=x,x,v")
+  [(set (match_operand:VI1_AVX512F 0 "register_operand" "=x,v")
(unspec:VI1_AVX512F
- [(match_operand:VI1_AVX512F 1 "register_operand" "%0,x,v")
-  (match_operand:VI1_AVX512F 2 "nonimmediate_operand" "xBm,xm,vm")
-  (match_operand:QI 3 "const_0_to_255_operand" "n,n,n")]
+ [(match_operand:VI1_AVX512F 1 "register_operand" "0,v")
+  (match_operand:VI1_AVX512F 2 "vector_operand" "xm,vm")
+  (match_operand:QI 3 "const_0_to_255_operand" "n,n")]
  UNSPEC_GF2P8AFFINEINV))]
   "TARGET_GFNI"
   "@
gf2p8affineinvqb\t{%3, %2, %0| %0, %2, %3}
-   vgf2p8affineinvqb\t{%3, %2, %1, %0| %0, %1, 
%2, %3}
vgf2p8affineinvqb\t{%3, %2, %1, %0| %0, %1, 
%2, %3}"
-  [(set_attr "isa" "noavx,avx,avx512f")
-   (set_attr "prefix_data16" "1,*,*")
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "prefix_data16" "1,*")
(set_attr "prefix_extra" "1")
-   (set_attr "prefix" "orig,maybe_evex,evex")
+   (set_attr "prefix" "orig,maybe_evex")
(set_attr "mode" "")])
 
 (define_insn "vgf2p8affineqb_"
-  [(set (match_operand:VI1_AVX512F 0 "register_operand" "=x,x,v")
+  [(set (match_operand:VI1_AVX512F 0 "register_operand" "=x,v")
(unspec:VI1_AVX512F
- [(match_operand:VI1_AVX512F 1 "register_operand" "%0,x,v")
-  (match_operand:VI1_AVX512F 2 "nonimmediate_operand" "xBm,xm,vm")
-  (match_operand:QI 3 "const_0_to_255_operand" "n,n,n")]
+ [(match_operand:VI1_AVX512F 1 "register_operand" "0,v")
+  (match_operand:VI1_AVX512F 2 "vector_operand" "xm,vm")
+  (match_operand:QI 3 "const_0_to_255_operand" "n,n")]
  UNSPEC_GF2P8AFFINE))]
   "TARGET_GFNI"
   "@
gf2p8affineqb\t{%3, %2, %0| %0, %2, %3}
-   vgf2p8affineqb\t{%3, %2, %1, %0| %0, %1, %2, 
%3}
vgf2p8affineqb\t{%3, %2, %1, %0| %0, %1, %2, 
%3}"
-  [(set_attr "isa" "noavx,avx,avx512f")
-   (set_attr "prefix_data16" "1,*,*")
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "prefix_data16" "1,*")
(set_attr "prefix_extra" "1")
-   (set_attr "prefix" "orig,maybe_evex,evex")
+   (set_attr "prefix" "orig,maybe_evex")
(set_attr "mode" "")])
 
 (define_insn "vgf2p8mulb_"
-  [(set (match_operand:VI1_AVX512F 0 "register_operand" "=x,x,v")
+  [(set (match_operand:VI1_AVX512F 0 "register_operand" "=x,v")
(unspec:VI1_AVX512F
- [(match_operand:VI1_AVX512F 1 "register_operand" "%0,x,v")
-  (match_operand:VI1_AVX512F 2 "nonimmediate_operand" "xBm,xm,vm")]
+ [(match_operand:VI1_AVX512F 1 "register_operand" "%0,v")
+  (match_operand:VI1_AVX512F 2 "vector_operand" "xm,vm")]
  UNSPEC_GF2P8MUL))]
   "TARGET_GFNI"
   "@
gf2p8mulb\t{%2, %0| %0, %2}
-   vgf2p8mulb\t{%2, %1, %0| %0, %1, %2}
vgf2p8mulb\t{%2, %1, %0| %0, %1, %2}"
-  [(set_attr "isa" "noavx,avx,avx512f")
-   (set_attr "prefix_data16" "1,*,*")
+  [(set_attr "isa" "noavx,avx")
+   (set_attr "prefix_data16" "1,*")
(set_attr "prefix_extra" "1")
-   (set_attr "prefix" "orig,maybe_evex,evex")
+   (set_attr "prefix" "orig,maybe_evex")
(set_attr "mode" "")])
 
 (define_insn "vpshrd_"
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/gfni-5.c
@@ -0,0 +1,19 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2 -mgfni" } */
+
+typedef char __attribute__((vector_size(16))) v16qi_t;
+
+v16qi_t test16a (v16qi_t x, v16qi_t a)
+{
+  asm volatile ("" : "+m" (a));
+  return __builtin_ia32_vgf2p8affineqb_v16qi (x, a, 0);
+}
+
+v16qi_t test16b (v16qi_t x, v16qi_t a)
+{
+  asm volatile ("" : "+m" (x));
+  return __builtin_ia32_vgf2p8affineqb_v16qi (x, a, 0);
+}
+
+/* { dg-final { scan-assembler-times "gf2p8affineqb\[ \t].*\\(" 1 } } */
+/* { dg-final { scan-assembler-times "gf2p8affineqb\[ \t].*%xmm.*%xmm" 1 } } */





[PATCH] ix86: pass correct options to compiler for gfni-4 testcase

2019-06-27 Thread Jan Beulich
SSE2 is the required prereq of the builtins; as x86-64 has SSE2 enabled
anyway, the test failure was noticable on 32-bit builds only.

gcc/testsuite/
2019-06-27  Jan Beulich  

* gcc.target/i386/gfni-4.c: Pass -msse2.

--- a/gcc/testsuite/gcc.target/i386/gfni-4.c
+++ b/gcc/testsuite/gcc.target/i386/gfni-4.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-mgfni -O2 -msse" } */
+/* { dg-options "-mgfni -O2 -msse2" } */
 /* { dg-final { scan-assembler-times "gf2p8affineinvqb\[ 
\\t\]+\[^\{\n\]*\\\$3\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ 
\\t\]+#)" 1 } } */
 /* { dg-final { scan-assembler-times "gf2p8affineqb\[ 
\\t\]+\[^\{\n\]*\\\$3\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ 
\\t\]+#)" 1 } } */
 /* { dg-final { scan-assembler-times "gf2p8mulb\[ 
\\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */




[PATCH] x86/AVX512: improve generated code for mask-to-vector-register conversions

2019-06-27 Thread Jan Beulich
Conversion of comparison results to full vectors does, when VPMOVM2* are
unavailable, not require any intermediate VMOVDQ{A,U}*: Simply use
embedded masking on VPTERNLOG* right away, which is available with
AVX512F (while VPMOVM2{D,Q} are available only with AVX512DQ).

Note that the chosen immediate is only one of many possible ones; I was
trying to make the insn here distinguishable from the pre-existing uses
of vpternlog.

gcc/
2019-06-27  Jan Beulich  

* config/i386/sse.md (_cvtmask2):
Require only AVX512F.
(*_cvtmask2): Likewise.  Add
alternative expanding to vpternlog.

--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -6395,21 +6395,25 @@
  (match_dup 2)
  (match_dup 3)
  (match_operand: 1 "register_operand")))]
-  "TARGET_AVX512DQ"
+  "TARGET_AVX512F"
   "{
 operands[2] = CONSTM1_RTX (mode);
 operands[3] = CONST0_RTX (mode);
   }")
 
 (define_insn "*_cvtmask2"
-  [(set (match_operand:VI48_AVX512VL 0 "register_operand" "=v")
+  [(set (match_operand:VI48_AVX512VL 0 "register_operand" "=v,v")
(vec_merge:VI48_AVX512VL
  (match_operand:VI48_AVX512VL 2 "vector_all_ones_operand")
  (match_operand:VI48_AVX512VL 3 "const0_operand")
- (match_operand: 1 "register_operand" "k")))]
-  "TARGET_AVX512DQ"
-  "vpmovm2\t{%1, %0|%0, %1}"
-  [(set_attr "prefix" "evex")
+ (match_operand: 1 "register_operand" "k,Yk")))]
+  "TARGET_AVX512F"
+  "@
+   vpmovm2\t{%1, %0|%0, %1}
+   vpternlog\t{$0x81, %0, %0, %0%{%1%}%{z%}|%0%{%1%}%{z%}, %0, 
%0, 0x81}"
+  [(set_attr "isa" "avx512dq,*")
+   (set_attr "length_immediate" "0,1")
+   (set_attr "prefix" "evex")
(set_attr "mode" "")])
 
 (define_insn "sse2_cvtps2pd"






[PATCH] x86/AVX512: improve generated code for bit-wise negation of vectors of integers

2019-06-27 Thread Jan Beulich
NOT on vectors of integers does not require loading a constant vector of
all ones into a register - VPTERNLOG can be used here (and could/should
be further used to carry out other binary and ternary logical operations
which don't have a special purpose instruction).

gcc/
2019-06-27  Jan Beulich  

* config/i386/sse.md (ternlogsuffix): New.
(one_cmpl2): Don't force CONSTM1_RTX into a register when
AVX512F is in use.
(one_cmpl2): New.

--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -853,6 +853,13 @@
(V4SF "k") (V2DF "q")
(SF "k") (DF "q")])
 
+;; Mapping of vector modes to VPTERNLOG suffix
+(define_mode_attr ternlogsuffix
+  [(V8DI "q") (V4DI "q") (V2DI "q")
+   (V16SI "d") (V8SI "d") (V4SI "d")
+   (V32HI "d") (V16HI "d") (V8HI "d")
+   (V64QI "d") (V32QI "d") (V16QI "d")])
+
 ;; Number of scalar elements in each vector type
 (define_mode_attr ssescalarnum
   [(V64QI "64") (V16SI "16") (V8DI "8")
@@ -12564,9 +12571,22 @@
(match_dup 2)))]
   "TARGET_SSE"
 {
-  operands[2] = force_reg (mode, CONSTM1_RTX (mode));
+  if (!TARGET_AVX512F)
+operands[2] = force_reg (mode, CONSTM1_RTX (mode));
+  else
+operands[2] = CONSTM1_RTX (mode);
 })
 
+(define_insn "one_cmpl2"
+  [(set (match_operand:VI 0 "register_operand" "=v")
+   (xor:VI (match_operand:VI 1 "nonimmediate_operand" "vm")
+   (match_operand:VI 2 "vector_all_ones_operand" "BC")))]
+  "TARGET_AVX512F"
+  "vpternlog\t{$0x55, %1, %0, 
%0|%0, %0, %1, 0x55}"
+  [(set_attr "type" "sselog")
+   (set_attr "prefix" "evex")
+   (set_attr "mode" "")])
+
 (define_expand "_andnot3"
   [(set (match_operand:VI_AVX2 0 "register_operand")
(and:VI_AVX2






Re: [PATCH] Enable GCC support for AVX512_VP2INTERSECT.

2019-06-27 Thread Rainer Orth
Hi Hongtao,

>> as usual, the new effective-target keyword needs documenting in
>> sourcebuild.texi.
> Like this?

a couple of nits: first of all, your mailer seems to replace tabs by a
single space.  Please fix this or attach the patch instead.

> Index: ChangeLog
> ===
> --- ChangeLog (revision 272668)
> +++ ChangeLog (working copy)
> @@ -1,3 +1,8 @@
> +2019-06-27  Hongtao Liu  
> +
> + * doc/sourcebuild.texi: Document new effective target keyword
> + avx512vp2intersect.

Please include the sections you're modifying, something like

* doc/sourcebuild.texi (Effective-Target Keywords, Other
hardware attributes): Document avx512vp2intersect.

And please don't include the ChangeLog in the patch, but include it in
the mail proper: it won't apply due to date and context changes anyway.
Best review https://gcc.gnu.org/contribute.html where this is documented
besides other points of patch submission.

Besides, it's most likely useful to also review the GNU Coding
Standards, too, not only for ChangeLog formatting.

> Index: testsuite/ChangeLog
> ===
> --- testsuite/ChangeLog (revision 272668)
> +++ testsuite/ChangeLog (working copy)
> @@ -1,3 +1,11 @@
> +2019-06-27  Hongtao Liu  
> +
> + * lib/target-supports.exp: Add
> + check_effective_target_avx512vp2intersect.

Use

* lib/target-supports.exp
(check_effective_target_avx512vp2intersect): New proc.

> + * gcc.target/i386/avx512vp2intersect-2intersect-1b.c: Add
> + dg-require-effective-target avx512vp2intersect.

Better:

* gcc.target/i386/avx512vp2intersect-2intersect-1b.c: Require
avx512vp2intersect.

but that's a matter of preference.

> Index: testsuite/gcc.target/i386/avx512vp2intersect-2intersect-1b.c
> ===
> --- testsuite/gcc.target/i386/avx512vp2intersect-2intersect-1b.c
> (revision 272668)
> +++ testsuite/gcc.target/i386/avx512vp2intersect-2intersect-1b.c (working 
> copy)
> @@ -1,5 +1,6 @@
>  /* { dg-do run } */
>  /* { dg-options "-O2 -mavx512vp2intersect" } */
> +/* { dg-require-effective-target "avx512vp2intersect" } */

No need to quote avx512vp2intersect here and in the next test.

Ok with those nits fixed.

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[PATCH] x86: fix CVT{,T}PD2PI insns

2019-06-27 Thread Jan Beulich
With just an "m" constraint misaligned memory operands won't be forced
into a register, and hence cause #GP. So far this was guaranteed only
in the case that CVT{,T}PD2DQ were chosen (which looks to be the case on
x86-64 only).

Instead of switching the second alternative to Bm, use just m on the
first and replace nonimmediate_operand by vector_operand.

gcc/
2019-06-27  Jan Beulich  

* config/i386/sse.md (sse2_cvtpd2pi, sse2_cvttpd2pi): Use
vector_operand plus "m" constraint.

gcc/testsuite/
2019-06-27  Jan Beulich  

* gcc.target/i386/cvtpd2pi: New.

--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -5505,7 +5505,7 @@
 
 (define_insn "sse2_cvtpd2pi"
   [(set (match_operand:V2SI 0 "register_operand" "=v,?!y")
-   (unspec:V2SI [(match_operand:V2DF 1 "nonimmediate_operand" "vBm,xm")]
+   (unspec:V2SI [(match_operand:V2DF 1 "vector_operand" "vm,xm")]
 UNSPEC_FIX_NOTRUNC))]
   "TARGET_SSE2"
   "@
@@ -5523,7 +5523,7 @@
 
 (define_insn "sse2_cvttpd2pi"
   [(set (match_operand:V2SI 0 "register_operand" "=v,?!y")
-   (fix:V2SI (match_operand:V2DF 1 "nonimmediate_operand" "vBm,xm")))]
+   (fix:V2SI (match_operand:V2DF 1 "vector_operand" "vm,xm")))]
   "TARGET_SSE2"
   "@
* return TARGET_AVX ? \"vcvttpd2dq{x}\t{%1, %0|%0, %1}\" : 
\"cvttpd2dq\t{%1, %0|%0, %1}\";
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/cvtpd2pi.c
@@ -0,0 +1,36 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -msse2" } */
+
+typedef int __attribute__((vector_size(8))) v2si_t;
+typedef int __attribute__((vector_size(16))) v4si_t;
+typedef double __attribute__((vector_size(16))) v2df_t;
+
+struct __attribute__((packed)) s {
+  int i;
+  v2si_t m;
+  v4si_t v;
+};
+
+int test (struct s*ps)
+{
+  v4si_t r = ps->v;
+  v2si_t m;
+
+  if (ps->i > 0)
+{
+  asm volatile ("" : "+m" (*ps));
+  m = __builtin_ia32_cvtpd2pi ((v2df_t)ps->v);
+  r[0] = __builtin_ia32_paddd (m, m)[0];
+}
+  else
+{
+  asm volatile ("" : "+m" (*ps));
+  m = __builtin_ia32_cvttpd2pi ((v2df_t)ps->v);
+  r[0] = __builtin_ia32_paddd (m, m)[0];
+}
+
+  return r[0];
+}
+
+/* { dg-final { scan-assembler-not "cvtpd2pi\[ \t]\[^\n\r]*\\(" } } */
+/* { dg-final { scan-assembler-not "cvttpd2pi\[ \t]\[^\n\r]*\\(" } } */






[PATCH] Fix PR91004

2019-06-27 Thread Richard Biener


This fixes

FAIL: g++.dg/torture/pr34850.C   -O2  (test for excess errors)
FAIL: g++.dg/torture/pr34850.C   -O3 -g  (test for excess errors)
FAIL: g++.dg/torture/pr34850.C   -Os  (test for excess errors)
FAIL: g++.dg/torture/pr34850.C   -O2 -flto -fno-use-linker-plugin 
-flto-partition=none  (test for excess errors)
FAIL: g++.dg/torture/pr34850.C   -O2 -flto -fuse-linker-plugin 
-fno-fat-lto-objects  (test for excess errors)

by reducing the testcase a bit less.

Tested on x86_64-unknown-linux-gnu, applied to trunk.

2019-06-27  Richard Biener  

PR testsuite/91004
* g++.dg/torture/pr34850.C: Fix overly reduced testcase.

Index: gcc/testsuite/g++.dg/torture/pr34850.C
===
--- gcc/testsuite/g++.dg/torture/pr34850.C  (revision 272732)
+++ gcc/testsuite/g++.dg/torture/pr34850.C  (working copy)
@@ -13,7 +13,8 @@ extern "C" {
 extern __inline __attribute__ ((__always_inline__)) __attribute__ 
((__gnu_inline__, __artificial__))
 void * memset (void *__dest, int __ch, size_t __len) throw () {
if (__builtin_constant_p (__len) && __len == 0)
-   __warn_memset_zero_len ();
+ __warn_memset_zero_len ();
+   return __dest;
 }
 }
 inline void clear_mem(void* ptr, u32bit n){


Re: [PATCH] Enable GCC support for AVX512_VP2INTERSECT.

2019-06-27 Thread Hongtao Liu
On Thu, Jun 27, 2019 at 5:02 PM Rainer Orth  
wrote:
>
> Hi Hongtao,
>
> >> as usual, the new effective-target keyword needs documenting in
> >> sourcebuild.texi.
> > Like this?
>
> a couple of nits: first of all, your mailer seems to replace tabs by a
> single space.  Please fix this or attach the patch instead.
>
> > Index: ChangeLog
> > ===
> > --- ChangeLog (revision 272668)
> > +++ ChangeLog (working copy)
> > @@ -1,3 +1,8 @@
> > +2019-06-27  Hongtao Liu  
> > +
> > + * doc/sourcebuild.texi: Document new effective target keyword
> > + avx512vp2intersect.
>
> Please include the sections you're modifying, something like
>
> * doc/sourcebuild.texi (Effective-Target Keywords, Other
> hardware attributes): Document avx512vp2intersect.
>
> And please don't include the ChangeLog in the patch, but include it in
> the mail proper: it won't apply due to date and context changes anyway.
> Best review https://gcc.gnu.org/contribute.html where this is documented
> besides other points of patch submission.
>
> Besides, it's most likely useful to also review the GNU Coding
> Standards, too, not only for ChangeLog formatting.
>
> > Index: testsuite/ChangeLog
> > ===
> > --- testsuite/ChangeLog (revision 272668)
> > +++ testsuite/ChangeLog (working copy)
> > @@ -1,3 +1,11 @@
> > +2019-06-27  Hongtao Liu  
> > +
> > + * lib/target-supports.exp: Add
> > + check_effective_target_avx512vp2intersect.
>
> Use
>
> * lib/target-supports.exp
> (check_effective_target_avx512vp2intersect): New proc.
>
> > + * gcc.target/i386/avx512vp2intersect-2intersect-1b.c: Add
> > + dg-require-effective-target avx512vp2intersect.
>
> Better:
>
> * gcc.target/i386/avx512vp2intersect-2intersect-1b.c: Require
> avx512vp2intersect.
>
> but that's a matter of preference.
>
> > Index: testsuite/gcc.target/i386/avx512vp2intersect-2intersect-1b.c
> > ===
> > --- testsuite/gcc.target/i386/avx512vp2intersect-2intersect-1b.c
> > (revision 272668)
> > +++ testsuite/gcc.target/i386/avx512vp2intersect-2intersect-1b.c (working 
> > copy)
> > @@ -1,5 +1,6 @@
> >  /* { dg-do run } */
> >  /* { dg-options "-O2 -mavx512vp2intersect" } */
> > +/* { dg-require-effective-target "avx512vp2intersect" } */
>
> No need to quote avx512vp2intersect here and in the next test.
>
> Ok with those nits fixed.
>
Yes, thanks a lot.
> Thanks.
> Rainer
>
> --
> -
> Rainer Orth, Center for Biotechnology, Bielefeld University

Ok for trunk?

-- 
BR,
Hongtao


[PATCH] Fix ICE when __builtin_calloc has no LHS (PR tree-optimization/91014).

2019-06-27 Thread Martin Liška
Hi.

This is quite an obvious changes I've noticed during fuzzing
of s390x target compiler.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

gcc/ChangeLog:

2019-06-27  Martin Liska  

PR tree-optimization/91014
* tree-ssa-dse.c (initialize_ao_ref_for_dse): Bail out
when LHS is NULL_TREE.

gcc/testsuite/ChangeLog:

2019-06-27  Martin Liska  

PR tree-optimization/91014
* gcc.target/s390/pr91014.c: New test.
---
 gcc/testsuite/gcc.target/s390/pr91014.c | 8 
 gcc/tree-ssa-dse.c  | 5 +++--
 2 files changed, 11 insertions(+), 2 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/s390/pr91014.c


diff --git a/gcc/testsuite/gcc.target/s390/pr91014.c b/gcc/testsuite/gcc.target/s390/pr91014.c
new file mode 100644
index 000..eb37b333b5b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/s390/pr91014.c
@@ -0,0 +1,8 @@
+/* { dg-do compile } */
+/* { dg-options "-O" } */
+/* { dg-require-effective-target alloca } */
+
+void foo(void)
+{
+ __builtin_calloc (1, 1); /* { dg-warning "ignoring return value of '__builtin_calloc' declared with attribute 'warn_unused_result'" } */
+}
diff --git a/gcc/tree-ssa-dse.c b/gcc/tree-ssa-dse.c
index 1b1a9f34230..df05a55ce78 100644
--- a/gcc/tree-ssa-dse.c
+++ b/gcc/tree-ssa-dse.c
@@ -129,10 +129,11 @@ initialize_ao_ref_for_dse (gimple *stmt, ao_ref *write)
 	{
 	  tree nelem = gimple_call_arg (stmt, 0);
 	  tree selem = gimple_call_arg (stmt, 1);
+	  tree lhs;
 	  if (TREE_CODE (nelem) == INTEGER_CST
-		  && TREE_CODE (selem) == INTEGER_CST)
+		  && TREE_CODE (selem) == INTEGER_CST
+		  && (lhs = gimple_call_lhs (stmt)) != NULL_TREE)
 		{
-		  tree lhs = gimple_call_lhs (stmt);
 		  tree size = fold_build2 (MULT_EXPR, TREE_TYPE (nelem),
 	   nelem, selem);
 		  ao_ref_init_from_ptr_and_size (write, lhs, size);



Re: [PATCH] ix86: pass correct options to compiler for gfni-4 testcase

2019-06-27 Thread Uros Bizjak
On Thu, Jun 27, 2019 at 10:58 AM Jan Beulich  wrote:
>
> SSE2 is the required prereq of the builtins; as x86-64 has SSE2 enabled
> anyway, the test failure was noticable on 32-bit builds only.
>
> gcc/testsuite/
> 2019-06-27  Jan Beulich  
>
> * gcc.target/i386/gfni-4.c: Pass -msse2.

OK.

Thanks,
Uros.

> --- a/gcc/testsuite/gcc.target/i386/gfni-4.c
> +++ b/gcc/testsuite/gcc.target/i386/gfni-4.c
> @@ -1,5 +1,5 @@
>  /* { dg-do compile } */
> -/* { dg-options "-mgfni -O2 -msse" } */
> +/* { dg-options "-mgfni -O2 -msse2" } */
>  /* { dg-final { scan-assembler-times "gf2p8affineinvqb\[ 
> \\t\]+\[^\{\n\]*\\\$3\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ 
> \\t\]+#)" 1 } } */
>  /* { dg-final { scan-assembler-times "gf2p8affineqb\[ 
> \\t\]+\[^\{\n\]*\\\$3\[^\n\r]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ 
> \\t\]+#)" 1 } } */
>  /* { dg-final { scan-assembler-times "gf2p8mulb\[ 
> \\t\]+\[^\{\n\]*%xmm\[0-9\]+\[^\n\r]*%xmm\[0-9\]+(?:\n|\[ \\t\]+#)" 1 } } */
>
>


Re: [PATCH] Remove quite obvious dead assignments.

2019-06-27 Thread Martin Liška
On 6/26/19 1:39 PM, Jakub Jelinek wrote:
> On Wed, Jun 26, 2019 at 12:57:15PM +0200, Martin Liška wrote:
>> --- a/gcc/asan.c
>> +++ b/gcc/asan.c
>> @@ -1713,8 +1713,8 @@ asan_emit_allocas_unpoison (rtx top, rtx bot, rtx_insn 
>> *before)
>>rtx ret = init_one_libfunc ("__asan_allocas_unpoison");
>>top = convert_memory_address (ptr_mode, top);
>>bot = convert_memory_address (ptr_mode, bot);
>> -  ret = emit_library_call_value (ret, NULL_RTX, LCT_NORMAL, ptr_mode,
>> - top, ptr_mode, bot, ptr_mode);
>> +  emit_library_call_value (ret, NULL_RTX, LCT_NORMAL, ptr_mode,
>> +   top, ptr_mode, bot, ptr_mode);
> 
> If you don't need the return value, you should use emit_library_call,
> not emit_library_call_value.

Done.

> 
>> --- a/gcc/cfgexpand.c
>> +++ b/gcc/cfgexpand.c
>> @@ -3044,7 +3044,7 @@ expand_asm_stmt (gasm *stmt)
>>}
>>  }
>>  }
>> -  unsigned nclobbers = clobber_rvec.length();
>> +  unsigned nclobbers;
> 
> Can you instead move the nclobbers declaration to the spot where it
> is initialized (right now the second time), and add space before (
> there too?

Sure.

> 
>> --- a/gcc/config/i386/i386-expand.c
>> +++ b/gcc/config/i386/i386-expand.c
>> @@ -16051,8 +16051,6 @@ ix86_expand_rounddf_32 (rtx operand0, rtx operand1)
>>mhalf = expand_simple_binop (mode, MINUS, half, one, NULL_RTX,
>> 0, OPTAB_DIRECT);
>>  
>> -  /* Compensate.  */
>> -  tmp = gen_reg_rtx (mode);
>>/* xa2 = xa2 - (dxa > 0.5 ? 1 : 0) */
>>tmp = ix86_expand_sse_compare_mask (UNGT, dxa, half, false);
>>emit_insn (gen_rtx_SET (tmp, gen_rtx_AND (mode, one, tmp)));
> 
> Is this really desirable?
> Just quick look suggests perhaps it wants to use one temporary
> for the gen_reg_rtx (mode) and use that on the lhs of SET, and another one
> as the last operand to AND holding ix86_expand_sse_compare_mask?
> Though, such problem is in most of the ix86_expand_sse_compare_mask callers.
> While ix86_expand_sse_compare_mask returns a result of gen_reg_rtx too,
> I think it is better not to reuse pseudos for different values.
> I'd say take out this change and deal with it with i386 maintainers
> separately.

Good point, I've just created:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91016

> 
>> --- a/gcc/tree-vect-stmts.c
>> +++ b/gcc/tree-vect-stmts.c
>> @@ -3483,8 +3483,7 @@ vectorizable_call (stmt_vec_info stmt_info, 
>> gimple_stmt_iterator *gsi,
>>  = gimple_build_call_internal_vec (ifn, vargs);
>>gimple_call_set_lhs (call, half_res);
>>gimple_call_set_nothrow (call, true);
>> -  new_stmt_info
>> -= vect_finish_stmt_generation (stmt_info, call, gsi);
>> +  vect_finish_stmt_generation (stmt_info, call, gsi);
>>if ((i & 1) == 0)
>>  {
>>prev_res = half_res;
>> @@ -3583,8 +3582,7 @@ vectorizable_call (stmt_vec_info stmt_info, 
>> gimple_stmt_iterator *gsi,
>>gcall *call = gimple_build_call_internal_vec (ifn, vargs);
>>gimple_call_set_lhs (call, half_res);
>>gimple_call_set_nothrow (call, true);
>> -  new_stmt_info
>> -= vect_finish_stmt_generation (stmt_info, call, gsi);
>> +  vect_finish_stmt_generation (stmt_info, call, gsi);
>>if ((j & 1) == 0)
>>  {
>>prev_res = half_res;
> 
> This looks wrong.
> There should have been
> SLP_TREE_VEC_STMTS (slp_node).quick_push (new_stmt_info);
> in between for slp_node, or the usual code like:
>   if (cond_for_first)
> STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt_info;
>   else
> STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt_info;
> prev_stmt_info = new_stmt_info;
> otherwise.  In any case, I think this should be dealt with separately.

Likewise here:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91017

I'm going to install updated patch once regression tests finish for it.
I excluded changes for the PR mentioned.

Martin

> 
>   Jakub
> 

>From 045440c628968c53cb10de317a3420912da81a93 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Wed, 26 Jun 2019 10:29:38 +0200
Subject: [PATCH] Remove quite obvious dead assignments.

gcc/ChangeLog:

2019-06-26  Martin Liska  

	* asan.c (asan_emit_allocas_unpoison): Remove obviously
	dead assignments.
	* bt-load.c (move_btr_def): Likewise.
	* builtins.c (expand_builtin_apply_args_1): Likewise.
	(expand_builtin_apply): Likewise.
	* cfgexpand.c (expand_asm_stmt): Likewise.
	(construct_init_block): Likewise.
	* cfghooks.c (verify_flow_info): Likewise.
	* cfgloopmanip.c (remove_path): Likewise.
	* cfgrtl.c (rtl_verify_bb_layout): Likewise.
	* cgraph.c (cgraph_node::set_pure_flag): Likewise.
	* combine.c (simplify_if_then_else): Likewise.
	* config/i386/i386.c (ix86_setup_incoming_vararg_bounds): Likewise.
	(cho

Re: [PATCH] rs6000: Enable -fvariable-expansion-in-unroller by default

2019-06-27 Thread Richard Biener
On Thu, Jun 27, 2019 at 5:23 AM Bill Schmidt  wrote:
>
> Hi,
>
> We've done some experimenting and realized that the subject option almost
> always provide improved performance for Power when the loop unroller is
> enabled.  So this patch turns that flag on by default for us.
>
> Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no regressions.
> Is this OK for trunk?

I guess it creates more freedom for combine (more single-uses) and register
allocation.  I wonder in which cases this might pessimize things?  I guess
the pre-RA scheduler might make RAs life harder with creating overlapping
life-ranges.

I guess you didn't actually investigate the nature of the improvements you saw?

Do we want to adjust the flags documentation, saying whether this is enabled
by default depends on the target (or even list them)?

Thanks,
Richard.

> Thanks!
> Bill
>
>
> 2019-06-27  Bill Schmidt  
>
> * config/rs6000/rs6000.c (rs6000_option_override_internal): Enable
> -fvariable-expansion-in-unroller by default.
>
>
> Index: gcc/config/rs6000/rs6000.c
> ===
> --- gcc/config/rs6000/rs6000.c  (revision 272719)
> +++ gcc/config/rs6000/rs6000.c  (working copy)
> @@ -3616,6 +3616,11 @@ rs6000_option_override_internal (bool global_init_
>&& !global_options_set.x_flag_asynchronous_unwind_tables)
>  flag_asynchronous_unwind_tables = 1;
>
> +  /* -fvariable-expansion-in-unroller is a win for POWER whenever the
> + loop unroller is active.  It is only checked during unrolling, so
> + we can just set it on by default.  */
> +  flag_variable_expansion_in_unroller = 1;
> +
>/* Set the pointer size.  */
>if (TARGET_64BIT)
>  {
>


Re: allow EH to escape from GIMPLE_EH_ELSE ELSE block

2019-06-27 Thread Richard Biener
On Thu, Jun 27, 2019 at 10:18 AM Alexandre Oliva  wrote:
>
> The only preexisting use of GIMPLE_EH_ELSE, for transactional memory
> commits, did not allow exceptions to escape from the ELSE path.  The
> trick it uses to allow the ELSE path to see the propagating exception
> does not work very well if the exception cleanup raises further
> exceptions: the ELSE block is configured to handle exceptions in
> itself.  This confuses the heck out of CFG and EH cleanups.
>
> Basing the lowering context for the ELSE block on outer_state, rather
> than this_state, gets us the expected enclosing handler.
>
> Regstrapped on x86_64-linux-gnu.  Ok to install?

Testcase?

>
> for  gcc/ChangeLog
>
> * tree-eh.c (honor_protect_cleanup_actions): Use outer_
> rather than this_state as the lowering context for the ELSE
> seq in a GIMPLE_EH_ELSE.
> ---
>  gcc/tree-eh.c |   13 -
>  1 file changed, 8 insertions(+), 5 deletions(-)
>
> diff --git a/gcc/tree-eh.c b/gcc/tree-eh.c
> index 23c56b5661a1..4de09d1bf7b5 100644
> --- a/gcc/tree-eh.c
> +++ b/gcc/tree-eh.c
> @@ -1001,11 +1001,14 @@ honor_protect_cleanup_actions (struct leh_state 
> *outer_state,
>gimple_try_set_cleanup (tf->top_p, gimple_eh_else_n_body (eh_else));
>finally = gimple_eh_else_e_body (eh_else);
>
> -  /* Let the ELSE see the exception that's being processed.  */
> -  eh_region save_ehp = this_state->ehp_region;
> -  this_state->ehp_region = this_state->cur_region;
> -  lower_eh_constructs_1 (this_state, &finally);
> -  this_state->ehp_region = save_ehp;
> +  /* Let the ELSE see the exception that's being processed, but
> +since the cleanup is outside the try block, process it with
> +outer_state, otherwise it may be used as a cleanup for
> +itself, and Bad Things (TM) ensue.  */
> +  eh_region save_ehp = outer_state->ehp_region;
> +  outer_state->ehp_region = this_state->cur_region;
> +  lower_eh_constructs_1 (outer_state, &finally);
> +  outer_state->ehp_region = save_ehp;
>  }
>else
>  {
>
> --
> Alexandre Oliva, freedom fighter  he/him   https://FSFLA.org/blogs/lxo
> Be the change, be Free! FSF Latin America board member
> GNU Toolchain EngineerFree Software Evangelist
> Hay que enGNUrecerse, pero sin perder la terGNUra jamás - Che GNUevara


Re: [PATCH] Enable GCC support for AVX512_VP2INTERSECT.

2019-06-27 Thread Rainer Orth
Hi Hongtao,

> On Thu, Jun 27, 2019 at 5:02 PM Rainer Orth  
> wrote:
>>
>> Hi Hongtao,
>>
>> >> as usual, the new effective-target keyword needs documenting in
>> >> sourcebuild.texi.
>> > Like this?
>>
>> a couple of nits: first of all, your mailer seems to replace tabs by a
>> single space.  Please fix this or attach the patch instead.
>>
>> > Index: ChangeLog
>> > ===
>> > --- ChangeLog (revision 272668)
>> > +++ ChangeLog (working copy)
>> > @@ -1,3 +1,8 @@
>> > +2019-06-27  Hongtao Liu  
>> > +
>> > + * doc/sourcebuild.texi: Document new effective target keyword
>> > + avx512vp2intersect.
>>
>> Please include the sections you're modifying, something like
>>
>> * doc/sourcebuild.texi (Effective-Target Keywords, Other
>> hardware attributes): Document avx512vp2intersect.
>>
>> And please don't include the ChangeLog in the patch, but include it in
>> the mail proper: it won't apply due to date and context changes anyway.
>> Best review https://gcc.gnu.org/contribute.html where this is documented
>> besides other points of patch submission.
>>
>> Besides, it's most likely useful to also review the GNU Coding
>> Standards, too, not only for ChangeLog formatting.
>>
>> > Index: testsuite/ChangeLog
>> > ===
>> > --- testsuite/ChangeLog (revision 272668)
>> > +++ testsuite/ChangeLog (working copy)
>> > @@ -1,3 +1,11 @@
>> > +2019-06-27  Hongtao Liu  
>> > +
>> > + * lib/target-supports.exp: Add
>> > + check_effective_target_avx512vp2intersect.
>>
>> Use
>>
>> * lib/target-supports.exp
>> (check_effective_target_avx512vp2intersect): New proc.
>>
>> > + * gcc.target/i386/avx512vp2intersect-2intersect-1b.c: Add
>> > + dg-require-effective-target avx512vp2intersect.
>>
>> Better:
>>
>> * gcc.target/i386/avx512vp2intersect-2intersect-1b.c: Require
>> avx512vp2intersect.
>>
>> but that's a matter of preference.
>>
>> > Index: testsuite/gcc.target/i386/avx512vp2intersect-2intersect-1b.c
>> > ===
>> > --- testsuite/gcc.target/i386/avx512vp2intersect-2intersect-1b.c
>> > (revision 272668)
>> > +++ testsuite/gcc.target/i386/avx512vp2intersect-2intersect-1b.c
>> > (working copy)
>> > @@ -1,5 +1,6 @@
>> >  /* { dg-do run } */
>> >  /* { dg-options "-O2 -mavx512vp2intersect" } */
>> > +/* { dg-require-effective-target "avx512vp2intersect" } */
>>
>> No need to quote avx512vp2intersect here and in the next test.
>>
>> Ok with those nits fixed.
>>
> Yes, thanks a lot.
>> Thanks.
>> Rainer
>>
>> --
>> -
>> Rainer Orth, Center for Biotechnology, Bielefeld University
>
> Ok for trunk?

ENOPATCH

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: introduce EH_ELSE tree and gimplifier

2019-06-27 Thread Richard Biener
On Thu, Jun 27, 2019 at 10:18 AM Alexandre Oliva  wrote:
>
> I found GIMPLE_EH_ELSE offered exactly the semantics I needed for some
> Ada changes yet to be contributed, but GIMPLE_EH_ELSE was only built
> by GIMPLE passes, and I needed to build earlier something that
> eventually became GIMPLE_EH_ELSE.
>
> This patch does that, introducing an EH_ELSE tree, and logic to dump
> it and to gimplify it.
>
> Regstrapped on x86_64-linux-gnu.  Ok to install?
>
>
> for  gcc/ChangeLog
>
> * doc/generic.texi (Cleanups): Document EH_ELSE.
> * except.c: Likewise.
> * expr.c (expand_expr_real_1): Reject it.
> * gimplify.c (gimplify_expr): Gimplify it, within
> TRY_FINALLY_EXPR.
> * tree-dump.c (dequeue_and_dump): Dump it.
> * tree-pretty-print.c (dump_generic_node): Likewise.
> * tree.c (block_may_fallthru): Handle it.
> * tree.def (EH_ELSE): Introduce.
> ---
>  gcc/doc/generic.texi|5 +
>  gcc/except.c|   12 ++--
>  gcc/expr.c  |1 +
>  gcc/gimplify.c  |   18 +-
>  gcc/tree-dump.c |1 +
>  gcc/tree-pretty-print.c |   11 +++
>  gcc/tree.c  |3 +++
>  gcc/tree.def|7 +++
>  8 files changed, 51 insertions(+), 7 deletions(-)
>
> diff --git a/gcc/doc/generic.texi b/gcc/doc/generic.texi
> index 67f7ad53af6b..1d0e3cfec1d6 100644
> --- a/gcc/doc/generic.texi
> +++ b/gcc/doc/generic.texi
> @@ -2180,6 +2180,11 @@ After the second sequence is executed, if it completes 
> normally by
>  falling off the end, execution continues wherever the first sequence
>  would have continued, by falling off the end, or doing a goto, etc.
>
> +If the second sequence is an @code{EH_ELSE} selector, then the sequence
> +in its first operand is used when the first sequence completes normally,
> +and that in its second operand is used for exceptional cleanups, i.e.,
> +when an exception propagates out of the first sequence.
> +
>  @code{TRY_FINALLY_EXPR} complicates the flow graph, since the cleanup
>  needs to appear on every edge out of the controlled block; this
>  reduces the freedom to move code across these edges.  Therefore, the
> diff --git a/gcc/except.c b/gcc/except.c
> index edaeeb4cfd1b..76cd099749f3 100644
> --- a/gcc/except.c
> +++ b/gcc/except.c
> @@ -27,14 +27,14 @@ along with GCC; see the file COPYING3.  If not see
> the compilation process:
>
> In the beginning, in the front end, we have the GENERIC trees
> -   TRY_CATCH_EXPR, TRY_FINALLY_EXPR, WITH_CLEANUP_EXPR,
> +   TRY_CATCH_EXPR, TRY_FINALLY_EXPR, EH_ELSE, WITH_CLEANUP_EXPR,
> CLEANUP_POINT_EXPR, CATCH_EXPR, and EH_FILTER_EXPR.
>
> -   During initial gimplification (gimplify.c) these are lowered
> -   to the GIMPLE_TRY, GIMPLE_CATCH, and GIMPLE_EH_FILTER nodes.
> -   The WITH_CLEANUP_EXPR and CLEANUP_POINT_EXPR nodes are converted
> -   into GIMPLE_TRY_FINALLY nodes; the others are a more direct 1-1
> -   conversion.
> +   During initial gimplification (gimplify.c) these are lowered to the
> +   GIMPLE_TRY, GIMPLE_CATCH, GIMPLE_EH_ELSE, and GIMPLE_EH_FILTER
> +   nodes.  The WITH_CLEANUP_EXPR and CLEANUP_POINT_EXPR nodes are
> +   converted into GIMPLE_TRY_FINALLY nodes; the others are a more
> +   direct 1-1 conversion.
>
> During pass_lower_eh (tree-eh.c) we record the nested structure
> of the TRY nodes in EH_REGION nodes in CFUN->EH->REGION_TREE.
> diff --git a/gcc/expr.c b/gcc/expr.c
> index c78bc74c0d9f..70fb721a9621 100644
> --- a/gcc/expr.c
> +++ b/gcc/expr.c
> @@ -11292,6 +11292,7 @@ expand_expr_real_1 (tree exp, rtx target, 
> machine_mode tmode,
>  case CATCH_EXPR:
>  case EH_FILTER_EXPR:
>  case TRY_FINALLY_EXPR:
> +case EH_ELSE:
>/* Lowered by tree-eh.c.  */
>gcc_unreachable ();
>
> diff --git a/gcc/gimplify.c b/gcc/gimplify.c
> index 0b25e7100cde..de4cb66e2b65 100644
> --- a/gcc/gimplify.c
> +++ b/gcc/gimplify.c
> @@ -13074,7 +13074,22 @@ gimplify_expr (tree *expr_p, gimple_seq *pre_p, 
> gimple_seq *post_p,
> input_location = UNKNOWN_LOCATION;
> eval = cleanup = NULL;
> gimplify_and_add (TREE_OPERAND (*expr_p, 0), &eval);
> -   gimplify_and_add (TREE_OPERAND (*expr_p, 1), &cleanup);
> +   if (TREE_CODE (*expr_p) == TRY_FINALLY_EXPR
> +   && TREE_CODE (TREE_OPERAND (*expr_p, 1)) == EH_ELSE)
> + {
> +   gimple_seq n = NULL, e = NULL;
> +   gimplify_and_add (TREE_OPERAND (TREE_OPERAND (*expr_p, 1),
> +   0), &n);
> +   gimplify_and_add (TREE_OPERAND (TREE_OPERAND (*expr_p, 1),
> +   1), &e);
> +   if (!gimple_seq_empty_p (n) && !gimple_seq_empty_p (e))
> + {
> +   geh_else *stmt = gimple_build_eh_else (n, e);
> +   gimple_seq_add_stmt (&cleanup, stmt);
> + 

Re: [PATCH] Fix ICE when __builtin_calloc has no LHS (PR tree-optimization/91014).

2019-06-27 Thread Richard Biener
On Thu, Jun 27, 2019 at 11:21 AM Martin Liška  wrote:
>
> Hi.
>
> This is quite an obvious changes I've noticed during fuzzing
> of s390x target compiler.
>
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>
> Ready to be installed?

OK.

Richard.

> Thanks,
> Martin
>
> gcc/ChangeLog:
>
> 2019-06-27  Martin Liska  
>
> PR tree-optimization/91014
> * tree-ssa-dse.c (initialize_ao_ref_for_dse): Bail out
> when LHS is NULL_TREE.
>
> gcc/testsuite/ChangeLog:
>
> 2019-06-27  Martin Liska  
>
> PR tree-optimization/91014
> * gcc.target/s390/pr91014.c: New test.
> ---
>  gcc/testsuite/gcc.target/s390/pr91014.c | 8 
>  gcc/tree-ssa-dse.c  | 5 +++--
>  2 files changed, 11 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/s390/pr91014.c
>
>


[PATCH] PR libstdc++/91012 fixfilesystem_error::what() string

2019-06-27 Thread Jonathan Wakely

When I refactored the filesystem_error code I changed it to only use the
constructor parameter in the what() string, instead of the string
returned by system_error::what(). That meant it no longer included the
description of the error_code that system_error adds. This restores the
previous behaivour, as encouraged by the standard ("Implementations
should include the system_error::what() string and the pathnames of
path1 and path2 in the native format in the returned string").

PR libstdc++/91012
* src/c++17/fs_path.cc (filesystem_error::_Impl): Use a string_view
for the what_arg parameters.
(filesystem_error::filesystem_error): Pass system_error::what() to
the _Impl constructor.
* testsuite/27_io/filesystem/filesystem_error/cons.cc: Ensure that
filesystem_error::what() contains system_error::what().

Tested x86_64-linux, committed to trunk. I'll backport to gcc-9-branch
too.

commit 026d1259cc4b761c4ab638ab2bcd251f880cbd32
Author: redi 
Date:   Thu Jun 27 09:42:39 2019 +

PR libstdc++/91012 fixfilesystem_error::what() string

When I refactored the filesystem_error code I changed it to only use the
constructor parameter in the what() string, instead of the string
returned by system_error::what(). That meant it no longer included the
description of the error_code that system_error adds. This restores the
previous behaivour, as encouraged by the standard ("Implementations
should include the system_error::what() string and the pathnames of
path1 and path2 in the native format in the returned string").

PR libstdc++/91012
* src/c++17/fs_path.cc (filesystem_error::_Impl): Use a string_view
for the what_arg parameters.
(filesystem_error::filesystem_error): Pass system_error::what() to
the _Impl constructor.
* testsuite/27_io/filesystem/filesystem_error/cons.cc: Ensure that
filesystem_error::what() contains system_error::what().

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@272739 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/libstdc++-v3/src/c++17/fs_path.cc 
b/libstdc++-v3/src/c++17/fs_path.cc
index 82ac736f82a..14842452354 100644
--- a/libstdc++-v3/src/c++17/fs_path.cc
+++ b/libstdc++-v3/src/c++17/fs_path.cc
@@ -1928,20 +1928,20 @@ fs::hash_value(const path& p) noexcept
 
 struct fs::filesystem_error::_Impl
 {
-  _Impl(const string& what_arg, const path& p1, const path& p2)
+  _Impl(string_view what_arg, const path& p1, const path& p2)
   : path1(p1), path2(p2), what(make_what(what_arg, &p1, &p2))
   { }
 
-  _Impl(const string& what_arg, const path& p1)
+  _Impl(string_view what_arg, const path& p1)
   : path1(p1), path2(), what(make_what(what_arg, &p1, nullptr))
   { }
 
-  _Impl(const string& what_arg)
+  _Impl(string_view what_arg)
   : what(make_what(what_arg, nullptr, nullptr))
   { }
 
   static std::string
-  make_what(const std::string& s, const path* p1, const path* p2)
+  make_what(string_view s, const path* p1, const path* p2)
   {
 const std::string pstr1 = p1 ? p1->u8string() : std::string{};
 const std::string pstr2 = p2 ? p2->u8string() : std::string{};
@@ -1977,20 +1977,20 @@ template class std::__shared_ptr;
 fs::filesystem_error::
 filesystem_error(const string& what_arg, error_code ec)
 : system_error(ec, what_arg),
-  _M_impl(std::__make_shared<_Impl>(what_arg))
+  _M_impl(std::__make_shared<_Impl>(system_error::what()))
 { }
 
 fs::filesystem_error::
 filesystem_error(const string& what_arg, const path& p1, error_code ec)
 : system_error(ec, what_arg),
-  _M_impl(std::__make_shared<_Impl>(what_arg, p1))
+  _M_impl(std::__make_shared<_Impl>(system_error::what(), p1))
 { }
 
 fs::filesystem_error::
 filesystem_error(const string& what_arg, const path& p1, const path& p2,
 error_code ec)
 : system_error(ec, what_arg),
-  _M_impl(std::__make_shared<_Impl>(what_arg, p1, p2))
+  _M_impl(std::__make_shared<_Impl>(system_error::what(), p1, p2))
 { }
 
 fs::filesystem_error::~filesystem_error() = default;
diff --git a/libstdc++-v3/testsuite/27_io/filesystem/filesystem_error/cons.cc 
b/libstdc++-v3/testsuite/27_io/filesystem/filesystem_error/cons.cc
index 8b24541cbc6..1644d355339 100644
--- a/libstdc++-v3/testsuite/27_io/filesystem/filesystem_error/cons.cc
+++ b/libstdc++-v3/testsuite/27_io/filesystem/filesystem_error/cons.cc
@@ -36,9 +36,10 @@ test01()
   const std::error_code ec = make_error_code(std::errc::is_a_directory);
   const std::filesystem::path p1 = "test/path/one";
   const std::filesystem::path p2 = "/test/path/two";
+  std::system_error syserr(ec, str);
 
   const filesystem_error e1(str, ec);
-  VERIFY( contains(e1.what(), str) );
+  VERIFY( contains(e1.what(), syserr.what()) );
   VERIFY( !contains(e1.what(), "[]") ); // no "empty path" in the string
   VERIFY( e1.path1().empty() );
   VERIFY( e1.path2().empty() );
@@ -47,7 +48,7 @@ test01()
   const filesystem

Re: Use ODR for canonical types construction in LTO

2019-06-27 Thread Jan Hubicka
Hi,
this is updated patch for ODR based canonical type calculation.  It makes use
of TYPE_CXX_ODR_P instead of doing the guesswork and while thinking how to get
rid of the quadratic behaviour of the hash I noticed that all the logic fits
quite naturally into gimple_register_canonical_type_1.

We only need to arrange that all non-ODR types gets to hash before we has
TYPE_CXX_ODR_P and since hash functions recursively populate the type hash
we know that subtypes with linkage are processed before types.

>From the WPA stats we now get relatively few non-ODR types building cc1plus
[WPA] Compared 1062055 SCCs, 890518 collisions (0.838486)
[WPA] Merged 1059104 SCCs
[WPA] Merged 6025225 tree bodies
[WPA] Merged 639655 types
[WPA] 101515 types prevailed (177264 associated trees)
[WPA] GIMPLE canonical type table: size 16381, 262 elements, 5685 searches, 48 
collisions (ratio: 0.008443)
[WPA] GIMPLE canonical type pointer-map: 3548 elements, 14261 searches

So 262 distinct types compared to 1590 which is an effect of Jason's patch.
There are 3286 ODR types having type canonical set by name and 910 have
conflict to non-ODR type.

I am still waiting for the statistics of alias oracle, but for tramp3d there
are no significant changes compared to the first patch.

lto-bootstrapped/regtested x86_64-linux, OK?

* class.c (layout_class_type): Set TYPE_CXX_ODR_P for as-base
type copy.

* ipa-devirt.c (odr_type_d): Add tbaa_enabled flag.
(add_type_duplicate): When odr hash is not allocated, to nothing.
(odr_based_tbaa_p): New function.
(set_type_canonical_for_odr_type): New function.
* ipa-utils.h (enable_odr_based_tbaa, odr_based_tbaa_p,
set_type_canonical_for_odr_type): New.
* lto-common.c: Include demangle.h and tree-pretty-print.h
(type_streaming_finished): New static var.
(gimple_register_canonical_type_1): Return updated hash; handle ODR
types.
(iterative_hash_canonical_type): Update use of
gimple_register_canonical_type_1.
* tree.c (gimple_canonical_types_compatible_p): ODR types with
ODR based TBAA are not equivalent to non-ODR types.

* g++.dg/lto/alias-2_0.C: New testcase.
* g++.dg/lto/alias-2_1.C: New testcase.
Index: cp/class.c
===
--- cp/class.c  (revision 272656)
+++ cp/class.c  (working copy)
@@ -6395,6 +6395,7 @@ layout_class_type (tree t, tree *virtual
   SET_TYPE_ALIGN (base_t, rli->record_align);
   TYPE_USER_ALIGN (base_t) = TYPE_USER_ALIGN (t);
   TYPE_TYPELESS_STORAGE (base_t) = TYPE_TYPELESS_STORAGE (t);
+  TYPE_CXX_ODR_P (base_t) = true;
 
   /* Copy the non-static data members of T. This will include its
 direct non-virtual bases & vtable.  */
Index: ipa-devirt.c
===
--- ipa-devirt.c(revision 272656)
+++ ipa-devirt.c(working copy)
@@ -213,6 +213,8 @@ struct GTY(()) odr_type_d
   bool odr_violated;
   /* Set when virtual table without RTTI previaled table with.  */
   bool rtti_broken;
+  /* Set when the canonical type is determined using the type name.  */
+  bool tbaa_enabled;
 };
 
 /* Return TRUE if all derived types of T are known and thus
@@ -1610,6 +1612,9 @@ add_type_duplicate (odr_type val, tree t
 
   val->types_set->add (type);
 
+  if (!odr_hash)
+return NULL;
+
   gcc_checking_assert (can_be_name_hashed_p (type)
   && can_be_name_hashed_p (val->type));
 
@@ -1989,6 +1994,46 @@ prevailing_odr_type (tree type)
   return t->type;
 }
 
+/* Set tbaa_enabled flag for TYPE.  */
+
+void
+enable_odr_based_tbaa (tree type)
+{
+  odr_type t = get_odr_type (type, true);
+  t->tbaa_enabled = true;
+}
+
+/* True if canonical type of TYPE is determined using ODR name.  */
+
+bool
+odr_based_tbaa_p (const_tree type)
+{
+  if (!RECORD_OR_UNION_TYPE_P (type))
+return false;
+  odr_type t = get_odr_type (const_cast  (type), false);
+  if (!t || !t->tbaa_enabled)
+return false;
+  return true;
+}
+
+/* Set TYPE_CANONICAL of type and all its variants and duplicates
+   to CANONICAL.  */
+
+void
+set_type_canonical_for_odr_type (tree type, tree canonical)
+{
+  odr_type t = get_odr_type (type, false);
+  unsigned int i;
+  tree tt;
+
+  for (tree t2 = t->type; t2; t2 = TYPE_NEXT_VARIANT (t2))
+TYPE_CANONICAL (t2) = canonical;
+  if (t->types)
+FOR_EACH_VEC_ELT (*t->types, i, tt)
+  for (tree t2 = tt; t2; t2 = TYPE_NEXT_VARIANT (t2))
+TYPE_CANONICAL (t2) = canonical;
+}
+
 /* Return true if we reported some ODR violation on TYPE.  */
 
 bool
Index: ipa-utils.h
===
--- ipa-utils.h (revision 272656)
+++ ipa-utils.h (working copy)
@@ -93,6 +93,9 @@ bool odr_or_derived_type_p (const_tree t
 bool odr_types_equivalent_p (tree type1, tree type2);
 bool odr_type_violation_reported_p (tree type)

Re: [PATCH] Remove quite obvious dead assignments.

2019-06-27 Thread Richard Sandiford
Martin Liška  writes:
> On 6/26/19 1:39 PM, Jakub Jelinek wrote:
>> On Wed, Jun 26, 2019 at 12:57:15PM +0200, Martin Liška wrote:
>>> --- a/gcc/tree-vect-stmts.c
>>> +++ b/gcc/tree-vect-stmts.c
>>> @@ -3483,8 +3483,7 @@ vectorizable_call (stmt_vec_info stmt_info, 
>>> gimple_stmt_iterator *gsi,
>>> = gimple_build_call_internal_vec (ifn, vargs);
>>>   gimple_call_set_lhs (call, half_res);
>>>   gimple_call_set_nothrow (call, true);
>>> - new_stmt_info
>>> -   = vect_finish_stmt_generation (stmt_info, call, gsi);
>>> + vect_finish_stmt_generation (stmt_info, call, gsi);
>>>   if ((i & 1) == 0)
>>> {
>>>   prev_res = half_res;
>>> @@ -3583,8 +3582,7 @@ vectorizable_call (stmt_vec_info stmt_info, 
>>> gimple_stmt_iterator *gsi,
>>>   gcall *call = gimple_build_call_internal_vec (ifn, vargs);
>>>   gimple_call_set_lhs (call, half_res);
>>>   gimple_call_set_nothrow (call, true);
>>> - new_stmt_info
>>> -   = vect_finish_stmt_generation (stmt_info, call, gsi);
>>> + vect_finish_stmt_generation (stmt_info, call, gsi);
>>>   if ((j & 1) == 0)
>>> {
>>>   prev_res = half_res;
>> 
>> This looks wrong.
>> There should have been
>> SLP_TREE_VEC_STMTS (slp_node).quick_push (new_stmt_info);
>> in between for slp_node, or the usual code like:
>>   if (cond_for_first)
>> STMT_VINFO_VEC_STMT (stmt_info) = *vec_stmt = new_stmt_info;
>>   else
>> STMT_VINFO_RELATED_STMT (prev_stmt_info) = new_stmt_info;
>>prev_stmt_info = new_stmt_info;
>> otherwise.  In any case, I think this should be dealt with separately.
>
> Likewise here:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=91017

I think this part of the patch is OK.  SLP_TREE_VEC_STMTS and
STMT_VINFO_VEC_STMT should only receive the final stmt for each vector
result, whereas these stmts are producing intermediate results.

The assignments were introduced as part of a mechanical patch to avoid
using vinfo_for_stmt to get stmt_infos that we'd just created.  I'd
missed that in these cases we didn't actually use vinfo_for_stmt.

Off-hand, I'm not sure whether we actually need to create the
stmt_info here as things stand, or whether we're really just using
vect_finish_stmt_generation to insert the stmt in the right way.
I did have some WIP patches that would need the stmt_info to be
created though.

Thanks,
Richard


Re: [PATCH] Improve avx_vec_concat (PR target/90991)

2019-06-27 Thread Uros Bizjak
On Thu, Jun 27, 2019 at 8:17 AM Jakub Jelinek  wrote:
>
> Hi!
>
> In the last two alternatives of avx_vec_concat, we can allow memory
> source, which optimizes the following testcases from weird
> vmovaps (%rdi), %xmm0
> vmovaps %xmm0, %xmm0
> and similar to just the first instruction.  I went through all the
> gen_avx_vec_concat* users and all of them ensure the middle operand is a
> register by force_reg or constraint, rather than by testing the predicate
> of the instruction, so I believe the additional condition on the instruction
> is fine and we don't need some expander to fix up the case when middle
> operand is a MEM and last operand is not zero.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>
> 2019-06-27  Jakub Jelinek  
>
> PR target/90991
> * config/i386/sse.md (avx_vec_concat): Use nonimmediate_operand
> instead of register_operand for operands[1], add m to its constraints
> if operands[2] uses "C" constraint.  Ensure in condition that if
> operands[2] is not 0, then operands[1] is not a MEM.  For last two
> alternatives, use unaligned loads instead of aligned if operands[1] is
> misaligned_operand.
>
> * gcc.target/i386/avx2-pr90991-1.c: New test.
> * gcc.target/i386/avx512dq-pr90991-2.c: New test.

OK with a nit below.

Thanks,
Uros.

>
> --- gcc/config/i386/sse.md.jj   2019-06-26 09:22:40.506567515 +0200
> +++ gcc/config/i386/sse.md  2019-06-26 09:59:15.271571330 +0200
> @@ -20743,9 +20743,10 @@ (define_insn "_
>  (define_insn "avx_vec_concat"
>[(set (match_operand:V_256_512 0 "register_operand" "=x,v,x,Yv")
> (vec_concat:V_256_512
> - (match_operand: 1 "register_operand" "x,v,x,v")
> + (match_operand: 1 "nonimmediate_operand" 
> "x,v,xm,vm")
>   (match_operand: 2 "nonimm_or_0_operand" 
> "xm,vm,C,C")))]
> -  "TARGET_AVX"
> +  "TARGET_AVX && (operands[2] == CONST0_RTX (mode)
> + || !MEM_P (operands[1]))"

Please put "&& (operands[2] ..." in a separate line.

>  {
>switch (which_alternative)
>  {
> @@ -20771,27 +20772,63 @@ (define_insn "avx_vec_concat"
>switch (get_attr_mode (insn))
> {
> case MODE_V16SF:
> - return "vmovaps\t{%1, %t0|%t0, %1}";
> + if (misaligned_operand (operands[1], mode))
> +   return "vmovups\t{%1, %t0|%t0, %1}";
> + else
> +   return "vmovaps\t{%1, %t0|%t0, %1}";
> case MODE_V8DF:
> - return "vmovapd\t{%1, %t0|%t0, %1}";
> + if (misaligned_operand (operands[1], mode))
> +   return "vmovupd\t{%1, %t0|%t0, %1}";
> + else
> +   return "vmovapd\t{%1, %t0|%t0, %1}";
> case MODE_V8SF:
> - return "vmovaps\t{%1, %x0|%x0, %1}";
> + if (misaligned_operand (operands[1], mode))
> +   return "vmovups\t{%1, %x0|%x0, %1}";
> + else
> +   return "vmovaps\t{%1, %x0|%x0, %1}";
> case MODE_V4DF:
> - return "vmovapd\t{%1, %x0|%x0, %1}";
> + if (misaligned_operand (operands[1], mode))
> +   return "vmovupd\t{%1, %x0|%x0, %1}";
> + else
> +   return "vmovapd\t{%1, %x0|%x0, %1}";
> case MODE_XI:
> - if (which_alternative == 2)
> -   return "vmovdqa\t{%1, %t0|%t0, %1}";
> - else if (GET_MODE_SIZE (mode) == 8)
> -   return "vmovdqa64\t{%1, %t0|%t0, %1}";
> + if (misaligned_operand (operands[1], mode))
> +   {
> + if (which_alternative == 2)
> +   return "vmovdqu\t{%1, %t0|%t0, %1}";
> + else if (GET_MODE_SIZE (mode) == 8)
> +   return "vmovdqu64\t{%1, %t0|%t0, %1}";
> + else
> +   return "vmovdqu32\t{%1, %t0|%t0, %1}";
> +   }
>   else
> -   return "vmovdqa32\t{%1, %t0|%t0, %1}";
> +   {
> + if (which_alternative == 2)
> +   return "vmovdqa\t{%1, %t0|%t0, %1}";
> + else if (GET_MODE_SIZE (mode) == 8)
> +   return "vmovdqa64\t{%1, %t0|%t0, %1}";
> + else
> +   return "vmovdqa32\t{%1, %t0|%t0, %1}";
> +   }
> case MODE_OI:
> - if (which_alternative == 2)
> -   return "vmovdqa\t{%1, %x0|%x0, %1}";
> - else if (GET_MODE_SIZE (mode) == 8)
> -   return "vmovdqa64\t{%1, %x0|%x0, %1}";
> + if (misaligned_operand (operands[1], mode))
> +   {
> + if (which_alternative == 2)
> +   return "vmovdqu\t{%1, %x0|%x0, %1}";
> + else if (GET_MODE_SIZE (mode) == 8)
> +   return "vmovdqu64\t{%1, %x0|%x0, %1}";
> + else
> +   return "vmovdqu32\t{%1, %x0|%x0, %1}";
> +   }
>   else
> -   return "vmovdqa32\t{%1, %x0|%x0, %1}";
> +   {
> + if (which_alternative == 2)
> +   return "vmovdqa\t{%1, %x0|%x0, %1}";
> + 

[PATCH] x86: mark "k" and "Yk" constraints as non-internal

2019-06-27 Thread Jan Beulich
Without these constraints asm() can't make use of mask registers.

gcc/
2019-06-27  Jan Beulich  

* config/i386/constraints.md: Remove @internal from "k" and
"Yk".

--- a/gcc/config/i386/constraints.md
+++ b/gcc/config/i386/constraints.md
@@ -79,10 +79,10 @@
  "Second from top of 80387 floating-point stack (@code{%st(1)}).")
 
 (define_register_constraint "Yk" "TARGET_AVX512F ? MASK_REGS : NO_REGS"
-"@internal Any mask register that can be used as predicate, i.e. k1-k7.")
+"Any mask register that can be used as predicate, i.e. k1-k7.")
 
 (define_register_constraint "k" "TARGET_AVX512F ? ALL_MASK_REGS : NO_REGS"
-"@internal Any mask register.")
+"Any mask register.")
 
 ;; Vector registers (also used for plain floating point nowadays).
 (define_register_constraint "y" "TARGET_MMX ? MMX_REGS : NO_REGS"




Re: [PATCH] x86: mark "k" and "Yk" constraints as non-internal

2019-06-27 Thread Jan Beulich
>>> On 27.06.19 at 12:11,  wrote:
> Without these constraints asm() can't make use of mask registers.

Similarly it is entirely unclear to me how to use e.g. v4fmaddps or
vp2intersectd in asm(): For the former the respective "Yh"
constraint was dropped (oddly enough leaving its comment line in
place), and there's nothing I can spot for a mask register pair.

Using (in an attempt to find alternative options)

typedef float __attribute__((vector_size(256))) v64sf_t;

results in "impossible constraint" when using an operand of this
type in asm(), while

typedef unsigned __attribute__((mode(P2HI))) kpair_t;

gives "no data type for mode" even without any actual use.

Jan




Re: [RFA][tree-optimization/90883] Improve DSE to handle redundant calls

2019-06-27 Thread Christophe Lyon
On Thu, 27 Jun 2019 at 04:23, Jeff Law  wrote:
>
> On 6/26/19 7:14 PM, Bill Schmidt wrote:
> > Looks like this patch breaks bootstrap.
> >
> > /home3/wschmidt/gcc/gcc-mainline-base/gcc/tree-ssa-dse.c: In function
> > 'void dse\
> > _optimize_redundant_stores(gimple*)':
> > /home3/wschmidt/gcc/gcc-mainline-base/gcc/tree-ssa-dse.c:649:46: error:
> > ISO C++\
> >  forbids converting a string constant to 'char*' [-Werror=write-strings]
> >   649 |   delete_dead_or_redundant_assignment (&gsi, "redundant");
> >   |  ^~~
> > /home3/wschmidt/gcc/gcc-mainline-base/gcc/tree-ssa-dse.c:651:40: error:
> > ISO C++\
> >  forbids converting a string constant to 'char*' [-Werror=write-strings]
> >   651 |   delete_dead_or_redundant_call (&gsi, "redundant");
> >   |^~~
> > /home3/wschmidt/gcc/gcc-mainline-base/gcc/tree-ssa-dse.c: In member
> > function 'v\
> > oid dse_dom_walker::dse_optimize_stmt(gimple_stmt_iterator*)':
> > /home3/wschmidt/gcc/gcc-mainline-base/gcc/tree-ssa-dse.c:979:41: error:
> > ISO C++\
> >  forbids converting a string constant to 'char*' [-Werror=write-strings]
> >   979 | delete_dead_or_redundant_call (gsi, "dead");
> >   | ^~
> > /home3/wschmidt/gcc/gcc-mainline-base/gcc/tree-ssa-dse.c:1007:39: error:
> > ISO C+\
> > + forbids converting a string constant to 'char*' [-Werror=write-strings]
> >  1007 |   delete_dead_or_redundant_call (gsi, "dead");
> >   |   ^~
> > /home3/wschmidt/gcc/gcc-mainline-base/gcc/tree-ssa-dse.c:1066:49: error:
> > ISO C+\
> > + forbids converting a string constant to 'char*' [-Werror=write-strings]
> >  1066 |   delete_dead_or_redundant_assignment (gsi, "dead");
> Strange as I know I've bootstrapped and tested.  I'll take care of it
>
> Thanks,
> jeff


Hi Jeff,

I've also noticed that
FAIL: g++.dg/tree-ssa/pr90883.C   scan-tree-dump dse1 "Deleted
redundant store: .*.a = {}"
on aarch64

Christophe


Re: [PATCH] x86: fix/improve vgf2p8affine*qb insns

2019-06-27 Thread Uros Bizjak
On Thu, Jun 27, 2019 at 10:57 AM Jan Beulich  wrote:
>
> - the affine transformations are not commutative (the two source
>   operands have entirely different meaning)
> - there's no need for three alternatives
> - the nonimmediate_operand/Bm combination can better be vector_operand/m
>
> gcc/
> 2019-06-27  Jan Beulich  
>
> * config/i386/sse.md (vgf2p8affineinvqb_,
> vgf2p8affineqb_): Drop % constraint modifier.
> Eliminate redundant alternative.  Use vector_operand plus "m"
> constraint.

Please just drop % modifier and use vector_operand here. IIRC,
register allocator operates on constraints, it doesn't care for
predicates. But predicates shouldn't be more constrained than
constraints. So, having "m" instead of "Bm" is a bad idea with
vector_operand.

Uros.
>
> gcc/testsuite/
> 2019-06-27  Jan Beulich  
>
> * gcc.target/i386/gfni-5.c: New.
>
> ---
> On top of this (in further patches) it looks like the Bm constraint could be
> dropped altogether. At the very least its combination with vector_operand is
> pointless, because both resolve to the same vector_memory_operand. Same goes
> for round{,_saeonly}_nimm_predicate, which too resolves to vector_operand.
>
> --- a/gcc/config/i386/sse.md
> +++ b/gcc/config/i386/sse.md
> @@ -22072,56 +22072,53 @@
>"vpopcnt\t{%1, %0|%0, %1}")
>
>  (define_insn "vgf2p8affineinvqb_"
> -  [(set (match_operand:VI1_AVX512F 0 "register_operand" "=x,x,v")
> +  [(set (match_operand:VI1_AVX512F 0 "register_operand" "=x,v")
> (unspec:VI1_AVX512F
> - [(match_operand:VI1_AVX512F 1 "register_operand" "%0,x,v")
> -  (match_operand:VI1_AVX512F 2 "nonimmediate_operand" "xBm,xm,vm")
> -  (match_operand:QI 3 "const_0_to_255_operand" "n,n,n")]
> + [(match_operand:VI1_AVX512F 1 "register_operand" "0,v")
> +  (match_operand:VI1_AVX512F 2 "vector_operand" "xm,vm")
> +  (match_operand:QI 3 "const_0_to_255_operand" "n,n")]
>   UNSPEC_GF2P8AFFINEINV))]
>"TARGET_GFNI"
>"@
> gf2p8affineinvqb\t{%3, %2, %0| %0, %2, %3}
> -   vgf2p8affineinvqb\t{%3, %2, %1, %0| %0, %1, 
> %2, %3}
> vgf2p8affineinvqb\t{%3, %2, %1, %0| %0, %1, 
> %2, %3}"
> -  [(set_attr "isa" "noavx,avx,avx512f")
> -   (set_attr "prefix_data16" "1,*,*")
> +  [(set_attr "isa" "noavx,avx")
> +   (set_attr "prefix_data16" "1,*")
> (set_attr "prefix_extra" "1")
> -   (set_attr "prefix" "orig,maybe_evex,evex")
> +   (set_attr "prefix" "orig,maybe_evex")
> (set_attr "mode" "")])
>
>  (define_insn "vgf2p8affineqb_"
> -  [(set (match_operand:VI1_AVX512F 0 "register_operand" "=x,x,v")
> +  [(set (match_operand:VI1_AVX512F 0 "register_operand" "=x,v")
> (unspec:VI1_AVX512F
> - [(match_operand:VI1_AVX512F 1 "register_operand" "%0,x,v")
> -  (match_operand:VI1_AVX512F 2 "nonimmediate_operand" "xBm,xm,vm")
> -  (match_operand:QI 3 "const_0_to_255_operand" "n,n,n")]
> + [(match_operand:VI1_AVX512F 1 "register_operand" "0,v")
> +  (match_operand:VI1_AVX512F 2 "vector_operand" "xm,vm")
> +  (match_operand:QI 3 "const_0_to_255_operand" "n,n")]
>   UNSPEC_GF2P8AFFINE))]
>"TARGET_GFNI"
>"@
> gf2p8affineqb\t{%3, %2, %0| %0, %2, %3}
> -   vgf2p8affineqb\t{%3, %2, %1, %0| %0, %1, 
> %2, %3}
> vgf2p8affineqb\t{%3, %2, %1, %0| %0, %1, 
> %2, %3}"
> -  [(set_attr "isa" "noavx,avx,avx512f")
> -   (set_attr "prefix_data16" "1,*,*")
> +  [(set_attr "isa" "noavx,avx")
> +   (set_attr "prefix_data16" "1,*")
> (set_attr "prefix_extra" "1")
> -   (set_attr "prefix" "orig,maybe_evex,evex")
> +   (set_attr "prefix" "orig,maybe_evex")
> (set_attr "mode" "")])
>
>  (define_insn "vgf2p8mulb_"
> -  [(set (match_operand:VI1_AVX512F 0 "register_operand" "=x,x,v")
> +  [(set (match_operand:VI1_AVX512F 0 "register_operand" "=x,v")
> (unspec:VI1_AVX512F
> - [(match_operand:VI1_AVX512F 1 "register_operand" "%0,x,v")
> -  (match_operand:VI1_AVX512F 2 "nonimmediate_operand" "xBm,xm,vm")]
> + [(match_operand:VI1_AVX512F 1 "register_operand" "%0,v")
> +  (match_operand:VI1_AVX512F 2 "vector_operand" "xm,vm")]
>   UNSPEC_GF2P8MUL))]
>"TARGET_GFNI"
>"@
> gf2p8mulb\t{%2, %0| %0, %2}
> -   vgf2p8mulb\t{%2, %1, %0| %0, %1, %2}
> vgf2p8mulb\t{%2, %1, %0| %0, %1, %2}"
> -  [(set_attr "isa" "noavx,avx,avx512f")
> -   (set_attr "prefix_data16" "1,*,*")
> +  [(set_attr "isa" "noavx,avx")
> +   (set_attr "prefix_data16" "1,*")
> (set_attr "prefix_extra" "1")
> -   (set_attr "prefix" "orig,maybe_evex,evex")
> +   (set_attr "prefix" "orig,maybe_evex")
> (set_attr "mode" "")])
>
>  (define_insn "vpshrd_"
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/i386/gfni-5.c
> @@ -0,0 +1,19 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -msse2 -mgfni" } */
> +
> +typedef char __attribute__((vector_size(16))) v16qi_t;
> +
> +v16qi_t test16a (v16qi_t x, v16qi_t a)
> +{
> +  asm volatile ("" : "+m" (a));
> +  retur

Re: Use ODR for canonical types construction in LTO

2019-06-27 Thread Richard Biener
On Thu, 27 Jun 2019, Jan Hubicka wrote:

> Hi,
> this is updated patch for ODR based canonical type calculation.  It makes use
> of TYPE_CXX_ODR_P instead of doing the guesswork and while thinking how to get
> rid of the quadratic behaviour of the hash I noticed that all the logic fits
> quite naturally into gimple_register_canonical_type_1.
> 
> We only need to arrange that all non-ODR types gets to hash before we has
> TYPE_CXX_ODR_P and since hash functions recursively populate the type hash
> we know that subtypes with linkage are processed before types.
> 
> From the WPA stats we now get relatively few non-ODR types building cc1plus
> [WPA] Compared 1062055 SCCs, 890518 collisions (0.838486)
> [WPA] Merged 1059104 SCCs
> [WPA] Merged 6025225 tree bodies
> [WPA] Merged 639655 types
> [WPA] 101515 types prevailed (177264 associated trees)
> [WPA] GIMPLE canonical type table: size 16381, 262 elements, 5685 searches, 
> 48 collisions (ratio: 0.008443)
> [WPA] GIMPLE canonical type pointer-map: 3548 elements, 14261 searches
> 
> So 262 distinct types compared to 1590 which is an effect of Jason's patch.
> There are 3286 ODR types having type canonical set by name and 910 have
> conflict to non-ODR type.
> 
> I am still waiting for the statistics of alias oracle, but for tramp3d there
> are no significant changes compared to the first patch.
> 
> lto-bootstrapped/regtested x86_64-linux, OK?
> 
>   * class.c (layout_class_type): Set TYPE_CXX_ODR_P for as-base
>   type copy.
> 
>   * ipa-devirt.c (odr_type_d): Add tbaa_enabled flag.
>   (add_type_duplicate): When odr hash is not allocated, to nothing.
>   (odr_based_tbaa_p): New function.
>   (set_type_canonical_for_odr_type): New function.
>   * ipa-utils.h (enable_odr_based_tbaa, odr_based_tbaa_p,
>   set_type_canonical_for_odr_type): New.
>   * lto-common.c: Include demangle.h and tree-pretty-print.h
>   (type_streaming_finished): New static var.
>   (gimple_register_canonical_type_1): Return updated hash; handle ODR
>   types.
>   (iterative_hash_canonical_type): Update use of
>   gimple_register_canonical_type_1.
>   * tree.c (gimple_canonical_types_compatible_p): ODR types with
>   ODR based TBAA are not equivalent to non-ODR types.
> 
>   * g++.dg/lto/alias-2_0.C: New testcase.
>   * g++.dg/lto/alias-2_1.C: New testcase.
> Index: cp/class.c
> ===
> --- cp/class.c(revision 272656)
> +++ cp/class.c(working copy)
> @@ -6395,6 +6395,7 @@ layout_class_type (tree t, tree *virtual
>SET_TYPE_ALIGN (base_t, rli->record_align);
>TYPE_USER_ALIGN (base_t) = TYPE_USER_ALIGN (t);
>TYPE_TYPELESS_STORAGE (base_t) = TYPE_TYPELESS_STORAGE (t);
> +  TYPE_CXX_ODR_P (base_t) = true;

 = TYPE_CXX_ODR_P (t);

would be much more obvious here.

>/* Copy the non-static data members of T. This will include its
>direct non-virtual bases & vtable.  */
> Index: ipa-devirt.c
> ===
> --- ipa-devirt.c  (revision 272656)
> +++ ipa-devirt.c  (working copy)
> @@ -213,6 +213,8 @@ struct GTY(()) odr_type_d
>bool odr_violated;
>/* Set when virtual table without RTTI previaled table with.  */
>bool rtti_broken;
> +  /* Set when the canonical type is determined using the type name.  */
> +  bool tbaa_enabled;
>  };
>  
>  /* Return TRUE if all derived types of T are known and thus
> @@ -1610,6 +1612,9 @@ add_type_duplicate (odr_type val, tree t
>  
>val->types_set->add (type);
>  
> +  if (!odr_hash)
> +return NULL;
> +
>gcc_checking_assert (can_be_name_hashed_p (type)
>  && can_be_name_hashed_p (val->type));
>  
> @@ -1989,6 +1994,46 @@ prevailing_odr_type (tree type)
>return t->type;
>  }
>  
> +/* Set tbaa_enabled flag for TYPE.  */
> +
> +void
> +enable_odr_based_tbaa (tree type)
> +{
> +  odr_type t = get_odr_type (type, true);
> +  t->tbaa_enabled = true;
> +}
> +
> +/* True if canonical type of TYPE is determined using ODR name.  */
> +
> +bool
> +odr_based_tbaa_p (const_tree type)
> +{
> +  if (!RECORD_OR_UNION_TYPE_P (type))
> +return false;
> +  odr_type t = get_odr_type (const_cast  (type), false);
> +  if (!t || !t->tbaa_enabled)
> +return false;
> +  return true;
> +}
> +
> +/* Set TYPE_CANONICAL of type and all its variants and duplicates
> +   to CANONICAL.  */
> +
> +void
> +set_type_canonical_for_odr_type (tree type, tree canonical)
> +{
> +  odr_type t = get_odr_type (type, false);
> +  unsigned int i;
> +  tree tt;
> +
> +  for (tree t2 = t->type; t2; t2 = TYPE_NEXT_VARIANT (t2))
> +TYPE_CANONICAL (t2) = canonical;
> +  if (t->types)
> +FOR_EACH_VEC_ELT (*t->types, i, tt)
> +  for (tree t2 = tt; t2; t2 = TYPE_NEXT_VARIANT (t2))
> +TYPE_CANONICAL (t2) = canonical;
> +}
> +
>  /* Return true if we reported some ODR violation on TYPE.  */

Re: Use ODR for canonical types construction in LTO

2019-06-27 Thread Jan Hubicka
> > +  if (RECORD_OR_UNION_TYPE_P (t)
> > +  && odr_type_p (t) && !odr_type_violation_reported_p (t))
> > +{
> > +  /* Here we rely on fact that all non-ODR types was inserted into
> > +canonical type hash and thus we can safely detect conflicts between
> > +ODR types and interoperable non-ODR types.  */
> > +  gcc_checking_assert (type_streaming_finished
> > +  && TYPE_MAIN_VARIANT (t) == t);
> > +  slot = htab_find_slot_with_hash (gimple_canonical_types, t, hash,
> > +  NO_INSERT);
> > +  if (slot && !TYPE_CXX_ODR_P (*(tree *)slot))
> > +   {
> > + tree nonodr = *(tree *)slot;
> > + if (symtab->dump_file)
> > +   {
> > + char *name = cplus_demangle_v3
> > +(IDENTIFIER_POINTER
> > +  (DECL_ASSEMBLER_NAME (TYPE_NAME (t))), 0);
> 
> Ugh - do we really want to demangle here?  I think not.  Surely
> not without -details or with -slim.  Readers can c++filt this
> easily.

OK, i can dump mangled name. 
Just print_generic_expr prints all instantiations of the tempalte same
way which makes it look odd that there are so many types of same name.
> > + /* Set canonical for T and all other ODR equivalent duplicates
> > +including incomplete structures.  */
> > + set_type_canonical_for_odr_type (t, prevail);
> > + enable_odr_based_tbaa (t);
> 
> I suppose there never will be a set of ODR types with the same
> prevailing type but some of them having a conflict with a nonodr
> type and some not?

No, we check them by structural equality while verifying ODR violations
that is more strict than gimple_canonical_types_compatible.
> 
> > + if (!type_in_anonymous_namespace_p (t))
> > +   hash = htab_hash_string (IDENTIFIER_POINTER
> > +  (DECL_ASSEMBLER_NAME
> > +  (TYPE_NAME (prevail;
> > + else
> > +   hash = TYPE_UID (TYPE_MAIN_VARIANT (t));
> > + num_canonical_type_hash_entries++;
> > + bool existed_p = canonical_type_hash_cache->put (prevail, hash);
> 
> but those hash differently?  I think you wanted to put t, not prevail
> here.  And you want to use the TYPE_UID of prevail as well?

Actually this was intended to be prevail in both cases, but it is
harmless.  The difference here is that anonymous types have
DECL_ASSEMBLED_NAME , so if we inserted them to the hash
table based on names they will all conflict.

Anonymous namespace types never have duplicated main variants, so
t==prevail here, but I will update the patch to use prevail since it is
much more obvious (I am also not sure what I was thinking of when
I typed TYPE_MAIN_VARIANT)
> 
> Otherwise looks good.
> 
> You can commit the C++ FE change with the adjustment in case it
> fixes the reported verification ICEs.

It only fixes ICE WRT the sanity checking that all non-ODR types
are inserted first.  W/o that change the as-base type will get inserted
during streaming in and recursively we will insert ODR types early.

I am waiting for testcase WRT the other ODR ICE reported today. I think
tree.c when creating type variants wants to copy the flag: the wrong
type is va_list which is likely created by a backend bypassing the C++
hook.

I will re-test with these changes.

Honza


Re: Use ODR for canonical types construction in LTO

2019-06-27 Thread Richard Biener
On Thu, 27 Jun 2019, Jan Hubicka wrote:

> > > +  if (RECORD_OR_UNION_TYPE_P (t)
> > > +  && odr_type_p (t) && !odr_type_violation_reported_p (t))
> > > +{
> > > +  /* Here we rely on fact that all non-ODR types was inserted into
> > > +  canonical type hash and thus we can safely detect conflicts between
> > > +  ODR types and interoperable non-ODR types.  */
> > > +  gcc_checking_assert (type_streaming_finished
> > > +&& TYPE_MAIN_VARIANT (t) == t);
> > > +  slot = htab_find_slot_with_hash (gimple_canonical_types, t, hash,
> > > +NO_INSERT);
> > > +  if (slot && !TYPE_CXX_ODR_P (*(tree *)slot))
> > > + {
> > > +   tree nonodr = *(tree *)slot;
> > > +   if (symtab->dump_file)
> > > + {
> > > +   char *name = cplus_demangle_v3
> > > +  (IDENTIFIER_POINTER
> > > +(DECL_ASSEMBLER_NAME (TYPE_NAME (t))), 0);
> > 
> > Ugh - do we really want to demangle here?  I think not.  Surely
> > not without -details or with -slim.  Readers can c++filt this
> > easily.
> 
> OK, i can dump mangled name. 
> Just print_generic_expr prints all instantiations of the tempalte same
> way which makes it look odd that there are so many types of same name.
> > > +   /* Set canonical for T and all other ODR equivalent duplicates
> > > +  including incomplete structures.  */
> > > +   set_type_canonical_for_odr_type (t, prevail);
> > > +   enable_odr_based_tbaa (t);
> > 
> > I suppose there never will be a set of ODR types with the same
> > prevailing type but some of them having a conflict with a nonodr
> > type and some not?
> 
> No, we check them by structural equality while verifying ODR violations
> that is more strict than gimple_canonical_types_compatible.
> > 
> > > +   if (!type_in_anonymous_namespace_p (t))
> > > + hash = htab_hash_string (IDENTIFIER_POINTER
> > > +(DECL_ASSEMBLER_NAME
> > > +(TYPE_NAME (prevail;
> > > +   else
> > > + hash = TYPE_UID (TYPE_MAIN_VARIANT (t));
> > > +   num_canonical_type_hash_entries++;
> > > +   bool existed_p = canonical_type_hash_cache->put (prevail, hash);
> > 
> > but those hash differently?  I think you wanted to put t, not prevail
> > here.  And you want to use the TYPE_UID of prevail as well?
> 
> Actually this was intended to be prevail in both cases, but it is
> harmless.  The difference here is that anonymous types have
> DECL_ASSEMBLED_NAME , so if we inserted them to the hash
> table based on names they will all conflict.
> 
> Anonymous namespace types never have duplicated main variants, so
> t==prevail here, but I will update the patch to use prevail since it is
> much more obvious (I am also not sure what I was thinking of when
> I typed TYPE_MAIN_VARIANT)

I think using 't' is more obvious here since that's what we should cache
the hash value for.  That is, even in the !type_in_anonymous_namespace_p 
(t) case you want to cache 't's hash, not prevails.  But yes,
TYPE_UID (prevail) might be more obvious to you.

> > 
> > Otherwise looks good.
> > 
> > You can commit the C++ FE change with the adjustment in case it
> > fixes the reported verification ICEs.
> 
> It only fixes ICE WRT the sanity checking that all non-ODR types
> are inserted first.  W/o that change the as-base type will get inserted
> during streaming in and recursively we will insert ODR types early.
> 
> I am waiting for testcase WRT the other ODR ICE reported today. I think
> tree.c when creating type variants wants to copy the flag: the wrong
> type is va_list which is likely created by a backend bypassing the C++
> hook.
> 
> I will re-test with these changes.
> 
> Honza
> 

-- 
Richard Biener 
SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany;
GF: Felix Imendörffer, Mary Higgins, Sri Rasiah; HRB 21284 (AG Nürnberg)

Re: Use ODR for canonical types construction in LTO

2019-06-27 Thread Jan Hubicka
> > Actually this was intended to be prevail in both cases, but it is
> > harmless.  The difference here is that anonymous types have
> > DECL_ASSEMBLED_NAME , so if we inserted them to the hash
> > table based on names they will all conflict.
> > 
> > Anonymous namespace types never have duplicated main variants, so
> > t==prevail here, but I will update the patch to use prevail since it is
> > much more obvious (I am also not sure what I was thinking of when
> > I typed TYPE_MAIN_VARIANT)
> 
> I think using 't' is more obvious here since that's what we should cache
> the hash value for.  That is, even in the !type_in_anonymous_namespace_p 
> (t) case you want to cache 't's hash, not prevails.  But yes,
> TYPE_UID (prevail) might be more obvious to you.

OK, I will use t on both: for !anonoymous_namespace types the names are
the same.

Honza
> 
> > > 
> > > Otherwise looks good.
> > > 
> > > You can commit the C++ FE change with the adjustment in case it
> > > fixes the reported verification ICEs.
> > 
> > It only fixes ICE WRT the sanity checking that all non-ODR types
> > are inserted first.  W/o that change the as-base type will get inserted
> > during streaming in and recursively we will insert ODR types early.
> > 
> > I am waiting for testcase WRT the other ODR ICE reported today. I think
> > tree.c when creating type variants wants to copy the flag: the wrong
> > type is va_list which is likely created by a backend bypassing the C++
> > hook.
> > 
> > I will re-test with these changes.
> > 
> > Honza
> > 
> 
> -- 
> Richard Biener 
> SUSE Linux GmbH, Maxfeldstrasse 5, 90409 Nuernberg, Germany;
> GF: Felix Imendörffer, Mary Higgins, Sri Rasiah; HRB 21284 (AG Nürnberg)



Re: [PATCH] x86: fix/improve vgf2p8affine*qb insns

2019-06-27 Thread Jan Beulich
>>> On 27.06.19 at 12:20,  wrote:
> On Thu, Jun 27, 2019 at 10:57 AM Jan Beulich  wrote:
>>
>> - the affine transformations are not commutative (the two source
>>   operands have entirely different meaning)
>> - there's no need for three alternatives
>> - the nonimmediate_operand/Bm combination can better be vector_operand/m
>>
>> gcc/
>> 2019-06-27  Jan Beulich  
>>
>> * config/i386/sse.md (vgf2p8affineinvqb_,
>> vgf2p8affineqb_): Drop % constraint modifier.
>> Eliminate redundant alternative.  Use vector_operand plus "m"
>> constraint.
> 
> Please just drop % modifier and use vector_operand here. IIRC,
> register allocator operates on constraints, it doesn't care for
> predicates. But predicates shouldn't be more constrained than
> constraints. So, having "m" instead of "Bm" is a bad idea with
> vector_operand.

Well, putting back the Bm is easy (if that's really needed). But do
you also mean me to put back to redundant 3rd alternative?

Jan




Re: [PATCH] x86: fix/improve vgf2p8affine*qb insns

2019-06-27 Thread Uros Bizjak
On Thu, Jun 27, 2019 at 12:49 PM Jan Beulich  wrote:
>
> >>> On 27.06.19 at 12:20,  wrote:
> > On Thu, Jun 27, 2019 at 10:57 AM Jan Beulich  wrote:
> >>
> >> - the affine transformations are not commutative (the two source
> >>   operands have entirely different meaning)
> >> - there's no need for three alternatives
> >> - the nonimmediate_operand/Bm combination can better be vector_operand/m
> >>
> >> gcc/
> >> 2019-06-27  Jan Beulich  
> >>
> >> * config/i386/sse.md (vgf2p8affineinvqb_,
> >> vgf2p8affineqb_): Drop % constraint modifier.
> >> Eliminate redundant alternative.  Use vector_operand plus "m"
> >> constraint.
> >
> > Please just drop % modifier and use vector_operand here. IIRC,
> > register allocator operates on constraints, it doesn't care for
> > predicates. But predicates shouldn't be more constrained than
> > constraints. So, having "m" instead of "Bm" is a bad idea with
> > vector_operand.
>
> Well, putting back the Bm is easy (if that's really needed). But do
> you also mean me to put back to redundant 3rd alternative?

It is not redundant, "x" and "v" are different register constraints.

Uros.


Re: Use ODR for canonical types construction in LTO

2019-06-27 Thread Jan Hubicka
Hi,
here is update patch I am re-testing. Ok if it suceeds?

Orace quary stats finished in meantime.

Alias oracle query stats:
  refs_may_alias_p: 39232255 disambiguations, 47436580 queries
  ref_maybe_used_by_call_p: 59801 disambiguations, 39811399 queries
  call_may_clobber_ref_p: 5967 disambiguations, 9009 queries
  nonoverlapping_component_refs_p: 10184 disambiguations, 754231 queries
  nonoverlapping_component_refs_of_decl_p: 6488 disambiguations, 214611 queries
  aliasing_component_refs_p: 88820 disambiguations, 291741 queries
  TBAA oracle: 11687911 disambiguations 34188770 queries
   13343414 are in alias set 0
   5417962 queries asked about the same object
   148 queries asked about the same alias set
   0 access volatile
   1672142 are dependent in the DAG
   2067193 are aritificially in conflict with void *

PTA query stats:
  pt_solution_includes: 502249 disambiguations, 7180660 queries
  pt_solutions_intersect: 357826 disambiguations, 7059723 queries

So there is no big change in TBAA effectivity compared to previous patch.
There is however nice improvements in aliasing_component_refs_p 
61523 -> 88820 which I think is due to Jason's change

Honza

Index: cp/class.c
===
--- cp/class.c  (revision 272732)
+++ cp/class.c  (working copy)
@@ -6395,6 +6395,7 @@ layout_class_type (tree t, tree *virtual
   SET_TYPE_ALIGN (base_t, rli->record_align);
   TYPE_USER_ALIGN (base_t) = TYPE_USER_ALIGN (t);
   TYPE_TYPELESS_STORAGE (base_t) = TYPE_TYPELESS_STORAGE (t);
+  TYPE_CXX_ODR_P (base_t) = TYPE_CXX_ODR_P (t);
 
   /* Copy the non-static data members of T. This will include its
 direct non-virtual bases & vtable.  */
Index: ipa-devirt.c
===
--- ipa-devirt.c(revision 272732)
+++ ipa-devirt.c(working copy)
@@ -213,6 +213,8 @@ struct GTY(()) odr_type_d
   bool odr_violated;
   /* Set when virtual table without RTTI previaled table with.  */
   bool rtti_broken;
+  /* Set when the canonical type is determined using the type name.  */
+  bool tbaa_enabled;
 };
 
 /* Return TRUE if all derived types of T are known and thus
@@ -1610,6 +1612,9 @@ add_type_duplicate (odr_type val, tree t
 
   val->types_set->add (type);
 
+  if (!odr_hash)
+return NULL;
+
   gcc_checking_assert (can_be_name_hashed_p (type)
   && can_be_name_hashed_p (val->type));
 
@@ -1989,6 +1994,46 @@ prevailing_odr_type (tree type)
   return t->type;
 }
 
+/* Set tbaa_enabled flag for TYPE.  */
+
+void
+enable_odr_based_tbaa (tree type)
+{
+  odr_type t = get_odr_type (type, true);
+  t->tbaa_enabled = true;
+}
+
+/* True if canonical type of TYPE is determined using ODR name.  */
+
+bool
+odr_based_tbaa_p (const_tree type)
+{
+  if (!RECORD_OR_UNION_TYPE_P (type))
+return false;
+  odr_type t = get_odr_type (const_cast  (type), false);
+  if (!t || !t->tbaa_enabled)
+return false;
+  return true;
+}
+
+/* Set TYPE_CANONICAL of type and all its variants and duplicates
+   to CANONICAL.  */
+
+void
+set_type_canonical_for_odr_type (tree type, tree canonical)
+{
+  odr_type t = get_odr_type (type, false);
+  unsigned int i;
+  tree tt;
+
+  for (tree t2 = t->type; t2; t2 = TYPE_NEXT_VARIANT (t2))
+TYPE_CANONICAL (t2) = canonical;
+  if (t->types)
+FOR_EACH_VEC_ELT (*t->types, i, tt)
+  for (tree t2 = tt; t2; t2 = TYPE_NEXT_VARIANT (t2))
+TYPE_CANONICAL (t2) = canonical;
+}
+
 /* Return true if we reported some ODR violation on TYPE.  */
 
 bool
Index: ipa-utils.h
===
--- ipa-utils.h (revision 272732)
+++ ipa-utils.h (working copy)
@@ -93,6 +93,9 @@ bool odr_or_derived_type_p (const_tree t
 bool odr_types_equivalent_p (tree type1, tree type2);
 bool odr_type_violation_reported_p (tree type);
 tree prevailing_odr_type (tree type);
+void enable_odr_based_tbaa (tree type);
+bool odr_based_tbaa_p (const_tree type);
+void set_type_canonical_for_odr_type (tree type, tree canonical);
 
 /* Return vector containing possible targets of polymorphic call E.
If COMPLETEP is non-NULL, store true if the list is complete. 
Index: lto/lto-common.c
===
--- lto/lto-common.c(revision 272732)
+++ lto/lto-common.c(working copy)
@@ -1,5 +1,5 @@
 /* Top-level LTO routines.
-   Copyright (C) 2009-2018 Free Software Foundation, Inc.
+   Copyright (C) 2009-2019 Free Software Foundation, Inc.
Contributed by CodeSourcery, Inc.
 
 This file is part of GCC.
@@ -56,6 +56,11 @@ along with GCC; see the file COPYING3.
 #include "attribs.h"
 #include "builtins.h"
 #include "lto-common.h"
+#include "tree-pretty-print.h"
+
+/* True when no new types are going to be streamd from the global stream.  */
+
+static bool type_streaming_finished = false;

Re: [PATCH] x86: mark "k" and "Yk" constraints as non-internal

2019-06-27 Thread Uros Bizjak
On Thu, Jun 27, 2019 at 12:11 PM Jan Beulich  wrote:
>
> Without these constraints asm() can't make use of mask registers.

asm should be deprecated. We have intrinsics for this purpose.

Uros.

> gcc/
> 2019-06-27  Jan Beulich  
>
> * config/i386/constraints.md: Remove @internal from "k" and
> "Yk".
>
> --- a/gcc/config/i386/constraints.md
> +++ b/gcc/config/i386/constraints.md
> @@ -79,10 +79,10 @@
>   "Second from top of 80387 floating-point stack (@code{%st(1)}).")
>
>  (define_register_constraint "Yk" "TARGET_AVX512F ? MASK_REGS : NO_REGS"
> -"@internal Any mask register that can be used as predicate, i.e. k1-k7.")
> +"Any mask register that can be used as predicate, i.e. k1-k7.")
>
>  (define_register_constraint "k" "TARGET_AVX512F ? ALL_MASK_REGS : NO_REGS"
> -"@internal Any mask register.")
> +"Any mask register.")
>
>  ;; Vector registers (also used for plain floating point nowadays).
>  (define_register_constraint "y" "TARGET_MMX ? MMX_REGS : NO_REGS"
>
>


Re: Use ODR for canonical types construction in LTO

2019-06-27 Thread Richard Biener
On Thu, 27 Jun 2019, Jan Hubicka wrote:

> Hi,
> here is update patch I am re-testing. Ok if it suceeds?

+ if (!type_in_anonymous_namespace_p (t))
+   hash = htab_hash_string (IDENTIFIER_POINTER
+  (DECL_ASSEMBLER_NAME
+  (TYPE_NAME (t;
+ else
+   hash = TYPE_UID (t);
+ num_canonical_type_hash_entries++;
+ bool existed_p = canonical_type_hash_cache->put (prevail, hash);

but you still cache the hash for prevail rather than t while we
are eventually asking for the hash of t again?  I suppose not since
we just adjusted TYPE_CANONICAL and we're then supposed to pick
the hash from the canonical type?  I think this deserves a comment.
And it should use 'prevail' then everywhere here for consistency,
otherwise it really looks odd.

> @@ -357,7 +367,7 @@ iterative_hash_canonical_type (tree type  
>optimal order.  To avoid quadratic behavior also register the
>type here.  */
>v = hash_canonical_type (type);
> -  gimple_register_canonical_type_1 (type, v);
> +  v = gimple_register_canonical_type_1 (type, v);
>  }
>hstate.add_int (v);
>  }

Just noticed this should be

hstate.merge_hash (v);

results in the very same effect but is clearer IMHO.

Richard.


> Orace quary stats finished in meantime.
> 
> Alias oracle query stats:
>   refs_may_alias_p: 39232255 disambiguations, 47436580 queries
>   ref_maybe_used_by_call_p: 59801 disambiguations, 39811399 queries
>   call_may_clobber_ref_p: 5967 disambiguations, 9009 queries
>   nonoverlapping_component_refs_p: 10184 disambiguations, 754231 queries
>   nonoverlapping_component_refs_of_decl_p: 6488 disambiguations, 214611 
> queries
>   aliasing_component_refs_p: 88820 disambiguations, 291741 queries
>   TBAA oracle: 11687911 disambiguations 34188770 queries
>13343414 are in alias set 0
>5417962 queries asked about the same object
>148 queries asked about the same alias set
>0 access volatile
>1672142 are dependent in the DAG
>2067193 are aritificially in conflict with void *
> 
> PTA query stats:
>   pt_solution_includes: 502249 disambiguations, 7180660 queries
>   pt_solutions_intersect: 357826 disambiguations, 7059723 queries
> 
> So there is no big change in TBAA effectivity compared to previous patch.
> There is however nice improvements in aliasing_component_refs_p 
> 61523 -> 88820 which I think is due to Jason's change
> 
> Honza
> 
> Index: cp/class.c
> ===
> --- cp/class.c(revision 272732)
> +++ cp/class.c(working copy)
> @@ -6395,6 +6395,7 @@ layout_class_type (tree t, tree *virtual
>SET_TYPE_ALIGN (base_t, rli->record_align);
>TYPE_USER_ALIGN (base_t) = TYPE_USER_ALIGN (t);
>TYPE_TYPELESS_STORAGE (base_t) = TYPE_TYPELESS_STORAGE (t);
> +  TYPE_CXX_ODR_P (base_t) = TYPE_CXX_ODR_P (t);
>  
>/* Copy the non-static data members of T. This will include its
>direct non-virtual bases & vtable.  */
> Index: ipa-devirt.c
> ===
> --- ipa-devirt.c  (revision 272732)
> +++ ipa-devirt.c  (working copy)
> @@ -213,6 +213,8 @@ struct GTY(()) odr_type_d
>bool odr_violated;
>/* Set when virtual table without RTTI previaled table with.  */
>bool rtti_broken;
> +  /* Set when the canonical type is determined using the type name.  */
> +  bool tbaa_enabled;
>  };
>  
>  /* Return TRUE if all derived types of T are known and thus
> @@ -1610,6 +1612,9 @@ add_type_duplicate (odr_type val, tree t
>  
>val->types_set->add (type);
>  
> +  if (!odr_hash)
> +return NULL;
> +
>gcc_checking_assert (can_be_name_hashed_p (type)
>  && can_be_name_hashed_p (val->type));
>  
> @@ -1989,6 +1994,46 @@ prevailing_odr_type (tree type)
>return t->type;
>  }
>  
> +/* Set tbaa_enabled flag for TYPE.  */
> +
> +void
> +enable_odr_based_tbaa (tree type)
> +{
> +  odr_type t = get_odr_type (type, true);
> +  t->tbaa_enabled = true;
> +}
> +
> +/* True if canonical type of TYPE is determined using ODR name.  */
> +
> +bool
> +odr_based_tbaa_p (const_tree type)
> +{
> +  if (!RECORD_OR_UNION_TYPE_P (type))
> +return false;
> +  odr_type t = get_odr_type (const_cast  (type), false);
> +  if (!t || !t->tbaa_enabled)
> +return false;
> +  return true;
> +}
> +
> +/* Set TYPE_CANONICAL of type and all its variants and duplicates
> +   to CANONICAL.  */
> +
> +void
> +set_type_canonical_for_odr_type (tree type, tree canonical)
> +{
> +  odr_type t = get_odr_type (type, false);
> +  unsigned int i;
> +  tree tt;
> +
> +  for (tree t2 = t->type; t2; t2 = TYPE_NEXT_VARIANT (t2))
> +TYPE_CANONICAL (t2) = canonical;
> +  if (t->types)
> +FOR_EACH_VEC_ELT (*t->types, i, tt)
> +  

[PATCH] Fix various issues seen with clang-static-analyzer.

2019-06-27 Thread Martin Liška
Hi.

The patch addressed couple of issues that are explained and I should
have permission to install the patch.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Thanks,
Martin

gcc/ChangeLog:

2019-06-27  Martin Liska  

PR tree-optimization/90974
PR rtl-optimization/90975
PR rtl-optimization/90976
PR target/91016
PR tree-optimization/91017
* config/i386/i386-expand.c (ix86_expand_rounddf_32): Remove
unused tmp.
* lra.c (lra_set_insn_recog_data): Remove a leftover from
initial commit of IRA.
* optabs.c (expand_twoval_binop): Use xop0 and xop1 instead
of op0 and op1.
* tree-vect-loop.c (vect_create_epilog_for_reduction):
Remove unused mode1.
* tree-vect-stmts.c (vectorizable_call): Remove dead assignment
to new_stmt_info.
---
 gcc/config/i386/i386-expand.c | 5 ++---
 gcc/lra.c | 8 ++--
 gcc/optabs.c  | 4 ++--
 gcc/tree-vect-loop.c  | 1 -
 gcc/tree-vect-stmts.c | 6 ++
 5 files changed, 8 insertions(+), 16 deletions(-)


diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index d50b811d863..8a4955f87d2 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -16052,14 +16052,13 @@ ix86_expand_rounddf_32 (rtx operand0, rtx operand1)
 			   0, OPTAB_DIRECT);
 
   /* Compensate.  */
-  tmp = gen_reg_rtx (mode);
   /* xa2 = xa2 - (dxa > 0.5 ? 1 : 0) */
   tmp = ix86_expand_sse_compare_mask (UNGT, dxa, half, false);
-  emit_insn (gen_rtx_SET (tmp, gen_rtx_AND (mode, one, tmp)));
+  emit_insn (gen_rtx_SET (tmp, gen_rtx_AND (mode, tmp, one)));
   xa2 = expand_simple_binop (mode, MINUS, xa2, tmp, NULL_RTX, 0, OPTAB_DIRECT);
   /* xa2 = xa2 + (dxa <= -0.5 ? 1 : 0) */
   tmp = ix86_expand_sse_compare_mask (UNGE, mhalf, dxa, false);
-  emit_insn (gen_rtx_SET (tmp, gen_rtx_AND (mode, one, tmp)));
+  emit_insn (gen_rtx_SET (tmp, gen_rtx_AND (mode, tmp, one)));
   xa2 = expand_simple_binop (mode, PLUS, xa2, tmp, NULL_RTX, 0, OPTAB_DIRECT);
 
   /* res = copysign (xa2, operand1) */
diff --git a/gcc/lra.c b/gcc/lra.c
index bef2f676a78..982a3cc630b 100644
--- a/gcc/lra.c
+++ b/gcc/lra.c
@@ -1029,12 +1029,8 @@ lra_set_insn_recog_data (rtx_insn *insn)
 			   data->operand_loc,
 			   constraints, operand_mode, NULL);
 	  if (nop > 0)
-	{
-	  const char *p =  recog_data.constraints[0];
-
-	  for (p =	constraints[0]; *p; p++)
-		nalt += *p == ',';
-	}
+	for (const char *p =constraints[0]; *p; p++)
+	  nalt += *p == ',';
 	  data->insn_static_data = insn_static_data
 	= get_static_insn_data (-1, nop, 0, nalt);
 	  for (i = 0; i < nop; i++)
diff --git a/gcc/optabs.c b/gcc/optabs.c
index 5a718e7f635..18ca7370917 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -2095,8 +2095,8 @@ expand_twoval_binop (optab binoptab, rtx op0, rtx op1, rtx targ0, rtx targ1,
   xop1 = avoid_expensive_constant (mode1, binoptab, 1, xop1, unsignedp);
 
   create_fixed_operand (&ops[0], targ0);
-  create_convert_operand_from (&ops[1], op0, mode, unsignedp);
-  create_convert_operand_from (&ops[2], op1, mode, unsignedp);
+  create_convert_operand_from (&ops[1], xop0, mode, unsignedp);
+  create_convert_operand_from (&ops[2], xop1, mode, unsignedp);
   create_fixed_operand (&ops[3], targ1);
   if (maybe_expand_insn (icode, 4, ops))
 	return 1;
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index b37bf6f427d..a6b0e5c5fdf 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -5431,7 +5431,6 @@ vect_create_epilog_for_reduction (vec vect_defs,
 dump_printf_loc (MSG_NOTE, vect_location,
 			 "Reduce using vector shifts\n");
 
-	  mode1 = TYPE_MODE (vectype1);
   vec_dest = vect_create_destination_var (scalar_dest, vectype1);
   for (elt_offset = nelements / 2;
elt_offset >= 1;
diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
index 47da2953dc7..415ac0c8679 100644
--- a/gcc/tree-vect-stmts.c
+++ b/gcc/tree-vect-stmts.c
@@ -3483,8 +3483,7 @@ vectorizable_call (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
 			= gimple_build_call_internal_vec (ifn, vargs);
 		  gimple_call_set_lhs (call, half_res);
 		  gimple_call_set_nothrow (call, true);
-		  new_stmt_info
-			= vect_finish_stmt_generation (stmt_info, call, gsi);
+		  vect_finish_stmt_generation (stmt_info, call, gsi);
 		  if ((i & 1) == 0)
 			{
 			  prev_res = half_res;
@@ -3583,8 +3582,7 @@ vectorizable_call (stmt_vec_info stmt_info, gimple_stmt_iterator *gsi,
 	  gcall *call = gimple_build_call_internal_vec (ifn, vargs);
 	  gimple_call_set_lhs (call, half_res);
 	  gimple_call_set_nothrow (call, true);
-	  new_stmt_info
-		= vect_finish_stmt_generation (stmt_info, call, gsi);
+	  vect_finish_stmt_generation (stmt_info, call, gsi);
 	  if ((j & 1) == 0)
 		{
 		  pre

[PATCH] Add late non-iterating FRE with optimize > 1

2019-06-27 Thread Richard Biener


This fixes FREs handling of TARGET_MEM_REF (it didn't consider
&TARGET_MEM_REF) and adds a late FRE pass which has iteration
disabled and runs only at -O[2s]+ to limit the compile-time
impact.

This helps cases where unrolling and vectorization exposes
"piecewise" redundancies DOM cannot handle.  Thus

 (vector *)&a = { 1, 2, 3, 4 };
 .. = a[2];

there's still the opposite case not handled (PR83518) but
I will see whether I can make it work without too much cost:

 a[0] = 1;
 a[1] = 2;
 a[2] = 3;
 a[3] = 4;
 ... = (vector *)&a;

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

I'll commit the TARGET_MEM_REF fixing indepenently.

Any comments?  I'm not sure I like globbing the iteration
parameter and the optimize > 1 check; maybe I should simply
rename it to 'late' ...

The compile-time impact might be non-trivial for those testcases
that run into a large overhead from the alias-stmt walking but
I didn't do any measurements yet.

Thanks,
Richard.

2019-06-27  Richard Biener  

* tree-ssa-sccvn.c (class pass_fre): Add may_iterate
pass parameter.
(pass_fre::execute): Honor it.
* passes.def: Adjust pass_fre invocations to allow iterating,
add non-iterating pass_fre before late threading/dom.

Index: gcc/tree-ssa-sccvn.c
===
--- gcc/tree-ssa-sccvn.c(revision 272742)
+++ gcc/tree-ssa-sccvn.c(working copy)
@@ -791,39 +791,6 @@ vn_reference_eq (const_vn_reference_t co
 static void
 copy_reference_ops_from_ref (tree ref, vec *result)
 {
-  if (TREE_CODE (ref) == TARGET_MEM_REF)
-{
-  vn_reference_op_s temp;
-
-  result->reserve (3);
-
-  memset (&temp, 0, sizeof (temp));
-  temp.type = TREE_TYPE (ref);
-  temp.opcode = TREE_CODE (ref);
-  temp.op0 = TMR_INDEX (ref);
-  temp.op1 = TMR_STEP (ref);
-  temp.op2 = TMR_OFFSET (ref);
-  temp.off = -1;
-  temp.clique = MR_DEPENDENCE_CLIQUE (ref);
-  temp.base = MR_DEPENDENCE_BASE (ref);
-  result->quick_push (temp);
-
-  memset (&temp, 0, sizeof (temp));
-  temp.type = NULL_TREE;
-  temp.opcode = ERROR_MARK;
-  temp.op0 = TMR_INDEX2 (ref);
-  temp.off = -1;
-  result->quick_push (temp);
-
-  memset (&temp, 0, sizeof (temp));
-  temp.type = NULL_TREE;
-  temp.opcode = TREE_CODE (TMR_BASE (ref));
-  temp.op0 = TMR_BASE (ref);
-  temp.off = -1;
-  result->quick_push (temp);
-  return;
-}
-
   /* For non-calls, store the information that makes up the address.  */
   tree orig = ref;
   while (ref)
@@ -853,6 +820,20 @@ copy_reference_ops_from_ref (tree ref, v
  temp.base = MR_DEPENDENCE_BASE (ref);
  temp.reverse = REF_REVERSE_STORAGE_ORDER (ref);
  break;
+   case TARGET_MEM_REF:
+ /* The base address gets its own vn_reference_op_s structure.  */
+ temp.op0 = TMR_INDEX (ref);
+ temp.op1 = TMR_STEP (ref);
+ temp.op2 = TMR_OFFSET (ref);
+ temp.clique = MR_DEPENDENCE_CLIQUE (ref);
+ temp.base = MR_DEPENDENCE_BASE (ref);
+ result->safe_push (temp);
+ memset (&temp, 0, sizeof (temp));
+ temp.type = NULL_TREE;
+ temp.opcode = ERROR_MARK;
+ temp.op0 = TMR_INDEX2 (ref);
+ temp.off = -1;
+ break;
case BIT_FIELD_REF:
  /* Record bits, position and storage order.  */
  temp.op0 = TREE_OPERAND (ref, 1);
@@ -6872,14 +6853,24 @@ class pass_fre : public gimple_opt_pass
 {
 public:
   pass_fre (gcc::context *ctxt)
-: gimple_opt_pass (pass_data_fre, ctxt)
+: gimple_opt_pass (pass_data_fre, ctxt), may_iterate (true)
   {}
 
   /* opt_pass methods: */
   opt_pass * clone () { return new pass_fre (m_ctxt); }
-  virtual bool gate (function *) { return flag_tree_fre != 0; }
+  void set_pass_param (unsigned int n, bool param)
+{
+  gcc_assert (n == 0);
+  may_iterate = param;
+}
+  virtual bool gate (function *)
+{
+  return flag_tree_fre != 0 && (may_iterate || optimize > 1);
+}
   virtual unsigned int execute (function *);
 
+private:
+  bool may_iterate;
 }; // class pass_fre
 
 unsigned int
@@ -6888,15 +6879,16 @@ pass_fre::execute (function *fun)
   unsigned todo = 0;
 
   /* At -O[1g] use the cheap non-iterating mode.  */
+  bool iterate_p = may_iterate && (optimize > 1);
   calculate_dominance_info (CDI_DOMINATORS);
-  if (optimize > 1)
+  if (iterate_p)
 loop_optimizer_init (AVOID_CFG_MODIFICATIONS);
 
   default_vn_walk_kind = VN_WALKREWRITE;
-  todo = do_rpo_vn (fun, NULL, NULL, optimize > 1, true);
+  todo = do_rpo_vn (fun, NULL, NULL, iterate_p, true);
   free_rpo_vn ();
 
-  if (optimize > 1)
+  if (iterate_p)
 loop_optimizer_finalize ();
 
   return todo;
Index: gcc/passes.def
===
--- gcc/passes.def  (revision 272742)
+++ gcc/passes.def  (working copy)
@@ -83,7 +83,7 @@ along wi

Re: [PATCH] rs6000: Enable -fvariable-expansion-in-unroller by default

2019-06-27 Thread Segher Boessenkool
Hi Bill,

On Wed, Jun 26, 2019 at 10:23:06PM -0500, Bill Schmidt wrote:
> 2019-06-27  Bill Schmidt  
> 
>   * config/rs6000/rs6000.c (rs6000_option_override_internal): Enable
>   -fvariable-expansion-in-unroller by default.
> 
> 
> --- gcc/config/rs6000/rs6000.c(revision 272719)
> +++ gcc/config/rs6000/rs6000.c(working copy)
> @@ -3616,6 +3616,11 @@ rs6000_option_override_internal (bool global_init_
>&& !global_options_set.x_flag_asynchronous_unwind_tables)
>  flag_asynchronous_unwind_tables = 1;
>  
> +  /* -fvariable-expansion-in-unroller is a win for POWER whenever the
> + loop unroller is active.  It is only checked during unrolling, so
> + we can just set it on by default.  */
> +  flag_variable_expansion_in_unroller = 1;

You should test global_options_set like the asynch unwind thing above,
so that a user can still turn off variable expansion if he/she so desires.


Okay with that change.  Thanks!


Segher


[PATCH] PR libstdc++/85494 use rand_s in std::random_device

2019-06-27 Thread Jonathan Wakely

This is a minimal fix for the use of a deterministic RNG on mingw-w64,
simply using rand_s unconditionally. The rest of the r271740 changes on
trunk are not included. That means that RDSEED and RDRAND are not
available for mingw-w64 and the token passed to the constructor is
ignored completely.

PR libstdc++/85494 use rand_s in std::random_device
* config/os/mingw32-w64/os_defines.h (_GLIBCXX_USE_CRT_RAND_S): Define.
* src/c++11/cow-string-inst.cc (random_device::_M_init_pretr1)
[_GLIBCXX_USE_CRT_RAND_S]: Do nothing if rand_s will be used.
* src/c++11/random.cc [_GLIBCXX_USE_CRT_RAND_S] (__winxp_rand_s):
Define new function.
(random_device::_M_init_pretr1) [_GLIBCXX_USE_CRT_RAND_S]: Do nothing
if rand_s will be used.
(random_device::_M_getval_pretr1) [_GLIBCXX_USE_CRT_RAND_S]: Use
__winxp_rand_s().
* testsuite/26_numerics/random/random_device/85494.cc: New test.


Tested x86_64-pc-linux-gnu and x86_64-w64-mingw32, committed to
gcc-9-branch.

commit f055311955f357f567cd38d2fe8da2af16c1c3ae
Author: redi 
Date:   Thu Jun 27 11:31:02 2019 +

PR libstdc++/85494 use rand_s in std::random_device

This is a minimal fix for the use of a deterministic RNG on mingw-w64,
simply using rand_s unconditionally. The rest of the r271740 changes on
trunk are not included. That means that RDSEED and RDRAND are not
available for mingw-w64 and the token passed to the constructor is
ignored completely.

PR libstdc++/85494 use rand_s in std::random_device
* config/os/mingw32-w64/os_defines.h (_GLIBCXX_USE_CRT_RAND_S): 
Define.
* src/c++11/cow-string-inst.cc (random_device::_M_init_pretr1)
[_GLIBCXX_USE_CRT_RAND_S]: Do nothing if rand_s will be used.
* src/c++11/random.cc [_GLIBCXX_USE_CRT_RAND_S] (__winxp_rand_s):
Define new function.
(random_device::_M_init_pretr1) [_GLIBCXX_USE_CRT_RAND_S]: Do 
nothing
if rand_s will be used.
(random_device::_M_getval_pretr1) [_GLIBCXX_USE_CRT_RAND_S]: Use
__winxp_rand_s().
* testsuite/26_numerics/random/random_device/85494.cc: New test.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gcc-9-branch@272748 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/libstdc++-v3/config/os/mingw32-w64/os_defines.h 
b/libstdc++-v3/config/os/mingw32-w64/os_defines.h
index 6c19d3953a6..418c6f569df 100644
--- a/libstdc++-v3/config/os/mingw32-w64/os_defines.h
+++ b/libstdc++-v3/config/os/mingw32-w64/os_defines.h
@@ -88,4 +88,6 @@
 // See libstdc++/59807
 #define _GTHREAD_USE_MUTEX_INIT_FUNC 1
 
+#define _GLIBCXX_USE_CRT_RAND_S 1
+
 #endif
diff --git a/libstdc++-v3/src/c++11/cow-string-inst.cc 
b/libstdc++-v3/src/c++11/cow-string-inst.cc
index c36f297438e..a5e4ad29dc3 100644
--- a/libstdc++-v3/src/c++11/cow-string-inst.cc
+++ b/libstdc++-v3/src/c++11/cow-string-inst.cc
@@ -77,8 +77,9 @@ namespace std _GLIBCXX_VISIBILITY(default)
   }
 
   void
-  random_device::_M_init_pretr1(const std::string& token)
+  random_device::_M_init_pretr1(const std::string& token [[gnu::unused]])
   {
+#ifndef _GLIBCXX_USE_CRT_RAND_S
 unsigned long __seed = 5489UL;
 if (token != "mt19937")
   {
@@ -90,6 +91,7 @@ namespace std _GLIBCXX_VISIBILITY(default)
 "(const std::string&)"));
   }
 _M_mt.seed(__seed);
+#endif
   }
 } // namespace
 #endif
diff --git a/libstdc++-v3/src/c++11/random.cc b/libstdc++-v3/src/c++11/random.cc
index 1146d21c3f9..0bb6b6bf5b8 100644
--- a/libstdc++-v3/src/c++11/random.cc
+++ b/libstdc++-v3/src/c++11/random.cc
@@ -23,6 +23,8 @@
 // .
 
 #define _GLIBCXX_USE_CXX11_ABI 1
+#define _CRT_RAND_S // define this before including  to get rand_s
+
 #include 
 
 #ifdef  _GLIBCXX_USE_C99_STDINT_TR1
@@ -50,6 +52,10 @@
 # include 
 #endif
 
+#ifdef _GLIBCXX_USE_CRT_RAND_S
+# include 
+#endif
+
 namespace std _GLIBCXX_VISIBILITY(default)
 {
   namespace
@@ -85,6 +91,18 @@ namespace std _GLIBCXX_VISIBILITY(default)
   return val;
 }
 #endif
+
+#ifdef _GLIBCXX_USE_CRT_RAND_S
+# pragma GCC poison _M_mt
+unsigned int
+__winxp_rand_s()
+{
+  unsigned int val;
+  if (::rand_s(&val) != 0)
+   std::__throw_runtime_error(__N("random_device: rand_s failed"));
+  return val;
+}
+#endif
   }
 
   void
@@ -122,9 +140,11 @@ namespace std _GLIBCXX_VISIBILITY(default)
   }
 
   void
-  random_device::_M_init_pretr1(const std::string& token)
+  random_device::_M_init_pretr1(const std::string& token [[gnu::unused]])
   {
+#ifndef _GLIBCXX_USE_CRT_RAND_S
 _M_mt.seed(_M_strtoul(token));
+#endif
   }
 
   void
@@ -170,7 +190,11 @@ namespace std _GLIBCXX_VISIBILITY(default)
   random_device::result_type
   random_device::_M_getval_pretr1()
   {
+#ifdef _GLIBCXX_USE_CRT_RAND_S
+return __winxp_rand_s();
+#else
 return _M_mt();
+#endif
   }
 
 

Re: [PATCH] x86: fix/improve vgf2p8affine*qb insns

2019-06-27 Thread Jan Beulich
>>> On 27.06.19 at 12:58,  wrote:
> On Thu, Jun 27, 2019 at 12:49 PM Jan Beulich  wrote:
>>
>> >>> On 27.06.19 at 12:20,  wrote:
>> > On Thu, Jun 27, 2019 at 10:57 AM Jan Beulich  wrote:
>> >>
>> >> - the affine transformations are not commutative (the two source
>> >>   operands have entirely different meaning)
>> >> - there's no need for three alternatives
>> >> - the nonimmediate_operand/Bm combination can better be vector_operand/m
>> >>
>> >> gcc/
>> >> 2019-06-27  Jan Beulich  
>> >>
>> >> * config/i386/sse.md (vgf2p8affineinvqb_,
>> >> vgf2p8affineqb_): Drop % constraint modifier.
>> >> Eliminate redundant alternative.  Use vector_operand plus "m"
>> >> constraint.
>> >
>> > Please just drop % modifier and use vector_operand here. IIRC,
>> > register allocator operates on constraints, it doesn't care for
>> > predicates. But predicates shouldn't be more constrained than
>> > constraints. So, having "m" instead of "Bm" is a bad idea with
>> > vector_operand.
>>
>> Well, putting back the Bm is easy (if that's really needed). But do
>> you also mean me to put back to redundant 3rd alternative?
> 
> It is not redundant, "x" and "v" are different register constraints.

Well, yes, "v" is covering a wider set than "x". But only if AVX512F
is enabled. The original combinations ("x", "avx") and ("v", "avx512f")
are thus effectively the same as the new ("v", "avx"). But yes - I
guess I'll split this from the actual bug fix.

Jan




Re: Use ODR for canonical types construction in LTO

2019-06-27 Thread Jan Hubicka
> On Thu, 27 Jun 2019, Jan Hubicka wrote:
> 
> > Hi,
> > here is update patch I am re-testing. Ok if it suceeds?
> 
> + if (!type_in_anonymous_namespace_p (t))
> +   hash = htab_hash_string (IDENTIFIER_POINTER
> +  (DECL_ASSEMBLER_NAME
> +  (TYPE_NAME (t;
> + else
> +   hash = TYPE_UID (t);
> + num_canonical_type_hash_entries++;
> + bool existed_p = canonical_type_hash_cache->put (prevail, hash);
> 
> but you still cache the hash for prevail rather than t while we
> are eventually asking for the hash of t again?  I suppose not since
> we just adjusted TYPE_CANONICAL and we're then supposed to pick
> the hash from the canonical type?  I think this deserves a comment.
> And it should use 'prevail' then everywhere here for consistency,
> otherwise it really looks odd.

Yes, all ODR variants will get prevail as their TYPE_CANONICAL and
therefore we need to cache its hash.
I will add comment.
> 
> > @@ -357,7 +367,7 @@ iterative_hash_canonical_type (tree type  
> >optimal order.  To avoid quadratic behavior also register the
> >type here.  */
> >v = hash_canonical_type (type);
> > -  gimple_register_canonical_type_1 (type, v);
> > +  v = gimple_register_canonical_type_1 (type, v);
> >  }
> >hstate.add_int (v);
> >  }
> 
> Just noticed this should be
> 
> hstate.merge_hash (v);
> 
> results in the very same effect but is clearer IMHO.

OK, will fix that too, re-test and commit.
Thanks!

Honza


Re: [PATCH] x86: fix/improve vgf2p8affine*qb insns

2019-06-27 Thread Uros Bizjak
On Thu, Jun 27, 2019 at 1:39 PM Jan Beulich  wrote:
>
> >>> On 27.06.19 at 12:58,  wrote:
> > On Thu, Jun 27, 2019 at 12:49 PM Jan Beulich  wrote:
> >>
> >> >>> On 27.06.19 at 12:20,  wrote:
> >> > On Thu, Jun 27, 2019 at 10:57 AM Jan Beulich  wrote:
> >> >>
> >> >> - the affine transformations are not commutative (the two source
> >> >>   operands have entirely different meaning)
> >> >> - there's no need for three alternatives
> >> >> - the nonimmediate_operand/Bm combination can better be vector_operand/m
> >> >>
> >> >> gcc/
> >> >> 2019-06-27  Jan Beulich  
> >> >>
> >> >> * config/i386/sse.md (vgf2p8affineinvqb_,
> >> >> vgf2p8affineqb_): Drop % constraint modifier.
> >> >> Eliminate redundant alternative.  Use vector_operand plus "m"
> >> >> constraint.
> >> >
> >> > Please just drop % modifier and use vector_operand here. IIRC,
> >> > register allocator operates on constraints, it doesn't care for
> >> > predicates. But predicates shouldn't be more constrained than
> >> > constraints. So, having "m" instead of "Bm" is a bad idea with
> >> > vector_operand.
> >>
> >> Well, putting back the Bm is easy (if that's really needed). But do
> >> you also mean me to put back to redundant 3rd alternative?
> >
> > It is not redundant, "x" and "v" are different register constraints.
>
> Well, yes, "v" is covering a wider set than "x". But only if AVX512F
> is enabled. The original combinations ("x", "avx") and ("v", "avx512f")
> are thus effectively the same as the new ("v", "avx"). But yes - I
> guess I'll split this from the actual bug fix.

No, you are correct. Please use "v", and remove the redundant alternative.

Uros.


Re: [PATCH] rs6000: Enable -fvariable-expansion-in-unroller by default

2019-06-27 Thread Segher Boessenkool
On Thu, Jun 27, 2019 at 11:33:45AM +0200, Richard Biener wrote:
> On Thu, Jun 27, 2019 at 5:23 AM Bill Schmidt  wrote:
> > We've done some experimenting and realized that the subject option almost
> > always provide improved performance for Power when the loop unroller is
> > enabled.  So this patch turns that flag on by default for us.
> 
> I guess it creates more freedom for combine (more single-uses) and register
> allocation.  I wonder in which cases this might pessimize things?  I guess
> the pre-RA scheduler might make RAs life harder with creating overlapping
> life-ranges.
> 
> I guess you didn't actually investigate the nature of the improvements you 
> saw?

It breaks the length of dependency chains by a factor equal to the unroll
factor.  I do not know why this doesn't help a lot everywhere.  It of
course raises register pressure, maybe that is just it?

> Do we want to adjust the flags documentation, saying whether this is enabled
> by default depends on the target (or even list them)?

Good idea, thanks.


Segher


Re: [PATCH] x86: mark "k" and "Yk" constraints as non-internal

2019-06-27 Thread Jan Beulich
>>> On 27.06.19 at 13:09,  wrote:
> On Thu, Jun 27, 2019 at 12:11 PM Jan Beulich  wrote:
>>
>> Without these constraints asm() can't make use of mask registers.
> 
> asm should be deprecated. We have intrinsics for this purpose.

While maybe not explicitly applicable here, the intrinsics aren't
(afaict) providing full flexibility. In particular (just as example)
I haven't found a way to use embedded broadcast with the
intrinsics, but I can easily do so with asm().

Furthermore there are other reasons to use asm() - things like
the Linux kernel are full of it for a reason. And once one has
to use asm(), the resulting code typically is easier to follow if
one doesn't further intermix it with uses of builtins.

And finally, if asm() was indeed meant to be deprecated, how
come it pretty recently got extended to allow for "inline"?

Jan




Re: [PATCH] x86: mark "k" and "Yk" constraints as non-internal

2019-06-27 Thread Uros Bizjak
On Thu, Jun 27, 2019 at 1:46 PM Jan Beulich  wrote:
>
> >>> On 27.06.19 at 13:09,  wrote:
> > On Thu, Jun 27, 2019 at 12:11 PM Jan Beulich  wrote:
> >>
> >> Without these constraints asm() can't make use of mask registers.
> >
> > asm should be deprecated. We have intrinsics for this purpose.
>
> While maybe not explicitly applicable here, the intrinsics aren't
> (afaict) providing full flexibility. In particular (just as example)
> I haven't found a way to use embedded broadcast with the
> intrinsics, but I can easily do so with asm().
>
> Furthermore there are other reasons to use asm() - things like
> the Linux kernel are full of it for a reason. And once one has
> to use asm(), the resulting code typically is easier to follow if
> one doesn't further intermix it with uses of builtins.
>
> And finally, if asm() was indeed meant to be deprecated, how
> come it pretty recently got extended to allow for "inline"?

I didn't mean that asm() in general should be deprecated, but for SSE
and other vector extensions, where intrinsics are available,
intrinsics should be used instead. There was exactly zero requests to
use new asm constraints, it looks that people are satisfied with
intrinsics approach (which is also future-proof, etc).

Uros.


Re: [PATCH] rs6000: Enable -fvariable-expansion-in-unroller by default

2019-06-27 Thread Bill Schmidt
On 6/27/19 6:45 AM, Segher Boessenkool wrote:
> On Thu, Jun 27, 2019 at 11:33:45AM +0200, Richard Biener wrote:
>> On Thu, Jun 27, 2019 at 5:23 AM Bill Schmidt  wrote:
>>> We've done some experimenting and realized that the subject option almost
>>> always provide improved performance for Power when the loop unroller is
>>> enabled.  So this patch turns that flag on by default for us.
>> I guess it creates more freedom for combine (more single-uses) and register
>> allocation.  I wonder in which cases this might pessimize things?  I guess
>> the pre-RA scheduler might make RAs life harder with creating overlapping
>> life-ranges.
>>
>> I guess you didn't actually investigate the nature of the improvements you 
>> saw?
> It breaks the length of dependency chains by a factor equal to the unroll
> factor.  I do not know why this doesn't help a lot everywhere.  It of
> course raises register pressure, maybe that is just it?

Right, it's all about breaking dependencies to more efficiently exploit
the microarchitecture.  By default, variable expansion in GCC is quite
conservative, creating only two reduction streams out of one, so it's
pretty rare for it to cause spill.  This can be adjusted upwards with
--param max-variable-expansions-in-unroller=n.  Our experiments show
that raising n to 4 starts to cause some minor degradations, which are
almost certainly due to pressure, so the default setting looks appropriate.
>
>> Do we want to adjust the flags documentation, saying whether this is enabled
>> by default depends on the target (or even list them)?
> Good idea, thanks.

OK, I'll update the docs and make the change that Segher requested. 
Thanks for the reviews!

Bill
>
>
> Segher



Re: [PATCH] x86: mark "k" and "Yk" constraints as non-internal

2019-06-27 Thread Jan Beulich
>>> On 27.06.19 at 14:00,  wrote:
> On Thu, Jun 27, 2019 at 1:46 PM Jan Beulich  wrote:
>>
>> >>> On 27.06.19 at 13:09,  wrote:
>> > On Thu, Jun 27, 2019 at 12:11 PM Jan Beulich  wrote:
>> >>
>> >> Without these constraints asm() can't make use of mask registers.
>> >
>> > asm should be deprecated. We have intrinsics for this purpose.
>>
>> While maybe not explicitly applicable here, the intrinsics aren't
>> (afaict) providing full flexibility. In particular (just as example)
>> I haven't found a way to use embedded broadcast with the
>> intrinsics, but I can easily do so with asm().
>>
>> Furthermore there are other reasons to use asm() - things like
>> the Linux kernel are full of it for a reason. And once one has
>> to use asm(), the resulting code typically is easier to follow if
>> one doesn't further intermix it with uses of builtins.
>>
>> And finally, if asm() was indeed meant to be deprecated, how
>> come it pretty recently got extended to allow for "inline"?
> 
> I didn't mean that asm() in general should be deprecated, but for SSE
> and other vector extensions, where intrinsics are available,
> intrinsics should be used instead. There was exactly zero requests to
> use new asm constraints, it looks that people are satisfied with
> intrinsics approach (which is also future-proof, etc).

So what about my embedded broadcast example then? "Zero
requests" is clearly not exactly right. It simply didn't occur to me
(until I noticed the @internal here) that I should raise such a
request, rather than just using asm(). Subsequently I did then
notice "Yh" going away, complicating things further ...

I'd also like to note that the choice of types on some of the builtins
makes it rather cumbersome to use them. Especially for scalar
operations I've found myself better resorting to asm(). See
https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=tools/tests/x86_emulator/simd.c
(most of the changes submitted (not so) recently have been
coming from the work of putting together this and its sibling
tests for the Xen Project instruction emulator).

Jan




Re: [PATCH] x86: mark "k" and "Yk" constraints as non-internal

2019-06-27 Thread Uros Bizjak
On Thu, Jun 27, 2019 at 2:23 PM Jan Beulich  wrote:
>
> >>> On 27.06.19 at 14:00,  wrote:
> > On Thu, Jun 27, 2019 at 1:46 PM Jan Beulich  wrote:
> >>
> >> >>> On 27.06.19 at 13:09,  wrote:
> >> > On Thu, Jun 27, 2019 at 12:11 PM Jan Beulich  wrote:
> >> >>
> >> >> Without these constraints asm() can't make use of mask registers.
> >> >
> >> > asm should be deprecated. We have intrinsics for this purpose.
> >>
> >> While maybe not explicitly applicable here, the intrinsics aren't
> >> (afaict) providing full flexibility. In particular (just as example)
> >> I haven't found a way to use embedded broadcast with the
> >> intrinsics, but I can easily do so with asm().
> >>
> >> Furthermore there are other reasons to use asm() - things like
> >> the Linux kernel are full of it for a reason. And once one has
> >> to use asm(), the resulting code typically is easier to follow if
> >> one doesn't further intermix it with uses of builtins.
> >>
> >> And finally, if asm() was indeed meant to be deprecated, how
> >> come it pretty recently got extended to allow for "inline"?
> >
> > I didn't mean that asm() in general should be deprecated, but for SSE
> > and other vector extensions, where intrinsics are available,
> > intrinsics should be used instead. There was exactly zero requests to
> > use new asm constraints, it looks that people are satisfied with
> > intrinsics approach (which is also future-proof, etc).
>
> So what about my embedded broadcast example then? "Zero
> requests" is clearly not exactly right. It simply didn't occur to me
> (until I noticed the @internal here) that I should raise such a
> request, rather than just using asm(). Subsequently I did then
> notice "Yh" going away, complicating things further ...

There was some work by HJ a while ago that implemented automatic
generation of embedded broadcasts. Perhaps he can answer the question.

Uros.

> I'd also like to note that the choice of types on some of the builtins
> makes it rather cumbersome to use them. Especially for scalar
> operations I've found myself better resorting to asm(). See
> https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=tools/tests/x86_emulator/simd.c
> (most of the changes submitted (not so) recently have been
> coming from the work of putting together this and its sibling
> tests for the Xen Project instruction emulator).
>
> Jan
>
>


[PATCH] Add .gnu.lto_.lto section.

2019-06-27 Thread Martin Liška
Hi.

So this is a patch candidate for the .gnu.lto_.lto ELF section.
As mentioned, the section contains information about bytecode version,
compression algorithm used and slim LTO object flag.

One minor issues is that I need to append random suffix to the section:
  [ 6] .gnu.lto_.lto.36e74acfb145b04b PROGBITS 86 
08 00   E  0   0  1

That's because of in WPA we load more object files and the hash is used for 
mapping.

Is the idea of the patch right?
Martin
>From ec550463df26855982b7b70393766cc8e7c6cc8e Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Fri, 21 Jun 2019 12:14:04 +0200
Subject: [PATCH] Add .gnu.lto_.lto section.

---
 gcc/lto-section-in.c   |  9 +++--
 gcc/lto-section-out.c  |  2 --
 gcc/lto-streamer-out.c | 40 +---
 gcc/lto-streamer.h | 25 +
 gcc/lto/lto-common.c   | 15 +++
 5 files changed, 64 insertions(+), 27 deletions(-)

diff --git a/gcc/lto-section-in.c b/gcc/lto-section-in.c
index 4cfc0cad4be..4e7d1181f23 100644
--- a/gcc/lto-section-in.c
+++ b/gcc/lto-section-in.c
@@ -52,10 +52,10 @@ const char *lto_section_name[LTO_N_SECTION_TYPES] =
   "icf",
   "offload_table",
   "mode_table",
-  "hsa"
+  "hsa",
+  "lto"
 };
 
-
 /* Hooks so that the ipa passes can call into the lto front end to get
sections.  */
 
@@ -146,7 +146,7 @@ lto_get_section_data (struct lto_file_decl_data *file_data,
   /* WPA->ltrans streams are not compressed with exception of function bodies
  and variable initializers that has been verbatim copied from earlier
  compilations.  */
-  if (!flag_ltrans || decompress)
+  if ((!flag_ltrans || decompress) && section_type != LTO_section_lto)
 {
   /* Create a mapping header containing the underlying data and length,
 	 and prepend this to the uncompression buffer.  The uncompressed data
@@ -167,9 +167,6 @@ lto_get_section_data (struct lto_file_decl_data *file_data,
   data = buffer.data + header_length;
 }
 
-  lto_check_version (((const lto_header *)data)->major_version,
-		 ((const lto_header *)data)->minor_version,
-		 file_data->file_name);
   return data;
 }
 
diff --git a/gcc/lto-section-out.c b/gcc/lto-section-out.c
index c91e58f0465..7ae102164ef 100644
--- a/gcc/lto-section-out.c
+++ b/gcc/lto-section-out.c
@@ -285,8 +285,6 @@ lto_destroy_simple_output_block (struct lto_simple_output_block *ob)
   /* Write the header which says how to decode the pieces of the
  t.  */
   memset (&header, 0, sizeof (struct lto_simple_header));
-  header.major_version = LTO_major_version;
-  header.minor_version = LTO_minor_version;
   header.main_size = ob->main_stream->total_size;
   lto_write_data (&header, sizeof header);
 
diff --git a/gcc/lto-streamer-out.c b/gcc/lto-streamer-out.c
index dc68429303c..7dee770aa11 100644
--- a/gcc/lto-streamer-out.c
+++ b/gcc/lto-streamer-out.c
@@ -1974,10 +1974,6 @@ produce_asm (struct output_block *ob, tree fn)
   /* The entire header is stream computed here.  */
   memset (&header, 0, sizeof (struct lto_function_header));
 
-  /* Write the header.  */
-  header.major_version = LTO_major_version;
-  header.minor_version = LTO_minor_version;
-
   if (section_type == LTO_section_function_body)
 header.cfg_size = ob->cfg_stream->total_size;
   header.main_size = ob->main_stream->total_size;
@@ -2270,10 +2266,6 @@ lto_output_toplevel_asms (void)
   /* The entire header stream is computed here.  */
   memset (&header, 0, sizeof (header));
 
-  /* Write the header.  */
-  header.major_version = LTO_major_version;
-  header.minor_version = LTO_minor_version;
-
   header.main_size = ob->main_stream->total_size;
   header.string_size = ob->string_stream->total_size;
   lto_write_data (&header, sizeof header);
@@ -2390,6 +2382,29 @@ prune_offload_funcs (void)
 DECL_PRESERVE_P (fn_decl) = 1;
 }
 
+/* Produce LTO section that contains global information
+   about LTO bytecode.  */
+
+static void
+produce_lto_section ()
+{
+  /* Stream LTO meta section.  */
+  output_block *ob = create_output_block (LTO_section_lto);
+
+  char * section_name = lto_get_section_name (LTO_section_lto, NULL, NULL);
+  lto_begin_section (section_name, false);
+  free (section_name);
+
+  lto_compression compression = ZLIB;
+
+  bool slim_object = flag_generate_lto && !flag_fat_lto_objects;
+  lto_section s
+= { LTO_major_version, LTO_minor_version, slim_object, compression, 0 };
+  lto_write_data (&s, sizeof s);
+  lto_end_section ();
+  destroy_output_block (ob);
+}
+
 /* Main entry point from the pass manager.  */
 
 void
@@ -2412,6 +2427,8 @@ lto_output (void)
   /* Initialize the streamer.  */
   lto_streamer_init ();
 
+  produce_lto_section ();
+
   n_nodes = lto_symtab_encoder_size (encoder);
   /* Process only the functions with bodies.  */
   for (i = 0; i < n_nodes; i++)
@@ -2827,10 +2844,6 @@ lto_write_mode_table (void)
   struct lto_simple_header_with_strings header;
   memset (&header, 0, sizeof (head

Re: [PATCH] Deprecate -frepo option.

2019-06-27 Thread Martin Liška
On 6/21/19 4:28 PM, Richard Biener wrote:
> On Fri, Jun 21, 2019 at 4:13 PM Jakub Jelinek  wrote:
>>
>> On Fri, Jun 21, 2019 at 04:04:00PM +0200, Martin Liška wrote:
>>> On 6/21/19 1:58 PM, Jakub Jelinek wrote:
 On Fri, Jun 21, 2019 at 01:52:09PM +0200, Martin Liška wrote:
> On 6/21/19 1:47 PM, Jonathan Wakely wrote:
>> On Fri, 21 Jun 2019 at 11:40, Martin Liška wrote:
>>> Yes, I would be fine to deprecate that for GCC 10.1
>>
>> Would it be appropriate to issue a warning in GCC 10.x if the option is 
>> used?
>
> Sure. With the patch attached one will see:
>
> $ gcc -frepo /tmp/main.cc -c
> gcc: warning: switch ‘-frepo’ is no longer supported
>
> I'm sending patch that also removes -frepo tests from test-suite.
> I've been testing the patch.

 IMHO for just deprecation of an option you don't want to remove it from the
 testsuite, just match the warning it will generate in those tests, and
 I'm not convinced you want to remove it from the documentation (rather than
 just saying in the documentation that the option is deprecated and might be
 removed in a later GCC version).
>>>
>>> Agree with you. I'm sending updated version of the patch.
>>> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
>>
>> I'm also not convinced about the Deprecated flag, seems like that is a flag
>> that we use for options that have been already removed.
>> So, instead there should be some proper warning in the C++ FE for it,
>> or just Warn.
> 
> In principle -frepo is a nice idea - does it live up to its promises?  That 
> is,
> does it actually work, for example when throwing it on the libstdc++
> testsuite or a larger C++ project? 

@Jonathan, Jason: Do we know whether it really work?

> The option doesn't document
> optimization issues but I assume template bodies are not available
> for IPA optimizations unless -frepo is combined with LTO where the
> template CU[s] then bring them in.
> 
> So I'm not sure - do we really want to remove this feature?
> 
> Richard.
> 
>> Jakub



Re: [PATCH] Deprecate -frepo option.

2019-06-27 Thread Jonathan Wakely
On Thu, 27 Jun 2019 at 13:30, Martin Liška  wrote:
>
> On 6/21/19 4:28 PM, Richard Biener wrote:
> > On Fri, Jun 21, 2019 at 4:13 PM Jakub Jelinek  wrote:
> >>
> >> On Fri, Jun 21, 2019 at 04:04:00PM +0200, Martin Liška wrote:
> >>> On 6/21/19 1:58 PM, Jakub Jelinek wrote:
>  On Fri, Jun 21, 2019 at 01:52:09PM +0200, Martin Liška wrote:
> > On 6/21/19 1:47 PM, Jonathan Wakely wrote:
> >> On Fri, 21 Jun 2019 at 11:40, Martin Liška wrote:
> >>> Yes, I would be fine to deprecate that for GCC 10.1
> >>
> >> Would it be appropriate to issue a warning in GCC 10.x if the option 
> >> is used?
> >
> > Sure. With the patch attached one will see:
> >
> > $ gcc -frepo /tmp/main.cc -c
> > gcc: warning: switch ‘-frepo’ is no longer supported
> >
> > I'm sending patch that also removes -frepo tests from test-suite.
> > I've been testing the patch.
> 
>  IMHO for just deprecation of an option you don't want to remove it from 
>  the
>  testsuite, just match the warning it will generate in those tests, and
>  I'm not convinced you want to remove it from the documentation (rather 
>  than
>  just saying in the documentation that the option is deprecated and might 
>  be
>  removed in a later GCC version).
> >>>
> >>> Agree with you. I'm sending updated version of the patch.
> >>> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> >>
> >> I'm also not convinced about the Deprecated flag, seems like that is a flag
> >> that we use for options that have been already removed.
> >> So, instead there should be some proper warning in the C++ FE for it,
> >> or just Warn.
> >
> > In principle -frepo is a nice idea - does it live up to its promises?  That 
> > is,
> > does it actually work, for example when throwing it on the libstdc++
> > testsuite or a larger C++ project?
>
> @Jonathan, Jason: Do we know whether it really work?

I don't know. It's nearly 20 years since I've tried it, but apparently
a few people try using it:
https://stackoverflow.com/search?q=frepo+c%2B%2B

The first result was answered by me in 2012 saying nobody uses it:
https://stackoverflow.com/a/11832613/981959


Re: [PATCH] x86: mark "k" and "Yk" constraints as non-internal

2019-06-27 Thread Segher Boessenkool
On Thu, Jun 27, 2019 at 05:46:00AM -0600, Jan Beulich wrote:
> While maybe not explicitly applicable here, the intrinsics aren't
> (afaict) providing full flexibility. In particular (just as example)
> I haven't found a way to use embedded broadcast with the
> intrinsics, but I can easily do so with asm().
> 
> Furthermore there are other reasons to use asm() - things like
> the Linux kernel are full of it for a reason.

Some of it for a (good) reason.  Yes.

> And once one has
> to use asm(), the resulting code typically is easier to follow if
> one doesn't further intermix it with uses of builtins.

If you have to write more than a few lines of assembler, you are usually
*much* better off just writing it in an assembler source file (a .s).

> And finally, if asm() was indeed meant to be deprecated, how
> come it pretty recently got extended to allow for "inline"?

That was just a simple extension for a very specific purpose.  This does
not mean you should use inline assembler if there are better alternatives.
I'm not sure Linux using such extremely huge inline assembler statements
is a good idea at all, or if there are better ways to do what they want
(not that I see any fwiw); but they already did, and this takes the pain
away a bit for them, so why not.

You should not use inline assembler when there are better way of
expressing what you want.  Like with builtins, for example.  Inline
assembler is hard to write correctly, and hard to *keep* correct.  It is
mixing abstractions (that is its *purpose*), which of course makes it a
royal pain to deal with, esp. if overused.


Segher


Re: [PATCH] Handle '\0' in strcmp in RTL expansion (PR tree-optimization/90892).

2019-06-27 Thread Martin Liška
On 6/18/19 12:16 PM, Jakub Jelinek wrote:
> I think any such length changes should be moved after the two punt checks.
> Move also the len3 setting before the new checks (of course conditional on
> is_ncmp).

Ok, I'm sending updated version of the patch that addresses this.
I've been testing the patch.

Ready to be installed then?
Thanks,
Martin
>From 35043ab431c0dc1e8dcda484725a1f8875a4b95b Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Mon, 17 Jun 2019 10:39:15 +0200
Subject: [PATCH] Handle '\0' in strcmp in RTL expansion (PR
 tree-optimization/90892).

gcc/ChangeLog:

2019-06-17  Martin Liska  

	PR tree-optimization/90892
	* builtins.c (inline_expand_builtin_string_cmp): Handle '\0'
	in string constants.

gcc/testsuite/ChangeLog:

2019-06-17  Martin Liska  

	PR tree-optimization/90892
	* gcc.dg/pr90892.c: New test.
---
 gcc/builtins.c | 17 ++---
 gcc/testsuite/gcc.dg/pr90892.c | 14 ++
 2 files changed, 28 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/pr90892.c

diff --git a/gcc/builtins.c b/gcc/builtins.c
index c53afe8b033..db7939cfde8 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -7118,8 +7118,19 @@ inline_expand_builtin_string_cmp (tree exp, rtx target)
 return NULL_RTX;
 
   /* For strncmp, if the length is not a const, not qualify.  */
-  if (is_ncmp && !tree_fits_uhwi_p (len3_tree))
-return NULL_RTX;
+  if (is_ncmp)
+{
+  if (!tree_fits_uhwi_p (len3_tree))
+	return NULL_RTX;
+  else
+	len3 = tree_to_uhwi (len3_tree);
+}
+
+  if (src_str1 != NULL)
+len1 = strnlen (src_str1, len1) + 1;
+
+  if (src_str2 != NULL)
+len2 = strnlen (src_str2, len2) + 1;
 
   int const_str_n = 0;
   if (!len1)
@@ -7134,7 +7145,7 @@ inline_expand_builtin_string_cmp (tree exp, rtx target)
   gcc_checking_assert (const_str_n > 0);
   length = (const_str_n == 1) ? len1 : len2;
 
-  if (is_ncmp && (len3 = tree_to_uhwi (len3_tree)) < length)
+  if (is_ncmp && len3 < length)
 length = len3;
 
   /* If the length of the comparision is larger than the threshold,
diff --git a/gcc/testsuite/gcc.dg/pr90892.c b/gcc/testsuite/gcc.dg/pr90892.c
new file mode 100644
index 000..e4b5310807a
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/pr90892.c
@@ -0,0 +1,14 @@
+/* PR tree-optimization/90892 */
+/* { dg-do run } */
+/* { dg-options "-O2" } */
+
+const char *a = "A\0b";
+
+int
+main()
+{
+  if (__builtin_strncmp(a, "A\0", 2) != 0)
+__builtin_abort ();
+
+  return 0;
+}
-- 
2.21.0



RE: Use ODR for canonical types construction in LTO

2019-06-27 Thread JiangNing OS
No. Since this is LTO, it's very hard to simplify the big application. Sorry 
for that.

I think Christophe is mentioning the case from g++.dg is reporting the similar 
issue like " type variant differs by TYPE_CXX_ODR_P  ", right?

Thanks,
-Jiangning

> -Original Message-
> From: Jan Hubicka 
> Sent: Thursday, June 27, 2019 2:29 PM
> To: JiangNing OS 
> Cc: Eric Botcazou ; Christophe Lyon
> ; gcc Patches ;
> Richard Biener ; d...@dcepelik.cz; Martin Liška
> 
> Subject: Re: Use ODR for canonical types construction in LTO
> 
> > Hi,
> >
> > This commit
> > https://gcc.gnu.org/viewcvs/gcc?view=revision&revision=272628 is
> > breaking trunk LTO on some real benchmarks, so can it be fixed or
> > reverted? For example,
> 
> Do you have a testcase?
> Honza
> >
> > lto1: error: type variant differs by TYPE_CXX_ODR_P   > 0x99943d08 __va_list BLK
> > size  bitsizetype> constant 256>
> > unit-size  sizetype> constant 32>
> > align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
> 0x99943d08
> > fields  > type  > void>
> > public unsigned DI
> > size 
> > unit-size 
> > align:64 warn_if_not_align:0 symtab:0 alias-set 7 
> > structural-equality
> > pointer_to_this >
> > unsigned DI :0:0 size  
> > unit-size
> 
> > align:64 warn_if_not_align:0 offset_align 128
> > offset 
> > bit-offset  context
> 
> > chain  0x99940fc0>
> > unsigned DI :0:0 size  
> > unit-
> size 
> > align:64 warn_if_not_align:0 offset_align 128 offset 
> >  0x99890c30 0> bit-offset  context
>  chain  __vr_top>>>
> > reference_to_this  chain  > 0x99960098 __va_list>>   BLK
> > size  bitsizetype> constant 256>
> > unit-size  sizetype> constant 32>
> > align:64 warn_if_not_align:0 symtab:0 alias-set -1 canonical-type
> 0x99943d08
> > fields  > type  > void>
> > public unsigned DI
> > size 
> > unit-size 
> > align:64 warn_if_not_align:0 symtab:0 alias-set 7 
> > structural-equality
> > pointer_to_this >
> > unsigned DI :0:0 size  
> > unit-size
> 
> > align:64 warn_if_not_align:0 offset_align 128
> > offset 
> > bit-offset  context
> 
> > chain  0x99940fc0>
> > unsigned DI :0:0 size  
> > unit-
> size 
> > align:64 warn_if_not_align:0 offset_align 128 offset 
> >  0x99890c30 0> bit-offset  context
>  chain  __vr_top>>>
> > pointer_to_this  reference_to_this
> > >
> > lto1: internal compiler error: 'verify_type' failed
> > 0xe33e93 verify_type(tree_node const*)
> > ../../gcc/gcc/tree.c:14655
> > 0x5efc4b lto_fixup_state
> > ../../gcc/gcc/lto/lto-common.c:2429
> > 0x5fc01b lto_fixup_decls
> > ../../gcc/gcc/lto/lto-common.c:2460
> > 0x5fc01b read_cgraph_and_symbols(unsigned int, char const**)
> > ../../gcc/gcc/lto/lto-common.c:2693
> > 0x5ded23 lto_main()
> > ../../gcc/gcc/lto/lto.c:616
> > Please submit a full bug report,
> > with preprocessed source if appropriate.
> > Please include the complete backtrace with any bug report.
> > See  for instructions.
> > lto-wrapper: fatal error: /home/amptest/gcc/install_last//bin/g++
> > returned 1 exit status compilation terminated.
> > /usr/bin/ld: error: lto-wrapper failed
> > collect2: error: ld returned 1 exit status
> >
> > Thanks,
> > -Jiangning
> >
> > > -Original Message-
> > > From: gcc-patches-ow...@gcc.gnu.org 
> > > On Behalf Of Christophe Lyon
> > > Sent: Tuesday, June 25, 2019 8:30 PM
> > > To: Jan Hubicka 
> > > Cc: Eric Botcazou ; gcc Patches  > > patc...@gcc.gnu.org>; Richard Biener ;
> > > d...@dcepelik.cz; Martin Liška 
> > > Subject: Re: Use ODR for canonical types construction in LTO
> > >
> > > Hi,
> > >
> > >
> > > On Tue, 25 Jun 2019 at 10:20, Jan Hubicka  wrote:
> > > >
> > > > > > * gcc-interface/decl.c (gnat_to_gnu_entity): Check that
> > > > > > type is array or integer prior checking string flag.
> > > > >
> > > > > The test for array is superfluous here.
> > > > >
> > > > > > * gcc-interface/gigi.h (gnat_signed_type_for,
> > > > > > maybe_character_value): Likewise.
> > > > >
> > > > > Wrong ChangeLog, the first modified function is
> maybe_character_type.
> > > > >
> > > > > I have installed the attached patchlet after testing it on 
> > > > > x86-64/Linux.
> > > > >
> > > > >
> > > > >   * gcc-interface/decl.c (gnat_to_gnu_entity): Remove
> > > > > superfluous test
> > > in
> > > > >   previous change.
> > > > >   * gcc-interface/gigi.h (maybe_character_type): Fix formatting.
> > > > >   (maybe_character_value): Likewise.
> > > >
> > > > Thanks a lot. I was not quite sure if ARRAY_TYPEs can happen there
> > > > and I should have added you to the CC.
> > > >
> > >
> > > After the main commit (r272628), I have noticed regressions

Re: [PATCH][gcc] libgccjit: add bitfield support

2019-06-27 Thread Andrea Corallo
Hi Dave,
last version for this patch addressing the suggestion about the
JIT_BIT_FIELD macros comment description.

Thank you for all the suggestions.

Regarding the write access please see my previous answer into the binary
op patch thread.

Bests
  Andrea


2019-06-20  Andrea Corallo andrea.cora...@arm.com

* docs/topics/compatibility.rst (LIBGCCJIT_ABI_12): New ABI tag.
* docs/topics/types.rst: Add gcc_jit_context_new_bitfield.
* jit-common.h (namespace recording): Add class bitfield.
* jit-playback.c:
(DECL_JIT_BIT_FIELD, SET_DECL_JIT_BIT_FIELD): Add macros.
(playback::context::new_bitfield): New method.
(playback::compound_type::set_fields): Add bitfield support.
(playback::lvalue::mark_addressable): Was jit_mark_addressable make this
a method of lvalue plus return a bool to communicate success.
(playback::lvalue::get_address): Check for jit_mark_addressable return
value.
* jit-playback.h (new_bitfield): New method.
(class bitfield): New class.
(class lvalue): Add jit_mark_addressable method.
* jit-recording.c (recording::context::new_bitfield): New method.
(recording::bitfield::replay_into): New method.
(recording::bitfield::write_to_dump): Likewise.
(recording::bitfield::make_debug_string): Likewise.
(recording::bitfield::write_reproducer): Likewise.
* jit-recording.h (class context): Add new_bitfield method.
(class field): Make it derivable by class bitfield.
(class bitfield): Add new class.
* libgccjit++.h (class context): Add new_bitfield method.
* libgccjit.c (struct gcc_jit_bitfield): New structure.
(gcc_jit_context_new_bitfield): New function.
* libgccjit.h
(LIBGCCJIT_HAVE_gcc_jit_context_new_bitfield) New macro.
(gcc_jit_context_new_bitfield): New function.
* libgccjit.map (LIBGCCJIT_ABI_12) New ABI tag.


2019-06-20  Andrea Corallo andrea.cora...@arm.com

* jit.dg/all-non-failing-tests.h: Add test-accessing-bitfield.c.
* jit.dg/test-accessing-bitfield.c: New testcase.
* jit.dg/test-error-gcc_jit_context_new_bitfield-invalid-type.c:
Likewise.
* jit.dg/test-error-gcc_jit_context_new_bitfield-invalid-width.c:
Likewise.
* jit.dg/test-error-gcc_jit_lvalue_get_address-bitfield.c:
Likewise.
diff --git a/gcc/jit/docs/topics/compatibility.rst b/gcc/jit/docs/topics/compatibility.rst
index abefa56..da64920 100644
--- a/gcc/jit/docs/topics/compatibility.rst
+++ b/gcc/jit/docs/topics/compatibility.rst
@@ -177,3 +177,8 @@ entrypoints:
 
 ``LIBGCCJIT_ABI_11`` covers the addition of
 :func:`gcc_jit_context_add_driver_option`
+
+``LIBGCCJIT_ABI_12``
+
+``LIBGCCJIT_ABI_12`` covers the addition of
+:func:`gcc_jit_context_new_bitfield`
diff --git a/gcc/jit/docs/topics/types.rst b/gcc/jit/docs/topics/types.rst
index 1d2dcd4..37d9d01 100644
--- a/gcc/jit/docs/topics/types.rst
+++ b/gcc/jit/docs/topics/types.rst
@@ -247,6 +247,30 @@ You can model C `struct` types by creating :c:type:`gcc_jit_struct *` and
underlying string, so it is valid to pass in a pointer to an on-stack
buffer.
 
+.. function:: gcc_jit_field *\
+  gcc_jit_context_new_bitfield (gcc_jit_context *ctxt,\
+gcc_jit_location *loc,\
+gcc_jit_type *type,\
+int width,\
+const char *name)
+
+   Construct a new bit field, with the given type width and name.
+
+   The parameter ``name`` must be non-NULL.  The call takes a copy of the
+   underlying string, so it is valid to pass in a pointer to an on-stack
+   buffer.
+
+   The parameter ``type`` must be an integer type.
+
+   The parameter ``width`` must be a positive integer that does not exceed the
+   size of ``type``.
+
+   This API entrypoint was added in :ref:`LIBGCCJIT_ABI_12`; you can test
+   for its presence using
+   .. code-block:: c
+
+  #ifdef LIBGCCJIT_HAVE_gcc_jit_context_new_bitfield
+
 .. function:: gcc_jit_object *\
   gcc_jit_field_as_object (gcc_jit_field *field)
 
diff --git a/gcc/jit/jit-common.h b/gcc/jit/jit-common.h
index 1d96cc3..e747d96 100644
--- a/gcc/jit/jit-common.h
+++ b/gcc/jit/jit-common.h
@@ -119,6 +119,7 @@ namespace recording {
 	class union_;
   class vector_type;
 class field;
+  class bitfield;
 class fields;
 class function;
 class block;
diff --git a/gcc/jit/jit-playback.h b/gcc/jit/jit-playback.h
index bc4de9c..d4b148e 100644
--- a/gcc/jit/jit-playback.h
+++ b/gcc/jit/jit-playback.h
@@ -75,6 +75,12 @@ public:
 	 type *type,
 	 const char *name);
 
+  field *
+  new_bitfield (location *loc,
+		type *type,
+		int width,
+		const char *name);
+
   compound_type *
   new_compound_type (location *loc,
 		 const char *name,
@@ -426,6 +432,8 @@ private:
   tree m_inner;
 };
 
+class bitfield : public field {};
+
 class function : public wrapper
 {
 public:
@@ -614,6 +622,8 @@ public:
   rvalue *
   get_address (location *loc);
 
+private:
+  bool mark_addressable (location *loc);
 };
 

[PATCH] Fix 2 clang warnings.

2019-06-27 Thread Martin Liška
Hi.

This reduces 2 warnings reported by clang.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

gcc/ChangeLog:

2019-06-27  Martin Liska  

* edit-context.c (test_applying_fixits_unreadable_file): Do not
use () for a constructor call.
(test_applying_fixits_line_out_of_range): Likewise.
* ggc-page.c (free_page): Use (char *) for %p printf format
argument.
---
 gcc/edit-context.c | 4 ++--
 gcc/ggc-page.c | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)


diff --git a/gcc/edit-context.c b/gcc/edit-context.c
index d3246ab0334..93d10664ae9 100644
--- a/gcc/edit-context.c
+++ b/gcc/edit-context.c
@@ -1639,7 +1639,7 @@ static void
 test_applying_fixits_unreadable_file ()
 {
   const char *filename = "this-does-not-exist.txt";
-  line_table_test ltt ();
+  line_table_test ltt;
   linemap_add (line_table, LC_ENTER, false, filename, 1);
 
   location_t loc = linemap_position_for_column (line_table, 1);
@@ -1670,7 +1670,7 @@ test_applying_fixits_line_out_of_range ()
   const char *old_content = "One-liner file\n";
   temp_source_file tmp (SELFTEST_LOCATION, ".txt", old_content);
   const char *filename = tmp.get_filename ();
-  line_table_test ltt ();
+  line_table_test ltt;
   linemap_add (line_table, LC_ENTER, false, filename, 2);
 
   /* Try to insert a string in line 2.  */
diff --git a/gcc/ggc-page.c b/gcc/ggc-page.c
index 7066ef2c488..602dd7c61b7 100644
--- a/gcc/ggc-page.c
+++ b/gcc/ggc-page.c
@@ -977,7 +977,7 @@ free_page (page_entry *entry)
   if (GGC_DEBUG_LEVEL >= 2)
 fprintf (G.debug_file,
 	 "Deallocating page at %p, data %p-%p\n", (void *) entry,
-	 entry->page, entry->page + entry->bytes - 1);
+	 (char *)entry->page, (char *)entry->page + entry->bytes - 1);
 
   /* Mark the page as inaccessible.  Discard the handle to avoid handle
  leak.  */



[PATCH] Remove another bunch of dead assignment.

2019-06-27 Thread Martin Liška
Hi.

The patch continues to remove quite obvious dead assignments
that are not used.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

gcc/ChangeLog:

2019-06-27  Martin Liska  

* config/i386/i386-expand.c (ix86_expand_sse2_mulvxdi3): Remove
dead assignemts.
* lra-eliminations.c (eliminate_regs_in_insn): Likewise.
* reg-stack.c (check_asm_stack_operands): Likewise.
* tree-ssa-structalias.c (create_function_info_for): Likewise.
* tree-vect-generic.c (expand_vector_operations_1): Likewise.

gcc/c-family/ChangeLog:

2019-06-27  Martin Liska  

* c-common.c (try_to_locate_new_include_insertion_point): Remove
dead assignemts.

gcc/cp/ChangeLog:

2019-06-27  Martin Liska  

* call.c (build_new_op_1): Remove
dead assignemts.
* typeck.c (cp_build_binary_op): Likewise.

gcc/fortran/ChangeLog:

2019-06-27  Martin Liska  

* check.c (gfc_check_c_funloc): Remove
dead assignemts.
* decl.c (variable_decl): Likewise.
* resolve.c (resolve_typebound_function): Likewise.
* simplify.c (gfc_simplify_matmul): Likewise.
(gfc_simplify_scan): Likewise.
* trans-array.c (gfc_could_be_alias): Likewise.
* trans-common.c (add_equivalences): Likewise.
* trans-expr.c (trans_class_vptr_len_assignment): Likewise.
(gfc_trans_array_constructor_copy): Likewise.
(gfc_trans_assignment_1): Likewise.
* trans-intrinsic.c (conv_intrinsic_atomic_op): Likewise.
* trans-openmp.c (gfc_omp_finish_clause): Likewise.
* trans-types.c (gfc_get_array_descriptor_base): Likewise.
* trans.c (gfc_build_final_call): Likewise.

libcpp/ChangeLog:

2019-06-27  Martin Liska  

* line-map.c (linemap_get_expansion_filename): Remove
dead assignemts.
* mkdeps.c (make_write): Likewise.
---
 gcc/c-family/c-common.c   |  4 ++--
 gcc/config/i386/i386-expand.c |  3 +--
 gcc/cp/call.c |  2 +-
 gcc/cp/typeck.c   |  1 -
 gcc/fortran/check.c   | 18 +++---
 gcc/fortran/decl.c|  1 -
 gcc/fortran/resolve.c |  1 -
 gcc/fortran/simplify.c| 27 ---
 gcc/fortran/trans-array.c |  2 --
 gcc/fortran/trans-common.c|  6 ++
 gcc/fortran/trans-expr.c  |  6 --
 gcc/fortran/trans-intrinsic.c |  1 -
 gcc/fortran/trans-openmp.c|  1 -
 gcc/fortran/trans-types.c | 10 +-
 gcc/fortran/trans.c   |  3 ---
 gcc/lra-eliminations.c|  2 +-
 gcc/reg-stack.c   |  1 -
 gcc/tree-ssa-structalias.c|  1 -
 gcc/tree-vect-generic.c   |  2 --
 libcpp/line-map.c |  3 +--
 libcpp/mkdeps.c   |  2 +-
 21 files changed, 33 insertions(+), 64 deletions(-)


diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index da4aadbc590..cb92710f2bc 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -8601,8 +8601,8 @@ try_to_locate_new_include_insertion_point (const char *file, location_t loc)
 
   /*  Get ordinary map containing LOC (or its expansion).  */
   const line_map_ordinary *ord_map_for_loc = NULL;
-  loc = linemap_resolve_location (line_table, loc, LRK_MACRO_EXPANSION_POINT,
-  &ord_map_for_loc);
+  linemap_resolve_location (line_table, loc, LRK_MACRO_EXPANSION_POINT,
+			&ord_map_for_loc);
   gcc_assert (ord_map_for_loc);
 
   for (unsigned int i = 0; i < LINEMAPS_ORDINARY_USED (line_table); i++)
diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
index d50b811d863..1bd251ea8e2 100644
--- a/gcc/config/i386/i386-expand.c
+++ b/gcc/config/i386/i386-expand.c
@@ -19780,8 +19780,7 @@ ix86_expand_sse2_mulvxdi3 (rtx op0, rtx op1, rtx op2)
   emit_insn (gen_vec_widen_umult_even_v4si (t5, 
 	gen_lowpart (V4SImode, op1),
 	gen_lowpart (V4SImode, op2)));
-  op0 = expand_binop (mode, add_optab, t5, t4, op0, 1, OPTAB_DIRECT);
-
+  expand_binop (mode, add_optab, t5, t4, op0, 1, OPTAB_DIRECT);
 }
   else
 {
diff --git a/gcc/cp/call.c b/gcc/cp/call.c
index e4923f4ccbf..07093255505 100644
--- a/gcc/cp/call.c
+++ b/gcc/cp/call.c
@@ -6167,7 +6167,7 @@ build_new_op_1 (const op_location_t &loc, enum tree_code code, int flags,
 	  conv = cand->convs[2];
 	  if (conv->kind == ck_ref_bind)
 		conv = next_conversion (conv);
-	  arg3 = convert_like (conv, arg3, complain);
+	  convert_like (conv, arg3, complain);
 	}
 
 	}
diff --git a/gcc/cp/typeck.c b/gcc/cp/typeck.c
index dd76ebe3dbf..77095953134 100644
--- a/gcc/cp/typeck.c
+++ b/gcc/cp/typeck.c
@@ -5218,7 +5218,6 @@ cp_build_binary_op (const op_location_t &location,
 	}
 	  result_type = build_opaque_vector_type (intt,
 		  TYPE_VECTOR_SUBPARTS (type0));
-	  converted = 1;
 	  return build_vec_cmp (resultcode, result_type, op0, op1);
 	}
   build_type = boolean_type_node;
diff --git a/gcc/f

Re: [PATCH] Fix 2 clang warnings.

2019-06-27 Thread Jakub Jelinek
On Thu, Jun 27, 2019 at 04:03:06PM +0200, Martin Liška wrote:
>   * ggc-page.c (free_page): Use (char *) for %p printf format
>   argument.

> --- a/gcc/ggc-page.c
> +++ b/gcc/ggc-page.c
> @@ -977,7 +977,7 @@ free_page (page_entry *entry)
>if (GGC_DEBUG_LEVEL >= 2)
>  fprintf (G.debug_file,
>"Deallocating page at %p, data %p-%p\n", (void *) entry,
> -  entry->page, entry->page + entry->bytes - 1);
> +  (char *)entry->page, (char *)entry->page + entry->bytes - 1);
>  
>/* Mark the page as inaccessible.  Discard the handle to avoid handle
>   leak.  */

Can you explain this?  It makes no sense to me.  What is the warning?
entry->page already has char * type, so why are any casts needed?
If you want to be pedantic, C says that %p argument should be pointer to
void, so I'd understand more
 (void *) entry->page, (void *) (entry->page + entry->bytes - 1));
with that formatting.

Jakub


Re: [PATCH 21/30] Changes to pdp11

2019-06-27 Thread Paul Koning



> On Jun 25, 2019, at 4:22 PM, acsaw...@linux.ibm.com wrote:
> 
> From: Aaron Sawdey 
> 
>   * config/pdp11/pdp11.md (movmemhi, movmemhi1,
>   movmemhi_nocc, UNSPEC_MOVMEM): Change movmem to cpymem.

Ok, thanks.

paul




Re: [PATCH] Fix 2 clang warnings.

2019-06-27 Thread Martin Liška
On 6/27/19 4:11 PM, Jakub Jelinek wrote:
> On Thu, Jun 27, 2019 at 04:03:06PM +0200, Martin Liška wrote:
>>  * ggc-page.c (free_page): Use (char *) for %p printf format
>>  argument.
> 
>> --- a/gcc/ggc-page.c
>> +++ b/gcc/ggc-page.c
>> @@ -977,7 +977,7 @@ free_page (page_entry *entry)
>>if (GGC_DEBUG_LEVEL >= 2)
>>  fprintf (G.debug_file,
>>   "Deallocating page at %p, data %p-%p\n", (void *) entry,
>> - entry->page, entry->page + entry->bytes - 1);
>> + (char *)entry->page, (char *)entry->page + entry->bytes - 1);
>>  
>>/* Mark the page as inaccessible.  Discard the handle to avoid handle
>>   leak.  */
> 
> Can you explain this?  It makes no sense to me.  What is the warning?
> entry->page already has char * type, so why are any casts needed?
> If you want to be pedantic, C says that %p argument should be pointer to
> void, so I'd understand more
>(void *) entry->page, (void *) (entry->page + entry->bytes - 1));
> with that formatting.

You are right, it should be the other way around:

/home/marxin/Programming/gcc/gcc/ggc-page.c:946:60: warning: format specifies 
type 'void *' but the argument has type 'char *' [-Wformat-pedantic]
 (void *) entry, (unsigned long) OBJECT_SIZE (order), page,
  ^~~~
/home/marxin/Programming/gcc/gcc/ggc-page.c:947:7: warning: format specifies 
type 'void *' but the argument has type 'char *' [-Wformat-pedantic]
 page + entry_size - 1);
 ^
/home/marxin/Programming/gcc/gcc/ggc-page.c:980:7: warning: format specifies 
type 'void *' but the argument has type 'char *' [-Wformat-pedantic]
 entry->page, entry->page + entry->bytes - 1);
 ^~~
/home/marxin/Programming/gcc/gcc/ggc-page.c:980:20: warning: format specifies 
type 'void *' but the argument has type 'char *' [-Wformat-pedantic]
 entry->page, entry->page + entry->bytes - 1);
  ^~

Martin

> 
>   Jakub
> 

>From 648388767ade0f7de683e79dca3fdcda3f2acda6 Mon Sep 17 00:00:00 2001
From: Martin Liska 
Date: Thu, 27 Jun 2019 14:09:50 +0200
Subject: [PATCH] Fix 2 clang warnings.

gcc/ChangeLog:

2019-06-27  Martin Liska  

	* edit-context.c (test_applying_fixits_unreadable_file): Do not
	use () for a constructor call.
	(test_applying_fixits_line_out_of_range): Likewise.
	* ggc-page.c (alloc_page): Use (void *) for %p printf format
	argument.
	(free_page): Likewise.
---
 gcc/edit-context.c | 4 ++--
 gcc/ggc-page.c | 6 +++---
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/gcc/edit-context.c b/gcc/edit-context.c
index d3246ab0334..93d10664ae9 100644
--- a/gcc/edit-context.c
+++ b/gcc/edit-context.c
@@ -1639,7 +1639,7 @@ static void
 test_applying_fixits_unreadable_file ()
 {
   const char *filename = "this-does-not-exist.txt";
-  line_table_test ltt ();
+  line_table_test ltt;
   linemap_add (line_table, LC_ENTER, false, filename, 1);
 
   location_t loc = linemap_position_for_column (line_table, 1);
@@ -1670,7 +1670,7 @@ test_applying_fixits_line_out_of_range ()
   const char *old_content = "One-liner file\n";
   temp_source_file tmp (SELFTEST_LOCATION, ".txt", old_content);
   const char *filename = tmp.get_filename ();
-  line_table_test ltt ();
+  line_table_test ltt;
   linemap_add (line_table, LC_ENTER, false, filename, 2);
 
   /* Try to insert a string in line 2.  */
diff --git a/gcc/ggc-page.c b/gcc/ggc-page.c
index 7066ef2c488..a95ff466704 100644
--- a/gcc/ggc-page.c
+++ b/gcc/ggc-page.c
@@ -943,8 +943,8 @@ alloc_page (unsigned order)
   if (GGC_DEBUG_LEVEL >= 2)
 fprintf (G.debug_file,
 	 "Allocating page at %p, object size=%lu, data %p-%p\n",
-	 (void *) entry, (unsigned long) OBJECT_SIZE (order), page,
-	 page + entry_size - 1);
+	 (void *) entry, (unsigned long) OBJECT_SIZE (order),
+	 (void *) page, (void *) (page + entry_size - 1));
 
   return entry;
 }
@@ -977,7 +977,7 @@ free_page (page_entry *entry)
   if (GGC_DEBUG_LEVEL >= 2)
 fprintf (G.debug_file,
 	 "Deallocating page at %p, data %p-%p\n", (void *) entry,
-	 entry->page, entry->page + entry->bytes - 1);
+	 (void *) entry->page, (void *) (entry->page + entry->bytes - 1));
 
   /* Mark the page as inaccessible.  Discard the handle to avoid handle
  leak.  */
-- 
2.21.0



Re: [PATCH] Deprecate -frepo option.

2019-06-27 Thread Martin Liška
On 6/27/19 2:58 PM, Jonathan Wakely wrote:
> On Thu, 27 Jun 2019 at 13:30, Martin Liška  wrote:
>>
>> On 6/21/19 4:28 PM, Richard Biener wrote:
>>> On Fri, Jun 21, 2019 at 4:13 PM Jakub Jelinek  wrote:

 On Fri, Jun 21, 2019 at 04:04:00PM +0200, Martin Liška wrote:
> On 6/21/19 1:58 PM, Jakub Jelinek wrote:
>> On Fri, Jun 21, 2019 at 01:52:09PM +0200, Martin Liška wrote:
>>> On 6/21/19 1:47 PM, Jonathan Wakely wrote:
 On Fri, 21 Jun 2019 at 11:40, Martin Liška wrote:
> Yes, I would be fine to deprecate that for GCC 10.1

 Would it be appropriate to issue a warning in GCC 10.x if the option 
 is used?
>>>
>>> Sure. With the patch attached one will see:
>>>
>>> $ gcc -frepo /tmp/main.cc -c
>>> gcc: warning: switch ‘-frepo’ is no longer supported
>>>
>>> I'm sending patch that also removes -frepo tests from test-suite.
>>> I've been testing the patch.
>>
>> IMHO for just deprecation of an option you don't want to remove it from 
>> the
>> testsuite, just match the warning it will generate in those tests, and
>> I'm not convinced you want to remove it from the documentation (rather 
>> than
>> just saying in the documentation that the option is deprecated and might 
>> be
>> removed in a later GCC version).
>
> Agree with you. I'm sending updated version of the patch.
> Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

 I'm also not convinced about the Deprecated flag, seems like that is a flag
 that we use for options that have been already removed.
 So, instead there should be some proper warning in the C++ FE for it,
 or just Warn.
>>>
>>> In principle -frepo is a nice idea - does it live up to its promises?  That 
>>> is,
>>> does it actually work, for example when throwing it on the libstdc++
>>> testsuite or a larger C++ project?
>>
>> @Jonathan, Jason: Do we know whether it really work?
> 
> I don't know. It's nearly 20 years since I've tried it, but apparently
> a few people try using it:
> https://stackoverflow.com/search?q=frepo+c%2B%2B
> 
> The first result was answered by me in 2012 saying nobody uses it:
> https://stackoverflow.com/a/11832613/981959
> 

Looks at this, it seems to me very legacy and I would recommend to remove it.

Martin


Re: [PATCH] Fix 2 clang warnings.

2019-06-27 Thread Jakub Jelinek
On Thu, Jun 27, 2019 at 04:21:10PM +0200, Martin Liška wrote:
> 2019-06-27  Martin Liska  
> 
>   * edit-context.c (test_applying_fixits_unreadable_file): Do not
>   use () for a constructor call.
>   (test_applying_fixits_line_out_of_range): Likewise.
>   * ggc-page.c (alloc_page): Use (void *) for %p printf format
>   argument.
>   (free_page): Likewise.

LGTM.

Jakub


Re: [PATCH] Fix 2 clang warnings.

2019-06-27 Thread Martin Sebor

On 6/27/19 8:21 AM, Martin Liška wrote:

On 6/27/19 4:11 PM, Jakub Jelinek wrote:

On Thu, Jun 27, 2019 at 04:03:06PM +0200, Martin Liška wrote:

* ggc-page.c (free_page): Use (char *) for %p printf format
argument.



--- a/gcc/ggc-page.c
+++ b/gcc/ggc-page.c
@@ -977,7 +977,7 @@ free_page (page_entry *entry)
if (GGC_DEBUG_LEVEL >= 2)
  fprintf (G.debug_file,
 "Deallocating page at %p, data %p-%p\n", (void *) entry,
-entry->page, entry->page + entry->bytes - 1);
+(char *)entry->page, (char *)entry->page + entry->bytes - 1);
  
/* Mark the page as inaccessible.  Discard the handle to avoid handle

   leak.  */


Can you explain this?  It makes no sense to me.  What is the warning?
entry->page already has char * type, so why are any casts needed?
If you want to be pedantic, C says that %p argument should be pointer to
void, so I'd understand more
 (void *) entry->page, (void *) (entry->page + entry->bytes - 1));
with that formatting.


You are right, it should be the other way around:

/home/marxin/Programming/gcc/gcc/ggc-page.c:946:60: warning: format specifies 
type 'void *' but the argument has type 'char *' [-Wformat-pedantic]


I'm normally in favor of cleaning up warnings but this one looks
worse than useless to me: heeding it makes what's clean, easy t
read, and perfectly correct and safe code harder to read purely
for the sake of pedantry.  There is no difference between a char*
and void*.  If we want to make an improvement let's propose to
relax the pointless C requirement that the %p argument be a void*
and allow it to be any object pointer.

Martin


  (void *) entry, (unsigned long) OBJECT_SIZE (order), page,
   ^~~~
/home/marxin/Programming/gcc/gcc/ggc-page.c:947:7: warning: format specifies 
type 'void *' but the argument has type 'char *' [-Wformat-pedantic]
  page + entry_size - 1);
  ^
/home/marxin/Programming/gcc/gcc/ggc-page.c:980:7: warning: format specifies 
type 'void *' but the argument has type 'char *' [-Wformat-pedantic]
  entry->page, entry->page + entry->bytes - 1);
  ^~~
/home/marxin/Programming/gcc/gcc/ggc-page.c:980:20: warning: format specifies 
type 'void *' but the argument has type 'char *' [-Wformat-pedantic]
  entry->page, entry->page + entry->bytes - 1);
   ^~

Martin



Jakub







Re: [PATCH] constrain one character optimization to one character stores (PR 90989)

2019-06-27 Thread Jeff Law
On 6/26/19 8:40 PM, Martin Sebor wrote:
> On 6/26/19 4:31 PM, Jeff Law wrote:
>> On 6/25/19 5:03 PM, Martin Sebor wrote:
>>
>>>
>>> The caller ensures that handle_char_store is only called for stores
>>> to arrays (MEM_REF) or single elements as wide as char.
>> Where?  I don't see it, even after fixing the formatting in
>> strlen_check_and_optimize_stmt :-)
>>
>>>    gimple *stmt = gsi_stmt (*gsi);
>>>
>>>    if (is_gimple_call (stmt))
>>
>> I think we can agree that we don't have a call, so this is false.
>>
>>>   else if (is_gimple_assign (stmt) && !gimple_clobber_p (stmt))
>>>  {
>>>    tree lhs = gimple_assign_lhs (stmt);
>> This should be true IIUC, so we'll go into its THEN block.
>>
>>
>>>    if (TREE_CODE (lhs) == SSA_NAME && POINTER_TYPE_P (TREE_TYPE
>>> (lhs)))
>> Should be false.
>>
>>>    else if (TREE_CODE (lhs) == SSA_NAME && INTEGRAL_TYPE_P
>>> (TREE_TYPE (lhs)))
>>
>> Should also be false.
>>
>>>    else if (TREE_CODE (lhs) != SSA_NAME && !TREE_SIDE_EFFECTS (lhs))
>> Should be true since LHS is a MEM_REF.
>>
>>
>>>     {
>>>    tree type = TREE_TYPE (lhs);
>>>    if (TREE_CODE (type) == ARRAY_TYPE)
>>>  type = TREE_TYPE (type);
>>>    if (TREE_CODE (type) == INTEGER_TYPE
>>>    && TYPE_MODE (type) == TYPE_MODE (char_type_node)
>>>    && TYPE_PRECISION (type) == TYPE_PRECISION
>>> (char_type_node))
>>>  {
>>>    if (! handle_char_store (gsi))
>>>  return false;
>>>  }
>>>  }
>> If TREE_TYPE (type) is an ARRAY_TYPE, we strip the ARRAY_TYPE.  We then
>> verify that TYPE is a single character type.  _But_ we stripped away the
>> ARRAY_TYPE.  So ISTM that we allow either an array of chars or a single
>> char on the LHS.
>>
>> So how does this avoid multiple character stores?!?  We could have had
>> an ARRAY_REF on the LHS and we never check the number of elements in the
>> array.  There's no check on the RHS either.  SO I don't see how we
>> guarantee that we're dealing with a single character store.
>>
>> What am I missing here?
> 
> Can you show me a test case where it breaks?  If not, I don't know
> what you want me to do.  I'll just move on to something else.
THe thing to do is research what gimple accepts and what other passes
may do.  Given this is a code generation bug, "just moving on" isn't
really a good option unless we're going to rip out that little bit of code.

As I was thinking about this last night, the pass we'd want to look at
would be store merging.  Let's do that on the call today.

jeff


Re: [PATCH] constrain one character optimization to one character stores (PR 90989)

2019-06-27 Thread Jeff Law
On 6/27/19 9:00 AM, Jeff Law wrote:
> On 6/26/19 8:40 PM, Martin Sebor wrote:
>> On 6/26/19 4:31 PM, Jeff Law wrote:
>>> On 6/25/19 5:03 PM, Martin Sebor wrote:
>>>

 The caller ensures that handle_char_store is only called for stores
 to arrays (MEM_REF) or single elements as wide as char.
>>> Where?  I don't see it, even after fixing the formatting in
>>> strlen_check_and_optimize_stmt :-)
>>>
    gimple *stmt = gsi_stmt (*gsi);

    if (is_gimple_call (stmt))
>>>
>>> I think we can agree that we don't have a call, so this is false.
>>>
   else if (is_gimple_assign (stmt) && !gimple_clobber_p (stmt))
  {
    tree lhs = gimple_assign_lhs (stmt);
>>> This should be true IIUC, so we'll go into its THEN block.
>>>
>>>
    if (TREE_CODE (lhs) == SSA_NAME && POINTER_TYPE_P (TREE_TYPE
 (lhs)))
>>> Should be false.
>>>
    else if (TREE_CODE (lhs) == SSA_NAME && INTEGRAL_TYPE_P
 (TREE_TYPE (lhs)))
>>>
>>> Should also be false.
>>>
    else if (TREE_CODE (lhs) != SSA_NAME && !TREE_SIDE_EFFECTS (lhs))
>>> Should be true since LHS is a MEM_REF.
>>>
>>>
     {
    tree type = TREE_TYPE (lhs);
    if (TREE_CODE (type) == ARRAY_TYPE)
  type = TREE_TYPE (type);
    if (TREE_CODE (type) == INTEGER_TYPE
    && TYPE_MODE (type) == TYPE_MODE (char_type_node)
    && TYPE_PRECISION (type) == TYPE_PRECISION
 (char_type_node))
  {
    if (! handle_char_store (gsi))
  return false;
  }
  }
>>> If TREE_TYPE (type) is an ARRAY_TYPE, we strip the ARRAY_TYPE.  We then
>>> verify that TYPE is a single character type.  _But_ we stripped away the
>>> ARRAY_TYPE.  So ISTM that we allow either an array of chars or a single
>>> char on the LHS.
>>>
>>> So how does this avoid multiple character stores?!?  We could have had
>>> an ARRAY_REF on the LHS and we never check the number of elements in the
>>> array.  There's no check on the RHS either.  SO I don't see how we
>>> guarantee that we're dealing with a single character store.
>>>
>>> What am I missing here?
>>
>> Can you show me a test case where it breaks?  If not, I don't know
>> what you want me to do.  I'll just move on to something else.
> THe thing to do is research what gimple accepts and what other passes
> may do.  Given this is a code generation bug, "just moving on" isn't
> really a good option unless we're going to rip out that little bit of code.
> 
> As I was thinking about this last night, the pass we'd want to look at
> would be store merging.  Let's do that on the call today.
Actually it was trivial to create with store merging.

char x[20];
foo()
{
  x[0] = 0x41;
  x[1] = 0x42;
}

  MEM  [(char *)&x] = 16961;

So clearly we can get this in gimple.  So we need a check of some kind,
either on the LHS or the RHS to ensure that we actually have a single
character store as opposed to something like the above.

jeff


Re: [PATCH] Fix 2 clang warnings.

2019-06-27 Thread Martin Sebor

On 6/27/19 8:03 AM, Martin Liška wrote:

Hi.

This reduces 2 warnings reported by clang.

Patch can bootstrap on x86_64-linux-gnu and survives regression tests.

Ready to be installed?
Thanks,
Martin

gcc/ChangeLog:

2019-06-27  Martin Liska  

* edit-context.c (test_applying_fixits_unreadable_file): Do not
use () for a constructor call.
(test_applying_fixits_line_out_of_range): Likewise.
* ggc-page.c (free_page): Use (char *) for %p printf format
argument.
---
  gcc/edit-context.c | 4 ++--
  gcc/ggc-page.c | 2 +-
  2 files changed, 3 insertions(+), 3 deletions(-)


Just as a side note:

diff --git a/gcc/edit-context.c b/gcc/edit-context.c
index d3246ab0334..93d10664ae9 100644
--- a/gcc/edit-context.c
+++ b/gcc/edit-context.c
@@ -1639,7 +1639,7 @@ static void
 test_applying_fixits_unreadable_file ()
 {
   const char *filename = "this-does-not-exist.txt";
-  line_table_test ltt ();
+  line_table_test ltt;
   linemap_add (line_table, LC_ENTER, false, filename, 1);

   location_t loc = linemap_position_for_column (line_table, 1);

I'm guessing the warning above is about declaring a function
returning line_table_test rather than declaring an object of
the type.  (It would be helpful to show those warnings when
fixing them.  A small test case confirms it's -Wvexing-parse:
warning: empty parentheses interpreted as a function declaration.)

This is a common mistake to make so it would be a useful
warning to add to GCC as well.  And in fact, I see at least
one request for it in Bugzilla: 25814 (with the related
pr86564).

Martin


Re: [RFC PATCH, i386]: Autovectorize 8-byte vectors

2019-06-27 Thread Jeff Law
On 6/27/19 12:05 AM, Jakub Jelinek wrote:
> On Wed, Jun 26, 2019 at 12:19:28PM +0200, Uros Bizjak wrote:
>> Yes, the patch works OK. I'll regression test it and push it later today.
> 
> I think it caused
> +FAIL: gcc.dg/tree-ssa/pr84512.c scan-tree-dump optimized "return 285;"
> which admittedly already is xfailed on various targets.
> We now newly vectorize those loops and there is no FRE or similar pass
> after vectorization to clean it up, in particular optimize the
> a[8] and a[9] loads given the MEM  [(int *)&a + 32B]
> store:
>   MEM  [(int *)&a + 32B] = { 64, 81 };
>   _13 = a[8];
>   res_6 = _13 + 140;
>   _18 = a[9];
>   res_15 = res_6 + _18;
>   a ={v} {CLOBBER};
>   return res_15;
> 
> Shall we xfail it, or is there a plan to enable FRE after vectorization,
> or similar pass that would be able to do similar memory optimizations?
> Note, the RTL passes are able to optimize it in the end in this testcase.
I wonder if we could logically break up the vector store within DOM.  If
we did that we'd end up with a[8] and a[9] in DOM's expression hash
table.  That would allow us to replace the loads into _13 and _18 with
constants and the rest should just fall out.

Care to open a BZ?  If so, go ahead and assign it to me.

jeff


Re: [RFC PATCH, i386]: Autovectorize 8-byte vectors

2019-06-27 Thread Jakub Jelinek
On Thu, Jun 27, 2019 at 09:24:58AM -0600, Jeff Law wrote:
> On 6/27/19 12:05 AM, Jakub Jelinek wrote:
> > On Wed, Jun 26, 2019 at 12:19:28PM +0200, Uros Bizjak wrote:
> >> Yes, the patch works OK. I'll regression test it and push it later today.
> > 
> > I think it caused
> > +FAIL: gcc.dg/tree-ssa/pr84512.c scan-tree-dump optimized "return 285;"
> > which admittedly already is xfailed on various targets.
> > We now newly vectorize those loops and there is no FRE or similar pass
> > after vectorization to clean it up, in particular optimize the
> > a[8] and a[9] loads given the MEM  [(int *)&a + 32B]
> > store:
> >   MEM  [(int *)&a + 32B] = { 64, 81 };
> >   _13 = a[8];
> >   res_6 = _13 + 140;
> >   _18 = a[9];
> >   res_15 = res_6 + _18;
> >   a ={v} {CLOBBER};
> >   return res_15;
> > 
> > Shall we xfail it, or is there a plan to enable FRE after vectorization,
> > or similar pass that would be able to do similar memory optimizations?
> > Note, the RTL passes are able to optimize it in the end in this testcase.
> I wonder if we could logically break up the vector store within DOM.  If
> we did that we'd end up with a[8] and a[9] in DOM's expression hash
> table.  That would allow us to replace the loads into _13 and _18 with
> constants and the rest should just fall out.
> 
> Care to open a BZ?  If so, go ahead and assign it to me.

I think Richi is on working on adding fre3 now.

Jakub


Re: [RFC PATCH, i386]: Autovectorize 8-byte vectors

2019-06-27 Thread Jeff Law
On 6/27/19 9:34 AM, Jakub Jelinek wrote:
> On Thu, Jun 27, 2019 at 09:24:58AM -0600, Jeff Law wrote:
>> On 6/27/19 12:05 AM, Jakub Jelinek wrote:
>>> On Wed, Jun 26, 2019 at 12:19:28PM +0200, Uros Bizjak wrote:
 Yes, the patch works OK. I'll regression test it and push it later today.
>>>
>>> I think it caused
>>> +FAIL: gcc.dg/tree-ssa/pr84512.c scan-tree-dump optimized "return 285;"
>>> which admittedly already is xfailed on various targets.
>>> We now newly vectorize those loops and there is no FRE or similar pass
>>> after vectorization to clean it up, in particular optimize the
>>> a[8] and a[9] loads given the MEM  [(int *)&a + 32B]
>>> store:
>>>   MEM  [(int *)&a + 32B] = { 64, 81 };
>>>   _13 = a[8];
>>>   res_6 = _13 + 140;
>>>   _18 = a[9];
>>>   res_15 = res_6 + _18;
>>>   a ={v} {CLOBBER};
>>>   return res_15;
>>>
>>> Shall we xfail it, or is there a plan to enable FRE after vectorization,
>>> or similar pass that would be able to do similar memory optimizations?
>>> Note, the RTL passes are able to optimize it in the end in this testcase.
>> I wonder if we could logically break up the vector store within DOM.  If
>> we did that we'd end up with a[8] and a[9] in DOM's expression hash
>> table.  That would allow us to replace the loads into _13 and _18 with
>> constants and the rest should just fall out.
>>
>> Care to open a BZ?  If so, go ahead and assign it to me.
> 
> I think Richi is on working on adding fre3 now.
Yea, I saw that later.  I think Richi's message indicated he wanted a
late fre pass, so even if DOM was to capture this, it may not eliminate
the desire for a late fre pass.

jeff


Re: [PATCH 1/3] C++20 constexpr lib part 1/3

2019-06-27 Thread Jonathan Wakely

On 26/06/19 19:13 -0400, Ed Smith-Rowland via libstdc++ wrote:

Here is the first of three patches for C++20 constexpr library.

?? Implement C++20 p0202 - Add constexpr Modifiers to Functions in 
 and  Headers.

 ??Implement C++20 p1023 - constexpr comparison operators for std::array.

I believe I have answered peoples concerns with the last patch 
attempts [https://gcc.gnu.org/ml/libstdc++/2019-03/msg00132.html].


The patch is large because of test cases but really just boils down to 
adding constexpr for c++2a.


I still have some concerns about the changes like:

template
+  _GLIBCXX20_CONSTEXPR
  bool
-  operator()(_Iterator __it, _Value& __val) const
+  operator()(_Iterator __it, const _Value& __val) const
  { return *__it < __val; }


I think that might change semantics for some types, and I haven't yet
figured out if we care about those cases or not. I'll try to come up
with some examples that change meaning.

This could use _GLIBCXX14_CONSTEXPR:

+  /**
+   * A constexpr wrapper for __builtin_memmove.
+   * @param __num The number of elements of type _Tp (not bytes).

I don't think we want a Doxygen comment for this, as it's not part of
the public API. Just a normal comment is fine.

+   */
+  template
+_GLIBCXX20_CONSTEXPR
+inline void*
+__memmove(_Tp* __dst, const _Tp* __src, size_t __num)
+{
+#if __cplusplus > 201703L \
+&& defined(_GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED)
+  if (__builtin_is_constant_evaluated())
+   {
+ for(; __num > 0; --__num)
+   {
+ if constexpr (_IsMove)
+   *__dst = std::move(*__src);
+ else
+   *__dst = *__src;

Do we need this _IsMove condition here? We should only be using this
function for trivially copyable types, in which case copy vs move
shouldn't matter, should it?

Oh ... is it because we also end up using this __memmove function for
non-trivially copyable types, during constant evaluation? Hmm. Then
it's more than just a wrapper for __builtin_memmove, right? It's
"either memmove or constexpr range copy", or something.

I don't think this will work in a constant expression:

  /// Assign @p __new_val to @p __obj and return its previous value.
  template 
+_GLIBCXX20_CONSTEXPR
inline _Tp
exchange(_Tp& __obj, _Up&& __new_val)
{ return std::__exchange(__obj, std::forward<_Up>(__new_val)); }

Because std::__exchange hasn't been marked constexpr. The test passes
because it doesn't call it in a context that requires constant
evaluation:

 const auto x = std::exchange(e, pi);

I see the same problem in other tests too:

+  constexpr std::array car{{0, 1, 2, 3, 4, 5, 6, 6, 8, 9, 9, 11}};
+
+  const auto out0x = std::adjacent_find(car.begin(), car.end());
+
+  const auto out1x = std::adjacent_find(car.begin(), car.end(),
+   std::equal_to());





Re: [PATCH][middle-end/88784] Middle end is missing some optimizations about unsigned

2019-06-27 Thread Jeff Law
On 6/27/19 12:11 AM, Li Jia He wrote:
> Hi,
> 
> According to the optimizable case described by Qi Feng on
> issue 88784, we can combine the cases into the following:
> 
> 1. x >  y  &&  x != XXX_MIN  -->   x > y
> 2. x >  y  &&  x == XXX_MIN  -->   false
> 3. x <= y  &&  x == XXX_MIN  -->   x == XXX_MIN
> 
> 4. x <  y  &&  x != XXX_MAX  -->   x < y
> 5. x <  y  &&  x == XXX_MAX  -->   false
> 6. x >= y  &&  x == XXX_MAX  -->   x == XXX_MAX
> 
> 7. x >  y  ||  x != XXX_MIN  -->   x != XXX_MIN
> 8. x <= y  ||  x != XXX_MIN  -->   true
> 9. x <= y  ||  x == XXX_MIN  -->   x <= y
> 
> 10. x <  y  ||  x != XXX_MAX  -->   x != UXXX_MAX
> 11. x >= y  ||  x != XXX_MAX  -->   true
> 12. x >= y  ||  x == XXX_MAX  -->   x >= y
> 
> Note: XXX_MIN represents the minimum value of type x.
>   XXX_MAX represents the maximum value of type x.
> 
> Here we don't need to care about whether the operation is
> signed or unsigned.  For example, in the below equation:
> 
> 'x >  y  &&  x != XXX_MIN  -->   x > y'
> 
> If the x type is signed int and XXX_MIN is INT_MIN, we can
> optimize it to 'x > y'.  However, if the type of x is unsigned
> int and XXX_MIN is 0, we can still optimize it to 'x > y'.
> 
> The regression testing for the patch was done on GCC mainline on
> 
> powerpc64le-unknown-linux-gnu (Power 9 LE)
> 
> with no regressions.  Is it OK for trunk ?
> 
> Thanks,
> Lijia He
> 
> gcc/ChangeLog
> 
> 2019-06-27  Li Jia He  
>   Qi Feng  
> 
>   PR middle-end/88784
>   * gimple-fold.c (and_comparisons_contain_equal_operands): New function.
>   (and_comparisons_1): Use and_comparisons_contain_equal_operands.
>   (or_comparisons_contain_equal_operands): New function.
>   (or_comparisons_1): Use or_comparisons_contain_equal_operands.
Would this be better done via match.pd?  ISTM this transformation would
be well suited for that framework.

jeff


Re: [PATCH] constrain one character optimization to one character stores (PR 90989)

2019-06-27 Thread Jakub Jelinek
On Thu, Jun 27, 2019 at 09:15:41AM -0600, Jeff Law wrote:
> Actually it was trivial to create with store merging.
> 
> char x[20];
> foo()
> {
>   x[0] = 0x41;
>   x[1] = 0x42;
> }
> 
>   MEM  [(char *)&x] = 16961;
> 
> So clearly we can get this in gimple.  So we need a check of some kind,
> either on the LHS or the RHS to ensure that we actually have a single
> character store as opposed to something like the above.

handle_char_store is only called if:
  tree type = TREE_TYPE (lhs);
  if (TREE_CODE (type) == ARRAY_TYPE)
type = TREE_TYPE (type);
  if (TREE_CODE (type) == INTEGER_TYPE
  && TYPE_MODE (type) == TYPE_MODE (char_type_node)
  && TYPE_PRECISION (type) == TYPE_PRECISION (char_type_node))
{
  if (! handle_char_store (gsi))
return false;
}
so the above case will not trigger, the only way to call it for multiple
character stores is if you have an array.  And so far I've succeeded
creating that just with strcpy builtin.
E.g. the
  if (storing_all_zeros_p && cmp == 0 && si->full_string_p)
{
  /* When overwriting a '\0' with a '\0', the store can be removed
 if we know it has been stored in the current function.  */
  if (!stmt_could_throw_p (cfun, stmt) && si->writable)
optimization looks unsafe if we are actually storing more than one zero
because then the first zero in there is redundant, but there could be
non-zero chars after that and removing the larger all zeros store will
keep those other chars with incorrect values; I've tried to reproduce it with:
__attribute__((noipa)) void
bar (char *x, int l1, int l2)
{
  if (__builtin_memcmp (x, "abcd\0", 6) != 0 || l1 != 4 || l2 != 4)
__builtin_abort ();
}

int
main ()
{
  char x[64];
  __builtin_memset (x, -1, sizeof (x));
  asm volatile ("" : : "g" (&x[0]) : "memory");
  __builtin_strcpy (x, "abcd");
  int l1 = __builtin_strlen (x);
#ifdef ONE
  short __attribute__((may_alias)) *p = (void *) &x[0];
  *(p + 2) = 0;
#else
  short *p = (short *) (&x[0] + 4);
  __builtin_memcpy (p, "\0\0", 2);
#endif
  int l2 = __builtin_strlen (x);
  bar (x, l1, l2);
}

but the first case creates a short store, so handle_char_store isn't
called, and second for some reason isn't folded from memcpy to a char[2]
store.

Fundamentally, there is no reason to guard handle_char_store with
char or array of char stores, as long as CHAR_BIT == 8 && BITS_PER_UNIT == 8
and native_encode_expr can handle the rhs expression - then we can analyze
exactly where is the first '\0' character in there and in the code take it
into account.

Jakub


[PATCH], PR Fix floatsi{sf,df}2 and floatunssi{sf,df}2 for a future powerpc machine

2019-06-27 Thread Michael Meissner
As I detail in PR 91009, I had some testsuite failures with my patches for a
future machine.  In the future patches, I added new RTL attributes to support
the new prefixed load/store instructions (which will include pc-relative
support).  This new attribute needs to look at the 'type' RTL attribute, and
that in turn depends on doing a constrain operands to determine whether the
insn is a load, store, or add instruction.

This constrain operands call occurs before the first split pass.  The insns
that convert SImode to SFmode or DFmode which are split in the first pass,
currently use the "wa" constraint for the floating point result.  Unfortunately
if the user uses the -mno-vsx option, the "wa" constraint becomes NO_REGS.

This patch is a rather limited patch that just adds an alternative using the
"d" constraint, so there is a valid alternative before splitting.

This patch does not do the cleanup mentioned in PR 90822.  That will still be
done, but I wanted to fix the actual problem before doing the next iteration of
the code cleanup.

I have done a bootstrap and make check, and there were no regressions.  Can I
check this into the trunk?

2019-06-27   Michael Meissner  

PR target/91009
* config/rs6000/rs6000.md (floatsi2_lfiwax): Add non-VSX
alternative.
(floatsi2_lfiwax_mem): Add non-VSX alternative.
(floatunssi2_lfiwzx): Add non-VSX alternative.
(floatunssi2_lfiwzx_mem): Add non-VSX alternative.

Index: gcc/config/rs6000/rs6000.md
===
--- gcc/config/rs6000/rs6000.md (revision 272436)
+++ gcc/config/rs6000/rs6000.md (working copy)
@@ -5227,9 +5227,9 @@ (define_insn "lfiwax"
 ; not be needed and also in case the insns are deleted as dead code.
 
 (define_insn_and_split "floatsi2_lfiwax"
-  [(set (match_operand:SFDF 0 "gpc_reg_operand" "=")
-   (float:SFDF (match_operand:SI 1 "nonimmediate_operand" "r")))
-   (clobber (match_scratch:DI 2 "=wa"))]
+  [(set (match_operand:SFDF 0 "gpc_reg_operand" "=d,")
+   (float:SFDF (match_operand:SI 1 "nonimmediate_operand" "r,r")))
+   (clobber (match_scratch:DI 2 "=d,wa"))]
   "TARGET_HARD_FLOAT && TARGET_LFIWAX
&&  && can_create_pseudo_p ()"
   "#"
@@ -5266,11 +5266,11 @@ (define_insn_and_split "floatsi2_l
(set_attr "type" "fpload")])
 
 (define_insn_and_split "floatsi2_lfiwax_mem"
-  [(set (match_operand:SFDF 0 "gpc_reg_operand" "=")
+  [(set (match_operand:SFDF 0 "gpc_reg_operand" "=d,")
(float:SFDF
 (sign_extend:DI
- (match_operand:SI 1 "indexed_or_indirect_operand" "Z"
-   (clobber (match_scratch:DI 2 "=wa"))]
+ (match_operand:SI 1 "indexed_or_indirect_operand" "Z,Z"
+   (clobber (match_scratch:DI 2 "=d,wa"))]
   "TARGET_HARD_FLOAT && TARGET_LFIWAX && "
   "#"
   ""
@@ -5303,9 +5303,9 @@ (define_insn "lfiwzx"
(set_attr "isa" "*,p8v,p8v,p9v")])
 
 (define_insn_and_split "floatunssi2_lfiwzx"
-  [(set (match_operand:SFDF 0 "gpc_reg_operand" "=")
-   (unsigned_float:SFDF (match_operand:SI 1 "nonimmediate_operand" "r")))
-   (clobber (match_scratch:DI 2 "=wa"))]
+  [(set (match_operand:SFDF 0 "gpc_reg_operand" "=d,")
+   (unsigned_float:SFDF (match_operand:SI 1 "nonimmediate_operand" "r,r")))
+   (clobber (match_scratch:DI 2 "=d,wa"))]
   "TARGET_HARD_FLOAT && TARGET_LFIWZX && "
   "#"
   ""
@@ -5341,11 +5341,11 @@ (define_insn_and_split "floatunssi
(set_attr "type" "fpload")])
 
 (define_insn_and_split "floatunssi2_lfiwzx_mem"
-  [(set (match_operand:SFDF 0 "gpc_reg_operand" "=")
+  [(set (match_operand:SFDF 0 "gpc_reg_operand" "=d,")
(unsigned_float:SFDF
 (zero_extend:DI
- (match_operand:SI 1 "indexed_or_indirect_operand" "Z"
-   (clobber (match_scratch:DI 2 "=wa"))]
+ (match_operand:SI 1 "indexed_or_indirect_operand" "Z,Z"
+   (clobber (match_scratch:DI 2 "=d,wa"))]
   "TARGET_HARD_FLOAT && TARGET_LFIWZX && "
   "#"
   ""

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH 1/3] C++20 constexpr lib part 1/3

2019-06-27 Thread Ed Smith-Rowland via gcc-patches

On 6/27/19 11:41 AM, Jonathan Wakely wrote:

On 26/06/19 19:13 -0400, Ed Smith-Rowland via libstdc++ wrote:

Here is the first of three patches for C++20 constexpr library.

?? Implement C++20 p0202 - Add constexpr Modifiers to Functions 
in  and  Headers.
 ??Implement C++20 p1023 - constexpr comparison operators for 
std::array.


I believe I have answered peoples concerns with the last patch 
attempts [https://gcc.gnu.org/ml/libstdc++/2019-03/msg00132.html].


The patch is large because of test cases but really just boils down 
to adding constexpr for c++2a.


I still have some concerns about the changes like:

?? template
+?? _GLIBCXX20_CONSTEXPR
?? bool
-?? operator()(_Iterator __it, _Value& __val) const
+?? operator()(_Iterator __it, const _Value& __val) const
?? { return *__it < __val; }

Make a type where operator< changes the rhs.?? I was thinking you'd want 
a compare to operate on const things but I guess we have to be ready for 
anything.?? I was also thinking < is left associative for a class member 
but i guess a class could have a thing that mutates the rhs too.?? 
Nothing got hit in any of my testing.?? If this is a thing we probably 
should make a general test for things like this somewhere.?? Maybe 
someone wants to log references or something.


1. I'll experiment to see if I really need these (I *thought* I did, but...)

2. If I still do then I'll make overloads?



I think that might change semantics for some types, and I haven't yet
figured out if we care about those cases or not. I'll try to come up
with some examples that change meaning.

This could use _GLIBCXX14_CONSTEXPR:

+?? /**
+ * A constexpr wrapper for __builtin_memmove.
+ * @param __num The number of elements of type _Tp (not bytes).

I don't think we want a Doxygen comment for this, as it's not part of
the public API. Just a normal comment is fine.

+ */
+?? template
+?? _GLIBCXX20_CONSTEXPR
+?? inline void*
+?? __memmove(_Tp* __dst, const _Tp* __src, size_t __num)
+?? {
+#if __cplusplus > 201703L \
+?? && defined(_GLIBCXX_HAVE_BUILTIN_IS_CONSTANT_EVALUATED)
+?? if (__builtin_is_constant_evaluated())
+?? {
+?? for(; __num > 0; --__num)
+?? {
+?? if constexpr (_IsMove)
+?? *__dst = std::move(*__src);
+?? else
+?? *__dst = *__src;

Do we need this _IsMove condition here? We should only be using this
function for trivially copyable types, in which case copy vs move
shouldn't matter, should it?

Oh ... is it because we also end up using this __memmove function for
non-trivially copyable types, during constant evaluation? Hmm. Then
it's more than just a wrapper for __builtin_memmove, right? It's
"either memmove or constexpr range copy", or something.


Yup.?? I'm also bad at naming things.?? Furthermore, I expect the 
standards people to want to have our versions of memcpy, memcmp, memmove 
that we own and can make constexpr.?? These are such important concepts.


Also, part 2 brings in constexpr swap. Maybe this is the may to 
implement an actual memmove?




I don't think this will work in a constant expression:

?? /// Assign @p __new_val to @p __obj and return its previous value.
?? template 
+?? _GLIBCXX20_CONSTEXPR
?? inline _Tp
?? exchange(_Tp& __obj, _Up&& __new_val)
?? { return std::__exchange(__obj, std::forward<_Up>(__new_val)); }

Because std::__exchange hasn't been marked constexpr. The test passes
because it doesn't call it in a context that requires constant
evaluation:

??const auto x = std::exchange(e, pi);

Derp.


I see the same problem in other tests too:

+?? constexpr std::array car{{0, 1, 2, 3, 4, 5, 6, 6, 8, 9, 9, 
11}};

+
+?? const auto out0x = std::adjacent_find(car.begin(), car.end());
+
+?? const auto out1x = std::adjacent_find(car.begin(), car.end(),
+?? std::equal_to())


I will go through all the tests and make sure that some nontrivial 
calculation is done that contributes to a final bool return.?? And clean 
up the mess.?? I did this in chunk 2 but I guess I left a lot of chunk 1.


Come to think of it, we could build *insane* concepts after this.?? For 
good or ill.


Ed



Re: [PATCH] constrain one character optimization to one character stores (PR 90989)

2019-06-27 Thread Martin Sebor

On 6/27/19 9:15 AM, Jeff Law wrote:

On 6/27/19 9:00 AM, Jeff Law wrote:

On 6/26/19 8:40 PM, Martin Sebor wrote:

On 6/26/19 4:31 PM, Jeff Law wrote:

On 6/25/19 5:03 PM, Martin Sebor wrote:



The caller ensures that handle_char_store is only called for stores
to arrays (MEM_REF) or single elements as wide as char.

Where?  I don't see it, even after fixing the formatting in
strlen_check_and_optimize_stmt :-)


    gimple *stmt = gsi_stmt (*gsi);

    if (is_gimple_call (stmt))


I think we can agree that we don't have a call, so this is false.


   else if (is_gimple_assign (stmt) && !gimple_clobber_p (stmt))
  {
    tree lhs = gimple_assign_lhs (stmt);

This should be true IIUC, so we'll go into its THEN block.



    if (TREE_CODE (lhs) == SSA_NAME && POINTER_TYPE_P (TREE_TYPE
(lhs)))

Should be false.


    else if (TREE_CODE (lhs) == SSA_NAME && INTEGRAL_TYPE_P
(TREE_TYPE (lhs)))


Should also be false.


    else if (TREE_CODE (lhs) != SSA_NAME && !TREE_SIDE_EFFECTS (lhs))

Should be true since LHS is a MEM_REF.



     {
    tree type = TREE_TYPE (lhs);
    if (TREE_CODE (type) == ARRAY_TYPE)
  type = TREE_TYPE (type);
    if (TREE_CODE (type) == INTEGER_TYPE
    && TYPE_MODE (type) == TYPE_MODE (char_type_node)
    && TYPE_PRECISION (type) == TYPE_PRECISION
(char_type_node))
  {
    if (! handle_char_store (gsi))
  return false;
  }
  }

If TREE_TYPE (type) is an ARRAY_TYPE, we strip the ARRAY_TYPE.  We then
verify that TYPE is a single character type.  _But_ we stripped away the
ARRAY_TYPE.  So ISTM that we allow either an array of chars or a single
char on the LHS.

So how does this avoid multiple character stores?!?  We could have had
an ARRAY_REF on the LHS and we never check the number of elements in the
array.  There's no check on the RHS either.  SO I don't see how we
guarantee that we're dealing with a single character store.

What am I missing here?


Can you show me a test case where it breaks?  If not, I don't know
what you want me to do.  I'll just move on to something else.

THe thing to do is research what gimple accepts and what other passes
may do.  Given this is a code generation bug, "just moving on" isn't
really a good option unless we're going to rip out that little bit of code.

As I was thinking about this last night, the pass we'd want to look at
would be store merging.  Let's do that on the call today.

Actually it was trivial to create with store merging.

char x[20];
foo()
{
   x[0] = 0x41;
   x[1] = 0x42;
}

   MEM  [(char *)&x] = 16961;


The LHS is unsigned short so handle_char_store would not be called
because of the check in the caller.  You would need something like:

  MEM  [(char *)&x] = { 'a', 'b' };

or something like that where the type is an array of char.  I have
no idea if something like it can happen or how.  As it is, I'm
pretty sure it's not valid (but who knows).


So clearly we can get this in gimple.  So we need a check of some kind,
either on the LHS or the RHS to ensure that we actually have a single
character store as opposed to something like the above.


As far as I can see that describes the check in the caller.

I never said the MEM_REF you show above can't come up in Gimple.
What I am saying is that I don't know how to get it to cause
a problem here, or even to make it into the function to begin
with.  Store merging doesn't do it because it runs after strlen.
If it did run earlier as I said above, it wouldn't make it into
the function.  I tried it to convince myself.

Without a test case showing how something can happen I don't know
what to do to prevent it.  You're throwing out hypothetical
scenarios based on what's valid Gimple and expect me to write
code to handle them.  I'm sorry but I don't work that way.  I
need a test case to see a problem, understand it, and to verify
I have fixed it properly.

In this instance, I can't think of a problem here, either real
or hypothetical, so it's impossible for me to do something to
prevent it.  Even if the multicharacter store did make it into
the function I don't even understand what it is you want me to
do -- we'd have to look at the value of the RHS, find a nul in
its representation if there is one, and either compute the new
string length from it or invalidate the existing length.  Either
way it's not trivial and would need to be tested to make sure
it's correct.  So we're back to a test case.

Martin

PS Here's a test case that can be made to fail but only if a)
store merging runs before strlen, and also b) the LHS check
above is disabled.

  char b[4];

  int f (const char *s)
  {
__builtin_strcpy (b, "123");

int i = __builtin_strcmp (s, b);

b[0] = 'a';
b[1] = 0;

//  MEM  [(char *)&b] = 97;

if (__builtin_strlen (b) != 1)
  __builtin_abort ();

return i;
  }


Re: [PATCH] x86: mark "k" and "Yk" constraints as non-internal

2019-06-27 Thread H.J. Lu
On Thu, Jun 27, 2019 at 5:26 AM Uros Bizjak  wrote:
>
> On Thu, Jun 27, 2019 at 2:23 PM Jan Beulich  wrote:
> >
> > >>> On 27.06.19 at 14:00,  wrote:
> > > On Thu, Jun 27, 2019 at 1:46 PM Jan Beulich  wrote:
> > >>
> > >> >>> On 27.06.19 at 13:09,  wrote:
> > >> > On Thu, Jun 27, 2019 at 12:11 PM Jan Beulich  wrote:
> > >> >>
> > >> >> Without these constraints asm() can't make use of mask registers.
> > >> >
> > >> > asm should be deprecated. We have intrinsics for this purpose.
> > >>
> > >> While maybe not explicitly applicable here, the intrinsics aren't
> > >> (afaict) providing full flexibility. In particular (just as example)
> > >> I haven't found a way to use embedded broadcast with the
> > >> intrinsics, but I can easily do so with asm().
> > >>
> > >> Furthermore there are other reasons to use asm() - things like
> > >> the Linux kernel are full of it for a reason. And once one has
> > >> to use asm(), the resulting code typically is easier to follow if
> > >> one doesn't further intermix it with uses of builtins.
> > >>
> > >> And finally, if asm() was indeed meant to be deprecated, how
> > >> come it pretty recently got extended to allow for "inline"?
> > >
> > > I didn't mean that asm() in general should be deprecated, but for SSE
> > > and other vector extensions, where intrinsics are available,
> > > intrinsics should be used instead. There was exactly zero requests to
> > > use new asm constraints, it looks that people are satisfied with
> > > intrinsics approach (which is also future-proof, etc).
> >
> > So what about my embedded broadcast example then? "Zero
> > requests" is clearly not exactly right. It simply didn't occur to me
> > (until I noticed the @internal here) that I should raise such a
> > request, rather than just using asm(). Subsequently I did then
> > notice "Yh" going away, complicating things further ...
>
> There was some work by HJ a while ago that implemented automatic
> generation of embedded broadcasts. Perhaps he can answer the question.

I implemented broadcast for some commonly used instructions.  Adding
broadcast to all will take time, especially when there is no user demand.

> Uros.
>
> > I'd also like to note that the choice of types on some of the builtins
> > makes it rather cumbersome to use them. Especially for scalar
> > operations I've found myself better resorting to asm(). See
> > https://xenbits.xen.org/gitweb/?p=xen.git;a=blob;f=tools/tests/x86_emulator/simd.c
> > (most of the changes submitted (not so) recently have been
> > coming from the work of putting together this and its sibling
> > tests for the Xen Project instruction emulator).
> >
> > Jan
> >
> >



-- 
H.J.


Re: [PATCH] constrain one character optimization to one character stores (PR 90989)

2019-06-27 Thread Jakub Jelinek
On Thu, Jun 27, 2019 at 10:58:25AM -0600, Martin Sebor wrote:
> The LHS is unsigned short so handle_char_store would not be called
> because of the check in the caller.  You would need something like:
> 
>   MEM  [(char *)&x] = { 'a', 'b' };

This is invalid, because the rhs is non-empty CONSTRUCTOR that doesn't have
VECTOR_TYPE - verify_gimple_assign_single has:
case CONSTRUCTOR:
  if (TREE_CODE (rhs1_type) == VECTOR_TYPE)
...
  else if (CONSTRUCTOR_NELTS (rhs1) != 0)
{
  error ("non-vector %qs with elements", code_name);
  debug_generic_stmt (rhs1);
  return true;
}

But
  MEM  [(char *)&x] = MEM  [(char *)"ab"];
is valid.  Or = {} would be valid too, even for array stores.

Jakub


Re: [PATCH 1/3] C++20 constexpr lib part 1/3

2019-06-27 Thread Ville Voutilainen
On Thu, 27 Jun 2019 at 19:55, Ed Smith-Rowland via libstdc++
 wrote:
> > I don't think this will work in a constant expression:
> >
> > ?? /// Assign @p __new_val to @p __obj and return its previous value.
> > ?? template 
> > +?? _GLIBCXX20_CONSTEXPR
> > ?? inline _Tp
> > ?? exchange(_Tp& __obj, _Up&& __new_val)
> > ?? { return std::__exchange(__obj, std::forward<_Up>(__new_val)); }
> >
> > Because std::__exchange hasn't been marked constexpr. The test passes
> > because it doesn't call it in a context that requires constant
> > evaluation:
> >
> > ??const auto x = std::exchange(e, pi);
> Derp.
> >
> > I see the same problem in other tests too:
> >
> > +?? constexpr std::array car{{0, 1, 2, 3, 4, 5, 6, 6, 8, 9, 9,
> > 11}};
> > +
> > +?? const auto out0x = std::adjacent_find(car.begin(), car.end());
> > +
> > +?? const auto out1x = std::adjacent_find(car.begin(), car.end(),
> > +?? std::equal_to())
>
> I will go through all the tests and make sure that some nontrivial
> calculation is done that contributes to a final bool return.?? And clean
> up the mess.?? I did this in chunk 2 but I guess I left a lot of chunk 1.

As was the case with the iterator patch, it's not a question of doing
a nontrivial calculation, but
a question of initializing a constexpr variable with the result. (Yeah
sure a non-type template
argument would also work but eurgh). Then, and only then, have we
proven that the code
works in a constexpr context. Initializing a const doesn't do that.
constinit would, but that's
C++20.


Re: [PATCH] Remove another bunch of dead assignment.

2019-06-27 Thread Richard Sandiford
Martin Liška  writes:
> diff --git a/gcc/config/i386/i386-expand.c b/gcc/config/i386/i386-expand.c
> index d50b811d863..1bd251ea8e2 100644
> --- a/gcc/config/i386/i386-expand.c
> +++ b/gcc/config/i386/i386-expand.c
> @@ -19780,8 +19780,7 @@ ix86_expand_sse2_mulvxdi3 (rtx op0, rtx op1, rtx op2)
>emit_insn (gen_vec_widen_umult_even_v4si (t5, 
>   gen_lowpart (V4SImode, op1),
>   gen_lowpart (V4SImode, op2)));
> -  op0 = expand_binop (mode, add_optab, t5, t4, op0, 1, OPTAB_DIRECT);
> -
> +  expand_binop (mode, add_optab, t5, t4, op0, 1, OPTAB_DIRECT);
>  }
>else
>  {

This means that nothing uses the expanded value.  It looks like the
call was meant to be force_expand_binop instead, so that the expansion
always goes to op0.

Richard


Re: [PATCH] Deprecate -frepo option.

2019-06-27 Thread Jason Merrill
On Thu, Jun 27, 2019 at 10:23 AM Martin Liška  wrote:
>
> On 6/27/19 2:58 PM, Jonathan Wakely wrote:
> > On Thu, 27 Jun 2019 at 13:30, Martin Liška  wrote:
> >>
> >> On 6/21/19 4:28 PM, Richard Biener wrote:
> >>> On Fri, Jun 21, 2019 at 4:13 PM Jakub Jelinek  wrote:
> 
>  On Fri, Jun 21, 2019 at 04:04:00PM +0200, Martin Liška wrote:
> > On 6/21/19 1:58 PM, Jakub Jelinek wrote:
> >> On Fri, Jun 21, 2019 at 01:52:09PM +0200, Martin Liška wrote:
> >>> On 6/21/19 1:47 PM, Jonathan Wakely wrote:
>  On Fri, 21 Jun 2019 at 11:40, Martin Liška wrote:
> > Yes, I would be fine to deprecate that for GCC 10.1
> 
>  Would it be appropriate to issue a warning in GCC 10.x if the option 
>  is used?
> >>>
> >>> Sure. With the patch attached one will see:
> >>>
> >>> $ gcc -frepo /tmp/main.cc -c
> >>> gcc: warning: switch ‘-frepo’ is no longer supported
> >>>
> >>> I'm sending patch that also removes -frepo tests from test-suite.
> >>> I've been testing the patch.
> >>
> >> IMHO for just deprecation of an option you don't want to remove it 
> >> from the
> >> testsuite, just match the warning it will generate in those tests, and
> >> I'm not convinced you want to remove it from the documentation (rather 
> >> than
> >> just saying in the documentation that the option is deprecated and 
> >> might be
> >> removed in a later GCC version).
> >
> > Agree with you. I'm sending updated version of the patch.
> > Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> 
>  I'm also not convinced about the Deprecated flag, seems like that is a 
>  flag
>  that we use for options that have been already removed.
>  So, instead there should be some proper warning in the C++ FE for it,
>  or just Warn.
> >>>
> >>> In principle -frepo is a nice idea - does it live up to its promises?  
> >>> That is,
> >>> does it actually work, for example when throwing it on the libstdc++
> >>> testsuite or a larger C++ project?
> >>
> >> @Jonathan, Jason: Do we know whether it really work?
> >
> > I don't know. It's nearly 20 years since I've tried it, but apparently
> > a few people try using it:
> > https://stackoverflow.com/search?q=frepo+c%2B%2B
> >
> > The first result was answered by me in 2012 saying nobody uses it:
> > https://stackoverflow.com/a/11832613/981959
>
> Looks at this, it seems to me very legacy and I would recommend to remove it.

It's useful on targets without COMDAT support.  Are there any such
that we care about at this point?

If the problem is the combination with LTO, why not just prohibit that?

Jason


[Committed] PF fortran/90987 -- Adjust parsing of COMMONI

2019-06-27 Thread Steve Kargl
I've committed the attached patch after successful
regression tests on x86_64-*-freebsd.  The patch
returns MATCH_NO when matching is done for a COMMON
statement, but the statement in fact is not COMMON.
See testcases.

While here I corrected the recording of the wrong
revision number in the ChangeLogs from my previous
commit.

2019-06-27  Steven G. Kargl  

PR fortran/90987
 * match.c (gfc_match_common): Adjust parsing of fixed and free form
source code containing, e.g., COMMONI.

2019-06-27  Steven G. Kargl  

PR fortran/90987
* gfortran.dg/common_1.f: new test.
* gfortran.dg/common_26.f90: Ditto.

-- 
Steve
Index: gcc/fortran/match.c
===
--- gcc/fortran/match.c	(revision 272755)
+++ gcc/fortran/match.c	(working copy)
@@ -5115,7 +5115,15 @@ gfc_match_common (void)
   gfc_array_spec *as;
   gfc_equiv *e1, *e2;
   match m;
+  char c;
 
+  /* COMMON has been matched.  In free form source code, the next character
+ needs to be whitespace or '/'.  Check that here.   Fixed form source
+ code needs to be checked below.  */
+  c = gfc_peek_ascii_char ();
+  if (gfc_current_form == FORM_FREE && !gfc_is_whitespace (c) && c != '/')
+return MATCH_NO;
+
   as = NULL;
 
   for (;;)
@@ -5279,10 +5287,24 @@ gfc_match_common (void)
 	  gfc_gobble_whitespace ();
 	  if (gfc_match_eos () == MATCH_YES)
 	goto done;
-	  if (gfc_peek_ascii_char () == '/')
+	  c = gfc_peek_ascii_char ();
+	  if (c == '/')
 	break;
-	  if (gfc_match_char (',') != MATCH_YES)
-	goto syntax;
+	  if (c != ',')
+	{
+	  /* In Fixed form source code, gfortran can end up here for an
+		 expression of the form COMMONI = RHS.  This may not be an
+		 error, so return MATCH_NO.  */
+	  if (gfc_current_form == FORM_FIXED && c == '=')
+		{
+		  gfc_free_array_spec (as);
+		  return MATCH_NO;
+		}
+	  goto syntax;
+	}
+	  else
+	gfc_match_char (',');
+
 	  gfc_gobble_whitespace ();
 	  if (gfc_peek_ascii_char () == '/')
 	break;
@@ -6248,6 +6270,7 @@ gfc_match_select_type (void)
   sym->attr.flavor = FL_VARIABLE;
   sym->attr.referenced = 1;
   sym->attr.class_ok = 1;
+  sym->attr.target = expr2->symtree->n.sym->attr.target;
 }
   else
 {
Index: gcc/testsuite/gfortran.dg/common_1.f
===
--- gcc/testsuite/gfortran.dg/common_1.f	(nonexistent)
+++ gcc/testsuite/gfortran.dg/common_1.f	(working copy)
@@ -0,0 +1,14 @@
+! { dg-do compile }
+  module mymod
+  type :: mytyp
+  integer :: i
+  end type mytyp
+  contains
+  subroutine mysub
+  implicit none
+  type(mytyp) :: a
+  integer :: commoni,commonj
+  commoni = a%i
+  commonj = a%j  ! { dg-error "is not a member of" }
+  end subroutine mysub
+  end module mymod
Index: gcc/testsuite/gfortran.dg/common_26.f90
===
--- gcc/testsuite/gfortran.dg/common_26.f90	(nonexistent)
+++ gcc/testsuite/gfortran.dg/common_26.f90	(working copy)
@@ -0,0 +1,14 @@
+! { dg-do compile }
+module mymod
+  type :: mytyp
+integer :: i
+  end type mytyp
+contains
+  subroutine mysub
+implicit none
+type(mytyp) :: a
+integer :: commoni,commonj
+commoni = a%i
+commonj = a%j ! { dg-error "is not a member of" }
+  end subroutine mysub
+end module mymod


Re: Use ODR for canonical types construction in LTO

2019-06-27 Thread Jason Merrill
On Mon, Jun 24, 2019 at 1:46 PM Jan Hubicka  wrote:
> >
> > I thought I remembered someone's recent-ish work to treat specially
> > types containing a char array, but I'm not finding it now.
> >
> > > For dynamically allocated memory as well as for stack space after stack
> > > slot sharing done in cfgexpand I see this is necessary since we do not
> > > preserve any information about placement new.
> >
> > Yes, untyped memory is different, but I'd think that memory allocated
> > through (non-placement) new should also be considered typed.
>
> I will try to return to this once the code is cleaned up. It would
> be quite interesting to make this better defined.
> > >
> > > I think this is what Richard refers to the code generating clobber
> > > statements that is only leaking as-base types to the middle-end visible
> > > part of IL and the code in call.c copying base structures.
> >
> > Right.  Is there a better way we could express copying/clobbering only
> > part of the object without involving the as-base types?
>
> I think currently two variants was discussed
>  1) use the trik Richard proposed for class a containing fake
> subclass a_as_base that has reduced size (i.e. is the AS_BASE type
> we have now) and adjust all component_refs accordingly introducing
> the view_convert_exprs for outer decls.
>
> Then clobbers and copies in call.c could use the as_base type.

This is the approach I talked about in 22488, which would also fix the
issue of fields with TYPE_SIZE larger than DECL_SIZE.

>  2) expose IS_FAKE_BASE_TYPE to middle-end and teach TBAA machinery
> about the fact that these are actually same types for everything
> it cares about.
>
> We have similar hacks for Fortran commons already, but of couse
> it is not most pretty (I outlined plan in previous mail)
>
> I would personally lean towards 2 since it will keep all component_refs
> in its current natural form and because I know how to implement it :)
> I guess it is Richi and yours call to pick variant which is better...

Both of these you mention still involve the as-base type.  I was
wondering if there was a way to avoid that.

> > > So at this time basically every C++ type can inter-operate with non-C++.
> > > I was thinking of relaxing this somewhat but wanted to see if C++
> > > standard says something here. Things that may be sensible include:
> > >  1) perhaps non-POD types especially those with vptr pointers do
> > > not need to be inter-operable.
> >
> > PODs were intended to be the C-compatible subset, yes.
> >
> > >  2) anonymous namespace types
> > >  3) types in namespace
> >
> > As long as these types don't have explicit language linkage (e.g.
> > extern "C"), sure.
>
> Great, I will add those to my TODOs.
> Do we have any way to tell language linkage from middle-end?

Not with a flag, currently, but you could look at the mangled name to
recognize extern "C".

Jason


Re: [PATCH] Deprecate -frepo option.

2019-06-27 Thread Jan Hubicka
> 
> It's useful on targets without COMDAT support.  Are there any such
> that we care about at this point?
> 
> If the problem is the combination with LTO, why not just prohibit that?

The problem is that at the collect2 time we want to decide whether to
hold stderr/stdout of the linker. The issue is that we do not know yet
if we are going to LTO (because that is decided by linker plugin) or
whether we do repo files (because we look for those only after linker
failure).

We can look for repo files first but that could take some time
especially for large programs (probably not too bad compared to actual
linking later) or we may require -frepo to be passed to collect2.

Honza
> 
> Jason


Re: [PATCH] Deprecate -frepo option.

2019-06-27 Thread Iain Sandoe


> On 27 Jun 2019, at 19:21, Jan Hubicka  wrote:
> 
>> 
>> It's useful on targets without COMDAT support.  Are there any such
>> that we care about at this point?
>> 
>> If the problem is the combination with LTO, why not just prohibit that?
> 
> The problem is that at the collect2 time we want to decide whether to
> hold stderr/stdout of the linker. The issue is that we do not know yet
> if we are going to LTO (because that is decided by linker plugin) or
> whether we do repo files (because we look for those only after linker
> failure).

you could pre-scan for LTO, as is done for the collect2+non-plugin linker.
(similar to the comment below, that would take some time, but probably
small c.f the actual link).
Iain

> We can look for repo files first but that could take some time
> especially for large programs (probably not too bad compared to actual
> linking later) or we may require -frepo to be passed to collect2.
> 
> Honza
>> 
>> Jason



Re: [PATCH 32/30] Document movmem/cpymem changes in gcc-10/changes.html

2019-06-27 Thread Aaron Sawdey
On 6/25/19 4:43 PM, Jeff Law wrote:
> On 6/25/19 2:22 PM, acsaw...@linux.ibm.com wrote:
>> From: Aaron Sawdey 
>>
>>  * builtins.c (get_memory_rtx): Fix comment.
>>  * optabs.def (movmem_optab): Change to cpymem_optab.
>>  * expr.c (emit_block_move_via_cpymem): Change movmem to cpymem.
>>  (emit_block_move_hints): Change movmem to cpymem.
>>  * defaults.h: Change movmem to cpymem.
>>  * targhooks.c (get_move_ratio): Change movmem to cpymem.
>>  (default_use_by_pieces_infrastructure_p): Ditto.
> So I think you're missing an update to the RTL/MD documentation.  This
> is also likely to cause problems for any out-of-tree ports, so it's
> probably worth a mention in the gcc-10 changes, which will need to be
> created (in CVS no less, ugh).
> 
> I think the stuff posted to-date is fine, but it shouldn't go in without
> the corresponding docs and gcc-10 changes updates.

Here is the corresponding documentation change for gcc-10/changes.html.

OK for trunk?

Thanks,
Aaron




Index: changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-10/changes.html,v
retrieving revision 1.4
diff -r1.4 changes.html
139c139,149
< 
---
> Other significant improvements
> 
>   
> To allow inline expansion of both memcpy
> and memmove, the existing movmem instruction
> patterns used for non-overlapping memory copies have been renamed to
> cpymem. The movmem name is now used
> for overlapping memory moves, consistent with the
> library functions memcpy and memmove.
>   
> 

-- 
Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
050-2/C113  (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC Toolchain


Re: [PATCH] constrain one character optimization to one character stores (PR 90989)

2019-06-27 Thread Richard Biener
On June 27, 2019 7:04:32 PM GMT+02:00, Jakub Jelinek  wrote:
>On Thu, Jun 27, 2019 at 10:58:25AM -0600, Martin Sebor wrote:
>> The LHS is unsigned short so handle_char_store would not be called
>> because of the check in the caller.  You would need something like:
>> 
>>   MEM  [(char *)&x] = { 'a', 'b' };
>
>This is invalid, because the rhs is non-empty CONSTRUCTOR that doesn't
>have
>VECTOR_TYPE - verify_gimple_assign_single has:
>case CONSTRUCTOR:
>  if (TREE_CODE (rhs1_type) == VECTOR_TYPE)
>...
>  else if (CONSTRUCTOR_NELTS (rhs1) != 0)
>{
>  error ("non-vector %qs with elements", code_name);
>  debug_generic_stmt (rhs1);
>  return true;
>}
>
>But
>  MEM  [(char *)&x] = MEM  [(char *)"ab"];
>is valid.  Or = {} would be valid too, even for array stores.

And you can of course write GIMPLE unit tests for the pass using the GIMPLE FE. 

Richard. 

>   Jakub



Re: [PATCH] Deprecate -frepo option.

2019-06-27 Thread Jan Hubicka
> 
> > On 27 Jun 2019, at 19:21, Jan Hubicka  wrote:
> > 
> >> 
> >> It's useful on targets without COMDAT support.  Are there any such
> >> that we care about at this point?
> >> 
> >> If the problem is the combination with LTO, why not just prohibit that?
> > 
> > The problem is that at the collect2 time we want to decide whether to
> > hold stderr/stdout of the linker. The issue is that we do not know yet
> > if we are going to LTO (because that is decided by linker plugin) or
> > whether we do repo files (because we look for those only after linker
> > failure).
> 
> you could pre-scan for LTO, as is done for the collect2+non-plugin linker.
> (similar to the comment below, that would take some time, but probably
> small c.f the actual link).

the collect2+non-plugin does not handle static archives so it would need
more work while I think we should kill that code (and make darwin to use
lld)

Honza

> Iain
> 
> > We can look for repo files first but that could take some time
> > especially for large programs (probably not too bad compared to actual
> > linking later) or we may require -frepo to be passed to collect2.
> > 
> > Honza
> >> 
> >> Jason
> 


[Darwin, PPC, committed] Do not use longcall for 64b code.

2019-06-27 Thread Iain Sandoe
The linker [ld64] that supports 64Bit does not need the JBSR longcall
optimisation, and will not work with the most generic case (where the
symbol is undefined external, but there is no symbl stub).  So switch
the longcall option off.  ld64 will generate branch islands as needed.

tested on powerpc-darwin9.
applied to mainline
thanks
Iain

2019-06-27  Iain Sandoe  

* config/rs6000/rs6000.c (darwin_rs6000_override_options): Do not
use longcall for 64b code.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 3b59db5..fbff6bd 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -3418,6 +3418,15 @@ darwin_rs6000_override_options (void)
   rs6000_isa_flags |= OPTION_MASK_POWERPC64;
   warning (0, "%qs requires PowerPC64 architecture, enabling", "-m64");
 }
+
+  /* The linkers [ld64] that support 64Bit do not need the JBSR longcall
+ optimisation, and will not work with the most generic case (where the
+ symbol is undefined external, but there is no symbl stub).  */
+  if (TARGET_64BIT)
+rs6000_default_long_calls = 0;
+
+  /* ld_classic is (so far) still used for kernel (static) code, and supports
+ the JBSR longcall / branch islands.  */
   if (flag_mkernel)
 {
   rs6000_default_long_calls = 1;



[Darwin, PPC, committed] Correct whitespace in specs.

2019-06-27 Thread Iain Sandoe
A recent merge dropped whitespace in the endfile specs, which affects
transactional memory cases.

applied to mainline
thanks
Iain

2019-06-27  Iain Sandoe  

* config/rs6000/darwin.h (ENDFILE_SPEC): Correct whitespace in the
spec.

--- a/gcc/config/rs6000/darwin.h
+++ b/gcc/config/rs6000/darwin.h
@@ -135,7 +135,7 @@ extern int darwin_emit_picsym_stub;
 /* The PPC regs save/restore functions are leaves and could, conceivably
be used by the tm destructor.  */
 #undef ENDFILE_SPEC
-#define ENDFILE_SPEC TM_DESTRUCTOR "-lef_ppc"
+#define ENDFILE_SPEC TM_DESTRUCTOR " -lef_ppc"
 
 #undef SUBTARGET_EXTRA_SPECS
 #define SUBTARGET_EXTRA_SPECS  \



[Darwin, PPC, committed] Allow the user to override the use of hard float in kexts.

2019-06-27 Thread Iain Sandoe
The default for the kernel is soft-float, however a user writing a kernel
extension might want to make use of hard float.  This change makes
" -mkernel -mhard-float " work as expected.

tested on powerpc-darwin9
applied to mainline
thanks
Iain

2019-06-27  Iain Sandoe  

* config/rs6000/rs6000.c (darwin_rs6000_override_options): Honour
user-specified float mode choice for kernel mode code.

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index fbff6bd..5e80673 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -3430,7 +3430,10 @@ darwin_rs6000_override_options (void)
   if (flag_mkernel)
 {
   rs6000_default_long_calls = 1;
-  rs6000_isa_flags |= OPTION_MASK_SOFT_FLOAT;
+
+  /* Allow a kext author to do -mkernel -mhard-float.  */
+  if (! (rs6000_isa_flags_explicit & OPTION_MASK_SOFT_FLOAT))
+rs6000_isa_flags |= OPTION_MASK_SOFT_FLOAT;
 }
 
   /* Make -m64 imply -maltivec.  Darwin's 64-bit ABI includes



Re: [PATCH] constrain one character optimization to one character stores (PR 90989)

2019-06-27 Thread Martin Sebor

On 6/27/19 11:04 AM, Jakub Jelinek wrote:

On Thu, Jun 27, 2019 at 10:58:25AM -0600, Martin Sebor wrote:

The LHS is unsigned short so handle_char_store would not be called
because of the check in the caller.  You would need something like:

   MEM  [(char *)&x] = { 'a', 'b' };


This is invalid, because the rhs is non-empty CONSTRUCTOR that doesn't have
VECTOR_TYPE - verify_gimple_assign_single has:
 case CONSTRUCTOR:
   if (TREE_CODE (rhs1_type) == VECTOR_TYPE)
...
   else if (CONSTRUCTOR_NELTS (rhs1) != 0)
 {
   error ("non-vector %qs with elements", code_name);
   debug_generic_stmt (rhs1);
   return true;
 }

But
   MEM  [(char *)&x] = MEM  [(char *)"ab"];


Thanks.  A test case that uses this is below.  It's handled correctly
because the RHS is not all non-zero bytes (storing_nonzero_p is false.

The block Jeff is concerned about is the one below:

  else if (storing_nonzero_p
   && cmp > 0
   && TREE_CODE (TREE_TYPE (rhs)) == INTEGER_TYPE)
{
  gsi_next (gsi);
  return false;

so what would be needed to show an outstanding problem even with
the patch applied (i.e., the INTEGER_TYPE check) is a RHS that's
interpreted as all nonzero bytes despite having a nul byte in its
representation.  I.e., an integer that initializer_zerop() sets
the second argument to true for, or tree_expr_nonzero_p (RHS)
returns true for.

Both initializer_zerop and tree_expr_nonzero_p give up on this
representation even though they could handle it because it's
a constant.  Handling it would be an improvement but it would
still return false.  What we need is a scalar for the RHS that
is non-zero with a nul byte in it.  That would do it, but I
don't know how to create it or if it's even possible.

Martin


  char b[10];

  const char a[2] = "4";

  int f (const char *s)
  {
__builtin_strcpy (b, "123");

int i = __builtin_strcmp (s, b);

// MEM  [(char * {ref-all})&b]
//   = MEM  [(char * {ref-all})&a];
__builtin_memcpy (b, a, 2);

if (__builtin_strlen (b) != 1)
  __builtin_abort ();

return i;
  }


Re: [PATCH] Fix 2 clang warnings.

2019-06-27 Thread NightStrike
On Thu, Jun 27, 2019 at 11:16 AM Martin Sebor  wrote:
>
> On 6/27/19 8:03 AM, Martin Liška wrote:
> > Hi.
> >
> > This reduces 2 warnings reported by clang.
> >
> > Patch can bootstrap on x86_64-linux-gnu and survives regression tests.
> >
> > Ready to be installed?
> > Thanks,
> > Martin
> >
> > gcc/ChangeLog:
> >
> > 2019-06-27  Martin Liska  
> >
> >   * edit-context.c (test_applying_fixits_unreadable_file): Do not
> >   use () for a constructor call.
> >   (test_applying_fixits_line_out_of_range): Likewise.
> >   * ggc-page.c (free_page): Use (char *) for %p printf format
> >   argument.
> > ---
> >   gcc/edit-context.c | 4 ++--
> >   gcc/ggc-page.c | 2 +-
> >   2 files changed, 3 insertions(+), 3 deletions(-)
>
> Just as a side note:
>
> diff --git a/gcc/edit-context.c b/gcc/edit-context.c
> index d3246ab0334..93d10664ae9 100644
> --- a/gcc/edit-context.c
> +++ b/gcc/edit-context.c
> @@ -1639,7 +1639,7 @@ static void
>   test_applying_fixits_unreadable_file ()
>   {
> const char *filename = "this-does-not-exist.txt";
> -  line_table_test ltt ();
> +  line_table_test ltt;
> linemap_add (line_table, LC_ENTER, false, filename, 1);
>
> location_t loc = linemap_position_for_column (line_table, 1);
>
> I'm guessing the warning above is about declaring a function
> returning line_table_test rather than declaring an object of
> the type.  (It would be helpful to show those warnings when
> fixing them.  A small test case confirms it's -Wvexing-parse:
> warning: empty parentheses interpreted as a function declaration.)
>
> This is a common mistake to make so it would be a useful
> warning to add to GCC as well.  And in fact, I see at least
> one request for it in Bugzilla: 25814 (with the related
> pr86564).
>
> Martin

Realistically, if it's worth fixing any warning that clang outputted
and that GCC doesn't. then perhaps the right fix is to add the
warning to GCC.  Or, if it's there but not in -Wall, add it to -Wall.
You're pointing out this specific case, but there've been several of
these patches that are all different.


RFC: GCC Aarch64 SIMD vectorization question involving libmvec

2019-06-27 Thread Steve Ellcey
I am testing the latest GCC with not-yet-submitted GLIBC changes that
implement libmvec on Aarch64.

While trying to run SPEC 2017 (specifically 521.wrf_r) I ran into a
case where GCC was generating a call to _ZGVnN2vv_powf, that is a
vectorized powf call for 2 (not 4) elements.  This was a problem
because I only implemented a 4 element 32 bit vectorized powf function
for libmvec and not a 2 element version.

I think this is due to aarch64_simd_clone_compute_vecsize_and_simdlen
which allows for (element count * element size) to be either 64
or 128.

I would like some thoughts on what we should do about this, should
we require glibc/libmvec to provide 2 element 32 bit floating point
vector functions (as well as the 4 element ones) or should we change
aarch64_simd_clone_compute_vecsize_and_simdlen to only allow 4
element (128 total bit size) vectors and not 2 element (64 total bit
size) ones?

This is obviously a question for the pre-SVE vector instructions,
I am not sure how this would be handled in SVE.

Steve Ellcey
sell...@marvell.com

P.S.  Here a test case in Fortran that generated the 2 element
  vector call.  It unrolled the loop into one vector call
  of 2 elements and one scalar call.

  SUBROUTINE FOO(B,W,P)
  REAL, DIMENSION (3) :: W, P
  DO 10 I = 1, 3
  P(I) = W(I) ** B
10CONTINUE
  END SUBROUTINE FOO


[PATCH] Prepare for prefixed instructions on PowerPC

2019-06-27 Thread Michael Meissner
A future PowerPC machine may have prefixed instructions.  This patch changes
the RTL "length" attribute of all of the "mov*", and "*extend*" insns from an
explicit "4" to "*".  This change prepares for the length attribute to be set
appropriately for single instruction loads, stores, and add immediates that may
include prefixed instructions.  The idea is to do this patch now, rather than
having these lines be part of the larger patches.

As we discussed off-line earlier, I changed all of the "4" lengths to be "*",
even for instruction alternatives that would not be subject to being changed to
be a prefixed instruction (i.e. the sign extend instruction "extsw"'s length is
set to "*", even though it would not be a prefixed instruction).  I also
changed the Altivec load/store instructions, as well as a power8 permute
instruction just be consistant with the other changes.

This patch does not fix the places where the length is not "4".  Those lengths
will be updated as needed as the rest of the prefixed instruction support is
rolled out.  It is intended to be a simple patch to make the other patches
somewhat simpler.

I did a bootstrap and make check.  There were no regressions.  Can I check this
patch into the trunk?

2019-06-27  Michael Meissner  

* config/rs6000/altivec.md (altivec_mov, VM2 iterator):
Change the RTL attribute "length" from "4" to "*" to allow the
length attribute to be adjusted automatically for prefixed load,
store, and add immediate instructions.
* config/rs6000/rs6000.md (extendhi2, EXTHI iterator):
Likewise.
(extendsi2, EXTSI iterator): Likewise.
(movsi_internal1): Likewise.
(movsi_from_sf): Likewise.
(movdi_from_sf_zero_ext): Likewise.
(mov_internal): Likewise.
(movcc_internal1, QHI iterator): Likewise.
(mov_softfloat, FMOVE32 iterator): Likewise.
(movsf_from_si): Likewise.
(mov_hardfloat32, FMOVE64 iterator): Likewise.
(mov_softfloat64, FMOVE64 iterator): Likewise.
(mov, FMOVE128 iterator): Likewise.
(movdi_internal64): Likewise.
* config/rs6000/vsx.md (vsx_le_permute_, VSX_TI iterator):
Likewise.
(vsx_le_undo_permute_, VSX_TI iterator): Likewise.
(vsx_mov_64bit, VSX_M iterator): Likewise.
(vsx_mov_32bit, VSX_M iterator): Likewise.
(vsx_splat_v4sf): Likewise.

Index: gcc/config/rs6000/altivec.md
===
--- gcc/config/rs6000/altivec.md(revision 272714)
+++ gcc/config/rs6000/altivec.md(working copy)
@@ -256,7 +256,7 @@ (define_insn "*altivec_mov"
* return output_vec_const_move (operands);
#"
   [(set_attr "type" "vecstore,vecload,veclogical,store,load,*,veclogical,*,*")
-   (set_attr "length" "4,4,4,20,20,20,4,8,32")])
+   (set_attr "length" "*,*,*,20,20,20,*,8,32")])
 
 ;; Unlike other altivec moves, allow the GPRs, since a normal use of TImode
 ;; is for unions.  However for plain data movement, slightly favor the vector
Index: gcc/config/rs6000/rs6000.md
===
--- gcc/config/rs6000/rs6000.md (revision 272719)
+++ gcc/config/rs6000/rs6000.md (working copy)
@@ -965,7 +965,7 @@ (define_insn "*extendhi2"
vextsh2d %0,%1"
   [(set_attr "type" "load,exts,fpload,vecperm")
(set_attr "sign_extend" "yes")
-   (set_attr "length" "4,4,8,4")
+   (set_attr "length" "*,*,8,*")
(set_attr "isa" "*,*,p9v,p9v")])
 
 (define_split
@@ -1040,7 +1040,7 @@ (define_insn "extendsi2"
#"
   [(set_attr "type" "load,exts,fpload,fpload,mffgpr,vecexts,vecperm,mftgpr")
(set_attr "sign_extend" "yes")
-   (set_attr "length" "4,4,4,4,4,4,8,8")
+   (set_attr "length" "*,*,*,*,*,*,8,8")
(set_attr "isa" "*,*,p6,p8v,p8v,p9v,p8v,p8v")])
 
 (define_split
@@ -6915,11 +6915,11 @@ (define_insn "*movsi_internal1"
 veclogical, veclogical,  vecsimple,   mffgpr,  mftgpr,
 *,  *,   *")
(set_attr "length"
-   "4,  4,   4,   4,   4,
-4,  4,   4,   4,   4,
-8,  4,   4,   4,   4,
-4,  4,   8,   4,   4,
-4,  4,   4")
+   "*,  *,   *,   *,   *,
+*,  *,   *,   *,   *,
+8,  *,   *,   *,   *,
+*,  *,   8,   *,   *,
+*,  *,   *")
(set_attr "isa"
"*,  *,   *,   p8v, p8v,
 *,  p8v, p8v, *,   *,
@@ -6995,9 +6995,9 @@ (define_insn_and_split "movsi_from_sf"
 fpstore,fpstore, fpstore,

[PATCH, obvious] gdbhooks.py: rename parameters to match usage

2019-06-27 Thread Vladislav Ivanishin
Hi,

The parameter names here were in the wrong order (which doesn't matter
to the interpreter, but confuses the humans reading the calling code).

I am going to install the following patch soon.

gcc/ChangeLog:

* gdbhooks.py (GdbPrettyPrinters.add_printer_for_types): Reorder
parameter names in accord with usage (no functional change).
(GdbPrettyPrinters.add_printer_for_regex): Ditto.
---
 gcc/gdbhooks.py | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/gcc/gdbhooks.py b/gcc/gdbhooks.py
index 7b1a7be0002..15a5ceaa56f 100644
--- a/gcc/gdbhooks.py
+++ b/gcc/gdbhooks.py
@@ -521,11 +521,11 @@ class GdbPrettyPrinters(gdb.printing.PrettyPrinter):
 def __init__(self, name):
 super(GdbPrettyPrinters, self).__init__(name, [])
 
-def add_printer_for_types(self, name, class_, types):
-self.subprinters.append(GdbSubprinterTypeList(name, class_, types))
+def add_printer_for_types(self, types, name, class_):
+self.subprinters.append(GdbSubprinterTypeList(types, name, class_))
 
-def add_printer_for_regex(self, name, class_, regex):
-self.subprinters.append(GdbSubprinterRegex(name, class_, regex))
+def add_printer_for_regex(self, regex, name, class_):
+self.subprinters.append(GdbSubprinterRegex(regex, name, class_))
 
 def __call__(self, gdbval):
 type_ = gdbval.type.unqualified()
-- 
2.22.0


-- 
Vlad


[C++ PATCH] PR c++/55442 - memory-hog with highly recursive constexpr.

2019-06-27 Thread Jason Merrill
This testcase in the PR is extremely recursive, and therefore uses a huge
amount of memory on caching the results of individual calls.  We no longer
need to track all calls to catch infinite recursion, as we have other limits
on maximum depth and operations count.  So let's only cache a few calls at
the top level: 8 seems to be a reasonable compromise.

Tested x86_64-pc-linux-gnu, applying to trunk.

gcc/c-family/
* c.opt (fconstexpr-loop-limit): New.
gcc/cp/
* constexpr.c (push_cx_call_context): Return depth.
(cxx_eval_call_expression): Don't cache past constexpr_cache_depth.
---
 gcc/doc/invoke.texi| 16 ++--
 gcc/c-family/c.opt |  4 
 gcc/cp/constexpr.c | 29 ++---
 gcc/c-family/ChangeLog |  5 +
 gcc/cp/ChangeLog   |  6 ++
 5 files changed, 47 insertions(+), 13 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 03d2895c9d7..664c38f78af 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -209,8 +209,9 @@ in the following sections.
 @xref{C++ Dialect Options,,Options Controlling C++ Dialect}.
 @gccoptlist{-fabi-version=@var{n}  -fno-access-control @gol
 -faligned-new=@var{n}  -fargs-in-order=@var{n}  -fchar8_t  -fcheck-new @gol
--fconstexpr-depth=@var{n}  -fconstexpr-loop-limit=@var{n} @gol
--fconstexpr-ops-limit=@var{n} -fno-elide-constructors @gol
+-fconstexpr-depth=@var{n}  -fconstexpr-cache-depth=@var{n} @gol
+-fconstexpr-loop-limit=@var{n}  -fconstexpr-ops-limit=@var{n} @gol
+-fno-elide-constructors @gol
 -fno-enforce-eh-specs @gol
 -fno-gnu-keywords @gol
 -fno-implicit-templates @gol
@@ -2527,6 +2528,17 @@ to @var{n}.  A limit is needed to detect endless 
recursion during
 constant expression evaluation.  The minimum specified by the standard
 is 512.
 
+@item -fconstexpr-cache-depth=@var{n}
+@opindex fconstexpr-cache-depth
+Set the maximum level of nested evaluation depth for C++11 constexpr
+functions that will be cached to @var{n}.  This is a heuristic that
+trades off compilation speed (when the cache avoids repeated
+calculations) against memory consumption (when the cache grows very
+large from highly recursive evaluations).  The default is 8.  Very few
+users are likely to want to adjust it, but if your code does heavy
+constexpr calculations you might want to experiment to find which
+value works best for you.
+
 @item -fconstexpr-loop-limit=@var{n}
 @opindex fconstexpr-loop-limit
 Set the maximum number of iterations for a loop in C++14 constexpr functions
diff --git a/gcc/c-family/c.opt b/gcc/c-family/c.opt
index a4cf3bd623d..080066fa608 100644
--- a/gcc/c-family/c.opt
+++ b/gcc/c-family/c.opt
@@ -1424,6 +1424,10 @@ fconstexpr-depth=
 C++ ObjC++ Joined RejectNegative UInteger Var(max_constexpr_depth) Init(512)
 -fconstexpr-depth= Specify maximum constexpr recursion depth.
 
+fconstexpr-cache-depth=
+C++ ObjC++ Joined RejectNegative UInteger Var(constexpr_cache_depth) Init(8)
+-fconstexpr-cache-depth=   Specify maximum constexpr recursion 
cache depth.
+
 fconstexpr-loop-limit=
 C++ ObjC++ Joined RejectNegative UInteger Var(constexpr_loop_limit) 
Init(262144)
 -fconstexpr-loop-limit=Specify maximum constexpr loop 
iteration count.
diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index a3a36d09d5e..d11e7af3eb1 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -1447,16 +1447,17 @@ static vec call_stack;
 static int call_stack_tick;
 static int last_cx_error_tick;
 
-static bool
+static int
 push_cx_call_context (tree call)
 {
   ++call_stack_tick;
   if (!EXPR_HAS_LOCATION (call))
 SET_EXPR_LOCATION (call, input_location);
   call_stack.safe_push (call);
-  if (call_stack.length () > (unsigned) max_constexpr_depth)
+  int len = call_stack.length ();
+  if (len > max_constexpr_depth)
 return false;
-  return true;
+  return len;
 }
 
 static void
@@ -1587,7 +1588,7 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, tree 
t,
   tree fun = get_function_named_in_call (t);
   constexpr_call new_call
 = { NULL, NULL, NULL, 0, ctx->manifestly_const_eval };
-  bool depth_ok;
+  int depth_ok;
 
   if (fun == NULL_TREE)
 return cxx_eval_internal_function (ctx, t, lval,
@@ -1791,14 +1792,20 @@ cxx_eval_call_expression (const constexpr_ctx *ctx, 
tree t,
   entry = *slot;
   if (entry == NULL)
{
- /* We need to keep a pointer to the entry, not just the slot, as the
-slot can move in the call to cxx_eval_builtin_function_call.  */
- *slot = entry = ggc_alloc ();
- *entry = new_call;
- fb.preserve ();
+ /* Only cache up to constexpr_cache_depth to limit memory use.  */
+ if (depth_ok < constexpr_cache_depth)
+   {
+ /* We need to keep a pointer to the entry, not just the slot, as
+the slot can move during evaluation of the body.  */
+ *slot = entry = ggc_alloc ();
+ *entry = new_call;
+ fb.pres

  1   2   >