Re: Add support for bitwise reductions

2018-01-26 Thread Christophe Lyon
On 25 January 2018 at 11:24, Richard Sandiford
 wrote:
> Rainer Orth  writes:
>> Jeff Law  writes:
>>> On 11/22/2017 11:12 AM, Richard Sandiford wrote:
 Richard Sandiford  writes:
> This patch adds support for the SVE bitwise reduction instructions
> (ANDV, ORV and EORV).  It's a fairly mechanical extension of existing
> REDUC_* operators.
>
> Tested on aarch64-linux-gnu (with and without SVE), x86_64-linux-gnu
> and powerpc64le-linux-gnu.

 Here's an updated version that applies on top of the recent
 removal of REDUC_*_EXPR.  Tested as before.

 Thanks,
 Richard


 2017-11-22  Richard Sandiford  
 Alan Hayward  
 David Sherwood  

 gcc/
 * optabs.def (reduc_and_scal_optab, reduc_ior_scal_optab)
 (reduc_xor_scal_optab): New optabs.
 * doc/md.texi (reduc_and_scal_@var{m}, reduc_ior_scal_@var{m})
 (reduc_xor_scal_@var{m}): Document.
 * doc/sourcebuild.texi (vect_logical_reduc): Likewise.
 * internal-fn.def (IFN_REDUC_AND, IFN_REDUC_IOR, IFN_REDUC_XOR): New
 internal functions.
 * fold-const-call.c (fold_const_call): Handle them.
 * tree-vect-loop.c (reduction_fn_for_scalar_code): Return the new
 internal functions for BIT_AND_EXPR, BIT_IOR_EXPR and BIT_XOR_EXPR.
 * config/aarch64/aarch64-sve.md (reduc__scal_):
 (*reduc__scal_): New patterns.
 * config/aarch64/iterators.md (UNSPEC_ANDV, UNSPEC_ORV)
 (UNSPEC_XORV): New unspecs.
 (optab): Add entries for them.
 (BITWISEV): New int iterator.
 (bit_reduc_op): New int attributes.

 gcc/testsuite/
 * lib/target-supports.exp (check_effective_target_vect_logical_reduc):
 New proc.
 * gcc.dg/vect/vect-reduc-or_1.c: Also run for vect_logical_reduc
 and add an associated scan-dump test.  Prevent vectorization
 of the first two loops.
 * gcc.dg/vect/vect-reduc-or_2.c: Likewise.
 * gcc.target/aarch64/sve_reduc_1.c: Add AND, IOR and XOR reductions.
 * gcc.target/aarch64/sve_reduc_2.c: Likewise.
 * gcc.target/aarch64/sve_reduc_1_run.c: Likewise.
 (INIT_VECTOR): Tweak initial value so that some bits are always set.
 * gcc.target/aarch64/sve_reduc_2_run.c: Likewise.
>>> OK.
>>> Jeff
>>
>> Two tests have regressed on sparc-sun-solaris2.*:
>>
>> +FAIL: gcc.dg/vect/vect-reduc-or_1.c -flto -ffat-lto-objects
>> scan-tree-dump vect "Reduce using vector shifts"
>> +FAIL: gcc.dg/vect/vect-reduc-or_1.c scan-tree-dump vect "Reduce using
>> vector shifts"
>> +FAIL: gcc.dg/vect/vect-reduc-or_2.c -flto -ffat-lto-objects
>> scan-tree-dump vect "Reduce using vector shifts"
>> +FAIL: gcc.dg/vect/vect-reduc-or_2.c scan-tree-dump vect "Reduce using
>> vector shifts"
>
> Bah, I think I broke this yesterday in:
>
> 2018-01-24  Richard Sandiford  
>
> PR testsuite/83889
> [...]
> * gcc.dg/vect/vect-reduc-or_1.c: Remove conditional dg-do run.
> * gcc.dg/vect/vect-reduc-or_2.c: Likewise.
>
> (r257022), which removed:
>
>   /* { dg-do run { target { whole_vector_shift || vect_logical_reduc } } } */
>
> I'd somehow thought that the dump lines in these two tests were already
> guarded, but they weren't.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu and applied as obvious.
> Sorry for the breakage.
>
> Richard
>
>

Hi Richard,

While this fixes the regression on armeb (same as on sparc), the
effect on arm-none-linux-gnueabi and arm-none-eabi
is that the tests are now skipped, while they used to pass.
Is this expected? Or is the guard you added too restrictive?

Thanks,

Christophe

> 2018-01-25  Richard Sandiford  
>
> gcc/testsuite/
> * gcc.dg/vect/vect-reduc-or_1.c: Require whole_vector_shift for
> the shift dump line.
> * gcc.dg/vect/vect-reduc-or_2.c: Likewise.
>
> Index: gcc/testsuite/gcc.dg/vect/vect-reduc-or_1.c
> ===
> --- gcc/testsuite/gcc.dg/vect/vect-reduc-or_1.c 2018-01-24 16:22:31.724089913 +0000
> +++ gcc/testsuite/gcc.dg/vect/vect-reduc-or_1.c 2018-01-25 10:16:16.283500281 +0000
> @@ -45,5 +45,5 @@ main (unsigned char argc, char **argv)
>return 0;
>  }
>
> -/* { dg-final { scan-tree-dump "Reduce using vector shifts" "vect" { target { ! vect_logical_reduc } } } } */
> +/* { dg-final { scan-tree-dump "Reduce using vector shifts" "vect" { target { whole_vector_shift && { ! vect_logical_reduc } } } } } */
>  /* { dg-final { scan-tree-dump "Reduce using direct vector reduction" "vect" { target vect_logical_reduc } } } */
> Index: gcc/testsuite/gcc.dg/vect/vect-reduc-or_2.c
> ===
> --- gcc/testsuite/gcc.dg/vect/vect-reduc-or_2.c 2018-01-24 16:22:31.724089913 +0000
> +++ gcc/testsuite/gcc.dg/vect/vect-reduc-or_2.c 2018-01-25 10:16:16.284500239 +0000
> @@ -44,5 

Re: Add support for bitwise reductions

2018-01-26 Thread Richard Sandiford
Christophe Lyon  writes:
> On 25 January 2018 at 11:24, Richard Sandiford
>  wrote:
>> Rainer Orth  writes:
>>> Jeff Law  writes:
 On 11/22/2017 11:12 AM, Richard Sandiford wrote:
> Richard Sandiford  writes:
>> This patch adds support for the SVE bitwise reduction instructions
>> (ANDV, ORV and EORV).  It's a fairly mechanical extension of existing
>> REDUC_* operators.
>>
>> Tested on aarch64-linux-gnu (with and without SVE), x86_64-linux-gnu
>> and powerpc64le-linux-gnu.
>
> Here's an updated version that applies on top of the recent
> removal of REDUC_*_EXPR.  Tested as before.
>
> Thanks,
> Richard
>
>
> 2017-11-22  Richard Sandiford  
> Alan Hayward  
> David Sherwood  
>
> gcc/
> * optabs.def (reduc_and_scal_optab, reduc_ior_scal_optab)
> (reduc_xor_scal_optab): New optabs.
> * doc/md.texi (reduc_and_scal_@var{m}, reduc_ior_scal_@var{m})
> (reduc_xor_scal_@var{m}): Document.
> * doc/sourcebuild.texi (vect_logical_reduc): Likewise.
> * internal-fn.def (IFN_REDUC_AND, IFN_REDUC_IOR, IFN_REDUC_XOR): New
> internal functions.
> * fold-const-call.c (fold_const_call): Handle them.
> * tree-vect-loop.c (reduction_fn_for_scalar_code): Return the new
> internal functions for BIT_AND_EXPR, BIT_IOR_EXPR and BIT_XOR_EXPR.
> * config/aarch64/aarch64-sve.md (reduc__scal_):
> (*reduc__scal_): New patterns.
> * config/aarch64/iterators.md (UNSPEC_ANDV, UNSPEC_ORV)
> (UNSPEC_XORV): New unspecs.
> (optab): Add entries for them.
> (BITWISEV): New int iterator.
> (bit_reduc_op): New int attributes.
>
> gcc/testsuite/
> * lib/target-supports.exp (check_effective_target_vect_logical_reduc):
> New proc.
> * gcc.dg/vect/vect-reduc-or_1.c: Also run for vect_logical_reduc
> and add an associated scan-dump test.  Prevent vectorization
> of the first two loops.
> * gcc.dg/vect/vect-reduc-or_2.c: Likewise.
> * gcc.target/aarch64/sve_reduc_1.c: Add AND, IOR and XOR reductions.
> * gcc.target/aarch64/sve_reduc_2.c: Likewise.
> * gcc.target/aarch64/sve_reduc_1_run.c: Likewise.
> (INIT_VECTOR): Tweak initial value so that some bits are always set.
> * gcc.target/aarch64/sve_reduc_2_run.c: Likewise.
 OK.
 Jeff
>>>
>>> Two tests have regressed on sparc-sun-solaris2.*:
>>>
>>> +FAIL: gcc.dg/vect/vect-reduc-or_1.c -flto -ffat-lto-objects
>>> scan-tree-dump vect "Reduce using vector shifts"
>>> +FAIL: gcc.dg/vect/vect-reduc-or_1.c scan-tree-dump vect "Reduce using
>>> vector shifts"
>>> +FAIL: gcc.dg/vect/vect-reduc-or_2.c -flto -ffat-lto-objects
>>> scan-tree-dump vect "Reduce using vector shifts"
>>> +FAIL: gcc.dg/vect/vect-reduc-or_2.c scan-tree-dump vect "Reduce using
>>> vector shifts"
>>
>> Bah, I think I broke this yesterday in:
>>
>> 2018-01-24  Richard Sandiford  
>>
>> PR testsuite/83889
>> [...]
>> * gcc.dg/vect/vect-reduc-or_1.c: Remove conditional dg-do run.
>> * gcc.dg/vect/vect-reduc-or_2.c: Likewise.
>>
>> (r257022), which removed:
>>
>>   /* { dg-do run { target { whole_vector_shift || vect_logical_reduc } } } */
>>
>> I'd somehow thought that the dump lines in these two tests were already
>> guarded, but they weren't.
>>
>> Tested on aarch64-linux-gnu and x86_64-linux-gnu and applied as obvious.
>> Sorry for the breakage.
>>
>> Richard
>>
>>
>
> Hi Richard,
>
> While this fixes the regression on armeb (same as on sparc), the
> effect on arm-none-linux-gnueabi and arm-none-eabi
> is that the tests are now skipped, while they used to pass.
> Is this expected? Or is the guard you added too restrictive?

I think that means that the tests went from UNSUPPORTED to PASS on
the last two targets with r257022.  Is that right?

It's expected in the sense that whole_vector_shift isn't true for
any arm*-*-* target, and historically this test was restricted to
whole_vector_shift (apart from the blip this week).
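
For reference, the loop these tests exercise is a bitwise-OR reduction of
roughly this shape (a sketch only, not the exact
gcc.dg/vect/vect-reduc-or_1.c source):

    /* Sketch of the reduction pattern under test; details assumed.  */
    unsigned char
    or_reduce (const unsigned char *a, int n)
    {
      unsigned char res = 0;
      for (int i = 0; i < n; i++)
        res |= a[i];  /* bitwise-OR reduction across the loop */
      return res;
    }

On targets with whole_vector_shift but no direct logical reduction, the
vectorizer reduces the final vector accumulator "using vector shifts",
which is the message the guarded scan-tree-dump line looks for.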

Thanks,
Richard

> Thanks,
>
> Christophe
>
>> 2018-01-25  Richard Sandiford  
>>
>> gcc/testsuite/
>> * gcc.dg/vect/vect-reduc-or_1.c: Require whole_vector_shift for
>> the shift dump line.
>> * gcc.dg/vect/vect-reduc-or_2.c: Likewise.
>>
>> Index: gcc/testsuite/gcc.dg/vect/vect-reduc-or_1.c
>> ===
>> --- gcc/testsuite/gcc.dg/vect/vect-reduc-or_1.c 2018-01-24 16:22:31.724089913 +0000
>> +++ gcc/testsuite/gcc.dg/vect/vect-reduc-or_1.c 2018-01-25 10:16:16.283500281 +0000
>> @@ -45,5 +45,5 @@ main (unsigned char argc, char **argv)
>>return 0;
>>  }
>>
>> -/* { dg-final { scan-tree-dump "Reduce using vector shifts" "vect" { target { ! vect_logical_reduc } } } } */
>> +/* { dg-final { scan-tree-dump "Reduce using vector shifts" "vect" { target { whole_vector

Re: Add support for bitwise reductions

2018-01-26 Thread Christophe Lyon
On 26 January 2018 at 10:33, Richard Sandiford
 wrote:
> Christophe Lyon  writes:
>> On 25 January 2018 at 11:24, Richard Sandiford
>>  wrote:
>>> Rainer Orth  writes:
 Jeff Law  writes:
> On 11/22/2017 11:12 AM, Richard Sandiford wrote:
>> Richard Sandiford  writes:
>>> This patch adds support for the SVE bitwise reduction instructions
>>> (ANDV, ORV and EORV).  It's a fairly mechanical extension of existing
>>> REDUC_* operators.
>>>
>>> Tested on aarch64-linux-gnu (with and without SVE), x86_64-linux-gnu
>>> and powerpc64le-linux-gnu.
>>
>> Here's an updated version that applies on top of the recent
>> removal of REDUC_*_EXPR.  Tested as before.
>>
>> Thanks,
>> Richard
>>
>>
>> 2017-11-22  Richard Sandiford  
>> Alan Hayward  
>> David Sherwood  
>>
>> gcc/
>> * optabs.def (reduc_and_scal_optab, reduc_ior_scal_optab)
>> (reduc_xor_scal_optab): New optabs.
>> * doc/md.texi (reduc_and_scal_@var{m}, reduc_ior_scal_@var{m})
>> (reduc_xor_scal_@var{m}): Document.
>> * doc/sourcebuild.texi (vect_logical_reduc): Likewise.
>> * internal-fn.def (IFN_REDUC_AND, IFN_REDUC_IOR, IFN_REDUC_XOR): New
>> internal functions.
>> * fold-const-call.c (fold_const_call): Handle them.
>> * tree-vect-loop.c (reduction_fn_for_scalar_code): Return the new
>> internal functions for BIT_AND_EXPR, BIT_IOR_EXPR and BIT_XOR_EXPR.
>> * config/aarch64/aarch64-sve.md (reduc__scal_):
>> (*reduc__scal_): New patterns.
>> * config/aarch64/iterators.md (UNSPEC_ANDV, UNSPEC_ORV)
>> (UNSPEC_XORV): New unspecs.
>> (optab): Add entries for them.
>> (BITWISEV): New int iterator.
>> (bit_reduc_op): New int attributes.
>>
>> gcc/testsuite/
>> * lib/target-supports.exp 
>> (check_effective_target_vect_logical_reduc):
>> New proc.
>> * gcc.dg/vect/vect-reduc-or_1.c: Also run for vect_logical_reduc
>> and add an associated scan-dump test.  Prevent vectorization
>> of the first two loops.
>> * gcc.dg/vect/vect-reduc-or_2.c: Likewise.
>> * gcc.target/aarch64/sve_reduc_1.c: Add AND, IOR and XOR reductions.
>> * gcc.target/aarch64/sve_reduc_2.c: Likewise.
>> * gcc.target/aarch64/sve_reduc_1_run.c: Likewise.
>> (INIT_VECTOR): Tweak initial value so that some bits are always set.
>> * gcc.target/aarch64/sve_reduc_2_run.c: Likewise.
> OK.
> Jeff

 Two tests have regressed on sparc-sun-solaris2.*:

 +FAIL: gcc.dg/vect/vect-reduc-or_1.c -flto -ffat-lto-objects
 scan-tree-dump vect "Reduce using vector shifts"
 +FAIL: gcc.dg/vect/vect-reduc-or_1.c scan-tree-dump vect "Reduce using
 vector shifts"
 +FAIL: gcc.dg/vect/vect-reduc-or_2.c -flto -ffat-lto-objects
 scan-tree-dump vect "Reduce using vector shifts"
 +FAIL: gcc.dg/vect/vect-reduc-or_2.c scan-tree-dump vect "Reduce using
 vector shifts"
>>>
>>> Bah, I think I broke this yesterday in:
>>>
>>> 2018-01-24  Richard Sandiford  
>>>
>>> PR testsuite/83889
>>> [...]
>>> * gcc.dg/vect/vect-reduc-or_1.c: Remove conditional dg-do run.
>>> * gcc.dg/vect/vect-reduc-or_2.c: Likewise.
>>>
>>> (r257022), which removed:
>>>
>>>   /* { dg-do run { target { whole_vector_shift || vect_logical_reduc } } } 
>>> */
>>>
>>> I'd somehow thought that the dump lines in these two tests were already
>>> guarded, but they weren't.
>>>
>>> Tested on aarch64-linux-gnu and x86_64-linux-gnu and applied as obvious.
>>> Sorry for the breakage.
>>>
>>> Richard
>>>
>>>
>>
>> Hi Richard,
>>
>> While this fixes the regression on armeb (same as on sparc), the
>> effect on arm-none-linux-gnueabi and arm-none-eabi
>> is that the tests are now skipped, while they used to pass.
>> Is this expected? Or is the guard you added too restrictive?
>
> I think that means that the tests went from UNSUPPORTED to PASS on
> the last two targets with r257022.  Is that right?
>
Yes, that's what I meant.

> It's expected in the sense that whole_vector_shift isn't true for
> any arm*-*-* target, and historically this test was restricted to
> whole_vector_shift (apart from the blip this week).
>
OK, then. Just surprising to see PASS disappear.

Thanks,

Christophe

> Thanks,
> Richard
>
>> Thanks,
>>
>> Christophe
>>
>>> 2018-01-25  Richard Sandiford  
>>>
>>> gcc/testsuite/
>>> * gcc.dg/vect/vect-reduc-or_1.c: Require whole_vector_shift for
>>> the shift dump line.
>>> * gcc.dg/vect/vect-reduc-or_2.c: Likewise.
>>>
>>> Index: gcc/testsuite/gcc.dg/vect/vect-reduc-or_1.c
>>> ===
>>> --- gcc/testsuite/gcc.dg/vect/vect-reduc-or_1.c 2018-01-24 16:22:31.724089913 +0000
>>> +++ gcc/testsuite/gcc.dg/vect/vect-reduc-or_1.c 2018-01-25 10:16:16.283500281 +0000

Re: [PATCH] Fix PR81082

2018-01-26 Thread Richard Biener
On Thu, 25 Jan 2018, Richard Biener wrote:

> On Thu, 25 Jan 2018, Marc Glisse wrote:
> 
> > On Thu, 25 Jan 2018, Richard Biener wrote:
> > 
> > > --- gcc/match.pd  (revision 257047)
> > > +++ gcc/match.pd  (working copy)
> > > @@ -1939,6 +1939,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> > >  (minus (convert (view_convert:stype @1))
> > >   (convert (view_convert:stype @2)))
> > > 
> > > +/* (A * C) +- (B * C) -> (A+-B) * C and (A * C) +- A -> A * (C+-1).
> > > +Modeled after fold_plusminus_mult_expr.  */
> > > +(if (!TYPE_SATURATING (type)
> > > + && (!FLOAT_TYPE_P (type) || flag_associative_math))
> > > + (for plusminus (plus minus)
> > > +  (simplify
> > > +   (plusminus (mult:s @0 @1) (mult:cs @0 @2))
> > 
> > No :c on the first mult, so we don't actually handle A*C+B*C?
> 
> Hmm, I somehow convinced myself that it's only necessary on one
> of the mults...  but you are of course right.  Will re-test with
> that fixed.

This is what I have applied.  Note I had to XFAIL a minor part of
gcc.dg/tree-ssa/loop-15.c, namely simplifying the final value
replacement down to n * n.  While the patch should enable this,
the place where the transform could trigger with enough information
about n is VRP, but that doesn't end up folding all stmts - and
that's something I'm not willing to change right now.
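
To make the guards in the new patterns concrete (an illustration, not part
of the patch): with c == 0 the unassociated form never overflows, while the
reassociated form has to compute a + b, which can - hence the
tree_expr_nonzero_p and not-minus-one conditions on @0 below.

    /* Illustration only: may fold to (a + b) * c under the new patterns,
       which is only safe when c is known nonzero and not -1.  */
    int
    f (int a, int b, int c)
    {
      return a * c + b * c;
    }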

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied on trunk.

Richard.

2018-01-26  Richard Biener  

PR tree-optimization/81082
* fold-const.c (fold_plusminus_mult_expr): Do not perform the
association if it requires casting to unsigned.
* match.pd ((A * C) +- (B * C) -> (A+-B)): New patterns derived
from fold_plusminus_mult_expr to catch important cases late when
range info is available.

* gcc.dg/vect/pr81082.c: New testcase.
* gcc.dg/tree-ssa/loop-15.c: XFAIL the (int)((unsigned)n + -1U) * n + n
simplification to n * n.

Index: gcc/fold-const.c
===
--- gcc/fold-const.c    (revision 257047)
+++ gcc/fold-const.c    (working copy)
@@ -7097,7 +7097,7 @@ fold_plusminus_mult_expr (location_t loc
 
   /* Same may be zero and thus the operation 'code' may overflow.  Likewise
  same may be minus one and thus the multiplication may overflow.  Perform
- the operations in an unsigned type.  */
+ the sum operation in an unsigned type.  */
   tree utype = unsigned_type_for (type);
   tree tem = fold_build2_loc (loc, code, utype,
  fold_convert_loc (loc, utype, alt0),
@@ -7110,9 +7110,9 @@ fold_plusminus_mult_expr (location_t loc
 return fold_build2_loc (loc, MULT_EXPR, type,
fold_convert (type, tem), same);
 
-  return fold_convert_loc (loc, type,
-  fold_build2_loc (loc, MULT_EXPR, utype, tem,
-   fold_convert_loc (loc, utype, 
same)));
+  /* Do not resort to unsigned multiplication because
+ we lose the no-overflow property of the expression.  */
+  return NULL_TREE;
 }
 
 /* Subroutine of native_encode_expr.  Encode the INTEGER_CST
Index: gcc/match.pd
===
--- gcc/match.pd    (revision 257047)
+++ gcc/match.pd    (working copy)
@@ -1939,6 +1939,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  (minus (convert (view_convert:stype @1))
(convert (view_convert:stype @2)))
 
+/* (A * C) +- (B * C) -> (A+-B) * C and (A * C) +- A -> A * (C+-1).
+Modeled after fold_plusminus_mult_expr.  */
+(if (!TYPE_SATURATING (type)
+ && (!FLOAT_TYPE_P (type) || flag_associative_math))
+ (for plusminus (plus minus)
+  (simplify
+   (plusminus (mult:cs @0 @1) (mult:cs @0 @2))
+   (if (!ANY_INTEGRAL_TYPE_P (type)
+|| TYPE_OVERFLOW_WRAPS (type)
+|| (INTEGRAL_TYPE_P (type)
+   && tree_expr_nonzero_p (@0)
+   && expr_not_equal_to (@0, wi::minus_one (TYPE_PRECISION (type)
+(mult (plusminus @1 @2) @0)))
+  /* We cannot generate constant 1 for fract.  */
+  (if (!ALL_FRACT_MODE_P (TYPE_MODE (type)))
+   (simplify
+(plusminus @0 (mult:cs @0 @2))
+(if (!ANY_INTEGRAL_TYPE_P (type)
+|| TYPE_OVERFLOW_WRAPS (type)
+|| (INTEGRAL_TYPE_P (type)
+&& tree_expr_nonzero_p (@0)
+&& expr_not_equal_to (@0, wi::minus_one (TYPE_PRECISION (type)
+ (mult (plusminus { build_one_cst (type); } @2) @0)))
+   (simplify
+(plusminus (mult:cs @0 @2) @0)
+(if (!ANY_INTEGRAL_TYPE_P (type)
+|| TYPE_OVERFLOW_WRAPS (type)
+|| (INTEGRAL_TYPE_P (type)
+&& tree_expr_nonzero_p (@0)
+&& expr_not_equal_to (@0, wi::minus_one (TYPE_PRECISION (type)
+ (mult (plusminus @2 { build_one_cst (type); }) @0))
 
 /* Simplifications of MIN_EXPR, MAX_EXPR, fmin() and fmax().  */
 
Index: gcc/testsuite/gcc.dg/vect/pr81082.c

Re: [PATCH] Fix PR84003

2018-01-26 Thread Jakub Jelinek
On Thu, Jan 25, 2018 at 12:18:21PM +0100, Richard Biener wrote:
> 2018-01-25  Richard Biener  
> 
>   PR rtl-optimization/84003
>   * dse.c (dse_step1): When removing redundant stores make sure
>   to adjust the earlier stores alias-set to match semantics of
>   the removed one and its own.
>   (dse_step6): Likewise.
> 
>   * g++.dg/torture/pr77745.C: Mark foo noinline to trigger
>   latent bug in DSE.
> 
> Index: gcc/dse.c
> ===
> --- gcc/dse.c (revision 257043)
> +++ gcc/dse.c (working copy)
> @@ -2725,6 +2725,19 @@ dse_step1 (void)
>   "eliminated\n",
>INSN_UID (ptr->insn),
>INSN_UID (s_info->redundant_reason->insn));
> +   /* If the later store we delete could have changed the
> +  dynamic type of the memory make sure the one we
> +  preserve serves as a store for all loads that could
> +  validly have accessed the one we delete.  */
> +   store_info *r_info = s_info->redundant_reason->store_rec;
> +   while (r_info)
> + {
> +   if (r_info->is_set
> +   && (MEM_ALIAS_SET (s_info->mem)
> +   != MEM_ALIAS_SET (r_info->mem)))
> + set_mem_alias_set (r_info->mem, 0);

Is alias set 0 the only easily computable choice if there is a mismatch?
Also, isn't it fine if one of the alias sets is a subset of the other one?

More importantly, I think set_mem_alias_set (r_info->mem, 0) can't work:
r_info->mem is a result of canon_rtx (SET_DEST (body)), so it could sometimes
be the MEM that appears in the instruction, but at other times it could be a
different pointer, and changing that wouldn't change anything in the actual
instruction.  So, in order to do this you'd probably need to add another
field and record the original SET_DEST (body) record_store is called with.
Even that doesn't need to be something that really appears in the
instruction, but the exception in that case is the memset call, and that
semantically has an alias set of 0.

> --- gcc/testsuite/g++.dg/torture/pr77745.C    (revision 257043)
> +++ gcc/testsuite/g++.dg/torture/pr77745.C    (working copy)
> @@ -2,7 +2,7 @@
>  
>  inline void* operator new(__SIZE_TYPE__, void* __p) noexcept { return __p; }
>  
> -long foo(char *c1, char *c2)
> +long __attribute__((noinline)) foo(char *c1, char *c2)
>  {
>long *p1 = new (c1) long;
>*p1 = 100;

Is it desirable to modify an existing test, rather than say macroize this
and add pr77745-2.C that will define some macro and include this pr77745.C,
such that we cover both the noinline and the non-noinline variants?

Jakub


Re: [PATCH] Fix gcc.target/aarch64/sve/peel_ind_1.c for -mcmodel=tiny

2018-01-26 Thread Szabolcs Nagy

On 24/01/18 20:10, Richard Sandiford wrote:

Szabolcs Nagy  writes:

Fix test failures with -mcmodel=tiny when adr is generated instead of adrp.

FAIL: gcc.target/aarch64/sve/peel_ind_1.c -march=armv8.2-a+sve
scan-assembler \\tadrp\\tx[0-9]+, x\\n
FAIL: gcc.target/aarch64/sve/peel_ind_2.c -march=armv8.2-a+sve
scan-assembler \\tadrp\\tx[0-9]+, x\\n
FAIL: gcc.target/aarch64/sve/peel_ind_3.c -march=armv8.2-a+sve
scan-assembler \\tadrp\\tx[0-9]+, x\\n

gcc/testsuite/ChangeLog:

2018-01-24  Szabolcs Nagy  

  * gcc.target/aarch64/sve/peel_ind_1.c: Match (adrp|adr) in scan-assembler.
  * gcc.target/aarch64/sve/peel_ind_2.c: Likewise.
  * gcc.target/aarch64/sve/peel_ind_3.c: Likewise.


LGTM FWIW.  Thanks for fixing this!



committed as obvious.


Re: [PATCH] Fix PR84003

2018-01-26 Thread Richard Biener
On Thu, 25 Jan 2018, Richard Sandiford wrote:

> Richard Sandiford  writes:
> > Richard Biener  writes:
> >> The following patch fixes PR84003 where RTL DSE removes a redundant
> >> store (a store storing the same value as an earlier store) but in
> >> doing this changing program semantics because the later store changes
> >> the effective type of the memory location.  This in turn allows
> >> sched2 to do an originally invalid transform.
> >>
> >> The fix is to harden DSE so that while removing the later store
> >> the earlier stores alias-sets are changed to reflect both dynamic
> >> types - which means using alias-set zero.
> >>
> >> On GIMPLE we simply avoid removing such type-changing stores because
> >> we cannot easily get hold on the earlier store.
> >>
> >> Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> >>
> >> Disclaimer: this is a "local" fix, I didn't try to understand much
> >> of the DSE data structures but only learned from +-10 lines around
> >> the affected transform which I found by searching for the dumped
> >> messages ...  Richard, you contributed the pass so please review.
> >
> > So the file says.  In reality I only wrote an early version, and what
> > was committed contains very little of that.  So TBH this pass is almost
> > a complete unknown to me.  That said...
> >
> > Might it be worth keeping the store instead?
> 
> ...that is, like gimple. :-)  Sorry, I spent so long thinking about this
> that I forgot you'd said that gimple already does the same thing.

Yeah, and it still runs into PR82224 ... which means it is not 
conservative enough.

> Richard
> 
> > Giving things alias set
> > zero seems like a pretty big hammer and would turn the remaining store
> > into something close to a scheduling barrier.
> >
> > IOW, how about checking alias_set_subset_of in:
> >
> >   /* Even if PTR won't be eliminated as unneeded, if both
> >  PTR and this insn store the same constant value, we might
> >  eliminate this insn instead.  */
> >   if (s_info->const_rhs
> >   && const_rhs
> >   && known_subrange_p (offset, width,
> >s_info->offset, s_info->width)
> >   && all_positions_needed_p (s_info, offset - s_info->offset,
> >  width))
> >
> > instead?

That's what GIMPLE does (use alias_set_subset_of), but it arguably
doesn't work for unions (ok, there even the equality case doesn't 
work...).
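
The union situation looks like this (an assumed illustration, not from the
PR): neither member's alias set is a subset of the other's, yet both stores
legitimately hit the same memory.

    /* Assumed illustration of the union caveat.  */
    union u { int i; float f; };

    void
    g (union u *p)
    {
      p->i = 0;     /* store with int's alias set */
      p->f = 0.0f;  /* same bytes, float's alias set */
    }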

I guess I thought it's worth trying sth new and adjust the earlier
store ;)  Sth that I can't easily do on GIMPLE but everything seemed
to be in place in RTL DSE.

So, when looking at the above code it seems like there are exactly
two insns we look at?  s_info and 'mem' I guess.  And s_info is
the earlier one.

So sth like

Index: gcc/dse.c
===
--- gcc/dse.c   (revision 257077)
+++ gcc/dse.c   (working copy)
@@ -1532,7 +1532,12 @@ record_store (rtx body, bb_info_t bb_inf
  && known_subrange_p (offset, width,
   s_info->offset, s_info->width)
  && all_positions_needed_p (s_info, offset - s_info->offset,
-width))
+width)
+ /* We can only remove the later store if the earlier one aliases
+    at least all accesses the later one does.  */
+ && (MEM_ALIAS_SET (mem) == MEM_ALIAS_SET (s_info->mem)
+ || alias_set_subset_of (MEM_ALIAS_SET (mem),
+ MEM_ALIAS_SET (s_info->mem
{
  if (GET_MODE (mem) == BLKmode)
{

?  I can confirm that this patch works as well.  Is that what we prefer?
It would be consistent with what we do on GIMPLE currently (irrespective
of still existing bugs in this area...).

Note that with the above instead of dse1 removing the later store
dse2 now decides to remove the earlier one...  (which is fine!).

So I'm testing the above now, ok if it succeeds?

Thanks,
Richard.

> > Failing that:
> >
> >> 2018-01-25  Richard Biener  
> >>
> >>PR rtl-optimization/84003
> >>* dse.c (dse_step1): When removing redundant stores make sure
> >>to adjust the earlier stores alias-set to match semantics of
> >>the removed one and its own.
> >>(dse_step6): Likewise.
> >>
> >>* g++.dg/torture/pr77745.C: Mark foo noinline to trigger
> >>latent bug in DSE.
> >>
> >> Index: gcc/dse.c
> >> ===
> >> --- gcc/dse.c  (revision 257043)
> >> +++ gcc/dse.c  (working copy)
> >> @@ -2725,6 +2725,19 @@ dse_step1 (void)
> >>"eliminated\n",
> >> INSN_UID (ptr->insn),
> >> INSN_UID (s_info->redundant_reason->insn));
> >> +/* If the later store we delete could have change

Re: Silence false positive on LTO type merging warning

2018-01-26 Thread Richard Biener
On Thu, 25 Jan 2018, Jan Hubicka wrote:

> Hi,
> the testcase triggers an invalid warning on a type mismatch because an array
> of pointers to a complete type has a different alias set from an array of
> pointers to an incomplete type.  This is valid, because an incomplete pointer
> has the alias set of void_ptr, which aliases all pointers, and arrays alias
> with their members.

But isn't that a problem?  Not if the pointers to incomplete type have
alias set zero, but IIRC they don't, right?

Richard.

> I already silenced the warning for pointers but I forgot about arrays.
> 
> Bootstrapped/regtested x86_64-linux, OK?
>   
>   * gcc.dg/lto/pr83954.h: New testcase.
>   * gcc.dg/lto/pr83954_0.c: New testcase.
>   * gcc.dg/lto/pr83954_1.c: New testcase.
>   * lto-symtab.c (warn_type_compatibility_p): Silence false positive
>   for type match warning on arrays of pointers.
> 
> Index: testsuite/gcc.dg/lto/pr83954.h
> ===
> --- testsuite/gcc.dg/lto/pr83954.h    (revision 0)
> +++ testsuite/gcc.dg/lto/pr83954.h    (working copy)
> @@ -0,0 +1,3 @@
> +struct foo;
> +extern struct foo *FOO_PTR_ARR[1];
> +
> Index: testsuite/gcc.dg/lto/pr83954_0.c
> ===
> --- testsuite/gcc.dg/lto/pr83954_0.c    (revision 0)
> +++ testsuite/gcc.dg/lto/pr83954_0.c    (working copy)
> @@ -0,0 +1,8 @@
> +/* { dg-lto-do link } */
> +#include "pr83954.h"
> +
> +int main() { 
> +  // just to prevent symbol removal
> +  FOO_PTR_ARR[1] = 0;
> +  return 0;
> +}
> Index: testsuite/gcc.dg/lto/pr83954_1.c
> ===
> --- testsuite/gcc.dg/lto/pr83954_1.c    (revision 0)
> +++ testsuite/gcc.dg/lto/pr83954_1.c    (working copy)
> @@ -0,0 +1,7 @@
> +#include "pr83954.h"
> +
> +struct foo {
> + int x;
> +};
> +struct foo *FOO_PTR_ARR[1] = { 0 };
> +
> Index: lto/lto-symtab.c
> ===
> --- lto/lto-symtab.c    (revision 257009)
> +++ lto/lto-symtab.c    (working copy)
> @@ -284,11 +284,22 @@ warn_type_compatibility_p (tree prevaili
>alias_set_type set1 = get_alias_set (type);
>alias_set_type set2 = get_alias_set (prevailing_type);
>  
> -  if (set1 && set2 && set1 != set2 
> -  && (!POINTER_TYPE_P (type) || !POINTER_TYPE_P (prevailing_type)
> +  if (set1 && set2 && set1 != set2)
> + {
> +  tree t1 = type, t2 = prevailing_type;
> +
> +   /* Alias sets of arrays are the same as alias sets of the inner
> +  types.  */
> +   while (TREE_CODE (t1) == ARRAY_TYPE && TREE_CODE (t2) == ARRAY_TYPE)
> + {
> +   t1 = TREE_TYPE (t1);
> +   t2 = TREE_TYPE (t2);
> + }
> +  if ((!POINTER_TYPE_P (t1) || !POINTER_TYPE_P (t2))
> || (set1 != TYPE_ALIAS_SET (ptr_type_node)
> -   && set2 != TYPE_ALIAS_SET (ptr_type_node
> -lev |= 5;
> +   && set2 != TYPE_ALIAS_SET (ptr_type_node)))
> + lev |= 5;
> + }
>  }
>  
>return lev;
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [PATCH] Fix RTL DCE with separate shrink wrapped epilogues (PR rtl-optimization/83985)

2018-01-26 Thread Richard Biener
On Thu, 25 Jan 2018, Segher Boessenkool wrote:

> On Thu, Jan 25, 2018 at 11:20:33PM +0100, Jakub Jelinek wrote:
> > Hi!
> > 
> > The r241060 change added the second hunk in this patch, which the patch is
> > reverting.  The problem is that not deleting some unmarked insns in
> > delete_unmarked_insns is done in the wrong place: it does prevent the
> > instruction we don't want to DCE from being deleted, but the instructions
> > that insn needs (in this case a memory load whose result the
> > REG_CFA_RESTORE instruction uses) are not marked either, and those are
> > deleted.
> > 
> > The following patch fixes it by making such instructions non-deletable,
> > which means they are marked and the DCE algorithm then marks the
> > instructions they need too.
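
A minimal sketch of the marking scheme being described (generic
mark-and-sweep DCE, not GCC's actual dce.c code):

    #include <vector>

    struct Insn {
      std::vector<int> uses;   // insns whose results this insn needs
      bool deletable = true;   // false: must be kept (e.g. carries REG_CFA_RESTORE)
      bool marked = false;
    };

    /* Mark every non-deletable insn and, transitively, everything it
       needs; only unmarked insns may then be removed.  */
    static void
    mark_live (std::vector<Insn> &insns)
    {
      std::vector<int> work;
      for (int i = 0; i < (int) insns.size (); i++)
        if (!insns[i].deletable)
          {
            insns[i].marked = true;
            work.push_back (i);
          }
      while (!work.empty ())
        {
          int i = work.back ();
          work.pop_back ();
          for (int u : insns[i].uses)
            if (!insns[u].marked)
              {
                insns[u].marked = true;
                work.push_back (u);
              }
        }
    }

Special-casing insns only at deletion time, after marking has finished, is
exactly what left their dependencies unmarked.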
> 
> Looks good to me!  Thanks.  And sorry for causing the bug in the first
> place :-/

Ok.

Richard.


Re: [PATCH] Fix -Wrestrict SSA_NAME handling (PR c/83989)

2018-01-26 Thread Richard Biener
On Thu, 25 Jan 2018, Jakub Jelinek wrote:

> Hi!
> 
> The builtin_memref ctor for an SSA_NAME with non-NULL SSA_NAME_VAR returns
> the underlying variable, rather than just the SSA_NAME.
> Later on the code checks if the bases are equal and then compares
> corresponding offsets.
> 
> The fact that two different SSA_NAMEs have the same underlying variable
> says nothing at all about whether they have the same value; as the testcase
> shows, the SSA_NAMEs can be either completely unrelated, or related but with
> different offsets.  The code already has code to handle POINTER_PLUS_EXPR
> with a constant offset, to look through that.  Perhaps in the future there
> can be other cases we'd special-case, but generally we should compare the
> SSA_NAMEs themselves as bases; whether they have the same underlying
> variable says nothing about them.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Richard.

> 2018-01-25  Jakub Jelinek  
> 
>   PR c/83989
>   * gimple-ssa-warn-restrict.c (builtin_memref::builtin_memref): Don't
>   use SSA_NAME_VAR as base for SSA_NAMEs with non-NULL SSA_NAME_VAR.
> 
>   * c-c++-common/Wrestrict-3.c: New test.
> 
> --- gcc/gimple-ssa-warn-restrict.c.jj 2018-01-17 11:54:17.0 +0100
> +++ gcc/gimple-ssa-warn-restrict.c2018-01-25 14:10:26.182498552 +0100
> @@ -373,9 +373,6 @@ builtin_memref::builtin_memref (tree exp
> offrange[1] += off;
>   }
>   }
> -
> - if (TREE_CODE (base) == SSA_NAME && SSA_NAME_VAR (base))
> -   base = SSA_NAME_VAR (base);
>}
>  
>if (size)
> --- gcc/testsuite/c-c++-common/Wrestrict-3.c.jj   2018-01-25 
> 14:16:01.574563425 +0100
> +++ gcc/testsuite/c-c++-common/Wrestrict-3.c  2018-01-25 14:14:39.273547506 
> +0100
> @@ -0,0 +1,48 @@
> +/* PR c/83989 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -Wrestrict" } */
> +
> +__attribute__((__malloc__)) extern void *my_malloc (__SIZE_TYPE__);
> +void baz (void *);
> +
> +#define SIZE 32
> +
> +void
> +foo (void)
> +{
> +  void *recmem = __builtin_malloc (SIZE);
> +  baz (recmem);
> +  while (1)
> +{
> +  void *oldrecmem = recmem;
> +  recmem = __builtin_malloc (SIZE);
> +  if (!recmem)
> + {
> +   __builtin_free (oldrecmem);
> +   return;
> + }
> +  __builtin_memcpy (recmem, oldrecmem, SIZE);/* { dg-bogus 
> "accessing" } */
> +  baz (recmem);
> +  __builtin_free (oldrecmem);
> +}
> +}
> +
> +void
> +bar (void)
> +{
> +  void *recmem = my_malloc (SIZE);
> +  baz (recmem);
> +  while (1)
> +{
> +  void *oldrecmem = recmem;
> +  recmem = my_malloc (SIZE);
> +  if (!recmem)
> + {
> +   __builtin_free (oldrecmem);
> +   return;
> + }
> +  __builtin_memcpy (recmem, oldrecmem, SIZE);/* { dg-bogus 
> "accessing" } */
> +  baz (recmem);
> +  __builtin_free (oldrecmem);
> +}
> +}
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [PATCH] Fix PR84003

2018-01-26 Thread Richard Biener
On Fri, 26 Jan 2018, Jakub Jelinek wrote:

> On Thu, Jan 25, 2018 at 12:18:21PM +0100, Richard Biener wrote:
> > 2018-01-25  Richard Biener  
> > 
> > PR rtl-optimization/84003
> > * dse.c (dse_step1): When removing redundant stores make sure
> > to adjust the earlier stores alias-set to match semantics of
> > the removed one and its own.
> > (dse_step6): Likewise.
> > 
> > * g++.dg/torture/pr77745.C: Mark foo noinline to trigger
> > latent bug in DSE.
> > 
> > Index: gcc/dse.c
> > ===
> > --- gcc/dse.c   (revision 257043)
> > +++ gcc/dse.c   (working copy)
> > @@ -2725,6 +2725,19 @@ dse_step1 (void)
> > "eliminated\n",
> >  INSN_UID (ptr->insn),
> >  INSN_UID (s_info->redundant_reason->insn));
> > + /* If the later store we delete could have changed the
> > +dynamic type of the memory make sure the one we
> > +preserve serves as a store for all loads that could
> > +validly have accessed the one we delete.  */
> > + store_info *r_info = s_info->redundant_reason->store_rec;
> > + while (r_info)
> > +   {
> > + if (r_info->is_set
> > + && (MEM_ALIAS_SET (s_info->mem)
> > + != MEM_ALIAS_SET (r_info->mem)))
> > +   set_mem_alias_set (r_info->mem, 0);
> 
> Is alias set 0 the only easily computable choice if there is a mismatch?
> Also, isn't it fine if one of the alias sets is a subset of the other one?
> 
> More importantly, I think set_mem_alias_set (r_info->mem, 0) can't work:
> r_info->mem is a result of canon_rtx (SET_DEST (body)), so it could sometimes
> be the MEM that appears in the instruction, but at other times it could be a
> different pointer, and changing that wouldn't change anything in the actual
> instruction.  So, in order to do this you'd probably need to add another
> field and record the original SET_DEST (body) record_store is called with.

Uh, indeed.  See my mail in response to Richard which comes up with
an alternate patch avoiding this issue.

> Even that doesn't need to be something that really appears in the
> instruction, but the exception in that case is the memset call, and that
> semantically has an alias set of 0.
>
> > --- gcc/testsuite/g++.dg/torture/pr77745.C  (revision 257043)
> > +++ gcc/testsuite/g++.dg/torture/pr77745.C  (working copy)
> > @@ -2,7 +2,7 @@
> >  
> >  inline void* operator new(__SIZE_TYPE__, void* __p) noexcept { return __p; 
> > }
> >  
> > -long foo(char *c1, char *c2)
> > +long __attribute__((noinline)) foo(char *c1, char *c2)
> >  {
> >long *p1 = new (c1) long;
> >*p1 = 100;
> 
> Is it desirable to modify an existing test, rather than say macroize this
> and add pr77745-2.C that will define some macro and include this pr77745.C,
> such that we cover both the noinline and the non-noinline variants?

Yeah, I'll do that.

Richard.


Re: [PATCH] Fix PR84003

2018-01-26 Thread Richard Sandiford
Richard Biener  writes:
> On Thu, 25 Jan 2018, Richard Sandiford wrote:
>
>> Richard Sandiford  writes:
>> > Richard Biener  writes:
>> >> The following patch fixes PR84003 where RTL DSE removes a redundant
>> >> store (a store storing the same value as an earlier store) but in
>> >> doing this changing program semantics because the later store changes
>> >> the effective type of the memory location.  This in turn allows
>> >> sched2 to do an originally invalid transform.
>> >>
>> >> The fix is to harden DSE so that while removing the later store
>> >> the earlier stores alias-sets are changed to reflect both dynamic
>> >> types - which means using alias-set zero.
>> >>
>> >> On GIMPLE we simply avoid removing such type-changing stores because
>> >> we cannot easily get hold on the earlier store.
>> >>
>> >> Bootstrap and regtest running on x86_64-unknown-linux-gnu.
>> >>
>> >> Disclaimer: this is a "local" fix, I didn't try to understand much
>> >> of the DSE data structures but only learned from +-10 lines around
>> >> the affected transform which I found by searching for the dumped
>> >> messages ...  Richard, you contributed the pass so please review.
>> >
>> > So the file says.  In reality I only wrote an early version, and what
>> > was committed contains very little of that.  So TBH this pass is almost
>> > a complete unknown to me.  That said...
>> >
>> > Might it be worth keeping the store instead?
>> 
>> ...that is, like gimple. :-)  Sorry, I spent so long thinking about this
>> that I forgot you'd said that gimple already does the same thing.
>
> Yeah, and it still runs into PR82224 ... which means it is not 
> conservative enough.
>
>> Richard
>> 
>> > Giving things alias set
>> > zero seems like a pretty big hammer and would turn the remaining store
>> > into something close to a scheduling barrier.
>> >
>> > IOW, how about checking alias_set_subset_of in:
>> >
>> >  /* Even if PTR won't be eliminated as unneeded, if both
>> > PTR and this insn store the same constant value, we might
>> > eliminate this insn instead.  */
>> >  if (s_info->const_rhs
>> >  && const_rhs
>> >  && known_subrange_p (offset, width,
>> >   s_info->offset, s_info->width)
>> >  && all_positions_needed_p (s_info, offset - s_info->offset,
>> > width))
>> >
>> > instead?
>
> That's what GIMPLE does (use alias_set_subset_of), but it arguably
> doesn't work for unions (ok, there even the equality case doesn't 
> work...).
>
> I guess I thought it's worth trying sth new and adjust the earlier
> store ;)  Sth that I can't easily do on GIMPLE but everything seemed
> to be in place in RTL DSE.
>
> So, when looking at the above code it seems like there are exactly
> two insns we look at?  s_info and 'mem' I guess.  And s_info is
> the earlier one.

Yeah.

> So sth like
>
> Index: gcc/dse.c
> ===
> --- gcc/dse.c   (revision 257077)
> +++ gcc/dse.c   (working copy)
> @@ -1532,7 +1532,12 @@ record_store (rtx body, bb_info_t bb_inf
>   && known_subrange_p (offset, width,
>s_info->offset, s_info->width)
>   && all_positions_needed_p (s_info, offset - s_info->offset,
> -width))
> +width)
> + /* We can only remove the later store if the earlier one aliases
> +    at least all accesses the later one does.  */
> + && (MEM_ALIAS_SET (mem) == MEM_ALIAS_SET (s_info->mem)
> + || alias_set_subset_of (MEM_ALIAS_SET (mem),
> + MEM_ALIAS_SET (s_info->mem
> {
>   if (GET_MODE (mem) == BLKmode)
> {
>
> ?  I can confirm that this patch works as well.  Is that what we prefer?

Not sure I can call that one, but it seems safer (especially for
backports).

> It would be consistent with what we do on GIMPLE currently (irrespective
> of still existing bugs in this area...).
>
> Note that with the above instead of dse1 removing the later store
> dse2 now decides to remove the earlier one...  (which is fine!).
>
> So I'm testing the above now, ok if it succeeds?

LGTM (but I can't approve it).

I was wondering later... if the problem is that we have:

   I1: store X to Y with alias set S1
   I2: load from Y with alias set S1
   I3: store X to Y with alias set S2
   I4: load from Y with alias set S2

and removing I3 allows I4 to be scheduled before I1, what would happen
if I1 and I3 store different values (with no dse)?  Would something stop
the scheduler from ordering it as:

   I1 I3 I2 I4

where I2 would load the wrong value?  If so, why didn't that same thing
work here?
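
(At the source level, the I1..I4 shape corresponds to a placement-new
sequence roughly like the following - a sketch under assumed types, not the
pr77745.C testcase, and assuming long and long long have the same size so
the two stores write the same bits:)

    #include <new>

    long
    f (char *buf)
    {
      long *p1 = new (buf) long;            // I1: store via long's alias set
      *p1 = 100;
      long a = *p1;                         // I2: load, long's alias set
      long long *p2 = new (buf) long long;  // changes the dynamic type
      *p2 = 100;                            // I3: same value, different alias set
      long long b = *p2;                    // I4: load, long long's alias set
      return a + (long) b;
    }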

Thanks,
Richard


RE: [PATCH] RL78 addsi3 improvement

2018-01-26 Thread Sebastian Perta
HI DJ,

Thank you!

>> I wonder if these types of optimizations should be added to the
>> assembler too?
Thank you for the suggestion, I will take a look into it.

Best Regards,
Sebastian


> -Original Message-
> From: DJ Delorie [mailto:d...@redhat.com]
> Sent: 25 January 2018 19:38
> To: Sebastian Perta 
> Cc: gcc-patches@gcc.gnu.org
> Subject: Re: [PATCH] RL78 addsi3 improvement
> 
> 
> This is OK.
> 
> I wonder if these types of optimizations should be added to the
> assembler too?  At least, if relaxation is enabled...



[wwwdocs] GCC-7.3 changes: Add note about LEON3FT erratum workaround

2018-01-26 Thread Daniel Hellstrom

Hi,

I would like to commit the following comment about LEON3-FT errata now 
available in GCC-7.3.


Thanks,
Daniel


Index: htdocs/gcc-7/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-7/changes.html,v
retrieving revision 1.98
diff -u -r1.98 changes.html
--- htdocs/gcc-7/changes.html    23 Jan 2018 06:41:52 -    1.98
+++ htdocs/gcc-7/changes.html    26 Jan 2018 12:05:33 -
@@ -1327,5 +1327,15 @@
  Support has been added for Epiphany target.
    

+Target Specific Changes
+
+SPARC
+  
+    Work arounds for the four LEON3FT errata
+    GRLIB-TN-0010..0013 have been added. Relevant errata are activated
+    by the target specific -mfix-ut699,
+    -mfix-ut700 and -mfix-gr712rc switches.
+  
+
 
 


Re: [PATCH] Fix PR84003

2018-01-26 Thread Richard Biener
On Fri, 26 Jan 2018, Richard Sandiford wrote:

> Richard Biener  writes:
> > On Thu, 25 Jan 2018, Richard Sandiford wrote:
> >
> >> Richard Sandiford  writes:
> >> > Richard Biener  writes:
> >> >> The following patch fixes PR84003 where RTL DSE removes a redundant
> >> >> store (a store storing the same value as an earlier store) but in
> >> >> doing this changing program semantics because the later store changes
> >> >> the effective type of the memory location.  This in turn allows
> >> >> sched2 to do an originally invalid transform.
> >> >>
> >> >> The fix is to harden DSE so that while removing the later store
> >> >> the earlier stores alias-sets are changed to reflect both dynamic
> >> >> types - which means using alias-set zero.
> >> >>
> >> >> On GIMPLE we simply avoid removing such type-changing stores because
> >> >> we cannot easily get hold on the earlier store.
> >> >>
> >> >> Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> >> >>
> >> >> Disclaimer: this is a "local" fix, I didn't try to understand much
> >> >> of the DSE data structures but only learned from +-10 lines around
> >> >> the affected transform which I found by searching for the dumped
> >> >> messages ...  Richard, you contributed the pass so please review.
> >> >
> >> > So the file says.  In reality I only wrote an early version, and what
> >> > was committed contains very little of that.  So TBH this pass is almost
> >> > a complete unknown to me.  That said...
> >> >
> >> > Might it be worth keeping the store instead?
> >> 
> >> ...that is, like gimple. :-)  Sorry, I spent so long thinking about this
> >> that I forgot you'd said that gimple already does the same thing.
> >
> > Yeah, and it still runs into PR82224 ... which means it is not 
> > conservative enough.
> >
> >> Richard
> >> 
> >> > Giving things alias set
> >> > zero seems like a pretty big hammer and would turn the remaining store
> >> > into something close to a scheduling barrier.
> >> >
> >> > IOW, how about checking alias_set_subset_of in:
> >> >
> >> >/* Even if PTR won't be eliminated as unneeded, if both
> >> >   PTR and this insn store the same constant value, we might
> >> >   eliminate this insn instead.  */
> >> >if (s_info->const_rhs
> >> >&& const_rhs
> >> >&& known_subrange_p (offset, width,
> >> > s_info->offset, s_info->width)
> >> >&& all_positions_needed_p (s_info, offset - s_info->offset,
> >> >   width))
> >> >
> >> > instead?
> >
> > That's what GIMPLE does (use alias_set_subset_of), but it arguably
> > doesn't work for unions (ok, there even the equality case doesn't 
> > work...).
> >
> > I guess I thought it's worth trying sth new and adjust the earlier
> > store ;)  Sth that I can't easily do on GIMPLE but everything seemed
> > to be in place in RTL DSE.
> >
> > So, when looking at the above code it seems like there are exactly
> > two insns we look at?  s_info and 'mem' I guess.  And s_info is
> > the earlier one.
> 
> Yeah.
> 
> > So sth like
> >
> > Index: gcc/dse.c
> > ===
> > --- gcc/dse.c   (revision 257077)
> > +++ gcc/dse.c   (working copy)
> > @@ -1532,7 +1532,12 @@ record_store (rtx body, bb_info_t bb_inf
> >   && known_subrange_p (offset, width,
> >s_info->offset, s_info->width)
> >   && all_positions_needed_p (s_info, offset - s_info->offset,
> > -width))
> > +width)
> > + /* We can only remove the later store if the earlier one aliases
> > +    at least all accesses the later one does.  */
> > + && (MEM_ALIAS_SET (mem) == MEM_ALIAS_SET (s_info->mem)
> > + || alias_set_subset_of (MEM_ALIAS_SET (mem),
> > + MEM_ALIAS_SET (s_info->mem
> > {
> >   if (GET_MODE (mem) == BLKmode)
> > {
> >
> > ?  I can confirm that this patch works as well.  Is that what we prefer?
> 
> Not sure I can call that one, but it seems safer (especially for
> backports).
> 
> > It would be consistent with what we do on GIMPLE currently (irrespective
> > of still existing bugs in this area...).
> >
> > Note that with the above instead of dse1 removing the later store
> > dse2 now decides to remove the earlier one...  (which is fine!).
> >
> > So I'm testing the above now, ok if it succeeds?
> 
> LGTM (but I can't approve it).

I can self-approve ;)

> I was wondering later... if the problem is that we have:
> 
>I1: store X to Y with alias set S1
>I2: load from Y with alias set S1
>I3: store X to Y with alias set S2
>I4: load from Y with alias set S2
> 
> and removing I3 allows I4 to be scheduled before I1, what would happen
> if I1 and I3 store different values (with no dse)?  Would something stop
> the

Re: [wwwdocs] GCC-7.3 changes: Add note about LEON3FT erratum workaround

2018-01-26 Thread Daniel Hellstrom

Sebastian, thanks for the feedback! I have updated the patch accordingly:

Index: htdocs/gcc-7/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-7/changes.html,v
retrieving revision 1.98
diff -u -r1.98 changes.html
--- htdocs/gcc-7/changes.html    23 Jan 2018 06:41:52 -    1.98
+++ htdocs/gcc-7/changes.html    26 Jan 2018 12:24:49 -
@@ -1320,6 +1320,16 @@
 are not listed here).

 

+Target Specific Changes
+
+SPARC
+  
+    Work arounds for the four href="http://www.gaisler.com/index.php/information/app-tech-notes";>
+    LEON3FT errata GRLIB-TN-0010..0013 have been added. 
Relevant errata

+    are activated by the target specific -mfix-ut699,
+    -mfix-ut700 and -mfix-gr712rc 
switches.

+  
+
 Operating Systems

 RTEMS


On 2018-01-26 13:09, Daniel Hellstrom wrote:

Hi,

I would like to commit the following comment about LEON3-FT errata now 
available in GCC-7.3.


Thanks,
Daniel


Index: htdocs/gcc-7/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-7/changes.html,v
retrieving revision 1.98
diff -u -r1.98 changes.html
--- htdocs/gcc-7/changes.html    23 Jan 2018 06:41:52 - 1.98
+++ htdocs/gcc-7/changes.html    26 Jan 2018 12:05:33 -
@@ -1327,5 +1327,15 @@
  Support has been added for Epiphany target.
    

+Target Specific Changes
+
+SPARC
+  
+    Work arounds for the four LEON3FT errata
+    GRLIB-TN-0010..0013 have been added. Relevant errata are 
activated

+    by the target specific -mfix-ut699,
+    -mfix-ut700 and -mfix-gr712rc 
switches.

+  
+
 
 




Re: Silence false positive on LTO type merging warning

2018-01-26 Thread Jan Hubicka
> On Thu, 25 Jan 2018, Jan Hubicka wrote:
> 
> > Hi,
> > the testcase triggers an invalid warning on a type mismatch because an
> > array of pointers to a complete type has a different alias set from an
> > array of pointers to an incomplete type.  This is valid, because an
> > incomplete pointer has the alias set of void_ptr, which aliases all
> > pointers, and arrays alias with their members.
> 
> But isn't that a problem?  Not if the pointers to incomplete type have
> alias set zero, but IIRC they don't, right?

The alias set of pointers to an incomplete type is the same as the alias set
of ptr_type_node (1 in this case), and pointers to complete types have a
different alias set, but that set has the is_pointer flag.  When checking for
a conflict we make 1 alias with everything that has the is_pointer flag in
it, so it will return true for these two array types.

Honza
> 
> Richard.
> 
> > I already silenced the warning for pointers but I forgot about arrays.
> > 
> > Bootstrapped/regtested x86_64-linux, OK?
> > 
> > * gcc.dg/lto/pr83954.h: New testcase.
> > * gcc.dg/lto/pr83954_0.c: New testcase.
> > * gcc.dg/lto/pr83954_1.c: New testcase.
> > * lto-symtab.c (warn_type_compatibility_p): Silence false positive
> > for type match warning on arrays of pointers.
> > 
> > Index: testsuite/gcc.dg/lto/pr83954.h
> > ===
> > --- testsuite/gcc.dg/lto/pr83954.h  (revision 0)
> > +++ testsuite/gcc.dg/lto/pr83954.h  (working copy)
> > @@ -0,0 +1,3 @@
> > +struct foo;
> > +extern struct foo *FOO_PTR_ARR[1];
> > +
> > Index: testsuite/gcc.dg/lto/pr83954_0.c
> > ===
> > --- testsuite/gcc.dg/lto/pr83954_0.c(revision 0)
> > +++ testsuite/gcc.dg/lto/pr83954_0.c(working copy)
> > @@ -0,0 +1,8 @@
> > +/* { dg-lto-do link } */
> > +#include "pr83954.h"
> > +
> > +int main() { 
> > +  // just to prevent symbol removal
> > +  FOO_PTR_ARR[1] = 0;
> > +  return 0;
> > +}
> > Index: testsuite/gcc.dg/lto/pr83954_1.c
> > ===
> > --- testsuite/gcc.dg/lto/pr83954_1.c(revision 0)
> > +++ testsuite/gcc.dg/lto/pr83954_1.c(working copy)
> > @@ -0,0 +1,7 @@
> > +#include "pr83954.h"
> > +
> > +struct foo {
> > + int x;
> > +};
> > +struct foo *FOO_PTR_ARR[1] = { 0 };
> > +
> > Index: lto/lto-symtab.c
> > ===
> > --- lto/lto-symtab.c(revision 257009)
> > +++ lto/lto-symtab.c(working copy)
> > @@ -284,11 +284,22 @@ warn_type_compatibility_p (tree prevaili
> >alias_set_type set1 = get_alias_set (type);
> >alias_set_type set2 = get_alias_set (prevailing_type);
> >  
> > -  if (set1 && set2 && set1 != set2 
> > -  && (!POINTER_TYPE_P (type) || !POINTER_TYPE_P (prevailing_type)
> > +  if (set1 && set2 && set1 != set2)
> > +   {
> > +  tree t1 = type, t2 = prevailing_type;
> > +
> > + /* Alias sets of arrays are the same as alias sets of the inner
> > +types.  */
> > + while (TREE_CODE (t1) == ARRAY_TYPE && TREE_CODE (t2) == ARRAY_TYPE)
> > +   {
> > + t1 = TREE_TYPE (t1);
> > + t2 = TREE_TYPE (t2);
> > +   }
> > +  if ((!POINTER_TYPE_P (t1) || !POINTER_TYPE_P (t2))
> >   || (set1 != TYPE_ALIAS_SET (ptr_type_node)
> > - && set2 != TYPE_ALIAS_SET (ptr_type_node
> > -lev |= 5;
> > + && set2 != TYPE_ALIAS_SET (ptr_type_node)))
> > + lev |= 5;
> > +   }
> >  }
> >  
> >return lev;
> > 
> > 
> 
> -- 
> Richard Biener 
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
> 21284 (AG Nuernberg)


Re: C++ PATCH to fix ICE with vector expr folding (PR c++/83659)

2018-01-26 Thread Richard Biener
On Thu, Jan 25, 2018 at 3:45 PM, Jakub Jelinek  wrote:
> On Fri, Jan 05, 2018 at 09:52:36AM +0100, Richard Biener wrote:
>> On Wed, Jan 3, 2018 at 5:31 PM, Marek Polacek  wrote:
>> > Here we are crashing because cxx_fold_indirect_ref got a POINTER_PLUS_EXPR
> > with an offset that doesn't fit in a signed HOST_WIDE_INT and we tried to
> > convert it to sHWI.
>> >
>> > The matching code in fold_indirect_ref_1 uses uHWIs so I've followed suit.
>> > But that code now also uses poly_uint64 and I'm not sure if any of the
>> > constexpr.c code should use it, too.  But this patch fixes the ICE.
>>
>> POINTER_PLUS_EXPR offsets are to be interpreted as signed (ptrdiff_t)
>> so using uhwi and then performing an unsigned division is wrong code.
>> See mem_ref_offset how to deal with this (ugh - poly-ints...).  Basically
>> you have to force the thing to signed.  Like just use
>>
>>   HOST_WIDE_INT offset = TREE_INT_CST_LOW (op01);
>
> Does it really matter here though?  Any negative offsets there are UB, we
> should punt on them rather than try to optimize them.
> As we know op01 is unsigned, if we check that it fits into shwi, it means
> it will be 0 to shwi max and then we can handle it in uhwi too.

Ah, of course.  Didn't look up enough context to see what this code
does in the end ;)

>   /* ((foo*)&vectorfoo)[1] => BIT_FIELD_REF */
>   if (VECTOR_TYPE_P (op00type)
>   && (same_type_ignoring_top_level_qualifiers_p
> -(type, TREE_TYPE (op00type
> +(type, TREE_TYPE (op00type)))
> + && tree_fits_shwi_p (op01))

nevertheless this apparent "mismatch" deserves a comment (checking
shwi and using uhwi).

> {
> - HOST_WIDE_INT offset = tree_to_shwi (op01);
> + unsigned HOST_WIDE_INT offset = tree_to_uhwi (op01);
>   tree part_width = TYPE_SIZE (type);
> - unsigned HOST_WIDE_INT part_widthi = tree_to_shwi 
> (part_width)/BITS_PER_UNIT;
> + unsigned HOST_WIDE_INT part_widthi
> +   = tree_to_uhwi (part_width) / BITS_PER_UNIT;
>   unsigned HOST_WIDE_INT indexi = offset * BITS_PER_UNIT;
>   tree index = bitsize_int (indexi);
>
>   if (known_lt (offset / part_widthi,
> TYPE_VECTOR_SUBPARTS (op00type)))
> return fold_build3_loc (loc,
> BIT_FIELD_REF, type, op00,
> part_width, index);
>
> }
>
> Jakub
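
(The shwi-check/uhwi-read pairing above is safe for the reason Jakub gives.
A standalone sketch, with illustrative types and constants assumed:)

    #include <assert.h>
    #include <limits.h>

    int
    main (void)
    {
      /* Stands in for the unsigned tree constant op01: if an unsigned
         value also fits in the signed type (the tree_fits_shwi_p check),
         it lies in [0, LLONG_MAX], so reading it as unsigned (the
         tree_to_uhwi analogue) yields the same value.  */
      unsigned long long op01 = 42;
      if (op01 <= (unsigned long long) LLONG_MAX)
        assert ((unsigned long long) (long long) op01 == op01);
      return 0;
    }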


[PATCH] Fix lambdas in template default argument of inherited ctor. (backport for PR 81860)

2018-01-26 Thread Jonathan Wakely

This is a backport of r251426 which incidentally fixed PR 81860 and
its dup. The bug was closed as fixed, but as a regression it should be
fixed for 7.x too.

The patch applied cleanly to the branch except for a minor conflict in
get_defaulted_eh_spec because r250994 isn't on the branch.

Tested x86_64-linux, OK for gcc-7-branch?


Backport from mainline
2017-08-29  Jason Merrill  

* method.c (synthesized_method_base_walk): Replace an inherited
template with its specialization.
(synthesized_method_walk): Make inheriting_ctor a pointer.
(maybe_explain_implicit_delete, explain_implicit_non_constexpr)
(deduce_inheriting_ctor, implicitly_declare_fn): Adjust.

Backport from mainline
2018-01-02  Marek Polacek  

PR c++/81860
* g++.dg/cpp0x/inh-ctor30.C: New test.


commit 89229ea7637591687eec623839dc2a3b93c2c99d
Author: Jonathan Wakely 
Date:   Fri Jan 26 12:24:24 2018 +

Fix lambdas in template default argument of inherited ctor.

Backport from mainline
2017-08-29  Jason Merrill  

* method.c (synthesized_method_base_walk): Replace an inherited
template with its specialization.
(synthesized_method_walk): Make inheriting_ctor a pointer.
(maybe_explain_implicit_delete, explain_implicit_non_constexpr)
(deduce_inheriting_ctor, implicitly_declare_fn): Adjust.

Backport from mainline
2018-01-02  Marek Polacek  

PR c++/81860
* g++.dg/cpp0x/inh-ctor30.C: New test.

diff --git a/gcc/cp/method.c b/gcc/cp/method.c
index c7b67589924..59ad43f73fe 100644
--- a/gcc/cp/method.c
+++ b/gcc/cp/method.c
@@ -1430,7 +1430,7 @@ static tree
 synthesized_method_base_walk (tree binfo, tree base_binfo, 
  int quals, bool copy_arg_p,
  bool move_p, bool ctor_p,
- tree inheriting_ctor, tree inherited_parms,
+ tree *inheriting_ctor, tree inherited_parms,
  tree fnname, int flags, bool diag,
  tree *spec_p, bool *trivial_p,
  bool *deleted_p, bool *constexpr_p)
@@ -1441,8 +1441,9 @@ synthesized_method_base_walk (tree binfo, tree base_binfo,
 
   if (copy_arg_p)
 argtype = build_stub_type (BINFO_TYPE (base_binfo), quals, move_p);
-  else if ((inherited_binfo
-   = binfo_inherited_from (binfo, base_binfo, inheriting_ctor)))
+  else if (inheriting_ctor
+  && (inherited_binfo
+  = binfo_inherited_from (binfo, base_binfo, *inheriting_ctor)))
 {
   argtype = inherited_parms;
   /* Don't check access on the inherited constructor.  */
@@ -1464,6 +1465,12 @@ synthesized_method_base_walk (tree binfo, tree 
base_binfo,
   if (defer != dk_no_deferred)
 pop_deferring_access_checks ();
 
+  /* Replace an inherited template with the appropriate specialization.  */
+  if (inherited_binfo && rval
+  && DECL_P (*inheriting_ctor) && DECL_P (rval)
+  && DECL_CONTEXT (*inheriting_ctor) == DECL_CONTEXT (rval))
+*inheriting_ctor = DECL_CLONED_FUNCTION (rval);
+
   process_subob_fn (rval, spec_p, trivial_p, deleted_p,
constexpr_p, diag, BINFO_TYPE (base_binfo));
   if (ctor_p &&
@@ -1498,7 +1505,7 @@ static void
 synthesized_method_walk (tree ctype, special_function_kind sfk, bool const_p,
 tree *spec_p, bool *trivial_p, bool *deleted_p,
 bool *constexpr_p, bool diag,
-tree inheriting_ctor, tree inherited_parms)
+tree *inheriting_ctor, tree inherited_parms)
 {
   tree binfo, base_binfo, fnname;
   int i;
@@ -1553,7 +1560,7 @@ synthesized_method_walk (tree ctype, 
special_function_kind sfk, bool const_p,
 }
 
   gcc_assert ((sfk == sfk_inheriting_constructor)
- == (inheriting_ctor != NULL_TREE));
+ == (inheriting_ctor && *inheriting_ctor != NULL_TREE));
 
   /* If that user-written default constructor would satisfy the
  requirements of a constexpr constructor (7.1.5), the
@@ -1628,7 +1635,7 @@ synthesized_method_walk (tree ctype, 
special_function_kind sfk, bool const_p,
   tree scope = push_scope (ctype);
 
   int flags = LOOKUP_NORMAL | LOOKUP_SPECULATIVE;
-  if (!inheriting_ctor)
+  if (sfk != sfk_inheriting_constructor)
 flags |= LOOKUP_DEFAULTED;
 
   tsubst_flags_t complain = diag ? tf_warning_or_error : tf_none;
@@ -1731,9 +1738,9 @@ get_defaulted_eh_spec (tree decl)
   tree parm_type = TREE_VALUE (parms);
   bool const_p = CP_TYPE_CONST_P (non_reference (parm_type));
   tree spec = empty_except_spec;
+  tree inh = DECL_INHERITED_CTOR (decl);
   synthesized_method_walk (ctype, sfk, const_p, &spec, NULL, NULL,
-  NULL, false, DECL_INHERITED_CTOR (decl),
-  parms);
+  NULL, false, &inh, parms);
   return spec;

Fix LRA subreg calculation for big-endian targets

2018-01-26 Thread Richard Sandiford
LRA was using a subreg offset of 0 whenever constraints matched
two operands with different modes.  That leads to an invalid offset
(and ICE) on big-endian targets if one of the modes is narrower
than a word.  E.g. if a (reg:SI X) is matched to a (reg:QI Y),
the big-endian subreg should be (subreg:QI (reg:SI X) 3) rather
than (subreg:QI (reg:SI X) 0).
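
To make the offset calculation concrete, here is a minimal standalone sketch
of how the lowpart byte offset falls out for scalar modes.  It is an
illustration only, not GCC's subreg_lowpart_offset (which also handles
vector and variable-length modes):

#include <stdio.h>

/* Byte offset of the low part of an INNER_BYTES-wide value when it is
   accessed in an OUTER_BYTES-wide mode.  Little-endian keeps the low part
   at byte 0; big-endian keeps it at the end.  (Sketch only: real targets
   can also have separate word and byte endianness.)  */
static unsigned int
lowpart_offset (unsigned int outer_bytes, unsigned int inner_bytes,
                int bytes_big_endian)
{
  if (outer_bytes >= inner_bytes)
    return 0;  /* Same-size or paradoxical subreg.  */
  return bytes_big_endian ? inner_bytes - outer_bytes : 0;
}

int
main (void)
{
  /* (subreg:QI (reg:SI X) ?): QImode is 1 byte, SImode is 4 bytes.  */
  printf ("little-endian offset: %u\n", lowpart_offset (1, 4, 0)); /* 0 */
  printf ("big-endian offset: %u\n", lowpart_offset (1, 4, 1));    /* 3 */
  return 0;
}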

But this raises the issue of what the behaviour should be when the
matched operands occupy different numbers of registers.  Should the
register numbers match, or should the locations of the lsbs match?
Although the documentation isn't clear, reload went for the second
interpretation (which seems the most natural to me):

  /* On a REG_WORDS_BIG_ENDIAN machine, point to the last register of a
 multiple hard register group of scalar integer registers, so that
 for example (reg:DI 0) and (reg:SI 1) will be considered the same
 register.  */

So I think this means that we can/must use the lowpart offset
unconditionally, rather than trying to separate out the multi-register
case.  This also matches the LRA handling of constant integers, which
already uses lowpart subregs.
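
As a rough illustration of the reload comment quoted above, the hard
register that holds the least significant word of a multi-register group
can be sketched as below; the helper name is made up for the example:

#include <stdio.h>

/* Which hard register of the group [REGNO, REGNO + NREGS) holds the least
   significant word.  On a REG_WORDS_BIG_ENDIAN machine it is the last one,
   so a DImode value in registers {0, 1} has its lsb in register 1, which
   is why (reg:DI 0) and (reg:SI 1) are considered the same register.  */
static unsigned int
least_significant_word_regno (unsigned int regno, unsigned int nregs,
                              int words_big_endian)
{
  return words_big_endian ? regno + nregs - 1 : regno;
}

int
main (void)
{
  printf ("lsw of (reg:DI 0), little-endian words: %u\n",
          least_significant_word_regno (0, 2, 0)); /* 0 */
  printf ("lsw of (reg:DI 0), big-endian words: %u\n",
          least_significant_word_regno (0, 2, 1)); /* 1 */
  return 0;
}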

The patch fixes gcc.target/aarch64/sve/extract_[34].c for aarch64_be.

Tested on aarch64_be-none-elf, aarch64-linux-gnu and x86_64-linux-gnu.
OK to install?


2018-01-26  Richard Sandiford  

gcc/
* lra-constraints.c (match_reload): Use subreg_lowpart_offset
rather than 0 when creating partial subregs.

Index: gcc/lra-constraints.c
===
--- gcc/lra-constraints.c   2018-01-20 13:43:02.060083731 +
+++ gcc/lra-constraints.c   2018-01-26 13:22:46.350577506 +
@@ -945,7 +945,10 @@ match_reload (signed char out, signed ch
  if (SCALAR_INT_MODE_P (inmode))
new_out_reg = gen_lowpart_SUBREG (outmode, reg);
  else
-   new_out_reg = gen_rtx_SUBREG (outmode, reg, 0);
+   {
+ poly_uint64 offset = subreg_lowpart_offset (outmode, inmode);
+ new_out_reg = gen_rtx_SUBREG (outmode, reg, offset);
+   }
  LRA_SUBREG_P (new_out_reg) = 1;
  /* If the input reg is dying here, we can use the same hard
 register for REG and IN_RTX.  We do it only for original
@@ -965,7 +968,10 @@ match_reload (signed char out, signed ch
  if (SCALAR_INT_MODE_P (outmode))
new_in_reg = gen_lowpart_SUBREG (inmode, reg);
  else
-   new_in_reg = gen_rtx_SUBREG (inmode, reg, 0);
+   {
+ poly_uint64 offset = subreg_lowpart_offset (inmode, outmode);
+ new_in_reg = gen_rtx_SUBREG (inmode, reg, offset);
+   }
  /* NEW_IN_REG is non-paradoxical subreg.  We don't want
 NEW_OUT_REG living above.  We add clobber clause for
 this.  This is just a temporary clobber.  We can remove


[AArch64] Fix sve/extract_[12].c for big-endian SVE

2018-01-26 Thread Richard Sandiford
sve/extract_[12].c were relying on the target-independent optimisation
that removes a redundant vec_select, so that we don't end up with
things like:

dup v0.4s, v0.4s[0]
...use s0...

But that optimisation rightly doesn't trigger for big-endian targets,
because GCC expects lane 0 to be in the high part of the register
rather than the low part.

SVE breaks this assumption -- see the comment at the head of
aarch64-sve.md for details -- so the optimisation is valid for
both endiannesses.  Long term, we probably need some kind of target
hook to make GCC aware of this.

But there's another problem with the current extract pattern: it doesn't
tell the register allocator how cheap an extraction of lane 0 is with
tied registers.  It seems better to split the lane 0 case out into
its own pattern and use tied operands for the FPR<-SIMD case,
so that using different registers has the cost of an extra reload.
I think we want this for both endiannesses, regardless of the hook
described above.

Also, the gen_lowpart in this pattern fails for aarch64_be due to
TARGET_CAN_CHANGE_MODE_CLASS restrictions, so the patch uses gen_rtx_REG
instead.  We're only creating this rtl in order to print it, so there's
no need for anything fancier.

Tested on aarch64_be-elf and aarch64-linux-gnu.  OK to install?

Richard


2018-01-26  Richard Sandiford  

gcc/
* config/aarch64/aarch64-sve.md (*vec_extract_0): New
pattern.
(*vec_extract_v128): Require a nonzero lane number.
Use gen_rtx_REG rather than gen_lowpart.

Index: gcc/config/aarch64/aarch64-sve.md
===
--- gcc/config/aarch64/aarch64-sve.md   2018-01-13 18:01:51.232735405 +
+++ gcc/config/aarch64/aarch64-sve.md   2018-01-26 13:26:50.176756711 +
@@ -484,18 +484,52 @@ (define_expand "vec_extract"
   }
 )
 
+;; Extract element zero.  This is a special case because we want to force
+;; the registers to be the same for the second alternative, and then
+;; split the instruction into nothing after RA.
+(define_insn_and_split "*vec_extract_0"
+  [(set (match_operand: 0 "aarch64_simd_nonimmediate_operand" "=r, w, 
Utv")
+   (vec_select:
+ (match_operand:SVE_ALL 1 "register_operand" "w, 0, w")
+ (parallel [(const_int 0)])))]
+  "TARGET_SVE"
+  {
+operands[1] = gen_rtx_REG (mode, REGNO (operands[1]));
+switch (which_alternative)
+  {
+   case 0:
+ return "umov\\t%0, %1.[0]";
+   case 1:
+ return "#";
+   case 2:
+ return "st1\\t{%1.}[0], %0";
+   default:
+ gcc_unreachable ();
+  }
+  }
+  "&& reload_completed
+   && REG_P (operands[1])
+   && REGNO (operands[0]) == REGNO (operands[1])"
+  [(const_int 0)]
+  {
+emit_note (NOTE_INSN_DELETED);
+DONE;
+  }
+  [(set_attr "type" "neon_to_gp_q, untyped, neon_store1_one_lane_q")]
+)
+
 ;; Extract an element from the Advanced SIMD portion of the register.
 ;; We don't just reuse the aarch64-simd.md pattern because we don't
-;; want any chnage in lane number on big-endian targets.
+;; want any change in lane number on big-endian targets.
 (define_insn "*vec_extract_v128"
   [(set (match_operand: 0 "aarch64_simd_nonimmediate_operand" "=r, w, 
Utv")
(vec_select:
  (match_operand:SVE_ALL 1 "register_operand" "w, w, w")
  (parallel [(match_operand:SI 2 "const_int_operand")])))]
   "TARGET_SVE
-   && IN_RANGE (INTVAL (operands[2]) * GET_MODE_SIZE (mode), 0, 15)"
+   && IN_RANGE (INTVAL (operands[2]) * GET_MODE_SIZE (mode), 1, 15)"
   {
-operands[1] = gen_lowpart (mode, operands[1]);
+operands[1] = gen_rtx_REG (mode, REGNO (operands[1]));
 switch (which_alternative)
   {
case 0:


[AArch64] Tighten aarch64_secondary_reload condition (PR 83845)

2018-01-26 Thread Richard Sandiford
aarch64_secondary_reload enforced a secondary reload via
aarch64_sve_reload_be for memory and pseudo registers, but failed
to do the same for subregs of pseudo registers.  To avoid this and
any similar problems, the patch instead tests for things that the move
patterns handle directly; if the operand isn't one of those, we should
use the reload pattern instead.

The patch fixes an ICE in sve/mask_struct_store_3.c for aarch64_be,
where the bogus target description was (rightly) causing LRA to cycle.

Tested on aarch64_be-elf and aarch64-linux-gnu.  OK to install?

Richard


2018-01-26  Richard Sandiford  

gcc/
PR target/83845
* config/aarch64/aarch64.c (aarch64_secondary_reload): Tighten
check for operands that need to go through aarch64_sve_reload_be.

Index: gcc/config/aarch64/aarch64.c
===
--- gcc/config/aarch64/aarch64.c2018-01-19 11:57:11.141991997 +
+++ gcc/config/aarch64/aarch64.c2018-01-26 13:32:54.240529011 +
@@ -7249,9 +7249,14 @@ aarch64_secondary_reload (bool in_p ATTR
  machine_mode mode,
  secondary_reload_info *sri)
 {
+  /* Use aarch64_sve_reload_be for SVE reloads that cannot be handled
+ directly by the *aarch64_sve_mov_be move pattern.  See the
+ comment at the head of aarch64-sve.md for more details about the
+ big-endian handling.  */
   if (BYTES_BIG_ENDIAN
   && reg_class_subset_p (rclass, FP_REGS)
-  && (MEM_P (x) || (REG_P (x) && !HARD_REGISTER_P (x)))
+  && !((REG_P (x) && HARD_REGISTER_P (x))
+  || aarch64_simd_valid_immediate (x, NULL))
   && aarch64_sve_data_mode_p (mode))
 {
   sri->icode = CODE_FOR_aarch64_sve_reload_be;


[C++/82878] backport fix

2018-01-26 Thread Nathan Sidwell
I'm applying this backport to the gcc-7 branch.  Jonathan was kind 
enough to point out it's needed there.


nathan
--
Nathan Sidwell
2018-01-26  Nathan Sidwell  

	PR c++/82878
	PR c++/78495
	* call.c (build_call_a): Don't set CALL_FROM_THUNK_P for inherited
	ctor.
	* cp-gimplify.c	(cp_genericize_r): Restore THUNK dereference
	inhibition check removed in previous c++/78495 change.

	PR c++/82878
	* g++.dg/cpp0x/pr82878.C: New.
	* g++.dg/cpp1z/inh-ctor38.C: Check moves too.

Index: cp/call.c
===
--- cp/call.c	(revision 257087)
+++ cp/call.c	(working copy)
@@ -375,18 +375,10 @@ build_call_a (tree function, int n, tree
 
   TREE_HAS_CONSTRUCTOR (function) = (decl && DECL_CONSTRUCTOR_P (decl));
 
-  if (current_function_decl && decl
-  && flag_new_inheriting_ctors
-  && DECL_INHERITED_CTOR (current_function_decl)
-  && (DECL_INHERITED_CTOR (current_function_decl)
-	  == DECL_CLONED_FUNCTION (decl)))
-/* Pass arguments directly to the inherited constructor.  */
-CALL_FROM_THUNK_P (function) = true;
-
   /* Don't pass empty class objects by value.  This is useful
  for tags in STL, which are used to control overload resolution.
  We don't need to handle other cases of copying empty classes.  */
-  else if (! decl || ! DECL_BUILT_IN (decl))
+  if (! decl || ! DECL_BUILT_IN (decl))
 for (i = 0; i < n; i++)
   {
 	tree arg = CALL_EXPR_ARG (function, i);
Index: cp/cp-gimplify.c
===
--- cp/cp-gimplify.c	(revision 257087)
+++ cp/cp-gimplify.c	(working copy)
@@ -1107,6 +1107,14 @@ cp_genericize_r (tree *stmt_p, int *walk
   && omp_var_to_track (stmt))
 omp_cxx_notice_variable (wtd->omp_ctx, stmt);
 
+  /* Don't dereference parms in a thunk, pass the references through. */
+  if ((TREE_CODE (stmt) == CALL_EXPR && CALL_FROM_THUNK_P (stmt))
+  || (TREE_CODE (stmt) == AGGR_INIT_EXPR && AGGR_INIT_FROM_THUNK_P (stmt)))
+{
+  *walk_subtrees = 0;
+  return NULL;
+}
+
   /* Dereference invisible reference parms.  */
   if (wtd->handle_invisiref_parm_p && is_invisiref_parm (stmt))
 {
Index: cp/lambda.c
===
--- cp/lambda.c	(revision 257087)
+++ cp/lambda.c	(working copy)
@@ -1072,7 +1072,6 @@ maybe_add_lambda_conv_op (tree type)
   }
   }
 
-
   if (generic_lambda_p)
 {
   if (decltype_call)
Index: testsuite/g++.dg/cpp0x/pr82878.C
===
--- testsuite/g++.dg/cpp0x/pr82878.C	(revision 0)
+++ testsuite/g++.dg/cpp0x/pr82878.C	(working copy)
@@ -0,0 +1,20 @@
+// { dg-do compile { target c++11 } }
+// { dg-additional-options "-O" }
+// pr 82878 erroneously unwrapped a reference parm in the lambda::_FUN
+// thunk.
+
+struct A {
+  ~A();
+  operator int ();
+};
+
+void baz ();
+
+void
+bar (A b)
+{
+  void (*lam) (A) = [](A) { baz (); };
+
+  if (auto c = b)
+lam (c);
+}
Index: testsuite/g++.dg/cpp1z/inh-ctor38.C
===
--- testsuite/g++.dg/cpp1z/inh-ctor38.C	(revision 257087)
+++ testsuite/g++.dg/cpp1z/inh-ctor38.C	(working copy)
@@ -1,17 +1,19 @@
 // { dg-do run { target c++11 } }
 // PR78495 failed to propagate pass-by-value struct to base ctor.
 
+static int moves = 0;
+
 struct Ptr {
   void *ptr = 0;
 
   Ptr() {}
   Ptr(Ptr const&) = delete;
-  Ptr(Ptr&& other) : ptr (other.ptr) {}
+  Ptr(Ptr&& other) : ptr (other.ptr) {moves++;}
 };
 
 struct Base {
   Ptr val;
-  Base(Ptr val_) : val(static_cast(val_)) {}
+  Base(Ptr val_);
 };
 
 struct Derived: Base {
@@ -27,5 +29,13 @@ void *Foo () {
 }
 
 int main () {
-  return Foo () != 0;
+  if (Foo ())
+return 1;
+
+  if (moves != 2)
+return 2;
+
+  return 0;
 }
+
+Base::Base(Ptr val_) : val(static_cast(val_)) {}


Re: Silence false positive on LTO type merging warning

2018-01-26 Thread Richard Biener
On Fri, 26 Jan 2018, Jan Hubicka wrote:

> > On Thu, 25 Jan 2018, Jan Hubicka wrote:
> > 
> > > Hi,
> > > the testcase triggers invalid warning on type mismatch because array
> > > of pointers to complete type has different alias set from array of pointers
> > > to incomplete type.  This is valid, because an incomplete pointer has the alias
> > > set of void_ptr, which aliases all pointers, and arrays alias with their members.
> > 
> > But isn't that a problem?  Not if the pointers to incomplete type have
> > alias set zero, but IIRC they don't, right?
> 
> pointers to incomplete type have the same alias set as ptr_type_node (1 in this
> case), while pointers to complete types have a different alias set, but that set
> has the is_pointer flag.  When checking for a conflict we make 1 alias with
> everything that has the is_pointer flag in it, so it will return true for these
> two array types.

Ah, ok.  The patch is ok then.  I suppose in theory complex pointers
or vector pointers would have the same issue if we'd ever generate
those (GIMPLE at least doesn't like them ;)).

Richard.

> Honza
> > 
> > Richard.
> > 
> > > I already silenced the warning for pointers but I forgot about arrays.
> > > 
> > > Bootstrapped/regtested x86_64-linux, OK?
> > >   
> > >   * gcc.dg/lto/pr83954.h: New testcase.
> > >   * gcc.dg/lto/pr83954_0.c: New testcase.
> > >   * gcc.dg/lto/pr83954_1.c: New testcase.
> > >   * lto-symtab.c (warn_type_compatibility_p): Silence false positive
> > >   for type match warning on arrays of pointers.
> > > 
> > > Index: testsuite/gcc.dg/lto/pr83954.h
> > > ===
> > > --- testsuite/gcc.dg/lto/pr83954.h(revision 0)
> > > +++ testsuite/gcc.dg/lto/pr83954.h(working copy)
> > > @@ -0,0 +1,3 @@
> > > +struct foo;
> > > +extern struct foo *FOO_PTR_ARR[1];
> > > +
> > > Index: testsuite/gcc.dg/lto/pr83954_0.c
> > > ===
> > > --- testsuite/gcc.dg/lto/pr83954_0.c  (revision 0)
> > > +++ testsuite/gcc.dg/lto/pr83954_0.c  (working copy)
> > > @@ -0,0 +1,8 @@
> > > +/* { dg-lto-do link } */
> > > +#include "pr83954.h"
> > > +
> > > +int main() { 
> > > +  // just to prevent symbol removal
> > > +  FOO_PTR_ARR[1] = 0;
> > > +  return 0;
> > > +}
> > > Index: testsuite/gcc.dg/lto/pr83954_1.c
> > > ===
> > > --- testsuite/gcc.dg/lto/pr83954_1.c  (revision 0)
> > > +++ testsuite/gcc.dg/lto/pr83954_1.c  (working copy)
> > > @@ -0,0 +1,7 @@
> > > +#include "pr83954.h"
> > > +
> > > +struct foo {
> > > + int x;
> > > +};
> > > +struct foo *FOO_PTR_ARR[1] = { 0 };
> > > +
> > > Index: lto/lto-symtab.c
> > > ===
> > > --- lto/lto-symtab.c  (revision 257009)
> > > +++ lto/lto-symtab.c  (working copy)
> > > @@ -284,11 +284,22 @@ warn_type_compatibility_p (tree prevaili
> > >alias_set_type set1 = get_alias_set (type);
> > >alias_set_type set2 = get_alias_set (prevailing_type);
> > >  
> > > -  if (set1 && set2 && set1 != set2 
> > > -  && (!POINTER_TYPE_P (type) || !POINTER_TYPE_P (prevailing_type)
> > > +  if (set1 && set2 && set1 != set2)
> > > + {
> > > +  tree t1 = type, t2 = prevailing_type;
> > > +
> > > +   /* Alias sets of arrays are the same as alias sets of the inner
> > > +  types.  */
> > > +   while (TREE_CODE (t1) == ARRAY_TYPE && TREE_CODE (t2) == ARRAY_TYPE)
> > > + {
> > > +   t1 = TREE_TYPE (t1);
> > > +   t2 = TREE_TYPE (t2);
> > > + }
> > > +  if ((!POINTER_TYPE_P (t1) || !POINTER_TYPE_P (t2))
> > > || (set1 != TYPE_ALIAS_SET (ptr_type_node)
> > > -   && set2 != TYPE_ALIAS_SET (ptr_type_node
> > > -lev |= 5;
> > > +   && set2 != TYPE_ALIAS_SET (ptr_type_node)))
> > > + lev |= 5;
> > > + }
> > >  }
> > >  
> > >return lev;
> > > 
> > > 
> > 
> > -- 
> > Richard Biener 
> > SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
> > 21284 (AG Nuernberg)
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


[AArch64] Generalise aarch64_simd_valid_immediate for SVE

2018-01-26 Thread Richard Sandiford
The current aarch64_simd_valid_immediate code predates the move
to the new CONST_VECTOR representation, so for variable-length SVE
it only handles duplicates of single elements, rather than duplicates
of repeating patterns.

This patch removes the restriction.  It means that the validity
of a duplicated constant depends only on the bit pattern, not on
the mode used to represent it.
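
As a minimal sketch of the idea, a duplicated constant is fully described
by its smallest repeating prefix; the helper below (illustrative only, not
GCC's CONST_VECTOR_NPATTERNS, which reads the encoded representation rather
than recomputing it) finds that prefix length:

#include <stddef.h>
#include <stdio.h>

/* Return the smallest P such that V is its first P elements repeated to
   fill all N elements.  */
static size_t
repeat_length (const int *v, size_t n)
{
  for (size_t p = 1; p <= n / 2; p++)
    {
      int ok = 1;
      for (size_t i = p; i < n && ok; i++)
        ok = (v[i] == v[i - p]);
      if (ok && n % p == 0)
        return p;
    }
  return n;
}

int
main (void)
{
  int dup1[8] = { 5, 5, 5, 5, 5, 5, 5, 5 }; /* single-element duplicate */
  int dup2[8] = { 1, 2, 1, 2, 1, 2, 1, 2 }; /* repeating two-element pattern */
  printf ("%zu %zu\n", repeat_length (dup1, 8), repeat_length (dup2, 8));
  return 0;
}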

The patch is needed by a later big-endian fix.

Tested on aarch64_be-elf and aarch64-linux-gnu.  OK to install?

Richard


2018-01-26  Richard Sandiford  

gcc/
* config/aarch64/aarch64.c (aarch64_simd_valid_immediate): Handle
all CONST_VECTOR_DUPLICATE_P vectors, not just those with a single
duplicated element.

Index: gcc/config/aarch64/aarch64.c
===
--- gcc/config/aarch64/aarch64.c2018-01-26 13:32:54.240529011 +
+++ gcc/config/aarch64/aarch64.c2018-01-26 13:46:00.955822193 +
@@ -13164,10 +13164,11 @@ aarch64_simd_valid_immediate (rtx op, si
 return false;
 
   scalar_mode elt_mode = GET_MODE_INNER (mode);
-  rtx elt = NULL, base, step;
+  rtx base, step;
   unsigned int n_elts;
-  if (const_vec_duplicate_p (op, &elt))
-n_elts = 1;
+  if (GET_CODE (op) == CONST_VECTOR
+  && CONST_VECTOR_DUPLICATE_P (op))
+n_elts = CONST_VECTOR_NPATTERNS (op);
   else if ((vec_flags & VEC_SVE_DATA)
   && const_vec_series_p (op, &base, &step))
 {
@@ -13192,14 +13193,17 @@ aarch64_simd_valid_immediate (rtx op, si
|| op == CONSTM1_RTX (mode));
 
   scalar_float_mode elt_float_mode;
-  if (elt
-  && is_a  (elt_mode, &elt_float_mode)
-  && (aarch64_float_const_zero_rtx_p (elt)
- || aarch64_float_const_representable_p (elt)))
+  if (n_elts == 1
+  && is_a  (elt_mode, &elt_float_mode))
 {
-  if (info)
-   *info = simd_immediate_info (elt_float_mode, elt);
-  return true;
+  rtx elt = CONST_VECTOR_ENCODED_ELT (op, 0);
+  if (aarch64_float_const_zero_rtx_p (elt)
+ || aarch64_float_const_representable_p (elt))
+   {
+ if (info)
+   *info = simd_immediate_info (elt_float_mode, elt);
+ return true;
+   }
 }
 
   unsigned int elt_size = GET_MODE_SIZE (elt_mode);
@@ -13214,11 +13218,11 @@ aarch64_simd_valid_immediate (rtx op, si
   bytes.reserve (n_elts * elt_size);
   for (unsigned int i = 0; i < n_elts; i++)
 {
-  if (!elt || n_elts != 1)
-   /* The vector is provided in gcc endian-neutral fashion.
-  For aarch64_be, it must be laid out in the vector register
-  in reverse order.  */
-   elt = CONST_VECTOR_ELT (op, BYTES_BIG_ENDIAN ? (n_elts - 1 - i) : i);
+  /* The vector is provided in gcc endian-neutral fashion.
+For aarch64_be Advanced SIMD, it must be laid out in the vector
+register in reverse order.  */
+  bool swap_p = ((vec_flags & VEC_ADVSIMD) != 0 && BYTES_BIG_ENDIAN);
+  rtx elt = CONST_VECTOR_ELT (op, swap_p ? (n_elts - 1 - i) : i);
 
   if (elt_mode != elt_int_mode)
elt = gen_lowpart (elt_int_mode, elt);


[AArch64] Use all SVE LD1RQ variants

2018-01-26 Thread Richard Sandiford
The fallback way of handling a repeated 128-bit constant vector for SVE
is to force the 128 bits to the constant pool and use LD1RQ to load it.
Previously the code always used the byte variant of LD1RQ (LD1RQB),
with a preceding BSWAP for big-endian targets.  However, that BSWAP
doesn't handle all cases correctly.

The simplest fix seemed to be to use the LD1RQ appropriate for the
element size.
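
For illustration, the mnemonic choice by element size amounts to the
following; this is a sketch of what the pattern's mode-suffix substitution
achieves, and the helper name is made up:

#include <stdio.h>

/* Pick the LD1RQ variant that loads 128 bits from memory and replicates
   them, keeping each ELEM_BYTES-wide element in memory order.  */
static const char *
ld1rq_for_element_bytes (unsigned int elem_bytes)
{
  switch (elem_bytes)
    {
    case 1: return "ld1rqb";
    case 2: return "ld1rqh";
    case 4: return "ld1rqw";
    case 8: return "ld1rqd";
    default: return NULL;
    }
}

int
main (void)
{
  for (unsigned int bytes = 1; bytes <= 8; bytes *= 2)
    printf ("%u-byte elements: %s\n", bytes, ld1rq_for_element_bytes (bytes));
  return 0;
}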

This helps to fix some of the sve/slp_*.c tests for aarch64_be,
although a later patch is needed as well.

Tested on aarch64_be-elf and aarch64-linux-gnu.  OK to install?

Richard


2018-01-26  Richard Sandiford  

gcc/
* config/aarch64/aarch64-sve.md (sve_ld1rq): Replace with...
(*sve_ld1rq): ... this new pattern.  Handle all element sizes,
not just bytes.
* config/aarch64/aarch64.c (aarch64_expand_sve_widened_duplicate):
Remove BSWAP handling for big-endian targets and use the form of
LD1RQ appropriate for the mode.

gcc/testsuite/
* gcc.target/aarch64/sve/slp_2.c: Expect LD1RQD rather than LD1RQB.
* gcc.target/aarch64/sve/slp_3.c: Expect LD1RQW rather than LD1RQB.
* gcc.target/aarch64/sve/slp_4.c: Expect LD1RQH rather than LD1RQB.

Index: gcc/config/aarch64/aarch64-sve.md
===
--- gcc/config/aarch64/aarch64-sve.md   2018-01-26 13:26:50.176756711 +
+++ gcc/config/aarch64/aarch64-sve.md   2018-01-26 13:49:00.069630245 +
@@ -652,14 +652,14 @@ (define_insn "sve_ld1r"
 ;; Load 128 bits from memory and duplicate to fill a vector.  Since there
 ;; are so few operations on 128-bit "elements", we don't define a VNx1TI
 ;; and simply use vectors of bytes instead.
-(define_insn "sve_ld1rq"
-  [(set (match_operand:VNx16QI 0 "register_operand" "=w")
-   (unspec:VNx16QI
- [(match_operand:VNx16BI 1 "register_operand" "Upl")
+(define_insn "*sve_ld1rq"
+  [(set (match_operand:SVE_ALL 0 "register_operand" "=w")
+   (unspec:SVE_ALL
+ [(match_operand: 1 "register_operand" "Upl")
   (match_operand:TI 2 "aarch64_sve_ld1r_operand" "Uty")]
  UNSPEC_LD1RQ))]
   "TARGET_SVE"
-  "ld1rqb\t%0.b, %1/z, %2"
+  "ld1rq\t%0., %1/z, %2"
 )
 
 ;; Implement a predicate broadcast by shifting the low bit of the scalar
Index: gcc/config/aarch64/aarch64.c
===
--- gcc/config/aarch64/aarch64.c2018-01-26 13:46:00.955822193 +
+++ gcc/config/aarch64/aarch64.c2018-01-26 13:49:00.071630173 +
@@ -2787,16 +2787,7 @@ aarch64_expand_sve_widened_duplicate (rt
   return true;
 }
 
-  /* The bytes are loaded in little-endian order, so do a byteswap on
- big-endian targets.  */
-  if (BYTES_BIG_ENDIAN)
-{
-  src = simplify_unary_operation (BSWAP, src_mode, src, src_mode);
-  if (!src)
-   return NULL_RTX;
-}
-
-  /* Use LD1RQ to load the 128 bits from memory.  */
+  /* Use LD1RQ[BHWD] to load the 128 bits from memory.  */
   src = force_const_mem (src_mode, src);
   if (!src)
 return false;
@@ -2808,8 +2799,12 @@ aarch64_expand_sve_widened_duplicate (rt
   src = replace_equiv_address (src, addr);
 }
 
-  rtx ptrue = force_reg (VNx16BImode, CONSTM1_RTX (VNx16BImode));
-  emit_insn (gen_sve_ld1rq (gen_lowpart (VNx16QImode, dest), ptrue, src));
+  machine_mode mode = GET_MODE (dest);
+  unsigned int elem_bytes = GET_MODE_UNIT_SIZE (mode);
+  machine_mode pred_mode = aarch64_sve_pred_mode (elem_bytes).require ();
+  rtx ptrue = force_reg (pred_mode, CONSTM1_RTX (pred_mode));
+  src = gen_rtx_UNSPEC (mode, gen_rtvec (2, ptrue, src), UNSPEC_LD1RQ);
+  emit_insn (gen_rtx_SET (dest, src));
   return true;
 }
 
Index: gcc/testsuite/gcc.target/aarch64/sve/slp_2.c
===
--- gcc/testsuite/gcc.target/aarch64/sve/slp_2.c2018-01-13 
17:58:43.651957575 +
+++ gcc/testsuite/gcc.target/aarch64/sve/slp_2.c2018-01-26 
13:49:00.071630173 +
@@ -32,7 +32,7 @@ TEST_ALL (VEC_PERM)
 /* { dg-final { scan-assembler-times {\tld1rh\tz[0-9]+\.h, } 2 } } */
 /* { dg-final { scan-assembler-times {\tld1rw\tz[0-9]+\.s, } 3 } } */
 /* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d, } 3 } } */
-/* { dg-final { scan-assembler-times {\tld1rqb\tz[0-9]+\.b, } 3 } } */
+/* { dg-final { scan-assembler-times {\tld1rqd\tz[0-9]+\.d, } 3 } } */
 /* { dg-final { scan-assembler-not {\tzip1\t} } } */
 /* { dg-final { scan-assembler-not {\tzip2\t} } } */
 
Index: gcc/testsuite/gcc.target/aarch64/sve/slp_3.c
===
--- gcc/testsuite/gcc.target/aarch64/sve/slp_3.c2018-01-13 
17:58:43.651957575 +
+++ gcc/testsuite/gcc.target/aarch64/sve/slp_3.c2018-01-26 
13:49:00.071630173 +
@@ -36,7 +36,7 @@ TEST_ALL (VEC_PERM)
 /* 1 for each 16-bit type and 4 for double.  */
 /* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d, } 7 } } */
 /* 1

Re: Silence false positive on LTO type merging warning

2018-01-26 Thread Jan Hubicka
> On Fri, 26 Jan 2018, Jan Hubicka wrote:
> 
> > > On Thu, 25 Jan 2018, Jan Hubicka wrote:
> > > 
> > > > Hi,
> > > > the testcase triggers invalid warning on type mismatch because array
> > > > of pointers to complete type has different alias set from array of pointers
> > > > to incomplete type.  This is valid, because an incomplete pointer has the alias
> > > > set of void_ptr, which aliases all pointers, and arrays alias with their members.
> > > 
> > > But isn't that a problem?  Not if the pointers to incomplete type have
> > > alias set zero, but IIRC they don't, right?
> > 
> > pointers to incomplete type have the same alias set as ptr_type_node (1 in this
> > case), while pointers to complete types have a different alias set, but that set
> > has the is_pointer flag.  When checking for a conflict we make 1 alias with
> > everything that has the is_pointer flag in it, so it will return true for these
> > two array types.
> 
> Ah, ok.  The patch is ok then.  I suppose in theory complex pointers
> or vector pointers would have the same issue if we'd ever generate
> those (GIMPLE at least doesn't like them ;)).

Yep, VECTOR_TYPE is the only remaining case where get_alias_set recurses, but I would
wait for those to be actually used.  We would need to not only generate them but also
produce static variables of them for the warning to trigger.

Honza


[AArch64] Prefer LD1RQ for big-endian SVE

2018-01-26 Thread Richard Sandiford
This patch deals with cases in which a CONST_VECTOR contains a
repeating bit pattern that is wider than one element but narrower
than 128 bits.  The current code:

* treats the repeating pattern as a single element
* uses the associated LD1R to load and replicate it (such as LD1RD
  for 64-bit patterns)
* uses a subreg to cast the result back to the original vector type

The problem is that for big-endian targets, the final cast is
effectively a form of element reverse.  E.g. say we're using LD1RD to load
16-bit elements, with h being the high parts and l being the low parts:

                +-----+-----+-----+-----+-----+
         lanes  |  0  |  1  |  2  |  3  |  4  | ...
                +-----+-----+-----+-----+-----+
   memory bytes |h0 l0|h1 l1|h2 l2|h3 l3|h0 l0| ...
                +-----+-----+-----+-----+-----+
                   V     V     V     V
           ----+-----------------------+
  register     |           0           |
  after    ... +-----------------------+ lsb
  LD1RD   h3 l3|h0 l0 h1 l1 h2 l2 h3 l3|
           ----+-----------------------+

                +-----+-----+-----+-----+-----+
  expected  ... |  4  |  3  |  2  |  1  |  0  |
  register      +-----+-----+-----+-----+-----+ lsb
  contents  ... |h0 l0|h3 l3|h2 l2|h1 l1|h0 l0|
                +-----+-----+-----+-----+-----+

A later patch fixes the handling of general subregs to account
for this, but it means that we need to do a REV instruction
after the load.  It seems better to use LD1RQ[BHW] on a 128-bit
pattern instead, since that gets the endianness right without
a separate fixup instruction.
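
A host-side model may make the failure mode concrete.  The sketch below
assumes, as described above, that SVE keeps lane 0 at the least significant
end of the register regardless of endianness, and that big-endian LD1RD
puts the first memory byte at the most significant end of each 64-bit chunk:

#include <stdint.h>
#include <stdio.h>

int
main (void)
{
  /* Memory holds four big-endian 16-bit elements with values 0, 1, 2, 3.  */
  uint8_t mem[8] = { 0x00, 0x00, 0x00, 0x01, 0x00, 0x02, 0x00, 0x03 };

  /* Big-endian LD1RD: the first memory byte lands at the msb of the
     64-bit chunk.  */
  uint64_t chunk = 0;
  for (int i = 0; i < 8; i++)
    chunk = (chunk << 8) | mem[i];

  /* Lane N of the register is bits [16N+15:16N], i.e. lane 0 at the lsb.  */
  for (int lane = 0; lane < 4; lane++)
    printf ("lane %d = %u\n", lane,
            (unsigned) ((chunk >> (16 * lane)) & 0xffff));
  /* Prints 3, 2, 1, 0: the reverse of the expected 0, 1, 2, 3, which is
     why an element-sized LD1RQH is preferable here.  */
  return 0;
}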

This is another step towards fixing sve/slp_* for aarch64_be.

Tested on aarch64_be-elf and aarch64-linux-gnu.  OK to install?

Richard


2018-01-26  Richard Sandiford  

gcc/
* config/aarch64/aarch64.c (aarch64_expand_sve_const_vector): Prefer
the TImode handling for big-endian targets.

gcc/testsuite/
* gcc.target/aarch64/sve/slp_2.c: Expect LD1RQ to be used instead
of LD1R[HWD] for multi-element constants on big-endian targets.
* gcc.target/aarch64/sve/slp_3.c: Likewise.
* gcc.target/aarch64/sve/slp_4.c: Likewise.

Index: gcc/config/aarch64/aarch64.c
===
--- gcc/config/aarch64/aarch64.c2018-01-26 13:49:00.071630173 +
+++ gcc/config/aarch64/aarch64.c2018-01-26 13:51:19.665760175 +
@@ -2824,10 +2824,18 @@ aarch64_expand_sve_const_vector (rtx des
   /* The constant is a repeating seqeuence of at least two elements,
 where the repeating elements occupy no more than 128 bits.
 Get an integer representation of the replicated value.  */
-  unsigned int int_bits = GET_MODE_UNIT_BITSIZE (mode) * npatterns;
-  gcc_assert (int_bits <= 128);
-
-  scalar_int_mode int_mode = int_mode_for_size (int_bits, 0).require ();
+  scalar_int_mode int_mode;
+  if (BYTES_BIG_ENDIAN)
+   /* For now, always use LD1RQ to load the value on big-endian
+  targets, since the handling of smaller integers includes a
+  subreg that is semantically an element reverse.  */
+   int_mode = TImode;
+  else
+   {
+ unsigned int int_bits = GET_MODE_UNIT_BITSIZE (mode) * npatterns;
+ gcc_assert (int_bits <= 128);
+ int_mode = int_mode_for_size (int_bits, 0).require ();
+   }
   rtx int_value = simplify_gen_subreg (int_mode, src, mode, 0);
   if (int_value
  && aarch64_expand_sve_widened_duplicate (dest, int_mode, int_value))
Index: gcc/testsuite/gcc.target/aarch64/sve/slp_2.c
===
--- gcc/testsuite/gcc.target/aarch64/sve/slp_2.c2018-01-26 
13:49:00.071630173 +
+++ gcc/testsuite/gcc.target/aarch64/sve/slp_2.c2018-01-26 
13:51:19.665760175 +
@@ -29,9 +29,12 @@ #define TEST_ALL(T)  \
 
 TEST_ALL (VEC_PERM)
 
-/* { dg-final { scan-assembler-times {\tld1rh\tz[0-9]+\.h, } 2 } } */
-/* { dg-final { scan-assembler-times {\tld1rw\tz[0-9]+\.s, } 3 } } */
-/* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d, } 3 } } */
+/* { dg-final { scan-assembler-times {\tld1rh\tz[0-9]+\.h, } 2 { target 
aarch64_little_endian } } } */
+/* { dg-final { scan-assembler-times {\tld1rqb\tz[0-9]+\.b, } 2 { target 
aarch64_big_endian } } } */
+/* { dg-final { scan-assembler-times {\tld1rw\tz[0-9]+\.s, } 3 { target 
aarch64_little_endian } } } */
+/* { dg-final { scan-assembler-times {\tld1rqh\tz[0-9]+\.h, } 3 { target 
aarch64_big_endian } } } */
+/* { dg-final { scan-assembler-times {\tld1rd\tz[0-9]+\.d, } 3 { target 
aarch64_little_endian } } } */
+/* { dg-final { scan-assembler-times {\tld1rqw\tz[0-9]+\.s, } 3 

[AArch64] Handle SVE subregs that are effectively REVs

2018-01-26 Thread Richard Sandiford
Subreg reads should be equivalent to storing the inner register to
memory and loading the appropriate memory bytes back, with subreg
writes doing the reverse.  For the reasons explained in the comments,
this isn't what happens for big-endian SVE if we simply reinterpret
one vector register as having a different element size, so the
conceptual store and load is needed in the general case.
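
A small host-side sketch of that equivalence, under the same big-endian,
lane-0-at-the-lsb assumptions as the earlier patches in this series
(illustrative only):

#include <stdint.h>
#include <stdio.h>

int
main (void)
{
  /* Two 16-bit lanes of a vector register, lane 0 at the lsb.  */
  uint16_t lane0 = 0x1111, lane1 = 0x2222;

  /* Reinterpreting the register bits directly as one 32-bit lane.  */
  uint32_t direct = ((uint32_t) lane1 << 16) | lane0;

  /* Storing to memory (lane 0 first, each lane big-endian) and reloading
     as one big-endian 32-bit element.  */
  uint8_t mem[4] = { (uint8_t) (lane0 >> 8), (uint8_t) (lane0 & 0xff),
                     (uint8_t) (lane1 >> 8), (uint8_t) (lane1 & 0xff) };
  uint32_t via_memory = ((uint32_t) mem[0] << 24) | ((uint32_t) mem[1] << 16)
                        | ((uint32_t) mem[2] << 8) | mem[3];

  /* Prints 0x22221111 vs 0x11112222: the 16-bit halves are swapped, so
     the subreg is semantically a reverse of 16-bit units within each
     32-bit element, which is what the REV-based split can implement.  */
  printf ("direct = %#x, via memory = %#x\n",
          (unsigned) direct, (unsigned) via_memory);
  return 0;
}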

However, that obviously produces poor code if we do it too often.
The patch therefore adds a pattern for handling the operation in
registers.  This copes with the important case of a VIEW_CONVERT
created by tree-vect-slp.c:duplicate_and_interleave.

It might make sense to tighten the predicates in aarch64-sve.md so
that such subregs are not allowed as operands to most instructions,
but that's future work.

This fixes the sve/slp_*.c tests on aarch64_be.

Tested on aarch64_be-elf and aarch64-linux-gnu.  OK to install?

Richard


2018-01-26  Richard Sandiford  

gcc/
* config/aarch64/aarch64-protos.h (aarch64_split_sve_subreg_move)
(aarch64_maybe_expand_sve_subreg_move): Declare.
* config/aarch64/aarch64.md (UNSPEC_REV_SUBREG): New unspec.
* config/aarch64/predicates.md (aarch64_any_register_operand): New
predicate.
* config/aarch64/aarch64-sve.md (mov): Optimize subreg moves
that are semantically a reverse operation.
(*aarch64_sve_mov_subreg_be): New pattern.
* config/aarch64/aarch64.c (aarch64_maybe_expand_sve_subreg_move):
(aarch64_replace_reg_mode, aarch64_split_sve_subreg_move): New
functions.
(aarch64_can_change_mode_class): For big-endian, forbid changes
between two SVE modes if they have different element sizes.

Index: gcc/config/aarch64/aarch64-protos.h
===
--- gcc/config/aarch64/aarch64-protos.h 2018-01-19 11:57:11.135992201 +
+++ gcc/config/aarch64/aarch64-protos.h 2018-01-26 13:55:12.276219256 +
@@ -447,6 +447,8 @@ void aarch64_expand_epilogue (bool);
 void aarch64_expand_mov_immediate (rtx, rtx, rtx (*) (rtx, rtx) = 0);
 void aarch64_emit_sve_pred_move (rtx, rtx, rtx);
 void aarch64_expand_sve_mem_move (rtx, rtx, machine_mode);
+bool aarch64_maybe_expand_sve_subreg_move (rtx, rtx);
+void aarch64_split_sve_subreg_move (rtx, rtx, rtx);
 void aarch64_expand_prologue (void);
 void aarch64_expand_vector_init (rtx, rtx);
 void aarch64_init_cumulative_args (CUMULATIVE_ARGS *, const_tree, rtx,
Index: gcc/config/aarch64/aarch64.md
===
--- gcc/config/aarch64/aarch64.md   2018-01-19 11:57:11.134992236 +
+++ gcc/config/aarch64/aarch64.md   2018-01-26 13:55:12.279219121 +
@@ -168,6 +168,7 @@ (define_c_enum "unspec" [
 UNSPEC_INSR
 UNSPEC_CLASTB
 UNSPEC_FADDA
+UNSPEC_REV_SUBREG
 ])
 
 (define_c_enum "unspecv" [
Index: gcc/config/aarch64/predicates.md
===
--- gcc/config/aarch64/predicates.md2018-01-19 11:57:11.130992372 +
+++ gcc/config/aarch64/predicates.md2018-01-26 13:55:12.279219121 +
@@ -617,3 +617,7 @@ (define_predicate "aarch64_gather_scale_
 (define_predicate "aarch64_gather_scale_operand_d"
   (and (match_code "const_int")
(match_test "INTVAL (op) == 1 || INTVAL (op) == 8")))
+
+;; A special predicate that doesn't match a particular mode.
+(define_special_predicate "aarch64_any_register_operand"
+  (match_code "reg"))
Index: gcc/config/aarch64/aarch64-sve.md
===
--- gcc/config/aarch64/aarch64-sve.md   2018-01-26 13:49:00.069630245 +
+++ gcc/config/aarch64/aarch64-sve.md   2018-01-26 13:55:12.277219211 +
@@ -84,6 +84,32 @@ (define_expand "mov"
  gen_vec_duplicate);
DONE;
   }
+
+/* Optimize subregs on big-endian targets: we can use REV[BHW]
+   instead of going through memory.  */
+if (BYTES_BIG_ENDIAN
+&& aarch64_maybe_expand_sve_subreg_move (operands[0], operands[1]))
+  DONE;
+  }
+)
+
+;; A pattern for optimizing SUBREGs that have a reinterpreting effect
+;; on big-endian targets; see aarch64_maybe_expand_sve_subreg_move
+;; for details.  We use a special predicate for operand 2 to reduce
+;; the number of patterns.
+(define_insn_and_split "*aarch64_sve_mov_subreg_be"
+  [(set (match_operand:SVE_ALL 0 "aarch64_sve_nonimmediate_operand" "=w")
+   (unspec:SVE_ALL
+  [(match_operand:VNx16BI 1 "register_operand" "Upl")
+  (match_operand 2 "aarch64_any_register_operand" "w")]
+ UNSPEC_REV_SUBREG))]
+  "TARGET_SVE && BYTES_BIG_ENDIAN"
+  "#"
+  "&& reload_completed"
+  [(const_int 0)]
+  {
+aarch64_split_sve_subreg_move (operands[0], operands[1], operands[2]);
+DONE;
   }
 )
 
Index: gcc/config/aarch64/aarch64.c
===
--- gcc/co

Re: [AArch64] Fix sve/extract_[12].c for big-endian SVE

2018-01-26 Thread Kyrill Tkachov

Hi Richard,

On 26/01/18 13:31, Richard Sandiford wrote:

sve/extract_[12].c were relying on the target-independent optimisation
that removes a redundant vec_select, so that we don't end up with
things like:

dup v0.4s, v0.4s[0]
...use s0...

But that optimisation rightly doesn't trigger for big-endian targets,
because GCC expects lane 0 to be in the high part of the register
rather than the low part.

SVE breaks this assumption -- see the comment at the head of
aarch64-sve.md for details -- so the optimisation is valid for
both endiannesses.  Long term, we probably need some kind of target
hook to make GCC aware of this.

But there's another problem with the current extract pattern: it doesn't
tell the register allocator how cheap an extraction of lane 0 is with
tied registers.  It seems better to split the lane 0 case out into
its own pattern and use tied operands for the FPR<-SIMD case,
so that using different registers has the cost of an extra reload.
I think we want this for both endiannesses, regardless of the hook
described above.

Also, the gen_lowpart in this pattern fails for aarch64_be due to
TARGET_CAN_CHANGE_MODE_CLASS restrictions, so the patch uses gen_rtx_REG
instead.  We're only creating this rtl in order to print it, so there's
no need for anything fancier.

Tested on aarch64_be-elf and aarch64-linux-gnu.  OK to install?

Richard


2018-01-26  Richard Sandiford 

gcc/
* config/aarch64/aarch64-sve.md (*vec_extract_0): New
pattern.
(*vec_extract_v128): Require a nonzero lane number.
Use gen_rtx_REG rather than gen_lowpart.

Index: gcc/config/aarch64/aarch64-sve.md
===
--- gcc/config/aarch64/aarch64-sve.md   2018-01-13 18:01:51.232735405 +
+++ gcc/config/aarch64/aarch64-sve.md   2018-01-26 13:26:50.176756711 +
@@ -484,18 +484,52 @@ (define_expand "vec_extract"
   }
 )

+;; Extract element zero.  This is a special case because we want to force
+;; the registers to be the same for the second alternative, and then
+;; split the instruction into nothing after RA.
+(define_insn_and_split "*vec_extract_0"
+  [(set (match_operand: 0 "aarch64_simd_nonimmediate_operand" "=r, w, 
Utv")
+   (vec_select:
+ (match_operand:SVE_ALL 1 "register_operand" "w, 0, w")
+ (parallel [(const_int 0)])))]
+  "TARGET_SVE"
+  {
+operands[1] = gen_rtx_REG (mode, REGNO (operands[1]));
+switch (which_alternative)
+  {
+   case 0:
+ return "umov\\t%0, %1.[0]";
+   case 1:
+ return "#";
+   case 2:
+ return "st1\\t{%1.}[0], %0";
+   default:
+ gcc_unreachable ();
+  }
+  }
+  "&& reload_completed
+   && REG_P (operands[1])
+   && REGNO (operands[0]) == REGNO (operands[1])"


Is it guaranteed that operands[0] will be a REG rtx here?
The aarch64_simd_nonimmediate_operand predicate and Utv constraint might allow
a MEM, which would cause an error in an RTL-checking build.
If I'm not mistaken GCC may attempt the split even when matching alternative 2.

Kyrill


+  [(const_int 0)]
+  {
+emit_note (NOTE_INSN_DELETED);
+DONE;
+  }
+  [(set_attr "type" "neon_to_gp_q, untyped, neon_store1_one_lane_q")]
+)
+
 ;; Extract an element from the Advanced SIMD portion of the register.
 ;; We don't just reuse the aarch64-simd.md pattern because we don't
-;; want any chnage in lane number on big-endian targets.
+;; want any change in lane number on big-endian targets.
 (define_insn "*vec_extract_v128"
   [(set (match_operand: 0 "aarch64_simd_nonimmediate_operand" "=r, w, 
Utv")
 (vec_select:
   (match_operand:SVE_ALL 1 "register_operand" "w, w, w")
   (parallel [(match_operand:SI 2 "const_int_operand")])))]
   "TARGET_SVE
-   && IN_RANGE (INTVAL (operands[2]) * GET_MODE_SIZE (mode), 0, 15)"
+   && IN_RANGE (INTVAL (operands[2]) * GET_MODE_SIZE (mode), 1, 15)"
   {
-operands[1] = gen_lowpart (mode, operands[1]);
+operands[1] = gen_rtx_REG (mode, REGNO (operands[1]));
 switch (which_alternative)
   {
 case 0:




[SLP/AArch64] Fix unpack handling for big-endian SVE

2018-01-26 Thread Richard Sandiford
I hadn't realised that on big-endian targets, VEC_UNPACK*HI_EXPR unpacks
the low-numbered lanes and VEC_UNPACK*LO_EXPR unpacks the high-numbered
lanes.  This meant that both the SVE patterns and the handling of
fully-masked loops were wrong.

The patch deals with that by making sure that all vec_unpack* optabs
are define_expands, using BYTES_BIG_ENDIAN to choose the appropriate
define_insn.  This in turn meant that we can get rid of the duplication
between the signed and unsigned patterns for predicates.  (We provide
implementations of both the signed and unsigned optabs because the sign
doesn't matter for predicates: every element contains only one
significant bit.)
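
A minimal sketch of the lane-selection rule stated above; the helper is
hypothetical, not a GCC API:

#include <stdio.h>

/* First input lane widened by VEC_UNPACK*HI_EXPR.  The HI variant takes
   the half at the most significant end of the register; on big-endian
   that end holds the low-numbered lanes.  */
static int
unpack_hi_first_lane (int nelts, int bytes_big_endian)
{
  return bytes_big_endian ? 0 : nelts / 2;
}

int
main (void)
{
  int le_first = unpack_hi_first_lane (8, 0);
  int be_first = unpack_hi_first_lane (8, 1);
  printf ("little-endian: HI unpacks lanes %d..%d\n", le_first, le_first + 3);
  printf ("big-endian:    HI unpacks lanes %d..%d\n", be_first, be_first + 3);
  return 0;
}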

Also, the float unpacks need to unpack one half of the input vector,
but the unpacked upper bits are "don't care".  There are two obvious
ways of handling that: use an unpack (filling with zeros) or use a ZIP
(filling with a duplicate of the low bits).  The code previously used
unpacks, but the sequence involved a subreg that is semantically an
element reverse on big-endian targets.  Using the ZIP patterns avoids
that, and at the moment there's no reason to prefer one over the other
for performance reasons, so the patch switches to ZIP unconditionally.
As the comment says, it would be easy to optimise this later if UUNPK
turns out to be better for some implementations.

Tested on aarch64_be-elf, aarch64-linux-gnu and x86_64-linux-gnu.
I think the tree-vect-loop-manip.c part is obvious, but OK for the
AArch64 bits?

Richard


2018-01-26  Richard Sandiford  

gcc/
* tree-vect-loop-manip.c (vect_maybe_permute_loop_masks):
Reverse the choice between VEC_UNPACK_LO_EXPR and VEC_UNPACK_HI_EXPR
for big-endian.
* config/aarch64/iterators.md (hi_lanes_optab): New int attribute.
* config/aarch64/aarch64-sve.md
(*aarch64_sve_): Rename to...
(aarch64_sve_): ...this.
(*extend2): Rename to...
(aarch64_sve_extend2): ...this.
(vec_unpack__): Turn into a define_expand,
renaming the old pattern to...
(aarch64_sve_punpk_): ...this.  Only define
unsigned packs.
(vec_unpack__): Turn into a
define_expand, renaming the old pattern to...
(aarch64_sve_unpk_): ...this.
(*vec_unpacku___no_convert): Delete.
(vec_unpacks__): Take BYTES_BIG_ENDIAN into
account when deciding which SVE instruction the optab should use.
(vec_unpack_float__vnx4si): Likewise.

gcc/testsuite/
* gcc.target/aarch64/sve/unpack_fcvt_signed_1.c: Expect zips rather
than unpacks.
* gcc.target/aarch64/sve/unpack_fcvt_unsigned_1.c: Likewise.
* gcc.target/aarch64/sve/unpack_float_1.c: Likewise.

Index: gcc/tree-vect-loop-manip.c
===
--- gcc/tree-vect-loop-manip.c  2018-01-13 18:02:00.948360274 +
+++ gcc/tree-vect-loop-manip.c  2018-01-26 14:00:15.717916957 +
@@ -334,7 +334,8 @@ vect_maybe_permute_loop_masks (gimple_se
{
  tree src = src_rgm->masks[i / 2];
  tree dest = dest_rgm->masks[i];
- tree_code code = (i & 1 ? VEC_UNPACK_HI_EXPR
+ tree_code code = ((i & 1) == (BYTES_BIG_ENDIAN ? 0 : 1)
+   ? VEC_UNPACK_HI_EXPR
: VEC_UNPACK_LO_EXPR);
  gassign *stmt;
  if (dest_masktype == unpack_masktype)
Index: gcc/config/aarch64/iterators.md
===
--- gcc/config/aarch64/iterators.md 2018-01-13 18:01:26.106685901 +
+++ gcc/config/aarch64/iterators.md 2018-01-26 14:00:15.716916999 +
@@ -1674,6 +1674,15 @@ (define_int_attr perm_hilo [(UNSPEC_ZIP1
(UNSPEC_UNPACKSHI "hi") (UNSPEC_UNPACKUHI "hi")
(UNSPEC_UNPACKSLO "lo") (UNSPEC_UNPACKULO "lo")])
 
+;; Return true if the associated optab refers to the high-numbered lanes,
+;; false if it refers to the low-numbered lanes.  The convention is for
+;; "hi" to refer to the low-numbered lanes (the first ones in memory)
+;; for big-endian.
+(define_int_attr hi_lanes_optab [(UNSPEC_UNPACKSHI "!BYTES_BIG_ENDIAN")
+(UNSPEC_UNPACKUHI "!BYTES_BIG_ENDIAN")
+(UNSPEC_UNPACKSLO "BYTES_BIG_ENDIAN")
+(UNSPEC_UNPACKULO "BYTES_BIG_ENDIAN")])
+
 (define_int_attr frecp_suffix  [(UNSPEC_FRECPE "e") (UNSPEC_FRECPX "x")])
 
 (define_int_attr crc_variant [(UNSPEC_CRC32B "crc32b") (UNSPEC_CRC32H "crc32h")
Index: gcc/config/aarch64/aarch64-sve.md
===
--- gcc/config/aarch64/aarch64-sve.md   2018-01-26 13:55:12.277219211 +
+++ gcc/config/aarch64/aarch64-sve.md   2018-01-26 14:00:15.716916999 +
@@ -817,7 +817,7 @@ (define_insn "*aarch64_sve_\t%0., %1., %2."
 )
 
-(define_insn "*aarch64_sve_"
+(define_insn "aarch64_sve_"
   [(set (match_oper

Re: PING^3 [PATCH] i386: Avoid PLT when shadow stack is enabled directly

2018-01-26 Thread H.J. Lu
On Sun, Jan 7, 2018 at 7:13 PM, H.J. Lu  wrote:
> On Fri, Dec 8, 2017 at 5:02 AM, H.J. Lu  wrote:
>> On Tue, Oct 24, 2017 at 5:31 AM, H.J. Lu  wrote:
>>> PLT should be avoided with shadow stack in 32-bit mode if more than 2
>>> parameters are passed in registers since only 2 parameters can be passed
>>> in registers for external function calls via PLT with shadow stack
>>> enabled.
>>>
>>> OK for trunk if there is no regressions?
>>
>> Here is the updated patch.
>>
>> PING.
>
> PING:
>
> https://gcc.gnu.org/ml/gcc-patches/2017-12/msg00485.html
>

PING


-- 
H.J.


PING^2: [PATCH] i386: Insert ENDBR before the profiling counter call

2018-01-26 Thread H.J. Lu
On Sun, Jan 7, 2018 at 7:11 PM, H.J. Lu  wrote:
> On Tue, Oct 24, 2017 at 10:58 AM, H.J. Lu  wrote:
>> On Tue, Oct 24, 2017 at 10:40 AM, Andi Kleen  wrote:
>>> "H.J. Lu"  writes:
 --- /dev/null
 +++ b/gcc/testsuite/gcc.target/i386/pr82699-4.c
 @@ -0,0 +1,11 @@
 +/* { dg-do compile { target { *-*-linux* && { ! ia32 } } } } */
 +/* { dg-options "-O2 -fpic -fcf-protection -mcet -pg -mfentry 
 -fasynchronous-unwind-tables" } */
 +/* { dg-final { scan-assembler-times {\t\.cfi_startproc\n\tendbr} 1 }
 } */
>>>
>>> Would add test cases for -mnop-mcount and -mrecord-mcount too
>>>
>>
>> Here is the updated patch to add a testcase for -mnop-mcount -mrecord-mcount.
>> No other changes otherwise.
>>
>
> PING:
>
> https://gcc.gnu.org/ml/gcc-patches/2017-10/msg01741.html
>
>

PING.


-- 
H.J.


[AArch64] Fix SVE testsuite failures for ILP32 (PR 83846)

2018-01-26 Thread Richard Sandiford
The SVE tests are split into code-quality compile tests and runtime
tests.  A lot of the former are geared towards LP64.  It would be
possible (but tedious!) to mark up every line that is expected to work
only for LP64, but I think it would be a constant source of problems.

Since the code has not been tuned for ILP32 yet, I think the best
thing is to select only the runtime tests for that combination.
They all pass on aarch64-elf and aarch64_be-elf except vec-cond-[34].c,
which are unsupported due to the lack of fenv support.

The patch also replaces uses of built-in types with stdint.h types
where possible.  (This excludes tests that change the endianness,
since we can't assume that system header files work in that case.)

Tested on aarch64_be-elf and aarch64-linux-gnu.  OK to install?

Richard


2018-01-17  Richard Sandiford  

gcc/testsuite/
PR testsuite/83846
* gcc.target/aarch64/sve/aarch64-sve.exp: Only do *_run tests
for ILP32.
* gcc.target/aarch64/sve/clastb_2_run.c (main): Use TYPE instead
of hard-coding the choice.
* gcc.target/aarch64/sve/clastb_4_run.c (main): Likewise.
* gcc.target/aarch64/sve/clastb_5_run.c (main): Likewise.
* gcc.target/aarch64/sve/clastb_3_run.c (main): Likewise.  Generalize
memset call.
* gcc.target/aarch64/sve/const_pred_1.C: Include stdint.h and use
stdint.h types.
* gcc.target/aarch64/sve/const_pred_2.C: Likewise.
* gcc.target/aarch64/sve/const_pred_3.C: Likewise.
* gcc.target/aarch64/sve/const_pred_4.C: Likewise.
* gcc.target/aarch64/sve/load_const_offset_2.c: Likewise.
* gcc.target/aarch64/sve/logical_1.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_1.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_2.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_3.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_4.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_5.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_6.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_7.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_load_8.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_store_1.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_store_2.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_store_3.c: Likewise.
* gcc.target/aarch64/sve/mask_struct_store_4.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_1.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_2.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_2_run.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_3.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_3_run.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_4.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_4_run.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_7.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_8.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_8_run.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_9.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_9_run.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_10.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_10_run.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_11.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_11_run.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_12.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_12_run.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_13.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_13_run.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_14.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_18.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_19.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_20.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_21.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_22.c: Likewise.
* gcc.target/aarch64/sve/struct_vect_23.c: Likewise.
* gcc.target/aarch64/sve/popcount_1.c (popcount_64): Use
__builtin_popcountll rather than __builtin_popcountl.

Index: gcc/testsuite/gcc.target/aarch64/sve/aarch64-sve.exp
===
--- gcc/testsuite/gcc.target/aarch64/sve/aarch64-sve.exp2018-01-13 
17:54:52.925270132 +
+++ gcc/testsuite/gcc.target/aarch64/sve/aarch64-sve.exp2018-01-26 
14:09:27.089932690 +
@@ -44,8 +44,16 @@ if { [check_effective_target_aarch64_sve
 set sve_flags "-march=armv8.2-a+sve"
 }
 
+# Most of the code-quality tests are written for LP64.  Just do the
+# correctness tests for ILP32.
+if { [check_effective_target_ilp32] } {
+set pattern "*_run"
+} else {
+set pattern "*"
+}
+
 # Main loop.
-dg-runtest [lsort [glob -nocomplain $srcdir/$subdir

Re: [PATCH][arm] XFAIL advsimd-intrinsics/vld1x2.c

2018-01-26 Thread Kyrill Tkachov

Hi all,

I'm committing this to trunk after a discussion with James.
There's really not that much aarch64-specific change, it can be considered 
obvious from that perspective.

Thanks,
Kyrill

On 19/01/18 10:58, Kyrill Tkachov wrote:

Ping.

https://gcc.gnu.org/ml/gcc-patches/2018-01/msg00913.html

Thanks,
Kyrill

On 11/01/18 15:33, Kyrill Tkachov wrote:

Hi all,

This recently added test fails on arm. We haven't implemented these intrinsics 
for arm
(any volunteers?) so for now let's XFAIL these on that target.
Also, the float64 versions of these intrinsics are not supposed to be available 
on arm
so this patch slightly adjusts the test to not include them for aarch32.
In any case the entire test is XFAILed on arm, so this doesn't have any 
noticeable
effect.

The same number of tests (PASS) still occur on aarch64 but now they appear as 
XFAIL
rather than FAIL on arm.

Ok for trunk? (from an aarch64 perspective).

Thanks,
Kyrill

2018-01-11  Kyrylo Tkachov  

 * gcc.target/aarch64/advsimd-intrinsics/vld1x2.c: Make float64
 tests specific to aarch64.  XFAIL test on arm.






Ping^2: Re: [PATCH] __VA_OPT__ fixes (PR preprocessor/83063, PR preprocessor/83708)

2018-01-26 Thread Jakub Jelinek
Ping^2
http://gcc.gnu.org/ml/gcc-patches/2018-01/msg00727.html

On Wed, Jan 17, 2018 at 05:47:12PM +0100, Jakub Jelinek wrote:
> I'd like to ping this patch.
> As I wrote, it isn't a full solution for all the __VA_OPT__ issues,
> and it isn't even clear to me how exactly it should behave, but it fixes
> some ICEs and a couple of the most important issues and shouldn't make things
> worse, at least on the gcc and clang __VA_OPT__ testcases.

Jakub


Re: [SFN+LVU+IEPM v4 7/9] [LVU] Introduce location views

2018-01-26 Thread Jakub Jelinek
On Thu, Jan 25, 2018 at 05:58:37PM -0200, Alexandre Oliva wrote:
> > This looks wrong.  The proposal has not been accepted yet, so you
> > really can't know if DW_LLE_view_pair will be like that or whether it
> > will have value of 9.  Unfortunately, the DWARF standard doesn't specify a
> > vendor range for DW_LLE_* values.  I'd use 0xf0 or so, and don't pretend
> > there is DW_LLE_view_pair at all, just use DW_LLE_GNU_view_pair everywhere.
> > Jason, what do you think?
> 
> My reasoning was that, since we'd only emit this as an
> explicitly-requested backward-incompatible extension to DWARF-5 (by
> specifying -gdwarf-6 in this patch, but the option spelling could be
> changed), any LLE number whatsoever would do.  The point of the #define
> was to write the code in GCC so that, once DW_LLE_view_pair was
> standardized (if it ever was), we'd just set up an enum for it and we'd
> be done, but that would only happen at DWARF6+ anyway, so we'd be able
> to tell, since we'd then have actual DWARF6, rather than DWARF5 with an
> incompatible extension (which is what we emit with the current
> -gdwarf-6 option; see below).

Having to tweak debug info consumers so that they treat DW_LLE_* of 9
one way for .debug_loclist of version 5 and another way for .debug_loclist
of version 6 isn't a good idea.
Why don't you emit the DW_LLE_GNU_view_pair stuff in .debug_loclists
already with -gdwarf-5, if needed paired with some TU attribute that says
that views are emitted?

> >> +static inline bool
> >> +dwarf2out_locviews_in_attribute ()
> >> +{
> >> +  return debug_variable_location_views
> >> +&& dwarf_version <= 5;
> 
> > Formatting, should be
> >   return debug_variable_location_views && dwarf_version <= 5;
> > or
> >   return (debug_variable_location_views
> >   && dwarf_version <= 5);
> > if it wouldn't fit (but it does).
> 
> Hmm...  Where does that mandate come from, if you don't mind my asking?
> I have just revised both GNU Coding Standards, and GCC-specific
> conventions, to make sure I hadn't missed a change or some long-ago
> established convention, and I couldn't find anything that supports that
> "should".

Haven't looked for it in the coding standards, but it is something
I've been asked to do in my code by various reviewers over the years and
what I've been asking others in my patch reviews from others too.

I'm personally not using emacs and so the extra ()s isn't something I'd
want myself, it is just something others asked for multiple times.
I feel strongly about indenting stuff right, which if it wouldn't fit would
be
  return verylongcondition
 && otherlongcondition__;
rather than
  return verylongcondition
  && otherlongcondition__;
or similar, it would surprise me if it is not in the coding standard.

> Personally, I don't see that line breaks before operators are only
> permitted when the operand that follows wouldn't fit; the standards are

Yes, you can break it, but why?  It doesn't make the code more readable,
but less so in this case.

> No, that means something else, namely emit location view lists in a
> separate attribute, matching corresponding ranges.
> 
> Maybe we shouldn't even have an option to emit that at this point, or
> make it a very different spelling.  But I'd argue there's no point in
> discarding the code that implements the format that was proposed for
> standardization; at the very least it serves as a reference.  That's
> also an argument for retaining the ability to emit it somehow, even if
> with an option that we document as to-be-dropped if/when there's a
> standard representation.

So, what is the reason not to emit the format you are proposing already with
-gdwarf-5, perhaps with some extra attribute on the TU (of course not when
-gstrict-dwarf)?
Debuggers are still barely catching up with DWARF5, so even if they would be
upset by the DW_LLE_GNU_view_pair stuff, they can be still tweaked to ignore
those (skip over them) if they don't want to deal with the views.

Jakub


Re: [PATCH] Fix PR81082

2018-01-26 Thread Christophe Lyon
On 26 January 2018 at 11:25, Richard Biener  wrote:
> On Thu, 25 Jan 2018, Richard Biener wrote:
>
>> On Thu, 25 Jan 2018, Marc Glisse wrote:
>>
>> > On Thu, 25 Jan 2018, Richard Biener wrote:
>> >
>> > > --- gcc/match.pd  (revision 257047)
>> > > +++ gcc/match.pd  (working copy)
>> > > @@ -1939,6 +1939,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>> > >  (minus (convert (view_convert:stype @1))
>> > >   (convert (view_convert:stype @2)))
>> > >
>> > > +/* (A * C) +- (B * C) -> (A+-B) * C and (A * C) +- A -> A * (C+-1).
>> > > +Modeled after fold_plusminus_mult_expr.  */
>> > > +(if (!TYPE_SATURATING (type)
>> > > + && (!FLOAT_TYPE_P (type) || flag_associative_math))
>> > > + (for plusminus (plus minus)
>> > > +  (simplify
>> > > +   (plusminus (mult:s @0 @1) (mult:cs @0 @2))
>> >
>> > No :c on the first mult, so we don't actually handle A*C+B*C?
>>
>> Hmm, I somehow convinced myself that it's only necessary on one
>> of the mults...  but you are of course right.  Will re-test with
>> that fixed.
>
> This is what I have applied.  Note I had to XFAIL a minor part of
> gcc.dg/tree-ssa/loop-15.c, namely simplifying the final value
> replacement down to n * n.  While the patch should enable this
> the place where the transform could trigger with enough information
> about n is VRP but that doesn't end up folding all stmts - and
> that's something I'm not willing to change right now.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied on trunk.
>

Hi,


I think this patch caused regressions on aarch64:
FAIL: gcc.dg/wmul-1.c scan-tree-dump-not widening_mul "WIDEN_MULT_PLUS_EXPR"

Thanks,

Christophe

> Richard.
>
> 2018-01-26  Richard Biener  
>
> PR tree-optimization/81082
> * fold-const.c (fold_plusminus_mult_expr): Do not perform the
> association if it requires casting to unsigned.
> * match.pd ((A * C) +- (B * C) -> (A+-B)): New patterns derived
> from fold_plusminus_mult_expr to catch important cases late when
> range info is available.
>
> * gcc.dg/vect/pr81082.c: New testcase.
> * gcc.dg/tree-ssa/loop-15.c: XFAIL the (int)((unsigned)n + -1U) * n + 
> n
> simplification to n * n.
>
> Index: gcc/fold-const.c
> ===
> --- gcc/fold-const.c(revision 257047)
> +++ gcc/fold-const.c(working copy)
> @@ -7097,7 +7097,7 @@ fold_plusminus_mult_expr (location_t loc
>
>/* Same may be zero and thus the operation 'code' may overflow.  Likewise
>   same may be minus one and thus the multiplication may overflow.  Perform
> - the operations in an unsigned type.  */
> + the sum operation in an unsigned type.  */
>tree utype = unsigned_type_for (type);
>tree tem = fold_build2_loc (loc, code, utype,
>   fold_convert_loc (loc, utype, alt0),
> @@ -7110,9 +7110,9 @@ fold_plusminus_mult_expr (location_t loc
>  return fold_build2_loc (loc, MULT_EXPR, type,
> fold_convert (type, tem), same);
>
> -  return fold_convert_loc (loc, type,
> -  fold_build2_loc (loc, MULT_EXPR, utype, tem,
> -   fold_convert_loc (loc, utype, same)));
> +  /* Do not resort to unsigned multiplication because
> + we lose the no-overflow property of the expression.  */
> +  return NULL_TREE;
>  }
>
>  /* Subroutine of native_encode_expr.  Encode the INTEGER_CST
> Index: gcc/match.pd
> ===
> --- gcc/match.pd(revision 257047)
> +++ gcc/match.pd(working copy)
> @@ -1939,6 +1939,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (minus (convert (view_convert:stype @1))
> (convert (view_convert:stype @2)))
>
> +/* (A * C) +- (B * C) -> (A+-B) * C and (A * C) +- A -> A * (C+-1).
> +Modeled after fold_plusminus_mult_expr.  */
> +(if (!TYPE_SATURATING (type)
> + && (!FLOAT_TYPE_P (type) || flag_associative_math))
> + (for plusminus (plus minus)
> +  (simplify
> +   (plusminus (mult:cs @0 @1) (mult:cs @0 @2))
> +   (if (!ANY_INTEGRAL_TYPE_P (type)
> +|| TYPE_OVERFLOW_WRAPS (type)
> +|| (INTEGRAL_TYPE_P (type)
> +   && tree_expr_nonzero_p (@0)
> +   && expr_not_equal_to (@0, wi::minus_one (TYPE_PRECISION (type)
> +(mult (plusminus @1 @2) @0)))
> +  /* We cannot generate constant 1 for fract.  */
> +  (if (!ALL_FRACT_MODE_P (TYPE_MODE (type)))
> +   (simplify
> +(plusminus @0 (mult:cs @0 @2))
> +(if (!ANY_INTEGRAL_TYPE_P (type)
> +|| TYPE_OVERFLOW_WRAPS (type)
> +|| (INTEGRAL_TYPE_P (type)
> +&& tree_expr_nonzero_p (@0)
> +&& expr_not_equal_to (@0, wi::minus_one (TYPE_PRECISION 
> (type)
> + (mult (plusminus { build_one_cst (type); } @2) @0)))
> +   (simplify
> +(plusminus (mult:cs @0 @2) @0)
> +(if (!ANY_INTEGRAL_TYPE_P (type)
> +|| T
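
The integral-type guards in the hunk above (tree_expr_nonzero_p and the
expr_not_equal_to minus-one check on the common factor) are what keep the
rewrite safe for signed types.  A self-contained worked example, not from
the patch, of why the factor must be nonzero and must not be -1:

#include <limits.h>

/* With the common factor C == -1, rewriting A*C + B*C as (A + B) * C
   can introduce signed overflow the original never had; with C == 0
   the original never overflows while A + B might.  */
int
rewrite_hazard (void)
{
  int A = INT_MAX, B = 1, C = -1;
  int ok = A * C + B * C;       /* -INT_MAX + -1 == INT_MIN: defined.  */
  /* int bad = (A + B) * C;        A + B overflows: undefined.  */
  return ok;
}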

Re: [SLP/AArch64] Fix unpack handling for big-endian SVE

2018-01-26 Thread Richard Biener
On Fri, Jan 26, 2018 at 3:21 PM, Richard Sandiford
 wrote:
> I hadn't realised that on big-endian targets, VEC_UNPACK*HI_EXPR unpacks
> the low-numbered lanes and VEC_UNPACK*LO_EXPR unpacks the high-numbered
> lanes.  This meant that both the SVE patterns and the handling of
> fully-masked loops were wrong.
>
> The patch deals with that by making sure that all vec_unpack* optabs
> are define_expands, using BYTES_BIG_ENDIAN to choose the appropriate
> define_insn.  This in turn meant that we can get rid of the duplication
> between the signed and unsigned patterns for predicates.  (We provide
> implementations of both the signed and unsigned optabs because the sign
> doesn't matter for predicates: every element contains only one
> significant bit.)
>
> Also, the float unpacks need to unpack one half of the input vector,
> but the unpacked upper bits are "don't care".  There are two obvious
> ways of handling that: use an unpack (filling with zeros) or use a ZIP
> (filling with a duplicate of the low bits).  The code previously used
> unpacks, but the sequence involved a subreg that is semantically an
> element reverse on big-endian targets.  Using the ZIP patterns avoids
> that, and at the moment there's no reason to prefer one over the other
> for performance reasons, so the patch switches to ZIP unconditionally.
> As the comment says, it would be easy to optimise this later if UUNPK
> turns out to be better for some implementations.
>
> Tested on aarch64_be-elf, aarch64-linux-gnu and x86_64-linux-gnu.
> I think the tree-vect-loop-manip.c part is obvious, but OK for the
> AArch64 bits?
>
> Richard
>
>
> 2018-01-26  Richard Sandiford  
>
> gcc/
> * tree-vect-loop-manip.c (vect_maybe_permute_loop_masks):
> Reverse the choice between VEC_UNPACK_LO_EXPR and VEC_UNPACK_HI_EXPR
> for big-endian.
> * config/aarch64/iterators.md (hi_lanes_optab): New int attribute.
> * config/aarch64/aarch64-sve.md
> (*aarch64_sve_): Rename to...
> (aarch64_sve_): ...this.
> (*extend2): Rename to...
> (aarch64_sve_extend2): ...this.
> (vec_unpack__): Turn into a define_expand,
> renaming the old pattern to...
> (aarch64_sve_punpk_): ...this.  Only define
> unsigned packs.
> (vec_unpack__): Turn into a
> define_expand, renaming the old pattern to...
> (aarch64_sve_unpk_): ...this.
> (*vec_unpacku___no_convert): Delete.
> (vec_unpacks__): Take BYTES_BIG_ENDIAN into
> account when deciding which SVE instruction the optab should use.
> (vec_unpack_float__vnx4si): Likewise.
>
> gcc/testsuite/
> * gcc.target/aarch64/sve/unpack_fcvt_signed_1.c: Expect zips rather
> than unpacks.
> * gcc.target/aarch64/sve/unpack_fcvt_unsigned_1.c: Likewise.
> * gcc.target/aarch64/sve/unpack_float_1.c: Likewise.
>
> Index: gcc/tree-vect-loop-manip.c
> ===
> --- gcc/tree-vect-loop-manip.c  2018-01-13 18:02:00.948360274 +
> +++ gcc/tree-vect-loop-manip.c  2018-01-26 14:00:15.717916957 +
> @@ -334,7 +334,8 @@ vect_maybe_permute_loop_masks (gimple_se
> {
>   tree src = src_rgm->masks[i / 2];
>   tree dest = dest_rgm->masks[i];
> - tree_code code = (i & 1 ? VEC_UNPACK_HI_EXPR
> + tree_code code = ((i & 1) == (BYTES_BIG_ENDIAN ? 0 : 1)
> +   ? VEC_UNPACK_HI_EXPR
> : VEC_UNPACK_LO_EXPR);

This looks broken anyway -- the element order on GIMPLE is the same
for little and big-endian.  I see two other BYTES_BIG_ENDIAN uses in
tree-vect-*, not sure when they crept in, but they are all broken.

Richard.

>   gassign *stmt;
>   if (dest_masktype == unpack_masktype)
> Index: gcc/config/aarch64/iterators.md
> ===
> --- gcc/config/aarch64/iterators.md 2018-01-13 18:01:26.106685901 +
> +++ gcc/config/aarch64/iterators.md 2018-01-26 14:00:15.716916999 +
> @@ -1674,6 +1674,15 @@ (define_int_attr perm_hilo [(UNSPEC_ZIP1
> (UNSPEC_UNPACKSHI "hi") (UNSPEC_UNPACKUHI "hi")
> (UNSPEC_UNPACKSLO "lo") (UNSPEC_UNPACKULO "lo")])
>
> +;; Return true if the associated optab refers to the high-numbered lanes,
> +;; false if it refers to the low-numbered lanes.  The convention is for
> +;; "hi" to refer to the low-numbered lanes (the first ones in memory)
> +;; for big-endian.
> +(define_int_attr hi_lanes_optab [(UNSPEC_UNPACKSHI "!BYTES_BIG_ENDIAN")
> +(UNSPEC_UNPACKUHI "!BYTES_BIG_ENDIAN")
> +(UNSPEC_UNPACKSLO "BYTES_BIG_ENDIAN")
> +(UNSPEC_UNPACKULO "BYTES_BIG_ENDIAN")])
> +
>  (define_int_attr frecp_suffix  [(UNSPEC_FRECPE "e") (UNSPEC_FRECPX "x")])
>
>  (define_int_attr crc_variant [(UNSPEC_

Re: [SLP/AArch64] Fix unpack handling for big-endian SVE

2018-01-26 Thread Richard Sandiford
Richard Biener  writes:
> On Fri, Jan 26, 2018 at 3:21 PM, Richard Sandiford
>  wrote:
>> I hadn't realised that on big-endian targets, VEC_UNPACK*HI_EXPR unpacks
>> the low-numbered lanes and VEC_UNPACK*LO_EXPR unpacks the high-numbered
>> lanes.  This meant that both the SVE patterns and the handling of
>> fully-masked loops were wrong.
>>
>> The patch deals with that by making sure that all vec_unpack* optabs
>> are define_expands, using BYTES_BIG_ENDIAN to choose the appropriate
>> define_insn.  This in turn meant that we can get rid of the duplication
>> between the signed and unsigned patterns for predicates.  (We provide
>> implementations of both the signed and unsigned optabs because the sign
>> doesn't matter for predicates: every element contains only one
>> significant bit.)
>>
>> Also, the float unpacks need to unpack one half of the input vector,
>> but the unpacked upper bits are "don't care".  There are two obvious
>> ways of handling that: use an unpack (filling with zeros) or use a ZIP
>> (filling with a duplicate of the low bits).  The code previously used
>> unpacks, but the sequence involved a subreg that is semantically an
>> element reverse on big-endian targets.  Using the ZIP patterns avoids
>> that, and at the moment there's no reason to prefer one over the other
>> for performance reasons, so the patch switches to ZIP unconditionally.
>> As the comment says, it would be easy to optimise this later if UUNPK
>> turns out to be better for some implementations.
>>
>> Tested on aarch64_be-elf, aarch64-linux-gnu and x86_64-linux-gnu.
>> I think the tree-vect-loop-manip.c part is obvious, but OK for the
>> AArch64 bits?
>>
>> Richard
>>
>>
>> 2018-01-26  Richard Sandiford  
>>
>> gcc/
>> * tree-vect-loop-manip.c (vect_maybe_permute_loop_masks):
>> Reverse the choice between VEC_UNPACK_LO_EXPR and VEC_UNPACK_HI_EXPR
>> for big-endian.
>> * config/aarch64/iterators.md (hi_lanes_optab): New int attribute.
>> * config/aarch64/aarch64-sve.md
>> (*aarch64_sve_): Rename to...
>> (aarch64_sve_): ...this.
>> (*extend2): Rename to...
>> (aarch64_sve_extend2): ...this.
>> (vec_unpack__): Turn into a define_expand,
>> renaming the old pattern to...
>> (aarch64_sve_punpk_): ...this.  Only define
>> unsigned packs.
>> (vec_unpack__): Turn into a
>> define_expand, renaming the old pattern to...
>> (aarch64_sve_unpk_): ...this.
>> (*vec_unpacku___no_convert): Delete.
>> (vec_unpacks__): Take BYTES_BIG_ENDIAN into
>> account when deciding which SVE instruction the optab should use.
>> (vec_unpack_float__vnx4si): Likewise.
>>
>> gcc/testsuite/
>> * gcc.target/aarch64/sve/unpack_fcvt_signed_1.c: Expect zips rather
>> than unpacks.
>> * gcc.target/aarch64/sve/unpack_fcvt_unsigned_1.c: Likewise.
>> * gcc.target/aarch64/sve/unpack_float_1.c: Likewise.
>>
>> Index: gcc/tree-vect-loop-manip.c
>> ===
>> --- gcc/tree-vect-loop-manip.c  2018-01-13 18:02:00.948360274 +
>> +++ gcc/tree-vect-loop-manip.c  2018-01-26 14:00:15.717916957 +
>> @@ -334,7 +334,8 @@ vect_maybe_permute_loop_masks (gimple_se
>> {
>>   tree src = src_rgm->masks[i / 2];
>>   tree dest = dest_rgm->masks[i];
>> - tree_code code = (i & 1 ? VEC_UNPACK_HI_EXPR
>> + tree_code code = ((i & 1) == (BYTES_BIG_ENDIAN ? 0 : 1)
>> +   ? VEC_UNPACK_HI_EXPR
>> : VEC_UNPACK_LO_EXPR);
>
> This looks broken anyway -- the element order on GIMPLE is the same
> for little and big-endian.  I see two other BYTES_BIG_ENDIAN uses in
> tree-vect-*, not sure when they crept in, but they are all broken.

Are you sure?  Everything apart from this new code seems consistently
to treat UNPACK_HI as "high part of the register" rather than
"high memory address/lane number".  I think changing it now would
break powerpc64 and mips.

Thanks,
Richard


Re: [PATCH] Fix PR81082

2018-01-26 Thread Richard Biener
On Fri, Jan 26, 2018 at 3:57 PM, Christophe Lyon
 wrote:
> On 26 January 2018 at 11:25, Richard Biener  wrote:
>> On Thu, 25 Jan 2018, Richard Biener wrote:
>>
>>> On Thu, 25 Jan 2018, Marc Glisse wrote:
>>>
>>> > On Thu, 25 Jan 2018, Richard Biener wrote:
>>> >
>>> > > --- gcc/match.pd  (revision 257047)
>>> > > +++ gcc/match.pd  (working copy)
>>> > > @@ -1939,6 +1939,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>>> > >  (minus (convert (view_convert:stype @1))
>>> > >   (convert (view_convert:stype @2)))
>>> > >
>>> > > +/* (A * C) +- (B * C) -> (A+-B) * C and (A * C) +- A -> A * (C+-1).
>>> > > +Modeled after fold_plusminus_mult_expr.  */
>>> > > +(if (!TYPE_SATURATING (type)
>>> > > + && (!FLOAT_TYPE_P (type) || flag_associative_math))
>>> > > + (for plusminus (plus minus)
>>> > > +  (simplify
>>> > > +   (plusminus (mult:s @0 @1) (mult:cs @0 @2))
>>> >
>>> > No :c on the first mult, so we don't actually handle A*C+B*C?
>>>
>>> Hmm, I somehow convinced myself that it's only necessary on one
>>> of the mults...  but you are of course right.  Will re-test with
>>> that fixed.
>>
>> This is what I have applied.  Note I had to XFAIL a minor part of
>> gcc.dg/tree-ssa/loop-15.c, namely simplifying the final value
>> replacement down to n * n.  While the patch should enable this
>> the place where the transform could trigger with enough information
>> about n is VRP but that doesn't end up folding all stmts - and
>> that's something I'm not willing to change right now.
>>
>> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied on trunk.
>>
>
> Hi,
>
>
> I think this patch caused regressions on aarch64:
> FAIL: gcc.dg/wmul-1.c scan-tree-dump-not widening_mul "WIDEN_MULT_PLUS_EXPR"

Late forwprop does

@@ -75,7 +21,7 @@
   _1 = (long unsigned int) Idx_6(D);
   _2 = _1 * 40;
   _12 = _1 * 4;
-  _17 = _2 + _12;
+  _17 = _1 * 44;
   _13 = Arr_7(D) + _17;
   MEM[(int[10] *)_13] = 1;
   _4 = _2 + 400;
  _18 = _4 + _12;
  _16 = Arr_7(D) + _18;
  MEM[(int[10] *)_16] = 2;


which I'm not sure ends up profitable at the end.  It makes _12
dead so the total number of multiplications stays the same.
For this we then apply the WIDEN_MULT_PLUS_EXPR two
times given _2 is no longer used multiple times.

The reason the above transform happens is the match.pd :s
behavior which "ignores" :s in case the resulting expression
is "simple".

Can you open a bugreport?

Richard.

> Thanks,
>
> Christophe
>
>> Richard.
>>
>> 2018-01-26  Richard Biener  
>>
>> PR tree-optimization/81082
>> * fold-const.c (fold_plusminus_mult_expr): Do not perform the
>> association if it requires casting to unsigned.
>> * match.pd ((A * C) +- (B * C) -> (A+-B)): New patterns derived
>> from fold_plusminus_mult_expr to catch important cases late when
>> range info is available.
>>
>> * gcc.dg/vect/pr81082.c: New testcase.
>> * gcc.dg/tree-ssa/loop-15.c: XFAIL the (int)((unsigned)n + -1U) * n + n
>> simplification to n * n.
>>
>> Index: gcc/fold-const.c
>> ===
>> --- gcc/fold-const.c(revision 257047)
>> +++ gcc/fold-const.c(working copy)
>> @@ -7097,7 +7097,7 @@ fold_plusminus_mult_expr (location_t loc
>>
>>/* Same may be zero and thus the operation 'code' may overflow.  Likewise
>>   same may be minus one and thus the multiplication may overflow.  
>> Perform
>> - the operations in an unsigned type.  */
>> + the sum operation in an unsigned type.  */
>>tree utype = unsigned_type_for (type);
>>tree tem = fold_build2_loc (loc, code, utype,
>>   fold_convert_loc (loc, utype, alt0),
>> @@ -7110,9 +7110,9 @@ fold_plusminus_mult_expr (location_t loc
>>  return fold_build2_loc (loc, MULT_EXPR, type,
>> fold_convert (type, tem), same);
>>
>> -  return fold_convert_loc (loc, type,
>> -  fold_build2_loc (loc, MULT_EXPR, utype, tem,
>> -   fold_convert_loc (loc, utype, same)));
>> +  /* Do not resort to unsigned multiplication because
>> + we lose the no-overflow property of the expression.  */
>> +  return NULL_TREE;
>>  }
>>
>>  /* Subroutine of native_encode_expr.  Encode the INTEGER_CST
>> Index: gcc/match.pd
>> ===
>> --- gcc/match.pd(revision 257047)
>> +++ gcc/match.pd(working copy)
>> @@ -1939,6 +1939,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>>   (minus (convert (view_convert:stype @1))
>> (convert (view_convert:stype @2)))
>>
>> +/* (A * C) +- (B * C) -> (A+-B) * C and (A * C) +- A -> A * (C+-1).
>> +Modeled after fold_plusminus_mult_expr.  */
>> +(if (!TYPE_SATURATING (type)
>> + && (!FLOAT_TYPE_P (type) || flag_associative_math))
>> + (for plusminus (plus minus)
>> +  (simplify
>> +   (plusminus (mult:cs @0 @1) (mult:cs @0

Re: [AArch64] Fix sve/extract_[12].c for big-endian SVE

2018-01-26 Thread Richard Sandiford
Kyrill  Tkachov  writes:
> On 26/01/18 13:31, Richard Sandiford wrote:
>> sve/extract_[12].c were relying on the target-independent optimisation
>> that removes a redundant vec_select, so that we don't end up with
>> things like:
>>
>> dup v0.4s, v0.4s[0]
>> ...use s0...
>>
>> But that optimisation rightly doesn't trigger for big-endian targets,
>> because GCC expects lane 0 to be in the high part of the register
>> rather than the low part.
>>
>> SVE breaks this assumption -- see the comment at the head of
>> aarch64-sve.md for details -- so the optimisation is valid for
>> both endiannesses.  Long term, we probably need some kind of target
>> hook to make GCC aware of this.
>>
>> But there's another problem with the current extract pattern: it doesn't
>> tell the register allocator how cheap an extraction of lane 0 is with
>> tied registers.  It seems better to split the lane 0 case out into
>> its own pattern and use tied operands for the FPR<-SIMD case,
>> so that using different registers has the cost of an extra reload.
>> I think we want this for both endiannesses, regardless of the hook
>> described above.
>>
>> Also, the gen_lowpart in this pattern fails for aarch64_be due to
>> TARGET_CAN_CHANGE_MODE_CLASS restrictions, so the patch uses gen_rtx_REG
>> instead.  We're only creating this rtl in order to print it, so there's
>> no need for anything fancier.
>>
>> Tested on aarch64_be-elf and aarch64-linux-gnu.  OK to install?
>>
>> Richard
>>
>>
>> 2018-01-26  Richard Sandiford 
>>
>> gcc/
>> * config/aarch64/aarch64-sve.md (*vec_extract_0): New
>> pattern.
>> (*vec_extract_v128): Require a nonzero lane number.
>> Use gen_rtx_REG rather than gen_lowpart.
>>
>> Index: gcc/config/aarch64/aarch64-sve.md
>> ===
>> --- gcc/config/aarch64/aarch64-sve.md   2018-01-13 18:01:51.232735405 +
>> +++ gcc/config/aarch64/aarch64-sve.md   2018-01-26 13:26:50.176756711 +
>> @@ -484,18 +484,52 @@ (define_expand "vec_extract"
>>}
>>  )
>>
>> +;; Extract element zero.  This is a special case because we want to force
>> +;; the registers to be the same for the second alternative, and then
>> +;; split the instruction into nothing after RA.
>> +(define_insn_and_split "*vec_extract_0"
>> +  [(set (match_operand: 0 "aarch64_simd_nonimmediate_operand" "=r, w, Utv")
>> +   (vec_select:
>> + (match_operand:SVE_ALL 1 "register_operand" "w, 0, w")
>> + (parallel [(const_int 0)])))]
>> +  "TARGET_SVE"
>> +  {
>> +operands[1] = gen_rtx_REG (mode, REGNO (operands[1]));
>> +switch (which_alternative)
>> +  {
>> +   case 0:
>> + return "umov\\t%0, %1.[0]";
>> +   case 1:
>> + return "#";
>> +   case 2:
>> + return "st1\\t{%1.}[0], %0";
>> +   default:
>> + gcc_unreachable ();
>> +  }
>> +  }
>> +  "&& reload_completed
>> +   && REG_P (operands[1])
>> +   && REGNO (operands[0]) == REGNO (operands[1])"
>
> Is it guaranteed that operands[0] will be a REG rtx here?
> The aarch64_simd_nonimmediate_operand predicate and Utv constraint might allow
> a MEM, which would cause an error in an RTL-checking build.
> If I'm not mistaken, GCC may attempt the split even when matching alternative 2.

Bah, good catch.  The REG_P was checking the wrong operand.
Fixed version below.

Richard


2018-01-26  Richard Sandiford  

gcc/
* config/aarch64/aarch64-sve.md (*vec_extract_0): New
pattern.
(*vec_extract_v128): Require a nonzero lane number.

Index: gcc/config/aarch64/aarch64-sve.md
===
--- gcc/config/aarch64/aarch64-sve.md   2018-01-26 15:14:36.016144657 +
+++ gcc/config/aarch64/aarch64-sve.md   2018-01-26 15:14:36.171138212 +
@@ -484,18 +484,52 @@ (define_expand "vec_extract"
   }
 )
 
+;; Extract element zero.  This is a special case because we want to force
+;; the registers to be the same for the second alternative, and then
+;; split the instruction into nothing after RA.
+(define_insn_and_split "*vec_extract_0"
+  [(set (match_operand: 0 "aarch64_simd_nonimmediate_operand" "=r, w, Utv")
+   (vec_select:
+ (match_operand:SVE_ALL 1 "register_operand" "w, 0, w")
+ (parallel [(const_int 0)])))]
+  "TARGET_SVE"
+  {
+operands[1] = gen_rtx_REG (mode, REGNO (operands[1]));
+switch (which_alternative)
+  {
+   case 0:
+ return "umov\\t%0, %1.[0]";
+   case 1:
+ return "#";
+   case 2:
+ return "st1\\t{%1.}[0], %0";
+   default:
+ gcc_unreachable ();
+  }
+  }
+  "&& reload_completed
+   && REG_P (operands[0])
+   && REGNO (operands[0]) == REGNO (operands[1])"
+  [(const_int 0)]
+  {
+emit_note (NOTE_INSN_DELETED);
+DONE;
+  }
+  [(set_attr "type" "neon_to_gp_q, untyped, neon_store1_one_lane_q")]
+)
+
 ;; Extract an element from the Advanced SIMD portion of the regist
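
For context, a self-contained C model (stand-in types, not GCC's rtl.h) of
the post-RA decision above, showing why the condition must test the
destination operand:

/* Sketch only.  With the Utv alternative the destination can be a
   memory operand, so only after confirming it is a register may its
   number be compared; REG_P (operands[0]) is the fixed check.  */
struct operand_model { int is_reg; unsigned int regno; };

static int
extract0_splits_to_nothing (struct operand_model dst,
                            struct operand_model src,
                            int reload_completed)
{
  return (reload_completed
          && dst.is_reg                 /* REG_P (operands[0])  */
          && dst.regno == src.regno);   /* tied: extract is a no-op  */
}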

Re: [SLP/AArch64] Fix unpack handling for big-endian SVE

2018-01-26 Thread Richard Biener
On Fri, Jan 26, 2018 at 4:04 PM, Richard Sandiford
 wrote:
> Richard Biener  writes:
>> On Fri, Jan 26, 2018 at 3:21 PM, Richard Sandiford
>>  wrote:
>>> I hadn't realised that on big-endian targets, VEC_UNPACK*HI_EXPR unpacks
>>> the low-numbered lanes and VEC_UNPACK*LO_EXPR unpacks the high-numbered
>>> lanes.  This meant that both the SVE patterns and the handling of
>>> fully-masked loops were wrong.
>>>
>>> The patch deals with that by making sure that all vec_unpack* optabs
>>> are define_expands, using BYTES_BIG_ENDIAN to choose the appropriate
>>> define_insn.  This in turn meant that we can get rid of the duplication
>>> between the signed and unsigned patterns for predicates.  (We provide
>>> implementations of both the signed and unsigned optabs because the sign
>>> doesn't matter for predicates: every element contains only one
>>> significant bit.)
>>>
>>> Also, the float unpacks need to unpack one half of the input vector,
>>> but the unpacked upper bits are "don't care".  There are two obvious
>>> ways of handling that: use an unpack (filling with zeros) or use a ZIP
>>> (filling with a duplicate of the low bits).  The code previously used
>>> unpacks, but the sequence involved a subreg that is semantically an
>>> element reverse on big-endian targets.  Using the ZIP patterns avoids
>>> that, and at the moment there's no reason to prefer one over the other
>>> for performance reasons, so the patch switches to ZIP unconditionally.
>>> As the comment says, it would be easy to optimise this later if UUNPK
>>> turns out to be better for some implementations.
>>>
>>> Tested on aarch64_be-elf, aarch64-linux-gnu and x86_64-linux-gnu.
>>> I think the tree-vect-loop-manip.c part is obvious, but OK for the
>>> AArch64 bits?
>>>
>>> Richard
>>>
>>>
>>> 2018-01-26  Richard Sandiford  
>>>
>>> gcc/
>>> * tree-vect-loop-manip.c (vect_maybe_permute_loop_masks):
>>> Reverse the choice between VEC_UNPACK_LO_EXPR and VEC_UNPACK_HI_EXPR
>>> for big-endian.
>>> * config/aarch64/iterators.md (hi_lanes_optab): New int attribute.
>>> * config/aarch64/aarch64-sve.md
>>> (*aarch64_sve_): Rename to...
>>> (aarch64_sve_): ...this.
>>> (*extend2): Rename to...
>>> (aarch64_sve_extend2): ...this.
>>> (vec_unpack__): Turn into a define_expand,
>>> renaming the old pattern to...
>>> (aarch64_sve_punpk_): ...this.  Only define
>>> unsigned packs.
>>> (vec_unpack__): Turn into a
>>> define_expand, renaming the old pattern to...
>>> (aarch64_sve_unpk_): ...this.
>>> (*vec_unpacku___no_convert): Delete.
>>> (vec_unpacks__): Take BYTES_BIG_ENDIAN into
>>> account when deciding which SVE instruction the optab should use.
>>> (vec_unpack_float__vnx4si): Likewise.
>>>
>>> gcc/testsuite/
>>> * gcc.target/aarch64/sve/unpack_fcvt_signed_1.c: Expect zips rather
>>> than unpacks.
>>> * gcc.target/aarch64/sve/unpack_fcvt_unsigned_1.c: Likewise.
>>> * gcc.target/aarch64/sve/unpack_float_1.c: Likewise.
>>>
>>> Index: gcc/tree-vect-loop-manip.c
>>> ===
>>> --- gcc/tree-vect-loop-manip.c  2018-01-13 18:02:00.948360274 +
>>> +++ gcc/tree-vect-loop-manip.c  2018-01-26 14:00:15.717916957 +
>>> @@ -334,7 +334,8 @@ vect_maybe_permute_loop_masks (gimple_se
>>> {
>>>   tree src = src_rgm->masks[i / 2];
>>>   tree dest = dest_rgm->masks[i];
>>> - tree_code code = (i & 1 ? VEC_UNPACK_HI_EXPR
>>> + tree_code code = ((i & 1) == (BYTES_BIG_ENDIAN ? 0 : 1)
>>> +   ? VEC_UNPACK_HI_EXPR
>>> : VEC_UNPACK_LO_EXPR);
>>
>> This looks broken anyway -- the element order on GIMPLE is the same
>> for little and big-endian.  I see two other BYTES_BIG_ENDIAN uses in
>> tree-vect-*, not sure when they crept in, but they are all broken.
>
> Are you sure?  Everything apart from this new code seems consistently
> to treat UNPACK_HI as "high part of the register" rather than
> "high memory address/lane number".  I think changing it now would
> break powerpc64 and mips.

Looks like so.  But we agreed on changing this; it looks like this is a
left-over.  (I'm always looking at constant folding for how it's supposed
to work.)  Looks like VEC_WIDEN_LSHIFT_HI/LO_EXPR doesn't have constant
folding.

Richard.

> Thanks,
> Richard


C++ PATCH for c++/82514, ICE with local class in generic lambda

2018-01-26 Thread Jason Merrill
The problem here was that when substituting the local class we first
substitute its context, and we weren't finding the rebuilt function
we're inside of, so we tried to create new instantiations of the
closure and call operator, leading to chaos.

The core of this patch is the change to tsubst_function_decl to
properly look up the enclosing function that we want for the context.
The subtle bit was handling the case where we're actually dealing with
a full instantiation of a generic lambda op(); we don't want to try to
look up an enclosing one when we're initially creating the decl as
part of normal template instantiation.  Since in the new model such a
full instantiation will always be from a template with exactly one
level of template parameters, we can check that to distinguish this
case.
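
A hypothetical reduction of that situation (the PR's own testcase is not
quoted here): a local class inside a generic lambda, whose context must
resolve to the already-rebuilt call operator rather than to a fresh
instantiation.

// Hypothetical example; compile with -std=c++14.
template <typename T>
void f ()
{
  auto l = [] (auto x) {
    struct local { decltype (x) value; };  // substituting this class's
    return local{x};                       // context must find the
  };                                       // enclosing rebuilt op().
  l (T ());
}

int main () { f<int> (); }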

I also needed to adjust enclosing_instantiation_of to handle that
case, as regenerated_lambda_fn_p was giving the wrong answer.  The
result is actually simpler.

The change to do_pushtag isn't necessary, but it's a memory saving
opportunity I noticed; we never look in local_classes for closures, so
we shouldn't put them there either.

It's always interesting how, after you get a patch working, you start
cutting it back to what's actually needed and discarding bits that you
tried along the way but turned out not to be necessary.  This patch
was a lot longer at one point...

Tested x86_64-pc-linux-gnu, applying to trunk.
commit 220f44eaba310514a3410065dbf423886d5500d3
Author: Jason Merrill 
Date:   Wed Jan 24 13:33:00 2018 -0500

PR c++/82514 - ICE with local class in generic lambda.

* pt.c (regenerated_lambda_fn_p): Remove.
(enclosing_instantiation_of): Don't use it.
(tsubst_function_decl): Call enclosing_instantiation_of.

* pt.c (lookup_template_class_1): Add sanity check.
* name-lookup.c (do_pushtag): Don't add closures to local_classes.

diff --git a/gcc/cp/name-lookup.c b/gcc/cp/name-lookup.c
index c37e52283e4..d0488c0a17e 100644
--- a/gcc/cp/name-lookup.c
+++ b/gcc/cp/name-lookup.c
@@ -6451,7 +6451,8 @@ do_pushtag (tree name, tree type, tag_scope scope)
 template instantiation rather than in some nested context.  */
  add_decl_expr (decl);
}
- else
+ /* Lambdas use LAMBDA_EXPR_DISCRIMINATOR instead.  */
+ else if (!LAMBDA_TYPE_P (type))
vec_safe_push (local_classes, type);
}
 }
diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index abfdbd96ae8..de8ad94200a 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -222,6 +222,7 @@ static tree tsubst_attributes (tree, tree, tsubst_flags_t, tree);
 static tree canonicalize_expr_argument (tree, tsubst_flags_t);
 static tree make_argument_pack (tree);
 static void register_parameter_specializations (tree, tree);
+static tree enclosing_instantiation_of (tree tctx);
 
 /* Make the current scope suitable for access checking when we are
processing T.  T can be FUNCTION_DECL for instantiated function
@@ -8951,6 +8952,10 @@ lookup_template_class_1 (tree d1, tree arglist, tree in_decl, tree context,
}
   else if (CLASS_TYPE_P (template_type))
{
+ /* Lambda closures are regenerated in tsubst_lambda_expr, not
+instantiated here.  */
+ gcc_assert (!LAMBDA_TYPE_P (template_type));
+
  t = make_class_type (TREE_CODE (template_type));
  CLASSTYPE_DECLARED_CLASS (t)
= CLASSTYPE_DECLARED_CLASS (template_type);
@@ -12183,9 +12188,20 @@ tsubst_function_decl (tree t, tree args, tsubst_flags_t complain,
return t;
 
   /* Calculate the most general template of which R is a
-specialization, and the complete set of arguments used to
-specialize R.  */
+specialization.  */
   gen_tmpl = most_general_template (DECL_TI_TEMPLATE (t));
+
+  /* We're substituting a lambda function under tsubst_lambda_expr but not
+directly from it; find the matching function we're already inside.
+But don't do this if T is a generic lambda with a single level of
+template parms, as in that case we're doing a normal instantiation. */
+  if (LAMBDA_FUNCTION_P (t) && !lambda_fntype
+ && (!generic_lambda_fn_p (t)
+ || TMPL_PARMS_DEPTH (DECL_TEMPLATE_PARMS (gen_tmpl)) > 1))
+   return enclosing_instantiation_of (t);
+
+  /* Calculate the complete set of arguments used to
+specialize R.  */
   argvec = tsubst_template_args (DECL_TI_ARGS
 (DECL_TEMPLATE_RESULT
  (DECL_TI_TEMPLATE (t))),
@@ -12609,24 +12625,15 @@ lambda_fn_in_template_p (tree fn)
   return CLASSTYPE_TEMPLATE_INFO (closure) != NULL_TREE;
 }
 
-/* True if FN is the op() for a lambda regenerated from a lambda in an
-   uninstantiated template.  */
-
-bool
-regenerated_lambda_fn_p (tree fn)
-{
-  return (LAMBDA_FUNCTION_P (fn)
- && !DECL_TEMPLATE_

Re: [PATCH] Fix PR81082

2018-01-26 Thread Christophe Lyon
On 26 January 2018 at 16:13, Richard Biener  wrote:
> On Fri, Jan 26, 2018 at 3:57 PM, Christophe Lyon
>  wrote:
>> On 26 January 2018 at 11:25, Richard Biener  wrote:
>>> On Thu, 25 Jan 2018, Richard Biener wrote:
>>>
 On Thu, 25 Jan 2018, Marc Glisse wrote:

 > On Thu, 25 Jan 2018, Richard Biener wrote:
 >
 > > --- gcc/match.pd  (revision 257047)
 > > +++ gcc/match.pd  (working copy)
 > > @@ -1939,6 +1939,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 > >  (minus (convert (view_convert:stype @1))
 > >   (convert (view_convert:stype @2)))
 > >
 > > +/* (A * C) +- (B * C) -> (A+-B) * C and (A * C) +- A -> A * (C+-1).
 > > +Modeled after fold_plusminus_mult_expr.  */
 > > +(if (!TYPE_SATURATING (type)
 > > + && (!FLOAT_TYPE_P (type) || flag_associative_math))
 > > + (for plusminus (plus minus)
 > > +  (simplify
 > > +   (plusminus (mult:s @0 @1) (mult:cs @0 @2))
 >
 > No :c on the first mult, so we don't actually handle A*C+B*C?

 Hmm, I somehow convinced myself that it's only necessary on one
 of the mults...  but you are of course right.  Will re-test with
 that fixed.
>>>
>>> This is what I have applied.  Note I had to XFAIL a minor part of
>>> gcc.dg/tree-ssa/loop-15.c, namely simplifying the final value
>>> replacement down to n * n.  While the patch should enable this
>>> the place where the transform could trigger with enough information
>>> about n is VRP but that doesn't end up folding all stmts - and
>>> that's something I'm not willing to change right now.
>>>
>>> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied on trunk.
>>>
>>
>> Hi,
>>
>>
>> I think this patch caused regressions on aarch64:
>> FAIL: gcc.dg/wmul-1.c scan-tree-dump-not widening_mul "WIDEN_MULT_PLUS_EXPR"
>
> Late forwprop does
>
> @@ -75,7 +21,7 @@
>_1 = (long unsigned int) Idx_6(D);
>_2 = _1 * 40;
>_12 = _1 * 4;
> -  _17 = _2 + _12;
> +  _17 = _1 * 44;
>_13 = Arr_7(D) + _17;
>MEM[(int[10] *)_13] = 1;
>_4 = _2 + 400;
>   _18 = _4 + _12;
>   _16 = Arr_7(D) + _18;
>   MEM[(int[10] *)_16] = 2;
>
>
> which I'm not sure ends up profitable at the end.  It makes _12
> dead so the total number of multiplications stays the same.
> For this we then apply the WIDEN_MULT_PLUS_EXPR two
> times given _2 is no longer used multiple times.
>
> The reason the above transform happens is the match.pd :s
> behavior which "ignores" :s in case the resulting expression
> is "simple".
>
> Can you open a bugreport?
>

Sure, this is:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84067

> Richard.
>
>> Thanks,
>>
>> Christophe
>>
>>> Richard.
>>>
>>> 2018-01-26  Richard Biener  
>>>
>>> PR tree-optimization/81082
>>> * fold-const.c (fold_plusminus_mult_expr): Do not perform the
>>> association if it requires casting to unsigned.
>>> * match.pd ((A * C) +- (B * C) -> (A+-B)): New patterns derived
>>> from fold_plusminus_mult_expr to catch important cases late when
>>> range info is available.
>>>
>>> * gcc.dg/vect/pr81082.c: New testcase.
>>> * gcc.dg/tree-ssa/loop-15.c: XFAIL the (int)((unsigned)n + -1U) * n + n
>>> simplification to n * n.
>>>
>>> Index: gcc/fold-const.c
>>> ===
>>> --- gcc/fold-const.c(revision 257047)
>>> +++ gcc/fold-const.c(working copy)
>>> @@ -7097,7 +7097,7 @@ fold_plusminus_mult_expr (location_t loc
>>>
>>>/* Same may be zero and thus the operation 'code' may overflow.  Likewise
>>>   same may be minus one and thus the multiplication may overflow.  
>>> Perform
>>> - the operations in an unsigned type.  */
>>> + the sum operation in an unsigned type.  */
>>>tree utype = unsigned_type_for (type);
>>>tree tem = fold_build2_loc (loc, code, utype,
>>>   fold_convert_loc (loc, utype, alt0),
>>> @@ -7110,9 +7110,9 @@ fold_plusminus_mult_expr (location_t loc
>>>  return fold_build2_loc (loc, MULT_EXPR, type,
>>> fold_convert (type, tem), same);
>>>
>>> -  return fold_convert_loc (loc, type,
>>> -  fold_build2_loc (loc, MULT_EXPR, utype, tem,
>>> -   fold_convert_loc (loc, utype, same)));
>>> +  /* Do not resort to unsigned multiplication because
>>> + we lose the no-overflow property of the expression.  */
>>> +  return NULL_TREE;
>>>  }
>>>
>>>  /* Subroutine of native_encode_expr.  Encode the INTEGER_CST
>>> Index: gcc/match.pd
>>> ===
>>> --- gcc/match.pd(revision 257047)
>>> +++ gcc/match.pd(working copy)
>>> @@ -1939,6 +1939,37 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>>>   (minus (convert (view_convert:stype @1))
>>> (convert (view_convert:stype @2)))
>>>
>>> +/* (A * C) +- (B * C) -> (A+-B) *

[PATCH, i386]: Fix PR81763, Issues with BMI on 32bit x86 apps on GCC 7.1+

2018-01-26 Thread Uros Bizjak
Hello!

It turned out that the *andndi3_doubleword pattern needs an earlyclobber
on the output operand of its (=r,r,rm) alternative, to prevent partial
regs of the first split instruction from clobbering input operands of
the second insn.  The patch also adds a couple of alternatives with
matching input and output operands that are guaranteed not to clobber
the other input operand in a sequence of split insns of a
three-operand non-destructive insn.
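
A self-contained model of the hazard the earlyclobber prevents, with
hypothetical aliasing standing in for partial register overlap:

/* The doubleword ANDN is split into two word-sized ANDNs.  If the
   destination partially overlaps an input (say the low output word is
   the same register as an input's high word), the second operation
   reads an already-clobbered value.  "=&r" keeps the output disjoint
   from all inputs; the new matching alternatives (=r,0,rm) and
   (=r,r,0) are safe because each word of the tied input is consumed
   before it is overwritten.  */
struct word_pair { unsigned int lo, hi; };

static void
andn_split (struct word_pair *dst,
            const struct word_pair *a, const struct word_pair *b)
{
  dst->lo = ~a->lo & b->lo;
  dst->hi = ~a->hi & b->hi;
}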

2018-01-26  Uros Bizjak  

PR target/81763
* config/i386/i386.md (*andndi3_doubleword): Add earlyclobber
to (=&r,r,rm) alternative. Add (=r,0,rm) and (=r,r,0) alternatives.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline, will be backported to gcc-7 branch early next week.

Uros.
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 5cd3ec0..fe9649d 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -9250,14 +9250,14 @@
 })
 
 (define_insn "*andndi3_doubleword"
-  [(set (match_operand:DI 0 "register_operand" "=r,&r")
+  [(set (match_operand:DI 0 "register_operand" "=&r,r,r,&r")
(and:DI
- (not:DI (match_operand:DI 1 "register_operand" "r,0"))
- (match_operand:DI 2 "nonimmediate_operand" "rm,rm")))
+ (not:DI (match_operand:DI 1 "register_operand" "r,0,r,0"))
+ (match_operand:DI 2 "nonimmediate_operand" "rm,rm,0,rm")))
(clobber (reg:CC FLAGS_REG))]
   "!TARGET_64BIT && TARGET_STV && TARGET_SSE2"
   "#"
-  [(set_attr "isa" "bmi,*")])
+  [(set_attr "isa" "bmi,bmi,bmi,*")])
 
 (define_split
   [(set (match_operand:DI 0 "register_operand")


[PATCH] Fix PR84057

2018-01-26 Thread Richard Biener

There's another latent bug in the loop unrolling edge/region removal code:
it doesn't deal with removing edges in already-removed regions.

The following fixes this.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2018-01-26  Richard Biener  

PR tree-optimization/84057
* tree-ssa-loop-ivcanon.c (unloop_loops): Deal with already
removed paths when removing edges.

* gcc.dg/graphite/pr84057.c: New testcase.

Index: gcc/tree-ssa-loop-ivcanon.c
===
--- gcc/tree-ssa-loop-ivcanon.c (revision 257077)
+++ gcc/tree-ssa-loop-ivcanon.c (working copy)
@@ -660,14 +660,21 @@ unloop_loops (bitmap loop_closed_ssa_inv
   loops_to_unloop.release ();
   loops_to_unloop_nunroll.release ();
 
-  /* Remove edges in peeled copies.  */
+  /* Remove edges in peeled copies.  Given remove_path removes dominated
+ regions we need to cope with removal of already removed paths.  */
   unsigned i;
   edge e;
+  auto_vec<int> src_bbs;
+  src_bbs.reserve_exact (edges_to_remove.length ());
   FOR_EACH_VEC_ELT (edges_to_remove, i, e)
-{
-  bool ok = remove_path (e, irred_invalidated, 
loop_closed_ssa_invalidated);
-  gcc_assert (ok);
-}
+src_bbs.quick_push (e->src->index);
+  FOR_EACH_VEC_ELT (edges_to_remove, i, e)
+if (BASIC_BLOCK_FOR_FN (cfun, src_bbs[i]))
+  {
+   bool ok = remove_path (e, irred_invalidated,
+  loop_closed_ssa_invalidated);
+   gcc_assert (ok);
+  }
   edges_to_remove.release ();
 }
 
Index: gcc/testsuite/gcc.dg/graphite/pr84057.c
===
--- gcc/testsuite/gcc.dg/graphite/pr84057.c (nonexistent)
+++ gcc/testsuite/gcc.dg/graphite/pr84057.c (working copy)
@@ -0,0 +1,31 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -fgraphite -funroll-loops -fno-tree-ccp -fno-tree-dce" } */
+
+int ue;
+
+void
+fr (int ct)
+{
+  int au = 0;
+  int *ra = &au;
+
+  while (au < 1)
+{
+  au -= 0x7878788;
+  if (au != ct && ue != 0)
+   {
+ while (au < 1)
+   {
+   }
+
+fc:
+ while (ct != 0)
+   {
+   }
+   }
+}
+
+  for (au = 0; au < 2; ++au)
+if (ct != 0)
+  goto fc;
+}


[og7] Privatize independent OpenACC reductions

2018-01-26 Thread Cesar Philippidis
I've applied this patch to openacc-gcc-7-branch; it privatizes reduction
variables inside independent loops that are guaranteed to be assigned
worker or vector partitioning during oaccdevlow.  Without this patch,
the inner-reduction.c test case would generate bogus results.
This patch is an extension of Chung-Lin's patch posted here.

The next step going forward is to teach the gimplifier to mark those
reduction variables as implicitly firstprivate, instead of
copy+private.  Some members of the OpenACC technical committee argue that
the reduction variable in inner-reduction.c should be firstprivate, not
copy.

Cesar
Privatize independent OpenACC reductions.

2018-01-26  Cesar Philippidis  

	gcc/
	* gimplify.c (oacc_privatize_reduction): New function.
	(omp_add_variable): Use it to determine if a reduction variable
	needs to be privatized.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/inner-reduction.c: New test.


diff --git a/gcc/gimplify.c b/gcc/gimplify.c
index 7a9cc241792..72ed8f1a249 100644
--- a/gcc/gimplify.c
+++ b/gcc/gimplify.c
@@ -6604,6 +6604,32 @@ omp_firstprivatize_type_sizes (struct gimplify_omp_ctx *ctx, tree type)
   lang_hooks.types.omp_firstprivatize_type_sizes (ctx, type);
 }
 
+/* Determine if CTX might contain any gang partitioned loops.  During
+   oacc_dev_low, independent loops are assigned gangs at the outermost
+   level, and vectors in the innermost.  */
+
+static bool
+oacc_privatize_reduction (struct gimplify_omp_ctx *ctx)
+{
+  if (ctx == NULL)
+return false;
+
+  if (ctx->region_type != ORT_ACC)
+return false;
+
+  for (tree c = ctx->clauses; c; c = OMP_CLAUSE_CHAIN (c))
+switch (OMP_CLAUSE_CODE (c))
+  {
+  case OMP_CLAUSE_SEQ:
+	return oacc_privatize_reduction (ctx->outer_context);
+  case OMP_CLAUSE_GANG:
+	return true;
+  default:;
+  }
+
+  return true;
+}
+
 /* Add an entry for DECL in the OMP context CTX with FLAGS.  */
 
 static void
@@ -6733,7 +6759,14 @@ omp_add_variable (struct gimplify_omp_ctx *ctx, tree decl, unsigned int flags)
 	}
 
   /* Set new copy map as 'private' if sure we're not gang-partitioning.  */
-  bool map_private = !gang && (worker || vector);
+  bool map_private;
+
+  if (gang)
+	map_private = false;
+  else if (worker || vector)
+	map_private = true;
+  else
+	map_private = oacc_privatize_reduction (ctx->outer_context);
 
   while (outer_ctx)
 	{
diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/inner-reduction.c b/libgomp/testsuite/libgomp.oacc-c-c++-common/inner-reduction.c
new file mode 100644
index 000..0c317dcf8a6
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/inner-reduction.c
@@ -0,0 +1,23 @@
+#include <assert.h>
+
+int
+main ()
+{
+  const int n = 1000;
+  int i, j, temp, a[n];
+
+#pragma acc parallel loop
+  for (i = 0; i < n; i++)
+{
+  temp = i;
+#pragma acc loop reduction (+:temp)
+  for (j = 0; j < n; j++)
+	temp ++;
+  a[i] = temp;
+}
+
+  for (i = 0; i < n; i++)
+assert (a[i] == i+n);
+
+  return 0;
+}


[PATCH][OBVIOUS] Fix ifunc detection.

2018-01-26 Thread Martin Liška
Hi.

This fixes detection of ifunc target capability.
I'm going to install the patch.

Martin

gcc/testsuite/ChangeLog:

2018-01-26  Martin Liska  

* lib/target-supports.exp: Return a value, otherwise -Wreturn-type
warning is seen.
---
 gcc/testsuite/lib/target-supports.exp | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)


diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index 24514233cea..c2ec93b9c80 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -449,7 +449,7 @@ proc check_ifunc_available { } {
 	extern "C" {
 	#endif
 	typedef void F (void);
-	F* g (void) {}
+	F* g (void) { return 0; }
 	void f () __attribute__ ((ifunc ("g")));
 	#ifdef __cplusplus
 	}



C++ PATCH for c++/84036, ICE with variadic capture

2018-01-26 Thread Jason Merrill
My recent patch for 82249 caused a substitution here to return a
TREE_VEC rather than an EXPR_PACK_EXPANSION; fixed by pulling out the
expansion in that case.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit 577e646e8472e6bc226168fb96d0884663b0c2cd
Author: Jason Merrill 
Date:   Fri Jan 26 10:51:17 2018 -0500

PR c++/84036 - ICE with variadic capture.

* pt.c (tsubst_pack_expansion): When optimizing a simple
substitution, pull a single pack expansion out of its pack.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index de8ad94200a..6c5d06b9ebb 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -11575,6 +11575,12 @@ tsubst_pack_expansion (tree t, tree args, tsubst_flags_t complain,
   && TREE_PURPOSE (packs) == pattern)
 {
   tree args = ARGUMENT_PACK_ARGS (TREE_VALUE (packs));
+
+  /* If the argument pack is a single pack expansion, pull it out.  */
+  if (TREE_VEC_LENGTH (args) == 1
+ && pack_expansion_args_count (args))
+   return TREE_VEC_ELT (args, 0);
+
   /* Types need no adjustment, nor does sizeof..., and if we still have
 some pack expansion args we won't do anything yet.  */
   if (TREE_CODE (t) == TYPE_PACK_EXPANSION
diff --git a/gcc/testsuite/g++.dg/cpp1y/lambda-generic-variadic8.C b/gcc/testsuite/g++.dg/cpp1y/lambda-generic-variadic8.C
new file mode 100644
index 000..7740d660de2
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/lambda-generic-variadic8.C
@@ -0,0 +1,13 @@
+// PR c++/84036
+// { dg-do compile { target c++14 } }
+
+template < typename T >
+auto f(T){
+[](auto ... i){
+[i ...]{};
+};
+}
+
+int main(){
+f(0);
+}


Re: [PATCH][AArch64] Fix gcc.target/aarch64/subs_compare_[12].c

2018-01-26 Thread James Greenhalgh
On Tue, Jan 23, 2018 at 02:49:03PM +, Kyrill Tkachov wrote:
> Hi all,
> 
> This patch fixes the testsuite failures gcc.target/aarch64/subs_compare_1.c
> and subs_compare_2.c.  The tests check that we combine a sequence like:
>  sub w2, w0, w1
>  cmp w0, w1
> 
> into
>  subsw2, w0, w1
> 
> This is done by a couple of peepholes in aarch64.md.
> 
> Unfortunately due to scheduling and other optimisations the SUB and CMP
> can come in a different order:
>  cmp w0, w1
>  sub w0, w0, w1
> 
> And the existing peepholes cannot catch that and we fail to combine the two.
> This patch adds a peephole that matches the CMP as the first insn and the SUB
> as the second and outputs a SUBS.  This is almost equivalent to the existing
> peephole that matches SUB first and CMP second except that it doesn't have
> the restriction that the output register of the SUB has to not be one of the
> input registers.  Remember "sub w0, w0, w1 ; cmp w0, w1" is *not* equivalent
> to: "subs  w0, w0, w1" but "cmp w0, w1 ; sub w0, w0, w1" is.
> 
> So this is what this patch does. It adds a peephole for the case above and
> one for the SUB-immediate variant (because the SUB-immediate is represented
> as PLUS-of-negated-immediate and thus has different RTL structure).
> 
> Bootstrapped and tested on aarch64-none-linux-gnu.
> 
> Ok for trunk?

OK.

Thanks,
James

> Thanks,
> Kyrill
> 
> 2018-01-23  Kyrylo Tkachov  
> 
>  * config/aarch64/aarch64.md: Add peepholes for CMP + SUB -> SUBS
>  and CMP + SUB-immediate -> SUBS.
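
For reference, a hypothetical testcase shape (the actual subs_compare
sources are not quoted here) in which the compare and the subtract of the
same operands can end up in either order:

/* Hypothetical shape only.  */
int
f (int x, int y)
{
  int d = x - y;   /* sub w2, w0, w1 (may be scheduled second)  */
  if (x > y)       /* cmp w0, w1: fused into subs w2, w0, w1    */
    return d;
  return 0;
}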




Re: [PATCH] avoid assuming known string length is constant (PR 83896)

2018-01-26 Thread Martin Sebor

On 01/22/2018 09:33 AM, Jakub Jelinek wrote:

On Tue, Jan 16, 2018 at 03:20:24PM -0700, Martin Sebor wrote:

gcc/ChangeLog:

PR tree-optimization/83896
* tree-ssa-strlen.c (get_string_len): Rename...
(get_string_cst_length): ...to this.  Return HOST_WIDE_INT.
Avoid assuming length is constant.
(handle_char_store): Use HOST_WIDE_INT for string length.

gcc/testsuite/ChangeLog:

PR tree-optimization/83896
* gcc.dg/strlenopt-43.c: New test.

...


   if (TREE_CODE (rhs) == MEM_REF
   && integer_zerop (TREE_OPERAND (rhs, 1)))


For the future, there is no reason it couldn't handle
also non-zero offsets if the to be returned length is bigger or
equal than that offset, where the offset can be then subtracted.
But let's defer that for GCC9.


I opened bug 84069 for this oversight.




@@ -2789,7 +2793,8 @@ handle_pointer_plus (gimple_stmt_iterator *gsi)
  if (idx > 0)
{
  strinfo *si = get_strinfo (idx);
- if (si && si->full_string_p)
+ if (si && si->full_string_p
+ && tree_fits_shwi_p (si->nonzero_chars))


If a && or || using condition doesn't fit onto a single line, it should be
split into one condition per line, so please change the above into:
  if (si
  && si->full_string_p
  && tree_fits_shwi_p (si->nonzero_chars))


@@ -2822,7 +2824,7 @@ handle_char_store (gimple_stmt_iterator *gsi)
   unsigned HOST_WIDE_INT offset = 0;

   /* Set to the length of the string being assigned if known.  */
-  int rhslen;
+  HOST_WIDE_INT rhslen;


Please remove the empty line above the above comment, the function
starts with definitions of multiple variables, rhslen isn't even
initialized and there is nothing special on it compared to others,
so they should be in one block.

Ok for trunk with those changes.


I committed the patch with the tweaks suggested above and below
in 257100.

Martin




--- gcc/testsuite/gcc.dg/strlenopt-43.c (nonexistent)
+++ gcc/testsuite/gcc.dg/strlenopt-43.c (working copy)
@@ -0,0 +1,13 @@
+/* PR tree-optimization/83896 - ice in get_string_len on a call to strlen
+   with non-constant length
+   { dg-do compile }
+   { dg-options "-O2 -Wall -fdump-tree-optimized" } */
+
+extern char a[5];
+extern char b[];
+
+void f (void)
+{
+  if (__builtin_strlen (b) != 4)
+__builtin_memcpy (a, b, sizeof a);
+}


Most of the strlenopt*.c tests use the strlenopt.h header and then
use the non-__builtin_ function names; if you want to change that too,
please do so, but it's not a requirement.

Jakub





Re: [PATCH] C++: avoid most reserved words as misspelling suggestions (PR c++/81610 and PR c++/80567)

2018-01-26 Thread David Malcolm
On Mon, 2017-12-11 at 17:24 -0500, Jason Merrill wrote:
> On Wed, Nov 22, 2017 at 10:36 AM, David Malcolm 
> wrote:

Original post:
  https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02048.html

> > PR c++/81610 and PR c++/80567 report problems where the C++
> > frontend
> > suggested "if", "for" and "else" as corrections for misspelled
> > variable
> > names.

I've now marked these PRs as regressions: the nonsensical suggestions
are only offered by trunk, not by gcc 7 and earlier.

> Hmm, what about cases where people are actually misspelling keywords?
> Don't we want to handle that?
> 
> fi (true) { }
> retrun 42;

I'd prefer not to.

gcc 7 and earlier don't attempt to correct the spelling of the "fi" and
"retrun" above.

trunk currently does offer "return" as a suggestion, but it was by
accident, and I'm wary of attempting to support these corrections: is
"fi" meant to be an "if", or a function call that's missing its decl,
or a name lookup issue?  ...etc

> In the PRs you mention, the actual identifiers are 1) missing
> includes, which we should check first, and 2) pretty far from the
> suggested keywords.

The C++ FE is missing a suggestion about which #include to use for
"memset", but I'd prefer to treat that as a follow-up patch (and
probably for next stage 1).

In the meantime, is this patch OK for trunk? (as a regression fix)

Thanks
Dave


[PATCH] rs6000: Fix safe-indirect-jump-[18].c

2018-01-26 Thread Segher Boessenkool
This patch merges the safe-indirect-jump-1.c and -8.c testcases,
since they do the same thing.  On the 64-bit and AIX ABIs the indirect
call is not a sibcall, since there is code generated after the call
(the restore of r2).  On the 32-bit non-AIX ABIs it is a sibcall.

Tested on powerpc64-linux {-m32,-m64}, and on powerpc-ibm-aix7.1.3.0 .
Is this okay for trunk and 7 branch?


Segher


2018-01-26  Segher Boessenkool  

gcc/testsuite/
* gcc.target/powerpc/safe-indirect-jump-1.c: Build on all targets.
Make expected output depend on whether we expect sibcalls or not.
* gcc.target/powerpc/safe-indirect-jump-8.c: Delete (merged into
safe-indirect-jump-1.c).

---
 gcc/testsuite/gcc.target/powerpc/safe-indirect-jump-1.c | 10 --
 gcc/testsuite/gcc.target/powerpc/safe-indirect-jump-8.c | 16 
 2 files changed, 8 insertions(+), 18 deletions(-)
 delete mode 100644 gcc/testsuite/gcc.target/powerpc/safe-indirect-jump-8.c

diff --git a/gcc/testsuite/gcc.target/powerpc/safe-indirect-jump-1.c b/gcc/testsuite/gcc.target/powerpc/safe-indirect-jump-1.c
index 99cfab2..d1ab83a 100644
--- a/gcc/testsuite/gcc.target/powerpc/safe-indirect-jump-1.c
+++ b/gcc/testsuite/gcc.target/powerpc/safe-indirect-jump-1.c
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { lp64 } } } */
+/* { dg-do compile } */
 /* { dg-additional-options "-mno-speculate-indirect-jumps" } */
 
 /* Test for deliberate misprediction of indirect calls.  */
@@ -11,4 +11,10 @@ int bar ()
 }
 
 /* { dg-final { scan-assembler "crset 2" } } */
-/* { dg-final { scan-assembler "beqctrl-" } } */
+
+/* The AIX and ELFv2 ABIs don't allow a sibcall here.  */
+/* { dg-final { scan-assembler "beqctrl-" { target { lp64 || { powerpc*-*-aix* 
} } } } } */
+
+/* The other ABIs do allow a sibcall.  */
+/* { dg-final { scan-assembler "beqctr-" { target { ilp32 && !powerpc*-*-aix* 
} } } } */
+/* { dg-final { scan-assembler {b \$} { target { ilp32 && !powerpc*-*-aix* } } 
} } */
diff --git a/gcc/testsuite/gcc.target/powerpc/safe-indirect-jump-8.c b/gcc/testsuite/gcc.target/powerpc/safe-indirect-jump-8.c
deleted file mode 100644
index 0a6f231..000
--- a/gcc/testsuite/gcc.target/powerpc/safe-indirect-jump-8.c
+++ /dev/null
@@ -1,16 +0,0 @@
-/* { dg-do compile { target { ilp32 } } } */
-/* { dg-skip-if "" { powerpc*-*-aix* } } */
-/* { dg-additional-options "-O2 -mno-speculate-indirect-jumps" } */
-
-/* Test for deliberate misprediction of -m32 sibcalls.  */
-
-extern int (*f)();
-
-int bar ()
-{
-  return (*f) ();
-}
-
-/* { dg-final { scan-assembler "crset 2" } } */
-/* { dg-final { scan-assembler "beqctr-" } } */
-/* { dg-final { scan-assembler {b \$} } } */
-- 
1.8.3.1



Re: [PATCH, fortran] Support Fortran 2018 teams

2018-01-26 Thread Alessandro Fanfarillo
Committed revision 257105.

Thanks.

On Wed, Jan 24, 2018 at 3:17 PM, Jakub Jelinek  wrote:
> On Wed, Jan 24, 2018 at 08:19:58PM +, Paul Richard Thomas wrote:
>> (Jakub, This is all hidden behind the -fcoarray option. To my mind
>> this is safe for release.)
>
> Ok from RM POV.
>
> Jakub



-- 

Alessandro Fanfarillo, Ph.D.
Postdoctoral Researcher
National Center for Atmospheric Research
Mesa Lab, Boulder, CO, USA
303-497-1229


Re: [PATCH] rs6000: Fix safe-indirect-jump-[18].c

2018-01-26 Thread David Edelsohn
On Fri, Jan 26, 2018 at 2:27 PM, Segher Boessenkool
 wrote:
> This patch merges the safe-indirect-jump-1.c and -8.c testcases,
> since they do the same thing.  On the 64-bit and AIX ABIs the indirect
> call is not a sibcall, since there is code generated after the call
> (the restore of r2).  On the 32-bit non-AIX ABIs it is a sibcall.
>
> Tested on powerpc64-linux {-m32,-m64}, and on powerpc-ibm-aix7.1.3.0 .
> Is this okay for trunk and 7 branch?
>
>
> Segher
>
>
> 2018-01-26  Segher Boessenkool  
>
> gcc/testsuite/
> * gcc.target/powerpc/safe-indirect-jump-1.c: Build on all targets.
> Make expected output depend on whether we expect sibcalls or not.
> * gcc.target/powerpc/safe-indirect-jump-8.c: Delete (merged into
> safe-indirect-jump-1.c).

Okay.

Thanks, David


C++ PATCH for c++/83956, wrong error with anon union and deleted dtor

2018-01-26 Thread Jason Merrill
Here, when we walk the subobjects to determine the exception
specification of a defaulted destructor in order to apply it to a
user-provided destructor, we shouldn't look at variant members dtors,
which aren't called.

We really shouldn't be looking at variant members here at all except
to determine deletedness, but decided to go with a more limited change
since we're in stage 4.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit df8979f62c2977bd18a2e085e90dbdd4c2f8f98e
Author: Jason Merrill 
Date:   Fri Jan 26 11:38:19 2018 -0500

PR c++/83956 - wrong dtor error with anonymous union

* method.c (walk_field_subobs): Variant members only affect
deletedness.
(maybe_explain_implicit_delete): Pass &deleted_p for diagnostic.

diff --git a/gcc/cp/method.c b/gcc/cp/method.c
index 5bc830cd820..33029d7967e 100644
--- a/gcc/cp/method.c
+++ b/gcc/cp/method.c
@@ -1305,6 +1305,15 @@ walk_field_subobs (tree fields, tree fnname, special_function_kind sfk,
  || DECL_ARTIFICIAL (field))
continue;
 
+  /* Variant members only affect deletedness.  In particular, they don't
+affect the exception-specification of a user-provided destructor,
+which we're figuring out via get_defaulted_eh_spec.  So if we aren't
+asking if this is deleted, don't even look up the function; we don't
+want an error about a deleted function we aren't actually calling.  */
+  if (sfk == sfk_destructor && deleted_p == NULL
+ && TREE_CODE (DECL_CONTEXT (field)) == UNION_TYPE)
+   break;
+
   mem_type = strip_array_types (TREE_TYPE (field));
   if (assign_p)
{
@@ -1850,7 +1859,7 @@ maybe_explain_implicit_delete (tree decl)
  "%q#D is implicitly deleted because the default "
  "definition would be ill-formed:", decl);
  synthesized_method_walk (ctype, sfk, const_p,
-  NULL, NULL, NULL, NULL, true,
+  NULL, NULL, &deleted_p, NULL, true,
   &inh, parms);
}
  else if (!comp_except_specs
diff --git a/gcc/testsuite/g++.dg/cpp0x/anon-union2.C b/gcc/testsuite/g++.dg/cpp0x/anon-union2.C
new file mode 100644
index 000..38c91f41a5c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/anon-union2.C
@@ -0,0 +1,12 @@
+// PR c++/83956
+// { dg-do compile { target c++11 } }
+
+struct a {
+  ~a() = delete;
+};
+struct b {
+  ~b() {}
+  union {
+a c;
+  };
+};


[PATCH] RISC-V: Add --specs=nosys.specs support.

2018-01-26 Thread Jim Wilson
This makes --specs=nosys.specs work correctly.  Without this patch, libnosys
is ignored because libgloss gets pulled in first.  We may have to revisit this
in the future when we have some proper BSPs defined for various RISC-V
hardware.  Meanwhile, adding libgloss by default makes things easier for
inexperienced users and I don't want to break that.

This was tested with a testsuite run for a newlib target, there were no
regressions.  Also tested by hand to verify that --specs=nosys.specs works
correctly.
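
For example (target triplet hypothetical), a bare-metal link that should
use the libnosys stubs now resolves as intended, because -lgloss is
dropped from the default group and no longer shadows the -lnosys that
nosys.specs supplies:

riscv64-unknown-elf-gcc -o hello hello.c --specs=nosys.specs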

Committed.

Jim

gcc/
* config/riscv/elf.h (LIB_SPEC): Don't include -lgloss when nosys.specs
specified.
---
 gcc/config/riscv/elf.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/riscv/elf.h b/gcc/config/riscv/elf.h
index 43ad68bbdf2..f39e83234d2 100644
--- a/gcc/config/riscv/elf.h
+++ b/gcc/config/riscv/elf.h
@@ -24,7 +24,7 @@ along with GCC; see the file COPYING3.  If not see
 /* Link against Newlib libraries, because the ELF backend assumes Newlib.
Handle the circular dependence between libc and libgloss. */
 #undef  LIB_SPEC
-#define LIB_SPEC "--start-group -lc -lgloss --end-group"
+#define LIB_SPEC "--start-group -lc %{!specs=nosys.specs:-lgloss} --end-group"
 
 #undef  STARTFILE_SPEC
 #define STARTFILE_SPEC "crt0%O%s crtbegin%O%s"
-- 
2.14.1


Re: [PATCH] fix for diagnostic/84034

2018-01-26 Thread David Malcolm
On Thu, 2018-01-25 at 18:53 +, Bernd Edlinger wrote:
> Hi,
> 
> as PR diagnostic/84034 shows, source files with
> DOS-style line endings can cause a glitch in the
> terminal emulation that erases the source line that
> is supposed to be shown.
> 
> That happens when the colorizing escape sequences are
> printed between the CR and the LF.  Apparently the LF is
> being ignored and thus the following line overwrites
> everything from the last source line.
> 
> 
> The following patch fixes the visual glitch by handling
> a CR '\r' like a TAB '\t' character.
> 
> 
> Bootstrapped and regression-tested on x86_64-pc-linux-gnu.
> OK for trunk?

Thanks for working on this.

[BEGIN BRAIN-DUMP:

Before gcc 6, we handled this case by printing (e.g. gcc 5):
/tmp/t.c: In function ‘test’:
/tmp/t.c:5:20: warning: suggest parentheses around ‘&&’ within ‘||’ [-Wparentheses]
   (d && b > e) &&
^
where all lines would end with LF, apart from the quoted source line,
which would end with CR+LF.

From gcc 6 onwards, the lines all end with LF, apart from quoted source
lines, which end with:
  CR + (end-colorization) + LF.

What's happening is that layout::print_source_line etc treat the CR as
non-whitespace, and prints it.  Then layout::print_newline () prints
the end-colorization and LF.

Presumably we *don't* want to preserve that mixed-ending behavior in
our output from gcc 5 and earlier described above; that seems like an
accident of the old implementation, and doesn't seem useful.

Your patch effectively turns the CR before the LF into trailing
whitespace, stopping the CR from being printed, turning it into just:
  (end-colorization) + LF
and thus fixing the glitch.

END BRAIN-DUMP]
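
To make the byte sequences concrete, here is a small standalone
illustration; this is my own demo, not GCC code, and "\033[m" stands in
for the end-colorization escape:

  #include <stdio.h>

  int main (void)
  {
    /* gcc 6 onwards: the quoted line ends CR + end-colorization + LF;
       per the report, the LF after the escapes gets ignored, so the
       next line of output overwrites this one.  */
    fputs ("  (d && b > e) &&\r\033[m\n", stdout);
    /* With the patch the CR is treated like trailing whitespace and
       is never printed.  */
    fputs ("  (d && b > e) &&\033[m\n", stdout);
    return 0;
  }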

The patch is OK.

Thanks
Dave


[PATCH] Extend lazy computation of expensive preprocessor macros to FLOATN*

2018-01-26 Thread Jakub Jelinek
Hi!

Honza reported today on IRC that we again spend significant time, even when
compiling an empty file, computing the preprocessor *_MAX/*_MIN etc. macros.
In 2010 I added lazy computation for these, so that they are only computed
when first used (except with -dD), but reserved just 12 entries for them, as
only the FLT/DBL/LDBL-prefixed macros (4 for each kind) were needed at that
point.  In 2016, for PR32187, Joseph added a bunch of other kinds, and
because there is no space reserved for them in the array, they are
evaluated right away, which is quite expensive.

The following patch makes them lazy again.
Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2018-01-26  Jakub Jelinek  

* c-cppbuiltin.c (c_cpp_builtins): Use ggc_strdup for the fp_suffix
argument.
(lazy_hex_fp_values): Allow up to 36 lazy hex fp values rather than
just 12.
(builtin_define_with_hex_fp_value): Likewise.

* include/cpplib.h (enum cpp_builtin_type): Change BT_LAST_USER from
BT_FIRST_USER + 31 to BT_FIRST_USER + 63.

--- gcc/c-family/c-cppbuiltin.c.jj  2018-01-03 10:20:21.369538150 +0100
+++ gcc/c-family/c-cppbuiltin.c 2018-01-26 11:01:15.266648197 +0100
@@ -1124,8 +1124,8 @@ c_cpp_builtins (cpp_reader *pfile)
   floatn_nx_types[i].extended ? "X" : "");
   sprintf (csuffix, "F%d%s", floatn_nx_types[i].n,
   floatn_nx_types[i].extended ? "x" : "");
-  builtin_define_float_constants (prefix, csuffix, "%s", csuffix,
- FLOATN_NX_TYPE_NODE (i));
+  builtin_define_float_constants (prefix, ggc_strdup (csuffix), "%s",
+ csuffix, FLOATN_NX_TYPE_NODE (i));
 }
 
   /* For decfloat.h.  */
@@ -1571,7 +1571,7 @@ struct GTY(()) lazy_hex_fp_value_struct
   int digits;
   const char *fp_suffix;
 };
-static GTY(()) struct lazy_hex_fp_value_struct lazy_hex_fp_values[12];
+static GTY(()) struct lazy_hex_fp_value_struct lazy_hex_fp_values[36];
 static GTY(()) int lazy_hex_fp_value_count;
 
 static bool
@@ -1616,7 +1616,7 @@ builtin_define_with_hex_fp_value (const
   char dec_str[64], buf[256], buf1[128], buf2[64];
 
   /* This is very expensive, so if possible expand them lazily.  */
-  if (lazy_hex_fp_value_count < 12
+  if (lazy_hex_fp_value_count < 36
   && flag_dump_macros == 0
   && !cpp_get_options (parse_in)->traditional)
 {
--- libcpp/include/cpplib.h.jj  2018-01-18 21:11:59.890207215 +0100
+++ libcpp/include/cpplib.h 2018-01-26 10:58:10.249699482 +0100
@@ -719,7 +719,7 @@ enum cpp_builtin_type
   BT_COUNTER,  /* `__COUNTER__' */
   BT_HAS_ATTRIBUTE,/* `__has_attribute__(x)' */
   BT_FIRST_USER,   /* User defined builtin macros.  */
-  BT_LAST_USER = BT_FIRST_USER + 31
+  BT_LAST_USER = BT_FIRST_USER + 63
 };
 
 #define CPP_HASHNODE(HNODE)((cpp_hashnode *) (HNODE))

Jakub


[PATCH] Avoid O(n^2) compile time issue in sched2 with debug insns (PR middle-end/84040)

2018-01-26 Thread Jakub Jelinek
Hi!

On a testcase which has 10 consecutive debug insns, sched2 spends
a lot of time calling prev_nonnote_nondebug_insn on each of the debug
insns, even though it is completely useless: no target wants
to fuse a non-debug insn with some debug insn after it; fusing makes sense
only for two non-debug insns.

By returning early for those, we'll just walk the long set of them once when
we process some non-debug instruction after a long block of debug insns.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2018-01-26  Jakub Jelinek  

PR middle-end/84040
* sched-deps.c (sched_macro_fuse_insns): Return immediately for
debug insns.

--- gcc/sched-deps.c.jj 2018-01-03 10:19:56.301534141 +0100
+++ gcc/sched-deps.c2018-01-26 16:21:01.922414579 +0100
@@ -2834,10 +2834,16 @@ static void
 sched_macro_fuse_insns (rtx_insn *insn)
 {
   rtx_insn *prev;
+  /* No target hook would return true for a debug insn as either
+ hook operand, and with very large sequences of only debug insns,
+ calling sched_macro_fuse_insns on each of them would have quadratic
+ compile time complexity.  */
+  if (DEBUG_INSN_P (insn))
+return;
   prev = prev_nonnote_nondebug_insn (insn);
   if (!prev)
 return;
- 
+
   if (any_condjump_p (insn))
 {
   unsigned int condreg1, condreg2;

Jakub


[PATCH] Avoid creating >= 8GB EXPR_CONSTANT initializers for ilp32 targets (PR fortran/84065)

2018-01-26 Thread Jakub Jelinek
Hi!

resolve_charlen is going to error on ilp32 targets for charlens above what
can fit into a 32-bit signed int, but add_init_expr_to_sym runs before
resolve_charlen.  Allocating huge amounts of memory in cases where
we'll error later anyway is just a waste of compile time when running on a
64-bit host with enough memory, and otherwise unnecessarily ICEs due to
running out of memory.

Another option would be to emit the error resolve_charlen is going to
emit and arrange for it not to be emitted again afterwards; this just
seemed easier.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

It would be nice if the FE had some way to express a very large repetitive
EXPR_CONSTANT more efficiently, say by not allocating memory for a tail
that is filled only with ' ', or by being able to say that such and such
content is repeated this many times after the initial portion.
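
One conceivable shape for such a representation, as a purely
hypothetical C sketch; nothing like this exists in gfortran today:

  #include <stddef.h>

  /* Hypothetical run-length form of a huge character initializer:
     keep the explicitly written prefix and record how often a fill
     character repeats after it, instead of materializing it all.  */
  struct repeated_char_init
  {
    const char *prefix;               /* explicitly given characters */
    size_t prefix_len;
    char fill;                        /* usually ' ' (blank padding) */
    unsigned long long fill_count;    /* repetitions after the prefix */
  };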

2018-01-26  Jakub Jelinek  

PR fortran/84065
* decl.c (add_init_expr_to_sym): Ignore initializers for too large
lengths.

--- gcc/fortran/decl.c.jj   2018-01-23 21:35:04.0 +0100
+++ gcc/fortran/decl.c  2018-01-26 18:24:22.064763299 +0100
@@ -1757,22 +1757,32 @@ add_init_expr_to_sym (const char *name,
  if (!gfc_specification_expr (sym->ts.u.cl->length))
return false;
 
- HOST_WIDE_INT len = gfc_mpz_get_hwi (sym->ts.u.cl->length->value.integer);
-
- if (init->expr_type == EXPR_CONSTANT)
-   gfc_set_constant_character_len (len, init, -1);
- else if (init->expr_type == EXPR_ARRAY)
+ int k = gfc_validate_kind (BT_INTEGER, gfc_charlen_int_kind,
+false);
+ /* resolve_charlen will complain later on if the length
+is too large.  Just skip the initialization in that case.  */
+ if (mpz_cmp (sym->ts.u.cl->length->value.integer,
+  gfc_integer_kinds[k].huge) <= 0)
{
- gfc_constructor *c;
+ HOST_WIDE_INT len
+   = gfc_mpz_get_hwi (sym->ts.u.cl->length->value.integer);
+
+ if (init->expr_type == EXPR_CONSTANT)
+   gfc_set_constant_character_len (len, init, -1);
+ else if (init->expr_type == EXPR_ARRAY)
+   {
+ gfc_constructor *c;
 
- /* Build a new charlen to prevent simplification from
-deleting the length before it is resolved.  */
- init->ts.u.cl = gfc_new_charlen (gfc_current_ns, NULL);
- init->ts.u.cl->length = gfc_copy_expr (sym->ts.u.cl->length);
+ /* Build a new charlen to prevent simplification from
+deleting the length before it is resolved.  */
+ init->ts.u.cl = gfc_new_charlen (gfc_current_ns, NULL);
+ init->ts.u.cl->length
+   = gfc_copy_expr (sym->ts.u.cl->length);
 
- for (c = gfc_constructor_first (init->value.constructor);
-  c; c = gfc_constructor_next (c))
-   gfc_set_constant_character_len (len, c->expr, -1);
+ for (c = gfc_constructor_first (init->value.constructor);
+  c; c = gfc_constructor_next (c))
+   gfc_set_constant_character_len (len, c->expr, -1);
+   }
}
}
}

Jakub


Re: [PATCH] Extend lazy computation of expensive preprocessor macros to FLOATN*

2018-01-26 Thread Joseph Myers
On Sat, 27 Jan 2018, Jakub Jelinek wrote:

> Hi!
> 
> Honza reported today on IRC that we again spend significant time, even when
> compiling an empty file, computing the preprocessor *_MAX/*_MIN etc. macros.
> In 2010 I added lazy computation for these, so that they are only computed
> when first used (except with -dD), but reserved just 12 entries for them, as
> only the FLT/DBL/LDBL-prefixed macros (4 for each kind) were needed at that
> point.  In 2016, for PR32187, Joseph added a bunch of other kinds, and
> because there is no space reserved for them in the array, they are
> evaluated right away, which is quite expensive.
> 
> The following patch makes them lazy again.
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Rather than hardcoding a number 36 I'd rather it used an explicit 
computation (with explanatory comment) such as 4 * (3 + 
NUM_FLOATN_NX_TYPES) (if that is indeed a safe value to use to guarantee 
covering all these types).

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: C++ PATCH to fix ICE with vector expr folding (PR c++/83659)

2018-01-26 Thread Jakub Jelinek
On Fri, Jan 26, 2018 at 02:11:19PM +0100, Richard Biener wrote:
> >> POINTER_PLUS_EXPR offsets are to be interpreted as signed (ptrdiff_t)
> >> so using uhwi and then performing an unsigned division is wrong code.
> >> See mem_ref_offset how to deal with this (ugh - poly-ints...).  Basically
> >> you have to force the thing to signed.  Like just use
> >>
> >>   HOST_WIDE_INT offset = TREE_INT_CST_LOW (op01);
> >
> > Does it really matter here though?  Any negative offsets there are UB, we
> > should punt on them rather than try to optimize them.
> > As we know op01 is unsigned, if we check that it fits into shwi_p, it means
> > it will be 0 to shwi max and then we can handle it in uhwi too.
> 
> Ah, of course.  Didn't look up enough context to see what this code
> does in the end ;)
> 
> >   /* ((foo*)&vectorfoo)[1] => BIT_FIELD_REF */
> >   if (VECTOR_TYPE_P (op00type)
> >   && (same_type_ignoring_top_level_qualifiers_p
> > -(type, TREE_TYPE (op00type
> > +(type, TREE_TYPE (op00type)))
> > + && tree_fits_shwi_p (op01))
> 
> nevertheless this apparent "mismatch" deserves a comment (checking
> shwi and using uhwi).

So like this?

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2018-01-26  Marek Polacek  
Jakub Jelinek  

PR c++/83659
* constexpr.c (cxx_fold_indirect_ref): Use unsigned HOST_WIDE_INT
when computing offsets.  Verify first that tree_fits_shwi_p (op01).
Formatting fix.

* g++.dg/torture/pr83659.C: New test.

--- gcc/cp/constexpr.c.jj   2018-01-24 17:18:42.392392254 +0100
+++ gcc/cp/constexpr.c  2018-01-26 18:54:43.991828743 +0100
@@ -3070,7 +3070,8 @@ cxx_fold_indirect_ref (location_t loc, t
{
  tree part_width = TYPE_SIZE (type);
  tree index = bitsize_int (0);
- return fold_build3_loc (loc, BIT_FIELD_REF, type, op, part_width, index);
+ return fold_build3_loc (loc, BIT_FIELD_REF, type, op, part_width,
+ index);
}
   /* Also handle conversion to an empty base class, which
 is represented with a NOP_EXPR.  */
@@ -3109,12 +3110,22 @@ cxx_fold_indirect_ref (location_t loc, t
 
  /* ((foo*)&vectorfoo)[1] => BIT_FIELD_REF */
  if (VECTOR_TYPE_P (op00type)
- && (same_type_ignoring_top_level_qualifiers_p
- (type, TREE_TYPE (op00type
+ && same_type_ignoring_top_level_qualifiers_p
+   (type, TREE_TYPE (op00type))
+ /* POINTER_PLUS_EXPR second operand is sizetype, unsigned,
+but we want to treat offsets with MSB set as negative.
+For the code below negative offsets are invalid and
+TYPE_SIZE of the element is something unsigned, so
+check whether op01 fits into HOST_WIDE_INT, which implies
+it is from 0 to INTTYPE_MAXIMUM (HOST_WIDE_INT), and
+then just use tree_to_uhwi because we want to treat the
+value as unsigned.  */
+ && tree_fits_shwi_p (op01))
{
- HOST_WIDE_INT offset = tree_to_shwi (op01);
+ unsigned HOST_WIDE_INT offset = tree_to_uhwi (op01);
  tree part_width = TYPE_SIZE (type);
- unsigned HOST_WIDE_INT part_widthi = tree_to_shwi (part_width)/BITS_PER_UNIT;
+ unsigned HOST_WIDE_INT part_widthi
+   = tree_to_uhwi (part_width) / BITS_PER_UNIT;
  unsigned HOST_WIDE_INT indexi = offset * BITS_PER_UNIT;
  tree index = bitsize_int (indexi);
 
--- gcc/testsuite/g++.dg/torture/pr83659.C.jj   2018-01-26 18:46:40.077572013 +0100
+++ gcc/testsuite/g++.dg/torture/pr83659.C  2018-01-26 18:56:36.822888606 +0100
@@ -0,0 +1,18 @@
+// PR c++/83659
+// { dg-do compile }
+
+typedef int V __attribute__ ((__vector_size__ (16)));
+V a;
+V b[2];
+
+int
+foo ()
+{
+  return reinterpret_cast <int *> (&a)[-1] += 1;
+}
+
+int
+bar ()
+{
+  return reinterpret_cast <int *> (&a[1])[-1];
+}


Jakub
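
As a standalone illustration of why the tree_fits_shwi_p check makes the
later tree_to_uhwi use safe; my example, independent of the patch:

  #include <stdio.h>
  #include <stdint.h>

  int main (void)
  {
    /* A POINTER_PLUS_EXPR offset of -16 is stored as a sizetype value
       with the MSB set.  */
    uint64_t off = (uint64_t) -16;
    /* The tree_fits_shwi_p-style check rejects values above INT64_MAX,
       so negative offsets never reach the unsigned division below.  */
    if (off > (uint64_t) INT64_MAX)
      puts ("punt: offset does not fit in a signed HOST_WIDE_INT");
    else
      printf ("index = %llu\n", (unsigned long long) (off / 16));
    return 0;
  }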


Re: [PATCH] Extend lazy computation of expensive preprocessor macros to FLOATN*

2018-01-26 Thread Jakub Jelinek
On Fri, Jan 26, 2018 at 11:22:04PM +, Joseph Myers wrote:
> On Sat, 27 Jan 2018, Jakub Jelinek wrote:
> > Honza reported today on IRC that we again spend significant time, even when
> > compiling an empty file, computing the preprocessor *_MAX/*_MIN etc. macros.
> > In 2010 I added lazy computation for these, so that they are only computed
> > when first used (except with -dD), but reserved just 12 entries for them, as
> > only the FLT/DBL/LDBL-prefixed macros (4 for each kind) were needed at that
> > point.  In 2016, for PR32187, Joseph added a bunch of other kinds, and
> > because there is no space reserved for them in the array, they are
> > evaluated right away, which is quite expensive.
> > 
> > The following patch makes them lazy again.
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> Rather than hardcoding a number 36 I'd rather it used an explicit 
> computation (with explanatory comment) such as 4 * (3 + 
> NUM_FLOATN_NX_TYPES) (if that is indeed a safe value to use to guarantee 
> covering all these types).

Works for me, this tests fine on a couple of tests, ok for trunk if it
passes bootstrap/regtest?

2018-01-27  Jakub Jelinek  

* c-cppbuiltin.c (c_cpp_builtins): Use ggc_strdup for the fp_suffix
argument.
(LAZY_HEX_FP_VALUES_CNT): Define.
(lazy_hex_fp_values): Allow up to LAZY_HEX_FP_VALUES_CNT lazy hex fp
values rather than just 12.
(builtin_define_with_hex_fp_value): Likewise.

* include/cpplib.h (enum cpp_builtin_type): Change BT_LAST_USER from
BT_FIRST_USER + 31 to BT_FIRST_USER + 63.

--- gcc/c-family/c-cppbuiltin.c.jj  2018-01-03 10:20:21.369538150 +0100
+++ gcc/c-family/c-cppbuiltin.c 2018-01-26 11:01:15.266648197 +0100
@@ -1124,8 +1124,8 @@ c_cpp_builtins (cpp_reader *pfile)
   floatn_nx_types[i].extended ? "X" : "");
   sprintf (csuffix, "F%d%s", floatn_nx_types[i].n,
   floatn_nx_types[i].extended ? "x" : "");
-  builtin_define_float_constants (prefix, csuffix, "%s", csuffix,
- FLOATN_NX_TYPE_NODE (i));
+  builtin_define_float_constants (prefix, ggc_strdup (csuffix), "%s",
+ csuffix, FLOATN_NX_TYPE_NODE (i));
 }
 
   /* For decfloat.h.  */
@@ -1571,7 +1571,14 @@ struct GTY(()) lazy_hex_fp_value_struct
   int digits;
   const char *fp_suffix;
 };
-static GTY(()) struct lazy_hex_fp_value_struct lazy_hex_fp_values[12];
+/* Number of the expensive to compute macros we should evaluate lazily.
+   Each builtin_define_float_constants invocation calls
+   builtin_define_with_hex_fp_value 4 times and builtin_define_float_constants
+   is called for FLT, DBL, LDBL and up to NUM_FLOATN_NX_TYPES times for
+   FLTNN*.  */ 
+#define LAZY_HEX_FP_VALUES_CNT (4 * (3 + NUM_FLOATN_NX_TYPES))
+static GTY(()) struct lazy_hex_fp_value_struct
+  lazy_hex_fp_values[LAZY_HEX_FP_VALUES_CNT];
 static GTY(()) int lazy_hex_fp_value_count;
 
 static bool
@@ -1616,7 +1623,7 @@ builtin_define_with_hex_fp_value (const
   char dec_str[64], buf[256], buf1[128], buf2[64];
 
   /* This is very expensive, so if possible expand them lazily.  */
-  if (lazy_hex_fp_value_count < 12
+  if (lazy_hex_fp_value_count < LAZY_HEX_FP_VALUES_CNT
   && flag_dump_macros == 0
   && !cpp_get_options (parse_in)->traditional)
 {
--- libcpp/include/cpplib.h.jj  2018-01-18 21:11:59.890207215 +0100
+++ libcpp/include/cpplib.h 2018-01-26 10:58:10.249699482 +0100
@@ -719,7 +719,7 @@ enum cpp_builtin_type
   BT_COUNTER,  /* `__COUNTER__' */
   BT_HAS_ATTRIBUTE,/* `__has_attribute__(x)' */
   BT_FIRST_USER,   /* User defined builtin macros.  */
-  BT_LAST_USER = BT_FIRST_USER + 31
+  BT_LAST_USER = BT_FIRST_USER + 63
 };
 
 #define CPP_HASHNODE(HNODE)((cpp_hashnode *) (HNODE))


Jakub
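
For concreteness: assuming the six _FloatN/_FloatNx types (_Float16,
_Float32, _Float64, _Float128, _Float32x, _Float64x) and the four hex-fp
macros per type (MAX, MIN, EPSILON, DENORM_MIN), the formula reproduces
the constant hardcoded in the first version of the patch; the arithmetic
here is mine, offered only as a sanity check:

  #define LAZY_HEX_FP_VALUES_CNT (4 * (3 + 6))  /* == 36, matching v1 */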


[PATCH] PR fortran/83633 -- check for nonconstant explicit shape array

2018-01-26 Thread Steve Kargl
The attached patch implements a check for F2015:C830.
The wording of F2008:C531 is nearly identical, but
the restriction on BLOCK is noted in the normative text.
The 3 lines in the new testcase should be sufficient to
see the issue.  In regression testing, I needed to
adjust the regex pattern in a few existing tests because
it seems my patch now catches the problem earlier.

Regression tested on x86_64-*-freebsd.  OK to commit?

2018-01-26  Steven G. Kargl  

PR fortran/83633
* decl.c (variable_decl): Check that an explicit-shape-array with
nonconstant bounds is allowed.
 
2018-01-26  Steven G. Kargl  

PR fortran/83633
* gfortran.dg/explicit_shape_1.f90: New test.
* gfortran.dg/automatic_module_variable.f90: Update regex.
* gfortran.dg/bad_automatic_objects_1.f90: Ditto.
* gfortran.dg/constant_shape.f90: Ditto.
* gfortran.dg/dec_structure_23.f90: Ditto.
* gfortran.dg/pr78240.f90: Ditto.

-- 
Steve
Index: gcc/fortran/decl.c
===
--- gcc/fortran/decl.c	(revision 257103)
+++ gcc/fortran/decl.c	(working copy)
@@ -2280,7 +2280,10 @@ variable_decl (int elem)
   /* At this point, we know for sure if the symbol is PARAMETER and can thus
  determine (and check) whether it can be implied-shape.  If it
  was parsed as assumed-size, change it because PARAMETERs can not
- be assumed-size.  */
+ be assumed-size.
+
+ An explicit-shape-array cannot appear under several conditions.
+ That check is done here as well.  */
   if (as)
 {
   if (as->type == AS_IMPLIED_SHAPE && current_attr.flavor != FL_PARAMETER)
@@ -2302,6 +2305,44 @@ variable_decl (int elem)
 	  m = MATCH_ERROR;
 	  goto cleanup;
 	}
+
+  /* F2015:C830 (R816) An explicit-shape-spec whose bounds are not
+	 constant expressions shall appear only in a subprogram, derived
+	 type definition, BLOCK construct, or interface body.  */
+  if (as->type == AS_EXPLICIT
+	  && gfc_current_state () != COMP_BLOCK
+	  && gfc_current_state () != COMP_DERIVED
+	  && gfc_current_state () != COMP_FUNCTION
+	  && gfc_current_state () != COMP_INTERFACE
+	  && gfc_current_state () != COMP_SUBROUTINE)
+	{
+	  gfc_expr *e;
+	  bool not_constant = false;
+
+	  for (int i = 0; i < as->rank + as->corank; i++)
+	{
+	  e = gfc_copy_expr (as->lower[i]);
+	  gfc_resolve_expr (e);
+	  gfc_simplify_expr (e, 0);
+	  if (e && (e->expr_type != EXPR_CONSTANT))
+		not_constant = true;
+	  gfc_free_expr (e);
+
+	  e = gfc_copy_expr (as->upper[i]);
+	  gfc_resolve_expr (e);
+	  gfc_simplify_expr (e, 0);
+	  if (e && (e->expr_type != EXPR_CONSTANT))
+		not_constant = true;
+	  gfc_free_expr (e);
+	}
+
+	  if (not_constant)
+	{ 
+	  gfc_error ("Explicit shaped array with nonconstant bounds at %C");
+	  m = MATCH_ERROR;
+	  goto cleanup;
+	}
+	}
 }
 
   char_len = NULL;
@@ -2421,6 +2462,7 @@ variable_decl (int elem)
   && gfc_current_block ()->result
   && strcmp ("ppr@", gfc_current_block ()->result->name) == 0)
 strcpy (name, "ppr@");
+
 
   /* OK, we've successfully matched the declaration.  Now put the
  symbol in the current namespace, because it might be used in the
Index: gcc/testsuite/gfortran.dg/explicit_shape_1.f90
===
--- gcc/testsuite/gfortran.dg/explicit_shape_1.f90	(nonexistent)
+++ gcc/testsuite/gfortran.dg/explicit_shape_1.f90	(working copy)
@@ -0,0 +1,7 @@
+! { dg-do compile }
+! PR fortran/83633
+! Original testcase by Nathan T. Weeks  
+!
+integer :: A(command_argument_count()) = 1 ! { dg-error "nonconstant bounds" }
+write (*,*) A
+end
Index: gcc/testsuite/gfortran.dg/automatic_module_variable.f90
===
--- gcc/testsuite/gfortran.dg/automatic_module_variable.f90	(revision 257103)
+++ gcc/testsuite/gfortran.dg/automatic_module_variable.f90	(working copy)
@@ -1,10 +1,12 @@
 ! { dg-do compile }
 ! Tests fix for PR15976
 !
+! Error message update with patch for PR fortran/83633
+!
 module sd
   integer, parameter :: n = 20
   integer :: i(n)
-  integer :: j(m) ! { dg-error "must have constant shape" }
+  integer :: j(m) ! { dg-error "array with nonconstant bounds" }
   integer, pointer :: p(:)
   integer, allocatable :: q(:)
 contains
Index: gcc/testsuite/gfortran.dg/bad_automatic_objects_1.f90
===
--- gcc/testsuite/gfortran.dg/bad_automatic_objects_1.f90	(revision 257103)
+++ gcc/testsuite/gfortran.dg/bad_automatic_objects_1.f90	(working copy)
@@ -5,16 +5,18 @@
 !
 ! Contributed by Joost VandeVondele  
 !
+! Error message update with patch for PR fortran/83633
+!
 module foo
   integer::  i
 end module foo
 module bar
   use foo
-  integer, dimension (i) :: j ! { dg-error "must have constant shape" }
+  integer, dimension (i) 

Re: [PATCH] Extend lazy computation of expensive preprocessor macros to FLOATN*

2018-01-26 Thread Joseph Myers
On Sat, 27 Jan 2018, Jakub Jelinek wrote:

> Works for me, this tests fine on a couple of tests, ok for trunk if it
> passes bootstrap/regtest?
> 
> 2018-01-27  Jakub Jelinek  
> 
>   * c-cppbuiltin.c (c_cpp_builtins): Use ggc_strdup for the fp_suffix
>   argument.
>   (LAZY_HEX_FP_VALUES_CNT): Define.
>   (lazy_hex_fp_values): Allow up to LAZY_HEX_FP_VALUES_CNT lazy hex fp
>   values rather than just 12.
>   (builtin_define_with_hex_fp_value): Likewise.
> 
>   * include/cpplib.h (enum cpp_builtin_type): Change BT_LAST_USER from
>   BT_FIRST_USER + 31 to BT_FIRST_USER + 63.

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] i386: Use const reference of struct ix86_frame to avoid copy

2018-01-26 Thread H.J. Lu
On Thu, Jan 18, 2018 at 12:23 AM, Uros Bizjak  wrote:
> On Wed, Jan 17, 2018 at 5:00 PM, H.J. Lu  wrote:
>> We can use const reference of struct ix86_frame to avoid making a local
>> copy of ix86_frame.  ix86_expand_epilogue makes a local copy of struct
>> ix86_frame and uses the reg_save_offset field as a local variable.  This
>> patch uses a separate local variable for reg_save_offset.
>>
>> Tested on x86-64 with ada.  OK for trunk?
>
> OK.
>

I'd like to backport it to gcc-7-branch.   Is that OK?

Thanks.


H.J.
>
>> H.J.
>> --
>> PR target/83905
>> * config/i386/i386.c (ix86_expand_prologue): Use const reference
>> of struct ix86_frame.
>> (ix86_expand_epilogue): Likewise.  Add a local variable for
>> the reg_save_offset field in struct ix86_frame.
>> ---
>>  gcc/config/i386/i386.c | 24 
>>  1 file changed, 12 insertions(+), 12 deletions(-)
>>
>> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>> index a301e18ed70..340eca42449 100644
>> --- a/gcc/config/i386/i386.c
>> +++ b/gcc/config/i386/i386.c
>> @@ -13385,7 +13385,6 @@ ix86_expand_prologue (void)
>>  {
>>struct machine_function *m = cfun->machine;
>>rtx insn, t;
>> -  struct ix86_frame frame;
>>HOST_WIDE_INT allocate;
>>bool int_registers_saved;
>>bool sse_registers_saved;
>> @@ -13413,7 +13412,7 @@ ix86_expand_prologue (void)
>>m->fs.sp_valid = true;
>>m->fs.sp_realigned = false;
>>
>> -  frame = m->frame;
>> +  const struct ix86_frame &frame = cfun->machine->frame;
>>
>>if (!TARGET_64BIT && ix86_function_ms_hook_prologue 
>> (current_function_decl))
>>  {
>> @@ -14291,7 +14290,6 @@ ix86_expand_epilogue (int style)
>>  {
>>struct machine_function *m = cfun->machine;
>>struct machine_frame_state frame_state_save = m->fs;
>> -  struct ix86_frame frame;
>>bool restore_regs_via_mov;
>>bool using_drap;
>>bool restore_stub_is_tail = false;
>> @@ -14304,7 +14302,7 @@ ix86_expand_epilogue (int style)
>>  }
>>
>>ix86_finalize_stack_frame_flags ();
>> -  frame = m->frame;
>> +  const struct ix86_frame &frame = cfun->machine->frame;
>>
>>m->fs.sp_realigned = stack_realign_fp;
>>m->fs.sp_valid = stack_realign_fp
>> @@ -14348,11 +14346,13 @@ ix86_expand_epilogue (int style)
>>   + UNITS_PER_WORD);
>>  }
>>
>> +  HOST_WIDE_INT reg_save_offset = frame.reg_save_offset;
>> +
>>/* Special care must be taken for the normal return case of a function
>>   using eh_return: the eax and edx registers are marked as saved, but
>>   not restored along this path.  Adjust the save location to match.  */
>>if (crtl->calls_eh_return && style != 2)
>> -frame.reg_save_offset -= 2 * UNITS_PER_WORD;
>> +reg_save_offset -= 2 * UNITS_PER_WORD;
>>
>>/* EH_RETURN requires the use of moves to function properly.  */
>>if (crtl->calls_eh_return)
>> @@ -14368,11 +14368,11 @@ ix86_expand_epilogue (int style)
>>else if (TARGET_EPILOGUE_USING_MOVE
>>&& cfun->machine->use_fast_prologue_epilogue
>>&& (frame.nregs > 1
>> -  || m->fs.sp_offset != frame.reg_save_offset))
>> +  || m->fs.sp_offset != reg_save_offset))
>>  restore_regs_via_mov = true;
>>else if (frame_pointer_needed
>>&& !frame.nregs
>> -  && m->fs.sp_offset != frame.reg_save_offset)
>> +  && m->fs.sp_offset != reg_save_offset)
>>  restore_regs_via_mov = true;
>>else if (frame_pointer_needed
>>&& TARGET_USE_LEAVE
>> @@ -14440,7 +14440,7 @@ ix86_expand_epilogue (int style)
>>rtx t;
>>
>>if (frame.nregs)
>> -   ix86_emit_restore_regs_using_mov (frame.reg_save_offset, style == 2);
>> +   ix86_emit_restore_regs_using_mov (reg_save_offset, style == 2);
>>
>>/* eh_return epilogues need %ecx added to the stack pointer.  */
>>if (style == 2)
>> @@ -14535,19 +14535,19 @@ ix86_expand_epilogue (int style)
>>  in epilogues.  */
>>if (!m->fs.sp_valid || m->fs.sp_realigned
>>   || (TARGET_SEH
>> - && (m->fs.sp_offset - frame.reg_save_offset
>> + && (m->fs.sp_offset - reg_save_offset
>>   >= SEH_MAX_FRAME_SIZE)))
>> {
>>   pro_epilogue_adjust_stack (stack_pointer_rtx, 
>> hard_frame_pointer_rtx,
>>  GEN_INT (m->fs.fp_offset
>> - - frame.reg_save_offset),
>> + - reg_save_offset),
>>  style, false);
>> }
>> -  else if (m->fs.sp_offset != frame.reg_save_offset)
>> +  else if (m->fs.sp_offset != reg_save_offset)
>> {
>>   pro_epilogue_adjust_stack (stack_pointer_rtx, stack_pointer_rtx,
>>  GEN_INT (m->fs.sp_offset
>> - - frame.reg_save_offset),
>> + 

Go patch committed: Show readable names in escape analysis messages

2018-01-26 Thread Ian Lance Taylor
This patch to the Go frontend shows readable names in the messages
printed for escape analysis.  This will print names like `x` instead
of `.p.x`.  Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.
Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 257069)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-897ce971b06a39c217d02dce9e1361bc7a240188
+13b25c25faa8afd625732d2630a4f9ece5cacb2e
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/escape.cc
===
--- gcc/go/gofrontend/escape.cc (revision 257033)
+++ gcc/go/gofrontend/escape.cc (working copy)
@@ -131,7 +131,7 @@ Node::ast_format(Gogo* gogo) const
   Named_object* no = this->object();
   if (no->is_function() && no->func_value()->enclosing() != NULL)
return "func literal";
-  ss << no->name();
+  ss << no->message_name();
 }
   else if (this->expr() != NULL)
 {
Index: gcc/go/gofrontend/expressions.cc
===
--- gcc/go/gofrontend/expressions.cc(revision 257033)
+++ gcc/go/gofrontend/expressions.cc(working copy)
@@ -783,7 +783,7 @@ Var_expression::do_get_backend(Translate
 void
 Var_expression::do_dump_expression(Ast_dump_context* ast_dump_context) const
 {
-  ast_dump_context->ostream() << this->variable_->name() ;
+  ast_dump_context->ostream() << this->variable_->message_name() ;
 }
 
 // Make a reference to a variable in an expression.
@@ -859,7 +859,7 @@ Enclosed_var_expression::do_address_take
 void
 Enclosed_var_expression::do_dump_expression(Ast_dump_context* adc) const
 {
-  adc->ostream() << this->variable_->name();
+  adc->ostream() << this->variable_->message_name();
 }
 
 // Make a reference to a variable within an enclosing function.
Index: gcc/go/gofrontend/wb.cc
===
--- gcc/go/gofrontend/wb.cc (revision 256820)
+++ gcc/go/gofrontend/wb.cc (working copy)
@@ -151,7 +151,7 @@ Check_escape::variable(Named_object* no)
   && no->result_var_value()->is_in_heap()))
 go_error_at(no->location(),
 "%s escapes to heap, not allowed in runtime",
-no->name().c_str());
+no->message_name().c_str());
   return TRAVERSE_CONTINUE;
 }
 


[PATCH] RISC-V: Allow register pairs for 64-bit target.

2018-01-26 Thread Jim Wilson
This was noticed while looking at some 64-bit libgcc code.  Some functions
like negti had a stack frame allocated, but did not use the stack.  The 32-bit
equivalent negdi did not have a stack frame allocated.  This is because a
128-bit local variable got allocated to the stack and then optimized away.
But since we allow register pairs for the 32-bit target, we should allow
them for the 64-bit target too.  Fixed by adding a missing definition for
MAX_FIXED_MODE_SIZE.

This was tested by multiple 32-bit and 64-bit newlib and glibc builds and
make checks.  There were no regressions.  The libgcc code for negti no longer
has a stack frame allocated.  libstdc++.so is 3KB smaller with the patch.

Committed.

Jim

gcc/
* config/riscv/riscv.h (MAX_FIXED_MODE_SIZE): New.
---
 gcc/config/riscv/riscv.h | 4 
 1 file changed, 4 insertions(+)

diff --git a/gcc/config/riscv/riscv.h b/gcc/config/riscv/riscv.h
index a002bff4480..1c1c3431119 100644
--- a/gcc/config/riscv/riscv.h
+++ b/gcc/config/riscv/riscv.h
@@ -158,6 +158,10 @@ along with GCC; see the file COPYING3.  If not see
 
 #define PCC_BITFIELD_TYPE_MATTERS 1
 
+/* An integer expression for the size in bits of the largest integer machine
+   mode that should actually be used.  We allow pairs of registers.  */
+#define MAX_FIXED_MODE_SIZE GET_MODE_BITSIZE (TARGET_64BIT ? TImode : DImode)
+
 /* If defined, a C expression to compute the alignment for a static
variable.  TYPE is the data type, and ALIGN is the alignment that
the object would ordinarily have.  The value of this macro is used
-- 
2.14.1
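
For illustration, the kind of function affected, as a C analogue of
libgcc's TImode negate; this example is mine:

  /* With TImode now allowed by MAX_FIXED_MODE_SIZE on rv64, the 128-bit
     temporary lives in a register pair and no stack frame is needed.  */
  __int128 my_negti (__int128 a)
  {
    return -a;
  }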



Re: [PATCH] Avoid O(n^2) compile time issue in sched2 with debug insns (PR middle-end/84040)

2018-01-26 Thread Vladimir Makarov



On 01/26/2018 06:15 PM, Jakub Jelinek wrote:

Hi!

On a testcase which has 10 consecutive debug insns, sched2 spends
a lot of time calling prev_nonnote_nondebug_insn on each of the debug
insns, even though it is completely useless: no target wants
to fuse a non-debug insn with some debug insn after it; fusing makes sense
only for two non-debug insns.

By returning early for those, we'll just walk the long set of them once when
we process some non-debug instruction after a long block of debug insns.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?


OK.


2018-01-26  Jakub Jelinek  

PR middle-end/84040
* sched-deps.c (sched_macro_fuse_insns): Return immediately for
debug insns.

--- gcc/sched-deps.c.jj 2018-01-03 10:19:56.301534141 +0100
+++ gcc/sched-deps.c2018-01-26 16:21:01.922414579 +0100
@@ -2834,10 +2834,16 @@ static void
  sched_macro_fuse_insns (rtx_insn *insn)
  {
rtx_insn *prev;
+  /* No target hook would return true for a debug insn as either
+ hook operand, and with very large sequences of only debug insns,
+ calling sched_macro_fuse_insns on each of them would have quadratic
+ compile time complexity.  */
+  if (DEBUG_INSN_P (insn))
+return;
prev = prev_nonnote_nondebug_insn (insn);
if (!prev)
  return;
-
+
if (any_condjump_p (insn))
  {
unsigned int condreg1, condreg2;


Re: [PATCH] Fix PR84003

2018-01-26 Thread Richard Biener
On Fri, 26 Jan 2018, Richard Biener wrote:

> On Fri, 26 Jan 2018, Richard Sandiford wrote:
> 
> > Richard Biener  writes:
> > > On Thu, 25 Jan 2018, Richard Sandiford wrote:
> > >
> > >> Richard Sandiford  writes:
> > >> > Richard Biener  writes:
> > >> The following patch fixes PR84003 where RTL DSE removes a redundant
> > >> store (a store storing the same value as an earlier store) but in
> > >> doing so changes program semantics, because the later store changes
> > >> the effective type of the memory location.  This in turn allows
> > >> >> sched2 to do an originally invalid transform.
> > >> >>
> > >> >> The fix is to harden DSE so that while removing the later store
> > >> >> the earlier stores alias-sets are changed to reflect both dynamic
> > >> >> types - which means using alias-set zero.
> > >> >>
> > >> >> On GIMPLE we simply avoid removing such type-changing stores because
> > >> >> we cannot easily get hold on the earlier store.
> > >> >>
> > >> >> Bootstrap and regtest running on x86_64-unknown-linux-gnu.
> > >> >>
> > >> >> Disclaimer: this is a "local" fix, I didn't try to understand much
> > >> >> of the DSE data structures but only learned from +-10 lines around
> > >> >> the affected transform which I found by searching for the dumped
> > >> >> messages ...  Richard, you contributed the pass so please review.
> > >> >
> > >> > So the file says.  In reality I only wrote an early version, and what
> > >> > was committed contains very little of that.  So TBH this pass is almost
> > >> > a complete unknown to me.  That said...
> > >> >
> > >> > Might it be worth keeping the store instead?
> > >> 
> > >> ...that is, like gimple. :-)  Sorry, I spent so long thinking about this
> > >> that I forgot you'd said that gimple already does the same thing.
> > >
> > > Yeah, and it still runs into PR82224 ... which means it is not 
> > > conservative enough.
> > >
> > >> Richard
> > >> 
> > >> > Giving things alias set
> > >> > zero seems like a pretty big hammer and would turn the remaining store
> > >> > into something close to a scheduling barrier.
> > >> >
> > >> > IOW, how about checking alias_set_subset_of in:
> > >> >
> > >> >  /* Even if PTR won't be eliminated as unneeded, if both
> > >> > PTR and this insn store the same constant value, we might
> > >> > eliminate this insn instead.  */
> > >> >  if (s_info->const_rhs
> > >> >  && const_rhs
> > >> >  && known_subrange_p (offset, width,
> > >> >   s_info->offset, s_info->width)
> > >> >  && all_positions_needed_p (s_info, offset - 
> > >> > s_info->offset,
> > >> > width))
> > >> >
> > >> > instead?
> > >
> > > That's what GIMPLE does (use alias_set_subset_of), but it arguably
> > > doesn't work for unions (ok, there even the equality case doesn't 
> > > work...).
> > >
> > > I guess I thought it's worth trying sth new and adjust the earlier
> > > store ;)  Sth that I can't easily do on GIMPLE but everything seemed
> > > to be in place in RTL DSE.
> > >
> > > So, when looking at the above code it seems like there are exactly
> > > two insns we look at?  s_info and 'mem' I guess.  And s_info is
> > > the earlier one.
> > 
> > Yeah.
> > 
> > > So sth like
> > >
> > > Index: gcc/dse.c
> > > ===
> > > --- gcc/dse.c   (revision 257077)
> > > +++ gcc/dse.c   (working copy)
> > > @@ -1532,7 +1532,12 @@ record_store (rtx body, bb_info_t bb_inf
> > >   && known_subrange_p (offset, width,
> > >s_info->offset, s_info->width)
> > >   && all_positions_needed_p (s_info, offset - s_info->offset,
> > > -width))
> > > +width)
> > > + /* We can only remove the later store if the earlier store
> > > +aliases at least everything the later one does.  */
> > > + && (MEM_ALIAS_SET (mem) == MEM_ALIAS_SET (s_info->mem)
> > > + || alias_set_subset_of (MEM_ALIAS_SET (mem),
> > > + MEM_ALIAS_SET (s_info->mem))))
> > > {
> > >   if (GET_MODE (mem) == BLKmode)
> > > {
> > >
> > > ?  I can confirm that this patch works as well.  Is that what we prefer?
> > 
> > Not sure I can call that one, but it seems safer (especially for
> > backports).
> > 
> > > It would be consistent with what we do on GIMPLE currently (irrespective
> > > of still existing bugs in this area...).
> > >
> > > Note that with the above instead of dse1 removing the later store
> > > dse2 now decides to remove the earlier one...  (which is fine!).
> > >
> > > So I'm testing the above now, ok if it succeeds?
> > 
> > LGTM (but I can't approve it).
> 
> I can self-approve ;)
> 
> > I was wondering later... if the problem is