[patch] prevent .cfi_personality/.cfi_lsda on !dwarf eh configurations

2017-11-30 Thread Olivier Hainque
Hello,

Bootstrap is currently broken with Ada on x86_64-windows using SEH exceptions,
with several stage2/stage3 object comparison failures like

  Bootstrap comparison failure!
  gcc/ada/ali-util.o differs
  gcc/ada/ali.o differs
  ...

(PR 81470)

The differences come from variations between -g and -g0 on the placement of
data attached to LDFCM labels, in particular those related to the emission of
.cfi_personality and .cfi_lsda by dwarf2out_do_cfi_startproc.

These aren't needed for SEH anyway, so the proposed patch just prevents the
generation of those .cfi notes if the target eh format isn't dwarf2.

We have used this for a couple of years on gcc-6, it cures the same
bootstrap failure on gcc-7, and bootstraps + regression-tests fine
with mainline on x86_64-linux.

OK to commit ?

2017-11-30  Tristan Gingold  

* dwarf2out.c (dwarf2out_do_cfi_startproc): Only emit
.cfi_personality or .cfi_lsda if the eh data format is dwarf2.



cfi_startproc-nondwarf2.diff
Description: Binary data


[C++ PATCH] Plural forms for count != eltscnt structured binding diagnostics

2017-11-30 Thread Jakub Jelinek
On Wed, Nov 29, 2017 at 06:19:04PM -0700, Martin Sebor wrote:
> > --- gcc/cp/decl.c.jj2017-11-28 22:23:34.0 +0100
> > +++ gcc/cp/decl.c   2017-11-29 15:00:24.487658715 +0100
> > @@ -7518,6 +7518,12 @@ cp_finish_decomp (tree decl, tree first,
> >  "constant expression", type);
> >   goto error_out;
> > }
> > +  if (!tree_fits_uhwi_p (tsize))
> > +   {
> > + error_at (loc, "%u names provided while %qT decomposes into "
> 
> When count is 1 as in the test below the error isn't grammatically
> correct ("1 names").  I see that the same message is already issued
> elsewhere in the function so this seems like an opportunity to use
> the right form here and also fix the other one at the same time or
> in a followup.  The error_n function exists to issue the right form
> for the language, singular or plural.  It's not as convenient when
> the sentence contains two terms that may be singular or plural,
> but that can also be dealt with.

Here is an incremental patch to deal with that on top of the previously
posted patch, ok for trunk if it passes testing?  Note, not using inform_n
in this !tree_fits_uhwi_p case, because gettext doesn't recommend that for
negative values and the other values printed there (the very large ones) are
printed using hexadecimal where also I think human plural forms are rarely
useful.

2017-11-30  Jakub Jelinek  

* decl.c (cp_finish_decomp): Split up count != eltscnt and
!tree_fits_uhwi_p (tsize) error_at calls into error_n and inform_n
to handle plural forms properly.

* g++.dg/cpp1z/decomp3.C: Adjust for structured binding count
mismatch diagnostics split into error and warning with plural
forms.
* g++.dg/cpp1z/decomp10.C: Likewise.
* g++.dg/cpp1z/decomp32.C: Likewise.

--- gcc/cp/decl.c.jj2017-11-30 09:44:19.0 +0100
+++ gcc/cp/decl.c   2017-11-30 09:57:44.539504854 +0100
@@ -7445,11 +7445,18 @@ cp_finish_decomp (tree decl, tree first,
{
cnt_mismatch:
  if (count > eltscnt)
-   error_at (loc, "%u names provided while %qT decomposes into "
-  "%wu elements", count, type, eltscnt);
+   error_n (loc, count,
+"%u name provided for structured binding",
+"%u names provided for structured binding", count);
  else
-   error_at (loc, "only %u names provided while %qT decomposes into "
-  "%wu elements", count, type, eltscnt);
+   error_n (loc, count,
+"only %u name provided for structured binding",
+"only %u names provided for structured binding", count);
+ inform_n (loc, eltscnt != (unsigned long) eltscnt
+? (eltscnt % 100) + 100 : eltscnt,
+   "while %qT decomposes into %wu element",
+   "while %qT decomposes into %wu elements",
+   type, eltscnt);
  goto error_out;
}
   eltype = TREE_TYPE (type);
@@ -7520,8 +7527,11 @@ cp_finish_decomp (tree decl, tree first,
}
   if (!tree_fits_uhwi_p (tsize))
{
- error_at (loc, "%u names provided while %qT decomposes into "
-"%E elements", count, type, tsize);
+ error_n (loc, count,
+  "%u name provided for structured binding",
+  "%u names provided for structured binding", count);
+ inform (loc, "while %qT decomposes into %E elements",
+ type, tsize);
  goto error_out;
}
   eltscnt = tree_to_uhwi (tsize);
--- gcc/testsuite/g++.dg/cpp1z/decomp3.C.jj 2017-09-15 18:11:04.0 
+0200
+++ gcc/testsuite/g++.dg/cpp1z/decomp3.C2017-11-30 10:09:42.756619330 
+0100
@@ -51,16 +51,21 @@ int arr[4];
 void
 test3 (A &b, B c)
 {
-  auto [ d, e, f ] = arr;  // { dg-error "only 3 names provided 
while 'int .4.' decomposes into 4 elements" }
-   // { dg-warning "structured bindings 
only available with -std=c..17 or -std=gnu..17" "" { target c++14_down } .-1 }
-  auto & [ g, h, i, j, k ] = arr;  // { dg-error "5 names provided while 
'int .4.' decomposes into 4 elements" }
-   // { dg-warning "structured bindings 
only available with -std=c..17 or -std=gnu..17" "" { target c++14_down } .-1 }
-  auto [ l, m ] = b;   // { dg-error "only 2 names provided 
while 'A' decomposes into 3 elements" }
-   // { dg-warning "structured bindings 
only available with -std=c..17 or -std=gnu..17" "" { target c++14_down } .-1 }
-  auto & [ n, o, p, q ] = b;   // { dg-error "4 names provided while 
'A' decomposes into 3 elements" }
-   // { dg-warning "structured bindings 
only available with -std=c..17 or -std=gnu..17" "" { target c++14

Re: [PATCH] Improve __builtin_mul_overflow (uns, 1U << y, &res) (PR target/83210)

2017-11-30 Thread Richard Biener
On Wed, 29 Nov 2017, Jakub Jelinek wrote:

> Hi!
> 
> Even if target has an umulv4_optab pattern like i?86 mul[lq]; jo,
> if one argument is constant power of two, using two shifts, or
> for multiplication by 2 just shl with js on previous value is faster
> (though, for -Os mul[lq]; jo wins).
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

So this is because we don't go through the standard multiplication
expander, right?  But for icode == CODE_FOR_nothing we probably
should do this and then emit the compare-and-jump?

And for constant multiplicators in general as well?

As you do it you assume that the targets optab is less efficient
than two shifts and a compare but how do you know?  Shouldn't we
do regular mult expansion and cost the resulting sequence against
the one from the umulv expander?

That said, the patch looks x86 specific but done in generic code...

Thanks,
Richard.

> 2017-11-29  Jakub Jelinek  
> 
>   PR target/83210
>   * internal-fn.c (expand_mul_overflow): Optimize unsigned
>   multiplication by power of 2 constant into two shifts + comparison.
> 
>   * gcc.target/i386/pr83210.c: New test.
> 
> --- gcc/internal-fn.c.jj  2017-11-22 21:37:46.0 +0100
> +++ gcc/internal-fn.c 2017-11-29 17:44:17.699009169 +0100
> @@ -1462,6 +1462,39 @@ expand_mul_overflow (location_t loc, tre
>type = build_nonstandard_integer_type (GET_MODE_PRECISION (mode), uns);
>sign = uns ? UNSIGNED : SIGNED;
>icode = optab_handler (uns ? umulv4_optab : mulv4_optab, mode);
> +  if (uns
> +  && (integer_pow2p (arg0) || integer_pow2p (arg1))
> +  && (optimize_insn_for_speed_p () || icode == CODE_FOR_nothing))
> +{
> +  /* Optimize unsigned multiplication by power of 2 constant
> +  using 2 shifts, one for result, one to extract the shifted
> +  out bits to see if they are all zero.
> +  Don't do this if optimizing for size and we have umulv4_optab,
> +  in that case assume multiplication will be shorter.  */
> +  rtx opn0 = op0;
> +  rtx opn1 = op1;
> +  tree argn0 = arg0;
> +  tree argn1 = arg1;
> +  if (integer_pow2p (arg0))
> + {
> +   std::swap (opn0, opn1);
> +   std::swap (argn0, argn1);
> + }
> +  int cnt = tree_log2 (argn1);
> +  if (cnt >= 0 && cnt < GET_MODE_PRECISION (mode))
> + {
> +   rtx upper = const0_rtx;
> +   res = expand_shift (LSHIFT_EXPR, mode, opn0, cnt, NULL_RTX, uns);
> +   if (cnt != 0)
> + upper = expand_shift (RSHIFT_EXPR, mode, opn0,
> +   GET_MODE_PRECISION (mode) - cnt,
> +   NULL_RTX, uns);
> +   do_compare_rtx_and_jump (upper, const0_rtx, EQ, true, mode,
> +NULL_RTX, NULL, done_label,
> +profile_probability::very_likely ());
> +   goto do_error_label;
> + }
> +}
>if (icode != CODE_FOR_nothing)
>  {
>struct expand_operand ops[4];
> --- gcc/testsuite/gcc.target/i386/pr83210.c.jj2017-11-29 
> 17:58:00.102881065 +0100
> +++ gcc/testsuite/gcc.target/i386/pr83210.c   2017-11-29 17:57:36.0 
> +0100
> @@ -0,0 +1,53 @@
> +/* PR target/83210 */
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +/* { dg-final { scan-assembler-not {\mmul[lq]\M} } } */
> +
> +void bar (void);
> +
> +unsigned
> +f1 (unsigned int x)
> +{
> +  unsigned res;
> +  if (__builtin_mul_overflow (x, 2, &res))
> +bar ();
> +  return res;
> +}
> +
> +unsigned long
> +f2 (unsigned long x)
> +{
> +  unsigned long res;
> +  if (__builtin_mul_overflow (16, x, &res))
> +bar ();
> +  return res;
> +}
> +
> +unsigned long long
> +f3 (unsigned long long x)
> +{
> +  unsigned long long res;
> +  if (__builtin_mul_overflow (x, (1ULL << (__SIZEOF_LONG_LONG__ * 
> __CHAR_BIT__ - 1)), &res))
> +bar ();
> +  return res;
> +}
> +
> +#ifdef __SIZEOF_INT128__
> +unsigned __int128
> +f4 (unsigned __int128 x)
> +{
> +  unsigned __int128 res;
> +  if (__builtin_mul_overflow (x, (((unsigned __int128) 1) << 
> (__SIZEOF_INT128__ * __CHAR_BIT__ / 2)), &res))
> +bar ();
> +  return res;
> +}
> +
> +unsigned __int128
> +f5 (unsigned __int128 x)
> +{
> +  unsigned __int128 res;
> +  if (__builtin_mul_overflow (x, (((unsigned __int128) 1) << 
> (__SIZEOF_INT128__ * __CHAR_BIT__ / 2 + 3)), &res))
> +bar ();
> +  return res;
> +}
> +#endif
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


[PATCH,RFC] combine: Remove use_crosses_set_p

2017-11-30 Thread Segher Boessenkool
This removes use_crosses_set_p, and uses modified_between_p instead
everywhere it was used.  This improves optimisation.  I'm a little bit
worried it might increase compile time; so far I haven't seen it do
that though.

Longer term I would like to get rid of reg_stat completely, perhaps
use dataflow everywhere.

Comments/ideas welcome, both on this patch and that longer-term plan.


Segher


2017-11-30  Segher Boessenkool  

* combine.c: Adjust comment.
(use_crosses_set_p): Delete.
(can_combine_p): Use modified_between_p instead of use_crosses_set_p.
(try_combine): Ditto.

---
 gcc/combine.c | 69 ++-
 1 file changed, 7 insertions(+), 62 deletions(-)

diff --git a/gcc/combine.c b/gcc/combine.c
index 22a382d..748c9a8 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -39,7 +39,7 @@ along with GCC; see the file COPYING3.  If not see
insn as having a logical link to the preceding insn.  The same is true
for an insn explicitly using CC0.
 
-   We check (with use_crosses_set_p) to avoid combining in such a way
+   We check (with modified_between_p) to avoid combining in such a way
as to move a computation to a place where its value would be different.
 
Combination is done by mathematically substituting the previous
@@ -482,7 +482,6 @@ static void record_dead_and_set_regs_1 (rtx, const_rtx, 
void *);
 static void record_dead_and_set_regs (rtx_insn *);
 static int get_last_value_validate (rtx *, rtx_insn *, int, int);
 static rtx get_last_value (const_rtx);
-static int use_crosses_set_p (const_rtx, int);
 static void reg_dead_at_p_1 (rtx, const_rtx, void *);
 static int reg_dead_at_p (rtx, rtx_insn *);
 static void move_deaths (rtx, rtx, int, rtx_insn *, rtx *);
@@ -2011,7 +2010,7 @@ can_combine_p (rtx_insn *insn, rtx_insn *i3, rtx_insn 
*pred ATTRIBUTE_UNUSED,
   || (! all_adjacent
  && (((!MEM_P (src)
|| ! find_reg_note (insn, REG_EQUIV, src))
-  && use_crosses_set_p (src, DF_INSN_LUID (insn)))
+  && modified_between_p (src, insn, i3))
  || (GET_CODE (src) == ASM_OPERANDS && MEM_VOLATILE_P (src))
  || GET_CODE (src) == UNSPEC_VOLATILE))
   /* Don't combine across a CALL_INSN, because that would possibly
@@ -3727,7 +3726,7 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, 
rtx_insn *i0,
}
   else if (m_split_insn && NEXT_INSN (NEXT_INSN (m_split_insn)) == NULL_RTX
   && (next_nonnote_nondebug_insn (i2) == i3
-  || ! use_crosses_set_p (PATTERN (m_split_insn), DF_INSN_LUID 
(i2
+  || !modified_between_p (PATTERN (m_split_insn), i2, i3)))
{
  rtx i2set, i3set;
  rtx newi3pat = PATTERN (NEXT_INSN (m_split_insn));
@@ -3791,7 +3790,7 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, 
rtx_insn *i0,
  || can_change_dest_mode (i2dest, added_sets_2,
   GET_MODE (*split)))
  && (next_nonnote_nondebug_insn (i2) == i3
- || ! use_crosses_set_p (*split, DF_INSN_LUID (i2)))
+ || !modified_between_p (*split, i2, i3))
  /* We can't overwrite I2DEST if its value is still used by
 NEWPAT.  */
  && ! reg_referenced_p (i2dest, newpat))
@@ -3982,8 +3981,7 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, 
rtx_insn *i0,
   && GET_CODE (XVECEXP (newpat, 0, 1)) == SET
   && rtx_equal_p (SET_SRC (XVECEXP (newpat, 0, 1)),
   XEXP (SET_SRC (XVECEXP (newpat, 0, 0)), 0))
-  && ! use_crosses_set_p (SET_SRC (XVECEXP (newpat, 0, 1)),
-  DF_INSN_LUID (i2))
+  && !modified_between_p (SET_SRC (XVECEXP (newpat, 0, 1)), i2, i3)
   && GET_CODE (SET_DEST (XVECEXP (newpat, 0, 1))) != ZERO_EXTRACT
   && GET_CODE (SET_DEST (XVECEXP (newpat, 0, 1))) != STRICT_LOW_PART
   && ! (temp_expr = SET_DEST (XVECEXP (newpat, 0, 1)),
@@ -4057,7 +4055,7 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, 
rtx_insn *i0,
 be first.  The PARALLEL might also have been pre-existing in i3,
 so we need to make sure that we won't wrongly hoist a SET to i2
 that would conflict with a death note present in there.  */
-  if (!use_crosses_set_p (SET_SRC (set1), DF_INSN_LUID (i2))
+  if (!modified_between_p (SET_SRC (set1), i2, i3)
  && !(REG_P (SET_DEST (set1))
   && find_reg_note (i2, REG_DEAD, SET_DEST (set1)))
  && !(GET_CODE (SET_DEST (set1)) == SUBREG
@@ -4072,7 +4070,7 @@ try_combine (rtx_insn *i3, rtx_insn *i2, rtx_insn *i1, 
rtx_insn *i0,
  newi2pat = set1;
  newpat = set0;
}
-  else if (!use_crosses_set_p (SET_SRC (set0), DF_INSN_LUID (i2))
+  else if (!modified_between_p (SET_SRC (set0), i2, i3)
   && !(REG_P (SET_DEST (set0))
&& find_reg_n

generic retuning part 1 - x86-tune-costs update

2017-11-30 Thread Jan Hubicka
Hi,
this patch makes costs in generic to math better modern chips (core, haswell,
buldozer and zen).  The only important change is to drop cost of unaligned loads
and stores becuase all modern chips handle it well.  This makes vectorizer to
not peel for alignment and saves a lot of code size without sacrifying
performance.

I have benchmarked it on zen and skylake and it is small but almost consistent
win in performance too.  Notable regression is fma3d regressing aprox. 5%
on zen.  This is the case of native tuning as well, so I will look into it
incrementally.

Bootstrapped/regtested x86_64-linux, comitted.

Honza

PR target/81616
* x86-tnue-costs.h (generic_cost): Revise for modern CPUs
* gcc.target/i386/l_fma_double_1.c: Update count of fma instructions.
* gcc.target/i386/l_fma_double_2.c: Update count of fma instructions.
* gcc.target/i386/l_fma_double_3.c: Update count of fma instructions.
* gcc.target/i386/l_fma_double_4.c: Update count of fma instructions.
* gcc.target/i386/l_fma_double_5.c: Update count of fma instructions.
* gcc.target/i386/l_fma_double_6.c: Update count of fma instructions.
* gcc.target/i386/l_fma_float_1.c: Update count of fma instructions.
* gcc.target/i386/l_fma_float_2.c: Update count of fma instructions.
* gcc.target/i386/l_fma_float_3.c: Update count of fma instructions.
* gcc.target/i386/l_fma_float_4.c: Update count of fma instructions.
* gcc.target/i386/l_fma_float_5.c: Update count of fma instructions.
* gcc.target/i386/l_fma_float_6.c: Update count of fma instructions.
Index: config/i386/x86-tune-costs.h
===
--- config/i386/x86-tune-costs.h(revision 255252)
+++ config/i386/x86-tune-costs.h(working copy)
@@ -2243,11 +2243,11 @@ struct processor_costs generic_cost = {
COSTS_N_INSNS (4),  /*   HI */
COSTS_N_INSNS (3),  /*   SI */
COSTS_N_INSNS (4),  /*   DI */
-   COSTS_N_INSNS (2)}, /*other */
+   COSTS_N_INSNS (4)}, /*other */
   0,   /* cost of multiply per each bit set */
-  {COSTS_N_INSNS (18), /* cost of a divide/mod for QI */
-   COSTS_N_INSNS (26), /*  HI */
-   COSTS_N_INSNS (42), /*  SI */
+  {COSTS_N_INSNS (16), /* cost of a divide/mod for QI */
+   COSTS_N_INSNS (22), /*  HI */
+   COSTS_N_INSNS (30), /*  SI */
COSTS_N_INSNS (74), /*  DI */
COSTS_N_INSNS (74)},/*  
other */
   COSTS_N_INSNS (1),   /* cost of movsx */
@@ -2275,13 +2275,13 @@ struct processor_costs generic_cost = {
   2, 3, 4, /* cost of moving XMM,YMM,ZMM register 
*/
   {6, 6, 6, 10, 15},   /* cost of loading SSE registers
   in 32,64,128,256 and 512-bit */
-  {10, 10, 10, 15, 20},/* cost of unaligned loads.  */
+  {6, 6, 6, 10, 15},   /* cost of unaligned loads.  */
   {6, 6, 6, 10, 15},   /* cost of storing SSE registers
   in 32,64,128,256 and 512-bit */
-  {10, 10, 10, 15, 20},/* cost of unaligned storess.  
*/
-  20, 20,  /* SSE->integer and integer->SSE moves 
*/
-  6, 6,/* Gather load static, per_elt. 
 */
-  6, 6,/* Gather store static, 
per_elt.  */
+  {6, 6, 6, 10, 15},   /* cost of unaligned storess.  */
+  6, 6,/* SSE->integer and 
integer->SSE moves */
+  18, 6,   /* Gather load static, per_elt.  */
+  18, 6,   /* Gather store static, per_elt.  */
   32,  /* size of l1 cache.  */
   512, /* size of l2 cache.  */
   64,  /* size of prefetch block */
@@ -2290,11 +2290,11 @@ struct processor_costs generic_cost = {
  value is increased to perhaps more appropriate value of 5.  */
   3,   /* Branch cost */
   COSTS_N_INSNS (3),   /* cost of FADD and FSUB insns.  */
-  COSTS_N_INSNS (3),   /* cost of FMUL instruction.  */
+  COSTS_N_INSNS (5),   /* cost of FMUL instruction.  */
   COSTS_N_INSNS (20),  /* cost of FDIV instruction.  */

Re: [PATCH] Improve __builtin_mul_overflow (uns, 1U << y, &res) (PR target/83210)

2017-11-30 Thread Jakub Jelinek
On Thu, Nov 30, 2017 at 10:26:33AM +0100, Richard Biener wrote:
> On Wed, 29 Nov 2017, Jakub Jelinek wrote:
> 
> > Hi!
> > 
> > Even if target has an umulv4_optab pattern like i?86 mul[lq]; jo,
> > if one argument is constant power of two, using two shifts, or
> > for multiplication by 2 just shl with js on previous value is faster
> > (though, for -Os mul[lq]; jo wins).
> > 
> > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> So this is because we don't go through the standard multiplication
> expander, right?

We sometimes go through the standard multiplication expander, sometimes
widening multiplication expander, sometimes highpart multiplication
expander, also sometimes with the arguments forced into registers because
we pick various subregs of them etc.  expand_mul_overflow already now has
a lots of cases.

>  But for icode == CODE_FOR_nothing we probably
> should do this and then emit the compare-and-jump?

The icode == CODE_FOR_nothing check is there because typically the
code when backend doesn't provide anything is larger or much larger, just
see how many different expansions there are.

The patch handles the stuff fully, i.e. emits two shifts plus
compare-and-jump, and does that for the power of 2 constants for
icode == CODE_FOR_nothing always and otherwise (i.e. i?86/x86_64)
only if not optimizing for size.

> And for constant multiplicators in general as well?

See the PR, I'm not convinced that is at least generally a win.

The case of two constant arguments should be handled earlier in gimple
optimizers, and case of one constant argument that is not power of two
means that either we'll compute the overflow by dividing by the constant
again (if that ends up really a divide, I'm pretty sure it is always a loss,
if it turns into multiplication, also, and if it is multiplication by some
large constant and shifts, we can end up with a really large code)
and compare against the original value.

> As you do it you assume that the targets optab is less efficient
> than two shifts and a compare but how do you know?  Shouldn't we
> do regular mult expansion and cost the resulting sequence against
> the one from the umulv expander?
> 
> That said, the patch looks x86 specific but done in generic code...

Well, right now x86 is the only target that has umulv4 patterns,
and from recent discussions with Segher and ARM folks it doesn't look like
that is going to change at least for power/arm/aarch64 any time soon.

So, not sure if it is worth to doing expensive and hard to maintain
expansion of both variants and pick the cheaper one right now, it is not
about correctness.  If some further targets add these patterns and the
current strategy doesn't work well there, we can reconsider.

Jakub


Re: generic retuning part 1 - x86-tune-costs update

2017-11-30 Thread Richard Biener
On Thu, Nov 30, 2017 at 10:40 AM, Jan Hubicka  wrote:
> Hi,
> this patch makes costs in generic to math better modern chips (core, haswell,
> buldozer and zen).  The only important change is to drop cost of unaligned 
> loads
> and stores becuase all modern chips handle it well.  This makes vectorizer to
> not peel for alignment and saves a lot of code size without sacrifying
> performance.
>
> I have benchmarked it on zen and skylake and it is small but almost consistent
> win in performance too.  Notable regression is fma3d regressing aprox. 5%
> on zen.  This is the case of native tuning as well, so I will look into it
> incrementally.
>
> Bootstrapped/regtested x86_64-linux, comitted.

The question is how we cost such things as store bandwith where IIRC
an unaligned store counts 'two' entries in the pipelines store buffers.
Likewise unaligned loads do usually still have a penalty.

What changed is that when the loads/stores happen to be aligned
using the unaligned instruction variant doesn't have a penalty.

So I'm not sure peeling for alignmend isn't a win, it just depends more
on the number of memory streams involved.

Richard.

> Honza
>
> PR target/81616
> * x86-tnue-costs.h (generic_cost): Revise for modern CPUs
> * gcc.target/i386/l_fma_double_1.c: Update count of fma instructions.
> * gcc.target/i386/l_fma_double_2.c: Update count of fma instructions.
> * gcc.target/i386/l_fma_double_3.c: Update count of fma instructions.
> * gcc.target/i386/l_fma_double_4.c: Update count of fma instructions.
> * gcc.target/i386/l_fma_double_5.c: Update count of fma instructions.
> * gcc.target/i386/l_fma_double_6.c: Update count of fma instructions.
> * gcc.target/i386/l_fma_float_1.c: Update count of fma instructions.
> * gcc.target/i386/l_fma_float_2.c: Update count of fma instructions.
> * gcc.target/i386/l_fma_float_3.c: Update count of fma instructions.
> * gcc.target/i386/l_fma_float_4.c: Update count of fma instructions.
> * gcc.target/i386/l_fma_float_5.c: Update count of fma instructions.
> * gcc.target/i386/l_fma_float_6.c: Update count of fma instructions.
> Index: config/i386/x86-tune-costs.h
> ===
> --- config/i386/x86-tune-costs.h(revision 255252)
> +++ config/i386/x86-tune-costs.h(working copy)
> @@ -2243,11 +2243,11 @@ struct processor_costs generic_cost = {
> COSTS_N_INSNS (4),  /*   HI */
> COSTS_N_INSNS (3),  /*   SI */
> COSTS_N_INSNS (4),  /*   DI */
> -   COSTS_N_INSNS (2)}, /*other */
> +   COSTS_N_INSNS (4)}, /*other */
>0,   /* cost of multiply per each bit set 
> */
> -  {COSTS_N_INSNS (18), /* cost of a divide/mod for QI */
> -   COSTS_N_INSNS (26), /*  HI */
> -   COSTS_N_INSNS (42), /*  SI */
> +  {COSTS_N_INSNS (16), /* cost of a divide/mod for QI */
> +   COSTS_N_INSNS (22), /*  HI */
> +   COSTS_N_INSNS (30), /*  SI */
> COSTS_N_INSNS (74), /*  DI */
> COSTS_N_INSNS (74)},/*  
> other */
>COSTS_N_INSNS (1),   /* cost of movsx */
> @@ -2275,13 +2275,13 @@ struct processor_costs generic_cost = {
>2, 3, 4, /* cost of moving XMM,YMM,ZMM 
> register */
>{6, 6, 6, 10, 15},   /* cost of loading SSE registers
>in 32,64,128,256 and 512-bit */
> -  {10, 10, 10, 15, 20},/* cost of unaligned loads.  
> */
> +  {6, 6, 6, 10, 15},   /* cost of unaligned loads.  */
>{6, 6, 6, 10, 15},   /* cost of storing SSE registers
>in 32,64,128,256 and 512-bit */
> -  {10, 10, 10, 15, 20},/* cost of unaligned storess. 
>  */
> -  20, 20,  /* SSE->integer and integer->SSE 
> moves */
> -  6, 6,/* Gather load static, 
> per_elt.  */
> -  6, 6,/* Gather store static, 
> per_elt.  */
> +  {6, 6, 6, 10, 15},   /* cost of unaligned storess.  */
> +  6, 6,/* SSE->integer and 
> integer->SSE moves */
> +  18, 6,   /* Gather load static, per_elt.  */
> +  18, 6,   /* Gather store static, per_elt.  */
>32,  

Re: [PATCH] Improve __builtin_mul_overflow (uns, 1U << y, &res) (PR target/83210)

2017-11-30 Thread Richard Biener
On Thu, 30 Nov 2017, Jakub Jelinek wrote:

> On Thu, Nov 30, 2017 at 10:26:33AM +0100, Richard Biener wrote:
> > On Wed, 29 Nov 2017, Jakub Jelinek wrote:
> > 
> > > Hi!
> > > 
> > > Even if target has an umulv4_optab pattern like i?86 mul[lq]; jo,
> > > if one argument is constant power of two, using two shifts, or
> > > for multiplication by 2 just shl with js on previous value is faster
> > > (though, for -Os mul[lq]; jo wins).
> > > 
> > > Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> > 
> > So this is because we don't go through the standard multiplication
> > expander, right?
> 
> We sometimes go through the standard multiplication expander, sometimes
> widening multiplication expander, sometimes highpart multiplication
> expander, also sometimes with the arguments forced into registers because
> we pick various subregs of them etc.  expand_mul_overflow already now has
> a lots of cases.
> 
> >  But for icode == CODE_FOR_nothing we probably
> > should do this and then emit the compare-and-jump?
> 
> The icode == CODE_FOR_nothing check is there because typically the
> code when backend doesn't provide anything is larger or much larger, just
> see how many different expansions there are.
> 
> The patch handles the stuff fully, i.e. emits two shifts plus
> compare-and-jump, and does that for the power of 2 constants for
> icode == CODE_FOR_nothing always and otherwise (i.e. i?86/x86_64)
> only if not optimizing for size.
> 
> > And for constant multiplicators in general as well?
> 
> See the PR, I'm not convinced that is at least generally a win.
> 
> The case of two constant arguments should be handled earlier in gimple
> optimizers, and case of one constant argument that is not power of two
> means that either we'll compute the overflow by dividing by the constant
> again (if that ends up really a divide, I'm pretty sure it is always a loss,
> if it turns into multiplication, also, and if it is multiplication by some
> large constant and shifts, we can end up with a really large code)
> and compare against the original value.
> 
> > As you do it you assume that the targets optab is less efficient
> > than two shifts and a compare but how do you know?  Shouldn't we
> > do regular mult expansion and cost the resulting sequence against
> > the one from the umulv expander?
> > 
> > That said, the patch looks x86 specific but done in generic code...
> 
> Well, right now x86 is the only target that has umulv4 patterns,
> and from recent discussions with Segher and ARM folks it doesn't look like
> that is going to change at least for power/arm/aarch64 any time soon.
> 
> So, not sure if it is worth to doing expensive and hard to maintain
> expansion of both variants and pick the cheaper one right now, it is not
> about correctness.  If some further targets add these patterns and the
> current strategy doesn't work well there, we can reconsider.

Ok, thanks for the elaborate answer.

Can you add a comment indicating that?

The patch is ok with that.

Richard.


Replying to an older patch ([PATCH] Fix failing test-case)

2017-11-30 Thread Sudakshina Das

https://gcc.gnu.org/ml/gcc-patches/2017-10/msg01157.html

This patch fixed a test case switch-case-2.c. I am seeing 
switch-case-1.c failing on


arm-none-linux-gnueabihf:
FAIL: gcc.dg/tree-prof/switch-case-1.c scan-rtl-dump-times expand ";; 
basic block[^\\n]*count 2000" 1 (found 0 times)


aarch64-none-linux-gnu:
FAIL: gcc.dg/tree-prof/switch-case-1.c scan-rtl-dump-times expand ";; 
basic block[^\\n]*count 2000" 1 (found 2 times)


which looks pretty similar (also the same changes make it pass). Can you 
confirm?


Sudi


Re: [PATCH 1/7]: SVE: Add CLOBBER_HIGH expression

2017-11-30 Thread Alan Hayward

> On 27 Nov 2017, at 17:29, Jeff Law  wrote:
> 
> On 11/23/2017 04:11 AM, Alan Hayward wrote:
>> 
>>> On 22 Nov 2017, at 17:33, Jeff Law  wrote:
>>> 
>>> On 11/22/2017 04:31 AM, Alan Hayward wrote:
 
> On 21 Nov 2017, at 03:13, Jeff Law  wrote:
>> 
>>> 
>>> You might also look at TARGET_HARD_REGNO_CALL_PART_CLOBBERED.  I'd
>>> totally forgotten about it.  And in fact it seems to come pretty close
>>> to what you need…
>> 
>> Yes, some of the code is similar to the way
>> TARGET_HARD_REGNO_CALL_PART_CLOBBERED works. Both that code and the
>> CLOBBER expr code served as a starting point for writing the patch. The 
>> main difference
>> here, is that _PART_CLOBBERED is around all calls and is not tied to a 
>> specific Instruction,
>> it’s part of the calling abi. Whereas clobber_high is explicitly tied to 
>> an expression (tls_desc).
>> It meant there wasn’t really any opportunity to resume any existing code.
> Understood.  Though your first patch mentions that you're trying to
> describe partial preservation "around TLS calls". Presumably those are
> represented as normal insns, not call_insn.
> 
> That brings me back to Richi's idea of exposing a set of the low subreg
> to itself using whatever mode is wide enough to cover the neon part of
> the register.
> 
> That should tell the generic parts of the compiler that you're just
> clobbering the upper part and at least in theory you can implement in
> the aarch64 backend and the rest of the compiler should "just work"
> because that's the existing semantics of a subreg store.
> 
> The only worry would be if a pass tried to get overly smart and
> considered that kind of set a nop -- but I think I'd argue that's simply
> wrong given the semantics of a partial store.
> 
 
 So, the instead of using clobber_high(reg X), to use set(reg X, reg X).
 It’s something we considered, and then dismissed.
 
 The problem then is you are now using SET semantics on those registers, 
 and it
 would make the register live around the function, which might not be the 
 case.
 Whereas clobber semantics will just make the register dead - which is 
 exactly
 what we want (but only conditionally).
>>> ?!?  A set of the subreg is the *exact* semantics you want.  It says the
>>> low part is preserved while the upper part is clobbered across the TLS
>>> insns.
>>> 
>>> jeff
>> 
>> Consider where the TLS call is inside a loop. The compiler would normally 
>> want
>> to hoist that out of the loop. By adding a set(x,x) into the parallel of the 
>> tls_desc we
>> are now making x live across the loop, x is dependant on the value from the 
>> previous
>> iteration, and the tls_desc can no longer be hoisted.
> Hmm.  I think I see the problem you're trying to point out.  Let me
> restate it and see if you agree.
> 
> The low subreg set does clearly indicate the upper part of the SVE
> register is clobbered.  The problem is from a liveness standpoint the
> compiler is considering the low part live, even though it's a self-set.
> 

Yes.

> In fact, if that is the case, then a single TLS call (independent of a
> loop) would make the low part of the register globally live.  This
> should be testable.  Include one of these low part self sets on the
> existing TLS calls and compile a little test function and let's look at
> the liveness data.
> 


Ok, so I’ve put 
(set (reg:TI V0_REGNUM) (reg:TI V0_REGNUM))
(set (reg:TI V1_REGNUM) (reg:TI V1_REGNUM))
etc up to V31 in the UNSPEC_TLSDESC pattern in aarch64.md, using up all the 
neon registers.

In an ideal world, this change would do nothing as neon reg size is TI.

First I compiled up  sve_tls_preserve_1.c (Test1 below)

Without the SET changes, gcc does not backup any neon registers before the 
tlsdesccall (good).
With the SET changes, gcc backs up all the neon registers (not ideal).

Without the SET changes, dump logs give:

cse1:
;;  regs ever live   0 [x0] 30 [x30] 32 [v0] 33 [v1] 34 [v2] 66 [cc]

cprop1:
;; lr  in29 [x29] 31 [sp] 32 [v0] 33 [v1] 34 [v2] 64 [sfp] 65 [ap]
;; lr  use   29 [x29] 31 [sp] 32 [v0] 33 [v1] 34 [v2] 64 [sfp] 65 [ap]
;; lr  def   0 [x0] 30 [x30] 32 [v0] 66 [cc] 79 80 81 82 84 85 86 87 88 89
;; live  in  29 [x29] 31 [sp] 32 [v0] 33 [v1] 34 [v2] 64 [sfp] 65 [ap]
;; live  gen 0 [x0] 32 [v0] 78 79 80 81 82 83 84 85 86 87 88 89
;; live  kill30 [x30] 66 [cc]
;; lr  out   29 [x29] 31 [sp] 32 [v0] 64 [sfp] 65 [ap]
;; live  out 29 [x29] 31 [sp] 32 [v0] 64 [sfp] 65 [ap]

reload:
;;  regs ever live   0 [x0] 1 [x1] 2 [x2] 29 [x29] 30 [x30] 32 [v0] 33 [v1] 
34 [v2] 35 [v3] 36 [v4] 66 [cc]

With the set changes:

cse1:
;;  regs ever live   0 [x0] 30 [x30] 32 [v0] 33 [v1] 34 [v2] 35 [v3] 36 
[v4] 37 [v5] 38 [v6] 39 [v7] 40 [v8] 41 [v9] 42 [v10] 43 [v11] 44 [v12] 45 
[v13] 46 [v14] 47 [v15] 48 [v16] 49 [v17] 50 [

Re: Replying to an older patch ([PATCH] Fix failing test-case)

2017-11-30 Thread Sudakshina Das

On 30/11/17 11:03, Sudakshina Das wrote:

https://gcc.gnu.org/ml/gcc-patches/2017-10/msg01157.html

This patch fixed a test case switch-case-2.c. I am seeing
switch-case-1.c failing on

arm-none-linux-gnueabihf:
FAIL: gcc.dg/tree-prof/switch-case-1.c scan-rtl-dump-times expand ";;
basic block[^\\n]*count 2000" 1 (found 0 times)

aarch64-none-linux-gnu:
FAIL: gcc.dg/tree-prof/switch-case-1.c scan-rtl-dump-times expand ";;
basic block[^\\n]*count 2000" 1 (found 2 times)

which looks pretty similar (also the same changes make it pass). Can you
confirm?

Sudi


Put wrong email address earlier. And adding more people on cc


Re: [PATCH] final: Improve output for -dp and -fverbose-asm

2017-11-30 Thread Segher Boessenkool
On Wed, Nov 29, 2017 at 08:46:41PM -0700, Martin Sebor wrote:
> On 11/29/2017 04:13 PM, Segher Boessenkool wrote:
> >This improves the assembler output (for -dp and -fverbose-asm) in
> >several ways.  It always prints the insn_cost.  It does not print
> >"[length = NN]" but "[c=NN l=NN]", to save space.  It does not add one
> >to the instruction alternative number (everything else starts counting
> >those at 0, too).  And finally, it tries to keep things lined up in
> >columns a bit better.
> >
> >Tested on powerpc64-linux {-m32,-m64}; is this okay for trunk?
> 
> [c=NN l=NN] will be meaningless to those without some deeper
> insight into/experience with what's actually being printed.
> It might as well say [NN NN].  But such extreme terseness would

Length isn't printed on all targets, fwiw.

> seem to run counter to the documented purpose of -fverbose-asm
> to "Put extra commentary information in the generated assembly
> code to make it more readable."

The point is that [length = 12] takes up an awful lot of space.  The
output of -fverbose-asm alread suffers from information overload.

> Looking further in the manual I don't see the [length=NN] bit
> mentioned (nor does your patch add a mention of it) so while
> the meaning of [length=NN] isn't exactly obvious, using [l=NN]
> instead certainly won't make it easier to read.  Does the manual
> need updating?

Should -fverbose-asm print length (and cost) at all?  They should be
printed for -dp (which is for debugging the *compiler*), but are they
very useful for -fverbose-asm (which is primarily for debugging the
*user program*)?


Segher


Re: [Ping][PATCH v3] Fix Incorrect ASan global variables alignment on arm (PR sanitizer/81697)

2017-11-30 Thread Maxim Ostapenko

Hi Jakub, thanks for review.

I've fixed the issues you've pointed in review.
Regarding a testcase -- I've cooked a runtime test, but it shows FP on 
unpatched GCC version only when linking with Gold (because it strips 
redzones more aggressively).


This patch passes tests on arm-linux and x86_64-linux, bootstrap is in 
progress.


Thanks,
-Maxim

On 29/11/17 13:10, Jakub Jelinek wrote:

On Tue, Nov 28, 2017 at 10:04:51AM +0300, Maxim Ostapenko wrote:

(CC'ing Jakub and Ramana)

I would like to ping the following patch:
https://gcc.gnu.org/ml/gcc-patches/2017-10/msg02288.html
Fix Incorrect ASan global variables alignment on arm (PR sanitizer/81697)

-Maxim
gcc/ChangeLog:

2017-11-28  Maxim Ostapenko  

PR sanitizer/81697
* asan.c (asan_protect_global): Add new ignore_decl_rtl_set_p
parameter. Return true if ignore_decl_rtl_set_p is true and other
conditions are satisfied.
* asan.h (asan_protect_global): Add new parameter.
* varasm.c (categorize_decl_for_section): Pass true as second parameter
to asan_protect_global calls.

gcc/testsuite/ChangeLog:

2017-11-28  Maxim Ostapenko  

PR sanitizer/81697
* g++.dg/asan/global-alignment.C: New test.

diff --git a/gcc/asan.c b/gcc/asan.c
index d5128aa..78c3b60 100644
--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -1605,7 +1605,7 @@ is_odr_indicator (tree decl)
 ASAN_RED_ZONE_SIZE bytes.  */
  
  bool

-asan_protect_global (tree decl)
+asan_protect_global (tree decl, bool ignore_decl_rtl_set_p)
  {
if (!ASAN_GLOBALS)
  return false;
@@ -1627,7 +1627,13 @@ asan_protect_global (tree decl)
|| DECL_THREAD_LOCAL_P (decl)
/* Externs will be protected elsewhere.  */
|| DECL_EXTERNAL (decl)
-  || !DECL_RTL_SET_P (decl)
+  /* PR sanitizer/81697: For architectures that use section anchors first
+call to asan_protect_global may occur before DECL_RTL (decl) is set.
+We should ignore DECL_RTL_SET_P then, because otherwise the first call
+to asan_protect_global will return FALSE and the following calls on the
+same decl after setting DECL_RTL (decl) will return TRUE and we'll end
+up with inconsistency at runtime.  */
+  || (!DECL_RTL_SET_P (decl) && !ignore_decl_rtl_set_p)

This part is fine.


/* Comdat vars pose an ABI problem, we can't know if
 the var that is selected by the linker will have
 padding or not.  */
@@ -1651,6 +1657,9 @@ asan_protect_global (tree decl)
|| is_odr_indicator (decl))
  return false;
  
+  if (ignore_decl_rtl_set_p)

+return true;

This isn't, because then you bypass checks that really don't care
about RTL, like:
  if (lookup_attribute ("weakref", DECL_ATTRIBUTES (decl)))
return false;

  if (!TARGET_SUPPORTS_ALIASES && asan_needs_local_alias (decl))
return false;

So, IMHO just wrap only the DECL_RTL/symbol related stuff in if
(!ignore_decl_rtl_set_p).  Perhaps even better in
if (!ignore_decl_rtl_set_p || DECL_RTL_SET_P (decl))
so that if the rtl is already set, you do the check anyway.


--- /dev/null
+++ b/gcc/testsuite/g++.dg/asan/global-alignment.C
@@ -0,0 +1,18 @@
+/* { dg-options "-fmerge-all-constants" } */
+/* { dg-do compile } */
+/* { dg-skip-if "" { *-*-* } { "-O0" } { "" } } */
+
+#include 
+#include 
+
+const char kRecoveryInstallString[] = "NEW";
+const char kRecoveryUpdateString[] = "UPDATE";
+const char kRecoveryUninstallationString[] = "UNINSTALL";
+
+const std::map kStringToRequestMap = {
+  {kRecoveryInstallString, 0},
+  {kRecoveryUpdateString, 0},
+  {kRecoveryUninstallationString, 0},
+};
+
+/* { dg-final { scan-assembler-times 
{\.section\s+\.rodata\n(?:(?!\.section).)*\.\w+\s+"NEW} 1 } } */

I don't really like the testcase.  The scaning for .rodata section is too
fragile, e.g. darwin will need something completely different, doesn't e.g.
powerpc use .sdata section instead, etc.

And, isn't std::map and std::string completely unnecessary?
I'd prefer a runtime test that shows the false positive without the patch,
preferrably in C, just with those const char arrays used by some C code
without any headers.


diff --git a/gcc/varasm.c b/gcc/varasm.c
index a139151..849eae0 100644
--- a/gcc/varasm.c
+++ b/gcc/varasm.c
@@ -6508,7 +6508,7 @@ categorize_decl_for_section (const_tree decl, int reloc)
else if (TREE_CODE (decl) == STRING_CST)
  {
if ((flag_sanitize & SANITIZE_ADDRESS)
- && asan_protect_global (CONST_CAST_TREE (decl)))
+ && asan_protect_global (CONST_CAST_TREE (decl), true))

This doesn't make sense to me.  asan_protect_global for STRING_CST doesn't
care about any RTL, nor -fsection-anchors puts them into any kind of
block.


/* or !flag_merge_constants */
  return SECCAT_RODATA;
else
@@ -6536,7 +6536,7 @@ categorize_decl_for_section (const_tree decl, int reloc)
ret = reloc == 1 ? SECCAT_DATA_REL_RO_LOCAL : SECCAT_DATA_REL_RO

RE: [patch] remove cilk-plus

2017-11-30 Thread Koval, Julia
Hi, here is the followup patch. Ok for trunk?

gcc/c-family/
* c-common.h (inv_list): Remove.

Thanks,
Julia

> -Original Message-
> From: Jeff Law [mailto:l...@redhat.com]
> Sent: Monday, November 27, 2017 6:50 PM
> To: Koval, Julia ; Joseph Myers
> 
> Cc: Jakub Jelinek ; GCC Patches  patc...@gcc.gnu.org>
> Subject: Re: [patch] remove cilk-plus
> 
> On 11/23/2017 02:45 AM, Koval, Julia wrote:
> > Sorry, I think in this version of this patch they are fixed.
> >
> >> -Original Message-
> >> From: Joseph Myers [mailto:jos...@codesourcery.com]
> >> Sent: Wednesday, November 22, 2017 6:23 PM
> >> To: Koval, Julia 
> >> Cc: Jeff Law ; Jakub Jelinek ; GCC
> >> Patches 
> >> Subject: RE: [patch] remove cilk-plus
> >>
> >> This patch version does not appear to address my comment that you're
> >> leaving behind comments in c-parser.c relating to Cilk array notations
> >> while removing the subsequent code.  (Or in one case actually reindenting
> >> the comment that is no longer relevant, rather than removing it.)
> >>
> >> --
> >> Joseph S. Myers
> >> jos...@codesourcery.com
> This version is fine to commit.
> 
> Can you also remove struct inv_list from c-family/c-common.h.  It was
> only used for array notation support.  You can include that in your main
> commit or as a separate follow-up.
> 
> We may well find other nits like inv_list.  I'm comfortable addressing
> those as we bump into them.
> 
> Thanks for taking care of this!
> 
> jeff


0001-cilk-followup.patch
Description: 0001-cilk-followup.patch


Re: [Ping][PATCH v3] Fix Incorrect ASan global variables alignment on arm (PR sanitizer/81697)

2017-11-30 Thread Jakub Jelinek
On Thu, Nov 30, 2017 at 02:38:25PM +0300, Maxim Ostapenko wrote:
> Hi Jakub, thanks for review.
> 
> I've fixed the issues you've pointed in review.
> Regarding a testcase -- I've cooked a runtime test, but it shows FP on
> unpatched GCC version only when linking with Gold (because it strips
> redzones more aggressively).

I think we can live with that.

> --- a/gcc/varasm.c
> +++ b/gcc/varasm.c
> @@ -6550,7 +6550,19 @@ categorize_decl_for_section (const_tree decl, int 
> reloc)
>   ret = reloc == 1 ? SECCAT_DATA_REL_RO_LOCAL : SECCAT_DATA_REL_RO;
>else if (reloc || flag_merge_constants < 2
>  || ((flag_sanitize & SANITIZE_ADDRESS)
> -&& asan_protect_global (CONST_CAST_TREE (decl
> +/* PR 81697: for architectures that use section anchors we
> +   need to ignore DECL_RTL_SET_P (decl) for string constants
> +   inside this asan_protect_global call because otherwise
> +   we'll wrongly put them into SECCAT_RODATA_MERGE_CONST
> +   section, set DECL_RTL (decl) later on and add DECL to
> +   protected globals via successive asan_protect_global
> +   calls.  In this scenario we'll end up with wrong
> +   alignment of these strings at runtime and possible ASan
> +   false positives.  */
> +&& asan_protect_global (CONST_CAST_TREE (decl),
> +use_object_blocks_p ()
> +  && use_blocks_for_decl_p (
> +   CONST_CAST_TREE (decl)

Formatting is too bad here.  && should go below use_object_block_p..
The opening ( should either go on the next line, like:
   use_object_blocks_p ()
   && use_blocks_for_decl_p
(CONST_CAST_TREE (decl)
or perhaps better just introduce a temporary somewhere:
   else if (VAR_P (decl))
 {
+  tree d = CONST_CAST_TREE (decl);
   if (bss_initializer_p (decl))
 ret = SECCAT_BSS;
and use d instead of CONST_CAST_TREE (decl) later?

Ok with those changes.

Jakub


Re: [patch] remove cilk-plus

2017-11-30 Thread Jakub Jelinek
On Thu, Nov 30, 2017 at 11:54:11AM +, Koval, Julia wrote:
> Hi, here is the followup patch. Ok for trunk?
> 
> gcc/c-family/
>   * c-common.h (inv_list): Remove.

Ok, thanks.

Jakub


Re: [C++ PATCH] Plural forms for count != eltscnt structured binding diagnostics

2017-11-30 Thread Nathan Sidwell

On 11/30/2017 04:18 AM, Jakub Jelinek wrote:


Here is an incremental patch to deal with that on top of the previously
posted patch, ok for trunk if it passes testing?  Note, not using inform_n
in this !tree_fits_uhwi_p case, because gettext doesn't recommend that for
negative values and the other values printed there (the very large ones) are
printed using hexadecimal where also I think human plural forms are rarely
useful.

2017-11-30  Jakub Jelinek  

* decl.c (cp_finish_decomp): Split up count != eltscnt and
!tree_fits_uhwi_p (tsize) error_at calls into error_n and inform_n
to handle plural forms properly.

* g++.dg/cpp1z/decomp3.C: Adjust for structured binding count
mismatch diagnostics split into error and warning with plural
forms.
* g++.dg/cpp1z/decomp10.C: Likewise.
* g++.dg/cpp1z/decomp32.C: Likewise.


Ok.


--- gcc/cp/decl.c.jj2017-11-30 09:44:19.0 +0100
+++ gcc/cp/decl.c   2017-11-30 09:57:44.539504854 +0100
@@ -7445,11 +7445,18 @@ cp_finish_decomp (tree decl, tree first,



+ inform_n (loc, eltscnt != (unsigned long) eltscnt
+? (eltscnt % 100) + 100 : eltscnt,


Is such elaboration with the modulo operator necessary? wouldn;t an 
arbitrary non-unity constant do.  (It took me a while to figure out what 
this was trying to do.  At least a comment?)


nathan

--
Nathan Sidwell


[PATCH] rs6000: Improve fusion assembler output

2017-11-30 Thread Segher Boessenkool
This improves the output for load and store fusion a little.  In most
cases it removes the comment output, because that makes the generated
assembler code hard to read, and equivalent info is available with -dp
anyway.  For the vector loads it puts the comment on the second insn,
where it doesn't interfere with other debug comments.

Mike, does this look good?  Or is there something I'm missing :-)

Tested on powerpc64-linux {-m32,-m64}.


Segher


2017-11-30  Segher Boessenkool  

* config/rs6000/rs6000-protos.h (emit_fusion_addis): Remove last two
parameters from prototype.
* config/rs6000/rs6000.c (emit_fusion_addis): Remove last two
parameters.  Don't print a comment.
(emit_fusion_gpr_load): Adjust.
(emit_fusion_load_store): Adjust.
* config/rs6000/rs6000.md (*fusion_p9__constant): Adjust.
* config/rs6000/vsx.md (two peepholes): Print the "vector load fusion"
comment on the second line.

gcc/testsuite/
* gcc.target/powerpc/fusion.c: Add -dp to options.  Adjust the expected
output.
* gcc.target/powerpc/fusion3.c: Ditto.
* gcc.target/powerpc/fusion4.c: Ditto.

---
 gcc/config/rs6000/rs6000-protos.h  |  2 +-
 gcc/config/rs6000/rs6000.c | 23 +--
 gcc/config/rs6000/rs6000.md|  2 +-
 gcc/config/rs6000/vsx.md   |  4 ++--
 gcc/testsuite/gcc.target/powerpc/fusion.c  |  4 ++--
 gcc/testsuite/gcc.target/powerpc/fusion3.c | 10 +-
 gcc/testsuite/gcc.target/powerpc/fusion4.c |  5 ++---
 7 files changed, 18 insertions(+), 32 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-protos.h 
b/gcc/config/rs6000/rs6000-protos.h
index f3f5f27..9a16a51 100644
--- a/gcc/config/rs6000/rs6000-protos.h
+++ b/gcc/config/rs6000/rs6000-protos.h
@@ -96,7 +96,7 @@ extern bool quad_address_p (rtx, machine_mode, bool);
 extern bool quad_load_store_p (rtx, rtx);
 extern bool fusion_gpr_load_p (rtx, rtx, rtx, rtx);
 extern void expand_fusion_gpr_load (rtx *);
-extern void emit_fusion_addis (rtx, rtx, const char *, const char *);
+extern void emit_fusion_addis (rtx, rtx);
 extern void emit_fusion_load_store (rtx, rtx, rtx, const char *);
 extern const char *emit_fusion_gpr_load (rtx, rtx);
 extern bool fusion_p9_p (rtx, rtx, rtx, rtx);
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 2948f2a..9929a45 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -38689,16 +38689,10 @@ expand_fusion_gpr_load (rtx *operands)
sequence.  */
 
 void
-emit_fusion_addis (rtx target, rtx addis_value, const char *comment,
-  const char *mode_name)
+emit_fusion_addis (rtx target, rtx addis_value)
 {
   rtx fuse_ops[10];
-  char insn_template[80];
   const char *addis_str = NULL;
-  const char *comment_str = ASM_COMMENT_START;
-
-  if (*comment_str == ' ')
-comment_str++;
 
   /* Emit the addis instruction.  */
   fuse_ops[0] = target;
@@ -38778,9 +38772,7 @@ emit_fusion_addis (rtx target, rtx addis_value, const 
char *comment,
   if (!addis_str)
 fatal_insn ("Could not generate addis value for fusion", addis_value);
 
-  sprintf (insn_template, "%s\t\t%s %s, type %s", addis_str, comment_str,
-  comment, mode_name);
-  output_asm_insn (insn_template, fuse_ops);
+  output_asm_insn (addis_str, fuse_ops);
 }
 
 /* Emit a D-form load or store instruction that is the second instruction
@@ -38913,7 +38905,6 @@ emit_fusion_gpr_load (rtx target, rtx mem)
   rtx addr;
   rtx load_offset;
   const char *load_str = NULL;
-  const char *mode_name = NULL;
   machine_mode mode;
 
   if (GET_CODE (mem) == ZERO_EXTEND)
@@ -38929,25 +38920,21 @@ emit_fusion_gpr_load (rtx target, rtx mem)
   switch (mode)
 {
 case E_QImode:
-  mode_name = "char";
   load_str = "lbz";
   break;
 
 case E_HImode:
-  mode_name = "short";
   load_str = "lhz";
   break;
 
 case E_SImode:
 case E_SFmode:
-  mode_name = (mode == SFmode) ? "float" : "int";
   load_str = "lwz";
   break;
 
 case E_DImode:
 case E_DFmode:
   gcc_assert (TARGET_POWERPC64);
-  mode_name = (mode == DFmode) ? "double" : "long";
   load_str = "ld";
   break;
 
@@ -38956,7 +38943,7 @@ emit_fusion_gpr_load (rtx target, rtx mem)
 }
 
   /* Emit the addis instruction.  */
-  emit_fusion_addis (target, addis_value, "gpr load fusion", mode_name);
+  emit_fusion_addis (target, addis_value);
 
   /* Emit the D-form load instruction.  */
   emit_fusion_load_store (target, target, load_offset, load_str);
@@ -39225,7 +39212,7 @@ emit_fusion_p9_load (rtx reg, rtx mem, rtx tmp_reg)
   fusion_split_address (addr, &hi, &lo);
 
   /* Emit the addis instruction.  */
-  emit_fusion_addis (tmp_reg, hi, "power9 load fusion", GET_MODE_NAME (mode));
+  emit_fusion_addis (tmp_reg, hi);
 
   /* Emit the D-form load instruction.  */
   emit_fusion_load_store (reg, tmp_reg, lo, load_string);
@@ -39312

Re: [C++ PATCH] Plural forms for count != eltscnt structured binding diagnostics

2017-11-30 Thread Jakub Jelinek
On Thu, Nov 30, 2017 at 06:55:08AM -0500, Nathan Sidwell wrote:
> > --- gcc/cp/decl.c.jj2017-11-30 09:44:19.0 +0100
> > +++ gcc/cp/decl.c   2017-11-30 09:57:44.539504854 +0100
> > @@ -7445,11 +7445,18 @@ cp_finish_decomp (tree decl, tree first,
> 
> > + inform_n (loc, eltscnt != (unsigned long) eltscnt
> > +? (eltscnt % 100) + 100 : eltscnt,
> 
> Is such elaboration with the modulo operator necessary? wouldn;t an
> arbitrary non-unity constant do.  (It took me a while to figure out what
> this was trying to do.  At least a comment?)

For english it isn't needed of course.

I just followed:
https://www.gnu.org/savannah-checkouts/gnu/gettext/manual/html_node/Plural-forms.html

"About larger integer types, such as ‘uintmax_t’ or ‘unsigned long long’: they 
can be
handled by reducing the value to a range that fits in an ‘unsigned long’. Simply
casting the value to ‘unsigned long’ would not do the right thing, since it 
would
treat ULONG_MAX + 1 like zero, ULONG_MAX + 2 like singular, and the like. Here 
you
can exploit the fact that all mentioned plural form formulas eventually become 
periodic,
with a period that is a divisor of 100 (or 1000 or 100). So, when you 
reduce a large
value to another one in the range [100, 199] that ends in the same 6 
decimal
digits, you can assume that it will lead to the same plural form selection. 
This code
does this:

#include 
uintmax_t nbytes = ...;
printf (ngettext ("The file has %"PRIuMAX" byte.",
  "The file has %"PRIuMAX" bytes.",
  (nbytes > ULONG_MAX
   ? (nbytes % 100) + 100
   : nbytes)),
nbytes);"

I can surely add a comment about that.

Note the patch depends on the 
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02521.html
patch.

Jakub


Re: [patch] prevent .cfi_personality/.cfi_lsda on !dwarf eh configurations

2017-11-30 Thread Eric Botcazou
> 2017-11-30  Tristan Gingold  
> 
> * dwarf2out.c (dwarf2out_do_cfi_startproc): Only emit
> .cfi_personality or .cfi_lsda if the eh data format is dwarf2.

PR ada/81470 must be put on the first line to trigger xref in Bugzilla.

-- 
Eric Botcazou


Re: [Ping][PATCH v3] Fix Incorrect ASan global variables alignment on arm (PR sanitizer/81697)

2017-11-30 Thread Maxim Ostapenko
Ok, thanks, here is the patch I'm going to install after bootstrap 
completes.


On 30/11/17 14:54, Jakub Jelinek wrote:

On Thu, Nov 30, 2017 at 02:38:25PM +0300, Maxim Ostapenko wrote:

Hi Jakub, thanks for review.

I've fixed the issues you've pointed in review.
Regarding a testcase -- I've cooked a runtime test, but it shows FP on
unpatched GCC version only when linking with Gold (because it strips
redzones more aggressively).

I think we can live with that.


--- a/gcc/varasm.c
+++ b/gcc/varasm.c
@@ -6550,7 +6550,19 @@ categorize_decl_for_section (const_tree decl, int reloc)
ret = reloc == 1 ? SECCAT_DATA_REL_RO_LOCAL : SECCAT_DATA_REL_RO;
else if (reloc || flag_merge_constants < 2
   || ((flag_sanitize & SANITIZE_ADDRESS)
-  && asan_protect_global (CONST_CAST_TREE (decl
+  /* PR 81697: for architectures that use section anchors we
+ need to ignore DECL_RTL_SET_P (decl) for string constants
+ inside this asan_protect_global call because otherwise
+ we'll wrongly put them into SECCAT_RODATA_MERGE_CONST
+ section, set DECL_RTL (decl) later on and add DECL to
+ protected globals via successive asan_protect_global
+ calls.  In this scenario we'll end up with wrong
+ alignment of these strings at runtime and possible ASan
+ false positives.  */
+  && asan_protect_global (CONST_CAST_TREE (decl),
+  use_object_blocks_p ()
+&& use_blocks_for_decl_p (
+ CONST_CAST_TREE (decl)

Formatting is too bad here.  && should go below use_object_block_p..
The opening ( should either go on the next line, like:
   use_object_blocks_p ()
   && use_blocks_for_decl_p
(CONST_CAST_TREE (decl)
or perhaps better just introduce a temporary somewhere:
else if (VAR_P (decl))
  {
+  tree d = CONST_CAST_TREE (decl);
if (bss_initializer_p (decl))
  ret = SECCAT_BSS;
and use d instead of CONST_CAST_TREE (decl) later?

Ok with those changes.

Jakub





gcc/ChangeLog:

2017-11-30  Maxim Ostapenko  

	PR sanitizer/81697
	* asan.c (asan_protect_global): Add new ignore_decl_rtl_set_p
	parameter. Return true if ignore_decl_rtl_set_p is true and other
	conditions are satisfied.
	* asan.h (asan_protect_global): Add new parameter.
	* varasm.c (categorize_decl_for_section): Pass true as second parameter
	to asan_protect_global calls.

gcc/testsuite/ChangeLog:

2017-11-30  Maxim Ostapenko  

	PR sanitizer/81697
	* c-c++-common/asan/pr81697.c: New test.

diff --git a/gcc/asan.c b/gcc/asan.c
index ca5fceed..873687f 100644
--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -1605,7 +1605,7 @@ is_odr_indicator (tree decl)
ASAN_RED_ZONE_SIZE bytes.  */
 
 bool
-asan_protect_global (tree decl)
+asan_protect_global (tree decl, bool ignore_decl_rtl_set_p)
 {
   if (!ASAN_GLOBALS)
 return false;
@@ -1627,7 +1627,13 @@ asan_protect_global (tree decl)
   || DECL_THREAD_LOCAL_P (decl)
   /* Externs will be protected elsewhere.  */
   || DECL_EXTERNAL (decl)
-  || !DECL_RTL_SET_P (decl)
+  /* PR sanitizer/81697: For architectures that use section anchors first
+	 call to asan_protect_global may occur before DECL_RTL (decl) is set.
+	 We should ignore DECL_RTL_SET_P then, because otherwise the first call
+	 to asan_protect_global will return FALSE and the following calls on the
+	 same decl after setting DECL_RTL (decl) will return TRUE and we'll end
+	 up with inconsistency at runtime.  */
+  || (!DECL_RTL_SET_P (decl) && !ignore_decl_rtl_set_p)
   /* Comdat vars pose an ABI problem, we can't know if
 	 the var that is selected by the linker will have
 	 padding or not.  */
@@ -1651,14 +1657,18 @@ asan_protect_global (tree decl)
   || is_odr_indicator (decl))
 return false;
 
-  rtl = DECL_RTL (decl);
-  if (!MEM_P (rtl) || GET_CODE (XEXP (rtl, 0)) != SYMBOL_REF)
-return false;
-  symbol = XEXP (rtl, 0);
+  if (!ignore_decl_rtl_set_p || DECL_RTL_SET_P (decl))
+{
 
-  if (CONSTANT_POOL_ADDRESS_P (symbol)
-  || TREE_CONSTANT_POOL_ADDRESS_P (symbol))
-return false;
+  rtl = DECL_RTL (decl);
+  if (!MEM_P (rtl) || GET_CODE (XEXP (rtl, 0)) != SYMBOL_REF)
+	return false;
+  symbol = XEXP (rtl, 0);
+
+  if (CONSTANT_POOL_ADDRESS_P (symbol)
+	  || TREE_CONSTANT_POOL_ADDRESS_P (symbol))
+	return false;
+}
 
   if (lookup_attribute ("weakref", DECL_ATTRIBUTES (decl)))
 return false;
diff --git a/gcc/asan.h b/gcc/asan.h
index c82d4d9..885b47e 100644
--- a/gcc/asan.h
+++ b/gcc/asan.h
@@ -26,7 +26,7 @@ extern void asan_finish_file (void);
 extern rtx_insn *asan_emit_stac

Re: [C++ PATCH] Plural forms for count != eltscnt structured binding diagnostics

2017-11-30 Thread Jakub Jelinek
On Thu, Nov 30, 2017 at 01:01:58PM +0100, Jakub Jelinek wrote:
> I just followed:
> https://www.gnu.org/savannah-checkouts/gnu/gettext/manual/html_node/Plural-forms.html
> 
> "About larger integer types, such as ‘uintmax_t’ or ‘unsigned long long’: 
> they can be
> handled by reducing the value to a range that fits in an ‘unsigned long’. 
> Simply
> casting the value to ‘unsigned long’ would not do the right thing, since it 
> would
> treat ULONG_MAX + 1 like zero, ULONG_MAX + 2 like singular, and the like. 
> Here you
> can exploit the fact that all mentioned plural form formulas eventually 
> become periodic,
> with a period that is a divisor of 100 (or 1000 or 100). So, when you 
> reduce a large
> value to another one in the range [100, 199] that ends in the same 6 
> decimal
> digits, you can assume that it will lead to the same plural form selection. 
> This code
> does this:
> 
> #include 
> uintmax_t nbytes = ...;
> printf (ngettext ("The file has %"PRIuMAX" byte.",
>   "The file has %"PRIuMAX" bytes.",
>   (nbytes > ULONG_MAX
>? (nbytes % 100) + 100
>: nbytes)),
> nbytes);"
> 
> I can surely add a comment about that.
> 
> Note the patch depends on the 
> https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02521.html
> patch.

Though, looking at our *_n diagnostic routines, the n argument is actually
int there, not unsigned long for some reason.  So, either we should fix
that (seems ngettext has unsigned long), or the above would need to be
(unsigned HOST_WIDE_INT) (int) eltscnt != eltscnt || (int) eltscnt < 0 

Jakub


[Patch, fortran] PRs 82605, 82606 and 82622 - PDT problems

2017-11-30 Thread Paul Richard Thomas
This patch fixes the above PRs and the additional problems in comment
#1 of both 82606 and 82622.

For the main part, the patch consists of 'obvious' tweaks to the PDT
machinery. The exception to this is the chunk in
trans-array.c(set_loop_bounds), which is needed to handle
parameterized array components coming from trans-io.c. This is safe
because the code would have fallen through to gcc_unreachable
otherwise. If the info->end is present then this can be used.

Bootstrapped and regtested on FC23/x86_64 - OK for trunk?

I will commit tomorrow morning if there are no complaints in the meantime.

Regards

Paul

2017-11-30  Paul Thomas  

PR fortran/82605
* resolve.c (get_pdt_constructor): Initialize 'cons' to NULL.
(resolve_pdt): Correct typo in prior comment. Emit an error if
any parameters are deferred and the object is neither pointer
nor allocatable.

PR fortran/82606
* decl.c (gfc_get_pdt_instance): Continue if the parameter sym
is not present or has no name. Select the parameter by name
of component, rather than component order. Remove all the other
manipulations of 'tail' when building the pdt instance.
(gfc_match_formal_arglist): Emit and error if a star is picked
up in a PDT decl parameter list.

PR fortran/82622
* trans-array.c (set_loop_bounds): If a GFC_SS_COMPONENT has an
info->end, use it rather than falling through to
gcc_unreachable.
(structure_alloc_comps): Check that param->name is non-null
before comparing with the component name.
* trans-decl.c (gfc_get_symbol_decl): Do not use the static
initializer for PDT symbols.
(gfc_init_default_dt): Do nothing for PDT symbols.
* trans-io.c (transfer_array_component): Parameterized array
components use the descriptor ubound since the shape is not
available.

2017-11-30  Paul Thomas  

PR fortran/82605
* gfortran.dg/pdt_4.f03 : Incorporate the new error.

PR fortran/82606
* gfortran.dg/pdt_19.f03 : New test.
* gfortran.dg/pdt_21.f03 : New test.

PR fortran/82622
* gfortran.dg/pdt_20.f03 : New test.
* gfortran.dg/pdt_22.f03 : New test.
Index: gcc/fortran/decl.c
===
*** gcc/fortran/decl.c  (revision 255249)
--- gcc/fortran/decl.c  (working copy)
*** gfc_get_pdt_instance (gfc_actual_arglist
*** 3250,3255 
--- 3250,3258 
name_seen = true;
param = type_param_name_list->sym;
  
+   if (!param || !param->name)
+   continue;
+ 
c1 = gfc_find_component (pdt, param->name, false, true, NULL);
/* An error should already have been thrown in resolve.c
 (resolve_fl_derived0).  */
*** gfc_get_pdt_instance (gfc_actual_arglist
*** 3406,3414 
--- 3409,3427 
for (; c1; c1 = c1->next)
  {
gfc_add_component (instance, c1->name, &c2);
+ 
c2->ts = c1->ts;
c2->attr = c1->attr;
  
+   /* The order of declaration of the type_specs might not be the
+same as that of the components.  */
+   if (c1->attr.pdt_kind || c1->attr.pdt_len)
+   {
+ for (tail = type_param_spec_list; tail; tail = tail->next)
+   if (strcmp (c1->name, tail->name) == 0)
+ break;
+   }
+ 
/* Deal with type extension by recursively calling this function
 to obtain the instance of the extended type.  */
if (gfc_current_state () != COMP_DERIVED
*** gfc_get_pdt_instance (gfc_actual_arglist
*** 3453,3464 
}
  instance->attr.extension = c2->ts.u.derived->attr.extension + 1;
  
- /* Advance the position in the spec list by the number of
-parameters in the extended type.  */
- tail = type_param_spec_list;
- for (f = c1->ts.u.derived->formal; f && f->next; f = f->next)
-   tail = tail->next;
- 
  continue;
}
  
--- 3466,3471 
*** gfc_get_pdt_instance (gfc_actual_arglist
*** 3509,3516 
  
  if (!c2->initializer && c1->initializer)
c2->initializer = gfc_copy_expr (c1->initializer);
- 
- tail = tail->next;
}
  
/* Copy the array spec.  */
--- 3516,3521 
*** gfc_match_formal_arglist (gfc_symbol *pr
*** 5944,5955 
if (gfc_match_char ('*') == MATCH_YES)
{
  sym = NULL;
! if (!gfc_notify_std (GFC_STD_F95_OBS, "Alternate-return argument "
!  "at %C"))
{
  m = MATCH_ERROR;
  goto cleanup;
}
}
else
{
--- 5949,5962 
if (gfc_match_char ('*') == MATCH_YES)
{
  sym = NULL;
! if (!typeparam && !gfc_notify_std (GFC_STD_F95_OBS,
!"Alternate-return argument at %C"))
{
  m = MATCH_ERROR;
  goto cleanup;
}
+ else if (typeparam)
+   

Re: [PATCH GCC][V2]A simple implementation of loop interchange

2017-11-30 Thread Richard Biener
On Tue, Nov 28, 2017 at 4:26 PM, Bin Cheng  wrote:
> Hi,
> This is updated patch with review comments resolved.  Some explanation 
> embedded below.
>
> On Mon, Nov 20, 2017 at 2:46 PM, Richard Biener  
> wrote:
>> On Thu, Nov 16, 2017 at 4:18 PM, Bin.Cheng  wrote:
>>> On Tue, Oct 24, 2017 at 3:30 PM, Michael Matz  wrote:
 Hello,

 On Fri, 22 Sep 2017, Bin.Cheng wrote:

> This is updated patch for loop interchange with review suggestions
> resolved.  Changes are:
>   1) It does more light weight checks like rectangle loop nest check
> earlier than before.
>   2) It checks profitability of interchange before data dependence 
> computation.
>   3) It calls find_data_references_in_loop only once for a loop nest now.
>   4) Data dependence is open-computed so that we can skip instantly at
> unknown dependence.
>   5) It improves code generation in mapping induction variables for
> loop nest, as well as
>  adding a simple dead code elimination pass.
>   6) It changes magic constants into parameters.

 So I have a couple comments/questions.  Something stylistic:
>>> Hi Michael,
>>> Thanks for reviewing.
>>>

> +class loop_cand
> +{
> +public:
> ...
> +  friend class tree_loop_interchange;
> +private:

 Just make this all public (and hence a struct, not class).
 No need for friends in file local classes.
>>> Done.
>>>

> +single_use_in_loop (tree var, struct loop *loop)
> ...
> +  FOR_EACH_IMM_USE_FAST (use_p, iterator, var)
> +{
> +  stmt = USE_STMT (use_p);
> ...
> +  basic_block bb = gimple_bb (stmt);
> +  gcc_assert (bb != NULL);

 This pattern reoccurs often in your patch: you check for a bb associated
 for a USE_STMT.  Uses of SSA names always occur in basic blocks, no need
 for checking.
>>> Done.
>>>

 Then, something about your handling of simple reductions:

> +void
> +loop_cand::classify_simple_reduction (reduction_p re)
> +{
> ...
> +  /* Require memory references in producer and consumer are the same so
> + that we can undo reduction during interchange.  */
> +  if (re->init_ref && !operand_equal_p (re->init_ref, re->fini_ref, 0))
> +return;

 Where is it checked that the undoing transformation is legal also
 from a data dep point of view?  Think code like this:

sum = X[i];
for (j ...)
  sum += X[j];
X[i] = sum;

 Moving the store into the inner loop isn't always correct and I don't seem
 to find where the above situation is rejected.
>>> Yeah.  for the old patch, it's possible to have such loop wrongly 
>>> interchanged;
>>> in practice, it's hard to create an example.  The pass will give up
>>> when computing
>>> data dep between references in inner/outer loops.  In this updated
>>> patch, it's fixed
>>> by giving up if there is any dependence between references of inner/outer 
>>> loops.
>>>

 Maybe I'm confused because I also don't see where you even can get into
 the above situation (though I do see testcases about this).  The thing is,
 for an 2d loop nest to contain something like the above reduction it can't
 be perfect:

for (j) {
  int sum = X[j];  // 1
  for (i)
sum += Y[j][i];
  X[j] = sum;  // 2
}

 But you do check for perfectness in proper_loop_form_for_interchange and
 prepare_perfect_loop_nest, so either you can't get into the situation or
 the checking can't be complete, or you define the above to be perfect
 nevertheless (probably because the load and store are in outer loop
 header/exit blocks?).  The latter would mean that you accept also other
 code in header/footer of loops from a pure CFG perspective, so where is it
 checked that that other code (which aren't simple reductions) isn't
 harmful to the transformation?
>>> Yes, I used the name perfect loop nest, but the pass can handle special form
>>> imperfect loop nest for the simple reduction.  I added comments describing
>>> this before function prepare_perfect_loop_nest.
>>>

 Then, the data dependence part of the new pass:

> +bool
> +tree_loop_interchange::valid_data_dependences (unsigned inner, unsigned 
> outer)
> +{
> +  struct data_dependence_relation *ddr;
> +
> +  for (unsigned i = 0; ddrs.iterate (i, &ddr); ++i)
> +{
> +  /* Skip no-dependence case.  */
> +  if (DDR_ARE_DEPENDENT (ddr) == chrec_known)
> + continue;
> +
> +  for (unsigned j = 0; j < DDR_NUM_DIR_VECTS (ddr); ++j)
> + {
> +   lambda_vector dist_vect = DDR_DIST_VECT (ddr, j);
> +   unsigned level = dependence_level (dist_vect, loop_nest.length 
> ());
> +
> +   /* If there is no carried dependence.  */
> + 

Re: [C++ PATCH] Plural forms for count != eltscnt structured binding diagnostics

2017-11-30 Thread Jakub Jelinek
On Thu, Nov 30, 2017 at 01:33:48PM +0100, Jakub Jelinek wrote:
> On Thu, Nov 30, 2017 at 01:01:58PM +0100, Jakub Jelinek wrote:
> > I just followed:
> > https://www.gnu.org/savannah-checkouts/gnu/gettext/manual/html_node/Plural-forms.html
> > 
> > "About larger integer types, such as ‘uintmax_t’ or ‘unsigned long long’: 
> > they can be
> > handled by reducing the value to a range that fits in an ‘unsigned long’. 
> > Simply
> > casting the value to ‘unsigned long’ would not do the right thing, since it 
> > would
> > treat ULONG_MAX + 1 like zero, ULONG_MAX + 2 like singular, and the like. 
> > Here you
> > can exploit the fact that all mentioned plural form formulas eventually 
> > become periodic,
> > with a period that is a divisor of 100 (or 1000 or 100). So, when you 
> > reduce a large
> > value to another one in the range [100, 199] that ends in the same 
> > 6 decimal
> > digits, you can assume that it will lead to the same plural form selection. 
> > This code
> > does this:
> > 
> > #include 
> > uintmax_t nbytes = ...;
> > printf (ngettext ("The file has %"PRIuMAX" byte.",
> >   "The file has %"PRIuMAX" bytes.",
> >   (nbytes > ULONG_MAX
> >? (nbytes % 100) + 100
> >: nbytes)),
> > nbytes);"
> > 
> > I can surely add a comment about that.
> > 
> > Note the patch depends on the 
> > https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02521.html
> > patch.
> 
> Though, looking at our *_n diagnostic routines, the n argument is actually
> int there, not unsigned long for some reason.  So, either we should fix
> that (seems ngettext has unsigned long), or the above would need to be
> (unsigned HOST_WIDE_INT) (int) eltscnt != eltscnt || (int) eltscnt < 0 

Of course better eltscnt > INT_MAX

Jakub


Re: RFC: Variable-length VECTOR_CSTs

2017-11-30 Thread Richard Sandiford
Richard Sandiford  writes:
> Richard Biener  writes:
>> On Wed, Nov 29, 2017 at 12:57 PM, Richard Sandiford
>>  wrote:
>>> It was clear from the SVE reviews that people were unhappy with how
>>> "special" the variable-length case was.  One particular concern was
>>> the use of VEC_DUPLICATE_CST and VEC_SERIES_CST, and the way that
>>> that would in turn lead to different representations of VEC_PERM_EXPRs
>>> with constant permute vectors, and difficulties in invoking
>>> vec_perm_const_ok.
>>>
>>> This is an RFC for a VECTOR_CST representation that treats each
>>> specific constant as one instance of an arbitrary-length sequence.
>>> The reprensentation then extends to variable-length vectors in a
>>> natural way.
>>>
>>> As discussed on IRC, if a vector contains X*N elements for some
>>> constant N and integer X>0, the main features we need are:
>>>
>>> 1) the ability to represent a sequence that duplicates N values
>>>
>>>This is useful for SLP invariants.
>>>
>>> 2) the ability to represent a sequence that starts with N values and
>>>is followed by zeros
>>>
>>>This is useful for the initial value in a double or SLP reduction
>>>
>>> 3) the ability to represent N interleaved series
>>>
>>>This is useful for SLP inductions and for VEC_PERM_EXPRs.
>>>
>>> For (2), zero isn't necessarily special, since vectors used in an AND
>>> reduction might need to fill with ones.  Also, we might need up to N
>>> different fill values with mixed SLP operations; it isn't necessarily
>>> safe to assume that a single fill value will always be enough.
>>>
>>> The same goes for (3): there's no reason in principle why the
>>> steps in an SLP induction should all be the same (although they
>>> do need to be at the moment).  E.g. once we support SLP on:
>>>
>>>   for (unsigned int i = 0; i < n; i += 2)
>>> {
>>>   x[i] += 4 + i;
>>>   x[i + 1] += 11 + i * 3;
>>> }
>>>
>>> we'll need {[4, 14], +, [2, 6]}.
>>>
>>> So the idea is to represent vectors as P interleaved patterns of the form:
>>>
>>>   [BASE0, BASE1, BASE1 + STEP, BASE1 + STEP*2, ...]
>>>
>>> where the STEP is always zero (actually null) for non-integer vectors.
>>> This is effectively projecting a "foreground" value of P elements
>>> onto an arbitrary-length "background" sequenece, where the background
>>> sequence contains P parallel linear series.
>>>
>>> E.g. to pick an extreme and unlikely example,
>>>
>>>   [42, 99, 2, 20, 3, 30, 4, 40, ...]
>>>
>>> has 2 patterns:
>>>
>>>   BASE0 = 42, BASE1 = 2, STEP = 1
>>>   BASE0 = 99, BASE1 = 20, STEP = 10
>>>
>>> The more useful cases are degenerate versions of this general case.
>>>
>>> As far as memory consumption goes: the number of patterns needed for a
>>> fixed-length vector with 2*N elements is always at most N; in the worst
>>> case, we simply interleave the first N elements with the second N elements.
>>> The worst-case increase in footprint is therefore N trees for the steps.
>>> In practice the footprint is usually smaller than it was before, since
>>> most constants do have a pattern.
>>>
>>> The patch below implements this for trees.  I have patches to use the
>>> same style of encoding for CONST_VECTOR and vec_perm_indices, but the
>>> tree one is probably easiest to read.
>>>
>>> The patch only adds the representation.  Follow-on patches make more
>>> use of it (and usually make things simpler; e.g. integer_zerop is no
>>> longer a looping operation).
>>>
>>> Does this look better?
>>
>> Yes, the overall design looks good.  I wonder why you chose to have
>> the number of patterns being a power of two?  I suppose this is
>> to have the same number of elements from all patterns in the final
>> vector (which is power-of-two sized)?
>
> Right.  The rtl and vec_perm_indices parts don't have this restriction,
> since some ports do define non-power-of-2 vectors for internal use.
> The problem is that, since VECTOR_CSTs are used by the FE, we need
> to support all valid vector lengths without blowing the 16-bit field.
> Using the same style of representation as TYPE_VECTOR_SUBPARTS seemed
> like the safest way of doing that.
>
>> I wonder if there exists a vector where say a three-pattern
>> interleaving would be smaller than a four-pattern one?
>
> Only in the non-power-of-2 case.
>
>> Given you add flags for various purposes would it make sense to
>> overload 'step' with a regular element to avoid the storage increase
>> in case step is unnecessary?  This makes it have three elements
>> which is of course awkward :/
>
> I wondered about keeping it as an array of trees and tacking the
> steps onto the end as an optional addition.  But the idea is that
> tree_vector_pattern becomes the preferred way of handling constant
> vectors, if it can be used, so it seemed neater to use in the tree
> node too.

In the end it seemed better to encode the first NPATTERNS * N
elements of the vector, where:

- N==3 if at least one pattern needs a step; otherwise
- N==2 if at least one pattern has d

[PATCH] PR libstdc++/83226 avoid forming pointer-to-reference type

2017-11-30 Thread Jonathan Wakely

We need to strip references from the key_type and mapped_type before
forming pointers to them.

PR libstdc++/83226
* include/bits/node_handle.h (_Node_handle::__pointer): Avoid forming
pointer-to-reference types.
* testsuite/23_containers/map/modifiers/insert/83226.cc: New test.

Tested powerpc64le-linux, committed to trunk. Backport to gcc-7 to
follow.

commit 9d17e1a6bc20ac7d718930b00dc9858295bc6a09
Author: Jonathan Wakely 
Date:   Thu Nov 30 13:00:05 2017 +

PR libstdc++/83226 avoid forming pointer-to-reference type

PR libstdc++/83226
* include/bits/node_handle.h (_Node_handle::__pointer): Avoid 
forming
pointer-to-reference types.
* testsuite/23_containers/map/modifiers/insert/83226.cc: New test.

diff --git a/libstdc++-v3/include/bits/node_handle.h 
b/libstdc++-v3/include/bits/node_handle.h
index 0d8dbeb4110..7f109ada6f1 100644
--- a/libstdc++-v3/include/bits/node_handle.h
+++ b/libstdc++-v3/include/bits/node_handle.h
@@ -199,7 +199,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   }
 
   template
-   using __pointer = __ptr_rebind;
+   using __pointer
+ = __ptr_rebind>;
 
   __pointer<_Key>  _M_pkey = nullptr;
   __pointer  _M_pmapped = nullptr;
diff --git a/libstdc++-v3/testsuite/23_containers/map/modifiers/insert/83226.cc 
b/libstdc++-v3/testsuite/23_containers/map/modifiers/insert/83226.cc
new file mode 100644
index 000..07afa5cbd87
--- /dev/null
+++ b/libstdc++-v3/testsuite/23_containers/map/modifiers/insert/83226.cc
@@ -0,0 +1,30 @@
+// Copyright (C) 2017 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-options "-std=gnu++17" }
+// { dg-do compile { target c++17 } }
+
+#include 
+
+void
+test01()
+{
+  int i = 0;
+  std::map m;
+  std::pair p{1, i};
+  m.insert(p); // PR libstdc++/83226
+}


Re: [PATCH GCC][V2]A simple implementation of loop interchange

2017-11-30 Thread Richard Biener
On Thu, Nov 30, 2017 at 2:01 PM, Richard Biener
 wrote:
> On Tue, Nov 28, 2017 at 4:26 PM, Bin Cheng  wrote:
>> Hi,
>> This is updated patch with review comments resolved.  Some explanation 
>> embedded below.
>>
>> On Mon, Nov 20, 2017 at 2:46 PM, Richard Biener  
>> wrote:
>>> On Thu, Nov 16, 2017 at 4:18 PM, Bin.Cheng  wrote:
 On Tue, Oct 24, 2017 at 3:30 PM, Michael Matz  wrote:
> Hello,
>
> On Fri, 22 Sep 2017, Bin.Cheng wrote:
>
>> This is updated patch for loop interchange with review suggestions
>> resolved.  Changes are:
>>   1) It does more light weight checks like rectangle loop nest check
>> earlier than before.
>>   2) It checks profitability of interchange before data dependence 
>> computation.
>>   3) It calls find_data_references_in_loop only once for a loop nest now.
>>   4) Data dependence is open-computed so that we can skip instantly at
>> unknown dependence.
>>   5) It improves code generation in mapping induction variables for
>> loop nest, as well as
>>  adding a simple dead code elimination pass.
>>   6) It changes magic constants into parameters.
>
> So I have a couple comments/questions.  Something stylistic:
 Hi Michael,
 Thanks for reviewing.

>
>> +class loop_cand
>> +{
>> +public:
>> ...
>> +  friend class tree_loop_interchange;
>> +private:
>
> Just make this all public (and hence a struct, not class).
> No need for friends in file local classes.
 Done.

>
>> +single_use_in_loop (tree var, struct loop *loop)
>> ...
>> +  FOR_EACH_IMM_USE_FAST (use_p, iterator, var)
>> +{
>> +  stmt = USE_STMT (use_p);
>> ...
>> +  basic_block bb = gimple_bb (stmt);
>> +  gcc_assert (bb != NULL);
>
> This pattern reoccurs often in your patch: you check for a bb associated
> for a USE_STMT.  Uses of SSA names always occur in basic blocks, no need
> for checking.
 Done.

>
> Then, something about your handling of simple reductions:
>
>> +void
>> +loop_cand::classify_simple_reduction (reduction_p re)
>> +{
>> ...
>> +  /* Require memory references in producer and consumer are the same so
>> + that we can undo reduction during interchange.  */
>> +  if (re->init_ref && !operand_equal_p (re->init_ref, re->fini_ref, 0))
>> +return;
>
> Where is it checked that the undoing transformation is legal also
> from a data dep point of view?  Think code like this:
>
>sum = X[i];
>for (j ...)
>  sum += X[j];
>X[i] = sum;
>
> Moving the store into the inner loop isn't always correct and I don't seem
> to find where the above situation is rejected.
 Yeah.  for the old patch, it's possible to have such loop wrongly 
 interchanged;
 in practice, it's hard to create an example.  The pass will give up
 when computing
 data dep between references in inner/outer loops.  In this updated
 patch, it's fixed
 by giving up if there is any dependence between references of inner/outer 
 loops.

>
> Maybe I'm confused because I also don't see where you even can get into
> the above situation (though I do see testcases about this).  The thing is,
> for an 2d loop nest to contain something like the above reduction it can't
> be perfect:
>
>for (j) {
>  int sum = X[j];  // 1
>  for (i)
>sum += Y[j][i];
>  X[j] = sum;  // 2
>}
>
> But you do check for perfectness in proper_loop_form_for_interchange and
> prepare_perfect_loop_nest, so either you can't get into the situation or
> the checking can't be complete, or you define the above to be perfect
> nevertheless (probably because the load and store are in outer loop
> header/exit blocks?).  The latter would mean that you accept also other
> code in header/footer of loops from a pure CFG perspective, so where is it
> checked that that other code (which aren't simple reductions) isn't
> harmful to the transformation?
 Yes, I used the name perfect loop nest, but the pass can handle special 
 form
 imperfect loop nest for the simple reduction.  I added comments describing
 this before function prepare_perfect_loop_nest.

>
> Then, the data dependence part of the new pass:
>
>> +bool
>> +tree_loop_interchange::valid_data_dependences (unsigned inner, unsigned 
>> outer)
>> +{
>> +  struct data_dependence_relation *ddr;
>> +
>> +  for (unsigned i = 0; ddrs.iterate (i, &ddr); ++i)
>> +{
>> +  /* Skip no-dependence case.  */
>> +  if (DDR_ARE_DEPENDENT (ddr) == chrec_known)
>> + continue;
>> +
>> +  for (unsigned j = 0; j < DDR_NUM_DIR_VECTS (ddr); ++j)
>> + {
>> +   lambda_vector dist_v

Re: [patch] prevent .cfi_personality/.cfi_lsda on !dwarf eh configurations

2017-11-30 Thread Olivier Hainque

> On 30 Nov 2017, at 13:04, Eric Botcazou  wrote:
> 
>> 2017-11-30  Tristan Gingold  
>> 
>>* dwarf2out.c (dwarf2out_do_cfi_startproc): Only emit
>>.cfi_personality or .cfi_lsda if the eh data format is dwarf2.
> 
> PR ada/81470 must be put on the first line to trigger xref in Bugzilla.

Oh, indeed. Will add if the patch gets approved. Thanks!



Re: [PATCH GCC][V2]A simple implementation of loop interchange

2017-11-30 Thread Bin.Cheng
On Thu, Nov 30, 2017 at 1:01 PM, Richard Biener
 wrote:
> On Tue, Nov 28, 2017 at 4:26 PM, Bin Cheng  wrote:
>> Hi,
>> This is updated patch with review comments resolved.  Some explanation 
>> embedded below.
>>
>>> +
>>> +  iloop->nb_iterations = nb_iterations;
>>>
>>> use std::swap?  Also I think if you can keep nb_iterations you
>>> can certainly keep the upper bounds.  You're probably
>>> afraid of the ->stmt references in the nb_iter_bound entries?
>>>
>>> Anyway, either scrap everything or try to keep everything.
>> Yeah, not only the stmts, but also the control_iv information because the 
>> SCEV
>> information may be corrupted during code transformation.
>> Now I discarded all the information.
>
> Note that given you interchange the loops but not the CFG or the loop 
> structures
> you might want to swap loop->num and flags like ->force_vectorize.  That is,
> essentially change the ->header/latch association (and other CFG related stuff
> like recorded exits).
>
> It might also be we want to / need to disable interchange for, say,
> ->force_vectorize
> inner loops or ->unroll != 0?  Or we need to clear them, maybe
> optionally diagnosing
> that fact.
>
> At least we need to think about what it means to preserve loop
> structure (semantically,
> loop->num should maintain association to the same source-level loop
> throughout the
> compilation) for transforms like interchange.
>
>>>
>>> +  for (i = 0; oloop.reductions.iterate (i, &re); ++i)
>>> +{
>>> +  if (re->type != DOUBLE_RTYPE)
>>> +   gcc_unreachable ();
>>> +
>>> +  use_operand_p use_p;
>>> +  imm_use_iterator iterator;
>>> +  FOR_EACH_IMM_USE_FAST (use_p, iterator, re->var)
>>> +   mark_or_remove_dbg_stmt (USE_STMT (use_p), re->var);
>>> +  FOR_EACH_IMM_USE_FAST (use_p, iterator, re->next)
>>> +   mark_or_remove_dbg_stmt (USE_STMT (use_p), re->next);
>>> +  if (TREE_CODE (re->init) == SSA_NAME)
>>> +   {
>>> + FOR_EACH_IMM_USE_FAST (use_p, iterator, re->init)
>>> +   mark_or_remove_dbg_stmt (USE_STMT (use_p), re->init);
>>> +   }
>>>
>>> can you add a comment what you are doing here?
>>>
>>> Note that other loop opts simply scrap all debug stmts ...
>> As mentioned above, updated patch doesn't try hard to maintain debug use 
>> info any more.
>>
>>>
>>> +static void
>>> +compute_access_stride (struct loop *loop_nest, struct loop *loop,
>>> +  data_reference_p dr)
>>> +{
>>> ...
>>> +  tree ref = DR_REF (dr);
>>> +  tree scev_base = build_fold_addr_expr (ref);
>>> +  tree access_size = TYPE_SIZE_UNIT (TREE_TYPE (ref));
>>> +  tree niters = build_int_cst (sizetype, AVG_LOOP_NITER);
>>> +  access_size = fold_build2 (MULT_EXPR, sizetype, niters, access_size);
>>> +
>>> +  do {
>>> +tree scev_fn = analyze_scalar_evolution (loop, scev_base);
>>> +if (chrec_contains_undetermined (scev_fn)
>>> +   || chrec_contains_symbols_defined_in_loop (scev_fn, loop->num))
>>> +  break;
>>> ...
>>> +strides->safe_push (scev_step);
>>> +  } while (loop != loop_nest && (loop = loop_outer (loop)) != NULL);
>>> +
>>>
>>> I _think_ you want to do
>>>
>>>scev_fn = analyze_scalar_evolution (loop, scev_base); // assuming
>>> DR_STMT (dr) is in loop
>>>scev_fn = instantiate_parameters (nest, scev_fn);
>>>if (chrec_contains_undetermined (scev_fn))
>>>  return; // false?
>>>
>>> and analyze the result which should be of the form
>>>
>>>   { { { init, +, step1 }_1, +, step2 }_2, + , step3 }_3 ...
>>>
>>> if canonical.  I think estimate_val_by_simplify_replace isn't needed
>>> if you do that
>>> (it also looks odd to replace all vairables in step by niter...).
>> I replied on this in previous message, instantiate_parameters doesn't always
>> give canonical form result as expected.  The loop here could be seen as a
>> local instantiate process, right?
>
> Kind of.  I'll see if I can reproduce the difference with any of your
> intercahnge
> testcases - any hint which one to look at?
For added tests, I think there will be no difference between the two.
I noticed the difference for
pointer cases like:
for (i...)
  for (j...)
for (k...)
   p[i*n*n + j*n+ k] =...

>
>> Also estimate_val_by_simplify_replace is needed for pointers, where strides
>> are computed from niters of loops which could be non compilation time 
>> constant.
>> But yes, it's an odd fixup after I failed to do anything better.
>
> But you are for example computing _1 - _2 to zero, right?  Because both _1
> and _2 are not constant and thus you replace it with the same (symbolical)
> constant 'niter'.
>
> I think that asks for garbage-in-garbage-out ...
>
> Which testcase is this important for so I can have a look?
So far this is only for the above pointer case.  Actually I don't
think it's that important, and thought about skip it.
So we don't have to do estimate_val_by_simplify_replace.

Thanks,
bin
>
>>>
>>> I think keeping the chrec in the above form is also more suitable for what
>>

Re: generic retuning part 1 - x86-tune-costs update

2017-11-30 Thread Jan Hubicka
> On Thu, Nov 30, 2017 at 10:40 AM, Jan Hubicka  wrote:
> > Hi,
> > this patch makes costs in generic to math better modern chips (core, 
> > haswell,
> > buldozer and zen).  The only important change is to drop cost of unaligned 
> > loads
> > and stores becuase all modern chips handle it well.  This makes vectorizer 
> > to
> > not peel for alignment and saves a lot of code size without sacrifying
> > performance.
> >
> > I have benchmarked it on zen and skylake and it is small but almost 
> > consistent
> > win in performance too.  Notable regression is fma3d regressing aprox. 5%
> > on zen.  This is the case of native tuning as well, so I will look into it
> > incrementally.
> >
> > Bootstrapped/regtested x86_64-linux, comitted.
> 
> The question is how we cost such things as store bandwith where IIRC
> an unaligned store counts 'two' entries in the pipelines store buffers.
> Likewise unaligned loads do usually still have a penalty.
> 
> What changed is that when the loads/stores happen to be aligned
> using the unaligned instruction variant doesn't have a penalty.
> 
> So I'm not sure peeling for alignmend isn't a win, it just depends more
> on the number of memory streams involved.

I have benchmarked this quite thoroughly while switching the defaults for Core
and Zen.  Disabling alignment is pretty much consistent for specfp/int for
2000,2006 and 2017 (last is Zen only) and quite consistent one.  I will analyze
fma3d. It did not show with the alignment change alone, so it may be related
to other costs or just bad luck.

For sure, there may be specific loops where alignment wins, but it seems bad 
idea
to enable it by default just in case it is needed when it makes regression on
SPECfp.  Lets see if we can identify them and be more careful about alignment
decision.

I also checked that disabling alingment prologues is win even for string
operations and there it is disabled for couple releases already. (I think
since initial tuning for core and Buldozer came in)
Honza
> 
> Richard.


Re: [PATCH] Implement std::to_address for C++2a

2017-11-30 Thread Jonathan Wakely

On 29/11/17 04:54 -0500, Glen Fernandes wrote:

(Also added a new [_neg] test)

Move static_assert for function pointers to __to_address


Thanks, applied.




Re: [PATCH GCC][V2]A simple implementation of loop interchange

2017-11-30 Thread Richard Biener
On Thu, Nov 30, 2017 at 3:13 PM, Bin.Cheng  wrote:
> On Thu, Nov 30, 2017 at 1:01 PM, Richard Biener
>  wrote:
>> On Tue, Nov 28, 2017 at 4:26 PM, Bin Cheng  wrote:
>>> Hi,
>>> This is updated patch with review comments resolved.  Some explanation 
>>> embedded below.
>>>
 +
 +  iloop->nb_iterations = nb_iterations;

 use std::swap?  Also I think if you can keep nb_iterations you
 can certainly keep the upper bounds.  You're probably
 afraid of the ->stmt references in the nb_iter_bound entries?

 Anyway, either scrap everything or try to keep everything.
>>> Yeah, not only the stmts, but also the control_iv information because the 
>>> SCEV
>>> information may be corrupted during code transformation.
>>> Now I discarded all the information.
>>
>> Note that given you interchange the loops but not the CFG or the loop 
>> structures
>> you might want to swap loop->num and flags like ->force_vectorize.  That is,
>> essentially change the ->header/latch association (and other CFG related 
>> stuff
>> like recorded exits).
>>
>> It might also be we want to / need to disable interchange for, say,
>> ->force_vectorize
>> inner loops or ->unroll != 0?  Or we need to clear them, maybe
>> optionally diagnosing
>> that fact.
>>
>> At least we need to think about what it means to preserve loop
>> structure (semantically,
>> loop->num should maintain association to the same source-level loop
>> throughout the
>> compilation) for transforms like interchange.
>>

 +  for (i = 0; oloop.reductions.iterate (i, &re); ++i)
 +{
 +  if (re->type != DOUBLE_RTYPE)
 +   gcc_unreachable ();
 +
 +  use_operand_p use_p;
 +  imm_use_iterator iterator;
 +  FOR_EACH_IMM_USE_FAST (use_p, iterator, re->var)
 +   mark_or_remove_dbg_stmt (USE_STMT (use_p), re->var);
 +  FOR_EACH_IMM_USE_FAST (use_p, iterator, re->next)
 +   mark_or_remove_dbg_stmt (USE_STMT (use_p), re->next);
 +  if (TREE_CODE (re->init) == SSA_NAME)
 +   {
 + FOR_EACH_IMM_USE_FAST (use_p, iterator, re->init)
 +   mark_or_remove_dbg_stmt (USE_STMT (use_p), re->init);
 +   }

 can you add a comment what you are doing here?

 Note that other loop opts simply scrap all debug stmts ...
>>> As mentioned above, updated patch doesn't try hard to maintain debug use 
>>> info any more.
>>>

 +static void
 +compute_access_stride (struct loop *loop_nest, struct loop *loop,
 +  data_reference_p dr)
 +{
 ...
 +  tree ref = DR_REF (dr);
 +  tree scev_base = build_fold_addr_expr (ref);
 +  tree access_size = TYPE_SIZE_UNIT (TREE_TYPE (ref));
 +  tree niters = build_int_cst (sizetype, AVG_LOOP_NITER);
 +  access_size = fold_build2 (MULT_EXPR, sizetype, niters, access_size);
 +
 +  do {
 +tree scev_fn = analyze_scalar_evolution (loop, scev_base);
 +if (chrec_contains_undetermined (scev_fn)
 +   || chrec_contains_symbols_defined_in_loop (scev_fn, loop->num))
 +  break;
 ...
 +strides->safe_push (scev_step);
 +  } while (loop != loop_nest && (loop = loop_outer (loop)) != NULL);
 +

 I _think_ you want to do

scev_fn = analyze_scalar_evolution (loop, scev_base); // assuming
 DR_STMT (dr) is in loop
scev_fn = instantiate_parameters (nest, scev_fn);
if (chrec_contains_undetermined (scev_fn))
  return; // false?

 and analyze the result which should be of the form

   { { { init, +, step1 }_1, +, step2 }_2, + , step3 }_3 ...

 if canonical.  I think estimate_val_by_simplify_replace isn't needed
 if you do that
 (it also looks odd to replace all vairables in step by niter...).
>>> I replied on this in previous message, instantiate_parameters doesn't always
>>> give canonical form result as expected.  The loop here could be seen as a
>>> local instantiate process, right?
>>
>> Kind of.  I'll see if I can reproduce the difference with any of your
>> intercahnge
>> testcases - any hint which one to look at?
> For added tests, I think there will be no difference between the two.
> I noticed the difference for
> pointer cases like:
> for (i...)
>   for (j...)
> for (k...)
>p[i*n*n + j*n+ k] =...
>
>>
>>> Also estimate_val_by_simplify_replace is needed for pointers, where strides
>>> are computed from niters of loops which could be non compilation time 
>>> constant.
>>> But yes, it's an odd fixup after I failed to do anything better.
>>
>> But you are for example computing _1 - _2 to zero, right?  Because both _1
>> and _2 are not constant and thus you replace it with the same (symbolical)
>> constant 'niter'.
>>
>> I think that asks for garbage-in-garbage-out ...
>>
>> Which testcase is this important for so I can have a look?
> So far this is only for the above pointer case.  Actually I don't
> think it's that 

[PATCH] Fix "central flowgraph" typo in machine desc docs

2017-11-30 Thread Jonathan Wakely

OK for trunk?



diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index aee13259b00..c2e42d7aa49 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,7 @@
+2017-11-30  Jonathan Wakely  
+
+	* doc/md.texi (Insn Splitting): Fix "central flowgraph" typo.
+
 2017-11-30  Claudiu Zissulescu  
 
 	* config/arc/arc.md (trap): New pattern.
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index 9806b65d1cf..9e0540a1e8b 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -7890,7 +7890,7 @@ are not valid.
 
 The splitter is allowed to split jump instructions into sequence of
 jumps or create new jumps in while splitting non-jump instructions.  As
-the central flowgraph and branch prediction information needs to be updated,
+the control flow graph and branch prediction information needs to be updated,
 several restriction apply.
 
 Splitting of jump instruction into sequence that over by another jump


[PATCHv2] Add a warning for invalid function casts

2017-11-30 Thread Bernd Edlinger
On 11/29/17 22:57, Jason Merrill wrote:
> On 10/09/2017 06:30 PM, Bernd Edlinger wrote:
>> +/* Heuristic check if two parameter types can be considered 
>> ABI-equivalent.  */
>> +
>> +static bool
>> +cxx_abi_equiv_type_p (tree t1, tree t2)
> 
> This name is too general for a function that is specifically for 
> implementing a particular warning.
> 

Okay, then I will name it cxx_safe_arg_type_equiv_p.

>> +  if (INTEGRAL_TYPE_P (t1)
>> +  && INTEGRAL_TYPE_P (t2)
>> +  && TYPE_PRECISION (t1) == TYPE_PRECISION (t2)
>> +  && (TYPE_UNSIGNED (t1) == TYPE_UNSIGNED (t2)
>> +  || TYPE_PRECISION (t1) >= TYPE_PRECISION (integer_type_node)))
>> +    return true;
> 
> This section needs a comment explaining what you're allowing and why.

Okay. I will add a comment here:

   /* The signedness of the parameter matters only when an integral
  type smaller than int is promoted to int, otherwise only the
  precision of the parameter matters.
  This check should make sure that the callee does not see
  undefined values in argument registers.  */

Joseph, I assume your OK is still valid when I do these two changes
also in the C frontend.

>> +  else if (TYPE_PTRMEMFUNC_P (type) && TYPE_PTRMEMFUNC_P (intype))
>> +    {
>> +  if ((complain & tf_warning)
>> +  && !same_type_p (type, intype))
> 
> Why not use cxx_safe_function_type_cast_p here, too? 
> TYPE_PTRMEMFUNC_FN_TYPE will be helpful.
> 

Yes, that won't hurt.


So, is the attached patch OK for trunk, after the usual
bootstrap and reg-testing?


Thanks
Bernd.
gcc:
2017-10-06  Bernd Edlinger  

* doc/invoke.texi: Document -Wcast-function-type.
* recog.h (stored_funcptr): Change signature.
* tree-dump.c (dump_node): Avoid warning.
* typed-splay-tree.h (typed_splay_tree): Avoid warning.

libcpp:
2017-10-06  Bernd Edlinger  

* internal.h (maybe_print_line): Change signature.

c-family:
2017-10-06  Bernd Edlinger  

* c.opt (Wcast-function-type): New warning option.
* c-lex.c (get_fileinfo): Avoid warning.
* c-ppoutput.c (scan_translation_unit_directives_only): Remove cast.

c:
2017-10-06  Bernd Edlinger  

* c-typeck.c (c_safe_arg_type_equiv_p,
c_safe_function_type_cast_p): New function.
(build_c_cast): Implement -Wcast-function-type.

cp:
2017-10-06  Bernd Edlinger  

* decl2.c (start_static_storage_duration_function): Avoid warning.
* typeck.c (cxx_safe_arg_type_equiv_p,
cxx_safe_function_type_cast_p): New function.
(build_reinterpret_cast_1): Implement -Wcast-function-type.

testsuite:
2017-10-06  Bernd Edlinger  

* c-c++-common/Wcast-function-type.c: New test.
* g++.dg/Wcast-function-type.C: New test.
Index: gcc/c/c-typeck.c
===
--- gcc/c/c-typeck.c	(revision 255270)
+++ gcc/c/c-typeck.c	(working copy)
@@ -5452,6 +5452,58 @@ handle_warn_cast_qual (location_t loc, tree type,
   while (TREE_CODE (in_type) == POINTER_TYPE);
 }
 
+/* Heuristic check if two parameter types can be considered ABI-equivalent.  */
+
+static bool
+c_safe_arg_type_equiv_p (tree t1, tree t2)
+{
+  t1 = TYPE_MAIN_VARIANT (t1);
+  t2 = TYPE_MAIN_VARIANT (t2);
+
+  if (TREE_CODE (t1) == POINTER_TYPE
+  && TREE_CODE (t2) == POINTER_TYPE)
+return true;
+
+  /* The signedness of the parameter matters only when an integral
+ type smaller than int is promoted to int, otherwise only the
+ precision of the parameter matters.
+ This check should make sure that the callee does not see
+ undefined values in argument registers.  */
+  if (INTEGRAL_TYPE_P (t1)
+  && INTEGRAL_TYPE_P (t2)
+  && TYPE_PRECISION (t1) == TYPE_PRECISION (t2)
+  && (TYPE_UNSIGNED (t1) == TYPE_UNSIGNED (t2)
+	  || TYPE_PRECISION (t1) >= TYPE_PRECISION (integer_type_node)))
+return true;
+
+  return comptypes (t1, t2);
+}
+
+/* Check if a type cast between two function types can be considered safe.  */
+
+static bool
+c_safe_function_type_cast_p (tree t1, tree t2)
+{
+  if (TREE_TYPE (t1) == void_type_node &&
+  TYPE_ARG_TYPES (t1) == void_list_node)
+return true;
+
+  if (TREE_TYPE (t2) == void_type_node &&
+  TYPE_ARG_TYPES (t2) == void_list_node)
+return true;
+
+  if (!c_safe_arg_type_equiv_p (TREE_TYPE (t1), TREE_TYPE (t2)))
+return false;
+
+  for (t1 = TYPE_ARG_TYPES (t1), t2 = TYPE_ARG_TYPES (t2);
+   t1 && t2;
+   t1 = TREE_CHAIN (t1), t2 = TREE_CHAIN (t2))
+if (!c_safe_arg_type_equiv_p (TREE_VALUE (t1), TREE_VALUE (t2)))
+  return false;
+
+  return true;
+}
+
 /* Build an expression representing a cast to type TYPE of expression EXPR.
LOC is the location of the cast-- typically the open paren of the cast.  */
 
@@ -5645,6 +5697,16 @@ build_c_cast (location_t loc, tree type, tree expr
 	pedwarn (loc, OPT_Wpedantic, "ISO C forbids "
 		 "conversion of object pointer to function pointer type");
 
+  if (

Re: [C++ PATCH] Plural forms for count != eltscnt structured binding diagnostics

2017-11-30 Thread Pedro Alves
On 11/30/2017 01:10 PM, Jakub Jelinek wrote:
> On Thu, Nov 30, 2017 at 01:33:48PM +0100, Jakub Jelinek wrote:
>> On Thu, Nov 30, 2017 at 01:01:58PM +0100, Jakub Jelinek wrote:
>>> I just followed:
>>> https://www.gnu.org/savannah-checkouts/gnu/gettext/manual/html_node/Plural-forms.html
>>>
>>> "About larger integer types, such as ‘uintmax_t’ or ‘unsigned long long’: 
>>> they can be
>>> handled by reducing the value to a range that fits in an ‘unsigned long’. 
>>> Simply
>>> casting the value to ‘unsigned long’ would not do the right thing, since it 
>>> would
>>> treat ULONG_MAX + 1 like zero, ULONG_MAX + 2 like singular, and the like. 
>>> Here you
>>> can exploit the fact that all mentioned plural form formulas eventually 
>>> become periodic,
>>> with a period that is a divisor of 100 (or 1000 or 100). So, when you 
>>> reduce a large
>>> value to another one in the range [100, 199] that ends in the same 
>>> 6 decimal
>>> digits, you can assume that it will lead to the same plural form selection. 
>>> This code
>>> does this:
>>>
>>> #include 
>>> uintmax_t nbytes = ...;
>>> printf (ngettext ("The file has %"PRIuMAX" byte.",
>>>   "The file has %"PRIuMAX" bytes.",
>>>   (nbytes > ULONG_MAX
>>>? (nbytes % 100) + 100
>>>: nbytes)),
>>> nbytes);"
>>>
>>> I can surely add a comment about that.

How about wrapping it in a function to make it self-describing?
Something around:

/* Comment/url here.  */

unsigned long
plural_form_for (unsigned HOST_WIDE_INT val)
{
  return (val > ULONG_MAX
  ? (val % 100) + 100
  : val);
}

and then:

  inform_n (loc, plural_form_for (eltscnt),
"while %qT decomposes into %wu element",
"while %qT decomposes into %wu elements",
type, eltscnt);

Pedro Alves



Re: [PATCHv2] Add a warning for invalid function casts

2017-11-30 Thread Jason Merrill
On Thu, Nov 30, 2017 at 10:14 AM, Bernd Edlinger
 wrote:
> On 11/29/17 22:57, Jason Merrill wrote:
>> On 10/09/2017 06:30 PM, Bernd Edlinger wrote:

>>> +  if (INTEGRAL_TYPE_P (t1)
>>> +  && INTEGRAL_TYPE_P (t2)
>>> +  && TYPE_PRECISION (t1) == TYPE_PRECISION (t2)
>>> +  && (TYPE_UNSIGNED (t1) == TYPE_UNSIGNED (t2)
>>> +  || TYPE_PRECISION (t1) >= TYPE_PRECISION (integer_type_node)))
>>> +return true;
>>
>> This section needs a comment explaining what you're allowing and why.
>
> Okay. I will add a comment here:
>
>/* The signedness of the parameter matters only when an integral
>   type smaller than int is promoted to int, otherwise only the
>   precision of the parameter matters.
>   This check should make sure that the callee does not see
>   undefined values in argument registers.  */

If we're thinking about argument promotion, should this use
type_passed_as rather than assume promotion to int?

Jason


Re: [PATCH GCC][V2]A simple implementation of loop interchange

2017-11-30 Thread Richard Biener
On Thu, Nov 30, 2017 at 4:09 PM, Richard Biener
 wrote:
> On Thu, Nov 30, 2017 at 3:13 PM, Bin.Cheng  wrote:
>> On Thu, Nov 30, 2017 at 1:01 PM, Richard Biener
>>  wrote:
>>> On Tue, Nov 28, 2017 at 4:26 PM, Bin Cheng  wrote:
 Hi,
 This is updated patch with review comments resolved.  Some explanation 
 embedded below.

> +
> +  iloop->nb_iterations = nb_iterations;
>
> use std::swap?  Also I think if you can keep nb_iterations you
> can certainly keep the upper bounds.  You're probably
> afraid of the ->stmt references in the nb_iter_bound entries?
>
> Anyway, either scrap everything or try to keep everything.
 Yeah, not only the stmts, but also the control_iv information because the 
 SCEV
 information may be corrupted during code transformation.
 Now I discarded all the information.
>>>
>>> Note that given you interchange the loops but not the CFG or the loop 
>>> structures
>>> you might want to swap loop->num and flags like ->force_vectorize.  That is,
>>> essentially change the ->header/latch association (and other CFG related 
>>> stuff
>>> like recorded exits).
>>>
>>> It might also be we want to / need to disable interchange for, say,
>>> ->force_vectorize
>>> inner loops or ->unroll != 0?  Or we need to clear them, maybe
>>> optionally diagnosing
>>> that fact.
>>>
>>> At least we need to think about what it means to preserve loop
>>> structure (semantically,
>>> loop->num should maintain association to the same source-level loop
>>> throughout the
>>> compilation) for transforms like interchange.
>>>
>
> +  for (i = 0; oloop.reductions.iterate (i, &re); ++i)
> +{
> +  if (re->type != DOUBLE_RTYPE)
> +   gcc_unreachable ();
> +
> +  use_operand_p use_p;
> +  imm_use_iterator iterator;
> +  FOR_EACH_IMM_USE_FAST (use_p, iterator, re->var)
> +   mark_or_remove_dbg_stmt (USE_STMT (use_p), re->var);
> +  FOR_EACH_IMM_USE_FAST (use_p, iterator, re->next)
> +   mark_or_remove_dbg_stmt (USE_STMT (use_p), re->next);
> +  if (TREE_CODE (re->init) == SSA_NAME)
> +   {
> + FOR_EACH_IMM_USE_FAST (use_p, iterator, re->init)
> +   mark_or_remove_dbg_stmt (USE_STMT (use_p), re->init);
> +   }
>
> can you add a comment what you are doing here?
>
> Note that other loop opts simply scrap all debug stmts ...
 As mentioned above, updated patch doesn't try hard to maintain debug use 
 info any more.

>
> +static void
> +compute_access_stride (struct loop *loop_nest, struct loop *loop,
> +  data_reference_p dr)
> +{
> ...
> +  tree ref = DR_REF (dr);
> +  tree scev_base = build_fold_addr_expr (ref);
> +  tree access_size = TYPE_SIZE_UNIT (TREE_TYPE (ref));
> +  tree niters = build_int_cst (sizetype, AVG_LOOP_NITER);
> +  access_size = fold_build2 (MULT_EXPR, sizetype, niters, access_size);
> +
> +  do {
> +tree scev_fn = analyze_scalar_evolution (loop, scev_base);
> +if (chrec_contains_undetermined (scev_fn)
> +   || chrec_contains_symbols_defined_in_loop (scev_fn, loop->num))
> +  break;
> ...
> +strides->safe_push (scev_step);
> +  } while (loop != loop_nest && (loop = loop_outer (loop)) != NULL);
> +
>
> I _think_ you want to do
>
>scev_fn = analyze_scalar_evolution (loop, scev_base); // assuming
> DR_STMT (dr) is in loop
>scev_fn = instantiate_parameters (nest, scev_fn);
>if (chrec_contains_undetermined (scev_fn))
>  return; // false?
>
> and analyze the result which should be of the form
>
>   { { { init, +, step1 }_1, +, step2 }_2, + , step3 }_3 ...
>
> if canonical.  I think estimate_val_by_simplify_replace isn't needed
> if you do that
> (it also looks odd to replace all vairables in step by niter...).
 I replied on this in previous message, instantiate_parameters doesn't 
 always
 give canonical form result as expected.  The loop here could be seen as a
 local instantiate process, right?
>>>
>>> Kind of.  I'll see if I can reproduce the difference with any of your
>>> intercahnge
>>> testcases - any hint which one to look at?
>> For added tests, I think there will be no difference between the two.
>> I noticed the difference for
>> pointer cases like:
>> for (i...)
>>   for (j...)
>> for (k...)
>>p[i*n*n + j*n+ k] =...
>>
>>>
 Also estimate_val_by_simplify_replace is needed for pointers, where strides
 are computed from niters of loops which could be non compilation time 
 constant.
 But yes, it's an odd fixup after I failed to do anything better.
>>>
>>> But you are for example computing _1 - _2 to zero, right?  Because both _1
>>> and _2 are not constant and thus you replace it with the same (symbolical)
>>> constant 'niter'.
>>>
>>> I think th

Re: [PATCH] ARM testsuite: force hardfp for addr-modes-float.c

2017-11-30 Thread Kyrill Tkachov


On 27/11/17 19:23, Charles Baylis wrote:

On 27 November 2017 at 17:47, Kyrill  Tkachov
 wrote:

Hi Charles,

On 27/11/17 17:03, Charles Baylis wrote:

Some of the new tests in addr-modes-float.c, which were introduced for
the rework of addressing modes costs [1] fail when GCC is configured
to default to a softfp calling convention. Fix this by annotating the
test functions with __attribute__((pcs("aapcs-vfp"))).


The usual approach to this problem is to add an -mfloat-abi=hard to the
dg-options
of the test (the tests are not dg-run, so there's no link-time mismatch
concerns).
Any particular reason to use the pcs attribute instead?

With the way I have GCC configured, it doesn't work to do this when
including certain system headers, such as arm_neon.h.

In file included from
/home/cbaylis/tools/sysroot-arm-unknown-linux-gnueabi-git/usr/include/features.h:447,
  from
/home/cbaylis/tools/sysroot-arm-unknown-linux-gnueabi-git/usr/include/bits/libc-header-start.h:33,
  from
/home/cbaylis/tools/sysroot-arm-unknown-linux-gnueabi-git/usr/include/stdint.h:26,
  from 
/home/cbaylis/buildarea/gcc/build2/gcc/include/stdint.h:9,
  from
/home/cbaylis/buildarea/gcc/build2/gcc/include/arm_fp16.h:34,
  from
/home/cbaylis/buildarea/gcc/build2/gcc/include/arm_neon.h:41,
  from
/home/cbaylis/srcarea/gcc/gcc-git/gcc/testsuite/gcc.target/arm/addr-modes-float.c:6:
/home/cbaylis/tools/sysroot-arm-unknown-linux-gnueabi-git/usr/include/gnu/stubs.h:10:11:
fatal error: gnu/stubs-hard.h: No such file or directory
compilation terminated.


So is it the case that you don't run any arm tests that include 
arm_neon.h in your configuration?
If so, then I would be fine with leaving this test unsupported on this 
configuration.
By the way, I notice that in addr-modes-float.c the arm_neon_ok check is 
placed before the dg-add-options.
I don't remember the arcane rules exactly, but I think the effective 
target check should go before it, so that the test gets skipped properly.


Thanks,
Kyrill


Re: [PATCH] final: Improve output for -dp and -fverbose-asm

2017-11-30 Thread Michael Matz
Hi,

On Thu, 30 Nov 2017, Segher Boessenkool wrote:

> > [c=NN l=NN] will be meaningless to those without some deeper insight 
> > into/experience with what's actually being printed. It might as well 
> > say [NN NN].  But such extreme terseness would
> 
> Length isn't printed on all targets, fwiw.

Assembler is mnemonic, I see no reason to make the meta info not be 
mnemonic as well.  In that light [NN NN] is clearly less mnemonic than 
[c=NN l=NN] (order is clear), and [length=NN] is overly verbose.  People 
looking at -fverbose-asm output don't need hand-holding.


Ciao,
Michael.


[PATCH][ARM][gcc-7] Fix wrong code by arm_final_prescan with fp16 move instructions

2017-11-30 Thread Sudakshina Das

Hi

This patch is the fix for gcc-7 for the same issue as mentioned in:
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02209.html


For the following test case:
__fp16
test_select (__fp16 a, __fp16 b, __fp16 c)
{
  return (a < b) ? b : c;
}

when compiled with -O2 -mfpu=fp-armv8 -march=armv8.2-a+fp16 -marm 
-mfloat-abi=hard generates wrong code:


test_select:
@ args = 0, pretend = 0, frame = 0
@ frame_needed = 0, uses_anonymous_args = 0
@ link register save eliminated.
vcvtb.f32.f16   s0, s0
vcvtb.f32.f16   s15, s1
vmov.f16r3, s2  @ __fp16
vcmpe.f32   s0, s15
vmrsAPSR_nzcv, FPSCR
// <-- No conditional branch
vmov.f16r3, s1  @ __fp16
.L1:
vmov.f16s0, r3  @ __fp16
bx  lr

There should have been a conditional branch there to skip one of the VMOVs.
This patch fixes this problem by making *movhf_vfp_fp16 unconditional.

Testing done: Add a new test case and checked for regressions on 
bootstrapped arm-none-linux-gnueabihf.


Is this ok for gcc-7?

Sudi

ChangeLog entry are as follow:

*** gcc/ChangeLog ***

2017-11-30  Sudakshina Das  

* config/arm/vfp.md (*movhf_vfp_fp16): Add conds attribute.

*** gcc/testsuite/ChangeLog ***

2017-11-30  Sudakshina Das  

* gcc.target/arm/armv8_2-fp16-move-2.c: New test.
diff --git a/gcc/config/arm/vfp.md b/gcc/config/arm/vfp.md
index d8f77e2ffe4fdb7c952d6a5ac947d91f89ce259d..9f06c3da9526d09e43836a60f14da9a49671a393 100644
--- a/gcc/config/arm/vfp.md
+++ b/gcc/config/arm/vfp.md
@@ -456,7 +456,10 @@
   gcc_unreachable ();
 }
  }
-  [(set_attr "predicable" "yes, yes, no, yes, no, no, no, no, no, no")
+  [(set_attr "conds" "*, *, unconditional, *, unconditional, unconditional,\
+		  unconditional, unconditional, unconditional,\
+		  unconditional")
+   (set_attr "predicable" "yes, yes, no, yes, no, no, no, no, no, no")
(set_attr "predicable_short_it" "no, no, no, yes,\
 no, no, no, no,\
 no, no")
diff --git a/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-2.c b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-2.c
new file mode 100644
index ..ac7d4e3f2a9fb1d70a9ce95062dc6db4a69272ff
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/armv8_2-fp16-move-2.c
@@ -0,0 +1,11 @@
+/* { dg-do compile }  */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_ok }  */
+/* { dg-options "-O2 -mfpu=fp-armv8 -march=armv8.2-a+fp16 -marm -mfloat-abi=hard" }  */
+/* { dg-add-options arm_v8_2a_fp16_scalar }  */
+
+__fp16
+test_select (__fp16 a, __fp16 b, __fp16 c)
+{
+  return (a < b) ? b : c;
+}
+/* { dg-final { scan-assembler "bpl" } } */


Re: [PATCH][ARM] Fix wrong code by arm_final_prescan with fp16 move instructions

2017-11-30 Thread Sudakshina Das

Hi Kyrill

On 27/11/17 12:25, Kyrill Tkachov wrote:

Hi Sudi,

On 24/11/17 14:57, Sudi Das wrote:

Hi

For the following test case:
__fp16
test_select (__fp16 a, __fp16 b, __fp16 c)
{
   return (a < b) ? b : c;
}

when compiled with -mfpu=fp-armv8 -march=armv8.2-a+fp16 -marm
-mfloat-abi=hard trunk generates wrong code:

test_select:
 @ args = 0, pretend = 0, frame = 0
 @ frame_needed = 0, uses_anonymous_args = 0
 @ link register save eliminated.
 vcvtb.f32.f16   s0, s0
 vcvtb.f32.f16   s15, s1
 vcmpe.f32   s0, s15
 vmrsAPSR_nzcv, FPSCR
 // <-- No conditional branch!
 vmovs1, s2  @ __fp16
.L2:
 vmovs0, s1  @ __fp16
 bx  lr

There should have been a conditional branch there to skip one of the
VMOVs.
This patch fixes this problem by making *movhf_vfp_fp16 unconditional
wherever needed.

Testing done: Add a new test case and checked for regressions
arm-none-linux-gnueabihf.

Is this ok for trunk?



This is ok after assuming a bootstrap on arm-none-linux-gnueabihf passes
as well.
Does this bug appear on the GCC 7 branch?
If so, could you please test this patch on that branch as well if so?



I have tested the patch and also sent a new patch request for gcc-7
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02577.html

Thanks
Sudi


Thanks,
Kyrill


Sudi

ChangeLog entry are as follow:

*** gcc/ChangeLog ***

2017-11-24  Sudakshina Das  

* config/arm/vfp.md (*movhf_vfp_fp16): Add conds attribute.

*** gcc/testsuite/ChangeLog ***

2017-11-24  Sudakshina Das  

* gcc.target/arm/armv8_2-fp16-move-2.c: New test.






Re: [PATCH][ARM] Fix wrong code by arm_final_prescan with fp16 move instructions

2017-11-30 Thread Kyrill Tkachov


On 30/11/17 16:06, Sudakshina Das wrote:

Hi Kyrill

On 27/11/17 12:25, Kyrill Tkachov wrote:
> Hi Sudi,
>
> On 24/11/17 14:57, Sudi Das wrote:
>> Hi
>>
>> For the following test case:
>> __fp16
>> test_select (__fp16 a, __fp16 b, __fp16 c)
>> {
>>return (a < b) ? b : c;
>> }
>>
>> when compiled with -mfpu=fp-armv8 -march=armv8.2-a+fp16 -marm
>> -mfloat-abi=hard trunk generates wrong code:
>>
>> test_select:
>>  @ args = 0, pretend = 0, frame = 0
>>  @ frame_needed = 0, uses_anonymous_args = 0
>>  @ link register save eliminated.
>>  vcvtb.f32.f16   s0, s0
>>  vcvtb.f32.f16   s15, s1
>>  vcmpe.f32   s0, s15
>>  vmrsAPSR_nzcv, FPSCR
>>  // <-- No conditional branch!
>>  vmovs1, s2  @ __fp16
>> .L2:
>>  vmovs0, s1  @ __fp16
>>  bx  lr
>>
>> There should have been a conditional branch there to skip one of the
>> VMOVs.
>> This patch fixes this problem by making *movhf_vfp_fp16 unconditional
>> wherever needed.
>>
>> Testing done: Add a new test case and checked for regressions
>> arm-none-linux-gnueabihf.
>>
>> Is this ok for trunk?
>>
>
> This is ok after assuming a bootstrap on arm-none-linux-gnueabihf passes
> as well.
> Does this bug appear on the GCC 7 branch?
> If so, could you please test this patch on that branch as well if so?
>

I have tested the patch and also sent a new patch request for gcc-7
https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02577.html



Thanks Sudi, this is ok to commit to the branch after we let this patch 
bake on trunk for a week without problems.


Kyrill



Thanks
Sudi

> Thanks,
> Kyrill
>
>> Sudi
>>
>> ChangeLog entry are as follow:
>>
>> *** gcc/ChangeLog ***
>>
>> 2017-11-24  Sudakshina Das 
>>
>> * config/arm/vfp.md (*movhf_vfp_fp16): Add conds attribute.
>>
>> *** gcc/testsuite/ChangeLog ***
>>
>> 2017-11-24  Sudakshina Das 
>>
>> * gcc.target/arm/armv8_2-fp16-move-2.c: New test.
>





Re: [PATCHv2] Add a warning for invalid function casts

2017-11-30 Thread Bernd Edlinger
On 11/30/17 16:45, Jason Merrill wrote:
> On Thu, Nov 30, 2017 at 10:14 AM, Bernd Edlinger
>  wrote:
>> On 11/29/17 22:57, Jason Merrill wrote:
>>> On 10/09/2017 06:30 PM, Bernd Edlinger wrote:
> 
 +  if (INTEGRAL_TYPE_P (t1)
 +  && INTEGRAL_TYPE_P (t2)
 +  && TYPE_PRECISION (t1) == TYPE_PRECISION (t2)
 +  && (TYPE_UNSIGNED (t1) == TYPE_UNSIGNED (t2)
 +  || TYPE_PRECISION (t1) >= TYPE_PRECISION (integer_type_node)))
 +return true;
>>>
>>> This section needs a comment explaining what you're allowing and why.
>>
>> Okay. I will add a comment here:
>>
>> /* The signedness of the parameter matters only when an integral
>>type smaller than int is promoted to int, otherwise only the
>>precision of the parameter matters.
>>This check should make sure that the callee does not see
>>undefined values in argument registers.  */
> 
> If we're thinking about argument promotion, should this use
> type_passed_as rather than assume promotion to int?
> 

I don't know, it is only a heuristic after all, and even if there is no
warning for a bogus type cast that does not mean any
correctness-guarantee at all.

Would type_passed_as make any difference for integral types?


Bernd.


[PATCH] rs6000: Set rs6000_cpu correctly (PR43871)

2017-11-30 Thread Segher Boessenkool
We set rs6000_cpu based on tune_index, but it should be cpu_index.
This patch fixes it.

Tested on powerpc64-linux {-m32,-m64}.  I'll commit this later today,
and schedule backports for next week.  Unless it is all terribly wrong
and someone complains :-)


Segher


2017-11-30  Segher Boessenkool  

PR target/43871
* config/rs6000/rs6000.c (rs6000_option_override_internal): Set
rs6000_cpu based on cpu_index, not tune_index.

---
 gcc/config/rs6000/rs6000.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 9929a45..9c0df9a 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -4183,8 +4183,8 @@ rs6000_option_override_internal (bool global_init_p)
  }
 }
 
-  gcc_assert (tune_index >= 0);
-  rs6000_cpu = processor_target_table[tune_index].processor;
+  gcc_assert (cpu_index >= 0);
+  rs6000_cpu = processor_target_table[cpu_index].processor;
 
   if (rs6000_cpu == PROCESSOR_PPCE300C2 || rs6000_cpu == PROCESSOR_PPCE300C3
   || rs6000_cpu == PROCESSOR_PPCE500MC || rs6000_cpu == 
PROCESSOR_PPCE500MC64
-- 
1.8.3.1



Re: [PATCH] final: Improve output for -dp and -fverbose-asm

2017-11-30 Thread Martin Sebor

On 11/30/2017 04:16 AM, Segher Boessenkool wrote:

On Wed, Nov 29, 2017 at 08:46:41PM -0700, Martin Sebor wrote:

On 11/29/2017 04:13 PM, Segher Boessenkool wrote:

This improves the assembler output (for -dp and -fverbose-asm) in
several ways.  It always prints the insn_cost.  It does not print
"[length = NN]" but "[c=NN l=NN]", to save space.  It does not add one
to the instruction alternative number (everything else starts counting
those at 0, too).  And finally, it tries to keep things lined up in
columns a bit better.

Tested on powerpc64-linux {-m32,-m64}; is this okay for trunk?


[c=NN l=NN] will be meaningless to those without some deeper
insight into/experience with what's actually being printed.
It might as well say [NN NN].  But such extreme terseness would


Length isn't printed on all targets, fwiw.


seem to run counter to the documented purpose of -fverbose-asm
to "Put extra commentary information in the generated assembly
code to make it more readable."


The point is that [length = 12] takes up an awful lot of space.  The
output of -fverbose-asm alread suffers from information overload.


Amount of space vs amount of detail are two different concerns.
If you feel that the output is overly detailed then adding even
more to it won't help.  Other than that, I don't think trading
readability for space savings makes sense in a format whose main
purpose is to be readable.  If it's line length that's a concern
then using spaces instead of tabs would make them look shorter.
You can also remove the brackets and use length=NN to shave off
a couple of bytes.  Or print length only when it's not the typical
default.  Or print the hex encoding instead.


Looking further in the manual I don't see the [length=NN] bit
mentioned (nor does your patch add a mention of it) so while
the meaning of [length=NN] isn't exactly obvious, using [l=NN]
instead certainly won't make it easier to read.  Does the manual
need updating?


Should -fverbose-asm print length (and cost) at all?  They should be
printed for -dp (which is for debugging the *compiler*), but are they
very useful for -fverbose-asm (which is primarily for debugging the
*user program*)?


I don't have the answers to these questions.  What I can say
is that using one letter abbreviations for short words is not
going to make it more readable, on the contrary.  len= would
be okay.

Martin


Re: [patch] remove cilk-plus

2017-11-30 Thread Marek Polacek
On Thu, Nov 30, 2017 at 11:54:11AM +, Koval, Julia wrote:
> Hi, here is the followup patch. Ok for trunk?
> 
> gcc/c-family/
>   * c-common.h (inv_list): Remove.

In function.h I still see

  /* In a Cilk function, the VAR_DECL for the frame descriptor. */
  tree cilk_frame_decl;

isn't that redundant now, too?

Marek


Re: [PATCH] final: Improve output for -dp and -fverbose-asm

2017-11-30 Thread Kyrill Tkachov

Hi Segher,

On 29/11/17 23:13, Segher Boessenkool wrote:

This improves the assembler output (for -dp and -fverbose-asm) in
several ways.  It always prints the insn_cost.  It does not print
"[length = NN]" but "[c=NN l=NN]", to save space.  It does not add one
to the instruction alternative number (everything else starts counting
those at 0, too).  And finally, it tries to keep things lined up in
columns a bit better.

Tested on powerpc64-linux {-m32,-m64}; is this okay for trunk?



FWIW printing the cost would be hepful to me at the -dp level.
I agree with Martin that 'c' and 'l' are too short but using 'len' for 
length would be acceptable.


Kyrill



Segher


2017-11-29  Segher Boessenkool 

* final.c (output_asm_name): Print insn_cost. Shorten output.  
Print

which_alternative instead of which_alternative + 1.
(output_asm_insn): Print an extra tab if the template is short.

---
 gcc/final.c | 22 +++---
 1 file changed, 15 insertions(+), 7 deletions(-)

diff --git a/gcc/final.c b/gcc/final.c
index fdae241..afb6906 100644
--- a/gcc/final.c
+++ b/gcc/final.c
@@ -3469,16 +3469,20 @@ output_asm_name (void)
 {
   if (debug_insn)
 {
-  int num = INSN_CODE (debug_insn);
-  fprintf (asm_out_file, "\t%s %d\t%s",
-  ASM_COMMENT_START, INSN_UID (debug_insn),
-  insn_data[num].name);
-  if (insn_data[num].n_alternatives > 1)
-   fprintf (asm_out_file, "/%d", which_alternative + 1);
+  fprintf (asm_out_file, "\t%s %d\t",
+  ASM_COMMENT_START, INSN_UID (debug_insn));

+  fprintf (asm_out_file, "[c=%d",
+  insn_cost (debug_insn, optimize_insn_for_speed_p ()));
   if (HAVE_ATTR_length)
-   fprintf (asm_out_file, "\t[length = %d]",
+   fprintf (asm_out_file, " l=%d",
  get_attr_length (debug_insn));
+  fprintf (asm_out_file, "]  ");
+
+  int num = INSN_CODE (debug_insn);
+  fprintf (asm_out_file, "%s", insn_data[num].name);
+  if (insn_data[num].n_alternatives > 1)
+   fprintf (asm_out_file, "/%d", which_alternative);

   /* Clear this so only the first assembler insn
  of any rtl insn will get the special comment for -dp.  */
@@ -3824,6 +3828,10 @@ output_asm_insn (const char *templ, rtx *operands)
 putc (c, asm_out_file);
   }

+  /* Try to keep the asm a bit more readable.  */
+  if ((flag_verbose_asm || flag_print_asm_name) && strlen (templ) < 9)
+putc ('\t', asm_out_file);
+
   /* Write out the variable names for operands, if we know them.  */
   if (flag_verbose_asm)
 output_asm_operand_names (operands, oporder, ops);
--
1.8.3.1





[PATCH, GCC/testsuite] Improve fstack_protector effective target

2017-11-30 Thread Thomas Preudhomme

Hi,

Effective target fstack_protector fails to return an error for
newlib-based target (such as arm-none-eabi targets) which does not
support stack protector. This is due to the test being too simplist for
stack protection code to be generated by GCC: it does not contain a
local buffer and does not read unknown input.

This commit adds a small local buffer with a copy of the filename to
trigger stack protector code to be generated. The filename is used
instead of the full path so as to ensure the size will fit in the local
buffer.

ChangeLog entry is as follows:

*** gcc/testsuite/ChangeLog ***

2017-11-28  Thomas Preud'homme  

* lib/target-supports.exp (check_effective_target_fstack_protector):
Copy filename in local buffer to trigger stack protection.

Testing: Ran gcc.dg/pr38616 on arm-none-eabi and arm-linux-gnueabihf,
the former is now UNSUPPORTED while the latter continues to PASS.

Is this ok for stage3?

Best regards,

Thomas
diff --git a/gcc/testsuite/lib/target-supports.exp b/gcc/testsuite/lib/target-supports.exp
index d30fd368922713d3695f22710197ce7094c977cd..8aff16a25823ec48e76ad6ad8fdc8db998a45877 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -1064,7 +1064,11 @@ proc check_effective_target_static {} {
 # Return 1 if the target supports -fstack-protector
 proc check_effective_target_fstack_protector {} {
 return [check_runtime fstack_protector {
-	int main (void) { return 0; }
+	#include 
+	int main (int argc, char *argv[]) {
+	  char buf[64];
+	  return !strcpy (buf, strrchr (argv[0], '/'));
+	}
 } "-fstack-protector"]
 }
 


Re: [PATCH] final: Improve output for -dp and -fverbose-asm

2017-11-30 Thread Segher Boessenkool
On Thu, Nov 30, 2017 at 09:28:44AM -0700, Martin Sebor wrote:
> >The point is that [length = 12] takes up an awful lot of space.  The
> >output of -fverbose-asm alread suffers from information overload.
> 
> Amount of space vs amount of detail are two different concerns.

Yes.

> If you feel that the output is overly detailed then adding even
> more to it won't help.  Other than that, I don't think trading
> readability for space savings makes sense in a format whose main
> purpose is to be readable.  If it's line length that's a concern
> then using spaces instead of tabs would make them look shorter.

Trunk:

===
.L.yk:
cmpdi 0,4,0  # 8*movdi_internal2/1  [length = 4]
beq 0,.L2# 9*rs6000.md:12754[length = 4]
srdi 9,3,32  # 11   lshrdi3 [length = 4]
xor 9,9,3# 12   *boolsi3[length = 4]
rldicl 9,9,0,32  # 13   zero_extendsidi2/2  [length = 4]
cmpd 7,9,3   # 14   *cmpdi_signed   [length = 4]
beq 7,.L7# 15   *rs6000.md:12754[length = 4]
.L5:
mr 3,4   # 32   *movdi_internal64/3 [length = 4]
blr  # 53   simple_return   [length = 4]
.L2:
lwz 9,0(4)   # 28   zero_extendsidi2/1  [length = 4]
trap # 29   trap[length = 4]
.L7:
addic 4,9,-1 # 47   *adddi3_imm_carry_m1[length = 4]
subfe 4,4,9  # 48   *subfdi3_carry_in_internal  [length = 4]
b .L5# 58   jump[length = 4]
===

Patched:
===
.L.yk:
cmpdi 0,4,0  # 8[c=4 l=4]  *movdi_internal2/0
beq 0,.L2# 9[c=4 l=4]  *rs6000.md:12774
srdi 9,3,32  # 11   [c=4 l=4]  lshrdi3
xor 9,9,3# 12   [c=4 l=4]  *boolsi3
rldicl 9,9,0,32  # 13   [c=4 l=4]  zero_extendsidi2/1
cmpd 7,9,3   # 14   [c=4 l=4]  *cmpdi_signed
beq 7,.L7# 15   [c=4 l=4]  *rs6000.md:12774
.L5:
mr 3,4   # 32   [c=4 l=4]  *movdi_internal64/2
blr  # 76   [c=4 l=4]  simple_return
.L2:
lfiwzx 0,0,4 # 28   [c=8 l=4]  zero_extendsidi2/2
trap # 29   [c=4 l=4]  trap
.L7:
addic 4,9,-1 # 70   [c=4 l=4]  *adddi3_imm_carry_m1
subfe 4,4,9  # 71   [c=4 l=4]  *subfdi3_carry_in_internal
b .L5# 81   [c=4 l=4]  jump
===

It is neither line length nor amt of info that makes the second one
way better readable.


Segher


Re: [PATCH] final: Improve output for -dp and -fverbose-asm

2017-11-30 Thread Michael Matz
Hi,

On Thu, 30 Nov 2017, Kyrill  Tkachov wrote:

> > This improves the assembler output (for -dp and -fverbose-asm) in 
> > several ways.  It always prints the insn_cost.  It does not print 
> > "[length = NN]" but "[c=NN l=NN]", to save space.  It does not add one 
> > to the instruction alternative number (everything else starts counting 
> > those at 0, too).  And finally, it tries to keep things lined up in 
> > columns a bit better.
> >
> > Tested on powerpc64-linux {-m32,-m64}; is this okay for trunk?
> 
> FWIW printing the cost would be hepful to me at the -dp level. I agree 
> with Martin that 'c' and 'l' are too short but using 'len' for length 
> would be acceptable.

Seriously?  You'd have a problem to decipher c/l but not rldicl ?


Ciao,
Michael.


Re: [PATCH] final: Improve output for -dp and -fverbose-asm

2017-11-30 Thread Kyrill Tkachov


On 30/11/17 16:47, Michael Matz wrote:

Hi,

On Thu, 30 Nov 2017, Kyrill  Tkachov wrote:


This improves the assembler output (for -dp and -fverbose-asm) in
several ways.  It always prints the insn_cost.  It does not print
"[length = NN]" but "[c=NN l=NN]", to save space.  It does not add one
to the instruction alternative number (everything else starts counting
those at 0, too).  And finally, it tries to keep things lined up in
columns a bit better.

Tested on powerpc64-linux {-m32,-m64}; is this okay for trunk?

FWIW printing the cost would be hepful to me at the -dp level. I agree
with Martin that 'c' and 'l' are too short but using 'len' for length
would be acceptable.

Seriously?  You'd have a problem to decipher c/l but not rldicl ?


I don't know what rldicl means without looking it up on the Internet ;)
Given how frequently I have to look at these dumps, I could get used to 
any encoding though.

I don't find them too verbose for my purposes, but if saving space is a goal
here then I won't object to keeping them as single characters

Thanks,
Kyrill




Ciao,
Michael.




Re: [PATCH] final: Improve output for -dp and -fverbose-asm

2017-11-30 Thread Martin Sebor

On 11/30/2017 09:44 AM, Segher Boessenkool wrote:

On Thu, Nov 30, 2017 at 09:28:44AM -0700, Martin Sebor wrote:

The point is that [length = 12] takes up an awful lot of space.  The
output of -fverbose-asm alread suffers from information overload.


Amount of space vs amount of detail are two different concerns.


Yes.


If you feel that the output is overly detailed then adding even
more to it won't help.  Other than that, I don't think trading
readability for space savings makes sense in a format whose main
purpose is to be readable.  If it's line length that's a concern
then using spaces instead of tabs would make them look shorter.


Trunk:

===
.L.yk:
cmpdi 0,4,0  # 8*movdi_internal2/1  [length = 4]
beq 0,.L2# 9*rs6000.md:12754[length = 4]
srdi 9,3,32  # 11   lshrdi3 [length = 4]
xor 9,9,3# 12   *boolsi3[length = 4]
rldicl 9,9,0,32  # 13   zero_extendsidi2/2  [length = 4]
cmpd 7,9,3   # 14   *cmpdi_signed   [length = 4]
beq 7,.L7# 15   *rs6000.md:12754[length = 4]
.L5:
mr 3,4   # 32   *movdi_internal64/3 [length = 4]
blr  # 53   simple_return   [length = 4]
.L2:
lwz 9,0(4)   # 28   zero_extendsidi2/1  [length = 4]
trap # 29   trap[length = 4]
.L7:
addic 4,9,-1 # 47   *adddi3_imm_carry_m1[length = 4]
subfe 4,4,9  # 48   *subfdi3_carry_in_internal  [length = 4]
b .L5# 58   jump[length = 4]
===

Patched:
===
.L.yk:
cmpdi 0,4,0  # 8[c=4 l=4]  *movdi_internal2/0
beq 0,.L2# 9[c=4 l=4]  *rs6000.md:12774
srdi 9,3,32  # 11   [c=4 l=4]  lshrdi3
xor 9,9,3# 12   [c=4 l=4]  *boolsi3
rldicl 9,9,0,32  # 13   [c=4 l=4]  zero_extendsidi2/1
cmpd 7,9,3   # 14   [c=4 l=4]  *cmpdi_signed
beq 7,.L7# 15   [c=4 l=4]  *rs6000.md:12774
.L5:
mr 3,4   # 32   [c=4 l=4]  *movdi_internal64/2
blr  # 76   [c=4 l=4]  simple_return
.L2:
lfiwzx 0,0,4 # 28   [c=8 l=4]  zero_extendsidi2/2
trap # 29   [c=4 l=4]  trap
.L7:
addic 4,9,-1 # 70   [c=4 l=4]  *adddi3_imm_carry_m1
subfe 4,4,9  # 71   [c=4 l=4]  *subfdi3_carry_in_internal
b .L5# 81   [c=4 l=4]  jump
===

It is neither line length nor amt of info that makes the second one
way better readable.


The justification certainly makes it easier to read.  But
the abbreviations make it harder to interpret.  [c=4 l=4]
makes no sense to anyone not already familiar with what
it means.

There's nothing wrong with using mnemonics as long as they're
well established and commonly understood.  Absent that, they
should be explained in some accessible document.

Not everyone who reads the verbose assembly is familiar with
GCC internals.  Users read it to help debug problems in their
code.  They shouldn't have to also study GCC source code to
understand what the contents mean.

Martin


Re: [PATCH] final: Improve output for -dp and -fverbose-asm

2017-11-30 Thread Michael Matz
Hi,

On Thu, 30 Nov 2017, Kyrill  Tkachov wrote:

> I don't know what rldicl means without looking it up on the Internet ;) 

:)  (zero_extendsidi2/1 it seems, from Seghers dump, not that I would have 
known without)

> Given how frequently I have to look at these dumps, I could get used to 
> any encoding though. I don't find them too verbose for my purposes, but 
> if saving space is a goal here then I won't object to keeping them as 
> single characters

I personally like the new format much more than what we have (also because 
of the more sensible alignment, not having the variable-length pattern 
name first).


Ciao,
Michael.
P.S: feeling slightly sorry for adding bikeshed discussions to this 
trivial topic ;)


Re: [PATCH] final: Improve output for -dp and -fverbose-asm

2017-11-30 Thread Michael Matz
Hi,

On Thu, 30 Nov 2017, Martin Sebor wrote:

> > addic 4,9,-1 # 70   [c=4 l=4]  *adddi3_imm_carry_m1
> > subfe 4,4,9  # 71   [c=4 l=4]  *subfdi3_carry_in_internal
> > b .L5# 81   [c=4 l=4]  jump
> 
> Not everyone who reads the verbose assembly is familiar with
> GCC internals.  Users read it to help debug problems in their
> code.  They shouldn't have to also study GCC source code to
> understand what the contents mean.

Um, I think that's a bit overactive.  How would you know what 
adddi3_imm_carry_m1 really means without knowing the particular GCC 
backend?  Or what the above magic number after # means?

Or, for that matter, what "length" means?  Could be byte-length, sure.  
But OTOH, for a RISC target it's always four, so why print it?  The GCC 
developers surely meant cycle-length with that, nothing else makes sense.

My point is, to interpret the asm dumps there's no way around having some 
knowledge and getting used to it.  In addition I doubt they're used 
heavily to debug programs.  Rather they're used to study the interaction 
between compiler and program (and potentially to find miscompilations or 
strangenesses the compiler emits).  As such even -fverbose-asm is a 
low-level compiler debugging tool, not something for an end-user that 
needs stability or documentation.


Ciao,
Michael.


Re: [PATCH] final: Improve output for -dp and -fverbose-asm

2017-11-30 Thread Segher Boessenkool
On Thu, Nov 30, 2017 at 09:54:26AM -0700, Martin Sebor wrote:
> >It is neither line length nor amt of info that makes the second one
> >way better readable.
> 
> The justification certainly makes it easier to read.  But
> the abbreviations make it harder to interpret.  [c=4 l=4]
> makes no sense to anyone not already familiar with what
> it means.
> 
> There's nothing wrong with using mnemonics as long as they're
> well established and commonly understood.  Absent that, they
> should be explained in some accessible document.
> 
> Not everyone who reads the verbose assembly is familiar with
> GCC internals.  Users read it to help debug problems in their
> code.  They shouldn't have to also study GCC source code to
> understand what the contents mean.

This is the -dp output, I hardly ever use -fverbose-asm, it has been
unreadable for ten years or so.

-fverbose-asm looks like this:
===
.L.yk:
 # 81288.c:4:   unsigned int *un = (f3 != 0) ? &t4 : 0;
cmpdi 0,4,0  # tmp130, f3
beq 0,.L2#
 # 81288.c:6:   *un ^= t4;
srdi 9,3,32  #, tmp131, t4
xor 9,9,3#, tmp132, tmp131, t4
 # 81288.c:7:   if (*un == t4)
rldicl 9,9,0,32  # tmp133, tmp132
 # 81288.c:7:   if (*un == t4)
cmpd 7,9,3   # t4, tmp134, tmp133
beq 7,.L7#
.L5:
 # 81288.c:11: }
mr 3,4   #, 
blr
.L2:
 # 81288.c:6:   *un ^= t4;
lwz 9,0(4)   # MEM[(unsigned int *)0B], _13
trap
.L7:
 # 81288.c:8: f3 = !!t4;
addic 4,9,-1 # tmp139, tmp133
subfe 4,4,9  # , tmp139, tmp133
b .L5#
===

If we're okay with outputting that kind of stuff to *users*, then how
bad is [c=8 l=4] for GCC developers?  Heh.


Segher


[PATCH v2.1] Support for adding and stripping location_t wrapper nodes

2017-11-30 Thread David Malcolm
On Thu, 2017-11-16 at 10:58 +0100, Richard Biener wrote:
> On Wed, Nov 15, 2017 at 4:33 PM, David Malcolm 
> wrote:
> > On Wed, 2017-11-15 at 12:11 +0100, Richard Biener wrote:
> > > On Wed, Nov 15, 2017 at 7:17 AM, Trevor Saunders  > > de.o
> > > rg> wrote:
> > > > On Fri, Nov 10, 2017 at 04:45:17PM -0500, David Malcolm wrote:
> > > > > This patch provides a mechanism in tree.c for adding a
> > > > > wrapper
> > > > > node
> > > > > for expressing a location_t, for those nodes for which
> > > > > !CAN_HAVE_LOCATION_P, along with a new method of cp_expr.
> > > > > 
> > > > > It's called in later patches in the kit via that new method.
> > > > > 
> > > > > In this version of the patch, I use NON_LVALUE_EXPR for
> > > > > wrapping
> > > > > constants, and VIEW_CONVERT_EXPR for other nodes.
> > > > > 
> > > > > I also turned off wrapper nodes for EXCEPTIONAL_CLASS_P, for
> > > > > the
> > > > > sake
> > > > > of keeping the patch kit more minimal.
> > > > > 
> > > > > The patch also adds a STRIP_ANY_LOCATION_WRAPPER macro for
> > > > > stripping
> > > > > such nodes, used later on in the patch kit.
> > > > 
> > > > I happened to start reading this series near the end and was
> > > > rather
> > > > confused by this macro since it changes variables in a rather
> > > > unhygienic
> > > > way.  Did you consider just defining a inline function to
> > > > return
> > > > the
> > > > actual decl?  It seems like its not used that often so the
> > > > slight
> > > > extra
> > > > syntax should be that big a deal compared to the explicitness.
> > > 
> > > Existing practice  (STRIP_NOPS & friends).  I'm fine either
> > > way,
> > > the patch looks good.
> > > 
> > > Eventually you can simplify things by doing less checking in
> > > location_wrapper_p, like only checking
> > > 
> > > +inline bool location_wrapper_p (const_tree exp)
> > > +{
> > > +  if ((TREE_CODE (exp) == NON_LVALUE_EXPR
> > > +   || (TREE_CODE (exp) == VIEW_CONVERT_EXPR
> > > +  && (TREE_TYPE (exp)
> > > + == TREE_TYPE (TREE_OPERAND (exp, 0)))
> > > +return true;
> > > +  return false;
> > > +}
> > > 
> > > and renaming to maybe_location_wrapper_p.  After all you can't
> > > really
> > > distinguish location wrappers from non-location wrappers?  (and
> > > why
> > > would you want to?)
> > 
> > That's the implementation I originally tried.
> > 
> > As noted in an earlier thread about this, the problem I ran into
> > was
> > (in g++.dg/conversion/reinterpret1.C):
> > 
> >   // PR c++/15076
> > 
> >   struct Y { Y(int &); };
> > 
> >   int v;
> >   Y y1(reinterpret_cast(v));  // { dg-error "" }
> > 
> > where the "reinterpret_cast" has the same type as the VAR_DECL
> > v,
> > and hence the argument to y1 is a NON_LVALUE_EXPR around a
> > VAR_DECL,
> > where both have the same type, and hence location_wrapper_p () on
> > the
> > cast would return true.
> > 
> > Compare with:
> > 
> >   Y y1(v);
> > 
> > where the argument "v" with a location wrapper is a
> > VIEW_CONVERT_EXPR
> > around a VAR_DECL.
> > 
> > With the simpler conditions you suggested above, both are treated
> > as
> > location wrappers (leading to the dg-error in the test failing),
> > whereas with the condition in the patch, only the latter is treated
> > as
> > a location wrapper, and an error is correctly emitted for the dg-
> > error.
> > 
> > Hope this sounds sane.  Maybe the function needs a more detailed
> > comment explaining this?
> 
> Yes.  I guess the above would argue for a new tree code but I can
> see that it is better to avoid that.
> 
> Thanks,
> Richard.

[...]

Here's an updated version of the patch which:
* adds a much more detailed comment to location_wrapper_p,
* fixes an erroneous reference to LOCATION_WRAPPER_EXPR in the
  comment to maybe_wrap_with_location (from an earlier unfinished
  experiment)
* adds a selftest for handling wrapper nodes.

Is there consensus about whether this approach is sane? i.e.  
* adding wrapper nodes via re-using existing tree codes (this kit), vs
* adding them via some new tree code ("LOCATION_WRAPPER_EXPR"?), vs
* the workaround of:
  "[PATCH] C++: use an optional vec for callsites"
https://gcc.gnu.org/ml/gcc-patches/2017-08/msg01392.html

Jason? Jakub? others?

(I'd like to get some version of this patch kit into gcc 8; I like
this kit's approach, as a minimal way to fix a real usability issue,
whilst giving us a route to doing more in gcc 9 (c.f. the v3 kit))

Thanks
Dave

gcc/ChangeLog:
PR c++/43486
* tree.c (maybe_wrap_with_location): New function.
(selftest::test_location_wrappers): New function.
(selftest::tree_c_tests): Call it.
* tree.h (STRIP_ANY_LOCATION_WRAPPER): New macro.
(maybe_wrap_with_location): New decl.
(location_wrapper_p): New inline function.

gcc/cp/ChangeLog:
PR c++/43486
* cp-tree.h (cp_expr::maybe_add_location_wrapper): New method.
---
 gcc/cp/cp-tree.h |  6 ++
 gcc/tree.c   | 66 ++

Re: [patch] remove cilk-plus

2017-11-30 Thread Jeff Law
On 11/30/2017 09:32 AM, Marek Polacek wrote:
> On Thu, Nov 30, 2017 at 11:54:11AM +, Koval, Julia wrote:
>> Hi, here is the followup patch. Ok for trunk?
>>
>> gcc/c-family/
>>  * c-common.h (inv_list): Remove.
> 
> In function.h I still see
> 
>   /* In a Cilk function, the VAR_DECL for the frame descriptor. */
>   tree cilk_frame_decl;
> 
> isn't that redundant now, too?
Yes.  I actually saw that today when looking at the issue Florian and
Carlos raised internally. :-)

Assuming a grep doesn't find any references, removal is pre-approved.

jeff


Re: [patch] remove cilk-plus

2017-11-30 Thread Jakub Jelinek
On Thu, Nov 30, 2017 at 10:15:20AM -0700, Jeff Law wrote:
> On 11/30/2017 09:32 AM, Marek Polacek wrote:
> > On Thu, Nov 30, 2017 at 11:54:11AM +, Koval, Julia wrote:
> >> Hi, here is the followup patch. Ok for trunk?
> >>
> >> gcc/c-family/
> >>* c-common.h (inv_list): Remove.
> > 
> > In function.h I still see
> > 
> >   /* In a Cilk function, the VAR_DECL for the frame descriptor. */
> >   tree cilk_frame_decl;
> > 
> > isn't that redundant now, too?
> Yes.  I actually saw that today when looking at the issue Florian and
> Carlos raised internally. :-)
> 
> Assuming a grep doesn't find any references, removal is pre-approved.

grep finds actually tons of further stuff:

grep -i cilk * c*/* 2>/dev/null | grep -v ChangeLog
cgraph.h:  /* True if this is a Cilk Plus variant.  */
cgraph.h:  unsigned int cilk_elemental : 1;
function.h:  /* In a Cilk function, the VAR_DECL for the frame descriptor. */
function.h:  tree cilk_frame_decl;
function.h:  /* This will indicate whether a function is a cilk function */
function.h:  unsigned int is_cilk_function : 1;
function.h:  /* Nonzero if this is a Cilk function that spawns. */
function.h:  unsigned int calls_cilk_spawn : 1;
omp-simd-clone.c:  clone_info->cilk_elemental = false;
target.def:/* Functions relating to OpenMP and Cilk Plus SIMD clones.  */
tree-inline.h:  /* Cilk keywords currently need to replace some variables that
tree-inline.h:  bool remap_var_for_cilk;
c-family/c-attribs.c:  { "cilk simd function", 0, -1, true,  false, false,
c-family/c-attribs.c:  if (lookup_attribute ("cilk simd function",
c-family/c-attribs.c:   "function marked as a Cilk Plus 
SIMD-enabled function");
c-family/c.opt:fcilkplus Undocumented
c-family/c.opt:C ObjC C++ ObjC++ LTO Report Var(flag_cilkplus) Init(0)
cp/cp-tree.h:/* In c-family/cilk.c */
cp/cp-tree.h:extern bool cilk_valid_spawn(tree);
cp/parser.c:  parser->cilk_simd_fn_info = NULL;
cp/parser.c:   clauses, Cilk Plus SIMD-enabled functions' vector attributes, and
cp/parser.c:   a Cilk Plus for loop.
cp/parser.h:/* Helper data structure for parsing #pragma omp declare simd, and 
Cilk Plus
cp/parser.h:  /* When parsing Cilk Plus SIMD-enabled functions' vector 
attributes,
cp/parser.h:  cp_omp_declare_simd_data * GTY((skip)) cilk_simd_fn_info;

Of course, when it is Cilk+ + something else, it should be carefully adjusted.

Jakub


[PATCH][AArch64] Fix address printing on ILP32

2017-11-30 Thread Wilco Dijkstra
Fix address printing for ILP32.  The md file uses 'a' in assembler
templates for symbolic addresses in adrp/add, which end up calling 
aarch64_print_operand_address.  However in ILP32 these are not valid
memory addresses (being ptr_mode rather than Pmode), so the assert
triggers.  Since it is incorrect to use symbols in memory addresses
(besides literal pool accesses), change the 'a' to 'c' in the md file.

Skip one failing test in ILP32 which combines the 'p' modifier with the 'a'
assembler template to fake a memory reference.

This fixes the ICE in https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02509.html.


ChangeLog:
2017-11-30  Wilco Dijkstra  

gcc/
* config/aarch64/aarch64.md (call_insn): Use %c rather than %a.
(call_value_insn): Likewise.
(sibcall_insn): Likewise.
(sibcall_value_insn): Likewise.
(movsi_aarch64): Likewise.
(movdi_aarch64): Likewise.
(add_losym_): Likewise.
(ldr_got_small_): Likewise.
(ldr_got_small_sidi): Likewise.
(ldr_got_small_28k_): Likewise.
(ldr_got_small_28k_sidi): Likewise.
* config/aarch64/aarch64.c (aarch64_print_address_internal):
Move output_addr_const to symbolic case. Add error check.
testsuite/
* gcc.dg/asm-4.c: Skip on AArch64 with ILP32 as test is incorrect.

--
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 
6d89b164275c0b31d65a7c5c0361a6d55700215c..7b50ab43dbc075e6b6d4541c3fb71e5cc872c88b
 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -5936,10 +5936,11 @@ aarch64_print_address_internal (FILE *f, machine_mode 
mode, rtx x, RTX_CODE op)
return;
 
   case ADDRESS_SYMBOLIC:
-   break;
+   output_addr_const (f, x);
+   return;
   }
 
-  output_addr_const (f, x);
+  output_operand_lossage ("invalid operand for '%%%c'", op);
 }
 
 /* Print address 'x' of a LDP/STP with mode 'mode'.  */
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 
8744722b8daf4f89aca65bbce75ff743df025f47..d14b57b0ef7f4eeca40bfdcaf3ebb02a1031cb99
 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -749,7 +749,7 @@ (define_insn "*call_insn"
   ""
   "@
   blr\\t%0
-  bl\\t%a0"
+  bl\\t%c0"
   [(set_attr "type" "call, call")]
 )
 
@@ -775,7 +775,7 @@ (define_insn "*call_value_insn"
   ""
   "@
   blr\\t%1
-  bl\\t%a1"
+  bl\\t%c1"
   [(set_attr "type" "call, call")]
 )
 
@@ -811,7 +811,7 @@ (define_insn "*sibcall_insn"
   "SIBLING_CALL_P (insn)"
   "@
br\\t%0
-   b\\t%a0"
+   b\\t%c0"
   [(set_attr "type" "branch, branch")]
 )
 
@@ -824,7 +824,7 @@ (define_insn "*sibcall_value_insn"
   "SIBLING_CALL_P (insn)"
   "@
br\\t%1
-   b\\t%a1"
+   b\\t%c1"
   [(set_attr "type" "branch, branch")]
 )
 
@@ -946,7 +946,7 @@ (define_insn_and_split "*movsi_aarch64"
ldr\\t%s0, %1
str\\t%w1, %0
str\\t%s1, %0
-   adr\\t%x0, %a1
+   adr\\t%x0, %c1
adrp\\t%x0, %A1
fmov\\t%s0, %w1
fmov\\t%w0, %s1
@@ -981,7 +981,7 @@ (define_insn_and_split "*movdi_aarch64"
ldr\\t%d0, %1
str\\t%x1, %0
str\\t%d1, %0
-   adr\\t%x0, %a1
+   adr\\t%x0, %c1
adrp\\t%x0, %A1
fmov\\t%d0, %x1
fmov\\t%x0, %d1
@@ -5401,7 +5401,7 @@ (define_insn "add_losym_"
(lo_sum:P (match_operand:P 1 "register_operand" "r")
  (match_operand 2 "aarch64_valid_symref" "S")))]
   ""
-  "add\\t%0, %1, :lo12:%a2"
+  "add\\t%0, %1, :lo12:%c2"
   [(set_attr "type" "alu_imm")]
 )
 
@@ -5412,7 +5412,7 @@ (define_insn "ldr_got_small_"
  (match_operand:PTR 2 "aarch64_valid_symref" 
"S")))]
UNSPEC_GOTSMALLPIC))]
   ""
-  "ldr\\t%0, [%1, #:got_lo12:%a2]"
+  "ldr\\t%0, [%1, #:got_lo12:%c2]"
   [(set_attr "type" "load_")]
 )
 
@@ -5424,7 +5424,7 @@ (define_insn "ldr_got_small_sidi"
 (match_operand:DI 2 "aarch64_valid_symref" "S")))]
UNSPEC_GOTSMALLPIC)))]
   "TARGET_ILP32"
-  "ldr\\t%w0, [%1, #:got_lo12:%a2]"
+  "ldr\\t%w0, [%1, #:got_lo12:%c2]"
   [(set_attr "type" "load_4")]
 )
 
@@ -5435,7 +5435,7 @@ (define_insn "ldr_got_small_28k_"
  (match_operand:PTR 2 "aarch64_valid_symref" 
"S")))]
UNSPEC_GOTSMALLPIC28K))]
   ""
-  "ldr\\t%0, [%1, #::%a2]"
+  "ldr\\t%0, [%1, #::%c2]"
   [(set_attr "type" "load_")]
 )
 
@@ -5447,7 +5447,7 @@ (define_insn "ldr_got_small_28k_sidi"
 (match_operand:DI 2 "aarch64_valid_symref" "S")))]
UNSPEC_GOTSMALLPIC28K)))]
   "TARGET_ILP32"
-  "ldr\\t%w0, [%1, #:gotpage_lo14:%a2]"
+  "ldr\\t%w0, [%1, #:gotpage_lo14:%c2]"
   [(set_attr "type" "load_4")]
 )
 
diff --git a/gcc/testsuite/gcc.dg/asm-4.c b/gcc/testsuite/gcc.dg/asm-4.c
index 
1e6a538aeabba00cd1bf3d0d636ceec4ed50cf4a..f142daac8c65688fae8b8f3eadaa2066cc8fbd86
 100644
--- a/gcc/testsuite/gcc.dg/asm-4.c
+++ b/gcc/testsuite/gcc.dg/asm-4.c
@@ -1,6 +1,9 @@
 /* { dg-do 

Re: [PATCHv2] Add a warning for invalid function casts

2017-11-30 Thread Jason Merrill
On Thu, Nov 30, 2017 at 11:07 AM, Bernd Edlinger
 wrote:
> On 11/30/17 16:45, Jason Merrill wrote:
>> On Thu, Nov 30, 2017 at 10:14 AM, Bernd Edlinger
>>  wrote:
>>> On 11/29/17 22:57, Jason Merrill wrote:
 On 10/09/2017 06:30 PM, Bernd Edlinger wrote:
>>
> +  if (INTEGRAL_TYPE_P (t1)
> +  && INTEGRAL_TYPE_P (t2)
> +  && TYPE_PRECISION (t1) == TYPE_PRECISION (t2)
> +  && (TYPE_UNSIGNED (t1) == TYPE_UNSIGNED (t2)
> +  || TYPE_PRECISION (t1) >= TYPE_PRECISION (integer_type_node)))
> +return true;

 This section needs a comment explaining what you're allowing and why.
>>>
>>> Okay. I will add a comment here:
>>>
>>> /* The signedness of the parameter matters only when an integral
>>>type smaller than int is promoted to int, otherwise only the
>>>precision of the parameter matters.
>>>This check should make sure that the callee does not see
>>>undefined values in argument registers.  */
>>
>> If we're thinking about argument promotion, should this use
>> type_passed_as rather than assume promotion to int?
>
> I don't know, it is only a heuristic after all, and even if there is no
> warning for a bogus type cast that does not mean any
> correctness-guarantee at all.
>
> Would type_passed_as make any difference for integral types?

Yes, type_passed_as expresses promotion to int on targets that want
that behavior.

Jason


[PATCH] PR libstdc++/65927 Fix -Wunused-but-set-parameter warning

2017-11-30 Thread Jonathan Wakely

This function doesn't do anything, but we can't remove it as the
symbol is exported.

PR libstdc++/65927
* include/bits/locale_facets_nonio.h (__timepunct::_M_am_pm_format):
Remove statement with no effect.

Tested x86_64-linux. Committed to trunk.

commit 8daeda977638fe99187cd0dbfa363a9755e48e19
Author: Jonathan Wakely 
Date:   Thu Nov 30 16:47:06 2017 +

PR libstdc++/65927 Fix -Wunused-but-set-parameter warning

PR libstdc++/65927
* include/bits/locale_facets_nonio.h (__timepunct::_M_am_pm_format):
Remove statement with no effect.

diff --git a/libstdc++-v3/include/bits/locale_facets_nonio.h 
b/libstdc++-v3/include/bits/locale_facets_nonio.h
index 0068cd3addd..6747a25a263 100644
--- a/libstdc++-v3/include/bits/locale_facets_nonio.h
+++ b/libstdc++-v3/include/bits/locale_facets_nonio.h
@@ -236,9 +236,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
__dt[1] = _M_data->_M_date_time_era_format;
   }
 
+#if !_GLIBCXX_INLINE_VERSION
   void
-  _M_am_pm_format(const _CharT* __ampm) const
-  { __ampm = _M_data->_M_am_pm_format; }
+  _M_am_pm_format(const _CharT*) const
+  { /* Kept for ABI compatibility, see PR65927 */ }
+#endif
 
   void
   _M_am_pm(const _CharT** __ampm) const


[PATCH] rs6000: Cleanup bdz/bdnz insn/splitter, add new insn/splitter for bdzt/bdzf/bdnzt/bdnzf

2017-11-30 Thread Aaron Sawdey
This does some cleanup/consolidation so that bdz/bdnz are supported by
a single insn and splitter, and adds a new insn and splitter to support
the conditional form of those (bdzt/bdzf/bdnzt/bdnzf).

This is going to be used for the memcmp() builtin expansion patch which
is next. That also will require the change to canonicalize_condition I
posted before thanksgiving to prevent doloop from being confused by
bdnzt et. al. 

Bootstrap/regtest passes on ppc64le. OK for trunk?


2017-11-30  Aaron Sawdey  

* config/rs6000/rs6000.md (cceq_ior_compare): Remove * so I can use it
to generate rtl.
(cceq_ior_compare_complement): Give it a name so I can use it, and
change boolean_or_operator predicate to boolean_operator so it can
be used to generate a crand.
Define new code iterator eqne, and new code_attrs bd/bd_neg.
(_) New name for ctr_internal[12] now combined into
a single define_insn. There is now just a single splitter for this
that looks whether the decremented ctr result is going to a register
or memory and generates the appropriate rtl.
(tf_) A new insn pattern for the conditional form branch
decrement (bdnzt/bdnzf/bdzt/bdzf). This also has a splitter similar
to the one for _.
* config/rs6000/rs6000.c (rs6000_legitimate_combined_insn): Updated
with the new names of the branch decrement patterns, and added the
names of the branch decrement conditional patterns.

-- 
Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
050-2/C113  (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC ToolchainIndex: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c	(revision 254553)
+++ gcc/config/rs6000/rs6000.c	(working copy)
@@ -9056,10 +9056,14 @@
  for the difficult case.  It's better to not create problems
  in the first place.  */
   if (icode != CODE_FOR_nothing
-  && (icode == CODE_FOR_ctrsi_internal1
-	  || icode == CODE_FOR_ctrdi_internal1
-	  || icode == CODE_FOR_ctrsi_internal2
-	  || icode == CODE_FOR_ctrdi_internal2))
+  && (icode == CODE_FOR_bdz_si
+	  || icode == CODE_FOR_bdz_di
+	  || icode == CODE_FOR_bdnz_si
+	  || icode == CODE_FOR_bdnz_di
+	  || icode == CODE_FOR_bdztf_si
+	  || icode == CODE_FOR_bdnztf_si
+	  || icode == CODE_FOR_bdztf_di
+	  || icode == CODE_FOR_bdnztf_di))
 return false;
 
   return true;
Index: gcc/config/rs6000/rs6000.md
===
--- gcc/config/rs6000/rs6000.md	(revision 254553)
+++ gcc/config/rs6000/rs6000.md	(working copy)
@@ -12744,7 +12744,7 @@
 ; which are generated by the branch logic.
 ; Prefer destructive operations where BT = BB (for crXX BT,BA,BB)
 
-(define_insn "*cceq_ior_compare"
+(define_insn "cceq_ior_compare"
   [(set (match_operand:CCEQ 0 "cc_reg_operand" "=y,?y")
 (compare:CCEQ (match_operator:SI 1 "boolean_operator"
 	[(match_operator:SI 2
@@ -12764,9 +12764,9 @@
 
 ; Why is the constant -1 here, but 1 in the previous pattern?
 ; Because ~1 has all but the low bit set.
-(define_insn ""
+(define_insn "cceq_ior_compare_complement"
   [(set (match_operand:CCEQ 0 "cc_reg_operand" "=y,?y")
-(compare:CCEQ (match_operator:SI 1 "boolean_or_operator"
+(compare:CCEQ (match_operator:SI 1 "boolean_operator"
 	[(not:SI (match_operator:SI 2
   "branch_positive_comparison_operator"
   [(match_operand 3
@@ -12983,34 +12983,13 @@
 ;; rs6000_legitimate_combined_insn prevents combine creating any of
 ;; the ctr insns.
 
-(define_insn "ctr_internal1"
-  [(set (pc)
-	(if_then_else (ne (match_operand:P 1 "register_operand" "c,*b,*b,*b")
-			  (const_int 1))
-		  (label_ref (match_operand 0))
-		  (pc)))
-   (set (match_operand:P 2 "nonimmediate_operand" "=1,*r,m,*d*wi*c*l")
-	(plus:P (match_dup 1)
-		(const_int -1)))
-   (clobber (match_scratch:CC 3 "=X,&x,&x,&x"))
-   (clobber (match_scratch:P 4 "=X,X,&r,r"))]
-  ""
-{
-  if (which_alternative != 0)
-return "#";
-  else if (get_attr_length (insn) == 4)
-return "bdnz %l0";
-  else
-return "bdz $+8\;b %l0";
-}
-  [(set_attr "type" "branch")
-   (set_attr "length" "*,16,20,20")])
+(define_code_iterator eqne [eq ne])
+(define_code_attr bd [(ne "bdnz") (eq "bdz")])
+(define_code_attr bd_neg [(ne "bdz") (eq "bdnz")])
 
-;; Similar but use EQ
-
-(define_insn "ctr_internal2"
+(define_insn "_"
   [(set (pc)
-	(if_then_else (eq (match_operand:P 1 "register_operand" "c,*b,*b,*b")
+	(if_then_else (eqne (match_operand:P 1 "register_operand" "c,*b,*b,*b")
 			  (const_int 1))
 		  (label_ref (match_operand 0))
 		  (pc)))
@@ -13024,15 +13003,14 @@
   if (which_alternative != 0)
 return "#";
   else if (get_attr_length (insn) == 4)
-return "bdz %l0";
+return " %l0";
   else
-return "bdnz $+8\;b %l0";
+return " $+8\;b %l0";
 }
   [(set_attr "typ

Re: [PATCH v2.1] Support for adding and stripping location_t wrapper nodes

2017-11-30 Thread Jason Merrill
On Thu, Nov 30, 2017 at 12:16 PM, David Malcolm  wrote:
> On Thu, 2017-11-16 at 10:58 +0100, Richard Biener wrote:
>> On Wed, Nov 15, 2017 at 4:33 PM, David Malcolm 
>> wrote:
>> > On Wed, 2017-11-15 at 12:11 +0100, Richard Biener wrote:
>> > > On Wed, Nov 15, 2017 at 7:17 AM, Trevor Saunders > > > de.o
>> > > rg> wrote:
>> > > > On Fri, Nov 10, 2017 at 04:45:17PM -0500, David Malcolm wrote:
>> > > > > This patch provides a mechanism in tree.c for adding a
>> > > > > wrapper
>> > > > > node
>> > > > > for expressing a location_t, for those nodes for which
>> > > > > !CAN_HAVE_LOCATION_P, along with a new method of cp_expr.
>> > > > >
>> > > > > It's called in later patches in the kit via that new method.
>> > > > >
>> > > > > In this version of the patch, I use NON_LVALUE_EXPR for
>> > > > > wrapping
>> > > > > constants, and VIEW_CONVERT_EXPR for other nodes.
>> > > > >
>> > > > > I also turned off wrapper nodes for EXCEPTIONAL_CLASS_P, for
>> > > > > the
>> > > > > sake
>> > > > > of keeping the patch kit more minimal.
>> > > > >
>> > > > > The patch also adds a STRIP_ANY_LOCATION_WRAPPER macro for
>> > > > > stripping
>> > > > > such nodes, used later on in the patch kit.
>> > > >
>> > > > I happened to start reading this series near the end and was
>> > > > rather
>> > > > confused by this macro since it changes variables in a rather
>> > > > unhygienic
>> > > > way.  Did you consider just defining a inline function to
>> > > > return
>> > > > the
>> > > > actual decl?  It seems like its not used that often so the
>> > > > slight
>> > > > extra
>> > > > syntax should be that big a deal compared to the explicitness.
>> > >
>> > > Existing practice  (STRIP_NOPS & friends).  I'm fine either
>> > > way, the patch looks good.

Note that STRIP_NOPS is now implemented in terms of such an inline
function, it would make sense to do the same here.

>> > > Eventually you can simplify things by doing less checking in
>> > > location_wrapper_p, like only checking
>> > >
>> > > +inline bool location_wrapper_p (const_tree exp)
>> > > +{
>> > > +  if ((TREE_CODE (exp) == NON_LVALUE_EXPR
>> > > +   || (TREE_CODE (exp) == VIEW_CONVERT_EXPR
>> > > +  && (TREE_TYPE (exp)
>> > > + == TREE_TYPE (TREE_OPERAND (exp, 0)))
>> > > +return true;
>> > > +  return false;
>> > > +}
>> > >
>> > > and renaming to maybe_location_wrapper_p.  After all you can't
>> > > really
>> > > distinguish location wrappers from non-location wrappers?  (and
>> > > why
>> > > would you want to?)
>> >
>> > That's the implementation I originally tried.
>> >
>> > As noted in an earlier thread about this, the problem I ran into
>> > was
>> > (in g++.dg/conversion/reinterpret1.C):
>> >
>> >   // PR c++/15076
>> >
>> >   struct Y { Y(int &); };
>> >
>> >   int v;
>> >   Y y1(reinterpret_cast(v));  // { dg-error "" }
>> >
>> > where the "reinterpret_cast" has the same type as the VAR_DECL
>> > v,
>> > and hence the argument to y1 is a NON_LVALUE_EXPR around a
>> > VAR_DECL,
>> > where both have the same type, and hence location_wrapper_p () on
>> > the
>> > cast would return true.
>> >
>> > Compare with:
>> >
>> >   Y y1(v);
>> >
>> > where the argument "v" with a location wrapper is a
>> > VIEW_CONVERT_EXPR
>> > around a VAR_DECL.
>> >
>> > With the simpler conditions you suggested above, both are treated
>> > as
>> > location wrappers (leading to the dg-error in the test failing),
>> > whereas with the condition in the patch, only the latter is treated
>> > as
>> > a location wrapper, and an error is correctly emitted for the dg-
>> > error.
>> >
>> > Hope this sounds sane.  Maybe the function needs a more detailed
>> > comment explaining this?
>>
>> Yes.  I guess the above would argue for a new tree code but I can
>> see that it is better to avoid that.

Basically, we can only strip a NON_LVALUE_EXPR if the argument is
already not an lvalue.  We shouldn't need any extra check on
VIEW_CONVERT_EXPR.

> [...]
>
> Here's an updated version of the patch which:
> * adds a much more detailed comment to location_wrapper_p,
> * fixes an erroneous reference to LOCATION_WRAPPER_EXPR in the
>   comment to maybe_wrap_with_location (from an earlier unfinished
>   experiment)
> * adds a selftest for handling wrapper nodes.
>
> Is there consensus about whether this approach is sane? i.e.
> * adding wrapper nodes via re-using existing tree codes (this kit), vs
> * adding them via some new tree code ("LOCATION_WRAPPER_EXPR"?), vs
> * the workaround of:
>   "[PATCH] C++: use an optional vec for callsites"
> https://gcc.gnu.org/ml/gcc-patches/2017-08/msg01392.html
>
> Jason? Jakub? others?

I still like this approach.

Jason


C++ PATCH for c++/82219, -Wignored-qualifiers and template

2017-11-30 Thread Jason Merrill
Many warnings don't make as much sense in template instantiations, this is one.

Tested x86_64-pc-linux-gnu, applying to trunk.
commit 0b86d1edf07a3abea46c1bdf9a019141a83759b4
Author: Jason Merrill 
Date:   Thu Nov 30 11:56:37 2017 -0500

PR c++/82219 - bogus -Wignored-qualifiers with template

* pt.c (tsubst_copy_and_build) [STATIC_CAST_EXPR]: Suppress
-Wignored-qualifiers.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 7e2f7740106..500ac0c64fe 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -17180,6 +17180,7 @@ tsubst_copy_and_build (tree t,
op = RECUR (TREE_OPERAND (t, 0));
 
warning_sentinel s(warn_useless_cast);
+   warning_sentinel s2(warn_ignored_qualifiers);
switch (TREE_CODE (t))
  {
  case CAST_EXPR:
diff --git a/gcc/testsuite/g++.dg/warn/Wignored-qualifiers1.C 
b/gcc/testsuite/g++.dg/warn/Wignored-qualifiers1.C
new file mode 100644
index 000..5be61a6f394
--- /dev/null
+++ b/gcc/testsuite/g++.dg/warn/Wignored-qualifiers1.C
@@ -0,0 +1,8 @@
+// PR c++/82219
+// { dg-additional-options "-Wall -Wextra" }
+
+struct A {
+  template  T foo(T *) const { return static_cast(0); }
+  void bar() const { foo(&i); }
+  int i;
+};


Re: [PATCH] final: Improve output for -dp and -fverbose-asm

2017-11-30 Thread David Malcolm
On Thu, 2017-11-30 at 18:02 +0100, Michael Matz wrote:
> Hi,
> 
> On Thu, 30 Nov 2017, Martin Sebor wrote:
> 
> > > addic 4,9,-1 # 70   [c=4 l=4]  *adddi3_imm_carry_m1
> > > subfe 4,4,9  # 71   [c=4
> > > l=4]  *subfdi3_carry_in_internal
> > > b .L5# 81   [c=4 l=4]  jump
> > 
> > Not everyone who reads the verbose assembly is familiar with
> > GCC internals.  Users read it to help debug problems in their
> > code.  They shouldn't have to also study GCC source code to
> > understand what the contents mean.
> 
> Um, I think that's a bit overactive.  How would you know what 
> adddi3_imm_carry_m1 really means without knowing the particular GCC 
> backend?  Or what the above magic number after # means?
> 
> Or, for that matter, what "length" means?  Could be byte-length,
> sure.  
> But OTOH, for a RISC target it's always four, so why print it?  The
> GCC 
> developers surely meant cycle-length with that, nothing else makes
> sense.
> 
> My point is, to interpret the asm dumps there's no way around having
> some 
> knowledge and getting used to it.  In addition I doubt they're used 
> heavily to debug programs.  Rather they're used to study the
> interaction 
> between compiler and program (and potentially to find miscompilations
> or 
> strangenesses the compiler emits).  As such even -fverbose-asm is a 
> low-level compiler debugging tool, not something for an end-user
> that 
> needs stability or documentation.

-fverbose-asm is on the border of compiler-debugging vs end-user usage.

FWIW an improvement to -fverbose-asm was explicitly mentioned in the
gcc 7 release notes:
  https://gcc.gnu.org/gcc-7/changes.html
and I've seen at least some end-users comment favorably on that change;
this was:
  https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01051.html
which was originally a "-fasm-show-source" option.


Dave


Re: [PATCH] rs6000: Improve fusion assembler output

2017-11-30 Thread Michael Meissner
On Thu, Nov 30, 2017 at 11:59:37AM +, Segher Boessenkool wrote:
> This improves the output for load and store fusion a little.  In most
> cases it removes the comment output, because that makes the generated
> assembler code hard to read, and equivalent info is available with -dp
> anyway.  For the vector loads it puts the comment on the second insn,
> where it doesn't interfere with other debug comments.
> 
> Mike, does this look good?  Or is there something I'm missing :-)
> 
> Tested on powerpc64-linux {-m32,-m64}.

The comment was used by my perl script (analyze-ppc-asm) that looks at .s files
and prints out statistics.  I can adjust the tool so it no longer looks for the
comment, but actually looks at the adjacent instructions (which I do in a few
other cases).
 
> 
> Segher
> 
> 
> 2017-11-30  Segher Boessenkool  
> 
>   * config/rs6000/rs6000-protos.h (emit_fusion_addis): Remove last two
>   parameters from prototype.
>   * config/rs6000/rs6000.c (emit_fusion_addis): Remove last two
>   parameters.  Don't print a comment.
>   (emit_fusion_gpr_load): Adjust.
>   (emit_fusion_load_store): Adjust.
>   * config/rs6000/rs6000.md (*fusion_p9__constant): Adjust.
>   * config/rs6000/vsx.md (two peepholes): Print the "vector load fusion"
>   comment on the second line.
> 
> gcc/testsuite/
>   * gcc.target/powerpc/fusion.c: Add -dp to options.  Adjust the expected
>   output.
>   * gcc.target/powerpc/fusion3.c: Ditto.
>   * gcc.target/powerpc/fusion4.c: Ditto.
> 
> ---
>  gcc/config/rs6000/rs6000-protos.h  |  2 +-
>  gcc/config/rs6000/rs6000.c | 23 +--
>  gcc/config/rs6000/rs6000.md|  2 +-
>  gcc/config/rs6000/vsx.md   |  4 ++--
>  gcc/testsuite/gcc.target/powerpc/fusion.c  |  4 ++--
>  gcc/testsuite/gcc.target/powerpc/fusion3.c | 10 +-
>  gcc/testsuite/gcc.target/powerpc/fusion4.c |  5 ++---
>  7 files changed, 18 insertions(+), 32 deletions(-)
> 
> diff --git a/gcc/config/rs6000/rs6000-protos.h 
> b/gcc/config/rs6000/rs6000-protos.h
> index f3f5f27..9a16a51 100644
> --- a/gcc/config/rs6000/rs6000-protos.h
> +++ b/gcc/config/rs6000/rs6000-protos.h
> @@ -96,7 +96,7 @@ extern bool quad_address_p (rtx, machine_mode, bool);
>  extern bool quad_load_store_p (rtx, rtx);
>  extern bool fusion_gpr_load_p (rtx, rtx, rtx, rtx);
>  extern void expand_fusion_gpr_load (rtx *);
> -extern void emit_fusion_addis (rtx, rtx, const char *, const char *);
> +extern void emit_fusion_addis (rtx, rtx);
>  extern void emit_fusion_load_store (rtx, rtx, rtx, const char *);
>  extern const char *emit_fusion_gpr_load (rtx, rtx);
>  extern bool fusion_p9_p (rtx, rtx, rtx, rtx);
> diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
> index 2948f2a..9929a45 100644
> --- a/gcc/config/rs6000/rs6000.c
> +++ b/gcc/config/rs6000/rs6000.c
> @@ -38689,16 +38689,10 @@ expand_fusion_gpr_load (rtx *operands)
> sequence.  */
> 
>  void
> -emit_fusion_addis (rtx target, rtx addis_value, const char *comment,
> -const char *mode_name)
> +emit_fusion_addis (rtx target, rtx addis_value)
>  {
>rtx fuse_ops[10];
> -  char insn_template[80];
>const char *addis_str = NULL;
> -  const char *comment_str = ASM_COMMENT_START;
> -
> -  if (*comment_str == ' ')
> -comment_str++;
> 
>/* Emit the addis instruction.  */
>fuse_ops[0] = target;
> @@ -38778,9 +38772,7 @@ emit_fusion_addis (rtx target, rtx addis_value, const 
> char *comment,
>if (!addis_str)
>  fatal_insn ("Could not generate addis value for fusion", addis_value);
> 
> -  sprintf (insn_template, "%s\t\t%s %s, type %s", addis_str, comment_str,
> -comment, mode_name);
> -  output_asm_insn (insn_template, fuse_ops);
> +  output_asm_insn (addis_str, fuse_ops);
>  }
> 
>  /* Emit a D-form load or store instruction that is the second instruction
> @@ -38913,7 +38905,6 @@ emit_fusion_gpr_load (rtx target, rtx mem)
>rtx addr;
>rtx load_offset;
>const char *load_str = NULL;
> -  const char *mode_name = NULL;
>machine_mode mode;
> 
>if (GET_CODE (mem) == ZERO_EXTEND)
> @@ -38929,25 +38920,21 @@ emit_fusion_gpr_load (rtx target, rtx mem)
>switch (mode)
>  {
>  case E_QImode:
> -  mode_name = "char";
>load_str = "lbz";
>break;
> 
>  case E_HImode:
> -  mode_name = "short";
>load_str = "lhz";
>break;
> 
>  case E_SImode:
>  case E_SFmode:
> -  mode_name = (mode == SFmode) ? "float" : "int";
>load_str = "lwz";
>break;
> 
>  case E_DImode:
>  case E_DFmode:
>gcc_assert (TARGET_POWERPC64);
> -  mode_name = (mode == DFmode) ? "double" : "long";
>load_str = "ld";
>break;
> 
> @@ -38956,7 +38943,7 @@ emit_fusion_gpr_load (rtx target, rtx mem)
>  }
> 
>/* Emit the addis instruction.  */
> -  emit_fusion_addis (target, addis_value, "gpr load fusion", mode_name);
> +  em

Re: [PATCH] final: Improve output for -dp and -fverbose-asm

2017-11-30 Thread Martin Sebor

On 11/30/2017 10:02 AM, Michael Matz wrote:

Hi,

On Thu, 30 Nov 2017, Martin Sebor wrote:


addic 4,9,-1 # 70   [c=4 l=4]  *adddi3_imm_carry_m1
subfe 4,4,9  # 71   [c=4 l=4]  *subfdi3_carry_in_internal
b .L5# 81   [c=4 l=4]  jump


Not everyone who reads the verbose assembly is familiar with
GCC internals.  Users read it to help debug problems in their
code.  They shouldn't have to also study GCC source code to
understand what the contents mean.


Um, I think that's a bit overactive.  How would you know what
adddi3_imm_carry_m1 really means without knowing the particular GCC
backend?  Or what the above magic number after # means?


adddi3_imm_carry_m1 seems (mostly) self-explanatory since it's
built up of common assembly mnemonics.  I confess I don't know
what the number after # means, either on powerpc64 or on any
other target.  I'd say that just shows that not even full time
GCC developers are (or can be expected to be) familiar with all
GCC internals, and we shouldn't need to study the back end code
to interpret basic things like # 7 in the output.


Or, for that matter, what "length" means?  Could be byte-length, sure.
But OTOH, for a RISC target it's always four, so why print it?  The GCC
developers surely meant cycle-length with that, nothing else makes sense.


Heh.  I thought it meant the length of the instruction in bytes,
and it made perfect sense to me.  Sounds like I misinterpreted it.
Which suggests that it should be mentioned in the manual (whatever
label it ends up with).  With it documented (and the position on
the line made clear), the length= or l= part could even be skipped
altogether to save a few more bytes if that's important (I don't
think it is in this case).


My point is, to interpret the asm dumps there's no way around having some
knowledge and getting used to it.  In addition I doubt they're used
heavily to debug programs.  Rather they're used to study the interaction
between compiler and program (and potentially to find miscompilations or
strangenesses the compiler emits).  As such even -fverbose-asm is a
low-level compiler debugging tool, not something for an end-user that
needs stability or documentation.


Sure, basic knowledge of the target assembly is prerequisite.
That includes some familiarity with common mnemonics.  But
except for details whose purpose is to expose GCC internals,
knowledge of GCC implementation details (or having to read
GCC source code that prints this stuff) shouldn't be.

The basic point I'm making is that shortening length=NN to l=NN
is not an improvement to the readability of the output and is
contrary both to the documented purpose of the -fverbose-asm
option and Segher's objective for the patch.  The convention
used in the output is to use mnemonics, similar to the assembly
code itself.  One letter mnemonics aren't nearly as effective
as those consisting of multiple letters.  Does l stand for load,
length, latency, or something else?  That's why they are almost
mever used.  In contrast, ld is a known mnemonic for load, len
for length, and lat(?) could with some effort be correctly
interpreted as latency.

This seems fairly elementary to me and I would have expected
it to be non-controversial so I'm not sure why it's being
challenged.  Don't we want the output to be generally useful?
What do we gain by adopting these terse abbreviations?

Martin


Re: [PATCHv2] Add a warning for invalid function casts

2017-11-30 Thread Bernd Edlinger
On 11/30/17 18:29, Jason Merrill wrote:
> On Thu, Nov 30, 2017 at 11:07 AM, Bernd Edlinger
>  wrote:
>> On 11/30/17 16:45, Jason Merrill wrote:
>>> On Thu, Nov 30, 2017 at 10:14 AM, Bernd Edlinger
>>>  wrote:
 On 11/29/17 22:57, Jason Merrill wrote:
> On 10/09/2017 06:30 PM, Bernd Edlinger wrote:
>>>
>> +  if (INTEGRAL_TYPE_P (t1)
>> +  && INTEGRAL_TYPE_P (t2)
>> +  && TYPE_PRECISION (t1) == TYPE_PRECISION (t2)
>> +  && (TYPE_UNSIGNED (t1) == TYPE_UNSIGNED (t2)
>> +  || TYPE_PRECISION (t1) >= TYPE_PRECISION (integer_type_node)))
>> +return true;
>
> This section needs a comment explaining what you're allowing and why.

 Okay. I will add a comment here:

  /* The signedness of the parameter matters only when an integral
 type smaller than int is promoted to int, otherwise only the
 precision of the parameter matters.
 This check should make sure that the callee does not see
 undefined values in argument registers.  */
>>>
>>> If we're thinking about argument promotion, should this use
>>> type_passed_as rather than assume promotion to int?
>>
>> I don't know, it is only a heuristic after all, and even if there is no
>> warning for a bogus type cast that does not mean any
>> correctness-guarantee at all.
>>
>> Would type_passed_as make any difference for integral types?
> 
> Yes, type_passed_as expresses promotion to int on targets that want
> that behavior.
> 

Hmm, I see, mostly arm, sh and msp430 (whatever that may be).

So how would you like this:

   if (INTEGRAL_TYPE_P (t1)
   && INTEGRAL_TYPE_P (t2)
   && TYPE_PRECISION (t1) == TYPE_PRECISION (t2)
   && (TYPE_UNSIGNED (t1) == TYPE_UNSIGNED (t2)
   || !targetm.calls.promote_prototypes (t1)
   || !targetm.calls.promote_prototypes (t2)
   || TYPE_PRECISION (t1) >= TYPE_PRECISION (integer_type_node)))


Thanks
Bernd.


Re: [PATCH, docs] Cleanup riscv option docs.

2017-11-30 Thread Jim Wilson
On Tue, Nov 28, 2017 at 2:17 PM, Jim Wilson  wrote:
> I'll wait a few days in case a doc maintainer wants to comment, and then will
> check it in as is if I don't get any comments.

> gcc/
> * doc/invoke.texi (RISC-V Options): Delete nonexistent -mmemcpy and
> -mno-memcpy options.  For -mplt, -mfdiv, -mdiv, -msave-restore, and
> -mstrict-align, add info on default value.  Delete redundant lines for
> -mabi.  Add missing -mexplicit-relocs docs.

No comments, so I checked it in.

Jim


Re: [PATCHv2] Add a warning for invalid function casts

2017-11-30 Thread Jason Merrill
On Thu, Nov 30, 2017 at 12:55 PM, Bernd Edlinger
 wrote:
> On 11/30/17 18:29, Jason Merrill wrote:
>> On Thu, Nov 30, 2017 at 11:07 AM, Bernd Edlinger
>>  wrote:
>>> On 11/30/17 16:45, Jason Merrill wrote:
 On Thu, Nov 30, 2017 at 10:14 AM, Bernd Edlinger
  wrote:
> On 11/29/17 22:57, Jason Merrill wrote:
>> On 10/09/2017 06:30 PM, Bernd Edlinger wrote:

>>> +  if (INTEGRAL_TYPE_P (t1)
>>> +  && INTEGRAL_TYPE_P (t2)
>>> +  && TYPE_PRECISION (t1) == TYPE_PRECISION (t2)
>>> +  && (TYPE_UNSIGNED (t1) == TYPE_UNSIGNED (t2)
>>> +  || TYPE_PRECISION (t1) >= TYPE_PRECISION (integer_type_node)))
>>> +return true;
>>
>> This section needs a comment explaining what you're allowing and why.
>
> Okay. I will add a comment here:
>
>  /* The signedness of the parameter matters only when an integral
> type smaller than int is promoted to int, otherwise only the
> precision of the parameter matters.
> This check should make sure that the callee does not see
> undefined values in argument registers.  */

 If we're thinking about argument promotion, should this use
 type_passed_as rather than assume promotion to int?
>>>
>>> I don't know, it is only a heuristic after all, and even if there is no
>>> warning for a bogus type cast that does not mean any
>>> correctness-guarantee at all.
>>>
>>> Would type_passed_as make any difference for integral types?
>>
>> Yes, type_passed_as expresses promotion to int on targets that want
>> that behavior.
>>
>
> Hmm, I see, mostly arm, sh and msp430 (whatever that may be).
>
> So how would you like this:
>
>if (INTEGRAL_TYPE_P (t1)
>&& INTEGRAL_TYPE_P (t2)
>&& TYPE_PRECISION (t1) == TYPE_PRECISION (t2)
>&& (TYPE_UNSIGNED (t1) == TYPE_UNSIGNED (t2)
>|| !targetm.calls.promote_prototypes (t1)
>|| !targetm.calls.promote_prototypes (t2)
>|| TYPE_PRECISION (t1) >= TYPE_PRECISION (integer_type_node)))

I was thinking

&& (TYPE_UNSIGNED (t1) == TYPE_UNSIGNED (t2)
|| type_passed_as (t1) == t1))

Jason


Re: generic retuning part 1 - x86-tune-costs update

2017-11-30 Thread Jan Hubicka
> 
> I have benchmarked this quite thoroughly while switching the defaults for Core
> and Zen.  Disabling alignment is pretty much consistent for specfp/int for
> 2000,2006 and 2017 (last is Zen only) and quite consistent one.  I will 
> analyze
> fma3d. It did not show with the alignment change alone, so it may be related
> to other costs or just bad luck.

I have opened https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83232 for that.
The problem is that SLP gives up vectorization when it sees vectorized used
in the same basic block. Preivously there was alignment prologue while now
we fully unroll the loop. Missed SLP introduce memory mismatch stall.

I am bit lost on why SLP gives up though.

Honza


[C++ PING] Re: [C++ Patch, V2] PR 82593 ("Internal compiler error: in process_init_constructor_array, at cp/typeck2.c:1294")

2017-11-30 Thread Paolo Carlini

Hi,

On 17/11/2017 15:09, Paolo Carlini wrote:

Hi again,

I managed to spend much more time on the issue and I'm starting a new 
thread with a mature - IMHO - proposal: the big thing is the use of 
the existing check_array_designated_initializer in 
process_init_constructor_array,  which calls maybe_constant_value, as 
we want, and covers all the ill-formed cases which I can imagine. I'm 
also tweaking a bit the parser to check the return value of 
require_potential_rvalue_constant_expression in order to avoid 
redundant diagnostic in some cases. Also, a couple more testcases 
beyond the bug report.
I'm gently pinging this. I rebased it vs a very minor conflict due to 
Jakub's implementation of P0329R4. While I was at it, I'm also proposing 
a small tweak vs the previous version in the way 
check_array_designated_initializer is used: only if ce->index is 
non-null, more consistently with the current code. Or see the original post:


    https://gcc.gnu.org/ml/gcc-patches/2017-11/msg01481.html

Thanks!
Paolo.

///
Index: cp/cp-tree.h
===
--- cp/cp-tree.h(revision 255161)
+++ cp/cp-tree.h(working copy)
@@ -6190,6 +6190,8 @@ extern bool require_deduced_type  (tree, tsubst_fl
 
 extern tree finish_case_label  (location_t, tree, tree);
 extern tree cxx_maybe_build_cleanup(tree, tsubst_flags_t);
+extern bool check_array_designated_initializer  (constructor_elt *,
+unsigned HOST_WIDE_INT);
 
 /* in decl2.c */
 extern void record_mangling(tree, bool);
Index: cp/decl.c
===
--- cp/decl.c   (revision 255161)
+++ cp/decl.c   (working copy)
@@ -5249,7 +5249,7 @@ grok_reference_init (tree decl, tree type, tree in
initializer.  If it does, an error is issued.  Returns true if CE
is valid, i.e., does not have a designated initializer.  */
 
-static bool
+bool
 check_array_designated_initializer (constructor_elt *ce,
unsigned HOST_WIDE_INT index)
 {
Index: cp/parser.c
===
--- cp/parser.c (revision 255161)
+++ cp/parser.c (working copy)
@@ -22218,8 +22218,10 @@ cp_parser_initializer_list (cp_parser* parser, boo
 
  if (!cp_parser_parse_definitely (parser))
designator = NULL_TREE;
- else if (non_const)
-   require_potential_rvalue_constant_expression (designator);
+ else if (non_const
+  && (!require_potential_rvalue_constant_expression
+  (designator)))
+   designator = NULL_TREE;
  if (designator)
/* Warn the user that they are using an extension.  */
pedwarn (loc, OPT_Wpedantic,
Index: cp/typeck2.c
===
--- cp/typeck2.c(revision 255161)
+++ cp/typeck2.c(working copy)
@@ -1289,17 +1289,10 @@ process_init_constructor_array (tree type, tree in
 
   FOR_EACH_VEC_SAFE_ELT (v, i, ce)
 {
-  if (ce->index)
-   {
- gcc_assert (TREE_CODE (ce->index) == INTEGER_CST);
- if (compare_tree_int (ce->index, i) != 0)
-   {
- ce->value = error_mark_node;
- sorry ("non-trivial designated initializers not supported");
-   }
-   }
-  else
+  if (!ce->index)
ce->index = size_int (i);
+  else if (!check_array_designated_initializer (ce, i))
+   ce->index = error_mark_node;
   gcc_assert (ce->value);
   ce->value = massage_init_elt (TREE_TYPE (type), ce->value, complain);
 
Index: testsuite/g++.dg/cpp0x/desig2.C
===
--- testsuite/g++.dg/cpp0x/desig2.C (nonexistent)
+++ testsuite/g++.dg/cpp0x/desig2.C (working copy)
@@ -0,0 +1,23 @@
+// PR c++/82593
+// { dg-do compile { target c++11 } }
+// { dg-options "" }
+
+enum {
+ INDEX1 = 0,
+ INDEX2
+};
+
+class SomeClass {
+public:
+ SomeClass();
+private:
+ struct { int field; } member[2];
+};
+
+SomeClass::SomeClass()
+ : member({
+   [INDEX1] = { .field = 0 },
+   [INDEX2] = { .field = 1 }
+ })
+{
+}
Index: testsuite/g++.dg/cpp0x/desig3.C
===
--- testsuite/g++.dg/cpp0x/desig3.C (nonexistent)
+++ testsuite/g++.dg/cpp0x/desig3.C (working copy)
@@ -0,0 +1,21 @@
+// PR c++/82593
+// { dg-do compile { target c++11 } }
+// { dg-options "" }
+
+const int INDEX1 = 0;
+const int INDEX2 = 1;
+
+class SomeClass {
+public:
+ SomeClass();
+private:
+ struct { int field; } member[2];
+};
+
+SomeClass::SomeClass()
+ : member({
+   [INDEX1] = { .field = 0 },
+   [INDEX2] = { .field = 1 }
+ })
+{
+}
Index: testsuite/g++.dg/cpp0x/desig4.C
===

Re: [PATCH GCC][V2]A simple implementation of loop interchange

2017-11-30 Thread Bin.Cheng
On Thu, Nov 30, 2017 at 3:51 PM, Richard Biener
 wrote:
> On Thu, Nov 30, 2017 at 4:09 PM, Richard Biener
>  wrote:
>> On Thu, Nov 30, 2017 at 3:13 PM, Bin.Cheng  wrote:
>>> On Thu, Nov 30, 2017 at 1:01 PM, Richard Biener
>>>  wrote:
>>
>> Ok, I'd like to "dumb" the pass down to the level we can solve the
>> bwave case (which I realize is already one of the more complicated ones).
>>
>> Just because it's already late for GCC 8.
>
> For reference I'll commit the following tomorrow, will play with adding
> a testcase for bwaves and doing the multiple_of_p thing we talked about.
Given instantiated scev in parameterized case like:
{{{p_19(D), +, 8}_1, +, (long unsigned int) n_16(D) * 8}_2, +, (long
unsigned int) (n_16(D) * n_16(D)) * 8}_3
it's ideal if we can relate the variable part in stride with loop
niters.  Unfortunately that's impractical because niter
computation and address computation may expand variables differently.
 As in case of bwaves.
This leaves us the method to check multiple relation for stride itself:
(long unsigned int) (n_16(D) * n_16(D)) * 8  ;;stride X
 vs.
(long unsigned int) n_16(D) * 8 ;;stride Y
 vs.
8;;stride Z
>From inner loop to outer, we check if Y is multiple of Z and if X is
multiple of Y, so strides computed are like:
8 * AVG_LOOP_NITERS * AVG_LOOP_NITERS
 vs.
8 * AVG_LOOP_NITERS
 vs.
8

To make it general, we also need to check if X is multiple of previous
stride (Z, in this case) if check on Y failed.

The multiple check on tree expr is weak, so one question is how to do
if failed.  Giving up or using a placeholder const stride?

Thanks,
bin
>
> Richard.


[PATCH, v2.2] Support for adding and stripping location_t wrapper nodes

2017-11-30 Thread David Malcolm
On Thu, 2017-11-30 at 12:36 -0500, Jason Merrill wrote:
> On Thu, Nov 30, 2017 at 12:16 PM, David Malcolm 
> wrote:
> > On Thu, 2017-11-16 at 10:58 +0100, Richard Biener wrote:
> > > On Wed, Nov 15, 2017 at 4:33 PM, David Malcolm  > > om>
> > > wrote:
> > > > On Wed, 2017-11-15 at 12:11 +0100, Richard Biener wrote:
> > > > > On Wed, Nov 15, 2017 at 7:17 AM, Trevor Saunders  > > > > saun
> > > > > de.o
> > > > > rg> wrote:
> > > > > > On Fri, Nov 10, 2017 at 04:45:17PM -0500, David Malcolm
> > > > > > wrote:
> > > > > > > This patch provides a mechanism in tree.c for adding a
> > > > > > > wrapper
> > > > > > > node
> > > > > > > for expressing a location_t, for those nodes for which
> > > > > > > !CAN_HAVE_LOCATION_P, along with a new method of cp_expr.
> > > > > > > 
> > > > > > > It's called in later patches in the kit via that new
> > > > > > > method.
> > > > > > > 
> > > > > > > In this version of the patch, I use NON_LVALUE_EXPR for
> > > > > > > wrapping
> > > > > > > constants, and VIEW_CONVERT_EXPR for other nodes.
> > > > > > > 
> > > > > > > I also turned off wrapper nodes for EXCEPTIONAL_CLASS_P,
> > > > > > > for
> > > > > > > the
> > > > > > > sake
> > > > > > > of keeping the patch kit more minimal.
> > > > > > > 
> > > > > > > The patch also adds a STRIP_ANY_LOCATION_WRAPPER macro
> > > > > > > for
> > > > > > > stripping
> > > > > > > such nodes, used later on in the patch kit.
> > > > > > 
> > > > > > I happened to start reading this series near the end and
> > > > > > was
> > > > > > rather
> > > > > > confused by this macro since it changes variables in a
> > > > > > rather
> > > > > > unhygienic
> > > > > > way.  Did you consider just defining a inline function to
> > > > > > return
> > > > > > the
> > > > > > actual decl?  It seems like its not used that often so the
> > > > > > slight
> > > > > > extra
> > > > > > syntax should be that big a deal compared to the
> > > > > > explicitness.
> > > > > 
> > > > > Existing practice  (STRIP_NOPS & friends).  I'm fine
> > > > > either
> > > > > way, the patch looks good.
> 
> Note that STRIP_NOPS is now implemented in terms of such an inline
> function, it would make sense to do the same here.

FWIW STRIP_NOPS is currently:

#define STRIP_NOPS(EXP) \
  (EXP) = tree_strip_nop_conversions (CONST_CAST_TREE (EXP))

and tree_strip_nop_conversions *isn't* inline.

But assuming I follow you, I've updated the new macro to be:

#define STRIP_ANY_LOCATION_WRAPPER(EXP) \
  (EXP) = tree_strip_any_location_wrapper (CONST_CAST_TREE (EXP))

introducing a new inline tree_strip_any_location_wrapper.

> > > > > Eventually you can simplify things by doing less checking in
> > > > > location_wrapper_p, like only checking
> > > > > 
> > > > > +inline bool location_wrapper_p (const_tree exp)
> > > > > +{
> > > > > +  if ((TREE_CODE (exp) == NON_LVALUE_EXPR
> > > > > +   || (TREE_CODE (exp) == VIEW_CONVERT_EXPR
> > > > > +  && (TREE_TYPE (exp)
> > > > > + == TREE_TYPE (TREE_OPERAND (exp, 0)))
> > > > > +return true;
> > > > > +  return false;
> > > > > +}
> > > > > 
> > > > > and renaming to maybe_location_wrapper_p.  After all you
> > > > > can't
> > > > > really
> > > > > distinguish location wrappers from non-location
> > > > > wrappers?  (and
> > > > > why
> > > > > would you want to?)
> > > > 
> > > > That's the implementation I originally tried.
> > > > 
> > > > As noted in an earlier thread about this, the problem I ran
> > > > into
> > > > was
> > > > (in g++.dg/conversion/reinterpret1.C):
> > > > 
> > > >   // PR c++/15076
> > > > 
> > > >   struct Y { Y(int &); };
> > > > 
> > > >   int v;
> > > >   Y y1(reinterpret_cast(v));  // { dg-error "" }
> > > > 
> > > > where the "reinterpret_cast" has the same type as the
> > > > VAR_DECL
> > > > v,
> > > > and hence the argument to y1 is a NON_LVALUE_EXPR around a
> > > > VAR_DECL,
> > > > where both have the same type, and hence location_wrapper_p ()
> > > > on
> > > > the
> > > > cast would return true.
> > > > 
> > > > Compare with:
> > > > 
> > > >   Y y1(v);
> > > > 
> > > > where the argument "v" with a location wrapper is a
> > > > VIEW_CONVERT_EXPR
> > > > around a VAR_DECL.
> > > > 
> > > > With the simpler conditions you suggested above, both are
> > > > treated
> > > > as
> > > > location wrappers (leading to the dg-error in the test
> > > > failing),
> > > > whereas with the condition in the patch, only the latter is
> > > > treated
> > > > as
> > > > a location wrapper, and an error is correctly emitted for the
> > > > dg-
> > > > error.
> > > > 
> > > > Hope this sounds sane.  Maybe the function needs a more
> > > > detailed
> > > > comment explaining this?
> > > 
> > > Yes.  I guess the above would argue for a new tree code but I can
> > > see that it is better to avoid that.
> 
> Basically, we can only strip a NON_LVALUE_EXPR if the argument is
> already not an lvalue.  We shouldn't need any extra check on
> VIEW_CONVERT_EXPR.

Aha.  Thanks.  I've updated loc

Re: [PATCHv2] Add a warning for invalid function casts

2017-11-30 Thread Bernd Edlinger
On 11/30/17 19:05, Jason Merrill wrote:
> On Thu, Nov 30, 2017 at 12:55 PM, Bernd Edlinger
>  wrote:
>> On 11/30/17 18:29, Jason Merrill wrote:
>>> On Thu, Nov 30, 2017 at 11:07 AM, Bernd Edlinger
>>>  wrote:
 On 11/30/17 16:45, Jason Merrill wrote:
> On Thu, Nov 30, 2017 at 10:14 AM, Bernd Edlinger
>  wrote:
>> On 11/29/17 22:57, Jason Merrill wrote:
>>> On 10/09/2017 06:30 PM, Bernd Edlinger wrote:
>
 +  if (INTEGRAL_TYPE_P (t1)
 +  && INTEGRAL_TYPE_P (t2)
 +  && TYPE_PRECISION (t1) == TYPE_PRECISION (t2)
 +  && (TYPE_UNSIGNED (t1) == TYPE_UNSIGNED (t2)
 +  || TYPE_PRECISION (t1) >= TYPE_PRECISION (integer_type_node)))
 +return true;
>>>
>>> This section needs a comment explaining what you're allowing and why.
>>
>> Okay. I will add a comment here:
>>
>>   /* The signedness of the parameter matters only when an integral
>>  type smaller than int is promoted to int, otherwise only the
>>  precision of the parameter matters.
>>  This check should make sure that the callee does not see
>>  undefined values in argument registers.  */
>
> If we're thinking about argument promotion, should this use
> type_passed_as rather than assume promotion to int?

 I don't know, it is only a heuristic after all, and even if there is no
 warning for a bogus type cast that does not mean any
 correctness-guarantee at all.

 Would type_passed_as make any difference for integral types?
>>>
>>> Yes, type_passed_as expresses promotion to int on targets that want
>>> that behavior.
>>>
>>
>> Hmm, I see, mostly arm, sh and msp430 (whatever that may be).
>>
>> So how would you like this:
>>
>> if (INTEGRAL_TYPE_P (t1)
>> && INTEGRAL_TYPE_P (t2)
>> && TYPE_PRECISION (t1) == TYPE_PRECISION (t2)
>> && (TYPE_UNSIGNED (t1) == TYPE_UNSIGNED (t2)
>> || !targetm.calls.promote_prototypes (t1)
>> || !targetm.calls.promote_prototypes (t2)
>> || TYPE_PRECISION (t1) >= TYPE_PRECISION (integer_type_node)))
> 
> I was thinking
> 
>  && (TYPE_UNSIGNED (t1) == TYPE_UNSIGNED (t2)
>  || type_passed_as (t1) == t1))
> 

Which should be hopefully equivalent, since type_passed_as also compares
size(t1) with sizeof(int), and calls the same hook.

It is unfortunate that I don't see an equivalent function to use in the
C FE.

But they just have to agree on this target dependency.

So in the C front-end I would have to use the target hook as above,
unless Joseph can point out a more appropriate way.

I personally would like to keep the symmetry between C and C++ here,
but if you strongly prefer to use type_passed_as, I can use your
suggestion in the C++ FE instead.


Thanks
Bernd.


Re: [PATCHv2] Add a warning for invalid function casts

2017-11-30 Thread Jason Merrill
On Thu, Nov 30, 2017 at 1:23 PM, Bernd Edlinger
 wrote:
> On 11/30/17 19:05, Jason Merrill wrote:
>> On Thu, Nov 30, 2017 at 12:55 PM, Bernd Edlinger
>>  wrote:
>>> On 11/30/17 18:29, Jason Merrill wrote:
 On Thu, Nov 30, 2017 at 11:07 AM, Bernd Edlinger
  wrote:
> On 11/30/17 16:45, Jason Merrill wrote:
>> On Thu, Nov 30, 2017 at 10:14 AM, Bernd Edlinger
>>  wrote:
>>> On 11/29/17 22:57, Jason Merrill wrote:
 On 10/09/2017 06:30 PM, Bernd Edlinger wrote:
>>
> +  if (INTEGRAL_TYPE_P (t1)
> +  && INTEGRAL_TYPE_P (t2)
> +  && TYPE_PRECISION (t1) == TYPE_PRECISION (t2)
> +  && (TYPE_UNSIGNED (t1) == TYPE_UNSIGNED (t2)
> +  || TYPE_PRECISION (t1) >= TYPE_PRECISION (integer_type_node)))
> +return true;

 This section needs a comment explaining what you're allowing and why.
>>>
>>> Okay. I will add a comment here:
>>>
>>>   /* The signedness of the parameter matters only when an integral
>>>  type smaller than int is promoted to int, otherwise only the
>>>  precision of the parameter matters.
>>>  This check should make sure that the callee does not see
>>>  undefined values in argument registers.  */
>>
>> If we're thinking about argument promotion, should this use
>> type_passed_as rather than assume promotion to int?
>
> I don't know, it is only a heuristic after all, and even if there is no
> warning for a bogus type cast that does not mean any
> correctness-guarantee at all.
>
> Would type_passed_as make any difference for integral types?

 Yes, type_passed_as expresses promotion to int on targets that want
 that behavior.

>>>
>>> Hmm, I see, mostly arm, sh and msp430 (whatever that may be).
>>>
>>> So how would you like this:
>>>
>>> if (INTEGRAL_TYPE_P (t1)
>>> && INTEGRAL_TYPE_P (t2)
>>> && TYPE_PRECISION (t1) == TYPE_PRECISION (t2)
>>> && (TYPE_UNSIGNED (t1) == TYPE_UNSIGNED (t2)
>>> || !targetm.calls.promote_prototypes (t1)
>>> || !targetm.calls.promote_prototypes (t2)
>>> || TYPE_PRECISION (t1) >= TYPE_PRECISION (integer_type_node)))
>>
>> I was thinking
>>
>>  && (TYPE_UNSIGNED (t1) == TYPE_UNSIGNED (t2)
>>  || type_passed_as (t1) == t1))
>>
>
> Which should be hopefully equivalent, since type_passed_as also compares
> size(t1) with sizeof(int), and calls the same hook.
>
> It is unfortunate that I don't see an equivalent function to use in the
> C FE.
>
> But they just have to agree on this target dependency.
>
> So in the C front-end I would have to use the target hook as above,
> unless Joseph can point out a more appropriate way.
>
> I personally would like to keep the symmetry between C and C++ here,

OK, that makes sense.

Jason


[PATCH, gcc-7] Riscv doc fixes and improvements.

2017-11-30 Thread Jim Wilson
This backports 4 commits, so that the riscv user docs will be correct on the
gcc-7 branch.

Tested with gcc build, and looking at the gcc.info file to make sure it is
correct.  Committed.

This is the last riscv patch I'm backporting to the gcc-7 branch, unless some
new issue comes up.

Jim

gcc/
Backport from mainline
2017-11-30  Jim Wilson  
* doc/invoke.texi (RISC-V Options): Delete nonexistent -mmemcpy and
-mno-memcpy options.  For -mplt, -mfdiv, -mdiv, -msave-restore, and
-mstrict-align, add info on default value.  Delete redundant lines for
-mabi.  Add missing -mexplicit-relocs docs.

Backport from mainline
2017-11-01  Palmer Dabbelt  
* doc/invoke.texi (RISC-V Options): Use "@minus{}2 GB", not "-2 GB".
* doc/invoke.texi (RISC-V Options): Explicitly name the medlow
and medany code models, and describe what they do.

2017-10-27  Palmer Dabbelt  
PR target/82717
* doc/invoke.texi (RISC-V) <-mabi>: Correct and improve.
---
 gcc/doc/invoke.texi | 70 ++---
 1 file changed, 51 insertions(+), 19 deletions(-)

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 9b1857c..7311c10 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -966,7 +966,6 @@ See RS/6000 and PowerPC Options.
 
 @emph{RISC-V Options}
 @gccoptlist{-mbranch-cost=@var{N-instruction} @gol
--mmemcpy  -mno-memcpy @gol
 -mplt  -mno-plt @gol
 -mabi=@var{ABI-string} @gol
 -mfdiv  -mno-fdiv @gol
@@ -976,7 +975,7 @@ See RS/6000 and PowerPC Options.
 -msmall-data-limit=@var{N-bytes} @gol
 -msave-restore  -mno-save-restore @gol
 -mstrict-align -mno-strict-align @gol
--mcmodel=@var{code-model} @gol
+-mcmodel=medlow -mcmodel=medany @gol
 -mexplicit-relocs  -mno-explicit-relocs @gol}
 
 @emph{RL78 Options}
@@ -20885,32 +20884,46 @@ These command-line options are defined for RISC-V 
targets:
 @opindex mbranch-cost
 Set the cost of branches to roughly @var{n} instructions.
 
-@item -mmemcpy
-@itemx -mno-memcpy
-@opindex mmemcpy
-Don't optimize block moves.
-
 @item -mplt
 @itemx -mno-plt
 @opindex plt
-When generating PIC code, allow the use of PLTs. Ignored for non-PIC.
+When generating PIC code, do or don't allow the use of PLTs. Ignored for
+non-PIC.  The default is @option{-mplt}.
 
 @item -mabi=@var{ABI-string}
 @opindex mabi
-Specify integer and floating-point calling convention.  This defaults to the
-natural calling convention: e.g.@ LP64 for RV64I, ILP32 for RV32I, LP64D for
-RV64G.
+Specify integer and floating-point calling convention.  @var{ABI-string}
+contains two parts: the size of integer types and the registers used for
+floating-point types.  For example @samp{-march=rv64ifd -mabi=lp64d} means that
+@samp{long} and pointers are 64-bit (implicitly defining @samp{int} to be
+32-bit), and that floating-point values up to 64 bits wide are passed in F
+registers.  Contrast this with @samp{-march=rv64ifd -mabi=lp64f}, which still
+allows the compiler to generate code that uses the F and D extensions but only
+allows floating-point values up to 32 bits long to be passed in registers; or
+@samp{-march=rv64ifd -mabi=lp64}, in which no floating-point arguments will be
+passed in registers.
+
+The default for this argument is system dependent, users who want a specific
+calling convention should specify one explicitly.  The valid calling
+conventions are: @samp{ilp32}, @samp{ilp32f}, @samp{ilp32d}, @samp{lp64},
+@samp{lp64f}, and @samp{lp64d}.  Some calling conventions are impossible to
+implement on some ISAs: for example, @samp{-march=rv32if -mabi=ilp32d} is
+invalid because the ABI requires 64-bit values be passed in F registers, but F
+registers are only 32 bits wide.
 
 @item -mfdiv
 @itemx -mno-fdiv
 @opindex mfdiv
-Use hardware floating-point divide and square root instructions.  This requires
-the F or D extensions for floating-point registers.
+Do or don't use hardware floating-point divide and square root instructions.
+This requires the F or D extensions for floating-point registers.  The default
+is to use them if the specified architecture has these instructions.
 
 @item -mdiv
 @itemx -mno-div
 @opindex mdiv
-Use hardware instructions for integer division.  This requires the M extension.
+Do or don't use hardware instructions for integer division.  This requires the
+M extension.  The default is to use them if the specified architecture has
+these instructions.
 
 @item -march=@var{ISA-string}
 @opindex march
@@ -20930,16 +20943,35 @@ Put global and static data smaller than @var{n} bytes 
into a special section
 @item -msave-restore
 @itemx -mno-save-restore
 @opindex msave-restore
-Use smaller but slower prologue and epilogue code.
+Do or don't use smaller but slower prologue and epilogue code that uses
+library function calls.  The default is to use fast inline prologues and
+epilogues.
 
 @item -mstrict-align
 @itemx -mno-strict-align
 @opindex mstri

Re: [PATCH GCC][V2]A simple implementation of loop interchange

2017-11-30 Thread Richard Biener
On November 30, 2017 7:20:18 PM GMT+01:00, "Bin.Cheng"  
wrote:
>On Thu, Nov 30, 2017 at 3:51 PM, Richard Biener
> wrote:
>> On Thu, Nov 30, 2017 at 4:09 PM, Richard Biener
>>  wrote:
>>> On Thu, Nov 30, 2017 at 3:13 PM, Bin.Cheng 
>wrote:
 On Thu, Nov 30, 2017 at 1:01 PM, Richard Biener
  wrote:
>>>
>>> Ok, I'd like to "dumb" the pass down to the level we can solve the
>>> bwave case (which I realize is already one of the more complicated
>ones).
>>>
>>> Just because it's already late for GCC 8.
>>
>> For reference I'll commit the following tomorrow, will play with
>adding
>> a testcase for bwaves and doing the multiple_of_p thing we talked
>about.
>Given instantiated scev in parameterized case like:
>{{{p_19(D), +, 8}_1, +, (long unsigned int) n_16(D) * 8}_2, +, (long
>unsigned int) (n_16(D) * n_16(D)) * 8}_3
>it's ideal if we can relate the variable part in stride with loop
>niters.  Unfortunately that's impractical because niter
>computation and address computation may expand variables differently.
> As in case of bwaves.
>This leaves us the method to check multiple relation for stride itself:
>(long unsigned int) (n_16(D) * n_16(D)) * 8  ;;stride X
> vs.
>(long unsigned int) n_16(D) * 8 ;;stride Y
> vs.
>8   
>;;stride Z
>From inner loop to outer, we check if Y is multiple of Z and if X is
>multiple of Y, so strides computed are like:
>8 * AVG_LOOP_NITERS * AVG_LOOP_NITERS
> vs.
>8 * AVG_LOOP_NITERS
> vs.
>8
>
>To make it general, we also need to check if X is multiple of previous
>stride (Z, in this case) if check on Y failed.
>
>The multiple check on tree expr is weak, so one question is how to do
>if failed.  Giving up or using a placeholder const stride?

The question is whether we are interested in just whether strideA is bigger 
than strideB or in the actual difference. If the former then substituting a 
constant of course works.  For multiplications that's even reasonable but for 
other ops the relation gets too weak IMHO. Maybe we can extract sth like 
factors we can more easily relate.  I guess only experiments will tell. At 
least we should separate step extraction and this 'analysis' given if all steps 
are constant we can employ more precise cost models. 

Richard. 

>Thanks,
>bin
>>
>> Richard.



Re: generic retuning part 1 - x86-tune-costs update

2017-11-30 Thread Richard Biener
On November 30, 2017 7:14:12 PM GMT+01:00, Jan Hubicka  wrote:
>> 
>> I have benchmarked this quite thoroughly while switching the defaults
>for Core
>> and Zen.  Disabling alignment is pretty much consistent for
>specfp/int for
>> 2000,2006 and 2017 (last is Zen only) and quite consistent one.  I
>will analyze
>> fma3d. It did not show with the alignment change alone, so it may be
>related
>> to other costs or just bad luck.
>
>I have opened https://gcc.gnu.org/bugzilla/show_bug.cgi?id=83232 for
>that.
>The problem is that SLP gives up vectorization when it sees vectorized
>used
>in the same basic block. Preivously there was alignment prologue while
>now
>we fully unroll the loop. Missed SLP introduce memory mismatch stall.
>
>I am bit lost on why SLP gives up though.

I will have a look - it shouldn't give up so easily. Maybe it's a costing 
issue. 

Richard. 

>Honza



Re: [C/C++] Add support for #pragma GCC unroll v3

2017-11-30 Thread Joseph Myers
On Sat, 25 Nov 2017, Eric Botcazou wrote:

> Hi,
> 
> this is the (hopefully) final implementation of the support for the unrolling 
> pragma in the C and C++ front-ends.  It contains a couple of fixes for the 
> C++ 
> front-ends to make it correctly handle unroll and ivdep for the same loop.
> 
> Tested on x86_64-suse-linux, OK for the mainline?

The C front-end changes are OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH, v2.2] Support for adding and stripping location_t wrapper nodes

2017-11-30 Thread Jason Merrill
On Thu, Nov 30, 2017 at 1:26 PM, David Malcolm  wrote:
> +/* Test if EXP is merely a wrapper node, added to express a location_t
> +   on behalf of the node's child (e.g. by maybe_wrap_with_location).
> +
> +   A wrapper node has code NON_LVALUE_EXPR or VIEW_CONVERT_EXPR, and the
> +   same type as its operand.
> +
> +   NON_LVALUE_EXPR is used for wrapping constants.
> +   VIEW_CONVERT_EXPR is used for wrapping non-constants.
> +
> +   A subtlety is that we may have to test whether we have the correct
> +   TREE_CODE for the wrapped TREE_CODE.  Otherwise, e.g. the C++ expression:
> + reinterpret_cast(some_int_var)
> +   is a NON_LVALUE_EXPR around a non-constant of the same type, and
> +   could thus be mischaracterized as a location wrapper node.
> +
> +   Hence we need to check CONSTANT_CLASS_P (TREE_OPERAND (EXP, 0))
> +   for a NON_LVALUE_EXPR for EXP to be a wrapper.
> +   We don't need to check a VIEW_CONVERT_EXPR that wraps the same type
> +   for !CONSTANT_CLASS_P; such instances of VIEW_CONVERT_EXPR are always
> +   wrappers.  */

I think most of this comment (everything after the first paragraph)
belongs inside the function rather than before it.  OK with that
change.

Jason


Re: [PATCH v2: 00/14] Preserving locations for variable-uses and constants (PR 43486)

2017-11-30 Thread David Malcolm
Ping for the rest of this kit:
  https://gcc.gnu.org/ml/gcc-patches/2017-11/msg00880.html

(thanks for the review of patch 2 of the kit)

On Fri, 2017-11-10 at 16:45 -0500, David Malcolm wrote:
> On Thu, 2017-11-02 at 10:46 -0400, Jason Merrill wrote:
> > On Tue, Oct 31, 2017 at 5:09 PM, David Malcolm  > >
> > wrote:
> > > On Tue, 2017-10-24 at 09:53 -0400, Jason Merrill wrote:
> > > > On Fri, Oct 20, 2017 at 5:53 PM, David Malcolm  > > > .c
> > > > om>
> > > > wrote:
> > > > > Design questions:
> > > > > 
> > > > > * The patch introduces a new kind of tree node, currently
> > > > > called
> > > > >   DECL_WRAPPER_EXPR (although it's used for wrapping
> > > > > constants
> > > > > as
> > > > > well
> > > > >   as decls).  Should wrappers be a new kind of tree node, or
> > > > > should
> > > > > they
> > > > >   reuse an existing TREE_CODE? (e.g. NOP_EXPR, CONVERT_EXPR,
> > > > > etc).
> > > > > * NOP_EXPR: seems to be for use as an rvalue
> > > > > * CONVERT_EXPR: for type conversions
> > > > > * NON_LVALUE_EXPR: "Value is same as argument, but
> > > > > guaranteed
> > > > > not an
> > > > >   lvalue"
> > > > >   * but we *do* want to support lvalues here
> > > > 
> > > > I think using NON_LVALUE_EXPR for constants would be
> > > > appropriate.
> > > > 
> > > > > * VIEW_CONVERT_EXPR: viewing one thing as of a different
> > > > > type
> > > > >   * can it support lvalues?
> > > > 
> > > > Yes, the purpose of VIEW_CONVERT_EXPR is to support lvalues, it
> > > > seems
> > > > like the right choice.
> > > > 
> > > > Jason
> > > 
> > > Thanks.  I've been working on a new version of the patch using
> > > those
> > > tree codes, but have run into an issue.
> > > 
> > > In g++.dg/conversion/reinterpret1.C:
> > > 
> > >   // PR c++/15076
> > > 
> > >   struct Y { Y(int &); };
> > > 
> > >   int v;
> > >   Y y1(reinterpret_cast(v));  // { dg-error "" }
> > > 
> > > With trunk, this successfully generates an error:
> > > 
> > >   reinterpret1.C:6:6: error: cannot bind non-const lvalue
> > > reference
> > > of type ‘int&’ to an rvalue of type ‘int’
> > >Y y1(reinterpret_cast(v));  // { dg-error "" }
> > > ^~~~
> > >   reinterpret1.C:3:12: note:   initializing argument 1 of
> > > ‘Y::Y(int&)’
> > >struct Y { Y(int &); };
> > >   ^
> > > 
> > > where internally there's a NON_LVALUE_EXPR around a VAR_DECL,
> > > where
> > > both have the same type:
> > > 
> > > (gdb) call debug_tree (expr)
> > >   > > type  > > size 
> > > unit-size 
> > > align:32 warn_if_not_align:0 symtab:0 alias-set -1
> > > canonical-type 0x7132e5e8 precision:32 min  > > 0x713310d8 -2147483648> max  > > 2147483647>
> > > pointer_to_this 
> > > reference_to_this >
> > > 
> > > arg:0  > > 0x7132e5e8 int>
> > > used public static tree_1 read SI /home/david/coding-
> > > 3/gcc-
> > > git-expr-vs-
> > > decl/src/gcc/testsuite/g++.dg/conversion/reinterpret1.C:5:5 size
> > >  unit-size  > > 0x71331138 4>
> > > align:32 warn_if_not_align:0 context
> > >  > > 0x7131e168 /home/david/coding-3/gcc-git-expr-vs-
> > > decl/src/gcc/testsuite/g++.dg/conversion/reinterpret1.C>
> > > chain  > > 0x7144c150 Y>
> > > public decl_2 VOID /home/david/coding-3/gcc-git-expr-
> > > vs-decl/src/gcc/testsuite/g++.dg/conversion/reinterpret1.C:3:8
> > > align:8 warn_if_not_align:0 context
> > >  > > git-
> > > expr-vs-decl/src/gcc/testsuite/g++.dg/conversion/reinterpret1.C>
> > > chain >>
> > > /home/david/coding-3/gcc-git-expr-vs-
> > > decl/src/gcc/testsuite/g++.dg/conversion/reinterpret1.C:6:6
> > > start:
> > > /home/david/coding-3/gcc-git-expr-vs-
> > > decl/src/gcc/testsuite/g++.dg/conversion/reinterpret1.C:6:6
> > > finish:
> > > /home/david/coding-3/gcc-git-expr-vs-
> > > decl/src/gcc/testsuite/g++.dg/conversion/reinterpret1.C:6:29>
> > > 
> > > The problem is that this reinterpret cast "looks" just like one
> > > of
> > > my
> > > location wrappers.
> > 
> > Your code shouldn't strip a NON_LVALUE_EXPR around a VAR_DECL.
> > > I see a similar issue with constants, where with:
> > > 
> > >   struct Y { Y(int &); };
> > >   Y y1(reinterpret_cast(42));
> > > 
> > > trunk generates an error like the above, but my code handles the
> > >   NON_LVALUE_EXPR(INTEGER_CST(42))
> > > as if it were a location wrapper around the INTEGER_CST, and thus
> > > doesn't emit the error.
> > 
> > Why doesn't it emit the error?  We should get the same error
> > whether
> > or not we strip the wrapper.
> 
> Thanks: my stripping macro was over-zealous: it was stripping any
> NON_LVALUE_EXPR or VIEW_CONVERT_EXPR where the type matched that of
> the wrapped node.  I've added the additional condition that a
> NON_LVALUE_EXPR has to be around a CONSTANT_CLASS_P, and
> a VIEW_CONVERT_EXPR around a !CONSTANT_CLASS_P.
> 
> Here's an updated version of the patch (v2), now a patch kit (on top
> of r254387).  I split i

Re: [PATCH] handle non-constant offsets in -Wstringop-overflow (PR 77608)

2017-11-30 Thread Martin Sebor

On 11/22/2017 05:03 PM, Jeff Law wrote:

On 11/21/2017 12:07 PM, Martin Sebor wrote:

On 11/21/2017 09:55 AM, Jeff Law wrote:

On 11/19/2017 04:28 PM, Martin Sebor wrote:

On 11/18/2017 12:53 AM, Jeff Law wrote:

On 11/17/2017 12:36 PM, Martin Sebor wrote:

The attached patch enhances -Wstringop-overflow to detect more
instances of buffer overflow at compile time by handling non-
constant offsets into the destination object that are known to
be in some range.  The solution could be improved by handling
even more cases (e.g., anti-ranges or offsets relative to
pointers beyond the beginning of an object) but it's a start.

In addition to bootsrapping/regtesting GCC, also tested with
Binutils/GDB, Glibc, and the Linux kernel on x86_64 with no
regressions.

Martin

The top of GDB fails to compile at the moment so the validation
there was incomplete.

gcc-77608.diff


PR middle-end/77608 - missing protection on trivially detectable
runtime buffer overflow

gcc/ChangeLog:

PR middle-end/77608
* builtins.c (compute_objsize): Handle non-constant offsets.

gcc/testsuite/ChangeLog:

PR middle-end/77608
* gcc.dg/Wstringop-overflow.c: New test.

The recursive call into compute_objsize passing in the ostype avoids
having to think about the whole object vs nearest containing object
issues.  Right?

What's left to worry about is maximum or minimum remaining bytes in the
object.  At least that's my understanding of how ostype works here.

So we get the amount remaining, ignoring the variable offset, from the
recursive call (SIZE).  The space left after we account for the
variable
offset is [SIZE - MAX, SIZE - MIN].  So ISTM for type 0/1 you have to
return SIZE-MIN (which you do) and for type 2/3 you have to return
SIZE-MAX which I think you get wrong (and you have to account for the
possibility that MAX or MIN is greater than SIZE and thus there's
nothing left).


Subtracting the upper bound of the offset from the size instead
of the lower bound when the caller is asking for the minimum
object size would make the result essentially meaningless in
common cases where the offset is smaller than size_t, as in:

  char a[7];

  void f (const char *s, unsigned i)
  {
__builtin_strcpy (a + i, s);
  }

Here, i's range is [0, UINT_MAX].

IMO, it's only useful to use the lower bound here, otherwise
the result would only rarely be non-zero.

But when we're asking for the minimum left, aren't we essentially asking
for "how much space am I absolutely sure I can write"?  And if that is
the question, then the only conservatively correct answer is to subtract
the high bound.


I suppose you could look at it that way but IME with this work
(now, and also last year when I submitted a patch actually
changing the built-in), using the upper bound is just not that
useful because it's too often way too big.  There's no way to
distinguish an out-of-range upper bound that's the result of
an inadequate attempt to constrain a value from an out-of-range
upper bound that is sufficiently constrained but in a way GCC
doesn't see.

Understood.

So while it's reasonable to not warn in those cases where we just have
crap range information (that's always going to be the case for some code
regardless of how good my work or Andrew/Aldy's work is), we have to be
very careful and make sure that nobody acts on this information for
optimization purposes because what we're returning is not conservatively
correct.




There are no clients of this API that would be affected by
the decision one way or the other (unless the user specifies
a -Wstringop-overflow= argument greater than the default 2)
so I don't think what we do now matters much, if at all.

Right, but what's to stop someone without knowledge of the
implementation and its quirk of not returning the conservatively safe
result from using the results in other ways.


Presumably they would find out by testing their code.  But this
is a hypothetical scenario.  I added the function for warnings.
I wasn't expecting it to be used for optimization, no such uses
have emerged, and I don't have the impression that anyone is
contemplating adding them (certainly not in stage 3).  If you
think the function could be useful for optimization then we
should certainly consider changing it as we gain experience
with it under those conditions.


What would the impact be of simply not supporting those queries,
essentially returning "I don't know" rather than returning something
that isn't conservatively correct?


Except for false negatives with -Wstringop-overflow=3 and =4
(i.e., Object Size type 2 and 3) I don't think there would be
any impact.  As I said, the function isn't used for optimization
and I don't think the option is used in these modes either, so
in my mind it matters very little.  I don't even think there
are any tests for it in these modes, either.



If we have to support both queries and we're going to return something
that is not conservatively correct, then it really needs to be
documen

Re: [PATCH] PR libgcc/83112, Fix warnings on libgcc float128-ifunc.c

2017-11-30 Thread Michael Meissner
On Wed, Nov 29, 2017 at 08:42:51PM -0600, Segher Boessenkool wrote:
> Hi,
> 
> On Mon, Nov 27, 2017 at 06:40:09PM -0500, Michael Meissner wrote:
> > @@ -33,3 +35,13 @@ $(fp128_hw_obj)   : $(srcdir)/config/rs6
> >  
> >  $(fp128_ifunc_obj)  : INTERNAL_CFLAGS += $(FP128_CFLAGS_SW)
> >  $(fp128_ifunc_obj)  : $(srcdir)/config/rs6000/t-float128-hw
> > +
> > +_mulkc3-hw.c: $(srcdir)/config/rs6000/_mulkc3.c
> > +   rm -rf _mulkc3.c
> > +   (echo "#define __mulkc3 __mulkc3_hw"; \
> > +cat $(srcdir)/config/rs6000/_mulkc3.c) > _mulkc3-hw.c
> 
> Please don't -rf.  -rf is a dangerous habit.
> 
> This also won't work if anything tries to build _mulkc3-hw.c a second
> time; you have deleted its prerequisite.
> 
> Maybe some other scheme is better?
> 
> > --- libgcc/config/rs6000/t-float128 (revision 255177)
> > +++ libgcc/config/rs6000/t-float128 (working copy)
> > @@ -86,7 +86,7 @@ test:
> > for x in $(fp128_obj); do echo "$$x"; done;
> >  
> >  clean-float128:
> > -   rm -rf $(fp128_softfp_src)
> > +   rm -rf $(fp128_softfp_src) $(fp128_hardfp_src)
> > @$(MULTICLEAN) multi-clean DO=clean-float128
> 
> -rm to avoid warnings from rm if you clean without the files being there.
> 
> Otherwise looks good.  Thanks!

As we discussed on private IRC, I mistakenly deleted the _{mul,div}kc3.c file
when I meant to delete the _{mul,div}kc3-hw.c file before recreating it.

This is the patch I checked in (subversion id 255282):

2017-11-30  Michael Meissner  

PR libgcc/83112
* config/rs6000/float128-ifunc.c (__addkf3_resolve): Use the
correct type for all ifunc resolvers to silence -Wattribute-alias
warnings.  Eliminate the forward declaration of the resolver
functions which is no longer needed.
(__subkf3_resolve): Likewise.
(__mulkf3_resolve): Likewise.
(__divkf3_resolve): Likewise.
(__negkf2_resolve): Likewise.
(__eqkf2_resolve): Likewise.
(__nekf2_resolve): Likewise.
(__gekf2_resolve): Likewise.
(__gtkf2_resolve): Likewise.
(__lekf2_resolve): Likewise.
(__ltkf2_resolve): Likewise.
(__unordkf2_resolve): Likewise.
(__extendsfkf2_resolve): Likewise.
(__extenddfkf2_resolve): Likewise.
(__trunckfsf2_resolve): Likewise.
(__trunckfdf2_resolve): Likewise.
(__fixkfsi_resolve): Likewise.
(__fixkfdi_resolve): Likewise.
(__fixunskfsi_resolve): Likewise.
(__fixunskfdi_resolve): Likewise.
(__floatsikf_resolve): Likewise.
(__floatdikf_resolve): Likewise.
(__floatunsikf_resolve): Likewise.
(__floatundikf_resolve): Likewise.
(__extendkftf2_resolve): Likewise.
(__trunctfkf2_resolve): Likewise.

PR libgcc/83103
* config/rs6000/quad-float128.h (TF): Don't define if long double
is IEEE 128-bit floating point.
(TCtype): Define as either TCmode or KCmode, depending on whether
long double is IEEE 128-bit floating point.
(__mulkc3_sw): Add declarations for software/hardware versions of
complex multiply/divide.
(__divkc3_sw): Likewise.
(__mulkc3_hw): Likewise.
(__divkc3_hw): Likewise.
* config/rs6000/_mulkc3.c (_mulkc3): If we are building ifunc
handlers to switch between using software emulation and hardware
float128 instructions, build the complex multiply/divide functions
for both software and hardware support.
* config/rs6000/_divkc3.c (_divkc3): Likewise.
* config/rs6000/float128-ifunc.c (__mulkc3_resolve): Likewise.
(__divkc3_resolve): Likewise.
(__mulkc3): Likewise.
(__divkc3): Likewise.
* config/rs6000/t-float128-hw (fp128_hardfp_src): Likewise.
(fp128_hw_src): Likewise.
(fp128_hw_static_obj): Likewise.
(fp128_hw_shared_obj): Likewise.
(_mulkc3-hw.c): Likewise.
(_divkc3-hw.c): Likewise.
* config/rs6000/t-float128 (clean-float128): Delete _mulkc3-hw.c
and _divkc3-hw.c.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: libgcc/config/rs6000/quad-float128.h
===
--- libgcc/config/rs6000/quad-float128.h(revision 255281)
+++ libgcc/config/rs6000/quad-float128.h(working copy)
@@ -30,13 +30,20 @@
 /* quad.h defines the TFtype type by:
typedef float TFtype __attribute__ ((mode (TF)));
 
-   This define forces it to use KFmode (aka, ieee 128-bit floating point).  */
+   This define forces it to use KFmode (aka, ieee 128-bit floating point).
+   However, when the compiler's default is changed so that long double is IEEE
+   128-bit floating point, we need to go back to using TFmode and TCmode.  */
+#ifndef __LONG_DOUBLE_IEEE128__
 #define TF KF
 
 /* We also need TCtype to represent complex ieee 128

Re: [PATCH][AArch64] Fix address printing on ILP32

2017-11-30 Thread Steve Ellcey
On Thu, 2017-11-30 at 17:27 +, Wilco Dijkstra wrote:
> 
> This fixes the ICE in 
> https://gcc.gnu.org/ml/gcc-patches/2017-11/msg02509.html.
> 
> 
> ChangeLog:
> 2017-11-30  Wilco Dijkstra  
> 
> gcc/
>   * config/aarch64/aarch64.md (call_insn): Use %c rather than %a.
>   (call_value_insn): Likewise.
>   (sibcall_insn): Likewise.
>   (sibcall_value_insn): Likewise.
>   (movsi_aarch64): Likewise.
>   (movdi_aarch64): Likewise.
>   (add_losym_): Likewise.
>   (ldr_got_small_): Likewise.
>   (ldr_got_small_sidi): Likewise.
>   (ldr_got_small_28k_): Likewise.
>   (ldr_got_small_28k_sidi): Likewise.
>   * config/aarch64/aarch64.c (aarch64_print_address_internal):
>   Move output_addr_const to symbolic case. Add error check.
> testsuite/
>   * gcc.dg/asm-4.c: Skip on AArch64 with ILP32 as test is incorrect.
> 

This fixed the build problems on my system with ILP32.

Steve Ellcey
sell...@cavium.com


[PATCH #2], PR target/81959, Fix ++int to _Float128 conversion on power9

2017-11-30 Thread Michael Meissner
I submitted the original version of the patch back in August, and then I forgot
about it.
https://gcc.gnu.org/ml/gcc-patches/2017-08/msg01600.html

Hi Mike,

On Mon, Aug 28, 2017 at 02:50:02PM -0400, Michael Meissner wrote:
> When I added the optimization for loading 32-bit values directly into the
> vector registers from memory to convert to IEEE 128-bit floating point, I
> forgot to make sure the address did not have PRE_INCREMENT, etc. addressing.

>   * config/rs6000/rs6000.md (float_si2_hw): If register
>   allocation hasn't been done, make sure the memory address is
>   X-FORM (register+register).
>   (floatuns_si2_hw2): Likewise.

Why is it okay after RA but not before?

Register allocation has fixed the address due to the 'Z' constraint, so it is
no longer an AUTOINC address.  I've fixed it so that the function
rs6000_address_for_fpconvert checks whether it is being called after register
allocation, and if so, it does nothing.

> --- gcc/config/rs6000/rs6000.md   (revision 251358)
> +++ gcc/config/rs6000/rs6000.md   (working copy)
> @@ -14505,6 +14505,9 @@ (define_insn_and_split "float_si2_
>  {
>if (GET_CODE (operands[2]) == SCRATCH)
>  operands[2] = gen_reg_rtx (DImode);
> +
> +  if (MEM_P (operands[1]) && !reload_completed)
> +operands[1] = rs6000_address_for_fpconvert (operands[1]);
>  })

It will need a comment here, then (other callers of
rs6000_address_for_fpconvert do not test for !reload_completed).

All of the other uses of rs6000_address_for_fpconvert are either in
define_expands or on the first splitter pass, which occurs before register
allocation.

Or maybe the predicate should be stricter in all these cases?
nonimmediate_operand allows a lot ;-)

No, then it tends to generate worse code if it is done before the first split
pass (because it no longer keeps the address together).  I've been thinking
that in general, we should replace these calls with a new predicate that before
register allocation allows normal memory addresses, but during/after RA, it
becomes more strict.  In my experience, with RELOAD that wasn't feasible, but
LRA can handle it (and RELOAD is no longer an issue).

> --- gcc/testsuite/gcc.target/powerpc/pr81959.c(revision 0)
> +++ gcc/testsuite/gcc.target/powerpc/pr81959.c(revision 0)
> @@ -0,0 +1,25 @@
> +/* { dg-do compile { target { powerpc64*-*-* && lp64 } } } */
> +/* { dg-require-effective-target powerpc_p9vector_ok } */
> +/* { dg-options "-mpower9-vector -O2 -mfloat128" } */

powerpc*-*-*, or does that not work?

It needs 64-bit because various machine independent parts of the compiler want
to use TImode if there is arithmetic support for KFmode to copy things, and
TImode isn't supported in 32-bit.

The __float128 support is not built if the compiler is a 32-bit compiler (the
enabler for _float128 is in linux64.h)

Here is the current version of the patch.  I have done bootstraps and make
check with no regressions.  Can I check this into the trunk?

The bug shows up in GCC 7 as well.  Assuming it backports cleanly, can I check
this into GCC 7 also?

[gcc]
2017-11-30  Michael Meissner  

PR target/81959
* config/rs6000/rs6000.c (rs6000_address_for_fpconvert): Check for
whether we can allocate pseudos before trying to fix an address.
* config/rs6000/rs6000.md (float_si2_hw): Make sure the
memory address is indexed or indirect.
(floatuns_si2_hw2): Likewise.

[gcct/testsuite]
2017-11-30  Michael Meissner  

PR target/81959
* gcc.target/powerpc/pr81959.c: New test.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 255177)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -37897,7 +37897,8 @@ rs6000_address_for_fpconvert (rtx x)
 
   gcc_assert (MEM_P (x));
   addr = XEXP (x, 0);
-  if (! legitimate_indirect_address_p (addr, reload_completed)
+  if (can_create_pseudo_p ()
+  && ! legitimate_indirect_address_p (addr, reload_completed)
   && ! legitimate_indexed_address_p (addr, reload_completed))
 {
   if (GET_CODE (addr) == PRE_INC || GET_CODE (addr) == PRE_DEC)
Index: gcc/config/rs6000/rs6000.md
===
--- gcc/config/rs6000/rs6000.md (revision 255177)
+++ gcc/config/rs6000/rs6000.md (working copy)
@@ -14636,6 +14636,9 @@ (define_insn_and_split "float_si2_
 {
   if (GET_CODE (operands[2]) == SCRATCH)
 operands[2] = gen_reg_rtx (DImode);
+
+  if (MEM_P (operands[1]))
+operands[1] = rs6000_address_for_fpconvert (operands[1]);
 })
 
 (define_insn_and_split "float2"
@@ -14699,6 +14702,9 @@ (define_insn_and_split "floatuns_s
 {
   if (GET_CODE (operands[2]) == SCRATCH)
 operands[2] = gen_reg_rtx (DImode);
+
+  if (MEM_P (operands

[committed] ;; to ;

2017-11-30 Thread Jakub Jelinek
Hi!

I've noticed way too many ;; that shouldn't be there.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
committed to trunk as obvious.

2017-11-30  Jakub Jelinek  

* spellcheck-tree.c (test_find_closest_identifier): Use ; instead
of ;;.
* gengtype-state.c (read_state_pair): Likewise.
* gimple-fold.c (gimple_fold_builtin_string_compare): Likewise.
* sel-sched-dump.c (dump_insn_rtx_1): Likewise.
* ipa-cp.c (intersect_aggregates_with_edge): Likewise.
* ifcvt.c (noce_try_store_flag_constants): Likewise.
* tree-ssa-ccp.c (ccp_finalize): Likewise.
* omp-grid.c (grid_process_kernel_body_copy): Likewise.
* builtins.c (fold_builtin_3): Likewise.
* graphite-scop-detection.c
(scop_detection::stmt_has_simple_data_refs_p): Likewise.
* hsa-gen.c (hsa_function_representation::hsa_function_representation):
Likewise.
c/
* c-parser.c (c_parser_postfix_expression): Use ; instead of ;;.
jit/
* jit-recording.c
(recording::memento_of_new_rvalue_from_const ::write_reproducer):
Use ; instead of ;;.
lto/
* lto.c (create_subid_section_table): Use ; instead of ;;.
objc/
* objc-next-runtime-abi-01.c (generate_dispatch_table): Use ; instead
of ;;.

--- gcc/spellcheck-tree.c.jj2017-01-01 12:45:38.0 +0100
+++ gcc/spellcheck-tree.c   2017-11-30 12:13:21.228625493 +0100
@@ -123,7 +123,7 @@ test_find_closest_identifier ()
   ASSERT_EQ (apple, find_closest_identifier (get_identifier ("app"),
 &candidates));
   ASSERT_EQ (banana, find_closest_identifier (get_identifier ("banyan"),
- &candidates));;
+ &candidates));
   ASSERT_EQ (cherry, find_closest_identifier (get_identifier ("berry"),
  &candidates));
   ASSERT_EQ (NULL,
--- gcc/gengtype-state.c.jj 2017-01-01 12:45:38.0 +0100
+++ gcc/gengtype-state.c2017-11-30 12:11:10.434244350 +0100
@@ -2206,7 +2206,7 @@ read_state_pair (pair_p *current)
  next_state_tokens (1);
  read_state_type (&((*current)->type));
  read_state_fileloc (&((*current)->line));
- read_state_options (&((*current)->opt));;
+ read_state_options (&((*current)->opt));
  t0 = peek_state_token (0);
  if (state_token_kind (t0) == STOK_RIGHTPAR)
{
--- gcc/gimple-fold.c.jj2017-11-28 14:00:54.0 +0100
+++ gcc/gimple-fold.c   2017-11-30 12:11:31.726980807 +0100
@@ -2145,7 +2145,7 @@ gimple_fold_builtin_string_compare (gimp
r = strncmp (p1, p2, length);
if (r == 0)
  known_result = true;
-   break;;
+   break;
  }
default:
  gcc_unreachable ();
--- gcc/sel-sched-dump.c.jj 2017-11-20 11:02:42.0 +0100
+++ gcc/sel-sched-dump.c2017-11-30 12:13:08.872778422 +0100
@@ -121,7 +121,7 @@ dump_insn_rtx_1 (rtx insn, int flags)
   int all;
 
   /* flags == -1 also means dumping all.  */
-  all = (flags & 1);;
+  all = (flags & 1);
   if (all)
 flags |= DUMP_INSN_RTX_ALL;
 
--- gcc/ipa-cp.c.jj 2017-11-28 22:19:28.0 +0100
+++ gcc/ipa-cp.c2017-11-30 12:12:36.064184498 +0100
@@ -4192,7 +4192,7 @@ intersect_aggregates_with_edge (struct c
}
   else
{
- src_plats = ipa_get_parm_lattices (caller_info, src_idx);;
+ src_plats = ipa_get_parm_lattices (caller_info, src_idx);
  /* Currently we do not produce clobber aggregate jump
 functions, adjust when we do.  */
  gcc_checking_assert (!src_plats->aggs || !jfunc->agg.items);
@@ -4214,7 +4214,7 @@ intersect_aggregates_with_edge (struct c
FOR_EACH_VEC_ELT (inter, k, item)
  {
int l = 0;
-   bool found = false;;
+   bool found = false;
 
if (!item->value)
  continue;
--- gcc/ifcvt.c.jj  2017-10-20 09:16:06.0 +0200
+++ gcc/ifcvt.c 2017-11-30 12:12:19.800385797 +0100
@@ -1287,7 +1287,7 @@ noce_try_store_flag_constants (struct no
   HOST_WIDE_INT itrue, ifalse, diff, tmp;
   int normalize;
   bool can_reverse;
-  machine_mode mode = GET_MODE (if_info->x);;
+  machine_mode mode = GET_MODE (if_info->x);
   rtx common = NULL_RTX;
 
   rtx a = if_info->a;
--- gcc/tree-ssa-ccp.c.jj   2017-11-02 08:45:52.0 +0100
+++ gcc/tree-ssa-ccp.c  2017-11-30 12:13:35.208452463 +0100
@@ -989,7 +989,7 @@ ccp_finalize (bool nonzero_p)
 
   free (const_val);
   const_val = NULL;
-  return something_changed;;
+  return something_changed;
 }
 
 
--- gcc/omp-grid.c.jj   2017-10-30 17:16:28.0 +0100
+++ gcc/omp-grid.c  2017-11-30 12:12:50.083010986 +0100
@@ -1237,7 +1237,7 @@ grid_process_kernel_body_copy (grid_prop
 
   if (gimple_omp_for_combined_p (inner_loop))

[PATCH] Fix unreachable-2.c testcase (PR sanitizer/83219)

2017-11-30 Thread Jakub Jelinek
Hi!

As mentioned in the PR, when -fsanitize=unreachable (part of
-fsanitize=undefined), we don't want optimizers to optimize away
conditional branches to __builtin_unreachable or switch cases
that branch go __builtin_unreachable.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2017-11-30  Jakub Jelinek  

PR sanitizer/83219
* tree-cfg.c: Include asan.h.
(gimple_seq_unreachable_p): Return false for -fsanitize=unreachable.

--- gcc/tree-cfg.c.jj   2017-11-23 21:13:30.0 +0100
+++ gcc/tree-cfg.c  2017-11-30 11:40:38.334930487 +0100
@@ -61,6 +61,7 @@ along with GCC; see the file COPYING3.
 #include "attribs.h"
 #include "selftest.h"
 #include "opts.h"
+#include "asan.h"
 
 /* This file contains functions for building the Control Flow Graph (CFG)
for a function tree.  */
@@ -469,7 +470,12 @@ computed_goto_p (gimple *t)
 bool
 gimple_seq_unreachable_p (gimple_seq stmts)
 {
-  if (stmts == NULL)
+  if (stmts == NULL
+  /* Return false if -fsanitize=unreachable, we don't want to
+optimize away those calls, but rather turn them into
+__ubsan_handle_builtin_unreachable () or __builtin_trap ()
+later.  */
+  || sanitize_flags_p (SANITIZE_UNREACHABLE))
 return false;
 
   gimple_stmt_iterator gsi = gsi_last (stmts);

Jakub


Re: [PATCH] final: Improve output for -dp and -fverbose-asm

2017-11-30 Thread Martin Sebor

On 11/30/2017 10:07 AM, Segher Boessenkool wrote:

On Thu, Nov 30, 2017 at 09:54:26AM -0700, Martin Sebor wrote:

It is neither line length nor amt of info that makes the second one
way better readable.


The justification certainly makes it easier to read.  But
the abbreviations make it harder to interpret.  [c=4 l=4]
makes no sense to anyone not already familiar with what
it means.

There's nothing wrong with using mnemonics as long as they're
well established and commonly understood.  Absent that, they
should be explained in some accessible document.

Not everyone who reads the verbose assembly is familiar with
GCC internals.  Users read it to help debug problems in their
code.  They shouldn't have to also study GCC source code to
understand what the contents mean.


This is the -dp output, I hardly ever use -fverbose-asm, it has been
unreadable for ten years or so.

-fverbose-asm looks like this:
===
.L.yk:
 # 81288.c:4:   unsigned int *un = (f3 != 0) ? &t4 : 0;
cmpdi 0,4,0  # tmp130, f3
beq 0,.L2#
 # 81288.c:6:   *un ^= t4;
srdi 9,3,32  #, tmp131, t4
xor 9,9,3#, tmp132, tmp131, t4
 # 81288.c:7:   if (*un == t4)
rldicl 9,9,0,32  # tmp133, tmp132
 # 81288.c:7:   if (*un == t4)
cmpd 7,9,3   # t4, tmp134, tmp133
beq 7,.L7#
.L5:
 # 81288.c:11: }
mr 3,4   #, 
blr
.L2:
 # 81288.c:6:   *un ^= t4;
lwz 9,0(4)   # MEM[(unsigned int *)0B], _13
trap
.L7:
 # 81288.c:8: f3 = !!t4;
addic 4,9,-1 # tmp139, tmp133
subfe 4,4,9  # , tmp139, tmp133
b .L5#
===

If we're okay with outputting that kind of stuff to *users*, then how
bad is [c=8 l=4] for GCC developers?  Heh.


I don't know if the above is okay or not.  What I do know is
that [l=4] is not an improvement over [length = 4].

But I think there are ways to improve the readability while
at the same time making the output more compact.  I mentioned
documenting the labels (whatever they may be) in the manual
as one possibility.  Another idea is to print a brief legend
at the bottom of the file explaining what l= stands for.  Yet
another is to print a header at the top of every function with
a label for each column (like in the top command), and then
document what each column means in the manual by referring
to the column headers.  I'm sure there are others.

Martin


[PATCH] Fix -Wreturn-type with switches (PR sanitizer/81275)

2017-11-30 Thread Jakub Jelinek
Hi!

Now that the C++ FE emits __builtin_unreachable with BUILTINS_LOCATION
for fallthrough into end of function without returning value, we need to
see those in the warn_function_return pass which is right after building
cfg.  While GIMPLE_CONDs conditionally jumping to __builtin_unreachable are
only optimized during evrp and later, GIMPLE_SWITCH cases that branch to
__builtin_unreachable are unfortunately removed already during the cfg
cleanup in the cfg pass and thus the immediately following
warn_function_return pass can't warn.  Fixed by postponing that, e.g.
the evrp pass does unconditional TODO_cleanup_cfg, so those will be
optimized away pretty early (often already during ccp1).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-11-30  Jakub Jelinek  

PR sanitizer/81275
* tree-cfg.c (group_case_labels_stmt): Don't optimize away
C++ FE implicitly added __builtin_unreachable () until -Wreturn-type
is diagnosed.

* c-c++-common/tsan/pr81275.c: New test.

--- gcc/tree-cfg.c.jj   2017-11-30 11:40:38.0 +0100
+++ gcc/tree-cfg.c  2017-11-30 12:01:39.496311219 +0100
@@ -1750,7 +1750,14 @@ group_case_labels_stmt (gswitch *stmt)
 
   /* Discard cases that have an unreachable destination block.  */
   if (EDGE_COUNT (base_bb->succs) == 0
- && gimple_seq_unreachable_p (bb_seq (base_bb)))
+ && gimple_seq_unreachable_p (bb_seq (base_bb))
+ /* Don't optimize this if __builtin_unreachable () is the
+implicitly added one by the C++ FE too early, before
+-Wreturn-type can be diagnosed.  We'll optimize it later
+during switchconv pass or any other cfg cleanup.  */
+ && (gimple_in_ssa_p (cfun)
+ || (LOCATION_LOCUS (gimple_location (last_stmt (base_bb)))
+ != BUILTINS_LOCATION)))
{
  edge base_edge = find_edge (gimple_bb (stmt), base_bb);
  if (base_edge != NULL)
--- gcc/testsuite/c-c++-common/tsan/pr81275.c.jj2017-11-28 
22:23:34.0 +0100
+++ gcc/testsuite/c-c++-common/tsan/pr81275.c   2017-11-30 11:50:56.185274379 
+0100
@@ -1,7 +1,6 @@
 /* PR sanitizer/81275 */
 /* { dg-do compile } */
 /* { dg-options "-Wreturn-type -fsanitize=thread" } */
-/* { dg-skip-if "" { c++ } { "*" } { "-O0" } } */
 
 int
 f1 (int a, int b)

Jakub


[C PATCH] More constistent treatment of C initializers (PR c/83222)

2017-11-30 Thread Jakub Jelinek
Hi!

In this PR Ian complains that the C FE behaves inconsistently
between static var initializers inside of a function and outside of
functions.  Apparently we've done that earlier, but only at -O1 and above
where decl_constant_value would be called and in initializers inside of
functions would fold to DECL_INITIAL if other conditions are met, but
outside of functions never.  With my recent changes this is the case
also at -O0.

The following patch allows such folding (in initializers only) also
outside of functions.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-11-30  Jakub Jelinek  

PR c/83222
* c-tree.h (decl_constant_value_1): Declare.
* c-typeck.c (decl_constant_value_1): New function.
(decl_constant_value): Use it.
* c-fold.c (c_fully_fold_internal): If in_init, use
decl_constant_value_1 instead of decl_constant_value.

* gcc.c-torture/compile/pr83222.c: New test.

--- gcc/c/c-tree.h.jj   2017-11-19 18:06:52.0 +0100
+++ gcc/c/c-tree.h  2017-11-30 13:40:02.585208145 +0100
@@ -640,6 +640,7 @@ extern struct c_expr default_function_ar
 struct c_expr);
 extern struct c_expr convert_lvalue_to_rvalue (location_t, struct c_expr,
   bool, bool);
+extern tree decl_constant_value_1 (tree);
 extern void mark_exp_read (tree);
 extern tree composite_type (tree, tree);
 extern tree build_component_ref (location_t, tree, tree, location_t);
--- gcc/c/c-typeck.c.jj 2017-11-28 22:21:17.0 +0100
+++ gcc/c/c-typeck.c2017-11-30 13:39:41.041475162 +0100
@@ -1832,13 +1832,10 @@ c_size_in_bytes (const_tree type)
 /* Return either DECL or its known constant value (if it has one).  */
 
 tree
-decl_constant_value (tree decl)
+decl_constant_value_1 (tree decl)
 {
-  if (/* Don't change a variable array bound or initial value to a constant
-in a place where a variable is invalid.  Note that DECL_INITIAL
-isn't valid for a PARM_DECL.  */
-  current_function_decl != NULL_TREE
-  && TREE_CODE (decl) != PARM_DECL
+  if (/* Note that DECL_INITIAL isn't valid for a PARM_DECL.  */
+  TREE_CODE (decl) != PARM_DECL
   && !TREE_THIS_VOLATILE (decl)
   && TREE_READONLY (decl)
   && DECL_INITIAL (decl) != NULL_TREE
@@ -1853,6 +1850,17 @@ decl_constant_value (tree decl)
   return decl;
 }
 
+/* Return either DECL or its known constant value (if it has one).
+   Like the above, but always return decl outside of functions.  */
+
+tree
+decl_constant_value (tree decl)
+{
+  /* Don't change a variable array bound or initial value to a constant
+ in a place where a variable is invalid.  */
+  return current_function_decl ? decl_constant_value_1 (decl) : decl;
+}
+
 /* Convert the array expression EXP to a pointer.  */
 static tree
 array_to_pointer_conversion (location_t loc, tree exp)
--- gcc/c/c-fold.c.jj   2017-11-21 20:23:00.0 +0100
+++ gcc/c/c-fold.c  2017-11-30 13:40:53.911571996 +0100
@@ -167,7 +167,10 @@ c_fully_fold_internal (tree expr, bool i
   /* Except for variables which we can optimize to its initializer.  */
   if (VAR_P (expr) && !lval && (optimize || in_init))
{
- ret = decl_constant_value (expr);
+ if (in_init)
+   ret = decl_constant_value_1 (expr);
+ else
+   ret = decl_constant_value (expr);
  /* Avoid unwanted tree sharing between the initializer and current
 function's body where the tree can be modified e.g. by the
 gimplifier.  */
--- gcc/testsuite/gcc.c-torture/compile/pr83222.c.jj2017-11-30 
13:53:37.114111959 +0100
+++ gcc/testsuite/gcc.c-torture/compile/pr83222.c   2017-11-30 
13:53:19.0 +0100
@@ -0,0 +1,21 @@
+/* PR c/83222 */
+
+const char a = 0x42;
+const double b = (double) a;
+const double c = a;
+double d = (double) a;
+double e = a;
+const double f = 1 + (double) a;
+const double g = 1 + a;
+double h = 1 + (double) a;
+double i = 1 + a;
+double j[] = { (double) a, a, 1 + (double) a, 1 + a };
+
+void
+foo (void)
+{
+  static const double k = (double) a;
+  static const double l = a;
+  static const double m = 1 + (double) a;
+  static const double n = 1 + a;
+}

Jakub


[C++ PATCH] Fix ICE with structured binding & to incomplete type (PR c++/83217)

2017-11-30 Thread Jakub Jelinek
Hi!

When the structured binding decl doesn't have a reference type,
cp_finish_decl makes sure its type is complete, but if it has
a reference type, that doesn't happen and we ICE trying to
access bases of a not yet instantiated type.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
ok for trunk?

2017-11-30  Jakub Jelinek  

PR c++/83217
* decl.c (cp_finish_decomp): If decl's type is REFERENCE_TYPE,
call complete_type (TREE_TYPE (type)).

* g++.dg/cpp1z/decomp33.C: New test.

--- gcc/cp/decl.c.jj2017-11-30 11:18:00.0 +0100
+++ gcc/cp/decl.c   2017-11-30 15:34:55.873673157 +0100
@@ -7424,7 +7424,9 @@ cp_finish_decomp (tree decl, tree first,
   if (TREE_CODE (type) == REFERENCE_TYPE)
 {
   dexp = convert_from_reference (dexp);
-  type = TREE_TYPE (type);
+  type = complete_type (TREE_TYPE (type));
+  if (type == error_mark_node)
+   goto error_out;
 }
 
   tree eltype = NULL_TREE;
--- gcc/testsuite/g++.dg/cpp1z/decomp33.C.jj2017-11-30 15:37:58.781405774 
+0100
+++ gcc/testsuite/g++.dg/cpp1z/decomp33.C   2017-11-30 15:38:28.006043496 
+0100
@@ -0,0 +1,21 @@
+// PR c++/83217
+// { dg-do compile { target c++11 } }
+// { dg-options "" }
+
+template 
+struct S
+{
+  T a;
+};
+
+void
+foo (S *b)
+{
+  auto & [c] = *b; // { dg-warning "structured bindings only available 
with" "" { target c++14_down } }
+}
+
+void
+bar (S *d)
+{
+  auto [e] = *d;   // { dg-warning "structured bindings only available 
with" "" { target c++14_down } }
+}

Jakub


[PATCH] Fix reassoc qsort checking ICE (PR tree-optimization/83221)

2017-11-30 Thread Jakub Jelinek
Hi!

bb_rank is long and has basic block indexes << 16, and oe rank
is unsigned int.

So, if some function has over 32767 basic blocks, we can run into various
issues.

As I said in the PR, I see 3 possible fixes, one is attached below and
the shortest, which I've bootstrapped/regtested on x86_64-linux and
i686-linux.  While it not fix all possible issues, at least if there
aren't way too many basic blocks (2G+) on 64-bit hosts it shouldn't
fail the qsort checking, and on 32-bit hosts also, even when above 64K
basic blocks it is possible two different basic blocks will have the same
rank and we fall through to reassoc_stmt_dominates_stmt_p.

Another possibility is to store bb_rank still as long, but don't << 16
it when initializing, but when using except for this sort_by_operand_rank
spot.

And probably best but most involved change would be to switch to using
uint64_t for bb_rank, phi_rank as well as oe->rank.  By reordering fields
in oe it shouldn't make things worse on 64-bit hosts, but for 32-bit hosts
will need more memory; on the other side, it should handle better 32K+ basic
block cases, which even for 32-bit host compilers in some cases can be
handled within the limited 32-bit host address space.

So, is this ok for trunk or should I pick some other option?

2017-11-30  Jakub Jelinek  

PR tree-optimization/83221
* tree-ssa-reassoc.c (sort_by_operand_rank): Shift bb_rank
down by 16.
(init_reassoc): Formatting fix.

--- gcc/tree-ssa-reassoc.c.jj   2017-10-28 09:00:48.0 +0200
+++ gcc/tree-ssa-reassoc.c  2017-11-30 16:07:47.220334364 +0100
@@ -543,7 +543,7 @@ sort_by_operand_rank (const void *pa, co
return -1;
  /* If neither is, compare bb_rank.  */
  if (bb_rank[bbb->index] != bb_rank[bba->index])
-   return bb_rank[bbb->index] - bb_rank[bba->index];
+   return (bb_rank[bbb->index] >> 16) - (bb_rank[bba->index] >> 16);
}
 
   bool da = reassoc_stmt_dominates_stmt_p (stmta, stmtb);
@@ -6131,7 +6131,7 @@ init_reassoc (void)
 
   /* Set up rank for each BB  */
   for (i = 0; i < n_basic_blocks_for_fn (cfun) - NUM_FIXED_BLOCKS; i++)
-bb_rank[bbs[i]] = ++rank  << 16;
+bb_rank[bbs[i]] = ++rank << 16;
 
   free (bbs);
   calculate_dominance_info (CDI_POST_DOMINATORS);

Jakub


[PATCH] Improve -Wimplicit-fallthrough for nested switches (PR c/79153)

2017-11-30 Thread Jakub Jelinek
Hi!

A nested switch that can fallthrough (either because it has break stmts
or because it doesn't have default: and doesn't cover all cases) isn't
unfortunately reported with -Wimplicit-fallthrough, because we first
gimplify the nested switch and then we just see a label added implicitly
for break; or for the default: case if it was missing and the cases don't
cover all possible values and punt on that.

The following patch fixes it by telling the -Wimplicit-fallthrough
code about the special labels added for break; or missing default:
through a new SWITCH_BREAK_LABEL_P flag.  Additionally it makes sure
that GIMPLE_LABEL is emitted still inside of the switch's body sequence
and if it is a nested switch, it wraps the GIMPLE_SWITCH with the body
that ends in a GIMPLE_LABEL with SWITCH_BREAK_LABEL_P in a GIMPLE_BIND
(without vars or blocks).  With this, the -Wimplicit-fallthrough code
can recognize this and treat the whole switch that can fall through as
a single statement for the purpose of the warning.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-11-30  Jakub Jelinek  

PR c/79153
* tree.h (SWITCH_BREAK_LABEL_P): Define.
* gimplify.c (collect_fallthrough_labels): Handle GIMPLE_BIND
starting with a GIMPLE_SWITCH and ending with GIMPLE_LABEL with
SWITCH_BREAK_LABEL_P set on the label.
(gimplify_switch_expr): Set SWITCH_BREAK_LABEL_P on the label
added for default case if it was missing and not all cases covered.
Wrap GIMPLE_SWITCH and the switch_body_seq into a GIMPLE_BIND if
switch_body_seq ends with a GIMPLE_LABEL with SWITCH_BREAK_LABEL_P
set on the label.
* tree-chrec.c (evolution_function_is_univariate_p): Add return true;
to avoid -Wimplicit-fallthrough warning.
* config/i386/i386.c (ix86_expand_special_args_builtin): Add
FALLTHRU comment to avoid -Wimplicit-fallthrough warning.
c/
* c-parser.c: Include tree-iterator.h.
(c_parser_switch_statement): Emit LABEL_EXPR for the break label
into SWITCH_BODY instead of after it and set SWITCH_BREAK_LABEL_P
on it.
cp/
* cp-gimplify.c (genericize_switch_stmt): Emit LABEL_EXPR for the
break label into SWITCH_BODY instead of after it and set
SWITCH_BREAK_LABEL_P on it.
* parser.c (cp_parser_objc_expression): Add FALLTHRU comment to avoid
-Wimplicit-fallthrough warning.
fortran/
* match.c (gfc_match): Add FALLTHRU comment to avoid
-Wimplicit-fallthrough warning.
testsuite/
* c-c++-common/Wimplicit-fallthrough-7.c: Adjust expected warning
line.
* c-c++-common/Wimplicit-fallthrough-36.c: New test.

--- gcc/tree.h.jj   2017-11-28 22:21:17.0 +0100
+++ gcc/tree.h  2017-11-30 17:55:35.299197618 +0100
@@ -766,6 +766,11 @@ extern void omp_clause_range_check_faile
 #define FALLTHROUGH_LABEL_P(NODE) \
   (LABEL_DECL_CHECK (NODE)->base.private_flag)
 
+/* Set on the artificial label created for break; stmt from a switch.
+   This is used to implement -Wimplicit-fallthrough.  */
+#define SWITCH_BREAK_LABEL_P(NODE) \
+  (LABEL_DECL_CHECK (NODE)->base.protected_flag)
+
 /* Nonzero means this expression is volatile in the C sense:
its address should be of type `volatile WHATEVER *'.
In other words, the declared item is volatile qualified.
--- gcc/gimplify.c.jj   2017-11-28 12:11:40.0 +0100
+++ gcc/gimplify.c  2017-11-30 19:04:32.694173903 +0100
@@ -1897,6 +1897,27 @@ collect_fallthrough_labels (gimple_stmt_
 
   do
 {
+  if (gimple_code (gsi_stmt (*gsi_p)) == GIMPLE_BIND)
+   {
+ /* Recognize the special GIMPLE_BIND added by gimplify_switch_expr,
+which starts on a GIMPLE_SWITCH and ends with a break label.
+Handle that as a single statement that can fall through.  */
+ gbind *bind = as_a  (gsi_stmt (*gsi_p));
+ gimple *first = gimple_seq_first_stmt (gimple_bind_body (bind));
+ gimple *last = gimple_seq_last_stmt (gimple_bind_body (bind));
+ if (last
+ && gimple_code (first) == GIMPLE_SWITCH
+ && gimple_code (last) == GIMPLE_LABEL)
+   {
+ tree label = gimple_label_label (as_a  (last));
+ if (SWITCH_BREAK_LABEL_P (label))
+   {
+ prev = bind;
+ gsi_next (gsi_p);
+ continue;
+   }
+   }
+   }
   if (gimple_code (gsi_stmt (*gsi_p)) == GIMPLE_BIND
  || gimple_code (gsi_stmt (*gsi_p)) == GIMPLE_TRY)
{
@@ -2315,6 +2336,7 @@ gimplify_switch_expr (tree *expr_p, gimp
   preprocess_case_label_vec_for_gimple (labels, index_type,
&default_case);
 
+  bool add_bind = false;
   if (!default_case)
{
  glabel *new_default;
@@ -2322,14 +2344,46 @@ gimplify_switch_expr (tree *expr_p, gimp
  default_case
 

[PATCH] Fix build of gimple-ssa-store-merging.c with -std=c++98

2017-11-30 Thread Jakub Jelinek
Hi!

Markus reported that C++98 mode is unhappy about anonymous namespace
with anonymous structure typed variables.

The following should fix that, bootstrapped/regtested on x86_64-linux
and i686-linux, ok for trunk?

2017-11-30  Jakub Jelinek  

PR tree-optimization/83233
* gimple-ssa-store-merging.c (nop_stats, bswap_stats): Use
bswap_stat name for the struct.

--- gcc/gimple-ssa-store-merging.c.jj   2017-11-21 14:53:50.0 +0100
+++ gcc/gimple-ssa-store-merging.c  2017-11-30 20:00:15.669946061 +0100
@@ -170,7 +170,7 @@
 
 namespace {
 
-struct
+struct bswap_stat
 {
   /* Number of hand-written 16-bit nop / bswaps found.  */
   int found_16bit;

Jakub


[GCC 9][RFC][PATCH] Optimize PHIs with constant arguments better

2017-11-30 Thread Jeff Law


This addresses some code quality regressions with some work scheduled
for gcc-9.  However, if anyone is aware of BZs that could be helped with
this patch, don't hesitate to pass them along.  Depending on their
severity we could always re-evaluate the plan for this patch.

The patch optimizes sequences like this:


 # iftmp.4_16 = PHI <0(7), 1(8), 0(9)>
 _30 = (unsigned char) iftmp.4_16;

ie, we have a PHI where all its arguments are known (differing)
constants.  The PHI feeds an operation which can be compile-time
evaluated for all the PHI's arguments.

This arises most often where the result of the PHI is type converted via
 NOP_EXPR.  But it also happens for a boolean comparisons, arithmetic
and logicals and other type conversions.

We can optimize away the statement by creating a new PHI node holding
the result of the statement applied to each of the original PHI's
constant operands.  So for the fragment above we'd generate:

 # iftmp.4_16 = PHI <0(7), 1(8), 0(9)>
 # _30 = PHI <0(7), 1(8), 0(9)>

These kind of backward propagations can cascade.  Here's an example I
saw during analysis of test results:

 # m_5 = PHI <10(5), 12(6), 14(7)>
:
  _2 = (long unsigned int) m_5;
  _3 = _2 * 32;
  goto ; [100.00%]

After this patch the two statements have been eliminated in favor of
generating PHIs with constant arguments:

And after PHI propagation we have:
  # m_5 = PHI <10(5), 12(6), 14(7)>
  # _2 = PHI <10(5), 12(6), 14(7)>
  # _3 = PHI <320(5), 384(6), 448(7)>
:
  goto ; [100.00%]

DCE will come along and wipe out m_5 and _2 if they are unused.

I initially covered just NOP_EXPR in the proof of concept.   But it's
trivial to extend to cover other unaries, and binary/comparisons where
one operand was already a constant, so I did that.  This applies
surprisingly often during a bootstrap.  An earlier version (which didn't
handle multiple uses of the result of the PHI) triggered over 6000 times
during a bootstrap:

NOP_EXPR5161
LT_EXPR  658
GE_EXPR  504
NE_EXPR  310
BIT_NOT_EXPR 295
BIT_AND_EXPR  94
PLUS_EXPR 77
NEGATE_EXPR   48
LSHIFT_EXPR   40
EQ_EXPR   34
GT_EXPR   16
BIT_IOR_EXPR  10
MINUS_EXPR 8

There's actually other cases we could handle were

x = PHI (a, 0);
y = a & x;

Turns into

x = PHI (a, 0);
y = PHI (a, 0);


I'm not sure how often these non-constant cases happen -- I haven't
tried to support this or gain any data on how often this kidn of case
might happen.

FWIW, there are cases were the PHI arguments are constant, but we still
can't simplify.  For example:


x = PHI (0, 1, 2)

y = 1234 / x;


When we process this we'll try to simplify 1234 / 0 to a constant which
fails.  We have to gracefully remove the partially initialized PHI node
in that case.  This is tested by existing tests in the suite.

You'll also see that certain tests in the suite such as pr61839_3,
ssa-pre-4.c are twiddled.  I've suppressed phi backwards propagation in
the original test so that it's not compromised.  THere's also a variant
of each test which verifies that the transformation is handled by phi
back propagation.

Bootstrapped and regression tested on x86, both in isolation and in a
larger patch series.

Will ping when gcc-9 opens for development.


Jeff

* tree-ssa-phiprop.c (propagate_with_phi_1): Renamed from
propagate_with_phi.
(stmt_likely_produces_constant_result): New function.
(fold_use_into_phi, propagate_with_phi_2):  Likewise.
(pass_phiprop::execute): Corresponding changes.  Call
propagate_with_phi_2.

* gcc.dg/tree-ssa/pr61743-1.c: Use -fno-tree-phiprop.
* gcc.dg/tree-ssa/pr61839_1.c: Likewise.
* gcc.dg/tree-ssa/pr61839_3.c: Likewise.
* gcc.dg/tree-ssa/ssa-pre-4.c: Likewise.

* gcc.dg/tree-ssa/pr61743-1a.c: New test.
* gcc.dg/tree-ssa/pr61839_1a.c: Likewise.
* gcc.dg/tree-ssa/pr61839_3a.c: Likewise.
* gcc.dg/tree-ssa/ssa-pre-4a.c: New test.

diff --git a/gcc/tree-ssa-phiprop.c b/gcc/tree-ssa-phiprop.c
index 494158b..f2ec2b3 100644
--- a/gcc/tree-ssa-phiprop.c
+++ b/gcc/tree-ssa-phiprop.c
@@ -259,8 +259,8 @@ chk_uses (tree, tree *idx, void *data)
with aliasing issues as we are moving memory reads.  */
 
 static bool
-propagate_with_phi (basic_block bb, gphi *phi, struct phiprop_d *phivn,
-   size_t n)
+propagate_with_phi_1 (basic_block bb, gphi *phi, struct phiprop_d *phivn,
+ size_t n)
 {
   tree ptr = PHI_RESULT (phi);
   gimple *use_stmt;
@@ -440,6 +440,207 @@ next:;
   return phi_inserted;
 }
 
+/* Return TRUE if USE_STMT, which uses PHI_RESULT will likely produce
+   a constant result if PHI_RESULT is itself a constant.  Else return
+   FALSE.
+
+   It is safe for this routine to always return TRUE.  If the result
+   is not a constant it will be detected and properly handled later.
+
+   This is overly conservative.  If USE_STMT were to produce an SSA_NAME,
+   then that would still work for our purposes.  */
+
+static 

Re: [patch] jump threading multiple paths that start from the same BB

2017-11-30 Thread Jeff Law
On 11/29/2017 11:49 AM, Aldy Hernandez wrote:
> On 11/27/2017 06:27 PM, Jeff Law wrote:
>> On 11/07/2017 10:33 AM, Aldy Hernandez wrote:
> 
>>> Without further ado, here are my monumental, earth shattering
>>> improvements:
>>>
>>> Conditional branches
>>>     Without patch: 411846839709
>>>     With    patch: 411831330316
>>>  %changed: -0.0037660%
>>>
>>> Number of instructions
>>>     Without patch: 2271844650634
>>>     With    patch: 2271761153578
>>>  %changed: -0.0036754%
>> So that's pretty small.  A really good improvement would be on the order
>> of a half-percent reduction in runtime conditional branches.  I usually
>> test with .i files that have enable-checking turned on -- which tends to
>> give lots of opportunities due to the redundancies in our checking code.
>>
>> On a positive note, you're eliminating roughly 4.5 other dynamic
>> instructions for every runtime conditional branch you remove, which is
>> actually a good ratio.  3.5 is what I typically see for a fairly
>> extensive amount of work.   Patrick P's work last year was on the order
>> of 7.5.  So while it didn't fire often, when it did it was highly
>> effective.
> 
> I've retested with .ii files gathered from a stage1 build with
> --enable-checking=yes,all.  The results are an order of magnitude better
> but not impressive by any means:
Ack.  It'd need another order of magnitude to start getting into that
interesting range by itself :-)  You're still hitting that 4.5 ratio of
non-branch to branch instructions eliminated.  So it's higher value when
it applies than most of the stuff I've done through the years.


>> We have installed changes of this nature when they were part of a larger
>> set of changes, particularly if they helped us elsewhere (like
>> eliminating key path to silence a bogus warning or addressing other
>> regressions).
> 
> I don't know if the above changes your view, but I am not losing sleep
> over this, so no worries.  I will just keep accumulating these along
> with the myriad of other changes I have ;-).
I'm not sure it changes anything either.   But I'll certainly keep it in
mind as I start digging deeper into the buglists for gcc-8.  If it helps
in the bugfixing effort, then we'll have an interesting decision to make.

jeff


Re: [C PATCH] More constistent treatment of C initializers (PR c/83222)

2017-11-30 Thread Joseph Myers
On Thu, 30 Nov 2017, Jakub Jelinek wrote:

> Hi!
> 
> In this PR Ian complains that the C FE behaves inconsistently
> between static var initializers inside of a function and outside of
> functions.  Apparently we've done that earlier, but only at -O1 and above
> where decl_constant_value would be called and in initializers inside of
> functions would fold to DECL_INITIAL if other conditions are met, but
> outside of functions never.  With my recent changes this is the case
> also at -O0.
> 
> The following patch allows such folding (in initializers only) also
> outside of functions.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Improve -Wimplicit-fallthrough for nested switches (PR c/79153)

2017-11-30 Thread Joseph Myers
On Fri, 1 Dec 2017, Jakub Jelinek wrote:

> c/
>   * c-parser.c: Include tree-iterator.h.
>   (c_parser_switch_statement): Emit LABEL_EXPR for the break label
>   into SWITCH_BODY instead of after it and set SWITCH_BREAK_LABEL_P
>   on it.

The C front-end changes are OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


  1   2   >