Re: [PATCH] Fix bool vs. unsigned:1 vectorization (PR tree-optimization/79284)

2017-02-01 Thread Jakub Jelinek
On Tue, Jan 31, 2017 at 04:31:28PM -0700, Jeff Law wrote:
> On 01/31/2017 04:22 PM, Jakub Jelinek wrote:
> > On Tue, Jan 31, 2017 at 03:52:10PM -0700, Jeff Law wrote:
> > > Which makes your patch safe -- but introduces a non-obvious dependency
> > > between useless_type_conversion_p and your definition of
> > > INTEGRAL_BOOLEAN_TYPE and how it's used in the vectorizer.
> > 
> > The predicate is simply true for all BOOLEAN_TYPEs and all types that are
> > compatible with it in the middle-end.  BOOLEAN_TYPEs with different
> > precisions are not considered compatible types, therefore they won't appear
> > together without explicit casts in between.
> I understand that.  My objection is that there's a highly non-obvious
> dependency between useless_type_conversion, INTEGRAL_TYPE_P (in particular
> how it's used in the tree-vect-patterns).
> 
> If someone came along and changed useless_type_conversion, they'd have no
> way to know they need to go fix up INTEGRAL_TYPE_P and/or the vectorizer at
> the same time.

I think we have many other places that depend on such behavior of u_t_c,
after all, the patch decreases the number of such spots by replacing 3
places doing such checks manually with the macro that does the check.

> Thus my suggestion that you explicitly check the precision and rename the
> macro.  Though I don't offhand have a better suggestion.

Not sure I understand what you mean by explicitly checking the precision,
the macro checks the precision already, and intentionally only for
non-BOOLEAN_TYPE.  If you mean checking precision explicitly in the spots
where the macro is used, that would be worse for the hypothetical case when
u_t_c would change in this regard, you'd have far more places to change.
As for macro name, I came up e.g. with BOOLEAN_COMPATIBLE_P or
BOOLEAN_COMPATIBLE_TYPE_P, perhaps those could make it clearer on what it
is.

Jakub


Re: [PATCH] Fix PR71824

2017-02-01 Thread Richard Biener
On Tue, 31 Jan 2017, Sebastian Pop wrote:

> On Tue, Jan 31, 2017 at 9:11 AM, Richard Biener  wrote:
> > On Tue, 31 Jan 2017, Richard Biener wrote:
> >
> >> On Tue, 31 Jan 2017, Sebastian Pop wrote:
> >>
> >> > Resend as plain text to please gcc-patches@
> >> >
> >> > On Tue, Jan 31, 2017 at 8:39 AM, Sebastian Pop  wrote:
> >> > >
> >> > >
> >> > > On Tue, Jan 31, 2017 at 7:49 AM, Richard Biener wrote:
> >> > >>
> >> > >>
> >> > >> The following fixes an ICE that happens because instantiate_scev
> >> > >> doesn't really work as expected for SESE regions (a FIXME comment
> >> > >> hints at this already).  So instead of asserting all goes well
> >> > >> just bail out (add_loop_constraints seems to add constraints not
> >> > >> required for correctness?).
> >> > >
> >> > >
> >> > > The conditions under which a loop is executed are required for correctness.
> >> > > There is a similar check in scop_detection::can_represent_loop_1
> >> > >
> >> > > && (niter = number_of_latch_executions (loop))
> >> > > && !chrec_contains_undetermined (niter)
> >> > >
> >> > > that is supposed to filter out all these loops where this assert does not hold.
> >> > > The question is: why scop detection has not rejected this loop?
> >> > >
> >> > > Well, I see that we do not check that niter can be analyzed in the region,
> >> > > so we would need another check like this:
> >> > >
> >> > > diff --git a/gcc/graphite-scop-detection.c b/gcc/graphite-scop-detection.c
> >> > > index 3860693..8e14412 100644
> >> > > --- a/gcc/graphite-scop-detection.c
> >> > > +++ b/gcc/graphite-scop-detection.c
> >> > > @@ -931,6 +931,7 @@ scop_detection::can_represent_loop_1 (loop_p loop, sese_l scop)
> >> > >  && niter_desc.control.no_overflow
> >> > >  && (niter = number_of_latch_executions (loop))
> >> > >  && !chrec_contains_undetermined (niter)
> >> > > +&& !chrec_contains_undetermined (scalar_evolution_in_region (scop, loop, niter))
> >> > >  && graphite_can_represent_expr (scop, loop, niter);
> >> > >  }
> >> > >
> >> > > Could you please try this patch and see whether it fixes the problem?
> >>
> >> It doesn't.  It seems we test the above before the regions are
> >> eventually merged?  That is, the above enters with
> >>
> >> $46 = (const sese_l &) @0x7fffd6f0: {
> >>   entry =  7)>,
> >>   exit =  8)>}
> >>
> >> but the failing case with
> >>
> >> $15 = (const sese_l &) @0x298b420: {entry = 3)>,
> >>   exit =  15)>}
> >
> > Index: graphite-scop-detection.c
> > ===
> > --- graphite-scop-detection.c   (revision 245064)
> > +++ graphite-scop-detection.c   (working copy)
> > @@ -905,7 +905,9 @@ scop_detection::build_scop_breadth (sese
> >
> >sese_l combined = merge_sese (s1, s2);
> >
> > -  if (combined)
> > +  if (combined
> > +  && loop_is_valid_in_scop (loop, combined)
> > +  && loop_is_valid_in_scop (loop->next, combined))
> 
> Looks good.  Thanks for the fix!

Applied as follows, bootstrapped & tested on x86_64-unknown-linux-gnu.

Richard.

2017-02-01  Richard Biener  

PR tree-optimization/71824
* graphite-scop-detection.c (scop_detection::build_scop_breadth):
Verify the loops are valid in the merged SESE region.
(scop_detection::can_represent_loop_1): Check analyzing the
evolution of the number of iterations in the region succeeds.

* gcc.dg/graphite/pr71824.c: New testcase.

Index: gcc/graphite-scop-detection.c
===
--- gcc/graphite-scop-detection.c   (revision 245064)
+++ gcc/graphite-scop-detection.c   (working copy)
@@ -905,7 +905,9 @@ scop_detection::build_scop_breadth (sese
 
   sese_l combined = merge_sese (s1, s2);
 
-  if (combined)
+  if (combined
+  && loop_is_valid_in_scop (loop, combined)
+  && loop_is_valid_in_scop (loop->next, combined))
 s1 = combined;
   else
 add_scop (s2);
@@ -931,6 +933,8 @@ scop_detection::can_represent_loop_1 (lo
 && niter_desc.control.no_overflow
 && (niter = number_of_latch_executions (loop))
 && !chrec_contains_undetermined (niter)
+&& !chrec_contains_undetermined (scalar_evolution_in_region (scop,
+loop, niter))
 && graphite_can_represent_expr (scop, loop, niter);
 }
 
Index: gcc/testsuite/gcc.dg/graphite/pr71824.c
===
--- gcc/testsuite/gcc.dg/graphite/pr71824.c (nonexistent)
+++ gcc/testsuite/gcc.dg/graphite/pr71824.c (working copy)
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O2 -floop-nest-optimize" } */
+
+int a, b, d;
+int **c;
+int fn1() {
+while (a)
+  if (d) {
+ int e = -d;
+ for (; b < e; b++)
+   c[b] = &a;
+  } else {
+ for (; b; b++)
+   c[b] = &

Re: [PATCH] Fix bool vs. unsigned:1 vectorization (PR tree-optimization/79284)

2017-02-01 Thread Richard Biener
On Wed, 1 Feb 2017, Jakub Jelinek wrote:

> On Tue, Jan 31, 2017 at 04:31:28PM -0700, Jeff Law wrote:
> > On 01/31/2017 04:22 PM, Jakub Jelinek wrote:
> > > On Tue, Jan 31, 2017 at 03:52:10PM -0700, Jeff Law wrote:
> > > > Which makes your patch safe -- but introduces a non-obvious dependency
> > > > between useless_type_conversion_p and your definition of
> > > > INTEGRAL_BOOLEAN_TYPE and how it's used in the vectorizer.
> > > 
> > > The predicate is simply true for all BOOLEAN_TYPEs and all types that are
> > > compatible with it in the middle-end.  BOOLEAN_TYPEs with different
> > > precisions are not considered compatible types, therefore they won't appear
> > > together without explicit casts in between.
> > I understand that.  My objection is that there's a highly non-obvious
> > dependency between useless_type_conversion, INTEGRAL_TYPE_P (in particular
> > how it's used in the tree-vect-patterns).
> > 
> > If someone came along and changed useless_type_conversion, they'd have no
> > way to know they need to go fix up INTEGRAL_TYPE_P and/or the vectorizer at
> > the same time.
> 
> I think we have many other places that depend on such behavior of u_t_c,
> after all, the patch decreases the number of such spots by replacing 3
> places doing such checks manually with the macro that does the check.
> 
> > Thus my suggestion that you explicitly check the precision and rename the
> > macro.  Though I don't offhand have a better suggestion.
> 
> Not sure I understand what you mean by explicitly checking the precision,
> the macro checks the precision already, and intentionally only for
> non-BOOLEAN_TYPE.  If you mean checking precision explicitly in the spots
> where the macro is used, that would be worse for the hypothetical case when
> u_t_c would change in this regard, you'd have far more places to change.
> As for macro name, I came up e.g. with BOOLEAN_COMPATIBLE_P or
> BOOLEAN_COMPATIBLE_TYPE_P, perhaps those could make it clearer on what it
> is.

+/* Nonzero if TYPE represents a (scalar) boolean type or type
+   in the middle-end compatible with it.  */
+
+#define INTEGRAL_BOOLEAN_TYPE_P(TYPE) \
+  (TREE_CODE (TYPE) == BOOLEAN_TYPE\
+   || ((TREE_CODE (TYPE) == INTEGER_TYPE   \
+   || TREE_CODE (TYPE) == ENUMERAL_TYPE)   \
+   && TYPE_PRECISION (TYPE) == 1   \
+   && TYPE_UNSIGNED (TYPE)))

(just to quote what you proposed).

As for useless_type_conversion_p, I don't remember why we have

  /* Preserve conversions to/from BOOLEAN_TYPE if types are not
     of precision one.  */
  if (((TREE_CODE (inner_type) == BOOLEAN_TYPE)
       != (TREE_CODE (outer_type) == BOOLEAN_TYPE))
      && TYPE_PRECISION (outer_type) != 1)
    return false;

It came with r173854, where you see other BOOLEAN_TYPE
-> integral-type with precision 1 check changes, so a new predicate
is very welcome IMHO.

All BOOLEAN_TYPEs but Ada's have precision one and are unsigned
(their TYPE_SIZE may vary though).  Ada's larger-precision boolean
has only two valid values but needs to be able to encode some 'NaT'
state.

I think BOOLEAN_COMPATIBLE_TYPE_P would be misleading as it isn't
equal to types_compatible_p (boolean_type_node, t).

Maybe we want TWO_VALUED_UNSIGNED_INTEGRAL_TYPE_P ()? (ick)
I thought "BOOLEAN" covers TWO_VALUED_UNSIGNED well enough but
simply BOOLEAN_TYPE_P is easily confused with TREE_CODE () == BOOLEAN_TYPE.

I'm fine with changing the predicate to be more explicit, like

#define INTEGRAL_BOOLEAN_TYPE_P(TYPE) \
  (INTEGRAL_TYPE_P (TYPE) && TYPE_PRECISION (TYPE) == 1)

not sure if we really need the TYPE_UNSIGNED check?  The middle-end
has various places that just check for a 1-precision type when
asking for a boolean context.

So naming set aside, would you agree with the above definition?
(modulo a && TYPE_UNSIGNED (TYPE))?
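
To make the difference between the two candidate definitions concrete, here
is a minimal sketch on a toy type descriptor.  The toy_* names, the enum and
the struct are assumptions made up for illustration and are not GCC's tree
representation; the precision of 8 for the wider boolean is only an example
value.

```c
#include <stdbool.h>

/* Toy stand-ins for tree codes and type nodes (hypothetical, not GCC's).  */
enum toy_code { TOY_BOOLEAN_TYPE, TOY_INTEGER_TYPE, TOY_ENUMERAL_TYPE };

struct toy_type
{
  enum toy_code code;
  unsigned precision;
  bool is_unsigned;
};

/* Models INTEGRAL_TYPE_P: boolean, integer and enumeral types.  */
bool
toy_integral_p (const struct toy_type *t)
{
  return (t->code == TOY_BOOLEAN_TYPE
          || t->code == TOY_INTEGER_TYPE
          || t->code == TOY_ENUMERAL_TYPE);
}

/* Models the first definition: any boolean type, or an unsigned
   integer/enumeral type of precision 1.  */
bool
toy_bool_or_unsigned_bit_p (const struct toy_type *t)
{
  return (t->code == TOY_BOOLEAN_TYPE
          || ((t->code == TOY_INTEGER_TYPE || t->code == TOY_ENUMERAL_TYPE)
              && t->precision == 1
              && t->is_unsigned));
}

/* Models the second definition: any integral type of precision 1,
   without a signedness check.  */
bool
toy_one_bit_integral_p (const struct toy_type *t)
{
  return toy_integral_p (t) && t->precision == 1;
}

/* Example types; the field values are assumptions for illustration.  */
const struct toy_type toy_c_bool = { TOY_BOOLEAN_TYPE, 1, true };
const struct toy_type toy_wide_bool = { TOY_BOOLEAN_TYPE, 8, true };
const struct toy_type toy_unsigned_bit = { TOY_INTEGER_TYPE, 1, true };
const struct toy_type toy_signed_bit = { TOY_INTEGER_TYPE, 1, false };
```

In this model the two definitions agree on toy_c_bool and toy_unsigned_bit
and disagree on the other two: only the first accepts the wider boolean, and
only the second accepts the signed 1-bit type.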

Richard.


Re: [PATCH] Fix bool vs. unsigned:1 vectorization (PR tree-optimization/79284)

2017-02-01 Thread Jakub Jelinek
On Wed, Feb 01, 2017 at 09:21:57AM +0100, Richard Biener wrote:
> it came with r173854 where you see other BOOLEAN_TYPE
> -> integral-type with precision 1 check changes, so a new predicate
> is very welcome IMHO.
> 
> All BOOLEAN_TYPEs but Ada's have precision one and are unsigned
> (their TYPE_SIZE may vary though).  Ada's larger-precision boolean
> has only two valid values but needs to be able to encode some 'NaT'
> state.
> 
> I think BOOLEAN_COMPATIBLE_TYPE_P would be misleading as it isn't
> equal to types_compatible_p (boolean_type_node, t).
> 
> Maybe we want TWO_VALUED_UNSIGNED_INTEGRAL_TYPE_P ()? (ick)
> I thought "BOOLEAN" covers TWO_VALUED_UNSIGNED well enough but
> simply BOOLEAN_TYPE_P is easily confused with TREE_CODE () == BOOLEAN_TYPE.
> 
> I'm fine with changing the predicate to be more explicit, like
> 
> #define INTEGRAL_BOOLEAN_TYPE_P(TYPE) \
>   (INTEGRAL_TYPE_P (TYPE) && TYPE_PRECISION (TYPE) == 1)
> 
> not sure if we really need the TYPE_UNSIGNED check?  The middle-end
> has various places that just check for a 1-precision type when
> asking for a boolean context.
> 
> So naming set aside, would you agree with the above definition?
> (modulo a && TYPE_UNSIGNED (TYPE))?

The TYPE_UNSIGNED check is there so that we don't consider signed 1 bit
bitfields as boolean, those are not compatible with any BOOLEAN_TYPE
(because they have different sign) and from the past experience, are
terribly hard to support properly ("true" is negative, is smaller than
"false", we had dozens of PRs about that kind of stuff already).

As for always checking TYPE_PRECISION == 1 and thus not considering
the Ada BOOLEAN_TYPE as something that should be vectorized as boolean
vector, I'm afraid (but haven't tried to prove that) it would make lots
of Ada testcases no longer vectorizable or vectorizable with much worse
code (especially on AVX512), and/or cause ICEs (if there are assumptions
e.g. that in COND_EXPR the first operand, if it is SSA_NAME, must be
vectorized as boolean vector type).
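
As a small illustration of the signed 1-bit bitfield problem mentioned
above, the following sketch shows what the only nonzero value of such a
field looks like at the C level.  The struct and function names are made up
for this example, and it assumes a two's complement implementation such as
GCC.

```c
#include <stdbool.h>

/* Hypothetical struct, for illustration only.  */
struct toy_flags
{
  signed int s : 1;    /* representable values: -1 and 0 */
  unsigned int u : 1;  /* representable values: 0 and 1 */
};

/* Value read back from the signed field holding its only nonzero value.  */
int
toy_signed_true_value (void)
{
  struct toy_flags f = { 0, 0 };
  f.s = -1;
  return f.s;
}

/* Value read back from the unsigned field holding its only nonzero value.  */
int
toy_unsigned_true_value (void)
{
  struct toy_flags f = { 0, 0 };
  f.u = 1;
  return f.u;
}

/* For the signed field, "true" compares smaller than "false" (0).  */
bool
toy_signed_true_below_false (void)
{
  return toy_signed_true_value () < 0;
}
```

So for the signed field "true" is -1 and orders before "false", while the
unsigned field behaves like a boolean with values 0 and 1.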

Jakub


Re: [PATCH] Fix bool vs. unsigned:1 vectorization (PR tree-optimization/79284)

2017-02-01 Thread Eric Botcazou
> All BOOLEAN_TYPEs but Ada's have precision one and are unsigned
> (their TYPE_SIZE may vary though).

/* Builds a boolean type of precision PRECISION.
   Used for boolean vectors to choose proper vector element size.  */
tree
build_nonstandard_boolean_type (unsigned HOST_WIDE_INT precision)
{
  tree type;

  if (precision <= MAX_BOOL_CACHED_PREC)
    {
      type = nonstandard_boolean_type_cache[precision];
      if (type)
        return type;
    }

  type = make_node (BOOLEAN_TYPE);
  TYPE_PRECISION (type) = precision;
  fixup_signed_type (type);

  if (precision <= MAX_INT_CACHED_PREC)
    nonstandard_boolean_type_cache[precision] = type;

  return type;
}

-- 
Eric Botcazou


Re: [PATCH] Fix bool vs. unsigned:1 vectorization (PR tree-optimization/79284)

2017-02-01 Thread Jakub Jelinek
On Wed, Feb 01, 2017 at 09:32:46AM +0100, Eric Botcazou wrote:
> > All BOOLEAN_TYPEs but Ada's have precision one and are unsigned
> > (their TYPE_SIZE may vary though).

Oops, thanks for the correction.  That said, the prec > 1 BOOLEAN_TYPE
aren't compatible with the prec 1 BOOLEAN_TYPE, so the intent of the
predicate is to support precision 1 BOOLEAN_TYPE and anything
types_compatible_p with it, or precision > 1 BOOLEAN_TYPEs (where
anything types_compatible_p with it must be BOOLEAN_TYPE with the
same precision and signedness).

> /* Builds a boolean type of precision PRECISION.
>Used for boolean vectors to choose proper vector element size.  */
> tree
> build_nonstandard_boolean_type (unsigned HOST_WIDE_INT precision)
> {
>   tree type;
> 
>   if (precision <= MAX_BOOL_CACHED_PREC)
>     {
>       type = nonstandard_boolean_type_cache[precision];
>       if (type)
>         return type;
>     }
> 
>   type = make_node (BOOLEAN_TYPE);
>   TYPE_PRECISION (type) = precision;
>   fixup_signed_type (type);
> 
>   if (precision <= MAX_INT_CACHED_PREC)
>     nonstandard_boolean_type_cache[precision] = type;
> 
>   return type;
> }

Jakub


Re: [PATCH] Fix PR79278, amend ADJUST_FIELD_ALIGN interface

2017-02-01 Thread Richard Biener
On Tue, 31 Jan 2017, Jeff Law wrote:

> On 01/31/2017 02:01 AM, Richard Biener wrote:
> > 
> > This amends ADJUST_FIELD_ALIGN to always get the type of the field
> > as argument but make the field itself optional.  All actual target
> > macro implementations only look at the type of the field but FRV
> > (which seems to misuse ADJUST_FIELD_ALIGN to do bitfield layout
> > rather than using one of the three standard ways - Alex/Nick?).
> Didn't we deprecate FRV?  Oh, that was MEP..  Nevermind.
> 
> 
> 
> > This speeds up min_align_of_type (no longer needs to build a FIELD_DECL)
> > and thus (IMHO) makes it usable from get_object_alignment.  This
> > causes us no longer to return bogus answers for indirect accesses to
> > doubles on i?86 and expand to RTL with proper MEM_ALIGN.  (it also
> > makes the previous fix for PR79256 no longer necessary)
> > 
> > Bootstrap and regtest running on x86_64-unknown-linux-gnu - is this ok
> > for trunk at this stage?
> > 
> > grep found an ADJUST_FIELD_ALIGN use in libobjc/encoding.c but that
> > is fed a C string(!?) as FIELD_DECL so I discounted it as unrelated
> > (and grep didn't find a way this macro could be defined there)?
> Presumably this is the code that takes the structure and encodes information
> about it for the runtime.  Though taking a string sounds horribly broken.
> 
> I suspect it gets included via tm.h.
> 
> I bet if someone built a cross far enough to build libobjc we could see it in
> action.  It does make one wonder how this part of libobjc could possibly be
> working on targets that define ADJUST_FIELD_ALIGN.
> 
> I'll note it's been that way since libobjc was moved into its own directory,
> but wasn't like that prior to moving into its own directory.
> 
> I have no idea what to do here...

As objc builds just fine on x86_64 which does define ADJUST_FIELD_ALIGN
including tm.h can't be the whole story here...

In fact preprocessed source on x86_64 with -dD shows no ADJUST_FIELD_ALIGN
but instead

#define rs6000_special_adjust_field_align_p(FIELD,COMPUTED) 0

which means it must be something powerpc specific...  FRV for example has

/* @@@ A hack, needed because libobjc wants to use ADJUST_FIELD_ALIGN for
   some reason.  */
#ifdef IN_TARGET_LIBS
#define BIGGEST_FIELD_ALIGNMENT 64
#else
/* An expression for the alignment of a structure field FIELD if the
   alignment computed in the usual way is COMPUTED.  GCC uses this
   value instead of the value in `BIGGEST_ALIGNMENT' or
   `BIGGEST_FIELD_ALIGNMENT', if defined, for structure fields only.  */
#define ADJUST_FIELD_ALIGN(FIELD, COMPUTED) \
  frv_adjust_field_align (FIELD, COMPUTED)
#endif

Similarly on x86_64.

So it seems on power this might be an issue and thus I'd need to
adjust the macro use - but not sure what to pass as "type" here...

I'll try to build a cross to ppc64 and see what happens.

Richard.


Re: [patch 79279] combine/simplify_set issue

2017-02-01 Thread Aurelien Buhrig
On 31/01/2017 22:15, Segher Boessenkool wrote:
> Hello,
>
> On Mon, Jan 30, 2017 at 10:43:23AM +0100, Aurelien Buhrig wrote:
>> This patch fixes a combiner bug in simplify_set which calls
>> CANNOT_CHANGE_MODE_CLASS with wrong mode params.
>> It occurs when trying to simplify paradoxical subregs of hw regs (whose
>> natural mode is lower than a word).
>>
>> In fact, changing from (set x:m1 (subreg:m1 (op:m2))) to (set (subreg:m2
>> x)  op2) is valid if REG_CANNOT_CHANGE_MODE_P (x, m1, m2) is false
>> and REG_CANNOT_CHANGE_MODE_P (x, GET_MODE (src), GET_MODE (SUBREG_REG
>> (src))
> r62212 (in 2003) changed it to what we have now, it used to be what you
> want to change it back to.
>
> You say m1 > m2, which means you have WORD_REGISTER_OPERATIONS defined.
No, just some hard regs whose natural mode size is 2 and UNITS_PER_WORD
is 4...
>
> Where does this transformation go wrong?  Why is the resulting insn
> recognised at all?  For example, general_operand should refuse it.
> Maybe you have custom *_operand that do not handle subreg correctly?
>
> The existing code looks correct: what we want to know is if an m2
> interpreted as an m1 yields the correct value.  We might *also* want
> your condition, but I'm not sure about that.
OK, looks like both m1 -> m2 and m2 -> m1 checks would be needed, but the m1
-> m2 case should be filtered by valid predicates (general_operand).
Sorry about that.

>>See:
>> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79279
>>
>> Validated on a private target,
>> bootstrapped on x86_64-pc-linux-gnu, but I'm not sure which target is the
>> most relevant for this patch though ...
>>
>> OK to commit?
> Sorry, no.  We're currently in development stage 4, and this is not a
> regression (see ).  But we can
> of course discuss this further, and you can repost the patch when stage 1
> opens (a few months from now) if you still want it.
OK, but not sure if it needs to be patched any more.

Aurélien


Re: [PATCH] Fix bool vs. unsigned:1 vectorization (PR tree-optimization/79284)

2017-02-01 Thread Jakub Jelinek
On Wed, Feb 01, 2017 at 09:21:57AM +0100, Richard Biener wrote:
> > Not sure I understand what you mean by explicitly checking the precision,
> > the macro checks the precision already, and intentionally only for
> > non-BOOLEAN_TYPE.  If you mean checking precision explicitly in the spots
> > where the macro is used, that would be worse for the hypothetical case when
> > u_t_c would change in this regard, you'd have far more places to change.
> > As for macro name, I came up e.g. with BOOLEAN_COMPATIBLE_P or
> > BOOLEAN_COMPATIBLE_TYPE_P, perhaps those could make it clearer on what it
> > is.
> 
> +/* Nonzero if TYPE represents a (scalar) boolean type or type
> +   in the middle-end compatible with it.  */
> +
> +#define INTEGRAL_BOOLEAN_TYPE_P(TYPE) \
> +  (TREE_CODE (TYPE) == BOOLEAN_TYPE\
> +   || ((TREE_CODE (TYPE) == INTEGER_TYPE   \
> +   || TREE_CODE (TYPE) == ENUMERAL_TYPE)   \
> +   && TYPE_PRECISION (TYPE) == 1   \
> +   && TYPE_UNSIGNED (TYPE)))
> 
> (just to quote what you proposed).

So would it help to use
  (TREE_CODE (TYPE) == BOOLEAN_TYPE
   || (INTEGRAL_TYPE_P (TYPE)
   && useless_type_conversion_p (boolean_type_node, TYPE)))
It would be much slower than the above, but would be less dependent
on useless_type_conversion_p details.

Jakub


Re: [PATCH] Fix bool vs. unsigned:1 vectorization (PR tree-optimization/79284)

2017-02-01 Thread Richard Biener
On Wed, 1 Feb 2017, Jakub Jelinek wrote:

> On Wed, Feb 01, 2017 at 09:21:57AM +0100, Richard Biener wrote:
> > > > Not sure I understand what you mean by explicitly checking the precision,
> > > > the macro checks the precision already, and intentionally only for
> > > > non-BOOLEAN_TYPE.  If you mean checking precision explicitly in the spots
> > > > where the macro is used, that would be worse for the hypothetical case when
> > > > u_t_c would change in this regard, you'd have far more places to change.
> > > As for macro name, I came up e.g. with BOOLEAN_COMPATIBLE_P or
> > > BOOLEAN_COMPATIBLE_TYPE_P, perhaps those could make it clearer on what it
> > > is.
> > 
> > +/* Nonzero if TYPE represents a (scalar) boolean type or type
> > +   in the middle-end compatible with it.  */
> > +
> > +#define INTEGRAL_BOOLEAN_TYPE_P(TYPE) \
> > +  (TREE_CODE (TYPE) == BOOLEAN_TYPE\
> > +   || ((TREE_CODE (TYPE) == INTEGER_TYPE   \
> > +   || TREE_CODE (TYPE) == ENUMERAL_TYPE)   \
> > +   && TYPE_PRECISION (TYPE) == 1   \
> > +   && TYPE_UNSIGNED (TYPE)))
> > 
> > (just to quote what you proposed).
> 
> So would it help to use
>   (TREE_CODE (TYPE) == BOOLEAN_TYPE
>|| (INTEGRAL_TYPE_P (TYPE)
>&& useless_type_conversion_p (boolean_type_node, TYPE)))
> It would be much slower than the above, but would be less dependent
> on useless_type_conversion_p details.

For the vectorizer it likely would break the larger logical type
handling?

The question is really what the vectorizer and other places are looking
for -- which usually is a 1-bit precision, eventually unsigned,
integral type.

I can't see where we'd use the variant with useless_type_conversion_p.

Richard.


Re: [PATCH/VECT/AARCH64] Improve cost model for ThunderX2 CN99xx

2017-02-01 Thread Richard Earnshaw (lists)
On 31/01/17 22:34, Andrew Pinski wrote:
> On Sat, Jan 28, 2017 at 12:34 PM, Andrew Pinski  wrote:
>> Hi,
>>   On some (most) AARCH64 cores, it is not always profitable to
>> vectorize some integer loops.  This patch does two things (I can split
>> it into different patches if needed).
>> 1) It splits the aarch64 back-end's vector cost model's vector and
>> scalar costs into int and fp fields
>> 1a) For thunderx2t99, models correctly the integer vector/scalar costs.
>> 2) Fixes/Improves a few calls to record_stmt_cost in tree-vect-loop.c
>> where stmt_info was not being passed.
>>
>> OK?  Bootstrapped and tested on aarch64-linux-gnu and provides 20% on
>> libquantum and ~1% overall on SPEC CPU 2006 int.
> 
> Here is the updated patch with the fixes requested by both Richards.
> Still the same performance as above.
> 
> OK?
> 
> Thanks,
> Andrew
> 
> ChangeLog:
> * tree-vect-loop.c (vect_compute_single_scalar_iteration_cost): Pass
> stmt_info to record_stmt_cost.
> (vect_get_known_peeling_cost): Pass stmt_info if known to record_stmt_cost.
> 
> * config/aarch64/aarch64-protos.h (cpu_vector_cost): Split
> cpu_vector_cost field into
> scalar_int_stmt_cost and scalar_fp_stmt_cost.  Split vec_stmt_cost
> field into vec_int_stmt_cost and vec_fp_stmt_cost.
> * config/aarch64/aarch64.c (generic_vector_cost): Update for the
> splitting of scalar_stmt_cost and vec_stmt_cost.
> (thunderx_vector_cost): Likewise.
> (cortexa57_vector_cost): Likewise.
> (exynosm1_vector_cost): Likewise.
> (xgene1_vector_cost): Likewise.
> (thunderx2t99_vector_cost): Improve after the splitting of the two fields.
> (aarch64_builtin_vectorization_cost): Update for the splitting of
> scalar_stmt_cost and vec_stmt_cost.
> 
>>
>> Thanks,
>> Andrew Pinski
>>
>> ChangeLog:
>> * tree-vect-loop.c (vect_compute_single_scalar_iteration_cost): Pass
>> stmt_info to record_stmt_cost.
>> (vect_get_known_peeling_cost): Pass stmt_info if known to record_stmt_cost.
>>
>> * config/aarch64/aarch64-protos.h (cpu_vector_cost): Split
>> cpu_vector_cost field into
>> scalar_int_stmt_cost and scalar_fp_stmt_cost.  Split vec_stmt_cost
>> field into vec_int_stmt_cost and vec_fp_stmt_cost.
>> * config/aarch64/aarch64.c (generic_vector_cost): Update for the
>> splitting of scalar_stmt_cost and vec_stmt_cost.
>> (thunderx_vector_cost): Likewise.
>> (cortexa57_vector_cost): Likewise.
>> (exynosm1_vector_cost): Likewise.
>> (xgene1_vector_cost): Likewise.
>> (thunderx2t99_vector_cost): Improve after the splitting of the two fields.
>> (aarch64_builtin_vectorization_cost): Update for the splitting of
>> scalar_stmt_cost and vec_stmt_cost.
>>

OK.

R.
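
For readers following along, here is a rough sketch of what splitting the
statement costs into int and fp fields makes possible: the cost lookup can
pick a different number depending on the kind of statement, which is also
why the statement information has to reach the cost hook.  The four field
names are taken from the ChangeLog above; the toy_* names, the selector
function and the cost values are assumptions made up for illustration and
are not the aarch64 back-end's code or tuning.

```c
#include <stdbool.h>

/* Toy cost table; only the four split fields are modelled.  */
struct toy_vector_cost
{
  int scalar_int_stmt_cost;
  int scalar_fp_stmt_cost;
  int vec_int_stmt_cost;
  int vec_fp_stmt_cost;
};

/* Pick the statement cost from whether the statement is a vector
   operation and whether it operates on floating-point data.  */
int
toy_stmt_cost (const struct toy_vector_cost *c, bool is_vector, bool is_fp)
{
  if (is_vector)
    return is_fp ? c->vec_fp_stmt_cost : c->vec_int_stmt_cost;
  return is_fp ? c->scalar_fp_stmt_cost : c->scalar_int_stmt_cost;
}

/* Made-up numbers in which integer vector statements are costlier than
   fp vector statements; not real tuning values for any core.  */
const struct toy_vector_cost toy_costs = { 1, 1, 5, 2 };
```

With a single vec_stmt_cost field the two vector cases above could not be
told apart.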

>> updatedvectcost.diff.txt
>>
>>
>> Index: config/aarch64/aarch64-protos.h
>> ===
>> --- config/aarch64/aarch64-protos.h  (revision 245070)
>> +++ config/aarch64/aarch64-protos.h  (working copy)
>> @@ -151,11 +151,17 @@ struct cpu_regmove_cost
>>  /* Cost for vector insn classes.  */
>>  struct cpu_vector_cost
>>  {
>> -  const int scalar_stmt_cost;/* Cost of any scalar operation,
>> +  const int scalar_int_stmt_cost;/* Cost of any int scalar operation,
>> +excluding load and store.  */
>> +  const int scalar_fp_stmt_cost; /* Cost of any fp scalar operation,
>>  excluding load and store.  */
>>const int scalar_load_cost;/* Cost of scalar load.  */
>>const int scalar_store_cost;   /* Cost of scalar store.  */
>> -  const int vec_stmt_cost;   /* Cost of any vector operation,
>> +  const int vec_int_stmt_cost;   /* Cost of any int vector operation,
>> +excluding load, store, permute,
>> +vector-to-scalar and
>> +scalar-to-vector operation.  */
>> +  const int vec_fp_stmt_cost;/* Cost of any fp vector operation,
>>  excluding load, store, permute,
>>  vector-to-scalar and
>>  scalar-to-vector operation.  */
>> Index: config/aarch64/aarch64.c
>> ===
>> --- config/aarch64/aarch64.c (revision 245070)
>> +++ config/aarch64/aarch64.c (working copy)
>> @@ -365,10 +365,12 @@ static const struct cpu_regmove_cost thu
>>  /* Generic costs for vector insn classes.  */
>>  static const struct cpu_vector_cost generic_vector_cost =
>>  {
>> -  1, /* scalar_stmt_cost  */
>> +  1, /* scalar_int_stmt_cost  */
>> +  1, /* scalar_fp_stmt_cost  */
>>1, /* scalar_load_cost  */
>>1, /* scalar_store_cost  */
>> -  1, /* vec_stmt_cost  */
>> +  1, /* vec_int_stmt_cost  */
>> +  1, /* vec_fp_stmt_cost  */
>>2, /* vec_permute_cost  */
>>1, /* vec_to_scalar_cost  */
>>1, /* scalar_to_vec_c

Re: [PATCH] Fix bool vs. unsigned:1 vectorization (PR tree-optimization/79284)

2017-02-01 Thread Jakub Jelinek
On Wed, Feb 01, 2017 at 10:58:29AM +0100, Richard Biener wrote:
> > > +/* Nonzero if TYPE represents a (scalar) boolean type or type
> > > +   in the middle-end compatible with it.  */
> > > +
> > > +#define INTEGRAL_BOOLEAN_TYPE_P(TYPE) \
> > > +  (TREE_CODE (TYPE) == BOOLEAN_TYPE\
> > > +   || ((TREE_CODE (TYPE) == INTEGER_TYPE   \
> > > +   || TREE_CODE (TYPE) == ENUMERAL_TYPE)   \
> > > +   && TYPE_PRECISION (TYPE) == 1   \
> > > +   && TYPE_UNSIGNED (TYPE)))
> > > 
> > > (just to quote what you proposed).
> > 
> > So would it help to use
> >   (TREE_CODE (TYPE) == BOOLEAN_TYPE
> >|| (INTEGRAL_TYPE_P (TYPE)
> >&& useless_type_conversion_p (boolean_type_node, TYPE)))
> > It would be much slower than the above, but would be less dependent
> > on useless_type_conversion_p details.
> 
> For the vectorizer it likely would break the larger logical type
> handling?

Why?  It is the same thing as the earlier macro above.
Any kind of boolean, plus anything that could be initially boolean
and the middle-end might have replaced it with.

> The question is really what the vectorizer and other places are looking
> for -- which usually is a 1-bit precision, eventually unsigned,
> integral type.

It is looking for any type where the only valid values are 0 (false) and 1
(true), so that it can actually vectorize it as a bitmask, or vector of
integers with -1 and 0 values.
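
A scalar sketch of the "vector of integers with -1 and 0 values" form may
help here.  It shows why the input must only ever hold 0 or 1: negating it
then yields either no bits or all bits set, and that mask can blend two
values.  The function names are made up for this example, and the code is
plain C rather than anything the vectorizer emits.

```c
#include <stddef.h>
#include <stdint.h>

/* Turn a boolean stored as 0/1 into a 0 / all-ones mask: 0 -> 0, 1 -> -1.  */
int32_t
toy_bool_to_mask (int32_t b)
{
  return -b;
}

/* Blend two values under a mask that is either 0 or all ones.  */
int32_t
toy_select (int32_t mask, int32_t if_true, int32_t if_false)
{
  return (if_true & mask) | (if_false & ~mask);
}

/* Element-wise out[i] = cond[i] ? a[i] : b[i], with cond[i] in {0, 1}.  */
void
toy_masked_select (const int32_t *cond, const int32_t *a, const int32_t *b,
                   int32_t *out, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = toy_select (toy_bool_to_mask (cond[i]), a[i], b[i]);
}
```

If a stray value such as 2 reaches toy_bool_to_mask, the mask is neither 0
nor all ones and the blend mixes bits from both inputs.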

Jakub


Re: [PATCH] Fix bool vs. unsigned:1 vectorization (PR tree-optimization/79284)

2017-02-01 Thread Richard Biener
On Wed, 1 Feb 2017, Jakub Jelinek wrote:

> On Wed, Feb 01, 2017 at 10:58:29AM +0100, Richard Biener wrote:
> > > > +/* Nonzero if TYPE represents a (scalar) boolean type or type
> > > > +   in the middle-end compatible with it.  */
> > > > +
> > > > +#define INTEGRAL_BOOLEAN_TYPE_P(TYPE) \
> > > > +  (TREE_CODE (TYPE) == BOOLEAN_TYPE\
> > > > +   || ((TREE_CODE (TYPE) == INTEGER_TYPE   \
> > > > +   || TREE_CODE (TYPE) == ENUMERAL_TYPE)   \
> > > > +   && TYPE_PRECISION (TYPE) == 1   \
> > > > +   && TYPE_UNSIGNED (TYPE)))
> > > > 
> > > > (just to quote what you proposed).
> > > 
> > > So would it help to use
> > >   (TREE_CODE (TYPE) == BOOLEAN_TYPE
> > >|| (INTEGRAL_TYPE_P (TYPE)
> > >&& useless_type_conversion_p (boolean_type_node, TYPE)))
> > > It would be much slower than the above, but would be less dependent
> > > on useless_type_conversion_p details.
> > 
> > For the vectorizer it likely would break the larger logical type
> > handling?
> 
> Why?  It is the same thing as the earlier macro above.
> Any kind of boolean, plus anything that could be initially boolean
> and the middle-end might have replaced it with.

boolean_type_node is QImode but logical(8) is DImode for example.
Both have precision == 1 but they are not types_compatible_p
(you probably missed the mode check in useless_type_conversion_p).

> > The question is really what the vectorizer and other places are looking
> > for -- which usually is a 1-bit precision, eventually unsigned,
> > integral type.
> 
> It is looking for any type where the only valid values are 0 (false) and 1
> (true), so that it can actually vectorize it as a bitmask, or vector of
> integers with -1 and 0 values.

That's INTEGRAL_TYPE_P && TYPE_PRECISION == 1 && TYPE_UNSIGNED.  The
Ada types do not fall under this category as far as I understand as
the exceptional values may exist in memory(?)

Richard.


RE: [PATCH] MIPS: Fix mode mismatch error between Loongson builtin arguments and insn operands.

2017-02-01 Thread Toma Tabacu
> From: Matthew Fortune
> > +/* The third argument needs to be in SImode in order to succesfully match
> > +   the operand from the insn definition.  */
> 
> Please refer to operand here not argument as it is the second argument
> to the builtin but third operand of the instruction.  Also double ss in
> successfully.
> 

I have rewritten the comment to address these mistakes.

> > +case CODE_FOR_loongson_pshufh:
> > +case CODE_FOR_loongson_psllh:
> > +case CODE_FOR_loongson_psllw:
> > +case CODE_FOR_loongson_psrah:
> > +case CODE_FOR_loongson_psraw:
> > +case CODE_FOR_loongson_psrlh:
> > +case CODE_FOR_loongson_psrlw:
> > +  gcc_assert (has_target_p && nops == 3 && ops[2].mode == QImode);
> > +  ops[2].value = lowpart_subreg (SImode, ops[2].value, QImode);
> > +  ops[2].mode = SImode;
> > +  break;
> > +
> >  case CODE_FOR_msa_addvi_b:
> >  case CODE_FOR_msa_addvi_h:
> >  case CODE_FOR_msa_addvi_w:
> 
> For the record, given paradoxical subregs are a headache...
> I am OK with this on the basis that the argument to psllh etc is actually
> a uint8_t which means that bits 8 upwards are guaranteed to be zero so
> the subreg can be eliminated without any explicit sign or zero extension
> inserted.  This is the same kind of optimisation that combine would
> perform when eliminating zero extension.
> 
> Please can you check that a zero extension is inserted for the following
> case with -O2 or above:
> 
> int16x4_t testme(int16x4_t s, int amount)
> {
>   return psllh_s (s, amount);
> }
> 
> If my understanding is correct there should be an ANDI 0xff inserted
> or similar.
> 

The ANDI 0xff is present for -O0, after the first time the value is loaded
from memory, but it is not generated for -O1 and -O2.
I'm not seeing any zero extension happening for -O1 and -O2.
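
For illustration only, the zero extension Matthew asks about can be modelled
in plain C, assuming nothing more than that the shift-amount parameter of the
builtin is a uint8_t as stated above.  The helper names are hypothetical and
are not part of the Loongson headers or of the patch.

```c
#include <stdint.h>

/* Hypothetical stand-in for passing an int to a uint8_t parameter:
   the conversion at the call site keeps only the low 8 bits.  */
static uint8_t
narrow_shift_amount (int amount)
{
  return (uint8_t) amount;
}

/* What a correct expansion has to compute when the QImode value is
   viewed in SImode: bits 8 and up are zero, the effect of ANDI 0xff.  */
static uint32_t
widen_shift_amount (int amount)
{
  return (uint32_t) narrow_shift_amount (amount);
}
```

With this model, widen_shift_amount (0x105) yields 0x05; if the masking is
missing in the generated code, the high bits of "amount" would reach the
instruction unchanged.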

The only change in the patch below is the fixed comment.

Regards,
Toma

gcc/

* config/mips/mips.c (mips_expand_builtin_insn): Put the QImode
argument of the pshufh, psllh, psllw, psrah, psraw, psrlh, psrlw
builtins into an SImode paradoxical SUBREG.

diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index da7fa8f..e5b2d9a 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -16574,6 +16574,20 @@ mips_expand_builtin_insn (enum insn_code icode, unsigned int nops,
 
   switch (icode)
 {
+/* The third operand of these instructions is in SImode, so we need to
+   bring the corresponding builtin argument from QImode into SImode.  */
+case CODE_FOR_loongson_pshufh:
+case CODE_FOR_loongson_psllh:
+case CODE_FOR_loongson_psllw:
+case CODE_FOR_loongson_psrah:
+case CODE_FOR_loongson_psraw:
+case CODE_FOR_loongson_psrlh:
+case CODE_FOR_loongson_psrlw:
+  gcc_assert (has_target_p && nops == 3 && ops[2].mode == QImode);
+  ops[2].value = lowpart_subreg (SImode, ops[2].value, QImode);
+  ops[2].mode = SImode;
+  break;
+
 case CODE_FOR_msa_addvi_b:
 case CODE_FOR_msa_addvi_h:
 case CODE_FOR_msa_addvi_w:



Re: [PATCH] Fix bool vs. unsigned:1 vectorization (PR tree-optimization/79284)

2017-02-01 Thread Jakub Jelinek
On Wed, Feb 01, 2017 at 11:07:18AM +0100, Richard Biener wrote:
> > On Wed, Feb 01, 2017 at 10:58:29AM +0100, Richard Biener wrote:
> > > > > +/* Nonzero if TYPE represents a (scalar) boolean type or type
> > > > > +   in the middle-end compatible with it.  */
> > > > > +
> > > > > +#define INTEGRAL_BOOLEAN_TYPE_P(TYPE) \
> > > > > +  (TREE_CODE (TYPE) == BOOLEAN_TYPE\
> > > > > +   || ((TREE_CODE (TYPE) == INTEGER_TYPE   \
> > > > > +   || TREE_CODE (TYPE) == ENUMERAL_TYPE)   \
> > > > > +   && TYPE_PRECISION (TYPE) == 1   \
> > > > > +   && TYPE_UNSIGNED (TYPE)))
> > > > > 
> > > > > (just to quote what you proposed).
> > > > 
> > > > So would it help to use
> > > >   (TREE_CODE (TYPE) == BOOLEAN_TYPE
> > > >|| (INTEGRAL_TYPE_P (TYPE)
> > > >&& useless_type_conversion_p (boolean_type_node, TYPE)))
> > > > It would be much slower than the above, but would be less dependent
> > > > on useless_type_conversion_p details.
> > > 
> > > For the vectorizer it likely would break the larger logical type
> > > handling?
> > 
> > Why?  It is the same thing as the earlier macro above.
> > Any kind of boolean, plus anything that could be initially boolean
> > and the middle-end might have replaced it with.
> 
> boolean_type_node is QImode but logical(8) is DImode for example.
> Both have precision == 1 but they are not types_compatible_p
> (you probably missed the mode check in useless_type_conversion_p).

Then the earlier macro should have used also a TYPE_MODE check.
The latter macro is what I've been looking for in the patch, except that
I thought it is too expensive (plus might actually not DTRT if
boolean_type_node has > 1 precision, but some unsigned precision 1
BOOLEAN_TYPE exists too; as Fortran has changed, now it is only Ada that
doesn't have precision 1 boolean_type_node, but then it likely doesn't
have any precision 1 BOOLEAN_TYPEs).

> > > The question is really what the vectorizer and other places are looking
> > > for -- which usually is a 1-bit precision, eventually unsigned,
> > > integral type.
> > 
> > It is looking for any type where the only valid values are 0 (false) and 1
> > (true), so that it can actually vectorize it as a bitmask, or vector of
> > integers with -1 and 0 values.
> 
> That's INTEGRAL_TYPE_P && TYPE_PRECISION == 1 && TYPE_UNSIGNED.  The
> Ada types do not fall under this category as far as I understand as
> the exceptional values may exist in memory(?)

But the exceptional values are undefined behavior I believe.
Anyway, we've been vectorizing not just the 1-bit precision BOOLEAN_TYPEs
but other BOOLEAN_TYPEs (Ada and especially Fortran) for a couple of years
this way already and I'm not aware of issues with that.

So suddenly stopping doing that because we want to fix a bug related
to the fact that for 1-bit precision unsigned QImode BOOLEAN_TYPE the
middle-end considers other types to be compatible with those is strange
and very risky, especially at this point in GCC 7 development.

Jakub
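
As a rough sketch of the two predicate shapes discussed in this message, here
is a toy model in plain C.  The enums, the struct and both function names are
made up for illustration and are not GCC's tree API; the second function
assumes that "a TYPE_MODE check" means comparing against the mode of the
reference boolean type.

```c
#include <stdbool.h>

/* Toy model of the type properties that matter in this thread.  */
enum toy_code { TOY_BOOLEAN, TOY_INTEGER, TOY_ENUMERAL };
enum toy_mode { TOY_QI, TOY_DI };

struct toy_type
{
  enum toy_code code;
  enum toy_mode mode;
  unsigned precision;
  bool is_unsigned;
};

/* Same shape as the proposed INTEGRAL_BOOLEAN_TYPE_P: any boolean type,
   or an unsigned integer/enumeral type of precision 1.  */
static bool
toy_integral_boolean_p (const struct toy_type *t)
{
  return t->code == TOY_BOOLEAN
         || ((t->code == TOY_INTEGER || t->code == TOY_ENUMERAL)
             && t->precision == 1
             && t->is_unsigned);
}

/* Variant with the extra mode check: a non-boolean type only qualifies
   when it also shares the mode of the reference boolean type.  */
static bool
toy_boolean_compatible_p (const struct toy_type *t,
                          const struct toy_type *boolean)
{
  return t->code == TOY_BOOLEAN
         || (toy_integral_boolean_p (t) && t->mode == boolean->mode);
}
```

In this model an unsigned:1 integer in DImode passes the first predicate but
not the second, while a DImode boolean such as logical(8) passes both.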


Re: [PATCH] Fix PGO bootstrap on x390x (PR bootstrap/78985).

2017-02-01 Thread Martin Liška
On 01/31/2017 05:47 PM, Jeff Law wrote:
> On 01/30/2017 06:32 AM, Martin Liška wrote:
>> On 01/30/2017 12:27 PM, Martin Liška wrote:
>>> Hi.
>>>
>>> Following patch simply fixes issues reported by -Wmaybe-uninitialized. That
>>> enables PGO bootstrap on a s390x machine.
>>>
>>> Ready to be installed?
>>> Martin
>>>
>> There's second version that adds one more hunk for s390 target.
>>
>> Martin
>>
>>
>> 0001-Fix-PGO-bootstrap-on-x390x-PR-bootstrap-78985-v2.patch
>>
>>
>> From 598d0a59b91070211b09056195bc0f971bc57ae1 Mon Sep 17 00:00:00 2001
>> From: marxin 
>> Date: Mon, 30 Jan 2017 11:09:29 +0100
>> Subject: [PATCH] Fix PGO bootstrap on x390x (PR bootstrap/78985).
>>
>> gcc/ChangeLog:
>>
>> 2017-01-30  Martin Liska  
>>
>> PR bootstrap/78985
>> * config/s390/s390.c (s390_gimplify_va_arg): Initialize local
>> variable to NULL.
>> (print_operand_address): Initialize a struct to zero.
> Presumably the issue with print_operand_address is that there are paths where 
> s390_decompose_address can return without initializing AD/OUT. But AFAICT 
> those are invalid addresses that presumably shouldn't be showing up in 
> print_operand_address.
> 
> Can you add an assert in print_operand_address to ensure decomposition never 
> returns false?

Like done in v2 of the patch?

If so, I'll commit that.

Martin

> 
> OK with that addition.
> 
> jeff
> 

From 2896518e33878106ee5a6d4766ec80b0b94ad378 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Mon, 30 Jan 2017 11:09:29 +0100
Subject: [PATCH] Fix PGO bootstrap on x390x (PR bootstrap/78985).

gcc/ChangeLog:

2017-01-30  Martin Liska  

	PR bootstrap/78985
	* config/s390/s390.c (s390_gimplify_va_arg): Initialize local
	variable to NULL.
	(print_operand_address): Initialize a struct to zero. Add assert
	that s390_decompose_address never returns false;
---
 gcc/config/s390/s390.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/gcc/config/s390/s390.c b/gcc/config/s390/s390.c
index 93377cdf7c8..30a06cf9d94 100644
--- a/gcc/config/s390/s390.c
+++ b/gcc/config/s390/s390.c
@@ -7347,6 +7347,7 @@ void
 print_operand_address (FILE *file, rtx addr)
 {
   struct s390_address ad;
+  memset (&ad, 0, sizeof (s390_address));
 
   if (s390_loadrelative_operand_p (addr, NULL, NULL))
 {
@@ -7360,8 +7361,10 @@ print_operand_address (FILE *file, rtx addr)
   return;
 }
 
-  if (!s390_decompose_address (addr, &ad)
-  || (ad.base && !REGNO_OK_FOR_BASE_P (REGNO (ad.base)))
+  bool r = s390_decompose_address (addr, &ad);
+  gcc_assert (r);
+
+  if ((ad.base && !REGNO_OK_FOR_BASE_P (REGNO (ad.base)))
   || (ad.indx && !REGNO_OK_FOR_INDEX_P (REGNO (ad.indx))))
 output_operand_lossage ("cannot decompose address");
 
@@ -12195,7 +12198,7 @@ s390_gimplify_va_arg (tree valist, tree type, gimple_seq *pre_p,
   tree f_gpr, f_fpr, f_ovf, f_sav;
   tree gpr, fpr, ovf, sav, reg, t, u;
   int indirect_p, size, n_reg, sav_ofs, sav_scale, max_reg;
-  tree lab_false, lab_over;
+  tree lab_false, lab_over = NULL_TREE;
   tree addr = create_tmp_var (ptr_type_node, "addr");
   bool left_align_p; /* How a value < UNITS_PER_LONG is aligned within
 			a stack slot.  */
-- 
2.11.0



[PATCH] Fix PR79315

2017-02-01 Thread Richard Biener

The following fixes ICEs when building SPEC 2k6 with autopar.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2017-02-01  Richard Biener  

PR middle-end/79315
* tree-cfg.c (move_stmt_op): Never set TREE_BLOCK when it
was not set before.

* gfortran.dg/pr79315.f90: New testcase.

Index: gcc/tree-cfg.c
===
--- gcc/tree-cfg.c  (revision 245064)
+++ gcc/tree-cfg.c  (working copy)
@@ -6636,11 +6636,12 @@ move_stmt_op (tree *tp, int *walk_subtre
   if (EXPR_P (t))
 {
   tree block = TREE_BLOCK (t);
-  if (block == p->orig_block
- || (p->orig_block == NULL_TREE
- && block != NULL_TREE))
+  if (block == NULL_TREE)
+   ;
+  else if (block == p->orig_block
+  || p->orig_block == NULL_TREE)
TREE_SET_BLOCK (t, p->new_block);
-  else if (flag_checking && block != NULL_TREE)
+  else if (flag_checking)
{
  while (block && TREE_CODE (block) == BLOCK && block != p->orig_block)
block = BLOCK_SUPERCONTEXT (block);
Index: gcc/testsuite/gfortran.dg/pr79315.f90
===
--- gcc/testsuite/gfortran.dg/pr79315.f90   (nonexistent)
+++ gcc/testsuite/gfortran.dg/pr79315.f90   (working copy)
@@ -0,0 +1,52 @@
+! { dg-do compile }
+! { dg-require-effective-target pthread }
+! { dg-options "-Ofast -ftree-parallelize-loops=4" }
+
+SUBROUTINE wsm32D(t, &
+   w, &
+   den, &
+   p, &
+   delz, &
+ its,&
+   ite, &
+   kts, &
+   kte  &
+  )
+  REAL, DIMENSION( its:ite , kts:kte ),   &
+INTENT(INOUT) ::  &
+   t
+  REAL, DIMENSION( ims:ime , kms:kme ),   &
+INTENT(IN   ) ::   w, &
+ den, &
+   p, &
+delz
+  REAL, DIMENSION( its:ite , kts:kte ) :: &
+qs, &
+xl, &
+work1, &
+work2, &
+qs0, &
+n0sfac
+  diffus(x,y) = 8.794e-5*x**1.81/y
+  diffac(a,b,c,d,e) = d*a*a/(xka(c,d)*rv*c*c)+1./(e*diffus(c,b))
+  venfac(a,b,c) = (viscos(b,c)/diffus(b,a))**(.333)   &
+ /viscos(b,c)**(.5)*(den0/c)**0.25
+  do loop = 1,loops
+  xa=-dldt/rv
+  do k = kts, kte
+do i = its, ite
+  tr=ttp/t(i,k)
+  if(t(i,k).lt.ttp) then
+qs(i,k) =psat*(tr**xa)*exp(xb*(1.-tr))
+  endif
+  qs0(i,k)  =psat*(tr**xa)*exp(xb*(1.-tr))
+enddo
+do i = its, ite
+  if(t(i,k).ge.t0c) then
+work1(i,k) = diffac(xl(i,k),p(i,k),t(i,k),den(i,k),qs(i,k))
+  endif
+  work2(i,k) = venfac(p(i,k),t(i,k),den(i,k))
+enddo
+  enddo
+  enddo  ! big loops
+END SUBROUTINE wsm32D
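
The control flow introduced by the tree-cfg.c hunk above can be modelled
outside the compiler roughly as follows.  This is a simplified sketch:
struct toy_block and toy_remap_block are hypothetical names, and the
flag_checking walk up the BLOCK_SUPERCONTEXT chain is reduced to "leave the
block alone".

```c
#include <stddef.h>

/* Hypothetical stand-in for a lexical BLOCK.  */
struct toy_block
{
  struct toy_block *supercontext;
};

/* Returns the block an expression should carry after the move.  */
static struct toy_block *
toy_remap_block (struct toy_block *block,
                 struct toy_block *orig_block,
                 struct toy_block *new_block)
{
  /* Never set a block on an expression that had none before.  */
  if (block == NULL)
    return NULL;

  /* Remap expressions from the moved block, or any block when no
     original block was given.  */
  if (block == orig_block || orig_block == NULL)
    return new_block;

  /* The real code only runs a consistency check in this case.  */
  return block;
}
```

Under the old condition a NULL block compared equal to a NULL orig_block, so
such an expression got new_block assigned; the first test above rules that
out.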


Re: [PATCH] Fix PGO bootstrap on x390x (PR bootstrap/78985).

2017-02-01 Thread Jakub Jelinek
On Wed, Feb 01, 2017 at 11:34:48AM +0100, Martin Liška wrote:
> > Presumably the issue with print_operand_address is that there are paths 
> > where s390_decompose_address can return without initializing AD/OUT. But 
> > AFAICT those are invalid addresses that presumably shouldn't be showing up 
> > in print_operand_address.
> > 
> > Can you add an assert in print_operand_address to ensure decomposition 
> > never returns false?

Can't it happen e.g. with inline asm and "X" constraint?
output_operand_lossage then would emit an error rather than ICE for
something that is a user code bug, not internal compiler error.
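
To make the difference concrete, here is a small stand-alone sketch of the
"diagnose instead of assert" shape.  Everything in it is hypothetical
(toy_decompose, toy_print_address, the address encoding); it only mirrors the
structure of print_operand_address, not the s390 back end.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical decomposed address.  */
struct toy_address
{
  int base;
  int indx;
};

/* Hypothetical decomposition: a negative input stands in for an operand
   that cannot be decomposed; *OUT is left untouched in that case.  */
static bool
toy_decompose (int addr, struct toy_address *out)
{
  if (addr < 0)
    return false;
  out->base = addr / 16;
  out->indx = addr % 16;
  return true;
}

/* Returns 0 on success and 1 after reporting a diagnostic.  AD is
   zero-initialized so no path reads an indeterminate value, and a
   failed decomposition is reported rather than asserted.  */
static int
toy_print_address (FILE *file, int addr)
{
  struct toy_address ad = { 0, 0 };

  if (!toy_decompose (addr, &ad))
    {
      fprintf (file, "error: cannot decompose address\n");
      return 1;
    }

  fprintf (file, "%d(%d)\n", ad.indx, ad.base);
  return 0;
}
```

With an assert in place of the error branch, the second kind of input would
abort the program, which corresponds to an ICE rather than a user-facing
error.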

> 
> Like done in v2 of the patch?
> 
> If so, I'll commit that.

Jakub


[PATCH] Provide opt-info for autopar and graphite

2017-02-01 Thread Richard Biener

The following provides opt-info for autopar and graphite.  That should
help to track down miscompilations by them given they usually do not
apply in many places (like for hunting down PR79321).

I've used the loop location of the loop containing the SCOP
which is probably reasonable for an option like -floop-nest-optimize.
There may be corner cases where this is confusing though (like when
that loop is the root of the loop tree and thus the SCOP covers
more than one outermost loop or when the SCOP covers adjacent loops
and the entry enters the first loop).  But ISTR we restrict SCOP
building / merging so that this doesn't happen (too often).

Bootstrap / regtest in progress on x86_64-unknown-linux-gnu (just to
check if any dump scanning is affected).

Richard.

2017-02-01  Richard Biener  

* graphite.c: Include tree-vectorizer.h for find_loop_location.
(graphite_transform_loops): Provide opt-info for optimized nests.
* tree-parloop.c (parallelize_loops): Provide opt-info for
parallelized loops.

Index: gcc/graphite.c
===
--- gcc/graphite.c  (revision 245022)
+++ gcc/graphite.c  (working copy)
@@ -52,6 +52,7 @@ along with GCC; see the file COPYING3.
 #include "dbgcnt.h"
 #include "tree-parloops.h"
 #include "tree-cfgcleanup.h"
+#include "tree-vectorizer.h"
 #include "graphite.h"
 
 /* Print global statistics to FILE.  */
@@ -328,6 +329,11 @@ graphite_transform_loops (void)
   and could be in an inconsistent state.  */
if (!graphite_regenerate_ast_isl (scop))
  break;
+
+   location_t loc = find_loop_location
+  (scop->scop_info->region.entry->dest->loop_father);
+   dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, loc,
+"loop nest optimized\n");
   }
 
   free_scops (scops);
Index: gcc/tree-parloops.c
===
--- gcc/tree-parloops.c (revision 245022)
+++ gcc/tree-parloops.c (working copy)
@@ -3322,17 +3322,14 @@ parallelize_loops (bool oacc_kernels_p)
 
   changed = true;
   skip_loop = loop->inner;
-  if (dump_file && (dump_flags & TDF_DETAILS))
-  {
-   if (loop->inner)
- fprintf (dump_file, "parallelizing outer loop %d\n",loop->header->index);
-   else
- fprintf (dump_file, "parallelizing inner loop %d\n",loop->header->index);
-   loop_loc = find_loop_location (loop);
-   if (loop_loc != UNKNOWN_LOCATION)
- fprintf (dump_file, "\nloop at %s:%d: ",
-  LOCATION_FILE (loop_loc), LOCATION_LINE (loop_loc));
-  }
+
+  loop_loc = find_loop_location (loop);
+  if (loop->inner)
+   dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, loop_loc,
+"parallelizing outer loop %d\n", loop->num);
+  else
+   dump_printf_loc (MSG_OPTIMIZED_LOCATIONS, loop_loc,
+"parallelizing inner loop %d\n", loop->num);
 
   gen_parallel_loop (loop, &reduction_list,
 n_threads, &niter_desc, oacc_kernels_p);
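
For trying this out, a loop of the following kind is a plausible candidate.
The file name, the function and the command line in the comment are
assumptions and not taken from the patch; in particular it is assumed that
the new messages are reachable through the "loop" opt-info group.

```c
#include <stddef.h>

/* A loop with independent iterations, the sort of candidate autopar
   looks for.  Assumed invocation (not from the patch):

     gcc -O2 -ftree-parallelize-loops=4 -fopt-info-loop-optimized -c scale.c

   Whether a "parallelizing ... loop" message is printed depends on the
   compiler version and on its profitability heuristics.  */
void
scale (double *restrict dst, const double *restrict src, size_t n, double k)
{
  for (size_t i = 0; i < n; i++)
    dst[i] = k * src[i];
}
```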


gcc.css colors

2017-02-01 Thread Markus Trippelsdorf
Some colors on e.g. https://gcc.gnu.org/gcc-7/changes.html are nearly
unreadable. So what about the following patch?

--- gcc_orig.css  2017-02-01 11:39:17.634017498 +0100
+++ gcc.css 2017-02-01 11:40:23.979244263 +0100
@@ -58,8 +58,8 @@
 }
 div.copyright p:nth-child(3) { margin-bottom: 0; }
 
-.boldcyan{ font-weight:bold; color:cyan; }
-.boldlime{ font-weight:bold; color:lime; }
+.boldcyan{ font-weight:bold; color:#25a9a9; }
+.boldlime{ font-weight:bold; color:green;}
 .boldmagenta { font-weight:bold; color:magenta; }
 .boldred { font-weight:bold; color:red; }
 .boldblue{ font-weight:bold; color:blue; }

-- 
Markus


Re: [PATCH] Fix bool vs. unsigned:1 vectorization (PR tree-optimization/79284)

2017-02-01 Thread Richard Biener
On Wed, 1 Feb 2017, Jakub Jelinek wrote:

> On Wed, Feb 01, 2017 at 11:07:18AM +0100, Richard Biener wrote:
> > > On Wed, Feb 01, 2017 at 10:58:29AM +0100, Richard Biener wrote:
> > > > > > +/* Nonzero if TYPE represents a (scalar) boolean type or type
> > > > > > +   in the middle-end compatible with it.  */
> > > > > > +
> > > > > > +#define INTEGRAL_BOOLEAN_TYPE_P(TYPE) \
> > > > > > +  (TREE_CODE (TYPE) == BOOLEAN_TYPE\
> > > > > > +   || ((TREE_CODE (TYPE) == INTEGER_TYPE   \
> > > > > > +   || TREE_CODE (TYPE) == ENUMERAL_TYPE)   \
> > > > > > +   && TYPE_PRECISION (TYPE) == 1   \
> > > > > > +   && TYPE_UNSIGNED (TYPE)))
> > > > > > 
> > > > > > (just to quote what you proposed).
> > > > > 
> > > > > So would it help to use
> > > > >   (TREE_CODE (TYPE) == BOOLEAN_TYPE
> > > > >|| (INTEGRAL_TYPE_P (TYPE)
> > > > >&& useless_type_conversion_p (boolean_type_node, TYPE)))
> > > > > It would be much slower than the above, but would be less dependent
> > > > > on useless_type_conversion_p details.
> > > > 
> > > > For the vectorizer it likely would break the larger logical type
> > > > handling?
> > > 
> > > Why?  It is the same thing as the earlier macro above.
> > > Any kind of boolean, plus anything that could be initially boolean
> > > and the middle-end might have replaced it with.
> > 
> > boolean_type_node is QImode but logical(8) is DImode for example.
> > Both have precision == 1 but they are not types_compatible_p
> > (you probably missed the mode check in useless_type_conversion_p).
> 
> Then the earlier macro should have used also a TYPE_MODE check.
> The latter macro is what I've been looking for in the patch, except that
> I thought it is too expensive (plus might actually not DTRT if
> boolean_type_node has > 1 precision, but some unsigned precision 1
> BOOLEAN_TYPE exists too; as Fortran has changed, now it is only Ada that
> doesn't have precision 1 boolean_type_node, but then it likely doesn't
> have any precision 1 BOOLEAN_TYPEs).
> 
> > > > The question is really what the vectorizer and other places are looking
> > > > for -- which usually is a 1-bit precision, eventually unsigned,
> > > > integral type.
> > > 
> > > It is looking for any type where the only valid values are 0 (false) and 1
> > > (true), so that it can actually vectorize it as a bitmask, or vector of
> > > integers with -1 and 0 values.
> > 
> > That's INTEGRAL_TYPE_P && TYPE_PRECISION == 1 && TYPE_UNSIGNED.  The
> > Ada types do not fall under this category as far as I understand as
> > the exceptional values may exist in memory(?)
> 
> But the exceptional values are undefined behavior I believe.
> Anyway, we've been vectorizing not just the 1-bit precision BOOLEAN_TYPEs
> but other BOOLEAN_TYPEs (Ada and especially Fortran) for a couple of years
> this way already and I'm not aware of issues with that.
> 
> So suddenly stopping doing that because we want to fix a bug related
> to the fact that for 1-bit precision unsigned QImode BOOLEAN_TYPE the
> middle-end considers other types to be compatible with those is strange
> and very risky, especially at this point in GCC 7 development.

I agree.  But this means we should look for a vectorizer-local fix
without a new global predicate then (there seem to be subtly different
needs and coming up with good names for all of them sounds difficult...).

Richard.


Re: [PATCH] Fix PR79278, amend ADJUST_FIELD_ALIGN interface

2017-02-01 Thread Iain Sandoe

> On 1 Feb 2017, at 08:42, Richard Biener  wrote:
> 
> On Tue, 31 Jan 2017, Jeff Law wrote:
> 
>> On 01/31/2017 02:01 AM, Richard Biener wrote:
>>> 
>>> This amends ADJUST_FIELD_ALIGN to always get the type of the field
>>> as argument but make the field itself optional.  All actual target
>>> macro implementations only look at the type of the field but FRV
>>> (which seems to misuse ADJUST_FIELD_ALIGN to do bitfield layout
>>> rather than using one of the three standard ways - Alex/Nick?).
>> Didn't we deprecate FRV?  Oh, that was MEP..  Nevermind.
>> 
>> 
>> 
>>> This speeds up min_align_of_type (no longer needs to build a FIELD_DECL)
>>> and thus (IMHO) makes it usable from get_object_alignment.  This
>>> causes us no longer to return bogus answers for indirect accesses to
>>> doubles on i?86 and expand to RTL with proper MEM_ALIGN.  (it also
>>> makes the previous fix for PR79256 no longer necessary)
>>> 
>>> Bootstrap and regtest running on x86_64-unknown-linux-gnu - is this ok
>>> for trunk at this stage?
>>> 
>>> grep found a ADJUST_FIELD_ALIGN use in libobjc/encoding.c but that
>>> is fed a C string(!?) as FIELD_DECL so I discounted it as unrelated
>>> (and grep didn't find a way this macro could be defined there)?
>> Presumably this is the code that takes the structure and encodes information
>> about it for the runtime.  Though taking a string sounds horribly broken.
>> 
>> I suspect it gets included via tm.h.
>> 
>> I bet if someone built a cross far enough to build libobjc we could see it in
>> action.  It does make one wonder how this part of libobjc could possibly be
>> working on targets that define ADJUST_FIELD_ALIGN.
>> 
>> I'll note it's been that way since libobjc was moved into its own directory,
>> but wasn't like that prior to moving into its own directory.
>> 
>> I have no idea what to do here…
> 
> As objc builds just fine on x86_64 which does define ADJUST_FIELD_ALIGN
> including tm.h can't be the whole story here…

It also builds fine on powerpc-darwin9 (Darwin builds libobjc and tests both 
gnu-runtime and next-runtime [the latter with the system pre-installed NeXT 
runtime]).

> In fact preprocessed source on x86_64 with -dD shows no ADJUST_FIELD_ALIGN
> but instead
> 
> #define rs6000_special_adjust_field_align_p(FIELD,COMPUTED) 0
> 
> which means it must be sth powerpc specific...  FRV for example has
> 
> /* @@@ A hack, needed because libobjc wants to use ADJUST_FIELD_ALIGN for
>   some reason.  */

the (intended) reason is to get correct input for @encode which is supposed to 
be able to describe the layout of an object, to and from a description 
presented as the encoding string - this is, naturally, target-specific.

(however, perhaps that is broken in some way that we don’t test :( )

> #ifdef IN_TARGET_LIBS
> #define BIGGEST_FIELD_ALIGNMENT 64
> #else
> /* An expression for the alignment of a structure field FIELD if the
>   alignment computed in the usual way is COMPUTED.  GCC uses this
>   value instead of the value in `BIGGEST_ALIGNMENT' or
>   `BIGGEST_FIELD_ALIGNMENT', if defined, for structure fields only.  */
> #define ADJUST_FIELD_ALIGN(FIELD, COMPUTED) \
>  frv_adjust_field_align (FIELD, COMPUTED)
> #endif


> 
> Similar x86_64.
> 
> So it seems on power this might be an issue and thus I'd need to
> adjust the macro use - but not sure what to pass as "type" here...
> 
> I'll try to build a cross to ppc64 and see what happens.

I’m also in the process of trying to get powerpc64-darwin to build on trunk 
(that will also test libobjc); although from Jeff’s comment about this being 
present for a long time, I can say that powerpc64-darwin9 does at least build a 
working libobjc for 6.3.0

.. will subscribe to the PR - but not 100% clear what change you want to make,
Iain




Re: gcc.css colors

2017-02-01 Thread Jakub Jelinek
On Wed, Feb 01, 2017 at 11:45:14AM +0100, Markus Trippelsdorf wrote:
> Some colors on e.g. https://gcc.gnu.org/gcc-7/changes.html are nearly
> unreadable. So what about the following patch?
> 
> --- gcc_orig.css  2017-02-01 11:39:17.634017498 +0100
> +++ gcc.css   2017-02-01 11:40:23.979244263 +0100
> @@ -58,8 +58,8 @@
>  }
>  div.copyright p:nth-child(3) { margin-bottom: 0; }
>  
> -.boldcyan{ font-weight:bold; color:cyan; }
> -.boldlime{ font-weight:bold; color:lime; }
> +.boldcyan{ font-weight:bold; color:#25a9a9; }
> +.boldlime{ font-weight:bold; color:green;}
>  .boldmagenta { font-weight:bold; color:magenta; }
>  .boldred { font-weight:bold; color:red; }
>  .boldblue{ font-weight:bold; color:blue; }

I think the intent is that they actually match closely what gcc/libasan emits
(that of course depends on the exact terminal setting).
So are your colors closer to what gcc/libasan print or not?

Jakub


Re: [wwwdocs] changes.html - document new warning options

2017-02-01 Thread Aldy Hernandez



> Since you ask so nicely I added another example but I'm afraid it
> isn't terribly interesting:
>
>   In contrast, a call to alloca that isn't bounded at all such as
>   in the following function will elicit the warning below regardless
>   of the size argument to the option.
>
>   void f (size_t n)
>   {
>     char *d = alloca (n)
>     ...
>   }
>
>   warning: unbounded use of 'alloca' [-Walloca-larger-than=]


I like it.  Thank you so much for taking care of all this.



Re: [wwwdocs] changes.html - document new warning options

2017-02-01 Thread Jakub Jelinek
On Wed, Feb 01, 2017 at 05:50:08AM -0500, Aldy Hernandez wrote:
> 
> > Since you ask so nicely I added another example but I'm afraid it
> > isn't terribly interesting:
> > 
> >   In contrast, a call to alloca that isn't bounded at all such as
> >   in the following function will elicit the warning below regardless
> >   of the size argument to the option.
> > 
> >   void f (size_t n)
> >   {
> > char *d = alloca (n)

Missing semicolon after alloca (n)

> > ...
> >   }
> > 
> >   warning: unbounded use of 'alloca' [-Walloca-larger-than=]
> 
> I like it.  Thank you so much for taking care of all this.

Jakub
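
For contrast with the unbounded example quoted above, a bounded use might
look like the sketch below.  The function name and the limit of 256 are made
up, and <alloca.h> is assumed to be available (as on glibc); whether the
warning stays quiet depends on the value given to -Walloca-larger-than=.

```c
#include <alloca.h>
#include <stddef.h>
#include <string.h>

/* Copies N bytes of SRC into a stack buffer and returns N, or returns 0
   without allocating when N is zero or above the (hypothetical) limit.
   The range check is what bounds the alloca argument.  */
size_t
copy_bounded (const char *src, size_t n)
{
  if (n == 0 || n > 256)
    return 0;

  char *d = alloca (n);
  memcpy (d, src, n);
  return d[n - 1] == src[n - 1] ? n : 0;
}
```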


Re: gcc.css colors

2017-02-01 Thread Markus Trippelsdorf
On 2017.02.01 at 11:48 +0100, Jakub Jelinek wrote:
> On Wed, Feb 01, 2017 at 11:45:14AM +0100, Markus Trippelsdorf wrote:
> > Some colors on e.g. https://gcc.gnu.org/gcc-7/changes.html are nearly
> > unreadable. So what about the following patch?
> > 
> > --- gcc_orig.css  2017-02-01 11:39:17.634017498 +0100
> > +++ gcc.css 2017-02-01 11:40:23.979244263 +0100
> > @@ -58,8 +58,8 @@
> >  }
> >  div.copyright p:nth-child(3) { margin-bottom: 0; }
> >  
> > -.boldcyan{ font-weight:bold; color:cyan; }
> > -.boldlime{ font-weight:bold; color:lime; }
> > +.boldcyan{ font-weight:bold; color:#25a9a9; }
> > +.boldlime{ font-weight:bold; color:green;}
> >  .boldmagenta { font-weight:bold; color:magenta; }
> >  .boldred { font-weight:bold; color:red; }
> >  .boldblue{ font-weight:bold; color:blue; }
> 
> I think the intent is that they actually match closely what gcc/libasan emits
> (that of course depends on the exact terminal setting).
> So are your colors closer to what gcc/libasan print or not?

As you said, the exact terminal colors are user definable.
But yes, the change above brings them closer to what I see in my
terminal.
And readability is much improved by the patch IMHO.

-- 
Markus


Re: [PATCH] Fix PR79278, amend ADJUST_FIELD_ALIGN interface

2017-02-01 Thread Richard Biener
On Wed, 1 Feb 2017, Iain Sandoe wrote:

> 
> > On 1 Feb 2017, at 08:42, Richard Biener  wrote:
> > 
> > On Tue, 31 Jan 2017, Jeff Law wrote:
> > 
> >> On 01/31/2017 02:01 AM, Richard Biener wrote:
> >>> 
> >>> This amends ADJUST_FIELD_ALIGN to always get the type of the field
> >>> as argument but make the field itself optional.  All actual target
> >>> macro implementations only look at the type of the field but FRV
> >>> (which seems to misuse ADJUST_FIELD_ALIGN to do bitfield layout
> >>> rather than using one of the three standard ways - Alex/Nick?).
> >> Didn't we deprecate FRV?  Oh, that was MEP..  Nevermind.
> >> 
> >> 
> >> 
> >>> This speeds up min_align_of_type (no longer needs to build a FIELD_DECL)
> >>> and thus (IMHO) makes it usable from get_object_alignment.  This
> >>> causes us no longer to return bogus answers for indirect accesses to
> >>> doubles on i?86 and expand to RTL with proper MEM_ALIGN.  (it also
> >>> makes the previous fix for PR79256 no longer necessary)
> >>> 
> >>> Bootstrap and regtest running on x86_64-unknown-linux-gnu - is this ok
> >>> for trunk at this stage?
> >>> 
> >>> grep found a ADJUST_FIELD_ALIGN use in libobjc/encoding.c but that
> >>> is fed a C string(!?) as FIELD_DECL so I discounted it as unrelated
> >>> (and grep didn't find a way this macro could be defined there)?
> >> Presumably this is the code that takes the structure and encodes 
> >> information
> >> about it for the runtime.  Though taking a string sounds horribly broken.
> >> 
> >> I suspect it gets included via tm.h.
> >> 
> >> I bet if someone built a cross far enough to build libobjc we could see it 
> >> in
> >> action.  It does make one wonder how this part of libobjc could possibly be
> >> working on targets that define ADJUST_FIELD_ALIGN.
> >> 
> >> I'll note it's been that way since libobjc was moved into its own 
> >> directory,
> >> but wasn't like that prior to moving into its own directory.
> >> 
> >> I have no idea what to do here…
> > 
> > As objc builds just fine on x86_64 which does define ADJUST_FIELD_ALIGN
> > including tm.h can't be the whole story here…
> 
> It also builds fine on powerpc-darwin9 (Darwin builds libobjc and tests both 
> gnu-runtime and next-runtime [the latter with the system pre-installed NeXT 
> runtime]).
> 
> > In fact preprocessed source on x86_64 with -dD shows no ADJUST_FIELD_ALIGN
> > but instead
> > 
> > #define rs6000_special_adjust_field_align_p(FIELD,COMPUTED) 0
> > 
> > which means it must be sth powerpc specific...  FRV for example has
> > 
> > /* @@@ A hack, needed because libobjc wants to use ADJUST_FIELD_ALIGN for
> >   some reason.  */
> 
> the (intended) reason is to get correct input for @encode which is supposed 
> to be able to describe the layout of an object, to and from a description 
> presented as the encoding string - this is, naturally, target-specific.
> 
> (however, perhaps that is broken in some way that we don’t test :( )
> 
> > #ifdef IN_TARGET_LIBS
> > #define BIGGEST_FIELD_ALIGNMENT 64
> > #else
> > /* An expression for the alignment of a structure field FIELD if the
> >   alignment computed in the usual way is COMPUTED.  GCC uses this
> >   value instead of the value in `BIGGEST_ALIGNMENT' or
> >   `BIGGEST_FIELD_ALIGNMENT', if defined, for structure fields only.  */
> > #define ADJUST_FIELD_ALIGN(FIELD, COMPUTED) \
> >  frv_adjust_field_align (FIELD, COMPUTED)
> > #endif
> 
> 
> > 
> > Similar x86_64.
> > 
> > So it seems on power this might be an issue and thus I'd need to
> > adjust the macro use - but not sure what to pass as "type" here...
> > 
> > I'll try to build a cross to ppc64 and see what happens.
> 
> I’m also in the process of trying to get powerpc64-darwin to build on trunk 
> (that will also test libobjc); although from Jeff’s comment about this being 
> present for a long time, I can say that powerpc64-darwin9 does at least build 
> a working libobjc for 6.3.0
> 
> .. will subscribe to the PR - but not 100% clear what change you want to 
> make,
> Iain

The change is

* doc/tm.texi.in (ADJUST_FIELD_ALIGN): Adjust to take additional
type parameter.

thus the target macro ADJUST_FIELD_ALIGN takes three parameters now;
in second place we insert an always-non-NULL TREE_TYPE of the field
(the old and new first parameter may be NULL after the change).

libobjc passes a const char * as first argument - I have no idea
what to pass as second ;)  I _suppose_ the argument is supposed
to be never looked at so we could just pass NULL for both first
and new second parameter?

Wasn't successful in making a cross to ppc64-linux build its libobjc.

Richard.
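
As a rough picture of the three-parameter interface described above, here is
a toy version in plain C.  All names are hypothetical, and the rule of
capping doubles at 32 bits is only an illustrative assumption; the point is
that the hook looks at the type alone, so the field may be NULL.

```c
#include <stddef.h>

/* Hypothetical stand-ins for a type and a FIELD_DECL.  */
struct toy_type
{
  unsigned align;
  int is_double;
};

struct toy_field
{
  const struct toy_type *type;
};

/* Toy target hook: TYPE is always non-NULL, FIELD may be NULL and is
   never inspected.  */
static unsigned
toy_adjust_field_align (const struct toy_field *field,
                        const struct toy_type *type,
                        unsigned computed)
{
  (void) field;
  if (type->is_double && computed > 32)
    return 32;
  return computed;
}

/* Three-parameter macro in the style of the amended interface.  */
#define TOY_ADJUST_FIELD_ALIGN(FIELD, TYPE, COMPUTED) \
  toy_adjust_field_align (FIELD, TYPE, COMPUTED)
```

A caller that has no field at hand can then write
TOY_ADJUST_FIELD_ALIGN (NULL, &type, type.align) without building one first.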

Re: gcc.css colors

2017-02-01 Thread Marc Glisse

On Wed, 1 Feb 2017, Markus Trippelsdorf wrote:


> Some colors on e.g. https://gcc.gnu.org/gcc-7/changes.html are nearly
> unreadable.


I recently noticed that gcc's website has an extremely strict Content 
Security Policy, which makes it harder to customize its appearance using 
for instance greasemonkey in firefox. Arguably, greasemonkey should 
disable CSP checks for the content injected by its scripts, and I can 
always make my browser pretend that gcc is returning a different CSP, but 
that seems complicated and unnecessary.


Would it be possible to allow unsafe-inline in style-src (assuming I read 
the doc correctly)? Possibly relax things even more if it helps other 
customization tools?


--
Marc Glisse


Re: gcc.css colors

2017-02-01 Thread Jakub Jelinek
On Wed, Feb 01, 2017 at 11:55:53AM +0100, Markus Trippelsdorf wrote:
> On 2017.02.01 at 11:48 +0100, Jakub Jelinek wrote:
> > On Wed, Feb 01, 2017 at 11:45:14AM +0100, Markus Trippelsdorf wrote:
> > > Some colors on e.g. https://gcc.gnu.org/gcc-7/changes.html are nearly
> > > unreadable. So what about the following patch?
> > > 
> > > --- gcc_orig.css  2017-02-01 11:39:17.634017498 +0100
> > > +++ gcc.css   2017-02-01 11:40:23.979244263 +0100
> > > @@ -58,8 +58,8 @@
> > >  }
> > >  div.copyright p:nth-child(3) { margin-bottom: 0; }
> > >  
> > > -.boldcyan{ font-weight:bold; color:cyan; }
> > > -.boldlime{ font-weight:bold; color:lime; }
> > > +.boldcyan{ font-weight:bold; color:#25a9a9; }
> > > +.boldlime{ font-weight:bold; color:green;}
> > >  .boldmagenta { font-weight:bold; color:magenta; }
> > >  .boldred { font-weight:bold; color:red; }
> > >  .boldblue{ font-weight:bold; color:blue; }
> > 
> > I think the intent is that they actually match closely what gcc/libasan 
> > emits
> > (that of course depends on the exact terminal setting).
> > So are your colors closer to what gcc/libasan print or not?
> 
> As you said, the exact terminal colors are user definable.
> But yes, the change above bring them closer to what I see in my
> terminal. 

Exactly the opposite here, the current colors match very closely what I get
(gnome-terminal, White on black, Linux console color set),
your colors are completely different.

E.g. in changes.html, the Asan spans with boldlime are using
\033[1m\033[32m
in libsanitizer, and the note color in gcc by default is
\033[1m\033[36m

I've tried various color settings of gnome-terminal (both white on black and
black on white plus the different color sets) and all of them except 
for Solarized (which is shades of grey) look much brighter than your colors.
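
For reference, the two sequences quoted above are SGR escape codes: 1 selects
bold, 32 a green foreground and 36 a cyan foreground. A minimal sketch of how
a program might wrap text in them follows; the helper name bold_colored and
the trailing reset sequence \033[0m are assumptions of this example, not code
taken from gcc or libsanitizer.

```cpp
#include <cassert>
#include <string>

// Hypothetical helper: wrap TEXT in "bold + foreground colour" SGR codes.
// 32 is green (the Asan spans), 36 is cyan (gcc's default note colour).
std::string bold_colored(const std::string& text, int fg_code)
{
  return "\033[1m\033[" + std::to_string(fg_code) + "m" + text + "\033[0m";
}
```

How bright the result looks still depends entirely on the palette the
terminal maps colour 2 or 6 to, which is the point under discussion.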

Jakub


Re: gcc.css colors

2017-02-01 Thread Markus Trippelsdorf
On 2017.02.01 at 12:14 +0100, Jakub Jelinek wrote:
> On Wed, Feb 01, 2017 at 11:55:53AM +0100, Markus Trippelsdorf wrote:
> > On 2017.02.01 at 11:48 +0100, Jakub Jelinek wrote:
> > > On Wed, Feb 01, 2017 at 11:45:14AM +0100, Markus Trippelsdorf wrote:
> > > > Some colors on e.g. https://gcc.gnu.org/gcc-7/changes.html are nearly
> > > > unreadable. So what about the following patch?
> > > > 
> > > > --- gcc_orig.css2017-02-01 11:39:17.634017498 +0100
> > > > +++ gcc.css 2017-02-01 11:40:23.979244263 +0100
> > > > @@ -58,8 +58,8 @@
> > > >  }
> > > >  div.copyright p:nth-child(3) { margin-bottom: 0; }
> > > >  
> > > > -.boldcyan{ font-weight:bold; color:cyan; }
> > > > -.boldlime{ font-weight:bold; color:lime; }
> > > > +.boldcyan{ font-weight:bold; color:#25a9a9; }
> > > > +.boldlime{ font-weight:bold; color:green;}
> > > >  .boldmagenta { font-weight:bold; color:magenta; }
> > > >  .boldred { font-weight:bold; color:red; }
> > > >  .boldblue{ font-weight:bold; color:blue; }
> > > 
> > > I think the intent is that they actually match closely what gcc/libasan 
> > > emits
> > > (that of course depends on the exact terminal setting).
> > > So are your colors closer to what gcc/libasan print or not?
> > 
> > As you said, the exact terminal colors are user definable.
> > But yes, the change above bring them closer to what I see in my
> > terminal. 
> 
> Exactly the opposite here, the current colors match very closely what I get
> (gnome-terminal, White on black, Linux console color set),
> your colors are completely different.
> 
> E.g. in changes.html, the Asan spans with boldlime are using
> \033[1m\033[32m
> in libsanitizer, and the note color in gcc by default is
> \033[1m\033[36m
> 
> I've tried various color settings of gnome-terminal (both white on black and
> black on white plus the different color sets) and all of them except 
> for Solarized (which is shades of grey) look much brighter than your colors.

See: http://i.imgur.com/rAlEdVy.png. (I use konsole, but even gnome-
terminal supports truecolor now. So one has complete freedom in choosing
the default colors.)


-- 
Markus


Re: [PATCH] PR libstdc++/79254 fix exception-safety in std::string::operator=

2017-02-01 Thread Jonathan Wakely

On 27/01/17 16:16 +, Jonathan Wakely wrote:

This implements the strong exception-safety guarantee that is required
by [string.require] p2, which the new string can fail to meet when
propagate_on_container_copy_assignment (POCCA) is true.

The solution is to define a helper that takes ownership of the
string's memory (and also the associated allocator, length and
capacity) and either deallocates it after the assignment, or swaps it
back in if an exception happens (i.e. commit or rollback).

PR libstdc++/79254
* config/abi/pre/gnu.ver: Add new symbols.
* include/bits/basic_string.h [_GLIBCXX_USE_CXX11_ABI]
(basic_string::_M_copy_assign): New overloaded functions to perform
copy assignment.
(basic_string::operator=(const basic_string&)): Dispatch to
_M_copy_assign.
* include/bits/basic_string.tcc [_GLIBCXX_USE_CXX11_ABI]
(basic_string::_M_copy_assign(const basic_string&, true_type)):
Define, performing rollback on exception.
* testsuite/21_strings/basic_string/allocator/char/copy_assign.cc:
Test exception-safety guarantee.
* testsuite/21_strings/basic_string/allocator/wchar_t/copy_assign.cc:
Likewise.
* testsuite/util/testsuite_allocator.h (uneq_allocator::swap): Make
std::swap visible.

The backports for the branches will be a bit different, as we can't
add new exports to closed symbol versions, so I'll keep everything in
operator= instead of tag dispatching. The POCCA code path will still
be dependent on a constant expression, so should be optimized away for
most allocators.


While working on the backport of this I realised the RAII
commit-and-rollback approach is a lot more complicated than simply
doing the new allocation before making any changes to *this.

This new patch simplifies it. There's no need to tag-dispatch to
_M_copy_assign because the code path for POCCA allocators is behind a
compile-time constant condition anyway, and the allocator assignment
is already conditional because of __alloc_on_copy.
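
The "allocate before making any changes" idea can be sketched outside of
libstdc++ as follows. This is only an illustration under stated assumptions:
toy_buffer, assign and the fail_alloc switch are hypothetical names invented
for the example and do not correspond to anything in basic_string.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <new>

// Hypothetical toy buffer, not libstdc++ code: it only illustrates
// "allocate first, modify afterwards".
struct toy_buffer
{
  char*       data = nullptr;
  std::size_t len  = 0;

  toy_buffer() = default;
  toy_buffer(const toy_buffer&) = delete;
  toy_buffer& operator=(const toy_buffer&) = delete;
  ~toy_buffer() { delete[] data; }

  // fail_alloc simulates an allocator that throws.
  void assign(const char* src, std::size_t n, bool fail_alloc = false)
  {
    if (fail_alloc)
      throw std::bad_alloc();   // nothing has been modified yet
    char* p = new char[n + 1];  // may throw: *this is still untouched
    std::memcpy(p, src, n);
    p[n] = '\0';
    delete[] data;              // from here on nothing can throw
    data = p;
    len  = n;
  }
};
```

If the allocation throws, the object still holds its old contents, so no
rollback helper is needed.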

**
!!! This removes the new _M_copy_assign member functions that
!!! were exported from libstdc++.so since last Friday.
**

Packagers (including at least Fedora rawhide) that have shipped a
gcc-7 build since last Friday will need to update again.  This
shouldn't be a big deal, because I expect the amount of code in a
typical GNU/Linux distro that used the _M_copy_assign(..., true_type)
symbol to be exactly zero, and the _M_copy_assign(..., false_type)
symbol will be inlined with any -Ox level, and is not used at -O0.

Tested powerpc64-linux, committed to trunk.


commit 5e6bb61638e06b51291307e7e21745a55feed5f2
Author: Jonathan Wakely 
Date:   Tue Jan 31 17:56:03 2017 +

PR libstdc++/79254 simplify exception-safety in copy assignment

	PR libstdc++/79254
	* config/abi/pre/gnu.ver: Remove recently added symbols.
	* include/bits/basic_string.h [_GLIBCXX_USE_CXX11_ABI]
	(basic_string::_M_copy_assign): Remove.
	(basic_string::operator=(const basic_string&)): Don't dispatch to
	_M_copy_assign. If source object is small just deallocate, otherwise
	perform new allocation before making any changes.
	* include/bits/basic_string.tcc [_GLIBCXX_USE_CXX11_ABI]
	(basic_string::_M_copy_assign(const basic_string&, true_type)):
	Remove.
	* testsuite/21_strings/basic_string/allocator/char/copy_assign.cc:
	Test cases where the allocators are equal or the string is small.
	* testsuite/21_strings/basic_string/allocator/wchar_t/copy_assign.cc:
	Likewise.

diff --git a/libstdc++-v3/config/abi/pre/gnu.ver b/libstdc++-v3/config/abi/pre/gnu.ver
index 1bea4b4..268fb94 100644
--- a/libstdc++-v3/config/abi/pre/gnu.ver
+++ b/libstdc++-v3/config/abi/pre/gnu.ver
@@ -1955,9 +1955,6 @@ GLIBCXX_3.4.23 {
 _ZNSsC[12]ERKSs[jmy]RKSaIcE;
 _ZNSbIwSt11char_traitsIwESaIwEEC[12]ERKS2_mRKS1_;
 
-# basic_string::_M_copy_assign(const basic_string&, {true,false}_type)
-_ZNSt7__cxx1112basic_stringI[cw]St11char_traitsI[cw]ESaI[cw]EE14_M_copy_assign*;
-
 #ifndef HAVE_EXCEPTION_PTR_SINCE_GCC46
 # std::future symbols are exported in the first version to support
 # std::exception_ptr
diff --git a/libstdc++-v3/include/bits/basic_string.h b/libstdc++-v3/include/bits/basic_string.h
index 97fe797..981ffc5 100644
--- a/libstdc++-v3/include/bits/basic_string.h
+++ b/libstdc++-v3/include/bits/basic_string.h
@@ -393,15 +393,6 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
   void
   _M_erase(size_type __pos, size_type __n);
 
-#if __cplusplus >= 201103L
-  void
-  _M_copy_assign(const basic_string& __str, /* pocca = */ true_type);
-
-  void
-  _M_copy_assign(const basic_string& __str, /* pocca = */ false_type)
-  { this->_M_assign(__str); }
-#endif
-
 public:
   // Construct/copy/destroy:
  

Re: [patch] Fix PR middle-end/78468

2017-02-01 Thread Eric Botcazou
> I'd say let's not have a middle ground - this stuff is sufficiently
> brain-twisting that I'd rather go back to a known working state. If
> there was an error in the previous patch, let's roll it back until we
> fully understand the situation.

Here's a revised version along these lines, OK for mainline after testing?


PR middle-end/78468
* emit-rtl.c (init_emit): Add ??? comment for problematic alignment
settings of the virtual registers.

Revert again
2016-08-23  Dominik Vogt  

* explow.c (get_dynamic_stack_size): Take known alignment of stack
pointer + STACK_DYNAMIC_OFFSET into account when calculating the size
needed.


-- 
Eric Botcazou

Index: emit-rtl.c
===
--- emit-rtl.c	(revision 244917)
+++ emit-rtl.c	(working copy)
@@ -5725,10 +5725,13 @@ init_emit (void)
   REGNO_POINTER_ALIGN (HARD_FRAME_POINTER_REGNUM) = STACK_BOUNDARY;
   REGNO_POINTER_ALIGN (ARG_POINTER_REGNUM) = STACK_BOUNDARY;
 
+  /* ??? These are problematic (for example, 3 out of 4 are wrong on
+ 32-bit SPARC and cannot be all fixed because of the ABI).  */
   REGNO_POINTER_ALIGN (VIRTUAL_INCOMING_ARGS_REGNUM) = STACK_BOUNDARY;
   REGNO_POINTER_ALIGN (VIRTUAL_STACK_VARS_REGNUM) = STACK_BOUNDARY;
   REGNO_POINTER_ALIGN (VIRTUAL_STACK_DYNAMIC_REGNUM) = STACK_BOUNDARY;
   REGNO_POINTER_ALIGN (VIRTUAL_OUTGOING_ARGS_REGNUM) = STACK_BOUNDARY;
+
   REGNO_POINTER_ALIGN (VIRTUAL_CFA_REGNUM) = BITS_PER_WORD;
 #endif
 
Index: explow.c
===
--- explow.c	(revision 244917)
+++ explow.c	(working copy)
@@ -1233,15 +1233,9 @@ get_dynamic_stack_size (rtx *psize, unsi
  example), so we must preventively align the value.  We leave space
  in SIZE for the hole that might result from the alignment operation.  */
 
-  unsigned known_align = REGNO_POINTER_ALIGN (VIRTUAL_STACK_DYNAMIC_REGNUM);
-  if (known_align == 0)
-known_align = BITS_PER_UNIT;
-  if (required_align > known_align)
-{
-  extra = (required_align - known_align) / BITS_PER_UNIT;
-  size = plus_constant (Pmode, size, extra);
-  size = force_operand (size, NULL_RTX);
-}
+  extra = (required_align - BITS_PER_UNIT) / BITS_PER_UNIT;
+  size = plus_constant (Pmode, size, extra);
+  size = force_operand (size, NULL_RTX);
 
   if (flag_stack_usage_info && pstack_usage_size)
 *pstack_usage_size += extra;
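
As a rough illustration of the arithmetic in the explow.c hunk: the removed
code reserved only the gap between the required and the known alignment,
while the restored code assumes nothing better than byte alignment. The
sketch below is not GCC code; extra_bytes and bits_per_unit are hypothetical
names, 8-bit units are assumed, and the known_align == 0 fallback of the
removed code is left out.

```cpp
#include <cassert>

constexpr unsigned bits_per_unit = 8;  // assumption: 8-bit bytes

// Extra bytes to reserve so that a block can be realigned to REQUIRED_ALIGN
// bits when its start is only known to be aligned to KNOWN_ALIGN bits.
constexpr unsigned extra_bytes(unsigned required_align, unsigned known_align)
{
  return required_align > known_align
         ? (required_align - known_align) / bits_per_unit
         : 0;
}
```

With a 128-bit requirement this gives 8 extra bytes when 64-bit alignment is
trusted and 15 when only byte alignment is assumed, which is what the revert
goes back to.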


Re: gcc.css colors

2017-02-01 Thread Jakub Jelinek
On Wed, Feb 01, 2017 at 12:33:30PM +0100, Markus Trippelsdorf wrote:
> > E.g. in changes.html, the Asan spans with boldlime are using
> > \033[1m\033[32m
> > in libsanitizer, and the note color in gcc by default is
> > \033[1m\033[36m
> > 
> > I've tried various color settings of gnome-terminal (both white on black and
> > black on white plus the different color sets) and all of them except 
> > for Solarized (which is shades of grey) look much brighter than your colors.
> 
> See: http://i.imgur.com/rAlEdVy.png. (I use konsole, but even gnome-
> terminal supports truecolor now. So one has complete freedom in choosing
> the default colors.)

Sure, one can customize anything.  The point is, what colors are configured
by default and thus used by most of the users?

Just tried konsole (both Linux colors and Black on white) look like this
http://imgur.com/pQS6lbI

Jakub


Re: gcc.css colors

2017-02-01 Thread Markus Trippelsdorf
On 2017.02.01 at 12:41 +0100, Jakub Jelinek wrote:
> Hi!
> 
> On Wed, Feb 01, 2017 at 12:27:05PM +0100, Markus Trippelsdorf wrote:
> > > I've tried various color settings of gnome-terminal (both white on black 
> > > and
> > > black on white plus the different color sets) and all of them except 
> > > for Solarized (which is shades of grey) look much brighter than your 
> > > colors.
> > 
> > See attached screenshot. (I use konsole, but even gnome-terminal
> > supports truecolor now. So one has complete freedom in choosing the
> > default colors.)
> 
> Sure, one can customize anything.  The point is, what colors are configured
> by default and thus used by most of the users?
> 
> Just tried konsole (both Linux colors and Black on white) look like this
> here:

This points to the core of the issue. I guess most users use a dark
(black) background in the terminal. And on a black background the
gcc.css colors are perfectly readable. But because we use a white
background on the website, the colors have way too little contrast and
become hard to read.
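
One way to put a number on "too little contrast" is the contrast ratio
defined by WCAG 2.x. That metric is an assumption brought in for this
sketch and was not used anywhere in the thread; the function names are
made up for the example.

```cpp
#include <cassert>
#include <cmath>

// Linearise one sRGB channel given as 0..255 (WCAG 2.x definition).
double linear(int c)
{
  double s = c / 255.0;
  return s <= 0.03928 ? s / 12.92 : std::pow((s + 0.055) / 1.055, 2.4);
}

// Relative luminance of an sRGB colour.
double luminance(int r, int g, int b)
{
  return 0.2126 * linear(r) + 0.7152 * linear(g) + 0.0722 * linear(b);
}

// Contrast of a colour against a white background (luminance 1.0).
double contrast_on_white(int r, int g, int b)
{
  return (1.0 + 0.05) / (luminance(r, g, b) + 0.05);
}
```

By this measure cyan (0, 255, 255) and lime (0, 255, 0) score well below
the proposed #25a9a9 and green (0, 128, 0) on a white page.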

-- 
Markus


Re: [PATCH] Fix PGO bootstrap on s390x (PR bootstrap/78985).

2017-02-01 Thread Martin Liška
On 02/01/2017 11:38 AM, Jakub Jelinek wrote:
> On Wed, Feb 01, 2017 at 11:34:48AM +0100, Martin Liška wrote:
>>> Presumably the issue with print_operand_address is that there are paths 
>>> where s390_decompose_address can return without initializing AD/OUT. But 
>>> AFAICT those are invalid addresses that presumably shouldn't be showing up 
>>> in print_operand_address.
>>>
>>> Can you add an assert in print_operand_address to ensure decomposition 
>>> never returns false?
> 
> Can't it happen e.g. with inline asm and "X" constraint?
> output_operand_lossage then would emit an error rather than ICE for
> something that is a user code bug, not internal compiler error.

OK, that said, I'll commit the original version.
Is it fine?

M.

> 
>>
>> Like done in v2 of the patch?
>>
>> If so, I'll commit that.
> 
>   Jakub
> 



Re: [PATCH] Fix PGO bootstrap on s390x (PR bootstrap/78985).

2017-02-01 Thread Jakub Jelinek
On Wed, Feb 01, 2017 at 12:52:16PM +0100, Martin Liška wrote:
> On 02/01/2017 11:38 AM, Jakub Jelinek wrote:
> > On Wed, Feb 01, 2017 at 11:34:48AM +0100, Martin Liška wrote:
> >>> Presumably the issue with print_operand_address is that there are paths 
> >>> where s390_decompose_address can return without initializing AD/OUT. But 
> >>> AFAICT those are invalid addresses that presumably shouldn't be showing 
> >>> up in print_operand_address.
> >>>
> >>> Can you add an assert in print_operand_address to ensure decomposition 
> >>> never returns false?
> > 
> > Can't it happen e.g. with inline asm and "X" constraint?
> > output_operand_lossage then would emit an error rather than ICE for
> > something that is a user code bug, not internal compiler error.
> 
> Ok, thus said I'll commit the original version.
> Is it fine?

Fine with me, but let Jeff chime in if he disagrees.

Jakub


Re: [PATCH] PR libstdc++/79254 fix exception-safety in std::string::operator=

2017-02-01 Thread Jonathan Wakely

On 01/02/17 11:42 +, Jonathan Wakely wrote:

On 27/01/17 16:16 +, Jonathan Wakely wrote:

This implements the strong exception-safety guarantee that is required
by [string.require] p2, which the new string can fail to meet when
propagate_on_container_copy_assignment (POCCA) is true.

The solution is to define a helper that takes ownership of the
string's memory (and also the associated allocator, length and
capacity) and either deallocates it after the assignment, or swaps it
back in if an exception happens (i.e. commit or rollback).

PR libstdc++/79254
* config/abi/pre/gnu.ver: Add new symbols.
* include/bits/basic_string.h [_GLIBCXX_USE_CXX11_ABI]
(basic_string::_M_copy_assign): New overloaded functions to perform
copy assignment.
(basic_string::operator=(const basic_string&)): Dispatch to
_M_copy_assign.
* include/bits/basic_string.tcc [_GLIBCXX_USE_CXX11_ABI]
(basic_string::_M_copy_assign(const basic_string&, true_type)):
Define, performing rollback on exception.
* testsuite/21_strings/basic_string/allocator/char/copy_assign.cc:
Test exception-safety guarantee.
* testsuite/21_strings/basic_string/allocator/wchar_t/copy_assign.cc:
Likewise.
* testsuite/util/testsuite_allocator.h (uneq_allocator::swap): Make
std::swap visible.

The backports for the branches will be a bit different, as we can't
add new exports to closed symbol versions, so I'll keep everything in
operator= instead of tag dispatching. The POCCA code path will still
be dependent on a constant expression, so should be optimized away for
most allocators.


While working on the backport of this I realised the RAII
commit-and-rollback approach is a lot more complicated than simply
doing the new allocation before making any changes to *this.


Here's the backport for the branches, which shows that the new
approach is much closer to the original code and much simpler.

Tested x86_64-linux, committed to gcc-6-branch and gcc-5-branch.


commit 90d239b147e4f72e9361e10c5d45470c4b52eb7f
Author: Jonathan Wakely 
Date:   Wed Feb 1 04:27:20 2017 +

PR libstdc++/79254 fix exception-safety of std::string copy assignment

PR libstdc++/79254
* include/bits/basic_string.h [_GLIBCXX_USE_CXX11_ABI]
(basic_string::operator=(const basic_string&)): If source object is
small just deallocate, otherwise perform new allocation before
making any changes.
* testsuite/21_strings/basic_string/allocator/wchar_t/copy_assign.cc:
Test exception-safety of copy assignment when allocator propagates.
* testsuite/21_strings/basic_string/allocator/char/copy_assign.cc:
Likewise.
* testsuite/util/testsuite_allocator.h (uneq_allocator::swap): Make
std::swap visible.

diff --git a/libstdc++-v3/include/bits/basic_string.h b/libstdc++-v3/include/bits/basic_string.h
index f8f3f88..0352bf4 100644
--- a/libstdc++-v3/include/bits/basic_string.h
+++ b/libstdc++-v3/include/bits/basic_string.h
@@ -570,10 +570,25 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
if (!_Alloc_traits::_S_always_equal() && !_M_is_local()
&& _M_get_allocator() != __str._M_get_allocator())
  {
-   // replacement allocator cannot free existing storage
-   _M_destroy(_M_allocated_capacity);
-   _M_data(_M_local_data());
-   _M_set_length(0);
+   // Propagating allocator cannot free existing storage so must
+   // deallocate it before replacing current allocator.
+   if (__str.size() <= _S_local_capacity)
+ {
+   _M_destroy(_M_allocated_capacity);
+   _M_data(_M_local_data());
+   _M_set_length(0);
+ }
+   else
+ {
+   const auto __len = __str.size();
+   auto __alloc = __str._M_get_allocator();
+   // If this allocation throws there are no effects:
+   auto __ptr = _Alloc_traits::allocate(__alloc, __len + 1);
+   _M_destroy(_M_allocated_capacity);
+   _M_data(__ptr);
+   _M_capacity(__len);
+   _M_set_length(__len);
+ }
  }
std::__alloc_on_copy(_M_get_allocator(), __str._M_get_allocator());
  }
diff --git a/libstdc++-v3/testsuite/21_strings/basic_string/allocator/char/copy_assign.cc b/libstdc++-v3/testsuite/21_strings/basic_string/allocator/char/copy_assign.cc
index 3c8e440..645e3cb 100644
--- a/libstdc++-v3/testsuite/21_strings/basic_string/allocator/char/copy_assign.cc
+++ b/libstdc++-v3/testsuite/21_strings/basic_string/allocator/char/copy_assign.cc
@@ -1,4 +1,4 @@
-// Copyright (C) 2015-2016 Free Software Foundation, Inc.
+// Copyright (C) 2015-2017 Free Software Foundation, Inc.
 //
 // This 

Re: [PATCH] PR libstdc++/79195 fix make_array type deduction

2017-02-01 Thread Jonathan Wakely

On 23/01/17 15:55 +, Jonathan Wakely wrote:

This adds indirection through a class template partial specialization
so we don't instantiate common_type<_Types...>::type unless we need
it.

PR libstdc++/79195
* include/experimental/array (__make_array_elem): New class template
and partial specialization.
(__is_reference_wrapper): Move into __make_array_elem specialization.
(make_array): Use __make_array_elem to determine element type and move
static assertion into specialization. Qualify std::forward call.
(to_array): Add exception specification.
* testsuite/experimental/array/make_array.cc: Test argument types
without a common type.
* testsuite/experimental/array/neg.cc: Adjust expected error message.

Tested powerpc64le-linux, committed to trunk (because it's only an
experimental TS feature).


Here's a more conservative fix for the gcc-6-branch. I prefer the
cleanup on trunk, as this version will remove any cv-qualifiers from
_Dest, but it's good enough for the branch IMHO.

Tested x86_64-linux, committed to gcc-6-branch.
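
The delayed instantiation used on the branch can be sketched in isolation
like this. The alias name elem_type is hypothetical and the snippet is a
simplified illustration, not the code from the experimental array header.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical alias (not the libstdc++ name): pick the element type for
// make_array<Dest>(args...).  conditional_t only chooses between two
// common_type specializations; ::type is looked up in the chosen one alone.
template<typename Dest, typename... Types>
  using elem_type
    = typename std::conditional_t<std::is_void<Dest>::value,
                                  std::common_type<Types...>,
                                  std::common_type<Dest>>::type;

struct A {};
struct B : A {};
struct C : A {};

// B and C have no common type, but with an explicit Dest it is never needed.
static_assert(std::is_same<elem_type<A, B, C>, A>::value, "");
// Without an explicit Dest the common type of the arguments is used.
static_assert(std::is_same<elem_type<void, int, long>, long>::value, "");
// common_type<Dest> decays, so cv-qualifiers on Dest are dropped.
static_assert(std::is_same<elem_type<const int, int>, int>::value, "");
```

The last assertion shows the loss of cv-qualifiers mentioned above.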


commit 99ec893357b6b6a2f67027efc6938f30b154dce4
Author: Jonathan Wakely 
Date:   Wed Feb 1 04:36:38 2017 +

PR libstdc++/79195 fix make_array type deduction

	PR libstdc++/79195
	* include/experimental/array (make_array): Use common_type<_Dest>
	and delay instantiation of common_type until after conditional_t.
	Qualify std::forward call.
	(to_array): Add exception specification.
	* testsuite/experimental/array/make_array.cc: Test argument types
	without a common type.

diff --git a/libstdc++-v3/include/experimental/array b/libstdc++-v3/include/experimental/array
index 31a066b..c01f0f9 100644
--- a/libstdc++-v3/include/experimental/array
+++ b/libstdc++-v3/include/experimental/array
@@ -69,9 +69,9 @@ template 
 template 
   constexpr auto
   make_array(_Types&&... __t)
--> array,
-   common_type_t<_Types...>,
-   _Dest>,
+-> array,
+common_type<_Types...>,
+common_type<_Dest>>::type,
  sizeof...(_Types)>
   {
 static_assert(__or_<
@@ -80,13 +80,12 @@ template 
   ::value,
   "make_array cannot be used without an explicit target type "
   "if any of the types given is a reference_wrapper");
-return {{forward<_Types>(__t)...}};
+return {{ std::forward<_Types>(__t)... }};
   }
 
 template 
   constexpr array, _Nm>
-  __to_array(_Tp (&__a)[_Nm],
- index_sequence<_Idx...>)
+  __to_array(_Tp (&__a)[_Nm], index_sequence<_Idx...>)
   {
 return {{__a[_Idx]...}};
   }
@@ -94,6 +93,7 @@ template 
 template 
   constexpr array, _Nm>
   to_array(_Tp (&__a)[_Nm])
+  noexcept(is_nothrow_constructible, _Tp&>::value)
   {
 return __to_array(__a, make_index_sequence<_Nm>{});
   }
diff --git a/libstdc++-v3/testsuite/experimental/array/make_array.cc b/libstdc++-v3/testsuite/experimental/array/make_array.cc
index 0ae188b..75f5333 100644
--- a/libstdc++-v3/testsuite/experimental/array/make_array.cc
+++ b/libstdc++-v3/testsuite/experimental/array/make_array.cc
@@ -1,7 +1,6 @@
-// { dg-options "-std=gnu++14" }
-// { dg-do compile }
+// { dg-do compile { target c++14 } }
 
-// Copyright (C) 2015-2016 Free Software Foundation, Inc.
+// Copyright (C) 2015-2017 Free Software Foundation, Inc.
 //
 // This file is part of the GNU ISO C++ Library.  This library is free
 // software; you can redistribute it and/or modify it under the
@@ -19,6 +18,7 @@
 // .
 
 #include 
+#include  // for std::ref and std::reference_wrapper
 
 struct MoveOnly
 {
@@ -27,7 +27,7 @@ struct MoveOnly
   MoveOnly& operator=(MoveOnly&&) = default;
 };
 
-int main()
+void test01()
 {
   char x[42];
   std::array y = std::experimental::to_array(x);
@@ -45,3 +45,13 @@ int main()
 = std::experimental::make_array(1,2L, 3);
   constexpr std::array zzz2 = std::experimental::make_array(MoveOnly{});
 }
+
+void test02()
+{
+  // PR libstdc++/79195
+  struct A {};
+  struct B : A {};
+  struct C : A {};
+  auto arr = std::experimental::make_array(B{}, C{});
+  static_assert(std::is_same>::value, "");
+}


[PATCH] XFAIL gcc.dg/graphite/scop-dsyrk.c

2017-02-01 Thread Richard Biener

The following XFAILs the testcases, making them fail reliably independently
of int/long type sizes and also providing new testcase variants that
succeed reliably.

I don't see us fixing the underlying niter analysis issue for GCC 7.

Tested on x86_64-unknown-linux-gnu with {,-m32}, applied.

Richard.

2017-02-01  Richard Biener  

PR testsuite/76957
* gcc.dg/graphite/scop-dsyr2k-2.c: New testcase.
* gcc.dg/graphite/scop-dsyrk-2.c: Likewise.
* gcc.dg/graphite/scop-dsyr2k.c: XFAIL.
* gcc.dg/graphite/scop-dsyrk.c: Likewise.

Index: gcc/testsuite/gcc.dg/graphite/scop-dsyr2k-2.c
===
--- gcc/testsuite/gcc.dg/graphite/scop-dsyr2k-2.c   (nonexistent)
+++ gcc/testsuite/gcc.dg/graphite/scop-dsyr2k-2.c   (working copy)
@@ -0,0 +1,24 @@
+/* { dg-require-effective-target size32plus } */
+#define NMAX 3000
+
+static double a[NMAX][NMAX], b[NMAX][NMAX], c[NMAX][NMAX];
+
+typedef __INT32_TYPE__ int32_t;
+typedef __INT64_TYPE__ int64_t;
+
+void dsyr2k(int64_t N) {
+   int32_t i,j,k;
+   
+#pragma scop
+   for (i=0; i

[PATCH] PR78346 make handle stashing iterators

2017-02-01 Thread Jonathan Wakely

I didn't get an answer explaining why these function objects store a
reference not the iterator, so I'm fixing the regression by storing
the iterator and dereferencing it on every comparison.
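
A reduced sketch of the idea, with hypothetical names (stashing_iter and
equals_iter are stand-ins invented for this example, not the testsuite
iterator or the __ops types): the predicate keeps a copy of the iterator
and dereferences it on each call, so it never holds a reference into an
iterator that has already been destroyed.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>

// Hypothetical "stashing" iterator, reduced to the parts needed here:
// operator* returns a reference to a copy held inside the iterator, so
// the reference is only valid while that iterator object is alive.
struct stashing_iter
{
  typedef std::forward_iterator_tag iterator_category;
  typedef int value_type;
  typedef const int* pointer;
  typedef const int& reference;
  typedef std::ptrdiff_t difference_type;

  const int* ptr;
  int stashed;

  explicit stashing_iter(const int* p) : ptr(p), stashed(*p) { }
  reference operator*() const { return stashed; }
};

// Sketch of the fixed predicate: store the iterator itself and
// dereference it on every comparison instead of caching *it1.
template<typename Iterator1>
struct equals_iter
{
  Iterator1 it1;
  explicit equals_iter(Iterator1 i) : it1(i) { }

  template<typename Iterator2>
  bool operator()(Iterator2 it2) const { return *it2 == *it1; }
};
```

Caching *it1 as a reference instead would leave the predicate pointing at
the stashed value of a temporary iterator.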

PR libstdc++/78346
* include/bits/predefined_ops.h (_Iter_equals_iter): Store iterator
not its referent.
(_Iter_comp_to_iter): Likewise.
* testsuite/25_algorithms/search/78346.cc: New test.

Tested powerpc64le-linux, committed to trunk, gcc-6-branch and gcc-5-branch.


commit 76bd2936da1ca933480708f8381613d95dc8498e
Author: Jonathan Wakely 
Date:   Wed Feb 1 12:22:36 2017 +

PR78346 make  handle stashing iterators

	PR libstdc++/78346
	* include/bits/predefined_ops.h (_Iter_equals_iter): Store iterator
	not its referent.
	(_Iter_comp_to_iter): Likewise.
	* testsuite/25_algorithms/search/78346.cc: New test.

diff --git a/libstdc++-v3/include/bits/predefined_ops.h b/libstdc++-v3/include/bits/predefined_ops.h
index a5a7694..0624a38 100644
--- a/libstdc++-v3/include/bits/predefined_ops.h
+++ b/libstdc++-v3/include/bits/predefined_ops.h
@@ -24,7 +24,7 @@
 
 /** @file predefined_ops.h
  *  This is an internal header file, included by other library headers.
- *  You should not attempt to use it directly.
+ *  You should not attempt to use it directly. @headername{algorithm}
  */
 
 #ifndef _GLIBCXX_PREDEFINED_OPS_H
@@ -249,17 +249,17 @@ namespace __ops
   template
 struct _Iter_equals_iter
 {
-  typename std::iterator_traits<_Iterator1>::reference _M_ref;
+  _Iterator1 _M_it1;
 
   explicit
   _Iter_equals_iter(_Iterator1 __it1)
-	: _M_ref(*__it1)
+	: _M_it1(__it1)
   { }
 
   template
 	bool
 	operator()(_Iterator2 __it2)
-	{ return *__it2 == _M_ref; }
+	{ return *__it2 == *_M_it1; }
 };
 
   template
@@ -315,16 +315,16 @@ namespace __ops
 struct _Iter_comp_to_iter
 {
   _Compare _M_comp;
-  typename std::iterator_traits<_Iterator1>::reference _M_ref;
+  _Iterator1 _M_it1;
 
   _Iter_comp_to_iter(_Compare __comp, _Iterator1 __it1)
-	: _M_comp(_GLIBCXX_MOVE(__comp)), _M_ref(*__it1)
+	: _M_comp(_GLIBCXX_MOVE(__comp)), _M_it1(__it1)
   { }
 
   template
 	bool
 	operator()(_Iterator2 __it2)
-	{ return bool(_M_comp(*__it2, _M_ref)); }
+	{ return bool(_M_comp(*__it2, *_M_it1)); }
 };
 
   template
diff --git a/libstdc++-v3/testsuite/25_algorithms/search/78346.cc b/libstdc++-v3/testsuite/25_algorithms/search/78346.cc
new file mode 100644
index 000..6f003bd
--- /dev/null
+++ b/libstdc++-v3/testsuite/25_algorithms/search/78346.cc
@@ -0,0 +1,118 @@
+// Copyright (C) 2017 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+// { dg-do run { target c++11 } }
+
+#include 
+#include 
+
+bool values[100];
+
+unsigned next_id()
+{
+  static unsigned counter = 0;
+  VERIFY(counter < 100);
+  return counter++;
+}
+
+struct value
+{
+  int val;
+  const unsigned id;
+
+  value(int i = 0) : val(i), id(next_id()) { values[id] = true; }
+  value(const value& v) : val(v.val), id(next_id()) { values[id] = true; }
+  value& operator=(const value& v) { val = v.val; return *this; }
+  ~value() { values[id] = false; }
+};
+
+bool operator<(const value& lhs, const value& rhs)
+{
+  if (!values[lhs.id])
+throw lhs.id;
+  if (!values[rhs.id])
+throw rhs.id;
+  return lhs.val < rhs.val;
+}
+
+bool operator==(const value& lhs, const value& rhs)
+{
+  if (!values[lhs.id])
+throw lhs.id;
+  if (!values[rhs.id])
+throw rhs.id;
+  return lhs.val == rhs.val;
+}
+
+// A forward iterator that fails to meet the requirement that for any
+// two dereferenceable forward iterators, a == b implies &*a == &*b
+struct stashing_iterator
+{
+  typedef std::forward_iterator_tag iterator_category;
+  typedef value value_type;
+  typedef value_type const* pointer;
+  typedef value_type const& reference;
+  typedef std::ptrdiff_t difference_type;
+
+  stashing_iterator() : ptr(), stashed() { }
+  stashing_iterator(pointer p) : ptr(p), stashed() { stash(); }
+  stashing_iterator(const stashing_iterator&) = default;
+  stashing_iterator& operator=(const stashing_iterator&) = default;
+
+  stashing_iterator& operator++()
+  {
+++ptr;
+stash();
+return *this;
+  }
+
+  stashing_iterator operator++(int)
+  {
+stash

[PATCH] Add dg-require-alias to a ICF test (PR testsuite/79272).

2017-02-01 Thread Martin Liška
Hello.

As mentioned in the PR, hppa2.0w-hp-hpux11.11 does not support aliasing and thus
the scanned pattern is invalid. Fixed in patch.

Ready to be installed?
Martin
From 2bc48dc64eeb0dbe55ca8f9f5abe2841f78c3c80 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Tue, 31 Jan 2017 09:51:56 +0100
Subject: [PATCH] Add dg-require-alias to a ICF test (PR testsuite/79272).

gcc/testsuite/ChangeLog:

2017-01-31  Martin Liska  

	PR testsuite/79272
	* gcc.dg/ipa/pr77653.c: Add dg-require-alias to the test.
---
 gcc/testsuite/gcc.dg/ipa/pr77653.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.dg/ipa/pr77653.c b/gcc/testsuite/gcc.dg/ipa/pr77653.c
index f508815a3fb..f0b2b224091 100644
--- a/gcc/testsuite/gcc.dg/ipa/pr77653.c
+++ b/gcc/testsuite/gcc.dg/ipa/pr77653.c
@@ -1,3 +1,4 @@
+/* { dg-require-alias "" } */
 /* { dg-options "-O2 -fdump-ipa-icf-details"  } */
 
 int a, b, c, d, e, h, i, j, k, l;
-- 
2.11.0



[PATCH][C++] Improve memory use for PR12245

2017-02-01 Thread Richard Biener

Looks like we cache the answer to maybe_constant_value (INTEGER_CST)
which results in (-fmem-report):

cp/constexpr.c:4814 (maybe_constant_value) 67108816:100.0% 
10066310417:  0.0%   ggc

this can be improved trivially to

cp/constexpr.c:4817 (maybe_constant_value) 2032: 13.6%  
2144 2:  0.0%   ggc

with the following patch which I am testing right now.

Ok for trunk?

(just in case it causes some fallout because, err, some tcc_constant
is not really constant, what's the subset we can cheaply check here?
basically we want to avoid caching all INTEGER_CSTs we use for
CONSTRUCTOR_INDEX in large initializers)
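
The shape of the change, reduced to a toy memoizer, might look like the
following. Everything here is hypothetical (node, folder and the doubling
"evaluation" are placeholders) and none of it is GCC code; it only shows a
cache that is bypassed for inputs that are already constant.

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical stand-in for a tree node; not GCC's data structure.
struct node
{
  bool is_constant;
  int  value;
};

struct folder
{
  std::unordered_map<const node*, int> cache;

  int fold(const node* n)
  {
    if (n->is_constant)         // already constant: nothing to compute or cache
      return n->value;
    auto it = cache.find(n);
    if (it != cache.end())
      return it->second;
    int result = n->value * 2;  // placeholder for the expensive evaluation
    cache.emplace(n, result);
    return result;
  }
};
```

Constants then cost no cache entries at all, which is where the memory
saving in the report above would come from.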

Thanks,
Richard.

2017-02-01  Richard Biener  

cp/
* constexpr.c (maybe_constant_value): Do not cache
CONSTANT_CLASS_P nodes.

Index: gcc/cp/constexpr.c
===
--- gcc/cp/constexpr.c  (revision 245094)
+++ gcc/cp/constexpr.c  (working copy)
@@ -4810,6 +4810,9 @@ static GTY((deletable)) hash_map::create_ggc (101);
 


Re: [PATCH] Add dg-require-alias to a ICF test (PR testsuite/79272).

2017-02-01 Thread John David Anglin
On 2017-02-01, at 8:03 AM, Martin Liška wrote:

> As mentioned in the PR, hppa2.0w-hp-hpux11.11 does not support aliasing and 
> thus
> the scanned pattern is invalid. Fixed in patch.

Looks fine to me and obvious.

Dave
--
John David Anglin   dave.ang...@bell.net





[C++ Patch] PR 69637

2017-02-01 Thread Paolo Carlini

Hi,

I'm still working on a number of ICEs on invalid happening during error 
recovery...


In these cases, after a meaningful diagnostic, we ICE in 
cxx_eval_constant_expression because it doesn't handle OVERLOADs and 
TEMPLATE_ID_EXPRs in the main switch. I tried a number of different 
approaches, but I think the most straightforward one is the below: avoid 
setting up DECL_INITIAL (and SET_DECL_C_BIT_FIELD) in grokbitfield 
after the early diagnostic, exactly as currently happens when the width 
passed to the function is an error_mark_node. In particular, the 
approach seems robust because the callers don't use the width 
information afterward (it's only passed to grokbitfield). Tested 
x86_64-linux.


Thanks, Paolo.

///

/cp
2017-02-01  Paolo Carlini  

PR c++/69637
* decl2.c (grokbitfield): In case of error don't set-up DECL_INITIAL
to the width.

/testsuite
2017-02-01  Paolo Carlini  

PR c++/69637
* g++.dg/cpp0x/pr69637-1.C: New.
* g++.dg/cpp0x/pr69637-2.C: Likewise.
Index: cp/decl2.c
===
--- cp/decl2.c  (revision 245084)
+++ cp/decl2.c  (working copy)
@@ -1059,8 +1059,11 @@ grokbitfield (const cp_declarator *declarator,
  && !INTEGRAL_OR_UNSCOPED_ENUMERATION_TYPE_P (TREE_TYPE (width)))
error ("width of bit-field %qD has non-integral type %qT", value,
   TREE_TYPE (width));
-  DECL_INITIAL (value) = width;
-  SET_DECL_C_BIT_FIELD (value);
+  else
+   {
+ DECL_INITIAL (value) = width;
+ SET_DECL_C_BIT_FIELD (value);
+   }
 }
 
   DECL_IN_AGGR_P (value) = 1;
Index: testsuite/g++.dg/cpp0x/pr69637-1.C
===
--- testsuite/g++.dg/cpp0x/pr69637-1.C  (revision 0)
+++ testsuite/g++.dg/cpp0x/pr69637-1.C  (working copy)
@@ -0,0 +1,8 @@
+// { dg-do compile { target c++11 } }
+
+template 
+int foo () { return 1; }
+
+struct B {
+unsigned c: foo;  // { dg-error "non-integral type" }
+};
Index: testsuite/g++.dg/cpp0x/pr69637-2.C
===
--- testsuite/g++.dg/cpp0x/pr69637-2.C  (revision 0)
+++ testsuite/g++.dg/cpp0x/pr69637-2.C  (working copy)
@@ -0,0 +1,6 @@
+// { dg-do compile { target c++11 } }
+
+template 
+constexpr int foo () { return N; }
+
+struct B { unsigned c: foo, 3(); };  // { dg-error "non-integral type|expected" }


Re: [PATCH 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-02-01 Thread Alexander Monakov
Hi,

Earlier Richard mentioned the possibility to special-case GOMP_SIMT_ENTER to
allow passing privatized variables to it by reference without making them
addressable.  I now see that such special-casing is already done for
IFN_ATOMIC_COMPARE_EXCHANGE in tree-ssa.c: execute_update_addresses_taken ().
If that's the only place in the compiler where such special casing needs to
happen, and the rest of the compiler already tolerates it, can we indeed do:

  void *simtrec = GOMP_SIMT_ENTER (&var1, &var2, ...);

  for (...) { ... }

  var1 ={v} {CLOBBER};
  var2 ={v} {CLOBBER};
  ... ;
  GOMP_SIMT_EXIT (simtrec, &var1, &var2, ...);

Thanks.
Alexander


PR79286, ira combine_and_move_insns in loops

2017-02-01 Thread Alan Modra
This patch cures PR79286 by restoring the REG_DEAD note test used
prior to r235660, but modified to only exclude insns that may trap.
I'd like to allow combine/move without a REG_DEAD note in loops
because insns in loops often lack such notes, and I recall seeing
quite a few cases at the time I wrote r235660 where loops benefited
from allowing the combine/move to happen.

I've been battling hardware instability on my x86_64 box all day, so
hopefully this finally passes bootstrap and regression testing
overnight.  OK to apply assuming no regressions?

PR rtl-optimization/79286
* ira.c (combine_and_move_insns): Don't combine or move when
use_insn does not have a REG_DEAD note and def_insn may trap.
testsuite/
* gcc.c-torture/execute/pr79286.c: New.

diff --git a/gcc/ira.c b/gcc/ira.c
index 96b4b62..cdde775 100644
--- a/gcc/ira.c
+++ b/gcc/ira.c
@@ -3682,6 +3682,14 @@ combine_and_move_insns (void)
   gcc_assert (DF_REG_DEF_COUNT (regno) == 1 && DF_REF_INSN_INFO (def));
   rtx_insn *def_insn = DF_REF_INSN (def);
 
+  /* Even though we know this reg is set exactly once and used
+exactly once, check that the reg dies in the use insn or that
+the def insn can't trap.  This is to exclude degenerate cases
+in loops where the use occurs before the def.  See PR79286.  */
+  if (!find_reg_note (use_insn, REG_DEAD, regno_reg_rtx[regno])
+ && may_trap_p (PATTERN (def_insn)))
+   continue;
+
   /* We may not move instructions that can throw, since that
 changes basic block boundaries and we are not prepared to
 adjust the CFG to match.  */
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr79286.c b/gcc/testsuite/gcc.c-torture/execute/pr79286.c
new file mode 100644
index 000..e6d0e93
--- /dev/null
+++ b/gcc/testsuite/gcc.c-torture/execute/pr79286.c
@@ -0,0 +1,15 @@
+int a = 0, c = 0;
+static int d[][8] = {};
+
+int main ()
+{
+  int e;
+  for (int b = 0; b < 4; b++)
+{
+  __builtin_printf ("%d\n", b, e);
+  while (a && c++)
+   e = d[30][0];
+}
+
+  return 0;
+}

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH] [AArch64] Implement popcount pattern

2017-02-01 Thread James Greenhalgh
On Tue, Dec 13, 2016 at 11:59:36AM +, Kyrill Tkachov wrote:
> Hi Naveen,
> 
> On 13/12/16 11:51, Hurugalawadi, Naveen wrote:
> >Hi Kyrill,
> >
> >Thanks for reviewing the patch and your useful comments.
> >
> >>>looks good to me if it has gone through the normal required
> >>>bootstrap and testing, but I can't approve.
> >Bootstrapped and Regression Tested on aarch64-thunderx-linux.
> >
> >>>The rest of the MD file uses the term AdvSIMD. Also, the instrurction
> >>>is CNT rather than "pop count".
> >Done.
> >
> >>>__builtin_popcount takes an unsigned int, so this should be
> >>>scanning for absence of popcountsi2 instead?
> >Done.
> >
> >Please find attached the modified patch as per review comments
> >and let me know if its okay for Stage-1 or current branch.
> 
> This looks much better, thanks.
> I still have a minor nit about the testcase.
> 
> +long
> +foo1 (int x)
> +{
> +  return __builtin_popcountl (x);
> +}
> 
> On ILP32 systems this would still use the SImode patterns, so I suggest you 
> use __builtin_popcountll and
> an unsigned long long return type to ensure you always exercise the 64-bit 
> code.
> 
> +
> +/* { dg-final { scan-assembler-not "popcount" } } */
> 
> 
> This looks ok to me otherwise, but you'll need an ok from the aarch64 folk.

I didn't see a follow-up patch posted with Kyrill's comments addressed?

Thanks,
James
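
Kyrill's ILP32 point can be pictured with a minimal sketch. Under ILP32, long is only 32 bits wide, so a test built on __builtin_popcountl may not exercise the 64-bit path; unsigned long long is at least 64 bits everywhere. The helper name popcount64 is made up for this illustration and is not part of the patch under review:

```cpp
#include <climits>

// unsigned long long is at least 64 bits on every conforming target,
// whereas long is only 32 bits under ILP32.
static_assert (sizeof (unsigned long long) * CHAR_BIT >= 64,
               "unsigned long long must hold 64 bits");

// Hypothetical test helper: counts set bits in a 64-bit value
// independently of the width of long.  __builtin_popcountll is the
// GCC/Clang builtin that takes an unsigned long long argument.
unsigned long long
popcount64 (unsigned long long x)
{
  return __builtin_popcountll (x);
}
```

A testcase shaped like this would be expected to go through the DImode pattern on both LP64 and ILP32, which is what the review comment asks for.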



[PATCH][C++] Save memory/time when folding CONSTRUCTORs

2017-02-01 Thread Richard Biener

Currently we copy CONSTRUCTORs we cp_fold even if no elements fold
(we throw the copy away then).  That's wasteful and we can easily
do the copying on-demand.  For simplicity the following resorts
to memcpy-ing the whole original vector on the first change
and only overwrites changed elements in the folding loop
(I suspect that's usually faster given initializer elements do
usually not fold(?)).

That removes another 35MB of GC memory use from the PR12245 testcase.
(The biggest offender there is, unsurprisingly, the C++ preprocessor
tokens with 130MB, followed by 35MB for the CONSTRUCTOR and 40MB for
INTEGER_CSTs.)

Bootstrap / regtest running on x86_64-unknown-linux-gnu.

Ok for trunk if that succeeds?

Thanks,
Richard.

2017-02-01  Richard Biener  

PR cp/14179
cp/
* cp-gimplify.c (cp_fold): When folding a CONSTRUCTOR copy
it lazily on the first changed element only and copy it
fully upfront, only storing changed elements.

Index: gcc/cp/cp-gimplify.c
===
--- gcc/cp/cp-gimplify.c(revision 245094)
+++ gcc/cp/cp-gimplify.c(working copy)
@@ -2361,12 +2361,9 @@ cp_fold (tree x)
bool changed = false;
vec *elts = CONSTRUCTOR_ELTS (x);
vec *nelts = NULL;
-   vec_safe_reserve (nelts, vec_safe_length (elts));
FOR_EACH_VEC_SAFE_ELT (elts, i, p)
  {
tree op = cp_fold (p->value);
-   constructor_elt e = { p->index, op };
-   nelts->quick_push (e);
if (op != p->value)
  {
if (op == error_mark_node)
@@ -2375,7 +2372,13 @@ cp_fold (tree x)
changed = false;
break;
  }
-   changed = true;
+   if (! changed)
+ {
+   nelts = elts->copy ();
+   changed = true;
+ }
+   constructor_elt e = { p->index, op };
+   (*nelts)[i] = e;
  }
  }
if (changed)


Re: [PATCH] Add dg-require-alias to a ICF test (PR testsuite/79272).

2017-02-01 Thread Martin Liška
On 02/01/2017 02:14 PM, John David Anglin wrote:
> On 2017-02-01, at 8:03 AM, Martin Liška wrote:
> 
>> As mentioned in the PR, hppa2.0w-hp-hpux11.11 does not support aliasing and 
>> thus
>> the scanned pattern is invalid. Fixed in patch.
> 
> Looks fine to me and obvious.

Ok, installed as r245095.

Martin

> 
> Dave
> --
> John David Anglin dave.ang...@bell.net
> 
> 
> 



Re: [PATCH][C++] Save memory/time when folding CONSTRUCTORs

2017-02-01 Thread Jakub Jelinek
On Wed, Feb 01, 2017 at 03:03:15PM +0100, Richard Biener wrote:
> 2017-02-01  Richard Biener  
> 
>   PR cp/14179
>   cp/
>   * cp-gimplify.c (cp_fold): When folding a CONSTRUCTOR copy
>   it lazily on the first changed element only and copy it
>   fully upfront, only storing changed elements.
> 
> Index: gcc/cp/cp-gimplify.c
> ===
> --- gcc/cp/cp-gimplify.c  (revision 245094)
> +++ gcc/cp/cp-gimplify.c  (working copy)
> @@ -2361,12 +2361,9 @@ cp_fold (tree x)
>   bool changed = false;
>   vec *elts = CONSTRUCTOR_ELTS (x);
>   vec *nelts = NULL;
> - vec_safe_reserve (nelts, vec_safe_length (elts));
>   FOR_EACH_VEC_SAFE_ELT (elts, i, p)
> {
>   tree op = cp_fold (p->value);
> - constructor_elt e = { p->index, op };
> - nelts->quick_push (e);
>   if (op != p->value)
> {
>   if (op == error_mark_node)
> @@ -2375,7 +2372,13 @@ cp_fold (tree x)
>   changed = false;
>   break;
> }
> - changed = true;
> + if (! changed)
> +   {
> + nelts = elts->copy ();

Isn't the above part unnecessarily expensive, e.g. for the case
where you have a huge CONSTRUCTOR and the recursive cp_fold already
changes the very first value?  Wouldn't it be better to just do:
vec_safe_reserve (nelts, vec_safe_length (elts));
vec_quick_grow (nelts, i);
memcpy (nelts->address (), elts->address (),
i * sizeof (constructor_elt));
and then:

> + changed = true;
> +   }

> + constructor_elt e = { p->index, op };
> + (*nelts)[i] = e;

Remove the above two lines:
> }

and add:
if (changed)
  {
constructor_elt e = { p->index, op };
nelts->quick_push (e);
  }

> }
>   if (changed)

Otherwise it looks good to me, but Jason should have his last word.

Jakub
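
The variant suggested above (copy only the already-visited prefix on the first change, then push every later element) can be sketched outside of GCC as follows. This is a simplified model, not the patch itself: Elt stands in for constructor_elt, std::vector stands in for GCC's vec, and the fold callback and clamp_negative helper are made-up stand-ins for cp_fold.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for constructor_elt: an index plus a value.
struct Elt { int index; int value; };

// Hypothetical stand-in for cp_fold on one element value.
int
clamp_negative (int v)
{
  return v < 0 ? 0 : v;
}

// Returns true and fills OUT if any element changed; otherwise returns
// false and leaves OUT empty, so nothing is copied at all.
bool
fold_prefix_copy (const std::vector<Elt> &elts, int (*fold) (int),
                  std::vector<Elt> &out)
{
  bool changed = false;
  out.clear ();
  for (std::size_t i = 0; i < elts.size (); ++i)
    {
      int op = fold (elts[i].value);
      if (op != elts[i].value && !changed)
        {
          // First change: reserve room for everything, but copy only the
          // i elements visited so far (they are known to be unchanged).
          out.reserve (elts.size ());
          out.assign (elts.begin (),
                      elts.begin () + static_cast<std::ptrdiff_t> (i));
          changed = true;
        }
      // After the first change every element is appended, folded or not.
      if (changed)
        out.push_back (Elt { elts[i].index, op });
    }
  return changed;
}
```

The trade-off discussed in the thread is visible here: this version never copies an element twice, but once a change has been seen it pays one push per remaining element.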


Re: [PATCH v2] aarch64: Add split-stack initial support

2017-02-01 Thread Wilco Dijkstra
Hi Adhermerval,

The argument code looks good now, but this isn't right:

+  int ninsn = aarch64_internal_mov_immediate (reg10, GEN_INT (-allocate),
+ true, Pmode);
+  gcc_assert (ninsn == 1 || ninsn == 2);
+  if (ninsn == 1)
+    {
+  if (allocate > 0)
+   emit_insn (gen_insv_immdi (reg10, GEN_INT (0), GEN_INT (0x)));
+  else
+   emit_insn (gen_insv_immdi (reg10, GEN_INT (0), GEN_INT (0x0)));
+    }

Both insv_imm will always set the low 16 bits of X10 to zero, corrupting
the value of the first instruction. It seems best to emit both
instructions explicitly and use positive values to avoid the zero special
case (this should make the linker code updating the allocation simpler
too):

gen_rtx_SET (reg10, GEN_INT (allocate & 0x))
gen_insv_immdi (reg10, GEN_INT (16), GEN_INT ((allocate & 0x) >> 16))

I bet this will avoid the crash you mentioned.

Wilco
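
The effect described above can be modelled in a few lines. This is only a sketch under stated assumptions: insert_imm16 models a MOVK-style 16-bit field insert (which is how insv_imm is treated here), mov_imm16 models the initial move, and the 0xffff mask is an assumption, since the hex constants in the message are truncated in this archive. None of these helper names exist in GCC.

```cpp
#include <cstdint>

// Models the initial move: the register becomes the 16-bit immediate.
std::uint64_t
mov_imm16 (std::uint64_t imm16)
{
  return imm16 & 0xffff;
}

// Models a MOVK-style insert: replace the 16-bit field at bit SHIFT and
// keep every other bit of REG unchanged.
std::uint64_t
insert_imm16 (std::uint64_t reg, unsigned shift, std::uint64_t imm16)
{
  const std::uint64_t mask = std::uint64_t (0xffff) << shift;
  return (reg & ~mask) | ((imm16 & 0xffff) << shift);
}

// Builds a non-negative value below 2^32 from its two 16-bit halves:
// low half first, then the next half inserted at bit 16.
std::uint64_t
build_imm32 (std::uint64_t allocate)
{
  std::uint64_t reg = mov_imm16 (allocate & 0xffff);
  return insert_imm16 (reg, 16, (allocate >> 16) & 0xffff);
}
```

In this model, insert_imm16 (reg, 0, 0) clears the low 16 bits of whatever the first instruction loaded, which is the corruption being pointed out, while inserting at bit 16 leaves the low half intact.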
    

Re: [PATCH][C++] Save memory/time when folding CONSTRUCTORs

2017-02-01 Thread Richard Biener
On Wed, 1 Feb 2017, Jakub Jelinek wrote:

> On Wed, Feb 01, 2017 at 03:03:15PM +0100, Richard Biener wrote:
> > 2017-02-01  Richard Biener  
> > 
> > PR cp/14179
> > cp/
> > * cp-gimplify.c (cp_fold): When folding a CONSTRUCTOR copy
> > it lazily on the first changed element only and copy it
> > fully upfront, only storing changed elements.
> > 
> > Index: gcc/cp/cp-gimplify.c
> > ===
> > --- gcc/cp/cp-gimplify.c(revision 245094)
> > +++ gcc/cp/cp-gimplify.c(working copy)
> > @@ -2361,12 +2361,9 @@ cp_fold (tree x)
> > bool changed = false;
> > vec *elts = CONSTRUCTOR_ELTS (x);
> > vec *nelts = NULL;
> > -   vec_safe_reserve (nelts, vec_safe_length (elts));
> > FOR_EACH_VEC_SAFE_ELT (elts, i, p)
> >   {
> > tree op = cp_fold (p->value);
> > -   constructor_elt e = { p->index, op };
> > -   nelts->quick_push (e);
> > if (op != p->value)
> >   {
> > if (op == error_mark_node)
> > @@ -2375,7 +2372,13 @@ cp_fold (tree x)
> > changed = false;
> > break;
> >   }
> > -   changed = true;
> > +   if (! changed)
> > + {
> > +   nelts = elts->copy ();
> 
> Isn't the above part unnecessarily expensive, e.g. for the case
> where you have huge CONSTRUCTOR and recursive cp_fold changes already
> the very first value?  Wouldn't it be better to just do:
>   vec_safe_reserve (nelts, vec_safe_length (elts));
>   vec_quick_grow (nelts, i);
>   memcpy (nelts->address (), elts->address (),
>   i * sizeof (constructor_elt));
> and then:

It really depends on how many constructor elements usually fold.
If every subsequent element folds, then yes; otherwise a memcpy
is going to be faster than individual vec_quick_push ()s with
cache-thrashing cp_fold calls in between.

But I didn't benchmark anything, I just looked at memory use
(for the case where nothing folds).  And elts->copy () looks
much "cleaner" than this reserve/grow/memcpy ;)

So I'll do whatever Jason suggests here.

Thanks,
Richard.


Re: [PATCH][C++] Improve memory use for PR12245

2017-02-01 Thread Jakub Jelinek
On Wed, Feb 01, 2017 at 02:14:20PM +0100, Richard Biener wrote:
> 
> Looks like we cache the answer to maybe_constant_value (INTEGER_CST)
> which results in (-fmem-report):
> 
> cp/constexpr.c:4814 (maybe_constant_value) 67108816:100.0% 
> 10066310417:  0.0%   ggc
> 
> this can be improved trivially to
> 
> cp/constexpr.c:4817 (maybe_constant_value) 2032: 13.6%  
> 2144 2:  0.0%   ggc
> 
> with the following patch which I am testing right now.
> 
> Ok for trunk?
> 
> (just in case it causes some fallout because, err, some tcc_constant
> is not really constant, what's the subset we can cheaply check here?
> basically we want to avoid caching all INTEGER_CSTs we use for
> CONSTRUCTOR_INDEX in large initializers)

I'm worried that we don't want to handle all the constants that way.
As I wrote on IRC, I see some problematic constants:
1) not sure if constants can't be
   potential_nondependent_constant_expression, then we don't want to return
   them
2) cxx_eval_outermost_constant_expr has some special handling of
   trees with vector type (and array type)
3) constants with TREE_OVERFLOW should go through maybe_constant_value_1
4) INTEGER_CSTs with POINTER_TYPE (if they aren't 0) likewise

For 3) and 4) I believe maybe_constant_value is supposed to wrap the
constants into a NOP_EXPR or something.

> 2017-02-01  Richard Biener  
> 
>   cp/
>   * constexpr.c (maybe_constant_value): Do not cache
>   CONSTANT_CLASS_P nodes.
> 
> Index: gcc/cp/constexpr.c
> ===
> --- gcc/cp/constexpr.c(revision 245094)
> +++ gcc/cp/constexpr.c(working copy)
> @@ -4810,6 +4810,9 @@ static GTY((deletable)) hash_map  tree
>  maybe_constant_value (tree t, tree decl)
>  {
> +  if (CONSTANT_CLASS_P (t))
> +return t;
> +
>if (cv_cache == NULL)
>  cv_cache = hash_map::create_ggc (101);
>  

Jakub
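
One way to picture a narrower early return that respects points 3) and 4) is the sketch below. It uses a deliberately tiny mock of a tree node; Kind, TypeKind, Node and can_return_uncached are all hypothetical names for illustration, point 1) is not modelled, and nothing here claims to be what was eventually committed.

```cpp
// Hypothetical, much-simplified model of a constant node.
enum class Kind { IntegerCst, RealCst, VectorCst, Other };
enum class TypeKind { Integer, Pointer, Vector, Array, Other };

struct Node
{
  Kind kind;       // which kind of constant this is
  TypeKind type;   // kind of the node's type
  bool overflow;   // models TREE_OVERFLOW being set
};

// True only for the cheap, unproblematic case: an INTEGER_CST of plain
// integer type without the overflow flag.  Everything else, including
// pointer-typed and vector/array-typed constants, takes the normal
// (cached) path.
bool
can_return_uncached (const Node &t)
{
  return t.kind == Kind::IntegerCst
         && t.type == TypeKind::Integer
         && !t.overflow;
}
```

Such a filter would still cover the INTEGER_CSTs used as CONSTRUCTOR_INDEX in large initializers, which is the case the memory report above is about.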


Re: [PATCH][C++] Save memory/time when folding CONSTRUCTORs

2017-02-01 Thread Jakub Jelinek
On Wed, Feb 01, 2017 at 03:35:49PM +0100, Richard Biener wrote:
> > >   changed = false;
> > >   break;
> > > }
> > > - changed = true;
> > > + if (! changed)
> > > +   {
> > > + nelts = elts->copy ();
> > 
> > Isn't the above part unnecessarily expensive, e.g. for the case
> > where you have huge CONSTRUCTOR and recursive cp_fold changes already
> > the very first value?  Wouldn't it be better to just do:
> > vec_safe_reserve (nelts, vec_safe_length (elts));
> > vec_quick_grow (nelts, i);
> > memcpy (nelts->address (), elts->address (),
> > i * sizeof (constructor_elt));
> > and then:
> 
> It really depends on how many constructor elements usually fold.
> If every next element will fold then yes, otherwise a memcpy
> is going to be faster than individual vec_quick_push ()s with
> cache-trashing cp_fold calls inbetween.
> 
> But I didn't benchmark anything, I just looked at memory use
> (for the case where nothing folds).  And elts->copy () looks
> much "cleaner" than this reserve/grow/memcpy ;)

Maybe.  But then it would be better not to do:
+   constructor_elt e = { p->index, op };
+   (*nelts)[i] = e;
but just
(*nelts)[i].value = op;
because (*nelts)[i].index has been already copied and is the same,
so no need to override it again.

Jakub


Re: [PATCH][C++] Save memory/time when folding CONSTRUCTORs

2017-02-01 Thread Richard Biener
On Wed, 1 Feb 2017, Jakub Jelinek wrote:

> On Wed, Feb 01, 2017 at 03:35:49PM +0100, Richard Biener wrote:
> > > > changed = false;
> > > > break;
> > > >   }
> > > > -   changed = true;
> > > > +   if (! changed)
> > > > + {
> > > > +   nelts = elts->copy ();
> > > 
> > > Isn't the above part unnecessarily expensive, e.g. for the case
> > > where you have huge CONSTRUCTOR and recursive cp_fold changes already
> > > the very first value?  Wouldn't it be better to just do:
> > >   vec_safe_reserve (nelts, vec_safe_length (elts));
> > >   vec_quick_grow (nelts, i);
> > >   memcpy (nelts->address (), elts->address (),
> > >   i * sizeof (constructor_elt));
> > > and then:
> > 
> > It really depends on how many constructor elements usually fold.
> > If every next element will fold then yes, otherwise a memcpy
> > is going to be faster than individual vec_quick_push ()s with
> > cache-trashing cp_fold calls inbetween.
> > 
> > But I didn't benchmark anything, I just looked at memory use
> > (for the case where nothing folds).  And elts->copy () looks
> > much "cleaner" than this reserve/grow/memcpy ;)
> 
> Maybe.  But then it would be better not to do:
> +   constructor_elt e = { p->index, op }; 
>  
> +   (*nelts)[i] = e;  
>  
> but just
>   (*nelts)[i].value = op;
> because (*nelts)[i].index has been already copied and is the same,
> so no need to override it again.

Heh, yes - good catch.  Consider the patch changed that way.

Richard.


Re: [PATCH 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-02-01 Thread Jakub Jelinek
On Wed, Feb 01, 2017 at 04:28:14PM +0300, Alexander Monakov wrote:
> Hi,
> 
> Earlier Richard mentioned the possibility to special-case GOMP_SIMT_ENTER to
> allow passing privatized variables to it by reference without making them
> addressable.  I now see that such special-casing is already done for
> IFN_ATOMIC_COMPARE_EXCHANGE in tree-ssa.c: execute_update_addresses_taken ().
> If that's the only place in the compiler where such special casing needs to
> happen, and the rest of the compiler already tolerates it, can we indeed do:

IFN_ASAN_POISON is treated that way too.  That also means that if a
variable is previously addressable and the only spot that takes its address
is that IFN, it can be rewritten into SSA form, but the IFN has to be
adjusted to something different which no longer takes the address.  Perhaps for:

>   void *simtrec = GOMP_SIMT_ENTER (&var1, &var2, ...);
> 
>   for (...) { ... }
> 
>   var1 ={v} {CLOBBER};
>   var2 ={v} {CLOBBER};
>   ... ;
>   GOMP_SIMT_EXIT (simtrec, &var1, &var2, ...)'

that would just mean dropping that &varN from the two ifns (and the
clobbers would get removed, as usual when rewriting something into SSA).

That said, I understand how you would add these &varN arguments during
lowering, but I don't understand what you would want to do during
inlining: if you have addressable vars in the inlined function, you need
to avoid escaping those from the SIMT region.

I believe the abnormal edges, which turn the SIMT region into the kind of
loop that it to some extent is, would take care of this even without
having to add those addresses to the ifns; but if you don't want to go
that way, presumably the inliner would need to find those GOMP_SIMT_*
statements around the inline caller, if any, and adjust those?

Jakub


Re: [C++ Patch] PR 69637

2017-02-01 Thread Nathan Sidwell

On 02/01/2017 08:24 AM, Paolo Carlini wrote:

> Hi,
>
> I'm still working on a number of ICEs on invalid happening during error
> recovery...
>
> In these cases, after a meaningful diagnostic, we ICE in
> cxx_eval_constant_expression because it doesn't handle OVERLOADs and
> TEMPLATE_ID_EXPRs in the main switch. I tried a number of different
> approaches, but I think the most straightforward one is the below: in
> grokbitfield, avoid setting DECL_INITIAL (and SET_DECL_C_BIT_FIELD)
> after the early diagnostic, exactly as currently happens when the width
> passed to the function is an error_mark_node. In particular, the
> approach seems robust because the callers don't use the width
> information afterward (it's only passed to grokbitfield). Tested
> x86_64-linux.



ok, thanks


--
Nathan Sidwell


Re: [PATCH] Fix PR79278, amend ADJUST_FIELD_ALIGN interface

2017-02-01 Thread Trevor Saunders
On Wed, Feb 01, 2017 at 09:42:08AM +0100, Richard Biener wrote:
> On Tue, 31 Jan 2017, Jeff Law wrote:
> 
> > On 01/31/2017 02:01 AM, Richard Biener wrote:
> > > 
> > > This amends ADJUST_FIELD_ALIGN to always get the type of the field
> > > as argument but make the field itself optional.  All actual target
> > > macro implementations only look at the type of the field but FRV
> > > (which seems to misuse ADJUST_FIELD_ALIGN to do bitfield layout
> > > rather than using one of the three standard ways - Alex/Nick?).
> > Didn't we deprecate FRV?  Oh, that was MEP..  Nevermind.
> > 
> > 
> > 
> > > This speeds up min_align_of_type (no longer needs to build a FIELD_DECL)
> > > and thus (IMHO) makes it usable from get_object_alignment.  This
> > > causes us no longer to return bogus answers for indirect accesses to
> > > doubles on i?86 and expand to RTL with proper MEM_ALIGN.  (it also
> > > makes the previous fix for PR79256 no longer necessary)
> > > 
> > > Bootstrap and regtest running on x86_64-unknown-linux-gnu - is this ok
> > > for trunk at this stage?
> > > 
> > > grep found a ADJUST_FIELD_ALIGN use in libobjc/encoding.c but that
> > > is fed a C string(!?) as FIELD_DECL so I discounted it as unrelated

Yeah, this is all rather horrifying, but that actually "works" because
libobjc/encoding.c redefines the tree accessor macros it needs to work
with the strings.

> > > (and grep didn't find a way this macro could be defined there)?
> > Presumably this is the code that takes the structure and encodes information
> > about it for the runtime.  Though taking a string sounds horribly broken.
> > 
> > I suspect it gets included via tm.h.

It does.

> > I bet if someone built a cross far enough to build libobjc we could see it 
> > in
> > action.  It does make one wonder how this part of libobjc could possibly be
> > working on targets that define ADJUST_FIELD_ALIGN.
> > 
> > I'll note it's been that way since libobjc was moved into its own directory,
> > but wasn't like that prior to moving into its own directory.
> > 
> > I have no idea what to do here...
> 
> As objc builds just fine on x86_64 which does define ADJUST_FIELD_ALIGN
> including tm.h can't be the whole story here...
> 
> In fact preprocessed source on x86_64 with -dD shows no ADJUST_FIELD_ALIGN

i386.h conditions the definition on IN_TARGET_LIBS

> but instead
> 
> #define rs6000_special_adjust_field_align_p(FIELD,COMPUTED) 0
> 
> which means it must be sth powerpc specific...  FRV for example has

Yeah, different targets deal with this mess in different ways.  It looks
like frv uses a different macro and ppc redefines macros as needed;
rs6000_special_adjust_field_align_p is also defined in the ppc headers.

> 
> /* @@@ A hack, needed because libobjc wants to use ADJUST_FIELD_ALIGN for
>some reason.  */
> #ifdef IN_TARGET_LIBS
> #define BIGGEST_FIELD_ALIGNMENT 64
> #else
> /* An expression for the alignment of a structure field FIELD if the
>alignment computed in the usual way is COMPUTED.  GCC uses this
>value instead of the value in `BIGGEST_ALIGNMENT' or
>`BIGGEST_FIELD_ALIGNMENT', if defined, for structure fields only.  */
> #define ADJUST_FIELD_ALIGN(FIELD, COMPUTED) \
>   frv_adjust_field_align (FIELD, COMPUTED)
> #endif
> 
> Similar x86_64.
> 
> So it seems on power this might be an issue and thus I'd need to
> adjust the macro use - but not sure what to pass as "type" here...

Sorry but I seem to have successfully purged my brain of the details on
how this works.

Trev

> 
> I'll try to build a cross to ppc64 and see what happens.
> 
> Richard.


Re: [PATCH][C++] Save memory/time when folding CONSTRUCTORs

2017-02-01 Thread Nathan Sidwell

On 02/01/2017 09:03 AM, Richard Biener wrote:


> Currently we copy CONSTRUCTORs we cp_fold even if no elements fold
> (we throw the copy away then).  That's wasteful and we can easily
> do the copying on-demand.  For simplicity the following resorts
> to memcpy-ing the whole original vector on the first change
> and only overwrites changed elements in the folding loop
> (I suspect that's usually faster given initializer elements do
> usually not fold(?)).
>
> That removes another 35MB GC memory use from the PR12245 testcase.
> (the biggest offender there is, non-surprisingly the C++ preprocessor
> tokens with 130MB followed by 35MB for the CONSTRUCTOR and 40MB for
> INTEGER_CSTs).
>
> Bootstrap / regtest running on x86_64-unknown-linux-gnu.
>
> Ok for trunk if that succeeds?
>
> Thanks,
> Richard.
>
> 2017-02-01  Richard Biener  
>
> PR cp/14179
> cp/
> * cp-gimplify.c (cp_fold): When folding a CONSTRUCTOR copy
> it lazily on the first changed element only and copy it
> fully upfront, only storing changed elements.
>
> Index: gcc/cp/cp-gimplify.c
> ===
> --- gcc/cp/cp-gimplify.c(revision 245094)
> +++ gcc/cp/cp-gimplify.c(working copy)
> @@ -2361,12 +2361,9 @@ cp_fold (tree x)
> bool changed = false;
> vec *elts = CONSTRUCTOR_ELTS (x);
> vec *nelts = NULL;

> +   if (! changed)
> + {
> +   nelts = elts->copy ();
> +   changed = true;
> + }


doesn't this make 'changed' a synonym for 'nelts != NULL'?  (so you 
could ditch the former)


nathan

--
Nathan Sidwell


Re: [PATCH][C++] Improve memory use for PR12245

2017-02-01 Thread Richard Biener
On Wed, 1 Feb 2017, Jakub Jelinek wrote:

> On Wed, Feb 01, 2017 at 02:14:20PM +0100, Richard Biener wrote:
> > 
> > Looks like we cache the answer to maybe_constant_value (INTEGER_CST)
> > which results in (-fmem-report):
> > 
> > cp/constexpr.c:4814 (maybe_constant_value) 67108816:100.0% 
> > 10066310417:  0.0%   ggc
> > 
> > this can be improved trivially to
> > 
> > cp/constexpr.c:4817 (maybe_constant_value) 2032: 13.6%  
> > 2144 2:  0.0%   ggc
> > 
> > with the following patch which I am testing right now.
> > 
> > Ok for trunk?
> > 
> > (just in case it causes some fallout because, err, some tcc_constant
> > is not really constant, what's the subset we can cheaply check here?
> > basically we want to avoid caching all INTEGER_CSTs we use for
> > CONSTRUCTOR_INDEX in large initializers)
> 
> I'm worried that we don't want to handle all the constants that way.
> As I wrote on IRC, I see some problematic constants:
> 1) not sure if constants can't be
>potential_nondependent_constant_expression, then we don't want to return
>them
> 2) cxx_eval_outermost_constant_expr has some special handling of
>trees with vector type (and array type)
> 3) constants with TREE_OVERFLOW should go through maybe_constant_value_1
> 4) INTEGER_CSTs with POINTER_TYPE (if they aren't 0) likewise
> 
> For 3) and 4) I believe maybe_constant_value is supposed to wrap the
> constants into a NOP_EXPR or something.

Just to mention, bootstrap & regtest completed successfully without
regressions on x86_64-unknown-linux-gnu so we at least have zero
testing coverage for the cases that break.

I'll wait for Jason to suggest specific things to avoid, TREE_OVERFLOW
and pointer types are easy (no need to special case zero, it's just
one entry per pointer type).

Richard.

> > 2017-02-01  Richard Biener  
> > 
> > cp/
> > * constexpr.c (maybe_constant_value): Do not cache
> > CONSTANT_CLASS_P nodes.
> > 
> > Index: gcc/cp/constexpr.c
> > ===
> > --- gcc/cp/constexpr.c  (revision 245094)
> > +++ gcc/cp/constexpr.c  (working copy)
> > @@ -4810,6 +4810,9 @@ static GTY((deletable)) hash_map >  tree
> >  maybe_constant_value (tree t, tree decl)
> >  {
> > +  if (CONSTANT_CLASS_P (t))
> > +return t;
> > +
> >if (cv_cache == NULL)
> >  cv_cache = hash_map::create_ggc (101);
> >  


Re: [PATCH][C++] Save memory/time when folding CONSTRUCTORs

2017-02-01 Thread Richard Biener
On Wed, 1 Feb 2017, Nathan Sidwell wrote:

> On 02/01/2017 09:03 AM, Richard Biener wrote:
> > 
> > Currently we copy CONSTRUCTORs we cp_fold even if no elements fold
> > (we throw the copy away then).  That's wasteful and we can easily
> > do the copying on-demand.  For simplicity the following resorts
> > to memcpy-ing the whole original vector on the first change
> > and only overwrites changed elements in the folding loop
> > (I suspect that's usually faster given initializer elements do
> > usually not fold(?)).
> > 
> > That removes another 35MB GC memory use from the PR12245 testcase.
> > (the biggest offender there is, non-surprisingly the C++ preprocessor
> > tokens with 130MB followed by 35MB for the CONSTRUCTOR and 40MB for
> > INTEGER_CSTs).
> > 
> > Bootstrap / regtest running on x86_64-unknown-linux-gnu.
> > 
> > Ok for trunk if that succeeds?
> > 
> > Thanks,
> > Richard.
> > 
> > 2017-02-01  Richard Biener  
> > 
> > PR cp/14179
> > cp/
> > * cp-gimplify.c (cp_fold): When folding a CONSTRUCTOR copy
> > it lazily on the first changed element only and copy it
> > fully upfront, only storing changed elements.
> > 
> > Index: gcc/cp/cp-gimplify.c
> > ===
> > --- gcc/cp/cp-gimplify.c(revision 245094)
> > +++ gcc/cp/cp-gimplify.c(working copy)
> > @@ -2361,12 +2361,9 @@ cp_fold (tree x)
> > bool changed = false;
> > vec *elts = CONSTRUCTOR_ELTS (x);
> > vec *nelts = NULL;
> 
> > +   if (! changed)
> > + {
> > +   nelts = elts->copy ();
> > +   changed = true;
> > + }
> 
> doesn't this make 'changed' a synonym for 'nelts != NULL'?  (so you could
> ditch the former)

True.  Updated patch below.

Richard.

2017-02-01  Richard Biener  

PR cp/14179
cp/
* cp-gimplify.c (cp_fold): When folding a CONSTRUCTOR copy
it lazily on the first changed element only and copy it
fully upfront, only storing changed elements.

Index: gcc/cp/cp-gimplify.c
===
--- gcc/cp/cp-gimplify.c(revision 245096)
+++ gcc/cp/cp-gimplify.c(working copy)
@@ -2358,15 +2358,11 @@ cp_fold (tree x)
   {
unsigned i;
constructor_elt *p;
-   bool changed = false;
vec *elts = CONSTRUCTOR_ELTS (x);
vec *nelts = NULL;
-   vec_safe_reserve (nelts, vec_safe_length (elts));
FOR_EACH_VEC_SAFE_ELT (elts, i, p)
  {
tree op = cp_fold (p->value);
-   constructor_elt e = { p->index, op };
-   nelts->quick_push (e);
if (op != p->value)
  {
if (op == error_mark_node)
@@ -2375,10 +2371,12 @@ cp_fold (tree x)
changed = false;
break;
  }
-   changed = true;
+   if (nelts == NULL)
+ nelts = elts->copy ();
+   (*nelts)[i].value = op;
  }
  }
-   if (changed)
+   if (nelts)
  x = build_constructor (TREE_TYPE (x), nelts);
else
  vec_free (nelts);


Re: [PATCH][C++] Save memory/time when folding CONSTRUCTORs

2017-02-01 Thread Jakub Jelinek
On Wed, Feb 01, 2017 at 04:18:49PM +0100, Richard Biener wrote:
> 2017-02-01  Richard Biener  
> 
>   PR cp/14179
>   cp/
>   * cp-gimplify.c (cp_fold): When folding a CONSTRUCTOR copy
>   it lazily on the first changed element only and copy it
>   fully upfront, only storing changed elements.
> 
> Index: gcc/cp/cp-gimplify.c
> ===
> --- gcc/cp/cp-gimplify.c  (revision 245096)
> +++ gcc/cp/cp-gimplify.c  (working copy)
> @@ -2358,15 +2358,11 @@ cp_fold (tree x)
>{
>   unsigned i;
>   constructor_elt *p;
> - bool changed = false;
>   vec *elts = CONSTRUCTOR_ELTS (x);
>   vec *nelts = NULL;
> - vec_safe_reserve (nelts, vec_safe_length (elts));
>   FOR_EACH_VEC_SAFE_ELT (elts, i, p)
> {
>   tree op = cp_fold (p->value);
> - constructor_elt e = { p->index, op };
> - nelts->quick_push (e);
>   if (op != p->value)
> {
>   if (op == error_mark_node)
> @@ -2375,10 +2371,12 @@ cp_fold (tree x)
>   changed = false;
>   break;
> }
> - changed = true;
> + if (nelts == NULL)
> +   nelts = elts->copy ();
> + (*nelts)[i].value = op;
> }
> }
> - if (changed)
> + if (nelts)
> x = build_constructor (TREE_TYPE (x), nelts);
>   else
> vec_free (nelts);

vec_free (nelts); doesn't make sense in this case though,
so I'd also remove the last 2 lines.

Jakub


Re: [PATCH][C++] Save memory/time when folding CONSTRUCTORs

2017-02-01 Thread Nathan Sidwell

On 02/01/2017 10:18 AM, Richard Biener wrote:


> True.  Updated patch below.
>
> Richard.
>
> 2017-02-01  Richard Biener  
>
> PR cp/14179
> cp/
> * cp-gimplify.c (cp_fold): When folding a CONSTRUCTOR copy
> it lazily on the first changed element only and copy it
> fully upfront, only storing changed elements.


Looks good, with the tweak Jakub noticed.

nathan

--
Nathan Sidwell


Re: [PATCH] Fix PR79278, amend ADJUST_FIELD_ALIGN interface

2017-02-01 Thread Jakub Jelinek
On Wed, Feb 01, 2017 at 10:27:27AM -0500, Trevor Saunders wrote:
> Yeah, different targets deal with this mess in different ways, looks
> like frv uses a different macro and ppc redefines macros as needed
> rs6000_special_adjust_field_align_p is also defined in ppc headers.
> 
> > 
> > /* @@@ A hack, needed because libobjc wants to use ADJUST_FIELD_ALIGN for
> >some reason.  */
> > #ifdef IN_TARGET_LIBS
> > #define BIGGEST_FIELD_ALIGNMENT 64
> > #else
> > /* An expression for the alignment of a structure field FIELD if the
> >alignment computed in the usual way is COMPUTED.  GCC uses this
> >value instead of the value in `BIGGEST_ALIGNMENT' or
> >`BIGGEST_FIELD_ALIGNMENT', if defined, for structure fields only.  */
> > #define ADJUST_FIELD_ALIGN(FIELD, COMPUTED) \
> >   frv_adjust_field_align (FIELD, COMPUTED)
> > #endif
> > 
> > Similar x86_64.
> > 
> > So it seems on power this might be an issue and thus I'd need to
> > adjust the macro use - but not sure what to pass as "type" here...
> 
> Sorry but I seem to have successfully purged my brain of the details on
> how this works.

IMNSHO what libobjc should have used is just some new builtin that would
allow it to query such information from the compiler.

Jakub


Re: [PATCH][C++] Save memory/time when folding CONSTRUCTORs

2017-02-01 Thread Richard Biener
On Wed, 1 Feb 2017, Nathan Sidwell wrote:

> On 02/01/2017 10:18 AM, Richard Biener wrote:
> 
> > True.  Updated patch below.
> > 
> > Richard.
> > 
> > 2017-02-01  Richard Biener  
> > 
> > PR cp/14179
> > cp/
> > * cp-gimplify.c (cp_fold): When folding a CONSTRUCTOR copy
> > it lazily on the first changed element only and copy it
> > fully upfront, only storing changed elements.
> 
> Looks good, with the tweak Jakub noticed.

And one more tweak, I have to free it for the error_mark_case.
(vec_free sets nelts to NULL)
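
For readers following along, the copy-on-first-change idea can be sketched outside of GCC.  This is only a toy model under stated assumptions: Elt, fold_value and fold_elements are hypothetical names, std::vector stands in for GCC's vec, and a negative value plays the role of error_mark_node.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <vector>

// Toy stand-in for constructor_elt: an index and a value.
struct Elt { int index; int value; };

// Hypothetical folding step: non-zero even values fold to half,
// everything else is returned unchanged.
static int fold_value (int v)
{
  return (v != 0 && v % 2 == 0) ? v / 2 : v;
}

// Returns a modified copy only if some element changed.  Returns a null
// pointer if nothing changed, or if an error was seen (then *error is set).
std::unique_ptr<std::vector<Elt> >
fold_elements (const std::vector<Elt> &elts, bool *error)
{
  std::unique_ptr<std::vector<Elt> > nelts;
  *error = false;
  for (std::size_t i = 0; i < elts.size (); ++i)
    {
      if (elts[i].value < 0)
        {
          // Error case: drop any copy made so far, as the final patch
          // does with vec_free (nelts).
          nelts.reset ();
          *error = true;
          break;
        }
      int op = fold_value (elts[i].value);
      if (op != elts[i].value)
        {
          // Copy the whole vector lazily, on the first changed element.
          if (!nelts)
            nelts.reset (new std::vector<Elt> (elts));
          // Afterwards only changed elements are stored.
          (*nelts)[i].value = op;
        }
    }
  return nelts;
}
```

When nothing folds, no allocation happens at all, which appears to be where the memory and time savings in the subject line come from.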

Richard.

2017-02-01  Richard Biener  

PR cp/14179
cp/
* cp-gimplify.c (cp_fold): When folding a CONSTRUCTOR copy
it lazily on the first changed element only and copy it
fully upfront, only storing changed elements.

Index: gcc/cp/cp-gimplify.c
===
--- gcc/cp/cp-gimplify.c(revision 245096)
+++ gcc/cp/cp-gimplify.c(working copy)
@@ -2358,30 +2358,26 @@ cp_fold (tree x)
   {
unsigned i;
constructor_elt *p;
-   bool changed = false;
vec *elts = CONSTRUCTOR_ELTS (x);
vec *nelts = NULL;
-   vec_safe_reserve (nelts, vec_safe_length (elts));
FOR_EACH_VEC_SAFE_ELT (elts, i, p)
  {
tree op = cp_fold (p->value);
-   constructor_elt e = { p->index, op };
-   nelts->quick_push (e);
if (op != p->value)
  {
if (op == error_mark_node)
  {
x = error_mark_node;
-   changed = false;
+   vec_free (nelts);
break;
  }
-   changed = true;
+   if (nelts == NULL)
+ nelts = elts->copy ();
+   (*nelts)[i].value = op;
  }
  }
-   if (changed)
+   if (nelts)
  x = build_constructor (TREE_TYPE (x), nelts);
-   else
- vec_free (nelts);
break;
   }
 case TREE_VEC:


Re: [PATCH 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-02-01 Thread Alexander Monakov
On Wed, 1 Feb 2017, Jakub Jelinek wrote:
> IFN_ASAN_POISON is treated that way too.  That also means that if a
> variable is previously addressable and the only spot that takes its address
> is that IFN, it can be rewritten into SSA form, but the IFN has to be
> adjusted to something different which no longer takes the address.  Perhaps 
> for:
> 
> >   void *simtrec = GOMP_SIMT_ENTER (&var1, &var2, ...);
> > 
> >   for (...) { ... }
> > 
> >   var1 ={v} {CLOBBER};
> >   var2 ={v} {CLOBBER};
> >   ... ;
> >   GOMP_SIMT_EXIT (simtrec, &var1, &var2, ...)'
> 
> that would just mean dropping that &varN from the two ifns (and the clobbers would
> be as usually when rewriting something into SSA get removed).

Ack; thanks.

> That said, I understand how would you add these &varN arguments during
> lowering, but don't understand what would you want to do during inlining,
> if you have addressable vars in inlined function, you need to avoid
> escaping those from the SIMT region.
> 
> I believe the abnormal edges turning the SIMT region into kind of loop that
> it to some extent is would take care of this even without having to add
> those addresses to the ifns, but if you don't want to go that way,
> supposedly the inliner would need to find those GOMP_SIMT_* statements
> around the inline caller if any and adjust those?

Yes; I imagine the approach taken in patch 2/5 can be extended to achieve this.
That is, instead of just storing a flag 'bool in_simtreg' in struct loop, store
pointers to corresponding SIMT_ENTER/EXIT gimple statements, use a similar
upwards walk on loop tree to discover if we're inlining into a SIMT region, and
if yes, adjust their argument lists.  Does this sound ok?

Alexander


Re: [PATCH 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-02-01 Thread Jakub Jelinek
On Wed, Feb 01, 2017 at 06:44:39PM +0300, Alexander Monakov wrote:
> > That said, I understand how would you add these &varN arguments during
> > lowering, but don't understand what would you want to do during inlining,
> > if you have addressable vars in inlined function, you need to avoid
> > escaping those from the SIMT region.
> > 
> > I believe the abnormal edges turning the SIMT region into kind of loop that
> > it to some extent is would take care of this even without having to add
> > those addresses to the ifns, but if you don't want to go that way,
> > supposedly the inliner would need to find those GOMP_SIMT_* statements
> > around the inline caller if any and adjust those?
> 
> Yes; I imagine the approach taken in patch 2/5 can be extended to achieve this.
> That is, instead of just storing a flag 'bool in_simtreg' in struct loop, store
> pointers to corresponding SIMT_ENTER/EXIT gimple statements, use a similar
> upwards walk on loop tree to discover if we're inlining into a SIMT region, and
> if yes, adjust their argument lists.  Does this sound ok?

I'd prefer the abnormal edges and flags on the vars, if it can work, but
won't fight for that hard.
That said, I think pointers to gimple stmts in struct loop or something
similar is problematic, you'd need to adjust those whenever something would
remove those stmts, or e.g. duplicate the loop and stmts, handle those
during inlining (if you inline some function with SIMT_ENTER/EXIT in them)
etc.  Trying to find those stmts on preheader or in exit block from the
marked loop might be easier.

Jakub


Re: [PATCH] Provide opt-info for autopar and graphite

2017-02-01 Thread Sebastian Pop
On Wed, Feb 1, 2017 at 4:43 AM, Richard Biener  wrote:
> * graphite.c: Include tree-vectorizer.h for find_loop_location.
> (graphite_transform_loops): Provide opt-info for optimized nests.
> * tree-parloop.c (parallelize_loops): Provide opt-info for
> parallelized loops.

Looks good to me.


Re: [PR63238] output alignment debug information

2017-02-01 Thread Jakub Jelinek
On Fri, Jan 27, 2017 at 04:24:58AM -0200, Alexandre Oliva wrote:
> Output DWARFv5+ DW_AT_alignment for non-default alignment of
> variables, fields and types.

The new tests all fail on targets that default to -gstrict-dwarf
because they have buggy or prehistoric linkers/debug info consumers
like Darwin.

Fixed thusly, tested on x86_64-linux vanilla and with a common.opt
hack to turn on -gstrict-dwarf and -fno-merge-debug-strings by default,
committed to trunk as obvious:

2017-02-01  Jakub Jelinek  

PR testsuite/79324
* gcc.dg/debug/dwarf2/align-1.c: Add -gno-strict-dwarf to dg-options.
* gcc.dg/debug/dwarf2/align-2.c: Likewise.
* gcc.dg/debug/dwarf2/align-3.c: Likewise.
* gcc.dg/debug/dwarf2/align-4.c: Likewise.
* gcc.dg/debug/dwarf2/align-5.c: Likewise.
* gcc.dg/debug/dwarf2/align-6.c: Likewise.
* gcc.dg/debug/dwarf2/align-as-1.c: Likewise.
* g++.dg/debug/dwarf2/align-1.C: Likewise.
* g++.dg/debug/dwarf2/align-2.C: Likewise.
* g++.dg/debug/dwarf2/align-3.C: Likewise.
* g++.dg/debug/dwarf2/align-4.C: Likewise.
* g++.dg/debug/dwarf2/align-5.C: Likewise.
* g++.dg/debug/dwarf2/align-6.C: Likewise.

--- gcc/testsuite/gcc.dg/debug/dwarf2/align-1.c.jj  2017-01-31 09:26:00.0 +0100
+++ gcc/testsuite/gcc.dg/debug/dwarf2/align-1.c 2017-02-01 16:39:24.852112430 +0100
@@ -1,5 +1,5 @@
 // { dg-do compile }
-// { dg-options "-O -g -dA" }
+// { dg-options "-O -g -dA -gno-strict-dwarf" }
 // { dg-final { scan-assembler-times " DW_AT_alignment" 1 { xfail { powerpc-ibm-aix* } } } }
 
 int __attribute__((__aligned__(64))) i;
--- gcc/testsuite/gcc.dg/debug/dwarf2/align-2.c.jj  2017-01-31 09:26:00.0 +0100
+++ gcc/testsuite/gcc.dg/debug/dwarf2/align-2.c 2017-02-01 16:39:34.260991165 +0100
@@ -1,5 +1,5 @@
 // { dg-do compile }
-// { dg-options "-O -g -dA" }
+// { dg-options "-O -g -dA -gno-strict-dwarf" }
 // { dg-final { scan-assembler-times " DW_AT_alignment" 1 { xfail { powerpc-ibm-aix* } } } }
 
 typedef int __attribute__((__aligned__(64))) i_t;
--- gcc/testsuite/gcc.dg/debug/dwarf2/align-3.c.jj  2017-01-31 09:26:00.0 +0100
+++ gcc/testsuite/gcc.dg/debug/dwarf2/align-3.c 2017-02-01 16:39:37.841945013 +0100
@@ -1,5 +1,5 @@
 // { dg-do compile }
-// { dg-options "-O -g -dA" }
+// { dg-options "-O -g -dA -gno-strict-dwarf" }
 // { dg-final { scan-assembler-times " DW_AT_alignment" 1 { xfail { powerpc-ibm-aix* } } } }
 
 typedef int int_t;
--- gcc/testsuite/gcc.dg/debug/dwarf2/align-4.c.jj  2017-01-31 09:26:00.0 +0100
+++ gcc/testsuite/gcc.dg/debug/dwarf2/align-4.c 2017-02-01 16:39:41.889892842 +0100
@@ -1,5 +1,5 @@
 // { dg-do compile }
-// { dg-options "-O -g -dA" }
+// { dg-options "-O -g -dA -gno-strict-dwarf" }
 // { dg-final { scan-assembler-times " DW_AT_alignment" 2 { xfail { powerpc-ibm-aix* } } } }
 
 struct tt {
--- gcc/testsuite/gcc.dg/debug/dwarf2/align-5.c.jj  2017-01-31 09:26:00.0 +0100
+++ gcc/testsuite/gcc.dg/debug/dwarf2/align-5.c 2017-02-01 16:39:46.117838350 +0100
@@ -1,5 +1,5 @@
 // { dg-do compile }
-// { dg-options "-O -g -dA" }
+// { dg-options "-O -g -dA -gno-strict-dwarf" }
 // { dg-final { scan-assembler-times " DW_AT_alignment" 1 { xfail { powerpc-ibm-aix* } } } }
 
 struct tt {
--- gcc/testsuite/gcc.dg/debug/dwarf2/align-6.c.jj  2017-01-31 09:26:00.0 +0100
+++ gcc/testsuite/gcc.dg/debug/dwarf2/align-6.c 2017-02-01 16:39:50.884776913 +0100
@@ -1,5 +1,5 @@
 // { dg-do compile }
-// { dg-options "-O -g -dA" }
+// { dg-options "-O -g -dA -gno-strict-dwarf" }
 // { dg-final { scan-assembler-times " DW_AT_alignment" 1 { xfail { powerpc-ibm-aix* } } } }
 
 struct tt {
--- gcc/testsuite/gcc.dg/debug/dwarf2/align-as-1.c.jj   2017-01-31 09:26:00.0 +0100
+++ gcc/testsuite/gcc.dg/debug/dwarf2/align-as-1.c  2017-02-01 16:39:55.554716725 +0100
@@ -1,5 +1,5 @@
 // { dg-do compile }
-// { dg-options "-O -g -dA" }
+// { dg-options "-O -g -dA -gno-strict-dwarf" }
 // { dg-final { scan-assembler-times " DW_AT_alignment" 1 { xfail { powerpc-ibm-aix* } } } }
 
 int _Alignas(64) i;
--- gcc/testsuite/g++.dg/debug/dwarf2/align-1.C.jj  2017-01-31 09:26:00.0 +0100
+++ gcc/testsuite/g++.dg/debug/dwarf2/align-1.C 2017-02-01 16:40:58.421906472 +0100
@@ -1,5 +1,5 @@
 // { dg-do compile }
-// { dg-options "-O -g -dA" }
+// { dg-options "-O -g -dA -gno-strict-dwarf" }
 // { dg-final { scan-assembler-times " DW_AT_alignment" 1 { xfail { powerpc-ibm-aix* } } } }
 
 int __attribute__((__aligned__(64))) i;
--- gcc/testsuite/g++.dg/debug/dwarf2/align-2.C.jj  2017-01-31 09:26:00.0 +0100
+++ gcc/testsuite/g++.dg/debug/dwarf2/align-2.C 2017-02-01 16:41:01.340868851 +0100
@@ -1,5 +1,5 @@
 // { dg-do compile }
-// { dg-options "-O -g -dA" }
+// { dg-options "-O -g -dA -gno-strict-dwarf" }
 // { dg-final { scan-assembler-times " DW_AT_alignment" 1 { xfail { powerpc-ibm-aix* } } } }
 
 typedef int __att

Re: [PATCH] XFAIL gcc.dg/graphite/scop-dsyrk.c

2017-02-01 Thread Sebastian Pop
Sorry for duplicates, I'm resending as plain text for the mailing list.

On Wed, Feb 1, 2017 at 6:57 AM, Richard Biener  wrote:
>
>
> The following XFAILs the testcases, making them fail reliably independently
> of int/long type sizes and also providing new testcase variants that
> succeed reliably.
>
> I don't see us fixing the underlying niter analysis issue for GCC 7.
>

I agree.  There is a way with isl to represent the modulo expressions for niter.
It would take some time to properly implement.

>
> Tested on x86_64-unknown-linux-gnu with {,-m32}, applied.
>

Thanks,
Sebastian


[PATCH, rs6000] Fix PR70012 (gcc.dg/vect/costmodel/ppc/costmodel-vect-33.c)

2017-02-01 Thread Bill Schmidt
Hi,

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70012 reports that the subject
test case is now failing in some circumstances.  Prior to POWER8, the test
was checking for conditions that apply when data alignment is unknown; either
peeling for alignment or versioning for alignment may be used.  With POWER8,
however, neither is necessary because we have efficient unaligned memory
accesses.  So the test case needs some adjustment.

Interpreting the original intent of the test case is a little difficult.
The use of the vector_alignment_reachable test seems odd.  Looking at
testsuite/lib/target-supports.exp, vector_alignment_reachable is equivalent
to vect_aligned_arrays || natural_alignment_32.  But vect_aligned_arrays
is always 0 for powerpc*-*-*, so we might as well just be testing
natural_alignment_32.

It appears that the intent is for peeling to occur for 64-bit Darwin, but
otherwise versioning for alignment is expected.  So actually
vector_alignment_reachable should be replaced by ! natural_alignment_32,
and ! vector_alignment_reachable should be replaced by natural_alignment_32.
In other words, for whatever reason these tests appear to be backward.
I suspect there have been changes to target-supports.exp over time that
have left this test slightly bit-rotten.

I've added a separate test for whether versioning occurs (which should
happen for server targets on POWER7, for example), since just testing for
vectorization doesn't test this.
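
The selector logic described above can be written down as plain predicates.  This is just an illustrative sketch: the Target struct and the expect_* names are hypothetical, while the effective-target keywords and the claim that vect_aligned_arrays is always 0 on powerpc*-*-* are taken from this message and the patch below.

```cpp
#include <cassert>

// Effective-target answers for one hypothetical test configuration.
struct Target
{
  bool natural_alignment_32;
  bool vect_hw_misalign;
};

// Per the message, always false for powerpc*-*-*.
static bool vect_aligned_arrays (const Target &)
{
  return false;
}

// What the old selector reduces to on powerpc.
static bool vector_alignment_reachable (const Target &t)
{
  return vect_aligned_arrays (t) || t.natural_alignment_32;
}

// The three dg-final selectors after the patch.
static bool expect_not_profitable (const Target &t)
{
  return !t.natural_alignment_32;
}

static bool expect_vectorized (const Target &t)
{
  return t.natural_alignment_32;
}

static bool expect_versioned (const Target &t)
{
  return t.natural_alignment_32 && !t.vect_hw_misalign;
}
```

Every configuration now expects exactly one of "not profitable" or "vectorized", and the versioning check only applies where the hardware lacks efficient misaligned accesses.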

I asked Iain Sandoe to test this on Darwin, and he reported that this is
not quite right for darwin9 with -m64.  Unfortunately he will be traveling
for a while and won't be able to investigate till he returns.  He's asked
me to leave the test as is without skipping Darwin so that the bug for
Darwin is not hidden, and he and I will work together to fix that at a
later time.  But I would like to get this issue cleared up for the server
targets now, hence the patch submission.

Tested on powerpc64le-unknown-linux-gnu (POWER8) and on
powerpc64-unknown-linux-gnu (POWER7) with correct behavior.  Is this ok
for trunk?

Thanks,
Bill


2017-02-01  Bill Schmidt  

* gcc.dg/vect/costmodel/ppc/costmodel-vect-33.c: Adjust test
conditions.


Index: gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-33.c
===
--- gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-33.c (revision 245029)
+++ gcc/testsuite/gcc.dg/vect/costmodel/ppc/costmodel-vect-33.c (working copy)
@@ -36,7 +36,12 @@ int main (void)
 } 
 
 /* Peeling to align the store is used. Overhead of peeling is too high.  */
-/* { dg-final { scan-tree-dump-times "vectorization not profitable" 1 "vect" { target vector_alignment_reachable } } } */
+/* { dg-final { scan-tree-dump-times "vectorization not profitable" 1 "vect" { target { ! natural_alignment_32 } } } } */
 
-/* Versioning to align the store is used. Overhead of versioning is not too high.  */
-/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target {! vector_alignment_reachable} } } } */
+/* Vectorization occurs, either because overhead of versioning is not
+   too high, or because the hardware supports efficient unaligned accesses.  */
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 1 "vect" { target { natural_alignment_32 } } } } */
+
+/* Versioning to align the store is used.  Overhead of versioning is not 
+   too high.  */
+/* { dg-final { scan-tree-dump-times "loop versioned for vectorization to enhance alignment" 1 "vect" { target { natural_alignment_32 && { ! vect_hw_misalign } } } } } */



Re: [PATCH 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-02-01 Thread Alexander Monakov
On Wed, 1 Feb 2017, Jakub Jelinek wrote:
> > Yes; I imagine the approach taken in patch 2/5 can be extended to achieve this.
> > That is, instead of just storing a flag 'bool in_simtreg' in struct loop, store
> > pointers to corresponding SIMT_ENTER/EXIT gimple statements, use a similar
> > upwards walk on loop tree to discover if we're inlining into a SIMT region, and
> > if yes, adjust their argument lists.  Does this sound ok?
> 
> I'd prefer the abnormal edges and flags on the vars, if it can work, but
> won't fight for that hard.

Sorry, I'm uncomfortable with that because introducing abnormal edges seems like
a big hammer, e.g. it constrains non-privatized variables too.  And they don't
seem to be rigorously defined, so to me that leaves some uncertainty -- as
opposed to an approach I seek that makes constraints obvious in the IR.

> That said, I think pointers to gimple stmts in struct loop or something
> similar is problematic, you'd need to adjust those whenever something would
> remove those stmts, or e.g. duplicate the loop and stmts, handle those
> during inlining (if you inline some function with SIMT_ENTER/EXIT in them)
> etc.  Trying to find those stmts on preheader or in exit block from the
> marked loop might be easier.

Ah, sorry, so I'd need to keep the bool flag, and for SIMT_ENTER walk the
dominator tree upwards, scanning each bb until I find it (and likewise on
postdominator tree for SIMT_EXIT).

Alternatively, if simduid is already properly remapped, we could assign to it
when calling SIMT_ENTER, and then just look up its defining statement?

Alexander


Re: [PATCH 0/5] OpenMP/PTX: improve correctness in SIMD regions

2017-02-01 Thread Jakub Jelinek
On Wed, Feb 01, 2017 at 08:09:27PM +0300, Alexander Monakov wrote:
> > That said, I think pointers to gimple stmts in struct loop or something
> > similar is problematic, you'd need to adjust those whenever something would
> > remove those stmts, or e.g. duplicate the loop and stmts, handle those
> > during inlining (if you inline some function with SIMT_ENTER/EXIT in them)
> > etc.  Trying to find those stmts on preheader or in exit block from the
> > marked loop might be easier.
> 
> Ah, sorry, so I'd need to keep the bool flag, and for SIMT_ENTER walk the
> dominator tree upwards, scanning each bb until I find it (and likewise on
> postdominator tree for SIMT_EXIT).
> 
> Alternatively, if simduid is already properly remapped, we could assign to it
> when calling SIMT_ENTER, and then just look up its defining statement?

I believe simduid doesn't have a defining stmt, and it is a decl, not
SSA_NAME.  So, if SIMT_ENTER would set an SSA_NAME using simduid as
underlying variable (and perhaps also use the (D) and SIMT_EXIT consume it,
you could look at the single user of the default def for the decl
(SIMT_ENTER) and from there the single user of the SIMT_ENTER result
(SIMT_EXIT).
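
To make the lookup concrete, here is a toy model of that two-step walk.  It is not GCC code: Value, Stmt, single_user and find_simt_pair are hypothetical names, and real GIMPLE would go through immediate-use iterators instead of a plain vector of users.

```cpp
#include <cassert>
#include <vector>

struct Stmt;

// Toy SSA value: it only records the statements that use it.
struct Value
{
  std::vector<Stmt *> users;
};

// Toy statement: a name for debugging and an optional result value.
struct Stmt
{
  const char *name;
  Value *result;
};

// Return the only user of V, or null if V has no user or several users.
static Stmt *single_user (const Value *v)
{
  return (v && v->users.size () == 1) ? v->users[0] : nullptr;
}

struct SimtPair
{
  Stmt *enter;
  Stmt *exit;
};

// Step 1: the single user of the default def is taken to be SIMT_ENTER.
// Step 2: the single user of SIMT_ENTER's result is taken to be SIMT_EXIT.
SimtPair find_simt_pair (const Value *simduid_default_def)
{
  SimtPair p = { nullptr, nullptr };
  p.enter = single_user (simduid_default_def);
  if (p.enter)
    p.exit = single_user (p.enter->result);
  return p;
}
```

The point of the scheme is that nothing has to be cached in struct loop; the pair is rediscovered from the def-use chains whenever it is needed.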

Jakub


Re: [PATCH] Fix __atomic to not implement atomic loads with CAS.

2017-02-01 Thread Torvald Riegel
On Mon, 2017-01-30 at 19:54 +0100, Torvald Riegel wrote:
> This patch fixes the __atomic builtins to not implement supposedly
> lock-free atomic loads based on just a compare-and-swap operation.

After an off-list OK by Jakub, I have committed this as r245098.
Jakub will take care of the OpenMP side in a follow-up patch.
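
For context, a rough sketch of what a CAS-based load looks like compared to a real load.  The rationale in the comments (a CAS needs write access to the object, a load should not) is my reading of the problem and is not spelled out in this message; load_via_cas and load_plain are hypothetical helper names.

```cpp
#include <atomic>
#include <cassert>

// Emulate a load with compare-and-swap.  This needs a non-const, writable
// object, because the CAS may store to it even when the value is unchanged.
template <typename T>
T load_via_cas (std::atomic<T> &obj)
{
  T expected = T ();
  // On success the same value is stored back; on failure EXPECTED is
  // updated to the value currently held by OBJ.  Either way EXPECTED
  // ends up holding the current value.
  obj.compare_exchange_strong (expected, expected);
  return expected;
}

// A genuine atomic load only reads, so a const reference is enough.
template <typename T>
T load_plain (const std::atomic<T> &obj)
{
  return obj.load (std::memory_order_seq_cst);
}
```

Both helpers return the same value; the difference is that the first one performs a read-modify-write on the object.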



Re: [gomp4] add support for derived types in ACC UPDATE

2017-02-01 Thread Cesar Philippidis
On 01/30/2017 02:08 AM, Thomas Schwinge wrote:
> Hi Cesar!
> 
> On Thu, 10 Nov 2016 09:38:33 -0800, Cesar Philippidis 
>  wrote:
>> This patch has been committed to gomp-4_0-branch.
> 
>> --- a/gcc/fortran/openmp.c
>> +++ b/gcc/fortran/openmp.c
> 
>> @@ -242,7 +243,8 @@ gfc_match_omp_variable_list (const char *str, 
>> gfc_omp_namelist **list,
>>  case MATCH_YES:
>>gfc_expr *expr;
>>expr = NULL;
>> -  if (allow_sections && gfc_peek_ascii_char () == '(')
>> +  if (allow_sections && gfc_peek_ascii_char () == '('
>> +  || allow_derived && gfc_peek_ascii_char () == '%')
>>  {
>>gfc_current_locus = cur_loc;
>>m = gfc_match_variable (&expr, 0);
> 
> [...]/source-gcc/gcc/fortran/openmp.c: In function 'match 
> {+gfc_match_omp_variable_list(const char*, gfc_omp_namelist**, bool, bool*, 
> gfc_omp_namelist***, bool, bool)':+}
> [...]/source-gcc/gcc/fortran/openmp.c:246:23: warning: suggest 
> parentheses around '&&' within '||' [-Wparentheses]
> if (allow_sections && gfc_peek_ascii_char () == '('
>^
> 
>> --- a/gcc/fortran/trans-openmp.c
>> +++ b/gcc/fortran/trans-openmp.c
>> @@ -1938,7 +1938,66 @@ gfc_trans_omp_clauses_1 (stmtblock_t *block, 
>> gfc_omp_clauses *clauses,
>>tree decl = gfc_get_symbol_decl (n->sym);
>>if (DECL_P (decl))
>>  TREE_ADDRESSABLE (decl) = 1;
>> -  if (n->expr == NULL || n->expr->ref->u.ar.type == AR_FULL)
>> +  /* Handle derived-typed members for OpenACC Update.  */
>> +  if (n->sym->ts.type == BT_DERIVED
>> +  && n->expr != NULL && n->expr->ref != NULL
>> +  && (n->expr->ref->next == NULL
>> +  || (n->expr->ref->next != NULL
>> +  && n->expr->ref->next->type == REF_ARRAY
>> +  && n->expr->ref->next->u.ar.type == AR_FULL)))
>> +{
>> +  gfc_ref *ref = n->expr->ref;
>> +  tree orig_decl = decl;
> 
> [...]/source-gcc/gcc/fortran/trans-openmp.c: In function 'tree_node* 
> gfc_trans_omp_clauses_1(stmtblock_t*, gfc_omp_clauses*, locus, bool)':
> [...]/source-gcc/gcc/fortran/trans-openmp.c:1947:10: warning: unused 
> variable 'orig_decl' [-Wunused-variable]
>  tree orig_decl = decl;
>   ^

Not sure why this wasn't caught with a bootstrap build. Anyway, I've applied
the attached patch to gomp4 to fix this issue. It also corrects a problem
that you found with checking enabled.
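
As a side note on the -Wparentheses warning: && binds tighter than ||, so the added parentheses only spell out the grouping the compiler already used.  The sketch below (with hypothetical function names) contrasts that grouping with the one a reader might wrongly assume.

```cpp
#include <cassert>

// The grouping the compiler applies to "a && b || c && d", written
// explicitly as in the fixed openmp.c condition.
bool grouped_as_parsed (bool a, bool b, bool c, bool d)
{
  return (a && b) || (c && d);
}

// A different grouping, which is what the warning guards against a
// reader assuming by mistake.
bool grouped_differently (bool a, bool b, bool c, bool d)
{
  return a && (b || c) && d;
}
```

The two functions disagree, for example, when only c and d are true, so the explicit parentheses document which of the two is meant.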

>> --- /dev/null
>> +++ b/gcc/testsuite/gfortran.dg/goacc/derived-types.f90
>> @@ -0,0 +1,78 @@
>> +! Test ACC UPDATE with derived types. The DEVICE clause depends on an
>> +! accelerator being present.
> 
> I guess that "DEVICE" comment here is a leftover?  (Doesn't apply to a
> compile test.)
> 
>> +module dt
>> +  integer, parameter :: n = 10
>> +  type inner
>> + integer :: d(n)
>> +  end type inner
>> +  type dtype
>> + integer(8) :: a, b, c(n)
>> + type(inner) :: in
>> +  end type dtype
>> +end module dt
>> +
>> +program derived_acc
>> +  use dt
>> +  
>> +  implicit none
>> +  type(dtype):: var
>> +  integer i
>> +  !$acc declare create(var)
>> +  !$acc declare pcopy(var%a) ! { dg-error "Syntax error in OpenMP" }
>> +
>> +  !$acc update host(var)
>> +  !$acc update host(var%a)
>> +  !$acc update device(var)
>> +  !$acc update device(var%a)
>> +[...]
> 
>> --- /dev/null
>> +++ b/libgomp/testsuite/libgomp.oacc-fortran/update-2.f90
>> @@ -0,0 +1,285 @@
>> +! Test ACC UPDATE with derived types. The DEVICE clause depends on an
>> +! accelerator being present.
> 
> Why?  Shouldn't "!$acc update device" just be a no-op for host execution?

Just more test coverage.

Cesar

2017-02-01  Cesar Philippidis  

	gcc/fortran/
	* openmp.c (gfc_match_omp_variable_list): Eliminate a warning when
	checking for derived types.
	* trans-openmp.c (gfc_trans_omp_clauses_1): Don't cast derived type
	pointers to void pointers.


diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
index 2782a8d..b3506d4 100644
--- a/gcc/fortran/openmp.c
+++ b/gcc/fortran/openmp.c
@@ -243,8 +243,8 @@ gfc_match_omp_variable_list (const char *str, gfc_omp_namelist **list,
 	case MATCH_YES:
 	  gfc_expr *expr;
 	  expr = NULL;
-	  if (allow_sections && gfc_peek_ascii_char () == '('
-	  || allow_derived && gfc_peek_ascii_char () == '%')
+	  if ((allow_sections && gfc_peek_ascii_char () == '(')
+	  || (allow_derived && gfc_peek_ascii_char () == '%'))
 	{
 	  gfc_current_locus = cur_loc;
 	  m = gfc_match_variable (&expr, 0);
diff --git a/gcc/fortran/trans-openmp.c b/gcc/fortran/trans-openmp.c
index 7826e1c..80aa421 100644
--- a/gcc/fortran/trans-openmp.c
+++ b/gcc/fortran/trans-openmp.c
@@ -1986,9 +1986,10 @@ gfc_trans_omp_clauses_1 (stmtblock_t *block, gfc_omp_clauses *clauses,
 	 TREE_TYPE (field), decl, field,
 	 NULL_TREE);
 		  type = TREE_TYPE (scratch);
-		  ptr = gfc_create_var (build_pointer_type (void_ty

Re: [gomp4] enable GOMP_MAP_FIRSTPRIVATE_INT in OpenACC

2017-02-01 Thread Cesar Philippidis
On 01/30/2017 02:18 AM, Thomas Schwinge wrote:
> Hi Cesar!
> 
> On Fri, 27 Jan 2017 07:45:52 -0800, Cesar Philippidis 
>  wrote:
>> If you take a close look at lower_omp_target, you'll notice that I
>> gave reference types special treatment. Specifically, I disabled this
>> optimization on non-INTEGER_TYPE and floating point values, because the
>> nvptx target was having some problems dereferencing boolean-typed
>> pointers. That's something I have on my TODO list to track down later.
> 
> Please file an issue as appropriate.

I filed an issue for this internally.
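
For anyone not familiar with the optimization: as far as I understand GOMP_MAP_FIRSTPRIVATE_INT, a small scalar is carried in the pointer-sized argument slot itself instead of being mapped and dereferenced on the device.  The following is a rough illustration of that idea only; pack_in_pointer_slot and unpack_from_pointer_slot are made-up helpers and have nothing to do with the actual libgomp or omp-low.c code.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Store a small scalar directly in a pointer-sized slot.
template <typename T>
std::uintptr_t pack_in_pointer_slot (T value)
{
  static_assert (sizeof (T) <= sizeof (std::uintptr_t),
                 "value must fit in a pointer-sized slot");
  std::uintptr_t slot = 0;
  // Copy the object representation; no pointer to VALUE escapes.
  std::memcpy (&slot, &value, sizeof (T));
  return slot;
}

// Recover the scalar from the slot without dereferencing anything.
template <typename T>
T unpack_from_pointer_slot (std::uintptr_t slot)
{
  static_assert (sizeof (T) <= sizeof (std::uintptr_t),
                 "value must fit in a pointer-sized slot");
  T value;
  std::memcpy (&value, &slot, sizeof (T));
  return value;
}
```

The size check matters: a value wider than the slot cannot be passed this way, so such types would have to keep going through the ordinary mapping path.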

>> As for the performance gains, this optimization resulted in a
>> non-trivial speedup in CloverLeaf running on a Nvidia Pascal board.
>> CloverLeaf is somewhat special in that it consists of a lot of OpenACC
>> offloaded regions which get called multiple times throughout its
>> execution. Consequently, it is I/O limited. The other benchmarks I ran
>> didn't benefit nearly as much as CloverLeaf. I chose a small data set
>> for CloverLeaf that only ran in 1.3s without the patch, which makes
>> it even more I/O limited. After the patch, it ran 0.35s faster.
> 
> \o/ Yay!
> 
>> This patch has been applied to gomp-4_0-branch.
> 
> (Not reviewed in detail.)
> 
>> --- a/gcc/omp-low.c
>> +++ b/gcc/omp-low.c
> 
>> +static tree
>> +convert_from_firstprivate_pointer (tree var, bool is_ref, gimple_seq *gs)
>> +{
>> +  tree type = TREE_TYPE (var);
>> +  tree new_type = NULL_TREE;
>> +  tree tmp = NULL_TREE;
>> +  tree inner_type = NULL_TREE;
> 
> [...]/source-gcc/gcc/omp-low.c: In function 'tree_node* 
> convert_from_firstprivate_pointer(tree, bool, gimple**)':
> [...]/source-gcc/gcc/omp-low.c:16515:8: warning: unused variable 
> 'inner_type' [-Wunused-variable]
> 
> 
>> --- /dev/null
>> +++ b/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90
> 
> I see:
> 
> {+FAIL: libgomp.oacc-fortran/firstprivate-int.f90 
> -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O  
> (internal compiler error)+}
> {+FAIL: libgomp.oacc-fortran/firstprivate-int.f90 
> -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O   4 
> blank line(s) in output+}
> {+FAIL: libgomp.oacc-fortran/firstprivate-int.f90 
> -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O  (test 
> for excess errors)+}
> {+UNRESOLVED: libgomp.oacc-fortran/firstprivate-int.f90 
> -DACC_DEVICE_TYPE_nvidia=1 -DACC_MEM_SHARED=0 -foffload=nvptx-none  -O  
> compilation failed to produce executable+}
> 
> That's the nvptx offloading compiler configured with
> "--enable-checking=yes,df,fold,rtl":
> 
> 
> [...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90: 
> In function 'MAIN__._omp_fn.1':
> 
> [...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:55:0:
>  error: conversion of register to a different size
> VIEW_CONVERT_EXPR(_17);
> 
> _18 = VIEW_CONVERT_EXPR(_17);
> 
> [...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:55:0:
>  error: conversion of register to a different size
> VIEW_CONVERT_EXPR(_20);
> 
> _21 = VIEW_CONVERT_EXPR(_20);
> 
> [...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:55:0:
>  error: conversion of register to a different size
> VIEW_CONVERT_EXPR(_23);
> 
> _24 = VIEW_CONVERT_EXPR(_23);
> 
> [...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:55:0:
>  error: conversion of register to a different size
> VIEW_CONVERT_EXPR(_26);
> 
> _27 = VIEW_CONVERT_EXPR(_26);
> 
> [...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:55:0:
>  internal compiler error: verify_gimple failed
> 0xa67d75 verify_gimple_in_cfg(function*, bool)
> [...]/source-gcc/gcc/tree-cfg.c:5125
> 0x94ebbc execute_function_todo
> [...]/source-gcc/gcc/passes.c:1958
> 0x94f513 execute_todo
> [...]/source-gcc/gcc/passes.c:2010
> 
> 
> And with "-m32" multilib testing, I see:
> 
> {+FAIL: libgomp.oacc-fortran/firstprivate-int.f90 
> -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O  (test for 
> excess errors)+}
> {+UNRESOLVED: libgomp.oacc-fortran/firstprivate-int.f90 
> -DACC_DEVICE_TYPE_host=1 -DACC_MEM_SHARED=1 -foffload=disable  -O  
> compilation failed to produce executable+}
> 
> That is:
> 
> 
> [...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:10:18:
>  Error: Kind 16 not supported for type INTEGER at (1)
> 
> [...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:16:18:
>  Error: Kind 16 not supported for type LOGICAL at (1)
> 
> [...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:115:18:
>  Error: Kind 16 not supported for type INTEGER at (1)
> 
> [...]/source-gcc/libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90:121:18:
>  Error: Kind 16 no

Re: [wwwdocs] changes.html - document new warning options

2017-02-01 Thread Gerald Pfeifer
On Tue, 31 Jan 2017, Martin Sebor wrote:
> Thanks for the careful review (and debugging)!

Thanks for taking the time to prepare all this to begin with. ;-)

On Wed, 1 Feb 2017, Jakub Jelinek wrote:
>>>   void f (size_t n)
>>>   {
>>> char *d = alloca (n)
> Missing semicolon after alloca (n)

Martin might argue that this was covered by the ellipsis dots
in the following line ;-), but I admit that's a little tongue
in cheek and went ahead with the patch below.

Thanks for your careful reviews, Jakub!

Gerald

Index: changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-7/changes.html,v
retrieving revision 1.52
diff -u -r1.52 changes.html
--- changes.html1 Feb 2017 10:16:47 -   1.52
+++ changes.html1 Feb 2017 19:05:12 -
@@ -368,7 +368,7 @@
   
 void f (size_t n)
 {
-  char *d = alloca (n)
+  char *d = alloca (n);
   …
 }
 


Re: [patch 79279] combine/simplify_set issue

2017-02-01 Thread Segher Boessenkool
Hi Aurelien,

On Wed, Feb 01, 2017 at 10:02:56AM +0100, Aurelien Buhrig wrote:
> >> This patch fixes a combiner bug in simplify_set which calls
> >> CANNOT_CHANGE_MODE_CLASS with wrong mode params.
> >> It occurs when trying to simplify paradoxical subregs of hw regs (whose
> >> natural mode is lower than a word).
> >>
> >> In fact, changing from (set x:m1 (subreg:m1 (op:m2))) to (set (subreg:m2
> >> x)  op2) is valid if REG_CANNOT_CHANGE_MODE_P (x, m1, m2) is false
> >> and REG_CANNOT_CHANGE_MODE_P (x, GET_MODE (src), GET_MODE (SUBREG_REG
> >> (src))
> > r62212 (in 2003) changed it to what we have now, it used to be what you
> > want to change it back to.
> >
> > You say m1 > m2, which means you have WORD_REGISTER_OPERATIONS defined.
> No, just some hard regs whose natural mode size is 2 and UNIT_PER_WORD
> is 4...

You said it is a paradoxical subreg?  Or do you mean the result is
a paradoxical subreg?

> > Where does this transformation go wrong?  Why is the resulting insn
> > recognised at all?  For example, general_operand should refuse it.
> > Maybe you have custom *_operand that do not handle subreg correctly?
> >
> > The existing code looks correct: what we want to know is if an m2
> > interpreted as an m1 yields the correct value.  We might *also* want
> > your condition, but I'm not sure about that.
> OK, looks like both m1->m2 & m2 -> m1 checks would be needed, but the m1
> -> m2 should be filtered by valid predicates (general_operand).
> Sorry about that.

Hrm, maybe you can show the RTL before and after this transform?

> >> OK to commit?
> > Sorry, no.  We're currently in development stage 4, and this is not a
> > regression (see ).  But we can
> > of course discuss this further, and you can repost the patch when stage 1
> > opens (a few months from now) if you still want it.
> OK, but not sure if it needs to be patched any more.

Let's work that out then.


Segher


Re: [PATCH, wwwdocs/ARM] Mention new rmprofile value for --with-multilib-list

2017-02-01 Thread Gerald Pfeifer

On Mon, 30 Jan 2017, Thomas Preudhomme wrote:

The ARM backend now supports a new set of multilib libraries enabled with
--with-multilib-list=rmprofile [1]. This patch documents it in the changes for
GCC 7.


I will admit that even following this list, I had to look up what
this really referred to. ;-)

And once I found the documentation, I figured I could as well provide
a reference for others (since most consumers of our release notes will
be even farther away than me).

Below is the patch I applied.

Gerald

Index: changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-7/changes.html,v
retrieving revision 1.53
diff -u -r1.53 changes.html
--- changes.html1 Feb 2017 19:05:53 -   1.53
+++ changes.html1 Feb 2017 19:20:18 -
@@ -778,7 +778,9 @@
 
  The configure option --with-multilib-list now accepts the
  value rmprofile to build multilib libraries for a range of
-  embedded targets.
+  embedded targets.  See our
+  <a href="https://gcc.gnu.org/install/configure.html">installation
+  instructions</a> for details.

   




Re: [PATCH, rs6000] Fix PR70012 (gcc.dg/vect/costmodel/ppc/costmodel-vect-33.c)

2017-02-01 Thread Segher Boessenkool
On Wed, Feb 01, 2017 at 10:42:31AM -0600, Bill Schmidt wrote:
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70012 reports that the subject
> test case is now failing in some circumstances.  Prior to POWER8, the test
> was checking for conditions that apply when data alignment is unknown; either
> peeling for alignment or versioning for alignment may be used.  With POWER8,
> however, neither is necessary because we have efficient unaligned memory
> accesses.  So the test case needs some adjustment.

> Tested on powerpc64le-unknown-linux-gnu (POWER8) and on
> powerpc64-unknown-linux-gnu (POWER7) with correct behavior.  Is this ok
> for trunk?

Yes, okay.  Thanks,


Segher


> 2017-02-01  Bill Schmidt  
> 
>   * gcc.dg/vect/costmodel/ppc/costmodel-vect-33.c: Adjust test
>   conditions.


[PATCH] Fix ada __gnat_killprocesstree (PR ada/79309)

2017-02-01 Thread Jakub Jelinek
Hi!

As mentioned in the PR, strncat does something different from what the
code expects (the last argument is the maximum number of characters
to be copied, rather than maximum number of characters in the destination
buffer).  As for the (highly unlikely, because d->d_name really should be
the pid numbers plus a couple of extra dirnames) case of truncated name
trying to open such truncated filename wouldn't work anyway, this
patch just skips it altogether if there would be overflow.
GCC strlen pass should be able to optimize all the 3 calls into memcpy,
using strlen value from the earlier strlen call.
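
To illustrate the strncat point above: the bound passed to strncat limits how many characters are copied from the source, so a call that wants to respect the destination size has to subtract what is already in the buffer. The helper below is only a sketch; the name append_bounded and its interface are made up for this example and are not part of adaint.c.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: append SRC to the string already in DST, where
   DST is DSTSIZE bytes in total.  strncat copies at most N characters
   from SRC and then adds a '\0', so N must be the room that is left,
   not the size of the whole buffer.  Returns the resulting length.  */
size_t
append_bounded (char *dst, size_t dstsize, const char *src)
{
  size_t used = strlen (dst);

  /* The existing string, including its terminator, must fit in DST.  */
  assert (used < dstsize);

  /* Leave one byte for the terminating '\0' that strncat appends.  */
  strncat (dst, src, dstsize - used - 1);
  return strlen (dst);
}
```

Passing sizeof (dst) as the bound, as the old code did, would let strncat write past the end of the buffer once dst already holds a prefix.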

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2017-02-01  Jakub Jelinek  

PR ada/79309
* adaint.c (__gnat_killprocesstree): Don't clear statfile
before overwriting it.  If d->d_name is too long, skip trying
to construct the filename and open it.  Use strcpy/strcat
instead of strncpy/strncat.

--- gcc/ada/adaint.c.jj 2017-01-12 22:28:59.293871830 +0100
+++ gcc/ada/adaint.c2017-02-01 09:18:47.027598963 +0100
@@ -3396,14 +3396,16 @@ void __gnat_killprocesstree (int pid, in
 {
   if ((d->d_type & DT_DIR) == DT_DIR)
 {
-  char statfile[64] = { 0 };
+ char statfile[64];
   int _pid, _ppid;
 
   /* read /proc//stat */
 
-  strncpy (statfile, "/proc/", sizeof(statfile));
-  strncat (statfile, d->d_name, sizeof(statfile));
-  strncat (statfile, "/stat", sizeof(statfile));
+ if (strlen (d->d_name) > sizeof (statfile) - sizeof ("/proc//stat"))
+   continue;
+ strcpy (statfile, "/proc/");
+ strcat (statfile, d->d_name);
+ strcat (statfile, "/stat");
 
   FILE *fd = fopen (statfile, "r");
 

Jakub


[wwwdocs] gcc-3.2/c++-abi.html - adjust link text for C++ ABI

2017-02-01 Thread Gerald Pfeifer
Applied, after a friendly hint off-list.

At first I was wondering whether to leave the text, but then it
had been updated already earlier (to mentor.com), so why not fix
it again?

Gerald

Index: gcc-3.2/c++-abi.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-3.2/c++-abi.html,v
retrieving revision 1.8
diff -u -r1.8 c++-abi.html
--- gcc-3.2/c++-abi.html28 Jun 2015 14:43:58 -  1.8
+++ gcc-3.2/c++-abi.html1 Feb 2017 20:16:39 -
@@ -10,7 +10,7 @@
 The main point of the GCC 3.2 release is to have a relatively
 stable and common C++ ABI for GNU/Linux and BSD usage, following
 the documentation at
-<a href="http://mentorembedded.github.io/cxx-abi/">http://sourcery.mentor.com/public/cxx-abi/</a>.
+<a href="http://mentorembedded.github.io/cxx-abi/">http://mentorembedded.github.io/cxx-abi/</a>.

 Unfortunately this means that GCC 3.2 is incompatible with GCC 3.0
 and GCC 3.1 releases.


Re: [PATCH, pr63256] update powerpc dg options

2017-02-01 Thread Aaron Sawdey
On Thu, 2017-01-19 at 17:00 -0600, Aaron Sawdey wrote:
> SMS does process the loop in sms-8.c on powerpc now so I have updated
> the options to reflect that.
> 
> Test now passes on powerpc -m64/-m32/-m32 -mpowerpc64. Ok for trunk?
> 
> testsuite/ChangeLog
> 2017-01-19  Aaron Sawdey  
>   * gcc.dg/sms-8.c: Update options for powerpc*-*-*.

Ping.

-- 
Aaron Sawdey, Ph.D.  acsaw...@linux.vnet.ibm.com
050-2/C113  (507) 253-7520 home: 507/263-0782
IBM Linux Technology Center - PPC Toolchain



[Ada] Fix PR ada/79309

2017-02-01 Thread Eric Botcazou
This is the questionable string logic in __gnat_killprocesstree detected by 
the new -Wstringop-overflow= warning.  The fix is Jakub's.

Tested on x86_64-suse-linux, applied on the mainline.


2017-02-01  Eric Botcazou  
Jakub Jelinek  

PR ada/79309
* adaint.c (__gnat_killprocesstree): Fix broken string handling.

-- 
Eric Botcazou
Index: adaint.c
===
--- adaint.c	(revision 244917)
+++ adaint.c	(working copy)
@@ -3396,14 +3396,16 @@ void __gnat_killprocesstree (int pid, in
 {
   if ((d->d_type & DT_DIR) == DT_DIR)
 {
-  char statfile[64] = { 0 };
+  char statfile[64];
   int _pid, _ppid;
 
   /* read /proc//stat */
 
-  strncpy (statfile, "/proc/", sizeof(statfile));
-  strncat (statfile, d->d_name, sizeof(statfile));
-  strncat (statfile, "/stat", sizeof(statfile));
+  if (strlen (d->d_name) >= sizeof (statfile) - sizeof ("/proc//stat"))
+continue;
+  strcpy (statfile, "/proc/");
+  strcat (statfile, d->d_name);
+  strcat (statfile, "/stat");
 
   FILE *fd = fopen (statfile, "r");
 


Re: [PATCH] Fix ada __gnat_killprocesstree (PR ada/79309)

2017-02-01 Thread Eric Botcazou
> As mentioned in the PR, strncat does something different from what the
> code expects (the last argument is the maximum number of characters
> to be copied, rather than maximum number of characters in the destination
> buffer).  As for the (highly unlikely, because d->d_name really should be
> the pid numbers plus a couple of extra dirnames) case of truncated name
> trying to open such truncated filename wouldn't work anyway, this
> patch just skips it altogether if there would be overflow.
> GCC strlen pass should be able to optimize all the 3 calls into memcpy,
> using strlen value from the earlier strlen call.
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 
> 2017-02-01  Jakub Jelinek  
> 
>   PR ada/79309
>   * adaint.c (__gnat_killprocesstree): Don't clear statfile
>   before overwriting it.  If d->d_name is too long, skip trying
>   to construct the filename and open it.  Use strcpy/strcat
>   instead of strncpy/strncat.

Sorry, I installed the fix in the meantime and our messages crossed.

> --- gcc/ada/adaint.c.jj   2017-01-12 22:28:59.293871830 +0100
> +++ gcc/ada/adaint.c  2017-02-01 09:18:47.027598963 +0100
> @@ -3396,14 +3396,16 @@ void __gnat_killprocesstree (int pid, in
>  {
>if ((d->d_type & DT_DIR) == DT_DIR)
>  {
> -  char statfile[64] = { 0 };
> +   char statfile[64];
>int _pid, _ppid;
> 
>/* read /proc//stat */
> 
> -  strncpy (statfile, "/proc/", sizeof(statfile));
> -  strncat (statfile, d->d_name, sizeof(statfile));
> -  strncat (statfile, "/stat", sizeof(statfile));
> +   if (strlen (d->d_name) > sizeof (statfile) - sizeof ("/proc//stat"))
> + continue;

I think you need ">=" here.

-- 
Eric Botcazou


Re: [PATCH] Fix ada __gnat_killprocesstree (PR ada/79309)

2017-02-01 Thread Jakub Jelinek
On Wed, Feb 01, 2017 at 09:47:40PM +0100, Eric Botcazou wrote:
> > 2017-02-01  Jakub Jelinek  
> > 
> > PR ada/79309
> > * adaint.c (__gnat_killprocesstree): Don't clear statfile
> > before overwriting it.  If d->d_name is too long, skip trying
> > to construct the filename and open it.  Use strcpy/strcat
> > instead of strncpy/strncat.
> 
> Sorry, I installed the fix in the meantime and our messages crossed.

Np.

> > --- gcc/ada/adaint.c.jj 2017-01-12 22:28:59.293871830 +0100
> > +++ gcc/ada/adaint.c2017-02-01 09:18:47.027598963 +0100
> > @@ -3396,14 +3396,16 @@ void __gnat_killprocesstree (int pid, in
> >  {
> >if ((d->d_type & DT_DIR) == DT_DIR)
> >  {
> > -  char statfile[64] = { 0 };
> > + char statfile[64];
> >int _pid, _ppid;
> > 
> >/* read /proc//stat */
> > 
> > -  strncpy (statfile, "/proc/", sizeof(statfile));
> > -  strncat (statfile, d->d_name, sizeof(statfile));
> > -  strncat (statfile, "/stat", sizeof(statfile));
> > + if (strlen (d->d_name) > sizeof (statfile) - sizeof ("/proc//stat"))
> > +   continue;
> 
> I think you need ">=" here.

I believe > is right.  sizeof (statfile) is 64, sizeof ("/proc//stat") is
12 (that includes the terminating '\0'), and 52 characters long d->d_name
still fits (6 bytes /proc/, 52 bytes d->d_name, 5 bytes /stat and 1 byte '\0')
while 53 characters are too much.  Equivalent of the above would be
  if (strlen (d->d_name) >= sizeof (statfile) - strlen ("/proc//stat"))
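
The arithmetic can be checked mechanically.  The sketch below restates both forms of the test as small functions; the names STAT_TEMPLATE, fits_sizeof and fits_strlen are invented for this illustration, and both functions assume the buffer is at least 12 bytes so the unsigned subtraction cannot wrap.

```c
#include <stddef.h>
#include <string.h>

/* The fixed part of the path: 11 characters, so sizeof yields 12
   because it also counts the terminating '\0'.  */
#define STAT_TEMPLATE "/proc//stat"

/* Nonzero if "/proc/<name>/stat" fits in BUFSIZE bytes for a name of
   NAMELEN characters, using the sizeof form of the check (">").  */
int
fits_sizeof (size_t namelen, size_t bufsize)
{
  return !(namelen > bufsize - sizeof (STAT_TEMPLATE));
}

/* The same question using the strlen form of the check (">=").  */
int
fits_strlen (size_t namelen, size_t bufsize)
{
  return !(namelen >= bufsize - strlen (STAT_TEMPLATE));
}
```

For a 64-byte buffer both forms accept a 52-character name and reject a 53-character one.  Using >= together with sizeof would reject the 52-character name as well, which is safe but one character stricter than necessary.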

Jakub


Re: [PATCH] Fix ada __gnat_killprocesstree (PR ada/79309)

2017-02-01 Thread Eric Botcazou
> I believe > is right.  sizeof (statfile) is 64, sizeof ("/proc//stat") is
> 12 (that includes the terminating '\0'), and 52 characters long d->d_name
> still fits (6 bytes /proc/, 52 bytes d->d_name, 5 bytes /stat and 1 byte
> '\0') while 53 characters are too much.  Equivalent of the above would be
> if (strlen (d->d_name) >= sizeof (statfile) - strlen ("/proc//stat"))

For some reason I was convinced this was strlen instead of sizeof...  fixed.

-- 
Eric Botcazou


Re: PR79286, ira combine_and_move_insns in loops

2017-02-01 Thread Alan Modra
On Thu, Feb 02, 2017 at 12:18:31AM +1030, Alan Modra wrote:
> This patch cures PR79286 by restoring the REG_DEAD note test used
> prior to r235660, but modified to only exclude insns that may trap.
> I'd like to allow combine/move without a REG_DEAD note in loops
> because insns in loops often lack such notes, and I recall seeing
> quite a few cases at the time I wrote r235660 where loops benefited
> from allowing the combine/move to happen.

Ugh, the new testcase fails for x86 -m32 -Os, but not due to ira this
time but rather reload.  I haven't looked into what is going wrong in
reload yet, but the net result is the same:  The faulting mem read is
moved before the printf call.

There were no other testsuite regressions, apart from the random set
of fails I have been getting for a long time on x86_64 for
c-c++-common/ubsan/float-cast-overflow-10.c,
c-c++-common/ubsan/float-cast-overflow-2.c,
c-c++-common/ubsan/float-cast-overflow-8.c, and
c-c++-common/ubsan/overflow-mul-4.c.

What is the correct thing to do for a new testcase that fails like
this?  Add a dg-fail-if?  Assuming I or someone else can't fix the
reload fail.

The new testcase -Os failure occurs on gcc-4.x, gcc-5 and gcc-6, but
gcc-3.4 passes.

-- 
Alan Modra
Australia Development Lab, IBM


[doc] extend.texi - "lock critical sections"?

2017-02-01 Thread Gerald Pfeifer
Hi Andi, or Uros,

I am not sure, but got a pointer off-list.  Is the patch below
appropriate, or is the term "lock critical section" a special
one for x86?

Gerald

Index: extend.texi
===
--- extend.texi (revision 245106)
+++ extend.texi (working copy)
@@ -10103,7 +10103,7 @@
 @section x86-Specific Memory Model Extensions for Transactional Memory
 
 The x86 architecture supports additional memory ordering flags
-to mark lock critical sections for hardware lock elision. 
+to mark critical sections for hardware lock elision. 
 These must be specified in addition to an existing memory order to
 atomic intrinsics.
 


Re: [PATCH] Fix PR79278, amend ADJUST_FIELD_ALIGN interface

2017-02-01 Thread Segher Boessenkool
On Wed, Feb 01, 2017 at 11:59:10AM +0100, Richard Biener wrote:
> Wasn't successful in making a cross to ppc64-linux build its libobjc.

I'll do a native build.  Just the patch in the first message in this
thread?  And just running the testsuite is enough, or is there
something specific you want tested?


Segher


[wwwdocs] Update releasing.html

2017-02-01 Thread Gerald Pfeifer
I noticed that releasing.html had a reference to old-style (and
even incorrect) GCC release pages, and only one of the two files
that contrib/gennews processes per release.

Also, we do not ship gcc-core-... tarballs any more, so I removed
that reference and updated the example to a newer version as well.

Applied in the hope it makes cutting a release a few seconds easier...

Gerald

Index: releasing.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/releasing.html,v
retrieving revision 1.46
diff -u -r1.46 releasing.html
--- releasing.html  13 Aug 2014 13:09:01 -  1.46
+++ releasing.html  1 Feb 2017 22:18:02 -
@@ -11,7 +11,8 @@
 
 
 Before rolling the release, update the release notes web pages
-(for example gcc-4.5.0/index.html)
+(for example gcc-7/changes.html and
+gcc-7/index.html)
 and ensure that they are all listed in contrib/gennews.
 On the announcement page for that release series, note the new
 release without removing information about any previous minor releases.
@@ -55,8 +56,7 @@
 option, whose argument must name the .tar.bz2 file for a
 previous release, in a directory containing all the
 .tar.bz2 files for that previous release (for example,
--p /some/where/gcc-4.2.2/gcc-4.2.2.tar.bz2, where there are
-also files such as /some/where/gcc-4.2.2/gcc-core-4.2.2.tar.bz2).
+-p /some/where/gcc-5.4.0/gcc-5.4.0.tar.bz2).
 
 Upload the release to ftp.gnu.org.
 


Re: [PATCH] Fix PR79278, amend ADJUST_FIELD_ALIGN interface

2017-02-01 Thread Segher Boessenkool
On Tue, Jan 31, 2017 at 10:01:46AM +0100, Richard Biener wrote:
>   * doc/tm.texi.in (ADJUST_FIELD_ALIGN): Adjust to take additional
>   type parameter.
>   * doc/tm.texi: Regenerate.

You didn't include tm.texi.in in the patch, only tm.texi .


Segher


New German PO file for 'gcc' (version 7.1-b20170101)

2017-02-01 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the German team of translators.  The file is available at:

http://translationproject.org/latest/gcc/de.po

(This file, 'gcc-7.1-b20170101.de.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

http://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

http://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.




GCC 6 patch: Add aligned attribute for m68k support

2017-02-01 Thread Ian Lance Taylor
This patch by John Paul Adrian Glaubitz fixes m68k libgo on the GCC 6
branch, by adding alignment required by the kernel but not otherwise
imposed by the m68k backend.  I bootstrapped and ran Go tests on
x86_64-pc-linux-gnu, Adrian tested on m68k.  Committed to GCC 6
branch.
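
For readers unfamiliar with the attribute, here is a small sketch of what aligned does to a struct member.  It uses the GCC/Clang __attribute__ syntax and a 16-bit member as a stand-in, since on most hosts a pointer-sized field is already aligned to four bytes or more; the struct and function names are invented for this example and do not appear in libgo.

```c
#include <stddef.h>
#include <stdint.h>

/* Without the attribute the member only gets its natural alignment,
   which for a 16-bit integer is typically 2 bytes.  */
struct key_default
{
  char tag;
  uint16_t key;
};

/* With the attribute the member is placed on a 4-byte boundary, and
   the enclosing struct's alignment is raised to match.  */
struct key_aligned
{
  char tag;
  uint16_t key __attribute__((aligned(4)));
};

/* Offset of the member in each layout, for inspection.  */
size_t
key_offset_default (void)
{
  return offsetof (struct key_default, key);
}

size_t
key_offset_aligned (void)
{
  return offsetof (struct key_aligned, key);
}
```

The patch below applies the same idea to the key fields, so that they meet the alignment the kernel expects even where the backend would otherwise choose a smaller one.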

Ian
@@ -, +, @@ 
---
 libgo/runtime/runtime.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)
--- a/libgo/runtime/runtime.h   
+++ a/libgo/runtime/runtime.h   
@@ -154,14 +154,14 @@ struct Lock
// Futex-based impl treats it as uint32 key,
// while sema-based impl as M* waitm.
// Used to be a union, but unions break precise GC.
-   uintptr key;
+   uintptr key __attribute__((aligned(4)));
 };
 struct Note
 {
// Futex-based impl treats it as uint32 key,
// while sema-based impl as M* waitm.
// Used to be a union, but unions break precise GC.
-   uintptr key;
+   uintptr key __attribute__((aligned(4)));
 };
 struct String
 {
-- 


Re: [PATCH] Add support for Fuchsia (OS)

2017-02-01 Thread Gerald Pfeifer
On Tue, 17 Jan 2017, Josh Conner via gcc-patches wrote:
> Attached is my recommended patch for changes to the web docs describing
> Fuchsia support. Please let me know if there's anything else I can do.

This looks fine (just remove the blank before "Fuchsia"); and
sorry for the delay getting back to this!

Gerald



Re: Go patch committed: Fixes for m68k

2017-02-01 Thread Ian Lance Taylor
On Mon, Jan 23, 2017 at 10:15 AM, Ian Lance Taylor  wrote:
> I committed a patch to the Go frontend and libgo to fix alignment
> issues on m68k.  This fixes PR 79037.  Bootstrapped and ran Go tests
> on x86_64-pc-linux-gnu.  Tested by John Paul Adrian Glaubitz on
> m68k-linux-gnu.  Committed to mainline.

Now also committed to GCC 6 branch, as follows.

Ian
Index: gcc/go/gofrontend/types.cc
===
--- gcc/go/gofrontend/types.cc  (revision 245107)
+++ gcc/go/gofrontend/types.cc  (working copy)
@@ -2175,11 +2175,26 @@
   is_common = true;
 }
 
+  // The current garbage collector requires that the GC symbol be
+  // aligned to at least a four byte boundary.  See the use of PRECISE
+  // and LOOP in libgo/runtime/mgc0.c.
+  int64_t align;
+  if (!sym_init->type()->backend_type_align(gogo, &align))
+go_assert(saw_errors());
+  if (align < 4)
+align = 4;
+  else
+{
+  // Use default alignment.
+  align = 0;
+}
+
   // Since we are building the GC symbol in this package, we must create the
   // variable before converting the initializer to its backend representation
   // because the initializer may refer to the GC symbol for this type.
   this->gc_symbol_var_ =
-gogo->backend()->implicit_variable(sym_name, sym_btype, false, true, is_common, 0);
+gogo->backend()->implicit_variable(sym_name, sym_btype, false, true,
+   is_common, align);
   if (phash != NULL)
 *phash = this->gc_symbol_var_;
 
Index: libgo/runtime/go-unsafe-pointer.c
===
--- libgo/runtime/go-unsafe-pointer.c   (revision 245107)
+++ libgo/runtime/go-unsafe-pointer.c   (working copy)
@@ -36,7 +36,8 @@
   sizeof REFLECTION - 1
 };
 
-const uintptr unsafe_Pointer_gc[] = {sizeof(void*), GC_APTR, 0, GC_END};
+const uintptr unsafe_Pointer_gc[] __attribute__((aligned(4))) =
+  {sizeof(void*), GC_APTR, 0, GC_END};
 
 const struct __go_type_descriptor unsafe_Pointer =
 {
Index: libgo/runtime/parfor.c
===
--- libgo/runtime/parfor.c  (revision 245107)
+++ libgo/runtime/parfor.c  (working copy)
@@ -10,7 +10,7 @@
 struct ParForThread
 {
// the thread's iteration space [32lsb, 32msb)
-   uint64 pos;
+   uint64 pos __attribute__((aligned(8)));
// stats
uint64 nsteal;
uint64 nstealcnt;
Index: libgo/runtime/runtime.h
===
--- libgo/runtime/runtime.h (revision 245109)
+++ libgo/runtime/runtime.h (working copy)
@@ -431,7 +431,7 @@
// otherwise parfor may return while other threads are still working
ParForThread *thr;  // array of thread descriptors
// stats
-   uint64 nsteal;
+   uint64 nsteal __attribute__((aligned(8))); // force alignment for m68k
uint64 nstealcnt;
uint64 nprocyield;
uint64 nosyield;


[PATCH] use zero as the lower bound for a signed-unsigned range (PR 79327)

2017-02-01 Thread Martin Sebor

PR tree-optimization/79327 exposes a bug in the gimple-ssa-sprintf
pass in the computation of the expected size of output of an integer
format directive whose argument is in a signed-unsigned range (such
as [-12, 345]).  The current algorithm on trunk uses the bounds of
the range to compute the range of output when it should be using
zero to obtain the minimum output and whichever of the two bounds
results in more output to compute the maximum.  This can lead to
both wrong code (as pointed out in the referenced bug) and to
the wrong range printed in the warnings.
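
As a rough model of the intended computation (not the code in gimple-ssa-sprintf.c), the sketch below measures the output of a plain "%i" directive for the two bounds and, when the range straddles zero, for zero as well.  The function names nbytes and output_range are made up for this illustration.

```c
#include <assert.h>
#include <stdio.h>

/* Number of bytes "%i" produces for VALUE, not counting the '\0'.  */
static int
nbytes (int value)
{
  return snprintf (NULL, 0, "%i", value);
}

/* Store in *MIN and *MAX the fewest and the most bytes that "%i" can
   produce for any argument in the range [LO, HI].  */
void
output_range (int lo, int hi, int *min, int *max)
{
  assert (lo <= hi);

  int nlo = nbytes (lo);
  int nhi = nbytes (hi);

  /* Whichever bound prints longer gives the maximum.  */
  *max = nlo > nhi ? nlo : nhi;
  *min = nlo < nhi ? nlo : nhi;

  /* In a negative-positive range the shortest output is that of
     zero, which neither bound accounts for.  */
  if (lo < 0 && 0 < hi)
    {
      int nzero = nbytes (0);
      if (nzero < *min)
        *min = nzero;
    }
}
```

For the range [-12, 345] both bounds print as three bytes, so using only the bounds gives a range of 3 to 3, while the correct answer is 1 to 3 because the argument can be 0.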

The attached patch fixes that bug.

Martin
PR tree-optimization/79327 - wrong code at -O2 and -fprintf-return-value

gcc/ChangeLog:

	PR tree-optimization/79327
	* gimple-ssa-sprintf.c (format_integer): Replace with zero
	the lower bound of an argument in a signed-unsigned range.

gcc/testsuite/ChangeLog:

	PR tree-optimization/79327
	* gcc.dg/tree-ssa/builtin-sprintf-warn-12.c: New test.
	* gcc.dg/tree-ssa/pr79327.c: New test.

diff --git a/gcc/gimple-ssa-sprintf.c b/gcc/gimple-ssa-sprintf.c
index 11f4174..4f0670e 100644
--- a/gcc/gimple-ssa-sprintf.c
+++ b/gcc/gimple-ssa-sprintf.c
@@ -1382,13 +1382,26 @@ format_integer (const directive &dir, tree arg)
 would reflect the largest possible precision (i.e., INT_MAX).  */
   res.range.min = format_integer (dir, argmax).range.min;
   res.range.max = format_integer (dir, argmin).range.max;
-}
 
-  if (res.range.max < res.range.min)
-{
-  unsigned HOST_WIDE_INT tmp = res.range.max;
-  res.range.max = res.range.min;
-  res.range.min = tmp;
+  if (res.range.max < res.range.min)
+   {
+ unsigned HOST_WIDE_INT tmp = res.range.max;
+ res.range.max = res.range.min;
+ res.range.min = tmp;
+   }
+
+  if (!TYPE_UNSIGNED (argtype)
+ && tree_int_cst_lt (integer_zero_node, argmax)
+ && tree_int_cst_lt (argmin, integer_zero_node))
+   {
+ /* The minimum output for a signed argument in a negative-positive
+range is that of zero.  */
+ unsigned HOST_WIDE_INT
+   nzero = format_integer (dir, integer_zero_node).range.min;
+
+ if (nzero < res.range.min)
+   res.range.min = nzero;
+   }
 }
 
   res.range.likely = res.knownrange ? res.range.max : res.range.min;
diff --git a/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-12.c b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-12.c
new file mode 100644
index 000..35db400
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/tree-ssa/builtin-sprintf-warn-12.c
@@ -0,0 +1,228 @@
+/* PR tree-optimization/79327 - wrong code at -O2 and -fprintf-return-value
+   { dg-do compile }
+   { dg-options "-O2 -Wall -Wformat-overflow=1 -ftrack-macro-expansion=0" }
+   { dg-require-effective-target int32plus } */
+
+typedef __SIZE_TYPE__  size_t;
+typedef __WCHAR_TYPE__ wchar_t;
+
+#define INT_MAX __INT_MAX__
+#define INT_MIN (-INT_MAX - 1)
+
+/* When debugging, define LINE to the line number of the test case to exercise
+   and avoid exercising any of the others.  The buffer and objsize macros
+   below make use of LINE to avoid warnings for other lines.  */
+#ifndef LINE
+# define LINE 0
+#endif
+
+void sink (char*, char*);
+
+int dummy_sprintf (char*, const char*, ...);
+
+char buffer [256];
+extern char *ptr;
+
+int int_range (int min, int max)
+{
+  extern int int_value (void);
+  int n = int_value ();
+  return n < min || max < n ? min : n;
+}
+
+unsigned uint_range (unsigned min, unsigned max)
+{
+  extern unsigned uint_value (void);
+  unsigned n = uint_value ();
+  return n < min || max < n ? min : n;
+}
+
+/* Evaluate to an array of SIZE characters when non-negative, or to
+   a pointer to an unknown object otherwise.  */
+#define buffer(size)	\
+  ((0 <= size) ? buffer + sizeof buffer - (size) : ptr)
+
+/* Helper to expand function to either __builtin_f or dummy_f to
+   make debugging GCC easy.  */
+#define FUNC(f)			\
+  ((!LINE || LINE == __LINE__) ? __builtin_ ## f : dummy_ ## f)
+
+/* Macro to verify that calls to __builtin_sprintf (i.e., with no size
+   argument) issue diagnostics by correctly determining the size of
+   the destination buffer.  */
+#define T(size, ...)		\
+  (FUNC (sprintf) (buffer (size),  __VA_ARGS__),		\
+   sink (buffer, ptr))
+
+/* Return a signed integer in the range [MIN, MAX].  */
+#define R(min, max)  int_range (min, max)
+
+/* Return an unsigned integer in the range [MIN, MAX].  */
+#define U(min, max)  uint_range (min, max)
+
+/* Exercise the hh length modifier with ranges.  */
+void test_hh (void)
+{
+  T (0, "%hhi", R (  -1,0));/* { dg-warning "between 1 and 2 bytes" } */
+  T (0, "%hhi", R (  -1,1));/* { dg-warning "between 1 and 2 bytes" } */
+  T (0, "%hhi", R (  -1,   12));/* { dg-warning "between 1 and 2 bytes" } */
+  T (0, "%hhi", R (  -1,  123));/* { dg-warning "between 1 and 3 bytes" } */
+  T (0, "%hhi", R (  -1,  128));/* { dg-warning "bet

Re: [PATCH] relax -Wformat-overflow for precision ranges (PR 79275)

2017-02-01 Thread Martin Sebor

On 01/31/2017 03:33 PM, Jeff Law wrote:

On 01/30/2017 02:28 PM, Martin Sebor wrote:

Bug 79275 - -Wformat-overflow false positive exceeding INT_MAX in
glibc sysdeps/posix/tempname.c points out a false positive found
during a Glibc build and caused by the checker using the upper
bound of a range of precisions in string directives with string
arguments of non-constant length.  The attached patch relaxes
the checker to use the lower bound instead when appropriate.

Martin

gcc-79275.diff


PR middle-end/79275 -  -Wformat-overflow false positive exceeding
INT_MAX in glibc sysdeps/posix/tempname.c

gcc/testsuite/ChangeLog:

PR middle-end/79275
* gcc.dg/tree-ssa/builtin-sprintf-warn-11.c: New test.
* gcc.dg/tree-ssa/pr79275.c: New test.

gcc/ChangeLog:

PR middle-end/79275
* gimple-ssa-sprintf.c (get_string_length): Set lower bound to zero.
(format_string): Tighten up the range of output for non-constant
strings and correct the expected range for wide non-constant strings.

My general inclination is to ask this to wait for gcc-8 as it is not a
regression, but instead a false positive in a new warning.


I would feel better if the patch were committed not just because
of the false positives but also because it corrects the range for
non-constant wide strings (i.e., for something like
snprintf (0, 0, "%ls", rand () ? L"\u03a6" : L"" ) trunk thinks
the output is at most 1 byte when in reality it's more like 2
bytes (U+03a6 is "\xCE\xA6" in UTF-8).  It's probably a pretty
rare case but still.
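
The byte count for that example can be reproduced without involving the locale machinery.  The sketch below is a plain UTF-8 encoder, assuming the execution character set is UTF-8 (what "%ls" actually emits depends on the current locale); the function name utf8_encode is invented for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Encode code point CP as UTF-8 into OUT, which must have room for
   4 bytes.  Returns the number of bytes written, or 0 if CP is a
   surrogate or lies outside the Unicode range.  */
size_t
utf8_encode (unsigned long cp, unsigned char *out)
{
  assert (out != NULL);

  if (cp < 0x80)
    {
      out[0] = (unsigned char) cp;
      return 1;
    }
  if (cp < 0x800)
    {
      out[0] = (unsigned char) (0xC0 | (cp >> 6));
      out[1] = (unsigned char) (0x80 | (cp & 0x3F));
      return 2;
    }
  if (cp >= 0xD800 && cp <= 0xDFFF)
    return 0;
  if (cp < 0x10000)
    {
      out[0] = (unsigned char) (0xE0 | (cp >> 12));
      out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      out[2] = (unsigned char) (0x80 | (cp & 0x3F));
      return 3;
    }
  if (cp <= 0x10FFFF)
    {
      out[0] = (unsigned char) (0xF0 | (cp >> 18));
      out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
      out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      out[3] = (unsigned char) (0x80 | (cp & 0x3F));
      return 4;
    }
  return 0;
}
```

Encoding U+03A6 this way yields the two bytes 0xCE 0xA6, so a one-character wide string can already need more than one byte of output.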



However, if we see this triggering with any significant frequency, then
we should reevaluate.  Getting Marek's build logs and grepping through
them might guide us a bit on this...


I'm not sure what the rationale is for length of zero at level 1 and
length of one at higher levels for unknown strings.  I guess I can
kind of see the former, though I suspect if we looked at the actual
strings, zero length would actually be uncommon.

For level 1 and above assuming a single character seems way too
tolerant.  We're issuing a "may" warning, so ISTM we ought to be
assuming a much larger length here.  I realize that makes a lot more
noise for the warning, but doesn't that better reflect what may happen?


My rationale for zero and one for strings of unknown length was
to try to avoid false positives.  The checker does use the size
of the array when a string of unknown length points to one as
the likely length.  That's run into the expected pushback but
at least there is (what I consider) a defensible rationale for
it (the string could potentially be as long as the array,
otherwise why make the array that big?)  It also uses the length
of the longest of the strings an argument is known to point to,
on the same basis.

My biggest concern with being more aggressive than that (besides
the pushback) is that I can't think of a good function to compute
the size (it can't very well be a constant).

Martin




diff --git a/gcc/gimple-ssa-sprintf.c b/gcc/gimple-ssa-sprintf.c
index 8261a44..c0c0a5f 100644
--- a/gcc/gimple-ssa-sprintf.c
+++ b/gcc/gimple-ssa-sprintf.c
@@ -1986,43 +1987,89 @@ format_string (const directive &dir, tree arg)
 }
   else
 {
-  /* For a '%s' and '%ls' directive with a non-constant string,
- the minimum number of characters is the greater of WIDTH
- and either 0 in mode 1 or the smaller of PRECISION and 1
- in mode 2, and the maximum is PRECISION or -1 to disable
- tracking.  */
+  /* For a '%s' and '%ls' directive with a non-constant string (either
+ one of a number of strings of known length or an unknown string)
+ the minimum number of characters is lesser of PRECISION[0] and
+ the length of the shortest known string or zero, and the maximum
+ is the lessser of the length of the longest known string or

s/lessser/lesser/

Jeff




Re: [RFC] Bug lto/78140

2017-02-01 Thread kugan

Hi Richard,

On 30/01/17 21:08, Richard Biener wrote:

On Mon, Jan 30, 2017 at 12:23 AM, kugan
 wrote:

lto1: internal compiler error: Segmentation fault
0xdedc4b crash_signal
../../gcc/gcc/toplev.c:333
0xb46680 ipa_node_params_t::duplicate(cgraph_node*, cgraph_node*,
ipa_node_params*, ipa_node_params*)
../../gcc/gcc/ipa-prop.c:3819
0xb306a3
function_summary::symtab_duplication(cgraph_node*,
cgraph_node*, void*)
../../gcc/gcc/symbol-summary.h:187
0x85aba7 symbol_table::call_cgraph_duplication_hooks(cgraph_node*,
cgraph_node*)
../../gcc/gcc/cgraph.c:488
0x8765bf cgraph_node::create_clone(tree_node*, long, int, bool,
vec, bool, cgraph_node*, bitmap_head*, char
const*)
../../gcc/gcc/cgraphclones.c:522
0x166fb3b clone_inlined_nodes(cgraph_edge*, bool, bool, int*, int)
../../gcc/gcc/ipa-inline-transform.c:227
0x166fbd7 clone_inlined_nodes(cgraph_edge*, bool, bool, int*, int)
../../gcc/gcc/ipa-inline-transform.c:242
0x1670893 inline_call(cgraph_edge*, bool, vec*, int*, bool, bool*)
../../gcc/gcc/ipa-inline-transform.c:449
0x1665bd3 inline_small_functions
../../gcc/gcc/ipa-inline.c:2024
0x1667157 ipa_inline
../../gcc/gcc/ipa-inline.c:2434
0x1667fa7 execute
../../gcc/gcc/ipa-inline.c:2845



This is due to an existing issue: in ipa_node_params_t::remove, the
m_vr and bits vectors are not set to null, so the GC cannot reclaim them.


I also noticed that we don't create m_vr and bits vectors. Attached 
patch does this. This was bootstrapped and regression tested with the 
above patch. I am now testing the attached patch alone.  Is this OK if 
no regressions?



gcc/ChangeLog:

2017-02-02  Kugan Vivekanandarajah  

* ipa-cp.c (ipcp_store_bits_results): Construct bits vector.
(ipcp_store_vr_results): Construct m_vr vector.
* ipa-prop.c (ipa_node_params_t::remove): Set transformation
summary to null.
(ipa_node_params_t::duplicate): Construct bits and m_vr vector.
(read_ipcp_transformation_info): Likewise.


Thanks,
Kugan


I tried a similar thing without the variable structure, like:
diff --git a/gcc/ipa-prop.h b/gcc/ipa-prop.h
index 93a2390c..b0cc832 100644
--- a/gcc/ipa-prop.h
+++ b/gcc/ipa-prop.h
@@ -525,7 +525,7 @@ struct GTY(()) ipcp_transformation_summary
   /* Known bits information.  */
   vec *bits;
   /* Value range information.  */
-  vec *m_vr;
+  vec *m_vr;
 };

This also has the same issue so I don't think it has anything to do with
variable structure.


You have to debug that detail yourself but I wonder why the transformation
summary has a different representation than the jump function (and I think
the jump function size is the issue).

The JF has

  /* Information about zero/non-zero bits.  */
  struct ipa_bits bits;

  /* Information about value range, containing valid data only when vr_known is
 true.  */
  value_range m_vr;
  bool vr_known;

with ipa_bits having two widest_ints and value_range having two trees
and an unused bitmap and ipa_vr having two wide_ints (widest_ints are
smaller!).

Richard.



Thanks,
Kugan
diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index aa3c997..5103555 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -4865,6 +4865,8 @@ ipcp_store_bits_results (void)
 
   ipcp_grow_transformations_if_necessary ();
   ipcp_transformation_summary *ts = ipcp_get_transformation_summary (node);
+  if (!ts->bits)
+   ts->bits = (new (ggc_cleared_alloc ()) ipa_bits_vec ());
   vec_safe_reserve_exact (ts->bits, count);
 
   for (unsigned i = 0; i < count; i++)
@@ -4940,6 +4942,8 @@ ipcp_store_vr_results (void)
 
   ipcp_grow_transformations_if_necessary ();
   ipcp_transformation_summary *ts = ipcp_get_transformation_summary (node);
+  if (!ts->m_vr)
+   ts->m_vr = new (ggc_cleared_alloc ()) ipa_vr_vec ();
   vec_safe_reserve_exact (ts->m_vr, count);
 
   for (unsigned i = 0; i < count; i++)
diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c
index 834c27d..b992a7f 100644
--- a/gcc/ipa-prop.c
+++ b/gcc/ipa-prop.c
@@ -3759,13 +3759,20 @@ ipa_node_params_t::insert (cgraph_node *, ipa_node_params *info)
to.  */
 
 void
-ipa_node_params_t::remove (cgraph_node *, ipa_node_params *info)
+ipa_node_params_t::remove (cgraph_node *node, ipa_node_params *info)
 {
   free (info->lattices);
   /* Lattice values and their sources are deallocated with their alocation
  pool.  */
   info->known_csts.release ();
   info->known_contexts.release ();
+  ipcp_transformation_summary *ts = ipcp_get_transformation_summary (node);
+  if (ts != NULL)
+{
+  ts->agg_values = NULL;
+  ts->bits = NULL;
+  ts->m_vr = NULL;
+}
 }
 
 /* Hook that is called by summary when a node is duplicated.  */
@@ -3811,8 +3818,10 @@ ipa_node_params_t::duplicate(cgraph_node *src, cgraph_node *dst,
   ipcp_grow_transformations_if_necessary ();
   src_trans = ipcp_get_transformation_summary (src);
   const vec *src_vr = src_trans->m_vr;

[PATCH doc] clean up -fdump-tree- options (PR 32003)

2017-02-01 Thread Martin Sebor

As discussed in bug 32003 - Undocumented -fdump-tree options, rather
than duplicating the same boilerplate text for each of the dozens
(138 by my count) of undocumented passes, the attached patch removes
the pass-specific -fdump-tree- options and replaces them with a list of
generic steps for determining the full list of such options and the
names of the dump files for each.

The bug only talks about -fdump-tree- options so the attached patch
only tackles those and leaves the -fdump-rtl- options for another
time.

Martin
PR middle-end/32003 - Undocumented -fdump-tree options

gcc/ChangeLog:

	PR middle-end/32003
	* doc/invoke.texi (-fdump-rtl-): Remove pass-specific options from
	index.
	(-fdump-tree-@var): Add to index and document how to come up
	with pass-specific option and dump file names.
	(-fdump-passes): Clarify where to look for output.

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 4b13aeb..75b6e3e 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -543,30 +543,9 @@ Objective-C and Objective-C++ Dialects}.
 -fdump-passes @gol
 -fdump-rtl-@var{pass}  -fdump-rtl-@var{pass}=@var{filename} @gol
 -fdump-statistics @gol
--fdump-tree-all @gol
--fdump-tree-original@r{[}-@var{n}@r{]}  @gol
--fdump-tree-optimized@r{[}-@var{n}@r{]} @gol
--fdump-tree-cfg  -fdump-tree-alias @gol
--fdump-tree-ch @gol
--fdump-tree-ssa@r{[}-@var{n}@r{]}  -fdump-tree-pre@r{[}-@var{n}@r{]} @gol
--fdump-tree-ccp@r{[}-@var{n}@r{]}  -fdump-tree-dce@r{[}-@var{n}@r{]} @gol
--fdump-tree-gimple@r{[}-raw@r{]} @gol
--fdump-tree-dom@r{[}-@var{n}@r{]} @gol
--fdump-tree-dse@r{[}-@var{n}@r{]} @gol
--fdump-tree-phiprop@r{[}-@var{n}@r{]} @gol
--fdump-tree-phiopt@r{[}-@var{n}@r{]} @gol
--fdump-tree-backprop@r{[}-@var{n}@r{]} @gol
--fdump-tree-forwprop@r{[}-@var{n}@r{]} @gol
--fdump-tree-nrv  -fdump-tree-vect @gol
--fdump-tree-sink @gol
--fdump-tree-sra@r{[}-@var{n}@r{]} @gol
--fdump-tree-forwprop@r{[}-@var{n}@r{]} @gol
--fdump-tree-fre@r{[}-@var{n}@r{]} @gol
--fdump-tree-vtable-verify @gol
--fdump-tree-vrp@r{[}-@var{n}@r{]} @gol
--fdump-tree-split-paths@r{[}-@var{n}@r{]} @gol
--fdump-tree-storeccp@r{[}-@var{n}@r{]} @gol
--fdump-final-insns=@var{file} @gol
+-fdump-tree-@var{switch} @gol
+-fdump-tree-@var{switch}-@var{options} @gol
+-fdump-tree-@var{switch}-@var{options}=@var{filename} @gol
 -fcompare-debug@r{[}=@var{opts}@r{]}  -fcompare-debug-second @gol
 -fenable-@var{kind}-@var{pass} @gol
 -fenable-@var{kind}-@var{pass}=@var{range-list} @gol
@@ -12971,8 +12950,8 @@ Dump after function inlining.
 
 @item -fdump-passes
 @opindex fdump-passes
-Dump the list of optimization passes that are turned on and off by
-the current command-line options.
+Print on @file{stderr} the list of optimization passes that are turned
+on and off by the current command-line options.
 
 @item -fdump-statistics-@var{option}
 @opindex fdump-statistics
@@ -13069,7 +13048,7 @@ example,
 
 @smallexample
 gcc -O2 -ftree-vectorize -fdump-tree-vect-blocks=foo.dump
- -fdump-tree-pre=stderr file.c
+ -fdump-tree-pre=/dev/stderr file.c
 @end smallexample
 
 outputs vectorizer dump into @file{foo.dump}, while the PRE dump is
@@ -13077,11 +13056,6 @@ output on to @file{stderr}. If two conflicting dump filenames are
 given for the same pass, then the latter option overrides the earlier
 one.
 
-@item split-paths
-@opindex fdump-tree-split-paths
-Dump each function after splitting paths to loop backedges.  The file
-name is made by appending @file{.split-paths} to the source file name.
-
 @item all
 Turn on all options, except @option{raw}, @option{slim}, @option{verbose}
 and @option{lineno}.
@@ -13091,148 +13065,33 @@ Turn on all optimization options, i.e., @option{optimized},
 @option{missed}, and @option{note}.
 @end table
 
-The following tree dumps are possible:
-@table @samp
-
-@item original
-@opindex fdump-tree-original
-Dump before any tree based optimization, to @file{@var{file}.original}.
-
-@item optimized
-@opindex fdump-tree-optimized
-Dump after all tree based optimization, to @file{@var{file}.optimized}.
-
-@item gimple
-@opindex fdump-tree-gimple
-Dump each function before and after the gimplification pass to a file.  The
-file name is made by appending @file{.gimple} to the source file name.
-
-@item cfg
-@opindex fdump-tree-cfg
-Dump the control flow graph of each function to a file.  The file name is
-made by appending @file{.cfg} to the source file name.
-
-@item ch
-@opindex fdump-tree-ch
-Dump each function after copying loop headers.  The file name is made by
-appending @file{.ch} to the source file name.
-
-@item ssa
-@opindex fdump-tree-ssa
-Dump SSA related information to a file.  The file name is made by appending
-@file{.ssa} to the source file name.
-
-@item alias
-@opindex fdump-tree-alias
-Dump aliasing information for each function.  The file name is made by
-appending @file{.alias} to the source file name.
-
-@item ccp
-@opindex fdump-tree-ccp
-Dump each function after CCP@.  The file name is made by appending
-@file{.cc
