Re: [PATCH][RFC] Fix PR63155 (some more)

2018-09-20 Thread Richard Biener
On Wed, 19 Sep 2018, Steven Bosscher wrote:

> On Wed, Sep 19, 2018 at 3:06 PM Richard Biener wrote:
> > If we'd only had an O(log n) search sparse bitmap implementation ...
> > (Steven posted patches to switch bitmap from/to such one but IIRC
> > that at least lacked bitmap_first_set_bit).
> 
> But bitmap_first_set_bit would be easy to implement. Just take the
> left-most node of the splay tree.
> 
> Actually all bit-tests would be easy to implement. It's only
> enumeration and set operations on the tree-views that would be
> complicated stuff (easier to "switch views" than re-implement).
> 
> Impressive that you remember >5yr old patches like that ;-)

;)

Well, it's still useful (but obviously doesn't apply).  Not sure
if the worst-case behavior of splay trees makes it a pointless
exercise though ;)
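The leftmost-node idea really is that simple; here is a sketch in C of a plain search-tree-backed sparse bitmap (not GCC's bitmap API — splaying is omitted since rebalancing does not change the walk):

```c
#include <assert.h>
#include <stddef.h>

/* One node per set bit, ordered by bit index.  A splay tree has the
   same shape; only the (omitted) rebalancing differs.  */
struct bit_node {
  unsigned long index;
  struct bit_node *left, *right;
};

/* bitmap_first_set_bit: the smallest set bit is the leftmost node.
   Returns (unsigned long) -1 for an empty bitmap.  */
unsigned long first_set_bit (const struct bit_node *root)
{
  if (!root)
    return (unsigned long) -1;
  while (root->left)
    root = root->left;
  return root->index;
}
```

Bit tests would similarly be an ordinary tree search, which is why only enumeration and set operations need the view switch.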

Richard.


Re: [Ada] Fix comment typo in exp_ch9.adb

2018-09-20 Thread Arnaud Charlet
OK, thanks.

> Index: gcc/ada/ChangeLog
> ===================================================================
> --- gcc/ada/ChangeLog   (revision 264438)
> +++ gcc/ada/ChangeLog   (working copy)
> @@ -1,3 +1,7 @@
> +2018-09-20  Oliver Kellogg  
> +
> +   * exp_ch9.adb: Fix typo 'geenrated' to 'generated'.
> +
>  2018-09-13  Eric Botcazou  
> 
> * Makefile.rtl (arm% linux-gnueabi%): Always set EH_MECHANISM
> to -arm.
> Index: gcc/ada/exp_ch9.adb
> ===================================================================
> --- gcc/ada/exp_ch9.adb (revision 264438)
> +++ gcc/ada/exp_ch9.adb (working copy)
> @@ -481,7 +481,7 @@
> --  to be E. Bod is either a block or a subprogram body.  Used after
> --  expanding various kinds of entry bodies into their corresponding
> --  constructs. This is needed during unnesting to determine whether a
> -   --  body geenrated for an entry or an accept alternative includes uplevel
> +   --  body generated for an entry or an accept alternative includes uplevel
> --  references.
> 
> function Trivial_Accept_OK return Boolean;
> 


Re: C++ PATCH to implement P1064R0, Virtual Function Calls in Constant Expressions (v4)

2018-09-20 Thread Andreas Schwab
On Sep 19 2018, Jason Merrill  wrote:

> Andreas, do the new testcases pass?  That would surprise me, but OK if so.

No, they don't.

/usr/local/gcc/gcc-20180920/gcc/testsuite/g++.dg/cpp2a/constexpr-virtual2.C:29:26:
 error: non-constant condition for static assertion
/usr/local/gcc/gcc-20180920/gcc/testsuite/g++.dg/cpp2a/constexpr-virtual2.C:29:23:
 error: expression '((& X2::_ZTV2X2) + 16)' does not designate a 'constexpr' 
function
/usr/local/gcc/gcc-20180920/gcc/testsuite/g++.dg/cpp2a/constexpr-virtual2.C:33:26:
 error: non-constant condition for static assertion
/usr/local/gcc/gcc-20180920/gcc/testsuite/g++.dg/cpp2a/constexpr-virtual2.C:33:23:
 error: expression '((& X2::_ZTV2X2) + 16)' does not designate a 'constexpr' 
function
/usr/local/gcc/gcc-20180920/gcc/testsuite/g++.dg/cpp2a/constexpr-virtual2.C:37:27:
 error: non-constant condition for static assertion
/usr/local/gcc/gcc-20180920/gcc/testsuite/g++.dg/cpp2a/constexpr-virtual2.C:37:24:
 error: expression '((& X2::_ZTV2X2) + 16)' does not designate a 'constexpr' 
function
/usr/local/gcc/gcc-20180920/gcc/testsuite/g++.dg/cpp2a/constexpr-virtual2.C:41:26:
 error: non-constant condition for static assertion
/usr/local/gcc/gcc-20180920/gcc/testsuite/g++.dg/cpp2a/constexpr-virtual2.C:41:23:
 error: expression '((& X4::_ZTV2X4) + 16)' does not designate a 'constexpr' 
function
/usr/local/gcc/gcc-20180920/gcc/testsuite/g++.dg/cpp2a/constexpr-virtual2.C:45:26:
 error: non-constant condition for static assertion
/usr/local/gcc/gcc-20180920/gcc/testsuite/g++.dg/cpp2a/constexpr-virtual2.C:45:23:
 error: expression '((& X4::_ZTV2X4) + 16)' does not designate a 'constexpr' 
function
/usr/local/gcc/gcc-20180920/gcc/testsuite/g++.dg/cpp2a/constexpr-virtual2.C:49:27:
 error: non-constant condition for static assertion
/usr/local/gcc/gcc-20180920/gcc/testsuite/g++.dg/cpp2a/constexpr-virtual2.C:49:24:
 error: expression '((& X4::_ZTV2X4) + 16)' does not designate a 'constexpr' 
function
compiler exited with status 1
FAIL: g++.dg/cpp2a/constexpr-virtual2.C   (test for excess errors)

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


[PATCH v2] Change EQ_ATTR_ALT to support up to 64 alternatives

2018-09-20 Thread Ilya Leoshkevich
Bootstrapped and regtested on x86_64-redhat-linux and
s390x-redhat-linux.

Changes since v1:

* Use alternative_mask and HOST_WIDE_INT in attr_alt_intersection,
  attr_alt_union and attr_alt_complement.

On S/390 there is a need to support more than 32 instruction
alternatives per define_insn.  Currently this is not explicitly
prohibited or unsupported: MAX_RECOG_ALTERNATIVES is equal to 35 and,
furthermore, the related code uses uint64_t for bitmaps in most places.

However, genattrtab contains the logic to convert (eq_attr "attribute"
"value") RTXs to (eq_attr_alt bitmap) RTXs, where bitmap contains
alternatives, whose "attribute" has the corresponding "value".
Unfortunately, bitmap is only 32 bits.

When adding the 33rd alternative, this led to (eq_attr "type" "larl")
becoming (eq_attr_alt -1050625 1), where -1050625 == 0xffeff7ff.  The
cleared bits 12, 21 and 32 correspond to two existing and one newly
added insn of type "larl".  compute_alternative_mask sign-extended this
to 0xffffffffffeff7ff, which contained non-existent alternatives, and
this made simplify_test_exp fail with "invalid alternative specified".

I'm not sure why it didn't fail the same way before, since the top bit,
which led to sign extension, should have been set even with 32
alternatives.  Maybe simplify_test_exp was not called for "type"
attribute for some reason?

This patch widens EQ_ATTR_ALT bitmap to 64 bits, making it possible to
gracefully handle up to 64 alternatives.  It eliminates the problem with
the 33rd alternative on S/390.
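The widening pitfall described above is easy to reproduce in isolation (a sketch; `alternative_mask` is the typedef from the patch, the `widen_*` helper names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t alternative_mask;

/* Old behaviour: the 32-bit bitmap -1050625 (0xffeff7ff) is widened
   through a signed conversion, so the top bit smears into bits 32..63,
   which then look like non-existent alternatives.  */
alternative_mask widen_signed (int32_t bits)
{
  return (alternative_mask) (int64_t) bits;
}

/* Fixed behaviour: zero-extend, keeping only the real 32 bits.  */
alternative_mask widen_unsigned (int32_t bits)
{
  return (alternative_mask) (uint32_t) bits;
}
```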

gcc/ChangeLog:

2018-09-18  Ilya Leoshkevich  

* genattrtab.c (mk_attr_alt): Use alternative_mask.
(attr_rtx_1): Adjust caching to match the new EQ_ATTR_ALT field
types.
(check_attr_test): Use alternative_mask.
(get_attr_value): Likewise.
(compute_alternative_mask): Use alternative_mask and XWINT.
(make_alternative_compare): Use alternative_mask.
(attr_alt_subset_p): Use XWINT.
(attr_alt_subset_of_compl_p): Likewise.
(attr_alt_intersection): Use alternative_mask and XWINT.
(attr_alt_union): Likewise.
(attr_alt_complement): Use HOST_WIDE_INT and XWINT.
(mk_attr_alt): Use alternative_mask and HOST_WIDE_INT.
(simplify_test_exp): Use alternative_mask and XWINT.
(write_test_expr): Use alternative_mask and XWINT, adjust bit
number calculation to support 64 bits.  Generate code that
checks 64-bit masks.
(main): Use alternative_mask.
* rtl.def (EQ_ATTR_ALT): Change field types from ii to ww.
---
 gcc/genattrtab.c | 132 ++-
 gcc/recog.h  |   2 +-
 gcc/rtl.def  |   2 +-
 3 files changed, 74 insertions(+), 62 deletions(-)

diff --git a/gcc/genattrtab.c b/gcc/genattrtab.c
index f9b0bc94b0f..d5cdbf5be23 100644
--- a/gcc/genattrtab.c
+++ b/gcc/genattrtab.c
@@ -228,7 +228,9 @@ static int *insn_n_alternatives;
 /* Stores, for each insn code, a bitmap that has bits on for each possible
alternative.  */
 
-static uint64_t *insn_alternatives;
+/* Keep this in sync with recog.h.  */
+typedef uint64_t alternative_mask;
+static alternative_mask *insn_alternatives;
 
 /* Used to simplify expressions.  */
 
@@ -256,7 +258,7 @@ static char *attr_printf   (unsigned int, const 
char *, ...)
   ATTRIBUTE_PRINTF_2;
 static rtx make_numeric_value  (int);
 static struct attr_desc *find_attr (const char **, int);
-static rtx mk_attr_alt (uint64_t);
+static rtx mk_attr_alt (alternative_mask);
 static char *next_comma_elt   (const char **);
 static rtx insert_right_side  (enum rtx_code, rtx, rtx, int, int);
 static rtx copy_boolean   (rtx);
@@ -494,26 +496,26 @@ attr_rtx_1 (enum rtx_code code, va_list p)
}
 }
   else if (GET_RTX_LENGTH (code) == 2
-  && GET_RTX_FORMAT (code)[0] == 'i'
-  && GET_RTX_FORMAT (code)[1] == 'i')
+  && GET_RTX_FORMAT (code)[0] == 'w'
+  && GET_RTX_FORMAT (code)[1] == 'w')
 {
-  int  arg0 = va_arg (p, int);
-  int  arg1 = va_arg (p, int);
+  HOST_WIDE_INT arg0 = va_arg (p, HOST_WIDE_INT);
+  HOST_WIDE_INT arg1 = va_arg (p, HOST_WIDE_INT);
 
   hashcode = ((HOST_WIDE_INT) code + RTL_HASH (arg0) + RTL_HASH (arg1));
   for (h = attr_hash_table[hashcode % RTL_HASH_SIZE]; h; h = h->next)
if (h->hashcode == hashcode
&& GET_CODE (h->u.rtl) == code
-   && XINT (h->u.rtl, 0) == arg0
-   && XINT (h->u.rtl, 1) == arg1)
+   && XWINT (h->u.rtl, 0) == arg0
+   && XWINT (h->u.rtl, 1) == arg1)
  return h->u.rtl;
 
   if (h == 0)
{
  rtl_obstack = hash_obstack;
  rt_val = rtx_alloc (code);
- XINT (rt_val, 0) = arg0;
- XINT (rt_val, 1) = arg1;
+ XWINT (rt_val, 0) = arg0;
+ XWINT (rt_val, 1) = arg1;
}
 }
   else if (code == CONST_INT)
@@ -703,7 +705,8 @@ check_att

Re: [PATCH] look harder for MEM_REF operand equality to avoid -Wstringop-truncation (PR 84561)

2018-09-20 Thread Richard Biener
On Wed, Sep 19, 2018 at 4:19 PM Martin Sebor  wrote:
>
> On 09/18/2018 10:23 PM, Jeff Law wrote:
> > On 9/18/18 1:46 PM, Martin Sebor wrote:
> >> On 09/18/2018 12:58 PM, Jeff Law wrote:
> >>> On 9/18/18 11:12 AM, Martin Sebor wrote:
> >>>
> > My bad.  Sigh. CCP doesn't track copies, just constants, so there's not
> > going to be any data structure you can exploit.  And I don't think
> > there's a value number you can use to determine the two objects are the
> > same.
> >
> > Hmm, let's back up a bit: what does the relevant part of the IL look
> > like before CCP?  Is the real problem here that we have unpropagated
> > copies lying around in the IL?  Hmm, more likely the IL looks like:
> >
> >_8 = &pb_3(D)->a;
> >_9 = _8;
> >_1 = _9;
> >strncpy (MEM_REF (&pb_3(D)->a), ...);
> >MEM[(struct S *)_1].a[n_7] = 0;
> 
>  Yes, that is what the folder sees while the strncpy call is
>  being transformed/folded by ccp.  The MEM_REF is folded just
>  after the strncpy call and that's when it's transformed into
> 
>    MEM[(struct S *)_8].a[n_7] = 0;
> 
>  (The assignments to _1 and _9 don't get removed until after
>  the dom walk finishes).
> 
> >
> > If we were to propagate the copies out we'd at best have:
> >
> >_8 = &pb_3(D)->a;
> >strncpy (MEM_REF (&pb_3(D)->a), ...);
> >MEM[(struct S *)_8].a[n_7] = 0;
> >
> >
> > Is that in a form you can handle?  Or would we also need to forward
> > propagate the address computation into the use of _8?
> 
>  The above works as long as we look at the def_stmt of _8 in
>  the MEM_REF (we currently don't).  That's also what the last
>  iteration of the loop does.  In this case (with _8) it would
>  be discovered in the first iteration, so the loop could be
>  replaced by a simple if statement.
> 
>  But I'm not sure I understand the concern with the loop.  Is
>  it that we are looping at all, i.e., the cost?  Or that ccp
>  is doing something wrong or suboptimal? (Should have
>  propagated the value of _8 earlier?)
> >>> I suspect it's more a concern that things like copies are typically
> >>> propagated away.   So their existence in the IL (and consequently your
> >>> need to handle them) raises the question "has something else failed to
> >>> do its job earlier".
> >>>
> >>> During which of the CCP passes is this happening?  Can we pull the
> >>> warning out of the folder (even if that means having a distinct warning
> >>> pass over the IL?)
> >>
> >> It happens during the third run of the pass.
> >>
> >> The only way to do what you suggest that I could think of is
> >> to defer the strncpy to memcpy transformation until after
> >> the warning pass.  That was also my earlier suggestion: defer
> >> both it and the warning until the tree-ssa-strlen pass (where
> >> the warning is implemented to begin with -- the folder calls
> >> into it).
> > If it's happening that late (CCP3) in general, then ISTM we ought to be
> > able to get the warning out of the folder.  We just have to pick the
> > right spot.
> >
> > warn_restrict runs before fold_all_builtins, but after dom/vrp so we
> > should have the IL in pretty good shape.  That seems like about the
> > right time.
> >
> > I wonder if we could generalize warn_restrict to be a more generic
> > warning pass over the IL and place it right before fold_builtins.
>
> The restrict pass doesn't know about string lengths so it can't
> handle all the warnings about string built-ins (the strlen pass
> now calls into it to issue some).  The strlen pass does, so it
> could handle most if not all of them (the folder also calls
> into it to issue some warnings).  It would work even better if
> it were also integrated with the object size pass.
>
> We're already working on merging strlen with sprintf.  It seems
> to me that the strlen pass would benefit not only from that but
> also from integrating with object size and warn-restrict.  With
> that, -Wstringop-overflow could be moved from builtins.c into
> it as well (and also benefit not only from accurate string
> lengths but also from the more accurate object size info).
>
> What do you think about that?

I think integrating the various "passes" (objectsize is also
as much a facility as a pass) generally makes sense given
it might end up improving all of them and reducing code duplication.

Richard.

>
> Martin
>
> PS I don't think I could do more than merge strlen and sprintf
> before stage 1 ends (if even that much) so this would be a longer
> term goal.
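For context, the source idiom this thread is about looks roughly like this (a sketch modelled on the PR 84561 discussion above; the struct and function names are hypothetical):

```c
#include <assert.h>
#include <string.h>

struct S { char a[8]; };

/* strncpy may truncate and then leaves the array unterminated, so the
   caller stores the nul itself.  -Wstringop-truncation should stay
   quiet here, which requires recognizing that the later store targets
   the same object as the strncpy destination (the MEM_REF operand
   equality question discussed above).  */
void copy_trunc (struct S *pb, const char *s, size_t n)
{
  strncpy (pb->a, s, sizeof pb->a - 1);
  pb->a[n] = '\0';
}
```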


Re: C++ PATCH to implement P1064R0, Virtual Function Calls in Constant Expressions (v4)

2018-09-20 Thread Jakub Jelinek
On Thu, Sep 20, 2018 at 09:12:53AM +0200, Andreas Schwab wrote:
> On Sep 19 2018, Jason Merrill  wrote:
> 
> > Andreas, do the new testcases pass?  That would surprise me, but OK if so.
> 
> No, they don't.
> 
> /usr/local/gcc/gcc-20180920/gcc/testsuite/g++.dg/cpp2a/constexpr-virtual2.C:29:26:
>  error: non-constant condition for static assertion
> /usr/local/gcc/gcc-20180920/gcc/testsuite/g++.dg/cpp2a/constexpr-virtual2.C:29:23:
>  error: expression '((& X2::_ZTV2X2) + 16)' does not designate a 'constexpr' 
> function
> /usr/local/gcc/gcc-20180920/gcc/testsuite/g++.dg/cpp2a/constexpr-virtual2.C:33:26:
>  error: non-constant condition for static assertion
> /usr/local/gcc/gcc-20180920/gcc/testsuite/g++.dg/cpp2a/constexpr-virtual2.C:33:23:
>  error: expression '((& X2::_ZTV2X2) + 16)' does not designate a 'constexpr' 
> function
> /usr/local/gcc/gcc-20180920/gcc/testsuite/g++.dg/cpp2a/constexpr-virtual2.C:37:27:
>  error: non-constant condition for static assertion
> /usr/local/gcc/gcc-20180920/gcc/testsuite/g++.dg/cpp2a/constexpr-virtual2.C:37:24:
>  error: expression '((& X2::_ZTV2X2) + 16)' does not designate a 'constexpr' 
> function
> /usr/local/gcc/gcc-20180920/gcc/testsuite/g++.dg/cpp2a/constexpr-virtual2.C:41:26:
>  error: non-constant condition for static assertion
> /usr/local/gcc/gcc-20180920/gcc/testsuite/g++.dg/cpp2a/constexpr-virtual2.C:41:23:
>  error: expression '((& X4::_ZTV2X4) + 16)' does not designate a 'constexpr' 
> function
> /usr/local/gcc/gcc-20180920/gcc/testsuite/g++.dg/cpp2a/constexpr-virtual2.C:45:26:
>  error: non-constant condition for static assertion
> /usr/local/gcc/gcc-20180920/gcc/testsuite/g++.dg/cpp2a/constexpr-virtual2.C:45:23:
>  error: expression '((& X4::_ZTV2X4) + 16)' does not designate a 'constexpr' 
> function
> /usr/local/gcc/gcc-20180920/gcc/testsuite/g++.dg/cpp2a/constexpr-virtual2.C:49:27:
>  error: non-constant condition for static assertion
> /usr/local/gcc/gcc-20180920/gcc/testsuite/g++.dg/cpp2a/constexpr-virtual2.C:49:24:
>  error: expression '((& X4::_ZTV2X4) + 16)' does not designate a 'constexpr' 
> function
> compiler exited with status 1
> FAIL: g++.dg/cpp2a/constexpr-virtual2.C   (test for excess errors)

I think the primary problem here is:
  /* When using function descriptors, the address of the
 vtable entry is treated as a function pointer.  */
  if (TARGET_VTABLE_USES_DESCRIPTORS)
e2 = build1 (NOP_EXPR, TREE_TYPE (e2),
 cp_build_addr_expr (e2, complain));
in typeck.c, on non-descriptor targets we have an INDIRECT_REF where we
read the vtable function pointer.  On ia64, the above optimizes the
INDIRECT_REF away, so what the cxx_eval_call_expression actually gets
after constexpr evaluating the CALL_FN is not ADDR_EXPR of a function,
but the address of the function descriptor (e.g. &_ZTV2X2 + 16 ).

So, perhaps in cxx_eval_call_expression we need:
   if (TREE_CODE (fun) == ADDR_EXPR)
fun = TREE_OPERAND (fun, 0);
+  else if (TARGET_VTABLE_USES_DESCRIPTORS
+  && TREE_CODE (fun) == POINTER_PLUS_EXPR
+  && ...)
where we verify that the first argument of the POINTER_PLUS_EXPR is an
ADDR_EXPR of a virtual table and the second argument is an INTEGER_CST, and
just walk the DECL_INITIAL of that, finding the FDESC_EXPR at the right
offset (therefore, I believe you need the following rather than the patch
you've posted, so that you can actually find it) and finally pick the
function from the FDESC_EXPR entry.
Makes me wonder what happens with indirect calls in constexpr evaluation,
e.g. if I do:
constexpr int bar () { return 42; }
constexpr int foo () { int (*fn) () = bar; return fn (); }
static_assert (foo () == 42);
but apparently this works.

--- gcc/cp/class.c.jj   2018-09-20 09:56:59.229751895 +0200
+++ gcc/cp/class.c  2018-09-20 10:12:17.447370890 +0200
@@ -9266,7 +9266,6 @@ build_vtbl_initializer (tree binfo,
   tree vcall_index;
   tree fn, fn_original;
   tree init = NULL_TREE;
-  tree idx = size_int (jx++);
 
   fn = BV_FN (v);
   fn_original = fn;
@@ -9370,7 +9369,7 @@ build_vtbl_initializer (tree binfo,
  int i;
  if (init == size_zero_node)
for (i = 0; i < TARGET_VTABLE_USES_DESCRIPTORS; ++i)
- CONSTRUCTOR_APPEND_ELT (*inits, idx, init);
+ CONSTRUCTOR_APPEND_ELT (*inits, size_int (jx++), init);
  else
for (i = 0; i < TARGET_VTABLE_USES_DESCRIPTORS; ++i)
  {
@@ -9378,11 +9377,11 @@ build_vtbl_initializer (tree binfo,
 fn, build_int_cst (NULL_TREE, i));
TREE_CONSTANT (fdesc) = 1;
 
-   CONSTRUCTOR_APPEND_ELT (*inits, idx, fdesc);
+   CONSTRUCTOR_APPEND_ELT (*inits, size_int (jx++), fdesc);
  }
}
   else
-   CONSTRUCTOR_APPEND_ELT (*inits, idx, init);
+   CONSTRUCTOR_APPEND_ELT (*inits, size_int (jx++), init);
 }
 }
 


Jakub


Re: [PATCH][GCC][AArch64] Add support for SVE stack clash probing [patch (2/7)]

2018-09-20 Thread Tamar Christina
Hi Richard,

The 09/11/2018 16:20, Richard Sandiford wrote:
> Tamar Christina  writes:
> >> > +
> >> > +  /* No probe leave.  */
> >> > +  ASM_OUTPUT_INTERNAL_LABEL (asm_out_file, loop_end_lab);
> >> > +  return "";
> >> 
> >> With the CFA stuff and constant load, I think this works out as:
> >> 
> >> -
> >> # 12 insns
> >>mov r15, base
> >>mov adjustment, N
> >> 1:
> >>cmp adjustment, guard_size
> >>b.lt 2f
> >>sub base, base, guard_size
> >>str xzr, [base, limit]
> >>sub adjustment, adjustment, guard_size
> >>b   1b
> >> 2:
> >>sub base, base, adjustment
> >>cmp adjustment, limit
> >>b.le 3f
> >>str xzr, [base, limit]
> >> 3:
> >> -
> >> 
> >> What do you think about something like:
> >> 
> >> -
> >> # 10 insns
> >>mov adjustment, N
> >>sub r15, base, adjustment
> >>subs adjustment, adjustment, min_probe_threshold
> >>b.lo 2f
> >> 1:
> >>add base, x15, adjustment
> >>str xzr, [base, 0]
> >>subs adjustment, adjustment, 16
> >>and adjustment, adjustment, ~(guard_size-1)
> >>b.hs 1b
> >> 2:
> >>mov base, r15
> >> -
> >> 
> >> or (with different trade-offs):
> >> 
> >> -
> >> # 11 insns
> >>mov adjustment, N
> >>sub r15, base, adjustment
> >>subs adjustment, adjustment, min_probe_threshold
> >>b.lo 2f
> >># Might be 0, leading to a double probe
> >>and r14, adjustment, guard_size-1
> >> 1:
> >>add base, x15, adjustment
> >>str xzr, [base, 0]
> >>subs adjustment, adjustment, r14
> >>mov r14, guard_size
> >>b.hs 1b
> >> 2:
> >>mov base, r15
> >> -
> >> 
> >> or (longer, but with a simpler loop):
> >> 
> >> -
> >> # 12 insns
> >>mov adjustment, N
> >>sub r15, base, adjustment
> >>subs adjustment, adjustment, min_probe_threshold
> >>b.lo 2f
> >>str xzr, [base, -16]!
> >>sub adjustment, adjustment, 32
> >>and adjustment, adjustment, -(guard_size-1)
> >> 1:
> >>add base, x15, adjustment
> >>str xzr, [base, 0]
> >>subs adjustment, adjustment, guard_size
> >>b.hs 1b
> >> 2:
> >>mov base, r15
> >> -
> >> 
> >> with the CFA based on r15+offset?
> >> 
> >> These loops probe more often than necessary in some cases,
> >> but they only need a single branch in the common case that
> >> ADJUSTMENT <= MIN_PROBE_THRESHOLD.
> >
> > I haven't changed the loop yet because I'm a bit on the edge about
> > whether the implementation difficulties would outweigh the benefits.
> > We are planning on doing something smarter for SVE so optimizing these
> > loops only to replace them later may not be time well spent now.
> >
> > The problem is that to support both 4KB and 64KB pages, instructions such
> > as subs would require different immediates and shifts.  Granted, we
> > technically only support these two, so I could hardcode the values, but
> > that would mean these functions are less general than the rest.
> 
> Because of the min_probe_threshold?  You could conservatively clamp it
> to the next lowest value that's in range, which we could do without
> having to hard-code specific values.  I think it would be better
> to do that even with the current code, since hard-coding 2048 with:
> 
>   /* Test if ADJUSTMENT < RESIDUAL_PROBE_GUARD, in principle any power of two
>      larger than 1024B would work, but we need one that works for all supported
>      guard-sizes.  What we actually want to check is guard-size - 1KB, but this
>      immediate won't fit inside a cmp without requiring a temporary, so instead
>      we just accept a smaller immediate that doesn't; we may probe a bit more
>      often but that doesn't matter much in the long run.  */
> 
> seems a bit of a hack.
> 
> > If you think it would be worthwhile, I'd be happy to use one of these
> > loops instead.
> 
> Yeah, I still think we should do this unless we can commit to doing
> the optimised version by a specific date, and that date is soon enough
> that the optimisation could reasonably be backported to GCC 8.
> 

While implementing these loops I found them a bit hard to follow, or rather
a bit difficult to prove correct; to someone looking at the code it may not
be trivially clear what it does.  I believe the main concern here is that
the common case isn't shortcutted, e.g. spills small enough not to require
a probe.  So how about:

mov r15, base
mov adjustment, N
cmp adjustment, nearest(min_probe_threshold)
b.lt end
be
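As a sanity check, the first 12-insn loop quoted earlier can be modelled in C to confirm its bookkeeping (a sketch working on byte offsets rather than real stack pointers; `probe_loop` is a hypothetical name):

```c
#include <assert.h>

/* Mirrors the quoted loop: probe every guard_size bytes while at least
   guard_size of the allocation remains, then handle the residual,
   probing it only when it is larger than limit.  Records each probed
   address so the probe spacing can be checked.  */
long probe_loop (long base, long n, long guard_size, long limit,
                 long *probes, int *nprobes)
{
  long adjustment = n;
  *nprobes = 0;
  while (adjustment >= guard_size)     /* cmp adjustment, guard_size; b.lt 2f */
    {
      base -= guard_size;              /* sub base, base, guard_size */
      probes[(*nprobes)++] = base + limit;  /* str xzr, [base, limit] */
      adjustment -= guard_size;
    }
  base -= adjustment;                  /* 2: sub base, base, adjustment */
  if (adjustment > limit)              /* cmp adjustment, limit; b.le 3f */
    probes[(*nprobes)++] = base + limit;
  return base;
}
```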

Re: [PATCH] Remove arc profile histogram in non-LTO mode.

2018-09-20 Thread Jan Hubicka
> On Thu, Sep 20, 2018 at 2:11 AM Martin Liška  wrote:
> >
> > Hello.
> >
> > I've been working for some time on a patch that simplifies how we set
> > the hotness threshold of basic blocks. Currently, we calculate so called
> > arc profile histograms that should identify edges that cover 99.9% of all
> > branching. These edges are then identified as hot. A disadvantage of the
> > approach is that it comes with significant run-time overhead, and the
> > related GCC code is also not trivial. Moreover, anytime a histogram is
> > merged after an instrumented run, the resulting histogram is misleading.
> >
> > That said, I decided to simplify it again, remove usage of the histogram,
> > and return to what we had before (--param hot-bb-count-fraction). That
> > basically says that we consider hot each edge that has an execution count
> > bigger than sum_max / 10000.
> >
> > Note that LTO+PGO remains untouched, as it still uses a histogram that is
> > dynamically calculated from the read arc counts.
> Hi,
> Does this affect AutoFDO stuff?  AutoFDO is broken and I am fixing it
> now, on the basis of current code.

This is independent of Auto-FDO. There we can probably define cutoffs for
hot-cold partitions in the tool translating global data into the per-file
data read by GCC. It is great that you will take a deeper look at autoFDO;
it indeed needs work!

The patch is OK, thanks for working on it!  Histograms were added by Google
as a bit of an experiment, but I do not think they turned out to be useful.
The data produced by them was not very related to what the IPA profile
generation produces and thus did not seem to match reality very well.
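The reinstated rule is simple enough to state directly in code (a sketch of the cutoff only, not GCC's actual predict.c logic; the default --param hot-bb-count-fraction is 10000 as described above):

```c
#include <assert.h>
#include <stdint.h>

/* An edge is considered hot when its execution count exceeds
   sum_max / fraction, i.e. more than 1/10000th of the maximal
   count with the default parameter value.  */
int edge_is_hot (uint64_t count, uint64_t sum_max, uint64_t fraction)
{
  return count > sum_max / fraction;
}
```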

Honza
> 
> Thanks,
> bin
> >
> > Note the statistics of the patch:
> >   19 files changed, 101 insertions(+), 1216 deletions(-)
> >
> > I'm attaching file sizes of SPEC2006 int benchmark.
> >
> > Patch survives testing on x86_64-linux-gnu machine.
> > Ready to be installed?
> >
> > Martin
> >
> > gcc/ChangeLog:
> >
> > 2018-09-19  Martin Liska  
> >
> > * auto-profile.c (autofdo_source_profile::read): Do not
> > set sum_all.
> > (read_profile): Do not add working sets.
> > (read_autofdo_file): Remove sum_all.
> > (afdo_callsite_hot_enough_for_early_inline): Remove const
> > qualifier.
> > * coverage.c (struct counts_entry): Remove gcov_summary.
> > (read_counts_file): Read new GCOV_TAG_OBJECT_SUMMARY,
> > do not support GCOV_TAG_PROGRAM_SUMMARY.
> > (get_coverage_counts): Remove summary and expected
> > arguments.
> > * coverage.h (get_coverage_counts): Likewise.
> > * doc/gcov-dump.texi: Remove -w option.
> > * gcov-dump.c (dump_working_sets): Remove.
> > (main): Do not support '-w' option.
> > (print_usage): Likewise.
> > (tag_summary): Likewise.
> > * gcov-io.c (gcov_write_summary): Do not dump
> > histogram.
> > (gcov_read_summary): Likewise.
> > (gcov_histo_index): Remove.
> > (gcov_histogram_merge): Likewise.
> > (compute_working_sets): Likewise.
> > * gcov-io.h (GCOV_TAG_OBJECT_SUMMARY): Mark
> > it not obsolete.
> > (GCOV_TAG_PROGRAM_SUMMARY): Mark it obsolete.
> > (GCOV_TAG_SUMMARY_LENGTH): Adjust.
> > (GCOV_HISTOGRAM_SIZE): Remove.
> > (GCOV_HISTOGRAM_BITVECTOR_SIZE): Likewise.
> > (struct gcov_summary): Simplify rapidly just
> > to runs and sum_max fields.
> > (gcov_histo_index): Remove.
> > (NUM_GCOV_WORKING_SETS): Likewise.
> > (compute_working_sets): Likewise.
> > * gcov-tool.c (print_overlap_usage_message): Remove
> > trailing empty line.
> > * gcov.c (read_count_file): Read GCOV_TAG_OBJECT_SUMMARY.
> > (output_lines): Remove program related line.
> > * ipa-profile.c (ipa_profile): Do not consider GCOV histogram.
> > * lto-cgraph.c (output_profile_summary): Do not stream GCOV
> > histogram.
> > (input_profile_summary): Do not read it.
> > (merge_profile_summaries): And do not merge it.
> > (input_symtab): Do not call removed function.
> > * modulo-sched.c (sms_schedule): Do not print sum_max.
> > * params.def (HOT_BB_COUNT_FRACTION): Reincarnate param that was
> > removed when histogram method was invented.
> > (HOT_BB_COUNT_WS_PERMILLE): Mention that it's used only in LTO
> > mode.
> > * postreload-gcse.c (eliminate_partially_redundant_load): Fix
> > GCOV coding style.
> > * predict.c (get_hot_bb_threshold): Use HOT_BB_COUNT_FRACTION
> > and dump selected value.
> > * profile.c (add_working_set): Remove.
> > (get_working_sets): Likewise.
> > (find_working_set): Likewise.
> > (get_exec_counts): Do not work with working sets.
> > (read_profile_edge_counts): Do not inform as sum_max is removed.
> > (

Re: [PATCH] Remove arc profile histogram in non-LTO mode.

2018-09-20 Thread Bin.Cheng
On Thu, Sep 20, 2018 at 5:26 PM Jan Hubicka  wrote:
>
> > On Thu, Sep 20, 2018 at 2:11 AM Martin Liška  wrote:
> > >
> > > Hello.
> > >
> > > I've been working for some time on a patch that simplifies how we set
> > > the hotness threshold of basic blocks. Currently, we calculate so called
> > > arc profile histograms that should identify edges that cover 99.9% of all
> > > branching. These edges are then identified as hot. A disadvantage of the
> > > approach is that it comes with significant run-time overhead, and the
> > > related GCC code is also not trivial. Moreover, anytime a histogram is
> > > merged after an instrumented run, the resulting histogram is misleading.
> > >
> > > That said, I decided to simplify it again, remove usage of the histogram,
> > > and return to what we had before (--param hot-bb-count-fraction). That
> > > basically says that we consider hot each edge that has an execution count
> > > bigger than sum_max / 10000.
> > >
> > > Note that LTO+PGO remains untouched, as it still uses a histogram that is
> > > dynamically calculated from the read arc counts.
> > Hi,
> > Does this affect AutoFDO stuff?  AutoFDO is broken and I am fixing it
> > now, on the basis of current code.
>
> This is independent of Auto-FDO. There we can probably define cutoffs for
> hot-cold partitions in the tool translating global data into the per-file
> data read by GCC. It is great that you will take a deeper look at autoFDO;
> it indeed needs work!
>
> The patch is OK, thanks for working on it!  Histograms were added by Google
> as a bit of an experiment, but I do not think they turned out to be useful. The data
I did some experiments showing it is somewhat useful for autoFDO.  To
what extent it is useful remains a question I need to investigate later.

Thanks,
bin
> produced by them was not very related to what the IPA profile generation
> produces and thus did not seem to match reality very well.
>
> Honza
> >
> > Thanks,
> > bin
> > >
> > > Note the statistics of the patch:
> > >   19 files changed, 101 insertions(+), 1216 deletions(-)
> > >
> > > I'm attaching file sizes of SPEC2006 int benchmark.
> > >
> > > Patch survives testing on x86_64-linux-gnu machine.
> > > Ready to be installed?
> > >
> > > Martin
> > >
> > > gcc/ChangeLog:
> > >
> > > 2018-09-19  Martin Liska  
> > >
> > > * auto-profile.c (autofdo_source_profile::read): Do not
> > > set sum_all.
> > > (read_profile): Do not add working sets.
> > > (read_autofdo_file): Remove sum_all.
> > > (afdo_callsite_hot_enough_for_early_inline): Remove const
> > > qualifier.
> > > * coverage.c (struct counts_entry): Remove gcov_summary.
> > > (read_counts_file): Read new GCOV_TAG_OBJECT_SUMMARY,
> > > do not support GCOV_TAG_PROGRAM_SUMMARY.
> > > (get_coverage_counts): Remove summary and expected
> > > arguments.
> > > * coverage.h (get_coverage_counts): Likewise.
> > > * doc/gcov-dump.texi: Remove -w option.
> > > * gcov-dump.c (dump_working_sets): Remove.
> > > (main): Do not support '-w' option.
> > > (print_usage): Likewise.
> > > (tag_summary): Likewise.
> > > * gcov-io.c (gcov_write_summary): Do not dump
> > > histogram.
> > > (gcov_read_summary): Likewise.
> > > (gcov_histo_index): Remove.
> > > (gcov_histogram_merge): Likewise.
> > > (compute_working_sets): Likewise.
> > > * gcov-io.h (GCOV_TAG_OBJECT_SUMMARY): Mark
> > > it not obsolete.
> > > (GCOV_TAG_PROGRAM_SUMMARY): Mark it obsolete.
> > > (GCOV_TAG_SUMMARY_LENGTH): Adjust.
> > > (GCOV_HISTOGRAM_SIZE): Remove.
> > > (GCOV_HISTOGRAM_BITVECTOR_SIZE): Likewise.
> > > (struct gcov_summary): Simplify rapidly just
> > > to runs and sum_max fields.
> > > (gcov_histo_index): Remove.
> > > (NUM_GCOV_WORKING_SETS): Likewise.
> > > (compute_working_sets): Likewise.
> > > * gcov-tool.c (print_overlap_usage_message): Remove
> > > trailing empty line.
> > > * gcov.c (read_count_file): Read GCOV_TAG_OBJECT_SUMMARY.
> > > (output_lines): Remove program related line.
> > > * ipa-profile.c (ipa_profile): Do not consider GCOV histogram.
> > > * lto-cgraph.c (output_profile_summary): Do not stream GCOV
> > > histogram.
> > > (input_profile_summary): Do not read it.
> > > (merge_profile_summaries): And do not merge it.
> > > (input_symtab): Do not call removed function.
> > > * modulo-sched.c (sms_schedule): Do not print sum_max.
> > > * params.def (HOT_BB_COUNT_FRACTION): Reincarnate param that was
> > > removed when histogram method was invented.
> > > (HOT_BB_COUNT_WS_PERMILLE): Mention that it's used only in LTO
> > > mode.
> > > * postreload-gcse.c (eliminate_partially_redundant_load): Fix

Re: [PATCH] Remove arc profile histogram in non-LTO mode.

2018-09-20 Thread Jan Hubicka
> On Thu, Sep 20, 2018 at 5:26 PM Jan Hubicka  wrote:
> >
> > > On Thu, Sep 20, 2018 at 2:11 AM Martin Liška  wrote:
> > > >
> > > > Hello.
> > > >
> > > > I've been working for some time on a patch that simplifies how we set
> > > > the hotness threshold of basic blocks. Currently, we calculate so-called
> > > > arc profile histograms that should identify edges that cover 99.9% of all
> > > > branching. These edges are then identified as hot. A disadvantage of the
> > > > approach is that it comes with significant run-time overhead, and the
> > > > related GCC code is also not trivial. Moreover, anytime a histogram is
> > > > merged after an instrumented run, the resulting histogram is misleading.
> > > >
> > > > That said, I decided to simplify it again, remove usage of the histogram
> > > > and return to what we had before (--param hot-bb-count-fraction). That
> > > > basically says that we consider hot each edge whose execution count is
> > > > bigger than sum_max / 10,000.
> > > >
> > > > Note that LTO+PGO remains untouched, as it still uses a histogram that is
> > > > dynamically calculated from the read arc counts.
> > > Hi,
> > > Does this affect AutoFDO stuff?  AutoFDO is broken and I am fixing it
> > > now, on the basis of current code.
> >
> > This is independent of Auto-FDO. There we can probably define cutoffs for
> > hot-cold partitions in the tool translating global data into the per-file
> > data read by GCC.
> > It is great you will take a deeper look at auto-FDO; it indeed needs work!
> >
> > The patch is OK, thanks for working on it!  Histograms were added by Google
> > as a bit of an experiment, but I do not think they turned out to be useful.  The data
> I did some experiments showing it is somewhat useful for auto-FDO.  To
> what extent it is useful remains a question I need to investigate
> later.

Indeed, auto-FDO has a better idea about whole-program behaviour. We could
revive the patch for streaming histograms and reading them into the compiler
if that turns out to be a good idea. I can see that auto-FDO profile data
tells you pretty clearly where the hot spots are, and it is not as easy to
recover this information from the profile-annotated CFG because of all the
transforms we do.
Let's fix and benchmark auto-FDO first and then we can decide what the best
option is.
Putting the stream-in code back should not be hard if it turns out to be useful.

The main problem with the current histograms under normal FDO is the fact that
you need to merge them between runs, which is a technically impossible job, so
they work for programs run once, but not for programs run many times in train
runs, like gcc itself.  It seems to me that for those really interested in
performance it is a good idea to switch to LTO, which makes it possible to
calculate histograms during the linking stage.

Honza
> 
> Thanks,
> bin
> > produced by them was not very related to what the IPA profile generation 
> > produces
> > and thus it did not seem to match reality very well.
> >
> > Honza
> > >
> > > Thanks,
> > > bin
> > > >
> > > > Note the statistics of the patch:
> > > >   19 files changed, 101 insertions(+), 1216 deletions(-)
> > > >
> > > > I'm attaching file sizes of SPEC2006 int benchmark.
> > > >
> > > > Patch survives testing on x86_64-linux-gnu machine.
> > > > Ready to be installed?
> > > >
> > > > Martin
> > > >
> > > > gcc/ChangeLog:
> > > >
> > > > 2018-09-19  Martin Liska  
> > > >
> > > > * auto-profile.c (autofdo_source_profile::read): Do not
> > > > set sum_all.
> > > > (read_profile): Do not add working sets.
> > > > (read_autofdo_file): Remove sum_all.
> > > > (afdo_callsite_hot_enough_for_early_inline): Remove const
> > > > qualifier.
> > > > * coverage.c (struct counts_entry): Remove gcov_summary.
> > > > (read_counts_file): Read new GCOV_TAG_OBJECT_SUMMARY,
> > > > do not support GCOV_TAG_PROGRAM_SUMMARY.
> > > > (get_coverage_counts): Remove summary and expected
> > > > arguments.
> > > > * coverage.h (get_coverage_counts): Likewise.
> > > > * doc/gcov-dump.texi: Remove -w option.
> > > > * gcov-dump.c (dump_working_sets): Remove.
> > > > (main): Do not support '-w' option.
> > > > (print_usage): Likewise.
> > > > (tag_summary): Likewise.
> > > > * gcov-io.c (gcov_write_summary): Do not dump
> > > > histogram.
> > > > (gcov_read_summary): Likewise.
> > > > (gcov_histo_index): Remove.
> > > > (gcov_histogram_merge): Likewise.
> > > > (compute_working_sets): Likewise.
> > > > * gcov-io.h (GCOV_TAG_OBJECT_SUMMARY): Mark
> > > > it not obsolete.
> > > > (GCOV_TAG_PROGRAM_SUMMARY): Mark it obsolete.
> > > > (GCOV_TAG_SUMMARY_LENGTH): Adjust.
> > > > (GCOV_HISTOGRAM_SIZE): Remove.
> > > > (GCOV_HISTOGRAM_BITVECTOR_SIZE): Likewise.
> > > > 

Re: [PATCH 09/25] Elide repeated RTL elements.

2018-09-20 Thread Andrew Stubbs

On 19/09/18 17:38, Andrew Stubbs wrote:
> Here's an updated patch incorporating the RTL front-end changes. I had
> to change from "repeated 2x" to "repeated x2" because the former is not
> a valid C token, and apparently that's important.


Here's a patch with self tests added, for both reading and writing.

It also fixes a bug when the repeat was the last item in a list.

OK?

Andrew
Elide repeated RTL elements.

GCN's 64-lane vectors tend to make RTL dumps very long.  This patch makes them
far more bearable by eliding long sequences of the same element into "repeated"
messages.

This also takes care of reading repeated sequences in the RTL front-end.

There are self tests for both reading and writing.

2018-09-20  Andrew Stubbs  
	Jan Hubicka  
	Martin Jambor  

	gcc/
	* print-rtl.c (print_rtx_operand_codes_E_and_V): Print how many times
	the same elements are repeated rather than printing all of them.
	* read-rtl.c (rtx_reader::read_rtx_operand): Recognize and expand
	"repeated" elements.
	* read-rtl-function.c (test_loading_repeat): New function.
	(read_rtl_function_c_tests): Call test_loading_repeat.
	* rtl-tests.c (test_dumping_repeat): New function.
	(rtl_tests_c_tests): Call test_dumping_repeat.

	gcc/testsuite/
	* selftests/repeat.rtl: New file.

diff --git a/gcc/print-rtl.c b/gcc/print-rtl.c
index 5dd2e31..1228483 100644
--- a/gcc/print-rtl.c
+++ b/gcc/print-rtl.c
@@ -370,7 +370,20 @@ rtx_writer::print_rtx_operand_codes_E_and_V (const_rtx in_rtx, int idx)
 	m_sawclose = 1;
 
   for (int j = 0; j < XVECLEN (in_rtx, idx); j++)
-	print_rtx (XVECEXP (in_rtx, idx, j));
+	{
+	  int j1;
+
+	  print_rtx (XVECEXP (in_rtx, idx, j));
+	  for (j1 = j + 1; j1 < XVECLEN (in_rtx, idx); j1++)
+	if (XVECEXP (in_rtx, idx, j) != XVECEXP (in_rtx, idx, j1))
+	  break;
+
+	  if (j1 != j + 1)
+	{
+	  fprintf (m_outfile, " repeated x%i", j1 - j);
+	  j = j1 - 1;
+	}
+	}
 
   m_indent -= 2;
 }
diff --git a/gcc/read-rtl-function.c b/gcc/read-rtl-function.c
index cde9d3e..8746f70 100644
--- a/gcc/read-rtl-function.c
+++ b/gcc/read-rtl-function.c
@@ -2166,6 +2166,20 @@ test_loading_mem ()
   ASSERT_EQ (6, MEM_ADDR_SPACE (mem2));
 }
 
+/* Verify that "repeated xN" is read correctly.  */
+
+static void
+test_loading_repeat ()
+{
+  rtl_dump_test t (SELFTEST_LOCATION, locate_file ("repeat.rtl"));
+
+  rtx_insn *insn_1 = get_insn_by_uid (1);
+  ASSERT_EQ (PARALLEL, GET_CODE (PATTERN (insn_1)));
+  ASSERT_EQ (64, XVECLEN (PATTERN (insn_1), 0));
+  for (int i = 0; i < 64; i++)
+ASSERT_EQ (const0_rtx, XVECEXP (PATTERN (insn_1), 0, i));
+}
+
 /* Run all of the selftests within this file.  */
 
 void
@@ -2187,6 +2201,7 @@ read_rtl_function_c_tests ()
   test_loading_cfg ();
   test_loading_bb_index ();
   test_loading_mem ();
+  test_loading_repeat ();
 }
 
 } // namespace selftest
diff --git a/gcc/read-rtl.c b/gcc/read-rtl.c
index 723c3e1..d698dd4 100644
--- a/gcc/read-rtl.c
+++ b/gcc/read-rtl.c
@@ -1690,6 +1690,7 @@ rtx_reader::read_rtx_operand (rtx return_rtx, int idx)
 	struct obstack vector_stack;
 	int list_counter = 0;
 	rtvec return_vec = NULL_RTVEC;
+	rtx saved_rtx = NULL_RTX;
 
 	require_char_ws ('[');
 
@@ -1700,8 +1701,34 @@ rtx_reader::read_rtx_operand (rtx return_rtx, int idx)
 	if (c == EOF)
 	  fatal_expected_char (']', c);
 	unread_char (c);
-	list_counter++;
-	obstack_ptr_grow (&vector_stack, read_nested_rtx ());
+
+	rtx value;
+	int repeat_count = 1;
+	if (c == 'r')
+	  {
+		/* Process "repeated xN" directive.  */
+		read_name (&name);
+		if (strcmp (name.string, "repeated"))
+		  fatal_with_file_and_line ("invalid directive \"%s\"\n",
+	name.string);
+		read_name (&name);
+		if (!sscanf (name.string, "x%d", &repeat_count))
+		  fatal_with_file_and_line ("invalid repeat count \"%s\"\n",
+	name.string);
+
+		/* We already saw one of the instances.  */
+		repeat_count--;
+		value = saved_rtx;
+	  }
+	else
+	  value = read_nested_rtx ();
+
+	for (; repeat_count > 0; repeat_count--)
+	  {
+		list_counter++;
+		obstack_ptr_grow (&vector_stack, value);
+	  }
+	saved_rtx = value;
 	  }
 	if (list_counter > 0)
 	  {
diff --git a/gcc/rtl-tests.c b/gcc/rtl-tests.c
index f67f2a3..c684f8e 100644
--- a/gcc/rtl-tests.c
+++ b/gcc/rtl-tests.c
@@ -284,6 +284,29 @@ const_poly_int_tests::run ()
 	 gen_int_mode (poly_int64 (5, -1), QImode));
 }
 
+/* Check dumping of repeated RTL vectors.  */
+
+static void
+test_dumping_repeat ()
+{
+  rtx p = gen_rtx_PARALLEL (VOIDmode, rtvec_alloc (3));
+  XVECEXP (p, 0, 0) = const0_rtx;
+  XVECEXP (p, 0, 1) = const0_rtx;
+  XVECEXP (p, 0, 2) = const0_rtx;
+  ASSERT_RTL_DUMP_EQ ("(parallel [\n"
+		  "(const_int 0) repeated x3\n"
+		  "])",
+		  p);
+
+  XVECEXP (p, 0, 1) = const1_rtx;
+  ASSERT_RTL_DUMP_EQ ("(parallel [\n"
+		  "(const_int 0)\n"
+		  "(const_int 1)\n"
+		  "(const_int 0)\n"
+		  "])",
+	

Add missing alignment checks in epilogue loop vectorisation (PR 86877)

2018-09-20 Thread Richard Sandiford
Epilogue loop vectorisation skips vect_enhance_data_refs_alignment
since it doesn't make sense to version or peel the epilogue loop
(that will already have happened for the main loop).  But this means
that it also fails to check whether the accesses are suitably aligned
for the new vector subarch.

We don't seem to carry alignment information from the (potentially
peeled or versioned) main loop to the epilogue loop, which would be
good to fix at some point.  I think we want this patch regardless,
since there's no guarantee that the alignment requirements are the
same for every subarch.

Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf
and x86_64-linux-gnu.  OK to install?

Richard


2018-09-20  Richard Sandiford  

gcc/
PR tree-optimization/86877
* tree-vect-loop.c (vect_analyze_loop_2): Call
vect_verify_datarefs_alignment.

gcc/testsuite/
PR tree-optimization/86877
* gfortran.dg/vect/vect-8-epilogue.F90: New test.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c2018-09-20 12:39:06.341625036 +0100
+++ gcc/tree-vect-loop.c2018-09-20 12:39:14.541555902 +0100
@@ -1979,20 +1979,21 @@ vect_analyze_loop_2 (loop_vec_info loop_
   if (!ok)
 return false;
 
-  /* Do not invoke vect_enhance_data_refs_alignment for eplilogue
- vectorization.  */
+  /* Do not invoke vect_enhance_data_refs_alignment for epilogue
+ vectorization, since we do not want to add extra peeling or
+ add versioning for alignment.  */
   if (!LOOP_VINFO_EPILOGUE_P (loop_vinfo))
-{
 /* This pass will decide on using loop versioning and/or loop peeling in
order to enhance the alignment of data references in the loop.  */
 ok = vect_enhance_data_refs_alignment (loop_vinfo);
-if (!ok)
-  {
-   if (dump_enabled_p ())
- dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
-  "bad data alignment.\n");
-return false;
-  }
+  else
+ok = vect_verify_datarefs_alignment (loop_vinfo);
+  if (!ok)
+{
+  if (dump_enabled_p ())
+   dump_printf_loc (MSG_MISSED_OPTIMIZATION, vect_location,
+"bad data alignment.\n");
+  return false;
 }
 
   if (slp)
Index: gcc/testsuite/gfortran.dg/vect/vect-8-epilogue.F90
===
--- /dev/null   2018-09-14 11:16:31.122530289 +0100
+++ gcc/testsuite/gfortran.dg/vect/vect-8-epilogue.F90  2018-09-20 
12:39:14.537555936 +0100
@@ -0,0 +1,6 @@
+! { dg-do compile }
+! { dg-require-effective-target vect_double }
+! { dg-additional-options "-finline-matmul-limit=0 --param 
vect-epilogues-nomask=1" }
+! { dg-additional-options "-mstrict-align" { target { aarch64*-*-* } } }
+
+#include "vect-8.f90"


Fix PEELING_FOR_NITERS calculation (PR 87288)

2018-09-20 Thread Richard Sandiford
PEELING_FOR_GAPS now means "peel one iteration for the epilogue",
in much the same way that PEELING_FOR_ALIGNMENT > 0 means
"peel that number of iterations for the prologue".  We weren't
taking this into account when deciding whether we needed to peel
further scalar iterations beyond the iterations for "gaps" and
"alignment".

Only the first test failed before the patch.  The other two
are just for completeness.

Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf
and x86_64-linux-gnu.  OK to install?

Richard


2018-09-20  Richard Sandiford  

gcc/
PR tree-optimization/87288
* tree-vect-loop.c (vect_analyze_loop_2): Take PEELING_FOR_GAPS
into account when determining PEELING_FOR_NITERS.

gcc/testsuite/
PR tree-optimization/87288
* gcc.dg/vect/pr87288-1.c: New test.
* gcc.dg/vect/pr87288-2.c: Likewise.
* gcc.dg/vect/pr87288-3.c: Likewise.

Index: gcc/tree-vect-loop.c
===
--- gcc/tree-vect-loop.c2018-09-20 12:39:14.541555902 +0100
+++ gcc/tree-vect-loop.c2018-09-20 12:39:19.013518199 +0100
@@ -2074,14 +2074,22 @@ vect_analyze_loop_2 (loop_vec_info loop_
 /* The main loop handles all iterations.  */
 LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo) = false;
   else if (LOOP_VINFO_NITERS_KNOWN_P (loop_vinfo)
-  && LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) > 0)
+  && LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo) >= 0)
 {
-  if (!multiple_p (LOOP_VINFO_INT_NITERS (loop_vinfo)
-  - LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo),
+  /* Work out the (constant) number of iterations that need to be
+peeled for reasons other than niters.  */
+  unsigned int peel_niter = LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo);
+  if (LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo))
+   peel_niter += 1;
+  if (!multiple_p (LOOP_VINFO_INT_NITERS (loop_vinfo) - peel_niter,
   LOOP_VINFO_VECT_FACTOR (loop_vinfo)))
LOOP_VINFO_PEELING_FOR_NITER (loop_vinfo) = true;
 }
   else if (LOOP_VINFO_PEELING_FOR_ALIGNMENT (loop_vinfo)
+  /* ??? When peeling for gaps but not alignment, we could
+ try to check whether the (variable) niters is known to be
+ VF * N + 1.  That's something of a niche case though.  */
+  || LOOP_VINFO_PEELING_FOR_GAPS (loop_vinfo)
   || !LOOP_VINFO_VECT_FACTOR (loop_vinfo).is_constant (&const_vf)
   || ((tree_ctz (LOOP_VINFO_NITERS (loop_vinfo))
< (unsigned) exact_log2 (const_vf))
Index: gcc/testsuite/gcc.dg/vect/pr87288-1.c
===
--- /dev/null   2018-09-14 11:16:31.122530289 +0100
+++ gcc/testsuite/gcc.dg/vect/pr87288-1.c   2018-09-20 12:39:19.009518233 
+0100
@@ -0,0 +1,49 @@
+#include "tree-vect.h"
+
+#define N (VECTOR_BITS / 32)
+#define MAX_COUNT 4
+
+void __attribute__ ((noipa))
+run (int *restrict a, int *restrict b, int count)
+{
+  for (int i = 0; i < count * N; ++i)
+{
+  a[i * 2] = b[i * 2] + count;
+  a[i * 2 + 1] = count;
+}
+}
+
+void __attribute__ ((noipa))
+check (int *restrict a, int count)
+{
+  for (int i = 0; i < count * N; ++i)
+if (a[i * 2] != i * 41 + count || a[i * 2 + 1] != count)
+  __builtin_abort ();
+  if (a[count * 2 * N] != 999)
+__builtin_abort ();
+}
+
+int a[N * MAX_COUNT * 2 + 1], b[N * MAX_COUNT * 2];
+
+int
+main (void)
+{
+  check_vect ();
+
+  for (int i = 0; i < N * MAX_COUNT; ++i)
+{
+  b[i * 2] = i * 41;
+  asm volatile ("" ::: "memory");
+}
+
+  for (int i = 0; i <= MAX_COUNT; ++i)
+{
+  a[i * 2 * N] = 999;
+  run (a, b, i);
+  check (a, i);
+}
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times {LOOP VECTORIZED} 1 "vect" { target { { 
vect_int && vect_perm } && vect_element_align } } } } */
Index: gcc/testsuite/gcc.dg/vect/pr87288-2.c
===
--- /dev/null   2018-09-14 11:16:31.122530289 +0100
+++ gcc/testsuite/gcc.dg/vect/pr87288-2.c   2018-09-20 12:39:19.009518233 
+0100
@@ -0,0 +1,64 @@
+#include "tree-vect.h"
+
+#define N (VECTOR_BITS / 32)
+#define MAX_COUNT 4
+
+#define RUN_COUNT(COUNT)   \
+  void __attribute__ ((noipa)) \
+  run_##COUNT (int *restrict a, int *restrict b)   \
+  {\
+for (int i = 0; i < N * COUNT; ++i)\
+  {\
+   a[i * 2] = b[i * 2] + COUNT;\
+   a[i * 2 + 1] = COUNT;   \
+  }\
+  }
+
+RUN_COUNT (1)
+RUN_COUNT (2)
+RUN_COUNT (3)
+RUN_COUNT (4)
+
+void __attribute__ ((noipa))
+check (int *restrict a, int count)
+{
+  for (int i = 0; i <

Re: [PATCH][RFC] Fix PR63155 (some more)

2018-09-20 Thread Richard Biener
On Wed, 19 Sep 2018, Richard Biener wrote:

> 
> The second testcase in the above PR runs into our O(N) bitmap element
> search limitation and spends 8s (60%) of the compile-time in the SSA 
> propagator
> engine (when optimizing).  The patch improves that to 0.9s (15%).  For the 
> first testcase it isn't that bad but still the patch improves CCP from 36% to 
> 14%.
> 
> The "solution" is to use sbitmap instead of a bitmap to avoid
> the linearity when doing add_ssa_edge.  We pay for that (but not
> actually with the testcases) with a linear-time bitmap_first_set_bit
> in process_ssa_edge_worklist.  I do not (yet...) have a testcase
> that overall gets slower with this approach.  I suppose using
> std::set would "solve" the complexity issue but we'd pay
> back with horribly inefficient memory use.  Similarly with
> our sparse bitmap implementation which lacks an ordered
> first_set_bit (it only can get any set bit fast, breaking optimal
> iteration order).
> 
> If we'd only had a O(log n) search sparse bitmap implementation ...
> (Steven posted patches to switch bitmap from/to such one but IIRC
> that at least lacked bitmap_first_set_bit).
> 
> Bootstrapped and tested on x86_64-unknown-linux-gnu.
> 
> OK for trunk?

So it turns out that while the bitmap data structure isn't optimal,
the issue with the testcase is that we end up with a full-universe set
for the SSA worklist, mostly because we are queueing PHIs via uses
on backedges that are not yet executable (so we'd re-simulate the PHI
anyway once the edge became executable - no need to set the individual bits).

So I'm testing the following instead.

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

Richard.

2018-09-20  Richard Biener  

PR tree-optimization/63155
* tree-ssa-propagate.c (add_ssa_edge): Avoid adding PHIs to
the worklist when the edge of the respective argument isn't
executable.

Index: gcc/tree-ssa-propagate.c
===
--- gcc/tree-ssa-propagate.c(revision 264438)
+++ gcc/tree-ssa-propagate.c(working copy)
@@ -168,10 +168,18 @@ add_ssa_edge (tree var)
   FOR_EACH_IMM_USE_FAST (use_p, iter, var)
 {
   gimple *use_stmt = USE_STMT (use_p);
+  basic_block use_bb = gimple_bb (use_stmt);
 
   /* If we did not yet simulate the block wait for this to happen
  and do not add the stmt to the SSA edge worklist.  */
-  if (! (gimple_bb (use_stmt)->flags & BB_VISITED))
+  if (! (use_bb->flags & BB_VISITED))
+   continue;
+
+  /* If this is a use on a not yet executable edge do not bother to
+queue it.  */
+  if (gimple_code (use_stmt) == GIMPLE_PHI
+ && !(EDGE_PRED (use_bb, PHI_ARG_INDEX_FROM_USE (use_p))->flags
+  & EDGE_EXECUTABLE))
continue;
 
   if (prop_simulate_again_p (use_stmt)


[PATCH] i386: Don't peephole test to and on CPUs that don't like it

2018-09-20 Thread Pip Cet
Some AMD CPUs fuse "test" followed by a conditional branch into a
single uop, but don't fuse "and" followed by a conditional branch.
This patch makes the test-to-and peephole rules depend on not tuning
for BDVER. This is a slight improvement in many cases, but it becomes
more significant when combined with the rest of the patch at PR87104.

I think this could be improved further by enabling the peephole rule
if the insn following the peephole is not a conditional branch, but I
don't know whether NONJUMP_INSN_P (peep2_next_insn (...)) works or is
the right approach.

Bootstrapped, but "make check" produces errors which appear unrelated
to this patch.

2018-09-18  Pip Cet  

PR 87104
* config/i386/i386.h (TARGET_FUSE_TEST_AND_BRANCH): Add.
* config/i386/i386.md (test to and peephole2s): Don't use for
TARGET_FUSE_TEST_AND_BRANCH.
* config/i386/x86-tune.def (TARGET_FUSE_TEST_AND_BRANCH): New.
Define for AMD family 15h.

---
 gcc/ChangeLog| 9 +
 gcc/config/i386/i386.h   | 2 ++
 gcc/config/i386/i386.md  | 9 ++---
 gcc/config/i386/x86-tune.def | 5 +
 4 files changed, 22 insertions(+), 3 deletions(-)
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 92b878f2300..5b6a57cce4a 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,12 @@
+2018-09-18  Pip Cet  
+
+	PR 87104
+	* config/i386/i386.h (TARGET_FUSE_TEST_AND_BRANCH): Add.
+	* config/i386/i386.md (test to and peephole2s): Don't use for
+	TARGET_FUSE_TEST_AND_BRANCH.
+	* config/i386/x86-tune.def (TARGET_FUSE_TEST_AND_BRANCH): New.
+	Define for AMD family 15h.
+
 2018-09-18  Segher Boessenkool  
 
 	* config/rs6000/rs6000.md: Remove old "Cygnus sibcall" comment.
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 01eba5dd01f..5d580d15d30 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -529,6 +529,8 @@ extern unsigned char ix86_tune_features[X86_TUNE_LAST];
 #define TARGET_FUSE_CMP_AND_BRANCH \
 	(TARGET_64BIT ? TARGET_FUSE_CMP_AND_BRANCH_64 \
 	 : TARGET_FUSE_CMP_AND_BRANCH_32)
+#define TARGET_FUSE_TEST_AND_BRANCH \
+ix86_tune_features[X86_TUNE_FUSE_TEST_AND_BRANCH]
 #define TARGET_FUSE_CMP_AND_BRANCH_SOFLAGS \
 	ix86_tune_features[X86_TUNE_FUSE_CMP_AND_BRANCH_SOFLAGS]
 #define TARGET_FUSE_ALU_AND_BRANCH \
diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index e08b2b7c14b..77d560d390e 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -18388,7 +18388,8 @@
 	  [(and:SI (match_operand:SI 2 "register_operand")
 		   (match_operand:SI 3 "immediate_operand"))
 	   (const_int 0)]))]
-  "ix86_match_ccmode (insn, CCNOmode)
+  "(optimize_insn_for_size_p () || ! TARGET_FUSE_TEST_AND_BRANCH)
+   && ix86_match_ccmode (insn, CCNOmode)
&& (REGNO (operands[2]) != AX_REG
|| satisfies_constraint_K (operands[3]))
&& peep2_reg_dead_p (1, operands[2])"
@@ -18408,7 +18409,8 @@
 	  [(and:QI (match_operand:QI 2 "register_operand")
 		   (match_operand:QI 3 "immediate_operand"))
 	   (const_int 0)]))]
-  "! TARGET_PARTIAL_REG_STALL
+  "! TARGET_FUSE_TEST_AND_BRANCH
+   && ! TARGET_PARTIAL_REG_STALL
&& ix86_match_ccmode (insn, CCNOmode)
&& REGNO (operands[2]) != AX_REG
&& peep2_reg_dead_p (1, operands[2])"
@@ -18429,7 +18431,8 @@
 (const_int 8)) 0)
 	 (match_operand 3 "const_int_operand"))
 	   (const_int 0)]))]
-  "! TARGET_PARTIAL_REG_STALL
+  "! TARGET_FUSE_TEST_AND_BRANCH
+   && ! TARGET_PARTIAL_REG_STALL
&& ix86_match_ccmode (insn, CCNOmode)
&& REGNO (operands[2]) != AX_REG
&& peep2_reg_dead_p (1, operands[2])"
diff --git a/gcc/config/i386/x86-tune.def b/gcc/config/i386/x86-tune.def
index a46450ad99d..ef0cc5a5a0f 100644
--- a/gcc/config/i386/x86-tune.def
+++ b/gcc/config/i386/x86-tune.def
@@ -113,6 +113,11 @@ DEF_TUNE (X86_TUNE_FUSE_CMP_AND_BRANCH_64, "fuse_cmp_and_branch_64",
 DEF_TUNE (X86_TUNE_FUSE_CMP_AND_BRANCH_SOFLAGS, "fuse_cmp_and_branch_soflags",
 	  m_NEHALEM | m_SANDYBRIDGE | m_CORE_AVX2 | m_BDVER | m_ZNVER1 | m_GENERIC)
 
+/* X86_TUNE_FUSE_TEST_AND_BRANCH: Fuse test with a subsequent
+   conditional jump instruction. */
+DEF_TUNE (X86_TUNE_FUSE_TEST_AND_BRANCH, "fuse_test_and_branch",
+  m_BDVER)
+
 /* X86_TUNE_FUSE_ALU_AND_BRANCH: Fuse alu with a subsequent conditional
jump instruction when the alu instruction produces the CCFLAG consumed by
the conditional jump instruction. */


Re: [PATCH 03/25] Improve TARGET_MANGLE_DECL_ASSEMBLER_NAME.

2018-09-20 Thread Richard Biener
On Wed, Sep 19, 2018 at 5:11 PM Julian Brown  wrote:
>
> On Fri, 14 Sep 2018 22:49:35 -0400
> Julian Brown  wrote:
>
> > > > On 12/09/18 16:16, Richard Biener wrote:
> > > > It may well be that there's a better way to solve the problem, or
> > > > at least to do the lookups.
> > > >
> > > > It may also be that there are some unintended consequences, such
> > > > as false name matches, but I don't know of any at present.
>
> > > Possibly, this was an abuse of these hooks, but it's arguably wrong
> > > that e.g. handle_alias_pairs has the "assembler name" leak
> > > through into the user's source code -- if it's expected that the
> > > hook could make arbitrary transformations to the string. (The
> > > latter hook is only used by PE code for x86 at present, by the look
> > > of it, and the default handles only special-purpose mangling
> > > indicated by placing a '*' at the front of the symbol.)
>
> Two places I've found that currently expose the underlying symbol name
> in the user's source code: one (documented!) is C++, where one must
> write the mangled symbol name as the alias target:
>
> int foo (int c) { ... }
> int bar (int) __attribute__((alias("_Z3fooi")));
>
> another (perhaps obscure) is x86/PE with "fastcall":
>
> __attribute__((fastcall)) void foo(void) { ... }
> void bar(void) __attribute__((alias("@foo@0")));
>
> both of which probably suggest that using the decl name, rather than
> demangling the assembler name (or using some completely different
> solution) was the wrong thing to do.
>
> I'll keep thinking about this...

Thanks. IIRC we already have some targets with quite complex renaming
where I wonder if uses like the above work correctly.

Btw, if you don't "fix" the handle_alias_pairs code but keep your mangling,
what breaks for you in practice (apart from maybe some testcases)?

Richard.

> Julian


Re: [PATCH 16/25] Fix IRA ICE.

2018-09-20 Thread Richard Sandiford
Andrew Stubbs  writes:
> On 17/09/18 10:22, Richard Sandiford wrote:
>>  writes:
>>> The IRA pass makes an assumption that any pseudos created after the
>>> pass begins
>>> were created explicitly by the pass itself and therefore will have
>>> corresponding entries in its other tables.
>>>
>>> The GCN back-end, however, often creates additional pseudos, in expand
>>> patterns, to represent the necessary EXEC value, and these break IRA's
>>> assumption and cause ICEs.
>>>
>>> This patch simply has IRA skip unknown pseudos, and the problem goes away.
>>>
>>> Presumably, it's not ideal that these registers have not been
>>> processed by IRA,
>>> but it does not appear to do any real harm.
>> 
>> Could you go into more detail about how this happens?  Other targets
>> also create pseudos in their move patterns.
>
> Here's a simplified snippet from the machine description:
>
> (define_expand "mov<mode>"
>   [(set (match_operand:VEC_REG_MODE 0 "nonimmediate_operand")
> 	(match_operand:VEC_REG_MODE 1 "general_operand"))]
>   ""
> {
>   [...]
>   if (can_create_pseudo_p ())
>     {
>       rtx exec = gcn_full_exec_reg ();
>       rtx undef = gcn_gen_undef (<MODE>mode);
>       [...]
>       emit_insn (gen_mov<mode>_vector (operands[0], operands[1], exec,
> 				       undef));
>       [...]
>       DONE;
>     }
> })
>
> gcn_full_exec_reg creates a new pseudo. It gets used as the mask 
> parameter of a vec_merge.
>
> These registers then trip the asserts in ira.c.
>
> In the case of setup_preferred_alternate_classes_for_new_pseudos it's 
> because they have numbers greater than "start" but have not been 
> initialized with different ORIGINAL_REGNO (why would they have been?)
>
> In the case of move_unallocated_pseudos it's because the table 
> pseudo_replaced_reg only has entries for the new pseudos directly 
> created by find_moveable_pseudos, not the ones created indirectly.

What I more meant was: where do the moves that introduce the new
pseudos get created?

Almost all targets' move patterns introduce new pseudos if
can_create_pseudo_p in certain circumstances, so GCN isn't doing
anything unusual in the outline above.  I think it comes down to
the specifics of which kinds of operands require these temporaries
and where the moves are being introduced.

AIUI IRA normally calls expand_reg_info () at a suitable point
to cope with new pseudos.  It sounds like we might be missing
a call somewhere.

Richard


Re: Add missing alignment checks in epilogue loop vectorisation (PR 86877)

2018-09-20 Thread Richard Biener
On Thu, Sep 20, 2018 at 1:42 PM Richard Sandiford
 wrote:
>
> Epilogue loop vectorisation skips vect_enhance_data_refs_alignment
> since it doesn't make sense to version or peel the epilogue loop
> (that will already have happened for the main loop).  But this means
> that it also fails to check whether the accesses are suitably aligned
> for the new vector subarch.
>
> We don't seem to carry alignment information from the (potentially
> peeled or versioned) main loop to the epilogue loop, which would be
> good to fix at some point.  I think we want this patch regardless,
> since there's no guarantee that the alignment requirements are the
> same for every subarch.
>
> Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf
> and x86_64-linux-gnu.  OK to install?

OK.

Richard.


Re: Fix PEELING_FOR_NITERS calculation (PR 87288)

2018-09-20 Thread Richard Biener
On Thu, Sep 20, 2018 at 1:44 PM Richard Sandiford
 wrote:
>
> PEELING_FOR_GAPS now means "peel one iteration for the epilogue",
> in much the same way that PEELING_FOR_ALIGNMENT > 0 means
> "peel that number of iterations for the prologue".  We weren't
> taking this into account when deciding whether we needed to peel
> further scalar iterations beyond the iterations for "gaps" and
> "alignment".
>
> Only the first test failed before the patch.  The other two
> are just for completeness.
>
> Tested on aarch64-linux-gnu (with and without SVE), aarch64_be-elf
> and x86_64-linux-gnu.  OK to install?

OK.

Richard.

> +
> +void __attribute__ ((noipa))
> +check (int *restrict a, int count)
> +{
> +  for (int i = 0; i < count * N; ++i)
> +if (a[i * 2] != i * 41 + count || a[i * 2 + 1] != count)
> +  __builtin_abort ();
> +  if (a[count * 2 * N] != 999)
> +__builtin_abort ();
> +}
> +
> +int a[N * MAX_COUNT * 2 + 1], b[N * MAX_COUNT * 2];
> +
> +int
> +main (void)
> +{
> +  check_vect ();
> +
> +  for (int i = 0; i < N * MAX_COUNT; ++i)
> +{
> +  b[i * 2] = i * 41;
> +  asm volatile ("" ::: "memory");
> +}
> +
> +  for (int i = 0; i <= MAX_COUNT; ++i)
> +{
> +  a[i * 2 * N] = 999;
> +  run (a, b, i);
> +  check (a, i);
> +}
> +
> +  return 0;
> +}
> +
> +/* { dg-final { scan-tree-dump-times {LOOP VECTORIZED} 1 "vect" { target { { 
> vect_int && vect_perm } && vect_element_align } } } } */
> Index: gcc/testsuite/gcc.dg/vect/pr87288-2.c
> ===
> --- /dev/null   2018-09-14 11:16:31.122530289 +0100
> +++ gcc/testsuite/gcc.dg/vect/pr87288-2.c   2018-09-20 12:39:19.009518233 
> +0100
> @@ -0,0 +1,64 @@
> +#include "tree-vect.h"
> +
> +#define N (VECTOR_BITS / 32)
> +#define MAX_COUNT 4
> +
> +#define RUN_COUNT(COUNT)   \
> +  void __attribute__ ((noipa)) \
> +  run_##COUNT (int *restrict a, int *restrict b)   \
> +  {\
> +for (int i = 0; i < N * COUNT; ++i)\
> +  {\
> +  

Re: [RFA] Minor cleanup to VRP/EVRP handling of deferred edge/switch optimization

2018-09-20 Thread Richard Biener
On Mon, Sep 17, 2018 at 4:50 PM Jeff Law  wrote:
>
> This is a relatively minor cleanup that I should have caught last cycle,
> but somehow missed.
>
> We have two structures TO_REMOVE_EDGES and TO_UPDATE_SWITCH_STMTS which
> are used by the VRP/EVRP code to record edges to remove and switch
> statements that need updating.
>
> They are currently implemented as globals within tree-vrp.c with an
> appropriate extern within tree-vrp.h.
>
> The code to walk those vectors was only implemented in VRP, but we can
> (and do) add to those vectors within EVRP.   So EVRP would detect
> certain edges as dead or switches that were needed simplification, but
> they were left as-is because EVRP never walked the vectors to do the
> necessary cleanup.
>
> This change pushes the vectors into the vr_values structure.  They're
> initialized in the ctor and we verify they're properly cleaned up in the
> dtor.  This obviously avoids the global object carrying state, but also
> catches cases where we record that an optimization was possible but
> failed to update the IL appropriately.
>
> As a side effect, we don't need to bother with initializing and wiping
> EDGE_IGNORE for jump threading in VRP.  We just mark the appropriate
> edges at the same time we put them in TO_REMOVE_EDGES within vr_values.
>
> vrp113.c is an example of where EVRP detected optimizations should be
> possible, but failed to update the IL.  Given the test really wanted to
> check VRP's behavior, I've disabled EVRP for that test.
>
> Bootstrapped and regression tested on x86_64-linux-gnu.
>
> OK for the trunk?

OK.  Maybe you can duplicate vrp113.c to also have an EVRP testing variant
of this functionality?

Thanks,
Richard.

> Jeff
>


Re: [PATCH 14/25] Disable inefficient vectorization of elementwise loads/stores.

2018-09-20 Thread Richard Biener
On Mon, Sep 17, 2018 at 2:40 PM Andrew Stubbs  wrote:
>
> On 17/09/18 12:43, Richard Sandiford wrote:
> > OK, sounds like the cost of vec_construct is too low then.  But looking
> > at the port, I see you have:
> >
> > /* Implement TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST.  */
> >
> > int
> > gcn_vectorization_cost (enum vect_cost_for_stmt ARG_UNUSED (type_of_cost),
> >   tree ARG_UNUSED (vectype), int ARG_UNUSED (misalign))
> > {
> >/* Always vectorize.  */
> >return 1;
> > }
> >
> > which short-circuits the cost-model altogether.  Isn't that part
> > of the problem?
>
> Well, it's possible that that's a little simplistic. ;-)
>
> Although, actually the elementwise issue predates the existence of
> gcn_vectorization_cost, and the default does appear to penalize
> vec_construct somewhat.
>
> Actually, the default definition doesn't seem to do much besides
> increase vec_construct, so I'm not sure now why I needed to change it?
> Hmm, more experiments to do.
>
> Thanks for the pointer.

Btw, we do not consider to use gather/scatter for VMAT_ELEMENTWISE,
that's a missed "optimization" quite possibly because gather/scatter is so
expensive on x86.  Thus the vectorizer should consider this and use the
cheaper alternative according to the cost model (which you of course should
fill with sensible values...).

Richard.

> Andrew


Re: [PATCH 02/25] Propagate address spaces to builtins.

2018-09-20 Thread Richard Biener
On Wed, Sep 5, 2018 at 1:50 PM  wrote:
>
>
> At present, pointers passed to builtin functions, including atomic operators,
> are stripped of their address space properties.  This doesn't seem to be
> deliberate, it just omits to copy them.
>
> Not only that, but it forces pointer sizes to Pmode, which isn't appropriate
> for all address spaces.
>
> This patch attempts to correct both issues.  It works for GCN atomics and
> GCN OpenACC gang-private variables.

OK.

Richard.

> 2018-09-05  Andrew Stubbs  
> Julian Brown  
>
> gcc/
> * builtins.c (get_builtin_sync_mem): Handle address spaces.
> ---
>  gcc/builtins.c | 13 ++---
>  1 file changed, 10 insertions(+), 3 deletions(-)
>


Re: Fold more boolean expressions

2018-09-20 Thread Richard Biener
On Sat, Sep 15, 2018 at 8:01 AM MCC CS  wrote:
>
> Sorry for doing the same mistake twice. Is this OK, and do
> I need to test it again after the first version of this
> patch?
>
> 2018-09-15 MCC CS 
>
> gcc/
> PR tree-optimization/87261
> * match.pd: Add boolean optimizations,
> fix whitespace.
>
> 2018-09-15 MCC CS 
>
> gcc/testsuite/
> PR tree-optimization/87261
> * gcc.dg/pr87261.c: New test.
>
> Index: gcc/match.pd
> ===
> --- gcc/match.pd(revision 264170)
> +++ gcc/match.pd(working copy)
> @@ -92,7 +92,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>IFN_FMA IFN_FMS IFN_FNMA IFN_FNMS)
>  (define_operator_list COND_TERNARY
>IFN_COND_FMA IFN_COND_FMS IFN_COND_FNMA IFN_COND_FNMS)
> -
> +
>  /* As opposed to convert?, this still creates a single pattern, so
> it is not a suitable replacement for convert? in all cases.  */
>  (match (nop_convert @0)
> @@ -106,7 +106,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>&& tree_nop_conversion_p (TREE_TYPE (type), TREE_TYPE (TREE_TYPE 
> (@0))
>  /* This one has to be last, or it shadows the others.  */
>  (match (nop_convert @0)
> - @0)
> + @0)
>
>  /* Transform likes of (char) ABS_EXPR <(int) x> into (char) ABSU_EXPR 
> ABSU_EXPR returns unsigned absolute value of the operand and the operand
> @@ -285,7 +285,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   And not for _Fract types where we can't build 1.  */
>(if (!integer_zerop (@0) && !ALL_FRACT_MODE_P (TYPE_MODE (type)))
> { build_one_cst (type); }))
> - /* X / abs (X) is X < 0 ? -1 : 1.  */
> + /* X / abs (X) is X < 0 ? -1 : 1.  */
>   (simplify
> (div:C @0 (abs @0))
> (if (INTEGRAL_TYPE_P (type)
> @@ -921,6 +921,31 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>(bitop:c @0 (bit_not (bitop:cs @0 @1)))
>(bitop @0 (bit_not @1
>
> +/* (~x & y) | ~(x | y) -> ~x */
> +(simplify
> + (bit_ior:c (bit_and:c (bit_not@2 @0) @1) (bit_not (bit_ior:c @0 @1)))
> + @2)
> +
> +/* (x | y) ^ (x | ~y) -> ~x */
> +(simplify
> + (bit_xor:c (bit_ior:c @0 @1) (bit_ior:c @0 (bit_not @1)))
> + (bit_not @0))
> +
> +/* (x & y) | ~(x | y) -> ~(x ^ y) */
> +(simplify
> + (bit_ior:c (bit_and @0 @1) (bit_not:s (bit_ior:s @0 @1)))

I think this misses :cs on the bit_and.

> + (bit_not (bit_xor @0 @1)))
> +
> +/* (~x | y) ^ (x ^ y) -> x | ~y */
> +(simplify
> + (bit_xor:c (bit_ior:cs (bit_not @0) @1) (bit_xor:c @0 @1))
> + (bit_ior @0 (bit_not @1)))

:s on the bit_xor

> +/* (x ^ y) | ~(x | y) -> ~(x & y) */
> +(simplify
> + (bit_ior:c (bit_xor @0 @1) (bit_not:s (bit_ior @0 @1)))
> + (bit_not (bit_and @0 @1)))

:cs on the bit_xor, :s on the second bit_ior

Otherwise looks OK to me.

Thanks,
Richard.

>  /* (x | y) & ~x -> y & ~x */
>  /* (x & y) | ~x -> y | ~x */
>  (for bitop (bit_and bit_ior)
> @@ -1131,7 +1156,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>(if (tree_nop_conversion_p (type, TREE_TYPE (@0))
> && tree_nop_conversion_p (type, TREE_TYPE (@1)))
> (mult (convert @0) (convert (negate @1)
> -
> +
>  /* -(A + B) -> (-B) - A.  */
>  (simplify
>   (negate (plus:c @0 negate_expr_p@1))
> @@ -3091,7 +3116,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  (if (tree_int_cst_sgn (@1) < 0)
>   (scmp @0 @2)
>   (cmp @0 @2))
> -
> +
>  /* Simplify comparison of something with itself.  For IEEE
> floating-point, we can only do some of these simplifications.  */
>  (for cmp (eq ge le)
> @@ -3162,11 +3187,11 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  }
>tree newtype
>  = (TYPE_PRECISION (TREE_TYPE (@0)) > TYPE_PRECISION (type1)
> -  ? TREE_TYPE (@0) : type1);
> +  ? TREE_TYPE (@0) : type1);
>  }
>  (if (TYPE_PRECISION (TREE_TYPE (@2)) > TYPE_PRECISION (newtype))
>   (cmp (convert:newtype @0) (convert:newtype @1))
> -
> +
>   (simplify
>(cmp @0 REAL_CST@1)
>/* IEEE doesn't distinguish +0 and -0 in comparisons.  */
> @@ -3414,7 +3439,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> (FTYPE) N == CST -> 0
> (FTYPE) N != CST -> 1.  */
> (if (cmp == EQ_EXPR || cmp == NE_EXPR)
> -{ constant_boolean_node (cmp == NE_EXPR, type); })
> +{ constant_boolean_node (cmp == NE_EXPR, type); })
> /* Otherwise replace with sensible integer constant.  */
> (with
>  {
> @@ -3656,7 +3681,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>   (simplify
>(cmp (bit_and@2 @0 integer_pow2p@1) @1)
>(icmp @2 { build_zero_cst (TREE_TYPE (@0)); })))
> -
> +
>  /* If we have (A & C) != 0 ? D : 0 where C and D are powers of 2,
> convert this into a shift followed by ANDing with D.  */
>  (simplify
> @@ -3876,7 +3901,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
>  (if (cmp == LE_EXPR)
>  (ge (convert:st @0) { build_zero_cst (st); })
>  (lt (convert:st @0) { build_zero_cst (st); }))
> -
> +
>  (for cmp (unordered ord

Re: [PATCH 16/25] Fix IRA ICE.

2018-09-20 Thread Andrew Stubbs

On 20/09/18 13:46, Richard Sandiford wrote:

Andrew Stubbs  writes:

In the case of move_unallocated_pseudos it's because the table
pseudo_replaced_reg only has entries for the new pseudos directly
created by find_moveable_pseudos, not the ones created indirectly.


What I more meant was: where do the moves that introduce the new
pseudos get created?


For find_moveable_pseudos, I believe it's where it calls gen_move_insn.


Almost all targets' move patterns introduce new pseudos if
can_create_pseudo_p in certain circumstances, so GCN isn't doing
anything unusual in the outline above.  I think it comes down to
the specifics of which kinds of operands require these temporaries
and where the moves are being introduced.


GCN creates new pseudos for all vector moves. Maybe that's just less
exotic than what other targets do?



AIUI IRA normally calls expand_reg_info () at a suitable point
to cope with new pseudos.  It sounds like we might be missing
a call somewhere.


Yes, it does, but one of the places I had to patch is *within* 
expand_reg_info: it's setup_preferred_alternate_classes_for_new_pseudos 
that asserts for pseudos created by gen_move_insn.


Andrew


Re: [PATCH] PR libstdc++/78179 run long double tests separately

2018-09-20 Thread Christophe Lyon
On Wed, 19 Sep 2018 at 23:13, Rainer Orth  wrote:
>
> Hi Christophe,
>
> > I have noticed failures on hypot-long-double.cc on arm, so I suggest we add:
> >
> > diff --git
> > a/libstdc++-v3/testsuite/26_numerics/headers/cmath/hypot-long-double.cc
> > b/libstdc++-v3/testsuite/26_numerics/headers/cmath/hypot-long-double.cc
> > index 8a05473..4c2e33b 100644
> > --- a/libstdc++-v3/testsuite/26_numerics/headers/cmath/hypot-long-double.cc
> > +++ b/libstdc++-v3/testsuite/26_numerics/headers/cmath/hypot-long-double.cc
> > @@ -17,7 +17,7 @@
> >
> >  // { dg-options "-std=gnu++17" }
> >  // { dg-do run { target c++17 } }
> > -// { dg-xfail-run-if "PR 78179" { powerpc-ibm-aix* hppa-*-linux* nios2-*-* 
> > } }
> > +// { dg-xfail-run-if "PR 78179" { powerpc-ibm-aix* hppa-*-linux*
> > nios2-*-* arm*-*-* } }
> >
> >  // Run the long double tests from hypot.cc separately, because they fail 
> > on a
> >  // number of targets. See PR libstdc++/78179 for details.
> >
> > OK?
>
> just a nit (and not a review): I'd prefer the target list to be sorted
> alphabetically, not completely random.
>

Sure, I can sort the whole list, if OK on principle.

Christophe

> Thanks.
> Rainer
>
> --
> -
> Rainer Orth, Center for Biotechnology, Bielefeld University

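Sorted alphabetically per Rainer's nit, the proposed directive would read:

```
// { dg-xfail-run-if "PR 78179" { arm*-*-* hppa-*-linux* nios2-*-* powerpc-ibm-aix* } }
```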

Re: [PATCH 14/25] Disable inefficient vectorization of elementwise loads/stores.

2018-09-20 Thread Richard Sandiford
Richard Biener  writes:
> On Mon, Sep 17, 2018 at 2:40 PM Andrew Stubbs  wrote:
>> On 17/09/18 12:43, Richard Sandiford wrote:
>> > OK, sounds like the cost of vec_construct is too low then.  But looking
>> > at the port, I see you have:
>> >
>> > /* Implement TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST.  */
>> >
>> > int
>> > gcn_vectorization_cost (enum vect_cost_for_stmt ARG_UNUSED (type_of_cost),
>> >   tree ARG_UNUSED (vectype), int ARG_UNUSED (misalign))
>> > {
>> >/* Always vectorize.  */
>> >return 1;
>> > }
>> >
>> > which short-circuits the cost-model altogether.  Isn't that part
>> > of the problem?
>>
>> Well, it's possible that that's a little simplistic. ;-)
>>
>> Although, actually the elementwise issue predates the existence of
>> gcn_vectorization_cost, and the default does appear to penalize
>> vec_construct somewhat.
>>
>> Actually, the default definition doesn't seem to do much besides
>> increase vec_construct, so I'm not sure now why I needed to change it?
>> Hmm, more experiments to do.
>>
>> Thanks for the pointer.
>
> Btw, we do not consider to use gather/scatter for VMAT_ELEMENTWISE,
> that's a missed "optimization" quite possibly because gather/scatter is so
> expensive on x86.  Thus the vectorizer should consider this and use the
> cheaper alternative according to the cost model (which you of course should
> fill with sensible values...).

Do you mean it this way round, or that it doesn't consider using
VMAT_ELEMENTWISE for natural gather/scatter accesses?  We do use
VMAT_GATHER_SCATTER instead of VMAT_ELEMENTWISE where possible for SVE,
but that relies on implementing the new optabs instead of using the old
built-in-based interface, so it doesn't work for x86 yet.

I guess we might need some way of selecting between the two if
the costs of gather and scatter are context-dependent in some way.
But if gather/scatter is always more expensive than VMAT_ELEMENTWISE
for certain modes then it's probably better not to define the optabs
for those modes.

Thanks,
Richard


PATCH to add -Wno-init-list-lifetime to C++ Language Options

2018-09-20 Thread Marek Polacek
Applying as obvious.

2018-09-20  Marek Polacek  

* doc/invoke.texi: Add -Wno-init-list-lifetime to C++ Language Options.

diff --git gcc/doc/invoke.texi gcc/doc/invoke.texi
index aab5fcec35a..cfa9c143784 100644
--- gcc/doc/invoke.texi
+++ gcc/doc/invoke.texi
@@ -229,7 +229,7 @@ in the following sections.
 -fext-numeric-literals @gol
 -Wabi=@var{n}  -Wabi-tag  -Wconversion-null  -Wctor-dtor-privacy @gol
 -Wdelete-non-virtual-dtor -Wdeprecated-copy  -Wliteral-suffix @gol
--Wmultiple-inheritance @gol
+-Wmultiple-inheritance -Wno-init-list-lifetime @gol
 -Wnamespaces  -Wnarrowing @gol
 -Wpessimizing-move  -Wredundant-move @gol
 -Wnoexcept  -Wnoexcept-type  -Wclass-memaccess @gol


Re: Fold more boolean expressions

2018-09-20 Thread Marc Glisse

On Thu, 20 Sep 2018, Richard Biener wrote:


On Sat, Sep 15, 2018 at 8:01 AM MCC CS  wrote:


Sorry for doing the same mistake twice. Is this OK, and do
I need to test it again after the first version of this
patch?

2018-09-15 MCC CS 

gcc/
PR tree-optimization/87261
* match.pd: Add boolean optimizations,
fix whitespace.

2018-09-15 MCC CS 

gcc/testsuite/
PR tree-optimization/87261
* gcc.dg/pr87261.c: New test.

Index: gcc/match.pd
===
--- gcc/match.pd(revision 264170)
+++ gcc/match.pd(working copy)
@@ -92,7 +92,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   IFN_FMA IFN_FMS IFN_FNMA IFN_FNMS)
 (define_operator_list COND_TERNARY
   IFN_COND_FMA IFN_COND_FMS IFN_COND_FNMA IFN_COND_FNMS)
-
+
 /* As opposed to convert?, this still creates a single pattern, so
it is not a suitable replacement for convert? in all cases.  */
 (match (nop_convert @0)
@@ -106,7 +106,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   && tree_nop_conversion_p (TREE_TYPE (type), TREE_TYPE (TREE_TYPE 
(@0))
 /* This one has to be last, or it shadows the others.  */
 (match (nop_convert @0)
- @0)
+ @0)

 /* Transform likes of (char) ABS_EXPR <(int) x> into (char) ABSU_EXPR 
ABSU_EXPR returns unsigned absolute value of the operand and the operand
@@ -285,7 +285,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
  And not for _Fract types where we can't build 1.  */
   (if (!integer_zerop (@0) && !ALL_FRACT_MODE_P (TYPE_MODE (type)))
{ build_one_cst (type); }))
- /* X / abs (X) is X < 0 ? -1 : 1.  */
+ /* X / abs (X) is X < 0 ? -1 : 1.  */
  (simplify
(div:C @0 (abs @0))
(if (INTEGRAL_TYPE_P (type)
@@ -921,6 +921,31 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
   (bitop:c @0 (bit_not (bitop:cs @0 @1)))
   (bitop @0 (bit_not @1

+/* (~x & y) | ~(x | y) -> ~x */
+(simplify
+ (bit_ior:c (bit_and:c (bit_not@2 @0) @1) (bit_not (bit_ior:c @0 @1)))
+ @2)
+
+/* (x | y) ^ (x | ~y) -> ~x */
+(simplify
+ (bit_xor:c (bit_ior:c @0 @1) (bit_ior:c @0 (bit_not @1)))
+ (bit_not @0))
+
+/* (x & y) | ~(x | y) -> ~(x ^ y) */
+(simplify
+ (bit_ior:c (bit_and @0 @1) (bit_not:s (bit_ior:s @0 @1)))


I think this misses :cs on the bit_and.


For :c, shouldn't canonicalization make the order of @0 and @1 consistent 
for bit_and and bit_ior?



+ (bit_not (bit_xor @0 @1)))
+
+/* (~x | y) ^ (x ^ y) -> x | ~y */
+(simplify
+ (bit_xor:c (bit_ior:cs (bit_not @0) @1) (bit_xor:c @0 @1))
+ (bit_ior @0 (bit_not @1)))


:s on the bit_xor


+/* (x ^ y) | ~(x | y) -> ~(x & y) */
+(simplify
+ (bit_ior:c (bit_xor @0 @1) (bit_not:s (bit_ior @0 @1)))
+ (bit_not (bit_and @0 @1)))


:cs on the bit_xor, :s on the second bit_ior

Otherwise looks OK to me.


--
Marc Glisse


[patch] prepend vxworks-dummy.h to tm_file for powerpc

2018-09-20 Thread Olivier Hainque
Hello,

vxworks-dummy.h is intended to be included in the list of
target header files for every CPU for which we have at least
one VxWorks port.

It essentially provides default values for common VxWorks
markers (typically, macros conveying whether we are configured
for one VxWorks variant or another), so they can be referenced
consistently in other files of the port.

This was missing for powerpc* and this patch just fixes that,
which will help further vxworks-related patches to come.

This should really be a no-op for non-VxWorks ports.

Checked on a gcc-8 based source tree that I can still
build functional compilers passing Ada ACATS for VxWorks
6.9 and 7.0.

Bootstrapped and reg tested on mainline for x86_64-linux.

With Kind Regards,

Olivier


2018-09-20  Olivier Hainque  

* config.gcc (powerpc*-*-*): Prepend vxworks-dummy.h to tm_file.



0001-config.gcc-add-vxworks-dummy.h-to-tmfiles-for-powerp.patch
Description: Binary data


PING [PATCH] Add new warning flag "warn_prio_ctor_dtor"

2018-09-20 Thread Vinay Kumar
Hi Joseph,

Please consider this mail as a reminder to review the patch posted at:
https://gcc.gnu.org/ml/gcc-patches/2018-09/msg00132.html

Please review the patch and let me know if any modifications are required.

Thanks,
Vinay

-Original Message-
From: Joseph Myers  
Sent: 03 September 2018 21:57
To: Vinay Kumar 
Cc: gcc-patches@gcc.gnu.org
Subject: RE: Add new warning flag "warn_prio_ctor_dtor"

On Mon, 3 Sep 2018, Vinay Kumar wrote:

> Thanks for reviewing the patch and your suggestions.
> Please find attached the modified patch as per your review comments.
> Please review the patch and let me know if its okay?

Thanks, this seems to address the issues I saw in earlier versions (there's a 
missing blank line after the entry in gcc/testsuite/ChangeLog, however), but I 
haven't fully reviewed the patch.

--
Joseph S. Myers
jos...@codesourcery.com


Re: [patch] Fix PR tree-optimization/86990

2018-09-20 Thread Richard Biener
On Mon, Sep 17, 2018 at 9:12 AM Eric Botcazou  wrote:
>
> Hi,
>
> this is a regression present on the mainline only: now that the GIMPLE store
> merging pass is able to mix constants and SSA_NAMEs on the RHS of stores to
> bit-fields, we need to check that the entire merged store group is made of
> constants only when encountering overlapping stores.
>
> Tested on x86_64-suse-linux, OK for the mainline?

OK.

Richard.

>
> 2018-09-17  Eric Botcazou  
>
> PR tree-optimization/86990
> * gimple-ssa-store-merging.c 
> (imm_store_chain_info::coalesce_immediate):
> Check that the entire merged store group is made of constants only for
> overlapping stores.
>
>
> 2018-09-17  Eric Botcazou  
>
> * gcc.c-torture/execute/20180917-1.c: New test.
>
> --
> Eric Botcazou


[patch] move default #define for TARGET_VXWORKS7 to vxworks-dummy.h

2018-09-20 Thread Olivier Hainque
Where it belongs, together with the original TARGET_VXWORKS
marker.

Checked on a gcc-8 based source tree that I can still
build functional compilers passing Ada ACATS for VxWorks
6.9 and 7.0.

Bootstrapped and reg tested on mainline for x86_64-linux.

With Kind Regards,

Olivier

2018-09-20  Olivier Hainque  

* config/vxworks.h (TARGET_VXWORKS7): Move default definition...
* config/vxworks-dummy.h: ...here.



0002-Move-default-define-TARGET_VXWORKS7-0-to-vxworks-dum.patch
Description: Binary data


Re: [PATCH 14/25] Disable inefficient vectorization of elementwise loads/stores.

2018-09-20 Thread Richard Biener
On Thu, Sep 20, 2018 at 3:40 PM Richard Sandiford
 wrote:
>
> Richard Biener  writes:
> > On Mon, Sep 17, 2018 at 2:40 PM Andrew Stubbs  wrote:
> >> On 17/09/18 12:43, Richard Sandiford wrote:
> >> > OK, sounds like the cost of vec_construct is too low then.  But looking
> >> > at the port, I see you have:
> >> >
> >> > /* Implement TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST.  */
> >> >
> >> > int
> >> > gcn_vectorization_cost (enum vect_cost_for_stmt ARG_UNUSED 
> >> > (type_of_cost),
> >> >   tree ARG_UNUSED (vectype), int ARG_UNUSED 
> >> > (misalign))
> >> > {
> >> >/* Always vectorize.  */
> >> >return 1;
> >> > }
> >> >
> >> > which short-circuits the cost-model altogether.  Isn't that part
> >> > of the problem?
> >>
> >> Well, it's possible that that's a little simplistic. ;-)
> >>
> >> Although, actually the elementwise issue predates the existence of
> >> gcn_vectorization_cost, and the default does appear to penalize
> >> vec_construct somewhat.
> >>
> >> Actually, the default definition doesn't seem to do much besides
> >> increase vec_construct, so I'm not sure now why I needed to change it?
> >> Hmm, more experiments to do.
> >>
> >> Thanks for the pointer.
> >
> > Btw, we do not consider to use gather/scatter for VMAT_ELEMENTWISE,
> > that's a missed "optimization" quite possibly because gather/scatter is so
> > expensive on x86.  Thus the vectorizer should consider this and use the
> > cheaper alternative according to the cost model (which you of course should
> > fill with sensible values...).
>
> Do you mean it this way round, or that it doesn't consider using
> VMAT_ELEMENTWISE for natural gather/scatter accesses?  We do use
> VMAT_GATHER_SCATTER instead of VMAT_ELEMENTWISE where possible for SVE,
> but that relies on implementing the new optabs instead of using the old
> built-in-based interface, so it doesn't work for x86 yet.
>
> I guess we might need some way of selecting between the two if
> the costs of gather and scatter are context-dependent in some way.
> But if gather/scatter is always more expensive than VMAT_ELEMENTWISE
> for certain modes then it's probably better not to define the optabs
> for those modes.

I think we can't vectorize true gathers (indexed from memory loads) w/o
gather yet, right?  So I really was thinking of implementing VMAT_ELEMENTWISE
(invariant stride) and VMAT_STRIDED_SLP by composing the appropriate
index vector with a splat and multiplication and using a gather.  I think that's
not yet implemented?

But yes, vectorizing gathers as detected by dataref analysis w/o native gather
support would also be interesting.  We can do that by doing elementwise
loads and either load the indexes also elementwise or decompose the vector
of indexes (dependent on how that vector is computed).

Richard.

>
> Thanks,
> Richard


Re: Fold more boolean expressions

2018-09-20 Thread Richard Biener
On Thu, Sep 20, 2018 at 4:00 PM Marc Glisse  wrote:
>
> On Thu, 20 Sep 2018, Richard Biener wrote:
>
> > On Sat, Sep 15, 2018 at 8:01 AM MCC CS  wrote:
> >>
> >> Sorry for doing the same mistake twice. Is this OK, and do
> >> I need to test it again after the first version of this
> >> patch?
> >>
> >> 2018-09-15 MCC CS 
> >>
> >> gcc/
> >> PR tree-optimization/87261
> >> * match.pd: Add boolean optimizations,
> >> fix whitespace.
> >>
> >> 2018-09-15 MCC CS 
> >>
> >> gcc/testsuite/
> >> PR tree-optimization/87261
> >> * gcc.dg/pr87261.c: New test.
> >>
> >> Index: gcc/match.pd
> >> ===
> >> --- gcc/match.pd(revision 264170)
> >> +++ gcc/match.pd(working copy)
> >> @@ -92,7 +92,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >>IFN_FMA IFN_FMS IFN_FNMA IFN_FNMS)
> >>  (define_operator_list COND_TERNARY
> >>IFN_COND_FMA IFN_COND_FMS IFN_COND_FNMA IFN_COND_FNMS)
> >> -
> >> +
> >>  /* As opposed to convert?, this still creates a single pattern, so
> >> it is not a suitable replacement for convert? in all cases.  */
> >>  (match (nop_convert @0)
> >> @@ -106,7 +106,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >>&& tree_nop_conversion_p (TREE_TYPE (type), TREE_TYPE (TREE_TYPE 
> >> (@0))
> >>  /* This one has to be last, or it shadows the others.  */
> >>  (match (nop_convert @0)
> >> - @0)
> >> + @0)
> >>
> >>  /* Transform likes of (char) ABS_EXPR <(int) x> into (char) ABSU_EXPR 
> >> ABSU_EXPR returns unsigned absolute value of the operand and the 
> >> operand
> >> @@ -285,7 +285,7 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >>   And not for _Fract types where we can't build 1.  */
> >>(if (!integer_zerop (@0) && !ALL_FRACT_MODE_P (TYPE_MODE (type)))
> >> { build_one_cst (type); }))
> >> - /* X / abs (X) is X < 0 ? -1 : 1.  */
> >> + /* X / abs (X) is X < 0 ? -1 : 1.  */
> >>   (simplify
> >> (div:C @0 (abs @0))
> >> (if (INTEGRAL_TYPE_P (type)
> >> @@ -921,6 +921,31 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
> >>(bitop:c @0 (bit_not (bitop:cs @0 @1)))
> >>(bitop @0 (bit_not @1
> >>
> >> +/* (~x & y) | ~(x | y) -> ~x */
> >> +(simplify
> >> + (bit_ior:c (bit_and:c (bit_not@2 @0) @1) (bit_not (bit_ior:c @0 @1)))
> >> + @2)
> >> +
> >> +/* (x | y) ^ (x | ~y) -> ~x */
> >> +(simplify
> >> + (bit_xor:c (bit_ior:c @0 @1) (bit_ior:c @0 (bit_not @1)))
> >> + (bit_not @0))
> >> +
> >> +/* (x & y) | ~(x | y) -> ~(x ^ y) */
> >> +(simplify
> >> + (bit_ior:c (bit_and @0 @1) (bit_not:s (bit_ior:s @0 @1)))
> >
> > I think this misses :cs on the bit_and.
>
> For :c, shouldn't canonicalization make the order of @0 and @1 consistent
> for bit_and and bit_ior?

Hmm, probably yes.  This all makes me think that the :c should better be
placed automagically by genmatch...

> >> + (bit_not (bit_xor @0 @1)))
> >> +
> >> +/* (~x | y) ^ (x ^ y) -> x | ~y */
> >> +(simplify
> >> + (bit_xor:c (bit_ior:cs (bit_not @0) @1) (bit_xor:c @0 @1))
> >> + (bit_ior @0 (bit_not @1)))
> >
> > :s on the bit_xor
> >
> >> +/* (x ^ y) | ~(x | y) -> ~(x & y) */
> >> +(simplify
> >> + (bit_ior:c (bit_xor @0 @1) (bit_not:s (bit_ior @0 @1)))
> >> + (bit_not (bit_and @0 @1)))
> >
> > :cs on the bit_xor, :s on the second bit_ior
> >
> > Otherwise looks OK to me.
>
> --
> Marc Glisse


[patch] introduce a TARGET_VXWORKS64 marker

2018-09-20 Thread Olivier Hainque
Hello,

This is a preliminary patch before the addition of support
for 64-bit VxWorks on some CPUs, which sometimes incurs subtle
ABI variations.

Checked on a gcc-8 based source tree that I can still
build functional compilers passing Ada ACATS for VxWorks
6.9 and 7.0.

Bootstrapped and reg tested on mainline for x86_64-linux.

With Kind Regards,

Olivier


2018-09-20  Olivier Hainque  

* config.gcc: Enforce def of TARGET_VXWORKS64 to 1 from
triplet, similar to support for VxWorks7.
* config/vxworks-dummy.h: Provide a default definition
of TARGET_VXWORKS64 to 0.



0003-Introduce-TARGET_VXWORKS64-for-VxWorks-64bit-ports.patch
Description: Binary data



Re: [PATCH 14/25] Disable inefficient vectorization of elementwise loads/stores.

2018-09-20 Thread Richard Sandiford
Richard Biener  writes:
> On Thu, Sep 20, 2018 at 3:40 PM Richard Sandiford
>  wrote:
>>
>> Richard Biener  writes:
>> > On Mon, Sep 17, 2018 at 2:40 PM Andrew Stubbs  
>> > wrote:
>> >> On 17/09/18 12:43, Richard Sandiford wrote:
>> >> > OK, sounds like the cost of vec_construct is too low then.  But looking
>> >> > at the port, I see you have:
>> >> >
>> >> > /* Implement TARGET_VECTORIZE_BUILTIN_VECTORIZATION_COST.  */
>> >> >
>> >> > int
>> >> > gcn_vectorization_cost (enum vect_cost_for_stmt ARG_UNUSED 
>> >> > (type_of_cost),
>> >> >   tree ARG_UNUSED (vectype), int ARG_UNUSED 
>> >> > (misalign))
>> >> > {
>> >> >/* Always vectorize.  */
>> >> >return 1;
>> >> > }
>> >> >
>> >> > which short-circuits the cost-model altogether.  Isn't that part
>> >> > of the problem?
>> >>
>> >> Well, it's possible that that's a little simplistic. ;-)
>> >>
>> >> Although, actually the elementwise issue predates the existence of
>> >> gcn_vectorization_cost, and the default does appear to penalize
>> >> vec_construct somewhat.
>> >>
>> >> Actually, the default definition doesn't seem to do much besides
>> >> increase vec_construct, so I'm not sure now why I needed to change it?
>> >> Hmm, more experiments to do.
>> >>
>> >> Thanks for the pointer.
>> >
>> > Btw, we do not consider to use gather/scatter for VMAT_ELEMENTWISE,
>> > that's a missed "optimization" quite possibly because gather/scatter is so
>> > expensive on x86.  Thus the vectorizer should consider this and use the
>> > cheaper alternative according to the cost model (which you of course should
>> > fill with sensible values...).
>>
>> Do you mean it this way round, or that it doesn't consider using
>> VMAT_ELEMENTWISE for natural gather/scatter accesses?  We do use
>> VMAT_GATHER_SCATTER instead of VMAT_ELEMENTWISE where possible for SVE,
>> but that relies on implementing the new optabs instead of using the old
>> built-in-based interface, so it doesn't work for x86 yet.
>>
>> I guess we might need some way of selecting between the two if
>> the costs of gather and scatter are context-dependent in some way.
>> But if gather/scatter is always more expensive than VMAT_ELEMENTWISE
>> for certain modes then it's probably better not to define the optabs
>> for those modes.
>
> I think we can't vectorize true gathers (indexed from memory loads) w/o
> gather yet, right?

Right.

> So I really was thinking of implementing VMAT_ELEMENTWISE (invariant
> stride) and VMAT_STRIDED_SLP by composing the appropriate index vector
> with a splat and multiplication and using a gather.  I think that's
> not yet implemented?

For SVE we use:

  /* As a last resort, trying using a gather load or scatter store.

 ??? Although the code can handle all group sizes correctly,
 it probably isn't a win to use separate strided accesses based
 on nearby locations.  Or, even if it's a win over scalar code,
 it might not be a win over vectorizing at a lower VF, if that
 allows us to use contiguous accesses.  */
  if (*memory_access_type == VMAT_ELEMENTWISE
  && single_element_p
  && loop_vinfo
  && vect_use_strided_gather_scatters_p (stmt_info, loop_vinfo,
 masked_p, gs_info))
*memory_access_type = VMAT_GATHER_SCATTER;

in get_group_load_store_type.  This only works when the target defines
gather/scatter using optabs rather than built-ins.

But yeah, no VMAT_STRIDED_SLP support yet.  That would be good
to have...

Richard


Re: [PATCH] PR libstdc++/78179 run long double tests separately

2018-09-20 Thread Jonathan Wakely

On 20/09/18 15:36 +0200, Christophe Lyon wrote:

On Wed, 19 Sep 2018 at 23:13, Rainer Orth  wrote:


Hi Christophe,

> I have noticed failures on hypot-long-double.cc on arm, so I suggest we add:
>
> diff --git
> a/libstdc++-v3/testsuite/26_numerics/headers/cmath/hypot-long-double.cc
> b/libstdc++-v3/testsuite/26_numerics/headers/cmath/hypot-long-double.cc
> index 8a05473..4c2e33b 100644
> --- a/libstdc++-v3/testsuite/26_numerics/headers/cmath/hypot-long-double.cc
> +++ b/libstdc++-v3/testsuite/26_numerics/headers/cmath/hypot-long-double.cc
> @@ -17,7 +17,7 @@
>
>  // { dg-options "-std=gnu++17" }
>  // { dg-do run { target c++17 } }
> -// { dg-xfail-run-if "PR 78179" { powerpc-ibm-aix* hppa-*-linux* nios2-*-* } }
> +// { dg-xfail-run-if "PR 78179" { powerpc-ibm-aix* hppa-*-linux*
> nios2-*-* arm*-*-* } }
>
>  // Run the long double tests from hypot.cc separately, because they fail on a
>  // number of targets. See PR libstdc++/78179 for details.
>
> OK?

just a nit (and not a review): I'd prefer the target list to be sorted
alphabetically, not completely random.



Sure, I can sort the whole list, if OK on principle.


Yes, please go ahead and commit it with the sorted list.



Re: [PATCH v4] [aarch64] Add HiSilicon tsv110 CPU support

2018-09-20 Thread James Greenhalgh
On Wed, Sep 19, 2018 at 04:53:52AM -0500, Shaokun Zhang wrote:
> This patch adds a new HiSilicon mcpu, tsv110, which supports v8_4A.
> It has been tested on aarch64 and no regressions from this patch.

This patch is OK for Trunk.

Do you need someone to commit it on your behalf?

Thanks,
James

> 
> ---
>  gcc/ChangeLog|   9 +++
>  gcc/config/aarch64/aarch64-cores.def |   3 +
>  gcc/config/aarch64/aarch64-cost-tables.h | 104 
> +++
>  gcc/config/aarch64/aarch64-tune.md   |   2 +-
>  gcc/config/aarch64/aarch64.c |  82 
>  gcc/doc/invoke.texi  |   2 +-
>  6 files changed, 200 insertions(+), 2 deletions(-)
> 
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 69e2e14..a040daa 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,12 @@
> +2018-09-19  Shaokun Zhang  
> +Bo Zhou  
> +
> + * config/aarch64/aarch64-cores.def (tsv110): New CPU.
> + * config/aarch64/aarch64-tune.md: Regenerated.
> + * doc/invoke.texi (AArch64 Options/-mtune): Add "tsv110".
> + * config/aarch64/aarch64.c (tsv110_tunings): New tuning table.
> + * config/aarch64/aarch64-cost-tables.h: Add "tsv110" extra costs.
> +
>  2018-09-18  Marek Polacek  
>  
>   P1064R0 - Allowing Virtual Function Calls in Constant Expressions
 


[patch] account for 64bit case in type size defaults for vxworks

2018-09-20 Thread Olivier Hainque

This is a second preliminary patch for 64bit vxworks
on some CPUs.

Checked on a gcc-8 based source tree that I can still
build functional compilers passing Ada ACATS for VxWorks
6.9 and 7.0.

Bootstrapped and reg tested on mainline for x86_64-linux.

With Kind Regards,

Olivier

2018-09-20  Olivier Hainque  

* config/vxworks.h (SIZE_TYPE): Account for TARGET_VXWORKS64.
(PTRDIFF_TYPE): Likewise.
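
The defaulting described in the ChangeLog can be pictured as a
conditional in config/vxworks.h along these lines (a sketch only; the
exact strings in the committed patch may differ):

```c
/* Sketch -- the actual vxworks.h may word this differently.  */
#undef SIZE_TYPE
#define SIZE_TYPE (TARGET_VXWORKS64 ? "long unsigned int" : "unsigned int")

#undef PTRDIFF_TYPE
#define PTRDIFF_TYPE (TARGET_VXWORKS64 ? "long int" : "int")
```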



0004-Refine-default-SIZE_TYPE-and-PTRDIFF_TYPE-for-vxwork.patch
Description: Binary data


Re: [openacc] Teach gfortran to lower OpenACC routine dims

2018-09-20 Thread Cesar Philippidis
On 09/19/2018 03:27 PM, Bernhard Reutner-Fischer wrote:
> On Wed, 5 Sep 2018 12:52:03 -0700
> Cesar Philippidis  wrote:
> 
>> At present, gfortran does not encode the gang, worker or vector
>> parallelism clauses when it creates acc routines dim attribute for
>> subroutines and functions. While support for acc routine is lacking in
>> other areas in gfortran (including modules), this patch is important
>> because it encodes the parallelism attributes using the same function
>> as the C and C++ FEs. This will become important with the forthcoming
>> nvptx vector length extensions, because large vectors are not
>> supported in acc routines yet.
>>
>> Is this OK for trunk? I regtested and bootstrapped for x86_64 with
>> nvptx offloading.
> 
>> diff --git a/gcc/fortran/openmp.c b/gcc/fortran/openmp.c
>> index 94a7f7eaa50..d48c9351e25 100644
>> --- a/gcc/fortran/openmp.c
>> +++ b/gcc/fortran/openmp.c
>> @@ -2234,34 +2234,45 @@ gfc_match_oacc_cache (void)
>>return MATCH_YES;
>>  }
>>  
>> -/* Determine the loop level for a routine.   */
>> +/* Determine the loop level for a routine.  Returns
>> OACC_FUNCTION_NONE
>> +   if any error is detected.  */
>>  
>> -static int
>> +static oacc_function
>>  gfc_oacc_routine_dims (gfc_omp_clauses *clauses)
>>  {
>>int level = -1;
>> +  oacc_function ret = OACC_FUNCTION_AUTO;
>>  
>>if (clauses)
>>  {
>>unsigned mask = 0;
>>  
>>if (clauses->gang)
>> -level = GOMP_DIM_GANG, mask |= GOMP_DIM_MASK (level);
>> +{
>> +  level = GOMP_DIM_GANG, mask |= GOMP_DIM_MASK (level);
>> +  ret = OACC_FUNCTION_GANG;
>> +}
>>if (clauses->worker)
>> -level = GOMP_DIM_WORKER, mask |= GOMP_DIM_MASK (level);
>> +{
>> +  level = GOMP_DIM_WORKER, mask |= GOMP_DIM_MASK (level);
>> +  ret = OACC_FUNCTION_WORKER;
>> +}
>>if (clauses->vector)
>> -level = GOMP_DIM_VECTOR, mask |= GOMP_DIM_MASK (level);
>> +{
>> +  level = GOMP_DIM_VECTOR, mask |= GOMP_DIM_MASK (level);
>> +  ret = OACC_FUNCTION_VECTOR;
>> +}
>>if (clauses->seq)
>> -level = GOMP_DIM_MAX, mask |= GOMP_DIM_MASK (level);
>> +{
>> +  level = GOMP_DIM_MAX, mask |= GOMP_DIM_MASK (level);
>> +  ret = OACC_FUNCTION_SEQ;
>> +}
>>  
>>if (mask != (mask & -mask))
>> -gfc_error ("Multiple loop axes specified for routine");
>> +ret = OACC_FUNCTION_NONE;
>>  }
>>  
>> -  if (level < 0)
>> -level = GOMP_DIM_MAX;
>> -
>> -  return level;
>> +  return ret;
>>  }
>>  
>>  match
>> @@ -2272,6 +2283,8 @@ gfc_match_oacc_routine (void)
>>match m;
>>gfc_omp_clauses *c = NULL;
>>gfc_oacc_routine_name *n = NULL;
>> +  oacc_function dims = OACC_FUNCTION_NONE;
> 
> Unneeded initialisation of dims.

ACK.

>> +  bool seen_error = false;
>>  
>>old_loc = gfc_current_locus;
>>  
>> @@ -2318,17 +2331,15 @@ gfc_match_oacc_routine (void)
>>  }
>>else
>>  {
>> -  gfc_error ("Syntax error in !$ACC ROUTINE ( NAME ) at %C");
>> -  gfc_current_locus = old_loc;
>> -  return MATCH_ERROR;
>> +  gfc_error ("Syntax error in !$ACC ROUTINE ( NAME ) at %L",
>> &old_loc);
>> +  goto cleanup;
>>  }
>>  
>>if (gfc_match_char (')') != MATCH_YES)
>>  {
>> -  gfc_error ("Syntax error in !$ACC ROUTINE ( NAME ) at %C,
>> expecting"
>> - " ')' after NAME");
>> -  gfc_current_locus = old_loc;
>> -  return MATCH_ERROR;
>> +  gfc_error ("Syntax error in !$ACC ROUTINE ( NAME ) at %L,
>> expecting"
>> + " ')' after NAME", &old_loc);
>> +  goto cleanup;
>>  }
>>  }
>>  
>> @@ -2337,26 +2348,83 @@ gfc_match_oacc_routine (void)
>>!= MATCH_YES))
>>  return MATCH_ERROR;
>>  
>> +  /* Scan for invalid routine geometry.  */
>> +  dims = gfc_oacc_routine_dims (c);
>> +  if (dims == OACC_FUNCTION_NONE)
>> +{
>> +  gfc_error ("Multiple loop axes specified in !$ACC ROUTINE at
>> %L",
>> + &old_loc);
>> +
>> +  /* Don't abort early, because it's important to let the user
>> + know of any potential duplicate routine directives.  */
>> +  seen_error = true;
>> +}
>> +  else if (dims == OACC_FUNCTION_AUTO)
>> +{
>> +  gfc_warning (0, "Expected one of %<gang%>, %<worker%>, %<vector%> or "
>> +   "%<seq%> clauses in !$ACC ROUTINE at %L", &old_loc);
>> +  dims = OACC_FUNCTION_SEQ;
>> +}
>> +
>>if (sym != NULL)
>>  {
>> -  n = gfc_get_oacc_routine_name ();
>> -  n->sym = sym;
>> -  n->clauses = NULL;
>> -  n->next = NULL;
>> -  if (gfc_current_ns->oacc_routine_names != NULL)
>> -n->next = gfc_current_ns->oacc_routine_names;
>> -
>> -  gfc_current_ns->oacc_routine_names = n;
>> +  bool needs_entry = true;
>> +
>> +  /* Scan for any repeated routine directives on 'sym' and report
>> + an error if necessary.  TODO: Extend this function to scan
>> + for compatible DEVICE_TYPE dims.  */
>> +  for (n = gfc_current_ns->oacc_routi

[patch] leverage STARTFILE_PREFIX_SPEC for vxworks7

2018-09-20 Thread Olivier Hainque
Hello,

To help locate crt0.o with -l:crt0.o on VxWorks7, we currently
stick an ad-hoc -L in LIB_SPEC.

This gets removed by -nodefaultlibs, which then affects more
than just default libs.

This patch fixes this by replacing the ad-hoc -L by a
STARTFILE_PREFIX_SPEC, which makes sense anyway for something
intended to help find crt0.o.

Checked on a gcc-8 based source tree that I can still
build functional compilers passing Ada ACATS for VxWorks
6.9 and 7.0.

Bootstrapped and reg tested on mainline for x86_64-linux.

With Kind Regards,

Olivier

2018-09-20  Olivier Hainque  

* config/vxworks.h (STARTFILE_PREFIX_SPEC): Define.
(VXWORKS_LIBS_DIR_RTP): Remove definition and use.



0005-Introduce-a-STARTFILE_PREFIX_SPEC-for-VxWorks7.patch
Description: Binary data







Re: [RFA] Minor cleanup to VRP/EVRP handling of deferred edge/switch optimization

2018-09-20 Thread Jeff Law
On 9/20/18 6:51 AM, Richard Biener wrote:
> On Mon, Sep 17, 2018 at 4:50 PM Jeff Law  wrote:
>>
>> This is a relatively minor cleanup that I should have caught last cycle,
>> but somehow missed.
>>
>> We have two structures TO_REMOVE_EDGES and TO_UPDATE_SWITCH_STMTS which
>> are used by the VRP/EVRP code to record edges to remove and switch
>> statements that need updating.
>>
>> They are currently implemented as globals within tree-vrp.c with an
>> appropriate extern within tree-vrp.h.
>>
>> The code to walk those vectors was only implemented in VRP, but we can
>> (and do) add to those vectors within EVRP.   So EVRP would detect
>> certain edges as dead or switches that needed simplification, but
>> they were left as-is because EVRP never walked the vectors to do the
>> necessary cleanup.
>>
>> This change pushes the vectors into the vr_values structure.  They're
>> initialized in the ctor and we verify they're properly cleaned up in the
>> dtor.  This obviously avoids the global object carrying state, but also
>> catches cases where we record that an optimization was possible but
>> failed to update the IL appropriately.
>>
>> As a side effect, we don't need to bother with initializing and wiping
>> EDGE_IGNORE for jump threading in VRP.  We just mark the appropriate
>> edges at the same time we put them in TO_REMOVE_EDGES within vr_values.
>>
>> vrp113.c is an example of where EVRP detected optimizations should be
>> possible, but failed to update the IL.  Given the test really wanted to
>> check VRP's behavior, I've disabled EVRP for that test.
>>
>> Bootstrapped and regression tested on x86_64-linux-gnu.
>>
>> OK for the trunk?
> 
> OK.  Maybe you can duplicate vrp113.c to also have a EVRP testing variant
> of this functionality?
Sure.  I'd pondered doing something like that anyway.  We can verify
that EVRP collapses the test and that we do not trigger the "did you
fail to clean up properly" assert.

jeff


[patch] cleanup handling of libgcc and libc_internal for VxWorks

2018-09-20 Thread Olivier Hainque
Hello,

For static RTPs, libc_internal is included
in link closures through LIB_SPEC and LIBGCC_SPEC, which makes
it hard to conditionally remove with command line options
such as -nolibc.

This change arranges to have libc_internal dragged in from
LIB_SPEC only and reworks the ordering of the libs that
participate in LIB_SPEC to match a rationale stated in the
head comment.

We have been using this for a while in our gcc-7 based toolchains
on several targets.

I checked on a gcc-8 based source tree that I can still
build functional compilers passing Ada ACATS for VxWorks
6.9 and 7.0.

Bootstrapped and reg tested on mainline for x86_64-linux.

With Kind Regards,

Olivier

2018-09-20  Olivier Hainque  

* config/vxworks.h (VXWORKS_LIBGCC_SPEC): Remove -lc_internal.
Merge block comment with the one ahead of VXWORKS_LIBS_RTP. Then:
(VXWORKS_LIBS_RTP): Minor reordering.



0006-Cleanup-handling-of-libgcc-and-libc_internal-for-VxW.patch
Description: Binary data


Re: [PATCH v4] [aarch64] Add HiSilicon tsv110 CPU support

2018-09-20 Thread Zhangshaokun
Hi James,

On 2018/9/20 22:22, James Greenhalgh wrote:
> On Wed, Sep 19, 2018 at 04:53:52AM -0500, Shaokun Zhang wrote:
>> This patch adds a new HiSilicon mcpu, tsv110, which supports v8_4A.
>> It has been tested on aarch64 and no regressions from this patch.
> 
> This patch is OK for Trunk.
> 
> Do you need someone to commit it on your behalf?
> 

Sure, it is great.

Thanks in advance,
Shaokun

> Thanks,
> James
> 
>>
>> ---
>>  gcc/ChangeLog|   9 +++
>>  gcc/config/aarch64/aarch64-cores.def |   3 +
>>  gcc/config/aarch64/aarch64-cost-tables.h | 104 
>> +++
>>  gcc/config/aarch64/aarch64-tune.md   |   2 +-
>>  gcc/config/aarch64/aarch64.c |  82 
>>  gcc/doc/invoke.texi  |   2 +-
>>  6 files changed, 200 insertions(+), 2 deletions(-)
>>
>> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
>> index 69e2e14..a040daa 100644
>> --- a/gcc/ChangeLog
>> +++ b/gcc/ChangeLog
>> @@ -1,3 +1,12 @@
>> +2018-09-19  Shaokun Zhang  
>> +Bo Zhou  
>> +
>> +* config/aarch64/aarch64-cores.def (tsv110): New CPU.
>> +* config/aarch64/aarch64-tune.md: Regenerated.
>> +* doc/invoke.texi (AArch64 Options/-mtune): Add "tsv110".
>> +* config/aarch64/aarch64.c (tsv110_tunings): New tuning table.
>> +* config/aarch64/aarch64-cost-tables.h: Add "tsv110" extra costs.
>> +
>>  2018-09-18  Marek Polacek  
>>  
>>  P1064R0 - Allowing Virtual Function Calls in Constant Expressions
>  
> 
> .
> 



[Patch 1/3][Aarch64] Implement Aarch64 SIMD ABI

2018-09-20 Thread Steve Ellcey
Here is a new version of my patch to support the Aarch64 SIMD ABI in GCC.
There is no functional change, I just removed the definition of V23_REGNUM
from aarch64.md.  This is no longer needed because another patch that was
checked in has added it.  I am following up this patch with two more,
one to add the TARGET_SIMD_CLONE* macros and functions and one to modify
code that checks for register usage by functions so that we can differentiate
between regular functions and simd functions on Aarch64.

This first patch has been tested with no regressions and should be ready
to checkin if approved.  The other two are not fully tested but are being
submitted for to get feedback.

Steve Ellcey
sell...@cavium.com


2018-09-20  Steve Ellcey  

* config/aarch64/aarch64-protos.h (aarch64_use_simple_return_insn_p):
New prototype.
(aarch64_epilogue_uses): Ditto.
* config/aarch64/aarch64.c (aarch64_attribute_table): New array.
(aarch64_simd_decl_p): New function.
(aarch64_reg_save_mode): New function.
(aarch64_is_simd_call_p): New function.
(aarch64_function_ok_for_sibcall): Check for simd calls.
(aarch64_layout_frame): Check for simd function.
(aarch64_gen_storewb_pair): Handle E_TFmode.
(aarch64_push_regs): Use aarch64_reg_save_mode to get mode.
(aarch64_gen_loadwb_pair): Handle E_TFmode.
(aarch64_pop_regs): Use aarch64_reg_save_mode to get mode.
(aarch64_gen_store_pair): Handle E_TFmode.
(aarch64_gen_load_pair): Ditto.
(aarch64_save_callee_saves): Handle different mode sizes.
(aarch64_restore_callee_saves): Ditto.
(aarch64_components_for_bb): Check for simd function.
(aarch64_epilogue_uses): New function.
(aarch64_process_components): Check for simd function.
(aarch64_expand_prologue): Ditto.
(aarch64_expand_epilogue): Ditto.
(aarch64_expand_call): Ditto.
(TARGET_ATTRIBUTE_TABLE): New define.
* config/aarch64/aarch64.h (EPILOGUE_USES): Redefine.
(FP_SIMD_SAVED_REGNUM_P): New macro.
* config/aarch64/aarch64.md (simple_return): New define_expand.
(load_pair_dw_tftf): New instruction.
(store_pair_dw_tftf): Ditto.
(loadwb_pair<TX:mode>_<P:mode>): Ditto.
(storewb_pair<TX:mode>_<P:mode>): Ditto.


Testsuite ChangeLog:

2018-09-20  Steve Ellcey  

* gcc.target/aarch64/torture/aarch64-torture.exp: New file.
* gcc.target/aarch64/torture/simd-abi-1.c: New test.
* gcc.target/aarch64/torture/simd-abi-2.c: Ditto.
* gcc.target/aarch64/torture/simd-abi-3.c: Ditto.
* gcc.target/aarch64/torture/simd-abi-4.c: Ditto.

diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index b26e46f..7fb85e5 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -466,6 +466,7 @@ bool aarch64_split_dimode_const_store (rtx, rtx);
 bool aarch64_symbolic_address_p (rtx);
 bool aarch64_uimm12_shift (HOST_WIDE_INT);
 bool aarch64_use_return_insn_p (void);
+bool aarch64_use_simple_return_insn_p (void);
 const char *aarch64_mangle_builtin_type (const_tree);
 const char *aarch64_output_casesi (rtx *);
 
@@ -550,6 +551,8 @@ void aarch64_split_simd_move (rtx, rtx);
 /* Check for a legitimate floating point constant for FMOV.  */
 bool aarch64_float_const_representable_p (rtx);
 
+extern int aarch64_epilogue_uses (int);
+
 #if defined (RTX_CODE)
 void aarch64_gen_unlikely_cbranch (enum rtx_code, machine_mode cc_mode,
    rtx label_ref);
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 8cc738c..7fbd49c 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1005,6 +1005,15 @@ static const struct processor *selected_tune;
 /* The current tuning set.  */
 struct tune_params aarch64_tune_params = generic_tunings;
 
+/* Table of machine attributes.  */
+static const struct attribute_spec aarch64_attribute_table[] =
+{
+  /* { name, min_len, max_len, decl_req, type_req, fn_type_req,
+   affects_type_identity, handler, exclude } */
+  { "aarch64_vector_pcs", 0, 0, true,  false, false, false, NULL, NULL },
+  { NULL, 0, 0, false, false, false, false, NULL, NULL }
+};
+
 #define AARCH64_CPU_DEFAULT_FLAGS ((selected_cpu) ? selected_cpu->flags : 0)
 
 /* An ISA extension in the co-processor and main instruction set space.  */
@@ -1383,6 +1392,31 @@ aarch64_hard_regno_mode_ok (unsigned regno, machine_mode mode)
   return false;
 }
 
+/* Return true if this is a definition of a vectorized simd function.  */
+
+static bool
+aarch64_simd_decl_p (tree fndecl)
+{
+  if (lookup_attribute ("aarch64_vector_pcs", DECL_ATTRIBUTES (fndecl)) != NULL)
+return true;
+  if (lookup_attribute ("simd", DECL_ATTRIBUTES (fndecl)) == NULL)
+return false;
+  return (VECTOR_TYPE_P (TREE_TYPE (TREE_TYPE (fndecl))));
+}
+
+/* Return the mode a register save/restore should use.  DImode for integ

[Patch 2/3][Aarch64] Implement Aarch64 SIMD ABI

2018-09-20 Thread Steve Ellcey
This is the second of three Aarch64 patch for SIMD ABI support.  It
defines the TARGET_SIMD_CLONE_* macros so that GCC will recognize
and vectorize loops containing SIMD functions.  It requires that
patch one of the Aarch64 SIMD ABI get checked in first.

This patch has not been fully regression tested yet but is fairly
safe and I am posting it to see if there are any comments on it.

Steve Ellcey
sell...@cavium.com


2018-09-20  Steve Ellcey  

* config/aarch64/aarch64.c (cgraph.h): New include.
(aarch64_simd_clone_compute_vecsize_and_simdlen): New function.
(aarch64_simd_clone_adjust): Ditto.
(aarch64_simd_clone_usable): Ditto.
(TARGET_SIMD_CLONE_COMPUTE_VECSIZE_AND_SIMDLEN): New macro.
(TARGET_SIMD_CLONE_ADJUST): Ditto.
(TARGET_SIMD_CLONE_USABLE): Ditto.diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 8cc738c..a86f32d 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -40,6 +40,7 @@
 #include "regs.h"
 #include "emit-rtl.h"
 #include "recog.h"
+#include "cgraph.h"
 #include "diagnostic.h"
 #include "insn-attr.h"
 #include "alias.h"
@@ -17472,6 +17473,131 @@ aarch64_speculation_safe_value (machine_mode mode,
   return result;
 }
 
+/* Set CLONEI->vecsize_mangle, CLONEI->mask_mode, CLONEI->vecsize_int,
+   CLONEI->vecsize_float and if CLONEI->simdlen is 0, also
+   CLONEI->simdlen.  Return 0 if SIMD clones shouldn't be emitted,
+   or number of vecsize_mangle variants that should be emitted.  */
+
+static int
+aarch64_simd_clone_compute_vecsize_and_simdlen (struct cgraph_node *node,
+	struct cgraph_simd_clone *clonei,
+	tree base_type,
+	int num ATTRIBUTE_UNUSED)
+{
+  int ret = 0;
+
+  if (clonei->simdlen
+  && (clonei->simdlen < 2
+	  || clonei->simdlen > 1024
+	  || (clonei->simdlen & (clonei->simdlen - 1)) != 0))
+{
+  warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+		  "unsupported simdlen %d", clonei->simdlen);
+  return 0;
+}
+
+  tree ret_type = TREE_TYPE (TREE_TYPE (node->decl));
+  if (TREE_CODE (ret_type) != VOID_TYPE)
+switch (TYPE_MODE (ret_type))
+  {
+  case E_QImode:
+  case E_HImode:
+  case E_SImode:
+  case E_DImode:
+  case E_SFmode:
+  case E_DFmode:
+  /* case E_SCmode: */
+  /* case E_DCmode: */
+	break;
+  default:
+	warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+		"unsupported return type %qT for simd\n", ret_type);
+	return 0;
+  }
+
+  tree t;
+  for (t = DECL_ARGUMENTS (node->decl); t; t = DECL_CHAIN (t))
+/* FIXME: Shouldn't we allow such arguments if they are uniform?  */
+switch (TYPE_MODE (TREE_TYPE (t)))
+  {
+  case E_QImode:
+  case E_HImode:
+  case E_SImode:
+  case E_DImode:
+  case E_SFmode:
+  case E_DFmode:
+  /* case E_SCmode: */
+  /* case E_DCmode: */
+	break;
+  default:
+	warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+		"unsupported argument type %qT for simd\n", TREE_TYPE (t));
+	return 0;
+  }
+
+  if (TARGET_SIMD)
+{
+clonei->vecsize_mangle = 'n';
+clonei->mask_mode = VOIDmode;
+clonei->vecsize_int = 128;
+clonei->vecsize_float = 128;
+
+if (clonei->simdlen == 0)
+  {
+  if (SCALAR_INT_MODE_P (TYPE_MODE (base_type)))
+	clonei->simdlen = clonei->vecsize_int;
+  else
+	clonei->simdlen = clonei->vecsize_float;
+  clonei->simdlen /= GET_MODE_BITSIZE (SCALAR_TYPE_MODE (base_type));
+  }
+else if (clonei->simdlen > 16)
+  {
+  /* If it is possible for given SIMDLEN to pass CTYPE value in
+	 registers (v0-v7) accept that SIMDLEN, otherwise warn and don't
+	 emit corresponding clone.  */
+  int cnt = GET_MODE_BITSIZE (SCALAR_TYPE_MODE (base_type)) * clonei->simdlen;
+  if (SCALAR_INT_MODE_P (TYPE_MODE (base_type)))
+	cnt /= clonei->vecsize_int;
+  else
+	cnt /= clonei->vecsize_float;
+  if (cnt > 8)
+	{
+	warning_at (DECL_SOURCE_LOCATION (node->decl), 0,
+		"unsupported simdlen %d", clonei->simdlen);
+	return 0;
+	}
+  }
+  ret = 1;
+}
+  return ret;
+}
+
+/* Add target attribute to SIMD clone NODE if needed.  */
+
+static void
+aarch64_simd_clone_adjust (struct cgraph_node *node ATTRIBUTE_UNUSED)
+{
+}
+
+/* If SIMD clone NODE can't be used in a vectorized loop
+   in current function, return -1, otherwise return a badness of using it
+   (0 if it is most desirable from vecsize_mangle point of view, 1
+   slightly less desirable, etc.).  */
+
+static int
+aarch64_simd_clone_usable (struct cgraph_node *node)
+{
+  switch (node->simdclone->vecsize_mangle)
+{
+case 'n':
+  if (!TARGET_SIMD)
+	return -1;
+  return 0;
+default:
+  gcc_unreachable ();
+}
+}
+
 /* Target-specific selftests.  */
 
 #if CHECKING_P
@@ -17947,6 +18073,16 @@ aarch64_libgcc_floating_mode_supported_p
 #undef TARGET_SPECULATION_SAFE_VALUE
 #define TARGET_SPECULATION_SAFE_VALUE aarch64_speculation_safe_value
 
+#undef TAR

[Patch 3/3][Aarch64] Implement Aarch64 SIMD ABI

2018-09-20 Thread Steve Ellcey
This is the third of three patches for Aarch64 SIMD ABI support.  This
patch is not fully tested yet but I want to post it to get comments.

This is the only patch of the three that touches non-aarch64 specific
code.  The changes here are made to allow GCC to have better information
about what registers are clobbered by functions.  With the new SIMD
ABI on Aarch64 the set of registers clobbered by a SIMD function is a subset
of the registers clobbered by a normal (non-SIMD) function.  This can
result in the caller saving and restoring more registers than is necessary.

This patch addresses that by passing information about the call insn to 
various routines so that they can check on what type of function is being
called and modify the clobbered register set based on that information.

As an example, this code:

  __attribute__ ((__simd__ ("notinbranch"))) extern double sin (double __x);
  __attribute__ ((__simd__ ("notinbranch"))) extern double log (double __x);
  __attribute__ ((__simd__ ("notinbranch"))) extern double exp (double __x);

  double foo(double * __restrict__ x, double * __restrict__ y,
 double * __restrict__ z, int n)
  {
int i;
double a = 0.0;
for (i = 0; i < n; i++)
a = a + sin(x[i]) + log(y[i]) + exp (z[i]);
return a;
  }

Will generate stores inside the main vectorized loop to preserve registers
without this patch, but after the patch, will not do any stores and will
use registers it knows the vector sin/log/exp functions do not clobber.

Comments?

Steve Ellcey
sell...@cavium.com


2018-09-20  Steve Ellcey  

* caller-save.c (setup_save_areas): Modify get_call_reg_set_usage
arguments.
(save_call_clobbered_regs): Ditto.
* config/aarch64/aarch64.c (aarch64_simd_function_def): New function.
(aarch64_simd_call_p): Ditto.
(aarch64_hard_regno_call_part_clobbered): Check for simd calls.
(aarch64_check_part_clobbered): New function.
(aarch64_used_reg_set): New function.
(TARGET_CHECK_PART_CLOBBERED): New macro.
(TARGET_USED_REG_SET): New macro.
* cselib.c (cselib_process_insn): Modify
targetm.hard_regno_call_part_clobbered arguments.
* df-scan.c (df_get_call_refs): Modify get_call_reg_set_usage
arguments.
* doc/tm.texi.in (TARGET_CHECK_PART_CLOBBERED): New hook.
(TARGET_USED_REG_SET): New hook.
* final.c (collect_fn_hard_reg_usage): Modify get_call_reg_set_usage
arguments.
(get_call_reg_set_usage): Update description and argument list,
modify code to return proper register set.
* hooks.c (hook_bool_uint_mode_false): Rename to
hook_bool_insn_uint_mode_false.
* hooks.h (hook_bool_uint_mode_false): Ditto.
* ira-conflicts.c (ira_build_conflicts): Modify
targetm.hard_regno_call_part_clobbered arguments.
* ira-costs.c (ira_tune_allocno_costs): Ditto.
* ira-lives.c (process_bb_node_lives): Modify get_call_reg_set_usage
arguments.
* lra-constraints.c (need_for_call_save_p): Add new argument.
Modify return and update arguments to
targetm.hard_regno_call_part_clobbered.
(need_for_split_p): Add insn argument. Pass argument to
need_for_call_save_p.
(split_if_necessary): Pass insn argument to need_for_split_p.
(inherit_in_ebb): Pass curr_insn to need_for_split_p.
* lra-int.h (struct lra_reg): Add check_part_clobbered field
* lra-lives.c (lra_setup_reload_pseudo_preferenced_hard_reg):
Add insn argument.
(check_pseudos_live_through_calls): Add check of flag_ipa_ra.
(process_bb_lives): Pass curr_insn to check_pseudos_live_through_calls.
Modify get_call_reg_set_usage, targetm.check_part_clobbered, and
check_pseudos_live_through_calls arguments.
* lra.c (initialize_lra_reg_info_element): Initialize
check_part_clobbered to false.
* postreload.c (reload_combine): Modify get_call_reg_set_usage
arguments.
* regcprop.c (copyprop_hardreg_forward_1): Modify
get_call_reg_set_usage and targetm.hard_regno_call_part_clobbered
arguments.
* reginfo.c (choose_hard_reg_mode): Modify
targetm.hard_regno_call_part_clobbered arguments.
* regrename.c (check_new_reg_p): Ditto.
* regs.h (get_call_reg_set_usage): Update argument list.
* reload.c (find_equiv_reg): Modify
targetm.hard_regno_call_part_clobbered argument list.
* reload1.c (emit_reload_insns): Ditto.
* resource.c (mark_set_resources): Modify get_call_reg_set_usage
argument list.
* sched-deps.c (deps_analyze_insn): Modify
targetm.hard_regno_call_part_clobbered argument list.
* sel-sched.c (init_regs_for_mode): Ditto.
(mark_unavailable_hard_regs): Ditto.
* target.def (hard_regno_call_part_clobbered): Update description
a

[patch] leverage cacheTextUpdate for __clear_cache on VxWorks

2018-09-20 Thread Olivier Hainque
Hello,

Proper synchronization of instruction and data caches for
trampolines is always tricky.

As we were considering various options to achieve this on ARM
for VxWorks, Alex found out about the cacheTextUpdate entry
point.

The function is expected to always be available
and to perform whatever needs to be done, if anything at all,
from the kernel perspective, tailored to the particular
board/CPU at hand.

This turned out to be a perfect fit, available since at least
version 5.4.

This patch arranges to provide an implementation of
__clear_cache using this service consistently for all the
VxWorks ports.

The bulk of this was contributed by Alex (thanks!)

I only added the LIB2FUNCS_EXCLUDE part and performed
the on-board verification that the patch had the intended effect
(before the patch, crashes from heavy use of indirect calls to
nested functions, then correct execution after the patch).

We have been using this for a while in our gcc-7 based toolchains
on several targets.

I checked on a gcc-8 based source tree that I can still
build functional compilers passing Ada ACATS for VxWorks
6.9 and 7.0.

Also bootstrapped and reg tested on mainline for x86_64-linux.

With Kind Regards,

Olivier

2018-09-20  Alexandre Oliva  

libgcc/
* config/vxcache.c: New file.  Provide __clear_cache, based on
the cacheTextUpdate VxWorks service.
* config/t-vxworks (LIB2ADD): Add vxcache.c.
(LIB2FUNCS_EXCLUDE): Add _clear_cache.
* config/t-vxworks7: Likewise.
gcc/
* config/vxworks.h (CLEAR_INSN_CACHE): #define to 1.




0007-Resort-to-VxWorks-cacheLib-services-for-__clear_cach.patch
Description: Binary data


Re: C++ PATCH to refine c++/87109 patch

2018-09-20 Thread Jason Merrill
On Wed, Sep 19, 2018 at 9:50 PM, Marek Polacek  wrote:
> Aaaand this addresses 
> ,
> as I promised earlier.  I hope I got it right.
>
> Bootstrapped/regtested on x86_64-linux, ok for trunk?
>
> 2018-09-19  Marek Polacek  
>
> PR c++/87109 - wrong ctor with maybe-rvalue semantics.
> * call.c (build_user_type_conversion_1): Refine the maybe-rvalue
> check to only return if we're converting from a base class.
>
> * g++.dg/cpp0x/ref-qual19.C: Adjust the expected results.
> * g++.dg/cpp0x/ref-qual20.C: New test.
>
> diff --git gcc/cp/call.c gcc/cp/call.c
> index ddf0ed044a0..4bbd77b9cef 100644
> --- gcc/cp/call.c
> +++ gcc/cp/call.c
> @@ -4034,9 +4034,13 @@ build_user_type_conversion_1 (tree totype, tree expr, 
> int flags,
>  conv->bad_p = true;
>
>/* We're performing the maybe-rvalue overload resolution and
> - a conversion function is in play.  This isn't going to work
> - because we would not end up with a suitable constructor.  */
> -  if ((flags & LOOKUP_PREFER_RVALUE) && !DECL_CONSTRUCTOR_P (cand->fn))
> + a conversion function is in play.  If we're converting from
> + a base class to a derived class, reject the conversion.  */
> +  if ((flags & LOOKUP_PREFER_RVALUE)
> +  && !DECL_CONSTRUCTOR_P (cand->fn)
> +  && CLASS_TYPE_P (fromtype)
> +  && CLASS_TYPE_P (totype)
> +  && DERIVED_FROM_P (fromtype, totype))

Here fromtype is the type we're converting from, and what we want to
reject is converting the return value of the conversion op to a base
class.  CLASS_TYPE_P (fromtype) will always be true, since it has a
conversion op.  And I think we also want to handle the case of totype
being a reference.

Jason


Re: [PATCH 11/25] Simplify vec_merge according to the mask.

2018-09-20 Thread Andrew Stubbs

On 17/09/18 10:05, Richard Sandiford wrote:

Would be good to have self-tests for the new transforms.

[...]

known_eq, since we require equality for correctness.  Same for the
other tests.


How about the attached? I've made the edits you requested and written 
some self-tests.



Doesn't simplify_merge_mask make the second two redundant?  I couldn't
see the difference between them and the first condition tested by
simplify_merge_mask.


Yes, I think you're right. Removed, now.

Andrew

Simplify vec_merge according to the mask.

This patch was part of the original patch we acquired from Honza and Martin.

It simplifies nested vec_merge operations using the same mask.

Self-tests are included.

2018-09-20  Andrew Stubbs  
	Jan Hubicka  
	Martin Jambor  

	* simplify-rtx.c (simplify_merge_mask): New function.
	(simplify_ternary_operation): Use it, also see if VEC_MERGEs with the
	same masks are used in op1 or op2.
	(test_vec_merge): New function.
	(test_vector_ops): Call test_vec_merge.

diff --git a/gcc/simplify-rtx.c b/gcc/simplify-rtx.c
index f77e1aa..13b2882 100644
--- a/gcc/simplify-rtx.c
+++ b/gcc/simplify-rtx.c
@@ -5578,6 +5578,68 @@ simplify_cond_clz_ctz (rtx x, rtx_code cmp_code, rtx true_val, rtx false_val)
   return NULL_RTX;
 }
 
+/* Try to simplify nested VEC_MERGE operations by comparing the masks.  The
+   nested operations need not use the same vector mode, but must have the same
+   number of elements.
+
+   X is an operand number OP of a VEC_MERGE operation with MASK.
+   Returns NULL_RTX if no simplification is possible.  */
+
+rtx
+simplify_merge_mask (rtx x, rtx mask, int op)
+{
+  gcc_assert (VECTOR_MODE_P (GET_MODE (x)));
+  poly_uint64 nunits = GET_MODE_NUNITS (GET_MODE (x));
+  if (GET_CODE (x) == VEC_MERGE && rtx_equal_p (XEXP (x, 2), mask))
+{
+  if (!side_effects_p (XEXP (x, 1 - op)))
+	return XEXP (x, op);
+}
+  if (side_effects_p (x))
+return NULL_RTX;
+  if (UNARY_P (x)
+  && VECTOR_MODE_P (GET_MODE (XEXP (x, 0)))
+  && known_eq (GET_MODE_NUNITS (GET_MODE (XEXP (x, 0))), nunits))
+{
+  rtx top0 = simplify_merge_mask (XEXP (x, 0), mask, op);
+  if (top0)
+	return simplify_gen_unary (GET_CODE (x), GET_MODE (x), top0,
+   GET_MODE (XEXP (x, 0)));
+}
+  if (BINARY_P (x)
+  && VECTOR_MODE_P (GET_MODE (XEXP (x, 0)))
+  && known_eq (GET_MODE_NUNITS (GET_MODE (XEXP (x, 0))), nunits)
+  && VECTOR_MODE_P (GET_MODE (XEXP (x, 1)))
+  && known_eq (GET_MODE_NUNITS (GET_MODE (XEXP (x, 1))), nunits))
+{
+  rtx top0 = simplify_merge_mask (XEXP (x, 0), mask, op);
+  rtx top1 = simplify_merge_mask (XEXP (x, 1), mask, op);
+  if (top0 || top1)
+	return simplify_gen_binary (GET_CODE (x), GET_MODE (x),
+top0 ? top0 : XEXP (x, 0),
+top1 ? top1 : XEXP (x, 1));
+}
+  if (GET_RTX_CLASS (GET_CODE (x)) == RTX_TERNARY
+  && VECTOR_MODE_P (GET_MODE (XEXP (x, 0)))
+  && known_eq (GET_MODE_NUNITS (GET_MODE (XEXP (x, 0))), nunits)
+  && VECTOR_MODE_P (GET_MODE (XEXP (x, 1)))
+  && known_eq (GET_MODE_NUNITS (GET_MODE (XEXP (x, 1))), nunits)
+  && VECTOR_MODE_P (GET_MODE (XEXP (x, 2)))
+  && known_eq (GET_MODE_NUNITS (GET_MODE (XEXP (x, 2))), nunits))
+{
+  rtx top0 = simplify_merge_mask (XEXP (x, 0), mask, op);
+  rtx top1 = simplify_merge_mask (XEXP (x, 1), mask, op);
+  rtx top2 = simplify_merge_mask (XEXP (x, 2), mask, op);
+  if (top0 || top1)
+	return simplify_gen_ternary (GET_CODE (x), GET_MODE (x),
+ GET_MODE (XEXP (x, 0)),
+ top0 ? top0 : XEXP (x, 0),
+ top1 ? top1 : XEXP (x, 1),
+ top2 ? top2 : XEXP (x, 2));
+}
+  return NULL_RTX;
+}
+
 
 /* Simplify CODE, an operation with result mode MODE and three operands,
OP0, OP1, and OP2.  OP0_MODE was the mode of OP0 before it became
@@ -5967,6 +6029,16 @@ simplify_ternary_operation (enum rtx_code code, machine_mode mode,
 	  && !side_effects_p (op2) && !side_effects_p (op1))
 	return op0;
 
+  if (!side_effects_p (op2))
+	{
+	  rtx top0 = simplify_merge_mask (op0, op2, 0);
+	  rtx top1 = simplify_merge_mask (op1, op2, 1);
+	  if (top0 || top1)
+	return simplify_gen_ternary (code, mode, mode,
+	 top0 ? top0 : op0,
+	 top1 ? top1 : op1, op2);
+	}
+
   break;
 
 default:
@@ -6932,6 +7004,71 @@ test_vector_ops_series (machine_mode mode, rtx scalar_reg)
 	constm1_rtx));
 }
 
+/* Verify simplify_merge_mask works correctly.  */
+
+static void
+test_vec_merge (machine_mode mode)
+{
+  rtx op0 = make_test_reg (mode);
+  rtx op1 = make_test_reg (mode);
+  rtx op2 = make_test_reg (mode);
+  rtx op3 = make_test_reg (mode);
+  rtx op4 = make_test_reg (mode);
+  rtx op5 = make_test_reg (mode);
+  rtx mask1 = make_test_reg (SImode);
+  rtx mask2 = make_test_reg (SImode);
+  rtx vm1 = gen_rtx_VEC_MERGE (mode, op0, op1, mask1);
+  rtx vm2 = gen_rtx_VEC_MERGE (mode, op2, op3, mask1);
+  rtx vm3 = gen_rtx_VEC_MERGE (mode, op4, op5, mask1);
+
+  /* Simple vec_m

Re: [patch] prepend vxworks-dummy.h to tm_file for powerpc

2018-09-20 Thread Segher Boessenkool
Hi Olivier,

On Thu, Sep 20, 2018 at 04:04:51PM +0200, Olivier Hainque wrote:
> vxworks-dummy.h is intended to be included in the list of
> target header files for every CPU for which we have at least
> one VxWorks port.
> 
> It essentially provides default values for common VxWorks
> markers (typically, macros conveying whether we are configured for
> this or that VxWorks variant), so they can be referenced
> consistently in other files of the port.
> 
> This was missing for powerpc* and this patch just fixes that,
> which will help further vxworks related patches to come. 
> 
> This should really be a noop for non VxWorks ports.
> 
> Checked on a gcc-8 based source tree that I can still
> build functional compilers passing Ada ACATS for VxWorks
> 6.9 and 7.0.
> 
> Bootstrapped and reg tested on mainline for x86_64-linux.

Looks fine to me (for all branches).  Thanks,


Segher


> 2018-09-20  Olivier Hainque  
> 
>   * config.gcc (powerpc*-*-*): Prepend vxworks-dummy.h to tm_file.


Re: [patch] prepend vxworks-dummy.h to tm_file for powerpc

2018-09-20 Thread Olivier Hainque
Hi Segher,

> On 20 Sep 2018, at 17:44, Segher Boessenkool  
> wrote:
> 
> Looks fine to me (for all branches).  Thanks,

Great :-) Thanks for your prompt feedback!




Re: C++ PATCH to refine c++/87109 patch

2018-09-20 Thread Marek Polacek
On Thu, Sep 20, 2018 at 11:25:38AM -0400, Jason Merrill wrote:
> On Wed, Sep 19, 2018 at 9:50 PM, Marek Polacek  wrote:
> > Aaaand this addresses 
> > ,
> > as I promised earlier.  I hope I got it right.
> >
> > Bootstrapped/regtested on x86_64-linux, ok for trunk?
> >
> > 2018-09-19  Marek Polacek  
> >
> > PR c++/87109 - wrong ctor with maybe-rvalue semantics.
> > * call.c (build_user_type_conversion_1): Refine the maybe-rvalue
> > check to only return if we're converting from a base class.
> >
> > * g++.dg/cpp0x/ref-qual19.C: Adjust the expected results.
> > * g++.dg/cpp0x/ref-qual20.C: New test.
> >
> > diff --git gcc/cp/call.c gcc/cp/call.c
> > index ddf0ed044a0..4bbd77b9cef 100644
> > --- gcc/cp/call.c
> > +++ gcc/cp/call.c
> > @@ -4034,9 +4034,13 @@ build_user_type_conversion_1 (tree totype, tree 
> > expr, int flags,
> >  conv->bad_p = true;
> >
> >/* We're performing the maybe-rvalue overload resolution and
> > - a conversion function is in play.  This isn't going to work
> > - because we would not end up with a suitable constructor.  */
> > -  if ((flags & LOOKUP_PREFER_RVALUE) && !DECL_CONSTRUCTOR_P (cand->fn))
> > + a conversion function is in play.  If we're converting from
> > + a base class to a derived class, reject the conversion.  */
> > +  if ((flags & LOOKUP_PREFER_RVALUE)
> > +  && !DECL_CONSTRUCTOR_P (cand->fn)
> > +  && CLASS_TYPE_P (fromtype)
> > +  && CLASS_TYPE_P (totype)
> > +  && DERIVED_FROM_P (fromtype, totype))
> 
> Here fromtype is the type we're converting from, and what we want to
> reject is converting the return value of the conversion op to a base
> class.  CLASS_TYPE_P (fromtype) will always be true, since it has a
> conversion op.  And I think we also want to handle the case of totype
> being a reference.

I think I totally misunderstood what this was about.  It's actually about
this case

struct Y { int y; };
struct X : public Y { int x; };

struct A {
  operator X();
};

Y
fn (A a)
{
  return a;
}

where we want to avoid slicing of X when converting X to Y, yes?

Marek


Re: [PATCH 08/25] Fix co-array allocation

2018-09-20 Thread Janne Blomqvist
On Wed, Sep 19, 2018 at 7:24 PM Andrew Stubbs  wrote:

> On 05/09/18 19:07, Janne Blomqvist wrote:
> > The argument must be of type size_type_node, not sizetype. Please instead
> > use
> >
> > size = build_zero_cst (size_type_node);
> >
> >
> >>  * trans-intrinsic.c (conv_intrinsic_event_query): Convert
> computed
> >>  index to a size_t type.
> >>
> >
> > Using integer_type_node is wrong, but the correct type for calculating
> > array indices (lbound, ubound,  etc.) is not size_type_node but rather
> > gfc_array_index_type (which in practice maps to ptrdiff_t). So please use
> > that, and then fold_convert index to size_type_node just before
> generating
> > the call to event_query.
> >
> >
> >>  * trans-stmt.c (gfc_trans_event_post_wait): Likewise.
> >>
> >
> > Same here as above.
>
> How is the attached? I retested and found no regressions.
>
> Andrew
>

Ok, looks good.

There are some other remaining incorrect uses of integer_type_node (at
least one visible in the diff), but that can be done as a separate patch
(not saying you must do it as a precondition for anything, though it would
of course be nice if you would. :) )

-- 
Janne Blomqvist


Re: [PATCH 22/25] Add dg-require-effective-target exceptions

2018-09-20 Thread Andrew Stubbs

On 17/09/18 18:51, Mike Stump wrote:

On Sep 5, 2018, at 4:52 AM, a...@codesourcery.com wrote:

There are a number of tests that fail because they assume that exceptions are
available, but GCN does not support them, yet.


So, generally we don't goop up the testsuite with the day to day port stuff 
when it is being developed.  If the port is finished, and EH can't be done, 
this type of change is fine.  If someone plans on doing it in the next 5 years 
and the port is still being developed, there is likely little reason to do 
this.  People that track regressions do so by differencing, and that easily 
handles massive amounts of failures seamlessly.

So, my question would be, has it just not been worked on yet, or is it 
basically impossible to ever do it?


It's not impossible, but there's no plan to implement it.

I'm just trying to avoid myself and others spending future hours 
triaging this stuff, again.


Andrew


Re: [openacc] Teach gfortran to lower OpenACC routine dims

2018-09-20 Thread Bernhard Reutner-Fischer
On Thu, 20 Sep 2018 07:41:08 -0700
Cesar Philippidis  wrote:

> On 09/19/2018 03:27 PM, Bernhard Reutner-Fischer wrote:
> > On Wed, 5 Sep 2018 12:52:03 -0700
> > Cesar Philippidis  wrote:

> >> diff --git a/gcc/fortran/trans-decl.c b/gcc/fortran/trans-decl.c
> >> index eea6b81ebfa..eed868f475b 100644
> >> --- a/gcc/fortran/trans-decl.c
> >> +++ b/gcc/fortran/trans-decl.c
> >> @@ -46,6 +46,7 @@ along with GCC; see the file COPYING3.  If not
> >> see #include "trans-stmt.h"
> >>  #include "gomp-constants.h"
> >>  #include "gimplify.h"
> >> +#include "omp-general.h"  
> > 
> > hmz. so the gomp-constants.h include would be redundant, but do we
> > really need omp-general.h?  
> 
> Good point. omp-general.h is required for oacc_build_routine_dims.
> 
> > Doesn't this suggest to move this oacc dims lowering to
> > trans-openmp.c instead, please?  
> 
> So something like adding a new gfc_add_omp_offload_attributes to
> trans-openmp.c and call it from add_attributes_to_decl?

yes.

> On a related note, I noticed that I forgot to incorporate this change
> in gfortran.h:
> 
> @@ -902,7 +912,7 @@ typedef struct
>unsigned oacc_declare_link:1;
> 
>/* This is an OpenACC accelerator function at level N - 1  */
> -  unsigned oacc_function:3;
> +  ENUM_BITFIELD (oacc_function) oacc_function:3;
> 
> It's probably not huge, but I noticed that some other enum bitfields
> are declared that way.

yea, some compilers had trouble with enum bitfields (where plain int
bitfields like here worked fine, IIRC) but i'm not sure if it's
considered legacy these days. Fine with me to be safe.

> 
> > btw.. the OACC merge from the gomp4 branch added a copy'n paste
> > error in an error message. May i ask you to regtest and install the
> > below:

> Sure. That looks reasonable. I'll also update and/or add new tests as
> necessary.

TIA and cheers,


Re: C++ PATCH to refine c++/87109 patch

2018-09-20 Thread Jason Merrill
On Thu, Sep 20, 2018 at 11:53 AM, Marek Polacek  wrote:
> On Thu, Sep 20, 2018 at 11:25:38AM -0400, Jason Merrill wrote:
>> On Wed, Sep 19, 2018 at 9:50 PM, Marek Polacek  wrote:
>> > Aaaand this addresses 
>> > ,
>> > as I promised earlier.  I hope I got it right.
>> >
>> > Bootstrapped/regtested on x86_64-linux, ok for trunk?
>> >
>> > 2018-09-19  Marek Polacek  
>> >
>> > PR c++/87109 - wrong ctor with maybe-rvalue semantics.
>> > * call.c (build_user_type_conversion_1): Refine the maybe-rvalue
>> > check to only return if we're converting from a base class.
>> >
>> > * g++.dg/cpp0x/ref-qual19.C: Adjust the expected results.
>> > * g++.dg/cpp0x/ref-qual20.C: New test.
>> >
>> > diff --git gcc/cp/call.c gcc/cp/call.c
>> > index ddf0ed044a0..4bbd77b9cef 100644
>> > --- gcc/cp/call.c
>> > +++ gcc/cp/call.c
>> > @@ -4034,9 +4034,13 @@ build_user_type_conversion_1 (tree totype, tree 
>> > expr, int flags,
>> >  conv->bad_p = true;
>> >
>> >/* We're performing the maybe-rvalue overload resolution and
>> > - a conversion function is in play.  This isn't going to work
>> > - because we would not end up with a suitable constructor.  */
>> > -  if ((flags & LOOKUP_PREFER_RVALUE) && !DECL_CONSTRUCTOR_P (cand->fn))
>> > + a conversion function is in play.  If we're converting from
>> > + a base class to a derived class, reject the conversion.  */
>> > +  if ((flags & LOOKUP_PREFER_RVALUE)
>> > +  && !DECL_CONSTRUCTOR_P (cand->fn)
>> > +  && CLASS_TYPE_P (fromtype)
>> > +  && CLASS_TYPE_P (totype)
>> > +  && DERIVED_FROM_P (fromtype, totype))
>>
>> Here fromtype is the type we're converting from, and what we want to
>> reject is converting the return value of the conversion op to a base
>> class.  CLASS_TYPE_P (fromtype) will always be true, since it has a
>> conversion op.  And I think we also want to handle the case of totype
>> being a reference.
>
> I think I totally misunderstood what this was about.  It's actually about
> this case
>
> struct Y { int y; };
> struct X : public Y { int x; };
>
> struct A {
>   operator X();
> };
>
> Y
> fn (A a)
> {
>   return a;
> }
>
> where we want to avoid slicing of X when converting X to Y, yes?

Yes.

Jason


Re: [PATCH 08/25] Fix co-array allocation

2018-09-20 Thread Andrew Stubbs

On 20/09/18 16:56, Janne Blomqvist wrote:

Ok, looks good.


Thanks.

There are some other remaining incorrect uses of integer_type_node (at 
least one visible in the diff), but that can be done as a separate patch 
(not saying you must do it as a precondition for anything, though it 
would of course be nice if you would. :) )


I'm not confident I can tell what should be integer_type_node and what 
should not.


Once it gets to build_call_expr_loc it's clear that the types should 
match the function signature, but the intermediate values' types are not 
obvious to me.


Andrew


Re: [GCC][PATCH v2][Aarch64] Exploiting BFXIL when OR-ing two AND-operations with appropriate bitmasks

2018-09-20 Thread Christophe Lyon
On Wed, 19 Sep 2018 at 11:31, Kyrill Tkachov
 wrote:
>
> Hi Christophe,
>
> On 18/09/18 23:00, Christophe Lyon wrote:
> > On Thu, 13 Sep 2018 at 11:49, Kyrill Tkachov
> >  wrote:
> >>
> >> On 13/09/18 10:25, Sam Tebbs wrote:
> >>> On 09/11/2018 04:20 PM, James Greenhalgh wrote:
>  On Tue, Sep 04, 2018 at 10:13:43AM -0500, Sam Tebbs wrote:
> > Hi James,
> >
> > Thanks for the feedback. Here is an update with the changes you proposed
> > and an updated changelog.
> >
> > gcc/
> > 2018-09-04  Sam Tebbs  
> >
> >PR target/85628
> >* config/aarch64/aarch64.md (*aarch64_bfxil):
> >Define.
> >* config/aarch64/constraints.md (Ulc): Define
> >* config/aarch64/aarch64-protos.h 
> > (aarch64_high_bits_all_ones_p):
> >Define.
> >* config/aarch64/aarch64.c (aarch64_high_bits_all_ones_p): 
> > New function.
> >
> > gcc/testsuite
> > 2018-09-04  Sam Tebbs  
> >
> >PR target/85628
> >* gcc.target/aarch64/combine_bfxil.c: New file.
> >* gcc.target/aarch64/combine_bfxil_2.c: New file.
> >
> >
>  
> 
> > +/* Return true if I's bits are consecutive ones from the MSB.  */
> > +bool
> > +aarch64_high_bits_all_ones_p (HOST_WIDE_INT i)
> > +{
> > +  return exact_log2(-i) != HOST_WIDE_INT_M1;
> > +}
>  You need a space in here between the function name and the bracket:
> 
>  exact_log2 (-i)
> 
> 
> > +extern void abort(void);
>  The same comment applies multiple places in this file.
> 
>  Likewise; if (
> 
>  Otherwise, OK, please apply with those fixes.
> 
>  Thanks,
>  James
> >>> Thanks for noticing that, here's the fixed version.
> >>>
> >> Thanks Sam, I've committed the patch on your behalf with r264264.
> >> If you want to get write-after-approval access to the SVN repo to commit 
> >> patches yourself in the future
> >> please fill out the form at https://sourceware.org/cgi-bin/pdw/ps_form.cgi 
> >> putting my address from the MAINTAINERS file as the approver.
> >>
> > Hi,
> >
> > You've probably already noticed by now since you fixed the
> > combine_bfi_1 issue introduced by this commit, but it add another
> > regression:
> > FAIL: gcc.target/aarch64/copysign-bsl.c scan-assembler b(sl|it|if)\tv[0-9]
>
> Yeah, that one is a bit more involved as it's an unexpected interaction with 
> the copysign BSL pattern.
> Would you be able to file a bugzilla issue to track it?
>

Sure, this is: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=87369


> Thanks,
> Kyrill
>
> > Christophe
> >
> >> Kyrill
> >>
> >>> Sam
>


[patch,openacc] Better distinguish OpenACC and OpenMP sections in libgomp.texi

2018-09-20 Thread Cesar Philippidis
This patch updates the libgomp documentation to more clearly identify
OpenMP-specific sections. Specifically, the sections "Runtime Library
Routines" and "Environment Variables" are now prefixed by OpenMP, because
those sections are not applicable to OpenACC.

Is this OK for trunk? I verified that libgomp.pdf looks ok.

Thanks,
Cesar
[OpenACC] Update _OPENACC value and documentation for OpenACC 2.5

2018-XX-YY  Thomas Schwinge 
	Cesar Philippidis  

	gcc/c-family/
	* c-cppbuiltin.c (c_cpp_builtins): Update "_OPENACC" to "201510".
	gcc/fortran/
	* cpp.c (cpp_define_builtins): Update "_OPENACC" to "201510".
	* gfortran.texi: Update for OpenACC 2.5.
	* intrinsic.texi: Likewise.
	* invoke.texi: Likewise.
	gcc/testsuite/
	* c-c++-common/cpp/openacc-define-3.c: Update.
	* gfortran.dg/openacc-define-3.f90: Likewise.
	gcc/
	* doc/invoke.texi: Update for OpenACC 2.5.
	libgomp/
	* libgomp.texi: Update for OpenACC 2.5.
	* openacc.f90 (openacc_version): Update to "201510".
	* openacc_lib.h (openacc_version): Likewise.
	* testsuite/libgomp.oacc-fortran/openacc_version-1.f: Update.
	* testsuite/libgomp.oacc-fortran/openacc_version-2.f90: Update.

(cherry picked from gomp-4_0-branch r248057, ccbbcb70569)
---
 gcc/c-family/c-cppbuiltin.c   |  2 +-
 gcc/doc/invoke.texi   |  4 +++-
 gcc/fortran/cpp.c |  2 +-
 gcc/fortran/gfortran.texi | 16 +-
 gcc/fortran/intrinsic.texi|  6 +++---
 gcc/fortran/invoke.texi   |  4 +---
 .../c-c++-common/cpp/openacc-define-3.c   |  2 +-
 .../gfortran.dg/openacc-define-3.f90  |  2 +-
 libgomp/libgomp.texi  | 21 ++-
 libgomp/openacc.f90   |  2 +-
 libgomp/openacc_lib.h |  2 +-
 .../libgomp.oacc-fortran/openacc_version-1.f  |  2 +-
 .../openacc_version-2.f90 |  2 +-
 13 files changed, 31 insertions(+), 36 deletions(-)

diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c
index 96a6b4dfd2b..f2a273b6ac7 100644
--- a/gcc/c-family/c-cppbuiltin.c
+++ b/gcc/c-family/c-cppbuiltin.c
@@ -1391,7 +1391,7 @@ c_cpp_builtins (cpp_reader *pfile)
 cpp_define (pfile, "__SSP__=1");
 
   if (flag_openacc)
-cpp_define (pfile, "_OPENACC=201306");
+cpp_define (pfile, "_OPENACC=201510");
 
   if (flag_openmp)
 cpp_define (pfile, "_OPENMP=201511");
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 94304c314cf..34d7ff71512 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -2161,10 +2161,12 @@ freestanding and hosted environments.
 Enable handling of OpenACC directives @code{#pragma acc} in C/C++ and
 @code{!$acc} in Fortran.  When @option{-fopenacc} is specified, the
 compiler generates accelerated code according to the OpenACC Application
-Programming Interface v2.0 @w{@uref{https://www.openacc.org}}.  This option
+Programming Interface v2.5 @w{@uref{https://www.openacc.org}}.  This option
 implies @option{-pthread}, and thus is only supported on targets that
 have support for @option{-pthread}.
 
+See @uref{https://gcc.gnu.org/wiki/OpenACC} for more information.
+
 @item -fopenacc-dim=@var{geom}
 @opindex fopenacc-dim
 @cindex OpenACC accelerator programming
diff --git a/gcc/fortran/cpp.c b/gcc/fortran/cpp.c
index 0b3de42e832..14871129ff6 100644
--- a/gcc/fortran/cpp.c
+++ b/gcc/fortran/cpp.c
@@ -165,7 +165,7 @@ cpp_define_builtins (cpp_reader *pfile)
   cpp_define (pfile, "_LANGUAGE_FORTRAN=1");
 
   if (flag_openacc)
-cpp_define (pfile, "_OPENACC=201306");
+cpp_define (pfile, "_OPENACC=201510");
 
   if (flag_openmp)
 cpp_define (pfile, "_OPENMP=201511");
diff --git a/gcc/fortran/gfortran.texi b/gcc/fortran/gfortran.texi
index 30934046a49..59a69457fe0 100644
--- a/gcc/fortran/gfortran.texi
+++ b/gcc/fortran/gfortran.texi
@@ -476,9 +476,7 @@ used on real-world programs.  In particular, the supported extensions
 include OpenMP, Cray-style pointers, some old vendor extensions, and several
 Fortran 2003 and Fortran 2008 features, including TR 15581.  However, it is
 still under development and has a few remaining rough edges.
-There also is initial support for OpenACC.
-Note that this is an experimental feature, incomplete, and subject to
-change in future versions of GCC.  See
+There also is support for OpenACC.  See
 @uref{https://gcc.gnu.org/wiki/OpenACC} for more information.
 
 At present, the GNU Fortran compiler passes the
@@ -538,10 +536,8 @@ status} and @ref{Fortran 2018 status} sections of the documentation.
 Additionally, the GNU Fortran compilers supports the OpenMP specification
 (version 4.0 and most of the features of the 4.5 version,
 @url{http://openmp.org/@/wp/@/openmp-specifications/}).
-There also is initial support for the OpenACC specification (targeting
-version 2.0, @uref{http://www.openacc.org/}).
-Note that this is an experimental feature, incomplete, and subject to
-change

[patch,openacc] Don't mark OpenACC auto loops as independent inside acc parallel regions

2018-09-20 Thread Cesar Philippidis
OpenACC has a concept of loop independence, in which independent loops
may be executed in parallel across gangs, workers and vectors. Inside
acc parallel regions, if a loop isn't explicitly marked seq or auto, it
is predetermined to be independent.

This patch corrects a bug where acc loops marked as auto were being
mistakenly promoted to independent. That's bad because it can generate
bogus results if a dependency exists.

Note that this patch depends on the following patches for
-fnote-info-omp-optimized which is used in a test case.

  * Add user-friendly OpenACC diagnostics regarding detected
parallelism.
https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01652.html

  * Correct the reported line number in fortran combined OpenACC
directives
https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01554.html

  * Correct the reported line number in c++ combined OpenACC directives
https://gcc.gnu.org/ml/gcc-patches/2018-07/msg01552.html

Is this OK for trunk? I bootstrapped and regtested on x86_64 Linux with
nvptx offloading.

Thanks,
Cesar
[OpenACC] Don't mark OpenACC auto loops as independent inside acc parallel regions

2018-XX-YY  Cesar Philippidis  

	gcc/
	* omp-low.c (lower_oacc_head_mark): Don't mark OpenACC auto
	loops as independent inside acc parallel regions.

	gcc/testsuite/
	* c-c++-common/goacc/loop-auto-1.c: Adjust test case to conform to
	the new behavior of the auto clause in OpenACC 2.5.
	* c-c++-common/goacc/loop-auto-2.c: Likewise.
	* gcc.dg/goacc/loop-processing-1.c: Likewise.
	* c-c++-common/goacc/loop-auto-3.c: New test.
	* gfortran.dg/goacc/loop-auto-1.f90: New test.

	libgomp/
	* testsuite/libgomp.oacc-c-c++-common/loop-auto-1.c: Adjust test case
	to conform to the new behavior of the auto clause in OpenACC 2.5.

(cherry picked from gomp-4_0-branch r247569, 6d30b542f29)

---
 gcc/omp-low.c |  5 +-
 .../c-c++-common/goacc/loop-auto-1.c  | 50 +--
 .../c-c++-common/goacc/loop-auto-2.c  |  4 +-
 .../c-c++-common/goacc/loop-auto-3.c  | 78 
 .../gcc.dg/goacc/loop-processing-1.c  |  2 +-
 .../gfortran.dg/goacc/loop-auto-1.f90 | 88 +++
 .../libgomp.oacc-c-c++-common/loop-auto-1.c   | 20 ++---
 7 files changed, 207 insertions(+), 40 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/loop-auto-3.c
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/loop-auto-1.f90

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index fdabf67249b..24685fd012c 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -5647,9 +5647,10 @@ lower_oacc_head_mark (location_t loc, tree ddvar, tree clauses,
   tag |= OLF_GANG_STATIC;
 }
 
-  /* In a parallel region, loops are implicitly INDEPENDENT.  */
+  /* In a parallel region, loops without auto and seq clauses are
+ implicitly INDEPENDENT.  */
   omp_context *tgt = enclosing_target_ctx (ctx);
-  if (!tgt || is_oacc_parallel (tgt))
+  if ((!tgt || is_oacc_parallel (tgt)) && !(tag & (OLF_SEQ | OLF_AUTO)))
 tag |= OLF_INDEPENDENT;
 
   if (tag & OLF_TILE)
diff --git a/gcc/testsuite/c-c++-common/goacc/loop-auto-1.c b/gcc/testsuite/c-c++-common/goacc/loop-auto-1.c
index 124befc4002..dcad07f11c8 100644
--- a/gcc/testsuite/c-c++-common/goacc/loop-auto-1.c
+++ b/gcc/testsuite/c-c++-common/goacc/loop-auto-1.c
@@ -10,7 +10,7 @@ void Foo ()
 #pragma acc loop seq
 	for (int jx = 0; jx < 10; jx++) {}
 
-#pragma acc loop auto /* { dg-warning "insufficient partitioning" } */
+#pragma acc loop auto independent /* { dg-warning "insufficient partitioning" } */
 	for (int jx = 0; jx < 10; jx++) {}
   }
 
@@ -20,7 +20,7 @@ void Foo ()
 #pragma acc loop auto
 	for (int jx = 0; jx < 10; jx++) {}
 
-#pragma acc loop auto /* { dg-warning "insufficient partitioning" } */
+#pragma acc loop auto independent /* { dg-warning "insufficient partitioning" } */
 	for (int jx = 0; jx < 10; jx++)
 	  {
 #pragma acc loop vector
@@ -51,7 +51,7 @@ void Foo ()
 #pragma acc loop vector
 	for (int jx = 0; jx < 10; jx++)
 	  {
-#pragma acc loop auto /* { dg-warning "insufficient partitioning" } */
+#pragma acc loop auto independent /* { dg-warning "insufficient partitioning" } */
 	for (int kx = 0; kx < 10; kx++) {}
 	  }
 
@@ -64,27 +64,27 @@ void Foo ()
 
   }
 
-#pragma acc loop auto
+#pragma acc loop auto independent
 for (int ix = 0; ix < 10; ix++)
   {
-#pragma acc loop auto
+#pragma acc loop auto independent
 	for (int jx = 0; jx < 10; jx++)
 	  {
-#pragma acc loop auto
+#pragma acc loop auto independent
 	for (int kx = 0; kx < 10; kx++) {}
 	  }
   }
 
-#pragma acc loop auto
+#pragma acc loop auto independent
 for (int ix = 0; ix < 10; ix++)
   {
-#pragma acc loop auto
+#pragma acc loop auto independent
 	for (int jx = 0; jx < 10; jx++)
 	  {
-#pragma acc loop auto /* { dg-warning "insufficient partitioning" } */
+#pragma acc loop auto independent /* { dg-warning "insufficient partitioning" } */
 	for (int kx = 0; kx <

[patch,openacc] Fix acc_shutdown issue

2018-09-20 Thread Cesar Philippidis
Attached is an old gomp4 patch that allegedly fixes a shutdown runtime
issue involving OpenACC accelerators. Unfortunately, the original patch
didn't include a test case, nor did it generate any regressions in the
libgomp testsuite when I reverted it in og8.

With that said, I like how this patch eliminates the redundant use of
gomp_mutex_lock to unmap variables (because gomp_unmap_vars already
acquires a lock). However, the trade-off is that it does increase
tgt->list_count to num_funcs + num_vars.

Does anyone have any strong opinion on this patch and is it OK for
trunk? I bootstrapped and regtested it for x86_64 Linux with nvptx
offloading and I didn't encounter any regressions.

Thanks,
Cesar
[OpenACC] Fix acc_shutdown issue

2018-XX-YY  James Norris 
	Cesar Philippidis  

	libgomp/
	* oacc-init.c (acc_shutdown_1): Replace use of gomp_free_memmap with
	gomp_unmap_vars.
	* target.c (gomp_load_image_to_device): Fix initialization.
	(gomp_free_memmap): Remove.

(cherry picked from gomp-4_0-branch r226045)
---
 libgomp/libgomp.h   |  1 -
 libgomp/oacc-init.c |  9 ++---
 libgomp/target.c| 27 +--
 3 files changed, 15 insertions(+), 22 deletions(-)

diff --git a/libgomp/libgomp.h b/libgomp/libgomp.h
index 3a8cc2bd7d6..5c11e97616d 100644
--- a/libgomp/libgomp.h
+++ b/libgomp/libgomp.h
@@ -1003,7 +1003,6 @@ extern struct target_mem_desc *gomp_map_vars (struct gomp_device_descr *,
 	  enum gomp_map_vars_kind);
 extern void gomp_unmap_vars (struct target_mem_desc *, bool);
 extern void gomp_init_device (struct gomp_device_descr *);
-extern void gomp_free_memmap (struct splay_tree_s *);
 extern void gomp_unload_device (struct gomp_device_descr *);
 extern bool gomp_remove_var (struct gomp_device_descr *, splay_tree_key);
 
diff --git a/libgomp/oacc-init.c b/libgomp/oacc-init.c
index 8842e7218cb..957bb9f31f9 100644
--- a/libgomp/oacc-init.c
+++ b/libgomp/oacc-init.c
@@ -303,9 +303,12 @@ acc_shutdown_1 (acc_device_t d)
 
   if (walk->dev)
 	{
-	  gomp_mutex_lock (&walk->dev->lock);
-	  gomp_free_memmap (&walk->dev->mem_map);
-	  gomp_mutex_unlock (&walk->dev->lock);
+	  while (walk->dev->mem_map.root)
+	{
+	  struct target_mem_desc *tgt = walk->dev->mem_map.root->key.tgt;
+
+	  gomp_unmap_vars (tgt, false);
+	}
 
 	  walk->dev = NULL;
 	  walk->base_dev = NULL;
diff --git a/libgomp/target.c b/libgomp/target.c
index dda041cdbef..9ddc8d6c038 100644
--- a/libgomp/target.c
+++ b/libgomp/target.c
@@ -1184,14 +1184,17 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
 }
 
   /* Insert host-target address mapping into splay tree.  */
-  struct target_mem_desc *tgt = gomp_malloc (sizeof (*tgt));
+  struct target_mem_desc *tgt =
+	  gomp_malloc (sizeof (*tgt)
+		   + sizeof (tgt->list[0])
+		   * (num_funcs + num_vars) * sizeof (*tgt->array));
   tgt->array = gomp_malloc ((num_funcs + num_vars) * sizeof (*tgt->array));
   tgt->refcount = REFCOUNT_INFINITY;
   tgt->tgt_start = 0;
   tgt->tgt_end = 0;
   tgt->to_free = NULL;
   tgt->prev = NULL;
-  tgt->list_count = 0;
+  tgt->list_count = num_funcs + num_vars;
   tgt->device_descr = devicep;
   splay_tree_node array = tgt->array;
 
@@ -1204,6 +1207,8 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
   k->tgt_offset = target_table[i].start;
   k->refcount = REFCOUNT_INFINITY;
   k->link_key = NULL;
+  tgt->list[i].key = k;
+  tgt->refcount++;
   array->left = NULL;
   array->right = NULL;
   splay_tree_insert (&devicep->mem_map, array);
@@ -1236,6 +1241,8 @@ gomp_load_image_to_device (struct gomp_device_descr *devicep, unsigned version,
   k->tgt_offset = target_var->start;
   k->refcount = target_size & link_bit ? REFCOUNT_LINK : REFCOUNT_INFINITY;
   k->link_key = NULL;
+  tgt->list[i].key = k;
+  tgt->refcount++;
   array->left = NULL;
   array->right = NULL;
   splay_tree_insert (&devicep->mem_map, array);
@@ -1454,22 +1461,6 @@ gomp_unload_device (struct gomp_device_descr *devicep)
 }
 }
 
-/* Free address mapping tables.  MM must be locked on entry, and remains locked
-   on return.  */
-
-attribute_hidden void
-gomp_free_memmap (struct splay_tree_s *mem_map)
-{
-  while (mem_map->root)
-{
-  struct target_mem_desc *tgt = mem_map->root->key.tgt;
-
-  splay_tree_remove (mem_map, &mem_map->root->key);
-  free (tgt->array);
-  free (tgt);
-}
-}
-
 /* Host fallback for GOMP_target{,_ext} routines.  */
 
 static void
-- 
2.17.1



[C++ PATCH] PR c++/87075 - ICE with constexpr array initialization.

2018-09-20 Thread Jason Merrill
My patch of 2016-08-26 to avoid calling a trivial default constructor
introduced TARGET_EXPRs initialized with void_node to express trivial
initialization.  But when this shows up in a VEC_INIT_EXPR, we weren't
prepared to handle it.  Fixed by handling it explicitly in
cxx_eval_vec_init_1.

Tested x86_64-pc-linux-gnu, applying to trunk, and later 7 and 8.

* constexpr.c (cxx_eval_vec_init_1): Handle trivial initialization.
---
 gcc/cp/constexpr.c|  3 +++
 gcc/testsuite/g++.dg/cpp1y/constexpr-array6.C | 26 +++
 gcc/cp/ChangeLog  |  5 
 3 files changed, 34 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/cpp1y/constexpr-array6.C

diff --git a/gcc/cp/constexpr.c b/gcc/cp/constexpr.c
index aa33319875f..fdea769faa9 100644
--- a/gcc/cp/constexpr.c
+++ b/gcc/cp/constexpr.c
@@ -3034,6 +3034,9 @@ cxx_eval_vec_init_1 (const constexpr_ctx *ctx, tree atype, tree init,
{
  /* Initializing an element using value or default initialization
 we just pre-built above.  */
+ if (init == void_node)
+   /* Trivial default-init, don't do anything to the CONSTRUCTOR.  */
+   return ctx->ctor;
  eltinit = cxx_eval_constant_expression (&new_ctx, init, lval,
  non_constant_p, overflow_p);
  reuse = i == 0;
diff --git a/gcc/testsuite/g++.dg/cpp1y/constexpr-array6.C b/gcc/testsuite/g++.dg/cpp1y/constexpr-array6.C
new file mode 100644
index 000..1f15bef8d0c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp1y/constexpr-array6.C
@@ -0,0 +1,26 @@
+// PR c++/87075
+// { dg-do compile { target c++14 } }
+
+template <typename T>
+struct vec
+{
+  struct { T y; } n;
+  vec() = default;
+};
+
+template <typename T>
+struct S
+{
+  vec<T> value[2];
+  template<typename U>
+  constexpr S(const U&);
+};
+
+template<typename T>
+template<typename X>
+constexpr S<T>::S(const X&)
+{
+  value[0] = vec<T>();
+}
+
+S<int> m(0);
diff --git a/gcc/cp/ChangeLog b/gcc/cp/ChangeLog
index 75286d53fdb..c5072d53334 100644
--- a/gcc/cp/ChangeLog
+++ b/gcc/cp/ChangeLog
@@ -1,3 +1,8 @@
+2018-09-20  Jason Merrill  
+
+   PR c++/87075 - ICE with constexpr array initialization.
+   * constexpr.c (cxx_eval_vec_init_1): Handle trivial initialization.
+
 2018-09-19  Marek Polacek  
 
Add -Wclass-conversion.

base-commit: 51481b252ffe30a1daea491f62c687333efabc40
-- 
2.17.1



[patch,openacc] Fix infinite recursion in OMP clause pretty-printing, default label

2018-09-20 Thread Cesar Philippidis
Apparently, Tom ran into an ICE when we were adding support for new
clauses back in the gomp-4_0-branch days.  This patch shouldn't be
necessary because all of the clauses are fully implemented now, but
it may prevent similar bugs from occurring in the future at least
during development.
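To make the failure mode concrete, here is a standalone caricature (not GCC's actual pretty_printer API, and the enum and function names are made up): a clause dumper whose default case falls back to the generic node dumper, which in turn dispatches unknown clauses right back to the clause dumper, recursing forever. Printing a literal breaks the cycle, which is all the patch does.

```c
#include <assert.h>
#include <string.h>

enum clause_kind { CLAUSE_PRIVATE, CLAUSE_FUTURE };

static const char *dump_clause (enum clause_kind k);

/* The generic dumper dispatches clause nodes back to dump_clause, so a
   clause dumper whose default case calls it never terminates on an
   unhandled clause.  */
static const char *dump_generic_node (enum clause_kind k)
{
  return dump_clause (k);
}

static const char *dump_clause (enum clause_kind k)
{
  switch (k)
    {
    case CLAUSE_PRIVATE:
      return "private";
    default:
      /* Old behavior: return dump_generic_node (k); -- infinite
         recursion on a clause added during development.
         Fixed behavior: emit a literal marker instead.  */
      return "unknown";
    }
}
```

The same trade-off as in the patch applies: an unfinished clause now prints "unknown" instead of crashing the dumper.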

Is this patch OK for trunk? I bootstrapped and regtested it for x86_64
Linux with nvptx offloading.

Thanks,
Cesar
Fix infinite recursion in OMP clause pretty-printing, default label

Apparently, Tom ran into an ICE when we were adding support for new
clauses back in the gomp-4_0-branch days.  This patch shouldn't be
necessary because all of the clauses are fully implemented now, but
it may prevent similar bugs from occurring in the future at least
during development.

2018-XX-YY  Tom de Vries  
Cesar Philippidis  

	gcc/
	* tree-pretty-print.c (dump_omp_clause): Fix infinite recursion in
	default label.

(cherry picked from gomp-4_0-branch r228915, 2e4d930)
---
 gcc/tree-pretty-print.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/gcc/tree-pretty-print.c b/gcc/tree-pretty-print.c
index 2c089b11751..031afbb49e4 100644
--- a/gcc/tree-pretty-print.c
+++ b/gcc/tree-pretty-print.c
@@ -1063,8 +1063,7 @@ dump_omp_clause (pretty_printer *pp, tree clause, int spc, dump_flags_t flags)
   break;
 
 default:
-  /* Should never happen.  */
-  dump_generic_node (pp, clause, spc, flags, false);
+  pp_string (pp, "unknown");
   break;
 }
 }
-- 
2.17.1



Re: [PATCH] PR libstdc++/78179 run long double tests separately

2018-09-20 Thread Christophe Lyon
On Thu, 20 Sep 2018 at 16:22, Jonathan Wakely  wrote:
>
> On 20/09/18 15:36 +0200, Christophe Lyon wrote:
> >On Wed, 19 Sep 2018 at 23:13, Rainer Orth  
> >wrote:
> >>
> >> Hi Christophe,
> >>
> >> > I have noticed failures on hypot-long-double.cc on arm, so I suggest we 
> >> > add:
> >> >
> >> > diff --git
> >> > a/libstdc++-v3/testsuite/26_numerics/headers/cmath/hypot-long-double.cc
> >> > b/libstdc++-v3/testsuite/26_numerics/headers/cmath/hypot-long-double.cc
> >> > index 8a05473..4c2e33b 100644
> >> > --- 
> >> > a/libstdc++-v3/testsuite/26_numerics/headers/cmath/hypot-long-double.cc
> >> > +++ 
> >> > b/libstdc++-v3/testsuite/26_numerics/headers/cmath/hypot-long-double.cc
> >> > @@ -17,7 +17,7 @@
> >> >
> >> >  // { dg-options "-std=gnu++17" }
> >> >  // { dg-do run { target c++17 } }
> >> > -// { dg-xfail-run-if "PR 78179" { powerpc-ibm-aix* hppa-*-linux* 
> >> > nios2-*-* } }
> >> > +// { dg-xfail-run-if "PR 78179" { powerpc-ibm-aix* hppa-*-linux*
> >> > nios2-*-* arm*-*-* } }
> >> >
> >> >  // Run the long double tests from hypot.cc separately, because they 
> >> > fail on a
> >> >  // number of targets. See PR libstdc++/78179 for details.
> >> >
> >> > OK?
> >>
> >> just a nit (and not a review): I'd prefer the target list to be sorted
> >> alphabetically, not completely random.
> >>
> >
> >Sure, I can sort the whole list, if OK on principle.
>
> Yes, please go ahead and commit it with the sorted list.
>

OK committed as r264443:
Index: testsuite/26_numerics/headers/cmath/hypot-long-double.cc
===
--- testsuite/26_numerics/headers/cmath/hypot-long-double.cc (revision 264442)
+++ testsuite/26_numerics/headers/cmath/hypot-long-double.cc (revision 264443)
@@ -17,7 +17,7 @@

 // { dg-options "-std=gnu++17" }
 // { dg-do run { target c++17 } }
-// { dg-xfail-run-if "PR 78179" { powerpc-ibm-aix* hppa-*-linux* nios2-*-* } }
+// { dg-xfail-run-if "PR 78179" { arm*-*-* hppa-*-linux* nios2-*-* powerpc-ibm-aix* } }

 // Run the long double tests from hypot.cc separately, because they fail on a
 // number of targets. See PR libstdc++/78179 for details.


[patch,openacc] Generate sequential loop for OpenACC loop directive inside kernels

2018-09-20 Thread Cesar Philippidis
As Chung-Lin noted here
:

  This patch adjusts omp-low.c:expand_omp_for_generic() to expand to a
  "sequential" loop form (without the OMP runtime calls), used for loop
  directives inside OpenACC kernels constructs. Tom mentions that this
  allows the kernels parallelization to work when '#pragma acc loop'
  makes the front-ends create OMP_FOR, which the loop analysis phases
  don't understand.
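The loop shape the patch emits can be sketched in plain C (this is a model of the generated GIMPLE, not the omp-expand code itself; the variable names istart0/iend0 and the NE_EXPR guard are taken from the patch, the function name is invented):

```c
#include <assert.h>

/* With start_fn/next_fn == BUILT_IN_NONE, expand_omp_for_generic skips
   the GOMP_loop_*_start/next runtime calls: the bounds are assigned
   straight into istart0/iend0 and the loop runs sequentially, in a
   shape the later parloops analysis can understand.  */
static long run_seq_loop (long n1, long n2, long step, long *sum)
{
  long istart0 = n1;        /* assign_stmt: istart0 = n1 */
  long iend0 = n2;          /* assign_stmt: iend0 = n2 */
  long iterations = 0;

  /* t = NE_EXPR (istart0, iend0): guard branch around the body.  */
  if (istart0 != iend0)
    for (long v = istart0; v != iend0; v += step)
      {
        *sum += v;          /* original loop body */
        iterations++;
      }
  return iterations;
}
```

The point of the exercise is that no libgomp scheduling state is involved, so the loop is just a normal counted loop as far as the loop analyses are concerned.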

I bootstrapped and regtested it on x86_64 Linux with nvptx offloading.
Is this patch OK for trunk?

Thanks,
Cesar
[OpenACC] Generate sequential loop for OpenACC loop directive inside kernels

2018-XX-YY  Chung-Lin Tang 
	Cesar Philippidis  

	gcc/
	* omp-expand.c (struct omp_region): Add inside_kernels_p field.
	(expand_omp_for_generic): Adjust to generate a 'sequential' loop
	when GOMP builtin arguments are BUILT_IN_NONE.
	(expand_omp_for): Use expand_omp_for_generic to generate a
	non-parallelized loop for OMP_FORs inside OpenACC kernels regions.
	(expand_omp): Mark inside_kernels_p field true for regions
	nested inside OpenACC kernels constructs.
	gcc/testsuite/
	* c-c++-common/goacc/kernels-loop-acc-loop.c: New test.
	* c-c++-common/goacc/kernels-loop-2-acc-loop.c: New test.
	* c-c++-common/goacc/kernels-loop-3-acc-loop.c: New test.
	* c-c++-common/goacc/kernels-loop-n-acc-loop.c: New test.
	* c-c++-common/goacc/kernels-acc-loop-reduction.c: New test.
	* c-c++-common/goacc/kernels-acc-loop-smaller-equal.c: New test.

(cherry picked from gomp-4_0-branch r224505, r224837, r228232, r228233,
r231461, and r247958)
---
 gcc/omp-expand.c  | 136 --
 .../goacc/kernels-acc-loop-reduction.c|  23 +++
 .../goacc/kernels-acc-loop-smaller-equal.c|  23 +++
 .../goacc/kernels-loop-2-acc-loop.c   |  18 +++
 .../goacc/kernels-loop-3-acc-loop.c   |  15 ++
 .../goacc/kernels-loop-acc-loop.c |  15 ++
 .../goacc/kernels-loop-n-acc-loop.c   |  15 ++
 7 files changed, 204 insertions(+), 41 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-acc-loop-reduction.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-acc-loop-smaller-equal.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-2-acc-loop.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-3-acc-loop.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-acc-loop.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/kernels-loop-n-acc-loop.c

diff --git a/gcc/omp-expand.c b/gcc/omp-expand.c
index d2a77c067c6..9b03f62e065 100644
--- a/gcc/omp-expand.c
+++ b/gcc/omp-expand.c
@@ -104,6 +104,9 @@ struct omp_region
   /* The ordered stmt if type is GIMPLE_OMP_ORDERED and it has
  a depend clause.  */
   gomp_ordered *ord_stmt;
+
+  /* True if this is nested inside an OpenACC kernels construct.  */
+  bool inside_kernels_p;
 };
 
 static struct omp_region *root_omp_region;
@@ -2509,6 +2512,7 @@ expand_omp_for_generic (struct omp_region *region,
   gassign *assign_stmt;
   bool in_combined_parallel = is_combined_parallel (region);
   bool broken_loop = region->cont == NULL;
+  bool seq_loop = (start_fn == BUILT_IN_NONE || next_fn == BUILT_IN_NONE);
   edge e, ne;
   tree *counts = NULL;
   int i;
@@ -2606,8 +2610,12 @@ expand_omp_for_generic (struct omp_region *region,
   type = TREE_TYPE (fd->loop.v);
   istart0 = create_tmp_var (fd->iter_type, ".istart0");
   iend0 = create_tmp_var (fd->iter_type, ".iend0");
-  TREE_ADDRESSABLE (istart0) = 1;
-  TREE_ADDRESSABLE (iend0) = 1;
+
+  if (!seq_loop)
+{
+  TREE_ADDRESSABLE (istart0) = 1;
+  TREE_ADDRESSABLE (iend0) = 1;
+}
 
   /* See if we need to bias by LLONG_MIN.  */
   if (fd->iter_type == long_long_unsigned_type_node
@@ -2637,7 +2645,25 @@ expand_omp_for_generic (struct omp_region *region,
   gsi_prev (&gsif);
 
   tree arr = NULL_TREE;
-  if (in_combined_parallel)
+  if (seq_loop)
+{
+  tree n1 = fold_convert (fd->iter_type, fd->loop.n1);
+  tree n2 = fold_convert (fd->iter_type, fd->loop.n2);
+
+  n1 = force_gimple_operand_gsi_1 (&gsi, n1, is_gimple_reg, NULL_TREE, true,
+   GSI_SAME_STMT);
+  n2 = force_gimple_operand_gsi_1 (&gsi, n2, is_gimple_reg, NULL_TREE, true,
+   GSI_SAME_STMT);
+
+  assign_stmt = gimple_build_assign (istart0, n1);
+  gsi_insert_before (&gsi, assign_stmt, GSI_SAME_STMT);
+
+  assign_stmt = gimple_build_assign (iend0, n2);
+  gsi_insert_before (&gsi, assign_stmt, GSI_SAME_STMT);
+
+  t = fold_build2 (NE_EXPR, boolean_type_node, istart0, iend0);
+}
+  else if (in_combined_parallel)
 {
   gcc_assert (fd->ordered == 0);
   /* In a combined parallel loop, emit a call to
@@ -3059,39 +3085,45 @@ expand_omp_for_generic (struct omp_region *region,
 	collapse_bb = extract_omp_for_update_vars (fd, cont_bb, l1_bb);
 
   /* Emit code to get the next parallel iteration in L2_BB.  */
-  gsi = g

[PATCH] rs6000: Delete VECTOR_OTHER

2018-09-20 Thread Segher Boessenkool
It's never used.  Committing to trunk.


2018-09-20  Segher Boessenkool  

* config/rs6000/rs6000-opts.h (enum rs6000_vector): Delete
VECTOR_OTHER.
* config/rs6000/rs6000.c (rs6000_debug_vector_unit): Delete
case VECTOR_OTHER.

---
 gcc/config/rs6000/rs6000-opts.h | 3 +--
 gcc/config/rs6000/rs6000.c  | 1 -
 2 files changed, 1 insertion(+), 3 deletions(-)

diff --git a/gcc/config/rs6000/rs6000-opts.h b/gcc/config/rs6000/rs6000-opts.h
index bd0eea3..1212d11 100644
--- a/gcc/config/rs6000/rs6000-opts.h
+++ b/gcc/config/rs6000/rs6000-opts.h
@@ -137,8 +137,7 @@ enum rs6000_vector {
   VECTOR_NONE, /* Type is not  a vector or not supported */
   VECTOR_ALTIVEC,  /* Use altivec for vector processing */
   VECTOR_VSX,  /* Use VSX for vector processing */
-  VECTOR_P8_VECTOR,/* Use ISA 2.07 VSX for vector processing */
-  VECTOR_OTHER /* Some other vector unit */
+  VECTOR_P8_VECTOR /* Use ISA 2.07 VSX for vector processing */
 };
 
 /* Where to get the canary for the stack protector.  */
diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c
index 6f25f6b..4cf5538 100644
--- a/gcc/config/rs6000/rs6000.c
+++ b/gcc/config/rs6000/rs6000.c
@@ -2300,7 +2300,6 @@ rs6000_debug_vector_unit (enum rs6000_vector v)
 case VECTOR_ALTIVEC:   ret = "altivec";   break;
 case VECTOR_VSX:  ret = "vsx";   break;
 case VECTOR_P8_VECTOR: ret = "p8_vector"; break;
-case VECTOR_OTHER:ret = "other"; break;
 default:  ret = "unknown";   break;
 }
 
-- 
1.8.3.1



[patch,openacc] handle missing OMP_LIST_ clauses in fortran's parse tree debugger

2018-09-20 Thread Cesar Philippidis
This patch updates Fortran's parse tree printer to print the names of
new OpenACC data clauses. I'm not sure whether this functionality is
widely used, but from a standpoint of correctness, this patch would
probably be nice to have.

Is this patch OK for trunk? I bootstrapped and regtested it for x86_64
Linux with nvptx offloading.

Thanks,
Cesar
[OpenACC] handle missing OMP_LIST_ clauses in fortran's parse tree debugger

2018-XX-YY  Cesar Philippidis  

	gcc/fortran/
	* dump-parse-tree.c (show_omp_clauses): Add missing omp list_types
	and reorder the switch cases to match the enum in gfortran.h.

(cherry picked from gomp-4_0-branch r228355, 159518d)
---
 gcc/fortran/dump-parse-tree.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/gcc/fortran/dump-parse-tree.c b/gcc/fortran/dump-parse-tree.c
index 2a28fa30986..f1be5a67a26 100644
--- a/gcc/fortran/dump-parse-tree.c
+++ b/gcc/fortran/dump-parse-tree.c
@@ -1384,21 +1384,26 @@ show_omp_clauses (gfc_omp_clauses *omp_clauses)
 	const char *type = NULL;
 	switch (list_type)
 	  {
-	  case OMP_LIST_USE_DEVICE: type = "USE_DEVICE"; break;
-	  case OMP_LIST_DEVICE_RESIDENT: type = "USE_DEVICE"; break;
-	  case OMP_LIST_CACHE: type = ""; break;
 	  case OMP_LIST_PRIVATE: type = "PRIVATE"; break;
 	  case OMP_LIST_FIRSTPRIVATE: type = "FIRSTPRIVATE"; break;
 	  case OMP_LIST_LASTPRIVATE: type = "LASTPRIVATE"; break;
+	  case OMP_LIST_COPYPRIVATE: type = "COPYPRIVATE"; break;
 	  case OMP_LIST_SHARED: type = "SHARED"; break;
 	  case OMP_LIST_COPYIN: type = "COPYIN"; break;
 	  case OMP_LIST_UNIFORM: type = "UNIFORM"; break;
 	  case OMP_LIST_ALIGNED: type = "ALIGNED"; break;
 	  case OMP_LIST_LINEAR: type = "LINEAR"; break;
+	  case OMP_LIST_DEPEND: type = "DEPEND"; break;
+	  case OMP_LIST_MAP: type = "MAP"; break;
+	  case OMP_LIST_TO: type = "TO"; break;
+	  case OMP_LIST_FROM: type = "FROM"; break;
 	  case OMP_LIST_REDUCTION: type = "REDUCTION"; break;
+	  case OMP_LIST_DEVICE_RESIDENT: type = "DEVICE_RESIDENT"; break;
+	  case OMP_LIST_LINK: type = "LINK"; break;
+	  case OMP_LIST_USE_DEVICE: type = "USE_DEVICE"; break;
+	  case OMP_LIST_CACHE: type = "CACHE"; break;
 	  case OMP_LIST_IS_DEVICE_PTR: type = "IS_DEVICE_PTR"; break;
 	  case OMP_LIST_USE_DEVICE_PTR: type = "USE_DEVICE_PTR"; break;
-	  case OMP_LIST_DEPEND: type = "DEPEND"; break;
 	  default:
 	gcc_unreachable ();
 	  }
-- 
2.17.1



[patch,openacc] Fix hang when running oacc exec with CUDA 9.0 nvprof

2018-09-20 Thread Cesar Philippidis
While tuning the performance of nvptx OpenACC offloading earlier this
year, Tom fixed a bug in og7 that prevented Nvidia's nvprof profiling
tool from working with CUDA 9. Tom posted more details on the patch here
, which is
still relevant here.

Note that this issue was triggered by the new OpenACC profiling API in
og7, which has not landed in trunk yet. However, it's probably a good
idea to get this patch committed independently from that huge profiling
patch series.
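The guard the patch adds is a general reentrancy pattern: the initializer records its own thread identity and an "in progress" flag, and a query arriving from that same thread during initialization takes an early-out instead of re-taking the device lock. A standalone model (thread identity reduced to an int so the sketch needs no pthreads; the real code uses pthread_self, pthread_equal, and a gomp mutex, and all names below are invented):

```c
#include <assert.h>
#include <stdbool.h>

enum init_state { UNINITIALIZED, INITIALIZING, INITIALIZED };
static enum init_state state = UNINITIALIZED;
static int init_thread = -1;

static bool self_initializing_p (int self)
{
  return state == INITIALIZING && init_thread == self;
}

/* Analogue of acc_get_device_type; -1 plays the role of acc_device_none.  */
static int get_device_type (int self)
{
  if (self_initializing_p (self))
    return -1;   /* don't touch the device lock: our own init holds it */
  return 1;      /* normal path would lock and resolve the device */
}

/* Analogue of acc_init_1, with the nvprof callback modeled inline.  */
static int init_device (int self)
{
  state = INITIALIZING;
  init_thread = self;
  /* A profiling callback landing here hits the early-out, not a deadlock.  */
  int seen_by_callback = get_device_type (self);
  state = INITIALIZED;
  return seen_by_callback;
}
```

This leans on the spec wording quoted in the patch comment: "If the device type has not yet been selected, the value acc_device_none may be returned."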

Is this OK for trunk? I bootstrapped and regtested this for x86_64 Linux
with nvptx offloading.

Thanks,
Cesar
[OpenACC] Fix hang when running oacc exec with CUDA 9.0 nvprof

2018-XX-YY  Tom de Vries  
	Cesar Philippidis  

	libgomp/
	* oacc-init.c (acc_init_state_lock, acc_init_state, acc_init_thread):
	New variable.
	(acc_init_1): Set acc_init_thread to pthread_self ().  Set
	acc_init_state to initializing at the start, and to initialized at the
	end.
	(self_initializing_p): New function.
	(acc_get_device_type): Return acc_device_none if called by thread that
	is currently executing acc_init_1.

(cherry picked from openacc-gcc-7-branch commit
81904b675f6298a9c26c71391909ce362990a11f, bfc999c)
---
 libgomp/oacc-init.c | 34 ++
 1 file changed, 34 insertions(+)

diff --git a/libgomp/oacc-init.c b/libgomp/oacc-init.c
index 8db24b17d29..8842e7218cb 100644
--- a/libgomp/oacc-init.c
+++ b/libgomp/oacc-init.c
@@ -40,6 +40,11 @@
 
 static gomp_mutex_t acc_device_lock;
 
+static gomp_mutex_t acc_init_state_lock;
+static enum { uninitialized, initializing, initialized } acc_init_state
+  = uninitialized;
+static pthread_t acc_init_thread;
+
 /* A cached version of the dispatcher for the global "current" accelerator type,
e.g. used as the default when creating new host threads.  This is the
device-type equivalent of goacc_device_num (which specifies which device to
@@ -215,6 +220,11 @@ acc_init_1 (acc_device_t d)
   struct gomp_device_descr *base_dev, *acc_dev;
   int ndevs;
 
+  gomp_mutex_lock (&acc_init_state_lock);
+  acc_init_state = initializing;
+  acc_init_thread = pthread_self ();
+  gomp_mutex_unlock (&acc_init_state_lock);
+
   base_dev = resolve_device (d, true);
 
   ndevs = base_dev->get_num_devices_func ();
@@ -234,6 +244,10 @@ acc_init_1 (acc_device_t d)
   gomp_init_device (acc_dev);
   gomp_mutex_unlock (&acc_dev->lock);
 
+  gomp_mutex_lock (&acc_init_state_lock);
+  acc_init_state = initialized;
+  gomp_mutex_unlock (&acc_init_state_lock);
+
   return base_dev;
 }
 
@@ -528,6 +542,17 @@ acc_set_device_type (acc_device_t d)
 
 ialias (acc_set_device_type)
 
+static bool
+self_initializing_p (void)
+{
+  bool res;
+  gomp_mutex_lock (&acc_init_state_lock);
+  res = (acc_init_state == initializing
+	 && pthread_equal (acc_init_thread, pthread_self ()));
+  gomp_mutex_unlock (&acc_init_state_lock);
+  return res;
+}
+
 acc_device_t
 acc_get_device_type (void)
 {
@@ -537,6 +562,15 @@ acc_get_device_type (void)
 
   if (thr && thr->base_dev)
 res = acc_device_type (thr->base_dev->type);
+  else if (self_initializing_p ())
+/* The Cuda libaccinj64.so version 9.0+ calls acc_get_device_type during the
+   acc_ev_device_init_start event callback, which is dispatched during
+   acc_init_1.  Trying to lock acc_device_lock during such a call (as we do
+   in the else clause below), will result in deadlock, since the lock has
+   already been taken by the acc_init_1 caller.  We work around this problem
+   by using the acc_get_device_type property "If the device type has not yet
+   been selected, the value acc_device_none may be returned".  */
+;
   else
 {
   gomp_init_targets_once ();
-- 
2.17.1



[patch,openacc] Fix PR71959: lto dump of callee counts

2018-09-20 Thread Cesar Philippidis
This is another old gomp4 patch that demotes an ICE in PR71959 to a
linker warning. One problem here is that it is not clear if OpenACC
allows individual member functions in C++ classes to be marked as acc
routines. There's another issue with accessing member data inside offloaded
regions. We'll add some support for member data in OpenACC 2.6, but some of
the OpenACC C++ semantics are still unclear.

Is this OK for trunk? I bootstrapped and regtested it for x86_64 Linux
with nvptx offloading.

Thanks,
Cesar
[PR71959] lto dump of callee counts

2018-XX-YY  Nathan Sidwell  
	Cesar Philippidis  

	gcc/
	* ipa-inline-analysis.c (inline_write_summary): Only dump callee
	counts when dumping the function's body.

	libgomp/
	* testsuite/libgomp.oacc-c++/pr71959.C: New.
	* testsuite/libgomp.oacc-c++/pr71959-a.C: New.

(cherry picked from gomp-4_0-branch r239788)
---
 gcc/ipa-fnsummary.c   | 18 ---
 .../testsuite/libgomp.oacc-c++/pr71959-a.C| 31 +++
 libgomp/testsuite/libgomp.oacc-c++/pr71959.C  | 31 +++
 3 files changed, 75 insertions(+), 5 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.oacc-c++/pr71959-a.C
 create mode 100644 libgomp/testsuite/libgomp.oacc-c++/pr71959.C

diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c
index 62095c6cf6f..e796b085e14 100644
--- a/gcc/ipa-fnsummary.c
+++ b/gcc/ipa-fnsummary.c
@@ -3409,8 +3409,10 @@ ipa_fn_summary_write (void)
 	  int i;
 	  size_time_entry *e;
 	  struct condition *c;
+	  int index = lto_symtab_encoder_encode (encoder, cnode);
+	  bool body = encoder->nodes[index].body;
 
-	  streamer_write_uhwi (ob, lto_symtab_encoder_encode (encoder, cnode));
+	  streamer_write_uhwi (ob, index);
 	  streamer_write_hwi (ob, info->estimated_self_stack_size);
 	  streamer_write_hwi (ob, info->self_size);
 	  info->time.stream_out (ob);
@@ -3453,10 +3455,16 @@ ipa_fn_summary_write (void)
 	info->array_index->stream_out (ob);
 	  else
 	streamer_write_uhwi (ob, 0);
-	  for (edge = cnode->callees; edge; edge = edge->next_callee)
-	write_ipa_call_summary (ob, edge);
-	  for (edge = cnode->indirect_calls; edge; edge = edge->next_callee)
-	write_ipa_call_summary (ob, edge);
+	  if (body)
+	{
+	  /* Only write callee counts when we're emitting the
+		 body, as the reader only knows about the callees when
+		 the body's emitted.  */
+	  for (edge = cnode->callees; edge; edge = edge->next_callee)
+		write_ipa_call_summary (ob, edge);
+	  for (edge = cnode->indirect_calls; edge; edge = edge->next_callee)
+		write_ipa_call_summary (ob, edge);
+	}
 	}
 }
   streamer_write_char_stream (ob->main_stream, 0);
diff --git a/libgomp/testsuite/libgomp.oacc-c++/pr71959-a.C b/libgomp/testsuite/libgomp.oacc-c++/pr71959-a.C
new file mode 100644
index 000..9486512d0e7
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c++/pr71959-a.C
@@ -0,0 +1,31 @@
+// { dg-do compile }
+
+struct Iter 
+{
+  int *cursor;
+
+  void ctor (int *cursor_) asm("_ZN4IterC1EPi");
+  int *point () const asm("_ZNK4Iter5pointEv");
+};
+
+#pragma acc routine
+void  Iter::ctor (int *cursor_)
+{
+  cursor = cursor_;
+}
+
+#pragma acc routine
+int *Iter::point () const
+{
+  return cursor;
+}
+
+void apply (int (*fn)(), Iter out) asm ("_ZN5Apply5applyEPFivE4Iter");
+
+#pragma acc routine
+void apply (int (*fn)(), struct Iter out)
+{ *out.point() = fn (); }
+
+extern "C" void __gxx_personality_v0 ()
+{
+}
diff --git a/libgomp/testsuite/libgomp.oacc-c++/pr71959.C b/libgomp/testsuite/libgomp.oacc-c++/pr71959.C
new file mode 100644
index 000..169bf4aad17
--- /dev/null
+++ b/libgomp/testsuite/libgomp.oacc-c++/pr71959.C
@@ -0,0 +1,31 @@
+// { dg-additional-sources "pr71959-a.C" }
+
+// pr lto/71959 ICEd LTO due to mismatch between writing & reading behaviour
+
+struct Iter
+{
+  int *cursor;
+  
+  Iter(int *cursor_) : cursor(cursor_) {}
+
+  int *point() const { return cursor; }
+};
+
+#pragma acc routine seq
+int one () { return 1; }
+
+struct Apply
+{
+  static void apply (int (*fn)(), Iter out)
+  { *out.point() = fn (); }
+};
+
+int main ()
+{
+  int x;
+  
+#pragma acc parallel copyout(x)
+  Apply::apply (one, Iter (&x));
+
+  return x != 1;
+}
-- 
2.17.1



[patch,openacc] Propagate independent clause for OpenACC kernels pass

2018-09-20 Thread Cesar Philippidis
This is another old patch that teaches the omp expansion pass how to
propagate the acc loop independent clause to the later stages of
compilation. Unfortunately, it didn't include any test cases. I'm not
sure how effective this will be with the existing kernel parloops pass.
But as I noted in my Cauldron talk, we would like to convert acc kernels
regions to acc parallel regions, and this patch could help in that regard.

Chung-Lin, do you have any more state on this patch?

Anyway, I bootstrapped and regtested it for x86_64 Linux with nvptx
offloading and it didn't introduce any regressions. We do have a couple
of other standalone kernels patches in og8, but those depend on other
patches.

Thanks,
Cesar
[OpenACC] Propagate independent clause for OpenACC kernels pass

2018-XX-YY  Chung-Lin Tang 
	Cesar Philippidis  

	gcc/
	* cfgloop.h (struct loop): Add 'bool marked_independent' field.
	* omp-expand.c (struct omp_region): Add 'int kind' and
	'bool independent' fields.
	(expand_omp_for): Set 'marked_independent' field for loop
	corresponding to region.
	(find_omp_for_region_data): New function.
	(build_omp_regions_1): Set kind field.  Call
	find_omp_for_region_data for GIMPLE_OMP_FOR statements.

(cherry picked from gomp-4_0-branch r225759)
---
 gcc/cfgloop.h|  4 
 gcc/omp-expand.c | 46 --
 2 files changed, 48 insertions(+), 2 deletions(-)

diff --git a/gcc/cfgloop.h b/gcc/cfgloop.h
index 80a31c416ca..7928681b514 100644
--- a/gcc/cfgloop.h
+++ b/gcc/cfgloop.h
@@ -221,6 +221,10 @@ struct GTY ((chain_next ("%h.next"))) loop {
   /* True if the loop is part of an oacc kernels region.  */
   unsigned in_oacc_kernels_region : 1;
 
+  /* True if loop is tagged as having independent iterations by user,
+ e.g. the OpenACC independent clause.  */
+  bool marked_independent;
+
   /* The number of times to unroll the loop.  0 means no information given,
  just do what we always do.  A value of 1 means do not unroll the loop.
  A value of USHRT_MAX means unroll with no specific unrolling factor.
diff --git a/gcc/omp-expand.c b/gcc/omp-expand.c
index 9b03f62e065..427f329d35f 100644
--- a/gcc/omp-expand.c
+++ b/gcc/omp-expand.c
@@ -107,6 +107,12 @@ struct omp_region
 
   /* True if this is nested inside an OpenACC kernels construct.  */
   bool inside_kernels_p;
+
+  /* Records a generic kind field.  */
+  int kind;
+
+  /* For an OpenACC loop directive, true if has the 'independent' clause.  */
+  bool independent;
 };
 
 static struct omp_region *root_omp_region;
@@ -5705,8 +5711,15 @@ expand_omp_for (struct omp_region *region, gimple *inner_stmt)
 loops_state_set (LOOPS_NEED_FIXUP);
 
   if (region->inside_kernels_p)
-expand_omp_for_generic (region, &fd, BUILT_IN_NONE, BUILT_IN_NONE,
-			inner_stmt);
+{
+  expand_omp_for_generic (region, &fd, BUILT_IN_NONE, BUILT_IN_NONE,
+			  inner_stmt);
+  if (region->independent && region->cont->loop_father)
+	{
+	  struct loop *loop = region->cont->loop_father;
+	  loop->marked_independent = true;
+	}
+}
   else if (gimple_omp_for_kind (fd.for_stmt) & GF_OMP_FOR_SIMD)
 expand_omp_simd (region, &fd);
   else if (gimple_omp_for_kind (fd.for_stmt) == GF_OMP_FOR_KIND_OACC_LOOP)
@@ -7887,6 +7900,31 @@ expand_omp (struct omp_region *region)
 }
 }
 
+/* Fill in additional data for a region REGION associated with an
+   OMP_FOR STMT.  */
+
+static void
+find_omp_for_region_data (struct omp_region *region, gomp_for *stmt)
+{
+  region->kind = gimple_omp_for_kind (stmt);
+
+  if (region->kind == GF_OMP_FOR_KIND_OACC_LOOP)
+{
+  struct omp_region *target_region = region->outer;
+  while (target_region
+	 && target_region->type != GIMPLE_OMP_TARGET)
+	target_region = target_region->outer;
+  if (!target_region)
+	return;
+
+  tree clauses = gimple_omp_for_clauses (stmt);
+
+  if (target_region->kind == GF_OMP_TARGET_KIND_OACC_KERNELS
+	  && omp_find_clause (clauses, OMP_CLAUSE_INDEPENDENT))
+	region->independent = true;
+}
+}
+
 /* Helper for build_omp_regions.  Scan the dominator tree starting at
block BB.  PARENT is the region that contains BB.  If SINGLE_TREE is
true, the function ends once a single tree is built (otherwise, whole
@@ -7953,6 +7991,8 @@ build_omp_regions_1 (basic_block bb, struct omp_region *parent,
 		case GF_OMP_TARGET_KIND_OACC_KERNELS:
 		case GF_OMP_TARGET_KIND_OACC_DATA:
 		case GF_OMP_TARGET_KIND_OACC_HOST_DATA:
+		  if (is_gimple_omp_oacc (stmt))
+		region->kind = gimple_omp_target_kind (stmt);
 		  break;
 		case GF_OMP_TARGET_KIND_UPDATE:
 		case GF_OMP_TARGET_KIND_ENTER_DATA:
@@ -7974,6 +8014,8 @@ build_omp_regions_1 (basic_block bb, struct omp_region *parent,
 	/* #pragma omp ordered depend is also just a stand-alone
 	   directive.  */
 	region = NULL;
+	  else if (code == GIMPLE_OMP_FOR)
+	    find_omp_for_region_data (region, as_a <gomp_for *> (stmt));
 	  /* ..., this directive becomes the parent for a ne

[patch,openacc] Set safelen to INT_MAX for oacc independent pragma

2018-09-20 Thread Cesar Philippidis
This is another old gomp4 OpenACC patch which impacts targets that use
simd vectorization, such as the host and AMD GCN, rather than nvptx.
Basically, as the subject states, it sets safelen to INT_MAX for
independent acc loops, which I believe is already being done for OpenMP
in certain situations.
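For readers less familiar with safelen: it is the same promise at the user level that `independent' makes in the source. A minimal user-level illustration (not the omp-expand internals; without -fopenacc the pragma is simply ignored, so this compiles and runs either way):

```c
#include <assert.h>

/* `independent' asserts there are no cross-iteration dependences, which
   is the promise loop->safelen == INT_MAX passes on to the vectorizer.  */
static int saxpy_checksum (void)
{
  float x[8], y[8];
  for (int i = 0; i < 8; i++)
    {
      x[i] = 1.0f;
      y[i] = 1.0f;
    }
#pragma acc loop independent
  for (int i = 0; i < 8; i++)
    y[i] = 2.0f * x[i] + y[i];  /* each iteration writes only its own y[i] */
  int sum = 0;
  for (int i = 0; i < 8; i++)
    sum += (int) y[i];
  return sum;                   /* 8 elements, each 3.0f */
}
```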

The original discussion for this patch can be found here
.

Is this patch OK for trunk? I bootstrapped and regtested it for x86_64
Linux with nvptx offloading.

Thanks,
Cesar
[OpenACC] Set safelen to INT_MAX for oacc independent pragma

2018-XX-YY  Tom de Vries  
	Cesar Philippidis  

	gcc/
	* omp-expand.c (expand_omp_for): Set loop->safelen to INT_MAX if
	marked_independent.

(cherry picked from gomp-4_0-branch r226079)
---
 gcc/omp-expand.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/omp-expand.c b/gcc/omp-expand.c
index 427f329d35f..ee147f10826 100644
--- a/gcc/omp-expand.c
+++ b/gcc/omp-expand.c
@@ -5718,6 +5718,7 @@ expand_omp_for (struct omp_region *region, gimple *inner_stmt)
 	{
 	  struct loop *loop = region->cont->loop_father;
 	  loop->marked_independent = true;
+	  loop->safelen = INT_MAX;
 	}
 }
   else if (gimple_omp_for_kind (fd.for_stmt) & GF_OMP_FOR_SIMD)
-- 
2.17.1



[patch,openacc] Update _OPENACC value and documentation for OpenACC 2.5

2018-09-20 Thread Cesar Philippidis
This patch formally introduces OpenACC 2.5 functionality in various GCC
documentation sources, along with an updated _OPENACC value in the
various offloading header files.

As of right now, GCC trunk already supports the updated OpenACC 2.5 data
clause semantics. Julian, Chung-Lin and I have been working on pushing
our remaining og8 patches to trunk (which we're down to under 30 now
from 170+). But a number of those changes involve performance tuning,
rather than new OpenACC functionality.
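For code that needs to detect the new spec level, the macro bump is observable the same way _OPENMP is. A hypothetical feature check (with -fopenacc the macro becomes 201510 after this patch, was 201306; compiled without -fopenacc it is absent and the fallback branch is taken):

```c
/* Dispatch on the OpenACC spec level advertised by the compiler.  */
static const char *acc_spec_level (void)
{
#ifndef _OPENACC
  return "no OpenACC";
#elif _OPENACC >= 201510
  return "OpenACC 2.5+";
#else
  return "OpenACC pre-2.5";
#endif
}
```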

Is this patch OK for trunk? I bootstrapped and regtested it for x86_64
Linux with nvptx offloading.

Thanks,
Cesar
[OpenACC] Update _OPENACC value and documentation for OpenACC 2.5

2018-XX-YY  Thomas Schwinge 
	Cesar Philippidis  

	gcc/c-family/
	* c-cppbuiltin.c (c_cpp_builtins): Update "_OPENACC" to "201510".
	gcc/fortran/
	* cpp.c (cpp_define_builtins): Update "_OPENACC" to "201510".
	* gfortran.texi: Update for OpenACC 2.5.
	* intrinsic.texi: Likewise.
	* invoke.texi: Likewise.
	gcc/testsuite/
	* c-c++-common/cpp/openacc-define-3.c: Update.
	* gfortran.dg/openacc-define-3.f90: Likewise.
	gcc/
	* doc/invoke.texi: Update for OpenACC 2.5.
	libgomp/
	* libgomp.texi: Update for OpenACC 2.5.
	* openacc.f90 (openacc_version): Update to "201510".
	* openacc_lib.h (openacc_version): Likewise.
	* testsuite/libgomp.oacc-fortran/openacc_version-1.f: Update.
	* testsuite/libgomp.oacc-fortran/openacc_version-2.f90: Update.

(cherry picked from gomp-4_0-branch r248057, ccbbcb70569)
---
 gcc/c-family/c-cppbuiltin.c   |  2 +-
 gcc/doc/invoke.texi   |  4 +++-
 gcc/fortran/cpp.c |  2 +-
 gcc/fortran/gfortran.texi | 16 +-
 gcc/fortran/intrinsic.texi|  6 +++---
 gcc/fortran/invoke.texi   |  4 +---
 .../c-c++-common/cpp/openacc-define-3.c   |  2 +-
 .../gfortran.dg/openacc-define-3.f90  |  2 +-
 libgomp/libgomp.texi  | 21 ++-
 libgomp/openacc.f90   |  2 +-
 libgomp/openacc_lib.h |  2 +-
 .../libgomp.oacc-fortran/openacc_version-1.f  |  2 +-
 .../openacc_version-2.f90 |  2 +-
 13 files changed, 31 insertions(+), 36 deletions(-)

diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c
index 96a6b4dfd2b..f2a273b6ac7 100644
--- a/gcc/c-family/c-cppbuiltin.c
+++ b/gcc/c-family/c-cppbuiltin.c
@@ -1391,7 +1391,7 @@ c_cpp_builtins (cpp_reader *pfile)
 cpp_define (pfile, "__SSP__=1");
 
   if (flag_openacc)
-cpp_define (pfile, "_OPENACC=201306");
+cpp_define (pfile, "_OPENACC=201510");
 
   if (flag_openmp)
 cpp_define (pfile, "_OPENMP=201511");
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index 94304c314cf..34d7ff71512 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -2161,10 +2161,12 @@ freestanding and hosted environments.
 Enable handling of OpenACC directives @code{#pragma acc} in C/C++ and
 @code{!$acc} in Fortran.  When @option{-fopenacc} is specified, the
 compiler generates accelerated code according to the OpenACC Application
-Programming Interface v2.0 @w{@uref{https://www.openacc.org}}.  This option
+Programming Interface v2.5 @w{@uref{https://www.openacc.org}}.  This option
 implies @option{-pthread}, and thus is only supported on targets that
 have support for @option{-pthread}.
 
+See @uref{https://gcc.gnu.org/wiki/OpenACC} for more information.
+
 @item -fopenacc-dim=@var{geom}
 @opindex fopenacc-dim
 @cindex OpenACC accelerator programming
diff --git a/gcc/fortran/cpp.c b/gcc/fortran/cpp.c
index 0b3de42e832..14871129ff6 100644
--- a/gcc/fortran/cpp.c
+++ b/gcc/fortran/cpp.c
@@ -165,7 +165,7 @@ cpp_define_builtins (cpp_reader *pfile)
   cpp_define (pfile, "_LANGUAGE_FORTRAN=1");
 
   if (flag_openacc)
-cpp_define (pfile, "_OPENACC=201306");
+cpp_define (pfile, "_OPENACC=201510");
 
   if (flag_openmp)
 cpp_define (pfile, "_OPENMP=201511");
diff --git a/gcc/fortran/gfortran.texi b/gcc/fortran/gfortran.texi
index 30934046a49..59a69457fe0 100644
--- a/gcc/fortran/gfortran.texi
+++ b/gcc/fortran/gfortran.texi
@@ -476,9 +476,7 @@ used on real-world programs.  In particular, the supported extensions
 include OpenMP, Cray-style pointers, some old vendor extensions, and several
 Fortran 2003 and Fortran 2008 features, including TR 15581.  However, it is
 still under development and has a few remaining rough edges.
-There also is initial support for OpenACC.
-Note that this is an experimental feature, incomplete, and subject to
-change in future versions of GCC.  See
+There also is support for OpenACC.  See
 @uref{https://gcc.gnu.org/wiki/OpenACC} for more information.
 
 At present, the GNU Fortran compiler passes the
@@ -538,10 +536,8 @@ status} and @ref{Fortran 2018 status} sections of the documentation.
 Additionally, the GNU Fortran compilers supports the OpenMP specification
 (version 4.0 and most of th

Re: [patch,openacc] handle missing OMP_LIST_ clauses in fortran's parse tree debugger

2018-09-20 Thread Cesar Philippidis
On 09/20/2018 11:22 AM, Paul Richard Thomas wrote:
> Hi Cesar,
> 
> It looks OK to me.
> 
> Thanks for the patch.
> 
> Paul

Thanks! Committed in r264446.

Cesar

> On 20 September 2018 at 18:21, Cesar Philippidis  
> wrote:
>> This patch updates Fortran's parse tree printer to print the names of
>> new OpenACC data clauses. I'm not if this functionality is widely used
>> or not, but from a standpoint of correctness, this patch would probably
>> be nice to have.
>>
>> Is this patch OK for trunk? I bootstrapped and regtested it for x86_64
>> Linux with nvptx offloading.
>>
>> Thanks,
>> Cesar
> 
> 
> 



[PATCH, rs6000] Update vec_splat references in testcases for validity.

2018-09-20 Thread Will Schmidt
Hi,

  This updates those powerpc testsuite tests that use the
vec_splat() builtin with an invalid arg1.  Per discussions during the
review of gimple-folding for vec_splat(), it was clarified
that arg1 for vec_splat() should be a valid index into the
referenced vector (no modulo applied).
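The constraint above can be modeled in portable C (a sketch, not the builtin itself): every element of the result is a copy of element arg1 of the input, and arg1 must already be a valid lane index rather than being reduced modulo the element count.

```c
#include <assert.h>
#include <string.h>

#define VEC_BYTES 16

/* Portable model of vec_splat for a 16-element char vector: replicate
   element B of SRC into every element of DST.  Per the review
   discussion, B must be a valid index (0..15); this sketch asserts
   that instead of applying a modulo.  */
static void
splat_char (const char src[VEC_BYTES], unsigned b, char dst[VEC_BYTES])
{
  assert (b < VEC_BYTES);          /* reject out-of-range splat index */
  memset (dst, src[b], VEC_BYTES); /* replicate the selected element */
}
```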

OK for trunk?

Thanks,
-Will

[testsuite]

2018-09-20  Will Schmidt  

* gcc.target/powerpc/fold-vec-splat-char.c: Remove invalid
vec_splat calls from recently added tests. Update instruction counts.
* gcc.target/powerpc/fold-vec-splat-floatdouble.c: Same.
* gcc.target/powerpc/fold-vec-splat-int.c: Same.
* gcc.target/powerpc/fold-vec-splat-pixel.c: Same.
* gcc.target/powerpc/fold-vec-splat-short.c: Same.
* g++.dg/ext/altivec-6.C: Updated vec_splat() calls.

diff --git a/gcc/testsuite/g++.dg/ext/altivec-6.C 
b/gcc/testsuite/g++.dg/ext/altivec-6.C
index 63ae0b0..4c863ef 100644
--- a/gcc/testsuite/g++.dg/ext/altivec-6.C
+++ b/gcc/testsuite/g++.dg/ext/altivec-6.C
@@ -20,9 +20,11 @@ void foo(void) {
   vec_dststt(buf, a, 3);
   vec_dststt(buf, a, 2);
 
   vp = vec_sld(vp, vp, 5);
   vbc = vec_splat(vbc, 7);
-  vbs = vec_splat(vbs, 12);
-  vp = vec_splat(vp, 17);
-  vbi = vec_splat(vbi, 31);  
+  /*  The second argument to vec_splat needs to be less than the number of
+   elements in the referenced vector.  */
+  vbs = vec_splat(vbs, 4);
+  vp = vec_splat(vp, 1);
+  vbi = vec_splat(vbi, 15);  
 }
diff --git a/gcc/testsuite/gcc.target/powerpc/fold-vec-splat-char.c 
b/gcc/testsuite/gcc.target/powerpc/fold-vec-splat-char.c
index d50d073..ca9ea3c 100644
--- a/gcc/testsuite/gcc.target/powerpc/fold-vec-splat-char.c
+++ b/gcc/testsuite/gcc.target/powerpc/fold-vec-splat-char.c
@@ -10,46 +10,31 @@
 vector bool char testb_0  (vector bool char x) { return vec_splat (x, 
0b0); }
 vector bool char testb_1  (vector bool char x) { return vec_splat (x, 
0b1); }
 vector bool char testb_2  (vector bool char x) { return vec_splat (x, 
0b00010); }
 vector bool char testb_4  (vector bool char x) { return vec_splat (x, 
0b00100); }
 vector bool char testb_8  (vector bool char x) { return vec_splat (x, 
0b01000); }
-vector bool char testb_10 (vector bool char x) { return vec_splat (x, 
0b1); }
-vector bool char testb_1e (vector bool char x) { return vec_splat (x, 
0b0); }
-vector bool char testb_1f (vector bool char x) { return vec_splat (x, 
0b1); }
 
 vector signed char tests_0  (vector signed char x) { return vec_splat (x, 
0b0); }
 vector signed char tests_1  (vector signed char x) { return vec_splat (x, 
0b1); }
 vector signed char tests_2  (vector signed char x) { return vec_splat (x, 
0b00010); }
 vector signed char tests_4  (vector signed char x) { return vec_splat (x, 
0b00100); }
 vector signed char tests_8  (vector signed char x) { return vec_splat (x, 
0b01000); }
-vector signed char tests_10 (vector signed char x) { return vec_splat (x, 
0b1); }
-vector signed char tests_1e (vector signed char x) { return vec_splat (x, 
0b0); }
-vector signed char tests_1f (vector signed char x) { return vec_splat (x, 
0b1); }
 
 vector unsigned char testu_0  (vector unsigned char x) { return vec_splat (x, 
0b0); }
 vector unsigned char testu_1  (vector unsigned char x) { return vec_splat (x, 
0b1); }
 vector unsigned char testu_2  (vector unsigned char x) { return vec_splat (x, 
0b00010); }
 vector unsigned char testu_4  (vector unsigned char x) { return vec_splat (x, 
0b00100); }
 vector unsigned char testu_8  (vector unsigned char x) { return vec_splat (x, 
0b01000); }
-vector unsigned char testu_10 (vector unsigned char x) { return vec_splat (x, 
0b1); }
-vector unsigned char testu_1e (vector unsigned char x) { return vec_splat (x, 
0b0); }
-vector unsigned char testu_1f (vector unsigned char x) { return vec_splat (x, 
0b1); }
 
 /* Similar tests as above, but the source vector is a known constant. */
 const vector bool char by = 
{'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p'};
 const vector signed char sy = 
{'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p'};
 const vector unsigned char uy = 
{'a','b','c','d','e','f','g','h','i','j','k','l','m','n','o','p'};
 
 vector bool char test_bc (vector bool char x) { return vec_splat (by, 
0b00010); }
 vector signed char test_sc (vector signed char x) { return vec_splat (sy, 
0b00011); }
 vector unsigned char test_uc (vector unsigned char x) { return vec_splat (uy, 
0b00110); }
 
-/* Similar tests as above, mask is greater than number of elements in the
- source vector.  */
-vector bool char test_obc (vector bool char x) { return vec_splat (by, 
0b10010); }
-vector signed char test_osc (vector signed char x) { return vec_splat (sy, 
0b10011); }
-vector unsigned char test_ouc (vector unsigned char x) { return vec_splat (uy, 
0b10110); }
-
 // vec_splat() using variable vectors should generate the vspltb instruction.
-/* { dg-final { scan-assemble

Re: [patch,openacc] Generate sequential loop for OpenACC loop directive inside kernels

2018-09-20 Thread Cesar Philippidis
On 09/20/2018 10:14 AM, Cesar Philippidis wrote:
> As Chung-Lin noted here
> :
> 
>   This patch adjusts omp-low.c:expand_omp_for_generic() to expand to a
>   "sequential" loop form (without the OMP runtime calls), used for loop
>   directives inside OpenACC kernels constructs. Tom mentions that this
>   allows the kernels parallelization to work when '#pragma acc loop'
>   makes the front-ends create OMP_FOR, which the loop analysis phases
>   don't understand.
> 
> I bootstrapped and regtested it on x86_64 Linux with nvptx offloading.
> Is this patch OK for trunk?
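For reference, the "sequential" loop form Chung-Lin describes can be sketched in plain C (names hypothetical): instead of emitting GOMP runtime calls to fetch chunk bounds, the OMP_FOR inside an acc kernels region lowers to an ordinary loop that the parloops analysis can still understand.

```c
#include <assert.h>

/* Hypothetical sketch of the expanded "sequential" form: no
   GOMP_loop_* runtime calls remain, just a plain counted loop, so
   later loop analysis (parloops) can parallelize it.  */
static void
loop_sequential_form (int *out, int n)
{
  for (int i = 0; i < n; i++)
    out[i] = 2 * i;
}
```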

I forgot to mention how that patch depends on the
omp_target_base_pointers_restrict_p functionality from omp lowering that
I removed back in June when I added support for the OpenACC 2.5 data
clause semantics. It turned out that I was too aggressive when
removing unused code: there were no test cases in trunk that exercised
that functionality, at least until Chung-Lin's kernels patch goes in.

Anyway, this patch is specifically required to get
kernels-acc-loop-reduction.c working.

Is this OK for trunk? I bootstrapped and regression tested it on x86_64
Linux with nvptx offloading.

Thanks,
Cesar
[OpenACC] Reintroduce omp_target_base_pointers_restrict_p

It turns out that the existing acc kernels infrastructure based on
parloops benefits if the variables used in OpenACC data clauses
maintain the restrict pointer qualifier. This code is present in GCC
8, but I removed it back in June when I committed a patch to update
the behavior of the data clauses to match the semantics in OpenACC 2.5.
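A minimal sketch (not the patch itself) of why restrict on the receiver-side base pointers helps: if the two copyout'd base pointers are restrict-qualified, the compiler may assume they do not alias, so the stores in the offloaded region are independent and can be reordered or parallelized.

```c
#include <assert.h>

/* Models the offloaded body of the kernels example in the patch
   comment: with restrict, the store through A cannot clobber B[0]
   and vice versa.  */
static void
copyout_pair (unsigned *restrict a, unsigned *restrict b)
{
  a[0] = 0;   /* known not to alias b[0] */
  b[0] = 1;
}
```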

Is this patch OK for trunk? A forthcoming acc kernels patch depends on
it.

2018-XX-YY  Cesar Philippidis  

	* omp-low.c (install_var_field): New base_pointer_restrict
	argument.
	(scan_sharing_clauses): Update call to install_var_field.
	(omp_target_base_pointers_restrict_p): New function.
	(scan_omp_target): Update call to install_var_field.
---
 gcc/omp-low.c | 89 +++
 1 file changed, 83 insertions(+), 6 deletions(-)

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index 24685fd012c..a59c15ae5fd 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -642,7 +642,8 @@ build_sender_ref (tree var, omp_context *ctx)
BASE_POINTERS_RESTRICT, declare the field with restrict.  */
 
 static void
-install_var_field (tree var, bool by_ref, int mask, omp_context *ctx)
+install_var_field (tree var, bool by_ref, int mask, omp_context *ctx,
+		   bool base_pointers_restrict = false)
 {
   tree field, type, sfield = NULL_TREE;
   splay_tree_key key = (splay_tree_key) var;
@@ -673,7 +674,11 @@ install_var_field (tree var, bool by_ref, int mask, omp_context *ctx)
   type = build_pointer_type (build_pointer_type (type));
 }
   else if (by_ref)
-type = build_pointer_type (type);
+{
+  type = build_pointer_type (type);
+  if (base_pointers_restrict)
+	type = build_qualified_type (type, TYPE_QUAL_RESTRICT);
+}
   else if ((mask & 3) == 1 && omp_is_reference (var))
 type = TREE_TYPE (type);
 
@@ -987,10 +992,12 @@ fixup_child_record_type (omp_context *ctx)
 }
 
 /* Instantiate decls as necessary in CTX to satisfy the data sharing
-   specified by CLAUSES.  */
+   specified by CLAUSES.  If BASE_POINTERS_RESTRICT, install var field with
+   restrict.  */
 
 static void
-scan_sharing_clauses (tree clauses, omp_context *ctx)
+scan_sharing_clauses (tree clauses, omp_context *ctx,
+		  bool base_pointers_restrict = false)
 {
   tree c, decl;
   bool scan_array_reductions = false;
@@ -1252,7 +1259,8 @@ scan_sharing_clauses (tree clauses, omp_context *ctx)
 		  && TREE_CODE (TREE_TYPE (decl)) == ARRAY_TYPE)
 		install_var_field (decl, true, 7, ctx);
 		  else
-		install_var_field (decl, true, 3, ctx);
+		install_var_field (decl, true, 3, ctx,
+   base_pointers_restrict);
 		  if (is_gimple_omp_offloaded (ctx->stmt)
 		  && !OMP_CLAUSE_MAP_IN_REDUCTION (c))
 		install_var_local (decl, ctx);
@@ -2265,6 +2273,68 @@ scan_omp_single (gomp_single *stmt, omp_context *outer_ctx)
 layout_type (ctx->record_type);
 }
 
+/* Return true if the CLAUSES of an omp target guarantee that the base pointers
+   used in the corresponding offloaded function are restrict.  */
+
+static bool
+omp_target_base_pointers_restrict_p (tree clauses)
+{
+  /* The analysis relies on the GOMP_MAP_FORCE_* mapping kinds, which are only
+ used by OpenACC.  */
+  if (flag_openacc == 0)
+return false;
+
+  /* I.  Basic example:
+
+   void foo (void)
+   {
+	 unsigned int a[2], b[2];
+
+	 #pragma acc kernels \
+	   copyout (a) \
+	   copyout (b)
+	 {
+	   a[0] = 0;
+	   b[0] = 1;
+	 }
+   }
+
+ After gimplification, we have:
+
+   #pragma omp target oacc_kernels \
+	 map(force_from:a [len: 8]) \
+	 map(force_from:b [len: 8])
+   {
+	 a[0] = 0;
+	 b[0] = 1;
+   }
+
+ Because both mappings have

[PATCH] rs6000: Remove -misel={yes,no}

2018-09-20 Thread Segher Boessenkool
These options have been deprecated for many years, supplanted by -misel
and -mno-isel.  This patch finally removes them.

Committing to trunk.


2018-09-20  Segher Boessenkool  

* config/rs6000/rs6000.opt (misel=no, misel=yes): Delete.
* doc/invoke.texi (RS/6000 and PowerPC Options): Delete -misel=yes and
-misel=no.

---
 gcc/config/rs6000/rs6000.opt | 8 
 gcc/doc/invoke.texi  | 5 -
 2 files changed, 13 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.opt b/gcc/config/rs6000/rs6000.opt
index 138ce26..fc147b0 100644
--- a/gcc/config/rs6000/rs6000.opt
+++ b/gcc/config/rs6000/rs6000.opt
@@ -341,14 +341,6 @@ misel
 Target Report Mask(ISEL) Var(rs6000_isa_flags)
 Generate isel instructions.
 
-misel=no
-Target RejectNegative Alias(misel) NegativeAlias Warn(%<-misel=no%> is 
deprecated; use %<-mno-isel%> instead)
-Deprecated option.  Use -mno-isel instead.
-
-misel=yes
-Target RejectNegative Alias(misel) Warn(%<-misel=yes%> is deprecated; use 
%<-misel%> instead)
-Deprecated option.  Use -misel instead.
-
 mdebug=
 Target RejectNegative Joined
 -mdebug=   Enable debug output.
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index cfa9c14..b3b50c2 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -1113,7 +1113,6 @@ See RS/6000 and PowerPC Options.
 -mblock-compare-inline-loop-limit=@var{num} @gol
 -mstring-compare-inline-limit=@var{num} @gol
 -misel  -mno-isel @gol
--misel=yes  -misel=no @gol
 -mvrsave  -mno-vrsave @gol
 -mmulhw  -mno-mulhw @gol
 -mdlmzb  -mno-dlmzb @gol
@@ -23936,10 +23935,6 @@ This is a PowerPC 32-bit SYSV ABI option.
 @opindex mno-isel
 This switch enables or disables the generation of ISEL instructions.
 
-@item -misel=@var{yes/no}
-This switch has been deprecated.  Use @option{-misel} and
-@option{-mno-isel} instead.
-
 @item -mvsx
 @itemx -mno-vsx
 @opindex mvsx
-- 
1.8.3.1



[patch, rfc] Clobber scalar intent(out) variables on entry

2018-09-20 Thread Thomas König

Hi,

the patch below tries to clobber scalar intent(out) arguments
on procedure entry.

Index: trans-decl.c
===
--- trans-decl.c(Revision 264423)
+++ trans-decl.c(Arbeitskopie)
@@ -4143,6 +4143,19 @@ init_intent_out_dt (gfc_symbol * proc_sym, gfc_wra

gfc_add_expr_to_block (&init, tmp);
   }
+    else if (f->sym->attr.dummy && !f->sym->attr.dimension
+	     && f->sym->attr.intent == INTENT_OUT
+	     && !f->sym->attr.codimension && !f->sym->attr.allocatable
+	     && (f->sym->ts.type != BT_CLASS
+		 || (!CLASS_DATA (f->sym)->attr.dimension
+		     && !(CLASS_DATA (f->sym)->attr.codimension
+			  && CLASS_DATA (f->sym)->attr.allocatable))))
+      {
+	tree t1, t2;
+	t1 = build_fold_indirect_ref_loc (input_location,
+					  f->sym->backend_decl);
+	t2 = build_clobber (TREE_TYPE (t1));
+	gfc_add_modify (&init, t1, t2);
+      }

   gfc_add_init_cleanup (block, gfc_finish_block (&init), NULL_TREE);
 }

With this patch,

module x
  contains
subroutine foo(a)
  real, intent(out) :: a
  a =  21.
  a = a + 22.
end subroutine foo
end module x

generates, with -fdump-tree-original

foo (real(kind=4) & restrict a)
{
  *a = {CLOBBER};
  *a = 2.1e+1;
  *a = *a + 2.2e+1;
}
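A C analogue of the dump above (a sketch, not generated code): the {CLOBBER} marks *a as undefined on procedure entry, which licenses the optimizers to delete any earlier store the caller made to the argument. The observable semantics of the subroutine stay the same:

```c
#include <assert.h>

/* Models "real, intent(out) :: a".  The commented-out clobber line
   shows where the middle end is told the old contents of *a are
   unusable from here on.  */
static void
foo (float *a)
{
  /* *a = {CLOBBER};  -- old contents undefined on entry */
  *a = 21.0f;
  *a = *a + 22.0f;
}
```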

Is this the right way to proceed?

(The if statement is not yet correct, so this version causes
regressions that would have to be adjusted.)

Regards

Thomas


Re: [Patch, Fortran, OOP] PR 46313: OOP-ABI issue, ALLOCATE issue, CLASS renaming issue

2018-09-20 Thread Janus Weil
Am Mi., 19. Sep. 2018 um 16:50 Uhr schrieb Bernhard Reutner-Fischer
:
>
> On Mon, 17 Sep 2018 at 22:25, Janus Weil  wrote:
>
> > The regtest was successful. I don't think the off-by-two error for the
> > vtab/vtype comparisons is a big problem in practice, since the number
> > of internal symbols with leading underscores is very limited, but of
> > course it should still be fixed ...
>
> Luckily it should make no difference indeed as "__vta" and "__vtyp"
> are only used for this one purpose.
> I don't think the DTIO op keyword fix would hit any real user either.
> Thanks for taking care of it, patch LGTM.

I have now committed this as r264448.

Cheers,
Janus


Re: [Patch][GCC] Document and fix -r (partial linking)

2018-09-20 Thread Joseph Myers
On Sat, 1 Sep 2018, Allan Sandfeld Jensen wrote:

> On Montag, 27. August 2018 15:37:15 CEST Joseph Myers wrote:
> > On Sun, 26 Aug 2018, Allan Sandfeld Jensen wrote:
> > > Patch updated. I specifically edited a number of the existing tests that
> > > used both -r and -nostdlib and removed -nostdlib so the patch is
> > > exercised by existing tests. The patch bootstrapped, I didn't notice any
> > > relevant failures when running the test suite (though I could have missed
> > > something, I am never comfortable reading that output).
> > 
> > Note that Iain's comments also included that the patch is incomplete
> > because of more specs in gcc.c (VTABLE_VERIFICATION_SPEC,
> > SANITIZER_EARLY_SPEC, SANITIZER_SPEC) that needs corresponding updates to
> > handle -r like -nostdlib.
> 
> Updated (but tests not rerun)

Thanks, I've now committed the patch.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH 08/25] Fix co-array allocation

2018-09-20 Thread Thomas Koenig

Hi Damian,


On a related note, two Sourcery Institute developers have attempted to edit
the GCC build system to make the downloading and building of OpenCoarrays
automatically part of the gfortran build process.  Neither developer
succeeded.


We addressed integrating OpenCoarray into the gcc source tree at the
recent Gcc summit during the gfortran BoF session.

Feedback from people working for big Linux distributions was that they
would prefer to package OpenCoarrays as a separate library.
(They also mentioned it was quite hard to build.)

Maybe these people could use some help from you.

Regards

Thomas


Re: C++ PATCH to refine c++/87109 patch

2018-09-20 Thread Marek Polacek
On Thu, Sep 20, 2018 at 12:20:08PM -0400, Jason Merrill wrote:
> On Thu, Sep 20, 2018 at 11:53 AM, Marek Polacek  wrote:
> > On Thu, Sep 20, 2018 at 11:25:38AM -0400, Jason Merrill wrote:
> >> On Wed, Sep 19, 2018 at 9:50 PM, Marek Polacek  wrote:
> >> > Aaaand this addresses 
> >> > ,
> >> > as I promised earlier.  I hope I got it right.
> >> >
> >> > Bootstrapped/regtested on x86_64-linux, ok for trunk?
> >> >
> >> > 2018-09-19  Marek Polacek  
> >> >
> >> > PR c++/87109 - wrong ctor with maybe-rvalue semantics.
> >> > * call.c (build_user_type_conversion_1): Refine the maybe-rvalue
> >> > check to only return if we're converting from a base class.
> >> >
> >> > * g++.dg/cpp0x/ref-qual19.C: Adjust the expected results.
> >> > * g++.dg/cpp0x/ref-qual20.C: New test.
> >> >
> >> > diff --git gcc/cp/call.c gcc/cp/call.c
> >> > index ddf0ed044a0..4bbd77b9cef 100644
> >> > --- gcc/cp/call.c
> >> > +++ gcc/cp/call.c
> >> > @@ -4034,9 +4034,13 @@ build_user_type_conversion_1 (tree totype, tree 
> >> > expr, int flags,
> >> >  conv->bad_p = true;
> >> >
> >> >/* We're performing the maybe-rvalue overload resolution and
> >> > - a conversion function is in play.  This isn't going to work
> >> > - because we would not end up with a suitable constructor.  */
> >> > -  if ((flags & LOOKUP_PREFER_RVALUE) && !DECL_CONSTRUCTOR_P (cand->fn))
> >> > + a conversion function is in play.  If we're converting from
> >> > + a base class to a derived class, reject the conversion.  */
> >> > +  if ((flags & LOOKUP_PREFER_RVALUE)
> >> > +  && !DECL_CONSTRUCTOR_P (cand->fn)
> >> > +  && CLASS_TYPE_P (fromtype)
> >> > +  && CLASS_TYPE_P (totype)
> >> > +  && DERIVED_FROM_P (fromtype, totype))
> >>
> >> Here fromtype is the type we're converting from, and what we want to
> >> reject is converting the return value of the conversion op to a base
> >> class.  CLASS_TYPE_P (fromtype) will always be true, since it has a
> >> conversion op.  And I think we also want to handle the case of totype
> >> being a reference.
> >
> > I think I totally misunderstood what this was about.  It's actually about
> > this case
> >
> > struct Y { int y; };
> > struct X : public Y { int x; };
> >
> > struct A {
> >   operator X();
> > };
> >
> > Y
> > fn (A a)
> > {
> >   return a;
> > }
> >
> > where we want to avoid slicing of X when converting X to Y, yes?
> 
> Yes.
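In plain C terms (a hypothetical model, for illustration only), the slicing being avoided looks like this: X derives from Y, so converting the X returned by A::operator X() down to Y would keep only the Y subobject and discard the derived part.

```c
#include <assert.h>

struct Y { int y; };
struct X { struct Y base; int x; };  /* models "struct X : public Y" */

/* Converting an X to a Y keeps only the base subobject; the derived
   member v.x is sliced away.  */
static struct Y
to_Y (struct X v)
{
  return v.base;
}
```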

Got it.  So I think the following is the real fix then:

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2018-09-20  Marek Polacek  

PR c++/87109 - wrong ctor with maybe-rvalue semantics.
* call.c (build_user_type_conversion_1): Refine the maybe-rvalue
check to only return if we're converting the return value to a base
class.

* g++.dg/cpp0x/ref-qual19.C: Adjust the expected results.
* g++.dg/cpp0x/ref-qual20.C: New test.

diff --git gcc/cp/call.c gcc/cp/call.c
index ddf0ed044a0..b2ca667c8b4 100644
--- gcc/cp/call.c
+++ gcc/cp/call.c
@@ -4034,10 +4034,12 @@ build_user_type_conversion_1 (tree totype, tree expr, 
int flags,
 conv->bad_p = true;
 
   /* We're performing the maybe-rvalue overload resolution and
- a conversion function is in play.  This isn't going to work
- because we would not end up with a suitable constructor.  */
+ a conversion function is in play.  Reject converting the return
+ value of the conversion function to a base class.  */
   if ((flags & LOOKUP_PREFER_RVALUE) && !DECL_CONSTRUCTOR_P (cand->fn))
-return NULL;
+for (conversion *t = cand->second_conv; t; t = next_conversion (t))
+  if (t->kind == ck_base)
+   return NULL;
 
   /* Remember that this was a list-initialization.  */
   if (flags & LOOKUP_NO_NARROWING)
diff --git gcc/testsuite/g++.dg/cpp0x/ref-qual19.C 
gcc/testsuite/g++.dg/cpp0x/ref-qual19.C
index 8494b83e5b0..50f92977c49 100644
--- gcc/testsuite/g++.dg/cpp0x/ref-qual19.C
+++ gcc/testsuite/g++.dg/cpp0x/ref-qual19.C
@@ -85,13 +85,13 @@ int
 main ()
 {
   C c1 = f (A());
-  if (c1.i != 1)
+  if (c1.i != 2)
 __builtin_abort ();
   C c2 = f2 (A());
   if (c2.i != 2)
 __builtin_abort ();
   C c3 = f3 ();
-  if (c3.i != 1)
+  if (c3.i != 2)
 __builtin_abort ();
   C c4 = f4 ();
   if (c4.i != 2)
@@ -100,13 +100,13 @@ main ()
   if (c5.i != 2)
 __builtin_abort ();
   D c6 = f6 (B());
-  if (c6.i != 3)
+  if (c6.i != 4)
 __builtin_abort ();
   D c7 = f7 (B());
   if (c7.i != 4)
 __builtin_abort ();
   D c8 = f8 ();
-  if (c8.i != 3)
+  if (c8.i != 4)
 __builtin_abort ();
   D c9 = f9 ();
   if (c9.i != 4)
diff --git gcc/testsuite/g++.dg/cpp0x/ref-qual20.C 
gcc/testsuite/g++.dg/cpp0x/ref-qual20.C
index e69de29bb2d..c8bd43643af 100644
--- gcc/testsuite/g++.dg/cpp0x/ref-qual20.C
+++ gcc/testsuite/g++.dg/cpp0x/ref-qual20.C
@@ -0,0 +1,70 @@
+// PR c++/87109
+// { dg-do run { target c++11 } }
+
+#include 
+
+struct 

Re: [PATCH 08/25] Fix co-array allocation

2018-09-20 Thread Damian Rouson
On Thu, Sep 20, 2018 at 1:01 PM Thomas Koenig  wrote:

>
> We addressed integrating OpenCoarray into the gcc source tree at the
> recent Gcc summit during the gfortran BoF session.
>

I agree with keeping it as a separate code base, but comments from some
gfortran developers on the gfortran mailing list suggest that they liked
the idea of integrating the building of OpenCoarrays into the GCC build
system to simplify multi-image testing.


> Feedback from people working for big Linux distributions was that they
> would prefer to package OpenCoarrays as a separate library.
> (They also mentioned it was quite hard to build.)
>
> Maybe these people could use some help from you.
>

Thanks for the feedback.  Please feel free to put me in touch with them or
suggest that they submit issues on the OpenCoarrays repository.  We would
be glad to help.  We've put a lot of time into addressing installation
issues that have been submitted to us and we'll continue to do so if we
receive reports.

Damian


Re: [PATCH 1/2] [ARC] Check for odd-even register when emitting double mac ops.

2018-09-20 Thread Andrew Burgess
* Claudiu Zissulescu  [2018-09-17 15:50:26 +0300]:

> Avoid generating dmac instructions when the register is not odd-even;
> use the equivalent mac instruction instead.
> 
> gcc/
>   Claudiu Zissulescu  
> 
>   * config/arc/arc.md (maddsidi4_split): Don't use dmac if the
>   destination register is not odd-even.
>   (umaddsidi4_split): Likewise.
> 
> gcc/testsuite/
>   Claudiu Zissulescu  
> 
>   * gcc.target/arc/tmac-3.c: New file.

Looks good thanks, with one minor nit below...

> ---
>  gcc/config/arc/arc.md |  4 ++--
>  gcc/testsuite/gcc.target/arc/tmac-3.c | 17 +
>  2 files changed, 19 insertions(+), 2 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/arc/tmac-3.c
> 
> diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
> index dbcd7098bec..2d108ef166d 100644
> --- a/gcc/config/arc/arc.md
> +++ b/gcc/config/arc/arc.md
> @@ -6078,7 +6078,7 @@ core_3, archs4x, archs4xd, archs4xd_slow"
>"{
> rtx acc_reg = gen_rtx_REG (DImode, ACC_REG_FIRST);
> emit_move_insn (acc_reg, operands[3]);
> -   if (TARGET_PLUS_MACD)
> +   if (TARGET_PLUS_MACD && even_register_operand (operands[0], DImode))
>   emit_insn (gen_macd (operands[0], operands[1], operands[2]));
> else
>   {
> @@ -6178,7 +6178,7 @@ core_3, archs4x, archs4xd, archs4xd_slow"
>"{
> rtx acc_reg = gen_rtx_REG (DImode, ACC_REG_FIRST);
> emit_move_insn (acc_reg, operands[3]);
> -   if (TARGET_PLUS_MACD)
> +   if (TARGET_PLUS_MACD && even_register_operand (operands[0], DImode))
>   emit_insn (gen_macdu (operands[0], operands[1], operands[2]));
> else
>   {
> diff --git a/gcc/testsuite/gcc.target/arc/tmac-3.c 
> b/gcc/testsuite/gcc.target/arc/tmac-3.c
> new file mode 100644
> index 000..3c8c1201f83
> --- /dev/null
> +++ b/gcc/testsuite/gcc.target/arc/tmac-3.c
> @@ -0,0 +1,17 @@
> +/* { dg-do compile } */
> +/* { dg-skip-if "" { ! { clmcpu } } } */
> +/* { dg-options "-mcpu=hs38 -Os" } */
> +
> +/* The compiler will assign r1r2 as a DI register, but it doesn't fit
> +   the macd operation, hence we need to fall back on the mac
> +   instruction.  */
> +typedef long long myint64_t;
> +
> +extern int d (int, myint64_t);
> +int b (int c)
> +{
> +  int x = (int) d;
> +  d(c, (myint64_t)x * 2 + 1);

Could you apply GNU coding standard whitespace on this line please.

Thanks,
Andrew

> +}
> +
> +/* { dg-final { scan-assembler "mac\\\s+r1" } } */
> -- 
> 2.17.1
> 


Re: [PATCH 2/2] [ARC] Avoid specific constants to end in limm field.

2018-09-20 Thread Andrew Burgess
* Claudiu Zissulescu  [2018-09-17 15:50:27 +0300]:

> The 3-operand instructions accept an immediate in the second
> operand. However, this immediate will end up in the long-immediate
> (limm) field. This patch keeps constants out of the limm field for
> particular instructions when compiling for size.
> 
> gcc/
> -xx-xx  Claudiu Zissulescu  
> 
>   * config/arc/arc.md (*add_n): Clean up pattern, update instruction
>   constraints.
>   (ashlsi3_insn): Update instruction constraints.
>   (ashrsi3_insn): Likewise.
>   (rotrsi3): Likewise.
>   (add_shift): Likewise.
>   * config/arc/constraints.md (Csz): New 32 bit constraint. It
>   avoids placing in the limm field small constants which, otherwise,
>   could end into a small instruction.
> ---
>  gcc/config/arc/arc.md   | 51 +---
>  gcc/config/arc/constraints.md   |  6 +++
>  gcc/testsuite/gcc.target/arc/tph_addx.c | 53 +
>  3 files changed, 78 insertions(+), 32 deletions(-)
>  create mode 100644 gcc/testsuite/gcc.target/arc/tph_addx.c

Looks good.

Thanks,
Andrew


> 
> diff --git a/gcc/config/arc/arc.md b/gcc/config/arc/arc.md
> index 2d108ef166d..c28a87cd3b0 100644
> --- a/gcc/config/arc/arc.md
> +++ b/gcc/config/arc/arc.md
> @@ -3056,30 +3056,17 @@ core_3, archs4x, archs4xd, archs4xd_slow"
> (set (match_dup 3) (match_dup 4))])
>  
>  (define_insn "*add_n"
> -  [(set (match_operand:SI 0 "dest_reg_operand" "=Rcqq,Rcw,W,W,w,w")
> - (plus:SI (ashift:SI (match_operand:SI 1 "register_operand" 
> "Rcqq,c,c,c,c,c")
> - (match_operand:SI 2 "_1_2_3_operand" ""))
> -  (match_operand:SI 3 "nonmemory_operand" 
> "0,0,c,?Cal,?c,??Cal")))]
> +  [(set (match_operand:SI 0 "dest_reg_operand" "=q,r,r")
> + (plus:SI (mult:SI (match_operand:SI 1 "register_operand" "q,r,r")
> +   (match_operand:SI 2 "_2_4_8_operand" ""))
> +  (match_operand:SI 3 "nonmemory_operand" "0,r,Csz")))]
>""
> -  "add%c2%? %0,%3,%1%&"
> +  "add%z2%?\\t%0,%3,%1%&"
>[(set_attr "type" "shift")
> -   (set_attr "length" "*,4,4,8,4,8")
> -   (set_attr "predicable" "yes,yes,no,no,no,no")
> -   (set_attr "cond" "canuse,canuse,nocond,nocond,nocond,nocond")
> -   (set_attr "iscompact" "maybe,false,false,false,false,false")])
> -
> -(define_insn "*add_n"
> -  [(set (match_operand:SI 0 "dest_reg_operand"  
> "=Rcqq,Rcw,W,  W,w,w")
> - (plus:SI (mult:SI (match_operand:SI 1 "register_operand" "Rcqq,  c,c,  
> c,c,c")
> -   (match_operand:SI 2 "_2_4_8_operand"   ""))
> -  (match_operand:SI 3 "nonmemory_operand""0,  
> 0,c,Cal,c,Cal")))]
> -  ""
> -  "add%z2%? %0,%3,%1%&"
> -  [(set_attr "type" "shift")
> -   (set_attr "length" "*,4,4,8,4,8")
> -   (set_attr "predicable" "yes,yes,no,no,no,no")
> -   (set_attr "cond" "canuse,canuse,nocond,nocond,nocond,nocond")
> -   (set_attr "iscompact" "maybe,false,false,false,false,false")])
> +   (set_attr "length" "*,4,8")
> +   (set_attr "predicable" "yes,no,no")
> +   (set_attr "cond" "canuse,nocond,nocond")
> +   (set_attr "iscompact" "maybe,false,false")])
>  
>  ;; N.B. sub[123] has the operands of the MINUS in the opposite order from
>  ;; what synth_mult likes.
> @@ -3496,7 +3483,7 @@ core_3, archs4x, archs4xd, archs4xd_slow"
>  ; provide one alternatice for this, without condexec support.
>  (define_insn "*ashlsi3_insn"
>[(set (match_operand:SI 0 "dest_reg_operand"   
> "=Rcq,Rcqq,Rcqq,Rcw, w,   w")
> - (ashift:SI (match_operand:SI 1 "nonmemory_operand" "!0,Rcqq,   0,  0, 
> c,cCal")
> + (ashift:SI (match_operand:SI 1 "nonmemory_operand" "!0,Rcqq,   0,  0, 
> c,cCsz")
>  (match_operand:SI 2 "nonmemory_operand"  "K,  K,RcqqM, 
> cL,cL,cCal")))]
>"TARGET_BARREL_SHIFTER
> && (register_operand (operands[1], SImode)
> @@ -3509,7 +3496,7 @@ core_3, archs4x, archs4xd, archs4xd_slow"
>  
>  (define_insn "*ashrsi3_insn"
>[(set (match_operand:SI 0 "dest_reg_operand" 
> "=Rcq,Rcqq,Rcqq,Rcw, w,   w")
> - (ashiftrt:SI (match_operand:SI 1 "nonmemory_operand" "!0,Rcqq,   0,  0, 
> c,cCal")
> + (ashiftrt:SI (match_operand:SI 1 "nonmemory_operand" "!0,Rcqq,   0,  0, 
> c,cCsz")
>(match_operand:SI 2 "nonmemory_operand"  "K,  K,RcqqM, 
> cL,cL,cCal")))]
>"TARGET_BARREL_SHIFTER
> && (register_operand (operands[1], SImode)
> @@ -3536,7 +3523,7 @@ core_3, archs4x, archs4xd, archs4xd_slow"
>  
>  (define_insn "rotrsi3"
>[(set (match_operand:SI 0 "dest_reg_operand" "=Rcw, w,   w")
> - (rotatert:SI (match_operand:SI 1 "register_operand"  " 0,cL,cCal")
> + (rotatert:SI (match_operand:SI 1 "register_operand"  " 0,cL,cCsz")
>(match_operand:SI 2 "nonmemory_operand" "cL,cL,cCal")))]
>"TARGET_BARREL_SHIFTER"
>"ror%? %0,%1,%2"
> @@ -4284,16 +4271,16 @@ core_3, archs4x, archs4xd, archs4xd_sl

[PATCH,FORTRAN] Tweak locations around CAF simplify

2018-09-20 Thread Bernhard Reutner-Fischer
addresses: FIXME: gfc_current_locus is wrong.
by using the locus of the current intrinsic.
Regtests clean, ok for trunk?

gcc/fortran/ChangeLog:

2018-09-20  Bernhard Reutner-Fischer  

* simplify.c (gfc_simplify_failed_or_stopped_images): Use
current intrinsic where locus.
(gfc_simplify_get_team): Likewise.
(gfc_simplify_num_images): Likewise.
(gfc_simplify_image_status): Likewise.
(gfc_simplify_this_image): Likewise.
---
 gcc/fortran/simplify.c | 28 +++-
 1 file changed, 15 insertions(+), 13 deletions(-)

diff --git a/gcc/fortran/simplify.c b/gcc/fortran/simplify.c
index d35bbbaaa1b..4ce91235e2d 100644
--- a/gcc/fortran/simplify.c
+++ b/gcc/fortran/simplify.c
@@ -2905,8 +2905,9 @@ gfc_simplify_failed_or_stopped_images (gfc_expr *team 
ATTRIBUTE_UNUSED,
 {
   if (flag_coarray == GFC_FCOARRAY_NONE)
 {
-  gfc_current_locus = *gfc_current_intrinsic_where;
-  gfc_fatal_error ("Coarrays disabled at %C, use %<-fcoarray=%> to 
enable");
+  gfc_fatal_error ("Coarrays disabled at %L, use %<-fcoarray=%> to enable",
+ gfc_current_intrinsic_where);
+
   return &gfc_bad_expr;
 }
 
@@ -2919,7 +2920,8 @@ gfc_simplify_failed_or_stopped_images (gfc_expr *team 
ATTRIBUTE_UNUSED,
   else
actual_kind = gfc_default_integer_kind;
 
-  result = gfc_get_array_expr (BT_INTEGER, actual_kind, 
&gfc_current_locus);
+  result = gfc_get_array_expr (BT_INTEGER, actual_kind,
+ gfc_current_intrinsic_where);
   result->rank = 1;
   return result;
 }
@@ -2935,15 +2937,16 @@ gfc_simplify_get_team (gfc_expr *level ATTRIBUTE_UNUSED)
 {
   if (flag_coarray == GFC_FCOARRAY_NONE)
 {
-  gfc_current_locus = *gfc_current_intrinsic_where;
-  gfc_fatal_error ("Coarrays disabled at %C, use %<-fcoarray=%> to 
enable");
+  gfc_fatal_error ("Coarrays disabled at %L, use %<-fcoarray=%> to enable",
+ gfc_current_intrinsic_where);
   return &gfc_bad_expr;
 }
 
   if (flag_coarray == GFC_FCOARRAY_SINGLE)
 {
   gfc_expr *result;
-  result = gfc_get_array_expr (BT_INTEGER, gfc_default_integer_kind, 
&gfc_current_locus);
+  result = gfc_get_array_expr (BT_INTEGER, gfc_default_integer_kind,
+ gfc_current_intrinsic_where);
   result->rank = 0;
   return result;
 }
@@ -5785,7 +5788,8 @@ gfc_simplify_num_images (gfc_expr *distance 
ATTRIBUTE_UNUSED, gfc_expr *failed)
 
   if (flag_coarray == GFC_FCOARRAY_NONE)
 {
-  gfc_fatal_error ("Coarrays disabled at %C, use %<-fcoarray=%> to 
enable");
+  gfc_fatal_error ("Coarrays disabled at %L, use %<-fcoarray=%> to enable",
+ gfc_current_intrinsic_where);
   return &gfc_bad_expr;
 }
 
@@ -5795,9 +5799,8 @@ gfc_simplify_num_images (gfc_expr *distance 
ATTRIBUTE_UNUSED, gfc_expr *failed)
   if (failed && failed->expr_type != EXPR_CONSTANT)
 return NULL;
 
-  /* FIXME: gfc_current_locus is wrong.  */
   result = gfc_get_constant_expr (BT_INTEGER, gfc_default_integer_kind,
- &gfc_current_locus);
+ gfc_current_intrinsic_where);
 
   if (failed && failed->value.logical != 0)
 mpz_set_si (result->value.integer, 0);
@@ -7678,8 +7681,8 @@ gfc_simplify_image_status (gfc_expr *image, gfc_expr 
*team ATTRIBUTE_UNUSED)
 {
   if (flag_coarray == GFC_FCOARRAY_NONE)
 {
-  gfc_current_locus = *gfc_current_intrinsic_where;
-  gfc_fatal_error ("Coarrays disabled at %C, use %<-fcoarray=%> to 
enable");
+  gfc_fatal_error ("Coarrays disabled at %L, use %<-fcoarray=%> to enable",
+ gfc_current_intrinsic_where);
   return &gfc_bad_expr;
 }
 
@@ -7716,9 +7719,8 @@ gfc_simplify_this_image (gfc_expr *coarray, gfc_expr *dim,
   if (coarray == NULL || !gfc_is_coarray (coarray))
 {
   gfc_expr *result;
-  /* FIXME: gfc_current_locus is wrong.  */
   result = gfc_get_constant_expr (BT_INTEGER, gfc_default_integer_kind,
- &gfc_current_locus);
+ gfc_current_intrinsic_where);
   mpz_set_si (result->value.integer, 1);
   return result;
 }
-- 
2.19.0



Re: C++ PATCH to refine c++/87109 patch

2018-09-20 Thread Jason Merrill
OK.

On Thu, Sep 20, 2018 at 4:49 PM, Marek Polacek  wrote:
> On Thu, Sep 20, 2018 at 12:20:08PM -0400, Jason Merrill wrote:
>> On Thu, Sep 20, 2018 at 11:53 AM, Marek Polacek  wrote:
>> > On Thu, Sep 20, 2018 at 11:25:38AM -0400, Jason Merrill wrote:
>> >> On Wed, Sep 19, 2018 at 9:50 PM, Marek Polacek  wrote:
>> >> > Aaaand this addresses 
>> >> > ,
>> >> > as I promised earlier.  I hope I got it right.
>> >> >
>> >> > Bootstrapped/regtested on x86_64-linux, ok for trunk?
>> >> >
>> >> > 2018-09-19  Marek Polacek  
>> >> >
>> >> > PR c++/87109 - wrong ctor with maybe-rvalue semantics.
>> >> > * call.c (build_user_type_conversion_1): Refine the maybe-rvalue
>> >> > check to only return if we're converting from a base class.
>> >> >
>> >> > * g++.dg/cpp0x/ref-qual19.C: Adjust the expected results.
>> >> > * g++.dg/cpp0x/ref-qual20.C: New test.
>> >> >
>> >> > diff --git gcc/cp/call.c gcc/cp/call.c
>> >> > index ddf0ed044a0..4bbd77b9cef 100644
>> >> > --- gcc/cp/call.c
>> >> > +++ gcc/cp/call.c
>> >> > @@ -4034,9 +4034,13 @@ build_user_type_conversion_1 (tree totype, tree 
>> >> > expr, int flags,
>> >> >  conv->bad_p = true;
>> >> >
>> >> >/* We're performing the maybe-rvalue overload resolution and
>> >> > - a conversion function is in play.  This isn't going to work
>> >> > - because we would not end up with a suitable constructor.  */
>> >> > -  if ((flags & LOOKUP_PREFER_RVALUE) && !DECL_CONSTRUCTOR_P (cand->fn))
>> >> > + a conversion function is in play.  If we're converting from
>> >> > + a base class to a derived class, reject the conversion.  */
>> >> > +  if ((flags & LOOKUP_PREFER_RVALUE)
>> >> > +  && !DECL_CONSTRUCTOR_P (cand->fn)
>> >> > +  && CLASS_TYPE_P (fromtype)
>> >> > +  && CLASS_TYPE_P (totype)
>> >> > +  && DERIVED_FROM_P (fromtype, totype))
>> >>
>> >> Here fromtype is the type we're converting from, and what we want to
>> >> reject is converting the return value of the conversion op to a base
>> >> class.  CLASS_TYPE_P (fromtype) will always be true, since it has a
>> >> conversion op.  And I think we also want to handle the case of totype
>> >> being a reference.
>> >
>> > I think I totally misunderstood what this was about.  It's actually about
>> > this case
>> >
>> > struct Y { int y; };
>> > struct X : public Y { int x; };
>> >
>> > struct A {
>> >   operator X();
>> > };
>> >
>> > Y
>> > fn (A a)
>> > {
>> >   return a;
>> > }
>> >
>> > where we want to avoid slicing of X when converting X to Y, yes?
>>
>> Yes.
>
> Got it.  So I think the following is the real fix then:
>
> Bootstrapped/regtested on x86_64-linux, ok for trunk?
>
> 2018-09-20  Marek Polacek  
>
> PR c++/87109 - wrong ctor with maybe-rvalue semantics.
> * call.c (build_user_type_conversion_1): Refine the maybe-rvalue
> check to only return if we're converting the return value to a base
> class.
>
> * g++.dg/cpp0x/ref-qual19.C: Adjust the expected results.
> * g++.dg/cpp0x/ref-qual20.C: New test.
>
> diff --git gcc/cp/call.c gcc/cp/call.c
> index ddf0ed044a0..b2ca667c8b4 100644
> --- gcc/cp/call.c
> +++ gcc/cp/call.c
> @@ -4034,10 +4034,12 @@ build_user_type_conversion_1 (tree totype, tree expr, 
> int flags,
>  conv->bad_p = true;
>
>/* We're performing the maybe-rvalue overload resolution and
> - a conversion function is in play.  This isn't going to work
> - because we would not end up with a suitable constructor.  */
> + a conversion function is in play.  Reject converting the return
> + value of the conversion function to a base class.  */
>if ((flags & LOOKUP_PREFER_RVALUE) && !DECL_CONSTRUCTOR_P (cand->fn))
> -return NULL;
> +for (conversion *t = cand->second_conv; t; t = next_conversion (t))
> +  if (t->kind == ck_base)
> +   return NULL;
>
>/* Remember that this was a list-initialization.  */
>if (flags & LOOKUP_NO_NARROWING)
> diff --git gcc/testsuite/g++.dg/cpp0x/ref-qual19.C 
> gcc/testsuite/g++.dg/cpp0x/ref-qual19.C
> index 8494b83e5b0..50f92977c49 100644
> --- gcc/testsuite/g++.dg/cpp0x/ref-qual19.C
> +++ gcc/testsuite/g++.dg/cpp0x/ref-qual19.C
> @@ -85,13 +85,13 @@ int
>  main ()
>  {
>C c1 = f (A());
> -  if (c1.i != 1)
> +  if (c1.i != 2)
>  __builtin_abort ();
>C c2 = f2 (A());
>if (c2.i != 2)
>  __builtin_abort ();
>C c3 = f3 ();
> -  if (c3.i != 1)
> +  if (c3.i != 2)
>  __builtin_abort ();
>C c4 = f4 ();
>if (c4.i != 2)
> @@ -100,13 +100,13 @@ main ()
>if (c5.i != 2)
>  __builtin_abort ();
>D c6 = f6 (B());
> -  if (c6.i != 3)
> +  if (c6.i != 4)
>  __builtin_abort ();
>D c7 = f7 (B());
>if (c7.i != 4)
>  __builtin_abort ();
>D c8 = f8 ();
> -  if (c8.i != 3)
> +  if (c8.i != 4)
>  __builtin_abort ();
>D c9 = f9 ();
>if (c9.i != 4)
> diff --git gcc/testsuite/g++.dg/cpp0x/ref-qual20

Re: [PATCH] Optimize sin(atan(x)), take 2

2018-09-20 Thread Giuliano Augusto Faulin Belinassi
Pinging match.pd and real.c maintainers, as suggested on IRC. Sorry if
it is inappropriate.
On Mon, Sep 17, 2018 at 9:46 AM Giuliano Augusto Faulin Belinassi
 wrote:
>
> Ping.
>
> On Mon, Sep 3, 2018 at 4:11 PM, Giuliano Augusto Faulin Belinassi
>  wrote:
> > Fixed the issues pointed out in the previous discussions. Closes PR86829.
> >
> > Adds substitution rules for sin(atan(x)) and cos(atan(x)), being
> > careful with overflow issues by constructing an assumed convergence
> > constant (see comment in real.c).
> >
> > 2018-09-03  Giuliano Belinassi 
> >
> > * match.pd: Add simplification rules for sin(atan(x)) and cos(atan(x)).
> > * real.c: Add code for the assumed convergence constant for sin(atan(x)).
> > * real.h: Allow the added code from real.c to be called externally.
> > * tree.c: Add code for building nodes with the convergence constant.
> > * tree.h: Allow the added code from tree.c to be called externally.
> > * sinatan-1.c: Test the assumed convergence constant.
> > * sinatan-2.c: Test the simplification rule.
> > * sinatan-3.c: Likewise.
> >
> > There seem to be no broken tests in trunk related to this
> > modification.


[PATCH, OpenACC] Enable GOMP_MAP_FIRSTPRIVATE_INT for OpenACC

2018-09-20 Thread Julian Brown
This patch (by Cesar) changes the way that mapping of firstprivate
scalars works for OpenACC. For scalars whose type has a size equal to or
smaller than the size of a pointer, rather than copying the value of
the scalar to the target device and having a separate mapping for a
pointer to the copied value, a single "pointer" is mapped whose bits
are a type-punned representation of the value itself.

This is a performance optimisation: the idea, IIUC, is to avoid having
all launched compute resources contend for a single memory
location -- the pointed-to cell containing the scalar on
the device, in this case. Cesar talks about speedups obtained here
(for an earlier version of the patch):

https://gcc.gnu.org/ml/gcc-patches/2017-01/msg02171.html

The patch implies an API change for the libgomp plugin, in that it must
now understand that NULL device pointers correspond to host pointers
that are actually type-punned scalars.

Tested with offloading to NVPTX and bootstrapped. OK for mainline?

Julian

ChangeLog

2018-09-20  Cesar Philippidis  
Julian Brown  

gcc/
* omp-low.c (maybe_lookup_field_in_outer_ctx): New function.
(convert_to_firstprivate_int): New function.
(convert_from_firstprivate_int): New function.
(lower_omp_target): Enable GOMP_MAP_FIRSTPRIVATE_INT in OpenACC.

libgomp/
* oacc-parallel.c (GOACC_parallel_keyed): Handle
GOMP_MAP_FIRSTPRIVATE_INT host addresses.
* plugin/plugin-nvptx.c (nvptx_exec): Handle
GOMP_MAP_FIRSTPRIVATE_INT host addresses.
* testsuite/libgomp.oacc-c++/firstprivate-int.C: New test.
* testsuite/libgomp.oacc-c-c++-common/firstprivate-int.c: New
test.
* testsuite/libgomp.oacc-fortran/firstprivate-int.f90: New test.
>From 1263a1bef1780fd015f9ee937c2b2df2717f1603 Mon Sep 17 00:00:00 2001
From: Julian Brown 
Date: Mon, 17 Sep 2018 19:38:21 -0700
Subject: [PATCH 1/2] Enable GOMP_MAP_FIRSTPRIVATE_INT for OpenACC

	gcc/
	* omp-low.c (maybe_lookup_field_in_outer_ctx): New function.
	(convert_to_firstprivate_int): New function.
	(convert_from_firstprivate_int): New function.
	(lower_omp_target): Enable GOMP_MAP_FIRSTPRIVATE_INT in OpenACC.

	libgomp/
	* oacc-parallel.c (GOACC_parallel_keyed): Handle
	GOMP_MAP_FIRSTPRIVATE_INT host addresses.
	* plugin/plugin-nvptx.c (nvptx_exec): Handle GOMP_MAP_FIRSTPRIVATE_INT
	host addresses.
	* testsuite/libgomp.oacc-c++/firstprivate-int.C: New test.
	* testsuite/libgomp.oacc-c-c++-common/firstprivate-int.c: New test.
	* testsuite/libgomp.oacc-fortran/firstprivate-int.f90: New test.
---
 gcc/omp-low.c  | 171 +++--
 libgomp/oacc-parallel.c|   7 +-
 libgomp/plugin/plugin-nvptx.c  |   2 +-
 .../testsuite/libgomp.oacc-c++/firstprivate-int.C  |  83 +
 .../libgomp.oacc-c-c++-common/firstprivate-int.c   |  67 +++
 .../libgomp.oacc-fortran/firstprivate-int.f90  | 205 +
 6 files changed, 518 insertions(+), 17 deletions(-)
 create mode 100644 libgomp/testsuite/libgomp.oacc-c++/firstprivate-int.C
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/firstprivate-int.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/firstprivate-int.f90

diff --git a/gcc/omp-low.c b/gcc/omp-low.c
index fdabf67..5fc4a66 100644
--- a/gcc/omp-low.c
+++ b/gcc/omp-low.c
@@ -3264,6 +3264,19 @@ maybe_lookup_decl_in_outer_ctx (tree decl, omp_context *ctx)
   return t ? t : decl;
 }
 
+/* Returns true if DECL is present inside a field that encloses CTX.  */
+
+static bool
+maybe_lookup_field_in_outer_ctx (tree decl, omp_context *ctx)
+{
+  omp_context *up;
+
+  for (up = ctx->outer; up; up = up->outer)
+if (maybe_lookup_field (decl, up))
+  return true;
+
+  return false;
+}
 
 /* Construct the initialization value for reduction operation OP.  */
 
@@ -7470,6 +7483,88 @@ lower_omp_taskreg (gimple_stmt_iterator *gsi_p, omp_context *ctx)
 }
 }
 
+/* Helper function for lower_omp_target.  Converts VAR to something
+   that can be represented by a POINTER_SIZED_INT_NODE.  Any new
+   instructions are appended to GS.  This is primarily used to
+   optimize firstprivate variables, so that small types (less
+   precision than POINTER_SIZE) do not require additional data
+   mappings. */
+
+static tree
+convert_to_firstprivate_int (tree var, gimple_seq *gs)
+{
+  tree type = TREE_TYPE (var), new_type = NULL_TREE;
+  tree tmp = NULL_TREE;
+
+  if (omp_is_reference (var))
+type = TREE_TYPE (type);
+
+  if (INTEGRAL_TYPE_P (type) || POINTER_TYPE_P (type))
+{
+  if (omp_is_reference (var))
+	{
+	  tmp = create_tmp_var (type);
+	  gimplify_assign (tmp, build_simple_mem_ref (var), gs);
+	  var = tmp;
+	}
+
+  return fold_convert (pointer_sized_int_node, var);
+}
+
+  gcc_assert (tree_to_uhwi (TYPE_SIZE (type)) <= POINTER_SIZE);
+
+  new_type = lang_hooks.types.type_for_size

[PATCH, OpenACC] Fortran "declare create"/allocate support for OpenACC

2018-09-20 Thread Julian Brown
This patch (a combination of several previous patches by Cesar) adds
support for OpenACC 2.5's "declare create" directive with Fortran
allocatable variables (2.13.2. create clause). Allocate and deallocate
statements now allocate/deallocate memory on the target device as well
as on the host.

This works by triggering expansion of executable OpenACC
directives ("enter data" or "exit data") with new
GOMP_MAP_DECLARE_ALLOCATE or GOMP_MAP_DECLARE_DEALLOCATE clauses when
those statements are seen. Unlike other OpenACC functionality, no
additional explicit markup is required in the user's code.

This patch depends on the patch implementing GOMP_MAP_FIRSTPRIVATE_INT
for OpenACC, posted here:

https://gcc.gnu.org/ml/gcc-patches/2018-09/msg01202.html

Tested alongside that patch with offloading to NVPTX, and bootstrapped.
OK for trunk?

Thanks,

Julian

ChangeLog

2018-09-20  Cesar Philippidis  
Julian Brown  

gcc/
* omp-low.c (scan_sharing_clauses): Update handling of OpenACC declare
create, declare copyin and declare deviceptr to have local lifetimes.
(convert_to_firstprivate_int): Handle pointer types.
(convert_from_firstprivate_int): Likewise.  Create local storage for
the values being pointed to.  Add new orig_type argument.
(lower_omp_target): Handle GOMP_MAP_DECLARE_{ALLOCATE,DEALLOCATE}.
Add orig_type argument to convert_from_firstprivate_int call.
Allow pointer types with GOMP_MAP_FIRSTPRIVATE_INT.  Don't privatize
firstprivate VLAs.
* tree-pretty-print.c (dump_omp_clause): Handle
GOMP_MAP_DECLARE_{ALLOCATE,DEALLOCATE}.

gcc/fortran/
* gfortran.h (enum gfc_omp_map_op): Add OMP_MAP_DECLARE_ALLOCATE,
OMP_MAP_DECLARE_DEALLOCATE.
(gfc_omp_clauses): Add update_allocatable.
* trans-array.c (trans-stmt.h): Include.
(gfc_array_allocate): Call gfc_trans_oacc_declare_allocate for decls
that have oacc_declare_create attribute set.
* trans-decl.c (add_attributes_to_decl): Enable lowering of OpenACC
declare create, declare copyin and declare deviceptr clauses.
(add_clause): Don't duplicate OpenACC declare clauses.  Populate
sym->backend_decl so that it can be used to determine if two symbols are
unique.
(find_module_oacc_declare_clauses): Relax oacc_declare_create to
OMP_MAP_ALLOC, and oacc_declare_copyin to OMP_MAP_TO, in order to 
match OpenACC 2.5 semantics.
* trans-openmp.c (gfc_trans_omp_clauses): Use GOMP_MAP_ALWAYS_POINTER
(for update directive) or GOMP_MAP_FIRSTPRIVATE_POINTER (otherwise) for
allocatable scalar decls.  Handle OMP_MAP_DECLARE_{ALLOCATE,DEALLOCATE}
clauses.
(gfc_trans_oacc_executable_directive): Use GOMP_MAP_ALWAYS_POINTER
for allocatable scalar data clauses inside acc update directives.
(gfc_trans_oacc_declare_allocate): New function.
* trans-stmt.c (gfc_trans_allocate): Call
gfc_trans_oacc_declare_allocate for decls with oacc_declare_create
attribute set.
(gfc_trans_deallocate): Likewise.
* trans-stmt.h (gfc_trans_oacc_declare_allocate): Declare.

gcc/testsuite/
* gfortran.dg/goacc/declare-allocatable-1.f90: New test.

include/
* gomp-constants.h (enum gomp_map_kind): Define
GOMP_MAP_DECLARE_{ALLOCATE,DEALLOCATE} and GOMP_MAP_FLAG_SPECIAL_4.

libgomp/
* oacc-mem.c (gomp_acc_declare_allocate): New function.
* oacc-parallel.c (GOACC_enter_exit_data): Handle
GOMP_MAP_DECLARE_{ALLOCATE,DEALLOCATE}.
* testsuite/libgomp.oacc-fortran/allocatable-array.f90: New test.
* testsuite/libgomp.oacc-fortran/allocatable-scalar.f90: New test. 
* testsuite/libgomp.oacc-fortran/declare-allocatable-1.f90: New test.
* testsuite/libgomp.oacc-fortran/declare-allocatable-2.f90: New test.
* testsuite/libgomp.oacc-fortran/declare-allocatable-3.f90: New test.
* testsuite/libgomp.oacc-fortran/declare-allocatable-4.f90: New test.
>From b63d0329fb73679b07f6318b8dd092113d5c8505 Mon Sep 17 00:00:00 2001
From: Julian Brown 
Date: Wed, 12 Sep 2018 20:15:08 -0700
Subject: [PATCH 2/2] Fortran "declare create"/allocate support for OpenACC

	gcc/
	* omp-low.c (scan_sharing_clauses): Update handling of OpenACC declare
	create, declare copyin and declare deviceptr to have local lifetimes.
	(convert_to_firstprivate_int): Handle pointer types.
	(convert_from_firstprivate_int): Likewise.  Create local storage for
	the values being pointed to.  Add new orig_type argument.
	(lower_omp_target): Handle GOMP_MAP_DECLARE_{ALLOCATE,DEALLOCATE}.
	Add orig_type argument to convert_from_firstprivate_int call.
	Allow pointer types with GOMP_MAP_FIRSTPRIVATE_INT.  Don't privatize
	firstprivate VLAs.
	* tree-pretty-print.c (dump_omp_clause): Handle
	GOMP_MAP_DECLARE_{ALLOCATE,DEALLOCATE}.

	gcc/fortran/
	* gfortra

Re: [PATCH] PR libstdc++/78179 run long double tests separately

2018-09-20 Thread Hans-Peter Nilsson
> Date: Thu, 20 Sep 2018 15:22:23 +0100
> From: Jonathan Wakely 

> On 20/09/18 15:36 +0200, Christophe Lyon wrote:
> >On Wed, 19 Sep 2018 at 23:13, Rainer Orth  
> >wrote:
> >>
> >> Hi Christophe,
> >>
> >> > I have noticed failures on hypot-long-double.cc on arm, so I suggest we 
> >> > add:
> >> >
> >> > diff --git
> >> > a/libstdc++-v3/testsuite/26_numerics/headers/cmath/hypot-long-double.cc
> >> > b/libstdc++-v3/testsuite/26_numerics/headers/cmath/hypot-long-double.cc
> >> > index 8a05473..4c2e33b 100644
> >> > --- 
> >> > a/libstdc++-v3/testsuite/26_numerics/headers/cmath/hypot-long-double.cc
> >> > +++ 
> >> > b/libstdc++-v3/testsuite/26_numerics/headers/cmath/hypot-long-double.cc
> >> > @@ -17,7 +17,7 @@
> >> >
> >> >  // { dg-options "-std=gnu++17" }
> >> >  // { dg-do run { target c++17 } }
> >> > -// { dg-xfail-run-if "PR 78179" { powerpc-ibm-aix* hppa-*-linux* 
> >> > nios2-*-* } }
> >> > +// { dg-xfail-run-if "PR 78179" { powerpc-ibm-aix* hppa-*-linux*
> >> > nios2-*-* arm*-*-* } }
> >> >
> >> >  // Run the long double tests from hypot.cc separately, because they 
> >> > fail on a
> >> >  // number of targets. See PR libstdc++/78179 for details.
> >> >
> >> > OK?
> >>
> >> just a nit (and not a review): I'd prefer the target list to be sorted
> >> alphabetically, not completely random.
> >>
> >
> >Sure, I can sort the whole list, if OK on principle.
> 
> Yes, please go ahead and commit it with the sorted list.

"Me too".  Can I please, rather than piling on to a target list,
replace the whole xfail-list with the equivalent of "target { !
large_long_double }" (an already-existing "effective target")?
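
Concretely, that replacement might look like this in the test file (a sketch, untested):

```cpp
// { dg-options "-std=gnu++17" }
// { dg-do run { target c++17 } }
// { dg-xfail-run-if "PR 78179" { ! large_long_double } }

// ... or, qualifying the dg-do line instead of xfailing:
// { dg-do run { target { c++17 && large_long_double } } }
```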

I'll leave the thought of running the test only for
large_long_double targets (qualifying the dg-do run) instead of
an xfail-clause for maintainers.

brgds, H-P


Re: [PATCH, OpenACC] Fortran "declare create"/allocate support for OpenACC

2018-09-20 Thread Bernhard Reutner-Fischer
[Please Cc the fortran list on fortran patches]

On Thu, 20 Sep 2018 19:59:08 -0400
Julian Brown  wrote:

> From b63d0329fb73679b07f6318b8dd092113d5c8505 Mon Sep 17 00:00:00 2001
> From: Julian Brown 
> Date: Wed, 12 Sep 2018 20:15:08 -0700
> Subject: [PATCH 2/2] Fortran "declare create"/allocate support for
> OpenACC
> 
>   gcc/
>   * omp-low.c (scan_sharing_clauses): Update handling of
> OpenACC declare create, declare copyin and declare deviceptr to have
> local lifetimes. (convert_to_firstprivate_int): Handle pointer types.
>   (convert_from_firstprivate_int): Likewise.  Create local
> storage for the values being pointed to.  Add new orig_type argument.
>   (lower_omp_target): Handle
> GOMP_MAP_DECLARE_{ALLOCATE,DEALLOCATE}. Add orig_type argument to
> convert_from_firstprivate_int call. Allow pointer types with
> GOMP_MAP_FIRSTPRIVATE_INT.  Don't privatize firstprivate VLAs.
>   * tree-pretty-print.c (dump_omp_clause): Handle
>   GOMP_MAP_DECLARE_{ALLOCATE,DEALLOCATE}.
> 
>   gcc/fortran/
>   * gfortran.h (enum gfc_omp_map_op): Add
> OMP_MAP_DECLARE_ALLOCATE, OMP_MAP_DECLARE_DEALLOCATE.
>   (gfc_omp_clauses): Add update_allocatable.
>   * trans-array.c (trans-stmt.h): Include.
>   (gfc_array_allocate): Call gfc_trans_oacc_declare_allocate
> for decls that have oacc_declare_create attribute set.
>   * trans-decl.c (add_attributes_to_decl): Enable lowering of
> OpenACC declare create, declare copyin and declare deviceptr clauses.
>   (add_clause): Don't duplicate OpenACC declare clauses.
> Populate sym->backend_decl so that it can be used to determine if two
> symbols are unique.
>   (find_module_oacc_declare_clauses): Relax oacc_declare_create
> to OMP_MAP_ALLOC, and oacc_declare_copyin to OMP_MAP_TO, in order to
>   match OpenACC 2.5 semantics.
>   * trans-openmp.c (gfc_trans_omp_clauses): Use
> GOMP_MAP_ALWAYS_POINTER (for update directive) or
> GOMP_MAP_FIRSTPRIVATE_POINTER (otherwise) for allocatable scalar
> decls.  Handle OMP_MAP_DECLARE_{ALLOCATE,DEALLOCATE} clauses.
>   (gfc_trans_oacc_executable_directive): Use
> GOMP_MAP_ALWAYS_POINTER for allocatable scalar data clauses inside
> acc update directives. (gfc_trans_oacc_declare_allocate): New
> function.
>   * trans-stmt.c (gfc_trans_allocate): Call
>   gfc_trans_oacc_declare_allocate for decls with
> oacc_declare_create attribute set.
>   (gfc_trans_deallocate): Likewise.
>   * trans-stmt.h (gfc_trans_oacc_declare_allocate): Declare.
> 
>   gcc/testsuite/
>   * gfortran.dg/goacc/declare-allocatable-1.f90: New test.
> 
>   include/
>   * gomp-constants.h (enum gomp_map_kind): Define
>   GOMP_MAP_DECLARE_{ALLOCATE,DEALLOCATE} and
> GOMP_MAP_FLAG_SPECIAL_4.
> 
>   libgomp/
>   * oacc-mem.c (gomp_acc_declare_allocate): New function.
>   * oacc-parallel.c (GOACC_enter_exit_data): Handle
>   GOMP_MAP_DECLARE_{ALLOCATE,DEALLOCATE}.
>   * testsuite/libgomp.oacc-fortran/allocatable-array.f90: New
> test.
>   * testsuite/libgomp.oacc-fortran/allocatable-scalar.f90: New
> test.
>   * testsuite/libgomp.oacc-fortran/declare-allocatable-1.f90:
> New test.
>   * testsuite/libgomp.oacc-fortran/declare-allocatable-2.f90:
> New test.
>   * testsuite/libgomp.oacc-fortran/declare-allocatable-3.f90:
> New test.
>   * testsuite/libgomp.oacc-fortran/declare-allocatable-4.f90:
> New test. ---
>  gcc/fortran/gfortran.h |   6 +-
>  gcc/fortran/trans-array.c  |  10 +-
>  gcc/fortran/trans-decl.c   |  22 ++-
>  gcc/fortran/trans-openmp.c |  57 +-
>  gcc/fortran/trans-stmt.c   |  12 ++
>  gcc/fortran/trans-stmt.h   |   1 +
>  gcc/omp-low.c  |  62 --
>  .../gfortran.dg/goacc/declare-allocatable-1.f90|  25 +++
>  gcc/tree-pretty-print.c|   6 +
>  include/gomp-constants.h   |   6 +
>  libgomp/oacc-mem.c |  28 +++
>  libgomp/oacc-parallel.c|  30 ++-
>  .../libgomp.oacc-fortran/allocatable-array-1.f90   |  30 +++
>  .../libgomp.oacc-fortran/allocatable-scalar.f90|  33 
>  .../libgomp.oacc-fortran/declare-allocatable-1.f90 | 211
>  .../libgomp.oacc-fortran/declare-allocatable-2.f90
> |  48 + .../libgomp.oacc-fortran/declare-allocatable-3.f90 | 218
> + .../libgomp.oacc-fortran/declare-allocatable-4.f90
> |  66 +++ 18 files changed, 834 insertions(+), 37 deletions(-)
>  create mode 100644
> gcc/testsuite/gfortran.dg/goacc/declare-allocatable-1.f90 create mode
> 100644 libgomp/testsuite/libgomp.oacc-fortran/allocatable-array-1.f90
> create mode 100644
> libgomp/testsuite/libgomp.oacc-fortran/allocatable-scalar.f90 create
> mode 100644
> libgomp/testsuite/libgomp.oacc-fortran/declar

C++ PATCH for c++/87372, __func__ constexpr evaluation

2018-09-20 Thread Marek Polacek
The patch for P0595R1 - is_constant_evaluated had this hunk:

@@ -5279,7 +5315,9 @@ maybe_constant_init_1 (tree t, tree decl, bool 
allow_non_constant)
   else if (CONSTANT_CLASS_P (t) && allow_non_constant)
 /* No evaluation needed.  */;
   else
-t = cxx_eval_outermost_constant_expr (t, allow_non_constant, false, decl);
+t = cxx_eval_outermost_constant_expr (t, allow_non_constant,
+ !allow_non_constant,
+ pretend_const_required, decl);
   if (TREE_CODE (t) == TARGET_EXPR)
 {
   tree init = TARGET_EXPR_INITIAL (t);

The false -> !allow_non_constant change means that when calling
cxx_constant_init, strict will be true, because cxx_constant_init does not
allow non-constants.  That means that for VAR_DECLs such as __func__ we'll call
decl_really_constant_value instead of decl_constant_value.  But only the latter
can evaluate __func__ to "foo()".

Jakub, was there a specific reason for this change?  Changing it back still
regtests cleanly and the attached test compiles again.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2018-09-20  Marek Polacek  

PR c++/87372 - __func__ constexpr evaluation.
* constexpr.c (maybe_constant_init_1): Pass false for strict down to
cxx_eval_outermost_constant_expr.

* g++.dg/cpp1y/func_constexpr2.C: New test.

diff --git gcc/cp/constexpr.c gcc/cp/constexpr.c
index fdea769faa9..6436b2f832d 100644
--- gcc/cp/constexpr.c
+++ gcc/cp/constexpr.c
@@ -5361,7 +5361,7 @@ maybe_constant_init_1 (tree t, tree decl, bool 
allow_non_constant,
 /* No evaluation needed.  */;
   else
 t = cxx_eval_outermost_constant_expr (t, allow_non_constant,
- !allow_non_constant,
+ /*strict*/false,
  pretend_const_required, decl);
   if (TREE_CODE (t) == TARGET_EXPR)
 {
diff --git gcc/testsuite/g++.dg/cpp1y/func_constexpr2.C 
gcc/testsuite/g++.dg/cpp1y/func_constexpr2.C
index e69de29bb2d..b1576e64960 100644
--- gcc/testsuite/g++.dg/cpp1y/func_constexpr2.C
+++ gcc/testsuite/g++.dg/cpp1y/func_constexpr2.C
@@ -0,0 +1,21 @@
+// PR c++/87372
+// { dg-do compile { target c++14 } }
+
+constexpr int
+foo (char const *s)
+{
+  int i = 0;
+  while (s[i])
+++i;
+  return i;
+}
+
+constexpr int
+bar ()
+{
+  constexpr int l = foo (__PRETTY_FUNCTION__);
+  constexpr int l2 = foo (__FUNCTION__);
+  constexpr int l3 = foo (__func__);
+  return l + l2 + l3;
+}
+static_assert (bar () == 25, "");


Re: [PATCH] Remove arc profile histogram in non-LTO mode.

2018-09-20 Thread Bin.Cheng
On Thu, Sep 20, 2018 at 6:43 PM Jan Hubicka  wrote:
>
> > On Thu, Sep 20, 2018 at 5:26 PM Jan Hubicka  wrote:
> > >
> > > > On Thu, Sep 20, 2018 at 2:11 AM Martin Liška  wrote:
> > > > >
> > > > > Hello.
> > > > >
> > > > > I've been working for some time on a patch that simplifies how we set
> > > > > the hotness threshold of basic blocks. Currently, we calculate so-called
> > > > > arc profile histograms that should identify edges that cover 99.9% of
> > > > > all branching. These edges are then identified as hot. A disadvantage
> > > > > of the approach is that it comes with significant run-time overhead,
> > > > > and the GCC-related code
> > > > > is also not trivial. Moreover, anytime a histogram is merged after an 
> > > > > instrumented
> > > > > run, the resulting histogram is misleading.
> > > > >
> > > > > That said, I decided to simplify it again, remove usage of the 
> > > > > histogram and return
> > > > > to what we have before (--param hot-bb-count-fraction). That basically
> > > > > says that we consider hot each edge that has an execution count greater
> > > > > than sum_max / 10000.
> > > > >
> > > > > Note that LTO+PGO remains untouched, as it still uses a histogram that
> > > > > is dynamically calculated from the arc counts that are read in.
> > > > Hi,
> > > > Does this affect AutoFDO stuff?  AutoFDO is broken and I am fixing it
> > > > now, on the basis of current code.
> > >
> > > This is independent of Auto-FDO. There we probably can define cutoffs for
> > > hot-cold partitions in the tool translating global data into per-file
> > > data read by GCC.
> > > It is great you will take a deeper look at autoFDO. It indeed needs work!
> > >
> > > The patch is OK, thanks for working on it!  Histograms were added by Google
> > > as a bit of an experiment, but I do not think they turned out to be useful.
> > > The data
> > I did some experiments showing it is somewhat useful for autoFDO.  To
> > what extent it is useful remains a question I need to investigate
> > later.
>
> Indeed auto-FDO has a better idea about whole-program behaviour. We could
> revive the patch for streaming histograms and reading them into the compiler
> if that turns out to be a good idea. I can see that auto-FDO profile data
> tells you pretty clearly where the hot spots are, and it is not as easy to
> recover this information from the profile-annotated CFG because of all the
> transforms we do.
> Let's fix and benchmark auto-FDO first, and then we can decide what the best
> option is.
> Putting the stream-in code back should not be hard if it turns out to be 
> useful.
>
> The main problem with the current histograms with normal FDO is the fact
> that you need to merge them between runs, which is a technically impossible
> job, so they work for programs run once, but not for programs run many times
> in train runs like gcc itself.  It seems to me that for those really
> interested in performance it is a good idea to switch to LTO, and that makes
> it possible to calculate histograms during the linking stage.

honza, thanks very much for detailed explanation.

Thanks,
bin
>
> Honza
> >
> > Thanks,
> > bin
> > > produced by them was not very related to what the IPA profile generation 
> > > produces
> > > and thus it did not seem to match reality very well.
> > >
> > > Honza
> > > >
> > > > Thanks,
> > > > bin
> > > > >
> > > > > Note the statistics of the patch:
> > > > >   19 files changed, 101 insertions(+), 1216 deletions(-)
> > > > >
> > > > > I'm attaching file sizes of SPEC2006 int benchmark.
> > > > >
> > > > > Patch survives testing on x86_64-linux-gnu machine.
> > > > > Ready to be installed?
> > > > >
> > > > > Martin
> > > > >
> > > > > gcc/ChangeLog:
> > > > >
> > > > > 2018-09-19  Martin Liska  
> > > > >
> > > > > * auto-profile.c (autofdo_source_profile::read): Do not
> > > > > set sum_all.
> > > > > (read_profile): Do not add working sets.
> > > > > (read_autofdo_file): Remove sum_all.
> > > > > (afdo_callsite_hot_enough_for_early_inline): Remove const
> > > > > qualifier.
> > > > > * coverage.c (struct counts_entry): Remove gcov_summary.
> > > > > (read_counts_file): Read new GCOV_TAG_OBJECT_SUMMARY,
> > > > > do not support GCOV_TAG_PROGRAM_SUMMARY.
> > > > > (get_coverage_counts): Remove summary and expected
> > > > > arguments.
> > > > > * coverage.h (get_coverage_counts): Likewise.
> > > > > * doc/gcov-dump.texi: Remove -w option.
> > > > > * gcov-dump.c (dump_working_sets): Remove.
> > > > > (main): Do not support '-w' option.
> > > > > (print_usage): Likewise.
> > > > > (tag_summary): Likewise.
> > > > > * gcov-io.c (gcov_write_summary): Do not dump
> > > > > histogram.
> > > > > (gcov_read_summary): Likewise.
> > > > > (gcov_histo_index): Remove.
> > > > > (gcov_histogram_merge): Likewise.
> > > > >