Re: [PATCH] handle function pointers in __builtin_object_size (PR 88372)

2018-12-07 Thread Richard Biener
On Thu, 6 Dec 2018, Martin Sebor wrote:

> On 12/6/18 2:26 PM, Jakub Jelinek wrote:
> > On Thu, Dec 06, 2018 at 01:21:58PM -0700, Martin Sebor wrote:
> > > Bug 88372 - alloc_size attribute is ignored on function pointers
> > > points out that even though the alloc_size attribute is accepted
> > > on function pointers it doesn't have any effect on Object Size
> > > Checking.  The reporter, who is implementing the feature in Clang,
> > > wants to know if by exposing it under the same name they won't be
> > > causing incompatibilities with GCC.
> > > 
> > > I don't think it's intentional that GCC doesn't take advantage of
> > > the attribute for Object Size Checking, and certainly not to detect
> > > the same kinds of issues as with other allocation functions (such
> > > as excessive or negative size arguments).  Rather, it's almost
> > > certainly an oversight since GCC does make use of function pointer
> > > attributes in other contexts (e.g., attributes alloc_align and
> > > noreturn).
> > > 
> > > As an oversight, I think it's fair to consider it a bug rather
> > > than a request for an enhancement.  Since not handling
> > > the attribute in Object Size Checking has adverse security
> > > implications, I also think this bug should be addressed in GCC
> > > 9.  With that, I submit the attached patch to resolve both
> > > aspects of the problem.
> > 
> > This is because alloc_object_size has been written before we had attributes
> > like alloc_size.  The only thing I'm unsure about is whether we should
> > prefer gimple_call_fntype or TREE_TYPE (gimple_call_fndecl ()) if it is a
> > direct call or if we should try to look for alloc_size attribute on both
> > of those if they are different types.  E.g. if somebody does
> > 
> > #include <stddef.h>
> > 
> > typedef void *(*allocfn) (size_t);
> > 
> > static inline void *
> > foo (allocfn fn, size_t sz)
> > {
> >   return fn (sz);
> > }
> > 
> > static inline void *
> > bar (size_t sz)
> > {
> >   return foo (malloc, sz);
> > }
> > 
> > then I think this patch would no longer treat it as malloc.
> > 
> > As this is security relevant, I'd probably look for alloc_size
> > attribute in both gimple_call_fntype and, if gimple_call_fndecl is non-NULL,
> > its TREE_TYPE.
> 
> Thanks for the test case!  I wondered if using fntype would
> always work but couldn't think of when it wouldn't.  I've
> adjusted the function to use both and added the test case.
> 
> While thinking about this it occurred to me that alloc_size
> is only documented as a function attribute but not one that
> applies to pointers or types.  I added documentation for
> these uses to the Common Type and Common Variable sections.

Please always _only_ use gimple_call_fntype when the decl
isn't visible.  As elsewhere the type of the function
pointer doesn't have any semantic meaning (it could be a
wrong one).

Richard.

> Martin
> 
> PS Other function attributes that also apply to types and
> variables are only documented in the function section.  They
> should also be mentioned in the other sections.  Which, if
> done in the established style, will result in duplicating
> a lot of text in three places.  I think that suggests that
> we might want to think about structuring these sections of
> the manual differently to avoid the duplication.
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [PATCH][RFC] Poison bitmap_head->obstack

2018-12-07 Thread Richard Biener
On Thu, 6 Dec 2018, Martin Sebor wrote:

> On 12/5/18 7:58 AM, Richard Biener wrote:
> > On Wed, 5 Dec 2018, Jeff Law wrote:
> > 
> > > On 12/4/18 6:16 AM, Richard Biener wrote:
> > > > 
> > > > This tries to make bugs like that in PR88317 harder to create by
> > > > introducing a bitmap_release function that can be used as a
> > > > counterpart to bitmap_initialize for non-allocated bitmap heads.
> > > > The function makes sure to poison the bitmap's obstack member
> > > > so the obstack the bitmap was initialized with can be safely
> > > > released.
> > > > 
> > > > The patch also adds a default constructor to bitmap_head
> > > > doing the same, but for C++ reasons initializes to an
> > > > all-zero bitmap_obstack rather than 0xdeadbeef, because
> > > > the latter isn't possible in a constexpr context (it is
> > > > by using unions, but then things start to look even more ugly).
> > > > 
> > > > The stage1 compiler might end up with a few extra runtime
> > > > initializers but constexpr makes sure they'll vanish for
> > > > later stages.
> > > > 
> > > > I had to paper over the "you may not use memset to zero classes
> > > > with non-trivial constructors" warning in two places, and I
> > > > had to teach gengtype about CONSTEXPR (probably did so in
> > > > an awkward way - suggestions and pointers into gengtype
> > > > appreciated).
> > > > 
> > > > Bootstrapped (with host GCC 4.8) on x86_64-unknown-linux-gnu,
> > > > testing in progress.
> > > > 
> > > > The LRA issue seems to be rare enough (on x86_64...) that
> > > > I didn't trip over it so far.
> > > > 
> > > > Comments?  Do we want this?  Not sure how we can easily
> > > > discover all bitmap_clear () users that should really
> > > > use bitmap_release (suggestions for a better name appreciated
> > > > as well - I thought about bitmap_uninitialize).
> > > > 
> > > > Richard.
> > > > 
> > > > 2018-12-04  Richard Biener  
> > > > 
> > > > * bitmap.c (bitmap_head::crashme): Define.
> > > > * bitmap.h (bitmap_head): Add constexpr default constructor
> > > > poisoning the obstack member.
> > > > (bitmap_head::crashme): Declare.
> > > > (bitmap_release): New function clearing a bitmap and poisoning
> > > > the obstack member.
> > > > * gengtype.c (main): Make it recognize CONSTEXPR.
> > > > 
> > > > * lra-constraints.c (lra_inheritance): Use bitmap_release
> > > > instead of bitmap_clear.
> > > > 
> > > > * ira.c (ira): Work around warning.
> > > > * regrename.c (create_new_chain): Likewise.
> > > I don't see enough complexity in here to be concerning -- so if it makes
> > > it harder to make mistakes, then I'm for it.
> > 
> > Any comment on the -Wclass-memaccess workaround of sprinkling in two
> > (void *) conversions?  I didn't dig deep enough to look for a more
> > appropriate solution, also because there were some issues with older
> > host compilers and workarounds we installed elsewhere...
> 
> Using '*head = du_head ();' is the solution I like to encourage
> to zero out typed objects.  When the memory isn't initialized
> and the type has a user-defined ctor, bypassing it is undefined
> even if the ctor is a no-op (the ctor starts the lifetime of
> the object).  In that case, placement new is the appropriate
> way to bring the object to life and value-initialize it:
> 
>   new (head) du_head ();
> 
> If that's not good enough or portable enough to ancient compilers
> then I would suggest adding a comment to explain the intent of
> the cast.

Yes, I know.  But we have workarounds like the following and I
didn't want to open up another hole just for this debugging feature:

#ifdef BROKEN_VALUE_INITIALIZATION
  /* Versions of GCC before 4.4 sometimes leave certain objects
     uninitialized when value initialized, though if the type has
     a user defined default ctor, that ctor is invoked.  As a workaround
     perform clearing first and then the value initialization, which
     fixes the case when value initialization doesn't initialize due to
     the bugs and should initialize to all zeros, but still allows
     vectors for types with user defined default ctor that initializes
     some or all elements to non-zero.  If T has no user defined
     default ctor and some non-static data members have user defined
     default ctors that initialize to non-zero the workaround will
     still not work properly; in that case we just need to provide
     a user defined default ctor.  */
  memset (dst, '\0', sizeof (T) * n);
#endif
  for ( ; n; ++dst, --n)
    ::new (static_cast<void *> (dst)) T ();

Richard.

> Martin
> 
> > 
> > Otherwise yes, it makes it harder to make mistakes.  I'll probably
> > use bitmap_head::crashme instead of 0xdeadbeef in bitmap_release.
> > And of course we'd need to hunt down users of bitmap_clear that
> > should be bitmap_release instead...
> > 
> > Thanks,
> > Richard.
> > 
> 
> 


Re: [PATCH AutoFDO/2] Treat ZERO as common profile probability/count

2018-12-07 Thread Bin.Cheng
On Tue, Dec 4, 2018 at 4:40 PM Bin.Cheng  wrote:
>
> On Thu, Nov 29, 2018 at 12:20 AM Jan Hubicka  wrote:
> >
> > > On Tue, Nov 20, 2018 at 6:55 PM bin.cheng  wrote:
> > > >
> > > > Sender: Jan Hubicka 
> > > > Sent at: 2018 Nov 5 (Mon) 22:21
> > > > To: Richard Biener 
> > > > Cc: bin.cheng ; GCC Patches 
> > > > Subject: Re: [PATCH AutoFDO/2] Treat ZERO as common profile probability/count
> > > >
> > > > >
> > > > > > > On Wed, Oct 31, 2018 at 7:30 AM bin.cheng  wrote:
> > > > > > >
> > > > > > > Hi,
> > > > > > > In the new profile probability/count infra, we have different
> > > > > > > precision quality categories, and probabilities/counts of
> > > > > > > different categories are not supposed to be compared or
> > > > > > > calculated.  Though in general this is an improvement, it
> > > > > > > introduces unexpected behavior.  Specifically, class
> > > > > > > profile_probability and profile_count themselves are implemented
> > > > > > > by comparing probabilities/counts against profile_count::zero().
> > > > > > > While zero() is of the profile_precision category, it always
> > > > > > > compares unequal to zero of other precision categories,
> > > > > > > including afdo.
> > > > > > >
> > > > > > > I can see two ways of fixing this: 1) treat zero as a common
> > > > > > > probability/count regardless of its category; 2) provide an
> > > > > > > "is_zero" method rather than relying on "==" comparison against
> > > > > > > profile_count::zero().  2) requires lots of code changes, so I
> > > > > > > went with 1) in this patch set.  This patch doesn't handle
> > > > > > > "always", but it could.
> > > > > > >
> > > > > > > This patch also corrects a minor issue where we try to invert
> > > > > > > an uninitialized value.
> > > > > > >
> > > > > > > Bootstrap and test on x86_64 in patch set.  Is it OK?
> > > > > >
> > > > > > I'll defer on the emit_store_flag_force change, likewise for the
> > > > > > zero handling in compares - I don't think zeros of different
> > > > > > qualities should compare equal.  Would compares against
> > > > > > ::always() not have the very same issue?  Likewise ::even(),
> > > > > > ::likely(), etc.?  Those always get guessed quality.
> > > > > >
> > > > > > The invert change looks OK to me.  The related change to the
> > > > > > always() API would suggest to replace guessed_always() with
> > > > > > always (guessed) and also do similar changes throughout the
> > > > > > whole API...
> > > > > >
> > > > > > Honza?
> > > > >
> > > > > The zeros are really different zeros.  profile_count::zero makes us
> > > > > drop the basic block into the cold section because we know that it
> > > > > won't be executed in a normal run of the program (either we have
> > > > > accurate profile feedback, or we prove that the program is on its
> > > > > way to crash, or the user annotated a cold section).  Having a
> > > > > guessed zero or auto-fdo zero won't make us do such aggressive size
> > > > > optimization.  This is important since those zeros relatively
> > > > > commonly happen by accident, and thus if we dropped all the code to
> > > > > the cold section, the cold section would be visited relatively
> > > > > often during execution of the program, which would defeat its
> > > > > purpose.
> > > > >
> > > > > Most comparisons in profile-count.h which go against
> > > > > profile_count==zero are really intended to pass only for this
> > > > > "absolute zero".  They bypass the precision adjustments which
> > > > > normally happen when you merge values of different precision.
> > > > >
> > > > > What kind of unexpected behaviour are you seeing?
> > > > > We already have nonzero_p, which is what we use when we want to
> > > > > know that a count is non-zero in some sense of precision.
> > > > Hi Honza,
> > > > Sorry for letting this slip away.  So in the case of AutoFDO, due to
> > > > the nature of sampling, lots of funcs/bbs are annotated with a zero
> > > > profile_count in afdo precision, and we have checks against a zero
> > > > profile_count in precise precision.  All these checks end up false
> > > > and cause issues.  Take the code in update_profiling_info as an
> > > > example:
> > > >
> > > > update_profiling_info (struct cgraph_node *orig_node,
> > > >struct cgraph_node *new_node)
> > > > {
> > > >struct cgraph_edge *cs;
> > > >struct caller_statistics stats;
> > > >profile_count new_sum, orig_sum;
> > > >profile_count remainder, orig_node_count = orig_node->count;
> > > >
> > > >if (!(orig_node_count.ipa () > profile_count::zero ()))
> > > >  return;
> > > >//...
> > > >for (cs = new_node->callees; cs; cs = cs->next_callee)
> > > >  cs->count = cs->count.apply_scale (new_sum, orig_node_count);
> > > >
> > > > Since we also have below code in profile_coun

[PATCH] Fix PR63184

2018-12-07 Thread Richard Biener


The following fixes PR63184 by using tree-affine to resolve pointer
comparisons.  Instead of trying to stick this into a match.pd pattern
the following does this in the more constrained forwprop environment.

I've only implemented the cases where the comparison resolves to a
compile-time value, not the case where for example &a[i] < &a[j]
could be simplified to i < j.  I'm not sure I can trust the
tree-affine machinery enough here to do that.

Both testcases require some CSE to happen thus the first forwprop
pass doesn't catch it.

Bootstrap & regtest running on x86_64-unknown-linux-gnu.

I'll collect some statistics from bootstrap.

Richard.

From 79e88f7d31def4c49fe0b401e7fda408ef447710 Mon Sep 17 00:00:00 2001
From: Richard Guenther 
Date: Fri, 7 Dec 2018 10:47:07 +0100
Subject: [PATCH] fix-pr63184

2018-12-07  Richard Biener  

PR tree-optimization/63184
* tree-ssa-forwprop.c: Include tree-affine.h.
(ttacache): New global.
(forward_propagate_into_comparison_1): Use tree-affine to
resolve pointer comparisons.
(pass_forwprop::execute): Release ttacache.

* c-c++-common/pr19807-2.c: Remove XFAIL.
* c-c++-common/pr19807-3.c: Likewise.

diff --git a/gcc/testsuite/c-c++-common/pr19807-2.c b/gcc/testsuite/c-c++-common/pr19807-2.c
index d2c010140d0..18949a9bc5a 100644
--- a/gcc/testsuite/c-c++-common/pr19807-2.c
+++ b/gcc/testsuite/c-c++-common/pr19807-2.c
@@ -1,6 +1,5 @@
-/* Some targets can optimize this on RTL.  */
-/* { dg-do link { target { x86_64-*-* i?86-*-* } } } */
-/* { dg-options "-O -fdump-tree-optimized" } */
+/* { dg-do link } */
+/* { dg-options "-O -fdump-tree-forwprop2" } */
 
 extern void link_error(void);
 int i;
@@ -12,4 +11,4 @@ int main()
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-not "link_error" "optimized" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-not "link_error" "forwprop2" } } */
diff --git a/gcc/testsuite/c-c++-common/pr19807-3.c b/gcc/testsuite/c-c++-common/pr19807-3.c
index bb7f9827725..f6a1531677f 100644
--- a/gcc/testsuite/c-c++-common/pr19807-3.c
+++ b/gcc/testsuite/c-c++-common/pr19807-3.c
@@ -1,6 +1,5 @@
-/* Some targets can optimize this on RTL.  */
-/* { dg-do link { target { x86_64-*-* i?86-*-* } } } */
-/* { dg-options "-O -fdump-tree-optimized" } */
+/* { dg-do link } */
+/* { dg-options "-O -fdump-tree-forwprop2" } */
 
 extern void link_error(void);
 int i;
@@ -12,4 +11,4 @@ int main()
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-not "link_error" "optimized" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-not "link_error" "forwprop2" } } */
diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
index 7449eaf86ae..08e41e2d85d 100644
--- a/gcc/tree-ssa-forwprop.c
+++ b/gcc/tree-ssa-forwprop.c
@@ -48,6 +48,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "optabs-tree.h"
 #include "tree-vector-builder.h"
 #include "vec-perm-indices.h"
+#include "tree-affine.h"
 
 /* This pass propagates the RHS of assignment statements into use
sites of the LHS of the assignment.  It's basically a specialized
@@ -184,6 +185,8 @@ static tree rhs_to_tree (tree type, gimple *stmt);
 
 static bitmap to_purge;
 
+static hash_map<tree, name_expansion *> *ttacache;
+
 /* Const-and-copy lattice.  */
 static vec<tree> lattice;
 
@@ -459,9 +462,72 @@ forward_propagate_into_comparison_1 (gimple *stmt,
   /* If that wasn't successful either, try both operands.  */
   if (rhs0 != NULL_TREE
   && rhs1 != NULL_TREE)
-tmp = combine_cond_expr_cond (stmt, code, type,
- rhs0, rhs1,
- !(single_use0_p && single_use1_p));
+{
+  tmp = combine_cond_expr_cond (stmt, code, type,
+   rhs0, rhs1,
+   !(single_use0_p && single_use1_p));
+  if (tmp)
+   return tmp;
+}
+
+  /* When comparing two pointers try harder to resolve it by expanding
+ definitions (albeit not using our lattice) and looking at the
+ difference using the affine machinery.  */
+  if (POINTER_TYPE_P (TREE_TYPE (op0))
+  && TREE_CODE (op0) == SSA_NAME
+  && TREE_CODE (op1) == SSA_NAME)
+{
+  aff_tree aff0, aff1;
+  tree_to_aff_combination_expand (op0, TREE_TYPE (op0), &aff0, &ttacache);
+  tree_to_aff_combination_expand (op1, TREE_TYPE (op1), &aff1, &ttacache);
+  aff_combination_scale (&aff1, -1);
+  aff_combination_add (&aff0, &aff1);
+  if (aff_combination_const_p (&aff0))
+   {
+ /* Pointer difference is to be interpreted signed.  */
+ poly_widest_int off = wi::sext (aff0.offset,
+ TYPE_PRECISION (TREE_TYPE (op0)));
+ if (known_eq (off, 0))
+   switch (code)
+ {
+ case EQ_EXPR:
+ case LE_EXPR:
+ case GE_EXPR:
+   return constant_boolean_node (true, type);
+ case NE_EXPR:
+ case LT_EXPR:
+

Re: libgo patch committed: Add precise stack scan support

2018-12-07 Thread Rainer Orth
Hi Ian,

> This libgo patch by Cherry Zhang adds support for precise stack
> scanning to the Go runtime.  This uses per-function stack maps stored
> in the exception tables in the language-specific data area.  The
> compiler needs to generate these stack maps; currently this is only
> done by a version of LLVM, not by GCC.  Each safepoint in a function
> is associated with a (real or dummy) landing pad, and its "type info"
> in the exception table is a pointer to the stack map. When a stack is
> scanned, the stack map is found by the stack unwinding code.
>
> For precise stack scan we need to unwind the stack. There are three cases:
>
> - If a goroutine is scanning its own stack, it can unwind the stack
> and scan the frames.
>
> - If a goroutine is scanning another, stopped, goroutine, it cannot
> directly unwind the target stack. We handle this by switching
> (runtime.gogo) to the target g, letting it unwind and scan the stack,
> and switch back.
>
> - If we are scanning a goroutine that is blocked in a syscall, we send
> a signal to the target goroutine's thread, and let the signal handler
> unwind and scan the stack. Extra care is needed as this races with
> enter/exit syscall.
>
> Currently this is only implemented on GNU/Linux.
>
> Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
> to mainline.

this broke Solaris (and other non-Linux) bootstrap:

/vol/gcc/src/hg/trunk/local/libgo/go/runtime/stubs_nonlinux.go:20:1: error: missing return at end of function
   20 | }
      | ^

Fixed by returning 0 for now; the return value is ignored in
go/runtime/proc.go (scang) anyway.

The Solaris equivalents would be thr_self (for gettid) and thr_kill (for
tgkill).  It would probably be better to use non-Linux-specific names
here...

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


# HG changeset patch
# Parent  2c9abfeb2f017f5c48ccdee7ce72d72bcffb9250
Fix go/runtime/stubs_nonlinux.go compilation

diff --git a/libgo/go/runtime/stubs_nonlinux.go b/libgo/go/runtime/stubs_nonlinux.go
--- a/libgo/go/runtime/stubs_nonlinux.go
+++ b/libgo/go/runtime/stubs_nonlinux.go
@@ -17,4 +17,5 @@ func gettid() _pid_t {
 
 func tgkill(pid _pid_t, tid _pid_t, sig uint32) uint32 {
 	throw("tgkill not implemented")
+	return 0;
 }


Re: RFC: libiberty PATCH to disable demangling of ancient mangling schemes

2018-12-07 Thread Nick Clifton
Hi Jason,

>> Looks good to me.  Independently, do you see a reason not to disable the
>> old demangler entirely?
> 
> Like so.  Does anyone object to this?  These mangling schemes haven't
> been relevant in decades.

I am not really familiar with this old scheme, so please excuse my ignorance
in asking these questions:

  * How likely is it that there are old toolchains in use out there that
still use the v2 mangling?  I.e. I guess that I am asking "which
generation(s) of gcc used v2 mangling?"

  * If a user does try to demangle a v2 mangled name, what will happen?
I.e. I guess I am asking if they will be given a hint that they need to
use an older toolchain in order to demangle the names.

Cheers
  Nick





PR88346, Inconsistent list of CPUs supported by the rs6000 backend after r266502

2018-12-07 Thread Alan Modra
This patch removes the %e error for AIX, since it seems there has been
no attempt to keep ASM_CPU_SPEC cpu support up to date for AIX, and
adds missing entries to ASM_CPU_SPEC in rs6000.h.  Removing the %e
isn't ideal, but leaving it in and hitting a compiler error for -mcpu
cases where the AIX assembler works fine with default options, is
worse.

The rs64a->rs64 name change happened a long time ago as a fix for
PR20813 (git commit c92b4c3f5b).

Bootstrapped and regression tested powerpc64le-linux.  OK?

PR 88346
* config/rs6000/rs6000.h (ASM_CPU_SPEC): Correct %e message.  Handle
-mcpu=rs64, not -mcpu=rs64a.  Handle -mcpu=powerpc64 and -mcpu=titan.
* config/rs6000/driver-rs6000.c (asm_names): Similarly.
* config/rs6000/aix71.h (ASM_CPU_SPEC): Delete %e message.  Handle
-mcpu=rs64, not -mcpu=rs64a.
* config/rs6000/aix72.h (ASM_CPU_SPEC): Likewise.

diff --git a/gcc/config/rs6000/aix71.h b/gcc/config/rs6000/aix71.h
index 2398ed64baa..d2fbba4f509 100644
--- a/gcc/config/rs6000/aix71.h
+++ b/gcc/config/rs6000/aix71.h
@@ -77,7 +77,7 @@ do {  \
   mcpu=power4: -mpwr4; \
   mcpu=power3: -m620; \
   mcpu=powerpc: -mppc; \
-  mcpu=rs64a: -mppc; \
+  mcpu=rs64: -mppc; \
   mcpu=603: -m603; \
   mcpu=603e: -m603; \
   mcpu=604: -m604; \
@@ -88,8 +88,7 @@ do {  \
   !mcpu*: %{mvsx: -mpwr6; \
maltivec: -m970; \
maix64|mpowerpc64: -mppc64; \
-   : %(asm_default)}; \
-  :%eMissing -mcpu option in ASM_SPEC_CPU?\n} \
+   : %(asm_default)}} \
 -many"
 
 #undef ASM_DEFAULT_SPEC
diff --git a/gcc/config/rs6000/aix72.h b/gcc/config/rs6000/aix72.h
index cfb0258acce..211010748c6 100644
--- a/gcc/config/rs6000/aix72.h
+++ b/gcc/config/rs6000/aix72.h
@@ -77,7 +77,7 @@ do {  \
   mcpu=power4: -mpwr4; \
   mcpu=power3: -m620; \
   mcpu=powerpc: -mppc; \
-  mcpu=rs64a: -mppc; \
+  mcpu=rs64: -mppc; \
   mcpu=603: -m603; \
   mcpu=603e: -m603; \
   mcpu=604: -m604; \
@@ -88,8 +88,7 @@ do {  \
   !mcpu*: %{mvsx: -mpwr6; \
maltivec: -m970; \
maix64|mpowerpc64: -mppc64; \
-   : %(asm_default)}; \
-  :%eMissing -mcpu option in ASM_SPEC_CPU?\n} \
+   : %(asm_default)}} \
 -many"
 
 #undef ASM_DEFAULT_SPEC
diff --git a/gcc/config/rs6000/driver-rs6000.c b/gcc/config/rs6000/driver-rs6000.c
index 0a48d46d658..51c6b79e741 100644
--- a/gcc/config/rs6000/driver-rs6000.c
+++ b/gcc/config/rs6000/driver-rs6000.c
@@ -449,7 +449,7 @@ static const struct asm_name asm_names[] = {
   { "power8",  "-mpwr8" },
   { "power9",  "-mpwr9" },
   { "powerpc", "-mppc" },
-  { "rs64a",   "-mppc" },
+  { "rs64","-mppc" },
   { "603", "-m603" },
   { "603e","-m603" },
   { "604", "-m604" },
@@ -477,8 +477,9 @@ static const struct asm_name asm_names[] = {
   { "power9",  "-mpower9" },
   { "a2",  "-ma2" },
   { "powerpc", "-mppc" },
+  { "powerpc64", "-mppc64" },
   { "powerpc64le", "%{mpower9-vector:-mpower9;:-mpower8}" },
-  { "rs64a",   "-mppc64" },
+  { "rs64","-mppc64" },
   { "401", "-mppc" },
   { "403", "-m403" },
   { "405", "-m405" },
@@ -519,6 +520,7 @@ static const struct asm_name asm_names[] = {
   { "e500mc64","-me500mc64" },
   { "e5500",   "-me5500" },
   { "e6500",   "-me6500" },
+  { "titan",   "-mtitan" },
   { NULL,  "\
 %{mpower9-vector: -mpower9; \
   mpower8-vector|mcrypto|mdirect-move|mhtm: -mpower8; \
diff --git a/gcc/config/rs6000/rs6000.h b/gcc/config/rs6000/rs6000.h
index 2a62679bdb8..e7e998d1492 100644
--- a/gcc/config/rs6000/rs6000.h
+++ b/gcc/config/rs6000/rs6000.h
@@ -87,9 +87,10 @@
   mcpu=power4: -mpower4; \
   mcpu=power3: -mppc64; \
   mcpu=powerpc: -mppc; \
+  mcpu=powerpc64: -mppc64; \
   mcpu=a2: -ma2; \
   mcpu=cell: -mcell; \
-  mcpu=rs64a: -mppc64; \
+  mcpu=rs64: -mppc64; \
   mcpu=401: -mppc; \
   mcpu=403: -m403; \
   mcpu=405: -m405; \
@@ -130,11 +131,12 @@
   mcpu=e500mc64: -me500mc64; \
   mcpu=e5500: -me5500; \
   mcpu=e6500: -me6500; \
+  mcpu=titan: -mtitan; \
   !mcpu*: %{mpower9-vector: -mpower9; \
mpower8-vector|mcrypto|mdirect-move|mhtm: -mpower8; \
mvsx: -mpower7; \
mpowerpc64: -mppc64;: %(asm_default)}; \
-  :%eMissing -mcpu option in ASM_SPEC_CPU?\n} \
+  :%eMissing -mcpu option in ASM_CPU_SPEC?\n} \
 %{mvsx: -mvsx -maltivec; maltivec: -maltivec} \
 -many"
 

-- 
Alan Modra
Australia Development Lab, IBM


[C++ Patch] Fixes for three grokbitfield diagnostics

2018-12-07 Thread Paolo Carlini

Hi,

these should be more or less straightforward (now that 
DECL_SOURCE_LOCATION often extracts a proper location ;). In the case of 
warn_if_not_aligned I also removed the redundant width check (it's never 
null, double checked by running the instrumented testsuite too) and I 
also removed the DECL_NAME use, which definitely doesn't play well with 
unnamed declarations (other/bitfield7.C exercises that too).


Tested x86_64-linux, as usual.

Thanks, Paolo.

//

/cp
2018-12-07  Paolo Carlini  

* decl2.c (grokbitfield): Use DECL_SOURCE_LOCATION in error messages
about bit-fields with function type, warn_if_not_aligned type, and
static bit-fields; avoid DECL_NAME for unnamed declarations.

/testsuite
2018-12-07  Paolo Carlini  

* g++.dg/other/bitfield7.C: New.
* g++.dg/parse/bitfield8.C: Likewise.
* g++.dg/parse/bitfield9.C: Likewise.
* g++.dg/pr53037-4.C: Test the locations too.
Index: cp/decl2.c
===
--- cp/decl2.c  (revision 266884)
+++ cp/decl2.c  (working copy)
@@ -1046,15 +1046,15 @@ grokbitfield (const cp_declarator *declarator,
  check here.  */
   if (TREE_CODE (value) == FUNCTION_DECL)
 {
-  error ("cannot declare bit-field %qD with function type",
-DECL_NAME (value));
+  error_at (DECL_SOURCE_LOCATION (value),
+   "cannot declare bit-field %qD with function type", value);
   return NULL_TREE;
 }
 
-  if (width && TYPE_WARN_IF_NOT_ALIGN (type))
+  if (TYPE_WARN_IF_NOT_ALIGN (type))
 {
-  error ("cannot declare bit-field %qD with %<warn_if_not_aligned%> type",
-	 DECL_NAME (value));
+  error_at (DECL_SOURCE_LOCATION (value), "cannot declare bit-field "
+	    "%qD with %<warn_if_not_aligned%> type", value);
   return NULL_TREE;
 }
 
@@ -1067,7 +1067,8 @@ grokbitfield (const cp_declarator *declarator,
 
   if (TREE_STATIC (value))
 {
-  error ("static member %qD cannot be a bit-field", value);
+  error_at (DECL_SOURCE_LOCATION (value),
+   "static member %qD cannot be a bit-field", value);
   return NULL_TREE;
 }
 
Index: testsuite/g++.dg/other/bitfield7.C
===
--- testsuite/g++.dg/other/bitfield7.C  (nonexistent)
+++ testsuite/g++.dg/other/bitfield7.C  (working copy)
@@ -0,0 +1,7 @@
+typedef int __attribute__((warn_if_not_aligned(8))) intwna;
+
+struct S
+{
+  intwna : 2;  // { dg-error "cannot declare bit-field" }
+  intwna i : 2;  // { dg-error "10:cannot declare bit-field .i." }
+};
Index: testsuite/g++.dg/parse/bitfield8.C
===
--- testsuite/g++.dg/parse/bitfield8.C  (nonexistent)
+++ testsuite/g++.dg/parse/bitfield8.C  (working copy)
@@ -0,0 +1,4 @@
+struct A
+{
+  static int a : 1;  // { dg-error "14:static member .a. cannot be a bit-field" }
+};
Index: testsuite/g++.dg/parse/bitfield9.C
===
--- testsuite/g++.dg/parse/bitfield9.C  (nonexistent)
+++ testsuite/g++.dg/parse/bitfield9.C  (working copy)
@@ -0,0 +1,6 @@
+template
+struct A
+{
+  typedef T type();
+  type i : 2;  // { dg-error "8:cannot declare bit-field" }
+};
Index: testsuite/g++.dg/pr53037-4.C
===
--- testsuite/g++.dg/pr53037-4.C(revision 266884)
+++ testsuite/g++.dg/pr53037-4.C(working copy)
@@ -12,7 +12,7 @@ foo2 (void) /* { dg-error "'warn_if_not_aligned' m
 
 struct foo3
 {
-  int i : 2 __attribute__((warn_if_not_aligned(8))); /* { dg-error "'warn_if_not_aligned' may not be specified for 'i'" } */
+  int i : 2 __attribute__((warn_if_not_aligned(8))); /* { dg-error "7:'warn_if_not_aligned' may not be specified for 'i'" } */
 };
 
 typedef unsigned int __u32
@@ -20,5 +20,5 @@ typedef unsigned int __u32
 
 struct foo4
 {
-  __u32 i : 2; /* { dg-error "cannot declare bit-field 'i' with 'warn_if_not_aligned' type" } */
+  __u32 i : 2; /* { dg-error "9:cannot declare bit-field 'i' with 'warn_if_not_aligned' type" } */
 };


Re: RFC: libiberty PATCH to disable demangling of ancient mangling schemes

2018-12-07 Thread Jakub Jelinek
On Fri, Dec 07, 2018 at 10:27:17AM +, Nick Clifton wrote:
> >> Looks good to me.  Independently, do you see a reason not to disable the
> >> old demangler entirely?
> > 
> > Like so.  Does anyone object to this?  These mangling schemes haven't
> > been relevant in decades.
> 
> I am not really familiar with this old scheme, so please excuse my ignorance
> in asking these questions:
> 
>   * How likely is it that there are old toolchains in use out there that
> still use the v2 mangling?  I.e. I guess that I am asking "which
> generation(s) of gcc used v2 mangling?"

GCC 3.0 and up used the new (Itanium C++ ABI) mangling; 2.95 and older
used the old mangling (2.96-RH used the new mangling, I believe).
So you need a compiler more than 17.5 years old to see the old mangling.
Such a compiler didn't support most of the contemporarily used platforms
at all (e.g. x86-64, powerpc64le, aarch64, I believe not even
powerpc64-linux).

Jakub


Re: [PATCH] Fix PR63184

2018-12-07 Thread Richard Biener
On Fri, 7 Dec 2018, Richard Biener wrote:

> 
> The following fixes PR63184 by using tree-affine to resolve pointer
> comparisons.  Instead of trying to stick this into a match.pd pattern
> the following does this in the more constrained forwprop environment.
> 
> I've only implemented the cases where the comparison resolves to a
> compile-time value, not the case where for example &a[i] < &a[j]
> could be simplified to i < j.  I'm not sure I can trust the
> tree-affine machinery enough here to do that.
> 
> Both testcases require some CSE to happen thus the first forwprop
> pass doesn't catch it.
> 
> Bootstrap & regtest running on x86_64-unknown-linux-gnu.
> 
> I'll collect some statistics from bootstrap.

Somewhat depressing.  There's a single instance in
libiberty/rust-demangle.c that gets resolved (an ordered compare).
This instance triggers 4 times during a c,c++ bootstrap, compared
to 258098 affine expansion combinations tried.

It doesn't trigger in tramp3d at all (just to try a C++ code base).

I suspect the cases in PR63184 are arcane enough and usually we
have simpler addresses that are resolved with the existing
patterns.

I'll attach the patch to the PR and leave it alone.

Richard.

> Richard.
> 
> From 79e88f7d31def4c49fe0b401e7fda408ef447710 Mon Sep 17 00:00:00 2001
> From: Richard Guenther 
> Date: Fri, 7 Dec 2018 10:47:07 +0100
> Subject: [PATCH] fix-pr63184
> 
> 2018-12-07  Richard Biener  
> 
>   PR tree-optimization/63184
>   * tree-ssa-forwprop.c: Include tree-affine.h.
>   (ttacache): New global.
>   (forward_propagate_into_comparison_1): Use tree-affine to
>   resolve pointer comparisons.
>   (pass_forwprop::execute): Release ttacache.
> 
>   * c-c++-common/pr19807-2.c: Remove XFAIL.
>   * c-c++-common/pr19807-3.c: Likewise.
> 
> diff --git a/gcc/testsuite/c-c++-common/pr19807-2.c 
> b/gcc/testsuite/c-c++-common/pr19807-2.c
> index d2c010140d0..18949a9bc5a 100644
> --- a/gcc/testsuite/c-c++-common/pr19807-2.c
> +++ b/gcc/testsuite/c-c++-common/pr19807-2.c
> @@ -1,6 +1,5 @@
> -/* Some targets can optimize this on RTL.  */
> -/* { dg-do link { target { x86_64-*-* i?86-*-* } } } */
> -/* { dg-options "-O -fdump-tree-optimized" } */
> +/* { dg-do link } */
> +/* { dg-options "-O -fdump-tree-forwprop2" } */
>  
>  extern void link_error(void);
>  int i;
> @@ -12,4 +11,4 @@ int main()
>return 0;
>  }
>  
> -/* { dg-final { scan-tree-dump-not "link_error" "optimized" { xfail *-*-* } 
> } } */
> +/* { dg-final { scan-tree-dump-not "link_error" "forwprop2" } } */
> diff --git a/gcc/testsuite/c-c++-common/pr19807-3.c 
> b/gcc/testsuite/c-c++-common/pr19807-3.c
> index bb7f9827725..f6a1531677f 100644
> --- a/gcc/testsuite/c-c++-common/pr19807-3.c
> +++ b/gcc/testsuite/c-c++-common/pr19807-3.c
> @@ -1,6 +1,5 @@
> -/* Some targets can optimize this on RTL.  */
> -/* { dg-do link { target { x86_64-*-* i?86-*-* } } } */
> -/* { dg-options "-O -fdump-tree-optimized" } */
> +/* { dg-do link } */
> +/* { dg-options "-O -fdump-tree-forwprop2" } */
>  
>  extern void link_error(void);
>  int i;
> @@ -12,4 +11,4 @@ int main()
>return 0;
>  }
>  
> -/* { dg-final { scan-tree-dump-not "link_error" "optimized" { xfail *-*-* } 
> } } */
> +/* { dg-final { scan-tree-dump-not "link_error" "forwprop2" } } */
> diff --git a/gcc/tree-ssa-forwprop.c b/gcc/tree-ssa-forwprop.c
> index 7449eaf86ae..08e41e2d85d 100644
> --- a/gcc/tree-ssa-forwprop.c
> +++ b/gcc/tree-ssa-forwprop.c
> @@ -48,6 +48,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "optabs-tree.h"
>  #include "tree-vector-builder.h"
>  #include "vec-perm-indices.h"
> +#include "tree-affine.h"
>  
>  /* This pass propagates the RHS of assignment statements into use
> sites of the LHS of the assignment.  It's basically a specialized
> @@ -184,6 +185,8 @@ static tree rhs_to_tree (tree type, gimple *stmt);
>  
>  static bitmap to_purge;
>  
> +static hash_map *ttacache;
> +
>  /* Const-and-copy lattice.  */
>  static vec lattice;
>  
> @@ -459,9 +462,72 @@ forward_propagate_into_comparison_1 (gimple *stmt,
>/* If that wasn't successful either, try both operands.  */
>if (rhs0 != NULL_TREE
>&& rhs1 != NULL_TREE)
> -tmp = combine_cond_expr_cond (stmt, code, type,
> -   rhs0, rhs1,
> -   !(single_use0_p && single_use1_p));
> +{
> +  tmp = combine_cond_expr_cond (stmt, code, type,
> + rhs0, rhs1,
> + !(single_use0_p && single_use1_p));
> +  if (tmp)
> + return tmp;
> +}
> +
> +  /* When comparing two pointers try harder to resolve it by expanding
> + definitions (albeit not using our lattice) and looking at the
> + difference using the affine machinery.  */
> +  if (POINTER_TYPE_P (TREE_TYPE (op0))
> +  && TREE_CODE (op0) == SSA_NAME
> +  && TREE_CODE (op1) == SSA_NAME)
> +{
> +  aff_tree aff0, aff1;
> +  tree_to_aff_c

Re: Make clear, when contributions will be ignored

2018-12-07 Thread Дилян Палаузов
Hello,

will it help, if Bugzilla is reprogrammed to send automatically weekly
reminders on all patches, that are not integrated yet?

Will it help, if I hire myself to integrate the patch, or shall I
rather hire somebody to send reminders?

If something can be done after sending a reminder, then it can be
arranged also without reminders.  In particular, dealing with reminders
is avoidable extra work.

Whether people are paid or not does not change the subject very
much.  I have experienced organizations where people are not paid and
they manage to tackle everything.  I have seen organizations where
people are paid and they do not get the management right.

I am not speaking about having some strict time to get a response, but
rather about ensuring an answer in reasonable time.  No answer in reasonable
time is the same as ignorance — the subject of this thread.

The patch I proposed on 27th Oct was first submitted towards GDB, and
then I was told to send it to GCC.  Here I was told to send it to GDB.
What needs to happen to break the loop?

In any case, if the common aim is to have a system where contributions
do not get lost, then I’m sure the workflows can be adjusted to achieve
this aim.

Regards
  Дилян


On Wed, 2018-12-05 at 17:37 +, Joseph Myers wrote:
> On Wed, 5 Dec 2018, Segher Boessenkool wrote:
> 
> > Patches are usually ignored because everyone thinks someone else will
> > handle it.
> 
> And in this case, it looks like this patch would best be reviewed first in 
> the GDB context - then once committed to binutils-gdb, the committer could 
> post to gcc-patches (CC:ing build system maintainers) requesting a commit 
> to GCC if they don't have write access to GCC themselves.  I consider 
> synchronizing changes to such top-level files in either direction to be 
> obvious and not to need a separate review.
> 



Re: [PATCH, driver specs][2] Put -flto-partition= on the collect2 c/l

2018-12-07 Thread Richard Biener
On Fri, Dec 7, 2018 at 1:24 AM Iain Sandoe  wrote:
>
> Hi
>
> This got stuck in my stack of patches for LTO debug support, and I forgot to 
> ping it…
>
> > On 22 Aug 2018, at 14:20, Richard Biener  wrote:
> >
> > On Wed, Aug 22, 2018 at 2:56 PM Iain Sandoe  wrote:
> >>
> >>
> >>> On 20 Aug 2018, at 11:01, Richard Biener  
> >>> wrote:
> >>>
> >>> On Sat, Aug 18, 2018 at 9:00 PM Iain Sandoe  wrote:
> 
> >>
>  I plan on making Darwin default to fno-lto for link unless there’s an 
>  “flto*” on the link line, since otherwise there’s a process launch for 
>  every object on the c/l to do “nm”, which is quite heavy weight.  At 
>  present, ISTM if the compiler is configured with —enable-lto, the link 
>  line defaults to assuming that every object needs to be checked.
> >>>
> >>> I think we wanted to transparently handle LTO objects (similar to how
> >>> it works with a linker plugin).
> >>
> >> At present, we are not only calling nm for every object, but also dsymutil 
> >> sometimes unnecessarily - since the assumption on the “maybe_lto” path is 
> >> that an temporary file will be generated.
> >>
> >> as a headline - when I did the simplistic “default is to assume fno-lto” 
> >> that took 10% off the bootstrap time (with a stage#1 compiler without the 
> >> optimisation).
> >>
> >> - I have a patch under test as below + better fidelity of avoiding 
> >> dsymutil runs.
> >>
> >>> Ideally collect2 wouldn't use nm but simple-object to inspect files
> >>> (but that doesn't have symbol table query support
> >>
> >> Hrm. My WIP had grown symtab awareness to support debug-section copy on
> >> Mach-O
> >> (and the ELF impl. looks like it understands both symtab and relocs)
> >> - so maybe that’s not as far away as we might think.
> >>
> >>> though looking for LTO specific sections would have been better in the
> >>> first place - I'd suggest .gnu.lto_.symtab).
> >>
> >> I’ve done this noting that the symtab seems to be the last emitted section 
> >> - is that why you chose it (for security, rather than just stopping as 
> >> soon as we see a .gnu.lto_. section?)
> >
> > Ah, any .gnu.lto_.section would work as well I guess - I just picked
> > one that should be always
> > created.  .gnu.lto_.opts is another one.
>
> >> comments?
> >
> > OK.  Feel free to stop when recognizing any LTO section.  Btw, as you
> > include lto-section-names.h
> > you should probably use LTO_SECTION_NAME_PREFIX instead of hard-coding
> > .gnu.lto_.
>
> So I just look for the LTO_SECTION_NAME_PREFIX
>
> > I'm not sure if/how we should handle offload sections and if those
> > work at all without a
> > linker plugin currently?
>
> In this version, I look for the OFFLOAD_SECTION_NAME_PREFIX as well,
> although AFAICS collect2 itself has no specific handling of offload, the 
> detection
> is done in lto-wrapper.
>
> OK for trunk now?

OK.

Thanks,
Richard.

> thanks
> Iain
>
>
> From: Iain Sandoe 
> Date: Tue, 21 Aug 2018 12:55:02 +0100
> Subject: [PATCH] [PATCH, collect2] Use simple-object to find out if objects
>  contain LTO.
>
> This replaces the use of nm to search for the LTO common symbol marker
> and uses simple object to see if there's a section starting with
> ".gnu.lto_." or .gnu.offload_lto_
> ---
>  gcc/collect2.c | 120 +++--
>  1 file changed, 57 insertions(+), 63 deletions(-)
>
> diff --git a/gcc/collect2.c b/gcc/collect2.c
> index 60269682ec..dcbd3e18a6 100644
> --- a/gcc/collect2.c
> +++ b/gcc/collect2.c
> @@ -30,6 +30,8 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tm.h"
>  #include "filenames.h"
>  #include "file-find.h"
> +#include "simple-object.h"
> +#include "lto-section-names.h"
>
>  /* TARGET_64BIT may be defined to use driver specific functionality. */
>  #undef TARGET_64BIT
> @@ -804,7 +806,9 @@ maybe_run_lto_and_relink (char **lto_ld_argv, char 
> **object_lst,
>/* Run the linker again, this time replacing the object files
>   optimized by the LTO with the temporary file generated by the LTO.  
> */
>fork_execute ("ld", out_lto_ld_argv, HAVE_GNU_LD && at_file_supplied);
> -  post_ld_pass (true);
> +  /* We assume that temp files were created, and therefore we need to 
> take
> + that into account (maybe run dsymutil).  */
> +  post_ld_pass (/*temp_file*/true);
>free (lto_ld_argv);
>
>maybe_unlink_list (lto_o_files);
> @@ -814,10 +818,11 @@ maybe_run_lto_and_relink (char **lto_ld_argv, char 
> **object_lst,
>/* Our caller is relying on us to do the link
>   even though there is no LTO back end work to be done.  */
>fork_execute ("ld", lto_ld_argv, HAVE_GNU_LD && at_file_supplied);
> -  post_ld_pass (false);
> +  /* No LTO objects were found, so no new temp file.  */
> +  post_ld_pass (/*temp_file*/false);
>  }
>else
> -post_ld_pass (true);
> +post_ld_pass (false); /* No LTO objects were found, no temp file.  */
>  }
>
>  

Re: [PATCH 2/6, OpenACC, libgomp] Async re-work, oacc-* parts

2018-12-07 Thread Thomas Schwinge
Hi Chung-Lin!

On Tue, 25 Sep 2018 21:10:47 +0800, Chung-Lin Tang  
wrote:
> These are the OpenACC specific changes, mostly the re-implementation of 
> async-related acc_* runtime
> library API functions to use the new backend plugin interfaces, in a 
> non-target specific way.

(The patch was missing the "libgomp/oacc-int.h" changes, but I had no
problem replicating these from openacc-gcc-8-branch.)


Does the following make sense?

commit f86b403dbe6ed17afa8d157ec4089ff169a63680
Author: Thomas Schwinge 
Date:   Fri Dec 7 12:19:56 2018 +0100

Don't create an asyncqueue just to then test/synchronize with it
---
 libgomp/oacc-async.c| 12 
 libgomp/oacc-parallel.c |  4 +++-
 2 files changed, 11 insertions(+), 5 deletions(-)

diff --git libgomp/oacc-async.c libgomp/oacc-async.c
index 553082fe3d4a..c9b134ac3380 100644
--- libgomp/oacc-async.c
+++ libgomp/oacc-async.c
@@ -119,8 +119,11 @@ acc_async_test (int async)
   if (!thr || !thr->dev)
 gomp_fatal ("no device active");
 
-  goacc_aq aq = lookup_goacc_asyncqueue (thr, true, async);
-  return thr->dev->openacc.async.test_func (aq);
+  goacc_aq aq = lookup_goacc_asyncqueue (thr, false, async);
+  if (!aq)
+return 1;
+  else
+return thr->dev->openacc.async.test_func (aq);
 }
 
 int
@@ -148,8 +151,9 @@ acc_wait (int async)
 
   struct goacc_thread *thr = get_goacc_thread ();
 
-  goacc_aq aq = lookup_goacc_asyncqueue (thr, true, async);
-  thr->dev->openacc.async.synchronize_func (aq);
+  goacc_aq aq = lookup_goacc_asyncqueue (thr, false, async);
+  if (aq)
+thr->dev->openacc.async.synchronize_func (aq);
 }
 
 /* acc_async_wait is an OpenACC 1.0 compatibility name for acc_wait.  */
diff --git libgomp/oacc-parallel.c libgomp/oacc-parallel.c
index 7b6a6e515018..0be48f98036f 100644
--- libgomp/oacc-parallel.c
+++ libgomp/oacc-parallel.c
@@ -491,7 +491,9 @@ goacc_wait (int async, int num_waits, va_list *ap)
 {
   int qid = va_arg (*ap, int);
   
-  goacc_aq aq = get_goacc_asyncqueue (qid);
+  goacc_aq aq = lookup_goacc_asyncqueue (thr, false, qid);
+  if (!aq)
+   continue;
   if (acc_dev->openacc.async.test_func (aq))
continue;
   if (async == acc_async_sync)


Grüße
 Thomas


Re: RFC: libiberty PATCH to disable demangling of ancient mangling schemes

2018-12-07 Thread Richard Biener
On Thu, Dec 6, 2018 at 10:22 PM Jason Merrill  wrote:
>
> On Thu, Dec 6, 2018 at 11:14 AM Jason Merrill  wrote:
> >
> > Looks good to me.  Independently, do you see a reason not to disable the
> > old demangler entirely?
>
> Like so.  Does anyone object to this?  These mangling schemes haven't
> been relevant in decades.

Why #ifdef the code?  Just rip it out?

Richard.


Re: Another patch for PR88282

2018-12-07 Thread Richard Sandiford
Vladimir Makarov  writes:
>    Here is another solution for
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88282
>
>    less hackish than original one.
>
>    The patch was bootstrapped and tested on x86/x86-64/ppc64/aarch64.
>
>    Committed as rev. 266862.

Nice!  Thanks for doing this.

Richard


Re: [PATCH, GCC, AARCH64, 3/6] Restrict indirect tail calls to x16 and x17

2018-12-07 Thread James Greenhalgh
On Thu, Nov 29, 2018 at 10:56:46AM -0600, Sudakshina Das wrote:
> Hi
> 
> On 02/11/18 18:37, Sudakshina Das wrote:
> > Hi
> > 
> > This patch is part of a series that enables ARMv8.5-A in GCC and
> > adds Branch Target Identification Mechanism.
> > (https://developer.arm.com/products/architecture/cpu-architecture/a-profile/exploration-tools)
> > 
> > This patch changes the registers that are allowed for indirect tail
> > calls. We are choosing to restrict these to only x16 or x17.
> > 
> > Indirect tail calls are special in a way that they convert a call
> > statement (BLR instruction) to a jump statement (BR instruction). For
> > the best possible use of Branch Target Identification Mechanism, we
> > would like to place a "BTI C" (call) at the beginning of the function
> > which is only compatible with BLRs and BR X16/X17. In order to make
> > indirect tail calls compatible with this scenario, we are restricting
> > the TAILCALL_ADDR_REGS.
> > 
> > In order to use x16/x17 for this purpose, we also had to change the use
> > of these registers in the epilogue/prologue handling. For this purpose
> > we are now using x12 and x13 named as EP0_REGNUM and EP1_REGNUM as
> > scratch registers for epilogue and prologue.
> > 
> > Bootstrapped and regression tested with aarch64-none-linux-gnu. Updated
> > test. Ran Spec2017 and no performance hit.
> > 
> > Is this ok for trunk?

While this isn't strictly needed outside of compilation for targets with BTI
protection enabled, I can well appreciate the simplification in our backend
code to avoid special cases in these areas.

I don't foresee a high likelihood of performance issues from this patch, 
but please do keep an eye out for any reports as we move through stage 3.

This is OK for trunk.

Thanks,
James


> > 
> > Thanks
> > Sudi
> > 
> > 
> > *** gcc/ChangeLog***
> > 
> > 2018-xx-xx  Sudakshina Das  
> > 
> >* config/aarch64/aarch64.c (aarch64_expand_prologue): Use new
> >epilogue/prologue scratch registers EP0_REGNUM and EP1_REGNUM.
> >(aarch64_expand_epilogue): Likewise.
> >(aarch64_output_mi_thunk): Likewise
> >* config/aarch64/aarch64.h (REG_CLASS_CONTENTS): Change
> > TAILCALL_ADDR_REGS
> >to x16 and x17.
> >* config/aarch64/aarch64.md: Define EP0_REGNUM and EP1_REGNUM.
> > 
> > *** gcc/testsuite/ChangeLog ***
> > 
> > 2018-xx-xx  Sudakshina Das  
> > 
> >* gcc.target/aarch64/test_frame_17.c: Update to check for
> > EP0_REGNUM instead of IP0_REGNUM and add test case.
> > 
> I have edited the patch to take out a change that was not needed as part
> of this patch in aarch64_expand_epilogue. The only change now happening
> there is as mentioned in the ChangeLog to replace the uses of IP0/IP1.
> ChangeLog still applies.


Re: [PATCH][Version 3]Come up with -flive-patching master option.

2018-12-07 Thread Rainer Orth
Hi Qing,

>> On Nov 28, 2018, at 9:52 AM, Jan Hubicka  wrote:
>> 
>>> 
>>> 2018-11-20  qing zhao  
>>> 
>>> * cif-code.def (EXTERN_LIVE_ONLY_STATIC): New CIF code.
>>> * common.opt: Add -flive-patching flag.
>>> * doc/invoke.texi: Document -flive-patching.
>>> * flag-types.h (enum live_patching_level): New enum.
>>> * ipa-inline.c (can_inline_edge_p): Disable external functions from
>>> inlining when flag_live_patching is LIVE_PATCHING_INLINE_ONLY_STATIC.
>>> * opts.c (control_options_for_live_patching): New function.
>>> (finish_options): Make flag_live_patching incompatible with flag_lto.
>>> Control IPA optimizations based on different levels of 
>>> flag_live_patching.
>>> 
>>> gcc/testsuite/ChangeLog:
>>> 
>>> 2018-11-20  qing zhao  
>>> 
>>> * gcc.dg/live-patching-1.c: New test.
>>> * gcc.dg/live-patching-2.c: New test.
>>> * gcc.dg/live-patching-3.c: New test.
>>> * gcc.dg/tree-ssa/writeonly-3.c: New test.
>>> * gcc.target/i386/ipa-stack-alignment-2.c: New test.
[...]
>> Patch is OK,
>
> thanks for the review.
>
> I will commit the patch very soon.

the new gcc.target/i386/ipa-stack-alignment-2.c testcase FAILs on
Solaris/x86:

FAIL: gcc.target/i386/ipa-stack-alignment-2.c scan-assembler sub.*%.sp

Like ipa-stack-alignment.c, it needs -fomit-frame-pointer.  Fixed as
follows, tested on i386-pc-solaris2.11 and x86_64-pc-linux-gnu, 32 and
64-bit each.  Installed on mainline.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2018-12-06  Rainer Orth  

* gcc.target/i386/ipa-stack-alignment-2.c: Add
-fomit-frame-pointer to dg-options.

# HG changeset patch
# Parent  580b8c24b018674f1ffd56a9c672129f746682c5
Build gcc.target/i386/ipa-stack-alignment-2.c with -fomit-frame-pointer

diff --git a/gcc/testsuite/gcc.target/i386/ipa-stack-alignment-2.c b/gcc/testsuite/gcc.target/i386/ipa-stack-alignment-2.c
--- a/gcc/testsuite/gcc.target/i386/ipa-stack-alignment-2.c
+++ b/gcc/testsuite/gcc.target/i386/ipa-stack-alignment-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-flive-patching -O" } */
+/* { dg-options "-flive-patching -O -fomit-frame-pointer" } */
 
 typedef struct {
   long a;


Pass GDCFLAGS and CCASFLAGS to libphobos subdirs

2018-12-07 Thread Rainer Orth
When trying to rebuild libphobos with -g3 -O0 for better debugging, I
noticed that GDCFLAGS weren't passed down as expected.  It turned out
that they are missing from AM_MAKEFLAGS.  After I fixed this, the only
file still compiled with -g -O2 was libdruntime/core/threadasm.S, so I
added CCASFLAGS, too.

Tested on i386-pc-solaris2.11.  Ok for mainline?

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


2018-12-06  Rainer Orth  

* Makefile.am (AM_MAKEFLAGS): Pass CCASFLAGS, GDCFLAGS.
* Makefile.in: Regenerate.

# HG changeset patch
# Parent  5910a51aed6e6de5841187a79b17799d6e2241b5
Pass GDCFLAGS and CCASFLAGS to libphobos subdirs

diff --git a/libphobos/Makefile.am b/libphobos/Makefile.am
--- a/libphobos/Makefile.am
+++ b/libphobos/Makefile.am
@@ -29,12 +29,14 @@ AM_MAKEFLAGS = \
 	"AR_FLAGS=$(AR_FLAGS)" \
 	"CC_FOR_BUILD=$(CC_FOR_BUILD)" \
 	"CC_FOR_TARGET=$(CC_FOR_TARGET)" \
+	"CCASFLAGS=$(CCASFLAGS)" \
 	"CFLAGS=$(CFLAGS)" \
 	"CXXFLAGS=$(CXXFLAGS)" \
 	"CFLAGS_FOR_BUILD=$(CFLAGS_FOR_BUILD)" \
 	"CFLAGS_FOR_TARGET=$(CFLAGS_FOR_TARGET)" \
 	"GDC_FOR_TARGET=$(GDC_FOR_TARGET)" \
 	"GDC=$(GDC)" \
+	"GDCFLAGS=$(GDCFLAGS)" \
 	"INSTALL=$(INSTALL)" \
 	"INSTALL_DATA=$(INSTALL_DATA)" \
 	"INSTALL_PROGRAM=$(INSTALL_PROGRAM)" \


Re: [committed] PR testsuite/86540, twiddle for aarch64

2018-12-07 Thread Richard Earnshaw (lists)
On 06/12/2018 15:36, Jeff Law wrote:
> 
> As outlined in the PR, the aarch64 has a non-default value for
> CASE_VALUES_THRESHOLD which changes decisions in switch lowering.  Those
> changes in switch lowering can expose additional jump threads later in
> the pipeline which cause heartburn for a couple tests.
> 
> I looked at all the other ports with a non-default value of
> CASE_VALUES_THRESHOLD and only aarch64 is high enough to trigger these
> changes in behavior on the two relevant tests.  So I'm just skipping the
> tests that run after switch lowering on aarch64.
> 
> Verified with a cross that these tests now pass.
> 
> Committing to the trunk,
> 

Can't we use a param to force the value back to (near) the default?
That would then work even if other targets start changing the default here.

R.


> Jeff
> 
> 
> 
> diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
> index 1adb751cd34..0272bbe0605 100644
> --- a/gcc/testsuite/ChangeLog
> +++ b/gcc/testsuite/ChangeLog
> @@ -1,3 +1,10 @@
> +2018-12-06  Jeff Law  
> +
> + PR testsuite/86540
> + * gcc.dg/tree-ssa/ssa-dom-thread-7.c: Skip the post switch conversion
> + tests on aarch64.
> + * gcc.dg/tree-ssa/pr77445-2.c: Similarly.
> + 
>  2018-12-06  David Malcolm  
>  
>   PR c++/85110
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c
> index eecfc4b195a..c5d567dabdc 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/pr77445-2.c
> @@ -118,10 +118,14 @@ enum STATES FMS( u8 **in , u32 *transitions) {
>  
>  /* The profile is not updated perfectly because it is inconsitent from
> profile estimation stage. But the number of inconsistencies should not
> -   increase much.  */
> +   increase much. 
> +
> +   aarch64 has the highest CASE_VALUES_THRESHOLD in GCC.  It's high enough
> +   to change decisions in switch expansion which in turn can expose new
> +   jump threading opportunities.  Skip the later tests on aarch64.  */
>  /* { dg-final { scan-tree-dump "Jumps threaded: 1\[1-9\]" "thread1" } } */
>  /* { dg-final { scan-tree-dump-times "Invalid sum" 3 "thread1" } } */
>  /* { dg-final { scan-tree-dump-not "not considered" "thread1" } } */
>  /* { dg-final { scan-tree-dump-not "not considered" "thread2" } } */
> -/* { dg-final { scan-tree-dump-not "not considered" "thread3" } } */
> -/* { dg-final { scan-tree-dump-not "not considered" "thread4" } } */
> +/* { dg-final { scan-tree-dump-not "not considered" "thread3" { target { ! 
> aarch64*-*-* } } } } */
> +/* { dg-final { scan-tree-dump-not "not considered" "thread4" { target { ! 
> aarch64*-*-* } } } } */ 
> diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c 
> b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
> index e395de26ec0..f833aa4351d 100644
> --- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
> +++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-dom-thread-7.c
> @@ -3,8 +3,11 @@
>  /* { dg-final { scan-tree-dump "Jumps threaded: 16"  "thread1" } } */
>  /* { dg-final { scan-tree-dump "Jumps threaded: 9" "thread2" } } */
>  /* { dg-final { scan-tree-dump "Jumps threaded: 1"  "dom2" } } */
> -/* { dg-final { scan-tree-dump-not "Jumps threaded"  "dom3" } } */
> -/* { dg-final { scan-tree-dump-not "Jumps threaded"  "vrp2" } } */
> +/* aarch64 has the highest CASE_VALUES_THRESHOLD in GCC.  It's high enough
> +   to change decisions in switch expansion which in turn can expose new
> +   jump threading opportunities.  Skip the later tests on aarch64.  */
> +/* { dg-final { scan-tree-dump-not "Jumps threaded"  "dom3" { target { ! 
> aarch64*-*-* } } } } */
> +/* { dg-final { scan-tree-dump-not "Jumps threaded"  "vrp2" { target { ! 
> aarch64*-*-* } } } } */
>  
>  /* Most architectures get 3 threadable paths here, whereas aarch64 and
> possibly others get 5.  We really should rewrite threading tests to
> 



Re: [PATCH] OpenACC 2.6 manual deep copy support (attach/detach)

2018-12-07 Thread Jakub Jelinek
On Fri, Nov 30, 2018 at 03:41:09AM -0800, Julian Brown wrote:
>   gcc/c-family/
>   * c-pragma.h (pragma_omp_clause): Add PRAGMA_OACC_CLAUSE_ATTACH,
>   PRAGMA_OACC_CLAUSE_DETACH.
...
> @@ -11804,9 +11808,12 @@ c_parser_omp_variable_list (c_parser *parser,
>   case OMP_CLAUSE_MAP:
>   case OMP_CLAUSE_FROM:
>   case OMP_CLAUSE_TO:
> -   while (c_parser_next_token_is (parser, CPP_DOT))
> +   while (c_parser_next_token_is (parser, CPP_DOT)
> +  || c_parser_next_token_is (parser, CPP_DEREF))
>   {
> location_t op_loc = c_parser_peek_token (parser)->location;
> +   if (c_parser_next_token_is (parser, CPP_DEREF))
> + t = build_simple_mem_ref (t);

This change is not ok: even if OpenACC allows it in clauses, OpenMP 4.5 does
not, and OpenMP 5.0 allows arbitrary lvalues that will need to be handled
differently (still unimplemented).  So, this needs to be guarded for OpenACC
only (perhaps for selected OpenACC clauses)?

> @@ -12632,6 +12631,8 @@ handle_omp_array_sections_1 (tree c, tree t, 
> vec &types,
>   }
> t = TREE_OPERAND (t, 0);
>   }
> +   if (TREE_CODE (t) == MEM_REF)
> + t = TREE_OPERAND (t, 0);

Again, better guard this for OpenACC.  Maybe verify that mem_ref_offset is 0?

> @@ -14163,6 +14214,8 @@ c_finish_omp_clauses (tree clauses, enum 
> c_omp_region_type ort)
>   }
> if (remove)
>   break;
> +   if (TREE_CODE (t) == MEM_REF)
> + t = TREE_OPERAND (t, 0);

Guard again?

> @@ -31832,15 +31836,19 @@ cp_parser_omp_var_list_no_open (cp_parser *parser, 
> enum omp_clause_code kind,
>   case OMP_CLAUSE_MAP:
>   case OMP_CLAUSE_FROM:
>   case OMP_CLAUSE_TO:
> -   while (cp_lexer_next_token_is (parser->lexer, CPP_DOT))
> +   while (cp_lexer_next_token_is (parser->lexer, CPP_DOT)
> +  || cp_lexer_next_token_is (parser->lexer, CPP_DEREF))

Ditto as for C.

> @@ -4691,6 +4690,19 @@ handle_omp_array_sections_1 (tree c, tree t, vec 
> &types,
>if (low_bound == NULL_TREE)
>  low_bound = integer_zero_node;
>  
> +  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
> +  && (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_ATTACH
> +   || OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_DETACH))
> +{
> +  if (length != integer_one_node)
> + {
> +   error_at (OMP_CLAUSE_LOCATION (c),
> + OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_ATTACH
> + ? "array section in % clause"
> + : "array section in % clause");

So, are any array sections invalid, including e.g. [0:1] or say
[5:] where size of the array is 6 elts, or what exactly is invalid?

> +  if (TREE_CODE (type) != POINTER_TYPE)
> + {
> +   error_at (OMP_CLAUSE_LOCATION (c),
> + OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_ATTACH
> + ? "expected pointer in % clause"
> + : "expected pointer in % clause");

Perhaps you can use %qs and omp_clause_name [OMP_CLAUSE_CODE (c)] ?
> +   return true;
> + }
> +}
> +
> +  return false;
> +}
> +
>  /* For all elements of CLAUSES, validate them vs OpenMP constraints.
> Remove any elements from the list that are invalid.  */
>  
> @@ -6288,7 +6337,7 @@ finish_omp_clauses (tree clauses, enum 
> c_omp_region_type ort)
>   t = OMP_CLAUSE_DECL (c);
>   check_dup_generic_t:
> if (t == current_class_ptr
> -   && (ort != C_ORT_OMP_DECLARE_SIMD
> +   && ((ort != C_ORT_OMP_DECLARE_SIMD && ort != C_ORT_ACC)
> || (OMP_CLAUSE_CODE (c) != OMP_CLAUSE_LINEAR
> && OMP_CLAUSE_CODE (c) != OMP_CLAUSE_UNIFORM)))
>   {
> @@ -6352,8 +6401,7 @@ finish_omp_clauses (tree clauses, enum 
> c_omp_region_type ort)
>   handle_field_decl:
> if (!remove
> && TREE_CODE (t) == FIELD_DECL
> -   && t == OMP_CLAUSE_DECL (c)
> -   && ort != C_ORT_ACC)
> +   && t == OMP_CLAUSE_DECL (c))
>   {
> OMP_CLAUSE_DECL (c)
>   = omp_privatize_field (t, (OMP_CLAUSE_CODE (c)
> @@ -6420,7 +6468,7 @@ finish_omp_clauses (tree clauses, enum 
> c_omp_region_type ort)
>   omp_note_field_privatization (t, OMP_CLAUSE_DECL (c));
> else
>   t = OMP_CLAUSE_DECL (c);
> -   if (t == current_class_ptr)
> +   if (ort != C_ORT_ACC && t == current_class_ptr)
>   {
> error_at (OMP_CLAUSE_LOCATION (c),
>   "% allowed in OpenMP only in %"
> @@ -6907,7 +6955,7 @@ finish_omp_clauses (tree clauses, enum 
> c_omp_region_type ort)
>   }
> if (t == error_mark_node)
>   remove = true;
> -   else if (t == current_class_ptr)
> +   else if (ort != C_ORT_ACC && t == current_class_ptr)
>   {
> error_at (OMP_CLAUSE_LOCATION (c),
>   "% allowed 

Re: [PATCH, OpenACC] Fortran "declare create"/allocate support for OpenACC

2018-12-07 Thread Jakub Jelinek
On Thu, Oct 04, 2018 at 02:04:13PM +0100, Julian Brown wrote:
>   gcc/
>   * omp-low.c (scan_sharing_clauses): Update handling of OpenACC declare
>   create, declare copyin and declare deviceptr to have local lifetimes.
>   (convert_to_firstprivate_int): Handle pointer types.
>   (convert_from_firstprivate_int): Likewise.  Create local storage for
>   the values being pointed to.  Add new orig_type argument.
>   (lower_omp_target): Handle GOMP_MAP_DECLARE_{ALLOCATE,DEALLOCATE}.
>   Add orig_type argument to convert_from_firstprivate_int call.
>   Allow pointer types with GOMP_MAP_FIRSTPRIVATE_INT.  Don't privatize
>   firstprivate VLAs.
>   * tree-pretty-print.c (dump_omp_clause): Handle
>   GOMP_MAP_DECLARE_{ALLOCATE,DEALLOCATE}.
> 
>   gcc/fortran/
>   * gfortran.h (enum gfc_omp_map_op): Add OMP_MAP_DECLARE_ALLOCATE,
>   OMP_MAP_DECLARE_DEALLOCATE.
>   (gfc_omp_clauses): Add update_allocatable.
>   * trans-array.c (gfc_array_allocate): Call
>   gfc_trans_oacc_declare_allocate for decls that have oacc_declare_create
>   attribute set.
>   * trans-decl.c (add_attributes_to_decl): Enable lowering of OpenACC
>   declare create, declare copyin and declare deviceptr clauses.
>   (find_module_oacc_declare_clauses): Relax oacc_declare_create to
>   OMP_MAP_ALLOC, and oacc_declare_copyin to OMP_MAP_TO, in order to
>   match OpenACC 2.5 semantics.
>   (finish_oacc_declare): Reset module_oacc_clauses before scanning each
>   namespace.
>   * trans-openmp.c (gfc_trans_omp_clauses): Use GOMP_MAP_ALWAYS_POINTER
>   (for update directive) or GOMP_MAP_FIRSTPRIVATE_POINTER (otherwise) for
>   allocatable scalar decls.  Handle OMP_MAP_DECLARE_{ALLOCATE,DEALLOCATE}
>   clauses.
>   (gfc_trans_oacc_executable_directive): Use GOMP_MAP_ALWAYS_POINTER
>   for allocatable scalar data clauses inside acc update directives.
>   (gfc_trans_oacc_declare_allocate): New function.
>   * trans-stmt.c (gfc_trans_allocate): Call
>   gfc_trans_oacc_declare_allocate for decls with oacc_declare_create
>   attribute set.
>   (gfc_trans_deallocate): Likewise.
>   * trans.h (gfc_trans_oacc_declare_allocate): Declare.
> 
>   gcc/testsuite/
>   * gfortran.dg/goacc/declare-allocatable-1.f90: New test.
> 
>   include/
>   * gomp-constants.h (enum gomp_map_kind): Define
>   GOMP_MAP_DECLARE_{ALLOCATE,DEALLOCATE} and GOMP_MAP_FLAG_SPECIAL_4.
> 
>   libgomp/
>   * oacc-mem.c (gomp_acc_declare_allocate): New function.
>   * oacc-parallel.c (GOACC_enter_exit_data): Handle
>   GOMP_MAP_DECLARE_{ALLOCATE,DEALLOCATE}.
>   * testsuite/libgomp.oacc-fortran/allocatable-array.f90: New test.
>   * testsuite/libgomp.oacc-fortran/allocatable-scalar.f90: New test.
>   * testsuite/libgomp.oacc-fortran/declare-allocatable-1.f90: New test.
>   * testsuite/libgomp.oacc-fortran/declare-allocatable-2.f90: New test.
>   * testsuite/libgomp.oacc-fortran/declare-allocatable-3.f90: New test.
>   * testsuite/libgomp.oacc-fortran/declare-allocatable-4.f90: New test.

If Thomas is ok with this, it is fine for me.

Jakub


Re: [PATCH, OpenACC] Enable GOMP_MAP_FIRSTPRIVATE_INT for OpenACC

2018-12-07 Thread Jakub Jelinek
On Thu, Dec 06, 2018 at 10:40:41PM +, Julian Brown wrote:
> + && (TREE_CODE (inner_type) == REAL_TYPE
> + || (!omp_is_reference (var)
> + && INTEGRAL_TYPE_P (inner_type))
> + || TREE_CODE (inner_type) == INTEGER_TYPE)

Not sure I understand the above.  INTEGRAL_TYPE_P is INTEGER_TYPE,
BOOLEAN_TYPE and ENUMERAL_TYPE, so you want to handle INTEGER_TYPE
no matter whether var should be passed by reference or not, but BOOLEAN_TYPE
or ENUMERAL_TYPE only if it is not a reference?
That is just weird.  Any test to back that up?

> + if ((TREE_CODE (inner_type) == REAL_TYPE
> +  || (!omp_is_reference (var)
> +  && INTEGRAL_TYPE_P (inner_type))
> +  || TREE_CODE (inner_type) == INTEGER_TYPE)

Ditto here.

Jakub


Re: [PATCH, libgcc/ARM & testsuite] Optimize executable size when using softfloat fmul/dmul

2018-12-07 Thread Richard Earnshaw (lists)
On 19/11/2018 09:57, Thomas Preudhomme wrote:
> Softfloat single precision and double precision floating-point
> multiplication routines in libgcc share some code with the
> floating-point division of their corresponding precision. As the code
> is structured now, this leads to *all* division code being pulled in an
> executable in softfloat mode even if only multiplication is
> performed.
> 
> This patch creates some new LIB1ASMFUNCS macros to also build files with
> just the multiplication and shared code as weak symbols. By putting
> these earlier in the static library, they can then be picked up when
> only multiplication is used and they are overridden by the global
> definition in the existing file containing both multiplication and
> division code when division is needed.
> 
> The patch also removes changes made to the FUNC_START and ARM_FUNC_START
> macros in r218124 since the intent was to put multiplication and
> division code into their own section in a later patch to achieve the
> same size optimization. That approach relied on specific section layout
> to ensure multiplication and division were not too far from the shared
> bit of code in order for the branches to be within range. Due to the lack
> of guarantees regarding section layout, in particular with all the
> possibilities of linker scripts, this approach was chosen instead. This
> patch keeps the two testcases that were posted by Tony Wang (an Arm
> employee at the time) on the mailing list to implement this approach
> and adds a new one, hence the attribution.
> 
> ChangeLog entries are as follows:
> 
> *** gcc/ChangeLog ***
> 
> 2018-11-14  Thomas Preud'homme  
> 
> * config/arm/elf.h: Update comment about conditions that need to
> match with libgcc/config/arm/lib1funcs.S to also include
> libgcc/config/arm/t-arm.
> * doc/sourcebuild.texi (output-exists, output-exists-not): Rename
> subsubsection these directives are in to "Check for output files".
> Move scan-symbol to that section and add to it new scan-symbol-not
> directive.
> 
> *** gcc/testsuite/ChangeLog ***
> 
> 2018-11-16  Tony Wang  
> Thomas Preud'homme  
> 
> * lib/lto.exp (lto-execute): Define output_file and testname_with_flags
> to same value as execname.
> (scan-symbol): Move and rename to ...
> * lib/gcc-dg.exp (scan-symbol-common): This.  Adapt into a
> helper function returning true or false if a symbol is present.
> (scan-symbol): New procedure.
> (scan-symbol-not): Likewise.
> * gcc.target/arm/size-optimization-ieee-1.c: New testcase.
> * gcc.target/arm/size-optimization-ieee-2.c: Likewise.
> * gcc.target/arm/size-optimization-ieee-3.c: Likewise.
> 
> *** libgcc/ChangeLog ***
> 
> 2018-11-16  Thomas Preud'homme  
> 
> * config/arm/lib1funcs.S (FUNC_START): Remove unused sp_section
> parameter and corresponding code.
> (ARM_FUNC_START): Likewise in both definitions.
> Also update footer comment about conditions that need to match with
> gcc/config/arm/elf.h to also include libgcc/config/arm/t-arm.
> * config/arm/ieee754-df.S (muldf3): Also build it if L_arm_muldf3 is
> defined.  Weakly define it in this case.
> * config/arm/ieee754-sf.S (mulsf3): Likewise with L_arm_mulsf3.
> * config/arm/t-elf (LIB1ASMFUNCS): Build _arm_muldf3.o and
> _arm_mulsf3.o before muldiv versions if targeting Thumb-1 only. Add
> comment to keep condition in sync with the one in
> libgcc/config/arm/lib1funcs.S and gcc/config/arm/elf.h.
> 
> Testing: Bootstrapped on arm-linux-gnueabihf (Arm & Thumb-2) and
> testsuite shows no
> regression. Also built an arm-none-eabi cross compiler targeting
> soft-float which also shows no regression. In particular newly added
> tests and gcc.dg/lto/20081212-1 test pass.

Which non-elf targets have you tested?

R.

> 
> Is this ok for stage3?
> 
> Best regards,
> 
> Thomas
> 
> 
> Optimize-size-fpmul_without_div.patch
> 
> From 8740697791f99b7175e188f049663883c39e51b0 Mon Sep 17 00:00:00 2001
> From: Thomas Preud'homme 
> Date: Fri, 26 Oct 2018 16:21:09 +0100
> Subject: [PATCH] [PATCH, libgcc/ARM] Optimize executable size when using
>  softfloat fmul/dmul
> 
> Softfloat single precision and double precision floating-point
> multiplication routines in libgcc share some code with the
> floating-point division of their corresponding precision. As the code
> is structured now, this leads to *all* division code being pulled in an
> executable in softfloat mode even if only multiplication is
> performed.
> 
> This patch creates some new LIB1ASMFUNCS macros to also build files with
> just the multiplication and shared code as weak symbols. By putting
> these earlier in the static library, they can then be picked up when
> only multiplication is used and they are overridden by the global
> definition in the existing file containing both multiplication and
> division code when division is needed.
> 
> The patch also removes changes made to the FUNC_START a

Re: [PATCH 2/6, OpenACC, libgomp] Async re-work, oacc-* parts

2018-12-07 Thread Chung-Lin Tang
On 2018/12/7 07:32 PM, Thomas Schwinge wrote:
> Hi Chung-Lin!
> 
> On Tue, 25 Sep 2018 21:10:47 +0800, Chung-Lin Tang  
> wrote:
>> These are the OpenACC specific changes, mostly the re-implementation of 
>> async-related acc_* runtime
>> library API functions to use the new backend plugin interfaces, in a 
>> non-target specific way.
> 
> (The patch was missing the "libgomp/oacc-int.h" changes, but I had no
> problem replicating these from openacc-gcc-8-branch.)
> 
> 
> Does the following make sense?

I don't quite remember why I simply ensured asyncqueue creation here at the
time, maybe simply because it allowed simpler code at this level.

OTOH, the old logic was to GOMP_fatal upon such an unknown queue case, so maybe
that's the right thing to do (inside lookup_goacc_asyncqueue ()), instead of
silently allowing it to pass.

WDYT?

Chung-Lin


> commit f86b403dbe6ed17afa8d157ec4089ff169a63680
> Author: Thomas Schwinge 
> Date:   Fri Dec 7 12:19:56 2018 +0100
> 
> Don't create an asyncqueue just to then test/synchronize with it
> ---
>  libgomp/oacc-async.c| 12 
>  libgomp/oacc-parallel.c |  4 +++-
>  2 files changed, 11 insertions(+), 5 deletions(-)
> 
> diff --git libgomp/oacc-async.c libgomp/oacc-async.c
> index 553082fe3d4a..c9b134ac3380 100644
> --- libgomp/oacc-async.c
> +++ libgomp/oacc-async.c
> @@ -119,8 +119,11 @@ acc_async_test (int async)
>if (!thr || !thr->dev)
>  gomp_fatal ("no device active");
>  
> -  goacc_aq aq = lookup_goacc_asyncqueue (thr, true, async);
> -  return thr->dev->openacc.async.test_func (aq);
> +  goacc_aq aq = lookup_goacc_asyncqueue (thr, false, async);
> +  if (!aq)
> +return 1;
> +  else
> +return thr->dev->openacc.async.test_func (aq);
>  }
>  
>  int
> @@ -148,8 +151,9 @@ acc_wait (int async)
>  
>struct goacc_thread *thr = get_goacc_thread ();
>  
> -  goacc_aq aq = lookup_goacc_asyncqueue (thr, true, async);
> -  thr->dev->openacc.async.synchronize_func (aq);
> +  goacc_aq aq = lookup_goacc_asyncqueue (thr, false, async);
> +  if (aq)
> +thr->dev->openacc.async.synchronize_func (aq);
>  }
>  
>  /* acc_async_wait is an OpenACC 1.0 compatibility name for acc_wait.  */
> diff --git libgomp/oacc-parallel.c libgomp/oacc-parallel.c
> index 7b6a6e515018..0be48f98036f 100644
> --- libgomp/oacc-parallel.c
> +++ libgomp/oacc-parallel.c
> @@ -491,7 +491,9 @@ goacc_wait (int async, int num_waits, va_list *ap)
>  {
>int qid = va_arg (*ap, int);
>
> -  goacc_aq aq = get_goacc_asyncqueue (qid);
> +  goacc_aq aq = lookup_goacc_asyncqueue (thr, false, qid);
> +  if (!aq)
> + continue;
>if (acc_dev->openacc.async.test_func (aq))
>   continue;
>if (async == acc_async_sync)
> 
> 
> Grüße
>  Thomas
> 
> 



[PR 87615] Limit AA walking in hopefully all of IPA summary generation

2018-12-07 Thread Martin Jambor
Hi,

the patch below adds alias analysis (AA) walk limiting to all
walk_aliased_vdefs calls invoked during IPA-CP and ipa-fnsummary summary
generation.  Eventually the two should be unified into just one phase
but that is something for stage1.  This patch gives them both
independent budgets, both initialized to
PARAM_VALUE(PARAM_IPA_MAX_AA_STEPS) - as a consequence, the real limit
is twice the given value.

I'm not sure whether the ipa-prop AA limiting was written before
walk_aliased_vdefs had its limit parameter added, but I changed the code
to use it and also modified the budget counter to go from the parameter
value down to zero as opposed to incrementing it and checking manually
before each call into AA.  I always pass the current budget + 1 as the
walk_aliased_vdefs limit so that I do not have to check whether it is zero
(which means no limiting at all), and also so that loads from a parameter
right at the top of a function get detected as unmodified even if some
previous analysis already used up all the budget.
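The budget + 1 trick can be modelled in isolation (walk_aliased_vdefs and the fbi budget field are GCC internals; the functions below are simplified stand-ins, not the real signatures): the walker treats a limit of 0 as "no limit", so passing budget + 1 both avoids that special case and leaves a depleted budget able to pay for a one-step walk.

```c
#include <stdbool.h>

/* Simplified model of walk_aliased_vdefs's limit convention: LIMIT == 0
   means no limit at all; otherwise give up once LIMIT defs have been
   visited.  Returns the number of defs visited, or -1 on give-up.  */
static int
walk_vdefs_model (int ndefs, unsigned limit)
{
  int visited = 0;
  for (int i = 0; i < ndefs; i++)
    {
      if (limit != 0 && (unsigned) visited >= limit)
	return -1;
      visited++;
    }
  return visited;
}

/* Passing budget + 1 as the limit means a budget of zero still allows a
   single-step walk, so e.g. a load right at the top of a function can be
   proven unmodified even after earlier analyses spent everything.  */
static bool
query_with_budget (unsigned *budget, int ndefs)
{
  int walked = walk_vdefs_model (ndefs, *budget + 1);
  if (walked < 0)
    return false;		/* Ran out: conservatively give up.  */
  *budget -= *budget < (unsigned) walked ? *budget : (unsigned) walked;
  return true;
}
```

With a budget of zero, a one-def walk still succeeds while a two-def walk fails, matching the behaviour described above.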

As far as the testcase for PR 87615 is concerned, this patch makes the
compile time go down from approx. 340 seconds to about 160 seconds on my
desktop and IPA-CP gets zeros in -ftime-report, but we still do not reach
the 40 second compile time we get with -fno-ipa-cp -fno-inline.

Bootstrapped and tested on x86_64-linux.  OK for trunk?

Thanks,

Martin


2018-12-06  Martin Jambor  

PR ipa/87615
* ipa-prop.h (struct ipa_func_body_info): Replaced field aa_walked
with aa_walk_budget.
* cgraph.h (ipa_polymorphic_call_context::get_dynamic_type): Add
aa_walk_budget_p parameter.
* ipa-fnsummary.c (unmodified_parm_1): New parameter fbi.  Limit AA
walk.  Updated all callers.
(unmodified_parm): New parameter fbi, pass it to unmodified_parm_1.
(eliminated_by_inlining_prob): New parameter fbi, pass it on to
unmodified_parm.
(will_be_nonconstant_expr_predicate): New parameter fbi, removed
parameter info.  Extract info from fbi.  Pass fbi to recursive calls
and to unmodified_parm.
(phi_result_unknown_predicate): New parameter fbi, removed parameter
info, updated call to will_be_nonconstant_expr_predicate.
(param_change_prob): New parameter fbi, limit AA walking.
(analyze_function_body): Initialize aa_walk_budget in fbi.  Update
calls to various above functions.
* ipa-polymorphic-call.c (get_dynamic_type): Add aa_walk_budget_p
parameter.  Use it to limit AA walking.
* ipa-prop.c (detect_type_change_from_memory_writes): New parameter
fbi, limit AA walk.
(detect_type_change): New parameter fbi, pass it on to
detect_type_change_from_memory_writes.
(detect_type_change_ssa): Likewise.
(aa_overwalked): Removed.
(parm_preserved_before_stmt_p): Assume fbi is never NULL, streamline
accordingly, adjust to the new AA limiting scheme.
(parm_ref_data_preserved_p): Likewise.
(ipa_compute_jump_functions_for_edge): Adjust call to
get_dynamic_type.
(ipa_analyze_call_uses): Likewise.
(ipa_analyze_virtual_call_uses): Pass fbi to detect_type_change_ssa.
(ipa_analyze_node): Initialize aa_walk_budget.
(ipcp_transform_function): Likewise.
* tree-ssa-sccvn.c (eliminate_dom_walker::eliminate_stmt): Update call
to get_dynamic_type.
---
 gcc/cgraph.h   |   2 +-
 gcc/ipa-fnsummary.c| 111 +++-
 gcc/ipa-polymorphic-call.c |  28 ++--
 gcc/ipa-prop.c | 128 +
 gcc/ipa-prop.h |   5 +-
 gcc/tree-ssa-sccvn.c   |   2 +-
 6 files changed, 154 insertions(+), 122 deletions(-)

diff --git a/gcc/cgraph.h b/gcc/cgraph.h
index b8e23cc338a..028899e44b1 100644
--- a/gcc/cgraph.h
+++ b/gcc/cgraph.h
@@ -1548,7 +1548,7 @@ public:
 
   /* Look for vtable stores or constructor calls to work out dynamic type
  of memory location.  */
-  bool get_dynamic_type (tree, tree, tree, gimple *);
+  bool get_dynamic_type (tree, tree, tree, gimple *, unsigned *);
 
   /* Make context non-speculative.  */
   void clear_speculation ();
diff --git a/gcc/ipa-fnsummary.c b/gcc/ipa-fnsummary.c
index 23b7821dcc1..f4cc4df5cda 100644
--- a/gcc/ipa-fnsummary.c
+++ b/gcc/ipa-fnsummary.c
@@ -941,7 +941,8 @@ mark_modified (ao_ref *ao ATTRIBUTE_UNUSED, tree vdef 
ATTRIBUTE_UNUSED,
PARM_DECL) will be stored to *SIZE_P in that case too.  */
 
 static tree
-unmodified_parm_1 (gimple *stmt, tree op, HOST_WIDE_INT *size_p)
+unmodified_parm_1 (ipa_func_body_info *fbi, gimple *stmt, tree op,
+  HOST_WIDE_INT *size_p)
 {
   /* SSA_NAME referring to parm default def?  */
   if (TREE_CODE (op) == SSA_NAME
@@ -959,8 +960,14 @@ unmodified_parm_1 (gimple *stmt, tree op, HOST_WIDE_INT 
*size_p)
 
   ao_ref refd;
   ao_ref_init (&refd, op);
-  walk_aliased_vdefs (&refd, 

Re: libgo patch committed: Add precise stack scan support

2018-12-07 Thread Ian Lance Taylor
On Fri, Dec 7, 2018 at 2:07 AM Rainer Orth  
wrote:
>
> > This libgo patch by Cherry Zhang adds support for precise stack
> > scanning to the Go runtime.  This uses per-function stack maps stored
> > in the exception tables in the language-specific data area.  The
> > compiler needs to generate these stack maps; currently this is only
> > done by a version of LLVM, not by GCC.  Each safepoint in a function
> > is associated with a (real or dummy) landing pad, and its "type info"
> > in the exception table is a pointer to the stack map. When a stack is
> > scanned, the stack map is found by the stack unwinding code.
> >
> > For precise stack scan we need to unwind the stack. There are three cases:
> >
> > - If a goroutine is scanning its own stack, it can unwind the stack
> > and scan the frames.
> >
> > - If a goroutine is scanning another, stopped, goroutine, it cannot
> > directly unwind the target stack. We handle this by switching
> > (runtime.gogo) to the target g, letting it unwind and scan the stack,
> > and switch back.
> >
> > - If we are scanning a goroutine that is blocked in a syscall, we send
> > a signal to the target goroutine's thread, and let the signal handler
> > unwind and scan the stack. Extra care is needed as this races with
> > enter/exit syscall.
> >
> > Currently this is only implemented on GNU/Linux.
> >
> > Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
> > to mainline.
>
> this broke Solaris (and other non-Linux) bootstrap:
>
> /vol/gcc/src/hg/trunk/local/libgo/go/runtime/stubs_nonlinux.go:20:1: error: 
> missing return at end of function
>20 | }
>   | ^
>
> Fixed by returning 0 for now, the return value is ignored in
> go/runtime/proc.go (scang) anyway.

Thanks.  Committed to mainline.

Ian


Re: [RFC][AArch64] Add support for system register based stack protector canary access

2018-12-07 Thread Ramana Radhakrishnan
On Tue, Dec 4, 2018 at 12:58 PM Florian Weimer  wrote:
>
> * Wilco Dijkstra:
>
> >> For userland, I would like to eventually copy the OpenBSD approach for
> >> architectures which have some form of PC-relative addressing: we can
> >> have multiple random canaries in (RELRO) .rodata in sufficiently close
> >> to the code that needs them (assuming that we have split .rodata).

On AArch64 as well we've split .rodata. I think I did this with GCC 5.

All the addressing of global data is through PC relative access and in
the small model which is the default in Linux userland, I think we'll
just be fine.

>  At
> >> least for x86-64, I expect this to be a small win.  It's also a slight
> >> hardening improvement if the reference canary is not stored in writable
> >> memory.
> >
> > On AArch64 hardware pointer signing already provides a free and more robust
> > implementation of stack canaries, so we could change -fstack-protector to
> > use that when pointer signing is enabled.
>
> I expected to use both because not all AArch64 implementations support
> pointer signing, and we'd use the stack protector to get some coverage
> for the legacy implementations.

Indeed. Until the default goes up to Armv8.3-A, it's going to default to this.


regards
Ramana

>
> Thanks,
> Florian


[PATCH v2] Fix PR64242

2018-12-07 Thread Wilco Dijkstra
Improve the fix for PR64242.  Various optimizations can change a memory
reference into a frame access.  Given there are multiple virtual frame pointers
which may be replaced by multiple hard frame pointers, there are no checks for
writes to the various frame pointers.  So updates to a frame pointer tend to
generate incorrect code.  Improve the previous fix to also add clobbers of
several frame pointers and add a scheduling barrier.  This should work in most
cases until GCC supports a generic "don't optimize across this instruction"
feature.

Also improve the testcase to handle alloca implementations which allocate too
much, and increase stack sizes to avoid accidentally passing the test, as
reported by Jakub in comment #10 in the PR.

Bootstrap OK. Testcase passes on AArch64 and x86-64.  Inspected x86, Arm,
Thumb-1 and Thumb-2 assembler which looks correct.

ChangeLog:
2018-12-07  Wilco Dijkstra  

gcc/
PR middle-end/64242
* builtins.c (expand_builtin_longjmp): Add frame clobbers and schedule 
block.
(expand_builtin_nonlocal_goto): Likewise.

testsuite/
PR middle-end/64242
* gcc.c-torture/execute/pr64242.c: Update test.

---

diff --git a/gcc/builtins.c b/gcc/builtins.c
index 
669e548706f537fa9a92c5f47f30fc3c6ee38176..2ef9c9afcc69fcb775dc6a6fff550025bdc76337
 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -1138,15 +1138,20 @@ expand_builtin_longjmp (rtx buf_addr, rtx value)
emit_insn (targetm.gen_nonlocal_goto (value, lab, stack, fp));
   else
{
- lab = copy_to_reg (lab);
-
  emit_clobber (gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (VOIDmode)));
  emit_clobber (gen_rtx_MEM (BLKmode, hard_frame_pointer_rtx));
 
+ lab = copy_to_reg (lab);
+
  /* Restore the frame pointer and stack pointer.  We must use a
 temporary since the setjmp buffer may be a local.  */
  fp = copy_to_reg (fp);
  emit_stack_restore (SAVE_NONLOCAL, stack);
+
+ /* Ensure the frame pointer move is not optimized.  */
+ emit_insn (gen_blockage ());
+ emit_clobber (hard_frame_pointer_rtx);
+ emit_clobber (frame_pointer_rtx);
  emit_move_insn (hard_frame_pointer_rtx, fp);
 
  emit_use (hard_frame_pointer_rtx);
@@ -1285,15 +1290,20 @@ expand_builtin_nonlocal_goto (tree exp)
 emit_insn (targetm.gen_nonlocal_goto (const0_rtx, r_label, r_sp, r_fp));
   else
 {
-  r_label = copy_to_reg (r_label);
-
   emit_clobber (gen_rtx_MEM (BLKmode, gen_rtx_SCRATCH (VOIDmode)));
   emit_clobber (gen_rtx_MEM (BLKmode, hard_frame_pointer_rtx));
 
+  r_label = copy_to_reg (r_label);
+
   /* Restore the frame pointer and stack pointer.  We must use a
 temporary since the setjmp buffer may be a local.  */
   r_fp = copy_to_reg (r_fp);
   emit_stack_restore (SAVE_NONLOCAL, r_sp);
+
+  /* Ensure the frame pointer move is not optimized.  */
+  emit_insn (gen_blockage ());
+  emit_clobber (hard_frame_pointer_rtx);
+  emit_clobber (frame_pointer_rtx);
   emit_move_insn (hard_frame_pointer_rtx, r_fp);
 
   /* USE of hard_frame_pointer_rtx added for consistency;
diff --git a/gcc/testsuite/gcc.c-torture/execute/pr64242.c 
b/gcc/testsuite/gcc.c-torture/execute/pr64242.c
index 
46a7b23d28d71604d141281c21fb0b77849b1b0d..e6139ede3f34d587ac53d04e286e5d75fd2ca76c
 100644
--- a/gcc/testsuite/gcc.c-torture/execute/pr64242.c
+++ b/gcc/testsuite/gcc.c-torture/execute/pr64242.c
@@ -5,46 +5,31 @@ extern void abort (void);
 __attribute ((noinline)) void
 broken_longjmp (void *p)
 {
-  void *buf[5];
+  void *buf[32];
   __builtin_memcpy (buf, p, 5 * sizeof (void*));
   /* Corrupts stack pointer...  */
   __builtin_longjmp (buf, 1);
 }
 
-__attribute ((noipa)) __UINTPTR_TYPE__
-foo (void *p)
-{
-  return (__UINTPTR_TYPE__) p;
-}
-
-__attribute ((noipa)) void
-bar (void *p)
-{
-  asm volatile ("" : : "r" (p));
-}
-
 volatile int x = 0;
-void *volatile p;
-void *volatile q;
+char *volatile p;
+char *volatile q;
 
 int
 main ()
 {
   void *buf[5];
-  struct __attribute__((aligned (32))) S { int a[4]; } s;
-  bar (&s);
   p = __builtin_alloca (x);
+  q = __builtin_alloca (x);
   if (!__builtin_setjmp (buf))
 broken_longjmp (buf);
 
+  /* Compute expected next alloca offset - some targets don't align properly
+ and allocate too much.  */
+  p = q + (q - p);
+
   /* Fails if stack pointer corrupted.  */
-  q = __builtin_alloca (x);
-  if (foo (p) < foo (q))
-{
-  if (foo (q) - foo (p) >= 1024)
-   abort ();
-}
-  else if (foo (p) - foo (q) >= 1024)
+  if (p != __builtin_alloca (x))
 abort ();
 
   return 0;



[PR 88214] Check that an argument is pointer before attempting agg jf construction from it

2018-12-07 Thread Martin Jambor
Hi,

ICE in PR 88214 happens because a type-mismatch in K&R C code makes
IPA-CP analysis call ao_ref_init_from_ptr_and_size on an integer
SSA_NAME, this function in turn constructs a temporary MEM_REF based on
that integer SSA_NAME and then later on call_may_clobber_ref_p_1 treats
the MEM_REF base as a pointer, gets its SSA_NAME_PTR_INFO and tries to
work with bitmaps there.  But because the SSA_NAME is an integer, there
is no SSA_NAME_PTR_INFO, there is range info instead and this leads to a
crash.

On a related note, would people object to adding the following assert,
which would have made this bug much more straightforward to find?

index 85a5de7..66cf2f2 100644
--- a/gcc/tree-ssa-alias.c
+++ b/gcc/tree-ssa-alias.c
@@ -710,6 +710,7 @@ ao_ref_init_from_ptr_and_size (ao_ref *ref, tree ptr, tree 
size)
 }
   else
 {
+  gcc_assert (POINTER_TYPE_P (TREE_TYPE (ptr)));
   ref->base = build2 (MEM_REF, char_type_node,
  ptr, null_pointer_node);
   ref->offset = 0;


The bug itself can be fixed with the patch below.  I have verified it
avoids the ICE on powerpc64-linux and did a full bootstrap and test on
an x86_64-linux.  The patch is simple enough that I believe that is good
enough.


2018-12-06  Martin Jambor  

PR ipa/88214
* ipa-prop.c (determine_locally_known_aggregate_parts): Make sure
we check pointers against pointers.

testsuite/
* gcc.dg/ipa/pr88214.c: New test.
---
 gcc/ipa-prop.c |  3 ++-
 gcc/testsuite/gcc.dg/ipa/pr88214.c | 10 ++
 2 files changed, 12 insertions(+), 1 deletion(-)
 create mode 100644 gcc/testsuite/gcc.dg/ipa/pr88214.c

diff --git a/gcc/ipa-prop.c b/gcc/ipa-prop.c
index 74052350ac1..4dbe26829e3 100644
--- a/gcc/ipa-prop.c
+++ b/gcc/ipa-prop.c
@@ -1569,7 +1569,8 @@ determine_locally_known_aggregate_parts (gcall *call, 
tree arg,
   if (TREE_CODE (arg) == SSA_NAME)
{
  tree type_size;
-  if (!tree_fits_uhwi_p (TYPE_SIZE (TREE_TYPE (arg_type
+  if (!tree_fits_uhwi_p (TYPE_SIZE (TREE_TYPE (arg_type)))
+ || !POINTER_TYPE_P (TREE_TYPE (arg)))
 return;
  check_ref = true;
  arg_base = arg;
diff --git a/gcc/testsuite/gcc.dg/ipa/pr88214.c 
b/gcc/testsuite/gcc.dg/ipa/pr88214.c
new file mode 100644
index 000..4daa9829e75
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/pr88214.c
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+/* { dg-options "-O2" } */
+
+void i();
+  short a;
+  void b(e) char * e;
+  {
+i();
+b(a);
+  }
-- 
2.19.1





[AArch64][SVE] Remove unnecessary PTRUEs from FP arithmetic

2018-12-07 Thread Richard Sandiford
When using the unpredicated all-register forms of FADD, FSUB and FMUL,
the rtl patterns would still have the predicate operand we created for
the other forms.  This patch splits the patterns after reload in order
to get rid of the predicate, like we already do for WHILE.

Tested on aarch64-linux-gnu and applied.

Richard


2018-12-07  Richard Sandiford  

gcc/
* config/aarch64/iterators.md (SVE_UNPRED_FP_BINARY): New code
iterator.
(sve_fp_op): Handle minus and mult.
* config/aarch64/aarch64-sve.md (*add3, *sub3)
(*mul3): Split the patterns after reload if we don't
need the predicate operand.
(*post_ra_3): New pattern.

gcc/testsuite/
* gcc.target/aarch64/sve/pred_elim_1.c: New test.

Index: gcc/config/aarch64/iterators.md
===
--- gcc/config/aarch64/iterators.md 2018-12-05 08:33:40.970920085 +
+++ gcc/config/aarch64/iterators.md 2018-12-07 14:59:39.875208953 +
@@ -1220,6 +1220,9 @@ (define_code_iterator SVE_INT_BINARY [pl
 ;; SVE integer binary division operations.
 (define_code_iterator SVE_INT_BINARY_SD [div udiv])
 
+;; SVE floating-point operations with an unpredicated all-register form.
+(define_code_iterator SVE_UNPRED_FP_BINARY [plus minus mult])
+
 ;; SVE integer comparisons.
 (define_code_iterator SVE_INT_CMP [lt le eq ne ge gt ltu leu geu gtu])
 
@@ -1423,6 +1426,8 @@ (define_code_attr sve_int_op_rev [(plus
 
 ;; The floating-point SVE instruction that implements an rtx code.
 (define_code_attr sve_fp_op [(plus "fadd")
+(minus "fsub")
+(mult "fmul")
 (neg "fneg")
 (abs "fabs")
 (sqrt "fsqrt")])
Index: gcc/config/aarch64/aarch64-sve.md
===
--- gcc/config/aarch64/aarch64-sve.md   2018-07-18 18:44:56.0 +0100
+++ gcc/config/aarch64/aarch64-sve.md   2018-12-07 14:59:39.875208953 +
@@ -2194,7 +2194,7 @@ (define_expand "add3"
 )
 
 ;; Floating-point addition predicated with a PTRUE.
-(define_insn "*add3"
+(define_insn_and_split "*add3"
   [(set (match_operand:SVE_F 0 "register_operand" "=w, w, w")
(unspec:SVE_F
  [(match_operand: 1 "register_operand" "Upl, Upl, Upl")
@@ -2206,7 +2206,12 @@ (define_insn "*add3"
   "@
fadd\t%0., %1/m, %0., #%3
fsub\t%0., %1/m, %0., #%N3
-   fadd\t%0., %2., %3."
+   #"
+  ; Split the unpredicated form after reload, so that we don't have
+  ; the unnecessary PTRUE.
+  "&& reload_completed
+   && register_operand (operands[3], mode)"
+  [(set (match_dup 0) (plus:SVE_F (match_dup 2) (match_dup 3)))]
 )
 
 ;; Unpredicated floating-point subtraction.
@@ -2225,7 +2230,7 @@ (define_expand "sub3"
 )
 
 ;; Floating-point subtraction predicated with a PTRUE.
-(define_insn "*sub3"
+(define_insn_and_split "*sub3"
   [(set (match_operand:SVE_F 0 "register_operand" "=w, w, w, w")
(unspec:SVE_F
  [(match_operand: 1 "register_operand" "Upl, Upl, Upl, Upl")
@@ -2240,7 +2245,13 @@ (define_insn "*sub3"
fsub\t%0., %1/m, %0., #%3
fadd\t%0., %1/m, %0., #%N3
fsubr\t%0., %1/m, %0., #%2
-   fsub\t%0., %2., %3."
+   #"
+  ; Split the unpredicated form after reload, so that we don't have
+  ; the unnecessary PTRUE.
+  "&& reload_completed
+   && register_operand (operands[2], mode)
+   && register_operand (operands[3], mode)"
+  [(set (match_dup 0) (minus:SVE_F (match_dup 2) (match_dup 3)))]
 )
 
 ;; Unpredicated floating-point multiplication.
@@ -2259,7 +2270,7 @@ (define_expand "mul3"
 )
 
 ;; Floating-point multiplication predicated with a PTRUE.
-(define_insn "*mul3"
+(define_insn_and_split "*mul3"
   [(set (match_operand:SVE_F 0 "register_operand" "=w, w")
(unspec:SVE_F
  [(match_operand: 1 "register_operand" "Upl, Upl")
@@ -2270,8 +2281,24 @@ (define_insn "*mul3"
   "TARGET_SVE"
   "@
fmul\t%0., %1/m, %0., #%3
-   fmul\t%0., %2., %3."
-)
+   #"
+  ; Split the unpredicated form after reload, so that we don't have
+  ; the unnecessary PTRUE.
+  "&& reload_completed
+   && register_operand (operands[3], mode)"
+  [(set (match_dup 0) (mult:SVE_F (match_dup 2) (match_dup 3)))]
+)
+
+;; Unpredicated floating-point binary operations (post-RA only).
+;; These are generated by splitting a predicated instruction whose
+;; predicate is unused.
+(define_insn "*post_ra_3"
+  [(set (match_operand:SVE_F 0 "register_operand" "=w")
+   (SVE_UNPRED_FP_BINARY:SVE_F
+ (match_operand:SVE_F 1 "register_operand" "w")
+ (match_operand:SVE_F 2 "register_operand" "w")))]
+  "TARGET_SVE && reload_completed"
+  "\t%0., %1., %2.")
 
 ;; Unpredicated fma (%0 = (%1 * %2) + %3).
 (define_expand "fma4"
Index: gcc/testsuite/gcc.target/aarch64/sve/pred_elim_1.c
===
--- /dev/null   2018-11-29 13:15:04.463550658 +
+++ g

[AArch64][SVE] Remove unnecessary PTRUEs from integer arithmetic

2018-12-07 Thread Richard Sandiford
When using the unpredicated immediate forms of MUL, LSL, LSR and ASR,
the rtl patterns would still have the predicate operand we created for
the other forms.  This patch splits the patterns after reload in order
to get rid of the predicate, like we already do for WHILE.

Tested on aarch64-linux-gnu and applied.

Richard


2018-12-07  Richard Sandiford  

gcc/
* config/aarch64/aarch64-sve.md (*mul3, *v3):
Split the patterns after reload if we don't need the predicate
operand.
(*post_ra_mul3, *post_ra_v3): New patterns.

gcc/testsuite/
* gcc.target/aarch64/sve/pred_elim_2.c: New test.

Index: gcc/config/aarch64/aarch64-sve.md
===
--- gcc/config/aarch64/aarch64-sve.md   2018-12-07 15:01:42.602176516 +
+++ gcc/config/aarch64/aarch64-sve.md   2018-12-07 15:02:00.230028176 +
@@ -936,7 +936,7 @@ (define_expand "mul3"
 ;; predicate for the first alternative, but using Upa or X isn't likely
 ;; to gain much and would make the instruction seem less uniform to the
 ;; register allocator.
-(define_insn "*mul3"
+(define_insn_and_split "*mul3"
   [(set (match_operand:SVE_I 0 "register_operand" "=w, w, ?&w")
(unspec:SVE_I
  [(match_operand: 1 "register_operand" "Upl, Upl, Upl")
@@ -946,12 +946,30 @@ (define_insn "*mul3"
  UNSPEC_MERGE_PTRUE))]
   "TARGET_SVE"
   "@
-   mul\t%0., %0., #%3
+   #
mul\t%0., %1/m, %0., %3.
movprfx\t%0, %2\;mul\t%0., %1/m, %0., %3."
+  ; Split the unpredicated form after reload, so that we don't have
+  ; the unnecessary PTRUE.
+  "&& reload_completed
+   && !register_operand (operands[3], mode)"
+  [(set (match_dup 0) (mult:SVE_I (match_dup 2) (match_dup 3)))]
+  ""
   [(set_attr "movprfx" "*,*,yes")]
 )
 
+;; Unpredicated multiplications by a constant (post-RA only).
+;; These are generated by splitting a predicated instruction whose
+;; predicate is unused.
+(define_insn "*post_ra_mul3"
+  [(set (match_operand:SVE_I 0 "register_operand" "=w")
+   (mult:SVE_I
+ (match_operand:SVE_I 1 "register_operand" "0")
+ (match_operand:SVE_I 2 "aarch64_sve_mul_immediate")))]
+  "TARGET_SVE && reload_completed"
+  "mul\t%0., %0., #%2"
+)
+
 (define_insn "*madd"
   [(set (match_operand:SVE_I 0 "register_operand" "=w, w, ?&w")
(plus:SVE_I
@@ -1232,7 +1250,7 @@ (define_expand "v3"
 ;; actually need the predicate for the first alternative, but using Upa
 ;; or X isn't likely to gain much and would make the instruction seem
 ;; less uniform to the register allocator.
-(define_insn "*v3"
+(define_insn_and_split "*v3"
   [(set (match_operand:SVE_I 0 "register_operand" "=w, w, ?&w")
(unspec:SVE_I
  [(match_operand: 1 "register_operand" "Upl, Upl, Upl")
@@ -1242,12 +1260,28 @@ (define_insn "*v3"
  UNSPEC_MERGE_PTRUE))]
   "TARGET_SVE"
   "@
-   \t%0., %2., #%3
+   #
\t%0., %1/m, %0., %3.
movprfx\t%0, %2\;\t%0., %1/m, %0., %3."
+  "&& reload_completed
+   && !register_operand (operands[3], mode)"
+  [(set (match_dup 0) (ASHIFT:SVE_I (match_dup 2) (match_dup 3)))]
+  ""
   [(set_attr "movprfx" "*,*,yes")]
 )
 
+;; Unpredicated shift operations by a constant (post-RA only).
+;; These are generated by splitting a predicated instruction whose
+;; predicate is unused.
+(define_insn "*post_ra_v3"
+  [(set (match_operand:SVE_I 0 "register_operand" "=w")
+   (ASHIFT:SVE_I
+ (match_operand:SVE_I 1 "register_operand" "w")
+ (match_operand:SVE_I 2 "aarch64_simd_shift_imm")))]
+  "TARGET_SVE && reload_completed"
+  "\t%0., %1., #%2"
+)
+
 ;; LSL, LSR and ASR by a scalar, which expands into one of the vector
 ;; shifts above.
 (define_expand "3"
Index: gcc/testsuite/gcc.target/aarch64/sve/pred_elim_2.c
===
--- /dev/null   2018-11-29 13:15:04.463550658 +
+++ gcc/testsuite/gcc.target/aarch64/sve/pred_elim_2.c  2018-12-07 
15:02:00.230028176 +
@@ -0,0 +1,31 @@
+/* { dg-options "-O2 -ftree-vectorize" } */
+
+#include 
+
+#define TEST_OP(NAME, TYPE, OP)\
+  void \
+  NAME##_##TYPE (TYPE *restrict a, TYPE *restrict b, int n)\
+  {\
+for (int i = 0; i < n; ++i)\
+  a[i] = b[i] OP;  \
+  }
+
+#define TEST_TYPE(TYPE) \
+  TEST_OP (shl, TYPE, << 6) \
+  TEST_OP (shr, TYPE, >> 6) \
+  TEST_OP (mult, TYPE, * 0x2b)
+
+TEST_TYPE (int8_t)
+TEST_TYPE (int16_t)
+TEST_TYPE (int32_t)
+TEST_TYPE (int64_t)
+TEST_TYPE (uint8_t)
+TEST_TYPE (uint16_t)
+TEST_TYPE (uint32_t)
+TEST_TYPE (uint64_t)
+
+/* { dg-final { scan-assembler-times {\tlsl\t} 8 } } */
+/* { dg-final { scan-assembler-times {\tlsr\t} 4 } } */
+/* { dg-final { scan-assembler-times {\tasr\t} 4 } } */
/* { dg-final { scan-assembler-times {\tmul\t} 8 } } */

Re: [committed] PR testsuite/86540, twiddle for aarch64

2018-12-07 Thread Jeff Law
On 12/7/18 6:46 AM, Richard Earnshaw (lists) wrote:
> On 06/12/2018 15:36, Jeff Law wrote:
>>
>> As outlined in the PR, the aarch64 has a non-default value for
>> CASE_VALUES_THRESHOLD which changes decisions in switch lowering.  Those
>> changes in switch lowering can expose additional jump threads later in
>> the pipeline which cause heartburn for a couple tests.
>>
>> I looked at all the other ports with a non-default value of
>> CASE_VALUES_THRESHOLD and only aarch64 is high enough to trigger these
>> changes in behavior on the two relevant tests.  So I'm just skipping the
>> tests that run after switch lowering on aarch64.
>>
>> Verified with a cross that these tests now pass.
>>
>> Committing to the trunk,
>>
> 
> Can't we use a param to force the value back to (near) the default?
> That would then work even if other targets start changing the default here.
That would have been my preference, but I didn't see a suitable PARAM
and adding one just for these two tests seemed like overkill.

jeff


Re: [C++ Patch] Fixes for three grokbitfield diagnostics

2018-12-07 Thread Jason Merrill

On 12/7/18 5:39 AM, Paolo Carlini wrote:

Hi,

these should be more or less straightforward (now that 
DECL_SOURCE_LOCATION often extracts a proper location ;). In the case of 
warn_if_not_aligned I also removed the redundant width check (it's never 
null, double checked by running the instrumented testsuite too) and I 
also removed the DECL_NAME use, which definitely doesn't play well with 
unnamed declarations (other/bitfield7.C exercises that too).


Tested x86_64-linux, as usual.

Thanks, Paolo.

//


OK.

Jason



Re: [C++ PATCH] Fix make_temporary_var_for_ref_to_temp (PR c++/86669)

2018-12-07 Thread Jason Merrill

On 12/6/18 6:26 PM, Jakub Jelinek wrote:

On Wed, Dec 05, 2018 at 09:50:56PM +0100, Jakub Jelinek wrote:

On Wed, Dec 05, 2018 at 03:49:26PM -0500, Jason Merrill wrote:

On 11/28/18 3:42 AM, Jakub Jelinek wrote:

Whenever we need to clone a cdtor (either because the target doesn't support
aliases the way we need, e.g. initlist105.C testcase on darwin, where it has
been originally reported, or when it has virtual bases, like the made up
initlist106.C on x86_64-linux), we rely on DECL_INITIAL of the local
variables being unshared, because the tree unsharing gimplify.c performs
doesn't unshare DECL_INITIAL.  clone_body has some code to recurse on the
DECL_INITIAL, but it handles just decls FOR_EACH_LOCAL_DECL walks.  I
believe it is generally ok that not all temporaries are covered in local
decls, the gimplifier should take care of those fine if we don't need
debug info for them.


I think any temporaries that have DECL_INITIAL should be pushed so that they
end up in local_decls.  set_up_extended_ref_temp already adds a DECL_EXPR
for it, I guess we need a pushdecl as well.  Though the comment for
get_temp_regvar suggests that this is problematic somehow.


Ok, will play with it tomorrow.


The following fixes the testcase too and passed bootstrap/regtest on
x86_64-linux and i686-linux, ok for trunk?

2018-12-07  Jakub Jelinek  

PR c++/86669
* call.c (make_temporary_var_for_ref_to_temp): Call pushdecl even for
automatic vars.


OK, thanks.

Jason



Re: [PATCH][Version 3]Come up with -flive-patching master option.

2018-12-07 Thread Qing Zhao
thanks a lot for fixing this issue.

Qing
> On Dec 7, 2018, at 7:07 AM, Rainer Orth  wrote:
> 
> Hi Qing,
> 
>>> On Nov 28, 2018, at 9:52 AM, Jan Hubicka  wrote:
>>> 
 
 2018-11-20  qing zhao  
 
* cif-code.def (EXTERN_LIVE_ONLY_STATIC): New CIF code.
* common.opt: Add -flive-patching flag.
* doc/invoke.texi: Document -flive-patching.
* flag-types.h (enum live_patching_level): New enum.
* ipa-inline.c (can_inline_edge_p): Disable external functions from
inlining when flag_live_patching is LIVE_PATCHING_INLINE_ONLY_STATIC.
* opts.c (control_options_for_live_patching): New function.
(finish_options): Make flag_live_patching incompatible with flag_lto.
Control IPA optimizations based on different levels of 
flag_live_patching.
 
 gcc/testsuite/ChangeLog:
 
 2018-11-20  qing zhao  
 
* gcc.dg/live-patching-1.c: New test.
* gcc.dg/live-patching-2.c: New test.
* gcc.dg/live-patching-3.c: New test.
* gcc.dg/tree-ssa/writeonly-3.c: New test.
* gcc.target/i386/ipa-stack-alignment-2.c: New test.
> [...]
>>> Patch is OK,
>> 
>> thanks for the review.
>> 
>> I will commit the patch very soon.
> 
> the new gcc.target/i386/ipa-stack-alignment-2.c testcase FAILs on
> Solaris/x86:
> 
> FAIL: gcc.target/i386/ipa-stack-alignment-2.c scan-assembler sub.*%.sp
> 
> Like ipa-stack-alignment.c, it needs -fomit-frame-pointer.  Fixed as
> follows, tested on i386-pc-solaris2.11 and x86_64-pc-linux-gnu, 32 and
> 64-bit each.  Installed on mainline.
> 
>   Rainer
> 
> -- 
> -
> Rainer Orth, Center for Biotechnology, Bielefeld University
> 
> 
> 2018-12-06  Rainer Orth  
> 
>   * gcc.target/i386/ipa-stack-alignment-2.c: Add
>   -fomit-frame-pointer to dg-options.
> 
> # HG changeset patch
> # Parent  580b8c24b018674f1ffd56a9c672129f746682c5
> Build gcc.target/i386/ipa-stack-alignment-2.c with -fomit-frame-pointer
> 
> diff --git a/gcc/testsuite/gcc.target/i386/ipa-stack-alignment-2.c 
> b/gcc/testsuite/gcc.target/i386/ipa-stack-alignment-2.c
> --- a/gcc/testsuite/gcc.target/i386/ipa-stack-alignment-2.c
> +++ b/gcc/testsuite/gcc.target/i386/ipa-stack-alignment-2.c
> @@ -1,5 +1,5 @@
> /* { dg-do compile } */
> -/* { dg-options "-flive-patching -O" } */
> +/* { dg-options "-flive-patching -O -fomit-frame-pointer" } */
> 
> typedef struct {
>   long a;



Re: [gofrontend-dev] Re: libgo patch committed: Add precise stack scan support

2018-12-07 Thread Cherry Zhang via gcc-patches
Sorry about the breakage. Thanks for the patch.


On Fri, Dec 7, 2018 at 9:23 AM Ian Lance Taylor  wrote:

> On Fri, Dec 7, 2018 at 2:07 AM Rainer Orth 
> wrote:
> >
> > > This libgo patch by Cherry Zhang adds support for precise stack
> > > scanning to the Go runtime.  This uses per-function stack maps stored
> > > in the exception tables in the language-specific data area.  The
> > > compiler needs to generate these stack maps; currently this is only
> > > done by a version of LLVM, not by GCC.  Each safepoint in a function
> > > is associated with a (real or dummy) landing pad, and its "type info"
> > > in the exception table is a pointer to the stack map. When a stack is
> > > scanned, the stack map is found by the stack unwinding code.
> > >
> > > For precise stack scan we need to unwind the stack. There are three
> cases:
> > >
> > > - If a goroutine is scanning its own stack, it can unwind the stack
> > > and scan the frames.
> > >
> > > - If a goroutine is scanning another, stopped, goroutine, it cannot
> > > directly unwind the target stack. We handle this by switching
> > > (runtime.gogo) to the target g, letting it unwind and scan the stack,
> > > and switch back.
> > >
> > > - If we are scanning a goroutine that is blocked in a syscall, we send
> > > a signal to the target goroutine's thread, and let the signal handler
> > > unwind and scan the stack. Extra care is needed as this races with
> > > enter/exit syscall.
> > >
> > > Currently this is only implemented on GNU/Linux.
> > >
> > > Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
> > > to mainline.
> >
> > this broke Solaris (and other non-Linux) bootstrap:
> >
> > /vol/gcc/src/hg/trunk/local/libgo/go/runtime/stubs_nonlinux.go:20:1:
> error: missing return at end of function
> >20 | }
> >   | ^
> >
> > Fixed by returning 0 for now, the return value is ignored in
> > go/runtime/proc.go (scang) anyway.
>
> Thanks.  Committed to mainline.
>
> Ian
>
> --
> You received this message because you are subscribed to the Google Groups
> "gofrontend-dev" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to gofrontend-dev+unsubscr...@googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>


Re: [PATCH] Fix PR63184

2018-12-07 Thread Jeff Law
On 12/7/18 3:55 AM, Richard Biener wrote:
> On Fri, 7 Dec 2018, Richard Biener wrote:
> 
>>
>> The following fixes PR63184 by using tree-affine to resolve pointer
>> comparisons.  Instead of trying to stick this into a match.pd pattern
>> the following does this in the more constrained forwprop environment.
>>
>> I've only implemented the cases where the comparison resolves to a
>> compile-time value, not the case where for example &a[i] < &a[j]
>> could be simplified to i < j.  I'm not sure I can trust the
>> tree-affine machinery enough here to do that.
>>
>> Both testcases require some CSE to happen thus the first forwprop
>> pass doesn't catch it.
>>
>> Bootstrap & regtest running on x86_64-unknown-linux-gnu.
>>
>> I'll collect some statistics from bootstrap.
> 
> Somewhat depressing.  There's a single instance in
> libiberty/rust-demangle.c that gets resolved (an ordered compare).
> This instance triggers 4 times during a c,c++ bootstrap compared
> to 258098 affine expansion combinations tried.
> 
> It doesn't trigger in tramp3d at all (just to try a C++ code base).
> 
> I suspect the cases in PR63184 are arcane enough and usually we
> have simpler addresses that are resolved with the existing
> patterns.
> 
> I'll attach the patch to the PR and leave it alone.
Seems reasonable to me as well.  It's originally your BZ and somewhere
in it I think you indicated you didn't think it was terribly important.

I'd suggest pushing it out to P4.  But I know you don't like that, so I
won't actually do it :-)

Jeff


Re: [PATCH 5/6, OpenACC, libgomp] Async re-work, C/C++ testsuite changes

2018-12-07 Thread Thomas Schwinge
Hi Chung-Lin!

On Tue, 25 Sep 2018 21:11:42 +0800, Chung-Lin Tang  
wrote:
> These are the testsuite/libgomp.oacc-c-c++-common/* changes.

Please commit the following three hunks to trunk: the code as present
doesn't declare its async/wait dependencies correctly.  To record the
review effort, please include "Reviewed-by: Thomas Schwinge
" in the commit log, see
.

> diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/data-2-lib.c 
> b/libgomp/testsuite/libgomp.oacc-c-c++-common/data-2-lib.c
> index 2ddfa7d..f553d3d 100644
> --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/data-2-lib.c
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/data-2-lib.c
> @@ -153,7 +153,7 @@ main (int argc, char **argv)
>  d[ii] = ((a[ii] * a[ii] + a[ii]) / a[ii]) - a[ii];
>  
>  #pragma acc parallel present (a[0:N], b[0:N], c[0:N], d[0:N], e[0:N], N) \
> -  async (4)
> +  wait (1, 2, 3) async (4)
>for (int ii = 0; ii < N; ii++)
>  e[ii] = a[ii] + b[ii] + c[ii] + d[ii];
>  
> diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/data-2.c 
> b/libgomp/testsuite/libgomp.oacc-c-c++-common/data-2.c
> index 0c6abe6..81d623a 100644
> --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/data-2.c
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/data-2.c
> @@ -162,7 +162,7 @@ main (int argc, char **argv)
>  d[ii] = ((a[ii] * a[ii] + a[ii]) / a[ii]) - a[ii];
>  
>  #pragma acc parallel present (a[0:N], b[0:N], c[0:N], d[0:N], e[0:N]) \
> -  wait (1) async (4)
> +  wait (1, 2, 3) async (4)
>for (int ii = 0; ii < N; ii++)
>  e[ii] = a[ii] + b[ii] + c[ii] + d[ii];
>  
> diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/data-3.c 
> b/libgomp/testsuite/libgomp.oacc-c-c++-common/data-3.c
> index 0bf706a..5ec50b8 100644
> --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/data-3.c
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/data-3.c
> @@ -138,7 +138,7 @@ main (int argc, char **argv)
>  d[ii] = ((a[ii] * a[ii] + a[ii]) / a[ii]) - a[ii];
>  
>  #pragma acc parallel present (a[0:N], b[0:N], c[0:N], d[0:N], e[0:N]) \
> -  wait (1,5) async (4)
> +  wait (1, 2, 3, 5) async (4)
>for (int ii = 0; ii < N; ii++)
>  e[ii] = a[ii] + b[ii] + c[ii] + d[ii];
>  


Grüße
 Thomas


[PR88407] [OpenACC] Correctly handle unseen async-arguments (was: [PATCH 5/6, OpenACC, libgomp] Async re-work, C/C++ testsuite changes)

2018-12-07 Thread Thomas Schwinge
Hi Chung-Lin!

On Tue, 25 Sep 2018 21:11:42 +0800, Chung-Lin Tang  
wrote:
> These are the testsuite/libgomp.oacc-c-c++-common/* changes.

> diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-71.c 
> b/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-71.c
> index c85e824..6afe2a0 100644
> --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-71.c
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-71.c
> @@ -92,16 +92,22 @@ main (int argc, char **argv)
>abort ();
>  }
>  
> -  fprintf (stderr, "CheCKpOInT\n");
> -  if (acc_async_test (1) != 0)
> +  if (acc_async_test (0) != 0)
>  {
>fprintf (stderr, "asynchronous operation not running\n");
>abort ();
>  }
>  
> +  /* Test unseen async number.  */
> +  if (acc_async_test (1) != 1)
> +{
> +  fprintf (stderr, "acc_async_test failed on unseen number\n");
> +  abort ();
> +}
> +
>sleep ((int) (dtime / 1000.0f) + 1);
>  
> -  if (acc_async_test (1) != 1)
> +  if (acc_async_test (0) != 1)
>  {
>fprintf (stderr, "found asynchronous operation still running\n");
>abort ();
> @@ -116,7 +122,3 @@ main (int argc, char **argv)
>  
>return 0;
>  }
> -
> -/* { dg-output "CheCKpOInT(\n|\r\n|\r).*" } */
> -/* { dg-output "unknown async \[0-9\]+" } */
> -/* { dg-shouldfail "" } */

That's now correct OpenACC usage, but you've now made this one
essentially the same as "libgomp.oacc-c-c++-common/lib-69.c".

> diff --git a/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-77.c 
> b/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-77.c
> index f4f196d..2821f88 100644
> --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-77.c
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-77.c
> @@ -111,7 +111,7 @@ main (int argc, char **argv)
>  
>start_timer (0);
>  
> -  acc_wait (1);
> +  acc_wait (0);
>  
>atime = stop_timer (0);
>  
> @@ -132,7 +132,3 @@ main (int argc, char **argv)
>  
>return 0;
>  }
> -
> -/* { dg-output "CheCKpOInT(\n|\r\n|\r).*" } */
> -/* { dg-output "unknown async \[0-9\]+" } */
> -/* { dg-shouldfail "" } */

Again, that's now correct OpenACC usage, but you've now made this one
essentially the same as "libgomp.oacc-c-c++-common/lib-74.c".


So, confused about the intended behavior, I've asked the OpenACC
committee to clarify, and filed  "[OpenACC]
Correctly handle unseen async-arguments".

Assuming this gets clarified in the way I think it should, I suggest the
following.  Any comments?

commit a34177a6ce637da8060394f69358f25bce90a8be
Author: Thomas Schwinge 
Date:   Fri Dec 7 16:36:53 2018 +0100

[PR88407] [OpenACC] Correctly handle unseen async-arguments

... which turn the operation into a no-op.

libgomp/
* plugin/plugin-nvptx.c (nvptx_async_test, nvptx_wait)
(nvptx_wait_async): Unseen async-argument is a no-op.
* testsuite/libgomp.oacc-c-c++-common/async_queue-1.c: Update.
* testsuite/libgomp.oacc-c-c++-common/data-2-lib.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/data-2.c: Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-79.c: Likewise.
* testsuite/libgomp.oacc-fortran/lib-12.f90: Likewise.
* testsuite/libgomp.oacc-c-c++-common/lib-71.c: Merge into...
* testsuite/libgomp.oacc-c-c++-common/lib-69.c: ... this.  Update.
* testsuite/libgomp.oacc-c-c++-common/lib-77.c: Merge into...
* testsuite/libgomp.oacc-c-c++-common/lib-74.c: ... this.  Update
---
 libgomp/plugin/plugin-nvptx.c  |  13 +-
 .../libgomp.oacc-c-c++-common/async_queue-1.c  |  30 +
 .../libgomp.oacc-c-c++-common/data-2-lib.c |   2 +
 .../testsuite/libgomp.oacc-c-c++-common/data-2.c   |   2 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-69.c   |   7 ++
 .../testsuite/libgomp.oacc-c-c++-common/lib-71.c   | 122 --
 .../testsuite/libgomp.oacc-c-c++-common/lib-74.c   |   4 +
 .../testsuite/libgomp.oacc-c-c++-common/lib-77.c   | 138 -
 .../testsuite/libgomp.oacc-c-c++-common/lib-79.c   |  24 
 libgomp/testsuite/libgomp.oacc-fortran/lib-12.f90  |   5 +
 10 files changed, 80 insertions(+), 267 deletions(-)

diff --git libgomp/plugin/plugin-nvptx.c libgomp/plugin/plugin-nvptx.c
index 7d0d38e0c2e1..6f9b16634b10 100644
--- libgomp/plugin/plugin-nvptx.c
+++ libgomp/plugin/plugin-nvptx.c
@@ -1539,9 +1539,8 @@ nvptx_async_test (int async)
   struct ptx_stream *s;
 
   s = select_stream_for_async (async, pthread_self (), false, NULL);
-
   if (!s)
-GOMP_PLUGIN_fatal ("unknown async %d", async);
+return 1;
 
   r = CUDA_CALL_NOCHECK (cuStreamQuery, s->stream);
   if (r == CUDA_SUCCESS)
@@ -1596,7 +1595,7 @@ nvptx_wait (int async)
 
   s = select_stream_for_async (async, pthread_self (), false, NULL);
   if (!s)
-GOMP_PLUGIN_fatal ("unknown async %d", async);
+return;
 
   CUDA_CALL_ASSERT (cuStreamSynchronize, s->stream)

[PATCH][AArch64][2/2] Add sve_width -moverride tunable

2018-12-07 Thread Kyrill Tkachov

Hi all,

On top of the previous patch that implements TARGET_ESTIMATED_POLY_VALUE
and adds an sve_width tuning field to the CPU structs, this patch implements
an -moverride knob to adjust this sve_width field to allow for experimentation.
Again, a reminder that this only has an effect when compiling for VLA SVE,
that is, without -msve-vector-bits=. It just adjusts tuning heuristics in the
compiler, like profitability thresholds for vectorised versioned loops, and others.

It can be used, for example, like -moverride=sve_width=256 to set the sve_width
tuning field to 256. Widths outside the accepted SVE range [128 - 2048] are
rejected, as you'd expect.

Bootstrapped and tested on aarch64-none-linux-gnu.

Ok for trunk?

Thanks,
Kyrill

2018-12-07  Kyrylo Tkachov  

* config/aarch64/aarch64.c (aarch64_tuning_override_functions): Add
sve_width entry.
(aarch64_parse_sve_width_string): Define.

2018-12-07  Kyrylo Tkachov  

* gcc.target/aarch64/sve/override_sve_width_1.c: New test.
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 7ccc6b78d5872d6b43491badbfa9f2d70580015c..bad687d33479f4b6f8cbeaca799824e29b8e9ed1 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1086,12 +1086,14 @@ struct aarch64_tuning_override_function
 
 static void aarch64_parse_fuse_string (const char*, struct tune_params*);
 static void aarch64_parse_tune_string (const char*, struct tune_params*);
+static void aarch64_parse_sve_width_string (const char*, struct tune_params*);
 
 static const struct aarch64_tuning_override_function
 aarch64_tuning_override_functions[] =
 {
   { "fuse", aarch64_parse_fuse_string },
   { "tune", aarch64_parse_tune_string },
+  { "sve_width", aarch64_parse_sve_width_string },
   { NULL, NULL }
 };
 
@@ -10834,6 +10836,34 @@ aarch64_parse_tune_string (const char *tune_string,
  "tune=");
 }
 
+/* Parse the sve_width tuning moverride string in TUNE_STRING.
+   Accept the valid SVE vector widths allowed by
+   aarch64_sve_vector_bits_enum and use it to override sve_width
+   in TUNE.  */
+
+static void
+aarch64_parse_sve_width_string (const char *tune_string,
+struct tune_params *tune)
+{
+  int width = -1;
+
+  int n = sscanf (tune_string, "%d", &width);
+  if (n == EOF)
+error ("invalid format for sve_width");
+  switch (width)
+{
+  case SVE_128:
+  case SVE_256:
+  case SVE_512:
+  case SVE_1024:
+  case SVE_2048:
+	break;
+  default:
+	error ("invalid sve_width value: %d", width);
+}
+  tune->sve_width = (enum aarch64_sve_vector_bits_enum) width;
+}
+
 /* Parse TOKEN, which has length LENGTH to see if it is a tuning option
we understand.  If it is, extract the option string and handoff to
the appropriate function.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/override_sve_width_1.c b/gcc/testsuite/gcc.target/aarch64/sve/override_sve_width_1.c
new file mode 100644
index ..3752fdc2a7198783d2ed5c5f502c3227f98029b8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/override_sve_width_1.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -moverride=sve_width=512" } */
+
+void __attribute__((noinline, noclone))
+vadd (int *dst, int *op1, int *op2, int count)
+{
+  for (int i = 0; i < count; ++i)
+dst[i] = op1[i] + op2[i];
+}


[PATCH][AArch64][1/2] Implement TARGET_ESTIMATED_POLY_VALUE

2018-12-07 Thread Kyrill Tkachov

Hi all,

The hook TARGET_ESTIMATED_POLY_VALUE allows a target to give an estimate for a 
poly_int run-time value.
It is used exclusively in tuning decisions, things like estimated loop 
iterations, probabilities etc.
It is not relied on for correctness.

If we know the SVE width implemented in hardware we can make more
informed decisions in the implementation of TARGET_ESTIMATED_POLY_VALUE,
even when compiling for VLA vectorisation.

This patch adds an sve_width field to our tuning structs and sets it for
the current CPU tunings.

A new value is introduced to the aarch64_sve_vector_bits_enum enum that
indicates that SVE is not available: SVE_NOT_IMPLEMENTED. I set it to the
same value as SVE_SCALABLE so that parts of the aarch64 backend that follow
the pattern:
if (vector_width == SVE_SCALABLE)
  do_vla_friendly_action ()
else
  assume_specific_width_for_correctness ()

continue to work without change, but the CPU tuning structs can use a more
appropriate moniker for indicating the absence of SVE.

This sets sve_width to SVE_NOT_IMPLEMENTED for all cores.
I aim to add an -moverride switch in the next patch that allows a power user
to experiment with different values of it for investigations.

Bootstrapped and tested on aarch64-none-linux-gnu.

Thanks,
Kyrill

2018-12-07  Kyrylo Tkachov  

* config/aarch64/aarch64-opts.h (aarch64_sve_vector_bits_enum):
Add SVE_NOT_IMPLEMENTED value.
* config/aarch64/aarch64-protos.h (struct tune_params): Add sve_width
field.
* config/aarch64/aarch64.c (generic_tunings,cortexa35_tunings,
cortexa53_tunings, cortexa57_tunings, cortexa72_tunings,
cortexa73_tunings, exynosm1_tunings, thunderx_tunings,
thunderx_tunings, tsv110_tunings, xgene1_tunings, qdf24xx_tunings,
saphira_tunings, thunderx2t99_tunings, emag_tunings):
Specify sve_width.
(aarch64_estimated_poly_value): Define.
(TARGET_ESTIMATED_POLY_VALUE): Define.
diff --git a/gcc/config/aarch64/aarch64-opts.h b/gcc/config/aarch64/aarch64-opts.h
index 9d44a598967cad6db06c0097f1f9f2378981b3de..00362695d53f5791d3312bf4d83ddabb7ac10739 100644
--- a/gcc/config/aarch64/aarch64-opts.h
+++ b/gcc/config/aarch64/aarch64-opts.h
@@ -90,6 +90,7 @@ enum aarch64_function_type {
 /* SVE vector register sizes.  */
 enum aarch64_sve_vector_bits_enum {
   SVE_SCALABLE,
+  SVE_NOT_IMPLEMENTED = SVE_SCALABLE,
   SVE_128 = 128,
   SVE_256 = 256,
   SVE_512 = 512,
diff --git a/gcc/config/aarch64/aarch64-protos.h b/gcc/config/aarch64/aarch64-protos.h
index 1fe1a50d52aeb3719cf30c4a2af41abb8dd7233d..fa3c247f0773e1d4101b6209b6b7ba6cd50f82eb 100644
--- a/gcc/config/aarch64/aarch64-protos.h
+++ b/gcc/config/aarch64/aarch64-protos.h
@@ -252,6 +252,10 @@ struct tune_params
   const struct cpu_vector_cost *vec_costs;
   const struct cpu_branch_cost *branch_costs;
   const struct cpu_approx_modes *approx_modes;
+  /* Width of the SVE registers or SVE_NOT_IMPLEMENTED if not applicable.
+ Only used for tuning decisions, does not disable VLA
+ vectorization.  */
+  enum aarch64_sve_vector_bits_enum sve_width;
   int memmov_cost;
   int issue_rate;
   unsigned int fusible_ops;
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 2c267b936b0495a8c5b6593d259619dbe88ae7a8..7ccc6b78d5872d6b43491badbfa9f2d70580015c 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -681,6 +681,7 @@ static const struct tune_params generic_tunings =
   &generic_vector_cost,
   &generic_branch_cost,
   &generic_approx_modes,
+  SVE_NOT_IMPLEMENTED, /* sve_width  */
   4, /* memmov_cost  */
   2, /* issue_rate  */
   (AARCH64_FUSE_AES_AESMC), /* fusible_ops  */
@@ -706,6 +707,7 @@ static const struct tune_params cortexa35_tunings =
   &generic_vector_cost,
   &generic_branch_cost,
   &generic_approx_modes,
+  SVE_NOT_IMPLEMENTED, /* sve_width  */
   4, /* memmov_cost  */
   1, /* issue_rate  */
   (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
@@ -732,6 +734,7 @@ static const struct tune_params cortexa53_tunings =
   &generic_vector_cost,
   &generic_branch_cost,
   &generic_approx_modes,
+  SVE_NOT_IMPLEMENTED, /* sve_width  */
   4, /* memmov_cost  */
   2, /* issue_rate  */
   (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
@@ -758,6 +761,7 @@ static const struct tune_params cortexa57_tunings =
   &cortexa57_vector_cost,
   &generic_branch_cost,
   &generic_approx_modes,
+  SVE_NOT_IMPLEMENTED, /* sve_width  */
   4, /* memmov_cost  */
   3, /* issue_rate  */
   (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
@@ -784,6 +788,7 @@ static const struct tune_params cortexa72_tunings =
   &cortexa57_vector_cost,
   &generic_branch_cost,
   &generic_approx_modes,
+  SVE_NOT_IMPLEMENTED, /* sve_width  */
   4, /* memmov_cost  */
   3, /* issue_rate  */
   (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
@@ -810,6 +815,7 @@ static const struct tune_params cortexa73_tunings =
   &cor

Re: RFC: libiberty PATCH to disable demangling of ancient mangling schemes

2018-12-07 Thread Jason Merrill

On 12/7/18 6:36 AM, Richard Biener wrote:

On Thu, Dec 6, 2018 at 10:22 PM Jason Merrill  wrote:


On Thu, Dec 6, 2018 at 11:14 AM Jason Merrill  wrote:


Looks good to me.  Independently, do you see a reason not to disable the
old demangler entirely?


Like so.  Does anyone object to this?  These mangling schemes haven't
been relevant in decades.


Why #ifdef the code?  Just rip it out?


I was thinking of it as an intermediate measure in case some user wanted it
for some reason, but I'd be fine with that as well.


Jason


[RFA] [target/87369] Prefer "bit" over "bfxil"

2018-12-07 Thread Jeff Law
As I suggested in the BZ, this patch rejects constants with just the
high bit set for the recently added "bfxil" pattern.  As a result we'll
return to using "bit" for the test in the BZ.

I'm not versed enough in aarch64 performance tuning to know if "bit" is
actually a better choice than "bfxil".  "bit" results in better code for
the testcase, but that seems more a function of register allocation than
"bit" being inherently better than "bfxil".   Obviously someone with
more aarch64 knowledge needs to make a decision here.

My first iteration of the patch changed "aarch64_high_bits_all_ones_p".
We could still go that way too, though the name probably needs to change.

I've bootstrapped and regression tested on aarch64-linux-gnu and it
fixes the regression.  I've also bootstrapped aarch64_be-linux-gnu, but
haven't done any kind of regression tested on that platform.


OK for the trunk?

Jeff
PR target/87369
* config/aarch64/aarch64.md (aarch64_bfxil): Do not accept
constant with just the high bit set.  That's better handled by
the "bit" pattern.

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 88f66104db3..ad6822410c2 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -5342,9 +5342,11 @@
(match_operand:GPI 3 "const_int_operand" "n, Ulc"))
(and:GPI (match_operand:GPI 2 "register_operand" "0,r")
(match_operand:GPI 4 "const_int_operand" "Ulc, n"))))]
-  "(INTVAL (operands[3]) == ~INTVAL (operands[4]))
-  && (aarch64_high_bits_all_ones_p (INTVAL (operands[3]))
-|| aarch64_high_bits_all_ones_p (INTVAL (operands[4])))"
+  "(INTVAL (operands[3]) == ~INTVAL (operands[4])
+&& ((aarch64_high_bits_all_ones_p (INTVAL (operands[3]))
+&& popcount_hwi (INTVAL (operands[3])) != 1)
+|| (aarch64_high_bits_all_ones_p (INTVAL (operands[4]))
+   && popcount_hwi (INTVAL (operands[4])) != 1)))"
   {
 switch (which_alternative)
 {


Re: [PATCH v2] Fix PR64242

2018-12-07 Thread Jakub Jelinek
On Fri, Dec 07, 2018 at 02:52:48PM +, Wilco Dijkstra wrote:
> -  struct __attribute__((aligned (32))) S { int a[4]; } s;
>  
> -  bar (&s);  
>  

Any reason to remove the above?

>p = __builtin_alloca (x);
> +  q = __builtin_alloca (x);
>if (!__builtin_setjmp (buf))
>  broken_longjmp (buf);
>  
> +  /* Compute expected next alloca offset - some targets don't align properly
> + and allocate too much.  */
> +  p = q + (q - p);

This is UB; pointer difference is only defined within the same object.
So, you can only do such subtraction in some integral type rather than as
pointer subtraction. 

> +
>/* Fails if stack pointer corrupted.  */
> -  q = __builtin_alloca (x);
> -  if (foo (p) < foo (q))
> -{
> -  if (foo (q) - foo (p) >= 1024)
> - abort ();
> -}
> -  else if (foo (p) - foo (q) >= 1024)
> +  if (p != __builtin_alloca (x))

And I'm not sure you have a guarantee that every zero-sized alloca is at the
same offset from the previous one.

Jakub


Too strict synchronization with the local (host) thread? (was: [PATCH 5/6, OpenACC, libgomp] Async re-work, C/C++ testsuite changes)

2018-12-07 Thread Thomas Schwinge
Hi Chung-Lin!

On Tue, 25 Sep 2018 21:11:42 +0800, Chung-Lin Tang  
wrote:
> These are the testsuite/libgomp.oacc-c-c++-common/* changes.

> --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-79.c
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-79.c
> @@ -114,6 +114,7 @@ main (int argc, char **argv)
>  
>for (i = 0; i < N; i++)
>  {
> +  stream = (CUstream) acc_get_cuda_stream (i & 1);
>r = cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, stream, kargs, 0);

What's the motivation for this change?

And then:

> @@ -122,11 +123,11 @@ main (int argc, char **argv)
>   }
>  }
>  
> -  acc_wait_async (0, 1);
> -
>if (acc_async_test (0) != 0)
>  abort ();
>  
> +  acc_wait_async (0, 1);
> +
>if (acc_async_test (1) != 0)
>  abort ();

I somehow feel that this change...

> --- a/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-81.c
> +++ b/libgomp/testsuite/libgomp.oacc-c-c++-common/lib-81.c
> @@ -133,7 +133,7 @@ main (int argc, char **argv)
>  
>for (i = 0; i <= N; i++)
>  {
> -  if (acc_async_test (i) != 0)
> +  if (acc_async_test (i) == 0)
>   abort ();
>  }

..., and this change are needed because we're now more strictly
synchronizing with the local (host) thread.

Regarding the case of "libgomp.oacc-c-c++-common/lib-81.c", as currently
present:

[...]
  for (i = 0; i < N; i++)
{
  r = cuLaunchKernel (delay, 1, 1, 1, 1, 1, 1, 0, streams[i], kargs, 0);
  if (r != CUDA_SUCCESS)
{
  fprintf (stderr, "cuLaunchKernel failed: %d\n", r);
  abort ();
}
}

This launches N kernels on N separate async queues/CUDA streams, [0..N).

  acc_wait_all_async (N);

Then, the "acc_wait_all_async (N)" -- in my understanding! -- should
*not* synchronize with the local (host) thread, but instead just set up
the additional async queue/CUDA stream N to "depend" on [0..N).

  for (i = 0; i <= N; i++)
{
  if (acc_async_test (i) != 0)
abort ();
}

Thus, all [0..N) should then still be "acc_async_test (i) != 0" (still
running).

  acc_wait (N);

Here, the "acc_wait (N)" would synchronize the local (host) thread with
async queue/CUDA stream N and thus recursively with [0..N).

  for (i = 0; i <= N; i++)
{
  if (acc_async_test (i) != 1)
abort ();
}
[...]

So, then all these async queues/CUDA streams here indeed are
"acc_async_test (i) != 1", that is, idle.


Now, the more strict synchronization with the local (host) thread is not
wrong in term of correctness, but I suppose it will impact performance of
otherwise asynchronous operations, which now get synchronized too much?

Or, of course, I'm misunderstanding something...

(For avoidance of doubt, I would accept the "async re-work" as is, but we
should eventually clarify this, and restore the behavior we -- apparently
-- had before, where we didn't synchronize so much?  (So, technically,
the "async re-work" would constitute a regression for this kind of
usage?)


Grüße
 Thomas


[RFC] Handle LHS zero_extracts in DSE

2018-12-07 Thread Andreas Krebbel
Hi,

while debugging a problem with an older GCC (4.6) I've seen RTXs using
ZERO_EXTRACT LHS operands on MEMs as in:

(insn 7 6 0 (set (zero_extract:DI (mem/s/c:QI (plus:DI (reg/f:DI 39 
virtual-stack-vars)
(const_int -5 [0xfffb])) [0+0 S1 A8])
(const_int 24 [0x18])
(const_int 0 [0]))
(subreg:DI (reg:SI 45) 0)) t.c:19 -1
 (nil))

occurring with that testcase:

struct a
{
  struct
  {
    char b:1;
  } c;
  char d[3];
  char e;
  struct
  {
    int:1;
  };
};

void
f ()
{
  struct a g = { };
  char h[4] = "foo";
  memcpy (g.d, h, 3);
  g.c.b = 1;
  i (g);
}


The program gets miscompiled with GCC 4.6 since DSE ignores that INSN
when it comes to recording stores. In that particular case it led to
the assignment g.c.b = 1 overwriting parts of the g.d field.

I couldn't reproduce that on recent GCCs.  Since the following fix from
Jakub I don't get these zero_extracts anymore.

Author: jakub
Date: Thu Jan 26 14:09:29 2012
New Revision: 183560

URL: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=183560
Log:
 PR middle-end/51895
 * expr.c (expand_expr_real_1): Handle BLKmode MEM_REF of
 non-addressable non-BLKmode base correctly.

 * g++.dg/opt/pr51895.C: New test.


However, I still think that's a legal RTX which needs to be handled by
DSE correctly.  I couldn't find anything preventing that problem from
occurring with current GCCs either.  On the other hand, I couldn't trigger
it with anything in the testsuite.

Perhaps something like that would be needed for current GCC:

diff --git a/gcc/dse.c b/gcc/dse.c
index 21d166d..d27fb54 100644
--- a/gcc/dse.c
+++ b/gcc/dse.c
@@ -1346,6 +1346,47 @@ record_store (rtx body, bb_info_t bb_info)

   mem = SET_DEST (body);

+  /* Deal with (zero_extract:XX (mem:QI ...)).  */
+  if (GET_CODE (mem) == ZERO_EXTRACT && MEM_P (XEXP (mem, 0)))
+{
+  rtx size_op = XEXP (mem, 1);
+  rtx offset_op = XEXP (mem, 2);
+  enum machine_mode mode = GET_MODE (mem);
+  scalar_int_mode int_mode;
+
+  /* Turn a (zero_extract:XX (mem:QI ...)) into a (mem:BLK with
+the proper size and the adjusted address.  */
+  if (CONST_INT_P (size_op)
+ && CONST_INT_P (offset_op)
+	 && is_a <scalar_int_mode> (mode, &int_mode)
+	 && INTVAL (size_op) + INTVAL (offset_op) <= GET_MODE_PRECISION (int_mode))
+   {
+ int size = INTVAL (size_op);
+ int offset = INTVAL (offset_op);
+
+ mem = copy_rtx (XEXP (mem, 0));
+ if (BITS_BIG_ENDIAN)
+   mem = adjust_address (mem, BLKmode, offset / BITS_PER_UNIT);
+ else
+   mem = adjust_address (mem, BLKmode,
+ (GET_MODE_BITSIZE (int_mode) - size - offset)
+ / BITS_PER_UNIT);
+
+ set_mem_size (mem, (size - 1) / BITS_PER_UNIT + 1);
+   }
+  else
+   {
+ if (find_reg_note (insn_info->insn, REG_UNUSED, mem) == NULL)
+   {
+ /* If the set or clobber is unused, then it does not effect our
+ability to get rid of the entire insn.  */
+ insn_info->cannot_delete = true;
+ clear_rhs_from_active_local_stores ();
+   }
+ return 0;
+   }
+}
+
   /* If this is not used, then this cannot be used to keep the insn
  from being deleted.  On the other hand, it does provide something
  that can be used to prove that another store is dead.  */



Re: RFC: libiberty PATCH to disable demangling of ancient mangling schemes

2018-12-07 Thread Pedro Alves
Adding gdb-patches, since demangling affects gdb.

Ref: https://gcc.gnu.org/ml/gcc-patches/2018-12/msg00407.html

On 12/07/2018 10:40 AM, Jakub Jelinek wrote:
> On Fri, Dec 07, 2018 at 10:27:17AM +, Nick Clifton wrote:
 Looks good to me.  Independently, do you see a reason not to disable the
 old demangler entirely?
>>>
>>> Like so.  Does anyone object to this?  These mangling schemes haven't
>>> been relevant in decades.
>>
>> I am not really familiar with this old scheme, so please excuse my ignorance
>> in asking these questions:
>>
>>   * How likely is it that there are old toolchain in use out there that 
>> still 
>> use the v2 mangling ?  Ie I guess that I am asking "which generation(s)
>> of gcc used v2 mangling ?"
> 
> GCC 3.0 and up used the new (Itanium C++ ABI) mangling, 2.95 and older used 
> the old
> mangling (2.96-RH used the new mangling I believe).
> So you need compiler older than 17.5 years to have the old mangling.
> Such a compiler didn't support most of the contemporarily used platforms
> though at all (e.g. x86-64, powerpc64le, aarch64, I believe not even
> powerpc64-linux).
> 
Yeah.

I guess the question would be whether it is reasonable to expect
that people will still need to debug & inspect (with gdb, c++filt, etc.)
any such old binary, and that they will need to do it with modern
tools, as opposed to sticking with older binutils & gdb, and how often
that would be needed.

I would say that it's very, very unlikely, and not worth the
maintenance burden.

Last I heard of 2.95-produced binaries was for some ancient
gcc-2.95-based cross compiler that was still being minimally maintained,
because it was needed to build & maintain some legacy stuff.  That was
maybe over 8 years ago, and it was off trunk.  It's probably dead by now.
And if it isn't dead, whoever maintains the compiler off trunk certainly
can also maintain old-ish binutils & gdb off trunk.

Thanks,
Pedro Alves


patch for PR88349

2018-12-07 Thread Vladimir Makarov

The following patch fixes

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88349

The patch is simple and therefore I checked it only on x86-64.

Committed as rev. 266894.

Index: ChangeLog
===
--- ChangeLog	(revision 266893)
+++ ChangeLog	(working copy)
@@ -1,3 +1,9 @@
+2018-12-07  Vladimir Makarov  
+
+	PR rtl-optimization/88349
+	* ira-costs.c (record_operand_costs): Check bigger reg class on
+	NO_REGS.
+
 2018-12-07  Richard Sandiford  
 
 	* config/aarch64/aarch64-sve.md (*mul3, *v3):
Index: ira-costs.c
===
--- ira-costs.c	(revision 266862)
+++ ira-costs.c	(working copy)
@@ -1327,8 +1327,9 @@ record_operand_costs (rtx_insn *insn, en
 	 fit the the hard reg class (e.g. DImode for AREG on
 	 i386).  Check this and use a bigger class to get the
 	 right cost.  */
-	  if (! ira_hard_reg_in_set_p (other_regno, mode,
-   reg_class_contents[hard_reg_class]))
+	  if (bigger_hard_reg_class != NO_REGS
+	  && ! ira_hard_reg_in_set_p (other_regno, mode,
+	  reg_class_contents[hard_reg_class]))
 	hard_reg_class = bigger_hard_reg_class;
 	  i = regno == (int) REGNO (src) ? 1 : 0;
 	  for (k = cost_classes_ptr->num - 1; k >= 0; k--)
Index: testsuite/ChangeLog
===
--- testsuite/ChangeLog	(revision 266893)
+++ testsuite/ChangeLog	(working copy)
@@ -1,3 +1,8 @@
+2018-12-07  Vladimir Makarov  
+
+	PR rtl-optimization/88349
+	* gcc.target/mips/pr88349.c: New.
+
 2018-12-07  Jakub Jelinek  
 
 	PR c++/86669
Index: testsuite/gcc.target/mips/pr88349.c
===
--- testsuite/gcc.target/mips/pr88349.c	(nonexistent)
+++ testsuite/gcc.target/mips/pr88349.c	(working copy)
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-mel -mabi=32 -march=mips64r2 -fexpensive-optimizations" } */
+/* { dg-skip-if "code quality test" { *-*-* } { "-O0" } { "" } } */
+
+typedef int DI __attribute__((mode(DI)));
+typedef int SI __attribute__((mode(SI)));
+
+__attribute__((mips16)) SI
+f (SI x, SI y)
+{
+  return ((DI) x * y) >> 32;
+}
+
+/* { dg-final { scan-assembler-not "\tsw\t" } } */


Re: RFA/RFC: Add stack recursion limit to libiberty's demangler [v5]

2018-12-07 Thread H.J. Lu
On Thu, Dec 6, 2018 at 10:04 AM Ian Lance Taylor via gcc-patches
 wrote:
>
> On Thu, Dec 6, 2018 at 3:12 AM Nick Clifton  wrote:
> >
> >   Is the patch OK with you ?
>

This caused:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88409


-- 
H.J.


Re: [PATCH v2] Fix PR64242

2018-12-07 Thread Wilco Dijkstra
Hi,

Jakub Jelinek wrote:
> On Fri, Dec 07, 2018 at 02:52:48PM +, Wilco Dijkstra wrote:
>> -  struct __attribute__((aligned (32))) S { int a[4]; } s;
>> -  bar (&s);
>
> Any reason to remove the above?

The test case doesn't need an aligned object to fail, so why did you add it?

>> +  /* Compute expected next alloca offset - some targets don't align properly
>> + and allocate too much.  */
>> +  p = q + (q - p);
>
> This is UB, pointer difference is only defined within the same object.
> So, you can only do such subtraction in some integral type rather than as
> pointer subtraction. 

__builtin_setjmp is already undefined behaviour, and the stack corruption is
even more undefined - trying to avoid harmless theoretical undefined behaviour
wouldn't be helpful.

> And I'm not sure you have a guarantee that every zero sized alloca is at the
> same offset from the previous one.

The above pointer adjustment handles the case where alloca overallocates.
It passes on x86-64 which always adds 8 unnecessary bytes.

Wilco

Re: [PATCH v2] Fix PR64242

2018-12-07 Thread Jakub Jelinek
On Fri, Dec 07, 2018 at 04:19:22PM +, Wilco Dijkstra wrote:
> Jakub Jelinek wrote:
> > On Fri, Dec 07, 2018 at 02:52:48PM +, Wilco Dijkstra wrote:
> >> -  struct __attribute__((aligned (32))) S { int a[4]; } s;
> >> -  bar (&s);
> >
> > Any reason to remove the above?
> 
> The test case doesn't need an aligned object to fail, so why did you add it?

It needed it on i686, because otherwise it happened to see the value it
wanted in the caller's stack frame.

> >> +  /* Compute expected next alloca offset - some targets don't align 
> >> properly
> >> + and allocate too much.  */
> >> +  p = q + (q - p);
> >
> > This is UB, pointer difference is only defined within the same object.
> > So, you can only do such subtraction in some integral type rather than as
> > pointer subtraction. 
> 
> __builtin_setjmp is already undefined behaviour, and the stack corruption is
> even more undefined - trying to avoid harmless theoretical undefined behaviour
> wouldn't be helpful.

No, __builtin_setjmp is a GNU extension, not undefined behavior.  And
something that is UB and might be harmless today might be harmful tomorrow,
gcc optimizes heavily on the assumption that UB doesn't happen in the
program, so might optimize that subtraction to 0 or 42 or whatever else.

> 
> > And I'm not sure you have a guarantee that every zero sized alloca is at the
> > same offset from the previous one.
> 
> The above pointer adjustment handles the case where alloca overallocates.
> It passes on x86-64 which always adds 8 unnecessary bytes.

What guarantee is there that it overallocates each time the same though?

Jakub


[PATCH] Set DEMANGLE_RECURSION_LIMIT to 1536

2018-12-07 Thread H.J. Lu
On Fri, Dec 7, 2018 at 8:17 AM H.J. Lu  wrote:
>
> On Thu, Dec 6, 2018 at 10:04 AM Ian Lance Taylor via gcc-patches
>  wrote:
> >
> > On Thu, Dec 6, 2018 at 3:12 AM Nick Clifton  wrote:
> > >
> > >   Is the patch OK with you ?
> >
>
> This caused:
>
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=88409
>

Here is the fix.   OK for trunk?

Thanks.

-- 
H.J.
From 676dc7f98d0c191e550f87df70393116d9e19ccb Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Fri, 7 Dec 2018 08:20:45 -0800
Subject: [PATCH] Set DEMANGLE_RECURSION_LIMIT to 1536

Define DEMANGLE_RECURSION_LIMIT to 1536 for

_ZN4modc6parser8sequenceINS_9astParser13LocatedParserINS0_9ParserRefINS2_UlRNS2_16TokenParserInputEE_EINS0_14OptionalParserINS2_18ListParserTemplateILNS_6tokens5Token4TypeE4EXadL_ZNSD_Ut_13parenthesized6ParserINS4_INS0_6ParserIS5_NS_3ast10ExpressionENSA_INS4_INS2_22OneOfKeywordsToTParserINSJ_5StyleEEENS0_14SequenceParserIS5_INS0_18ExactElementParserIS5_EENSA_ISM_ENS0_14RepeatedParserINS4_INS0_15TransformParserINSU_IS5_INS4_INSP_INSJ_10Annotation12RelationshipESX_EEENS2_UlNS2_3LocES12_ONS_5MaybeISK_EEE19_ELb0EENSU_INS0_17ExtractParserTypeIT_E9InputTypeEINS0_8MaybeRefIS1F_E4TypeEDpNS1I_IT0_E4TypeOS1F_DpOS1L_

the recursion level can reach 1306.

	PR other/88409
	* demangle.h (DEMANGLE_RECURSION_LIMIT): Set to 1536.
---
 include/demangle.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/demangle.h b/include/demangle.h
index 1e67fe2fb3..d9de074bef 100644
--- a/include/demangle.h
+++ b/include/demangle.h
@@ -77,7 +77,7 @@ extern "C" {
 /* If DMGL_NO_RECURSE_LIMIT is not enabled, then this is the value used as
the maximum depth of recursion allowed.  It should be enough for any
real-world mangled name.  */
-#define DEMANGLE_RECURSION_LIMIT 1024
+#define DEMANGLE_RECURSION_LIMIT 1536
   
 /* Enumeration of possible demangling styles.
 
-- 
2.19.2



Re: RFC: libiberty PATCH to disable demangling of ancient mangling schemes

2018-12-07 Thread Nick Clifton
Hi Guys,

 Looks good to me.  Independently, do you see a reason not to disable the
 old demangler entirely?

>>   * How likely is it that there are old toolchain in use out there that 
>> still 
>> use the v2 mangling ?

> GCC 3.0 and up used the new (Itanium C++ ABI) mangling, 2.95 and older used 
> the old
> mangling (2.96-RH used the new mangling I believe).
> So you need compiler older than 17.5 years to have the old mangling.
> Such a compiler didn't support most of the contemporarily used platforms
> though at all (e.g. x86-64, powerpc64le, aarch64, I believe not even
> powerpc64-linux).

Well that is good enough for me. :-)  I do not have the power to approve
the patch, but I would certainly be happy to see it go in.

Cheers
  Nick




Re: [PATCH][AArch64][1/2] Implement TARGET_ESTIMATED_POLY_VALUE

2018-12-07 Thread Richard Sandiford
"Kyrill Tkachov"  writes:
> diff --git a/gcc/config/aarch64/aarch64-protos.h 
> b/gcc/config/aarch64/aarch64-protos.h
> index 
> 1fe1a50d52aeb3719cf30c4a2af41abb8dd7233d..fa3c247f0773e1d4101b6209b6b7ba6cd50f82eb
>  100644
> --- a/gcc/config/aarch64/aarch64-protos.h
> +++ b/gcc/config/aarch64/aarch64-protos.h
> @@ -252,6 +252,10 @@ struct tune_params
>const struct cpu_vector_cost *vec_costs;
>const struct cpu_branch_cost *branch_costs;
>const struct cpu_approx_modes *approx_modes;
> +  /* Width of the SVE registers or SVE_NOT_IMPLEMENTED if not appicable.

Typo: applicable.

OK with that change, thanks.

Richard


Re: [PATCH][AArch64][2/2] Add sve_width -moverride tunable

2018-12-07 Thread Richard Sandiford
"Kyrill Tkachov"  writes:
> @@ -10834,6 +10836,34 @@ aarch64_parse_tune_string (const char *tune_string,
>"tune=");
>  }
>  
> +/* Parse the sve_width tuning moverride string in TUNE_STRING.
> +   Accept the valid SVE vector widths allowed by
> +   aarch64_sve_vector_bits_enum and use it to override sve_width
> +   in TUNE.  */
> +
> +static void
> +aarch64_parse_sve_width_string (const char *tune_string,
> + struct tune_params *tune)
> +{
> +  int width = -1;
> +
> +  int n = sscanf (tune_string, "%d", &width);
> +  if (n == EOF)
> +error ("invalid format for sve_width");

Should probably return here, otherwise we'll report a second
error for width == -1.

> +  switch (width)
> +{
> +  case SVE_128:
> +  case SVE_256:
> +  case SVE_512:
> +  case SVE_1024:
> +  case SVE_2048:
> + break;
> +  default:

> + error ("invalid sve_width value: %d", width);
> +}
> +  tune->sve_width = (enum aarch64_sve_vector_bits_enum) width;
> +}

Formatting nit: cases should line up with the "{".

OK with those changes, thanks.

Richard


Re: [PATCH v2] Fix PR64242

2018-12-07 Thread Wilco Dijkstra
Hi,

Jakub Jelinek wrote:
On Fri, Dec 07, 2018 at 04:19:22PM +, Wilco Dijkstra wrote:

>> The test case doesn't need an aligned object to fail, so why did you add it?
>
> It needed it on i686, because otherwise it happened to see the value it
> wanted in the caller's stack frame.

Right, so I fixed that by increasing the size of the frame in
broken_setjmp to be larger than the frame in main, so it's now extremely
unlikely to accidentally read from a random stack location and end up
with a valid stack pointer.

> >> +  /* Compute expected next alloca offset - some targets don't align 
> >> properly
> >> + and allocate too much.  */
> >> +  p = q + (q - p);
> >
> > This is UB, pointer difference is only defined within the same object.
> > So, you can only do such subtraction in some integral type rather than as
> > pointer subtraction. 
> 
> __builtin_setjmp is already undefined behaviour, and the stack corruption is
> even more undefined - trying to avoid harmless theoretical undefined behaviour
> wouldn't be helpful.

> No, __builtin_setjmp is a GNU extension, not undefined behavior.  

Well the evidence is that it's undocumented, unspecified and causes undefined
behaviour...

> And 
> something that is UB and might be harmless today might be harmful tomorrow,
> gcc optimizes heavily on the assumption that UB doesn't happen in the
> program, so might optimize that subtraction to 0 or 42 or whatever else.
>
>> > And I'm not sure you have a guarantee that every zero sized alloca is at 
>> > the
>> > same offset from the previous one.
>> 
>> The above pointer adjustment handles the case where alloca overallocates.
>> It passes on x86-64 which always adds 8 unnecessary bytes.
>
> What guarantee is there that it overallocates each time the same though?

How could it not be?  It could only vary if it was reading an
uninitialized register or adding a random extra amount as a form of ASLR.
But there is no point in trying to support future unknown features/bugs
since it will give false negatives today.

Wilco


[PATCH, committed][PR88408][rs6000] mmintrin.h: fix use of "vector"

2018-12-07 Thread Paul Clarke
A recent patch inadvertently added the use of "vector" to mmintrin.h
when all such uses should be "__vector".

Committed as obvious/trivial.

[gcc]

2018-12-07  Paul A. Clarke  

PR target/88408
* config/rs6000/mmintrin.h (_mm_packs_pu16): Correctly use "__vector".

Index: gcc/config/rs6000/mmintrin.h
===
--- gcc/config/rs6000/mmintrin.h(revision 266871)
+++ gcc/config/rs6000/mmintrin.h(working copy)
@@ -228,9 +228,9 @@ _mm_packs_pu16 (__m64 __m1, __m64 __m2)
 #endif
   const __vector signed short __zero = { 0 };
   __vector __bool short __select = vec_cmplt (vm1, __zero);
-  r = vec_packs ((vector unsigned short) vm1, (vector unsigned short) vm1);
+  r = vec_packs ((__vector unsigned short) vm1, (__vector unsigned short) vm1);
   __vector __bool char packsel = vec_pack (__select, __select);
-  r = vec_sel (r, (const vector unsigned char) __zero, packsel);
+  r = vec_sel (r, (const __vector unsigned char) __zero, packsel);
   return (__m64) ((__vector long long) r)[0];
 }
 



Re: [PATCH AutoFDO/2]Treat ZERO as common profile probability/count

2018-12-07 Thread Jan Hubicka
> Hi Honza,
> I have committed the typo fix as revision 266885.
> Also I followed your suggestion (IIUC) by calling
> profile_count::adjust_for_ipa_scaling for zero den in function
> update_profiling_info.  It works and does make more sense than
> changing the global zero check logic.
> Patch tested as before, is it ok?

Thanks, patch is OK.
What is the situation with AutoFDO now? It would be very nice to get it
fixed for the release :)

Honza
> 
> Thanks,
> bin
> 
> diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
> index 4471bae11c7..5074ef63da1 100644
> --- a/gcc/ipa-cp.c
> +++ b/gcc/ipa-cp.c
> @@ -3715,9 +3715,11 @@ update_profiling_info (struct cgraph_node *orig_node,
>new_sum = orig_node_count.combine_with_ipa_count (new_sum);
>orig_node->count = remainder;
> 
> +  profile_count::adjust_for_ipa_scaling (&new_sum, &orig_node_count);
>for (cs = new_node->callees; cs; cs = cs->next_callee)
>  cs->count = cs->count.apply_scale (new_sum, orig_node_count);
> 
> +  profile_count::adjust_for_ipa_scaling (&remainder, &orig_node_count);
>for (cs = orig_node->callees; cs; cs = cs->next_callee)
>  cs->count = cs->count.apply_scale (remainder, orig_node_count);
> 
> 2018-12-07  Bin Cheng  
> 
> * ipa-cp.c (update_profiling_info): Call adjust_for_ipa_scaling for
> zero profile count.


[committed] [PR middle-end/87813] EVRP vs global range data for warning passes

2018-12-07 Thread Jeff Law
So as discussed in BZ87813, the evrp range analyzer currently reflects
any globally valid ranges it finds into the global range information we
attach to SSA_NAMEs.

That is usually a good thing, except when the analyzer is called from a
warning pass.  Enabling/disabling of warnings should not affect code
generation.

This patch allows clients to indicate if they want the range analyzer to
reflect globally discovered ranges into the SSA_NAME.  The optimization
passes obviously want that behavior.  The sprintf warning pass does not.
 As expected, it was just a matter of adding a flag to the class and
testing it in a few places.

There was a solitary test, identified in the BZ, that turned out to be
problematical.

builtins/strnlen.c at -Og was relying on the range analyzer instance in
the sprintf warning pass to set a global range.  That in turn allowed
the call to strnlen to be eliminated.  With this patch that no longer
occurs and we call strnlen.  strnlen in this test is actually wrapped so
that it aborts if OPTIMIZE is defined.

I didn't see any reasonable way to preserve the optimization at -Og
given we don't run VRP, DOM, EVRP, etc.  So I've just added a strnlen.x
file to filter out the -Og run of the strnlen test.

Bootstrapped and regression tested on x86_64-linux-gnu.  Installing on
the trunk momentarily.

jeff
* gimple-ssa-evrp-analyze.h (class evrp_range_analyzer): Add
m_update_global_ranges member.  Add corresponding argument to ctor.
* gimple-ssa-evrp-analyze.c
(evrp_range_analyzer::evrp_range_analyzer): Add new argument and
initialize m_update_global_ranges.
(evrp_range_analyzer::set_ssa_range_info): Assert that we are
updating global ranges.
(evrp_range_analyzer::record_ranges_from_incoming_edge): Only
update global ranges if explicitly requested.
(evrp_range_analyzer::record_ranges_from_phis): Similarly.
(evrp_range_analyzer::record_ranges_from_stmt): Similarly.
* gimple-ssa-evrp.c (evrp_dom_walker): Pass new argument to
evrp_range_analyzer ctor.
* gimple-ssa-sprintf.c (sprintf_dom_walker): Similarly.
* tree-ssa-dom.c (dom_opt_dom_walker): Similarly.

* gcc.c-torture/builtins/strnlen.x: New file to filter -Og from
options to test.

diff --git a/gcc/gimple-ssa-evrp-analyze.c b/gcc/gimple-ssa-evrp-analyze.c
index 220dde093b5..3efaca1a994 100644
--- a/gcc/gimple-ssa-evrp-analyze.c
+++ b/gcc/gimple-ssa-evrp-analyze.c
@@ -42,7 +42,8 @@ along with GCC; see the file COPYING3.  If not see
 #include "vr-values.h"
 #include "gimple-ssa-evrp-analyze.h"
 
-evrp_range_analyzer::evrp_range_analyzer () : stack (10)
+evrp_range_analyzer::evrp_range_analyzer (bool update_global_ranges)
+  : stack (10), m_update_global_ranges (update_global_ranges)
 {
   edge e;
   edge_iterator ei;
@@ -107,6 +108,8 @@ evrp_range_analyzer::try_find_new_range (tree name,
 void
 evrp_range_analyzer::set_ssa_range_info (tree lhs, value_range *vr)
 {
+  gcc_assert (m_update_global_ranges);
+
   /* Set the SSA with the value range.  */
   if (INTEGRAL_TYPE_P (TREE_TYPE (lhs)))
 {
@@ -213,6 +216,7 @@ evrp_range_analyzer::record_ranges_from_incoming_edge (basic_block bb)
continue;
  push_value_range (vrs[i].first, vrs[i].second);
  if (is_fallthru
+ && m_update_global_ranges
  && all_uses_feed_or_dominated_by_stmt (vrs[i].first, stmt))
{
  set_ssa_range_info (vrs[i].first, vrs[i].second);
@@ -267,7 +271,8 @@ evrp_range_analyzer::record_ranges_from_phis (basic_block bb)
   vr_values->update_value_range (lhs, &vr_result);
 
   /* Set the SSA with the value range.  */
-  set_ssa_range_info (lhs, &vr_result);
+  if (m_update_global_ranges)
+   set_ssa_range_info (lhs, &vr_result);
 }
 }
 
@@ -309,7 +314,8 @@ evrp_range_analyzer::record_ranges_from_stmt (gimple *stmt, bool temporary)
  /* Case one.  We can just update the underlying range
 information as well as the global information.  */
  vr_values->update_value_range (output, &vr);
- set_ssa_range_info (output, &vr);
+ if (m_update_global_ranges)
+   set_ssa_range_info (output, &vr);
}
  else
{
diff --git a/gcc/gimple-ssa-evrp-analyze.h b/gcc/gimple-ssa-evrp-analyze.h
index 0d2b4188b8c..01942f06362 100644
--- a/gcc/gimple-ssa-evrp-analyze.h
+++ b/gcc/gimple-ssa-evrp-analyze.h
@@ -23,7 +23,7 @@ along with GCC; see the file COPYING3.  If not see
 class evrp_range_analyzer
 {
  public:
-  evrp_range_analyzer (void);
+  evrp_range_analyzer (bool update_global_ranges);
   ~evrp_range_analyzer (void)
   {
 delete vr_values;
@@ -70,6 +70,9 @@ class evrp_range_analyzer
 
   /* STACK holds the old VR.  */
   auto_vec > stack;
+
+  /* True if we are updating global ranges, false otherwise.  */
+  bool m_update_global_ranges;

Re: [PATCH][AArch64][2/2] Add sve_width -moverride tunable

2018-12-07 Thread Kyrill Tkachov


On 07/12/18 16:32, Richard Sandiford wrote:

"Kyrill Tkachov"  writes:

@@ -10834,6 +10836,34 @@ aarch64_parse_tune_string (const char *tune_string,
 "tune=");
  }
  
+/* Parse the sve_width tuning moverride string in TUNE_STRING.

+   Accept the valid SVE vector widths allowed by
+   aarch64_sve_vector_bits_enum and use it to override sve_width
+   in TUNE.  */
+
+static void
+aarch64_parse_sve_width_string (const char *tune_string,
+   struct tune_params *tune)
+{
+  int width = -1;
+
+  int n = sscanf (tune_string, "%d", &width);
+  if (n == EOF)
+error ("invalid format for sve_width");

Should probably return here, otherwise we'll report a second
error for width == -1.


+  switch (width)
+{
+  case SVE_128:
+  case SVE_256:
+  case SVE_512:
+  case SVE_1024:
+  case SVE_2048:
+   break;
+  default:
+   error ("invalid sve_width value: %d", width);
+}
+  tune->sve_width = (enum aarch64_sve_vector_bits_enum) width;
+}

Formatting nit: cases should line up with the "{".

OK with those changes, thanks.


Thanks Richard. This is what I've committed with r266898.

Kyrill


Richard


diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 7ccc6b78d5872d6b43491badbfa9f2d70580015c..da050b88bdc4859e6c3eb7f90023a05868536399 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -1086,12 +1086,14 @@ struct aarch64_tuning_override_function
 
 static void aarch64_parse_fuse_string (const char*, struct tune_params*);
 static void aarch64_parse_tune_string (const char*, struct tune_params*);
+static void aarch64_parse_sve_width_string (const char*, struct tune_params*);
 
 static const struct aarch64_tuning_override_function
 aarch64_tuning_override_functions[] =
 {
   { "fuse", aarch64_parse_fuse_string },
   { "tune", aarch64_parse_tune_string },
+  { "sve_width", aarch64_parse_sve_width_string },
   { NULL, NULL }
 };
 
@@ -10834,6 +10836,37 @@ aarch64_parse_tune_string (const char *tune_string,
  "tune=");
 }
 
+/* Parse the sve_width tuning moverride string in TUNE_STRING.
+   Accept the valid SVE vector widths allowed by
+   aarch64_sve_vector_bits_enum and use it to override sve_width
+   in TUNE.  */
+
+static void
+aarch64_parse_sve_width_string (const char *tune_string,
+struct tune_params *tune)
+{
+  int width = -1;
+
+  int n = sscanf (tune_string, "%d", &width);
+  if (n == EOF)
+{
+  error ("invalid format for sve_width");
+  return;
+}
+  switch (width)
+{
+case SVE_128:
+case SVE_256:
+case SVE_512:
+case SVE_1024:
+case SVE_2048:
+  break;
+default:
+  error ("invalid sve_width value: %d", width);
+}
+  tune->sve_width = (enum aarch64_sve_vector_bits_enum) width;
+}
+
 /* Parse TOKEN, which has length LENGTH to see if it is a tuning option
we understand.  If it is, extract the option string and handoff to
the appropriate function.  */
diff --git a/gcc/testsuite/gcc.target/aarch64/sve/override_sve_width_1.c b/gcc/testsuite/gcc.target/aarch64/sve/override_sve_width_1.c
new file mode 100644
index ..3752fdc2a7198783d2ed5c5f502c3227f98029b8
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/sve/override_sve_width_1.c
@@ -0,0 +1,9 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -moverride=sve_width=512" } */
+
+void __attribute__((noinline, noclone))
+vadd (int *dst, int *op1, int *op2, int count)
+{
+  for (int i = 0; i < count; ++i)
+dst[i] = op1[i] + op2[i];
+}


Re: [RFA] [target/87369] Prefer "bit" over "bfxil"

2018-12-07 Thread Richard Earnshaw (lists)
On 07/12/2018 15:52, Jeff Law wrote:
> As I suggested in the BZ, this patch rejects constants with  just the
> high bit set for the recently added "bfxil" pattern.  As a result we'll
> return to using "bit" for the test in the BZ.
> 
> I'm not versed enough in aarch64 performance tuning to know if "bit" is
> actually a better choice than "bfxil".  "bit" results in better code for
> the testcase, but that seems more a function of register allocation than
> "bit" being inherently better than "bfxil".   Obviously someone with
> more aarch64 knowledge needs to make a decision here.
> 
> My first iteration of the patch changed "aarch64_high_bits_all_ones_p".
> We could still go that way too, though the name probably needs to change.
> 
> I've bootstrapped and regression tested on aarch64-linux-gnu and it
> fixes the regression.  I've also bootstrapped aarch64_be-linux-gnu, but
> haven't done any kind of regression tested on that platform.
> 
> 
> OK for the trunk?

The problem here is that the optimum solution depends on the register
classes involved and we don't know this during combine.  If we have
general register, then we want bfi/bfxil to be used; if we have a vector
register, then bit is preferable as it changes 3 inter-bank register
copies to a single inter-bank copy; and that copy might be hoisted out
of a loop.

For example, this case:

unsigned long
f (unsigned long a, unsigned long b)
{
  return (b & 0x7fff) | (a & 0x8000);
}

before your patch this expands to just a single bfxil instruction and
that's exactly what we'd want here.  With it, however, I'm now seeing

f:
and x1, x1, 9223372036854775807
and x0, x0, -9223372036854775808
orr x0, x1, x0
ret

which seems to be even worse than gcc-8 where we got a bfi instruction.

Ultimately, the best solution here will probably depend on which we
think is more likely, copysign or the example I give above.

It might be that for copysign we'll need to expand initially to some
unspec that uses a register initialized with a suitable immediate, but
otherwise hides the operation from combine until after that has run,
thus preventing the compiler from doing the otherwise right thing.  We'd
lose in the (hopefully) rare case where the operands really were in
general registers, but otherwise win for the more common case where they
aren't.

R.

> 
> Jeff
> 
> 
> P
> 
>   PR target/87369
>   * config/aarch64/aarch64.md (aarch64_bfxil): Do not accept
>   constant with just the high bit set.  That's better handled by
>   the "bit" pattern.
> 
> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 88f66104db3..ad6822410c2 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -5342,9 +5342,11 @@
>   (match_operand:GPI 3 "const_int_operand" "n, Ulc"))
>   (and:GPI (match_operand:GPI 2 "register_operand" "0,r")
>   (match_operand:GPI 4 "const_int_operand" "Ulc, n"]
> -  "(INTVAL (operands[3]) == ~INTVAL (operands[4]))
> -  && (aarch64_high_bits_all_ones_p (INTVAL (operands[3]))
> -|| aarch64_high_bits_all_ones_p (INTVAL (operands[4])))"
> +  "(INTVAL (operands[3]) == ~INTVAL (operands[4])
> +&& ((aarch64_high_bits_all_ones_p (INTVAL (operands[3]))
> +  && popcount_hwi (INTVAL (operands[3])) != 1)
> +|| (aarch64_high_bits_all_ones_p (INTVAL (operands[4]))
> + && popcount_hwi (INTVAL (operands[4])) != 1)))"
>{
>  switch (which_alternative)
>  {
> 



Re: [Patch 1/4][Aarch64] v2: Implement Aarch64 SIMD ABI

2018-12-07 Thread Richard Sandiford
Sorry for the slow review.

Steve Ellcey  writes:
> @@ -1470,6 +1479,45 @@ aarch64_hard_regno_mode_ok (unsigned regno, machine_mode mode)
>return false;
>  }
>  
> +/* Return true if this is a definition of a vectorized simd function.  */
> +
> +static bool
> +aarch64_simd_decl_p (tree fndecl)
> +{
> +  tree fntype;
> +
> +  if (fndecl == NULL)
> +return false;
> +  fntype = TREE_TYPE (fndecl);
> +  if (fntype == NULL)
> +return false;
> +
> +  /* All functions with the aarch64_vector_pcs attribute use the simd ABI.  */
> +  if (lookup_attribute ("aarch64_vector_pcs", TYPE_ATTRIBUTES (fntype)) != NULL)
> +return true;
> +  /* Functions without the aarch64_vector_pcs or simd attribute never use the
> + simd ABI.  */
> +  if (lookup_attribute ("simd", DECL_ATTRIBUTES (fndecl)) == NULL)
> +return false;
> +  /* Functions with the simd attribute can generate three versions of a
> + function, a masked vector function, an unmasked vector function,
> + and a scalar version.  Only the vector versions use the simd ABI.  */
> +  return (VECTOR_TYPE_P (TREE_TYPE (fntype)));

Is this enough?  E.g.:

void __attribute__ ((simd)) f (int *x) { *x = 1; }

generates SIMD clones but doesn't have a vector return type.

I'm not an expert on this stuff, but it looks like:

  struct cgraph_node *node = cgraph_node::get (fndecl);
  return node && node->simdclone;

might work.  But in some ways it would be cleaner to add the
aarch64_vector_pcs attribute for SIMD clones, e.g. via a new hook,
so that the function type is "correct".

It might be more efficient to save aarch64_simd_decl_p in cfun->machine.

> @@ -4863,6 +4949,7 @@ aarch64_process_components (sbitmap components, bool prologue_p)
>mergeable with the current one into a pair.  */
>if (!satisfies_constraint_Ump (mem)
> || GP_REGNUM_P (regno) != GP_REGNUM_P (regno2)
> +   || (aarch64_simd_decl_p (cfun->decl) && (FP_REGNUM_P (regno)))

Formatting nit: redundant brackets around FP_REGNUM_P (regno).

> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 82af4d4..44261ee 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -724,7 +724,13 @@
>""
>  )
>  
> -(define_insn "simple_return"
> +(define_expand "simple_return"
> +  [(simple_return)]
> +  "aarch64_use_simple_return_insn_p ()"
> +  ""
> +)
> +
> +(define_insn "*simple_return"
>[(simple_return)]
>""
>"ret"

Can't you just change the condition on the existing define_insn,
without turning it into a define_expand?  Worth a comment explaining
why if not.

> @@ -1487,6 +1538,23 @@
>[(set_attr "type" "neon_store1_2reg")]
>  )
>  
> +(define_insn "storewb_pair_"
> +  [(parallel
> +[(set (match_operand:P 0 "register_operand" "=&k")
> +  (plus:P (match_operand:P 1 "register_operand" "0")
> +  (match_operand:P 4 "aarch64_mem_pair_offset" "n")))
> + (set (mem:TX (plus:P (match_dup 0)
> +  (match_dup 4)))

Should be indented under the (match_dup 0).

> +  (match_operand:TX 2 "register_operand" "w"))
> + (set (mem:TX (plus:P (match_dup 0)
> +  (match_operand:P 5 "const_int_operand" "n")))
> +  (match_operand:TX 3 "register_operand" "w"))])]

Think this last part should be:

 (set (mem:TX (plus:P (plus:P (match_dup 0)
  (match_dup 4))
  (match_operand:P 5 "const_int_operand" "n")))
  (match_operand:TX 3 "register_operand" "w"))])]

> +  "TARGET_SIMD &&
> +   INTVAL (operands[5]) == INTVAL (operands[4]) + GET_MODE_SIZE (mode)"

&& should be on the second line (which makes the line long enough to
need breaking).

> diff --git a/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-2.c b/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-2.c
> index e69de29..bf6e64a 100644
> --- a/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-2.c
> +++ b/gcc/testsuite/gcc.target/aarch64/torture/simd-abi-2.c
> @@ -0,0 +1,33 @@
> +/* { dg-do compile } */
> +
> +void
> +f (void)
> +{
> +  /* Clobber all fp/simd regs and verify that the correct ones are saved
> + and restored in the prologue and epilogue of a SIMD function. */
> +  __asm__ __volatile__ ("" :::  "q0",  "q1",  "q2",  "q3");
> +  __asm__ __volatile__ ("" :::  "q4",  "q5",  "q6",  "q7");
> +  __asm__ __volatile__ ("" :::  "q8",  "q9", "q10", "q11");
> +  __asm__ __volatile__ ("" ::: "q12", "q13", "q14", "q15");
> +  __asm__ __volatile__ ("" ::: "q16", "q17", "q18", "q19");
> +  __asm__ __volatile__ ("" ::: "q20", "q21", "q22", "q23");
> +  __asm__ __volatile__ ("" ::: "q24", "q25", "q26", "q27");
> +  __asm__ __volatile__ ("" ::: "q28", "q29", "q30", "q31");
> +}
> +
> +/* { dg-final { scan-assembler {\sstp\td8, d9} } } */
> +/* { dg-final { scan-assembler {\sstp\td10, d11} } } */
> +/* { dg-final { scan-assembler {\sstp\td12, d13} } } */
> +/* { dg-final { scan-assembler {\sstp\td1

Re: [PATCH, rs6000] Fix PR87496: ICE in aggregate_value_p at gcc/function.c:2046

2018-12-07 Thread Peter Bergner
On 12/4/18 4:53 PM, Segher Boessenkool wrote:
> Okay, so you make -mabi={ibm,ieee}longdouble be valid options only if
> there is -mlong-double-128.  Could you add that to the documentation then
> please?

Done.


>> Is this ok for mainline once bootstrap and regtesting come back clean?
> 
> Okay with that documentation added and if it tests okay, yes.  Thanks!

...and committed to trunk.


>> Since I backported the earlier fix to GCC8, I'd like to backport this
>> there too.
> 
> Okay for there too.

Great, I'll backport the changes and commit after regression testing.
Thanks!

Peter



Re: RFC: libiberty PATCH to disable demangling of ancient mangling schemes

2018-12-07 Thread Tom Tromey
> "Pedro" == Pedro Alves  writes:

Pedro> I would say that it's very, very unlikely, and not worth it of the
Pedro> maintenance burden.

Agreed, and especially true for the more unusual demanglings like Lucid
or EDG.

On the gdb side perhaps we can get rid of "demangle-style" now.  It
probably hasn't worked properly in years, and after this it would be
guaranteed not to.

Tom


Re: [RFA] [target/87369] Prefer "bit" over "bfxil"

2018-12-07 Thread Jeff Law
On 12/7/18 10:31 AM, Richard Earnshaw (lists) wrote:
> On 07/12/2018 15:52, Jeff Law wrote:
>> As I suggested in the BZ, this patch rejects constants with  just the
>> high bit set for the recently added "bfxil" pattern.  As a result we'll
>> return to using "bit" for the test in the BZ.
>>
>> I'm not versed enough in aarch64 performance tuning to know if "bit" is
>> actually a better choice than "bfxil".  "bit" results in better code for
>> the testcase, but that seems more a function of register allocation than
>> "bit" being inherently better than "bfxil".   Obviously someone with
>> more aarch64 knowledge needs to make a decision here.
>>
>> My first iteration of the patch changed "aarch64_high_bits_all_ones_p".
>> We could still go that way too, though the name probably needs to change.
>>
>> I've bootstrapped and regression tested on aarch64-linux-gnu and it
>> fixes the regression.  I've also bootstrapped aarch64_be-linux-gnu, but
>> haven't done any kind of regression tested on that platform.
>>
>>
>> OK for the trunk?
> 
> The problem here is that the optimum solution depends on the register
> classes involved and we don't know this during combine.  If we have
> general register, then we want bfi/bfxil to be used; if we have a vector
> register, then bit is preferable as it changes 3 inter-bank register
> copies to a single inter-bank copy; and that copy might be hoisted out
> of a loop.
Ugh.  Things are never simple, are they?

> 
> Ultimately, the best solution here will probably depend on which we
> think is more likely, copysign or the example I give above.
I'd tend to suspect we'd see more pure integer bit twiddling than the
copysign stuff.


> 
> It might be that for copysign we'll need to expand initially to some
> unspec that uses a register initialized with a suitable immediate, but
> otherwise hides the operation from combine until after that has run,
> thus preventing the compiler from doing the otherwise right thing.  We'd
> lose in the (hopefully) rare case where the operands really were in
> general registers, but otherwise win for the more common case where they
> aren't.
Could we have the bfxil pattern have an alternative that accepts vector
regs and generates bit in appropriate circumstances?

Hmm, maybe the other way around would be better.   Have the "bit"
pattern with a general register alternative that generates bfxil when
presented with general registers.

I would generally warn against hiding things in unspecs like you've
suggested above.  We're seeing cases where that's been done in the x86
backend and it's inhibiting optimizations in various places.

Thoughts?
Jeff


Re: [RFA] [target/87369] Prefer "bit" over "bfxil"

2018-12-07 Thread Wilco Dijkstra
Hi,

>> Ultimately, the best solution here will probably depend on which we
>> think is more likely, copysign or the example I give above.
> I'd tend to suspect we'd see more pure integer bit twiddling than the
> copysign stuff.

All we need to do is to clearly separate the integer and FP/SIMD cases.
Copysign should always expand into a pattern that cannot generate
integer instructions. This could be done by adding a bit/bif pattern with
UNSPEC for the DI/SImode case or use V2DI/V2SI in the copysign
expansion.
 
> Could we have the bfxil pattern have an alternative that accepts vector
> regs and generates bit in appropriate circumstances?

We already do that in too many cases, and it only makes the problem
worse since the register allocator cannot cost these patterns at all (let
alone accurately). This is particularly bad when the expansions are
wildly different and emit extra instructions which cannot be optimized
after register allocation.

We simply need to make an early choice which register file to use.

> Hmm, maybe the other way around would be better.   Have the "bit"
> pattern with a general register alternative that generates bfxil when
> presented with general registers.

We already have that, and that's a very complex pattern which already
results in inefficient integer code.

For the overlapping cases between bfi and bfxil the mid-end should really
simplify one into the other to avoid having to have multiple MD patterns
for equivalent expressions. This may solve the problem.

> I would generally warn against hiding things in unspecs like you've
> suggested above.  We're seeing cases where that's been done in the x86
> backend and it's inhibiting optimizations in various places.

In the general case we can't describe a clear preference for a specific
register file without support for scalar vector types (eg. V1DI, V1SI) or
having a way to set virtual register preferences at expand time.

Wilco

Re: [RFA] [target/87369] Prefer "bit" over "bfxil"

2018-12-07 Thread Jeff Law
On 12/7/18 11:48 AM, Wilco Dijkstra wrote:
> Hi,
> 
>>> Ultimately, the best solution here will probably depend on which we
>>> think is more likely, copysign or the example I give above.
>> I'd tend to suspect we'd see more pure integer bit twiddling than the
>> copysign stuff.
> 
> All we need to do is to clearly separate the integer and FP/SIMD cases.
> Copysign should always expand into a pattern that cannot generate
> integer instructions. This could be done by adding a bit/bif pattern with
> UNSPEC for the DI/SImode case or use V2DI/V2SI in the copysign
> expansion.
As I've noted, adding those unspecs is likely to get in the way of
things like CSE, combine, etc.

>  
>> Could we have the bfxil pattern have an alternative that accepts vector
>> regs and generates bit in appropriate circumstances?
> 
> We already do that in too many cases, and it only makes the problem
> worse since the register allocator cannot cost these patterns at all (let
> alone accurately). This is particularly bad when the expansions are
> wildly different and emit extra instructions which cannot be optimized
> after register allocation.
I'm not sure what you mean by it can't cost them.  Costs certainly
factor into the algorithms used by IRA/LRA.  But I would agree that it's
not particularly good at costing across register banks and modeling the
cost of reloads it'll have to generate if it needs to move a value from
one bank to another.

> 
> We simply need to make an early choice which register file to use.
GCC fundamentally isn't designed to do that.

> 
>> Hmm, maybe the other way around would be better.   Have the "bit"
>> pattern with a general register alternative that generates bfxil when
>> presented with general registers.
> 
> We already have that, and that's a very complex pattern which already
> results in inefficient integer code.
> 
> For the overlapping cases between bfi and bfxil the mid-end should really
> simplify one into the other to avoid having to have multiple MD patterns
> for equivalent expressions. This may solve the problem.
Well, the bfxil pattern is general enough to handle both, the problem is
it only works on one register file.

> 
>> I would generally warn against hiding things in unspecs like you've
>> suggested above.  We're seeing cases where that's been done in the x86
>> backend and it's inhibiting optimizations in various places.
> 
> In the general case we can't describe a clear preference for a specific
> register file without support for scalar vector types (eg. V1DI, V1SI) or
> having a way to set virtual register preferences at expand time.
I'm going to step away from this problem.  It looked like it might be
trackable, but there's clearly a lot more to it and someone with more
experience on aarch64 will have to run with it.

Patch withdrawn.

jeff


Re: Pass GDCFLAGS and CCASFLAGS to libphobos subdirs

2018-12-07 Thread Iain Buclaw
On Fri, 7 Dec 2018 at 14:30, Rainer Orth  wrote:
>
> When trying to rebuild libphobos with -g3 -O0 for better debugging, I
> noticed that GDCFLAGS weren't passed down as expected.  It turned out
> that they are missing from AM_MAKEFLAGS.  After I fixed this, the only
> file still compiled with -g -O2 was libdruntime/core/threadasm.S, so I
> added CCASFLAGS, too.
>
> Tested on i386-pc-solaris2.11.  Ok for mainline?
>
> Rainer
>
> --
> -
> Rainer Orth, Center for Biotechnology, Bielefeld University
>
>
> 2018-12-06  Rainer Orth  
>
> * Makefile.am (AM_MAKEFLAGS): Pass CCASFLAGS, GDCFLAGS.
> * Makefile.in: Regenerate.
>

Thanks, looks good to me.

-- 
Iain


[PATCH 1/2] v3: C++: more location wrapper nodes (PR c++/43064, PR c++/43486)

2018-12-07 Thread David Malcolm
On Tue, 2018-12-04 at 18:31 -0500, Jason Merrill wrote:
> On 12/3/18 5:10 PM, Jeff Law wrote:
> > On 11/19/18 9:51 AM, David Malcolm wrote:
[...]
> > Well, you probably know my biggest worry with this -- the unwrapping
> > whack-a-mole problem.  It looks like additional memory usage is
> > measurable, but tiny.
> > 
> > Have you received any feedback from Jason on the C++ side?  That's
> > the bulk of this patch AFAICT.
> > 
> > I don't see anything objectionable in the c-family/ or generic bits.
> > But again I worry more about the need to sprinkle unwrapping all over
> > the place.
> 
> If we're adding wrappers, any place that wants to look at particular
> tree codes will need to deal with unwrapping.  The question then is
> whether to just remove location wrappers or do fold_for_warn, which
> will remove location wrappers and also try to provide a constant
> value.  In general, the former is better for things that deal with
> the forms of the code, and the latter for things that deal with values.

Thanks.

> > > @@ -1236,6 +1236,8 @@ unsafe_conversion_p (location_t loc, tree type, tree expr, tree result,
> > >   
> > >   loc = expansion_point_location_if_in_system_header (loc);
> > >   
> > > +  STRIP_ANY_LOCATION_WRAPPER (expr);
> > > +
> > > if (TREE_CODE (expr) == REAL_CST || TREE_CODE (expr) == INTEGER_CST)
> 
> Why do the former here, when this function is interested in constants?

I've changed it to use fold_for_warn (in the v2 patch).

> >  warn_array_subscript_with_type_char (location_t loc, tree index)
> >  {
> > -  if (TYPE_MAIN_VARIANT (TREE_TYPE (index)) == char_type_node
> > -  && TREE_CODE (index) != INTEGER_CST)
> > -warning_at (loc, OPT_Wchar_subscripts,
> > -   "array subscript has type %");
> > +  if (TYPE_MAIN_VARIANT (TREE_TYPE (index)) == char_type_node)
> > +{
> > +  STRIP_ANY_LOCATION_WRAPPER (index);
> > +  if (TREE_CODE (index) != INTEGER_CST)
> > +   /* If INDEX has a location, use it; otherwise use LOC (the location
> > +  of the subscripting expression as a whole).  */
> > +   warning_at (EXPR_LOC_OR_LOC (index, loc), OPT_Wchar_subscripts,
> > +   "array subscript has type %");
> 
> Here you strip a location wrapper and then check whether the
> expression has a location, which will give suboptimal results if the
> expression is a variable.

I've updated it in the v3 patch to extract any location from INDEX
first.

> > @@ -7375,6 +7375,12 @@ fixed_type_or_null (tree instance, int *nonnull, int *cdtorp)
> > }
> >return NULL_TREE;
> >  
> > +case VIEW_CONVERT_EXPR:
> > +  if (location_wrapper_p (instance))
> > +   return RECUR (TREE_OPERAND (instance, 0));
> > +  else
> > +   return NULL_TREE;
> 
> I think recursion is probably right for some non-location-wrapper
> uses of VIEW_CONVERT_EXPR, as well, but for now let's just add a
> comment to that effect.

Do you mean something like:

case VIEW_CONVERT_EXPR:
  if (location_wrapper_p (instance))
return RECUR (TREE_OPERAND (instance, 0));
  else
/* TODO: Recursion may be correct for some non-location-wrapper
   uses of VIEW_CONVERT_EXPR.  */
return NULL_TREE;

(I've done that in the v3 patch)

> > +   STRIP_ANY_LOCATION_WRAPPER (from);
> > +   if (TREE_CODE (from) == INTEGER_CST
> > +   && !integer_zerop (from))
> > + {
> > +   if (flags & tf_error)
> > + error_at (loc, "reinterpret_cast from integer to pointer");
> 
> This might be another place where folding is the right answer.

I tried updating this to use fold_for_warn, but it caused this change:

 PASS -> FAIL : g++.dg/cpp0x/constexpr-nullptr-2.C  -std=c++14  (test for errors, line 197)
 PASS -> FAIL : g++.dg/cpp0x/constexpr-nullptr-2.C  -std=c++14  (test for errors, line 198)
 PASS -> FAIL : g++.dg/cpp0x/constexpr-nullptr-2.C  -std=c++14 (test for excess errors)
 PASS -> FAIL : g++.dg/cpp0x/constexpr-nullptr-2.C  -std=c++17  (test for errors, line 197)
 PASS -> FAIL : g++.dg/cpp0x/constexpr-nullptr-2.C  -std=c++17  (test for errors, line 198)
 PASS -> FAIL : g++.dg/cpp0x/constexpr-nullptr-2.C  -std=c++17 (test for excess errors)

With trunk, and with stripping, we get:
constexpr-nullptr-2.C:197:32: error: dereferencing a null pointer in '*0'
  197 | constexpr int* pj0 = &((S*)0)->j; // { dg-error "null pointer|not a constant" }
      |                                ^
constexpr-nullptr-2.C:198:38: error: dereferencing a null pointer in '*0'
  198 | constexpr int* pj1 = &((S*)nullptr)->j; // { dg-error "null pointer|not a constant" }
      |                                      ^

With fold_for_warn we get:
constexpr-nullptr-2.C:197:22: error: reinterpret_cast from integer to pointer
  197 | constexpr int* pj0 = &((S*)0)->j; // { dg-error "null pointer|not a constant" }
      |                      ^~~
constexpr-nullptr-2.C:198:22:

Re: [RFC] support --with-multilib-list=@/path/name

2018-12-07 Thread Alexandre Oliva
On Nov 12, 2018, Alexandre Oliva  wrote:

> I'm having second thoughts on specifying external files with a
> full pathname, though, as it might lead to accidental GPL violations.

Here's a patch that takes the multilib list file from gcc/config/arm.

Any objections, further requests or advice, before I put it in and
proceed to extend it to other targets?


support --with-multilib-list=@name for ARM

Introduce @name as a means to specify alternate multilib profiles as
arguments to --with-multilib-list.

So far this is only implemented for ARM.

Tested on x86_64-linux-gnu-x-arm-eabi, comparing multilib.h before and
after the patch after configuring with preexisting profiles, and also 
@name with and without preexisting profiles.


for  gcc/ChangeLog

* config.gcc (tmake_file): Add name to tmake_file for
each @name in --with-multilib-list on arm-*-* targets.
* doc/install.texi (with-multilib-list): Document it.
---
 gcc/config.gcc   |   13 +
 gcc/doc/install.texi |   43 +--
 2 files changed, 46 insertions(+), 10 deletions(-)

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 71f083555a44..2f10b73f525b 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -4040,6 +4040,7 @@ case "${target}" in
 
# Add extra multilibs
if test "x$with_multilib_list" != x; then
+   ml=
 			arm_multilibs=`echo $with_multilib_list | sed -e 's/,/ /g'`
if test "x${arm_multilibs}" != xdefault ; then
for arm_multilib in ${arm_multilibs}; do
@@ -4047,6 +4048,15 @@ case "${target}" in
 				aprofile|rmprofile)
 					tmake_profile_file="arm/t-multilib"
 					;;
+				@*)
+					ml=`echo "X$arm_multilib" | sed '1s,^X@,,'`
+					if test -f "${srcdir}/config/arm/${ml}"; then
+						tmake_file="${tmake_file} arm/${ml}"
+					else
+						echo "Error: ${ml} does not exist in ${srcdir}/config/arm" >&2
+						exit 1
+					fi
+					;;
 				*)
 					echo "Error: --with-multilib-list=${with_multilib_list} not supported." 1>&2
 					exit 1
@@ -4067,6 +4077,9 @@ case "${target}" in
 			   || test "x$with_mode" != x ; then
 				echo "Error: You cannot use any of --with-arch/cpu/fpu/float/mode with --with-multilib-list=${with_multilib_list}" 1>&2
 				exit 1
+			elif test "x$ml" != x ; then
+				echo "Error: You cannot use builtin multilib profiles along with custom ones" 1>&2
+				exit 1
 			fi
 			# But pass the default value for float-abi
 			# through to the multilib selector
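The @-stripping in the hunk above can be exercised on its own; the sed idiom prefixes an X so the leading @ of the user's argument can be matched safely at the start of the line (the profile name below is illustrative):

```shell
# A user passes --with-multilib-list=@t-my-multilibs; config.gcc strips
# the leading @ to locate the fragment under gcc/config/arm/.
arm_multilib="@t-my-multilibs"
ml=`echo "X$arm_multilib" | sed '1s,^X@,,'`
echo "$ml"   # t-my-multilibs
```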
diff --git a/gcc/doc/install.texi b/gcc/doc/install.texi
index 19adb7ef8705..443003b0fde6 100644
--- a/gcc/doc/install.texi
+++ b/gcc/doc/install.texi
@@ -1078,18 +1078,41 @@ values and meaning for each target is given below.
 
 @table @code
 @item arm*-*-*
-@var{list} is a comma separated list of @code{aprofile} and @code{rmprofile}
-to build multilibs for A or R and M architecture profiles respectively.  Note
-that, due to some limitation of the current multilib framework, using the
-combined @code{aprofile,rmprofile} multilibs selects in some cases a less
-optimal multilib than when using the multilib profile for the architecture
-targetted.  The special value @code{default} is also accepted and is equivalent
-to omitting the option, ie. only the default run-time library will be enabled.
+@var{list} is a comma separated list of @code{aprofile} and
+@code{rmprofile} to build multilibs for A or R and M architecture
+profiles respectively.  Note that, due to some limitation of the current
+multilib framework, using the combined @code{aprofile,rmprofile}
+multilibs selects in some cases a less optimal multilib than when using
+the multilib profile for the architecture targetted.  The special value
+@code{default} is also accepted and is equivalent to omitting the
+option, i.e., only the default run-time library will be enabled.
+
+@var{list} may instead contain @code{@@name}, to use the multilib
+configuration Makefile fragment @file{name} in @file{gcc/config/arm} in
+the source tree (it is 

Re: [wwwdocs] Add D language to news, frontends, release notes, and readings.

2018-12-07 Thread Iain Buclaw
On Sun, 2 Dec 2018 at 18:23, Gerald Pfeifer  wrote:
>
> Hi Iain,
>
> On Mon, 12 Nov 2018, Iain Buclaw wrote:
> > As suggested, this adds an announcement of the D front end addition
> > to the news items on the GCC home page, and from what I can tell, the
> > relevant pages where the language should get a mention.
>
> I noticed this hasn't gone in; apologies for missing it.  (You are
> welcome, though should not have to, copy me on wwwdocs changes, which
> usually helps a bit, though also others can approve those.)
>
> > Kept the release notes brief for now, will expand later.  Is this OK?
>
> Yes, just a minor nit:
>

After pushing, it was noticed that there was a small misspelling that
went unnoticed.  I've committed the correction as I don't think
there'll be any objection to the change.


--- htdocs/index.html	6 Dec 2018 11:27:41 -0000	1.1113
+++ htdocs/index.html	7 Dec 2018 22:31:08 -0000
@@ -61,7 +61,7 @@
 D front end added
  [2018-10-29]
 The <a href="https://dlang.org">D programming language</a> front end
-   has beed added to GCC.
+   has been added to GCC.
This front end was contributed by Iain Buclaw.

 GCC 6.5 released


--
Iain


Re: [wwwdocs] Add D language to news, frontends, release notes, and readings.

2018-12-07 Thread Gerald Pfeifer
On Fri, 7 Dec 2018, Iain Buclaw wrote:
> After pushing, it was noticed that there was a small misspelling that
> went unnoticed.  I've committed the correction as I don't think
> there'll be any objection to the change.

Indeed.  Obvious changes and ones within your domain (so release
notes for D,...) are good to go any time - and in fact appreciated. :-)

Gerald


Re: [PATCH] Added information about inline assembler in stack calculations (.su files)

2018-12-07 Thread Segher Boessenkool
Hi Torbjorn,

Just some formatting nitpicking:

On Tue, Nov 27, 2018 at 07:45:59PM +, Torbjorn SVENSSON wrote:
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 38e27a50a1e..e61ddc3260b 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -6456,7 +6456,7 @@ Warn if the stack usage of a function might exceed @var{byte-size}.
>  The computation done to determine the stack usage is conservative.
>  Any space allocated via @code{alloca}, variable-length arrays, or related
>  constructs is included by the compiler when determining whether or not to
> -issue a warning.
> +issue a warning. @code{asm} statements are ignored when computing stack usage.

Two spaces after a full stop.  Please try to start sentences with a capital
letter.

> --- a/gcc/function.c
> +++ b/gcc/function.c
> @@ -1822,6 +1822,10 @@ instantiate_virtual_regs_in_insn (rtx_insn *insn)
>  
>if (asm_noperands (PATTERN (insn)) >= 0)
>  {
> +   if (flag_stack_usage_info)
> +  {
> +current_function_has_inline_assembler = 1;
> +  }

Single statements should not normally get a { block } wrapped around it.

> diff --git a/gcc/function.h b/gcc/function.h
> index 7e59050e8a6..8c3ef49e866 100644
> --- a/gcc/function.h
> +++ b/gcc/function.h
> @@ -208,11 +208,16 @@ struct GTY(()) stack_usage
>/* Nonzero if the amount of stack space allocated dynamically cannot
>   be bounded at compile-time.  */
>unsigned int has_unbounded_dynamic_stack_size : 1;
> +
> +  /* NonZero if body contains asm statement (ignored in stack calculations) */
> +  unsigned int has_inline_assembler: 1;

"Nonzero", no camelcase.  I think it should be "an asm statement", or "some
asm statemement", or similar.  The line should end with a full stop and two
spaces, before the */.

> +  /* Add info regarding inline assembler (not part of stack calculations) */

Similar here.


Segher