Re: [Patch match.pd] Fold (A / (1 << B)) to (A >> B)

2017-06-21 Thread Richard Biener
On Tue, 20 Jun 2017, James Greenhalgh wrote:

> 
> On Fri, Jun 16, 2017 at 11:41:57AM +0200, Richard Biener wrote:
> > On Fri, 16 Jun 2017, James Greenhalgh wrote:
> > > On Mon, Jun 12, 2017 at 03:56:25PM +0200, Richard Biener wrote:
> > > > +   We can't do the same for signed A, as it might be negative,
> > > > +   which would introduce undefined behaviour.  */
> > > >
> > > > huh, AFAIR it is _left_ shift of negative values that invokes
> > > > undefined behavior.
> > >
> > > You're right, this is not a clear comment.  The problem is not undefined
> > > behaviour, so that text needs to go, but rounding towards/away from zero
> > > for signed negative values.  Division will round towards zero, arithmetic
> > > right shift away from zero.  For example:
> > >
> > > -1 / (1 << 1) != -1 >> 1
> > > since -1 / 2 = 0, but -1 >> 1 = -1.
> > >
> > > I've rewritten the comment to make it clear this is why we can only make
> > > this optimisation for unsigned values.
> >
> > Ah, of course.  You could use
> >
> >  if ((TYPE_UNSIGNED (type)
> >   || tree_expr_nonnegative_p (@0))
> >
> > here as improvement.
> 
> Thanks, I've made that change.
> 
> > > See, for example, gcc.c-torture/execute/pr34070-2.c
> > >
> > > > Note that as you are accepting vectors you need to make sure the
> > > > target actually supports arithmetic right shift of vectors
> > > > (you only know it supports left shift and division -- so it might
> > > > be sort-of-superfluous to check in case there is no arch that supports
> > > > those but not the other).
> > >
> > > I've added a check for that using optabs, is that the right way to do 
> > > this?
> >
> > +  && (!VECTOR_TYPE_P (type)
> > +  || optab_for_tree_code (RSHIFT_EXPR, type, optab_vector)
> > +  || optab_for_tree_code (RSHIFT_EXPR, type, optab_scalar)))
> >
> > is not enough -- you need sth like
> >
> >  optab ot = optab_for_tree_code (RSHIFT_EXPR, type, optab_vector);
> >  if (ot != unknown_optab
> >  && optab_handler (ot, TYPE_MODE (type)) != CODE_FOR_nothing)
> >.. ok! ...
> >
> > ideally we'd have a helper for this in optab-tree.[ch],
> > tree-vect-patterns.c could also make use of that.
> 
> OK. I've added "target_has_vector_rshift_p" for this purpose.

Actually I was looking for something a bit more generic:

bool
target_supports_op_p (tree type, enum tree_code code,
		      enum optab_subtype ot_subtype = optab_default)
{
  optab ot = optab_for_tree_code (code, type, ot_subtype);
  return (ot != unknown_optab
	  && optab_handler (ot, TYPE_MODE (type)) != CODE_FOR_nothing);
}

and you'd use target_supports_op_p (type, RSHIFT_EXPR, optab_scalar)
|| target_supports_op_p (type, RSHIFT_EXPR, optab_vector)

Ok with that change.

Thanks,
Richard.

> Bootstrapped and tested on aarch64-none-linux-gnu with no issues.
> 
> OK?
> 
> Thanks,
> James
> 
> ---
> gcc/
> 
> 2017-06-19  James Greenhalgh  
> 
>   * match.pd (A / (1 << B) -> A >> B): New.
>   * generic-match-head.c: Include optabs-tree.h.
>   * gimple-match-head.c: Likewise.
>   * optabs-tree.h (target_has_vector_rshift_p): New.
>   * optabs-tree.c (target_has_vector_rshift_p): New.
> 
> gcc/testsuite/
> 
> 2017-06-19  James Greenhalgh  
> 
>   * gcc.dg/tree-ssa/forwprop-37.c: New.
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: SSA range class and removal of VR_ANTI_RANGEs

2017-06-21 Thread Aldy Hernandez

Hi folks.

The following is another iteration of the SSA range class, taking into 
account many of the suggestions posted on this thread, especially the 
addition of a memory-efficient class for storage, folding non-zero bits 
back into the range information, C++ suggestions by Martin, and various 
minor fixes.


Most importantly, I have included an irange_storage class that uses 
trailing_wide_ints<5>.  This produces far better results than my 
previous incarnation with wide_int[6] :).


The storage class is basically this:

class GTY((variable_size)) irange_storage
{
  friend class irange;
 public:
/* Maximum number of pairs of ranges allowed.  */
  static const unsigned int max_pairs = 2;
  /* These are the pair of subranges for the irange.  The last
 wide_int allocated is a mask representing which bits in an
 integer are known to be non-zero.  */
  trailing_wide_ints<5> trailing_bounds;
};

Compare this with mainline which has trailing_wide_ints<3>.  The extra 2 
items in this patchset chew up two 64-bit words, for an additional 16 
bytes per range in SSA_NAME_RANGE_INFO.  No additional storage is needed 
for SSA_NAMEs per se.


I looked at Jakub's suggestion of compiling insn-recog.c.  Although I 
don't see the 4M SSA_NAME nodes created that Jakub sees, I do see a little 
over a million when building with:


./cc1plus insn-recog.ii -fno-PIE -O2 -fno-exceptions -fno-rtti 
-fasynchronous-unwind-tables  -quiet -fsanitize=address,undefined 
-fmem-report


I explored 3 different ways of measuring memory consumption:

1. /usr/bin/time -f "%M", which measures maximum RSS usage.  This 
produced results within the noise.  The RSS usage differed slightly 
between runs, with no consistent difference between mainline and patch.


2. valgrind --tool=massif: no difference.  Perhaps the overhead of 
our GC hides any difference?


3. --enable-gather-detailed-mem-stats and -fmem-report ...

Total Allocated before: 2351658176
Total Allocated  after: 2353199328
diff: 1541152 (0.06%)

SSA_NAME nodes allocated: 1026694

AFAICT with -fmem-report, a 2.35gig compilation consumes 1.5 more megs? 
This is total usage, and some of this gets cleaned up during GC, so the 
total impact is probably less.  Unless there is another preferred way of 
measuring memory usage, I think memory is a non-issue with this approach.


Note, this is even before my pending patch avoiding generation of 
useless range information 
(https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01068.html).


How does this look?

Aldy
diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 8ace3c2..5e48d6e 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -1416,6 +1416,7 @@ OBJS = \
print-rtl-function.o \
print-tree.o \
profile.o \
+   range.o \
read-md.o \
read-rtl.o \
read-rtl-function.o \
@@ -2484,6 +2485,7 @@ GTFILES = $(CPP_ID_DATA_H) $(srcdir)/input.h 
$(srcdir)/coretypes.h \
   $(srcdir)/gimple.h \
   $(srcdir)/gimple-ssa.h \
   $(srcdir)/tree-chkp.c \
+  $(srcdir)/range.h $(srcdir)/range.c \
   $(srcdir)/tree-ssanames.c $(srcdir)/tree-eh.c $(srcdir)/tree-ssa-address.c \
   $(srcdir)/tree-cfg.c $(srcdir)/tree-ssa-loop-ivopts.c \
   $(srcdir)/tree-dfa.c \
diff --git a/gcc/builtins.c b/gcc/builtins.c
index 4f6c9c4..b5c9eb0 100644
--- a/gcc/builtins.c
+++ b/gcc/builtins.c
@@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "tm_p.h"
 #include "stringpool.h"
 #include "tree-vrp.h"
+#include "range.h"
 #include "tree-ssanames.h"
 #include "expmed.h"
 #include "optabs.h"
@@ -2894,6 +2895,52 @@ builtin_memcpy_read_str (void *data, HOST_WIDE_INT 
offset,
   return c_readstr (str + offset, mode);
 }
 
+/* If a range IR may have wrapped in such a way that we can guess the
+   range is positive, return TRUE and set PROBABLE_MAX_SIZE.
+   Otherwise, return FALSE and leave PROBABLE_MAX_SIZE unchanged.  */
+
+static bool
+range_may_have_wrapped (irange ir,
+   unsigned HOST_WIDE_INT *probable_max_size)
+{
+  /* Code like:
+
+   signed int n;
+   if (n < 100)
+ {
+   # RANGE [0, 99][0x80000000, 0xffffffff]
+  _1 = (unsigned) n;
+  memcpy (a, b, _1)
+ }
+
+ Produce a range allowing negative values of N.  We can still use
+ the information and make a guess that N is not negative.  */
+  if (ir.num_pairs () != 2
+  || ir.lower_bound () != 0)
+return false;
+
+  const_tree type = ir.get_type ();
+  unsigned precision = TYPE_PRECISION (type);
+  gcc_assert (TYPE_UNSIGNED (type));
+
+  /* Build a range with all the negatives: [0x80000000, 0xffffffff].  */
+  wide_int minus_one = wi::bit_not (wide_int::from (0, precision, UNSIGNED));
+  wide_int smallest_neg = wi::lshift (minus_one, precision / 2 - 1);
+  irange negatives (type, smallest_neg, minus_one);
+
+  irange orig_range = ir;
+  ir.intersect (negatives);
+  if (ir == negatives)
+{
+  wide_int max = orig_range.upp

Re: [PATCH 2/2] DWARF: make it possible to emit debug info for declarations only

2017-06-21 Thread Richard Biener
On Tue, Jun 20, 2017 at 4:34 PM, Pierre-Marie de Rodat
 wrote:
> On 06/20/2017 02:16 PM, Richard Biener wrote:
>>
>> Nice.  This looks ok.
>
>
> Great, thank you!
>
>> I'm mildly curious about the decrease of debuginfo size for cc1 -- did
>> you spot anything obvious there?
>
>
> Well, the benchmark I exposed was for the whole file size, not just
> .debug_info section size. Just to be sure, I compared object files for both
> trunk and my patched tree: outside of Ada units, I only get the following
> evolution:
>>
>> gcc/dwarf2out.o: -10168 bytes
>> gcc/godump.o: +272 bytes
>> gcc/passes.o: +880 bytes
>
>
> This diff comes from my changes themselves. I had a quick look at the same
> for cc1’s .debug_info: there is the expected evolution, too,
>
>> I suspect Fortran wants to do sth similar as Ada for imported modules.
>
>
> Maybe. I have zero Fortran knowledge, so I’ll let a Fortran expert decide,
> if that is fine for you. :-) In any case, the back-end is ready for that.

Sure, no obligation for you to enhance Fortran debug!

Thanks,
Richard.

> --
> Pierre-Marie de Rodat


Re: NOP conversions in X+CST+CST

2017-06-21 Thread Richard Biener
On Tue, Jun 20, 2017 at 11:00 PM, Marc Glisse  wrote:
> Hello,
>
> now that FRE was fixed to avoid infinite recursion, this patch passes
> bootstrap+testsuite on x86_64-pc-linux-gnu multilib with all languages
> (including ada).
>
> This isn't exactly the patch that was reverted, because the previous patch
> did not actually handle vectors properly.
>
> It still shouldn't interfere with the patch by Robin Dapp which IIUC only
> handles the case where the conversion is an extension.

Ok.

Richard.

> 2017-06-21  Marc Glisse  
>
> gcc/
> * match.pd (nop_convert): New predicate.
> ((A +- CST1) +- CST2): Allow some NOP conversions.
>
> gcc/testsuite/
> * gcc.dg/tree-ssa/addadd.c: Un-XFAIL.
> * gcc.dg/tree-ssa/addadd-2.c: New file.
>
> --
> Marc Glisse


Re: Add quality tracking for profile counter

2017-06-21 Thread Andreas Schwab
On Jun 19 2017, Jan Hubicka  wrote:

> this patch makes us track the quality of the profile.  This is useful
> to disable some aggressive optimizations when counts are known to be
> unreliable.

This breaks bootstrap on ia64 with a comparison failure in value-prof.o.

The only difference is in gimple_value_profile_transformations:

@@ -8962,14 +8962,14 @@ Disassembly of section .text:
 ad06:  10 00 a0 01 42 80   mov r1=r104
 ad0c:  0d e0 02 84 mov r108=r92;;
 ad10:  09 00 cc 1c 90 11   [MMI]   st4 [r14]=r51
-ad16:  e0 00 20 30 20 60   ld8 r14=[r8]
+ad16:  00 00 00 02 00 60   nop.m 0x0
 ad1c:  0d 08 01 84 mov r107=r33;;
-ad20:  0b 70 d8 1c 0c 20   [MMI]   and r14=r54,r14;;
-ad26:  e0 38 39 1c 40 00   or r14=r39,r14
+ad20:  0b 70 00 10 18 10   [MMI]   ld8 r14=[r8];;
+ad26:  e0 b0 39 18 40 00   and r14=r54,r14
 ad2c:  00 00 04 00 nop.i 0x0;;
-ad30:  01 00 00 00 01 00   [MII]   nop.m 0x0
-ad36:  e0 90 39 22 20 00   dep r14=r50,r14,62,2
-ad3c:  00 00 04 00 nop.i 0x0;;
+ad30:  0b 70 9c 1c 0e 20   [MMI]   or r14=r39,r14;;
+ad36:  00 00 00 02 00 c0   nop.m 0x0
+ad3c:  21 73 44 40 dep r14=r50,r14,62,2;;
 ad40:  09 00 38 10 98 11   [MMI]   st8 [r8]=r14
 ad46:  00 98 41 20 23 00   st4 [r16]=r51
 ad4c:  00 00 04 00 nop.i 0x0;;


Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: [i386] __builtin_ia32_stmxcsr could be pure

2017-06-21 Thread Uros Bizjak
Hello!

> glibc marks fegetround as a pure function.  On x86, people tend to use
> _MM_GET_ROUNDING_MODE instead, which could benefit from the same.  I
> think it is safe, but a second opinion would be welcome.  I could have
> handled just this builtin, but it seemed better to provide
> def_builtin_pure (like "const" already has) since there should be other
> builtins that can be marked this way (maybe the gathers?).
>
> Bootstrap+testsuite on x86_64-pc-linux-gnu with default languages.
>
> 2017-05-29  Marc Glisse  
>
> gcc/
> * config/i386/i386.c (struct builtin_isa): New field pure_p.
> Reorder for compactness.
> (def_builtin, def_builtin2, ix86_add_new_builtins): Handle pure_p.
> (def_builtin_pure, def_builtin_pure2): New functions.
> (ix86_init_mmx_sse_builtins) [__builtin_ia32_stmxcsr]: Mark as pure.
>
> gcc/testsuite/
> * gcc.target/i386/getround.c: New file.

OK.

Thanks,
Uros.


Re: RFC: stack/heap collision vulnerability and mitigation with GCC

2017-06-21 Thread Andreas Schwab
On Jun 20 2017, Eric Botcazou  wrote:

> Right, because the Linux kernel for x86/x86-64 is the only OS flavor that 
> doesn't let you probe the stack ahead of the stack pointer.  All other 
> combinations of OS and architecture we tried (and it's quite a lot) do.

Take a look at do_page_fault in arch/*/mm/fault.c; there are a lot of
architectures that place a limit on how far you can probe below USP.

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: [RFC PATCH] -fsanitize=pointer-overflow support (PR sanitizer/80998)

2017-06-21 Thread Jakub Jelinek
On Tue, Jun 20, 2017 at 10:18:20AM +0200, Richard Biener wrote:
> > It would be an attempt to avoid sanitizing
> >   int foo (int *p) { return p[10] + p[-5]; }
> > (when the offset is constant and small and we dereference it).
> > If there is no page mapped at NULL or at the highest page in the virtual
> > address space, then the above will crash in case p + 10 or p - 5 wraps
> > around.
> 
> Ah, so merely an optimization to avoid excessive instrumentation then,
> yes, this might work (maybe make 4096 a --param configurable to be able
> to disable it).

Yes.  And I think it can be implemented incrementally.

> > > > I've bootstrapped/regtested the patch on x86_64-linux and i686-linux
> > > > and additionally bootstrapped/regtested with bootstrap-ubsan on both 
> > > > too.
> > > > The latter revealed a couple of issues I'd like to discuss:
> > > > 
> > > > 1) libcpp/symtab.c contains a couple of spots reduced into:
> > > > #define DELETED ((char *) -1)
> > > > void bar (char *);
> > > > void
> > > > foo (char *p)
> > > > {
> > > >   if (p && p != DELETED)
> > > > bar (p);
> > > > }
> > > > where we fold it early into if ((p p+ -1) <= (char *) -3)
> > > > and as the instrumentation is done during ubsan pass, if p is NULL,
> > > > we diagnose this as invalid pointer overflow from NULL to 0x*f.
> > > > Shall we change the folder so that during GENERIC folding it
> > > > actually does the addition and comparison in pointer_sized_int
> > > > instead (my preference), or shall I move the UBSAN_PTR instrumentation
> > > > earlier into the FEs (but then I still risk stuff is folded earlier)?
> > > 
> > > Aww, so we turn the pointer test into a range test ;)  That it uses
> > > a pointer type rather than an unsigned integer type is a bug, probably
> > > caused by pointers being TYPE_UNSIGNED.
> > > 
> > > Not sure if the folding itself is worthwhile to keep though, thus an
> > > option would be to not generate range tests from pointers?
> > 
> > I'll have a look.  Maybe only do it during reassoc and not earlier.
> 
> It certainly looks somewhat premature in fold-const.c.

So for this, I have right now 2 variant patches:

The first one keeps doing what we were except for the
-fsanitize=pointer-overflow case and has been bootstrap-ubsan
bootstrapped/regtested on x86_64-linux and i686-linux.

The second one performs the addition and comparison in pointer sized
unsigned type instead (not bootstrapped yet).

I think the second one would be my preference.  Note build_range_check
is used not just during early folding, but e.g. during ifcombine, reassoc
etc.

Martin is contemplating instrumentation of pointer <=/=/> comparisons
and in that case we'd need some further build_range_check changes,
because while ptr == (void *) 0 || ptr == (void *) 1 || ptr == (void *) 2
would be without UB, ptr <= (void *) 2 would be UB, so we'd need to perform
all pointer range checks in integral type except the ones where we just do
EQ_EXPR/NE_EXPR.

Jakub
2017-06-21  Jakub Jelinek  

PR sanitizer/80998
* fold-const.c: Include asan.h.
(build_range_check): For -fsanitize=pointer-overflow don't
add pointer arithmetics for range test.

--- gcc/fold-const.c.jj 2017-06-14 18:07:47.0 +0200
+++ gcc/fold-const.c2017-06-20 17:05:44.351608513 +0200
@@ -79,6 +79,7 @@ along with GCC; see the file COPYING3.
 #include "tree-vrp.h"
 #include "tree-ssanames.h"
 #include "selftest.h"
+#include "asan.h"
 
 /* Nonzero if we are folding constants inside an initializer; zero
otherwise.  */
@@ -4906,6 +4907,14 @@ build_range_check (location_t loc, tree
 {
   if (value != 0 && !TREE_OVERFLOW (value))
{
+ /* Avoid creating pointer arithmetics that is not present
+in the source when sanitizing.  */
+ if (!integer_zerop (low)
+ && current_function_decl
+ && sanitize_flags_p (SANITIZE_POINTER_OVERFLOW,
+  current_function_decl))
+   return 0;
+
  low = fold_build1_loc (loc, NEGATE_EXPR, TREE_TYPE (low), low);
   return build_range_check (loc, type,
fold_build_pointer_plus_loc (loc, exp, low),
2017-06-21  Jakub Jelinek  

* fold-const.c (build_range_check): Compute pointer range check in
integral type if pointer arithmetics would be needed.  Formatting
fixes.

--- gcc/fold-const.c.jj 2017-06-20 21:38:04.0 +0200
+++ gcc/fold-const.c2017-06-21 09:23:00.572404964 +0200
@@ -4818,21 +4818,21 @@ build_range_check (location_t loc, tree
 
   if (low == 0)
 return fold_build2_loc (loc, LE_EXPR, type, exp,
-   fold_convert_loc (loc, etype, high));
+   fold_convert_loc (loc, etype, high));
 
   if (high == 0)
 return fold_build2_loc (loc, GE_EXPR, type, exp,
-   fold_convert_loc (loc, etype, low));
+   fold_convert_loc (loc, etype, 

Re: [PING] C++ Re: [PATCH] C/C++: fix quoting of "aka" typedef information (PR 62170)

2017-06-21 Thread Volker Reichelt
On 20 Jun, Jason Merrill wrote:
> On Tue, Jun 20, 2017 at 3:06 PM, David Malcolm  wrote:
>> It's not clear to me what the issue alluded to with negative
>> obstack_blank is, but I chose to follow the above docs and use
>> obstack_blank_fast; am testing an updated patch in which the above line
>> now looks like:
>>
>>   obstack_blank_fast (ob, -(type_start + type_len));
>>
>> Is the patch OK with that change? (assuming bootstrap & regression
>> testing pass), or should I re-post?
> 
> OK with that change.
> 
>> On a related matter, this patch conflicts with Volker's patch here:
>>
>>   https://gcc.gnu.org/ml/gcc-patches/2017-04/msg01576.html
>>
>> in which he removes the trailing "{enum}" info (and hence all of our
>> changes to the testsuite conflict between the two patches...)
>>
>> Do you have any thoughts on that other patch? [Ccing Volker]
> 
> That patch makes sense to me; I prefer "enum E" to "E {enum}".
> 
> Jason

Is 'makes sense' equivalent to 'OK for trunk' here? If so, should my
patch go in before David's or should we do it the other way round?

Regards,
Volker



Re: [RFC PATCH] -fsanitize=pointer-overflow support (PR sanitizer/80998)

2017-06-21 Thread Jakub Jelinek
On Tue, Jun 20, 2017 at 09:41:43AM +0200, Richard Biener wrote:
> > 2) libcpp/line-map.c has this:
> > static int
> > location_adhoc_data_update (void **slot, void *data)
> > {
> >   *((char **) slot) += *((int64_t *) data);
> >   return 1;
> > }
> > where the (why int64_t always?, we really need just intptr_t) adjusts
> > one pointer from an unrelated one (result of realloc).  That is a UB
> > and actually can trigger this sanitization if the two regions are
> > far away from each other, e.g. on i686-linux:
> > ../../libcpp/line-map.c:102:21: runtime error: pointer index expression 
> > with base 0x0899e308 overflowed to 0xf74c4ab8
> > ../../libcpp/line-map.c:102:21: runtime error: pointer index expression 
> > with base 0x08add7c0 overflowed to 0xf74c9a08
> > ../../libcpp/line-map.c:102:21: runtime error: pointer index expression 
> > with base 0x092ba308 overflowed to 0xf741cab8
> > ../../libcpp/line-map.c:102:21: runtime error: pointer index expression 
> > with base 0x0a3757c0 overflowed to 0xf7453a08
> > Shall we perform the addition in uintptr_t instead to make it
> > implementation defined rather than UB?
> 
> Yes.

Here is a patch for 2), bootstrap-ubsan bootstrapped/regtested on
x86_64-linux and i686-linux, ok for trunk?

Note both ptrdiff_t and uintptr_t are already used in libcpp, so I think
it shouldn't create new portability issues.

2017-06-21  Jakub Jelinek  

* line-map.c (location_adhoc_data_update): Perform addition in
uintptr_t type rather than char * type.  Read *data using
ptrdiff_t type instead of int64_t.
(get_combined_adhoc_loc): Change offset type to ptrdiff_t from
int64_t.

--- libcpp/line-map.c.jj2017-06-19 08:28:18.0 +0200
+++ libcpp/line-map.c   2017-06-20 16:43:25.193063344 +0200
@@ -99,7 +99,8 @@ location_adhoc_data_eq (const void *l1,
 static int
 location_adhoc_data_update (void **slot, void *data)
 {
-  *((char **) slot) += *((int64_t *) data);
+  *((char **) slot)
+= (char *) ((uintptr_t) *((char **) slot) + *((ptrdiff_t *) data));
   return 1;
 }
 
@@ -221,7 +222,7 @@ get_combined_adhoc_loc (struct line_maps
  set->location_adhoc_data_map.allocated)
{
  char *orig_data = (char *) set->location_adhoc_data_map.data;
- int64_t offset;
+ ptrdiff_t offset;
  /* Cast away extern "C" from the type of xrealloc.  */
  line_map_realloc reallocator = (set->reallocator
  ? set->reallocator


Jakub


Re: [RFC PATCH] -fsanitize=pointer-overflow support (PR sanitizer/80998)

2017-06-21 Thread Richard Biener
On Wed, 21 Jun 2017, Jakub Jelinek wrote:

> On Tue, Jun 20, 2017 at 09:41:43AM +0200, Richard Biener wrote:
> > > 2) libcpp/line-map.c has this:
> > > static int
> > > location_adhoc_data_update (void **slot, void *data)
> > > {
> > >   *((char **) slot) += *((int64_t *) data);
> > >   return 1;
> > > }
> > > where the (why int64_t always?, we really need just intptr_t) adjusts
> > > one pointer from an unrelated one (result of realloc).  That is a UB
> > > and actually can trigger this sanitization if the two regions are
> > > far away from each other, e.g. on i686-linux:
> > > ../../libcpp/line-map.c:102:21: runtime error: pointer index expression 
> > > with base 0x0899e308 overflowed to 0xf74c4ab8
> > > ../../libcpp/line-map.c:102:21: runtime error: pointer index expression 
> > > with base 0x08add7c0 overflowed to 0xf74c9a08
> > > ../../libcpp/line-map.c:102:21: runtime error: pointer index expression 
> > > with base 0x092ba308 overflowed to 0xf741cab8
> > > ../../libcpp/line-map.c:102:21: runtime error: pointer index expression 
> > > with base 0x0a3757c0 overflowed to 0xf7453a08
> > > Shall we perform the addition in uintptr_t instead to make it
> > > implementation defined rather than UB?
> > 
> > Yes.
> 
> Here is a patch for 2), bootstrap-ubsan bootstrapped/regtested on
> x86_64-linux and i686-linux, ok for trunk?

Ok.

Richard.

> Note both ptrdiff_t and uintptr_t are already used in libcpp, so I think
> it shouldn't create new portability issues.
> 
> 2017-06-21  Jakub Jelinek  
> 
>   * line-map.c (location_adhoc_data_update): Perform addition in
>   uintptr_t type rather than char * type.  Read *data using
>   ptrdiff_t type instead of int64_t.
>   (get_combined_adhoc_loc): Change offset type to ptrdiff_t from
>   int64_t.
> 
> --- libcpp/line-map.c.jj  2017-06-19 08:28:18.0 +0200
> +++ libcpp/line-map.c 2017-06-20 16:43:25.193063344 +0200
> @@ -99,7 +99,8 @@ location_adhoc_data_eq (const void *l1,
>  static int
>  location_adhoc_data_update (void **slot, void *data)
>  {
> -  *((char **) slot) += *((int64_t *) data);
> +  *((char **) slot)
> += (char *) ((uintptr_t) *((char **) slot) + *((ptrdiff_t *) data));
>return 1;
>  }
>  
> @@ -221,7 +222,7 @@ get_combined_adhoc_loc (struct line_maps
> set->location_adhoc_data_map.allocated)
>   {
> char *orig_data = (char *) set->location_adhoc_data_map.data;
> -   int64_t offset;
> +   ptrdiff_t offset;
> /* Cast away extern "C" from the type of xrealloc.  */
> line_map_realloc reallocator = (set->reallocator
> ? set->reallocator
> 
> 
>   Jakub
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [RFC PATCH] -fsanitize=pointer-overflow support (PR sanitizer/80998)

2017-06-21 Thread Richard Biener
On Wed, 21 Jun 2017, Jakub Jelinek wrote:

> On Tue, Jun 20, 2017 at 10:18:20AM +0200, Richard Biener wrote:
> > > It would be an attempt to avoid sanitizing
> > >   int foo (int *p) { return p[10] + p[-5]; }
> > > (when the offset is constant and small and we dereference it).
> > > If there is no page mapped at NULL or at the highest page in the virtual
> > > address space, then the above will crash in case p + 10 or p - 5 wraps
> > > around.
> > 
> > Ah, so merely an optimization to avoid excessive instrumentation then,
> > yes, this might work (maybe make 4096 a --param configurable to be able
> > to disable it).
> 
> Yes.  And I think it can be implemented incrementally.
> 
> > > > > I've bootstrapped/regtested the patch on x86_64-linux and i686-linux
> > > > > and additionally bootstrapped/regtested with bootstrap-ubsan on both 
> > > > > too.
> > > > > The latter revealed a couple of issues I'd like to discuss:
> > > > > 
> > > > > 1) libcpp/symtab.c contains a couple of spots reduced into:
> > > > > #define DELETED ((char *) -1)
> > > > > void bar (char *);
> > > > > void
> > > > > foo (char *p)
> > > > > {
> > > > >   if (p && p != DELETED)
> > > > > bar (p);
> > > > > }
> > > > > where we fold it early into if ((p p+ -1) <= (char *) -3)
> > > > > and as the instrumentation is done during ubsan pass, if p is NULL,
> > > > > we diagnose this as invalid pointer overflow from NULL to 0x*f.
> > > > > Shall we change the folder so that during GENERIC folding it
> > > > > actually does the addition and comparison in pointer_sized_int
> > > > > instead (my preference), or shall I move the UBSAN_PTR instrumentation
> > > > > earlier into the FEs (but then I still risk stuff is folded earlier)?
> > > > 
> > > > Aww, so we turn the pointer test into a range test ;)  That it uses
> > > > a pointer type rather than an unsigned integer type is a bug, probably
> > > > caused by pointers being TYPE_UNSIGNED.
> > > > 
> > > > Not sure if the folding itself is worthwhile to keep though, thus an
> > > > option would be to not generate range tests from pointers?
> > > 
> > > I'll have a look.  Maybe only do it during reassoc and not earlier.
> > 
> > It certainly looks somewhat premature in fold-const.c.
> 
> So for this, I have right now 2 variant patches:
> 
> The first one keeps doing what we were except for the
> -fsanitize=pointer-overflow case and has been bootstrap-ubsan
> bootstrapped/regtested on x86_64-linux and i686-linux.
> 
> The second one performs the addition and comparison in pointer sized
> unsigned type instead (not bootstrapped yet).
> 
> I think the second one would be my preference.  Note build_range_check
> is used not just during early folding, but e.g. during ifcombine, reassoc
> etc.
> 
> Martin is contemplating instrumentation of pointer <=/=/> comparisons
> and in that case we'd need some further build_range_check changes,
> because while ptr == (void *) 0 || ptr == (void *) 1 || ptr == (void *) 2
> would be without UB, ptr <= (void *) 2 would be UB, so we'd need to perform
> all pointer range checks in integral type except the ones where we just do
> EQ_EXPR/NE_EXPR.

Yes, exactly.

The 2nd patch is ok if it passes bootstrap/test.

Richard.


Backports to 6.x

2017-06-21 Thread Martin Liška

As release managers are planning to release the next version of GCC 6,
I would like to backport the attached revisions.

The only complicated one is the one for PR69953, where I decided to also
backport the refactoring patches applied by Nathan (r244529, r244156).

I would appreciate another pair of eyes looking at the backports.

Thanks,
Martin
>From 3d06a155b652468dae32382aae8abc9d6da10b77 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Wed, 31 May 2017 11:40:13 +
Subject: [PATCH 15/15] Backport r248729

gcc/ChangeLog:

2017-05-31  Martin Liska  

	PR target/79155
	* config/i386/cpuid.h: Fix typo in a comment in cpuid.h.
---
 gcc/config/i386/cpuid.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/config/i386/cpuid.h b/gcc/config/i386/cpuid.h
index d67eeae75ce..89e260c62eb 100644
--- a/gcc/config/i386/cpuid.h
+++ b/gcc/config/i386/cpuid.h
@@ -172,7 +172,7 @@
 
 
 /* Return highest supported input value for cpuid instruction.  ext can
-   be either 0x0 or 0x800 to return highest supported value for
+   be either 0x0 or 0x8000 to return highest supported value for
basic or extended cpuid information.  Function returns 0 if cpuid
is not supported or whatever cpuid returns in eax register.  If sig
pointer is non-null, then first four bytes of the signature
-- 
2.13.1

>From b921b54246135959a6fe9d4f6534299b3cc152fc Mon Sep 17 00:00:00 2001
From: marxin 
Date: Tue, 30 May 2017 08:02:03 +
Subject: [PATCH 14/15] Backport r248647

gcc/ChangeLog:

2017-05-30  Martin Liska  

	PR other/80909
	* auto-profile.c (get_function_decl_from_block): Fix
	parenthesis.
---
 gcc/auto-profile.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/auto-profile.c b/gcc/auto-profile.c
index 94afe6fd2d9..2bf5e07ab25 100644
--- a/gcc/auto-profile.c
+++ b/gcc/auto-profile.c
@@ -353,7 +353,7 @@ get_function_decl_from_block (tree block)
 {
   tree decl;
 
-  if (LOCATION_LOCUS (BLOCK_SOURCE_LOCATION (block) == UNKNOWN_LOCATION))
+  if (LOCATION_LOCUS (BLOCK_SOURCE_LOCATION (block)) == UNKNOWN_LOCATION)
 return NULL_TREE;
 
   for (decl = BLOCK_ABSTRACT_ORIGIN (block);
-- 
2.13.1

>From eec428d554a565ae1d73e6c5824474a5751bc7ce Mon Sep 17 00:00:00 2001
From: marxin 
Date: Fri, 26 May 2017 11:05:52 +
Subject: [PATCH 13/15] Backport r248489

gcc/ChangeLog:

2017-05-26  Martin Liska  

	PR ipa/80663
	* params.def: Bound partial-inlining-entry-probability param.

gcc/testsuite/ChangeLog:

2017-05-26  Martin Liska  

	PR ipa/80663
	* g++.dg/ipa/pr80212.C: Remove the test as it does not longer
	split at the problematic spot.
	* gcc.dg/ipa/pr48195.c: Change 101 to 100 as 101 is no longer
	a valid value of the param.
---
 gcc/params.def |  2 +-
 gcc/testsuite/g++.dg/ipa/pr80212.C | 18 --
 gcc/testsuite/gcc.dg/ipa/pr48195.c |  2 +-
 3 files changed, 2 insertions(+), 20 deletions(-)
 delete mode 100644 gcc/testsuite/g++.dg/ipa/pr80212.C

diff --git a/gcc/params.def b/gcc/params.def
index 76308cdfcdb..ce83aa71e6d 100644
--- a/gcc/params.def
+++ b/gcc/params.def
@@ -126,7 +126,7 @@ DEFPARAM (PARAM_COMDAT_SHARING_PROBABILITY,
 DEFPARAM (PARAM_PARTIAL_INLINING_ENTRY_PROBABILITY,
 	  "partial-inlining-entry-probability",
 	  "Maximum probability of the entry BB of split region (in percent relative to entry BB of the function) to make partial inlining happen.",
-	  70, 0, 0)
+	  70, 0, 100)
 
 /* Limit the number of expansions created by the variable expansion
optimization to avoid register pressure.  */
diff --git a/gcc/testsuite/g++.dg/ipa/pr80212.C b/gcc/testsuite/g++.dg/ipa/pr80212.C
deleted file mode 100644
index 60d3b613035..000
--- a/gcc/testsuite/g++.dg/ipa/pr80212.C
+++ /dev/null
@@ -1,18 +0,0 @@
-// PR ipa/80212
-// { dg-options "-O2 --param partial-inlining-entry-probability=403796683 -fno-early-inlining" }
-
-struct b
-{
-  virtual b *c () const;
-};
-struct d : virtual b
-{
-};
-struct e : d
-{
-  e *
-  c () const
-  {
-  }
-};
-main () { e a; }
diff --git a/gcc/testsuite/gcc.dg/ipa/pr48195.c b/gcc/testsuite/gcc.dg/ipa/pr48195.c
index 2e38452d598..25e80bab8f8 100644
--- a/gcc/testsuite/gcc.dg/ipa/pr48195.c
+++ b/gcc/testsuite/gcc.dg/ipa/pr48195.c
@@ -1,5 +1,5 @@
 /* { dg-do link } */
-/* { dg-options "-O2 -flto --param partial-inlining-entry-probability=101" } */
+/* { dg-options "-O2 -flto --param partial-inlining-entry-probability=100" } */
 /* { dg-require-effective-target lto } */
 
 extern void abort(void);
-- 
2.13.1

>From 59383db2594cfaf380ce6e91a14b4b11f977f497 Mon Sep 17 00:00:00 2001
From: marxin 
Date: Tue, 16 May 2017 08:57:05 +
Subject: [PATCH 12/15] Backport r248089

gcc/ChangeLog:

2017-05-16  Martin Liska  

	PR ipa/79849.
	PR ipa/79850.
	* ipa-devirt.c (warn_types_mismatch): Fix typo.
	(odr_types_equivalent_p): Likewise.
---
 gcc/ipa-devirt.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/ipa-devirt.c b/gcc/ipa-devirt.c
index 0332b3ec616..9853c4a499c 100644
--- a/gcc/ipa-devirt.c
++

Re: [PATCH 2/3] Make early return predictor more precise.

2017-06-21 Thread Jan Hubicka
> 
> Ok, so I fixed that in the described way. There's one remaining fallout of:
> gcc/testsuite/gcc.dg/tree-ssa/ipa-split-5.c
> 
> Where a fnsplit is properly done, but then it's again inlined:
> 
> Considering split_me.part.0/5 with 23 size
>  to be inlined into test/2 in unknown:0
>  Estimated badness is -0.01, frequency 0.33.
>  Inlined split_me.part.0 into test which now has time 50.30 and size 44, 
> net change of +17.
> 
> Considering split_me.part.0/5 with 23 size
>  to be inlined into test/2 in unknown:0
>  Estimated badness is -0.01, frequency 0.33.
>  Inlined split_me.part.0 into test which now has time 70.76 and size 61, 
> net change of +17.
> 
> Considering split_me.part.0/5 with 23 size
>  to be inlined into test/2 in unknown:0
>  Estimated badness is -0.01, frequency 0.33.
>  Inlined split_me.part.0 into test which now has time 91.22 and size 78, 
> net change of +17.
> 
> Considering split_me.part.0/5 with 23 size
>  to be inlined into test/2 in unknown:0
>  Estimated badness is -0.01, frequency 0.33.
>  Inlined split_me.part.0 into test which now has time 111.68 and size 95, 
> net change of +17.
> Unit growth for small function inlining: 61->129 (111%)
> 
> ...
> 
> Any hint how to block the IPA inlining?

I guess you only want to make the cold part of split_me bigger, or
just use --param to reduce growth for auto inlining.

How does the patch reduce split_me_part considerably?
Honza
> 
> Sending new version of patch.
> Martin
> 
> > 
> > I would just move pass_strip_predict_hints pre-IPA and not worry about
> > them chaining.
> > 
> > There is a problem that after inlining the prediction may expand its scope
> > and predict a branch that is outside of the original function body,
> > but I do not see a very easy solution for that besides not re-doing
> > prediction (we could also copy probabilities from the inlined function
> > when they exist and honor them in the outer function.  I am not sure
> > that is going to improve prediction quality though - extra context
> > is probably useful)
> > 
> > Thanks,
> > Honza
> >>
> >> Thanks,
> >> Martin
> >>
> >>>
> >>> Where did you found this case?
> >>> Honza
>   
> /* Create a new deep copy of the statement.  */
> copy = gimple_copy (stmt);
>  -- 
>  2.13.0
> 
> 

> >From 84625a782add6ae2ed29630815b61b34a052770a Mon Sep 17 00:00:00 2001
> From: marxin 
> Date: Tue, 6 Jun 2017 10:55:18 +0200
> Subject: [PATCH 1/2] Make early return predictor more precise.
> 
> gcc/ChangeLog:
> 
> 2017-05-26  Martin Liska  
> 
>   PR tree-optimization/79489
>   * gimplify.c (maybe_add_early_return_predict_stmt): New
>   function.
>   (gimplify_return_expr): Call the function.
>   * predict.c (tree_estimate_probability_bb): Remove handling
>   of early return.
>   * predict.def: Update comment about early return predictor.
>   * gimple-predict.h (is_gimple_predict): New function.
>   * predict.def: Change default value of early return to 66.
>   * tree-tailcall.c (find_tail_calls): Skip GIMPLE_PREDICT
>   statements.
>   * passes.def: Put pass_strip_predict_hints to the beginning of
>   IPA passes.
> ---
>  gcc/gimple-low.c |  2 ++
>  gcc/gimple-predict.h |  8 
>  gcc/gimplify.c   | 16 
>  gcc/passes.def   |  1 +
>  gcc/predict.c| 41 -
>  gcc/predict.def  | 15 +++
>  gcc/tree-tailcall.c  |  2 ++
>  7 files changed, 32 insertions(+), 53 deletions(-)
> 
> diff --git a/gcc/gimple-low.c b/gcc/gimple-low.c
> index 619b9d7bfb1..4ea6c3532f3 100644
> --- a/gcc/gimple-low.c
> +++ b/gcc/gimple-low.c
> @@ -30,6 +30,8 @@ along with GCC; see the file COPYING3.  If not see
>  #include "calls.h"
>  #include "gimple-iterator.h"
>  #include "gimple-low.h"
> +#include "predict.h"
> +#include "gimple-predict.h"
>  
>  /* The differences between High GIMPLE and Low GIMPLE are the
> following:
> diff --git a/gcc/gimple-predict.h b/gcc/gimple-predict.h
> index ba58e12e9e9..0e6c2e1ea01 100644
> --- a/gcc/gimple-predict.h
> +++ b/gcc/gimple-predict.h
> @@ -80,4 +80,12 @@ gimple_build_predict (enum br_predictor predictor, enum 
> prediction outcome)
>return p;
>  }
>  
> +/* Return true if GS is a GIMPLE_PREDICT statement.  */
> +
> +static inline bool
> +is_gimple_predict (const gimple *gs)
> +{
> +  return gimple_code (gs) == GIMPLE_PREDICT;
> +}
> +
>  #endif  /* GCC_GIMPLE_PREDICT_H */
> diff --git a/gcc/gimplify.c b/gcc/gimplify.c
> index 9af95a28704..1c6e1591953 100644
> --- a/gcc/gimplify.c
> +++ b/gcc/gimplify.c
> @@ -1428,6 +1428,20 @@ gimplify_bind_expr (tree *expr_p, gimple_seq *pre_p)
>return GS_ALL_DONE;
>  }
>  
> +/* Maybe add early return predict statement to PRE_P sequence.  */
> +
> +static void
> +maybe_add_early_return_predict_stmt (gimple_seq *pre_p)
> +{
> +  /* If we are not in a conditional context, add PREDICT statement.  */
> +  if (gimple_con

RE: [PATCH] [Aarch64] Variable shift count truncation issues

2017-06-21 Thread Michael Collison
Updated the patch per Richard's suggestions to allow scheduling of instructions 
before reload.

Bootstrapped and tested on aarch64-linux-gnu. Okay for trunk?

2017-05-22  Kyrylo Tkachov  
Michael Collison 

PR target/70119
* config/aarch64/aarch64.md (*aarch64__reg_3_mask1):
New pattern.
(*aarch64_reg_3_neg_mask2): New pattern.
(*aarch64_reg_3_minus_mask): New pattern.
(*aarch64__reg_di3_mask2): New pattern.
* config/aarch64/aarch64.c (aarch64_rtx_costs): Account for cost
of shift when the shift amount is masked with constant equal to
the size of the mode.
* config/aarch64/predicates.md (subreg_lowpart_operator): New
predicate.


2016-05-22  Kyrylo Tkachov  
Michael Collison 

PR target/70119
* gcc.target/aarch64/var_shift_mask_1.c: New test.

-Original Message-
From: Richard Sandiford [mailto:richard.sandif...@linaro.org] 
Sent: Thursday, June 15, 2017 6:40 AM
To: Michael Collison 
Cc: Wilco Dijkstra ; Christophe Lyon 
; GCC Patches ; nd 

Subject: Re: [PATCH] [Aarch64] Variable shift count truncation issues

Michael Collison  writes:
> +(define_insn_and_split "*aarch64_reg_3_neg_mask2"
> +  [(set (match_operand:GPI 0 "register_operand" "=r")
> + (SHIFT:GPI
> +   (match_operand:GPI 1 "register_operand" "r")
> +   (match_operator 4 "subreg_lowpart_operator"
> +   [(neg:SI (and:SI (match_operand:SI 2 "register_operand" "r")
> +(match_operand 3 "const_int_operand" "n")))])))
> +   (clobber (match_scratch:SI 5 "=&r"))]
> +  "((~INTVAL (operands[3]) & (GET_MODE_BITSIZE (mode)-1)) == 0)"
> +  "#"
> +  "&& reload_completed"
> +  [(const_int 0)]
> +  {
> +emit_insn (gen_negsi2 (operands[5], operands[2]));
> +
> +rtx and_op = gen_rtx_AND (SImode, operands[5], operands[3]);
> +rtx subreg_tmp = gen_rtx_SUBREG (GET_MODE (operands[4]), and_op,
> +  SUBREG_BYTE (operands[4]));
> +emit_insn (gen_3 (operands[0], operands[1], subreg_tmp));
> +DONE;
> +  }
> +)

Thanks, I agree this looks correct from the split/reload_completed POV.
I think we can go one better though, either:

(a) Still allow the split when !reload_completed, and use:

 if (GET_MODE (operands[5]) == SCRATCH)
   operands[5] = gen_reg_rtx (SImode);

This will allow the individual instructions to be scheduled by sched1.

(b) Continue to restrict the split to reload_completed, change operand 0
to =&r so that it can be used as a temporary, and drop operand 5 entirely.

Or perhaps do both:

(define_insn_and_split "*aarch64_reg_3_neg_mask2"
  [(set (match_operand:GPI 0 "register_operand" "=&r")
(SHIFT:GPI
  (match_operand:GPI 1 "register_operand" "r")
  (match_operator 4 "subreg_lowpart_operator"
  [(neg:SI (and:SI (match_operand:SI 2 "register_operand" "r")
   (match_operand 3 "const_int_operand" "n")))])))]
  "((~INTVAL (operands[3]) & (GET_MODE_BITSIZE (mode)-1)) == 0)"
  "#"
  "&& 1"
  [(const_int 0)]
  {
rtx tmp = (can_create_pseudo_p () ? gen_reg_rtx (mode)
   : operands[0]);
emit_insn (gen_negsi2 (tmp, operands[2]));

rtx and_op = gen_rtx_AND (SImode, tmp, operands[3]);
rtx subreg_tmp = gen_rtx_SUBREG (GET_MODE (operands[4]), and_op,
 SUBREG_BYTE (operands[4]));
emit_insn (gen_3 (operands[0], operands[1], subreg_tmp));
DONE;
  }
)

Sorry for the run-around.  I should have realised earlier that these patterns 
didn't really need a distinct register after RA.

Thanks,
Richard
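For readers following along: the idiom these patterns target is a variable shift whose count is masked with the register width minus one. AArch64's variable shift instructions already truncate the count modulo the register width, so the AND (and, in the negated form, the explicit NEG/AND pair) is redundant. A hypothetical C sketch of the source shapes involved (not the actual var_shift_mask_1.c testcase; function names are illustrative):

```c
/* Masked variable shift: the "& 63" is the mode-size-minus-one mask
   the patterns check for, so the AND can be dropped.  */
unsigned long
rshift_masked (unsigned long x, int n)
{
  return x >> (n & 63);
}

/* Rotate-right written with a negated, masked count: this is the kind
   of shape the neg_mask2 variant discussed above is after.  */
unsigned long
rotr (unsigned long x, int n)
{
  return (x >> (n & 63)) | (x << (-n & 63));
}
```

At -O2 for AArch64, the first function should compile to a single `lsr` with no `and`, which is what these combine patterns achieve.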


pr5546v5.patch


Re: RFC: stack/heap collision vulnerability and mitigation with GCC

2017-06-21 Thread Richard Earnshaw (lists)
On 21/06/17 00:22, Wilco Dijkstra wrote:
> Jeff Law wrote:
>> But the stack pointer might have already been advanced into the guard
>> page by the caller.   For the sake of argument assume the guard page is
>> 0xf1000 and assume that our stack pointer at entry is 0xf1010 and that
>> the caller hasn't touched the 0xf1000 page.
>>
>> If FrameSize >= 32, then the stores are going to hit the 0xf0000 page
>> rather than the 0xf1000 page.   That's jumping the guard.  Thus we have
>> to emit a probe prior to this stack allocation.
> 
> That's an incorrect ABI that allows adjusting the frame by 4080+32! A
> correct
> one might allow say 1024 bytes for outgoing arguments. That means when
> you call a function, there is still guard-page-size - 1024 bytes left
> that you can
> use to allocate locals. With a 4K guard page that allows leaf functions
> up to 3KB,
> and depending on the frame locals of 2-3KB plus up to 1024 bytes of outgoing
> arguments without inserting any probes beyond the normal frame stores.
> 
> This design means almost no functions need additional probes. Assuming we're
> also increasing the guard page size to 64KB, it's cheap even for large
> functions.
> 
> Wilco

A mere 256 bytes for the caller would permit 32 x 8byte arguments on the
stack which, with at least 8 parameters passed in registers, would allow
for calls with 40 parameters.  There can't be many in that space.  Any
function making calls with more than that might need additional probes,
but that's going to be exceedingly rare.

Put the cost on the least common sequences, even if they pay
disproportionately - it will be a win over all.

R.


Re: RFC: stack/heap collision vulnerability and mitigation with GCC

2017-06-21 Thread Richard Earnshaw (lists)
On 20/06/17 22:39, Jeff Law wrote:
> On 06/20/2017 03:27 AM, Richard Earnshaw (lists) wrote:
>> On 19/06/17 18:07, Jeff Law wrote:
>>> As some of you are likely aware, Qualys has just published fairly
>>> detailed information on using stack/heap clashes as an attack vector.
>>> Eric B, Michael M -- sorry I couldn't say more when I contact you about
>>> -fstack-check and some PPC specific stuff.  This has been under embargo
>>> for the last month.
>>>
>>>
>>> --
>>>
>>>
>>> http://www.openwall.com/lists/oss-security/2017/06/19/1
>>>
>> [...]
>>> aarch64 is significantly worse.  There are no implicit probes we can
>>> exploit.  Furthermore, the prologue may allocate stack space 3-4 times.
>>> So we have the track the distance to the most recent probe and when that
>>> distance grows too large, we have to emit a probe.  Of course we have to
>>> make worst case assumptions at function entry.
>>>
>>
>> I'm not sure I understand what you're saying here.  According to the
>> comment above aarch64_expand_prologue, the stack frame looks like:
>>
>> +---+
>> |   |
>> |  incoming stack arguments |
>> |   |
>> +---+
>> |   | <-- incoming stack pointer (aligned)
>> |  callee-allocated save area   |
>> |  for register varargs |
>> |   |
>> +---+
>> |  local variables  | <-- frame_pointer_rtx
>> |   |
>> +---+
>> |  padding0 | \
>> +---+  |
>> |  callee-saved registers   |  | frame.saved_regs_size
>> +---+  |
>> |  LR'  |  |
>> +---+  |
>> |  FP'  | / <- hard_frame_pointer_rtx (aligned)
>> +---+
>> |  dynamic allocation   |
>> +---+
>> |  padding  |
>> +---+
>> |  outgoing stack arguments | <-- arg_pointer
>> |   |
>> +---+
>> |   | <-- stack_pointer_rtx (aligned)
>>
>> Now for the majority of frames the amount of local variables is small
>> and there is neither dynamic allocation nor the need for outgoing local
>> variables.  In this case the first instruction in the function is
>>
>>  stp fp, lr, [sp, #-FrameSize
> But the stack pointer might have already been advanced into the guard
> page by the caller.   For the sake of argument assume the guard page is
> 0xf1000 and assume that our stack pointer at entry is 0xf1010 and that
> the caller hasn't touched the 0xf1000 page.

Then make sure the caller does touch the 0xf1000 page.  If it's
allocated that much stack it should be forced to do the probe and not
rely on all its children having to do it because it can't be bothered.


> 
> If FrameSize >= 32, then the stores are going to hit the 0xf0000 page
> rather than the 0xf1000 page.   That's jumping the guard.  Thus we have
> to emit a probe prior to this stack allocation.
> 
> Now because this instruction stores at *new_sp, it does allow us to
> eliminate future probes and I do take advantage of that in my code.
> 
> The implementation is actually rather simple.  We keep a conservative
> estimate of the offset of the last known probe relative to the stack
> pointer.  At entry we have to assume the offset is:
> 
> PROBE_INTERVAL - (STACK_BOUNDARY / BITS_PER_UNIT)
> 
> 
> A stack allocation increases the offset.  A store into the stack
> decreases the offset.
> A probe is required before an allocation that increases the offset to >=
> PROBE_INTERVAL.
> 
> An allocation + store instruction such as shown does both, but can (and
> is) easily modeled.  The only tricky case here is that you can't naively
> break it up into an allocation and store as that can force an
> unnecessary probe (say if the allocated space is just enough to hold the
> stored objects).
> 
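The offset-tracking scheme in the quoted passage can be modeled in a few lines of C. This is a hypothetical sketch, not GCC code: PROBE_INTERVAL (4 KiB) and the 16-byte stack boundary are illustrative values, and the sketch assumes each individual allocation is smaller than PROBE_INTERVAL.

```c
#define PROBE_INTERVAL 4096      /* assumed guard-page size */
#define STACK_BOUNDARY_BYTES 16  /* STACK_BOUNDARY / BITS_PER_UNIT */

/* Conservative distance from the current stack pointer up to the most
   recently probed address.  */
static long last_probe_offset;

/* Worst-case assumption at function entry: the caller may have pushed
   the stack pointer this far beyond its own last probe.  */
static void
function_entry (void)
{
  last_probe_offset = PROBE_INTERVAL - STACK_BOUNDARY_BYTES;
}

/* An allocation of SIZE bytes grows the distance; if it would reach
   PROBE_INTERVAL, a probe must be emitted first.  Returns nonzero
   when that probe is required.  */
static int
allocate (long size)
{
  int need_probe = last_probe_offset + size >= PROBE_INTERVAL;
  if (need_probe)
    last_probe_offset = 0;  /* probe emitted at the current sp */
  last_probe_offset += size;
  return need_probe;
}

/* A store at [sp, #OFF] acts as an implicit probe at that offset, so
   it can only shrink the tracked distance.  */
static void
store_at (long off)
{
  if (off < last_probe_offset)
    last_probe_offset = off;
}
```

With these numbers, entry gives a worst-case distance of 4080 bytes, so even a 32-byte `stp fp, lr, [sp, #-32]!` allocation trips the probe requirement, while the store half of that instruction resets the distance and can eliminate later probes.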

It's better to pay the (relatively small) price when doing large
allocations than to repeatedly pay the price on every small allocation.

> 
> 
> 
>>
>>
>> If the locals area gets slightly larger (>= 512 bytes) then the sequence
>> becomes
>>  sub sp, sp, #FrameSize
>>  stp fp, lr, [sp]
>>
>> But again this acts as a sufficient implicit probe provided that
>> FrameSize does not exceed the probe interval.
> And again, the store acts as a probe which can eliminate potential
> probes that might occur later in the instruction stream.  But if the
> allocation by the "sub" instruction causes our running offset to cross
> PROBE_BOUNDARY, then we must emit a probe prior to the "sub" instruction.
> 
> Hopefully it'll be clearer when I post the code :-)  aarch64 is one that
> will need updating as all work to-date has been with

[patch][x86] Remove old rounding code

2017-06-21 Thread Koval, Julia
Hi,
This patch removes the old parallel-based code for avx512er. The parallel form
can no longer be generated, because all existing patterns were reworked to use
unspec in r249423 and r249009. Ok for trunk?

gcc/
* gcc/config/i386/i386.c (ix86_erase_embedded_rounding):
Remove code for old rounding pattern.

Thanks,
Julia


0001-remove-code-for-old-rounding.patch


Re: RFC: stack/heap collision vulnerability and mitigation with GCC

2017-06-21 Thread Andreas Schwab
On Jun 21 2017, "Richard Earnshaw (lists)"  wrote:

> A mere 256 bytes for the caller would permit 32 x 8byte arguments on the
> stack which, with at least 8 parameters passed in registers, would allow
> for calls with 40 parameters.  There can't be many in that space.  Any
> function making calls with more than that might need additional probes,
> but that's going to be exceedingly rare.

With passing structures by value you can have arbitrary large
parameters.

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: RFC: stack/heap collision vulnerability and mitigation with GCC

2017-06-21 Thread Richard Earnshaw (lists)
On 21/06/17 09:44, Andreas Schwab wrote:
> On Jun 21 2017, "Richard Earnshaw (lists)"  wrote:
> 
>> A mere 256 bytes for the caller would permit 32 x 8byte arguments on the
>> stack which, with at least 8 parameters passed in registers, would allow
>> for calls with 40 parameters.  There can't be many in that space.  Any
>> function making calls with more than that might need additional probes,
>> but that's going to be exceedingly rare.
> 
> With passing structures by value you can have arbitrary large
> parameters.
> 
> Andreas.
> 


No.  Those are passed by copies which appear in the locals portion of
the frame (so are covered by normal allocation principles).  Only
structures of less than 16 bytes are passed by direct copy.

R.


Re: RFC: stack/heap collision vulnerability and mitigation with GCC

2017-06-21 Thread Richard Earnshaw (lists)
On 21/06/17 09:46, Richard Earnshaw (lists) wrote:
> On 21/06/17 09:44, Andreas Schwab wrote:
>> On Jun 21 2017, "Richard Earnshaw (lists)"  wrote:
>>
>>> A mere 256 bytes for the caller would permit 32 x 8byte arguments on the
>>> stack which, with at least 8 parameters passed in registers, would allow
>>> for calls with 40 parameters.  There can't be many in that space.  Any
>>> function making calls with more than that might need additional probes,
>>> but that's going to be exceedingly rare.
>>
>> With passing structures by value you can have arbitrary large
>> parameters.
>>
>> Andreas.
>>
> 
> 
> No.  Those are passed by copies which appear in the locals portion of
* pointers to copies *
R.

> the frame (so are covered by normal allocation principles).  Only
> structures of less than 16 bytes are passed by direct copy.
> 
> R.
> 



Re: [PATCH] PR libstdc++/81092 add std::wstring symbols and bump library version

2017-06-21 Thread Jonathan Wakely

On 14/06/17 19:13 +0100, Jonathan Wakely wrote:

There are two symbols defined in GCC 7.1's libstdc++.6.0.23 library
which are not exported on all targets (because I wrote "m" in the
linker script instead of "[jmy]").

This patch bumps the library version on gcc-7-branch to 6.0.24 and
exports the "[jy]" versions of the symbols with the new
GLIBCXX_3.4.24 symbol version.

This requires bumping the version on trunk to 6.0.25 and moving the
new random_device::_M_get_entropy() symbol to GLIBCXX_3.4.25 (which
will be done by the patch in the following mail).

Target maintainers will need to regenerate the baseline symbols on
gcc-7-branch and trunk.

I intend to commit this tomorrow to gcc-7-branch.


I forgot to commit the regenerated configure scripts, which I've now
done (r249438 for trunk and r249439 for the branch).
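For context on the "m" vs "[jmy]" remark: in the Itanium C++ ABI, size_t mangles as 'j' (unsigned int), 'm' (unsigned long), or 'y' (unsigned long long) depending on the target, so an export pattern spelled with a literal 'm' only matches targets where size_t is unsigned long. A hypothetical version-script fragment of the shape involved (the real patterns live in libstdc++'s config/abi/pre/gnu.ver; the symbol shown is illustrative, not one of the PR81092 symbols):

```
GLIBCXX_3.4.24 {
    /* [jmy] matches whichever of unsigned int / unsigned long /
       unsigned long long size_t mangles to on a given target.  */
    _ZNSbIwSt11char_traitsIwESaIwEE7reserveE[jmy];
};
```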




Re: RFC: stack/heap collision vulnerability and mitigation with GCC

2017-06-21 Thread Wilco Dijkstra
Richard Earnshaw wrote:
> A mere 256 bytes for the caller would permit 32 x 8byte arguments on the
> stack which, with at least 8 parameters passed in registers, would allow
> for calls with 40 parameters.  There can't be many in that space.  Any
> function making calls with more than that might need additional probes,
> but that's going to be exceedingly rare.
> 
> Put the cost on the least common sequences, even if they pay
> disproportionately - it will be a win over all.

Functions with large outgoing arguments are extremely rare indeed, it tails off
really fast after 64 bytes. The only large cases I've seen are from Fortran 
code -
and those cases seem buggy (40KBytes of outgoing args means 5000 double
args which is unlikely).

Wilco


Re: [Patch AArch64] Add initial tuning support for Cortex-A55 and Cortex-A75

2017-06-21 Thread Richard Earnshaw (lists)
On 20/06/17 16:41, James Greenhalgh wrote:
> 
> Hi,
> 
> This patch adds support for the ARM Cortex-A75 and
> Cortex-A55 processors through the -mcpu/-mtune values cortex-a55 and
> cortex-a75, and an ARM DynamIQ big.LITTLE configuration of these two
> processors through the -mcpu/-mtune value cortex-a75.cortex-a55
> 
> The ARM Cortex-A75 is ARM's latest and highest performance applications
> processor. For the initial tuning provided in this patch, I have chosen to
> share the tuning structure with its predecessor, the Cortex-A73.
> 
> The ARM Cortex-A55 delivers the best combination of power efficiency
> and performance in its class. For the initial tuning provided in this patch,
> I have chosen to share the tuning structure with its predecessor, the
> Cortex-A53.
> 
> Both Cortex-A55 and Cortex-A75 support ARMv8-A with the ARM8.1-A and
> ARMv8.2-A extensions, along with the cryptography extension, and
> the RCPC extensions from ARMv8.3-A. This is reflected in the patch,
> -mcpu=cortex-a75 is treated as equivalent to passing -mtune=cortex-a75
> -march=armv8.2-a+rcpc .
> 
> Tested on aarch64-none-elf with no issues.
> 
> OK for trunk?
> 
> Thanks,
> James
> 
> ---
> 2017-06-20  James Greenhalgh  
> 
>   * config/aarch64/aarch64-cores.def (cortex-a55): New.
>   (cortex-a75): Likewise.
>   (cortex-a75.cortex-a55): Likewise.
>   * config/aarch64/aarch64-tune.md: Regenerate.
>   * doc/invoke.texi (-mtune): Document new values for -mtune.
> 
> 

Mostly ok, but...

> 0001-Patch-AArch64-Add-initial-tuning-support-for-Cortex-.patch
> 
> 
> diff --git a/gcc/config/aarch64/aarch64-cores.def 
> b/gcc/config/aarch64/aarch64-cores.def
> index e333d5f..0baa20c 100644
> --- a/gcc/config/aarch64/aarch64-cores.def
> +++ b/gcc/config/aarch64/aarch64-cores.def
> @@ -80,6 +80,12 @@ AARCH64_CORE("vulcan",  vulcan, thunderx2t99, 8_1A,  
> AARCH64_FL_FOR_ARCH8_1 | AA
>  /* Cavium ('C') cores. */
>  AARCH64_CORE("thunderx2t99",  thunderx2t99,  thunderx2t99, 8_1A,  
> AARCH64_FL_FOR_ARCH8_1 | AARCH64_FL_CRYPTO, thunderx2t99, 0x43, 0x0af, -1)
>  
> +/* ARMv8.2-A Architecture Processors.  */
> +
> +/* ARM ('A') cores. */
> +AARCH64_CORE("cortex-a55",  cortexa55, cortexa53, 8_2A,  
> AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_RCPC, cortexa53, 0x41, 0xd05, -1)
> +AARCH64_CORE("cortex-a75",  cortexa75, cortexa57, 8_2A,  
> AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_RCPC, cortexa73, 0x41, 0xd0a, -1)
> +
>  /* ARMv8-A big.LITTLE implementations.  */
>  
>  AARCH64_CORE("cortex-a57.cortex-a53",  cortexa57cortexa53, cortexa53, 8A,  
> AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, 0x41, AARCH64_BIG_LITTLE 
> (0xd07, 0xd03), -1)
> @@ -87,4 +93,8 @@ AARCH64_CORE("cortex-a72.cortex-a53",  cortexa72cortexa53, 
> cortexa53, 8A,  AARCH
>  AARCH64_CORE("cortex-a73.cortex-a35",  cortexa73cortexa35, cortexa53, 8A,  
> AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa73, 0x41, AARCH64_BIG_LITTLE 
> (0xd09, 0xd04), -1)
>  AARCH64_CORE("cortex-a73.cortex-a53",  cortexa73cortexa53, cortexa53, 8A,  
> AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa73, 0x41, AARCH64_BIG_LITTLE 
> (0xd09, 0xd03), -1)
>  
> +/* ARM DynamIQ big.LITTLE configurations.  */
> +
> +AARCH64_CORE("cortex-a75.cortex-a55",  cortexa75cortexa55, cortexa53, 8_2A,  
> AARCH64_FL_FOR_ARCH8_2 | AARCH64_FL_RCPC, cortexa73, 0x41, AARCH64_BIG_LITTLE 
> (0xd0a, 0xd05), -1)
> +
>  #undef AARCH64_CORE
> diff --git a/gcc/config/aarch64/aarch64-tune.md 
> b/gcc/config/aarch64/aarch64-tune.md
> index 4209f67..7fcd6cb 100644
> --- a/gcc/config/aarch64/aarch64-tune.md
> +++ b/gcc/config/aarch64/aarch64-tune.md
> @@ -1,5 +1,5 @@
>  ;; -*- buffer-read-only: t -*-
>  ;; Generated automatically by gentune.sh from aarch64-cores.def
>  (define_attr "tune"
> - 
> "cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,thunderxt81,thunderxt83,xgene1,falkor,qdf24xx,exynosm1,thunderx2t99p1,vulcan,thunderx2t99,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53"
> + 
> "cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,thunderx,thunderxt88p1,thunderxt88,thunderxt81,thunderxt83,xgene1,falkor,qdf24xx,exynosm1,thunderx2t99p1,vulcan,thunderx2t99,cortexa55,cortexa75,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53,cortexa75cortexa55"
>   (const (symbol_ref "((enum attr_tune) aarch64_tune)")))
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 86c8d62..2746c3e 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -14077,17 +14077,19 @@ processors implementing the target architecture.
>  @opindex mtune
>  Specify the name of the target processor for which GCC should tune the
>  performance of the code.  Permissible values for this option are:
> -@samp{generic}, @samp{cortex-a35}, @samp{cortex-a53}, @samp{cortex-a57},
> -@samp{cortex-a72}, @samp{cortex-a73}, @samp{exynos-m1},
> -@samp{xgene1}, @samp{vulcan}, @samp{thunderx},
> +@samp{generic}, @samp{cortex-a35}, @samp{cortex-a53}, @samp{cortex

Re: [committed] Fix bootstrap on armv6-*-freebsd

2017-06-21 Thread Richard Earnshaw (lists)
On 20/06/17 22:29, Andreas Tobler wrote:
> Hi All,
> 
> I committed the chunk below to fix bootstrap on armv6*-*-freebsd.
> 
> Andreas
> 
> 2017-06-20  Andreas Tobler  
> 
> * config.gcc (armv6*-*-freebsd*): Change the target_cpu_cname to
> arm1176jzf-s.
> 
> Index: config.gcc
> ===
> --- config.gcc(revision 249427)
> +++ config.gcc(working copy)
> @@ -1089,7 +1089,7 @@
>  tm_file="${tm_file} arm/bpabi.h arm/freebsd.h arm/aout.h arm/arm.h"
>  case $target in
>  armv6*-*-freebsd*)
> -target_cpu_cname="arm1176jzfs"
> +target_cpu_cname="arm1176jzf-s"
>  tm_defines="${tm_defines} TARGET_FREEBSD_ARMv6=1"
>  if test $fbsd_major -ge 11; then
> tm_defines="${tm_defines} TARGET_FREEBSD_ARM_HARD_FLOAT=1"

Ooops, sorry about that.  At one point these were internal identifiers
and I'm afraid I must have missed this particular port when I converted
the code to using the user-visible product names.

R.


Re: [PATCH][AArch64] Mark symbols as constant

2017-06-21 Thread Richard Earnshaw (lists)
On 20/06/17 15:56, Wilco Dijkstra wrote:
> Richard Earnshaw wrote:
> 
>> What testing has this had with -fpic?  I'm not convinced that this
>> assertion is true in that case?
> 
> I ran the GLIBC tests which pass. -fpic works since it does also form a
> constant address, ie. instead of:
> 
> adrp  x1, global
> add x1, x1, :lo12:global
> 
> we do:
> 
> adrp  x1, :got:global
> ldr x1, [x1, :got_lo12:global]
> 
> CSEing or rematerializing either sequence works in the same way.
> 
> With TLS the resulting addresses are also constant, however this could
> cause rather complex TLS sequences to be rematerialized.  It seems
> best to block that.  Updated patch below:
> 
> 
> aarch64_legitimate_constant_p currently returns false for symbols,
> even though they are always valid constants.  This means LOSYM isn't
> CSEd correctly.  If we return true CSE works better, resulting in
> smaller/faster code (0.3% smaller on SPEC2006).  Avoid this
> for TLS symbols since their sequence is complex.
> 
> int x0 = 1, x1 = 2, x2 = 3;
> 
> int 
> f (int x, int y)
> {
>   x += x1;
>   if (x > 100)
> y += x2;
>   x += x0;
>   return x + y;
> }
> 
> Before:
>   adrp    x3, .LANCHOR0
>   add x4, x3, :lo12:.LANCHOR0
>   ldr w2, [x3, #:lo12:.LANCHOR0]
>   add w0, w0, w2
>   cmp w0, 100
>   ble .L5
>   ldr w2, [x4, 8]
>   add w1, w1, w2
> .L5:
>   add x3, x3, :lo12:.LANCHOR0
>   ldr w2, [x3, 4]
>   add w0, w0, w2
>   add w0, w0, w1
>   ret
> 
> After:
>   adrp    x2, .LANCHOR0
>   add x3, x2, :lo12:.LANCHOR0
>   ldr w2, [x2, #:lo12:.LANCHOR0]
>   add w0, w0, w2
>   cmp w0, 100
>   ble .L5
>   ldr w2, [x3, 8]
>   add w1, w1, w2
> .L5:
>   ldr w2, [x3, 4]
>   add w0, w0, w2
>   add w0, w0, w1
>   ret
> 
> Bootstrap OK, OK for commit?
> 
> ChangeLog:
> 2017-06-20  Wilco Dijkstra  
> 
>   * config/aarch64/aarch64.c (aarch64_legitimate_constant_p):
>   Return true for non-tls symbols.
> --

OK.

R.

> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 
> 5ec6bbfcf484baa4005b8a88cb98d0d04f710877..060cd8476d2954119daac495ecb059c9be73edbe
>  100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -10111,6 +10111,11 @@ aarch64_legitimate_constant_p (machine_mode mode, 
> rtx x)
>&& aarch64_valid_symref (XEXP (x, 0), GET_MODE (XEXP (x, 0
>  return true;
>  
> +  /* Treat symbols as constants.  Avoid TLS symbols as they are complex,
> + so spilling them is better than rematerialization.  */
> +  if (SYMBOL_REF_P (x) && !SYMBOL_REF_TLS_MODEL (x))
> +return true;
> +
>return aarch64_constant_address_p (x);
>  }
>  
> 



Re: [gcn][patch] Add -mgpu option and plumb in assembler/linker

2017-06-21 Thread Andrew Stubbs

On 06/06/17 20:52, Andrew Stubbs wrote:
> Thomas objects to the new option, and after talking with him the
> reasoning seems sound. GCC has been moving away from -mcpu in any case,
> so I guess I'll put -march and -mtune back, and use those for the same
> purpose.
>
> I'll commit the patch with those changes soonish.


I've finally got around to pushing the patch.

It now uses -march to set the GPU name.

Andrew

Switch to HSACO2

2017-06-21  Andrew Stubbs  

	gcc/
	* config.gcc (amdgcn): Set default to "carrizo"
	* config/gcn/gcn-opts.h: New file.
	* config/gcn/gcn.c (output_file_start): Switch to HSACO version
	2 and auto-detection of GPU type (from -mcpu).
	(gcn_arch, gcn_tune): Remove.
	* config/gcn/gcn.h: Include gcn-opts.h.
	(enum processor_type): Move to gcn-opts.h.
	(LIBGCC_SPEC, ASM_SPEC, LINK_SPEC): Define.
	(gcn_arch, gcn_tune): Remove.
	* config/gcn/gcn.opt: Include gcn-opts.h.
	(gpu_type): New Enum.
	(march, mtune): New options.

diff --git a/gcc/config.gcc b/gcc/config.gcc
index 32f0f4b..99c9c4a 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -3930,7 +3930,7 @@ case "${target}" in
 		for which in arch tune; do
 			eval "val=\$with_$which"
 			case ${val} in
-			"" | fiji)
+			"" | carrizo | fiji)
 # OK
 ;;
 			*)
@@ -3939,6 +3939,7 @@ case "${target}" in
 ;;
 			esac
 		done
+		[ "x$with_arch" = x ] && with_arch=carrizo
 		;;
 
 	hppa*-*-*)
diff --git a/gcc/config/gcn/gcn-opts.h b/gcc/config/gcn/gcn-opts.h
new file mode 100644
index 000..d0586d6
--- /dev/null
+++ b/gcc/config/gcn/gcn-opts.h
@@ -0,0 +1,27 @@
+/* Copyright (C) 2016-2017 Free Software Foundation, Inc.
+
+   This file is free software; you can redistribute it and/or modify it under
+   the terms of the GNU General Public License as published by the Free
+   Software Foundation; either version 3 of the License, or (at your option)
+   any later version.
+
+   This file is distributed in the hope that it will be useful, but WITHOUT
+   ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+   FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+   for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with GCC; see the file COPYING3.  If not see
+   .  */
+
+#ifndef GCN_OPTS_H
+#define GCN_OPTS_H
+
+/* Which processor to generate code or schedule for.  */
+enum processor_type
+{
+  PROCESSOR_CARRIZO,
+  PROCESSOR_FIJI
+};
+
+#endif
diff --git a/gcc/config/gcn/gcn.c b/gcc/config/gcn/gcn.c
index eb6edd8..c80bdf5 100644
--- a/gcc/config/gcn/gcn.c
+++ b/gcc/config/gcn/gcn.c
@@ -60,11 +60,6 @@
 /* This file should be included last.  */
 #include "target-def.h"
 
-/* Which instruction set architecture to use.  */
-int gcn_arch;
-/* Which cpu are we tuning for.  */
-int gcn_tune;
-
 static REAL_VALUE_TYPE dconst4, dconst1over2pi;
 static bool ext_gcn_constants_init = 0;
 
@@ -2006,8 +2001,8 @@ static void
 output_file_start (void)
 {
   fprintf (asm_out_file, "\t.hsatext\n");
-  fprintf (asm_out_file, "\t.hsa_code_object_version 1,0\n");
-  fprintf (asm_out_file, "\t.hsa_code_object_isa 8,0,1,\"AMD\",\"AMDGPU\"\n");
+  fprintf (asm_out_file, "\t.hsa_code_object_version 2,0\n");
+  fprintf (asm_out_file, "\t.hsa_code_object_isa\n");  /* Autodetect.  */
   fprintf (asm_out_file, "\t.section\t.AMDGPU.config\n");
   fprintf (asm_out_file, "\t.hsatext\n");
 }
diff --git a/gcc/config/gcn/gcn.h b/gcc/config/gcn/gcn.h
index 903022f..3b41095 100644
--- a/gcc/config/gcn/gcn.h
+++ b/gcc/config/gcn/gcn.h
@@ -14,18 +14,22 @@
along with GCC; see the file COPYING3.  If not see
<http://www.gnu.org/licenses/>.  */
 
+#include "config/gcn/gcn-opts.h"
+
 
 /* FIXME */
 #define TARGET_CPU_CPP_BUILTINS()
 
-/* Which processor to generate code or schedule for.  */
-enum processor_type
-{
-  PROCESSOR_CARRIZO,
-};
+/* Temporarily disable libgcc until one actually exists.  */
+#undef  LIBGCC_SPEC
+#define LIBGCC_SPEC ""
+
+/* Use LLVM assembler options.  */
+#undef ASM_SPEC
+#define ASM_SPEC "-triple=amdgcn--amdhsa %{march=*:-mcpu=%*} -filetype=obj"
 
-extern GTY(()) int gcn_arch;
-extern GTY(()) int gcn_tune;
+#undef LINK_SPEC
+#define LINK_SPEC ""
 
 /* Support for a compile-time default architecture and tuning.  The rules are:
--with-arch is ignored if -march is specified.
diff --git a/gcc/config/gcn/gcn.opt b/gcc/config/gcn/gcn.opt
index 8fc02b7..ffb5547 100644
--- a/gcc/config/gcn/gcn.opt
+++ b/gcc/config/gcn/gcn.opt
@@ -17,3 +17,24 @@
 ; You should have received a copy of the GNU General Public License
 ; along with GCC; see the file COPYING3.  If not see
; <http://www.gnu.org/licenses/>.
+
+HeaderInclude
+config/gcn/gcn-opts.h
+
+Enum
+Name(gpu_type) Type(enum processor_type)
+GCN GPU type to use:
+
+EnumValue
+Enum(gpu_type) String(carrizo) Value(PROCESSOR_CARRIZO)
+
+EnumValue
+Enum(gpu_type) String(fiji) Value(PROCESSOR_FIJI)
+
+march=
+Target RejectNegative Joined ToLower Enum(gpu_type) Var(gcn_arch) Init(PROCESSOR_C

Re: C++ PATCH for c++/81073, constexpr and static var in statement-expression

2017-06-21 Thread Jakub Jelinek
On Tue, Jun 20, 2017 at 09:45:10PM +0200, Andreas Schwab wrote:
> On Jun 20 2017, Jason Merrill  wrote:
> 
> > On Tue, Jun 20, 2017 at 5:40 AM, Andreas Schwab  wrote:
> >> FAIL: g++.dg/cpp0x/constexpr-cast.C  -std=c++11  (test for errors, line 10)
> >> FAIL: g++.dg/cpp0x/constexpr-cast.C  -std=c++11 (test for excess errors)
> >> FAIL: g++.dg/cpp0x/constexpr-cast.C  -std=c++14  (test for errors, line 10)
> >> FAIL: g++.dg/cpp0x/constexpr-cast.C  -std=c++14 (test for excess errors)
> >
> > I'm not seeing this.  Can you give more detail?
> 
> http://gcc.gnu.org/ml/gcc-testresults/2017-06/msg02172.html

It doesn't fail on LP64 targets, but does fail on ILP32,
on x86_64-linux can be reproduced with e.g.
make check-g++ RUNTESTFLAGS='--target_board=unix\{-m32,-m64\} 
dg.exp=constexpr-cast.C'
The difference is that for LP64, int has a different precision from void *
and thus you get one diagnostic, while on ILP32 int has the same precision
as void *.
So one gets:

/usr/src/gcc/gcc/testsuite/g++.dg/cpp0x/constexpr-cast.C:10:22: error: 
reinterpret_cast from integer to pointer
/usr/src/gcc/gcc/testsuite/g++.dg/cpp0x/constexpr-cast.C:11:22: error: 
'reinterpret_cast(1)' is not a constant expression
/usr/src/gcc/gcc/testsuite/g++.dg/cpp0x/constexpr-cast.C:24:26:   in constexpr 
expansion of 'f()'
/usr/src/gcc/gcc/testsuite/g++.dg/cpp0x/constexpr-cast.C:24:27: error: value 
'4' of type 'int*' is not a constant expression
compiler exited with status 1
XFAIL: g++.dg/cpp0x/constexpr-cast.C  -std=c++11 bug c++/49171 (test for 
errors, line 8)
FAIL: g++.dg/cpp0x/constexpr-cast.C  -std=c++11  (test for errors, line 10)
PASS: g++.dg/cpp0x/constexpr-cast.C  -std=c++11  (test for errors, line 11)
PASS: g++.dg/cpp0x/constexpr-cast.C  -std=c++11  (test for errors, line 24)
FAIL: g++.dg/cpp0x/constexpr-cast.C  -std=c++11 (test for excess errors)
Excess errors:
/usr/src/gcc/gcc/testsuite/g++.dg/cpp0x/constexpr-cast.C:10:22: error: 
reinterpret_cast from integer to pointer

The following patch fixes it by allowing that wording too on line 10.
Is this ok for trunk or do you have some other preference?

2017-06-21  Jakub Jelinek  

* g++.dg/cpp0x/constexpr-cast.C: Adjust dg-error for ILP32.

--- gcc/testsuite/g++.dg/cpp0x/constexpr-cast.C.jj  2016-08-08 
21:42:30.825683528 +0200
+++ gcc/testsuite/g++.dg/cpp0x/constexpr-cast.C 2017-06-21 12:30:19.425955047 
+0200
@@ -7,7 +7,7 @@ int i;
 // The following is accepted due to bug 49171.
 constexpr void *q = reinterpret_cast(&i);// { dg-error "" "bug 
c++/49171" { xfail *-*-* } }
 
-constexpr void *r0 = reinterpret_cast(1);// { dg-error "not a 
constant expression" }
+constexpr void *r0 = reinterpret_cast(1);// { dg-error "not a 
constant expression|reinterpret_cast from integer to pointer" }
 constexpr void *r1 = reinterpret_cast(sizeof 'x');  // { dg-error 
".reinterpret_cast\\(1\[ul\]\*\\). is not a constant expression" }
 
 template 


Jakub


RE: [PATCH][GCC][AArch64] optimize float immediate moves (2 /4) - HF/DF/SF mode.

2017-06-21 Thread Tamar Christina
> > movi\\t%0.4h, #0
> > -   mov\\t%0.h[0], %w1
> > +   fmov\\t%s0, %w1
> 
> Should this not be %h0?

The problem is that H registers are only available in ARMv8.2+;
I'm not sure what to do about ARMv8.1, given your other feedback
pointing out that the bit patterns differ between how the value is
stored in S vs. H registers.

> 
> > umov\\t%w0, %1.h[0]
> > mov\\t%0.h[0], %1.h[0]
> > +   fmov\\t%s0, %1
> 
> Likewise, and much more important for correctness as it changes the way the
> bit pattern ends up in the register (see table C2-1 in release B.a of the ARM
> Architecture Reference Manual for ARMv8-A), here.
> 
> > +   * return aarch64_output_scalar_simd_mov_immediate (operands[1],
> > + SImode);
> > ldr\\t%h0, %1
> > str\\t%h1, %0
> > ldrh\\t%w0, %1
> > strh\\t%w1, %0
> > mov\\t%w0, %w1"
> > -  [(set_attr "type"
> "neon_move,neon_from_gp,neon_to_gp,neon_move,\
> > - f_loads,f_stores,load1,store1,mov_reg")
> > -   (set_attr "simd" "yes,yes,yes,yes,*,*,*,*,*")]
> > +  "&& can_create_pseudo_p ()
> > +   && !aarch64_can_const_movi_rtx_p (operands[1], HFmode)
> > +   && !aarch64_float_const_representable_p (operands[1])
> > +   &&  aarch64_float_const_rtx_p (operands[1])"
> > +  [(const_int 0)]
> > +  "{
> > +unsigned HOST_WIDE_INT ival;
> > +if (!aarch64_reinterpret_float_as_int (operands[1], &ival))
> > +  FAIL;
> > +
> > +rtx tmp = gen_reg_rtx (SImode);
> > +aarch64_expand_mov_immediate (tmp, GEN_INT (ival));
> > +tmp = simplify_gen_subreg (HImode, tmp, SImode, 0);
> > +emit_move_insn (operands[0], gen_lowpart (HFmode, tmp));
> > +DONE;
> > +  }"
> > +  [(set_attr "type" "neon_move,f_mcr,neon_to_gp,neon_move,fconsts,
> \
> > +neon_move,f_loads,f_stores,load1,store1,mov_reg")
> > +   (set_attr "simd" "yes,*,yes,yes,*,yes,*,*,*,*,*")]
> >  )
> 
> Thanks,
> James



Re: [Patch AArch64] Stop generating BSL for simple integer code

2017-06-21 Thread James Greenhalgh
*ping*

Thanks,
James

On Mon, Jun 12, 2017 at 02:44:40PM +0100, James Greenhalgh wrote:
> [Sorry for the re-send. I spotted that the attributes were not right for the
>  new pattern I was adding. The change between this and the first version was:
> 
>   +  [(set_attr "type" "neon_bsl,neon_bsl,neon_bsl,multiple")
>   +   (set_attr "length" "4,4,4,12")]
> ]
> 
> ---
> 
> Hi,
> 
> In this testcase, all argument registers and the return register
> will be general purpose registers:
> 
>   long long
>   foo (long long a, long long b, long long c)
>   {
> return ((a ^ b) & c) ^ b;
>   }
> 
> However, due to the implementation of aarch64_simd_bsl_internal
> we'll match that pattern and emit a BSL, necessitating moving all those
> arguments and results to the Advanced SIMD registers:
> 
>   fmov    d2, x0
>   fmov    d0, x2
>   fmov    d1, x1
>   bsl     v0.8b, v2.8b, v1.8b
>   fmov    x0, d0
> 
> To fix this, we turn aarch64_simd_bsldi_internal into an insn_and_split that
> knows to split back to integer operations if the register allocation
> falls that way.
> 
> We could have used an unspec, but then we lose some of the nice
> simplifications that can be made from explicitly spelling out the semantics
> of BSL.
> 
> Bootstrapped on aarch64-none-linux-gnu.
> 
> OK?
> 
> Thanks,
> James
> 
> ---
> gcc/
> 
> 2017-06-12  James Greenhalgh  
> 
>   * config/aarch64/aarch64-simd.md
>   (aarch64_simd_bsl_internal): Remove DImode.
>   (*aarch64_simd_bsl_alt): Likewise.
>   (aarch64_simd_bsldi_internal): New.
> 
> gcc/testsuite/
> 
> 2017-06-12  James Greenhalgh  
> 
>   * gcc.target/aarch64/no-dimode-bsl.c: New.
>   * gcc.target/aarch64/dimode-bsl.c: New.
> 

> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index c5a86ff..7b6b12f 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -2256,13 +2256,13 @@
>  ;; in *aarch64_simd_bsl_alt.
>  
>  (define_insn "aarch64_simd_bsl_internal"
> -  [(set (match_operand:VSDQ_I_DI 0 "register_operand" "=w,w,w")
> - (xor:VSDQ_I_DI
> -(and:VSDQ_I_DI
> -  (xor:VSDQ_I_DI
> +  [(set (match_operand:VDQ_I 0 "register_operand" "=w,w,w")
> + (xor:VDQ_I
> +(and:VDQ_I
> +  (xor:VDQ_I
>  (match_operand: 3 "register_operand" "w,0,w")
> -(match_operand:VSDQ_I_DI 2 "register_operand" "w,w,0"))
> -  (match_operand:VSDQ_I_DI 1 "register_operand" "0,w,w"))
> +(match_operand:VDQ_I 2 "register_operand" "w,w,0"))
> +  (match_operand:VDQ_I 1 "register_operand" "0,w,w"))
> (match_dup: 3)
>   ))]
>"TARGET_SIMD"
> @@ -2280,14 +2280,14 @@
>  ;; permutations of commutative operations, we have to have a separate 
> pattern.
>  
>  (define_insn "*aarch64_simd_bsl_alt"
> -  [(set (match_operand:VSDQ_I_DI 0 "register_operand" "=w,w,w")
> - (xor:VSDQ_I_DI
> -(and:VSDQ_I_DI
> -  (xor:VSDQ_I_DI
> -(match_operand:VSDQ_I_DI 3 "register_operand" "w,w,0")
> -(match_operand:VSDQ_I_DI 2 "register_operand" "w,0,w"))
> -   (match_operand:VSDQ_I_DI 1 "register_operand" "0,w,w"))
> -   (match_dup:VSDQ_I_DI 2)))]
> +  [(set (match_operand:VDQ_I 0 "register_operand" "=w,w,w")
> + (xor:VDQ_I
> +(and:VDQ_I
> +  (xor:VDQ_I
> +(match_operand:VDQ_I 3 "register_operand" "w,w,0")
> +(match_operand:VDQ_I 2 "register_operand" "w,0,w"))
> +   (match_operand:VDQ_I 1 "register_operand" "0,w,w"))
> +   (match_dup:VDQ_I 2)))]
>"TARGET_SIMD"
>"@
>bsl\\t%0., %3., %2.
> @@ -2296,6 +2296,45 @@
>[(set_attr "type" "neon_bsl")]
>  )
>  
> +;; DImode is special, we want to avoid computing operations which are
> +;; more naturally computed in general purpose registers in the vector
> +;; registers.  If we do that, we need to move all three operands from general
> +;; purpose registers to vector registers, then back again.  However, we
> +;; don't want to make this pattern an UNSPEC as we'd lose scope for
> +;; optimizations based on the component operations of a BSL.
> +;;
> +;; That means we need a splitter back to the individual operations, if they
> +;; would be better calculated on the integer side.
> +
> +(define_insn_and_split "aarch64_simd_bsldi_internal"
> +  [(set (match_operand:DI 0 "register_operand" "=w,w,w,&r")
> + (xor:DI
> +(and:DI
> +  (xor:DI
> +(match_operand:DI 3 "register_operand" "w,0,w,r")
> +(match_operand:DI 2 "register_operand" "w,w,0,r"))
> +  (match_operand:DI 1 "register_operand" "0,w,w,r"))
> +   (match_dup:DI 3)
> + ))]
> +  "TARGET_SIMD"
> +  "@
> +  bsl\\t%0.8b, %2.8b, %3.8b
> +  bit\\t%0.8b, %2.8b, %1.8b
> +  bif\\t%0.8b, %3.8b, %1.8b
> +  #"
> +  "&& GP_REGNUM_P (REGNO (operands[0]))"
> +  [(match_dup 0) (match_dup 1) (match_dup 2) (match_dup 3)]
> +{
> +  /* Split back to individual operation

Re: [Mechanical Patch ARM/AArch64 1/2] Rename load/store scheduling types to encode data size

2017-06-21 Thread James Greenhalgh
On Mon, Jun 12, 2017 at 03:28:52PM +0100, Kyrill Tkachov wrote:
> 
> On 12/06/17 14:53, James Greenhalgh wrote:
> >Hi,
> >
> >In the AArch64 backend and scheduling models there is some confusion as to
> >what the load1/load2 etc. scheduling types refer to. This leads to us using
> >load1/load2 in two contexts - for a variety of 32-bit, 64-bit and 128-bit
> >loads in AArch32 and 128-bit loads in AArch64. That leads to an undesirable
> >confusion in scheduling.
> >
> >Fixing it is easy, but mechanical and boring. Essentially,
> >
> >   s/load1/load_4/
> >   s/load2/load_8/
> >   s/load3/load_12/
> >   s/load4/load_16/
> >   s/store1/store_4/
> >   s/store2/store_8/
> >   s/store3/store_12/
> >   s/store4/store_16/
> 
> So the number now is the number of bytes being loaded?
> 
> >Across all sorts of pipeline models, and the two backends.
> >
> >I have intentionally not modified any of the patterns which now look 
> >obviously
> >incorrect. I'll be doing a second pass over the AArch64 back-end in patch
> >2/2 which will fix these bugs. The AArch32 back-end looked to me to get this
> >correct.
> >
> >Bootstrapped on AArch64 and ARM without issue - there's no functional
> >change here.
> >
> >OK?
> 
> Ok from an arm perspective.

*Ping* for the AArch64 maintainers.

Thanks,
James



Re: [Patch AArch64 2/2] Fix memory sizes to load/store patterns

2017-06-21 Thread James Greenhalgh
*ping*

Thanks,
James

On Mon, Jun 12, 2017 at 02:54:00PM +0100, James Greenhalgh wrote:
> 
> Hi,
> 
> There seems to be a partial misconception in the AArch64 backend that
> load1/load2 referred to the number of registers to load, rather than the
> number of words to load. This patch fixes that using the new "number of
> bytes" types added in the previous patch.
> 
> That means using the load_16 and store_16 types that were defined in the
> previous patch for the first time in the AArch64 backend. To ensure
> continuity for scheduling models, I've just split this out from load_8.
> Please update your models if this is very wrong!
> 
> Bootstrapped on aarch64-none-linux-gnu with no issue.
> 
> OK?
> 
> Thanks,
> James
> 
> ---
> 2017-06-12  James Greenhalgh  
> 
>   * config/aarch64/aarch64.md (movdi_aarch64): Set load/store
>   types correctly.
>   (movti_aarch64): Likewise.
>   (movdf_aarch64): Likewise.
>   (movtf_aarch64): Likewise.
>   (load_pairdi): Likewise.
>   (store_pairdi): Likewise.
>   (load_pairdf): Likewise.
>   (store_pairdf): Likewise.
>   (loadwb_pair_): Likewise.
>   (storewb_pair_): Likewise.
>   (ldr_got_small_): Likewise.
>   (ldr_got_small_28k_): Likewise.
>   (ldr_got_tiny): Likewise.
>   * config/aarch64/iterators.md (ldst_sz): New.
>   (ldpstp_sz): Likewise.
>   * config/aarch64/thunderx.md (thunderx_storepair): Split store_8
>   to store_16.
>   (thunderx_load): Split load_8 to load_16.
>   * config/aarch64/thunderx2t99.md (thunderx2t99_loadpair): Split
>   load_8 to load_16.
>   (thunderx2t99_storepair_basic): Split store_8 to store_16.
>   * config/arm/xgene1.md (xgene1_load_pair): Split load_8 to load_16.
>   (xgene1_store_pair): Split store_8 to store_16.
> 

> diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
> index 11295a6..a1385e3 100644
> --- a/gcc/config/aarch64/aarch64.md
> +++ b/gcc/config/aarch64/aarch64.md
> @@ -981,7 +981,7 @@
> DONE;
>  }"
>[(set_attr "type" "mov_reg,mov_reg,mov_reg,mov_imm,mov_imm,\
> - load_4,load_4,store_4,store_4,\
> + load_8,load_8,store_8,store_8,\
>   adr,adr,f_mcr,f_mrc,fmov,neon_move")
> (set_attr "fp" "*,*,*,*,*,*,yes,*,yes,*,*,yes,yes,yes,*")
> (set_attr "simd" "*,*,*,*,*,*,*,*,*,*,*,*,*,*,yes")]
> @@ -1026,7 +1026,8 @@
> ldr\\t%q0, %1
> str\\t%q1, %0"
>[(set_attr "type" "multiple,f_mcr,f_mrc,neon_logic_q, \
> -  load_8,store_8,store_8,f_loadd,f_stored")
> +  load_16,store_16,store_16,\
> + load_16,store_16")
> (set_attr "length" "8,8,8,4,4,4,4,4,4")
> (set_attr "simd" "*,*,*,yes,*,*,*,*,*")
> (set_attr "fp" "*,*,*,*,*,*,*,yes,yes")]
> @@ -1121,7 +1122,7 @@
> str\\t%x1, %0
> mov\\t%x0, %x1"
>[(set_attr "type" "neon_move,f_mcr,f_mrc,fmov,fconstd,\
> - f_loadd,f_stored,load_4,store_4,mov_reg")
> + f_loadd,f_stored,load_8,store_8,mov_reg")
> (set_attr "simd" "yes,*,*,*,*,*,*,*,*,*")]
>  )
>  
> @@ -1145,7 +1146,7 @@
> stp\\t%1, %H1, %0
> stp\\txzr, xzr, %0"
>[(set_attr "type" "logic_reg,multiple,f_mcr,f_mrc,neon_move_q,f_mcr,\
> - f_loadd,f_stored,load_8,store_8,store_8")
> + f_loadd,f_stored,load_16,store_16,store_16")
> (set_attr "length" "4,8,8,8,4,4,4,4,4,4,4")
> (set_attr "simd" "yes,*,*,*,yes,*,*,*,*,*,*")]
>  )
> @@ -1209,7 +1210,7 @@
>"@
> ldp\\t%x0, %x2, %1
> ldp\\t%d0, %d2, %1"
> -  [(set_attr "type" "load_8,neon_load1_2reg")
> +  [(set_attr "type" "load_16,neon_load1_2reg")
> (set_attr "fp" "*,yes")]
>  )
>  
> @@ -1244,7 +1245,7 @@
>"@
> stp\\t%x1, %x3, %0
> stp\\t%d1, %d3, %0"
> -  [(set_attr "type" "store_8,neon_store1_2reg")
> +  [(set_attr "type" "store_16,neon_store1_2reg")
> (set_attr "fp" "*,yes")]
>  )
>  
> @@ -1278,7 +1279,7 @@
>"@
> ldp\\t%d0, %d2, %1
> ldp\\t%x0, %x2, %1"
> -  [(set_attr "type" "neon_load1_2reg,load_8")
> +  [(set_attr "type" "neon_load1_2reg,load_16")
> (set_attr "fp" "yes,*")]
>  )
>  
> @@ -1312,7 +1313,7 @@
>"@
> stp\\t%d1, %d3, %0
> stp\\t%x1, %x3, %0"
> -  [(set_attr "type" "neon_store1_2reg,store_8")
> +  [(set_attr "type" "neon_store1_2reg,store_16")
> (set_attr "fp" "yes,*")]
>  )
>  
> @@ -1330,7 +1331,7 @@
> (match_operand:P 5 "const_int_operand" "n"])]
>"INTVAL (operands[5]) == GET_MODE_SIZE (mode)"
>"ldp\\t%2, %3, [%1], %4"
> -  [(set_attr "type" "load_8")]
> +  [(set_attr "type" "load_")]
>  )
>  
>  (define_insn "loadwb_pair_"
> @@ -1363,7 +1364,7 @@
>(match_operand:GPI 3 "register_operand" "r"))])]
>"INTVAL (operands[5]) == INTVAL (operands[4]) + GET_MODE_SIZE 
> (mode)"
>"stp\\t%2, %3, [%0, %4]!"
> -  [(set_attr "type" "store_8")]
> +  [(set_attr "type" "st

Re: [PATCH/AARCH64 v2] Enable software prefetching (-fprefetch-loop-arrays) for ThunderX 88xxx

2017-06-21 Thread James Greenhalgh
On Tue, Jun 20, 2017 at 11:13:24AM -0700, Andrew Pinski wrote:
> Here is the updated patch based on the new infrastructure which is now 
> included.
> 
> OK?  Bootstrapped and tested on aarch64-linux-gnu with no regressions
> and tested again on SPEC CPU 2006 on THunderX T88 with the speed up
> mentioned before.

OK.

Thanks,
James

> 
> Thanks,
> Andrew Pinski
> 
> ChangeLog:
> * config/aarch64/aarch64-cores.def (thunderxt88p1): Use thunderxt88 tunings.
> (thunderxt88): Likewise.
> 
> * config/aarch64/aarch64.c (thunderxt88_prefetch_tune): New variable.
> (thunderx_prefetch_tune): New variable.
> (thunderx2t99_prefetch_tune): Update for the correct values.
> (thunderxt88_tunings): New variable.
> (thunderx_tunings): Use thunderx_prefetch_tune instead of 
> generic_prefetch_tune.
> (thunderx2t99_tunings): Use AUTOPREFETCHER_WEAK.



[committed] Fix ICE with shared clause on non-static data member in a member function (PR c++/81130)

2017-06-21 Thread Jakub Jelinek
Hi!

This patch is both a fix for the ICE on the testcase below and an
optimization: there is no point in keeping shared clauses for vars that
have ctors/dtors but aren't referenced in the construct.  For privatization
clauses we keep them because of the ctors/dtors involved, but for shared
there is no privatization; nothing needs to be constructed or destructed.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk,
queued for backporting.

2017-06-21  Jakub Jelinek  

PR c++/81130
* gimplify.c (omp_add_variable): Don't force GOVD_SEEN for types
with ctors/dtors if GOVD_SHARED is set.

* testsuite/libgomp.c++/pr81130.C: New test.

--- gcc/gimplify.c.jj   2017-06-19 17:25:06.0 +0200
+++ gcc/gimplify.c  2017-06-20 12:08:07.296187833 +0200
@@ -6634,9 +6634,11 @@ omp_add_variable (struct gimplify_omp_ct
 return;
 
   /* Never elide decls whose type has TREE_ADDRESSABLE set.  This means
- there are constructors involved somewhere.  */
-  if (TREE_ADDRESSABLE (TREE_TYPE (decl))
-  || TYPE_NEEDS_CONSTRUCTING (TREE_TYPE (decl)))
+ there are constructors involved somewhere.  Exception is a shared clause,
+ there is nothing privatized in that case.  */
+  if ((flags & GOVD_SHARED) == 0
+  && (TREE_ADDRESSABLE (TREE_TYPE (decl))
+ || TYPE_NEEDS_CONSTRUCTING (TREE_TYPE (decl
 flags |= GOVD_SEEN;
 
   n = splay_tree_lookup (ctx->variables, (splay_tree_key)decl);
--- libgomp/testsuite/libgomp.c++/pr81130.C.jj  2017-06-20 12:34:19.251478185 
+0200
+++ libgomp/testsuite/libgomp.c++/pr81130.C 2017-06-20 12:33:51.0 
+0200
@@ -0,0 +1,41 @@
+// PR c++/81130
+// { dg-do run }
+
+struct A
+{
+  A ();
+  ~A ();
+  int a;
+};
+
+A::A ()
+{
+  a = 0;
+}
+
+A::~A ()
+{
+}
+
+struct B
+{
+  A b;
+  int c;
+  B () : c (1)
+  {
+#pragma omp parallel shared (b, c) num_threads (2)
+#pragma omp master
+{
+  b.a++;
+  c += 2;
+}
+  }
+};
+
+int
+main ()
+{
+  B v;
+  if (v.b.a != 1 || v.c != 3)
+__builtin_abort ();
+}

Jakub


Re: [PATCH/AARCH64] Improve/correct ThunderX 1 cost model for Arith_shift

2017-06-21 Thread James Greenhalgh
On Tue, Jun 20, 2017 at 02:07:22PM -0700, Andrew Pinski wrote:
> On Mon, Jun 19, 2017 at 2:00 PM, Andrew Pinski  wrote:
> > On Wed, Jun 7, 2017 at 10:16 AM, James Greenhalgh
> >  wrote:
> >> On Fri, Dec 30, 2016 at 10:05:26PM -0800, Andrew Pinski wrote:
> >>> Hi,
> >>>   Currently for the following function:
> >>> int f(int a, int b)
> >>> {
> >>>   return a + (b <<7);
> >>> }
> >>>
> >>> GCC produces:
> >>> add w0, w0, w1, lsl 7
> >>> But for ThunderX 1, it is better if the instruction was split allowing
> >>> better scheduling to happen in most cases, the latency is the same.  I
> >>> get a small improvement in coremarks, ~1%.
> >>>
> >>> Currently the code does not take into account Arith_shift even though
> >>> the comment:
> >>>   /* Strip any extend, leave shifts behind as we will
> >>> cost them through mult_cost.  */
> >>> Say it does not strip out the shift, aarch64_strip_extend does and has
> >>> always has since the back-end was added to GCC.
> >>>
> >>> Once I fixed the code around aarch64_strip_extend, I got a regression
> >>> for ThunderX 1 as some shifts/extends (left shifts <=4 and/or zero
> >>> extends) are considered free so I needed to add a new tuning flag.
> >>>
> >>> Note I will get an even more improvement for ThunderX 2 CN99XX, but I
> >>> have not measured it yet as I have not made the change to
> >>> aarch64-cost-tables.h yet as I am waiting for approval of the renaming
> >>> patch first before submitting any of the cost table changes.  Also I
> >>> noticed this problem with this tuning first and then looked back at
> >>> what I needed to do for ThunderX 1.
> >>>
> >>> OK?  Bootstrapped and tested on aarch64-linux-gnu without any
> >>> regressions (both with and without --with-cpu=thunderx).
> >>
> >> This is mostly OK, but I don't like the name "easy"_shift_extend. Cheap
> >> or free seems better. I have some other minor points below.
> >
> >
> > Ok, that seems like a good idea.  I used easy since that was the
> > wording our hardware folks had come up with.  I am changing the
> > comments to make it clearer when this flag should be used.
> > I should have a new patch out by the end of today.
> 
> Due to the LSE ICE which I reported in the other thread, it took me
> longer to send out a new patch.
> Anyways here is the updated patch with the changes requested.
> 
> 
> OK? Bootstrapped and tested on aarch64-linux-gnu with no regressions.

One grammar fix inline below, otherwise this is OK.

Thanks,
James

> * config/aarch64/aarch64-cost-tables.h (thunderx_extra_costs):
> Increment Arith_shift and Arith_shift_reg by 1.
> * config/aarch64/aarch64-tuning-flags.def (cheap_shift_extend): New tuning 
> flag.
> * config/aarch64/aarch64.c (thunderx_tunings): Enable
> AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND.
> (aarch64_strip_extend): Add new argument and test for it.
> (aarch64_cheap_mult_shift_p): New function.
> (aarch64_rtx_mult_cost): Call aarch64_cheap_mult_shift_p and don't add
> a cost if it is true.
> Update calls to aarch64_strip_extend.
> (aarch64_rtx_costs): Update calls to aarch64_strip_extend.
> 
> +
> +/* Return true iff X is an cheap shift without a sign extend. */

s/an cheap/a cheap/

> +
> +static bool
> +aarch64_cheap_mult_shift_p (rtx x)
> +{
> +  rtx op0, op1;
> +
> +  op0 = XEXP (x, 0);
> +  op1 = XEXP (x, 1);
> +
> +  if (!(aarch64_tune_params.extra_tuning_flags
> +  & AARCH64_EXTRA_TUNE_CHEAP_SHIFT_EXTEND))
> +return false;
> +
> +  if (GET_CODE (op0) == SIGN_EXTEND)
> +return false;
> +
> +  if (GET_CODE (x) == ASHIFT && CONST_INT_P (op1)
> +  && UINTVAL (op1) <= 4)
> +return true;
> +
> +  if (GET_CODE (x) != MULT || !CONST_INT_P (op1))
> +return false;
> +
> +  HOST_WIDE_INT l2 = exact_log2 (INTVAL (op1));
> +
> +  if (l2 > 0 && l2 <= 4)
> +return true;
> +
> +  return false;
> +}
> +
>  /* Helper function for rtx cost calculation.  Calculate the cost of
> a MULT or ASHIFT, which may be part of a compound PLUS/MINUS rtx.
> Return the calculated cost of the expression, recursing manually in to
> @@ -6164,7 +6200,11 @@ aarch64_rtx_mult_cost (rtx x, enum rtx_c
>   {
> if (compound_p)
>   {
> -   if (REG_P (op1))
> +   /* If the shift is considered cheap,
> +  then don't add any cost. */
> +   if (aarch64_cheap_mult_shift_p (x))
> + ;
> +   else if (REG_P (op1))
>   /* ARITH + shift-by-register.  */
>   cost += extra_cost->alu.arith_shift_reg;
> else if (is_extend)
> @@ -6182,7 +6222,7 @@ aarch64_rtx_mult_cost (rtx x, enum rtx_c
>   }
> /* Strip extends as we will have costed them in the case above.  */
> if (is_extend)
> - op0 = aarch64_strip_extend (op0);
> + op0 = aarch64_strip_extend (op0, true);
>  
> cost += rtx_cost (op0, VOIDmode, code, 0, speed);
>  
> @@ -7026,13 +7066,13 @@ cost_minus:
>   if (s

Re: Backports to 6.x

2017-06-21 Thread Nathan Sidwell

On 06/21/2017 04:14 AM, Martin Liška wrote:
> As the release managers are planning to release the next version of GCC 6,
> I would like to backport the revisions attached.
>
> The only complicated one is the one for PR69953, where I decided to also
> backport the refactoring patches applied by Nathan (244529, 244156).
>
> I would appreciate another pair of eyes to look at the backports.

Looks good to me.

nathan

--
Nathan Sidwell


Re: [PATCH 2/2] DWARF: make it possible to emit debug info for declarations only

2017-06-21 Thread Pierre-Marie de Rodat

On 06/21/2017 09:11 AM, Richard Biener wrote:
> Sure, no obligation for you to enhance Fortran debug!

Understood, thanks.  For your information, I just committed the change.
Thank you again for reviewing!


--
Pierre-Marie de Rodat


Re: [PATCH 2/3] Simplify wrapped binops

2017-06-21 Thread Robin Dapp
> use INTEGRAL_TYPE_P.

Done.

> but you do not actually _use_ vr_outer.  Do you think that if
> vr_outer is a VR_RANGE then the outer operation may not
> possibly have wrapped?  That's a false conclusion.

These were remains of a previous version.  vr_outer is indeed not needed
anymore; removed.

> wi::add overload with the overflow flag?  ISTR you want to handle "negative"
> unsigned constants somehow, but then I don't see how the above works.
> I'd say if wmin/wmax interpreted as signed are positive and then using
> a signed op to add w1 results in a still positive number you're fine
> (you don't seem
> to restrict the widening cast to either zero- or sign-extending).

Changed to using the wi::add overload now.

In essence, three cases are being handled:
 - wrapped_range --> do not simplify
 - !wrapped_range && ovf ("negative" unsigned) --> simplify and combine
with sign extension in the outer type
 - !wrapped_range && !ovf ("positive" unsigned) --> simplify and combine
with zero extension in the outer type.

Regards
 Robin
diff --git a/gcc/match.pd b/gcc/match.pd
index 80a17ba..ec1af69 100644
--- a/gcc/match.pd
+++ b/gcc/match.pd
@@ -1290,6 +1290,116 @@ DEFINE_INT_AND_FLOAT_ROUND_FN (RINT)
 (if (cst && !TREE_OVERFLOW (cst))
  (plus { cst; } @0
 
+/* ((T)(A +- CST)) +- CST -> (T)(A) +- CST)  */
+#if GIMPLE
+   (for outer_op (plus minus)
+ (for inner_op (plus minus)
+   (simplify
+	 (outer_op (convert (inner_op@3 @0 INTEGER_CST@1)) INTEGER_CST@2)
+	   (if (INTEGRAL_TYPE_P (type)
+		&& TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@3)))
+	(with
+	{
+	  tree cst;
+	  tree inner_type = TREE_TYPE (@3);
+	  wide_int wmin0, wmax0;
+
+	  bool ovf = true;
+	  bool ovf_undef = TYPE_OVERFLOW_UNDEFINED (inner_type);
+
+	  enum value_range_type vr0 =
+		get_range_info (@0, &wmin0, &wmax0);
+
+	  bool wrapped_range = true;
+
+	  /* Convert combined constant to tree of outer type if
+		 there was no overflow in the original inner operation.  */
+	  if (ovf_undef || vr0 == VR_RANGE)
+	  {
+		wide_int w1 = @1;
+		wide_int w2 = @2;
+
+		if (inner_op == MINUS_EXPR)
+		  w1 = wi::neg (w1);
+
+		if (outer_op == MINUS_EXPR)
+		  w2 = wi::neg (w2);
+
+		bool ovf;
+
+		if (!ovf_undef && vr0 == VR_RANGE)
+		  {
+		bool max_ovf;
+		bool min_ovf;
+
+		signop sgn = TYPE_SIGN (inner_type);
+		wi::add (wmin0, w1, sgn, &min_ovf);
+		wi::add (wmax0, w1, sgn, &max_ovf);
+
+		ovf = min_ovf || max_ovf;
+		wrapped_range = ((min_ovf && !max_ovf)
+   || (!min_ovf && max_ovf));
+		  }
+
+		/* Extend @1 to TYPE. */
+		w1 = w1.from (w1, TYPE_PRECISION (type),
+			  ovf ? SIGNED : TYPE_SIGN (inner_type));
+
+		/* Combine in outer, larger type.  */
+		wide_int combined_cst;
+		combined_cst = wi::add (w1, w2);
+
+		cst = wide_int_to_tree (type, combined_cst);
+	  }
+	}
+(if (ovf_undef || !wrapped_range)
+	 (outer_op (convert @0) { cst; }))
+	)
+#endif
+
+/* ((T)(A)) +- CST -> (T)(A +- CST)  */
+#if GIMPLE
+   (for outer_op (plus minus)
+(simplify
+ (outer_op (convert SSA_NAME@0) INTEGER_CST@2)
+  (if (INTEGRAL_TYPE_P (TREE_TYPE (@0))
+	   && INTEGRAL_TYPE_P (type)
+	   && TYPE_PRECISION (type) > TYPE_PRECISION (TREE_TYPE (@0)))
+   /* Perform binary operation inside the cast if the constant fits
+	  and there is no overflow.  */
+   (with
+	{
+	  bool wrapped_range = true;
+	  tree cst_inner = NULL_TREE;
+	  enum value_range_type vr = VR_VARYING;
+	  tree inner_type = TREE_TYPE (@0);
+
+	  if (int_fits_type_p (@2, inner_type))
+	  {
+	cst_inner = fold_convert (inner_type, @2);
+
+	wide_int wmin0, wmax0;
+	wide_int w1 = cst_inner;
+	signop sgn = TYPE_SIGN (inner_type);
+	vr = get_range_info (@0, &wmin0, &wmax0);
+
+	if (vr == VR_RANGE)
+	  {
+		bool min_ovf;
+		wi::add (wmin0, w1, sgn, &min_ovf);
+
+		bool max_ovf;
+		wi::add (wmax0, w1, sgn, &max_ovf);
+
+		wrapped_range = (min_ovf && !max_ovf) || (!min_ovf && max_ovf);
+	  }
+	  }
+	}
+   (if (cst_inner && !wrapped_range)
+	(convert (outer_op @0 { cst_inner; })))
+   
+#endif
+
   /* ~A + A -> -1 */
   (simplify
(plus:c (bit_not @0) @0)


Re: [PATCH] Fix PR81090, properly free niter estimates

2017-06-21 Thread Christophe Lyon
On 20 June 2017 at 11:45, Richard Biener  wrote:
> On Tue, 20 Jun 2017, Alan Hayward wrote:
>
>>
>> > On 19 Jun 2017, at 13:35, Richard Biener  wrote:
>> >
>> > On Mon, 19 Jun 2017, Christophe Lyon wrote:
>> >
>> >> Hi Richard,
>> >>
>> >> On 16 June 2017 at 14:18, Richard Biener  wrote:
>> >>> On Wed, 14 Jun 2017, Richard Biener wrote:
>> >>>
>> 
>>  niter estimates are not kept up-to-date (they reference gimple stmts
>>  and trees) in the keep-loop-stuff infrastructure so similar to the
>>  SCEV cache we rely on people freeing it after passes.
>> 
>>  The following brings us a step closer to that by freeing them whenever
>>  SCEV is invalidated (we only compute them when SCEV is active) plus
>>  removing the odd record-bounds pass that just computes them, leaving
>>  scavenging to following passes.
>> 
>>  Bootstrap and regtest running on x86_64-unknown-linux-gnu.
>> >>>
>> >>> Some awkward interactions with peeling means I'm installing the
>> >>> following less aggressive variant.
>> >>>
>> >>> Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
>> >>>
>> >>> Richard.
>> >>>
>> >>> 2017-06-16  Richard Biener  
>> >>>
>> >>>PR tree-optimization/81090
>> >>>* passes.def (pass_record_bounds): Remove.
>> >>>* tree-pass.h (make_pass_record_bounds): Likewise.
>> >>>* tree-ssa-loop.c (pass_data_record_bounds, pass_record_bounds,
>> >>>make_pass_record_bounds): Likewise.
>> >>>* tree-ssa-loop-ivcanon.c (canonicalize_induction_variables): Do
>> >>>not free niter estimates at the beginning but at the end.
>> >>>* tree-scalar-evolution.c (scev_finalize): Free niter estimates.
>> >>>
>> >>>* gcc.dg/graphite/pr81090.c: New testcase.
>> >>>
>> >>
>> >> Sorry to bother you again...
>> >> With this commit (r249249), I've noticed regressions on aarch64/arm:
>> >> FAIL: gcc.dg/vect/pr65947-9.c -flto -ffat-lto-objects scan-tree-dump-not vect "LOOP VECTORIZED"
>> >> FAIL: gcc.dg/vect/pr65947-9.c scan-tree-dump-not vect "LOOP VECTORIZED"
>> >
>> > So the testcase gets vectorized now (for whatever reason) and still passes
>> > execution.  Not sure why the testcase checked for not being vectorized.
>> >
>> > Alan?
>> >
>> > Richard.
>>
>> I’ve not looked at the new patch, but pr65947-9.c was added to test:
>>
>> + /* Condition reduction with maximum possible loop size.  Will fail to
>> +vectorize because the vectorisation requires a slot for default values.  */
>>
>> So, in the pr65947-9.c, if nothing passes the IF clause, then LAST needs to be set to -72.
>
> So the runtime part of the testcase fails to test this case and we expect
> it to FAIL if vectorized?
>
> Index: testsuite/gcc.dg/vect/pr65947-9.c
> ===
> --- testsuite/gcc.dg/vect/pr65947-9.c   (revision 249145)
> +++ testsuite/gcc.dg/vect/pr65947-9.c   (working copy)
> @@ -34,9 +34,9 @@ main (void)
>
>check_vect ();
>
> -  char ret = condition_reduction (a, 16);
> +  char ret = condition_reduction (a, 1);
>
> -  if (ret != 10)
> +  if (ret != -72)
>  abort ();
>
>return 0;
>
> On aarch64 I can reproduce the inline copy in main to be vectorized
> (doesn't happen on x86_64).  niter analysis says:
>
> Analyzing # of iterations of loop 1
>   exit condition [253, + , 4294967295] != 0
>   bounds on difference of bases: -253 ... -253
>   result:
> # of iterations 253, bounded by 253
> Analyzing # of iterations of loop 1
>   exit condition [253, + , 4294967295] != 0
>   bounds on difference of bases: -253 ... -253
>   result:
> # of iterations 253, bounded by 253
> Statement (exit)if (ivtmp_45 != 0)
>  is executed at most 253 (bounded by 253) + 1 times in loop 1.
>
> so it fits there.  While the offline copy has
>
> Analyzing # of iterations of loop 1
>   exit condition [254, + , 4294967295] != 0
>   bounds on difference of bases: -254 ... -254
>   result:
> # of iterations 254, bounded by 254
> Analyzing # of iterations of loop 1
>   exit condition [254, + , 4294967295] != 0
>   bounds on difference of bases: -254 ... -254
>   result:
> # of iterations 254, bounded by 254
> Statement (exit)if (ivtmp_7 != 0)
>  is executed at most 254 (bounded by 254) + 1 times in loop 1.
>
> we peeled one iteration (ch_loop does that) so we have the place
> left.
>
> Marking the function noinline works as a fix I guess.
>
> Tested on x86_64-unknown-linux-gnu, installed.
>
> Richard.
>
> 2017-06-20  Richard Biener  
>
> * gcc.dg/vect/pr65947-9.c: Adjust.

Hi,

After this change (r249400), the test fails on aarch64/arm:
FAIL: gcc.dg/vect/pr65947-9.c -flto -ffat-lto-objects scan-tree-dump vect "loop size is greater than data size"
FAIL: gcc.dg/vect/pr65947-9.c scan-tree-dump vect "loop size is greater than data size"

Christophe
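For reference, the kind of condition-reduction loop pr65947-9.c exercises can be sketched as below; this is a hypothetical simplification, not the actual testcase:

```c
/* Sketch of a condition reduction: LAST takes the last element
   satisfying the condition, and must fall back to the default value
   (-72 in the discussion above) when no element matches -- which is
   why the vectorised form needs a slot for the default value.  */
signed char
condition_reduction (signed char *a, signed char min_v)
{
  signed char last = -72;       /* default value */
  for (int i = 0; i < 254; i++)
    if (a[i] > min_v)
      last = a[i];
  return last;
}
```

Calling this with a threshold nothing exceeds must return the default, which is what the adjusted `condition_reduction (a, 1)` / `ret != -72` check in the patch above probes.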

>
> Index: gcc/testsuite/gcc.dg/vect/pr65947-9.c
> =

Re: [PATCH 2/3] Make early return predictor more precise.

2017-06-21 Thread Martin Liška
On 06/21/2017 10:26 AM, Jan Hubicka wrote:
>>
>> Ok, so I fixed that in the described way. There's one remaining fallout of:
>> gcc/testsuite/gcc.dg/tree-ssa/ipa-split-5.c
>>
>> Where a fnsplit is properly done, but then it's again inlined:
>>
>> Considering split_me.part.0/5 with 23 size
>>  to be inlined into test/2 in unknown:0
>>  Estimated badness is -0.01, frequency 0.33.
>>  Inlined split_me.part.0 into test which now has time 50.30 and size 44, net change of +17.
>>
>> Considering split_me.part.0/5 with 23 size
>>  to be inlined into test/2 in unknown:0
>>  Estimated badness is -0.01, frequency 0.33.
>>  Inlined split_me.part.0 into test which now has time 70.76 and size 61, net change of +17.
>>
>> Considering split_me.part.0/5 with 23 size
>>  to be inlined into test/2 in unknown:0
>>  Estimated badness is -0.01, frequency 0.33.
>>  Inlined split_me.part.0 into test which now has time 91.22 and size 78, net change of +17.
>>
>> Considering split_me.part.0/5 with 23 size
>>  to be inlined into test/2 in unknown:0
>>  Estimated badness is -0.01, frequency 0.33.
>>  Inlined split_me.part.0 into test which now has time 111.68 and size 95, net change of +17.
>> Unit growth for small function inlining: 61->129 (111%)
>>
>> ...
>>
>> Any hint how to block the IPA inlining?
> 
> I guess you only want to make cold part of split_me bigger or
> just use --param to reduce growth for auto inlining.
> 
> How does the patch reduce split_me_part considerably?
> Honza

Well, I probably misread the test results; the test works fine.

I'm going to install the patch.

Martin

>>
>> Sending new version of patch.
>> Martin
>>
>>>
>>> I would just move pass_strip_predict_hints pre-IPA and not worry about
>>> them chaining.
>>>
>>> There is problem that after inlining the prediction may expand its scope
>>> and predict a branch that is outside of the original function body,
>>> but I do not see very easy solution for that besides not re-doing
>>> prediction (we could also copy probabilities from the inlined function
>>> when they exist and honor them in the outer function. I am not sure
>>> that is going to improve prediction quality though - extra context
>>> is probably useful)
>>>
>>> Thanks,
>>> Honza

 Thanks,
 Martin

>
> Where did you found this case?
> Honza
>>  
>>/* Create a new deep copy of the statement.  */
>>copy = gimple_copy (stmt);
>> -- 
>> 2.13.0
>>
>>
> 
>> >From 84625a782add6ae2ed29630815b61b34a052770a Mon Sep 17 00:00:00 2001
>> From: marxin 
>> Date: Tue, 6 Jun 2017 10:55:18 +0200
>> Subject: [PATCH 1/2] Make early return predictor more precise.
>>
>> gcc/ChangeLog:
>>
>> 2017-05-26  Martin Liska  
>>
>>  PR tree-optimization/79489
>>  * gimplify.c (maybe_add_early_return_predict_stmt): New
>>  function.
>>  (gimplify_return_expr): Call the function.
>>  * predict.c (tree_estimate_probability_bb): Remove handling
>>  of early return.
>>  * predict.def: Update comment about early return predictor.
>>  * gimple-predict.h (is_gimple_predict): New function.
>>  * predict.def: Change default value of early return to 66.
>>  * tree-tailcall.c (find_tail_calls): Skip GIMPLE_PREDICT
>>  statements.
>>  * passes.def: Put pass_strip_predict_hints to the beginning of
>>  IPA passes.
>> ---
>>  gcc/gimple-low.c |  2 ++
>>  gcc/gimple-predict.h |  8 
>>  gcc/gimplify.c   | 16 
>>  gcc/passes.def   |  1 +
>>  gcc/predict.c| 41 -
>>  gcc/predict.def  | 15 +++
>>  gcc/tree-tailcall.c  |  2 ++
>>  7 files changed, 32 insertions(+), 53 deletions(-)
>>
>> diff --git a/gcc/gimple-low.c b/gcc/gimple-low.c
>> index 619b9d7bfb1..4ea6c3532f3 100644
>> --- a/gcc/gimple-low.c
>> +++ b/gcc/gimple-low.c
>> @@ -30,6 +30,8 @@ along with GCC; see the file COPYING3.  If not see
>>  #include "calls.h"
>>  #include "gimple-iterator.h"
>>  #include "gimple-low.h"
>> +#include "predict.h"
>> +#include "gimple-predict.h"
>>  
>>  /* The differences between High GIMPLE and Low GIMPLE are the
>> following:
>> diff --git a/gcc/gimple-predict.h b/gcc/gimple-predict.h
>> index ba58e12e9e9..0e6c2e1ea01 100644
>> --- a/gcc/gimple-predict.h
>> +++ b/gcc/gimple-predict.h
>> @@ -80,4 +80,12 @@ gimple_build_predict (enum br_predictor predictor, enum 
>> prediction outcome)
>>return p;
>>  }
>>  
>> +/* Return true if GS is a GIMPLE_PREDICT statement.  */
>> +
>> +static inline bool
>> +is_gimple_predict (const gimple *gs)
>> +{
>> +  return gimple_code (gs) == GIMPLE_PREDICT;
>> +}
>> +
>>  #endif  /* GCC_GIMPLE_PREDICT_H */
>> diff --git a/gcc/gimplify.c b/gcc/gimplify.c
>> index 9af95a28704..1c6e1591953 100644
>> --- a/gcc/gimplify.c
>> +++ b/gcc/gimplify.c
>> @@ -1428,6 +1428,20 @@ gimplify_bind_expr (tree *expr_p, gimple_seq *pre_p)
>>return GS_ALL_DONE;
>>  }
>>  
>> +/* Ma

[PATCH 4/N] Recover GOTO predictor.

2017-06-21 Thread Martin Liška
Hello.

There's one additional predictor enhancement: the GOTO predictor that
used to work. The following patch adds an expect statement for the C and
C++ family of languages.

There's one fallout, the vrp24.c test-case, where 'Simplified relational'
now appears just once. Adding Richi and Patrick, who can probably help
with how to fix the test-case.

Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?
Martin
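A minimal sketch of the kind of code the GOTO predictor targets (hypothetical example, not from the patch): the branch leading to a user-written goto is marked as taken rarely, much like wrapping the condition in `__builtin_expect (cond, 0)`.

```c
/* The edge leading to the goto is predicted not taken, so the fall
   through (the common loop path) is laid out as the hot path.  */
int
first_negative (const int *a, int n)
{
  int i;
  for (i = 0; i < n; i++)
    if (a[i] < 0)       /* predicted unlikely: it leads to a goto */
      goto found;
  return -1;

found:
  return i;             /* index of the first negative element */
}
```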

;; Function sss (sss, funcdef_no=0, decl_uid=1811, cgraph_uid=0, symbol_order=0)

;; 3 loops found
;;
;; Loop 0
;;  header 0, latch 1
;;  depth 0, outer -1
;;  nodes: 0 1 2 3 4 5 6 7 8 9 10 11 12
;; 2 succs { 10 3 }
;; 3 succs { 4 8 }
;; 4 succs { 10 5 }
;; 5 succs { 12 }
;; 6 succs { 7 }
;; 7 succs { 9 }
;; 8 succs { 12 }
;; 9 succs { 12 }
;; 10 succs { 6 11 }
;; 11 succs { 9 }
;; 12 succs { 1 }

SSA form after inserting ASSERT_EXPRs
sss (struct rtx_def * insn, int code1, int code2, int code3)
{
  int D1544;
  int n_sets;
  struct rtx_def * body;
  _Bool D1562;

   [100.00%] [count: INV]:
  body_5 = insn_4(D)->u.fld[5].rt_rtx;
  D1544_6 = body_5->code;
  if (D1544_6 == 55)
goto  (L7); [34.00%] [count: INV]
  else
goto ; [66.00%] [count: INV]

   [66.00%] [count: INV]:
  if (code3_7(D) == 99)
goto ; [34.00%] [count: INV]
  else
goto ; [66.00%] [count: INV]

   [22.44%] [count: INV]:
  D1562_9 = code1_8(D) == 10;
  n_sets_10 = (int) D1562_9;
  if (n_sets_10 > 0)
goto  (L7); [64.00%] [count: INV]
  else
goto ; [36.00%] [count: INV]

   [8.10%] [count: INV]:
  # n_sets_1 = PHI <0(4)>
  goto  (L16); [100.00%] [count: INV]

   [9.79%] [count: INV]:
  arf ();

   [9.82%] [count: INV]:
  # n_sets_19 = PHI <1(6)>
  goto ; [100.00%] [count: INV]

   [43.56%] [count: INV]:
  # n_sets_21 = PHI <0(3)>
  goto  (L16); [100.00%] [count: INV]

   [46.68%] [count: INV]:
  frob ();
  goto  (L16); [100.00%] [count: INV]

L7 [48.36%] [count: INV]:
  if (code2_12(D) == 42)
goto ; [20.24%] [count: INV]
  else
goto ; [79.76%] [count: INV]

   [38.61%] [count: INV]:
  # n_sets_17 = PHI <1(10)>
  goto ; [100.00%] [count: INV]

L16 [100.00%] [count: INV]:
  return;

}


Immediate_uses: 

n_sets_1 : --> no uses.

.MEM_2 : --> single use.
# VUSE <.MEM_2>
return;

.MEM_3(D) : -->6 uses.
.MEM_22 = PHI <.MEM_3(D)(3)>
.MEM_18 = PHI <.MEM_3(D)(10)>
.MEM_16 = PHI <.MEM_3(D)(4)>
# .MEM_13 = VDEF <.MEM_3(D)>
arf ();
# VUSE <.MEM_3(D)>
D1544_6 = body_5->code;
# VUSE <.MEM_3(D)>
body_5 = insn_4(D)->u.fld[5].rt_rtx;

insn_4(D) : --> single use.
body_5 = insn_4(D)->u.fld[5].rt_rtx;

body_5 : --> single use.
D1544_6 = body_5->code;

D1544_6 : --> single use.
if (D1544_6 == 55)

code3_7(D) : --> single use.
if (code3_7(D) == 99)

code1_8(D) : --> single use.
D1562_9 = code1_8(D) == 10;

D1562_9 : --> single use.
n_sets_10 = (int) D1562_9;

n_sets_10 : --> single use.
if (n_sets_10 > 0)

code2_12(D) : --> single use.
if (code2_12(D) == 42)

.MEM_13 : --> single use.
.MEM_20 = PHI <.MEM_13(6)>

.MEM_14 : --> single use.
.MEM_2 = PHI <.MEM_14(9), .MEM_22(8), .MEM_16(5)>

.MEM_16 : --> single use.
.MEM_2 = PHI <.MEM_14(9), .MEM_22(8), .MEM_16(5)>

n_sets_17 : --> no uses.

.MEM_18 : --> single use.
.MEM_23 = PHI <.MEM_20(7), .MEM_18(11)>

n_sets_19 : --> no uses.

.MEM_20 : --> single use.
.MEM_23 = PHI <.MEM_20(7), .MEM_18(11)>

n_sets_21 : --> no uses.

.MEM_22 : --> single use.
.MEM_2 = PHI <.MEM_14(9), .MEM_22(8), .MEM_16(5)>

.MEM_23 : --> single use.
# .MEM_14 = VDEF <.MEM_23>
frob ();

Adding destination of edge (0 -> 2) to worklist

Simulating block 2

Visiting statement:
if (D1544_6 == 55)

Visiting conditional with predicate: if (D1544_6 == 55)

With known ranges
D1544_6: VARYING

Predicate evaluates to: DON'T KNOW
Adding destination of edge (2 -> 10) to worklist
Adding destination of edge (2 -> 3) to worklist

Simulating block 3

Visiting statement:
if (code3_7(D) == 99)

Visiting conditional with predicate: if (code3_7(D) == 99)

With known ranges
code3_7(D): []

Predicate evaluates to: DON'T KNOW
Adding destination of edge (3 -> 4) to worklist
Adding destination of edge (3 -> 8) to worklist

Simulating block 8

Visiting PHI node: n_sets_21 = PHI <0(3)>
Argument #0 (3 -> 8 executable)
0: [0, 0]
Intersecting
  [0, 0]
and
  [0, 1]
to
  [0, 0]
Found new range for n_sets_21: [0, 0]
marking stmt to be not simulated again
Adding destination of edge (8 -> 12) to worklist

Simulating block 4

Visiting statement:
D1562_9 = code1_8(D) == 10;
Intersecting
  [0, +INF]
and
  [0, +INF]
to
  [0, +INF]
Found new range for D1562_9: [0, +INF]
marking stmt to be not simulated again

Visiting statement:
n_sets_10 = (int) D1562_9;
Intersecting
  [0, 1]
and
  [0, 1]
to
  [0, 1]
Found new range for n_sets_10: [0, 1]
marking stmt to be not simulated again

Visiting statement:
if (n_sets_10 > 0)

Visiting conditional with predicate: if (n_sets_10 > 0)

With known ranges
n_sets_10: [0, 1]

Predicate evaluates to: DON'T KNOW
Adding destination of edge (4 ->

Re: [PATCH 4/N] Recover GOTO predictor.

2017-06-21 Thread Martin Liška
On 06/21/2017 03:06 PM, Martin Liška wrote:
> Hello.
> 
> There's one additional predictor enhancement: the GOTO predictor that
> used to work. The following patch adds an expect statement for the C and
> C++ family of languages.
> 
> There's one fallout, the vrp24.c test-case, where 'Simplified relational'
> now appears just once. Adding Richi and Patrick, who can probably help
> with how to fix the test-case.
> 
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
> 
> Ready to be installed?
> Martin
> 

And I forgot to mention hitrate on SPEC2017:

HEURISTICS   BRANCHES  (REL)  BR. HITRATE   HITRATE   COVERAGE  COVERAGE  (REL)  predict.def  (REL)
goto  622   1.0%   64.31%   65.92% /  83.70%  725127790  725.13M   0.1%

Which says it's quite a rare predictor, but with a quite nice hitrate.

Martin


Re: [PATCH, testsuite] Add effective target stack_size

2017-06-21 Thread Jakub Jelinek
On Fri, Jun 09, 2017 at 04:24:30PM +0200, Tom de Vries wrote:
>   * gcc.dg/tree-prof/comp-goto-1.c: Same.
>   * gcc.dg/tree-prof/pr44777.c: Same.

> --- a/gcc/testsuite/gcc.dg/tree-prof/comp-goto-1.c
> +++ b/gcc/testsuite/gcc.dg/tree-prof/comp-goto-1.c
> @@ -1,6 +1,8 @@
>  /* { dg-require-effective-target freorder } */
>  /* { dg-require-effective-target label_values } */
>  /* { dg-options "-O2 -freorder-blocks-and-partition" } */
> +/* { dg-additional-options "-DSTACK_SIZE=[dg-effective-target-value stack_size]" { target { stack_size } } } */
> +
>  #include 
>  
>  #if (!defined(STACK_SIZE) || STACK_SIZE >= 4000) && __INT_MAX__ >= 2147483647
> diff --git a/gcc/testsuite/gcc.dg/tree-prof/pr44777.c 
> b/gcc/testsuite/gcc.dg/tree-prof/pr44777.c
> index 4074b75..1249b5b 100644
> --- a/gcc/testsuite/gcc.dg/tree-prof/pr44777.c
> +++ b/gcc/testsuite/gcc.dg/tree-prof/pr44777.c
> @@ -2,6 +2,8 @@
>  /* { dg-require-effective-target label_values } */
>  /* { dg-require-effective-target trampolines } */
>  /* { dg-options "-O0" } */
> +/* { dg-additional-options "-DSTACK_SIZE=[dg-effective-target-value stack_size]" { target { stack_size } } } */
> +
>  /* A variant of gcc.c-torture/execute/comp-goto-2.c.  */
>  
>  extern void abort (void);

I'm now seeing
WARNING: profopt.exp does not support dg-add-options
WARNING: profopt.exp does not support dg-add-options
so the above doesn't look correct.

Jakub


[PATCH] [SPARC] Add a workaround for the LEON3FT store-store errata

2017-06-21 Thread Daniel Cederman
Hello all,

I have modified the patch so that the workaround is enabled by using
either -mfix-ut699, -mfix-ut700, or -mfix-gr712rc.

Daniel

-

This patch adds a workaround to the Sparc backend for the LEON3FT
store-store errata. It is enabled when using the -mfix-ut699,
-mfix-ut700, or -mfix-gr712rc flag.

The workaround inserts NOP instructions to prevent the following two
instruction sequences from being generated:

std -> stb/sth/st/std
stb/sth/st -> any single non-store/load instruction -> stb/sth/st/std

The __FIX_B2BST define can be used to only enable workarounds in assembly
code when the flag is used.

See GRLIB-TN-0009, "LEON3FT Stale Cache Entry After Store with Data Tag
Parity Error", for more information.
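A small C sketch (hypothetical, not from the patch) of source that can produce the forbidden store sequence on 32-bit SPARC:

```c
/* On 32-bit SPARC a 64-bit store can be emitted as a single "std"; if
   a byte store ("stb") immediately follows, the sequence matches the
   errata pattern.  With -mfix-ut699, -mfix-ut700, or -mfix-gr712rc the
   backend would insert a NOP between the two stores.  */
void
back_to_back_stores (long long *dword, char *byte)
{
  *dword = 0;   /* may become: std %g2, [%o0] */
  *byte = 1;    /* stb must not immediately follow the std */
}
```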

gcc/ChangeLog:

2017-06-21  Daniel Cederman  

* config/sparc/sparc.c (sparc_do_work_around_errata): Insert NOP
instructions to prevent sequences that can trigger the store-store
errata for certain LEON3FT processors.
(sparc_option_override): -mfix-ut699, -mfix-ut700, and
-mfix-gr712rc enables the errata workaround.
* config/sparc/sparc-c.c (sparc_target_macros): Define __FIX_B2BST
when errata workaround is enabled.
* config/sparc/sparc.md: Prevent stores in delay slot.
* config/sparc/sparc.opt: Add -mfix-ut700 and -mfix-gr712rc flag.
* doc/invoke.texi: Document -mfix-ut700 and -mfix-gr712rc flag.
---
 gcc/config/sparc/sparc-c.c |   3 ++
 gcc/config/sparc/sparc.c   | 115 -
 gcc/config/sparc/sparc.md  |  10 +++-
 gcc/config/sparc/sparc.opt |  12 +
 gcc/doc/invoke.texi|  14 +-
 5 files changed, 148 insertions(+), 6 deletions(-)

diff --git a/gcc/config/sparc/sparc-c.c b/gcc/config/sparc/sparc-c.c
index 9603173..6979f9c 100644
--- a/gcc/config/sparc/sparc-c.c
+++ b/gcc/config/sparc/sparc-c.c
@@ -60,4 +60,7 @@ sparc_target_macros (void)
   cpp_define (parse_in, "__VIS__=0x100");
   cpp_define (parse_in, "__VIS=0x100");
 }
+
+  if (sparc_fix_b2bst)
+builtin_define_std ("__FIX_B2BST");
 }
diff --git a/gcc/config/sparc/sparc.c b/gcc/config/sparc/sparc.c
index 790a036..6d6c941 100644
--- a/gcc/config/sparc/sparc.c
+++ b/gcc/config/sparc/sparc.c
@@ -896,6 +896,12 @@ mem_ref (rtx x)
to properly detect the various hazards.  Therefore, this machine specific
pass runs as late as possible.  */
 
+/* True if INSN is a md pattern or asm statement.  */
+#define USEFUL_INSN_P(INSN)\
+  (NONDEBUG_INSN_P (INSN)  \
+   && GET_CODE (PATTERN (INSN)) != USE \
+   && GET_CODE (PATTERN (INSN)) != CLOBBER)
+
 static unsigned int
 sparc_do_work_around_errata (void)
 {
@@ -915,6 +921,98 @@ sparc_do_work_around_errata (void)
if (rtx_sequence *seq = dyn_cast  (PATTERN (insn)))
  insn = seq->insn (1);
 
+  /* Look for a double-word store.  */
+  if (sparc_fix_b2bst
+ && NONJUMP_INSN_P (insn)
+ && (set = single_set (insn)) != NULL_RTX
+ && GET_MODE_SIZE (GET_MODE (SET_DEST (set))) == 8
+ && MEM_P (SET_DEST (set)))
+   {
+ next = next_active_insn (insn);
+ if (!next)
+   break;
+
+ /* Skip empty assembly statements.  */
+ if ((GET_CODE (PATTERN (next)) == UNSPEC_VOLATILE)
+ || (USEFUL_INSN_P (next)
+ && (asm_noperands (PATTERN (next))>=0)
+ && !strcmp (decode_asm_operands (PATTERN (next),
+  NULL, NULL, NULL,
+  NULL, NULL), "")))
+   next = next_active_insn (next);
+ if (!next)
+   break;
+
+ /* If the insn is a branch, then it cannot be problematic.  */
+ if (!NONJUMP_INSN_P (next) || GET_CODE (PATTERN (next)) == SEQUENCE)
+   continue;
+
+ if ((set = single_set (next)) == NULL_RTX)
+   continue;
+
+ /* Add NOP if double-word store is followed by any type of store.  */
+ if (MEM_P (SET_DEST (set)))
+   insert_nop = true;
+   }
+  else
+  /* Look for single-word, half-word, or byte store.  */
+  if (sparc_fix_b2bst
+ && NONJUMP_INSN_P (insn)
+ && (set = single_set (insn)) != NULL_RTX
+ && GET_MODE_SIZE (GET_MODE (SET_DEST (set))) <= 4
+ && MEM_P (SET_DEST (set)))
+   {
+ rtx_insn *after;
+
+ next = next_active_insn (insn);
+ if (!next)
+   break;
+
+ /* Skip empty assembly statements.  */
+ if ((GET_CODE (PATTERN (next)) == UNSPEC_VOLATILE)
+ || (USEFUL_INSN_P (next)
+ && (asm_noperands (PATTERN (next))>=0)
+ && !strcmp (decode_asm_operands (PATTERN (next),
+  NULL, NULL, NULL,
+  NULL, NULL), "")))
+   next = ne

[PATCH][AArch64] Fix atomic_cmp_exchange_zero_reg_1.c with +lse

2017-06-21 Thread Kyrill Tkachov

Hi all,

As Andrew pointed out, the patch at r248921 
(https://gcc.gnu.org/ml/gcc-patches/2017-02/msg01648.html)
that allowed const0_rtx as an argument to the compare-exchange
patterns was incomplete. It didn't extend the TARGET_LSE patterns as well, 
causing the expander to generate
an invalid pattern that the insn_and_split and define_insns didn't accept. This 
patch extends them as well
to allow aarch64_reg_or_zero rather than just register_operand in the operand 
they're comparing against.

With this patch the testcase compiles successfully with +lse, generating a "casa w1, wzr, [x0]".

Bootstrapped and tested on aarch64-none-linux-gnu.

Ok for trunk?

Thanks,
Kyrill

2017-06-21  Kyrylo Tkachov  

* config/aarch64/atomics.md (aarch64_compare_and_swap_lse,
SHORT): Relax operand 3 to aarch64_reg_or_zero and constraint to Z.
(aarch64_compare_and_swap_lse, GPI): Likewise.
(aarch64_atomic_cas, SHORT): Likewise for operand 2.
(aarch64_atomic_cas, GPI): Likewise.
diff --git a/gcc/config/aarch64/atomics.md b/gcc/config/aarch64/atomics.md
index 27fc193..32b7169 100644
--- a/gcc/config/aarch64/atomics.md
+++ b/gcc/config/aarch64/atomics.md
@@ -94,7 +94,7 @@
(set (match_dup 1)
 (unspec_volatile:SHORT
   [(match_operand:SI 2 "aarch64_plus_operand" "rI")	;; expected
-   (match_operand:SHORT 3 "register_operand" "r")	;; desired
+   (match_operand:SHORT 3 "aarch64_reg_or_zero" "rZ")	;; desired
(match_operand:SI 4 "const_int_operand")		;; is_weak
(match_operand:SI 5 "const_int_operand")		;; mod_s
(match_operand:SI 6 "const_int_operand")]	;; mod_f
@@ -119,7 +119,7 @@
(set (match_dup 1)
 (unspec_volatile:GPI
   [(match_operand:GPI 2 "aarch64_plus_operand" "rI")	;; expect
-   (match_operand:GPI 3 "register_operand" "r")		;; desired
+   (match_operand:GPI 3 "aarch64_reg_or_zero" "rZ")		;; desired
(match_operand:SI 4 "const_int_operand")			;; is_weak
(match_operand:SI 5 "const_int_operand")			;; mod_s
(match_operand:SI 6 "const_int_operand")]		;; mod_f
@@ -616,7 +616,7 @@
   (set (match_dup 1)
(unspec_volatile:SHORT
 [(match_dup 0)
- (match_operand:SHORT 2 "register_operand" "r")	;; value.
+ (match_operand:SHORT 2 "aarch64_reg_or_zero" "rZ")	;; value.
  (match_operand:SI 3 "const_int_operand" "")]	;; model.
 UNSPECV_ATOMIC_CAS))]
  "TARGET_LSE && reload_completed"
@@ -640,7 +640,7 @@
   (set (match_dup 1)
(unspec_volatile:GPI
 [(match_dup 0)
- (match_operand:GPI 2 "register_operand" "r")	;; value.
+ (match_operand:GPI 2 "aarch64_reg_or_zero" "rZ")	;; value.
  (match_operand:SI 3 "const_int_operand" "")]	;; model.
 UNSPECV_ATOMIC_CAS))]
   "TARGET_LSE && reload_completed"
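A sketch of the kind of source the fixed testcase covers (hypothetical example, not the actual atomic_cmp_exchange_zero_reg_1.c): compare-exchanging in the constant 0 is what lets the +lse backend use the zero register, as in the "casa w1, wzr, [x0]" mentioned above, instead of first materialising 0 in a general register.

```c
#include <stdatomic.h>

/* Swap in the constant 0 when *p still holds the expected value;
   returns nonzero on success.  */
int
cas_store_zero (atomic_int *p, int expected)
{
  return atomic_compare_exchange_strong (p, &expected, 0);
}
```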


[PATCH] Implement cond and induction cond reduction w/o REDUC_MAX_EXPR

2017-06-21 Thread Richard Biener

During my attempt to refactor reduction vectorization I ran across
the special casing of initial values for INTEGER_INDUC_COND_REDUCTION
and tried to see what it is about.  So I ended up implementing
cond reduction support for targets w/o REDUC_MAX_EXPR by simply
doing the reduction in scalar code -- while that results in an
expensive epilogue the vector loop should be reasonably fast.

I still didn't run into any exec FAILs in vect.exp with removing
the INTEGER_INDUC_COND_REDUCTION special case thus the following
patch.

Alan -- is there a testcase (maybe full bootstrap & regtest will
uncover one) that shows how this is necessary?

Bootstrap and regtest running on x86_64-unknown-linux-gnu, testing
on arm appreciated.

Thanks,
Richard.

2017-06-21  Richard Biener  

* tree-vect-loop.c (vect_model_reduction_cost): Handle
COND_REDUCTION and INTEGER_INDUC_COND_REDUCTION without
REDUC_MAX_EXPR support.
(vectorizable_reduction): Likewise.
(vect_create_epilog_for_reduction): Remove special case of
INTEGER_INDUC_COND_REDUCTION initial value.
(vect_create_epilog_for_reduction): Handle COND_REDUCTION
and INTEGER_INDUC_COND_REDUCTION without REDUC_MAX_EXPR support.
Remove compensation code for initial value special handling
of INTEGER_INDUC_COND_REDUCTION.

* gcc.dg/vect/pr65947-1.c: Remove xfail.
* gcc.dg/vect/pr65947-2.c: Likewise.
* gcc.dg/vect/pr65947-3.c: Likewise.
* gcc.dg/vect/pr65947-4.c: Likewise.
* gcc.dg/vect/pr65947-5.c: Likewise.
* gcc.dg/vect/pr65947-6.c: Likewise.
* gcc.dg/vect/pr65947-8.c: Likewise.
* gcc.dg/vect/pr65947-9.c: Likewise.

Index: gcc/testsuite/gcc.dg/vect/pr65947-1.c
===
--- gcc/testsuite/gcc.dg/vect/pr65947-1.c   (revision 249446)
+++ gcc/testsuite/gcc.dg/vect/pr65947-1.c   (working copy)
@@ -40,5 +40,5 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" { xfail { ! vect_max_reduc } } } } */
-/* { dg-final { scan-tree-dump-times "condition expression based on integer induction." 4 "vect" { xfail { ! vect_max_reduc } } } } */
+/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" } } */
+/* { dg-final { scan-tree-dump-times "condition expression based on integer induction." 4 "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/pr65947-2.c
===
--- gcc/testsuite/gcc.dg/vect/pr65947-2.c   (revision 249446)
+++ gcc/testsuite/gcc.dg/vect/pr65947-2.c   (working copy)
@@ -41,5 +41,5 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" { xfail { ! vect_max_reduc } } } } */
+/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" } } */
 /* { dg-final { scan-tree-dump-not "condition expression based on integer induction." "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/pr65947-3.c
===
--- gcc/testsuite/gcc.dg/vect/pr65947-3.c   (revision 249446)
+++ gcc/testsuite/gcc.dg/vect/pr65947-3.c   (working copy)
@@ -51,5 +51,5 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" { xfail { ! vect_max_reduc } } } } */
+/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" } } */
 /* { dg-final { scan-tree-dump-not "condition expression based on integer induction." "vect" } } */
Index: gcc/testsuite/gcc.dg/vect/pr65947-4.c
===
--- gcc/testsuite/gcc.dg/vect/pr65947-4.c   (revision 249446)
+++ gcc/testsuite/gcc.dg/vect/pr65947-4.c   (working copy)
@@ -40,6 +40,6 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" { xfail { ! vect_max_reduc } } } } */
-/* { dg-final { scan-tree-dump-times "condition expression based on integer induction." 4 "vect" { xfail { ! vect_max_reduc } } } } */
+/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 2 "vect" } } */
+/* { dg-final { scan-tree-dump-times "condition expression based on integer induction." 4 "vect" } } */
 
Index: gcc/testsuite/gcc.dg/vect/pr65947-5.c
===
--- gcc/testsuite/gcc.dg/vect/pr65947-5.c   (revision 249446)
+++ gcc/testsuite/gcc.dg/vect/pr65947-5.c   (working copy)
@@ -41,6 +41,6 @@ main (void)
   return 0;
 }
 
-/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 1 "vect" { xfail { ! vect_max_reduc } } } } */
-/* { dg-final { scan-tree-dump "loop size is greater than data size" "vect" { xfail { ! vect_max_reduc } } } } */
+/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED" 1 "vect" } } */
+/* { dg-final { scan-tree-dump "loop size is greater than data size" "vect" } } */
 /* { dg-final { scan-tree-dump-not "condition expression base

[PATCH] v3: C/C++: fix quoting of "aka" typedef information (PR 62170)

2017-06-21 Thread David Malcolm
On Tue, 2017-06-20 at 15:11 -0400, Jason Merrill wrote:
> On Tue, Jun 20, 2017 at 3:06 PM, David Malcolm 
> wrote:
> > It's not clear to me what the issue alluded to with negative
> > obstack_blank is, but I chose to follow the above docs and use
> > obstack_blank_fast; am testing an updated patch in which the above
> > line
> > now looks like:
> > 
> >   obstack_blank_fast (ob, -(type_start + type_len));
> > 
> > Is the patch OK with that change? (assuming bootstrap & regrtesting
> > pass), or should I re-post?
> 
> OK with that change.

It turns out that the resizing calculation in the above line is
wrong, but I managed not to expose my mistake in the light testing
I did before posting the above.  Oops.

Thankfully the issue showed up in g++.dg/other/error23.C when I
ran the full test suite.

Sorry about that.

I've updated the shrinking calculation to look like this:

  int delta = type_start + type_len - obstack_object_size (ob);
  gcc_assert (delta <= 0);
  obstack_blank_fast (ob, delta);

and with that, it successfully bootstrapped & regression-tested on
x86_64-pc-linux-gnu.

OK for trunk?
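The shrinking idiom in the fix can be illustrated standalone with the GNU obstack API (a sketch under the assumption of glibc's `<obstack.h>`, not GCC code; `shrink_growing_object` is a hypothetical helper):

```c
#include <stdlib.h>
#include <obstack.h>

#define obstack_chunk_alloc malloc
#define obstack_chunk_free free

/* obstack_blank_fast with a negative size shrinks the growing object
   without any bounds checking, so the caller must ensure the delta is
   non-positive -- the same invariant the gcc_assert above enforces.  */
size_t
shrink_growing_object (struct obstack *ob, size_t keep)
{
  int delta = (int) keep - (int) obstack_object_size (ob);
  if (delta > 0)
    abort ();                   /* only shrinking is allowed here */
  obstack_blank_fast (ob, delta);
  return obstack_object_size (ob);
}
```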

Full patch follows, for reference:

gcc/c/ChangeLog:
PR c++/62170
* c-objc-common.c (c_tree_printer): Convert penultimate param from
bool to bool *.  Within '%T' handling, if showing an "aka", use
"quoted" param to add appropriate quoting.

gcc/cp/ChangeLog:
PR c++/62170
* error.c (type_to_string): Add leading comment.  Add params
"postprocessed", "quote", and "show_color", using them to fix
quoting of the "aka" for types involving typedefs.
(arg_to_string): Update for new params to type_to_string.
(cxx_format_postprocessor::handle): Likewise.
(cp_printer): Convert penultimate param from bool to bool *.
Update call to type_to_string and calls to
defer_phase_2_of_type_diff.

gcc/fortran/ChangeLog:
PR c++/62170
* error.c (gfc_notify_std): Convert "quoted" param from bool to
bool *.

gcc/ChangeLog:
PR c++/62170
* pretty-print.c (pp_format): Move quoting implementation to
pp_begin_quote and pp_end_quote.  Update pp_format_decoder call
to pass address of "quote" local.
(pp_begin_quote): New function.
(pp_end_quote): New function.
* pretty-print.h (printer_fn): Convert penultimate param from bool
to bool *.
(pp_begin_quote): New decl.
(pp_end_quote): New decl.
* tree-diagnostic.c (default_tree_printer): Convert penultimate
param from bool to bool *.
* tree-diagnostic.h (default_tree_printer): Likewise.

gcc/testsuite/ChangeLog:
PR c++/62170
* g++.dg/cpp1z/direct-enum-init1.C: Update expected error messages
to reflect fixes to quoting.
* g++.dg/diagnostic/aka1.C: Likewise.
* g++.dg/diagnostic/aka2.C: New test case.
* g++.dg/parse/error55.C: Update expected error messages to
reflect fixes to quoting.
* g++.dg/warn/pr12242.C: Likewise.
* g++.old-deja/g++.mike/enum1.C: Likewise.
* gcc.dg/diag-aka-1.c: Likewise.
* gcc.dg/diag-aka-2.c: New test case.
* gcc.dg/pr13804-1.c: Update expected error messages to reflect
fixes to quoting.
* gcc.dg/pr56980.c: Likewise.
* gcc.dg/pr65050.c: Likewise.
* gcc.dg/redecl-14.c: Likewise.
* gcc.dg/utf16-4.c: Likewise.
* gcc.target/i386/sse-vect-types.c (__m128d): Likewise.
* obj-c++.dg/invalid-type-1.mm: Likewise.
---
 gcc/c/c-objc-common.c  |  12 +-
 gcc/cp/error.c |  94 --
 gcc/fortran/error.c|   2 +-
 gcc/pretty-print.c |  37 +++-
 gcc/pretty-print.h |   5 +-
 gcc/testsuite/g++.dg/cpp1z/direct-enum-init1.C | 234 -
 gcc/testsuite/g++.dg/diagnostic/aka1.C |   2 +-
 gcc/testsuite/g++.dg/diagnostic/aka2.C |  32 
 gcc/testsuite/g++.dg/parse/error55.C   |   2 +-
 gcc/testsuite/g++.dg/warn/pr12242.C|  16 +-
 gcc/testsuite/g++.old-deja/g++.mike/enum1.C|   2 +-
 gcc/testsuite/gcc.dg/diag-aka-1.c  |   4 +-
 gcc/testsuite/gcc.dg/diag-aka-2.c  |  12 ++
 gcc/testsuite/gcc.dg/pr13804-1.c   |   4 +-
 gcc/testsuite/gcc.dg/pr56980.c |  12 +-
 gcc/testsuite/gcc.dg/pr65050.c |   8 +-
 gcc/testsuite/gcc.dg/redecl-14.c   |   2 +-
 gcc/testsuite/gcc.dg/utf16-4.c |   2 +-
 gcc/testsuite/gcc.target/i386/sse-vect-types.c |   2 +-
 gcc/testsuite/obj-c++.dg/invalid-type-1.mm |   4 +-
 gcc/tree-diagnostic.c  |   2 +-
 gcc/tree-diagnostic.h  |   2 +-
 22 files changed, 314 insertions(+), 178 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/diagnostic/aka2

RE: [PATCH] [i386] Enable Control-flow Enforcement Technology (CET).

2017-06-21 Thread Tsimbalist, Igor V
Thanks for the feedback. I'll redo the patch according to your comments.

Igor

-Original Message-
From: Joseph Myers [mailto:jos...@codesourcery.com] 
Sent: Tuesday, June 20, 2017 11:30 PM
To: Tsimbalist, Igor V 
Cc: gcc-patches@gcc.gnu.org; Chupin, Pavel V 
Subject: Re: [PATCH] [i386] Enable Control-flow Enforcement Technology (CET).
Importance: High

This patch is completely missing documentation (in *.texi files) of the new 
options, attribute, built-in functions etc.

You appear to be adding quite target-specific things to the 
architecture-independent compiler.  If the attribute, for example, is to be 
architecture-independent, the documentation needs to define semantics for it 
that make sense on any architecture (or any architecture providing such 
features), not just on x86; the patch submission needs to justify the design 
choices of what is target-specific and what is target-independent.  
(Cf. MPX where there are various architecture-independent features for which a 
software implementation would be logically possible, although actually the only 
implementation of those features in GCC is for MPX
hardware.)  I don't think this patch would even build for non-x86 targets, 
because you're putting completely x86-specific references such as TARGET_CET 
and gen_nop_endbr in target-independent files.

--
Joseph S. Myers
jos...@codesourcery.com


[testsuite, committed] Fix warning in gcc.dg/tree-prof/comp-goto-1.c

2017-06-21 Thread Tom de Vries

Hi,

this patch fixes a 'return type defaults to int' warning in test-case 
gcc.dg/tree-prof/comp-goto-1.c.


Committed as obvious.

Thanks,
- Tom
Fix warning in gcc.dg/tree-prof/comp-goto-1.c

2017-06-21  Tom de Vries  

	* gcc.dg/tree-prof/comp-goto-1.c: Fix 'return type defaults to int'
	warning.

---
 gcc/testsuite/gcc.dg/tree-prof/comp-goto-1.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/testsuite/gcc.dg/tree-prof/comp-goto-1.c b/gcc/testsuite/gcc.dg/tree-prof/comp-goto-1.c
index bb1d120..fe768f9 100644
--- a/gcc/testsuite/gcc.dg/tree-prof/comp-goto-1.c
+++ b/gcc/testsuite/gcc.dg/tree-prof/comp-goto-1.c
@@ -165,5 +165,5 @@ main ()
   exit (0);
 }
 #else
-main(){ exit (0); }
+int main(){ exit (0); }
 #endif


[testsuite, committed] Support dg-add-options in profopt.exp

2017-06-21 Thread Tom de Vries

[ was: Re: [PATCH, testsuite] Add effective target stack_size ]

On 06/21/2017 03:19 PM, Jakub Jelinek wrote:

On Fri, Jun 09, 2017 at 04:24:30PM +0200, Tom de Vries wrote:

* gcc.dg/tree-prof/comp-goto-1.c: Same.
* gcc.dg/tree-prof/pr44777.c: Same.



--- a/gcc/testsuite/gcc.dg/tree-prof/comp-goto-1.c
+++ b/gcc/testsuite/gcc.dg/tree-prof/comp-goto-1.c
@@ -1,6 +1,8 @@
  /* { dg-require-effective-target freorder } */
  /* { dg-require-effective-target label_values } */
  /* { dg-options "-O2 -freorder-blocks-and-partition" } */
+/* { dg-additional-options "-DSTACK_SIZE=[dg-effective-target-value stack_size]" { target { stack_size } } } */
+
  #include 
  
  #if (!defined(STACK_SIZE) || STACK_SIZE >= 4000) && __INT_MAX__ >= 2147483647

diff --git a/gcc/testsuite/gcc.dg/tree-prof/pr44777.c 
b/gcc/testsuite/gcc.dg/tree-prof/pr44777.c
index 4074b75..1249b5b 100644
--- a/gcc/testsuite/gcc.dg/tree-prof/pr44777.c
+++ b/gcc/testsuite/gcc.dg/tree-prof/pr44777.c
@@ -2,6 +2,8 @@
  /* { dg-require-effective-target label_values } */
  /* { dg-require-effective-target trampolines } */
  /* { dg-options "-O0" } */
+/* { dg-additional-options "-DSTACK_SIZE=[dg-effective-target-value stack_size]" { target { stack_size } } } */
+
  /* A variant of gcc.c-torture/execute/comp-goto-2.c.  */
  
  extern void abort (void);


I'm now seeing
WARNING: profopt.exp does not support dg-add-options
WARNING: profopt.exp does not support dg-add-options
so the above doesn't look correct.


Thanks for pointing that out. I've double-checked, and the warnings did 
appear for me in the testlogs, but my automated test-comparison 
reporting did not pick up on that, I'll have to fix that.


Fixed the warnings by adding dg-add-options support in profopt-get-options.

Tested both test-cases on x86_64.

Tested both test-cases on x86_64 with effective target stack_size set to 
0, and verified that -DSTACK_SIZE=0 was passed.


Committed as obvious.

Thanks,
- Tom
Support dg-add-options in profopt.exp

2017-06-21  Tom de Vries  

	* lib/profopt.exp (profopt-get-options): Support dg-add-options.

---
 gcc/testsuite/lib/profopt.exp | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/lib/profopt.exp b/gcc/testsuite/lib/profopt.exp
index 6519c44..0ea8e7a 100644
--- a/gcc/testsuite/lib/profopt.exp
+++ b/gcc/testsuite/lib/profopt.exp
@@ -249,6 +249,7 @@ proc profopt-get-options { src } {
 	set cmd [lindex $op 0]
 	if { ![string compare "dg-options" $cmd] \
 	 || ![string compare "dg-additional-options" $cmd] \
+	 || ![string compare "dg-add-options" $cmd] \
 	 || ![string compare "dg-skip-if" $cmd] \
 	 || ![string compare "dg-final-generate" $cmd] \
 	 || ![string compare "dg-final-use" $cmd] \


[Patch AArch64 / libstdc++] Update baseline symbols for aarch64-none-linux-gnu

2017-06-21 Thread Ramana Radhakrishnan
Regenerate symbols file for aarch64-none-linux-gnu. Tested with make 
check in libstdc++ and eyeballing outputs.


Applied as obvious.

Tested on aarch64-none-linux-gnu with no issues in libstdc++ tests.

regards
Ramana



* config/abi/post/aarch64-linux-gnu/baseline_symbols.txt: Regenerate
diff --git 
a/libstdc++-v3/config/abi/post/aarch64-linux-gnu/baseline_symbols.txt 
b/libstdc++-v3/config/abi/post/aarch64-linux-gnu/baseline_symbols.txt
index b444b4a..d942ba6 100644
--- a/libstdc++-v3/config/abi/post/aarch64-linux-gnu/baseline_symbols.txt
+++ b/libstdc++-v3/config/abi/post/aarch64-linux-gnu/baseline_symbols.txt
@@ -444,6 +444,7 @@ 
FUNC:_ZNKSt13basic_fstreamIwSt11char_traitsIwEE7is_openEv@GLIBCXX_3.4
 FUNC:_ZNKSt13basic_istreamIwSt11char_traitsIwEE6gcountEv@@GLIBCXX_3.4
 FUNC:_ZNKSt13basic_istreamIwSt11char_traitsIwEE6sentrycvbEv@@GLIBCXX_3.4
 FUNC:_ZNKSt13basic_ostreamIwSt11char_traitsIwEE6sentrycvbEv@@GLIBCXX_3.4
+FUNC:_ZNKSt13random_device13_M_getentropyEv@@GLIBCXX_3.4.25
 FUNC:_ZNKSt13runtime_error4whatEv@@GLIBCXX_3.4
 FUNC:_ZNKSt14basic_ifstreamIcSt11char_traitsIcEE5rdbufEv@@GLIBCXX_3.4
 FUNC:_ZNKSt14basic_ifstreamIcSt11char_traitsIcEE7is_openEv@@GLIBCXX_3.4.5
@@ -4003,6 +4004,8 @@ OBJECT:0:GLIBCXX_3.4.20
 OBJECT:0:GLIBCXX_3.4.21
 OBJECT:0:GLIBCXX_3.4.22
 OBJECT:0:GLIBCXX_3.4.23
+OBJECT:0:GLIBCXX_3.4.24
+OBJECT:0:GLIBCXX_3.4.25
 OBJECT:0:GLIBCXX_3.4.3
 OBJECT:0:GLIBCXX_3.4.4
 OBJECT:0:GLIBCXX_3.4.5


Re: [PATCH, rs6000] Add vec_reve support

2017-06-21 Thread Segher Boessenkool
Hi Carl,

On Tue, Jun 20, 2017 at 06:23:33PM -0700, Carl Love wrote:
>   * config/rs6000/rs6000-builtin.def (VREVE_V2DI, VREVE_V4SI,
>   VREVE_V8HI, VREVE_V16QI, VREVE_V2DF, VREVE_V4SF, VREVE): New

"New." or "New builtin.".

>   * config/rs6000/altivec.md (UNSPEC_VREVEV, VEC_A_size,
>   altivec_vrev): New
>   UNSPEC, new mode_attr, new patterns.

This wrapped oddly...  mail client problem?

Please put these things in separate entries.

>   * config/rs6000/altivec.h (vec_reve): New define

Dot.

>   * gcc.target/powerpc/builtins-3-vec_reve-runable.c (test_results,
>   main): Add new runnable test file for the vec_rev built-ins.

You misspelled the filename.

> diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
> index d542315..98ccfd2 100644
> --- a/gcc/config/rs6000/altivec.h
> +++ b/gcc/config/rs6000/altivec.h
> @@ -142,6 +142,7 @@
>  #define vec_madd __builtin_vec_madd
>  #define vec_madds __builtin_vec_madds
>  #define vec_mtvscr __builtin_vec_mtvscr
> +#define vec_reve   __builtin_vec_vreve
>  #define vec_vmaxfp __builtin_vec_vmaxfp
>  #define vec_vmaxsw __builtin_vec_vmaxsw
>  #define vec_vmaxsh __builtin_vec_vmaxsh

All the rest here use just a single space, please do the same.

> UNSPEC_VPACK_UNS_UNS_SAT
> UNSPEC_VPACK_UNS_UNS_MOD
> UNSPEC_VPACK_UNS_UNS_MOD_DIRECT
> +   UNSPEC_VREVEV
> UNSPEC_VSLV4SI
> UNSPEC_VSLO
> UNSPEC_VSR
> @@ -231,6 +232,11 @@
>  ;; Vector negate
>  (define_mode_iterator VNEG [V4SI V2DI])
>  
> +;; Vector reverse elements, uses define_mode_iterator VEC_A
> +;; size in bytes of the vector element
> +(define_mode_attr VEC_A_size [(V2DI "8") (V4SI "4") (V8HI "2")
> +  (V16QI "1") (V2DF "8") (V4SF "4")])

I think you want to use GET_MODE_UNIT_SIZE instead, no need for a new
attribute.

> +  size = <VEC_A_size>;

size = GET_MODE_UNIT_SIZE (mode);

> +  num_elements = 16 / size;

num_elements = GET_MODE_NUNITS (mode);

> +  for (j = num_elements-1; j >= 0; j--)
> +for (i = 0; i < size; i++)
> +  RTVEC_ELT (v, i + j*size) = gen_rtx_CONST_INT (QImode, k++);

Why does j walk backwards?  Oh, because of k++.  Write that one as
something with i and j as well?


Segher


[RFC PATCH] Fix pointer diff (was: -fsanitize=pointer-overflow support (PR sanitizer/80998))

2017-06-21 Thread Jakub Jelinek
On Tue, Jun 20, 2017 at 10:18:20AM +0200, Richard Biener wrote:
> > > > 3) not really related to this patch, but something I also saw during the
> > > > bootstrap-ubsan on i686-linux:
> > > > ../../gcc/bitmap.c:141:12: runtime error: signed integer overflow: -2147426384 - 2147475412 cannot be represented in type 'int'
> > > > ../../gcc/bitmap.c:141:12: runtime error: signed integer overflow: -2147426384 - 2147478324 cannot be represented in type 'int'
> > > > ../../gcc/bitmap.c:141:12: runtime error: signed integer overflow: -2147450216 - 2147451580 cannot be represented in type 'int'
> > > > ../../gcc/bitmap.c:141:12: runtime error: signed integer overflow: -2147450216 - 2147465664 cannot be represented in type 'int'
> > > > ../../gcc/bitmap.c:141:12: runtime error: signed integer overflow: -2147469348 - 2147451544 cannot be represented in type 'int'
> > > > ../../gcc/bitmap.c:141:12: runtime error: signed integer overflow: -2147482364 - 2147475376 cannot be represented in type 'int'
> > > > ../../gcc/bitmap.c:141:12: runtime error: signed integer overflow: -2147483624 - 2147475376 cannot be represented in type 'int'
> > > > ../../gcc/bitmap.c:141:12: runtime error: signed integer overflow: -2147483628 - 2147451544 cannot be represented in type 'int'
> > > > ../../gcc/memory-block.cc:59:4: runtime error: signed integer overflow: -2147426384 - 2147475376 cannot be represented in type 'int'
> > > > ../../gcc/memory-block.cc:59:4: runtime error: signed integer overflow: -2147450216 - 2147451544 cannot be represented in type 'int'
> > > > The problem here is that we lower pointer subtraction, e.g.
> > > > long foo (char *p, char *q) { return q - p; }
> > > > as return (ptrdiff_t) ((ssizetype) q - (ssizetype) p);
> > > > and even for a valid testcase where we have an array across
> > > > the middle of the virtual address space, say the first one above
> > > > is (char *) 0x8000dfb0 - (char *) 0x7fffdfd4 subtraction, even if
> > > > there is 128KB array starting at 0x7fffd000, it will yield
> > > > UB (not in the source, but in whatever the compiler lowered it into).
> > > > So, shall we instead do the subtraction in sizetype and only then
> > > > cast?  For sizeof (*ptr) > 1 I think we have some outstanding PR,
> > > > and it is more difficult to find out in what types to compute it.
> > > > Or do we want to introduce POINTER_DIFF_EXPR?
> > > 
> > > Just use uintptr_t for the difference computation (well, an unsigned
> > > integer type of desired precision -- mind address-spaces), then cast
> > > the result to signed.
> > 
> > Ok (of course, will handle this separately from the rest).
> 
> Yes.  Note I didn't look at the actual patch (yet).

So, I wrote following patch to do the subtraction in unsigned
type.  It passes bootstrap, but on both x86_64-linux and i686-linux
regresses:
+FAIL: gcc.dg/torture/pr66178.c   -O*  (test for excess errors)
+FAIL: gcc.dg/tree-ssa/cmpexactdiv-2.c scan-tree-dump-not optimized "minus_expr"
+FAIL: g++.dg/tree-ssa/pr21082.C  -std=gnu++* (test for excess errors)

E.g. in the first testcase we have in the test:
static uintptr_t a =  ((char *)&&l2-(char *)&&l3)+((char *)&&l1-(char *)&&l2);
Without the patch, we ended up with:
static uintptr_t a = (uintptr_t) (((long int) &l2 - (long int) &l3) + ((long int) &l1 - (long int) &l2));
but with the patch with (the negation in signed type sounds like a folding
bug), which is too difficult for the initializer_constant_valid_p* handling:
(uintptr_t) (((long unsigned int) -(long int) &l3 - (long unsigned int) &l2) + ((long unsigned int) &l2 + (long unsigned int) &l1));
Shall we just xfail that test, or make sure we don't reassociate such
subtractions, something different?

The second failure is on:
int f (long *a, long *b, long *c) {
__PTRDIFF_TYPE__ l1 = b - a;
__PTRDIFF_TYPE__ l2 = c - a;
return l1 < l2;
}
where without the patch during forwprop2 we optimize it
using match.pd:
/* X - Z < Y - Z is the same as X < Y when there is no overflow.  */
because we had:
  b.0_1 = (long int) b_8(D);
  a.1_2 = (long int) a_9(D);
  _3 = b.0_1 - a.1_2;
  c.2_4 = (long int) c_11(D);
  a.3_5 = (long int) a_9(D);
  _6 = c.2_4 - a.3_5;
  _7 = _3 < _6;
But with the patch we have:
  b.0_1 = (long unsigned int) b_9(D);
  a.1_2 = (long unsigned int) a_10(D);
  _3 = b.0_1 - a.1_2;
  _4 = (long int) _3;
  c.2_5 = (long unsigned int) c_11(D);
  _6 = c.2_5 - a.1_2;
  _7 = (long int) _6;
  _8 = _4 < _7;
instead.  But that is something we can't generally optimize.
So do we need to introduce POINTER_DIFF (where we could still
optimize this) or remove the test?  If we rely on largest possible
array to be half of the VA size - 1 (i.e. where for x > y both being
pointers into the same array x - y > 0), then it is a valid optimization
of the 2 pointer subtractions, but it is not a valid optimization on
comparison of unsigned subtractions cast to signed type.

The third one is
    if (&a[b] - &a[c] != b - c)
link_error();
where fold already during generic folding used to be able to cope with it,
but now we have:
(long int) (((long unsigned int) b - (long unsigned int) c) * 4) /[ex] 4 != b - c
which we don't fold.

Re: [PATCH][AArch64] Fix atomic_cmp_exchange_zero_reg_1.c with +lse

2017-06-21 Thread James Greenhalgh
On Wed, Jun 21, 2017 at 02:48:20PM +0100, Kyrill Tkachov wrote:
> Hi all,
> 
> As Andrew pointed out, the patch at r248921
> (https://gcc.gnu.org/ml/gcc-patches/2017-02/msg01648.html) that allowed
> const0_rtx as an argument to the compare-exchange patterns was incomplete. It
> didn't extend the TARGET_LSE patterns as well, causing the expander to
> generate an invalid pattern that the insn_and_split and define_insns didn't
> accept. This patch extends them as well to allow aarch64_reg_or_zero rather
> than just register_operand in the operand they're comparing against.
> 
> With this patch the testcase compiles successfully with +lse, generating a
> "casaw1, wzr, [x0]".
> 
> Bootstrapped and tested on aarch64-none-linux-gnu.
> 
> Ok for trunk?

This looks correct to me.

OK for trunk.
James

> 2017-06-21  Kyrylo Tkachov  
> 
> * config/aarch64/atomics.md (aarch64_compare_and_swap_lse,
> SHORT): Relax operand 3 to aarch64_reg_or_zero and constraint to Z.
> (aarch64_compare_and_swap_lse, GPI): Likewise.
> (aarch64_atomic_cas, SHORT): Likewise for operand 2.
> (aarch64_atomic_cas, GPI): Likewise.

> diff --git a/gcc/config/aarch64/atomics.md b/gcc/config/aarch64/atomics.md
> index 27fc193..32b7169 100644
> --- a/gcc/config/aarch64/atomics.md
> +++ b/gcc/config/aarch64/atomics.md
> @@ -94,7 +94,7 @@
> (set (match_dup 1)
>  (unspec_volatile:SHORT
>[(match_operand:SI 2 "aarch64_plus_operand" "rI")  ;; expected
> -   (match_operand:SHORT 3 "register_operand" "r");; desired
> +   (match_operand:SHORT 3 "aarch64_reg_or_zero" "rZ");; desired
> (match_operand:SI 4 "const_int_operand")  ;; is_weak
> (match_operand:SI 5 "const_int_operand")  ;; mod_s
> (match_operand:SI 6 "const_int_operand")] ;; mod_f
> @@ -119,7 +119,7 @@
> (set (match_dup 1)
>  (unspec_volatile:GPI
>[(match_operand:GPI 2 "aarch64_plus_operand" "rI") ;; expect
> -   (match_operand:GPI 3 "register_operand" "r")  ;; desired
> +   (match_operand:GPI 3 "aarch64_reg_or_zero" "rZ")  ;; 
> desired
> (match_operand:SI 4 "const_int_operand")  ;; 
> is_weak
> (match_operand:SI 5 "const_int_operand")  ;; mod_s
> (match_operand:SI 6 "const_int_operand")] ;; mod_f
> @@ -616,7 +616,7 @@
>(set (match_dup 1)
> (unspec_volatile:SHORT
>  [(match_dup 0)
> - (match_operand:SHORT 2 "register_operand" "r")  ;; value.
> + (match_operand:SHORT 2 "aarch64_reg_or_zero" "rZ")  ;; value.
>   (match_operand:SI 3 "const_int_operand" "")];; model.
>  UNSPECV_ATOMIC_CAS))]
>   "TARGET_LSE && reload_completed"
> @@ -640,7 +640,7 @@
>(set (match_dup 1)
> (unspec_volatile:GPI
>  [(match_dup 0)
> - (match_operand:GPI 2 "register_operand" "r");; value.
> + (match_operand:GPI 2 "aarch64_reg_or_zero" "rZ");; value.
>   (match_operand:SI 3 "const_int_operand" "")];; model.
>  UNSPECV_ATOMIC_CAS))]
>"TARGET_LSE && reload_completed"



Re: C++ PATCH for c++/81073, constexpr and static var in statement-expression

2017-06-21 Thread Jason Merrill
On Wed, Jun 21, 2017 at 6:32 AM, Jakub Jelinek  wrote:
> The following patch fixes it by allowing that wording too on the line 10.
> Is this ok for trunk or do you have some other preference?

OK, thanks.

Jason


Re: [RFC PATCH] Fix pointer diff (was: -fsanitize=pointer-overflow support (PR sanitizer/80998))

2017-06-21 Thread Jakub Jelinek
On Wed, Jun 21, 2017 at 04:40:01PM +0200, Jakub Jelinek wrote:
> So, I wrote following patch to do the subtraction in unsigned
> type.  It passes bootstrap, but on both x86_64-linux and i686-linux
> regresses:
> +FAIL: gcc.dg/torture/pr66178.c   -O*  (test for excess errors)
> +FAIL: gcc.dg/tree-ssa/cmpexactdiv-2.c scan-tree-dump-not optimized "minus_expr"
> +FAIL: g++.dg/tree-ssa/pr21082.C  -std=gnu++* (test for excess errors)

Another option is to do what the patch does only when sanitizing and accept
in that case less efficient code and rejection of weird corner case
testcases like the first one.  We risk miscompilation of the pointer
differences, but I haven't managed to come up with a testcase where it would
show (I guess more likely is when we propagate constants into the pointers).

Jakub


[PATCH, GCC/contrib] Fix variant selection in dg-cmp-results.sh

2017-06-21 Thread Thomas Preudhomme

Hi,

Commit r249422 to dg-cmp-results.sh broke the variant selection feature
where one can restrict the regression test to a specific target variant. This
fix restores the feature.


ChangeLog entry is as follows:

*** contrib/ChangeLog ***

2017-06-21  Thomas Preud'homme  

* dg-cmp-results.sh: Restore filtering on target variant.


Tested on a file with multiple variants which now gives sane results.

Is this ok for trunk?

Best regards,

Thomas
diff --git a/contrib/dg-cmp-results.sh b/contrib/dg-cmp-results.sh
index 921e9337d1f8ffea78ef566c351fb48a8f6ca064..5f2fed5ec3ff0c66d22bc07c84571568730fbcac 100755
--- a/contrib/dg-cmp-results.sh
+++ b/contrib/dg-cmp-results.sh
@@ -90,7 +90,7 @@ echo "Newer log file: $NFILE"
 sed $E -e '/^[[:space:]]+===/,$d' $NFILE
 
 # Create a temporary file from the old file's interesting section.
-sed $E -e '/^Running target /,/^[[:space:]]+===.*Summary ===/!d' \
+sed $E -e "/$header/,/^[[:space:]]+===.*Summary ===/!d" \
   -e '/^[A-Z]+:/!d' \
   -e '/^(WARNING|ERROR):/d' \
   -e 's/\r$//' \
@@ -100,7 +100,7 @@ sed $E -e '/^Running target /,/^[[:space:]]+===.*Summary ===/!d' \
   >/tmp/o$$-$OBASE
 
 # Create a temporary file from the new file's interesting section.
-sed $E -e '/^Running target /,/^[[:space:]]+===.*Summary ===/!d' \
+sed $E -e "/$header/,/^[[:space:]]+===.*Summary ===/!d" \
   -e '/^[A-Z]+:/!d' \
   -e '/^(WARNING|ERROR):/d' \
   -e 's/\r$//' \


Re: [PATCH] [Aarch64] Variable shift count truncation issues

2017-06-21 Thread Richard Sandiford
Michael Collison  writes:
> Updated the patch per Richard's suggestions to allow scheduling of
> instructions before reload.

Thanks, this looks good to me FWIW, but obviously I can't approve it.

Richard

> Bootstrapped and tested on aarch64-linux-gnu. Okay for trunk?
>
> 2017-05-22  Kyrylo Tkachov  
>   Michael Collison 
>
>   PR target/70119
>   * config/aarch64/aarch64.md (*aarch64__reg_3_mask1):
>   New pattern.
>   (*aarch64_reg_3_neg_mask2): New pattern.
>   (*aarch64_reg_3_minus_mask): New pattern.
>   (*aarch64__reg_di3_mask2): New pattern.
>   * config/aarch64/aarch64.c (aarch64_rtx_costs): Account for cost
>   of shift when the shift amount is masked with constant equal to
>   the size of the mode.
>   * config/aarch64/predicates.md (subreg_lowpart_operator): New
>   predicate.
>
>
> 2016-05-22  Kyrylo Tkachov  
>   Michael Collison 
>
>   PR target/70119
>   * gcc.target/aarch64/var_shift_mask_1.c: New test.
>
> -Original Message-
> From: Richard Sandiford [mailto:richard.sandif...@linaro.org] 
> Sent: Thursday, June 15, 2017 6:40 AM
> To: Michael Collison 
> Cc: Wilco Dijkstra ; Christophe Lyon 
> ; GCC Patches ; nd 
> 
> Subject: Re: [PATCH] [Aarch64] Variable shift count truncation issues
>
> Michael Collison  writes:
>> +(define_insn_and_split "*aarch64_reg_3_neg_mask2"
>> +  [(set (match_operand:GPI 0 "register_operand" "=r")
>> +(SHIFT:GPI
>> +  (match_operand:GPI 1 "register_operand" "r")
>> +  (match_operator 4 "subreg_lowpart_operator"
>> +  [(neg:SI (and:SI (match_operand:SI 2 "register_operand" "r")
>> +   (match_operand 3 "const_int_operand" "n")))])))
>> +   (clobber (match_scratch:SI 5 "=&r"))]
>> +  "((~INTVAL (operands[3]) & (GET_MODE_BITSIZE (mode)-1)) == 0)"
>> +  "#"
>> +  "&& reload_completed"
>> +  [(const_int 0)]
>> +  {
>> +emit_insn (gen_negsi2 (operands[5], operands[2]));
>> +
>> +rtx and_op = gen_rtx_AND (SImode, operands[5], operands[3]);
>> +rtx subreg_tmp = gen_rtx_SUBREG (GET_MODE (operands[4]), and_op,
>> + SUBREG_BYTE (operands[4]));
>> +emit_insn (gen_3 (operands[0], operands[1], subreg_tmp));
>> +DONE;
>> +  }
>> +)
>
> Thanks, I agree this looks correct from the split/reload_completed POV.
> I think we can go one better though, either:
>
> (a) Still allow the split when !reload_completed, and use:
>
>  if (GET_MODE (operands[5]) == SCRATCH)
>operands[5] = gen_reg_rtx (SImode);
>
> This will allow the individual instructions to be scheduled by sched1.
>
> (b) Continue to restrict the split to reload_completed, change operand 0
> to =&r so that it can be used as a temporary, and drop operand 5 entirely.
>
> Or perhaps do both:
>
> (define_insn_and_split "*aarch64_reg_3_neg_mask2"
>   [(set (match_operand:GPI 0 "register_operand" "=&r")
>   (SHIFT:GPI
> (match_operand:GPI 1 "register_operand" "r")
> (match_operator 4 "subreg_lowpart_operator"
> [(neg:SI (and:SI (match_operand:SI 2 "register_operand" "r")
>  (match_operand 3 "const_int_operand" "n")))])))]
>   "((~INTVAL (operands[3]) & (GET_MODE_BITSIZE (mode)-1)) == 0)"
>   "#"
>   "&& 1"
>   [(const_int 0)]
>   {
> rtx tmp = (can_create_pseudo_p () ? gen_reg_rtx (mode)
>: operands[0]);
> emit_insn (gen_negsi2 (tmp, operands[2]));
>
> rtx and_op = gen_rtx_AND (SImode, tmp, operands[3]);
> rtx subreg_tmp = gen_rtx_SUBREG (GET_MODE (operands[4]), and_op,
>SUBREG_BYTE (operands[4]));
> emit_insn (gen_3 (operands[0], operands[1], subreg_tmp));
> DONE;
>   }
> )
>
> Sorry for the run-around.  I should have realised earlier that these patterns 
> didn't really need a distinct register after RA.
>
> Thanks,
> Richard
>
> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
> index 5707e53..45377a2 100644
> --- a/gcc/config/aarch64/aarch64.c
> +++ b/gcc/config/aarch64/aarch64.c
> @@ -7407,17 +7407,26 @@ cost_plus:
>  }
>else
>  {
> -   if (speed)
> +   if (VECTOR_MODE_P (mode))
>   {
> -   if (VECTOR_MODE_P (mode))
> - {
> -   /* Vector shift (register).  */
> -   *cost += extra_cost->vect.alu;
> - }
> -   else
> +   if (speed)
> + /* Vector shift (register).  */
> + *cost += extra_cost->vect.alu;
> + }
> +   else
> + {
> +   if (speed)
> + /* LSLV.  */
> + *cost += extra_cost->alu.shift_reg;
> +
> +   if (GET_CODE (op1) == AND && REG_P (XEXP (op1, 0))
> +   && CONST_INT_P (XEXP (op1, 1))
> +   && INTVAL (XEXP (op1, 1)) == GET_MODE_BITSIZE (mode) - 1)
>   {
> -   /* LSLV.  */
> -   *cost += extra_cost->alu.shift_reg;
> +   

Re: [PATCH, GCC/contrib] Fix variant selection in dg-cmp-results.sh

2017-06-21 Thread Mike Stump
On Jun 21, 2017, at 8:30 AM, Thomas Preudhomme  
wrote:
> 
> Commit r249422 to dg-cmp-results.sh broke the variant selection feature

> 2017-06-21  Thomas Preud'homme  
> 
>   * dg-cmp-results.sh: Restore filtering on target variant.
> 
> 
> Tested on a file with multiple variants which now gives sane results.
> 
> Is this ok for trunk?

Ok.



Re: [RFC PATCH] Fix pointer diff (was: -fsanitize=pointer-overflow support (PR sanitizer/80998))

2017-06-21 Thread Marc Glisse

On Wed, 21 Jun 2017, Jakub Jelinek wrote:


So, I wrote following patch to do the subtraction in unsigned
type.  It passes bootstrap, but on both x86_64-linux and i686-linux
regresses:
+FAIL: gcc.dg/torture/pr66178.c   -O*  (test for excess errors)
+FAIL: gcc.dg/tree-ssa/cmpexactdiv-2.c scan-tree-dump-not optimized "minus_expr"
+FAIL: g++.dg/tree-ssa/pr21082.C  -std=gnu++* (test for excess errors)

E.g. in the first testcase we have in the test:
static uintptr_t a =  ((char *)&&l2-(char *)&&l3)+((char *)&&l1-(char *)&&l2);
Without the patch, we ended up with:
static uintptr_t a = (uintptr_t) (((long int) &l2 - (long int) &l3) + ((long int) &l1 - (long int) &l2));
but with the patch with (the negation in signed type sounds like a folding
bug), which is too difficult for the initializer_constant_valid_p* handling:
(uintptr_t) (((long unsigned int) -(long int) &l3 - (long unsigned int) &l2) + ((long unsigned int) &l2 + (long unsigned int) &l1));
Shall we just xfail that test, or make sure we don't reassociate such
subtractions, something different?


Adding to match.pd a few more simple reassoc transformations (like
(c-b)+(b-a) for instance) that work for both signed and unsigned is on
my TODO-list, though that may not be enough. Maybe together with fixing
whatever produced the negation would suffice?


The second failure is on:
int f (long *a, long *b, long *c) {
   __PTRDIFF_TYPE__ l1 = b - a;
   __PTRDIFF_TYPE__ l2 = c - a;
   return l1 < l2;
}
where without the patch during forwprop2 we optimize it
using match.pd:
/* X - Z < Y - Z is the same as X < Y when there is no overflow.  */
because we had:
 b.0_1 = (long int) b_8(D);
 a.1_2 = (long int) a_9(D);
 _3 = b.0_1 - a.1_2;
 c.2_4 = (long int) c_11(D);
 a.3_5 = (long int) a_9(D);
 _6 = c.2_4 - a.3_5;
 _7 = _3 < _6;
But with the patch we have:
 b.0_1 = (long unsigned int) b_9(D);
 a.1_2 = (long unsigned int) a_10(D);
 _3 = b.0_1 - a.1_2;
 _4 = (long int) _3;
 c.2_5 = (long unsigned int) c_11(D);
 _6 = c.2_5 - a.1_2;
 _7 = (long int) _6;
 _8 = _4 < _7;
instead.  But that is something we can't generally optimize.
So do we need to introduce POINTER_DIFF (where we could still
optimize this) or remove the test?  If we rely on largest possible
array to be half of the VA size - 1 (i.e. where for x > y both being
pointers into the same array x - y > 0), then it is a valid optimization
of the 2 pointer subtractions, but it is not a valid optimization on
comparison of unsigned subtractions cast to signed type.


(this testcase was meant as a simpler version of
vector.size() < vector.capacity() )

It does indeed seem impossible to do this optimization with the unsigned
pointer subtraction.

If we consider pointers as unsigned, with a subtraction that has a signed 
result with the constraint that overflow is undefined, we cannot model 
that optimally with just the usual signed/unsigned operations, so I am in 
favor of POINTER_DIFF, at least in the long run (together with having a 
signed second argument for POINTER_PLUS, etc). For 64-bit platforms it 
might have been easier to declare that the upper half (3/4 ?) of the 
address space doesn't exist...



The third one is
   if (&a[b] - &a[c] != b - c)
   link_error();
where fold already during generic folding used to be able to cope with it,
but now we have:
(long int) (((long unsigned int) b - (long unsigned int) c) * 4) /[ex] 4 != b - c
which we don't fold.


Once we have this last expression, we have lost, we need to know that the 
multiplication cannot overflow for this. When the size multiplications are 
done in a signed type in the future (?), it might help.


On the other hand, is this an important optimization? I am surprised we 
are only doing this transformation in generic (so some hack in the 
front-end could still work), it shouldn't be hard to implement some subset 
of fold_addr_of_array_ref_difference in match.pd (it is recursive so a 
complete move may be harder). But that would make your patch break even 
more stuff :-(


--
Marc Glisse


Re: [PATCH, ARM] Implement __ARM_FEATURE_COPROC coprocessor intrinsic feature macro

2017-06-21 Thread Christophe Lyon
Hi,


On 19 June 2017 at 11:32, Richard Earnshaw (lists)
 wrote:
> On 16/06/17 15:56, Prakhar Bahuguna wrote:
>> On 16/06/2017 15:37:18, Richard Earnshaw (lists) wrote:
>>> On 16/06/17 08:48, Prakhar Bahuguna wrote:
 On 15/06/2017 17:23:43, Richard Earnshaw (lists) wrote:
> On 14/06/17 10:35, Prakhar Bahuguna wrote:
>> The ARM ACLE defines the __ARM_FEATURE_COPROC macro which indicates which
>> coprocessor intrinsics are available for the target. If 
>> __ARM_FEATURE_COPROC is
>> undefined, the target does not support coprocessor intrinsics. The 
>> feature
>> levels are defined as follows:
>>
>> +-+---+--+
>> | **Bit** | **Value** | **Intrinsics Available** 
>> |
>> +-+---+--+
>> | 0   | 0x1   | __arm_cdp __arm_ldc, __arm_ldcl, __arm_stc,  
>> |
>> | |   | __arm_stcl, __arm_mcr and __arm_mrc  
>> |
>> +-+---+--+
>> | 1   | 0x2   | __arm_cdp2, __arm_ldc2, __arm_stc2, __arm_ldc2l, 
>> |
>> | |   | __arm_stc2l, __arm_mcr2 and __arm_mrc2   
>> |
>> +-+---+--+
>> | 2   | 0x4   | __arm_mcrr and __arm_mrrc
>> |
>> +-+---+--+
>> | 3   | 0x8   | __arm_mcrr2 and __arm_mrrc2  
>> |
>> +-+---+--+
>>
>> This patch implements full support for this feature macro as defined in 
>> section
>> 5.9 of the ACLE
>> (https://developer.arm.com/products/software-development-tools/compilers/arm-compiler-5/docs/101028/latest/5-feature-test-macros).
>>
>> gcc/ChangeLog:
>>
>> 2017-06-14  Prakhar Bahuguna  
>>
>>   * config/arm/arm-c.c (arm_cpu_builtins): New block to define
>>__ARM_FEATURE_COPROC according to support.
>>
>> 2017-06-14  Prakhar Bahuguna  
>>   * gcc/testsuite/gcc.target/arm/acle/cdp.c: Add feature macro bitmap
>>   test.
>>   * gcc/testsuite/gcc.target/arm/acle/cdp2.c: Likewise.
>>   * gcc/testsuite/gcc.target/arm/acle/ldc.c: Likewise.
>>   * gcc/testsuite/gcc.target/arm/acle/ldc2.c: Likewise.
>>   * gcc/testsuite/gcc.target/arm/acle/ldc2l.c: Likewise.
>>   * gcc/testsuite/gcc.target/arm/acle/ldcl.c: Likewise.
>>   * gcc/testsuite/gcc.target/arm/acle/mcr.c: Likewise.
>>   * gcc/testsuite/gcc.target/arm/acle/mcr2.c: Likewise.
>>   * gcc/testsuite/gcc.target/arm/acle/mcrr.c: Likewise.
>>   * gcc/testsuite/gcc.target/arm/acle/mcrr2.c: Likewise.
>>   * gcc/testsuite/gcc.target/arm/acle/mrc.c: Likewise.
>>   * gcc/testsuite/gcc.target/arm/acle/mrc2.c: Likewise.
>>   * gcc/testsuite/gcc.target/arm/acle/mrrc.c: Likewise.
>>   * gcc/testsuite/gcc.target/arm/acle/mrrc2.c: Likewise.
>>   * gcc/testsuite/gcc.target/arm/acle/stc.c: Likewise.
>>   * gcc/testsuite/gcc.target/arm/acle/stc2.c: Likewise.
>>   * gcc/testsuite/gcc.target/arm/acle/stc2l.c: Likewise.
>>   * gcc/testsuite/gcc.target/arm/acle/stcl.c: Likewise.
>>
>> Testing done: ACLE regression tests updated with tests for feature macro 
>> bits.
>> All regression tests pass.
>>
>> Okay for trunk?
>>
>>
>> 0001-Implement-__ARM_FEATURE_COPROC-coprocessor-intrinsic.patch
>>
>>
>> From 79d71aec9d2bdee936b240ae49368ff5f8d8fc48 Mon Sep 17 00:00:00 2001
>> From: Prakhar Bahuguna 
>> Date: Tue, 2 May 2017 13:43:40 +0100
>> Subject: [PATCH] Implement __ARM_FEATURE_COPROC coprocessor intrinsic 
>> feature
>>  macro
>>
>> ---
>>  gcc/config/arm/arm-c.c| 19 +++
>>  gcc/testsuite/gcc.target/arm/acle/cdp.c   |  3 +++
>>  gcc/testsuite/gcc.target/arm/acle/cdp2.c  |  3 +++
>>  gcc/testsuite/gcc.target/arm/acle/ldc.c   |  3 +++
>>  gcc/testsuite/gcc.target/arm/acle/ldc2.c  |  3 +++
>>  gcc/testsuite/gcc.target/arm/acle/ldc2l.c |  3 +++
>>  gcc/testsuite/gcc.target/arm/acle/ldcl.c  |  3 +++
>>  gcc/testsuite/gcc.target/arm/acle/mcr.c   |  3 +++
>>  gcc/testsuite/gcc.target/arm/acle/mcr2.c  |  3 +++
>>  gcc/testsuite/gcc.target/arm/acle/mcrr.c  |  3 +++
>>  gcc/testsuite/gcc.target/arm/acle/mcrr2.c |  3 +++
>>  gcc/testsuite/gcc.target/arm/acle/mrc.c   |  3 +++
>>  gcc/testsuite/gcc.target/arm/acle/mrc2.c  |  3 +++
>>  gcc/testsuite/gcc.target/arm/acle/mrrc.c  |  3 +++
>>  gcc/testsuite/gcc.target/arm/acle/mrrc2.c |  3 +++
>>  gcc/testsuite/gcc.target/arm/acle/stc.c   |  3 +++
>>  gcc/testsuite/gcc.target/arm/acle/stc2.c  |  3 +++

Re: [PATCH, ARM] Implement __ARM_FEATURE_COPROC coprocessor intrinsic feature macro

2017-06-21 Thread Christophe Lyon
Hi,


On 19 June 2017 at 11:32, Richard Earnshaw (lists)
 wrote:
> On 16/06/17 15:56, Prakhar Bahuguna wrote:
>> On 16/06/2017 15:37:18, Richard Earnshaw (lists) wrote:
>>> On 16/06/17 08:48, Prakhar Bahuguna wrote:
 On 15/06/2017 17:23:43, Richard Earnshaw (lists) wrote:
> On 14/06/17 10:35, Prakhar Bahuguna wrote:
>> The ARM ACLE defines the __ARM_FEATURE_COPROC macro which indicates which
>> coprocessor intrinsics are available for the target. If 
>> __ARM_FEATURE_COPROC is
>> undefined, the target does not support coprocessor intrinsics. The 
>> feature
>> levels are defined as follows:
>>
>> +---------+-----------+---------------------------------------------------+
>> | **Bit** | **Value** | **Intrinsics Available**                          |
>> +---------+-----------+---------------------------------------------------+
>> | 0       | 0x1       | __arm_cdp, __arm_ldc, __arm_ldcl, __arm_stc,      |
>> |         |           | __arm_stcl, __arm_mcr and __arm_mrc               |
>> +---------+-----------+---------------------------------------------------+
>> | 1       | 0x2       | __arm_cdp2, __arm_ldc2, __arm_stc2, __arm_ldc2l,  |
>> |         |           | __arm_stc2l, __arm_mcr2 and __arm_mrc2            |
>> +---------+-----------+---------------------------------------------------+
>> | 2       | 0x4       | __arm_mcrr and __arm_mrrc                         |
>> +---------+-----------+---------------------------------------------------+
>> | 3       | 0x8       | __arm_mcrr2 and __arm_mrrc2                       |
>> +---------+-----------+---------------------------------------------------+
>>
>> This patch implements full support for this feature macro as defined in 
>> section
>> 5.9 of the ACLE
>> (https://developer.arm.com/products/software-development-tools/compilers/arm-compiler-5/docs/101028/latest/5-feature-test-macros).
>>
>> gcc/ChangeLog:
>>
>> 2017-06-14  Prakhar Bahuguna  
>>
>>   * config/arm/arm-c.c (arm_cpu_builtins): New block to define
>>__ARM_FEATURE_COPROC according to support.
>>
>> 2017-06-14  Prakhar Bahuguna  
>>   * gcc/testsuite/gcc.target/arm/acle/cdp.c: Add feature macro bitmap
>>   test.
>>   * gcc/testsuite/gcc.target/arm/acle/cdp2.c: Likewise.
>>   * gcc/testsuite/gcc.target/arm/acle/ldc.c: Likewise.
>>   * gcc/testsuite/gcc.target/arm/acle/ldc2.c: Likewise.
>>   * gcc/testsuite/gcc.target/arm/acle/ldc2l.c: Likewise.
>>   * gcc/testsuite/gcc.target/arm/acle/ldcl.c: Likewise.
>>   * gcc/testsuite/gcc.target/arm/acle/mcr.c: Likewise.
>>   * gcc/testsuite/gcc.target/arm/acle/mcr2.c: Likewise.
>>   * gcc/testsuite/gcc.target/arm/acle/mcrr.c: Likewise.
>>   * gcc/testsuite/gcc.target/arm/acle/mcrr2.c: Likewise.
>>   * gcc/testsuite/gcc.target/arm/acle/mrc.c: Likewise.
>>   * gcc/testsuite/gcc.target/arm/acle/mrc2.c: Likewise.
>>   * gcc/testsuite/gcc.target/arm/acle/mrrc.c: Likewise.
>>   * gcc/testsuite/gcc.target/arm/acle/mrrc2.c: Likewise.
>>   * gcc/testsuite/gcc.target/arm/acle/stc.c: Likewise.
>>   * gcc/testsuite/gcc.target/arm/acle/stc2.c: Likewise.
>>   * gcc/testsuite/gcc.target/arm/acle/stc2l.c: Likewise.
>>   * gcc/testsuite/gcc.target/arm/acle/stcl.c: Likewise.
>>
>> Testing done: ACLE regression tests updated with tests for feature macro 
>> bits.
>> All regression tests pass.
>>
>> Okay for trunk?
>>
>>
>> 0001-Implement-__ARM_FEATURE_COPROC-coprocessor-intrinsic.patch
>>
>>
>> From 79d71aec9d2bdee936b240ae49368ff5f8d8fc48 Mon Sep 17 00:00:00 2001
>> From: Prakhar Bahuguna 
>> Date: Tue, 2 May 2017 13:43:40 +0100
>> Subject: [PATCH] Implement __ARM_FEATURE_COPROC coprocessor intrinsic 
>> feature
>>  macro
>>
>> ---
>>  gcc/config/arm/arm-c.c| 19 +++
>>  gcc/testsuite/gcc.target/arm/acle/cdp.c   |  3 +++
>>  gcc/testsuite/gcc.target/arm/acle/cdp2.c  |  3 +++
>>  gcc/testsuite/gcc.target/arm/acle/ldc.c   |  3 +++
>>  gcc/testsuite/gcc.target/arm/acle/ldc2.c  |  3 +++
>>  gcc/testsuite/gcc.target/arm/acle/ldc2l.c |  3 +++
>>  gcc/testsuite/gcc.target/arm/acle/ldcl.c  |  3 +++
>>  gcc/testsuite/gcc.target/arm/acle/mcr.c   |  3 +++
>>  gcc/testsuite/gcc.target/arm/acle/mcr2.c  |  3 +++
>>  gcc/testsuite/gcc.target/arm/acle/mcrr.c  |  3 +++
>>  gcc/testsuite/gcc.target/arm/acle/mcrr2.c |  3 +++
>>  gcc/testsuite/gcc.target/arm/acle/mrc.c   |  3 +++
>>  gcc/testsuite/gcc.target/arm/acle/mrc2.c  |  3 +++
>>  gcc/testsuite/gcc.target/arm/acle/mrrc.c  |  3 +++
>>  gcc/testsuite/gcc.target/arm/acle/mrrc2.c |  3 +++
>>  gcc/testsuite/gcc.target/arm/acle/stc.c   |  3 +++
>>  gcc/testsuite/gcc.target/arm/acle/stc2.c  |  3 +++

Re: RFC: stack/heap collision vulnerability and mitigation with GCC

2017-06-21 Thread Jeff Law
On 06/20/2017 05:22 PM, Wilco Dijkstra wrote:
> Jeff Law wrote:
>> But the stack pointer might have already been advanced into the guard
>> page by the caller.   For the sake of argument assume the guard page is
>> 0xf1000 and assume that our stack pointer at entry is 0xf1010 and that
>> the caller hasn't touched the 0xf1000 page.
>>
>> If FrameSize >= 32, then the stores are going to hit the 0xf0000 page
>> rather than the 0xf1000 page.   That's jumping the guard.  Thus we have
>> to emit a probe prior to this stack allocation.
> 
> That's an incorrect ABI that allows adjusting the frame by 4080+32!  A
> correct one might allow say 1024 bytes for outgoing arguments.  That means
> when you call a function, there is still guard-page-size - 1024 bytes left
> that you can use to allocate locals.  With a 4K guard page that allows leaf
> functions up to 3KB, and depending on the frame, locals of 2-3KB plus up to
> 1024 bytes of outgoing arguments, without inserting any probes beyond the
> normal frame stores.
>
> This design means almost no functions need additional probes.  Assuming
> we're also increasing the guard page size to 64KB, it's cheap even for
> large functions.
I'm a little confused.  I'm not defining or changing the ABI.  I'm
working within my understanding of the existing aarch64 ABI used on
linux systems.  My understanding after reading that ABI and the prologue
code for aarch64 is there's nothing that can currently be relied upon in
terms of the offset from the incoming stack pointer to the most recent
"probe" in the caller.

Just limiting the size of the outgoing arguments is not sufficient
though.  You still have the dynamic allocation area in the caller.  The
threat model assumes the caller follows the ABI, but does not have to
have been compiled with -fstack-check.

Thus you'd have to have an ABI mandated probe at the lowest address of
the dynamic allocation area and limit the size of the outgoing stack
arguments for your suggestion to be useful.

If ARM wants to update the ABI, that's fine with me.  But until that
happens and compilers which implement that ABI are ubiquitous ISTM we
can't actually depend on those guarantees.

If I'm wrong and there is some guarantee we can rely on, let me know.
It's trivial to change the initial state to utilize such guarantees.

jeff


Re: RFC: stack/heap collision vulnerability and mitigation with GCC

2017-06-21 Thread Jeff Law
On 06/21/2017 02:41 AM, Richard Earnshaw (lists) wrote:

>> But the stack pointer might have already been advanced into the guard
>> page by the caller.   For the sake of argument assume the guard page is
>> 0xf1000 and assume that our stack pointer at entry is 0xf1010 and that
>> the caller hasn't touched the 0xf1000 page.
> 
> Then make sure the caller does touch the 0xf1000 page.  If it's
> allocated that much stack it should be forced to do the probe and not
> rely on all it's children having to do it because it can't be bothered.
That needs to be mandated at the ABI level if it's going to happen.  The
threat model assumes that the caller adheres to the ABI, but was not
necessarily compiled with -fstack-check.

I'm all for making the common path fast and letting the uncommon cases
pay additional penalties.  That mindset has driven the work-to-date.

But I don't think I have the liberty to change existing ABIs to
facilitate lower overhead approaches.  But I think ARM does given it
owns the ABI for aarch64 and I would happily exploit whatever guarantees
we can derive from an updated ABI.

So if you want the caller to touch the page, you need to amend the ABI.
I'd think touching the lowest address of the alloca area and outgoing
args, if large, would be sufficient.


jeff


Re: RFC: stack/heap collision vulnerability and mitigation with GCC

2017-06-21 Thread Jeff Law
On 06/20/2017 04:20 PM, Eric Botcazou wrote:
>> But what you end up depending on is undocumented behavior of a
>> particular kernel implementation.  That seems rather unwise.
> 
> And it's the single example of such a thing in the entire codebase?
> I don't know the code of the sanitizer much, but from the outside it looks 
> full of similar tricks...
I think the sanitizer runtime code is a pile of *(&@#$.  But I'm not
involved with that :-)


> 
>> Which ABIs have that property?  I'll be the first to admit that I've
>> purged much of my weird ABI memories.
> 
> The original Alpha ABI mentioned by Richard IIRC for example.
> 
>> Supporting ABIs which force us into a probe, then allocate strategy is
>> actually easy.  We can use the existing -fstack-check code, but use the
>> value 0 for STACK_CHECK_PROTECT.
>>
>> Just replace all uses of STACK_CHECK_PROTECT with calls to a wrapper.
>>
>> The wrapper looks like
>>
>> if (magic_flag)
>>   return STACK_CHECK_PROTECT;
>> else
>>   return 0;
>>
>> That's precisely what we were planning to do prior to bumping against
>> the valgrind issues.  That indirection makes it easy to ensure we didn't
>> change the behavior of the existing stack-check for Ada, but also allows
>> us to change the behavior for the new stack checking option.
> 
> Yes, that would seem the most straightforward thing to do modulo Valgrind.
And we could still do that for ports like the Alpha which mandate that
model or for ports that don't care about valgrind.


> 
>> Ah, so if you're running on an alternate stack, then why probe ahead of
>> need?  I thought the whole point of probing a couple pages ahead as to
>> ensure you could take the signal the Ada.
> 
> We run on the alternate stack only when we do _not_ probe ahead, i.e. on 
> x86/x86-64 Linux.
Ah.  Another piece of the puzzle make sense.  Thanks.

Jeff


Re: RFC: stack/heap collision vulnerability and mitigation with GCC

2017-06-21 Thread Wilco Dijkstra
Jeff Law wrote:

> I'm a little confused.  I'm not defining or changing the ABI.  I'm
> working within my understanding of the existing aarch64 ABI used on
> linux systems.  My understanding after reading that ABI and the prologue
> code for aarch64 is there's nothing that can currently be relied upon in
> terms of the offset from the incoming stack pointer to the most recent
> "probe" in the caller.

Well, what we need is a properly defined ABI on when to emit probes.  That
doesn't exist yet indeed, so there is nothing you can rely on today.  But
that is true for any architecture.  In particular, if the caller hasn't been
compiled with probes, even if the call instruction does an implicit probe,
you still have to assume that the stack guard has been breached already
(i.e. doing probes in the callee is useless if they aren't done in the whole
call chain).

Remember Richard's reply was to this:
>> aarch64 is significantly worse.  There are no implicit probes we can
>> exploit.  Furthermore, the prologue may allocate stack space 3-4 times.
>> So we have the track the distance to the most recent probe and when that
>> distance grows too large, we have to emit a probe.  Of course we have to
>> make worst case assumptions at function entry.

As pointed out, AArch64 is not significantly worse, since the majority of
frames do not need any probes (let alone multiple probes as suggested);
neither do we need to make worst-case assumptions on entry (that the stack
guard has already been breached).

> Just limiting the size of the outgoing arguments is not sufficient
> though.  You still have the dynamic allocation area in the caller.  The
> threat model assumes the caller follows the ABI, but does not have to
> have been compiled with -fstack-check.

The only mitigation for that is to increase the stack guard. A callee cannot 
somehow undo crossing the stack guard by an unchecked caller (that includes
the case where calls do an implicit probe).

> Thus you'd have to have an ABI mandated probe at the lowest address of
> the dynamic allocation area and limit the size of the outgoing stack
> arguments for your suggestion to be useful.

Yes, every alloca will need at least one probe, larger ones need a loop or call
to do the probes.

> If ARM wants to update the ABI, that's fine with me.  But until that
> happens and compilers which implement that ABI are ubiquitous ISTM we
> can't actually depend on those guarantees.

Indeed, given current binaries don't do probes for alloca or large frames,
the only possible mitigation for those is to increase the stack guard.

Wilco


Re: [PATCH, AArch64] Add x86 intrinsic headers to GCC AArch64 target

2017-06-21 Thread Steven Munroe
On Tue, 2017-06-20 at 17:16 -0500, Segher Boessenkool wrote:
> On Tue, Jun 20, 2017 at 09:34:25PM +, Joseph Myers wrote:
> > On Tue, 20 Jun 2017, Segher Boessenkool wrote:
> > 
> > > > And as you see see below the gcc.target tests have to be duplicated
> > > > anyway. Even if the C code is common there will many differences in
> > > > dg-options and dg-require-effective-target. Trying to common these
> > > > implementations only creates more small files to manage.
> > > 
> > > So somewhere in the near future we'll have to pull things apart again,
> > > if we go with merging things now.
> > 
> > The common part in the intrinsics implementation should be exactly the 
> > parts that can be implemented in GNU C without target-specific intrinsics 
> > being needed.  There should be nothing to pull apart if you start with the 
> > right things in the common header.  If a particular header has some 
> > functions that can be implemented in GNU C and some that need 
> > target-specific code, the generic GNU C functions should be in a common 
> > header, #included by the target-specific header.  The common header should 
> > have no conditionals on target architectures whatever (it might have 
> > conditionals on things like endianness).
> 
> I don't think there is much that will end up in the common header
> eventually.  If it was possible to describe most of this in plain C,
> and in such a way that it would optimise well, there would not *be*
> these intrinsics.
> 
> > I don't expect many different effective-target / dg-add-options keywords 
> > to be needed for common tests (obviously, duplicating tests for each 
> > architecture wanting these intrinsics is generally a bad idea).
> 
> Yeah, I think it should be possible to share the tests, perhaps with
> some added dg things (so that we don't have to repeat the same things
> over and over).
> 
I don't see how we can share the test as this requires platform unique
dg-options and dg-require-effective-target values to enforce the
platform restrictions you mentioned earlier.





RE: [PATCH] [i386] Enable Control-flow Enforcement Technology (CET).

2017-06-21 Thread Bernhard Reutner-Fischer
On 21 June 2017 16:07:29 CEST, "Tsimbalist, Igor V" 
 wrote:
>Thanks for the feedback. I'll redo the patch according to your
>comments.

what is "noni-tracking" ? Surplus i.
"codegeneration" probably lacks a space. 
Thanks,



[PATCH] Fix -Wmaybe-uninitialized warning on sse.md (PR target/81151)

2017-06-21 Thread Jakub Jelinek
Hi!

This expander has a gap in between the operands and match_dup indexes,
which results in genemit generating:
operand2 = operands[2];
(void) operand2;
where operands[2] has not been initialized.

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
trunk?

2017-06-21  Jakub Jelinek  

PR target/81151
* config/i386/sse.md (round2): Renumber match_dup and
operands indexes to avoid gap between operands and match_dups.

--- gcc/config/i386/sse.md.jj   2017-05-24 11:59:06.0 +0200
+++ gcc/config/i386/sse.md  2017-06-21 14:10:02.768078833 +0200
@@ -15638,13 +15638,13 @@ (define_insn "sse4_1_round")])
 
 (define_expand "round2"
-  [(set (match_dup 4)
+  [(set (match_dup 3)
(plus:VF
  (match_operand:VF 1 "register_operand")
- (match_dup 3)))
+ (match_dup 2)))
(set (match_operand:VF 0 "register_operand")
(unspec:VF
- [(match_dup 4) (match_dup 5)]
+ [(match_dup 3) (match_dup 4)]
  UNSPEC_ROUND))]
   "TARGET_ROUND && !flag_trapping_math"
 {
@@ -15664,11 +15664,11 @@ (define_expand "round2"
   vec_half = ix86_build_const_vector (mode, true, half);
   vec_half = force_reg (mode, vec_half);
 
-  operands[3] = gen_reg_rtx (mode);
-  emit_insn (gen_copysign3 (operands[3], vec_half, operands[1]));
+  operands[2] = gen_reg_rtx (mode);
+  emit_insn (gen_copysign3 (operands[2], vec_half, operands[1]));
 
-  operands[4] = gen_reg_rtx (mode);
-  operands[5] = GEN_INT (ROUND_TRUNC);
+  operands[3] = gen_reg_rtx (mode);
+  operands[4] = GEN_INT (ROUND_TRUNC);
 })
 
 (define_expand "round2_sfix"

Jakub


[committed] Fix ICE on OVERLOAD in OpenMP clauses (PR c++/81154)

2017-06-21 Thread Jakub Jelinek
Hi!

tsubst_* ICEs when seeing an OVERLOAD that is dependent on template
arguments.  But if we see an OVERLOAD in an OpenMP data sharing/mapping clause,
even when processing_template_decl we know that it won't be a variable,
so we can as well diagnose it right away, without having to wait for the
instantiation.

Bootstrapped/regtested on x86_64-linux and i686-linux, committed to trunk,
queued for backports.

2017-06-21  Jakub Jelinek  

PR c++/81154
* semantics.c (handle_omp_array_sections_1, finish_omp_clauses):
Complain about t not being a variable if t is OVERLOAD even
when processing_template_decl.

* g++.dg/gomp/pr81154.C: New test.

--- gcc/cp/semantics.c.jj   2017-06-20 08:48:38.0 +0200
+++ gcc/cp/semantics.c  2017-06-21 17:47:03.281327745 +0200
@@ -4589,7 +4589,7 @@ handle_omp_array_sections_1 (tree c, tre
}
   if (!VAR_P (t) && TREE_CODE (t) != PARM_DECL)
{
- if (processing_template_decl)
+ if (processing_template_decl && TREE_CODE (t) != OVERLOAD)
return NULL_TREE;
  if (DECL_P (t))
error_at (OMP_CLAUSE_LOCATION (c),
@@ -6109,7 +6109,7 @@ finish_omp_clauses (tree clauses, enum c
  if (!VAR_P (t) && TREE_CODE (t) != PARM_DECL
  && (!field_ok || TREE_CODE (t) != FIELD_DECL))
{
- if (processing_template_decl)
+ if (processing_template_decl && TREE_CODE (t) != OVERLOAD)
break;
  if (DECL_P (t))
error ("%qD is not a variable in clause %qs", t,
@@ -6181,7 +6181,7 @@ finish_omp_clauses (tree clauses, enum c
  && ((ort & C_ORT_OMP_DECLARE_SIMD) != C_ORT_OMP
  || TREE_CODE (t) != FIELD_DECL))
{
- if (processing_template_decl)
+ if (processing_template_decl && TREE_CODE (t) != OVERLOAD)
break;
  if (DECL_P (t))
error ("%qD is not a variable in clause %", t);
@@ -6224,7 +6224,7 @@ finish_omp_clauses (tree clauses, enum c
  && ((ort & C_ORT_OMP_DECLARE_SIMD) != C_ORT_OMP
  || TREE_CODE (t) != FIELD_DECL))
{
- if (processing_template_decl)
+ if (processing_template_decl && TREE_CODE (t) != OVERLOAD)
break;
  if (DECL_P (t))
error ("%qD is not a variable in clause %", t);
@@ -6587,7 +6587,7 @@ finish_omp_clauses (tree clauses, enum c
}
  if (!VAR_P (t) && TREE_CODE (t) != PARM_DECL)
{
- if (processing_template_decl)
+ if (processing_template_decl && TREE_CODE (t) != OVERLOAD)
break;
  if (DECL_P (t))
error ("%qD is not a variable in % clause", t);
@@ -6669,7 +6669,7 @@ finish_omp_clauses (tree clauses, enum c
remove = true;
  else if (!VAR_P (t) && TREE_CODE (t) != PARM_DECL)
{
- if (processing_template_decl)
+ if (processing_template_decl && TREE_CODE (t) != OVERLOAD)
break;
  if (DECL_P (t))
error ("%qD is not a variable in % clause", t);
@@ -6800,7 +6800,7 @@ finish_omp_clauses (tree clauses, enum c
}
  if (!VAR_P (t) && TREE_CODE (t) != PARM_DECL)
{
- if (processing_template_decl)
+ if (processing_template_decl && TREE_CODE (t) != OVERLOAD)
break;
  if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
  && (OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_POINTER
--- gcc/testsuite/g++.dg/gomp/pr81154.C.jj  2017-06-21 18:13:21.235405677 
+0200
+++ gcc/testsuite/g++.dg/gomp/pr81154.C 2017-06-21 18:13:03.0 +0200
@@ -0,0 +1,57 @@
+// PR c++/81154
+// { dg-do compile }
+
+template 
+struct C
+{
+  int foo (T n) const
+  {
+#pragma omp parallel shared (foo)  // { dg-error "is not a variable in 
clause" }
+;
+#pragma omp parallel private (foo) // { dg-error "is not a variable in 
clause" }
+;
+#pragma omp parallel firstprivate (foo)// { dg-error "is not a 
variable in clause" }
+;
+#pragma omp parallel for lastprivate (foo) // { dg-error "is not a 
variable in clause" }
+for (T i = 0; i < n; i++)
+  ;
+#pragma omp parallel for linear (foo)  // { dg-error "is not a variable in 
clause" }
+for (T i = 0; i < n; i++)
+  ;
+#pragma omp parallel reduction (+:foo) // { dg-error "is not a variable in 
clause" }
+;
+return 0;
+  }
+  int foo (int x, int y) { return x; }
+};
+
+struct D
+{
+  typedef int T;
+  int foo (T n) const
+  {
+#pragma omp parallel shared (foo)  // { dg-error "is not a variable in 
clause" }
+;
+#pragma omp parallel private (foo) // { dg-error "is not a variable in 
clause" }
+;
+#pragma omp parallel firstprivate (foo)// { dg-error "is not a 
variable in clause" }
+;
+#pragma omp parallel for lastprivate (

Re: RFC: stack/heap collision vulnerability and mitigation with GCC

2017-06-21 Thread Florian Weimer
On 06/20/2017 11:52 PM, Jeff Law wrote:
> I've also wondered if a 2 page guard would solve some of these problems.
>  In the event of stack overflow, the kernel maps in one of the two pages
> for use by the signal handler.   But changing things at this point may
> not be worth the effort.

I think Hotspot does that as well.  At the low level, Java programs can
recover from stack overflow (but it's still a VM error which taints the
entire process because these errors can strike at too many places, and
critical invariants could be violated).

Thanks,
Florian


Re: [PATCH] Fix -Wmaybe-uninitialized warning on sse.md (PR target/81151)

2017-06-21 Thread Uros Bizjak
On Wed, Jun 21, 2017 at 8:27 PM, Jakub Jelinek  wrote:
> Hi!
>
> This expander has a gap in between the operands and match_dup indexes,
> which results in genemit generating:
> operand2 = operands[2];
> (void) operand2;
> where operands[2] has not been initialized.
>
> Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux, ok for
> trunk?
>
> 2017-06-21  Jakub Jelinek  
>
> PR target/81151
> * config/i386/sse.md (round2): Renumber match_dup and
> operands indexes to avoid gap between operands and match_dups.

OK for mainline and release branches.

Thanks,
Uros.

> --- gcc/config/i386/sse.md.jj   2017-05-24 11:59:06.0 +0200
> +++ gcc/config/i386/sse.md  2017-06-21 14:10:02.768078833 +0200
> @@ -15638,13 +15638,13 @@ (define_insn "sse4_1_round (set_attr "mode" "")])
>
>  (define_expand "round2"
> -  [(set (match_dup 4)
> +  [(set (match_dup 3)
> (plus:VF
>   (match_operand:VF 1 "register_operand")
> - (match_dup 3)))
> + (match_dup 2)))
> (set (match_operand:VF 0 "register_operand")
> (unspec:VF
> - [(match_dup 4) (match_dup 5)]
> + [(match_dup 3) (match_dup 4)]
>   UNSPEC_ROUND))]
>"TARGET_ROUND && !flag_trapping_math"
>  {
> @@ -15664,11 +15664,11 @@ (define_expand "round2"
>vec_half = ix86_build_const_vector (mode, true, half);
>vec_half = force_reg (mode, vec_half);
>
> -  operands[3] = gen_reg_rtx (mode);
> -  emit_insn (gen_copysign3 (operands[3], vec_half, operands[1]));
> +  operands[2] = gen_reg_rtx (mode);
> +  emit_insn (gen_copysign3 (operands[2], vec_half, operands[1]));
>
> -  operands[4] = gen_reg_rtx (mode);
> -  operands[5] = GEN_INT (ROUND_TRUNC);
> +  operands[3] = gen_reg_rtx (mode);
> +  operands[4] = GEN_INT (ROUND_TRUNC);
>  })
>
>  (define_expand "round2_sfix"
>
> Jakub


Re: [PATCH, AArch64] Add x86 intrinsic headers to GCC AArch64 target

2017-06-21 Thread Segher Boessenkool
On Wed, Jun 21, 2017 at 12:55:54PM -0500, Steven Munroe wrote:
> > > I don't expect many different effective-target / dg-add-options keywords 
> > > to be needed for common tests (obviously, duplicating tests for each 
> > > architecture wanting these intrinsics is generally a bad idea).
> > 
> > Yeah, I think it should be possible to share the tests, perhaps with
> > some added dg things (so that we don't have to repeat the same things
> > over and over).
> > 
> I don't see how we can share the test as this requires platform unique
> dg-options and dg-require-effective-target values to enforce the
> platform restrictions you mentioned earlier.

Most dg-* take a target selector.  It probably will be handy to have
a specific effective-target for this, in any case, so it could be
made usable by all targets?

Segher


[PATCH, committed], Update tests for pr80510 on PowerPC

2017-06-21 Thread Michael Meissner
Andreas Schwab noticed that the tests for PR target/80510 were failing on
32-bit PowerPC systems.

In looking at the tests, the reason was it was trying to create a test that had
more than 32 live floating point values.  The loop to use all of the values had
tests of the form:

  for (i = 0; i < n; i++)
{
  ITYPE bit = get_bits (bits[i]);

  if ((bit & ((ITYPE)1) << 33) != 0) x_33 += one;
}

where ITYPE was a long type (and 32-bits on a 32-bit system).  I changed this
to __INT64_TYPE__, but the test still does not work.  It turns out that only
the 64-bit support had added STXSDX and STXSSPX.

I've checked the following into trunk, and I will check it into the gcc-7 and
gcc-6 branches so that the test is run only on 64-bit systems, but I want to
look at enabling the support in 32-bit systems.

2017-06-21  Michael Meissner  

PR target/80510
* gcc.target/powerpc/pr80510-1.c: Restrict test to 64-bit until
32-bit support is added.  Change ITYPE size to 64-bit integer.
* gcc.target/powerpc/pr80510-2.c: Likewise.

Index: gcc/testsuite/gcc.target/powerpc/pr80510-1.c
===
--- gcc/testsuite/gcc.target/powerpc/pr80510-1.c(revision 249466)
+++ gcc/testsuite/gcc.target/powerpc/pr80510-1.c(working copy)
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
 /* { dg-skip-if "" { powerpc*-*-darwin* } } */
 /* { dg-require-effective-target powerpc_vsx_ok } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { 
"-mcpu=power7" } } */
@@ -6,8 +6,10 @@
 
 /* Make sure that STXSDX is generated for double scalars in Altivec registers
on power7 instead of moving the value to a FPR register and doing a X-FORM
-   store.  */
+   store.
 
+   32-bit currently does not have support for STXSDX in the mov{df,dd} 
patterns.  */
+
 #ifndef TYPE
 #define TYPE double
 #endif
@@ -21,7 +23,7 @@
 #endif
 
 #ifndef ITYPE
-#define ITYPE long
+#define ITYPE __INT64_TYPE__
 #endif
 
 #ifdef DO_CALL
Index: gcc/testsuite/gcc.target/powerpc/pr80510-2.c
===
--- gcc/testsuite/gcc.target/powerpc/pr80510-2.c(revision 249466)
+++ gcc/testsuite/gcc.target/powerpc/pr80510-2.c(working copy)
@@ -1,4 +1,4 @@
-/* { dg-do compile { target { powerpc*-*-* } } } */
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
 /* { dg-skip-if "" { powerpc*-*-darwin* } } */
 /* { dg-require-effective-target powerpc_p8vector_ok } */
 /* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { 
"-mcpu=power8" } } */
@@ -6,8 +6,10 @@
 
 /* Make sure that STXSSPX is generated for float scalars in Altivec registers
on power7 instead of moving the value to a FPR register and doing a X-FORM
-   store.  */
+   store.
 
+   32-bit currently does not have support for STXSSPX in the mov{sf,sd} 
patterns.  */
+
 #ifndef TYPE
 #define TYPE float
 #endif
@@ -21,7 +23,7 @@
 #endif
 
 #ifndef ITYPE
-#define ITYPE long
+#define ITYPE __INT64_TYPE__
 #endif
 
 #ifdef DO_CALL

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH v2, rs6000] Add vec_reve support

2017-06-21 Thread Carl Love
GCC maintainers:

I have updated the patch per the comments from Segher.  I used the
functions GET_MODE_UNIT_SIZE() and GET_MODE_NUNITS() as suggested.  I
didn't know about these functions, so I checked out all the various
functions defined in machmode.h for future reference.

Note, I did make some additional changes to the test case
builtins-3-vec_reve-runnable.c for debugging purposes.  These changes
make it easy to turn debugging on and off.

Please let me know if the revised patch is OK for gcc mainline.

  Carl Love

---

2017-06-21  Carl Love  

* config/rs6000/rs6000-c.c: Add support for built-in functions
vector bool char vec_reve (vector bool char);
vector signed char vec_reve (vector signed char);
vector unsigned char vec_reve (vector unsigned char);
vector bool int vec_reve (vector bool int);
vector signed int vec_reve (vector signed int);
vector unsigned int vec_reve (vector unsigned int);
vector bool long long vec_reve (vector bool long long);
vector signed long long vec_reve (vector signed long long);
vector unsigned long long vec_reve (vector unsigned long long);
vector bool short vec_reve (vector bool short);
vector signed short vec_reve (vector signed short);
vector double vec_reve (vector double);
vector float vec_reve (vector float);
* config/rs6000/rs6000-builtin.def (VREVE_V2DI, VREVE_V4SI,
VREVE_V8HI, VREVE_V16QI, VREVE_V2DF, VREVE_V4SF, VREVE): New builtin.
* config/rs6000/altivec.md (UNSPEC_VREVEV): New UNSPEC.
(altivec_vreve): New pattern.
* config/rs6000/altivec.h (vec_reve): New define.
* doc/extend.texi (vec_reve): Update the built-in documentation file
for the new built-in functions.

gcc/testsuite/ChangeLog:

2017-06-21  Carl Love  

* gcc.target/powerpc/builtins-3-vec_reve-runnable.c (test_results,
main): Add new runnable test file for the vec_reve built-ins.
---
 gcc/config/rs6000/altivec.h|   1 +
 gcc/config/rs6000/altivec.md   |  27 ++
 gcc/config/rs6000/rs6000-builtin.def   |   9 +
 gcc/config/rs6000/rs6000-c.c   |  29 ++
 gcc/doc/extend.texi|  13 +
 .../powerpc/builtins-3-vec_reve-runnable.c | 394 +
 6 files changed, 473 insertions(+)
 create mode 100644 
gcc/testsuite/gcc.target/powerpc/builtins-3-vec_reve-runnable.c

diff --git a/gcc/config/rs6000/altivec.h b/gcc/config/rs6000/altivec.h
index d542315..dd68ae1 100644
--- a/gcc/config/rs6000/altivec.h
+++ b/gcc/config/rs6000/altivec.h
@@ -142,6 +142,7 @@
 #define vec_madd __builtin_vec_madd
 #define vec_madds __builtin_vec_madds
 #define vec_mtvscr __builtin_vec_mtvscr
+#define vec_reve __builtin_vec_vreve
 #define vec_vmaxfp __builtin_vec_vmaxfp
 #define vec_vmaxsw __builtin_vec_vmaxsw
 #define vec_vmaxsh __builtin_vec_vmaxsh
diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 25b2768..736cc19 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -46,6 +46,7 @@
UNSPEC_VPACK_UNS_UNS_SAT
UNSPEC_VPACK_UNS_UNS_MOD
UNSPEC_VPACK_UNS_UNS_MOD_DIRECT
+   UNSPEC_VREVEV
UNSPEC_VSLV4SI
UNSPEC_VSLO
UNSPEC_VSR
@@ -3727,6 +3728,32 @@
   DONE;
 }")
 
+;; Vector reverse elements
+(define_expand "altivec_vreve2"
+  [(set (match_operand:VEC_A 0 "register_operand" "=v")
+   (unspec:VEC_A [(match_operand:VEC_A 1 "register_operand" "v")]
+ UNSPEC_VREVEV))]
+  "TARGET_ALTIVEC"
+{
+  int i, j, size, num_elements;
+  rtvec v = rtvec_alloc (16);
+  rtx mask = gen_reg_rtx (V16QImode);
+
+  size = GET_MODE_UNIT_SIZE (mode);
+  num_elements = GET_MODE_NUNITS (mode);
+
+  for (j = num_elements - 1; j >= 0; j--)
+for (i = 0; i < size; i++)
+  RTVEC_ELT (v, i + j * size)
+   =  gen_rtx_CONST_INT (QImode,
+ size * num_elements - j * size + i - size);
+
+  emit_insn (gen_vec_initv16qi (mask, gen_rtx_PARALLEL (V16QImode, v)));
+  emit_insn (gen_altivec_vperm_ (operands[0], operands[1],
+operands[1], mask));
+  DONE;
+})
+
 ;; Vector SIMD PEM v2.06c defines LVLX, LVLXL, LVRX, LVRXL,
 ;; STVLX, STVLXL, STVVRX, STVRXL are available only on Cell.
 (define_insn "altivec_lvlx"
diff --git a/gcc/config/rs6000/rs6000-builtin.def 
b/gcc/config/rs6000/rs6000-builtin.def
index 4682628..20974b4 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -1130,6 +1130,13 @@ BU_ALTIVEC_1 (VUPKLSB, "vupklsb",CONST,  
altivec_vupklsb)
 BU_ALTIVEC_1 (VUPKLPX,   "vupklpx",CONST,  altivec_vupklpx)
 BU_ALTIVEC_1 (VUPKLSH,   "vupklsh",CONST,  altivec_vupklsh)
 
+BU_ALTIVEC_1 (VREVE_V2DI,  "vreve_v2di", CONST,  altivec_vrevev2di2)
+BU_ALTIVEC_1 (VREVE_V4SI,  "vreve_v4si", CO
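The permute-mask arithmetic in the altivec_vreve expander above is easy to get wrong by one byte, so here is a small standalone sketch (not part of the patch) that reproduces the same index computation and applies it to a 16-byte vector to check that it really reverses the elements:

```go
package main

import "fmt"

// buildReveMask mirrors the index arithmetic in the altivec_vreve<mode>2
// expander: for a vector of numElements elements of `size` bytes each, it
// builds a 16-byte vperm selector that reverses the element order while
// keeping the bytes inside each element in place.
func buildReveMask(size, numElements int) [16]byte {
	var mask [16]byte
	for j := numElements - 1; j >= 0; j-- {
		for i := 0; i < size; i++ {
			mask[i+j*size] = byte(size*numElements - j*size + i - size)
		}
	}
	return mask
}

// permute applies a vperm-style byte selector to a 16-byte input.
func permute(src, mask [16]byte) [16]byte {
	var out [16]byte
	for k, sel := range mask {
		out[k] = src[sel]
	}
	return out
}

func main() {
	// V4SI case: four 4-byte elements tagged 0,1,2,3 should come out 3,2,1,0.
	src := [16]byte{0, 0, 0, 0, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 3, 3}
	fmt.Println(permute(src, buildReveMask(4, 4)))
}
```

For j = numElements-1 (the last element) the selector bytes are 0..size-1, i.e. the first source element, and for j = 0 they are the bytes of the last source element, which is exactly the reversal the vperm then performs.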

libgo patch committed: Fix ptrace implementation on MIPS

2017-06-21 Thread Ian Lance Taylor
This patch from James Cowgill fixes the libgo ptrace implementation
for MIPS by modifying mksysinfo.sh to look for the pt_regs struct.
Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu, which
admittedly proves little.  Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 249208)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-6449e2832eef94eacf89c88fa16bede637f729ba
+b2bebba1f8a8185546c47f8460a3d5c2e31d0434
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/configure.ac
===
--- libgo/configure.ac  (revision 249205)
+++ libgo/configure.ac  (working copy)
@@ -580,7 +580,7 @@ AC_C_BIGENDIAN
 
 GCC_CHECK_UNWIND_GETIPINFO
 
-AC_CHECK_HEADERS(port.h sched.h semaphore.h sys/file.h sys/mman.h syscall.h 
sys/epoll.h sys/event.h sys/inotify.h sys/ptrace.h sys/syscall.h sys/user.h 
sys/utsname.h sys/select.h sys/socket.h net/if.h net/if_arp.h net/route.h 
netpacket/packet.h sys/prctl.h sys/mount.h sys/vfs.h sys/statfs.h sys/timex.h 
sys/sysinfo.h utime.h linux/ether.h linux/fs.h linux/reboot.h netinet/icmp6.h 
netinet/in_syst.h netinet/ip.h netinet/ip_mroute.h netinet/if_ether.h)
+AC_CHECK_HEADERS(port.h sched.h semaphore.h sys/file.h sys/mman.h syscall.h 
sys/epoll.h sys/event.h sys/inotify.h sys/ptrace.h sys/syscall.h sys/user.h 
sys/utsname.h sys/select.h sys/socket.h net/if.h net/if_arp.h net/route.h 
netpacket/packet.h sys/prctl.h sys/mount.h sys/vfs.h sys/statfs.h sys/timex.h 
sys/sysinfo.h utime.h linux/ether.h linux/fs.h linux/ptrace.h linux/reboot.h 
netinet/icmp6.h netinet/in_syst.h netinet/ip.h netinet/ip_mroute.h 
netinet/if_ether.h)
 
 AC_CHECK_HEADERS([linux/filter.h linux/if_addr.h linux/if_ether.h 
linux/if_tun.h linux/netlink.h linux/rtnetlink.h], [], [],
 [#ifdef HAVE_SYS_SOCKET_H
Index: libgo/go/syscall/syscall_linux_mipsx.go
===
--- libgo/go/syscall/syscall_linux_mipsx.go (revision 249205)
+++ libgo/go/syscall/syscall_linux_mipsx.go (working copy)
@@ -3,10 +3,24 @@
 // license that can be found in the LICENSE file.
 
 // +build linux
-// +build mips mipsle
+// +build mips mipsle mips64 mips64le mips64p32 mips64p32le
 
 package syscall
 
-func (r *PtraceRegs) PC() uint64 { return uint64(r.Regs[64]) }
+import "unsafe"
 
-func (r *PtraceRegs) SetPC(pc uint64) { r.Regs[64] = uint32(pc) }
+func (r *PtraceRegs) PC() uint64 {
+   return r.Cp0_epc
+}
+
+func (r *PtraceRegs) SetPC(pc uint64) {
+   r.Cp0_epc = pc
+}
+
+func PtraceGetRegs(pid int, regsout *PtraceRegs) (err error) {
+   return ptrace(PTRACE_GETREGS, pid, 0, uintptr(unsafe.Pointer(regsout)))
+}
+
+func PtraceSetRegs(pid int, regs *PtraceRegs) (err error) {
+   return ptrace(PTRACE_SETREGS, pid, 0, uintptr(unsafe.Pointer(regs)))
+}
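The essence of the fix above is that on MIPS the program counter is not an entry in the general-purpose register array but a separate `Cp0_epc` field of the kernel's `struct pt_regs`. A hedged sketch with a hypothetical, trimmed-down `PtraceRegs` (the real one is generated by mksysinfo.sh and has more fields) shows the accessor shape:

```go
package main

import "fmt"

// PtraceRegs is a hypothetical, trimmed-down version of the MIPS register
// set that mksysinfo.sh now generates from the kernel's struct pt_regs:
// the program counter lives in the Cp0_epc field, not in the Regs array.
type PtraceRegs struct {
	Regs    [32]uint64 // general-purpose registers
	Cp0_epc uint64     // exception program counter
}

// PC and SetPC match the accessors in the patch above.
func (r *PtraceRegs) PC() uint64      { return r.Cp0_epc }
func (r *PtraceRegs) SetPC(pc uint64) { r.Cp0_epc = pc }

func main() {
	var r PtraceRegs
	r.SetPC(0x120001000)
	fmt.Printf("%#x\n", r.PC())
}
```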
Index: libgo/mksysinfo.sh
===
--- libgo/mksysinfo.sh  (revision 249205)
+++ libgo/mksysinfo.sh  (working copy)
@@ -317,9 +317,13 @@ if test "$regs" = ""; then
 upcase_fields "__user_psw_struct" "PtracePsw" >> ${OUT} || true
 upcase_fields "__user_fpregs_struct" "PtraceFpregs" >> ${OUT} || true
 upcase_fields "__user_per_struct" "PtracePer" >> ${OUT} || true
+  else
+    # mips*
+    regs=`grep '^type _pt_regs struct' gen-sysinfo.go || true`
   fi
 fi
 if test "$regs" != ""; then
+  regs=`echo $regs | sed -e 's/type _pt_regs struct//'`
   regs=`echo $regs |
 sed -e 's/type __*user_regs_struct struct //' -e 's/[{}]//g'`
   regs=`echo $regs | sed -e s'/^ *//'`
Index: libgo/sysinfo.c
===
--- libgo/sysinfo.c (revision 249205)
+++ libgo/sysinfo.c (working copy)
@@ -102,6 +102,9 @@
 #if defined(HAVE_LINUX_NETLINK_H)
 #include <linux/netlink.h>
 #endif
+#if defined(HAVE_LINUX_PTRACE_H)
+#include <linux/ptrace.h>
+#endif
 #if defined(HAVE_LINUX_RTNETLINK_H)
 #include <linux/rtnetlink.h>
 #endif


libgo patch committed: implement randomTrap on mips64p32*

2017-06-21 Thread Ian Lance Taylor
This patch from James Cowgill implements randomTrap for mips64p32*.
Bootstrapped on x86_64-pc-linux-gnu, which proves little.  Committed
to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 249472)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-b2bebba1f8a8185546c47f8460a3d5c2e31d0434
+c49c752b4d2934cff325dd540821c4b27cc61a05
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/go/internal/syscall/unix/getrandom_linux_mips64p32x.go
===
--- libgo/go/internal/syscall/unix/getrandom_linux_mips64p32x.go
(revision 0)
+++ libgo/go/internal/syscall/unix/getrandom_linux_mips64p32x.go
(working copy)
@@ -0,0 +1,11 @@
+// Copyright 2016 The Go Authors. All rights reserved.
+// Use of this source code is governed by a BSD-style
+// license that can be found in the LICENSE file.
+
+// +build mipsn32 mips64p32 mips64p32le
+
+package unix
+
+// Linux getrandom system call number.
+// See GetRandom in getrandom_linux.go.
+const randomTrap uintptr = 6317
Index: libgo/go/internal/syscall/unix/getrandom_linux_mipsn32.go
===
--- libgo/go/internal/syscall/unix/getrandom_linux_mipsn32.go   (revision 
249205)
+++ libgo/go/internal/syscall/unix/getrandom_linux_mipsn32.go   (working copy)
@@ -1,11 +0,0 @@
-// Copyright 2016 The Go Authors. All rights reserved.
-// Use of this source code is governed by a BSD-style
-// license that can be found in the LICENSE file.
-
-// +build mipsn32
-
-package unix
-
-// Linux getrandom system call number.
-// See GetRandom in getrandom_linux.go.
-const randomTrap uintptr = 6317


libgo patch committed: Add mips64p32* to cgo tool

2017-06-21 Thread Ian Lance Taylor
This libgo patch by James Cowgill adds mips64p32* to the size maps in
the cgo tool.  Bootstrapped on x86_64-pc-linux-gnu.  Committed to
mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 249474)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-c49c752b4d2934cff325dd540821c4b27cc61a05
+5a97e51022e3b7798f985714ced3e02d6e730b54
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/go/cmd/cgo/main.go
===
--- libgo/go/cmd/cgo/main.go(revision 249205)
+++ libgo/go/cmd/cgo/main.go(working copy)
@@ -139,51 +139,55 @@ func usage() {
 }
 
 var ptrSizeMap = map[string]int64{
-   "386":  4,
-   "alpha":8,
-   "amd64":8,
-   "arm":  4,
-   "arm64":8,
-   "m68k": 4,
-   "mipso32":  4,
-   "mipsn32":  4,
-   "mipso64":  8,
-   "mipsn64":  8,
-   "mips": 4,
-   "mipsle":   4,
-   "mips64":   8,
-   "mips64le": 8,
-   "ppc":  4,
-   "ppc64":8,
-   "ppc64le":  8,
-   "s390": 4,
-   "s390x":8,
-   "sparc":4,
-   "sparc64":  8,
+   "386": 4,
+   "alpha":   8,
+   "amd64":   8,
+   "arm": 4,
+   "arm64":   8,
+   "m68k":4,
+   "mipso32": 4,
+   "mipsn32": 4,
+   "mipso64": 8,
+   "mipsn64": 8,
+   "mips":4,
+   "mipsle":  4,
+   "mips64":  8,
+   "mips64le":8,
+   "mips64p32":   4,
+   "mips64p32le": 4,
+   "ppc": 4,
+   "ppc64":   8,
+   "ppc64le": 8,
+   "s390":4,
+   "s390x":   8,
+   "sparc":   4,
+   "sparc64": 8,
 }
 
 var intSizeMap = map[string]int64{
-   "386":  4,
-   "alpha":8,
-   "amd64":8,
-   "arm":  4,
-   "arm64":8,
-   "m68k": 4,
-   "mipso32":  4,
-   "mipsn32":  4,
-   "mipso64":  8,
-   "mipsn64":  8,
-   "mips": 4,
-   "mipsle":   4,
-   "mips64":   8,
-   "mips64le": 8,
-   "ppc":  4,
-   "ppc64":8,
-   "ppc64le":  8,
-   "s390": 4,
-   "s390x":8,
-   "sparc":4,
-   "sparc64":  8,
+   "386": 4,
+   "alpha":   8,
+   "amd64":   8,
+   "arm": 4,
+   "arm64":   8,
+   "m68k":4,
+   "mipso32": 4,
+   "mipsn32": 4,
+   "mipso64": 8,
+   "mipsn64": 8,
+   "mips":4,
+   "mipsle":  4,
+   "mips64":  8,
+   "mips64le":8,
+   "mips64p32":   8,
+   "mips64p32le": 8,
+   "ppc": 4,
+   "ppc64":   8,
+   "ppc64le": 8,
+   "s390":4,
+   "s390x":   8,
+   "sparc":   4,
+   "sparc64": 8,
 }
 
 var cPrefix string


libgo patch committed: Use gc toolchain names for MIPS variants

2017-06-21 Thread Ian Lance Taylor
This patch from James Cowgill changes libgo to use the gc toolchain
names for MIPS variants.  Bootstrapped on x86_64-pc-linux-gnu.
Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 249475)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-5a97e51022e3b7798f985714ced3e02d6e730b54
+3f713ddb2a9a2a736f3a12d71c56cb7fd444afba
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/configure.ac
===
--- libgo/configure.ac  (revision 249472)
+++ libgo/configure.ac  (working copy)
@@ -299,9 +299,9 @@ GOARCH_HUGEPAGESIZE="1 << 21"
[AC_MSG_ERROR([unknown MIPS ABI])
 [mips_abi="n32"]])])])])
 case "$mips_abi" in
-"o32") GOARCH=mipso32 ;;
-"n32") GOARCH=mipsn32 ;;
-"n64") GOARCH=mipsn64 ;;
+"o32") GOARCH=mips ;;
+"n32") GOARCH=mips64p32 ;;
+"n64") GOARCH=mips64 ;;
 "o64") GOARCH=mipso64 ;;
 esac
 case "$mips_abi" in
@@ -315,7 +315,8 @@ GOARCH_HUGEPAGESIZE="1 << 21"
 ;;
 esac
 case "${host}" in
-mips*el)
+mips*el-*-*)
+GOARCH="${GOARCH}le"
 ;;
 *)
GOARCH_BIGENDIAN=1
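The configure logic above maps the detected MIPS ABI to the gc toolchain's GOARCH name and then appends an "le" suffix for little-endian (`mips*el`) hosts. A small Go sketch of that mapping (a simplification of the shell `case` statements, not code from the patch):

```go
package main

import "fmt"

// goarchForMIPS mirrors the configure.ac logic above: map the detected
// ABI to the gc toolchain's GOARCH name, then append "le" for
// little-endian (mips*el) hosts.
func goarchForMIPS(abi string, littleEndian bool) string {
	var goarch string
	switch abi {
	case "o32":
		goarch = "mips"
	case "n32":
		goarch = "mips64p32"
	case "n64":
		goarch = "mips64"
	case "o64":
		goarch = "mipso64" // legacy name, removed by a follow-up patch
	}
	if littleEndian {
		goarch += "le"
	}
	return goarch
}

func main() {
	fmt.Println(goarchForMIPS("n32", true))
}
```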


libgo patch committed: Remove old MIPS architecture names

2017-06-21 Thread Ian Lance Taylor
This patch from James Cowgill finishes standardizing on the names used
in the gc toolchain by removing the old names used in the gofrontend
sources.  It drops the mipso64 ABI, which has been dead for a long
time (as a historical note I think I invented the o64 ABI by accident
when I did the initial GCC MIPS64 port).  Bootstrapped on
x86_64-pc-linux-gnu.  Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 249476)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-3f713ddb2a9a2a736f3a12d71c56cb7fd444afba
+a4b455aa584e0d6e362a88597f11bba1427088e2
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/configure.ac
===
--- libgo/configure.ac  (revision 249476)
+++ libgo/configure.ac  (working copy)
@@ -208,7 +208,7 @@ AC_SUBST(USE_DEJAGNU)
 # supported by the gofrontend and all architectures supported by the
 # gc toolchain.
 # N.B. Keep in sync with gcc/testsuite/go.test/go-test.exp (go-set-goarch).
-ALLGOARCH="386 alpha amd64 amd64p32 arm armbe arm64 arm64be ia64 m68k mipso32 
mipsn32 mipso64 mipsn64 mips mipsle mips64 mips64le mips64p32 mips64p32le ppc 
ppc64 ppc64le s390 s390x sparc sparc64"
+ALLGOARCH="386 alpha amd64 amd64p32 arm armbe arm64 arm64be ia64 m68k mips 
mipsle mips64 mips64le mips64p32 mips64p32le ppc ppc64 ppc64le s390 s390x sparc 
sparc64"
 
 # All known GOARCH_FAMILY values.
 ALLGOARCHFAMILY="I386 ALPHA AMD64 ARM ARM64 IA64 M68K MIPS MIPS64 PPC PPC64 
S390 S390X SPARC SPARC64"
@@ -291,25 +291,19 @@ GOARCH_HUGEPAGESIZE="1 << 21"
 #error not n64
 #endif],
 [mips_abi="n64"],
-   [AC_COMPILE_IFELSE([
-#if _MIPS_SIM != _ABIO64
-#error not o64
-#endif],
-[mips_abi="o64"],
[AC_MSG_ERROR([unknown MIPS ABI])
-[mips_abi="n32"]])])])])
+[mips_abi="n32"]])])])
 case "$mips_abi" in
 "o32") GOARCH=mips ;;
 "n32") GOARCH=mips64p32 ;;
 "n64") GOARCH=mips64 ;;
-"o64") GOARCH=mipso64 ;;
 esac
 case "$mips_abi" in
 "o32" | "n32")
 GOARCH_FAMILY=MIPS
GOARCH_MINFRAMESIZE=4
 ;;
-"n64" | "o64")
+"n64")
 GOARCH_FAMILY=MIPS64
GOARCH_MINFRAMESIZE=8
 ;;
Index: libgo/go/cmd/cgo/main.go
===
--- libgo/go/cmd/cgo/main.go(revision 249475)
+++ libgo/go/cmd/cgo/main.go(working copy)
@@ -145,10 +145,6 @@ var ptrSizeMap = map[string]int64{
"arm": 4,
"arm64":   8,
"m68k":4,
-   "mipso32": 4,
-   "mipsn32": 4,
-   "mipso64": 8,
-   "mipsn64": 8,
"mips":4,
"mipsle":  4,
"mips64":  8,
@@ -171,10 +167,6 @@ var intSizeMap = map[string]int64{
"arm": 4,
"arm64":   8,
"m68k":4,
-   "mipso32": 4,
-   "mipsn32": 4,
-   "mipso64": 8,
-   "mipsn64": 8,
"mips":4,
"mipsle":  4,
"mips64":  8,
Index: libgo/go/go/build/syslist.go
===
--- libgo/go/go/build/syslist.go(revision 249205)
+++ libgo/go/go/build/syslist.go(working copy)
@@ -5,4 +5,4 @@
 package build
 
 const goosList = "aix android darwin dragonfly freebsd linux nacl netbsd 
openbsd plan9 solaris windows zos "
-const goarchList = "386 amd64 amd64p32 arm armbe arm64 arm64be alpha m68k 
ppc64 ppc64le mips mipsle mips64 mips64le mips64p32 mips64p32le mipso32 mipsn32 
mipsn64 mipso64 ppc s390 s390x sparc sparc64 "
+const goarchList = "386 amd64 amd64p32 arm armbe arm64 arm64be alpha m68k 
ppc64 ppc64le mips mipsle mips64 mips64le mips64p32 mips64p32le ppc s390 s390x 
sparc sparc64 "
Index: libgo/go/internal/syscall/unix/getrandom_linux_mips64p32x.go
===
--- libgo/go/internal/syscall/unix/getrandom_linux_mips64p32x.go
(revision 249474)
+++ libgo/go/internal/syscall/unix/getrandom_linux_mips64p32x.go
(working copy)
@@ -2,7 +2,7 @@
 // Use of this source code is governed by a BSD-style
 // license that can be found in the LICENSE file.
 
-// +build mipsn32 mips64p32 mips64p32le
+// +build mips64p32 mips64p32le
 
 package unix
 
Index: libgo/go/internal/syscall/unix/getrandom_linux_mips64x.go
===
--- libgo/go/internal/syscall/unix/getrandom_linux_mips64x.go   (revision 
249205)
+++ libgo/go/internal/syscall/unix/getrandom_linux_mips64x.go   (working copy)
@@ -2,7 +2,7 @@
 // Use of this source code is governed by a BSD-style
 // license that can be found in the LICENSE file.
 
-// +build mips64 mips64le mipsn64 mipso64
+// +build mips64 mips64le
 
 package unix
 
Index: libgo/go/runtime/hash32.go
==

Go patch committed: Fix missing case in Array_type::get_lvalue_pointer

2017-06-21 Thread Ian Lance Taylor
This patch from Than McIntosh fixes a missing case in
Array_type::get_value_pointer in the Go frontend.  It updates the code
in Array_type::get_value_pointer that handles "lvalue" context to look
for both regular var expressions and temp var expressions, since both
can appear in array/slice index expressions on the left hand side of
assignments.  Bootstrapped and ran Go testsuite on
x86_64-pc-linux-gnu.  Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 249477)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-a4b455aa584e0d6e362a88597f11bba1427088e2
+0b93af68feb0a4135e83dd9e6c11df1563d862a9
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/types.cc
===
--- gcc/go/gofrontend/types.cc  (revision 249208)
+++ gcc/go/gofrontend/types.cc  (working copy)
@@ -7635,12 +7635,19 @@ Array_type::get_value_pointer(Gogo*, Exp
 {
   Temporary_reference_expression* tref =
   array->temporary_reference_expression();
+  Var_expression* ve = array->var_expression();
   if (tref != NULL)
 {
   tref = tref->copy()->temporary_reference_expression();
   tref->set_is_lvalue();
   array = tref;
 }
+  else if (ve != NULL)
+{
+  ve = new Var_expression(ve->named_object(), ve->location());
+  ve->set_in_lvalue_pos();
+  array = ve;
+}
 }
 
   return Expression::make_slice_info(array,


Re: [PATCH, alpha, go]: Introduce applyRelocationsALPHA

2017-06-21 Thread Ian Lance Taylor
On Tue, Jun 20, 2017 at 12:46 PM, Uros Bizjak  wrote:
> This patch introduces applyRelocationsALPHA to solve:
>
> FAIL: TestCgoConsistentResults
> FAIL: TestCgoPkgConfig
> FAIL: TestCgoHandlesWlORIGIN
>
> gotools errors.
>
> Bootstrapped and regression tested on alphaev68-linux-gnu.

Thanks!  Committed to mainline.

Ian


[PATCH,rs6000] Add IEEE 128 support for several existing built-in functions

2017-06-21 Thread Kelvin Nilsen

This patch adds IEEE 128 support to the existing scalar_insert_exp,
scalar_extract_exp, scalar_extract_sig, scalar_test_data_class, and
scalar_test_neg rs6000 built-in functions.  Test programs are provided
to exercise the new IEEE 128 functionality and to validate forms of
these built-in functions that do not depend on IEEE 128 support.

The patch has been bootstrapped and tested on powerpc64le-unknown-linux
(both P8 and P9 targets) and powerpc-unknown-linux (big-endian, with
both -m32 and -m64 target options) with no regressions.

Is this ok for the trunk?

gcc/ChangeLog:

2017-06-19  Kelvin Nilsen  

* config/rs6000/rs6000-c.c (altivec_overloaded_builtins): Add
array entries to represent __ieee128 versions of the
scalar_test_data_class, scalar_test_neg, scalar_extract_exp,
scalar_extract_sig, and scalar_insert_exp built-in functions.
(altivec_resolve_overloaded_builtin): Add special case handling
for the __builtin_scalar_insert_exp function, as represented by
the P9V_BUILTIN_VEC_VSIEDP constant.
* config/rs6000/rs6000-builtin.def (VSEEQP): Add scalar extract
exponent support for __ieee128 argument.
(VSESQP): Add scalar extract signature support for __ieee128
argument.
(VSTDCNQP): Add scalar test negative support for __ieee128
argument.
(VSIEQP): Add scalar insert exponent support for __int128 argument
with __ieee128 result.
(VSIEQPF): Add scalar insert exponent support for __ieee128
argument with __ieee128 result.
(VSTDCQP): Add scalar test data class support for __ieee128
argument.
(VSTDCNQP): Add overload support for scalar test negative with
__ieee128 argument.
(VSTDCQP): Add overload support for scalar test data class
__ieee128 argument.
* config/rs6000/vsx.md (UNSPEC_VSX_SIEXPQP): New constant.
(xsxexpqp): New insn for VSX scalar extract exponent quad
precision.
(xsxsigqp): New insn for VSX scalar extract significand quad
precision.
(xsiexpqpf): New insn for VSX scalar insert exponent quad
precision with floating point argument.
(xststdcqp): New expand for VSX scalar test data class quad
precision.
(xststdcnegqp): New expand for VSX scalar test negative quad
precision.
(xststdcqp): New insn to match expansions for VSX scalar test data
class quad precision and VSX scalar test negative quad precision.
* config/rs6000/rs6000.c (rs6000_expand_binop_builtin): Add
special case operand checking to enforce that second operand of
VSX scalar test data class with quad precision argument is a 7-bit
unsigned literal.
* doc/extend.texi (PowerPC AltiVec Built-in Functions): Add
prototypes and descriptions of __ieee128 versions of
scalar_extract_exp, scalar_extract_sig, scalar_insert_exp,
scalar_test_data_class, and scalar_test_neg built-in functions.

gcc/testsuite/ChangeLog:

2017-06-19  Kelvin Nilsen  

* gcc.target/powerpc/bfp/scalar-cmp-exp-eq-3.c: New test.
* gcc.target/powerpc/bfp/scalar-cmp-exp-eq-4.c: New test.
* gcc.target/powerpc/bfp/scalar-cmp-exp-gt-3.c: New test.
* gcc.target/powerpc/bfp/scalar-cmp-exp-gt-4.c: New test.
* gcc.target/powerpc/bfp/scalar-cmp-exp-lt-3.c: New test.
* gcc.target/powerpc/bfp/scalar-cmp-exp-lt-4.c: New test.
* gcc.target/powerpc/bfp/scalar-cmp-exp-unordered-3.c: New test.
* gcc.target/powerpc/bfp/scalar-cmp-exp-unordered-4.c: New test.
* gcc.target/powerpc/bfp/scalar-extract-exp-3.c: New test.
* gcc.target/powerpc/bfp/scalar-extract-exp-4.c: New test.
* gcc.target/powerpc/bfp/scalar-extract-exp-5.c: New test.
* gcc.target/powerpc/bfp/scalar-extract-exp-6.c: New test.
* gcc.target/powerpc/bfp/scalar-extract-exp-7.c: New test.
* gcc.target/powerpc/bfp/scalar-extract-sig-3.c: New test.
* gcc.target/powerpc/bfp/scalar-extract-sig-4.c: New test.
* gcc.target/powerpc/bfp/scalar-extract-sig-5.c: New test.
* gcc.target/powerpc/bfp/scalar-extract-sig-6.c: New test.
* gcc.target/powerpc/bfp/scalar-extract-sig-7.c: New test.
* gcc.target/powerpc/bfp/scalar-insert-exp-10.c: New test.
* gcc.target/powerpc/bfp/scalar-insert-exp-11.c: New test.
* gcc.target/powerpc/bfp/scalar-insert-exp-12.c: New test.
* gcc.target/powerpc/bfp/scalar-insert-exp-13.c: New test.
* gcc.target/powerpc/bfp/scalar-insert-exp-14.c: New test.
* gcc.target/powerpc/bfp/scalar-insert-exp-15.c: New test.
* gcc.target/powerpc/bfp/scalar-insert-exp-6.c: New test.
* gcc.target/powerpc/bfp/scalar-insert-exp-7.c: New test.
* gcc.target/powerpc/bfp/scalar-insert-exp-8.c: New test.
* gcc.target/powerpc/bfp/scalar-insert-exp-9.c: New test.
* gcc.target/

Go patch committed: Better stack trace for `go f()` where f is nil

2017-06-21 Thread Ian Lance Taylor
This Go frontend patch produces a better stack trace for `go f()`
where f is nil.  The test for this is TestGoNil in the runtime
package, which we don't run yet but will run with a subsequent gotools
patch.  Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.
Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 249487)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-dac4bb4f4ed8e7f2939d45439048dec2f6db14cf
+075e67bdbcb730669c1af1aa2d53bb77cbb2a3c5
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: gcc/go/gofrontend/gogo.h
===
--- gcc/go/gofrontend/gogo.h(revision 249205)
+++ gcc/go/gofrontend/gogo.h(working copy)
@@ -3379,6 +3379,9 @@ static const int RUNTIME_ERROR_MAKE_CHAN
 // Division by zero.
 static const int RUNTIME_ERROR_DIVISION_BY_ZERO = 10;
 
+// Go statement with nil function.
+static const int RUNTIME_ERROR_GO_NIL = 11;
+
 // This is used by some of the langhooks.
 extern Gogo* go_get_gogo();
 
Index: gcc/go/gofrontend/statements.cc
===
--- gcc/go/gofrontend/statements.cc (revision 249205)
+++ gcc/go/gofrontend/statements.cc (working copy)
@@ -2201,6 +2201,15 @@ Thunk_statement::simplify_statement(Gogo
 
   Location location = this->location();
 
+  bool is_constant_function = this->is_constant_function();
+  Temporary_statement* fn_temp = NULL;
+  if (!is_constant_function)
+{
+  fn_temp = Statement::make_temporary(NULL, fn, location);
+  block->insert_statement_before(block->statements()->size() - 1, fn_temp);
+  fn = Expression::make_temporary_reference(fn_temp, location);
+}
+
   std::string thunk_name = Gogo::thunk_name();
 
   // Build the thunk.
@@ -2212,7 +2221,7 @@ Thunk_statement::simplify_statement(Gogo
   // argument to the thunk.
 
   Expression_list* vals = new Expression_list();
-  if (!this->is_constant_function())
+  if (!is_constant_function)
 vals->push_back(fn);
 
   if (interface_method != NULL)
@@ -2238,6 +2247,23 @@ Thunk_statement::simplify_statement(Gogo
   // Allocate the initialized struct on the heap.
   constructor = Expression::make_heap_expression(constructor, location);
 
+  // Throw an error if the function is nil.  This is so that for `go
+  // nil` we get a backtrace from the go statement, rather than a
+  // useless backtrace from the brand new goroutine.
+  Expression* param = constructor;
+  if (!is_constant_function)
+{
+  fn = Expression::make_temporary_reference(fn_temp, location);
+  Expression* nil = Expression::make_nil(location);
+  Expression* isnil = Expression::make_binary(OPERATOR_EQEQ, fn, nil,
+ location);
+  Expression* crash = gogo->runtime_error(RUNTIME_ERROR_GO_NIL, location);
+  crash = Expression::make_conditional(isnil, crash,
+  Expression::make_nil(location),
+  location);
+  param = Expression::make_compound(crash, constructor, location);
+}
+
   // Look up the thunk.
   Named_object* named_thunk = gogo->lookup(thunk_name, NULL);
   go_assert(named_thunk != NULL && named_thunk->is_function());
@@ -2246,7 +2272,7 @@ Thunk_statement::simplify_statement(Gogo
   Expression* func = Expression::make_func_reference(named_thunk, NULL,
 location);
   Expression_list* params = new Expression_list();
-  params->push_back(constructor);
+  params->push_back(param);
   Call_expression* call = Expression::make_call(func, params, false, location);
 
   // Build the simple go or defer statement.
Index: libgo/runtime/go-runtime-error.c
===
--- libgo/runtime/go-runtime-error.c(revision 249205)
+++ libgo/runtime/go-runtime-error.c(working copy)
@@ -49,7 +49,10 @@ enum
   MAKE_CHAN_OUT_OF_BOUNDS = 9,
 
   /* Integer division by zero.  */
-  DIVISION_BY_ZERO = 10
+  DIVISION_BY_ZERO = 10,
+
+  /* Go statement with nil function.  */
+  GO_NIL = 11
 };
 
 extern void __go_runtime_error () __attribute__ ((noreturn));
@@ -84,6 +87,12 @@ __go_runtime_error (int32 i)
 case DIVISION_BY_ZERO:
   runtime_panicstring ("integer divide by zero");
 
+case GO_NIL:
+  /* This one is a throw, rather than a panic.  Set throwing to
+not dump full stacks.  */
+  runtime_g()->m->throwing = -1;
+  runtime_throw ("go of nil func value");
+
 default:
   runtime_panicstring ("unknown runtime error");
 }
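Conceptually, the frontend change above makes `go f()` evaluate f once into a temporary and test it against nil at the go statement, so the "go of nil func value" error points at the caller rather than at the freshly spawned goroutine. A hedged Go-level sketch of the equivalent check, using a hypothetical safeGo helper (the real compiler emits this inline and throws rather than returning an error):

```go
package main

import "fmt"

// safeGo is a hypothetical helper mirroring what the frontend now emits
// for `go f()`: evaluate f once, and report "go of nil func value" from
// the go statement itself instead of crashing inside the new goroutine.
func safeGo(f func()) error {
	if f == nil {
		return fmt.Errorf("go of nil func value")
	}
	go f()
	return nil
}

func main() {
	var f func() // nil function value
	fmt.Println(safeGo(f))
}
```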


libgo patch committed: Print "panic" rather than "runtime.gopanic"

2017-06-21 Thread Ian Lance Taylor
This libgo patch changes tracebacks to print "panic" rather than
"runtime.gopanic".  Since the user's code calls "panic", this is
generally clearer.  The test for this is TestPanicTraceback in
runtime/crash_test.go; we don't run it yet, but we will soon.
Bootstrapped and ran Go testsuite on x86_64-pc-linux-gnu.  Committed
to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 249494)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-075e67bdbcb730669c1af1aa2d53bb77cbb2a3c5
+f70ef19badafb85b1caa72b51b0082deb48d433a
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/go/runtime/traceback_gccgo.go
===
--- libgo/go/runtime/traceback_gccgo.go (revision 249205)
+++ libgo/go/runtime/traceback_gccgo.go (working copy)
@@ -77,7 +77,11 @@ func traceback(skip int32) {
 func printtrace(locbuf []location, gp *g) {
for i := range locbuf {
if showframe(locbuf[i].function, gp) {
-   print(locbuf[i].function, "\n\t", locbuf[i].filename, 
":", locbuf[i].lineno, "\n")
+   name := locbuf[i].function
+   if name == "runtime.gopanic" {
+   name = "panic"
+   }
+   print(name, "\n\t", locbuf[i].filename, ":", 
locbuf[i].lineno, "\n")
}
}
 }
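The rename in printtrace above is a simple frame-name mapping: the internal entry point runtime.gopanic is displayed as the "panic" the user actually wrote. As a standalone sketch of that mapping (extracted from the diff, not a new API):

```go
package main

import "fmt"

// displayName mirrors the rewrite in printtrace: show the internal panic
// entry point runtime.gopanic as plain "panic"; leave other frames alone.
func displayName(fn string) string {
	if fn == "runtime.gopanic" {
		return "panic"
	}
	return fn
}

func main() {
	fmt.Println(displayName("runtime.gopanic"))
	fmt.Println(displayName("main.main"))
}
```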


Re: [PR80693] drop value of parallel SETs dropped by combine

2017-06-21 Thread Alexandre Oliva
On Jun  8, 2017, Segher Boessenkool  wrote:

> [ I missed this patch the first time around; please cc: me to prevent this ]

> On Thu, May 18, 2017 at 07:25:57AM -0300, Alexandre Oliva wrote:
>> When an insn used by combine has multiple SETs, only the non-REG_UNUSED
>> set is used: others will end up dropped on the floor.

> Sometimes, yes; not always.

You mean sets to non-REGs, I suppose.  I didn't take them into account
in my statement indeed, but I think it still applies: can_combine_p will
reject parallel SETs if two or more of them don't have a REG_UNUSED note
for their respective SET_DESTs.

>> PR rtl-optimization/80693
>> * combine.c (distribute_notes): Add IDEST parameter.  Reset any
>> REG_UNUSED REGs that are not IDEST, if IDEST is given.  Adjust
>> all callers.

> Most callers use NULL_RTX for idest.  It isn't obvious to me that this
> is correct.

My reasoning is that the block of try_combine that passes i[0-3]dest
will cover all combine-affected insns that could possibly have
REG_UNUSED notes, since these notes would have to be put back in the
insns at that point.  Since it's enough to reset the reg stats once,
that would do it, and so we need not be concerned with any other cases,
so we can pass NULL_RTX for idest elsewhere.

>> @@ -14087,6 +14090,26 @@ distribute_notes (rtx notes, rtx_insn *from_insn, 
>> rtx_insn *i3, rtx_insn *i2,
>> PUT_REG_NOTE_KIND (note, REG_DEAD);
>> place = i3;
>> }
>> +
>> +  /* If there were any parallel sets in FROM_INSN other than
>> + the one setting IDEST, it must be REG_UNUSED, otherwise
>> + we could not have used FROM_INSN in combine.  Since this
>> + combine attempt succeeded, we know this unused SET was
>> + dropped on the floor, because the insn was either deleted
>> + or created from a new pattern that does not use its
>> + SET_DEST.  We must forget whatever we knew about the
>> + value that was stored by that SET, since the prior value
>> + may still be present in IDEST's src expression or
>> + elsewhere, and we do not want to use properties of the
>> + dropped value as if they applied to the prior one when
>> + simplifying e.g. subsequent combine attempts.  */
>> +  if (idest && XEXP (note, 0) != idest)

> Would it work to just have "else" instead if this "if"?  Or hrm, we'll
> need to kill the recorded reg_stat value in the last case before this
> as well?

You mean instead of passing an idest to distribute_notes and testing it,
right?

We might also need to catch the case in which the first if block
breaks.  I think I had missed that.

> Could you try that out?  Or I can do it, let me know what you prefer.

I'll give it a shot.

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer


Re: Unreviewed^2 build, cpp patches

2017-06-21 Thread Rainer Orth
Rainer Orth  writes:

> The following patches have remained unreviewed for two weeks despite a
> reminder:

it's three weeks now...

>   [build] Support --sysroot with Solaris ld
> https://gcc.gnu.org/ml/gcc-patches/2017-05/msg02342.html
>
> This needs a build maintainer, though I still think it could be
> considered obvious.

I've now installed this patch under the obvious rule: the `check ld
--help stderr, too' part counts as such, I believe, and Paolo has
always been fine with me making configure.ac changes that only affect
Solaris.

>   Support $SYSROOT for = in -I etc.
> https://gcc.gnu.org/ml/gcc-patches/2017-06/msg00011.html
>
> This needs a cpp or C maintainer.

It still does...

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University