Re: [PATCH] Fix PR81175, make gather builtins pure

2017-06-26 Thread Uros Bizjak
On Fri, Jun 23, 2017 at 3:22 PM, Richard Biener  wrote:
> On Fri, 23 Jun 2017, Marc Glisse wrote:
>
>> On Fri, 23 Jun 2017, Richard Biener wrote:
>>
>> > The vectorizer is confused about the spurious VDEFs that are caused
>> > by gather vectorization so the following avoids them by making the
>> > builtins pure appropriately.
>> >
>> > Bootstrap / regtest pending on x86_64-unknown-linux-gnu, ok for
>> > trunk and branch?
>> >
>> > Thanks,
>> > Richard.
>> >
>> > 2017-06-23  Richard Biener  
>> >
>> > PR target/81175
>> > * config/i386/i386.c (struct builtin_isa): Add pure_p member.
>> > (def_builtin2): Initialize pure_p.
>> > (ix86_add_new_builtins): Honor pure_p.
>> > (def_builtin_pure): New function.
>>
>> If you svn update (or equivalent), you will notice that the above is already
>> available ;-)
>
> Sorry, that was the GCC 7 variant of the patch ...  just scrap the
> already available pieces for trunk ;)

The rest is OK.

Thanks,
Uros.
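
For readers following along, here is a rough illustration of what "pure" buys the optimizers. It is only a sketch: gather_sum is a hypothetical stand-in for a gather builtin, not one of the i386 builtins touched by the patch, and __attribute__ ((pure)) is the user-level GNU C spelling of the property, not the internal def_builtin_pure mechanism.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a gather builtin: it loads base[idx[i]] for
   n lanes and sums them.  It only reads memory, so it can be declared
   pure, which tells the compiler the call does not store to memory.  */
__attribute__ ((pure))
double gather_sum (const double *base, const int *idx, size_t n)
{
  double sum = 0.0;
  for (size_t i = 0; i < n; i++)
    sum += base[idx[i]];
  return sum;
}

/* There is no store between the two calls, so a compiler is allowed to
   evaluate gather_sum once and reuse the result.  Without the pure
   attribute each call would have to be treated as a potential store.  */
double gather_twice (const double *base, const int *idx, size_t n)
{
  assert (base != NULL && idx != NULL);
  return gather_sum (base, idx, n) + gather_sum (base, idx, n);
}
```

A call that is known not to write memory needs no virtual definition in the IL, which is presumably why marking the gather builtins pure removes the spurious VDEFs mentioned above.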


Re: [PATCH] [SPARC] Add a workaround for the LEON3FT store-store errata

2017-06-26 Thread Eric Botcazou
> Eric, does Daniel's patch meet your requirements now?

Yes, modulo the config/sparc/sparc-c.c hunk, what is it used for?

But the implementation looks a bit strange: can't we merge the essentially
identical blocks of code into a single block, as for the other fixes?

-- 
Eric Botcazou


[PATCH, alpha, go]: Remove PtraceRegs definition to restore bootstrap

2017-06-26 Thread Uros Bizjak
Hello!

libgo is now able to automatically determine PtraceRegs. Attached
patch removes duplicate manual definition from system dependent
source.

Bootstrapped and regression tested on alphaev68-linux-gnu.

Uros.
Index: go/syscall/syscall_linux_alpha.go
===
--- go/syscall/syscall_linux_alpha.go   (revision 249592)
+++ go/syscall/syscall_linux_alpha.go   (working copy)
@@ -8,38 +8,6 @@
 
 import "unsafe"
 
-type PtraceRegs struct {
-   R0  uint64
-   R1  uint64
-   R2  uint64
-   R3  uint64
-   R4  uint64
-   R5  uint64
-   R6  uint64
-   R7  uint64
-   R8  uint64
-   R19 uint64
-   R20 uint64
-   R21 uint64
-   R22 uint64
-   R23 uint64
-   R24 uint64
-   R25 uint64
-   R26 uint64
-   R27 uint64
-   R28 uint64
-   Hae uint64
-   Trap_a0 uint64
-   Trap_a1 uint64
-   Trap_a2 uint64
-   Ps  uint64
-   Pc  uint64
-   Gp  uint64
-   R16 uint64
-   R17 uint64
-   R18 uint64
-}
-
 func (r *PtraceRegs) PC() uint64 {
return r.Pc
 }


Re: [PATCH] Fix more PR80928 fallout

2017-06-26 Thread Richard Biener
On Fri, 23 Jun 2017, Jeff Law wrote:

> On 06/23/2017 05:39 AM, Richard Biener wrote:
> > 
> > SLP induction vectorization runs into the issue that it remembers
> > pointers to PHI nodes in the SLP tree during analysis.  But those
> > may get invalidated by loop copying (for prologue/epilogue peeling
> > or versioning) as the low-level CFG helper copy_bbs works in the
> > way of copying individual BBs plus their outgoing edges but with
> > old destinations and at the end re-directing the edges to the
> > desired location.  In SSA this triggers the whole machinery of
> > making room for new PHI nodes -- that is undesirable because it
> > causes re-allocation of PHI nodes in the set of source blocks.
> > 
> > After much pondering I arrived at the following (least ugly) solution
> > to this "problem" (well, I define it as a problem, it's at least
> > an inefficiency and a workaround in the vectorizer would be way
> > uglier).  Namely simply do not trigger the SSA machinery for
> > blocks with BB_DUPLICATED (I skimmed all other users and they seem
> > fine).
> > 
> > In the process I also implemented some poisoning of the old PHI node
> > when we reallocate (well, free) PHI nodes.  But that triggers some
> > other issues, one fixed by the tree-ssa-phionlycoprop.c hunk below.
> > So I'm not submitting it as part of this fix.
> > 
> > Bootstrapped (with the poisoning so far, plain patch still running)
> > on x86_64-unknown-linux-gnu, testing in progress.
> > 
> > Comments welcome, testing won't finish before I leave for the
> > weekend.
> I fully support poisoning the old PHI nodes -- I tracked down a similar
> problem just a few months back that probably would have been obvious if
> we had poisoned the old nodes (79621 which is now a missed optimization
> bug).
> 
> I wouldn't be surprised if there's others lurking and given the general
> trend of using block duplication to enable various optimizations,
> catching this stuff early would definitely be good.

I've applied this fix.  For reference, below is what passed bootstrap
and regtest minus a few PRE related testsuite fallouts (stupid PRE
simple-minded DCE is giving me a hard time here ... the interaction
between remove_dead_inserted_code and el_to_remove is quite ugly).

When looking at this I also wondered why/if the cache of allocated
PHI nodes is worth the extra trouble of duplicated API like
remove_phi_node vs. gsi_remove.  Not something I have time to clean up
right now, so I'll sit on the patch below for some more time as well.

Richard.
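
The poisoning idea can be sketched outside of GCC as follows. This is only an illustration under assumed names (struct node, node_size and poison_node are hypothetical, not GCC's gphi API); it mirrors the size computation and the 0xfe fill used in the patch below, and keeps the capacity field intact so a free list can still bucket the node by size.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical node with a trailing argument array, loosely modelled on a
   PHI node.  The names are illustrative and not GCC's.  */
struct arg
{
  void *def;
};

struct node
{
  size_t capacity;      /* number of argument slots allocated */
  size_t nargs;         /* number of slots currently in use */
  struct arg args[1];   /* really CAPACITY elements */
};

/* Size in bytes of a node with LEN argument slots.  */
size_t node_size (size_t len)
{
  assert (len >= 1);
  return sizeof (struct node) - sizeof (struct arg)
         + sizeof (struct arg) * len;
}

/* Overwrite a node that is about to go on a free list with a recognizable
   byte pattern, so that any stale pointer to it reads garbage instead of
   plausible data.  The capacity is restored afterwards because the free
   list still needs it.  */
void poison_node (struct node *n)
{
  size_t len = n->capacity;
  memset (n, 0xfe, node_size (len));
  n->capacity = len;
}
```

With such a fill, a stale pointer kept across a reallocation tends to fail loudly on first use rather than silently reading an old but valid-looking node.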

Index: gcc/tree-phinodes.c
===
--- gcc/tree-phinodes.c (revision 249638)
+++ gcc/tree-phinodes.c (working copy)
@@ -67,7 +67,7 @@ along with GCC; see the file COPYING3.
the -2 on all the calculations below.  */
 
 #define NUM_BUCKETS 10
-static GTY ((deletable (""))) vec *free_phinodes[NUM_BUCKETS - 2];
+static GTY ((deletable (""))) vec *free_phinodes[NUM_BUCKETS - 2];
 static unsigned long free_phinode_count;
 
 static int ideal_phi_node_len (int);
@@ -103,10 +103,10 @@ allocate_phi_node (size_t len)
 
   /* If our free list has an element, then use it.  */
   if (bucket < NUM_BUCKETS - 2
-  && gimple_phi_capacity ((*free_phinodes[bucket])[0]) >= len)
+  && (*free_phinodes[bucket]).last()->capacity >= len)
 {
   free_phinode_count--;
-  phi = as_a <gphi *> (free_phinodes[bucket]->pop ());
+  phi = free_phinodes[bucket]->pop ();
   if (free_phinodes[bucket]->is_empty ())
vec_free (free_phinodes[bucket]);
   if (GATHER_STATISTICS)
@@ -208,7 +208,7 @@ make_phi_node (tree var, int len)
 /* We no longer need PHI, release it so that it may be reused.  */
 
 static void
-release_phi_node (gimple *phi)
+release_phi_node (gphi *phi)
 {
   size_t bucket;
   size_t len = gimple_phi_capacity (phi);
@@ -220,6 +220,13 @@ release_phi_node (gimple *phi)
   imm = gimple_phi_arg_imm_use_ptr (phi, x);
   delink_imm_use (imm);
 }
+  if (flag_checking)
+{
+  memset (phi, 0xfe, (sizeof (struct gphi)
+ - sizeof (struct phi_arg_d)
+ + sizeof (struct phi_arg_d) * len));
+  phi->capacity = len;
+}
 
   bucket = len > NUM_BUCKETS - 1 ? NUM_BUCKETS - 1 : len;
   bucket -= 2;
@@ -438,7 +445,7 @@ remove_phi_args (edge e)
 void
 remove_phi_node (gimple_stmt_iterator *gsi, bool release_lhs_p)
 {
-  gimple *phi = gsi_stmt (*gsi);
+  gphi *phi = as_a <gphi *> (gsi_stmt (*gsi));
 
   if (release_lhs_p)
 insert_debug_temps_for_defs (gsi);
@@ -447,9 +454,9 @@ remove_phi_node (gimple_stmt_iterator *g
 
   /* If we are deleting the PHI node, then we should release the
  SSA_NAME node so that it can be reused.  */
-  release_phi_node (phi);
   if (release_lhs_p)
 release_ssa_name (gimple_phi_result (phi));
+  release_phi_node (phi);
 }
 
 /* Remove all the phi nodes from BB.  */
Index: gcc/tree-ssa-pre.c
===
--- g

Re: [GCC][PATCH][mid-end] Optimize x * copysign (1.0, y) [Patch (1/2)]

2017-06-26 Thread Richard Biener
On Sat, 24 Jun 2017, Andrew Pinski wrote:

> On Mon, Jun 12, 2017 at 12:56 AM, Tamar Christina
>  wrote:
> > Hi All,
> >
> > this patch implements an optimization rewriting
> >
> > x * copysign (1.0, y) and
> > x * copysign (-1.0, y)
> 
> 
> This reminds me:
> copysign(-1.0, y) can be just optimized to:
> copysign(1.0, y)

I think I suggested that in my earlier review.

> I did that in my patch here:
> https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01860.html
> 
> This should allow you to reduce the number of patterns needed to match here.
> Note I still think we could do this in expand without a new
> builtin/internal function.
> I might go and code that up soonish.
> 
> Thanks,
> Andrew
> 
> >
> > to:
> >
> > x ^ (y & (1 << sign_bit_position))
> >
> > This is done by creating a special builtin during matching and generating
> > the appropriate instructions during expand. This new builtin is called
> > XORSIGN.
> >
> > The expansion of xorsign depends on whether the backend has an appropriate
> > optab available. If this is not the case then we use a modified version of
> > the existing copysign which does not take the abs value of the first
> > argument as a fall back.
> >
> > This patch is a revival of a previous patch
> > https://gcc.gnu.org/ml/gcc-patches/2015-10/msg00069.html
> >
> > Bootstrapped on both aarch64-none-linux-gnu and x86_64 with no issues.
> > Regression done on aarch64-none-linux-gnu and no regressions.
> >
> > Ok for trunk?
> >
> > gcc/
> > 2017-06-07  Tamar Christina  
> >
> > * builtins.def (BUILT_IN_XORSIGN, BUILT_IN_XORSIGNF): New.
> > (BUILT_IN_XORSIGNL, BUILT_IN_XORSIGN_FLOAT_NX): Likewise.
> > * match.pd (mult (COPYSIGN:s real_onep @0) @1): New simplifier.
> > (mult (COPYSIGN:s real_mus_onep @0) @1): Likewise.
> > (copysigns @0 (negate @1)): Likewise.
> > * builtins.c (expand_builtin_copysign): Promoted local to argument.
> > (expand_builtin): Added CASE_FLT_FN_FLOATN_NX (BUILT_IN_XORSIGN) and
> > CASE_FLT_FN (BUILT_IN_XORSIGN).
> > (BUILT_IN_COPYSIGN): Updated function call.
> > * optabs.h (expand_copysign): New bool.
> > (expand_xorsign): New.
> > * optabs.def (xorsign_optab): New.
> > * optabs.c (expand_copysign): New parameter.
> > * fortran/f95-lang.c (xorsignl, xorsign, xorsignf): New.
> > * fortran/mathbuiltins.def (XORSIGN): New.
> >
> > gcc/testsuite/
> > 2017-06-07  Tamar Christina  
> >
> > * gcc.dg/tree-ssa/xorsign.c: New.
> > * gcc.dg/xorsign_exec.c: New.
> > * gcc.dg/vec-xorsign_exec.c: New.
> > * gcc.dg/tree-ssa/reassoc-39.c (f2, f3): Updated constant to 2.
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)
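
To make the rewrite under discussion concrete, the identity x * copysign (1.0, y) == x ^ (y & sign bit) can be checked on bit patterns. The helper names below are hypothetical and the sketch assumes an IEEE binary64 double with the sign in bit 63; it is not the expander from the patch.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: flip the sign of X when Y is negative by XOR-ing
   Y's sign bit into X's bit pattern.  Assumes IEEE binary64 doubles.  */
double xorsign_bits (double x, double y)
{
  uint64_t xb, yb;
  assert (sizeof (double) == sizeof (uint64_t));
  memcpy (&xb, &x, sizeof xb);
  memcpy (&yb, &y, sizeof yb);
  xb ^= yb & (UINT64_C (1) << 63);   /* bit 63 is the sign bit */
  memcpy (&x, &xb, sizeof x);
  return x;
}

/* The source-level expression the match.pd pattern looks for.  */
double xorsign_ref (double x, double y)
{
  return x * copysign (1.0, y);
}
```

For non-NaN x the two functions should return the same value, since multiplying by +1.0 or -1.0 is exact and only changes the sign.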


Re: [Patch, ARM, testsuite] Add -mfloat-abi=hard to arm_neon_ok

2017-06-26 Thread Christophe Lyon
ping?

On 16 June 2017 at 17:39, Christophe Lyon  wrote:
> ping?
>
> On 7 June 2017 at 11:13, Christophe Lyon  wrote:
>> Hi,
>>
>>
>> On 2 June 2017 at 16:19, Christophe Lyon  wrote:
>>> Hi,
>>>
>>> I have recently updated the dejagnu version I use for
>>> cross-testing arm and aarch64 toolchains to 1.6+. One of the side
>>> effects was mentioned by Jonathan in
>>> https://gcc.gnu.org/ml/gcc-patches/2017-05/msg01267.html. Since I
>>> use multilibs to test many configurations, I noticed several
>>> changes in the results I get.
>>>
>>> In particular, on arm-none-linux-gnueabihf with -march=armv5t,
>>> all the tests that require arm_neon_ok fail to compile because
>>> they now use -march=armv5t -mfpu=neon -mfloat-abi=softfp
>>> -march=armv7-a, which leads to a failure to include
>>> gnu/stubs-soft.h (not present since the target is
>>> 'hf'). Previously, -march=armv5t was appended, making the tests
>>> unsupported because -mfpu=neon conflicts with -march=armv5t. Now,
>>> arm_neon_ok succeeds because it only checks if some preprocessor
>>> defines are present.
>>>
>>> This patch fixes that by including arm_neon.h in arm_neon_ok, such
>>> that it fails for unsupported cases. However, since most of these
>>> tests should pass instead of becoming unsupported, I have added flag
>>> combinations with -mfloat-abi=hard.
>>>
>>> However, this is not sufficient to make the
>>> gcc.target/arm/lto/pr65837* tests pass: they do not require
>>> arm_neon_ok, and when I tried to add it, they still failed
>>> because these lto tests do not support dg-add-options. My
>>> proposal is to add a new
>>> check_effective_target_arm_neon_ok_no_float_abi function which
>>> tries to use neon without trying to change the -mfloat-abi
>>> setting (that is, the same as arm_neon_ok, with only ""
>>> and "-mfpu=neon" in the list of flags) . This makes these two lto
>>> tests unsupported for non-hf targets (again because
>>> gnu/stubs-soft.h is not present).
>>>
>>> To make them pass on "hf" targets:
>>> - I added -mfpu=neon to dg-lto-options in pr65837-attr_0.c,
>>>   because the fpu attributes in arm_neon.h only work if
>>>   -mfpu=neon is enabled
>>> - I removed dg-suppress-ld-options {-mfpu=neon} from pr65837_0.c,
>>>   -mfpu=neon is needed for the test to compile with toolchains
>>>   where the default fpu is not neon (eg vfpv3-d16-fp16)
>>>
>>> On arm-none-linux-gnueabihf --with-cpu=cortex-a9 --with-fpu=vfp
>>> and multilib test flag=-march=armv5t, this patch brings:
>>> - 2 UNRESOLVED -> FAIL (gcc.dg/vect/vect-align-1.c)
>>> - 14 UNRESOLVED -> XPASS (in gcc.dg/vect/)
>>> - 2765 new PASS
>>> - 3639 FAIL -> PASS
>>> - 1826 UNRESOLVED -> PASS
>>> - 102 UNRESOLVED -> XFAIL
>>>
>>> as visible in the red cell at
>>> http://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/248552-gnu-stubs9.patch/report-build-info.html
>>> (the build-failed line can be ignored, it was caused by a server
>>> problem)
>>>
>>> Sorry, the explanation is almost longer than the patch :-)
>>>
>>> Is it OK for trunk?
>>> (Just realizing that I forgot to document the new functions :( )
>>>
>>
>> Here is an updated version with a bit of documentation for the new
>> effective target.
>> arm_neon_ok_no_float_abi now only tries to add -mfpu=neon, not ""
>> since we always
>> add -mfpu=neon in the lto tests anyway.
>>
>> OK?
>>
>>
>>> Thanks,
>>>
>>> Christophe


Re: [PATCH, ARM] Implement __ARM_FEATURE_COPROC coprocessor intrinsic feature macro

2017-06-26 Thread Christophe Lyon
ping?

On 21 June 2017 at 18:57, Christophe Lyon  wrote:
> Hi,
>
>
> On 19 June 2017 at 11:32, Richard Earnshaw (lists)
>  wrote:
>> On 16/06/17 15:56, Prakhar Bahuguna wrote:
>>> On 16/06/2017 15:37:18, Richard Earnshaw (lists) wrote:
 On 16/06/17 08:48, Prakhar Bahuguna wrote:
> On 15/06/2017 17:23:43, Richard Earnshaw (lists) wrote:
>> On 14/06/17 10:35, Prakhar Bahuguna wrote:
>>> The ARM ACLE defines the __ARM_FEATURE_COPROC macro which indicates which
>>> coprocessor intrinsics are available for the target. If
>>> __ARM_FEATURE_COPROC is undefined, the target does not support coprocessor
>>> intrinsics. The feature levels are defined as follows:
>>>
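>>> +---------+-----------+--------------------------------------------------+
>>> | **Bit** | **Value** | **Intrinsics Available**                         |
>>> +---------+-----------+--------------------------------------------------+
>>> | 0       | 0x1       | __arm_cdp, __arm_ldc, __arm_ldcl, __arm_stc,     |
>>> |         |           | __arm_stcl, __arm_mcr and __arm_mrc              |
>>> +---------+-----------+--------------------------------------------------+
>>> | 1       | 0x2       | __arm_cdp2, __arm_ldc2, __arm_stc2, __arm_ldc2l, |
>>> |         |           | __arm_stc2l, __arm_mcr2 and __arm_mrc2           |
>>> +---------+-----------+--------------------------------------------------+
>>> | 2       | 0x4       | __arm_mcrr and __arm_mrrc                        |
>>> +---------+-----------+--------------------------------------------------+
>>> | 3       | 0x8       | __arm_mcrr2 and __arm_mrrc2                      |
>>> +---------+-----------+--------------------------------------------------+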
>>>
>>> This patch implements full support for this feature macro as defined in
>>> section 5.9 of the ACLE
>>> (https://developer.arm.com/products/software-development-tools/compilers/arm-compiler-5/docs/101028/latest/5-feature-test-macros).
>>>
>>> gcc/ChangeLog:
>>>
>>> 2017-06-14  Prakhar Bahuguna  
>>>
>>>   * config/arm/arm-c.c (arm_cpu_builtins): New block to define
>>>__ARM_FEATURE_COPROC according to support.
>>>
>>> 2017-06-14  Prakhar Bahuguna  
>>>   * gcc/testsuite/gcc.target/arm/acle/cdp.c: Add feature macro bitmap
>>>   test.
>>>   * gcc/testsuite/gcc.target/arm/acle/cdp2.c: Likewise.
>>>   * gcc/testsuite/gcc.target/arm/acle/ldc.c: Likewise.
>>>   * gcc/testsuite/gcc.target/arm/acle/ldc2.c: Likewise.
>>>   * gcc/testsuite/gcc.target/arm/acle/ldc2l.c: Likewise.
>>>   * gcc/testsuite/gcc.target/arm/acle/ldcl.c: Likewise.
>>>   * gcc/testsuite/gcc.target/arm/acle/mcr.c: Likewise.
>>>   * gcc/testsuite/gcc.target/arm/acle/mcr2.c: Likewise.
>>>   * gcc/testsuite/gcc.target/arm/acle/mcrr.c: Likewise.
>>>   * gcc/testsuite/gcc.target/arm/acle/mcrr2.c: Likewise.
>>>   * gcc/testsuite/gcc.target/arm/acle/mrc.c: Likewise.
>>>   * gcc/testsuite/gcc.target/arm/acle/mrc2.c: Likewise.
>>>   * gcc/testsuite/gcc.target/arm/acle/mrrc.c: Likewise.
>>>   * gcc/testsuite/gcc.target/arm/acle/mrrc2.c: Likewise.
>>>   * gcc/testsuite/gcc.target/arm/acle/stc.c: Likewise.
>>>   * gcc/testsuite/gcc.target/arm/acle/stc2.c: Likewise.
>>>   * gcc/testsuite/gcc.target/arm/acle/stc2l.c: Likewise.
>>>   * gcc/testsuite/gcc.target/arm/acle/stcl.c: Likewise.
>>>
>>> Testing done: ACLE regression tests updated with tests for feature macro
>>> bits. All regression tests pass.
>>>
>>> Okay for trunk?
>>>
>>>
>>> 0001-Implement-__ARM_FEATURE_COPROC-coprocessor-intrinsic.patch
>>>
>>>
>>> From 79d71aec9d2bdee936b240ae49368ff5f8d8fc48 Mon Sep 17 00:00:00 2001
>>> From: Prakhar Bahuguna 
>>> Date: Tue, 2 May 2017 13:43:40 +0100
>>> Subject: [PATCH] Implement __ARM_FEATURE_COPROC coprocessor intrinsic feature
>>>  macro
>>>
>>> ---
>>>  gcc/config/arm/arm-c.c| 19 +++
>>>  gcc/testsuite/gcc.target/arm/acle/cdp.c   |  3 +++
>>>  gcc/testsuite/gcc.target/arm/acle/cdp2.c  |  3 +++
>>>  gcc/testsuite/gcc.target/arm/acle/ldc.c   |  3 +++
>>>  gcc/testsuite/gcc.target/arm/acle/ldc2.c  |  3 +++
>>>  gcc/testsuite/gcc.target/arm/acle/ldc2l.c |  3 +++
>>>  gcc/testsuite/gcc.target/arm/acle/ldcl.c  |  3 +++
>>>  gcc/testsuite/gcc.target/arm/acle/mcr.c   |  3 +++
>>>  gcc/testsuite/gcc.target/arm/acle/mcr2.c  |  3 +++
>>>  gcc/testsuite/gcc.target/arm/acle/mcrr.c  |  3 +++
>>>  gcc/testsuite/gcc.target/arm/acle/mcrr2.c |  3 +++
>>>  gcc/testsuite/gcc.target/arm/acle/mrc.c   |  3 +++
>>>  gcc/testsuite/gcc.target/arm/acle/mrc2.c  |  3 +++
>>>  gcc/testsuite/gcc.target/arm/acle/mrrc.c  |  3 +++
>>>  
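
As an aside, the bit assignments from the quoted table can be exercised with a small portable helper. The enum and function names are made up for illustration (only the values 0x1, 0x2, 0x4 and 0x8 come from the table), and real code would normally test the macro directly in the preprocessor instead.

```c
#include <assert.h>

/* Bit values taken from the table in the quoted message.  The enumerator
   names are illustrative and not part of the ACLE.  */
enum coproc_feature
{
  COPROC_BASE  = 0x1,   /* __arm_cdp, __arm_ldc, __arm_mcr, __arm_mrc, ... */
  COPROC_V2    = 0x2,   /* __arm_cdp2, __arm_ldc2, __arm_mcr2, ... */
  COPROC_MCRR  = 0x4,   /* __arm_mcrr and __arm_mrrc */
  COPROC_MCRR2 = 0x8    /* __arm_mcrr2 and __arm_mrrc2 */
};

/* Return nonzero if every bit in WANTED is set in MACRO_VALUE, where
   MACRO_VALUE is the value of __ARM_FEATURE_COPROC on some target.  */
int coproc_has (unsigned macro_value, unsigned wanted)
{
  assert ((wanted & ~0xfu) == 0);   /* only bits 0..3 are defined */
  return (macro_value & wanted) == wanted;
}
```

In target code the equivalent guard would presumably look like "#if defined (__ARM_FEATURE_COPROC) && (__ARM_FEATURE_COPROC & 0x4)" around uses of __arm_mcrr.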

Re: fix libcc1 dependencies in toplevel Makefile

2017-06-26 Thread Olivier Hainque
Hello Alex,

Thanks for the review and for the extensive comments on this,
much appreciated :)

> On Jun 22, 2017, at 14:12 , Alexandre Oliva  wrote:
> 
> On Jun 13, 2017, Olivier Hainque  wrote:
> 
>> 2017-06-13  Olivier Hainque  
> 
>>  * Makefile.def (host_modules): Set depgcc to true for libcc1,
>>  meaning need of a dep on stage_current if gcc-bootstrap and on
>>  maybe-all-gcc otherwise.
>>  (dependencies) Remove unconditional dependency on all-gcc.
> 
>>  * Makefile.tpl ("all" targets): Handle depgcc.
>>  * Makefile.in: Regenerate
> 
> This looks reasonable to me.  libcc1 is weird.  It's not a target
> library, it doesn't use the current stage tools for building.  It might
> as well not have any deps on the current stage's gcc, if it weren't for
> the fact that it includes headers from the current stage's gcc and links
> with current stage's host libraries, and even its configure reads from
> files created in current stage's gcc configuration.
> 
> So, it needs to be built after gcc and its host deps are built, and it
> needs to be configured after gcc is configured.  However, it is not part
> of the bootstrap, and we avoid building it more than once even in a
> bootstrap build.  That's what makes it special and tricky.

OK, thanks for summarizing the areas of intricacy.

> Your patch takes care of the build dependencies of libcc1, which should
> avoid some scenarios that might lead to concurrency between staged and
> non-staged builds.  However, I don't see that it ensures libcc1 will be
> built after GCC in bootstrap scenarios; it might do so under 'make
> bootstrap', but probably not under 'make all-libcc1'.  I think we may
> need some additional bootstrap-only explicit dependency for that to work
> properly.

I don't quite understand this: we're using the same prerequisite as target
libraries, e.g. all-target-libstdc++-v3 or all-target-libbacktrace, and I
don't see other deps for these either.

I don't see why the sequencing constraints for libcc1 should be tighter
than those for the target libraries.

I certainly don't grasp all the ramifications of the particularities you
outlined above, though.

> Furthermore, the patch does not take care of the configure dependencies
> of libcc1, so I think there might still be room for trouble, depending
> on what make targets are concurrently requested.  I'm not entirely sure
> this is true, though.

To my knowledge, we have never observed a problem in this area, assuming
our understanding of the problems we saw was correct :)

> I'd like to understand better what the concurrency problem is with the
> current build machinery, before we proceed with this change.  If you
> manage to trigger the problem again, could you try to further analyze
> build logs to check for e.g. concurrent activation of all-gcc in both
> the top-level Makefile and the recursed-into-for-stage1 Makefile, or
> somesuch?  Something else worth considering is what the make targets
> specified in the command line were.

The problems were showing pretty rarely, only on certain hosts, in
certain load conditions. We should still have the logs around and I'll
look into this. They are regular logs, without -d. I can almost certainly
fetch the exact "make" command line involved.

We had performed some analysis of what was happening, to our understanding.
I'll dig this out as well.

> All this said, I do agree that explicit deps on maybe-all-gcc are a
> likely source of trouble;

OK

> AFAICT all other host modules that are to be
> built after gcc depend on some target lib too.  Perhaps that brings some
> dep that libcc1 should have too...

This relates to the comments above. I guess I don't understand
what libcc1 might need that target libs wouldn't need.

Olivier


Re: [RFC][PR 67336][PING^2] Verify pointers during stack unwind

2017-06-26 Thread Yuri Gribov
On Sun, Jun 25, 2017 at 8:13 PM, Andrew Pinski  wrote:
> On Sun, Jun 25, 2017 at 12:08 PM, Yuri Gribov  wrote:
>> Hi all,
>>
>> The libgcc unwinder currently does not verify the pointers which it
>> chases on the stack. In practice this causes segfaults fairly often
>> when unwinding corrupted stacks (e.g. when trying to print a
>> diagnostic on a fatal error) [1]. Ironically this usually happens in
>> error reporting code, which puzzles users.
>>
>> I've attached one motivating example - with current libgcc it will
>> abort inside unwinder when trying to access invalid address
>> (0x0a0a0a0a).
>>
>> There is an old request to provide a safer version of
>> _Unwind_Backtrace (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67336)
>> that would check pointer validity prior to dereferencing. I've
>> attached a draft patch to see if this approach is viable and hopefully
>> get some advice.
>>
>> The change is rather straightforward: I add a new
>> _Unwind_Backtrace_Checked which checks return value of msync on every
>> potentially unsafe access (libunwind does something like this as well,
>> although in a very incomplete manner).
>> To avoid paying for syscalls too often, I cache the last checked
>> memory page. Potentially parsing /proc/$$/maps would allow for much
>> faster verification but I didn't bother too much as new APIs are
>> intended for reporting fatal errors where speed isn't an issue.
>>
>> The code is only implemented for DW2 unwinder (probably used on most 
>> targets).
>>
>> So my questions now are:
>> 1) Would this feature be considered useful, i.e. will it be accepted for
>> trunk once the implementation is polished/tested?
>> 2) Should I strive to implement it for all possible targets or DW2
>> would do for now? I don't have easy access to other platforms (ARM,
>> C6x, etc.) so this may delay implementation.
>> 3) Any suggestions/comments on this attached draft implementation?
>> E.g. alternative syscalls to use (Andrew suggested futex), how many
>> verified addresses to cache, whether I should verify unwind table
>> accesses in addition to stack accesses, etc.
>
> The version script should be using GCC_8.0.0 since 6.2.0 has already
> shipped months ago.
> Also all patches should be submitted against the trunk and not a
> released version.

Well, this is an RFC, so I thought the original patch might be enough
for a general ok/not ok decision...

-Y
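
For context, the msync-based check with a one-entry page cache described in the quoted proposal might look roughly like the following. This is a sketch under assumptions, not the code from the draft patch: addr_is_mapped and last_ok_page are hypothetical names, and it relies on msync failing for a page that is not mapped, as it does on Linux.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

/* Page base of the last address that was found to be mapped, or 0 if
   nothing has been checked yet.  Hypothetical one-entry cache.  */
static uintptr_t last_ok_page;

/* Return 1 if the page containing P appears to be mapped, 0 otherwise.
   msync is used only as a probe; no data is actually flushed for an
   anonymous mapping.  */
int addr_is_mapped (const void *p)
{
  long page_size = sysconf (_SC_PAGESIZE);
  assert (page_size > 0);

  uintptr_t page = (uintptr_t) p & ~((uintptr_t) page_size - 1);
  if (page != 0 && page == last_ok_page)
    return 1;                       /* cache hit, skip the syscall */

  if (msync ((void *) page, (size_t) page_size, MS_ASYNC) == 0)
    {
      last_ok_page = page;
      return 1;
    }
  return 0;                         /* probe failed, treat as unmapped */
}
```

One design caveat: the cache can go stale if the page is unmapped between two checks, which seems acceptable for a backtrace taken while reporting a fatal error but would not be for general use.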


Re: [PATCH] Fold (a > 0 ? 1.0 : -1.0) into copysign (1.0, a) and a * copysign (1.0, a) into abs(a)

2017-06-26 Thread Richard Sandiford
Marc Glisse  writes:
> +(for cmp (gt ge lt le)
> + outp (convert convert negate negate)
> + outn (negate negate convert convert)
> + /* Transform (X > 0.0 ? 1.0 : -1.0) into copysign(1, X). */
> + /* Transform (X >= 0.0 ? 1.0 : -1.0) into copysign(1, X). */
> + /* Transform (X < 0.0 ? 1.0 : -1.0) into copysign(1,-X). */
> + /* Transform (X <= 0.0 ? 1.0 : -1.0) into copysign(1,-X). */
> + (simplify
> +  (cond (cmp @0 real_zerop) real_onep real_minus_onep)
> +  (if (!HONOR_NANS (type) && !HONOR_SIGNED_ZEROS (type)
> +   && types_match (type, TREE_TYPE (@0)))
> +   (switch
> +(if (types_match (type, float_type_node))
> + (BUILT_IN_COPYSIGNF { build_one_cst (type); } (outp @0)))
> +(if (types_match (type, double_type_node))
> + (BUILT_IN_COPYSIGN { build_one_cst (type); } (outp @0)))
> +(if (types_match (type, long_double_type_node))
> + (BUILT_IN_COPYSIGNL { build_one_cst (type); } (outp @0))
>
> There is already a 1.0 of the right type in the input, it would be easier 
> to reuse it in the output than build a new one.
>
> Non-generic builtins like copysign are such a pain... We also end up 
> missing the 128-bit case that way (pre-existing problem, not your patch). 
> We seem to have a corresponding internal function, but apparently it is 
> not used until expansion (well, maybe during vectorization).

It should be OK to introduce uses of the internal functions whenever
it's useful.  The match code will check that the internal function is
implemented before allowing the transformation.

The idea was more-or-less:

- Leave calls to libm functions alone until expand if there's no
  particular benefit to converting them earlier.  This avoids introducing
  a gratuitous difference between targets that can and can't open-code the
  function.

- Fold calls to libm functions to calls to libm functions where
  possible, because these transformations work regardless of whether the
  function can be open-coded.

- When introducing new calls, use internal functions if we need to be
  sure that the target has an appropriate optab.

- Also use internal functions to represent the non-errno setting forms
  of an internal function, in cases where the built-in functions are
  assumed to set errno.

But IFN_COPYSIGN might not be useful as-is, since the copysign built-ins
are usually expanded without the help of an optab.  It should be possible
to change things so that IFN_COPYSIGN is supported in the same situations
as the built-in though.

Thanks,
Richard
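
To see why the quoted pattern is guarded by !HONOR_NANS and !HONOR_SIGNED_ZEROS, the ">=" form can be compared against copysign directly. The function names are hypothetical and the sketch only covers double.

```c
#include <assert.h>
#include <math.h>

/* The conditional form matched by the quoted pattern (">=" variant).  */
double sign_select (double x)
{
  return x >= 0.0 ? 1.0 : -1.0;
}

/* The proposed replacement.  */
double sign_copysign (double x)
{
  return copysign (1.0, x);
}

/* Return nonzero if both forms give the same result for X.  NaN inputs
   are excluded because the sign of a NaN is not meaningful here.  */
int forms_agree (double x)
{
  assert (!isnan (x));
  return sign_select (x) == sign_copysign (x);
}
```

The two forms agree for ordinary nonzero values but differ for -0.0, where the comparison yields 1.0 and copysign yields -1.0, so the fold is only valid when signed zeros need not be honored.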


Re: [GCC][PATCH][mid-end] Optimize x * copysign (1.0, y) [Patch (1/2)]

2017-06-26 Thread Tamar Christina
Hi Andrew,

Thanks! I'll put together the rest today or tomorrow.
Sorry for the slow response on this one.

Tamar

From: Andrew Pinski 
Sent: Monday, June 26, 2017 3:09:54 AM
To: Tamar Christina
Cc: GCC Patches; nd; l...@redhat.com; i...@airs.com; rguent...@suse.de
Subject: Re: [GCC][PATCH][mid-end] Optimize x * copysign (1.0, y) [Patch (1/2)]

On Sat, Jun 24, 2017 at 4:53 PM, Andrew Pinski  wrote:
> On Mon, Jun 12, 2017 at 12:56 AM, Tamar Christina
>  wrote:
>> Hi All,
>>
>> this patch implements an optimization rewriting
>>
>> x * copysign (1.0, y) and
>> x * copysign (-1.0, y)
>
>
> This reminds me:
> copysign(-1.0, y) can be just optimized to:
> copysign(1.0, y)
>
> I did that in my patch here:
> https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01860.html

I updated the patch to handle all constants and not just -1.0.

>
> This should allow you to reduce the number of patterns needed to match here.
> Note I still think we could do this in expand without a new
> builtin/internal function.
> I might go and code that up soonish.

Also something like attached (NOTE this is NOT a full patch and needs
the xorsign optabs part of your patch) should work for the expand side
rather than creating a new builtin.  There still needs to be handling of
the vector based copysign.  But you should get the general idea.  I
would like to see more of these special expand patterns really.

NOTE you can remove the target hook part and just check if xorsign
optab is there.  I don't know if that is what we want to do if not
allow for generic expanding of this.

Thanks,
Andrew Pinski


>
> Thanks,
> Andrew
>
>>
>> to:
>>
>> x ^ (y & (1 << sign_bit_position))
>>
>> This is done by creating a special builtin during matching and generating
>> the appropriate instructions during expand. This new builtin is called
>> XORSIGN.
>>
>> The expansion of xorsign depends on whether the backend has an appropriate
>> optab available. If this is not the case then we use a modified version of
>> the existing copysign which does not take the abs value of the first
>> argument as a fall back.
>>
>> This patch is a revival of a previous patch
>> https://gcc.gnu.org/ml/gcc-patches/2015-10/msg00069.html
>>
>> Bootstrapped on both aarch64-none-linux-gnu and x86_64 with no issues.
>> Regression done on aarch64-none-linux-gnu and no regressions.
>>
>> Ok for trunk?
>>
>> gcc/
>> 2017-06-07  Tamar Christina  
>>
>> * builtins.def (BUILT_IN_XORSIGN, BUILT_IN_XORSIGNF): New.
>> (BUILT_IN_XORSIGNL, BUILT_IN_XORSIGN_FLOAT_NX): Likewise.
>> * match.pd (mult (COPYSIGN:s real_onep @0) @1): New simplifier.
>> (mult (COPYSIGN:s real_mus_onep @0) @1): Likewise.
>> (copysigns @0 (negate @1)): Likewise.
>> * builtins.c (expand_builtin_copysign): Promoted local to argument.
>> (expand_builtin): Added CASE_FLT_FN_FLOATN_NX (BUILT_IN_XORSIGN) and
>> CASE_FLT_FN (BUILT_IN_XORSIGN).
>> (BUILT_IN_COPYSIGN): Updated function call.
>> * optabs.h (expand_copysign): New bool.
>> (expand_xorsign): New.
>> * optabs.def (xorsign_optab): New.
>> * optabs.c (expand_copysign): New parameter.
>> * fortran/f95-lang.c (xorsignl, xorsign, xorsignf): New.
>> * fortran/mathbuiltins.def (XORSIGN): New.
>>
>> gcc/testsuite/
>> 2017-06-07  Tamar Christina  
>>
>> * gcc.dg/tree-ssa/xorsign.c: New.
>> * gcc.dg/xorsign_exec.c: New.
>> * gcc.dg/vec-xorsign_exec.c: New.
>> * gcc.dg/tree-ssa/reassoc-39.c (f2, f3): Updated constant to 2.
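
The canonicalization Andrew mentions (copysign of any constant only depends on the constant's magnitude) is easy to check in isolation. A minimal sketch, with copysign_canonical being a hypothetical name and not a function from either patch:

```c
#include <assert.h>
#include <math.h>

/* copysign takes the magnitude from its first argument and the sign from
   its second, so the sign of a constant first argument can be dropped.
   Hypothetical helper showing the canonical form for a non-NaN C.  */
double copysign_canonical (double c, double y)
{
  assert (!isnan (c));
  double magnitude = c < 0.0 ? -c : c;
  return copysign (magnitude, y);
}
```

This is why a single pattern on copysign (1.0, y) would be enough once the -1.0 case has been folded into it.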


Re: [Patch, ARM, testsuite] Add -mfloat-abi=hard to arm_neon_ok

2017-06-26 Thread Kyrill Tkachov

Hi Christophe,

On 07/06/17 10:13, Christophe Lyon wrote:

Hi,


On 2 June 2017 at 16:19, Christophe Lyon  wrote:

Hi,

I have recently updated the dejagnu version I use for
cross-testing arm and aarch64 toolchains to 1.6+. One of the side
effects was mentioned by Jonathan in
https://gcc.gnu.org/ml/gcc-patches/2017-05/msg01267.html. Since I
use multilibs to test many configurations, I noticed several
changes in the results I get.

In particular, on arm-none-linux-gnueabihf with -march=armv5t,
all the tests that require arm_neon_ok fail to compile because
they now use -march=armv5t -mfpu=neon -mfloat-abi=softfp
-march=armv7-a, which leads to a failure to include
gnu/stubs-soft.h (not present since the target is
'hf'). Previously, -march=armv5t was appended, making the tests
unsupported because -mfpu=neon conflicts with -march=armv5t. Now,
arm_neon_ok succeeds because it only checks if some preprocessor
defines are present.

This patch fixes that by including arm_neon.h in arm_neon_ok, such
that it fails for unsupported cases. However, since most of these
tests should pass instead of becoming unsupported, I have added flag
combinations with -mfloat-abi=hard.

However, this is not sufficient to make the
gcc.target/arm/lto/pr65837* tests pass: they do not require
arm_neon_ok, and when I tried to add it, they still failed
because these lto tests do not support dg-add-options. My
proposal is to add a new
check_effective_target_arm_neon_ok_no_float_abi function which
tries to use neon without trying to change the -mfloat-abi
setting (that is, the same as arm_neon_ok, with only ""
and "-mfpu=neon" in the list of flags) . This makes these two lto
tests unsupported for non-hf targets (again because
gnu/stubs-soft.h is not present).

To make them pass on "hf" targets:
- I added -mfpu=neon to dg-lto-options in pr65837-attr_0.c,
   because the fpu attributes in arm_neon.h only work if
   -mfpu=neon is enabled
- I removed dg-suppress-ld-options {-mfpu=neon} from pr65837_0.c,
   -mfpu=neon is needed for the test to compile with toolchains
   where the default fpu is not neon (eg vfpv3-d16-fp16)

On arm-none-linux-gnueabihf --with-cpu=cortex-a9 --with-fpu=vfp
and multilib test flag=-march=armv5t, this patch brings:
- 2 UNRESOLVED -> FAIL (gcc.dg/vect/vect-align-1.c)
- 14 UNRESOLVED -> XPASS (in gcc.dg/vect/)
- 2765 new PASS
- 3639 FAIL -> PASS
- 1826 UNRESOLVED -> PASS
- 102 UNRESOLVED -> XFAIL

as visible in the red cell at
http://people.linaro.org/~christophe.lyon/cross-validation/gcc-test-patches/248552-gnu-stubs9.patch/report-build-info.html
(the build-failed line can be ignored, it was caused by a server
problem)

Sorry, the explanation is almost longer than the patch :-)

Is it OK for trunk?
(Just realizing that I forgot to document the new functions :( )


Here is an updated version with a bit of documentation for the new
effective target.
arm_neon_ok_no_float_abi now only tries to add -mfpu=neon, not "",
since we always add -mfpu=neon in the lto tests anyway.

OK?



This is ok.
Sorry for the delay.

Kyrill


Thanks,

Christophe




[Patch testsuite]

2017-06-26 Thread Dominique d'Humières
Is it OK to commit the following patch (darwin only)?

--- ../_clean/gcc/testsuite/gcc.dg/pubtypes-2.c 2017-06-17 17:55:51.0 +0200
+++ gcc/testsuite/gcc.dg/pubtypes-2.c   2017-06-25 18:01:52.0 +0200
@@ -2,7 +2,7 @@
 /* { dg-options "-O0 -gdwarf-2 -dA" } */
 /* { dg-skip-if "Unmatchable assembly" { mmix-*-* } } */
 /* { dg-final { scan-assembler "__debug_pubtypes" } } */
-/* { dg-final { scan-assembler "long+\[ \t\]+0x13b+\[ \t\]+\[#;]+\[ \t\]+Pub Info Length" } } */
+/* { dg-final { scan-assembler "long+\[ \t\]+0x125\[ \t\]+\[#;]+\[ \t\]+Pub Info Length" } } */
 /* { dg-final { scan-assembler "used_struct0\"+\[ \t\]+\[#;]+\[ \t\]+external name" } } */
 /* { dg-final { scan-assembler-not "unused_struct0\"+\[ \t\]+\[#;]+\[ \t\]+external name" } } */
 
--- ../_clean/gcc/testsuite/gcc.dg/pubtypes-3.c 2017-06-17 17:55:52.0 +0200
+++ gcc/testsuite/gcc.dg/pubtypes-3.c   2017-06-25 18:03:38.0 +0200
@@ -2,7 +2,7 @@
 /* { dg-options "-O0 -gdwarf-2 -dA" } */
 /* { dg-skip-if "Unmatchable assembly" { mmix-*-* } } */
 /* { dg-final { scan-assembler "__debug_pubtypes" } } */
-/* { dg-final { scan-assembler "long+\[ \t\]+0x13b+\[ \t\]+\[#;]+\[ \t\]+Pub Info Length" } } */
+/* { dg-final { scan-assembler "long+\[ \t\]+0x125\[ \t\]+\[#;]+\[ \t\]+Pub Info Length" } } */
 /* { dg-final { scan-assembler "used_struct0\"+\[ \t\]+\[#;]+\[ \t\]+external name" } } */
 /* { dg-final { scan-assembler-not "unused_struct0\"+\[ \t\]+\[#;]+\[ \t\]+external name" } } */
 /* { dg-final { scan-assembler-not "\"list_name_type0\"+\[ \t\]+\[#;]+\[ \t\]+external name" } } */
--- ../_clean/gcc/testsuite/gcc.dg/pubtypes-4.c 2017-06-17 17:55:51.0 +0200
+++ gcc/testsuite/gcc.dg/pubtypes-4.c   2017-06-25 18:04:38.0 +0200
@@ -2,7 +2,7 @@
 /* { dg-options "-O0 -gdwarf-2 -dA" } */
 /* { dg-skip-if "Unmatchable assembly" { mmix-*-* } } */
 /* { dg-final { scan-assembler "__debug_pubtypes" } } */
-/* { dg-final { scan-assembler "long+\[ \t\]+0x172+\[ \t\]+\[#;]+\[ \t\]+Pub Info Length" } } */
+/* { dg-final { scan-assembler "long+\[ \t\]+0x15c\[ \t\]+\[#;]+\[ \t\]+Pub Info Length" } } */
 /* { dg-final { scan-assembler "used_struct0\"+\[ \t\]+\[#;]+\[ \t\]+external name" } } */
 /* { dg-final { scan-assembler-not "unused_struct0\"+\[ \t\]+\[#;]+\[ \t\]+external name" } } */
 /* { dg-final { scan-assembler "\"list_name_type0\"+\[ \t\]+\[#;]+\[ \t\]+external name" } } */

TIA

Dominique



Re: [Patch testsuite]

2017-06-26 Thread Rainer Orth
Hi Dominique,

> Is it OK to commit the following patch (darwin only)?

this patch needs a ChangeLog entry (and preferably a description of the
problem you're fixing ;-)

> --- ../_clean/gcc/testsuite/gcc.dg/pubtypes-2.c 2017-06-17
> 17:55:51.0 +0200
> +++ gcc/testsuite/gcc.dg/pubtypes-2.c 2017-06-25 18:01:52.0 +0200
> @@ -2,7 +2,7 @@
>  /* { dg-options "-O0 -gdwarf-2 -dA" } */
>  /* { dg-skip-if "Unmatchable assembly" { mmix-*-* } } */
>  /* { dg-final { scan-assembler "__debug_pubtypes" } } */
> -/* { dg-final { scan-assembler "long+\[ \t\]+0x13b+\[ \t\]+\[#;]+\[
> \t\]+Pub Info Length" } } */
> +/* { dg-final { scan-assembler "long+\[ \t\]+0x125\[ \t\]+\[#;]+\[
> \t\]+Pub Info Length" } } */
>  /* { dg-final { scan-assembler "used_struct0\"+\[ \t\]+\[#;]+\[
> \t\]+external name" } } */
>  /* { dg-final { scan-assembler-not "unused_struct0\"+\[ \t\]+\[#;]+\[
> \t\]+external name" } } */
>  
> --- ../_clean/gcc/testsuite/gcc.dg/pubtypes-3.c 2017-06-17
> 17:55:52.0 +0200
> +++ gcc/testsuite/gcc.dg/pubtypes-3.c 2017-06-25 18:03:38.0 +0200
> @@ -2,7 +2,7 @@
>  /* { dg-options "-O0 -gdwarf-2 -dA" } */
>  /* { dg-skip-if "Unmatchable assembly" { mmix-*-* } } */
>  /* { dg-final { scan-assembler "__debug_pubtypes" } } */
> -/* { dg-final { scan-assembler "long+\[ \t\]+0x13b+\[ \t\]+\[#;]+\[
> \t\]+Pub Info Length" } } */
> +/* { dg-final { scan-assembler "long+\[ \t\]+0x125\[ \t\]+\[#;]+\[
> \t\]+Pub Info Length" } } */
>  /* { dg-final { scan-assembler "used_struct0\"+\[ \t\]+\[#;]+\[
> \t\]+external name" } } */
>  /* { dg-final { scan-assembler-not "unused_struct0\"+\[ \t\]+\[#;]+\[
> \t\]+external name" } } */
>  /* { dg-final { scan-assembler-not "\"list_name_type0\"+\[
> \t\]+\[#;]+\[ \t\]+external name" } } */
> --- ../_clean/gcc/testsuite/gcc.dg/pubtypes-4.c 2017-06-17
> 17:55:51.0 +0200
> +++ gcc/testsuite/gcc.dg/pubtypes-4.c 2017-06-25 18:04:38.0 +0200
> @@ -2,7 +2,7 @@
>  /* { dg-options "-O0 -gdwarf-2 -dA" } */
>  /* { dg-skip-if "Unmatchable assembly" { mmix-*-* } } */
>  /* { dg-final { scan-assembler "__debug_pubtypes" } } */
> -/* { dg-final { scan-assembler "long+\[ \t\]+0x172+\[ \t\]+\[#;]+\[
> \t\]+Pub Info Length" } } */
> +/* { dg-final { scan-assembler "long+\[ \t\]+0x15c\[ \t\]+\[#;]+\[
> \t\]+Pub Info Length" } } */
>  /* { dg-final { scan-assembler "used_struct0\"+\[ \t\]+\[#;]+\[
> \t\]+external name" } } */
>  /* { dg-final { scan-assembler-not "unused_struct0\"+\[ \t\]+\[#;]+\[
> \t\]+external name" } } */
>  /* { dg-final { scan-assembler "\"list_name_type0\"+\[ \t\]+\[#;]+\[
> \t\]+external name" } } */

Why not go for

/* { dg-final { scan-assembler "long+\[ \t\]+0x\[0-9a-f]+\[ \t\]+\[#;]+\[ \t\]+Pub Info Length" } } */

i.e. not checking for a specific length, as gcc.dg/pubtypes-1.c already
does?
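
As an illustration of the suggestion above, here is a minimal sketch in C. It uses POSIX regcomp/regexec rather than the Tcl regexps that DejaGnu actually evaluates, and both the helper name matches_pub_info_length and the sample assembler lines are made up for this example. It only shows that a pattern accepting any hex value keeps matching whatever length the compiler emits:

```c
#include <regex.h>
#include <stddef.h>

/* Hypothetical helper: return 1 if LINE matches a length-agnostic
   "Pub Info Length" pattern, 0 if it does not, and -1 if the pattern
   fails to compile.  */
int matches_pub_info_length(const char *line)
{
    /* POSIX ERE rendering of the suggested pattern; "\t" becomes a
       literal tab inside the bracket expressions.  */
    static const char pattern[] =
        "long+[ \t]+0x[0-9a-f]+[ \t]+[#;]+[ \t]+Pub Info Length";
    regex_t re;
    int rc;

    if (regcomp(&re, pattern, REG_EXTENDED | REG_NOSUB) != 0)
        return -1;
    rc = regexec(&re, line, 0, NULL, 0);
    regfree(&re);
    return rc == 0;
}
```

With this form the test no longer depends on the exact section length, which is what differs between targets and compiler versions.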

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[PATCH, Committed] Add myself to MAINTAINERS file

2017-06-26 Thread Maxim Ostapenko

Hi,

when requesting a cfarm account, Segher noticed that I didn't add myself 
to the MAINTAINERS file when I obtained write access to the SVN repo. Fixing this now.


-Maxim
diff --git a/MAINTAINERS b/MAINTAINERS
index c76c181..04fcb8a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -523,6 +523,7 @@ Braden Obrzut	
 Carlos O'Donell	
 Peter O'Gorman	
 Andrea Ornstein	
+Maxim Ostapenko	
 Patrick Palka	
 Devang Patel	
 Andris Pavenis	


Re: C/C++ PATCH to implement -Wmultistatement-macros (PR c/80116)

2017-06-26 Thread Marek Polacek
On Mon, Jun 19, 2017 at 12:01:06PM +0200, Marek Polacek wrote:
> On Tue, Jun 13, 2017 at 03:29:32PM +, Joseph Myers wrote:
> > On Tue, 13 Jun 2017, Marek Polacek wrote:
> > 
> > >   * c-parser.c (c_parser_if_body): Set the location of the
> > >   body of the conditional after parsing all the labels.  Call
> > >   warn_for_multistatement_macros.
> > >   (c_parser_else_body): Likewise.
> > >   (c_parser_switch_statement): Likewise.
> > >   (c_parser_while_statement): Likewise.
> > >   (c_parser_for_statement): Likewise.
> > >   (c_parser_statement): Add a default argument.  Save the location
> > >   after labels have been parsed.
> > >   (c_parser_c99_block_statement): Likewise.
> > 
> > The gcc/c/ changes are OK.
> 
> Thanks.
> 
> David, do you have any more comments on the patch?

Seems not, so I'll commit the patch today.

Marek


Re: [PATCH GCC][5/6]Record initialization statements and only insert it for valid chains

2017-06-26 Thread Bin.Cheng
On Fri, May 12, 2017 at 12:28 PM, Bin Cheng  wrote:
> Hi,
> This patch caches initialization statements and only inserts them for valid
> chains.
> Looks like the current code even inserts such stmts for invalid chains, which
> will be deleted as dead code afterwards.
>
> Bootstrap and test on x86_64 and AArch64, is it OK?

Ping this one because it's a prerequisite patch for following predcom
enhancement.
Also I updated the patch a little bit.

Bootstrap and test along with following patches of predcom on x86_64
and AArch64.  Is it OK?
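
The "record now, insert later" idea can be sketched outside GCC as follows. This is a toy model only: plain integer IDs stand in for gimple statements, a fixed-size array stands in for a gimple_seq, and every name here (toy_chain, toy_record, toy_discard, toy_commit) is hypothetical rather than taken from tree-predcom.c.

```c
#include <stddef.h>

#define MAX_STMTS 8

/* Toy stand-in for a chain: IDs of pending initialization statements.  */
struct toy_chain {
    int pending[MAX_STMTS];
    size_t n_pending;
};

/* Record a statement on the chain instead of emitting it right away.  */
void toy_record(struct toy_chain *c, int stmt_id)
{
    if (c->n_pending < MAX_STMTS)
        c->pending[c->n_pending++] = stmt_id;
}

/* Drop everything recorded for a chain that turned out to be invalid.  */
void toy_discard(struct toy_chain *c)
{
    c->n_pending = 0;
}

/* Emit the recorded statements of all chains into OUT (capacity CAP)
   and return how many were emitted.  Each chain is emptied afterwards.  */
size_t toy_commit(struct toy_chain *chains, size_t n_chains,
                  int *out, size_t cap)
{
    size_t n_out = 0;
    for (size_t i = 0; i < n_chains; ++i) {
        for (size_t j = 0; j < chains[i].n_pending && n_out < cap; ++j)
            out[n_out++] = chains[i].pending[j];
        chains[i].n_pending = 0;
    }
    return n_out;
}
```

Because nothing reaches the output until the commit step, statements belonging to a discarded chain are never emitted and so never need to be cleaned up as dead code.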

Thanks,
bin
2017-06-26  Bin Cheng  

* tree-predcom.c (struct chain): New field init_seq.
(release_chain): Release init_seq.
(prepare_initializers_chain): Record initialization stmts in above
field.  Discard it if chain is invalid.
(insert_init_seqs): New function.
(tree_predictive_commoning_loop): Call insert_init_seqs.
From df8db56c584864ab6c8e6c2b7dcab2d57daf830a Mon Sep 17 00:00:00 2001
From: Bin Cheng 
Date: Mon, 26 Jun 2017 10:33:18 +0100
Subject: [PATCH 2/6] chain-init-seq-20170620.txt

---
 gcc/tree-predcom.c | 25 ++++++++++++++++++++++++-
 1 file changed, 24 insertions(+), 1 deletion(-)

diff --git a/gcc/tree-predcom.c b/gcc/tree-predcom.c
index 4547b6d..260caaf 100644
--- a/gcc/tree-predcom.c
+++ b/gcc/tree-predcom.c
@@ -294,6 +294,9 @@ typedef struct chain
   /* Initializers for the variables.  */
   vec<tree> inits;
 
+  /* gimple stmts initializing the initial variables of the chain.  */
+  gimple_seq init_seq;
+
   /* True if there is a use of a variable with the maximal distance
  that comes after the root in the loop.  */
   unsigned has_max_use_after : 1;
@@ -511,6 +514,8 @@ release_chain (chain_p chain)
   chain->refs.release ();
   chain->vars.release ();
   chain->inits.release ();
+  if (chain->init_seq)
+gimple_seq_discard (chain->init_seq);
 
   free (chain);
 }
@@ -2457,7 +2462,7 @@ prepare_initializers_chain (struct loop *loop, chain_p chain)
 	}
 
   if (stmts)
-	gsi_insert_seq_on_edge_immediate (entry, stmts);
+	gimple_seq_add_seq (&chain->init_seq, stmts);
 
   chain->inits[i] = init;
 }
@@ -2487,6 +2492,22 @@ prepare_initializers (struct loop *loop, vec<chain_p> chains)
 }
 }
 
+/* Insert all initializing gimple stmts into loop's entry edge.  */
+
+static void
+insert_init_seqs (struct loop *loop, vec<chain_p> chains)
+{
+  unsigned i;
+  edge entry = loop_preheader_edge (loop);
+
+  for (i = 0; i < chains.length (); ++i)
+if (chains[i]->init_seq)
+  {
+	gsi_insert_seq_on_edge_immediate (entry, chains[i]->init_seq);
+	chains[i]->init_seq = NULL;
+  }
+}
+
 /* Performs predictive commoning for LOOP.  Returns true if LOOP was
unrolled.  */
 
@@ -2568,6 +2589,8 @@ tree_predictive_commoning_loop (struct loop *loop)
   /* Try to combine the chains that are always worked with together.  */
   try_combine_chains (&chains);
 
+  insert_init_seqs (loop, chains);
+
   if (dump_file && (dump_flags & TDF_DETAILS))
 {
   fprintf (dump_file, "Before commoning:\n\n");
-- 
1.9.1



[testsuite] Compile c-c++-common/ubsan/sanitize-recover-7.c with -w

2017-06-26 Thread Eric Botcazou
In order to kill "warning: -fsanitize=address not supported for this target". 

Tested on SPARC64/Linux, applied on the mainline and 7 branch as obvious.


2017-06-26  Eric Botcazou  

* c-c++-common/ubsan/sanitize-recover-7.c (dg-options): Add -w.

-- 
Eric Botcazou
Index: c-c++-common/ubsan/sanitize-recover-7.c
===
--- c-c++-common/ubsan/sanitize-recover-7.c	(revision 249619)
+++ c-c++-common/ubsan/sanitize-recover-7.c	(working copy)
@@ -1,5 +1,5 @@
 /* { dg-do compile } */
-/* { dg-options "-fsanitize=address -fsanitize=thread" } */
+/* { dg-options "-fsanitize=address -fsanitize=thread -w" } */
 
 int i;
 


Re: Simplify 3*x == 3*y for wrapping types

2017-06-26 Thread Richard Biener
On Sat, Jun 24, 2017 at 2:34 PM, Marc Glisse  wrote:
> Hello,
>
> I remember wanting to add this when the undefined-overflow case was
> introduced a while ago.
>
> It turns out the tree where I wrote this wasn't clean. Since the rest is
> details, I am including it in this patch, hope it is ok.

Yeah, that's ok.

Besides eventually handling FP (not required in this patch initially)
I think the mults don't need :c as we should canonicalize the constant
to 2nd operand.

Ok with removing :c
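
For readers wondering why the comparison may be simplified at all in a wrapping type, here is a small brute-force check in C. It assumes the relevant condition is that the constant is odd (as 3 is), which makes multiplication modulo 2^n a one-to-one mapping; the helper name mul_cmp_is_exact and the choice of 8-bit values are only for illustration and are not part of the patch.

```c
#include <stdint.h>

/* Return 1 if, over all 8-bit values, c*x == c*y (mod 256) implies
   x == y, i.e. if comparing the products is the same as comparing the
   operands.  Return 0 as soon as a counterexample is found.  */
int mul_cmp_is_exact(uint8_t c)
{
    for (unsigned x = 0; x < 256; ++x)
        for (unsigned y = 0; y < 256; ++y) {
            uint8_t lhs = (uint8_t)(c * x);   /* wraps modulo 256 */
            uint8_t rhs = (uint8_t)(c * y);
            if (lhs == rhs && x != y)
                return 0;
        }
    return 1;
}
```

An even constant fails the check: with c = 2, both x = 0 and y = 128 give a product of 0 modulo 256.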

Thanks,
Richard.

> Bootstrap + testsuite on powerpc64le-unknown-linux-gnu.
>
> 2017-06-26  Marc Glisse  
>
> gcc/
> * match.pd ((X & ~Y) | (~X & Y)): Generalize to + and ^.
> (x * C EQ/NE y * C): New transformation.
>
> gcc/testsuite/
> * gcc.dg/tree-ssa/addadd.c: Remove test duplicated in addadd-2.c.
> * gcc.dg/tree-ssa/mulcmp-1.c: New file.
>
> --
> Marc Glisse


Re: [PATCH GCC][5/6]Record initialization statements and only insert it for valid chains

2017-06-26 Thread Richard Biener
On Mon, Jun 26, 2017 at 11:47 AM, Bin.Cheng  wrote:
> On Fri, May 12, 2017 at 12:28 PM, Bin Cheng  wrote:
>> Hi,
>> This patch caches initialization statements and only inserts it for valid 
>> chains.
>> Looks like current code even inserts such stmts for invalid chains which 
>> will be
>> deleted as dead code afterwards.
>>
>> Bootstrap and test on x86_64 and AArch64, is it OK?
>
> Ping this one because it's a prerequisite patch for following predcom
> enhancement.
> Also I updated the patch a little bit.
>
> Bootstrap and test along with following patches of predcom on x86_64
> and AArch64.  Is it OK?

   if (stmts)
-   gsi_insert_seq_on_edge_immediate (entry, stmts);
+   gimple_seq_add_seq (&chain->init_seq, stmts);

use gimple_seq_add_seq_without_update.

Ok with that change.

Thanks,
Richard.

> Thanks,
> bin
> 2017-06-26  Bin Cheng  
>
> * tree-predcom.c (struct chain): New field init_seq.
> (release_chain): Release init_seq.
> (prepare_initializers_chain): Record initialization stmts in above
> field.  Discard it if chain is invalid.
> (insert_init_seqs): New function.
> (tree_predictive_commoning_loop): Call insert_init_seqs.


Re: Simplify 3*x == 3*y for wrapping types

2017-06-26 Thread Marc Glisse

On Mon, 26 Jun 2017, Richard Biener wrote:


I think the mults don't need :c as we should canonicalize the constant
to 2nd operand.


Oops, I copied the transformation above (which does need :c) and didn't 
notice it was there. Good catch, thanks.


--
Marc Glisse


Re: fenv.h builtins

2017-06-26 Thread Richard Biener
On Fri, Jun 23, 2017 at 5:12 PM, Marc Glisse  wrote:
> Hello,
>
> this is now the complete list of C99 fenv.h functions. I tried to be rather
> conservative, only fegetround is pure, and functions that "raise an
> exception" (in the fenv sense, not the C++ one) do not get nothrow,leaf. We
> can always change that afterwards.
>
> I am not convinced there is much we will be able to do with those, but at
> least they are available now...

Well, the most obvious thing is to compute (at IPA/LTO WPA analysis) whether
a function accesses the environment or not.  Not sure if that somehow helps
optimization ;)  Likewise if we can find regions that have guaranteed nearest
rounding mode that would help (IPA-CPing, eventually even cloning for
this case).

For anything else we'd need explicit fenv state on each stmt I guess...

> Trying to declare those functions with wrong prototypes now gives the
> expected error.
>
> Bootstrap + testsuite on powerpc64le-unknown-linux-gnu.

Ok.

Thanks,
Richard.

> 2017-06-23  Marc Glisse  
>
> * builtin-types.def (BT_FENV_T_PTR, BT_CONST_FENV_T_PTR,
> BT_FEXCEPT_T_PTR, BT_CONST_FEXCEPT_T_PTR): New primitive types.
> (BT_FN_INT_FENV_T_PTR, BT_FN_INT_CONST_FENV_T_PTR,
> BT_FN_INT_FEXCEPT_T_PTR_INT, BT_FN_INT_CONST_FEXCEPT_T_PTR_INT):
> New function types.
> * builtins.def (BUILT_IN_FECLEAREXCEPT, BUILT_IN_FEGETENV,
> BUILT_IN_FEGETEXCEPTFLAG, BUILT_IN_FEGETROUND,
> BUILT_IN_FEHOLDEXCEPT, BUILT_IN_FERAISEEXCEPT,
> BUILT_IN_FESETENV, BUILT_IN_FESETEXCEPTFLAG,
> BUILT_IN_FESETROUND, BUILT_IN_FETESTEXCEPT,
> BUILT_IN_FEUPDATEENV): New builtins.
> * tree-core.h (TI_FENV_T_PTR_TYPE, TI_CONST_FENV_T_PTR_TYPE,
> TI_FEXCEPT_T_PTR_TYPE, TI_CONST_FEXCEPT_T_PTR_TYPE): New entries.
> * tree.h (fenv_t_ptr_type_node, const_fenv_t_ptr_type_node,
> fexcept_t_ptr_type_node, const_fexcept_t_ptr_type_node): New
> macros.
> (builtin_structptr_types): Adjust size.
> * tree.c (builtin_structptr_types): Add four entries.
>
>
> --
> Marc Glisse


Re: [PATCH] Fix PR71815 (SLSR misses PHI opportunities)

2017-06-26 Thread Richard Biener
On Fri, Jun 23, 2017 at 6:06 PM, Bill Schmidt
 wrote:
> Hi,
>
> Here's version 2 of the patch to fix the missed SLSR PHI opportunities,
> addressing Richard's comments.  I've repeated regstrap and SPEC testing
> on powerpc64le-unknown-linux-gnu, again showing the patch as neutral
> with respect to performance.  Is this ok for trunk?

Ok!

Thanks,
Richard.

> Thanks for the review!
>
> Bill
>
>
> [gcc]
>
> 2017-06-23  Bill Schmidt  
>
> * gimple-ssa-strength-reduction.c (uses_consumed_by_stmt): New
> function.
> (find_basis_for_candidate): Call uses_consumed_by_stmt rather than
> has_single_use.
> (slsr_process_phi): Likewise.
> (replace_uncond_cands_and_profitable_phis): Don't replace a
> multiply candidate with a stride of 1 (copy or cast).
> (phi_incr_cost): Call uses_consumed_by_stmt rather than
> has_single_use.
> (lowest_cost_path): Likewise.
> (total_savings): Likewise.
>
> [gcc/testsuite]
>
> 2017-06-23  Bill Schmidt  
>
> * gcc.dg/tree-ssa/slsr-35.c: Remove -fno-code-hoisting workaround.
> * gcc.dg/tree-ssa/slsr-36.c: Likewise.
>
>
> Index: gcc/gimple-ssa-strength-reduction.c
> ===
> --- gcc/gimple-ssa-strength-reduction.c (revision 249223)
> +++ gcc/gimple-ssa-strength-reduction.c (working copy)
> @@ -482,6 +482,36 @@ find_phi_def (tree base)
>return c->cand_num;
>  }
>
> +/* Determine whether all uses of NAME are directly or indirectly
> +   used by STMT.  That is, we want to know whether if STMT goes
> +   dead, the definition of NAME also goes dead.  */
> +static bool
> +uses_consumed_by_stmt (tree name, gimple *stmt, unsigned recurse = 0)
> +{
> +  gimple *use_stmt;
> +  imm_use_iterator iter;
> +  bool retval = true;
> +
> +  FOR_EACH_IMM_USE_STMT (use_stmt, iter, name)
> +{
> +  if (use_stmt == stmt || is_gimple_debug (use_stmt))
> +   continue;
> +
> +  if (!is_gimple_assign (use_stmt)
> + || !gimple_get_lhs (use_stmt)
> + || !is_gimple_reg (gimple_get_lhs (use_stmt))
> + || recurse >= 10
> + || !uses_consumed_by_stmt (gimple_get_lhs (use_stmt), stmt,
> +recurse + 1))
> +   {
> + retval = false;
> + BREAK_FROM_IMM_USE_STMT (iter);
> +   }
> +}
> +
> +  return retval;
> +}
> +
>  /* Helper routine for find_basis_for_candidate.  May be called twice:
> once for the candidate's base expr, and optionally again either for
> the candidate's phi definition or for a CAND_REF's alternative base
> @@ -558,7 +588,8 @@ find_basis_for_candidate (slsr_cand_t c)
>
>   /* If we found a hidden basis, estimate additional dead-code
>  savings if the phi and its feeding statements can be removed.  */
> - if (basis && has_single_use (gimple_phi_result 
> (phi_cand->cand_stmt)))
> + tree feeding_var = gimple_phi_result (phi_cand->cand_stmt);
> + if (basis && uses_consumed_by_stmt (feeding_var, c->cand_stmt))
> c->dead_savings += phi_cand->dead_savings;
> }
>  }
> @@ -789,7 +820,7 @@ slsr_process_phi (gphi *phi, bool speed)
>
>   /* Gather potential dead code savings if the phi statement
>  can be removed later on.  */
> - if (has_single_use (arg))
> + if (uses_consumed_by_stmt (arg, phi))
> {
>   if (gimple_code (arg_stmt) == GIMPLE_PHI)
> savings += arg_cand->dead_savings;
> @@ -2479,7 +2510,9 @@ replace_uncond_cands_and_profitable_phis (slsr_can
>  {
>if (phi_dependent_cand_p (c))
>  {
> -  if (c->kind == CAND_MULT)
> +  /* A multiply candidate with a stride of 1 is just an artifice
> +of a copy or cast; there is no value in replacing it.  */
> +  if (c->kind == CAND_MULT && wi::to_widest (c->stride) != 1)
> {
>   /* A candidate dependent upon a phi will replace a multiply by
>  a constant with an add, and will insert at most one add for
> @@ -2725,8 +2758,9 @@ phi_incr_cost (slsr_cand_t c, const widest_int &in
>   if (gimple_code (arg_def) == GIMPLE_PHI)
> {
>   int feeding_savings = 0;
> + tree feeding_var = gimple_phi_result (arg_def);
>   cost += phi_incr_cost (c, incr, arg_def, &feeding_savings);
> - if (has_single_use (gimple_phi_result (arg_def)))
> + if (uses_consumed_by_stmt (feeding_var, phi))
> *savings += feeding_savings;
> }
>   else
> @@ -2739,7 +2773,7 @@ phi_incr_cost (slsr_cand_t c, const widest_int &in
>   tree basis_lhs = gimple_assign_lhs (basis->cand_stmt);
>   tree lhs = gimple_assign_lhs (arg_cand->cand_stmt);
>   cost += add_cost (true, TYPE_MODE (TREE_TYPE (basis_lhs)));
> - if (has_single_use (lhs))
> + if (use

Re: Simple reassoc transforms in match.pd

2017-06-26 Thread Richard Biener
On Fri, Jun 23, 2017 at 3:12 PM, Marc Glisse  wrote:
> Hello,
>
> here are a few simple transformations, mostly useful for types with
> undefined overflow where we do not have reassoc.
>
> I did not name the testcase reassoc-* to leave that namespace to the reassoc
> pass, and -fno-tree-reassoc is just in case someone ever enhances that
> pass...

You probably saw

  /* (T)(P + A) - (T)(P + B) -> (T)A - (T)B */
  (for add (plus pointer_plus)
   (simplify
(minus (convert (add @@0 @1))
 (convert (add @0 @2)))

as you didn't duplicate its functionality.  It misses a :c in one of
the adds for the
PLUS_EXPR case though so it might be worth splitting that out near to your
added cases?  Which then raises the question of handling conversions around
the inner ops in your patterns?

I think the patch is ok as-is but we could improve this as a followup maybe?
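
The algebra behind the two new patterns can be checked with a short C sketch. It assumes the ChangeLog shorthand stands for (A + B) + (C - A) and (A + B) - (A - C), both reducing to B + C; the function names are invented for this example. Unsigned 32-bit arithmetic is used so that the demonstration itself has fully defined behavior, even though the patch is mainly aimed at types with undefined overflow.

```c
#include <stdint.h>

/* (A + B) + (C - A), written out literally.  */
uint32_t sum_before(uint32_t a, uint32_t b, uint32_t c)
{
    return (a + b) + (c - a);
}

/* (A + B) - (A - C), written out literally.  */
uint32_t diff_before(uint32_t a, uint32_t b, uint32_t c)
{
    return (a + b) - (a - c);
}

/* What both expressions reduce to once A cancels.  */
uint32_t simplified(uint32_t b, uint32_t c)
{
    return b + c;
}
```

In both cases A drops out entirely, so the simplified form needs one operation instead of three.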

Thanks,
Richard.

> Bootstrap + testsuite on powerpc64le-unknown-linux-gnu.
>
> 2017-06-23  Marc Glisse  
>
> gcc/
> * match.pd ((A+-B)+(C-A), (A+B)-(A-C)): New transformations.
>
> gcc/testsuite/
> * gcc.dg/tree-ssa/assoc-1.c: New file.
>
> --
> Marc Glisse


Re: [PATCH GCC][01/13]Introduce internal function IFN_LOOP_DIST_ALIAS

2017-06-26 Thread Richard Sandiford
Just a couple of cosmetic things:

"Bin.Cheng"  writes:
> @@ -225,6 +225,15 @@ struct GTY ((chain_next ("%h.next"))) loop {
>   builtins.  */
>tree simduid;
>  
> +  /* For loops generated by distribution with runtime alias checks, this
> + is a unique identifier of the original distributed loop.  Generally
> + it is the number of the original loop.  IFN_LOOP_DIST_ALIAS builtin
> + uses this id as its first argument.  Give a loop with an id, we can
> + look upward in dominance tree for the corresponding IFN_LOOP_DIST_ALIAS
> + buildin.  Note this id has no meanling after IFN_LOOP_DIST_ALIAS is

s/meanling/meaning/

> +/* Fold LOOP_DIST_ALIAS internal call stmt according to KEEP_P and update
> +   any immediate uses of it's LHS.  Stmt is folded to its second argument

s/it's/its/

Thanks,
Richard


Re: PR81136: ICE from inconsistent DR_MISALIGNMENTs

2017-06-26 Thread Richard Biener
On Fri, Jun 23, 2017 at 2:05 PM, Richard Sandiford
 wrote:
> Richard Biener  writes:
>> On Thu, Jun 22, 2017 at 1:30 PM, Richard Sandiford
>>  wrote:
>>> The test case triggered this assert in vect_update_misalignment_for_peel:
>>>
>>>   gcc_assert (DR_MISALIGNMENT (dr) / dr_size ==
>>>   DR_MISALIGNMENT (dr_peel) / dr_peel_size);
>>>
>>> We knew that the two DRs had the same misalignment at runtime, but when
>>> considered in isolation, one data reference guaranteed a higher compile-time
>>> base alignment than the other.
>>>
>>> In the test case this looks like a missed opportunity.  Both references
>>> are unconditional, so it should be possible to use the highest of the
>>> available base alignment guarantees when analyzing each reference.
>>> The patch does this.
>>>
>>> However, as the comment in the patch says, the base alignment guarantees
>>> provided by a conditional reference only apply if the reference occurs
>>> at least once.  In this case it would be legitimate for two references
>>> to have the same runtime misalignment and for one reference to provide a
>>> stronger compile-time guarantee than the other about what the misalignment
>>> actually is.  The patch therefore relaxes the assert to handle that case.
>>
>> Hmm, but you don't actually check whether a reference occurs only 
>> conditional,
>> do you?  You just seem to say that for masked loads/stores the reference
>> is conditional (I believe that's not true).  But for a loop like
>>
>>  for (;;)
>>if (a[i])
>>  sum += b[j];
>>
>> you still assume b[j] executes unconditionally?
>
> Maybe the documentation isn't clear enough, but DR_IS_CONDITIONAL
> was supposed to mean "even if the containing statement executes
> and runs to completion, the reference might not actually occur".
> The example above isn't conditional in that sense because the
> reference to b[j] does occur if the store is reached and completes.
>
> Masked loads and stores are conditional in that sense though.
> The reference only occurs if the mask is nonzero; the memory
> isn't touched otherwise.  The functions are used to if-convert
> things like:
>
>for (...)
>  a[i] = b[i] ? c[i] : d[i];
>
> where there's no guarantee that it's safe to access c[i] when !b[i]
> (or d[i] when b[i]).  No reference occurs for an all-false mask.
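
The sense of "conditional" described above can be modelled in scalar C. This is only a sketch of the semantics, not of the IFN_MASK_LOAD implementation; toy_mask_load and toy_select are hypothetical names, and the fallback value 0 is an arbitrary choice.

```c
#include <stddef.h>

/* Scalar model of a masked load: *P is read only when MASK is nonzero;
   otherwise OTHERWISE is returned and memory is not touched.  */
int toy_mask_load(const int *p, int mask, int otherwise)
{
    return mask ? *p : otherwise;
}

/* a[i] = b[i] ? c[i] : d[i], with each source element read only under
   its own mask, so c[i] is never accessed when b[i] is zero and d[i]
   is never accessed when b[i] is nonzero.  */
void toy_select(int *a, const int *b, const int *c, const int *d, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        int from_c = toy_mask_load(&c[i], b[i] != 0, 0);
        int from_d = toy_mask_load(&d[i], b[i] == 0, 0);
        a[i] = b[i] ? from_c : from_d;
    }
}
```

The call to toy_mask_load always executes, yet the reference behind it may not occur, which is the distinction being drawn from a load that merely sits in a conditionally executed block.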

But as you touch generic data-ref code here you should apply more
sensible semantics to DR_IS_CONDITIONAL than just marking
masked loads/stores but not DRs occurring inside BBs only executed
conditionally ...

>> The vectorizer of course only sees unconditionally executed stmts.
>>
>> So - I'd simply not add this DR_IS_CONDITIONAL.  Did you run into
>> any real-world (testsuite) issues without this?
>
> Dropping DR_IS_CONDITIONAL would cause us to make invalid alignment
> assumptions in silly corner cases.  I could add a scan test for it,
> for targets with masked loads and stores.  It wouldn't trigger
> an execution failure though because we assume that targets with
> masked loads and stores allow unaligned accesses:
>
>   /* For now assume all conditional loads/stores support unaligned
>  access without any special code.  */
>   if (is_gimple_call (stmt)
>   && gimple_call_internal_p (stmt)
>   && (gimple_call_internal_fn (stmt) == IFN_MASK_LOAD
>   || gimple_call_internal_fn (stmt) == IFN_MASK_STORE))
> return dr_unaligned_supported;
>
> So the worst that would happen is that we'd supposedly peel for
> alignment, but actually misalign everything instead, and so make
> things slower rather than quicker.
>
>> Note that the assert is to prevent bogus information.  Iff we aligned
>> DR with base alignment 8 and misalign 3 then if another same-align
>> DR has base alignment 16 we can't simply zero its DR_MISALIGNMENT
>> as it still can be 8 after aligning DR.
>>
>> So I think it's wrong to put DRs with differing base-alignment into
>> the same-align-refs chain, those should get their DR_MISALIGNMENT
>> updated independently after peeling.
>
> DR_MISALIGNMENT is relative to the vector alignment rather than
> the base alignment though.  So:

We seem to use it that way, yes (looking at set_ptr_info_alignment
uses).  So why not fix the assert then by capping the alignment/misalignment
we compute at this value as well?  (and document this in the header
around DR_MISALIGNMENT)

Ideally we'd do alignment analysis independent of the vector size
though (for those stupid targets with multiple vector sizes to consider...).

> a) when looking for references *A1 and *A2 with the same alignment,
>we simply have to prove that A1 % vecalign == A2 % vecalign.
>This doesn't require any knowledge about the base alignment.
>If we break the addresses down as:
>
>   A1 = BASE1 + REST1,  REST1 = INIT1 + OFFSET1 + X * STEP1
>   A2 = BASE2 + REST2,  REST2 = INIT2 + OFFSET2 + X * STEP2
>
>and can prove that BASE1 == BASE2, the alignment of that base
>isn't important.  We simply ne

Re: PR81136: ICE from inconsistent DR_MISALIGNMENTs

2017-06-26 Thread Richard Biener
On Mon, Jun 26, 2017 at 12:25 PM, Richard Biener
 wrote:
> On Fri, Jun 23, 2017 at 2:05 PM, Richard Sandiford
>  wrote:
>> Richard Biener  writes:
>>> On Thu, Jun 22, 2017 at 1:30 PM, Richard Sandiford
>>>  wrote:
>>>> The test case triggered this assert in vect_update_misalignment_for_peel:
>>>>
>>>>   gcc_assert (DR_MISALIGNMENT (dr) / dr_size ==
>>>>   DR_MISALIGNMENT (dr_peel) / dr_peel_size);
>>>>
>>>> We knew that the two DRs had the same misalignment at runtime, but when
>>>> considered in isolation, one data reference guaranteed a higher
>>>> compile-time base alignment than the other.
>>>>
>>>> In the test case this looks like a missed opportunity.  Both references
>>>> are unconditional, so it should be possible to use the highest of the
>>>> available base alignment guarantees when analyzing each reference.
>>>> The patch does this.
>>>>
>>>> However, as the comment in the patch says, the base alignment guarantees
>>>> provided by a conditional reference only apply if the reference occurs
>>>> at least once.  In this case it would be legitimate for two references
>>>> to have the same runtime misalignment and for one reference to provide a
>>>> stronger compile-time guarantee than the other about what the misalignment
>>>> actually is.  The patch therefore relaxes the assert to handle that case.
>>>
>>> Hmm, but you don't actually check whether a reference occurs only 
>>> conditional,
>>> do you?  You just seem to say that for masked loads/stores the reference
>>> is conditional (I believe that's not true).  But for a loop like
>>>
>>>  for (;;)
>>>if (a[i])
>>>  sum += b[j];
>>>
>>> you still assume b[j] executes unconditionally?
>>
>> Maybe the documentation isn't clear enough, but DR_IS_CONDITIONAL
>> was supposed to mean "even if the containing statement executes
>> and runs to completion, the reference might not actually occur".
>> The example above isn't conditional in that sense because the
>> reference to b[j] does occur if the store is reached and completes.
>>
>> Masked loads and stores are conditional in that sense though.
>> The reference only occurs if the mask is nonzero; the memory
>> isn't touched otherwise.  The functions are used to if-convert
>> things like:
>>
>>for (...)
>>  a[i] = b[i] ? c[i] : d[i];
>>
>> where there's no guarantee that it's safe to access c[i] when !b[i]
>> (or d[i] when b[i]).  No reference occurs for an all-false mask.
>
> But as you touch generic data-ref code here you should apply more
> sensible semantics to DR_IS_CONDITIONAL than just marking
> masked loads/stores but not DRs occurring inside BBs only executed
> conditionally ...
>
>>> The vectorizer of course only sees unconditionally executed stmts.
>>>
>>> So - I'd simply not add this DR_IS_CONDITIONAL.  Did you run into
>>> any real-world (testsuite) issues without this?
>>
>> Dropping DR_IS_CONDITIONAL would cause us to make invalid alignment
>> assumptions in silly corner cases.  I could add a scan test for it,
>> for targets with masked loads and stores.  It wouldn't trigger
>> an execution failure though because we assume that targets with
>> masked loads and stores allow unaligned accesses:
>>
>>   /* For now assume all conditional loads/stores support unaligned
>>  access without any special code.  */
>>   if (is_gimple_call (stmt)
>>   && gimple_call_internal_p (stmt)
>>   && (gimple_call_internal_fn (stmt) == IFN_MASK_LOAD
>>   || gimple_call_internal_fn (stmt) == IFN_MASK_STORE))
>> return dr_unaligned_supported;
>>
>> So the worst that would happen is that we'd supposedly peel for
>> alignment, but actually misalign everything instead, and so make
>> things slower rather than quicker.
>>
>>> Note that the assert is to prevent bogus information.  Iff we aligned
>>> DR with base alignment 8 and misalign 3 then if another same-align
>>> DR has base alignment 16 we can't simply zero its DR_MISALIGNMENT
>>> as it still can be 8 after aligning DR.
>>>
>>> So I think it's wrong to put DRs with differing base-alignment into
>>> the same-align-refs chain, those should get their DR_MISALIGNMENT
>>> updated independently after peeling.
>>
>> DR_MISALIGNMENT is relative to the vector alignment rather than
>> the base alignment though.  So:
>
> We seem to use it that way, yes (looking at set_ptr_info_alignment
> uses).  So why not fix the assert then by capping the alignment/misalignment
> we compute at this value as well?  (and document this in the header
> around DR_MISALIGNMENT)
>
> Ideally we'd do alignment analysis independent of the vector size
> though (for those stupid targets with multiple vector sizes to consider...).
>
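The quoted example (base alignment 8 with misalignment 3, against a second reference with base alignment 16) can be checked with a few lines of arithmetic. This is only a sketch: the helper names are hypothetical and not part of the vectorizer, and peeling is counted in bytes for simplicity.

```c
/* Hypothetical helper: bytes to peel so that an access with the given
   misalignment becomes aligned to `align` (a power of two).  */
static unsigned
peel_amount (unsigned misalign, unsigned align)
{
  return (align - misalign % align) % align;
}

/* Hypothetical helper: misalignment relative to `align` after the
   address has been advanced by `peel` bytes.  */
static unsigned
misalign_after_peel (unsigned misalign, unsigned peel, unsigned align)
{
  return (misalign + peel) % align;
}
```

An address that is 3 modulo 8 needs 5 bytes of peeling. The same address is either 3 or 11 modulo 16, so after peeling it is either 8 or 0 modulo 16, which is why the misalignment of the 16-byte reference cannot simply be set to zero.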
>> a) when looking for references *A1 and *A2 with the same alignment,
>>we simply have to prove that A1 % vecalign == A2 % vecalign.
>>This doesn't require any knowledge about the base alignment.
>>If we break the addresses down as:
>>
>>   A1 = BASE1 + REST1,  REST1 = INIT1 + OFFSET

[PATCH] Fix PR81203

2017-06-26 Thread Richard Biener

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2017-06-26  Richard Biener  

PR tree-optimization/81203
* tree-tailcall.c (find_tail_calls): Do not move stmts into
non-dominating BBs.

* gcc.dg/torture/pr81203.c: New testcase.

Index: gcc/tree-tailcall.c
===
--- gcc/tree-tailcall.c (revision 249638)
+++ gcc/tree-tailcall.c (working copy)
@@ -573,6 +573,11 @@ find_tail_calls (basic_block bb, struct
{
  if (! tail_recursion)
return;
+ /* Do not deal with checking dominance, the real fix is to
+do path isolation for the transform phase anyway, removing
+the need to compute the accumulators with new stmts.  */
+ if (abb != bb)
+   return;
  for (unsigned opno = 1; opno < gimple_num_ops (stmt); ++opno)
{
  tree op = gimple_op (stmt, opno);
Index: gcc/testsuite/gcc.dg/torture/pr81203.c
===
--- gcc/testsuite/gcc.dg/torture/pr81203.c  (nonexistent)
+++ gcc/testsuite/gcc.dg/torture/pr81203.c  (working copy)
@@ -0,0 +1,10 @@
+/* { dg-do compile } */
+
+int a;
+int b()
+{
+  int c, d;
+  if (a)
+d = b();
+  return 1 + c + d;
+}


[PATCH][AArch64] Fix ldp/stp patterns for ILP32

2017-06-26 Thread Wilco Dijkstra
The ldp/stp patterns call plus_constant which forces the mode to Pmode.
However in ILP32 addresses are SImode.  This may result in an assert if
an ldp/stp pattern is tested with a SImode pointer.  Fix this by using
the mode of the pointer rather than Pmode.

This fixes a failure in gcc.target/aarch64/reload-valid-spoff.c triggered
by https://gcc.gnu.org/ml/gcc-patches/2017-06/msg01367.html.

OK for commit?

ChangeLog:
2017-06-26  Wilco Dijkstra  

* config/aarch64/aarch64.md (load_pairsi): Avoid Pmode.
(store_pairsi): Likewise.
(load_pairdi): Likewise.
(store_pairdi): Likewise.
(load_pairsf): Likewise.
(store_pairsf): Likewise.
(load_pairdf): Likewise.
(store_pairdf): Likewise.
(load_pair_extendsidi2_aarch64): Likewise.
(load_pair_zero_extendsidi2_aarch64): Likewise.
* config/aarch64/aarch64-simd.md (load_pair): Likewise.
(store_pair): Likewise.
--
diff --git a/gcc/config/aarch64/aarch64-simd.md 
b/gcc/config/aarch64/aarch64-simd.md
index 
d6e10427b324449eee90871682a59ed4c7d03b42..46816f7766d4a830536c9bc52e52f013d44bee40
 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -173,7 +173,7 @@ (define_insn "load_pair"
(match_operand:VD 3 "memory_operand" "m"))]
   "TARGET_SIMD
&& rtx_equal_p (XEXP (operands[3], 0),
-  plus_constant (Pmode,
+  plus_constant (GET_MODE (XEXP (operands[1], 0)),
  XEXP (operands[1], 0),
  GET_MODE_SIZE (mode)))"
   "ldp\\t%d0, %d2, %1"
@@ -187,7 +187,7 @@ (define_insn "store_pair"
(match_operand:VD 3 "register_operand" "w"))]
   "TARGET_SIMD
&& rtx_equal_p (XEXP (operands[2], 0),
-  plus_constant (Pmode,
+  plus_constant (GET_MODE (XEXP (operands[0], 0)),
  XEXP (operands[0], 0),
  GET_MODE_SIZE (mode)))"
   "stp\\t%d1, %d3, %0"
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 
82f9f2d6af89db327eae3cb8eadcde850183dfb6..48c4c566d72c989c9d8d509866422039478f4b39
 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -1185,7 +1185,7 @@ (define_insn "load_pairsi"
(set (match_operand:SI 2 "register_operand" "=r,*w")
(match_operand:SI 3 "memory_operand" "m,m"))]
   "rtx_equal_p (XEXP (operands[3], 0),
-   plus_constant (Pmode,
+   plus_constant (GET_MODE (XEXP (operands[1], 0)),
   XEXP (operands[1], 0),
   GET_MODE_SIZE (SImode)))"
   "@
@@ -1201,7 +1201,7 @@ (define_insn "load_pairdi"
(set (match_operand:DI 2 "register_operand" "=r,*w")
(match_operand:DI 3 "memory_operand" "m,m"))]
   "rtx_equal_p (XEXP (operands[3], 0),
-   plus_constant (Pmode,
+   plus_constant (GET_MODE (XEXP (operands[1], 0)),
   XEXP (operands[1], 0),
   GET_MODE_SIZE (DImode)))"
   "@
@@ -1220,7 +1220,7 @@ (define_insn "store_pairsi"
(set (match_operand:SI 2 "memory_operand" "=m,m")
(match_operand:SI 3 "aarch64_reg_or_zero" "rZ,*w"))]
   "rtx_equal_p (XEXP (operands[2], 0),
-   plus_constant (Pmode,
+   plus_constant (GET_MODE (XEXP (operands[0], 0)),
   XEXP (operands[0], 0),
   GET_MODE_SIZE (SImode)))"
   "@
@@ -1236,7 +1236,7 @@ (define_insn "store_pairdi"
(set (match_operand:DI 2 "memory_operand" "=m,m")
(match_operand:DI 3 "aarch64_reg_or_zero" "rZ,*w"))]
   "rtx_equal_p (XEXP (operands[2], 0),
-   plus_constant (Pmode,
+   plus_constant (GET_MODE (XEXP (operands[0], 0)),
   XEXP (operands[0], 0),
   GET_MODE_SIZE (DImode)))"
   "@
@@ -1254,7 +1254,7 @@ (define_insn "load_pairsf"
(set (match_operand:SF 2 "register_operand" "=w,*r")
(match_operand:SF 3 "memory_operand" "m,m"))]
   "rtx_equal_p (XEXP (operands[3], 0),
-   plus_constant (Pmode,
+   plus_constant (GET_MODE (XEXP (operands[1], 0)),
   XEXP (operands[1], 0),
   GET_MODE_SIZE (SFmode)))"
   "@
@@ -1270,7 +1270,7 @@ (define_insn "load_pairdf"
(set (match_operand:DF 2 "register_operand" "=w,*r")
(match_operand:DF 3 "memory_operand" "m,m"))]
   "rtx_equal_p (XEXP (operands[3], 0),
-   plus_constant (Pmode,
+   plus_constant (GET_MODE (XEXP (operands[1], 0)),
   XEXP (operands[1], 0),
   GET_MODE_SIZE (DFmode)))"
   "@
@@ -1288,7 +1288,7 @@ (define_insn "store_pairsf"
(set (match_operand:SF 2 "memory_operand" "=m,m")
(match_operand:SF 3 "aarch64_reg_or_fp_zero" "w,*rY"))]
   "rtx_equal_p (XEXP 

Re: [PATCH][ARM] Fix static analysis warnings in arm backend

2017-06-26 Thread Kyrill Tkachov

Hi Michael,

On 23/06/17 21:44, Michael Collison wrote:

This patch cleans up warning messages due to unused variables and overly 
complicated loop structures.

Okay for trunk?


Ok.
Thanks,
Kyrill


2017-03-30  Michael Collison  

PR target/68535
* config/arm/arm.c (gen_ldm_seq): Remove last unnecessary
set of base_reg
(arm_gen_movmemqi): Removed unused variable 'i'.
Convert 'for' loop into 'while' loop.
(arm_expand_prologue): Remove last unnecessary set of insn.
(thumb_pop): Remove unused variable 'pushed_words'.
(thumb_exit): Remove last unnecessary set of regs_to_pop.




Re: [PATCH][GCC][AArch64] optimize float immediate moves (3 /4) - testsuite.

2017-06-26 Thread Tamar Christina
Hi,

With the changes in the patches, the assembler scan in the testsuite needed a
minor update.
I've posted the patch but will assume it's OK based on the previous OK for
trunk and
the fact that this falls under the obvious rule.

Thanks,
Tamar

From: James Greenhalgh 
Sent: Wednesday, June 14, 2017 10:11:19 AM
To: Tamar Christina
Cc: GCC Patches; nd; Richard Earnshaw; Marcus Shawcroft
Subject: Re: [PATCH][GCC][AArch64] optimize float immediate moves (3 /4) - 
testsuite.

On Wed, Jun 07, 2017 at 12:38:41PM +0100, Tamar Christina wrote:
> Hi All,
>
>
> This patch adds new tests to cover the newly generated code from this patch 
> series.
>
>
> Regression tested on aarch64-none-linux-gnu and no regressions.
>
> OK for trunk?

OK.

Thanks,
James

>
> gcc/testsuite/
> 2017-06-07  Tamar Christina  
>   Bilyan Borisov  
>
>   * gcc.target/aarch64/dbl_mov_immediate_1.c: New.
>   * gcc.target/aarch64/flt_mov_immediate_1.c: New.
>   * gcc.target/aarch64/f16_mov_immediate_1.c: New.
>   * gcc.target/aarch64/f16_mov_immediate_2.c: New.


diff --git a/gcc/testsuite/gcc.target/aarch64/dbl_mov_immediate_1.c b/gcc/testsuite/gcc.target/aarch64/dbl_mov_immediate_1.c
new file mode 100644
index ..eb5b23b8f842c1f299bd58c8f944dce6234c111b
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/dbl_mov_immediate_1.c
@@ -0,0 +1,52 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+
+double d0(void)
+{
+  double x = 0.0d;
+  return x;
+}
+
+double dn1(void)
+{
+  double x = -0.0d;
+  return x;
+}
+
+
+double d1(void)
+{
+  double x = 1.5d;
+  return x;
+}
+
+double d2(void)
+{
+  double x = 123256.0d;
+  return x;
+}
+
+double d3(void)
+{
+  double x = 123256123456.0d;
+  return x;
+}
+
+double d4(void)
+{
+  double x = 123456123456123456.0d;
+  return x;
+}
+
+/* { dg-final { scan-assembler-times "movi\td\[0-9\]+, ?#0" 1 } } */
+
+/* { dg-final { scan-assembler-times "adrp\tx\[0-9\]+, \.LC\[0-9\]" 2 } } */
+/* { dg-final { scan-assembler-times "ldr\td\[0-9\]+, \\\[x\[0-9\], #:lo12:\.LC\[0-9\]\\\]" 2 } } */
+
+/* { dg-final { scan-assembler-times "fmov\td\[0-9\]+, 1\\\.5e\\\+0"   1 } } */
+
+/* { dg-final { scan-assembler-times "mov\tx\[0-9\]+, 25838523252736"   1 } } */
+/* { dg-final { scan-assembler-times "movk\tx\[0-9\]+, 0x40fe, lsl 48"  1 } } */
+/* { dg-final { scan-assembler-times "mov\tx\[0-9\]+, -9223372036854775808" 1 } } */
+/* { dg-final { scan-assembler-times "fmov\td\[0-9\]+, x\[0-9\]+"   2 } } */
+
diff --git a/gcc/testsuite/gcc.target/aarch64/f16_mov_immediate_1.c b/gcc/testsuite/gcc.target/aarch64/f16_mov_immediate_1.c
new file mode 100644
index ..1ed3831e139745227487eafa3ccfdc05c99deb34
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/f16_mov_immediate_1.c
@@ -0,0 +1,49 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_ok } */
+/* { dg-add-options arm_v8_2a_fp16_scalar } */
+
+extern __fp16 foo ();
+extern void bar (__fp16* x);
+
+void f1 ()
+{
+  volatile __fp16 a = 17.0;
+}
+
+
+void f2 (__fp16 *a)
+{
+  *a = 17.0;
+}
+
+void f3 ()
+{
+  __fp16 b = foo ();
+  b = 17.0;
+  bar (&b);
+}
+
+__fp16 f4 ()
+{
+  __fp16 a = 0;
+  __fp16 b = 1;
+  __fp16 c = 2;
+  __fp16 d = 4;
+
+  __fp16 z = a + b;
+  z = z + c;
+  z = z - d;
+  return z;
+}
+
+__fp16 f5 ()
+{
+  __fp16 a = 16;
+  bar (&a);
+  return a;
+}
+
+/* { dg-final { scan-assembler-times "mov\tw\[0-9\]+, #?19520"   3 } } */
+/* { dg-final { scan-assembler-times "movi\tv\[0-9\]+\\\.2s, 0xbc, lsl 8"  1 } } */
+/* { dg-final { scan-assembler-times "movi\tv\[0-9\]+\\\.2s, 0x4c, lsl 8"  1 } } */
diff --git a/gcc/testsuite/gcc.target/aarch64/f16_mov_immediate_2.c b/gcc/testsuite/gcc.target/aarch64/f16_mov_immediate_2.c
new file mode 100644
index ..6f44821e9d08d4c3b87eb52b70491183a32ac2ff
--- /dev/null
+++ b/gcc/testsuite/gcc.target/aarch64/f16_mov_immediate_2.c
@@ -0,0 +1,45 @@
+/* { dg-do compile } */
+/* { dg-options "-O3" } */
+/* { dg-require-effective-target arm_v8_2a_fp16_scalar_ok } */
+/* { dg-add-options arm_v8_2a_fp16_scalar } */
+
+#include 
+
+float16_t f0(void)
+{
+  float16_t x = 0.0f;
+  return x;
+}
+
+float16_t fn1(void)
+{
+  float16_t x = -0.0f;
+  return x;
+}
+
+float16_t f1(void)
+{
+  float16_t x = 256.0f;
+  return x;
+}
+
+float16_t f2(void)
+{
+  float16_t x = 123256.0f;
+  return x;
+}
+
+float16_t f3(void)
+{
+  float16_t x = 17.0;
+  return x;
+}
+
+/* { dg-final { scan-assembler-times "movi\tv\[0-9\]+\\\.4h, ?#0" 1 } } */
+/* { dg-final { scan-assembler-times "movi\tv\[0-9\]+\\\.2s, 0x80, lsl 8" 1 } } */
+/* { dg-final { scan-assembler-times "movi\tv\[0-9\]+\\\.2s, 0x5c, lsl 8" 1 } } */
+/* { dg-final { scan-assembler-times "movi\tv\[0-9\]+\\\.2s, 0x7c, lsl 8" 1 } } */
+
+/* { dg-final { scan-assembler-times "mov\tw\[0-9\]+, 19

Re: [PATCH][GCC][AArch64] optimize float immediate moves (1 /4) - infrastructure.

2017-06-26 Thread Tamar Christina
Hi All,

I've updated the patch accordingly.

This mostly involves removing the loop to create the ival
and removing the *2 code and instead defaulting to 64bit
and switching to 128 when needed.

Regression tested on aarch64-none-linux-gnu and no regressions.

OK for trunk?

Thanks,
Tamar


gcc/
2017-06-26  Tamar Christina  

* config/aarch64/aarch64.c
(aarch64_simd_container_mode): Add prototype.
(aarch64_expand_mov_immediate): Add HI support.
(aarch64_reinterpret_float_as_int, aarch64_float_const_rtx_p): New.
(aarch64_can_const_movi_rtx_p): New.
(aarch64_preferred_reload_class):
Remove restrictions of using FP registers for certain SIMD operations.
(aarch64_rtx_costs): Added new cost for CONST_DOUBLE moves.
(aarch64_valid_floating_const): Add integer move validation.
(aarch64_simd_imm_scalar_p): Remove.
(aarch64_output_scalar_simd_mov_immediate): Generalize function.
(aarch64_legitimate_constant_p): Expand list of supported cases.
* config/aarch64/aarch64-protos.h
(aarch64_float_const_rtx_p, aarch64_can_const_movi_rtx_p): New.
(aarch64_reinterpret_float_as_int): New.
(aarch64_simd_imm_scalar_p): Remove.
* config/aarch64/predicates.md (aarch64_reg_or_fp_float): New.
* config/aarch64/constraints.md (Uvi): New.
(Dd): Split into Ds and new Dd.
* config/aarch64/aarch64.md (*movsi_aarch64):
Add SIMD mov case.
(*movdi_aarch64): Add SIMD mov case.

From: Tamar Christina
Sent: Thursday, June 15, 2017 1:50:19 PM
To: James Greenhalgh
Cc: Richard Sandiford; GCC Patches; nd; Marcus Shawcroft; Richard Earnshaw
Subject: RE: [PATCH][GCC][AArch64] optimize float immediate moves (1 /4) - 
infrastructure.

>
> This patch is pretty huge, are there any opportunities to further split it to 
> aid
> review?

Unfortunately, because I'm also changing some constraints, it introduced a bit
of a dependency cycle.
If I were to break it up more, the individual patches wouldn't work on their
own anymore. If this is acceptable
I can break it up more.

> > +  ival = zext_hwi (res[needed - 1], 32);  for (int i = needed - 2; i
> > + >= 0; i--)
> > +{
> > +  ival <<= 32;
> > +  ival |= zext_hwi (res[i], 32);
> > +}
> > +
> > +  *intval = ival;
>
> ???
>
> Two cases here, needed is either 2 if GET_MODE_BITSIZE (mode) == 64, or it
> is 1 otherwise. So i starts at either -1 or 0. So this for loop either runs
> 0 or 1 times. What am I missing? I'm sure this is all an indirect way of
> writing:
>

Yes, the code was set up to be easily extended to support 128-bit floats as well,
which was deprioritized. I'll just remove the loop.
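To make the point about the loop concrete, here is a small sketch comparing the loop form from the quoted hunk with a direct two-word combination. The function names are hypothetical, and plain `uint32_t` words stand in for the `zext_hwi (res[i], 32)` values; this is not the code in the patch.

```c
#include <stdint.h>

/* Loop form, modelled on the quoted hunk: fold `needed` 32-bit words
   into one value, most significant word first.  */
static uint64_t
combine_loop (const uint32_t *res, int needed)
{
  uint64_t ival = res[needed - 1];
  for (int i = needed - 2; i >= 0; i--)
    {
      ival <<= 32;
      ival |= res[i];
    }
  return ival;
}

/* Direct form: `needed` is only ever 1 or 2, so no loop is required.  */
static uint64_t
combine_direct (const uint32_t *res, int needed)
{
  uint64_t ival = res[needed - 1];
  if (needed == 2)
    ival = (ival << 32) | res[0];
  return ival;
}
```

For `needed` equal to 1 or 2 the two functions return the same value, which is the reviewer's observation that the loop runs at most once.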

> > +
> > +  /* Determine whether it's cheaper to write float constants as
> > + mov/movk pairs over ldr/adrp pairs.  */  unsigned HOST_WIDE_INT
> > + ival;
> > +
> > +  if (GET_CODE (x) == CONST_DOUBLE
> > +  && SCALAR_FLOAT_MODE_P (mode)
> > +  && aarch64_reinterpret_float_as_int (x, &ival))
> > +{
> > +  machine_mode imode = mode == HFmode ? SImode :
> int_mode_for_mode (mode);
> > +  int num_instr = aarch64_internal_mov_immediate
> > +   (NULL_RTX, gen_int_mode (ival, imode), false,
> imode);
> > +  return num_instr < 3;
>
> Should this cost model be static on a magic number? Is it not the case that
> the decision should be based on the relative speeds of a memory access
> compared with mov/movk/fmov ?
>

As far as I'm aware, the cost model is too simplistic to be able to express the
actual costs of mov/movk and movk/movk pairs. E.g. it doesn't take into account
the latency and throughput difference when the instructions occur in
sequence/pairs.

This leads to it allowing a smaller subset through here than what would be
beneficial.
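As a rough illustration of the `num_instr < 3` check, the sketch below reinterprets a double as its bit pattern and counts the non-zero 16-bit chunks, one mov/movk per chunk. It assumes IEEE 754 doubles; `double_bits` and `count_mov_chunks` are hypothetical names, and the count is a simplification of what aarch64_internal_mov_immediate computes (which presumably considers other immediate forms as well).

```c
#include <stdint.h>
#include <string.h>

/* Reinterpret the bits of a double as a 64-bit integer.  This is the
   idea behind aarch64_reinterpret_float_as_int, not its implementation.  */
static uint64_t
double_bits (double d)
{
  uint64_t bits;
  memcpy (&bits, &d, sizeof bits);
  return bits;
}

/* Simplified estimate: one mov/movk per non-zero 16-bit chunk, with a
   minimum of one instruction.  */
static int
count_mov_chunks (uint64_t bits)
{
  int n = 0;
  for (int shift = 0; shift < 64; shift += 16)
    if ((bits >> shift) & 0xffff)
      n++;
  return n ? n : 1;
}
```

For 123256.0 the bit pattern is 0x40FE178000000000: the low 48 bits equal 25838523252736 and the top chunk is 0x40fe, which lines up with the mov/movk pair scanned for in dbl_mov_immediate_1.c.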

> > +/* Return TRUE if rtx X is immediate constant that fits in a single
> > +   MOVI immediate operation.  */
> > +bool
> > +aarch64_can_const_movi_rtx_p (rtx x, machine_mode mode) {
> > +  if (!TARGET_SIMD)
> > + return false;
> > +
> > +  machine_mode vmode, imode;
> > +  unsigned HOST_WIDE_INT ival;
> > +
> > +  /* Don't write float constants out to memory.  */
> > +  if (GET_CODE (x) == CONST_DOUBLE
> > +  && SCALAR_FLOAT_MODE_P (mode))
> > +{
> > +  if (!aarch64_reinterpret_float_as_int (x, &ival))
> > +   return false;
> > +
> > +  imode = int_mode_for_mode (mode);
> > +}
> > +  else if (GET_CODE (x) == CONST_INT
> > +  && SCALAR_INT_MODE_P (mode))
> > +{
> > +   imode = mode;
> > +   ival = INTVAL (x);
> > +}
> > +  else
> > +return false;
> > +
> > +  unsigned width = GET_MODE_BITSIZE (mode) * 2;
>
> Why * 2? It isn't obvious to me from my understanding of movi why that
> would be better than just clamping to 64-bit?

The idea is to get the smallest vector mode for the given mode.
For SF that's V2SF and DF: V2DF, which is why the *2. Clamping to 64 bit there
would be no 

Re: [PATCH][GCC][AArch64] optimize float immediate moves (2 /4) - HF/DF/SF mode.

2017-06-26 Thread Tamar Christina
Hi all,

Here's the re-spun patch.
Aside from the grouping of the split patterns, it now also uses the h register
for the fmov for HF when available;
otherwise it forces a literal load.

Regression tested on aarch64-none-linux-gnu and no regressions.

OK for trunk?

Thanks,
Tamar


gcc/
2017-06-26  Tamar Christina  
Richard Sandiford 

* config/aarch64/aarch64.md (mov): Generalize.
(*movhf_aarch64, *movsf_aarch64, *movdf_aarch64):
Add integer and movi cases.
(movi-split-hf-df-sf split, fp16): New.
(enabled): Added TARGET_FP_F16INST.
* config/aarch64/iterators.md (GPF_HF): New.

From: Tamar Christina
Sent: Wednesday, June 21, 2017 11:48:33 AM
To: James Greenhalgh
Cc: GCC Patches; nd; Marcus Shawcroft; Richard Earnshaw
Subject: RE: [PATCH][GCC][AArch64] optimize float immediate moves (2 /4) - 
HF/DF/SF mode.

> > movi\\t%0.4h, #0
> > -   mov\\t%0.h[0], %w1
> > +   fmov\\t%s0, %w1
>
> Should this not be %h0?

The problem is that H registers are only available in ARMv8.2+.
I'm not sure what to do about ARMv8.1 given your other feedback
pointing out that the bit patterns differ between how it's stored in s vs h
registers.

>
> > umov\\t%w0, %1.h[0]
> > mov\\t%0.h[0], %1.h[0]
> > +   fmov\\t%s0, %1
>
> Likewise, and much more important for correctness as it changes the way the
> bit pattern ends up in the register (see table C2-1 in release B.a of the ARM
> Architecture Reference Manual for ARMv8-A), here.
>
> > +   * return aarch64_output_scalar_simd_mov_immediate (operands[1],
> > + SImode);
> > ldr\\t%h0, %1
> > str\\t%h1, %0
> > ldrh\\t%w0, %1
> > strh\\t%w1, %0
> > mov\\t%w0, %w1"
> > -  [(set_attr "type"
> "neon_move,neon_from_gp,neon_to_gp,neon_move,\
> > - f_loads,f_stores,load1,store1,mov_reg")
> > -   (set_attr "simd" "yes,yes,yes,yes,*,*,*,*,*")]
> > +  "&& can_create_pseudo_p ()
> > +   && !aarch64_can_const_movi_rtx_p (operands[1], HFmode)
> > +   && !aarch64_float_const_representable_p (operands[1])
> > +   &&  aarch64_float_const_rtx_p (operands[1])"
> > +  [(const_int 0)]
> > +  "{
> > +unsigned HOST_WIDE_INT ival;
> > +if (!aarch64_reinterpret_float_as_int (operands[1], &ival))
> > +  FAIL;
> > +
> > +rtx tmp = gen_reg_rtx (SImode);
> > +aarch64_expand_mov_immediate (tmp, GEN_INT (ival));
> > +tmp = simplify_gen_subreg (HImode, tmp, SImode, 0);
> > +emit_move_insn (operands[0], gen_lowpart (HFmode, tmp));
> > +DONE;
> > +  }"
> > +  [(set_attr "type" "neon_move,f_mcr,neon_to_gp,neon_move,fconsts,
> \
> > +neon_move,f_loads,f_stores,load1,store1,mov_reg")
> > +   (set_attr "simd" "yes,*,yes,yes,*,yes,*,*,*,*,*")]
> >  )
>
> Thanks,
> James

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 1a721bfbe42270ec75268b6e2366290aa6ad2134..c951efc383c17ea81800e482e39760eb17830c0a 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -181,6 +181,11 @@
 ;; will be disabled when !TARGET_FLOAT.
 (define_attr "fp" "no,yes" (const_string "no"))
 
+;; Attribute that specifies whether or not the instruction touches half
+;; precision fp registers.  When this is set to yes for an alternative,
+;; that alternative will be disabled when !TARGET_FP_F16INST.
+(define_attr "fp16" "no,yes" (const_string "no"))
+
 ;; Attribute that specifies whether or not the instruction touches simd
 ;; registers.  When this is set to yes for an alternative, that alternative
 ;; will be disabled when !TARGET_SIMD.
@@ -194,11 +199,14 @@
 ;; registers when -mgeneral-regs-only is specified.
 (define_attr "enabled" "no,yes"
   (cond [(ior
-	(and (eq_attr "fp" "yes")
-	 (eq (symbol_ref "TARGET_FLOAT") (const_int 0)))
-	(and (eq_attr "simd" "yes")
-	 (eq (symbol_ref "TARGET_SIMD") (const_int 0))))
-	 (const_string "no")
+	(ior
+		(and (eq_attr "fp" "yes")
+		 (eq (symbol_ref "TARGET_FLOAT") (const_int 0)))
+		(and (eq_attr "simd" "yes")
+		 (eq (symbol_ref "TARGET_SIMD") (const_int 0))))
+	(and (eq_attr "fp16" "yes")
+		 (eq (symbol_ref "TARGET_FP_F16INST") (const_int 0))))
+	(const_string "no")
 	] (const_string "yes")))
 
 ;; Attribute that specifies whether we are dealing with a branch to a
@@ -1062,65 +1070,94 @@
 )
 
 (define_insn "*movhf_aarch64"
-  [(set (match_operand:HF 0 "nonimmediate_operand" "=w,w  ,?r,w,w,m,r,m ,r")
-	(match_operand:HF 1 "general_operand"  "Y ,?rY, w,w,m,w,m,rY,r"))]
+  [(set (match_operand:HF 0 "nonimmediate_operand" "=w,w  ,?r,w,w  ,w  ,w,m,r,m ,r")
+	(match_operand:HF 1 "general_operand"  "Y ,?rY, w,w,Ufc,Uvi,m,w,m,rY,r"))]
   "TARGET_FLOAT && (register_operand (operands[0], HFmode)
-|| aarch64_reg_or_fp_zero (operands[1], HFmode))"
+|| aarch64_reg_or_fp_float (operands[1], HFmode))"
   "@
movi\\t%0.4h, #0
-   mov\\t%0.h[0], %w1
+   fmov\\t%h0, %w1
umov\\t%w0, %1.h[0]
mov\\t%0.h[0], %1.h[0]
+   fmov\

Re: PR81136: ICE from inconsistent DR_MISALIGNMENTs

2017-06-26 Thread Richard Sandiford
Richard Biener  writes:
> On Fri, Jun 23, 2017 at 2:05 PM, Richard Sandiford
>  wrote:
>> Richard Biener  writes:
>>> On Thu, Jun 22, 2017 at 1:30 PM, Richard Sandiford
>>>  wrote:
>>>> The test case triggered this assert in vect_update_misalignment_for_peel:
>>>>
>>>>   gcc_assert (DR_MISALIGNMENT (dr) / dr_size ==
>>>>               DR_MISALIGNMENT (dr_peel) / dr_peel_size);
>>>>
>>>> We knew that the two DRs had the same misalignment at runtime, but when
>>>> considered in isolation, one data reference guaranteed a higher
>>>> compile-time
>>>> base alignment than the other.
>>>>
>>>> In the test case this looks like a missed opportunity.  Both references
>>>> are unconditional, so it should be possible to use the highest of the
>>>> available base alignment guarantees when analyzing each reference.
>>>> The patch does this.
>>>>
>>>> However, as the comment in the patch says, the base alignment guarantees
>>>> provided by a conditional reference only apply if the reference occurs
>>>> at least once.  In this case it would be legitimate for two references
>>>> to have the same runtime misalignment and for one reference to provide a
>>>> stronger compile-time guarantee than the other about what the misalignment
>>>> actually is.  The patch therefore relaxes the assert to handle that case.
>>>
>>> Hmm, but you don't actually check whether a reference occurs only
>>> conditional,
>>> do you?  You just seem to say that for masked loads/stores the reference
>>> is conditional (I believe that's not true).  But for a loop like
>>>
>>>  for (;;)
>>>if (a[i])
>>>  sum += b[j];
>>>
>>> you still assume b[j] executes unconditionally?
>>
>> Maybe the documentation isn't clear enough, but DR_IS_CONDITIONAL
>> was supposed to mean "even if the containing statement executes
>> and runs to completion, the reference might not actually occur".
>> The example above isn't conditional in that sense because the
>> reference to b[j] does occur if the store is reached and completes.
>>
>> Masked loads and stores are conditional in that sense though.
>> The reference only occurs if the mask is nonzero; the memory
>> isn't touched otherwise.  The functions are used to if-convert
>> things like:
>>
>>for (...)
>>  a[i] = b[i] ? c[i] : d[i];
>>
>> where there's no guarantee that it's safe to access c[i] when !b[i]
>> (or d[i] when b[i]).  No reference occurs for an all-false mask.
>
> But as you touch generic data-ref code here you should apply more
> sensible semantics to DR_IS_CONDITIONAL than just marking
> masked loads/stores but not DRs occurring inside BBs only executed
> conditionally ...

I don't see why that's more sensible though.  If a statement is only
conditionally executed in a loop, it's up to the consumer to decide
what to do about that.  The conditions under which the statement
is reached are a control-flow issue and tree-data-ref.c doesn't
have any special information about it.

Masked loads and stores are special because the DR_REFs created by
tree-data-ref.c are artificial: they didn't exist as MEM_REFs in the
original DR_STMT.  And AIUI they didn't exist as MEM_REFs precisely
because they're not guaranteed to happen, even if the load or store
statement itself is executed.  So in this case the DR_IS_CONDITIONAL
is reflecting something that tree-data-ref.c itself has done.

How about calling it DR_IS_CONDITIONAL_IN_STMT to avoid the
general-sounding name?
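The sense of "conditional" described above can be sketched with a scalar emulation of a masked load. This is an illustrative model only: `masked_load` and `LANES` are hypothetical names, and leaving inactive destination lanes untouched is a choice made for this sketch, not a statement about IFN_MASK_LOAD.

```c
#include <stddef.h>

#define LANES 4

/* Scalar emulation of a masked load: SRC is only dereferenced for lanes
   whose mask element is non-zero, so an all-false mask touches no
   memory through SRC at all.  */
static void
masked_load (int *dst, const int *src, const unsigned char *mask)
{
  for (size_t i = 0; i < LANES; i++)
    if (mask[i])
      dst[i] = src[i];
}
```

With an all-false mask the call completes without reading through `src`, so nothing can be concluded about the alignment (or even validity) of `src` just because the statement executed.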

>>> The vectorizer of course only sees unconditionally executed stmts.
>>>
>>> So - I'd simply not add this DR_IS_CONDITIONAL.  Did you run into
>>> any real-world (testsuite) issues without this?
>>
>> Dropping DR_IS_CONDITIONAL would cause us to make invalid alignment
>> assumptions in silly corner cases.  I could add a scan test for it,
>> for targets with masked loads and stores.  It wouldn't trigger
>> an execution failure though because we assume that targets with
>> masked loads and stores allow unaligned accesses:
>>
>>   /* For now assume all conditional loads/stores support unaligned
>>  access without any special code.  */
>>   if (is_gimple_call (stmt)
>>   && gimple_call_internal_p (stmt)
>>   && (gimple_call_internal_fn (stmt) == IFN_MASK_LOAD
>>   || gimple_call_internal_fn (stmt) == IFN_MASK_STORE))
>> return dr_unaligned_supported;
>>
>> So the worst that would happen is that we'd supposedly peel for
>> alignment, but actually misalign everything instead, and so make
>> things slower rather than quicker.
>>
>>> Note that the assert is to prevent bogus information.  Iff we aligned
>>> DR with base alignment 8 and misalign 3 then if another same-align
>>> DR has base alignment 16 we can't simply zero its DR_MISALIGNMENT
>>> as it still can be 8 after aligning DR.
>>>
>>> So I think it's wrong to put DRs with differing base-alignment into
>>> the same-align-refs chain, those should get their DR_MISALIGNMENT
>>> updated independently after peeling.
>>
>> DR_MISALIGNMENT is relative to the vecto

Re: [PATCH, GCC/ARM, Stage 1] Rename FPSCR builtins to correct names

2017-06-26 Thread Thomas Preudhomme

Hi Christophe,

On 23/06/17 20:10, Christophe Lyon wrote:

Hi Thomas,

On 23 June 2017 at 17:48, Thomas Preudhomme
 wrote:

Hi Kyrill,


On 10/04/17 15:01, Kyrill Tkachov wrote:


Hi Prakhar,
Sorry for the delay,

On 22/03/17 10:46, Prakhar Bahuguna wrote:


The GCC documentation in section 6.60.8 ARM Floating Point Status and
Control
Intrinsics states that the FPSCR register can be read and written to
using the
intrinsics __builtin_arm_get_fpscr and __builtin_arm_set_fpscr. However,
these
are misnamed within GCC itself and these intrinsic names are not
recognised.
This patch corrects the intrinsic names to match the documentation, and
adds
tests to verify these intrinsics generate the correct instructions.
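For reference, a minimal sketch of how the documented intrinsics might be used to change the rounding mode. The wrapper names and the host fallback (a plain variable standing in for the register) are assumptions added so the sketch also builds on non-ARM hosts; the RMode field is taken to be FPSCR bits [23:22].

```c
#include <stdint.h>

#if defined (__arm__) && defined (__ARM_FP)
/* On ARM with FP hardware, use the documented intrinsics.  */
static inline uint32_t read_fpscr (void) { return __builtin_arm_get_fpscr (); }
static inline void write_fpscr (uint32_t v) { __builtin_arm_set_fpscr (v); }
#else
/* Hypothetical host fallback: a variable emulating the register.  */
static uint32_t fake_fpscr;
static inline uint32_t read_fpscr (void) { return fake_fpscr; }
static inline void write_fpscr (uint32_t v) { fake_fpscr = v; }
#endif

#define FPSCR_RMODE_SHIFT 22
#define FPSCR_RMODE_MASK  (3u << FPSCR_RMODE_SHIFT)

/* Read-modify-write of the rounding mode field only.  */
static void
set_rounding_mode (unsigned rmode)
{
  uint32_t v = read_fpscr ();
  v = (v & ~FPSCR_RMODE_MASK) | ((rmode & 3u) << FPSCR_RMODE_SHIFT);
  write_fpscr (v);
}

static unsigned
get_rounding_mode (void)
{
  return (read_fpscr () & FPSCR_RMODE_MASK) >> FPSCR_RMODE_SHIFT;
}
```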

Testing done: Ran regression tests on arm-none-eabi for Cortex-M4.

2017-03-09  Prakhar Bahuguna  

gcc/ChangeLog:

 * gcc/config/arm/arm-builtins.c (arm_init_builtins): Rename
   __builtin_arm_ldfscr to __builtin_arm_get_fpscr, and rename
   __builtin_arm_stfscr to __builtin_arm_set_fpscr.
 * gcc/testsuite/gcc.target/arm/fpscr.c: New file.

Okay for stage 1?



I see that the mistake was in not addressing one of the review comments
in:
https://gcc.gnu.org/ml/gcc-patches/2014-04/msg01832.html
properly in the patch that added these functions :(

This is ok for stage 1 if a bootstrap and test on arm-none-linux-gnueabihf
works
fine
I don't think we want to maintain the __builtin_arm_[ld,st]fscr names for
backwards compatibility
as they were not documented and are __builtin_arm* functions that we don't
guarantee to maintain.



How about a backport to GCC 5, 6 & 7? The patch applied cleanly on each of
these versions and the testsuite didn't show any regression for any of the
backport when run for Cortex-M7.



There's a problem with GCC-5:
 gcc.target/arm/fpscr.c: unknown effective target keyword
`arm_fp_ok' for " dg-require-effective-target 4 arm_fp_ok "

Indeed arm_fp_ok effective-target does not exist in the gcc-5 branch.


Oh no. I remember not seeing anything, but I can indeed see this with
compare_tests from the sum file I save after each test run. Alright, what is done
is done, working on a patch now.


Best regards,

Thomas


[PATCH, 0/4] Handle GOMP_OPENACC_NVPTX_{DISASM,SAVE_TEMPS,JIT} in libgomp nvptx plugin

2017-06-26 Thread Tom de Vries

Hi,

I've written a patch series to facilitate debugging libgomp openacc 
testcase failures on the nvptx accelerator.



When running an openacc test-case on an nvptx accelerator, the following 
happens:

- the plugin obtains the ptx assembly for the acceleration kernels
- it calls the cuda jit to compile and link the ptx into a module
- it loads the module
- it starts an acceleration kernel

The patch series adds these environment variables:
- GOMP_OPENACC_NVPTX_SAVE_TEMPS: a means to save the resulting module
  such that it can be investigated using nvdisasm and cuobjdump.
- GOMP_OPENACC_NVPTX_DISASM: a means to see the resulting module in
  the debug output, by writing it into a file and calling nvdisasm on
  it
- GOMP_OPENACC_NVPTX_JIT: a means to set parameters of the
  compilation/linking process, currently supporting:
  * -O[0-4], mapping onto CU_JIT_OPTIMIZATION_LEVEL
  * -ori, mapping onto CU_JIT_NEW_SM3X_OPT
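
As an illustration of the GOMP_OPENACC_NVPTX_JIT syntax listed above, here is a hedged parsing sketch. It is not the plugin's code: `struct jit_opts` and `parse_jit_opts` are hypothetical, and accepting several space-separated options in one string is an assumption of this sketch.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical result type: opt_level is -1 when no -O option was given.  */
struct jit_opts
{
  int opt_level;
  bool ori;
};

/* Parse a string such as "-O3 -ori".  Returns false on an unknown option.
   A NULL string (variable not set) leaves the defaults in place.  */
static bool
parse_jit_opts (const char *s, struct jit_opts *out)
{
  out->opt_level = -1;
  out->ori = false;
  if (s == NULL)
    return true;
  while (*s)
    {
      while (*s == ' ')
        s++;
      if (*s == '\0')
        break;
      if (strncmp (s, "-ori", 4) == 0 && (s[4] == ' ' || s[4] == '\0'))
        {
          out->ori = true;	/* would map onto CU_JIT_NEW_SM3X_OPT */
          s += 4;
        }
      else if (s[0] == '-' && s[1] == 'O' && s[2] >= '0' && s[2] <= '4'
               && (s[3] == ' ' || s[3] == '\0'))
        {
          out->opt_level = s[2] - '0';	/* CU_JIT_OPTIMIZATION_LEVEL */
          s += 3;
        }
      else
        return false;
    }
  return true;
}
```

A caller would pass the result of `getenv ("GOMP_OPENACC_NVPTX_JIT")` and translate the parsed fields into the corresponding CU_JIT options.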


The patch series consists of these patches:

1. Show value of GOMP_OPENACC_DIM in libgomp nvptx plugin
2. Handle GOMP_OPENACC_NVPTX_{DISASM,SAVE_TEMPS} in libgomp nvptx plugin
3. Handle GOMP_OPENACC_NVPTX_JIT=-O[0-4] in libgomp nvptx plugin
4. Handle GOMP_OPENACC_NVPTX_JIT=-ori in libgomp nvptx plugin


I've tested the patch series on top of gomp-4_0-branch, by running an 
openacc testcase from the command line and defining the various 
environment variables.


[ A relevant difference between gomp-4_0-branch and master is that:
- master defines and includes ./libgomp/plugin/cuda/cuda.h, so I had to
  add the CU_JIT constants there, while
- gomp-4_0-branch doesn't define that local minimal cuda.h file but
  includes cuda's cuda.h. My setup linked against cuda 6.5 which defines
  CU_JIT_OPTIMIZATION_LEVEL but not yet CU_JIT_NEW_SM3X_OPT (that seems
  to have been introduced at cuda 8.0), so I had to hardcode the latter.
]


OK for trunk if bootstrap and reg-test on x86_64 with nvidia accelerator 
succeeds?


Thanks,
- Tom


[PATCH, 1/4] Show value of GOMP_OPENACC_DIM in libgomp nvptx plugin

2017-06-26 Thread Tom de Vries

On 06/26/2017 01:24 PM, Tom de Vries wrote:

Hi,

I've written a patch series to facilitate debugging libgomp openacc 
testcase failures on the nvptx accelerator.



When running an openacc test-case on an nvptx accelerator, the following 
happens:

- the plugin obtains the ptx assembly for the acceleration kernels
- it calls the cuda jit to compile and link the ptx into a module
- it loads the module
- it starts an acceleration kernel

The patch series adds these environment variables:
- GOMP_OPENACC_NVPTX_SAVE_TEMPS: a means to save the resulting module
   such that it can be investigated using nvdisasm and cuobjdump.
- GOMP_OPENACC_NVPTX_DISASM: a means to see the resulting module in
   the debug output,  by writing it into a file and calling nvdisasm on
   it
- GOMP_OPENACC_NVPTX_JIT: a means to set parameters of the
   compilation/linking process, currently supporting:
   * -O[0-4], mapping onto CU_JIT_OPTIMIZATION_LEVEL
   * -ori, mapping onto CU_JIT_NEW_SM3X_OPT


The patch series consists of these patches:

1. Show value of GOMP_OPENACC_DIM in libgomp nvptx plugin


This patch adds a debug message (for GOMP_DEBUG=1) about the value of 
the GOMP_OPENACC_DIM variable read from the environment.


Thanks,
- Tom
Show value of GOMP_OPENACC_DIM in libgomp nvptx plugin

2017-06-26  Tom de Vries  

	* plugin/plugin-nvptx.c (notify_var): New function.
	(nvptx_exec): Use notify_var for GOMP_OPENACC_DIM.

---
 libgomp/plugin/plugin-nvptx.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 0e1b3e2..71630b5 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -867,6 +867,14 @@ nvptx_get_num_devices (void)
   return n;
 }
 
+static void
+notify_var (const char *var_name, const char *env_var)
+{
+  if (env_var == NULL)
+    GOMP_PLUGIN_debug (0, "%s: \n", var_name);
+  else
+    GOMP_PLUGIN_debug (0, "%s: '%s'\n", var_name, env_var);
+}
 
 static bool
 link_ptx (CUmodule *module, const struct targ_ptx_obj *ptx_objs,
@@ -1089,10 +1097,12 @@ nvptx_exec (void (*fn), size_t mapnum, void **hostaddrs, void **devaddrs,
   pthread_mutex_lock (&ptx_dev_lock);
   if (!default_dims[0])
 	{
+	  const char *var_name = "GOMP_OPENACC_DIM";
 	  /* We only read the environment variable once.  You can't
 	 change it in the middle of execution.  The syntax  is
 	 the same as for the -fopenacc-dim compilation option.  */
-	  const char *env_var = getenv ("GOMP_OPENACC_DIM");
+	  const char *env_var = getenv (var_name);
+	  notify_var (var_name, env_var);
 	  if (env_var)
 	{
 	  const char *pos = env_var;


[PATCH, GCC/ARM, gcc-5-branch] Fix gcc.target/arm/fpscr.c

2017-06-26 Thread Thomas Preudhomme

Hi,

As raised by Christophe Lyon, fpscr.c FAILs because arm_fp_ok and arm_fp
are not defined in GCC 5. This commit changes the test to use the same
recipe as gcc.target/arm/cmp-2.c

ChangeLog entry is as follows:


*** gcc/testsuite/ChangeLog ***

2017-06-26  Thomas Preud'homme  

* gcc.target/arm/fpscr.c: Require arm_vfp_ok instead of arm_fp_ok and
add -mfpu=vfp -mfloat-abi=softfp instead of fp_ok options.


Ok for GCC 5?

Best regards,

Thomas
diff --git a/gcc/testsuite/gcc.target/arm/fpscr.c b/gcc/testsuite/gcc.target/arm/fpscr.c
index 7b4d71d72d8964f6da0d0604bf59aeb4a895df43..cafba4e8d67545bd210477230b9682fe86620e23 100644
--- a/gcc/testsuite/gcc.target/arm/fpscr.c
+++ b/gcc/testsuite/gcc.target/arm/fpscr.c
@@ -1,9 +1,9 @@
 /* Test the fpscr builtins.  */
 
 /* { dg-do compile } */
-/* { dg-require-effective-target arm_fp_ok } */
+/* { dg-require-effective-target arm_vfp_ok } */
 /* { dg-skip-if "need fp instructions" { *-*-* } { "-mfloat-abi=soft" } { "" } } */
-/* { dg-add-options arm_fp } */
+/* { dg-options "-mfpu=vfp -mfloat-abi=softfp" } */
 
 void
 test_fpscr ()


Re: PR81136: ICE from inconsistent DR_MISALIGNMENTs

2017-06-26 Thread Richard Biener
On Mon, Jun 26, 2017 at 1:14 PM, Richard Sandiford
 wrote:
> Richard Biener  writes:
>> On Fri, Jun 23, 2017 at 2:05 PM, Richard Sandiford
>>  wrote:
>>> Richard Biener  writes:
>>>> On Thu, Jun 22, 2017 at 1:30 PM, Richard Sandiford
>>>>  wrote:
>>>>> The test case triggered this assert in vect_update_misalignment_for_peel:
>>>>>
>>>>>   gcc_assert (DR_MISALIGNMENT (dr) / dr_size ==
>>>>>               DR_MISALIGNMENT (dr_peel) / dr_peel_size);
>>>>>
>>>>> We knew that the two DRs had the same misalignment at runtime, but when
>>>>> considered in isolation, one data reference guaranteed a higher
>>>>> compile-time base alignment than the other.
>>>>>
>>>>> In the test case this looks like a missed opportunity.  Both references
>>>>> are unconditional, so it should be possible to use the highest of the
>>>>> available base alignment guarantees when analyzing each reference.
>>>>> The patch does this.
>>>>>
>>>>> However, as the comment in the patch says, the base alignment guarantees
>>>>> provided by a conditional reference only apply if the reference occurs
>>>>> at least once.  In this case it would be legitimate for two references
>>>>> to have the same runtime misalignment and for one reference to provide a
>>>>> stronger compile-time guarantee than the other about what the misalignment
>>>>> actually is.  The patch therefore relaxes the assert to handle that case.
>>>>
>>>> Hmm, but you don't actually check whether a reference occurs only
>>>> conditional,
>>>> do you?  You just seem to say that for masked loads/stores the reference
>>>> is conditional (I believe that's not true).  But for a loop like
>>>>
>>>>   for (;;)
>>>>     if (a[i])
>>>>       sum += b[j];
>>>>
>>>> you still assume b[j] executes unconditionally?
>>>
>>> Maybe the documentation isn't clear enough, but DR_IS_CONDITIONAL
>>> was supposed to mean "even if the containing statement executes
>>> and runs to completion, the reference might not actually occur".
>>> The example above isn't conditional in that sense because the
>>> reference to b[j] does occur if the store is reached and completes.
>>>
>>> Masked loads and stores are conditional in that sense though.
>>> The reference only occurs if the mask is nonzero; the memory
>>> isn't touched otherwise.  The functions are used to if-convert
>>> things like:
>>>
>>>for (...)
>>>  a[i] = b[i] ? c[i] : d[i];
>>>
>>> where there's no guarantee that it's safe to access c[i] when !b[i]
>>> (or d[i] when b[i]).  No reference occurs for an all-false mask.
>>
>> But as you touch generic data-ref code here you should apply more
>> sensible semantics to DR_IS_CONDITIONAL than just marking
>> masked loads/stores but not DRs occurring inside BBs only executed
>> conditionally ...
>
> I don't see why that's more sensible though.  If a statement is only
> conditionally executed in a loop, it's up to the consumer to decide
> what to do about that.  The conditions under which the statement
> is reached are a control-flow issue and tree-data-ref.c doesn't
> have any special information about it.
>
> Masked loads and stores are special because the DR_REFs created by
> tree-data-ref.c are artificial: they didn't exist as MEM_REFs in the
> original DR_STMT.  And AIUI they didn't exist as MEM_REFs precisely
> because they're not guaranteed to happen, even if the load or store
> statement itself is executed.  So in this case the DR_IS_CONDITIONAL
> is reflecting something that tree-data-ref.c itself has done.
>
> How about calling it DR_IS_CONDITIONAL_IN_STMT to avoid the
> general-sounding name?

That sounds better and avoids the ambiguity.
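
To make the distinction concrete, here is a small scalar sketch in plain C. It is only a model, not GCC internals: sum_if and masked_load are made-up names, and masked_load merely imitates what a masked load does for a single element.

```c
/* Control-flow conditional: the statement may be skipped, but whenever it
   does execute, b[i] really is read.  */
static int
sum_if (const int *a, const int *b, int n)
{
  int sum = 0;
  for (int i = 0; i < n; i++)
    if (a[i])
      sum += b[i];
  return sum;
}

/* Conditional within the statement: the call always executes, yet memory
   is only touched when MASK is set.  With a false mask, ADDR is never
   dereferenced, so nothing can be inferred about it.  */
static int
masked_load (int mask, const int *addr, int fallback)
{
  return mask ? *addr : fallback;
}
```

In the second function the reference itself is optional even though the statement runs to completion, which is the case the proposed DR_IS_CONDITIONAL_IN_STMT name is meant to describe.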

>>>> The vectorizer of course only sees unconditionally executed stmts.
>>>>
>>>> So - I'd simply not add this DR_IS_CONDITIONAL.  Did you run into
>>>> any real-world (testsuite) issues without this?
>>>
>>> Dropping DR_IS_CONDITIONAL would cause us to make invalid alignment
>>> assumptions in silly corner cases.  I could add a scan test for it,
>>> for targets with masked loads and stores.  It wouldn't trigger
>>> an execution failure though because we assume that targets with
>>> masked loads and stores allow unaligned accesses:
>>>
>>>   /* For now assume all conditional loads/stores support unaligned
>>>      access without any special code.  */
>>>   if (is_gimple_call (stmt)
>>>       && gimple_call_internal_p (stmt)
>>>       && (gimple_call_internal_fn (stmt) == IFN_MASK_LOAD
>>>           || gimple_call_internal_fn (stmt) == IFN_MASK_STORE))
>>>     return dr_unaligned_supported;
>>>
>>> So the worst that would happen is that we'd supposedly peel for
>>> alignment, but actually misalign everything instead, and so make
>>> things slower rather than quicker.
>>>
>>>> Note that the assert is to prevent bogus information.  Iff we aligned
>>>> DR with base alignment 8 and misalign 3 then if another same-align
>>>> DR has base alignment 16 we can't simply zero its DR_MISALIGNMENT
>>>> as it still can be 8 after aligning DR.

 S

[PATCH, 2/4] Handle GOMP_OPENACC_NVPTX_{DISASM,SAVE_TEMPS} in libgomp nvptx plugin

2017-06-26 Thread Tom de Vries

On 06/26/2017 01:24 PM, Tom de Vries wrote:

Hi,

I've written a patch series to facilitate debugging libgomp openacc 
testcase failures on the nvptx accelerator.



When running an openacc test-case on an nvptx accelerator, the following 
happens:

- the plugin obtains the ptx assembly for the acceleration kernels
- it calls the cuda jit to compile and link the ptx into a module
- it loads the module
- it starts an acceleration kernel

The patch series adds these environment variables:
- GOMP_OPENACC_NVPTX_SAVE_TEMPS: a means to save the resulting module
   such that it can be investigated using nvdisasm and cuobjdump.
- GOMP_OPENACC_NVPTX_DISASM: a means to see the resulting module in
   the debug output,  by writing it into a file and calling nvdisasm on
   it
- GOMP_OPENACC_NVPTX_JIT: a means to set parameters of the
   compilation/linking process, currently supporting:
   * -O[0-4], mapping onto CU_JIT_OPTIMIZATION_LEVEL
   * -ori, mapping onto CU_JIT_NEW_SM3X_OPT


The patch series consists of these patches:

2. Handle GOMP_OPENACC_NVPTX_{DISASM,SAVE_TEMPS} in libgomp nvptx plugin


This patch adds handling of:
- GOMP_OPENACC_NVPTX_SAVE_TEMPS=[01], and
- GOMP_OPENACC_NVPTX_DISASM=[01]

The filename used for dumping the module is plugin-nvptx.<pid>.cubin.
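
For reference, the name is built from a fixed prefix, the process id and a fixed suffix. A minimal standalone sketch of that step follows; cubin_file_name is a hypothetical helper, and it takes the pid as a parameter where the patch calls getpid () directly.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical helper: write "plugin-nvptx.<pid>.cubin" into BUF.
   Returns 0 on success, -1 if the name did not fit or snprintf failed.  */
static int
cubin_file_name (char *buf, size_t size, long long pid)
{
  int res = snprintf (buf, size, "plugin-nvptx.%lld.cubin", pid);
  return (res >= 0 && (size_t) res < size) ? 0 : -1;
}
```

Unlike the assert in the patch, this variant reports truncation to the caller instead of aborting.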

Thanks,
- Tom
Handle GOMP_OPENACC_NVPTX_{DISASM,SAVE_TEMPS} in libgomp nvptx plugin

2017-06-26  Tom de Vries  

	* plugin/plugin-nvptx.c (do_prog, debug_linkout): New functions.
	(link_ptx): Use debug_linkout.

---
 libgomp/plugin/plugin-nvptx.c | 103 ++
 1 file changed, 103 insertions(+)

diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 71630b5..df1bfdd 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -47,6 +47,9 @@
 #include 
 #include 
 #include 
+#include 
+#include 
+#include 
 
 #if PLUGIN_NVPTX_DYNAMIC
 # include 
@@ -876,6 +879,104 @@ notify_var (const char *var_name, const char *env_var)
 GOMP_PLUGIN_debug (0, "%s: '%s'\n", var_name, env_var);
 }
 
+static void
+do_prog (const char *prog, const char *arg)
+{
+  pid_t pid = fork ();
+
+  if (pid == -1)
+    {
+      GOMP_PLUGIN_error ("Fork failed");
+      return;
+    }
+  else if (pid > 0)
+    {
+      int status;
+      waitpid (pid, &status, 0);
+      if (!WIFEXITED (status))
+	GOMP_PLUGIN_error ("Running %s %s failed", prog, arg);
+    }
+  else
+    {
+      execlp (prog, prog /* argv[0] */, arg, NULL);
+      abort ();
+    }
+}
+
+static void
+debug_linkout (void *linkout, size_t linkoutsize)
+{
+  static int gomp_openacc_nvptx_disasm = -1;
+  if (gomp_openacc_nvptx_disasm == -1)
+    {
+      const char *var_name = "GOMP_OPENACC_NVPTX_DISASM";
+      const char *env_var = getenv (var_name);
+      notify_var (var_name, env_var);
+      gomp_openacc_nvptx_disasm
+	= ((env_var != NULL && env_var[0] == '1' && env_var[1] == '\0')
+	   ? 1 : 0);
+    }
+
+  static int gomp_openacc_nvptx_save_temps = -1;
+  if (gomp_openacc_nvptx_save_temps == -1)
+    {
+      const char *var_name = "GOMP_OPENACC_NVPTX_SAVE_TEMPS";
+      const char *env_var = getenv (var_name);
+      notify_var (var_name, env_var);
+      gomp_openacc_nvptx_save_temps
+	= ((env_var != NULL && env_var[0] == '1' && env_var[1] == '\0')
+	   ? 1 : 0);
+    }
+
+  if (gomp_openacc_nvptx_disasm == 0
+      && gomp_openacc_nvptx_save_temps == 0)
+    return;
+
+  const char *prefix = "plugin-nvptx.";
+  const char *postfix = ".cubin";
+  const int len =	(strlen (prefix)
+			 + 20 /* %lld.  */
+			 + strlen (postfix)
+			 + 1  /* '\0'.  */);
+  char file_name[len];
+  int res = snprintf (file_name, len, "%s%lld%s", prefix,
+		      (long long)getpid (), postfix);
+  assert (res < len); /* Assert there's no truncation.  */
+
+  GOMP_PLUGIN_debug (0, "Generating %s with size %zu\n",
+		     file_name, linkoutsize);
+  FILE *cubin_file = fopen (file_name, "wb");
+  if (cubin_file == NULL)
+    {
+      GOMP_PLUGIN_debug (0, "Opening %s failed\n", file_name);
+      return;
+    }
+
+  fwrite (linkout, linkoutsize, 1, cubin_file);
+  unsigned int write_succeeded = ferror (cubin_file) == 0;
+  if (!write_succeeded)
+    GOMP_PLUGIN_debug (0, "Writing %s failed\n", file_name);
+
+  res = fclose (cubin_file);
+  if (res != 0)
+    GOMP_PLUGIN_debug (0, "Closing %s failed\n", file_name);
+
+  if (!write_succeeded)
+    return;
+
+  if (gomp_openacc_nvptx_disasm == 1)
+    {
+      GOMP_PLUGIN_debug (0, "Disassembling %s\n", file_name);
+      do_prog ("nvdisasm", file_name);
+    }
+
+  if (gomp_openacc_nvptx_save_temps == 0)
+    {
+      GOMP_PLUGIN_debug (0, "Removing %s\n", file_name);
+      remove (file_name);
+    }
+}
+
 static bool
 link_ptx (CUmodule *module, const struct targ_ptx_obj *ptx_objs,
 	  unsigned num_objs)
@@ -939,6 +1040,8 @@ link_ptx (CUmodule *module, const struct targ_ptx_obj *ptx_objs,
   return false;
 }
 
+  debug_linkout (linkout, linkoutsize);
+
   CUDA_CALL (cuModuleLoadData, module, linko

Re: [PATCH, 0/4] Handle GOMP_OPENACC_NVPTX_{DISASM,SAVE_TEMPS,JIT} in libgomp nvptx plugin

2017-06-26 Thread Tom de Vries

On 06/26/2017 01:24 PM, Tom de Vries wrote:

Hi,

I've written a patch series to facilitate debugging libgomp openacc 
testcase failures on the nvptx accelerator.



When running an openacc test-case on an nvptx accelerator, the following 
happens:

- the plugin obtains the ptx assembly for the acceleration kernels
- it calls the cuda jit to compile and link the ptx into a module
- it loads the module
- it starts an acceleration kernel

The patch series adds these environment variables:
- GOMP_OPENACC_NVPTX_SAVE_TEMPS: a means to save the resulting module
   such that it can be investigated using nvdisasm and cuobjdump.
- GOMP_OPENACC_NVPTX_DISASM: a means to see the resulting module in
   the debug output,  by writing it into a file and calling nvdisasm on
   it
- GOMP_OPENACC_NVPTX_JIT: a means to set parameters of the
   compilation/linking process, currently supporting:
   * -O[0-4], mapping onto CU_JIT_OPTIMIZATION_LEVEL
   * -ori, mapping onto CU_JIT_NEW_SM3X_OPT


The patch series consists of these patches:

3. Handle GOMP_OPENACC_NVPTX_JIT=-O[0-4] in libgomp nvptx plugin


This patch adds handling of GOMP_OPENACC_NVPTX_JIT=-O[0-4].
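
The accepted syntax is a space-separated list of -O<n> tokens with n in 0..4, defaulting to 4. Below is a standalone sketch of such a parser; parse_jit_opt_level is a hypothetical name, and unlike the loop in the patch it returns an error code instead of calling GOMP_PLUGIN_error and tolerates trailing blanks.

```c
#include <stddef.h>

/* Hypothetical helper: parse a space-separated string of "-O<n>" tokens,
   n in 0..4.  Stores the last level seen in *LEVEL (default 4) and
   returns 0 on success, -1 on an unrecognized token.  */
static int
parse_jit_opt_level (const char *s, int *level)
{
  *level = 4;
  if (s == NULL)
    return 0;

  while (*s != '\0')
    {
      while (*s == ' ')
	s++;
      if (*s == '\0')
	break;

      if (s[0] == '-' && s[1] == 'O'
	  && s[2] >= '0' && s[2] <= '4'
	  && (s[3] == '\0' || s[3] == ' '))
	{
	  *level = s[2] - '0';
	  s += 3;
	  continue;
	}
      return -1;
    }
  return 0;
}
```

When several -O tokens are given, the last one wins, which matches how the loop in the patch overwrites the value on each match.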

Thanks,
- Tom
Handle GOMP_OPENACC_NVPTX_JIT=-O[0-4] in libgomp nvptx plugin

2017-06-26  Tom de Vries  

	* plugin/cuda/cuda.h (enum CUjit_option): Add CU_JIT_OPTIMIZATION_LEVEL.
	* plugin/plugin-nvptx.c (process_GOMP_OPENACC_NVPTX_JIT): New function.
	(link_ptx): Add CU_JIT_OPTIMIZATION_LEVEL to opts.

---
 libgomp/plugin/cuda/cuda.h|  1 +
 libgomp/plugin/plugin-nvptx.c | 44 ---
 2 files changed, 42 insertions(+), 3 deletions(-)

diff --git a/libgomp/plugin/cuda/cuda.h b/libgomp/plugin/cuda/cuda.h
index 25d5d19..75dfe3d 100644
--- a/libgomp/plugin/cuda/cuda.h
+++ b/libgomp/plugin/cuda/cuda.h
@@ -88,6 +88,7 @@ typedef enum {
   CU_JIT_INFO_LOG_BUFFER_SIZE_BYTES = 4,
   CU_JIT_ERROR_LOG_BUFFER = 5,
   CU_JIT_ERROR_LOG_BUFFER_SIZE_BYTES = 6,
+  CU_JIT_OPTIMIZATION_LEVEL = 7,
   CU_JIT_LOG_VERBOSE = 12
 } CUjit_option;
 
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index df1bfdd..3cd5557 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -977,12 +977,43 @@ debug_linkout (void *linkout, size_t linkoutsize)
 }
 }
 
+static void
+process_GOMP_OPENACC_NVPTX_JIT (intptr_t *gomp_openacc_nvptx_o)
+{
+  const char *var_name = "GOMP_OPENACC_NVPTX_JIT";
+  const char *env_var = getenv (var_name);
+  notify_var (var_name, env_var);
+
+  *gomp_openacc_nvptx_o = 4;
+  if (env_var == NULL)
+return;
+
+  const char *c = env_var;
+  while (*c != '\0')
+{
+  while (*c == ' ')
+	c++;
+
+  if (c[0] == '-' && c[1] == 'O'
+	  && '0' <= c[2] && c[2] <= '4'
+	  && (c[3] == '\0' || c[3] == ' '))
+	{
+	  *gomp_openacc_nvptx_o = c[2] - '0';
+	  c += 3;
+	  continue;
+	}
+
+  GOMP_PLUGIN_error ("Error parsing %s", var_name);
+  break;
+}
+}
+
 static bool
 link_ptx (CUmodule *module, const struct targ_ptx_obj *ptx_objs,
 	  unsigned num_objs)
 {
-  CUjit_option opts[6];
-  void *optvals[6];
+  CUjit_option opts[7];
+  void *optvals[7];
   float elapsed = 0.0;
   char elog[1024];
   char ilog[16384];
@@ -1009,7 +1040,14 @@ link_ptx (CUmodule *module, const struct targ_ptx_obj *ptx_objs,
   opts[5] = CU_JIT_LOG_VERBOSE;
   optvals[5] = (void *) 1;
 
-  CUDA_CALL (cuLinkCreate, 6, opts, optvals, &linkstate);
+  static intptr_t gomp_openacc_nvptx_o = -1;
+  if (gomp_openacc_nvptx_o == -1)
+    process_GOMP_OPENACC_NVPTX_JIT (&gomp_openacc_nvptx_o);
+
+  opts[6] = CU_JIT_OPTIMIZATION_LEVEL;
+  optvals[6] = (void *) gomp_openacc_nvptx_o;
+
+  CUDA_CALL (cuLinkCreate, 7, opts, optvals, &linkstate);
 
   for (; num_objs--; ptx_objs++)
 {


[PATCH, 4/4] Handle GOMP_OPENACC_NVPTX_JIT=-ori in libgomp nvptx plugin

2017-06-26 Thread Tom de Vries

On 06/26/2017 01:24 PM, Tom de Vries wrote:

Hi,

I've written a patch series to facilitate debugging libgomp openacc 
testcase failures on the nvptx accelerator.



When running an openacc test-case on an nvptx accelerator, the following 
happens:

- the plugin obtains the ptx assembly for the acceleration kernels
- it calls the cuda jit to compile and link the ptx into a module
- it loads the module
- it starts an acceleration kernel

The patch series adds these environment variables:
- GOMP_OPENACC_NVPTX_SAVE_TEMPS: a means to save the resulting module
   such that it can be investigated using nvdisasm and cuobjdump.
- GOMP_OPENACC_NVPTX_DISASM: a means to see the resulting module in
   the debug output,  by writing it into a file and calling nvdisasm on
   it
- GOMP_OPENACC_NVPTX_JIT: a means to set parameters of the
   compilation/linking process, currently supporting:
   * -O[0-4], mapping onto CU_JIT_OPTIMIZATION_LEVEL
   * -ori, mapping onto CU_JIT_NEW_SM3X_OPT


The patch series consists of these patches:

4. Handle GOMP_OPENACC_NVPTX_JIT=-ori in libgomp nvptx plugin


This patch adds handling of GOMP_OPENACC_NVPTX_JIT=-ori.
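
Since -ori is optional, the option array passed to the linker ends up with a variable number of entries. A standalone sketch of that pattern follows. The enum values are copied from the cuda.h hunks in this series; enum jit_option and build_jit_opts are stand-ins invented for the example, not CUDA or libgomp names.

```c
#include <stdint.h>

/* Stand-ins for the two CUjit_option values used by the series.  */
enum jit_option { JIT_OPTIMIZATION_LEVEL = 7, JIT_NEW_SM3X_OPT = 15 };

/* Hypothetical helper: fill OPTS/VALS (room for 2 entries each) and return
   how many entries were written.  The optimization level is always
   passed; the -ori entry is appended only when ORI is nonzero.  */
static int
build_jit_opts (enum jit_option *opts, void **vals, intptr_t level, int ori)
{
  int n = 0;
  opts[n] = JIT_OPTIMIZATION_LEVEL;
  vals[n] = (void *) level;
  n++;
  if (ori)
    {
      opts[n] = JIT_NEW_SM3X_OPT;
      vals[n] = (void *) (intptr_t) 1;
      n++;
    }
  return n;
}
```

The returned count plays the role of nopts in the patch, so the unused slot is never handed to the linker.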

Thanks,
- Tom
Handle GOMP_OPENACC_NVPTX_JIT=-ori in libgomp nvptx plugin

2017-06-26  Tom de Vries  

	* plugin/cuda/cuda.h (enum CUjit_option): Add CU_JIT_NEW_SM3X_OPT.
	* plugin/plugin-nvptx.c (process_GOMP_OPENACC_NVPTX_JIT): Add
	gomp_openacc_nvptx_ori parameter.  Handle -ori.
	(link_ptx): Add CU_JIT_NEW_SM3X_OPT to opts.

---
 libgomp/plugin/cuda/cuda.h|  3 ++-
 libgomp/plugin/plugin-nvptx.c | 30 +-
 2 files changed, 27 insertions(+), 6 deletions(-)

diff --git a/libgomp/plugin/cuda/cuda.h b/libgomp/plugin/cuda/cuda.h
index 75dfe3d..4644870 100644
--- a/libgomp/plugin/cuda/cuda.h
+++ b/libgomp/plugin/cuda/cuda.h
@@ -89,7 +89,8 @@ typedef enum {
   CU_JIT_ERROR_LOG_BUFFER = 5,
   CU_JIT_ERROR_LOG_BUFFER_SIZE_BYTES = 6,
   CU_JIT_OPTIMIZATION_LEVEL = 7,
-  CU_JIT_LOG_VERBOSE = 12
+  CU_JIT_LOG_VERBOSE = 12,
+  CU_JIT_NEW_SM3X_OPT = 15
 } CUjit_option;
 
 typedef enum {
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index 3cd5557..a8548fb 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -978,13 +978,15 @@ debug_linkout (void *linkout, size_t linkoutsize)
 }
 
 static void
-process_GOMP_OPENACC_NVPTX_JIT (intptr_t *gomp_openacc_nvptx_o)
+process_GOMP_OPENACC_NVPTX_JIT (intptr_t *gomp_openacc_nvptx_o,
+                                intptr_t *gomp_openacc_nvptx_ori)
 {
   const char *var_name = "GOMP_OPENACC_NVPTX_JIT";
   const char *env_var = getenv (var_name);
   notify_var (var_name, env_var);
 
   *gomp_openacc_nvptx_o = 4;
+  *gomp_openacc_nvptx_ori = 0;
   if (env_var == NULL)
 return;
 
@@ -1003,6 +1005,14 @@ process_GOMP_OPENACC_NVPTX_JIT (intptr_t *gomp_openacc_nvptx_o)
 	  continue;
 	}
 
+      if (c[0] == '-' && c[1] == 'o' && c[2] == 'r' && c[3] == 'i'
+	  && (c[4] == '\0' || c[4] == ' '))
+	{
+	  *gomp_openacc_nvptx_ori = 1;
+	  c += 4;
+	  continue;
+	}
+
       GOMP_PLUGIN_error ("Error parsing %s", var_name);
       break;
     }
@@ -1012,8 +1022,8 @@ static bool
 link_ptx (CUmodule *module, const struct targ_ptx_obj *ptx_objs,
 	  unsigned num_objs)
 {
-  CUjit_option opts[7];
-  void *optvals[7];
+  CUjit_option opts[8];
+  void *optvals[8];
   float elapsed = 0.0;
   char elog[1024];
   char ilog[16384];
@@ -1041,13 +1051,23 @@ link_ptx (CUmodule *module, const struct targ_ptx_obj *ptx_objs,
   optvals[5] = (void *) 1;
 
   static intptr_t gomp_openacc_nvptx_o = -1;
+  static intptr_t gomp_openacc_nvptx_ori = -1;
   if (gomp_openacc_nvptx_o == -1)
-    process_GOMP_OPENACC_NVPTX_JIT (&gomp_openacc_nvptx_o);
+    process_GOMP_OPENACC_NVPTX_JIT (&gomp_openacc_nvptx_o,
+                                    &gomp_openacc_nvptx_ori);
 
   opts[6] = CU_JIT_OPTIMIZATION_LEVEL;
   optvals[6] = (void *) gomp_openacc_nvptx_o;
 
-  CUDA_CALL (cuLinkCreate, 7, opts, optvals, &linkstate);
+  int nopts = 7;
+  if (gomp_openacc_nvptx_ori)
+    {
+      opts[nopts] = CU_JIT_NEW_SM3X_OPT;
+      optvals[nopts] = (void *) gomp_openacc_nvptx_ori;
+      nopts++;
+    }
+
+  CUDA_CALL (cuLinkCreate, nopts, opts, optvals, &linkstate);
 
   for (; num_objs--; ptx_objs++)
 {


[PATCH, 3/4] Handle GOMP_OPENACC_NVPTX_JIT=-O[0-4] in libgomp nvptx plugin

2017-06-26 Thread Tom de Vries

[ reposting with proper subject ]

On 06/26/2017 01:42 PM, Tom de Vries wrote:

On 06/26/2017 01:24 PM, Tom de Vries wrote:

Hi,

I've written a patch series to facilitate debugging libgomp openacc 
testcase failures on the nvptx accelerator.



When running an openacc test-case on an nvptx accelerator, the 
following happens:

- the plugin obtains the ptx assembly for the acceleration kernels
- it calls the cuda jit to compile and link the ptx into a module
- it loads the module
- it starts an acceleration kernel

The patch series adds these environment variables:
- GOMP_OPENACC_NVPTX_SAVE_TEMPS: a means to save the resulting module
   such that it can be investigated using nvdisasm and cuobjdump.
- GOMP_OPENACC_NVPTX_DISASM: a means to see the resulting module in
   the debug output,  by writing it into a file and calling nvdisasm on
   it
- GOMP_OPENACC_NVPTX_JIT: a means to set parameters of the
   compilation/linking process, currently supporting:
   * -O[0-4], mapping onto CU_JIT_OPTIMIZATION_LEVEL
   * -ori, mapping onto CU_JIT_NEW_SM3X_OPT


The patch series consists of these patches:

3. Handle GOMP_OPENACC_NVPTX_JIT=-O[0-4] in libgomp nvptx plugin


This patch adds handling of GOMP_OPENACC_NVPTX_JIT=-O[0-4].


Thanks,
- Tom
Handle GOMP_OPENACC_NVPTX_JIT=-O[0-4] in libgomp nvptx plugin

2017-06-26  Tom de Vries  

	* plugin/cuda/cuda.h (enum CUjit_option): Add CU_JIT_OPTIMIZATION_LEVEL.
	* plugin/plugin-nvptx.c (process_GOMP_OPENACC_NVPTX_JIT): New function.
	(link_ptx): Add CU_JIT_OPTIMIZATION_LEVEL to opts.

---
 libgomp/plugin/cuda/cuda.h|  1 +
 libgomp/plugin/plugin-nvptx.c | 44 ---
 2 files changed, 42 insertions(+), 3 deletions(-)

diff --git a/libgomp/plugin/cuda/cuda.h b/libgomp/plugin/cuda/cuda.h
index 25d5d19..75dfe3d 100644
--- a/libgomp/plugin/cuda/cuda.h
+++ b/libgomp/plugin/cuda/cuda.h
@@ -88,6 +88,7 @@ typedef enum {
   CU_JIT_INFO_LOG_BUFFER_SIZE_BYTES = 4,
   CU_JIT_ERROR_LOG_BUFFER = 5,
   CU_JIT_ERROR_LOG_BUFFER_SIZE_BYTES = 6,
+  CU_JIT_OPTIMIZATION_LEVEL = 7,
   CU_JIT_LOG_VERBOSE = 12
 } CUjit_option;
 
diff --git a/libgomp/plugin/plugin-nvptx.c b/libgomp/plugin/plugin-nvptx.c
index df1bfdd..3cd5557 100644
--- a/libgomp/plugin/plugin-nvptx.c
+++ b/libgomp/plugin/plugin-nvptx.c
@@ -977,12 +977,43 @@ debug_linkout (void *linkout, size_t linkoutsize)
 }
 }
 
+static void
+process_GOMP_OPENACC_NVPTX_JIT (intptr_t *gomp_openacc_nvptx_o)
+{
+  const char *var_name = "GOMP_OPENACC_NVPTX_JIT";
+  const char *env_var = getenv (var_name);
+  notify_var (var_name, env_var);
+
+  *gomp_openacc_nvptx_o = 4;
+  if (env_var == NULL)
+    return;
+
+  const char *c = env_var;
+  while (*c != '\0')
+    {
+      while (*c == ' ')
+	c++;
+
+      if (c[0] == '-' && c[1] == 'O'
+	  && '0' <= c[2] && c[2] <= '4'
+	  && (c[3] == '\0' || c[3] == ' '))
+	{
+	  *gomp_openacc_nvptx_o = c[2] - '0';
+	  c += 3;
+	  continue;
+	}
+
+      GOMP_PLUGIN_error ("Error parsing %s", var_name);
+      break;
+    }
+}
+
 static bool
 link_ptx (CUmodule *module, const struct targ_ptx_obj *ptx_objs,
 	  unsigned num_objs)
 {
-  CUjit_option opts[6];
-  void *optvals[6];
+  CUjit_option opts[7];
+  void *optvals[7];
   float elapsed = 0.0;
   char elog[1024];
   char ilog[16384];
@@ -1009,7 +1040,14 @@ link_ptx (CUmodule *module, const struct targ_ptx_obj *ptx_objs,
   opts[5] = CU_JIT_LOG_VERBOSE;
   optvals[5] = (void *) 1;
 
-  CUDA_CALL (cuLinkCreate, 6, opts, optvals, &linkstate);
+  static intptr_t gomp_openacc_nvptx_o = -1;
+  if (gomp_openacc_nvptx_o == -1)
+    process_GOMP_OPENACC_NVPTX_JIT (&gomp_openacc_nvptx_o);
+
+  opts[6] = CU_JIT_OPTIMIZATION_LEVEL;
+  optvals[6] = (void *) gomp_openacc_nvptx_o;
+
+  CUDA_CALL (cuLinkCreate, 7, opts, optvals, &linkstate);
 
   for (; num_objs--; ptx_objs++)
 {


Re: PR81136: ICE from inconsistent DR_MISALIGNMENTs

2017-06-26 Thread Richard Sandiford
Richard Biener  writes:
> On Mon, Jun 26, 2017 at 1:14 PM, Richard Sandiford
>  wrote:
>> Richard Biener  writes:
>>> On Fri, Jun 23, 2017 at 2:05 PM, Richard Sandiford
>>>  wrote:
>>>> Richard Biener  writes:
>>>>> On Thu, Jun 22, 2017 at 1:30 PM, Richard Sandiford
>>>>>  wrote:
>>>>>> The test case triggered this assert in vect_update_misalignment_for_peel:
>>>>>>
>>>>>>   gcc_assert (DR_MISALIGNMENT (dr) / dr_size ==
>>>>>>               DR_MISALIGNMENT (dr_peel) / dr_peel_size);
>>>>>>
>>>>>> We knew that the two DRs had the same misalignment at runtime, but when
>>>>>> considered in isolation, one data reference guaranteed a higher
>>>>>> compile-time base alignment than the other.
>>>>>>
>>>>>> In the test case this looks like a missed opportunity.  Both references
>>>>>> are unconditional, so it should be possible to use the highest of the
>>>>>> available base alignment guarantees when analyzing each reference.
>>>>>> The patch does this.
>>>>>>
>>>>>> However, as the comment in the patch says, the base alignment guarantees
>>>>>> provided by a conditional reference only apply if the reference occurs
>>>>>> at least once.  In this case it would be legitimate for two references
>>>>>> to have the same runtime misalignment and for one reference to provide a
>>>>>> stronger compile-time guarantee than the other about what the
>>>>>> misalignment actually is.  The patch therefore relaxes the assert to
>>>>>> handle that case.
>>>>>
>>>>> Hmm, but you don't actually check whether a reference occurs only
>>>>> conditional,
>>>>> do you?  You just seem to say that for masked loads/stores the reference
>>>>> is conditional (I believe that's not true).  But for a loop like
>>>>>
>>>>>   for (;;)
>>>>>     if (a[i])
>>>>>       sum += b[j];
>>>>>
>>>>> you still assume b[j] executes unconditionally?
>>>>
>>>> Maybe the documentation isn't clear enough, but DR_IS_CONDITIONAL
>>>> was supposed to mean "even if the containing statement executes
>>>> and runs to completion, the reference might not actually occur".
>>>> The example above isn't conditional in that sense because the
>>>> reference to b[j] does occur if the store is reached and completes.
>>>>
>>>> Masked loads and stores are conditional in that sense though.
>>>> The reference only occurs if the mask is nonzero; the memory
>>>> isn't touched otherwise.  The functions are used to if-convert
>>>> things like:
>>>>
>>>>    for (...)
>>>>      a[i] = b[i] ? c[i] : d[i];
>>>>
>>>> where there's no guarantee that it's safe to access c[i] when !b[i]
>>>> (or d[i] when b[i]).  No reference occurs for an all-false mask.
>>>
>>> But as you touch generic data-ref code here you should apply more
>>> sensible semantics to DR_IS_CONDITIONAL than just marking
>>> masked loads/stores but not DRs occurring inside BBs only executed
>>> conditionally ...
>>
>> I don't see why that's more sensible though.  If a statement is only
>> conditionally executed in a loop, it's up to the consumer to decide
>> what to do about that.  The conditions under which the statement
>> is reached are a control-flow issue and tree-data-ref.c doesn't
>> have any special information about it.
>>
>> Masked loads and stores are special because the DR_REFs created by
>> tree-data-ref.c are artificial: they didn't exist as MEM_REFs in the
>> original DR_STMT.  And AIUI they didn't exist as MEM_REFs precisely
>> because they're not guaranteed to happen, even if the load or store
>> statement itself is executed.  So in this case the DR_IS_CONDITIONAL
>> is reflecting something that tree-data-ref.c itself has done.
>>
>> How about calling it DR_IS_CONDITIONAL_IN_STMT to avoid the
>> general-sounding name?
>
> That sounds better and avoids the ambiguity.

OK.

>>>>> The vectorizer of course only sees unconditionally executed stmts.
>>>>>
>>>>> So - I'd simply not add this DR_IS_CONDITIONAL.  Did you run into
>>>>> any real-world (testsuite) issues without this?
>>>>
>>>> Dropping DR_IS_CONDITIONAL would cause us to make invalid alignment
>>>> assumptions in silly corner cases.  I could add a scan test for it,
>>>> for targets with masked loads and stores.  It wouldn't trigger
>>>> an execution failure though because we assume that targets with
>>>> masked loads and stores allow unaligned accesses:
>>>>
>>>>   /* For now assume all conditional loads/stores support unaligned
>>>>      access without any special code.  */
>>>>   if (is_gimple_call (stmt)
>>>>       && gimple_call_internal_p (stmt)
>>>>       && (gimple_call_internal_fn (stmt) == IFN_MASK_LOAD
>>>>           || gimple_call_internal_fn (stmt) == IFN_MASK_STORE))
>>>>     return dr_unaligned_supported;
>>>>
>>>> So the worst that would happen is that we'd supposedly peel for
>>>> alignment, but actually misalign everything instead, and so make
>>>> things slower rather than quicker.
>>>>
>>>>> Note that the assert is to prevent bogus information.  Iff we aligned
>>>>> DR with base alignment 8 and misalign 3 then if a

[PATCH v2][ASAN] Implement dynamic allocas/VLAs sanitization.​

2017-06-26 Thread Maxim Ostapenko

Hi,

Sorry for the long delay. Here is an updated patch.
Following Jakub's suggestion from the previous review, I've added a
get_nonzero_bits check to handle_builtin_alloca in order to avoid
redundant redzone size calculations when we know that the alloca has
alignment >= ASAN_RED_ZONE_SIZE. Thus, for the following code:


struct __attribute__((aligned (N))) S { char s[N]; };

void bar (struct S *, struct S *);

void
foo (int x)
{
  struct S a;
  {
    struct S b[x];
    bar (&a, &b[0]);
  }
  {
    struct S b[x + 4];
    bar (&a, &b[0]);
  }
}

void
baz (int x)
{
  struct S a;
  struct S b[x];
  bar (&a, &b[0]);
}

compiled with -O2 -fsanitize=address -DN=64, we now get the expected

  _2 = (sizetype) x_1(D);
  _8 = _2 * 64;
  _14 = _8 + 96;
  _15 = __builtin_alloca_with_align (_14, 512);
  _16 = _15 + 64;
  __builtin___asan_alloca_poison (_16, _8);

instead of previous

  _1 = (sizetype) x_4(D);
  _2 = _1 * 64;
  _24 = _2 & 31;
  _19 = _2 + 128;
  _27 = _19 - _24;
  _28 = __builtin_alloca_with_align (_27, 512);
  _29 = _28 + 64;
  __builtin___asan_alloca_poison (_29, _2);
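
To spell out the arithmetic in the two dumps, here is a small standalone sketch. The constants 31, 96 and 128 are read off the GIMPLE above; the function names are made up for the example, and reading 128 - (size & 31) as "fixed padding plus a partial redzone" is my interpretation of the dump rather than a quote from asan.c.

```c
#include <stddef.h>

/* Size passed to __builtin_alloca_with_align in the older dump:
   size + 128 - (size & 31).  */
static size_t
padded_size_old (size_t size)
{
  return size + 128 - (size & 31);
}

/* Size passed in the new dump, where the low five bits of SIZE are known
   to be zero and no "& 31" is emitted: size + 96.  */
static size_t
padded_size_new (size_t size)
{
  return size + 96;
}
```

For sizes that are multiples of 64, as in the example, size & 31 is always 0, so the old sequence computed a constant adjustment at run time and the new one drops that computation.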


Also, I've added a simple pattern for X & C -> 0 when we know that the
corresponding bits of X are zero, but I'm not sure this pattern has any practical value.
Tested and bootstrapped on x86_64-unknown-linux-gnu. Could you take a look?
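
To illustrate the idea behind that pattern outside of GCC, here is a standalone sketch. It models "nonzero bits" as a mask with a 1 wherever the value might have a 1; the helper names are made up and this is not the match.pd code itself.

```c
#include <stdint.h>
#include <stdbool.h>

/* NONZERO_BITS is a conservative mask: a 0 bit means the corresponding
   bit of X is known to be 0.  X & C is known to be 0 when no possibly-set
   bit of X overlaps C.  */
static bool
and_folds_to_zero (uint64_t nonzero_bits, uint64_t c)
{
  return (nonzero_bits & c) == 0;
}

/* Possibly-set bits of X * 64 for an otherwise unknown 64-bit X: the low
   six bits are known to be zero.  */
static uint64_t
nonzero_bits_times_64 (void)
{
  return ~(uint64_t) 63;
}
```

With these, (x * 64) & 31 is recognized as zero, which corresponds to the "_2 & 31" statement that disappears in the dump above.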

-Maxim

gcc/ChangeLog:

2017-06-26  Maxim Ostapenko  

	* asan.c: Include gimple-fold.h.
	(get_last_alloca_addr): New function.
	(handle_builtin_stackrestore): Likewise.
	(handle_builtin_alloca): Likewise.
	(asan_emit_allocas_unpoison): Likewise.
	(get_mem_refs_of_builtin_call): Add new parameter, remove const
	qualifier from first parameter.  Handle BUILT_IN_ALLOCA,
	BUILT_IN_ALLOCA_WITH_ALIGN and BUILT_IN_STACK_RESTORE builtins.
	(instrument_builtin_call): Pass gimple iterator to
	get_mem_refs_of_builtin_call.
	(last_alloca_addr): New global.
	* asan.h (asan_emit_allocas_unpoison): Declare.
	* builtins.c (expand_asan_emit_allocas_unpoison): New function.
	(expand_builtin): Handle BUILT_IN_ASAN_ALLOCAS_UNPOISON.
	* cfgexpand.c (expand_used_vars): Call asan_emit_allocas_unpoison
	if function calls alloca.
	* gimple-fold.c (replace_call_with_value): Remove static keyword.
	* gimple-fold.h (replace_call_with_value): Declare.
	* internal-fn.c: Include asan.h.
	* sanitizer.def (BUILT_IN_ASAN_ALLOCA_POISON,
	BUILT_IN_ASAN_ALLOCAS_UNPOISON): New builtins.
	* match.pd: Add new pattern.

gcc/testsuite/ChangeLog:

2017-06-26  Maxim Ostapenko  

	* c-c++-common/asan/alloca_big_alignment.c: New test.
	* c-c++-common/asan/alloca_detect_custom_size.c: Likewise.
	* c-c++-common/asan/alloca_instruments_all_paddings.c: Likewise.
	* c-c++-common/asan/alloca_loop_unpoisoning.c: Likewise.
	* c-c++-common/asan/alloca_overflow_partial.c: Likewise.
	* c-c++-common/asan/alloca_overflow_right.c: Likewise.
	* c-c++-common/asan/alloca_safe_access.c: Likewise.
	* c-c++-common/asan/alloca_underflow_left.c: Likewise.

diff --git a/gcc/asan.c b/gcc/asan.c
index e730530..6d7a5ec 100644
--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -55,6 +55,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "langhooks.h"
 #include "cfgloop.h"
 #include "gimple-builder.h"
+#include "gimple-fold.h"
 #include "ubsan.h"
 #include "params.h"
 #include "builtins.h"
@@ -245,6 +246,7 @@ along with GCC; see the file COPYING3.  If not see
 static unsigned HOST_WIDE_INT asan_shadow_offset_value;
 static bool asan_shadow_offset_computed;
 static vec sanitized_sections;
+static tree last_alloca_addr = NULL_TREE;
 
 /* Set of variable declarations that are going to be guarded by
use-after-scope sanitizer.  */
@@ -529,11 +531,183 @@ get_mem_ref_of_assignment (const gassign *assignment,
   return true;
 }
 
+/* Return address of last allocated dynamic alloca.  */
+
+static tree
+get_last_alloca_addr ()
+{
+  if (last_alloca_addr)
+    return last_alloca_addr;
+
+  gimple_seq seq = NULL;
+  gassign *g;
+
+  last_alloca_addr = create_tmp_reg (ptr_type_node, "last_alloca_addr");
+  g = gimple_build_assign (last_alloca_addr, NOP_EXPR,
+			   build_int_cst (ptr_type_node, 0));
+  gimple_seq_add_stmt_without_update (&seq, g);
+
+  edge e = single_succ_edge (ENTRY_BLOCK_PTR_FOR_FN (cfun));
+  gsi_insert_seq_on_edge_immediate (e, seq);
+  return last_alloca_addr;
+}
+
+/* Insert __asan_allocas_unpoison (top, bottom) call after
+   __builtin_stack_restore (new_sp) call.
+   The pseudocode of this routine should look like this:
+ __builtin_stack_restore (new_sp);
+ top = last_alloca_addr;
+ bot = virtual_dynamic_stack_rtx;
+ __asan_allocas_unpoison (top, bot);
+ last_alloca_addr = new_sp;
+   We don't use new_sp as bot parameter because on some architectures
+   SP has non zero offset from dynamic stack area.  Moreover, on some
+   architectures this offset (STACK_DYNAMIC_OFFSET) becomes known for each
+   particular function only after all callees were expanded to rtl.
+   The most noticeable example is PowerPC{,64}, see
+   http://refspecs.linuxfoundation.org/

Re: C/C++ PATCH to implement -Wmultistatement-macros (PR c/80116)

2017-06-26 Thread David Malcolm
On Mon, 2017-06-26 at 11:40 +0200, Marek Polacek wrote:
> On Mon, Jun 19, 2017 at 12:01:06PM +0200, Marek Polacek wrote:
> > On Tue, Jun 13, 2017 at 03:29:32PM +, Joseph Myers wrote:
> > > On Tue, 13 Jun 2017, Marek Polacek wrote:
> > > 
> > > > * c-parser.c (c_parser_if_body): Set the location of
> > > > the
> > > > body of the conditional after parsing all the labels. 
> > > >  Call
> > > > warn_for_multistatement_macros.
> > > > (c_parser_else_body): Likewise.
> > > > (c_parser_switch_statement): Likewise.
> > > > (c_parser_while_statement): Likewise.
> > > > (c_parser_for_statement): Likewise.
> > > > (c_parser_statement): Add a default argument.  Save the
> > > > location
> > > > after labels have been parsed.
> > > > (c_parser_c99_block_statement): Likewise.
> > > 
> > > The gcc/c/ changes are OK.
> > 
> > Thanks.
> > 
> > David, do you have any more comments on the patch?
> 
> Seems not, so I'll commit the patch today.
> 

Oops; sorry.

I think all I had were those relatively minor comments about the
wording in the description/invoke.texi, but I have no objection to this
going into trunk as-is.


Re: libdecnumber/bid/bid2dpd_dpd2bid.c: Simplify code

2017-06-26 Thread Sylvestre Ledru


On 26/05/2017 at 15:34, Sylvestre Ledru wrote:
> Hello,
>
> The attached patch (dup.diff) performs the following changes:
>
> * bid/bid2dpd_dpd2bid.c: Remove identical code for different
> branches (CID 1286836, 1286837, 1286838)
>   Remove some useless } else { declaration as we are returning
>   Remove some whitespace changes & tab
>
> I attached the word diff to highlight the change.
>
> No functional changes! The identical code has been found by coverity.
>
> Thanks!
>
> S
>
>
ping?



Re: [PATCH] lto-wrapper.c (copy_file): Fix resource leaks

2017-06-26 Thread Sylvestre Ledru


On 16/05/2017 at 09:59, Sylvestre Ledru wrote:
> On 15/05/2017 at 23:58, Jeff Law wrote:
>> On 05/14/2017 04:00 AM, Sylvestre Ledru wrote:
>>> Add missing fclose
>>> CID 1407987, 1407986
>>>
>>> S
>>>
>>>
>>>
>>> 0005-2017-05-14-Sylvestre-Ledru-sylvestre-debian.org.patch
>>>
>>>
>>>  From d255827a64012fb81937d6baa8534eabecf9b735 Mon Sep 17 00:00:00 2001
>>> From: Sylvestre Ledru
>>> Date: Sun, 14 May 2017 11:37:37 +0200
>>> Subject: [PATCH 5/5] 2017-05-14  Sylvestre Ledru
>>>
>>> * lto-wrapper.c (copy_file): Fix resource leaks
>>>CID 1407987, 1407986
>> Doesn't this still leak in the cases where we call fatal_error?
> Indeed. thanks! Patch updated!
>
ping ?
S



Re: [PATCH] lto-wrapper.c (copy_file): Fix resource leaks

2017-06-26 Thread Jakub Jelinek
On Mon, May 15, 2017 at 03:58:29PM -0600, Jeff Law wrote:
> On 05/14/2017 04:00 AM, Sylvestre Ledru wrote:
> > Add missing fclose
> > CID 1407987, 1407986
> > 
> > S
> > 
> > 
> > 
> > 0005-2017-05-14-Sylvestre-Ledru-sylvestre-debian.org.patch
> > 
> > 
> >  From d255827a64012fb81937d6baa8534eabecf9b735 Mon Sep 17 00:00:00 2001
> > From: Sylvestre Ledru
> > Date: Sun, 14 May 2017 11:37:37 +0200
> > Subject: [PATCH 5/5] 2017-05-14  Sylvestre Ledru
> > 
> > * lto-wrapper.c (copy_file): Fix resource leaks
> >CID 1407987, 1407986
> Doesn't this still leak in the cases where we call fatal_error?

fatal_error is a noreturn function, why should we bother to do any cleanups
after it?  All that code is going to be optimized away anyway.
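
To illustrate the point, here is a minimal sketch (not the actual lto-wrapper.c code) of a copy routine whose error paths end in a noreturn function. The names die and copy_stream are hypothetical stand-ins; die plays the role of fatal_error.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical stand-in for fatal_error: reports and never returns.  */
_Noreturn void
die (const char *msg)
{
  fprintf (stderr, "fatal: %s\n", msg);
  exit (EXIT_FAILURE);
}

/* Copy everything from SRC to DST and return the number of bytes copied.
   The caller owns both streams and closes them on the normal path.  */
size_t
copy_stream (FILE *src, FILE *dst)
{
  char buf[4096];
  size_t total = 0, n;

  assert (src != NULL && dst != NULL);
  while ((n = fread (buf, 1, sizeof buf, src)) > 0)
    {
      if (fwrite (buf, 1, n, dst) != n)
	die ("write error");	/* process exits here, nothing runs after */
      total += n;
    }
  if (ferror (src))
    die ("read error");
  return total;
}
```

Any fclose placed after a call to die would be unreachable, which is why only the fclose calls on the normal return path matter for the leak.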

Jakub


Re: [PATCH, ARM] Implement __ARM_FEATURE_COPROC coprocessor intrinsic feature macro

2017-06-26 Thread Thomas Preudhomme

Hi Christophe,

On 21/06/17 17:57, Christophe Lyon wrote:

Hi,


On 19 June 2017 at 11:32, Richard Earnshaw (lists)
 wrote:

On 16/06/17 15:56, Prakhar Bahuguna wrote:

On 16/06/2017 15:37:18, Richard Earnshaw (lists) wrote:

On 16/06/17 08:48, Prakhar Bahuguna wrote:

On 15/06/2017 17:23:43, Richard Earnshaw (lists) wrote:

On 14/06/17 10:35, Prakhar Bahuguna wrote:

The ARM ACLE defines the __ARM_FEATURE_COPROC macro which indicates which
coprocessor intrinsics are available for the target. If __ARM_FEATURE_COPROC is
undefined, the target does not support coprocessor intrinsics. The feature
levels are defined as follows:

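+---------+-----------+--------------------------------------------------+
| **Bit** | **Value** | **Intrinsics Available**                         |
+---------+-----------+--------------------------------------------------+
| 0       | 0x1       | __arm_cdp, __arm_ldc, __arm_ldcl, __arm_stc,     |
|         |           | __arm_stcl, __arm_mcr and __arm_mrc              |
+---------+-----------+--------------------------------------------------+
| 1       | 0x2       | __arm_cdp2, __arm_ldc2, __arm_stc2, __arm_ldc2l, |
|         |           | __arm_stc2l, __arm_mcr2 and __arm_mrc2           |
+---------+-----------+--------------------------------------------------+
| 2       | 0x4       | __arm_mcrr and __arm_mrrc                        |
+---------+-----------+--------------------------------------------------+
| 3       | 0x8       | __arm_mcrr2 and __arm_mrrc2                      |
+---------+-----------+--------------------------------------------------+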
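
As a rough illustration of how user code might consume the bit assignments in the table above, here is a small C sketch. The enumerator names, TARGET_COPROC_BITS and coproc_supports are made up for this example; only __ARM_FEATURE_COPROC and the bit values come from the description.

```c
#include <assert.h>

/* Bit values taken from the table above.  The enumerator names are
   illustrative only and are not defined by the ACLE.  */
enum coproc_level
{
  COPROC_BASE  = 0x1,	/* __arm_cdp, __arm_ldc, ..., __arm_mrc */
  COPROC_V2    = 0x2,	/* __arm_cdp2, __arm_ldc2, ..., __arm_mrc2 */
  COPROC_MCRR  = 0x4,	/* __arm_mcrr and __arm_mrrc */
  COPROC_MCRR2 = 0x8	/* __arm_mcrr2 and __arm_mrrc2 */
};

/* Value of the feature macro on this target, or 0 when it is undefined,
   that is, when no coprocessor intrinsics are available.  */
#ifdef __ARM_FEATURE_COPROC
# define TARGET_COPROC_BITS __ARM_FEATURE_COPROC
#else
# define TARGET_COPROC_BITS 0
#endif

/* Return nonzero if every level in WANTED is present in BITS.  */
int
coproc_supports (unsigned bits, unsigned wanted)
{
  assert (wanted != 0);
  return (bits & wanted) == wanted;
}
```

A caller would typically write coproc_supports (TARGET_COPROC_BITS, COPROC_MCRR) or, for compile-time selection, test the macro directly in an #if.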

This patch implements full support for this feature macro as defined in section
5.9 of the ACLE
(https://developer.arm.com/products/software-development-tools/compilers/arm-compiler-5/docs/101028/latest/5-feature-test-macros).

gcc/ChangeLog:

2017-06-14  Prakhar Bahuguna  

   * config/arm/arm-c.c (arm_cpu_builtins): New block to define
__ARM_FEATURE_COPROC according to support.

2017-06-14  Prakhar Bahuguna  
   * gcc/testsuite/gcc.target/arm/acle/cdp.c: Add feature macro bitmap
   test.
   * gcc/testsuite/gcc.target/arm/acle/cdp2.c: Likewise.
   * gcc/testsuite/gcc.target/arm/acle/ldc.c: Likewise.
   * gcc/testsuite/gcc.target/arm/acle/ldc2.c: Likewise.
   * gcc/testsuite/gcc.target/arm/acle/ldc2l.c: Likewise.
   * gcc/testsuite/gcc.target/arm/acle/ldcl.c: Likewise.
   * gcc/testsuite/gcc.target/arm/acle/mcr.c: Likewise.
   * gcc/testsuite/gcc.target/arm/acle/mcr2.c: Likewise.
   * gcc/testsuite/gcc.target/arm/acle/mcrr.c: Likewise.
   * gcc/testsuite/gcc.target/arm/acle/mcrr2.c: Likewise.
   * gcc/testsuite/gcc.target/arm/acle/mrc.c: Likewise.
   * gcc/testsuite/gcc.target/arm/acle/mrc2.c: Likewise.
   * gcc/testsuite/gcc.target/arm/acle/mrrc.c: Likewise.
   * gcc/testsuite/gcc.target/arm/acle/mrrc2.c: Likewise.
   * gcc/testsuite/gcc.target/arm/acle/stc.c: Likewise.
   * gcc/testsuite/gcc.target/arm/acle/stc2.c: Likewise.
   * gcc/testsuite/gcc.target/arm/acle/stc2l.c: Likewise.
   * gcc/testsuite/gcc.target/arm/acle/stcl.c: Likewise.

Testing done: ACLE regression tests updated with tests for feature macro bits.
All regression tests pass.

Okay for trunk?


0001-Implement-__ARM_FEATURE_COPROC-coprocessor-intrinsic.patch


 From 79d71aec9d2bdee936b240ae49368ff5f8d8fc48 Mon Sep 17 00:00:00 2001
From: Prakhar Bahuguna 
Date: Tue, 2 May 2017 13:43:40 +0100
Subject: [PATCH] Implement __ARM_FEATURE_COPROC coprocessor intrinsic feature
  macro

---
  gcc/config/arm/arm-c.c| 19 +++
  gcc/testsuite/gcc.target/arm/acle/cdp.c   |  3 +++
  gcc/testsuite/gcc.target/arm/acle/cdp2.c  |  3 +++
  gcc/testsuite/gcc.target/arm/acle/ldc.c   |  3 +++
  gcc/testsuite/gcc.target/arm/acle/ldc2.c  |  3 +++
  gcc/testsuite/gcc.target/arm/acle/ldc2l.c |  3 +++
  gcc/testsuite/gcc.target/arm/acle/ldcl.c  |  3 +++
  gcc/testsuite/gcc.target/arm/acle/mcr.c   |  3 +++
  gcc/testsuite/gcc.target/arm/acle/mcr2.c  |  3 +++
  gcc/testsuite/gcc.target/arm/acle/mcrr.c  |  3 +++
  gcc/testsuite/gcc.target/arm/acle/mcrr2.c |  3 +++
  gcc/testsuite/gcc.target/arm/acle/mrc.c   |  3 +++
  gcc/testsuite/gcc.target/arm/acle/mrc2.c  |  3 +++
  gcc/testsuite/gcc.target/arm/acle/mrrc.c  |  3 +++
  gcc/testsuite/gcc.target/arm/acle/mrrc2.c |  3 +++
  gcc/testsuite/gcc.target/arm/acle/stc.c   |  3 +++
  gcc/testsuite/gcc.target/arm/acle/stc2.c  |  3 +++
  gcc/testsuite/gcc.target/arm/acle/stc2l.c |  3 +++
  gcc/testsuite/gcc.target/arm/acle/stcl.c  |  3 +++
  19 files changed, 73 insertions(+)

diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c
index 3abe7d1f1f5..3daf4e5e1f3 100644
--- a/gcc/config/arm/arm-c.c
+++ b/gcc/config/arm/arm-c.c
@@ -200,6 +200,25 @@ arm_cpu_builtins (struct cpp_reader* pfile)
def_or_undef_macro (pfile, "__ARM_FEATURE_IDIV", TARGET_IDIV);

def_or_undef_macro (pfile, "__ARM_ASM_SYNTAX_UNIFIED__", 
inline_asm_unified);
+
+  if ((!TARGET_THUMB || TARGET_THUMB2) && arm_arch4 &&


(!TARGET_THU

Re: [PATCH, ARM] Implement __ARM_FEATURE_COPROC coprocessor intrinsic feature macro

2017-06-26 Thread Christophe Lyon
On 26 June 2017 at 16:09, Thomas Preudhomme
 wrote:
> Hi Christophe,
>
>
> On 21/06/17 17:57, Christophe Lyon wrote:
>>
>> Hi,
>>
>>
>> On 19 June 2017 at 11:32, Richard Earnshaw (lists)
>>  wrote:
>>>
>>> On 16/06/17 15:56, Prakhar Bahuguna wrote:

 On 16/06/2017 15:37:18, Richard Earnshaw (lists) wrote:
>
> On 16/06/17 08:48, Prakhar Bahuguna wrote:
>>
>> On 15/06/2017 17:23:43, Richard Earnshaw (lists) wrote:
>>>
>>> On 14/06/17 10:35, Prakhar Bahuguna wrote:


[wwwdocs] Add -Wmultistatement-macros

2017-06-26 Thread Marek Polacek
I've committed the following to reflect the recent addition.

Index: changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-8/changes.html,v
retrieving revision 1.5
diff -u -r1.5 changes.html
--- changes.html20 Jun 2017 10:19:11 -  1.5
+++ changes.html26 Jun 2017 14:53:28 -
@@ -50,6 +50,14 @@
 
 C family
 
+New command-line options have been added for the C and C++ compilers:
+  
+   -Wmultistatement-macros warns about unsafe macros
+   expanding to multiple statements used as a body of a clause such
+   as if, else, while,
+   switch, or for.
+  
+
 -fno-strict-overflow is now mapped to
  -fwrapv and signed integer overflow is now undefined by
  default at all optimization levels.  Using

Marek
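
For readers who have not met this class of bug, here is a small C sketch of the kind of code the changes.html entry above describes. The macro and function names are invented for this example and do not come from the GCC testsuite.

```c
#include <assert.h>

/* Unsafe: expands to two statements, so only the first one is guarded
   when the macro is used as the body of an if without braces.  */
#define RESET_UNSAFE(a, b) (a) = 0; (b) = 0

/* Safe: the usual do { ... } while (0) wrapper makes it one statement.  */
#define RESET_SAFE(a, b) do { (a) = 0; (b) = 0; } while (0)

int
reset_unsafe_demo (int cond)
{
  int x = 1, y = 1;
  if (cond)
    RESET_UNSAFE (x, y);	/* (y) = 0 runs even when COND is false */
  return x + y;
}

int
reset_safe_demo (int cond)
{
  int x = 1, y = 1;
  if (cond)
    RESET_SAFE (x, y);		/* both assignments are guarded */
  return x + y;
}
```

With a false condition the unsafe variant still clears y, which is the surprise the new warning is meant to point out.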


Re: [PATCH] fold a * (a > 0 ? 1 : -1) to abs(a) and related optimizations

2017-06-26 Thread Joseph Myers
On Sat, 24 Jun 2017, Marc Glisse wrote:

> * if X is NaN, we may get a qNaN with the wrong sign bit. We probably don't
> care much though...

The sign bit from a multiplication involving a NaN is not specified.  
*But* making any of these transformations with a qNaN loses the "invalid" 
exception from an ordered comparison involving a qNaN, so isn't valid in 
the case of (qNaNs respected and trapping-math).
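
A small C sketch of the two forms being compared, for reference. The function names are made up for this example. It only checks values; the exception side can be observed separately with fetestexcept (FE_INVALID) from <fenv.h>, but whether the flag is actually raised depends on the target and on the compiler options in use, so that part is left out here.

```c
#include <assert.h>
#include <math.h>

/* The original expression: contains an ordered comparison, which is
   specified to raise "invalid" when A is a quiet NaN.  */
double
mul_by_sign (double a)
{
  return a * (a > 0 ? 1 : -1);
}

/* The proposed replacement: no comparison, so no "invalid" exception.  */
double
via_fabs (double a)
{
  return fabs (a);
}

/* For any non-NaN input the two forms compare equal.  */
void
check_same_value (double a)
{
  assert (!isnan (a));
  assert (mul_by_sign (a) == via_fabs (a));
}
```

For a NaN input both forms return a NaN, so the observable difference is limited to the sign bit and the exception flag discussed above.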

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: C/C++ PATCH to add __typeof_noqual (PR c/65455, c/39985)

2017-06-26 Thread Marek Polacek
On Fri, Jun 23, 2017 at 04:27:47PM +, Joseph Myers wrote:
> On Fri, 23 Jun 2017, Marek Polacek wrote:
> 
> > You'll also see that I dropped all qualifiers for __auto_type.  But I
> > actually couldn't trigger the
> > init_type = c_build_qualified_type (init_type, TYPE_UNQUALIFIED);
> > line in c_parser_declaration_or_fndef (even when running the whole
> > testsuite) so I'm not convinced it makes any difference.
> 
> It looks like it would only make a difference, in the present code, for 
> the case of an atomic register variable, or bit-field in an atomic 
> structure, as the initializer.  Those are the cases where 

Ah, right.  But since __auto_type doesn't work with bit-fields, I only
tested the register variant.

> convert_lvalue_to_rvalue would not return a non-atomic result, given an 
> atomic argument.  With the proposed change, it should apply to any 
> qualified lvalue used as the initializer.
 
Right.

> > @@ -506,6 +508,7 @@ const struct c_common_resword c_common_reswords[] =
> >{ "typename",RID_TYPENAME,   D_CXXONLY | D_CXXWARN },
> >{ "typeid",  RID_TYPEID, D_CXXONLY | D_CXXWARN },
> >{ "typeof",  RID_TYPEOF, D_ASM | D_EXT },
> > +  { "typeof_noqual",   RID_TYPEOF_NOQUAL, D_ASM | D_EXT },
> >{ "union",   RID_UNION,  0 },
> >{ "unsigned",RID_UNSIGNED,   0 },
> >{ "using",   RID_USING,  D_CXXONLY | D_CXXWARN },
> 
> I don't think we should have this keyword variant.

Ok, dropped.

> I think there should be tests of the change to __auto_type.

I've added one.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2017-06-26  Marek Polacek  
Richard Henderson  

PR c/65455
PR c/39985
* c-common.c (c_common_reswords): Add __typeof_noqual and
__typeof_noqual__.
(keyword_begins_type_specifier): Handle RID_TYPEOF_NOQUAL.
* c-common.h (enum rid): Add RID_TYPEOF_NOQUAL.

* c-parser.c (c_keyword_starts_typename): Handle RID_TYPEOF_NOQUAL.
(c_token_starts_declspecs): Likewise.
(c_parser_declaration_or_fndef): Always strip all qualifiers for
__auto_type.
(c_parser_declspecs): Handle RID_TYPEOF_NOQUAL.
(c_parser_typeof_specifier): Handle RID_TYPEOF_NOQUAL by dropping
all the qualifiers.
(c_parser_objc_selector): Handle RID_TYPEOF_NOQUAL.

* parser.c (cp_keyword_starts_decl_specifier_p): Handle 
RID_TYPEOF_NOQUAL.
(cp_parser_simple_type_specifier): Handle RID_TYPEOF_NOQUAL by dropping
all the qualifiers.

* doc/extend.texi: Document __typeof_noqual.

* c-c++-common/typeof-noqual-1.c: New test.
* c-c++-common/typeof-noqual-2.c: New test.
* gcc.dg/typeof-noqual-1.c: New test.
* gcc.dg/auto-type-3.c: New test.

diff --git gcc/c-family/c-common.c gcc/c-family/c-common.c
index f6a9d05..7993de2 100644
--- gcc/c-family/c-common.c
+++ gcc/c-family/c-common.c
@@ -433,6 +433,8 @@ const struct c_common_resword c_common_reswords[] =
   { "__transaction_cancel", RID_TRANSACTION_CANCEL, 0 },
   { "__typeof",RID_TYPEOF, 0 },
   { "__typeof__",  RID_TYPEOF, 0 },
+  { "__typeof_noqual", RID_TYPEOF_NOQUAL, 0 },
+  { "__typeof_noqual__", RID_TYPEOF_NOQUAL, 0 },
   { "__underlying_type", RID_UNDERLYING_TYPE, D_CXXONLY },
   { "__volatile",  RID_VOLATILE,   0 },
   { "__volatile__",RID_VOLATILE,   0 },
@@ -7511,6 +7513,7 @@ keyword_begins_type_specifier (enum rid keyword)
 case RID_SAT:
 case RID_COMPLEX:
 case RID_TYPEOF:
+case RID_TYPEOF_NOQUAL:
 case RID_STRUCT:
 case RID_CLASS:
 case RID_UNION:
diff --git gcc/c-family/c-common.h gcc/c-family/c-common.h
index f3d051a..ad00ae8 100644
--- gcc/c-family/c-common.h
+++ gcc/c-family/c-common.h
@@ -100,8 +100,9 @@ enum rid
   /* C extensions */
   RID_ASM,   RID_TYPEOF,   RID_ALIGNOF,  RID_ATTRIBUTE,  RID_VA_ARG,
   RID_EXTENSION, RID_IMAGPART, RID_REALPART, RID_LABEL,  RID_CHOOSE_EXPR,
-  RID_TYPES_COMPATIBLE_P,  RID_BUILTIN_COMPLEX, RID_BUILTIN_SHUFFLE,
-  RID_DFLOAT32, RID_DFLOAT64, RID_DFLOAT128,
+  RID_TYPES_COMPATIBLE_P,  RID_BUILTIN_COMPLEX,
+  RID_BUILTIN_SHUFFLE, RID_DFLOAT32, RID_DFLOAT64,  RID_DFLOAT128,
+  RID_TYPEOF_NOQUAL,
 
   /* TS 18661-3 keywords, in the same sequence as the TI_* values.  */
   RID_FLOAT16,
diff --git gcc/c/c-parser.c gcc/c/c-parser.c
index f8fbc92..eb6cfad 100644
--- gcc/c/c-parser.c
+++ gcc/c/c-parser.c
@@ -495,6 +495,7 @@ c_keyword_starts_typename (enum rid keyword)
 case RID_STRUCT:
 case RID_UNION:
 case RID_TYPEOF:
+case RID_TYPEOF_NOQUAL:
 case RID_CONST:
 case RID_ATOMIC:
 case RID_VOLATILE:
@@ -671,6 +672,7 @@ c_token_starts_declspecs (c_token *token)
case RID_STRUCT:
case RID_UNION:
case RID_TYPEOF:
+   case RID_TYPEOF_NOQUAL:
case RID_CONST:
case RID_VOLATILE:
   

builtin_define _VX_TOOL and _VX_TOOL_FAMILY for VxWorks

2017-06-26 Thread Olivier Hainque
Hello,

Defining at least one of the two is needed on VxWorks 7 and helpful
in some cases on VxWorks 6.

We have been using this in house for all our ports for a while, and I verified
that it allows a toolchain + libgcc build for x86_64-vxworks7 to finish with
mainline, in association with further patches to come for the general support
of vxworks 7 and the x86_64 configuration in particular.

Committing to mainline.

2017-06-26  Jerome Lambourg  

* config/vxworks.h (VXWORKS_OS_CPP_BUILTINS): builtin_define
_VX_TOOL_FAMILY and _VX_TOOL to gnu. 

With Kind Regards,

Olivier



0001-improve-automatic-defines-on-VxWorks.patch
Description: Binary data


Re: [PATCH] Fold (a > 0 ? 1.0 : -1.0) into copysign (1.0, a) and a * copysign (1.0, a) into abs(a)

2017-06-26 Thread Joseph Myers
On Mon, 26 Jun 2017, Richard Sandiford wrote:

> > Non-generic builtins like copysign are such a pain... We also end up 
> > missing the 128-bit case that way (pre-existing problem, not your patch). 
> > We seem to have a corresponding internal function, but apparently it is 
> > not used until expansion (well, maybe during vectorization).
> 
> It should be OK to introduce uses of the internal functions whenever
> it's useful.  The match code will check that the internal function is
> implemented before allowing the transformation.

How well would internal functions work with some having built-in functions 
only for float, double and long double, others (like copysign) having them 
for all the _FloatN and _FloatNx types?

(Preferably of course the built-in functions for libm functions would 
generally exist for all the types.  I didn't include that in my patches 
adding _FloatN/_FloatNx support 
 
 and noted 
various issues to watch out for there, especially increasing the size of 
the enum of built-in functions and the startup cost of initializing them.  
There are other optimizations with similar issues of only covering float, 
double and long double.)

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] lto-wrapper.c (copy_file): Fix resource leaks

2017-06-26 Thread Jeff Law
On 06/26/2017 07:58 AM, Jakub Jelinek wrote:
> On Mon, May 15, 2017 at 03:58:29PM -0600, Jeff Law wrote:
>> On 05/14/2017 04:00 AM, Sylvestre Ledru wrote:
>>> Add missing fclose
>>> CID 1407987, 1407986
>>>
>>> S
>>>
>>>
>>>
>>> 0005-2017-05-14-Sylvestre-Ledru-sylvestre-debian.org.patch
>>>
>>>
>>>  From d255827a64012fb81937d6baa8534eabecf9b735 Mon Sep 17 00:00:00 2001
>>> From: Sylvestre Ledru
>>> Date: Sun, 14 May 2017 11:37:37 +0200
>>> Subject: [PATCH 5/5] 2017-05-14  Sylvestre Ledru
>>>
>>> * lto-wrapper.c (copy_file): Fix resource leaks
>>>CID 1407987, 1407986
>> Doesn't this still leak in the cases where we call fatal_error?
> 
> fatal_error is a noreturn function, why should we bother to do any cleanups
> after it?  All that code is going to be optimized away anyway.
But cleaning this kind of thing up does help static analyzers and such.
 ISTM that we'd need a compelling reason _not_ to accept this kind of patch.

jeff


Re: [PATCH] lto-wrapper.c (copy_file): Fix resource leaks

2017-06-26 Thread Jakub Jelinek
On Mon, Jun 26, 2017 at 09:22:31AM -0600, Jeff Law wrote:
> >>>  From d255827a64012fb81937d6baa8534eabecf9b735 Mon Sep 17 00:00:00 2001
> >>> From: Sylvestre Ledru
> >>> Date: Sun, 14 May 2017 11:37:37 +0200
> >>> Subject: [PATCH 5/5] 2017-05-14  Sylvestre Ledru
> >>>
> >>>   * lto-wrapper.c (copy_file): Fix resource leaks
> >>>CID 1407987, 1407986
> >> Doesn't this still leak in the cases where we call fatal_error?
> > 
> > fatal_error is a noreturn function, why should we bother to do any cleanups
> > after it?  All that code is going to be optimized away anyway.
> But cleaning this kind of thing up does help static analyzers and such.
>  ISTM that we'd need a compelling reason _not_ to accept this kind of patch.

Are the static analyzers so dumb as to report something like that?

Unless we have proof that they are, I think the original short patch is
the way to go, rather than the much more complicated later patch.

Jakub


Re: [PATCH, 2/4] Handle GOMP_OPENACC_NVPTX_{DISASM,SAVE_TEMPS} in libgomp nvptx plugin

2017-06-26 Thread Joseph Myers
On Mon, 26 Jun 2017, Tom de Vries wrote:

> > 2. Handle GOMP_OPENACC_NVPTX_{DISASM,SAVE_TEMPS} in libgomp nvptx plugin
> 
> This patch adds handling of:
> - GOMP_OPENACC_NVPTX_SAVE_TEMPS=[01], and
> - GOMP_OPENACC_NVPTX_DISASM=[01]
> 
> The filename used for dumping the module is plugin-nvptx..cubin.

Are you sure this use of getenv and writing to that file is safe for 
setuid/setgid programs?  I'd expect you to need to use secure_getenv as in 
plugin-hsa.c; certainly for anything that could result in writes to a 
file like that.
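
To spell out the concern, here is a portable sketch of the idea behind secure_getenv. It is only an approximation written for this example: the names safe_getenv and env_flag_enabled are hypothetical, and glibc's real secure_getenv bases its decision on more than the ID comparison shown here.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdlib.h>
#include <unistd.h>

/* Approximation of secure_getenv: ignore the environment when the real
   and effective IDs differ, as they do in a setuid/setgid program.  */
const char *
safe_getenv (const char *name)
{
  assert (name != NULL);
  if (getuid () != geteuid () || getgid () != getegid ())
    return NULL;
  return getenv (name);
}

/* Parse a [01] switch such as the ones proposed in the patch.  Anything
   other than exactly "1" counts as disabled.  */
int
env_flag_enabled (const char *name)
{
  const char *v = safe_getenv (name);
  return v != NULL && v[0] == '1' && v[1] == '\0';
}
```

In the plugin itself the real secure_getenv would be the natural choice, as plugin-hsa.c already does.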

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH, 2/4] Handle GOMP_OPENACC_NVPTX_{DISASM,SAVE_TEMPS} in libgomp nvptx plugin

2017-06-26 Thread Jakub Jelinek
On Mon, Jun 26, 2017 at 03:26:57PM +, Joseph Myers wrote:
> On Mon, 26 Jun 2017, Tom de Vries wrote:
> 
> > > 2. Handle GOMP_OPENACC_NVPTX_{DISASM,SAVE_TEMPS} in libgomp nvptx plugin
> > 
> > This patch adds handling of:
> > - GOMP_OPENACC_NVPTX_SAVE_TEMPS=[01], and
> > - GOMP_OPENACC_NVPTX_DISASM=[01]
> > 
> > The filename used for dumping the module is plugin-nvptx..cubin.
> 
> Are you sure this use of getenv and writing to that file is safe for 
> setuid/setgid programs?  I'd expect you to need to use secure_getenv as in 
> plugin-hsa.c; certainly for anything that could result in writes to a 
> file like that.

Yeah, definitely it should be using secure_getenv/__secure_getenv.
And IMNSHO GOMP_DEBUG too.

Jakub


common grounds for VxWorks 7 support

2017-06-26 Thread Olivier Hainque
Hello,

This patch introduces common grounds for VxWorks 7 support.

The main device is the introduction of a TARGET_VXWORKS7 macro which we
leverage throughout the various vxworks.h / vxworks.c files, common or cpu
specific.

We have done several CPU specific ports already, and this scheme works pretty
well.

There's quite a bit of common material between Vx7 and the previous versions.

For the distinctions we need to make, the use of preprocessing doesn't impair
readability IMO.  On the contrary, the usual alternative with extra target
files rapidly becomes subtle to get right: getting the proper inclusion
ordering while preventing code duplication is not always easy and may make
maintenance harder or more error-prone.

Nathan, does this work for you?

If OK on principle, I'm not sure who is to approve the tiny config.gcc part.

Thanks in advance for your feedback,

With Kind Regards,

Olivier

2017-06-26  Jerome Lambourg  
Olivier Hainque  

* config.gcc (tm_defines for VxWorks): Define TARGET_VXWORKS7 for
all vxworks7 targets.
* config/vxworks.h (TARGET_VXWORKS7): If not defined, define to 0.
(VXWORKS_ADDITIONAL_CPP_SPEC): Alternative definition for VXWORKS7.
(VXWORKS_LIBS_RTP, VXWORKS_LIBS_RTP_DIR): New macros, allowing
variations for VX6/VX7 and 32/64bits later on in ...
(VXWORKS_LIB_SPEC): Leverage new macros.
(VXWORKS_OS_CPP_BUILTINS): Define _VSB_CONFIG_FILE for VXWORKS7,
as well as _ALLOW_KEYWORD_MACROS when "inline" is not a keyword.



0002-common-vxworks7-support.patch
Description: Binary data


Re: common grounds for VxWorks 7 support

2017-06-26 Thread Nathan Sidwell

On 06/26/2017 11:38 AM, Olivier Hainque wrote:


> Nathan does this work for you ?

certainly,

> If OK on principle, I'm not sure who is to approve the tiny config.gcc part.

You :) It's in a vxworks-specific fragment.  I think the ChangeLog
format for that kind of thing is:

   * config.gcc (triplet-glob): ...

--
Nathan Sidwell


Re: [PATCH, ARM] Implement __ARM_FEATURE_COPROC coprocessor intrinsic feature macro

2017-06-26 Thread Thomas Preudhomme



On 26/06/17 15:16, Christophe Lyon wrote:

On 26 June 2017 at 16:09, Thomas Preudhomme
 wrote:

Hi Christophe,


On 21/06/17 17:57, Christophe Lyon wrote:


Hi,


On 19 June 2017 at 11:32, Richard Earnshaw (lists)
 wrote:


On 16/06/17 15:56, Prakhar Bahuguna wrote:


On 16/06/2017 15:37:18, Richard Earnshaw (lists) wrote:


On 16/06/17 08:48, Prakhar Bahuguna wrote:


On 15/06/2017 17:23:43, Richard Earnshaw (lists) wrote:


On 14/06/17 10:35, Prakhar Bahuguna wrote:



Re: common grounds for VxWorks 7 support

2017-06-26 Thread Olivier Hainque

> On Jun 26, 2017, at 17:56 , Nathan Sidwell  wrote:
> 
> On 06/26/2017 11:38 AM, Olivier Hainque wrote:
> 
>> Nathan does this work for you ?
> 
> certainly,

Great!

>> If OK on principle, I'm not sure who is to approve the tiny config.gcc part.
> 
> You :) (it's in a vxworks-specific fragment,

Oh, nice :)

> I think the Changelog format for that kind of thing is:
>   * config.gcc (triplet-glob): ...

I see, will adjust.

Thanks for your prompt feedback!

More patches to come :)

Olivier



Re: C/C++ PATCH to add __typeof_noqual (PR c/65455, c/39985)

2017-06-26 Thread Martin Sebor

On 06/23/2017 08:46 AM, Marek Polacek wrote:

This patch adds a variant of __typeof, called __typeof_noqual.  As the name
suggests, this variant always drops all qualifiers, not just when the type
is atomic.  This was discussed several times in the past, see e.g.

or

It's been brought to my attention again here:


One approach would be to just modify the current __typeof, but that could
cause some incompatibilities, I'm afraid.  This is based on rth's earlier
patch:  but I
didn't do the address space-stripping variant __typeof_noas.  I also added
a couple of missing things.


I haven't reviewed all the discussions super carefully so I wonder
what alternatives have been considered.  For instance, it seems to
me that it should be possible to emulate __typeof_noqual__ by relying
on the atomic built-ins' type-genericity.  E.g., like this:

  #define __typeof_noqual__(x) \
__typeof__ (__atomic_load_n ((__typeof__ (x)*)0, 0))

Alternatively, adding support for lower-level C-only primitives like
__remove_const and __remove_volatile, to parallel the C++ library
traits, might provide a more general solution and avoid introducing
yet another mechanism for determining the type of an expression to
the languages (C++ already has a few).
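
For comparison, here is a minimal sketch of how the C++ library traits mentioned above give the same effect.  The macro name TYPEOF_NOQUAL and the function max_of are hypothetical, chosen only for illustration, and the sketch assumes C++14 or later; it is not part of the proposed patch.

```cpp
#include <type_traits>

// Hypothetical helper: the type of an expression with any reference and
// top-level const/volatile removed, built from the standard traits.
#define TYPEOF_NOQUAL(x) \
  std::remove_cv_t<std::remove_reference_t<decltype(x)>>

// Both parameters are const, yet the local copy must be assignable.
constexpr int max_of(const int x, const int y)
{
  TYPEOF_NOQUAL(x) ret = x;   // plain int, not const int
  if (y > ret)
    ret = y;
  return ret;
}
```

With a plain decltype(x) the declaration of ret would be const int and the assignment would be rejected, which is the situation the MAX example in the documentation patch is about.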


+@code{typeof_noqual} behaves the same except that it strips type qualifiers
+such as @code{const} and @code{volatile}, if given an expression.  This can
+be useful for certain macros when passed const arguments:
+
+@smallexample
+#define MAX(__x, __y)  \
+  (@{  \
+  __typeof_noqual(__x) __ret = __x;\
+  if (__y > __ret) __ret = __y; \
+__ret; \
+  @})


The example should probably avoid using reserved names (with
leading/double underscores).

Martin



[C++ PATCH] identifier flags

2017-06-26 Thread Nathan Sidwell
This patch continues my changes to the identifier node flags.  It makes 
use of the new enumeration to reimplement some of the accessors and checks.


We don't have to go checking for the various special identifiers 
explicitly, just test various bits in the identifier node.
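
As a rough illustration of the idea (not GCC's actual tree layout), the sketch below models an identifier that carries a small kind field, so a predicate tests bits instead of comparing against a list of well-known identifiers.  The names IdKind, Identifier, is_ctor, is_dtor and is_cdtor are hypothetical stand-ins for the IDENTIFIER_*_P macros.

```cpp
// Hypothetical model: flag bits recorded on the identifier itself.
enum IdKind : unsigned {
  ID_NORMAL = 0,
  ID_CTOR   = 1u << 0,   // some flavour of constructor name
  ID_DTOR   = 1u << 1,   // some flavour of destructor name
  ID_ASSIGN = 1u << 2    // an assignment operator name
};

struct Identifier {
  const char *name;
  unsigned kind;         // bitwise OR of IdKind values
};

// Each predicate is a single mask test, whatever the number of
// special identifiers that share the bit.
constexpr bool is_ctor(const Identifier &id)
{ return (id.kind & ID_CTOR) != 0; }

constexpr bool is_dtor(const Identifier &id)
{ return (id.kind & ID_DTOR) != 0; }

constexpr bool is_cdtor(const Identifier &id)
{ return (id.kind & (ID_CTOR | ID_DTOR)) != 0; }
```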


One bit in the lang_decl_fn struct goes away.

nathan
--
Nathan Sidwell
2017-06-26  Nathan Sidwell  

	gcc/cp/
	* cp-tree.h (lang_decl_fn): Remove assignment_operator_p field.
	(DECL_COMPLETE_CONSTRUCTOR_P): Directly compare
	identifier.
	(DECL_BASE_CONSTRUCTOR_P, DECL_COMPLETE_DESTRUCTOR_P,
	DECL_BASE_DESTRUCTOR_P, DECL_DELETING_DESTRUCTOR_P): Likewise.
	(DECL_ASSIGNMENT_OPERATOR_P): Use IDENTIFIER_ASSIGN_OP_P.
	* decl.c (grok_op_properties): Adjust identifier checking.
	* init.c (expand_default_init): Adjust identifier decision.
	* method.c (implicitly_declare_fn): Don't use
	DECL_ASSIGNMENT_OPERATOR_P.
	* search.c (lookup_fnfields_1): Use IDENTIFIER_CTOR_P,
	IDENTIFIER_DTOR_P.
	* call.c (in_charge_arg_for_name): Reimplement.
	(build_special_member_call): Use IDENTIFIER_CDTOR_P,
	IDENTIFIER_DTOR_P.

	libcc1/
	* libcp1plugin.cc (plugin_build_decl): Don't set
	DECL_ASSIGNMENT_OPERATOR_P.

Index: gcc/cp/call.c
===
--- gcc/cp/call.c	(revision 249654)
+++ gcc/cp/call.c	(working copy)
@@ -8677,20 +8677,22 @@ build_cxx_call (tree fn, int nargs, tree
 tree
 in_charge_arg_for_name (tree name)
 {
- if (name == base_ctor_identifier
-  || name == base_dtor_identifier)
-return integer_zero_node;
-  else if (name == complete_ctor_identifier)
-return integer_one_node;
-  else if (name == complete_dtor_identifier)
-return integer_two_node;
-  else if (name == deleting_dtor_identifier)
-return integer_three_node;
-
-  /* This function should only be called with one of the names listed
- above.  */
-  gcc_unreachable ();
-  return NULL_TREE;
+  if (IDENTIFIER_CTOR_P (name))
+{
+  if (name == complete_ctor_identifier)
+	return integer_one_node;
+  gcc_checking_assert (name == base_ctor_identifier);
+}
+  else
+{
+  if (name == complete_dtor_identifier)
+	return integer_two_node;
+  else if (name == deleting_dtor_identifier)
+	return integer_three_node;
+  gcc_checking_assert (name == base_dtor_identifier);
+}
+
+  return integer_zero_node;
 }
 
 /* We've built up a constructor call RET.  Complain if it delegates to the
@@ -8729,11 +8731,7 @@ build_special_member_call (tree instance
   vec *allocated = NULL;
   tree ret;
 
-  gcc_assert (name == complete_ctor_identifier
-	  || name == base_ctor_identifier
-	  || name == complete_dtor_identifier
-	  || name == base_dtor_identifier
-	  || name == deleting_dtor_identifier
+  gcc_assert (IDENTIFIER_CDTOR_P (name)
 	  || name == cp_assignment_operator_id (NOP_EXPR));
   if (TYPE_P (binfo))
 {
@@ -8753,9 +8751,7 @@ build_special_member_call (tree instance
 instance = build_dummy_object (class_type);
   else
 {
-  if (name == complete_dtor_identifier
-	  || name == base_dtor_identifier
-	  || name == deleting_dtor_identifier)
+  if (IDENTIFIER_DTOR_P (name))
 	gcc_assert (args == NULL || vec_safe_is_empty (*args));
 
   /* Convert to the base class, if necessary.  */
Index: gcc/cp/cp-tree.h
===
--- gcc/cp/cp-tree.h	(revision 249654)
+++ gcc/cp/cp-tree.h	(working copy)
@@ -1776,6 +1776,7 @@ struct GTY(()) language_function {
   (operator_name_info[(int) (CODE)].identifier)
 #define cp_assignment_operator_id(CODE) \
   (assignment_operator_name_info[(int) (CODE)].identifier)
+
 /* In parser.c.  */
 extern tree cp_literal_operator_id (const char *);
 
@@ -2495,25 +2496,27 @@ struct GTY(()) lang_decl_fn {
   struct lang_decl_min min;
 
   /* In an overloaded operator, this is the value of
- DECL_OVERLOADED_OPERATOR_P.  */
+ DECL_OVERLOADED_OPERATOR_P.
+ FIXME: We should really do better in compressing this.  */
   ENUM_BITFIELD (tree_code) operator_code : 16;
 
   unsigned global_ctor_p : 1;
   unsigned global_dtor_p : 1;
-  unsigned assignment_operator_p : 1;
   unsigned static_function : 1;
   unsigned pure_virtual : 1;
   unsigned defaulted_p : 1;
   unsigned has_in_charge_parm_p : 1;
   unsigned has_vtt_parm_p : 1;
-  
   unsigned pending_inline_p : 1;
+
   unsigned nonconverting : 1;
   unsigned thunk_p : 1;
   unsigned this_thunk_p : 1;
   unsigned hidden_friend_p : 1;
   unsigned omp_declare_reduction_p : 1;
-  /* 2 spare bits on 32-bit hosts, 34 on 64-bit hosts.  */
+  /* 3 spare bits.  */
+
+  /* 32-bits padding on 64-bit host.  */
 
   /* For a non-thunk function decl, this is a tree list of
  friendly classes. For a thunk function decl, it is the
@@ -2694,14 +2697,12 @@ struct GTY(()) lang_decl {
 /* Nonzero if NODE (a FUNCTION_DECL) is a constructor for a complete
object.  */
 #define DECL_COMPLETE_CONSTRUCTOR_P(NODE)		\
-  (DECL_CONSTRUCTOR_P (NODE)		

Re: common grounds for VxWorks 7 support

2017-06-26 Thread Jeff Law
On 06/26/2017 09:38 AM, Olivier Hainque wrote:
> Hello,
> 
> This patch introduces common grounds for VxWorks 7 support.
> 
> The main device is the introduction of a TARGET_VXWORKS7 macro which we
> leverage throughout the various vxworks.h / vxworks.c files, common or cpu
> specific.
> 
> We have done several CPU specific ports already, and this scheme works pretty
> well.
> 
> There's quite a bit of common material between Vx7 and the previous versions.
> 
> For the distinctions we need to make, the use of preprocessing doesn't impair
> readability IMO, on the contrary, while the usual alternative with extra target
> files rapidly becomes subtle to handle right (getting the proper inclusion
> ordering while preventing code duplication is not always easy and may make
> maintenance harder or more error-prone).
> 
> Nathan does this work for you ?
> 
> If OK on principle, I'm not sure who is to approve the tiny config.gcc part.
> 
> Thanks in advance for your feedback,
> 
> With Kind Regards,
> 
> Olivier
> 
> 2017-06-26  Jerome Lambourg  
> Olivier Hainque  
> 
> * config.gcc (tm_defines for VxWorks): Define TARGET_VXWORKS7 for
> all vxworks7 targets.
> * config/vxworks.h (TARGET_VXWORKS7): If not defined, define to 0.
> (VXWORKS_ADDITIONAL_CPP_SPEC): Alternative definition for VXWORKS7.
> (VXWORKS_LIBS_RTP, VXWORKS_LIBS_RTP_DIR): New macros, allowing
> variations for VX6/VX7 and 32/64bits later on in ...
> (VXWORKS_LIB_SPEC): Leverage new macros.
> (VXWORKS_OS_CPP_BUILTINS): Define _VSB_CONFIG_FILE for VXWORKS7,
> as well as _ALLOW_KEYWORD_MACROS when "inline" is not a keyword.
> 
I'd think the tiny config.gcc bits would fall under the VxWorks umbrella
and you can self-approve.
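
For readers skimming the thread, the scheme in the quoted ChangeLog can be sketched roughly as follows.  Only TARGET_VXWORKS7 comes from the patch; the EXAMPLE_CPP_SPEC strings and the two helper functions are hypothetical and stand in for the real spec macros.

```cpp
// Default the macro to 0 so it is usable both in #if and in ordinary
// expressions, mirroring "If not defined, define to 0".
#ifndef TARGET_VXWORKS7
#define TARGET_VXWORKS7 0
#endif

// Hypothetical spec strings, for illustration only: one definition per
// OS generation, selected in a single shared header.
#if TARGET_VXWORKS7
#define EXAMPLE_CPP_SPEC "-D_VX7_EXAMPLE"
#else
#define EXAMPLE_CPP_SPEC "-D_VX6_EXAMPLE"
#endif

constexpr bool targets_vxworks7() { return TARGET_VXWORKS7 != 0; }
constexpr const char *example_cpp_spec() { return EXAMPLE_CPP_SPEC; }
```

The point of the design is that the common and Vx7-specific variants live side by side in one file rather than in separate target headers whose inclusion order has to be managed.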

jeff


Re: libgo patch committed: Fix ptrace implementation on MIPS

2017-06-26 Thread Ian Lance Taylor
On Sat, Jun 24, 2017 at 12:04 AM, Andreas Schwab  wrote:
> On Jun 23 2017, Ian Lance Taylor  wrote:
>
>> Andreas, can we avoid the problem for earlier glibc versions with a
>> patch like the appended?
>>
>> Ian
>>
>> diff --git a/libgo/sysinfo.c b/libgo/sysinfo.c
>> index a1afc7d1..80407443 100644
>> --- a/libgo/sysinfo.c
>> +++ b/libgo/sysinfo.c
>> @@ -38,7 +38,10 @@
>>  #if defined(HAVE_NETINET_IF_ETHER_H)
>>  #include 
>>  #endif
>> +/* Avoid https://sourceware.org/bugzilla/show_bug.cgi?id=762 .  */
>> +#define ia64_fpreg pt_ia64_fpreg
>>  #include 
>> +#undef ia64_fpreg
>
> That doesn't work, but this does:
>
> diff --git a/libgo/sysinfo.c b/libgo/sysinfo.c
> index a1afc7d119c..1ba27b1a093 100644
> --- a/libgo/sysinfo.c
> +++ b/libgo/sysinfo.c
> @@ -103,7 +103,12 @@
>  #include 
>  #endif
>  #if defined(HAVE_LINUX_PTRACE_H)
> +/* Avoid https://sourceware.org/bugzilla/show_bug.cgi?id=762 .  */
> +#define ia64_fpreg pt_ia64_fpreg
> +#define pt_all_user_regs pt_ia64_all_user_regs
>  #include 
> +#undef ia64_fpreg
> +#undef pt_all_user_regs
>  #endif
>  #if defined(HAVE_LINUX_RTNETLINK_H)
>  #include 


Thanks.

Committed to mainline.
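
The trick used in the patch above, renaming a clashing identifier only while a header is being read, can be sketched in a single translation unit.  This is a minimal illustration: the struct bodies are invented, and fpreg and pt_fpreg are hypothetical names that merely echo ia64_fpreg and pt_ia64_fpreg.

```cpp
#include <type_traits>

// Stands in for the first header, which defines the name first.
struct fpreg { double value; };

// Rename the clashing identifier only while the second "header" is read.
#define fpreg pt_fpreg
struct fpreg { long bits[2]; };   // stands in for the conflicting header
#undef fpreg

// After the #undef both definitions coexist under different names.
static_assert(!std::is_same<fpreg, pt_fpreg>::value,
	      "two distinct types");
```

The #undef matters: without it every later use of the original name would silently refer to the renamed type.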

Ian


Re: [PATCH, alpha, go]: Remove PtraceRegs definition to restore bootstrap

2017-06-26 Thread Ian Lance Taylor
On Mon, Jun 26, 2017 at 12:24 AM, Uros Bizjak  wrote:
>
> libgo is now able to automatically determine PtraceRegs. Attached
> patch removes duplicate manual definition from system dependent
> source.
>
> Bootstrapped and regression tested on alphaev68-linux-gnu.

Thanks.

Committed to mainline.

Ian


Re: [PATCH] Fold (a > 0 ? 1.0 : -1.0) into copysign (1.0, a) and a * copysign (1.0, a) into abs(a)

2017-06-26 Thread Richard Sandiford
Joseph Myers  writes:
> On Mon, 26 Jun 2017, Richard Sandiford wrote:
>> > Non-generic builtins like copysign are such a pain... We also end up 
>> > missing the 128-bit case that way (pre-existing problem, not your patch). 
>> > We seem to have a corresponding internal function, but apparently it is 
>> > not used until expansion (well, maybe during vectorization).
>> 
>> It should be OK to introduce uses of the internal functions whenever
>> it's useful.  The match code will check that the internal function is
>> implemented before allowing the transformation.
>
> How well would internal functions work with some having built-in functions 
> only for float, double and long double, others (like copysign) having them 
> for all the _FloatN and _FloatNx types?

I don't think the internal functions themselves should be affected,
since they rely on optabs rather than external functions.  They can
support any mode that the target can support, even if there's no
equivalent standard function.

It might be a good idea to extend gencfn-macros.c and co. to handle
built-in functions that operate on _Float types though.  That wouldn't
be needed for correctness, but would be useful if we want to have folds
that operate on all built-in copysign functions as well as the internal
copysign function.
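
For context, the two folds named in the subject line correspond to the identities sketched below.  This is only an illustration using the standard library's std::copysign, not the match.pd pattern itself, and the function names are hypothetical.  Note that the first pair disagrees at +0.0, so it is not an unconditional identity in strict IEEE terms.

```cpp
#include <cassert>
#include <cmath>

// Source form and folded form of the first transformation.
inline double sign_select(double a)   { return a > 0 ? 1.0 : -1.0; }
inline double sign_copysign(double a) { return std::copysign(1.0, a); }

// Source form of the second transformation; the folded form is fabs.
inline double abs_via_mul(double a)   { return a * std::copysign(1.0, a); }

inline void check_folds()
{
  const double samples[] = { -3.5, -1.0, 2.0, 1e300 };
  for (double a : samples)
    {
      assert(sign_select(a) == sign_copysign(a));
      assert(abs_via_mul(a) == std::fabs(a));
    }
  // At +0.0 the comparison is false, but the sign bit is clear.
  assert(sign_select(0.0) == -1.0 && sign_copysign(0.0) == 1.0);
}
```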

Thanks,
Richard


Re: [PATCH v2 9/13] D: D2 Testsuite Dejagnu files.

2017-06-26 Thread Mike Stump
On Jun 24, 2017, at 10:52 AM, Iain Buclaw  wrote:
> 
> On 28 May 2017 at 23:16, Iain Buclaw  wrote:
>> This patch adds D language support to the GCC testsuite.
>> 
>> As well as generating the DejaGNU options for compile and link tests,
>> handles the conversion from DMD-style compiler options to GDC.
>> 
>> ---
> 
> Added a few extra comments for procedures, altering dmd2dg to write
> out flags converted to dejagnu in-place, instead of on new lines.
> 
> In the other testsuite patch, added new tests to accompany fixes that
> have been made since the last patch.

Ok.


Re: [Patch testsuite]

2017-06-26 Thread Mike Stump
On Jun 26, 2017, at 2:34 AM, Rainer Orth  wrote:
> 
>> Is it OK to commit the following patch (darwin only)?
> 
> this patch needs a ChangeLog entry (and preferably a description of the
> problem you're fixing ;-)

Actually, the CL isn't required, testsuite is special that way.


Re: [Patch testsuite]

2017-06-26 Thread Mike Stump
On Jun 26, 2017, at 2:26 AM, Dominique d'Humières  wrote:
> 
> Is it OK to commit the following patch (darwin only)?

Ok.  As for [0-9a-f]*ing the numbers, at least one of the test cases should
retain the actual number check.  I'm fine with the rest being REs, if someone
wants to do that.

Re: [Patch testsuite]

2017-06-26 Thread Rainer Orth
Mike Stump  writes:

> On Jun 26, 2017, at 2:34 AM, Rainer Orth  
> wrote:
>> 
>>> Is it OK to commit the following patch (darwin only)?
>> 
>> this patch needs a ChangeLog entry (and preferably a description of the
>> problem you're fixing ;-)
>
> Actually, the CL isn't required, testsuite is special that way.

I believe it is, but some developers choose to ignore that requirement ;-)

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


C++ PATCH for c++/81215, deduction failure with variadic TTP

2017-06-26 Thread Jason Merrill
For the C++17 changes to handling of template template parameter
matching, I replaced a lot of the old code.  But it seems that this
piece is still necessary when we aren't in C++17 mode, to handle the
case where we are comparing C to set, with different
numbers of arguments.  This is handled in C++17 mode by
coerce_ttp_args_for_tta, but that has other effects that we want to
limit to C++17 mode, at least for now.
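
As background, the kind of deduction that unify_bound_ttp_args performs can be seen in the small sketch below.  It is deliberately simpler than the PR testcase: the argument template here takes exactly one parameter, so the mismatch in the number of arguments that this patch is about does not arise.  The names X, box and deduces_int are hypothetical.

```cpp
#include <type_traits>

template <typename T> struct X { };
template <typename T> struct box { };

// C is a variadic template template parameter bound to one argument V.
// Deducing from X<box<int>> yields C = box and V = int.
template <typename V, template <typename...> class C>
constexpr bool deduces_int(const X<C<V>>&)
{
  return std::is_same<V, int>::value;
}
```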

Tested x86_64-pc-linux-gnu, applying to trunk and 7.
commit 6de84fdaef34cd48649ed501958aa3c93b289f7e
Author: Jason Merrill 
Date:   Mon Jun 26 14:35:52 2017 -0400

PR c++/81215 - deduction failure with variadic TTP.

* pt.c (unify_bound_ttp_args): Restore old logic for C++14 and down.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 392fba0..43f9ca8 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -7170,26 +7170,68 @@ unify_bound_ttp_args (tree tparms, tree targs, tree parm, tree& arg,
   parmvec = expand_template_argument_pack (parmvec);
   argvec = expand_template_argument_pack (argvec);
 
-  tree nparmvec = parmvec;
   if (flag_new_ttp)
 {
   /* In keeping with P0522R0, adjust P's template arguments
 to apply to A's template; then flatten it again.  */
+  tree nparmvec = parmvec;
   nparmvec = coerce_ttp_args_for_tta (arg, parmvec, tf_none);
   nparmvec = expand_template_argument_pack (nparmvec);
+
+  if (unify (tparms, targs, nparmvec, argvec,
+UNIFY_ALLOW_NONE, explain_p))
+   return 1;
+
+  /* If the P0522 adjustment eliminated a pack expansion, deduce
+empty packs.  */
+  if (flag_new_ttp
+ && TREE_VEC_LENGTH (nparmvec) < TREE_VEC_LENGTH (parmvec)
+ && unify_pack_expansion (tparms, targs, parmvec, argvec,
+  DEDUCE_EXACT, /*sub*/true, explain_p))
+   return 1;
 }
+  else
+{
+  /* Deduce arguments T, i from TT or TT.
+We check each element of PARMVEC and ARGVEC individually
+rather than the whole TREE_VEC since they can have
+different number of elements, which is allowed under N2555.  */
+
+  int len = TREE_VEC_LENGTH (parmvec);
+
+  /* Check if the parameters end in a pack, making them
+variadic.  */
+  int parm_variadic_p = 0;
+  if (len > 0
+ && PACK_EXPANSION_P (TREE_VEC_ELT (parmvec, len - 1)))
+   parm_variadic_p = 1;
 
-  if (unify (tparms, targs, nparmvec, argvec,
-UNIFY_ALLOW_NONE, explain_p))
-return 1;
-
-  /* If the P0522 adjustment eliminated a pack expansion, deduce
- empty packs.  */
-  if (flag_new_ttp
-  && TREE_VEC_LENGTH (nparmvec) < TREE_VEC_LENGTH (parmvec)
-  && unify_pack_expansion (tparms, targs, parmvec, argvec,
-  DEDUCE_EXACT, /*sub*/true, explain_p))
-return 1;
+  for (int i = 0; i < len - parm_variadic_p; ++i)
+   /* If the template argument list of P contains a pack
+  expansion that is not the last template argument, the
+  entire template argument list is a non-deduced
+  context.  */
+   if (PACK_EXPANSION_P (TREE_VEC_ELT (parmvec, i)))
+ return unify_success (explain_p);
+
+  if (TREE_VEC_LENGTH (argvec) < len - parm_variadic_p)
+   return unify_too_few_arguments (explain_p,
+   TREE_VEC_LENGTH (argvec), len);
+
+  for (int i = 0; i < len - parm_variadic_p; ++i)
+   if (unify (tparms, targs,
+  TREE_VEC_ELT (parmvec, i),
+  TREE_VEC_ELT (argvec, i),
+  UNIFY_ALLOW_NONE, explain_p))
+ return 1;
+
+  if (parm_variadic_p
+ && unify_pack_expansion (tparms, targs,
+  parmvec, argvec,
+  DEDUCE_EXACT,
+  /*subr=*/true, explain_p))
+   return 1;
+}
 
   return 0;
 }
diff --git a/gcc/testsuite/g++.dg/cpp0x/variadic-ttp7.C b/gcc/testsuite/g++.dg/cpp0x/variadic-ttp7.C
new file mode 100644
index 000..0dbe904
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/variadic-ttp7.C
@@ -0,0 +1,16 @@
+// PR c++/81215
+// { dg-do compile { target c++11 } }
+
+template struct X { };
+template struct set { };
+
+template  class C>
+void bar (const X>&)
+{
+}
+
+void
+foo (X>& x)
+{
+  bar (x);
+}


Re: [AArch64] Improve HFA code generation

2017-06-26 Thread Richard Sandiford
James Greenhalgh  writes:
> Hi,
>
> For this code:
>
>   struct y {
> float x[4];
>   };
>
>   float
>   bar3 (struct y x)
>   {
> return x.x[3];
>   }
>
> GCC generates:
>
>   bar3:
>   fmov    x1, d2
>   mov     x0, 0
>   bfi     x0, x1, 0, 32
>   fmov    x1, d3
>   bfi     x0, x1, 32, 32
>   sbfx    x0, x0, 32, 32
>   fmov    s0, w0
>   ret
>
> If you can wrap your head around that, you'll spot that it could be
> simplified to:
>
>   bar3:
>   fmov    s0, s3
>   ret

I get the second version with current trunk, at -O and above.  Does this
only happen with some other modifications, or for certain subtargets?
Or maybe I'm testing the wrong thing.

> Looking at it, I think the issue is the mode that we assign to the
> PARALLEL we build for an HFA in registers. When we get in to
> aarch64_layout_arg with a composite, MODE is set to the smallest integer
> mode that would contain the size of the composite type. That is to say, in
> the example above, MODE will be TImode.
>
> Looking at the expansion path through assign_parms, we're going to go:
>
> assign_parms
>   assign_parm_setup_reg
> assign_parm_remove_parallels
>   emit_group_store
>
> assign_parm_remove_parallels is going to try to create a REG in MODE,
> then construct that REG using the values in the HFA PARALLEL we created. So,
> for the example above, we're going to try to create a packed TImode value
> built up from each of the four "S" registers we've assigned for the
> arguments. Using one of the struct elements is then a 32-bit extract from
> the TImode value (then a move back to FP/SIMD registers). This explains
> the code-gen in the example. Note that an extract from the TImode value
> makes the whole TImode value live, so we can't optimize away the
> construction in registers.
>
> If instead we make the PARALLEL that we create in aarch64_layout_arg BLKmode
> then our expansion path is through:
>
> assign_parms
>   assign_parm_setup_block
>
> Which handles creating a stack slot of the right size for our HFA, and
> copying it to there. We could then trust the usual optimisers to deal with
> the object construction and eliminate it where possible.
>
> However, we can't just return a BLKmode Parallel, as the mid-end was explicitly
> asking us to return in MODE, and will eventually ICE given the inconsistency.
> One other way we can force these structures to be given BLKmode is through
> TARGET_MEMBER_TYPE_FORCES_BLK. Which is what we do in this patch.
>
> We're going to tell the mid-end that any structure of more than one element
> which contains either floating-point or vector data should be set out in
> BLKmode rather than a large-enough integer mode. In doing so, we implicitly
> fix the issue with HFA layout above. But at what cost!

The patch only seems to handle scalar floats, not vectors.

Wasn't replying to this to nitpick though :-).  I just wondered whether
there would be any benefit to having an array_mode hook that returns
V4SF for float[4].  (We needed that hook to handle load/store lanes
for SVE, so see git branch linaro-dev/sve if you want to try it.)
Maybe alignment would be a problem though?

Thanks,
Richard


Re: [Patch testsuite]

2017-06-26 Thread Mike Stump
On Jun 26, 2017, at 11:35 AM, Rainer Orth  wrote:
> 
> Mike Stump  writes:
> 
>> On Jun 26, 2017, at 2:34 AM, Rainer Orth  
>> wrote:
>>> 
>>>> Is it OK to commit the following patch (darwin only)?
>>> 
>>> this patch needs a ChangeLog entry (and preferably a description of the
>>> problem you're fixing ;-)
>> 
>> Actually, the CL isn't required, testsuite is special that way.
> 
> I believe it is,

That's why I sent the email.  It's been this way for a very long time.  I don't
recall participating in a consensus building exercise where we changed the
requirement, maybe I was sleeping?  If you have a pointer to a thread where we
changed it, that'd be fine.  I'm happy to update my notion if we changed it.
The doc page says:

  There is no established convention on when ChangeLog entries are to be made
  for testsuite changes

so, certainly no one reflected any such change in the web pages yet.  I'd
rather build consensus than have you or me just pass an edict.  Last time
we spoke about ChangeLogs, the direction was to eliminate them entirely in
preference to the git checkin comments, so, not sure we'd go in that direction
today.

Re: common grounds for VxWorks 7 support

2017-06-26 Thread Olivier Hainque

> On Jun 26, 2017, at 18:44 , Jeff Law  wrote:
> 
>> If OK on principle, I'm not sure who is to approve the tiny config.gcc part.

> I'd think the tiny config.gcc bits would fall under the VxWorks umbrella
> and you can self-approve.

Wonderful :-) Thanks for confirming Jeff!

Olivier



Re: [PATCH] Fix PR71815 (SLSR misses PHI opportunities)

2017-06-26 Thread H.J. Lu
On Fri, Jun 23, 2017 at 9:06 AM, Bill Schmidt
 wrote:
> Hi,
>
> Here's version 2 of the patch to fix the missed SLSR PHI opportunities,
> addressing Richard's comments.  I've repeated regstrap and SPEC testing
> on powerpc64le-unknown-linux-gnu, again showing the patch as neutral
> with respect to performance.  Is this ok for trunk?
>
> Thanks for the review!
>
> Bill
>
>
> [gcc]
>
> 2016-06-23  Bill Schmidt  
>
> * gimple-ssa-strength-reduction.c (uses_consumed_by_stmt): New
> function.
> (find_basis_for_candidate): Call uses_consumed_by_stmt rather than
> has_single_use.
> (slsr_process_phi): Likewise.
> (replace_uncond_cands_and_profitable_phis): Don't replace a
> multiply candidate with a stride of 1 (copy or cast).
> (phi_incr_cost): Call uses_consumed_by_stmt rather than
> has_single_use.
> (lowest_cost_path): Likewise.
> (total_savings): Likewise.
>

This may have caused:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81216

-- 
H.J.


Default std::list default and move constructors

2017-06-26 Thread François Dumont

Hi

Here is the patch to default the implementation of the std::list default
and move constructors.


I introduce _List_node_header to take care of the move
implementation and also to isolate management of the optional list size
storage. I prefer it to using _List_node, as the move constructor
seems complicated to implement with an __aligned_membuf. It also avoids
using raw memory as if it were a size_t without constructing it. Even if
the size_t constructor is trivial, I guess some memory analyser could have
complained about it.
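
To make the intent easier to follow before reading the diff, here is a simplified, self-contained model of such a header.  It is a sketch under assumptions, not the libstdc++ code: NodeBase, NodeHeader, reset, empty and move_keeps_links are hypothetical names, and the size is always stored rather than being conditional on the ABI.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>

struct NodeBase {
  NodeBase *next;
  NodeBase *prev;
};

// Sentinel of a circular doubly linked list, with the element count
// held as an ordinary, properly constructed member.
struct NodeHeader : NodeBase {
  std::size_t size;

  NodeHeader() noexcept : NodeBase{this, this}, size(0) { }

  NodeHeader(NodeHeader &&other) noexcept
  : NodeBase{other.next, other.prev}, size(other.size)
  {
    if (other.next == &other)
      next = prev = this;                 // source was empty
    else
      {
	next->prev = prev->next = this;   // re-point the end nodes at us
	other.reset();                    // leave the source empty
      }
  }

  void reset() noexcept { next = prev = this; size = 0; }
  bool empty() const noexcept { return next == this; }
};

// Moves a one-element list and checks that the links follow the header.
inline bool move_keeps_links()
{
  NodeHeader a;
  NodeBase n{&a, &a};
  a.next = a.prev = &n;
  a.size = 1;

  NodeHeader b(std::move(a));
  assert(b.next == &n && b.prev == &n && b.size == 1);
  assert(n.next == &b && n.prev == &b);
  return a.empty() && a.size == 0;
}
```

Because the header knows how to move itself, the containing implementation type can default its move constructor instead of patching pointers by hand.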


* include/bits/stl_list.h
(_List_node_base()): Define.
(_List_node_base(_List_node_base*, _List_node_base*)): New.
(struct _List_node_header): New.
(_List_impl()): Fix noexcept qualification.
(_List_impl(_List_impl&&)): New, default.
(_List_impl(_List_impl&&, _Node_alloc_type&&)): New.
(_List_base()): Default.
(_List_base(_List_base&&)): Default.
(_List_base(_List_base&&, _Node_alloc_type&&, true_type)): New.
(_List_base(_List_base&&, _Node_alloc_type&&, false_type)): New.
(_List_base(_List_base&&, _Node_alloc_type&&)): Use the latter two.
(_List_base::_M_move_nodes): Adapt to use
_List_node_header._M_move_nodes.
(_List_base::_M_init): Likewise.
(list<>()): Default.
(list<>(list&&)): Default.
(list<>::_M_move_assign(list&&, true_type)): Use _M_move_nodes.

Tested under Linux x86_64.

Ok to commit ?

François


diff --git a/libstdc++-v3/include/bits/stl_list.h b/libstdc++-v3/include/bits/stl_list.h
index 232885a..7e5 100644
--- a/libstdc++-v3/include/bits/stl_list.h
+++ b/libstdc++-v3/include/bits/stl_list.h
@@ -82,6 +82,17 @@ namespace std _GLIBCXX_VISIBILITY(default)
   _List_node_base* _M_next;
   _List_node_base* _M_prev;
 
+#if __cplusplus >= 201103L
+  _List_node_base() = default;
+#else
+  _List_node_base()
+  { }
+#endif
+
+  _List_node_base(_List_node_base* __next, _List_node_base* __prev)
+	: _M_next(__next), _M_prev(__prev)
+  { }
+
   static void
   swap(_List_node_base& __x, _List_node_base& __y) _GLIBCXX_USE_NOEXCEPT;
 
@@ -99,6 +110,79 @@ namespace std _GLIBCXX_VISIBILITY(default)
   _M_unhook() _GLIBCXX_USE_NOEXCEPT;
 };
 
+/// The %list node header.
+struct _List_node_header : public _List_node_base
+{
+private:
+#if _GLIBCXX_USE_CXX11_ABI
+  std::size_t _M_size;
+#endif
+
+  _List_node_base* _M_base() { return this; }
+
+public:
+  _List_node_header() _GLIBCXX_NOEXCEPT
+  : _List_node_base(this, this)
+# if _GLIBCXX_USE_CXX11_ABI
+  , _M_size(0)
+# endif
+  { }
+
+#if __cplusplus >= 201103L
+  _List_node_header(_List_node_header&& __x) noexcept
+  : _List_node_base(__x._M_next, __x._M_prev)
+# if _GLIBCXX_USE_CXX11_ABI
+  , _M_size(__x._M_size)
+# endif
+  {
+	if (__x._M_base()->_M_next == __x._M_base())
+	  this->_M_next = this->_M_prev = this;
+	else
+	  {
+	this->_M_next->_M_prev = this->_M_prev->_M_next = this->_M_base();
+	__x._M_init();
+	  }
+  }
+
+  void
+  _M_move_nodes(_List_node_header&& __x)
+  {
+	_List_node_base* const __xnode = __x._M_base();
+	if (__xnode->_M_next == __xnode)
+	  _M_init();
+	else
+	  {
+	_List_node_base* const __node = this->_M_base();
+	__node->_M_next = __xnode->_M_next;
+	__node->_M_prev = __xnode->_M_prev;
+	__node->_M_next->_M_prev = __node->_M_prev->_M_next = __node;
+	_M_set_size(__x._M_get_size());
+	__x._M_init();
+	  }
+  }
+#endif
+
+#if _GLIBCXX_USE_CXX11_ABI
+  size_t _M_get_size() const { return _M_size; }
+  void _M_set_size(size_t __n) { _M_size = __n; }
+  void _M_inc_size(size_t __n) { _M_size += __n; }
+  void _M_dec_size(size_t __n) { _M_size -= __n; }
+#else
+  // dummy implementations used when the size is not stored
+  size_t _M_get_size() const { return 0; }
+  void _M_set_size(size_t) { }
+  void _M_inc_size(size_t) { }
+  void _M_dec_size(size_t) { }
+#endif
+
+  void
+  _M_init() _GLIBCXX_NOEXCEPT
+  {
+	this->_M_next = this->_M_prev = this;
+	_M_set_size(0);
+  }
+};
+
   _GLIBCXX_END_NAMESPACE_VERSION
   } // namespace detail
 
@@ -323,51 +407,53 @@ _GLIBCXX_BEGIN_NAMESPACE_CXX11
   struct _List_impl
   : public _Node_alloc_type
   {
-#if _GLIBCXX_USE_CXX11_ABI
-	_List_node _M_node;
-#else
-	__detail::_List_node_base _M_node;
-#endif
+	__detail::_List_node_header _M_node;
 
-	_List_impl() _GLIBCXX_NOEXCEPT
-	: _Node_alloc_type(), _M_node()
+	_List_impl() _GLIBCXX_NOEXCEPT_IF( noexcept(_Node_alloc_type()) )
+	: _Node_alloc_type()
 	{ }
 
 	_List_impl(const _Node_alloc_type& __a) _GLIBCXX_NOEXCEPT
-	: _Node_alloc_type(__a), _M_node()
+	: _Node_alloc_type(__a)
 	{ }
 
 #if __cplusplus >= 201103L
+	_List_impl(_List_impl&&) = default;
+
+	_List_impl(_List_impl&& __x, _Node_alloc_type&& __a)
+	  : _Node_alloc_type(std::move(__a)), _M_node(std::move(__x._M_node))
+	{ }
+
 	_List_impl(_Node_alloc_type&& __a) noexcept
-	

Re: [Patch testsuite]

2017-06-26 Thread Dominique d'Humières

> Le 26 juin 2017 à 20:35, Mike Stump  a écrit :
> 
> On Jun 26, 2017, at 2:26 AM, Dominique d'Humières  wrote:
>> 
>> Is it OK to commit the following patch (darwin only)?
> 
> Ok.  As for [0-9a-f]*ing the numbers, at least 1 of test cases should retain 
> the actual number check.  I'm fine with the resting being an RE, if someone 
> wants to do that.

Which test case should retain the actual number check? And could you elaborate
why? These tests are fragile and the REs have already been changed in the past.

Dominique



Re: [PATCH] Fix PR71815 (SLSR misses PHI opportunities)

2017-06-26 Thread Bill Schmidt

> On Jun 26, 2017, at 2:22 PM, H.J. Lu  wrote:
> 
> This may have caused:
> 
> https://gcc.gnu.org/bugzilla/show_bug.cgi?id=81216
> 
> -- 
> H.J.
> 

Nope.  Reverting my patch does not solve the problem, which appears to begin 
with r249643.

Bill


Re: [Patch testsuite]

2017-06-26 Thread Mike Stump
On Jun 26, 2017, at 1:56 PM, Dominique d'Humières  wrote:
> 
>> Le 26 juin 2017 à 20:35, Mike Stump  a écrit :
>> On Jun 26, 2017, at 2:26 AM, Dominique d'Humières  wrote:
>>> 
>>> Is it OK to commit the following patch (darwin only)?
>> 
>> Ok.  As for [0-9a-f]*ing the numbers, at least 1 of test cases should retain 
>> the actual number check.  I'm fine with the resting being an RE, if someone 
>> wants to do that.
> 
> Which test case should retain the actual number check? and could elaborate 
> why? These tests are fragile and the RE have already been changed in the past.

This was commentary on the other comment about using REs instead.  You can 
ignore it, if you want.  As for which test case, I'd have to closely examine 
them to determine that.  I've not done that.  Technically, you want to check 
all the ones that have items in them that aren't reflected in other test cases 
that check the value.



Re: [PATCH] PR66669: Fix failure of gcc.dg/loop-8.c on Power (Backport)

2017-06-26 Thread Kelvin Nilsen

Is it ok to backport this patch to GCC-6?

On 01/23/2017 09:59 AM, Kelvin Nilsen wrote:
> 
> The test gcc.dg/loop-8.c makes assumptions that are not valid on Power
> architecture (and on certain other architectures for which this issue
> has already been addressed).  The test case assumes that a single
> loop-invariant statement will be moved outside the loop.  On Power, a
> constant is copy-propagated within the loop, and the subsequent
> loop-invariant code motion moves two loop-invariant statements out of
> the loop.
> 
> This patch simply disables this test case on Power architecture.
> 
> 
> gcc/testsuite/ChangeLog:
> 
> 2017-01-23  Kelvin Nilsen  
> 
>   PR target/9
>   * gcc.dg/loop-8.c: Modify dg-skip-if directive to exclude this
>   test on powerpc targets.
> 
> Index: gcc/testsuite/gcc.dg/loop-8.c
> ===
> --- gcc/testsuite/gcc.dg/loop-8.c (revision 244730)
> +++ gcc/testsuite/gcc.dg/loop-8.c (working copy)
> @@ -1,6 +1,6 @@
>  /* { dg-do compile } */
>  /* { dg-options "-O1 -fdump-rtl-loop2_invariant" } */
> -/* { dg-skip-if "unexpected IV" { "hppa*-*-* mips*-*-* visium-*-*" } { "*" } { "" } } */
> +/* { dg-skip-if "unexpected IV" { "hppa*-*-* mips*-*-* visium-*-* powerpc*-*-*" } { "*" } { "" } } */
> 
>  void
>  f (int *a, int *b)
> 
> 

-- 
Kelvin Nilsen, Ph.D.  kdnil...@linux.vnet.ibm.com
home office: 801-756-4821, cell: 520-991-6727
IBM Linux Technology Center - PPC Toolchain



Re: [PATCH] PR68972: g++.dg/cpp1y/vla-initlist1.C test case fails on power (backport)

2017-06-26 Thread Kelvin Nilsen

Is this ok for backport to GCC 6?

On 02/06/2017 03:20 PM, Kelvin Nilsen wrote:
> 
> The test g++.dg/cpp1y/vla-initlist1.C makes assumptions that the memory
> used to represent the private temporary variables of neighboring control
> blocks at the same control nesting level is:
> 
> 1. found at the same address, and
> 2. not overwritten between when the first block ends and the second
> block begins.
> 
> While these assumptions are valid with some optimization choices on some
> architectures, these assumptions do not hold universally.
> 
> With optimization disabled on the power architecture, the
> g++.dg/cpp1y/vla-initlist1.C test program runs initialization code to
> allocate the variable-length array a[] before entry into the second of
> two neighboring control blocks.  This initialization code overwrites the
> first two cells of the array i[] that were initialized by the first of
> the two neighboring control blocks.  Thus, the initialization value
> stored into i[1] is no longer present when this value is subsequently
> fetched as a[1].i from within the second control block.
> 
> This patch disables this particular test case on power hardware.
> 
> The patch has been bootstrapped and tested on
> powerpc64le-unknown-linux with no regressions.
> 
> Is this ok for trunk?
> 
> gcc/testsuite/ChangeLog:
> 
> 2017-02-06  Kelvin Nilsen  
> 
>   PR target/68972
>   * g++.dg/cpp1y/vla-initlist1.C: Add dg-skip-if directive to
>   disable this test on power architecture.
> 
> Index: gcc/testsuite/g++.dg/cpp1y/vla-initlist1.C
> ===
> --- gcc/testsuite/g++.dg/cpp1y/vla-initlist1.C (revision 245156)
> +++ gcc/testsuite/g++.dg/cpp1y/vla-initlist1.C (working copy)
> @@ -1,4 +1,5 @@
>  // { dg-do run { target c++11 } }
> +// { dg-skip-if "power overwrites two slots of array i" { "power*-*-*" } { "*" } { "" } }
>  // { dg-options "-Wno-vla" }
> 
>  #include 
> 
> 

-- 
Kelvin Nilsen, Ph.D.  kdnil...@linux.vnet.ibm.com
home office: 801-756-4821, cell: 520-991-6727
IBM Linux Technology Center - PPC Toolchain



Backports to 6 (and 7, and 5)

2017-06-26 Thread Segher Boessenkool
Hi!

I'd like to backport the following patches to 7, 6, and where
applicable 5 (some are in 7 already).


https://gcc.gnu.org/ml/gcc-patches/2017-05/msg02107.html
Fix expand_builtin_atomic_fetch_op for pre-op (PR80902)

https://gcc.gnu.org/ml/gcc-patches/2017-05/msg01404.html
Fix comparison of decimal float zeroes (PR80692)

https://gcc.gnu.org/ml/gcc-patches/2017-04/msg00819.html
IRA: Don't create new regs for debug insns (PR80429)

https://gcc.gnu.org/ml/gcc-patches/2016-08/msg01050.html
lra: A multiple_sets is not a simple_move_p (PR73650)

https://gcc.gnu.org/ml/gcc-patches/2016-11/msg01853.html
https://gcc.gnu.org/ml/gcc-patches/2016-11/msg01923.html
https://gcc.gnu.org/ml/gcc-patches/2016-11/msg02048.html
https://gcc.gnu.org/ml/gcc-patches/2016-11/msg02606.html
bb-reorder: Improve compgotos pass (PR71785)

https://gcc.gnu.org/ml/gcc-patches/2017-05/msg3.html
simplify-rtx: Fix compare of comparisons (PR60818)


Are those okay to backport?


And, rs6000 patches:

https://gcc.gnu.org/ml/gcc-patches/2017-06/msg00567.html
rs6000: Don't add an immediate to r0 (PR80966)

https://gcc.gnu.org/ml/gcc-patches/2017-05/msg02408.html
rs6000: Don't write "nor" as (not (ior () ())) (PR80618)

https://gcc.gnu.org/ml/gcc-patches/2017-04/msg00568.html
rs6000: Enforce quad_address_p in TImode atomic_load/store (PR80382)

https://gcc.gnu.org/ml/gcc-patches/2017-01/msg01543.html
rs6000: Small varargs for BE SVR4 (PR61729, PR77850)


Segher


Re: [gomp4] OpenACC async re-work

2017-06-26 Thread Cesar Philippidis
I still need more time to review this, but ...

On 06/24/2017 12:54 AM, Chung-Lin Tang wrote:
> Hi Cesar, Thomas,
> This patch is the re-implementation of OpenACC async we talked about.
> The changes are rather large, so I am putting it here for a few days before
> actually committing them to gomp-4_0-branch. Would appreciate if you guys
> take a look.
>
> To overall describe the highlights of the changes:
> 
> (1) Instead of essentially implementing the entire OpenACC async support
> inside the plugin, we now use an opaque 'goacc_asyncqueue' implemented
> by the plugin, along with core 'test', 'synchronize', 'serialize', etc.
> plugin functions. Most of the OpenACC specific logic is pulled into
> libgomp/oacc-async.c

I'm not sure if plugins need to maintain backwards compatibility.
However, I don't see any changes inside libgomp.map, so maybe it's not
required.

> (2) CUDA events are no longer used. The limitation of no CUDA calls inside
> CUDA callbacks was a problem for resource freeing, but we now stash
> them onto the ptx_device and free them later.

Yay!

> (3) For 'wait + async', we now add a local thread synchronize, instead
> of just ordering the streams.
>
> (4) To work with the (3) change, some front end changes were added to
> propagate argument-less wait clauses as 'wait(GOACC_ASYNC_NOVAL)' to
> represent a 'wait all'.

What's the significance of GOMP_ASYNC_NOVAL? Wouldn't it have been
easier to make that change in the gimplifier?

> Patch was tested to have no regressions on gomp-4_0-branch. I'll commit
> this after the weekend (or Tues.)

>   * plugin/plugin-nvptx.c (struct cuda_map): Remove.
> (GOMP_OFFLOAD_openacc_exec): Adjust parameters and code.
> (GOMP_OFFLOAD_openacc_async_exec): New plugin hook function.

These two functions seem extremely similar.  I wonder if you should
consolidate them.

Overall, I like how you were able to eliminate the externally managed map_*
data structure which was used to pass in arguments to nvptx_exec.
Although I wonder if we should just pass in those individual arguments
directly to cuLaunchKernel. But that's a big change in itself.

Cesar


Re: [PATCH, alpha, go]: Introduce applyRelocationsALPHA

2017-06-26 Thread Ian Lance Taylor
On Thu, Jun 22, 2017 at 12:13 AM, Uros Bizjak  wrote:
>
> However, there is another issue with zdefaultcc.go generation. On
> my system, the default gccgo, gcc and g++ are installed in:
>
> $ which gccgo
> /usr/bin/gccgo
> $ which gcc
> /usr/bin/gcc
>
> but gotools Makefile uses $(bindir) to derive absolute path to the binaries:
>
> echo 'package main' > zdefaultcc.go.tmp
> echo 'const defaultGCCGO = "$(bindir)/$(GCCGO_INSTALL_NAME)"' >> zdefaultcc.go.tmp
> echo 'const defaultCC = "$(bindir)/$(GCC_INSTALL_NAME)"' >> zdefaultcc.go.tmp
> echo 'const defaultCXX = "$(bindir)/$(GXX_INSTALL_NAME)"' >> zdefaultcc.go.tmp
> echo 'const defaultPkgConfig = "pkg-config"' >> zdefaultcc.go.tmp
>
> However, since $prefix (by default) points to /usr/local, $bindir
> points to /usr/local/bin. Consequently, zdefaultcc.go reads:
>
> package main
> const defaultGCCGO = "/usr/local/bin/gccgo"
> const defaultCC = "/usr/local/bin/gcc"
> const defaultCXX = "/usr/local/bin/g++"
> const defaultPkgConfig = "pkg-config"
>
> The absolute path is wrong, since - as mentioned above - the system
> compiler is installed in /usr/bin.
>
> Probably we just need to remove $bindir and assume that these binaries
> exist in $PATH.

I did that for defaultCC and defaultCXX, as appended.

Ian


2017-06-26  Ian Lance Taylor  

* Makefile.am (s-zdefaultcc): Don't record $(bindir) for defaultCC
or defaultCXX.
* Makefile.in: Rebuild.
Index: Makefile.am
===
--- Makefile.am (revision 249668)
+++ Makefile.am (working copy)
@@ -100,8 +100,8 @@ zdefaultcc.go: s-zdefaultcc; @true
 s-zdefaultcc: Makefile
echo 'package main' > zdefaultcc.go.tmp
echo 'const defaultGCCGO = "$(bindir)/$(GCCGO_INSTALL_NAME)"' >> zdefaultcc.go.tmp
-   echo 'const defaultCC = "$(bindir)/$(GCC_INSTALL_NAME)"' >> zdefaultcc.go.tmp
-   echo 'const defaultCXX = "$(bindir)/$(GXX_INSTALL_NAME)"' >> zdefaultcc.go.tmp
+   echo 'const defaultCC = "$(GCC_INSTALL_NAME)"' >> zdefaultcc.go.tmp
+   echo 'const defaultCXX = "$(GXX_INSTALL_NAME)"' >> zdefaultcc.go.tmp
echo 'const defaultPkgConfig = "pkg-config"' >> zdefaultcc.go.tmp
$(SHELL) $(srcdir)/../move-if-change zdefaultcc.go.tmp zdefaultcc.go
$(STAMP) $@ 
Index: Makefile.in
===
--- Makefile.in (revision 249668)
+++ Makefile.in (working copy)
@@ -582,8 +582,8 @@ distclean-generic:
 maintainer-clean-generic:
@echo "This command is intended for maintainers to use"
@echo "it deletes files that may require special tools to rebuild."
-@NATIVE_FALSE@uninstall-local:
 @NATIVE_FALSE@install-exec-local:
+@NATIVE_FALSE@uninstall-local:
 clean: clean-am
 
 clean-am: clean-binPROGRAMS clean-generic clean-noinstPROGRAMS \
@@ -682,8 +682,8 @@ zdefaultcc.go: s-zdefaultcc; @true
 s-zdefaultcc: Makefile
echo 'package main' > zdefaultcc.go.tmp
echo 'const defaultGCCGO = "$(bindir)/$(GCCGO_INSTALL_NAME)"' >> zdefaultcc.go.tmp
-   echo 'const defaultCC = "$(bindir)/$(GCC_INSTALL_NAME)"' >> zdefaultcc.go.tmp
-   echo 'const defaultCXX = "$(bindir)/$(GXX_INSTALL_NAME)"' >> zdefaultcc.go.tmp
+   echo 'const defaultCC = "$(GCC_INSTALL_NAME)"' >> zdefaultcc.go.tmp
+   echo 'const defaultCXX = "$(GXX_INSTALL_NAME)"' >> zdefaultcc.go.tmp
echo 'const defaultPkgConfig = "pkg-config"' >> zdefaultcc.go.tmp
$(SHELL) $(srcdir)/../move-if-change zdefaultcc.go.tmp zdefaultcc.go
$(STAMP) $@ 


[AARCH64] Disable pc relative literal load irrespective of TARGET_FIX_ERR_A53_843419

2017-06-26 Thread Kugan Vivekanandarajah
https://gcc.gnu.org/ml/gcc-patches/2016-03/msg00614.html added this
workaround to get the kernel building when TARGET_FIX_ERR_A53_843419
is enabled.

This was added to support building kernel loadable modules. In the kernel,
when CONFIG_ARM64_ERRATUM_843419 is selected, support for the relocations
needed for ADRP/LDR (R_AARCH64_ADR_PREL_PG_HI21 and
R_AARCH64_ADR_PREL_PG_HI21_NC) is removed from the kernel to avoid
loading objects with possibly offending sequences. Thus, it could only
support pc relative literal loads.

However, the following patch was posted to kernel to add
-mpc-relative-literal-loads
http://www.spinics.net/lists/arm-kernel/msg476149.html

-mpc-relative-literal-loads is unconditionally added to the kernel
build as can be seen from:
https://github.com/torvalds/linux/blob/master/arch/arm64/Makefile

Therefore this patch removes the hunk so that applications like
SPECcpu2017's 521/621.wrf can be built (with LTO in this case) without
-mno-pc-relative-literal-loads

Bootstrapped and regression tested on aarch64-linux-gnu with no new regressions.

Is this OK for trunk?

Thanks,
Kugan

gcc/testsuite/ChangeLog:

2017-06-27  Kugan Vivekanandarajah  

* gcc.target/aarch64/pr63304_1.c: Remove -mno-fix-cortex-a53-843419.

gcc/ChangeLog:

2017-06-27  Kugan Vivekanandarajah  

* config/aarch64/aarch64.c (aarch64_override_options_after_change_1):
Disable pc relative literal load irrespective of TARGET_FIX_ERR_A53_843419
for default.
From bf5d8151ad6a83903f51529655e83181bdb67200 Mon Sep 17 00:00:00 2001
From: Kugan Vivekanandarajah 
Date: Thu, 8 Jun 2017 15:51:29 +1000
Subject: [PATCH] Disable pc relative literal load irrespective of
 TARGET_FIX_ERR_A53_843419

---
 gcc/config/aarch64/aarch64.c | 11 ---
 gcc/testsuite/gcc.target/aarch64/pr63304_1.c |  2 +-
 2 files changed, 1 insertion(+), 12 deletions(-)

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 71f9819..99cfd20 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -8632,17 +8632,6 @@ aarch64_override_options_after_change_1 (struct gcc_options *opts)
   if (opts->x_pcrelative_literal_loads == 1)
 aarch64_pcrelative_literal_loads = true;
 
-  /* This is PR70113. When building the Linux kernel with
- CONFIG_ARM64_ERRATUM_843419, support for relocations
- R_AARCH64_ADR_PREL_PG_HI21 and R_AARCH64_ADR_PREL_PG_HI21_NC is
- removed from the kernel to avoid loading objects with possibly
- offending sequences.  Without -mpc-relative-literal-loads we would
- generate such relocations, preventing the kernel build from
- succeeding.  */
-  if (opts->x_pcrelative_literal_loads == 2
-  && TARGET_FIX_ERR_A53_843419)
-aarch64_pcrelative_literal_loads = true;
-
   /* In the tiny memory model it makes no sense to disallow PC relative
  literal pool loads.  */
   if (aarch64_cmodel == AARCH64_CMODEL_TINY
diff --git a/gcc/testsuite/gcc.target/aarch64/pr63304_1.c b/gcc/testsuite/gcc.target/aarch64/pr63304_1.c
index c917f81c..fa0fb56 100644
--- a/gcc/testsuite/gcc.target/aarch64/pr63304_1.c
+++ b/gcc/testsuite/gcc.target/aarch64/pr63304_1.c
@@ -1,5 +1,5 @@
 /* { dg-do assemble } */
-/* { dg-options "-O1 --save-temps -mno-fix-cortex-a53-843419" } */
+/* { dg-options "-O1 --save-temps" } */
 #pragma GCC push_options
 #pragma GCC target ("+nothing+simd, cmodel=small")
 
-- 
2.7.4



[PATCH, rs6000] Add support to __builtin_cpu_supports() for two new HWCAP2 bits

2017-06-26 Thread Peter Bergner
Tulio added support for two new AT_HWCAP2 bits to GLIBC which have been
recently added to the kernel:

  https://www.sourceware.org/ml/libc-alpha/2017-06/msg00069.html

This patch adds support for them to the __builtin_cpu_supports() builtin
function so we can test for them.

Tested on powerpc64le-linux with no regressions.  Is this ok for trunk?

Peter

gcc/
* config/rs6000/ppc-auxv.h (PPC_FEATURE2_DARN): New define.
(PPC_FEATURE2_SCV): Likewise.
* config/rs6000/rs6000.c (cpu_supports_info): Use them.

gcc/testsuite/
* gcc.target/powerpc/cpu-builtin-1.c (darn, scv): Add tests.

Index: gcc/config/rs6000/ppc-auxv.h
===
--- gcc/config/rs6000/ppc-auxv.h (revision 249611)
+++ gcc/config/rs6000/ppc-auxv.h (working copy)
@@ -89,6 +89,8 @@
 #define PPC_FEATURE2_HTM_NOSC   0x0100
 #define PPC_FEATURE2_ARCH_3_00  0x0080
 #define PPC_FEATURE2_HAS_IEEE128 0x0040
+#define PPC_FEATURE2_DARN   0x0020
+#define PPC_FEATURE2_SCV    0x0010
 
 
 /* Thread Control Block (TCB) offsets of the AT_PLATFORM, AT_HWCAP and
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 249611)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -379,7 +379,9 @@ static const struct
   { "tar", PPC_FEATURE2_HAS_TAR,   1 },
   { "vcrypto", PPC_FEATURE2_HAS_VEC_CRYPTO,1 },
   { "arch_3_00",   PPC_FEATURE2_ARCH_3_00, 1 },
-  { "ieee128", PPC_FEATURE2_HAS_IEEE128,   1 }
+  { "ieee128", PPC_FEATURE2_HAS_IEEE128,   1 },
+  { "darn",PPC_FEATURE2_DARN,  1 },
+  { "scv", PPC_FEATURE2_SCV,   1 }
 };
 
 /* On PowerPC, we have a limited number of target clones that we care about
Index: gcc/testsuite/gcc.target/powerpc/cpu-builtin-1.c
===
--- gcc/testsuite/gcc.target/powerpc/cpu-builtin-1.c (revision 249611)
+++ gcc/testsuite/gcc.target/powerpc/cpu-builtin-1.c (working copy)
@@ -62,4 +62,6 @@ use_cpu_supports_builtins (unsigned int
   p[35] = __builtin_cpu_supports ("ucache");
   p[36] = __builtin_cpu_supports ("vcrypto");
   p[37] = __builtin_cpu_supports ("vsx");
+  p[38] = __builtin_cpu_supports ("darn");
+  p[39] = __builtin_cpu_supports ("scv");
 }
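
The table extended in rs6000.c above maps a feature name to a bit mask. As a rough, portable model of that idea (not the compiler's actual expansion, which works from the HWCAP words stored in the thread control block), a name-to-mask lookup can be sketched as follows. The `sketch_` names and the bit values are hypothetical and chosen only for this example; they are not the real PPC_FEATURE2_* values.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical bit values, for illustration only.  */
#define SKETCH_FEATURE2_DARN 0x2u
#define SKETCH_FEATURE2_SCV  0x1u

struct sketch_feature {
    const char *name;   /* string a caller would pass in */
    unsigned int mask;  /* bit to test in the capability word */
};

static const struct sketch_feature sketch_features[] = {
    { "darn", SKETCH_FEATURE2_DARN },
    { "scv",  SKETCH_FEATURE2_SCV  },
};

/* Return 1 if NAME is known and its bit is set in HWCAP2, 0 if NAME is
   known but the bit is clear, and -1 if NAME is not in the table.  */
int sketch_cpu_supports(const char *name, unsigned int hwcap2)
{
    size_t n = sizeof sketch_features / sizeof sketch_features[0];
    for (size_t i = 0; i < n; i++)
        if (strcmp(sketch_features[i].name, name) == 0)
            return (hwcap2 & sketch_features[i].mask) != 0;
    return -1;
}
```

In the real builtin the name must be a string literal and the lookup happens at compile time; the runtime loop here is only a way to show the table's role.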



Re: [PING][PATCH] Move the check for any_condjump_p from sched-deps to target macros

2017-06-26 Thread Hurugalawadi, Naveen
Hi Jeff,

Thanks for the review and your approval of the final patch.
Sorry, it was a long weekend, so I could not respond to your
comments earlier.

>> You need a ChangeLog entry, but I think that's it.  Can you
>> please repost with a ChangeLog entry for final approval?

Please find the final patch and ChangeLog entry updated as required.
Please review it and let me know if it's okay to commit.

Thanks,
Naveen

2017-06-27  Naveen H.S  

* config/aarch64/aarch64.c (aarch_macro_fusion_pair_p): Push the
check for CC usage into AARCH64_FUSE_CMP_BRANCH.
* config/i386/i386.c (ix86_macro_fusion_pair_p): Push the check for
CC usage from generic code to here.
* sched-deps.c (sched_macro_fuse_insns): Move the condition for
CC usage into the target macros.

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 2e385c4..b38b8b7 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -13973,13 +13973,23 @@ aarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn *curr)
 {
   enum attr_type prev_type = get_attr_type (prev);
 
-  /* FIXME: this misses some which is considered simple arthematic
- instructions for ThunderX.  Simple shifts are missed here.  */
-  if (prev_type == TYPE_ALUS_SREG
-  || prev_type == TYPE_ALUS_IMM
-  || prev_type == TYPE_LOGICS_REG
-  || prev_type == TYPE_LOGICS_IMM)
-return true;
+  unsigned int condreg1, condreg2;
+  rtx cc_reg_1;
+  aarch64_fixed_condition_code_regs (&condreg1, &condreg2);
+  cc_reg_1 = gen_rtx_REG (CCmode, condreg1);
+
+  if (reg_referenced_p (cc_reg_1, PATTERN (curr))
+	  && prev
+	  && modified_in_p (cc_reg_1, prev))
+	{
+	  /* FIXME: this misses some which is considered simple arthematic
+	 instructions for ThunderX.  Simple shifts are missed here.  */
+	  if (prev_type == TYPE_ALUS_SREG
+	  || prev_type == TYPE_ALUS_IMM
+	  || prev_type == TYPE_LOGICS_REG
+	  || prev_type == TYPE_LOGICS_IMM)
+	return true;
+	}
 }
 
   return false;
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 0b2fa1b..af14c90 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -29483,6 +29483,15 @@ ix86_macro_fusion_pair_p (rtx_insn *condgen, rtx_insn *condjmp)
   if (!any_condjump_p (condjmp))
 return false;
 
+  unsigned int condreg1, condreg2;
+  rtx cc_reg_1;
+  ix86_fixed_condition_code_regs (&condreg1, &condreg2);
+  cc_reg_1 = gen_rtx_REG (CCmode, condreg1);
+  if (!reg_referenced_p (cc_reg_1, PATTERN (condjmp))
+  || !condgen
+  || !modified_in_p (cc_reg_1, condgen))
+return false;
+
   if (get_attr_type (condgen) != TYPE_TEST
   && get_attr_type (condgen) != TYPE_ICMP
   && get_attr_type (condgen) != TYPE_INCDEC
diff --git a/gcc/sched-deps.c b/gcc/sched-deps.c
index b2393bf..4c459e6 100644
--- a/gcc/sched-deps.c
+++ b/gcc/sched-deps.c
@@ -2834,34 +2834,30 @@ static void
 sched_macro_fuse_insns (rtx_insn *insn)
 {
   rtx_insn *prev;
-
+  prev = prev_nonnote_nondebug_insn (insn);
+  if (!prev)
+return;
+ 
   if (any_condjump_p (insn))
 {
   unsigned int condreg1, condreg2;
   rtx cc_reg_1;
   targetm.fixed_condition_code_regs (&condreg1, &condreg2);
   cc_reg_1 = gen_rtx_REG (CCmode, condreg1);
-  prev = prev_nonnote_nondebug_insn (insn);
-  if (!reg_referenced_p (cc_reg_1, PATTERN (insn))
-  || !prev
-  || !modified_in_p (cc_reg_1, prev))
-return;
+  if (reg_referenced_p (cc_reg_1, PATTERN (insn))
+	  && modified_in_p (cc_reg_1, prev))
+	{
+	  if (targetm.sched.macro_fusion_pair_p (prev, insn))
+	SCHED_GROUP_P (insn) = 1;
+	  return;
+	}
 }
-  else
-{
-  rtx insn_set = single_set (insn);
-
-  prev = prev_nonnote_nondebug_insn (insn);
-  if (!prev
-  || !insn_set
-  || !single_set (prev))
-return;
 
+  if (single_set (insn) && single_set (prev))
+{
+  if (targetm.sched.macro_fusion_pair_p (prev, insn))
+	SCHED_GROUP_P (insn) = 1;
 }
-
-  if (targetm.sched.macro_fusion_pair_p (prev, insn))
-SCHED_GROUP_P (insn) = 1;
-
 }
 
 /* Get the implicit reg pending clobbers for INSN and save them in TEMP.  */


libgo patch committed: Add misc/cgo files

2017-06-26 Thread Ian Lance Taylor
This patch adds the misc/cgo files from the Go 1.8.3 release to libgo.
These will be used for tests of the go tool in various modes.
Bootstrapped on x86_64-pc-linux-gnu.  Committed to mainline.

Ian


patch.txt.gz
Description: GNU Zip compressed data


[PATCH] Fix ICE during strstr gimple folding (PR middle-end/81207)

2017-06-26 Thread Jakub Jelinek
Hi!

replace_call_with_call_and_fold has code to copy over vdef/vuse from the
old call to the new one, so that we don't have to update virtual ssa,
but it is conditioned on gimple_vdef being non-NULL and SSA_NAME.
If we have a pure function, gimple_vdef is NULL, yet we still want to copy
over the vuse.

Bootstrapped/regtested on x86_64-linux (i686-linux fails to bootstrap
with/without this patch), ok for trunk?

2017-06-27  Jakub Jelinek  

PR middle-end/81207
* gimple-fold.c (replace_call_with_call_and_fold): Handle
gimple_vuse copying separately from gimple_vdef copying.

* gcc.c-torture/compile/pr81207.c: New test.

--- gcc/gimple-fold.c.jj 2017-06-19 08:28:11.0 +0200
+++ gcc/gimple-fold.c   2017-06-26 17:09:34.735420583 +0200
@@ -607,9 +607,11 @@ replace_call_with_call_and_fold (gimple_
   && TREE_CODE (gimple_vdef (stmt)) == SSA_NAME)
 {
   gimple_set_vdef (repl, gimple_vdef (stmt));
-  gimple_set_vuse (repl, gimple_vuse (stmt));
   SSA_NAME_DEF_STMT (gimple_vdef (repl)) = repl;
 }
+  if (gimple_vuse (stmt)
+  && TREE_CODE (gimple_vuse (stmt)) == SSA_NAME)
+gimple_set_vuse (repl, gimple_vuse (stmt));
   gsi_replace (gsi, repl, false);
   fold_stmt (gsi);
 }
--- gcc/testsuite/gcc.c-torture/compile/pr81207.c.jj 2017-06-26 17:21:38.765918367 +0200
+++ gcc/testsuite/gcc.c-torture/compile/pr81207.c   2017-06-26 17:27:15.222966965 +0200
@@ -0,0 +1,13 @@
+/* PR middle-end/81207 */
+
+static const char *b[2] = { "'", "" };
+
+int
+foo (const char *d)
+{
+  int e;
+  for (e = 0; b[e]; e++)
+if (__builtin_strstr (d, b[e]))
+  return 1;
+  return 0;
+}

Jakub
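
The shape of the fix can be modelled outside GCC. The sketch below uses a made-up `sketch_stmt` struct in place of a GIMPLE call statement and plain strings in place of SSA names; it is an illustration of the control flow only, under those assumptions, and not GCC code. The point is that a pure call has a virtual use but no virtual definition, so the vuse copy must not sit inside the vdef check.

```c
#include <stddef.h>

/* Toy stand-in for a statement's virtual operands (hypothetical type).  */
struct sketch_stmt {
    const char *vdef;   /* NULL when the statement does not write memory */
    const char *vuse;   /* NULL when the statement does not read memory */
};

/* Copy virtual operands from STMT to its replacement REPL.  The two
   operands are handled independently, mirroring the patched logic.  */
void sketch_copy_virtual_operands(struct sketch_stmt *repl,
                                  const struct sketch_stmt *stmt)
{
    if (stmt->vdef != NULL)
        repl->vdef = stmt->vdef;
    /* Not nested under the vdef check: a pure call still has a vuse.  */
    if (stmt->vuse != NULL)
        repl->vuse = stmt->vuse;
}
```

With the pre-patch nesting, a statement whose `vdef` is NULL would leave `repl->vuse` unset, which corresponds to the situation described above for strstr.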


[PATCH] Fix some narrowing conversion issues

2017-06-26 Thread Jakub Jelinek
Hi!

We build gcc with -Wno-narrowing; for some reason I ended up with an old
Makefile without that and discovered a couple of -Wnarrowing errors.

This patch fixes them.  Bootstrapped/regtested on x86_64-linux (i686-linux
fails to bootstrap with/without this patch), ok for trunk?

2017-06-27  Jakub Jelinek  

* predict.c (test_prediction_value_range): Use -1U instead of -1
to avoid narrowing conversion warning.
* dumpfile.c (dump_options): Wrap the value in a dump_flags_t cast
to avoid narrowing conversion warning.
* opt-functions.awk (var_ref): Return (unsigned short) -1 instead of
-1.
* optc-gen.awk (END): Expect (unsigned short) -1 instead of -1.

--- gcc/predict.c.jj 2017-06-21 16:53:37.0 +0200
+++ gcc/predict.c   2017-06-26 18:39:33.640190953 +0200
@@ -4031,7 +4031,7 @@ test_prediction_value_range ()
 {
   branch_predictor predictors[] = {
 #include "predict.def"
-{NULL, -1}
+{NULL, -1U}
   };
 
   for (unsigned i = 0; predictors[i].name != NULL; i++)
--- gcc/dumpfile.c.jj   2017-06-19 08:27:22.0 +0200
+++ gcc/dumpfile.c  2017-06-26 18:38:44.383762664 +0200
@@ -110,9 +110,9 @@ static const struct dump_option_value_in
   {"missed", MSG_MISSED_OPTIMIZATION},
   {"note", MSG_NOTE},
   {"optall", MSG_ALL},
-  {"all", ~(TDF_RAW | TDF_SLIM | TDF_LINENO | TDF_GRAPH | TDF_STMTADDR
-   | TDF_RHS_ONLY | TDF_NOUID | TDF_ENUMERATE_LOCALS | TDF_SCEV
-   | TDF_GIMPLE)},
+  {"all", dump_flags_t (~(TDF_RAW | TDF_SLIM | TDF_LINENO | TDF_GRAPH
+   | TDF_STMTADDR | TDF_RHS_ONLY | TDF_NOUID
+   | TDF_ENUMERATE_LOCALS | TDF_SCEV | TDF_GIMPLE))},
   {NULL, 0}
 };
 
--- gcc/opt-functions.awk.jj 2017-01-08 17:41:19.0 +0100
+++ gcc/opt-functions.awk   2017-06-26 18:54:15.174940306 +0200
@@ -275,7 +275,7 @@ function var_ref(name, flags)
return "offsetof (struct gcc_options, x_target_flags)"
if (opt_args("InverseMask", flags) != "")
return "offsetof (struct gcc_options, x_target_flags)"
-   return "-1"
+   return "(unsigned short) -1"
 }
 
 # Given the option called NAME return a sanitized version of its name.
--- gcc/optc-gen.awk.jj 2017-02-25 00:15:02.0 +0100
+++ gcc/optc-gen.awk2017-06-26 18:55:41.613928361 +0200
@@ -336,7 +336,7 @@ for (i = 0; i < n_opts; i++) {
alias_posarg = nth_arg(1, alias_arg)
alias_negarg = nth_arg(2, alias_arg)
 
-   if (var_ref(opts[i], flags[i]) != "-1")
+   if (var_ref(opts[i], flags[i]) != "(unsigned short) -1")
print "#error Alias setting variable"
 
if (alias_posarg != "" && alias_negarg == "") {

Jakub
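
As background on why the sentinel spellings above are interchangeable in value: converting -1 to an unsigned type yields that type's maximum, so writing the constant as already unsigned keeps the same bit pattern while avoiding a signed-to-unsigned conversion inside a braced initializer (which C++11 treats as narrowing). A minimal C sketch, with hypothetical function names, that checks only the values:

```c
#include <limits.h>

/* -1U is unary minus applied to the unsigned constant 1U, which wraps
   to the largest unsigned int value.  */
unsigned int sketch_sentinel_unsigned(void)
{
    return -1U;
}

/* Casting -1 to unsigned short gives the largest unsigned short value,
   matching the "(unsigned short) -1" string emitted by the awk script.  */
unsigned short sketch_sentinel_ushort(void)
{
    return (unsigned short) -1;
}
```

C itself performs these conversions silently; the diagnostics discussed in this mail come from compiling GCC's own sources as C++.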


[PATCH] Fix another ubsan_encode_value related ICE (PR sanitizer/81209)

2017-06-26 Thread Jakub Jelinek
Hi!

Apparently the pr81125.C testcase ICEs on Darwin, but not on Linux;
the difference is that on Darwin ctors/dtors aren't deduplicated due to
lack of flexibility of the object format.  I've managed to reproduce
also on Linux with a virtual base and -fno-declone-ctor-dtor.
The problem was that because the temp var didn't have DECL_CONTEXT
set, during cloning that var wasn't remapped and thus was shared by
both complete and base ctor.

Fixed thusly, bootstrapped/regtested on x86_64-linux, ok for trunk?

2017-06-27  Jakub Jelinek  

PR sanitizer/81209
* ubsan.c (ubsan_encode_value): Initialize DECL_CONTEXT on var.

* g++.dg/ubsan/pr81209.C: New test.

--- gcc/ubsan.c.jj  2017-06-19 17:28:13.0 +0200
+++ gcc/ubsan.c 2017-06-26 21:04:45.602012192 +0200
@@ -153,6 +153,7 @@ ubsan_encode_value (tree t, enum ubsan_e
{
  var = create_tmp_var_raw (type);
  TREE_ADDRESSABLE (var) = 1;
+ DECL_CONTEXT (var) = current_function_decl;
}
  if (phase == UBSAN_ENCODE_VALUE_RTL)
{
--- gcc/testsuite/g++.dg/ubsan/pr81209.C.jj 2017-06-26 21:07:47.018875009 +0200
+++ gcc/testsuite/g++.dg/ubsan/pr81209.C    2017-06-26 21:08:08.273624617 +0200
@@ -0,0 +1,21 @@
+// PR sanitizer/81209
+// { dg-do compile }
+// { dg-options "-fsanitize=undefined -fno-declone-ctor-dtor" }
+
+#ifdef __SIZEOF_INT128__
+typedef __int128 T;
+#else
+typedef long long int T;
+#endif
+
+struct B {};
+struct A : virtual public B
+{
+  A (long);
+  T a;
+};
+
+A::A (long c)
+{
+  long b = a % c;
+}

Jakub


RE: [Neon intrinsics] Literal vector construction through vcombine is poor

2017-06-26 Thread Michael Collison
Richard,

I reworked the patch using an assert as you suggested. Bootstrapped and 
retested. Okay for trunk?


-Original Message-
From: Richard Earnshaw (lists) [mailto:richard.earns...@arm.com] 
Sent: Friday, June 23, 2017 2:09 AM
To: Michael Collison ; GCC Patches 

Cc: nd 
Subject: Re: [Neon intrinsics] Literal vector construction through vcombine is 
poor

On 23/06/17 00:10, Michael Collison wrote:
> Richard,
> 
> I reworked the patch and retested on big endian as well as little. The 
> original code was performing two swaps in the big endian case which works out 
> to no swaps at all.
> 
> I also updated the ChangeLog per your comments. Okay for trunk?
> 
> 2017-06-19  Michael Collison  
> 
>   * config/aarch64/aarch64-simd.md (aarch64_combine): Directly
>   call aarch64_split_simd_combine.
>   * (aarch64_combine_internal): Delete pattern.
>   * config/aarch64/aarch64.c (aarch64_split_simd_combine):
>   Allow register and subreg operands.
> 
> -Original Message-
> From: Richard Earnshaw (lists) [mailto:richard.earns...@arm.com]
> Sent: Monday, June 19, 2017 6:37 AM
> To: Michael Collison ; GCC Patches 
> 
> Cc: nd 
> Subject: Re: [Neon intrinsics] Literal vector construction through 
> vcombine is poor
> 
> On 16/06/17 22:08, Michael Collison wrote:
>> This patch improves code generation for literal vector construction by 
>> expanding and exposing the pattern to rtl optimization earlier. The current 
>> implementation delays splitting the pattern until after reload which results 
>> in poor code generation for the following code:
>>
>>
>> #include "arm_neon.h"
>>
>> int16x8_t
>> foo ()
>> {
>>   return vcombine_s16 (vdup_n_s16 (0), vdup_n_s16 (8));
>> }
>>
>> Trunk generates:
>>
>> foo:
>>  movi v1.2s, 0
>>  movi v0.4h, 0x8
>>  dup d2, v1.d[0]
>>  ins v2.d[1], v0.d[0]
>>  orr v0.16b, v2.16b, v2.16b
>>  ret
>>
>> With the patch we now generate:
>>
>> foo:
>>  movi v1.4h, 0x8
>>  movi v0.4s, 0
>>  ins v0.d[1], v1.d[0]
>>  ret
>>
>> Bootstrapped and tested on aarch64-linux-gnu. Okay for trunk.
>>
>> 2017-06-15  Michael Collison  
>>
>>  * config/aarch64/aarch64-simd.md(aarch64_combine_internal):
>>  Convert from define_insn_and_split into define_expand
>>  * config/aarch64/aarch64.c(aarch64_split_simd_combine):
>>  Allow register and subreg operands.
>>
> 
> Your changelog entry is confusing.  You've deleted the 
> aarch64_combine_internal pattern entirely, having merged some of its 
> functionality directly into its caller (aarch64_combine).
> 
> So I think it should read:
> 
> * config/aarch64/aarch64-simd.md (aarch64_combine): Directly call 
> aarch64_split_simd_combine.
> (aarch64_combine_internal): Delete pattern.
> * ...
> 
> Note also there should be a space between the file name and the open bracket 
> for the first function name.
> 
> Why don't you need the big-endian code path any more?
> 
> R.
> 
>>
>> pr7057.patch
>>
>>
>> diff --git a/gcc/config/aarch64/aarch64-simd.md
>> b/gcc/config/aarch64/aarch64-simd.md
>> index c462164..4a253a9 100644
>> --- a/gcc/config/aarch64/aarch64-simd.md
>> +++ b/gcc/config/aarch64/aarch64-simd.md
>> @@ -2807,27 +2807,11 @@
>>op1 = operands[1];
>>op2 = operands[2];
>>  }
>> -  emit_insn (gen_aarch64_combine_internal (operands[0], op1, 
>> op2));
>> -  DONE;
>> -}
>> -)
>>  
>> -(define_insn_and_split "aarch64_combine_internal"
>> -  [(set (match_operand: 0 "register_operand" "=&w")
>> -(vec_concat: (match_operand:VDC 1 "register_operand" "w")
>> -   (match_operand:VDC 2 "register_operand" "w")))]
>> -  "TARGET_SIMD"
>> -  "#"
>> -  "&& reload_completed"
>> -  [(const_int 0)]
>> -{
>> -  if (BYTES_BIG_ENDIAN)
>> -aarch64_split_simd_combine (operands[0], operands[2], operands[1]);
>> -  else
>> -aarch64_split_simd_combine (operands[0], operands[1], operands[2]);
>> +  aarch64_split_simd_combine (operands[0], op1, op2);
>> +
>>DONE;
>>  }
>> -[(set_attr "type" "multiple")]
>>  )
>>  
>>  (define_expand "aarch64_simd_combine"
>> diff --git a/gcc/config/aarch64/aarch64.c 
>> b/gcc/config/aarch64/aarch64.c index 2e385c4..46bd78b 100644
>> --- a/gcc/config/aarch64/aarch64.c
>> +++ b/gcc/config/aarch64/aarch64.c
>> @@ -1650,7 +1650,8 @@ aarch64_split_simd_combine (rtx dst, rtx src1, 
>> rtx src2)
>>  
>>gcc_assert (VECTOR_MODE_P (dst_mode));
>>  
>> -  if (REG_P (dst) && REG_P (src1) && REG_P (src2))
>> +  if (register_operand (dst, dst_mode) && register_operand (src1, src_mode)
>> +  && register_operand (src2, src_mode))
>>  {
>>rtx (*gen) (rtx, rtx, rtx);
>>  
>>
> 
> 
> pr7057v4.patch
> 
> 
> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index c462164..3043f81 100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -2796,38 +2796,10 @@
> (match_operand:VDC 2 "register_operand")]
>