Re: [PATCH][vectorizer][2/2] Hook up mult synthesis logic into vectorisation of mult-by-constant

2016-07-05 Thread Kyrill Tkachov


On 01/07/16 13:02, Richard Biener wrote:

On Thu, 30 Jun 2016, Kyrill Tkachov wrote:


On 28/06/16 08:54, Richard Biener wrote:

On Thu, 16 Jun 2016, Kyrill Tkachov wrote:


On 15/06/16 22:53, Marc Glisse wrote:

On Wed, 15 Jun 2016, Kyrill Tkachov wrote:


This is a respin of
https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00952.html following
feedback.
I've changed the code to cast the operand to an unsigned type before
applying the multiplication algorithm
and cast it back to the signed type at the end.
Whether to perform the cast is now determined by the function
cast_mult_synth_to_unsigned in which I've implemented
the cases that Marc mentioned in [1]. Please do let me know
if there are any other cases that need to be handled.

Ah, I never meant those cases as an exhaustive list, I was just looking
for
examples showing that the transformation was unsafe, and those 2 came to
mind:

- x*15 -> x*16-x the second one can overflow even when the first one
doesn't.

- x*-2 -> -(x*2) can overflow when the result is INT_MIN (maybe that's
redundant with the negate_variant check?)

On the other hand, as long as we remain in the 'positive' operations,
turning x*3 to x<<1+x seems perfectly safe. And even x*30 to (x*16-x)*2
cannot cause spurious overflows. But I didn't look at the algorithm
closely
enough to characterize the safe cases. Now if you have done it, that's
good
:-) Otherwise, we might want to err on the side of caution.


I'll be honest, I didn't give it much thought beyond convincing myself
that
the two cases you listed are legitimate.
Looking at expand_mult_const in expmed.c can be helpful (where it updates
val_so_far for checking purposes) to see
the different algorithm cases. I think the only steps that could cause
overflow are alg_sub_t_m2, alg_sub_t2_m and alg_sub_factor or when the
final
step is negate_variant, which are what you listed (and covered in this
patch).

richi is away on PTO for the time being though, so we have some time to
convince ourselves :)

;)  I think the easiest way would be to always use unsigned arithmetic.

While VRP doesn't do anything on vector operations we still have some
match.pd patterns that rely on correct overflow behavior and those
may be enabled for vector types as well.

That's fine with me.
Here's the patch that performs the casts to unsigned and back when the input
type doesn't wrap on overflow.
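
[Editorial sketch, not part of the patch.] The overflow concern discussed above can be reproduced in scalar C. The helper name mul15_via_unsigned is hypothetical and a 32-bit int is assumed. It computes x*15 as x*16 - x in unsigned arithmetic, so the intermediate x*16 may wrap without invoking undefined behaviour, and converts back at the end. For x = 143165576 the product x*15 = 2147483640 fits in int, while the intermediate x*16 would not.

```c
/* Hypothetical helper: synthesize x * 15 as (x << 4) - x.
   The intermediate steps are done in unsigned arithmetic, where
   wrap-around is well defined, and the result is converted back.  */
static int
mul15_via_unsigned (int x)
{
  unsigned int ux = (unsigned int) x;
  unsigned int r = (ux << 4) - ux;
  /* For values above INT_MAX this conversion is implementation-defined
     in ISO C; GCC documents it as reduction modulo 2^N.  */
  return (int) r;
}
```

Doing the same subtraction directly on a signed int would overflow in the x*16 step even though the final product is representable, which is the case the cast to unsigned is meant to avoid.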

Bootstrapped and tested on arm, aarch64, x86_64.

How's this?

+static bool
+target_supports_mult_synth_alg (struct algorithm *alg, mult_variant var,
+tree scaltype)
+{
...
+  tree vectype = get_vectype_for_scalar_type (scaltype);
+
+  if (!vectype)
+return false;
+
+  /* All synthesis algorithms require shifts, so bail out early if
+ target cannot vectorize them.
+ TODO: If the target doesn't support vector shifts but supports
vector
+ addition we could synthesize shifts that way.
vect_synth_mult_by_constant
+ would need to be updated to do that.  */
+  if (!vect_supportable_shift (LSHIFT_EXPR, scaltype))
+return false;

I think these tests should be done in the caller before calling
synth_mult (and vectype be passed down accordingly).  Also I wonder
if synth_mult for a * 2 produces a << 1 or a + a - the case of
a * 2 -> a + a was what Marc's patch handled.  Guarding off LSHIFT_EXPR
support that early will make that fail on targets w/o vector shift.


I believe it depends on the relative rtx costs (which of course are not relevant
at vector gimple level). Anyway, I've moved the check outside.

I've also added code to synthesise the shifts by additions when vector shift
is not available (the new function is synth_lshift_by_additions).
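
[Editorial sketch, not part of the patch.] The body of synth_lshift_by_additions is not shown in this message, so the following is only a scalar illustration of the idea, with a hypothetical name: a left shift by a constant can be built from repeated additions, doubling the value once per bit of shift.

```c
/* Hypothetical scalar analogue: compute x << amount using only
   additions, doubling the value once per bit of shift.  */
static unsigned int
lshift_by_additions (unsigned int x, unsigned int amount)
{
  for (unsigned int i = 0; i < amount; i++)
    x = x + x;
  return x;
}
```

A shift by n costs n additions this way, so it is presumably only attractive when the target has no vector shift at all.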


+  bool supports_vminus = target_has_vecop_for_code (MINUS_EXPR, vectype);
+  bool supports_vplus = target_has_vecop_for_code (PLUS_EXPR, vectype);
+
+  if (var == negate_variant
+  && !target_has_vecop_for_code (NEGATE_EXPR, vectype))
+return false;

I think we don't have any targets that support PLUS but not MINUS
or targets that do not support NEGATE.  OTOH double-checking doesn't
matter.

+apply_binop_and_append_stmt (tree_code code, tree op1, tree op2,
+stmt_vec_info stmt_vinfo)
+{
+  if (TREE_INT_CST_LOW (op2) == 0

integer_zerop (op2)


Ok


+  && (code == LSHIFT_EXPR
+ || code == PLUS_EXPR))

+  tree itype = TREE_TYPE (op2);

I think it's dangerous to use the type of op2 given you synthesize
shifts as well.  Why not use the type of op1?


Yeah, we should be taking the type of op1. Fixed.


+  /* TODO: Targets that don't support vector shifts could synthesize
+them using vector additions.  target_supports_mult_synth_alg
would
+need to be updated to allow this.  */
+  switch (alg.op[i])
+   {

so I suppose one could at least special-case << 1 and always use
PLUS for that.


As said above, I added code to synthesize all constant shifts using additions
if the target doesn't support vector shifts. Note t

Check fpic is ok for target in pr69102.c

2016-07-05 Thread Kito Cheng
Hi all:

pr69102.c uses the -fPIC flag in dg-options but does not check whether it is
available for the target, so I added "dg-require-effective-target fpic" to it.

ChangeLog
2016-07-05  Kito Cheng 

* gcc.c-torture/compile/pr69102.c: Require fpic support.
From caa51d92e620694ee1365ce0f77ac2b152662982 Mon Sep 17 00:00:00 2001
From: Kito Cheng 
Date: Tue, 5 Jul 2016 16:14:45 +0800
Subject: [PATCH] Check fpic for pr69102.c

---
 gcc/testsuite/gcc.c-torture/compile/pr69102.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gcc/testsuite/gcc.c-torture/compile/pr69102.c b/gcc/testsuite/gcc.c-torture/compile/pr69102.c
index 1f0cdc6..5c8c541 100644
--- a/gcc/testsuite/gcc.c-torture/compile/pr69102.c
+++ b/gcc/testsuite/gcc.c-torture/compile/pr69102.c
@@ -1,5 +1,6 @@
 /* { dg-options "-Og -fPIC -fschedule-insns2 -fselective-scheduling2 -fno-tree-fre --param=max-sched-extend-regions-iters=10" } */
 /* { dg-require-effective-target scheduling } */
+/* { dg-require-effective-target fpic } */
 void bar (unsigned int);
 
 void
-- 
1.9.1



Re: [ARM][testsuite] neon-testgen.ml removal

2016-07-05 Thread Christophe Lyon
On 4 July 2016 at 18:28, Kyrill Tkachov  wrote:
> Hi Christophe,
>
>
> On 22/06/16 16:52, Christophe Lyon wrote:
>>
>> Hi,
>>
>> This is a new attempt at removing neon-testgen.ml and generated files.
>>
>> Compared to my previous version several months ago:
>> - I have recently added testcases to make sure we do not lose coverage
>> as described in
>> https://gcc.gnu.org/ml/gcc-patches/2015-11/msg02922.html
>> - I now also remove neon.ml as requested by Kyrylo in
>> https://gcc.gnu.org/ml/gcc-patches/2015-11/msg01664.html, and moved
>> the remaining hand-written tests up to gcc.target/arm.
>>
>> Doing this, I had to slightly update vst1Q_laneu64-1.c because it's
>> now compiled with more pedantic flags and there was a signed/unsigned
>> char buffer pointer mismatch.
>>
>> Sorry, I had to compress the patch, otherwise it's too large and rejected
>> by the list server.
>>
>> OK?
>>
>> Christophe
>
>
> [ARM] neon-testgen.ml, neon.ml and generated files removal.
> gcc/
> 2016-06-17  Christophe Lyon
> * config/arm/neon-testgen.ml: Delete.
> * config/arm/neon.ml: Delete.
> gcc/testsuite/
> 2016-06-17  Christophe Lyon
> * gcc.target/arm/neon/polytypes.c: Move to ...
> * gcc.target/arm/polytypes.c: ... here.
> * gcc.target/arm/neon/pr51534.c: Move to ...
> * gcc.target/arm/pr51534.c: ... here.
> * gcc.target/arm/neon/vect-vcvt.c: Move to ...
> * gcc.target/arm/vect-vcvt.c: ... here.
> * gcc.target/arm/neon/vect-vcvtq.c: Move to ...
> * gcc.target/arm/vect-vcvtq.c: ... here.
> * gcc.target/arm/neon/vfp-shift-a2t2.c: Move to ...
> * gcc.target/arm/vfp-shift-a2t2.c: ... here.
> * gcc.target/arm/neon/vst1Q_laneu64-1.c: Move to ...
> * gcc.target/arm/vst1Q_laneu64-1.c: ... here. Fix foo() prototype.
> * gcc.target/arm/neon/neon.exp: Delete.
> * gcc.target/arm/neon/*.c: Delete.
>
> I think this should be "* gcc.target/arm/neon/: Delete." to make it clear
> that the
> directory is being removed.
>
> This is ok for trunk.
> Thanks for dealing with this!
>
Thanks, committed as r238000 with the suggested ChangeLog change.

Christophe

> Kyrill
>
>


Re: [PATCH, libgcc/ARM 1b/7] Check CLZ availability with ISA support and architecture level macros

2016-07-05 Thread Thomas Preudhomme
[Fixed subject to reflect patch]

Ping?

Best regards,

Thomas

On Monday 27 June 2016 17:51:34 Thomas Preudhomme wrote:
> Hi Ramana,
> 
> On Wednesday 01 June 2016 10:00:52 Ramana Radhakrishnan wrote:
> > From here down to 
> > 
> > > -#if ((__ARM_ARCH__ > 5) && !defined(__ARM_ARCH_6M__)) \
> > > -|| defined(__ARM_ARCH_5E__) || defined(__ARM_ARCH_5TE__) \
> > > -|| defined(__ARM_ARCH_5TEJ__)
> > > -#define HAVE_ARM_CLZ 1
> > > -#endif
> > > -
> > > 
> > >  #ifdef L_clzsi2
> > > 
> > > -#if defined(__ARM_ARCH_6M__)
> > > +#if !__ARM_ARCH_ISA_ARM && __ARM_ARCH_ISA_THUMB == 1
> > > 
> > >  FUNC_START clzsi2
> > >  
> > > mov r1, #28
> > > mov r3, #1
> > > 
> > > @@ -1544,7 +1538,7 @@ FUNC_START clzsi2
> > > 
> > > FUNC_END clzsi2
> > >  
> > >  #else
> > >  ARM_FUNC_START clzsi2
> > > 
> > > -# if defined(HAVE_ARM_CLZ)
> > > +# if defined(__ARM_FEATURE_CLZ)
> > > 
> > > clz r0, r0
> > > RET
> > >  
> > >  # else
> > > 
> > > @@ -1568,15 +1562,15 @@ ARM_FUNC_START clzsi2
> > > 
> > >  .align 2
> > >  1:
> > >  .byte 4, 3, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0
> > > 
> > > -# endif /* !HAVE_ARM_CLZ */
> > > +# endif /* !__ARM_FEATURE_CLZ */
> > > 
> > > FUNC_END clzsi2
> > >  
> > >  #endif
> > >  #endif /* L_clzsi2 */
> > >  
> > >  #ifdef L_clzdi2
> > > 
> > > -#if !defined(HAVE_ARM_CLZ)
> > > +#if !defined(__ARM_FEATURE_CLZ)
> > 
> > here should be its own little patchlet and can go in separately.
> 
> The patch in attachment changes the CLZ availability check in libgcc to test
> the supported ISA and architecture version rather than encode a specific list
> of architectures. __ARM_FEATURE_CLZ is not used because its value depends
> on what mode the user is targeting but only the architecture support
> matters in this case. Indeed, the code using CLZ is written in assembler
> and uses mnemonics available both in ARM and Thumb mode so only CLZ
> availability in one of the modes matters.
> 
> This change was split out from [PATCH, GCC, ARM 1/7] Fix Thumb-1 only ==
> ARMv6-M & Thumb-2 only == ARMv7-M assumptions.
> 
> ChangeLog entry is as follows:
> 
> *** libgcc/ChangeLog ***
> 
> 2016-06-16  Thomas Preud'homme  
> 
> * config/arm/lib1funcs.S (HAVE_ARM_CLZ): Define for ARMv6* or later
> and ARMv5t* rather than for a fixed list of architectures.
> 
> Looking for code generation changes across a number of combinations of ISAs
> (ARM/Thumb), optimization levels (Os/O2), and architectures (armv4, armv4t,
> armv5, armv5t, armv5te, armv6, armv6j, armv6k, armv6s-m, armv6kz, armv6t2,
> armv6z, armv6zk, armv7, armv7-a, armv7e-m, armv7-m, armv7-r, armv7ve,
> armv8-a, armv8-a+crc, iwmmxt and iwmmxt2) shows that only ARMv5T is
> impacted (uses CLZ now). This is expected because currently HAVE_ARM_CLZ is
> not defined for this architecture while the ARMv7-a/ARMv7-R Architecture
> Reference Manual [1] states that all ARMv5T* architectures have CLZ. ARMv5E
> should also be impacted (not using CLZ anymore) but testing it is difficult
> since current binutils does not support ARMv5E.
> 
> [1] Document ARM DDI0406C in http://infocenter.arm.com
> 
> Best regards,
> 
> Thomas
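
[Editorial sketch, not part of the patch.] For readers unfamiliar with the non-CLZ path in the quoted diff, the following C function illustrates a table-driven count-leading-zeros. It reuses the 16-entry table and the starting value 28 visible in the diff; the narrowing steps in between are elided there, so the binary search shown here is an assumption, and the names are hypothetical.

```c
#include <stdint.h>

/* Leading-zero count of a 4-bit value, as in the .byte table above.  */
static const unsigned char clz_nibble[16] =
  { 4, 3, 2, 2, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0 };

/* Hypothetical fallback: narrow x down to its top non-zero nibble,
   tracking how many zero bits lie above that nibble, then finish
   with a table lookup.  Returns 32 for x == 0.  */
static int
clz32_table (uint32_t x)
{
  int base = 28;
  if (x >= 0x10000u) { x >>= 16; base -= 16; }
  if (x >= 0x100u)   { x >>= 8;  base -= 8; }
  if (x >= 0x10u)    { x >>= 4;  base -= 4; }
  return base + clz_nibble[x];
}
```

On architectures where the CLZ instruction is available, the whole function collapses to a single instruction, which is why the availability check matters.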



Re: move increase_alignment from simple to regular ipa pass

2016-07-05 Thread Prathamesh Kulkarni
ping * 2 ping https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01703.html

Thanks,
Prathamesh

On 28 June 2016 at 14:49, Prathamesh Kulkarni
 wrote:
> ping https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01703.html
>
> Thanks,
> Prathamesh
>
> On 23 June 2016 at 22:51, Prathamesh Kulkarni
>  wrote:
>> On 17 June 2016 at 19:52, Prathamesh Kulkarni
>>  wrote:
>>> On 14 June 2016 at 18:31, Prathamesh Kulkarni
>>>  wrote:
 On 13 June 2016 at 16:13, Jan Hubicka  wrote:
>> diff --git a/gcc/cgraph.h b/gcc/cgraph.h
>> index ecafe63..41ac408 100644
>> --- a/gcc/cgraph.h
>> +++ b/gcc/cgraph.h
>> @@ -1874,6 +1874,9 @@ public:
>>   if we did not do any inter-procedural code movement.  */
>>unsigned used_by_single_function : 1;
>>
>> +  /* Set if -fsection-anchors is set.  */
>> +  unsigned section_anchor : 1;
>> +
>>  private:
>>/* Assemble thunks and aliases associated to varpool node.  */
>>void assemble_aliases (void);
>> diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
>> index 4bfcad7..e75d5c0 100644
>> --- a/gcc/cgraphunit.c
>> +++ b/gcc/cgraphunit.c
>> @@ -800,6 +800,9 @@ varpool_node::finalize_decl (tree decl)
>>   it is available to notice_global_symbol.  */
>>node->definition = true;
>>notice_global_symbol (decl);
>> +
>> +  node->section_anchor = flag_section_anchors;
>> +
>>if (TREE_THIS_VOLATILE (decl) || DECL_PRESERVE_P (decl)
>>/* Traditionally we do not eliminate static variables when not
>>optimizing and when not doing toplevel reoder.  */
>> diff --git a/gcc/common.opt b/gcc/common.opt
>> index f0d7196..e497795 100644
>> --- a/gcc/common.opt
>> +++ b/gcc/common.opt
>> @@ -1590,6 +1590,10 @@ fira-algorithm=
>>  Common Joined RejectNegative Enum(ira_algorithm) 
>> Var(flag_ira_algorithm) Init(IRA_ALGORITHM_CB) Optimization
>>  -fira-algorithm=[CB|priority] Set the used IRA algorithm.
>>
>> +fipa-increase_alignment
>> +Common Report Var(flag_ipa_increase_alignment) Init(0) Optimization
>> +Option to gate increase_alignment ipa pass.
>> +
>>  Enum
>>  Name(ira_algorithm) Type(enum ira_algorithm) UnknownError(unknown IRA 
>> algorithm %qs)
>>
>> @@ -2133,7 +2137,7 @@ Common Report Var(flag_sched_dep_count_heuristic) 
>> Init(1) Optimization
>>  Enable the dependent count heuristic in the scheduler.
>>
>>  fsection-anchors
>> -Common Report Var(flag_section_anchors) Optimization
>> +Common Report Var(flag_section_anchors)
>>  Access data in the same section from shared anchor points.
>>
>>  fsee
>> diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
>> index a0db3a4..1482566 100644
>> --- a/gcc/config/aarch64/aarch64.c
>> +++ b/gcc/config/aarch64/aarch64.c
>> @@ -8252,6 +8252,8 @@ aarch64_override_options (void)
>>
>>aarch64_register_fma_steering ();
>>
>> +  /* Enable increase_alignment pass.  */
>> +  flag_ipa_increase_alignment = 1;
>
> I would rather enable it always on targets that do support anchors.
 AFAIK aarch64 supports section anchors.
>> diff --git a/gcc/lto/lto-symtab.c b/gcc/lto/lto-symtab.c
>> index ce9e146..7f09f3a 100644
>> --- a/gcc/lto/lto-symtab.c
>> +++ b/gcc/lto/lto-symtab.c
>> @@ -342,6 +342,13 @@ lto_symtab_merge (symtab_node *prevailing, 
>> symtab_node *entry)
>>   The type compatibility checks or the completing of types has 
>> properly
>>   dealt with most issues.  */
>>
>> +  /* ??? is this assert necessary ?  */
>> +  varpool_node *v_prevailing = dyn_cast <varpool_node *> (prevailing);
>> +  varpool_node *v_entry = dyn_cast <varpool_node *> (entry);
>> +  gcc_assert (v_prevailing && v_entry);
>> +  /* section_anchor of prevailing_decl wins.  */
>> +  v_entry->section_anchor = v_prevailing->section_anchor;
>> +
> Other flags are merged in lto_varpool_replace_node so please move this 
> there.
 Ah indeed, thanks for the pointers.
 I wonder though if we need to set
 prevailing_node->section_anchor = vnode->section_anchor ?
 IIUC, the function merges flags from vnode into prevailing_node
 and removes vnode. However we want prevailing_node->section_anchor
 to always take precedence.
>> +/* Return true if alignment should be increased for this vnode.
>> +   This is done if every function that references/referring to vnode
>> +   has flag_tree_loop_vectorize set.  */
>> +
>> +static bool
>> +increase_alignment_p (varpool_node *vnode)
>> +{
>> +  ipa_ref *ref;
>> +
>> +  for (int i = 0; vnode->iterate_reference (i, ref); i++)
>> +if (cgraph_node *cnode = dyn_cast <cgraph_node *> (ref->referred))
>> +  {
>> + struct cl_optimization *opts = opts_for_fn (cnode->decl);
>> + if (!opts->x_flag_tree_loop_vectorize)
>> +   

Re: [PING] Re: [PATCH] Fix MPX tests on systems with MPX disabled

2016-07-05 Thread Ilya Enkovich
2016-07-04 22:58 GMT+03:00 Andi Kleen :
> Andi Kleen  writes:
>
> PING!
>
>> From: Andi Kleen 
>>
>> I have a Skylake system with MPX in the CPU, but MPX is disabled
>> in the kernel configuration.
>>
>> This makes all the MPX tests fail because they assume if MPX
>> is in CPUID it works
>>
>> Check the output of XGETBV too to detect non MPX kernels.

The patch is OK for trunk and GCC 6 branch

Thanks,
Ilya

>>
>> gcc/testsuite/:
>>
>> 2016-06-25  Andi Kleen  
>>
>>   * gcc.target/i386/mpx/mpx-check.h: Check XGETBV output
>>   if kernel supports MPX.
>> ---
>>  gcc/testsuite/gcc.target/i386/mpx/mpx-check.h | 12 +++-
>>  1 file changed, 11 insertions(+), 1 deletion(-)
>>
>> diff --git a/gcc/testsuite/gcc.target/i386/mpx/mpx-check.h 
>> b/gcc/testsuite/gcc.target/i386/mpx/mpx-check.h
>> index 3afa460..73aa01f 100644
>> --- a/gcc/testsuite/gcc.target/i386/mpx/mpx-check.h
>> +++ b/gcc/testsuite/gcc.target/i386/mpx/mpx-check.h
>> @@ -16,6 +16,16 @@ mpx_test (int, const char **);
>>
>>  #define DEBUG
>>
>> +#define XSTATE_BNDREGS (1 << 3)
>> +
>> +/* This should be an intrinsic, but isn't.  */
>> +static int xgetbv (unsigned x)
>> +{
>> +   unsigned eax, edx;
>> +   asm ("xgetbv" : "=a" (eax), "=d" (edx) : "c" (x));
>> +   return eax;
>> +}
>> +
>>  int
>>  main (int argc, const char **argv)
>>  {
>> @@ -27,7 +37,7 @@ main (int argc, const char **argv)
>>__cpuid_count (7, 0, eax, ebx, ecx, edx);
>>
>>/* Run MPX test only if host has MPX support.  */
>> -  if (ebx & bit_MPX)
>> +  if ((ebx & bit_MPX) && (xgetbv (0) & XSTATE_BNDREGS))
>>  mpx_test (argc, argv);
>>else
>>  {
>
> --
> a...@linux.intel.com -- Speaking for myself only


Re: -fopt-info handling

2016-07-05 Thread Richard Biener
On Mon, Jul 4, 2016 at 11:45 PM, Ulrich Drepper  wrote:
> Anyone?
>
> On Mon, Jun 27, 2016 at 1:31 PM, Ulrich Drepper  wrote:
>> The manual says about -fopt-info:
>>
>>If OPTIONS is omitted, it defaults to 'all-all', which means
>> dump all available optimization info from all the passes.
>>
>> The current implementation (at least in recent gcc 6.1) doesn't follow
>> that, though.  It just ignores the option in that case.
>>
>> How about the attached patch?  It is simple and doesn't duplicate the
>> information about what "all-all" means and instead lets the option parser
>> do the hard work.

I don't think all-all is a useful default.  "optimized" may be though.

Richard.


[PATCH trivial] Fix PR71214 (__cpp_rvalue_references vs. __cpp_rvalue_reference)

2016-07-05 Thread Markus Trippelsdorf
Hi, 

As PR71214 points out, gcc uses the wrong feature-test macro for C++11
rvalue references: __cpp_rvalue_reference instead of the correct
__cpp_rvalue_references.

The fix is trivial. Ok for trunk and active branches?

Thanks.

c-family/ChangeLog

* c-cppbuiltin.c (c_cpp_builtins): Use __cpp_rvalue_references
instead of __cpp_rvalue_reference.

diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c
index 408ad4747a33..f19375a73730 100644
--- a/gcc/c-family/c-cppbuiltin.c
+++ b/gcc/c-family/c-cppbuiltin.c
@@ -847,7 +847,7 @@ c_cpp_builtins (cpp_reader *pfile)
cpp_define (pfile, "__cpp_static_assert=200410");
  cpp_define (pfile, "__cpp_decltype=200707");
  cpp_define (pfile, "__cpp_attributes=200809");
- cpp_define (pfile, "__cpp_rvalue_reference=200610");
+ cpp_define (pfile, "__cpp_rvalue_references=200610");
  cpp_define (pfile, "__cpp_variadic_templates=200704");
  cpp_define (pfile, "__cpp_initializer_lists=200806");
  cpp_define (pfile, "__cpp_delegating_constructors=200604");
diff --git a/gcc/testsuite/g++.dg/cpp1y/feat-cxx11.C 
b/gcc/testsuite/g++.dg/cpp1y/feat-cxx11.C
index 397b9a899573..6928d6bcbd87 100644
--- a/gcc/testsuite/g++.dg/cpp1y/feat-cxx11.C
+++ b/gcc/testsuite/g++.dg/cpp1y/feat-cxx11.C
@@ -77,10 +77,10 @@
 #  error "__cpp_attributes != 200809"
 #endif
 
-#ifndef __cpp_rvalue_reference
-#  error "__cpp_rvalue_reference"
-#elif __cpp_rvalue_reference != 200610
-#  error "__cpp_rvalue_reference != 200610"
+#ifndef __cpp_rvalue_references
+#  error "__cpp_rvalue_references"
+#elif __cpp_rvalue_references != 200610
+#  error "__cpp_rvalue_references != 200610"
 #endif
 
 #ifndef __cpp_variadic_templates
diff --git a/gcc/testsuite/g++.dg/cpp1y/feat-cxx14.C 
b/gcc/testsuite/g++.dg/cpp1y/feat-cxx14.C
index fa59f90fa892..dc30a9b3cf84 100644
--- a/gcc/testsuite/g++.dg/cpp1y/feat-cxx14.C
+++ b/gcc/testsuite/g++.dg/cpp1y/feat-cxx14.C
@@ -70,10 +70,10 @@
 #  error "__cpp_attributes != 200809"
 #endif
 
-#ifndef __cpp_rvalue_reference
-#  error "__cpp_rvalue_reference"
-#elif __cpp_rvalue_reference != 200610
-#  error "__cpp_rvalue_reference != 200610"
+#ifndef __cpp_rvalue_references
+#  error "__cpp_rvalue_references"
+#elif __cpp_rvalue_references != 200610
+#  error "__cpp_rvalue_references != 200610"
 #endif
 
 #ifndef __cpp_variadic_templates
diff --git a/gcc/testsuite/g++.dg/cpp1y/feat-cxx98-neg.C 
b/gcc/testsuite/g++.dg/cpp1y/feat-cxx98-neg.C
index 886b3d3df10e..5fbffabd1396 100644
--- a/gcc/testsuite/g++.dg/cpp1y/feat-cxx98-neg.C
+++ b/gcc/testsuite/g++.dg/cpp1y/feat-cxx98-neg.C
@@ -42,8 +42,8 @@
 #  error "__cpp_attributes" // { dg-error "error" }
 #endif
 
-#ifndef __cpp_rvalue_reference
-#  error "__cpp_rvalue_reference" // { dg-error "error" }
+#ifndef __cpp_rvalue_references
+#  error "__cpp_rvalue_references" // { dg-error "error" }
 #endif
 
 #ifndef __cpp_variadic_templates
diff --git a/gcc/testsuite/g++.dg/cpp1z/feat-cxx1z.C 
b/gcc/testsuite/g++.dg/cpp1z/feat-cxx1z.C
index f8a87a8ddc37..c7becc1cbb47 100644
--- a/gcc/testsuite/g++.dg/cpp1z/feat-cxx1z.C
+++ b/gcc/testsuite/g++.dg/cpp1z/feat-cxx1z.C
@@ -58,10 +58,10 @@
 #  error "__cpp_attributes != 200809"
 #endif
 
-#ifndef __cpp_rvalue_reference
-#  error "__cpp_rvalue_reference"
-#elif __cpp_rvalue_reference != 200610
-#  error "__cpp_rvalue_reference != 200610"
+#ifndef __cpp_rvalue_references
+#  error "__cpp_rvalue_references"
+#elif __cpp_rvalue_references != 200610
+#  error "__cpp_rvalue_references != 200610"
 #endif
 
 #ifndef __cpp_variadic_templates
-- 
Markus


Re: [PATCH] - improve sprintf buffer overflow detection (middle-end/49905)

2016-07-05 Thread Richard Biener
On Mon, 4 Jul 2016, Martin Sebor wrote:

> On 07/04/2016 04:59 AM, Richard Biener wrote:
> > On Fri, 1 Jul 2016, Martin Sebor wrote:
> > 
> > > The attached patch enhances compile-time checking for buffer overflow
> > > and output truncation in non-trivial calls to the sprintf family of
> > > functions under a new option -Wformat-length=[12].  This initial
> > > patch handles printf directives with string, integer, and simple
> > > floating arguments but eventually I'd like to extend it to all other
> > > functions and directives for which it makes sense.
> > > 
> > > I made some choices in the implementation that resulted in trade-offs
> > > in the quality of the diagnostics.  I would be grateful for comments
> > > and suggestions how to improve them.  Besides the list I include
> > > Jakub who already gave me some feedback (thanks), Joseph who as
> > > I understand has deep knowledge of the c-format.c code, and Richard
> > > for his input on the LTO concern below.
> > > 
> > > 1) Making use of -Wformat machinery in c-family/c-format.c.  This
> > > seemed preferable to duplicating some of the same code elsewhere
> > > (I initially started implementing it in expand_builtin in
> > > builtins.c).  It makes the implementation readily extensible
> > > to all the same formats as those already handled for -Wformat.
> > > One drawback is that unlike in expand_builtin, calls to these
> > > functions cannot readily be folded.  Another drawback pointed
> > 
> > folded?  You mean this -W option changes code generation?
> 
> No, it doesn't.  What I meant is that the same code, when added
> in builtins.c instead, could readily be extended to fold into
> strings expressions like
> 
>   sprintf (buf, "%i", 123);
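
[Editorial sketch, not part of the patch.] To make the call above concrete: the number of bytes such a directive produces can be computed at run time with the standard C99 idiom of calling snprintf with a null buffer and size 0. The helper names below are hypothetical; a compile-time check of the kind discussed would need to derive the same number statically.

```c
#include <stdio.h>

/* Hypothetical helper: number of bytes, excluding the terminating NUL,
   that sprintf (buf, "%i", value) would write.  */
static int
formatted_length_int (int value)
{
  return snprintf (NULL, 0, "%i", value);
}

/* Returns 1 if a buffer of buf_size bytes can hold the formatted
   output plus the terminating NUL, 0 otherwise.  */
static int
fits_in_buffer (int value, size_t buf_size)
{
  int len = formatted_length_int (value);
  return len >= 0 && (size_t) len < buf_size;
}
```

For the value 123 the output needs three bytes plus the NUL, so a four-byte buffer is the smallest that does not overflow.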
> 
> > 
> > > out by Jakub is that since the code is only available in the
> > > C and C++ compilers, it apparently may not be available with
> > > an LTO compiler (I don't completely understand this problem
> > > but I mention it in the interest of full disclosure). In light
> > > of the dependency in (2) below, I don't see a way to avoid it
> > > (moving c-format.c to the middle end was suggested but seemed
> > > like too much of a change to me).
> > 
> > Yes, lto1 is not linked with C_COMMON_OBJS (that could be changed
> > of course at the expense of dragging in some dead code).  Moving
> > all the format stuff to the middle-end (or separated better so
> > the overhead in lto1 is lower) would be possible as well.
> > 
> > That said, a langhook as you add it highlights the issue with LTO.
> 
> Thanks for the clarification.  IIUC, there are at least three
> possibilities for how to proceed: leave it as is (no checking
> with LTO), link LTO with C_COMMON_OBJS, or move the c-format.c
> code into the middle end.  Do you have a preference for one of
> these?  Or is there another solution that I missed?

Another solution is to implement this at some point before LTO
bytecode is streamed out, thus at the end of early optimizations.

I'm not sure linking with C_COMMON_OBJS does even work (you can try).
Likewise c-format.c may be too entangled with the FEs (maybe just
linking with c-format.o is enough?).

Richard.

> 
> FWIW, I would expect a good number of other warnings to benefit
> from optimization and having a general solution for this problem
> to be helpful.  I also suspect this isn't the first time this
> issue has come up.  I'm wondering what solutions have already
> been considered and with what pros and cons (naively, I would
> think that factoring the relevant code out of cc1 into a shared
> library that lto1 could load should work).
> 
> Martin
> 
> 

-- 
Richard Biener 
SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
21284 (AG Nuernberg)


Re: [PATCH][vectorizer][2/2] Hook up mult synthesis logic into vectorisation of mult-by-constant

2016-07-05 Thread Richard Biener
On Tue, 5 Jul 2016, Kyrill Tkachov wrote:

> 
> On 01/07/16 13:02, Richard Biener wrote:
> > On Thu, 30 Jun 2016, Kyrill Tkachov wrote:
> > 
> > > On 28/06/16 08:54, Richard Biener wrote:
> > > > On Thu, 16 Jun 2016, Kyrill Tkachov wrote:
> > > > 
> > > > > On 15/06/16 22:53, Marc Glisse wrote:
> > > > > > On Wed, 15 Jun 2016, Kyrill Tkachov wrote:
> > > > > > 
> > > > > > > This is a respin of
> > > > > > > https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00952.html following
> > > > > > > feedback.
> > > > > > > I've changed the code to cast the operand to an unsigned type
> > > > > > > before
> > > > > > > applying the multiplication algorithm
> > > > > > > and cast it back to the signed type at the end.
> > > > > > > Whether to perform the cast is now determined by the function
> > > > > > > cast_mult_synth_to_unsigned in which I've implemented
> > > > > > > the cases that Marc mentioned in [1]. Please do let me know
> > > > > > > if there are any other cases that need to be handled.
> > > > > > Ah, I never meant those cases as an exhaustive list, I was just
> > > > > > looking
> > > > > > for
> > > > > > examples showing that the transformation was unsafe, and those 2
> > > > > > came to
> > > > > > mind:
> > > > > > 
> > > > > > - x*15 -> x*16-x the second one can overflow even when the first one
> > > > > > doesn't.
> > > > > > 
> > > > > > - x*-2 -> -(x*2) can overflow when the result is INT_MIN (maybe
> > > > > > that's
> > > > > > redundant with the negate_variant check?)
> > > > > > 
> > > > > > On the other hand, as long as we remain in the 'positive'
> > > > > > operations,
> > > > > > turning x*3 to x<<1+x seems perfectly safe. And even x*30 to
> > > > > > (x*16-x)*2
> > > > > > cannot cause spurious overflows. But I didn't look at the algorithm
> > > > > > closely
> > > > > > enough to characterize the safe cases. Now if you have done it,
> > > > > > that's
> > > > > > good
> > > > > > :-) Otherwise, we might want to err on the side of caution.
> > > > > > 
> > > > > I'll be honest, I didn't give it much thought beyond convincing myself
> > > > > that
> > > > > the two cases you listed are legitimate.
> > > > > Looking at expand_mult_const in expmed.c can be helpful (where it
> > > > > updates
> > > > > val_so_far for checking purposes) to see
> > > > > the different algorithm cases. I think the only steps that could cause
> > > > > overflow are alg_sub_t_m2, alg_sub_t2_m and alg_sub_factor or when the
> > > > > final
> > > > > step is negate_variant, which are what you listed (and covered in this
> > > > > patch).
> > > > > 
> > > > > richi is away on PTO for the time being though, so we have some time
> > > > > to
> > > > > convince ourselves :)
> > > > ;)  I think the easiest way would be to always use unsigned arithmetic.
> > > > 
> > > > While VRP doesn't do anything on vector operations we still have some
> > > > match.pd patterns that rely on correct overflow behavior and those
> > > > may be enabled for vector types as well.
> > > That's fine with me.
> > > Here's the patch that performs the casts to unsigned and back when the
> > > input
> > > type doesn't wrap on overflow.
> > > 
> > > Bootstrapped and tested on arm, aarch64, x86_64.
> > > 
> > > How's this?
> > +static bool
> > +target_supports_mult_synth_alg (struct algorithm *alg, mult_variant var,
> > +tree scaltype)
> > +{
> > ...
> > +  tree vectype = get_vectype_for_scalar_type (scaltype);
> > +
> > +  if (!vectype)
> > +return false;
> > +
> > +  /* All synthesis algorithms require shifts, so bail out early if
> > + target cannot vectorize them.
> > + TODO: If the target doesn't support vector shifts but supports
> > vector
> > + addition we could synthesize shifts that way.
> > vect_synth_mult_by_constant
> > + would need to be updated to do that.  */
> > +  if (!vect_supportable_shift (LSHIFT_EXPR, scaltype))
> > +return false;
> > 
> > I think these tests should be done in the caller before calling
> > synth_mult (and vectype be passed down accordingly).  Also I wonder
> > if synth_mult for a * 2 produces a << 1 or a + a - the case of
> > a * 2 -> a + a was what Marc's patch handled.  Guarding off LSHIFT_EXPR
> > support that early will make that fail on targets w/o vector shift.
> 
> I believe it depends on the relative rtx costs (which of course are not
> relevant
> at vector gimple level). Anyway, I've moved the check outside.
> 
> I've also added code to synthesise the shifts by additions when vector shift
> is not available (the new function is synth_lshift_by_additions).
> 
> > +  bool supports_vminus = target_has_vecop_for_code (MINUS_EXPR, vectype);
> > +  bool supports_vplus = target_has_vecop_for_code (PLUS_EXPR, vectype);
> > +
> > +  if (var == negate_variant
> > +  && !target_has_vecop_for_code (NEGATE_EXPR, vectype))
> > +return false;
> > 
> > I think we don't have any targets that support PLUS but not MINUS
> > or targets that do not support NEGA

Re: [PATCH][vectorizer][2/2] Hook up mult synthesis logic into vectorisation of mult-by-constant

2016-07-05 Thread Marc Glisse

On Tue, 5 Jul 2016, Kyrill Tkachov wrote:

As for testing I've bootstrapped and tested the patch on aarch64 and 
x86_64 with synth_shift_p in vect_synth_mult_by_constant hacked to be 
always true to exercise the paths that synthesize the shift by 
additions. Marc, could you test this on the sparc targets you care 
about?


I don't have access to Sparc, you want Rainer here (added in Cc:).

(thanks for completing the patch!)

--
Marc Glisse


[Ada] Minor changes in gnat_to_gnu_entity

2016-07-05 Thread Eric Botcazou
This avoids doing useless work at the local level in gnat_to_gnu_entity.

Tested on x86_64-suse-linux, applied on the mainline.


2016-07-05  Eric Botcazou  

* gcc-interface/decl.c (gnat_to_gnu_entity): Invoke global_bindings_p
last when possible.  Do not call elaborate_expression_2 on offsets in
local record types and avoid useless processing for constant offsets.
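
The reordering of the conditions relies on short-circuit evaluation: the right operand of && is not evaluated when the left one is false. A minimal sketch of the effect, assuming the function call is the costlier test; costly_check, test_call_first and test_call_last are made-up names, with costly_check standing in for a predicate such as global_bindings_p:

```cpp
// Counts how often the stand-in predicate is actually evaluated.
static int costly_calls = 0;

// Made-up stand-in for a predicate that is a function call.
static bool costly_check()
{
    ++costly_calls;
    return true;
}

// Old ordering: the call is always made, even when cheap_flag is false.
bool test_call_first(bool cheap_flag)
{
    return costly_check() && cheap_flag;
}

// New ordering: the call is skipped whenever cheap_flag is false.
bool test_call_last(bool cheap_flag)
{
    return cheap_flag && costly_check();
}
```

Both functions return the same value for every input; only the number of calls differs.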

-- 
Eric Botcazou

Index: gcc-interface/decl.c
===
--- gcc-interface/decl.c	(revision 237999)
+++ gcc-interface/decl.c	(working copy)
@@ -798,10 +798,10 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 		mutable_p = true;
 	  }
 
-	/* If we are at global level and the size isn't constant, call
+	/* If the size isn't constant and we are at global level, call
 	   elaborate_expression_1 to make a variable for it rather than
 	   calculating it each time.  */
-	if (global_bindings_p () && !TREE_CONSTANT (gnu_size))
+	if (!TREE_CONSTANT (gnu_size) && global_bindings_p ())
 	  gnu_size = elaborate_expression_1 (gnu_size, gnat_entity,
 		 "SIZE", definition, false);
 	  }
@@ -1366,10 +1366,10 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 	   than the largest stack alignment the back-end can honor, resort to
 	   a variable of "aligning type".  */
 	if (definition
-	&& !global_bindings_p ()
-	&& !static_flag
+	&& TYPE_ALIGN (gnu_type) > BIGGEST_ALIGNMENT
 	&& !imported_p
-	&& TYPE_ALIGN (gnu_type) > BIGGEST_ALIGNMENT)
+	&& !static_flag
+	&& !global_bindings_p ())
 	  {
 	/* Create the new variable.  No need for extra room before the
 	   aligned field as this is in automatic storage.  */
@@ -2679,10 +2679,10 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 	  TYPE_STUB_DECL (gnu_type)
 	= create_type_stub_decl (gnu_entity_name, gnu_type);
 
-	  /* If we are at file level and this is a multi-dimensional array,
+	  /* If this is a multi-dimensional array and we are at global level,
 	 we need to make a variable corresponding to the stride of the
 	 inner dimensions.   */
-	  if (global_bindings_p () && ndim > 1)
+	  if (ndim > 1 && global_bindings_p ())
 	{
 	  tree gnu_arr_type;
 
@@ -4587,10 +4587,10 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 	 a constant or self-referential, call elaborate_expression_1 to
 	 make a variable for the size rather than calculating it each time.
 	 Handle both the RM size and the actual size.  */
-  if (global_bindings_p ()
-	  && TYPE_SIZE (gnu_type)
+  if (TYPE_SIZE (gnu_type)
 	  && !TREE_CONSTANT (TYPE_SIZE (gnu_type))
-	  && !CONTAINS_PLACEHOLDER_P (TYPE_SIZE (gnu_type)))
+	  && !CONTAINS_PLACEHOLDER_P (TYPE_SIZE (gnu_type))
+	  && global_bindings_p ())
 	{
 	  tree size = TYPE_SIZE (gnu_type);
 
@@ -4672,11 +4672,10 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 	}
 	}
 
-  /* If this is a record type or subtype, call elaborate_expression_2 on
-	 any field position.  Do this for both global and local types.
-	 Skip any fields that we haven't made trees for to avoid problems with
-	 class wide types.  */
-  if (IN (kind, Record_Kind))
+  /* Similarly, if this is a record type or subtype at global level, call
+	 elaborate_expression_2 on any field position.  Skip any fields that
+	 we haven't made trees for to avoid problems with class-wide types.  */
+  if (IN (kind, Record_Kind) && global_bindings_p ())
 	for (gnat_temp = First_Entity (gnat_entity); Present (gnat_temp);
 	 gnat_temp = Next_Entity (gnat_temp))
 	  if (Ekind (gnat_temp) == E_Component && present_gnu_tree (gnat_temp))
@@ -4685,7 +4684,8 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 
 	  /* ??? For now, store the offset as a multiple of the alignment
 		 in bytes so that we can see the alignment from the tree.  */
-	  if (!CONTAINS_PLACEHOLDER_P (DECL_FIELD_OFFSET (gnu_field)))
+	  if (!TREE_CONSTANT (DECL_FIELD_OFFSET (gnu_field))
+		  && !CONTAINS_PLACEHOLDER_P (DECL_FIELD_OFFSET (gnu_field)))
 		{
 		  DECL_FIELD_OFFSET (gnu_field)
 		= elaborate_expression_2 (DECL_FIELD_OFFSET (gnu_field),
@@ -4696,8 +4696,7 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 		  /* ??? The context of gnu_field is not necessarily gnu_type
 		 so the MULT_EXPR node built above may not be marked by
 		 the call to create_type_decl below.  */
-		  if (global_bindings_p ())
-		MARK_VISITED (DECL_FIELD_OFFSET (gnu_field));
+		  MARK_VISITED (DECL_FIELD_OFFSET (gnu_field));
 		}
 	}
 


Re: [PATCH trivial] Fix PR71214 (__cpp_rvalue_references vs. __cpp_rvalue_reference)

2016-07-05 Thread Richard Biener
On Tue, Jul 5, 2016 at 12:07 PM, Markus Trippelsdorf
 wrote:
> Hi,
>
> as PR71214 points out gcc uses a wrong feature test macro for C++11
> rvalue references: __cpp_rvalue_reference instead of the correct
> __cpp_rvalue_references.
>
> The fix is trivial. Ok for trunk and active branches?

I wonder if we should retain the (bogus) old defines for backward
compatibility.

Does anyone use those?

Richard.

> Thanks.
>
> c-family/ChangeLog
>
> * c-cppbuiltin.c (c_cpp_builtins): Use __cpp_rvalue_references
> instead of __cpp_rvalue_reference.
>
> diff --git a/gcc/c-family/c-cppbuiltin.c b/gcc/c-family/c-cppbuiltin.c
> index 408ad4747a33..f19375a73730 100644
> --- a/gcc/c-family/c-cppbuiltin.c
> +++ b/gcc/c-family/c-cppbuiltin.c
> @@ -847,7 +847,7 @@ c_cpp_builtins (cpp_reader *pfile)
> cpp_define (pfile, "__cpp_static_assert=200410");
>   cpp_define (pfile, "__cpp_decltype=200707");
>   cpp_define (pfile, "__cpp_attributes=200809");
> - cpp_define (pfile, "__cpp_rvalue_reference=200610");
> + cpp_define (pfile, "__cpp_rvalue_references=200610");
>   cpp_define (pfile, "__cpp_variadic_templates=200704");
>   cpp_define (pfile, "__cpp_initializer_lists=200806");
>   cpp_define (pfile, "__cpp_delegating_constructors=200604");
> diff --git a/gcc/testsuite/g++.dg/cpp1y/feat-cxx11.C 
> b/gcc/testsuite/g++.dg/cpp1y/feat-cxx11.C
> index 397b9a899573..6928d6bcbd87 100644
> --- a/gcc/testsuite/g++.dg/cpp1y/feat-cxx11.C
> +++ b/gcc/testsuite/g++.dg/cpp1y/feat-cxx11.C
> @@ -77,10 +77,10 @@
>  #  error "__cpp_attributes != 200809"
>  #endif
>
> -#ifndef __cpp_rvalue_reference
> -#  error "__cpp_rvalue_reference"
> -#elif __cpp_rvalue_reference != 200610
> -#  error "__cpp_rvalue_reference != 200610"
> +#ifndef __cpp_rvalue_references
> +#  error "__cpp_rvalue_references"
> +#elif __cpp_rvalue_references != 200610
> +#  error "__cpp_rvalue_references != 200610"
>  #endif
>
>  #ifndef __cpp_variadic_templates
> diff --git a/gcc/testsuite/g++.dg/cpp1y/feat-cxx14.C 
> b/gcc/testsuite/g++.dg/cpp1y/feat-cxx14.C
> index fa59f90fa892..dc30a9b3cf84 100644
> --- a/gcc/testsuite/g++.dg/cpp1y/feat-cxx14.C
> +++ b/gcc/testsuite/g++.dg/cpp1y/feat-cxx14.C
> @@ -70,10 +70,10 @@
>  #  error "__cpp_attributes != 200809"
>  #endif
>
> -#ifndef __cpp_rvalue_reference
> -#  error "__cpp_rvalue_reference"
> -#elif __cpp_rvalue_reference != 200610
> -#  error "__cpp_rvalue_reference != 200610"
> +#ifndef __cpp_rvalue_references
> +#  error "__cpp_rvalue_references"
> +#elif __cpp_rvalue_references != 200610
> +#  error "__cpp_rvalue_references != 200610"
>  #endif
>
>  #ifndef __cpp_variadic_templates
> diff --git a/gcc/testsuite/g++.dg/cpp1y/feat-cxx98-neg.C 
> b/gcc/testsuite/g++.dg/cpp1y/feat-cxx98-neg.C
> index 886b3d3df10e..5fbffabd1396 100644
> --- a/gcc/testsuite/g++.dg/cpp1y/feat-cxx98-neg.C
> +++ b/gcc/testsuite/g++.dg/cpp1y/feat-cxx98-neg.C
> @@ -42,8 +42,8 @@
>  #  error "__cpp_attributes" // { dg-error "error" }
>  #endif
>
> -#ifndef __cpp_rvalue_reference
> -#  error "__cpp_rvalue_reference" // { dg-error "error" }
> +#ifndef __cpp_rvalue_references
> +#  error "__cpp_rvalue_references" // { dg-error "error" }
>  #endif
>
>  #ifndef __cpp_variadic_templates
> diff --git a/gcc/testsuite/g++.dg/cpp1z/feat-cxx1z.C 
> b/gcc/testsuite/g++.dg/cpp1z/feat-cxx1z.C
> index f8a87a8ddc37..c7becc1cbb47 100644
> --- a/gcc/testsuite/g++.dg/cpp1z/feat-cxx1z.C
> +++ b/gcc/testsuite/g++.dg/cpp1z/feat-cxx1z.C
> @@ -58,10 +58,10 @@
>  #  error "__cpp_attributes != 200809"
>  #endif
>
> -#ifndef __cpp_rvalue_reference
> -#  error "__cpp_rvalue_reference"
> -#elif __cpp_rvalue_reference != 200610
> -#  error "__cpp_rvalue_reference != 200610"
> +#ifndef __cpp_rvalue_references
> +#  error "__cpp_rvalue_references"
> +#elif __cpp_rvalue_references != 200610
> +#  error "__cpp_rvalue_references != 200610"
>  #endif
>
>  #ifndef __cpp_variadic_templates
> --
> Markus
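
Regarding the backward-compatibility question: user code that has to build with both fixed and unfixed compilers could test both spellings itself. The following is only a sketch; HAVE_RVALUE_REFERENCES and have_rvalue_references are names made up for this example and are not defined by GCC.

```cpp
// Accept the correct macro and the misspelled one from unfixed compilers.
#if defined(__cpp_rvalue_references) || defined(__cpp_rvalue_reference)
#  define HAVE_RVALUE_REFERENCES 1
#else
#  define HAVE_RVALUE_REFERENCES 0
#endif

// Made-up helper so the result of the check is visible at run time.
bool have_rvalue_references()
{
    return HAVE_RVALUE_REFERENCES != 0;
}
```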


Re: [PATCH PR c/71699] Handle pointer arithmetic in nonzero tree checks

2016-07-05 Thread Richard Biener
On Fri, Jul 1, 2016 at 3:10 PM, Manish Goregaokar  wrote:
> Added a test:

Ok if this passed bootstrap/regtest.

Richard.

> gcc/ChangeLog:
> PR c/71699
> * fold-const.c (tree_binary_nonzero_warnv_p): Allow
> pointer addition to also be considered nonzero.
>
> gcc/testsuite/ChangeLog:
> PR c/71699
> * c-c++-common/pointer-addition-nonnull.c: New test for
> pointer addition.
> ---
>  gcc/fold-const.c|  3 +++
>  .../c-c++-common/pointer-addition-nonnull.c | 21 
> +
>  2 files changed, 24 insertions(+)
>  create mode 100644 gcc/testsuite/c-c++-common/pointer-addition-nonnull.c
>
> diff --git a/gcc/fold-const.c b/gcc/fold-const.c
> index 3b9500d..0d82018 100644
> --- a/gcc/fold-const.c
> +++ b/gcc/fold-const.c
> @@ -13199,6 +13199,9 @@ tree_binary_nonzero_warnv_p (enum tree_code code,
>switch (code)
>  {
>  case POINTER_PLUS_EXPR:
> +  return flag_delete_null_pointer_checks
> +&& (tree_expr_nonzero_warnv_p (op0, strict_overflow_p)
> +|| tree_expr_nonzero_warnv_p (op1, strict_overflow_p));
>  case PLUS_EXPR:
>if (ANY_INTEGRAL_TYPE_P (type) && TYPE_OVERFLOW_UNDEFINED (type))
>  {
> diff --git a/gcc/testsuite/c-c++-common/pointer-addition-nonnull.c
> b/gcc/testsuite/c-c++-common/pointer-addition-nonnull.c
> new file mode 100644
> index 000..10bc04c
> --- /dev/null
> +++ b/gcc/testsuite/c-c++-common/pointer-addition-nonnull.c
> @@ -0,0 +1,21 @@
> +/* { dg-options "-c -O2 -Wall" } */
> +
> +char *xstrdup (const char *) __attribute__ ((__returns_nonnull__));
> +
> +#define PREFIX "some "
> +
> +int
> +main ()
> +{
> +  char *saveptr;
> +  char *name = xstrdup (PREFIX "name");
> +
> +  char *tail = name + sizeof (PREFIX) - 1;
> +
> +  if (tail == 0)
> +tail = saveptr;
> +  while (*tail == ' ')
> +++tail;
> +
> +  return 0;
> +}
> \ No newline at end of file
> --
> 2.8.3
>
> -Manish
>
>
> On Fri, Jul 1, 2016 at 1:40 PM, Richard Biener
>  wrote:
>> On Thu, Jun 30, 2016 at 5:14 PM, Marc Glisse  wrote:
>>> On Thu, 30 Jun 2016, Richard Biener wrote:
>>>
>>>> points-to analysis already has the constraint that POINTER_PLUS_EXPR
>>>> cannot leave the object op0 points to.  Of course currently nothing uses
>>>> the fact whether points-to computes pointed-to as nothing (aka NULL) - so
>>>> the argument may be moot.
>>>>
>>>> Anyway, one of my points to the original patch was that POINTER_PLUS_EXPR
>>>> handling should be clearly separate from PLUS_EXPR and that we have
>>>> flag_delete_null_pointer_checks to allow targets to declare that 0 is a
>>>> valid object pointer (and thus you can do 4 + -4 and reach NULL).
>>>
>>>
>>> Thanks. So the tricky point is that we are not allowed to transform g into f
>>> below:
>>>
>>> char*f(char*p){return p+4;}
>>> char*g(char*p){return (char*)((intptr_t)p+4);}
>>>
>>> That makes sense and seems much easier to guarantee than I feared, nice.
>>>
>>> (on the other hand, only RTL is able to simplify (long)p+4-(long)(p+4))
>>
>> Hmm, yeah - we have some match.pd stuff to handle these kind of cases,
>> like p + ((long)p2 - (long)p1)) and also (long)(p + x) - (long)p.
>>
>> OTOH to handle (long)p + 4 - (long)(p + 4) the only thing we need is to
>> transform (long)(p + 4) to (long)p + 4 ... that would simplify things but
>> of course we cannot ever undo that canonicalization if the result is
>> ever converted back to a pointer.  But maybe we can value-number it
>> the same with some tricks... (might be worth to file a bugreport)
>>
>> Richard.
>>
>>> --
>>> Marc Glisse
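
The two functions from the discussion above, written out as a compilable sketch. Both yield the same address for a valid object; the difference is only in what the optimizer may assume. Under the rule discussed in this thread, f performs pointer arithmetic and its result may be treated as nonzero when p is nonzero (with -fdelete-null-pointer-checks), whereas g goes through an integer addition and carries no such guarantee. The reinterpret_cast spelling is an adaptation of the C casts in the mail.

```cpp
#include <cstdint>

// Pointer arithmetic: the result must stay within the object p points to.
char *f(char *p)
{
    return p + 4;
}

// Integer arithmetic on the address: no "stays in the object" constraint,
// so this must not be rewritten into f by the compiler.
char *g(char *p)
{
    return reinterpret_cast<char *>(reinterpret_cast<std::intptr_t>(p) + 4);
}
```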


[patch] Fix type merging deficiency during WPA

2016-07-05 Thread Eric Botcazou
Hi,

the deficiency comes from a chicken-and-egg problem during WPA: DECL nodes 
merging depends on type merging, but type merging also depends on DECL nodes 
merging for dynamic types declared at file scope, which easily occurs in Ada.

For the attached trivial testcase, the compiler issues:

/home/eric/svn/gcc/gcc/testsuite/gnat.dg/lto18_pkg1.ads:12:13: warning: type 
of 'lto18_pkg1__proc' does not match original declaration [-Wlto-type-
mismatch]
/home/eric/svn/gcc/gcc/testsuite/gnat.dg/lto18_pkg1.adb:3:3: note: 
'lto18_pkg1__proc' was previously declared here
/home/eric/svn/gcc/gcc/testsuite/gnat.dg/lto18_pkg1.adb:3:3: note: code may be 
misoptimized unless -fno-strict-aliasing is used

The proposed fix is to add a special processing in operand_equal_p/add_expr 
for DECL nodes during WPA.  It contains a tweak for lto_fixup_prevailing_decls 
for the sake of completeness, but it is not necessary for fixing the problem.

Tested on x86_64-suse-linux, OK for the mainline?


2016-07-05  Eric Botcazou  

* cgraph.h (symbol_table::decl_assembler_name_hash): Make public.
* fold-const.c (operand_equal_p) : Add special
processing during WPA.
* tree.c (add_expr) : Likewise.
lto/
* lto.c (walk_simple_constant_arithmetic): New function.
(LTO_SET_PREVAIL): Use it to fix up DECL nodes in simple expressions.


2016-07-05  Eric Botcazou  

* gnat.dg/lto18.adb: New test.
* gnat.dg/lto18_pkg1.ad[sb]: New helper.
* gnat.dg/lto18_pkg2.ad[sb]: Likewise.

-- 
Eric Botcazou

Index: cgraph.h
===
--- cgraph.h	(revision 237999)
+++ cgraph.h	(working copy)
@@ -2175,6 +2175,9 @@ public:
   /* Set the DECL_ASSEMBLER_NAME and update symtab hashtables.  */
   void change_decl_assembler_name (tree decl, tree name);
 
+  /* Hash asmnames ignoring the user specified marks.  */
+  static hashval_t decl_assembler_name_hash (const_tree asmname);
+
   /* Return true if assembler names NAME1 and NAME2 leads to the same symbol
  name.  */
   static bool assembler_names_equal_p (const char *name1, const char *name2);
@@ -2243,9 +2246,6 @@ private:
   /* Remove NODE from assembler name hash.  */
   void unlink_from_assembler_name_hash (symtab_node *node, bool with_clones);
 
-  /* Hash asmnames ignoring the user specified marks.  */
-  static hashval_t decl_assembler_name_hash (const_tree asmname);
-
   /* Compare ASMNAME with the DECL_ASSEMBLER_NAME of DECL.  */
   static bool decl_assembler_name_equal (tree decl, const_tree asmname);
 
Index: fold-const.c
===
--- fold-const.c	(revision 237999)
+++ fold-const.c	(working copy)
@@ -3226,6 +3226,17 @@ operand_equal_p (const_tree arg0, const_
 	}
 
 case tcc_declaration:
+  /* During WPA we can be called before the DECL nodes are merged.  */
+  if (flag_wpa
+	  && TREE_CODE (arg0) == VAR_DECL
+	  && (TREE_PUBLIC (arg0) || DECL_EXTERNAL (arg0))
+	  && (TREE_PUBLIC (arg1) || DECL_EXTERNAL (arg1)))
+	return symbol_table::assembler_names_equal_p
+		 (IDENTIFIER_POINTER
+		(DECL_ASSEMBLER_NAME (const_cast <tree> (arg0))),
+		  IDENTIFIER_POINTER
+		(DECL_ASSEMBLER_NAME (const_cast <tree> (arg1))));
+
   /* Consider __builtin_sqrt equal to sqrt.  */
   return (TREE_CODE (arg0) == FUNCTION_DECL
 	  && DECL_BUILT_IN (arg0) && DECL_BUILT_IN (arg1)
Index: lto/lto.c
===
--- lto/lto.c	(revision 237999)
+++ lto/lto.c	(working copy)
@@ -2506,15 +2506,55 @@ lto_wpa_write_files (void)
   timevar_pop (TV_WHOPR_WPA_IO);
 }
 
+/* Look inside *EXPR into simple arithmetic operations involving constants.
+   Return the address of the outermost non-arithmetic or non-constant node.
+   This should be equivalent to tree.c:skip_simple_constant_arithmetic.  */
 
-/* If TT is a variable or function decl replace it with its
-   prevailing variant.  */
+static tree *
+walk_simple_constant_arithmetic (tree *expr)
+{
+  while (TREE_CODE (*expr) == NON_LVALUE_EXPR)
+expr = &TREE_OPERAND (*expr, 0);
+
+  while (true)
+{
+  if (UNARY_CLASS_P (*expr))
+	expr = &TREE_OPERAND (*expr, 0);
+  else if (BINARY_CLASS_P (*expr))
+	{
+	  if (TREE_CONSTANT (TREE_OPERAND (*expr, 1)))
+	expr = &TREE_OPERAND (*expr, 0);
+	  else if (TREE_CONSTANT (TREE_OPERAND (*expr, 0)))
+	expr = &TREE_OPERAND (*expr, 1);
+	  else
+	break;
+	}
+  else
+	break;
+}
+
+  return expr;
+}
+
+/* If TT is a variable or function decl or a simple expression containing one,
+   replace the occurrence with its prevailing variant.  This should be able to
+   deal with all the expressions attached to _DECL and _TYPE nodes which were
+   streamed into the GIMPLE IR.  */
 #define LTO_SET_PREVAIL(tt) \
   do {\
+tree *loc; \
 if ((tt) && VAR_OR_FUNCTION_DECL_P (tt) \
 	&& (TREE_PUBLIC (tt) || DECL_EXTERNAL (tt))) \
   { \
-tt = lto_symtab_prev

Re: [PATCH][vectorizer][2/2] Hook up mult synthesis logic into vectorisation of mult-by-constant

2016-07-05 Thread Rainer Orth
Marc Glisse  writes:

> On Tue, 5 Jul 2016, Kyrill Tkachov wrote:
>
>> As for testing I've bootstrapped and tested the patch on aarch64 and
>> x86_64 with synth_shift_p in vect_synth_mult_by_constant hacked to be
>> always true to exercise the paths that synthesize the shift by
>> additions. Marc, could you test this on the sparc targets you care about?
>
> I don't have access to Sparc, you want Rainer here (added in Cc:).

As is, the patch doesn't even build on current mainline:

/vol/gcc/src/hg/trunk/local/gcc/tree-vect-patterns.c:2151:56: error: 
'mult_variant' has not been declared
 target_supports_mult_synth_alg (struct algorithm *alg, mult_variant var,
^

enum mult_variant is only declared in expmed.c.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [PATCH, rs6000] Use direct moves for __builtin_signbit for 128-bit floating-point

2016-07-05 Thread Segher Boessenkool
On Fri, Jul 01, 2016 at 08:04:46PM -0500, Bill Schmidt wrote:
> Bootstrapped and tested on powerpc64le-unknown-linux-gnu with no regressions.
> Is this ok for trunk, and eventual 6.2 backport?

Okay for both.  Just a few cosmetics:

> +  /* If this is a VSX register, generate the special mfvsrd instruction
> +  to get it in a GPR.  Until we support SF and DF modes, that will
> + always be true.  */

This last line is indented wrong (needs a tab).

> +;; Optimize signbit on 64-bit systems with direct move to avoid doing the 
> store
> +;; and load

Sentences end with a full stop.

Thanks,


Segher


Re: [PATCH][vectorizer][2/2] Hook up mult synthesis logic into vectorisation of mult-by-constant

2016-07-05 Thread Kyrill Tkachov


On 05/07/16 12:24, Rainer Orth wrote:
> Marc Glisse  writes:
>
>> On Tue, 5 Jul 2016, Kyrill Tkachov wrote:
>>
>>> As for testing I've bootstrapped and tested the patch on aarch64 and
>>> x86_64 with synth_shift_p in vect_synth_mult_by_constant hacked to be
>>> always true to exercise the paths that synthesize the shift by
>>> additions. Marc, could you test this on the sparc targets you care about?
>>
>> I don't have access to Sparc, you want Rainer here (added in Cc:).
>
> As is, the patch doesn't even build on current mainline:
>
> /vol/gcc/src/hg/trunk/local/gcc/tree-vect-patterns.c:2151:56: error:
> 'mult_variant' has not been declared
>   target_supports_mult_synth_alg (struct algorithm *alg, mult_variant var,
>  ^
>
> enum mult_variant is only declared in expmed.c.


Ah, sorry, I forgot to mention that this is patch 2/2.
It requires https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01144.html, which is
already approved, but I was planning to commit it together with this one.
Can you please try applying
https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01144.html
as well as this?

Thanks,
Kyrill


Rainer





Re: [patch] Fix type merging deficiency during WPA

2016-07-05 Thread Richard Biener
On Tue, Jul 5, 2016 at 12:57 PM, Eric Botcazou  wrote:
> Hi,
>
> the deficiency comes from a chicken-and-egg problem during WPA: DECL nodes
> merging depends on type merging, but type merging also depends on DECL nodes
> merging for dynamic types declared at file scope, which easily occurs in Ada.

So this is something like (invalid C)

t.h
---
int n;
struct X { int x[n]; };

t1.c
--
#include "t.h"
struct X x;
t2.c
--
#include "t.h"
struct X x;

?

It's not obvious from the fix (which I think is in the wrong place) which
operand_equal/hash call during WPA this is supposed to fix.  So can you
please provide a little more context here?

Thanks,
Richard.

> For the attached trivial testcase, the compiler issues:
>
> /home/eric/svn/gcc/gcc/testsuite/gnat.dg/lto18_pkg1.ads:12:13: warning: type
> of 'lto18_pkg1__proc' does not match original declaration [-Wlto-type-
> mismatch]
> /home/eric/svn/gcc/gcc/testsuite/gnat.dg/lto18_pkg1.adb:3:3: note:
> 'lto18_pkg1__proc' was previously declared here
> /home/eric/svn/gcc/gcc/testsuite/gnat.dg/lto18_pkg1.adb:3:3: note: code may be
> misoptimized unless -fno-strict-aliasing is used
>
> The proposed fix is to add a special processing in operand_equal_p/add_expr
> for DECL nodes during WPA.  It contains a tweak for lto_fixup_prevailing_decls
> for the sake of completeness, but it is not necessary for fixing the problem.
>
> Tested on x86_64-suse-linux, OK for the mainline?
>
>
> 2016-07-05  Eric Botcazou  
>
> * cgraph.h (symbol_table::decl_assembler_name_hash): Make public.
> * fold-const.c (operand_equal_p) : Add special
> processing during WPA.
> * tree.c (add_expr) : Likewise.
> lto/
> * lto.c (walk_simple_constant_arithmetic): New function.
> (LTO_SET_PREVAIL): Use it to fix up DECL nodes in simple expressions.
>
>
> 2016-07-05  Eric Botcazou  
>
> * gnat.dg/lto18.adb: New test.
> * gnat.dg/lto18_pkg1.ad[sb]: New helper.
> * gnat.dg/lto18_pkg2.ad[sb]: Likewise.
>
> --
> Eric Botcazou


Re: [PATCH][expr.c] PR middle-end/71700: zero-extend sub-word value when widening constructor element

2016-07-05 Thread Kyrill Tkachov

Hi Bernd,

On 04/07/16 19:02, Bernd Schmidt wrote:
> On 07/01/2016 11:18 AM, Kyrill Tkachov wrote:
>> In this arm wrong-code PR the struct assignment goes wrong when
>> expanding constructor elements to a register destination
>> when the constructor elements are signed bitfields less than a word wide.
>> In this testcase we're initialising a struct with a 16-bit signed
>> bitfield to -1 followed by a 1-bit bitfield to 0.
>> Before it starts storing the elements it zeroes out the register.
>> The code in store_constructor extends the first field to word size
>> because it appears at the beginning of a word.
>> It sign-extends the -1 to word size. However, when it later tries to
>> store the 0 to bitposition 16 it has some logic
>> to avoid redundant zeroing since the destination was originally cleared,
>> so it doesn't emit the zero store.
>> But the previous sign-extended -1 took up the whole word, so the
>> position of the second bitfield contains a set bit.
>>
>> This patch fixes the problem by zeroing out the bits of the widened
>> field that did not appear in the original value,
>> so that we can safely avoid storing the second zero in the constructor.
>>
>> [...]
>>
>> Bootstrapped and tested on arm, aarch64, x86_64 though the codepath is
>> gated on WORD_REGISTER_OPERATIONS I didn't
>> expect any effect on aarch64 and x86_64 anyway.
>
> So - that code path starts with this comment:
>
> /* If this initializes a field that is smaller than a
>    word, at the start of a word, try to widen it to a full
>    word.  This special case allows us to output C++ member
>    function initializations in a form that the optimizers
>    can understand.  */
>
> Doesn't your patch completely defeat the purpose of this? Would you get
> better/identical code by just deleting this block? It seems unfortunate to
> have two different code generation approaches like this.
>
> It would be interesting to know the effects of your patch, and the effects
> of removing this code entirely, on generated code. Try to find the
> motivating C++ member function example perhaps? Maybe another possibility
> is to ensure this doesn't happen if the value would be interpreted as signed.




Doing some archaeology shows this code was added back in 1998 with no testcase
(r22567) so I'd have to do more digging.

My interpretation of that comment was that for WORD_REGISTER_OPERATIONS targets
it's more beneficial to have word-size operations, so the expansion code tries
to emit as many of the operations in word_mode as it safely can.
With my patch it still emits a word_mode operation, it's just that the
immediate that is moved in word_mode has its top bits zeroed out instead of
sign-extended.

I'll build SPEC2006 on arm (a WORD_REGISTER_OPERATIONS target) and inspect the
assembly to see if I see any interesting effects, but I suspect there won't be
much change.

Thanks for looking at this,
Kyrill
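
To make the failure mode concrete, here is a small sketch of the two ways of widening the 16-bit field value to a word. It assumes a 32-bit word with the first bitfield in bits 0..15 and the 1-bit field at bit position 16, as in the description of the PR; widen_sign, widen_zero and second_field_bit are illustrative names, not GCC functions.

```cpp
#include <cstdint>

// Widen a 16-bit field value to a 32-bit word by sign extension
// (the behaviour that clobbers the neighbouring bitfield).
std::uint32_t widen_sign(std::int16_t v)
{
    return static_cast<std::uint32_t>(static_cast<std::int32_t>(v));
}

// Widen by zero extension: bits 16..31 stay clear, which is what the
// skipped "store 0 to bit 16" relies on.
std::uint32_t widen_zero(std::int16_t v)
{
    return static_cast<std::uint16_t>(v);
}

// Value of the 1-bit field assumed to live at bit position 16.
bool second_field_bit(std::uint32_t word)
{
    return ((word >> 16) & 1u) != 0;
}
```

With the value -1, sign extension produces 0xFFFFFFFF and the second field reads as set, while zero extension produces 0x0000FFFF and leaves it clear.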




Bernd




[PATCH] Fix loop distribution cost model

2016-07-05 Thread Richard Biener

The loop combining partitions because of cost modeling is too optimistic
in skipping partition pairs to check.  The hoisting patch exposes this,
the following fixes it.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2016-07-05  Richard Biener  

* tree-loop-distribution.c (distribute_loop): Fix issue with
the cost model loop.
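
The problem with the single pass can be reproduced with a toy model. This is a sketch, not the code in tree-loop-distribution.c: a partition is modelled as the set of array ids it accesses, "similar accesses" as sharing at least one id, and Partition, similar and fuse are names invented for the example.

```cpp
#include <cstddef>
#include <set>
#include <vector>

using Partition = std::set<int>;  // ids of the arrays a partition accesses

// Toy notion of "similar memory accesses": the partitions share an array.
static bool similar(const Partition &a, const Partition &b)
{
    for (int id : a)
        if (b.count(id) != 0)
            return true;
    return false;
}

// Fuse partitions with similar accesses and return how many remain.
// With retry == false this mimics the single pass; with retry == true
// partition i is revisited after anything was merged into it.
std::size_t fuse(std::vector<Partition> parts, bool retry)
{
    std::size_t i = 0;
    while (i < parts.size()) {
        bool changed = false;
        for (std::size_t j = i + 1; j < parts.size(); ++j) {
            if (!similar(parts[i], parts[j]))
                continue;
            parts[i].insert(parts[j].begin(), parts[j].end());
            parts[j] = parts.back();  // unordered removal of slot j
            parts.pop_back();
            --j;                      // re-examine what moved into slot j
            changed = true;
        }
        if (!(changed && retry))
            ++i;
    }
    return parts.size();
}
```

With partitions {0}, {1} and {0,1}, the single pass merges the third into the first but never reconsiders the second, leaving two partitions; with the retry all three end up fused.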

Index: gcc/tree-loop-distribution.c
===
*** gcc/tree-loop-distribution.c(revision 237955)
--- gcc/tree-loop-distribution.c(working copy)
*** distribute_loop (struct loop *loop, vec<
*** 1504,1509 
--- 1517,1523 
   memory accesses.  */
for (i = 0; partitions.iterate (i, &into); ++i)
  {
+   bool changed = false;
if (partition_builtin_p (into))
continue;
for (int j = i + 1;
*** distribute_loop (struct loop *loop, vec<
*** 1524,1531 
--- 1538,1552 
  partitions.unordered_remove (j);
  partition_free (partition);
  j--;
+ changed = true;
}
}
+   /* If we fused 0 1 2 in step 1 to 0,2 1 as 0 and 2 have similar
+  accesses when 1 and 2 have similar accesses but not 0 and 1
+then in the next iteration we will fail to consider merging
+1 into 0,2.  So try again if we did any merging into 0.  */
+   if (changed)
+   i--;
  }
  
/* Build the partition dependency graph.  */


[PATCH] Fix path splitting to handle empty else block

2016-07-05 Thread Richard Biener

The following patch fixes a FAIL of gcc.dg/tree-ssa/split-path-5.c
with hoisting enabled which leaves both cases of the testcase as

 if (pred)
   tmp = tmp + 4;

and thus with empty else (by hoisting a conversion).  While this
eventually is if-convertible it shouldn't prevent path splitting
as the case is an IV related increment.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2016-07-05  Richard Biener  

* gimple-ssa-split-paths.c (find_block_to_duplicate_for_splitting_pa):
Handle empty else block.
(is_feasible_trace): Likewise.
(split_paths): Likewise.

Index: gcc/gimple-ssa-split-paths.c
===
--- gcc/gimple-ssa-split-paths.c(revision 237955)
+++ gcc/gimple-ssa-split-paths.c(working copy)
@@ -76,14 +76,19 @@ find_block_to_duplicate_for_splitting_pa
return NULL;
 
  /* And that BB's immediate dominator's successors are the
-predecessors of BB.  */
- if (!find_edge (bb_idom, EDGE_PRED (bb, 0)->src)
- || !find_edge (bb_idom, EDGE_PRED (bb, 1)->src))
+predecessors of BB or BB itself.  */
+ if (!(EDGE_PRED (bb, 0)->src == bb_idom
+   || find_edge (bb_idom, EDGE_PRED (bb, 0)->src))
+ || !(EDGE_PRED (bb, 1)->src == bb_idom
+  || find_edge (bb_idom, EDGE_PRED (bb, 1)->src)))
return NULL;
 
- /* And that the predecessors of BB each have a single successor.  */
- if (!single_succ_p (EDGE_PRED (bb, 0)->src)
- || !single_succ_p (EDGE_PRED (bb, 1)->src))
+ /* And that the predecessors of BB each have a single successor
+or are BB's immediate domiator itself.  */
+ if (!(EDGE_PRED (bb, 0)->src == bb_idom
+   || single_succ_p (EDGE_PRED (bb, 0)->src))
+ || !(EDGE_PRED (bb, 1)->src == bb_idom
+  || single_succ_p (EDGE_PRED (bb, 1)->src)))
return NULL;
 
  /* So at this point we have a simple diamond for an IF-THEN-ELSE
@@ -148,8 +153,10 @@ is_feasible_trace (basic_block bb)
   basic_block pred1 = EDGE_PRED (bb, 0)->src;
   basic_block pred2 = EDGE_PRED (bb, 1)->src;
   int num_stmts_in_join = count_stmts_in_block (bb);
-  int num_stmts_in_pred1 = count_stmts_in_block (pred1);
-  int num_stmts_in_pred2 = count_stmts_in_block (pred2);
+  int num_stmts_in_pred1
+= EDGE_COUNT (pred1->succs) == 1 ? count_stmts_in_block (pred1) : 0;
+  int num_stmts_in_pred2
+= EDGE_COUNT (pred2->succs) == 1 ? count_stmts_in_block (pred2) : 0;
 
   /* This is meant to catch cases that are likely opportunities for
  if-conversion.  Essentially we look for the case where
@@ -292,6 +299,8 @@ split_paths ()
 "Duplicating join block %d into predecessor paths\n",
 bb->index);
  basic_block pred0 = EDGE_PRED (bb, 0)->src;
+ if (EDGE_COUNT (pred0->succs) != 1)
+   pred0 = EDGE_PRED (bb, 1)->src;
  transform_duplicate (pred0, bb);
  changed = true;
 


Re: [PATCH][RTL ifcvt] PR rtl-optimization/71594: ICE in noce_emit_cmove due to mismatched source modes

2016-07-05 Thread Kyrill Tkachov


On 04/07/16 12:19, Bernd Schmidt wrote:
> On 07/04/2016 01:18 PM, Kyrill Tkachov wrote:
>> That does seem like it could cause trouble but I couldn't think of how
>> that sequence could appear or what its
>> semantics would be. Would assigning to the SImode reg 0 in your example
>> not touch the upper bits of the DImode value?
>
> No, multi-word subreg accesses are per-word.
>
>> In any case, bb_ok_for_noce_convert_multiple_sets doesn't keep track of
>> dependencies between the instructions
>> so I think the best place to handle this case would be in
>> noce_convert_multiple_sets where instead of the assert
>> in this patch we'd just end_sequence () and return FALSE.
>> Would that be preferable?
>
> That should at least work, and I'd be ok with that.



Ok, here's the updated patch with the assert replaced by failing the conversion.
Bootstrapped and tested on x86_64. Also tested on aarch64.

Is this ok?

Thanks,
Kyrill

2016-07-05  Kyrylo Tkachov  

PR rtl-optimization/71594
* ifcvt.c (noce_convert_multiple_sets): Wrap new_val or old_val
into subregs of appropriate mode before trying to emit a conditional
move.

2016-07-05  Kyrylo Tkachov  

PR rtl-optimization/71594
* gcc.dg/torture/pr71594.c: New test.
diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index fd2951673fb6bd6d9e5d52cdb88765434a603fb6..f7f120e04b11dc4f25be969e0c183a36e067a61c 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -3228,6 +3228,41 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
   if (if_info->then_else_reversed)
 	std::swap (old_val, new_val);
 
+
+  /* We allow simple lowpart register subreg SET sources in
+	 bb_ok_for_noce_convert_multiple_sets.  Be careful when processing
+	 sequences like:
+	 (set (reg:SI r1) (reg:SI r2))
+	 (set (reg:HI r3) (subreg:HI (r1)))
+	 For the second insn new_val or old_val (r1 in this example) will be
+	 taken from the temporaries and have the wider mode which will not
+	 match with the mode of the other source of the conditional move, so
+	 we'll end up trying to emit r4:HI = cond ? (r1:SI) : (r3:HI).
+	 Wrap the two cmove operands into subregs if appropriate to prevent
+	 that.  */
+  if (GET_MODE (new_val) != GET_MODE (temp))
+	{
+	  machine_mode src_mode = GET_MODE (new_val);
+	  machine_mode dst_mode = GET_MODE (temp);
+	  if (GET_MODE_SIZE (src_mode) <= GET_MODE_SIZE (dst_mode))
+	{
+	  end_sequence ();
+	  return FALSE;
+	}
+	  new_val = lowpart_subreg (dst_mode, new_val, src_mode);
+	}
+  if (GET_MODE (old_val) != GET_MODE (temp))
+	{
+	  machine_mode src_mode = GET_MODE (old_val);
+	  machine_mode dst_mode = GET_MODE (temp);
+	  if (GET_MODE_SIZE (src_mode) <= GET_MODE_SIZE (dst_mode))
+	{
+	  end_sequence ();
+	  return FALSE;
+	}
+	  old_val = lowpart_subreg (dst_mode, old_val, src_mode);
+	}
+
   /* Actually emit the conditional move.  */
   rtx temp_dest = noce_emit_cmove (if_info, temp, cond_code,
    x, y, new_val, old_val);
diff --git a/gcc/testsuite/gcc.dg/torture/pr71594.c b/gcc/testsuite/gcc.dg/torture/pr71594.c
new file mode 100644
index ..468a9f6891c92ff76520af0eee242f08b01ae0cf
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/torture/pr71594.c
@@ -0,0 +1,15 @@
+/* { dg-do compile } */
+/* { dg-options "--param max-rtl-if-conversion-insns=2" } */
+
+unsigned short a;
+int b, c;
+int *d;
+void fn1() {
+  *d = 24;
+  for (; *d <= 65;) {
+unsigned short *e = &a;
+b = (a &= 0 <= 0) < (c ?: (*e %= *d));
+for (; *d <= 83;)
+  ;
+  }
+}


Re: Determine more IVs to be non-overflowing

2016-07-05 Thread Richard Biener
On Sun, 3 Jul 2016, Jan Hubicka wrote:

> Hi,
> this is an updated version of the patch. I finally convinced myself to read a
> bit of wide-int.h and learnt some new things, like that they exist in multiple
> precisions.  I always thought of wide-int as a wider version of HOST_WIDE_INT
> that can hold all of the target data types, with a not-so-handy API because it
> lacks the signed flag :)
> 
> The version below stays in the type's precision and uses the overflow flags.
> If the step's range is positive it checks that:
> 
>   step_max * nit <= type_max-base_max
> 
> in unsigned arithmetic (all values are positive). The right hand side cannot
> overflow (INT_MAX-INT_MIN is representable as unsigned int) and for the left
> hand side I use the unsigned mult with overflow check.
> 
> If the step can also be negative I also check:
> 
>   step_min * (-nit) <= base_min-type_min
> 
> Clearly this path triggers only for signed types with defined overflow (as
> otherwise nowrap_type_p returns true and iv_can_overflow_p is never used). It is
> not very important right now. It will make more sense once ivopts is updated
> to use the no-wrap flag rather than no-overflow and we will be able to have
> unsigned IVs going down.
> 
> I also added a sanity check that the base's range is representable in the
> type's range.  This is only because some uses of IVs seem a bit sloppy WRT
> nops and it may cause the right hand side of the comparison to underflow.
> 
> For loop infrastructure it would be very useful to have a helper which can
> determine value range of an expression based on value range of individual
> SSA_NAMEs in it.  Would that be easy to get out of tree-vrp?
>   
> Bootstrapped/regtested x86_64-linux, OK?
> 
>   * gcc.dg/tree-ssa/scev-14.c: new testcase.
> 
>   * tree-scalar-evolution.c (iv_can_overflow_p): New function.
>   (simple_iv): Use it; use nowrap_type_p to check if type can overflow.
>   * tree-ssa-loop-niter.c (nowrap_type_p): Use ANY_INTEGRAL_TYPE_P.
> Index: testsuite/gcc.dg/tree-ssa/scev-14.c
> ===
> --- testsuite/gcc.dg/tree-ssa/scev-14.c   (revision 0)
> +++ testsuite/gcc.dg/tree-ssa/scev-14.c   (working copy)
> @@ -0,0 +1,11 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2 -fdump-tree-ivopts-details" } */
> +int a[100];
> +void t(unsigned int n)
> +{
> +  unsigned int i;
> +  for (i=0; i<n; i++)
> +    a[i]++;
> +}
> +/* { dg-final { scan-tree-dump "Overflowness wrto loop niter:
> No-overflow"  "ivopts" } } */
> +/* { dg-final { scan-tree-dump-not "Overflowness wrto loop niter:
> Overflow" "ivopts" } } */
> Index: tree-scalar-evolution.c
> ===
> --- tree-scalar-evolution.c   (revision 237908)
> +++ tree-scalar-evolution.c   (working copy)
> @@ -3309,6 +3309,89 @@ scev_reset (void)
>  }
>  }
>  
> +/* Return true if the IV calculation in TYPE can overflow based on the 
> knowledge
> +   of the upper bound on the number of iterations of LOOP, the BASE and STEP
> +   of IV.
> +
> +   We do not use information whether TYPE can overflow so it is safe to
> +   use this test even for derived IVs not computed every iteration or
> +   hypotetical IVs to be inserted into code.  */
> +
> +static bool
> +iv_can_overflow_p (struct loop *loop, tree type, tree base, tree step)
> +{
> +  widest_int nit;
> +  wide_int base_min, base_max, step_min, step_max, type_min, type_max;
> +  signop sgn = TYPE_SIGN (type);
> +
> +  if (integer_zerop (step))
> +return false;
> +
> +  if (TREE_CODE (base) == INTEGER_CST)
> +base_min = base_max = base;
> +  else if (TREE_CODE (base) == SSA_NAME
> +&& INTEGRAL_TYPE_P (TREE_TYPE (base))
> +&& get_range_info (base, &base_min, &base_max) == VR_RANGE)
> +;
> +  else
> +return true;
> +
> +  if (TREE_CODE (step) == INTEGER_CST)
> +step_min = step_max = step;
> +  else if (TREE_CODE (step) == SSA_NAME
> +&& INTEGRAL_TYPE_P (TREE_TYPE (step))
> +&& get_range_info (step, &step_min, &step_max) == VR_RANGE)
> +;
> +  else
> +return true;
> +
> +  if (!get_max_loop_iterations (loop, &nit))
> +return true;
> +
> +  type_min = wi::min_value (type);
> +  type_max = wi::max_value (type);
> +
> +  /* Just sanity check that we don't see values out of the range of the type.
> + In this case the arithmetics bellow would overflow.  */
> +  gcc_checking_assert (wi::ge_p (base_min, type_min, sgn)
> +&& wi::le_p (base_max, type_max, sgn));
> +
> +  /* Account the possible increment in the last ieration.  */
> +  nit = nit + 1;

This can overflow for __uint128_t IV iterating to UINT128_MAX I think
given widest_int has only precision of TImode on x86_64?  An 
after-the-fact check like

   if (nit == 0)
 return true;

does the trick I guess.
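
For illustration only, a minimal sketch of the unsigned case with the after-the-fact check added. This is not the patch itself: uint64_t stands in for both widest_int and the IV type, iv_can_overflow_u64 is a made-up name, and __builtin_mul_overflow (a GCC/Clang builtin) replaces wi::mul with its overflow flag.

```cpp
#include <cstdint>
#include <limits>

// Toy model of the unsigned path: returns true when base + step * (nit + 1)
// might not fit in the type.  uint64_t plays the role of widest_int here,
// so the "+ 1" can wrap exactly as discussed above.
bool iv_can_overflow_u64(std::uint64_t base_max, std::uint64_t step_max,
                         std::uint64_t nit)
{
  if (step_max == 0)
    return false;

  // Account for the possible increment in the last iteration.
  nit = nit + 1;

  // After-the-fact check: if the increment wrapped to zero, give up.
  if (nit == 0)
    return true;

  // step_max * nit <= type_max - base_max, watching the multiplication.
  std::uint64_t product;
  if (__builtin_mul_overflow(step_max, nit, &product))
    return true;
  return product > std::numeric_limits<std::uint64_t>::max() - base_max;
}
```

With nit equal to the maximum value the increment wraps to 0 and the function conservatively answers true, which is the case the extra check is meant to catch.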

Otherwise the patch looks ok now - thanks for the extensive comments ;)

Richard.

> +
> +  /* NIT is typeless and can exceed the precision of the

Re: Determine more IVs to be non-overflowing

2016-07-05 Thread Marc Glisse

On Tue, 5 Jul 2016, Richard Biener wrote:


given widest_int has only precision of TImode on x86_64?


Is that the case? The comments say:

 It is really finite precision math where the precision is 4 times the
 size of the largest integer that the target port can represent.

And the target has

/* Keep the OI and XI modes from confusing the compiler into thinking
   that these modes could actually be used for computation.  They are
   only holders for vectors during data movement.  */
#define MAX_BITSIZE_MODE_ANY_INT (128)

I would thus expect widest_int to have at least 512 bits on x86_64 (possibly 
more depending on the exact definition of largest integer).


--
Marc Glisse


Re: Determine more IVs to be non-overflowing

2016-07-05 Thread Jan Hubicka
> 
> This can overflow for __uint128_t IV iterating to UINT128_MAX I think
> given widest_int has only precision of TImode on x86_64?  An 
> after-the-fact check like
> 
>if (nit == 0)
>  return true;
> 
> does the trick I guess.

OK, I suppose we need to review niter and related code for similar overflow 
possibilities.
This approximation of infinite math precision is a bit iffy :))

I wonder if widest int should not trap on overflows with checking enabled.
> 
> Otherwise the patch looks ok now - thanks for the extensive comments ;)

Thanks for the patience! :)

Honza
> 
> Richard.
> 
> > +
> > +  /* NIT is typeless and can exceed the precision of the type.  In this 
> > case
> > + overflow is always possible, because we know STEP is non-zero.  */
> > +  if (wi::min_precision (nit, UNSIGNED) > TYPE_PRECISION (type))
> > +return true;
> > +  wide_int nit2 = wide_int::from (nit, TYPE_PRECISION (type), UNSIGNED);
> > +
> > +
> > +  /* If step can be positive, check that nit*step <= type_max-base.
> > + This can be done by unsigned arithmetic and we only need to watch 
> > overflow
> > + in the multiplication. The right hand side can always be represented 
> > in
> > + the type.  */
> > +  if (sgn == UNSIGNED || !wi::neg_p (step_max))
> > +{
> > +  bool overflow = false;
> > +  if (wi::gtu_p (wi::mul (step_max, nit2, UNSIGNED, &overflow),
> > +type_max - base_max)
> > + || overflow)
> > +   return true;
> > +}
> > +  /* If step can be negative, check that nit*(-step) <= base_min-type_min. 
> >  */
> > +  if (sgn == SIGNED && wi::neg_p (step_min))
> > +{
> > +  bool overflow = false, overflow2 = false;
> > +  if (wi::gtu_p (wi::mul (wi::neg (step_min, &overflow2),
> > +nit2, UNSIGNED, &overflow),
> > +base_min - type_min)
> > + || overflow || overflow2)
> > +return true;
> > +}
> > +
> > +  return false;
> > +}
> > +
> >  /* Checks whether use of OP in USE_LOOP behaves as a simple affine iv with
> > respect to WRTO_LOOP and returns its base and step in IV if possible
> > (see analyze_scalar_evolution_in_loop for more details on USE_LOOP
> > @@ -3375,8 +3458,12 @@ simple_iv (struct loop *wrto_loop, struc
> >if (tree_contains_chrecs (iv->base, NULL))
> >  return false;
> >  
> > -  iv->no_overflow = (!folded_casts && ANY_INTEGRAL_TYPE_P (type)
> > -&& TYPE_OVERFLOW_UNDEFINED (type));
> > +  iv->no_overflow = !folded_casts && nowrap_type_p (type);
> > +
> > +  if (!iv->no_overflow
> > +  && !iv_can_overflow_p (wrto_loop, type, iv->base, iv->step))
> > +iv->no_overflow = true;
> > +
> >  
> >/* Try to simplify iv base:
> >  
> > Index: tree-ssa-loop-niter.c
> > ===
> > --- tree-ssa-loop-niter.c   (revision 237908)
> > +++ tree-ssa-loop-niter.c   (working copy)
> > @@ -4105,7 +4105,7 @@ n_of_executions_at_most (gimple *stmt,
> >  bool
> >  nowrap_type_p (tree type)
> >  {
> > -  if (INTEGRAL_TYPE_P (type)
> > +  if (ANY_INTEGRAL_TYPE_P (type)
> >&& TYPE_OVERFLOW_UNDEFINED (type))
> >  return true;
> >  
> > 
> > 
> 
> -- 
> Richard Biener 
> SUSE LINUX GmbH, GF: Felix Imendoerffer, Jane Smithard, Graham Norton, HRB 
> 21284 (AG Nuernberg)


Re: Determine more IVs to be non-overflowing

2016-07-05 Thread Jan Hubicka
> On Tue, 5 Jul 2016, Richard Biener wrote:
> 
> >given widest_int has only precision of TImode on x86_64?
> 
> Is that the case? The comments say:
> 
>  It is really finite precision math where the precision is 4 times the
>  size of the largest integer that the target port can represent.
> 
> And the target has
> 
> /* Keep the OI and XI modes from confusing the compiler into thinking
>that these modes could actually be used for computation.  They are
>only holders for vectors during data movement.  */
> #define MAX_BITSIZE_MODE_ANY_INT (128)
> 
> I would thus expect widest_int to have at least 512 bits on x86_64
> (possibly more depending on the exact definition of largest
> integer).

I think that comment is just confusing. (I got trapped by it, too)

/* The MAX_BITSIZE_MODE_ANY_INT is automatically generated by a very
   early examination of the target's mode file.  The WIDE_INT_MAX_ELTS
   can accomodate at least 1 more bit so that unsigned numbers of that
   mode can be represented as a signed value.  Note that it is still
   possible to create fixed_wide_ints that have precisions greater than
   MAX_BITSIZE_MODE_ANY_INT.  This can be useful when representing a
   double-width multiplication result, for example.  */

#define WIDE_INT_MAX_ELTS \
  ((MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT) / HOST_BITS_PER_WIDE_INT)

#define WIDE_INT_MAX_PRECISION (WIDE_INT_MAX_ELTS * HOST_BITS_PER_WIDE_INT)

typedef FIXED_WIDE_INT (WIDE_INT_MAX_PRECISION) widest_int;

My reading is that the type will end up being 128+64 bits, but there is only
one extra bit guaranteed in general, which is taken by the sign.
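
Plugging in the numbers, as a sketch: 128 is the MAX_BITSIZE_MODE_ANY_INT quoted above for x86_64; the 64-bit HOST_WIDE_INT is an assumption on my side, and the constant names are made up to mirror the macros.

```cpp
// Mirrors the macro arithmetic quoted above with plain constants.
constexpr int kMaxBitsizeModeAnyInt = 128;  // value quoted for x86_64
constexpr int kHostBitsPerWideInt = 64;     // assumption: 64-bit HOST_WIDE_INT

// (MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT) / HOST_BITS_PER_WIDE_INT
constexpr int kWideIntMaxElts =
    (kMaxBitsizeModeAnyInt + kHostBitsPerWideInt) / kHostBitsPerWideInt;

// WIDE_INT_MAX_ELTS * HOST_BITS_PER_WIDE_INT
constexpr int kWideIntMaxPrecision = kWideIntMaxElts * kHostBitsPerWideInt;

static_assert(kWideIntMaxElts == 3, "three host words");
static_assert(kWideIntMaxPrecision == 192, "128 + 64 bits");
```

Under those assumptions widest_int has 192 bits of precision, not 512.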

Honza
> 
> -- 
> Marc Glisse


[PATCH PR71734] Add missed check that reference defined inside loop.

2016-07-05 Thread Yuri Rumyantsev
Hi All,

Here is a simple fix to cure regressions introduced by my fix for
70729. Patch also contains minor changes in test found by Jakub.

Bootstrapping and regression testing did not show any new failures.

Is it OK for trunk?

ChangeLog:
2016-07-05  Yuri Rumyantsev  

PR tree-optimization/71734
* tree-ssa-loop-im.c (ref_indep_loop_p_1): Consider REF defined in
LOOP as independent if at least two loop iterations are not dependent.
gcc/testsuite/ChangeLog:
* g++.dg/vect/pr70729.cc: Delete redundant dg options, fix style.


71734.patch
Description: Binary data


[PATCH][AArch64] Improve Cortex-A53 integer scheduler

2016-07-05 Thread Wilco Dijkstra
This patch improves the accuracy of the Cortex-A53 integer scheduler,
resulting in performance gains across a wide range of benchmarks.

OK for commit?

ChangeLog:
2016-07-05  Wilco Dijkstra  

* config/arm/cortex-a53.md: Use final_presence_set for in-order.
(cortex_a53_shift): Add mov_shift.
(cortex_a53_shift_reg): Add new reservation for register shifts.
(cortex_a53_alu): Remove bfm.
(cortex_a53_alu_shift): Add bfm, remove mov_shift.
(cortex_a53_alu_extr): Add new reservation for EXTR.
(bypasses): Improve bypass modelling.

---
diff --git a/gcc/config/arm/cortex-a53.md b/gcc/config/arm/cortex-a53.md
index 
fc60bc26c7caf7e94064d7f292b877b12f333fca..70c0f4daabe0ccb8e32808f1af51f5460e087a18
 100644
--- a/gcc/config/arm/cortex-a53.md
+++ b/gcc/config/arm/cortex-a53.md
@@ -30,6 +30,7 @@
 
 (define_cpu_unit "cortex_a53_slot0" "cortex_a53")
 (define_cpu_unit "cortex_a53_slot1" "cortex_a53")
+(final_presence_set "cortex_a53_slot1" "cortex_a53_slot0")
 
 (define_reservation "cortex_a53_slot_any"
"cortex_a53_slot0\
@@ -71,41 +72,43 @@
 
 (define_insn_reservation "cortex_a53_shift" 2
   (and (eq_attr "tune" "cortexa53")
-   (eq_attr "type" "adr,shift_imm,shift_reg,mov_imm,mvn_imm"))
+   (eq_attr "type" "adr,shift_imm,mov_imm,mvn_imm,mov_shift"))
   "cortex_a53_slot_any")
 
-(define_insn_reservation "cortex_a53_alu_rotate_imm" 2
+(define_insn_reservation "cortex_a53_shift_reg" 2
   (and (eq_attr "tune" "cortexa53")
-   (eq_attr "type" "rotate_imm"))
-  "(cortex_a53_slot1)
-   | (cortex_a53_single_issue)")
+   (eq_attr "type" "shift_reg,mov_shift_reg"))
+  "cortex_a53_slot_any+cortex_a53_hazard")
 
 (define_insn_reservation "cortex_a53_alu" 3
   (and (eq_attr "tune" "cortexa53")
(eq_attr "type" "alu_imm,alus_imm,logic_imm,logics_imm,
alu_sreg,alus_sreg,logic_reg,logics_reg,
adc_imm,adcs_imm,adc_reg,adcs_reg,
-   bfm,csel,clz,rbit,rev,alu_dsp_reg,
-   mov_reg,mvn_reg,
-   mrs,multiple,no_insn"))
+   csel,clz,rbit,rev,alu_dsp_reg,
+   mov_reg,mvn_reg,mrs,multiple,no_insn"))
   "cortex_a53_slot_any")
 
 (define_insn_reservation "cortex_a53_alu_shift" 3
   (and (eq_attr "tune" "cortexa53")
(eq_attr "type" "alu_shift_imm,alus_shift_imm,
crc,logic_shift_imm,logics_shift_imm,
-   alu_ext,alus_ext,
-   extend,mov_shift,mvn_shift"))
+   alu_ext,alus_ext,bfm,extend,mvn_shift"))
   "cortex_a53_slot_any")
 
 (define_insn_reservation "cortex_a53_alu_shift_reg" 3
   (and (eq_attr "tune" "cortexa53")
(eq_attr "type" "alu_shift_reg,alus_shift_reg,
logic_shift_reg,logics_shift_reg,
-   mov_shift_reg,mvn_shift_reg"))
+   mvn_shift_reg"))
   "cortex_a53_slot_any+cortex_a53_hazard")
 
-(define_insn_reservation "cortex_a53_mul" 3
+(define_insn_reservation "cortex_a53_alu_extr" 3
+  (and (eq_attr "tune" "cortexa53")
+   (eq_attr "type" "rotate_imm"))
+  "cortex_a53_slot1|cortex_a53_single_issue")
+
+(define_insn_reservation "cortex_a53_mul" 4
   (and (eq_attr "tune" "cortexa53")
(ior (eq_attr "mul32" "yes")
(eq_attr "mul64" "yes")))
@@ -189,49 +192,43 @@
 (define_insn_reservation "cortex_a53_branch" 0
   (and (eq_attr "tune" "cortexa53")
(eq_attr "type" "branch,call"))
-  "cortex_a53_slot_any,cortex_a53_branch")
+  "cortex_a53_slot_any+cortex_a53_branch")
 
 
 ;; General-purpose register bypasses
 
 
-;; Model bypasses for unshifted operands to ALU instructions.
+;; Model bypasses for ALU to ALU instructions.
 
-(define_bypass 1 "cortex_a53_shift"
-"cortex_a53_shift")
+(define_bypass 0 "cortex_a53_shift*"
+"cortex_a53_alu")
 
-(define_bypass 1 "cortex_a53_alu,
- cortex_a53_alu_shift*,
- cortex_a53_alu_rotate_imm,
- cortex_a53_shift"
+(define_bypass 1 "cortex_a53_shift*"
+"cortex_a53_shift*,cortex_a53_alu_*")
+
+(define_bypass 1 "cortex_a53_alu*"
 "cortex_a53_alu")
 
-(define_bypass 2 "cortex_a53_alu,
- cortex_a53_alu_shift*"
+(define_bypass 1 "cortex_a53_alu*"
 "cortex_a53_alu_shift*"
 "aarch_forward_to_shift_is_not_shifted_reg")
 
-;; In our model, we allow any general-purpose register operation to
-;; bypass to the accumulator operand of an integer MADD-like operation.
+(define_bypass 2 "cortex_a53_alu*"
+"cortex_a53_alu_*,cortex_a53_shift*")
 
-(define_bypass 1 "cortex_a53_alu*,
- cortex_a53_load*,
- cortex_a53_mul"
+;; Model a bypass from MUL/MLA to MLA 

Re: [Fortran] Help with STAT= attribute in coarray reference

2016-07-05 Thread Alessandro Fanfarillo
Thanks, committed as rev. 238007.

2016-07-04 14:41 GMT-06:00 Mikael Morin :
> Le 30/06/2016 06:05, Alessandro Fanfarillo a écrit :
>>
>> Dear Mikael,
>>
>> thanks for your review and for the test. The attached patch, built and
>> regtested for x86_64-pc-linux-gnu, addresses all the suggestions.
>>
>> The next patch will change the documentation related to the caf_get
>> and caf_send functions and will add support for STAT= to the sendget
>> function.
>>
>> In the meantime, is this patch OK for trunk?
>>
> Yes, thanks.
>
> Mikael
>
>


Re: [PATCH][RTL ifcvt] PR rtl-optimization/71594: ICE in noce_emit_cmove due to mismatched source modes

2016-07-05 Thread Bernd Schmidt

On 07/05/2016 03:50 PM, Kyrill Tkachov wrote:

Ok, here's the updated patch with the assert replaced by failing the
conversion.
Bootstrapped and tested on x86_64. Also tested on aarch64.

Is this ok?


Sure. Thanks!


Bernd



[PATCH, vec-tails 10/10] Tests

2016-07-05 Thread Ilya Enkovich
Hi,

This patch adds several tests to check tails vectorization functionality.

Thanks,
Ilya
--
gcc/testsuite/

2016-07-05  Ilya Enkovich  

* lib/target-supports.exp (check_avx2_hw_available): New.
(check_effective_target_avx2_runtime): New.
* gcc.dg/vect/vect-tail-combine-1.c: New test.
* gcc.dg/vect/vect-tail-combine-2.c: New test.
* gcc.dg/vect/vect-tail-combine-3.c: New test.
* gcc.dg/vect/vect-tail-combine-4.c: New test.
* gcc.dg/vect/vect-tail-combine-5.c: New test.
* gcc.dg/vect/vect-tail-combine-6.c: New test.
* gcc.dg/vect/vect-tail-combine-7.c: New test.
* gcc.dg/vect/vect-tail-combine-9.c: New test.
* gcc.dg/vect/vect-tail-mask-1.c: New test.
* gcc.dg/vect/vect-tail-mask-2.c: New test.
* gcc.dg/vect/vect-tail-mask-3.c: New test.
* gcc.dg/vect/vect-tail-mask-4.c: New test.
* gcc.dg/vect/vect-tail-mask-5.c: New test.
* gcc.dg/vect/vect-tail-mask-6.c: New test.
* gcc.dg/vect/vect-tail-mask-7.c: New test.
* gcc.dg/vect/vect-tail-mask-8.c: New test.
* gcc.dg/vect/vect-tail-mask-9.c: New test.
* gcc.dg/vect/vect-tail-nomask-1.c: New test.
* gcc.dg/vect/vect-tail-nomask-2.c: New test.
* gcc.dg/vect/vect-tail-nomask-3.c: New test.
* gcc.dg/vect/vect-tail-nomask-4.c: New test.
* gcc.dg/vect/vect-tail-nomask-5.c: New test.
* gcc.dg/vect/vect-tail-nomask-6.c: New test.
* gcc.dg/vect/vect-tail-nomask-7.c: New test.


diff --git a/gcc/testsuite/gcc.dg/vect/vect-tail-combine-1.c 
b/gcc/testsuite/gcc.dg/vect/vect-tail-combine-1.c
new file mode 100644
index 000..134d789
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-tail-combine-1.c
@@ -0,0 +1,106 @@
+/* { dg-do run } */
+/* { dg-require-weak "" } */
+/* { dg-additional-options "-ftree-vectorize-epilogues=combine 
-fvect-epilogue-cost-model=unlimited -mavx2" { target avx2_runtime } } */
+
+#define SIZE 1023
+#define ALIGN 64
+
+extern int posix_memalign(void **memptr, __SIZE_TYPE__ alignment, 
__SIZE_TYPE__ size) __attribute__((weak));
+extern void free (void *);
+
+void __attribute__((noinline))
+test_citer (int * __restrict__ a,
+   int * __restrict__ b,
+   int * __restrict__ c)
+{
+  int i;
+
+  a = (int *)__builtin_assume_aligned (a, ALIGN);
+  b = (int *)__builtin_assume_aligned (b, ALIGN);
+  c = (int *)__builtin_assume_aligned (c, ALIGN);
+
+  for (i = 0; i < SIZE; i++)
+c[i] = a[i] + b[i];
+}
+
+void __attribute__((noinline))
+test_viter (int * __restrict__ a,
+   int * __restrict__ b,
+   int * __restrict__ c,
+   int size)
+{
+  int i;
+
+  a = (int *)__builtin_assume_aligned (a, ALIGN);
+  b = (int *)__builtin_assume_aligned (b, ALIGN);
+  c = (int *)__builtin_assume_aligned (c, ALIGN);
+
+  for (i = 0; i < size; i++)
+c[i] = a[i] + b[i];
+}
+
+void __attribute__((noinline))
+init_data (int * __restrict__ a,
+  int * __restrict__ b,
+  int * __restrict__ c,
+  int size)
+{
+  for (int i = 0; i < size; i++)
+{
+  a[i] = i;
+  b[i] = -i;
+  c[i] = 0;
+  asm volatile("": : :"memory");
+}
+  a[size] = b[size] = c[size] = size;
+}
+
+
+void __attribute__((noinline))
+run_test ()
+{
+  int *a;
+  int *b;
+  int *c;
+  int i;
+
+  if (posix_memalign ((void **)&a, ALIGN, (SIZE + 1) * sizeof (int)) != 0)
+return;
+  if (posix_memalign ((void **)&b, ALIGN, (SIZE + 1) * sizeof (int)) != 0)
+return;
+  if (posix_memalign ((void **)&c, ALIGN, (SIZE + 1) * sizeof (int)) != 0)
+return;
+
+  init_data (a, b, c, SIZE);
+  test_citer (a, b, c);
+  for (i = 0; i < SIZE; i++)
+if (c[i] != a[i] + b[i])
+  __builtin_abort ();
+  if (a[SIZE] != SIZE || b[SIZE] != SIZE || c[SIZE] != SIZE)
+__builtin_abort ();
+
+  init_data (a, b, c, SIZE);
+  test_viter (a, b, c, SIZE);
+  for (i = 0; i < SIZE; i++)
+if (c[i] != a[i] + b[i])
+  __builtin_abort ();
+  if (a[SIZE] != SIZE || b[SIZE] != SIZE || c[SIZE] != SIZE)
+__builtin_abort ();
+
+  free (a);
+  free (b);
+  free (c);
+}
+
+int
+main (int argc, const char **argv)
+{
+  if (!posix_memalign)
+return 0;
+
+  run_test ();
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "LOOP VECTORIZED \\(VS=32\\)" 2 "vect" { 
target avx2_runtime } } } */
+/* { dg-final { scan-tree-dump-times "LOOP EPILOGUE COMBINED \\(VS=32\\)" 2 
"vect" { target avx2_runtime } } } */
diff --git a/gcc/testsuite/gcc.dg/vect/vect-tail-combine-2.c 
b/gcc/testsuite/gcc.dg/vect/vect-tail-combine-2.c
new file mode 100644
index 000..c513c5c
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/vect/vect-tail-combine-2.c
@@ -0,0 +1,134 @@
+/* { dg-do run } */
+/* { dg-require-weak "" } */
+/* { dg-additional-options "-ftree-vectorize-epilogues=combine 
-fvect-epilogue-cost-model=unlimited -mavx2" { target avx2_runtime } } */
+
+#define SIZE 1023
+#define ALIGN 64
+
+extern int posix_memalign(void **memptr, __SIZE_TYPE__

Re: [lra] Cleanup the use of offmemok and don't count spilling cost for it

2016-07-05 Thread Jiong Wang

On 04/07/16 20:44, Vladimir Makarov wrote:

On 06/30/2016 01:22 PM, Jiong Wang wrote:


Here is the patch,
From my understanding, "offmemok" is used to represent a memory operand
whose address we want to reload, and searching for its reference locations
seems to confirm my understanding, as it's always used together with a MEM_P check.

So this patch does the following modifications:

   * Only set offmemok to true if MEM_P is also true, as otherwise offmemok
 is not used.
   * Remove redundant MEM_P check which was used together with offmemok.
   * Avoid the addition of spilling cost if offmemok is true, as an address
 calculation reload is not spilling.

bootstrap & gcc/g++ regression OK on x86_64/aarch64/arm.

OK for trunk?


Yes.  The patch looks OK to me.  Thank you for working on the 
solution, Jiong.  As I wrote, the code is very sensitive and any 
change to it might affect some targets.  Usually patches for this part of 
LRA can take a few iterations.


Thanks for the review Vlad.

As Bernd has concerns about merging MEM_P into offmemok, I committed the
following patch as r238010, which keeps the functional change but without
merging the checks.
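
As a rough restatement of the committed condition (a sketch only: adds_spill_reject is a made-up name and every parameter is a plain value standing in for the corresponding LRA query in process_alt_operands):

```cpp
// Toy restatement of the guard: the spill reject is added only when no
// register class fits, the operand is not an offsettable memory whose
// address will be reloaded, and it is not an unallocated pseudo.
bool adds_spill_reject(bool no_regs_p, bool is_mem, bool offmemok,
                       bool is_reg, int hard_regno)
{
  return no_regs_p
         && !(is_mem && offmemok)          // new: address reload is not a spill
         && !(is_reg && hard_regno < 0);   // existing: pseudo without hard reg
}
```

The MEM_P and offmemok tests stay separate here, matching the version that went in.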

2016-07-05  Jiong Wang

gcc/
  * lra-constraints.c (process_alt_operands): Don't add spilling cost
  for "offmemok".


diff --git a/gcc/lra-constraints.c b/gcc/lra-constraints.c
index bf08dce..e9d3e43 100644
--- a/gcc/lra-constraints.c
+++ b/gcc/lra-constraints.c
@@ -2488,7 +2488,9 @@ process_alt_operands (int only_alternative)
 
 		 Code below increases the reject for both pseudo and non-pseudo
 		 spill.  */
-	  if (no_regs_p && !(REG_P (op) && hard_regno[nop] < 0))
+	  if (no_regs_p
+		  && !(MEM_P (op) && offmemok)
+		  && !(REG_P (op) && hard_regno[nop] < 0))
 		{
 		  if (lra_dump_file != NULL)
 		fprintf


Re: [PATCH trivial] Fix PR71214 (__cpp_rvalue_references vs. __cpp_rvalue_reference)

2016-07-05 Thread Jason Merrill
On Tue, Jul 5, 2016 at 6:44 AM, Richard Biener
 wrote:
> On Tue, Jul 5, 2016 at 12:07 PM, Markus Trippelsdorf
>  wrote:
>> Hi,
>>
>> as PR71214 points out gcc uses a wrong feature test macro for C++11
>> rvalue references: __cpp_rvalue_reference instead of the correct
>> __cpp_rvalue_references.
>>
>> The fix is trivial. Ok for trunk and active branches?
>
> I wonder if we should retain the (bogus) old defines for backward
> compatibility.

Hmm.  The original SD-6 proposal
(http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3694.htm)
used the singular, but it soon changed to plural; I haven't been able
to find any email discussion of the change.  I think let's keep both
defines.
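
For user code that has to cope with compilers exposing either spelling, a check along these lines should work. This is a sketch; has_rvalue_refs is a hypothetical helper name, and only the two macro names come from the thread.

```cpp
// Reports whether the compiler advertises C++11 rvalue references under
// the SD-6 plural spelling or the older singular one.
constexpr bool has_rvalue_refs()
{
#if defined(__cpp_rvalue_references) || defined(__cpp_rvalue_reference)
  return true;
#else
  return false;
#endif
}
```

Keeping both defines means code written against the old singular name keeps compiling the same way.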

Jason


Re: [Patch, avr] Fix PR 50739 - nameless error with -fmerge-all-constants

2016-07-05 Thread Georg-Johann Lay

Senthil Kumar Selvaraj schrieb:

Senthil Kumar Selvaraj writes:


Hi,

  This patch fixes a problem with -fmerge-all-constants and the progmem
  attribute - on trunk, the below testcase errors out with a section
  conflict error.

  When avr_asm_select_section renames .rodata.xyz section to
  .progmem.xyz and calls get_section, it passes in the same flags in
  sect. If the flags include SECTION_DECLARED, get_section barfs with a
  section conflict error - the section flag comparison logic strips off
  SECTION_DECLARED from existing section flags before comparing it with
  the new incoming flags.

  With -fmerge-all-constants, default_elf_select_section always returns
  .rodata.strx.x. varasm switches to that section when writing out the
  non progmem string literal, and that sets SECTION_DECLARED. The first
  call to get_section with the section name transformed to
  .progmem.data.strx.x then includes SECTION_DECLARED, but because this
  is a new section, the section flag conflict logic doesn't kick in. The
  second call to get_section, again including SECTION_DECLARED, triggers
  the section flag conflict logic and causes the error.

  Stripping off SECTION_DECLARED before calling get_section fixes the
  problem - the flag is supposed to be set by switch_section anyway.
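
  To make the sequence concrete, here is a toy model of the lookup (a sketch, not varasm code: the SectionTable type, the bit values and the helper names are all made up; only the compare-after-stripping rule is taken from the description above).

```cpp
#include <map>
#include <string>

namespace toy {

constexpr unsigned kDeclared = 1u << 0;  // stands in for SECTION_DECLARED
constexpr unsigned kMerge = 1u << 1;     // some other section flag

struct SectionTable
{
  std::map<std::string, unsigned> flags_by_name;

  // Returns false on a flag conflict.  A new section is simply recorded;
  // an existing one is compared after stripping kDeclared from the stored
  // flags only, as described for get_section above.
  bool get_section(const std::string& name, unsigned flags)
  {
    auto it = flags_by_name.find(name);
    if (it == flags_by_name.end())
      {
        flags_by_name[name] = flags;
        return true;
      }
    return (it->second & ~kDeclared) == flags;
  }
};

// Looks the same renamed section up twice, as happens for the two progmem
// strings, optionally masking the declared bit first.
bool two_lookups_ok(bool strip_declared)
{
  SectionTable table;
  unsigned incoming = kDeclared | kMerge;
  if (strip_declared)
    incoming &= ~kDeclared;
  const std::string name = ".progmem.data.strx.x";
  return table.get_section(name, incoming)
         && table.get_section(name, incoming);
}

}  // namespace toy
```

  Without the mask the second lookup fails in this model; with it both succeed.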

  Reg testing showed no new regressions. Ok for trunk and backport to 6.x?

Regards
Senthil



Added missing description in Changelog entry.

gcc/testsuite/ChangeLog:

2016-07-05  Senthil Kumar Selvaraj  

PR target/50739 
* gcc.target/avr/pr50739.c: New test.


gcc/ChangeLog:

2016-07-05  Senthil Kumar Selvaraj  

PR target/50739 
* config/avr/avr.c (avr_asm_select_section): Strip off
SECTION_DECLARED from flags when calling get_section.

Regards
Senthil


gcc/testsuite/ChangeLog:

2016-07-05  Senthil Kumar Selvaraj  

PR target/50739 
* gcc.target/avr/pr50739.c: New test.


gcc/ChangeLog:

2016-07-05  Senthil Kumar Selvaraj  

PR target/50739 
* config/avr/avr.c (avr_asm_select_section):


diff --git gcc/config/avr/avr.c gcc/config/avr/avr.c
index 18ed766..9b7b392 100644
--- gcc/config/avr/avr.c
+++ gcc/config/avr/avr.c
@@ -9641,7 +9641,7 @@ avr_asm_select_section (tree decl, int reloc, unsigned 
HOST_WIDE_INT align)
 {
   const char *sname = ACONCAT ((new_prefix,
 name + strlen (old_prefix), NULL));
-  return get_section (sname, sect->common.flags, sect->named.decl);
+  return get_section (sname, sect->common.flags & ~SECTION_DECLARED, 
sect->named.decl);


Long line, should be wrapped.

Johann


 }
 }
 
diff --git gcc/testsuite/gcc.target/avr/pr50739.c gcc/testsuite/gcc.target/avr/pr50739.c

new file mode 100644
index 000..a6850b7
--- /dev/null
+++ gcc/testsuite/gcc.target/avr/pr50739.c
@@ -0,0 +1,7 @@
+/* { dg-do compile } */
+/* { dg-options "-fmerge-all-constants" } */
+
+char *ca = "123";
+
+const char a[] __attribute__((__progmem__))= "a";
+const char b[] __attribute__((__progmem__))= "b";







Re: [PATCH trivial] Fix PR71214 (__cpp_rvalue_references vs. __cpp_rvalue_reference)

2016-07-05 Thread Markus Trippelsdorf
On 2016.07.05 at 12:21 -0400, Jason Merrill wrote:
> On Tue, Jul 5, 2016 at 6:44 AM, Richard Biener
>  wrote:
> > On Tue, Jul 5, 2016 at 12:07 PM, Markus Trippelsdorf
> >  wrote:
> >> Hi,
> >>
> >> as PR71214 points out gcc uses a wrong feature test macro for C++11
> >> rvalue references: __cpp_rvalue_reference instead of the correct
> >> __cpp_rvalue_references.
> >>
> >> The fix is trivial. Ok for trunk and active branches?
> >
> > I wonder if we should retain the (bogus) old defines for backward
> > compatibility.
> 
> Hmm.  The original SD-6 proposal
> (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3694.htm)
> used the singular, but it soon changed to plural; I haven't been able
> to find any email discussion of the change.  I think let's keep both
> defines.

Ok, no problem. But what about the testsuite fallout?

1) Change to plural as in my patch.
2) Add additional tests for plural.
3) Don't change anything at all.

-- 
Markus


Re: [PATCH trivial] Fix PR71214 (__cpp_rvalue_references vs. __cpp_rvalue_reference)

2016-07-05 Thread Jason Merrill
On Tue, Jul 5, 2016 at 12:30 PM, Markus Trippelsdorf
 wrote:
> On 2016.07.05 at 12:21 -0400, Jason Merrill wrote:
>> On Tue, Jul 5, 2016 at 6:44 AM, Richard Biener
>>  wrote:
>> > On Tue, Jul 5, 2016 at 12:07 PM, Markus Trippelsdorf
>> >  wrote:
>> >> Hi,
>> >>
>> >> as PR71214 points out gcc uses a wrong feature test macro for C++11
>> >> rvalue references: __cpp_rvalue_reference instead of the correct
>> >> __cpp_rvalue_references.
>> >>
>> >> The fix is trivial. Ok for trunk and active branches?
>> >
>> > I wonder if we should retain the (bogus) old defines for backward
>> > compatibility.
>>
>> Hmm.  The original SD-6 proposal
>> (http://www.open-std.org/jtc1/sc22/wg21/docs/papers/2013/n3694.htm)
>> used the singular, but it soon changed to plural; I haven't been able
>> to find any email discussion of the change.  I think let's keep both
>> defines.
>
> Ok, no problem. But what about the testsuite fallout?
>
> 1) Change to plural as in my patch.
> 2) Add additional tests for plural.
> 3) Don't change anything at all.

Changing the test to plural seems appropriate.

Jason


[v3 PATCH] Implement LWG 2509

2016-07-05 Thread Ville Voutilainen
Tested on Linux-X64.

2016-07-05  Ville Voutilainen  

Implement LWG 2509,
any_cast doesn't work with rvalue reference targets and cannot
move with a value target.
* include/experimental/any (any(_ValueType&&)): Constrain and
add an overload that doesn't forward.
* include/experimental/any (any_cast(any&&)): Constrain and
add an overload that moves.
* testsuite/experimental/any/misc/any_cast.cc: Add tests for
the functionality added by LWG 2509.
diff --git a/libstdc++-v3/include/experimental/any 
b/libstdc++-v3/include/experimental/any
index ae40091..96ad576 100644
--- a/libstdc++-v3/include/experimental/any
+++ b/libstdc++-v3/include/experimental/any
@@ -158,7 +158,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
 /// Construct with a copy of @p __value as the contained object.
 template ,
- typename _Mgr = _Manager<_Tp>>
+ typename _Mgr = _Manager<_Tp>,
+  typename enable_if::value,
+ bool>::type = true>
   any(_ValueType&& __value)
   : _M_manager(&_Mgr::_S_manage)
   {
@@ -167,6 +169,19 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  "The contained object must be CopyConstructible");
   }
 
+/// Construct with a copy of @p __value as the contained object.
+template ,
+ typename _Mgr = _Manager<_Tp>,
+  typename enable_if::value,
+ bool>::type = false>
+  any(_ValueType&& __value)
+  : _M_manager(&_Mgr::_S_manage)
+  {
+_Mgr::_S_create(_M_storage, __value);
+   static_assert(is_copy_constructible<_Tp>::value,
+ "The contained object must be CopyConstructible");
+  }
+
 /// Destructor, calls @c clear()
 ~any() { clear(); }
 
@@ -377,7 +392,10 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   __throw_bad_any_cast();
 }
 
-  template
+  template::value
+  || is_lvalue_reference<_ValueType>::value,
+  bool>::type = true>
 inline _ValueType any_cast(any&& __any)
 {
   static_assert(any::__is_valid_cast<_ValueType>(),
@@ -387,6 +405,20 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
return *__p;
   __throw_bad_any_cast();
 }
+
+  template::value
+  && !is_lvalue_reference<_ValueType>::value,
+  bool>::type = false>
+inline _ValueType any_cast(any&& __any)
+{
+  static_assert(any::__is_valid_cast<_ValueType>(),
+ "Template argument must be a reference or CopyConstructible type");
+  auto __p = any_cast>(&__any);
+  if (__p)
+   return std::move(*__p);
+  __throw_bad_any_cast();
+}
   // @}
 
   template
diff --git a/libstdc++-v3/testsuite/experimental/any/misc/any_cast.cc 
b/libstdc++-v3/testsuite/experimental/any/misc/any_cast.cc
index ce3f213..bb0f754 100644
--- a/libstdc++-v3/testsuite/experimental/any/misc/any_cast.cc
+++ b/libstdc++-v3/testsuite/experimental/any/misc/any_cast.cc
@@ -77,8 +77,38 @@ void test02()
   }
 }
 
+static int move_count = 0;
+
+void test03()
+{
+  struct MoveEnabled
+  {
+MoveEnabled(MoveEnabled&&)
+{
+  ++move_count;
+}
+MoveEnabled() = default;
+MoveEnabled(const MoveEnabled&) = default;
+  };
+  MoveEnabled m;
+  MoveEnabled m2 = any_cast(any(m));
+  VERIFY(move_count == 1);
+  MoveEnabled&& m3 = any_cast(any(m));
+  VERIFY(move_count == 1);
+  struct MoveDeleted
+  {
+MoveDeleted(MoveDeleted&&) = delete;
+MoveDeleted() = default;
+MoveDeleted(const MoveDeleted&) = default;
+  };
+  MoveDeleted md;
+  MoveDeleted&& md2 = any_cast(any(std::move(md)));
+  MoveDeleted&& md3 = any_cast(any(std::move(md)));
+}
+
 int main()
 {
   test01();
   test02();
+  test03();
 }
diff --git a/libstdc++-v3/testsuite/experimental/any/misc/any_cast_neg.cc b/libstdc++-v3/testsuite/experimental/any/misc/any_cast_neg.cc
index 1361db8..82957a1 100644
--- a/libstdc++-v3/testsuite/experimental/any/misc/any_cast_neg.cc
+++ b/libstdc++-v3/testsuite/experimental/any/misc/any_cast_neg.cc
@@ -26,5 +26,5 @@ void test01()
   using std::experimental::any_cast;
 
   const any y(1);
-  any_cast(y); // { dg-error "qualifiers" "" { target { *-*-* } } 353 }
+  any_cast(y); // { dg-error "qualifiers" "" { target { *-*-* } } 368 }
 }


Re: [v3 PATCH] Implement LWG 2509

2016-07-05 Thread Jonathan Wakely

On 05/07/16 20:33 +0300, Ville Voutilainen wrote:

   Implement LWG 2509,
   any_cast doesn't work with rvalue reference targets and cannot
   move with a value target.
   * include/experimental/any (any(_ValueType&&)): Constrain and
   add an overload that doesn't forward.
   * include/experimental/any (any_cast(any&&)): Constrain and
   add an overload that moves.


Don't repeat the filename for two changes in the same file, it should
be:

   * include/experimental/any (any(_ValueType&&)): Constrain and
   add an overload that doesn't forward.
   (any_cast(any&&)): Constrain and add an overload that moves.


OK with that tweak to the changelog, thanks.
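
For readers who want to see the behaviour LWG 2509 asks for, here is a minimal sketch. It assumes a C++17 compiler and uses std::any rather than the std::experimental::any header touched by the patch; the Tracked type and both helper functions are hypothetical names made up for the illustration. The idea is that a value-target any_cast on an rvalue any should move the contained object out, while the same cast on an lvalue any has to copy.

```cpp
#include <any>
#include <utility>

// Hypothetical helper type that counts how its instances are constructed.
struct Tracked
{
  static int copies;
  static int moves;
  Tracked() = default;
  Tracked(const Tracked&) { ++copies; }
  Tracked(Tracked&&) noexcept { ++moves; }
};
int Tracked::copies = 0;
int Tracked::moves = 0;

// Move constructions caused by any_cast<Tracked> applied to an rvalue any.
int moves_for_rvalue_cast()
{
  std::any a = Tracked{};
  Tracked::copies = Tracked::moves = 0;  // ignore the cost of filling the any
  Tracked t = std::any_cast<Tracked>(std::move(a));
  (void) t;
  return Tracked::moves;
}

// Copy constructions caused by any_cast<Tracked> applied to an lvalue any.
int copies_for_lvalue_cast()
{
  std::any a = Tracked{};
  Tracked::copies = Tracked::moves = 0;
  Tracked t = std::any_cast<Tracked>(a);
  (void) t;
  return Tracked::copies;
}
```

The counters are reset after the any is filled, so only the cast itself is measured.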



Re: Check fpic is ok for target in pr69102.c

2016-07-05 Thread Mike Stump
On Jul 5, 2016, at 1:39 AM, Kito Cheng  wrote:
> 
> pr69102.c uses the -fPIC flag in dg-options but does not check that it is
> available for the target, so I added "dg-require-effective-target fpic" to it.

I happened to notice you didn't ask Ok?, and you didn't apply it or have it 
applied.  I'd recommend one or the other.

I'm assuming you meant to ask Ok?

Ok.

I've applied it:

Committed revision 238023.

for you.  Thanks.

> 2016-07-05  Kito Cheng 
> 
>* gcc.c-torture/compile/pr69102.c: Require fpic support.



Use iv_can_overflow_p in ivopts

2016-07-05 Thread Jan Hubicka
Hi,
this patch makes ivopts use iv_can_overflow_p on its candidates. This helps
to determine whether a candidate wraps when it does not originate directly from
an IV variable (i.e. it is a derived IV or an artificial one). For those we
cannot use type information because we do not know if they are going to be
computed each iteration. We can still use the iv_can_overflow_p analysis.

I also wrote code that propagates the overflow flag from original IVs to derived
ones, and it does improve some real-world benchmarks. This patch alone seems
quite benchmark neutral, but I would like to proceed in smaller steps.
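
To make the idea concrete, here is a rough sketch of the kind of check involved. It is a simplified model, not GCC's implementation: it assumes a 32-bit signed IV with an exactly known base and step and a known upper bound on the iteration count, whereas the real analysis works on trees and value ranges. The function names and parameters are hypothetical.

```cpp
#include <cstdint>
#include <limits>

// Simplified model: the IV takes the values base, base + step, ...,
// base + step * max_iterations.  The sequence is monotonic, so it stays
// inside the 32-bit range exactly when its last value does.  The final
// value is computed in 64 bits, which cannot overflow for these operands.
bool iv_can_overflow(std::int32_t base, std::int32_t step,
                     std::uint32_t max_iterations)
{
  const std::int64_t last = static_cast<std::int64_t>(base)
    + static_cast<std::int64_t>(step)
      * static_cast<std::int64_t>(max_iterations);
  return last < std::numeric_limits<std::int32_t>::min()
      || last > std::numeric_limits<std::int32_t>::max();
}

// Mirrors the shape of the alloc_iv change: a candidate that is not already
// known to be overflow-free becomes so when the range check proves it.
bool candidate_no_overflow(bool no_overflow, std::int32_t base,
                           std::int32_t step, std::uint32_t max_iterations)
{
  if (!no_overflow && !iv_can_overflow(base, step, max_iterations))
    no_overflow = true;
  return no_overflow;
}
```

The point of the second function is that the range check can only strengthen the flag: a candidate already marked as non-overflowing stays that way.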

Bootstrapped/regtested x86_64-linux, OK?

Honza
* tree-scalar-evolution.c (iv_can_overflow_p): Export.
* tree-scalar-evolution.h (iv_can_overflow_p): Declare.
* tree-ssa-loop-ivopts.c (alloc_iv): Use it.

Index: tree-scalar-evolution.c
===
--- tree-scalar-evolution.c (revision 238012)
+++ tree-scalar-evolution.c (working copy)
@@ -3317,7 +3317,7 @@ scev_reset (void)
use this test even for derived IVs not computed every iteration or
hypotetical IVs to be inserted into code.  */
 
-static bool
+bool
 iv_can_overflow_p (struct loop *loop, tree type, tree base, tree step)
 {
   widest_int nit;
Index: tree-scalar-evolution.h
===
--- tree-scalar-evolution.h (revision 238005)
+++ tree-scalar-evolution.h (working copy)
@@ -38,6 +38,7 @@ extern unsigned int scev_const_prop (voi
 extern bool expression_expensive_p (tree);
 extern bool simple_iv (struct loop *, struct loop *, tree, struct affine_iv *,
   bool);
+extern bool iv_can_overflow_p (struct loop *, tree, tree, tree);
 extern tree compute_overall_effect_of_inner_loop (struct loop *, tree);
 
 /* Returns the basic block preceding LOOP, or the CFG entry block when
Index: tree-ssa-loop-ivopts.c
===
--- tree-ssa-loop-ivopts.c  (revision 238005)
+++ tree-ssa-loop-ivopts.c  (working copy)
@@ -1181,6 +1182,9 @@ alloc_iv (struct ivopts_data *data, tree
   iv->biv_p = false;
   iv->nonlin_use = NULL;
   iv->ssa_name = NULL_TREE;
+  if (!no_overflow && !iv_can_overflow_p (data->current_loop, TREE_TYPE (base),
+ base, step))
+no_overflow = true;
   iv->no_overflow = no_overflow;
   iv->have_address_use = false;
 


Re: [PATCH] Allow fwprop to undo vectorization harm (PR68961)

2016-07-05 Thread Richard Sandiford
Richard Biener  writes:
> On Sun, 3 Jul 2016, Richard Sandiford wrote:
>
>> Richard Biener  writes:
>> > On Wed, 15 Jun 2016, Richard Sandiford wrote:
>> >
>> >> Richard Biener  writes:
>> >> > With the proposed cost change for vector construction we will end up
>> >> > vectorizing the testcase in PR68961 again (on x86_64 and likely
>> >> > on ppc64le as well after that target gets adjustments).  Currently
>> >> > we can't optimize that away again noticing the direct overlap of
>> >> > argument and return registers.  The obstacle is
>> >> >
>> >> > (insn 7 4 8 2 (set (reg:V2DF 93)
>> >> > (vec_concat:V2DF (reg/v:DF 91 [ a ])
>> >> > (reg/v:DF 92 [ aa ]))) 
>> >> > ...
>> >> > (insn 21 8 24 2 (set (reg:DI 97 [ D.1756 ])
>> >> > (subreg:DI (reg:TI 88 [ D.1756 ]) 0))
>> >> > (insn 24 21 11 2 (set (reg:DI 100 [+8 ])
>> >> > (subreg:DI (reg:TI 88 [ D.1756 ]) 8))
>> >> >
>> >> > which we eventually optimize to DFmode subregs of (reg:V2DF 93).
>> >> >
>> >> > First of all simplify_subreg doesn't handle the subregs of a vec_concat
>> >> > (easy fix below).
>> >> >
>> >> > Then combine doesn't like to simplify the multi-use (it tries some
>> >> > parallel it seems).  So I went to forwprop which eventually manages
>> >> > to do this but throws away the result (reg:DF 91) or (reg:DF 92)
>> >> > because it is not a constant.  Thus I allow arbitrary simplification
>> >> > results for SUBREGs of [VEC_]CONCAT operations.  There doesn't seem
>> >> > to be a magic flag to tell it to restrict to the case where all
>> >> > uses can be simplified or so, nor to restrict simplifications to a REG.
>> >> > But I don't see any undesirable simplifications of (subreg 
>> >> > ([vec_]concat)).
>> >> 
>> >> Adding that as a special case to propagate_rtx feels like a hack though :-)
>> >> I think:
>> >> 
>> >> > Index: gcc/fwprop.c
>> >> > ===
>> >> > *** gcc/fwprop.c(revision 237286)
>> >> > --- gcc/fwprop.c(working copy)
>> >> > *** propagate_rtx (rtx x, machine_mode mode,
>> >> > *** 664,670 
>> >> > || (GET_CODE (new_rtx) == SUBREG
>> >> >   && REG_P (SUBREG_REG (new_rtx))
>> >> >   && (GET_MODE_SIZE (mode)
>> >> > ! <= GET_MODE_SIZE (GET_MODE (SUBREG_REG (new_rtx))
>> >> >   flags |= PR_CAN_APPEAR;
>> >> > if (!varying_mem_p (new_rtx))
>> >> >   flags |= PR_HANDLE_MEM;
>> >> > --- 664,673 
>> >> > || (GET_CODE (new_rtx) == SUBREG
>> >> >   && REG_P (SUBREG_REG (new_rtx))
>> >> >   && (GET_MODE_SIZE (mode)
>> >> > ! <= GET_MODE_SIZE (GET_MODE (SUBREG_REG (new_rtx)
>> >> > !   || ((GET_CODE (new_rtx) == VEC_CONCAT
>> >> > !  || GET_CODE (new_rtx) == CONCAT)
>> >> > ! && GET_CODE (x) == SUBREG))
>> >> >   flags |= PR_CAN_APPEAR;
>> >> > if (!varying_mem_p (new_rtx))
>> >> >   flags |= PR_HANDLE_MEM;
>> >> 
>> >> ...this if statement should fundamentally only test new_rtx.
>> >> E.g. we'd want the same thing for any SUBREG inside X.
>> >> 
>> >> How about changing:
>> >> 
>> >>   /* The replacement we made so far is valid, if all of the recursive
>> >>  replacements were valid, or we could simplify everything to
>> >>  a constant.  */
>> >>   return valid_ops || can_appear || CONSTANT_P (tem);
>> >> 
>> >> so that (REG_P (tem) && !HARD_REGISTER_P (tem)) is also valid?
>> >> I suppose that's likely to increase register pressure though,
>> >> if only some uses of new_rtx simplify.  (There again, requiring all
>> >> uses to be replaceable could make hot code the hostage of cold code.)
>> >
>> > Yes, my fear was about a register pressure increase in the case where not all
>> > uses can be replaced (fwprop doesn't seem to have code to verify or
>> > require that).
>> >
>> > I can avoid checking for GET_CODE (x) == SUBREG and add a PR_REG
>> > case to restrict REG_P (tem) && !HARD_REGISTER_P (tem) to the
>> > new_rtx == [VEC_]CONCAT case for example.
>> 
>> I don't think that helps though.  There might be other uses of a
>> VEC_CONCAT that aren't SUBREGs, in which case we'd have the same
>> problem of keeping both values live at once.
>> 
>> How about restricting the REG_P (tem) && !HARD_REGISTER_P (tem)
>> to cases where new_rtx has more words than tem?
>
> So would you really make a simple mode-size check here?

I thought a mode check would be worth trying since (for better or worse)
words are special for subregs.  But...

> I wonder which cases are there other than the subreg of [vec_concat]
> that would end up with this case.  That is,
>
>   if (REG_P (tem) && !HARD_REGISTER_P (tem)
>   && GET_MODE (tem) == GET_MODE_INNER (GET_MODE (new_rtx))
>   && (VECTOR_MODE_P (GET_MODE (new_rtx))
>   || COMPLEX_MODE_P (GET_MODE (new_rtx
> return true;

this looks good to me too FWIW.  I think it reads more naturally if
you do the GET_MODE_INNER check last.
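
As a toy illustration of that ordering (not GCC code), the predicate can be modelled as below. Everything here is an assumption made for the sketch: the Mode enumeration, the Reg struct and the helper functions only stand in for machine modes, pseudo/hard registers, VECTOR_MODE_P, COMPLEX_MODE_P and GET_MODE_INNER.

```cpp
// Toy stand-ins for a handful of machine modes.
enum class Mode { DF, DI, TI, V2DF, DC };

// Toy stand-in for a REG rtx: its mode and whether it is a hard register.
struct Reg
{
  Mode mode;
  bool hard;
};

bool vector_mode_p(Mode m) { return m == Mode::V2DF; }
bool complex_mode_p(Mode m) { return m == Mode::DC; }

// Element mode of a vector or complex mode; scalar modes map to themselves.
Mode inner_mode(Mode m)
{
  if (m == Mode::V2DF || m == Mode::DC)
    return Mode::DF;
  return m;
}

// Accept the simplified result only if it is a pseudo register whose mode is
// the element mode of the vector or complex value being replaced.  The
// inner-mode comparison comes last, as suggested in the thread.
bool accept_element_reg(const Reg& tem, Mode new_mode)
{
  return !tem.hard
      && (vector_mode_p(new_mode) || complex_mode_p(new_mode))
      && tem.mode == inner_mode(new_mode);
}
```

With this model, a DF pseudo replacing a use of a V2DF or DC value is accepted, while a DI pseudo taken from a TI value is not, because TI is neither a vector nor a complex mode.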

> works as wo

[v3 PATCH] Implement LWG 2451

2016-07-05 Thread Ville Voutilainen
Tested on Linux-x64.

2016-07-05  Ville Voutilainen  

Implement LWG 2451, optional should 'forward' T's
implicit conversions.
* include/experimental/optional (__is_optional_impl, __is_optional):
New.
(optional()): Make constexpr and default.
(optional(_Up&&), optional(const optional<_Up>&),
optional(optional<_Up>&& __t)): New.
(operator=(_Up&&)): Constrain.
(operator=(const optional<_Up>&), operator=(optional<_Up>&&)): New.
* testsuite/experimental/optional/cons/value.cc:
Add tests for the functionality added by LWG 2451.
* testsuite/experimental/optional/cons/value_neg.cc: New.
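
A small sketch of what "forwarding T's implicit conversions" means in practice follows. It assumes a C++17 compiler and uses std::optional, which has the same conditionally explicit constructors, rather than the std::experimental::optional header changed here. The Implicit and Explicit types and the widen_or function are hypothetical names for the illustration.

```cpp
#include <optional>
#include <type_traits>

// Hypothetical value types: one converts from int implicitly, one explicitly.
struct Implicit { Implicit(int v) : value(v) {} int value; };
struct Explicit { explicit Explicit(int v) : value(v) {} int value; };

// optional<T> is implicitly convertible from U only when T itself is.
static_assert(std::is_convertible<int, std::optional<Implicit>>::value, "");
static_assert(!std::is_convertible<int, std::optional<Explicit>>::value, "");
// Direct initialization still works when T's constructor is explicit.
static_assert(std::is_constructible<std::optional<Explicit>, int>::value, "");

// Converting constructor from optional<int> to optional<long>: an engaged
// source yields an engaged result, an empty source yields an empty result.
long widen_or(const std::optional<int>& in, long fallback)
{
  std::optional<long> out = in;
  return out ? *out : fallback;
}
```

The static_asserts express the constraint pairs in the patch: each constructor comes in an implicit and an explicit flavour, selected by whether the underlying conversion is implicit.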
diff --git a/libstdc++-v3/include/experimental/optional b/libstdc++-v3/include/experimental/optional
index 7524a7e..5657524 100644
--- a/libstdc++-v3/include/experimental/optional
+++ b/libstdc++-v3/include/experimental/optional
@@ -470,6 +470,24 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   bool _M_engaged = false;
 };
 
+  template
+  class optional;
+
+  template
+struct __is_optional_impl : false_type
+{ };
+
+  template
+  struct __is_optional_impl> : true_type
+{ };
+
+  template
+struct __is_optional
+: public __is_optional_impl::type>::type>::type
+{ };
+
+
   /**
 * @brief Class template for optional values.
 */
@@ -502,6 +520,78 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   // _Optional_base has the responsibility for construction.
   using _Base::_Base;
 
+  constexpr optional() = default;
+  // Converting constructors for engaged optionals.
+  template >,
+ is_constructible<_Tp, _Up&&>,
+ is_convertible<_Up&&, _Tp>
+ >::value, bool>::type = true>
+  constexpr optional(_Up&& __t)
+: _Base(_Tp(std::forward<_Up>(__t))) { }
+
+  template >,
+ is_constructible<_Tp, _Up&&>,
+ __not_>
+ >::value, bool>::type = false>
+  explicit constexpr optional(_Up&& __t)
+: _Base(_Tp(std::forward<_Up>(__t))) { }
+
+  template >,
+ __not_&>>,
+ __not_&, _Tp>>,
+ is_constructible<_Tp, const _Up&>,
+ is_convertible
+ >::value, bool>::type = true>
+  constexpr optional(const optional<_Up>& __t)
+: _Base(__t ? optional<_Tp>(*__t) : optional<_Tp>()) { }
+
+  template >,
+ __not_&>>,
+ __not_&, _Tp>>,
+ is_constructible<_Tp, const _Up&>,
+ __not_>
+ >::value, bool>::type = false>
+  explicit constexpr optional(const optional<_Up>& __t)
+: _Base(__t ? optional<_Tp>(*__t) : optional<_Tp>()) { }
+
+  template >,
+ __not_&&>>,
+ __not_&&, _Tp>>,
+ is_constructible<_Tp, _Up&&>,
+ is_convertible<_Up&&, _Tp>
+ >::value, bool>::type = true>
+  constexpr optional(optional<_Up>&& __t)
+: _Base(__t ? optional<_Tp>(std::move(*__t)) : optional<_Tp>()) { }
+
+  template >,
+ __not_&&>>,
+ __not_&&, _Tp>>,
+ is_constructible<_Tp, _Up&&>,
+ __not_>
+ >::value, bool>::type = false>
+  explicit constexpr optional(optional<_Up>&& __t)
+: _Base(__t ? optional<_Tp>(std::move(*__t)) : optional<_Tp>()) { }
+
   // [X.Y.4.3] (partly) Assignment.
   optional&
   operator=(nullopt_t) noexcept
@@ -510,8 +600,12 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 return *this;
   }
 
-  template
-enable_if_t>::value, optional&>
+  template>,
+  __not_<__is_optional<_Up>>>::value,
+  bool>::type = true>
+optional&
 operator=(_Up&& __u)
 {
   static_assert(__and_,
@@ -526,6 +620,57 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   return *this;
 }
 
+  template>>::value,
+  bool>::type = true>
+optional&
+operator=(const optional<_Up>& __u)
+{
+  static_assert(__and_,
+  is_assignable<_Tp&, _Up>>(),
+"Cannot assign to value type from argument");
+
+  if (__u)
+{
+  if (this->_M_is_engaged())
+this->_M_get() = *__u;
+  else
+this->_M_construct(*__u);
+

Re: [PATCH, ARM, 3/3] Add multilib support for bare-metal ARM architectures

2016-07-05 Thread Jasmin J.

Ping!

https://gcc.gnu.org/ml/gcc-patches/2016-05/msg01578.html

On 05/19/2016 11:42 PM, Jasmin J. wrote:

Hi!

Ping!

Attached is a rebased version of my patch due to commit
   33ac16c8cc870229a6a08cd7037275b01e7a0b9d

*** gcc/ChangeLog ***

2016-04-19  Thomas Preud'homme  
 Jasmin Jessich 

  * config.gcc: Handle bare-metal multilibs in --with-multilib-list
  option.
  * config/arm/t-baremetal: New file.
  * configure.ac: Add comment for ARM in --with-multilib-list option.
  * configure: Add comment for ARM in --with-multilib-list option.

BR
 Jasmin

***

On 03/04/2016 01:19 AM, Jasmin J. wrote:

Hi all!


As to the need to modify Makefile.in and
configure.ac, this is because the patch aims to give the user control
over which multilibs are built.

As Ramana asked in his answer to my first version of the patch: Why?
The GCC mechanism to forward this to the t-* makefile is "TM_MULTILIB_CONFIG"
(as far as I understand it). It is not necessary to introduce a new
variable to configure and Makefile.

Ramana mentioned also:

... as well as comments up top to explain what multilibs are being
built .


Additionally, the error message "You cannot use any of ..." didn't print the
right text in any case.

Attached is an improved version of this patch:
- it uses TM_MULTILIB_CONFIG
- fixed the error message "You cannot use any of ..."
- made the error message "Error:  not supported." more clear
- added a FSF copyright header to t-baremetal file and described what is
   built there
- commented out armv8-m.base and armv8-m.main, because these are currently not
   available in GCC mainline and the gcc 5.3.0 release, but will be added soon
   (I guess)

Ramana mentioned in another message a test of the new options:
- I did test it with "test_arm_none_eabi.sh"; procedure taken from this
   message: https://gcc.gnu.org/ml/gcc-patches/2013-10/msg00659.html
- The result is in "test_result.txt".
(both files attached also)

My copyright assignment number: 1059920

Please note that the patch
   "[PATCH, GCC/ARM, 2/3] Error out for incompatible ARM multilibs"
   from 12/16/2015 12:58 PM
needs to be applied before my new version of this patch.

BR
Jasmin

**

On 12/16/2015 01:04 PM, Thomas Preud'homme wrote:

Hi Ramana,

As suggested in your initial answer to this thread, we updated the multilib
patch provided in ARM's embedded branch to be up-to-date with regard to
supported CPUs in GCC. As to the need to modify Makefile.in and
configure.ac, this is because the patch aims to give the user control
over which multilibs are built. To this effect, it takes a list of
architectures at configure time, and that list needs to be passed down to the
t-baremetal Makefile to set the multilib variables appropriately.

ChangeLog entry is as follows:


*** gcc/ChangeLog ***

2015-12-15  Thomas Preud'homme  

 * Makefile.in (with_multilib_list): New variable substituted by
 configure.
 * config.gcc: Handle bare-metal multilibs in --with-multilib-list
 option.
 * config/arm/t-baremetal: New file.
 * configure.ac (with_multilib_list): New AC_SUBST.
 * configure: Regenerate.
 * doc/install.texi (--with-multilib-list): Update description for
 arm*-*-* targets to mention bare-metal multilibs.


diff --git a/gcc/Makefile.in b/gcc/Makefile.in
index 1f698798aa2df3f44d6b3a478bb4bf48e9fa7372..18b790afa114aa7580be0662d3ac9ffbc94e919d 100644
--- a/gcc/Makefile.in
+++ b/gcc/Makefile.in
@@ -546,6 +546,7 @@ lang_opt_files=@lang_opt_files@ $(srcdir)/c-family/c.opt $(srcdir)/common.opt
  lang_specs_files=@lang_specs_files@
  lang_tree_files=@lang_tree_files@
  target_cpu_default=@target_cpu_default@
+with_multilib_list=@with_multilib_list@
  OBJC_BOEHM_GC=@objc_boehm_gc@
  extra_modes_file=@extra_modes_file@
  extra_opt_files=@extra_opt_files@
diff --git a/gcc/config.gcc b/gcc/config.gcc
index af948b5e203f6b4f53dfca38e9d02d060d00c97b..d8098ed3cefacd00cb10590db1ec86d48e9fcdbc 100644
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -3787,15 +3787,25 @@ case "${target}" in
  default)
  ;;
  *)
-echo "Error: --with-multilib-list=${with_multilib_list} not supported." 1>&2
-exit 1
+for arm_multilib in ${arm_multilibs}; do
+case ${arm_multilib} in
+armv6-m | armv7-m | armv7e-m | armv7-r | armv8-m.base | armv8-m.main)
+tmake_profile_file="arm/t-baremetal"
+;;
+*)
+echo "Error: --with-multilib-list=${with_multilib_list} not supported." 1>&2
+exit 1
+;;
+;;
+esac
+done
  ;;

Re: Check fpic is ok for target in pr69102.c

2016-07-05 Thread Kito Cheng
Hi Mike:

thanks for your review :)

On Wed, Jul 6, 2016 at 2:54 AM, Mike Stump  wrote:
> On Jul 5, 2016, at 1:39 AM, Kito Cheng  wrote:
>>
>> pr69102.c uses the -fPIC flag in dg-options but does not check that it is
>> available for the target, so I added "dg-require-effective-target fpic" to it.
>
> I happened to notice you didn't ask Ok?, and you didn't apply it or have it 
> applied.  I'd recommend one or the other.
>
> I'm assuming you meant to ask Ok?
>
> Ok.
>
> I've applied it:
>
> Committed revision 238023.
>
> for you.  Thanks.
>
>> 2016-07-05  Kito Cheng 
>>
>>* gcc.c-torture/compile/pr69102.c: Require fpic support.
>


[PATCH, rs6000] Fix PR target/71733, ICE with -mcpu=power9 -mno-vsx

2016-07-05 Thread Peter Bergner
The following patch fixes a bug where we do not disable POWER9 vector dform
addressing when we compile for POWER9 but without VSX support.  This manifested
itself with us trying to use dform addressing with altivec loads/stores
which is illegal, leading to an ICE.
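
The shape of the fix can be sketched in isolation as follows. This is only an illustration: the mask names and bit positions are hypothetical stand-ins for the OPTION_MASK_* values, and the function models just the one dependency touched by the patch, namely that turning off the POWER9 vector flag must also turn off the dform vector flag that depends on it.

```cpp
#include <cstdint>

// Hypothetical bit assignments standing in for the real option masks.
constexpr std::uint64_t MASK_P8_VECTOR = std::uint64_t{1} << 0;
constexpr std::uint64_t MASK_P9_VECTOR = std::uint64_t{1} << 1;
constexpr std::uint64_t MASK_P9_DFORM_VECTOR = std::uint64_t{1} << 2;

// If the POWER9 vector flag is set without the POWER8 vector flag it needs,
// clear it together with the dependent dform vector flag, so that no
// dependent feature is left enabled on its own.
std::uint64_t override_flags(std::uint64_t flags)
{
  const bool p9_vector = (flags & MASK_P9_VECTOR) != 0;
  const bool p8_vector = (flags & MASK_P8_VECTOR) != 0;
  if (p9_vector && !p8_vector)
    flags &= ~(MASK_P9_VECTOR | MASK_P9_DFORM_VECTOR);
  return flags;
}
```

Clearing only the first mask, as the old code did, would leave the dform bit set, which corresponds to the inconsistent state behind the ICE.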

This has bootstrapped and regtested with no regressions.  Ok for trunk?

This also affects the FSF 6 branch, ok there too, assuming bootstrap and
regtesting complete cleanly?

Peter

gcc/
* config/rs6000/rs6000.c (rs6000_option_override_internal): Disable
-mpower9-dform-vector when disabling -mpower9-vector.

gcc/testsuite/
* gcc.target/powerpc/pr71733.c: New test.


Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 237945)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -4303,7 +4303,8 @@ rs6000_option_override_internal (bool gl
 {
   if (rs6000_isa_flags_explicit & OPTION_MASK_P8_VECTOR)
error ("-mpower9-vector requires -mpower8-vector");
-  rs6000_isa_flags &= ~OPTION_MASK_P9_VECTOR;
+  rs6000_isa_flags &= ~(OPTION_MASK_P9_VECTOR
+   | OPTION_MASK_P9_DFORM_VECTOR);
 }
 
   /* There have been bugs with -mvsx-timode that don't show up with -mlra,
Index: gcc/testsuite/gcc.target/powerpc/pr71733.c
===
--- gcc/testsuite/gcc.target/powerpc/pr71733.c  (revision 0)
+++ gcc/testsuite/gcc.target/powerpc/pr71733.c  (working copy)
@@ -0,0 +1,14 @@
+/* Test for ICE arising from dform code generation with VSX disabled.  */
+
+/* { dg-do compile } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-options "-O0 -mcpu=power9 -mno-vsx" } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-skip-if "" { powerpc*-*-aix* } { "*" } { "" } } */
+
+typedef __attribute__((altivec(vector__))) unsigned char vec_t;
+vec_t
+foo (vec_t src)
+{
+  return src;
+}