[PATCH] Don't promote RHS of shift-expr if it's integer_type (PR tree-optimization/64183)

2014-12-05 Thread Marek Polacek
My recent change to shift operand promotion caused a regression in
loop unrolling.  Fixed as Richi suggested in the PR audit trail.

Bootstrapped/regtested on ppc64-linux and x86_64-linux, ok for trunk?

2014-12-05  Marek Polacek  

PR tree-optimization/64183
* c-gimplify.c (c_gimplify_expr): Don't convert the RHS of a
shift-expression if it is integer_type_node.

* gcc.dg/tree-ssa/pr64183.c: New test.

diff --git gcc/c-family/c-gimplify.c gcc/c-family/c-gimplify.c
index 2cfa5d9..41a928c 100644
--- gcc/c-family/c-gimplify.c
+++ gcc/c-family/c-gimplify.c
@@ -255,7 +255,8 @@ c_gimplify_expr (tree *expr_p, gimple_seq *pre_p ATTRIBUTE_UNUSED,
   type demotion/promotion pass.  */
tree *op1_p = &TREE_OPERAND (*expr_p, 1);
if (TREE_CODE (TREE_TYPE (*op1_p)) != VECTOR_TYPE
-   && TYPE_MAIN_VARIANT (TREE_TYPE (*op1_p)) != unsigned_type_node)
+   && TYPE_MAIN_VARIANT (TREE_TYPE (*op1_p)) != unsigned_type_node
+   && TYPE_MAIN_VARIANT (TREE_TYPE (*op1_p)) != integer_type_node)
  *op1_p = convert (unsigned_type_node, *op1_p);
break;
   }
diff --git gcc/testsuite/gcc.dg/tree-ssa/pr64183.c gcc/testsuite/gcc.dg/tree-ssa/pr64183.c
index e69de29..0563739 100644
--- gcc/testsuite/gcc.dg/tree-ssa/pr64183.c
+++ gcc/testsuite/gcc.dg/tree-ssa/pr64183.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-cunroll-details" } */
+
+int bits;
+unsigned int size;
+int max_code;
+
+void
+test ()
+{
+ int code = 0;
+
+ while (code < max_code)
+   code |= ((unsigned int) (size >> (--bits)));
+
+ while (bits < (unsigned int)25)
+   bits += 8;
+}
+
+/* { dg-final { scan-tree-dump "Loop 2 iterates at most 4 times" "cunroll"} } */
+/* { dg-final { cleanup-tree-dump "cunroll" } } */

Marek


Re: [patch] Simplify non-inline function definitions for std::unordered_xxx containers

2014-12-05 Thread Jonathan Wakely

On 03/12/14 12:00 +, Jonathan Wakely wrote:

While working on PR57272 for unordered containers I was getting a
headache reading all the return types with nested-name-qualifiers
split over three or four lines.


More of the same.

Tested powerpc64-linux, committed to trunk.
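
For readers unfamiliar with the idiom, here is a minimal sketch of why the trailing return type shortens these out-of-class definitions. The class template tiny_map and its members are hypothetical stand-ins, not libstdc++ code; the point is only that a return type written after the declarator is looked up in class scope, so the typename-qualified spelling is no longer needed.

```cpp
#include <cassert>
#include <string>

// Hypothetical class template standing in for _Map_base; the names are
// illustrative only and are not taken from libstdc++.
template<typename Key, typename Value>
struct tiny_map
{
  using key_type = Key;
  using mapped_type = Value;

  mapped_type& old_style(const key_type& k);
  mapped_type& new_style(const key_type& k);

  key_type key{};
  mapped_type value{};
};

// Before: the return type precedes the declarator, so mapped_type is not yet
// in class scope and must be written with typename and full qualification.
template<typename Key, typename Value>
typename tiny_map<Key, Value>::mapped_type&
tiny_map<Key, Value>::old_style(const key_type& k)
{
  key = k;
  return value;
}

// After: the trailing return type follows the declarator, so it is looked up
// in the scope of the class and plain mapped_type is enough.
template<typename Key, typename Value>
auto
tiny_map<Key, Value>::new_style(const key_type& k)
-> mapped_type&
{
  key = k;
  return value;
}
```

Both definitions match the same kind of in-class declaration; only the spelling of the return type differs.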

commit f99108f25854efc6858f465361be57816bdd1ebc
Author: Jonathan Wakely 
Date:   Fri Dec 5 10:20:15 2014 +

	* include/bits/hashtable_policy.h (_Map_base::operator[],
	_Map_base::at): Simplify definitions with trailing return types.

diff --git a/libstdc++-v3/include/bits/hashtable_policy.h b/libstdc++-v3/include/bits/hashtable_policy.h
index 74d1bd0..cab25ef 100644
--- a/libstdc++-v3/include/bits/hashtable_policy.h
+++ b/libstdc++-v3/include/bits/hashtable_policy.h
@@ -584,12 +584,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template
-typename _Map_base<_Key, _Pair, _Alloc, _Select1st, _Equal,
-		   _H1, _H2, _Hash, _RehashPolicy, _Traits, true>
-		   ::mapped_type&
+auto
 _Map_base<_Key, _Pair, _Alloc, _Select1st, _Equal,
 	  _H1, _H2, _Hash, _RehashPolicy, _Traits, true>::
 operator[](const key_type& __k)
+-> mapped_type&
 {
   __hashtable* __h = static_cast<__hashtable*>(this);
   __hash_code __code = __h->_M_hash_code(__k);
@@ -610,12 +609,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template
-typename _Map_base<_Key, _Pair, _Alloc, _Select1st, _Equal,
-		   _H1, _H2, _Hash, _RehashPolicy, _Traits, true>
-		   ::mapped_type&
+auto
 _Map_base<_Key, _Pair, _Alloc, _Select1st, _Equal,
 	  _H1, _H2, _Hash, _RehashPolicy, _Traits, true>::
 operator[](key_type&& __k)
+-> mapped_type&
 {
   __hashtable* __h = static_cast<__hashtable*>(this);
   __hash_code __code = __h->_M_hash_code(__k);
@@ -636,12 +634,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template
-typename _Map_base<_Key, _Pair, _Alloc, _Select1st, _Equal,
-		   _H1, _H2, _Hash, _RehashPolicy, _Traits, true>
-		   ::mapped_type&
+auto
 _Map_base<_Key, _Pair, _Alloc, _Select1st, _Equal,
 	  _H1, _H2, _Hash, _RehashPolicy, _Traits, true>::
 at(const key_type& __k)
+-> mapped_type&
 {
   __hashtable* __h = static_cast<__hashtable*>(this);
   __hash_code __code = __h->_M_hash_code(__k);
@@ -656,12 +653,11 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   template
-const typename _Map_base<_Key, _Pair, _Alloc, _Select1st,
-			 _Equal, _H1, _H2, _Hash, _RehashPolicy,
-			 _Traits, true>::mapped_type&
+auto
 _Map_base<_Key, _Pair, _Alloc, _Select1st, _Equal,
 	  _H1, _H2, _Hash, _RehashPolicy, _Traits, true>::
 at(const key_type& __k) const
+-> const mapped_type&
 {
   const __hashtable* __h = static_cast<const __hashtable*>(this);
   __hash_code __code = __h->_M_hash_code(__k);


Re: asan: support for globals in kernel

2014-12-05 Thread Andrey Ryabinin
On 12/04/2014 10:10 PM, Dmitry Vyukov wrote:
> On Wed, Dec 3, 2014 at 9:19 AM, Andrey Ryabinin wrote:
>> On 12/02/2014 08:56 PM, Dmitry Vyukov wrote:
>>> Hi,
>>>
>>> The following patch adds support for instrumentation of globals for
>>> Linux kernel (-fsanitize=kernel-address). Kernel only supports
>>> constructors with default priority, but the rest works fine.
>>>
>>> OK for trunk?
>>>
>>
>> I know this is too late already, but why do we need this?
>> IMO it's much better to add support for constructors with priorities
>> in the kernel.
>>
>> We need to do this in the kernel anyway, because GCC 4.9.2 doesn't have
>> this patch and I assume that we want to make instrumentation of globals
>> work in the kernel with 4.9.2.
> 
> 
> That would be an option too. I don't know whether it is much better or not.
> The kernel lives without constructors; they are used only by coverage, and
> coverage does not need priorities. So it is only kasan that needs
> priorities. That would be plenty of code in lib/module.c only for
> kasan...

It will be a very small piece of code. I don't think it will be a problem.
Perhaps I'll cook up a patch today.

> 
> Meanwhile here is backport to 4.9. Applied w/o conflicts (not counting
> ChangeLog).
> 
> OK to commit to gcc.gnu.org/svn/gcc/branches/gcc-4_9-branch ?
> 
> 
> 
> Index: gcc/ChangeLog
> ===
> --- gcc/ChangeLog (revision 218382)
> +++ gcc/ChangeLog (working copy)
> @@ -1,3 +1,8 @@
> +2014-12-04  Dmitry Vyukov  
> +
> + * asan.c (asan_finish_file): Use default priority for constructors
> + in kernel mode.
> +
>  2014-12-04  Jakub Jelinek  
> 
>   PR c++/56493
> Index: gcc/asan.c
> ===
> --- gcc/asan.c (revision 218382)
> +++ gcc/asan.c (working copy)
> @@ -1295,7 +1295,9 @@
>   the var that is selected by the linker will have
>   padding or not.  */
>|| DECL_ONE_ONLY (decl)
> -  /* Similarly for common vars.  People can use -fno-common.  */
> +  /* Similarly for common vars.  People can use -fno-common.
> + Note: Linux kernel is built with -fno-common, so we do instrument
> + globals there even if it is C.  */
>|| (DECL_COMMON (decl) && TREE_PUBLIC (decl))
>/* Don't protect if using user section, often vars placed
>   into user section from multiple TUs are then assumed
> @@ -2383,6 +2385,13 @@
>   nor after .LASAN* array.  */
>flag_sanitize &= ~SANITIZE_ADDRESS;
> 
> +  /* For user-space we want asan constructors to run first.
> + Linux kernel does not support priorities other than default, and the only
> + other user of constructors is coverage. So we run with the default
> + priority.  */
> +  int priority = flag_sanitize & SANITIZE_USER_ADDRESS
> + ? MAX_RESERVED_INIT_PRIORITY - 1 : DEFAULT_INIT_PRIORITY;
> +
>if (flag_sanitize & SANITIZE_USER_ADDRESS)
>  {
>tree fn = builtin_decl_implicit (BUILT_IN_ASAN_INIT);
> @@ -2436,12 +2445,10 @@
>   build_fold_addr_expr (var),
>   gcount_tree),
>   &dtor_statements);
> -  cgraph_build_static_cdtor ('D', dtor_statements,
> - MAX_RESERVED_INIT_PRIORITY - 1);
> +  cgraph_build_static_cdtor ('D', dtor_statements, priority);
>  }
>if (asan_ctor_statements)
> -cgraph_build_static_cdtor ('I', asan_ctor_statements,
> -   MAX_RESERVED_INIT_PRIORITY - 1);
> +cgraph_build_static_cdtor ('I', asan_ctor_statements, priority);
>flag_sanitize |= SANITIZE_ADDRESS;
>  }
> 
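
To make the priority selection in the quoted patch concrete, here is a small standalone sketch of the same conditional. The two priority values are what I believe GCC's headers define (MAX_RESERVED_INIT_PRIORITY as 100, DEFAULT_INIT_PRIORITY as 65535), and the sanitizer flag bits are hypothetical placeholders, so treat all the constants as assumptions.

```cpp
#include <cassert>

// Values as I understand them from GCC's headers; treat them as assumptions.
constexpr int MAX_RESERVED_INIT_PRIORITY = 100;
constexpr int DEFAULT_INIT_PRIORITY = 65535;

// Hypothetical stand-ins for the sanitizer flag bits; the real values differ.
constexpr unsigned SANITIZE_USER_ADDRESS = 1u << 0;
constexpr unsigned SANITIZE_KERNEL_ADDRESS = 1u << 1;

// Same shape as the expression in the patch: user-space asan asks for the
// last reserved priority so that its constructors run before ordinary ones,
// while kernel mode falls back to the default priority.
int asan_ctor_priority(unsigned flag_sanitize)
{
  return (flag_sanitize & SANITIZE_USER_ADDRESS)
         ? MAX_RESERVED_INIT_PRIORITY - 1
         : DEFAULT_INIT_PRIORITY;
}
```

Lower numbers run earlier, which is why the user-space case picks a value just inside the reserved range.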



Re: [PATCH] MIPS/GCC: Unconditional jump generation bug fix

2014-12-05 Thread Richard Sandiford
"Maciej W. Rozycki"  writes:
> 2014-11-17  Maciej W. Rozycki  
>
>   gcc/
>   * gcc/config/mips/mips.md (*jump_absolute): Use a branch when in
>   range, a jump otherwise.
>
>   Maciej
>
> gcc-mips-jump-branch.diff
> Index: gcc-fsf-trunk-quilt/gcc/config/mips/mips.md
> ===
> --- gcc-fsf-trunk-quilt.orig/gcc/config/mips/mips.md  2014-11-16 19:54:17.0 +
> +++ gcc-fsf-trunk-quilt/gcc/config/mips/mips.md   2014-11-17 04:44:32.847732003 +
> @@ -5957,14 +5957,12 @@
>   (label_ref (match_operand 0)))]
>"!TARGET_MIPS16 && TARGET_ABSOLUTE_JUMPS"
>  {
> -  /* Use a branch for microMIPS.  The assembler will choose
> - a 16-bit branch, a 32-bit branch, or a 32-bit jump.  */
> -  if (TARGET_MICROMIPS && !TARGET_ABICALLS_PIC2)
> +  if (get_attr_length (insn) <= 8)
>  return "%*b\t%l0%/";
>else
>  return MIPS_ABSOLUTE_JUMP ("%*j\t%l0%/");
>  }
> -  [(set_attr "type" "jump")])
> +  [(set_attr "type" "branch")])

You didn't mention it explicitly, but this will have the effect of
overestimating the length of the insn by 8 bytes in cases where the
jump is used.  That might be an acceptable trade-off (even for
non-microMIPS code) but it's probably worth mentioning in a comment.

Thanks,
Richard



RE: [PATCH] MIPS/GCC: Unconditional jump generation bug fix

2014-12-05 Thread Matthew Fortune
Richard Sandiford  writes:
> "Maciej W. Rozycki"  writes:
> > 2014-11-17  Maciej W. Rozycki  
> >
> > gcc/
> > * gcc/config/mips/mips.md (*jump_absolute): Use a branch when in
> > range, a jump otherwise.
> >
> >   Maciej
> >
> > gcc-mips-jump-branch.diff
> > Index: gcc-fsf-trunk-quilt/gcc/config/mips/mips.md
> > ===
> > --- gcc-fsf-trunk-quilt.orig/gcc/config/mips/mips.md 2014-11-16 19:54:17.0 +
> > +++ gcc-fsf-trunk-quilt/gcc/config/mips/mips.md 2014-11-17 04:44:32.847732003 +
> > @@ -5957,14 +5957,12 @@
> > (label_ref (match_operand 0)))]
> >"!TARGET_MIPS16 && TARGET_ABSOLUTE_JUMPS"
> >  {
> > -  /* Use a branch for microMIPS.  The assembler will choose
> > - a 16-bit branch, a 32-bit branch, or a 32-bit jump.  */
> > -  if (TARGET_MICROMIPS && !TARGET_ABICALLS_PIC2)
> > +  if (get_attr_length (insn) <= 8)
> >  return "%*b\t%l0%/";
> >else
> >  return MIPS_ABSOLUTE_JUMP ("%*j\t%l0%/");
> >  }
> > -  [(set_attr "type" "jump")])
> > +  [(set_attr "type" "branch")])
> 
> You didn't mention it explicitly, but this will have the effect of
> overestimating the length of the insn by 8 bytes in cases where the
> jump is used.  That might be an acceptable trade-off (even for
> non-microMIPS code) but it's probably worth mentioning in a comment.

I honestly haven't digested all the detail of the length attribute
calculation, but I assume this comes from the fact that type=branch
instructions are assumed to support only a 16-bit PC-relative
displacement, with a multi-instruction sequence otherwise?

Perhaps in the long run we need to teach the length calculation for
jumps about the unconditional branch range, and size the instruction
appropriately if the target is known to be within 16-bit range.
This pattern could then change back to a jump.

I suspect all the length calculation logic for jumps/branches etc will
need an overhaul as part of adding R6 compact branch support. I have
been working on this with AndrewB and the first cut just leaves the
length calculation to overestimate as it is hard enough to just get it
all working.

Thanks,
Matthew


Re: [PATCH] Don't promote RHS of shift-expr if it's integer_type (PR tree-optimization/64183)

2014-12-05 Thread Jakub Jelinek
On Fri, Dec 05, 2014 at 11:27:50AM +0100, Marek Polacek wrote:
> My recent change to shift operand promotion caused a regression in
> loop unrolling.  Fixed as Richi suggested in the PR audit trail.
> 
> Bootstrapped/regtested on ppc64-linux and x86_64-linux, ok for trunk?
> 
> 2014-12-05  Marek Polacek  
> 
>   PR tree-optimization/64183
>   * c-gimplify.c (c_gimplify_expr): Don't convert the RHS of a
>   shift-expression if it is integer_type_node.
> 
>   * gcc.dg/tree-ssa/pr64183.c: New test.

This is for the middle-end, so I think it would be better to use
the middle-end type equality in the checks, i.e. !types_compatible_p
instead of !=?
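
As a rough illustration of the difference between the two styles of check, here is a toy model. It does not use GCC's tree API: toy_int_type, same_node and toy_types_compatible are hypothetical, and the compatibility rule (same precision and signedness) is only my simplified reading of how the middle-end treats plain integer types.

```cpp
#include <cassert>

// Toy stand-in for a GCC integer type node; not the real tree structure.
struct toy_int_type
{
  int precision;
  bool is_unsigned;
};

// Pointer identity, analogous to comparing TYPE_MAIN_VARIANT with != or ==.
bool same_node(const toy_int_type* a, const toy_int_type* b)
{
  return a == b;
}

// Structural check, loosely modelled on what a middle-end compatibility test
// decides for plain integer types: same precision and same signedness.
bool toy_types_compatible(const toy_int_type* a, const toy_int_type* b)
{
  return a->precision == b->precision && a->is_unsigned == b->is_unsigned;
}

// Would the shift count still be converted to unsigned int?  Identity style.
bool converts_with_identity(const toy_int_type* op1,
                            const toy_int_type* uint_node,
                            const toy_int_type* int_node)
{
  return !same_node(op1, uint_node) && !same_node(op1, int_node);
}

// The same question using the structural check instead.
bool converts_with_compat(const toy_int_type* op1,
                          const toy_int_type* uint_node,
                          const toy_int_type* int_node)
{
  return !toy_types_compatible(op1, uint_node)
         && !toy_types_compatible(op1, int_node);
}
```

In this model a distinct 32-bit signed type (think of long on an ILP32 target) is still converted under the identity check but left alone under the structural one.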

> diff --git gcc/c-family/c-gimplify.c gcc/c-family/c-gimplify.c
> index 2cfa5d9..41a928c 100644
> --- gcc/c-family/c-gimplify.c
> +++ gcc/c-family/c-gimplify.c
> @@ -255,7 +255,8 @@ c_gimplify_expr (tree *expr_p, gimple_seq *pre_p ATTRIBUTE_UNUSED,
>  type demotion/promotion pass.  */
>   tree *op1_p = &TREE_OPERAND (*expr_p, 1);
>   if (TREE_CODE (TREE_TYPE (*op1_p)) != VECTOR_TYPE
> - && TYPE_MAIN_VARIANT (TREE_TYPE (*op1_p)) != unsigned_type_node)
> + && TYPE_MAIN_VARIANT (TREE_TYPE (*op1_p)) != unsigned_type_node
> + && TYPE_MAIN_VARIANT (TREE_TYPE (*op1_p)) != integer_type_node)
> *op1_p = convert (unsigned_type_node, *op1_p);
>   break;
>}
> diff --git gcc/testsuite/gcc.dg/tree-ssa/pr64183.c gcc/testsuite/gcc.dg/tree-ssa/pr64183.c
> index e69de29..0563739 100644
> --- gcc/testsuite/gcc.dg/tree-ssa/pr64183.c
> +++ gcc/testsuite/gcc.dg/tree-ssa/pr64183.c
> @@ -0,0 +1,21 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O3 -fdump-tree-cunroll-details" } */
> +
> +int bits;
> +unsigned int size;
> +int max_code;
> +
> +void
> +test ()
> +{
> + int code = 0;
> +
> + while (code < max_code)
> +   code |= ((unsigned int) (size >> (--bits)));
> +
> + while (bits < (unsigned int)25)
> +   bits += 8;
> +}
> +
> +/* { dg-final { scan-tree-dump "Loop 2 iterates at most 4 times" "cunroll"} } */
> +/* { dg-final { cleanup-tree-dump "cunroll" } } */

Jakub


Re: [PATCH] MIPS/GCC: Unconditional jump generation bug fix

2014-12-05 Thread Richard Sandiford
Matthew Fortune  writes:
> Richard Sandiford  writes:
>> "Maciej W. Rozycki"  writes:
>> > 2014-11-17  Maciej W. Rozycki  
>> >
>> >gcc/
>> >* gcc/config/mips/mips.md (*jump_absolute): Use a branch when in
>> >range, a jump otherwise.
>> >
>> >   Maciej
>> >
>> > gcc-mips-jump-branch.diff
>> > Index: gcc-fsf-trunk-quilt/gcc/config/mips/mips.md
>> > ===
>> > --- gcc-fsf-trunk-quilt.orig/gcc/config/mips/mips.md   2014-11-16 19:54:17.0 +
>> > +++ gcc-fsf-trunk-quilt/gcc/config/mips/mips.md 2014-11-17 04:44:32.847732003 +
>> > @@ -5957,14 +5957,12 @@
>> >(label_ref (match_operand 0)))]
>> >"!TARGET_MIPS16 && TARGET_ABSOLUTE_JUMPS"
>> >  {
>> > -  /* Use a branch for microMIPS.  The assembler will choose
>> > - a 16-bit branch, a 32-bit branch, or a 32-bit jump.  */
>> > -  if (TARGET_MICROMIPS && !TARGET_ABICALLS_PIC2)
>> > +  if (get_attr_length (insn) <= 8)
>> >  return "%*b\t%l0%/";
>> >else
>> >  return MIPS_ABSOLUTE_JUMP ("%*j\t%l0%/");
>> >  }
>> > -  [(set_attr "type" "jump")])
>> > +  [(set_attr "type" "branch")])
>> 
>> You didn't mention it explicitly, but this will have the effect of
>> overestimating the length of the insn by 8 bytes in cases where the
>> jump is used.  That might be an acceptable trade-off (even for
>> non-microMIPS code) but it's probably worth mentioning in a comment.
>
> I honestly haven't digested all the detail of the length attribute
> calculation, but I assume this comes from the fact that type=branch
> instructions are assumed to support only a 16-bit PC-relative
> displacement, with a multi-instruction sequence otherwise?

Yeah, and the patch relies on that overestimation by using the
"get_attr_length (insn) <= 8" condition to tell whether a branch is OK.
I.e. it uses the branch if the insn is assumed to be 4 bytes + delay
slot and uses a jump if the insn is assumed to be bigger.  But the
insn we actually emit is never bigger; it's always 4 bytes + delay slot.

Obviously the cases where we need the jump should be rare,
but those are also the cases where overestimating hurts most.
Saying that this instruction is 12 bytes + delay slot means
that conditional branches around it may be turned into
long-branch sequences even if they are actually in range.
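
The trade-off can be sketched numerically. The following is a simplified model, not the mips.md length machinery: it assumes the classic MIPS branch encoding (a signed 16-bit offset counted in 4-byte instructions, measured from the delay-slot address) and borrows the byte counts from this thread (4 bytes plus delay slot when the branch reaches, 12 bytes plus delay slot assumed otherwise). All function and constant names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Reach of a signed 16-bit word offset, in bytes from the delay slot.
constexpr std::int64_t kBranchMin = -131072;
constexpr std::int64_t kBranchMax = 131068;

bool branch_in_range(std::int64_t byte_offset)
{
  return byte_offset % 4 == 0
         && byte_offset >= kBranchMin
         && byte_offset <= kBranchMax;
}

// Length estimate in bytes including the delay slot, following the figures
// quoted in the thread: 4 + 4 in range, 12 + 4 assumed out of range.
int estimated_length(std::int64_t byte_offset)
{
  return branch_in_range(byte_offset) ? 8 : 16;
}

// Same shape as the condition in the patch: a short estimate selects the
// branch, anything longer selects the absolute jump.
std::string pick_mnemonic(int length)
{
  return length <= 8 ? "b" : "j";
}

// What is actually emitted is one 4-byte instruction plus its delay slot,
// whichever mnemonic is chosen.
constexpr int kEmittedLength = 8;
```

In the out-of-range case the estimate exceeds the emitted size by 8 bytes, which is the overestimation being discussed.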

> Perhaps in the long run we need to teach the length calculation for
> jumps about the unconditional branch range, and size the instruction
> appropriately if the target is known to be within 16-bit range.
> This pattern could then change back to a jump.
>
> I suspect all the length calculation logic for jumps/branches etc will
> need an overhaul as part of adding R6 compact branch support. I have
> been working on this with AndrewB and the first cut just leaves the
> length calculation to overestimate as it is hard enough to just get it
> all working.

Yeah, can imagine that would be tricky :-)

Thanks,
Richard



[PATCH 0/4][AArch64] PR/63870 Improve handling of errors in SIMD intrinsics

2014-12-05 Thread Alan Lawrence
Following on from Charles Baylis' patch to improve the error message when
expanding arguments with qualifier_lane_index, this series applies similar
treatment to __builtin_aarch64_im_lane_boundsi (used for e.g. vset_lane and
vext), and to the more general case of immediates which should be constant
but aren't.


These patches depend upon the __aarch64_lane macro in 
https://gcc.gnu.org/ml/gcc-patches/2014-11/msg03134.html .


All patches cross-tested with check-gcc on aarch64-none-elf and 
aarch64_be-none-elf.

Ok for trunk (following 
https://gcc.gnu.org/ml/gcc-patches/2014-11/msg03134.html) ?

Cheers, Alan



[PATCH 1/4][AArch64]Fix ICE on non-constant indices to __builtin_aarch64_im_lane_boundsi

2014-12-05 Thread Alan Lawrence
When the lane index to e.g. vset_lane_xxx is a non-constant, at present we get 
an ICE:


In file included from gcc/testsuite/gcc.target/aarch64/simd/vset_lane_s16_const_1.c:6:0:
/work/alalaw01/oban/buildfsf-aarch64-none-elf/install/lib/gcc/aarch64-none-elf/5.0.0/include/arm_neon.h: In function 'main':
/work/alalaw01/oban/buildfsf-aarch64-none-elf/install/lib/gcc/aarch64-none-elf/5.0.0/include/arm_neon.h:4280:10: internal compiler error: in aarch64_simd_lane_bounds, at config/aarch64/aarch64.c:8410

return __aarch64_vset_lane_any (__elem, __vec, __index);
  ^
0x100e0f1 aarch64_simd_lane_bounds(rtx_def*, long, long, tree_node const*)
  /work/alalaw01/oban/srcfsf/gcc/gcc/config/aarch64/aarch64.c:8410
0x107b279 gen_aarch64_im_lane_boundsi(rtx_def*, rtx_def*)
  /work/alalaw01/oban/srcfsf/gcc/gcc/config/aarch64/aarch64-simd.md:4560
0x7fc50e insn_gen_fn::operator()(rtx_def*, rtx_def*) const
  /work/alalaw01/oban/srcfsf/gcc/gcc/recog.h:303
0x10142f5 aarch64_simd_expand_args
  /work/alalaw01/oban/srcfsf/gcc/gcc/config/aarch64/aarch64-builtins.c:970
0x1014692 aarch64_simd_expand_builtin(int, tree_node*, rtx_def*)
  /work/alalaw01/oban/srcfsf/gcc/gcc/config/aarch64/aarch64-builtins.c:1051
0x1014bb0 aarch64_expand_builtin(tree_node*, rtx_def*, rtx_def*, machine_mode, int)
  /work/alalaw01/oban/srcfsf/gcc/gcc/config/aarch64/aarch64-builtins.c:1133
0x7683d6 expand_builtin(tree_node*, rtx_def*, rtx_def*, machine_mode, int)
  /work/alalaw01/oban/srcfsf/gcc/gcc/builtins.c:5912

Code with a non-constant lane index is invalid, but this patch improves the 
handling and error message to the following:


In file included from gcc/testsuite/gcc.target/aarch64/simd/vset_lane_s16_const_1.c:6:0:
In function 'vset_lane_s16',
  inlined from 'main' at gcc/testsuite/gcc.target/aarch64/simd/vset_lane_s16_const_1.c:13:13:
/work/alalaw01/oban/buildfsf-aarch64-none-elf/install/lib/gcc/aarch64-none-elf/5.0.0/include/arm_neon.h:4281:10: error: lane index must be a constant immediate

return __aarch64_vset_lane_any (__elem, __vec, __index);

Unfortunately the source code printed out is in arm_neon.h, but this at least 
contains the source code location (here vset_lane_s16_const_1.c:13:13), and it 
isn't an ICE ;).


The technique is to remove the aarch64_im_lane_boundsi expander, and to handle
the builtin as a special case in aarch64_simd_expand_builtin, where the tree
(recording the inlining history) is available. This allows removal of the old
pattern and associated bits.


Also replace the hand-coded #lanes in all of arm_neon.h's calls to
__builtin_aarch64_im_lane_boundsi with a #lanes computed automatically via
sizeof.
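
A minimal sketch of the sizeof idea, assuming a compiler with the GCC vector extensions (GCC or Clang). The toy_ types, TOY_NUM_LANES and toy_lane_in_bounds are hypothetical names for illustration; they are not the macros added to arm_neon.h.

```cpp
#include <cassert>
#include <cstddef>

// Vector-extension types standing in for arm_neon.h's int16x4_t and
// int32x4_t; the toy_ names are hypothetical.
typedef short toy_int16x4 __attribute__((vector_size(8)));
typedef int toy_int32x4 __attribute__((vector_size(16)));

// Lane count derived from the vector itself rather than written by hand:
// total size divided by the size of one element.
#define TOY_NUM_LANES(vec) (sizeof(vec) / sizeof((vec)[0]))

// Hypothetical predicate for the property the bounds builtin enforces.
constexpr bool toy_lane_in_bounds(std::size_t nlanes, long index)
{
  return index >= 0 && static_cast<std::size_t>(index) < nlanes;
}
```

Because both sizeof operands are unevaluated, the lane count is a compile-time constant and cannot drift out of sync with the vector type.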


gcc/ChangeLog:

* config/aarch64/aarch64-builtins.c (aarch64_types_binopv_qualifiers,
TYPES_BINOPV): Delete.
(enum aarch64_builtins): Add AARCH64_SIMD_BUILTIN_LANE_CHECK and
AARCH64_SIMD_PATTERN_START.
(aarch64_init_simd_builtins): Register
__builtin_aarch64_im_lane_boundsi; use AARCH64_SIMD_PATTERN_START.
(aarch64_simd_expand_builtin): Handle AARCH64_SIMD_BUILTIN_LANE_CHECK; use
AARCH64_SIMD_PATTERN_START.

* config/aarch64/aarch64-simd.md (aarch64_im_lane_boundsi): Delete.
* config/aarch64/aarch64-simd-builtins.def (im_lane_bound): Delete.

* config/aarch64/arm_neon.h (__AARCH64_LANE_CHECK): New.
(__aarch64_vget_lane_f64, __aarch64_vget_lane_s64,
__aarch64_vget_lane_u64, __aarch64_vset_lane_any, vdupd_lane_f64,
vdupd_lane_s64, vdupd_lane_u64, vext_f32, vext_f64, vext_p8, vext_p16,
vext_s8, vext_s16, vext_s32, vext_s64, vext_u8, vext_u16, vext_u32,
vext_u64, vextq_f32, vextq_f64, vextq_p8, vextq_p16, vextq_s8,
vextq_s16, vextq_s32, vextq_s64, vextq_u8, vextq_u16, vextq_u32,
vextq_u64, vmulq_lane_f64): Use __AARCH64_LANE_CHECK.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/simd/vset_lane_s16_const_1.c: New test.

diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index e9c4c85fd3f1dbbb81d306bbab79409034261dc3..8aceeb4cabee65b1725deb5b848312a8bc73f973 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -143,10 +143,6 @@ aarch64_types_binop_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_none, qualifier_none, qualifier_maybe_immediate };
 #define TYPES_BINOP (aarch64_types_binop_qualifiers)
 static enum aarch64_type_qualifiers
-aarch64_types_binopv_qualifiers[SIMD_MAX_BUILTIN_ARGS]
-  = { qualifier_void, qualifier_none, qualifier_none };
-#define TYPES_BINOPV (aarch64_types_binopv_qualifiers)
-static enum aarch64_type_qualifiers
 aarch64_types_binopu_qualifiers[SIMD_MAX_BUILTIN_ARGS]
   = { qualifier_unsigned, qualifier_unsigned, qualifier_unsigned };
 #define TYPES_BINOPU (aarch64_types_binopu_qualifiers)
@@ -344,9 +340,12 @@ enum aarch64_builtins
   AARCH64_BUILTIN_SET_FPSR,
 
   AARCH64_SIMD_BUILTIN_BASE,
+  AARCH64_SIMD_BUILTIN_LANE_CHECK,
 #include "aarch64-simd-builtins.def"
-  AARCH64_

[PATCH 2/4][AArch64]Improve error message for non-constant immediates

2014-12-05 Thread Alan Lawrence
The error message when passing a non-constant in place of an immediate to an 
arm_neon.h *intrinsic* (as programmers should, rather than the __builtin), is poor:


In file included from 
gcc/testsuite/gcc.target/aarch64/arg-type-diagnostics-1.c:4:0:
/work/alalaw01/oban/buildfsf-aarch64-none-elf/install/lib/gcc/aarch64-none-elf/5.0.0/include/arm_neon.h: 
In function 'foo':
/work/alalaw01/oban/buildfsf-aarch64-none-elf/install/lib/gcc/aarch64-none-elf/5.0.0/include/arm_neon.h:21752:10: 
error: incompatible type for argument 3, expected 'const int'

return (int32x2_t) __builtin_aarch64_srsra_nv2si (__a, __b, __c);

...the line number refers to arm_neon.h, and 'const int' here does not mean
what it means in C! Similarly to the previous patch, this patch improves the
message to:


In file included from 
gcc/testsuite/gcc.target/aarch64/arg-type-diagnostics-1.c:4:0:
In function 'vrsra_n_s32',
  inlined from 'foo' at 
gcc/testsuite/gcc.target/aarch64/arg-type-diagnostics-1.c:13:10:
/work/alalaw01/oban/buildfsf-aarch64-none-elf/install/lib/gcc/aarch64-none-elf/5.0.0/include/arm_neon.h:21752:10: 
error: argument 3 must be a constant immediate

return (int32x2_t) __builtin_aarch64_srsra_nv2si (__a, __b, __c);

(This shows arg-type-diagnostics-1.c:13:10, as the actual source line containing 
the error).


The code for SIMD_ARG_CONSTANT also covers non-constant lane indices, so error 
messages there are improved too.


gcc/ChangeLog:

* config/aarch64/aarch64-builtins.c (aarch64_simd_expand_args): Update error
message for SIMD_ARG_CONSTANT.

gcc/testsuite/ChangeLog:

* gcc.target/aarch64/arg-type-diagnostics-1.c: Call intrinsic, update
expected error message.

diff --git a/gcc/config/aarch64/aarch64-builtins.c b/gcc/config/aarch64/aarch64-builtins.c
index f8e14ab40637b66302f5cc194ae9507e37a248f9..09fd49648922a197d4d0a5f4d230af77ee0e31ba 100644
--- a/gcc/config/aarch64/aarch64-builtins.c
+++ b/gcc/config/aarch64/aarch64-builtins.c
@@ -925,8 +925,8 @@ aarch64_simd_expand_args (rtx target, int icode, int have_retval,
 	  if (!(*insn_data[icode].operand[argc + have_retval].predicate)
 		  (op[argc], mode[argc]))
 	  {
-		error_at (EXPR_LOCATION (exp), "incompatible type for argument %d, "
-		   "expected %<const int%>", argc + 1);
+		error ("%Kargument %d must be a constant immediate",
+		   exp, argc + 1);
 		return const0_rtx;
 	  }
 	  break;
diff --git a/gcc/testsuite/gcc.target/aarch64/arg-type-diagnostics-1.c b/gcc/testsuite/gcc.target/aarch64/arg-type-diagnostics-1.c
index 55dd9f66f23cf6e6eff67bd68d875d24c0460d13..a7b7cd3bd8d40faaabbe48a1d382e9860d4497ff 100644
--- a/gcc/testsuite/gcc.target/aarch64/arg-type-diagnostics-1.c
+++ b/gcc/testsuite/gcc.target/aarch64/arg-type-diagnostics-1.c
@@ -3,13 +3,16 @@
 
 #include "arm_neon.h"
 
-void foo ()
+void foo (int a)
 {
-  int a;
   int32x2_t arg1;
   int32x2_t arg2;
   int32x2_t result;
   arg1 = vcreate_s32 (UINT64_C (0x));
   arg2 = vcreate_s32 (UINT64_C (0x16497fff));
-  result = __builtin_aarch64_srsra_nv2si (arg1, arg2, a); /* { dg-error "incompatible type for argument" } */
+  /* The correct line number is in the preamble to the error message,
+ not in the final line (which is all that dg-error inspects). Hence,
+ we have to tell dg-error to ignore the line number.  */
+  result = vrsra_n_s32 (arg1, arg2, a);
+  /* { dg-error "must be a constant immediate" "" { target *-*-* } 0 } */
 }

[PATCH 3/4][AArch64]Remove be_checked_get_lane, check bounds with __builtin_aarch64_im_lane_boundsi.

2014-12-05 Thread Alan Lawrence
The current __builtin_aarch64_be_checked_get_lane, on which all of
arm_neon.h's vget_lane intrinsics rely, has two problems: (a) indices are only
checked sporadically; (b) it acts as an opaque block to optimization until
expansion, yet is really just a simple vec_select. Both of these can be solved
by using macros and the existing __builtin_aarch64_im_lane_boundsi. (This
should thus improve checking for numerous other intrinsics which are written
as GCC vector extensions depending on vget_lane.) Whilst we encourage end-user
programmers not to mix programming models (i.e. NEON Intrinsics and GCC Vector
Extensions), our doing so in arm_neon.h will generate the most efficient code
by allowing the most mid-end optimization.
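
To show what "just a simple vec_select" looks like when written with vector extensions, here is a hedged sketch for GCC or Clang. The toy_ names are hypothetical, the runtime assert merely stands in for the compile-time bounds builtin, and the lane remapping reflects my assumption that the __aarch64_lane macro mirrors the lane order on big-endian targets.

```cpp
#include <cassert>

// Vector-extension type standing in for int16x4_t; toy_ names are
// hypothetical.
typedef short toy_int16x4 __attribute__((vector_size(8)));

// Assumed behaviour of a lane-remapping helper: on big-endian targets the
// architectural lane order is mirrored relative to the vector-extension
// order.
constexpr int toy_lane(int nlanes, int index, bool big_endian)
{
  return big_endian ? nlanes - 1 - index : index;
}

// A lane read expressed as plain element access, which the optimizers can
// see through, instead of an opaque builtin call.
inline short toy_vget_lane_s16(toy_int16x4 vec, int index)
{
  assert(index >= 0 && index < 4);  // stand-in for the bounds check
  return vec[index];
}
```

Writing the access as vec[index] is what lets mid-end passes fold or combine the lane read with neighbouring operations.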


The mass of similar testcases results from having to tell dejagnu not to inspect 
line numbers (the actual line number in the source code appears in the inlining 
history, which is not what dejagnu looks for).


gcc/ChangeLog:

* config/aarch64/aarch64-simd-builtins.def (be_checked_get_lane):
Delete.
* config/aarch64/aarch64-simd.md (aarch64_be_checked_get_lane):
Delete.
* config/aarch64/arm_neon.h (__aarch64_vget_lane_any): Use GCC
vector extensions, __aarch64_lane, __builtin_aarch64_im_lane_boundsi.
(__aarch64_vget_lane_f32, __aarch64_vget_lane_f64,
__aarch64_vget_lane_p8, __aarch64_vget_lane_p16,
__aarch64_vget_lane_s8, __aarch64_vget_lane_s16,
__aarch64_vget_lane_s32, __aarch64_vget_lane_s64,
__aarch64_vget_lane_u8, __aarch64_vget_lane_u16,
__aarch64_vget_lane_u32, __aarch64_vget_lane_u64,
__aarch64_vgetq_lane_f32, __aarch64_vgetq_lane_f64,
__aarch64_vgetq_lane_p8, __aarch64_vgetq_lane_p16,
__aarch64_vgetq_lane_s8, __aarch64_vgetq_lane_s16,
__aarch64_vgetq_lane_s32, __aarch64_vgetq_lane_s64,
__aarch64_vgetq_lane_u8, __aarch64_vgetq_lane_u16,
__aarch64_vgetq_lane_u32, __aarch64_vgetq_lane_u64): Delete.
(__aarch64_vdup_lane_any): Use __aarch64_vget_lane_any, remove
‘q2’ argument.
(__aarch64_vdup_lane_f32, __aarch64_vdup_lane_f64,
__aarch64_vdup_lane_p8, __aarch64_vdup_lane_p16,
__aarch64_vdup_lane_s8, __aarch64_vdup_lane_s16,
__aarch64_vdup_lane_s32, __aarch64_vdup_lane_s64,
__aarch64_vdup_lane_u8, __aarch64_vdup_lane_u16,
__aarch64_vdup_lane_u32, __aarch64_vdup_lane_u64,
__aarch64_vdup_laneq_f32, __aarch64_vdup_laneq_f64,
__aarch64_vdup_laneq_p8, __aarch64_vdup_laneq_p16,
__aarch64_vdup_laneq_s8, __aarch64_vdup_laneq_s16,
__aarch64_vdup_laneq_s32, __aarch64_vdup_laneq_s64,
__aarch64_vdup_laneq_u8, __aarch64_vdup_laneq_u16,
__aarch64_vdup_laneq_u32, __aarch64_vdup_laneq_u64): Remove argument
to __aarch64_vdup_lane_any.
(vget_lane_f32, vget_lane_f64, vget_lane_p8, vget_lane_p16,
vget_lane_s8, vget_lane_s16, vget_lane_s32, vget_lane_s64,
vget_lane_u8, vget_lane_u16, vget_lane_u32, vget_lane_u64,
vgetq_lane_f32, vgetq_lane_f64, vgetq_lane_p8, vgetq_lane_p16,
vgetq_lane_s8, vgetq_lane_s16, vgetq_lane_s32, vgetq_lane_s64,
vgetq_lane_u8, vgetq_lane_u16, vgetq_lane_u32, vgetq_lane_u64,
vdupb_lane_p8, vdupb_lane_s8, vdupb_lane_u8, vduph_lane_p16,
vduph_lane_s16, vduph_lane_u16, vdups_lane_f32, vdups_lane_s32,
vdups_lane_u32, vdupb_laneq_p8, vdupb_laneq_s8, vdupb_laneq_u8,
vduph_laneq_p16, vduph_laneq_s16, vduph_laneq_u16, vdups_laneq_f32,
vdups_laneq_s32, vdups_laneq_u32, vdupd_laneq_f64, vdupd_laneq_s64,
vdupd_laneq_u64, vfmas_lane_f32, vfma_laneq_f64, vfmad_laneq_f64,
vfmas_laneq_f32, vfmss_lane_f32, vfms_laneq_f64, vfmsd_laneq_f64,
vfmss_laneq_f32, vmla_lane_f32, vmla_lane_s16, vmla_lane_s32,
vmla_lane_u16, vmla_lane_u32, vmla_laneq_f32, vmla_laneq_s16,
vmla_laneq_s32, vmla_laneq_u16, vmla_laneq_u32, vmlaq_lane_f32,
vmlaq_lane_s16, vmlaq_lane_s32, vmlaq_lane_u16, vmlaq_lane_u32,
vmlaq_laneq_f32, vmlaq_laneq_s16, vmlaq_laneq_s32, vmlaq_laneq_u16,
vmlaq_laneq_u32, vmls_lane_f32, vmls_lane_s16, vmls_lane_s32,
vmls_lane_u16, vmls_lane_u32, vmls_laneq_f32, vmls_laneq_s16,
vmls_laneq_s32, vmls_laneq_u16, vmls_laneq_u32, vmlsq_lane_f32,
vmlsq_lane_s16, vmlsq_lane_s32, vmlsq_lane_u16, vmlsq_lane_u32,
vmlsq_laneq_f32, vmlsq_laneq_s16, vmlsq_laneq_s32, vmlsq_laneq_u16,
vmlsq_laneq_u32, vmul_lane_f32, vmul_lane_s16, vmul_lane_s32,
vmul_lane_u16, vmul_lane_u32, vmuld_lane_f64, vmuld_laneq_f64,
vmuls_lane_f32, vmuls_laneq_f32, vmul_laneq_f32, vmul_laneq_f64,
vmul_laneq_s16, vmul_laneq_s32, vmul_laneq_u16, vmul_laneq_u32,
vmulq_lane_f32, vmulq_lane_s16, vmulq_lane_s32, vmulq_lane_u16,
vmulq_lane_u32, vmulq_laneq_f32, vmulq_laneq_f64, vmulq_laneq_s16,
vmulq_laneq_s32, vmulq_laneq_u16, vmulq_laneq_u32) : Use
   

[PATCH 4/4][AArch64]Remove aarch64_get_lanedi, unused

2014-12-05 Thread Alan Lawrence
I tested this by poisoning the old pattern and running check-gcc on both 
aarch64-none-elf and aarch64_be-none-elf; there were no regressions even with 
the poisoned pattern.


gcc/ChangeLog:

* config/aarch64/aarch64-simd.md (aarch64_get_lanedi): Remove.

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 56525738be2bbd7ed1f3e59cc2de63178a971755..4b6fd3abe891a6c067c86fb997ccc8eee81c0491 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -2417,17 +2417,6 @@
   [(set_attr "type" "neon_to_gp, neon_dup, neon_store1_one_lane")]
 )
 
-(define_expand "aarch64_get_lanedi"
-  [(match_operand:DI 0 "register_operand")
-   (match_operand:DI 1 "register_operand")
-   (match_operand:SI 2 "immediate_operand")]
-  "TARGET_SIMD"
-{
-  aarch64_simd_lane_bounds (operands[2], 0, 1, NULL);
-  emit_move_insn (operands[0], operands[1]);
-  DONE;
-})
-
 ;; In this insn, operand 1 should be low, and operand 2 the high part of the
 ;; dest vector.
 

[PATCH PR62178]Improve candidate selecting in IVOPT, 2nd try.

2014-12-05 Thread Bin Cheng
Hi,
Though PR62178 is hidden by the recent cost change in the aarch64 backend, the
ivopt issue still exists.

The current candidate selection algorithm tends to select fewer candidates,
for the following reasons:
  1) to better handle loops that have many induction uses but for which the
best choice is one generic basic induction variable;
  2) to keep compilation time low.

One fundamental weakness of this strategy is that the opposite situation
sometimes can't be handled properly.  In those cases the best choice is for
each induction variable to have its own candidate.
This patch fixes the problem by shuffling the candidate set after the current
implementation reaches its fix-point.  The strategy works because it replaces
candidates in the set by selecting the locally optimal candidate for some
induction uses, and the new candidate set (which has lower cost) is exactly
what we want in the case mentioned above.  Instrumentation data shows this
finds a better candidate set for ~6% of loops in spec2006 on x86_64, and ~4%
on aarch64.
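
As a rough source-level picture of the two outcomes described above (this is not the PR's testcase, and both function names are hypothetical), the first loop below addresses both arrays through one shared basic induction variable, while the second gives each memory use its own pointer induction variable.  Which shape is cheaper depends on the target's addressing modes; the sketch only shows what "one candidate per use" looks like.

```c
#include <assert.h>
#include <stddef.h>

/* One generic induction variable `i` serves both address uses.  */
long sum_shared_iv(const int *a, const int *b, size_t n)
{
    long sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += (long)a[i] * b[i];
    return sum;
}

/* Each memory use is driven by its own pointer induction variable.  */
long sum_per_use_iv(const int *a, const int *b, size_t n)
{
    long sum = 0;
    const int *pa = a, *pb = b;
    const int *end = a + n;
    for (; pa != end; pa++, pb++)
        sum += (long)*pa * *pb;
    return sum;
}
```

Both functions compute the same dot product; only the induction variables differ.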

This patch is actually an extension of the first version posted at
https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02620.html, which only adds
another selection pass with a special seed set (more or less like the
shuffled set in this patch).  Data also confirms this patch finds optimal
sets for most of the loops found by the first one, as well as optimal sets
for many new loops.

Bootstrapped and tested on x86_64, no regression on benchmarks.  Bootstrapped
and tested on aarch64.
Since this patch only selects a candidate set with lower cost, any regressions
revealed are latent bugs in other components of GCC.
I also collected GCC bootstrap time on x86_64; no regression there either.
Is this OK?

2014-12-03  Bin Cheng  bin.ch...@arm.com

  PR tree-optimization/62178
  * tree-ssa-loop-ivopts.c (iv_ca_replace): New function.
  (try_improve_iv_set): Shuffle candidates set in order to handle
  case in which candidate wrt. each iv use should be selected.

gcc/testsuite/ChangeLog
2014-12-03  Bin Cheng  bin.ch...@arm.com

  PR tree-optimization/62178
  * gcc.target/aarch64/pr62178.c: New test.

Index: gcc/tree-ssa-loop-ivopts.c
===
--- gcc/tree-ssa-loop-ivopts.c  (revision 217828)
+++ gcc/tree-ssa-loop-ivopts.c  (working copy)
@@ -5718,6 +5718,85 @@ iv_ca_extend (struct ivopts_data *data, struct iv_
   return cost;
 }
 
+/* Try replacing candidates in IVS which are recorded by list ACT_DELTA to
+   lower cost candidates.  CAND is the one won't be replaced.  Replacement
+   of candidate is recorded in list DELTA.  */
+
+static void
+iv_ca_replace (struct ivopts_data *data, struct iv_ca *ivs,
+  struct iv_cand *cand, struct iv_ca_delta *act_delta,
+  struct iv_ca_delta **delta)
+{
+  unsigned int i, j;
+  bitmap_iterator bi;
+  struct iv_use *use;
+  struct iv_cand *cnd;
+  bool should_replace;
+  struct iv_ca_delta *act;
+  struct cost_pair *old_cp, *best_cp = NULL, *cp;
+
+  *delta = NULL;
+  for (i = 0; i < ivs->upto; i++)
+{
+  use = iv_use (data, i);
+
+  old_cp = iv_ca_cand_for_use (ivs, use);
+  if (old_cp->cand == cand)
+   continue;
+
+  should_replace = false;
+  for (act = act_delta; act; act = act->next_change)
+   if (old_cp->cand == act->old_cp->cand)
+ {
+   should_replace = true;
+   break;
+ }
+  if (!should_replace)
+   continue;
+
+  best_cp = NULL;
+  if (data->consider_all_candidates)
+   {
+ for (j = 0; j < n_iv_cands (data); j++)
+   {
+ if (j == old_cp->cand->id)
+   continue;
+
+ cnd = iv_cand (data, j);
+ cp = get_use_iv_cost (data, use, cnd);
+ if (!cp)
+   continue;
+
+ if (best_cp == NULL || cheaper_cost_pair (cp, best_cp))
+   best_cp = cp;
+   }
+   }
+  else
+   {
+ EXECUTE_IF_SET_IN_BITMAP (use->related_cands, 0, j, bi)
+   {
+ if (j == old_cp->cand->id)
+   continue;
+
+ cnd = iv_cand (data, j);
+ cp = get_use_iv_cost (data, use, cnd);
+ if (!cp)
+   continue;
+
+ if (best_cp == NULL || cheaper_cost_pair (cp, best_cp))
+   best_cp = cp;
+   }
+   }
+
+  if (!best_cp)
+   continue;
+
+  *delta = iv_ca_delta_add (use, old_cp, best_cp, *delta);
+}
+
+  return;
+}
+
 /* Try narrowing set IVS by removing CAND.  Return the cost of
the new set and store the differences in DELTA.  START is
the candidate with which we start narrowing.  */
@@ -6042,8 +6121,50 @@ try_improve_iv_set (struct ivopts_data *data, stru
   /* Try removing the candidates from the set instead.  */
   best_cost = iv_ca_prune (data, ivs, NULL, &best_delta);
 
-  /* Nothing more we can do.  */
   if (!best_delta)
+   {
+ /* So far candidate selecting algorithm tends to choose fewer IVs
+so that it can handl

[PATCH] condition decision based on uninitialized memory

2014-12-05 Thread Martin Liška

Hello.

I've just spent some time hunting memory leaks related to my isolated branch.
Valgrind reports many errors like the following:

==13612== Conditional jump or move depends on uninitialised value(s)
==13612==at 0xAC72A4: sparseset_bit_p (sparseset.h:147)
==13612==by 0xAC72A4: sparseset_and_compl(sparseset_def*, sparseset_def*, 
sparseset_def*) (sparseset.c:190)
==13612==by 0x9B296C: process_bb_lives(basic_block_def*, int&, bool) 
(lra-lives.c:885)
==13612==by 0x9B394A: lra_create_live_ranges_1(bool, bool) 
(lra-lives.c:1264)
==13612==by 0x9B426F: lra_create_live_ranges(bool, bool) (lra-lives.c:1329)
==13612==by 0x99B4A3: lra(_IO_FILE*) (lra.c:2350)
==13612==by 0x959B79: do_reload (ira.c:5391)
==13612==by 0x959B79: (anonymous 
namespace)::pass_reload::execute(function*) (ira.c:5561)
==13612==by 0xA22127: execute_one_pass(opt_pass*) (passes.c:2311)
==13612==by 0xA225F5: execute_pass_list_1(opt_pass*) (passes.c:2363)
==13612==by 0xA22607: execute_pass_list_1(opt_pass*) (passes.c:2364)
==13612==by 0xA22648: execute_pass_list(function*, opt_pass*) 
(passes.c:2374)
==13612==by 0x726F04: cgraph_node::expand() (cgraphunit.c:1773)
==13612==by 0x727BCF: output_in_order(bool) (cgraphunit.c:2011)

The following patch just replaces XNEWVAR with XCNEWVAR, which resolves all of 
these errors.
Ready for trunk?

Thanks,
Martin
From ba3abc54772141011b1f8737201a3046031c0e42 Mon Sep 17 00:00:00 2001
From: mliska 
Date: Fri, 5 Dec 2014 13:23:30 +0100
Subject: [PATCH] sparseset: condition decision based on uninitialized memory.

gcc/ChangeLog:

2014-12-05  Martin Liska  

	* sparseset.c (sparseset_alloc): XNEWVAR is replaced with XCNEWVAR.
---
 gcc/sparseset.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gcc/sparseset.c b/gcc/sparseset.c
index 628a6e2..f5e5e38b 100644
--- a/gcc/sparseset.c
+++ b/gcc/sparseset.c
@@ -30,7 +30,7 @@ sparseset_alloc (SPARSESET_ELT_TYPE n_elms)
   unsigned int n_bytes = sizeof (struct sparseset_def)
 			 + ((n_elms - 1) * 2 * sizeof (SPARSESET_ELT_TYPE));
 
-  sparseset set = XNEWVAR (struct sparseset_def, n_bytes);
+  sparseset set = XCNEWVAR (struct sparseset_def, n_bytes);
 
   /* Mark the sparseset as defined to silence some valgrind uninitialized
  read errors when accessing set->sparse[n] when "n" is not, and never has
-- 
2.1.2



Re: [PATCH] AIX: Filename-based shared library versioning for libgcc_s

2014-12-05 Thread Michael Haubenwallner

On 11/25/2014 02:53 PM, David Edelsohn wrote:
> 
> Now that things have calmed down with respect to breakage on AIX, the
> patch for building libgcc_s is okay.

FYI:

libtool-2.4.4 does support "--with-aix-soname=aix|both|svr4" now too,
http://thread.gmane.org/gmane.comp.gnu.libtool.patches/11789/focus=11802

Is there a libtool upgrade planned for gcc-5 as well?

Thanks!
/haubi/


Re: [PATCH] condition decision based on uninitialized memory

2014-12-05 Thread Jakub Jelinek
On Fri, Dec 05, 2014 at 01:36:07PM +0100, Martin Liška wrote:
> I've just spent some time hunting memory leaks related to my isolated branch.
> Valgrind reports many following errors:
> 
> ==13612== Conditional jump or move depends on uninitialised value(s)
> ==13612==at 0xAC72A4: sparseset_bit_p (sparseset.h:147)
> ==13612==by 0xAC72A4: sparseset_and_compl(sparseset_def*, sparseset_def*, 
> sparseset_def*) (sparseset.c:190)
> ==13612==by 0x9B296C: process_bb_lives(basic_block_def*, int&, bool) 
> (lra-lives.c:885)
> ==13612==by 0x9B394A: lra_create_live_ranges_1(bool, bool) 
> (lra-lives.c:1264)
> ==13612==by 0x9B426F: lra_create_live_ranges(bool, bool) 
> (lra-lives.c:1329)
> ==13612==by 0x99B4A3: lra(_IO_FILE*) (lra.c:2350)
> ==13612==by 0x959B79: do_reload (ira.c:5391)
> ==13612==by 0x959B79: (anonymous 
> namespace)::pass_reload::execute(function*) (ira.c:5561)
> ==13612==by 0xA22127: execute_one_pass(opt_pass*) (passes.c:2311)
> ==13612==by 0xA225F5: execute_pass_list_1(opt_pass*) (passes.c:2363)
> ==13612==by 0xA22607: execute_pass_list_1(opt_pass*) (passes.c:2364)
> ==13612==by 0xA22648: execute_pass_list(function*, opt_pass*) 
> (passes.c:2374)
> ==13612==by 0x726F04: cgraph_node::expand() (cgraphunit.c:1773)
> ==13612==by 0x727BCF: output_in_order(bool) (cgraphunit.c:2011)
> 
> Following patch just replaces XNEWVAR with XCNEWVAR and it solves all these 
> errors.
> Ready for trunk?

No.  sparseset is intentionally uninitialized. If you build with valgrind
checking, sparseset is properly instrumented so that valgrind doesn't
complain, otherwise just ignore those.

Jakub
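
For readers unfamiliar with the data structure, here is a minimal sketch of why a sparse set of this kind can tolerate uninitialized storage.  It is a hypothetical miniature (the `sset_*` names are made up, and it is not GCC's sparseset.c): membership is decided by cross-checking the two arrays against each other and against the member count, so stale contents can never produce a false positive.  To keep the demo deterministic, the arrays are filled with junk bytes as a stand-in for whatever malloc happened to return.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical miniature of a sparse set over 0..size-1.  */
struct sset {
    unsigned members;   /* number of valid entries in dense[] */
    unsigned size;      /* universe size */
    unsigned *sparse;   /* sparse[e] = claimed index of e in dense[] */
    unsigned *dense;    /* dense[0..members-1] = the elements */
};

struct sset *sset_alloc(unsigned size)
{
    struct sset *s = malloc(sizeof *s);
    if (!s)
        return NULL;
    s->sparse = malloc(size * sizeof *s->sparse);
    s->dense = malloc(size * sizeof *s->dense);
    if (!s->sparse || !s->dense) {
        free(s->sparse);
        free(s->dense);
        free(s);
        return NULL;
    }
    /* Junk fill: simulates arbitrary, never-cleared memory.  */
    memset(s->sparse, 0xA5, size * sizeof *s->sparse);
    memset(s->dense, 0x5A, size * sizeof *s->dense);
    s->members = 0;
    s->size = size;
    return s;
}

/* E is a member only if sparse[] and dense[] agree within the valid
   prefix; junk in either array fails one of the two checks.  */
int sset_has(const struct sset *s, unsigned e)
{
    unsigned idx = s->sparse[e];
    return idx < s->members && s->dense[idx] == e;
}

void sset_add(struct sset *s, unsigned e)
{
    if (sset_has(s, e))
        return;
    s->sparse[e] = s->members;
    s->dense[s->members++] = e;
}

void sset_free(struct sset *s)
{
    free(s->sparse);
    free(s->dense);
    free(s);
}
```

Clearing the set is just `members = 0`, which is the reason to avoid zeroing the arrays on allocation.  A tool like Valgrind cannot know the junk read in `sset_has` is harmless unless the memory is explicitly marked as defined.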


Re: [ARM,AArch64][testsuite] Fix vaddl and vaddw tests

2014-12-05 Thread Christophe Lyon
On 3 December 2014 at 17:12, Christophe Lyon  wrote:
> On 3 December 2014 at 15:22, Christophe Lyon  
> wrote:
>> Hi,
>>
>> Here is a fix for typos in the AdvSimd intrinsic tests, where vaddl
>> and vaddw didn't actually execute the tests. (The function was
>> declared in main, instead of called).
>>
>> This patch also fixes the expected output for these tests.
>
> And it looks like I'll have to apply the same obvious fix to vaddhn.c
>
Here is the patch to fix the 3 test cases (vaddl, vaddw, vaddhn).
OK?

2014-12-05  Christophe Lyon  

testsuite/
* gcc.target/aarch64/advsimd-intrinsics/vaddhn.c: Actually execute
the test.
* gcc.target/aarch64/advsimd-intrinsics/vaddl.c: Actually execute
the test. Fix expected output.
* gcc.target/aarch64/advsimd-intrinsics/vaddw.c: Likewise.
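
To make the "declared instead of called" bug concrete, here is a small standalone sketch.  `FNNAME`, `FNNAME1` and `INSN_NAME` are the macros from the patch; the `OLD_` variants, the `calls` counter and the two `run_*` helpers are hypothetical names added so both forms can live in one file.

```c
#include <assert.h>

static int calls;   /* counts how often the test body actually ran */

#define INSN_NAME vaddl

/* Fixed form: the macro expands to a bare identifier.  */
#define FNNAME1(NAME) exec_ ## NAME
#define FNNAME(NAME) FNNAME1(NAME)

void FNNAME (INSN_NAME) (void)   /* defines exec_vaddl */
{
    calls++;
}

/* Old form, renamed for comparison: expands to a whole prototype.  */
#define OLD_FNNAME1(NAME) void exec_ ## NAME (void)
#define OLD_FNNAME(NAME) OLD_FNNAME1(NAME)

int run_old(void)
{
    OLD_FNNAME (INSN_NAME);   /* "void exec_vaddl (void);" is only a declaration */
    return calls;
}

int run_fixed(void)
{
    FNNAME (INSN_NAME) ();    /* "exec_vaddl ();" is an actual call */
    return calls;
}
```

With the old macro the statement in `main` compiled cleanly as a block-scope declaration, so the test always "passed" without running.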


>>
>> OK?
>>
>> Thanks
>> Christophe.
>>
>> 2014-12-03  Christophe Lyon  
>>
>> testsuite/
>> * gcc.target/aarch64/advsimd-intrinsics/vaddl.c: Actually execute
>> the test. Fix expected output.
>> * gcc.target/aarch64/advsimd-intrinsics/vaddw.c: Likewise.
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vaddhn.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vaddhn.c
index 74b4b4d..58fd5ea 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vaddhn.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vaddhn.c
@@ -52,15 +52,13 @@ VECT_VAR_DECL(expected,poly,16,8) [] = { 0x, 0x, 0x, 0x,
 VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0x, 0x,
 	   0x, 0x };
 
-#ifndef INSN_NAME
 #define INSN_NAME vaddhn
 #define TEST_MSG "VADDHN"
-#endif
 
-#define FNNAME1(NAME) void exec_ ## NAME (void)
+#define FNNAME1(NAME) exec_ ## NAME
 #define FNNAME(NAME) FNNAME1(NAME)
 
-FNNAME (INSN_NAME)
+void FNNAME (INSN_NAME) (void)
 {
   /* Basic test: vec64=vaddhn(vec128_a, vec128_b), then store the result.  */
 #define TEST_VADDHN1(INSN, T1, T2, W, W2, N)\
@@ -104,6 +102,6 @@ FNNAME (INSN_NAME)
 
 int main (void)
 {
-  FNNAME (INSN_NAME);
+  FNNAME (INSN_NAME) ();
   return 0;
 }
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vaddl.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vaddl.c
index 861abec..030785d 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vaddl.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vaddl.c
@@ -5,13 +5,13 @@
 /* Expected results.  */
 VECT_VAR_DECL(expected,int,8,8) [] = { 0x33, 0x33, 0x33, 0x33,
    0x33, 0x33, 0x33, 0x33 };
-VECT_VAR_DECL(expected,int,16,4) [] = { 0x33, 0x33, 0x33, 0x33 };
-VECT_VAR_DECL(expected,int,32,2) [] = { 0x33, 0x33 };
+VECT_VAR_DECL(expected,int,16,4) [] = { 0x, 0x, 0x, 0x };
+VECT_VAR_DECL(expected,int,32,2) [] = { 0x, 0x };
 VECT_VAR_DECL(expected,int,64,1) [] = { 0x };
-VECT_VAR_DECL(expected,uint,8,8) [] = { 0x3, 0x3, 0x3, 0x3,
-	0x3, 0x3, 0x3, 0x3 };
-VECT_VAR_DECL(expected,uint,16,4) [] = { 0x37, 0x37, 0x37, 0x37 };
-VECT_VAR_DECL(expected,uint,32,2) [] = { 0x3, 0x3 };
+VECT_VAR_DECL(expected,uint,8,8) [] = { 0x33, 0x33, 0x33, 0x33,
+	0x33, 0x33, 0x33, 0x33 };
+VECT_VAR_DECL(expected,uint,16,4) [] = { 0x, 0x, 0x, 0x };
+VECT_VAR_DECL(expected,uint,32,2) [] = { 0x, 0x };
 VECT_VAR_DECL(expected,uint,64,1) [] = { 0x };
 VECT_VAR_DECL(expected,poly,8,8) [] = { 0x33, 0x33, 0x33, 0x33,
 	0x33, 0x33, 0x33, 0x33 };
@@ -45,15 +45,13 @@ VECT_VAR_DECL(expected,poly,16,8) [] = { 0x, 0x, 0x, 0x,
 VECT_VAR_DECL(expected,hfloat,32,4) [] = { 0x, 0x,
 	   0x, 0x };
 
-#ifndef INSN_NAME
 #define INSN_NAME vaddl
 #define TEST_MSG "VADDL"
-#endif
 
-#define FNNAME1(NAME) void exec_ ## NAME (void)
+#define FNNAME1(NAME) exec_ ## NAME
 #define FNNAME(NAME) FNNAME1(NAME)
 
-FNNAME (INSN_NAME)
+void FNNAME (INSN_NAME) (void)
 {
   /* Basic test: y=vaddl(x1,x2), then store the result.  */
 #define TEST_VADDL1(INSN, T1, T2, W, W2, N)\
@@ -117,6 +115,6 @@ FNNAME (INSN_NAME)
 
 int main (void)
 {
-  FNNAME (INSN_NAME);
+  FNNAME (INSN_NAME) ();
   return 0;
 }
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vaddw.c b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vaddw.c
index 5804cd7..95cbb31 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vaddw.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vaddw.c
@@ -5,13 +5,13 @@
 /* Expected results.  */
 VECT_VAR_DECL(expected,int,8,8) [] = { 0x33, 0x33, 0x33, 0x33,
    0x33, 0x33, 0x33, 0x33 };
-VECT_VAR_DECL(expected,int,16,4) [] = { 0x33, 0x33, 0x33, 0x33 };
-VECT_VAR_DECL(expected,int,32,2) [] = { 0x33, 0x33 };
+VECT_VAR_DECL(expected,int,16,4) [] = { 0x, 0x, 0x, 0x };
+VECT_VAR_DECL(expected,int,32,2) [] = { 0x, 0x };
 VECT_VAR_DECL(expected,int,64,1) [] = { 0x };
-VECT_VAR_DECL(expected,uint,8,8) [] = { 0x3, 0x3, 0

Re: [PATCH/AARCH64] v2 Add aligning of functions/loops/jumps

2014-12-05 Thread James Greenhalgh
On Sun, Nov 23, 2014 at 12:09:16AM +, Andrew Pinski wrote:
> Hi,
>   This is just a rebase of
> https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01615.html as requested
> by https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01736.html.  Nothing
> has changed in it.

*ping* I'd like to see this patch revived, rebased and committed.

Thanks,
James

> 
> OK?  Built and tested on aarch64-elf with no regressions.
> 
> Thanks,
> Andrew Pinski
> 
> ChangeLog:
> 
> * config/aarch64/aarch64-protos.h (tune_params): Add align field.
> * config/aarch64/aarch64.c (generic_tunings): Specify align.
> (cortexa53_tunings): Likewise.
> (cortexa57_tunings): Likewise.
> (thunderx_tunings): Likewise.
> (aarch64_override_options): Set align_loops, align_jumps,
> align_functions based on what the tuning struct.

> Index: config/aarch64/aarch64-protos.h
> ===
> --- config/aarch64/aarch64-protos.h   (revision 217974)
> +++ config/aarch64/aarch64-protos.h   (working copy)
> @@ -170,6 +170,7 @@ struct tune_params
>const struct cpu_vector_cost *const vec_costs;
>const int memmov_cost;
>const int issue_rate;
> +  const int align;
>  };
>  
>  HOST_WIDE_INT aarch64_initial_elimination_offset (unsigned, unsigned);
> Index: config/aarch64/aarch64.c
> ===
> --- config/aarch64/aarch64.c  (revision 217974)
> +++ config/aarch64/aarch64.c  (working copy)
> @@ -314,7 +314,8 @@ static const struct tune_params generic_
>&generic_regmove_cost,
>&generic_vector_cost,
>NAMED_PARAM (memmov_cost, 4),
> -  NAMED_PARAM (issue_rate, 2)
> +  NAMED_PARAM (issue_rate, 2),
> +  NAMED_PARAM (align, 2),
>  };
>  
>  static const struct tune_params cortexa53_tunings =
> @@ -324,7 +325,8 @@ static const struct tune_params cortexa5
>&cortexa53_regmove_cost,
>&generic_vector_cost,
>NAMED_PARAM (memmov_cost, 4),
> -  NAMED_PARAM (issue_rate, 2)
> +  NAMED_PARAM (issue_rate, 2),
> +  NAMED_PARAM (align, 8),
>  };
>  
>  static const struct tune_params cortexa57_tunings =
> @@ -334,7 +336,8 @@ static const struct tune_params cortexa5
>&cortexa57_regmove_cost,
>&cortexa57_vector_cost,
>NAMED_PARAM (memmov_cost, 4),
> -  NAMED_PARAM (issue_rate, 3)
> +  NAMED_PARAM (issue_rate, 3),
> +  NAMED_PARAM (align, 8),
>  };
>  
>  static const struct tune_params thunderx_tunings =
> @@ -344,7 +347,8 @@ static const struct tune_params thunderx
>&thunderx_regmove_cost,
>&generic_vector_cost,
>NAMED_PARAM (memmov_cost, 6),
> -  NAMED_PARAM (issue_rate, 2)
> +  NAMED_PARAM (issue_rate, 2),
> +  NAMED_PARAM (align, 8),
>  };
>  
>  /* A processor implementing AArch64.  */
> @@ -6727,6 +6731,18 @@ aarch64_override_options (void)
>  #endif
>  }
>  
> +  /* If not opzimizing for size, set the default
> + alignment to what the target wants */
> +  if (!optimize_size)
> +{
> +  if (align_loops <= 0)
> + align_loops = aarch64_tune_params->align;
> +  if (align_jumps <= 0)
> + align_jumps = aarch64_tune_params->align;
> +  if (align_functions <= 0)
> + align_functions = aarch64_tune_params->align;
> +}
> +
>aarch64_override_options_after_change ();
>  }
>  



Re: [PATCH] Make IPA-CP propagate alignment information of pointers

2014-12-05 Thread Jay Foad
On 3 December 2014 at 14:36, Martin Jambor  wrote:
> On Wed, Dec 03, 2014 at 10:53:54AM +, Jay Foad wrote:
>> > Index: src/gcc/ipa-prop.h
>> > ===
>> > --- src.orig/gcc/ipa-prop.h
>> > +++ src/gcc/ipa-prop.h
>> > @@ -144,6 +144,17 @@ struct GTY(()) ipa_agg_jump_function
>> >
>> >  typedef struct ipa_agg_jump_function *ipa_agg_jump_function_p;
>> >
>> > +/* Info about poiner alignments. */
>>
>> "pointer"
>>
>> > +struct GTY(()) ipa_alignment
>> > +{
>> > +  /* The data fields below are valid only if known is true.  */
>> > +  bool known;
>>
>> Just curious: why is the "known" flag necessary? The comments for
>> ptr_info_def say that align=0 means unknown.
>
> It is necessary.  In IPA-CP, when know is false, this means the
> lattice is in TOP state (i.e. once we learn something about the
> parameter, let's overwrite this), whereas when it is true and
> alignment is 0, it means it is in BOTTOM state (i.e. we know we cannot
> rely on this and never will be able to).

Can't you use align=1, misalign=0 for TOP ? This means that we don't
know anything useful about the pointer yet, just that it's a multiple
of 1 (which is trivially true for all pointers, isn't it?).

When you have vectors of these struct they will pack MUCH more nicely
without the "bool known" field.

Thanks,
Jay.
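
A small sketch of the packing argument, under stated assumptions: the field names follow the patch, but the field types and both struct names here are guesses for illustration.  Whether the flag-free encoding can still distinguish the lattice states is the open question in this thread; the code only shows the size difference and the "multiple of 1" encoding.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Layout with an explicit flag (field types are assumed).  */
struct align_with_flag {
    bool known;
    unsigned align;
    unsigned misalign;
};

/* Suggested alternative: no flag; align == 1, misalign == 0 means
   "nothing useful known yet".  */
struct align_no_flag {
    unsigned align;
    unsigned misalign;
};

/* Every address is a multiple of 1, so this value constrains nothing.  */
static const struct align_no_flag ALIGN_TRIVIAL = { 1, 0 };

bool align_is_trivial(struct align_no_flag a)
{
    return a.align == 1 && a.misalign == 0;
}

/* Bytes saved per vector element by dropping the flag.  */
size_t bytes_saved_per_entry(void)
{
    return sizeof(struct align_with_flag) - sizeof(struct align_no_flag);
}
```

On common ABIs the flag costs a full extra word per element because of padding.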


Re: [PATCH/AARCH64] make bswap vector consistent with scalar name

2014-12-05 Thread Marcus Shawcroft
On 24 November 2014 at 17:49, Andrew Pinski  wrote:
> I had some local patches in my tree which adds a bswap tree code.
> This breaks the aarch64 back-end vectorizing of byteswaps as we use
> the standard mechanism to see if a tree code vectorizes (optabs).
> Since it make sense to have consistent of the pattern names between
> the vector version and the scalar version, I am proposing this patch
> to make them consistent.
>
> OK?  Build and tested on aarch64-elf with no regressions.
>
> Thanks,
> Andrew Pinski
>
> ChangeLog:
> * config/aarch64/aarch64-simd-builtins.def (bswap): Use CF2 rather
> than CF10 so 2 is appended on the code.
> * config/aarch64/aarch64-simd.md (bswap): Rename to ...
> (bswap2): This so it matches for the optabs.

OK /Marcus


Re: [PATCH] AIX: Filename-based shared library versioning for libgcc_s

2014-12-05 Thread David Edelsohn
On Fri, Dec 5, 2014 at 7:56 AM, Michael Haubenwallner
 wrote:
>
> On 11/25/2014 02:53 PM, David Edelsohn wrote:
>>
>> Now that things have calmed down with respect to breakage on AIX, the
>> patch for building libgcc_s is okay.
>
> FYI:
>
> libtool-2.4.4 does support "--with-aix-soname=aix|both|svr4" now too,
> http://thread.gmane.org/gmane.comp.gnu.libtool.patches/11789/focus=11802
>
> Is there a libtool upgrade planned for gcc-5 as well?

Patches are backported as needed.

- David


Re: [PATCH, Fortran] PR fortran/60414 fix ICE was: PR 60414: Patch proposal

2014-12-05 Thread Dominique Dhumieres
> this patch is ready for commit now. Please apply. There have been no 
> objections
> against doing dg-do compile only, since my last post in August.

Since I am stubborn, I have made the test 'dg-do run' and committed the patch
as revision r218422.

Thanks for the patch,

Dominique


[PATCH][AARCH64][5/5] Add macro fusion support for cmp/b.X for ThunderX

2014-12-05 Thread Kyrill Tkachov

Hi all,

Andrew posted this patch some time ago (before stage1 closed) and I had 
rebased it on top of the other macro fusion patches in that series.
This is a respin of that patch with the comment about not calling 
get_attr_type repeatedly resolved 
(https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02251.html).


Tested aarch64-none-elf.

Is this ok to go in?

Thanks,
Kyrill
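
The shape of the check can be sketched outside of GCC as follows.  This is a simplified stand-in, not the backend code: the enum mirrors the type names used in the patch, while `lookup_type`, `lookups` and `cmp_branch_fusible` are hypothetical.  The point is that the attribute lookup runs once and its result is reused for all comparisons.

```c
#include <assert.h>
#include <stdbool.h>

/* Instruction classes named as in the patch, plus a catch-all.  */
enum insn_type {
    TYPE_ALUS_SREG,
    TYPE_ALUS_IMM,
    TYPE_LOGICS_REG,
    TYPE_LOGICS_IMM,
    TYPE_OTHER
};

static int lookups;   /* how many times the attribute lookup ran */

/* Hypothetical stand-in for a potentially costly attribute query.  */
static enum insn_type lookup_type(enum insn_type stored)
{
    lookups++;
    return stored;
}

/* True if a flag-setting ALU/logic insn is followed by a conditional
   jump, i.e. the pair is a cmp/branch fusion candidate.  */
bool cmp_branch_fusible(enum insn_type prev_stored, bool curr_is_condjump)
{
    if (!curr_is_condjump)
        return false;

    /* Query once, then compare the cached value.  */
    enum insn_type prev_type = lookup_type(prev_stored);

    return prev_type == TYPE_ALUS_SREG
           || prev_type == TYPE_ALUS_IMM
           || prev_type == TYPE_LOGICS_REG
           || prev_type == TYPE_LOGICS_IMM;
}
```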

2014-12-01  Andrew Pinski  apin...@cavium.com
Kyrylo Tkachov  kyrylo.tkac...@arm.com

 * config/aarch64/aarch64.c (AARCH64_FUSE_CMP_BRANCH): New define.
 (thunderx_tunings): Add AARCH64_FUSE_CMP_BRANCH to fuseable_ops.
 (aarch_macro_fusion_pair_p): Handle AARCH64_FUSE_CMP_BRANCH.

commit 3035cf0bed7058e45ca70cb33face93d7dc1ce9c
Author: Kyrylo Tkachov 
Date:   Fri Nov 14 09:16:08 2014 +

[AArch64][apinski] CMP+branch macro fusion

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 93932c8..3b76071 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -310,6 +310,7 @@ static const struct cpu_vector_cost cortexa57_vector_cost =
 #define AARCH64_FUSE_ADRP_ADD	(1 << 1)
 #define AARCH64_FUSE_MOVK_MOVK	(1 << 2)
 #define AARCH64_FUSE_ADRP_LDR	(1 << 3)
+#define AARCH64_FUSE_CMP_BRANCH	(1 << 4)
 
 #if HAVE_DESIGNATED_INITIALIZERS && GCC_VERSION >= 2007
 __extension__
@@ -356,7 +357,7 @@ static const struct tune_params thunderx_tunings =
   &generic_vector_cost,
   NAMED_PARAM (memmov_cost, 6),
   NAMED_PARAM (issue_rate, 2),
-  NAMED_PARAM (fuseable_ops, AARCH64_FUSE_NOTHING)
+  NAMED_PARAM (fuseable_ops, AARCH64_FUSE_CMP_BRANCH)
 };
 
 /* A processor implementing AArch64.  */
@@ -10522,6 +10523,20 @@ aarch_macro_fusion_pair_p (rtx_insn *prev, rtx_insn *curr)
 }
 }
 
+  if ((aarch64_tune_params->fuseable_ops & AARCH64_FUSE_CMP_BRANCH)
+  && any_condjump_p (curr))
+{
+  enum attr_type prev_type = get_attr_type (prev);
+
+  /* FIXME: this misses some which is considered simple arthematic
+ instructions for ThunderX.  Simple shifts are missed here.  */
+  if (prev_type == TYPE_ALUS_SREG
+  || prev_type == TYPE_ALUS_IMM
+  || prev_type == TYPE_LOGICS_REG
+  || prev_type == TYPE_LOGICS_IMM)
+return true;
+}
+
   return false;
 }
 

[PATCH, i386] Fix PR64003

2014-12-05 Thread Ilya Enkovich
Hi,

This patch fixes PR target/64003 by avoiding function calls while computing 
the "length" attribute for short jump instructions.  This is achieved by 
having separate templates for prefixed and non-prefixed instructions.  
Please see the discussion in bugzilla for the reasoning.
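
As a plain-C sketch of what the two conditional-jump templates encode (the function names are hypothetical; the displacement range and byte counts are the ones visible in the patch below), each length becomes a pure function of the branch displacement, with the prefix decision moved into the choice of template:

```c
#include <assert.h>
#include <stdbool.h>

/* Range test used by the "length" attribute: label minus pc must be
   in [-126, 128) for the short encoding.  */
static bool is_short_disp(long disp)
{
    return disp >= -126 && disp < 128;
}

/* Unprefixed conditional jump: 2 bytes short, 6 bytes long.  */
int jcc_length(long disp)
{
    return is_short_disp(disp) ? 2 : 6;
}

/* bnd-prefixed conditional jump: one extra prefix byte.  */
int jcc_bnd_length(long disp)
{
    return is_short_disp(disp) ? 3 : 7;
}
```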

Bootstrapped and tested on x86_64-unknown-linux-gnu.  A Valgrind run on the 
reproducer shows the problem is fixed.  OK for trunk?

Thanks,
Ilya
--
2014-12-05  Ilya Enkovich  

* config/i386/i386.md (*jcc_1_bnd): New.
(*jcc_2_bnd): New.
(jump_bnd): New.
(*jcc_1): Remove bnd prefix.
(*jcc_2): Likewise.
(jump): Likewise.


diff --git a/gcc/config/i386/i386.md b/gcc/config/i386/i386.md
index 88435d6..9019ed8 100644
--- a/gcc/config/i386/i386.md
+++ b/gcc/config/i386/i386.md
@@ -10958,6 +10958,24 @@
 ;; Basic conditional jump instructions.
 ;; We ignore the overflow flag for signed branch instructions.
 
+(define_insn "*jcc_1_bnd"
+  [(set (pc)
+   (if_then_else (match_operator 1 "ix86_comparison_operator"
+ [(reg FLAGS_REG) (const_int 0)])
+ (label_ref (match_operand 0))
+ (pc)))]
+  "TARGET_MPX && ix86_bnd_prefixed_insn_p (insn)"
+  "bnd %+j%C1\t%l0"
+  [(set_attr "type" "ibr")
+   (set_attr "modrm" "0")
+   (set (attr "length")
+  (if_then_else (and (ge (minus (match_dup 0) (pc))
+ (const_int -126))
+ (lt (minus (match_dup 0) (pc))
+ (const_int 128)))
+(const_int 3)
+(const_int 7)))])
+
 (define_insn "*jcc_1"
   [(set (pc)
(if_then_else (match_operator 1 "ix86_comparison_operator"
@@ -10965,10 +10983,10 @@
  (label_ref (match_operand 0))
  (pc)))]
   ""
-  "%!%+j%C1\t%l0"
+  "%+j%C1\t%l0"
   [(set_attr "type" "ibr")
(set_attr "modrm" "0")
-   (set (attr "length_nobnd")
+   (set (attr "length")
   (if_then_else (and (ge (minus (match_dup 0) (pc))
  (const_int -126))
  (lt (minus (match_dup 0) (pc))
@@ -10976,6 +10994,24 @@
 (const_int 2)
 (const_int 6)))])
 
+(define_insn "*jcc_2_bnd"
+  [(set (pc)
+   (if_then_else (match_operator 1 "ix86_comparison_operator"
+ [(reg FLAGS_REG) (const_int 0)])
+ (pc)
+ (label_ref (match_operand 0))))]
+  "TARGET_MPX && ix86_bnd_prefixed_insn_p (insn)"
+  "bnd %+j%c1\t%l0"
+  [(set_attr "type" "ibr")
+   (set_attr "modrm" "0")
+   (set (attr "length")
+  (if_then_else (and (ge (minus (match_dup 0) (pc))
+ (const_int -126))
+ (lt (minus (match_dup 0) (pc))
+ (const_int 128)))
+(const_int 3)
+(const_int 7)))])
+
 (define_insn "*jcc_2"
   [(set (pc)
(if_then_else (match_operator 1 "ix86_comparison_operator"
@@ -10983,10 +11019,10 @@
  (pc)
  (label_ref (match_operand 0))))]
   ""
-  "%!%+j%c1\t%l0"
+  "%+j%c1\t%l0"
   [(set_attr "type" "ibr")
(set_attr "modrm" "0")
-   (set (attr "length_nobnd")
+   (set (attr "length")
   (if_then_else (and (ge (minus (match_dup 0) (pc))
  (const_int -126))
  (lt (minus (match_dup 0) (pc))
@@ -11420,13 +11456,28 @@
 
 ;; Unconditional and other jump instructions
 
+(define_insn "jump_bnd"
+  [(set (pc)
+   (label_ref (match_operand 0)))]
+  "TARGET_MPX && ix86_bnd_prefixed_insn_p (insn)"
+  "bnd jmp\t%l0"
+  [(set_attr "type" "ibr")
+   (set (attr "length")
+  (if_then_else (and (ge (minus (match_dup 0) (pc))
+ (const_int -126))
+ (lt (minus (match_dup 0) (pc))
+ (const_int 128)))
+(const_int 3)
+(const_int 6)))
+   (set_attr "modrm" "0")])
+
 (define_insn "jump"
   [(set (pc)
(label_ref (match_operand 0)))]
   ""
-  "%!jmp\t%l0"
+  "jmp\t%l0"
   [(set_attr "type" "ibr")
-   (set (attr "length_nobnd")
+   (set (attr "length")
   (if_then_else (and (ge (minus (match_dup 0) (pc))
  (const_int -126))
  (lt (minus (match_dup 0) (pc))


[PATCH, committed] AIX lcomm alignment

2014-12-05 Thread David Edelsohn
AIX 6.1 and above provides a section alignment argument for the local
common pseudo-op (.lcomm) which GCC uses to implement
ASM_OUTPUT_ALIGNED_LOCAL from emit_local().  In the AIX
implementation, the first alignment for a particular CSECT name wins
-- the CSECT alignment is not the maximum alignment mentioned.

This patch adjusts the macro to encode the alignment in the CSECT
name, which creates multiple CSECTs so that all objects in a given CSECT
have the same alignment.  This also changes the default alignment to word
alignment.
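
The effect on the emitted directive can be sketched with an ordinary formatting function.  This mirrors the three branches of the macro in the patch below, but it is only an illustration: `format_lcomm` and `floor_log2_u` are hypothetical names, the "\t.lcomm " prefix is assumed to be what LOCAL_COMMON_ASM_OP expands to, and the section name is passed in by the caller.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define BITS_PER_UNIT 8

/* floor(log2(x)) for x > 0; stand-in for GCC's floor_log2.  */
static unsigned floor_log2_u(unsigned long x)
{
    unsigned n = 0;
    while (x >>= 1)
        n++;
    return n;
}

/* Write an .lcomm directive for NAME into BUF.  The log2 alignment is
   appended to the CSECT name so objects with different alignments end
   up in different CSECTs.  */
int format_lcomm(char *buf, size_t len, const char *name,
                 unsigned long size, unsigned align_bits,
                 const char *bss_name)
{
    if (align_bits > 32) {
        unsigned lg = floor_log2_u(align_bits / BITS_PER_UNIT);
        return snprintf(buf, len, "\t.lcomm %s,%lu,%s%u_,%u\n",
                        name, size, bss_name, lg, lg);
    }
    if (size > 4)
        return snprintf(buf, len, "\t.lcomm %s,%lu,%s3_,3\n",
                        name, size, bss_name);
    /* Default: word alignment, unsuffixed CSECT name.  */
    return snprintf(buf, len, "\t.lcomm %s,%lu,%s,2\n",
                    name, size, bss_name);
}
```

For instance, a 16-byte-aligned object in a section named `_bss_` would land in a CSECT called `_bss_4_`.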

This patch fixes some of the vector instruction testsuite failures on AIX.

Bootstrapped and regression tested on powerpc-ibm-aix7.1.0.0

Thanks, David

* config/rs6000/xcoff.h (ASM_OUTPUT_ALIGNED_LOCAL): Append alignment
to section name. Increase default alignment to word.

Index: config/rs6000/xcoff.h
===
--- config/rs6000/xcoff.h   (revision 218423)
+++ config/rs6000/xcoff.h   (working copy)
@@ -251,14 +251,15 @@
   do { fputs (LOCAL_COMMON_ASM_OP, (FILE));\
RS6000_OUTPUT_BASENAME ((FILE), (NAME));\
if ((ALIGN) > 32)   \
-fprintf ((FILE), ","HOST_WIDE_INT_PRINT_UNSIGNED",%s,%u\n",\
+fprintf ((FILE), ","HOST_WIDE_INT_PRINT_UNSIGNED",%s%u_,%u\n", \
  (SIZE), xcoff_bss_section_name,   \
+ floor_log2 ((ALIGN) / BITS_PER_UNIT), \
  floor_log2 ((ALIGN) / BITS_PER_UNIT));\
else if ((SIZE) > 4)\
-fprintf ((FILE), ","HOST_WIDE_INT_PRINT_UNSIGNED",%s,3\n", \
+fprintf ((FILE), ","HOST_WIDE_INT_PRINT_UNSIGNED",%s3_,3\n",   \
  (SIZE), xcoff_bss_section_name);  \
else\
-fprintf ((FILE), ","HOST_WIDE_INT_PRINT_UNSIGNED",%s\n",   \
+fprintf ((FILE), ","HOST_WIDE_INT_PRINT_UNSIGNED",%s,2\n", \
  (SIZE), xcoff_bss_section_name);  \
  } while (0)
 #endif


Re: [PATCH] condition decision based on uninitialized memory

2014-12-05 Thread Martin Liška

On 12/05/2014 02:00 PM, Jakub Jelinek wrote:

> On Fri, Dec 05, 2014 at 01:36:07PM +0100, Martin Liška wrote:
> > I've just spent some time hunting memory leaks related to my isolated branch.
> > Valgrind reports many following errors:
> >
> > ==13612== Conditional jump or move depends on uninitialised value(s)
> > ==13612==at 0xAC72A4: sparseset_bit_p (sparseset.h:147)
> > ==13612==by 0xAC72A4: sparseset_and_compl(sparseset_def*, sparseset_def*, 
> > sparseset_def*) (sparseset.c:190)
> > ==13612==by 0x9B296C: process_bb_lives(basic_block_def*, int&, bool) 
> > (lra-lives.c:885)
> > ==13612==by 0x9B394A: lra_create_live_ranges_1(bool, bool) 
> > (lra-lives.c:1264)
> > ==13612==by 0x9B426F: lra_create_live_ranges(bool, bool) (lra-lives.c:1329)
> > ==13612==by 0x99B4A3: lra(_IO_FILE*) (lra.c:2350)
> > ==13612==by 0x959B79: do_reload (ira.c:5391)
> > ==13612==by 0x959B79: (anonymous 
> > namespace)::pass_reload::execute(function*) (ira.c:5561)
> > ==13612==by 0xA22127: execute_one_pass(opt_pass*) (passes.c:2311)
> > ==13612==by 0xA225F5: execute_pass_list_1(opt_pass*) (passes.c:2363)
> > ==13612==by 0xA22607: execute_pass_list_1(opt_pass*) (passes.c:2364)
> > ==13612==by 0xA22648: execute_pass_list(function*, opt_pass*) 
> > (passes.c:2374)
> > ==13612==by 0x726F04: cgraph_node::expand() (cgraphunit.c:1773)
> > ==13612==by 0x727BCF: output_in_order(bool) (cgraphunit.c:2011)
> >
> > Following patch just replaces XNEWVAR with XCNEWVAR and it solves all these 
> > errors.
> > Ready for trunk?
>
> No.  sparseset is intentionally uninitialized. If you build with valgrind
> checking, sparseset is properly instrumented so that valgrind doesn't
> complain, otherwise just ignore those.
>
> Jakub



Thank you, Jakub, for the reply; bergner has already explained the situation to me.
Valgrind checking is new to me ;)

Martin


Re: [PATCH, i386] Fix PR64003

2014-12-05 Thread Uros Bizjak
Hello!

> This patch fixes PR target/64003 by avoiding functions calls during 
> computations of "length"
> attribute for short jump instructions.  It is achieved by having separate 
> templates for prefixed and
> not prefixed instructions.  Please see discussion in bugzilla for reasoning.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu.  Valgrind run for 
> reproducer shows
> problem is fixed.  OK for trunk?
>
> 2014-12-05  Ilya Enkovich  
>
> * config/i386/i386.md (*jcc_1_bnd): New.
> (*jcc_2_bnd): New.
> (jump_bnd): New.
> (*jcc_1): Remove bnd prefix.
> (*jcc_2): Likewise.
> (jump): Likewise.

Let's proceed with the above version to stay on the safe side for now.

OK for mainline, but please investigate usage of ADJUST_INSN_LENGTH
for ibr-type and ret instructions.

Thanks,
Uros.


[PATCH, PR 64192] Add forgotten conversion from bits to bytes

2014-12-05 Thread Martin Jambor
Hi,

at some point I lost an important division of the bit offset by
BITS_PER_UNIT in my alignment IPA-CP propagation patch.  That led to a
few failures on i686 reported as PR 64192.

This patch adds it back, together with a slight improvement of the guarding
check, which I suppose will never trigger but does ensure the
division will never lose information.
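
In isolation, the conversion and its guard look roughly like this.  It is a sketch only: `alignment_info` and `alignment_from_bits` are hypothetical names that mirror the fields set in the patch below, and the real code additionally depends on the result of get_pointer_alignment_1.

```c
#include <assert.h>
#include <stdbool.h>

#define BITS_PER_UNIT 8

/* Hypothetical mirror of the three fields the patch fills in.  */
struct alignment_info {
    bool known;
    unsigned align;      /* in bytes */
    unsigned misalign;   /* in bytes */
};

/* Convert a bit alignment and bit offset to bytes, but only when both
   are whole bytes, so the divisions cannot drop information.  */
struct alignment_info alignment_from_bits(unsigned align_bits,
                                          unsigned long bitpos)
{
    struct alignment_info r = { false, 0, 0 };

    if (align_bits % BITS_PER_UNIT == 0 && bitpos % BITS_PER_UNIT == 0) {
        r.known = true;
        r.align = align_bits / BITS_PER_UNIT;
        r.misalign = (unsigned)(bitpos / BITS_PER_UNIT);
    }
    return r;
}
```

For example, a 64-bit alignment with a 32-bit offset becomes 8-byte alignment with a misalignment of 4 bytes.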

I consider this change obvious and would really like to commit it
before I leave for the weekend, so I will do so after it finishes
bootstrapping and testing on i686.  It has already passed bootstrap
and testing on x86_64-linux.

Thanks,

Martin


2014-12-05  Martin Jambor  

PR ipa/64192
* ipa-prop.c (ipa_compute_jump_functions_for_edge): Convert alignment
from bits to bytes after checking they are byte-aligned.

Index: src/gcc/ipa-prop.c
===
--- src.orig/gcc/ipa-prop.c
+++ src/gcc/ipa-prop.c
@@ -1739,10 +1739,11 @@ ipa_compute_jump_functions_for_edge (str
  unsigned align;
 
  if (get_pointer_alignment_1 (arg, &align, &hwi_bitpos)
- && align > BITS_PER_UNIT)
+ && align % BITS_PER_UNIT == 0
+ && hwi_bitpos % BITS_PER_UNIT == 0)
{
  jfunc->alignment.known = true;
- jfunc->alignment.align = align;
+ jfunc->alignment.align = align / BITS_PER_UNIT;
  jfunc->alignment.misalign = hwi_bitpos / BITS_PER_UNIT;
}
  else


[patch, testsuite, committed] don't use "dg-do run" in vect tests, again

2014-12-05 Thread Sandra Loosemore

This patch is a follow-up to this one

https://gcc.gnu.org/ml/gcc-patches/2014-09/msg02337.html

which I checked in a couple months ago to fix execution failures from 
trying to run ARM NEON code on a target that didn't support those 
instructions.  I noticed that since then, some new gcc.dg/vect tests 
have been added that fail in the same way.  Since it's exactly the same 
problem and solution as the previous batch, I've committed the attached 
patch as an obvious fix.


-Sandra

2014-12-05  Sandra Loosemore  

	gcc/testsuite/
	* gcc.dg/vect/pr63341-1.c: Remove explicit "dg-do run".
	* gcc.dg/vect/pr63341-2.c: Likewise.
	* gcc.dg/vect/pr63379.c: Likewise.
	* gcc.dg/vect/pr63605.c: Likewise.
Index: gcc/testsuite/gcc.dg/vect/pr63341-1.c
===
--- gcc/testsuite/gcc.dg/vect/pr63341-1.c	(revision 218426)
+++ gcc/testsuite/gcc.dg/vect/pr63341-1.c	(working copy)
@@ -1,5 +1,4 @@
 /* PR tree-optimization/63341 */
-/* { dg-do run } */
 
 #include "tree-vect.h"
 
Index: gcc/testsuite/gcc.dg/vect/pr63341-2.c
===
--- gcc/testsuite/gcc.dg/vect/pr63341-2.c	(revision 218426)
+++ gcc/testsuite/gcc.dg/vect/pr63341-2.c	(working copy)
@@ -1,5 +1,4 @@
 /* PR tree-optimization/63341 */
-/* { dg-do run } */
 
 #include "tree-vect.h"
 
Index: gcc/testsuite/gcc.dg/vect/pr63379.c
===
--- gcc/testsuite/gcc.dg/vect/pr63379.c	(revision 218426)
+++ gcc/testsuite/gcc.dg/vect/pr63379.c	(working copy)
@@ -1,5 +1,4 @@
 /* PR tree-optimization/63379  */
-/* { dg-do run } */
 
 #include "tree-vect.h"
 
Index: gcc/testsuite/gcc.dg/vect/pr63605.c
===
--- gcc/testsuite/gcc.dg/vect/pr63605.c	(revision 218426)
+++ gcc/testsuite/gcc.dg/vect/pr63605.c	(working copy)
@@ -1,5 +1,3 @@
-/* { dg-do run } */
-
 #include "tree-vect.h"
 
 extern void abort (void);


Re: [PATCH x86] Enable v64qi permutations.

2014-12-05 Thread Ilya Tocar
On 04 Dec 15:16, Uros Bizjak wrote:
> On Thu, Dec 4, 2014 at 2:53 PM, Ilya Tocar  wrote:
> 
> >> >>> >> Can you add a few testcases?
> >> >>> >
> >> >>> > Isn't it already covered by gcc.dg/torture/vshuf* ?
> >> >>> >
> >> >>>
> >> >>> I didn't see them fail on my machines today.
> >> >>
> >> >> Those are executable testcases, those better should not fail.
> >> >> The patch just improved code generation and the testcases test
> >> >> if the improved code generation works well.
> >> >> Did you mean some scan-assembler test that verifies the better code
> >> >> generation?  Guess it is possible, though fragile.
> >> >
> >> > I think that existing executable testcases adequately cover the
> >> > functionality of the patch.
> >> >
> >> > The patch is OK.
> >>
> >> BTW, the ChangeLog is missing.
> >>
> > * config/i386/i386.c (ix86_expand_vec_perm_vpermi2): Handle v64qi.
> > (expand_vec_perm_broadcast_1): Ditto.
> > (expand_vec_perm_vpermi2_vpshub2): New.
> > (ix86_expand_vec_perm_const_1): Use it.
> > (ix86_vectorize_vec_perm_const_ok): Handle v64qi.
> > * config/i386/sse.md (VEC_PERM_AVX2): Add v64qi.
> > (VEC_PERM_CONST): Ditto.
> >> index ca5d720..6252e7e 100644
> >> --- a/gcc/config/i386/sse.md
> >> +++ b/gcc/config/i386/sse.md
> >> @@ -10678,7 +10678,7 @@
> >> (V8SF "TARGET_AVX2") (V4DF "TARGET_AVX2")
> >> (V16SF "TARGET_AVX512F") (V8DF "TARGET_AVX512F")
> >> (V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F")
> >> -   (V32HI "TARGET_AVX512BW")])
> >> +   (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512VBMI")])
> >>
> >> I don't think change for VBMI target belongs in this patch.
> >>
> > Those changes enable non-const v64qi permutes
> > (via single vpermi2b insn), should I split them into separate patch?
> 
> If they are not on the same topic, then please yes. Please don't mix
> separate issues together.
>
OK.
The patch below adds variable v64qi permutations.
OK for trunk?
(I plan to commit both of them simultaneously, if this part is approved)

 * config/i386/i386.c (ix86_expand_vec_perm_vpermi2): Handle v64qi.
 * config/i386/sse.md (VEC_PERM_AVX2): Add v64qi.
---
 gcc/config/i386/i386.c | 4 
 gcc/config/i386/sse.md | 2 +-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index ce5dfad..c4dbf78 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -21831,6 +21831,10 @@ ix86_expand_vec_perm_vpermi2 (rtx target, rtx op0, rtx mask, rtx op1,
   if (TARGET_AVX512VL && TARGET_AVX512BW)
gen = gen_avx512vl_vpermi2varv16hi3;
   break;
+case V64QImode:
+  if (TARGET_AVX512VBMI)
+   gen = gen_avx512bw_vpermi2varv64qi3;
+  break;
 case V32HImode:
   if (TARGET_AVX512BW)
gen = gen_avx512bw_vpermi2varv32hi3;
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 734e6b4..cfbe40c 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -10691,7 +10691,7 @@
(V8SF "TARGET_AVX2") (V4DF "TARGET_AVX2")
(V16SF "TARGET_AVX512F") (V8DF "TARGET_AVX512F")
(V16SI "TARGET_AVX512F") (V8DI "TARGET_AVX512F")
-   (V32HI "TARGET_AVX512BW")])
+   (V32HI "TARGET_AVX512BW") (V64QI "TARGET_AVX512VBMI")])
 
 (define_expand "vec_perm"
   [(match_operand:VEC_PERM_AVX2 0 "register_operand")
-- 
1.8.3.1
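
For readers unfamiliar with what a variable v64qi permutation looks like
at the source level, here is a small sketch using GCC's generic vector
extensions.  It is an illustration only, not part of the patch: the
function names are made up, and whether __builtin_shuffle ends up as a
single vpermi2b depends on the target flags.  Without AVX512VBMI GCC
lowers the operation generically, so the sketch should still compile and
run with plain GCC.

```c
#include <string.h>

/* 64 one-byte lanes, i.e. the source-level shape of V64QImode.  */
typedef unsigned char v64qi __attribute__ ((vector_size (64)));

/* Two-operand permute with a selector that is not a compile-time
   constant.  Selector values are taken modulo 128: 0..63 pick a byte
   from A, 64..127 pick a byte from B.  */
void
permute_v64qi (const unsigned char *a, const unsigned char *b,
               const unsigned char *sel, unsigned char *out)
{
  v64qi va, vb, vsel, vres;

  memcpy (&va, a, sizeof va);
  memcpy (&vb, b, sizeof vb);
  memcpy (&vsel, sel, sizeof vsel);
  vres = __builtin_shuffle (va, vb, vsel);
  memcpy (out, &vres, sizeof vres);
}

/* Compare the vector permute against a scalar reference and return
   the number of mismatching bytes.  */
int
count_permute_mismatches (void)
{
  unsigned char a[64], b[64], sel[64], out[64];
  int i, bad = 0;

  for (i = 0; i < 64; i++)
    {
      a[i] = (unsigned char) i;
      b[i] = (unsigned char) (100 + i);
      sel[i] = (unsigned char) ((i * 37 + 5) % 128);
    }

  permute_v64qi (a, b, sel, out);

  for (i = 0; i < 64; i++)
    {
      unsigned char expected = sel[i] < 64 ? a[sel[i]] : b[sel[i] - 64];
      if (out[i] != expected)
        bad++;
    }
  return bad;
}
```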



Re: [patch, build] Restore bootstrap in building libcc1 on darwin

2014-12-05 Thread Jakub Jelinek
On Mon, Nov 24, 2014 at 01:06:45AM +0100, FX wrote:
> tl;dr: One question to build maintainers, and one patch submitted to toplevel 
> configure.ac
> 
> ---
> 
> I’ve looked into the issue some more, and am comparing two builds of trunk 
> (exact same source), one configured with the system compiler (clang) in PATH, the 
> other with GCC 4.9.2 in PATH.
> At the toplevel configure, the only meaningful difference is that the 
> gcc-based build sets stage1_ldflags='-static-libstdc++ -static-libgcc' while 
> the clang-based one has stage1_ldflags='' (clang doesn’t recognize 
> -static-libstdc++).
> 
> This is included into the toplevel Makefile as STAGE1_LDFLAGS (the comment 
> appropriately says "Linker flags to use on the host, for stage1 or when not 
> bootstrapping”).
> Those are exported by HOST_EXPORTS, which is then used by 
> configure-libcc1, all-libcc1, etc. Thus, we end up using STAGE1_LDFLAGS, 
> which correspond to the system compiler, instead of those for the stage3 
> compiler (as we should).
> 
> So, this is “false negative” part of the problem (namely, why we don’t see 
> the failure when bootstrapping with clang): we use STAGE1_LDFLAGS in building 
> libcc1, and with clang as system compiler we don’t use static linking of the 
> C++ library. This part, I don’t know how to fix: it is for the build experts 
> to address. It is a real problem: it leads to libcc1.so being linked 
> dynamically to libstdc++ and libgcc, instead of statically (as it should).
> 
> ---
> 
> Second part of the question: when the freshly built g++ is used, we need to 
> pass the appropriate -B options. As I understand it, the appropriate place 
> for that is in the toplevel configure.ac, where we already pass down the 
> respective -L options. Indeed, the attached patch restores bootstrap on 
> x86_64-apple-darwin14 with gcc as system compiler (and doesn’t break the 
> bootstrap with clang as system compiler).
> 
> OK to commit?

Reading the toplevel Makefile and trying to understand how things work
for non-bootstrap vs. bootstrap host dirs that aren't bootstrapped,
I'd say the right fix should be something like the following.
I'm bootstrapping/regtesting it right now on x86_64-linux and i686-linux,
though it won't make much difference there: on x86_64-linux
STAGE1_LDFLAGS is equal to POSTSTAGE1_LDFLAGS and STAGE1_LIBS is equal
to POSTSTAGE1_LIBS.  On i686-linux there is at least a difference in
*STAGE1_LDFLAGS for some reason (possibly related to my setarch and
gcc -m32 wrapper hacks to make i686-linux bootstrap work on an
x86_64-linux box): only POSTSTAGE1_LDFLAGS is -static-libstdc++
-static-libgcc.

From my reading, POSTSTAGE1_HOST_EXPORTS is clearly inappropriate for
modules like libcc1, because it uses prev-gcc/, while we want to use gcc/,
but otherwise, looking at the HOST_EXPORTS vs. POSTSTAGE1_HOST_EXPORTS
differences, LDFLAGS and HOST_LIBS are what need changing.
For some reason POSTSTAGE1_HOST_EXPORTS sets LDFLAGS to
$(POSTSTAGE1_LDFLAGS) $(BOOT_LDFLAGS)
(the first part is ok and clear, the latter differs from the HOST_EXPORTS
$(STAGE1_LDFLAGS) $(LDFLAGS)).
With my patch below, one actually ends up with
$(POSTSTAGE1_LDFLAGS) $(LDFLAGS_FOR_TARGET)
in LDFLAGS for libcc1 when bootstrapping, while previously
$(STAGE1_LDFLAGS) $(LDFLAGS_FOR_TARGET)
was used.  STAGE1_L{DFLAGS,IBS} is only used in $(HOST_EXPORTS),
so at least in theory I think my patch should DTRT.

Can you please test it on Darwin (or whatever other target has similar
issues with bootstrapping libcc1)?

2014-12-05  Jakub Jelinek  

PR bootstrap/64023
* Makefile.tpl (EXTRA_TARGET_FLAGS): Set STAGE1_LDFLAGS
to POSTSTAGE1_LDFLAGS and STAGE1_LIBS to POSTSTAGE1_LIBS.
* Makefile.in: Regenerated.

--- Makefile.tpl.jj 2014-11-12 09:31:59.0 +0100
+++ Makefile.tpl 2014-12-05 17:14:16.115295667 +0100
@@ -659,6 +659,8 @@ EXTRA_TARGET_FLAGS = \
'WINDRES=$$(WINDRES_FOR_TARGET)' \
'WINDMC=$$(WINDMC_FOR_TARGET)' \
'XGCC_FLAGS_FOR_TARGET=$(XGCC_FLAGS_FOR_TARGET)' \
+   'STAGE1_LDFLAGS=$$(POSTSTAGE1_LDFLAGS)' \
+   'STAGE1_LIBS=$$(POSTSTAGE1_LIBS)' \
"TFLAGS=$$TFLAGS"
 
 TARGET_FLAGS_TO_PASS = $(BASE_FLAGS_TO_PASS) $(EXTRA_TARGET_FLAGS)
--- Makefile.in.jj  2014-11-28 14:40:52.0 +0100
+++ Makefile.in 2014-12-05 17:15:04.322439003 +0100
@@ -853,6 +853,8 @@ EXTRA_TARGET_FLAGS = \
'WINDRES=$$(WINDRES_FOR_TARGET)' \
'WINDMC=$$(WINDMC_FOR_TARGET)' \
'XGCC_FLAGS_FOR_TARGET=$(XGCC_FLAGS_FOR_TARGET)' \
+   'STAGE1_LDFLAGS=$$(POSTSTAGE1_LDFLAGS)' \
+   'STAGE1_LIBS=$$(POSTSTAGE1_LIBS)' \
"TFLAGS=$$TFLAGS"
 
 TARGET_FLAGS_TO_PASS = $(BASE_FLAGS_TO_PASS) $(EXTRA_TARGET_FLAGS)


Jakub


Re: [PATCH][AArch64] Use std::swap instead of manually swapping

2014-12-05 Thread Kyrill Tkachov

Ping.

https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01426.html

Thanks,
Kyrill

On 27/11/14 15:37, Kyrill Tkachov wrote:

Ping.

Thanks,
Kyrill

On 13/11/14 09:42, Kyrill Tkachov wrote:

Hi all,

Following the trend in i386 and alpha, this patch uses std::swap to
perform swapping of values in the aarch64 backend instead of declaring
temporaries.
Tested and bootstrapped on aarch64-linux.

Ok for trunk?

Thanks,
Kyrill


2014-11-13  Kyrylo Tkachov  

   * config/aarch64/aarch64.c (aarch64_evpc_ext): Use std::swap instead
   of manual swapping implementation.
   (aarch64_expand_vec_perm_const_1): Likewise.


Re: [PATCH] Add force option to find_best_rename_reg in regrename pass

2014-12-05 Thread Eric Botcazou
> 2014-11-26 Thomas Preud'homme thomas.preudho...@arm.com
> 
>   * regrename.c (find_best_rename_reg): Rename to ...
>   (find_rename_reg): This. Also add a parameter to skip tick check.
>   * regrename.h: Likewise.
>   * config/c6x/c6x.c (try_rename_operands): Adapt to above renaming.

OK for mainline, but investigate whether you can better format the 
config/c6x/c6x.c line, for example:

+ best_reg
+   = find_rename_reg (this_head, super_class, &unavailable, old_reg, true);

-- 
Eric Botcazou


Re: [PATCH 0/4][AArch64] PR/63870 Improve handling of errors in SIMD intrinsics

2014-12-05 Thread Charles Baylis
On 5 December 2014 at 11:45, Alan Lawrence  wrote:
> Following on from Charles Baylis' patch to improve the error message when
> expanding arguments with qualifier_lane_index, this applies similar
> treatment to __builtin_aarch64_im_lane_boundsi (used for e.g. vset_lane and
> vext), and the more general case of immediates which should be constant but
> aren't.
>
> These patches depend upon the __aarch64_lane macro in
> https://gcc.gnu.org/ml/gcc-patches/2014-11/msg03134.html .
>
> All patches cross-tested with check-gcc on aarch64-none-elf and
> aarch64_be-none-elf.
>
> Ok for trunk (following
> https://gcc.gnu.org/ml/gcc-patches/2014-11/msg03134.html) ?

All look good to me (but I can't approve), except that the changelogs
should be marked with PR target/63870

I was about to submit a patch series for vldN_lane and vstN_lane with
a version of patch #1, but yours is much better than mine. I'll drop
that part and send it on ASAP.

Cheers
Charles


Re: [ping] account for register spans in expand_builtin_init_dwarf_reg_sizes

2014-12-05 Thread Olivier Hainque

On Dec 4, 2014, at 23:14 , Jason Merrill  wrote:

> On 11/24/2014 03:08 AM, Olivier Hainque wrote:
>> +  if (init_state->processed_regno[regno])
>> +return;
> 
> I would expect this to go in the loop in expand_builtin_init_dwarf_reg_sizes, 
> before we look up a span for the regno.

Sure.

>  OK with that change.

Great, just checked in after re-bootstrapping and
regtesting on x86_64-linux. Thanks :-)

Regards,

Olivier



Re: [PATCH AARCH64]load store pair optimization using sched_fusion pass.

2014-12-05 Thread Marcus Shawcroft
On 18 November 2014 at 08:34, Bin Cheng  wrote:

> 2014-11-18  Bin Cheng  
>
> * config/aarch64/aarch64.md (load_pair): Split to
> load_pairsi, load_pairdi, load_pairsf and load_pairdf.
> (load_pairsi, load_pairdi, load_pairsf, load_pairdf): Split
> from load_pair.  New alternative to support int/fp
> registers in fp/int mode patterns.
> (store_pair:): Split to store_pairsi, store_pairdi,
> store_pairsf and store_pairdf.
> (store_pairsi, store_pairdi, store_pairsf, store_pairdf): Split
> from store_pair.  New alternative to support int/fp
> registers in fp/int mode patterns.
> (*load_pair_extendsidi2_aarch64): New pattern.
> (*load_pair_zero_extendsidi2_aarch64): New pattern.
> (aarch64-ldpstp.md): Include.
> * config/aarch64/aarch64-ldpstp.md: New file.
> * config/aarch64/aarch64-protos.h (aarch64_gen_adjusted_ldpstp):
> New.
> (extract_base_offset_in_addr): New.
> (aarch64_operands_ok_for_ldpstp): New.
> (aarch64_operands_adjust_ok_for_ldpstp): New.
> * config/aarch64/aarch64.c (enum sched_fusion_type): New enum.
> (TARGET_SCHED_FUSION_PRIORITY): New hook.
> (fusion_load_store): New function.
> (extract_base_offset_in_addr): New function.
> (aarch64_gen_adjusted_ldpstp): New function.
> (aarch64_sched_fusion_priority): New function.
> (aarch64_operands_ok_for_ldpstp): New function.
> (aarch64_operands_adjust_ok_for_ldpstp): New function.
>
> 2014-11-18  Bin Cheng  
>
> * gcc.target/aarch64/ldp-stp-1.c: New test.
> * gcc.target/aarch64/ldp-stp-2.c: New test.
> * gcc.target/aarch64/ldp-stp-3.c: New test.
> * gcc.target/aarch64/ldp-stp-4.c: New test.
> * gcc.target/aarch64/ldp-stp-5.c: New test.
> * gcc.target/aarch64/lr_free_1.c: Disable scheduling fusion
> and peephole2 pass.

Committed, thanks. /Marcus


Re: [PING][PATCH] [AARCH64, NEON] Improve vcls(q?) vcnt(q?) and vld1(q?)_dup intrinsics

2014-12-05 Thread Marcus Shawcroft
On 19 November 2014 at 06:14, Yangfei (Felix)  wrote:

> Index: gcc/ChangeLog
> ===
> --- gcc/ChangeLog   (revision 217717)
> +++ gcc/ChangeLog   (working copy)
> @@ -1,3 +1,14 @@
> +2014-11-13  Felix Yang  
> +   Shanyao Chen  
> +
> +   * config/aarch64/aarch64-simd.md (clrsb2, popcount2): New
> +   patterns.
> +   * config/aarch64/aarch64-simd-builtins.def (clrsb, popcount): New
> +   builtins.
> +   * config/aarch64/arm_neon.h (vcls_s8, vcls_s16, vcls_s32, vclsq_s8,
> +   vclsq_s16, vclsq_s32, vcnt_p8, vcnt_s8, vcnt_u8, vcntq_p8, vcntq_s8,
> +   vcntq_u8): Rewrite using builtin functions.
> +

OK Thanks /Marcus


Re: Compare-elim pass (was: Re: [PATCH] Fix PR 61225)

2014-12-05 Thread Eric Botcazou
> --quote--
> If we want to use this pass for x86, then for 4.8 we should also fix the
> discrepancy between the compare-elim canonical
> 
>   [(operate)
>(set-cc)]
> 
> and the combine canonical
> 
>   [(set-cc)
>(operate)]
> 
> (Because of the simplicity of the substitution in compare-elim, I prefer
> the former as the canonical canonical.)
> --/quote--

I agree with the above.

> There were some patches flowing around [2], [3] that enhanced
> compare-elim pass for x86 needs, but the target never switched to new
> pass, mostly because compare-elim pass did not catch all cases that
> traditional RTX combine pass did.

Does [2] really work with the mode mismatch?  See the pending patch at
  https://gcc.gnu.org/ml/gcc-patches/2014-11/msg03458.html

> Due to the above, I would like to propose that existing RTX compare
> pass be updated to handle [(operate)(set-cc)] patterns (exclusively?).

That's already what it does though, did you mean the opposite?  Or did you 
mean to write "combine" instead of "compare"?

> There is also a hidden benefit for "other", compare-elim-only targets.
> Having this pass enabled on a wildly popular target would help
> catch any bugs in the pass.

FWIW we're about to submit a port that makes a heavy use of it.

-- 
Eric Botcazou


Re: [PATCH 0/4][AArch64] PR/63870 Improve handling of errors in SIMD intrinsics

2014-12-05 Thread Alan Lawrence
Ok, thanks Charles - sorry for/if duplication of effort, that just spun out of 
trying to get rid of the calls to aarch64_simd_lane_bounds, as per 
https://gcc.gnu.org/ml/gcc-patches/2014-11/msg02510.html . Again as per that 
message I'm leaving aarch64_ld{2,3,4}_lane to you ;).


Also there is the ARM backend! There's quite a lot to port across AArch64->ARM 
but my first priority is to try to get float16_t intrinsics working on ARM, i.e. 
probably doing as little as possible to get vget_lane_f16 etc. and then focus on 
performance later. Wondering if you have any plans for ARM ?


--Alan

Charles Baylis wrote:

On 5 December 2014 at 11:45, Alan Lawrence  wrote:

Following on from Charles Baylis' patch to improve the error message when
expanding arguments with qualifier_lane_index, this applies similar
treatment to __builtin_aarch64_im_lane_boundsi (used for e.g. vset_lane and
vext), and the more general case of immediates which should be constant but
aren't.

These patches depend upon the __aarch64_lane macro in
https://gcc.gnu.org/ml/gcc-patches/2014-11/msg03134.html .

All patches cross-tested with check-gcc on aarch64-none-elf and
aarch64_be-none-elf.

Ok for trunk (following
https://gcc.gnu.org/ml/gcc-patches/2014-11/msg03134.html) ?


All look good to me (but I can't approve), except that the changelogs
should be marked with PR target/63870

I was about to submit a patch series for vldN_lane and vstN_lane with
a version of patch #1, but yours is much better than mine. I'll drop
that part and send it on ASAP.

Cheers
Charles


Re: [PATCH 2/2] Pipeline model for APM XGene-1.

2014-12-05 Thread Marcus Shawcroft
On 21 November 2014 at 18:44, Philipp Tomsich
 wrote:

> +;; Machine description for AppliedMicro xgene1 core.
> +;; Copyright (C) 2012-2014 Free Software Foundation, Inc.
> +;; Contributed by Theobroma Systems Design und Consulting GmbH.
> +;;See http://www.theobroma-systems.com for more info.
>
"Contributed by" is fine, but I don't believe the proposed http link
here provides any information relevant to the copyright, license,
scheduler, port or gcc. I suggest dropping that line.

Otherwise OK provided the comments in Ramana's response are addressed.
Cheers
/Marcus


Re: [PATCH][AARCH64]Clarify the usage of SCHED in AARCH64_CORE macro

2014-12-05 Thread Marcus Shawcroft
On 3 December 2014 at 15:30, Renlin Li  wrote:

> 2014-12-03  Renlin Li  
>
> * config/aarch64/aarch64-opts.h (AARCH64_CORE): Rename IDENT to SCHED.
> * config/aarch64/aarch64.h (AARCH64_CORE): Likewise.
> * config/aarch64/aarch64.c (AARCH64_CORE): Rename X to IDENT, IDENT to
> SCHED.

OK /Marcus


Re: [PATCH 1/2] Core definition for APM XGene-1 and associated cost-table.

2014-12-05 Thread Marcus Shawcroft
On 21 November 2014 at 18:44, Philipp Tomsich
 wrote:

> +2014-11-19  Philipp Tomsich  
> +
> +   * config/aarch64/aarch64-cores.def (xgene1): Update/add the
> +   xgene1 (APM XGene-1) core definition.
> +   * gcc/config/aarch64/aarch64.c: Add cost tables for APM XGene-1
> +   * config/arm/aarch-cost-tables.h: Add cost tables for APM XGene-1
> +   * doc/invoke.texi: Document -mcpu=xgene1.


OK provided the comments in Ramana's earlier response have been addressed.
/Marcus


Re: [PATCH 2/2] Pipeline model for APM XGene-1.

2014-12-05 Thread Dr. Philipp Tomsich
Should I revise, or will you just drop the line when applying 
this?

Thanks,
Phil.

> On 05 Dec 2014, at 18:23, Marcus Shawcroft  wrote:
> 
> On 21 November 2014 at 18:44, Philipp Tomsich
>  wrote:
> 
>> +;; Machine description for AppliedMicro xgene1 core.
>> +;; Copyright (C) 2012-2014 Free Software Foundation, Inc.
>> +;; Contributed by Theobroma Systems Design und Consulting GmbH.
>> +;;See http://www.theobroma-systems.com for more info.
>> 
> "Contributed by" is fine, but I don't believe the proposed http link
> here provides any information relevant to the copyright, license,
> scheduler, port or gcc. I suggest dropping that line.
> 
> Otherwise OK provided the comments in Ramana's response are addressed.
> Cheers
> /Marcus



Re: [PATCH, PR 64192] Add forgotten conversion from bits to bytes

2014-12-05 Thread H.J. Lu
On Fri, Dec 5, 2014 at 8:13 AM, Martin Jambor  wrote:
> Hi,
>
> at some point I lost an important division of the bit offset by
> BITS_PER_UNIT in my alignment IPA-CP propagation patch.  That led to a
> few failures on i686, reported as PR 64192.
>
> This patch adds it back, together with a slight improvement of the
> guarding check, which I suppose will never trigger but which does
> ensure the division will never lose information.
>
> I consider this change obvious and would really like to commit it
> before I leave for the weekend, so I will do so after it finishes
> bootstrapping and testing on i686.  It has already passed bootstrap
> and testing on x86_64-linux.
>
> Thanks,
>
> Martin
>
>
> 2014-12-05  Martin Jambor  
>
> PR ipa/64192
> * ipa-prop.c (ipa_compute_jump_functions_for_edge): Convert alignment
> from bits to bytes after checking they are byte-aligned.
>
> Index: src/gcc/ipa-prop.c
> ===
> --- src.orig/gcc/ipa-prop.c
> +++ src/gcc/ipa-prop.c
> @@ -1739,10 +1739,11 @@ ipa_compute_jump_functions_for_edge (str
>   unsigned align;
>
>   if (get_pointer_alignment_1 (arg, &align, &hwi_bitpos)
> - && align > BITS_PER_UNIT)
> + && align % BITS_PER_UNIT == 0
> + && hwi_bitpos % BITS_PER_UNIT == 0)
> {
>   jfunc->alignment.known = true;
> - jfunc->alignment.align = align;
> + jfunc->alignment.align = align / BITS_PER_UNIT;
>   jfunc->alignment.misalign = hwi_bitpos / BITS_PER_UNIT;
> }
>   else

It also fixed SPEC CPU 2000 regression on Linux/i686.

Thanks.

-- 
H.J.


Re: [PATCH] [AArch64, NEON] Improve vpmaxX & vpminX intrinsics

2014-12-05 Thread Tejas Belagod

On 28/11/14 09:23, Yangfei (Felix) wrote:

Hi,
   This patch converts vpmaxX & vpminX intrinsics to use builtin functions 
instead of the previous inline assembly syntax.
   Regtested with aarch64-linux-gnu on QEMU.  Also passed the glorious 
testsuite of Christophe Lyon.
   OK for the trunk?


Index: gcc/ChangeLog
===
--- gcc/ChangeLog   (revision 218128)
+++ gcc/ChangeLog   (working copy)
@@ -1,3 +1,19 @@
+2014-11-28  Felix Yang  
+
+   * config/aarch64/aarch64-simd.md (aarch64_p): New
+   pattern.
+   * config/aarch64/aarch64-simd-builtins.def (smaxp, sminp, umaxp,
+   uminp, smax_nanp, smin_nanp): New builtins.
+   * config/aarch64/arm_neon.h (vpmax_s8, vpmax_s16, vpmax_s32,
+   vpmax_u8, vpmax_u16, vpmax_u32, vpmaxq_s8, vpmaxq_s16, vpmaxq_s32,
+   vpmaxq_u8, vpmaxq_u16, vpmaxq_u32, vpmax_f32, vpmaxq_f32, vpmaxq_f64,
+   vpmaxqd_f64, vpmaxs_f32, vpmaxnm_f32, vpmaxnmq_f32, vpmaxnmq_f64,
+   vpmaxnmqd_f64, vpmaxnms_f32, vpmin_s8, vpmin_s16, vpmin_s32, vpmin_u8,
+   vpmin_u16, vpmin_u32, vpminq_s8, vpminq_s16, vpminq_s32, vpminq_u8,
+   vpminq_u16, vpminq_u32, vpmin_f32, vpminq_f32, vpminq_f64, vpminqd_f64,
+   vpmins_f32, vpminnm_f32, vpminnmq_f32, vpminnmq_f64, vpminnmqd_f64,
+   vpminnms_f32): Rewrite using builtin functions.
+


You'll need to rebase over Alan Lawrence's patch.
https://gcc.gnu.org/ml/gcc-patches/2014-12/msg00279.html


  __extension__ static __inline float32x2_t __attribute__ ((__always_inline__))
Index: gcc/config/aarch64/aarch64-simd.md
===
--- gcc/config/aarch64/aarch64-simd.md  (revision 218128)
+++ gcc/config/aarch64/aarch64-simd.md  (working copy)
@@ -1015,6 +1015,28 @@
DONE;
  })

+;; Pairwise Integer Max/Min operations.
+(define_insn "aarch64_p"
+ [(set (match_operand:VQ_S 0 "register_operand" "=w")
+   (unspec:VQ_S [(match_operand:VQ_S 1 "register_operand" "w")
+(match_operand:VQ_S 2 "register_operand" "w")]
+   MAXMINV))]
+ "TARGET_SIMD"
+ "p\t%0., %1., %2."
+  [(set_attr "type" "neon_minmax")]
+)
+


Could you roll aarch64_reduc__internalv2si into this pattern?

Thanks,
Tejas.



Re: [PATCH][AArch64]Fix ICE at -O0 on vld1_lane intrinsics

2014-12-05 Thread Marcus Shawcroft
On 25 November 2014 at 14:03, Alan Lawrence  wrote:

> gcc/ChangeLog:
>
> * config/aarch64/arm_neon.h (__AARCH64_NUM_LANES, __aarch64_lane
> *2):
> New.
> (aarch64_vset_lane_any): Redefine using previous, same for BE + LE.
> (vset_lane_f32, vset_lane_f64, vset_lane_p8, vset_lane_p16,
> vset_lane_s8, vset_lane_s16, vset_lane_s32, vset_lane_s64,
> vset_lane_u8, vset_lane_u16, vset_lane_u32, vset_lane_u64): Remove
> number of lanes.
> (vld1_lane_f32, vld1_lane_f64, vld1_lane_p8, vld1_lane_p16,
> vld1_lane_s8, vld1_lane_s16, vld1_lane_s32, vld1_lane_s64,
> vld1_lane_u8, vld1_lane_u16, vld1_lane_u32, vld1_lane_u64): Call
> __aarch64_vset_lane_any rather than vset_lane_xxx.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/vld1_lane-o0.c: New test.

OK /Marcus


Re: [PATCH 1/4][AArch64]Fix ICE on non-constant indices to __builtin_aarch64_im_lane_boundsi

2014-12-05 Thread Marcus Shawcroft
On 5 December 2014 at 11:54, Alan Lawrence  wrote:

> gcc/ChangeLog:
>
> * config/aarch64/aarch64-builtins.c
> (aarch64_types_binopv_qualifiers,
> TYPES_BINOPV): Delete.
> (enum aarch64_builtins): Add AARCH64_BUILTIN_SIMD_LANE_CHECK and
> AARCH64_SIMD_PATTERN_START.
> (aarch64_init_simd_builtins): Register
> __builtin_aarch64_im_lane_boundsi; use  AARCH64_SIMD_PATTERN_START.
> (aarch64_simd_expand_builtin): Handle AARCH64_BUILTIN_LANE_CHECK;
> use
> AARCH64_SIMD_PATTERN_START.
>
> * config/aarch64/aarch64-simd.md (aarch64_im_lane_boundsi): Delete.
> * config/aarch64/aarch64-simd-builtins.def (im_lane_bound): Delete.
>
> * config/aarch64/arm_neon.h (__AARCH64_LANE_CHECK): New.
> (__aarch64_vget_lane_f64, __aarch64_vget_lane_s64,
> __aarch64_vget_lane_u64, __aarch64_vset_lane_any, vdupd_lane_f64,
> vdupd_lane_s64, vdupd_lane_u64, vext_f32, vext_f64, vext_p8,
> vext_p16,
> vext_s8, vext_s16, vext_s32, vext_s64, vext_u8, vext_u16, vext_u32,
> vext_u64, vextq_f32, vextq_f64, vextq_p8, vextq_p16, vextq_s8,
> vextq_s16, vextq_s32, vextq_s64, vextq_u8, vextq_u16, vextq_u32,
> vextq_u64, vmulq_lane_f64): Use __AARCH64_LANE_CHECK.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/simd/vset_lane_s16_const_1.c: New test.

OK /Marcus


Re: [PATCH 2/4][AArch64]Improve error message for non-constant immediates

2014-12-05 Thread Marcus Shawcroft
On 5 December 2014 at 11:55, Alan Lawrence  wrote:

> gcc/ChangeLog:
>
> * gcc/config/aarch64-builtins.c (aarch64_simd_expand_args): Update
> error
> message for SIMD_ARG_CONSTANT.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/arg-type-diagnostics-1.c: Call intrinsic,
> update
> expected error message.

OK /Marcus


Re: [PATCH 3/4][AArch64]Remove be_checked_get_lane, check bounds with __builtin_aarch64_im_lane_boundsi.

2014-12-05 Thread Marcus Shawcroft
On 5 December 2014 at 11:56, Alan Lawrence  wrote:

> gcc/ChangeLog:
>
> * config/aarch64/aarch64-simd-builtins.def (be_checked_get_lane):
> Delete.
> * config/aarch64/aarch64-simd.md
> (aarch64_be_checked_get_lane):
> Delete.
> * config/aarch64/arm_neon.h (aarch64_vget_lane_any): Use GCC
> vector extensions, __aarch64_lane,
> __builtin_aarch64_im_lane_boundsi.
> (__aarch64_vget_lane_f32, __aarch64_vget_lane_f64,
> __aarch64_vget_lane_p8, __aarch64_vget_lane_p16,
> __aarch64_vget_lane_s8, __aarch64_vget_lane_s16,
> __aarch64_vget_lane_s32, __aarch64_vget_lane_s64,
> __aarch64_vget_lane_u8, __aarch64_vget_lane_u16,
> __aarch64_vget_lane_u32, __aarch64_vget_lane_u64,
> __aarch64_vgetq_lane_f32, __aarch64_vgetq_lane_f64,
> __aarch64_vgetq_lane_p8, __aarch64_vgetq_lane_p16,
> __aarch64_vgetq_lane_s8, __aarch64_vgetq_lane_s16,
> __aarch64_vgetq_lane_s32, __aarch64_vgetq_lane_s64,
> __aarch64_vgetq_lane_u8, __aarch64_vgetq_lane_u16,
> __aarch64_vgetq_lane_u32, __aarch64_vgetq_lane_u64): Delete.
> (__aarch64_vdup_lane_any): Use __aarch64_vget_lane_any, remove
> ‘q2’ argument.
> (__aarch64_vdup_lane_f32, __aarch64_vdup_lane_f64,
> __aarch64_vdup_lane_p8, __aarch64_vdup_lane_p16,
> __aarch64_vdup_lane_s8, __aarch64_vdup_lane_s16,
> __aarch64_vdup_lane_s32, __aarch64_vdup_lane_s64,
> __aarch64_vdup_lane_u8, __aarch64_vdup_lane_u16,
> __aarch64_vdup_lane_u32, __aarch64_vdup_lane_u64,
> __aarch64_vdup_laneq_f32, __aarch64_vdup_laneq_f64,
> __aarch64_vdup_laneq_p8, __aarch64_vdup_laneq_p16,
> __aarch64_vdup_laneq_s8, __aarch64_vdup_laneq_s16,
> __aarch64_vdup_laneq_s32, __aarch64_vdup_laneq_s64,
> __aarch64_vdup_laneq_u8, __aarch64_vdup_laneq_u16,
> __aarch64_vdup_laneq_u32, __aarch64_vdup_laneq_u64): Remove argument
> to __aarch64_vdup_lane_any.
> (vget_lane_f32, vget_lane_f64, vget_lane_p8, vget_lane_p16,
> vget_lane_s8, vget_lane_s16, vget_lane_s32, vget_lane_s64,
> vget_lane_u8, vget_lane_u16, vget_lane_u32, vget_lane_u64,
> vgetq_lane_f32, vgetq_lane_f64, vgetq_lane_p8, vgetq_lane_p16,
> vgetq_lane_s8, vgetq_lane_s16, vgetq_lane_s32, vgetq_lane_s64,
> vgetq_lane_u8, vgetq_lane_u16, vgetq_lane_u32, vgetq_lane_u64,
> vdupb_lane_p8, vdupb_lane_s8, vdupb_lane_u8, vduph_lane_p16,
> vduph_lane_s16, vduph_lane_u16, vdups_lane_f32, vdups_lane_s32,
> vdups_lane_u32, vdupb_laneq_p8, vdupb_laneq_s8, vdupb_laneq_u8,
> vduph_laneq_p16, vduph_laneq_s16, vduph_laneq_u16, vdups_laneq_f32,
> vdups_laneq_s32, vdups_laneq_u32, vdupd_laneq_f64, vdupd_laneq_s64,
> vdupd_laneq_u64, vfmas_lane_f32, vfma_laneq_f64, vfmad_laneq_f64,
> vfmas_laneq_f32, vfmss_lane_f32, vfms_laneq_f64, vfmsd_laneq_f64,
> vfmss_laneq_f32, vmla_lane_f32, vmla_lane_s16, vmla_lane_s32,
> vmla_lane_u16, vmla_lane_u32, vmla_laneq_f32, vmla_laneq_s16,
> vmla_laneq_s32, vmla_laneq_u16, vmla_laneq_u32, vmlaq_lane_f32,
> vmlaq_lane_s16, vmlaq_lane_s32, vmlaq_lane_u16, vmlaq_lane_u32,
> vmlaq_laneq_f32, vmlaq_laneq_s16, vmlaq_laneq_s32, vmlaq_laneq_u16,
> vmlaq_laneq_u32, vmls_lane_f32, vmls_lane_s16, vmls_lane_s32,
> vmls_lane_u16, vmls_lane_u32, vmls_laneq_f32, vmls_laneq_s16,
> vmls_laneq_s32, vmls_laneq_u16, vmls_laneq_u32, vmlsq_lane_f32,
> vmlsq_lane_s16, vmlsq_lane_s32, vmlsq_lane_u16, vmlsq_lane_u32,
> vmlsq_laneq_f32, vmlsq_laneq_s16, vmlsq_laneq_s32, vmlsq_laneq_u16,
> vmlsq_laneq_u32, vmul_lane_f32, vmul_lane_s16, vmul_lane_s32,
> vmul_lane_u16, vmul_lane_u32, vmuld_lane_f64, vmuld_laneq_f64,
> vmuls_lane_f32, vmuls_laneq_f32, vmul_laneq_f32, vmul_laneq_f64,
> vmul_laneq_s16, vmul_laneq_s32, vmul_laneq_u16, vmul_laneq_u32,
> vmulq_lane_f32, vmulq_lane_s16, vmulq_lane_s32, vmulq_lane_u16,
> vmulq_lane_u32, vmulq_laneq_f32, vmulq_laneq_f64, vmulq_laneq_s16,
> vmulq_laneq_s32, vmulq_laneq_u16, vmulq_laneq_u32) : Use
> __aarch64_vget_lane_any.
>
> gcc/testsuite/ChangeLog:
>
> * gcc.target/aarch64/simd/vget_lane_f32_indices_1.c: New test.
> * gcc.target/aarch64/simd/vget_lane_f64_indices_1.c: Likewise.
> * gcc.target/aarch64/simd/vget_lane_p16_indices_1.c: Likewise.
> * gcc.target/aarch64/simd/vget_lane_p8_indices_1.c: Likewise.
> * gcc.target/aarch64/simd/vget_lane_s16_indices_1.c: Likewise.
> * gcc.target/aarch64/simd/vget_lane_s32_indices_1.c: Likewise.
> * gcc.target/aarch64/simd/vget_lane_s64_indices_1.c: Likewise.
> * gcc.target/aarch64/simd/vget_lane_s8_indices_1.c: Likewise.
> * gcc.target/aarch64/simd/vget_lane_u16_indices_1.c: Likewise.
> * gcc.target/aarch64/simd/vget_lan

Re: [PATCH 4/4][AArch64]Remove aarch64_get_lanedi, unused

2014-12-05 Thread Marcus Shawcroft
On 5 December 2014 at 11:56, Alan Lawrence  wrote:
> I tested this by poisoning the old pattern and running check-gcc on both
> aarch64-none-elf and aarch64_be-none-elf; there were no regressions even
> with the poisoned pattern.
>
> gcc/ChangeLog:
>
> * config/aarch64/aarch64-simd.md (aarch64_get_lanedi): Remove.

OK /Marcus


Re: [PATCH][AArch64][test] Disable vector cost model on vect_ctz_1.c test

2014-12-05 Thread Marcus Shawcroft
On 4 December 2014 at 09:42, Kyrill Tkachov  wrote:

> 2014-12-04  Kyrylo Tkachov  kyrylo.tkac...@arm.com
>
>  * gcc.target/aarch64/vect_ctz_1.c: Add -fno-vect-cost-model to
>  dg-options.

OK /Marcus


Re: [ARM,AArch64][testsuite] Fix vaddl and vaddw tests

2014-12-05 Thread Marcus Shawcroft
On 5 December 2014 at 13:14, Christophe Lyon  wrote:

>>> 2014-12-03  Christophe Lyon  
>>>
>>> testsuite/
>>> * gcc.target/aarch64/advsimd-intrinsics/vaddl.c: Actually execute
>>> the test. Fix expected output.
>>> * gcc.target/aarch64/advsimd-intrinsics/vaddw.c: Likewise.

OK /Marcus


RE: [PATCH] Add force option to find_best_rename_reg in regrename pass

2014-12-05 Thread Thomas Preud'homme
> From: Eric Botcazou [mailto:ebotca...@adacore.com]
> Sent: Friday, December 05, 2014 4:40 PM
> 
> OK for mainline, but investigate whether you can better format the
> config/c6x/c6x.c line, for example:
> 
> + best_reg
> +   = find_rename_reg (this_head, super_class, &unavailable, old_reg,
> true);

Committed as suggested.

Thanks. Best regards,

Thomas 






Re: [PING] [PATCH] [AArch64, NEON] More NEON intrinsics improvement

2014-12-05 Thread Tejas Belagod




+__extension__ static __inline float32x2_t __attribute__
+((__always_inline__))
+vfms_f32 (float32x2_t __a, float32x2_t __b, float32x2_t __c) {
+  return __builtin_aarch64_fmav2sf (-__b, __c, __a); }
+
+__extension__ static __inline float32x4_t __attribute__
+((__always_inline__))
+vfmsq_f32 (float32x4_t __a, float32x4_t __b, float32x4_t __c) {
+  return __builtin_aarch64_fmav4sf (-__b, __c, __a); }
+
+__extension__ static __inline float64x2_t __attribute__
+((__always_inline__))
+vfmsq_f64 (float64x2_t __a, float64x2_t __b, float64x2_t __c) {
+  return __builtin_aarch64_fmav2df (-__b, __c, __a); }
+
+


Thanks, the patch looks good. Just one comment:
You could also add
float32x2_t vfms_n_f32(float32x2_t a, float32x2_t b, float32_t n) and 
its Q-variant.
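
For reference, a scalar sketch in plain C of what such an intrinsic would
compute, assuming it mirrors vfms_f32 above with the scalar n broadcast to
every lane, i.e. a[i] - b[i] * n.  The name vfms_n_f32_model and the float[2]
representation are illustrative only; a real implementation would use
float32x2_t and a fused multiply-add, whereas this model rounds the product
separately.

```c
#include <stddef.h>

/* Hypothetical scalar model, not the arm_neon.h implementation.
   Computes r[i] = a[i] - b[i] * n for each of the two lanes.
   Note: this is unfused arithmetic; the builtin used in the patch
   fuses the multiply and the add into one rounding step.  */
static void
vfms_n_f32_model (float r[2], const float a[2], const float b[2], float n)
{
  for (size_t i = 0; i < 2; i++)
    r[i] = a[i] - b[i] * n;
}
```

The Q-variant would be the same loop over four lanes.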


Thanks,
Tejas.



Re: [PATCH] [AArch64, NEON] Improve vpmaxX & vpminX intrinsics

2014-12-05 Thread Marcus Shawcroft
On 28 November 2014 at 09:23, Yangfei (Felix)  wrote:
> Hi,
>   This patch converts vpmaxX & vpminX intrinsics to use builtin functions 
> instead of the previous inline assembly syntax.
>   Regtested with aarch64-linux-gnu on QEMU.  Also passed the glorious 
> testsuite of Christophe Lyon.
>   OK for the trunk?

Hi Felix,   We know from experience that the advsimd intrinsics tend
to be fragile for big endian and in general it is fairly easy to break
the big endian case.  For these advsimd improvements that you are
working on (that we very much appreciate) it is important to run both
little endian and big endian regressions.

Thanks
/Marcus


Re: [PING] [PATCH] [AArch64, NEON] More NEON intrinsics improvement

2014-12-05 Thread Marcus Shawcroft
On 5 December 2014 at 18:44, Tejas Belagod  wrote:
>
>>>
>>> +__extension__ static __inline float32x2_t __attribute__
>>> +((__always_inline__))
>>> +vfms_f32 (float32x2_t __a, float32x2_t __b, float32x2_t __c) {
>>> +  return __builtin_aarch64_fmav2sf (-__b, __c, __a); }
>>> +
>>> +__extension__ static __inline float32x4_t __attribute__
>>> +((__always_inline__))
>>> +vfmsq_f32 (float32x4_t __a, float32x4_t __b, float32x4_t __c) {
>>> +  return __builtin_aarch64_fmav4sf (-__b, __c, __a); }
>>> +
>>> +__extension__ static __inline float64x2_t __attribute__
>>> +((__always_inline__))
>>> +vfmsq_f64 (float64x2_t __a, float64x2_t __b, float64x2_t __c) {
>>> +  return __builtin_aarch64_fmav2df (-__b, __c, __a); }
>>> +
>>> +
>
>
> Thanks, the patch looks good. Just one comment:
> You could also add
> float32x2_t vfms_n_f32(float32x2_t a, float32x2_t b, float32_t n) and its
> Q-variant.

You can, if you wish,  deal with Tejas' comment with a follow on patch
rather than re-spinning this one.   Provided this patch has no
regressions on a big endian and a little endian test run then you can
commit it.
Thanks
/Marcus


Re: [PATCH] Don't promote RHS of shift-expr if it's integer_type (PR tree-optimization/64183)

2014-12-05 Thread Marek Polacek
On Fri, Dec 05, 2014 at 12:08:02PM +0100, Jakub Jelinek wrote:
> On Fri, Dec 05, 2014 at 11:27:50AM +0100, Marek Polacek wrote:
> > My recent change to shift operand promotion caused a regression in
> > loop unrolling.  Fixed as Richi suggested in the PR audit trail.
> > 
> > Bootstrapped/regtested on ppc64-linux and x86_64-linux, ok for trunk?
> > 
> > 2014-12-05  Marek Polacek  
> > 
> > PR tree-optimization/64183
> > * c-gimplify.c (c_gimplify_expr): Don't convert the RHS of a
> > shift-expression if it is integer_type_node.
> > 
> > * gcc.dg/tree-ssa/pr64183.c: New test.
> 
> This is for middle-end, so I think it would be better to use
> the middle-end type equality in the checks, so !types_compatible_p 
> instead of != ?

Ok, this is a variant with types_compatible_p instead.

Bootstrapped/regtested on ppc64-linux and x86_64-linux.

2014-12-05  Marek Polacek  

PR tree-optimization/64183
* c-gimplify.c (c_gimplify_expr): Don't convert the RHS of a
shift-expression if it is integer_type_node.  Use types_compatible_p.

* gcc.dg/tree-ssa/pr64183.c: New test.

diff --git gcc/c-family/c-gimplify.c gcc/c-family/c-gimplify.c
index 2cfa5d9..4781cf2 100644
--- gcc/c-family/c-gimplify.c
+++ gcc/c-family/c-gimplify.c
@@ -255,7 +255,10 @@ c_gimplify_expr (tree *expr_p, gimple_seq *pre_p 
ATTRIBUTE_UNUSED,
   type demotion/promotion pass.  */
tree *op1_p = &TREE_OPERAND (*expr_p, 1);
if (TREE_CODE (TREE_TYPE (*op1_p)) != VECTOR_TYPE
-   && TYPE_MAIN_VARIANT (TREE_TYPE (*op1_p)) != unsigned_type_node)
+   && !types_compatible_p (TYPE_MAIN_VARIANT (TREE_TYPE (*op1_p)),
+   unsigned_type_node)
+   && !types_compatible_p (TYPE_MAIN_VARIANT (TREE_TYPE (*op1_p)),
+   integer_type_node))
  *op1_p = convert (unsigned_type_node, *op1_p);
break;
   }
diff --git gcc/testsuite/gcc.dg/tree-ssa/pr64183.c 
gcc/testsuite/gcc.dg/tree-ssa/pr64183.c
index e69de29..0563739 100644
--- gcc/testsuite/gcc.dg/tree-ssa/pr64183.c
+++ gcc/testsuite/gcc.dg/tree-ssa/pr64183.c
@@ -0,0 +1,21 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -fdump-tree-cunroll-details" } */
+
+int bits;
+unsigned int size;
+int max_code;
+
+void
+test ()
+{
+ int code = 0;
+
+ while (code < max_code)
+   code |= ((unsigned int) (size >> (--bits)));
+
+ while (bits < (unsigned int)25)
+   bits += 8;
+}
+
+/* { dg-final { scan-tree-dump "Loop 2 iterates at most 4 times" "cunroll"} } 
*/
+/* { dg-final { cleanup-tree-dump "cunroll" } } */

Marek


Re: [patch, build] Restore bootstrap in building libcc1 on darwin

2014-12-05 Thread Dominique Dhumieres
> ...
> Can you please test it on Darwin (or whatever other target has similar
> issues with bootstrapping libcc1)?
>
> 2014-12-05  Jakub Jelinek  
> ...

The patch does not work for x86_64-apple-darwin14.0.0. However the following 
patch does:

--- ../_clean/Makefile.in   2014-11-26 23:09:14.0 +0100
+++ Makefile.in 2014-12-05 17:22:54.0 +0100
@@ -31389,7 +31389,7 @@ configure-libcc1: 
s=`cd $(srcdir); ${PWD_COMMAND}`; export s; \
test ! -f $(HOST_SUBDIR)/libcc1/Makefile || exit 0; \
$(SHELL) $(srcdir)/mkinstalldirs $(HOST_SUBDIR)/libcc1 ; \
-   $(HOST_EXPORTS)  \
+   $(POSTSTAGE1_HOST_EXPORTS)  \
echo Configuring in $(HOST_SUBDIR)/libcc1; \
cd "$(HOST_SUBDIR)/libcc1" || exit 1; \
case $(srcdir) in \
@@ -31422,7 +31422,7 @@ all-libcc1: configure-libcc1
@: $(MAKE); $(unstage)
@r=`${PWD_COMMAND}`; export r; \
s=`cd $(srcdir); ${PWD_COMMAND}`; export s; \
-   $(HOST_EXPORTS)  \
+   $(POSTSTAGE1_HOST_EXPORTS)  \
(cd $(HOST_SUBDIR)/libcc1 && \
  $(MAKE) $(BASE_FLAGS_TO_PASS) $(EXTRA_HOST_FLAGS) 
$(STAGE1_FLAGS_TO_PASS)  \
$(TARGET-libcc1))
@@ -31440,7 +31440,7 @@ check-libcc1:
@: $(MAKE); $(unstage)
@r=`${PWD_COMMAND}`; export r; \
s=`cd $(srcdir); ${PWD_COMMAND}`; export s; \
-   $(HOST_EXPORTS) \
+   $(POSTSTAGE1_HOST_EXPORTS) \
(cd $(HOST_SUBDIR)/libcc1 && \
  $(MAKE) $(FLAGS_TO_PASS)  check)
 
@@ -31455,7 +31455,7 @@ install-libcc1: installdirs
@: $(MAKE); $(unstage)
@r=`${PWD_COMMAND}`; export r; \
s=`cd $(srcdir); ${PWD_COMMAND}`; export s; \
-   $(HOST_EXPORTS) \
+   $(POSTSTAGE1_HOST_EXPORTS) \
(cd $(HOST_SUBDIR)/libcc1 && \
  $(MAKE) $(FLAGS_TO_PASS)  install)
 
@@ -31470,7 +31470,7 @@ install-strip-libcc1: installdirs
@: $(MAKE); $(unstage)
@r=`${PWD_COMMAND}`; export r; \
s=`cd $(srcdir); ${PWD_COMMAND}`; export s; \
-   $(HOST_EXPORTS) \
+   $(POSTSTAGE1_HOST_EXPORTS) \
(cd $(HOST_SUBDIR)/libcc1 && \
  $(MAKE) $(FLAGS_TO_PASS)  install-strip)
 
@@ -31489,7 +31489,7 @@ info-libcc1: \
@[ -f ./libcc1/Makefile ] || exit 0; \
r=`${PWD_COMMAND}`; export r; \
s=`cd $(srcdir); ${PWD_COMMAND}`; export s; \
-   $(HOST_EXPORTS) \
+   $(POSTSTAGE1_HOST_EXPORTS) \
for flag in $(EXTRA_HOST_FLAGS) ; do \
  eval `echo "$$flag" | sed -e "s|^\([^=]*\)=\(.*\)|\1='\2'; export 
\1|"`; \
done; \
@@ -31515,7 +31515,7 @@ dvi-libcc1: \
@[ -f ./libcc1/Makefile ] || exit 0; \
r=`${PWD_COMMAND}`; export r; \
s=`cd $(srcdir); ${PWD_COMMAND}`; export s; \
-   $(HOST_EXPORTS) \
+   $(POSTSTAGE1_HOST_EXPORTS) \
for flag in $(EXTRA_HOST_FLAGS) ; do \
  eval `echo "$$flag" | sed -e "s|^\([^=]*\)=\(.*\)|\1='\2'; export 
\1|"`; \
done; \
@@ -31541,7 +31541,7 @@ pdf-libcc1: \
@[ -f ./libcc1/Makefile ] || exit 0; \
r=`${PWD_COMMAND}`; export r; \
s=`cd $(srcdir); ${PWD_COMMAND}`; export s; \
-   $(HOST_EXPORTS) \
+   $(POSTSTAGE1_HOST_EXPORTS) \
for flag in $(EXTRA_HOST_FLAGS) ; do \
  eval `echo "$$flag" | sed -e "s|^\([^=]*\)=\(.*\)|\1='\2'; export 
\1|"`; \
done; \
@@ -31567,7 +31567,7 @@ html-libcc1: \
@[ -f ./libcc1/Makefile ] || exit 0; \
r=`${PWD_COMMAND}`; export r; \
s=`cd $(srcdir); ${PWD_COMMAND}`; export s; \
-   $(HOST_EXPORTS) \
+   $(POSTSTAGE1_HOST_EXPORTS) \
for flag in $(EXTRA_HOST_FLAGS) ; do \
  eval `echo "$$flag" | sed -e "s|^\([^=]*\)=\(.*\)|\1='\2'; export 
\1|"`; \
done; \
@@ -31593,7 +31593,7 @@ TAGS-libcc1: \
@[ -f ./libcc1/Makefile ] || exit 0; \
r=`${PWD_COMMAND}`; export r; \
s=`cd $(srcdir); ${PWD_COMMAND}`; export s; \
-   $(HOST_EXPORTS) \
+   $(POSTSTAGE1_HOST_EXPORTS) \
for flag in $(EXTRA_HOST_FLAGS) ; do \
  eval `echo "$$flag" | sed -e "s|^\([^=]*\)=\(.*\)|\1='\2'; export 
\1|"`; \
done; \
@@ -31620,7 +31620,7 @@ install-info-libcc1: \
@[ -f ./libcc1/Makefile ] || exit 0; \
r=`${PWD_COMMAND}`; export r; \
s=`cd $(srcdir); ${PWD_COMMAND}`; export s; \
-   $(HOST_EXPORTS) \
+   $(POSTSTAGE1_HOST_EXPORTS) \
for flag in $(EXTRA_HOST_FLAGS) ; do \
  eval `echo "$$flag" | sed -e "s|^\([^=]*\)=\(.*\)|\1='\2'; export 
\1|"`; \
done; \
@@ -31647,7 +31647,7 @@ install-pdf-libcc1: \
@[ -f ./libcc1/Makefile ] || exit 0; \
r=`${PWD_COMMAND}`; export r; \
s=`cd $(srcdir); ${PWD_COMMAND}`; export s; \
-   $(HOST_EXPORTS) \
+   $(POSTSTAGE1_HOST_EXPORTS) \
for flag in $(EXTRA_HOST_FLAGS) ; do \
  eval `echo "$$flag" | sed -e "s|^\([^=]*\)=\(.*\)|\1='\2'; export 
\1|"`; \
done; \
@@ -31674,7 +31674,7 @@ install-html-libcc1: \
@[ -f ./libcc1/Make

Re: [PATCH] allow passing argument to the JIT linker

2014-12-05 Thread David Malcolm
On Fri, 2014-12-05 at 01:27 -0500, Ulrich Drepper wrote:
> If you generate code with the JIT which references outside symbols there
> is currently no way to have a self-contained DSO created.  The command
> line to invoke the linker is fixed.

What's the use-case here?  Sorry if I'm not getting at what this is for.

If I understand things correctly, an example use-case here is:

Executable "foo" is not yet linked against, say libpng.so, and wants to
JIT-compile code that uses libpng.

Hence:

foo -> libgccjit.so
  JIT-compile code "jitted_function" that calls, say,
   png_uint_32 png_access_version_number (void);

This gives a fake.so with an imported symbol of
"png_access_version_number" and a NEEDED of libpng.so.something.

Upon attempting to dlopen (fake.so) and run "jitted_function", then
presumably one of several things can happen:
* libpng.so.N was already loaded into "foo"
* libpng.so.N is loaded when fake.so is dlopened
* libpng.so.N is not found

(currently libgccjit.so is dlopening fake.so with
RTLD_NOW | RTLD_LOCAL).

Is the "self-containedness of the DSO" in your patch aimed at ensuring
that libpng.so.N gets unloaded when fake.so is unloaded?
Or is this more about having any errors happen at compilation time,
rather than symbol load/run time?

> The patch below would change that.  It builds upon the existing
> framework to specify options for the compiler.  The linker optimization
> flag fits fully into the existing functionality.  For additional files
> to link with I had to extend the mechanism a bit since it is not just
> one string that needs to be remembered.
> 
> I've also added the set_str_option member function to the C++ interface
> of the library.  That must have been an oversight.

One issue here is the lifetime of str options; currently str options
simply record the const char *, without taking a copy of the underlying
buffer.  We might need to change this to make it take a strdup of the
option, to avoid nasty surprises if someone calls set_str_option with a
std::string and has it auto-coerced to a const char * from under them.
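
A minimal sketch in C of the copying behaviour described above, assuming a
single option slot that owns its string.  The struct and function names are
hypothetical and are not the libgccjit internals; the point is only that the
setter duplicates the buffer, so the caller's storage can go away afterwards.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a context holding one string option;
   not the libgccjit data structure.  */
struct str_option_slot
{
  char *value;   /* Owned copy, or NULL if unset.  */
};

/* Store a private copy of VALUE so the caller may free or reuse its
   buffer afterwards.  Returns 0 on success, -1 on allocation failure
   (in which case the previous value is kept).  */
static int
slot_set_str (struct str_option_slot *slot, const char *value)
{
  char *copy = NULL;
  if (value)
    {
      size_t len = strlen (value) + 1;
      copy = malloc (len);
      if (!copy)
        return -1;
      memcpy (copy, value, len);
    }
  free (slot->value);
  slot->value = copy;
  return 0;
}

/* Release the owned copy, leaving the slot unset.  */
static void
slot_release (struct str_option_slot *slot)
{
  free (slot->value);
  slot->value = NULL;
}
```

The cost is one allocation per call, which should be negligible next to a
compile.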

> What do you think?

I'm still not clear about the problem you're solving here; sorry.

New options should be documented in:
  gcc/jit/docs/topics/contexts.rst in the "Options" section.
and these ones should probably be mentioned in the subsection on
GCC_JIT_FUNCTION_IMPORTED in functions.rst.

Sadly, the "Tutorial" part of the current docs is missing any kind of
discussion of using functions from other DSOs - sorry - which sounds
like something I/we should fix, especially if we need to add options for
it.  I've filed that omission as PR jit/64201.


We'd need one or more testcase(s) to exercise the options.


Various comments inline:

> gcc/ChangeLog:
> 
> 2014-12-05  Ulrich Drepper  
> 
>   * jit/libgccjit++.h (context): Add missing set_str_option
>   member function.
> 
>   * jit/libgccjit.h (gcc_jit_int_option): Add
>   GCC_JIT_INT_OPTION_LINK_OPTIMIZATION_LEVEL.
>   (gcc_jit_str_option): Add GCC_JIT_STR_OPTION_LINKFILE.
>   * jit/jit-playback.c (convert_to_dso): Use auto_vec instead
>   of fixed-sized array for arguments.  Define ADD_ARG macro
>   to add to it.  Adjust existing code.  Additionally add
>   optimization level and additional link files to the list.
>   * jit/jit-playback.h (context::get_linkfiles): New member
>   function.
>   * jit/jit-recording.c (recording::context:set_str_option):
>   Handle GCC_JIT_STR_OPTION_LINKFILE.
>   * jit/jit-recording.h (recording::context:set_str_option):
>   Add get_linkfiles member function.
> 
> diff --git a/gcc/jit/jit-playback.c b/gcc/jit/jit-playback.c
> index ecdae80..9c4e45f 100644
> --- a/gcc/jit/jit-playback.c
> +++ b/gcc/jit/jit-playback.c
> @@ -1726,18 +1726,19 @@ convert_to_dso (const char *ctxt_progname)
>   TV_ASSEMBLE.  */
>auto_timevar assemble_timevar (TV_ASSEMBLE);
>const char *errmsg;
> -  const char *argv[7];
> +  auto_vec  argvec;
> +#define ADD_ARG(arg) argvec.safe_push (arg)
>int exit_status = 0;
>int err = 0;
>const char *gcc_driver_name = GCC_DRIVER_NAME;
>  
> -  argv[0] = gcc_driver_name;
> -  argv[1] = "-shared";
> +  ADD_ARG (gcc_driver_name);
> +  ADD_ARG ("-shared");
>/* The input: assembler.  */
> -  argv[2] = m_path_s_file;
> +  ADD_ARG (m_path_s_file);
>/* The output: shared library.  */
> -  argv[3] = "-o";
> -  argv[4] = m_path_so_file;
> +  ADD_ARG ("-o");
> +  ADD_ARG (m_path_so_file);

This conversion from an array to an auto_vec via ADD_ARG looks good to
me.
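
For readers without the GCC sources at hand, the same idea can be sketched
in plain C: replace the fixed-size argv with a growable vector and push
arguments one at a time.  Everything below (arg_vec, arg_vec_push,
build_link_argv) is a hypothetical illustration; the patch itself uses GCC's
C++ auto_vec, which is not reproduced here.

```c
#include <stdlib.h>

/* Hypothetical growable argument vector; stands in for auto_vec.  */
struct arg_vec
{
  const char **items;
  size_t len;
  size_t cap;
};

/* Append ARG, growing the storage as needed.  Returns 0 on success,
   -1 if allocation fails.  */
static int
arg_vec_push (struct arg_vec *v, const char *arg)
{
  if (v->len == v->cap)
    {
      size_t new_cap = v->cap ? v->cap * 2 : 8;
      const char **p = realloc (v->items, new_cap * sizeof *p);
      if (!p)
        return -1;
      v->items = p;
      v->cap = new_cap;
    }
  v->items[v->len++] = arg;
  return 0;
}

/* Build "DRIVER -shared INPUT -o OUTPUT EXTRA..." followed by a NULL
   terminator.  The number of EXTRA entries is not known in advance,
   which is why a fixed-size array no longer fits.  */
static int
build_link_argv (struct arg_vec *v, const char *driver, const char *input,
                 const char *output, const char *const *extra,
                 size_t n_extra)
{
  if (arg_vec_push (v, driver) || arg_vec_push (v, "-shared")
      || arg_vec_push (v, input) || arg_vec_push (v, "-o")
      || arg_vec_push (v, output))
    return -1;
  for (size_t i = 0; i < n_extra; i++)
    if (arg_vec_push (v, extra[i]))
      return -1;
  return arg_vec_push (v, NULL);
}
```

The caller owns v->items and frees it once the driver has been invoked.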
 
>/* Don't use the linker plugin.
>   If running with just a "make" and not a "make install", then we'd
> @@ -1746,17 +1747,39 @@ convert_to_dso (const char *ctxt_progname)
>   libto_plugin is a .la at build time, with it becoming installed with
>   ".so" suffix: i.e. it doesn't exist with a .so suffix until install
>   time.  */
> -  argv[5] = "-fno-use-linker-plugin";
> +  ADD_ARG ("-fno-use-linke

Re: [PATCH 2/2] Pipeline model for APM XGene-1.

2014-12-05 Thread Mike Stump
On Dec 5, 2014, at 9:25 AM, Dr. Philipp Tomsich 
 wrote:
> Should I revise, or will you just drop the line when applying this?

We like the gcc-patches archive to have exactly what went in.  It would be 
best to re-post the patch with the line gone.

Re: [Patch] Improving jump-thread pass for PR 54742

2014-12-05 Thread Jeff Law

On 12/04/14 02:14, Sebastian Pop wrote:

Sebastian Pop wrote:

a fail I have not seen in the past:

FAIL: gcc.c-torture/compile/pr27571.c   -Os  (internal compiler error)

I am still investigating why this fails: as far as I can see for now this is
because in copying the FSM path we create an internal loop that is then
discovered by the loop verifier as a natural loop and is not yet in the existing
loop structures.  I will try to fix this in duplicate_seme by invalidating the
loop structure after we code generated all the FSM paths.  I will submit an
updated patch when it passes regtest.


We need at least this patch to fix the fail:

@@ -2518,6 +2518,7 @@ thread_through_all_blocks (bool may_peel_loop_headers)
   if (duplicate_seme_region (entry, exit, region, len - 1, NULL))
 {
   /* We do not update dominance info.  */
   free_dominance_info (CDI_DOMINATORS);
   bitmap_set_bit (threaded_blocks, entry->src->index);
+ retval = true;
 }

And this will trigger at the end of the code gen function:

  if (retval)
 loops_state_set (LOOPS_NEED_FIXUP);

That will fix the loop structures.  I'm testing this patch on top of the one I
have just sent out.

That looks correct to me.

Jeff



Re: [PATCH] allow passing argument to the JIT linker

2014-12-05 Thread Ulrich Drepper
On Fri, Dec 5, 2014 at 2:24 PM, David Malcolm  wrote:
> What's the use-case here?  Sorry if I'm not getting at what this is for.

The use case is that a program wants to use library functions,
something common, not everything is self-contained and linked in
automatically (like libc).  Currently you would have to rely on the
fact that a DSO can be created with dangling references which are
expected to be somehow fulfilled at runtime. There are multiple
problems with this:

First, even if the application using the JIT itself is linked against
a library which the JIT-generated code wants to use it is a problem if
the definitions are accidentally found.  If the library with the
desired function in question uses symbol versioning the JIT-created
DSO would have just an ordinary UNDEF entry for the symbol with no
symbol version available.  This then means that at runtime the
/oldest/ version is picked.  That's not what you want in this case.

Second, if you implement some form of extension language where the
language allows referencing functions in other DSOs, you'd have to
either use dlopen(RTLD_GLOBAL) in the main app (evil, never use
RTLD_GLOBAL) or you'd have to implicitly have the generated code use
dlopen() and then dlsym().  That's cumbersome at best and also slow.


On the other hand, with an option as proposed the code generator could
simply record the dependency and have the DSO automatically used at
link-time and runtime, creating the correct references etc.


> Is the "self-containedness of the DSO" in your patch aimed at ensuring
> that libpng.so.N gets unloaded when fake.so is unloaded?

The unloading part is a nice additional benefit.  It's mostly about
making it easy and quick to call any function from any available DSO
without having to know which DSOs are needed at the time the
application using the JIT is linked.


> One issue here is the lifetime of str options; currently str options
> simply record the const char *, without taking a copy of the underlying
> buffer.  We might need to change this to make it take a strdup of the
> option, to avoid nasty surprises if someone calls set_str_option with a
> std::string and has it auto-coerced to a const char * from under them.

I'm fine with that, I just followed what you did so far.  If you want
it done this way I'll add this to the patch.


> New options should be documented in:
>   gcc/jit/docs/topics/contexts.rst in the "Options" section.
> and these ones should probably be mentioned in the subsection on
> GCC_JIT_FUNCTION_IMPORTED in functions.rst.

I was more concerned with the code first... ;-)


> Do you have a sense of what impact setting the option would have on the
> time taken by gcc_jit_context_compile?

It's really not much.  The linker just tries different sizes for a
hash table and picks the size with the least number of conflicts and
therefore hopefully best performance at runtime.  With today's
machines this isn't really noticeable.  Jakub (if you read this), when
did we implement this?  It still might not be a good idea to enable it
by default and, as written, there might be other optimizations which
are implemented.


> This doesn't support nested contexts; presumably this should walk up
> through any parent contexts, adding any linkfiles requested by them?

Nested contexts?  Do you deal with gcc_jit_context structures
recursively?  I must have missed that.  This is just a way to add more
strings (free-form parameters) to the linker command line.  I'm using

   ctxt.set_str_option(GCC_JIT_STR_OPTION_LINKFILE, "-lsomelibrary");

to have fake.so linked against libsomelibrary.so.


> Here's another place where nested contexts may need to be supported: a
> playback context's m_recording_ctxt may have ancestors, and they might
> have linkfiles specified.

This isn't the playback context structure, it's the toplevel
(gccjit::context) one.  As far as I can see there is no hierarchy and
this makes sense.


> I notice that this string option works differently from the others, in
> that it appends to a list, rather than overwriting a value; that would
> need spelling out in the documentation.

Yes, sure, documentation is not something I've concerned myself with at this point.


> I wondered if this should take a std::string instead of a const char *,
> but a const char * is probably more flexible, given that you can go
> trivially from a std::string to a const char *, but going the other way
> may cost some cycles.

If we want to make copies anyway I think it doesn't matter.  I think
using const char* is easier to use for the reasons you spelled out.


> This descriptive comment needs fleshing out.  For example, are these
> filenames, or SONAMEs?  How does this relate to what a user would pass
> to the linker command line if they were writing a Makefile rather than
> code that's calling into a JIT API?

The strings are supposed to be exactly what you would add  to the
linker command line.  No magic.  In fact, the same mechanism ca

Re: [Patch] Improving jump-thread pass for PR 54742

2014-12-05 Thread Jeff Law

On 12/04/14 07:29, Sebastian Pop wrote:

Sebastian Pop wrote:

Jeff Law wrote:

I'm a bit worried about the compile-time impact of all the
recursion


I will also restrict the recursion to the loop in which we look for the FSM
thread.


The attached patch includes this change.  It passed bootstrap and regression
test on x86_64-linux.  Ok to commit?

OK to commit.  Thanks for your patience.

Can you follow up with a change which throttles this optimization when 
-Os is in effect?  You can check optimize_function_for_size_p (cfun) and 
simply avoid the backward traversal, or you could allow it in that case 
if the amount of copying is suitably small.  Your call.
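
Either option can be expressed as one predicate.  The sketch below is
hypothetical: the function name, its parameters and the idea of a fixed copy
budget are illustrative and not taken from the patch.  In GCC the first
argument would come from optimize_function_for_size_p (cfun).

```c
#include <stdbool.h>

/* Hypothetical policy helper.  When not optimizing for size, always
   allow the FSM thread.  When optimizing for size, allow it only if
   the number of instructions to duplicate fits SIZE_COPY_BUDGET; a
   budget of 0 disables the optimization for any path that copies
   code.  */
static bool
fsm_thread_allowed_p (bool optimizing_for_size, int insns_to_copy,
                      int size_copy_budget)
{
  if (!optimizing_for_size)
    return true;
  return insns_to_copy <= size_copy_budget;
}
```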




jeff


Re: [patch, build] Restore bootstrap in building libcc1 on darwin

2014-12-05 Thread Jakub Jelinek
On Fri, Dec 05, 2014 at 08:11:53PM +0100, Dominique Dhumieres wrote:
> > ...
> > Can you please test it on Darwin (or whatever other target has similar
> > issues with bootstrapping libcc1)?
> >
> > 2014-12-05  Jakub Jelinek  
> > ...
> 
> The patch does not work for x86_64-apple-darwin14.0.0. However the following 
> patch does:

As I've tried to explain, that is IMHO wrong though.
If what you are after is the -B stuff too, then perhaps:

2014-12-05  Jakub Jelinek  

PR bootstrap/64023
* Makefile.tpl (EXTRA_TARGET_FLAGS): Set STAGE1_LDFLAGS
to POSTSTAGE1_LDFLAGS and STAGE1_LIBS to POSTSTAGE1_LIBS.
Add -B to libstdc++-v3/src/.libs and libstdc++-v3/libsupc++/.libs
to CXX.
* Makefile.in: Regenerated.

--- Makefile.tpl.jj 2014-11-12 09:31:59.0 +0100
+++ Makefile.tpl  2014-12-05 21:12:21.486031062 +0100
@@ -641,7 +641,9 @@ EXTRA_TARGET_FLAGS = \
'AS=$(COMPILER_AS_FOR_TARGET)' \
'CC=$$(CC_FOR_TARGET) $$(XGCC_FLAGS_FOR_TARGET) $$(TFLAGS)' \
'CFLAGS=$$(CFLAGS_FOR_TARGET)' \
-   'CXX=$$(CXX_FOR_TARGET) $$(XGCC_FLAGS_FOR_TARGET) $$(TFLAGS)' \
+   'CXX=$$(CXX_FOR_TARGET) -B$$r/$$(TARGET_SUBDIR)/libstdc++-v3/src/.libs' 
\
+   ' -B$$r/$$(TARGET_SUBDIR)/libstdc++-v3/libsupc++/.libs' \
+   ' $$(XGCC_FLAGS_FOR_TARGET) $$(TFLAGS)' \
'CXXFLAGS=$$(CXXFLAGS_FOR_TARGET)' \
'DLLTOOL=$$(DLLTOOL_FOR_TARGET)' \
'GCJ=$$(GCJ_FOR_TARGET) $$(XGCC_FLAGS_FOR_TARGET) $$(TFLAGS)' \
@@ -659,6 +661,8 @@ EXTRA_TARGET_FLAGS = \
'WINDRES=$$(WINDRES_FOR_TARGET)' \
'WINDMC=$$(WINDMC_FOR_TARGET)' \
'XGCC_FLAGS_FOR_TARGET=$(XGCC_FLAGS_FOR_TARGET)' \
+   'STAGE1_LDFLAGS=$$(POSTSTAGE1_LDFLAGS)' \
+   'STAGE1_LIBS=$$(POSTSTAGE1_LIBS)' \
"TFLAGS=$$TFLAGS"
 
 TARGET_FLAGS_TO_PASS = $(BASE_FLAGS_TO_PASS) $(EXTRA_TARGET_FLAGS)
--- Makefile.in.jj  2014-11-28 14:40:52.0 +0100
+++ Makefile.in 2014-12-05 21:11:48.276616008 +0100
@@ -835,7 +835,9 @@ EXTRA_TARGET_FLAGS = \
'AS=$(COMPILER_AS_FOR_TARGET)' \
'CC=$$(CC_FOR_TARGET) $$(XGCC_FLAGS_FOR_TARGET) $$(TFLAGS)' \
'CFLAGS=$$(CFLAGS_FOR_TARGET)' \
-   'CXX=$$(CXX_FOR_TARGET) $$(XGCC_FLAGS_FOR_TARGET) $$(TFLAGS)' \
+   'CXX=$$(CXX_FOR_TARGET) -B$$r/$$(TARGET_SUBDIR)/libstdc++-v3/src/.libs' 
\
+   ' -B$$r/$$(TARGET_SUBDIR)/libstdc++-v3/libsupc++/.libs' \
+   ' $$(XGCC_FLAGS_FOR_TARGET) $$(TFLAGS)' \
'CXXFLAGS=$$(CXXFLAGS_FOR_TARGET)' \
'DLLTOOL=$$(DLLTOOL_FOR_TARGET)' \
'GCJ=$$(GCJ_FOR_TARGET) $$(XGCC_FLAGS_FOR_TARGET) $$(TFLAGS)' \
@@ -853,6 +855,8 @@ EXTRA_TARGET_FLAGS = \
'WINDRES=$$(WINDRES_FOR_TARGET)' \
'WINDMC=$$(WINDMC_FOR_TARGET)' \
'XGCC_FLAGS_FOR_TARGET=$(XGCC_FLAGS_FOR_TARGET)' \
+   'STAGE1_LDFLAGS=$$(POSTSTAGE1_LDFLAGS)' \
+   'STAGE1_LIBS=$$(POSTSTAGE1_LIBS)' \
"TFLAGS=$$TFLAGS"
 
 TARGET_FLAGS_TO_PASS = $(BASE_FLAGS_TO_PASS) $(EXTRA_TARGET_FLAGS)


Jakub


Re: [PATCH 3/6] combine: handle I2 a parallel of two SETs

2014-12-05 Thread Andreas Schwab
Segher Boessenkool  writes:

> gcc/
>   * combine.c (is_parallel_of_n_reg_sets): New function.
>   (can_split_parallel_of_n_reg_sets): New function.
>   (try_combine): If I2 is a PARALLEL of two SETs, split it into
>   two insns if possible.

This breaks bootstrap on m68k.

../../gcc/gcc/combine.c:2467:1: warning: ‘bool 
is_parallel_of_n_reg_sets(rtx_insn*, int)’ defined but not used 
[-Wunused-function]
 is_parallel_of_n_reg_sets (rtx_insn *insn, int n)
 ^
../../gcc/gcc/combine.c:2494:1: warning: ‘bool 
can_split_parallel_of_n_reg_sets(rtx_insn*, int)’ defined but not used 
[-Wunused-function]
 can_split_parallel_of_n_reg_sets (rtx_insn *insn, int n)
 ^

Tested on m68k-suse-linux and installed as obvious.

Andreas.

* combine.c (is_parallel_of_n_reg_sets)
(can_split_parallel_of_n_reg_sets): Only define if !HAVE_cc0.
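
The general pattern, sketched outside of GCC: a static helper whose only
caller is conditionally compiled needs the same guard around its definition,
otherwise the configuration without the caller warns about an unused function.
LEGACY_TARGET, count_set_bits and describe_mask are hypothetical names used
only for this illustration; LEGACY_TARGET plays the role HAVE_cc0 plays in the
patch below.

```c
/* Hypothetical feature macro standing in for a target-dependent
   condition; define it to compile out the helper's only use.  */
/* #define LEGACY_TARGET 1 */

#ifndef LEGACY_TARGET
/* Only needed by the non-legacy path, so only defined for it.  Without
   this guard, a LEGACY_TARGET build would trigger -Wunused-function.  */
static int
count_set_bits (unsigned int x)
{
  int n = 0;
  for (; x != 0; x &= x - 1)
    n++;
  return n;
}
#endif

int
describe_mask (unsigned int mask)
{
#ifndef LEGACY_TARGET
  return count_set_bits (mask);
#else
  (void) mask;
  return -1;
#endif
}
```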

diff --git a/gcc/combine.c b/gcc/combine.c
index e6deb41..39f9200 100644
--- a/gcc/combine.c
+++ b/gcc/combine.c
@@ -2461,6 +2461,7 @@ update_cfg_for_uncondjump (rtx_insn *insn)
 }
 }
 
+#ifndef HAVE_cc0
 /* Return whether INSN is a PARALLEL of exactly N register SETs followed
by an arbitrary number of CLOBBERs.  */
 static bool
@@ -2513,6 +2514,7 @@ can_split_parallel_of_n_reg_sets (rtx_insn *insn, int n)
 
   return true;
 }
+#endif
 
 /* Try to combine the insns I0, I1 and I2 into I3.
Here I0, I1 and I2 appear earlier than I3.
-- 
2.2.0


-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


Re: [PATCH, i386] Fix PR64003

2014-12-05 Thread Jeff Law

On 12/05/14 07:57, Ilya Enkovich wrote:

Hi,

This patch fixes PR target/64003 by avoiding function calls during computation of the 
"length" attribute for short jump instructions.  This is achieved by having 
separate templates for prefixed and non-prefixed instructions.  Please see the discussion in 
bugzilla for the reasoning.

Bootstrapped and tested on x86_64-unknown-linux-gnu.  Valgrind run for 
reproducer shows problem is fixed.  OK for trunk?

Thanks,
Ilya
--
2014-12-05  Ilya Enkovich  

* config/i386/i386.md (*jcc_1_bnd): New.
(*jcc_2_bnd): New.
(jump_bnd): New.
(*jcc_1): Remove bnd prefix.
(*jcc_2): Likewise.
(jump): Likewise.
Just wanted to say thanks for taking care of this.  The obscure and 
undocumented rules about what can appear in these length computations 
is, umm, bad and I don't think anyone could reasonably have expected you 
to know about them.


jeff



[PATCH] __has_{,cpp_}attribute fixes (PR preprocessor/63831)

2014-12-05 Thread Jakub Jelinek
Hi!

This patch rewrites __has_attribute support, so that:
1) it is a normal built-in macro, so it can be expanded even outside of
   #if (apparently clang supports that)
2) it is expanded properly even during preprocessing
3) there is no __has_attribute__ middle-end secondary macro,
   when it is a built-in macro, it works fine in #ifdef too
4) it is not expanded / does not ICE for -lang-asm preprocessing, or
   e.g. when preprocessing Fortran
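
As an illustration of point 3, a small user-side sketch of the #ifdef guard
this makes possible.  SKETCH_NOINLINE, add_one and
has_attribute_builtin_available are made-up names for the example and are not
part of the patch or its testcases.

```c
/* Guarded so that compilers without __has_attribute still build.  */
#ifdef __has_attribute
# if __has_attribute(noinline)
#  define SKETCH_NOINLINE __attribute__ ((noinline))
# endif
#endif
#ifndef SKETCH_NOINLINE
# define SKETCH_NOINLINE
#endif

/* Uses the attribute only when the compiler reports support for it.  */
SKETCH_NOINLINE int
add_one (int x)
{
  return x + 1;
}

/* Returns 1 if the preprocessor provides __has_attribute, else 0.  */
int
has_attribute_builtin_available (void)
{
#ifdef __has_attribute
  return 1;
#else
  return 0;
#endif
}
```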

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2014-12-05  Jakub Jelinek  

PR preprocessor/63831
* c-cppbuiltin.c (c_cpp_builtins): Don't define __has_attribute
and __has_cpp_attribute here.
* c-ppoutput.c (init_pp_output): Set cb->has_attribute to
c_common_has_attribute.
* c-common.h (c_common_has_attribute): New prototype.
* c-lex.c (init_c_lex): Set cb->has_attribute to
c_common_has_attribute instead of cb_has_attribute.
(get_token_no_padding): New function.
(cb_has_attribute): Renamed to ...
(c_common_has_attribute): ... this.  No longer static.  Use
get_token_no_padding, require ()s, don't build TREE_LIST
unnecessarily, fix up formatting, adjust diagnostics, call
init_attributes.

* directives.c (lex_macro_node): Remove __has_attribute__ handling.
* internal.h (struct spec_node): Remove n__has_attribute__ field.
(struct lexer_state): Remove in__has_attribute__ field.
* macro.c (_cpp_builtin_macro_text): Handle BT_HAS_ATTRIBUTE.
* identifiers.c (_cpp_init_hashtable): Remove __has_attribute__
handling.
* init.c (builtin_array): Add __has_attribute and __has_cpp_attribute.
(cpp_init_special_builtins): Don't initialize __has_attribute
or __has_cpp_attribute if CLK_ASM or pfile->cb.has_attribute is NULL.
* traditional.c (enum ls): Remove ls_has_attribute,
ls_has_attribute_close.
(_cpp_scan_out_logical_line): Remove __has_attribute__ handling.
* include/cpplib.h (enum cpp_builtin_type): Add BT_HAS_ATTRIBUTE.
* pch.c (cpp_read_state): Remove __has_attribute__ handling.
* expr.c (eval_token): Likewise.
(parse_has_attribute): Removed.

* c-c++-common/cpp/pr63831-1.c: New test.
* c-c++-common/cpp/pr63831-2.c: New test.

--- gcc/c-family/c-cppbuiltin.c.jj  2014-11-13 00:07:36.0 +0100
+++ gcc/c-family/c-cppbuiltin.c 2014-12-05 13:55:27.065734603 +0100
@@ -800,11 +800,6 @@ c_cpp_builtins (cpp_reader *pfile)
   cpp_define (pfile, "__has_include(STR)=__has_include__(STR)");
   cpp_define (pfile, "__has_include_next(STR)=__has_include_next__(STR)");
 
-  /* Set attribute test macros for all C/C++ (not for just C++11 etc.)
- The builtin __has_attribute__ is defined in libcpp.  */
-  cpp_define (pfile, "__has_attribute(STR)=__has_attribute__(STR)");
-  cpp_define (pfile, "__has_cpp_attribute(STR)=__has_attribute__(STR)");
-
   if (c_dialect_cxx ())
 {
   if (flag_weak && SUPPORTS_ONE_ONLY)
--- gcc/c-family/c-ppoutput.c.jj  2014-08-01 09:23:33.0 +0200
+++ gcc/c-family/c-ppoutput.c   2014-12-05 12:18:57.372593264 +0100
@@ -151,6 +151,8 @@ init_pp_output (FILE *out_stream)
   cb->used_undef = cb_used_undef;
 }
 
+  cb->has_attribute = c_common_has_attribute;
+
   /* Initialize the print structure.  */
   print.src_line = 1;
   print.printed = 0;
--- gcc/c-family/c-common.h.jj  2014-11-21 10:23:58.0 +0100
+++ gcc/c-family/c-common.h 2014-12-05 10:28:12.182500519 +0100
@@ -959,6 +959,7 @@ extern void c_cpp_builtins_optimize_prag
 extern bool c_cpp_error (cpp_reader *, int, int, location_t, unsigned int,
 const char *, va_list *)
  ATTRIBUTE_GCC_DIAG(6,0);
+extern int c_common_has_attribute (cpp_reader *);
 
 extern bool parse_optimize_options (tree, bool);
 
--- gcc/c-family/c-lex.c.jj 2014-11-11 00:05:25.0 +0100
+++ gcc/c-family/c-lex.c 2014-12-05 14:22:17.402328011 +0100
@@ -64,7 +64,6 @@ static void cb_ident (cpp_reader *, unsi
 static void cb_def_pragma (cpp_reader *, unsigned int);
 static void cb_define (cpp_reader *, unsigned int, cpp_hashnode *);
 static void cb_undef (cpp_reader *, unsigned int, cpp_hashnode *);
-static int cb_has_attribute (cpp_reader *);
 
 void
 init_c_lex (void)
@@ -89,7 +88,7 @@ init_c_lex (void)
   cb->def_pragma = cb_def_pragma;
   cb->valid_pch = c_common_valid_pch;
   cb->read_pch = c_common_read_pch;
-  cb->has_attribute = cb_has_attribute;
+  cb->has_attribute = c_common_has_attribute;
 
   /* Set the debug callbacks if we can use them.  */
   if ((debug_info_level == DINFO_LEVEL_VERBOSE
@@ -288,57 +287,80 @@ cb_undef (cpp_reader * ARG_UNUSED (pfile
 (const char *) NODE_NAME (node));
 }
 
+/* Wrapper around cpp_get_token to skip CPP_PADDING tokens
+   and not consume CPP_EOF.  */
+static const cpp_token *
+get_token_no_padding (cpp_reader *pfile)

Re: [PATCH] PR other/63613: Add fixincludes for dejagnu.h

2014-12-05 Thread Jeff Law

On 12/04/14 15:42, Rainer Orth wrote:

David Malcolm  writes:


dejagnu.h assumed -fgnu89-inline until a recent upstream fix;
see http://lists.gnu.org/archive/html/dejagnu/2014-10/msg00011.html

Remove the workaround from jit.exp that used -fgnu89-inline
in favor of a fixincludes to dejagnu.h that applies the upstream fix
to a local copy.

This should make it easier to support C++ testcases from jit.exp.


I wonder how this would work if dejagnu.h doesn't live in a system
include dir (e.g. a self-compiled version)?  fixincludes won't touch
those AFAIU.  The previous version with -fgnu89-inline would still work
in that case provided dejagnu.h is found at all.

Presumably in that case the answer is upgrade dejagnu? :-)

jeff


Re: [PATCH] PR other/63613: Add fixincludes for dejagnu.h

2014-12-05 Thread Jeff Law

On 12/04/14 15:19, David Malcolm wrote:

dejagnu.h assumed -fgnu89-inline until a recent upstream fix;
see http://lists.gnu.org/archive/html/dejagnu/2014-10/msg00011.html

Remove the workaround from jit.exp that used -fgnu89-inline
in favor of a fixincludes to dejagnu.h that applies the upstream fix
to a local copy.

This should make it easier to support C++ testcases from jit.exp.

(I also needed to fix up the jit.dg/test-threads.c due to the
preprocessor tricks that that test plays in order to make
dejagnu.h be threadsafe).
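
As a rough illustration of why the inline semantics matter here: in C99/C11 mode a plain "inline" definition in a header does not provide an external definition, whereas gnu89 inline does, so code written for one mode can fail to link in the other. Marking header helpers "static inline" behaves the same in both modes (and in C++), which is what the fix name dejagnu_h_make_inline_functions_static suggests. The function names below are hypothetical stand-ins and are not taken from dejagnu.h.

```c
/* Stand-in for a helper defined in a header.  "static inline" gives each
   translation unit its own copy, so no external definition is needed in
   either gnu89 or C99 inline mode.  */
static inline int
check_passed (int actual, int expected)
{
  return actual == expected;
}

/* Hypothetical caller: counts how many of N results match expectations.  */
int
count_passes (const int *actual, const int *expected, int n)
{
  int passes = 0;
  for (int i = 0; i < n; i++)
    passes += check_passed (actual[i], expected[i]);
  return passes;
}
```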

This is the first time I've touched the "fixincludes" directory;
is this the correct way to make a change here?

Successfully bootstrapped & regrtested on x86_64-unknown-linux-gnu
(Fedora 20).

OK for trunk?

fixincludes/ChangeLog:
PR other/63613
* inclhack.def (dejagnu_h_make_inline_functions_static): New fix.
* fixincl.x: Regenerate.
* tests/base/dejagnu.h: New.

OK.
jeff



Re: Compare-elim pass (was: Re: [PATCH] Fix PR 61225)

2014-12-05 Thread Jeff Law

On 12/04/14 00:41, Uros Bizjak wrote:

Hello!


I also wonder if compare-elim ought to be helping here.  Isn't that the
point here, to eliminate the comparison and instead get it for free as
part of the arithmetic?  If so, is it the fact that we have memory
references that prevents compare-elim from kicking in?


Yes, compare-elim doesn't work with memory references but, more radically, it
is not enabled for x86 (it is only enabled for aarch64, mn10300 and rx).


I did experiment a bit with a compare-elim pass on x86. However, as
rth said in [1]:

--quote--
If we want to use this pass for x86, then for 4.8 we should also fix the
discrepancy between the compare-elim canonical

   [(operate)
(set-cc)]

and the combine canonical

   [(set-cc)
(operate)]

(Because of the simplicity of the substitution in compare-elim, I prefer
the former as the canonical canonical.)
--/quote--

There were some patches floating around [2], [3] that enhanced the
compare-elim pass for x86 needs, but the target never switched to the new
pass, mostly because the compare-elim pass did not catch all the cases that
the traditional RTX combine pass did. However, the compare-elim pass can
cross BB boundaries, where traditional RTX combine doesn't (and IIRC it even
has a comment why it doesn't try too hard to do so).

The reason why x86 doesn't use both passes is simply due to the fact
quoted above. compare-elim pass substitutes the clobber in the
PARALLEL RTX with a new set-cc in-place, so all relevant patterns in
i386.md (and a couple of support functions in i386.c) would have to be
swapped around. Unfortunately, simply changing i386.md insn patterns
would disable existing RTX combiner functionality, leading to various
missed-optimization regressions.

Due to the above, I would like to propose that the existing RTX combine
pass be updated to handle [(operate)(set-cc)] patterns (exclusively?).
From my experience, the compare-elim post-reload pass would catch a bunch
of remaining cross-BB opportunities left by the RTX combine pass, so the
compare-elim pass would be effective on x86 even after the RTX combiner
does its job. While the target-dependent changes would be fairly trivial,
I don't know about the amount of work in combine.c to handle the new
canonical patterns. Maybe the RTL maintainer can chime in (hint, hint, wink
wink ;)

There is also a hidden benefit for "other", compare-elim-only targets.
Having this pass enabled on a wildly popular target would help
catch eventual bugs in the pass.

[1] https://gcc.gnu.org/ml/gcc-patches/2012-02/msg00251.html
[2] https://gcc.gnu.org/ml/gcc-patches/2012-02/msg00466.html
[3] https://gcc.gnu.org/ml/gcc-patches/2012-04/msg01487.html

My first thought would be to allow both and have combine swap the order
in the vector if recog doesn't recognize the pattern.  One could argue
we could go through a full permutation of ordering in the vector, but
that's probably above and beyond the call of duty.


Or maybe have the genfoo programs generate the multiple permutations of
patterns for different orderings of the elements within the parallel.


Jeff


Uros.





Re: [PATCH] Don't promote RHS of shift-expr if it's integer_type (PR tree-optimization/64183)

2014-12-05 Thread Jeff Law

On 12/05/14 12:00, Marek Polacek wrote:

On Fri, Dec 05, 2014 at 12:08:02PM +0100, Jakub Jelinek wrote:

On Fri, Dec 05, 2014 at 11:27:50AM +0100, Marek Polacek wrote:

My recent change to shift operand promotion caused a regression in
loop unrolling.  Fixed as Richi suggested in the PR audit trail.

Bootstrapped/regtested on ppc64-linux and x86_64-linux, ok for trunk?

2014-12-05  Marek Polacek  

PR tree-optimization/64183
* c-gimplify.c (c_gimplify_expr): Don't convert the RHS of a
shift-expression if it is integer_type_node.

* gcc.dg/tree-ssa/pr64183.c: New test.


This is for middle-end, so I think it would be better to use
the middle-end type equality in the checks, so !types_compatible_p
instead of != ?


Ok, this is a variant with types_compatible_p instead.

Bootstrapped/regtested on ppc64-linux and x86_64-linux.

2014-12-05  Marek Polacek  

PR tree-optimization/64183
* c-gimplify.c (c_gimplify_expr): Don't convert the RHS of a
shift-expression if it is integer_type_node.  Use types_compatible_p.

* gcc.dg/tree-ssa/pr64183.c: New test.

OK.
jeff
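
To make the guard being discussed concrete, here is a toy model of the condition, written as standalone C rather than against GCC's tree API. The enum and function names are invented for illustration; the real code compares tree type nodes (and, in the approved variant, uses types_compatible_p rather than pointer inequality), which this sketch does not attempt to reproduce.

```c
#include <stdbool.h>

/* Toy stand-ins for the type of the shift count (illustrative only).  */
enum toy_type { TOY_INT, TOY_UNSIGNED, TOY_VECTOR, TOY_SHORT, TOY_LONG };

/* Returns true when the shift count would still be converted to
   unsigned int: it is neither a vector, nor already unsigned int,
   nor (after this patch) plain int.  */
bool
shift_count_needs_conversion (enum toy_type count_type)
{
  return count_type != TOY_VECTOR
	 && count_type != TOY_UNSIGNED
	 && count_type != TOY_INT;
}
```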



Re: [PATCH, PR 64192] Add forgotten conversion from bits to bytes

2014-12-05 Thread Jeff Law

On 12/05/14 09:13, Martin Jambor wrote:

Hi,

at some point I lost an important division of bit offset by
BITS_PER_UNIT in my alignment IPA-CP propagation patch. That led to a
few failures on i686 reported as PR 64192.

This patch adds it together with a slight improvement of the guarding
check which I suppose will never trigger but it does ensure the
division will never lose information.

I consider this change obvious and would really like to commit it
before I leave for the weekend, so I will do so after it finishes
bootstrapping and testing on i686.  It has already passed bootstrap
and testing on x86_64-linux.

Thanks,

Martin


2014-12-05  Martin Jambor  

PR ipa/64192
* ipa-prop.c (ipa_compute_jump_functions_for_edge): Convert alignment
from bits to bytes after checking they are byte-aligned.

OK.

If you could add a testcase when you get back that'd be appreciated.

jeff
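
The kind of conversion described in this thread can be sketched as follows. This is not the ipa-prop.c code; the helper name and the TOY_BITS_PER_UNIT macro are assumptions made for the example, which only shows the general pattern of checking byte alignment before dividing so that no low-order bits are silently dropped.

```c
#include <limits.h>
#include <stdbool.h>

/* Stand-in for GCC's BITS_PER_UNIT (assumed equal to CHAR_BIT here).  */
#define TOY_BITS_PER_UNIT CHAR_BIT

/* Hypothetical helper: converts a bit offset to a byte offset.
   Returns false, leaving *BYTE_OFFSET untouched, when the bit offset
   is not a whole number of bytes.  */
bool
bit_offset_to_bytes (unsigned long bit_offset, unsigned long *byte_offset)
{
  if (bit_offset % TOY_BITS_PER_UNIT != 0)
    return false;
  *byte_offset = bit_offset / TOY_BITS_PER_UNIT;
  return true;
}
```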



PATCH: PR target/64200: CE: in decide_alg, at config/i386/i386.c:24510 with -mmemcpy-strategy=libcall:-1:align -minline-stringops-dynamically fails with -mcmodel=large -fpic

2014-12-05 Thread H.J. Lu
GCC manual says:

'-minline-stringops-dynamically'
 For string operations of unknown size, use run-time checks with
 inline code for small blocks and a library call for large blocks.

we get

Breakpoint 5, decide_alg (count=0, expected_size=-1, min_size=0, 
max_size=2147483647, memset=false, zero_memset=false, 
dynamic_check=0x7fffc924, noalign=0x7fffc923)
at /export/gnu/import/git/gcc/gcc/config/i386/i386.c:24510
24510 if (TARGET_INLINE_STRINGOPS_DYNAMICALLY)
(gdb) p alg
$1 = libcall
(gdb) 

libcall is a valid choice here for -minline-stringops-dynamically.  This
patch avoids assert "alg != libcall" for
TARGET_INLINE_STRINGOPS_DYNAMICALLY.  OK for trunk and 4.9 branch?
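
For illustration, the behaviour the manual describes (a run-time size check choosing between inline code and a library call) looks roughly like the hand-written C below. This is only a sketch of the idea, not what the i386 back end emits; the function name and the 64-byte threshold are arbitrary assumptions.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical cut-off between "small" and "large" blocks.  */
enum { SMALL_BLOCK_MAX = 64 };

/* Copies N bytes: a simple inline loop for small blocks, a library
   call for everything else.  */
void *
copy_dynamic (void *dst, const void *src, size_t n)
{
  if (n <= SMALL_BLOCK_MAX)
    {
      unsigned char *d = dst;
      const unsigned char *s = src;
      for (size_t i = 0; i < n; i++)
	d[i] = s[i];
      return dst;
    }
  return memcpy (dst, src, n);
}
```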

Thanks.


H.J.
---
gcc/

2014-12-05  H.J. Lu  

PR target/64200
* config/i386/i386.c (decide_alg): Don't assert "alg != libcall"
for TARGET_INLINE_STRINGOPS_DYNAMICALLY.

gcc/testsuite/

2014-12-05  H.J. Lu  

PR target/64200
* gcc.target/i386/memcpy-strategy-4.c: New test.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 4f1a18b..aaf0b38 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -24507,9 +24507,10 @@ decide_alg (HOST_WIDE_INT count, HOST_WIDE_INT expected_size,
   alg = decide_alg (count, max / 2, min_size, max_size, memset,
zero_memset, dynamic_check, noalign);
   gcc_assert (*dynamic_check == -1);
-  gcc_assert (alg != libcall);
   if (TARGET_INLINE_STRINGOPS_DYNAMICALLY)
*dynamic_check = max;
+  else
+   gcc_assert (alg != libcall);
   return alg;
 }
   return (alg_usable_p (algs->unknown_size, memset)
diff --git a/gcc/testsuite/gcc.target/i386/memcpy-strategy-4.c b/gcc/testsuite/gcc.target/i386/memcpy-strategy-4.c
new file mode 100644
index 000..5c51248
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/memcpy-strategy-4.c
@@ -0,0 +1,21 @@
+/* PR target/64200 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -march=atom -mmemcpy-strategy=libcall:-1:align -minline-stringops-dynamically" } */
+
+#include <stdarg.h>
+
+extern void bar(char *x);
+
+void foo (int size, ...)
+{
+  struct
+  {
+char x[size];
+  } d;
+
+  va_list ap;
+  va_start(ap, size);
+  d = va_arg(ap, typeof (d));
+  va_end(ap);
+  bar(d.x);
+}


Re: [PATCH] PR other/63613: Add fixincludes for dejagnu.h

2014-12-05 Thread Bruce Korb
>> This is the first time I've touched the "fixincludes" directory;
>> is this the correct way to make a change here?

Well, I'd like to see it -- especially since it's your first.
Please send to this gmail account or wait until I get my GNU email this weekend.
Thanks!

>> Successfully bootstrapped & regrtested on x86_64-unknown-linux-gnu
>> (Fedora 20).
>>
>> OK for trunk?

More likely than not :)


Re: [PATCH, PR 64192] Add forgotten conversion from bits to bytes

2014-12-05 Thread H.J. Lu
On Fri, Dec 5, 2014 at 12:46 PM, Jeff Law  wrote:
> On 12/05/14 09:13, Martin Jambor wrote:
>>
>> Hi,
>>
>> at some point I lost an important division of bit offset by
>> BITS_PER_UNIT in my alignment IPA-CP propagation patch. That led to a
>> few failures on i686 reported as PR 64192.
>>
>> This patch adds it together with a slight improvement of the guarding
>> check which I suppose will never trigger but it does ensure the
>> division will never lose information.
>>
>> I consider this change obvious and would really like to commit it
>> before I leave for the weekend, so I will do so after it finishes
>> bootstrapping and testing on i686.  It has already passed bootstrap
>> and testing on x86_64-linux.
>>
>> Thanks,
>>
>> Martin
>>
>>
>> 2014-12-05  Martin Jambor  
>>
>> PR ipa/64192
>> * ipa-prop.c (ipa_compute_jump_functions_for_edge): Convert
>> alignment
>> from bits to bytes after checking they are byte-aligned.
>
> OK.
>
> If you could add a testcase when you get back that'd be appreciated.
>

It is covered by existing testcases.  Those failed on Linux/i686:

FAIL: gcc.c-torture/execute/pr37573.c   -O3 -fomit-frame-pointer  execution test
FAIL: gcc.c-torture/execute/pr37573.c   -O3 -fomit-frame-pointer
-funroll-all-loops -finline-functions  execution test
FAIL: gcc.c-torture/execute/pr37573.c   -O3 -fomit-frame-pointer
-funroll-loops  execution test
FAIL: gcc.c-torture/execute/pr37573.c   -O3 -g  execution test
FAIL: gcc.dg/vect/pr60196-1.c execution test
FAIL: gcc.dg/vect/pr60196-1.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-multitypes-11.c -flto -ffat-lto-objects execution test
FAIL: gcc.dg/vect/vect-multitypes-12.c -flto -ffat-lto-objects execution test

-- 
H.J.


Re: [PATCH][ARM] FreeBSD arm support, EABI, v2

2014-12-05 Thread Andreas Tobler

Ping?!

Thanks,
Andreas

On 27.11.14 21:56, Andreas Tobler wrote:

Hi all,

this is the second attempt.

I reworked the issues Richard mentioned in the previous review.
I also found one issue which will break build/bootstrap if I pass
--enable-gnu-indirect-function, also fixed.

One thing which came up is the way we generate code for the
armv6*-*-freebsd* triplet versus the arm-*-freebsd* triplet.

I think the confusing part is that we have only two fixed triplets
with which we build a complete OS. This means the whole OS is built
with the same optimization, not only the kernel or one binary.

For the armv6* we want to benefit from the cpu's functionality by
default. We build all __ARM_ARCH >= 6 with TARGET_CPU_arm1176jzs,
on the other hand all __ARM_ARCH <=5 will be built with TARGET_CPU_arm9.

Now who becomes arm-*-freebsd* and who becomes armv6*-*-freebsd*?

As explained above, we only know two triplets, so __ARM_ARCH >= 6 becomes
armv6*-*-freebsd* and __ARM_ARCH <=5 becomes arm-*-freebsd*.
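
The mapping described above can be expressed with the __ARM_ARCH predefined macro, as in the sketch below. The function name is made up for illustration, and the strings are just the triplet patterns quoted in this message; the aarch64 exclusion reflects the remark that armv8 will become something different.

```c
/* Hypothetical helper: names the triplet family this translation unit
   would belong to under the scheme described in the message.  */
const char *
freebsd_arm_triplet_family (void)
{
#if defined(__ARM_ARCH) && !defined(__aarch64__) && __ARM_ARCH >= 6
  return "armv6*-*-freebsd*";
#elif defined(__ARM_ARCH) && !defined(__aarch64__)
  return "arm-*-freebsd*";
#else
  return "not a 32-bit ARM target";
#endif
}
```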

armv8 is not yet in the portfolio and it will become something
different, either arm64 or aarch64, I do not know.

I'd like to keep this since our system compilers, clang and gcc-4.2.1
behave the same.
If we have to change here, we would confuse people quite a lot.

The whole thing is FreeBSD specific and does not touch others.

As usual, bootstrapped, cross compiled, tested.

Ok for trunk?

TIA,
Andreas

toplevel:

* configure.ac: Don't add ${libgcj} for arm*-*-freebsd*.
* configure: Regenerate.
gcc:
* config.gcc (arm*-*-freebsd*): New configuration.
* config/arm/freebsd.h: New file.
* config.host: Add extra components for arm*-*-freebsd*.
* config/arm/arm.h: Introduce MAX_SYNC_LIBFUNC_SIZE.
* config/arm/arm.c (arm_init_libfuncs): Use MAX_SYNC_LIBFUNC_SIZE.

libgcc:

* config.host (arm*-*-freebsd*): Add new configuration for
arm*-*-freebsd*.
* config/arm/freebsd-atomic.c: New file.
* config/arm/t-freebsd: Likewise.
* config/arm/unwind-arm.h: Add __FreeBSD__ to the list of
'PC-relative indirect' OS's.

libatomic:

* configure.tgt: Exclude arm*-*-freebsd* from try_ifunc.

libstdc++-v3:

* configure.host: Add arm*-*-freebsd* port_specific_symbol_files.





[PATCH] Note issues using function calls in length computations

2014-12-05 Thread Jeff Law


David Malcolm's valgrind testing uncovered a problem where the x86 port 
was using a function call to compute the length of a variable length 
insn.  This causes a certain amount of heartburn for genattrtab's 
attempts to compute worst case estimates of instruction lengths with the 
resulting valgrind symptom being uninitialized memory reads.


There are known workarounds for this issue where a dummy length clause can
be used to tell genattrtab the worst case instruction length.  The SH
and PA ports are known to use those workarounds.


This minor doc patch mentions the issue and refers readers to the PA 
port's call pattern where the workaround is used.


Testing via "make doc" and installed onto the trunk.

commit 650ccd2a9782687056678276034f44371892ae9a
Author: law 
Date:   Fri Dec 5 22:19:26 2014 +

* doc/md.texi: Note problems using function calls to determine
insn lengths and point readers to a potential workaround.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@218439 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 9538737..0dcf649 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2014-12-05  Jeff Law  
+
+   * doc/md.texi: Note problems using function calls to determine
+   insn lengths and point readers to a potential workaround.
+
 2014-12-05  Andreas Schwab  
 
* combine.c (is_parallel_of_n_reg_sets)
diff --git a/gcc/doc/md.texi b/gcc/doc/md.texi
index b8e5ac5..1c70a77 100644
--- a/gcc/doc/md.texi
+++ b/gcc/doc/md.texi
@@ -8345,6 +8345,12 @@ the number of vectors multiplied by the size of each vector.
 
 Lengths are measured in addressable storage units (bytes).
 
+Note that it is possible to call functions via the @code{symbol_ref}
+mechanism to compute the length of an insn.  However, if you use this
+mechanism you must provide dummy clauses to express the maximum length
+without using the function call.  You can an example of this in the
+@code{pa} machine description for the @code{call_symref} pattern.
+
 The following macros can be used to refine the length computation:
 
 @table @code


Re: [PATCH, i386] Fix PR64003

2014-12-05 Thread Jeff Law

On 12/05/14 08:42, Uros Bizjak wrote:

Hello!


This patch fixes PR target/64003 by avoiding function calls during
computation of the "length" attribute for short jump instructions.  It is
achieved by having separate templates for prefixed and non-prefixed
instructions.  Please see the discussion in bugzilla for the reasoning.

Bootstrapped and tested on x86_64-unknown-linux-gnu.  A Valgrind run for
the reproducer shows the problem is fixed.  OK for trunk?

2014-12-05  Ilya Enkovich  

* config/i386/i386.md (*jcc_1_bnd): New.
(*jcc_2_bnd): New.
(jump_bnd): New.
(*jcc_1): Remove bnd prefix.
(*jcc_2): Likewise.
(jump): Likewise.


Let's proceed with the above version to stay on the safe side for now.

OK for mainline, but please investigate usage of ADJUST_INSN_LENGTH
for ibr-type and ret instructions.

FWIW, I've just committed a minor update to the gcc documentation for
this issue.  I'm rather depressed this was known for 15 years but
nobody's ever documented the issue.  Sigh.


Jeff


Re: [PATCH] Fix PR 61225

2014-12-05 Thread Jeff Law

On 12/04/14 13:49, Segher Boessenkool wrote:

On Thu, Dec 04, 2014 at 04:43:34PM +0800, Zhenqiang Chen wrote:

C code:

 if (!--*p)

rtl code:

 6: r91:SI=[r90:SI]
 7: {r88:SI=r91:SI-0x1;clobber flags:CC;}
 8: [r90:SI]=r88:SI
 9: flags:CCZ=cmp(r88:SI,0)

expected output:

 8: {flags:CCZ=cmp([r90:SI]-0x1,0);[r90:SI]=[r90:SI]-0x1;}

in assembly, it is

   decl (%eax)


Combine does not consider combining 9 into 7 because there is no LOG_LINK
between them (the link for r88 is between 8 and 7 already).

OK, yea, that's a long standing design decision.  We don't feed a single
def into multiple use sites.


jeff
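
For reference, the C fragment under discussion can be wrapped into a complete function like the one below. The function name is invented for this example; the body is exactly the "!--*p" idiom from the report: decrement the value in memory and test whether the result is zero.

```c
#include <stdbool.h>

/* Decrements *P in place and reports whether it reached zero.  On x86
   this is the load, subtract, store, compare sequence shown in the RTL
   above, which ideally becomes a single "decl" on memory plus a branch
   on the flags.  */
bool
dec_and_test (int *p)
{
  return !--*p;
}
```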




Re: [patch, build] Restore bootstrap in building libcc1 on darwin

2014-12-05 Thread Dominique Dhumieres
> As I've tried to explain, that is IMHO wrong though.
> If what you are after is the -B stuff too, then perhaps:
> ...

Sorry but it does not work:

true  DO=all multi-do # make
make[4]: Leaving directory '/opt/gcc/build_w/libbacktrace'
make[3]: Leaving directory '/opt/gcc/build_w/libbacktrace'
make[3]: Entering directory '/opt/gcc/build_w/libcpp'
make[3]: Nothing to be done for 'all'.
make[3]: Leaving directory '/opt/gcc/build_w/libcpp'
make[3]: Entering directory '/opt/gcc/build_w/libdecnumber'
make[3]: Nothing to be done for 'all'.
make[3]: Leaving directory '/opt/gcc/build_w/libdecnumber'
make[3]: Entering directory '/opt/gcc/build_w/gcc'
make[3]: Leaving directory '/opt/gcc/build_w/gcc'
Checking multilib configuration for libgcc...
make[3]: Entering directory '/opt/gcc/build_w/x86_64-apple-darwin14.0.0/libgcc'
make[3]: *** No rule to make target ' 
-B$r/$(TARGET_SUBDIR)/libstdc++-v3/libsupc++/.libs'.  Stop.
make[3]: Leaving directory '/opt/gcc/build_w/x86_64-apple-darwin14.0.0/libgcc'
Makefile:14905: recipe for target 'all-stage1-target-libgcc' failed
make[2]: *** [all-stage1-target-libgcc] Error 2
make[2]: Leaving directory '/opt/gcc/build_w'
Makefile:21193: recipe for target 'stage1-bubble' failed
make[1]: *** [stage1-bubble] Error 2
make[1]: Leaving directory '/opt/gcc/build_w'
Makefile:910: recipe for target 'all' failed
make: *** [all] Error 2

Dominique


Re: [PATCH] Fix PR 61225

2014-12-05 Thread Jeff Law

On 12/04/14 13:57, Segher Boessenkool wrote:


So combine tries to combine 6+7+8; the RTL it comes up with is a parallel
of the memory decrement (without cc clobber, but that is fine), and setting
r88 to the mem minus one.  There is no such pattern in the target, and
combine cannot break the parallel into two sets (because the first modifies
the mem used by the second), so 6+7+8 doesn't combine.

Adding a bridge pattern in the target would work; or you can enhance combine
so it can break up this parallel correctly.

I think I or someone else suggested a bridge pattern in the past, but I
can't find it; perhaps it was in one of the other threads WRT limitations
of the combiner.


Zhenqiang, can you look at what happens if you provide a pattern for 
6+7+8 (probably via a define_and_split)?


Jeff



Re: [patch, build] Restore bootstrap in building libcc1 on darwin

2014-12-05 Thread Jeff Law

On 12/05/14 15:34, Dominique Dhumieres wrote:

As I've tried to explain, that is IMHO wrong though.
If what you are after is the -B stuff too, then perhaps:
...


Sorry but it does not work:

BTW, thanks for working with Jakub on this.  We're going to be getting a
Darwin box for Jakub and other folks in the Red Hat team to use when the
need arises to dig into these kinds of issues.


However, until that box arrives and is set up, this kind of iteration is
the only way he can test Darwin stuff.


Jeff



[PATCH] Fix size & type for cold partition names (hot-cold function partitioning)

2014-12-05 Thread Caroline Tice
When hot/cold function splitting occurs, a symbol is generated for the
cold partition, and gets output in the assembly & debug info, but the
symbol currently gets a size of 0 and a type of NOTYPE, as in this
example (on x86_64-linux) from the cold_partition_label test in the
testsuite:

$ readelf -sW cold_partition_label.x02 | grep foo
36: 00400450     0 NOTYPE  LOCAL  DEFAULT   12 foo.cold.0
58: 00400490    43 FUNC    GLOBAL DEFAULT   12 foo
$

This patch fixes this by calculating the right size for the partition,
and outputting the size and type of the cold partition symbol.  After
applying this patch and looking at the same test, I get:

$ readelf -sW cold_partition_label.x02 | grep foo
36: 00400450    29 FUNC    LOCAL  DEFAULT   12 foo.cold.0
58: 00400490    43 FUNC    GLOBAL DEFAULT   12 foo
$

This patch has been tested by bootstrapping the compiler, running the
dejagnu testsuite with no regressions, and checked as shown above that
it fixes the original problem.  Is this patch OK to commit to ToT?
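
As a rough way to see the symbols discussed above, a function with an unlikely error path such as the one below can be compiled with -O2 -freorder-blocks-and-partition and inspected with readelf -sW. This is not the cold_partition_label test from the testsuite, just a hand-written sketch; whether the compiler actually splits the function, and what the cold symbol is called, depends on the GCC version and its heuristics.

```c
#include <stdio.h>
#include <stdlib.h>

/* The error branch is marked unlikely and ends in a noreturn call, so
   it is a candidate for placement in the cold partition.  */
int
foo (int x)
{
  if (__builtin_expect (x < 0, 0))
    {
      fprintf (stderr, "negative input: %d\n", x);
      abort ();
    }
  return x * 2;
}
```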

-- Caroline Tice
cmt...@google.com

 2014-12-05  Caroline Tice  

* final.c (final_scan_insn): Change 'cold_function_name' to
'cold_partition_name' and make it a global variable; also output
assembly to give it a 'FUNC' type, if appropriate.
* varasm.c (cold_partition_name): Declare and initialize global
variable.
(assemble_start_function): Re-set value for cold_partition_name.
(assemble_end_function): Output assembly to calculate size of cold
partition, and associate size with name, if appropriate.
* varasm.h (cold_partition_name): Add extern declaration for global
variable.
Index: gcc/final.c
===
--- gcc/final.c	(revision 218434)
+++ gcc/final.c	(working copy)
@@ -2223,10 +2223,16 @@
 	 suffixing "cold" to the original function's name.  */
 	  if (in_cold_section_p)
 	{
-	  tree cold_function_name
+	  cold_partition_name
 		= clone_function_name (current_function_decl, "cold");
+#ifdef ASM_DECLARE_FUNCTION_NAME
+	  ASM_DECLARE_FUNCTION_NAME (asm_out_file,
+	 IDENTIFIER_POINTER (cold_partition_name),
+	 current_function_decl);
+#else
 	  ASM_OUTPUT_LABEL (asm_out_file,
-IDENTIFIER_POINTER (cold_function_name));
+IDENTIFIER_POINTER (cold_partition_name));
+#endif
 	}
 	  break;
 
Index: gcc/varasm.c
===
--- gcc/varasm.c	(revision 218434)
+++ gcc/varasm.c	(working copy)
@@ -171,6 +171,13 @@
at the cold section.  */
 bool in_cold_section_p;
 
+/* The following global holds the partition name for the code in the
+   cold section of a function, if hot/cold function splitting is enabled
+   and there was actually code that went into the cold section.  A
+   pseudo function name is needed for the cold section of code for some
+   debugging tools that perform symbolization. */
+tree cold_partition_name = NULL_TREE;
+
 /* A linked list of all the unnamed sections.  */
 static GTY(()) section *unnamed_sections;
 
@@ -1708,6 +1715,7 @@
   ASM_GENERATE_INTERNAL_LABEL (tmp_label, "LCOLDE", const_labelno);
   crtl->subsections.cold_section_end_label = ggc_strdup (tmp_label);
   const_labelno++;
+  cold_partition_name = NULL_TREE;
 }
   else
 {
@@ -1843,6 +1851,10 @@
 
   save_text_section = in_section;
   switch_to_section (unlikely_text_section ());
+  if (cold_partition_name != NULL_TREE)
+	ASM_DECLARE_FUNCTION_SIZE (asm_out_file,
+   IDENTIFIER_POINTER (cold_partition_name),
+   decl);
   ASM_OUTPUT_LABEL (asm_out_file, crtl->subsections.cold_section_end_label);
   if (first_function_block_is_cold)
 	switch_to_section (text_section);
Index: gcc/varasm.h
===
--- gcc/varasm.h	(revision 218434)
+++ gcc/varasm.h	(working copy)
@@ -20,6 +20,13 @@
 #ifndef GCC_VARASM_H
 #define GCC_VARASM_H
 
+/* The following global holds the partition name for the code in the
+   cold section of a function, if hot/cold function splitting is enabled
+   and there was actually code that went into the cold section.  A
+   pseudo function name is needed for the cold section of code for some
+   debugging tools that perform symbolization. */
+extern tree cold_partition_name;
+
 extern tree tree_output_constant_def (tree);
 extern void make_decl_rtl (tree);
 extern rtx make_decl_rtl_for_debug (tree);


Re: [PATCH] Fix the rest of PR target/64056

2014-12-05 Thread Jeff Law

On 12/04/14 08:22, Ilya Enkovich wrote:

On 04 Dec 15:58, Rainer Orth wrote:

Hi Ilya,



This patch adds a check for stpcpy function into
gcc.target/i386/chkp-strlen-2.c test.

make check RUNTESTFLAGS="i386.exp=chkp-strlen-2.c" is OK.  OK for trunk?

Thanks,
Ilya
--
2014-12-04  Ilya Enkovich  

PR target/64056
* lib/target-supports.exp (check_effective_target_stpcpy): New.


new effective-target keywords need documentation in sourcebuild.texi.

Rainer

--
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Thanks for the notice!  I see there is also no description for the mempcpy
target check, so I added both of them.

Thanks,
Ilya
--
gcc/

2014-12-04  Ilya Enkovich  

PR target/64056
* doc/sourcebuild.texi: Add mempcpy and stpcpy for Effective-Target 
Keywords.

gcc/testsuite/

2014-12-04  Ilya Enkovich  

PR target/64056
* lib/target-supports.exp (check_effective_target_stpcpy): New.
* gcc.target/i386/chkp-strlen-2.c: Add stpcpy target check.

OK.
jeff



RFA: patch to fix PR64157

2014-12-05 Thread Vladimir Makarov

  The following patch fixes

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64157


  After calling target_reinit from save_target_globals for switchable
targets (such as ppc), a lot of IRA data (register sets, classes, etc.)
becomes undefined.  After that, ira-costs.c crashes when the undefined
data is used.


  The patch was successfully bootstrapped and tested on x86-64.

  Ok to commit to the trunk?

2014-12-05  Vladimir Makarov  

PR rtl-optimization/64157
* toplev.c (target_reinit): Call ira_init.
Index: toplev.c
===
--- toplev.c(revision 218378)
+++ toplev.c(working copy)
@@ -1888,6 +1888,8 @@ target_reinit (void)
   /* This invokes target hooks to set fixed_reg[] etc, which is
  mode-dependent.  */
   init_regs ();
+  /* Set IRA data depended on target parameters.  */
+  ira_init ();
 
   /* Reinitialize lang-dependent parts.  */
   lang_dependent_init_target ();


Re: [patch, build] Restore bootstrap in building libcc1 on darwin

2014-12-05 Thread Phil Muldoon
On 05/12/14 22:40, Jeff Law wrote:
> On 12/05/14 15:34, Dominique Dhumieres wrote:
>>> As I've tried to explain, that is IMHO wrong though.
>>> If what you are after is the -B stuff too, then perhaps:
>>> ...
>>
>> Sorry but it does not work:
> BTW, thanks for working with Jakub on this.  We're going to be getting a
> Darwin box for Jakub and other folks in the Red Hat team to use when the
> need arises to dig into these kinds of issues.
>
> However, until that box arrives and is set up, this kind of iteration is
> the only way he can test Darwin stuff.
>
> Jeff
>

Indeed I feel especially bad in these scenarios where patches are
suggested for a patch I submitted and are causing you folks problems.
I really do not want to do that.  So many architectures for GCC, so
very few resources.  Hopefully as Jeff indicates, this will be sorted
soon.  Again from a libcc1 point of view, as long as we have the .so
built on all configurations, that is what matters.  I have not chipped
into these threads as I have nothing to say/recommend about darwin
architectures :( I do read them all, though.

Cheers,

Phil



[PATCH], PR 64204, Fix long double constants on powerpc little endian

2014-12-05 Thread Michael Meissner
After my upper regs patches went in, I noticed that the gcc.dg/c11-atomic-2.c
test would fail on a power8 host that was running in little endian mode.  This
particular test only fails if you are compiling this code with no optimization,
and power8 selected as the cpu.  Ultimately, it fails in reload when an array
index is way out of bounds.

In looking at it, it is due to rs6000_emit_move creating two separate moves of
SUBREG's of TFmode to assign a constant during RTL generation.  I fixed this so
this 'optimization' is only done if DFmode values can only go in the
traditional registers.  While I was at it, I optimized setting TFmode variables
to 0.0L to use xxlxor rather than loading up 2 double words of memory.

I have done bootstraps on big endian power7, big endian power8, and little
endian power8 with no regressions in the test suite.  I also have built the
Spec 2006 test suite for power7.  Can I install these patches?
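
A source-level view of the constants involved is sketched below. On powerpc targets with 128-bit IBM long double these would be TFmode constant moves, and the first one is the 0.0L case the patch optimizes; on other targets long double simply has a different representation. The function names are made up for the example and this is not the c11-atomic-2.c test.

```c
/* Returns the long double zero constant (the case handled with xxlxor
   in the patch when VSX registers are available).  */
long double
zero_ld (void)
{
  return 0.0L;
}

/* Returns a non-zero long double constant, which still has to be
   materialized from its component parts or from memory.  */
long double
one_and_a_half_ld (void)
{
  return 1.5L;
}
```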

2014-12-05  Michael Meissner  

PR target/64204
* config/rs6000/rs6000.c (rs6000_emit_move): Do not split TFmode
constant moves if -mupper-regs-df.

* config/rs6000/rs6000.md (mov_64bit_dm): Optimize moving
0.0L to TFmode.
(movtd_64bit_nodm): Likewise.
(mov_32bit, FMOVE128 case): Likewise.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 218388)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -8396,9 +8396,11 @@ rs6000_emit_move (rtx dest, rtx source, 
  || ! nonimmediate_operand (operands[0], mode)))
 goto emit_set;
 
-  /* 128-bit constant floating-point values on Darwin should really be
- loaded as two parts.  */
+  /* 128-bit constant floating-point values on Darwin should really be loaded
+ as two parts.  However, this premature splitting is a problem when DFmode
+ values can go into Altivec registers.  */
   if (!TARGET_IEEEQUAD && TARGET_LONG_DOUBLE_128
+  && !reg_addr[DFmode].scalar_in_vmx_p
   && mode == TFmode && GET_CODE (operands[1]) == CONST_DOUBLE)
 {
   rs6000_emit_move (simplify_gen_subreg (DFmode, operands[0], mode, 0),
Index: gcc/config/rs6000/rs6000.md
===
--- gcc/config/rs6000/rs6000.md (revision 218388)
+++ gcc/config/rs6000/rs6000.md (working copy)
@@ -8086,8 +8086,8 @@ (define_expand "mov"
 ;; problematical.  Don't allow direct move for this case.
 
 (define_insn_and_split "*mov_64bit_dm"
-  [(set (match_operand:FMOVE128 0 "nonimmediate_operand" "=m,d,d,Y,r,r,r,wm")
-   (match_operand:FMOVE128 1 "input_operand" "d,m,d,r,YGHF,r,wm,r"))]
+  [(set (match_operand:FMOVE128 0 "nonimmediate_operand" "=m,d,d,ws,Y,r,r,r,wm")
+   (match_operand:FMOVE128 1 "input_operand" "d,m,d,j,r,jYGHF,r,wm,r"))]
   "TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_POWERPC64
&& (mode != TDmode || WORDS_BIG_ENDIAN)
&& (gpc_reg_operand (operands[0], mode)
@@ -8096,11 +8096,11 @@ (define_insn_and_split "*mov_64bit
   "&& reload_completed"
   [(pc)]
 { rs6000_split_multireg_move (operands[0], operands[1]); DONE; }
-  [(set_attr "length" "8,8,8,12,12,8,8,8")])
+  [(set_attr "length" "8,8,8,8,12,12,8,8,8")])
 
 (define_insn_and_split "*movtd_64bit_nodm"
-  [(set (match_operand:TD 0 "nonimmediate_operand" "=m,d,d,Y,r,r")
-   (match_operand:TD 1 "input_operand" "d,m,d,r,YGHF,r"))]
+  [(set (match_operand:TD 0 "nonimmediate_operand" "=m,d,d,ws,Y,r,r")
+   (match_operand:TD 1 "input_operand" "d,m,d,j,r,jYGHF,r"))]
   "TARGET_HARD_FLOAT && TARGET_FPRS && TARGET_POWERPC64 && !WORDS_BIG_ENDIAN
&& (gpc_reg_operand (operands[0], TDmode)
|| gpc_reg_operand (operands[1], TDmode))"
@@ -8108,11 +8108,11 @@ (define_insn_and_split "*movtd_64bit_nod
   "&& reload_completed"
   [(pc)]
 { rs6000_split_multireg_move (operands[0], operands[1]); DONE; }
-  [(set_attr "length" "8,8,8,12,12,8")])
+  [(set_attr "length" "8,8,8,8,12,12,8")])
 
 (define_insn_and_split "*mov_32bit"
-  [(set (match_operand:FMOVE128 0 "nonimmediate_operand" "=m,d,d,Y,r,r")
-   (match_operand:FMOVE128 1 "input_operand" "d,m,d,r,YGHF,r"))]
+  [(set (match_operand:FMOVE128 0 "nonimmediate_operand" "=m,d,d,ws,Y,r,r")
+   (match_operand:FMOVE128 1 "input_operand" "d,m,d,j,r,jYGHF,r"))]
   "TARGET_HARD_FLOAT && TARGET_FPRS && !TARGET_POWERPC64
&& (gpc_reg_operand (operands[0], mode)
|| gpc_reg_operand (operands[1], mode))"
@@ -8120,7 +8120,7 @@ (define_insn_and_split "*mov_32bit
   "&& reload_completed"
   [(pc)]
 { rs6000_split_multireg_move (operands[0], operands[1]); DONE; }
-  [(set_attr "length" "8,8,8,20,20,16")])
+  [(set_attr "length" "8,8,8,8,20,20,16")])
 
 (define_insn_and_split "*mov_softfloat"
   [(set (match_operand:FMOVE128 0 "rs6000_nonimmediate_operand" "=Y,r,r")


Re: [PATCH 01/02] PR jit/64166: Add methods to gcc::dump_manager needed by JIT testing

2014-12-05 Thread Jeff Law

On 12/04/14 15:03, David Malcolm wrote:

This is the non-JIT part of the patch for PR jit/64166.

Provide a way for the JIT to lookup a dump_file_info * by switch name,
and to get from there to the filename.

OK for trunk?

gcc/ChangeLog:
PR jit/64166
* dumpfile.c (gcc::dump_manager::get_dump_file_info_by_switch):
New function.
(gcc::dump_manager::get_dump_file_name): Split out bulk of
implementation into a new overloaded variant taking a
dump_file_info *.
* dumpfile.h (gcc::dump_manager::get_dump_file_info_by_switch):
New function.
(gcc::dump_manager::get_dump_file_name): New overloaded variant of
this function, taking a dump_file_info *.
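
The lookup described above can be sketched as a linear search keyed on the
switch name.  This is only a toy model in C: the struct, its field names, and
the function name are assumptions made for this example, not the real
dump_file_info layout or the gcc::dump_manager interface.

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-in for a dump record; field names are assumptions.  */
struct toy_dump_info
{
  const char *swtch;   /* Switch name used to select the dump.  */
  const char *suffix;  /* Suffix used when forming the dump file name.  */
};

/* Return the entry whose switch name equals SWTCH, or NULL if none.  */
const struct toy_dump_info *
toy_lookup_by_switch (const struct toy_dump_info *table, size_t n,
                      const char *swtch)
{
  for (size_t i = 0; i < n; i++)
    if (strcmp (table[i].swtch, swtch) == 0)
      return &table[i];
  return NULL;
}
```

Once the record is found, the file name can be derived from it without going
back through the switch name, which is what the new overload is for.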

OK
jeff



Re: [patch, build] Restore bootstrap in building libcc1 on darwin

2014-12-05 Thread Jakub Jelinek
On Fri, Dec 05, 2014 at 11:34:28PM +0100, Dominique Dhumieres wrote:
> > As I've tried to explain, that is IMHO wrong though.
> > If what you are after is the -B stuff too, then perhaps:
> > ...
> 
> Sorry but it does not work:

Sorry, make that (just removed 4x ' in each file):

2014-12-05  Jakub Jelinek  

PR bootstrap/64023
* Makefile.tpl (EXTRA_TARGET_FLAGS): Set STAGE1_LDFLAGS
to POSTSTAGE1_LDFLAGS and STAGE1_LIBS to POSTSTAGE1_LIBS.
Add -B to libstdc++-v3/src/.libs and libstdc++-v3/libsupc++/.libs
to CXX.
* Makefile.in: Regenerated.

--- Makefile.tpl.jj 2014-11-12 09:31:59.0 +0100
+++ Makefile.tpl2014-12-05 21:12:21.486031062 +0100
@@ -641,7 +641,9 @@ EXTRA_TARGET_FLAGS = \
'AS=$(COMPILER_AS_FOR_TARGET)' \
'CC=$$(CC_FOR_TARGET) $$(XGCC_FLAGS_FOR_TARGET) $$(TFLAGS)' \
'CFLAGS=$$(CFLAGS_FOR_TARGET)' \
-   'CXX=$$(CXX_FOR_TARGET) $$(XGCC_FLAGS_FOR_TARGET) $$(TFLAGS)' \
+   'CXX=$$(CXX_FOR_TARGET) -B$$r/$$(TARGET_SUBDIR)/libstdc++-v3/src/.libs \
+-B$$r/$$(TARGET_SUBDIR)/libstdc++-v3/libsupc++/.libs \
+$$(XGCC_FLAGS_FOR_TARGET) $$(TFLAGS)' \
'CXXFLAGS=$$(CXXFLAGS_FOR_TARGET)' \
'DLLTOOL=$$(DLLTOOL_FOR_TARGET)' \
'GCJ=$$(GCJ_FOR_TARGET) $$(XGCC_FLAGS_FOR_TARGET) $$(TFLAGS)' \
@@ -659,6 +661,8 @@ EXTRA_TARGET_FLAGS = \
'WINDRES=$$(WINDRES_FOR_TARGET)' \
'WINDMC=$$(WINDMC_FOR_TARGET)' \
'XGCC_FLAGS_FOR_TARGET=$(XGCC_FLAGS_FOR_TARGET)' \
+   'STAGE1_LDFLAGS=$$(POSTSTAGE1_LDFLAGS)' \
+   'STAGE1_LIBS=$$(POSTSTAGE1_LIBS)' \
"TFLAGS=$$TFLAGS"
 
 TARGET_FLAGS_TO_PASS = $(BASE_FLAGS_TO_PASS) $(EXTRA_TARGET_FLAGS)
--- Makefile.in.jj  2014-11-28 14:40:52.0 +0100
+++ Makefile.in 2014-12-05 21:11:48.276616008 +0100
@@ -835,7 +835,9 @@ EXTRA_TARGET_FLAGS = \
'AS=$(COMPILER_AS_FOR_TARGET)' \
'CC=$$(CC_FOR_TARGET) $$(XGCC_FLAGS_FOR_TARGET) $$(TFLAGS)' \
'CFLAGS=$$(CFLAGS_FOR_TARGET)' \
-   'CXX=$$(CXX_FOR_TARGET) $$(XGCC_FLAGS_FOR_TARGET) $$(TFLAGS)' \
+   'CXX=$$(CXX_FOR_TARGET) -B$$r/$$(TARGET_SUBDIR)/libstdc++-v3/src/.libs \
+-B$$r/$$(TARGET_SUBDIR)/libstdc++-v3/libsupc++/.libs \
+$$(XGCC_FLAGS_FOR_TARGET) $$(TFLAGS)' \
'CXXFLAGS=$$(CXXFLAGS_FOR_TARGET)' \
'DLLTOOL=$$(DLLTOOL_FOR_TARGET)' \
'GCJ=$$(GCJ_FOR_TARGET) $$(XGCC_FLAGS_FOR_TARGET) $$(TFLAGS)' \
@@ -853,6 +855,8 @@ EXTRA_TARGET_FLAGS = \
'WINDRES=$$(WINDRES_FOR_TARGET)' \
'WINDMC=$$(WINDMC_FOR_TARGET)' \
'XGCC_FLAGS_FOR_TARGET=$(XGCC_FLAGS_FOR_TARGET)' \
+   'STAGE1_LDFLAGS=$$(POSTSTAGE1_LDFLAGS)' \
+   'STAGE1_LIBS=$$(POSTSTAGE1_LIBS)' \
"TFLAGS=$$TFLAGS"
 
 TARGET_FLAGS_TO_PASS = $(BASE_FLAGS_TO_PASS) $(EXTRA_TARGET_FLAGS)

Jakub


Re: [PATCH] Fix asan sanopt optimization (PR sanitizer/64170)

2014-12-05 Thread Jeff Law

On 12/03/14 15:07, Jakub Jelinek wrote:

Hi!

The following testcase ICEs because the base_checks vector contains
stale statements, and can_remove_asan_check relies on them not being
there anymore (it assumes that all statements in the vector dominate
the current statement; if that is not true, the loop going through immediate
dominators won't reach the basic block of the stmt in the vector).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2014-12-03  Jakub Jelinek  

PR sanitizer/64170
* sanopt.c (maybe_optimize_asan_check_ifn): If base_checks is
non-NULL, call maybe_get_dominating_check on it even if g is
non-NULL.

* gcc.dg/asan/pr64170.c: New test.
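
To see why a stale entry is a problem, here is a minimal sketch of a walk up
the immediate-dominator chain.  It uses a toy CFG stored as an array and a
made-up function name; it is not the sanopt.c code or GCC's dominance API.

```c
#include <stdbool.h>

/* Toy CFG: idom[b] is the immediate dominator of block b, and the entry
   block is its own immediate dominator.  Illustrative only.  */
bool
dominated_by_p (const int *idom, int bb, int candidate)
{
  for (;;)
    {
      if (bb == candidate)
        return true;
      if (idom[bb] == bb)   /* Reached the entry block without a match.  */
        return false;
      bb = idom[bb];
    }
}
```

If the block holding an earlier check does not dominate the current block, the
walk never meets it, so code that assumes it always will is relying on every
vector entry being a dominating check.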

OK.
jeff



Re: [PATCH, MPX wrappers 1/3] Add MPX wrappers library

2014-12-05 Thread Jeff Law

On 12/03/14 07:28, Ilya Enkovich wrote:

   #ifndef MPX_SPEC
   #define MPX_SPEC "\
-%{!nostdlib:%{!nodefaultlibs:" LIBMPX_SPEC "}}"
+%{!nostdlib:%{!nodefaultlibs:" LIBMPX_SPEC LIBMPXWRAPPERS_SPEC "}}"
   #endif


Ugh.  Somehow I missed that MPX_SPEC was in gcc.c along with the uses of
LIBMPX_SPEC.  Aren't all these target specific and thus belong in the x86
specific files?


Is config/i386/linux-common.h a proper place for these specs then?
Depends on whether or not we expect MPX to show up on other systems such 
as *bsd, mingw, solaris, etc.


So I'd say linux-common.h is better than gcc.c, but perhaps not the best 
location.  Uros should chime in here.



Right. The wrappers code doesn't use anything specific to MPX.  With a
pure software solution we should be able to compile and use this
library without changes (except compilation flags).  But even if a pure
software solution exists, the MPX option should still be available, and we
should have two builds of this library.
Ok.  Just wanted to be sure I understood how the pieces fit together.  I 
don't really expect a software implementation, but keeping it in mind 
helps us reasonably consider where certain things belong implementation 
wise.


jeff



Re: [Patch]: Check __gthread_setspecific return

2014-12-05 Thread Jeff Law

On 12/02/14 10:53, Ryan Mansfield wrote:

Hi,

The underlying pthread_setspecific can return non-zero (ENOMEM or EINVAL).

2014-12-02  Ryan Mansfield  

 * emutls.c (__emutls_get_address): Check __gthread_setspecific
returns.
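
As a rough illustration of the kind of check involved, here is a sketch
written against plain pthreads rather than the __gthread wrappers.  The helper
name and the choice to abort on failure are assumptions for this example, not
necessarily what the emutls.c change does.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical helper: store VALUE under KEY and abort on failure
   instead of silently continuing with an unset slot.  */
void
set_specific_or_abort (pthread_key_t key, const void *value)
{
  int err = pthread_setspecific (key, value);
  if (err != 0)   /* ENOMEM or EINVAL.  */
    abort ();
}
```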

OK?

OK.

Sorry for the delay,
Jeff



Re: [PATCH, CHKP] Don't try to optimize bounds returned by strchr

2014-12-05 Thread Jeff Law

On 12/02/14 06:40, Ilya Enkovich wrote:

Hi,

For strchr calls, the bounds of the first argument are treated as the returned 
bounds, which is wrong because NULL may be returned.  This patch fixes that.  
Bootstrapped and tested on x86_64-unknown-linux-gnu.  OK for trunk?

Thanks,
Ilya
--
2014-12-02  Ilya Enkovich  

* tree-chkp.c (chkp_build_returned_bound): Don't predict
return bounds for strchr calls.
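
The NULL case is easy to show in plain C.  This is a small sketch with a
made-up helper name, unrelated to the checker sources:

```c
#include <stddef.h>
#include <string.h>

/* Return the offset of C in S, or -1 when strchr returns NULL.  */
long
offset_of_char (const char *s, int c)
{
  const char *hit = strchr (s, c);
  if (hit == NULL)   /* No match: the result does not point into S.  */
    return -1;
  return (long) (hit - s);
}
```

Only the non-NULL result points into the first argument, so its bounds cannot
be assumed for every return value.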

OK.

Do you have a testcase you could add to the suite?

Jeff



Re: [PATCH, CHKP] Don't generate bndret for not instrumented calls

2014-12-05 Thread Jeff Law

On 12/02/14 06:33, Ilya Enkovich wrote:

Hi,

Currently bndret is generated each time we need to get bounds for a returned 
pointer.  This causes bndret to be generated for non-instrumented calls, 
including builtin function calls.  Trouble appears when such a builtin call is 
optimized out: the bndret then needs to be handled appropriately.  Since we 
don't want instrumentation to affect how the optimizers handle non-instrumented 
builtin calls, we had better avoid bndret for non-instrumented calls.  This 
patch uses zero bounds when we don't expect the call to return bounds.

Bootstrapped and tested on  x86_64-unknown-linux-gnu.  OK for trunk?

Thanks,
Ilya
--
gcc/

2014-12-02  Ilya Enkovich  

* tree-chkp.c (chkp_call_returns_bounds_p): New.
(chkp_build_returned_bound): Use zero bounds as
returned by calls not returning bounds.

gcc/testsuite/

2014-12-02  Ilya Enkovich  

* gcc.target/i386/chkp-bndret.c: New.
* gcc.target/i386/chkp-strchr.c: New.

OK.
Jeff



Re: [patch] small fixes for post-reload compare elimination pass

2014-12-05 Thread Jeff Law

On 11/28/14 02:31, Eric Botcazou wrote:

Hi,

this patch fixes a few glitches in the post-reload compare elimination pass,
most notably the slightly disturbing opening comment:

This pass assumes:

[...]

(1) All comparison patterns are represented as

[(set (reg:CC) (compare:CC (reg) (immediate)))]

the mode mismatch in before_dom_children:

rtx x, flags = gen_rtx_REG (src_mode, targetm.flags_regnum);

/* Generate new comparison for substitution.  */
x = gen_rtx_COMPARE (new_mode, XEXP (src, 0), XEXP (src, 1));
x = gen_rtx_SET (VOIDmode, flags, x);

and the direct tests on flag_non_call_exceptions:

  if (flag_non_call_exceptions)
eh_note = find_reg_note (insn, REG_EH_REGION, NULL);

[...]

  /* Take care that it's in the same EH region.  */
  if (flag_non_call_exceptions
  && !rtx_equal_p (eh_note, last_cmp->eh_note))
goto dont_delete;

It also factors out the comparison logic in before_dom_children to avoid the
series of gotos.

Tested on a private port we're about to submit, OK for the mainline?


2014-11-28  Eric Botcazou  

* compare-elim.c: Fix head comment.
(conforming_compare): Remove redundant test.
(can_eliminate_compare): New function extracted from...
(before_dom_children): ...here.  Use it, replace direct uses of
flag_non_call_exceptions and tidy up.
(maybe_select_cc_mode): Tidy up.

OK.
jeff



Re: [debug-early] emit locals early patchset

2014-12-05 Thread Aldy Hernandez

On 10/28/14 10:28, Jason Merrill wrote:

My apologies for the long delay.  I was on PTO.

> On 10/27/2014 08:00 PM, Aldy Hernandez wrote:

2. Changes to gen_variable_die() to handle multiple passes (early/late
dwarf generation).

A lot of this is complicated by the fact that old_die's are cached and
keyed by `tree', but an abstract instance and an inline instance share
trees, while dwarf2out_abstract_function() sets DECL_ABSTRACT_P behind
the scenes.

The current support (and my changes) maintain this shared and delicate
design.  I wonder whether we could simplify a lot of this code by
unsharing these trees, but this may be beyond the scope of this work.


Copying all the trees in a function just for debug generation?  No, that
sounds undesirable.


3. I've removed deferred_locations.  With multiple dwarf passes we can
do without them.


Yay!


Kind words greatly appreciated.  Basically I'm looking for feedback
and positive reinforcement that this is all eventually useful


This all looks very good, just a few nitpicks:


Yay!




- instance tree [that has DW_AT_inline] should not contain any
+ instance tree [has DW_AT_inline] should not contain any


This doesn't seem like an improvement.


Reverted.




+  /* Find and reuse a previously generated DW_TAG_subrange_type if
+ available.  */


Let's expand this comment a bit to clarify how this works for
multi-dimensional arrays.


Done.




- abstract instance (origin != NULL), in which case we need a new
+ inline instance (origin != NULL), in which case we need a new DIE


I think "concrete instance" is what you want here.


Done.




+  /* Even if we have locations, we need to recurse through
+ the locals to make sure they also have locations.  */


Why?  What is adding a location to the function without doing the same
for the locals?


Apparently, nothing.  I put a gcc_unreachable() there, and in all my 
tests, it never got triggered, so I've removed it.  Thanks.





+  current_function_has_inlines = 0;
+
+  /* The first time through decls_for_scope we will generate the
+ DIEs for the locals.  The second time, we fill in the
+ location info.  */
+  decls_for_scope (outer_scope, subr_die, 0);
+
   /* Emit a DW_TAG_variable DIE for a named return value.  */
   if (DECL_NAME (DECL_RESULT (decl)))
 gen_decl_die (DECL_RESULT (decl), NULL, subr_die);

-  current_function_has_inlines = 0;
-  decls_for_scope (outer_scope, subr_die, 0);


Why does this need to be reordered?


This may have been fallout from a previous version.  I have reverted 
the change.





+  /* If the compiler emitted a definition for the DECL declaration
+ and we already emitted a DIE for it, don't emit a second
+ DIE for it again. Allow re-declarations of DECLs that are
+ inside functions, though.  */
+  else if (old_die && !declaration && !local_scope_p (context_die))
+return;


What DECLs in functions need re-declaration?


This was already there.  It is pre-existing code that got moved down 
after the new caching code.





-  if (decl && (DECL_ABSTRACT_P (decl) || declaration || old_die == NULL))
+  if (decl && (DECL_ABSTRACT_P (decl) || declaration || old_die == NULL
+   /* If we make it to a specialization, we have already
+  handled the declaration by virtue of early dwarf.
+  If so, make a new association if available, so late
+  dwarf can find it.  */
+   || (specialization_p && old_die && old_die->dumped_early)))
 equate_decl_number_to_die (decl, var_die);


Instead of old_die->dumped_early, I think it would make more sense to
check early_dwarf_dumping; the reason we need to call
equate_decl_number_to_die is because we're early-dumping the definition
and we will need to find it again later.


I've rewritten the above as:

   || (specialization_p && early_dwarf_dumping)))




+  else if (BLOCK_ABSTRACT_ORIGIN (stmt))
 {
+  /* If this is an inlined instance, create a new lexical die for
+ anything below to attach DW_AT_abstract_origin to.  */
+  stmt_die = new_die (DW_TAG_lexical_block, context_die, stmt);
+}


What if we early dumped this block?


What do you mean?  Would you like me to call decls_for_scope earlier 
for abstract instances, or generate the DW_TAG_lexical_block die earlier 
for abstract instances, or what?





+  /* Variabled-lengthed types may be incomplete even if
+ TREE_ASM_WRITTEN.


"variable-length", I think.


Fixed in the changelog and otherwise in the patch.

I have committed the attached patch.  We can iterate on the 
DW_TAG_lexical_block and DECL re-declaration issues in subsequent followups.


As usual, feel free to scream 
(https://www.youtube.com/watch?v=HLI4EuDckgM) if in violent disagreement.


Aldy
commit 7d0ab897d086ac8648928b1236dc88697c96d037
Author: Aldy Hernandez 
Date:   Fri Dec 5 14:54:41 2014 -0800

* dwarf2out.c (check_die_inline): Revert previous com

Re: ptx debugging patch

2014-12-05 Thread Jeff Law

On 11/14/14 11:17, Bernd Schmidt wrote:

The situation with debugging on ptx is a little strange - it allows
.file and .loc directives for line numbers, and it provides a way to
define dwarf2 debug sections - but as far as I can tell, there's no way
of putting useful or accurate information into the latter. There's also
the slight problem that the data output directives used within those
sections differ from the ones used everywhere else in ptx code.

The following patch adds a variant of dwarf2 debugging that supports
just line numbers. I'll need to update the nvptx-as from my tools
package, since ptxas is picky and does not allow .file directives within
functions, so some more reordering of the assembly output is required.

How does this look? Testing currently in progress. An alternative would
be to make a PTX_DEBUGGING_OUTPUT macro and a corresponding file cut
down from dwarf2out.


Bernd

ptx-debug.diff


* config/nvptx/nvptx.c (nvptx_option_override): Don't override
debug options.
* config/nvptx/nvptx.h (DWARF2_LINENO_DEBUGGING_INFO): Define.
* config/nvptx/nvptx.h (DWARF2_DEBUGGING_INFO): Don't define.
* debug.h (dwarf2_lineno_debug_hooks): Declare.
* toplev.c (process_options): Add a case for it.
* dwarf2out.c (dwarf2_lineno_debug_hooks): New variable.
(dwarf2out_init): Skip most initializations if
DWARF2_LINENO_DEBUGGING_INFO, but set cur_line_info_table in
that case.
* defaults.h (PREFERRED_DEBUGGING_TYPE): Also use DWARF2_DEBUG
if DWARF2_LINENO_DEBUGGING_INFO.
* opts.c (set_debug_level): Likewise.

I'll resist the temptation to bikeshed on the name :-)

OK.

jeff



Re: [PATCH] Fix PR 61225

2014-12-05 Thread Segher Boessenkool
On Fri, Dec 05, 2014 at 03:36:01PM -0700, Jeff Law wrote:
> >So combine tries to combine 6+7+8; the RTL it comes up with is a parallel
> >of the memory decrement (without cc clobber, but that is fine), and setting
> >r88 to the mem minus one.  There is no such pattern in the target, and
> >combine cannot break the parallel into two sets (because the first modifies
> >the mem used by the second), so 6+7+8 doesn't combine.
> >
> >Adding a bridge pattern in the target would work; or you can enhance 
> >combine
> >so it can break up this parallel correctly.
> I think myself or someone suggested a bridge pattern in the past, but I 
> can't find it, perhaps it was one of the other threads WRT limitations 
> of the combiner.
> 
> Zhenqiang, can you look at what happens if you provide a pattern for 
> 6+7+8 (probably via a define_and_split)?

I tried this out yesterday.  There are a few options (a bridge pattern
for 6+7+8, or one for 7+8).  I went with 6+7+8.

So the code combine is asked to optimise is

6  A = M
7  T = A + B
8  M = T
9  C = cmp T, 0

and the bridge pattern I added is

M = M + B  ::  T = M + B

(I made it split to  M = M + B ; T = M  which is probably not optimal,
but irrelevant for the rest here).

So combine happily combines 6+7+8 to the bridge pattern.  But then it
forgets to make a link from 9.  I suppose it just doesn't know how to
make a link to a parallel (it wouldn't ever be useful before my recent
patches).
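
The four insns and the split form of the bridge pattern can be modelled in
plain C to check that they compute the same thing.  This is only a sketch:
the function names are made up, registers are modelled as locals, M as *m,
and "cmp T, 0" is assumed to be an equality test against zero.

```c
/* Insns 6-9 as listed: A = M; T = A + B; M = T; C = cmp T, 0.  */
int
separate_insns (int *m, int b)
{
  int a = *m;        /* 6 */
  int t = a + b;     /* 7 */
  *m = t;            /* 8 */
  return t == 0;     /* 9 */
}

/* 6+7+8 replaced by the bridge pattern, split as M = M + B; T = M,
   followed by the unchanged compare.  */
int
bridged_insns (int *m, int b)
{
  *m = *m + b;
  int t = *m;
  return t == 0;
}
```

Both forms leave the same value in memory and produce the same compare
result; the open question in the message is only whether combine still links
insn 9 to the new parallel.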

Investigating...


Segher


Re: [PATCH] Fix PR 61225

2014-12-05 Thread Segher Boessenkool
On Fri, Dec 05, 2014 at 03:31:54PM -0700, Jeff Law wrote:
> >Combine does not consider combining 9 into 7 because there is no LOG_LINK
> >between them (the link for r88 is between 8 and 7 already).
> OK, yea, that's a long standing design decision.  We don't feed a single 
> def into multiple use sites.

There is no real reason not to do that.  It doesn't increase computational
complexity, although it is of course more expensive than what combine does
today (it is more work, after all).  And combining with a later use does
not have too big a chance to succeed (since it has to keep the result of
the earlier insn around always).

GCC 6 or later ;-)


Segher


Re: [PATCH] Fix PR 61225

2014-12-05 Thread Segher Boessenkool
On Thu, Dec 04, 2014 at 02:57:56PM -0600, Segher Boessenkool wrote:
> Adding a bridge pattern in the target would work; or you can enhance combine
> so it can break up this parallel correctly.

I also investigated that second option.  The enhancement transforms
the combine result

M = XXX  ::  T = XXX

into

M = XXX
T = M

and then the set of T can combine with its later use (the compare), but
it won't ever combine that with the store to M: there is never a link
for memory, only for registers.

Never mind that this is unsuitable for many targets anyway (it creates
a read-after-write hazard).


Segher

