Re: [v3] add max_size and rebind to __alloc_traits

2011-10-06 Thread Jonathan Wakely
On 6 October 2011 02:57, Paolo Carlini wrote:
>
> today I ran the whole testsuite in C++0x mode and I'm pretty sure that
> 23_containers/vector/modifiers/swap/3.cc, which is now failing, wasn't failing a
> couple of days ago (I ran the whole testsuite like that in order to validate
> my std::list changes). When you have time, could you please double check?
> (maybe after all we *do* want it to fail in C++0x mode, but I'd like to
> understand if the behavior changed inadvertently!)

I think you're right that it wasn't failing before, as I ran the whole
testsuite in C++0x mode when I first added alloc_traits - I'll check
it today and see how I broke it.


Re: [PATCH] Fix PR46556 (poor address generation)

2011-10-06 Thread Paolo Bonzini

On 10/05/2011 10:16 PM, William J. Schmidt wrote:

OK, I see.  If there's a better place downstream to make a swizzle, I'm
certainly fine with that.

I disabled locally_poor_mem_replacement and added some dump information
in should_replace_address to show the costs for the replacement I'm
trying to avoid:

In should_replace_address:
   old_rtx = (reg/f:DI 125 [ D.2036 ])
   new_rtx = (plus:DI (reg/v/f:DI 126 [ p ])
 (reg:DI 128))
   address_cost (old_rtx) = 0
   address_cost (new_rtx) = 0
   set_src_cost (old_rtx) = 0
   set_src_cost (new_rtx) = 4

In insn 11, replacing
  (mem/s:SI (reg/f:DI 125 [ D.2036 ]) [2 p_1(D)->a S4 A32])
  with (mem/s:SI (plus:DI (reg/v/f:DI 126 [ p ])
 (reg:DI 128)) [2 p_1(D)->a S4 A32])
Changed insn 11
deferring rescan insn with uid = 11.
deferring rescan insn with uid = 11.


And IIUC the other address is based on pseudo 125 as well, but the 
combination is (plus (plus (reg 126) (reg 128)) (const_int X)) and 
cannot be represented on ppc.  I think _this_ is the problem, so I'm 
afraid your patch could cause pessimizations on x86 for example.  On 
x86, which has a cheap REG+REG+CONST addressing mode, it is much better 
to propagate pseudo 125 so that you can delete the set altogether.
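
To illustrate the trade-off (illustrative RTL only, register numbers
taken from the dump above, not actual dump output):

  ;; On x86, propagating pseudo 125 into both uses leaves no use of it,
  ;; so CSE/DCE can delete
  ;;   (set (reg 125) (plus (reg 126) (reg 128)))
  ;; because both accesses remain valid addresses:
  ;;   (mem (plus (reg 126) (reg 128)))                         ;; p->a
  ;;   (mem (plus (plus (reg 126) (reg 128)) (const_int X)))    ;; REG+REG+CONST
  ;; On ppc the second form is not legitimate, so keeping reg 125 and
  ;; addressing relative to it is the cheaper choice there.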


However, indeed there is no downstream pass that undoes the 
transformation.  Perhaps we can do it in CSE, since this _is_ CSE after 
all. :)  The attached untested (uncompiled) patch is an attempt.


Paolo
Index: cse.c
===
--- cse.c	(revision 177688)
+++ cse.c	(working copy)
@@ -3136,6 +3136,75 @@ find_comparison_args (enum rtx_code code
   return code;
 }
 
+static void
+lookup_addr (rtx insn, rtx *loc, enum machine_mode mode)
+{
+  struct table_elt *elt, *p;
+  int regno;
+  int hash;
+  int base_cost;
+  rtx addr = *loc;
+  rtx exp;
+
+  /* Try to reuse existing registers for addresses, in hope of shortening
+     live ranges for the registers that compose the addresses.  This happens
+     when you have
+
+     (set (reg C) (plus (reg A) (reg B)))
+     (set (reg D) (mem (reg C)))
+     (set (reg E) (mem (plus (reg C) (const_int X))))
+
+     In this case fwprop will try to propagate into the addresses, but
+     if propagation into reg E fails, the only result will have been to
+     uselessly lengthen the live range of A and B.  */
+
+  if (!REG_P (addr))
+    return;
+
+  regno = REGNO (addr);
+  if (regno == FRAME_POINTER_REGNUM
+      || regno == HARD_FRAME_POINTER_REGNUM
+      || regno == ARG_POINTER_REGNUM)
+    return;
+
+  /* If this address is not in the hash table, we can't look for equivalences
+     of the whole address.  Also, ignore if volatile.  */
+
+  {
+    int save_do_not_record = do_not_record;
+    int save_hash_arg_in_memory = hash_arg_in_memory;
+    int addr_volatile;
+
+    do_not_record = 0;
+    hash = HASH (addr, Pmode);
+    addr_volatile = do_not_record;
+    do_not_record = save_do_not_record;
+    hash_arg_in_memory = save_hash_arg_in_memory;
+
+    if (addr_volatile)
+      return;
+  }
+
+  /* Try to find a REG that holds the same address.  */
+
+  elt = lookup (addr, hash, Pmode);
+  if (!elt)
+    return;
+
+  base_cost = address_cost (*loc, mode);
+  for (p = elt->first_same_value; p; p = p->next_same_value)
+    {
+      exp = p->exp;
+      if (REG_P (exp)
+	  && exp_equiv_p (exp, exp, 1, false)
+	  && address_cost (exp, mode) > base_cost)
+	break;
+    }
+
+  if (p)
+    validate_change (insn, loc, canon_reg (copy_rtx (exp), NULL_RTX), 0);
+}
+
 /* If X is a nontrivial arithmetic operation on an argument for which
a constant value can be determined, return the result of operating
on that value, as a constant.  Otherwise, return X, possibly with
@@ -3180,6 +3249,12 @@ fold_rtx (rtx x, rtx insn)
   switch (code)
 {
 case MEM:
+      if ((new_rtx = equiv_constant (x)) != NULL_RTX)
+	return new_rtx;
+      if (insn)
+	lookup_addr (insn, &XEXP (x, 0), GET_MODE (x));
+      return x;
+
 case SUBREG:
   if ((new_rtx = equiv_constant (x)) != NULL_RTX)
 return new_rtx;
Index: passes.c
===
--- passes.c	(revision 177688)
+++ passes.c	(working copy)
@@ -1448,9 +1448,9 @@ init_optimization_passes (void)
 	}
   NEXT_PASS (pass_web);
   NEXT_PASS (pass_rtl_cprop);
+  NEXT_PASS (pass_rtl_fwprop_addr);
   NEXT_PASS (pass_cse2);
   NEXT_PASS (pass_rtl_dse1);
-  NEXT_PASS (pass_rtl_fwprop_addr);
   NEXT_PASS (pass_inc_dec);
   NEXT_PASS (pass_initialize_regs);
   NEXT_PASS (pass_ud_rtl_dce);


[PATCH] Fix PR38884

2011-10-06 Thread Richard Guenther

This handles the case of CSEing part of an SSA name that is stored
to memory and defined with a composition like COMPLEX_EXPR or
CONSTRUCTOR.  This fixes the remaining pieces of PR38884 and
PR38885.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2011-10-06  Richard Guenther  

PR tree-optimization/38884
* tree-ssa-sccvn.c (vn_reference_lookup_3): Handle partial
reads from aggregate SSA names.

* gcc.dg/tree-ssa/ssa-fre-34.c: New testcase.
* gcc.dg/tree-ssa/ssa-fre-35.c: Likewise.
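
As an illustration of the kind of code this enables FRE to simplify
(my sketch of the PR38884 COMPLEX_EXPR case, not one of the new
testcases, which use a vector CONSTRUCTOR instead):

  _Complex float g;

  float foo (float x, float y)
  {
    _Complex float c;
    __real__ c = x;    /* c ends up defined by COMPLEX_EXPR <x, y> */
    __imag__ c = y;
    g = c;             /* aggregate SSA name stored to memory */
    return __real__ g; /* partial read, now value-numbered to x */
  }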

Index: gcc/tree-ssa-sccvn.c
===
*** gcc/tree-ssa-sccvn.c(revision 179556)
--- gcc/tree-ssa-sccvn.c(working copy)
*** vn_reference_lookup_3 (ao_ref *ref, tree
*** 1489,1495 
}
  }
  
!   /* 4) For aggregate copies translate the reference through them if
   the copy kills ref.  */
else if (vn_walk_kind == VN_WALKREWRITE
   && gimple_assign_single_p (def_stmt)
--- 1489,1554 
}
  }
  
!   /* 4) Assignment from an SSA name which definition we may be able
!  to access pieces from.  */
!   else if (ref->size == maxsize
!  && is_gimple_reg_type (vr->type)
!  && gimple_assign_single_p (def_stmt)
!  && TREE_CODE (gimple_assign_rhs1 (def_stmt)) == SSA_NAME)
! {
!   tree rhs1 = gimple_assign_rhs1 (def_stmt);
!   gimple def_stmt2 = SSA_NAME_DEF_STMT (rhs1);
!   if (is_gimple_assign (def_stmt2)
! && (gimple_assign_rhs_code (def_stmt2) == COMPLEX_EXPR
! || gimple_assign_rhs_code (def_stmt2) == CONSTRUCTOR)
! && types_compatible_p (vr->type, TREE_TYPE (TREE_TYPE (rhs1))))
!   {
! tree base2;
! HOST_WIDE_INT offset2, size2, maxsize2, off;
! base2 = get_ref_base_and_extent (gimple_assign_lhs (def_stmt),
!  &offset2, &size2, &maxsize2);
! off = offset - offset2;
! if (maxsize2 != -1
! && maxsize2 == size2
! && operand_equal_p (base, base2, 0)
! && offset2 <= offset
! && offset2 + size2 >= offset + maxsize)
!   {
! tree val = NULL_TREE;
! HOST_WIDE_INT elsz
!   = TREE_INT_CST_LOW (TYPE_SIZE (TREE_TYPE (TREE_TYPE (rhs1))));
! if (gimple_assign_rhs_code (def_stmt2) == COMPLEX_EXPR)
!   {
! if (off == 0)
!   val = gimple_assign_rhs1 (def_stmt2);
! else if (off == elsz)
!   val = gimple_assign_rhs2 (def_stmt2);
!   }
! else if (gimple_assign_rhs_code (def_stmt2) == CONSTRUCTOR
!  && off % elsz == 0)
!   {
! tree ctor = gimple_assign_rhs1 (def_stmt2);
! unsigned i = off / elsz;
! if (i < CONSTRUCTOR_NELTS (ctor))
!   {
! constructor_elt *elt = CONSTRUCTOR_ELT (ctor, i);
! if (compare_tree_int (elt->index, i) == 0)
!   val = elt->value;
!   }
!   }
! if (val)
!   {
! unsigned int value_id = get_or_alloc_constant_value_id (val);
! return vn_reference_insert_pieces
!  (vuse, vr->set, vr->type,
!   VEC_copy (vn_reference_op_s, heap, vr->operands),
!   val, value_id);
!   }
!   }
!   }
! }
! 
!   /* 5) For aggregate copies translate the reference through them if
   the copy kills ref.  */
else if (vn_walk_kind == VN_WALKREWRITE
   && gimple_assign_single_p (def_stmt)
*** vn_reference_lookup_3 (ao_ref *ref, tree
*** 1587,1593 
return NULL;
  }
  
!   /* 5) For memcpy copies translate the reference through them if
   the copy kills ref.  */
else if (vn_walk_kind == VN_WALKREWRITE
   && is_gimple_reg_type (vr->type)
--- 1646,1652 
return NULL;
  }
  
!   /* 6) For memcpy copies translate the reference through them if
   the copy kills ref.  */
else if (vn_walk_kind == VN_WALKREWRITE
   && is_gimple_reg_type (vr->type)
Index: gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-34.c
===
*** gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-34.c  (revision 0)
--- gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-34.c  (revision 0)
***
*** 0 
--- 1,18 
+ /* { dg-do compile } */
+ /* { dg-options "-O -fdump-tree-fre1-details" } */
+ 
+ #define vector __attribute__((vector_size(16) ))
+ 
+ struct {
+ float i;
+ vector float global_res;
+ } s;
+ float foo(float f)
+ {
+   vector float res = (vector float){0.0f,f,0.0f,1.0f};
+   s.global_res = res;
+   return *((float*)&s.global_res + 1);
+ }
+ 
+ /* { dg-final { scan-tree-dump

Re: Modify gcc for use with gdb (issue5132047)

2011-10-06 Thread Richard Guenther
On Wed, Oct 5, 2011 at 6:53 PM, Diego Novillo  wrote:
> On Wed, Oct 5, 2011 at 11:28, Diego Novillo  wrote:
>> On Wed, Oct 5, 2011 at 10:51, Richard Guenther
>>  wrote:
>>
>>> Did you also mark the function with always_inline?  That's a requirement
>>> as artificial only works for inlined function bodies.
>>
>> Yeah.  It doesn't quite work as I expect it to.  It steps into the
>> function at odd places.
>
> So, I played with this some more, and there seems to be some
> inconsistency in how these attributes get handled.
> http://sourceware.org/bugzilla/show_bug.cgi?id=13263
>
> static inline int foo (int) __attribute__((always_inline,artificial));
>
> static inline int foo (int x)
> {
>  int y  = x - 3;
>  return y;
> }
>
> int bar (int y)
> {
>  return y == 0;
> }
>
> main ()
> {
>  foo (10);
>  return bar (foo (3));
> }
>
> With GCC 4.7, the stand alone call foo(10) is not ignored by 'step'.
> However, the embedded call bar(foo(3)) is ignored as I was expecting.

Hm, nothing is ignored for me with gcc 4.6.

>
> Diego.
>


Re: Modify gcc for use with gdb (issue5132047)

2011-10-06 Thread Richard Guenther
On Wed, Oct 5, 2011 at 8:51 PM, Diego Novillo  wrote:
> On Wed, Oct 5, 2011 at 14:20, Mike Stump  wrote:
>> On Oct 5, 2011, at 6:18 AM, Diego Novillo wrote:
>>> I think we need to find a solution for this situation.
>>
>> The solution Apple found and implemented is a __nodebug__ attribute, as can 
>> be seen in Apple's gcc.
>>
>> We use it like so:
>>
>> #define __always_inline__ __always_inline__, __nodebug__
>> #undef __always_inline__
>>
>> in headers like mmintrin.h:
>>
>> __STATIC_INLINE void __attribute__((__always_inline__))
>> /* APPLE LOCAL end radar 5618945 */
>> _mm_empty (void)
>> {
>>  __builtin_ia32_emms ();
>> }
>
> Ah, nice.  Though, one of the things I am liking more and more about
> the blacklist solution is that it (a) does not need any modifications
> to the source code, and (b) works with non-inlined functions as well.
>
> This gives total control to the developer.  I would blacklist a bunch
> of functions I never care to go into, for instance.  Others may choose
> to blacklist a different set.  And you can change that from one debug
> session to the next.
>
> I agree with Jakub that artificial functions should be blacklisted
> automatically, however.
>
> Richi, Jakub, if the blacklist solution was implemented in GCC would
> you agree with promoting these macros into inline functions?  This is
> orthogonal to http://sourceware.org/bugzilla/show_bug.cgi?id=13263, of
> course.

I know you are on to that C++ thing, ending up returning a reference
to make it an lvalue.  Which I very much don't like (please, if you go
that route, add _set functions and lower the case of the macros).

What's the other advantage of using inline functions?  The gdb
annoyance with the macros can be solved with the .gdbinit macro
defines (which might be nice to commit to SVN btw).

Richard.

>
> Thanks.  Diego.
>


Re: [patch, arm] Fix PR target/50305 (arm_legitimize_reload_address problem)

2011-10-06 Thread Ramana Radhakrishnan
On 4 October 2011 16:13, Ulrich Weigand  wrote:
> Ramana Radhakrishnan wrote:
>> On 26 September 2011 15:24, Ulrich Weigand  wrote:
>> > Is this sufficient, or should I test any other set of options as well?
>>
>> Could you run one set of tests with neon ?
>
> Sorry for the delay, but I had to switch to my IGEP board for Neon
> support, and that's a bit slow ...   In any case, I've now completed
> testing the patch with Neon with no regressions.
>
>> > Just to clarify: in the presence of the other options that are already
>> > in dg-options, the test case now fails (with the unpatched compiler)
>> > for *any* setting of -mfloat-abi (hard, soft, or softfp).  Do you still
>> > want me to add a specific setting to the test case?
>>
>> No, the -mfpu=vfpv3 is fine.
>
> OK, thanks.
>
>> Instead of skipping I was wondering if we
>> could prune the outputs to get this through all the testers we have.
>
> Well, the problem is that with certain -march options (e.g. armv7) we get:
> /home/uweigand/gcc-head/gcc/testsuite/gcc.target/arm/pr50305.c:1:0:
> error: target CPU does not support ARM mode

Ah - ok.

>
> Since this is an *error*, pruning the output doesn't really help, the
> test isn't being run in any case.
>
>> Otherwise this is OK.
>
> Given the above, is the patch now OK as-is?

OK by me.

Ramana


[patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs

2011-10-06 Thread Kai Tietz
Hello,

this patch improves, in fold_truth_andor, the generation of branch conditions for
targets having LOGICAL_OP_NON_SHORT_CIRCUIT set.  If the right-hand side operand
of a TRUTH_(OR|AND)IF_EXPR is a simple operand, has no side-effects, and doesn't
trap, then try to convert the expression to a TRUTH_(AND|OR)_EXPR, provided the
left-hand operand is also a simple operand with no side-effects.
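
For illustration, a sketch of the intended effect (my example, not part
of the patch): on a LOGICAL_OP_NON_SHORT_CIRCUIT target,

  int f (int a, int b, int c)
  {
    return a && b && c;   /* (a ANDIF b) ANDIF c */
  }

may now have its right-hand operands combined into a non-branching
TRUTH_AND_EXPR, i.e. roughly a && ((b != 0) & (c != 0)), saving a
conditional branch at the cost of evaluating c unconditionally.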

ChangeLog

2011-10-06  Kai Tietz  

* fold-const.c (fold_truth_andor): Convert TRUTH_(AND|OR)IF_EXPR
to TRUTH_OR_EXPR, if suitable.

Bootstrapped and tested for all languages (including Ada and Obj-C++) on host
x86_64-unknown-linux-gnu.  OK to apply?

Regards,
Kai


Index: gcc/gcc/fold-const.c
===
--- gcc.orig/gcc/fold-const.c
+++ gcc/gcc/fold-const.c
@@ -8386,6 +8390,33 @@ fold_truth_andor (location_t loc, enum t
   if ((tem = fold_truthop (loc, code, type, arg0, arg1)) != 0)
 return tem;

+  if ((code == TRUTH_ANDIF_EXPR || code == TRUTH_ORIF_EXPR)
+  && !TREE_SIDE_EFFECTS (arg1)
+  && simple_operand_p (arg1)
+  && LOGICAL_OP_NON_SHORT_CIRCUIT
+  && !FLOAT_TYPE_P (TREE_TYPE (arg1))
+  && ((TREE_CODE_CLASS (TREE_CODE (arg1)) != tcc_comparison
+  && TREE_CODE (TREE_CODE (arg1)) != TRUTH_NOT_EXPR)
+ || !FLOAT_TYPE_P (TREE_TYPE (TREE_OPERAND (arg1, 0)
+{
+  if (TREE_CODE (arg0) == code
+  && !TREE_SIDE_EFFECTS (TREE_OPERAND (arg0, 1))
+  && simple_operand_p (TREE_OPERAND (arg0, 1)))
+   {
+ tem = build2_loc (loc,
+   (code == TRUTH_ANDIF_EXPR ? TRUTH_AND_EXPR
+ : TRUTH_OR_EXPR),
+   type, TREE_OPERAND (arg0, 1), arg1);
+   return fold_build2_loc (loc, code, type, TREE_OPERAND (arg0, 0), tem);
+  }
+  if (!TREE_SIDE_EFFECTS (arg0)
+  && simple_operand_p (arg0))
+   return build2_loc (loc,
+  (code == TRUTH_ANDIF_EXPR ? TRUTH_AND_EXPR
+: TRUTH_OR_EXPR),
+  type, arg0, arg1);
+}
+
   return NULL_TREE;
 }


Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs

2011-10-06 Thread Kai Tietz
Hello,

Sorry, I attached a non-updated change.  Here it is with the proper patch attached.
This patch improves, in fold_truth_andor, the generation of branch conditions for
targets having LOGICAL_OP_NON_SHORT_CIRCUIT set.  If the right-hand side operand
of a TRUTH_(OR|AND)IF_EXPR is a simple operand, has no side-effects, and doesn't
trap, then try to convert the expression to a TRUTH_(AND|OR)_EXPR, provided the
left-hand operand is also a simple operand with no side-effects.

ChangeLog

2011-10-06  Kai Tietz  

* fold-const.c (fold_truth_andor): Convert TRUTH_(AND|OR)IF_EXPR
to TRUTH_OR_EXPR, if suitable.

Bootstrapped and tested for all languages (including Ada and Obj-C++) on host
x86_64-unknown-linux-gnu.  OK to apply?

Regards,
Kai


Index: fold-const.c
===
--- fold-const.c(revision 179592)
+++ fold-const.c(working copy)
@@ -8387,6 +8387,33 @@
   if ((tem = fold_truthop (loc, code, type, arg0, arg1)) != 0)
 return tem;

+  if ((code == TRUTH_ANDIF_EXPR || code == TRUTH_ORIF_EXPR)
+  && !TREE_SIDE_EFFECTS (arg1)
+  && simple_operand_p (arg1)
+  && LOGICAL_OP_NON_SHORT_CIRCUIT
+  && !FLOAT_TYPE_P (TREE_TYPE (arg1))
+  && ((TREE_CODE_CLASS (TREE_CODE (arg1)) != tcc_comparison
+  && TREE_CODE (arg1) != TRUTH_NOT_EXPR)
+ || !FLOAT_TYPE_P (TREE_TYPE (TREE_OPERAND (arg1, 0)
+{
+  if (TREE_CODE (arg0) == code
+  && !TREE_SIDE_EFFECTS (TREE_OPERAND (arg0, 1))
+  && simple_operand_p (TREE_OPERAND (arg0, 1)))
+   {
+ tem = build2_loc (loc,
+   (code == TRUTH_ANDIF_EXPR ? TRUTH_AND_EXPR
+ : TRUTH_OR_EXPR),
+   type, TREE_OPERAND (arg0, 1), arg1);
+   return fold_build2_loc (loc, code, type, TREE_OPERAND (arg0, 0), tem);
+  }
+  if (!TREE_SIDE_EFFECTS (arg0)
+  && simple_operand_p (arg0))
+   return build2_loc (loc,
+  (code == TRUTH_ANDIF_EXPR ? TRUTH_AND_EXPR
+: TRUTH_OR_EXPR),
+  type, arg0, arg1);
+}
+
   return NULL_TREE;
 }



Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs

2011-10-06 Thread Richard Guenther
On Thu, Oct 6, 2011 at 11:28 AM, Kai Tietz  wrote:
> Hello,
>
> Sorry attached non-updated change.  Here with proper attached patch.
> This patch improves in fold_truth_andor the generation of branch-conditions 
> for targets having LOGICAL_OP_NON_SHORT_CIRCUIT set.  If right-hand side 
> operation of a TRUTH_(OR|AND)IF_EXPR is simple operand, has no side-effects, 
> and doesn't trap, then try to convert expression to a TRUTH_(AND|OR)_EXPR, if 
> left-hand operand is a simple operand, and has no side-effects.
>
> ChangeLog
>
> 2011-10-06  Kai Tietz  
>
>        * fold-const.c (fold_truth_andor): Convert TRUTH_(AND|OR)IF_EXPR
>        to TRUTH_OR_EXPR, if suitable.
>
> Bootstrapped and tested for all languages (including Ada and Obj-C++) on host 
> x86_64-unknown-linux-gnu.  Ok for apply?
>
> Regards,
> Kai
>
>
> Index: fold-const.c
> ===
> --- fold-const.c        (revision 179592)
> +++ fold-const.c        (working copy)
> @@ -8387,6 +8387,33 @@
>   if ((tem = fold_truthop (loc, code, type, arg0, arg1)) != 0)
>     return tem;
>
> +  if ((code == TRUTH_ANDIF_EXPR || code == TRUTH_ORIF_EXPR)
> +      && !TREE_SIDE_EFFECTS (arg1)
> +      && simple_operand_p (arg1)
> +      && LOGICAL_OP_NON_SHORT_CIRCUIT

Why only for LOGICAL_OP_NON_SHORT_CIRCUIT?  It doesn't make
a difference for !LOGICAL_OP_NON_SHORT_CIRCUIT targets, but ...

> +      && !FLOAT_TYPE_P (TREE_TYPE (arg1))

?  I hope we don't have &&|| float.

> +      && ((TREE_CODE_CLASS (TREE_CODE (arg1)) != tcc_comparison
> +          && TREE_CODE (arg1) != TRUTH_NOT_EXPR)
> +         || !FLOAT_TYPE_P (TREE_TYPE (TREE_OPERAND (arg1, 0)

?  simple_operand_p would have rejected both ! and comparisons.

I miss a test for side-effects on arg0 (and probably simple_operand_p there,
as well).

> +    {
> +      if (TREE_CODE (arg0) == code
> +          && !TREE_SIDE_EFFECTS (TREE_OPERAND (arg0, 1))
> +          && simple_operand_p (TREE_OPERAND (arg0, 1)))

Err ... so why do you recurse here (and associate)?  Even with different
predicates than above ...

And similar transforms seem to happen in fold_truthop - did you
investigate why it didn't trigger there?

And I'm missing a testcase.

Richard.

> +       {
> +         tem = build2_loc (loc,
> +                           (code == TRUTH_ANDIF_EXPR ? TRUTH_AND_EXPR
> +                                                     : TRUTH_OR_EXPR),
> +                           type, TREE_OPERAND (arg0, 1), arg1);
> +       return fold_build2_loc (loc, code, type, TREE_OPERAND (arg0, 0), tem);
> +      }
> +      if (!TREE_SIDE_EFFECTS (arg0)
> +          && simple_operand_p (arg0))
> +       return build2_loc (loc,
> +                          (code == TRUTH_ANDIF_EXPR ? TRUTH_AND_EXPR
> +                                                    : TRUTH_OR_EXPR),
> +                          type, arg0, arg1);
> +    }
> +
>   return NULL_TREE;
>  }
>
>


Re: [PATCH, PR50527] Don't assume alignment of vla-related allocas.

2011-10-06 Thread Richard Guenther
On Wed, Oct 5, 2011 at 11:07 PM, Tom de Vries  wrote:
> On 10/05/2011 10:46 AM, Richard Guenther wrote:
>> On Tue, Oct 4, 2011 at 6:28 PM, Tom de Vries  wrote:
>>> On 10/04/2011 03:03 PM, Richard Guenther wrote:
 On Tue, Oct 4, 2011 at 9:43 AM, Tom de Vries  
 wrote:
> On 10/01/2011 05:46 PM, Tom de Vries wrote:
>> On 09/30/2011 03:29 PM, Richard Guenther wrote:
>>> On Thu, Sep 29, 2011 at 3:15 PM, Tom de Vries  
>>> wrote:
 On 09/28/2011 11:53 AM, Richard Guenther wrote:
> On Wed, Sep 28, 2011 at 11:34 AM, Tom de Vries 
>  wrote:
>> Richard,
>>
>> I got a patch for PR50527.
>>
>> The patch prevents the alignment of vla-related allocas to be set to
>> BIGGEST_ALIGNMENT in ccp. The alignment may turn out smaller after 
>> folding
>> the alloca.
>>
>> Bootstrapped and regtested on x86_64.
>>
>> OK for trunk?
>
> Hmm.  As gfortran with -fstack-arrays uses VLAs it's probably bad that
> the vectorizer then will no longer see that the arrays are properly 
> aligned.
>
> I'm not sure what the best thing to do is here, other than trying to 
> record
> the alignment requirement of the VLA somewhere.
>
> Forcing the alignment of the alloca replacement decl to 
> BIGGEST_ALIGNMENT
> has the issue that it will force stack-realignment which isn't free 
> (and the
> point was to make the decl cheaper than the alloca).  But that might
> possibly be the better choice.
>
> Any other thoughts?

 How about the approach in this (untested) patch? Using the DECL_ALIGN 
 of the vla
 for the new array prevents stack realignment for folded vla-allocas, 
 also for
 large vlas.

 This will not help in vectorizing large folded vla-allocas, but I 
 think it's not
 reasonable to expect BIGGEST_ALIGNMENT when writing a vla (although 
 that has
 been the case up until we started to fold). If you want to trigger 
 vectorization
 for a vla, you can still use the aligned attribute on the declaration.

 Still, the unfolded vla-allocas will have BIGGEST_ALIGNMENT, also 
 without using
 an attribute on the decl. This patch exploits this by setting it at 
 the end of
 the 3rd pass_ccp, renamed to pass_ccp_last. This is not very effective 
 in
 propagation though, because although the ptr_info of the lhs is 
 propagated via
 copy_prop afterwards, it's not propagated anymore via ccp.

 Another way to do this would be to set BIGGEST_ALIGNMENT at the end of 
 ccp2 and
 not fold during ccp3.
>>>
>>> Ugh, somehow I like this the least ;)
>>>
>>> How about lowering VLAs to
>>>
>>>   p = __builtin_alloca (...);
>>>   p = __builtin_assume_aligned (p, DECL_ALIGN (vla));
>>>
>>> and not assume anything for alloca itself if it feeds a
>>> __builtin_assume_aligned?
>>>
>>> Or rather introduce a __builtin_alloca_with_align () and for VLAs do
>>>
>>>  p = __builtin_alloca_with_align (..., DECL_ALIGN (vla));
>>>
>>> that's less awkward to use?
>>>
>>> Sorry for not having a clear plan here ;)
>>>
>>
>> Using assume_aligned is a more orthogonal way to represent this in 
>> gimple, but
>> indeed harder to use.
>>
>> Another possibility is to add a 'tree vla_decl' field to struct
>> gimple_statement_call, which is probably the easiest to implement.
>>
>> But I think __builtin_alloca_with_align might have a use beyond vlas, so 
>> I
>> decided to try this one. Attached patch implements my first stab at this 
>>  (now
>> testing on x86_64).
>>
>> Is this an acceptable approach?
>>
>
> bootstrapped and reg-tested (including ada) on x86_64.
>
> Ok for trunk?

 The idea is ok I think.  But

      case BUILT_IN_ALLOCA:
 +    case BUILT_IN_ALLOCA_WITH_ALIGN:
        /* If the allocation stems from the declaration of a variable-sized
          object, it cannot accumulate.  */
        target = expand_builtin_alloca (exp, CALL_ALLOCA_FOR_VAR_P (exp));
        if (target)
         return target;
 +      if (DECL_FUNCTION_CODE (get_callee_fndecl (exp))
 +         == BUILT_IN_ALLOCA_WITH_ALIGN)
 +       {
 +         tree new_call = build_call_expr_loc (EXPR_LOCATION (exp),
 +                                              built_in_decls[BUILT_IN_ALLOCA],
 +                                              1, CALL_EXPR_ARG (exp, 0));
 +         CALL_ALLOCA_FOR_VAR_P (new_call) = CALL_ALLOCA_FOR_VAR_P (exp);
 +         exp = new_call;
 +       }
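
For reference, a sketch of the proposed lowering for a VLA (my
illustration based on the discussion above, not the patch itself; the
alignment argument is assumed to be DECL_ALIGN, i.e. in bits):

  void f (int n)
  {
    int a[n];
    /* would become roughly: */
    int *a_p = __builtin_alloca_with_align (n * sizeof (int),
                                            __alignof__ (int) * 8);
  }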


Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs

2011-10-06 Thread Kai Tietz
2011/10/6 Richard Guenther :
> On Thu, Oct 6, 2011 at 11:28 AM, Kai Tietz  wrote:
>> Hello,
>>
>> Sorry attached non-updated change.  Here with proper attached patch.
>> This patch improves in fold_truth_andor the generation of branch-conditions 
>> for targets having LOGICAL_OP_NON_SHORT_CIRCUIT set.  If right-hand side 
>> operation of a TRUTH_(OR|AND)IF_EXPR is simple operand, has no side-effects, 
>> and doesn't trap, then try to convert expression to a TRUTH_(AND|OR)_EXPR, 
>> if left-hand operand is a simple operand, and has no side-effects.
>>
>> ChangeLog
>>
>> 2011-10-06  Kai Tietz  
>>
>>        * fold-const.c (fold_truth_andor): Convert TRUTH_(AND|OR)IF_EXPR
>>        to TRUTH_OR_EXPR, if suitable.
>>
>> Bootstrapped and tested for all languages (including Ada and Obj-C++) on 
>> host x86_64-unknown-linux-gnu.  Ok for apply?
>>
>> Regards,
>> Kai
>>
>>
>> Index: fold-const.c
>> ===
>> --- fold-const.c        (revision 179592)
>> +++ fold-const.c        (working copy)
>> @@ -8387,6 +8387,33 @@
>>   if ((tem = fold_truthop (loc, code, type, arg0, arg1)) != 0)
>>     return tem;
>>
>> +  if ((code == TRUTH_ANDIF_EXPR || code == TRUTH_ORIF_EXPR)
>> +      && !TREE_SIDE_EFFECTS (arg1)
>> +      && simple_operand_p (arg1)
>> +      && LOGICAL_OP_NON_SHORT_CIRCUIT
>
> Why only for LOGICAL_OP_NON_SHORT_CIRCUIT?  It doesn't make
> a difference for !LOGICAL_OP_NON_SHORT_CIRCUIT targets, but ...
Well, I used this check only to avoid doing this transformation for
targets which have low-cost branches.  This is the same as in
fold_truthop: it does this transformation only if
LOGICAL_OP_NON_SHORT_CIRCUIT is true.
>> +      && !FLOAT_TYPE_P (TREE_TYPE (arg1))
>
> ?  I hope we don't have &&|| float.
This can happen.  Operands of TRUTH_(AND|OR)[IF]_EXPR aren't necessarily
of integral type.  After gimplification we are guaranteed to have
comparisons, but not in the C tree.
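
For instance (my example):

  int g (int i, float f)
  {
    return i && f;
  }

Here arg1 can end up as the comparison f != 0 with a float operand,
which is what the FLOAT_TYPE_P (TREE_TYPE (TREE_OPERAND (arg1, 0)))
test guards against, since float comparisons might trap.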

>> +      && ((TREE_CODE_CLASS (TREE_CODE (arg1)) != tcc_comparison
>> +          && TREE_CODE (arg1) != TRUTH_NOT_EXPR)
>> +         || !FLOAT_TYPE_P (TREE_TYPE (TREE_OPERAND (arg1, 0)
>
> ?  simple_operand_p would have rejected both ! and comparisons.
This check is the same as in fold_truthop, and I reused it.  The
point here is that floats might trap.

> I miss a test for side-effects on arg0 (and probably simple_operand_p there,
> as well).
See the inner if-conditions for those checks.  I moved the checks for
arg1 out of the inner conditions to avoid double-checking.

>> +    {
>> +      if (TREE_CODE (arg0) == code
>> +          && !TREE_SIDE_EFFECTS (TREE_OPERAND (arg0, 1))
>> +          && simple_operand_p (TREE_OPERAND (arg0, 1)))
>
> Err ... so why do you recurse here (and associate)?  Even with different
> predicates than above ...

See, here is the missing check.  The point is that even if arg0 has
side-effects and is an (AND|OR)IF expression, we might still be able to
associate with the right-hand argument of arg0, if that argument has no
side-effects.  Otherwise we wouldn't catch this case.
The recursion depth here is at most one.

> And similar transforms seem to happen in fold_truthop - did you
> investigate why it didn't trigger there.

This is pretty simple.  The point is that fold_truthop does this
transformation only for comparisons.  But in the C tree the arguments of
a TRUTH_(AND|OR)[IF]_EXPR aren't necessarily comparisons, nor
necessarily integral ones (see above).

> And I'm missing a testcase.

Ok, I'll add one.  The effect is best seen after gimplification.

> Richard.
>
>> +       {
>> +         tem = build2_loc (loc,
>> +                           (code == TRUTH_ANDIF_EXPR ? TRUTH_AND_EXPR
>> +                                                     : TRUTH_OR_EXPR),
>> +                           type, TREE_OPERAND (arg0, 1), arg1);
>> +       return fold_build2_loc (loc, code, type, TREE_OPERAND (arg0, 0), 
>> tem);
>> +      }
>> +      if (!TREE_SIDE_EFFECTS (arg0)
>> +          && simple_operand_p (arg0))
>> +       return build2_loc (loc,
>> +                          (code == TRUTH_ANDIF_EXPR ? TRUTH_AND_EXPR
>> +                                                    : TRUTH_OR_EXPR),
>> +                          type, arg0, arg1);
>> +    }
>> +
>>   return NULL_TREE;
>>  }

Regards.
Kai


Re: [PATCH, testsuite, i386] FMA3 testcases + typo fix in MD

2011-10-06 Thread Uros Bizjak
On Thu, Oct 6, 2011 at 11:38 AM, Kirill Yukhin  wrote:

> Thanks for review. I did all but one
>
>> you have disabled all tests on ia32 - unconditionally use "-O2 -mfma
>> -mfpmath=sse" for dg-options, and these instructions will magically
>> appear on all targets.
>
> I am enabling these tests to run on ia32, but they all fail scan-assembler,
> since the assembler output is completely different; here is an instance:
> test_neg_sub_neg_sub:
> .LFB15:
>        .cfi_startproc
>        subl    $12, %esp
>        .cfi_def_cfa_offset 16
>        vmovsd  16(%esp), %xmm2
>        vmovsd  24(%esp), %xmm1
>        vmovapd %xmm2, %xmm0
>        vfnmsub213sd    32(%esp), %xmm1, %xmm0
>        vfnmsub132sd    %xmm2, %xmm1, %xmm0
>        vmovsd  %xmm0, (%esp)
>        fldl    (%esp)
>        addl    $12, %esp
>        .cfi_def_cfa_offset 4
>        ret
>        .cfi_endproc
>
> On ia32 params are passed completely differently and therefore the code differs.

You can add __attribute__((sseregparm)) to the function declaration.  This
will force arguments to/from the function into SSE registers.  The problem
is that it will result in "warning: 'sseregparm' attribute ignored",
but this can be ignored using the dg-prune-output directive.  Please see
the many examples in the testsuite.
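
Something along these lines (my sketch, untested; exact options and
scan pattern would need adjusting per test):

  /* { dg-do compile } */
  /* { dg-options "-O2 -mfma -mfpmath=sse" } */
  /* { dg-prune-output "attribute ignored" } */

  double __attribute__((sseregparm))
  test_fma (double a, double b, double c)
  {
    return a * b + c;
  }

  /* { dg-final { scan-assembler "vfmadd" } } */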

Uros.


Re: [PATCH] Fix PR46556 (poor address generation)

2011-10-06 Thread Richard Guenther
On Wed, 5 Oct 2011, William J. Schmidt wrote:

> This patch addresses the poor code generation in PR46556 for the
> following code:
> 
> struct x
> {
>   int a[16];
>   int b[16];
>   int c[16];
> };
> 
> extern void foo (int, int, int);
> 
> void
> f (struct x *p, unsigned int n)
> {
>   foo (p->a[n], p->c[n], p->b[n]);
> }
> 
> Prior to the fix for PR32698, gcc calculated the offset for accessing
> the array elements as:  n*4; 64+n*4; 128+n*4.
> 
> Following that fix, the offsets are calculated as:  n*4; (n+16)*4; (n
> +32)*4.  This led to poor code generation on powerpc64 targets, among
> others.
> 
> The poor code generation was observed to not occur in loops, as the
> IVOPTS code does a good job of lowering these expressions to MEM_REFs.
> It was previously suggested that perhaps a general pass to lower memory
> accesses to MEM_REFs in GIMPLE would solve not only this, but other
> similar problems.  I spent some time looking into various approaches to
> this, and reviewing some previous attempts to do similar things.  In the
> end, I've concluded that this is a bad idea in practice because of the
> loss of useful aliasing information.  In particular, early lowering of
> component references causes us to lose the ability to disambiguate
> non-overlapping references in the same structure, and there is no simple
> way to carry the necessary aliasing information along with the
> replacement MEM_REFs to avoid this.  While some performance gains are
> available with GIMPLE lowering of memory accesses, there are also
> offsetting performance losses, and I suspect this would just be a
> continuous source of bug reports into the future.
> 
> Therefore the current patch is a much simpler approach to solve the
> specific problem noted in the PR.  There are two pieces to the patch:
> 
>  * The offending addressing pattern is matched in GIMPLE and transformed
> into a restructured MEM_REF that distributes the multiply, so that (n
> +32)*4 becomes 4*n+128 as before.  This is done during the reassociation
> pass, for reasons described below.  The transformation only occurs in
> non-loop blocks, since IVOPTS does a good job on such things within
> loops.
>  * A tweak is added to the RTL forward-propagator to avoid propagating
> into memory references based on a single base register with no offset,
> under certain circumstances.  This improves sharing of base registers
> for accesses within the same structure and slightly lowers register
> pressure.
> 
> It would be possible to separate these into two patches if that's
> preferred.  I chose to combine them because together they provide the
> ideal code generation that the new test cases test for.
> 
> I initially implemented the pattern matcher during expand, but I found
> that the expanded code for two accesses to the same structure was often
> not being CSEd well.  So I moved it back into the GIMPLE phases prior to
> DOM to take advantage of its CSE.  To avoid an additional complete pass
> over the IL, I chose to piggyback on the reassociation pass.  This
> transformation is not technically a reassociation, but it is related
> enough to not be a complete wart.
> 
> One noob question about this:  It would probably be preferable to have
> this transformation only take place during the second reassociation
> pass, so the ARRAY_REFs are seen by earlier optimization phases.  Is
> there an easy way to detect that it's the second pass without having to
> generate a separate pass entry point?
> 
> One other general question about the pattern-match transformation:  Is
> this an appropriate transformation for all targets, or should it be
> somehow gated on available addressing modes on the target processor?
> 
> Bootstrapped and regression tested on powerpc64-linux-gnu.  Verified no
> performance degradations on that target for SPEC CPU2000 and CPU2006.
> 
> I'm looking for eventual approval for trunk after any comments are
> resolved.  Thanks!

People have already commented on pieces, so I'm looking only
at the tree-ssa-reassoc.c pieces (did you consider piggy-backing
on IVOPTs instead?  The idea is to expose additional CSE
opportunities, right?  So it's sort-of a strength-reduction
optimization on scalar code (classically strength reduction
in loops transforms for (i) { z = i*x; } to z = 0; for (i) { z += x }).
That might be worthwhile in general, even for non-address cases.
So - if you rename that thing to tree-ssa-strength-reduce.c you
can get away without piggy-backing on anything ;)  If you
structure it to detect a strength reduction opportunity
(thus, you'd need to match two/multiple of the patterns at the same time)
that would be a bonus ... generalizing it a little bit would be
another.
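
Concretely, for the straight-line code in the PR the opportunity would
look like this (GIMPLE-like sketch, not actual dump output):

  /* before */                    /* after, sharing n*4 */
  _1 = n * 4;                     _1 = n * 4;
  _2 = (n + 16) * 4;              _2 = _1 + 64;
  _3 = (n + 32) * 4;              _3 = _1 + 128;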

Now some comments on the patch ...

> Bill
> 
> 
> 2011-10-05  Bill Schmidt  
> 
> gcc:
>   
>   PR rtl-optimization/46556
>   * fwprop.c (fwprop_bb_aux_d): New struct.
>   (MEM_PLUS_REGS): New macro.
>   (record_mem_plus_reg): New function.
>   (record_mem_plus_re

Re: [v3] versioned-namespaces spelling/soname change

2011-10-06 Thread Paolo Carlini

On 10/06/2011 02:03 AM, Paolo Carlini wrote:

Hi,

the below hunk seems spurious?!?
... I went ahead and reverted the change; it wasn't documented anywhere
and was definitely unintended.


Paolo.


Re: Unreviewed libgcc patches

2011-10-06 Thread Rainer Orth
Paolo Bonzini  writes:

> On 09/22/2011 08:18 PM, Rainer Orth wrote:
>>  [build] Move gthr to toplevel libgcc
>>  http://gcc.gnu.org/ml/gcc-patches/2011-08/msg00762.html
>
> Can you post an updated patch for this one?  I'll try to review the others
> as soon as possible.

Do you see a chance to get the other patches reviewed before stage1
closes?  I'd like to get them into 4.7 rather than carry them forward
for several months.

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: Unreviewed libgcc patches

2011-10-06 Thread Paolo Bonzini

On 10/06/2011 12:21 PM, Rainer Orth wrote:

>  Can you post an updated patch for this one?  I'll try to review the others
>  as soon as possible.

Do you see a chance to get the other patches reviewed before stage1
closes?  I'd like to get them into 4.7 rather than carry them forward
for several months.


Yes, I'm very sorry for the delay.

Paolo


Re: Commit: RX: Codegen bug fixes

2011-10-06 Thread Nick Clifton

Hi Richard,


The SMIN pattern has the same problem.


*sigh*  Fixed.

Cheers
  Nick




Re: Initial shrink-wrapping patch

2011-10-06 Thread Bernd Schmidt
On 10/06/11 05:17, Ian Lance Taylor wrote:
> Thinking about it I think this is the wrong approach.  The -fsplit-stack
> code by definition has to wrap the entire function and it can not modify
> any callee-saved registers.  We should do shrink wrapping before
> -fsplit-stack, not the other way around.

Sorry, I'm not following what you're saying here. Can you elaborate?


Bernd


[PATCH] Some TLC

2011-10-06 Thread Richard Guenther

Noticed when working on vector/complex folding and simplification.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2011-10-06  Richard Guenther  

* fold-const.c (fold_ternary_loc): Also fold non-constant
vector CONSTRUCTORs.  Make more efficient.
* tree-ssa-dom.c (cprop_operand): Don't handle virtual operands.
(cprop_into_stmt): Don't propagate into virtual operands.
(optimize_stmt): Really dump original statement.
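
The fold-const.c change makes a BIT_FIELD_REF of a non-constant vector
CONSTRUCTOR fold to the selected element; in GENERIC-ish notation
(illustrative, not dump output):

  BIT_FIELD_REF <CONSTRUCTOR <a_1, b_2>, 32, 32>  -->  b_2
  BIT_FIELD_REF <CONSTRUCTOR <a_1, b_2>, 32, 96>  -->  0

where an index beyond CONSTRUCTOR_NELTS folds to build_zero_cst (type).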

Index: gcc/fold-const.c
===
*** gcc/fold-const.c(revision 179592)
--- gcc/fold-const.c(working copy)
*** fold_ternary_loc (location_t loc, enum t
*** 13647,13653 
  
  case BIT_FIELD_REF:
if ((TREE_CODE (arg0) == VECTOR_CST
!  || (TREE_CODE (arg0) == CONSTRUCTOR && TREE_CONSTANT (arg0)))
  && type == TREE_TYPE (TREE_TYPE (arg0)))
{
  unsigned HOST_WIDE_INT width = tree_low_cst (arg1, 1);
--- 13647,13653 
  
  case BIT_FIELD_REF:
if ((TREE_CODE (arg0) == VECTOR_CST
!  || TREE_CODE (arg0) == CONSTRUCTOR)
  && type == TREE_TYPE (TREE_TYPE (arg0)))
{
  unsigned HOST_WIDE_INT width = tree_low_cst (arg1, 1);
*** fold_ternary_loc (location_t loc, enum t
*** 13659,13682 
  && (idx = idx / width)
 < TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)))
{
- tree elements = NULL_TREE;
- 
  if (TREE_CODE (arg0) == VECTOR_CST)
-   elements = TREE_VECTOR_CST_ELTS (arg0);
- else
{
! unsigned HOST_WIDE_INT idx;
! tree value;
! 
! FOR_EACH_CONSTRUCTOR_VALUE (CONSTRUCTOR_ELTS (arg0), idx, value)
!   elements = tree_cons (NULL_TREE, value, elements);
}
! while (idx-- > 0 && elements)
!   elements = TREE_CHAIN (elements);
! if (elements)
!   return TREE_VALUE (elements);
! else
!   return build_zero_cst (type);
}
}
  
--- 13659,13675 
  && (idx = idx / width)
 < TYPE_VECTOR_SUBPARTS (TREE_TYPE (arg0)))
{
  if (TREE_CODE (arg0) == VECTOR_CST)
{
! tree elements = TREE_VECTOR_CST_ELTS (arg0);
! while (idx-- > 0 && elements)
!   elements = TREE_CHAIN (elements);
! if (elements)
!   return TREE_VALUE (elements);
}
! else if (idx < CONSTRUCTOR_NELTS (arg0))
!   return CONSTRUCTOR_ELT (arg0, idx)->value;
! return build_zero_cst (type);
}
}
  
Index: gcc/tree-ssa-dom.c
===
*** gcc/tree-ssa-dom.c  (revision 179592)
--- gcc/tree-ssa-dom.c  (working copy)
*** cprop_operand (gimple stmt, use_operand_
*** 1995,2011 
val = SSA_NAME_VALUE (op);
if (val && val != op)
  {
-   /* Do not change the base variable in the virtual operand
-tables.  That would make it impossible to reconstruct
-the renamed virtual operand if we later modify this
-statement.  Also only allow the new value to be an SSA_NAME
-for propagation into virtual operands.  */
-   if (!is_gimple_reg (op)
- && (TREE_CODE (val) != SSA_NAME
- || is_gimple_reg (val)
- || get_virtual_var (val) != get_virtual_var (op)))
-   return;
- 
/* Do not replace hard register operands in asm statements.  */
if (gimple_code (stmt) == GIMPLE_ASM
  && !may_propagate_copy_into_asm (op))
--- 1995,2000 
*** cprop_into_stmt (gimple stmt)
*** 2076,2086 
use_operand_p op_p;
ssa_op_iter iter;
  
!   FOR_EACH_SSA_USE_OPERAND (op_p, stmt, iter, SSA_OP_ALL_USES)
! {
!   if (TREE_CODE (USE_FROM_PTR (op_p)) == SSA_NAME)
!   cprop_operand (stmt, op_p);
! }
  }
  
  /* Optimize the statement pointed to by iterator SI.
--- 2065,2072 
use_operand_p op_p;
ssa_op_iter iter;
  
!   FOR_EACH_SSA_USE_OPERAND (op_p, stmt, iter, SSA_OP_USE)
! cprop_operand (stmt, op_p);
  }
  
  /* Optimize the statement pointed to by iterator SI.
*** optimize_stmt (basic_block bb, gimple_st
*** 2107,2124 
  
old_stmt = stmt = gsi_stmt (si);
  
-   if (gimple_code (stmt) == GIMPLE_COND)
- canonicalize_comparison (stmt);
- 
-   update_stmt_if_modified (stmt);
-   opt_stats.num_stmts++;
- 
if (dump_file && (dump_flags & TDF_DETAILS))
  {
fprintf (dump_file, "Optimizing statement ");
print_gimple_stmt (dump_file, stmt, 0, TDF_SLIM);
  }
  
/* Const/copy propagate into USES, VUSES and the RHS of VDEFs.  */
cprop_into_stmt (stmt);
  
--- 2093,2110 
  
old_stmt = stmt =

Re: [Patch] Support DEC-C extensions

2011-10-06 Thread Gabriel Dos Reis
On Tue, Oct 4, 2011 at 5:46 AM, Pedro Alves  wrote:
> On Tuesday 04 October 2011 11:16:30, Gabriel Dos Reis wrote:
>
>> > Do we need to consider ABIs that have calling conventions that
>> > treat unprototyped and varargs functions differently? (is there any?)
>>
>> Could you elaborate on the equivalence of these declarations?
>
> I expected that with:
>
>  extern void foo();
>  extern void bar(...);
>  foo (1, 2, 0.3f, NULL, 5);
>  bar (1, 2, 0.3f, NULL, 5);
>
> the compiler would emit the same for both of those
> calls (calling convention wise).  That is, for example,
> on x86-64, %rax is set to 1 (number of floating point
> parameters passed to the function in SSE registers) in
> both cases.

Except that variadics use a different kind of calling convention
than the rest.
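
(For reference, the %rax convention in question, from memory: on
x86-64, before a call to a varargs function the caller puts the number
of vector registers used in %al, e.g.

    movl    $1, %eax        # one SSE register used for 0.3f
    call    bar

whether a call through the unprototyped foo gets the same treatment is
exactly the question here.)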

>
> But not to be equivalent at the source level, that is:
>
>  extern void foo();
>  extern void foo(int a);
>  extern void bar(...);
>  extern void bar(int a);
>
> should be a "conflicting types for ’bar’" error in C.
>
> --
> Pedro Alves
>


Re: Vector shuffling

2011-10-06 Thread Georg-Johann Lay
Artem Shinkarov schrieb:
> Hi, Richard
> 
> There is a problem with the testcases of the patch you have committed
> for me. The code in every test-case is doubled. Could you please
> apply the following patch? Otherwise all the tests from the
> vector-shuffle patch would fail.
> 
> Also, if it is possible, could you change my name in the ChangeLog
> from "Artem Shinkarov" to "Artjoms Sinkarovs". The latter is the way
> my name is spelled in my passport, and the name I use in the
> ChangeLog.
> 
> Thanks,
> Artem.
> 
> On Mon, Oct 3, 2011 at 4:13 PM, Richard Henderson  wrote:
>> On 10/03/2011 05:14 AM, Artem Shinkarov wrote:
>>> Hi, can anyone commit it please?
>>>
>>> Richard?
>>> Or may be Richard?
>> Committed.
>>
>> r~
>>

The following test cases cause FAILs because main cannot be found by the linker:
if __SIZEOF_INT__ != 4, you are trying to compile and run an empty file.

> Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-1.c

> Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-5.c

The following patch avoids __SIZEOF_INT__.

Ok by some maintainer to commit?

Johann

testsuite/
* lib/target-supports.exp (check_effective_target_int32): New
function.
* gcc.c-torture/execute/vect-shuffle-1.c: Don't use
__SIZEOF_INT__.
* gcc.c-torture/execute/vect-shuffle-5.c: Ditto.
* gcc.c-torture/execute/vect-shuffle-1.x: New file.
* gcc.c-torture/execute/vect-shuffle-5.x: New file.

Index: lib/target-supports.exp
===
--- lib/target-supports.exp	(revision 179599)
+++ lib/target-supports.exp	(working copy)
@@ -1583,6 +1583,15 @@ proc check_effective_target_int16 { } {
 }]
 }
 
+# Returns 1 if we're generating 32-bit integers with the
+# default options, 0 otherwise.
+
+proc check_effective_target_int32 { } {
+return [check_no_compiler_messages int32 object {
+	int dummy[sizeof (int) == 4 ? 1 : -1];
+}]
+}
+
 # Return 1 if we're generating 64-bit code using default options, 0
 # otherwise.
 
Index: gcc.c-torture/execute/vect-shuffle-1.c
===
--- gcc.c-torture/execute/vect-shuffle-1.c	(revision 179599)
+++ gcc.c-torture/execute/vect-shuffle-1.c	(working copy)
@@ -1,4 +1,3 @@
-#if __SIZEOF_INT__ == 4
 typedef unsigned int V __attribute__((vector_size(16), may_alias));
 
 struct S
@@ -64,5 +63,3 @@ int main()
 
   return 0;
 }
-
-#endif /* SIZEOF_INT */
Index: gcc.c-torture/execute/vect-shuffle-1.x
===
--- gcc.c-torture/execute/vect-shuffle-1.x	(revision 0)
+++ gcc.c-torture/execute/vect-shuffle-1.x	(revision 0)
@@ -0,0 +1,7 @@
+load_lib target-supports.exp
+
+if { [check_effective_target_int32] } {
+	return 0
+}
+
+return 1;
Index: gcc.c-torture/execute/vect-shuffle-5.c
===
--- gcc.c-torture/execute/vect-shuffle-5.c	(revision 179599)
+++ gcc.c-torture/execute/vect-shuffle-5.c	(working copy)
@@ -1,4 +1,3 @@
-#if __SIZEOF_INT__ == 4
 typedef unsigned int V __attribute__((vector_size(16), may_alias));
 
 struct S
@@ -60,5 +59,3 @@ int main()
 
   return 0;
 }
-
-#endif /* SIZEOF_INT */
Index: gcc.c-torture/execute/vect-shuffle-5.x
===
--- gcc.c-torture/execute/vect-shuffle-5.x	(revision 0)
+++ gcc.c-torture/execute/vect-shuffle-5.x	(revision 0)
@@ -0,0 +1,7 @@
+load_lib target-supports.exp
+
+if { [check_effective_target_int32] } {
+	return 0
+}
+
+return 1;


Re: [Patch] Support DEC-C extensions

2011-10-06 Thread Gabriel Dos Reis
On Tue, Oct 4, 2011 at 1:24 PM, Douglas Rupp  wrote:
> On 10/3/2011 8:35 AM, Gabriel Dos Reis wrote:
>>
>> "unnamed variadic functions" sounds as if the function itself is
>> unnamed, so not good.
>>
>>
>> -funnamed-variadic-parameter
>
> How about
> -fvariadic-parameters-unnamed
>
> there's already a -fvariadic-macros, so maybe putting variadic first is more
> consistent?

consistent with what?
Consistency would imply -fvariadic-functions.  But that does not make
much sense since variadic functions already exist in C.

-fvariadic-parameters-unnamed sounds as if the function could have
several variadic parameters, but that is not what is being proposed.


Re: Vector shuffling

2011-10-06 Thread Richard Guenther
On Thu, Oct 6, 2011 at 12:51 PM, Georg-Johann Lay  wrote:
> Artem Shinkarov schrieb:
>> Hi, Richard
>>
>> There is a problem with the testcases of the patch you have committed
>> for me. The code in every test-case is doubled. Could you please
>> apply the following patch? Otherwise all the tests from the
>> vector-shuffle patch would fail.
>>
>> Also, if it is possible, could you change my name in the ChangeLog
>> from "Artem Shinkarov" to "Artjoms Sinkarovs". The latter is the way
>> my name is spelled in my passport, and the name I use in the
>> ChangeLog.
>>
>> Thanks,
>> Artem.
>>
>> On Mon, Oct 3, 2011 at 4:13 PM, Richard Henderson  wrote:
>>> On 10/03/2011 05:14 AM, Artem Shinkarov wrote:
 Hi, can anyone commit it please?

 Richard?
 Or may be Richard?
>>> Committed.
>>>
>>> r~
>>>
>
> The following test cases cause FAILs because main cannot be found by the
> linker: if __SIZEOF_INT__ != 4 you are trying to compile and run an empty
> file.
>
>> Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-1.c
>
>> Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-5.c
>
> The following patch avoids __SIZEOF_INT__.
>
> Ok by some maintainer to commit?

On a general note, if you need to add .x files, consider moving the
test to gcc.dg/torture instead.

Richard.

> Johann
>
> testsuite/
>        * lib/target-supports.exp (check_effective_target_int32): New
>        function.
>        * gcc.c-torture/execute/vect-shuffle-1.c: Don't use
>        __SIZEOF_INT__.
>        * gcc.c-torture/execute/vect-shuffle-5.c: Ditto.
>        * gcc.c-torture/execute/vect-shuffle-1.x: New file.
>        * gcc.c-torture/execute/vect-shuffle-5.x: New file.
>
>


Re: Vector shuffling

2011-10-06 Thread Georg-Johann Lay
Richard Guenther schrieb:
> On Thu, Oct 6, 2011 at 12:51 PM, Georg-Johann Lay  wrote:
>> Artem Shinkarov schrieb:
>>> Hi, Richard
>>>
>>> There is a problem with the testcases of the patch you have committed
>>> for me. The code in every test-case is doubled. Could you please
>>> apply the following patch? Otherwise all the tests from the
>>> vector-shuffle patch would fail.
>>>
>>> Also, if it is possible, could you change my name in the ChangeLog
>>> from "Artem Shinkarov" to "Artjoms Sinkarovs". The latter is the way
>>> my name is spelled in my passport, and the name I use in the
>>> ChangeLog.
>>>
>>> Thanks,
>>> Artem.
>>>
>>> On Mon, Oct 3, 2011 at 4:13 PM, Richard Henderson  wrote:
 On 10/03/2011 05:14 AM, Artem Shinkarov wrote:
> Hi, can anyone commit it please?
>
> Richard?
> Or may be Richard?
 Committed.

 r~

>> The following test cases cause FAILs because main cannot be found by the
>> linker: if __SIZEOF_INT__ != 4 you are trying to compile and run an empty
>> file.
>>
>>> Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-1.c
>>> Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-5.c
>> The following patch avoids __SIZEOF_INT__.
>>
>> Ok by some maintainer to commit?
> 
> On a general note, if you need to add .x files, consider moving the
> test to gcc.dg/torture instead.

So should I move all vect-shuffle-*.c files so that they are kept together?

Johann

> Richard.
> 
>> Johann
>>
>> testsuite/
>>* lib/target-supports.exp (check_effective_target_int32): New
>>function.
>>* gcc.c-torture/execute/vect-shuffle-1.c: Don't use
>>__SIZEOF_INT__.
>>* gcc.c-torture/execute/vect-shuffle-5.c: Ditto.
>>* gcc.c-torture/execute/vect-shuffle-1.x: New file.
>>* gcc.c-torture/execute/vect-shuffle-5.x: New file.


Re: Vector shuffling

2011-10-06 Thread Richard Guenther
On Thu, Oct 6, 2011 at 1:03 PM, Georg-Johann Lay  wrote:
> Richard Guenther schrieb:
>> On Thu, Oct 6, 2011 at 12:51 PM, Georg-Johann Lay  wrote:
>>> Artem Shinkarov schrieb:
 Hi, Richard

 There is a problem with the testcases of the patch you have committed
 for me. The code in every test-case is doubled. Could you please
 apply the following patch? Otherwise all the tests from the
 vector-shuffle patch would fail.

 Also, if it is possible, could you change my name in the ChangeLog
 from "Artem Shinkarov" to "Artjoms Sinkarovs". The latter is the way
 my name is spelled in my passport, and the name I use in the
 ChangeLog.

 Thanks,
 Artem.

 On Mon, Oct 3, 2011 at 4:13 PM, Richard Henderson  wrote:
> On 10/03/2011 05:14 AM, Artem Shinkarov wrote:
>> Hi, can anyone commit it please?
>>
>> Richard?
>> Or may be Richard?
> Committed.
>
> r~
>
 The following test cases cause FAILs because main cannot be found by the
 linker: if __SIZEOF_INT__ != 4 you are trying to compile and run an empty
 file.
>>>
 Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-1.c
 Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-5.c
>>> The following patch avoids __SIZEOF_INT__.
>>>
>>> Ok by some maintainer to commit?
>>
>> On a general note, if you need to add .x files, consider moving the
>> test to gcc.dg/torture instead.
>
> So should I move all vect-shuffle-*.c files so that they are kept together?

Yes.

> Johann
>
>> Richard.
>>
>>> Johann
>>>
>>> testsuite/
>>>        * lib/target-supports.exp (check_effective_target_int32): New
>>>        function.
>>>        * gcc.c-torture/execute/vect-shuffle-1.c: Don't use
>>>        __SIZEOF_INT__.
>>>        * gcc.c-torture/execute/vect-shuffle-5.c: Ditto.
>>>        * gcc.c-torture/execute/vect-shuffle-1.x: New file.
>>>        * gcc.c-torture/execute/vect-shuffle-5.x: New file.
>


Re: Vector shuffling

2011-10-06 Thread Jakub Jelinek
On Thu, Oct 06, 2011 at 12:51:54PM +0200, Georg-Johann Lay wrote:
> The following patch avoids __SIZEOF_INT__.
> 
> Ok by some maintainer to commit?

That is unnecessary.  You can just add
#else
int
main ()
{
  return 0;
}
before the final #endif in the files instead.
Or move the #ifdefs around, so that for weirdo targets everything before
main is ifdefed out, and likewise main's body except for the return 0;
at the end.

Jakub


[Committed] s390 bootstrap: last_bb_active set but not used

2011-10-06 Thread Andreas Krebbel
Hi,

this fixes a bootstrap problem on s390.  s390 has neither a "return"
nor a "simple_return" expander, so the last_bb_active variable stays
unused in thread_prologue_and_epilogue_insns.

Committed to mainline as obvious.

Bye,

-Andreas-


2011-10-06  Andreas Krebbel  

* function.c (thread_prologue_and_epilogue_insns): Mark
last_bb_active as possibly unused.  It is unused for targets which
have neither "return" nor "simple_return" expanders.


Index: gcc/function.c
===
*** gcc/function.c.orig
--- gcc/function.c
*** thread_prologue_and_epilogue_insns (void
*** 5453,5459 
  {
bool inserted;
basic_block last_bb;
!   bool last_bb_active;
  #ifdef HAVE_simple_return
bool unconverted_simple_returns = false;
basic_block simple_return_block_hot = NULL;
--- 5453,5459 
  {
bool inserted;
basic_block last_bb;
!   bool last_bb_active ATTRIBUTE_UNUSED;
  #ifdef HAVE_simple_return
bool unconverted_simple_returns = false;
basic_block simple_return_block_hot = NULL;


Re: Vector shuffling

2011-10-06 Thread Georg-Johann Lay
Richard Guenther schrieb:
> On Thu, Oct 6, 2011 at 1:03 PM, Georg-Johann Lay  wrote:
>> Richard Guenther schrieb:
>>> On Thu, Oct 6, 2011 at 12:51 PM, Georg-Johann Lay  wrote:
 Artem Shinkarov schrieb:
> Hi, Richard
>
> There is a problem with the testcases of the patch you have committed
> for me. The code in every test-case is doubled. Could you please
> apply the following patch? Otherwise all the tests from the
> vector-shuffle patch would fail.
> 
> Also, if it is possible, could you change my name in the ChangeLog
> from "Artem Shinkarov" to "Artjoms Sinkarovs". The latter is the way
> my name is spelled in my passport, and the name I use in the
> ChangeLog.
>
> Thanks,
> Artem.
>
> On Mon, Oct 3, 2011 at 4:13 PM, Richard Henderson  wrote:
>> On 10/03/2011 05:14 AM, Artem Shinkarov wrote:
>>> Hi, can anyone commit it please?
>>>
>>> Richard?
>>> Or may be Richard?
>> Committed.
>>
>> r~
>>
> Hi, Richard
>
> There is a problem with the testcases of the patch you have committed
> for me.  The code in every test case is doubled.  Could you please
> apply the following patch?  Otherwise all the tests from the
> vector-shuffle patch would fail.
>
> Also, if it is possible, could you change my name in the ChangeLog
> from "Artem Shinkarov" to "Artjoms Sinkarovs".  The latter is how my
> name is spelled in my passport, and it is the name I use in the
> ChangeLog.
>
>
> Thanks,
> Artem.
>
 The following test cases cause FAILs because the linker cannot find
 main: if __SIZEOF_INT__ != 4, you are compiling and running an empty
 file.

> Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-1.c
> Index: gcc/testsuite/gcc.c-torture/execute/vect-shuffle-5.c
 The following patch avoids __SIZEOF_INT__.

 Ok by some maintainer to commit?
>>> On a general note, if you need to add .x files, consider moving the
>>> test to gcc.dg/torture instead.
>> So should I move all vect-shuffle-*.c files so that they are kept together?
> 
> Yes.

So here it is.  Lightly tested on my target: All tests either PASS or are
UNSUPPORTED now.

Ok?

Johann

testsuite/
* lib/target-supports.exp (check_effective_target_int32): New
function.
(check_effective_target_short16): New function.
(check_effective_target_longlong64): New function.

* gcc.c-torture/execute/vect-shuffle-1.c: Move to gcc.dg/torture.
* gcc.c-torture/execute/vect-shuffle-2.c: Move to gcc.dg/torture.
* gcc.c-torture/execute/vect-shuffle-3.c: Move to gcc.dg/torture.
* gcc.c-torture/execute/vect-shuffle-4.c: Move to gcc.dg/torture.
* gcc.c-torture/execute/vect-shuffle-5.c: Move to gcc.dg/torture.
* gcc.c-torture/execute/vect-shuffle-6.c: Move to gcc.dg/torture.
* gcc.c-torture/execute/vect-shuffle-7.c: Move to gcc.dg/torture.
* gcc.c-torture/execute/vect-shuffle-8.c: Move to gcc.dg/torture.
* gcc.dg/torture/vect-shuffle-1.c: Use dg-require-effective-target
int32 instead of __SIZEOF_INT__ == 4.
* gcc.dg/torture/vect-shuffle-5.c: Ditto.
* gcc.dg/torture/vect-shuffle-2.c: Use dg-require-effective-target
short16 instead of __SIZEOF_SHORT__ == 2.
* gcc.dg/torture/vect-shuffle-6.c: Ditto.
* gcc.dg/torture/vect-shuffle-3.c: Use dg-require-effective-target
longlong64 instead of __SIZEOF_LONG_LONG__ == 8.
* gcc.dg/torture/vect-shuffle-7.c: Ditto.
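
For readers unfamiliar with the effective-target machinery: each check
compiles a tiny probe whose array size is negative unless the condition
holds, so the probe only builds on conforming targets.  A moved test
would then start roughly like this (illustrative, based on the ChangeLog
above; the real headers are in the patch below):

/* { dg-do run } */
/* { dg-require-effective-target int32 } */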


Index: lib/target-supports.exp
===================================================================
--- lib/target-supports.exp	(revision 179599)
+++ lib/target-supports.exp	(working copy)
@@ -1583,6 +1583,33 @@ proc check_effective_target_int16 { } {
 }]
 }
 
+# Returns 1 if we're generating 32-bit integers with the
+# default options, 0 otherwise.
+
+proc check_effective_target_int32 { } {
+return [check_no_compiler_messages int32 object {
+	int dummy[sizeof (int) == 4 ? 1 : -1];
+}]
+}
+
+# Returns 1 if we're generating 64-bit long long integers with the
+# default options, 0 otherwise.
+
+proc check_effective_target_longlong64 { } {
+return [check_no_compiler_messages longlong64 object {
+	int dummy[sizeof (long long ) == 8 ? 1 : -1];
+}]
+}
+
+# Returns 1 if we're generating 16-bit short integers with the
+# default options, 0 otherwise.
+
+proc check_effective_target_short16 { } {
+return [check_no_compiler_messages short16 object {
+	int dummy[sizeof (short) == 2 ? 1 : -1];
+}]
+}
+
 # Return 1 if we're generating 64-bit code using default options, 0
 # otherwise.
 
Index: gcc.c-torture/execute/vect-shuffle-2.c
===================================================================
--- gcc.c-torture/execute/vect-shuffle-2.c	(revision 179599)
+++ gcc.c-torture/execute/vect-shuffle-2.c	(working copy)

Re: Modify gcc for use with gdb (issue5132047)

2011-10-06 Thread Diego Novillo

On 11-10-06 04:58 , Richard Guenther wrote:


I know you are on to that C++ thing and ending up returning a reference
to make it an lvalue.  Which I very much don't like (please, if you go
that route add _set functions and lower the case of the macros).


Not necessarily.  I'm after making the debugging experience easier 
(among other things).  Only a handful of macros were converted into 
functions in this patch, not all of them.  We may not *need* to convert 
all of them either.



What's the other advantage of using inline functions?  The gdb
annoyance with the macros can be solved with the .gdbinit macro
defines (which might be nice to commit to SVN btw).


Static type checking, of course.  Ability to set breakpoints, and as 
time goes on, more inline functions will start showing up.


We already have several.  The blacklist feature would solve your 
annoyance with tree_operand_length, too.  Additionally, blacklist can 
deal with non-inline functions, which can be useful.
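
For reference, the .gdbinit approach mentioned above would contain
entries along these lines (an illustrative one, using TREE_CODE's
definition from tree.h; no such file is in SVN yet):

macro define TREE_CODE(NODE) ((enum tree_code) (NODE)->base.code)

With that loaded, gdb can expand TREE_CODE in expressions even when the
macro itself is not available in the debug info.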



Diego.


Re: [Patch, Fortran] PR50273 - Fix -Walign-commons diagnostic

2011-10-06 Thread Tobias Burnus

*ping*
http://gcc.gnu.org/ml/fortran/2011-09/msg00160.html

On 09/30/2011 10:50 AM, Tobias Burnus wrote:

Dear all,

with the following change in 4.5, the -Walign-commons warning got 
disabled:


"The |COMMON| default padding has been changed – instead of adding the 
padding before a variable it is now added afterwards, which increases 
the compatibility with other vendors and helps to obtain the correct 
output in some cases."


The attached patch restores the warning. I actually got a bit lost 
tracking the offset (and the "max_align" usage), hence, I wouldn't 
mind a careful review. However, testing didn't show any alignment issues.


Build and regtested (trunk) on x86-64-linux.
Ok for the trunk, 4.6 and 4.5?

Tobias




Re: [Patch, Fortran] Add c_float128{,_complex} as GNU extension to ISO_C_BINDING

2011-10-06 Thread Tobias Burnus

*ping*
http://gcc.gnu.org/ml/fortran/2011-09/msg00150.html

On 09/28/2011 04:28 PM, Tobias Burnus wrote:
This patch makes the GCC extension __float128 (_Complex) available in 
the C bindings via C_FLOAT128 and C_FLOAT128_COMPLEX.


Additionally, I have improved the diagnostic for explicitly use 
associating -std= versioned symbols. And I have finally added the 
iso*.def files to the makefile dependencies.


As usual, with -std=f2008, the GNU extensions are not loaded. I have 
also updated the documentation.


OK for the trunk?

Tobias

PS: If you think that C_FLOAT128/C_FLOAT128_COMPLEX are bad names for 
C's __float128, please speak up before gfortran - and other compilers -
implement it. (At least one vendor is implementing __float128 support 
and plans to modify ISO_C_BINDING.) The proper name would be 
C___FLOAT128, but that looks awkward!




Re: [patch] --enable-dynamic-string default for mingw-w64 v2

2011-10-06 Thread JonY
On 10/1/2011 22:31, JonY wrote:
> On 10/1/2011 19:16, Paolo Carlini wrote:
>> Hi,
>>
>>> Thanks, but I am having problems sending a proper diff with the
>>> regenerated files; they have a lot of unrelated changes, even though I
>>> made sure I am using autoconf 2.64 and automake 1.11.1.
>>
>> To be clear, regenerated files should **not** be part of the patch submitted 
>> for review, but should definitely appear in the ChangeLog and in the changes 
>> eventually committed.
>>
> 
> After some careful adjustments, I see just configure and config.h.in
> changed.
> 
> New patch with updated Changelog attached. Be sure to copy
> config/os/mingw32 over to config/os/mingw32-w64 before continuing.
> 
>>> I guess its due to libtool versions, any ideas how to fix this?
>>
>> Nope, sorry, on Linux, provided the versions are correct - please double 
>> check that, I'm traveling - regen works like a charm. You just invoke 
>> autoreconf, right?
>>
>> Paolo
> 
> Yeah, just "autoreconf" under Cygwin to regenerate them, calling
> autoconf and autoheader avoids new libtool getting pulled in.
> 

Sorry, there seems to be another problem with the patch; I'll work on it
over the weekend.




Re: [PATCH, testsuite, i386] FMA3 testcases + typo fix in MD

2011-10-06 Thread Uros Bizjak
On Thu, Oct 6, 2011 at 2:51 PM, Kirill Yukhin  wrote:
> Wow, it works!
>
> Thank you. New patch attached.
> ChangeLogs were not touched.
>
> Tests pass both on ia32/x86-64 with and without simulator.

You are missing closing curly braces in dg-do compile directives.

Also, please write:

TYPE __attribute__((sseregparm))
test_noneg_sub_noneg_sub (TYPE a, TYPE b, TYPE c)

The patch is OK with these changes.

Thanks,
Uros.


Re: [PATCH, testsuite, i386] FMA3 testcases + typo fix in MD

2011-10-06 Thread Uros Bizjak
On Thu, Oct 6, 2011 at 2:55 PM, Uros Bizjak  wrote:
> On Thu, Oct 6, 2011 at 2:51 PM, Kirill Yukhin  wrote:
>> Wow, it works!
>>
>> Thank you. New patch attached.
>> ChangeLogs were not touched.
>>
>> Tests pass both on ia32/x86-64 with and without simulator.
>
> You are missing closing curly braces in dg-do compile directives.
>
> Also, please write:
>
> TYPE __attribute__((sseregparm))
> test_noneg_sub_noneg_sub (TYPE a, TYPE b, TYPE c)
>
> The patch is OK with these changes.

BTW, don't you also need "-mfpmath=sse" in dg-options?

Uros.


Re: [PATCH] Fix PR46556 (poor address generation)

2011-10-06 Thread William J. Schmidt
On Thu, 2011-10-06 at 09:47 +0200, Paolo Bonzini wrote:
> And IIUC the other address is based on pseudo 125 as well, but the 
> combination is (plus (plus (reg 126) (reg 128)) (const_int X)) and 
> cannot be represented on ppc.  I think _this_ is the problem, so I'm 
> afraid your patch could cause pessimizations on x86 for example.  On 
> x86, which has a cheap REG+REG+CONST addressing mode, it is much better 
> to propagate pseudo 125 so that you can delete the set altogether.
> 
> However, indeed there is no downstream pass that undoes the 
> transformation.  Perhaps we can do it in CSE, since this _is_ CSE after 
> all. :)  The attached untested (uncompiled) patch is an attempt.
> 
> Paolo

Thanks, Paolo!  This makes good sense.  I will play with your (second :)
patch and let you know how it goes.

Bill



ARM: Fix PR49049

2011-10-06 Thread Bernd Schmidt
This corrects a brain fart in one of my patches last year: I added
another alternative to a subsi pattern for subtraction of a constant.  This is
bogus because such an operation should be canonicalized to a PLUS with
the negative constant.  Normally that's what happens, and so testing
never showed that the alternative was only half-finished and didn't
work. PR49049 is a testcase where we do end up replacing a REG with a
constant and produce the bad alternative, leading to a crash.

Tested on arm-eabi and committed as obvious. Will do some sanity checks
on 4.6 and commit there as well.


Bernd
Index: gcc/ChangeLog
===================================================================
--- gcc/ChangeLog   (revision 179606)
+++ gcc/ChangeLog   (working copy)
@@ -1,3 +1,8 @@
+2011-10-06  Bernd Schmidt  
+
+   PR target/49049
+   * config/arm/arm.md (arm_subsi3_insn): Lose the last alternative.
+
 2011-10-06  Ulrich Weigand  
 
PR target/50305
Index: gcc/testsuite/gcc.c-torture/compile/pr49049.c
===================================================================
--- gcc/testsuite/gcc.c-torture/compile/pr49049.c   (revision 0)
+++ gcc/testsuite/gcc.c-torture/compile/pr49049.c   (revision 0)
@@ -0,0 +1,28 @@
+__extension__ typedef unsigned long long int uint64_t;
+
+static int
+sub (int a, int b)
+{
+  return a - b;
+}
+
+static uint64_t
+add (uint64_t a, uint64_t b)
+{
+  return a + b;
+}
+
+int *ptr;
+
+int
+foo (uint64_t arg1, int *arg2)
+{
+  int j;
+  for (; j < 1; j++)
+{
+  *arg2 |= sub ( sub (sub (j || 1 ^ 0x1, 1), arg1 < 0x1 <=
+  sub (1, *ptr & j)),
+(sub ( j != 1 || sub (j && j, 1) >= 0,
+  add (!j > arg1, 0x35DLL))));
+}
+}
Index: gcc/testsuite/ChangeLog
===================================================================
--- gcc/testsuite/ChangeLog (revision 179606)
+++ gcc/testsuite/ChangeLog (working copy)
@@ -1,3 +1,8 @@
+2011-10-06  Bernd Schmidt  
+
+   PR target/49049
+   * gcc.c-torture/compile/pr49049.c: New test.
+
 2011-10-06  Ulrich Weigand  
 
PR target/50305
Index: gcc/config/arm/arm.md
===================================================================
--- gcc/config/arm/arm.md   (revision 179606)
+++ gcc/config/arm/arm.md   (working copy)
@@ -1213,27 +1213,24 @@ (define_insn "thumb1_subsi3_insn"
 
 ; ??? Check Thumb-2 split length
 (define_insn_and_split "*arm_subsi3_insn"
-  [(set (match_operand:SI   0 "s_register_operand" "=r,r,rk,r,r")
-   (minus:SI (match_operand:SI 1 "reg_or_int_operand" "rI,r,k,?n,r")
- (match_operand:SI 2 "reg_or_int_operand" "r,rI,r, r,?n")))]
+  [(set (match_operand:SI   0 "s_register_operand" "=r,r,rk,r")
+   (minus:SI (match_operand:SI 1 "reg_or_int_operand" "rI,r,k,?n")
+ (match_operand:SI 2 "reg_or_int_operand" "r,rI,r, r")))]
   "TARGET_32BIT"
   "@
rsb%?\\t%0, %2, %1
sub%?\\t%0, %1, %2
sub%?\\t%0, %1, %2
-   #
#"
-  "&& ((GET_CODE (operands[1]) == CONST_INT
-   && !const_ok_for_arm (INTVAL (operands[1])))
-   || (GET_CODE (operands[2]) == CONST_INT
-  && !const_ok_for_arm (INTVAL (operands[2]))))"
+  "&& (GET_CODE (operands[1]) == CONST_INT
+   && !const_ok_for_arm (INTVAL (operands[1])))"
   [(clobber (const_int 0))]
   "
   arm_split_constant (MINUS, SImode, curr_insn,
   INTVAL (operands[1]), operands[0], operands[2], 0);
   DONE;
   "
-  [(set_attr "length" "4,4,4,16,16")
+  [(set_attr "length" "4,4,4,16")
(set_attr "predicable" "yes")]
 )
 


Re: Builtin infrastructure change

2011-10-06 Thread Tobias Burnus

On 10/06/2011 03:02 PM, Michael Meissner wrote:

On the x86 (with Fedora 13), I built and tested the C, C++, Objective C,
Java, Ada, and Go languages with no regressions



On a power6 box with RHEL 6.1, I
have done the same for C, C++, Objective C, Java, and Ada languages with no
regressions.


Any reason for not building and testing Fortran? Especially as you patch 
gcc/fortran/{trans*.c,f95-lang.c}?


Tobias


[gcc/fortran]
2011-10-05  Michael Meissner

* trans-expr.c (gfc_conv_power_op): Delete old interface with two
parallel arrays to hold standard builtin declarations, and replace
it with a function based interface that can support creating
builtins on the fly in the future.  Change all uses, and poison
the old names.  Make sure 0 is not a legitimate builtin index.
(fill_with_spaces): Ditto.
(gfc_trans_string_copy): Ditto.
(gfc_trans_zero_assign): Ditto.
(gfc_build_memcpy_call): Ditto.
(alloc_scalar_allocatable_for_assignment): Ditto.
* trans-array.c (gfc_trans_array_constructor_value): Ditto.
(duplicate_allocatable): Ditto.
(gfc_alloc_allocatable_for_assignment): Ditto.
* trans-openmp.c (gfc_omp_clause_copy_ctor): Ditto.
(gfc_omp_clause_assign_op): Ditto.
(gfc_trans_omp_atomic): Ditto.
(gfc_trans_omp_do): Ditto.
(gfc_trans_omp_task): Ditto.
* trans-stmt.c (gfc_trans_stop): Ditto.
(gfc_trans_sync): Ditto.
(gfc_trans_allocate): Ditto.
(gfc_trans_deallocate): Ditto.
* trans.c (gfc_call_malloc): Ditto.
(gfc_allocate_using_malloc): Ditto.
(gfc_call_free): Ditto.
(gfc_deallocate_with_status): Ditto.
(gfc_deallocate_scalar_with_status): Ditto.
* f95-lang.c (gfc_define_builtin): Ditto.
(gfc_init_builtin_functions): Ditto.
* trans-decl.c (create_main_function): Ditto.
* trans-intrinsic.c (builtin_decl_for_precision): Ditto.


[build] Restore FreeBSD/SPARC bootstrap (PR bootstrap/49804)

2011-10-06 Thread Rainer Orth
As reported in the PR, FreeBSD/SPARC bootstrap is broken by one of my
previous libgcc patches.  While the crtstuff one will fix it, I'd like
to avoid breaking the target.

The following patch fixes the problem, as confirmed in the PR.

Ok for mainline?

Rainer


2011-10-04  Rainer Orth  

PR bootstrap/49804
* config.host: Add crtbegin.o, crtbeginS.o, crtend.o, crtendS.o to
extra_parts.

# HG changeset patch
# Parent a57e226a2b14812bfa3c37c1aa807f28fac223eb
Restore FreeBSD/SPARC bootstrap (PR bootstrap/49804)

diff --git a/libgcc/config.host b/libgcc/config.host
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -777,7 +777,7 @@ sparc-wrs-vxworks)
 	;;
 sparc64-*-freebsd*|ultrasparc-*-freebsd*)
 	tmake_file="$tmake_file t-crtfm"
-	extra_parts=crtfastmath.o
+	extra_parts="crtbegin.o crtbeginS.o crtend.o crtendS.o crtfastmath.o"
 	;;
 sparc64-*-linux*)		# 64-bit SPARC's running GNU/Linux
 	extra_parts="$extra_parts crtfastmath.o"


-- 
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [build] Restore FreeBSD/SPARC bootstrap (PR bootstrap/49804)

2011-10-06 Thread Paolo Bonzini

On 10/06/2011 03:29 PM, Rainer Orth wrote:

As reported in the PR, FreeBSD/SPARC bootstrap is broken by one of my
previous libgcc patches.  While the crtstuff one will fix it, I'd like
to avoid breaking the target.

The following patch fixes the problem, as confirmed in the PR.

Ok for mainline?

Rainer


2011-10-04  Rainer Orth

PR bootstrap/49804
* config.host: Add crtbegin.o, crtbeginS.o, crtend.o, crtendS.o to
extra_parts.

Ok.

Paolo


Re: [PATCH 0/3] Fix vector shuffle problems

2011-10-06 Thread Michael Matz
Hi,

On Wed, 5 Oct 2011, Richard Henderson wrote:

> Tested on x86_64 with
> 
>   check-gcc//unix/{,-mssse3,-msse4}
> 
> Hopefully one of the AMD guys can test on a bulldozer with -mxop?

=== gcc Summary for unix//-mxop ===

# of expected passes		160


Ciao,
Michael.


Re: [PATCH] Fix PR46556 (poor address generation)

2011-10-06 Thread William J. Schmidt
On Thu, 2011-10-06 at 12:13 +0200, Richard Guenther wrote:
> People have already commented on pieces, so I'm looking only
> at the tree-ssa-reassoc.c pieces (did you consider piggy-backing
> on IVOPTs instead?  The idea is to expose additional CSE
> opportunities, right?  So it's sort-of a strength-reduction
> optimization on scalar code (classically strength reduction
> in loops transforms for (i) { z = i*x; } to z = 0; for (i) { z += x }).
> That might be worth in general, even for non-address cases.
> So - if you rename that thing to tree-ssa-strength-reduce.c you
> can get away without piggy-backing on anything ;)  If you
> structure it to detect a strength reduction opportunity
> (thus, you'd need to match two/multiple of the patterns at the same time)
> that would be a bonus ... generalizing it a little bit would be
> another.

These are all good ideas.  I will think about casting this as a more
general strength reduction over extended basic blocks outside of loops.
First I'll put together some simple tests to see whether we're currently
missing some non-address opportunities.
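
One shape such a test might take (my sketch of a non-address candidate,
not something from the thread):

int f (int x, int i)
{
  int a = i * x;
  int b = (i + 1) * x;  /* reducible to a + x */
  int c = (i + 2) * x;  /* reducible to b + x */
  return a + b + c;
}

If the multiplies survive into the optimized dump, there is room for a
scalar strength reduction pass to turn them into additions.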



> > +  mult_op0 = TREE_OPERAND (offset, 0);
> > +  mult_op1 = TREE_OPERAND (offset, 1);
> > +
> > +  if (TREE_CODE (mult_op0) != PLUS_EXPR
> > +  || TREE_CODE (mult_op1) != INTEGER_CST
> > +  || TREE_CODE (TREE_OPERAND (mult_op0, 1)) != INTEGER_CST)
> > +return NULL_TREE;
> > +
> > +  t1 = TREE_OPERAND (base, 0);
> > +  t2 = TREE_OPERAND (mult_op0, 0);
> > +  c1 = TREE_INT_CST_LOW (TREE_OPERAND (base, 1));
> > +  c2 = TREE_INT_CST_LOW (TREE_OPERAND (mult_op0, 1));
> > +  c3 = TREE_INT_CST_LOW (mult_op1);
> 
> Before accessing TREE_INT_CST_LOW you need to make sure the
> constants fit into a HWI using host_integerp () (which
> conveniently includes the check for INTEGER_CST).
> 
> Note that you need to sign-extend the MEM_REF offset,
> thus use mem_ref_offset (base).low instead of
> TREE_INT_CST_LOW (TREE_OPERAND (base, 1)).  Might be worth
> to add a testcase with negative offset ;)

D'oh! >.<

> 
> > +  c4 = bitpos / BITS_PER_UNIT;
> > +  c = c1 + c2 * c3 + c4;
> 
> And you don't know whether this operation overflows.  Thus it's
> probably easiest to use double_ints instead of HOST_WIDE_INTs
> in all of the code.

OK, thanks, will do.



> 
> > +  /* Determine whether the expression can be represented with base and
> > + offset components.  */
> > +  base = get_inner_reference (*expr, &bitsize, &bitpos, &offset, &mode,
> > + &unsignedp, &volatilep, false);
> > +  if (!base || !offset)
> > +return false;
> > +
> > +  /* Look for a restructuring opportunity.  */
> > +  if ((mem_ref = restructure_base_and_offset (expr, gsi, base,
> > + offset, bitpos)) == NULL_TREE)
> > +return false;
> 
> What I'm missing is a check whether the old address computation stmts
> will be dead after the transform.

Hm, not quite sure what to do here.  Prior to the transformation I'll
have an assignment with something like:

ARRAY_REF (COMPONENT_REF (MEM_REF (Ta, Cb), FIELD_DECL c), Td)

on LHS or RHS.  Ta and Td will be part of the replacement.  What should
I be checking for?



> >  
> > -  if (is_gimple_assign (stmt)
> > - && !stmt_could_throw_p (stmt))
> > +  /* Look for restructuring opportunities within an expression
> > +that references memory.  We only do this for blocks not
> > + contained in loops, since the ivopts machinery does a 
> > + good job on loop expressions, and we don't want to interfere
> > +with other loop optimizations.  */
> > +  if (!in_loop && gimple_vuse (stmt) && gimple_assign_single_p (stmt))
> > {
> > + tree *lhs, *rhs;
> > + lhs = gimple_assign_lhs_ptr (stmt);
> > + chgd_mem_ref = restructure_mem_ref (lhs, &gsi) || chgd_mem_ref;
> > + rhs = gimple_assign_rhs1_ptr (stmt);
> > + chgd_mem_ref = restructure_mem_ref (rhs, &gsi) || chgd_mem_ref;
> 
> It will either be a store or a load, but never both (unless it's an
> aggregate copy which I think we should not handle).  So ...
> 
>   if (gimple_vdef (stmt))
> ... lhs
>   else if (gimple_vuse (stmt))
> ... rhs

OK, with your suggested gating on non-BLKmode I agree.

> > +   }
> > +
> > +  else if (is_gimple_assign (stmt)
> > +  && !stmt_could_throw_p (stmt))
> > +   {
> >   tree lhs, rhs1, rhs2;
> >   enum tree_code rhs_code = gimple_assign_rhs_code (stmt);
> >  
> > @@ -2489,6 +2615,12 @@ reassociate_bb (basic_block bb)
> > }
> > }
> >  }
> > +  /* If memory references have been restructured, immediate uses need
> > + to be cleaned up.  */
> > +  if (chgd_mem_ref)
> > +for (gsi = gsi_start_bb (bb); !gsi_end_p (gsi); gsi_next (&gsi))
> > +  update_stmt (gsi_stmt (gsi));
> 
> ICK.  Definitely a no ;)
> 
> Why does a update_stmt () after the restructure_mem_ref call not work?

Ah, yeah, I meant to check again on that before submitting.  >.< 

IIRC, at some point the upda

Re: Initial shrink-wrapping patch

2011-10-06 Thread Bernd Schmidt
On 10/06/11 01:47, Bernd Schmidt wrote:
> This appears to be because the split prologue contains a jump, which
> means the find_many_sub_blocks call reorders the block numbers, and our
> indices into bb_flags are off.

Testing of the patch completed - ok? Regardless of split-stack it seems
like a cleanup and eliminates a potential source of errors.


Bernd


Re: [PATCH, testsuite, i386] FMA3 testcases + typo fix in MD

2011-10-06 Thread Kirill Yukhin
>
> BTW, don't you also need "-mfpmath=sse" in dg-options?
>

According to doc/invoke.texi
...
@itemx -mfma
...
These options will enable GCC to use these extended instructions in
generated code, even without @option{-mfpmath=sse}.

Seems -mfpmath=sse is unnecessary, then.
Although, if this is wrong, we probably have to update the doc as well.

Thanks, K


Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs

2011-10-06 Thread Michael Matz
Hi,

On Thu, 6 Oct 2011, Richard Guenther wrote:

> > +      && ((TREE_CODE_CLASS (TREE_CODE (arg1)) != tcc_comparison
> > +          && TREE_CODE (arg1) != TRUTH_NOT_EXPR)
> > +         || !FLOAT_TYPE_P (TREE_TYPE (TREE_OPERAND (arg1, 0)
> 
> ?  simple_operand_p would have rejected both ! and comparisons.
> 
> I miss a test for side-effects on arg0 (and probably simple_operand_p there,
> as well).

He has it in the if() body.  But why?  The point of ANDIF/ORIF is to not 
evaluate the second argument for side-effects when the first argument is 
false/true already, and further to establish an order between both 
evaluations.  The sideeffect on the first arg is always evaluated.  
AND/OR always evaluate both arguments (in unspecified order), but as he 
checks the second one for being free of side effects already that alone is 
already equivalent to ANDIF/ORIF.  No need to check something on the first 
argument.
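
To make the evaluation-order point concrete (a sketch, not code from
the patch):

static int g_calls;
static int g (void) { return ++g_calls != 0; }  /* has a side effect */

int andif_form (int a) { return a && g (); }    /* g () skipped if a == 0 */
int and_form (int a) { return (a != 0) & (g () != 0); } /* g () always runs */

Folding the first form into the second is only safe when the second
operand -- here g () -- is free of side effects, which is exactly what
the patch checks on arg1.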


Ciao,
Michael.

Re: [Patch] Support DEC-C extensions

2011-10-06 Thread Tristan Gingold

On Oct 3, 2011, at 10:23 PM, Joseph S. Myers wrote:

> On Mon, 3 Oct 2011, Douglas Rupp wrote:
> 
>> On 9/30/2011 8:19 AM, Joseph S. Myers wrote:
>>> On Fri, 30 Sep 2011, Tristan Gingold wrote:
>>> 
 If you prefer a target hook, I'm fine with that.  I will write such a
 patch.
 
 I don't think it must be restricted to system headers, as it is possible
 that the user 'imports' such a function (and defines it in one of VMS's
 favorite languages such as macro-32 or bliss)
>>> If it's not restricted to system headers, then probably the option is
>>> better than the target hook.
>>> 
>> I'm not sure I understand the reasoning here.  This seems fairly VMS specific
>> so what is the downside for a target hook and user written headers?
> 
> The language accepted by the compiler in the user's source code (as 
> opposed to in system headers) shouldn't depend on the target except for 
> certain well-defined areas such as target attributes and built-in 
> functions; behaving the same across different systems is an important 
> feature of GCC.  This isn't one of those areas of target-dependence; it's 
> generic syntax rather than e.g. exploiting a particular processor feature.

So the consensus is for a dedicated option.  Which one do you prefer ?

-funnamed-variadic-parameter
-fpointless-variadic-functions
-fallow-parameterless-variadic-functions
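
For reference, whatever the option ends up being called, it would accept
declarations like this, which ISO C rejects (my reading of the
extension under discussion):

int f (...);   /* no named parameter before the ellipsis */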

I will update my patch once this is settled.

Thanks,
Tristan.




[PATCH, AIX] Add missing macros PR39950

2011-10-06 Thread David Edelsohn
The appended patch adds a few macros that XLC now defines on AIX.

- David

* config/rs6000/aix.h (TARGET_OS_AIX_CPP_BUILTINS): Define
__powerpc__, __PPC__, __unix__.

Index: aix.h
===================================================================
--- aix.h   (revision 179610)
+++ aix.h   (working copy)
@@ -97,6 +97,9 @@
 {  \
   builtin_define ("_IBMR2");   \
   builtin_define ("_POWER");   \
+  builtin_define ("__powerpc__");   \
+  builtin_define ("__PPC__");   \
+  builtin_define ("__unix__");  \
   builtin_define ("_AIX"); \
   builtin_define ("_AIX32");   \
   builtin_define ("_AIX41");   \


Re: [Patch] Support DEC-C extensions

2011-10-06 Thread Joseph S. Myers
On Thu, 6 Oct 2011, Tristan Gingold wrote:

> So the consensus is for a dedicated option.  Which one do you prefer ?
> 
> -funnamed-variadic-parameter
> -fpointless-variadic-functions
> -fallow-parameterless-variadic-functions

I prefer -fallow-parameterless-variadic-functions.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH] Fix PR46556 (poor address generation)

2011-10-06 Thread Richard Guenther
On Thu, 6 Oct 2011, William J. Schmidt wrote:

> On Thu, 2011-10-06 at 12:13 +0200, Richard Guenther wrote:
> > People have already commented on pieces, so I'm looking only
> > at the tree-ssa-reassoc.c pieces (did you consider piggy-backing
> > on IVOPTs instead?  The idea is to expose additional CSE
> > opportunities, right?  So it's sort-of a strength-reduction
> > optimization on scalar code (classically strength reduction
> > in loops transforms for (i) { z = i*x; } to z = 0; for (i) { z += x }).
> > That might be worth in general, even for non-address cases.
> > So - if you rename that thing to tree-ssa-strength-reduce.c you
> > can get away without piggy-backing on anything ;)  If you
> > structure it to detect a strength reduction opportunity
> > (thus, you'd need to match two/multiple of the patterns at the same time)
> > that would be a bonus ... generalizing it a little bit would be
> > another.
> 
> These are all good ideas.  I will think about casting this as a more
> general strength reduction over extended basic blocks outside of loops.
> First I'll put together some simple tests to see whether we're currently
> missing some non-address opportunities.
> 
> 
> 
> > > +  mult_op0 = TREE_OPERAND (offset, 0);
> > > +  mult_op1 = TREE_OPERAND (offset, 1);
> > > +
> > > +  if (TREE_CODE (mult_op0) != PLUS_EXPR
> > > +  || TREE_CODE (mult_op1) != INTEGER_CST
> > > +  || TREE_CODE (TREE_OPERAND (mult_op0, 1)) != INTEGER_CST)
> > > +return NULL_TREE;
> > > +
> > > +  t1 = TREE_OPERAND (base, 0);
> > > +  t2 = TREE_OPERAND (mult_op0, 0);
> > > +  c1 = TREE_INT_CST_LOW (TREE_OPERAND (base, 1));
> > > +  c2 = TREE_INT_CST_LOW (TREE_OPERAND (mult_op0, 1));
> > > +  c3 = TREE_INT_CST_LOW (mult_op1);
> > 
> > Before accessing TREE_INT_CST_LOW you need to make sure the
> > constants fit into a HWI using host_integerp () (which
> > conveniently includes the check for INTEGER_CST).
> > 
> > Note that you need to sign-extend the MEM_REF offset,
> > thus use mem_ref_offset (base).low instead of
> > TREE_INT_CST_LOW (TREE_OPERAND (base, 1)).  Might be worth
> > to add a testcase with negative offset ;)
> 
> D'oh! >.<
> 
> > 
> > > +  c4 = bitpos / BITS_PER_UNIT;
> > > +  c = c1 + c2 * c3 + c4;
> > 
> > And you don't know whether this operation overflows.  Thus it's
> > probably easiest to use double_ints instead of HOST_WIDE_INTs
> > in all of the code.
> 
> OK, thanks, will do.
> 
> 
> 
> > 
> > > +  /* Determine whether the expression can be represented with base and
> > > + offset components.  */
> > > +  base = get_inner_reference (*expr, &bitsize, &bitpos, &offset, &mode,
> > > +   &unsignedp, &volatilep, false);
> > > +  if (!base || !offset)
> > > +return false;
> > > +
> > > +  /* Look for a restructuring opportunity.  */
> > > +  if ((mem_ref = restructure_base_and_offset (expr, gsi, base,
> > > +   offset, bitpos)) == NULL_TREE)
> > > +return false;
> > 
> > What I'm missing is a check whether the old address computation stmts
> > will be dead after the transform.
> 
> Hm, not quite sure what to do here.  Prior to the transformation I'll
> have an assignment with something like:
> 
> ARRAY_REF (COMPONENT_REF (MEM_REF (Ta, Cb), FIELD_DECL c), Td)
> 
> on LHS or RHS.  Ta and Td will be part of the replacement.  What should
> I be checking for?

Doh, I thought you were matching gimple stmts that do the address
computation.  But now I see you are matching the tree returned from
get_inner_reference.  So no need to check anything for that case.

But that keeps me wondering what you'll do if the accesses were
all pointer arithmetic, not arrays.  Thus,

extern void foo (int, int, int);

void
f (int *p, unsigned int n)
{
 foo (p[n], p[n+64], p[n+128]);
}

wouldn't that have the same issue and you wouldn't handle it?

Richard.


[PATCH] Remove unnecessary SSA_NAME_DEF_STMT assignments in tree-vect-patterns.c

2011-10-06 Thread Jakub Jelinek
Hi!

If the second argument of gimple_build_assign_with_ops is an SSA_NAME,
gimple_build_assign_with_ops_stat calls gimple_assign_set_lhs
which does
  if (lhs && TREE_CODE (lhs) == SSA_NAME)
SSA_NAME_DEF_STMT (lhs) = gs;
so the SSA_NAME_DEF_STMT assignments in tree-vect-patterns.c aren't needed.
Cleaned up thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
ok for trunk?

2011-10-06  Jakub Jelinek  

* tree-vect-patterns.c (vect_handle_widen_mult_by_const): For lhs
don't set SSA_NAME_DEF_STMT that has been already set by
gimple_build_assign_with_ops.
(vect_recog_pow_pattern, vect_recog_widen_sum_pattern,
vect_operation_fits_smaller_type, vect_recog_over_widening_pattern):
Likewise.

--- gcc/tree-vect-patterns.c.jj 2011-10-06 12:37:34.0 +0200
+++ gcc/tree-vect-patterns.c2011-10-06 13:19:44.0 +0200
@@ -400,7 +400,6 @@ vect_handle_widen_mult_by_const (gimple 
   new_oprnd = make_ssa_name (tmp, NULL);
   new_stmt = gimple_build_assign_with_ops (NOP_EXPR, new_oprnd, *oprnd,
   NULL_TREE);
-  SSA_NAME_DEF_STMT (new_oprnd) = new_stmt;
   STMT_VINFO_RELATED_STMT (vinfo_for_stmt (def_stmt)) = new_stmt;
   VEC_safe_push (gimple, heap, *stmts, def_stmt);
   *oprnd = new_oprnd;
@@ -619,7 +618,6 @@ vect_recog_widen_mult_pattern (VEC (gimp
   var = vect_recog_temp_ssa_var (type, NULL);
   pattern_stmt = gimple_build_assign_with_ops (WIDEN_MULT_EXPR, var, oprnd0,
   oprnd1);
-  SSA_NAME_DEF_STMT (var) = pattern_stmt;
 
   if (vect_print_dump_info (REPORT_DETAILS))
 print_gimple_stmt (vect_dump, pattern_stmt, 0, TDF_SLIM);
@@ -703,7 +701,6 @@ vect_recog_pow_pattern (VEC (gimple, hea
 
   var = vect_recog_temp_ssa_var (TREE_TYPE (base), NULL);
   stmt = gimple_build_assign_with_ops (MULT_EXPR, var, base, base);
-  SSA_NAME_DEF_STMT (var) = stmt;
   return stmt;
 }
 
@@ -826,7 +823,6 @@ vect_recog_widen_sum_pattern (VEC (gimpl
   var = vect_recog_temp_ssa_var (type, NULL);
   pattern_stmt = gimple_build_assign_with_ops (WIDEN_SUM_EXPR, var,
   oprnd0, oprnd1);
-  SSA_NAME_DEF_STMT (var) = pattern_stmt;
 
   if (vect_print_dump_info (REPORT_DETAILS))
 {
@@ -1016,7 +1012,6 @@ vect_operation_fits_smaller_type (gimple
   new_oprnd = make_ssa_name (tmp, NULL);
   new_stmt = gimple_build_assign_with_ops (NOP_EXPR, new_oprnd,
oprnd, NULL_TREE);
-  SSA_NAME_DEF_STMT (new_oprnd) = new_stmt;
   STMT_VINFO_RELATED_STMT (vinfo_for_stmt (def_stmt)) = new_stmt;
   VEC_safe_push (gimple, heap, *stmts, def_stmt);
   oprnd = new_oprnd;
@@ -1038,7 +1033,6 @@ vect_operation_fits_smaller_type (gimple
   new_oprnd = make_ssa_name (tmp, NULL);
   new_stmt = gimple_build_assign_with_ops (NOP_EXPR, new_oprnd,
oprnd, NULL_TREE);
-  SSA_NAME_DEF_STMT (new_oprnd) = new_stmt;
   oprnd = new_oprnd;
   *new_def_stmt = new_stmt;
 }
@@ -1141,9 +1135,9 @@ vect_recog_over_widening_pattern (VEC (g
 VEC_safe_push (gimple, heap, *stmts, prev_stmt);
 
   var = vect_recog_temp_ssa_var (new_type, NULL);
-  pattern_stmt = gimple_build_assign_with_ops (
-  gimple_assign_rhs_code (stmt), var, op0, op1);
-  SSA_NAME_DEF_STMT (var) = pattern_stmt;
+  pattern_stmt
+   = gimple_build_assign_with_ops (gimple_assign_rhs_code (stmt), var,
+   op0, op1);
   STMT_VINFO_RELATED_STMT (vinfo_for_stmt (stmt)) = pattern_stmt;
   STMT_VINFO_PATTERN_DEF_STMT (vinfo_for_stmt (stmt)) = new_def_stmt;
 
@@ -1182,7 +1176,6 @@ vect_recog_over_widening_pattern (VEC (g
   new_oprnd = make_ssa_name (tmp, NULL);
   pattern_stmt = gimple_build_assign_with_ops (NOP_EXPR, new_oprnd,
var, NULL_TREE);
-  SSA_NAME_DEF_STMT (new_oprnd) = pattern_stmt;
   STMT_VINFO_RELATED_STMT (vinfo_for_stmt (use_stmt)) = pattern_stmt;
 
   *type_in = get_vectype_for_scalar_type (new_type);

Jakub


Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs

2011-10-06 Thread Richard Guenther
On Thu, Oct 6, 2011 at 3:49 PM, Michael Matz  wrote:
> Hi,
>
> On Thu, 6 Oct 2011, Richard Guenther wrote:
>
>> > +      && ((TREE_CODE_CLASS (TREE_CODE (arg1)) != tcc_comparison
>> > +          && TREE_CODE (arg1) != TRUTH_NOT_EXPR)
>> > +         || !FLOAT_TYPE_P (TREE_TYPE (TREE_OPERAND (arg1, 0)
>>
>> ?  simple_operand_p would have rejected both ! and comparisons.
>>
>> I miss a test for side-effects on arg0 (and probably simple_operand_p there,
>> as well).
>
> He has it in the if() body.  But why?  The point of ANDIF/ORIF is to not
> evaluate the second argument for side-effects when the first argument is
> false/true already, and further to establish an order between both
> evaluations.  The sideeffect on the first arg is always evaluated.
> AND/OR always evaluate both arguments (in unspecified order), but as he
> checks the second one for being free of side effects already that alone is
> already equivalent to ANDIF/ORIF.  No need to check something on the first
> argument.

It seems to me it should then simply be

  if (!TREE_SIDE_EFFECTS (arg1)
 && simple_operand_p (arg1))
   return fold-the-not-and-variant ();

Richard.


[PATCH] Don't fold always_inline not yet inlined builtins in gimple_fold_builtin

2011-10-06 Thread Jakub Jelinek
Hi!

The 3 functions in builtins.c that dispatch builtin folding give up
if avoid_folding_inline_builtin (fndecl) returns true, because we
want to defer folding those functions until they are inlined (for
-D_FORTIFY_SOURCE their inline bodies contain security checks).  Unfortunately
gimple_fold_builtin calls fold_builtin_str* etc. directly and thus bypasses
this check.  This didn't show up often because most of the inlines
have __restrict arguments and restrict casts weren't considered useless.
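
For context, the kind of always_inline wrapper at stake looks roughly
like this (a simplified glibc-style sketch, not copied from any real
header):

extern __inline __attribute__((__always_inline__, __gnu_inline__)) char *
strcpy (char *__restrict __dest, const char *__restrict __src)
{
  return __builtin___strcpy_chk (__dest, __src,
				 __builtin_object_size (__dest, 1));
}

If gimple_fold_builtin folds the strcpy call before this body is
inlined, the object-size check silently disappears.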

Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
preapproved by richi on IRC, will commit to trunk momentarily.

2011-10-06  Jakub Jelinek  

* tree.h (avoid_folding_inline_builtin): New prototype.
* builtins.c (avoid_folding_inline_builtin): No longer static.
* gimple-fold.c (gimple_fold_builtin): Give up if
avoid_folding_inline_builtin returns true.

--- gcc/tree.h.jj   2011-10-03 14:27:50.0 +0200
+++ gcc/tree.h  2011-10-06 13:26:32.0 +0200
@@ -5352,6 +5352,7 @@ fold_build_pointer_plus_hwi_loc (locatio
fold_build_pointer_plus_hwi_loc (UNKNOWN_LOCATION, p, o)
 
 /* In builtins.c */
+extern bool avoid_folding_inline_builtin (tree);
 extern tree fold_call_expr (location_t, tree, bool);
 extern tree fold_builtin_fputs (location_t, tree, tree, bool, bool, tree);
 extern tree fold_builtin_strcpy (location_t, tree, tree, tree, tree);
--- gcc/builtins.c.jj   2011-10-05 08:13:55.0 +0200
+++ gcc/builtins.c  2011-10-06 13:25:39.0 +0200
@@ -10360,7 +10360,7 @@ fold_builtin_varargs (location_t loc, tr
been inlined, otherwise e.g. -D_FORTIFY_SOURCE checking
might not be performed.  */
 
-static bool
+bool
 avoid_folding_inline_builtin (tree fndecl)
 {
   return (DECL_DECLARED_INLINE_P (fndecl)
--- gcc/gimple-fold.c.jj2011-10-06 09:14:17.0 +0200
+++ gcc/gimple-fold.c   2011-10-06 13:29:08.0 +0200
@@ -828,6 +828,11 @@ gimple_fold_builtin (gimple stmt)
   if (DECL_BUILT_IN_CLASS (callee) == BUILT_IN_MD)
 return NULL_TREE;
 
+  /* Give up for always_inline inline builtins until they are
+ inlined.  */
+  if (avoid_folding_inline_builtin (callee))
+return NULL_TREE;
+
   /* If the builtin could not be folded, and it has no argument list,
  we're done.  */
   nargs = gimple_call_num_args (stmt);

Jakub


[PATCH] Improve vector lowering a bit

2011-10-06 Thread Richard Guenther

This makes us look up previously generated intermediate vector results
when decomposing an operation.
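
For instance, when lowering the addition below on a target without the
corresponding vector support, each element of v must be extracted; the
change lets that extraction look through v's defining CONSTRUCTOR and
use the scalars a and b directly (an illustrative example, not from the
testsuite):

typedef int v4si __attribute__ ((vector_size (16)));

v4si f (int a, int b)
{
  v4si v = (v4si) { a, b, a, b };  /* CONSTRUCTOR feeding the use below */
  return v + v;                    /* decomposed element-wise when lowered */
}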

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2011-10-06  Richard Guenther  

* tree-vect-generic.c (vector_element): Look at previous
generated results.

Index: gcc/tree-vect-generic.c
===================================================================
*** gcc/tree-vect-generic.c (revision 179598)
--- gcc/tree-vect-generic.c (working copy)
*** vector_element (gimple_stmt_iterator *gs
*** 536,541 
--- 536,552 
  idx = build_int_cst (TREE_TYPE (idx), index);
}
  
+   /* When lowering a vector statement sequence do some easy
+  simplification by looking through intermediate vector results.  */
+   if (TREE_CODE (vect) == SSA_NAME)
+   {
+ gimple def_stmt = SSA_NAME_DEF_STMT (vect);
+ if (is_gimple_assign (def_stmt)
+ && (gimple_assign_rhs_code (def_stmt) == VECTOR_CST
+ || gimple_assign_rhs_code (def_stmt) == CONSTRUCTOR))
+   vect = gimple_assign_rhs1 (def_stmt);
+   }
+ 
if (TREE_CODE (vect) == VECTOR_CST)
  {
  unsigned i;


Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs

2011-10-06 Thread Kai Tietz
Hi,

I modified the patch so that it always converts just two leaves of a
TRUTH_(AND|OR)IF chain into a TRUTH_(AND|OR) expression, if branch costs
are high and the leaves are simple and free of side effects.
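
Schematically, with high branch costs

  if (a && b && c)  ...

becomes roughly

  if (a && ((b != 0) & (c != 0)))  ...

i.e. the two rightmost simple, side-effect-free leaves are combined with
a non-short-circuit TRUTH_AND_EXPR, saving one conditional branch (the
dump counts in the testcases below check exactly this).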

Additionally I added some testcases for it.

2011-10-06  Kai Tietz  

* fold-const.c (fold_truth_andor): Convert TRUTH_(AND|OR)IF_EXPR
to TRUTH_OR_EXPR, if suitable.

2011-10-06  Kai Tietz  

* gcc.dg/tree-ssa/ssa-ifbranch-1.c: New test.
* gcc.dg/tree-ssa/ssa-ifbranch-2.c: New test.
* gcc.dg/tree-ssa/ssa-ifbranch-3.c: New test.
* gcc.dg/tree-ssa/ssa-ifbranch-4.c: New test.

Bootstrapped and tested for all languages (including Ada and Obj-C++) on host 
x86_64-unknown-linux-gnu.  OK to apply?

Regards,
Kai

Index: gcc-head/gcc/testsuite/gcc.dg/tree-ssa/ssa-ifbranch-1.c
===================================================================
--- /dev/null
+++ gcc-head/gcc/testsuite/gcc.dg/tree-ssa/ssa-ifbranch-1.c
@@ -0,0 +1,18 @@
+/* Skip on MIPS, S/390, and AVR due to LOGICAL_OP_NON_SHORT_CIRCUIT, and
+   lower values in BRANCH_COST.  */
+/* { dg-do compile { target { ! "mips*-*-* s390*-*-*  avr-*-* mn10300-*-*" } } } */
+/* { dg-options "-O2 -fdump-tree-gimple" } */
+/* { dg-options "-O2 -fdump-tree-gimple -march=i586" { target { i?86-*-* && ilp32 } } } */
+
+extern int doo1 (void);
+extern int doo2 (void);
+
+int bar (int a, int b, int c)
+{
+  if (a && b && c)
+return doo1 ();
+  return doo2 ();
+}
+
+/* { dg-final { scan-tree-dump-times "if " 2 "gimple" } } */
+/* { dg-final { cleanup-tree-dump "gimple" } } */
Index: gcc-head/gcc/fold-const.c
===================================================================
--- gcc-head.orig/gcc/fold-const.c
+++ gcc-head/gcc/fold-const.c
@@ -8387,6 +8387,45 @@ fold_truth_andor (location_t loc, enum t
   if ((tem = fold_truthop (loc, code, type, arg0, arg1)) != 0)
 return tem;
 
+  if ((code == TRUTH_ANDIF_EXPR || code == TRUTH_ORIF_EXPR)
+  && !TREE_SIDE_EFFECTS (arg1)
+  && LOGICAL_OP_NON_SHORT_CIRCUIT
+  /* floats might trap.  */
+  && !FLOAT_TYPE_P (TREE_TYPE (arg1))
+  && ((TREE_CODE_CLASS (TREE_CODE (arg1)) != tcc_comparison
+   && TREE_CODE (arg1) != TRUTH_NOT_EXPR
+   && simple_operand_p (arg1))
+  || ((TREE_CODE_CLASS (TREE_CODE (arg1)) == tcc_comparison
+   || TREE_CODE (arg1) == TRUTH_NOT_EXPR)
+ /* Float comparison might trap.  */
+  && !FLOAT_TYPE_P (TREE_TYPE (TREE_OPERAND (arg1, 0)))
+  && simple_operand_p (TREE_OPERAND (arg1, 0)
+{
+  /* We want to combine truth-comparison for
+((W TRUTH-ANDOR X) TRUTH-ANDORIF Y) TRUTH-ANDORIF Z,
+if Y and Z are simple operands and have no side-effect to
+((W TRUTH-ANDOR X) TRUTH-IF (Y TRUTH-ANDOR Z).  */
+  if (TREE_CODE (arg0) == code
+  && !TREE_SIDE_EFFECTS (TREE_OPERAND (arg0, 1))
+  && simple_operand_p (TREE_OPERAND (arg0, 1)))
+   {
+ tem = build2_loc (loc,
+   (code == TRUTH_ANDIF_EXPR ? TRUTH_AND_EXPR
+ : TRUTH_OR_EXPR),
+   type, TREE_OPERAND (arg0, 1), arg1);
+ return build2_loc (loc, code, type, TREE_OPERAND (arg0, 0),
+tem);
+   }
+  /* Convert X TRUTH-ANDORIF Y to X TRUTH-ANDOR Y, if X and Y
+are simple operands and have no side-effects.  */
+  if (simple_operand_p (arg0)
+  && !TREE_SIDE_EFFECTS (arg0))
+   return build2_loc (loc,
+  (code == TRUTH_ANDIF_EXPR ? TRUTH_AND_EXPR
+: TRUTH_OR_EXPR),
+  type, arg0, arg1);
+}
+
   return NULL_TREE;
 }
 
Index: gcc-head/gcc/testsuite/gcc.dg/tree-ssa/ssa-ifbranch-2.c
===================================================================
--- /dev/null
+++ gcc-head/gcc/testsuite/gcc.dg/tree-ssa/ssa-ifbranch-2.c
@@ -0,0 +1,18 @@
+/* Skip on MIPS, S/390, and AVR due to LOGICAL_OP_NON_SHORT_CIRCUIT, and
+   lower values in BRANCH_COST.  */
+/* { dg-do compile { target { ! "mips*-*-* s390*-*-*  avr-*-* mn10300-*-*" } } } */
+/* { dg-options "-O2 -fdump-tree-gimple" } */
+/* { dg-options "-O2 -fdump-tree-gimple -march=i586" { target { i?86-*-* && ilp32 } } } */
+
+extern int doo1 (void);
+extern int doo2 (void);
+
+int bar (int a, int b, int c, int d)
+{
+  if (a && b && c && d)
+return doo1 ();
+  return doo2 ();
+}
+
+/* { dg-final { scan-tree-dump-times "if " 2 "gimple" } } */
+/* { dg-final { cleanup-tree-dump "gimple" } } */
Index: gcc-head/gcc/testsuite/gcc.dg/tree-ssa/ssa-ifbranch-3.c
===================================================================
--- /dev/null
+++ gcc-head/gcc/testsuite/gcc.dg/tree-ssa/ssa-ifbranch-3.c
@@ -0,0 +1,18 @@
+/* Skip on MIPS, S/390, and AVR due to LOGICAL_OP_NON_SHORT_CIRCUIT, and
+   lower values in BRANCH_COST.  */
+/* { dg-do compile { target { ! "mips*-*-* s39

Re: [PATCH] Remove unnecessary SSA_NAME_DEF_STMT assignments in tree-vect-patterns.c

2011-10-06 Thread Richard Guenther
On Thu, 6 Oct 2011, Jakub Jelinek wrote:

> Hi!
> 
> If the second argument of gimple_build_assign_with_ops is an SSA_NAME,
> gimple_build_assign_with_ops_stat calls gimple_assign_set_lhs
> which does
>   if (lhs && TREE_CODE (lhs) == SSA_NAME)
> SSA_NAME_DEF_STMT (lhs) = gs;
> so the SSA_NAME_DEF_STMT assignments in tree-vect-patterns.c aren't needed.
> Cleaned up thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
> ok for trunk?

Ok.

Thanks,
Richard.

> 2011-10-06  Jakub Jelinek  
> 
>   * tree-vect-patterns.c (vect_handle_widen_mult_by_const): For lhs
>   don't set SSA_NAME_DEF_STMT that has been already set by
>   gimple_build_assign_with_ops.
>   (vect_recog_pow_pattern, vect_recog_widen_sum_pattern,
>   vect_operation_fits_smaller_type, vect_recog_over_widening_pattern):
>   Likewise.
> 
> --- gcc/tree-vect-patterns.c.jj   2011-10-06 12:37:34.0 +0200
> +++ gcc/tree-vect-patterns.c  2011-10-06 13:19:44.0 +0200
> @@ -400,7 +400,6 @@ vect_handle_widen_mult_by_const (gimple 
>new_oprnd = make_ssa_name (tmp, NULL);
>new_stmt = gimple_build_assign_with_ops (NOP_EXPR, new_oprnd, *oprnd,
>  NULL_TREE);
> -  SSA_NAME_DEF_STMT (new_oprnd) = new_stmt;
>STMT_VINFO_RELATED_STMT (vinfo_for_stmt (def_stmt)) = new_stmt;
>VEC_safe_push (gimple, heap, *stmts, def_stmt);
>*oprnd = new_oprnd;
> @@ -619,7 +618,6 @@ vect_recog_widen_mult_pattern (VEC (gimp
>var = vect_recog_temp_ssa_var (type, NULL);
>pattern_stmt = gimple_build_assign_with_ops (WIDEN_MULT_EXPR, var, oprnd0,
>  oprnd1);
> -  SSA_NAME_DEF_STMT (var) = pattern_stmt;
>  
>if (vect_print_dump_info (REPORT_DETAILS))
>  print_gimple_stmt (vect_dump, pattern_stmt, 0, TDF_SLIM);
> @@ -703,7 +701,6 @@ vect_recog_pow_pattern (VEC (gimple, hea
>  
>var = vect_recog_temp_ssa_var (TREE_TYPE (base), NULL);
>stmt = gimple_build_assign_with_ops (MULT_EXPR, var, base, base);
> -  SSA_NAME_DEF_STMT (var) = stmt;
>return stmt;
>  }
>  
> @@ -826,7 +823,6 @@ vect_recog_widen_sum_pattern (VEC (gimpl
>var = vect_recog_temp_ssa_var (type, NULL);
>pattern_stmt = gimple_build_assign_with_ops (WIDEN_SUM_EXPR, var,
>  oprnd0, oprnd1);
> -  SSA_NAME_DEF_STMT (var) = pattern_stmt;
>  
>if (vect_print_dump_info (REPORT_DETAILS))
>  {
> @@ -1016,7 +1012,6 @@ vect_operation_fits_smaller_type (gimple
>new_oprnd = make_ssa_name (tmp, NULL);
>new_stmt = gimple_build_assign_with_ops (NOP_EXPR, new_oprnd,
> oprnd, NULL_TREE);
> -  SSA_NAME_DEF_STMT (new_oprnd) = new_stmt;
>STMT_VINFO_RELATED_STMT (vinfo_for_stmt (def_stmt)) = new_stmt;
>VEC_safe_push (gimple, heap, *stmts, def_stmt);
>oprnd = new_oprnd;
> @@ -1038,7 +1033,6 @@ vect_operation_fits_smaller_type (gimple
>new_oprnd = make_ssa_name (tmp, NULL);
>new_stmt = gimple_build_assign_with_ops (NOP_EXPR, new_oprnd,
> oprnd, NULL_TREE);
> -  SSA_NAME_DEF_STMT (new_oprnd) = new_stmt;
>oprnd = new_oprnd;
>*new_def_stmt = new_stmt;
>  }
> @@ -1141,9 +1135,9 @@ vect_recog_over_widening_pattern (VEC (g
>  VEC_safe_push (gimple, heap, *stmts, prev_stmt);
>  
>var = vect_recog_temp_ssa_var (new_type, NULL);
> -  pattern_stmt = gimple_build_assign_with_ops (
> -  gimple_assign_rhs_code (stmt), var, op0, op1);
> -  SSA_NAME_DEF_STMT (var) = pattern_stmt;
> +  pattern_stmt
> + = gimple_build_assign_with_ops (gimple_assign_rhs_code (stmt), var,
> + op0, op1);
>STMT_VINFO_RELATED_STMT (vinfo_for_stmt (stmt)) = pattern_stmt;
>STMT_VINFO_PATTERN_DEF_STMT (vinfo_for_stmt (stmt)) = new_def_stmt;
>  
> @@ -1182,7 +1176,6 @@ vect_recog_over_widening_pattern (VEC (g
>new_oprnd = make_ssa_name (tmp, NULL);
>pattern_stmt = gimple_build_assign_with_ops (NOP_EXPR, new_oprnd,
> var, NULL_TREE);
> -  SSA_NAME_DEF_STMT (new_oprnd) = pattern_stmt;
>STMT_VINFO_RELATED_STMT (vinfo_for_stmt (use_stmt)) = pattern_stmt;
>  
>*type_in = get_vectype_for_scalar_type (new_type);
> 
>   Jakub
> 
> 

-- 
Richard Guenther 
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer

[PATCH] Don't handle CAST_RESTRICT (PR tree-optimization/49279)

2011-10-06 Thread Jakub Jelinek
Hi!

CAST_RESTRICT based disambiguation unfortunately isn't reliable,
e.g. to store a non-restrict pointer into a restricted field,
we add a non-useless cast to restricted pointer in the gimplifier,
and while we don't consider that field to have a special restrict tag
because it is unsafe to do so, we unfortunately create it for the
CAST_RESTRICT before that and end up with different restrict tags
for the same thing.  See the PR for more details.
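
The problematic shape, reduced to its core (the full testcase is in the
patch below):

struct S { int a; int *__restrict p; };

void g (struct S *s, int *q)
{
  s->p = q;  /* the gimplifier inserts a cast of q to int *__restrict */
}

The cast gets its own restrict tag even though the stored pointer may
alias q, so two accesses through what is really the same pointer can be
wrongly disambiguated.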

This patch turns off CAST_RESTRICT handling for now; in the future
we might try to replace it with explicit CAST_RESTRICT stmts in some form,
but need to solve problems with multiple inlined copies of the same function
with restrict arguments or restrict variables in it and intermixed code from
them (or similarly code from different non-overlapping source blocks).

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
4.6 too?

2011-10-06  Jakub Jelinek  

PR tree-optimization/49279
* tree-ssa-structalias.c (find_func_aliases): Don't handle
CAST_RESTRICT.
* tree-ssa-forwprop.c (forward_propagate_addr_expr_1): Allow
restrict propagation.
* tree-ssa.c (useless_type_conversion_p): Don't return false
if TYPE_RESTRICT differs.

* gcc.dg/tree-ssa/restrict-4.c: XFAIL.
* gcc.c-torture/execute/pr49279.c: New test.

--- gcc/tree-ssa-structalias.c.jj   2011-10-04 10:18:29.0 +0200
+++ gcc/tree-ssa-structalias.c  2011-10-05 12:43:42.0 +0200
@@ -4494,15 +4494,6 @@ find_func_aliases (gimple origt)
  && (!in_ipa_mode
  || DECL_EXTERNAL (lhsop) || TREE_PUBLIC (lhsop)))
make_escape_constraint (rhsop);
-  /* If this is a conversion of a non-restrict pointer to a
-restrict pointer track it with a new heapvar.  */
-  else if (gimple_assign_cast_p (t)
-  && POINTER_TYPE_P (TREE_TYPE (rhsop))
-  && POINTER_TYPE_P (TREE_TYPE (lhsop))
-  && !TYPE_RESTRICT (TREE_TYPE (rhsop))
-  && TYPE_RESTRICT (TREE_TYPE (lhsop)))
-   make_constraint_from_restrict (get_vi_for_tree (lhsop),
-  "CAST_RESTRICT");
 }
   /* Handle escapes through return.  */
   else if (gimple_code (t) == GIMPLE_RETURN
--- gcc/tree-ssa-forwprop.c.jj  2011-10-04 14:36:00.0 +0200
+++ gcc/tree-ssa-forwprop.c 2011-10-05 12:46:32.0 +0200
@@ -804,11 +804,6 @@ forward_propagate_addr_expr_1 (tree name
   && ((rhs_code == SSA_NAME && rhs == name)
  || CONVERT_EXPR_CODE_P (rhs_code)))
 {
-  /* Don't propagate restrict pointer's RHS.  */
-  if (TYPE_RESTRICT (TREE_TYPE (lhs))
- && !TYPE_RESTRICT (TREE_TYPE (name))
- && !is_gimple_min_invariant (def_rhs))
-   return false;
   /* Only recurse if we don't deal with a single use or we cannot
 do the propagation to the current statement.  In particular
 we can end up with a conversion needed for a non-invariant
--- gcc/tree-ssa.c.jj   2011-09-15 12:18:54.0 +0200
+++ gcc/tree-ssa.c  2011-10-05 12:44:52.0 +0200
@@ -1270,12 +1270,6 @@ useless_type_conversion_p (tree outer_ty
  != TYPE_ADDR_SPACE (TREE_TYPE (inner_type)))
return false;
 
-  /* Do not lose casts to restrict qualified pointers.  */
-  if ((TYPE_RESTRICT (outer_type)
-  != TYPE_RESTRICT (inner_type))
- && TYPE_RESTRICT (outer_type))
-   return false;
-
   /* If the outer type is (void *), the conversion is not necessary.  */
   if (VOID_TYPE_P (TREE_TYPE (outer_type)))
return true;
--- gcc/testsuite/gcc.dg/tree-ssa/restrict-4.c.jj	2011-10-04 14:33:08.0 +0200
+++ gcc/testsuite/gcc.dg/tree-ssa/restrict-4.c	2011-10-05 16:22:33.232433231 +0200
@@ -22,5 +22,5 @@ bar (int *x, int y)
   return p1[y];
 }
 
-/* { dg-final { scan-tree-dump-times "return 1;" 2 "optimized" } } */
+/* { dg-final { scan-tree-dump-times "return 1;" 2 "optimized" { xfail *-*-* } 
} } */
 /* { dg-final { cleanup-tree-dump "optimized" } } */
--- gcc/testsuite/gcc.c-torture/execute/pr49279.c.jj	2011-10-05 13:32:43.087670846 +0200
+++ gcc/testsuite/gcc.c-torture/execute/pr49279.c	2011-10-05 13:32:43.087670846 +0200
@@ -0,0 +1,35 @@
+/* PR tree-optimization/49279 */
+extern void abort (void);
+
+struct S { int a; int *__restrict p; };
+
+__attribute__((noinline, noclone))
+struct S *bar (struct S *p)
+{
+  struct S *r;
+  asm volatile ("" : "=r" (r) : "0" (p) : "memory");
+  return r;
+}
+
+__attribute__((noinline, noclone))
+int
+foo (int *p, int *q)
+{
+  struct S s, *t;
+  s.a = 1;
+  s.p = p;
+  t = bar (&s);
+  t->p = q;
+  s.p[0] = 0;
+  t->p[0] = 1;
+  return s.p[0];
+}
+
+int
+main ()
+{
+  int a, b;
+  if (foo (&a, &b) != 1)
+abort ();
+  return 0;
+}

Jakub


Re: Builtin infrastructure change

2011-10-06 Thread Michael Meissner
On Thu, Oct 06, 2011 at 03:23:07PM +0200, Tobias Burnus wrote:
> On 10/06/2011 03:02 PM, Michael Meissner wrote:
> >On the x86 (with Fedora 13), I built and tested the C, C++, Objective C, 
> >Java, Ada,
> >and Go languages with no regressions
> 
> >On a power6 box with RHEL 6.1, I
> >have done the same for C, C++, Objective C, Java, and Ada languages with no
> >regressions.
> 
> Any reason for not building and testing Fortran? Especially as you
> patch gcc/fortran/{trans*.c,f95-lang.c}?
> 
> Tobias

Brain fault on my part.  I tested the previous set of patches with Fortran.
Since I had to explicitly add the languages to pick up Ada and Go, I seem to
have dropped Fortran.  Sigh.  Sorry about that.  I just started the powerpc
bootstrap, since that is a lot faster.

-- 
Michael Meissner, IBM
5 Technology Place Drive, M/S 2757, Westford, MA 01886-3141, USA
meiss...@linux.vnet.ibm.com fax +1 (978) 399-6899


[v3] Avoid spurious fails when running the testsuite with -std=gnu++0x

2011-10-06 Thread Paolo Carlini

Hi,

tested x86_64-linux, committed.

Paolo.


2011-10-06  Paolo Carlini  

* testsuite/27_io/ios_base/cons/assign_neg.cc: Tidy dg- directives,
for C++0x testing too.
* testsuite/27_io/ios_base/cons/copy_neg.cc: Likewise.
* testsuite/ext/pb_ds/example/hash_resize_neg.cc: Likewise.
* testsuite/24_iterators/istreambuf_iterator/requirements/
base_classes.cc: Adjust for C++0x testing.
* testsuite/ext/codecvt/char-1.cc: Avoid warnings in C++0x mode.
* testsuite/ext/codecvt/char-2.cc: Likewise.
* testsuite/ext/codecvt/wchar_t.cc: Likewise.
Index: testsuite/27_io/ios_base/cons/assign_neg.cc
===================================================================
--- testsuite/27_io/ios_base/cons/assign_neg.cc (revision 179595)
+++ testsuite/27_io/ios_base/cons/assign_neg.cc (working copy)
@@ -18,21 +18,18 @@
 // with this library; see the file COPYING3.  If not see
 // <http://www.gnu.org/licenses/>.
 
-
 #include <ios>
 
 // Library defect report
 //50.  Copy constructor and assignment operator of ios_base
-class test_base : public std::ios_base { };
+class test_base : public std::ios_base { }; // { dg-error "within this context|deleted" }
 
 void test01()
 {
   // assign
   test_base io1;
   test_base io2;
-  io1 = io2;
+  io1 = io2; // { dg-error "synthesized|deleted" }
 }
-// { dg-error "synthesized" "" { target *-*-* } 33 } 
-// { dg-error "within this context" "" { target *-*-* } 26 } 
-// { dg-error "is private" "" { target *-*-* } 791 }
-// { dg-error "operator=" "" { target *-*-* } 0 } 
+
+// { dg-prune-output "include" }
Index: testsuite/27_io/ios_base/cons/copy_neg.cc
===================================================================
--- testsuite/27_io/ios_base/cons/copy_neg.cc   (revision 179595)
+++ testsuite/27_io/ios_base/cons/copy_neg.cc   (working copy)
@@ -18,21 +18,18 @@
 // with this library; see the file COPYING3.  If not see
 // <http://www.gnu.org/licenses/>.
 
-
 #include <ios>
 
 // Library defect report
 //50.  Copy constructor and assignment operator of ios_base
-struct test_base : public std::ios_base 
+struct test_base : public std::ios_base // { dg-error "within this context|deleted" }
 { };
 
 void test02()
 {
   // copy ctor
   test_base io1;
-  test_base io2 = io1; 
+  test_base io2 = io1; // { dg-error "synthesized|deleted" } 
 }
-// { dg-error "within this context" "" { target *-*-* } 26 }
-// { dg-error "synthesized" "" { target *-*-* } 33 } 
-// { dg-error "is private" "" { target *-*-* } 788 } 
-// { dg-error "copy constructor" "" { target *-*-* } 0 } 
+
+// { dg-prune-output "include" }
Index: testsuite/24_iterators/istreambuf_iterator/requirements/base_classes.cc
===================================================================
--- testsuite/24_iterators/istreambuf_iterator/requirements/base_classes.cc	(revision 179595)
+++ testsuite/24_iterators/istreambuf_iterator/requirements/base_classes.cc	(working copy)
@@ -1,7 +1,8 @@
 // { dg-do compile }
 // 1999-06-28 bkoz
 
-// Copyright (C) 1999, 2001, 2003, 2009 Free Software Foundation, Inc.
+// Copyright (C) 1999, 2001, 2003, 2009, 2010, 2011
+// Free Software Foundation, Inc.
 //
 // This file is part of the GNU ISO C++ Library.  This library is free
 // software; you can redistribute it and/or modify it under the
@@ -31,8 +32,15 @@
   // Check for required base class.
   typedef istreambuf_iterator<char> test_iterator;
   typedef char_traits<char>::off_type off_type;
-  typedef iterator<input_iterator_tag, char, off_type, char*, char&> base_iterator;
 
+  typedef iterator<input_iterator_tag, char, off_type, char*,
+#ifdef __GXX_EXPERIMENTAL_CXX0X__
+                   char>
+#else
+                   char&>
+#endif
+                   base_iterator;
+
   istringstream isstream("this tag");
   test_iterator  r_it(isstream);
   base_iterator* base __attribute__((unused)) = &r_it;
Index: testsuite/ext/pb_ds/example/hash_resize_neg.cc
===
--- testsuite/ext/pb_ds/example/hash_resize_neg.cc  (revision 179595)
+++ testsuite/ext/pb_ds/example/hash_resize_neg.cc  (working copy)
@@ -1,7 +1,8 @@
 // { dg-do compile }
 // -*- C++ -*-
 
-// Copyright (C) 2005, 2006, 2007, 2009 Free Software Foundation, Inc.
+// Copyright (C) 2005, 2006, 2007, 2009, 2010, 2011
+// Free Software Foundation, Inc.
 //
 // This file is part of the GNU ISO C++ Library.  This library is free
 // software; you can redistribute it and/or modify it under the terms
@@ -60,4 +61,4 @@
   h.resize(20); // { dg-error "required from" }
 }
 
-// { dg-error "invalid" "" { target *-*-* } 187 } 
+// { dg-prune-output "include" }
Index: testsuite/ext/codecvt/char-1.cc
===================================================================
--- testsuite/ext/codecvt/char-1.cc (revision 179595)
+++ testsuite/ext/codecvt/char-1.cc (working copy)
@@ -4,6 +4,7 @@
 // 2000-08-22 Benjamin Kosnik 
 
 // Copyright (C) 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2009
+// 2010, 2011
 // Free Software Foundation
 //
 // This file is part of the GNU ISO C++ Library.  This library is free
@@ -69,10 +70,14 @@
 
 

Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs

2011-10-06 Thread Kai Tietz
2011/10/6 Michael Matz :
> Hi,
>
> On Thu, 6 Oct 2011, Richard Guenther wrote:
>
>> > +      && ((TREE_CODE_CLASS (TREE_CODE (arg1)) != tcc_comparison
>> > +          && TREE_CODE (arg1) != TRUTH_NOT_EXPR)
>> > +         || !FLOAT_TYPE_P (TREE_TYPE (TREE_OPERAND (arg1, 0)
>>
>> ?  simple_operand_p would have rejected both ! and comparisons.
>>
>> I miss a test for side-effects on arg0 (and probably simple_operand_p there,
>> as well).
>
> He has it in the if() body.  But why?  The point of ANDIF/ORIF is to not
> evaluate the second argument for side-effects when the first argument is
> false/true already, and further to establish an order between both
> evaluations.  The sideeffect on the first arg is always evaluated.
> AND/OR always evaluate both arguments (in unspecified order), but as he
> checks the second one for being free of side effects already that alone is
> already equivalent to ANDIF/ORIF.  No need to check something on the first
> argument.
>
>
> Ciao,
> Michael.


That's not the whole story.  The difference between
TRUTH_(AND|OR)IF_EXPR and TRUTH_(AND|OR)_EXPR is that for
TRUTH_(AND|OR)IF_EXPR the gimplifier creates a COND expression, but for
TRUTH_(AND|OR)_EXPR it doesn't.
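
To illustrate (a minimal sketch, not from the patch): for

  int f (void);
  int g (void);
  int use_andif (void) { return f () && g (); }  /* TRUTH_ANDIF_EXPR */

the gimplifier emits a conditional branch, roughly

  _1 = f ();
  if (_1 != 0) goto <evaluate g>; else goto <done>;

so g () is only called when f () returned nonzero, whereas a
TRUTH_AND_EXPR would evaluate both operands unconditionally and combine
the two truth values without a branch.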

Regards,
Kai


Re: [PATCH] Don't handle CAST_RESTRICT (PR tree-optimization/49279)

2011-10-06 Thread Richard Guenther
On Thu, 6 Oct 2011, Jakub Jelinek wrote:

> Hi!
> 
> CAST_RESTRICT based disambiguation unfortunately isn't reliable,
> e.g. to store a non-restrict pointer into a restricted field,
> we add a non-useless cast to restricted pointer in the gimplifier,
> and while we don't consider that field to have a special restrict tag
> because it is unsafe to do so, we unfortunately create it for the
> CAST_RESTRICT before that and end up with different restrict tags
> for the same thing.  See the PR for more details.
> 
> This patch turns off CAST_RESTRICT handling for now, in the future
> we might try to replace it by explicit CAST_RESTRICT stmts in some form,
> but need to solve problems with multiple inlined copies of the same function
> with restrict arguments or restrict variables in it and intermixed code from
> them (or similarly code from different non-overlapping source blocks).
> 
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
> 4.6 too?

Ok for trunk.  Ok for 4.6 with the tree-ssa.c change omitted -
and the stmt folding patch applied.

Thanks,
Richard.

> 2011-10-06  Jakub Jelinek  
> 
>   PR tree-optimization/49279
>   * tree-ssa-structalias.c (find_func_aliases): Don't handle
>   CAST_RESTRICT.
>   * tree-ssa-forwprop.c (forward_propagate_addr_expr_1): Allow
>   restrict propagation.
>   * tree-ssa.c (useless_type_conversion_p): Don't return false
>   if TYPE_RESTRICT differs.
> 
>   * gcc.dg/tree-ssa/restrict-4.c: XFAIL.
>   * gcc.c-torture/execute/pr49279.c: New test.
> 
> --- gcc/tree-ssa-structalias.c.jj 2011-10-04 10:18:29.0 +0200
> +++ gcc/tree-ssa-structalias.c2011-10-05 12:43:42.0 +0200
> @@ -4494,15 +4494,6 @@ find_func_aliases (gimple origt)
> && (!in_ipa_mode
> || DECL_EXTERNAL (lhsop) || TREE_PUBLIC (lhsop)))
>   make_escape_constraint (rhsop);
> -  /* If this is a conversion of a non-restrict pointer to a
> -  restrict pointer track it with a new heapvar.  */
> -  else if (gimple_assign_cast_p (t)
> -&& POINTER_TYPE_P (TREE_TYPE (rhsop))
> -&& POINTER_TYPE_P (TREE_TYPE (lhsop))
> -&& !TYPE_RESTRICT (TREE_TYPE (rhsop))
> -&& TYPE_RESTRICT (TREE_TYPE (lhsop)))
> - make_constraint_from_restrict (get_vi_for_tree (lhsop),
> -"CAST_RESTRICT");
>  }
>/* Handle escapes through return.  */
>else if (gimple_code (t) == GIMPLE_RETURN
> --- gcc/tree-ssa-forwprop.c.jj2011-10-04 14:36:00.0 +0200
> +++ gcc/tree-ssa-forwprop.c   2011-10-05 12:46:32.0 +0200
> @@ -804,11 +804,6 @@ forward_propagate_addr_expr_1 (tree name
>&& ((rhs_code == SSA_NAME && rhs == name)
> || CONVERT_EXPR_CODE_P (rhs_code)))
>  {
> -  /* Don't propagate restrict pointer's RHS.  */
> -  if (TYPE_RESTRICT (TREE_TYPE (lhs))
> -   && !TYPE_RESTRICT (TREE_TYPE (name))
> -   && !is_gimple_min_invariant (def_rhs))
> - return false;
>/* Only recurse if we don't deal with a single use or we cannot
>do the propagation to the current statement.  In particular
>we can end up with a conversion needed for a non-invariant
> --- gcc/tree-ssa.c.jj 2011-09-15 12:18:54.0 +0200
> +++ gcc/tree-ssa.c2011-10-05 12:44:52.0 +0200
> @@ -1270,12 +1270,6 @@ useless_type_conversion_p (tree outer_ty
> != TYPE_ADDR_SPACE (TREE_TYPE (inner_type)))
>   return false;
>  
> -  /* Do not lose casts to restrict qualified pointers.  */
> -  if ((TYPE_RESTRICT (outer_type)
> -!= TYPE_RESTRICT (inner_type))
> -   && TYPE_RESTRICT (outer_type))
> - return false;
> -
>/* If the outer type is (void *), the conversion is not necessary.  */
>if (VOID_TYPE_P (TREE_TYPE (outer_type)))
>   return true;
> --- gcc/testsuite/gcc.dg/tree-ssa/restrict-4.c.jj	2011-10-04 14:33:08.0 +0200
> +++ gcc/testsuite/gcc.dg/tree-ssa/restrict-4.c	2011-10-05 16:22:33.232433231 +0200
> @@ -22,5 +22,5 @@ bar (int *x, int y)
>return p1[y];
>  }
>  
> -/* { dg-final { scan-tree-dump-times "return 1;" 2 "optimized" } } */
> +/* { dg-final { scan-tree-dump-times "return 1;" 2 "optimized" { xfail *-*-* 
> } } } */
>  /* { dg-final { cleanup-tree-dump "optimized" } } */
> --- gcc/testsuite/gcc.c-torture/execute/pr49279.c.jj	2011-10-05 13:32:43.087670846 +0200
> +++ gcc/testsuite/gcc.c-torture/execute/pr49279.c	2011-10-05 13:32:43.087670846 +0200
> @@ -0,0 +1,35 @@
> +/* PR tree-optimization/49279 */
> +extern void abort (void);
> +
> +struct S { int a; int *__restrict p; };
> +
> +__attribute__((noinline, noclone))
> +struct S *bar (struct S *p)
> +{
> +  struct S *r;
> +  asm volatile ("" : "=r" (r) : "0" (p) : "memory");
> +  return r;
> +}
> +
> +__attribute__((noinline, noclone))
> +int
> +foo (int *p, int *q)
> +{
> +  struct S s, *t;
> +  s.a = 1;
> +  s.p = p;
> +  t = bar (&s);
> 

Re: rfa: remove get_var_ann (was: Fix PR50260)

2011-10-06 Thread Michael Matz
Hi,

On Sat, 3 Sep 2011, Richard Guenther wrote:

> > OTOH it's a nice invariant that can actually be checked for (that all 
> > reachable vars whatsoever have to be in referenced_vars), so I'm going 
> > to do that.
> 
> Yes, until we get rid of referenced_vars (which we still should do at 
> some point...) that's the best.

Okay, like so then.  Regstrapped on x86_64-linux.  (Note that sometimes I 
use add_referenced_vars, and sometimes find_referenced_vars_in, the latter 
when I would have to add several add_referenced_vars for one statement).

> IIRC we have some verification code even, and wonder why it doesn't 
> trigger.

Nope, we don't.  But with the patch we segfault in case this happens 
again, which is good enough checking for me.


Ciao,
Michael.

* tree-flow.h (get_var_ann): Don't declare.
* tree-flow-inline.h (get_var_ann): Remove.
(set_is_used): Use var_ann, not get_var_ann.
* tree-dfa.c (add_referenced_var): Inline body of get_var_ann.
* tree-profile.c (gimple_gen_edge_profiler): Call
find_referenced_vars_in.
(gimple_gen_interval_profiler): Ditto.
(gimple_gen_pow2_profiler): Ditto.
(gimple_gen_one_value_profiler): Ditto.
(gimple_gen_average_profiler): Ditto.
(gimple_gen_ior_profiler): Ditto.
(gimple_gen_ic_profiler): Ditto plus call add_referenced_var.
(gimple_gen_ic_func_profiler): Call add_referenced_var.
* tree-mudflap.c (execute_mudflap_function_ops): Call
add_referenced_var.

Index: tree-flow.h
===
--- tree-flow.h (revision 178488)
+++ tree-flow.h (working copy)
@@ -278,7 +278,6 @@ typedef struct immediate_use_iterator_d
 typedef struct var_ann_d *var_ann_t;
 
 static inline var_ann_t var_ann (const_tree);
-static inline var_ann_t get_var_ann (tree);
 static inline void update_stmt (gimple);
 static inline int get_lineno (const_gimple);
 
Index: tree-flow-inline.h
===
--- tree-flow-inline.h  (revision 178488)
+++ tree-flow-inline.h  (working copy)
@@ -145,16 +145,6 @@ var_ann (const_tree t)
   return p ? *p : NULL;
 }
 
-/* Return the variable annotation for T, which must be a _DECL node.
-   Create the variable annotation if it doesn't exist.  */
-static inline var_ann_t
-get_var_ann (tree var)
-{
-  var_ann_t *p = DECL_VAR_ANN_PTR (var);
-  gcc_checking_assert (p);
-  return *p ? *p : create_var_ann (var);
-}
-
 /* Get the number of the next statement uid to be allocated.  */
 static inline unsigned int
 gimple_stmt_max_uid (struct function *fn)
@@ -568,7 +558,7 @@ phi_arg_index_from_use (use_operand_p us
 static inline void
 set_is_used (tree var)
 {
-  var_ann_t ann = get_var_ann (var);
+  var_ann_t ann = var_ann (var);
   ann->used = true;
 }
 
Index: tree-dfa.c
===
--- tree-dfa.c  (revision 178488)
+++ tree-dfa.c  (working copy)
@@ -580,8 +580,9 @@ set_default_def (tree var, tree def)
 bool
 add_referenced_var (tree var)
 {
-  get_var_ann (var);
   gcc_assert (DECL_P (var));
+  if (!*DECL_VAR_ANN_PTR (var))
+create_var_ann (var);
 
   /* Insert VAR into the referenced_vars hash table if it isn't present.  */
   if (referenced_var_check_and_insert (var))
Index: tree-profile.c
===
--- tree-profile.c  (revision 178408)
+++ tree-profile.c  (working copy)
@@ -224,6 +224,7 @@ gimple_gen_edge_profiler (int edgeno, ed
   one = build_int_cst (gcov_type_node, 1);
   stmt1 = gimple_build_assign (gcov_type_tmp_var, ref);
   gimple_assign_set_lhs (stmt1, make_ssa_name (gcov_type_tmp_var, stmt1));
+  find_referenced_vars_in (stmt1);
   stmt2 = gimple_build_assign_with_ops (PLUS_EXPR, gcov_type_tmp_var,
gimple_assign_lhs (stmt1), one);
   gimple_assign_set_lhs (stmt2, make_ssa_name (gcov_type_tmp_var, stmt2));
@@ -270,6 +271,7 @@ gimple_gen_interval_profiler (histogram_
   val = prepare_instrumented_value (&gsi, value);
   call = gimple_build_call (tree_interval_profiler_fn, 4,
ref_ptr, val, start, steps);
+  find_referenced_vars_in (call);
   gsi_insert_before (&gsi, call, GSI_NEW_STMT);
 }
 
@@ -290,6 +292,7 @@ gimple_gen_pow2_profiler (histogram_valu
  true, NULL_TREE, true, GSI_SAME_STMT);
   val = prepare_instrumented_value (&gsi, value);
   call = gimple_build_call (tree_pow2_profiler_fn, 2, ref_ptr, val);
+  find_referenced_vars_in (call);
   gsi_insert_before (&gsi, call, GSI_NEW_STMT);
 }
 
@@ -310,6 +313,7 @@ gimple_gen_one_value_profiler (histogram
  true, NULL_TREE, true, GSI_SAME_STMT);
   val = prepare_instrumented_value (&gsi, value);
   call = gimple_build_call (tree_one_value_profiler_fn, 2, ref_ptr, val);
+  find_referenced_vars_in (call);
   gs

[PATCH][ARM] Fix broken shift patterns

2011-10-06 Thread Andrew Stubbs

This patch is a follow-up both to my patches here:

  http://gcc.gnu.org/ml/gcc-patches/2011-09/msg00049.html

and Paul Brook's patch here:

  http://gcc.gnu.org/ml/gcc-patches/2011-09/msg01076.html

The patch fixes both the original problem, in which negative shift 
constants caused an ICE (pr50193), and the problem introduced by Paul's 
patch, in which "a*64+b" is not properly optimized.


However, it does not attempt to fix Richard Sandiford's observation that 
there may be a latent problem with the 'M' constraint which could lead 
reload to cause a recog ICE.


I believe this patch to be nothing but an improvement over the current 
state, and that a fix to the constraint problem should be a separate patch.


On that basis, am I OK to commit?



Now, let me explain the other problem:

As it stands, all shift-related patterns, for ARM or Thumb2 mode, permit 
a shift to be expressed as either a shift type and amount (register or 
constant), or as a multiply and power-of-two constant.


This is necessary because the canonical form of (plus (ashift x y) z) 
appears to be (plus (mult x 2^y) z), presumably for the benefit of 
multiply-and-accumulate optimizations. (Minus is similarly affected, but 
other shiftable operations are unaffected, and this only applies to left 
shifts, of course.)
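
As a concrete sketch of the "a*64+b" case (illustration only, not part
of the patch): for

  int f (int a, int b) { return a * 64 + b; }

the backend sees the canonical multiply form

  (set (reg r0) (plus:SI (mult:SI (reg ra) (const_int 64)) (reg rb)))

and the shift patterns must still match it so that we emit the single
shifted-operand instruction "add r0, rb, ra, lsl #6" instead of a
separate shift and add.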


The (possible) problem is that the meanings of the constants for mult 
and ashift are very different, but the arm.md file has these unified 
into a single pattern using a single 'M' constraint that must allow both 
types of constant unconditionally. This is safe for the vast majority of 
passes because they check recog before they make a change, and anyway 
don't make changes without understanding the logic. But, reload has a 
feature where it can pull a constant from a register, and convert it to 
an immediate, if the constraints allow, but crucially, it doesn't check 
the predicates; no doubt it shouldn't need to, but the ARM port appears 
to be breaking the rules.


Problem scenario 1:

  Consider pattern (plus (mult r1 r2) r3).

  It so happens that reload knows that r2 contains a constant, say 20,
  so reload checks to see if that could be converted to an immediate.
  Now, 20 is not a power of two, so recog would reject it, but it is in
  the range 0..31 so it does match the 'M' constraint. Oops!

Problem scenario 2:

  Consider pattern (ashiftrt r1 r2).

  Again, it so happens that reload knows that r2 contains a constant, in
  this case let's say 64, so again reload checks to see if that could
  be converted to an immediate. This time, 64 is not in the range
  0..31, so recog would reject it, but it is a power of two, so it does
  match the 'M' constraint. Again, oops!

I see two ways to fix this properly:

 1. Duplicate all the patterns in the machine description, once for the
mult case, and once for the other cases. This could probably be
done with a code iterator, if preferred.

 2. Redefine the canonical form of (plus (mult .. 2^n) ..) such that it
always uses the (presumably cheaper) shift-and-add option. However,
this would require all other targets where madd really is the best
option to fix it up. (I'd imagine that two instructions for shift
and add would be cheaper speed wise, if properly scheduled, on most
targets? That doesn't help the size optimization though.)

However, it's not obvious to me that this needs fixing:
 * The failure mode would be an ICE, and we've not seen any.
 * There's a comment in arm.c:shift_op that suggests that this can't
   happen, somehow, at least in the mult case.
   - I'm not sure exactly how reload works, but it seems reasonable
 that it will never try to convert a register to an immediate
 because the pattern does not allow registers in the first place.
   - This logic doesn't hold in the opposite case though.

Have I explained all that clearly?

My conclusion after studying all this is that we don't need to do 
anything until somebody reports an ICE, at which point it becomes worth 
the effort of fixing it. Other opinions welcome! :)


Andrew
2011-10-06  Andrew Stubbs  

	gcc/
	* config/arm/predicates.md (shift_amount_operand): Remove constant
	range check.
	(shift_operator): Check range of constants for all shift operators.

	gcc/testsuite/
	* gcc.dg/pr50193-1.c: New file.
	* gcc.target/arm/shiftable.c: New file.

---
 src/gcc-mainline/gcc/config/arm/predicates.md  |   15 ++-
 src/gcc-mainline/gcc/testsuite/gcc.dg/pr50193-1.c  |   10 +
 .../gcc/testsuite/gcc.target/arm/shiftable.c   |   43 
 3 files changed, 65 insertions(+), 3 deletions(-)
 create mode 100644 src/gcc-mainline/gcc/testsuite/gcc.dg/pr50193-1.c
 create mode 100644 src/gcc-mainline/gcc/testsuite/gcc.target/arm/shiftable.c

diff --git a/src/gcc-mainline/gcc/config/arm/predicates.md b/src/gcc-mainline/gcc/config/arm/predicates.md
index 27ba603..7307fd5 100644
--- a/src/gcc-mainline/gcc/config/arm/predicates.md
+++ b/src/gcc-mainline/gcc/c

Re: rfa: remove get_var_ann (was: Fix PR50260)

2011-10-06 Thread Richard Guenther
On Thu, Oct 6, 2011 at 4:59 PM, Michael Matz  wrote:
> Hi,
>
> On Sat, 3 Sep 2011, Richard Guenther wrote:
>
>> > OTOH it's a nice invariant that can actually be checked for (that all
>> > reachable vars whatsoever have to be in referenced_vars), so I'm going
>> > to do that.
>>
>> Yes, until we get rid of referenced_vars (which we still should do at
>> some point...) that's the best.
>
> Okay, like so then.  Regstrapped on x86_64-linux.  (Note that sometimes I
> use add_referenced_vars, and sometimes find_referenced_vars_in, the latter
> when I would have to add several add_referenced_vars for one statement).
>
>> IIRC we have some verification code even, and wonder why it doesn't
>> trigger.
>
> Nope, we don't.  But with the patch we segfault in case this happens
> again, which is good enough checking for me.

Ok.

Thanks,
Richard.

>
> Ciao,
> Michael.
> 
>        * tree-flow.h (get_var_ann): Don't declare.
>        * tree-flow-inline.h (get_var_ann): Remove.
>        (set_is_used): Use var_ann, not get_var_ann.
>        * tree-dfa.c (add_referenced_var): Inline body of get_var_ann.
>        * tree-profile.c (gimple_gen_edge_profiler): Call
>        find_referenced_vars_in.
>        (gimple_gen_interval_profiler): Ditto.
>        (gimple_gen_pow2_profiler): Ditto.
>        (gimple_gen_one_value_profiler): Ditto.
>        (gimple_gen_average_profiler): Ditto.
>        (gimple_gen_ior_profiler): Ditto.
>        (gimple_gen_ic_profiler): Ditto plus call add_referenced_var.
>        (gimple_gen_ic_func_profiler): Call add_referenced_var.
>        * tree-mudflap.c (execute_mudflap_function_ops): Call
>        add_referenced_var.
>
> Index: tree-flow.h
> ===
> --- tree-flow.h (revision 178488)
> +++ tree-flow.h (working copy)
> @@ -278,7 +278,6 @@ typedef struct immediate_use_iterator_d
>  typedef struct var_ann_d *var_ann_t;
>
>  static inline var_ann_t var_ann (const_tree);
> -static inline var_ann_t get_var_ann (tree);
>  static inline void update_stmt (gimple);
>  static inline int get_lineno (const_gimple);
>
> Index: tree-flow-inline.h
> ===
> --- tree-flow-inline.h  (revision 178488)
> +++ tree-flow-inline.h  (working copy)
> @@ -145,16 +145,6 @@ var_ann (const_tree t)
>   return p ? *p : NULL;
>  }
>
> -/* Return the variable annotation for T, which must be a _DECL node.
> -   Create the variable annotation if it doesn't exist.  */
> -static inline var_ann_t
> -get_var_ann (tree var)
> -{
> -  var_ann_t *p = DECL_VAR_ANN_PTR (var);
> -  gcc_checking_assert (p);
> -  return *p ? *p : create_var_ann (var);
> -}
> -
>  /* Get the number of the next statement uid to be allocated.  */
>  static inline unsigned int
>  gimple_stmt_max_uid (struct function *fn)
> @@ -568,7 +558,7 @@ phi_arg_index_from_use (use_operand_p us
>  static inline void
>  set_is_used (tree var)
>  {
> -  var_ann_t ann = get_var_ann (var);
> +  var_ann_t ann = var_ann (var);
>   ann->used = true;
>  }
>
> Index: tree-dfa.c
> ===
> --- tree-dfa.c  (revision 178488)
> +++ tree-dfa.c  (working copy)
> @@ -580,8 +580,9 @@ set_default_def (tree var, tree def)
>  bool
>  add_referenced_var (tree var)
>  {
> -  get_var_ann (var);
>   gcc_assert (DECL_P (var));
> +  if (!*DECL_VAR_ANN_PTR (var))
> +    create_var_ann (var);
>
>   /* Insert VAR into the referenced_vars hash table if it isn't present.  */
>   if (referenced_var_check_and_insert (var))
> Index: tree-profile.c
> ===
> --- tree-profile.c      (revision 178408)
> +++ tree-profile.c      (working copy)
> @@ -224,6 +224,7 @@ gimple_gen_edge_profiler (int edgeno, ed
>   one = build_int_cst (gcov_type_node, 1);
>   stmt1 = gimple_build_assign (gcov_type_tmp_var, ref);
>   gimple_assign_set_lhs (stmt1, make_ssa_name (gcov_type_tmp_var, stmt1));
> +  find_referenced_vars_in (stmt1);
>   stmt2 = gimple_build_assign_with_ops (PLUS_EXPR, gcov_type_tmp_var,
>                                        gimple_assign_lhs (stmt1), one);
>   gimple_assign_set_lhs (stmt2, make_ssa_name (gcov_type_tmp_var, stmt2));
> @@ -270,6 +271,7 @@ gimple_gen_interval_profiler (histogram_
>   val = prepare_instrumented_value (&gsi, value);
>   call = gimple_build_call (tree_interval_profiler_fn, 4,
>                            ref_ptr, val, start, steps);
> +  find_referenced_vars_in (call);
>   gsi_insert_before (&gsi, call, GSI_NEW_STMT);
>  }
>
> @@ -290,6 +292,7 @@ gimple_gen_pow2_profiler (histogram_valu
>                                      true, NULL_TREE, true, GSI_SAME_STMT);
>   val = prepare_instrumented_value (&gsi, value);
>   call = gimple_build_call (tree_pow2_profiler_fn, 2, ref_ptr, val);
> +  find_referenced_vars_in (call);
>   gsi_insert_before (&gsi, call, GSI_NEW_STMT);
>  }
>
> @@ -310,6 +313,7 @@ gimple_gen_one_value_profil

Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs

2011-10-06 Thread Michael Matz
Hi,

On Thu, 6 Oct 2011, Kai Tietz wrote:

> That's not the whole story.  The difference between TRUTH_(AND|OR)IF_EXPR 
> and TRUTH_(AND|OR)_EXPR is that for TRUTH_(AND|OR)IF_EXPR the gimplifier 
> creates a COND expression, but for TRUTH_(AND|OR)_EXPR it doesn't.

Yes, of course.  That is what implements the short-circuit semantics.  But 
as Richard already mentioned I also don't understand why you do the 
reassociation at that point.  Why not simply rewrite ANDIF -> AND (when 
possible, i.e. no sideeffects on arg1, and desirable, i.e. when 
LOGICAL_OP_NON_SHORT_CIRCUIT, and simple_operand(arg1)) and let other 
folders do reassociation?  I ask because your comments states to 
transform:

  ((W AND X) ANDIF Y) ANDIF Z
into
  (W AND X) ANDIF (Y AND Z)

(under condition that Y and Z are simple operands).

In fact you don't check the form of arg0,0, i.e. the "W AND X" here.  
Independend of that it doesn't make sense, because if Y and Z are easy 
(simple and no side-effects), then "Y AND Z" is too, and therefore you 
should transform this (if at all) into:

  (W AND X) AND (Y AND Z)

at which point this association doesn't make sense anymore, as 

  ((W AND X) AND Y) AND Z

is just as fine.  So, the reassociation looks fishy at best, better get 
rid of it?  (which of the testcases breaks without it?)


Ciao,
Michael.


Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs

2011-10-06 Thread Kai Tietz
2011/10/6 Michael Matz :
> Hi,
>
> On Thu, 6 Oct 2011, Kai Tietz wrote:
>
>> That's not the whole story.  The difference between TRUTH_(AND|OR)IF_EXPR
>> and TRUTH_(AND|OR)_EXPR is that for TRUTH_(AND|OR)IF_EXPR the gimplifier
>> creates a COND expression, but for TRUTH_(AND|OR)_EXPR it doesn't.
>
> Yes, of course.  That is what implements the short-circuit semantics.  But
> as Richard already mentioned I also don't understand why you do the
> reassociation at that point.  Why not simply rewrite ANDIF -> AND (when
> possible, i.e. no sideeffects on arg1, and desirable, i.e. when
> LOGICAL_OP_NON_SHORT_CIRCUIT, and simple_operand(arg1)) and let other
> folders do reassociation?  I ask because your comments states to
> transform:
>
>  ((W AND X) ANDIF Y) ANDIF Z
> into
>  (W AND X) ANDIF (Y AND Z)
>
> (under condition that Y and Z are simple operands).
>
> In fact you don't check the form of arg0,0, i.e. the "W AND X" here.
> Independend of that it doesn't make sense, because if Y and Z are easy
> (simple and no side-effects), then "Y AND Z" is too, and therefore you
> should transform this (if at all) into:
>
>  (W AND X) AND (Y AND Z)
>
> at which point this association doesn't make sense anymore, as

Sorry, exactly this doesn't happen, because an ANDIF isn't simple, and
therefore it isn't transformed into an AND.


>  ((W AND X) AND Y) AND Z
>
> is just as fine.  So, the reassociation looks fishy at best, better get
> rid of it?  (which of the testcases breaks without it?)

None.  I had this implemented first.  But Richard was concerned about
making non-IF conditions too long.  I understand the point that it
might not be good to always modify unconditionally to an AND/OR chain.
For example

if (a1 && a2 && a3 && ... && a100) return 1;

would be packed by this patch into 50 branches.  If we modified
all of them into AND, then we would compute the result for all 100
values, even if the first a1 is zero.  That wouldn't help speed at all.

But you are right that, from the point of view of reassociation, it
could in some cases be more profitable to have packed all elements
into one AND-chain.

Regards,
Kai


Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs

2011-10-06 Thread Jakub Jelinek
On Thu, Oct 06, 2011 at 05:28:36PM +0200, Kai Tietz wrote:
> None.  I had this implemented first.  But Richard was concerned about
> making non-IF conditions too long.  I understand the point that it
> might not be good to always modify unconditionally to an AND/OR chain.
> For example
> 
> if (a1 && a2 && a3 && ... && a100) return 1;
> 
> would be packed by this patch into 50 branches.  If we modified
> all of them into AND, then we would compute the result for all 100
> values, even if the first a1 is zero.  That wouldn't help speed at all.
> 
> But you are right that, from the point of view of reassociation, it
> could in some cases be more profitable to have packed all elements
> into one AND-chain.

Yeah.  Perhaps we should break them up after reassoc2, or on the other hand
teach reassoc (or some other pass) to be able to do the optimizations
on a series of GIMPLE_COND with no side-effects in between.
See e.g. PR46309: return a == 3 || a == 1 || a == 2 || a == 4;
isn't optimized into (a - 1U) < 4U, although it could be, if branch costs
cause it to be broken up into several GIMPLE_COND stmts.
Or if user writes:
  if (a == 3)
return 1;
  if (a == 1)
return 1;
  if (a == 2)
return 1;
  if (a == 4)
return 1;
  return 0;
(more probably using enums).
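
(Why the range check is valid, as a sketch: for a in {1,2,3,4}, a - 1U
is in {0,1,2,3}; for any other a, including a <= 0, the unsigned
subtraction wraps around to a value >= 4U.  So

  int f (int a) { return (a - 1U) < 4U; }

computes the same result as the four comparisons above.)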

Jakub


Re: Initial shrink-wrapping patch

2011-10-06 Thread Richard Henderson
On 10/06/2011 06:37 AM, Bernd Schmidt wrote:
> On 10/06/11 01:47, Bernd Schmidt wrote:
>> This appears to be because the split prologue contains a jump, which
>> means the find_many_sub_blocks call reorders the block numbers, and our
>> indices into bb_flags are off.
> 
> Testing of the patch completed - ok? Regardless of split-stack it seems
> like a cleanup and eliminates a potential source of errors.

Yes, patch is ok.


r~


Re: Vector shuffling

2011-10-06 Thread Richard Henderson
On 10/06/2011 04:46 AM, Georg-Johann Lay wrote:
> So here it is.  Lightly tested on my target: All tests either PASS or are
> UNSUPPORTED now.
> 
> Ok?

Not ok, but only because I've completely restructured the tests again.
Patch coming very shortly...


r~


Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs

2011-10-06 Thread Michael Matz
Hi,

On Thu, 6 Oct 2011, Kai Tietz wrote:

> > at which point this association doesn't make sense anymore, as
> 
> Sorry, exactly this doesn't happen, due an ANDIF isn't simple, and 
> therefore it isn't transformed into and AND.

Right ...

> >  ((W AND X) AND Y) AND Z
> >
> > is just as fine.  So, the reassociation looks fishy at best, better get
> > rid of it?  (which of the testcases breaks without it?)
> 
> None.  I had this implemented first.  But Richard was concerned about
> making non-IF conditions too long.I understand that point that it
> might not that good to always modify unconditionally to AND/OR chain.

... and I see that (that's why the transformation should be desirable for 
some definition of desirable, which probably includes "and RHS not too 
long chain").  As it stands right now your transformation seems to be a 
fairly ad-hoc try at avoiding this problem.  That's why I wonder why to do 
the reassoc at all?  Which testcases break _without_ the reassociation, 
i.e. with only rewriting ANDIF -> AND at the outermost level?


Ciao,
Michael.

[cxx-mem-model] Add lockfree tests

2011-10-06 Thread Andrew MacLeod
This patch supplies __sync_mem_is_lock_free (size) and 
__sync_mem_always_lock_free (size).


__sync_mem_always_lock_free requires a compile time constant, and 
returns true if an object of the specified size will *always* generate 
lock free instructions on the current architecture.  Otherwise false is 
returned.


__sync_mem_is_lock_free also returns true if instructions will always be
lock free; if the answer is not provably true, it resolves to an
external call named '__sync_mem_is_lock_free', to be supplied
externally, presumably by whatever library or application provides the
other external __sync_mem routines as documented in
http://gcc.gnu.org/wiki/Atomic/GCCMM/LIbrary


New tests, documentation are provided, bootstraps on 
x86_64-unknown-linux-gnu and causes no new testsuite regressions.
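
A hypothetical usage sketch, inferred from the description above and the
BT_FN_BOOL_SIZE type in the ChangeLog (the function names and the size
arguments here are made up for illustration):

  #include <stddef.h>

  int always_ok (void)
  {
    /* Constant size: the builtin folds to 1 or 0 at compile time.  */
    return __sync_mem_always_lock_free (4);
  }

  int maybe_ok (size_t n)
  {
    /* Not provably lock free at compile time: this may resolve to an
       external call to the library routine __sync_mem_is_lock_free.  */
    return __sync_mem_is_lock_free (n);
  }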


Andrew



* optabs.h (DOI_sync_mem_always_lock_free): New.
(DOI_sync_mem_is_lock_free): New.
(sync_mem_always_lock_free_optab, sync_mem_is_lock_free_optab): New.
* builtins.c (fold_builtin_sync_mem_always_lock_free): New.
(expand_builtin_sync_mem_always_lock_free): New.
(fold_builtin_sync_mem_is_lock_free): New.
(expand_builtin_sync_mem_is_lock_free): New.
(expand_builtin): Handle BUILT_IN_SYNC_MEM_{is,always}_LOCK_FREE.
(fold_builtin_1): Handle BUILT_IN_SYNC_MEM_{is,always}_LOCK_FREE.
* sync-builtins.def: Add BUILT_IN_SYNC_MEM_{is,always}_LOCK_FREE.
* builtin-types.def: Add BT_FN_BOOL_SIZE type.
* fortran/types.def: Add BT_SIZE and BT_FN_BOOL_SIZE.
* doc/extend.texi: Add documentation.
* testsuite/gcc.dg/sync-mem-invalid.c: Test for invalid param.
* testsuite/gcc.dg/sync-mem-lockfree[-aux].c: New tests.

Index: optabs.h
===
*** optabs.h	(revision 178916)
--- optabs.h	(working copy)
*** enum direct_optab_index
*** 708,713 
--- 708,715 
DOI_sync_mem_nand,
DOI_sync_mem_xor,
DOI_sync_mem_or,
+   DOI_sync_mem_always_lock_free,
+   DOI_sync_mem_is_lock_free,
DOI_sync_mem_thread_fence,
DOI_sync_mem_signal_fence,
  
*** typedef struct direct_optab_d *direct_op
*** 801,806 
--- 803,812 
(&direct_optab_table[(int) DOI_sync_mem_xor])
  #define sync_mem_or_optab \
(&direct_optab_table[(int) DOI_sync_mem_or])
+ #define sync_mem_always_lock_free_optab \
+   (&direct_optab_table[(int) DOI_sync_mem_always_lock_free])
+ #define sync_mem_is_lock_free_optab \
+   (&direct_optab_table[(int) DOI_sync_mem_is_lock_free])
  #define sync_mem_thread_fence_optab \
(&direct_optab_table[(int) DOI_sync_mem_thread_fence])
  #define sync_mem_signal_fence_optab \
Index: builtins.c
===
*** builtins.c  (revision 179522)
--- builtins.c  (working copy)
*** expand_builtin_sync_mem_fetch_op (enum m
*** 5386,5391 
--- 5386,5472 
return expand_sync_mem_fetch_op (target, mem, val, code, model, fetch_after);
  }
  
+ /* Return true if size ARG is always lock free on this architecture.  */
+ static tree
+ fold_builtin_sync_mem_always_lock_free (tree arg)
+ {
+   int size;
+   enum machine_mode mode;
+   enum insn_code icode;
+ 
+   if (TREE_CODE (arg) != INTEGER_CST)
+ return NULL_TREE;
+ 
+   /* Check if a compare_and_swap pattern exists for the mode which represents
+  the required size.  The pattern is not allowed to fail, so the existence
+  of the pattern indicates support is present.  */
+   size = INTVAL (expand_normal (arg)) * BITS_PER_UNIT;
+   mode = mode_for_size (size, MODE_INT, 0);
+   icode = direct_optab_handler (sync_compare_and_swap_optab, mode);
+ 
+   if (icode == CODE_FOR_nothing)
+ return integer_zero_node;
+ 
+   return integer_one_node;
+ }
+ 
+ /* Return true if the first argument to call EXP represents a size of
+object than will always generate lock-free instructions on this target.
+Otherwise return false.  */
+ static rtx
+ expand_builtin_sync_mem_always_lock_free (tree exp)
+ {
+   tree size;
+   tree arg = CALL_EXPR_ARG (exp, 0);
+ 
+   if (TREE_CODE (arg) != INTEGER_CST)
+ {
+   error ("non-constant argument to __sync_mem_always_lock_free");
+   return const0_rtx;
+ }
+ 
+   size = fold_builtin_sync_mem_always_lock_free (arg);
+   if (size == integer_one_node)
+ return const1_rtx;
+   return const0_rtx;
+ }
+ 
+ /* Return a one or zero if it can be determined that size ARG is lock free on
+this architecture.  */
+ static tree
+ fold_builtin_sync_mem_is_lock_free (tree arg)
+ {
+   tree always = fold_builtin_sync_mem_always_lock_free (arg);
+ 
+   /* If it isn't always lock free, don't generate a result.  */
+   if (always == integer_one_node)
+ return always;
+ 
+   return NULL_TREE;
+ }
+ 
+ /* Return one or zero if the first argument to call EXP represents a size of
+object than can generate lock-free instructi

[testsuite] Don't XFAIL gcc.dg/uninit-B.c etc. (PR middle-end/50125)

2011-10-06 Thread Rainer Orth
After almost two months, two tests are still XPASSing everywhere:

XPASS: gcc.dg/uninit-B.c uninit i warning (test for warnings, line 12)
XPASS: gcc.dg/uninit-pr19430.c  (test for warnings, line 32)
XPASS: gcc.dg/uninit-pr19430.c uninitialized (test for warnings, line 41)

I think it's time to remove the xfail's.

Tested with the appropriate runtest invocation on i386-pc-solaris2.10,
ok for mainline?

Rainer


2011-10-06  Rainer Orth  

PR middle-end/50125
* gcc.dg/uninit-B.c (baz): Remove xfail *-*-*.
* gcc.dg/uninit-pr19430.c (main): Remove xfail *-*-*.
(bar3): Likewise.

# HG changeset patch
# Parent 60c73f26147c2e549be69d750637ed45ca48e93c
Don't XFAIL gcc.dg/uninit-B.c etc. (PR middle-end/50125)

diff --git a/gcc/testsuite/gcc.dg/uninit-B.c b/gcc/testsuite/gcc.dg/uninit-B.c
--- a/gcc/testsuite/gcc.dg/uninit-B.c
+++ b/gcc/testsuite/gcc.dg/uninit-B.c
@@ -9,7 +9,7 @@ void
 baz (void)
 {
   int i;
-  if (i) /* { dg-warning "is used uninitialized" "uninit i warning" { xfail *-*-* } } */
+  if (i) /* { dg-warning "is used uninitialized" "uninit i warning" } */
 bar (i);
   foo (&i);
 }
diff --git a/gcc/testsuite/gcc.dg/uninit-pr19430.c b/gcc/testsuite/gcc.dg/uninit-pr19430.c
--- a/gcc/testsuite/gcc.dg/uninit-pr19430.c
+++ b/gcc/testsuite/gcc.dg/uninit-pr19430.c
@@ -29,7 +29,7 @@ void frob(int *pi);
 int main(void)
 {
   int i; 
-  printf("i = %d\n", i); /* { dg-warning "'i' is used uninitialized in this function" "" { xfail *-*-* } } */
+  printf("i = %d\n", i); /* { dg-warning "'i' is used uninitialized in this function" "" } */
   frob(&i);
 
   return 0;
@@ -38,6 +38,6 @@ int main(void)
 void foo3(int*);
 void bar3(void) { 
   int x; 
-  if(x) /* { dg-warning "'x' is used uninitialized in this function" "uninitialized" { xfail *-*-* } } */
+  if(x) /* { dg-warning "'x' is used uninitialized in this function" "uninitialized" } */
 foo3(&x); 
 }


-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: Initial shrink-wrapping patch

2011-10-06 Thread Ian Lance Taylor
Bernd Schmidt  writes:

> On 10/06/11 05:17, Ian Lance Taylor wrote:
>> Thinking about it I think this is the wrong approach.  The -fsplit-stack
>> code by definition has to wrap the entire function and it can not modify
>> any callee-saved registers.  We should do shrink wrapping before
>> -fsplit-stack, not the other way around.
>
> Sorry, I'm not following what you're saying here. Can you elaborate?

Basically -fsplit-stack wraps the entire function in code that (on
x86_64) looks like

	cmpq	%fs:112, %rsp
	jae	.L2
	movl	$24, %r10d
	movl	$0, %r11d
	call	__morestack
	ret
.L2:

There is absolutely no reason to try to shrink wrap that code.  It will
never help.  That code always has to be first.  It especially has to be
first because the gold linker recognizes the prologue specially when a
split-stack function calls a non-split-stack function, in order to
request a larger stack.

Therefore, it seems to me that we should apply shrink wrapping to the
function as it exists *before* the split-stack prologue is created.  The
flag_split_stack bit should be moved after the flag_shrink_wrap bit.

Ian


Re: Initial shrink-wrapping patch

2011-10-06 Thread Bernd Schmidt
On 10/06/11 17:57, Ian Lance Taylor wrote:
> There is absolutely no reason to try to shrink wrap that code.  It will
> never help.  That code always has to be first.  It especially has to be
> first because the gold linker recognizes the prologue specially when a
> split-stack function calls a non-split-stack function, in order to
> request a larger stack.

Urgh, ok.

> Therefore, it seems to me that we should apply shrink wrapping to the
> function as it exists *before* the split-stack prologue is created.  The
> flag_split_stack bit should be moved after the flag_shrink_wrap bit.

Sounds like we just need to always emit the split prologue on the
original entry edge then. Can you test the following with Go?


Bernd
* function.c (thread_prologue_and_epilogue_insns): Emit split
prologue on the orig_entry_edge. Don't account for it in
prologue_clobbered.

Index: gcc/function.c
===
--- gcc/function.c  (revision 179619)
+++ gcc/function.c  (working copy)
@@ -5602,10 +5602,6 @@ thread_prologue_and_epilogue_insns (void
  note_stores (PATTERN (p_insn), record_hard_reg_sets,
   &prologue_clobbered);
}
-  for (p_insn = split_prologue_seq; p_insn; p_insn = NEXT_INSN (p_insn))
-   if (NONDEBUG_INSN_P (p_insn))
- note_stores (PATTERN (p_insn), record_hard_reg_sets,
-  &prologue_clobbered);
 
   bitmap_initialize (&bb_antic_flags, &bitmap_default_obstack);
   bitmap_initialize (&bb_on_list, &bitmap_default_obstack);
@@ -5758,7 +5754,7 @@ thread_prologue_and_epilogue_insns (void
 
   if (split_prologue_seq != NULL_RTX)
 {
-  insert_insn_on_edge (split_prologue_seq, entry_edge);
+  insert_insn_on_edge (split_prologue_seq, orig_entry_edge);
   inserted = true;
 }
   if (prologue_seq != NULL_RTX)


[PATCH] Optimize COND_EXPR where then/else operands are INTEGER_CSTs of different size than the comparison operands

2011-10-06 Thread Jakub Jelinek
Hi!

Since Richard's recent changes to allow different modes in vcond
patterns (so far on i?86/x86_64 only, I think) we can vectorize more
COND_EXPRs than before, but even i?86/x86_64 support vconds only if the
sizes of the vector element modes are the same.  This patch improves it
a tiny bit more: we can optimize even if the COND_EXPR type is wider or
narrower than the comparison operands, by vectorizing the COND_EXPR in
the integer mode matching the size of the comparison operands and then
adding a cast.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
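
A sketch of the kind of source this should enable (my example, not the
testcase from the patch; the vect_recog_mixed_size_cond_pattern comment
in the patch describes the exact shape):

  void f (short *r, const int *x, const int *y, int n)
  {
    int i;
    for (i = 0; i < n; i++)
      r[i] = x[i] < y[i] ? 7 : -3;  /* int comparison, short result */
  }

After folding, the COND_EXPR has short type while the comparison
operands are ints; the new pattern vectorizes the COND_EXPR in the
int-sized integer type and then narrows the result with a cast.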

2011-10-06  Jakub Jelinek  

PR tree-optimization/50596
* tree-vectorizer.h (vect_is_simple_cond): New prototype.
(NUM_PATTERNS): Change to 6.
* tree-vect-patterns.c (vect_recog_mixed_size_cond_pattern): New
function.
(vect_vect_recog_func_ptrs): Add vect_recog_mixed_size_cond_pattern.
(vect_mark_pattern_stmts): Don't create stmt_vinfo for def_stmt
if it already has one, and don't set STMT_VINFO_VECTYPE in it
if it is already set.
* tree-vect-stmts.c (vect_mark_stmts_to_be_vectorized): Handle
COND_EXPR and VEC_COND_EXPR in pattern stmts.
(vect_is_simple_cond): No longer static.

* lib/target-supports.exp (check_effective_target_vect_cond_mixed):
New.
* gcc.dg/vect/vect-cond-8.c: New test.

--- gcc/tree-vectorizer.h.jj2011-09-26 14:06:52.0 +0200
+++ gcc/tree-vectorizer.h   2011-10-06 10:04:03.0 +0200
@@ -1,5 +1,5 @@
 /* Vectorizer
-   Copyright (C) 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010
+   Copyright (C) 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011
Free Software Foundation, Inc.
Contributed by Dorit Naishlos 
 
@@ -818,6 +818,7 @@ extern bool vect_transform_stmt (gimple,
  bool *, slp_tree, slp_instance);
 extern void vect_remove_stores (gimple);
 extern bool vect_analyze_stmt (gimple, bool *, slp_tree);
+extern bool vect_is_simple_cond (tree, loop_vec_info, tree *);
 extern bool vectorizable_condition (gimple, gimple_stmt_iterator *, gimple *,
 tree, int);
 extern void vect_get_load_cost (struct data_reference *, int, bool,
@@ -902,7 +903,7 @@ extern void vect_slp_transform_bb (basic
Additional pattern recognition functions can (and will) be added
in the future.  */
 typedef gimple (* vect_recog_func_ptr) (VEC (gimple, heap) **, tree *, tree *);
-#define NUM_PATTERNS 5
+#define NUM_PATTERNS 6
 void vect_pattern_recog (loop_vec_info);
 
 /* In tree-vectorizer.c.  */
--- gcc/tree-vect-patterns.c.jj 2011-10-06 09:14:17.0 +0200
+++ gcc/tree-vect-patterns.c2011-10-06 14:37:12.0 +0200
@@ -49,12 +49,15 @@ static gimple vect_recog_dot_prod_patter
 static gimple vect_recog_pow_pattern (VEC (gimple, heap) **, tree *, tree *);
 static gimple vect_recog_over_widening_pattern (VEC (gimple, heap) **, tree *,
  tree *);
+static gimple vect_recog_mixed_size_cond_pattern (VEC (gimple, heap) **,
+ tree *, tree *);
 static vect_recog_func_ptr vect_vect_recog_func_ptrs[NUM_PATTERNS] = {
vect_recog_widen_mult_pattern,
vect_recog_widen_sum_pattern,
vect_recog_dot_prod_pattern,
vect_recog_pow_pattern,
-vect_recog_over_widening_pattern};
+   vect_recog_over_widening_pattern,
+   vect_recog_mixed_size_cond_pattern};
 
 
 /* Function widened_name_p
@@ -1218,6 +1214,120 @@ vect_recog_over_widening_pattern (VEC (g
 }
 
 
+/* Function vect_recog_mixed_size_cond_pattern
+
+   Try to find the following pattern:
+
+ type x_t, y_t;
+ TYPE a_T, b_T, c_T;
+   loop:
+ S1  a_T = x_t CMP y_t ? b_T : c_T;
+
+   where type 'TYPE' is an integral type which has different size
+   from 'type'.  b_T and c_T are constants and if 'TYPE' is wider
+   than 'type', the constants need to fit into an integer type
+   with the same width as 'type'.
+
+   Input:
+
+   * LAST_STMT: A stmt from which the pattern search begins.
+
+   Output:
+
+   * TYPE_IN: The type of the input arguments to the pattern.
+
+   * TYPE_OUT: The type of the output of this pattern.
+
+   * Return value: A new stmt that will be used to replace the pattern.
+   Additionally a def_stmt is added.
+
+   a_it = x_t CMP y_t ? b_it : c_it;
+   a_T = (TYPE) a_it;  */
+
+static gimple
+vect_recog_mixed_size_cond_pattern (VEC (gimple, heap) **stmts, tree *type_in,
+   tree *type_out)
+{
+  gimple last_stmt = VEC_index (gimple, *stmts, 0);
+  tree cond_expr, then_clause, else_clause;
+  stmt_vec_info stmt_vinfo = vinfo_for_stmt (last_stmt), def_stmt_info;
+  tree type, vectype, comp_vectype, itype, vecitype;
+  enum machine_mode cmpmode;
+  gimple pattern_stmt, def_stmt;
+  loop_vec_info loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_vinfo);
+
+  if (!is_gimple_assign (last_stmt)
+  || gimple_assign_rhs

[PATCH] Minor readability improvement in vect_pattern_recog{,_1}

2011-10-06 Thread Jakub Jelinek
Hi!

tree-vectorizer.h already has typedefs for the recog functions,
and using that typedef we can make these two functions slightly more
readable.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2011-10-06  Jakub Jelinek  

* tree-vect-patterns.c (vect_pattern_recog_1): Use
vect_recog_func_ptr typedef for the first argument.
(vect_pattern_recog): Rename vect_recog_func_ptr variable
to vect_recog_func, use vect_recog_func_ptr typedef for it.

--- gcc/tree-vect-patterns.c.jj 2011-10-06 14:37:12.0 +0200
+++ gcc/tree-vect-patterns.c2011-10-06 15:50:12.0 +0200
@@ -1393,10 +1393,9 @@ vect_mark_pattern_stmts (gimple orig_stm
for vect_recog_pattern.  */
 
 static void
-vect_pattern_recog_1 (
-   gimple (* vect_recog_func) (VEC (gimple, heap) **, tree *, tree *),
-   gimple_stmt_iterator si,
-   VEC (gimple, heap) **stmts_to_replace)
+vect_pattern_recog_1 (vect_recog_func_ptr vect_recog_func,
+ gimple_stmt_iterator si,
+ VEC (gimple, heap) **stmts_to_replace)
 {
   gimple stmt = gsi_stmt (si), pattern_stmt;
   stmt_vec_info stmt_info;
@@ -1580,7 +1579,7 @@ vect_pattern_recog (loop_vec_info loop_v
   unsigned int nbbs = loop->num_nodes;
   gimple_stmt_iterator si;
   unsigned int i, j;
-  gimple (* vect_recog_func_ptr) (VEC (gimple, heap) **, tree *, tree *);
+  vect_recog_func_ptr vect_recog_func;
   VEC (gimple, heap) *stmts_to_replace = VEC_alloc (gimple, heap, 1);
 
   if (vect_print_dump_info (REPORT_DETAILS))
@@ -1596,8 +1595,8 @@ vect_pattern_recog (loop_vec_info loop_v
   /* Scan over all generic vect_recog_xxx_pattern functions.  */
   for (j = 0; j < NUM_PATTERNS; j++)
 {
-  vect_recog_func_ptr = vect_vect_recog_func_ptrs[j];
- vect_pattern_recog_1 (vect_recog_func_ptr, si,
+ vect_recog_func = vect_vect_recog_func_ptrs[j];
+ vect_pattern_recog_1 (vect_recog_func, si,
&stmts_to_replace);
 }
 }

Jakub


[PATCH] vshuffle: Use correct mode for mask operand.

2011-10-06 Thread Richard Henderson
---
 gcc/ChangeLog |5 +
 gcc/optabs.c  |   16 +++-
 2 files changed, 12 insertions(+), 9 deletions(-)

* optabs.c (expand_vec_shuffle_expr): Use the proper mode for the
mask operand.  Tidy the code.

This patch is required before I rearrange the testsuite to actually
test floating-point shuffle.


diff --git a/gcc/optabs.c b/gcc/optabs.c
index 3a52fb0..aa233d5 100644
--- a/gcc/optabs.c
+++ b/gcc/optabs.c
@@ -6650,9 +6650,8 @@ expand_vec_shuffle_expr (tree type, tree v0, tree v1, tree mask, rtx target)
   struct expand_operand ops[4];
   enum insn_code icode;
   enum machine_mode mode = TYPE_MODE (type);
-  rtx rtx_v0, rtx_mask;
 
-  gcc_assert (expand_vec_shuffle_expr_p (mode, v0, v1, mask));
+  gcc_checking_assert (expand_vec_shuffle_expr_p (mode, v0, v1, mask));
 
   if (TREE_CODE (mask) == VECTOR_CST)
 {
@@ -6675,24 +6674,23 @@ expand_vec_shuffle_expr (tree type, tree v0, tree v1, tree mask, rtx target)
   return expand_expr_real_1 (call, target, VOIDmode, EXPAND_NORMAL, NULL);
 }
 
-vshuffle:
+ vshuffle:
   icode = direct_optab_handler (vshuffle_optab, mode);
 
   if (icode == CODE_FOR_nothing)
 return 0;
 
-  rtx_mask = expand_normal (mask);
-
   create_output_operand (&ops[0], target, mode);
-  create_input_operand (&ops[3], rtx_mask, mode);
+  create_input_operand (&ops[3], expand_normal (mask),
+   TYPE_MODE (TREE_TYPE (mask)));
 
   if (operand_equal_p (v0, v1, 0))
 {
-  rtx_v0 = expand_normal (v0);
-  if (!insn_operand_matches(icode, 1, rtx_v0))
+  rtx rtx_v0 = expand_normal (v0);
+  if (!insn_operand_matches (icode, 1, rtx_v0))
 rtx_v0 = force_reg (mode, rtx_v0);
 
-  gcc_checking_assert(insn_operand_matches(icode, 2, rtx_v0));
+  gcc_checking_assert (insn_operand_matches (icode, 2, rtx_v0));
 
   create_fixed_operand (&ops[1], rtx_v0);
   create_fixed_operand (&ops[2], rtx_v0);
-- 
1.7.6.4



[PATCH] i386: Use the proper mode for blend in vshuffle.

2011-10-06 Thread Richard Henderson
This allows the use of blendvpd instead of pblendvb on the
final step.  I don't *really* know if this helps or hurts
with the re-interpretation of the data from byte data to
double data.  But it looks "nicer" anyway.


r~

---
 gcc/ChangeLog  |6 ++
 gcc/config/i386/i386.c |   28 +---
 2 files changed, 27 insertions(+), 7 deletions(-)

+   * config/i386/i386.c (ix86_expand_sse_movcc): Use correct mode
+   for vector_all_ones_operand.
+   (ix86_expand_int_vcond): Distinguish between comparison mode
+   and data mode.  Allow them to differ.
+   (ix86_expand_vshuffle): Don't force data mode to match maskmode.



diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 9960fd2..2fdf540 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -18941,7 +18941,7 @@ ix86_expand_sse_movcc (rtx dest, rtx cmp, rtx op_true, rtx op_false)
   enum machine_mode mode = GET_MODE (dest);
   rtx t2, t3, x;
 
-  if (vector_all_ones_operand (op_true, GET_MODE (op_true))
+  if (vector_all_ones_operand (op_true, mode)
   && rtx_equal_p (op_false, CONST0_RTX (mode)))
 {
   emit_insn (gen_rtx_SET (VOIDmode, dest, cmp));
@@ -19170,7 +19170,8 @@ ix86_expand_fp_vcond (rtx operands[])
 bool
 ix86_expand_int_vcond (rtx operands[])
 {
-  enum machine_mode mode = GET_MODE (operands[0]);
+  enum machine_mode data_mode = GET_MODE (operands[0]);
+  enum machine_mode mode = GET_MODE (operands[4]);
   enum rtx_code code = GET_CODE (operands[3]);
   bool negate = false;
   rtx x, cop0, cop1;
@@ -19297,8 +19298,21 @@ ix86_expand_int_vcond (rtx operands[])
}
 }
 
-  x = ix86_expand_sse_cmp (operands[0], code, cop0, cop1,
-  operands[1+negate], operands[2-negate]);
+  /* Allow the comparison to be done in one mode, but the movcc to
+ happen in another mode.  */
+  if (data_mode == mode)
+{
+  x = ix86_expand_sse_cmp (operands[0], code, cop0, cop1,
+  operands[1+negate], operands[2-negate]);
+}
+  else
+{
+  gcc_assert (GET_MODE_SIZE (data_mode) == GET_MODE_SIZE (mode));
+  x = ix86_expand_sse_cmp (gen_lowpart (mode, operands[0]),
+  code, cop0, cop1,
+  operands[1+negate], operands[2-negate]);
+  x = gen_lowpart (data_mode, x);
+}
 
   ix86_expand_sse_movcc (operands[0], x, operands[1+negate],
 operands[2-negate]);
@@ -19533,9 +19547,9 @@ ix86_expand_vshuffle (rtx operands[])
   mask = expand_simple_binop (maskmode, AND, mask, vt,
  NULL_RTX, 0, OPTAB_DIRECT);
 
-  xops[0] = gen_lowpart (maskmode, operands[0]);
-  xops[1] = gen_lowpart (maskmode, t2);
-  xops[2] = gen_lowpart (maskmode, t1);
+  xops[0] = operands[0];
+  xops[1] = gen_lowpart (mode, t2);
+  xops[2] = gen_lowpart (mode, t1);
   xops[3] = gen_rtx_EQ (maskmode, mask, vt);
   xops[4] = mask;
   xops[5] = vt;
-- 
1.7.6.4



[PATCH] i386: Add AVX2 support to ix86_expand_vshuffle.

2011-10-06 Thread Richard Henderson
This was tested only via compile, and inspecting the output.

I'm attempting to set up the Intel SDE as a sim target for
the testsuite, but apparently it only supports 32-bit binaries.


r~


---
 gcc/ChangeLog  |9 
 gcc/config/i386/i386.c |  112 ++--
 gcc/config/i386/sse.md |   31 --
 3 files changed, 135 insertions(+), 17 deletions(-)

+   * config/i386/i386.c (ix86_expand_vshuffle): Add AVX2 support.
+   * config/i386/sse.md (sseshuffint): Remove.
+   (sseintvecmode): Support V16HI, V8HI, V32QI, V16QI.
+   (VSHUFFLE_AVX2): New mode iterator.
+   (vshuffle): Use it.
+   (avx_vec_concat): Rename from *vec_concat_avx.


 
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 688fba1..9960fd2 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -19312,17 +19312,120 @@ ix86_expand_vshuffle (rtx operands[])
   rtx op0 = operands[1];
   rtx op1 = operands[2];
   rtx mask = operands[3];
-  rtx vt, vec[16];
+  rtx t1, t2, vt, vec[16];
   enum machine_mode mode = GET_MODE (op0);
   enum machine_mode maskmode = GET_MODE (mask);
   int w, e, i;
   bool one_operand_shuffle = rtx_equal_p (op0, op1);
 
-  gcc_checking_assert (GET_MODE_BITSIZE (mode) == 128);
-
   /* Number of elements in the vector.  */
   w = GET_MODE_NUNITS (mode);
   e = GET_MODE_UNIT_SIZE (mode);
+  gcc_assert (w <= 16);
+
+  if (TARGET_AVX2)
+{
+  if (mode == V4DImode || mode == V4DFmode)
+   {
+ /* Unfortunately, the VPERMQ and VPERMPD instructions only support
+a constant shuffle operand.  With a tiny bit of effort we can
+use VPERMD instead.  A re-interpretation stall for V4DFmode is
+unfortunate but there's no avoiding it.  */
+ t1 = gen_reg_rtx (V8SImode);
+
+ /* Replicate the low bits of the V4DImode mask into V8SImode:
+  mask = { A B C D }
+  t1 = { A A B B C C D D }.  */
+ for (i = 0; i < 4; ++i)
+   vec[i*2 + 1] = vec[i*2] = GEN_INT (i * 2);
+ vt = gen_rtx_CONST_VECTOR (V8SImode, gen_rtvec_v (8, vec));
+ vt = force_reg (V8SImode, vt);
+ mask = gen_lowpart (V8SImode, mask);
+ emit_insn (gen_avx2_permvarv8si (t1, vt, mask));
+
+ /* Multiply the shuffle indices by two.  */
+ emit_insn (gen_avx2_lshlv8si3 (t1, t1, const1_rtx));
+
+ /* Add one to the odd shuffle indices:
+   t1 = { A*2, A*2+1, B*2, B*2+1, ... }.  */
+ for (i = 0; i < 4; ++i)
+   {
+ vec[i * 2] = const0_rtx;
+ vec[i * 2 + 1] = const1_rtx;
+   }
+ vt = gen_rtx_CONST_VECTOR (V8SImode, gen_rtvec_v (8, vec));
+ vt = force_const_mem (V8SImode, vt);
+ emit_insn (gen_addv8si3 (t1, t1, vt));
+
+ /* Continue as if V8SImode was used initially.  */
+ operands[3] = mask = t1;
+ target = gen_lowpart (V8SImode, target);
+ op0 = gen_lowpart (V8SImode, op0);
+ op1 = gen_lowpart (V8SImode, op1);
+ maskmode = mode = V8SImode;
+ w = 8;
+ e = 4;
+   }
+
+  switch (mode)
+   {
+   case V8SImode:
+ /* The VPERMD and VPERMPS instructions already properly ignore
+the high bits of the shuffle elements.  No need for us to
+perform an AND ourselves.  */
+ if (one_operand_shuffle)
+   emit_insn (gen_avx2_permvarv8si (target, mask, op0));
+ else
+   {
+ t1 = gen_reg_rtx (V8SImode);
+ t2 = gen_reg_rtx (V8SImode);
+ emit_insn (gen_avx2_permvarv8si (t1, mask, op0));
+ emit_insn (gen_avx2_permvarv8si (t2, mask, op1));
+ goto merge_two;
+   }
+ return;
+
+   case V8SFmode:
+ mask = gen_lowpart (V8SFmode, mask);
+ if (one_operand_shuffle)
+   emit_insn (gen_avx2_permvarv8sf (target, mask, op0));
+ else
+   {
+ t1 = gen_reg_rtx (V8SFmode);
+ t2 = gen_reg_rtx (V8SFmode);
+ emit_insn (gen_avx2_permvarv8sf (t1, mask, op0));
+ emit_insn (gen_avx2_permvarv8sf (t2, mask, op1));
+ goto merge_two;
+   }
+ return;
+
+case V4SImode:
+ /* By combining the two 128-bit input vectors into one 256-bit
+input vector, we can use VPERMD and VPERMPS for the full
+two-operand shuffle.  */
+ t1 = gen_reg_rtx (V8SImode);
+ t2 = gen_reg_rtx (V8SImode);
+ emit_insn (gen_avx_vec_concatv8si (t1, op0, op1));
+ emit_insn (gen_avx_vec_concatv8si (t2, mask, mask));
+ emit_insn (gen_avx2_permvarv8si (t1, t2, t1));
+ emit_insn (gen_avx_vextractf128v8si (target, t1, const0_rtx));
+ return;
+
+case V4SFmode:
+ t1 = gen_reg_rtx (V8SFmode);
+ t2 = gen_reg_rtx (V8SFmode);
+ mask = gen_lowpart (V4SFmode, mask);
+ 

[PATCH] Rework vector shuffle tests.

2011-10-06 Thread Richard Henderson
Test vector sizes 8, 16, and 32.  Test most data types for each size.

This should also solve the problem that Georg reported for AVR.
Indeed, I hope that except for the DImode/DFmode tests, these
actually execute on that target.


r~


Cc: Georg-Johann Lay 
---
 gcc/testsuite/ChangeLog|   29 ++
 .../gcc.c-torture/execute/vect-shuffle-1.c |   68 -
 .../gcc.c-torture/execute/vect-shuffle-2.c |   68 -
 .../gcc.c-torture/execute/vect-shuffle-3.c |   58 ---
 .../gcc.c-torture/execute/vect-shuffle-4.c |   51 --
 .../gcc.c-torture/execute/vect-shuffle-5.c |   64 
 .../gcc.c-torture/execute/vect-shuffle-6.c |   64 
 .../gcc.c-torture/execute/vect-shuffle-7.c |   70 --
 .../gcc.c-torture/execute/vect-shuffle-8.c |   55 ---
 gcc/testsuite/gcc.c-torture/execute/vshuf-16.inc   |   81 
 gcc/testsuite/gcc.c-torture/execute/vshuf-2.inc|   38 
 gcc/testsuite/gcc.c-torture/execute/vshuf-4.inc|   39 
 gcc/testsuite/gcc.c-torture/execute/vshuf-8.inc|  101 
 gcc/testsuite/gcc.c-torture/execute/vshuf-main.inc |   26 +
 gcc/testsuite/gcc.c-torture/execute/vshuf-v16qi.c  |5 +
 gcc/testsuite/gcc.c-torture/execute/vshuf-v2df.c   |   15 +++
 gcc/testsuite/gcc.c-torture/execute/vshuf-v2di.c   |   15 +++
 gcc/testsuite/gcc.c-torture/execute/vshuf-v2sf.c   |   21 
 gcc/testsuite/gcc.c-torture/execute/vshuf-v2si.c   |   18 
 gcc/testsuite/gcc.c-torture/execute/vshuf-v4df.c   |   19 
 gcc/testsuite/gcc.c-torture/execute/vshuf-v4di.c   |   19 
 gcc/testsuite/gcc.c-torture/execute/vshuf-v4hi.c   |   15 +++
 gcc/testsuite/gcc.c-torture/execute/vshuf-v4sf.c   |   25 +
 gcc/testsuite/gcc.c-torture/execute/vshuf-v4si.c   |   22 
 gcc/testsuite/gcc.c-torture/execute/vshuf-v8hi.c   |   23 +
 gcc/testsuite/gcc.c-torture/execute/vshuf-v8qi.c   |   23 +
 gcc/testsuite/gcc.c-torture/execute/vshuf-v8si.c   |   30 ++
 27 files changed, 564 insertions(+), 498 deletions(-)
 delete mode 100644 gcc/testsuite/gcc.c-torture/execute/vect-shuffle-1.c
 delete mode 100644 gcc/testsuite/gcc.c-torture/execute/vect-shuffle-2.c
 delete mode 100644 gcc/testsuite/gcc.c-torture/execute/vect-shuffle-3.c
 delete mode 100644 gcc/testsuite/gcc.c-torture/execute/vect-shuffle-4.c
 delete mode 100644 gcc/testsuite/gcc.c-torture/execute/vect-shuffle-5.c
 delete mode 100644 gcc/testsuite/gcc.c-torture/execute/vect-shuffle-6.c
 delete mode 100644 gcc/testsuite/gcc.c-torture/execute/vect-shuffle-7.c
 delete mode 100644 gcc/testsuite/gcc.c-torture/execute/vect-shuffle-8.c
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/vshuf-16.inc
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/vshuf-2.inc
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/vshuf-4.inc
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/vshuf-8.inc
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/vshuf-main.inc
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/vshuf-v16qi.c
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/vshuf-v2df.c
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/vshuf-v2di.c
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/vshuf-v2sf.c
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/vshuf-v2si.c
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/vshuf-v4df.c
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/vshuf-v4di.c
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/vshuf-v4hi.c
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/vshuf-v4sf.c
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/vshuf-v4si.c
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/vshuf-v8hi.c
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/vshuf-v8qi.c
 create mode 100644 gcc/testsuite/gcc.c-torture/execute/vshuf-v8si.c

+   * gcc.c-torture/execute/vect-shuffle-1.c: Remove.
+   * gcc.c-torture/execute/vect-shuffle-2.c: Remove.
+   * gcc.c-torture/execute/vect-shuffle-3.c: Remove.
+   * gcc.c-torture/execute/vect-shuffle-4.c: Remove.
+   * gcc.c-torture/execute/vect-shuffle-5.c: Remove.
+   * gcc.c-torture/execute/vect-shuffle-6.c: Remove.
+   * gcc.c-torture/execute/vect-shuffle-7.c: Remove.
+   * gcc.c-torture/execute/vect-shuffle-8.c: Remove.
+   * gcc.c-torture/execute/vshuf-16.inc: New file.
+   * gcc.c-torture/execute/vshuf-2.inc: New file.
+   * gcc.c-torture/execute/vshuf-4.inc: New file.
+   * gcc.c-torture/execute/vshuf-8.inc: New file.
+   * gcc.c-torture/execute/vshuf-main.inc: New file.
+   * gcc.c-torture/execute/vshuf-v16qi.c: New test.
+   * gcc.c-torture/execute/vshuf-v2df.c: New test.
+   * gcc.c-torture/execute/vshuf-v2di.c: New test.
+   * gcc.c-torture/execute/vshuf-v2sf.c: New test.
+   * gcc.c-torture/execute/vshuf-v2si.c: New test.

Re: [PATCH] Fix PR46556 (poor address generation)

2011-10-06 Thread William J. Schmidt
On Thu, 2011-10-06 at 16:16 +0200, Richard Guenther wrote:



> 
> Doh, I thought you were matching gimple stmts that do the address
> computation.  But now I see you are matching the tree returned from
> get_inner_reference.  So no need to check anything for that case.
> 
> But that keeps me wondering what you'll do if the accesses were
> all pointer arithmetic, not arrays.  Thus,
> 
> extern void foo (int, int, int);
> 
> void
> f (int *p, unsigned int n)
> {
>  foo (p[n], p[n+64], p[n+128]);
> }
> 
> wouldn't that have the same issue and you wouldn't handle it?
> 
> Richard.
> 

Good point.  This indeed gets missed here, and that's more fuel for
doing a generalized strength reduction along with the special cases like
p->a[n] that are only exposed with get_inner_reference.
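For reference, the output a straight-line strength reducer should produce
for that example, expressed back as source (the temporary q is
illustrative):

  void
  f (int *p, unsigned int n)
  {
    int *q = p + n;             /* compute the shared base address once */
    foo (q[0], q[64], q[128]);  /* each use becomes reg + immediate */
  }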

(The pointer arithmetic cases were picked up in my earlier "big-hammer"
approach using the aff-comb machinery, but that had too many problems in
the end, as you know.)

So for the long term I will look into a full strength reducer for
non-loop code.  For the short term, what do you think about keeping this
single transformation in reassoc to make sure it gets into 4.7?  I would
plan to strip it back out and fold it into the strength reducer
thereafter, which might or might not make 4.7 depending on my other
responsibilities and how the 4.7 schedule goes.  I haven't seen anything
official, but I'm guessing we're getting towards the end of 4.7 stage 1?



Re: [PATCH][ARM] Fix broken shift patterns

2011-10-06 Thread Paul Brook
> I believe this patch to be nothing but an improvement over the current
> state, and that a fix to the constraint problem should be a separate patch.
> 
> On that basis, am I OK to commit?

One minor nit:

> (define_special_predicate "shift_operator"
>...
>+  (ior (match_test "GET_CODE (XEXP (op, 1)) == CONST_INT
>+  && ((unsigned HOST_WIDE_INT) INTVAL (XEXP (op, 1))) < 32")
>+  (match_test "REG_P (XEXP (op, 1))"
 
We're already enforcing the REG_P elsewhere, and it's only valid in some 
contexts, so I'd change this to:
(match_test "GET_CODE (XEXP (op, 1)) != CONST_INT
|| ((unsigned HOST_WIDE_INT) INTVAL (XEXP (op, 1))) < 32")


> Now, let me explain the other problem:
> 
> As it stands, all shift-related patterns, for ARM or Thumb2 mode, permit
> a shift to be expressed as either a shift type and amount (register or
> constant), or as a multiply and power-of-two constant.

An added complication is that only ARM mode accepts a register.
 
> Problem scenario 1:
> 
>Consider pattern (plus (mult r1 r2) r3).
> 
>It so happens that reload knows that r2 contains a constant, say 20,
>so reload checks to see if that could be converted to an immediate.
>Now, 20 is not a power of two, so recog would reject it, but it is in
>the range 0..31 so it does match the 'M' constraint. Oops!

Though as you mention below, the predicate doesn't allow the second operand to
be a register, so this can never happen.  Reload may do unexpected things, but 
if it starts randomly changing valid const_int values then we have much bigger 
problems.
 
> Problem scenario 2:
> 
>Consider pattern (ashiftrt r1 r2).
> 
>Again, it so happens that reload knows that r2 contains a constant, in
>this case let's say 64, so again reload checks to see if that could
>be converted to an immediate. This time, 64 is not in the range
>0..31, so recog would reject it, but it is a power of two, so it does
>match the 'M' constraint. Again, oops!
> 
> I see two ways to fix this properly:
> 
>   1. Duplicate all the patterns in the machine description, once for the
>  mult case, and once for the other cases. This could probably be
>  done with a code iterator, if preferred.
> 
>   2. Redefine the canonical form of (plus (mult .. 2^n) ..) such that it
>  always uses the (presumably cheaper) shift-and-add option. However,
>  this would require all other targets where madd really is the best
>  option to fix it up. (I'd imagine that two instructions for shift
>  and add would be cheaper speed wise, if properly scheduled, on most
>  targets? That doesn't help the size optimization though.)

3. Consistently accept both power-of-two and 0..31 for shifts.  Large shift 
counts give undefined results[1], so replace them with an arbitrary value
(e.g. 0) during assembly output.  Arguably not an entirely "proper" fix, but I 
think it'll keep everything happy.
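In shift_op that could look roughly like this (a sketch of option 3,
untested, and the variable name is illustrative):

  /* Shift counts outside 0..31 give undefined results, so any
     in-range value is as good as another; print 0 rather than
     ICEing on a count we cannot encode.  */
  if (CONST_INT_P (shift_amount)
      && (unsigned HOST_WIDE_INT) INTVAL (shift_amount) > 31)
    shift_amount = const0_rtx;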
 
> However, it's not obvious to me that this needs fixing:
>   * The failure mode would be an ICE, and we've not seen any.

Then again, no one noticed the negative-shift ICE until recently :-/

>   * There's a comment in arm.c:shift_op that suggests that this can't
> happen, somehow, at least in the mult case.
> - I'm not sure exactly how reload works, but it seems reasonable
>   that it will never try to convert a register to an immediate
>   because the pattern does not allow registers in the first place.
> - This logic doesn't hold in the opposite case though.
> Have I explained all that clearly?

I think you've covered most of it.

For bonus points we should probably disallow MULT in the arm_shiftsi3 pattern, 
to stop it interacting with the regular mulsi3 pattern in undesirable ways.

Paul

[1] Or at least not any result gcc will be expecting.


Re: [PATCH] Optimize COND_EXPR where then/else operands are INTEGER_CSTs of different size than the comparison operands

2011-10-06 Thread Ira Rosen
On 6 October 2011 18:17, Jakub Jelinek  wrote:
> Hi!
>
> Since Richard's changes recently to allow different modes in vcond
> patterns (so far on i?86/x86_64 only I think) we can vectorize more
> COND_EXPRs than before, and this patch improves it a tiny bit more
> - even i?86/x86_64 support vconds only if the sizes of vector element
> modes are the same.  With this patch we can optimize even if it is wider
> or narrower, by vectorizing it as the COND_EXPR in integer mode matching
> the size of the comparison operands and then a cast.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>

OK, but...

> --- gcc/tree-vect-stmts.c.jj    2011-09-29 14:25:46.0 +0200
> +++ gcc/tree-vect-stmts.c       2011-10-06 12:16:43.0 +0200
> @@ -652,9 +652,26 @@ vect_mark_stmts_to_be_vectorized (loop_v
>              have to scan the RHS or function arguments instead.  */
>           if (is_gimple_assign (stmt))
>             {
> -              for (i = 1; i < gimple_num_ops (stmt); i++)
> +             enum tree_code rhs_code = gimple_assign_rhs_code (stmt);
> +             tree op = gimple_assign_rhs1 (stmt);
> +
> +             i = 1;
> +             if ((rhs_code == COND_EXPR || rhs_code == VEC_COND_EXPR)

I don't understand why we need VEC_COND_EXPR here.

> +                 && COMPARISON_CLASS_P (op))
> +               {
> +                 if (!process_use (stmt, TREE_OPERAND (op, 0), loop_vinfo,
> +                                   live_p, relevant, &worklist)
> +                     || !process_use (stmt, TREE_OPERAND (op, 1), loop_vinfo,
> +                                      live_p, relevant, &worklist))
> +                   {
> +                     VEC_free (gimple, heap, worklist);
> +                     return false;
> +                   }
> +                 i = 2;
> +               }
> +             for (; i < gimple_num_ops (stmt); i++)
>                 {
> -                  tree op = gimple_op (stmt, i);
> +                 op = gimple_op (stmt, i);
>                   if (!process_use (stmt, op, loop_vinfo, live_p, relevant,
>                                     &worklist))
>                     {
>

Thanks,
Ira
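For context, a minimal example of the kind of COND_EXPR the patch targets,
where the then/else constants are wider than the comparison operands (an
illustration, not taken from the patch):

  int a[1024];
  long long b[1024];

  void
  g (void)
  {
    int i;
    /* 32-bit comparison, 64-bit INTEGER_CST then/else operands:
       vectorizable as an integer COND_EXPR in the comparison's mode
       followed by a widening cast.  */
    for (i = 0; i < 1024; i++)
      b[i] = a[i] < 0 ? -1LL : 1LL;
  }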


Re: [PATCH] Minor readability improvement in vect_pattern_recog{,_1}

2011-10-06 Thread Ira Rosen
On 6 October 2011 18:19, Jakub Jelinek  wrote:
> Hi!
>
> tree-vectorizer.h already has typedefs for the recog functions,
> and using that typedef we can make these two functions slightly more
> readable.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

OK.

Thanks,
Ira

>
> 2011-10-06  Jakub Jelinek  
>
>        * tree-vect-patterns.c (vect_pattern_recog_1): Use
>        vect_recog_func_ptr typedef for the first argument.
>        (vect_pattern_recog): Rename vect_recog_func_ptr variable
>        to vect_recog_func, use vect_recog_func_ptr typedef for it.
>
> --- gcc/tree-vect-patterns.c.jj 2011-10-06 14:37:12.000000000 +0200
> +++ gcc/tree-vect-patterns.c    2011-10-06 15:50:12.000000000 +0200
> @@ -1393,10 +1393,9 @@ vect_mark_pattern_stmts (gimple orig_stm
>    for vect_recog_pattern.  */
>
>  static void
> -vect_pattern_recog_1 (
> -       gimple (* vect_recog_func) (VEC (gimple, heap) **, tree *, tree *),
> -       gimple_stmt_iterator si,
> -       VEC (gimple, heap) **stmts_to_replace)
> +vect_pattern_recog_1 (vect_recog_func_ptr vect_recog_func,
> +                     gimple_stmt_iterator si,
> +                     VEC (gimple, heap) **stmts_to_replace)
>  {
>   gimple stmt = gsi_stmt (si), pattern_stmt;
>   stmt_vec_info stmt_info;
> @@ -1580,7 +1579,7 @@ vect_pattern_recog (loop_vec_info loop_v
>   unsigned int nbbs = loop->num_nodes;
>   gimple_stmt_iterator si;
>   unsigned int i, j;
> -  gimple (* vect_recog_func_ptr) (VEC (gimple, heap) **, tree *, tree *);
> +  vect_recog_func_ptr vect_recog_func;
>   VEC (gimple, heap) *stmts_to_replace = VEC_alloc (gimple, heap, 1);
>
>   if (vect_print_dump_info (REPORT_DETAILS))
> @@ -1596,8 +1595,8 @@ vect_pattern_recog (loop_vec_info loop_v
>           /* Scan over all generic vect_recog_xxx_pattern functions.  */
>           for (j = 0; j < NUM_PATTERNS; j++)
>             {
> -              vect_recog_func_ptr = vect_vect_recog_func_ptrs[j];
> -             vect_pattern_recog_1 (vect_recog_func_ptr, si,
> +             vect_recog_func = vect_vect_recog_func_ptrs[j];
> +             vect_pattern_recog_1 (vect_recog_func, si,
>                                    &stmts_to_replace);
>             }
>         }
>
>        Jakub
>


Re: [PATCH] Optimize COND_EXPR where then/else operands are INTEGER_CSTs of different size than the comparison operands

2011-10-06 Thread Jakub Jelinek
On Thu, Oct 06, 2011 at 07:27:28PM +0200, Ira Rosen wrote:
> > +             i = 1;
> > +             if ((rhs_code == COND_EXPR || rhs_code == VEC_COND_EXPR)
> 
> I don't understand why we need VEC_COND_EXPR here.

Only for completeness, as VEC_COND_EXPR is the same weirdo thingie like
COND_EXPR.  I can leave that out if you want.

Jakub


Re: [PATCH] Minor readability improvement in vect_pattern_recog{,_1}

2011-10-06 Thread Richard Henderson
On 10/06/2011 09:19 AM, Jakub Jelinek wrote:
>   * tree-vect-patterns.c (vect_pattern_recog_1): Use
>   vect_recog_func_ptr typedef for the first argument.
>   (vect_pattern_recog): Rename vect_recog_func_ptr variable
>   to vect_recog_func, use vect_recog_func_ptr typedef for it.

Ok.


r~


Re: Initial shrink-wrapping patch

2011-10-06 Thread Richard Henderson
On 10/06/2011 09:01 AM, Bernd Schmidt wrote:
> On 10/06/11 17:57, Ian Lance Taylor wrote:
>> There is absolutely no reason to try to shrink wrap that code.  It will
>> never help.  That code always has to be first.  It especially has to be
>> first because the gold linker recognizes the prologue specially when a
>> split-stack function calls a non-split-stack function, in order to
>> request a larger stack.
> 
> Urgh, ok.
> 
>> Therefore, it seems to me that we should apply shrink wrapping to the
>> function as it exists *before* the split-stack prologue is created.  The
>> flag_split_stack bit should be moved after the flag_shrink_wrap bit.
> 
> Sounds like we just need to always emit the split prologue on the
> original entry edge then. Can you test the following with Go?

Looks reasonable.

I wonder if we can have this as a generic feature?  I'm thinking about
things like the MIPS and Alpha load-gp stuff.  That operation also needs
to happen exactly at the start of the function, due to the pc-relative
nature of the operation.

I do see that MIPS works around this by emitting the load-gp as text
in the legacy prologue.  But Alpha makes some effort to emit this as
rtl, so that the scheduler knows about the two pipeline reservations
and the latency of any use of the gp register.

Would a "pre_prologue" named pattern seem wrong to anyone?


r~


Re: [PATCH] Fix PR38885

2011-10-06 Thread H.J. Lu
On Wed, Oct 5, 2011 at 6:48 AM, Richard Guenther  wrote:
>
> I'm testing a pair of patches to fix PR38885 (for constants)
> and PR38884 (for non-constants) stores to complex/vector memory
> and CSE of component accesses from SCCVN.
>
> This is the piece that handles stores from constants and partial
> reads of it.  We can conveniently re-use fold-const native
> encode/interpret code for this.
>
> Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.
>
> Richard.
>
> 2011-10-05  Richard Guenther  
>
>        PR tree-optimization/38885
>        * tree-ssa-sccvn.c (vn_reference_lookup_3): Handle partial reads
>        from constants.
>
>        * gcc.dg/tree-ssa/ssa-fre-33.c: New testcase.
>

This caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50634


-- 
H.J.


Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs

2011-10-06 Thread Jeff Law

On 10/06/11 09:37, Jakub Jelinek wrote:
> On Thu, Oct 06, 2011 at 05:28:36PM +0200, Kai Tietz wrote:
>> None.  I had this implemented first.  But Richard was concerned
>> about making non-IF conditions too long.  I understand the point
>> that it might not be that good to always modify them
>> unconditionally into an AND/OR chain.  For example
>> 
>> if (a1 && a2 && a3 && ... && a100) return 1;
>> 
>> would be packed by this patch into 50 branches.  If we modified
>> all of them into ANDs, then we would calculate the result for all
>> 100 values, even if the first a1 is zero.  This doesn't improve
>> speed much.
>> 
>> But you are right, that from the point of reassociation
>> optimization it could be in some cases more profitable to have
>> packed all elements into on AND-chain.
> 
> Yeah.  Perhaps we should break them up after reassoc2, or on the
> other side teach reassoc (or some other pass) to be able to do the
> optimizations on a series of GIMPLE_COND with no side-effects in
> between. See e.g. PR46309, return a == 3 || a == 1 || a == 2 || a
> == 4; isn't optimized into (a - 1U) < 4U, although it could, if
> branch cost cause it to be broken up into several GIMPLE_COND
> stmts. Or if user writes: if (a == 3) return 1; if (a == 1) return
> 1; if (a == 2) return 1; if (a == 4) return 1; return 0; (more
> probably using enums).
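Concretely, the equivalence Jakub refers to; both functions return the same
value for every unsigned a (illustration only):

  int f_orig (unsigned int a)
  { return a == 3 || a == 1 || a == 2 || a == 4; }

  int f_opt (unsigned int a)
  { return a - 1U < 4U; }  /* 1..4 map to 0..3; 0 wraps to UINT_MAX */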
I haven't followed this thread as closely as perhaps I should, but what
I'm seeing discussed now looks a lot like condition merging, and I'm
pretty sure there's some research in this area that might guide us;
"multi-variable condition merging" is the term the authors used.

jeff


Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs

2011-10-06 Thread Kai Tietz
2011/10/6 Michael Matz :
> Hi,
>
> On Thu, 6 Oct 2011, Kai Tietz wrote:
>
>> > at which point this association doesn't make sense anymore, as
>>
>> Sorry, exactly this doesn't happen, because an ANDIF isn't simple, and
>> therefore it isn't transformed into an AND.
>
> Right ...
>
>> >  ((W AND X) AND Y) AND Z
>> >
>> > is just as fine.  So, the reassociation looks fishy at best, better get
>> > rid of it?  (which of the testcases breaks without it?)
>>
>> None.  I had this implemented first.  But Richard was concerned about
>> making non-IF conditions too long.  I understand the point that it
>> might not be that good to always modify unconditionally to an AND/OR chain.
>
> ... and I see that (that's why the transformation should be desirable for
> some definition of desirable, which probably includes "and RHS not too
> long chain").  As it stands right now your transformation seems to be a
> fairly ad-hoc try at avoiding this problem.  That's why I wonder why to do
> the reassoc at all?  Which testcases break _without_ the reassociation,
> i.e. with only rewriting ANDIF -> AND at the outermost level?

I don't do reassociation on the inner operands here.  Note that the patch
calls build2_loc, and not fold_build2_loc anymore, so it no longer retries
to associate the inner operands (which might be of interest for the issue
Jakub mentioned).

There is no test actually failing AFAICS.  I just noticed size differences
from this.  Nevertheless it might be better to enhance the reassociation
pass to break up (and repropagate) GIMPLE_CONDs with no side effects, as
Jakub suggested.

The other option might be to allow chains deeper than two elements within
one AND/OR expression, but this would be architecture dependent.  For x86,
as an example, the instruction cycles typically spent on a branch suggest
it might be profitable to have 3 or 4 leaves within one AND/OR chain.  On
other architectures the profitable number of leaves will surely vary.
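As an illustration, assuming the operands are side-effect-free and already
boolean (0/1), a limit of four leaves per chain would turn

  if (a1 && a2 && a3 && a4 && a5 && a6 && a7 && a8) return 1;

into one branch per four-leaf group:

  if ((a1 & a2 & a3 & a4) && (a5 & a6 & a7 & a8)) return 1;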

Regards,
Kai


Re: [PATCH] Fix PR46556 (poor address generation)

2011-10-06 Thread Jeff Law

On 10/06/11 04:13, Richard Guenther wrote:

> 
> People have already commented on pieces, so I'm looking only at the
> tree-ssa-reassoc.c pieces (did you consider piggy-backing on IVOPTs
> instead?  The idea is to expose additional CSE opportunities,
> right?  So it's sort-of a strength-reduction optimization on scalar
> code (classically strength reduction in loops transforms for (i) {
> z = i*x; } to z = 0; for (i) { z += x }). That might be worth in
> general, even for non-address cases. So - if you rename that thing
> to tree-ssa-strength-reduce.c you can get away without
> piggy-backing on anything ;)  If you structure it to detect a
> strength reduction opportunity (thus, you'd need to match
> two/multiple of the patterns at the same time) that would be a
> bonus ... generalizing it a little bit would be another.
There's a variety of literature that uses PRE to detect and optimize
straightline code strength reduction.  I poked at it at one time (RTL
gcse framework) and it looked reasonably promising.  Never pushed it
all the way through.

jeff


Re: [PATCH] Optimize COND_EXPR where then/else operands are INTEGER_CSTs of different size than the comparison operands

2011-10-06 Thread Ira Rosen
On 6 October 2011 19:28, Jakub Jelinek  wrote:
> On Thu, Oct 06, 2011 at 07:27:28PM +0200, Ira Rosen wrote:
>> > +             i = 1;
>> > +             if ((rhs_code == COND_EXPR || rhs_code == VEC_COND_EXPR)
>>
>> I don't understand why we need VEC_COND_EXPR here.
>
> Only for completeness, as VEC_COND_EXPR is the same weirdo thingie like
> COND_EXPR.  I can leave that out if you want.

But we mark stmts that we want to vectorize here. I think that
expecting a vector stmt is confusing. So yes, please, leave it out.

Thanks,
Ira

>
>        Jakub
>


Re: [PATCH] Add support for lzd and popc instructions on sparc.

2011-10-06 Thread Richard Henderson
On 10/05/2011 11:41 PM, David Miller wrote:
> +(define_expand "popcount2"
> +  [(set (match_operand:SIDI 0 "register_operand" "")
> +(popcount:SIDI (match_operand:SIDI 1 "register_operand" "")))]
> +  "TARGET_POPC"
> +{
> +  if (! TARGET_ARCH64)
> +{
> +  emit_insn (gen_popcount_v8plus (operands[0], operands[1]));
> +  DONE;
> +}
> +})
> +
> +(define_insn "*popcount_sp64"
> +  [(set (match_operand:SIDI 0 "register_operand" "=r")
> +(popcount:SIDI (match_operand:SIDI 1 "register_operand" "r")))]
> +  "TARGET_POPC && TARGET_ARCH64"
> +  "popc\t%1, %0")

You've said that POPC only operates on the full 64-bit register,
but I see no zero-extend of the SImode input?  Similarly for 
the clzsi patterns.

If it weren't for the v8plus ugliness, it would be sufficient to
only expose the DImode patterns, and let optabs.c do the work to
extend from SImode...
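To illustrate the stale-upper-bits hazard at the C level (a sketch, not the
md change itself):

  unsigned int
  popcount32 (unsigned int x)
  {
    /* A popc that always counts all 64 bits must see zeros in bits
       32..63; without the explicit zero-extend, leftover junk in the
       upper half of the register would be counted as well.  */
    unsigned long long wide = x;
    return __builtin_popcountll (wide);
  }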


r~


[Patch 0/5] ARM 64 bit sync atomic operations [V3]

2011-10-06 Thread Dr. David Alan Gilbert

Hi,
  This is V3 of a series of 5 patches relating to ARM atomic operations;
they incorporate most of the feedback from V2.  Note the patch numbering/
ordering is different from v2; the two simple patches are now first.

  1) Correct the definition of TARGET_HAVE_DMB_MCR so that it doesn't
 produce the mcr instruction in Thumb1 (and enable on ARMv6 not just 6k
 as per the docs).
  2) Fix pr48126 which is a misplaced barrier in the atomic generation
  3) Provide 64 bit atomic operations using the new ldrexd/strexd in ARMv6k 
 and above.
  4) Provide fallbacks so that when compiled for earlier CPUs a Linux kernel
 assist is called (as per 32bit and smaller ops)
  5) Add test cases and support for those test cases, for the operations
 added in (3) and (4).

This code has been tested built on x86-64 cross to ARM run in ARM and Thumb
(C, C++, Fortran).

It is against git rev 68a79dfc.

Relative to v2:
  Test cases split out
  Code sharing between the test cases
  More coding style cleanup
  A handful of NULL->NULL_RTX changes

Relative to v1:
  Split the DMB_MCR patch out
  Provide complete changelogs
  Don't emit IT instruction except in Thumb2 mode
  Move iterators to iterators.md (didn't move the table since it was specific
to sync.md)
  Remove sync_atleastsi
  Use sync_predtab in as many places as possible
  Avoid headers in libgcc
  Made various libgcc routines I added static
  used __write instead of write
  Comment the barrier move to explain it more

  Note that the kernel interface has remained the same for the helper, and as
such I've not changed the way the helper calling in patch 2 is structured.

This work is part of Linaro blueprint:
https://blueprints.launchpad.net/linaro-toolchain-misc/+spec/64-bit-sync-primitives

Dave



[Patch 1/5] ARM 64 bit sync atomic operations [V3]

2011-10-06 Thread Dr. David Alan Gilbert
   gcc/
   * config/arm/arm.h (TARGET_HAVE_DMB_MCR): MCR not available in Thumb1

diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
index 993e3a0..f6f1da7 100644
--- a/gcc/config/arm/arm.h
+++ b/gcc/config/arm/arm.h
@@ -288,7 +288,8 @@ extern void (*arm_lang_output_object_attributes_hook)(void);
 #define TARGET_HAVE_DMB(arm_arch7)
 
 /* Nonzero if this chip implements a memory barrier via CP15.  */
-#define TARGET_HAVE_DMB_MCR(arm_arch6k && ! TARGET_HAVE_DMB)
+#define TARGET_HAVE_DMB_MCR(arm_arch6 && ! TARGET_HAVE_DMB \
+&& ! TARGET_THUMB1)
 
 /* Nonzero if this chip implements a memory barrier instruction.  */
 #define TARGET_HAVE_MEMORY_BARRIER (TARGET_HAVE_DMB || TARGET_HAVE_DMB_MCR)


[Patch 2/5] ARM 64 bit sync atomic operations [V3]

2011-10-06 Thread Dr. David Alan Gilbert
Michael K. Edwards points out in PR/48126 that the sync is in the wrong 
place
relative to the branch target of the compare, since the load could float
up beyond the ldrex.
  
PR target/48126

  * config/arm/arm.c (arm_output_sync_loop): Move label before barrier

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 5161439..6e7105a 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -24214,8 +24214,11 @@ arm_output_sync_loop (emit_f emit,
}
 }
 
-  arm_process_output_memory_barrier (emit, NULL);
+  /* Note: label is before barrier so that in cmp failure case we still get
+ a barrier to stop subsequent loads floating upwards past the ldrex
+ pr/48126.  */
   arm_output_asm_insn (emit, 1, operands, "%sLSYB%%=:", LOCAL_LABEL_PREFIX);
+  arm_process_output_memory_barrier (emit, NULL);
 }
 
 static rtx


[Patch 3/5] ARM 64 bit sync atomic operations [V3]

2011-10-06 Thread Dr. David Alan Gilbert
Add support for ARM 64bit sync intrinsics.

gcc/
* arm.c (arm_output_ldrex): Support ldrexd.
  (arm_output_strex): Support strexd.
  (arm_output_it): New helper to output it in Thumb2 mode only.
  (arm_output_sync_loop): Support DI mode,
 Change comment to not support const_int.
  (arm_expand_sync): Support DI mode.

* arm.h (TARGET_HAVE_LDREXBHD): Split into LDREXBH and LDREXD.

* iterators.md (NARROW): move from sync.md.
  (QHSD): New iterator for all current ARM integer modes.
  (SIDI): New iterator for SI and DI modes only.

* sync.md (sync_predtab): New mode_attr.
  (sync_compare_and_swapsi): Fold into sync_compare_and_swap<mode>.
  (sync_lock_test_and_setsi): Fold into sync_lock_test_and_set<mode>.
  (sync_<sync_optab>si): Fold into sync_<sync_optab><mode>.
  (sync_nandsi): Fold into sync_nand<mode>.
  (sync_new_<sync_optab>si): Fold into sync_new_<sync_optab><mode>.
  (sync_new_nandsi): Fold into sync_new_nand<mode>.
  (sync_old_<sync_optab>si): Fold into sync_old_<sync_optab><mode>.
  (sync_old_nandsi): Fold into sync_old_nand<mode>.
  (sync_compare_and_swap<mode>): Support SI & DI.
  (sync_lock_test_and_set<mode>): Likewise.
  (sync_<sync_optab><mode>): Likewise.
  (sync_nand<mode>): Likewise.
  (sync_new_<sync_optab><mode>): Likewise.
  (sync_new_nand<mode>): Likewise.
  (sync_old_<sync_optab><mode>): Likewise.
  (sync_old_nand<mode>): Likewise.
  (arm_sync_compare_and_swapsi): Turn into iterator on SI & DI.
  (arm_sync_lock_test_and_setsi): Likewise.
  (arm_sync_new_<sync_optab>si): Likewise.
  (arm_sync_new_nandsi): Likewise.
  (arm_sync_old_<sync_optab>si): Likewise.
  (arm_sync_old_nandsi): Likewise.
  (arm_sync_compare_and_swap<mode> NARROW): Use sync_predtab, fix indent.
  (arm_sync_lock_test_and_set<mode> NARROW): Likewise.
  (arm_sync_new_<sync_optab><mode> NARROW): Likewise.
  (arm_sync_new_nand<mode> NARROW): Likewise.
  (arm_sync_old_<sync_optab><mode> NARROW): Likewise.
  (arm_sync_old_nand<mode> NARROW): Likewise.

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 6e7105a..51c0f3f 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -24039,12 +24039,26 @@ arm_output_ldrex (emit_f emit,
  rtx target,
  rtx memory)
 {
-  const char *suffix = arm_ldrex_suffix (mode);
-  rtx operands[2];
+  rtx operands[3];
 
   operands[0] = target;
-  operands[1] = memory;
-  arm_output_asm_insn (emit, 0, operands, "ldrex%s\t%%0, %%C1", suffix);
+  if (mode != DImode)
+{
+  const char *suffix = arm_ldrex_suffix (mode);
+  operands[1] = memory;
+  arm_output_asm_insn (emit, 0, operands, "ldrex%s\t%%0, %%C1", suffix);
+}
+  else
+{
+  /* The restrictions on target registers in ARM mode are that the two
+registers are consecutive and the first one is even; Thumb is
+actually more flexible, but DI should give us this anyway.
+Note that the 1st register always gets the lowest word in memory.  */
+  gcc_assert ((REGNO (target) & 1) == 0);
+  operands[1] = gen_rtx_REG (SImode, REGNO (target) + 1);
+  operands[2] = memory;
+  arm_output_asm_insn (emit, 0, operands, "ldrexd\t%%0, %%1, %%C2");
+}
 }
 
 /* Emit a strex{b,h,d, } instruction appropriate for the specified
@@ -24057,14 +24071,41 @@ arm_output_strex (emit_f emit,
  rtx value,
  rtx memory)
 {
-  const char *suffix = arm_ldrex_suffix (mode);
-  rtx operands[3];
+  rtx operands[4];
 
   operands[0] = result;
   operands[1] = value;
-  operands[2] = memory;
-  arm_output_asm_insn (emit, 0, operands, "strex%s%s\t%%0, %%1, %%C2", suffix,
-  cc);
+  if (mode != DImode)
+{
+  const char *suffix = arm_ldrex_suffix (mode);
+  operands[2] = memory;
+  arm_output_asm_insn (emit, 0, operands, "strex%s%s\t%%0, %%1, %%C2",
+ suffix, cc);
+}
+  else
+{
+  /* The restrictions on target registers in ARM mode are that the two
+registers are consecutive and the first one is even; Thumb is
+actually more flexible, but DI should give us this anyway.
+Note that the 1st register always gets the lowest word in memory.  */
+  gcc_assert ((REGNO (value) & 1) == 0 || TARGET_THUMB2);
+  operands[2] = gen_rtx_REG (SImode, REGNO (value) + 1);
+  operands[3] = memory;
+  arm_output_asm_insn (emit, 0, operands, "strexd%s\t%%0, %%1, %%2, %%C3",
+  cc);
+}
+}
+
+/* Helper to emit an it instruction in Thumb2 mode only; although the assembler
+   will ignore it in ARM mode, emitting it would mess up the instruction counts
+   we sometimes keep.  'flags' gives the extra t's and e's if it's more than one
+   instruction that is conditional.  */
+static void
+arm_output_it (emit_f emit, const char *flags, const char *cond)
+{
+  rtx operands[1]; /* Don't actually use the operand.  */
+  if (TARGET_THUMB2)
+arm_output_asm_insn (emit, 0, operands, "it%s\t%s", flags, 

[Patch 4/5] ARM 64 bit sync atomic operations [V3]

2011-10-06 Thread Dr. David Alan Gilbert
Add ARM 64bit sync helpers for use on older ARMs.  Based on 32bit
versions but with check for sufficiently new kernel version.

gcc/
* config/arm/linux-atomic-64bit.c: New (based on linux-atomic.c)
* config/arm/linux-atomic.c: Change comment to point to 64bit version
  (SYNC_LOCK_RELEASE): Instantiate 64bit version.
* config/arm/t-linux-eabi: Pull in linux-atomic-64bit.c


diff --git a/gcc/config/arm/linux-atomic-64bit.c b/gcc/config/arm/linux-atomic-64bit.c
new file mode 100644
index 000..6966e66
--- /dev/null
+++ b/gcc/config/arm/linux-atomic-64bit.c
@@ -0,0 +1,166 @@
+/* 64bit Linux-specific atomic operations for ARM EABI.
+   Copyright (C) 2008, 2009, 2010, 2011 Free Software Foundation, Inc.
+   Based on linux-atomic.c
+
+   64 bit additions david.gilb...@linaro.org
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify it under
+the terms of the GNU General Public License as published by the Free
+Software Foundation; either version 3, or (at your option) any later
+version.
+
+GCC is distributed in the hope that it will be useful, but WITHOUT ANY
+WARRANTY; without even the implied warranty of MERCHANTABILITY or
+FITNESS FOR A PARTICULAR PURPOSE.  See the GNU General Public License
+for more details.
+
+Under Section 7 of GPL version 3, you are granted additional
+permissions described in the GCC Runtime Library Exception, version
+3.1, as published by the Free Software Foundation.
+
+You should have received a copy of the GNU General Public License and
+a copy of the GCC Runtime Library Exception along with this program;
+see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+<http://www.gnu.org/licenses/>.  */
+
+/* 64bit helper functions for atomic operations; the compiler will
+   call these when the code is compiled for a CPU without ldrexd/strexd.
+   (If the CPU had those then the compiler inlines the operation).
+
+   These helpers require a kernel helper that's only present on newer
+   kernels; we check for that in an init section and bail out rather
+   unceremoniously.  */
+
+extern unsigned int __write (int fd, const void *buf, unsigned int count);
+extern void abort (void);
+
+/* Kernel helper for compare-and-exchange.  */
+typedef int (__kernel_cmpxchg64_t) (const long long* oldval,
+   const long long* newval,
+   long long *ptr);
+#define __kernel_cmpxchg64 (*(__kernel_cmpxchg64_t *) 0xffff0f60)
+
+/* Kernel helper page version number.  */
+#define __kernel_helper_version (*(unsigned int *)0xffff0ffc)
+
+/* Check that the kernel has a new enough version at load.  */
+static void __check_for_sync8_kernelhelper (void)
+{
+  if (__kernel_helper_version < 5)
+{
+  const char err[] = "A newer kernel is required to run this binary. "
+   "(__kernel_cmpxchg64 helper)\n";
+  /* At this point we need a way to crash with some information
+for the user - I'm not sure I can rely on much else being
+available at this point, so do the same as generic-morestack.c
+write () and abort ().  */
+  __write (2 /* stderr.  */, err, sizeof (err));
+  abort ();
+}
+};
+
+static void (*__sync8_kernelhelper_inithook[]) (void)
+   __attribute__ ((used, section (".init_array"))) = {
+  &__check_for_sync8_kernelhelper
+};
+
+#define HIDDEN __attribute__ ((visibility ("hidden")))
+
+#define FETCH_AND_OP_WORD64(OP, PFX_OP, INF_OP)\
+  long long HIDDEN \
+  __sync_fetch_and_##OP##_8 (long long *ptr, long long val)\
+  {\
+int failure;   \
+long long tmp,tmp2;\
+   \
+do {   \
+  tmp = *ptr;  \
+  tmp2 = PFX_OP (tmp INF_OP val);  \
+  failure = __kernel_cmpxchg64 (&tmp, &tmp2, ptr); \
+} while (failure != 0);\
+   \
+return tmp;\
+  }
+
+FETCH_AND_OP_WORD64 (add,   , +)
+FETCH_AND_OP_WORD64 (sub,   , -)
+FETCH_AND_OP_WORD64 (or,, |)
+FETCH_AND_OP_WORD64 (and,   , &)
+FETCH_AND_OP_WORD64 (xor,   , ^)
+FETCH_AND_OP_WORD64 (nand, ~, &)
+
+#define NAME_oldval(OP, WIDTH) __sync_fetch_and_##OP##_##WIDTH
+#define NAME_newval(OP, WIDTH) __sync_##OP##_and_fetch_##WIDTH
+
+/* Implement both __sync__and_fetch and __sync_fetch_and_ for
+   subword-sized quantities.  */
+
+#define OP_AND_FETCH_WORD64(OP, PFX_OP, INF_OP)\
+  long long HIDDEN   
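For reference, the first instantiation above, FETCH_AND_OP_WORD64 (add, , +),
expands to the equivalent of:

  long long HIDDEN
  __sync_fetch_and_add_8 (long long *ptr, long long val)
  {
    int failure;
    long long tmp, tmp2;

    do {
        tmp = *ptr;                /* sample the current value */
        tmp2 = tmp + val;          /* prospective new value */
        failure = __kernel_cmpxchg64 (&tmp, &tmp2, ptr);
      } while (failure != 0);      /* retry until the swap sticks */

    return tmp;                    /* old value, as fetch_and_add requires */
  }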

[Patch 5/5] ARM 64 bit sync atomic operations [V3]

2011-10-06 Thread Dr. David Alan Gilbert
   Test support for ARM 64bit sync intrinsics.

  gcc/testsuite/
* gcc.dg/di-longlong64-sync-1.c: New test.
* gcc.dg/di-sync-multithread.c: New test.
* gcc.target/arm/di-longlong64-sync-withhelpers.c: New test.
* gcc.target/arm/di-longlong64-sync-withldrexd.c: New test.
* lib/target-supports.exp: (arm_arch_*_ok): Series of  effective-target
tests for v5, v6, v6k, and v7-a, and add-options helpers.
  (check_effective_target_arm_arm_ok): New helper.
  (check_effective_target_sync_longlong): New helper.

diff --git a/gcc/testsuite/gcc.dg/di-longlong64-sync-1.c b/gcc/testsuite/gcc.dg/di-longlong64-sync-1.c
new file mode 100644
index 000..82a4ea2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/di-longlong64-sync-1.c
@@ -0,0 +1,164 @@
+/* { dg-do run } */
+/* { dg-require-effective-target sync_longlong } */
+/* { dg-options "-std=gnu99" } */
+/* { dg-message "note: '__sync_fetch_and_nand' changed semantics in GCC 4.4" "" { target *-*-* } 0 } */
+/* { dg-message "note: '__sync_nand_and_fetch' changed semantics in GCC 4.4" "" { target *-*-* } 0 } */
+
+
+/* Test basic functionality of the intrinsics.  The operations should
+   not be optimized away if no one checks the return values.  */
+
+/* Based on ia64-sync-[12].c, but 1) long on ARM is 32 bit so use long long
+   (an explicit 64bit type maybe a better bet) and 2) Use values that cross
+   the 32bit boundary and cause carries since the actual maths are done as
+   pairs of 32 bit instructions.  */
+
+/* Note: This file is #included by some of the ARM tests.  */
+
+__extension__ typedef __SIZE_TYPE__ size_t;
+
+extern void abort (void);
+extern void *memcpy (void *, const void *, size_t);
+extern int memcmp (const void *, const void *, size_t);
+
+/* Temporary space where the work actually gets done.  */
+static long long AL[24];
+/* Values copied into AL before we start.  */
+static long long init_di[24] = { 0x100000002ll, 0x200000003ll, 0, 1,
+
+0x100000002ll, 0x100000002ll,
+0x100000002ll, 0x100000002ll,
+
+0, 0x1000e0de0000ll,
+42 , 0xc001c0de0000ll,
+
+-1ll, 0, 0xff00ff0000ll, -1ll,
+
+0, 0x1000e0de0000ll,
+42 , 0xc001c0de0000ll,
+
+-1ll, 0, 0xff00ff0000ll, -1ll};
+/* This is what should be in AL at the end.  */
+static long long test_di[24] = { 0x1234567890ll, 0x1234567890ll, 1, 0,
+
+0x100000002ll, 0x100000002ll,
+0x100000002ll, 0x100000002ll,
+
+1, 0xc001c0de0000ll,
+20, 0x1000e0de0000ll,
+
+0x300000007ll , 0x500000009ll,
+0xf100ff0001ll, ~0xa00000007ll,
+
+1, 0xc001c0de0000ll,
+20, 0x1000e0de0000ll,
+
+0x300000007ll , 0x500000009ll,
+0xf100ff0001ll, ~0xa00000007ll };
+
+/* First check they work in terms of what they do to memory.  */
+static void
+do_noret_di (void)
+{
+  __sync_val_compare_and_swap (AL+0, 0x100000002ll, 0x1234567890ll);
+  __sync_bool_compare_and_swap (AL+1, 0x200000003ll, 0x1234567890ll);
+  __sync_lock_test_and_set (AL+2, 1);
+  __sync_lock_release (AL+3);
+
+  /* The following tests should not change the value since the
+ original does NOT match.  */
+  __sync_val_compare_and_swap (AL+4, 0x000000002ll, 0x1234567890ll);
+  __sync_val_compare_and_swap (AL+5, 0x100000000ll, 0x1234567890ll);
+  __sync_bool_compare_and_swap (AL+6, 0x000000002ll, 0x1234567890ll);
+  __sync_bool_compare_and_swap (AL+7, 0x100000000ll, 0x1234567890ll);
+
+  __sync_fetch_and_add (AL+8, 1);
+  __sync_fetch_and_add (AL+9, 0xb000e0000000ll); /* + to both halves & carry.  */
+  __sync_fetch_and_sub (AL+10, 22);
+  __sync_fetch_and_sub (AL+11, 0xb000e0000000ll);
+
+  __sync_fetch_and_and (AL+12, 0x300000007ll);
+  __sync_fetch_and_or (AL+13, 0x500000009ll);
+  __sync_fetch_and_xor (AL+14, 0xe00000001ll);
+  __sync_fetch_and_nand (AL+15, 0xa00000007ll);
+
+  /* These should be the same as the fetch_and_* cases except for
+ return value.  */
+  __sync_add_and_fetch (AL+16, 1);
+  /* add to both halves & carry.  */
+  __sync_add_and_fetch (AL+17, 0xb000e0000000ll);
+  __sync_sub_and_fetch (AL+18, 22);
+  __sync_sub_and_fetch (AL+19, 0xb000e0000000ll);
+
+  __sync_and_and_fetch (AL+20, 0x300000007ll);
+  __sync_or_and_fetch (AL+21, 0x500000009ll);
+  __sync_xor_and_fetch (AL+22, 0xe00000001ll);
+  __sync_nand_and_fetch (AL+23, 0xa00000007ll);
+}
+
+/* Now check return values.  */
+static void
+do_ret_di (void)
+{
+  if (__sync_val_compare_and_swap (AL+0, 0x100000002ll, 0x1234567890ll) !=
+   0x100000002ll) abort (

Re: [PATCH] Fix PR46556 (poor address generation)

2011-10-06 Thread William J. Schmidt
On Thu, 2011-10-06 at 11:35 -0600, Jeff Law wrote:
> 
> On 10/06/11 04:13, Richard Guenther wrote:
> 
> > 
> > People have already commented on pieces, so I'm looking only at the
> > tree-ssa-reassoc.c pieces (did you consider piggy-backing on IVOPTs
> > instead?  The idea is to expose additional CSE opportunities,
> > right?  So it's sort-of a strength-reduction optimization on scalar
> > code (classically strength reduction in loops transforms for (i) {
> > z = i*x; } to z = 0; for (i) { z += x }). That might be worth in
> > general, even for non-address cases. So - if you rename that thing
> > to tree-ssa-strength-reduce.c you can get away without
> > piggy-backing on anything ;)  If you structure it to detect a
> > strength reduction opportunity (thus, you'd need to match
> > two/multiple of the patterns at the same time) that would be a
> > bonus ... generalizing it a little bit would be another.
> There's a variety of literature that uses PRE to detect and optimize
> straightline code strength reduction.  I poked at it at one time (RTL
> gcse framework) and it looked reasonably promising.  Never pushed it
> all the way through.
> 
> jeff

I ran across http://gcc.gnu.org/bugzilla/show_bug.cgi?id=22586 and
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=35308 that show this
question has come up before.  The former also suggested a PRE-based
approach.



Re: Initial shrink-wrapping patch

2011-10-06 Thread Bernd Schmidt
HJ found some more maybe_record_trace_start failures. In one case I
debugged, we have

(insn/f 31 0 32 (parallel [
(set (reg/f:DI 7 sp)
(plus:DI (reg/f:DI 7 sp)
(const_int 8 [0x8])))
(clobber (reg:CC 17 flags))
(clobber (mem:BLK (scratch) [0 A8]))
]) -1
 (expr_list:REG_CFA_ADJUST_CFA (set (reg/f:DI 7 sp)
(plus:DI (reg/f:DI 7 sp)
(const_int 8 [0x8])))
(nil)))

The insn pattern is later changed by csa to adjust by 24, and the note
is left untouched; that seems to be triggering the problem.

Richard, is there a reason to use REG_CFA_ADJUST_CFA rather than
REG_CFA_DEF_CFA? If no, I'll just try to fix i386.c not to emit the former.


Bernd


Re: Vector shuffling

2011-10-06 Thread Georg-Johann Lay

Richard Henderson wrote:

On 10/06/2011 04:46 AM, Georg-Johann Lay wrote:


So here it is.  Lightly tested on my target: All tests either PASS or are
UNSUPPORTED now.

Ok?


Not ok, but only because I've completely restructured the tests again.
Patch coming very shortly...


Thanks, I hope your patch fixed the issues addressed in my patch :-)

Johann



r~




