[PATCH] Optimize __sync_fetch_and_add (x, -N) == N and __sync_add_and_fetch (x, N) == 0 (PR target/48986)

2011-05-17 Thread Jakub Jelinek
Hi!

This patch optimizes using peephole2 __sync_fetch_and_add (x, -N) == N
and __sync_add_and_fetch (x, N) == 0 by just doing lock {add,sub,inc,dec}
and testing flags, instead of lock xadd plus comparison.
The sync_old_add predicate change makes it possible to optimize
__sync_add_and_fetch with a constant second argument to the same
code as __sync_fetch_and_add.  Doing it in peephole2 has disadvantages
though: the 3 instructions need to be consecutive, and e.g. the xadd
insn has to be supported by the CPU.  An alternative would be to come
up with a new bool builtin representing the whole
__sync_fetch_and_add (x, -N) == N operation (perhaps with a dot or
space in its name to make it inaccessible), try to match it during
some folding and expand it using a special optab.
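For illustration, the source-level shape this peephole2 targets is a
decrement-and-test on an atomic counter; the reference-counting example
below is hypothetical, not taken from the PR:

```c
#include <assert.h>

/* Hypothetical reference-counting example.  The test
   __sync_fetch_and_add (&refcount, -1) == 1 asks "did this call drop
   the last reference?".  With the peephole2 this can compile to a
   single "lock sub"/"lock dec" followed by a flags test, instead of
   "lock xadd" plus a separate compare.  */
int refcount = 1;

int
release_ref (void)
{
  return __sync_fetch_and_add (&refcount, -1) == 1;
}
```

The equivalent __sync_add_and_fetch (&refcount, -1) == 0 form is what the
relaxed sync_old_add predicate lets GCC expand to the same code.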

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk
this way?

2011-05-16  Jakub Jelinek  

PR target/48986
* config/i386/sync.md (sync_old_add): Relax operand 2
predicate to allow CONST_INT.
(*sync_old_add_cmp): New insn and peephole2 for it.

--- gcc/config/i386/sync.md.jj  2010-05-21 11:46:29.0 +0200
+++ gcc/config/i386/sync.md 2011-05-16 14:42:08.0 +0200
@@ -170,11 +170,62 @@ (define_insn "sync_old_add"
  [(match_operand:SWI 1 "memory_operand" "+m")] UNSPECV_XCHG))
(set (match_dup 1)
(plus:SWI (match_dup 1)
- (match_operand:SWI 2 "register_operand" "0")))
+ (match_operand:SWI 2 "nonmemory_operand" "0")))
(clobber (reg:CC FLAGS_REG))]
   "TARGET_XADD"
   "lock{%;} xadd{<imodesuffix>}\t{%0, %1|%1, %0}")
 
+(define_peephole2
+  [(set (match_operand:SWI 0 "register_operand" "")
+   (match_operand:SWI 2 "const_int_operand" ""))
+   (parallel [(set (match_dup 0)
+  (unspec_volatile:SWI
+[(match_operand:SWI 1 "memory_operand" "")] UNSPECV_XCHG))
+ (set (match_dup 1)
+  (plus:SWI (match_dup 1)
+(match_dup 0)))
+ (clobber (reg:CC FLAGS_REG))])
+   (set (reg:CCZ FLAGS_REG)
+   (compare:CCZ (match_dup 0)
+(match_operand:SWI 3 "const_int_operand" "")))]
+  "peep2_reg_dead_p (3, operands[0])
+   && (unsigned HOST_WIDE_INT) INTVAL (operands[2])
+  == -(unsigned HOST_WIDE_INT) INTVAL (operands[3])
+   && !reg_overlap_mentioned_p (operands[0], operands[1])"
+  [(parallel [(set (reg:CCZ FLAGS_REG)
+  (compare:CCZ (unspec_volatile:SWI [(match_dup 1)]
+UNSPECV_XCHG)
+   (match_dup 3)))
+ (set (match_dup 1)
+  (plus:SWI (match_dup 1)
+(match_dup 2)))])])
+
+(define_insn "*sync_old_add_cmp"
+  [(set (reg:CCZ FLAGS_REG)
+   (compare:CCZ (unspec_volatile:SWI
+  [(match_operand:SWI 0 "memory_operand" "+m")]
+  UNSPECV_XCHG)
+(match_operand:SWI 2 "const_int_operand" "i")))
+   (set (match_dup 0)
+   (plus:SWI (match_dup 0)
+ (match_operand:SWI 1 "const_int_operand" "i")))]
+  "(unsigned HOST_WIDE_INT) INTVAL (operands[1])
+   == -(unsigned HOST_WIDE_INT) INTVAL (operands[2])"
+{
+  if (TARGET_USE_INCDEC)
+    {
+      if (operands[1] == const1_rtx)
+	return "lock{%;} inc{<imodesuffix>}\t%0";
+      if (operands[1] == constm1_rtx)
+	return "lock{%;} dec{<imodesuffix>}\t%0";
+    }
+
+  if (x86_maybe_negate_const_int (&operands[1], <MODE>mode))
+    return "lock{%;} sub{<imodesuffix>}\t{%1, %0|%0, %1}";
+
+  return "lock{%;} add{<imodesuffix>}\t{%1, %0|%0, %1}";
+})
+
 ;; Recall that xchg implicitly sets LOCK#, so adding it again wastes space.
 (define_insn "sync_lock_test_and_set"
  [(set (match_operand:SWI 0 "register_operand" "=<r>")

Jakub


Re: [patch, ARM] Fix PR42017, LR not used in leaf functions

2011-05-17 Thread Chung-Lin Tang
On 2011/5/13 04:26 PM, Richard Sandiford wrote:
> Richard Sandiford  writes:
>> Chung-Lin Tang  writes:
>>> My fix here simply adds 'reload_completed' as an additional condition
>>> for EPILOGUE_USES to return true for LR_REGNUM. I think this should be
>>> valid, as correct LR save/restoring is handled by the epilogue/prologue
>>> code; it should be safe for IRA to treat it as a normal call-used register.
>>
>> FWIW, epilogue_completed might be a more accurate choice.
> 
> I still stand by this, although I realise no other target does it.

I did a re-test of the patch just to be sure; as expected, the test
results were clean.  Attached is the updated patch.

>> It seems a lot of other ports suffer from the same problem though.
>> I wonder which targets really do want to make a register live throughout
>> the function?  If none do, perhaps we should say that this macro is
>> only meaningful once the epilogue has been generated.
> 
> To answer my own question, I suppose VRSAVE is one.  So I was wrong
> about the target-independent "fix".
> 
> Richard

To rehash what I remember we discussed at LDS, registers like VRSAVE
might be more appropriately treated as global registers.  It looks like
the intended use of EPILOGUE_USES could be clarified further...

To Richard Earnshaw and Ramana, is the patch okay for trunk?  This
should be a not-insignificant performance regression fix/improvement.

Thanks,
Chung-Lin
Index: config/arm/arm.h
===
--- config/arm/arm.h(revision 173814)
+++ config/arm/arm.h(working copy)
@@ -1627,7 +1627,7 @@
frame.  */
 #define EXIT_IGNORE_STACK 1
 
-#define EPILOGUE_USES(REGNO) ((REGNO) == LR_REGNUM)
+#define EPILOGUE_USES(REGNO) (epilogue_completed && (REGNO) == LR_REGNUM)
 
 /* Determine if the epilogue should be output as RTL.
You should override this if you define FUNCTION_EXTRA_EPILOGUE.  */


[PATCH, PR45098]

2011-05-17 Thread Tom de Vries
Hi Zdenek,

I have a patch set for PR45098.

01_object-size-target.patch
02_pr45098-rtx-cost-set.patch
03_pr45098-computation-cost.patch
04_pr45098-iv-init-cost.patch
05_pr45098-bound-cost.patch
06_pr45098-bound-cost.test.patch
07_pr45098-nowrap-limits-iterations.patch
08_pr45098-nowrap-limits-iterations.test.patch
09_pr45098-shift-add-cost.patch
10_pr45098-shift-add-cost.test.patch

I will send out the patches individually.

The patch set has been bootstrapped and reg-tested on x86_64, and
reg-tested on ARM.

The effect of the patch set on examples is the removal of 1 iterator,
demonstrated below for '-Os -mthumb -march=armv7-a' on example tr4.

tr4.c:
...
extern void foo2 (short*);
void tr4 (short array[], int n)
{
  int i;
  if (n > 0)
for (i = 0; i < n; i++)
  foo2 (&array[i]);
}
...

tr4.s diff (left without, right with patch):
...
push{r4, r5, r6, lr}  | cmp r1, #0
subsr6, r1, #0| push{r3, r4, r5, lr}
ble .L1 ble .L1
mov r5, r0| mov r4, r0
movsr4, #0| add r5, r0, r1, lsl #1
.L3:.L3:
mov r0, r5| mov r0, r4
addsr4, r4, #1| addsr4, r4, #2
bl  foo2bl  foo2
addsr5, r5, #2| cmp r4, r5
cmp r4, r6<
bne .L3 bne .L3
.L1:.L1:
pop {r4, r5, r6, pc}  | pop {r3, r4, r5, pc}
...
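In source terms, the change shown in the diff corresponds to replacing the
counter iterator with a pointer compared against a precomputed end pointer.
The sketch below is illustrative only; the function-pointer parameter is
added so the example is self-contained (the original tr4 calls an external
foo2):

```c
#include <assert.h>

/* Before: two live induction variables in the loop (i and &array[i]),
   matching the left column of the tr4.s diff.  */
void
tr4_before (short array[], int n, void (*fn) (short *))
{
  int i;
  if (n > 0)
    for (i = 0; i < n; i++)
      fn (&array[i]);
}

/* After: a single pointer iterator compared against a precomputed end
   pointer, matching the "add r5, r0, r1, lsl #1" / "cmp r4, r5"
   sequence in the right column.  */
void
tr4_after (short array[], int n, void (*fn) (short *))
{
  short *p = array;
  short *end = array + n;
  for (; p < end; p++)
    fn (p);
}

/* Helper used only to make the example testable.  */
void
mark (short *p)
{
  *p += 1;
}
```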


The effect of the patch set on the test cases in terms of size is listed
in the following 2 tables.

---
-Os -mthumb -march=armv7-a
---
         without  with  delta
---
tr1            32    30     -2
tr2            36    36      0
tr3            32    30     -2
tr4            26    26      0
tr5            20    20      0
---

---
-Os -march=armv7-a
---
         without  with  delta
---
tr1            60    52     -8
tr2            64    60     -4
tr3            60    52     -8
tr4            48    44     -4
tr5            36    32     -4
---


The size impact on several benchmarks is shown in the following table
(%, lower is better).

               none             pic
           thumb1  thumb2   thumb1  thumb2
spec2000     99.9    99.9     99.9    99.9
eembc        99.9   100.0     99.9   100.1
dhrystone   100.0   100.0    100.0   100.0
coremark     99.3    99.9     99.3   100.0

Thanks,
- Tom


[PATCH, PR45098, 1/10]

2011-05-17 Thread Tom de Vries
On 05/17/2011 09:10 AM, Tom de Vries wrote:
> Hi Zdenek,
> 
> I have a patch set for PR45098.
> 
> 01_object-size-target.patch
> 02_pr45098-rtx-cost-set.patch
> 03_pr45098-computation-cost.patch
> 04_pr45098-iv-init-cost.patch
> 05_pr45098-bound-cost.patch
> 06_pr45098-bound-cost.test.patch
> 07_pr45098-nowrap-limits-iterations.patch
> 08_pr45098-nowrap-limits-iterations.test.patch
> 09_pr45098-shift-add-cost.patch
> 10_pr45098-shift-add-cost.test.patch
> 
> I will send out the patches individually.

OK for trunk?

Thanks,
- Tom
2011-05-05  Tom de Vries  

	* lib/scanasm.exp (object-size): Fix target selector handling.

Index: gcc/testsuite/lib/scanasm.exp
===
--- gcc/testsuite/lib/scanasm.exp (revision 173734)
+++ gcc/testsuite/lib/scanasm.exp (working copy)
@@ -330,7 +330,7 @@ proc object-size { args } {
 	return
 }
 if { [llength $args] >= 4 } {
-	switch [dg-process-target [lindex $args 1]] {
+	switch [dg-process-target [lindex $args 3]] {
 	"S" { }
 	"N" { return }
 	"F" { setup_xfail "*-*-*" }


Re: [PATCH] Misc debug info improvements

2011-05-17 Thread Jakub Jelinek
On Mon, May 16, 2011 at 02:13:22PM +0200, Jakub Jelinek wrote:
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

Actually, I've missed one regression in libjava testing on x86_64-linux:
FAIL: pr26390 -O3 compilation from source
The problem is that when cselib_subst_to_values substitutes an ENTRY_VALUE
for the corresponding VALUE, we should treat ENTRY_VALUE like REG in
replace_expr_with_values; otherwise we can end up with var-tracking
noting an equivalence between a VALUE and the same VALUE, which causes
an ICE in set_slot_part.

Updated patch below, bootstrapped/regtested on x86_64-linux and i686-linux.
Ok?

2011-05-17  Jakub Jelinek  

* cselib.c (promote_debug_loc): Allow l->next non-NULL for
cselib_preserve_constants.
(cselib_lookup_1): If cselib_preserve_constants,
a new VALUE is being created for REG and there is a VALUE for the
same register in wider mode, add another loc with lowpart SUBREG of
the wider VALUE.
(cselib_subst_to_values): Handle ENTRY_VALUE.
* var-tracking.c (vt_expand_loc, vt_expand_loc_dummy): Increase
max_depth from 8 to 10.
(replace_expr_with_values): Return NULL for ENTRY_VALUE too.
* dwarf2out.c (convert_descriptor_to_signed): New function.
(mem_loc_descriptor) : Optimize using DW_OP_and
instead of two shifts.
(mem_loc_descriptor) : ZERO_EXTEND second argument to
the right mode if needed.
(mem_loc_descriptor) : For typed ops just use DW_OP_mod.
(mem_loc_descriptor) : Use
convert_descriptor_to_signed.
(mem_loc_descriptor) : Handle these rtls.

* gcc.dg/guality/bswaptest.c: New test.
* gcc.dg/guality/clztest.c: New test.
* gcc.dg/guality/ctztest.c: New test.
* gcc.dg/guality/rotatetest.c: New test.

--- gcc/cselib.c.jj 2011-05-02 18:39:28.0 +0200
+++ gcc/cselib.c2011-05-13 17:55:24.0 +0200
@@ -257,7 +257,7 @@ promote_debug_loc (struct elt_loc_list *
 {
   n_debug_values--;
   l->setting_insn = cselib_current_insn;
-  gcc_assert (!l->next);
+  gcc_assert (!l->next || cselib_preserve_constants);
 }
 }
 
@@ -1719,6 +1719,12 @@ cselib_subst_to_values (rtx x, enum mach
}
   return e->val_rtx;
 
+case ENTRY_VALUE:
+  e = cselib_lookup (x, GET_MODE (x), 0, memmode);
+  if (! e)
+   break;
+  return e->val_rtx;
+
 case CONST_DOUBLE:
 case CONST_VECTOR:
 case CONST_INT:
@@ -1843,6 +1849,43 @@ cselib_lookup_1 (rtx x, enum machine_mod
  used_regs[n_used_regs++] = i;
  REG_VALUES (i) = new_elt_list (REG_VALUES (i), NULL);
}
+  else if (cselib_preserve_constants
+  && GET_MODE_CLASS (mode) == MODE_INT)
+   {
+ /* During var-tracking, try harder to find equivalences
+for SUBREGs.  If a setter sets say a DImode register
+and user uses that register only in SImode, add a lowpart
+subreg location.  */
+ struct elt_list *lwider = NULL;
+ l = REG_VALUES (i);
+ if (l && l->elt == NULL)
+   l = l->next;
+ for (; l; l = l->next)
+   if (GET_MODE_CLASS (GET_MODE (l->elt->val_rtx)) == MODE_INT
+   && GET_MODE_SIZE (GET_MODE (l->elt->val_rtx))
+  > GET_MODE_SIZE (mode)
+   && (lwider == NULL
+   || GET_MODE_SIZE (GET_MODE (l->elt->val_rtx))
+  < GET_MODE_SIZE (GET_MODE (lwider->elt->val_rtx
+ {
+   struct elt_loc_list *el;
+   if (i < FIRST_PSEUDO_REGISTER
+   && hard_regno_nregs[i][GET_MODE (l->elt->val_rtx)] != 1)
+ continue;
+   for (el = l->elt->locs; el; el = el->next)
+ if (!REG_P (el->loc))
+   break;
+   if (el)
+ lwider = l;
+ }
+ if (lwider)
+   {
+ rtx sub = lowpart_subreg (mode, lwider->elt->val_rtx,
+   GET_MODE (lwider->elt->val_rtx));
+ if (sub)
+   e->locs->next = new_elt_loc_list (e->locs->next, sub);
+   }
+   }
   REG_VALUES (i)->next = new_elt_list (REG_VALUES (i)->next, e);
   slot = cselib_find_slot (x, e->hash, INSERT, memmode);
   *slot = e;
--- gcc/var-tracking.c.jj   2011-05-11 19:51:48.0 +0200
+++ gcc/var-tracking.c  2011-05-13 10:52:25.0 +0200
@@ -4836,7 +4836,7 @@ get_address_mode (rtx mem)
 static rtx
 replace_expr_with_values (rtx loc)
 {
-  if (REG_P (loc))
+  if (REG_P (loc) || GET_CODE (loc) == ENTRY_VALUE)
 return NULL;
   else if (MEM_P (loc))
 {
@@ -7415,7 +7415,7 @@ vt_expand_loc (rtx loc, htab_t vars, boo
   data.dummy = false;
   data.cur_loc_changed = false;
   data.ignore_cur_loc = ignore_cur_loc;
-  loc = cselib_expand_value_rtx_cb (loc, scratch_regs, 8,
+  loc = cselib_expand_value_rtx_cb 

[PATCH, PR45098, 2/10]

2011-05-17 Thread Tom de Vries
On 05/17/2011 09:10 AM, Tom de Vries wrote:
> Hi Zdenek,
> 
> I have a patch set for PR45098.
> 
> 01_object-size-target.patch
> 02_pr45098-rtx-cost-set.patch
> 03_pr45098-computation-cost.patch
> 04_pr45098-iv-init-cost.patch
> 05_pr45098-bound-cost.patch
> 06_pr45098-bound-cost.test.patch
> 07_pr45098-nowrap-limits-iterations.patch
> 08_pr45098-nowrap-limits-iterations.test.patch
> 09_pr45098-shift-add-cost.patch
> 10_pr45098-shift-add-cost.test.patch
> 

OK for trunk?

Thanks,
- Tom
2011-05-05  Tom de Vries  

	PR target/45098
	* tree-ssa-loop-ivopts.c (seq_cost): Fix call to rtx_cost.

Index: gcc/tree-ssa-loop-ivopts.c
===
--- gcc/tree-ssa-loop-ivopts.c	(revision 173380)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -2745,7 +2745,7 @@ seq_cost (rtx seq, bool speed)
 {
   set = single_set (seq);
   if (set)
-	cost += rtx_cost (set, SET,speed);
+	cost += rtx_cost (SET_SRC (set), SET, speed);
   else
 	cost++;
 }


[PATCH, PR45098, 3/10]

2011-05-17 Thread Tom de Vries
On 05/17/2011 09:10 AM, Tom de Vries wrote:
> Hi Zdenek,
> 
> I have a patch set for PR45098.
> 
> 01_object-size-target.patch
> 02_pr45098-rtx-cost-set.patch
> 03_pr45098-computation-cost.patch
> 04_pr45098-iv-init-cost.patch
> 05_pr45098-bound-cost.patch
> 06_pr45098-bound-cost.test.patch
> 07_pr45098-nowrap-limits-iterations.patch
> 08_pr45098-nowrap-limits-iterations.test.patch
> 09_pr45098-shift-add-cost.patch
> 10_pr45098-shift-add-cost.test.patch
> 
> I will send out the patches individually.
> 

OK for trunk?

Thanks,
- Tom
2011-05-05  Tom de Vries  

	PR target/45098
	* tree-ssa-loop-ivopts.c (computation_cost): Prevent cost of 0.

Index: gcc/tree-ssa-loop-ivopts.c
===
--- gcc/tree-ssa-loop-ivopts.c	(revision 173380)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -2862,7 +2862,9 @@ computation_cost (tree expr, bool speed)
   default_rtl_profile ();
   node->frequency = real_frequency;
 
-  cost = seq_cost (seq, speed);
+  cost = (seq != NULL_RTX
+  ? seq_cost (seq, speed)
+  : (unsigned)rtx_cost (rslt, SET, speed));
   if (MEM_P (rslt))
 cost += address_cost (XEXP (rslt, 0), TYPE_MODE (type),
 			  TYPE_ADDR_SPACE (type), speed);


[PATCH, PR45098, 4/10]

2011-05-17 Thread Tom de Vries
On 05/17/2011 09:10 AM, Tom de Vries wrote:
> Hi Zdenek,
> 
> I have a patch set for PR45098.
> 
> 01_object-size-target.patch
> 02_pr45098-rtx-cost-set.patch
> 03_pr45098-computation-cost.patch
> 04_pr45098-iv-init-cost.patch
> 05_pr45098-bound-cost.patch
> 06_pr45098-bound-cost.test.patch
> 07_pr45098-nowrap-limits-iterations.patch
> 08_pr45098-nowrap-limits-iterations.test.patch
> 09_pr45098-shift-add-cost.patch
> 10_pr45098-shift-add-cost.test.patch
> 
> I will send out the patches individually.
> 

OK for trunk?

Thanks,
- Tom
2011-05-05  Tom de Vries  

	* tree-ssa-loop-ivopts.c (determine_iv_cost): Prevent
	cost_base.cost == 0.

Index: gcc/tree-ssa-loop-ivopts.c
===
--- gcc/tree-ssa-loop-ivopts.c	(revision 173380)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -4688,6 +4688,8 @@ determine_iv_cost (struct ivopts_data *d
 
   base = cand->iv->base;
   cost_base = force_var_cost (data, base, NULL);
+  if (cost_base.cost == 0)
+    cost_base.cost = COSTS_N_INSNS (1);
   cost_step = add_cost (TYPE_MODE (TREE_TYPE (base)), data->speed);
 
   cost = cost_step + adjust_setup_cost (data, cost_base.cost);


[PATCH, PR45098, 5/10]

2011-05-17 Thread Tom de Vries
On 05/17/2011 09:10 AM, Tom de Vries wrote:
> Hi Zdenek,
> 
> I have a patch set for PR45098.
> 
> 01_object-size-target.patch
> 02_pr45098-rtx-cost-set.patch
> 03_pr45098-computation-cost.patch
> 04_pr45098-iv-init-cost.patch
> 05_pr45098-bound-cost.patch
> 06_pr45098-bound-cost.test.patch
> 07_pr45098-nowrap-limits-iterations.patch
> 08_pr45098-nowrap-limits-iterations.test.patch
> 09_pr45098-shift-add-cost.patch
> 10_pr45098-shift-add-cost.test.patch
> 
> I will send out the patches individually.
> 

OK for trunk?

Thanks,
- Tom
2011-05-05  Tom de Vries  

	PR target/45098
	* tree-ssa-loop-ivopts.c (get_expr_id): Factored new function out of
	get_loop_invariant_expr_id.
	(get_loop_invariant_expr_id): Use get_expr_id.
	(parm_decl_cost): New function.
	(determine_use_iv_cost_condition): Use get_expr_id and parm_decl_cost.
	Improve bound cost estimation.  Use different inv_expr_id for elim and
	express cases.

Index: gcc/tree-ssa-loop-ivopts.c
===
--- gcc/tree-ssa-loop-ivopts.c	(revision 173380)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -3835,6 +3835,28 @@ compare_aff_trees (aff_tree *aff1, aff_t
   return true;
 }
 
+/* Stores EXPR in DATA->inv_expr_tab, and assigns it an inv_expr_id.  */
+
+static int
+get_expr_id (struct ivopts_data *data, tree expr)
+{
+  struct iv_inv_expr_ent ent;
+  struct iv_inv_expr_ent **slot;
+
+  ent.expr = expr;
+  ent.hash = iterative_hash_expr (expr, 0);
+  slot = (struct iv_inv_expr_ent **) htab_find_slot (data->inv_expr_tab,
+ &ent, INSERT);
+  if (*slot)
+return (*slot)->id;
+
+  *slot = XNEW (struct iv_inv_expr_ent);
+  (*slot)->expr = expr;
+  (*slot)->hash = ent.hash;
+  (*slot)->id = data->inv_expr_id++;
+  return (*slot)->id;
+}
+
 /* Returns the pseudo expr id if expression UBASE - RATIO * CBASE
requires a new compiler generated temporary.  Returns -1 otherwise.
ADDRESS_P is a flag indicating if the expression is for address
@@ -3847,8 +3869,6 @@ get_loop_invariant_expr_id (struct ivopt
 {
   aff_tree ubase_aff, cbase_aff;
   tree expr, ub, cb;
-  struct iv_inv_expr_ent ent;
-  struct iv_inv_expr_ent **slot;
 
   STRIP_NOPS (ubase);
   STRIP_NOPS (cbase);
@@ -3936,18 +3956,7 @@ get_loop_invariant_expr_id (struct ivopt
   aff_combination_scale (&cbase_aff, shwi_to_double_int (-1 * ratio));
   aff_combination_add (&ubase_aff, &cbase_aff);
   expr = aff_combination_to_tree (&ubase_aff);
-  ent.expr = expr;
-  ent.hash = iterative_hash_expr (expr, 0);
-  slot = (struct iv_inv_expr_ent **) htab_find_slot (data->inv_expr_tab,
- &ent, INSERT);
-  if (*slot)
-return (*slot)->id;
-
-  *slot = XNEW (struct iv_inv_expr_ent);
-  (*slot)->expr = expr;
-  (*slot)->hash = ent.hash;
-  (*slot)->id = data->inv_expr_id++;
-  return  (*slot)->id;
+  return get_expr_id (data, expr);
 }
 
 
@@ -4412,6 +4421,23 @@ may_eliminate_iv (struct ivopts_data *da
   return true;
 }
 
+/* Calculates the cost of BOUND, if it is a PARM_DECL.  A PARM_DECL must
+   be copied, if it is used in the loop body and DATA->body_includes_call.  */
+
+static int
+parm_decl_cost (struct ivopts_data *data, tree bound)
+{
+  tree sbound = bound;
+  STRIP_NOPS (sbound);
+
+  if (TREE_CODE (sbound) == SSA_NAME
+  && TREE_CODE (SSA_NAME_VAR (sbound)) == PARM_DECL
+  && gimple_nop_p (SSA_NAME_DEF_STMT (sbound))
+  && data->body_includes_call)
+return COSTS_N_INSNS (1);
+
+  return 0;
+}
 
 /* Determines cost of basing replacement of USE on CAND in a condition.  */
 
@@ -4422,9 +4448,9 @@ determine_use_iv_cost_condition (struct 
   tree bound = NULL_TREE;
   struct iv *cmp_iv;
   bitmap depends_on_elim = NULL, depends_on_express = NULL, depends_on;
-  comp_cost elim_cost, express_cost, cost;
+  comp_cost elim_cost, express_cost, cost, bound_cost;
   bool ok;
-  int inv_expr_id = -1;
+  int elim_inv_expr_id = -1, express_inv_expr_id = -1, inv_expr_id;
   tree *control_var, *bound_cst;
 
   /* Only consider real candidates.  */
@@ -4438,6 +4464,21 @@ determine_use_iv_cost_condition (struct 
   if (may_eliminate_iv (data, use, cand, &bound))
 {
   elim_cost = force_var_cost (data, bound, &depends_on_elim);
+  if (elim_cost.cost == 0)
+elim_cost.cost = parm_decl_cost (data, bound);
+  else if (TREE_CODE (bound) == INTEGER_CST)
+elim_cost.cost = 0;
+  /* If we replace a loop condition 'i < n' with 'p < base + n',
+	 depends_on_elim will have 'base' and 'n' set, which implies
+	 that both 'base' and 'n' will be live during the loop.	 More likely,
+	 'base + n' will be loop invariant, resulting in only one live value
+	 during the loop.  So in that case we clear depends_on_elim and set
+elim_inv_expr_id instead.  */
+  if (depends_on_elim && bitmap_count_bits (depends_on_elim) > 1)
+	{
+	  elim_inv_expr_id = get_expr_id (data, bound);
+	  bitmap_clear (depends_on_elim)

[PATCH, PR45098, 6/10]

2011-05-17 Thread Tom de Vries
On 05/17/2011 09:10 AM, Tom de Vries wrote:
> Hi Zdenek,
> 
> I have a patch set for PR45098.
> 
> 01_object-size-target.patch
> 02_pr45098-rtx-cost-set.patch
> 03_pr45098-computation-cost.patch
> 04_pr45098-iv-init-cost.patch
> 05_pr45098-bound-cost.patch
> 06_pr45098-bound-cost.test.patch
> 07_pr45098-nowrap-limits-iterations.patch
> 08_pr45098-nowrap-limits-iterations.test.patch
> 09_pr45098-shift-add-cost.patch
> 10_pr45098-shift-add-cost.test.patch
> 
> I will send out the patches individually.
> 

OK for trunk?

Thanks,
- Tom
2011-05-05  Tom de Vries  

	PR target/45098
	* gcc.target/arm/ivopts.c: New test.
	* gcc.target/arm/ivopts-2.c: New test.

Index: gcc/testsuite/gcc.target/arm/ivopts-2.c
===
--- /dev/null (new file)
+++ gcc/testsuite/gcc.target/arm/ivopts-2.c (revision 0)
@@ -0,0 +1,18 @@
+/* { dg-do assemble } */
+/* { dg-options "-Os -mthumb -fdump-tree-ivopts -save-temps" } */
+
+extern void foo2 (short*);
+
+void
+tr4 (short array[], int n)
+{
+  int x;
+  if (n > 0)
+for (x = 0; x < n; x++)
+  foo2 (&array[x]);
+}
+
+/* { dg-final { scan-tree-dump-times "PHI  0)
+for (x = 0; x < n; x++)
+  array[x] = 0;
+}
+
+/* { dg-final { scan-tree-dump-times "PHI <" 1 "ivopts"} } */
+/* { dg-final { object-size text <= 20 { target arm_thumb2_ok } } } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */


[PATCH, PR45098, 7/10]

2011-05-17 Thread Tom de Vries
On 05/17/2011 09:10 AM, Tom de Vries wrote:
> Hi Zdenek,
> 
> I have a patch set for PR45098.
> 
> 01_object-size-target.patch
> 02_pr45098-rtx-cost-set.patch
> 03_pr45098-computation-cost.patch
> 04_pr45098-iv-init-cost.patch
> 05_pr45098-bound-cost.patch
> 06_pr45098-bound-cost.test.patch
> 07_pr45098-nowrap-limits-iterations.patch
> 08_pr45098-nowrap-limits-iterations.test.patch
> 09_pr45098-shift-add-cost.patch
> 10_pr45098-shift-add-cost.test.patch
> 
> I will send out the patches individually.
> 

OK for trunk?

Thanks,
- Tom
2011-05-05  Tom de Vries  

	PR target/45098
	* tree-ssa-loop-ivopts.c (struct ivopts_data): Add fields
	max_iterations_p and max_iterations.
	(is_nonwrap_use, max_loop_iterations, set_max_iterations): New function.
	(may_eliminate_iv): Use max_iterations_p and max_iterations.
	(tree_ssa_iv_optimize_loop): Use set_max_iterations.

Index: gcc/tree-ssa-loop-ivopts.c
===
--- gcc/tree-ssa-loop-ivopts.c (revision 173355)
+++ gcc/tree-ssa-loop-ivopts.c (working copy)
@@ -291,6 +291,12 @@ struct ivopts_data
 
   /* Whether the loop body includes any function calls.  */
   bool body_includes_call;
+
+  /* Whether max_iterations is valid.  */
+  bool max_iterations_p;
+
+  /* Maximum number of iterations of current_loop.  */
+  double_int max_iterations;
 };
 
 /* An assignment of iv candidates to uses.  */
@@ -4319,6 +4325,108 @@ iv_elimination_compare (struct ivopts_da
   return (exit->flags & EDGE_TRUE_VALUE ? EQ_EXPR : NE_EXPR);
 }
 
+/* Determine if USE contains non-wrapping arithmetic.  */
+
+static bool
+is_nonwrap_use (struct ivopts_data *data, struct iv_use *use)
+{
+  gimple stmt = use->stmt;
+  tree var, ptr, ptr_type;
+
+  if (!is_gimple_assign (stmt))
+return false;
+
+  switch (gimple_assign_rhs_code (stmt))
+{
+case POINTER_PLUS_EXPR:
+  ptr = gimple_assign_rhs1 (stmt);
+  ptr_type = TREE_TYPE (ptr);
+  var = gimple_assign_rhs2 (stmt);
+  if (!expr_invariant_in_loop_p (data->current_loop, ptr))
+return false;
+  break;
+case ARRAY_REF:
+  ptr = TREE_OPERAND ((gimple_assign_rhs1 (stmt)), 0);
+  ptr_type = build_pointer_type (TREE_TYPE (gimple_assign_rhs1 (stmt)));
+  var = TREE_OPERAND ((gimple_assign_rhs1 (stmt)), 1);
+  break;
+default:
+  return false;
+}
+
+  if (!nowrap_type_p (ptr_type))
+return false;
+
+  if (TYPE_PRECISION (ptr_type) != TYPE_PRECISION (TREE_TYPE (var)))
+return false;
+
+  return true;
+}
+
+/* Attempt to infer maximum number of loop iterations of DATA->current_loop
+   from uses in loop containing non-wrapping arithmetic.  If successful,
+   return true, and return maximum iterations in MAX_NITER.  */
+
+static bool
+max_loop_iterations (struct ivopts_data *data, double_int *max_niter)
+{
+  struct iv_use *use;
+  struct iv *iv;
+  bool found = false;
+  double_int period;
+  gimple stmt;
+  unsigned i;
+
+  for (i = 0; i < n_iv_uses (data); i++)
+{
+  use = iv_use (data, i);
+
+  stmt = use->stmt;
+  if (!just_once_each_iteration_p (data->current_loop, gimple_bb (stmt)))
+	continue;
+
+  if (!is_nonwrap_use (data, use))
+continue;
+
+  iv = use->iv;
+  if (iv->step == NULL_TREE || TREE_CODE (iv->step) != INTEGER_CST)
+	continue;
+  period = tree_to_double_int (iv_period (iv));
+
+  if (found)
+*max_niter = double_int_umin (*max_niter, period);
+  else
+{
+  found = true;
+  *max_niter = period;
+}
+}
+
+  return found;
+}
+
+/* Initializes DATA->max_iterations and DATA->max_iterations_p.  */
+
+static void
+set_max_iterations (struct ivopts_data *data)
+{
+  double_int max_niter, max_niter2;
+  bool estimate1, estimate2;
+
+  data->max_iterations_p = false;
+  estimate1 = estimated_loop_iterations (data->current_loop, true, &max_niter);
+  estimate2 = max_loop_iterations (data, &max_niter2);
+  if (!(estimate1 || estimate2))
+return;
+  if (estimate1 && estimate2)
+data->max_iterations = double_int_umin (max_niter, max_niter2);
+  else if (estimate1)
+data->max_iterations = max_niter;
+  else
+data->max_iterations = max_niter2;
+  data->max_iterations_p = true;
+}
+
 /* Check whether it is possible to express the condition in USE by comparison
of candidate CAND.  If so, store the value compared with to BOUND.  */
 
@@ -4391,10 +4499,10 @@ may_eliminate_iv (struct ivopts_data *da
   /* See if we can take advantage of infered loop bound information.  */
   if (loop_only_exit_p (loop, exit))
 {
-  if (!estimated_loop_iterations (loop, true, &max_niter))
+  if (!data->max_iterations_p)
 return false;
   /* The loop bound is already adjusted by adding 1.  */
-  if (double_int_ucmp (max_niter, period_value) > 0)
+  if (double_int_ucmp (data->max_iterations, period_value) > 0)
 re
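To illustrate the idea behind max_loop_iterations with a concrete
(hypothetical) loop, modeled on the tr5 test: the ARRAY_REF a[i] involves
non-wrapping pointer arithmetic, so the period of the induction variable
bounds the iteration count even though n is unknown at compile time:

```c
#include <assert.h>

/* The kind of use is_nonwrap_use accepts: a[i] is an ARRAY_REF whose
   address arithmetic may not wrap, so i can never exceed the period of
   its induction variable.  That gives ivopts an upper bound on the
   iteration count without knowing n.  */
void
fill (short a[], unsigned int n)
{
  unsigned int i;
  for (i = 0; i < n; i++)
    a[i] = 7;
}
```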

[PATCH, PR45098, 8/10]

2011-05-17 Thread Tom de Vries
On 05/17/2011 09:10 AM, Tom de Vries wrote:
> Hi Zdenek,
> 
> I have a patch set for PR45098.
> 
> 01_object-size-target.patch
> 02_pr45098-rtx-cost-set.patch
> 03_pr45098-computation-cost.patch
> 04_pr45098-iv-init-cost.patch
> 05_pr45098-bound-cost.patch
> 06_pr45098-bound-cost.test.patch
> 07_pr45098-nowrap-limits-iterations.patch
> 08_pr45098-nowrap-limits-iterations.test.patch
> 09_pr45098-shift-add-cost.patch
> 10_pr45098-shift-add-cost.test.patch
> 
> I will send out the patches individually.
> 

OK for trunk?

Thanks,
- Tom
2011-05-05  Tom de Vries  

	PR target/45098
	* gcc.target/arm/ivopts-3.c: New test.
	* gcc.target/arm/ivopts-4.c: New test.
	* gcc.target/arm/ivopts-5.c: New test.
	* gcc.dg/tree-ssa/ivopt_infer_2.c: Adapt test.

Index: gcc/testsuite/gcc.target/arm/ivopts-3.c
===
--- /dev/null (new file)
+++ gcc/testsuite/gcc.target/arm/ivopts-3.c (revision 0)
@@ -0,0 +1,20 @@
+/* { dg-do assemble } */
+/* { dg-options "-Os -mthumb -fdump-tree-ivopts -save-temps" } */
+
+extern unsigned int foo2 (short*) __attribute__((pure));
+
+unsigned int
+tr3 (short array[], unsigned int n)
+{
+  unsigned sum = 0;
+  unsigned int x;
+  for (x = 0; x < n; x++)
+sum += foo2 (&array[x]);
+  return sum;
+}
+
+/* { dg-final { scan-tree-dump-times "PHI  0)
+for (x = 0; x < n; x++)
+  sum += foo (&array[x]);
+  return sum;
+}
+
+/* { dg-final { scan-tree-dump-times "PHI 

[PATCH, PR45098, 9/10]

2011-05-17 Thread Tom de Vries
On 05/17/2011 09:10 AM, Tom de Vries wrote:
> Hi Zdenek,
> 
> I have a patch set for PR45098.
> 
> 01_object-size-target.patch
> 02_pr45098-rtx-cost-set.patch
> 03_pr45098-computation-cost.patch
> 04_pr45098-iv-init-cost.patch
> 05_pr45098-bound-cost.patch
> 06_pr45098-bound-cost.test.patch
> 07_pr45098-nowrap-limits-iterations.patch
> 08_pr45098-nowrap-limits-iterations.test.patch
> 09_pr45098-shift-add-cost.patch
> 10_pr45098-shift-add-cost.test.patch
> 
> I will send out the patches individually.
> 

OK for trunk?

Thanks,
- Tom
2011-05-05  Tom de Vries  

	PR target/45098
	* tree-ssa-loop-ivopts.c: Include expmed.h.
	(get_shiftadd_cost): New function.
	(force_expr_to_var_cost): Use get_shiftadd_cost.

Index: gcc/tree-ssa-loop-ivopts.c
===
--- gcc/tree-ssa-loop-ivopts.c	(revision 173380)
+++ gcc/tree-ssa-loop-ivopts.c	(working copy)
@@ -92,6 +92,12 @@ along with GCC; see the file COPYING3.  
 #include "tree-inline.h"
 #include "tree-ssa-propagate.h"
 
+/* FIXME: add_cost and zero_cost defined in expmed.h conflict with local
+   uses.  */
+#include "expmed.h"
+#undef add_cost
+#undef zero_cost
+
 /* FIXME: Expressions are expanded to RTL in this pass to determine the
cost of different addressing modes.  This should be moved to a TBD
interface between the GIMPLE and RTL worlds.  */
@@ -3504,6 +3510,37 @@ get_address_cost (bool symbol_present, b
   return new_cost (cost + acost, complexity);
 }
 
+/* Calculate the SPEED or size cost of shiftadd EXPR in MODE.  MULT is
+the EXPR operand holding the shift.  COST0 and COST1 are the costs for
+calculating the operands of EXPR.  Returns true if successful, and returns
+the cost in COST.  */
+
+static bool
+get_shiftadd_cost (tree expr, enum machine_mode mode, comp_cost cost0,
+   comp_cost cost1, tree mult, bool speed, comp_cost *cost)
+{
+  comp_cost res;
+  tree op1 = TREE_OPERAND (expr, 1);
+  tree cst = TREE_OPERAND (mult, 1);
+  int m = exact_log2 (int_cst_value (cst));
+  int maxm = MIN (BITS_PER_WORD, GET_MODE_BITSIZE (mode));
+  int sa_cost;
+
+  if (!(m >= 0 && m < maxm))
+return false;
+
+  sa_cost = (TREE_CODE (expr) != MINUS_EXPR
+ ? shiftadd_cost[speed][mode][m]
+ : (mult == op1
+? shiftsub1_cost[speed][mode][m]
+: shiftsub0_cost[speed][mode][m]));
+  res = new_cost (sa_cost, 0);
+  res = add_costs (res, mult == op1 ? cost0 : cost1);
+
+  *cost = res;
+  return true;
+}
+
 /* Estimates cost of forcing expression EXPR into a variable.  */
 
 static comp_cost
@@ -3629,6 +3666,21 @@ force_expr_to_var_cost (tree expr, bool 
 case MINUS_EXPR:
 case NEGATE_EXPR:
   cost = new_cost (add_cost (mode, speed), 0);
+      if (TREE_CODE (expr) != NEGATE_EXPR)
+	{
+	  tree mult = NULL_TREE;
+	  comp_cost sa_cost;
+	  if (TREE_CODE (op1) == MULT_EXPR)
+	    mult = op1;
+	  else if (TREE_CODE (op0) == MULT_EXPR)
+	    mult = op0;
+
+	  if (mult != NULL_TREE
+	      && TREE_CODE (TREE_OPERAND (mult, 1)) == INTEGER_CST
+	      && get_shiftadd_cost (expr, mode, cost0, cost1, mult, speed,
+				    &sa_cost))
+	    return sa_cost;
+	}
   break;
 
 case MULT_EXPR:
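As a standalone sketch of the precondition get_shiftadd_cost checks: the
multiplication operand must be a power-of-two constant 2^m with m below
MIN (BITS_PER_WORD, GET_MODE_BITSIZE (mode)).  GCC uses exact_log2 for
this; the helper below is a hand-rolled stand-in, not the GCC function:

```c
#include <assert.h>

/* Stand-in for GCC's exact_log2: returns m if x == 2^m, else -1.
   get_shiftadd_cost only fires when this succeeds and m is within the
   word/mode width, so that x + y * 2^m maps to a shift-add insn.  */
int
exact_log2_sketch (unsigned long x)
{
  int m = 0;

  /* Zero, or more than one bit set: not a power of two.  */
  if (x == 0 || (x & (x - 1)) != 0)
    return -1;

  while ((x >>= 1) != 0)
    m++;
  return m;
}
```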


[PATCH, PR45098, 10/10]

2011-05-17 Thread Tom de Vries
On 05/17/2011 09:10 AM, Tom de Vries wrote:
> Hi Zdenek,
> 
> I have a patch set for PR45098.
> 
> 01_object-size-target.patch
> 02_pr45098-rtx-cost-set.patch
> 03_pr45098-computation-cost.patch
> 04_pr45098-iv-init-cost.patch
> 05_pr45098-bound-cost.patch
> 06_pr45098-bound-cost.test.patch
> 07_pr45098-nowrap-limits-iterations.patch
> 08_pr45098-nowrap-limits-iterations.test.patch
> 09_pr45098-shift-add-cost.patch
> 10_pr45098-shift-add-cost.test.patch
> 
> I will send out the patches individually.
> 

OK for trunk?

Thanks,
- Tom
2011-05-05  Tom de Vries  

	PR target/45098
	* gcc.target/arm/ivopts-6.c: New test.

Index: gcc/testsuite/gcc.target/arm/ivopts-6.c
===================================================================
--- /dev/null (new file)
+++ gcc/testsuite/gcc.target/arm/ivopts-6.c (revision 0)
@@ -0,0 +1,15 @@
+/* { dg-do assemble } */
+/* { dg-options "-Os -fdump-tree-ivopts -save-temps -marm" } */
+
+void
+tr5 (short array[], int n)
+{
+  int x;
+  if (n > 0)
+for (x = 0; x < n; x++)
+  array[x] = 0;
+}
+
+/* { dg-final { scan-tree-dump-times "PHI <" 1 "ivopts"} } */
+/* { dg-final { object-size text <= 32 } } */
+/* { dg-final { cleanup-tree-dump "ivopts" } } */


Re: [PATCH] Optimize __sync_fetch_and_add (x, -N) == N and __sync_add_and_fetch (x, N) == 0 (PR target/48986)

2011-05-17 Thread Uros Bizjak
On Tue, May 17, 2011 at 9:02 AM, Jakub Jelinek  wrote:
> Hi!
>
> This patch optimizes using peephole2 __sync_fetch_and_add (x, -N) == N
> and __sync_add_and_fetch (x, N) == 0 by just doing lock {add,sub,inc,dec}
> and testing flags, instead of lock xadd plus comparison.
> The sync_old_add predicate change makes it possible to optimize
> __sync_add_and_fetch with constant second argument to same
> code as __sync_fetch_and_add.  Doing it in peephole2 has disadvantages,
> though: the 3 instructions need to be consecutive, and e.g. the xadd
> insn has to be supported by the CPU.  Another alternative would be to
> come up with a new bool builtin that would represent the
> whole __sync_fetch_and_add (x, -N) == N operation (perhaps with dot or space
> in its name to make it inaccessible), try to match it during some folding
> and expand it using special optab.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk
> this way?
>
> 2011-05-16  Jakub Jelinek  
>
>        PR target/48986
>        * config/i386/sync.md (sync_old_add): Relax operand 2
>        predicate to allow CONST_INT.
>        (*sync_old_add_cmp): New insn and peephole2 for it.

OK, but please add a comment explaining why we have matched constraint
with non-matched predicate. These operands are otherwise targets for
cleanups ;)

Also, a comment explaining the purpose of the added peephole would be nice.

IMO, the change to sync_old_add is also appropriate to release branches.

Thanks,
Uros.


Re: Don't let search bots look at buglist.cgi

2011-05-17 Thread Axel Freyn
On Mon, May 16, 2011 at 10:27:44PM -0700, Ian Lance Taylor wrote:
> On Mon, May 16, 2011 at 6:42 AM, Richard Guenther
>  wrote:
> >>>
> >>> httpd being in the top-10 always, fiddling with bugzilla URLs?
> >>> (Note, I don't have access to gcc.gnu.org, I'm relaying info from multiple
> >>> instances of discussion on #gcc and richi poking on it; that said, it
> >>> still might not be web crawlers, that's right, but I'll happily accept
> >>> _any_ load improvement on gcc.gnu.org, however unfounded they might seem)
> 
> I think that simply blocking buglist.cgi has dropped bugzilla off the
> immediate radar.
> It also seems to have lowered the load, although I'm not sure if we
> are still keeping
> historical data.
> 
> 
> > I for example see also
> >
> > 66.249.71.59 - - [16/May/2011:13:37:58 +] "GET
> > /viewcvs?view=revision&revision=169814 HTTP/1.1" 200 1334 "-"
> > "Mozilla/5.0 (compatible; Googlebot/2.1;
> > +http://www.google.com/bot.html)" (35%) 2060117us
> >
> > and viewvc is certainly even worse (from an I/O perspecive).  I thought
> > we blocked all bot traffic from the viewvc stuff ...
> 
> This is only happening at top level.  I committed this patch to fix this.
Probably you know it much better than I do, but wouldn't it be possible
to allow only some of Google's crawlers (if all of them try to crawl
bugzilla)?
As I read
http://www.google.com/support/webmasters/bin/answer.py?hl=en&answer=1061943
it would be possible to block the crawlers Googlebot-Mobile,
Mediapartners-Google and AdsBot-Google (which seem to be independent
crawlers?) while allowing the main Googlebot.  (Well, I don't know how
often each crawler shows up on bugzilla...)

Axel



Commit: RX: Add peepholes for move followed by compare

2011-05-17 Thread Nick Clifton
Hi Guys,

  I am applying the patch below to add a peephole optimization to the RX
  backend.  It was suggested by Kazuhio Inaoka at Renesas Japan, and
  adapted by me to use the peephole2 system.  It finds a register move
  followed by a comparison of the moved register against zero and
  replaces the two instructions with a single addition instruction.  The
  addition does not actually do anything since the value being added is
  zero, but as a side effect it moves the register and performs the
  comparison.

Cheers
  Nick

gcc/ChangeLog
2011-05-17  Kazuhio Inaoka  
Nick Clifton  

* config/rx/rx.md: Add peepholes to match a register move followed
by a comparison of the moved register.  Replace these with an
addition of zero that does both actions in one instruction.

Index: gcc/config/rx/rx.md
===================================================================
--- gcc/config/rx/rx.md (revision 173815)
+++ gcc/config/rx/rx.md (working copy)
@@ -904,6 +904,39 @@
(set_attr "length"   "3,4,5,6,7,6")]
 )
 
+;; Peepholes to match:
+;;   (set (reg A) (reg B))
+;;   (set (CC) (compare:CC (reg A/reg B) (const_int 0)))
+;; and replace them with the addsi3_flags pattern, using an add
+;; of zero to copy the register and set the condition code bits.
+(define_peephole2
+  [(set (match_operand:SI 0 "register_operand")
+(match_operand:SI 1 "register_operand"))
+   (set (reg:CC CC_REG)
+(compare:CC (match_dup 0)
+(const_int 0)))]
+  ""
+  [(parallel [(set (match_dup 0)
+  (plus:SI (match_dup 1) (const_int 0)))
+ (set (reg:CC_ZSC CC_REG)
+  (compare:CC_ZSC (plus:SI (match_dup 1) (const_int 0))
+  (const_int 0)))])]
+)
+
+(define_peephole2
+  [(set (match_operand:SI 0 "register_operand")
+(match_operand:SI 1 "register_operand"))
+   (set (reg:CC CC_REG)
+(compare:CC (match_dup 1)
+(const_int 0)))]
+  ""
+  [(parallel [(set (match_dup 0)
+  (plus:SI (match_dup 1) (const_int 0)))
+ (set (reg:CC_ZSC CC_REG)
+  (compare:CC_ZSC (plus:SI (match_dup 1) (const_int 0))
+  (const_int 0)))])]
+)
+
 (define_expand "adddi3"
   [(set (match_operand:DI  0 "register_operand")
(plus:DI (match_operand:DI 1 "register_operand")


Re: [PATCH][?/n] Cleanup LTO type merging

2011-05-17 Thread Richard Guenther
On Mon, 16 May 2011, Jan Hubicka wrote:

> > 
> > I've seen us merge different named structs which happen to reside
> > on the same variant list.  That's bogus, not only because we are
> > adjusting TYPE_MAIN_VARIANT during incremental type-merging and
> > fixup, so computing a persistent hash by looking at it looks
> > fishy as well.
> 
> Hi,
> as reported on IRC earlier, I get the segfault while building libxul
> due to an infinite recursion problem.
> 
> I now however also get a lot more of the following ICEs:
> In function '__unguarded_insertion_sort':
> lto1: internal compiler error: in splice_child_die, at dwarf2out.c:8274
> previously it reported once during Mozilla build (and I put testcase into
> bugzilla), now it reproduces on many libraries. I did not see this problem
> when applying only the SCC hashing change.

This change causes us to preserve more TYPE_DECLs I think, so we might
run more often into pre-existing debuginfo issues.  Previously most
of the types were merged into their nameless variant which probably
didn't get output into debug info.

Do you by chance have small testcases for your problems? ;)

Richard.

> perhaps another unrelated thing, but I now also get undefined symbols during
> the builds.
> 
> Honza
> > 
> > Bootstrapped and tested on x86_64-unknown-linux-gnu.
> > 
> > Richard.
> > 
> > 2011-05-16  Richard Guenther  
> > 
> > * gimple.c (gimple_types_compatible_p_1): Use names of the
> > type itself, not its main variant.
> > (iterative_hash_gimple_type): Likewise.
> > 
> > Index: gcc/gimple.c
> > ===
> > *** gcc/gimple.c(revision 173794)
> > --- gcc/gimple.c(working copy)
> > *** gimple_types_compatible_p_1 (tree t1, tr
> > *** 3817,3824 
> > tree f1, f2;
> >   
> > /* The struct tags shall compare equal.  */
> > !   if (!compare_type_names_p (TYPE_MAIN_VARIANT (t1),
> > !  TYPE_MAIN_VARIANT (t2), false))
> >   goto different_types;
> >   
> > /* For aggregate types, all the fields must be the same.  */
> > --- 3817,3823 
> > tree f1, f2;
> >   
> > /* The struct tags shall compare equal.  */
> > !   if (!compare_type_names_p (t1, t2, false))
> >   goto different_types;
> >   
> > /* For aggregate types, all the fields must be the same.  */
> > *** iterative_hash_gimple_type (tree type, h
> > *** 4193,4199 
> > unsigned nf;
> > tree f;
> >   
> > !   v = iterative_hash_name (TYPE_NAME (TYPE_MAIN_VARIANT (type)), v);
> >   
> > for (f = TYPE_FIELDS (type), nf = 0; f; f = TREE_CHAIN (f))
> > {
> > --- 4192,4198 
> > unsigned nf;
> > tree f;
> >   
> > !   v = iterative_hash_name (TYPE_NAME (type), v);
> >   
> > for (f = TYPE_FIELDS (type), nf = 0; f; f = TREE_CHAIN (f))
> > {
> 
> 

-- 
Richard Guenther 
Novell / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer

Commit: RX: Add peepholes to remove redundant extensions

2011-05-17 Thread Nick Clifton
Hi Guys,

  I am applying the patch below to add a couple of peephole
  optimizations to the RX backend.  It seems that GCC does not cope very
  well with the RX's ability to perform either sign-extending loads or
  zero-extending loads and so sometimes it can generate an extending
  load followed by a register-to-register extension.  The peepholes
  match these cases and delete the unnecessary extension where possible.

Cheers
  Nick

gcc/ChangeLog
2011-05-17  Nick Clifton  

* config/rx/rx.md: Add peephole to remove redundant extensions
after loads.

Index: gcc/config/rx/rx.md
===================================================================
--- gcc/config/rx/rx.md (revision 173819)
+++ gcc/config/rx/rx.md (working copy)
@@ -1701,6 +1701,35 @@
(extend_types:SI (match_dup 1]
 )
 
+;; Convert:
+;;   (set (reg1) (sign_extend (mem))
+;;   (set (reg2) (zero_extend (reg1))
+;; into
+;;   (set (reg2) (zero_extend (mem)))
+(define_peephole2
+  [(set (match_operand:SI  0 "register_operand")
+   (sign_extend:SI (match_operand:small_int_modes 1 "memory_operand")))
+   (set (match_operand:SI  2 "register_operand")
+   (zero_extend:SI (match_operand:small_int_modes 3 "register_operand")))]
+  "REGNO (operands[0]) == REGNO (operands[3])
+   && (REGNO (operands[0]) == REGNO (operands[2])
+   || peep2_regno_dead_p (2, REGNO (operands[0])))"
+  [(set (match_dup 2)
+   (zero_extend:SI (match_dup 1)))]
+)
+
+;; Remove the redundant sign extension from:
+;;   (set (reg) (extend (mem)))
+;;   (set (reg) (extend (reg)))
+(define_peephole2
+  [(set (match_operand:SI   0 "register_operand")
+   (extend_types:SI (match_operand:small_int_modes 1 "memory_operand")))
+   (set (match_dup 0)
+   (extend_types:SI (match_operand:small_int_modes 2 "register_operand")))]
+  "REGNO (operands[0]) == REGNO (operands[2])"
+  [(set (match_dup 0) (extend_types:SI (match_dup 1)))]
+)
+
 (define_insn "comparesi3_"
   [(set (reg:CC CC_REG)
(compare:CC (match_operand:SI   0 "register_operand" "=r")


Commit: RX: Fix predicates for restricted memory patterns

2011-05-17 Thread Nick Clifton
Hi Guys,

  I am applying the patch below to fix a minor discrepancy in the rx.md
  file.  Several patterns can only use restricted memory addresses.
  They have the correct Q constraint, but they were using the more
  permissive memory_operand predicate.  The patch fixes these patterns
  by replacing memory_operand with rx_restricted_mem_operand.

Cheers
  Nick

gcc/ChangeLog
2011-05-17  Nick Clifton  

* config/rx/rx.md (bitset_in_memory): Use
rx_restricted_mem_operand.
(bitinvert_in_memory): Likewise.
(bitclr_in_memory): Likewise.

Index: gcc/config/rx/rx.md
===================================================================
--- gcc/config/rx/rx.md (revision 173820)
+++ gcc/config/rx/rx.md (working copy)
@@ -1831,7 +1831,7 @@
 )
 
 (define_insn "*bitset_in_memory"
-  [(set (match_operand:QI 0 "memory_operand" "+Q")
+  [(set (match_operand:QI 0 "rx_restricted_mem_operand" "+Q")
(ior:QI (ashift:QI (const_int 1)
   (match_operand:QI 1 "nonmemory_operand" "ri"))
(match_dup 0)))]
@@ -1852,7 +1852,7 @@
 )
 
 (define_insn "*bitinvert_in_memory"
-  [(set (match_operand:QI 0 "memory_operand" "+Q")
+  [(set (match_operand:QI 0 "rx_restricted_mem_operand" "+Q")
(xor:QI (ashift:QI (const_int 1)
   (match_operand:QI 1 "nonmemory_operand" "ri"))
(match_dup 0)))]
@@ -1875,7 +1875,7 @@
 )
 
 (define_insn "*bitclr_in_memory"
-  [(set (match_operand:QI 0 "memory_operand" "+Q")
+  [(set (match_operand:QI 0 "rx_restricted_mem_operand" "+Q")
(and:QI (not:QI
  (ashift:QI
(const_int 1)


Commit: RX: Include cost of register moving in the cost of register loading.

2011-05-17 Thread Nick Clifton
Hi Guys,

  I am applying the patch below to fix a bug with the
  rx_memory_move_cost function.  The problem was that the costs are
  meant to be relative to the cost of moving a value between registers,
  but the existing definition was making stores cheaper than moves, and
  loads the same cost as moves.  Thus gcc was sometimes choosing to store
  values in memory when actually it was better to keep them in registers.

  The patch fixes the problem by adding in the register move cost to the
  memory move cost.  It also removes the call to
  memory_move_secondary_cost since there is no secondary cost.

Cheers
  Nick

gcc/ChangeLog
2011-05-17  Nick Clifton  

* config/rx/rx.c (rx_memory_move_cost): Include cost of register
moves.

Index: gcc/config/rx/rx.c
===================================================================
--- gcc/config/rx/rx.c  (revision 173815)
+++ gcc/config/rx/rx.c  (working copy)
@@ -2638,7 +2638,7 @@
 static int
 rx_memory_move_cost (enum machine_mode mode, reg_class_t regclass, bool in)
 {
-  return (in ? 2 : 0) + memory_move_secondary_cost (mode, regclass, in);
+  return (in ? 2 : 0) + REGISTER_MOVE_COST (mode, regclass, regclass);
 }
 
 /* Convert a CC_MODE to the set of flags that it represents.  */
  


Re: [PATCH] Fix PR46728 (move pow/powi folds to tree phases)

2011-05-17 Thread Richard Guenther
On Mon, May 16, 2011 at 7:30 PM, William J. Schmidt
 wrote:
> Richi, thank you for the detailed review!
>
> I'll plan to move the power-series expansion into the existing IL walk
> during pass_cse_sincos.  As part of this, I'll move
> tree_expand_builtin_powi and its subfunctions from builtins.c into
> tree-ssa-math-opts.c.  I'll submit this as a separate patch.
>
> I will also stop attempting to make code generation match completely at
> -O0.  If there are tests in the test suite that fail only at -O0 due to
> these changes, I'll modify the tests to require -O1 or higher.
>
> I understand that you'd prefer that I leave the existing
> canonicalization folds in place, and only un-canonicalize them during my
> new pass (now, during cse_sincos).  Actually, that was my first approach
> to this issue.  The problem that I ran into is that the various folds
> are not performed just by the front end, but can pop up later, after my
> pass is done.  In particular, pass_fold_builtins will undo my changes,
> turning expressions involving roots back into expressions involving
> pow/powi.  It wasn't clear to me whether the folds could kick in
> elsewhere as well, so I took the approach of shutting them down.  I see
> now that this does lose some optimizations such as
> pow(sqrt(cbrt(x)),6.0), as you pointed out.

Yeah, it's always a delicate balance between canonicalization
and optimal form for further optimization.  Did you really see
sqrt(cbrt(x)) being transformed back to pow()?  I would doubt that,
as on gimple the foldings that require two function calls to match
shouldn't trigger.  Or do you see sqrt(x) turned into pow(x,0.5)?
I see that the vectorizer for example handles both pow(x,0.5) and
pow(x,2), so indeed that might happen.

I think what we might want to do is limit what the generic
gimple fold_stmt folding does to function calls, also to avoid
building regular generic call statements again.  But that might
be a bigger project and certainly should be done separately.

So I'd say don't worry about this issue for the initial patch but
instead deal with it separately.

We also repeatedly thought about whether canonicalizing
everything to pow is a good idea or not, especially our
canonicalizing of x * x to pow (x, 2) leads to interesting
effects in some cases, as several passes do not handle
function calls very well.  So I also thought about introducing
a POW_EXPR tree code that would be easier in this
regard and would be a more IL friendly canonical form
of the power-related functions.

> Should I attempt to leave the folds in place, and screen out the
> particular cases that are causing trouble in pass_fold_builtins?  Or is
> it too fragile to try to catch all places where folds occur?  If there's
> a flag that indicates parsing is complete, I suppose I could disable
> individual folds once we're into the optimizer.  I'd appreciate your
> guidance.

Indeed restricting canonicalization to earlier passes would be the
way to go I think.  I will think of the best way to achieve this.

Richard.

> Thanks,
> Bill
>
>
>


[PATCH][?/n] LTO type merging cleanup

2011-05-17 Thread Richard Guenther

This avoids the odd cases where gimple_register_canonical_type could
end up running in cycles.  I was able to reproduce this issue
with an intermediate tree and LTO bootstrap.  While the following
patch is not the "real" fix (that one runs into a known cache-preloading
issue again ...) it certainly makes a lot of sense and avoids
the issue by design.

LTO bootstrapped on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2011-05-17  Richard Guenther  

* gimple.c (gimple_register_canonical_type): Use the main-variant
leader for computing the canonical type.

Index: gcc/gimple.c
===================================================================
*** gcc/gimple.c(revision 173825)
--- gcc/gimple.c(working copy)
*** gimple_register_canonical_type (tree t)
*** 4856,4874 
if (TYPE_CANONICAL (t))
  return TYPE_CANONICAL (t);
  
!   /* Always register the type itself first so that if it turns out
!  to be the canonical type it will be the one we merge to as well.  */
!   t = gimple_register_type (t);
  
if (TYPE_CANONICAL (t))
  return TYPE_CANONICAL (t);
  
-   /* Always register the main variant first.  This is important so we
-  pick up the non-typedef variants as canonical, otherwise we'll end
-  up taking typedef ids for structure tags during comparison.  */
-   if (TYPE_MAIN_VARIANT (t) != t)
- gimple_register_canonical_type (TYPE_MAIN_VARIANT (t));
- 
if (gimple_canonical_types == NULL)
  gimple_canonical_types = htab_create_ggc (16381, 
gimple_canonical_type_hash,
  gimple_canonical_type_eq, 0);
--- 4856,4869 
if (TYPE_CANONICAL (t))
  return TYPE_CANONICAL (t);
  
!   /* Use the leader of our main variant for determining our canonical
!  type.  The main variant leader is a type that will always
!  prevail.  */
!   t = gimple_register_type (TYPE_MAIN_VARIANT (t));
  
if (TYPE_CANONICAL (t))
  return TYPE_CANONICAL (t);
  
if (gimple_canonical_types == NULL)
  gimple_canonical_types = htab_create_ggc (16381, 
gimple_canonical_type_hash,
  gimple_canonical_type_eq, 0);


Re: [PATCH][?/n] Cleanup LTO type merging

2011-05-17 Thread Richard Guenther
On Mon, 16 May 2011, H.J. Lu wrote:

> On Mon, May 16, 2011 at 7:17 AM, Richard Guenther  wrote:
> >
> > The following patch improves hashing types by re-instantiating the
> > patch that makes us visit aggregate target types of pointers and
> > function return and argument types.  This halves the collision
> > rate on the type hash table for a linux-kernel build and improves
> > WPA compile-time from 3mins to 1mins and reduces memory usage by
> > 1GB for that testcase.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, SPEC2k6
> > build-tested.
> >
> > Richard.
> >
> > (patch is reversed)
> >
> > 2011-05-16  Richard Guenther  
> >
> >        * gimple.c (iterative_hash_gimple_type): Re-instantiate
> >        change to always visit pointer target and function result
> >        and argument types.
> >
> 
> This caused:
> 
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49013

I have reverted the patch for now.

Richard.

Re: Reintroduce -mflat option on SPARC

2011-05-17 Thread Eric Botcazou
> Right, -mflat option should only be for 32-bit SPARC target.

OK, let's keep it that way for now.

Another question: why does the model hijack %i7 to use it as frame pointer, 
instead of just using %fp?  AFAICS both are kept as fixed registers by the 
code so the model seems to be wasting 1 register (2 without frame pointer).

-- 
Eric Botcazou


Re: Don't let search bots look at buglist.cgi

2011-05-17 Thread Michael Matz
Hi,

On Mon, 16 May 2011, Ian Lance Taylor wrote:

> >>> httpd being in the top-10 always, fiddling with bugzilla URLs? 
> >>> (Note, I don't have access to gcc.gnu.org, I'm relaying info from 
> >>> multiple instances of discussion on #gcc and richi poking on it; 
> >>> that said, it still might not be web crawlers, that's right, but 
> >>> I'll happily accept
> >>> _any_ load improvement on gcc.gnu.org, however unfounded they might seem)
> 
> I think that simply blocking buglist.cgi has dropped bugzilla off the 
> immediate radar. It also seems to have lowered the load, although I'm 
> not sure if we are still keeping historical data.

Btw. FWIW, I had a quick look at one of the httpd log files, and in seven 
hours on last Saturday (from 5:30 to 12:30), there were overall 435203 GET 
requests, and 391319 of them came from our own MnoGoSearch engine, that's 
90%.  Granted many are then in fact 304 (not modified) responses, but 
still, perhaps the eagerness of our own crawler can be turned down a bit.


Ciao,
Michael.


Re: [PATCH][?/n] Cleanup LTO type merging

2011-05-17 Thread Jan Hubicka
> On Mon, 16 May 2011, Jan Hubicka wrote:
> 
> > > 
> > > I've seen us merge different named structs which happen to reside
> > > on the same variant list.  That's bogus, not only because we are
> > > adjusting TYPE_MAIN_VARIANT during incremental type-merging and
> > > fixup, so computing a persistent hash by looking at it looks
> > > fishy as well.
> > 
> > Hi,
> > as reported on IRC earlier, I get the segfault while building libxul
> > due to an infinite recursion problem.
> > 
> > I now however also get a lot more of the following ICEs:
> > In function '__unguarded_insertion_sort':
> > lto1: internal compiler error: in splice_child_die, at dwarf2out.c:8274
> > previously it reported once during Mozilla build (and I put testcase into
> > bugzilla), now it reproduces on many libraries. I did not see this problem
> > when applying only the SCC hashing change.
> 
> This change causes us to preserve more TYPE_DECLs I think, so we might
> run more often into pre-existing debuginfo issues.  Previously most
> of the types were merged into their nameless variant which probably
> didn't get output into debug info.
> 
> Do you by chance have small testcases for your problems? ;)

I think you might just look into one at http://gcc.gnu.org/bugzilla/show_bug.cgi?id=48354

Honza


Re: [PATCH] Fix PR46728 (move pow/powi folds to tree phases)

2011-05-17 Thread William J. Schmidt

On Tue, 2011-05-17 at 11:03 +0200, Richard Guenther wrote:
> On Mon, May 16, 2011 at 7:30 PM, William J. Schmidt
>  wrote:
> > Richi, thank you for the detailed review!
> >
> > I'll plan to move the power-series expansion into the existing IL walk
> > during pass_cse_sincos.  As part of this, I'll move
> > tree_expand_builtin_powi and its subfunctions from builtins.c into
> > tree-ssa-math-opts.c.  I'll submit this as a separate patch.
> >
> > I will also stop attempting to make code generation match completely at
> > -O0.  If there are tests in the test suite that fail only at -O0 due to
> > these changes, I'll modify the tests to require -O1 or higher.
> >
> > I understand that you'd prefer that I leave the existing
> > canonicalization folds in place, and only un-canonicalize them during my
> > new pass (now, during cse_sincos).  Actually, that was my first approach
> > to this issue.  The problem that I ran into is that the various folds
> > are not performed just by the front end, but can pop up later, after my
> > pass is done.  In particular, pass_fold_builtins will undo my changes,
> > turning expressions involving roots back into expressions involving
> > pow/powi.  It wasn't clear to me whether the folds could kick in
> > elsewhere as well, so I took the approach of shutting them down.  I see
> > now that this does lose some optimizations such as
> > pow(sqrt(cbrt(x)),6.0), as you pointed out.
> 
> Yeah, it's always a delicate balance between canonicalization
> and optimal form for further optimization.  Did you really see
> sqrt(cbrt(x)) being transformed back to pow()?  I would doubt that,
> as on gimple the foldings that require two function calls to match
> shouldn't trigger.  Or do you see sqrt(x) turned into pow(x,0.5)?
> I see that the vectorizer for example handles both pow(x,0.5) and
> pow(x,2), so indeed that might happen.

Yes, I was seeing sqrt(x) turned back to pow(x,0.5), and even x*x
turning back into pow(x,2.0).  I don't specifically recall the
sqrt(cbrt(x)) case; you're probably right about that one.  But I had
several test cases break because of this.

> 
> I think what we might want to do is limit what the generic
> gimple fold_stmt folding does to function calls, also to avoid
> building regular generic call statements again.  But that might
> be a bigger project and certainly should be done separately.
> 
> So I'd say don't worry about this issue for the initial patch but
> instead deal with it separately.

Agreed...

> 
> We also repeatedly thought about whether canonicalizing
> everything to pow is a good idea or not, especially our
> canonicalizing of x * x to pow (x, 2) leads to interesting
> effects in some cases, as several passes do not handle
> function calls very well.  So I also thought about introducing
> a POW_EXPR tree code that would be easier in this
> regard and would be a more IL friendly canonical form
> of the power-related functions.
> 
> > Should I attempt to leave the folds in place, and screen out the
> > particular cases that are causing trouble in pass_fold_builtins?  Or is
> > it too fragile to try to catch all places where folds occur?  If there's
> > a flag that indicates parsing is complete, I suppose I could disable
> > individual folds once we're into the optimizer.  I'd appreciate your
> > guidance.
> 
> Indeed restricting canonicalization to earlier passes would be the
> way to go I think.  I will think of the best way to achieve this.

Thanks.  I think we need to address this as part of this patch, unless
you're willing to live with a number of broken test cases in the
meanwhile.  If I only do the un-canonicalization in the new pass and let
some of the folds be re-done later, some will fail.  I'll start
experimenting and see how many.

Bill

> 
> Richard.
> 
> > Thanks,
> > Bill
> >
> >
> >



Re: [patch ada]: Fix boolean_type_node setup and some cleanup for boolean use

2011-05-17 Thread Eric Botcazou
> 2011-05-16  Kai Tietz
>
>   PR middle-end/48989
>   * gcc-interface/trans.c (Exception_Handler_to_gnu_sjlj): Use
>   boolean_false_node instead of integer_zero_node.
>   (convert_with_check): Likewise.
>   * gcc-interface/decl.c (choices_to_gnu): Likewise.

OK for this part.

>   * gcc-interface/misc.c (gnat_init): Set precision for
>   generated boolean_type_node and initialize
>   boolean_false_node.

Not OK, you cannot set the precision of boolean_type_node to 1 in Ada.

-- 
Eric Botcazou


[PATCH][?/n] LTO type merging cleanup

2011-05-17 Thread Richard Guenther

This fixes an oversight in the new SCC hash mixing code - we of course
need to return the adjusted hash of our type, not the purely local one.

There's still something weird going on, hash values somehow depend
on the order we feed it types ...

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2011-05-17  Richard Guenther  

* gimple.c (iterative_hash_gimple_type): Simplify singleton
case some more, fix final hash value of the non-singleton case.

Index: gcc/gimple.c
===================================================================
--- gcc/gimple.c(revision 173827)
+++ gcc/gimple.c(working copy)
@@ -4213,25 +4213,24 @@ iterative_hash_gimple_type (tree type, h
   if (state->low == state->dfsnum)
 {
   tree x;
-  struct sccs *cstate;
   struct tree_int_map *m;
 
   /* Pop off the SCC and set its hash values.  */
   x = VEC_pop (tree, *sccstack);
-  cstate = (struct sccs *)*pointer_map_contains (sccstate, x);
-  cstate->on_sccstack = false;
   /* Optimize SCC size one.  */
   if (x == type)
{
+ state->on_sccstack = false;
  m = ggc_alloc_cleared_tree_int_map ();
  m->base.from = x;
- m->to = cstate->u.hash;
+ m->to = v;
  slot = htab_find_slot (type_hash_cache, m, INSERT);
  gcc_assert (!*slot);
  *slot = (void *) m;
}
   else
{
+ struct sccs *cstate;
  unsigned first, i, size, j;
  struct type_hash_pair *pairs;
  /* Pop off the SCC and build an array of type, hash pairs.  */
@@ -4241,6 +4240,8 @@ iterative_hash_gimple_type (tree type, h
  size = VEC_length (tree, *sccstack) - first + 1;
  pairs = XALLOCAVEC (struct type_hash_pair, size);
  i = 0;
+ cstate = (struct sccs *)*pointer_map_contains (sccstate, x);
+ cstate->on_sccstack = false;
  pairs[i].type = x;
  pairs[i].hash = cstate->u.hash;
  do
@@ -4275,6 +4276,8 @@ iterative_hash_gimple_type (tree type, h
  for (j = 0; pairs[j].hash != pairs[i].hash; ++j)
hash = iterative_hash_hashval_t (pairs[j].hash, hash);
  m->to = hash;
+ if (pairs[i].type == type)
+   v = hash;
  slot = htab_find_slot (type_hash_cache, m, INSERT);
  gcc_assert (!*slot);
  *slot = (void *) m;


Re: [patch ada]: Fix boolean_type_node setup and some cleanup for boolean use

2011-05-17 Thread Kai Tietz
2011/5/17 Eric Botcazou :
>> 2011-05-16  Kai Tietz
>>
>>       PR middle-end/48989
>>       * gcc-interface/trans.c (Exception_Handler_to_gnu_sjlj): Use
>>       boolean_false_node instead of integer_zero_node.
>>       (convert_with_check): Likewise.
>>       * gcc-interface/decl.c (choices_to_gnu): Likewise.
>
> OK for this part.
>
>>       * gcc-interface/misc.c (gnat_init): Set precision for
>>       generated boolean_type_node and initialize
>>       boolean_false_node.
>
> Not OK, you cannot set the precision of boolean_type_node to 1 in Ada.
>
> --
> Eric Botcazou

Hmm, sad.  A check in tree-cfg that truth expressions have a
type precision of 1 would have been a good way to verify this.  What is
the actual reason for not setting the type precision here?  At least in
the testcases I didn't find a regression caused by this.

Regards,
Kai


Re: [PATCH][?/n] Cleanup LTO type merging

2011-05-17 Thread H.J. Lu
On Tue, May 17, 2011 at 3:29 AM, Richard Guenther  wrote:
> On Mon, 16 May 2011, H.J. Lu wrote:
>
>> On Mon, May 16, 2011 at 7:17 AM, Richard Guenther  wrote:
>> >
>> > The following patch improves hashing types by re-instantiating the
>> > patch that makes us visit aggregate target types of pointers and
>> > function return and argument types.  This halves the collision
>> > rate on the type hash table for a linux-kernel build and improves
>> > WPA compile-time from 3mins to 1mins and reduces memory usage by
>> > 1GB for that testcase.
>> >
>> > Bootstrapped and tested on x86_64-unknown-linux-gnu, SPEC2k6
>> > build-tested.
>> >
>> > Richard.
>> >
>> > (patch is reversed)
>> >
>> > 2011-05-16  Richard Guenther  
>> >
>> >        * gimple.c (iterative_hash_gimple_type): Re-instantiate
>> >        change to always visit pointer target and function result
>> >        and argument types.
>> >
>>
>> This caused:
>>
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49013
>
> I have reverted the patch for now.
>

It doesn't solve the problem and I reopened:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49013

Your followup patches may have similar issues.

-- 
H.J.


Re: [PATCH][?/n] Cleanup LTO type merging

2011-05-17 Thread H.J. Lu
On Tue, May 17, 2011 at 5:59 AM, H.J. Lu  wrote:
> On Tue, May 17, 2011 at 3:29 AM, Richard Guenther  wrote:
>> On Mon, 16 May 2011, H.J. Lu wrote:
>>
>>> On Mon, May 16, 2011 at 7:17 AM, Richard Guenther  wrote:
>>> >
>>> > The following patch improves hashing types by re-instantiating the
>>> > patch that makes us visit aggregate target types of pointers and
>>> > function return and argument types.  This halves the collision
>>> > rate on the type hash table for a linux-kernel build and improves
>>> > WPA compile-time from 3mins to 1min and reduces memory usage by
>>> > 1GB for that testcase.
>>> >
>>> > Bootstrapped and tested on x86_64-unknown-linux-gnu, SPEC2k6
>>> > build-tested.
>>> >
>>> > Richard.
>>> >
>>> > (patch is reversed)
>>> >
>>> > 2011-05-16  Richard Guenther  
>>> >
>>> >        * gimple.c (iterative_hash_gimple_type): Re-instantiate
>>> >        change to always visit pointer target and function result
>>> >        and argument types.
>>> >
>>>
>>> This caused:
>>>
>>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49013
>>
>> I have reverted the patch for now.
>>
>
> It doesn't solve the problem and I reopened:
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49013
>
> Your followup patches may have similar issues.
>

I think you reverted the WRONG patch:

http://gcc.gnu.org/viewcvs?view=revision&revision=173827


-- 
H.J.


Re: [PATCH][?/n] Cleanup LTO type merging

2011-05-17 Thread Richard Guenther
On Tue, May 17, 2011 at 3:01 PM, H.J. Lu  wrote:
> On Tue, May 17, 2011 at 5:59 AM, H.J. Lu  wrote:
>> On Tue, May 17, 2011 at 3:29 AM, Richard Guenther  wrote:
>>> On Mon, 16 May 2011, H.J. Lu wrote:
>>>
 On Mon, May 16, 2011 at 7:17 AM, Richard Guenther  
 wrote:
 >
 > The following patch improves hashing types by re-instantiating the
 > patch that makes us visit aggregate target types of pointers and
 > function return and argument types.  This halves the collision
 > rate on the type hash table for a linux-kernel build and improves
 > WPA compile-time from 3mins to 1min and reduces memory usage by
 > 1GB for that testcase.
 >
 > Bootstrapped and tested on x86_64-unknown-linux-gnu, SPEC2k6
 > build-tested.
 >
 > Richard.
 >
 > (patch is reversed)
 >
 > 2011-05-16  Richard Guenther  
 >
 >        * gimple.c (iterative_hash_gimple_type): Re-instantiate
 >        change to always visit pointer target and function result
 >        and argument types.
 >

 This caused:

 http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49013
>>>
>>> I have reverted the patch for now.
>>>
>>
>> It doesn't solve the problem and I reopened:
>>
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49013
>>
>> Your followup patches may have similar issues.
>>
>
> I think you reverted the WRONG patch:
>
> http://gcc.gnu.org/viewcvs?view=revision&revision=173827

No, that was on purpose.

> --
> H.J.
>


Re: [PATCH][?/n] Cleanup LTO type merging

2011-05-17 Thread H.J. Lu
On Tue, May 17, 2011 at 6:03 AM, Richard Guenther
 wrote:
> On Tue, May 17, 2011 at 3:01 PM, H.J. Lu  wrote:
>> On Tue, May 17, 2011 at 5:59 AM, H.J. Lu  wrote:
>>> On Tue, May 17, 2011 at 3:29 AM, Richard Guenther  wrote:
 On Mon, 16 May 2011, H.J. Lu wrote:

> On Mon, May 16, 2011 at 7:17 AM, Richard Guenther  
> wrote:
> >
> > The following patch improves hashing types by re-instantiating the
> > patch that makes us visit aggregate target types of pointers and
> > function return and argument types.  This halves the collision
> > rate on the type hash table for a linux-kernel build and improves
> > WPA compile-time from 3mins to 1min and reduces memory usage by
> > 1GB for that testcase.
> >
> > Bootstrapped and tested on x86_64-unknown-linux-gnu, SPEC2k6
> > build-tested.
> >
> > Richard.
> >
> > (patch is reversed)
> >
> > 2011-05-16  Richard Guenther  
> >
> >        * gimple.c (iterative_hash_gimple_type): Re-instantiate
> >        change to always visit pointer target and function result
> >        and argument types.
> >
>
> This caused:
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49013

 I have reverted the patch for now.

>>>
>>> It doesn't solve the problem and I reopened:
>>>
>>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49013
>>>
>>> Your followup patches may have similar issues.
>>>
>>
>> I think you reverted the WRONG patch:
>>
>> http://gcc.gnu.org/viewcvs?view=revision&revision=173827
>
> No, that was on purpose.
>

But it doesn't fix the problem.



-- 
H.J.


Re: FDO patch -- make ic related vars TLS if target allows

2011-05-17 Thread H.J. Lu
On Wed, Apr 27, 2011 at 10:54 AM, Xinliang David Li  wrote:
> Hi please review the trivial patch below. It reduces race conditions
> in value profiling. Another trivial change (to initialize
> function_list struct) is also included.
>
> Bootstrapped and regression tested on x86-64/linux.
>
> Thanks,
>
> David
>
>
> 2011-04-27  Xinliang David Li  
>
>        * tree-profile.c (init_ic_make_global_vars): Set
>        tls attribute on ic vars.
>        * coverage.c (coverage_end_function): Initialize
>        function_list with zero.
>

This caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49014


-- 
H.J.


Re: [patch ada]: Fix boolean_type_node setup and some cleanup for boolean use

2011-05-17 Thread Eric Botcazou
> Hmm, sad. A check in tree-cfg that truth expressions have a type
> precision of 1 would be a good way to catch this. What is the actual
> cause for not setting the type precision here?

But we are setting it:

  /* In Ada, we use an unsigned 8-bit type for the default boolean type.  */
  boolean_type_node = make_unsigned_type (8);
  TREE_SET_CODE (boolean_type_node, BOOLEAN_TYPE);

See make_unsigned_type:

/* Create and return a type for unsigned integers of PRECISION bits.  */

tree
make_unsigned_type (int precision)
{
  tree type = make_node (INTEGER_TYPE);

  TYPE_PRECISION (type) = precision;

  fixup_unsigned_type (type);
  return type;
}


The other languages are changing the precision, but in Ada we need a standard 
scalar (precision == mode size) in order to support invalid values.

> At least in the testcases I didn't find a regression caused by this.

Right, I've just installed the attached testcase; it passes with the
unmodified compiler but fails with your gcc-interface/misc.c change.


2011-05-17  Eric Botcazou  

* gnat.dg/invalid1.adb: New test.


-- 
Eric Botcazou
-- { dg-do run }
-- { dg-options "-gnatws -gnatVa" }

pragma Initialize_Scalars;

procedure Invalid1 is

  X : Boolean;
  A : Boolean := False;

  procedure Uninit (B : out Boolean) is
  begin
if A then
  B := True;
  raise Program_Error;
end if;
  end;

begin

  -- first, check that initialize_scalars is enabled
  begin
if X then
  A := False;
end if;
raise Program_Error;
  exception
when Constraint_Error => null;
  end;

  -- second, check if copyback of an invalid value raises constraint error
  begin
Uninit (A);
if A then
  -- we expect constraint error in the 'if' above according to gnat ug:
  -- 
  -- call.  Note that there is no specific option to test `out'
  -- parameters, but any reference within the subprogram will be tested
  -- in the usual manner, and if an invalid value is copied back, any
  -- reference to it will be subject to validity checking.
  -- ...
  raise Program_Error;
end if;
raise Program_Error;
  exception
when Constraint_Error => null;
  end;

end;


Re: [PATCH] comment precising need to use free_dominance_info

2011-05-17 Thread Pierre Vittet
So maybe this patch, which adds a comment to calculate_dominance_info, is
more appropriate.


ChangeLog:
2011-05-17  Pierre Vittet

* dominance.c (calculate_dominance_info): Add comment
   explaining when the info must be freed with free_dominance_info.

contributor number: 634276

Index: gcc/dominance.c
===
--- gcc/dominance.c (revision 173830)
+++ gcc/dominance.c (working copy)
@@ -628,8 +628,15 @@ compute_dom_fast_query (enum cdi_direction dir)
 }
 
 /* The main entry point into this module.  DIR is set depending on whether
-   we want to compute dominators or postdominators.  */
+   we want to compute dominators or postdominators.  
 
+   We try to keep dominance info alive as long as possible (to avoid
+   recomputing it often).  It has to be freed with free_dominance_info when a
+   CFG transformation makes it invalid.
+
+   Post-dominance info is used less often and should be freed after each use.
+*/
+
 void
 calculate_dominance_info (enum cdi_direction dir)
 {


RFA: MN10300: Add TLS support

2011-05-17 Thread Nick Clifton
Hi Richard, Hi Jeff, Hi Alex,

  Here is another MN10300 patch.  This ones adds support for TLS.  I
  must confess that I did not actually write this code - DJ did - but I
  have been asked to submit it upstream, so here goes:

  OK to apply ?

Cheers
  Nick

gcc/ChangeLog
2011-05-17  DJ Delorie  
Nick Clifton  

* config/mn10300/mn10300.c (mn10300_unspec_int_label_counter):
New variable.
(mn10300_option_override): Disable TLS for the MN10300.
(tls_symbolic_operand_kind): New function.
(get_some_local_dynamic_name_1): New function.
(get_some_local_dynamic_name): New function.
(mn10300_print_operand): Handle %&.
(mn10300_legitimize_address): Legitimize TLS addresses.
(is_legitimate_tls_operand): New function.
(mn10300_legitimate_pic_operand_p): TLS operands are
legitimate.
(mn10300_legitimate_address_p): TLS symbols do not make
legitimate addresses.
Allow TLS operands under some circumstances.
(mn10300_legitimate_constant_p): Handle TLS UNSPECs.
(mn10300_init_machine_status): New function.
(mn10300_init_expanders): New function.
(pic_nonpic_got_ptr): New function.
(mn10300_tls_get_addr): New function.
(mn10300_legitimize_tls_address): New function.
(mn10300_constant_address_p): New function.
(TARGET_HAVE_TLS): Define.
* config/mn10300/predicates.md (tls_symbolic_operand): New.
(nontls_general_operand): New.
* config/mn10300/mn10300.h (enum reg_class): Add D0_REGS,
A0_REGS.
(REG_CLASS_NAMES): Likewise.
(REG_CLASS_CONTENTS): Likewise.
(struct machine_function): New structure.
(INIT_EXPANDERS): Define.
(mn10300_unspec_int_label_counter): New variable.
(PRINT_OPERAND_PUNCT_VALID_P): Define.
(CONSTANT_ADDRESS_P): Define.
* config/mn10300/constraints (B): New constraint.
(C): New constraint.
* config/mn10300/mn10300-protos.h: Alpha sort.
(mn10300_init_expanders): Prototype.
(mn10300_tls_get_addr): Prototype.
(mn10300_legitimize_tls_address): Prototype.
(mn10300_constant_address_p): Prototype.
* config/mn10300/mn10300.md (TLS_REG): New constant.
(UNSPEC_INT_LABEL): New constant.
(UNSPEC_TLSGD): New constant.
(UNSPEC_TLSLDM): New constant.
(UNSPEC_DTPOFF): New constant.
(UNSPEC_GOTNTPOFF): New constant.
(UNSPEC_INDNTPOFF): New constant.
(UNSPEC_TPOFF): New constant.
(UNSPEC_TLS_GD): New constant.
(UNSPEC_TLS_LD_BASE): New constant.
(movsi): Add TLS code.
(tls_global_dynamic_i): New pattern.
(tls_global_dynamic): New pattern.
(tls_local_dynamic_base_i): New pattern.
(tls_local_dynamic_base): New pattern.
(tls_initial_exec): New pattern.
(tls_initial_exec_1): New pattern.
(tls_initial_exec_2): New pattern.
(am33_set_got): New pattern.
(int_label): New pattern.
(am33_loadPC_anyreg): New pattern.
(add_GOT_to_any_reg): New pattern.

Index: gcc/config/mn10300/mn10300.c
===
--- gcc/config/mn10300/mn10300.c	(revision 173815)
+++ gcc/config/mn10300/mn10300.c	(working copy)
@@ -46,7 +46,12 @@
 #include "df.h"
 #include "opts.h"
 #include "cfgloop.h"
+#include "ggc.h"
 
+/* This is used by GOTaddr2picreg to uniquely identify
+   UNSPEC_INT_LABELs.  */
+int mn10300_unspec_int_label_counter;
+
 /* This is used in the am33_2.0-linux-gnu port, in which global symbol
names are not prefixed by underscores, to tell whether to prefix a
label with a plus sign or not, so that the assembler can tell
@@ -124,6 +129,9 @@
 target_flags &= ~MASK_MULT_BUG;
   else
 {
+  /* We can't do TLS if we don't have the TLS register.  */
+  targetm.have_tls = false;
+
   /* Disable scheduling for the MN10300 as we do
 	 not have timing information available for it.  */
   flag_schedule_insns = 0;
@@ -162,6 +170,51 @@
 fprintf (asm_out_file, "\t.am33\n");
 }
 
+/* Returns non-zero if OP has the KIND tls model.  */
+
+static inline bool
+tls_symbolic_operand_kind (rtx op, enum tls_model kind)
+{
+  if (GET_CODE (op) != SYMBOL_REF)
+return false;
+  return SYMBOL_REF_TLS_MODEL (op) == kind;
+}
+
+/* Locate some local-dynamic symbol still in use by this function
+   so that we can print its name in some tls_local_dynamic_base
+   pattern.  This is used by "%&" in print_operand().  */
+
+static int
+get_some_local_dynamic_name_1 (rtx *px, void *data ATTRIBUTE_UNUSED)
+{
+  rtx x = *px;
+
+  if (GET_CODE (x) == SYMBOL_REF
+  && tls_symbolic_operand_kind (x, TLS_MODEL_LOCAL_DYNAMIC))
+{
+  cfun->machine->some_ld_name = XSTR (x, 0);
+  return 1;
+}
+
+  return 0;
+}
+
+static const char *
+get_some_local_dynamic_name (void)
+{
+  rtx insn;

[PATCH] Fixup LTO SCC hash comparison fn

2011-05-17 Thread Richard Guenther

Quite obvious if you look at it for the 100th time...

Richard.

2011-05-17  Richard Guenther  

* gimple.c (type_hash_pair_compare): Fix comparison.

Index: gcc/gimple.c
===
--- gcc/gimple.c(revision 173830)
+++ gcc/gimple.c(working copy)
@@ -4070,9 +4070,11 @@ type_hash_pair_compare (const void *p1_,
 {
   const struct type_hash_pair *p1 = (const struct type_hash_pair *) p1_;
   const struct type_hash_pair *p2 = (const struct type_hash_pair *) p2_;
-  if (p1->hash == p2->hash)
-return TYPE_UID (p1->type) - TYPE_UID (p2->type);
-  return p1->hash - p2->hash;
+  if (p1->hash < p2->hash)
+return -1;
+  else if (p1->hash > p2->hash)
+return 1;
+  return 0;
 }
 
 /* Returning a hash value for gimple type TYPE combined with VAL.


Clean up ARM string option handling

2011-05-17 Thread Joseph S. Myers
This patch cleans up ARM handling of various options, making
enumerated options that were handled in arm_option_override use Enum
instead (except for -mfpu=, to be handled in a subsequent patch) and
using UInteger for -mstructure-size-boundary=.

-mfp= and -mfpe= (legacy aliases) are converted into actual .opt Alias
entries for each valid argument to those options.  (Thus, the last of
a sequence of interspersed -mfpu= and -mfp=/-mfpe= options now wins,
which I think is correct, whereas previously an earlier -mfpu= option
would override a later -mfp=/-mfpe= option.)
-mstructure-size-boundary= using UInteger means that its arguments are
now required to be decimal integers without leading whitespace or
ignored trailing text, whereas previously such variants (and octal and
hexadecimal values) would be accepted; I think this stricter checking,
consistent with other integer-valued options, is also the desired
semantics.

Tested building cc1 and xgcc for cross to arm-eabi.  Will commit to
trunk in the absence of target maintainer objections.

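For readers unfamiliar with the .opt machinery, the Enum conversion replaces
hand-written string matching in arm_option_override with declarative option
records.  A sketch of what such entries look like (an illustrative fragment
in .opt syntax; consult gcc/config/arm/arm.opt for the actual records):

```text
mabi=
Target RejectNegative Joined Enum(arm_abi_type) Var(arm_abi) Init(ARM_DEFAULT_ABI)
Specify an ABI.

Enum
Name(arm_abi_type) Type(enum arm_abi_type)
Known ARM ABIs (for use with the -mabi= option):

EnumValue
Enum(arm_abi_type) String(aapcs) Value(ARM_ABI_AAPCS)

EnumValue
Enum(arm_abi_type) String(apcs-gnu) Value(ARM_ABI_APCS)
```

The option-handling framework then rejects unknown arguments and stores the
enumerator into the Var automatically, so the target override hook no longer
needs to strcmp its way through the accepted strings.
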
Remarks on oddities in the configuration of various ARM targets for
the consideration of the ARM target maintainers (I don't plan to work
on these issues):

* The default float-abi (TARGET_DEFAULT_FLOAT_ABI) is "hard" only for
  old-ABI GNU/Linux.  Although it's also defined to ARM_FLOAT_ABI_HARD
  in arm/semi.h, that definition gets overridden in arm/coff.h.

* The header arm/aout.h has a comment saying it is for "ARM with
  a.out", which is misleading since there are no such targets in GCC.
  Actually, all ARM targets include it either immediately before
  arm/arm.h, or in the case of FreeBSD with arm/freebsd.h in between -
  but arm/freebsd.h does not test or modify any macros from
  arm/aout.h, so actually arm/aout.h could safely be merged into
  arm/arm.h, making the port simpler and less confusing.  (I haven't
  checked whether arm/arm.h tests or modifies any macros from
  arm/aout.h, which would be relevant to determining the best way to
  merge each macro into arm/arm.h.)

* The header arm/semi.h says "ARM on semi-hosted platform" - which is
  also misleading since it's only used for WinCE.  As noted above at
  least one target macro in this header is dead.  As the four headers
  arm/semi.h arm/coff.h arm/pe.h arm/wince-pe.h are all only used for
  this one target, and I doubt the distinctions between them genuinely
  follow any proper abstraction levels between four different sets of
  targets that simply happen to be the same right now, merging them
  into one header might improve things.

* arm*-*-uclinux* are legacy Linux-based targets not using the common
  gnu-user.h and linux.h headers as they should be.

* There were previous discussions of deprecating at least some
  non-AAPCS targets.  Since for some target OSes (such as WinCE) the
  ABI is what it is, it may not be possible to eliminate the old ABI
  support completely.  But maybe some other OS maintainers (VxWorks,
  FreeBSD, NetBSD, eCos, RTEMS) are moving their OSes to use AAPCS, in
  which case some older target variants could be deprecated and
  removed?  (Maybe old-ABI bare-metal, GNU/Linux and uClinux could be
  deprecated in any case.)  Closely tied into old-ABI use are defaults
  of -mfpu= based on the selected CPU - but NetBSD and VxWorks at
  least default to VFP (define FPUTYPE_DEFAULT appropriately) even
  though using pre-AAPCS ABIs.

* There was also a previous suggestion that -mwords-little-endian is
  long-obsolete and should be removed as well.

2011-05-17  Joseph Myers  

* config/arm/arm-opts.h (enum arm_fp16_format_type, enum
arm_abi_type, enum float_abi_type, enum arm_tp_type): Move from
arm.h.
* config/arm/arm.c (arm_float_abi, arm_fp16_format, arm_abi,
target_thread_pointer, arm_structure_size_boundary, struct
float_abi, all_float_abis, struct fp16_format, all_fp16_formats,
struct abi_name, arm_all_abis): Remove.
(arm_option_override): Don't process most enumerated option values
here.  Don't process target_fpe_name here.  Work with integer not
string for structure size boundary; use separate diagnostics for
each case.
* config/arm/arm.h (enum float_abi_type, enum
arm_fp16_format_type, enum arm_abi_type, enum arm_tp_type): Move
to arm-opts.h.
(arm_float_abi, arm_fp16_format, arm_abi, target_thread_pointer,
arm_structure_size_boundary): Remove.
* config/arm/arm.opt (mabi=): Use Enum and Init.
(arm_abi_type): New Enum and EnumValue entries.
(mfloat-abi=): Use Enum and Init.
(float_abi_type): New Enum and EnumValue entries.
(mfp=, mfpe=): Replace by separate Alias entries for each
argument.
(mfp16-format=): Use Enum and Init.
(arm_fp16_format_type): New Enum and EnumValue entries.
(mstructure-size-boundary=): Use UInteger and Init.
(mtp=): Use Enum and Init.
(arm_tp_type): New

[PING][PATCH 13/18] move TS_EXP to be a substructure of TS_TYPED

2011-05-17 Thread Nathan Froyd
On 05/10/2011 04:18 PM, Nathan Froyd wrote:
> On 03/10/2011 11:23 PM, Nathan Froyd wrote:
>> After all that, we can finally make tree_exp inherit from typed_tree.
>> Quite anticlimactic.
> 
> Ping.  http://gcc.gnu.org/ml/gcc-patches/2011-03/msg00559.html

Ping^2.

-Nathan


Re: Libiberty: POSIXify psignal definition

2011-05-17 Thread Richard Earnshaw

On Thu, 2011-05-05 at 09:30 +0200, Corinna Vinschen wrote:
> [Please keep me CCed, I'm not subscribed to gcc-patches.  Thank you]
> 
> Hi,
> 
> the definition of psignal in libiberty is
> 
>void psignal (int, char *);
> 
> The correct definition per POSIX is
> 
>void psignal (int, const char *);
> 
> The below patch fixes that.
> 
> 
> Thanks,
> Corinna
> 
> 
>   * strsignal.c (psignal): Change second parameter to const char *.
>   Fix comment accordingly.
> 

OK.

R.






Re: Libiberty: POSIXify psignal definition

2011-05-17 Thread DJ Delorie

> > * strsignal.c (psignal): Change second parameter to const char *.
> > Fix comment accordingly.
> > 
> 
> OK.

I had argued against this patch:

http://gcc.gnu.org/ml/gcc-patches/2011-05/msg00439.html

The newlib change broke ALL released versions of gcc, and the above
patch does NOT fix the problem, but merely hides it until the next
time we trip over it.


Re: [patch ada]: Fix boolean_type_node setup and some cleanup for boolean use

2011-05-17 Thread Kai Tietz
2011/5/17 Eric Botcazou :
>> Hmm, sad. A check in tree-cfg that truth expressions have a type
>> precision of 1 would be a good way to catch this. What is the actual
>> cause for not setting the type precision here?
>
> But we are setting it:
>
>  /* In Ada, we use an unsigned 8-bit type for the default boolean type.  */
>  boolean_type_node = make_unsigned_type (8);
>  TREE_SET_CODE (boolean_type_node, BOOLEAN_TYPE);
>
> See make_unsigned_type:
>
> /* Create and return a type for unsigned integers of PRECISION bits.  */
>
> tree
> make_unsigned_type (int precision)
> {
>  tree type = make_node (INTEGER_TYPE);
>
>  TYPE_PRECISION (type) = precision;
>
>  fixup_unsigned_type (type);
>  return type;
> }
>
>
> The other languages are changing the precision, but in Ada we need a standard
> scalar (precision == mode size) in order to support invalid values.
>
>> At least in the testcases I didn't find a regression caused by this.
>
> Right, I've just installed the attached testcase; it passes with the
> unmodified compiler but fails with your gcc-interface/misc.c change.
>
>
> 2011-05-17  Eric Botcazou  
>
>        * gnat.dg/invalid1.adb: New test.
>
>
> --
> Eric Botcazou
>

Ok, thanks for explaining it.  So would the patch be OK to apply
without the precision setting?

Regards,
Kai


Re: Libiberty: POSIXify psignal definition

2011-05-17 Thread Richard Earnshaw

On Tue, 2011-05-17 at 11:52 -0400, DJ Delorie wrote:
> > >   * strsignal.c (psignal): Change second parameter to const char *.
> > >   Fix comment accordingly.
> > > 
> > 
> > OK.
> 
> I had argued against this patch:
> 
> http://gcc.gnu.org/ml/gcc-patches/2011-05/msg00439.html
> 
> The newlib change broke ALL released versions of gcc, and the above
> patch does NOT fix the problem, but merely hides it until the next
> time we trip over it.
> 

So regardless of whether the changes to newlib are a good idea or not, I
think the fix to libiberty is still right.  POSIX says that psignal takes
a const char *, and libiberty's implementation doesn't.  That's just
silly.

I do agree that the newlib code should be tightened up, particularly in
order to support older compilers; but that doesn't mean we shouldn't fix
libiberty as well.

R.




Re: Libiberty: POSIXify psignal definition

2011-05-17 Thread Corinna Vinschen
On May 17 16:33, Richard Earnshaw wrote:
> 
> On Thu, 2011-05-05 at 09:30 +0200, Corinna Vinschen wrote:
> > [Please keep me CCed, I'm not subscribed to gcc-patches.  Thank you]
> > 
> > Hi,
> > 
> > the definition of psignal in libiberty is
> > 
> >void psignal (int, char *);
> > 
> > The correct definition per POSIX is
> > 
> >void psignal (int, const char *);
> > 
> > The below patch fixes that.
> > 
> > 
> > Thanks,
> > Corinna
> > 
> > 
> > * strsignal.c (psignal): Change second parameter to const char *.
> > Fix comment accordingly.
> > 
> 
> OK.
> 
> R.

Thanks.  I just have no check-in rights to the gcc repository.  I applied
the change to the sourceware CVS repository but for gcc I need a proxy.


Thanks,
Corinna

-- 
Corinna Vinschen
Cygwin Project Co-Leader
Red Hat


Re: Libiberty: POSIXify psignal definition

2011-05-17 Thread Corinna Vinschen
On May 17 17:07, Richard Earnshaw wrote:
> 
> On Tue, 2011-05-17 at 11:52 -0400, DJ Delorie wrote:
> > > > * strsignal.c (psignal): Change second parameter to const char 
> > > > *.
> > > > Fix comment accordingly.
> > > > 
> > > 
> > > OK.
> > 
> > I had argued against this patch:
> > 
> > http://gcc.gnu.org/ml/gcc-patches/2011-05/msg00439.html
> > 
> > The newlib change broke ALL released versions of gcc, and the above
> > patch does NOT fix the problem, but merely hides it until the next
> > time we trip over it.
> > 
> 
> So regardless of whether the changes to newlib are a good idea or not, I
> think the fix to libiberty is still right.  POSIX says that psignal takes
> a const char *, and libiberty's implementation doesn't.  That's just
> silly.
> 
> I do agree that the newlib code should be tightened up, particularly in
> order to support older compilers;

What I don't understand is why the newlib change broke older compilers.
The function has been added to newlib and the definitions in newlib are
correct.

If this is referring to the fact that libiberty doesn't grok
automatically if a symbol has been added to newlib, then that's a
problem in libiberty, not in newlib.

Otherwise, if you're building an older compiler, just use an older
newlib as well.


Corinna

-- 
Corinna Vinschen
Cygwin Project Co-Leader
Red Hat


Re: Libiberty: POSIXify psignal definition

2011-05-17 Thread DJ Delorie

> So regardless of whether the changes to newlib are a good idea or not, I
> think the fix to libiberty is still right.

Irrelevant.  I said I'd accept that change *after* the real problem is
fixed.  The real problem hasn't been fixed.

The real problem is that libiberty should NOT INCLUDE PSIGNAL AT ALL if
newlib has it.

What *should* have happened, is libiberty should have been fixed
*first*, and newlib waited until a gcc/binutils release cycle
happened, so that at least ONE version of those could build with
newlib.


Re: Libiberty: POSIXify psignal definition

2011-05-17 Thread DJ Delorie

> Thanks.  I just have no check in rights to the gcc repository.  I
> applied the change to the sourceware CVS repository but for gcc I
> need a proxy.

Please, never apply libiberty patches only to src.  They're likely to
get deleted by the robomerge.  The rule is: gcc only, or both at the
same time.


Re: Libiberty: POSIXify psignal definition

2011-05-17 Thread DJ Delorie

> What I don't understand is why the newlib change broke older compilers.

Older compilers have the older libiberty.  At the moment you cannot
*build* any released gcc for a newlib target, because it cannot build
its target libiberty.

> The function has been added to newlib and the definitions in newlib are
> correct.

"Correct" is irrelevant.  They don't match libiberty, so the build
breaks.

> If this is refering to the fact that libiberty doesn't grok
> automatically if a symbol has been added to newlib, then that's a
> problem in libiberty, not in newlib.

It's a problem in every released gcc at the moment, so no released gcc
can be built for a newlib target, without hacking the sources.

> Otherwise, if you're building an older compiler, just use an older
> newlib as well.

The only option here, then, is to not release a newlib at all until a
fixed gcc release happens, and to require that fixed gcc from that
version of newlib forward.


Re: [Patch, libfortran] PR 48931 Async-signal-safety of backtrace signal handler

2011-05-17 Thread Toon Moene

On 05/14/2011 09:40 PM, Janne Blomqvist wrote:


Hi,

the current version of showing the backtrace is not async-signal-safe
as it uses backtrace_symbols() which, in turn, uses malloc(). The
attached patch changes the backtrace printing functionality to instead
use backtrace_symbols_fd() and pipes.


Great - this would solve a problem I filed a bugzilla report about years 
ago (unfortunately, I do not know its number).


I closed it WONTFIX, because neither FX nor I could come up with an 
alternative way *not* using malloc.


[ The problem was getting a traceback after corruption of the
  malloc arena, which just hangs under the current implementation. ]

--
Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
Progress of GNU Fortran: http://gcc.gnu.org/wiki/GFortran#news


[PATCH, i386]: Trivial, use bool some more.

2011-05-17 Thread Uros Bizjak
Hello!

2011-05-16  Uros Bizjak  

* config/i386/i386-protos.h (output_fix_trunc): Change arg 3 to bool.
(output_fp_compare): Change args 3 and 4 to bool.
(ix86_expand_call): Change arg 6 to bool.
(ix86_attr_length_immediate_default): Change arg 2 to bool.
(ix86_attr_length_vex_default): Change arg 3 to bool.
* config/i386/i386.md: Update all uses.
* config/i386/i386.c: Ditto.
(ix86_flags_dependent): Change return type to bool.

Patch was tested on x86_64-pc-linux-gnu {,-m32}, also with
--enable-build-with-cxx (additional patch is needed to bootstrap
without errors ATM).

Committed to mainline SVN.

Uros.
Index: config/i386/i386.md
===
--- config/i386/i386.md (revision 173832)
+++ config/i386/i386.md (working copy)
@@ -414,9 +414,9 @@
   (const_int 0)
 (eq_attr "type" "alu,alu1,negnot,imovx,ishift,rotate,ishift1,rotate1,
  imul,icmp,push,pop")
-  (symbol_ref "ix86_attr_length_immediate_default(insn,1)")
+  (symbol_ref "ix86_attr_length_immediate_default (insn, true)")
 (eq_attr "type" "imov,test")
-  (symbol_ref "ix86_attr_length_immediate_default(insn,0)")
+  (symbol_ref "ix86_attr_length_immediate_default (insn, false)")
 (eq_attr "type" "call")
   (if_then_else (match_operand 0 "constant_call_address_operand" "")
 (const_int 4)
@@ -524,11 +524,11 @@
   (if_then_else (and (eq_attr "prefix_0f" "1")
 (eq_attr "prefix_extra" "0"))
 (if_then_else (eq_attr "prefix_vex_w" "1")
-  (symbol_ref "ix86_attr_length_vex_default (insn, 1, 1)")
-  (symbol_ref "ix86_attr_length_vex_default (insn, 1, 0)"))
+  (symbol_ref "ix86_attr_length_vex_default (insn, true, true)")
+  (symbol_ref "ix86_attr_length_vex_default (insn, true, false)"))
 (if_then_else (eq_attr "prefix_vex_w" "1")
-  (symbol_ref "ix86_attr_length_vex_default (insn, 0, 1)")
-  (symbol_ref "ix86_attr_length_vex_default (insn, 0, 0)"
+  (symbol_ref "ix86_attr_length_vex_default (insn, false, true)")
+  (symbol_ref "ix86_attr_length_vex_default (insn, false, false)"
 
 ;; Set when modrm byte is used.
 (define_attr "modrm" ""
@@ -1262,7 +1262,7 @@
UNSPEC_FNSTSW))]
   "X87_FLOAT_MODE_P (GET_MODE (operands[1]))
&& GET_MODE (operands[1]) == GET_MODE (operands[2])"
-  "* return output_fp_compare (insn, operands, 0, 0);"
+  "* return output_fp_compare (insn, operands, false, false);"
   [(set_attr "type" "multi")
(set_attr "unit" "i387")
(set (attr "mode")
@@ -1309,7 +1309,7 @@
 (match_operand:XF 2 "register_operand" "f"))]
  UNSPEC_FNSTSW))]
   "TARGET_80387"
-  "* return output_fp_compare (insn, operands, 0, 0);"
+  "* return output_fp_compare (insn, operands, false, false);"
   [(set_attr "type" "multi")
(set_attr "unit" "i387")
(set_attr "mode" "XF")])
@@ -1343,7 +1343,7 @@
 (match_operand:MODEF 2 "nonimmediate_operand" "fm"))]
  UNSPEC_FNSTSW))]
   "TARGET_80387"
-  "* return output_fp_compare (insn, operands, 0, 0);"
+  "* return output_fp_compare (insn, operands, false, false);"
   [(set_attr "type" "multi")
(set_attr "unit" "i387")
(set_attr "mode" "")])
@@ -1378,7 +1378,7 @@
  UNSPEC_FNSTSW))]
   "X87_FLOAT_MODE_P (GET_MODE (operands[1]))
&& GET_MODE (operands[1]) == GET_MODE (operands[2])"
-  "* return output_fp_compare (insn, operands, 0, 1);"
+  "* return output_fp_compare (insn, operands, false, true);"
   [(set_attr "type" "multi")
(set_attr "unit" "i387")
(set (attr "mode")
@@ -1428,7 +1428,7 @@
   "X87_FLOAT_MODE_P (GET_MODE (operands[1]))
&& (TARGET_USE_MODE_FIOP || optimize_function_for_size_p (cfun))
&& (GET_MODE (operands [3]) == GET_MODE (operands[1]))"
-  "* return output_fp_compare (insn, operands, 0, 0);"
+  "* return output_fp_compare (insn, operands, false, false);"
   [(set_attr "type" "multi")
(set_attr "unit" "i387")
(set_attr "fp_int_src" "true")
@@ -1504,7 +1504,7 @@
   "TARGET_MIX_SSE_I387
&& SSE_FLOAT_MODE_P (GET_MODE (operands[0]))
&& GET_MODE (operands[0]) == GET_MODE (operands[1])"
-  "* return output_fp_compare (insn, operands, 1, 0);"
+  "* return output_fp_compare (insn, operands, true, false);"
   [(set_attr "type" "fcmp,ssecomi")
(set_attr "prefix" "orig,maybe_vex")
(set (attr "mode")
@@ -1533,7 +1533,7 @@
   "TARGET_SSE_MATH
&& SSE_FLOAT_MODE_P (GET_MODE (operands[0]))
&& GET_MODE (operands[0]) == GET_MODE (operands[1])"
-  "* return output_fp_compare (insn, operands, 1, 0);"
+  "* return output_fp_compare (insn, operands, true, false);"
   [(set_attr "type" "ssecomi")
(set_attr "prefix" "maybe_vex")
(set (attr "mode")
@@ -1557,7 +1557,7 @@
&& TARGET_CMOVE
&& !(SSE_FLOAT_MODE_P (GET_MODE (operands[0])) && TARGET_SSE_MATH)
&& GET_MODE (operands[0]) == GET_MODE (operands[1]

[PATCH]: Restore bootstrap with --enable-build-with-cxx

2011-05-17 Thread Uros Bizjak
Hello!

2011-05-17  Uros Bizjak  

* ipa-inline-analysis.c (inline_node_duplication_hook): Initialize
info->entry with 0.
* tree-inline.c (maybe_inline_call_in_expr): Initialize
id.transform_lang_insert_block with NULL.

Tested on x86_64-pc-linux-gnu {, m32} with --enable-build-with-cxx.
Committed to mainline SVN as obvious.

Uros.
Index: ipa-inline-analysis.c
===
--- ipa-inline-analysis.c   (revision 173832)
+++ ipa-inline-analysis.c   (working copy)
@@ -702,7 +702,7 @@ inline_node_duplication_hook (struct cgr
   bool inlined_to_p = false;
   struct cgraph_edge *edge;
 
-  info->entry = false;
+  info->entry = 0;
   VEC_safe_grow_cleared (tree, heap, known_vals, count);
   for (i = 0; i < count; i++)
 {
Index: tree-inline.c
===
--- tree-inline.c   (revision 173832)
+++ tree-inline.c   (working copy)
@@ -5232,7 +5232,7 @@ maybe_inline_call_in_expr (tree exp)
   id.transform_call_graph_edges = CB_CGE_DUPLICATE;
   id.transform_new_cfg = false;
   id.transform_return_to_modify = true;
-  id.transform_lang_insert_block = false;
+  id.transform_lang_insert_block = NULL;
 
   /* Make sure not to unshare trees behind the front-end's back
 since front-end specific mechanisms may rely on sharing.  */


[PATCH, MELT] correcting path error in the Makefile.in

2011-05-17 Thread Pierre Vittet
This patch corrects a bug in the current revision of MELT which was 
preventing MELT from running correctly.


This was a path problem: in gcc/Makefile.in, melt-modules/ and 
melt-module.mk were not found.


My contributor number is 634276.

changelog :


2011-05-17  Pierre Vittet  

* Makefile.in : Correct path errors for melt_module_dir and for
install-melt-mk target




Index: gcc/Makefile.in
===
--- gcc/Makefile.in (revision 173832)
+++ gcc/Makefile.in (working copy)
@@ -5352,7 +5352,7 @@ melt_default_modules_list=melt-default-modules
 melt_source_dir=$(libexecsubdir)/melt-source/
 
 ## this is the installation directory of melt dynamic modules (*.so)
-melt_module_dir=$(libexecsubdir)/melt-module/
+melt_module_dir=$(libexecsubdir)/melt-modules/
 
 ## this is the installed path of the MELT module makefile
 melt_installed_module_makefile=$(libexecsubdir)/melt-module.mk
@@ -5416,8 +5416,8 @@ install-melt-modules: melt-modules melt-all-module
 
 ## install the makefile for MELT modules
 install-melt-mk: melt-module.mk
-   $(mkinstalldirs) $(DESTDIR)$(plugin_includedir)
-   $(INSTALL_DATA) $< $(DESTDIR)/$(plugin_includedir)/
+   $(mkinstalldirs) $(DESTDIR)$(libexecsubdir)
+   $(INSTALL_DATA) $< $(DESTDIR)/$(libexecsubdir)/
 
 ## install the default modules list
 install-melt-default-modules-list: $(melt_default_modules_list).modlis 


Re: [Patch, libfortran] PR 48931 Async-signal-safety of backtrace signal handler

2011-05-17 Thread Toon Moene

On 05/17/2011 07:50 PM, Toon Moene wrote:


On 05/14/2011 09:40 PM, Janne Blomqvist wrote:


Hi,

the current version of showing the backtrace is not async-signal-safe
as it uses backtrace_symbols() which, in turn, uses malloc(). The
attached patch changes the backtrace printing functionality to instead
use backtrace_symbols_fd() and pipes.


Great - this would solve a problem I filed a bugzilla report for years
ago (unfortunately, I do not know the number of it).


It was 33905 (2007-10-26).

--
Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
Progress of GNU Fortran: http://gcc.gnu.org/wiki/GFortran#news


Re: [PATCH, MELT] correcting path error in the Makefile.in

2011-05-17 Thread Basile Starynkevitch
On Tue, 17 May 2011 21:30:44 +0200
Pierre Vittet  wrote:

> This patch correct a bug in the current revision of MELT, which was 
> preventing MELT to run correctly.
> 
> This was a path problem in gcc/Makefile.in (melt-modules/ and 
> melt-modules.mk) were not found.
> 
> My contributor number is 634276.
> 
> changelog :
> 
> 
> 2011-05-17  Pierre Vittet  
> 
>   * Makefile.in : Correct path errors for melt_module_dir and for
>   install-melt-mk target

The ChangeLog.MELT entry should mention the Makefile targets as the
changelog's functions, and the colon shouldn't have any space before it.
So I applied the patch with the following entry:

2011-05-17  Pierre Vittet  

* Makefile.in (melt_module_dir,install-melt-mk): Correct path
errors.
Committed revision 173835.

Thanks.


-- 
Basile STARYNKEVITCH http://starynkevitch.net/Basile/
email: basilestarynkevitchnet mobile: +33 6 8501 2359
8, rue de la Faiencerie, 92340 Bourg La Reine, France
*** opinions {are only mine, sont seulement les miennes} ***


Re: [PATCH]: Restore bootstrap with --enable-build-with-cxx

2011-05-17 Thread Toon Moene

On 05/17/2011 08:32 PM, Uros Bizjak wrote:


Tested on x86_64-pc-linux-gnu {, m32} with --enable-build-with-cxx.
Committed to mainline SVN as obvious.


Does that mean that I can now remove the --disable-werror from my daily 
C++ bootstrap run ?


It's great that some people understand the intricacies of the 
infight^H^H^H^H^H^H differences between the C and C++ type model.


OK: 1/2 :-)

--
Toon Moene - e-mail: t...@moene.org - phone: +31 346 214290
Saturnushof 14, 3738 XG  Maartensdijk, The Netherlands
At home: http://moene.org/~toon/; weather: http://moene.org/~hirlam/
Progress of GNU Fortran: http://gcc.gnu.org/wiki/GFortran#news


Restore MIPS builds

2011-05-17 Thread Richard Sandiford
I've applied the patch below to restore -Werror MIPS builds.
Tested on mips64-linux-gnu.

Richard


gcc/
* config/mips/mips.c (mips_handle_option): Remove unused variable.

Index: gcc/config/mips/mips.c
===
--- gcc/config/mips/mips.c  2011-05-15 08:37:21.0 +0100
+++ gcc/config/mips/mips.c  2011-05-15 08:37:28.0 +0100
@@ -15287,7 +15287,6 @@ mips_handle_option (struct gcc_options *
location_t loc ATTRIBUTE_UNUSED)
 {
   size_t code = decoded->opt_index;
-  const char *arg = decoded->arg;
 
   switch (code)
 {


[PATCH] fix vfmsubaddpd/vfmaddsubpd generation

2011-05-17 Thread Quentin Neill
This patch fixes an obvious problem: the fma4_fmsubadd/fma4_fmaddsub
instruction templates don't generate vfmsubaddpd/vfmaddsubpd because
they don't use 

This passes bootstrap on x86_64 on trunk.  Okay to commit?

BTW, I'm testing on gcc-4_6-branch.  Should I post a different patch
thread, or just use this one?
-- 
Quentin
From aa70d4f6180f1c6712888b7328723232b5da8bdc Mon Sep 17 00:00:00 2001
From: Quentin Neill 
Date: Tue, 17 May 2011 10:24:17 -0500
Subject: [PATCH] 2011-05-17  Harsha Jagasia  

* config/i386/sse.md (fma4_fmsubadd): Use .
(fma4_fmaddsub): Likewise.
---
 gcc/ChangeLog  |5 +
 gcc/config/i386/sse.md |4 ++--
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 3625d9b..e86ea4e 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,8 @@
+2011-05-17  Harsha Jagasia  
+
+   * config/i386/sse.md (fma4_fmsubadd): Use .
+   (fma4_fmaddsub): Likewise
+
 2011-05-17  Richard Guenther  
 
* gimple.c (iterative_hash_gimple_type): Simplify singleton
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 291bffb..7c4e6dd 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -1663,7 +1663,7 @@
   (match_operand:VF 3 "nonimmediate_operand" "xm,x")]
  UNSPEC_FMADDSUB))]
   "TARGET_FMA4"
-  "vfmaddsubps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  "vfmaddsubp\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
(set_attr "mode" "")])
 
@@ -1676,7 +1676,7 @@
 (match_operand:VF 3 "nonimmediate_operand" "xm,x"))]
  UNSPEC_FMADDSUB))]
   "TARGET_FMA4"
-  "vfmsubaddps\t{%3, %2, %1, %0|%0, %1, %2, %3}"
+  "vfmsubaddp\t{%3, %2, %1, %0|%0, %1, %2, %3}"
   [(set_attr "type" "ssemuladd")
(set_attr "mode" "")])
 
-- 
1.7.1



[v3] vs noexcept

2011-05-17 Thread Paolo Carlini

Hi,

this time too, I took the occasion to add the get(tuple&&) bits.  Tested 
x86_64-linux, committed.


Paolo.

///
2011-05-17  Paolo Carlini  

* include/std/tuple: Use noexcept where appropriate.
(tuple<>::swap): Rework implementation.
(_Head_base<>::_M_swap_impl): Remove.
(get(std::tuple<>&&)): Add.
* testsuite/20_util/tuple/element_access/get2.cc: New.
* testsuite/20_util/weak_ptr/comparison/cmp_neg.cc: Adjust dg-error
line number.
Index: include/std/tuple
===
--- include/std/tuple   (revision 173832)
+++ include/std/tuple   (working copy)
@@ -59,6 +59,15 @@
 struct __add_ref<_Tp&>
 { typedef _Tp& type; };
 
+  // Adds an rvalue reference to a non-reference type.
+  template
+struct __add_r_ref
+{ typedef _Tp&& type; };
+
+  template
+struct __add_r_ref<_Tp&>
+{ typedef _Tp& type; };
+
   template
 struct _Head_base;
 
@@ -78,13 +87,6 @@
 
   _Head&   _M_head()   { return *this; }
   const _Head& _M_head() const { return *this; }
-
-  void 
-  _M_swap_impl(_Head& __h)
-  {
-   using std::swap;
-   swap(__h, _M_head());
-  }
 };
 
   template
@@ -103,13 +105,6 @@
   _Head&   _M_head()   { return _M_head_impl; }
   const _Head& _M_head() const { return _M_head_impl; }
 
-  void
-  _M_swap_impl(_Head& __h)
-  { 
-   using std::swap;
-   swap(__h, _M_head());
-  }
-
   _Head _M_head_impl; 
 };
 
@@ -130,9 +125,11 @@
*/
   template
 struct _Tuple_impl<_Idx>
-{ 
+{
+  template friend class _Tuple_impl;
+
 protected:
-  void _M_swap_impl(_Tuple_impl&) { /* no-op */ }
+  void _M_swap(_Tuple_impl&) noexcept { /* no-op */ }
 };
 
   /**
@@ -145,6 +142,8 @@
 : public _Tuple_impl<_Idx + 1, _Tail...>,
   private _Head_base<_Idx, _Head, std::is_empty<_Head>::value>
 {
+  template friend class _Tuple_impl;
+
   typedef _Tuple_impl<_Idx + 1, _Tail...> _Inherited;
   typedef _Head_base<_Idx, _Head, std::is_empty<_Head>::value> _Base;
 
@@ -218,10 +217,14 @@
 
 protected:
   void
-  _M_swap_impl(_Tuple_impl& __in)
+  _M_swap(_Tuple_impl& __in)
+  noexcept(noexcept(swap(std::declval<_Head&>(),
+std::declval<_Head&>()))
+  && noexcept(__in._M_tail()._M_swap(__in._M_tail(
   {
-   _Base::_M_swap_impl(__in._M_head());
-   _Inherited::_M_swap_impl(__in._M_tail());
+   using std::swap;
+   swap(this->_M_head(), __in._M_head());
+   _Inherited::_M_swap(__in._M_tail());
   }
 };
 
@@ -300,14 +303,15 @@
 
   void
   swap(tuple& __in)
-  { _Inherited::_M_swap_impl(__in); }
+  noexcept(noexcept(__in._M_swap(__in)))
+  { _Inherited::_M_swap(__in); }
 };
 
   template<>  
 class tuple<>
 {
 public:
-  void swap(tuple&) { /* no-op */ }
+  void swap(tuple&) noexcept { /* no-op */ }
 };
 
   /// tuple (2-element), with construction and assignment from a pair.
@@ -360,6 +364,7 @@
 
   tuple&
   operator=(tuple&& __in)
+  // noexcept has to wait is_nothrow_move_assignable
   {
static_cast<_Inherited&>(*this) = std::move(__in);
return *this;
@@ -392,7 +397,7 @@
 
   template
 tuple&
-operator=(pair<_U1, _U2>&& __in)
+operator=(pair<_U1, _U2>&& __in) noexcept
 {
  this->_M_head() = std::forward<_U1>(__in.first);
  this->_M_tail()._M_head() = std::forward<_U2>(__in.second);
@@ -401,11 +406,8 @@
 
   void
   swap(tuple& __in)
-  { 
-   using std::swap;
-   swap(this->_M_head(), __in._M_head());
-   swap(this->_M_tail()._M_head(), __in._M_tail()._M_head());  
-  }
+  noexcept(noexcept(__in._M_swap(__in)))
+  { _Inherited::_M_swap(__in); }
 };
 
   /// tuple (1-element).
@@ -473,7 +475,8 @@
 
   void
   swap(tuple& __in)
-  { _Inherited::_M_swap_impl(__in); }
+  noexcept(noexcept(__in._M_swap(__in)))
+  { _Inherited::_M_swap(__in); }
 };
 
 
@@ -522,22 +525,31 @@
 __get_helper(const _Tuple_impl<__i, _Head, _Tail...>& __t)
 { return __t._M_head(); }
 
-  // Return a reference (const reference) to the ith element of a tuple.
-  // Any const or non-const ref elements are returned with their original type.
+  // Return a reference (const reference, rvalue reference) to the ith element
+  // of a tuple.  Any const or non-const ref elements are returned with their
+  // original type.
   template
 inline typename __add_ref<
-  typename tuple_element<__i, tuple<_Elements...> >::type
+  typename tuple_element<__i, tuple<_Elements...>>::type
 >::type
-get(tuple<_Elements...>& __t)
+get(tuple<_Elements...>& __t) noexcept
 { return __get_helper<__i>(__t); }
 

Fix PR 49026 (-mfpmath= attribute bug)

2011-05-17 Thread Joseph S. Myers
PR 49026 identified testsuite regressions when -mfpmath= is set by
target attributes, which for some reason appear on x86_64-darwin but
not x86_64-linux.

This patch fixes one place where I failed to preserve the logic of
this attribute handling, and restores the code generated for the
testcase to the code attached to that PR as being generated before my
previous patch.

Bootstrapped with no regressions on x86_64-unknown-linux-gnu.  Applied
to mainline.

2011-05-17  Joseph Myers  

* config/i386/i386.c (ix86_valid_target_attribute_tree): Use
enum_opts_set when testing if attributes have set -mfpmath=.

Index: gcc/config/i386/i386.c
===
--- gcc/config/i386/i386.c  (revision 173809)
+++ gcc/config/i386/i386.c  (working copy)
@@ -4692,7 +4692,7 @@ ix86_valid_target_attribute_tree (tree a
   || target_flags != def->x_target_flags
   || option_strings[IX86_FUNCTION_SPECIFIC_ARCH]
   || option_strings[IX86_FUNCTION_SPECIFIC_TUNE]
-  || ix86_fpmath != def->x_ix86_fpmath)
+  || enum_opts_set.x_ix86_fpmath)
 {
   /* If we are using the default tune= or arch=, undo the string assigned,
 and use the default.  */

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH]: Restore bootstrap with --enable-build-with-cxx

2011-05-17 Thread Gabriel Dos Reis
On Tue, May 17, 2011 at 2:46 PM, Toon Moene  wrote:
> On 05/17/2011 08:32 PM, Uros Bizjak wrote:
>
>> Tested on x86_64-pc-linux-gnu {, m32} with --enable-build-with-cxx.
>> Committed to mainline SVN as obvious.
>
> Does that mean that I can now remove the --disable-werror from my daily C++
> bootstrap run ?
>
> It's great that some people understand the intricacies of the
> infight^H^H^H^H^H^H differences between the C and C++ type model.
>
> OK: 1/2 :-)

I suspect this infight would vanish if we just switched, as we discussed
in the past.

-- Gaby


Re: [google] Parameterize function overhead estimate for inlining

2011-05-17 Thread Xinliang David Li
You will have a followup patch to override ARM defaults, right? Ok for
google/main.

Thanks,

David

On Tue, May 17, 2011 at 9:29 PM, Mark Heffernan  wrote:
> This tiny change improves the size estimation for inlining and results in an
> average 1% size reduction and a small (maybe 0.25% geomean) performance
> increase on internal benchmarks on x86-64.  I parameterized the value rather
> than changing it directly because previous exploration with x86 and ARM
> arches indicated that it varies significantly with architecture.  Default
> value is tuned for x86-64.
> Bootstrapped and tested on x86-64.  Will explore relevance and effectiveness
> for trunk and SPEC later.
> Ok for google/main?
> Mark
> 2011-05-17  Mark Heffernan  
>
>
>
>
>
>        * ipa-inline.c (estimate_function_body_sizes): Parameterize static
>
>
>        function overhead.
>
>
>        * params.def (PARAM_INLINE_FUNCTION_OVERHEAD_SIZE): New parameter.
> Index: ipa-inline.c
> ===
> --- ipa-inline.c (revision 173845)
> +++ ipa-inline.c (working copy)
> @@ -1979,10 +1979,11 @@ estimate_function_body_sizes (struct cgr
>    gcov_type time = 0;
>    gcov_type time_inlining_benefit = 0;
>    /* Estimate static overhead for function prologue/epilogue and alignment.
> */
> -  int size = 2;
> +  int size = PARAM_VALUE (PARAM_INLINE_FUNCTION_OVERHEAD_SIZE);
>    /* Benefits are scaled by probability of elimination that is in range
>       <0,2>.  */
> -  int size_inlining_benefit = 2 * 2;
> +  int size_inlining_benefit =
> +    PARAM_VALUE (PARAM_INLINE_FUNCTION_OVERHEAD_SIZE) * 2;
>    basic_block bb;
>    gimple_stmt_iterator bsi;
>    struct function *my_function = DECL_STRUCT_FUNCTION (node->decl);
> Index: params.def
> ===
> --- params.def (revision 173845)
> +++ params.def (working copy)
> @@ -110,6 +110,11 @@ DEFPARAM (PARAM_MIN_INLINE_RECURSIVE_PRO
>    "Inline recursively only when the probability of call being executed
> exceeds the parameter",
>    10, 0, 0)
>
> +DEFPARAM (PARAM_INLINE_FUNCTION_OVERHEAD_SIZE,
> +  "inline-function-overhead-size",
> +  "Size estimate of function overhead (prologue and epilogue) for inlining
> purposes",
> +  7, 0, 0)
> +
>  /* Limit of iterations of early inliner.  This basically bounds number of
>     nested indirect calls early inliner can resolve.  Deeper chains are
> still
>     handled by late inlining.  */
>


[google] Increase inlining limits with FDO/LIPO

2011-05-17 Thread Mark Heffernan
This small patch greatly expands the function size limits for inlining
with FDO/LIPO.  With profile information, the inliner is much more
selective and precise and so the limits can be increased with less
worry that functions and total code size will blow up.  This speeds up
x86-64 internal benchmarks by about geomean 1.5% to 3% with LIPO
(depending on microarch), and 1% to 1.5% with FDO.  Size increase is
negligible (0.1% mean).

Bootstrapped and regression tested on x86-64.

Trunk testing to follow.

Ok for google/main?

Mark


2011-05-17  Mark Heffernan  

   * opts.c (finish_options): Increase inlining limits with profile
   generate and use.

Index: opts.c
===
--- opts.c  (revision 173666)
+++ opts.c  (working copy)
@@ -828,6 +828,22 @@ finish_options (struct gcc_options *opts
  opts->x_flag_split_stack = 0;
}
 }
+
+  if (opts->x_flag_profile_use
+  || opts->x_profile_arc_flag
+  || opts->x_flag_profile_values)
+{
+  /* With accurate profile information, inlining is much more
+selective and makes better decisions, so increase the
+inlining function size limits.  Changes must be added to both
+the generate and use builds to avoid profile mismatches.  */
+  maybe_set_param_value
+   (PARAM_MAX_INLINE_INSNS_SINGLE, 1000,
+opts->x_param_values, opts_set->x_param_values);
+  maybe_set_param_value
+   (PARAM_MAX_INLINE_INSNS_AUTO, 1000,
+opts->x_param_values, opts_set->x_param_values);
+}
 }


[patch] [1/2] Support reduction in loop SLP

2011-05-17 Thread Ira Rosen
Hi,

This is the first part of reduction support in loop-aware SLP. The
purpose of the patch is to handle "unrolled" reductions such as:

#a1 = phi 
...
a2 = a1 + x
...
a3 = a2 + y
...
a5 = a4 + z

Such a sequence of statements is gathered into a reduction chain and
serves as the root of an SLP instance (similar to a group of strided
stores in the existing loop SLP implementation).

The patch also fixes PR tree-optimization/41881.

Since reduction chains use the same data structure as strided data
accesses, this part of the patch renames these data structures,
removing data-ref and interleaving references.

Bootstrapped and tested on powerpc64-suse-linux.
I am going to apply it later today.

Ira


ChangeLog:

* tree-vect-loop-manip.c (vect_create_cond_for_alias_checks): Use new
names for group elements access.
* tree-vectorizer.h (struct _stmt_vec_info): Use interleaving info for
reduction chains as well.  Remove data reference and interleaving
related words from the field names.
* tree-vect-loop.c (vect_transform_loop): Use new names for group
elements access.
* tree-vect-data-refs.c (vect_get_place_in_interleaving_chain,
vect_insert_into_interleaving_chain, vect_update_interleaving_chain,
vect_update_interleaving_chain, vect_same_range_drs,
vect_analyze_data_ref_dependence, vect_update_misalignment_for_peel,
vect_verify_datarefs_alignment, vector_alignment_reachable_p,
vect_peeling_hash_get_lowest_cost, vect_enhance_data_refs_alignment,
vect_analyze_group_access, vect_analyze_data_ref_access,
vect_create_data_ref_ptr, vect_transform_strided_load,
vect_record_strided_load_vectors): Likewise.
* tree-vect-stmts.c (vect_model_simple_cost, vect_model_store_cost,
vect_model_load_cost, vectorizable_store, vectorizable_load,
vect_remove_stores, new_stmt_vec_info): Likewise.
* tree-vect-slp.c (vect_build_slp_tree,
vect_supported_slp_permutation_p, vect_analyze_slp_instance): Likewise.
Index: tree-vect-loop-manip.c
===
--- tree-vect-loop-manip.c  (revision 173814)
+++ tree-vect-loop-manip.c  (working copy)
@@ -2437,7 +2437,7 @@ vect_create_cond_for_alias_checks (loop_vec_info l
 
   dr_a = DDR_A (ddr);
   stmt_a = DR_STMT (DDR_A (ddr));
-  dr_group_first_a = DR_GROUP_FIRST_DR (vinfo_for_stmt (stmt_a));
+  dr_group_first_a = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_a));
   if (dr_group_first_a)
 {
  stmt_a = dr_group_first_a;
@@ -2446,7 +2446,7 @@ vect_create_cond_for_alias_checks (loop_vec_info l
 
   dr_b = DDR_B (ddr);
   stmt_b = DR_STMT (DDR_B (ddr));
-  dr_group_first_b = DR_GROUP_FIRST_DR (vinfo_for_stmt (stmt_b));
+  dr_group_first_b = GROUP_FIRST_ELEMENT (vinfo_for_stmt (stmt_b));
   if (dr_group_first_b)
 {
  stmt_b = dr_group_first_b;
Index: tree-vectorizer.h
===
--- tree-vectorizer.h   (revision 173814)
+++ tree-vectorizer.h   (working copy)
@@ -468,15 +473,15 @@ typedef struct _stmt_vec_info {
   /*  Whether the stmt is SLPed, loop-based vectorized, or both.  */
   enum slp_vect_type slp_type;
 
-  /* Interleaving info.  */
-  /* First data-ref in the interleaving group.  */
-  gimple first_dr;
-  /* Pointer to the next data-ref in the group.  */
-  gimple next_dr;
-  /* In case that two or more stmts share data-ref, this is the pointer to the
- previously detected stmt with the same dr.  */
+  /* Interleaving and reduction chains info.  */
+  /* First element in the group.  */
+  gimple first_element;
+  /* Pointer to the next element in the group.  */
+  gimple next_element;
+  /* For data-refs, in case that two or more stmts share data-ref, this is the
+ pointer to the previously detected stmt with the same dr.  */
   gimple same_dr_stmt;
-  /* The size of the interleaving group.  */
+  /* The size of the group.  */
   unsigned int size;
   /* For stores, number of stores from this group seen. We vectorize the last
  one.  */
@@ -527,22 +532,22 @@ typedef struct _stmt_vec_info {
 #define STMT_VINFO_RELATED_STMT(S) (S)->related_stmt
 #define STMT_VINFO_SAME_ALIGN_REFS(S)  (S)->same_align_refs
 #define STMT_VINFO_DEF_TYPE(S) (S)->def_type
-#define STMT_VINFO_DR_GROUP_FIRST_DR(S)(S)->first_dr
-#define STMT_VINFO_DR_GROUP_NEXT_DR(S) (S)->next_dr
-#define STMT_VINFO_DR_GROUP_SIZE(S)(S)->size
-#define STMT_VINFO_DR_GROUP_STORE_COUNT(S) (S)->store_count
-#define STMT_VINFO_DR_GROUP_GAP(S) (S)->gap
-#define STMT_VINFO_DR_GROUP_SAME_DR_STMT(S)(S)->same_dr_stmt
-#define STMT_VINFO_DR_GROUP_READ_WRITE_DEPENDENCE(S)  (S)->read_write_dep
-#define STMT_VINFO_STRIDED_ACCESS(S)  ((S)->first_dr != NULL)
+#define STMT_VINFO_GROUP_FIRST_ELEMENT(S)  (S)->first_element
+#define STMT_VINFO

[patch] [2/2] Support reduction in loop SLP

2011-05-17 Thread Ira Rosen
This part adds the actual code for reduction support.

Bootstrapped and tested on powerpc64-suse-linux.
I am planning to apply it later today.

Ira

ChangeLog:

PR tree-optimization/41881
* tree-vectorizer.h (struct _loop_vec_info): Add new field
reduction_chains along with a macro for its access.
* tree-vect-loop.c (new_loop_vec_info): Initialize reduction chains.
(destroy_loop_vec_info): Free reduction chains.
(vect_analyze_loop_2): Return false if vect_analyze_slp() returns false.
(vect_is_slp_reduction): New function.
(vect_is_simple_reduction_1): Call vect_is_slp_reduction.
(vect_create_epilog_for_reduction): Support SLP reduction chains.
* tree-vect-slp.c (vect_get_and_check_slp_defs): Allow different
definition types for reduction chains.
(vect_supported_load_permutation_p): Don't allow permutations for
reduction chains.
(vect_analyze_slp_instance): Support reduction chains.
(vect_analyze_slp): Try to build SLP instance from reduction chains.
(vect_get_constant_vectors): Handle reduction chains.
(vect_schedule_slp_instance): Mark the first statement of the
reduction chain as reduction.

testsuite/ChangeLog:

PR tree-optimization/41881
* gcc.dg/vect/O3-pr41881.c: New test.
* gcc.dg/vect/O3-slp-reduc-10.c: New test.
Index: testsuite/gcc.dg/vect/O3-slp-reduc-10.c
===
--- testsuite/gcc.dg/vect/O3-slp-reduc-10.c (revision 0)
+++ testsuite/gcc.dg/vect/O3-slp-reduc-10.c (revision 0)
@@ -0,0 +1,43 @@
+/* { dg-require-effective-target vect_int } */
+
+#include 
+#include "tree-vect.h"
+
+#define N 128
+#define TYPE int
+#define RESULT 755918
+
+__attribute__ ((noinline)) TYPE fun2 (TYPE *x, TYPE *y, unsigned int n)
+{
+  int i, j;
+  TYPE dot = 14;
+
+  for (i = 0; i < n / 2; i++)
+for (j = 0; j < 2; j++)
+  dot += *(x++) * *(y++);
+
+  return dot;
+}
+
+int main (void)
+{
+  TYPE a[N], b[N], dot;
+  int i;
+
+  check_vect ();
+
+  for (i = 0; i < N; i++)
+{
+  a[i] = i;
+  b[i] = i+8;
+}
+
+  dot = fun2 (a, b, N);
+  if (dot != RESULT)
+abort();
+
+  return 0;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { target { vect_int_mult && {! vect_no_align } } } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
Index: testsuite/gcc.dg/vect/O3-pr41881.c
===
--- testsuite/gcc.dg/vect/O3-pr41881.c  (revision 0)
+++ testsuite/gcc.dg/vect/O3-pr41881.c  (revision 0)
@@ -0,0 +1,30 @@
+/* { dg-do compile } */
+
+#define TYPE int
+
+TYPE fun1(TYPE *x, TYPE *y, unsigned int n)
+{
+  int i, j;
+  TYPE dot = 0;
+
+  for (i = 0; i < n; i++)
+dot += *(x++) * *(y++);
+
+  return dot;
+}
+
+TYPE fun2(TYPE *x, TYPE *y, unsigned int n)
+{
+  int i, j;
+  TYPE dot = 0;
+
+  for (i = 0; i < n / 8; i++)
+for (j = 0; j < 8; j++)
+  dot += *(x++) * *(y++);
+
+  return dot;
+}
+
+/* { dg-final { scan-tree-dump-times "vectorized 1 loops" 2 "vect" { target { vect_int_mult && {! vect_no_align } } } } } */
+/* { dg-final { cleanup-tree-dump "vect" } } */
+
Index: tree-vectorizer.h
===
--- tree-vectorizer.h   (revision 173814)
+++ tree-vectorizer.h   (working copy)
@@ -248,6 +248,10 @@ typedef struct _loop_vec_info {
   /* Reduction cycles detected in the loop. Used in loop-aware SLP.  */
   VEC (gimple, heap) *reductions;
 
+  /* All reduction chains in the loop, represented by the first
+ stmt in the chain.  */
+  VEC (gimple, heap) *reduction_chains;
+
   /* Hash table used to choose the best peeling option.  */
   htab_t peeling_htab;
 
@@ -277,6 +281,7 @@ typedef struct _loop_vec_info {
 #define LOOP_VINFO_SLP_INSTANCES(L)(L)->slp_instances
 #define LOOP_VINFO_SLP_UNROLLING_FACTOR(L) (L)->slp_unrolling_factor
 #define LOOP_VINFO_REDUCTIONS(L)   (L)->reductions
+#define LOOP_VINFO_REDUCTION_CHAINS(L) (L)->reduction_chains
 #define LOOP_VINFO_PEELING_HTAB(L) (L)->peeling_htab
 
 #define LOOP_REQUIRES_VERSIONING_FOR_ALIGNMENT(L) \
Index: tree-vect-loop.c
===
--- tree-vect-loop.c(revision 173814)
+++ tree-vect-loop.c(working copy)
@@ -757,6 +757,7 @@ new_loop_vec_info (struct loop *loop)
PARAM_VALUE (PARAM_VECT_MAX_VERSION_FOR_ALIAS_CHECKS));
   LOOP_VINFO_STRIDED_STORES (res) = VEC_alloc (gimple, heap, 10);
   LOOP_VINFO_REDUCTIONS (res) = VEC_alloc (gimple, heap, 10);
+  LOOP_VINFO_REDUCTION_CHAINS (res) = VEC_alloc (gimple, heap, 10);
   LOOP_VINFO_SLP_INSTANCES (res) = VEC_alloc (slp_instance, heap, 10);
   LOOP_VINFO_SLP_UNROLLING_FACTOR (res) = 1;
   LOOP_VINFO_PEELING_HTAB (res) = NULL;
@@ -852,6 +853,7 @@ destroy_loop_vec_info (loop_vec_info loop_vinfo, b
   VEC_free (slp_instance, heap, 

[PATCH] Fix up execute_update_addresses_taken for debug stmts (PR tree-optimization/49000)

2011-05-17 Thread Jakub Jelinek
Hi!

When an addressable var is optimized into a non-addressable one, we didn't
clean up MEM_REFs containing an ADDR_EXPR of such vars in debug stmts.  These
were later folded into the var itself and caused SSA verification errors.
Fixed by trying to rewrite them and, if that fails, resetting the debug stmt.

Bootstrapped/regtested on x86_64-linux and i686-linux, no change in cc1plus
.debug_info/.debug_loc, implicitptr.c testcase still works too.
Ok for trunk/4.6?

2011-05-18  Jakub Jelinek  

PR tree-optimization/49000
* tree-ssa.c (execute_update_addresses_taken): Call
maybe_rewrite_mem_ref_base on debug stmt value.  If it couldn't
be rewritten and decl has been marked for renaming, reset
the debug stmt.

* gcc.dg/pr49000.c: New test.

--- gcc/tree-ssa.c.jj   2011-05-11 19:39:04.0 +0200
+++ gcc/tree-ssa.c  2011-05-17 18:20:10.0 +0200
@@ -2230,6 +2230,17 @@ execute_update_addresses_taken (void)
  }
  }
 
+   else if (gimple_debug_bind_p (stmt)
+&& gimple_debug_bind_has_value_p (stmt))
+ {
+   tree *valuep = gimple_debug_bind_get_value_ptr (stmt);
+   tree decl;
+   maybe_rewrite_mem_ref_base (valuep);
+   decl = non_rewritable_mem_ref_base (*valuep);
+   if (decl && symbol_marked_for_renaming (decl))
+ gimple_debug_bind_reset_value (stmt);
+ }
+
if (gimple_references_memory_p (stmt)
|| is_gimple_debug (stmt))
  update_stmt (stmt);
--- gcc/testsuite/gcc.dg/pr49000.c.jj   2011-05-17 18:30:10.0 +0200
+++ gcc/testsuite/gcc.dg/pr49000.c  2011-05-17 18:23:16.0 +0200
@@ -0,0 +1,29 @@
+/* PR tree-optimization/49000 */
+/* { dg-do compile } */
+/* { dg-options "-O2 -g" } */
+
+static
+foo (int x, int y)
+{
+  return x * y;
+}
+
+static int
+bar (int *z)
+{
+  return *z;
+}
+
+void
+baz (void)
+{
+  int a = 42;
+  int *b = &a;
+  foo (bar (&a), 3);
+}
+
+void
+test (void)
+{
+  baz ();
+}

Jakub


[PATCH] Small typed DWARF improvement

2011-05-17 Thread Jakub Jelinek
Hi!

This patch optimizes away unneeded DW_OP_GNU_convert ops.  mem_loc_descriptor
attempts to keep the operands signed when it returns; if the next op
needs them unsigned again with the same size, there may be useless
converts.  The patch won't change a DW_OP_GNU_convert to an integral type
from a non-integral one (so that, say, a float to {un,}signed conversion
is done with the right sign); for other converts it will, where possible,
change the preceding typed op's base type if the size is the same and
both the typed op and the following DW_OP_GNU_convert are integral or
have the same encoding.
An example testcase that this improves:
/* { dg-do run } */
/* { dg-options "-g" } */

volatile int vv;

__attribute__((noclone, noinline)) void
foo (double d)
{
  unsigned long f = ((unsigned long) d) / 33UL;
  vv++; /* { dg-final { gdb-test 10 "f" "7" } } */
}

int
main ()
{
  foo (231.0);
  return 0;
}
where previously we emitted
DW_OP_GNU_regval_type  DW_OP_GNU_convert 
DW_OP_GNU_convert  DW_OP_GNU_convert  DW_OP_const1u <33>
DW_OP_GNU_convert  DW_OP_div DW_OP_GNU_convert 
while with this patch DW_OP_GNU_convert  DW_OP_GNU_convert 
can go away.

Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?

2011-05-17  Jakub Jelinek  

* dwarf2out.c (resolve_addr_in_expr): Optimize away redundant
DW_OP_GNU_convert ops.

--- gcc/dwarf2out.c.jj  2011-05-17 13:35:26.0 +0200
+++ gcc/dwarf2out.c 2011-05-17 14:41:21.0 +0200
@@ -24092,23 +24092,84 @@ resolve_one_addr (rtx *addr, void *data 
 static bool
 resolve_addr_in_expr (dw_loc_descr_ref loc)
 {
+  dw_loc_descr_ref keep = NULL;
   for (; loc; loc = loc->dw_loc_next)
-    if (((loc->dw_loc_opc == DW_OP_addr || loc->dtprel)
-         && resolve_one_addr (&loc->dw_loc_oprnd1.v.val_addr, NULL))
-        || (loc->dw_loc_opc == DW_OP_implicit_value
-            && loc->dw_loc_oprnd2.val_class == dw_val_class_addr
-            && resolve_one_addr (&loc->dw_loc_oprnd2.v.val_addr, NULL)))
-      return false;
-    else if (loc->dw_loc_opc == DW_OP_GNU_implicit_pointer
-             && loc->dw_loc_oprnd1.val_class == dw_val_class_decl_ref)
+    switch (loc->dw_loc_opc)
       {
-        dw_die_ref ref
-          = lookup_decl_die (loc->dw_loc_oprnd1.v.val_decl_ref);
-        if (ref == NULL)
+      case DW_OP_addr:
+        if (resolve_one_addr (&loc->dw_loc_oprnd1.v.val_addr, NULL))
           return false;
-        loc->dw_loc_oprnd1.val_class = dw_val_class_die_ref;
-        loc->dw_loc_oprnd1.v.val_die_ref.die = ref;
-        loc->dw_loc_oprnd1.v.val_die_ref.external = 0;
+        break;
+      case DW_OP_const4u:
+      case DW_OP_const8u:
+        if (loc->dtprel
+            && resolve_one_addr (&loc->dw_loc_oprnd1.v.val_addr, NULL))
+          return false;
+        break;
+      case DW_OP_implicit_value:
+        if (loc->dw_loc_oprnd2.val_class == dw_val_class_addr
+            && resolve_one_addr (&loc->dw_loc_oprnd2.v.val_addr, NULL))
+          return false;
+        break;
+      case DW_OP_GNU_implicit_pointer:
+        if (loc->dw_loc_oprnd1.val_class == dw_val_class_decl_ref)
+          {
+            dw_die_ref ref
+              = lookup_decl_die (loc->dw_loc_oprnd1.v.val_decl_ref);
+            if (ref == NULL)
+              return false;
+            loc->dw_loc_oprnd1.val_class = dw_val_class_die_ref;
+            loc->dw_loc_oprnd1.v.val_die_ref.die = ref;
+            loc->dw_loc_oprnd1.v.val_die_ref.external = 0;
+          }
+        break;
+      case DW_OP_GNU_const_type:
+      case DW_OP_GNU_regval_type:
+      case DW_OP_GNU_deref_type:
+      case DW_OP_GNU_convert:
+      case DW_OP_GNU_reinterpret:
+        while (loc->dw_loc_next
+               && loc->dw_loc_next->dw_loc_opc == DW_OP_GNU_convert)
+          {
+            dw_die_ref base1, base2;
+            unsigned enc1, enc2, size1, size2;
+            if (loc->dw_loc_opc == DW_OP_GNU_regval_type
+                || loc->dw_loc_opc == DW_OP_GNU_deref_type)
+              base1 = loc->dw_loc_oprnd2.v.val_die_ref.die;
+            else
+              base1 = loc->dw_loc_oprnd1.v.val_die_ref.die;
+            base2 = loc->dw_loc_next->dw_loc_oprnd1.v.val_die_ref.die;
+            gcc_assert (base1->die_tag == DW_TAG_base_type
+                        && base2->die_tag == DW_TAG_base_type);
+            enc1 = get_AT_unsigned (base1, DW_AT_encoding);
+            enc2 = get_AT_unsigned (base2, DW_AT_encoding);
+            size1 = get_AT_unsigned (base1, DW_AT_byte_size);
+            size2 = get_AT_unsigned (base2, DW_AT_byte_size);
+            if (size1 == size2
+                && (((enc1 == DW_ATE_unsigned || enc1 == DW_ATE_signed)
+                     && (enc2 == DW_ATE_unsigned || enc2 == DW_ATE_signed)
+                     && loc != keep)
+                    || enc1 == enc2))
+              {
+                /* Optimize away next DW_OP_GNU_convert after
+                   adjusting LOC's base type die reference.  */
+                if (loc->dw_loc_opc == DW_OP_GNU_regval_type
+                    || loc->dw_loc_opc == DW_O

Re: [google] Increase inlining limits with FDO/LIPO

2011-05-17 Thread Xinliang David Li
To make consistent inline decisions between profile-gen and
profile-use, probably better to check these two:

flag_profile_arcs and flag_branch_probabilities.  -fprofile-use
enables profile-arcs, and value profiling is enabled only when
edge/branch profiling is enabled (so no need to be checked).

David


On Tue, May 17, 2011 at 10:50 PM, Mark Heffernan  wrote:
> This small patch greatly expands the function size limits for inlining with
> FDO/LIPO.  With profile information, the inliner is much more selective and
> precise and so the limits can be increased with less worry that functions
> and total code size will blow up.  This speeds up x86-64 internal benchmarks
> by about geomean 1.5% to 3% with LIPO (depending on microarch), and 1% to
> 1.5% with FDO.  Size increase is negligible (0.1% mean).
> Bootstrapped and regression tested on x86-64.
> Trunk testing to follow.
> Ok for google/main?
> Mark
>
> 2011-05-17  Mark Heffernan  
>        * opts.c (finish_options): Increase inlining limits with profile
>        generate and use.
>
> Index: opts.c
> ===================================================================
> --- opts.c (revision 173666)
> +++ opts.c (working copy)
> @@ -828,6 +828,22 @@ finish_options (struct gcc_options *opts
>    opts->x_flag_split_stack = 0;
>   }
>      }
> +
> +  if (opts->x_flag_profile_use
> +      || opts->x_profile_arc_flag
> +      || opts->x_flag_profile_values)
> +    {
> +      /* With accurate profile information, inlining is much more
> +         selective and makes better decisions, so increase the
> +         inlining function size limits.  Changes must be added to both
> +         the generate and use builds to avoid profile mismatches.  */
> +      maybe_set_param_value
> +        (PARAM_MAX_INLINE_INSNS_SINGLE, 1000,
> +         opts->x_param_values, opts_set->x_param_values);
> +      maybe_set_param_value
> +        (PARAM_MAX_INLINE_INSNS_AUTO, 1000,
> +         opts->x_param_values, opts_set->x_param_values);
> +    }
>  }
>


Re: [patch gimplifier]: Make sure TRUTH_NOT_EXPR has boolean_type_node type and argument

2011-05-17 Thread Kai Tietz
2011/5/16 Richard Guenther :
> On Mon, May 16, 2011 at 3:45 PM, Michael Matz  wrote:
>> Hi,
>>
>> On Mon, 16 May 2011, Richard Guenther wrote:
>>
>>> > I think conversion _to_ BOOLEAN_TYPE shouldn't be useless, on the
>>> > grounds that it requires booleanization (at least conceptually), i.e.
>>> > conversion to a set of two values (no matter the precision or size)
>>> > based on the outcome of comparing the RHS value with
>>> > false_pre_image(TREE_TYPE(RHS)).
>>> >
>>> > Conversion _from_ BOOLEAN_TYPE can be regarded as useless, as the
>>> > conversions from false or true into false_pre_image or true_pre_image
>>> > always is simply an embedding of 0 or 1/-1 (depending on target type
>>> > signedness).  And if the BOOLEAN_TYPE and the LHS have same signedness
>>> > the bit representation of boolean_true_type is (or should be) the same
>>> > as the one converted to LHS (namely either 1 or -1).
>>>
>>> Sure, that would probably be enough to prevent non-BOOLEAN_TYPEs be used
>>> where BOOLEAN_TYPE nodes were used before.  It still will cause an
>>> artificial conversion from a single-bit bitfield read to a bool.
>>
>> Not if you're special casing single-bit conversions (on the grounds that a
>> booleanization from two-valued set to a different two-valued set of
>> the same signedness will not actually require a comparison).  I think it's
>> better to be very precise in our base predicates than to add various hacks
>> over the place to care for imprecision.
>
> Or require a 1-bit integral type for TRUTH_* operands only (which ensures
> the two-valueness which is what we really want).  That can be done
> by either fixing the frontends to make boolean_type_node have 1-bit
> precision or to build a middle-end private type with that constraints
> (though that's the more difficult route as we still do not have a strong
> FE - middle-end hand-off point, and it certainly is not the gimplifier).
>
> Long term all the global trees should be FE private and the middle-end
> should have its own set.
>
> Richard.
>
>>
>> Ciao,
>> Michael.
>

Hello,

The initial idea was to check, for logical operations, that the
conversion to boolean_type_node is useless.  This assumption was flawed
by the fact that boolean_type_node gets redefined in free_lang_decl to a
1-bit precision type of BOOL_TYPE_SIZE, if the FE's boolean_type_node is
incompatible with it.  Through this back door the FE's boolean_type_node
becomes incompatible in the tree-cfg checks.
So for all languages except Ada, logical types have precision one.  Only
for Ada, which requires a different kind of boolean_type_node, do we
need to inspect the inner type for being a boolean.  As Fortran also has
integer-typed boolean-compatible types, we can't simply check for
BOOLEAN_TYPE here and need to check the precision first.

ChangeLog

2011-05-18  Kai Tietz

PR middle-end/48989
* tree-cfg.c (verify_gimple_assign_binary): Check lhs type
for being compatible to boolean for logical operations.
(verify_gimple_assign_unary): Likewise.
(compatible_boolean_type_p): New helper.

Bootstrapped on x86_64-pc-linux-gnu and regression-tested for Ada
and Fortran.

Ok for apply?

Regards,
Kai
Index: gcc/gcc/tree-cfg.c
===
--- gcc.orig/gcc/tree-cfg.c 2011-05-16 14:26:12.369031500 +0200
+++ gcc/gcc/tree-cfg.c  2011-05-18 08:20:34.935819100 +0200
@@ -3220,6 +3220,31 @@ verify_gimple_comparison (tree type, tre
   return false;
 }
 
+/* Checks TYPE for being compatible to boolean. Returns
+   FALSE, if type is not compatible, otherwise TRUE.
+
+   A type is compatible if
+   a) TYPE_PRECISION is one.
+   b) The type - or the inner type - is of kind BOOLEAN_TYPE.  */
+
+static bool
+compatible_boolean_type_p (tree type)
+{
+  if (!type)
+    return false;
+  if (TYPE_PRECISION (type) == 1)
+    return true;
+
+  /* We try to look here into inner type, as ADA uses
+     boolean_type_node with type precision != 1.  */
+  while (TREE_TYPE (type)
+         && (TREE_CODE (type) == INTEGER_TYPE
+             || TREE_CODE (type) == REAL_TYPE))
+    type = TREE_TYPE (type);
+
+  return TYPE_PRECISION (type) == 1 || TREE_CODE (type) == BOOLEAN_TYPE;
+}
+
 /* Verify a gimple assignment statement STMT with an unary rhs.
Returns true if anything is wrong.  */
 
@@ -3350,15 +3375,16 @@ verify_gimple_assign_unary (gimple stmt)
       return false;
 
     case TRUTH_NOT_EXPR:
-      if (!useless_type_conversion_p (boolean_type_node,  rhs1_type))
+
+      if (!useless_type_conversion_p (lhs_type,  rhs1_type)
+          || !compatible_boolean_type_p (lhs_type))
         {
-          error ("invalid types in truth not");
-          debug_generic_expr (lhs_type);
-          debug_generic_expr (rhs1_type);
-          return true;
+          error ("invalid types in truth not");
+          debug_generic_expr (lhs_type);
+          debug_generic_expr (rhs1_type);
+          return true;
         }
       break;
-