Re: [PATCH 2/3] canonicalize_loop_ivs should not generate unsigned types.

2011-07-25 Thread Richard Guenther
On Sun, 24 Jul 2011, Sebastian Pop wrote:

> On Sun, Jul 24, 2011 at 05:59, Richard Guenther
>  wrote:
> > For two IVs with the same precision, one signed and one unsigned you choose
> > signedness of the canonical IV based on the random order of PHIs - that
> > doesn't look correct.
> >
> > I think what you should do here is use an unsigned type if any of the IVs
> > with the current max precision is unsigned.
> 
> Attached is the updated patch.  Bootstrapped and tested on amd64-linux.
> Ok for trunk?

   for (psi = gsi_start_phis (loop->header);
        !gsi_end_p (psi); gsi_next (&psi))
@@ -1207,11 +1209,25 @@ canonicalize_loop_ivs (struct loop *loop, tree *nit, bool bump_in_latch)
       gimple phi = gsi_stmt (psi);
       tree res = PHI_RESULT (phi);
 
-      if (is_gimple_reg (res) && TYPE_PRECISION (TREE_TYPE (res)) > precision)
-        precision = TYPE_PRECISION (TREE_TYPE (res));
+      if (!is_gimple_reg (res))
+        continue;
+
+      type = TREE_TYPE (res);
+      if (TYPE_PRECISION (type) > precision)
+        {
+          precision = TYPE_PRECISION (type);
+          unsigned_p = TYPE_UNSIGNED (type);
+          continue;
+        }
+
+      if (TYPE_PRECISION (type) == precision
+          && TYPE_UNSIGNED (type))
+        unsigned_p = true;
     }

I think we also need to care for non-integral PHIs, where TYPE_PRECISION
and TYPE_UNSIGNED are not applicable (it seems the original code is also
buggy here?).  So, something like

      type = TREE_TYPE (res);
      if (!is_gimple_reg (res)
          || !INTEGRAL_TYPE_P (type)
          || TYPE_PRECISION (type) < precision)
        continue;

      precision = TYPE_PRECISION (type);
      unsigned_p |= TYPE_UNSIGNED (type);
    }

would be ok.
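To make the selection rule above easier to check, here is a small standalone C model of it. The struct and function names are hypothetical stand-ins for the tree predicates (is_gimple_reg, INTEGRAL_TYPE_P, TYPE_PRECISION, TYPE_UNSIGNED); it is a sketch of the suggested logic, not the code in canonicalize_loop_ivs.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the type of a loop-header PHI result.  */
struct phi_type
{
  bool is_reg;        /* plays the role of is_gimple_reg (res) */
  bool is_integral;   /* plays the role of INTEGRAL_TYPE_P (type) */
  unsigned precision; /* plays the role of TYPE_PRECISION (type) */
  bool is_unsigned;   /* plays the role of TYPE_UNSIGNED (type) */
};

/* Skip non-register and non-integral PHIs, track the maximal precision,
   and OR in the unsignedness of every PHI whose precision is at least
   the maximum seen so far.  */
static void
choose_canonical_iv_type (const struct phi_type *phis, size_t n,
                          unsigned *precision, bool *unsigned_p)
{
  *precision = 0;
  *unsigned_p = false;
  for (size_t i = 0; i < n; i++)
    {
      const struct phi_type *t = &phis[i];
      if (!t->is_reg || !t->is_integral || t->precision < *precision)
        continue;
      *precision = t->precision;
      *unsigned_p |= t->is_unsigned;
    }
}
```

Because unsignedness is OR-ed in instead of being taken from whichever PHI happens to come first, two IVs of the same maximal precision give the same answer in either PHI order, which is the ordering concern raised earlier in the thread.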

Ok with that change.

Thanks,
Richard.


> 
> Thanks,
> Sebastian
> 

-- 
Richard Guenther 
Novell / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendörffer

Re: [Patch] [C++0x] Support decltype-specifier as start of nested-name-specifier. (Bug 6709)

2011-07-25 Thread Adam Butcher
On Fri, July 22, 2011 10:23 pm, Jason Merrill wrote:
> On 07/21/2011 11:48 AM, Adam Butcher wrote:
>> No worries.  I'm guilty of not checking mails for months due to other
>> commitments so didn't see either of your responses (or the committed
>> fix) on this decltype stuff until now.
>
> I was beginning to think you were dead or something, glad to see that's
> not the case.  :)
>
:)

>>> Do you have tests for your decltype patches?
>>>
>> I assume you no longer need these as you've made your own on 176513
>> right?
>
> Yes, though they aren't terribly exhaustive if you have ideas about
> other variations we ought to test...
>
Just glanced over the changeset and looks like the tests are pretty much
equivalent to my own -- just spelt differently.  Just building 4.7 head
now.

> Incidentally, the GNU copyright clerk told me that your copyright
> assignment isn't complete yet.
>
That's odd.  I thought that was sorted ages ago around the time of the
old lambda branch fixes (pre 4.5) and the polymorphic lambda stuff.  I
definitely sent the paperwork off.  Who do I need to contact to sort
that out?  I do intend to recreate the polymorphic lambda stuff against
4.7 at some point when I get some breathing time.  The original patch no
longer fits and had some problems.




[Melt] Fix foreach_edge_bb_precs

2011-07-25 Thread Romain Geissler
Hello,

This iterator won't work because of a small typo (the previous-edge
field is preDs, not preCs).
To avoid future errors, I applied the global /precs/preds/ change (and
thus the iterator is renamed).

Romain Geissler


Changelog


foreach_edge_bb_preds.diff


Re: [patch] Fix inlining glitch

2011-07-25 Thread Richard Guenther
On Sun, Jul 24, 2011 at 7:12 PM, Eric Botcazou  wrote:
> Hi,
>
> we sometimes get messages like this in Ada:
>
> prime-mc2-other.adb: In function 'PRIME.MC2.OTHER.DO_SOMETHING':
> prime-mc2.adb:2:4: warning: inlining failed in call
> to 'PRIME.MC2.GET_INPUT_VALUE.PART': non-call exception handling mismatch
> [-Winline]
> prime-mc2-other.adb:3:4: warning: called from here [-Winline]
>
> Since this is for a pure Ada program, it's unexpected.  This stems from
> virtual cloning: cgraph_create_virtual_clone creates the virtual clone and
> does:
>
>  DECL_STRUCT_FUNCTION (new_decl) = NULL;
>
> so the can_throw_non_call_exceptions flag isn't preserved and
> can_inline_edge_p is fooled into thinking that it cannot inline.
>
> It's probably better not to fiddle with virtual cloning so the attached patch
> teaches can_inline_edge_p to look into DECL_STRUCT_FUNCTION of the original
> nodes if it is dealing with virtual clones.
>
> Tested on i586-suse-linux, OK for the mainline?

Doesn't cgraph_function_or_thunk_node already deal with this?  Honza?

Richard.

>
> 2011-07-24  Eric Botcazou  
>
>        * ipa-inline.c (can_inline_edge_p): Look into DECL_STRUCT_FUNCTION of
>        original nodes if we are dealing with virtual clones.
>
>
> --
> Eric Botcazou
>
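One way to picture the approach Eric describes: when a node is a virtual clone without its own function data, consult the node it was cloned from. The sketch below is a toy model only; fn_data, toy_node, clone_of and the helper names are hypothetical and are not the cgraph API, and missing data is simply read as "cannot throw".

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-function data; a virtual clone has none of its own,
   mirroring DECL_STRUCT_FUNCTION being cleared for the clone.  */
struct fn_data
{
  bool can_throw_non_call_exceptions;
};

struct toy_node
{
  struct fn_data *fn;              /* NULL for virtual clones */
  const struct toy_node *clone_of; /* node this was cloned from, or NULL */
};

/* Walk back through clone_of until a node with function data is found.  */
static const struct fn_data *
node_fn_data (const struct toy_node *node)
{
  while (node && !node->fn)
    node = node->clone_of;
  return node ? node->fn : NULL;
}

/* Don't inline if the callee can throw non-call exceptions but the
   caller cannot.  A missing flag is read as false.  */
static bool
nce_flags_allow_inline (const struct toy_node *caller,
                        const struct toy_node *callee)
{
  const struct fn_data *er = node_fn_data (caller);
  const struct fn_data *ee = node_fn_data (callee);
  bool caller_nce = er && er->can_throw_non_call_exceptions;
  bool callee_nce = ee && ee->can_throw_non_call_exceptions;
  return !(callee_nce && !caller_nce);
}
```

Without the clone_of walk, a clone of a caller compiled with non-call exceptions would look like a caller that cannot throw them, which is the spurious "mismatch" reported above.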


Re: [PATCH] Fix PR 49671 volatile goes missing after inlining

2011-07-25 Thread Richard Guenther
On Mon, Jul 25, 2011 at 1:34 AM, Andrew Pinski  wrote:
> Hi,
>  There are two issues.  First, the inliner does not copy the volatile
> flag when creating a new tree in one case.  Second, IPA-SRA does not
> check whether we are dereferencing a pointer variable via a volatile
> type.
>
> OK?  Bootstrapped and tested on x86_64-linux-gnu with no regressions.

Ok.

Can you add a testcase?

Thanks,
Richard.

> Thanks,
> Andrew Pinski
>
> ChangeLog:
> * tree-inline.c (remap_gimple_op_r): Copy TREE_THIS_VOLATILE and
> TREE_THIS_NOTRAP into the innermost MEM_REF.
> Always copy TREE_THIS_VOLATILE.
> * tree-sra.c (ptr_parm_has_direct_uses): Check that the lhs, rhs and
> arguments are not volatile references.
>


Re: [patch] Fix PR tree-optimization/49771

2011-07-25 Thread Ulrich Weigand
Richard Guenther wrote:
> On Sun, Jul 24, 2011 at 2:02 PM, Ira Rosen  wrote:
> > On 21 July 2011 15:19, Ira Rosen  wrote:
> >> I reproduced the failure. It occurs without Richard's
> >> (http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01022.html) and this
> >> patches too. Obviously the vectorized loop is executed, but at the
> >> moment I don't understand why. I'll have a better look on Sunday.
> >
> > Actually it doesn't choose the vectorized code. But the scalar version
> > gets optimized in a harmful way for SPU, AFAIU.
> > Here is the scalar loop after vrp2
> >
> > :
> >  # ivtmp.42_50 = PHI 
> >  D.4593_42 = (void *) ivtmp.53_32;
> >  D.4520_33 = MEM[base: D.4593_42, offset: 0B];
> >  D.4521_34 = D.4520_33 + 1;
> >  MEM[symbol: a, index: ivtmp.42_50, offset: 0B] = D.4521_34;
> >  ivtmp.42_45 = ivtmp.42_50 + 4;
> >  if (ivtmp.42_45 != 16)
> >goto ;
> >  else
> >goto ;
> >
> > and the load is changed by dom2 to:
> >
> > :
> >  ...
> >  D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B];
> >   ...
> >
> > where vector(4) int * vect_pa.9;
> >
> > And the scalar loop has no rotate for that load:
> 
> Hum.  This smells like we are hiding sth from the tree optimizers?

Well, the back-end assumes a pointer to vector type is always
naturally aligned, and therefore the data it points to can be
accessed via a simple load, with no extra rotate needed.

It seems what happened here is that somehow, a pointer to int
gets replaced by a pointer to vector, even though their alignment
properties are different.

This vector pointer must originate somehow in the vectorizer,
however, since the original C source does not contain any
vector types at all ...
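The missing rotate can be illustrated with a toy model. It assumes, as on the SPU, that every load fetches an aligned 16-byte quadword and that a 4-byte scalar must then be rotated left by (address & 15) bytes into bytes 0..3, the preferred slot. The function name is hypothetical and this is not the SPU backend's code; it only shows what goes wrong when the rotate is dropped for a pointer wrongly assumed to be 16-byte aligned.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Toy scalar 4-byte load at byte OFFSET into MEM.  The quadword that
   contains OFFSET is fetched, then rotated so the addressed word lands
   in bytes 0..3.  With ASSUME_ALIGNED the rotate is skipped.  */
static uint32_t
toy_scalar_load (const uint8_t *mem, uintptr_t offset, bool assume_aligned)
{
  uint8_t quad[16];
  uintptr_t base = offset & ~(uintptr_t) 15;   /* aligned quadword fetch */
  unsigned shift = assume_aligned ? 0 : (unsigned) (offset & 15);
  memcpy (quad, mem + base, sizeof quad);

  /* Rotate left by SHIFT bytes and keep the first four bytes.  */
  uint8_t slot[4];
  for (unsigned i = 0; i < 4; i++)
    slot[i] = quad[(i + shift) & 15];

  uint32_t word;
  memcpy (&word, slot, sizeof word);
  return word;
}
```

For an int at offset 4, skipping the rotate returns the word at offset 0 instead, which matches the symptom of the scalar loop reading through a pointer treated as vector-aligned.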

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com


Re: vect-70.c fails on spu-elf

2011-07-25 Thread Ulrich Weigand
Ira Rosen wrote:
> "Ulrich Weigand"  wrote on 22/07/2011 05:05:57 PM:
> > Any suggestions how to fix this?  Maybe decrease N again and instead
> > prevent unrolling via command line switch?
> 
> There is no flag for this unrolling, but we can run the test with -O1
> instead of -O2 (and with N=12) by renaming vect-70.c to O1-vect-70.c (see
> the attached patch).
> 
> > Maybe just decrease
> > *some* dimensions of the tmp1 array?
> 
> This can help too, if it is small enough:

Either of the two patches you suggest fixes the problem for me on spu-elf.

Thanks,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com


Re: ARM: Clear icache when creating a closure

2011-07-25 Thread Andrew Haley
On 21/07/11 16:33, Joseph S. Myers wrote:
> On Tue, 12 Jul 2011, Andrew Haley wrote:
> 
 *(unsigned int*) &__tramp[0] = 0xe92d000f; /* stmfd sp!, {r0-r3} */ \
 *(unsigned int*) &__tramp[4] = 0xe59f; /* ldr r0, [pc] */   \
 *(unsigned int*) &__tramp[8] = 0xe59ff000; /* ldr pc, [pc] */   \
> 
>>> Your patch looks sane, but I'll observe here that the poking of
>>> instruction values is wrong on cores that run in BE-8 mode (where
>>> instructions are always little-endian).
>>
>> Oh dear.  How would one test for BE-8 mode on a Linux system?
> 
> My suggestion would be putting the instruction sequence in a .s file, 
> rather than hardcoding the instruction encodings here, and writing the 
> code to read from the sequence as assembled by the assembler.  That way it 
> will have the appropriate mapping symbols to mark it as ARM-mode code and 
> the linker will deal with adjusting endianness, so you don't need to test 
> for BE-8 at all.

OK, I'll have a look at doing that.

Andrew.
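To see why the hardcoded encodings break on BE-8, the sketch below writes an instruction word byte by byte in little-endian order, independent of the host's data endianness, whereas a plain `*(unsigned int *)` store uses data byte order. It assumes a little-endian or BE-8 target: on legacy BE-32 big-endian cores instructions are big-endian, so this hypothetical helper would be wrong there. The assembler-based approach suggested above sidesteps that distinction entirely.

```c
#include <stdint.h>

/* Store a 32-bit ARM instruction word as little-endian bytes.  On BE-8
   cores data is big-endian but instructions are little-endian, so the
   byte order must not depend on how the host stores an unsigned int.  */
static void
store_insn_le (unsigned char *dst, uint32_t insn)
{
  dst[0] = (unsigned char) (insn & 0xff);
  dst[1] = (unsigned char) ((insn >> 8) & 0xff);
  dst[2] = (unsigned char) ((insn >> 16) & 0xff);
  dst[3] = (unsigned char) ((insn >> 24) & 0xff);
}
```

For example, `store_insn_le (tramp + 8, 0xe59ff000)` emits the bytes 00 f0 9f e5 on any host; the instruction cache still has to be cleared afterwards, as the patch in this thread does.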


[Patch,AVR]: PR49687 (better widening 32-bit mul)

2011-07-25 Thread Georg-Johann Lay
This is the second part of the improved widening multiply for AVR,
namely widening to 32 bits when MUL instructions are available.

This is a bit more complicated than the 16-bit case because the
multiplications are emitted as implicit libgcc calls and involve
hard registers.  Thus, all splits and expansions have to be done
before register allocation.

The patch includes widening multiply from QI to SI, too.
If a QI is involved, the extension is done in two steps:
an explicit, inlined QI->HI extension and an implicit HI->SI
extension in the library routine.

__mulsi3 is rewritten; it now runs a bit slower and needs a
bit more code because __umulhisi3 and __muluhisi3 are factored out
to facilitate code reuse.  In particular, multiplications
with a small constant (i.e. 17-bit signed, -65536...65535) perform
better and will reuse these functions.
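The reuse described above rests on simple arithmetic identities. The following C model (not the libgcc assembly; all function names are hypothetical) builds a 32 x 32 -> 32 multiply and an unsigned-by-signed 16 x 16 -> 32 multiply on top of one unsigned 16 x 16 -> 32 widening multiply, the kind of helper an unsigned HI x HI -> SI routine provides.

```c
#include <stdint.h>

/* Unsigned 16 x 16 -> 32 widening multiply.  */
static uint32_t
umul_16x16_32 (uint16_t a, uint16_t b)
{
  return (uint32_t) a * b;
}

/* Unsigned x signed 16 x 16 -> 32.  Reading a negative B as unsigned
   adds 2^16 to it, so subtract A << 16 to compensate.  The final
   conversion relies on the usual two's-complement behavior.  */
static int32_t
usmul_16x16_32 (uint16_t a, int16_t b)
{
  uint32_t p = umul_16x16_32 (a, (uint16_t) b);
  if (b < 0)
    p -= (uint32_t) a << 16;
  return (int32_t) p;
}

/* 32 x 32 -> 32 (low half).  With a = ah*2^16 + al and b = bh*2^16 + bl,
   a*b mod 2^32 = al*bl + ((ah*bl + al*bh) << 16), so only the low 16 bits
   of the cross terms matter.  */
static uint32_t
mul_32x32_32 (uint32_t a, uint32_t b)
{
  uint16_t al = (uint16_t) a, ah = (uint16_t) (a >> 16);
  uint16_t bl = (uint16_t) b, bh = (uint16_t) (b >> 16);
  uint32_t cross = (uint32_t) ah * bl + (uint32_t) al * bh;
  return umul_16x16_32 (al, bl) + (cross << 16);
}
```

The trade-off in the patch is visible here: the 32-bit product costs more than a dedicated routine, but the 16-bit widening helper can be shared by every variant.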

Eric, can you review the assembler routines and say whether such reuse is OK,
or whether you'd prefer a speed-optimized version of __mulsi3 like the one in
the current libgcc?

The new multiplication routines aim at a minimal register usage footprint:
No registers need to be clobbered except R26/R27 for __mulhi3.

The patch passes without regressions, of course.

Moreover, I ran individual tests of the routines against the old
implementation before integrating them into libgcc to run regression tests.

Ok to install?

Johann


PR target/49687
* config/avr/t-avr (LIB1ASMFUNCS): Remove _xmulhisi3_exit.
Add _muluhisi3, _mulshisi3, _usmulhisi3.
* config/avr/libgcc.S (__mulsi3): Rewrite.
(__mulhisi3): Rewrite.
(__umulhisi3): Rewrite.
(__usmulhisi3): New.
(__muluhisi3): New.
(__mulshisi3): New.
(__mulohisi3): New.
(__mulqi3, __mulqihi3, __umulqihi3, __mulhi3): Use DEFUN/ENDF to
declare.
* config/avr/predicates.md (pseudo_register_operand): Rewrite.
(pseudo_register_or_const_int_operand): New.
(combine_pseudo_register_operand): New.
(u16_operand): New.
(s16_operand): New.
(o16_operand): New.
* config/avr/avr.c (avr_rtx_costs): Handle costs for mult:SI.
* config/avr/avr.md (QIHI, QIHI2): New mode iterators.
(any_extend, any_extend2): New code iterators.
(extend_prefix): New code attribute.
(mulsi3): Rewrite. Turn insn into expander.
(mulhisi3): Ditto.
(umulhisi3): Ditto.
(usmulhisi3): New expander.
(*mulsi3): New insn-and-split.
(mulusi3): New insn-and-split.
(mulssi3): New insn-and-split.
(mulohisi3): New insn-and-split.
(*uumulqihisi3, *uumulhiqisi3, *uumulhihisi3, *uumulqiqisi3,
*usmulqihisi3, *usmulhiqisi3, *usmulhihisi3, *usmulqiqisi3,
*sumulqihisi3, *sumulhiqisi3, *sumulhihisi3, *sumulqiqisi3,
*ssmulqihisi3, *ssmulhiqisi3, *ssmulhihisi3, *ssmulqiqisi3): New
insn-and-split.
(*mulsi3_call): Rewrite.
(*mulhisi3_call): Rewrite.
(*umulhisi3_call): Rewrite.
(*usmulhisi3_call): New insn.
(*muluhisi3_call): New insn.
(*mulshisi3_call): New insn.
(*mulohisi3_call): New insn.
(extendqihi2): Use combine_pseudo_register_operand as predicate
for operand 1.
(extendqisi2): Ditto.
(zero_extendqihi2): Ditto.
(zero_extendqisi2): Ditto.
(zero_extendhisi2): Ditto.
(extendhisi2): Ditto. Don't early-clobber operand 0.

Index: config/avr/predicates.md
===
--- config/avr/predicates.md	(revision 176624)
+++ config/avr/predicates.md	(working copy)
@@ -155,10 +155,34 @@ (define_predicate "call_insn_operand"
        (ior (match_test "register_operand (XEXP (op, 0), mode)")
             (match_test "CONSTANT_ADDRESS_P (XEXP (op, 0))"))))
 
+;; For some insns we must ensure that no hard register is inserted
+;; into their operands because the insns are split and the split
+;; involves hard registers.  An example are divmod insn that are
+;; split to insns that represent implicit library calls.
+
 ;; True for register that is pseudo register.
 (define_predicate "pseudo_register_operand"
-  (and (match_code "reg")
-       (match_test "!HARD_REGISTER_P (op)")))
+  (and (match_operand 0 "register_operand")
+       (not (and (match_code "reg")
+                 (match_test "HARD_REGISTER_P (op)")))))
+
+;; True for operand that is pseudo register or CONST_INT.
+(define_predicate "pseudo_register_or_const_int_operand"
+  (ior (match_operand 0 "const_int_operand")
+   (match_operand 0 "pseudo_register_operand")))
+
+;; We keep combiner from inserting hard registers into the input of sign- and
+;; zero-extends.  A hard register in the input operand is not wanted because
+;; 32-bit multiply patterns clobber some hard registers and extends with a
+;; hard register that overlaps these clobbers won't combine to a widening
+;; multiplication.  There is no need for combine to propagat

Re: [patch] Fix PR tree-optimization/49771

2011-07-25 Thread Richard Guenther
On Mon, Jul 25, 2011 at 11:10 AM, Ulrich Weigand  wrote:
> Richard Guenther wrote:
>> On Sun, Jul 24, 2011 at 2:02 PM, Ira Rosen  wrote:
>> > On 21 July 2011 15:19, Ira Rosen  wrote:
>> >> I reproduced the failure. It occurs without Richard's
>> >> (http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01022.html) and this
>> >> patches too. Obviously the vectorized loop is executed, but at the
>> >> moment I don't understand why. I'll have a better look on Sunday.
>> >
>> > Actually it doesn't choose the vectorized code. But the scalar version
>> > gets optimized in a harmful way for SPU, AFAIU.
>> > Here is the scalar loop after vrp2
>> >
>> > :
>> >  # ivtmp.42_50 = PHI 
>> >  D.4593_42 = (void *) ivtmp.53_32;
>> >  D.4520_33 = MEM[base: D.4593_42, offset: 0B];
>> >  D.4521_34 = D.4520_33 + 1;
>> >  MEM[symbol: a, index: ivtmp.42_50, offset: 0B] = D.4521_34;
>> >  ivtmp.42_45 = ivtmp.42_50 + 4;
>> >  if (ivtmp.42_45 != 16)
>> >    goto ;
>> >  else
>> >    goto ;
>> >
>> > and the load is changed by dom2 to:
>> >
>> > :
>> >  ...
>> >  D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B];
>> >   ...
>> >
>> > where vector(4) int * vect_pa.9;
>> >
>> > And the scalar loop has no rotate for that load:
>>
>> Hum.  This smells like we are hiding sth from the tree optimizers?
>
> Well, the back-end assumes a pointer to vector type is always
> naturally aligned, and therefore the data it points to can be
> accessed via a simple load, with no extra rotate needed.

I can't see any use of VECTOR_TYPE in config/spu/, and assuming
anything about alignment just because of the kind of the pointer
is bogus - the scalar code does a scalar read using that pointer.
So the backend had better look at the memory operation, not
at the pointer type.  That said, I can't find any code that looks
suspicious in the spu backend.

> It seems what happened here is that somehow, a pointer to int
> gets replaced by a pointer to vector, even though their alignment
> properties are different.

No, they are not.  They get replaced if they are value-equivalent
in which case they are also alignment-equivalent.  But well, the
dump snippet wasn't complete and I don't feel like building a
SPU cross to verify myself.

> This vector pointer must originate somehow in the vectorizer,
> however, since the original C source does not contain any
> vector types at all ...

That's for sure true, it must be the initial pointer we then increment
in the vectorized loop.

Richard.

> Bye,
> Ulrich
>
> --
>  Dr. Ulrich Weigand
>  GNU Toolchain for Linux on System z and Cell BE
>  ulrich.weig...@de.ibm.com
>


Re: [patch] Fix inlining glitch

2011-07-25 Thread Jan Hubicka
> On Sun, Jul 24, 2011 at 7:12 PM, Eric Botcazou  wrote:
> > Hi,
> >
> > we sometimes get messages like this in Ada:
> >
> > prime-mc2-other.adb: In function 'PRIME.MC2.OTHER.DO_SOMETHING':
> > prime-mc2.adb:2:4: warning: inlining failed in call
> > to 'PRIME.MC2.GET_INPUT_VALUE.PART': non-call exception handling mismatch
> > [-Winline]
> > prime-mc2-other.adb:3:4: warning: called from here [-Winline]
> >
> > Since this is for a pure Ada program, it's unexpected.  This stems from
> > virtual cloning: cgraph_create_virtual_clone creates the virtual clone and
> > does:
> >
> >  DECL_STRUCT_FUNCTION (new_decl) = NULL;
> >
> > so the can_throw_non_call_exceptions flag isn't preserved and
> > can_inline_edge_p is fooled into thinking that it cannot inline.
> >
> > It's probably better not to fiddle with virtual cloning so the attached
> > patch teaches can_inline_edge_p to look into DECL_STRUCT_FUNCTION of the
> > original nodes if it is dealing with virtual clones.
> >
> > Tested on i586-suse-linux, OK for the mainline?
> 
> Doesn't cgraph_function_or_thunk_node already deal with this?  Honza?

No, the problem here is deciding whether we can inline a clone.
We look into DECL_STRUCT_FUNCTION, which we can't.  The real fix is the
one described in the comment:

  /* Don't inline if the callee can throw non-call exceptions but the
     caller cannot.
     FIXME: this is obviously wrong for LTO where STRUCT_FUNCTION is missing.
     Move the flag into cgraph node or mirror it in the inline summary.  */

I plan to look into this before the next release.  I would, for sure, welcome
Eric beating me to it.  If he doesn't have time to do so, I think the patch is
OK as it is, since it improves the situation even though it won't fix the
same problem with WPA.

Honza
> 
> Richard.
> 
> >
> > 2011-07-24  Eric Botcazou  
> >
> >        * ipa-inline.c (can_inline_edge_p): Look into DECL_STRUCT_FUNCTION of
> >        original nodes if we are dealing with virtual clones.
> >
> >
> > --
> > Eric Botcazou
> >


Re: PING: PATCH [4/n]: Prepare x32: Permute the conversion and addition if one operand is a constant

2011-07-25 Thread Paolo Bonzini

On 07/13/2011 07:48 PM, H.J. Lu wrote:

Here is the patch.  OK for trunk?


Again, at the very least you should explain clearly _why_ you need
ignore_address_wrap_around.  You said elsewhere that x32 should first be
clean, then fast.



   if (GET_CODE (x) == SUBREG && SUBREG_PROMOTED_VAR_P (x)
       && GET_MODE_SIZE (GET_MODE (SUBREG_REG (x))) >= GET_MODE_SIZE (mode)
       && SUBREG_PROMOTED_UNSIGNED_P (x) == unsignedp)
+    {
+      if (no_emit)
+        x = rtl_hooks.gen_lowpart_no_emit (mode, x);
+      else
+        x = gen_lowpart (mode, x);
+    }
@@ -773,7 +781,10 @@ convert_modes (enum machine_mode mode, enum machine_mode oldmode, rtx x, int uns
       return gen_int_mode (val, mode);
     }
 
-  return gen_lowpart (mode, x);
+  if (no_emit)
+    return rtl_hooks.gen_lowpart_no_emit (mode, x);
+  else
+    return gen_lowpart (mode, x);
 }


These should be

      rtx tem = rtl_hooks.gen_lowpart_no_emit (mode, x);
      if (tem)
        x = tem;

  rtx tem = rtl_hooks.gen_lowpart_no_emit (mode, x);
  if (tem)
    return tem;

since the "emitting" case can just reuse the code below.  However, see 
the patch I'm sending now.


Paolo


[PATCH] Saner return value for gen_lowpart_no_emit

2011-07-25 Thread Paolo Bonzini
For some reason, when I "invented" gen_lowpart_no_emit I defaulted it
to returning the original value of X.  Since gen_lowpart_no_emit is
mostly used to return simplifications, the correct thing to return when
conversion fails is NULL.  As a follow-up, every use in simplify-rtx.c
could be changed to try other simplifications if gen_lowpart_no_emit
fails; for now, I'm just avoiding a NULL pointer dereference.
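The contract described here (return NULL when no lowpart can be formed without emitting code, and have callers give up in turn) can be modeled in a few lines of C. Everything below is a hypothetical toy: toy_rtx is not GCC's rtx, and only constants are treated as convertible.

```c
#include <stddef.h>
#include <stdint.h>

/* Toy "rtx": a constant of a given width, or an opaque register.  */
struct toy_rtx
{
  int is_const;    /* nonzero for a constant, zero for a register */
  unsigned width;  /* mode width in bits, 1..64 */
  uint64_t value;  /* meaningful only for constants */
};

static uint64_t
toy_mask (unsigned width)
{
  return width >= 64 ? ~(uint64_t) 0 : ((uint64_t) 1 << width) - 1;
}

/* Lowpart that may not emit instructions: on failure return NULL
   rather than handing back X in its original, wider mode.  */
static const struct toy_rtx *
toy_lowpart_no_emit (struct toy_rtx *out, unsigned width,
                     const struct toy_rtx *x)
{
  if (!x->is_const || width > x->width)
    return NULL;
  out->is_const = 1;
  out->width = width;
  out->value = x->value & toy_mask (width);
  return out;
}

/* A caller in the style of the simplify-rtx.c changes: negate the
   lowpart, or report "no simplification" (NULL) if it cannot be formed,
   instead of dereferencing a NULL result.  */
static const struct toy_rtx *
toy_simplify_neg_lowpart (struct toy_rtx *out, unsigned width,
                          const struct toy_rtx *x)
{
  struct toy_rtx tmp;
  const struct toy_rtx *lp = toy_lowpart_no_emit (&tmp, width, x);
  if (!lp)
    return NULL;
  out->is_const = 1;
  out->width = width;
  out->value = (0 - lp->value) & toy_mask (width);
  return out;
}
```

Returning NULL, rather than the unconverted operand, lets each caller decide whether to try another simplification or to give up.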

2011-07-25  Paolo Bonzini  

* rtlhooks.c (gen_lowpart_no_emit_general): Remove.
* rtlhooks-def.h (gen_lowpart_no_emit_general): Remove prototype.
(RTL_HOOKS_GEN_LOWPART_NO_EMIT): Default to gen_lowpart_if_possible.

Index: rtlhooks.c
===
--- rtlhooks.c  (revision 169877)
+++ rtlhooks.c  (working copy)
@@ -80,18 +80,6 @@ gen_lowpart_general (enum machine_mode m
 }
 }
 
-/* Similar to gen_lowpart, but cannot emit any instruction via
-   copy_to_reg or force_reg.  Mainly used in simplify-rtx.c.  */
-rtx
-gen_lowpart_no_emit_general (enum machine_mode mode, rtx x)
-{
-  rtx result = gen_lowpart_if_possible (mode, x);
-  if (result)
-    return result;
-  else
-    return x;
-}
-
 rtx
 reg_num_sign_bit_copies_general (const_rtx x ATTRIBUTE_UNUSED,
 enum machine_mode mode ATTRIBUTE_UNUSED,
Index: rtlhooks-def.h
===
--- rtlhooks-def.h  (revision 169877)
+++ rtlhooks-def.h  (working copy)
@@ -23,7 +23,7 @@ along with GCC; see the file COPYING3.  
 #include "rtl.h"
 
 #define RTL_HOOKS_GEN_LOWPART gen_lowpart_general
-#define RTL_HOOKS_GEN_LOWPART_NO_EMIT gen_lowpart_no_emit_general
+#define RTL_HOOKS_GEN_LOWPART_NO_EMIT gen_lowpart_if_possible
 #define RTL_HOOKS_REG_NONZERO_REG_BITS reg_nonzero_bits_general
 #define RTL_HOOKS_REG_NUM_SIGN_BIT_COPIES reg_num_sign_bit_copies_general
 #define RTL_HOOKS_REG_TRUNCATED_TO_MODE reg_truncated_to_mode_general
@@ -38,7 +38,6 @@ along with GCC; see the file COPYING3.  
 }
 
 extern rtx gen_lowpart_general (enum machine_mode, rtx);
-extern rtx gen_lowpart_no_emit_general (enum machine_mode, rtx);
 extern rtx reg_nonzero_bits_general (const_rtx, enum machine_mode, const_rtx,
 enum machine_mode,
 unsigned HOST_WIDE_INT,
Index: simplify-rtx.c
===
--- simplify-rtx.c  (revision 169877)
+++ simplify-rtx.c  (working copy)
@@ -1039,6 +1039,8 @@ simplify_unary_operation_1 (enum rtx_cod
         {
           rtx inner =
             rtl_hooks.gen_lowpart_no_emit (tmode, XEXP (XEXP (op, 0), 0));
+          if (!inner)
+            return NULL_RTX;
           return simplify_gen_unary (GET_CODE (op) == ASHIFTRT
                                      ? SIGN_EXTEND : ZERO_EXTEND,
                                      mode, inner, tmode);
@@ -1092,6 +1094,8 @@ simplify_unary_operation_1 (enum rtx_cod
         {
           rtx inner =
             rtl_hooks.gen_lowpart_no_emit (tmode, XEXP (XEXP (op, 0), 0));
+          if (!inner)
+            return NULL_RTX;
           return simplify_gen_unary (ZERO_EXTEND, mode, inner, tmode);
         }
     }
@@ -2768,6 +2772,8 @@ simplify_binary_operation_1 (enum rtx_co
       if (trueop1 == constm1_rtx)
         {
           rtx x = rtl_hooks.gen_lowpart_no_emit (mode, op0);
+          if (!x)
+            return NULL_RTX;
           return simplify_gen_unary (NEG, mode, x, mode);
         }
     }
}


Re: [patch] Fix inlining glitch

2011-07-25 Thread Eric Botcazou
> No, the problem here is deciding whether we can inline a clone.
> We look into DECL_STRUCT_FUNCTION, which we can't.  The real fix is the
> one described in the comment:
>
>   /* Don't inline if the callee can throw non-call exceptions but the
>      caller cannot.
>      FIXME: this is obviously wrong for LTO where STRUCT_FUNCTION is missing.
>      Move the flag into cgraph node or mirror it in the inline summary.  */

The irony being that I implemented the flag for the sake of LTO, based on 
suggestions made on this list...  So why is STRUCT_FUNCTION missing now?

> I plan to look into this before the next release.  I would, for sure,
> welcome Eric beating me to it.  If he doesn't have time to do so, I think
> the patch is OK as it is, since it improves the situation even though it
> won't fix the same problem with WPA.

OK, I'll install the patch for now.

-- 
Eric Botcazou


Re: [patch tree-optimization]: Move tree-vrp to use binary instead of truth-expressions

2011-07-25 Thread Kai Tietz
Hello,

this patch removes the TRUTH binary expression cases and adjusts the handling
of bitwise binary expressions in some places.

ChangeLog gcc

2011-07-25  Kai Tietz  

* tree-vrp.c (extract_range_from_binary_expr): Remove
TRUTH-binary cases and add new bitwise cases.
(extract_range_from_assignment): Likewise.
(register_edge_assert_for_1): Likewise.
(register_edge_assert_for): Likewise.
(simplify_truth_ops_using_ranges): Likewise.
(simplify_stmt_using_ranges): Likewise.

Bootstrapped and regression tested for all standard languages
(including Ada and Obj-C++) on host x86_64-pc-linux-gnu.  OK to apply?


Regards,
Kai

Index: gcc-head/gcc/tree-vrp.c
===
--- gcc-head.orig/gcc/tree-vrp.c
+++ gcc-head/gcc/tree-vrp.c
@@ -2171,9 +2171,7 @@ extract_range_from_binary_expr (value_ra
   && code != MIN_EXPR
   && code != MAX_EXPR
   && code != BIT_AND_EXPR
-  && code != BIT_IOR_EXPR
-  && code != TRUTH_AND_EXPR
-  && code != TRUTH_OR_EXPR)
+  && code != BIT_IOR_EXPR)
 {
   /* We can still do constant propagation here.  */
   tree const_op0 = op_with_constant_singleton_value_range (op0);
@@ -2228,8 +2226,7 @@ extract_range_from_binary_expr (value_ra
  divisions.  TODO, we may be able to derive anti-ranges in
  some cases.  */
   if (code != BIT_AND_EXPR
-  && code != TRUTH_AND_EXPR
-  && code != TRUTH_OR_EXPR
+  && code != BIT_IOR_EXPR
   && code != TRUNC_DIV_EXPR
   && code != FLOOR_DIV_EXPR
   && code != CEIL_DIV_EXPR
@@ -2251,7 +2248,12 @@ extract_range_from_binary_expr (value_ra
   || POINTER_TYPE_P (TREE_TYPE (op0))
   || POINTER_TYPE_P (TREE_TYPE (op1)))
 {
-      if (code == MIN_EXPR || code == MAX_EXPR)
+      if (code == BIT_IOR_EXPR)
+        {
+          set_value_range_to_varying (vr);
+          return;
+        }
+      else if (code == MIN_EXPR || code == MAX_EXPR)
         {
  /* For MIN/MAX expressions with pointers, we only care about
 nullness, if both are non null, then the result is nonnull.
@@ -2296,57 +2298,9 @@ extract_range_from_binary_expr (value_ra

   /* For integer ranges, apply the operation to each end of the
  range and see what we end up with.  */
-  if (code == TRUTH_AND_EXPR
-      || code == TRUTH_OR_EXPR)
-    {
-      /* If one of the operands is zero, we know that the whole
-         expression evaluates zero.  */
-      if (code == TRUTH_AND_EXPR
-          && ((vr0.type == VR_RANGE
-               && integer_zerop (vr0.min)
-               && integer_zerop (vr0.max))
-              || (vr1.type == VR_RANGE
-                  && integer_zerop (vr1.min)
-                  && integer_zerop (vr1.max))))
-        {
-          type = VR_RANGE;
-          min = max = build_int_cst (expr_type, 0);
-        }
-      /* If one of the operands is one, we know that the whole
-         expression evaluates one.  */
-      else if (code == TRUTH_OR_EXPR
-               && ((vr0.type == VR_RANGE
-                    && integer_onep (vr0.min)
-                    && integer_onep (vr0.max))
-                   || (vr1.type == VR_RANGE
-                       && integer_onep (vr1.min)
-                       && integer_onep (vr1.max))))
-        {
-          type = VR_RANGE;
-          min = max = build_int_cst (expr_type, 1);
-        }
-      else if (vr0.type != VR_VARYING
-               && vr1.type != VR_VARYING
-               && vr0.type == vr1.type
-               && !symbolic_range_p (&vr0)
-               && !overflow_infinity_range_p (&vr0)
-               && !symbolic_range_p (&vr1)
-               && !overflow_infinity_range_p (&vr1))
-        {
-          /* Boolean expressions cannot be folded with int_const_binop.  */
-          min = fold_binary (code, expr_type, vr0.min, vr1.min);
-          max = fold_binary (code, expr_type, vr0.max, vr1.max);
-        }
-      else
-        {
-          /* The result of a TRUTH_*_EXPR is always true or false.  */
-          set_value_range_to_truthvalue (vr, expr_type);
-          return;
-        }
-    }
-  else if (code == PLUS_EXPR
-           || code == MIN_EXPR
-           || code == MAX_EXPR)
+  if (code == PLUS_EXPR
+      || code == MIN_EXPR
+      || code == MAX_EXPR)
 {
   /* If we have a PLUS_EXPR with two VR_ANTI_RANGEs, drop to
 VR_VARYING.  It would take more effort to compute a precise
@@ -2675,9 +2629,10 @@ extract_range_from_binary_expr (value_ra
   else if (code == BIT_AND_EXPR || code == BIT_IOR_EXPR)
 {
   bool vr0_int_cst_singleton_p, vr1_int_cst_singleton_p;
-  bool int_cst_range0, int_cst_range1;
+  bool int_cst_range0, int_cst_range1, is_var_range;
   double_int may_be_nonzero0, may_be_nonzero1;
   double_int must_be_nonzero0, must_be_nonzero1;
+  value_range_t *cst_vr, *var_vr;

   vr0_int_cst_singleton_p = range_int_cst_singleton_p (&vr0);
   vr1_int_cst_singleton_p = range_int_cst_singleton_p
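The may_be_nonzero/must_be_nonzero variables hint at how ranges for BIT_AND_EXPR and BIT_IOR_EXPR can be derived from bit knowledge. The following is a simplified sketch under stated assumptions (unsigned 32-bit, non-symbolic ranges [lo, hi]; hypothetical names), not the algorithm in tree-vrp.c.

```c
#include <assert.h>
#include <stdint.h>

struct urange
{
  uint32_t lo, hi;
};

/* Bits above the highest bit where LO and HI differ are shared by every
   value in [lo, hi]; bits at or below it may take either value.  */
static void
may_must_bits (struct urange r, uint32_t *may, uint32_t *must)
{
  assert (r.lo <= r.hi);
  uint32_t diff = r.lo ^ r.hi;
  uint32_t mask = 0;
  while (diff)
    {
      mask = (mask << 1) | 1;
      diff >>= 1;
    }
  *may = r.hi | mask;
  *must = r.lo & ~mask;
}

static uint32_t max_u32 (uint32_t a, uint32_t b) { return a > b ? a : b; }
static uint32_t min_u32 (uint32_t a, uint32_t b) { return a < b ? a : b; }

/* a | b is at least max (a, b), has every must-be-set bit of either
   operand, and has no bit outside may0 | may1.  */
static struct urange
range_ior (struct urange a, struct urange b)
{
  uint32_t may0, must0, may1, must1;
  may_must_bits (a, &may0, &must0);
  may_must_bits (b, &may1, &must1);
  struct urange r = { max_u32 (max_u32 (a.lo, b.lo), must0 | must1),
                      may0 | may1 };
  return r;
}

/* a & b is at most min (a, b), has no bit outside may0 & may1, and has
   every bit that must be set in both operands.  */
static struct urange
range_and (struct urange a, struct urange b)
{
  uint32_t may0, must0, may1, must1;
  may_must_bits (a, &may0, &must0);
  may_must_bits (b, &may1, &must1);
  struct urange r = { must0 & must1,
                      min_u32 (min_u32 (a.hi, b.hi), may0 & may1) };
  return r;
}
```

For example, [4, 4] | [1, 2] gives [4, 7] and [8, 15] & [3, 5] gives [0, 5]; both contain every value the operation can actually produce.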

Re: [patch] Fix inlining glitch

2011-07-25 Thread Jan Hubicka
> > No, the problem here is deciding whether we can inline a clone.
> > We look into DECL_STRUCT_FUNCTION, which we can't.  The real fix is the
> > one described in the comment:
> >
> >   /* Don't inline if the callee can throw non-call exceptions but the
> >      caller cannot.
> >      FIXME: this is obviously wrong for LTO where STRUCT_FUNCTION is missing.
> >      Move the flag into cgraph node or mirror it in the inline summary.  */
> 
> The irony being that I implemented the flag for the sake of LTO, based on 
> suggestions made on this list...  So why is STRUCT_FUNCTION missing now?

I noticed that; sorry, it was also my omission.

DECL_STRUCT_FUNCTION and function bodies are not loaded into the WPA stage,
only into ltrans.  This is how WHOPR is designed.

The WPA stage is expected to work across cgraph/varpool and not look into the
bodies/initializers.  So this flag, as well as the optimization settings, needs
to be copied there.

Since codegen queries this particular flag a lot, I would suggest simply
copying it into inline_summary and making inline_analyze_function copy it,
rather than moving it to a place less convenient for codegen.
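The suggestion amounts to caching the flag when the body is analyzed so that later decisions, such as those made at WPA time when bodies are not loaded, never need the body. A toy C model with hypothetical names throughout (not GCC's inline_summary layout):

```c
#include <stdbool.h>
#include <stddef.h>

struct toy_body
{
  bool can_throw_non_call_exceptions;
};

struct toy_summary
{
  bool can_throw_non_call_exceptions;  /* mirrored from the body */
};

struct toy_cnode
{
  const struct toy_body *body;  /* NULL when the body is not loaded */
  struct toy_summary summary;
};

/* Analysis time: the body is available, so mirror the flag.  */
static void
toy_analyze_function (struct toy_cnode *node)
{
  node->summary.can_throw_non_call_exceptions
    = node->body && node->body->can_throw_non_call_exceptions;
}

/* Decision time: only the summaries are consulted, never the body.  */
static bool
toy_can_inline_p (const struct toy_cnode *caller,
                  const struct toy_cnode *callee)
{
  return !(callee->summary.can_throw_non_call_exceptions
           && !caller->summary.can_throw_non_call_exceptions);
}
```

Since the decision reads only the summary, it gives the same answer after the bodies have been dropped, which is the property WPA needs.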

Honza
> 
> > I plan to look into this before the next release.  I would, for sure,
> > welcome Eric beating me to it.  If he doesn't have time to do so, I think
> > the patch is OK as it is, since it improves the situation even though it
> > won't fix the same problem with WPA.
> 
> OK, I'll install the patch for now.
> 
> -- 
> Eric Botcazou


Re: [patch tree-optimization]: Move tree-vrp to use binary instead of truth-expressions

2011-07-25 Thread Richard Guenther
On Mon, Jul 25, 2011 at 12:08 PM, Kai Tietz  wrote:
> Hello,
>
> this patch removes TRUTH-binary expressions and adjusts some places about
> bitwise-binary-expressions.
>
> ChangeLog gcc
>
> 2011-07-25  Kai Tietz  
>
>        * tree-vrp.c (extract_range_from_binary_expr): Remove
>        TRUTH-binary cases and add new bitwise cases.
>        (extract_range_from_assignment): Likewise.
>        (register_edge_assert_for_1): Likewise.
>        (register_edge_assert_for): Likewise.
>        (simplify_truth_ops_using_ranges): Likewise.
>        (simplify_stmt_using_ranges): Likewise.
>
> Bootstrapped and regression tested for all standard languages
> (including Ada and Obj-C++) on
> host x86_64-pc-linux-gnu.  Ok for apply?
>
>
> Regards,
> Kai
>
> Index: gcc-head/gcc/tree-vrp.c
> ===
> --- gcc-head.orig/gcc/tree-vrp.c
> +++ gcc-head/gcc/tree-vrp.c
> @@ -2171,9 +2171,7 @@ extract_range_from_binary_expr (value_ra
>       && code != MIN_EXPR
>       && code != MAX_EXPR
>       && code != BIT_AND_EXPR
> -      && code != BIT_IOR_EXPR
> -      && code != TRUTH_AND_EXPR
> -      && code != TRUTH_OR_EXPR)
> +      && code != BIT_IOR_EXPR)
>     {
>       /* We can still do constant propagation here.  */
>       tree const_op0 = op_with_constant_singleton_value_range (op0);
> @@ -2228,8 +2226,7 @@ extract_range_from_binary_expr (value_ra
>      divisions.  TODO, we may be able to derive anti-ranges in
>      some cases.  */
>   if (code != BIT_AND_EXPR
> -      && code != TRUTH_AND_EXPR
> -      && code != TRUTH_OR_EXPR
> +      && code != BIT_IOR_EXPR
>       && code != TRUNC_DIV_EXPR
>       && code != FLOOR_DIV_EXPR
>       && code != CEIL_DIV_EXPR
> @@ -2251,7 +2248,12 @@ extract_range_from_binary_expr (value_ra
>       || POINTER_TYPE_P (TREE_TYPE (op0))
>       || POINTER_TYPE_P (TREE_TYPE (op1)))
>     {
> -      if (code == MIN_EXPR || code == MAX_EXPR)
> +      if (code == BIT_IOR_EXPR)
> +        {
> +         set_value_range_to_varying (vr);
> +         return;
> +       }
> +      else if (code == MIN_EXPR || code == MAX_EXPR)
>        {
>          /* For MIN/MAX expressions with pointers, we only care about
>             nullness, if both are non null, then the result is nonnull.
> @@ -2296,57 +2298,9 @@ extract_range_from_binary_expr (value_ra
>
>   /* For integer ranges, apply the operation to each end of the
>      range and see what we end up with.  */
> -  if (code == TRUTH_AND_EXPR
> -      || code == TRUTH_OR_EXPR)
> -    {
> -      /* If one of the operands is zero, we know that the whole
> -        expression evaluates zero.  */
> -      if (code == TRUTH_AND_EXPR
> -         && ((vr0.type == VR_RANGE
> -              && integer_zerop (vr0.min)
> -              && integer_zerop (vr0.max))
> -             || (vr1.type == VR_RANGE
> -                 && integer_zerop (vr1.min)
> -                 && integer_zerop (vr1.max
> -       {
> -         type = VR_RANGE;
> -         min = max = build_int_cst (expr_type, 0);
> -       }
> -      /* If one of the operands is one, we know that the whole
> -        expression evaluates one.  */
> -      else if (code == TRUTH_OR_EXPR
> -              && ((vr0.type == VR_RANGE
> -                   && integer_onep (vr0.min)
> -                   && integer_onep (vr0.max))
> -                  || (vr1.type == VR_RANGE
> -                      && integer_onep (vr1.min)
> -                      && integer_onep (vr1.max
> -       {
> -         type = VR_RANGE;
> -         min = max = build_int_cst (expr_type, 1);
> -       }
> -      else if (vr0.type != VR_VARYING
> -              && vr1.type != VR_VARYING
> -              && vr0.type == vr1.type
> -              && !symbolic_range_p (&vr0)
> -              && !overflow_infinity_range_p (&vr0)
> -              && !symbolic_range_p (&vr1)
> -              && !overflow_infinity_range_p (&vr1))
> -       {
> -         /* Boolean expressions cannot be folded with int_const_binop.  */
> -         min = fold_binary (code, expr_type, vr0.min, vr1.min);
> -         max = fold_binary (code, expr_type, vr0.max, vr1.max);
> -       }
> -      else
> -       {
> -         /* The result of a TRUTH_*_EXPR is always true or false.  */
> -         set_value_range_to_truthvalue (vr, expr_type);
> -         return;
> -       }
> -    }
> -  else if (code == PLUS_EXPR
> -          || code == MIN_EXPR
> -          || code == MAX_EXPR)
> +  if (code == PLUS_EXPR
> +      || code == MIN_EXPR
> +      || code == MAX_EXPR)
>     {
>       /* If we have a PLUS_EXPR with two VR_ANTI_RANGEs, drop to
>         VR_VARYING.  It would take more effort to compute a precise
> @@ -2675,9 +2629,10 @@ extract_range_from_binary_expr (value_ra
>   else if (code == BIT_AND_EXPR || code == BIT_IOR_EXPR)
>     {
>       bool vr0_int_cst_singleton_p, vr1_int_cst_singleton_p;
> -      bool int_cst_range0, int_cst_range1;
> +      bool int_cst_range0, int_cst_range1, is_var_range;
>      
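
For reference, the TRUTH_AND_EXPR / TRUTH_OR_EXPR range reasoning that this patch deletes from extract_range_from_binary_expr can be summarized in a standalone sketch. It is simplified and hypothetical: struct truth_range, varying_range, make_range and truth_binop_range are illustrative names, not GCC's value_range API, and the symbolic-range and overflow-infinity checks of the real code are left out.

```c
#include <assert.h>

/* Hypothetical, simplified stand-in for a value range over truth values.  */
struct truth_range
{
  int varying;   /* nonzero: nothing is known about the operand */
  int min, max;  /* otherwise the range [min, max], a subset of [0, 1] */
};

static const struct truth_range varying_range = { 1, 0, 1 };

static struct truth_range
make_range (int min, int max)
{
  struct truth_range r;
  assert (0 <= min && min <= max && max <= 1);
  r.varying = 0;
  r.min = min;
  r.max = max;
  return r;
}

/* True if R is known to be exactly the constant V.  */
static int
is_const (struct truth_range r, int v)
{
  return !r.varying && r.min == v && r.max == v;
}

/* Range of (a && b) if IS_AND, otherwise of (a || b).  */
static struct truth_range
truth_binop_range (int is_and, struct truth_range a, struct truth_range b)
{
  int absorbing = is_and ? 0 : 1;        /* 0 && x == 0, 1 || x == 1 */
  struct truth_range r = { 0, 0, 1 };    /* fallback: "some truth value" */

  if (is_const (a, absorbing) || is_const (b, absorbing))
    r.min = r.max = absorbing;
  else if (!a.varying && !b.varying)
    {
      /* && and || are monotone on {0, 1}, so folding the endpoints
         pairwise bounds the result.  */
      r.min = is_and ? (a.min && b.min) : (a.min || b.min);
      r.max = is_and ? (a.max && b.max) : (a.max || b.max);
    }
  return r;
}
```

The three branches mirror the removed code: a known absorbing constant decides the result, two known ranges are folded end to end, and anything else falls back to the [0, 1] truth-value range.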

Re: tic6x-elf toolchain fails to build in FSF mainline

2011-07-25 Thread Bernd Schmidt
On 07/24/11 10:17, Nick Clifton wrote:
> Hi Bernd,
> 
>   I tried building a tic6x-elf toolchain today from the FSF mainline
>   sources, but "make all-gcc" fails with:
> 
>   make[1]: *** No rule to make target 
> `/work/sources/gcc/current/gcc/common/config/c6x/c6x-common.c', needed by 
> `c6x-common.o'.  Stop.
> 
>   Could you fix this please ?

I guess one missing svn add had to be expected :-(


Bernd

Index: gcc/ChangeLog
===
--- gcc/ChangeLog   (revision 176737)
+++ gcc/ChangeLog   (working copy)
@@ -1,3 +1,7 @@
+2011-07-25  Bernd Schmidt  
+
+   * common/config/c6x/c6x-common.c: New file.
+
 2011-07-25  Roman Zhuykov  
 
* tree-flow.h (tree_ssa_loop_version): Remove unused declaration.
Index: gcc/common/config/c6x/c6x-common.c
===
--- gcc/common/config/c6x/c6x-common.c  (revision 0)
+++ gcc/common/config/c6x/c6x-common.c  (revision 0)
@@ -0,0 +1,63 @@
+/* TI C6X common hooks.
+   Copyright (C) 2011  Free Software Foundation, Inc.
+
+This file is part of GCC.
+
+GCC is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 3, or (at your option)
+any later version.
+
+GCC is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with GCC; see the file COPYING3.  If not see
+.  */
+
+#include "config.h"
+#include "system.h"
+#include "coretypes.h"
+#include "diagnostic-core.h"
+#include "tm.h"
+#include "tm_p.h"
+#include "common/common-target.h"
+#include "common/common-target-def.h"
+#include "opts.h"
+#include "flags.h"
+
+/* Implement overriding of the optimization options.  */
+static const struct default_options c6x_option_optimization_table[] =
+  {
+    { OPT_LEVELS_1_PLUS, OPT_fomit_frame_pointer, NULL, 1 },
+    { OPT_LEVELS_1_PLUS, OPT_frename_registers, NULL, 1 },
+    { OPT_LEVELS_ALL, OPT_freciprocal_math, NULL, 1 },
+    { OPT_LEVELS_NONE, 0, NULL, 0 }
+  };
+
+/* Implement TARGET_EXCEPT_UNWIND_INFO.  */
+
+static enum unwind_info_type
+c6x_except_unwind_info (struct gcc_options *opts ATTRIBUTE_UNUSED)
+{
+  /* Honor the --enable-sjlj-exceptions configure switch.  */
+#ifdef CONFIG_SJLJ_EXCEPTIONS
+  if (CONFIG_SJLJ_EXCEPTIONS)
+    return UI_SJLJ;
+#endif
+
+  return UI_TARGET;
+}
+
+#undef TARGET_DEFAULT_TARGET_FLAGS
+#define TARGET_DEFAULT_TARGET_FLAGS TARGET_DEFAULT
+
+#undef TARGET_OPTION_OPTIMIZATION_TABLE
+#define TARGET_OPTION_OPTIMIZATION_TABLE c6x_option_optimization_table
+
+#undef TARGET_EXCEPT_UNWIND_INFO
+#define TARGET_EXCEPT_UNWIND_INFO  c6x_except_unwind_info
+
+struct gcc_targetm_common targetm_common = TARGETM_COMMON_INITIALIZER;


Re: [patch] Fix PR tree-optimization/49771

2011-07-25 Thread Ira Rosen
On 25 July 2011 12:39, Richard Guenther  wrote:
> On Mon, Jul 25, 2011 at 11:10 AM, Ulrich Weigand  wrote:
>> Richard Guenther wrote:
>>> On Sun, Jul 24, 2011 at 2:02 PM, Ira Rosen  wrote:
>>> > On 21 July 2011 15:19, Ira Rosen  wrote:
>>> >> I reproduced the failure. It occurs without Richard's
>>> >> (http://gcc.gnu.org/ml/gcc-patches/2011-07/msg01022.html) and this
>>> >> patch too. Obviously the vectorized loop is executed, but at the
>>> >> moment I don't understand why. I'll have a better look on Sunday.
>>> >
>>> > Actually it doesn't choose the vectorized code. But the scalar version
>>> > gets optimized in a harmful way for SPU, AFAIU.
>>> > Here is the scalar loop after vrp2
>>> >
>>> > :
>>> >  # ivtmp.42_50 = PHI 
>>> >  D.4593_42 = (void *) ivtmp.53_32;
>>> >  D.4520_33 = MEM[base: D.4593_42, offset: 0B];
>>> >  D.4521_34 = D.4520_33 + 1;
>>> >  MEM[symbol: a, index: ivtmp.42_50, offset: 0B] = D.4521_34;
>>> >  ivtmp.42_45 = ivtmp.42_50 + 4;
>>> >  if (ivtmp.42_45 != 16)
>>> >    goto ;
>>> >  else
>>> >    goto ;
>>> >
>>> > and the load is changed by dom2 to:
>>> >
>>> > :
>>> >  ...
>>> >  D.4520_33 = MEM[base: vect_pa.9_19, offset: 0B];
>>> >   ...
>>> >
>>> > where vector(4) int * vect_pa.9;
>>> >
>>> > And the scalar loop has no rotate for that load:
>>>
>>> Hum.  This smells like we are hiding sth from the tree optimizers?
>>
>> Well, the back-end assumes a pointer to vector type is always
>> naturally aligned, and therefore the data it points to can be
>> accessed via a simple load, with no extra rotate needed.
>
> I can't see any use of VECTOR_TYPE in config/spu/, and assuming
> anything about alignment just because of the kind of the pointer
> is bogus - the scalar code does a scalar read using that pointer.
> So the backend had better look at the memory operation, not
> at the pointer type.  That said, I can't find any code that looks
> suspicious in the spu backend.
>
>> It seems what happened here is that somehow, a pointer to int
>> gets replaced by a pointer to vector, even though their alignment
>> properties are different.
>
> No, they are not.  They get replaced if they are value-equivalent
> in which case they are also alignment-equivalent.  But well, the
> dump snippet wasn't complete and I don't feel like building a
> SPU cross to verify myself.

I am attaching the complete file.

Thanks,
Ira



>
>> This vector pointer must originate somehow in the vectorizer,
>> however, since the original C source does not contain any
>> vector types at all ...
>
> That's for sure true, it must be the initial pointer we then increment
> in the vectorized loop.
>
> Richard.
>
>> Bye,
>> Ulrich
>>
>> --
>>  Dr. Ulrich Weigand
>>  GNU Toolchain for Linux on System z and Cell BE
>>  ulrich.weig...@de.ibm.com
>>
>


my--pr49771.c.124t.dom2
#include 
#include 

#define N 4
static int a[N];

__attribute__ ((noinline)) int
foo (void)
{
  int j;
  int i;
  for (i = 0; i < N; i++)
    for (j = 0; j < N; j++)
      a[j] = a[i] + 1;
  return a[0];
}

int
main (void)
{
  int res, i;

  for (i = 0; i < N; i++)
    a[i] = 0;

  res = foo ();
  if (res != 31)
    printf ("%d\n", res);
  for (i = 0; i < N; i++)
    printf ("%d ", a[i]);
  printf ("\n");

  return 0;
}

/* { dg-final { cleanup-tree-dump "vect" } } */


Re: [patch] Fix PR tree-optimization/49771

2011-07-25 Thread Richard Guenther
On Mon, Jul 25, 2011 at 12:52 PM, Ira Rosen  wrote:
> I am attaching the complete file.

The issue seems to be that the IV in question, vect_pa.9_19, is
defined as

  vect_pa.9_19 = (vector(4) int *) ivtmp.53_32;

but ivtmp.53_32 does not have a definition at all.

Richard.



Re: [patch] Fix PR tree-optimization/49771

2011-07-25 Thread Ira Rosen
On 25 July 2011 13:57, Richard Guenther  wrote:
> The issue seems to be that the IV in question, vect_pa.9_19, is
> defined as
>
>  vect_pa.9_19 = (vector(4) int *) ivtmp.53_32;
>
> but ivtmp.53_32 does not have a definition at all.
>

I am sorry, it's my fault, resending the file.

Sorry,
Ira



my--pr49771.c.124t.dom2


Re: [patch] Fix PR tree-optimization/49771

2011-07-25 Thread Richard Guenther
On Mon, Jul 25, 2011 at 1:09 PM, Ira Rosen  wrote:
> I am sorry, it's my fault, resending the file.

Seems perfectly valid to me.  Or well - I suppose we might run into
the issue that the vectorizer sets alignment data at the wrong spot?
You can check alignment info when dumping with the -alias flag.
Building a spu cross now.

Richard.


[Patch,AVR]: Fix PR39386 (x << x and x >> x)

2011-07-25 Thread Georg-Johann Lay
This is a fix for pathological shifts with a variable shift offset,
of the form x << x and x >> x.

Such shifts need a shift-count register, which might overlap with
the operand being shifted.

unsigned char shift (unsigned int x)
{
    return x << x;
}


Without the patch (note that r24 is part of the shifted operand and is
also used as the loop counter):

shift:
rjmp 2f
1:  lsl r24
rol r25
2:  dec r24
brpl 1b
ret

With the patch, tmp_reg (R0) is used as the counter:

shift:
mov r0,r24
rjmp 2f
1:  lsl r24
rol r25
2:  dec r0
brpl 1b
ret

I consider the patch obvious. The increased instruction length is
already taken into account, because the move R0 = Rx is needed anyway
if Rx is used afterwards.

Ok to install?

Johann


PR target/39386
* config/avr/avr.c (out_shift_with_cnt): Use tmp_reg as
shift counter for x << x and x >> x shifts.

Index: config/avr/avr.c
===
--- config/avr/avr.c	(revision 176624)
+++ config/avr/avr.c	(working copy)
@@ -3147,8 +3147,11 @@ out_shift_with_cnt (const char *templ, r
 }
   else if (register_operand (operands[2], QImode))
 {
-  if (reg_unused_after (insn, operands[2]))
-	op[3] = op[2];
+  if (reg_unused_after (insn, operands[2])
+      && !reg_overlap_mentioned_p (operands[0], operands[2]))
+	{
+	  op[3] = op[2];
+	}
   else
 	{
 	  op[3] = tmp_reg_rtx;


Re: [patch] Fix PR tree-optimization/49771

2011-07-25 Thread Richard Guenther
On Mon, Jul 25, 2011 at 1:15 PM, Richard Guenther
 wrote:
> Seems perfectly valid to me.  Or well - I suppose we might run into
> the issue that the vectorizer sets alignment data at the wrong spot?
> You can check alignment info when dumping with the -alias flag.
> Building a spu cross now.

Nope, all perfectly valid.



[patch, testsuite] Fix gcc.dg/vect/vect-70.c (was Re: vect-70.c fails on spu-elf)

2011-07-25 Thread Ira Rosen


"Ulrich Weigand"  wrote on 25/07/2011 12:19:54 PM:

> Ira Rosen wrote:
> > "Ulrich Weigand"  wrote on 22/07/2011 05:05:57 PM:
> > > Any suggestions how to fix this?  Maybe decrease N again and instead
> > > prevent unrolling via command line switch?
> >
> > There is no flag for this unrolling, but we can run the test with -O1
> > instead of -O2 (and with N=12) by renaming vect-70.c to O1-vect-70.c
> > (see the attached patch).
> >
> > > Maybe just decrease
> > > *some* dimensions of the tmp1 array?
> >
> > This can help too, if it is small enough:
>
> Either of the two patches you suggest fixes the problem for me on spu-elf.

OK, so I am choosing the second patch.
Tested by Ulrich on spu-elf, and on x86_64-suse-linux.

OK for mainline? And 4.6?

Thanks,
Ira

testsuite/ChangeLog:

* gcc.dg/vect/vect-70.c: Reduce the data size to fit
SPU local store.


Index: testsuite/gcc.dg/vect/vect-70.c
===
--- testsuite/gcc.dg/vect/vect-70.c (revision 176495)
+++ testsuite/gcc.dg/vect/vect-70.c (working copy)
@@ -7,7 +7,7 @@

 struct s{
   int m;
-  int n[N][N][N];
+  int n[N/6][N/6][N];
 };

 struct test1{
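
The effect of the size reduction can be checked with a little arithmetic. In the sketch below N is a hypothetical placeholder (the real value is defined in vect-70.c), and struct s_before / struct s_after are illustrative copies of struct s before and after the patch. Whenever N is divisible by 6, the n member shrinks by a factor of 36, which is what lets the data fit into the SPU's 256 KiB local store.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical value for illustration; vect-70.c defines the real N.  */
#define N 12

struct s_before { int m; int n[N][N][N]; };
struct s_after  { int m; int n[N / 6][N / 6][N]; };

/* Number of ints in the n member of each layout (sizeof does not
   evaluate its operand, so the null pointer is never dereferenced).  */
static size_t
n_elems_before (void)
{
  return sizeof (((struct s_before *) 0)->n) / sizeof (int);
}

static size_t
n_elems_after (void)
{
  return sizeof (((struct s_after *) 0)->n) / sizeof (int);
}
```

With N == 12 this gives 12 * 12 * 12 = 1728 ints before and 2 * 2 * 12 = 48 ints after.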




Re: [PATCH, i386]: Rewrite LEA handling (was:Re: PATCH [10/n] X32: Support x32 LEA insns)

2011-07-25 Thread Uros Bizjak
On Mon, Jul 25, 2011 at 3:58 AM, H.J. Lu  wrote:

>> You are not fixing the core of the problem... this is why you need so
>> much hacks and kludges at various places (some w.r.t. -fPIC already
>> existed, see the patch). Above, you correctly identified the problem,
>> so let's avoid gen_lowpart on SImode operands by not calling it
>> anymore.
>>
>> Attached patch effectively rewrites LEA handling. The trick is, that
>> instead of using Pmode operations in addresses, we use either SImode
>> or DImode operations to calculate the address on 64bit targets. Up to
>> now, address calculations strictly used Pmode, so SImode on 32bit
>> targets and DImode on 64bit targets. Recent patches to
>> ix86_decompose_address and ix86_legitimate_address_p relaxed this
>> requirement.
>>
>> Attached patch changes LEA patterns and LEA splitters to accept
>> addresses calculated with either SImode or DImode operations. This
>> means that on x64 targets, we don't use gen_lowpart on SImode
>> operands anymore. Since symbol references on x32 are in SImode, this
>> solves the problem. The patch also avoids generating SImode subregs of
>> DImode addresses and DImode zero_extends of SImode addresses, since
>> LEA insn does this for us automatically.
>>
>> Please also note the change to ix86_print_operand_address. To avoid
>> addr32 prefixes, we can force registers in DImode on 64bit targets
>> without any problems. On x32, we can investigate, if this change
>> avoids unnecessary LEAs (for PR 49781, patched gcc generates 6 vs. 8).
>
> The testcase won't compile since PIC doesn't work:

Well, I did say that -fPIC did not work.

>> Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu
>> {,-m32} with no regressions. H.J., can you please test it on x32?
>
> On x32, it failed:
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49832
>
>> BTW: -fPIC is not yet implemented on trunk and still fails there with
>> an (unrelated) error, I didn't check x32 branch.
>>
>
> This could be:
>
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49833

Attached patch implements -fpic handling for x32. In x32 mode, we now
use x86_64_general_operand and corresponding "e" constraints for adds
in SImode, since it looks like invalid addresses can only be generated
through adds. This avoids a whole bunch of new predicates and
constraints.
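
The point above that a 32-bit LEA makes explicit SImode subregs and DImode zero_extends unnecessary can be illustrated with a small model. This is a hypothetical sketch (lea32 is not a GCC function), assuming x86-64 semantics where the address is computed modulo 2^32 and writing a 32-bit register zero-extends the result into the full 64-bit register.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of a 32-bit LEA on x86-64, e.g.
   "leal disp(%base,%index,scale), %eax": only the low 32 bits of the
   inputs matter, the sum wraps modulo 2^32, and the 32-bit result is
   implicitly zero-extended to 64 bits.  */
static uint64_t
lea32 (uint64_t base, uint64_t index, uint32_t scale, int32_t disp)
{
  uint32_t addr;

  assert (scale == 1 || scale == 2 || scale == 4 || scale == 8);
  addr = (uint32_t) base + (uint32_t) index * scale + (uint32_t) disp;
  return (uint64_t) addr;   /* implicit zero extension */
}
```

For example, lea32 (0xfffffff0, 0, 1, 0x20) wraps around to 0x10, and any garbage in the upper 32 bits of the base register is ignored.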

2011-07-25  Uros Bizjak  

PR target/47381
PR target/49832
PR target/49833
* config/i386/i386.md (add_operand): New mode attribute.
(*movdi_internal_rex64): Remove mode from pic_32bit_operand check.
(*movsi_internal): Ditto.  Use "e" constraint in alternative 2.
(*lea_1): Use SWI48 mode iterator.
(*lea_1_zext): New insn pattern.
(add3): Use  predicate for operand 2.
(*add_1): Use  predicate for operand 2.  Use "le"
constraint for alternative 2.
(addsi_1_zext): Use addsi_operand predicate for operand 2.  Use "le"
constraint for alternative 2.
(add->lea splitter): Check operand modes in insn constraint.  Extend
operands less than SImode wide to SImode.
(add->lea zext splitter): Do not extend operands to DImode.
(*lea_general_1): Handle only QImode and HImode operands.
(*lea_general_2): Ditto.
(*lea_general_3): Ditto.
(*lea_general_1_zext): Remove.
(*lea_general_2_zext): Ditto.
(*lea_general_3_zext): Ditto.
(*lea_general_4): Check operand modes in insn constraint.  Extend
operands less than SImode wide to SImode.
(ashift->lea splitter): Ditto.
* config/i386/i386.c (ix86_print_operand_address): Print address
registers with 'q' modifier on 64bit targets.
* config/i386/predicates.md (pic_32bit_operand): Define as special
predicate.  Reject non-SI and non-DI modes.
(addsi_operand): New predicate.

Uros.
Index: i386.md
===
--- i386.md (revision 176733)
+++ i386.md (working copy)
@@ -901,6 +901,14 @@
 (SI "nonmemory_operand")
 (DI "x86_64_nonmemory_operand")])
 
+;; Operand predicate for adds.
+(define_mode_attr add_operand
+   [(QI "general_operand")
+    (HI "general_operand")
+    (SI "addsi_operand")
+    (DI "x86_64_general_operand")
+    (TI "x86_64_general_operand")])
+
 ;; Operand predicate for shifts.
 (define_mode_attr shift_operand
[(QI "nonimmediate_operand")
@@ -2039,7 +2047,7 @@
  (const_string "ssemov")
(eq_attr "alternative" "16,17")
  (const_string "ssecvt")
-   (match_operand:DI 1 "pic_32bit_operand" "")
+   (match_operand 1 "pic_32bit_operand" "")
  (const_string "lea")
   ]
   (const_string "imov")))
@@ -2184,7 +2192,7 @@
   [(set (match_operand:SI 0 "nonimmediate_operand"
"=r,m ,*y,*y,?rm,?*y,*x,*x,?r ,m ,?*Yi,*x")
(match_operand:SI 1 "general_o

[C++ Patch] PR 49838

2011-07-25 Thread Paolo Carlini

Hi,

I have this patchlet for an ICE after an error on invalid code. Is it OK?

Tested x86_64-linux.

Paolo.


/cp
2011-07-25  Paolo Carlini  

PR c++/49838
* parser.c (cp_parser_perform_range_for_lookup): Early return if
error_operand_p (range).

/testsuite
2011-07-25  Paolo Carlini  

PR c++/49838
* g++.dg/cpp0x/range-for19.C: New.

Index: gcc/testsuite/g++.dg/cpp0x/range-for19.C
===
--- gcc/testsuite/g++.dg/cpp0x/range-for19.C(revision 0)
+++ gcc/testsuite/g++.dg/cpp0x/range-for19.C(revision 0)
@@ -0,0 +1,11 @@
+// PR c++/49838
+
+// { dg-do compile }
+// { dg-options "-std=c++0x" }
+
+int main()
+{
+  auto a;// { dg-error "no initializer" }
+  for(auto i: a) // { dg-error "deduce" }
+    ;
+}
Index: gcc/cp/parser.c
===
--- gcc/cp/parser.c (revision 176718)
+++ gcc/cp/parser.c (working copy)
@@ -8793,6 +8793,9 @@ cp_convert_range_for (tree statement, tree range_d
 static tree
 cp_parser_perform_range_for_lookup (tree range, tree *begin, tree *end)
 {
+  if (error_operand_p (range))
+    return error_mark_node;
+
   if (!COMPLETE_TYPE_P (complete_type (TREE_TYPE (range
 {
   error ("range-based % expression of type %qT "


Re: Fix pass_partition_blocks vs -O0

2011-07-25 Thread Michael Matz
Hi,

On Fri, 22 Jul 2011, Richard Henderson wrote:

> Well, technically it's not "broken" yet.  It will be as soon as it starts
> touching DF data, since this pass runs before pass_df_initialize_no_opt.
> 
> But the only real consumer of BB_PARTITION is pass_reorder_blocks.  And
> that pass is already gated to only run if optimization is enabled.  So
> really there's no point in running this pass without optimization.
> 
> Committed.

Why not simply move pass_df_initialize_no_opt earlier?  Introducing more 
checks on optimize instead of only relying on flag_xxx seems to go in
the wrong direction.


Ciao,
Michael.


Re: [RFC] Replace some bitmaps with HARD_REG_SETs - second version

2011-07-25 Thread Dimitrios Apostolou
Bug found: in df_mark_reg I need to iterate up to regno + n, not n. The
error is in the following hunk:


--- gcc/df-scan.c   2011-02-02 20:08:06 +
+++ gcc/df-scan.c   2011-07-24 17:16:46 +
@@ -3713,35 +3717,40 @@ df_mark_reg (rtx reg, void *vset)
   if (regno < FIRST_PSEUDO_REGISTER)
 {
   int n = hard_regno_nregs[regno][GET_MODE (reg)];
-  bitmap_set_range (set, regno, n);
+  int i;
+  for (i=regno; i

Many thanks to monoid from IRC for spotting it! I'll post an updated patch
soon.
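
The intended loop can be sketched outside GCC. This is a hypothetical illustration: hard_reg_mask is a plain 64-bit mask standing in for a HARD_REG_SET (so it assumes fewer than 64 hard registers), and mark_hard_regs is not a GCC function. It marks the same registers that bitmap_set_range (set, regno, n) did.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a HARD_REG_SET: one bit per hard register.  */
typedef uint64_t hard_reg_mask;

/* Mark the N consecutive hard registers starting at REGNO.  The loop
   bound must be REGNO + N; bounding it by N alone marks the wrong
   registers (none at all when REGNO >= N).  */
static void
mark_hard_regs (hard_reg_mask *set, unsigned regno, unsigned n)
{
  unsigned i;

  assert (regno + n <= 64);
  for (i = regno; i < regno + n; i++)
    *set |= (hard_reg_mask) 1 << i;
}
```

For example, a two-register value starting at hard register 4 sets bits 4 and 5, i.e. the mask 0x30.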


Thanks, 
Dimitris




Re: [patch] Fix PR tree-optimization/49771

2011-07-25 Thread Ulrich Weigand
Richard Guenther wrote:

> >> Well, the back-end assumes a pointer to vector type is always
> >> naturally aligned, and therefore the data it points to can be
> >> accessed via a simple load, with no extra rotate needed.
> >
> > I can't see any use of VECTOR_TYPE in config/spu/, and assuming
> > anything about alignment just because of the kind of the pointer
> > is bogus - the scalar code does a scalar read using that pointer.
> > So the backend better should look at the memory operation, not
> > at the pointer type.  That said, I can't find any code that looks
> > suspicious in the spu backend.
> >
> >> It seems what happened here is that somehow, a pointer to int
> >> gets replaced by a pointer to vector, even though their alignment
> >> properties are different.
> >
> > No, they are not.  They get replaced if they are value-equivalent
> > in which case they are also alignment-equivalent.  But well, the
> > dump snippet wasn't complete and I don't feel like building a
> > SPU cross to verify myself.

> > Seems perfectly valid to me.  Or well - I suppose we might run into
> > the issue that the vectorizer sets alignment data at the wrong spot?
> > You can check alignment info when dumping with the -alias flag.
> > Building a spu cross now.
> 
> Nope, all perfectly valid.

Ah, I guess I see what's happening here.  When the SPU back-end is called
to expand the load, the source operand is passed as:

(mem:SI (reg/f:SI 226 [ vect_pa.9 ])
       [2 MEM[base: vect_pa.9_44, offset: 0B]+0 S4 A32])

Now this does say the MEM is only guaranteed to be aligned to 32 bits.

However, spu_expand_load then goes and looks at the components of the
address in detail, in order to figure out how to best perform the access.
In doing so, it looks at the REGNO_POINTER_ALIGN values of the base
registers involved in the address.

In this case, REGNO_POINTER_ALIGN (226) is set to 128, and therefore
the back-end thinks it can use an aligned access after all.

Now, the reason why REGNO_POINTER_ALIGN (226) is 128 is that the register
is the DECL_RTL for the variable vect_pa.9, and that variable has a
pointer-to-vector type (with target alignment 128).

When expanding that variable, expand_one_register_var does:

  if (POINTER_TYPE_P (type))
    mark_reg_pointer (x, TYPE_ALIGN (TREE_TYPE (type)));

All this is normally completely correct -- a variable of type pointer
to vector type *must* hold only properly aligned values.

I guess the vectorizer deliberately loads a (potentially) unaligned
value into a vector pointer variable.  It then generates a check
whether the value is really aligned, and uses it only if so.

But if that pointer variable "escapes" into the other branch because
DOM thinks it can re-use the value, the REGNO_POINTER_ALIGN value
carried for its DECL_RTL register is now incorrect ...

Maybe the vectorizer ought to declare that variable with a non-default
type alignment setting?  Or else, perform the assignment to the
variable only *inside* the "if" that checks for correct alignment?
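A minimal C sketch of the second suggestion, assuming a hypothetical 16-byte `vec4` type as a stand-in for a 128-bit SPU vector and a made-up function name: the pointer-to-vector variable is only created inside the branch where the runtime check has proved alignment, so no variable of vector-pointer type ever holds a merely 4-byte-aligned value. This is plain C, not the GIMPLE the vectorizer actually generates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical 16-byte vector type standing in for a 128-bit vector.  */
typedef struct
{
  _Alignas (16) int v[4];
} vec4;

_Static_assert (sizeof (vec4) == 4 * sizeof (int),
                "vec4 must hold exactly four ints");

/* Sum N ints at PA.  The vector loop runs only when PA is 16-byte
   aligned; VPA is declared inside that branch, so it can never hold an
   unaligned value.  */
static int
sum_ints (const int *pa, size_t n)
{
  int sum = 0;
  size_t i = 0;

  assert (pa != NULL || n == 0);

  if ((uintptr_t) pa % _Alignof (vec4) == 0)
    {
      const vec4 *vpa = (const vec4 *) pa;
      for (; i + 4 <= n; i += 4, vpa++)
        sum += vpa->v[0] + vpa->v[1] + vpa->v[2] + vpa->v[3];
    }

  /* Scalar loop: the whole unaligned case, plus any leftover elements.  */
  for (; i < n; i++)
    sum += pa[i];

  return sum;
}
```

The point of the structure is that the alignment property of the pointer type is only ever attached to values for which it is actually true.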

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com


Re: [patch] Fix PR tree-optimization/49771

2011-07-25 Thread Richard Guenther
On Mon, Jul 25, 2011 at 3:22 PM, Ulrich Weigand  wrote:
> Richard Guenther wrote:
>
>> >> Well, the back-end assumes a pointer to vector type is always
>> >> naturally aligned, and therefore the data it points to can be
>> >> accessed via a simple load, with no extra rotate needed.
>> >
>> > I can't see any use of VECTOR_TYPE in config/spu/, and assuming
>> > anything about alignment just because of the kind of the pointer
>> > is bogus - the scalar code does a scalar read using that pointer.
>> > So the backend better should look at the memory operation, not
>> > at the pointer type.  That said, I can't find any code that looks
>> > suspicious in the spu backend.
>> >
>> >> It seems what happened here is that somehow, a pointer to int
>> >> gets replaced by a pointer to vector, even though their alignment
>> >> properties are different.
>> >
>> > No, they are not.  They get replaced if they are value-equivalent
>> > in which case they are also alignment-equivalent.  But well, the
>> > dump snippet wasn't complete and I don't feel like building a
>> > SPU cross to verify myself.
>
>> > Seems perfectly valid to me.  Or well - I suppose we might run into
>> > the issue that the vectorizer sets alignment data at the wrong spot?
>> > You can check alignment info when dumping with the -alias flag.
>> > Building a spu cross now.
>>
>> Nope, all perfectly valid.
>
> Ah, I guess I see what's happening here.  When the SPU back-end is called
> to expand the load, the source operand is passed as:
>
> (mem:SI (reg/f:SI 226 [ vect_pa.9 ])
>        [2 MEM[base: vect_pa.9_44, offset: 0B]+0 S4 A32])
>
> Now this does say the MEM is only guaranteed to be aligned to 32 bits.
>
> However, spu_expand_load then goes and looks at the components of the
> address in detail, in order to figure out how to best perform the access.
> In doing so, it looks at the REGNO_POINTER_ALIGN values of the base
> registers involved in the address.
>
> In this case, REGNO_POINTER_ALIGN (226) is set to 128, and therefore
> the back-end thinks it can use an aligned access after all.
>
> Now, the reason why REGNO_POINTER_ALIGN (226) is 128 is that the register
> is the DECL_RTL for the variable vect_pa.9, and that variable has a
> pointer-to-vector type (with target alignment 128).
>
> When expanding that variable, expand_one_register_var does:
>
>  if (POINTER_TYPE_P (type))
>    mark_reg_pointer (x, TYPE_ALIGN (TREE_TYPE (type)));
>
> All this is normally completely correct -- a variable of type pointer
> to vector type *must* hold only properly aligned values.

No, this is indeed completely bogus code ;)  it should instead
use get_pointer_alignment.

Richard.


Re: [PATCH, i386]: Rewrite LEA handling (was:Re: PATCH [10/n] X32: Support x32 LEA insns)

2011-07-25 Thread H.J. Lu
On Mon, Jul 25, 2011 at 5:33 AM, Uros Bizjak  wrote:
> On Mon, Jul 25, 2011 at 3:58 AM, H.J. Lu  wrote:
>
>>> You are not fixing the core of the problem... this is why you need so
>>> much hacks and kludges at various places (some w.r.t. -fPIC already
>>> existed, see the patch). Above, you correctly identified the problem,
>>> so let's avoid gen_lowpart on SImode operands by not calling it
>>> anymore.
>>>
>>> Attached patch effectively rewrites LEA handling. The trick is, that
>>> instead of using Pmode operations in addresses, we use either SImode
>>> or DImode operations to calculate the address on 64bit targets. Up to
>>> now, address calculations strictly used Pmode, so SImode on 32bit
>>> targets and DImode on 64bit targets. Recent patches to
>>> ix86_decompose_address and ix86_legitimate_address_p relaxed this
>>> requirement.
>>>
>>> Attached patch changes LEA patterns and LEA splitters to accept
>>> addresses, calculated with either SImode or DImode operations. This
>>> means, that on x64 targets, we don't use gen_lowpart on SImode
>>> operands anymore. Since symbol references on x32 are in SImode, this
>>> solves the problem. The patch also avoids generating SImode subregs of
>>> DImode addresses and DImode zero_extends of SImode addresses, since
>>> LEA insn does this for us automatically.
>>>
>>> Please also note the change to ix86_print_operand_address. To avoid
>>> addr32 prefixes, we can force registers in DImode on 64bit targets
>>> without any problems. On x32, we can investigate, if this change
>>> avoids unnecessary LEAs (for PR 49781, patched gcc generates 6 vs. 8).
>>
>> The testcase won't compile since PIC doesn't work:
>
> Well, I did say that -fPIC did not work.
>
>>> Patch was bootstrapped and regression tested on x86_64-pc-linux-gnu
>>> {,-m32} with no regressions. H.J., can you please test it on x32?
>>
>> On x32, it failed:
>>
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49832
>>
>>> BTW: -fPIC is not yet implemented on trunk and still fails there with
>>> an (unrelated) error, I didn't check x32 branch.
>>>
>>
>> This could be:
>>
>> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49833
>
> Attached patch implements -fpic handling for x32. In x32 mode, we now
> use x86_64_general_operand and corresponding "e" constraints for adds
> in SImode, since it looks that invalid addresses can only be generated
> through adds. This avoids a whole bunch of new predicates and
> constraints.
>
> 2011-07-25  Uros Bizjak  
>
>        PR target/47381
>        PR target/49832
>        PR target/49833
>        * config/i386/i386.md (add_operand): New mode attribute.
>        (*movdi_internal_rex64): Remove mode from pic_32bit_operand check.
>        (*movsi_internal): Ditto.  Use "e" constraint in alternative 2.
>        (*lea_1): Use SWI48 mode iterator.
>        (*lea_1_zext): New insn pattern.
>        (add3): Use  predicate for operand 2.
>        (*add_1): Use  predicate for operand 2.  Use "le"
>        constraint for alternative 2.
>        (addsi_1_zext): Use addsi_operand predicate for operand 2.  Use "le"
>        constraint for alternative 2.
>        (add->lea splitter): Check operand modes in insn constraint.  Extend
>        operands less than SImode wide to SImode.
>        (add->lea zext splitter): Do not extend operands to DImode.
>        (*lea_general_1): Handle only QImode and HImode operands.
>        (*lea_general_2): Ditto.
>        (*lea_general_3): Ditto.
>        (*lea_general_1_zext): Remove.
>        (*lea_general_2_zext): Ditto.
>        (*lea_general_3_zext): Ditto.
>        (*lea_general_4): Check operand modes in insn constraint.  Extend
>        operands less than SImode wide to SImode.
>        (ashift->lea splitter): Ditto.
>        * config/i386/i386.c (ix86_print_operand_address): Print address
>        registers with 'q' modifier on 64bit targets.
>        * config/i386/predicates.md (pic_32bit_operand): Define as special
>        predicate.  Reject non-SI and non-DI modes.
>        (addsi_operand): New predicate.
>
> Uros.
>

X32 glibc is miscompiled:

CPP='/export/build/gnu/gcc-x32/release/usr/gcc-4.7.0-x32/bin/gcc -mx32
 -E -x c-header'
/export/build/gnu/glibc-x32/build-x86_64-linux/elf/ld-linux-x32.so.2
--library-path 
/export/build/gnu/glibc-x32/build-x86_64-linux:/export/build/gnu/glibc-x32/build-x86_64-linux/math:/export/build/gnu/glibc-x32/build-x86_64-linux/elf:/export/build/gnu/glibc-x32/build-x86_64-linux/dlfcn:/export/build/gnu/glibc-x32/build-x86_64-linux/nss:/export/build/gnu/glibc-x32/build-x86_64-linux/nis:/export/build/gnu/glibc-x32/build-x86_64-linux/rt:/export/build/gnu/glibc-x32/build-x86_64-linux/resolv:/export/build/gnu/glibc-x32/build-x86_64-linux/crypt:/export/build/gnu/glibc-x32/build-x86_64-linux/nptl
/export/build/gnu/glibc-x32/build-x86_64-linux/sunrpc/rpcgen -Y
../scripts -h rpcsvc/yppasswd.x -o
/export/build/gnu/glibc-x32/build-x86_64-linux/sunrpc/rpcsvc/yppasswd.T
make[5]: *** 
[/export/build/gnu/glibc-x32/build-x86_64-linux/sunrp

PR 49809: write data refs can now be calls

2011-07-25 Thread Richard Sandiford
PR 49809 is fallout from my patch to add write data references for the
lhs of calls.  tree-ssa-phiopt.c was still assuming that writes were
always assignments.
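A simplified C model of that assumption, with hypothetical types that are not GCC's gimple structures: one accessor is only valid for assignments, the other accepts every statement kind that can write memory. As far as I know, `gimple_assign_lhs` checks the statement code in checking-enabled builds, while `gimple_get_lhs` also handles calls; the model mirrors that with `assert`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical statement kinds; in this model only assignments and
   calls can carry a write data reference.  */
enum stmt_code { STMT_ASSIGN, STMT_CALL, STMT_COND };

struct stmt
{
  enum stmt_code code;
  const char *lhs;  /* Destination written, or NULL if none.  */
};

/* Analogue of an assignment-only accessor: asserts on anything else.  */
static const char *
model_assign_lhs (const struct stmt *s)
{
  assert (s->code == STMT_ASSIGN);
  return s->lhs;
}

/* Analogue of a kind-agnostic accessor: assignments and calls both have
   a destination; other statements yield NULL.  */
static const char *
model_get_lhs (const struct stmt *s)
{
  switch (s->code)
    {
    case STMT_ASSIGN:
    case STMT_CALL:
      return s->lhs;
    default:
      return NULL;
    }
}
```

Passing a `STMT_CALL` statement to `model_assign_lhs` trips the assertion, which is the analogue of the failure; `model_get_lhs` returns its destination.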

I tried to look for other examples of the same thing, but couldn't
find any.

Tested on x86_64-linux-gnu (all,ada).  OK to install?

Richard


gcc/
        PR tree-optimization/49809
        * tree-ssa-phiopt.c (cond_if_else_store_replacement): Use
        gimple_get_lhs instead of gimple_assign_lhs.

Index: gcc/tree-ssa-phiopt.c
===
--- gcc/tree-ssa-phiopt.c   2011-07-21 11:10:34.0 +0100
+++ gcc/tree-ssa-phiopt.c   2011-07-25 14:32:36.0 +0100
@@ -1454,7 +1454,7 @@ cond_if_else_store_replacement (basic_bl
        continue;

      then_store = DR_STMT (then_dr);
-      then_lhs = gimple_assign_lhs (then_store);
+      then_lhs = gimple_get_lhs (then_store);
      found = false;

      FOR_EACH_VEC_ELT (data_reference_p, else_datarefs, j, else_dr)
@@ -1463,7 +1463,7 @@ cond_if_else_store_replacement (basic_bl
            continue;

          else_store = DR_STMT (else_dr);
-          else_lhs = gimple_assign_lhs (else_store);
+          else_lhs = gimple_get_lhs (else_store);

          if (operand_equal_p (then_lhs, else_lhs, 0))
            {


Re: [PATCH, i386]: Rewrite LEA handling (was:Re: PATCH [10/n] X32: Support x32 LEA insns)

2011-07-25 Thread Uros Bizjak
On Mon, Jul 25, 2011 at 3:30 PM, H.J. Lu  wrote:

>> Attached patch implements -fpic handling for x32. In x32 mode, we now
>> use x86_64_general_operand and corresponding "e" constraints for adds
>> in SImode, since it looks that invalid addresses can only be generated
>> through adds. This avoids a whole bunch of new predicates and
>> constraints.

> X32 glibc is miscompiled:
>
> CPP='/export/build/gnu/gcc-x32/release/usr/gcc-4.7.0-x32/bin/gcc -mx32
>  -E -x c-header'
> /export/build/gnu/glibc-x32/build-x86_64-linux/elf/ld-linux-x32.so.2
> --library-path 
> /export/build/gnu/glibc-x32/build-x86_64-linux:/export/build/gnu/glibc-x32/build-x86_64-linux/math:/export/build/gnu/glibc-x32/build-x86_64-linux/elf:/export/build/gnu/glibc-x32/build-x86_64-linux/dlfcn:/export/build/gnu/glibc-x32/build-x86_64-linux/nss:/export/build/gnu/glibc-x32/build-x86_64-linux/nis:/export/build/gnu/glibc-x32/build-x86_64-linux/rt:/export/build/gnu/glibc-x32/build-x86_64-linux/resolv:/export/build/gnu/glibc-x32/build-x86_64-linux/crypt:/export/build/gnu/glibc-x32/build-x86_64-linux/nptl
> /export/build/gnu/glibc-x32/build-x86_64-linux/sunrpc/rpcgen -Y
> ../scripts -h rpcsvc/yppasswd.x -o
> /export/build/gnu/glibc-x32/build-x86_64-linux/sunrpc/rpcsvc/yppasswd.T
> make[5]: *** 
> [/export/build/gnu/glibc-x32/build-x86_64-linux/sunrpc/rpcsvc/yppasswd.stmp]
> Segmentation fault (core dumped)
>
> Some LEA patterns are wrong for x32.  I will investigate.

What about x32 GCC testsuite?

Uros.
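The rewrite's claim that LEA makes explicit SImode subregs and DImode zero_extends unnecessary rests on two x86-64 properties: a 32-bit operation writes a 32-bit register and clears its upper half, and address arithmetic is modular, so computing in 64 bits and keeping the low 32 bits gives the same result as computing in 32 bits. A small C model of that equivalence (the function names are hypothetical; this says nothing about which patterns i386.md actually emits):

```c
#include <assert.h>
#include <stdint.h>

/* Model of a 32-bit LEA on x86-64: base + index*scale + disp computed
   modulo 2^32, then written to a 32-bit register, which zero-extends
   into the full 64-bit register.  */
static uint64_t
lea32 (uint32_t base, uint32_t index, uint32_t scale, int32_t disp)
{
  uint32_t addr;

  assert (scale == 1 || scale == 2 || scale == 4 || scale == 8);
  addr = base + index * scale + (uint32_t) disp;
  return (uint64_t) addr;
}

/* The alternative: compute in 64 bits, take the low 32 bits (a lowpart
   "subreg") and zero-extend them again.  */
static uint64_t
lea64_lowpart_zext (uint32_t base, uint32_t index, uint32_t scale,
                    int32_t disp)
{
  uint64_t addr = (uint64_t) base + (uint64_t) index * scale
                  + (uint64_t) (int64_t) disp;
  return addr & UINT64_C (0xffffffff);
}
```

Both functions agree for every input, including when the 32-bit sum wraps around.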


Re: PR 49809: write data refs can now be calls

2011-07-25 Thread Richard Guenther
On Mon, Jul 25, 2011 at 3:38 PM, Richard Sandiford
 wrote:
> PR 49809 is fallout from my patch to add write data references for the
> lhs of calls.  tree-ssa-phiopt.c was still assuming that writes were
> always assignments.
>
> I tried to look for other examples of the same thing, but couldn't
> find any.
>
> Tested on x86_64-linux-gnu (all,ada).  OK to install?

Ok.

Thanks,
Richard.

> Richard
>
>
> gcc/
>        PR tree-optimization/49809
>        * tree-ssa-phiopt.c (cond_if_else_store_replacement): Use
>        gimple_get_lhs instead of gimple_assign_lhs.
>
> Index: gcc/tree-ssa-phiopt.c
> ===
> --- gcc/tree-ssa-phiopt.c       2011-07-21 11:10:34.0 +0100
> +++ gcc/tree-ssa-phiopt.c       2011-07-25 14:32:36.0 +0100
> @@ -1454,7 +1454,7 @@ cond_if_else_store_replacement (basic_bl
>         continue;
>
>       then_store = DR_STMT (then_dr);
> -      then_lhs = gimple_assign_lhs (then_store);
> +      then_lhs = gimple_get_lhs (then_store);
>       found = false;
>
>       FOR_EACH_VEC_ELT (data_reference_p, else_datarefs, j, else_dr)
> @@ -1463,7 +1463,7 @@ cond_if_else_store_replacement (basic_bl
>             continue;
>
>           else_store = DR_STMT (else_dr);
> -          else_lhs = gimple_assign_lhs (else_store);
> +          else_lhs = gimple_get_lhs (else_store);
>
>           if (operand_equal_p (then_lhs, else_lhs, 0))
>             {
>


Re: PR 49809: write data refs can now be calls

2011-07-25 Thread Eric Botcazou
> PR 49809 is fallout from my patch to add write data references for the
> lhs of calls.  tree-ssa-phiopt.c was still assuming that writes were
> always assignments.
>
> I tried to look for other examples of the same thing, but couldn't
> find any.
>
> Tested on x86_64-linux-gnu (all,ada).  OK to install?

Thanks for fixing this.  Please install the testcase as well.

-- 
Eric Botcazou


Re: [patch] Fix PR tree-optimization/49771

2011-07-25 Thread Richard Guenther
On Mon, Jul 25, 2011 at 3:24 PM, Richard Guenther
 wrote:
> On Mon, Jul 25, 2011 at 3:22 PM, Ulrich Weigand  wrote:
>> Richard Guenther wrote:
>>
>>> >> Well, the back-end assumes a pointer to vector type is always
>>> >> naturally aligned, and therefore the data it points to can be
>>> >> accessed via a simple load, with no extra rotate needed.
>>> >
>>> > I can't see any use of VECTOR_TYPE in config/spu/, and assuming
>>> > anything about alignment just because of the kind of the pointer
>>> > is bogus - the scalar code does a scalar read using that pointer.
>>> > So the backend better should look at the memory operation, not
>>> > at the pointer type.  That said, I can't find any code that looks
>>> > suspicious in the spu backend.
>>> >
>>> >> It seems what happened here is that somehow, a pointer to int
>>> >> gets replaced by a pointer to vector, even though their alignment
>>> >> properties are different.
>>> >
>>> > No, they are not.  They get replaced if they are value-equivalent
>>> > in which case they are also alignment-equivalent.  But well, the
>>> > dump snippet wasn't complete and I don't feel like building a
>>> > SPU cross to verify myself.
>>
>>> > Seems perfectly valid to me.  Or well - I suppose we might run into
>>> > the issue that the vectorizer sets alignment data at the wrong spot?
>>> > You can check alignment info when dumping with the -alias flag.
>>> > Building a spu cross now.
>>>
>>> Nope, all perfectly valid.
>>
>> Ah, I guess I see what's happening here.  When the SPU back-end is called
>> to expand the load, the source operand is passed as:
>>
>> (mem:SI (reg/f:SI 226 [ vect_pa.9 ])
>>        [2 MEM[base: vect_pa.9_44, offset: 0B]+0 S4 A32])
>>
>> Now this does say the MEM is only guaranteed to be aligned to 32 bits.
>>
>> However, spu_expand_load then goes and looks at the components of the
>> address in detail, in order to figure out how to best perform the access.
>> In doing so, it looks at the REGNO_POINTER_ALIGN values of the base
>> registers involved in the address.
>>
>> In this case, REGNO_POINTER_ALIGN (226) is set to 128, and therefore
>> the back-end thinks it can use an aligned access after all.
>>
>> Now, the reason why REGNO_POINTER_ALIGN (226) is 128 is that the register
>> is the DECL_RTL for the variable vect_pa.9, and that variable has a
>> pointer-to-vector type (with target alignment 128).
>>
>> When expanding that variable, expand_one_register_var does:
>>
>>  if (POINTER_TYPE_P (type))
>>    mark_reg_pointer (x, TYPE_ALIGN (TREE_TYPE (type)));
>>
>> All this is normally completely correct -- a variable of type pointer
>> to vector type *must* hold only properly aligned values.
>
> No, this is indeed completely bogus code ;)  it should instead
> use get_pointer_alignment.

Btw, as pseudos do not have a single def site, how can the above
ever be correct in the face of coalescing?  For example on trees we
can have

 p_1 = &a; // align 256
 p_2 = p_1 + 4; // align 32

but we'll coalesce the thing and thus would have to use the weaker
alignment of both SSA names.  expand_one_register_var expands
the decl, not the SSA name, so using get_pointer_alignment on
the decl would probably be fine, though also pointless as it always
will return 8.
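The arithmetic behind the `p_1`/`p_2` example, as a small C sketch with alignments in bits as in the comments above (the helper names are hypothetical): adding a byte offset can only preserve the alignment given by the lowest set bit of the offset, and one pseudo shared by several coalesced SSA names can only claim the weakest of their alignments.

```c
#include <assert.h>
#include <stddef.h>

/* Alignment in bits still guaranteed for BASE + OFFSET (OFFSET in bytes)
   when BASE is known to be aligned to BASE_ALIGN bits, a power of two.  */
static unsigned long
align_after_offset (unsigned long base_align, unsigned long offset)
{
  unsigned long off_align;

  if (offset == 0)
    return base_align;
  /* Lowest set bit of the byte offset, converted to bits.  */
  off_align = (offset & -offset) * 8;
  return off_align < base_align ? off_align : base_align;
}

/* Alignment a pseudo can claim when N > 0 SSA names with the given
   alignments are coalesced into it: the minimum.  */
static unsigned long
coalesced_align (const unsigned long *aligns, size_t n)
{
  unsigned long result;
  size_t i;

  assert (n > 0);
  result = aligns[0];
  for (i = 1; i < n; i++)
    if (aligns[i] < result)
      result = aligns[i];
  return result;
}
```

With `p_1` at 256 bits, `align_after_offset (256, 4)` gives 32 for `p_2`, and the coalesced register can only be trusted to 32 bits.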

At least I don't see any code that would prevent a temporary variable
of type int * from being coalesced with a temporary variable of type vector int *.

Why should REGNO_POINTER_ALIGN be interesting to anyone?
Proper alignment information is (should be) attached to every
MEM already.

Richard.


Re: [PATCH] Fix PR49715, (float)unsigned -> (float)signed

2011-07-25 Thread H.J. Lu
On Fri, Jul 22, 2011 at 2:13 AM, Richard Guenther  wrote:
> On Fri, 22 Jul 2011, Richard Guenther wrote:
>
>> On Thu, 21 Jul 2011, Joseph S. Myers wrote:
>>
>> > On Thu, 21 Jul 2011, Richard Guenther wrote:
>> >
>> > > Patch also handling wider modes and not starting with SImode but
>> > > the mode of int:
>> >
>> > Use of target int for anything not about C ABIs is certainly wrong.  This
>> > might be about what operations the target does efficiently, or what
>> > functions are present in libgcc (both of which would be functions of
>> > machine modes), but it's not about the choice of C int.
>>
>> Ok.  Given rths last suggestion I'm testing the following which
>> checks all integer modes (but never will widen - optabs.c will do
>> that if it turns out to be profitable).
>
> Err, I should refresh the patch before sending it ... here it goes.
>
> Richard.
>
> 2011-07-22  Richard Guenther  
>
>        PR tree-optimization/49715
>        * tree-vrp.c: Include expr.h and optabs.h.
>        (simplify_float_conversion_using_ranges): New function.
>        (simplify_stmt_using_ranges): Call it.
>        * Makefile.in (tree-vrp.o): Add $(EXPR_H) and $(OPTABS_H) dependencies.
>        * optabs.c (can_float_p): Export.
>        * optabs.h (can_float_p): Declare.
>
>        * gcc.target/i386/pr49715-1.c: New testcase.
>        * gcc.target/i386/pr49715-2.c: Likewise.
>

I think this caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49840

-- 
H.J.
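The rewrite named in the subject line, (float)unsigned -> (float)signed, is only valid when value-range information shows the unsigned operand fits in the signed type. A minimal C check of that identity, where `fits_in_int` is a hypothetical stand-in for what VRP proves: for any `unsigned int` no larger than `INT_MAX`, converting directly to `float` and converting through `int` round the same mathematical value, so the results are identical.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>

/* Stand-in for the range check: can U be converted to int without
   changing its value?  */
static bool
fits_in_int (unsigned int u)
{
  return u <= (unsigned int) INT_MAX;
}

/* Convert U to float, using the signed conversion when the range check
   allows it, mirroring the unsigned -> signed float conversion rewrite.  */
static float
to_float (unsigned int u)
{
  if (fits_in_int (u))
    {
      /* The value survives the round trip through int unchanged.  */
      assert ((unsigned int) (int) u == u);
      return (float) (int) u;
    }
  return (float) u;
}
```

Values above `INT_MAX` keep the unsigned conversion, because reinterpreting them as `int` would change their value.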


Re: [patch] Fix PR tree-optimization/49771

2011-07-25 Thread Richard Guenther
On Mon, Jul 25, 2011 at 4:03 PM, Richard Guenther
 wrote:
> On Mon, Jul 25, 2011 at 3:24 PM, Richard Guenther
>  wrote:
>> On Mon, Jul 25, 2011 at 3:22 PM, Ulrich Weigand  wrote:
>>> Richard Guenther wrote:
>>>
>>>> >> Well, the back-end assumes a pointer to vector type is always
>>>> >> naturally aligned, and therefore the data it points to can be
>>>> >> accessed via a simple load, with no extra rotate needed.
>>>> >
>>>> > I can't see any use of VECTOR_TYPE in config/spu/, and assuming
>>>> > anything about alignment just because of the kind of the pointer
>>>> > is bogus - the scalar code does a scalar read using that pointer.
>>>> > So the backend better should look at the memory operation, not
>>>> > at the pointer type.  That said, I can't find any code that looks
>>>> > suspicious in the spu backend.
>>>> >
>>>> >> It seems what happened here is that somehow, a pointer to int
>>>> >> gets replaced by a pointer to vector, even though their alignment
>>>> >> properties are different.
>>>> >
>>>> > No, they are not.  They get replaced if they are value-equivalent
>>>> > in which case they are also alignment-equivalent.  But well, the
>>>> > dump snippet wasn't complete and I don't feel like building a
>>>> > SPU cross to verify myself.
>>>
>>>> > Seems perfectly valid to me.  Or well - I suppose we might run into
>>>> > the issue that the vectorizer sets alignment data at the wrong spot?
>>>> > You can check alignment info when dumping with the -alias flag.
>>>> > Building a spu cross now.
>>>>
>>>> Nope, all perfectly valid.
>>>
>>> Ah, I guess I see what's happening here.  When the SPU back-end is called
>>> to expand the load, the source operand is passed as:
>>>
>>> (mem:SI (reg/f:SI 226 [ vect_pa.9 ])
>>>        [2 MEM[base: vect_pa.9_44, offset: 0B]+0 S4 A32])
>>>
>>> Now this does say the MEM is only guaranteed to be aligned to 32 bits.
>>>
>>> However, spu_expand_load then goes and looks at the components of the
>>> address in detail, in order to figure out how to best perform the access.
>>> In doing so, it looks at the REGNO_POINTER_ALIGN values of the base
>>> registers involved in the address.
>>>
>>> In this case, REGNO_POINTER_ALIGN (226) is set to 128, and therefore
>>> the back-end thinks it can use an aligned access after all.
>>>
>>> Now, the reason why REGNO_POINTER_ALIGN (226) is 128 is that the register
>>> is the DECL_RTL for the variable vect_pa.9, and that variable has a
>>> pointer-to-vector type (with target alignment 128).
>>>
>>> When expanding that variable, expand_one_register_var does:
>>>
>>>  if (POINTER_TYPE_P (type))
>>>    mark_reg_pointer (x, TYPE_ALIGN (TREE_TYPE (type)));
>>>
>>> All this is normally completely correct -- a variable of type pointer
>>> to vector type *must* hold only properly aligned values.
>>
>> No, this is indeed completely bogus code ;)  it should instead
>> use get_pointer_alignment.
>
> Btw, as pseudos do not have a single def site how can the above
> ever be correct in the face of coalescing?  For example on trees we
> can have
>
>  p_1 = &a; // align 256
>  p_2 = p_1 + 4; // align 32
>
> but we'll coalesce the thing and thus would have to use the weaker
> alignment of both SSA names.  expand_one_register_var expands
> the decl, not the SSA name, so using get_pointer_alignment on
> the decl would probably be fine, though also pointless as it always
> will return 8.
>
> At least I don't see any code that would prevent a temporary variable
> of type int * of being coalesced with a temporary variable of type vector int 
> *.
>
> Why should REGNO_POINTER_ALIGN be interesting to anyone?
> Proper alignment information is (should be) attached to every
> MEM already.

nonzero_bits1 seems to be the only consumer of REGNO_POINTER_ALIGN
apart from maybe alpha.c and spu.c.

We should simply kill REGNO_POINTER_ALIGN IMHO.
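For context on why `nonzero_bits1` would care: a pointer register known to be aligned to A bits must have its low log2(A/8) bits clear, so the alignment translates into a mask of possibly-nonzero bits. A tiny C illustration of that relationship (a hypothetical helper, not the rtlanal.c code; 32-bit pointers as on SPU), which also shows why claiming 128-bit alignment for a value that is only 32-bit aligned would be unsafe:

```c
#include <assert.h>
#include <stdint.h>

/* Mask of bits that may be nonzero in a 32-bit pointer known to be
   aligned to ALIGN_BITS bits (a power of two, at least 8).  */
static uint32_t
pointer_nonzero_mask (uint32_t align_bits)
{
  uint32_t align_bytes;

  assert (align_bits >= 8 && (align_bits & (align_bits - 1)) == 0);
  align_bytes = align_bits / 8;
  return ~(align_bytes - 1);
}
```

An address such as 0x1004 is consistent with a 32-bit alignment mask but not with a 128-bit one, so an overstated alignment lets later code assume bits are zero when they are not.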

Richard.

> Richard.
>


Re: [PATCH] Fix PR49715, (float)unsigned -> (float)signed

2011-07-25 Thread Richard Guenther
On Mon, 25 Jul 2011, H.J. Lu wrote:

> On Fri, Jul 22, 2011 at 2:13 AM, Richard Guenther  wrote:
> > On Fri, 22 Jul 2011, Richard Guenther wrote:
> >
> >> On Thu, 21 Jul 2011, Joseph S. Myers wrote:
> >>
> >> > On Thu, 21 Jul 2011, Richard Guenther wrote:
> >> >
> >> > > Patch also handling wider modes and not starting with SImode but
> >> > > the mode of int:
> >> >
> >> > Use of target int for anything not about C ABIs is certainly wrong.  This
> >> > might be about what operations the target does efficiently, or what
> >> > functions are present in libgcc (both of which would be functions of
> >> > machine modes), but it's not about the choice of C int.
> >>
> >> Ok.  Given rths last suggestion I'm testing the following which
> >> checks all integer modes (but never will widen - optabs.c will do
> >> that if it turns out to be profitable).
> >
> > Err, I should refresh the patch before sending it ... here it goes.
> >
> > Richard.
> >
> > 2011-07-22  Richard Guenther  
> >
> >        PR tree-optimization/49715
> >        * tree-vrp.c: Include expr.h and optabs.h.
> >        (simplify_float_conversion_using_ranges): New function.
> >        (simplify_stmt_using_ranges): Call it.
> >        * Makefile.in (tree-vrp.o): Add $(EXPR_H) and $(OPTABS_H) 
> > dependencies.
> >        * optabs.c (can_float_p): Export.
> >        * optabs.h (can_float_p): Declare.
> >
> >        * gcc.target/i386/pr49715-1.c: New testcase.
> >        * gcc.target/i386/pr49715-2.c: Likewise.
> >
> 
> I think this caused:
> 
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=49840

I didn't see those with -m32 on x86_64.  Would we expect these to
only show up on a host i?86 machine?

Richard.

Re: [PATCH, i386]: Rewrite LEA handling (was:Re: PATCH [10/n] X32: Support x32 LEA insns)

2011-07-25 Thread H.J. Lu
On Mon, Jul 25, 2011 at 6:43 AM, Uros Bizjak  wrote:
> On Mon, Jul 25, 2011 at 3:30 PM, H.J. Lu  wrote:
>
>>> Attached patch implements -fpic handling for x32. In x32 mode, we now
>>> use x86_64_general_operand and corresponding "e" constraints for adds
>>> in SImode, since it looks that invalid addresses can only be generated
>>> through adds. This avoids a whole bunch of new predicates and
>>> constraints.
>
>> X32 glibc is miscompiled:
>>
>> CPP='/export/build/gnu/gcc-x32/release/usr/gcc-4.7.0-x32/bin/gcc -mx32
>>  -E -x c-header'
>> /export/build/gnu/glibc-x32/build-x86_64-linux/elf/ld-linux-x32.so.2
>> --library-path 
>> /export/build/gnu/glibc-x32/build-x86_64-linux:/export/build/gnu/glibc-x32/build-x86_64-linux/math:/export/build/gnu/glibc-x32/build-x86_64-linux/elf:/export/build/gnu/glibc-x32/build-x86_64-linux/dlfcn:/export/build/gnu/glibc-x32/build-x86_64-linux/nss:/export/build/gnu/glibc-x32/build-x86_64-linux/nis:/export/build/gnu/glibc-x32/build-x86_64-linux/rt:/export/build/gnu/glibc-x32/build-x86_64-linux/resolv:/export/build/gnu/glibc-x32/build-x86_64-linux/crypt:/export/build/gnu/glibc-x32/build-x86_64-linux/nptl
>> /export/build/gnu/glibc-x32/build-x86_64-linux/sunrpc/rpcgen -Y
>> ../scripts -h rpcsvc/yppasswd.x -o
>> /export/build/gnu/glibc-x32/build-x86_64-linux/sunrpc/rpcsvc/yppasswd.T
>> make[5]: *** 
>> [/export/build/gnu/glibc-x32/build-x86_64-linux/sunrpc/rpcsvc/yppasswd.stmp]
>> Segmentation fault (core dumped)
>>
>> Some LEA patterns are wrong for x32.  I will investigate.
>
> What about x32 GCC testsuite?
>

GCC testsuite is clean on x32.



-- 
H.J.


Re: [RFC PATCH 0/9] CFG aware dwarf2 cfi generation

2011-07-25 Thread David Edelsohn
On Sun, Jul 24, 2011 at 11:56 PM, Richard Henderson  wrote:

> Please try again.  I've fixed 4 bugs today for different targets;
> hopefully I've gotten this one as part of that.

Unfortunately, no.  Same error.

- David


[PATCH] Fix PR49822

2011-07-25 Thread Richard Guenther

This robustifies remove_prop_source_from_use some more.

Bootstrapped and tested on x86_64-unknown-linux-gnu, also
cross-tested the arm testcase.  Applied.

Richard.

2011-07-25  Richard Guenther  

        PR tree-optimization/49822
        * tree-ssa-forwprop.c (remove_prop_source_from_use): Robustify
        more.  Make sure to preserve stmts with side-effects.  Properly
        handle virtual defs, follow a longer def chain.

Index: gcc/tree-ssa-forwprop.c
===
*** gcc/tree-ssa-forwprop.c (revision 176735)
--- gcc/tree-ssa-forwprop.c (working copy)
*** can_propagate_from (gimple def_stmt)
*** 295,303 
return true;
  }
  
! /* Remove a copy chain ending in NAME along the defs.
 If NAME was replaced in its only use then this function can be used
!to clean up dead stmts.  Returns true if cleanup-cfg has to run.  */
  
  static bool
  remove_prop_source_from_use (tree name)
--- 295,306 
return true;
  }
  
! /* Remove a chain of dead statements starting at the definition of
!NAME.  The chain is linked via the first operand of the defining statements.
 If NAME was replaced in its only use then this function can be used
!to clean up dead stmts.  The function handles already released SSA
!names gracefully.
!Returns true if cleanup-cfg has to run.  */
  
  static bool
  remove_prop_source_from_use (tree name)
*** remove_prop_source_from_use (tree name)
*** 309,327 
do {
  basic_block bb;
  
! if (!has_zero_uses (name))
return cfg_changed;
  
  stmt = SSA_NAME_DEF_STMT (name);
! bb = gimple_bb (stmt);
! if (!bb)
return cfg_changed;
  gsi = gsi_for_stmt (stmt);
! release_defs (stmt);
  gsi_remove (&gsi, true);
  cfg_changed |= gimple_purge_dead_eh_edges (bb);
  
! name = (gimple_assign_copy_p (stmt)) ? gimple_assign_rhs1 (stmt) : NULL;
} while (name && TREE_CODE (name) == SSA_NAME);
  
return cfg_changed;
--- 312,335 
do {
  basic_block bb;
  
! if (SSA_NAME_IN_FREE_LIST (name)
!   || SSA_NAME_IS_DEFAULT_DEF (name)
!   || !has_zero_uses (name))
return cfg_changed;
  
  stmt = SSA_NAME_DEF_STMT (name);
! if (gimple_code (stmt) == GIMPLE_PHI
!   || gimple_has_side_effects (stmt))
return cfg_changed;
+ 
+ bb = gimple_bb (stmt);
  gsi = gsi_for_stmt (stmt);
! unlink_stmt_vdef (stmt);
  gsi_remove (&gsi, true);
+ release_defs (stmt);
  cfg_changed |= gimple_purge_dead_eh_edges (bb);
  
! name = is_gimple_assign (stmt) ? gimple_assign_rhs1 (stmt) : NULL_TREE;
} while (name && TREE_CODE (name) == SSA_NAME);
  
return cfg_changed;
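The new loop in `remove_prop_source_from_use` can be modelled in plain C as below. This is a hypothetical sketch with made-up types; it only mirrors the stopping conditions visible in the patch (released or default-definition names, remaining uses, PHI definitions, statements with side effects) and the walk along the first operand. It does not model virtual operands or EH edge purging.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified model of an SSA name and its defining
   statement; these are not GCC's data structures.  */
struct name
{
  int num_uses;              /* Remaining uses of this name.  */
  bool is_default_def;       /* No defining statement (e.g. a parameter).  */
  bool def_is_phi;           /* Defined by a PHI node.  */
  bool def_has_side_effects; /* Defining statement must be kept.  */
  bool released;             /* Defining statement already removed.  */
  struct name *rhs1;         /* First operand of the definition, or NULL
                                if it is not an SSA name.  */
};

/* Remove the chain of dead definitions starting at N, following first
   operands, and stop at the first definition that is still used or must
   be preserved.  Returns the number of definitions removed.  */
static int
remove_dead_chain (struct name *n)
{
  int removed = 0;

  while (n != NULL)
    {
      if (n->released || n->is_default_def || n->num_uses != 0)
        break;
      if (n->def_is_phi || n->def_has_side_effects)
        break;

      n->released = true;
      removed++;

      /* Deleting the statement also deletes its use of the operand.  */
      if (n->rhs1 != NULL)
        n->rhs1->num_uses--;
      n = n->rhs1;
    }
  return removed;
}
```

Running it twice on the same chain is harmless: the second walk stops immediately at the already released name.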


RE: [Patch,AVR]: Fix PR39386 (x << x and x >> x)

2011-07-25 Thread Weddington, Eric


> -Original Message-
> From: Georg-Johann Lay [mailto:a...@gjlay.de]
> Sent: Monday, July 25, 2011 5:20 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Denis Chertykov; Weddington, Eric; Richard Henderson
> Subject: [Patch,AVR]: Fix PR39386 (x << x and x >> x)
> 
> This is a fix for pathological, variable shift offset shifts of
> the form x << x resp. x >> x.
> 
 
> Ok to install?
> 
> Johann
> 
> 
>   PR target/39386
>   * config/avr/avr.c (out_shift_with_cnt): Use tmp_reg as
>   shift counter for x << x and x >> x shifts.

Please commit. Thanks!

Eric Weddington
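The difficulty with `x << x` is that the value being shifted and the shift count can live in the same register, so a loop that shifts the value one bit at a time while decrementing the count would clobber its own counter. A hypothetical C model of the fix described in the ChangeLog, copying the count into a scratch variable that stands in for `tmp_reg` before the loop (this is not the code `out_shift_with_cnt` emits):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Shift the 8-bit "register" *VAL left by the count held in *CNT, one bit
   per iteration.  VAL and CNT may point to the same byte (x << x), so the
   count is copied to a scratch variable first; decrementing *CNT directly
   would also modify the value being shifted.  */
static uint8_t
shl_by_reg (uint8_t *val, const uint8_t *cnt)
{
  uint8_t tmp;

  assert (val != NULL && cnt != NULL);
  tmp = *cnt;  /* Stand-in for tmp_reg.  */
  while (tmp != 0)
    {
      *val = (uint8_t) (*val << 1);
      tmp--;
    }
  return *val;
}
```

Calling `shl_by_reg (&x, &x)` with `x == 3` yields 24, the same as `3 << 3`.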


RE: [Patch,AVR]: PR49687 (better widening 32-bit mul)

2011-07-25 Thread Weddington, Eric


> -Original Message-
> From: Georg-Johann Lay [mailto:a...@gjlay.de]
> Sent: Monday, July 25, 2011 3:32 AM
> To: gcc-patches@gcc.gnu.org
> Cc: Anatoly Sokolov; Denis Chertykov; Weddington, Eric; Richard Henderson
> Subject: [Patch,AVR]: PR49687 (better widening 32-bit mul)
> 
> Eric, can you review the assembler routines and say if such reuse is ok
> or if you'd prefer a speed-optimized version of __mulsi3 like in the
> current libgcc?

Hi Johann,

Typically a penalty on speed is preferred over a penalty on code size. Do you 
already have information on how it compares on code size with the old routines?

Eric


Re: [RFC] Replace some bitmaps with HARD_REG_SETs - second version

2011-07-25 Thread Michael Matz
Hi,

On Mon, 25 Jul 2011, Dimitrios Apostolou wrote:

> Bug found, in df_mark_reg I need to iterate until regno + n, not n. The error
> is at the following hunk:
> 
> --- gcc/df-scan.c   2011-02-02 20:08:06 +
> +++ gcc/df-scan.c   2011-07-24 17:16:46 +
> @@ -3713,35 +3717,40 @@ df_mark_reg (rtx reg, void *vset)
>if (regno < FIRST_PSEUDO_REGISTER)
>  {
>int n = hard_regno_nregs[regno][GET_MODE (reg)];
> -  bitmap_set_range (set, regno, n);
> +  int i;
> +  for (i=regno; i<n; i++)
> +   SET_HARD_REG_BIT (*set, i);
>  }

No.  n is a count, hence the upper bound is regno + n.
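A minimal sketch of the corrected loop, using a plain 64-bit mask as a stand-in for HARD_REG_SET. The mask type and the helper name mark_hard_regs are illustrative assumptions, not GCC's API. A value that occupies n hard registers starting at regno uses registers regno through regno + n - 1, so the loop bound must be regno + n:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for HARD_REG_SET: one bit per hard register
   (assumes fewer than 64 hard registers).  */
typedef uint64_t hard_reg_mask;

/* Mark the N consecutive hard registers starting at REGNO.
   N is a count, so the last register marked is REGNO + N - 1.  */
static void
mark_hard_regs (hard_reg_mask *set, unsigned int regno, unsigned int n)
{
  unsigned int i;

  for (i = regno; i < regno + n; i++)
    *set |= (hard_reg_mask) 1 << i;
}
```

For regno = 4 and n = 2 this sets bits 4 and 5; with the buggy bound i < n the loop would not execute at all, since 4 < 2 is false on entry.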


Ciao,
Michael.


Re: [patch] Fix PR tree-optimization/49771

2011-07-25 Thread Ulrich Weigand
Richard Guenther wrote:

> > Btw, as pseudos do not have a single def site how can the above
> > ever be correct in the face of coalescing?

I had always understood this to reflect the simple fact that a
pointer to some type must never hold a value that is not properly
aligned for that type.  (Maybe this is only true on STRICT_ALIGNMENT
targets?)   This has always been an important property to generate
good code on SPU ...

> > For example on trees we can have
> >
> >  p_1 = &a; // align 256
> >  p_2 = p_1 + 4; // align 32
> >
> > but we'll coalesce the thing and thus would have to use the weaker
> > alignment of both SSA names.  expand_one_register_var expands
> > the decl, not the SSA name, so using get_pointer_alignment on
> > the decl would probably be fine, though also pointless as it always
> > will return 8.
> >
> > At least I don't see any code that would prevent a temporary variable
> > of type int * of being coalesced with a temporary variable of type vector
> > int *.

I don't really understand the coalesce code, but in the above sample,
the two variables must have the same type, otherwise there'd have to
be a cast somewhere.  Does coalesce eliminate casts?

> > Why should REGNO_POINTER_ALIGN be interesting to anyone?
> > Proper alignment information is (should be) attached to every
> > MEM already.
> 
> nonzero_bits1 seems to be the only consumer of REGNO_POINTER_ALIGN
> apart from maybe alpha.c and spu.c.
> 
> We should simply kill REGNO_POINTER_ALIGN IMHO.

On the SPU at least, REGNO_POINTER_ALIGN carries additional information
over just the MEM alignment.  Say, I'm getting a MEM of the form
(mem (plus (reg X) (reg Y))), and the MEM is aligned to 32 bits.

This means I need to generate a rotate to fix up the value that was
loaded by the (forced aligned) load instruction.  However, the form
of this rotate can be simpler if I know that e.g. reg X is always
guaranteed to be 128-bit aligned and only reg Y introduces the
potential misalignment.  If on the other hand neither of the base
registers is guaranteed to be 128-bit aligned, I need to generate
more complex rotate code ...

I understand this may also be important on other platforms, e.g.
to choose which register to use as base and which as index in a
memory operation ...
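A simplified model of this point, assuming (as on SPU) that a quadword load ignores the low four address bits, so a scalar at address A has to be rotated by A & 15 bytes after the load. If X is known to be 128-bit (16-byte) aligned, (X + Y) & 15 equals Y & 15, so the rotate amount can be taken from Y alone without first computing X + Y. The function names are hypothetical, and the real spu.c sequences also depend on the element size.

```c
#include <assert.h>
#include <stdint.h>

/* Byte rotate needed after a 16-byte-aligned quadword load so that the
   element at ADDR ends up at the start of the register (simplified).  */
static unsigned int
rotate_amount (uintptr_t addr)
{
  return (unsigned int) (addr & 15);
}

/* Neither base register is known to be 128-bit aligned:
   the full sum X + Y is needed.  */
static unsigned int
rotate_amount_general (uintptr_t x, uintptr_t y)
{
  return rotate_amount (x + y);
}

/* X is known to be 128-bit aligned, so only Y contributes.  */
static unsigned int
rotate_amount_x_aligned (uintptr_t x, uintptr_t y)
{
  assert (x % 16 == 0);
  (void) x;
  return rotate_amount (y);
}
```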

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com


Re: Allow Tru64 UNIX bootstrap with C++

2011-07-25 Thread Rainer Orth
Ian Lance Taylor  writes:

> I'll preapprove a similar patch to libcpp/system.h if you want to take a
> look at that.

Sure.  The following patch has been bootstrapped successfully on
i386-pc-solaris2.11 and installed on mainline.

Rainer


2011-07-22  Rainer Orth  

* system.h [__cplusplus]: Wrap C function declarations in extern "C".

diff --git a/libcpp/system.h b/libcpp/system.h
--- a/libcpp/system.h
+++ b/libcpp/system.h
@@ -84,6 +84,10 @@ along with GCC; see the file COPYING3.  
 #  define fputc(C, Stream) fputc_unlocked (C, Stream)
 # endif
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 # ifdef HAVE_CLEARERR_UNLOCKED
 #  undef clearerr
 #  define clearerr(Stream) clearerr_unlocked (Stream)
@@ -164,6 +168,10 @@ extern int fprintf_unlocked (FILE *, con
 #  endif
 # endif
 
+#ifdef __cplusplus
+}
+#endif
+
 #endif
 
 /* ??? Glibc's fwrite/fread_unlocked macros cause
@@ -286,10 +294,18 @@ extern int errno;
here.  These checks will be in the undefined state while configure
is running so be careful to test "defined (HAVE_DECL_*)".  */
 
+#ifdef __cplusplus
+extern "C" {
+#endif
+
 #if defined (HAVE_DECL_ABORT) && !HAVE_DECL_ABORT
 extern void abort (void);
 #endif
 
+#ifdef __cplusplus
+}
+#endif
+
 #if HAVE_SYS_STAT_H
 # include 
 #endif
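The idiom the patch applies, shown in isolation: under a C++ compiler the declarations inside the guarded block get C linkage, so their names are not mangled and still match the definitions in the C library; under a C compiler the guards disappear. The function example_c_function is a hypothetical stand-in for declarations such as abort in system.h.

```c
#include <assert.h>

#ifdef __cplusplus
extern "C" {
#endif

/* Hypothetical declaration; when compiled as C++ it gets C linkage.  */
extern int example_c_function (int);

#ifdef __cplusplus
}
#endif

/* Definition, as it might appear in a C source file.  */
int
example_c_function (int x)
{
  return x + 1;
}
```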

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [patch] Fix PR tree-optimization/49771

2011-07-25 Thread Richard Guenther
On Mon, Jul 25, 2011 at 4:23 PM, Ulrich Weigand  wrote:
> Richard Guenther wrote:
>
>> > Btw, as pseudos do not have a single def site how can the above
>> > ever be correct in the face of coalescing?
>
> I had always understood this to reflect the simple fact that a
> pointer to some type must never hold a value that is not properly
> aligned for that type.  (Maybe this is only true on STRICT_ALIGNMENT
> targets?)   This has always been an important property to generate
> good code on SPU ...

We do not preserve pointer type casts in the middle-end (anymore).

>> > For example on trees we can have
>> >
>> >  p_1 = &a; // align 256
>> >  p_2 = p_1 + 4; // align 32
>> >
>> > but we'll coalesce the thing and thus would have to use the weaker
>> > alignment of both SSA names.  expand_one_register_var expands
>> > the decl, not the SSA name, so using get_pointer_alignment on
>> > the decl would probably be fine, though also pointless as it always
>> > will return 8.
>> >
>> > At least I don't see any code that would prevent a temporary variable
>> > of type int * of being coalesced with a temporary variable of type vector
>> > int *.
>
> I don't really understand the coalesce code, but in the above sample,
> the two variables must have the same type, otherwise there'd have to
> be a cast somewhere.  Does coalesce eliminate casts?

No, there is no cast between different pointer types.  Information is
not attached to types but to real entities.
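A small model of the coalescing concern quoted above, with alignments in bits as in the p_1/p_2 example. The helper names are illustrative; GCC's real logic lives in get_pointer_alignment and the out-of-SSA coalescing code. The alignment known for base + offset is capped by the lowest set bit of the offset, and once p_1 and p_2 share one pseudo, only the weaker of their alignments holds for that register:

```c
#include <assert.h>

/* Known alignment (in bits) of BASE + OFFSET_BYTES, given that BASE is
   BASE_ALIGN-bit aligned (BASE_ALIGN a power of two).  */
static unsigned int
align_after_offset (unsigned int base_align, unsigned int offset_bytes)
{
  unsigned int offset_bits = offset_bytes * 8;
  unsigned int offset_align;

  if (offset_bits == 0)
    return base_align;
  offset_align = offset_bits & -offset_bits;  /* lowest set bit */
  return offset_align < base_align ? offset_align : base_align;
}

/* Alignment that is valid for a register holding either value.  */
static unsigned int
coalesced_align (unsigned int a, unsigned int b)
{
  return a < b ? a : b;
}
```

For p_1 = &a with 256-bit alignment, p_2 = p_1 + 4 is only 32-bit aligned, and the coalesced register can claim no more than 32 bits.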

>> > Why should REGNO_POINTER_ALIGN be interesting to anyone?
>> > Proper alignment information is (should be) attached to every
>> > MEM already.
>>
>> nonzero_bits1 seems to be the only consumer of REGNO_POINTER_ALIGN
>> apart from maybe alpha.c and spu.c.
>>
>> We should simply kill REGNO_POINTER_ALIGN IMHO.
>
> On the SPU at least, REGNO_POINTER_ALIGN carries additional information
> over just the MEM alignment.  Say, I'm getting a MEM of the form
> (mem (plus (reg X) (reg Y))), and the MEM is aligned to 32 bits.
>
> This means I need to generate a rotate to fix up the value that was
> loaded by the (forced aligned) load instruction.  However, the form
> of this rotate can be simpler if I know that e.g. reg X is always
> guaranteed to be 128-bit aligned and only reg Y introduces the
> potential misalignment.  If on the other hand neither of the base
> registers is guaranteed to be 128-bit aligned, I need to generate
> more complex rotate code ...

Because then you need the value of X + Y instead of just picking either?

Why not expand this explicitly when you still have the per-SSA name
alignment information around?

> I understand this may also be important on other platforms, e.g.
> to choose which register to use as base and which as index in a
> memory operation ...

Well, we still have REG_POINTER.

Richard.

> Bye,
> Ulrich
>
> --
>  Dr. Ulrich Weigand
>  GNU Toolchain for Linux on System z and Cell BE
>  ulrich.weig...@de.ibm.com
>


[patch] Fix PR tree-optimization/49471

2011-07-25 Thread Razya Ladelsky
Hi,

This patch fixes the build failure of cactusADM and dealII spec2006 
benchmarks when autopar is enabled.
(for powerpc they fail only when -m32 is additionally enabled)

The problem originated in canonicalize_loop_ivs, where we iterate over the
header's phis in order to base all the induction variables on a single
control variable.  We use the largest precision among the loop's ivs to
determine the type of the control variable.

Since iterating over the loop's phis visits not only the loop's ivs but
also its reduction variables, we got precision values like 80 on x86 and
128 on ppc, and the compiler failed to create proper types for these
precisions.

The proper way to determine the control variable's type is to take into
account only the loop's ivs, which is what this patch does.
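A toy model of the selection described above. The struct and its fields are hypothetical; in the patch, the "is an iv" test is simple_iv, and the scan starts from 0 here only for simplicity. Taking the maximum precision over all header PHIs lets a reduction variable with an 80-bit precision dictate the control variable's type, while restricting the scan to induction variables does not:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of one loop-header PHI result.  */
struct phi_info
{
  unsigned int precision;  /* TYPE_PRECISION of the result */
  bool is_iv;              /* true if it is a simple induction variable */
};

/* Largest precision among the PHIs.  When ONLY_IVS is set, PHIs that
   are not induction variables (e.g. reductions) are skipped.  */
static unsigned int
control_var_precision (const struct phi_info *phis, size_t n, bool only_ivs)
{
  unsigned int precision = 0;
  size_t i;

  for (i = 0; i < n; i++)
    {
      if (only_ivs && !phis[i].is_iv)
        continue;
      if (phis[i].precision > precision)
        precision = phis[i].precision;
    }
  return precision;
}
```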

Bootstrap and testsuite pass successfully (as autopar is not enabled by 
default).
No new regressions when the testsuite is run with autopar enabled.
No new regressions for the run of spec2006 with autopar enabled.

cactusADM and dealII benchmarks now pass successfully with autopar on 
powerpc and x86.

Thanks to Zdenek who helped me figure out the failure/fix. 
OK for trunk? 
Thanks,
Razya

ChangeLog:

   PR tree-optimization/49471
   * tree-vect-loop-manip.c (canonicalize_loop_ivs): Add condition to 
   ignore reduction variables when iterating the loop header's phis.


Index: tree-ssa-loop-manip.c
===
*** tree-ssa-loop-manip.c   (revision 175851)
--- tree-ssa-loop-manip.c   (working copy)
*** canonicalize_loop_ivs (struct loop *loop
*** 1200,1205 
--- 1200,1206 
gimple stmt;
edge exit = single_dom_exit (loop);
gimple_seq stmts;
+   affine_iv iv;
  
for (psi = gsi_start_phis (loop->header);
 !gsi_end_p (psi); gsi_next (&psi))
*** canonicalize_loop_ivs (struct loop *loop
*** 1207,1213 
gimple phi = gsi_stmt (psi);
tree res = PHI_RESULT (phi);
  
!   if (is_gimple_reg (res) && TYPE_PRECISION (TREE_TYPE (res)) > precision)
precision = TYPE_PRECISION (TREE_TYPE (res));
  }
  
--- 1208,1216 
gimple phi = gsi_stmt (psi);
tree res = PHI_RESULT (phi);
  
!   if (is_gimple_reg (res) 
! && simple_iv (loop, loop, res, &iv, true)
! && TYPE_PRECISION (TREE_TYPE (res)) > precision)
precision = TYPE_PRECISION (TREE_TYPE (res));
  }
  

eliminate bitmap regs_invalidated_by_call_regset

2011-07-25 Thread Dimitrios Apostolou

Hello list,

the attached patch eliminates the regs_invalidated_by_call_regset bitmap and
uses the original regs_invalidated_by_call HARD_REG_SET instead. Tested on
i386; I got the following two regressions, which I'll investigate right away:


 FAIL: libmudflap.cth/pass39-frag.c (-O3) (rerun 10) execution test
 FAIL: libmudflap.cth/pass39-frag.c (-O3) (rerun 10) output pattern test



Performance is not measurably affected; if anything, it is now a couple of
milliseconds faster:


Original: PC1:0.878s, PC2:6.55s, 2105.6 M instr
Patched : PC1:0.875s, PC2:6.54s, 2104.9 M instr


2011-07-25  Dimitrios Apostolou 

	* df-core.c, df-problems.c, df-scan.c, df.h, reginfo.c, regset.h: 
Eliminate regs_invalidated_by_call_regset bitmap and use instead the 
original regs_invalidated_by_call HARD_REG_SET.



All comments are welcome,
Dimitris

=== modified file 'gcc/df-core.c'
--- gcc/df-core.c   2011-04-20 18:19:03 +
+++ gcc/df-core.c   2011-07-25 13:58:58 +
@@ -1886,6 +1886,17 @@ df_print_regset (FILE *file, bitmap r)
 }
 
 
+void
+df_print_hard_reg_set (FILE *f, HARD_REG_SET r)
+{
+  unsigned int i;
+  hard_reg_set_iterator iter;
+
+  EXECUTE_IF_SET_IN_HARD_REG_SET (r, 0, i, iter)
+    fprintf (f, " %d [%s]", i, reg_names[i]);
+  fprintf (f, "\n");
+}
+
 /* Write information about registers and basic blocks into FILE.  The
bitmap is in the form used by df_byte_lr.  This is part of making a
debugging dump.  */

=== modified file 'gcc/df-problems.c'
--- gcc/df-problems.c   2011-05-04 20:24:15 +
+++ gcc/df-problems.c   2011-07-25 13:58:58 +
@@ -432,6 +432,7 @@ df_rd_local_compute (bitmap all_blocks)
 {
   unsigned int bb_index;
   bitmap_iterator bi;
+  hard_reg_set_iterator iter;
   unsigned int regno;
   struct df_rd_problem_data *problem_data
 = (struct df_rd_problem_data *) df_rd->problem_data;
@@ -449,7 +450,7 @@ df_rd_local_compute (bitmap all_blocks)
 }
 
   /* Set up the knockout bit vectors to be applied across EH_EDGES.  */
-  EXECUTE_IF_SET_IN_BITMAP (regs_invalidated_by_call_regset, 0, regno, bi)
+  EXECUTE_IF_SET_IN_HARD_REG_SET (regs_invalidated_by_call, 0, regno, iter)
 {
   if (DF_DEFS_COUNT (regno) > DF_SPARSE_THRESHOLD)
bitmap_set_bit (sparse_invalidated, regno);
@@ -969,6 +970,29 @@ df_lr_confluence_0 (basic_block bb)
 bitmap_copy (op1, &df->hardware_regs_used);
 }
 
+/* to |= from1 & ~from2
+   from2 is of type HARD_REG_SET */
+
+static bool
+bitmap_ior_and_compl_from_hard_reg_set (bitmap to, const_bitmap from1,
+   HARD_REG_SET from2)
+{
+  bool ret;
+  unsigned int i;
+  bitmap_head from1_tmp;
+  hard_reg_set_iterator iter;
+
+  bitmap_initialize (&from1_tmp, &bitmap_default_obstack);
+  bitmap_copy (&from1_tmp, from1);
+
+  /* TODO optimise per-word */
+  EXECUTE_IF_SET_IN_HARD_REG_SET (from2, 0, i, iter)
+    bitmap_clear_bit (&from1_tmp, i);
+  ret = bitmap_ior_into (to, &from1_tmp);
+
+  bitmap_clear (&from1_tmp);
+  return ret;
+}
 
 /* Confluence function that ignores fake edges.  */
 
@@ -983,7 +1007,8 @@ df_lr_confluence_n (edge e)
   /* ??? Abnormal call edges ignored for the moment, as this gets
  confused by sibling call edges, which crashes reg-stack.  */
   if (e->flags & EDGE_EH)
-    changed = bitmap_ior_and_compl_into (op1, op2, regs_invalidated_by_call_regset);
+    changed = bitmap_ior_and_compl_from_hard_reg_set (op1, op2,
+                                                      regs_invalidated_by_call);
   else
 changed = bitmap_ior_into (op1, op2);
 
@@ -4450,8 +4475,8 @@ df_md_confluence_n (edge e)
 return false;
 
   if (e->flags & EDGE_EH)
-    return bitmap_ior_and_compl_into (op1, op2,
-                                      regs_invalidated_by_call_regset);
+    return bitmap_ior_and_compl_from_hard_reg_set (op1, op2,
+                                                   regs_invalidated_by_call);
   else
 return bitmap_ior_into (op1, op2);
 }

=== modified file 'gcc/df-scan.c'
--- gcc/df-scan.c   2011-02-02 20:08:06 +
+++ gcc/df-scan.c   2011-07-25 13:58:58 +
@@ -409,7 +409,7 @@ df_scan_start_dump (FILE *file ATTRIBUTE
   rtx insn;
 
   fprintf (file, ";;  invalidated by call \t");
-  df_print_regset (file, regs_invalidated_by_call_regset);
+  df_print_hard_reg_set (file, regs_invalidated_by_call);
   fprintf (file, ";;  hardware regs used \t");
   df_print_regset (file, &df->hardware_regs_used);
   fprintf (file, ";;  regular block artificial uses \t");
@@ -3317,7 +3317,7 @@ df_get_call_refs (struct df_collection_r
   int flags)
 {
   rtx note;
-  bitmap_iterator bi;
+  hard_reg_set_iterator iter;
   unsigned int ui;
   bool is_sibling_call;
   unsigned int i;
@@ -3375,7 +3375,7 @@ df_get_call_refs (struct df_collection_r
}
 
   is_sibling_call = SIBLING_CALL_P (insn_info->insn);
-  EXECUTE_IF_SET_IN_BITMAP (regs_invalidated_by_call_regset, 0, ui, bi)
+  EXECUTE_IF_SET_IN_HARD_REG_SET (regs_invalidated_by_call, 0, ui, iter)
 {
   i
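The helper added to df-problems.c computes to |= from1 & ~from2 by copying from1 and clearing bits one register at a time; its TODO suggests working per word instead. A sketch of such a per-word variant, assuming both operands are dense arrays of the same number of 64-bit words. GCC's bitmap is a sparse linked structure, so this is a simplification, and ior_and_compl_words is a hypothetical name:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* TO |= FROM1 & ~FROM2, one word at a time.  Returns true if any bit
   of TO changed, like the bool result of the helper in the patch.  */
static bool
ior_and_compl_words (uint64_t *to, const uint64_t *from1,
                     const uint64_t *from2, size_t nwords)
{
  bool changed = false;
  size_t i;

  for (i = 0; i < nwords; i++)
    {
      uint64_t old = to[i];

      to[i] |= from1[i] & ~from2[i];
      changed |= (to[i] != old);
    }
  return changed;
}
```

No temporary copy of from1 is needed, which is where the per-word form saves work.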

Re: [patch] Fix PR tree-optimization/49471

2011-07-25 Thread Richard Guenther
On Mon, Jul 25, 2011 at 4:47 PM, Razya Ladelsky  wrote:
> Hi,
>
> This patch fixes the build failure of cactusADM and dealII spec2006
> benchmarks when autopar is enabled.
> (for powerpc they fail only when -m32 is additionally enabled)
>
> The problem originated in canonicalize_loop_ivs, where we iterate the
> header's phis in order to base all
> the induction variables on a single control variable.
> We use the largest precision of the loop's ivs in order to determine the
> type of the control variable.
>
> Since iterating the loop's phis takes into account not only the loop's
> ivs, but also reduction variables,
> we got precision values like 80 for x86, or 128 for ppc.
> The compilers failed to create proper types for these sizes
> (respectively).
>
> The proper behavior for determining the control variable's type is to take
> into account only the loop's ivs,
> which is what this patch does.
>
> Bootstrap and testsuite pass successfully (as autopar is not enabled by
> default).
> No new regressions when the testsuite is run with autopar enabled.
> No new regressions for the run of spec2006 with autopar enabled,
>
> cactusADM and dealII benchmarks now pass successfully with autopar on
> powerpc and x86.
>
> Thanks to Zdenek who helped me figure out the failure/fix.
> OK for trunk?

It'll collide with Sebastian's patch in that area.  I suggested an
INTEGRAL_TYPE_P check instead of the simple_iv one; it
should be cheaper.  Zdenek, do you think it will be "incorrect"
in some cases?

Thanks,
Richard.

> Thanks,
> Razya
>
> ChangeLog:
>
>   PR tree-optimization/49471
>   * tree-vect-loop-manip.c (canonicalize_loop_ivs): Add condition to
>   ignore reduction variables when iterating the loop header's phis.
>
>
>


Re: [PATCH 2/3] canonicalize_loop_ivs should not generate unsigned types.

2011-07-25 Thread Sebastian Pop
On Mon, Jul 25, 2011 at 03:41, Richard Guenther  wrote:
> I think we also need to care for non-integral PHIs where TYPE_PRECISION
> and TYPE_UNSIGNED are not applicable (seems the original code is also
> buggy here?).  So, sth like
>
>  type = TREE_TYPE (res);
>  if (!is_gimple_reg (res)
>      || !INTEGRAL_TYPE_P (type)
>      || TYPE_PRECISION (type) < precision)
>    continue;
>
>  precision = TYPE_PRECISION (type);
>  unsigned_p |= TYPE_UNSIGNED (type);
> }
>

This would not work optimally on the following sequence:
unsigned char
signed short
as we would set the unsigned_p to true for the "unsigned char" and
then we would not reset the value of unsigned_p when the precision
increases.  So what about doing this instead:

  type = TREE_TYPE (res);
  if (!is_gimple_reg (res)
      || !INTEGRAL_TYPE_P (type)
      || TYPE_PRECISION (type) < precision)
    continue;

  if (TYPE_PRECISION (type) > precision)
    unsigned_p = TYPE_UNSIGNED (type);
  else
    unsigned_p |= TYPE_UNSIGNED (type);

  precision = TYPE_PRECISION (type);
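The difference between the two update rules can be checked on the unsigned char / signed short sequence with a small model. Types are reduced to a precision and a signedness flag, the is_gimple_reg and INTEGRAL_TYPE_P filtering is omitted, and pick_iv_type is a hypothetical name:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct int_type
{
  unsigned int precision;
  bool is_unsigned;
};

/* Pick the widest precision seen and make the result unsigned if any
   type of that widest precision is unsigned.  With RESET_ON_WIDER
   false, this follows the plain "unsigned_p |= ..." rule, which keeps
   a stale unsigned flag from a narrower type.  */
static struct int_type
pick_iv_type (const struct int_type *types, size_t n, bool reset_on_wider)
{
  struct int_type result = { 0, false };
  size_t i;

  for (i = 0; i < n; i++)
    {
      if (types[i].precision < result.precision)
        continue;

      if (reset_on_wider && types[i].precision > result.precision)
        result.is_unsigned = types[i].is_unsigned;
      else
        result.is_unsigned |= types[i].is_unsigned;

      result.precision = types[i].precision;
    }
  return result;
}
```

For unsigned char followed by signed short, the plain |= rule yields an unsigned 16-bit type, while resetting on a wider precision yields the expected signed one.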

Thanks,
Sebastian


Reject unqualified *-*-solaris2 configurations (PR target/47124)

2011-07-25 Thread Rainer Orth
As described in the PR, configuring for e.g. sparc-sun-solaris2 fails to
build in the gcc directory.  While this could be made to work, it
doesn't really make sense: in a native build, you don't have to specify
the configure triplet you're building for, and in a cross you must,
given the differences between different versions of Solaris.

Thus I'm simply rejecting such a target.  The following patch was tested
by configuring with --target i386-pc-solaris2 and observing the
configuration being rejected.

Installed on mainline.

Rainer


2011-07-22  Rainer Orth  

PR target/47124
* config.gcc: Reject *-*-solaris2 configuration.

diff --git a/gcc/config.gcc b/gcc/config.gcc
--- a/gcc/config.gcc
+++ b/gcc/config.gcc
@@ -280,6 +280,7 @@ case ${target} in
  | *-*-linux*oldld*\
  | *-*-rtemsaout*  \
  | *-*-rtemscoff*  \
+ | *-*-solaris2\
  | *-*-solaris2.[0-7]  \
  | *-*-solaris2.[0-7].*\
  | *-*-sysv*   \
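The effect of the added pattern can be checked with POSIX fnmatch(), whose glob matching agrees with the shell's case patterns for simple patterns like these. This is a sketch: is_rejected is a hypothetical helper, and only the Solaris patterns from the hunk above are included.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stdbool.h>
#include <stddef.h>

/* Solaris entries from the config.gcc obsolete-target list,
   including the newly added bare *-*-solaris2.  */
static const char *const rejected_patterns[] = {
  "*-*-solaris2",
  "*-*-solaris2.[0-7]",
  "*-*-solaris2.[0-7].*",
};

static bool
is_rejected (const char *target)
{
  size_t i;

  for (i = 0; i < sizeof rejected_patterns / sizeof rejected_patterns[0]; i++)
    if (fnmatch (rejected_patterns[i], target, 0) == 0)
      return true;
  return false;
}
```

An unversioned triplet such as i386-pc-solaris2 is now rejected, while i386-pc-solaris2.11 and sparc-sun-solaris2.10 still pass, because [0-7] matches exactly one character.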

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [patch] Fix PR tree-optimization/49471

2011-07-25 Thread Razya Ladelsky
Razya Ladelsky/Haifa/IBM wrote on 25/07/2011 05:44:02 PM:

> From: Razya Ladelsky/Haifa/IBM
> To: gcc-patches@gcc.gnu.org
> Cc: Zdenek Dvorak , Richard Guenther 
> 
> Date: 25/07/2011 05:44 PM
> Subject: [patch] Fix PR tree-optimization/49471
> 
> Hi,
> 
> This patch fixes the build failure of cactusADM and dealII spec2006 
> benchmarks when autopar is enabled.
> (for powerpc they fail only when -m32 is additionally enabled)
> 
> The problem originated in canonicalize_loop_ivs, where we iterate 
> the header's phis in order to base all
> the induction variables on a single control variable.
> We use the largest precision of the loop's ivs in order to determine
> the type of the control variable. 
> 
> Since iterating the loop's phis takes into account not only the 
> loop's ivs, but also reduction variables, 
> we got precision values like 80 for x86, or 128 for ppc.
> The compilers failed to create proper types for these sizes (respectively).
> 
> The proper behavior for determining the control variable's type is 
> to take into account only the loop's ivs,
> which is what this patch does. 
> 
> Bootstrap and testsuite pass successfully (as autopar is not enabled
> by default).
> No new regressions when the testsuite is run with autopar enabled.
> No new regressions for the run of spec2006 with autopar enabled, 
> 
> cactusADM and dealII benchmarks now pass successfully with autopar 
> on powerpc and x86.
> 
> Thanks to Zdenek who helped me figure out the failure/fix. 
> OK for trunk? 
> Thanks,
> Razya
> 
> ChangeLog:
> 
>PR tree-optimization/49471
>* tree-vect-loop-manip.c (canonicalize_loop_ivs): Add condition to 
>ignore reduction variables when iterating the loop header's phis.

I have an error in the ChangeLog:
the change is in tree-ssa-loop-manip.c instead of tree-vect-loop-manip.c 

Sorry,
Razya
> 
> [attachment "cactus_dealII_patch.txt" deleted by Razya Ladelsky/Haifa/IBM]


Re: [PATCH 2/3] canonicalize_loop_ivs should not generate unsigned types.

2011-07-25 Thread Richard Guenther
On Mon, Jul 25, 2011 at 4:56 PM, Sebastian Pop  wrote:
> On Mon, Jul 25, 2011 at 03:41, Richard Guenther  wrote:
>> I think we also need to care for non-integral PHIs where TYPE_PRECISION
>> and TYPE_UNSIGNED are not applicable (seems the original code is also
>> buggy here?).  So, sth like
>>
>>  type = TREE_TYPE (res);
>>  if (!is_gimple_reg (res)
>>      || !INTEGRAL_TYPE_P (type)
>>      || TYPE_PRECISION (type) < precision)
>>    continue;
>>
>>  precision = TYPE_PRECISION (type);
>>  unsigned_p |= TYPE_UNSIGNED (type);
>> }
>>
>
> This would not work optimally on the following sequence:
> unsigned char
> signed short
> as we would set the unsigned_p to true for the "unsigned char" and
> then we would not reset the value of unsigned_p when the precision
> increases.  So what about doing this instead:
>
>  type = TREE_TYPE (res);
>  if (!is_gimple_reg (res)
>      || !INTEGRAL_TYPE_P (type)
>      || TYPE_PRECISION (type) < precision)
>    continue;
>
>  if (TYPE_PRECISION (type) > precision)
>    unsigned_p = TYPE_UNSIGNED (type);
>  else
>    unsigned_p |= TYPE_UNSIGNED (type);
>
>  precision = TYPE_PRECISION (type);

Ah, indeed.  Yes, fine with me.

Thanks,
Richard.

> Thanks,
> Sebastian
>


Re: [RFC] Replace some bitmaps with HARD_REG_SETs - second version

2011-07-25 Thread Bernd Schmidt
On 07/25/11 16:20, Michael Matz wrote:
> Hi,
> 
> On Mon, 25 Jul 2011, Dimitrios Apostolou wrote:
> 
>> Bug found, in df_mark_reg I need to iterate until regno + n, not n. The error
>> is at the following hunk:
>>
>> --- gcc/df-scan.c   2011-02-02 20:08:06 +
>> +++ gcc/df-scan.c   2011-07-24 17:16:46 +
>> @@ -3713,35 +3717,40 @@ df_mark_reg (rtx reg, void *vset)
>>if (regno < FIRST_PSEUDO_REGISTER)
>>  {
>>int n = hard_regno_nregs[regno][GET_MODE (reg)];
>> -  bitmap_set_range (set, regno, n);
>> +  int i;
>> +  for (i=regno; i<n; i++)
>> +   SET_HARD_REG_BIT (*set, i);
>>  }
> 
> No.  n is a count, hence the upper bound is regno + n.

Also, see add_to_hard_reg_set.


Bernd


Fix libgomp alignment errors on Tru64 UNIX (PR libgomp/45351)

2011-07-25 Thread Rainer Orth
As discussed in the PR, we need to force int alignment for sem_t on
Tru64 UNIX to work around a bug in the native librt.  The following
patch does this.

Tested by rebuilding libgomp, make check is still running.  No failures
or unaligned access errors so far.

Ok for mainline if it passes?

Thanks.
Rainer


2011-07-22  Rainer Orth  

PR libgomp/45351
* config/osf/sem.h: New file.
* configure.tgt (alpha*-dec-osf*): Prepend osf to config_path.

diff --git a/libgomp/config/osf/sem.h b/libgomp/config/osf/sem.h
new file mode 100644
--- /dev/null
+++ b/libgomp/config/osf/sem.h
@@ -0,0 +1,53 @@
+/* Copyright (C) 2011 Free Software Foundation, Inc.
+
+   This file is part of the GNU OpenMP Library (libgomp).
+
+   Libgomp is free software; you can redistribute it and/or modify it
+   under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3, or (at your option)
+   any later version.
+
+   Libgomp is distributed in the hope that it will be useful, but WITHOUT ANY
+   WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS
+   FOR A PARTICULAR PURPOSE.  See the GNU General Public License for
+   more details.
+
+   Under Section 7 of GPL version 3, you are granted additional
+   permissions described in the GCC Runtime Library Exception, version
+   3.1, as published by the Free Software Foundation.
+
+   You should have received a copy of the GNU General Public License and
+   a copy of the GCC Runtime Library Exception along with this program;
+   see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+   .  */
+
+/* This is a variant of config/posix/sem.h for Tru64 UNIX.  The librt
+   sem_init implementation assumes int (4-byte) alignment for sem_t, while
+   the type only requires short (2-byte) alignment.  This mismatch causes
+   lots of unaligned access warnings from the kernel, so enforce that
+   alignment.  */
+
+#ifndef GOMP_SEM_H
+#define GOMP_SEM_H 1
+
+#include 
+
+typedef sem_t gomp_sem_t __attribute__((aligned (__alignof__ (int))));
+
+static inline void gomp_sem_init (gomp_sem_t *sem, int value)
+{
+  sem_init (sem, 0, value);
+}
+
+extern void gomp_sem_wait (gomp_sem_t *sem);
+
+static inline void gomp_sem_post (gomp_sem_t *sem)
+{
+  sem_post (sem);
+}
+
+static inline void gomp_sem_destroy (gomp_sem_t *sem)
+{
+  sem_destroy (sem);
+}
+#endif /* GOMP_SEM_H  */
diff --git a/libgomp/configure.tgt b/libgomp/configure.tgt
--- a/libgomp/configure.tgt
+++ b/libgomp/configure.tgt
@@ -129,6 +129,11 @@ case "${target}" in
XLDFLAGS="${XLDFLAGS} -lpthread"
;;
 
+  alpha*-dec-osf*)
+   # Use Tru64 UNIX-specific sem.h version.
+   config_path="osf posix"
+   ;;
+
   mips-sgi-irix6*)
# Need to link with -lpthread so libgomp.so is self-contained.
XLDFLAGS="${XLDFLAGS} -lpthread"
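The technique used in the new sem.h, shown on a stand-in type: GCC's aligned attribute on a typedef can raise the alignment of an existing type, so every object declared with the new typedef is int-aligned even if the underlying type only needs short alignment. Here fake_sem_t is a hypothetical 2-byte-aligned struct, not Tru64's actual sem_t:

```c
#include <assert.h>

/* Hypothetical stand-in for a sem_t that only requires short alignment.  */
typedef struct
{
  short count;
  short flags;
} fake_sem_t;

/* Same idea as gomp_sem_t: raise the alignment to that of int.  */
typedef fake_sem_t aligned_fake_sem_t
  __attribute__ ((aligned (__alignof__ (int))));
```

The attribute can only increase alignment, so the typedef is harmless on targets where the underlying type is already int-aligned.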


-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: Fix libgomp alignment errors on Tru64 UNIX (PR libgomp/45351)

2011-07-25 Thread Jakub Jelinek
On Mon, Jul 25, 2011 at 05:04:03PM +0200, Rainer Orth wrote:
> As discussed in the PR, we need to force int alignment for sem_t on
> Tru64 UNIX to work around a bug in the native librt.  The following
> patch does this.
> 
> Tested by rebuilding libgomp, make check is still running.  No failures
> or unaligned access errors so far.
> 
> Ok for mainline if it passes?

Yes.  Thanks.

> 2011-07-22  Rainer Orth  
> 
>   PR libgomp/45351
>   * config/osf/sem.h: New file.
>   * configure.tgt (alpha*-dec-osf*): Prepend osf to config_path.

Jakub


Re: [PATCH 2/3] canonicalize_loop_ivs should not generate unsigned types.

2011-07-25 Thread Sebastian Pop
On Mon, Jul 25, 2011 at 10:01, Richard Guenther
 wrote:
>>  type = TREE_TYPE (res);
>>  if (!is_gimple_reg (res)
>>      || !INTEGRAL_TYPE_P (type)
>>      || TYPE_PRECISION (type) < precision)
>>    continue;
>>
>>  if (TYPE_PRECISION (type) > precision)
>>    unsigned_p = TYPE_UNSIGNED (type);
>>  else
>>    unsigned_p |= TYPE_UNSIGNED (type);
>>
>>  precision = TYPE_PRECISION (type);
>
> Ah, indeed.  Yes, fine with me.

Ok, so let's wait before committing to see what Zdenek says about
the use of INTEGRAL_TYPE_P.

I am now testing this together with the other patches.

Sebastian
From 0c7d8bc8935ac00701735e96fcaa91855e099727 Mon Sep 17 00:00:00 2001
From: Sebastian Pop 
Date: Sun, 24 Jul 2011 01:52:52 -0500
Subject: [PATCH] canonicalize_loop_ivs should not generate unsigned types.

2011-07-23  Sebastian Pop  

	* tree-ssa-loop-manip.c (canonicalize_loop_ivs): Build an unsigned
	iv only when the largest type is unsigned.  Do not call
	lang_hooks.types.type_for_size.

	* testsuite/libgomp.graphite/force-parallel-1.c: Un-xfail.
	* testsuite/libgomp.graphite/force-parallel-2.c: Adjust pattern.
---
 gcc/ChangeLog  |6 ++
 gcc/tree-ssa-loop-manip.c  |   20 +---
 libgomp/ChangeLog  |5 +
 .../testsuite/libgomp.graphite/force-parallel-1.c  |2 +-
 .../testsuite/libgomp.graphite/force-parallel-2.c  |2 +-
 5 files changed, 30 insertions(+), 5 deletions(-)

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index dba2f82..65676cb 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,11 @@
 2011-07-23  Sebastian Pop  
 
+	* tree-ssa-loop-manip.c (canonicalize_loop_ivs): Build an unsigned
+	iv only when the largest type is unsigned.  Do not call
+	lang_hooks.types.type_for_size.
+
+2011-07-23  Sebastian Pop  
+
 	* tree-data-ref.c (max_stmt_executions_tree): Do not call
 	lang_hooks.types.type_for_size.
 
diff --git a/gcc/tree-ssa-loop-manip.c b/gcc/tree-ssa-loop-manip.c
index 8176ed8..f73d2d9 100644
--- a/gcc/tree-ssa-loop-manip.c
+++ b/gcc/tree-ssa-loop-manip.c
@@ -1200,6 +1200,8 @@ canonicalize_loop_ivs (struct loop *loop, tree *nit, bool bump_in_latch)
   gimple stmt;
   edge exit = single_dom_exit (loop);
   gimple_seq stmts;
+  enum machine_mode mode;
+  bool unsigned_p = false;
 
   for (psi = gsi_start_phis (loop->header);
!gsi_end_p (psi); gsi_next (&psi))
@@ -1207,11 +1209,23 @@ canonicalize_loop_ivs (struct loop *loop, tree *nit, bool bump_in_latch)
   gimple phi = gsi_stmt (psi);
   tree res = PHI_RESULT (phi);
 
-  if (is_gimple_reg (res) && TYPE_PRECISION (TREE_TYPE (res)) > precision)
-	precision = TYPE_PRECISION (TREE_TYPE (res));
+  type = TREE_TYPE (res);
+  if (!is_gimple_reg (res)
+	  || !INTEGRAL_TYPE_P (type)
+	  || TYPE_PRECISION (type) < precision)
+	continue;
+
+  if (TYPE_PRECISION (type) > precision)
+	unsigned_p = TYPE_UNSIGNED (type);
+  else
+	unsigned_p |= TYPE_UNSIGNED (type);
+
+  precision = TYPE_PRECISION (type);
 }
 
-  type = lang_hooks.types.type_for_size (precision, 1);
+  mode = smallest_mode_for_size (precision, MODE_INT);
+  precision = GET_MODE_PRECISION (mode);
+  type = build_nonstandard_integer_type (precision, unsigned_p);
 
   if (original_precision != precision)
 {
diff --git a/libgomp/ChangeLog b/libgomp/ChangeLog
index 9225401..d5cd94d 100644
--- a/libgomp/ChangeLog
+++ b/libgomp/ChangeLog
@@ -1,3 +1,8 @@
+2011-07-23  Sebastian Pop  
+
+	* testsuite/libgomp.graphite/force-parallel-1.c: Un-xfail.
+	* testsuite/libgomp.graphite/force-parallel-2.c: Adjust pattern.
+
 2011-07-18  Rainer Orth  
 
 	PR target/49541
diff --git a/libgomp/testsuite/libgomp.graphite/force-parallel-1.c b/libgomp/testsuite/libgomp.graphite/force-parallel-1.c
index 71ed332..7f043d8 100644
--- a/libgomp/testsuite/libgomp.graphite/force-parallel-1.c
+++ b/libgomp/testsuite/libgomp.graphite/force-parallel-1.c
@@ -23,7 +23,7 @@ int main(void)
 }
 
 /* Check that parallel code generation part make the right answer.  */
-/* { dg-final { scan-tree-dump-times "1 loops carried no dependency" 2 "graphite" { xfail *-*-* } } } */
+/* { dg-final { scan-tree-dump-times "1 loops carried no dependency" 2 "graphite" } } */
 /* { dg-final { cleanup-tree-dump "graphite" } } */
 /* { dg-final { scan-tree-dump-times "loopfn" 5 "optimized" } } */
 /* { dg-final { cleanup-tree-dump "parloops" } } */
diff --git a/libgomp/testsuite/libgomp.graphite/force-parallel-2.c b/libgomp/testsuite/libgomp.graphite/force-parallel-2.c
index 1ce0feb..03d8236 100644
--- a/libgomp/testsuite/libgomp.graphite/force-parallel-2.c
+++ b/libgomp/testsuite/libgomp.graphite/force-parallel-2.c
@@ -23,7 +23,7 @@ int main(void)
 }
 
 /* Check that parallel code generation part make the right answer.  */
-/* { dg-final { scan-tree-dump-times "2 loops carried no dependency" 1 "graphite" } } */
+/* { dg-final { scan-tree-dump-times "2 loops carried no dependency" 2 "graphite" } } *

[fixincludes] Fix posix_spawn* declarations in Solaris (PR c++/49347)

2011-07-25 Thread Rainer Orth
As discussed in the PR, the Solaris 10+  header needs a fix to
make it work with g++.  The following patch implements it.

It passed an i386-pc-solaris2.11 bootstrap without regressions, and make
check in fixincludes works without failures.

Ok for mainline?

Thanks.
Rainer


diff --git a/fixincludes/inclhack.def b/fixincludes/inclhack.def
--- a/fixincludes/inclhack.def
+++ b/fixincludes/inclhack.def
@@ -3706,6 +3706,23 @@ fix = {
 
 
 /*
+ * Solaris 10+  uses char *const argv[_RESTRICT_KYWD] in the
+ * posix_spawn declarations, which doesn't work with C++.
+ */
+fix = {
+hackname  = solaris_posix_spawn_restrict;
+files = spawn.h;
+mach  = '*-*-solaris2*';
+c_fix = format;
+c_fix_arg = "%1*_RESTRICT_KYWD %2%3";
+select= "(.*[ \t]+)([a-z]+)\\[_RESTRICT_KYWD\\](.*)";
+test_text =
+"char *const argv[_RESTRICT_KYWD],\n"
+"char *const envp[_RESTRICT_KYWD]);";
+};
+
+
+/*
  * Sun Solaris 8 has what appears to be some gross workaround for
  * some old version of their c++ compiler.  G++ doesn't want it
  * either, but doesn't want to be tied to SunPRO version numbers.
diff --git a/fixincludes/tests/base/spawn.h b/fixincludes/tests/base/spawn.h
new file mode 100644
--- /dev/null
+++ b/fixincludes/tests/base/spawn.h
@@ -0,0 +1,15 @@
+/*  DO NOT EDIT THIS FILE.
+
+It has been auto-edited by fixincludes from:
+
+   "fixinc/tests/inc/spawn.h"
+
+This had to be done to correct non-standard usages in the
+original, manufacturer supplied header file.  */
+
+
+
+#if defined( SOLARIS_POSIX_SPAWN_RESTRICT_CHECK )
+char *const *_RESTRICT_KYWD argv,
+char *const *_RESTRICT_KYWD envp);
+#endif  /* SOLARIS_POSIX_SPAWN_RESTRICT_CHECK */
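The select/c_fix_arg pair in the hack above can be tried outside fixincludes. The following is a minimal sketch, assuming POSIX <regex.h> extended regular expressions behave like the fixincludes matcher for this pattern; fix_restrict_line is a hypothetical helper, not part of fixincludes. It applies the select regex (with the redundant escape before the closing `]` dropped, since `]` is already literal in an ERE) and emulates the "%1*_RESTRICT_KYWD %2%3" format on a single line.

```c
#include <regex.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: apply the solaris_posix_spawn_restrict select
   pattern to LINE and, on a match, write the rewritten line to OUT.
   Returns 1 on a match, 0 otherwise.  */
static int
fix_restrict_line (const char *line, char *out, size_t outlen)
{
  regex_t re;
  regmatch_t m[4];
  int matched = 0;

  if (regcomp (&re, "(.*[ \t]+)([a-z]+)\\[_RESTRICT_KYWD](.*)",
               REG_EXTENDED) != 0)
    return 0;

  if (regexec (&re, line, 4, m, 0) == 0)
    {
      /* Emulate c_fix_arg "%1*_RESTRICT_KYWD %2%3": turn
         "argv[_RESTRICT_KYWD]" into "*_RESTRICT_KYWD argv".  */
      snprintf (out, outlen, "%.*s*_RESTRICT_KYWD %.*s%.*s",
                (int) (m[1].rm_eo - m[1].rm_so), line + m[1].rm_so,
                (int) (m[2].rm_eo - m[2].rm_so), line + m[2].rm_so,
                (int) (m[3].rm_eo - m[3].rm_so), line + m[3].rm_so);
      matched = 1;
    }
  regfree (&re);
  return matched;
}
```

Fed the two test_text lines, it produces exactly the two declarations in the expected tests/base/spawn.h output.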

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [C++ Patch] PR 49838

2011-07-25 Thread Jason Merrill
Ok.

Re: Fix pass_partition_blocks vs -O0

2011-07-25 Thread Richard Henderson
On 07/25/2011 06:02 AM, Michael Matz wrote:
> Hi,
> 
> On Fri, 22 Jul 2011, Richard Henderson wrote:
> 
>> Well, technically it's not "broken" yet.  It will be as soon as it starts
>> touching DF data, since this pass runs before pass_df_initialize_no_opt.
>>
>> But the only real consumer of BB_PARTITION is pass_reorder_blocks.  And
>> that pass is already gated to only run if optimization is enabled.  So
>> really there's no point in running this pass without optimization.
>>
>> Committed.
> 
> Why not simply move pass_df_initialize_no_opt earlier?  Introducing more 
> checks on optimize instead of only relying on flag_xxx seems to go the 
> wrong direction.

Primarily because of pass_reorder_blocks not running.


r~


Re: [patch] Fix PR tree-optimization/49771

2011-07-25 Thread Ulrich Weigand
Richard Guenther wrote:
> On Mon, Jul 25, 2011 at 4:23 PM, Ulrich Weigand  wrote:
> > I had always understood this to reflect the simple fact that a
> > pointer to some type must never hold a value that is not properly
> > aligned for that type.  (Maybe this is only true on STRICT_ALIGNMENT
> > targets?)   This has always been an important property to generate
> > good code on SPU ...
> 
> We do not preserve pointer type casts in the middle-end (anymore).

Huh, OK.  I was not aware of that ...

> >> nonzero_bits1 seems to be the only consumer of REGNO_POINTER_ALIGN
> >> apart from maybe alpha.c and spu.c.

There's also a use in find_reloads_subreg_address, as well as in the
i386/predicates.md and arm/arm.md files.

> > This means I need to generate a rotate to fix up the value that was
> > loaded by the (forced aligned) load instruction.  However, the form
> > of this rotate can be simpler if I know that e.g. reg X is always
> > guaranteed to be 128-bits aligned and only reg Y introduces the
> > potential misalignment.  If on the other hand neither of the base
> > registers is guaranteed to be 128-bit aligned, I need to generate
> > more complex rotate code ...
> 
> Because then you need the value of X + Y instead of just picking either?

Yes, exactly.
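The rotate issue can be illustrated with a host-side C model. This is a sketch under stated assumptions, not SPU code: memory is a plain byte array indexed by address, a quadword is 16 bytes, the rotate is a left rotate by bytes, and load_rotated and rotate_count are hypothetical names. It shows why knowing that X is 16-byte aligned lets the rotate count depend on Y alone, while otherwise the value of X + Y has to be computed first.

```c
#include <stdint.h>

/* Hypothetical model of a forced-aligned quadword load followed by a
   byte rotate: fetch the 16-byte block containing ADDR from MEM and
   rotate it left by (ADDR & 15) bytes, so the bytes starting at ADDR
   end up at the front of OUT.  */
static void
load_rotated (const uint8_t *mem, uintptr_t addr, uint8_t out[16])
{
  uintptr_t base = addr & ~(uintptr_t) 15;   /* aligned load address */
  unsigned rot = (unsigned) (addr & 15);     /* rotate count in bytes */

  for (unsigned i = 0; i < 16; i++)
    out[i] = mem[base + ((i + rot) & 15)];
}

/* Rotate count for an address of the form X + Y.  If X is known to be
   16-byte aligned, Y alone determines it; otherwise the sum is needed.  */
static unsigned
rotate_count (uintptr_t x, uintptr_t y, int x_known_aligned)
{
  return (unsigned) ((x_known_aligned ? y : x + y) & 15);
}
```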

> Why not expand this explicitly when you still have the per-SSA name
> alignment information around?

When would that be?  The expansion does happen in the initial expand
stage, but I'm getting called from the middle-end via emit_move_insn etc.
which already provides me with a MEM ...

Can I use REG_ATTRS->decl to get at the register's DECL and use
get_pointer_alignment on that?  [ On the other hand, don't we have
the same problems with reliability of REG_ATTRS that we have with
REGNO_POINTER_ALIGN, given e.g. the coalescing you mentioned? ]

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com


[testsuite] Restore g++.dg/torture/pr49309.C (PR testsuite/49753)

2011-07-25 Thread Rainer Orth
As previously discussed, I've returned the pr49309.C test to
gcc/testsuite, given that it's a compiler and not a runtime library
test.

Tested on i386-pc-solaris2.11, installed on mainline and 4.6 branch.

Rainer


2011-07-18  Rainer Orth  

gcc/testsuite:
PR testsuite/49753
* g++.dg/torture/pr49309.C: Add -fpreprocessed to dg-options.

Revert:
2011-07-15  Jakub Jelinek  

PR testsuite/49753
* g++.dg/torture/pr49309.C: Remove.

libmudflap:
Revert:
2011-07-15  Jakub Jelinek  

PR testsuite/49753
PR tree-optimization/49309
* testsuite/libmudflap.c++/pass68-frag.cxx: New test.

diff --git a/libmudflap/testsuite/libmudflap.c++/pass68-frag.cxx b/gcc/testsuite/g++.dg/torture/pr49309.C
rename from libmudflap/testsuite/libmudflap.c++/pass68-frag.cxx
rename to gcc/testsuite/g++.dg/torture/pr49309.C
--- a/libmudflap/testsuite/libmudflap.c++/pass68-frag.cxx
+++ b/gcc/testsuite/g++.dg/torture/pr49309.C
@@ -1,6 +1,6 @@
 // PR tree-optimization/49309
 // { dg-do compile }
-// { dg-options "-fmudflap" }
+// { dg-options "-fpreprocessed -fmudflap" }
 
 struct A
 {

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [fixincludes] Fix posix_spawn* declarations in Solaris (PR c++/49347)

2011-07-25 Thread Bruce Korb

On 07/25/11 08:17, Rainer Orth wrote:

> As discussed in the PR, the Solaris 10+ <spawn.h> header needs a fix to
> make it work with g++.  The following patch implements it.
>
> It passed an i386-pc-solaris2.11 bootstrap without regressions and make
> check in fixincludes works without failures.
>
> Ok for mainline?


Hi Rainer,

> some-function(char *const argv[_RESTRICT_KYWD], ...)

looks pretty broken to me.  How would it work with plain gcc?
Anyway, editing the _RESTRICT_KYWD into the correct place looks
correct to me, and I'm sure you tested.  "Ship it".  Methinks
all active branches, too.


Re: [fixincludes] Fix posix_spawn* declarations in Solaris (PR c++/49347)

2011-07-25 Thread Rainer Orth
Hi Bruce,

>> some-function(char *const argv[_RESTRICT_KYWD], ...)
>
> looks pretty broken to me.  How would it work with plain gcc?

no idea, but both gcc -std=c99 and Sun Studio cc -xc99 do accept it.

> Anyway, editing the _RESTRICT_KYWD into the correct place looks
> correct to me, and I'm sure you tested.  "Ship it".  Methinks
> all active branches, too.

I'll restrict it to mainline and the 4.6 branch since this is the first
one that had the fix for

#define _RESTRICT_KYWD __restrict

in .

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: Fix pass_partition_blocks vs -O0

2011-07-25 Thread Michael Matz
Hi,

On Mon, 25 Jul 2011, Richard Henderson wrote:

> >> But the only real consumer of BB_PARTITION is pass_reorder_blocks.  
> >> And that pass is already gated to only run if optimization is 
> >> enabled.  So really there's no point in running this pass without 
> >> optimization.
> >>
> >> Committed.
> > 
> > Why not simply move pass_df_initialize_no_opt earlier?  Introducing 
> > more checks on optimize instead of only relying on flag_xxx seems to 
> > go the wrong direction.
> 
> Primarily because of pass_reorder_blocks not running.

Yeah, well, that just means pass_reorder_blocks also could do with a 
better gate (indeed it doesn't do much except a CFG cleanup when neither 
flag_reorder_blocks nor flag_reorder_blocks_and_partition is set).  
Granted there are currently > 320 tests for optimize, so we'll survive one 
more, it just felt wrong.


Ciao,
Michael.


Re: [PR43597, ARM, TESTCASE]

2011-07-25 Thread Tom de Vries
Hi,

thanks for the review.

On 07/18/2011 03:19 PM, Richard Earnshaw wrote:
> On 18/07/11 12:09, Tom de Vries wrote:
>> Hi,
>>
>> PR43597 was fixed by 
>> http://gcc.gnu.org/viewcvs?view=revision&revision=172032.
>>
>> This patch adds a testcase.
>>
>> OK for trunk?
>>
>> Thanks,
>> - Tom
>>
>> 2011-07-18  Tom de Vries  
>>
>> PR target/43597
>> * gcc.target/arm/pr43597.c: New test.
>>
>>
> 
> No, don't pass -mthumb through dg-options unless you're using something
> like require-effective-target.
> 

OK, I see.

> In this case the post-compile tests are all gated on thumb2.  So why not
> make the whole test just require arm_thumb2_ok?
> 
> R.
> 

Done. OK for trunk?

Thanks,
- Tom

2011-07-25  Tom de Vries  

PR target/43597
* gcc.target/arm/pr43597.c: New test.
Index: gcc.target/arm/pr43597.c
===
--- gcc.target/arm/pr43597.c	(revision 0)
+++ gcc.target/arm/pr43597.c	(revision 0)
@@ -0,0 +1,28 @@
+/* { dg-do assemble } */
+/* { dg-options "-Os -save-temps -mthumb" } */
+/* { dg-require-effective-target arm_thumb2_ok } */
+
+extern int bar ();
+extern void bar2 (int);
+
+int
+foo4 ()
+{
+  int result = 0;
+  int f = -1;
+  f = bar ();
+  if (f < 0)
+{
+  result = 1;
+  goto bail;
+}
+  bar ();
+ bail:
+  bar2 (f);
+  return result;
+}
+
+/* { dg-final { scan-assembler-times "sub" 1 } } */
+/* { dg-final { scan-assembler-times "cmp" 0 } } */
+/* { dg-final { object-size text <= 30 } } */
+/* { dg-final { cleanup-saved-temps "pr43597" } } */


Re: [ARM] Fix PR49641

2011-07-25 Thread Bernd Schmidt
On 07/13/11 16:01, Richard Earnshaw wrote:
> On 07/07/11 21:02, Bernd Schmidt wrote:
>> This corrects an error in store_multiple_operation. We're only
>> generating the writeback version of the instruction on Thumb-1, so
>> that's where we must make sure the base register isn't also stored.
>>
>> The ARMv7 manual is unfortunately not totally clear that this does in
>> fact produce unpredictable results; it seems to suggest that this is the
>> case only for the T2 encoding. Older documentation makes it clear.
>>
>> Tested on arm-eabi{,mthumb}.
>>
> 
> I agree that the wording here is unclear, but the pseudo code for the
> decode makes the situation clearer, and does reflect what I really
> believe to be the case.  Put explicitly:

[...]

I just remembered this patch. Your reply didn't actually comment on it,
so - ok to install?


bernd
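The constraint Bernd describes can be stated on a register bitmask. This is a minimal sketch, assuming a plain bitmask representation rather than the RTL PARALLEL that store_multiple_operation actually inspects; thumb1_stm_ok is a hypothetical name.

```c
#include <stdbool.h>

/* Hypothetical check following the rule stated in this thread: on
   Thumb-1 only the writeback form of the store-multiple is generated, so
   the base register must not also be in the list of stored registers.
   REGLIST is a bitmask with bit N set when rN is stored (N < 16).  */
static bool
thumb1_stm_ok (unsigned base_regno, unsigned reglist)
{
  return (reglist & (1u << base_regno)) == 0;
}
```

For example, storing {r1, r2} (mask 0x6) is fine with base r0 or r3 but not with base r1 or r2.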


Re: [Patch,AVR]: PR49687 (better widening 32-bit mul)

2011-07-25 Thread Georg-Johann Lay
Weddington, Eric wrote:
> 
>> Eric, can you review the assembler routines and say if such reuse is ok or
>> if you'd prefer a speed-optimized version of __mulsi3 like in the current
>> libgcc?
> 
> Hi Johann,
> 
> Typically a penalty on speed is preferred over a penalty on code size. Do you
> already have information on how it compares on code size with the old
> routines?
> 
> Eric

The old sizes (in bytes) are:

62 __mulsi3
26 __mulhisi3
22 __umulhisi3
10 __xmulhisi3

where the __[u]mulhisi3 will drag in __xmulhisi3 and the insns don't combine
with constants.

The new implementation has more fragments; the indented modules are
dragged in (i.e. used) by the respective function:

12 __mulhisi3
 __umulhisi3
 __usmulhisi3_tail

30 __umulhisi3

02 __usmulhisi3
10 __usmulhisi3_tail

20 __muluhisi3
 __umulhisi3

08 __mulohisi3
04 __mulshisi3
 __muluhisi3

30 __mulsi3
 __muluhisi3

This means that a pure __mulsi3 will have 30+30+20 = 80 bytes (+18).

If all functions are used, they occupy 116 bytes (-4), so they actually
save a little space, with the added benefit that they can also one-extend,
handle 32 = 16*32 as well as 32 = 16*16, and work for small (17-bit
signed) constants.

__umulhisi3 reads:

DEFUN __umulhisi3
mul A0, B0
movw C0, r0
mul A1, B1
movw C2, r0
mul A0, B1
add C1, r0
adc C2, r1
clr __zero_reg__
adc C3, __zero_reg__
mul A1, B0
add C1, r0
adc C2, r1
clr __zero_reg__
adc C3, __zero_reg__
ret
ENDF __umulhisi3
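To check the partial-product scheme of this routine on a host, here is a small C model written for illustration only (umulhisi3_model is a made-up name, and register moves are folded into plain arithmetic). Because A0*B0 and A1*B1 each fit in 16 bits, the two movw steps can be combined with OR; the two cross products are then added at byte offset 1, which is what each add/adc/adc chain with the cleared __zero_reg__ does.

```c
#include <stdint.h>

/* C model of the quoted __umulhisi3: a 16 x 16 -> 32 bit unsigned
   multiply assembled from four 8 x 8 -> 16 bit MUL results.
   A = A1:A0, B = B1:B0, result C = C3:C2:C1:C0.  */
static uint32_t
umulhisi3_model (uint16_t a, uint16_t b)
{
  uint8_t a0 = a & 0xff, a1 = a >> 8;
  uint8_t b0 = b & 0xff, b1 = b >> 8;
  uint32_t c;

  c = (uint32_t) a0 * b0;              /* mul A0,B0 ; movw C0,r0 */
  c |= ((uint32_t) a1 * b1) << 16;     /* mul A1,B1 ; movw C2,r0 */
  c += ((uint32_t) a0 * b1) << 8;      /* mul A0,B1 ; add/adc into C1..C3 */
  c += ((uint32_t) a1 * b0) << 8;      /* mul A1,B0 ; add/adc into C1..C3 */
  return c;
}
```

The final sum cannot overflow 32 bits, since 0xffff * 0xffff = 0xfffe0001.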

It could be compressed to the following sequence, i.e.
24 bytes instead of 30, but I think that is going too far
in squeezing the last byte out of the code:

DEFUN __umulhisi3
mul A0, B0
movw C0, r0
mul A1, B1
movw C2, r0
mul A0, B1
rcall   1f
mul A1, B0
1:  add C1, r0
adc C2, r1
clr __zero_reg__
adc C3, __zero_reg__
ret
ENDF __umulhisi3
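The compressed variant saves bytes by running the shared add/adc tail twice: once via `rcall 1f` for A0*B1, and once by falling through into label 1 for A1*B0. A C sketch of the same factoring, assuming the tail can be modelled as "add r1:r0 at byte offset 1" (add_at_c1 and umulhisi3_compact_model are hypothetical names), confirms it computes the same product:

```c
#include <stdint.h>

/* Models the shared tail at label 1: add the 16-bit product in r1:r0 at
   byte offset 1 (C1), letting the carry ripple into C2 and C3.  */
static uint32_t
add_at_c1 (uint32_t c, uint16_t r1r0)
{
  return c + ((uint32_t) r1r0 << 8);
}

static uint32_t
umulhisi3_compact_model (uint16_t a, uint16_t b)
{
  uint8_t a0 = a & 0xff, a1 = a >> 8;
  uint8_t b0 = b & 0xff, b1 = b >> 8;
  uint32_t c = ((uint32_t) a0 * b0) | (((uint32_t) a1 * b1) << 16);

  c = add_at_c1 (c, (uint16_t) (a0 * b1));  /* mul A0,B1 ; rcall 1f */
  c = add_at_c1 (c, (uint16_t) (a1 * b0));  /* mul A1,B0 ; fall into 1: */
  return c;
}
```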


In the absence of real-world code that uses 32-bit arithmetic, I trust
my intuition that code size will decrease in general ;-)

Tiny examples are sometimes misleading because of additional moves from
unpleasant register allocation, but that's a different story...

Johann


[testsuite] Provide and use mmap effective-target keyword

2011-07-25 Thread Rainer Orth
When last week a testcase using mmap was posted with a copy of some old
(and wrong) list of targets supporting mmap, I noticed what a mess we have
here.  To fix this, I've introduced a new effective-target keyword mmap
and use it in all testcases.

Two minor changes to the tests were required:

* gcc.dg/20030711-1.c and gcc.dg/20050826-1.c failed to compile on IRIX
  which doesn't have MAP_ANON.

* gcc.dg/vect/pr49038.c must not use dg-do run: on targets that cannot
  execute SSE insns, such as Solaris 8/x86, the vect.dg tests are usually
  demoted to compile tests, which an explicit dg-do run defeats.

With those changes, I could successfully run the tests on
i386-pc-solaris2.8, i386-pc-solaris2.11, alpha-dec-osf5.1b,
mips-sgi-irix6.5, powerpc-apple-darwin9.8.0 and i386-apple-darwin9.8.0.

Given this wide range of working systems, I think it's reasonably safe to
install this patch, thus: installed on mainline.
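The "(MAP_ANON): Provide default" change relies on macros being expanded lazily: MAP_ANONYMOUS expands to MAP_ANON, which in turn expands to 0 on a system such as IRIX that provides neither. A common way for portable code to still get anonymous memory there is to map /dev/zero instead. The sketch below assumes that approach; map_anon_pages is a hypothetical helper and is not taken from the patched tests.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON
#endif
#ifndef MAP_ANON
#define MAP_ANON 0
#endif

/* Return LEN bytes of zero-filled, private, read-write memory, or
   MAP_FAILED.  When MAP_ANONYMOUS ends up as 0 (no anonymous mappings),
   fall back to mapping /dev/zero, which yields the same result.  */
static void *
map_anon_pages (size_t len)
{
  int fd = -1;
  void *p;

  if (MAP_ANONYMOUS == 0)
    {
      fd = open ("/dev/zero", O_RDWR);
      if (fd < 0)
        return MAP_FAILED;
    }
  p = mmap (NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS,
            fd, 0);
  if (fd >= 0)
    close (fd);                 /* the mapping stays valid after close */
  return p;
}
```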

Rainer


2011-07-23  Rainer Orth  

gcc:
* doc/sourcebuild.texi (Effective-Target Keywords, Environment
attributes): Document mmap.

gcc/testsuite:
* lib/target-supports.exp (check_effective_target_mmap): New proc.

* gcc.c-torture/execute/loop-2f.c: Remove #ifdef __unix__.
* gcc.c-torture/execute/loop-2g.c: Likewise.
* gcc.c-torture/execute/loop-2f.x: Load target-supports.exp.
Require mmap support.
* gcc.c-torture/execute/loop-2g.x: Likewise.
* gcc.dg/20030711-1.c: Replace dg-do target list by mmap.
(MAP_ANON): Provide default.
* gcc.dg/20050826-1.c: Likewise.
* gcc.target/i386/pr36533.c: Likewise.
* gcc.dg/vect/pr49038.c: Remove dg-do run.
Use dg-require-effective-target mmap.

diff --git a/gcc/doc/sourcebuild.texi b/gcc/doc/sourcebuild.texi
--- a/gcc/doc/sourcebuild.texi
+++ b/gcc/doc/sourcebuild.texi
@@ -1679,6 +1679,9 @@ Target might have errors of a few ULP in
 conversion functions and overflow is not always detected correctly by
 those functions.
 
+@item mmap
+Target supports @code{mmap}.
+
 @item newlib
 Target supports Newlib.
 
diff --git a/gcc/testsuite/gcc.c-torture/execute/loop-2f.c b/gcc/testsuite/gcc.c-torture/execute/loop-2f.c
--- a/gcc/testsuite/gcc.c-torture/execute/loop-2f.c
+++ b/gcc/testsuite/gcc.c-torture/execute/loop-2f.c
@@ -1,6 +1,5 @@
 #include 
 
-#ifdef __unix__ /* ??? Is that good enough? */
 #include 
 #include 
 #include 
@@ -18,7 +17,6 @@
 #ifndef MAP_FIXED
 #define MAP_FIXED 0
 #endif
-#endif
 
 #define MAP_START (void *)0x7fff8000
 #define MAP_LEN 0x1
diff --git a/gcc/testsuite/gcc.c-torture/execute/loop-2f.x b/gcc/testsuite/gcc.c-torture/execute/loop-2f.x
--- a/gcc/testsuite/gcc.c-torture/execute/loop-2f.x
+++ b/gcc/testsuite/gcc.c-torture/execute/loop-2f.x
@@ -1,3 +1,9 @@
+load_lib target-supports.exp
+
+if { ! [check_effective_target_mmap] } {
+return 1
+}
+
 if [istarget "m68k-*-linux*"] {
 # the executable is at the same position the test tries to remap
 return 1
diff --git a/gcc/testsuite/gcc.c-torture/execute/loop-2g.c b/gcc/testsuite/gcc.c-torture/execute/loop-2g.c
--- a/gcc/testsuite/gcc.c-torture/execute/loop-2g.c
+++ b/gcc/testsuite/gcc.c-torture/execute/loop-2g.c
@@ -1,6 +1,5 @@
 #include 
 
-#ifdef __unix__ /* ??? Is that good enough? */
 #include 
 #include 
 #include 
@@ -18,7 +17,6 @@
 #ifndef MAP_FIXED
 #define MAP_FIXED 0
 #endif
-#endif
 
 #define MAP_START (void *)0x7fff8000
 #define MAP_LEN 0x1
diff --git a/gcc/testsuite/gcc.c-torture/execute/loop-2g.x b/gcc/testsuite/gcc.c-torture/execute/loop-2g.x
--- a/gcc/testsuite/gcc.c-torture/execute/loop-2g.x
+++ b/gcc/testsuite/gcc.c-torture/execute/loop-2g.x
@@ -1,3 +1,9 @@
+load_lib target-supports.exp
+
+if { ! [check_effective_target_mmap] } {
+return 1
+}
+
 if [istarget "m68k-*-linux*"] {
 # the executable is at the same position the test tries to remap
 return 1
diff --git a/gcc/testsuite/gcc.dg/20030711-1.c b/gcc/testsuite/gcc.dg/20030711-1.c
--- a/gcc/testsuite/gcc.dg/20030711-1.c
+++ b/gcc/testsuite/gcc.dg/20030711-1.c
@@ -1,6 +1,6 @@
 /* Test whether strncmp has not been "optimized" into memcmp
nor any code with memcmp semantics.  */
-/* { dg-do run { target i?86-*-linux* x86_64-*-linux* ia64-*-linux* alpha*-*-linux* powerpc*-*-linux* s390*-*-linux* sparc*-*-linux* *-*-darwin* } } */
+/* { dg-do run { target mmap } } */
 /* { dg-options "-O2" } */
 #include 
 #include 
@@ -8,6 +8,9 @@
 #ifndef MAP_ANONYMOUS
 #define MAP_ANONYMOUS MAP_ANON
 #endif
+#ifndef MAP_ANON
+#define MAP_ANON 0
+#endif
 #include 
 
 void __attribute__((noinline)) test (const char *p)
diff --git a/gcc/testsuite/gcc.dg/20050826-1.c b/gcc/testsuite/gcc.dg/20050826-1.c
--- a/gcc/testsuite/gcc.dg/20050826-1.c
+++ b/gcc/testsuite/gcc.dg/20050826-1.c
@@ -1,6 +1,6 @@
 /* Test whether strncmp has not been "optimized" into memcmp
nor any code with memcmp semantics.  */
-/* { dg-do run { target i?86-*-linux* x86_64-*-linux* ia64-*-linux* 

[PATCH] Fix PR47594: Sign extend constants while translating to Graphite

2011-07-25 Thread Sebastian Pop
"Bug 47594 - gfortran.dg/vect/vect-5.f90 execution test fails when
compiled with -O2 -fgraphite-identity"

The problem is due to the fact that Graphite generates this loop:

for (scat_3=0;scat_3<=4294967295*scat_1+T_51-1;scat_3++) {
  S6(scat_1,scat_3);
}

that has a "-1" encoded as an unsigned "4294967295".  This constant
comes from the computation of the number of iterations "M - I" of
the inner loop:

do I = 1, N
  do J = I, M
A(J,2) = B(J)
  end do
end do

The patch fixes the problem by sign-extending the constants for the
step of a chain of recurrence in scan_tree_for_params_right_scev.

The same pattern could occur for multiplication by a scalar, like in
"-1 * N" and so the patch also fixes these cases in
scan_tree_for_params.
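The effect of the double_int_sext call can be seen in isolation. The sketch below assumes a 64-bit host integer in place of GCC's double_int (sext is a hypothetical name): it sign-extends the low PREC bits of a value, so the 32-bit pattern 4294967295 from the generated loop bound becomes -1 rather than 2^32 - 1.

```c
#include <stdint.h>

/* Sign-extend the low PREC bits of V, for 1 <= PREC <= 64.  Flipping the
   sign bit and subtracting it maps [0, 2^PREC) onto
   [-2^(PREC-1), 2^(PREC-1)).  */
static int64_t
sext (uint64_t v, unsigned prec)
{
  if (prec < 64)
    {
      uint64_t mask = (UINT64_C (1) << prec) - 1;
      uint64_t sign = UINT64_C (1) << (prec - 1);

      v &= mask;
      v = (v ^ sign) - sign;
    }
  return (int64_t) v;
}
```

Without this step, a step of "-1" in a 32-bit type reaches GMP as the large positive value 4294967295, which is what the scat_3 loop bound shows.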

Bootstrapped and tested on amd64-linux.

2011-07-23  Sebastian Pop  

PR middle-end/47594
* graphite-sese-to-poly.c (scan_tree_for_params_right_scev): Sign
extend constants.
(scan_tree_for_params): Same.

* gfortran.dg/graphite/run-id-pr47594.f90: New.
---
 gcc/ChangeLog  |7 
 gcc/graphite-sese-to-poly.c|   26 --
 gcc/testsuite/ChangeLog|5 +++
 .../gfortran.dg/graphite/run-id-pr47594.f90|   35 
 4 files changed, 69 insertions(+), 4 deletions(-)
 create mode 100644 gcc/testsuite/gfortran.dg/graphite/run-id-pr47594.f90

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 65676cb..f7e2f7d 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,5 +1,12 @@
 2011-07-23  Sebastian Pop  
 
+   PR middle-end/47594
+   * graphite-sese-to-poly.c (scan_tree_for_params_right_scev): Sign
+   extend constants.
+   (scan_tree_for_params): Same.
+
+2011-07-23  Sebastian Pop  
+
* tree-ssa-loop-manip.c (canonicalize_loop_ivs): Build an unsigned
iv only when the largest type is unsigned.  Do not call
lang_hooks.types.type_for_size.
diff --git a/gcc/graphite-sese-to-poly.c b/gcc/graphite-sese-to-poly.c
index 7e23c9d..5c9e984 100644
--- a/gcc/graphite-sese-to-poly.c
+++ b/gcc/graphite-sese-to-poly.c
@@ -633,7 +633,11 @@ scan_tree_for_params_right_scev (sese s, tree e, int var,
   gcc_assert (TREE_CODE (e) == INTEGER_CST);
 
   mpz_init (val);
-  tree_int_to_gmp (e, val);
+
+  /* Necessary to not get "-1 = 2^n - 1". */
+  mpz_set_double_int
+   (val, double_int_sext (tree_to_double_int (e),
+  TYPE_PRECISION (TREE_TYPE (e))), false);
   add_value_to_dim (l, expr, val);
   mpz_clear (val);
 }
@@ -729,9 +733,16 @@ scan_tree_for_params (sese s, tree e, ppl_Linear_Expression_t c,
  if (c)
{
  mpz_t val;
- gcc_assert (host_integerp (TREE_OPERAND (e, 1), 0));
+ tree cst = TREE_OPERAND (e, 1);
+
+ gcc_assert (host_integerp (cst, 0));
  mpz_init (val);
- tree_int_to_gmp (TREE_OPERAND (e, 1), val);
+
+ /* Necessary to not get "-1 = 2^n - 1". */
+ mpz_set_double_int
+   (val, double_int_sext (tree_to_double_int (cst),
+  TYPE_PRECISION (TREE_TYPE (cst))), false);
+
  mpz_mul (val, val, k);
  scan_tree_for_params (s, TREE_OPERAND (e, 0), c, val);
  mpz_clear (val);
@@ -744,9 +755,16 @@ scan_tree_for_params (sese s, tree e, ppl_Linear_Expression_t c,
  if (c)
{
  mpz_t val;
+ tree cst = TREE_OPERAND (e, 0);
+
  gcc_assert (host_integerp (TREE_OPERAND (e, 0), 0));
  mpz_init (val);
- tree_int_to_gmp (TREE_OPERAND (e, 0), val);
+
+ /* Necessary to not get "-1 = 2^n - 1". */
+ mpz_set_double_int
+   (val, double_int_sext (tree_to_double_int (cst),
+  TYPE_PRECISION (TREE_TYPE (cst))), false);
+
  mpz_mul (val, val, k);
  scan_tree_for_params (s, TREE_OPERAND (e, 1), c, val);
  mpz_clear (val);
diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog
index 1f93f4c..b7c2be3 100644
--- a/gcc/testsuite/ChangeLog
+++ b/gcc/testsuite/ChangeLog
@@ -1,5 +1,10 @@
 2011-07-23  Sebastian Pop  
 
+   PR middle-end/47594
+   * gfortran.dg/graphite/run-id-pr47594.f90: New.
+
+2011-07-23  Sebastian Pop  
+
PR middle-end/47653
* gcc.dg/graphite/run-id-pr47653.c: New.
* gcc.dg/graphite/interchange-3.c: Do not use unsigned types for
diff --git a/gcc/testsuite/gfortran.dg/graphite/run-id-pr47594.f90 b/gcc/testsuite/gfortran.dg/graphite/run-id-pr47594.f90
new file mode 100644
index 000..7f36fc6
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/graphite/run-id-pr47594.f90
@@ -0,0 +1,35 @@
+! { dg-options "-O2 -fgraphite-identity" }
+
+Subroutine foo (N, M)
+Integer N
+

Re: CFT: Move unwinder to toplevel libgcc

2011-07-25 Thread Rainer Orth
Steve,

> Well, I see "-Wl,--version-script=libgcc.map" on the link line now but I
> still get an error during the link:
>
> /wsp/sje/gcc_git/gcc-ia64-debian-linux-gnu-gcc/ia64-debian-linux-gnu/bin/ld: ./libgcc_s.so.1.tmp: version node not found for symbol _Unwind_GetBSP@GCC_3.3.2
> /wsp/sje/gcc_git/gcc-ia64-debian-linux-gnu-gcc/ia64-debian-linux-gnu/bin/ld: failed to set dynamic section sizes: Bad value
> collect2: error: ld returned 1 exit status
> make[3]: *** [libgcc_s.so] Error 1
> make[3]: Leaving directory `/wsp/sje/gcc_git/build-ia64-debian-linux-gnu-gcc/obj_gcc/ia64-debian-linux-gnu/libgcc'
> make[2]: *** [all-stage1-target-libgcc] Error 2
> make[2]: Leaving directory `/wsp/sje/gcc_git/build-ia64-debian-linux-gnu-gcc/obj_gcc'
> make[1]: *** [stage1-bubble] Error 2
> make[1]: Leaving directory `/wsp/sje/gcc_git/build-ia64-debian-linux-gnu-gcc/obj_gcc'
> make: *** [bootstrap] Error 2
>
> I think the contents of the map file may be wrong.  This error involves a
> different symbol than when the mapfile was missing.

I'm convinced now that this is the wrong approach.  All we need for
libunwind is a couple of common definitions that happen to only live in
t-slibgcc at the moment.  But including t-slibgcc and dependencies opens
a can of worms, so it's far easier to just provide the definitions
t-libunwind-elf needs ourselves.  So could you

* remove all the t-slibgcc* and related files (t-linux) from tmake_file
  in libgcc/config.host and

* add the following at the top of libgcc/config/t-libunwind-elf:

SHLIB_SOLINK = @shlib_base_name@.so
SHLIB_OBJS = @shlib_objs@
SHLIB_DIR = @multilib_dir@
SHLIB_SLIBDIR_QUAL = @shlib_slibdir_qual@

As you can see, these four variables (the only SHLIB_* ones
t-libunwind-elf uses) are substituted by libgcc/Makefile.in and are
completely generic.

I'll be working on the SHLIB_* move to toplevel libgcc next, so if all
else fails, we could handle all that SHLIB stuff there.

Thanks.
Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [build, ada] Allow Solaris bootstrap with C++ (PR bootstrap/49794)

2011-07-25 Thread Rainer Orth
Paolo,

> On Wed, Jul 20, 2011 at 18:35, Rainer Orth wrote:
>>  I've hacked around this by wrapping the AM_ICONV calls in
>>  AC_LANG_{PUSH, POP}(C++), but I think this exposes a fundamental
>>  issue: the configure tests must be performed with the compiler used
>>  for the build.  That this works without is pure luck IMO.
>
> Right, but it also applies to more than this test.  If you wrapped in
> ifs more than just this call, the approach may be fine, but first I
> would like to look at Autoconf (or at the diff for the regenerated
> configure) to check that what you're doing is safe.  I'm afraid it may

the configure diff looked completely innocent to me, but that may be
just me ;-(

> not be, which means your patch is not good for the configure part.
> :(

Sorry, I only saw your mail after the weekend, and have already checked
in the patch based on Ian's approval.  At least so far I've not become
aware of any problems caused by the patch.

>> * Also, the definition of HAVE_DESIGNATED_INITIALIZERS was wrong for g++
>>  on Solaris which again defines __STDC_VERSION__ 199901L.  To fix this,
>>  I never define H_D_I if __cplusplus.
>
> Is this a valid definition of __STDC_VERSION__ at all?

Why wouldn't it be?  It's the standard C99 value.
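The guard Rainer describes can be sketched as a preprocessor test. This is a hypothetical illustration, not the actual gcc/system.h text: it claims designated-initializer support only for a C99-or-later C compiler and never under __cplusplus, even when a C++ compiler (as g++ on Solaris does) also defines __STDC_VERSION__ as 199901L. The struct and make_point are made up for the demonstration.

```c
/* Hypothetical guard in the spirit of the fix described above.  */
#if !defined (__cplusplus) && defined (__STDC_VERSION__) \
    && __STDC_VERSION__ >= 199901L
# define HAVE_DESIGNATED_INITIALIZERS 1
#else
# define HAVE_DESIGNATED_INITIALIZERS 0
#endif

struct point { int x, y; };

static struct point
make_point (void)
{
#if HAVE_DESIGNATED_INITIALIZERS
  struct point p = { .y = 4, .x = 3 };   /* C99 designated initializers */
#else
  struct point p = { 3, 4 };             /* positional fallback */
#endif
  return p;
}
```

Both branches build the same value, so code using the macro behaves identically whichever way it resolves.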

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: PR 45819 - possible fix?

2011-07-25 Thread DJ Delorie

> Fact is that GCC knows that memory is not properly aligned.

So in the impossibly rare case that gcc is *wrong*, how is the
programmer supposed to tell gcc that?  I mean, gcc 4.4 has been doing
what the programmer wanted, and zillions of ARM devices have been
happily working, and now you tell me they should have been segfaulting
for the last N years.  Surely there's a chance that the ARM developers
know what they're talking about, and have been desperately trying to
convince gcc to stop trying to second-guess them?

I mean, what else should the user expect when they cast a random value
to a "volatile uint32_t *" and dereference it?  I would have expected
gcc to preserve the load *exactly* as the user specified it, not
convert that one load into FOUR loads.


Re: IA64 HP-UX bootstrap with C++

2011-07-25 Thread Rainer Orth
Joseph,

> Are -static-libstdc++ and -static-libgcc not working for you (with the 
> stage 1 compiler when it's used to link stage 2, and the stage 2 compiler 
> used to link stage 3)?  If not, fixing them if possible would be the right 
> approach.

unless HP-UX ld supports -Bstatic/-Bdynamic and --help, the
corresponding section in gcc/configure.ac (gcc_cv_ld_static_dynamic and
friends) needs to be updated.  I had to do the same for Solaris, IRIX
and Tru64 UNIX.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [M32C] Hookize PRINT_OPERAND, PRINT_OPERAND_ADDRESS and PRINT_OPERAND_PUNCT_VALID_P

2011-07-25 Thread DJ Delorie

Ok.


Re: [C++0x] contiguous bitfields race implementation

2011-07-25 Thread Aldy Hernandez

On 07/22/11 13:44, Jason Merrill wrote:

> On 07/18/2011 08:02 AM, Aldy Hernandez wrote:
>
>> + /* If other threads can't see this value, no need to restrict stores. */
>> + if (ALLOW_STORE_DATA_RACES
>> + || !DECL_THREAD_VISIBLE_P (innerdecl))
>> + {
>> + *bitstart = *bitend = 0;
>> + return;
>> + }
>
> What if get_inner_reference returns something that isn't a DECL, such as
> an INDIRECT_REF?


I had changed this already to take into account aliasing, so if we get 
an INDIRECT_REF, ptr_deref_may_alias_global_p() returns true, and we 
proceed with the restriction:


+  /* If other threads can't see this value, no need to restrict stores.  */
+  if (ALLOW_STORE_DATA_RACES
+  || (!ptr_deref_may_alias_global_p (innerdecl)
+ && (DECL_THREAD_LOCAL_P (innerdecl)
+ || !TREE_STATIC (innerdecl



>> + if (fld)
>> + {
>> + /* We found the end of the bit field sequence. Include the
>> + padding up to the next field and be done. */
>> + *bitend = bitpos - 1;
>> + }
>
> bitpos is the position of "field", and it seems to me we want the
> position of "fld" here.


Notice that bitpos gets recalculated at each iteration by 
get_inner_reference, so bitpos is actually the position of fld.



>> + /* If unset, no restriction. */
>> + if (!bitregion_end)
>> + maxbits = 0;
>> + else
>> + maxbits = (bitregion_end - bitregion_start) % align;
>
> Maybe use MAX_FIXED_MODE_SIZE so you don't have to test it against 0?


Fixed everywhere.


>> + if (!bitregion_end)
>> + maxbits = 0;
>> + else if (1||bitpos + offset * BITS_PER_UNIT < bitregion_start)
>> + maxbits = bitregion_end - bitregion_start;
>> + else
>> + maxbits = bitregion_end - (bitpos + offset * BITS_PER_UNIT) + 1;
>
> I assume the 1|| was there for debugging?


Fixed, plus I adjusted the calculation of maxbits everywhere because I 
found an off-by-one error.
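The role of the bit region can be modelled in a few lines. The sketch below is a deliberate simplification and not get_best_mode: widest_safe_access is a hypothetical helper that assumes the widest access is preferred, tries naturally aligned 8-, 16-, 32- and 64-bit accesses, and keeps the widest one that still covers the bit-field while staying inside [bitregion_start, bitregion_end]. That way a read-modify-write of the field never touches bits that another thread might be storing to.

```c
/* Hypothetical model of restricting the access mode by the bit region.
   Returns the access width in bits, or 0 if no single aligned access of
   at most 64 bits covers the field without leaving the region.  */
static unsigned
widest_safe_access (unsigned long bitpos, unsigned bitsize,
                    unsigned long region_start, unsigned long region_end)
{
  unsigned best = 0;

  for (unsigned mode = 8; mode <= 64; mode *= 2)
    {
      unsigned long start = bitpos / mode * mode;  /* aligned access start */
      unsigned long end = start + mode - 1;        /* last bit touched */

      if (bitpos + bitsize - 1 > end)
        continue;               /* access would not cover the whole field */
      if (start < region_start || end > region_end)
        continue;               /* access would clobber neighbouring data */
      best = mode;
    }
  return best;
}
```

A 7-bit field at bit 8 can use a 64-bit access when the region is unrestricted ([0, 63]), but only a byte access when the region is [8, 15]; a field straddling a byte boundary inside a tight region gets 0, the case where a split store would presumably be needed.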


I have also overhauled store_bit_field() to adjust the address to point
to the beginning of the bit region.  This fixed a myriad of corner cases
pointed out by a test Hans Boehm was kind enough to provide.


I have added more tests.

How does this look?  (Pending tests.)
* params.h (ALLOW_STORE_DATA_RACES): New.
* params.def (PARAM_ALLOW_STORE_DATA_RACES): New.
* Makefile.in (expr.o): Depend on PARAMS_H.
* machmode.h (get_best_mode): Add argument.
* fold-const.c (optimize_bit_field_compare): Add argument to
get_best_mode.
(fold_truthop): Same.
* ifcvt.c (noce_emit_move_insn): Add argument to store_bit_field.
* expr.c (emit_group_store): Same.
(copy_blkmode_from_reg): Same.
(write_complex_part): Same.
(optimize_bitfield_assignment_op): Add argument.
Add argument to get_best_mode.
(get_bit_range): New.
(expand_assignment): Calculate maxbits and pass it down
accordingly.
(store_field): New argument.
(expand_expr_real_2): New argument to store_field.
Include params.h.
* expr.h (store_bit_field): New argument.
* stor-layout.c (get_best_mode): Restrict mode expansion by taking
into account maxbits.
* calls.c (store_unaligned_arguments_into_pseudos): New argument
to store_bit_field.
* expmed.c (store_bit_field_1): New argument.  Use it.
(store_bit_field): Same.
(store_fixed_bit_field): Same.
(store_split_bit_field): Same.
(extract_bit_field_1): Pass new argument to get_best_mode.
(extract_bit_field): Same.
* stmt.c (store_bit_field): Pass new argument to store_bit_field.
* doc/invoke.texi: Document parameter allow-store-data-races.

Index: doc/invoke.texi
===
--- doc/invoke.texi (revision 176280)
+++ doc/invoke.texi (working copy)
@@ -9027,6 +9027,11 @@ The maximum number of conditional stores
 if either vectorization (@option{-ftree-vectorize}) or if-conversion
 (@option{-ftree-loop-if-convert}) is disabled.  The default is 2.
 
+@item allow-store-data-races
+Allow optimizers to introduce new data races on stores.
+Set to 1 to allow, otherwise to 0.  This option is enabled by default
+unless implicitly set by the @option{-fmemory-model=} option.
+
 @item case-values-threshold
 The smallest number of different values for which it is best to use a
 jump-table instead of a tree of conditional branches.  If the value is
Index: machmode.h
===
--- machmode.h  (revision 176280)
+++ machmode.h  (working copy)
@@ -248,7 +248,10 @@ extern enum machine_mode mode_for_vector
 
 /* Find the best mode to use to access a bit field.  */
 
-extern enum machine_mode get_best_mode (int, int, unsigned int,
+extern enum machine_mode get_best_mode (int, int,
+   unsigned HOST_WIDE_INT,
+   unsigned HOST_WIDE_INT,
+   unsigned int,
enum machine_mode, int);

[v3] libstdc++/49836

2011-07-25 Thread Paolo Carlini

Hi,

tested x86_64-linux, committed to mainline (see audit trail for details)

Thanks,
Paolo.


2011-07-25  Paolo Carlini  
Nathan Ridge  

PR libstdc++/49836
* include/bits/stl_vector.h (vector<>::_M_emplace_back_aux):
Declare.
(vector<>::push_back(const value_type&)): Use it.
* include/bits/vector.tcc: Define.
(vector<>::emplace_back(_Args&&...)): Use it.
* testsuite/util/testsuite_tr1.h (CopyConsOnlyType, MoveConsOnlyType):
Add.
* testsuite/23_containers/vector/modifiers/push_back/49836.cc: New.
* testsuite/23_containers/deque/modifiers/push_back/49836.cc:
Likewise.
* testsuite/23_containers/deque/modifiers/push_front/49836.cc:
Likewise.
* testsuite/23_containers/vector/requirements/dr438/assign_neg.cc:
Adjust dg-error line number.
* testsuite/23_containers/vector/requirements/dr438/insert_neg.cc:
Likewise.
* testsuite/23_containers/vector/requirements/dr438/
constructor_1_neg.cc: Likewise.
* testsuite/23_containers/vector/requirements/dr438/
constructor_2_neg.cc: Likewise.
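Per the diff below, the new _M_emplace_back_aux grows the storage by first constructing the new element at __new_start + size() in the freshly allocated buffer, then moving the old elements across, and only then releasing the old buffer. The ChangeLog's CopyConsOnlyType and MoveConsOnlyType suggest the point of the PR is that this path needs only construction, not assignment; C has no such distinction, so the following is only a rough C analogue of the ordering, with a hypothetical int_vec type. One side effect the ordering gives, which the sketch exercises, is that an argument referring to an existing element stays valid while it is copied.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

struct int_vec
{
  int *data;
  size_t size, cap;
};

/* Append *X, where X may point into V->data.  Returns 0 on success.  */
static int
int_vec_push_back (struct int_vec *v, const int *x)
{
  int *p;
  size_t len;

  if (v->size < v->cap)
    {
      v->data[v->size++] = *x;
      return 0;
    }

  if (v->cap > SIZE_MAX / 2 / sizeof *p)
    return -1;                          /* growth would overflow */
  len = v->cap ? 2 * v->cap : 1;
  p = malloc (len * sizeof *p);
  if (p == NULL)
    return -1;

  p[v->size] = *x;                      /* 1. build the new element first */
  if (v->size)
    memcpy (p, v->data, v->size * sizeof *p);   /* 2. then relocate */
  free (v->data);                       /* 3. only now may *x go away */

  v->data = p;
  v->size++;
  v->cap = len;
  return 0;
}
```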
Index: include/bits/stl_vector.h
===
--- include/bits/stl_vector.h   (revision 176718)
+++ include/bits/stl_vector.h   (working copy)
@@ -902,7 +902,11 @@
++this->_M_impl._M_finish;
  }
else
+#ifdef __GXX_EXPERIMENTAL_CXX0X__
+ _M_emplace_back_aux(__x);
+#else
  _M_insert_aux(end(), __x);
+#endif
   }
 
 #ifdef __GXX_EXPERIMENTAL_CXX0X__
@@ -1303,6 +1307,10 @@
   template
 void
 _M_insert_aux(iterator __position, _Args&&... __args);
+
+  template
+void
+_M_emplace_back_aux(_Args&&... __args);
 #endif
 
   // Called by the latter.
Index: include/bits/vector.tcc
===
--- include/bits/vector.tcc (revision 176718)
+++ include/bits/vector.tcc (working copy)
@@ -99,7 +99,7 @@
++this->_M_impl._M_finish;
  }
else
- _M_insert_aux(end(), std::forward<_Args>(__args)...);
+ _M_emplace_back_aux(std::forward<_Args>(__args)...);
   }
 #endif
 
@@ -387,7 +387,51 @@
}
 }
 
+#ifdef __GXX_EXPERIMENTAL_CXX0X__
   template
+template
+  void
+  vector<_Tp, _Alloc>::
+  _M_emplace_back_aux(_Args&&... __args)
+  {
+   const size_type __len =
+ _M_check_len(size_type(1), "vector::_M_emplace_back_aux");
+   pointer __new_start(this->_M_allocate(__len));
+   pointer __new_finish(__new_start);
+   __try
+ {
+   _Alloc_traits::construct(this->_M_impl, __new_start + size(),
+std::forward<_Args>(__args)...);
+   __new_finish = 0;
+
+   __new_finish
+ = std::__uninitialized_move_if_noexcept_a
+ (this->_M_impl._M_start, this->_M_impl._M_finish,
+  __new_start, _M_get_Tp_allocator());
+
+   ++__new_finish;
+ }
+   __catch(...)
+ {
+   if (!__new_finish)
+ _Alloc_traits::destroy(this->_M_impl, __new_start + size());
+   else
+ std::_Destroy(__new_start, __new_finish, _M_get_Tp_allocator());
+   _M_deallocate(__new_start, __len);
+   __throw_exception_again;
+ }
+   std::_Destroy(this->_M_impl._M_start, this->_M_impl._M_finish,
+ _M_get_Tp_allocator());
+   _M_deallocate(this->_M_impl._M_start,
+ this->_M_impl._M_end_of_storage
+ - this->_M_impl._M_start);
+   this->_M_impl._M_start = __new_start;
+   this->_M_impl._M_finish = __new_finish;
+   this->_M_impl._M_end_of_storage = __new_start + __len;
+  }
+#endif
+
+  template
 void
 vector<_Tp, _Alloc>::
 _M_fill_insert(iterator __position, size_type __n, const value_type& __x)
Index: testsuite/23_containers/vector/modifiers/push_back/49836.cc
===
--- testsuite/23_containers/vector/modifiers/push_back/49836.cc (revision 0)
+++ testsuite/23_containers/vector/modifiers/push_back/49836.cc (revision 0)
@@ -0,0 +1,50 @@
+// { dg-options "-std=gnu++0x" }
+
+// Copyright (C) 2011 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+/

[PATCH, Obvious cleanup] Remove parm name from declaration

2011-07-25 Thread Dodji Seketeli
Hello,

I committed this obvious header cleanup patch to trunk.

-- 
Dodji

gcc/c-family

* c-common.h (set_underlying_type): Remove parm name from
declaration.

diff --git a/gcc/c-family/c-common.h b/gcc/c-family/c-common.h
index 202be02..4ac7c4a 100644
--- a/gcc/c-family/c-common.h
+++ b/gcc/c-family/c-common.h
@@ -985,7 +985,7 @@ extern void warn_for_sign_compare (location_t,
   enum tree_code resultcode);
 extern void do_warn_double_promotion (tree, tree, tree, const char *, 
  location_t);
-extern void set_underlying_type (tree x);
+extern void set_underlying_type (tree);
 extern VEC(tree,gc) *make_tree_vector (void);
 extern void release_tree_vector (VEC(tree,gc) *);
 extern VEC(tree,gc) *make_tree_vector_single (tree);


[Patch,AVR]: Fix PR29560 (map 16-bit shift to 8-bit)

2011-07-25 Thread Georg-Johann Lay
This is an optimization for 8-bit shifts if the high part is unused.

Shifts by a variable offset are tedious on AVR because these devices
can only shift by 1 bit per instruction.

If the high part of a shift is unused, it need not be computed,
i.e. the 16-bit shift can be mapped to an 8-bit shift.

Most such shifts come from implicit 8->16 bit (word_mode) conversions
when dealing with 8-bit variables, as in the example code:

unsigned char shift1 (unsigned char x, unsigned char s)
{
return x << s;
}

unsigned char y;

void shift2 (unsigned char x, unsigned char s)
{
y = x << s;
}

Note that the result will still be correct for shift offsets
8..15, i.e. the result will be 0.
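The claim that the low byte stays correct for every offset, including 8..15 where it becomes 0, can be checked exhaustively on a host. The sketch below is a model only. `shl8_loop` mimics an 8-bit shift done one bit per step, and `shl16_low_byte` keeps the low byte of a 16-bit shift. Both names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Model of an 8-bit variable shift on a core that can only shift by one
// bit per instruction: repeat a 1-bit shift s times, keeping 8 bits.
std::uint8_t shl8_loop(std::uint8_t x, unsigned s) {
  for (unsigned i = 0; i < s; ++i)
    x = static_cast<std::uint8_t>(x << 1);
  return x;
}

// Low byte of the 16-bit shift that integer promotion asks for on AVR,
// where int is 16 bits wide (offsets >= 16 would be undefined there).
std::uint8_t shl16_low_byte(std::uint8_t x, unsigned s) {
  assert(s < 16);
  std::uint16_t wide = static_cast<std::uint16_t>(x << s);  // 16-bit result
  return static_cast<std::uint8_t>(wide);                   // drop high part
}

// Compare both forms for every 8-bit value and every offset 0..15.
bool low_byte_always_matches() {
  for (unsigned x = 0; x < 256; ++x)
    for (unsigned s = 0; s < 16; ++s)
      if (shl8_loop(static_cast<std::uint8_t>(x), s) !=
          shl16_low_byte(static_cast<std::uint8_t>(x), s))
        return false;
  return true;
}
```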

Ok to commit?

Johann


PR target/29560
* config/avr/avr.md: Add peephole2 to map ashlhi3 to ashlqi3 if
high part of shift target is unused.

Index: config/avr/avr.md
===
--- config/avr/avr.md	(revision 176624)
+++ config/avr/avr.md	(working copy)
@@ -1993,6 +1993,29 @@ (define_insn "ashlhi3"
   [(set_attr "length" "6,0,2,2,4,10,10")
(set_attr "cc" "clobber,none,set_n,clobber,set_n,clobber,clobber")])
 
+
+;; High part of 16-bit shift is unused after the instruction:
+;; No need to compute it, map to 8-bit shift.
+
+(define_peephole2
+  [(set (match_operand:HI 0 "register_operand" "")
+(ashift:HI (match_dup 0)
+   (match_operand:QI 1 "register_operand" "")))]
+  ""
+  [(set (match_dup 2)
+(ashift:QI (match_dup 2)
+   (match_dup 1)))
+   (clobber (match_dup 3))]
+  {
+operands[3] = simplify_gen_subreg (QImode, operands[0], HImode, 1);
+
+if (!peep2_reg_dead_p (1, operands[3]))
+  FAIL;
+
+operands[2] = simplify_gen_subreg (QImode, operands[0], HImode, 0);
+  })
+
+
 (define_insn "ashlsi3"
   [(set (match_operand:SI 0 "register_operand"   "=r,r,r,r,r,r,r")
 	(ashift:SI (match_operand:SI 1 "register_operand" "0,0,0,r,0,0,0")


Re: eliminate bitmap regs_invalidated_by_call_regset

2011-07-25 Thread Steven Bosscher
On Mon, Jul 25, 2011 at 4:52 PM, Dimitrios Apostolou  wrote:
> Hello list,
>
> the attached patch eliminates regs_invalidated_by_call_regset bitmap and
> uses instead the original regs_invalidated_by_call HARD_REG_SET. Tested on
> i386, I had the following two regressions that I'll investigate right on:
>
>  FAIL: libmudflap.cth/pass39-frag.c (-O3) (rerun 10) execution test
>  FAIL: libmudflap.cth/pass39-frag.c (-O3) (rerun 10) output pattern test

These fail at random on some systems. IMHO: Just ignore this one.

Ciao!
Steven


Re: [RFC] Replace some bitmaps with HARD_REG_SETs - second version

2011-07-25 Thread Steven Bosscher
On Mon, Jul 25, 2011 at 4:20 PM, Michael Matz  wrote:
> Hi,
>
> On Mon, 25 Jul 2011, Dimitrios Apostolou wrote:
>
>> Bug found, in df_mark_reg I need to iterate until regno + n, not n. The error
>> is at the following hunk:
>>
>> --- gcc/df-scan.c       2011-02-02 20:08:06 +
>> +++ gcc/df-scan.c       2011-07-24 17:16:46 +
>> @@ -3713,35 +3717,40 @@ df_mark_reg (rtx reg, void *vset)
>>    if (regno < FIRST_PSEUDO_REGISTER)
>>      {
>>        int n = hard_regno_nregs[regno][GET_MODE (reg)];
>> -      bitmap_set_range (set, regno, n);
>> +      int i;
>> +      for (i=regno; i<n; i++)
>> +       SET_HARD_REG_BIT (*set, i);
>>      }
>
> No.  n is a count, hence the upper bound is regno + n.

Indeed. Which is what Jimis said. Note the "error is in this hunk" ;-)

Ciao!
Steven
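Because hard_regno_nregs yields a count, the replacement loop has to run from regno up to regno + n. A minimal sketch with a hypothetical single-word `ToyHardRegSet` (not GCC's HARD_REG_SET) shows the corrected bound:

```cpp
#include <cassert>
#include <cstdint>

// Toy hard-register set: one bit per register, at most 64 registers.
// (An assumption for illustration; GCC's HARD_REG_SET may span several words.)
using ToyHardRegSet = std::uint64_t;

// Mark the n consecutive hard registers occupied by a value starting at
// regno.  n is a count, so the bound is regno + n; using "i < n" instead
// would mark the wrong registers (none at all once regno >= n).
void mark_reg_range(ToyHardRegSet* set, unsigned regno, unsigned n) {
  assert(regno + n <= 64);
  for (unsigned i = regno; i < regno + n; i++)
    *set |= ToyHardRegSet(1) << i;
}
```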


Re: [Patch,AVR]: Fix PR29560 (map 16-bit shift to 8-bit)

2011-07-25 Thread Richard Henderson
On 07/25/2011 10:30 AM, Georg-Johann Lay wrote:
>   PR target/29560
>   * config/avr/avr.md: Add peephole2 to map ashlhi3 to ashlqi3 if
>   high part of shift target is unused.

Ok.


r~


Re: Allow IRIX Ada bootstrap with C++

2011-07-25 Thread Rainer Orth
Eric Botcazou  writes:

>> 2011-07-20  Rainer Orth  
>>
>>  * init.c [sgi] (__gnat_error_handler): Update sigaction(2) citation.
>>  Correct argument types.
>>  Extract code from reason.
>>  (__gnat_install_handler): Assign to act.sa_sigaction.
>
> This breaks signal handling on our IRIX 6.5 machine though.

Same for me ;-(  As already noted in the patch submission, there's
something fishy going on with the IRIX sighandler stuff:

gcc/ada/init.c (__gnat_install_handler) explicitly does not include
SA_SIGINFO in sa_flags, which means the handler only gets one arg, sig.
Still the installed handler (__gnat_error_handler) accesses args beyond
the first.

There are two possible solutions:

* Actually set SA_SIGINFO.

* Punt and cast the second __gnat_error_handler `arg' to an int.
  Running under gdb, it seems that three args are really passed.

I prefer the first, since that's the clean solution.  Unfortunately, my
question why the current code doesn't set SA_SIGINFO, yet cites a
considerable part of the man page about its effects, remained unanswered
when I submitted the patch.

Both solutions work for a simple run of the 32-bit
null_pointer_deref1 test, but I'll have to perform a full testsuite run
and also adapt libjava/include/posix-signals.h.  I remember that there
were problems when I set SA_SIGINFO there, so maybe irix6-unwind.h needs
updates to cope with that.

Rainer

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University
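For reference, the first option (actually setting SA_SIGINFO) corresponds to the standard POSIX pattern sketched below. The handler is installed through sa_sigaction with SA_SIGINFO in sa_flags, and only then may it rely on its siginfo_t and context arguments. This is generic POSIX code with hypothetical names, not the GNAT or libjava handler.

```cpp
#include <csignal>
#include <cstring>
#include <signal.h>

// Values recorded by the handler; volatile sig_atomic_t keeps the writes
// safe from signal context.
static volatile std::sig_atomic_t g_signo = 0;
static volatile std::sig_atomic_t g_info_signo = 0;

// Three-argument handler.  The siginfo_t and context arguments are only
// meaningful when the handler is installed via sa_sigaction with SA_SIGINFO.
static void siginfo_handler(int sig, siginfo_t* info, void* /*context*/) {
  g_signo = sig;
  g_info_signo = info->si_signo;
}

// Install siginfo_handler for sig; returns sigaction's result (0 on success).
int install_siginfo_handler(int sig) {
  struct sigaction act;
  std::memset(&act, 0, sizeof act);
  sigemptyset(&act.sa_mask);
  act.sa_sigaction = siginfo_handler;
  act.sa_flags = SA_SIGINFO;
  return sigaction(sig, &act, nullptr);
}
```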


[C++ Patch, committed] PR 49845

2011-07-25 Thread Paolo Carlini

Hi,

I'm committing the patch below as obvious, to fix the breakage I inadvertently
caused. Sorry again.


Paolo.

///
2011-07-25  Paolo Carlini  

PR bootstrap/49845
* parser.c (cp_parser_perform_range_for_lookup): Always assign *begin
and *end before returning.
Index: parser.c
===
--- parser.c(revision 176754)
+++ parser.c(working copy)
@@ -8796,7 +8796,10 @@ static tree
 cp_parser_perform_range_for_lookup (tree range, tree *begin, tree *end)
 {
   if (error_operand_p (range))
-return error_mark_node;
+{
+  *begin = *end = error_mark_node;
+  return error_mark_node;
+}
 
   if (!COMPLETE_TYPE_P (complete_type (TREE_TYPE (range))))
 {


Re: [PATCH, PR 49094] Refrain from creating misaligned accesses in SRA

2011-07-25 Thread Martin Jambor
Hi,

On Thu, Jul 21, 2011 at 11:40:32AM +0200, Martin Jambor wrote:
> Hi,
> 
> On Thu, Jul 21, 2011 at 10:34:35AM +0200, Richard Guenther wrote:
> > On Wed, 20 Jul 2011, Ulrich Weigand wrote:
> > 
> > > Richard Guenther wrote:
> > > > On Tue, Jul 19, 2011 at 8:20 PM, Ulrich Weigand  
> > > > wrote:
> > > > > The problem is that in this expression
> > > > >   disappear = VIEW_CONVERT_EXPR(x_8);
> > > > > the rhs is considered unaligned and blocks the SRA transformation.
> > > > >
> > > > > The check you added for SSA_NAMEs doesn't hit, because the SSA_NAME is
> > > > > encapsulated in a VIEW_CONVERT_EXPR. When get_object_alignment is 
> > > > > called,
> > > > > the VIEW_CONVERT_EXPR is stripped off by get_inner_reference and the
> > > > > SSA_NAME appears, but then get_object_alignment doesn't handle it
> > > > > and just returns the default alignment of 8 bits.
> > > > >
> > > > > Maybe get_object_alignment should itself handle SSA_NAMEs?
> > > > 
> > > > But what should it return for a rvalue?  There is no "alignment" here.
> > > > I think SRA should avoid asking for rvalues.
> > > 
> > > I must admit I do not fully understand what the SRA code is attempting
> > > to achieve here ...  Could you elaborate on what you mean by "avoid
> > > asking for rvalues"?  Should the SRA code never check the RHS of an
> > > assignment for alignment, only the LHS?  Or should it classify the RHS
> > > tree according to whether the access is rvalue or lvalue (how would
> > > that work?)?
> > 
> > Well, it should only ask for stores / loads.  I'm not sure what we'd
> > want to return as alignment for an rvalue - MAX_ALIGNMENT?  What should
> > we return for get_object_alignment of an INTEGER_CST for example?
> > 
> 
> Yeah, we certainly should not be examining alignment of invariants and
> of conversions of SSA names.  As far as rvalues in general are
> concerned, I don't really know which gimple predicate that would be.
> A comment suggests is_gimple_val but that does not return true for a
> VIEW_CONVERT_EXPR of an SSA_NAME and would return true for aggregate
> variables (which perhaps would not be a problem, however they do have
> an alignment).
> 
> So at the moment I'd go for stripping all conversions from exp before
> the first if in tree_non_mode_aligned_mem_p and adding
> is_gimple_min_invariant to the condition.  Does that make sense?


Like this?  Ulrich, can you please verify it works?  I have
bootstrapped this on x86_64 but there it obvioulsy works and my run of
compile/testsuite on compile farm sparc64 will take some time (plus,
the testcase you complained about passes there).

Thanks,

Martin


2011-07-25  Martin Jambor  

* tree-sra.c (tree_non_mode_aligned_mem_p): Strip conversions and
return false for invariants.
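The change peels conversions off before the early-exit checks. As a result, a VIEW_CONVERT_EXPR of an SSA_NAME is recognized as an SSA_NAME, which is a value rather than a memory access, and invariants are excluded as well. A toy model of that stripping step follows. The node kinds and the `is_value_not_memory` helper are hypothetical simplifications, not GCC's tree API.

```cpp
#include <cassert>

// Hypothetical toy node kinds, loosely named after GCC tree codes.
enum class Code { ConvertExpr, ViewConvertExpr, SsaName, IntegerCst, VarDecl };

struct Node {
  Code code;
  const Node* operand;  // wrapped operand for conversions, else nullptr
};

// Peel any stack of conversions, like the loop added at the top of
// tree_non_mode_aligned_mem_p.
const Node* strip_conversions(const Node* exp) {
  assert(exp != nullptr);
  while (exp->code == Code::ConvertExpr || exp->code == Code::ViewConvertExpr)
    exp = exp->operand;
  return exp;
}

// Simplified early exit: after stripping, SSA names and constants are
// values, so there is no memory access whose alignment matters.
bool is_value_not_memory(const Node* exp) {
  exp = strip_conversions(exp);
  return exp->code == Code::SsaName || exp->code == Code::IntegerCst;
}
```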

Index: src/gcc/tree-sra.c
===
--- src.orig/gcc/tree-sra.c
+++ src/gcc/tree-sra.c
@@ -1075,9 +1075,14 @@ tree_non_mode_aligned_mem_p (tree exp)
   enum machine_mode mode = TYPE_MODE (TREE_TYPE (exp));
   unsigned int align;
 
+  while (CONVERT_EXPR_P (exp)
+|| TREE_CODE (exp) == VIEW_CONVERT_EXPR)
+exp = TREE_OPERAND (exp, 0);
+
   if (TREE_CODE (exp) == SSA_NAME
   || TREE_CODE (exp) == MEM_REF
   || mode == BLKmode
+  || is_gimple_min_invariant (exp)
   || !STRICT_ALIGNMENT)
 return false;
 



[google] Merge r176592 from google/main to google/gcc-4_6 branch.

2011-07-25 Thread Rong Xu
committed.


Re: [PATCH, PR 49094] Refrain from creating misaligned accesses in SRA

2011-07-25 Thread Ulrich Weigand
Martin Jambor wrote:

> Like this?  Ulrich, can you please verify it works?  I have
> bootstrapped this on x86_64 but there it obvioulsy works and my run of
> compile/testsuite on compile farm sparc64 will take some time (plus,
> the testcase you complained about passes there).

Yes, this does fix the testcase that was failing on SPU.

Thanks,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com


Re: [RFC] Replace some bitmaps with HARD_REG_SETs - second version

2011-07-25 Thread Dimitrios Apostolou
That was a bug, indeed, but unfortunately it wasn't the one causing the 
crash I posted earlier... Even after fixing it I get the same backtrace 
from gdb.


So the request "spot the bug" still holds...


Thanks,
Dimitris



added some assert checks in hard-reg-set.h

2011-07-25 Thread Dimitrios Apostolou

Hello list,

the attached patch was tested on i386 and measured to have almost no
runtime overhead when RTL checks are enabled:

Instruction Count before: 2328.6 M
Instruction Count after:  2334.4 M

which translates to a few milliseconds, well within the noise.

Changelog:


2011-07-25  Dimitrios Apostolou 

	* hard-reg-set.h (TEST_HARD_REG_BIT, SET_HARD_REG_BIT, 
CLEAR_HARD_REG_BIT): Added some assert checks for test, set and clear 
operations of HARD_REG_SETs, enabled when RTL checks are on. Runtime 
overhead was measured as negligible.
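The patch routes the bit macros through inline functions whose range checks vanish unless RTL checking is enabled. The disabled form `((void)(0 && (EXPR)))` still type-checks EXPR but never evaluates it. Below is a self-contained sketch of the same idea for a multi-word set. The sizes are made up, and `TOY_RTL_CHECKING` is a hypothetical stand-in for ENABLE_RTL_CHECKING.

```cpp
#include <cassert>
#include <climits>
#include <cstdint>

// Illustrative sizes only (assumptions); GCC derives the real ones from
// FIRST_PSEUDO_REGISTER and HARD_REG_ELT_TYPE.
constexpr unsigned kNumHardRegs = 100;
using Word = std::uint64_t;
constexpr unsigned kBitsPerWord = sizeof(Word) * CHAR_BIT;
constexpr unsigned kWords = (kNumHardRegs + kBitsPerWord - 1) / kBitsPerWord;

struct ToyRegSet {
  Word w[kWords] = {};
};

// Checks are compiled in only for "checking" builds.  The disabled form keeps
// EXPR type-checked by the compiler but never evaluates it.
#ifdef TOY_RTL_CHECKING
#define toy_rtl_assert(EXPR) assert(EXPR)
#else
#define toy_rtl_assert(EXPR) ((void)(0 && (EXPR)))
#endif

inline void toy_set_bit(ToyRegSet& s, unsigned bit) {
  toy_rtl_assert(bit < kNumHardRegs);
  s.w[bit / kBitsPerWord] |= Word(1) << (bit % kBitsPerWord);
}

inline void toy_clear_bit(ToyRegSet& s, unsigned bit) {
  toy_rtl_assert(bit < kNumHardRegs);
  s.w[bit / kBitsPerWord] &= ~(Word(1) << (bit % kBitsPerWord));
}

inline bool toy_test_bit(const ToyRegSet& s, unsigned bit) {
  toy_rtl_assert(bit < kNumHardRegs);
  return (s.w[bit / kBitsPerWord] >> (bit % kBitsPerWord)) & 1;
}
```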




Thanks,
Dimitris

=== modified file 'gcc/hard-reg-set.h'
--- gcc/hard-reg-set.h  2011-01-03 20:52:22 +
+++ gcc/hard-reg-set.h  2011-07-25 17:06:36 +
@@ -41,6 +41,13 @@ along with GCC; see the file COPYING3.  
 
 typedef unsigned HOST_WIDEST_FAST_INT HARD_REG_ELT_TYPE;
 
+#ifdef ENABLE_RTL_CHECKING
+#define gcc_rtl_assert(EXPR) gcc_assert (EXPR)
+#else
+#define gcc_rtl_assert(EXPR) ((void)(0 && (EXPR)))
+#endif
+
+
 #if FIRST_PSEUDO_REGISTER <= HOST_BITS_PER_WIDEST_FAST_INT
 
 #define HARD_REG_SET HARD_REG_ELT_TYPE
@@ -91,14 +98,35 @@ typedef HARD_REG_ELT_TYPE HARD_REG_SET[H
 
 #define UHOST_BITS_PER_WIDE_INT ((unsigned) HOST_BITS_PER_WIDEST_FAST_INT)
 
-#ifdef HARD_REG_SET
-
 #define SET_HARD_REG_BIT(SET, BIT)  \
- ((SET) |= HARD_CONST (1) << (BIT))
+  hard_reg_set_set_bit (&(SET), (BIT))
 #define CLEAR_HARD_REG_BIT(SET, BIT)  \
- ((SET) &= ~(HARD_CONST (1) << (BIT)))
+  hard_reg_set_clear_bit(&(SET), (BIT))
 #define TEST_HARD_REG_BIT(SET, BIT)  \
- (!!((SET) & (HARD_CONST (1) << (BIT))))
+  hard_reg_set_bit_p((SET), (BIT))
+
+#ifdef HARD_REG_SET
+
+static inline void
+hard_reg_set_set_bit (HARD_REG_SET *s, unsigned int bit)
+{
+  gcc_rtl_assert (bit < FIRST_PSEUDO_REGISTER);
+  (*s) |= HARD_CONST (1) << bit;
+}
+
+static inline void
+hard_reg_set_clear_bit (HARD_REG_SET *s, unsigned int bit)
+{
+  gcc_rtl_assert (bit < FIRST_PSEUDO_REGISTER);
+  (*s) &= ~(HARD_CONST (1) << bit);
+}
+
+static inline bool
+hard_reg_set_bit_p (const HARD_REG_SET s, unsigned int bit)
+{
+  gcc_rtl_assert (bit < FIRST_PSEUDO_REGISTER);
+  return ((s >> bit) & HARD_CONST (1));
+}
 
 #define CLEAR_HARD_REG_SET(TO) ((TO) = HARD_CONST (0))
 #define SET_HARD_REG_SET(TO) ((TO) = ~ HARD_CONST (0))
@@ -137,17 +165,32 @@ hard_reg_set_empty_p (const HARD_REG_SET
 
 #else
 
-#define SET_HARD_REG_BIT(SET, BIT) \
-  ((SET)[(BIT) / UHOST_BITS_PER_WIDE_INT]  \
-   |= HARD_CONST (1) << ((BIT) % UHOST_BITS_PER_WIDE_INT))
-
-#define CLEAR_HARD_REG_BIT(SET, BIT)   \
-  ((SET)[(BIT) / UHOST_BITS_PER_WIDE_INT]  \
-   &= ~(HARD_CONST (1) << ((BIT) % UHOST_BITS_PER_WIDE_INT)))
-
-#define TEST_HARD_REG_BIT(SET, BIT)\
-  (!!((SET)[(BIT) / UHOST_BITS_PER_WIDE_INT]   \
-  & (HARD_CONST (1) << ((BIT) % UHOST_BITS_PER_WIDE_INT))))
+static inline void
+hard_reg_set_set_bit (HARD_REG_SET *s, unsigned int bit)
+{
+  int byte = bit / UHOST_BITS_PER_WIDE_INT;
+  int bitpos = bit % UHOST_BITS_PER_WIDE_INT;
+  gcc_rtl_assert (bit < FIRST_PSEUDO_REGISTER);
+  (*s)[byte] |= HARD_CONST (1) << bitpos;
+}
+
+static inline void
+hard_reg_set_clear_bit (HARD_REG_SET *s, unsigned int bit)
+{
+  int byte = bit / UHOST_BITS_PER_WIDE_INT;
+  int bitpos = bit % UHOST_BITS_PER_WIDE_INT;
+  gcc_rtl_assert (bit < FIRST_PSEUDO_REGISTER);
+  (*s)[byte] &= ~(HARD_CONST (1) << bitpos);
+}
+
+static inline bool
+hard_reg_set_bit_p (const HARD_REG_SET s, unsigned int bit)
+{
+  int byte = bit / UHOST_BITS_PER_WIDE_INT;
+  int bitpos = bit % UHOST_BITS_PER_WIDE_INT;
+  gcc_rtl_assert (bit < FIRST_PSEUDO_REGISTER);
+  return ((s[byte] >> bitpos) & HARD_CONST (1));
+}
 
 #if FIRST_PSEUDO_REGISTER <= 2*HOST_BITS_PER_WIDEST_FAST_INT
 #define CLEAR_HARD_REG_SET(TO)  \



Re: [PATCH] Fix configure --with-cloog

2011-07-25 Thread Romain Geissler
Le 6 juil. 2011 à 11:04, Romain Geissler a écrit :

> Hello
> 
> This patch fixes an issue when building with cloog and gmp installed in
> separate custom directories.
> 
> How to reproduce :
> - Make sure you've installed cloog and gmp in separate directories
> (i.e. ${WITH-CLOOG-PATH}/lib doesn't contain libgmp)
> - Make sure neither gmp nor cloog is installed in a directory
> searched by default by your linker when looking for libs.
> - Launch the configure script with both the --with-gmp and --with-cloog
> switches properly set
> 
> This results in an unexpected error while configuring: error: Unable to
> find a usable CLooG.
> 
> 
> 2011-07-06  Romain Geissler  
> 
>   * configure: Add $gmplibs to cloog $LDFLAGS
> 
> Index: configure
> ===
> --- configure   (revision 175709)
> +++ configure   (working copy)
> @@ -5713,7 +5713,7 @@ if test "x$with_cloog" != "xno"; then
> 
> CFLAGS="${CFLAGS} ${clooginc} ${gmpinc}"
>   CPPFLAGS="${CPPFLAGS} ${_cloogorginc}"
> -  LDFLAGS="${LDFLAGS} ${clooglibs}"
> +  LDFLAGS="${LDFLAGS} ${clooglibs} ${gmplibs}"
> 
>   case $cloog_backend in
> "ppl-legacy")

Ping !


Re: [PLUGIN] compile and install gengtype, install gtype.state

2011-07-25 Thread Romain Geissler
Le 19 juil. 2011 à 14:41, Romain Geissler a écrit :

> 2011/7/19 Jakub Jelinek :
>> On Tue, Jul 19, 2011 at 02:26:32PM +0200, Romain Geissler wrote:
>>> 2011-07-18  Romain Geissler  
>>> 
>>>   * gengtype-state.c (#include "bconfig.h"): include "bconfig.h"
>>>   if GENERATOR_FILE is defined, "config.h" otherwise.
>>>   * gengtype.c: Likewise.
>>>   * gengtype-lex.l: Likewise.
>>>   * gengtype-parse.c: Likewise.
>>>   * Makefile.in (gengtype): compile and install for host when
>>>   $enable_plugins is set to "yes".
>>>   (gtype.state): install when $enable_plugins is set to "yes".
>> 
>> s/: include/: Include/;s/: compile/: Compile/;s/: install/: Install/
>> 
>>Jakub
>> 
> 
> Fixed
> 
> Romain Geissler
> 
> 2011-07-18  Romain Geissler  
> 
>   * gengtype-state.c (#include "bconfig.h"): Include "bconfig.h"
>   if GENERATOR_FILE is defined, "config.h" otherwise.
>   * gengtype.c: Likewise.
>   * gengtype-lex.l: Likewise.
>   * gengtype-parse.c: Likewise.
>   * Makefile.in (gengtype): Compile and install for host when
>   $enable_plugins is set to "yes".
>   (gtype.state): Install when $enable_plugins is set to "yes".


Ping !


Re: [PLUGIN] compile and install gengtype, install gtype.state

2011-07-25 Thread Jakub Jelinek
On Mon, Jul 25, 2011 at 09:10:55PM +0200, Romain Geissler wrote:
> > 2011-07-18  Romain Geissler  
> > 
> > * gengtype-state.c (#include "bconfig.h"): Include "bconfig.h"
> > if GENERATOR_FILE is defined, "config.h" otherwise.

Still not right, this should have been
* gengtype-state.c: Include "bconfig.h" if GENERATOR_FILE is
defined, "config.h" otherwise.

> > * gengtype.c: Likewise.
> > * gengtype-lex.l: Likewise.
> > * gengtype-parse.c: Likewise.

> > * Makefile.in (gengtype): Compile and install for host when
> > $enable_plugins is set to "yes".
> > (gtype.state): Install when $enable_plugins is set to "yes".

And this should list all the Makefile.in goals you've changed, added etc.

Ok with those changes.

Jakub


RE: [Patch,AVR]: PR49687 (better widening 32-bit mul)

2011-07-25 Thread Weddington, Eric


> -Original Message-
> From: Georg-Johann Lay [mailto:a...@gjlay.de]
> Sent: Monday, July 25, 2011 10:29 AM
> To: Weddington, Eric
> Cc: gcc-patches@gcc.gnu.org; Anatoly Sokolov; Denis Chertykov; Richard
> Henderson
> Subject: Re: [Patch,AVR]: PR49687 (better widening 32-bit mul)
> 

> 
> This means that a pure __mulsi3 will have 30+30+20 = 80 bytes (+18).
> 
> If all functions are used they occupy 116 bytes (-4), so they actually
> save a little space if they are used all with the benefit that they also
> can one-extend, extend 32 = 16*32 as well as 32=16*16 and work for
> small (17 bit signed) constants.
> 
> __umulhisi3 reads:
> 
> DEFUN __umulhisi3
> mul A0, B0
> movwC0, r0
> mul A1, B1
> movwC2, r0
> mul A0, B1
> add C1, r0
> adc C2, r1
> clr __zero_reg__
> adc C3, __zero_reg__
> mul A1, B0
> add C1, r0
> adc C2, r1
> clr __zero_reg__
> adc C3, __zero_reg__
> ret
> ENDF __umulhisi3
> 
> It could be compressed to the following sequence, i.e.
> 24 bytes instead of 30, but I think that's too much of
> quenching the last byte out of the code:
> 
> DEFUN __umulhisi3
> mul A0, B0
> movwC0, r0
> mul A1, B1
> movwC2, r0
> mul A0, B1
> rcall   1f
> mul A1, B0
> 1:  add C1, r0
> adc C2, r1
> clr __zero_reg__
> adc C3, __zero_reg__
> ret
> ENDF __umulhisi3
> 
> 
> In the absence of real-world code that uses 32-bit arithmetic, I trust
> my intuition that code size will decrease in general ;-)
> 

Hi Johann,

I agree with you that overall code size will most likely decrease.

However, I also like your creative compression in the second sequence above, 
and I think that it would be best to implement that sequence and try to find 
others like that where possible.

Remember that to AVR users, code size is *everything*. Even saving 6 bytes here 
or there has a positive effect.

I'll let Richard (or Denis if he's back from vacation) do the actual approval 
of the patch, as they are a lot more technically competent in this area. But 
I'm ok with the general tactic of the code reuse with looking at further ways 
to reduce code size like the example above.

Eric Weddington
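The quoted __umulhisi3 sequence forms the 32-bit product from four 8x8->16 `mul` results. A0*B0 and A1*B1 fill the low and high halves, and each cross product is added at byte offset 1, with the carry rippling into C3. The host-side C++ model below is an illustration, not AVR code, and makes that layout easy to check. In the compressed variant, `rcall 1f` simply runs the shared add/adc tail once for the first cross product and falls into it again for the second.

```cpp
#include <cstdint>

// Model of the quoted register sequence: 16x16->32 unsigned multiply from
// four 8x8->16 partial products.  Byte names follow the quoted assembly
// (inputs A1:A0 and B1:B0, result C3..C0).
std::uint32_t umulhisi3_model(std::uint16_t a, std::uint16_t b) {
  const std::uint32_t a0 = a & 0xFF, a1 = a >> 8;
  const std::uint32_t b0 = b & 0xFF, b1 = b >> 8;

  // mul A0,B0 ; movw C0,r0   and   mul A1,B1 ; movw C2,r0
  std::uint32_t c = (a0 * b0) | ((a1 * b1) << 16);
  // mul A0,B1 ; add C1,r0 ; adc C2,r1 ; adc C3,__zero_reg__
  c += (a0 * b1) << 8;
  // mul A1,B0 ; same add/adc tail
  c += (a1 * b0) << 8;
  return c;  // never wraps: the full product is below 2^32
}
```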


[PATCH, i386, take 2]: Rewrite LEA handling (was:Re: PATCH [10/n] X32: Support x32 LEA insns)

2011-07-25 Thread Uros Bizjak
On Mon, Jul 25, 2011 at 3:30 PM, H.J. Lu  wrote:

>> Attached patch implements -fpic handling for x32. In x32 mode, we now
>> use x86_64_general_operand and corresponding "e" constraints for adds
>> in SImode, since it looks like invalid addresses can only be generated
>> through adds. This avoids a whole bunch of new predicates and
>> constraints.

> X32 glibc is miscompiled:
>
> CPP='/export/build/gnu/gcc-x32/release/usr/gcc-4.7.0-x32/bin/gcc -mx32
>  -E -x c-header'
> /export/build/gnu/glibc-x32/build-x86_64-linux/elf/ld-linux-x32.so.2
> --library-path 
> /export/build/gnu/glibc-x32/build-x86_64-linux:/export/build/gnu/glibc-x32/build-x86_64-linux/math:/export/build/gnu/glibc-x32/build-x86_64-linux/elf:/export/build/gnu/glibc-x32/build-x86_64-linux/dlfcn:/export/build/gnu/glibc-x32/build-x86_64-linux/nss:/export/build/gnu/glibc-x32/build-x86_64-linux/nis:/export/build/gnu/glibc-x32/build-x86_64-linux/rt:/export/build/gnu/glibc-x32/build-x86_64-linux/resolv:/export/build/gnu/glibc-x32/build-x86_64-linux/crypt:/export/build/gnu/glibc-x32/build-x86_64-linux/nptl
> /export/build/gnu/glibc-x32/build-x86_64-linux/sunrpc/rpcgen -Y
> ../scripts -h rpcsvc/yppasswd.x -o
> /export/build/gnu/glibc-x32/build-x86_64-linux/sunrpc/rpcsvc/yppasswd.T
> make[5]: *** 
> [/export/build/gnu/glibc-x32/build-x86_64-linux/sunrpc/rpcsvc/yppasswd.stmp]
> Segmentation fault (core dumped)
>
> Some LEA patterns are wrong for x32.  I will investigate.

We have to prevent symbols from entering general_operand-predicated
SImode operands. Fortunately, x86_64_general_operand works OK for
x32, while i686 and x86_64 are unaffected, due to an early bypass
(i686) and to the fact that all symbols are DImode (x86_64).

2011-07-25  Uros Bizjak  
H.J. Lu  

PR target/47381
PR target/49832
PR target/49833
* config/i386/i386.md (i): Change SImode attribute to "e".
(g): Change SImode attribute to "rme".
(di): Change SImode attribute to "nF".
(general_operand): Change SImode attribute to x86_64_general_operand.
(general_szext_operand): Change SImode attribute to
x86_64_szext_general_operand.
(immediate_operand): Change SImode attribute to
x86_64_immediate_operand.
(*movdi_internal_rex64): Remove mode from pic_32bit_operand check.
(*movsi_internal): Ditto.  Use "e" constraint in alternative 2.
(*lea_1): Use SWI48 mode iterator.
(*lea_1_zext): New insn pattern.
(*add1): Use x86_64_general_operand predicate for operand 2.
Update operand constraints.
(addsi_1_zext): Ditto.
(*add2): Ditto.
(*addsi_3_zext): Ditto.
(*subsi_1_zext): Ditto.
(*subsi_2_zext): Ditto.
(*subsi_3_zext): Ditto.
(*addsi3_carry_zext): Ditto.
(*si3_zext_cc_overflow): Ditto.
(*mulsi3_1_zext): Ditto.
(*andsi_1): Ditto.
(*andsi_1_zext): Ditto.
(*andsi_2_zext): Ditto.
(*si_1_zext): Ditto.
(*si_2_zext): Ditto.
(*test_1): Use  predicate for operand 1.
(*and_2): Ditto.
(add->lea splitter): Check operand modes in insn constraint.  Extend
operands less than SImode wide to SImode.
(add->lea zext splitter): Do not extend input operands to DImode.
(*lea_general_1): Handle only QImode and HImode operands.
(*lea_general_2): Ditto.
(*lea_general_3): Ditto.
(*lea_general_1_zext): Remove.
(*lea_general_2_zext): Ditto.
(*lea_general_3_zext): Ditto.
(*lea_general_4): Check operand modes in insn constraint.  Extend
operands less than SImode wide to SImode.
(ashift->lea splitter): Ditto.
* config/i386/i386.c (ix86_print_operand_address): Print address
registers with 'q' modifier on 64bit targets.
* config/i386/predicates.md (pic_32bit_operand): Define as special
predicate.  Reject non-SI and non-DI modes.

Bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32}.

Uros.
Index: config/i386/i386.md
===
--- config/i386/i386.md (revision 176748)
+++ config/i386/i386.md (working copy)
@@ -861,13 +861,13 @@
 (define_mode_attr r [(QI "q") (HI "r") (SI "r") (DI "r")])
 
 ;; Immediate operand constraint for integer modes.
-(define_mode_attr i [(QI "n") (HI "n") (SI "i") (DI "e")])
+(define_mode_attr i [(QI "n") (HI "n") (SI "e") (DI "e")])
 
 ;; General operand constraint for word modes.
-(define_mode_attr g [(QI "qmn") (HI "rmn") (SI "g") (DI "rme")])
+(define_mode_attr g [(QI "qmn") (HI "rmn") (SI "rme") (DI "rme")])
 
 ;; Immediate operand constraint for double integer modes.
-(define_mode_attr di [(SI "iF") (DI "e")])
+(define_mode_attr di [(SI "nF") (DI "e")])
 
 ;; Immediate operand constraint for shifts.
 (define_mode_attr S [(QI "I") (HI "I") (SI "I") (DI "J") (TI "O")])
@@ -876,7 +876,7 @@
 (define_mode_attr general_operand

Re: [PATCH] GNU/kFreeBSD systems running on MIPS

2011-07-25 Thread Richard Sandiford
Robert Millan  writes:
> This patch adds support for GNU/kFreeBSD systems running on MIPS.

Looks good.  However, Rainer's in the middle of moving things from gcc/
to libgcc/ -- where they belong -- and committing a new port now would
interfere with that.  If it's OK, I'd like to hold off applying this
until Rainer's finished his changes.

Thanks,
Richard


[google] Disable annotalysis in google/main (issue4816050)

2011-07-25 Thread Diego Novillo

Annotalysis has been broken in google/main since the last merge from
trunk.  This patch disables it until Delesley comes up with a
permanent solution.

Tested on x86_64.  OK for google/main?


2011-07-25  Diego Novillo  

* tree-threadsafe-analyze.c (gate_threadsafe_analyze): Always
return false.

2011-07-25   Diego Novillo  

* gcc/testsuite/g++.dg/dg.exp: Remove tests in directory
thread-ann.

Index: gcc/testsuite/g++.dg/dg.exp
===
--- gcc/testsuite/g++.dg/dg.exp (revision 176640)
+++ gcc/testsuite/g++.dg/dg.exp (working copy)
@@ -48,6 +48,7 @@ set tests [prune $tests $srcdir/$subdir/
 set tests [prune $tests $srcdir/$subdir/torture/*]
 set tests [prune $tests $srcdir/$subdir/graphite/*]
 set tests [prune $tests $srcdir/$subdir/guality/*]
+set tests [prune $tests $srcdir/$subdir/thread-ann/*]
 
 # Main loop.
 dg-runtest $tests "" $DEFAULT_CXXFLAGS
Index: gcc/tree-threadsafe-analyze.c
===
--- gcc/tree-threadsafe-analyze.c   (revision 176640)
+++ gcc/tree-threadsafe-analyze.c   (working copy)
@@ -3571,7 +3571,8 @@ execute_threadsafe_analyze (void)
 static bool
 gate_threadsafe_analyze (void)
 {
-  return warn_thread_safety != 0;
+  /* FIXME google/main - Annotalysis is currently broken.  */
+  return false;
 }
 
 struct gimple_opt_pass pass_threadsafe_analyze =

--
This patch is available for review at http://codereview.appspot.com/4816050


Re: [PATCH, i386, take 2]: Rewrite LEA handling (was:Re: PATCH [10/n] X32: Support x32 LEA insns)

2011-07-25 Thread H.J. Lu
On Mon, Jul 25, 2011 at 12:59 PM, Uros Bizjak  wrote:
> On Mon, Jul 25, 2011 at 3:30 PM, H.J. Lu  wrote:
>
>>> Attached patch implements -fpic handling for x32. In x32 mode, we now
>>> use x86_64_general_operand and corresponding "e" constraints for adds
>>> in SImode, since it looks like invalid addresses can only be generated
>>> through adds. This avoids a whole bunch of new predicates and
>>> constraints.
>
>> X32 glibc is miscompiled:
>>
>> CPP='/export/build/gnu/gcc-x32/release/usr/gcc-4.7.0-x32/bin/gcc -mx32
>>  -E -x c-header'
>> /export/build/gnu/glibc-x32/build-x86_64-linux/elf/ld-linux-x32.so.2
>> --library-path 
>> /export/build/gnu/glibc-x32/build-x86_64-linux:/export/build/gnu/glibc-x32/build-x86_64-linux/math:/export/build/gnu/glibc-x32/build-x86_64-linux/elf:/export/build/gnu/glibc-x32/build-x86_64-linux/dlfcn:/export/build/gnu/glibc-x32/build-x86_64-linux/nss:/export/build/gnu/glibc-x32/build-x86_64-linux/nis:/export/build/gnu/glibc-x32/build-x86_64-linux/rt:/export/build/gnu/glibc-x32/build-x86_64-linux/resolv:/export/build/gnu/glibc-x32/build-x86_64-linux/crypt:/export/build/gnu/glibc-x32/build-x86_64-linux/nptl
>> /export/build/gnu/glibc-x32/build-x86_64-linux/sunrpc/rpcgen -Y
>> ../scripts -h rpcsvc/yppasswd.x -o
>> /export/build/gnu/glibc-x32/build-x86_64-linux/sunrpc/rpcsvc/yppasswd.T
>> make[5]: *** 
>> [/export/build/gnu/glibc-x32/build-x86_64-linux/sunrpc/rpcsvc/yppasswd.stmp]
>> Segmentation fault (core dumped)
>>
>> Some LEA patterns are wrong for x32.  I will investigate.
>
> We have to prevent symbols from entering general_operand-predicated
> SImode operands. Fortunately, x86_64_general_operand works OK for
> x32, while i686 and x86_64 are unaffected, due to an early bypass
> (i686) and to the fact that all symbols are DImode (x86_64).
>
> 2011-07-25  Uros Bizjak  
>            H.J. Lu  
>
>        PR target/47381
>        PR target/49832
>        PR target/49833
>        * config/i386/i386.md (i): Change SImode attribute to "e".
>        (g): Change SImode attribute to "rme".
>        (di): Change SImode attribute to "nF".
>        (general_operand): Change SImode attribute to x86_64_general_operand.
>        (general_szext_operand): Change SImode attribute to
>        x86_64_szext_general_operand.
>        (immediate_operand): Change SImode attribute to
>        x86_64_immediate_operand.
>        (*movdi_internal_rex64): Remove mode from pic_32bit_operand check.
>        (*movsi_internal): Ditto.  Use "e" constraint in alternative 2.
>        (*lea_1): Use SWI48 mode iterator.
>        (*lea_1_zext): New insn pattern.
>        (*add1): Use x86_64_general_operand predicate for operand 2.
>        Update operand constraints.
>        (addsi_1_zext): Ditto.
>        (*add2): Ditto.
>        (*addsi_3_zext): Ditto.
>        (*subsi_1_zext): Ditto.
>        (*subsi_2_zext): Ditto.
>        (*subsi_3_zext): Ditto.
>        (*addsi3_carry_zext): Ditto.
>        (*si3_zext_cc_overflow): Ditto.
>        (*mulsi3_1_zext): Ditto.
>        (*andsi_1): Ditto.
>        (*andsi_1_zext): Ditto.
>        (*andsi_2_zext): Ditto.
>        (*si_1_zext): Ditto.
>        (*si_2_zext): Ditto.
>        (*test_1): Use  predicate for operand 1.
>        (*and_2): Ditto.
>        (add->lea splitter): Check operand modes in insn constraint.  Extend
>        operands less than SImode wide to SImode.
>        (add->lea zext splitter): Do not extend input operands to DImode.
>        (*lea_general_1): Handle only QImode and HImode operands.
>        (*lea_general_2): Ditto.
>        (*lea_general_3): Ditto.
>        (*lea_general_1_zext): Remove.
>        (*lea_general_2_zext): Ditto.
>        (*lea_general_3_zext): Ditto.
>        (*lea_general_4): Check operand modes in insn constraint.  Extend
>        operands less than SImode wide to SImode.
>        (ashift->lea splitter): Ditto.
>        * config/i386/i386.c (ix86_print_operand_address): Print address
>        registers with 'q' modifier on 64bit targets.
>        * config/i386/predicates.md (pic_32bit_operand): Define as special
>        predicate.  Reject non-SI and non-DI modes.
>
> Bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32}.
>
> Uros.
>

GCC and glibc testsuites are clean on x32.  Can you check it in?

Thanks.


-- 
H.J.

