Failure to combine SHIFT with ZERO_EXTEND

2010-02-04 Thread Rahul Kharche
Hi All,

On our private port of GCC 4.4.1 we fail to combine successive SHIFT
operations, as in the following case:

#include <stdlib.h>
#include <stdio.h>

void f1 ()
{
  unsigned short t1;
  unsigned short t2;

  t1 = rand();
  t2 = rand();

  t1 <<= 1; t2 <<= 1;
  t1 <<= 1; t2 <<= 1;
  t1 <<= 1; t2 <<= 1;
  t1 <<= 1; t2 <<= 1;
  t1 <<= 1; t2 <<= 1;
  t1 <<= 1; t2 <<= 1;

  printf("%d\n", (t1+t2));
}

This looks like a ZERO_EXTEND problem: combining SHIFTs works correctly
on full-width integers, and on signed values too. The problem seems to
arise in the RTL combiner, which combines the ZERO_EXTEND with the
SHIFT to generate a SHIFT and an AND. Our architecture does not
support AND with large constants, so we have no matching insn pattern
(we prefer not to add one, because the large constants remain hanging
at the end of all RTL optimisations and cause needless reloads).
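
To make the intended result concrete, here is a small sketch in plain C
(not RTL) of what the combiner is expected to arrive at: six single-bit
shifts of an unsigned short collapse into one shift by six, and the zero
extension of the 16-bit result is the same as masking with 0xffff. The
function names are made up for illustration and do not appear in GCC.

```c
#include <stdint.h>

/* Six separate shifts, each truncated back to 16 bits, as in f1 ().  */
uint16_t
shift_stepwise (uint16_t t)
{
  int i;
  for (i = 0; i < 6; i++)
    t = (uint16_t) (t << 1);
  return t;
}

/* One combined shift in a wider register followed by a mask; the AND
   with 0xffff plays the role of the ZERO_EXTEND from the 16-bit mode.  */
uint32_t
shift_combined (uint16_t t)
{
  return ((uint32_t) t << 6) & 0xffffu;
}
```

Both functions return the same value for every 16-bit input, which is
why a single SHIFT plus a 16-bit mask (or extraction) is a valid
replacement for the sequence above.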

Fixing the combiner to convert masking AND operations to ZERO_EXTRACT
fixes this issue without any obvious regressions. I'm adding the
patch here against GCC 4.4.1 for any comments and/or suggestions.

Cheers,
Rahul


--- combine.c   2009-04-01 21:47:37.0 +0100
+++ combine.c   2010-02-04 15:04:41.0 +
@@ -446,6 +446,7 @@
 static void record_truncated_values (rtx *, void *);
 static bool reg_truncated_to_mode (enum machine_mode, const_rtx);
 static rtx gen_lowpart_or_truncate (enum machine_mode, rtx);
+static bool can_zero_extract_p (rtx, rtx, enum machine_mode);
 

 
 /* It is not safe to use ordinary gen_lowpart in combine.
@@ -6973,6 +6974,16 @@
   make_compound_operation (XEXP (x, 0),
                            next_code),
   i, NULL_RTX, 1, 1, 0, 1);
+  else if (can_zero_extract_p (XEXP (x, 0), XEXP (x, 1), mode))
+    {
+      unsigned HOST_WIDE_INT len = HOST_BITS_PER_WIDE_INT
+                                   - CLZ_HWI (UINTVAL (XEXP (x, 1)));
+      new_rtx = make_extraction (mode,
+                                 make_compound_operation (XEXP (x, 0),
+                                                          next_code),
+                                 0, NULL_RTX, len, 1, 0,
+                                 in_code == COMPARE);
+    }
 
   break;
 
@@ -7245,6 +7256,25 @@
 return simplify_gen_unary (TRUNCATE, mode, x, GET_MODE (x));
 }
 
+static bool
+can_zero_extract_p (rtx x, rtx mask_rtx, enum machine_mode mode)
+{
+  unsigned HOST_WIDE_INT count_lz, count_tz;
+  unsigned HOST_WIDE_INT nonzero, mask_all;
+  unsigned HOST_WIDE_INT mask_value = UINTVAL (mask_rtx);
+
+  mask_all = (unsigned HOST_WIDE_INT) -1;
+  nonzero = nonzero_bits (x, mode);
+  count_lz = CLZ_HWI (mask_value);
+  count_tz = CTZ_HWI (mask_value);
+
+  if (count_tz <= (unsigned HOST_WIDE_INT) CTZ_HWI (nonzero)
+      && ((mask_all >> (count_lz + count_tz)) << count_tz) == mask_value)
+    return true;
+  
+  return false;
+}
+
 /* See if X can be simplified knowing that we will only refer to it in
MODE and will only refer to those bits that are nonzero in MASK.
If other bits are being computed or if masking operations are done
@@ -8957,7 +8987,6 @@
 op0 = UNKNOWN;
 
   *pop0 = op0;
-
   /* ??? Slightly redundant with the above mask, but not entirely.
  Moving this above means we'd have to sign-extend the mode mask
  for the final test.  */
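
For readers who want to experiment with the test outside of GCC, the
following is a stand-alone sketch of the check that can_zero_extract_p
performs, written against plain 64-bit integers instead of rtx. The
helper names (clz64, ctz64, mask_is_low_extract, extract_low) and the
guards against a zero mask or a zero nonzero-bits value are assumptions
of this sketch, not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Count the leading zero bits of a nonzero 64-bit value.  */
static unsigned
clz64 (uint64_t v)
{
  unsigned n = 0;
  assert (v != 0);
  while (!(v & ((uint64_t) 1 << 63)))
    {
      v <<= 1;
      n++;
    }
  return n;
}

/* Count the trailing zero bits of a nonzero 64-bit value.  */
static unsigned
ctz64 (uint64_t v)
{
  unsigned n = 0;
  assert (v != 0);
  while (!(v & 1))
    {
      v >>= 1;
      n++;
    }
  return n;
}

/* Return 1 if (x & MASK) can be replaced by extracting the low *LEN bits
   of x, where NONZERO has a 1 in every bit of x that may be set.  */
int
mask_is_low_extract (uint64_t nonzero, uint64_t mask, unsigned *len)
{
  unsigned lz, tz;

  if (mask == 0 || nonzero == 0)
    return 0;
  lz = clz64 (mask);
  tz = ctz64 (mask);
  /* The mask must be one contiguous run of ones ...  */
  if (((~(uint64_t) 0 >> (lz + tz)) << tz) != mask)
    return 0;
  /* ... and x must have no possibly-set bits below that run.  */
  if (tz > ctz64 (nonzero))
    return 0;
  *len = 64 - lz;
  return 1;
}

/* Keep only the low LEN bits of X (a zero extraction at position 0).  */
uint64_t
extract_low (uint64_t x, unsigned len)
{
  return len >= 64 ? x : x & (((uint64_t) 1 << len) - 1);
}
```

When mask_is_low_extract succeeds, extract_low (x, len) equals x & mask
for any x whose set bits are all within nonzero, which is the property
the patch relies on when it calls make_extraction with position 0.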


MicroBlaze branch updated

2010-02-04 Thread Michael Eager

The microblaze branch has been synced with gcc-head and
updated to gcc-4.5.0.

--
Michael Eager    ea...@eagercon.com
1960 Park Blvd., Palo Alto, CA 94306  650-325-8077


Exception handling information in the macintosh

2010-02-04 Thread jacob navia

Hi

I have developed a JIT for 64-bit Linux. It generates exception handling
information according to DWARF under Linux, and it works with gcc 4.2.1.

I have recompiled the same code on the Macintosh and something has
apparently changed, because now any throw that passes through my code
crashes.

Are there any differences in the exception info format between the
Macintosh and Linux?

The stack at the moment of the throw looks like this:

   CPP code compiled with gcc 4.2.1 calls
   JIT code generated on the fly by my JIT compiler that calls
   CPP code compiled with gcc 4.2.1 that throws. The catch
   is in the CPP code

The throw must go through the JIT code, so it needs the DWARF frame
descriptions that I generate. Apparently there is a difference.

Thanks in advance for any information.

jacob





Re: gen_lowpart called where 'truncate' needed?

2010-02-04 Thread Mat Hostetter
Adam Nemet  writes:

> Ian Lance Taylor  writes:
> > Mat Hostetter  writes:
> >
> >> Since the high bits are already zero, that would be less efficient on
> >> most platforms, so guarding it with something like this would probably
> >> be smarter:
> >>
> >>   if (targetm.mode_rep_extended (mode, GET_MODE(x)) == SIGN_EXTEND)
> >> return simplify_gen_unary (TRUNCATE, mode, x, GET_MODE (x));
> >>
> >> I'm happy to believe I'm doing something wrong in my back end, but I'm
> >> not sure what that would be.  I could also believe these are obscure
> >> edge cases no one cared about before.  Any tips would be appreciated.
> >
> > Interesting.  I think you are in obscure edge case territory.  Your
> > suggestion makes sense to me, and in fact it should probably be put
> > into gen_lowpart_common.
> 
> FWIW, I disagree.  Firstly, mode_rep_extended is a special case of
> !TRULY_NOOP_TRUNCATION so the above check should use that.  Secondly, in
> MIPS we call gen_lowpart to convert DI to SI when we know it's safe.  In
> this case we always want a subreg not a truncate for better code.  So I
> don't think gen_lowpart_common is the right place to fix this.
>
> I think the right fix is to call convert_to_mode or convert_move in the
> expansion code which ensure the proper truncation.

That would yield correct code, but wouldn't it throw away the fact
that the high bits are already known to be zero, and yield redundant
zero-extension on some platforms?  I'm guessing that's why the code was
originally written to call convert_lowpart rather than convert_to_mode.

And just to add a minor wrinkle: for the widen_bswap case, which
produces (bswap64 (x) >> 32), the optimal thing is actually to use
ashiftrt instead of lshiftrt when targetm.mode_rep_extended says
SIGN_EXTEND, and then call convert_lowpart as now.

-Mat


help splitting instruction with two memory operands

2010-02-04 Thread King, Steven R
Hello List,

I'm new to gcc internals.  As part of an experiment, I copied the i386 back-end 
in gcc 4.2.2 to create my own i386-like target arch.  At some point, my hacking 
caused my i386 to produce assembly with two memory touching operands in one 
instruction, like this:

    movl    12(%ebp), -44(%ebp)
    movl    16(%ebp), -40(%ebp)
    movl    20(%ebp), -36(%ebp)

I compiled with -O0.

I need to fix this back to i386 style with only 1 memory reference allowed per 
instruction.  After a day looking, I'm unable to determine what I did wrong 
either in the i386.md or i386.c files to cause this.  My own i386 is quite 
hacked up, so just doing a diff with the original file was too noisy to give 
much clue.

Can anyone please offer a pointer as to where RTL with two memory references is 
split in the i386 back end?

Thank you,
-steve


gcc-4.5-20100204 is now available

2010-02-04 Thread gccadmin
Snapshot gcc-4.5-20100204 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.5-20100204/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.5 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/trunk revision 156503

You'll find:

gcc-4.5-20100204.tar.bz2            Complete GCC (includes all of below)

gcc-core-4.5-20100204.tar.bz2       C front end and core compiler

gcc-ada-4.5-20100204.tar.bz2        Ada front end and runtime

gcc-fortran-4.5-20100204.tar.bz2    Fortran front end and runtime

gcc-g++-4.5-20100204.tar.bz2        C++ front end and runtime

gcc-java-4.5-20100204.tar.bz2       Java front end and runtime

gcc-objc-4.5-20100204.tar.bz2       Objective-C front end and runtime

gcc-testsuite-4.5-20100204.tar.bz2  The GCC testsuite

Diffs from 4.5-20100128 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.5
link is updated and a message is sent to the gcc list.  Please do not use
a snapshot before it has been announced that way.


Unwanted IRA copies via process_reg_shuffles

2010-02-04 Thread Jeff Law


I was looking at a regression caused by having ira-reload utilize the 
existing copy detection code in IRA rather than my own and stumbled upon 
this...


Consider this insn prior to IRA:

(insn 72 56 126 8 j.c:744 (parallel [
            (set (reg:SI 110)
                (minus:SI (reg:SI 69 [ ew_u$parts$lsw ])
                    (reg:SI 68 [ ew_u$parts$lsw ])))
            (clobber (reg:CC 17 flags))
        ]) 290 {*subsi_1} (expr_list:REG_DEAD (reg:SI 69 [ ew_u$parts$lsw ])
        (expr_list:REG_DEAD (reg:SI 68 [ ew_u$parts$lsw ])
            (expr_list:REG_UNUSED (reg:CC 17 flags)
                (nil)))))

Which matches this pattern in the x86 backend:

(define_insn "*sub<mode>_1"
  [(set (match_operand:SWI 0 "nonimmediate_operand" "=<r>m,<r>")
        (minus:SWI
          (match_operand:SWI 1 "nonimmediate_operand" "0,0")
          (match_operand:SWI 2 "<general_operand>" "<r><i>,<r>m")))
   (clobber (reg:CC FLAGS_REG))]
  "ix86_binary_operator_ok (MINUS, <MODE>mode, operands)"
  "sub{<imodesuffix>}\t{%2, %0|%0, %2}"
  [(set_attr "type" "alu")
   (set_attr "mode" "<MODE>")])


Note carefully that the constraints require operands 0 and 1 to match.  
Operand 2 is not tied to any other operand.  In fact, if operand 2 is 
tied to operand0, then we are guaranteed to generate a reload and muck 
up the code pretty badly.



Now looking at the copies recorded by IRA we have:


  cp0:a0(r95)<->a1(r70)@248:move
  cp1:a4(r69)<->a6(r110)@178:constraint
  cp2:a3(r68)<->a6(r110)@22:shuffle
  cp3:a9(r66)<->a10(r92)@11:shuffle
  cp4:a11(r79)<->a12(r93)@89:move
  cp5:a1(r70)<->a14(r96)@114:constraint
  cp6:a1(r70)<->a13(r97)@114:constraint


Note carefully cp2 which claims a shuffle-copy between r68 and r110.   
ISTM that when trying to assign a hard reg to pseudo r110 that if r68 
has a hard reg, but r69 does not, then pseudo r110 will show a cost 
savings if it is allocated into the same hard reg as pseudo r68.



The problematic code is add_insn_allocno_copies:
    {
      extract_insn (insn);
      for (i = 0; i < recog_data.n_operands; i++)
        {
          operand = recog_data.operand[i];
          if (REG_SUBREG_P (operand)
              && find_reg_note (insn, REG_DEAD,
                                REG_P (operand)
                                ? operand : SUBREG_REG (operand)) != NULL_RTX)
            {
              str = recog_data.constraints[i];
              while (*str == ' ' || *str == '\t')
                str++;
              bound_p = false;
              for (j = 0, commut_p = false; j < 2; j++, commut_p = true)
                if ((dup = get_dup (i, commut_p)) != NULL_RTX
                    && REG_SUBREG_P (dup)
                    && process_regs_for_copy (operand, dup, true,
                                              NULL_RTX, freq))
                  bound_p = true;
              if (bound_p)
                continue;
              /* If an operand dies, prefer its hard register for the
                 output operands by decreasing the hard register cost
                 or creating the corresponding allocno copies.  The
                 cost will not correspond to a real move insn cost, so
                 make the frequency smaller.  */
              process_reg_shuffles (operand, i, freq < 8 ? 1 : freq / 8);
            }
        }
    }


With r68 dying and not bound to another operand, we create a
reg-shuffle copy to encourage tying r68 to the output operand.  Not good.


ISTM that if an output is already bound to some input that we should not 
be recording a copy between an unbound dying input and the bound output.
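
As a rough illustration of that rule, here is a toy model in C. It does
not use any IRA data structures: struct toy_operand, output_already_bound
and want_shuffle_copy are hypothetical names invented for this sketch,
and the model only tracks whether an operand is an output, whether it
dies in the insn, and which operand (if any) a matching constraint ties
it to.

```c
#include <stddef.h>

/* Hypothetical, much simplified stand-in for one insn operand.  */
struct toy_operand
{
  int is_output;   /* nonzero for an output operand */
  int dies;        /* nonzero if a REG_DEAD note names this register */
  int tied_to;     /* index of the operand it must match, or -1 */
};

/* Return nonzero if some output operand is already tied to an input
   by a matching constraint.  */
static int
output_already_bound (const struct toy_operand *ops, size_t n)
{
  size_t i;

  for (i = 0; i < n; i++)
    if (!ops[i].is_output && ops[i].tied_to >= 0
        && ops[ops[i].tied_to].is_output)
      return 1;
  return 0;
}

/* Return nonzero if a shuffle copy should be recorded for operand I.  */
int
want_shuffle_copy (const struct toy_operand *ops, size_t n, size_t i)
{
  if (ops[i].is_output || !ops[i].dies)
    return 0;
  if (ops[i].tied_to >= 0)
    return 0;   /* already handled as a constraint copy */
  /* An unbound dying input only gets a shuffle copy when the output is
     not already bound to another input.  */
  return !output_already_bound (ops, n);
}
```

For the subtract insn above, operand 1 is tied to operand 0, so the model
declines to record a shuffle copy for the dying, unbound operand 2.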


You can play with the attached (meaningless) testcase using
-O2 -m32 -fPIC.  You won't see code quality regressions due to this
problem on this testcase with the mainline sources, but it should be
enough to trigger the bogus copy.


Thoughts?

Jeff
typedef unsigned int size_t;
typedef int wchar_t;
typedef struct
{
  int quot;
  int rem;
} div_t;
typedef struct
{
  long int quot;
  long int rem;
} ldiv_t;
__extension__ typedef struct
{
  long long int quot;
  long long int rem;
} lldiv_t;
extern size_t __ctype_get_mb_cur_max (void) __attribute__ ((__nothrow__));
extern double atof (__const char *__nptr)
  __attribute__ ((__nothrow__)) __attribute__ ((__pure__))
  __attribute__ ((__nonnull__ (1)));
extern int atoi (__const char *__nptr) __attribute__ ((__nothrow__))
  __attribute__ ((__pure__)) __attribute__ ((__nonnull__ (1)));
extern long int atol (__const char *__nptr) __attribute__ ((__nothrow__))
  __attribute__ ((__pure__)) __attribute__ ((__nonnull__ (1)));
__extension__ extern long long int atoll (__const char *__nptr)
  __attribute__ ((__nothrow__)) __attribute__ ((__pure__))
  __attribute__ ((__nonnull__ (1)));
extern double strtod (__const char *__restrict __nptr,
  char **__restrict __endptr)
  __attribute__ ((__nothrow__)) __attribute__ ((__nonnull__ (1)));
extern float strtof (__const char *__restrict __nptr,
 char **__restrict __endptr) __attribute__ ((__nothrow__))
  __attribute__ ((__nonnull__ (1)));
extern long double s

Re: Exception handling information in the macintosh

2010-02-04 Thread Jack Howarth
On Thu, Feb 04, 2010 at 08:12:10PM +0100, jacob navia wrote:
> Hi
>
> I have developed a JIT for linux 64 bits. It generates exception  
> handling information
> according to DWARF under linux and it works with gcc 4.2.1.
>
> I have recompiled the same code under the Macintosh and something has  
> changed,
> apparently, because now any throw that passes through my code crashes.
>
> Are there any differences bertween the exception info format between the
> macintosh and linux?
>
> The stack at the moment of the throw looks like this:
>
>    CPP code compiled with gcc  4.2.1 calls
>    JIT code generated on the fly by my JIT compiler that calls
>    CPP code compiled with gcc 4.2.1 that throws. The catch
>    is in the CPP code
>
> The throw must go through the JIT code, so it needs the DWARF frame  
> descriptions
> that I generate. Apparently there is a difference.
>
> Thanks in advance for any information.
>
> jacob
>
>

Jacob,
Are you compiling on darwin10 and using the Apple or FSF
gcc compilers? If you are using Apple's, this question should
be on the darwin-devel mailing list instead. I would mention
though that darwin10 is problematic in that the libgcc and its
unwinder calls are now subsumed into libSystem. This means that
regardless of how you try to link in libgcc, the new code in
libSystem will always be used. For darwin10, Apple decided to
default their linker over to compact unwind which causes problems
with some of the java testcases on gcc 4.4.x. This is fixed for
FSF gcc 4.5 by forcing the compiler to always link with the
-no_compact_unwind option. Another complexity is that Apple
decided to silently abort some of the libgcc calls (now in
libSystem) that require access to FDEs like _Unwind_FindEnclosingFunction().
The reasoning was that the default behavior (compact unwind info) doesn't
use FDEs.
   This is fixed for gcc 4.5 by 
http://gcc.gnu.org/ml/gcc-patches/2009-12/msg00998.html.
If you are using any other unwinder call that is now silently
aborting, let me know as it may be another that we need to re-export
under a different name from libgcc_ext. Alternatively, you may
be able to work around this issue by using -mmacosx-version-min=10.5
under darwin10.
   Jack


Re: gen_lowpart called where 'truncate' needed?

2010-02-04 Thread Adam Nemet
Mat Hostetter writes:
> Adam Nemet  writes:
> 
> > Ian Lance Taylor  writes:
> > > Mat Hostetter  writes:
> > >
> > >> Since the high bits are already zero, that would be less efficient on
> > >> most platforms, so guarding it with something like this would probably
> > >> be smarter:
> > >>
> > >>   if (targetm.mode_rep_extended (mode, GET_MODE(x)) == SIGN_EXTEND)
> > >> return simplify_gen_unary (TRUNCATE, mode, x, GET_MODE (x));
> > >>
> > >> I'm happy to believe I'm doing something wrong in my back end, but I'm
> > >> not sure what that would be.  I could also believe these are obscure
> > >> edge cases no one cared about before.  Any tips would be appreciated.
> > >
> > > Interesting.  I think you are in obscure edge case territory.  Your
> > > suggestion makes sense to me, and in fact it should probably be put
> > > into gen_lowpart_common.
> > 
> > FWIW, I disagree.  Firstly, mode_rep_extended is a special case of
> > !TRULY_NOOP_TRUNCATION so the above check should use that.  Secondly, in
> > MIPS we call gen_lowpart to convert DI to SI when we know it's safe.  In
> > this case we always want a subreg not a truncate for better code.  So I
> > don't think gen_lowpart_common is the right place to fix this.
> >
> > I think the right fix is to call convert_to_mode or convert_move in the
> > expansion code which ensure the proper truncation.
> 
> That would yield correct code, but wouldn't it throw away the fact
> that the high bits are already known to be zero, and yield redundant
> zero-extension on some platforms?  I'm guessing that's why the code was
> originally written to call convert_lowpart rather than convert_to_mode.

convert_to_mode uses gen_lowpart for truncation if TRULY_NOOP_TRUNCATION.

> And just to add a minor wrinkle: for the widen_bswap case, which
> produces (bswap64 (x) >> 32), the optimal thing is actually to use
> ashiftrt instead of lshiftrt when targetm.mode_rep_extended says
> SIGN_EXTEND, and then call convert_lowpart as now.

Agreed but we sort of do this due to this pattern in mips.md:

(define_insn "*lshr32_trunc<mode>"
  [(set (match_operand:SUBDI 0 "register_operand" "=d")
        (truncate:SUBDI
          (lshiftrt:DI (match_operand:DI 1 "register_operand" "d")
                       (const_int 32))))]
  "TARGET_64BIT && !TARGET_MIPS16"
  "dsra\t%0,%1,32"
  [(set_attr "type" "shift")
   (set_attr "mode" "<MODE>")])


That said with mode_rep_extended this optimization could now be performed by
the middle-end in simplify-rtx.c:

(truncate:SI (lshiftrt:DI .. 32))
->
(subreg:SI (ashiftrt:DI .. 32) )

if targetm.mode_rep_extended (SImode, DImode) == SIGN_EXTEND.
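
The reason the two forms agree can be checked with ordinary integer
arithmetic. The sketch below is plain C, not RTL, and its function names
are invented for illustration: it models the logical shift followed by a
truncate, a portable arithmetic shift by 32, and a test for whether a
64-bit value is the sign extension of its low 32 bits (the register
representation assumed when mode_rep_extended returns SIGN_EXTEND).

```c
#include <stdint.h>

/* Logical right shift by 32, then keep the low 32 bits; models
   (truncate:SI (lshiftrt:DI x 32)).  */
uint32_t
high_word_logical (uint64_t x)
{
  return (uint32_t) (x >> 32);
}

/* Arithmetic right shift by 32 written portably: the vacated upper bits
   are filled with copies of bit 63.  Models (ashiftrt:DI x 32).  */
uint64_t
ashiftrt32 (uint64_t x)
{
  uint64_t hi = x >> 32;
  if (x >> 63)
    hi |= 0xffffffff00000000ull;
  return hi;
}

/* Nonzero if the 64-bit value V is the sign extension of its own low
   32 bits.  */
int
is_sign_extended32 (uint64_t v)
{
  uint64_t upper = v >> 31;   /* bit 31 and everything above it */
  return upper == 0 || upper == 0x1ffffffffull;
}
```

The low 32 bits of ashiftrt32 (x) always equal high_word_logical (x), and
the full 64-bit result of ashiftrt32 is already in sign-extended form, so
taking its lowpart needs no extra instruction. The result of a logical
shift, by contrast, is not sign-extended whenever bit 63 of x is set.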

Adam