Failure to combine SHIFT with ZERO_EXTEND
Hi All,

On our private port of GCC 4.4.1 we fail to combine successive SHIFT
operations, as in the following case:

#include <stdio.h>
#include <stdlib.h>

void f1 ()
{
  unsigned short t1;
  unsigned short t2;

  t1 = rand();
  t2 = rand();

  t1 <<= 1; t2 <<= 1;
  t1 <<= 1; t2 <<= 1;
  t1 <<= 1; t2 <<= 1;
  t1 <<= 1; t2 <<= 1;
  t1 <<= 1; t2 <<= 1;
  t1 <<= 1; t2 <<= 1;

  printf("%d\n", (t1+t2));
}

This is a ZERO_EXTEND problem: combining SHIFTs on whole integers works
correctly, as it does for signed values.

The problem seems to arise in the RTL combiner, which combines the
ZERO_EXTEND with the SHIFT to generate a SHIFT and an AND. Our
architecture does not support AND with large constants and hence does not
have a matching insn pattern (we prefer not to add one, because large
constants remain hanging at the end of all RTL optimisations and cause
needless reloads).

Fixing the combiner to convert masking AND operations to ZERO_EXTRACT
fixes this issue without any obvious regressions. I'm adding the patch
here against GCC 4.4.1 for any comments and/or suggestions.

Cheers,
Rahul

--- combine.c 2009-04-01 21:47:37.0 +0100
+++ combine.c 2010-02-04 15:04:41.0 +
@@ -446,6 +446,7 @@
 static void record_truncated_values (rtx *, void *);
 static bool reg_truncated_to_mode (enum machine_mode, const_rtx);
 static rtx gen_lowpart_or_truncate (enum machine_mode, rtx);
+static bool can_zero_extract_p (rtx, rtx, enum machine_mode);

 /* It is not safe to use ordinary gen_lowpart in combine.
@@ -6973,6 +6974,16 @@
                                    make_compound_operation (XEXP (x, 0),
                                                             next_code),
                                    i, NULL_RTX, 1, 1, 0, 1);
+      else if (can_zero_extract_p (XEXP (x, 0), XEXP (x, 1), mode))
+        {
+          unsigned HOST_WIDE_INT len = HOST_BITS_PER_WIDE_INT
+                                       - CLZ_HWI (UINTVAL (XEXP (x, 1)));
+          new_rtx = make_extraction (mode,
+                                     make_compound_operation (XEXP (x, 0),
+                                                              next_code),
+                                     0, NULL_RTX, len, 1, 0,
+                                     in_code == COMPARE);
+        }
       break;
@@ -7245,6 +7256,25 @@
   return simplify_gen_unary (TRUNCATE, mode, x, GET_MODE (x));
 }

+static bool
+can_zero_extract_p (rtx x, rtx mask_rtx, enum machine_mode mode)
+{
+  unsigned HOST_WIDE_INT count_lz, count_tz;
+  unsigned HOST_WIDE_INT nonzero, mask_all;
+  unsigned HOST_WIDE_INT mask_value = UINTVAL (mask_rtx);
+
+  mask_all = (unsigned HOST_WIDE_INT) -1;
+  nonzero = nonzero_bits (x, mode);
+  count_lz = CLZ_HWI (mask_value);
+  count_tz = CTZ_HWI (mask_value);
+
+  if (count_tz <= (unsigned HOST_WIDE_INT) CTZ_HWI (nonzero)
+      && ((mask_all >> (count_lz + count_tz)) << count_tz) == mask_value)
+    return true;
+
+  return false;
+}
+
 /* See if X can be simplified knowing that we will only refer to it in
    MODE and will only refer to those bits that are nonzero in MASK.
    If other bits are being computed or if masking operations are done
@@ -8957,7 +8987,6 @@
   op0 = UNKNOWN;
   *pop0 = op0;
-
   /* ??? Slightly redundant with the above mask, but not entirely.
      Moving this above means we'd have to sign-extend the mode mask
      for the final test.  */
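
For readers who want to experiment with the mask test outside of GCC, here is a minimal stand-alone sketch of the arithmetic in can_zero_extract_p. It is not GCC code: the helper names (clz64, ctz64, mask_is_low_aligned_run, extract_len) are hypothetical, a fixed 64-bit word stands in for HOST_WIDE_INT, and the `nonzero` argument stands in for whatever nonzero_bits () would report. Unlike the patch, the sketch rejects a zero mask explicitly.

```c
#include <stdbool.h>
#include <stdint.h>

/* Count leading zero bits; returns 64 for v == 0 (hypothetical helper). */
static unsigned clz64(uint64_t v)
{
    unsigned n = 0;
    for (uint64_t bit = UINT64_C(1) << 63; bit != 0 && !(v & bit); bit >>= 1)
        n++;
    return n;
}

/* Count trailing zero bits; returns 64 for v == 0 (hypothetical helper). */
static unsigned ctz64(uint64_t v)
{
    unsigned n = 0;
    for (uint64_t bit = 1; bit != 0 && !(v & bit); bit <<= 1)
        n++;
    return n;
}

/* True when MASK is one contiguous run of 1 bits and every bit below the
   run is already known to be zero in the operand (NONZERO has no bits set
   there).  In that case (x & MASK) equals a zero-extraction of the low
   extract_len (MASK) bits of x.  */
static bool mask_is_low_aligned_run(uint64_t mask, uint64_t nonzero)
{
    if (mask == 0)
        return false;               /* assumption: reject the empty mask */

    unsigned lz = clz64(mask);
    unsigned tz = ctz64(mask);
    /* Rebuild the contiguous run spanning the mask's outermost 1 bits. */
    uint64_t run = (UINT64_MAX >> (lz + tz)) << tz;

    return tz <= ctz64(nonzero) && run == mask;
}

/* Width of the field to extract, counted from bit 0, as in the patch. */
static unsigned extract_len(uint64_t mask)
{
    return 64 - clz64(mask);
}
```

With the test case from the mail (an unsigned short shifted left six times), the combiner would see a mask of 0xFFC0 over an operand whose low six bits are known zero, so the check succeeds and the field width is 16. A mask such as 0xFF0F, which is not a single run, is rejected.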
MicroBlaze branch updated
The microblaze branch has been synced with gcc-head and updated to
gcc-4.5.0.

--
Michael Eager ea...@eagercon.com
1960 Park Blvd., Palo Alto, CA 94306 650-325-8077
Exception handling information in the macintosh
Hi

I have developed a JIT for 64-bit Linux. It generates exception handling
information according to DWARF under Linux, and it works with gcc 4.2.1.

I have recompiled the same code under the Macintosh and something has
apparently changed, because now any throw that passes through my code
crashes.

Are there any differences in the exception info format between the
Macintosh and Linux?

The stack at the moment of the throw looks like this:

   CPP code compiled with gcc 4.2.1 calls
   JIT code generated on the fly by my JIT compiler that calls
   CPP code compiled with gcc 4.2.1 that throws. The catch
   is in the CPP code

The throw must go through the JIT code, so it needs the DWARF frame
descriptions that I generate. Apparently there is a difference.

Thanks in advance for any information.

jacob
Re: gen_lowpart called where 'truncate' needed?
Adam Nemet writes:

> Ian Lance Taylor writes:
> > Mat Hostetter writes:
> >
> >> Since the high bits are already zero, that would be less efficient on
> >> most platforms, so guarding it with something like this would probably
> >> be smarter:
> >>
> >>   if (targetm.mode_rep_extended (mode, GET_MODE(x)) == SIGN_EXTEND)
> >>     return simplify_gen_unary (TRUNCATE, mode, x, GET_MODE (x));
> >>
> >> I'm happy to believe I'm doing something wrong in my back end, but I'm
> >> not sure what that would be. I could also believe these are obscure
> >> edge cases no one cared about before. Any tips would be appreciated.
> >
> > Interesting. I think you are in obscure edge case territory. Your
> > suggestion makes sense to me, and in fact it should probably be put
> > into gen_lowpart_common.
>
> FWIW, I disagree. Firstly, mode_rep_extended is a special case of
> !TRULY_NOOP_TRUNCATION so the above check should use that. Secondly, in
> MIPS we call gen_lowpart to convert DI to SI when we know it's safe. In
> this case we always want a subreg not a truncate for better code. So I
> don't think gen_lowpart_common is the right place to fix this.
>
> I think the right fix is to call convert_to_mode or convert_move in the
> expansion code which ensure the proper truncation.

That would yield correct code, but wouldn't it throw away the fact
that the high bits are already known to be zero, and yield redundant
zero-extension on some platforms? I'm guessing that's why the code was
originally written to call convert_lowpart rather than convert_to_mode.

And just to add a minor wrinkle: for the widen_bswap case, which
produces (bswap64 (x) >> 32), the optimal thing is actually to use
ashiftrt instead of lshiftrt when targetm.mode_rep_extended says
SIGN_EXTEND, and then call convert_lowpart as now.

-Mat
help splitting instruction with two memory operands
Hello List,

I'm new to gcc internals. As part of an experiment, I copied the i386
back end in gcc 4.2.2 to create my own i386-like target arch.

At some point, my hacking caused my i386 to produce assembly with two
memory operands in one instruction, like this:

        movl    12(%ebp), -44(%ebp)
        movl    16(%ebp), -40(%ebp)
        movl    20(%ebp), -36(%ebp)

I compiled with -O0.

I need to fix this back to i386 style, with only one memory reference
allowed per instruction. After a day of looking, I'm unable to determine
what I did wrong in either the i386.md or i386.c files to cause this. My
own i386 is quite hacked up, so a diff against the original files was too
noisy to give much of a clue.

Can anyone please offer a pointer as to where RTL with two memory
references is split in the i386 back end?

Thank you,
-steve
gcc-4.5-20100204 is now available
Snapshot gcc-4.5-20100204 is now available on
  ftp://gcc.gnu.org/pub/gcc/snapshots/4.5-20100204/
and on various mirrors, see http://gcc.gnu.org/mirrors.html for details.

This snapshot has been generated from the GCC 4.5 SVN branch
with the following options: svn://gcc.gnu.org/svn/gcc/trunk revision 156503

You'll find:

gcc-4.5-20100204.tar.bz2            Complete GCC (includes all of below)
gcc-core-4.5-20100204.tar.bz2       C front end and core compiler
gcc-ada-4.5-20100204.tar.bz2        Ada front end and runtime
gcc-fortran-4.5-20100204.tar.bz2    Fortran front end and runtime
gcc-g++-4.5-20100204.tar.bz2        C++ front end and runtime
gcc-java-4.5-20100204.tar.bz2       Java front end and runtime
gcc-objc-4.5-20100204.tar.bz2       Objective-C front end and runtime
gcc-testsuite-4.5-20100204.tar.bz2  The GCC testsuite

Diffs from 4.5-20100128 are available in the diffs/ subdirectory.

When a particular snapshot is ready for public consumption the LATEST-4.5
link is updated and a message is sent to the gcc list. Please do not use
a snapshot before it has been announced that way.
Unwanted IRA copies via process_reg_shuffles
I was looking at a regression caused by having ira-reload utilize the
existing copy detection code in IRA rather than my own and stumbled upon
this...

Consider this insn prior to IRA:

(insn 72 56 126 8 j.c:744 (parallel [
            (set (reg:SI 110)
                (minus:SI (reg:SI 69 [ ew_u$parts$lsw ])
                    (reg:SI 68 [ ew_u$parts$lsw ])))
            (clobber (reg:CC 17 flags))
        ]) 290 {*subsi_1} (expr_list:REG_DEAD (reg:SI 69 [ ew_u$parts$lsw ])
        (expr_list:REG_DEAD (reg:SI 68 [ ew_u$parts$lsw ])
            (expr_list:REG_UNUSED (reg:CC 17 flags)
                (nil)))))

Which matches this pattern in the x86 backend:

(define_insn "*sub_1"
  [(set (match_operand:SWI 0 "nonimmediate_operand" "=m,")
        (minus:SWI
          (match_operand:SWI 1 "nonimmediate_operand" "0,0")
          (match_operand:SWI 2 "" ",m")))
   (clobber (reg:CC FLAGS_REG))]
  "ix86_binary_operator_ok (MINUS, mode, operands)"
  "sub{}\t{%2, %0|%0, %2}"
  [(set_attr "type" "alu")
   (set_attr "mode" "")])

Note carefully that the constraints require operands 0 and 1 to match.
Operand 2 is not tied to any other operand. In fact, if operand 2 is
tied to operand 0, then we are guaranteed to generate a reload and muck
up the code pretty badly.

Now looking at the copies recorded by IRA we have:

  cp0:a0(r95)<->a1(r70)@248:move
  cp1:a4(r69)<->a6(r110)@178:constraint
  cp2:a3(r68)<->a6(r110)@22:shuffle
  cp3:a9(r66)<->a10(r92)@11:shuffle
  cp4:a11(r79)<->a12(r93)@89:move
  cp5:a1(r70)<->a14(r96)@114:constraint
  cp6:a1(r70)<->a13(r97)@114:constraint

Note carefully cp2, which claims a shuffle-copy between r68 and r110.
ISTM that when trying to assign a hard reg to pseudo r110, if r68 has a
hard reg but r69 does not, then pseudo r110 will show a cost savings if
it is allocated into the same hard reg as pseudo r68.

The problematic code is in add_insn_allocno_copies:

    {
      extract_insn (insn);
      for (i = 0; i < recog_data.n_operands; i++)
        {
          operand = recog_data.operand[i];
          if (REG_SUBREG_P (operand)
              && find_reg_note (insn, REG_DEAD,
                                REG_P (operand)
                                ? operand : SUBREG_REG (operand)) != NULL_RTX)
            {
              str = recog_data.constraints[i];
              while (*str == ' ' || *str == '\t')
                str++;
              bound_p = false;
              for (j = 0, commut_p = false; j < 2; j++, commut_p = true)
                if ((dup = get_dup (i, commut_p)) != NULL_RTX
                    && REG_SUBREG_P (dup)
                    && process_regs_for_copy (operand, dup, true,
                                              NULL_RTX, freq))
                  bound_p = true;
              if (bound_p)
                continue;
              /* If an operand dies, prefer its hard register for the
                 output operands by decreasing the hard register cost
                 or creating the corresponding allocno copies.  The
                 cost will not correspond to a real move insn cost, so
                 make the frequency smaller.  */
              process_reg_shuffles (operand, i, freq < 8 ? 1 : freq / 8);
            }
        }
    }

With r68 dying and not bound to another operand, we create a reg-shuffle
copy to encourage tying r68 to the output operand. Not good.

ISTM that if an output is already bound to some input, we should not be
recording a copy between an unbound dying input and the bound output.

You can play with the attached (meaningless) testcase with -O2 -m32
-fPIC. You won't see code quality regressions due to this problem on
this testcase with the mainline sources, but it should be enough to
trigger the bogus copy.

Thoughts?
Jeff

typedef unsigned int size_t;
typedef int wchar_t;
typedef struct { int quot; int rem; } div_t;
typedef struct { long int quot; long int rem; } ldiv_t;
__extension__ typedef struct { long long int quot; long long int rem; } lldiv_t;
extern size_t __ctype_get_mb_cur_max (void) __attribute__ ((__nothrow__));
extern double atof (__const char *__nptr)
  __attribute__ ((__nothrow__)) __attribute__ ((__pure__)) __attribute__ ((__nonnull__ (1)));
extern int atoi (__const char *__nptr)
  __attribute__ ((__nothrow__)) __attribute__ ((__pure__)) __attribute__ ((__nonnull__ (1)));
extern long int atol (__const char *__nptr)
  __attribute__ ((__nothrow__)) __attribute__ ((__pure__)) __attribute__ ((__nonnull__ (1)));
__extension__ extern long long int atoll (__const char *__nptr)
  __attribute__ ((__nothrow__)) __attribute__ ((__pure__)) __attribute__ ((__nonnull__ (1)));
extern double strtod (__const char *__restrict __nptr, char **__restrict __endptr)
  __attribute__ ((__nothrow__)) __attribute__ ((__nonnull__ (1)));
extern float strtof (__const char *__restrict __nptr, char **__restrict __endptr)
  __attribute__ ((__nothrow__)) __attribute__ ((__nonnull__ (1)));
extern long double s
Re: Exception handling information in the macintosh
On Thu, Feb 04, 2010 at 08:12:10PM +0100, jacob navia wrote:
> Hi
>
> I have developed a JIT for linux 64 bits. It generates exception
> handling information
> according to DWARF under linux and it works with gcc 4.2.1.
>
> I have recompiled the same code under the Macintosh and something has
> changed,
> apparently, because now any throw that passes through my code crashes.
>
> Are there any differences in the exception info format between the
> macintosh and linux?
>
> The stack at the moment of the throw looks like this:
>
>    CPP code compiled with gcc 4.2.1 calls
>    JIT code generated on the fly by my JIT compiler that calls
>    CPP code compiled with gcc 4.2.1 that throws. The catch
>    is in the CPP code
> The throw must go through the JIT code, so it needs the DWARF frame
> descriptions
> that I generate. Apparently there is a difference.
>
> Thanks in advance for any information.
>
> jacob
>

Jacob,

Are you compiling on darwin10, and are you using the Apple or the FSF gcc
compilers? If you are using Apple's, this question should go to the
darwin-devel mailing list instead.

I would mention, though, that darwin10 is problematic in that libgcc and
its unwinder calls are now subsumed into libSystem. This means that
regardless of how you try to link in libgcc, the new code in libSystem
will always be used.

For darwin10, Apple decided to default their linker to compact unwind,
which causes problems with some of the java testcases on gcc 4.4.x. This
is fixed for FSF gcc 4.5 by forcing the compiler to always link with the
-no_compact_unwind option.

Another complexity is that Apple decided to silently abort some of the
libgcc calls (now in libSystem) that require access to FDEs, like
_Unwind_FindEnclosingFunction(). The reasoning was that the default
behavior (compact unwind info) doesn't use FDEs. This is fixed for gcc
4.5 by http://gcc.gnu.org/ml/gcc-patches/2009-12/msg00998.html.

If you are using any other unwinder call that is now silently aborting,
let me know, as it may be another one that we need to re-export under a
different name from libgcc_ext. Alternatively, you may be able to work
around this issue by using -mmacosx-version-min=10.5 under darwin10.

Jack
Re: gen_lowpart called where 'truncate' needed?
Mat Hostetter writes:

> Adam Nemet writes:
>
> > Ian Lance Taylor writes:
> > > Mat Hostetter writes:
> > >
> > >> Since the high bits are already zero, that would be less efficient on
> > >> most platforms, so guarding it with something like this would probably
> > >> be smarter:
> > >>
> > >>   if (targetm.mode_rep_extended (mode, GET_MODE(x)) == SIGN_EXTEND)
> > >>     return simplify_gen_unary (TRUNCATE, mode, x, GET_MODE (x));
> > >>
> > >> I'm happy to believe I'm doing something wrong in my back end, but I'm
> > >> not sure what that would be. I could also believe these are obscure
> > >> edge cases no one cared about before. Any tips would be appreciated.
> > >
> > > Interesting. I think you are in obscure edge case territory. Your
> > > suggestion makes sense to me, and in fact it should probably be put
> > > into gen_lowpart_common.
> >
> > FWIW, I disagree. Firstly, mode_rep_extended is a special case of
> > !TRULY_NOOP_TRUNCATION so the above check should use that. Secondly, in
> > MIPS we call gen_lowpart to convert DI to SI when we know it's safe. In
> > this case we always want a subreg not a truncate for better code. So I
> > don't think gen_lowpart_common is the right place to fix this.
> >
> > I think the right fix is to call convert_to_mode or convert_move in the
> > expansion code which ensure the proper truncation.
>
> That would yield correct code, but wouldn't it throw away the fact
> that the high bits are already known to be zero, and yield redundant
> zero-extension on some platforms? I'm guessing that's why the code was
> originally written to call convert_lowpart rather than convert_to_mode.

convert_to_mode uses gen_lowpart for truncation if TRULY_NOOP_TRUNCATION.

> And just to add a minor wrinkle: for the widen_bswap case, which
> produces (bswap64 (x) >> 32), the optimal thing is actually to use
> ashiftrt instead of lshiftrt when targetm.mode_rep_extended says
> SIGN_EXTEND, and then call convert_lowpart as now.

Agreed, but we sort of do this already due to this pattern in mips.md:

(define_insn "*lshr32_trunc"
  [(set (match_operand:SUBDI 0 "register_operand" "=d")
        (truncate:SUBDI
          (lshiftrt:DI (match_operand:DI 1 "register_operand" "d")
                       (const_int 32))))]
  "TARGET_64BIT && !TARGET_MIPS16"
  "dsra\t%0,%1,32"
  [(set_attr "type" "shift")
   (set_attr "mode" "")])

That said, with mode_rep_extended this optimization could now be
performed by the middle-end in simplify-rtx.c:

  (truncate:SI (lshiftrt:DI .. 32))
    -> (subreg:SI (ashiftrt:DI .. 32) )

if targetm.mode_rep_extended (SImode, DImode) == SIGN_EXTEND.

Adam
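
To make the lshiftrt versus ashiftrt point concrete, here is a small stand-alone C sketch of the widen_bswap situation, (bswap64 (x) >> 32). It is not GCC code and models nothing about RTL. The function names (bswap64_portable, high_half_logical, high_half_arith) are hypothetical, and the arithmetic shift is emulated by hand, because right-shifting a negative signed value is implementation-defined in C.

```c
#include <stdint.h>

/* Byte-swap a 64-bit value without relying on compiler builtins. */
static uint64_t bswap64_portable(uint64_t x)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        r = (r << 8) | (x & 0xFF);  /* move the lowest byte to the top */
        x >>= 8;
    }
    return r;
}

/* Models lshiftrt by 32: the upper 32 bits of the result are zero. */
static uint64_t high_half_logical(uint64_t v)
{
    return v >> 32;
}

/* Models ashiftrt by 32: the upper 32 bits replicate bit 63 of V, so the
   result is already the sign-extended form of its low 32 bits.  */
static uint64_t high_half_arith(uint64_t v)
{
    uint64_t r = v >> 32;
    if (v >> 63)
        r |= UINT64_C(0xFFFFFFFF00000000);
    return r;
}
```

For x = 0xF0 the swapped value is 0xF000000000000000. The logical shift gives 0x00000000F0000000 and the emulated arithmetic shift gives 0xFFFFFFFFF0000000. Both agree in the low 32 bits, but only the second is already in sign-extended form, which is why a target whose mode_rep_extended says SIGN_EXTEND could take the low part of it without a separate truncate.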