Re: Updated to respond to various email comments from Jason, Diego and Cary (issue6197069)
> As I don't have access to a Darwin machine to test a fix, would you
> mind updating the test?

The failures are gone with the obvious patch:

diff -up ../_clean/gcc/testsuite/gcc.dg/pubtypes-2.c gcc/testsuite/gcc.dg/pubtypes-2.c
--- ../_clean/gcc/testsuite/gcc.dg/pubtypes-2.c	2009-11-25 18:15:43.0 +0100
+++ gcc/testsuite/gcc.dg/pubtypes-2.c	2012-06-22 23:56:37.0 +0200
@@ -2,7 +2,7 @@
 /* { dg-options "-O0 -gdwarf-2 -dA" } */
 /* { dg-skip-if "Unmatchable assembly" { mmix-*-* } { "*" } { "" } } */
 /* { dg-final { scan-assembler "__debug_pubtypes" } } */
-/* { dg-final { scan-assembler "long+\[ \t\]+0x6a+\[ \t\]+\[#;]+\[ \t\]+Length of Public Type Names Info" } } */
+/* { dg-final { scan-assembler "long+\[ \t\]+0x13b+\[ \t\]+\[#;]+\[ \t\]+Length of Public Type Names Info" } } */
 /* { dg-final { scan-assembler "used_struct0\"+\[ \t\]+\[#;]+\[ \t\]+external name" } } */
 /* { dg-final { scan-assembler-not "unused_struct0\"+\[ \t\]+\[#;]+\[ \t\]+external name" } } */

diff -up ../_clean/gcc/testsuite/gcc.dg/pubtypes-3.c gcc/testsuite/gcc.dg/pubtypes-3.c
--- ../_clean/gcc/testsuite/gcc.dg/pubtypes-3.c	2009-11-25 18:15:46.0 +0100
+++ gcc/testsuite/gcc.dg/pubtypes-3.c	2012-06-22 23:57:10.0 +0200
@@ -2,7 +2,7 @@
 /* { dg-options "-O0 -gdwarf-2 -dA" } */
 /* { dg-skip-if "Unmatchable assembly" { mmix-*-* } { "*" } { "" } } */
 /* { dg-final { scan-assembler "__debug_pubtypes" } } */
-/* { dg-final { scan-assembler "long+\[ \t\]+0x6a+\[ \t\]+\[#;]+\[ \t\]+Length of Public Type Names Info" } } */
+/* { dg-final { scan-assembler "long+\[ \t\]+0x13b+\[ \t\]+\[#;]+\[ \t\]+Length of Public Type Names Info" } } */
 /* { dg-final { scan-assembler "used_struct0\"+\[ \t\]+\[#;]+\[ \t\]+external name" } } */
 /* { dg-final { scan-assembler-not "unused_struct0\"+\[ \t\]+\[#;]+\[ \t\]+external name" } } */
 /* { dg-final { scan-assembler-not "\"list_name_type0\"+\[ \t\]+\[#;]+\[ \t\]+external name" } } */

diff -up ../_clean/gcc/testsuite/gcc.dg/pubtypes-4.c gcc/testsuite/gcc.dg/pubtypes-4.c
--- ../_clean/gcc/testsuite/gcc.dg/pubtypes-4.c	2009-11-25 18:15:39.0 +0100
+++ gcc/testsuite/gcc.dg/pubtypes-4.c	2012-06-22 23:57:38.0 +0200
@@ -2,7 +2,7 @@
 /* { dg-options "-O0 -gdwarf-2 -dA" } */
 /* { dg-skip-if "Unmatchable assembly" { mmix-*-* } { "*" } { "" } } */
 /* { dg-final { scan-assembler "__debug_pubtypes" } } */
-/* { dg-final { scan-assembler "long+\[ \t\]+0xa1+\[ \t\]+\[#;]+\[ \t\]+Length of Public Type Names Info" } } */
+/* { dg-final { scan-assembler "long+\[ \t\]+0x172+\[ \t\]+\[#;]+\[ \t\]+Length of Public Type Names Info" } } */
 /* { dg-final { scan-assembler "used_struct0\"+\[ \t\]+\[#;]+\[ \t\]+external name" } } */
 /* { dg-final { scan-assembler-not "unused_struct0\"+\[ \t\]+\[#;]+\[ \t\]+external name" } } */
 /* { dg-final { scan-assembler "\"list_name_type0\"+\[ \t\]+\[#;]+\[ \t\]+external name" } } */

I don't have write access, so someone will have to do the commit.

TIA

Dominique
Re: [PATCH][C++] Fix PR53605
On Mon, Jun 11, 2012 at 5:15 AM, Richard Guenther wrote:
>
> When I changed the empty array domain to use a signed sizetype [0, -1]
> domain, mangling of an empty-array type broke because mangling adds an
> unsigned one to the signed -1, which causes an ICE (I chose to do that
> instead of shifting the range to [1, 0] because much more code relies
> on a zero lower bound ...).
>
> The following fixes that by using double-ints for the arithmetic and
> thus also avoids generating an unneeded tree node.
>
> Bootstrapped and tested on x86_64-unknown-linux-gnu, ok?
>
> Thanks,
> Richard.
>
> 2012-06-11  Richard Guenther
>
>	PR c++/53616
>	* mangle.c (write_array_type): Use double-ints for array domain
>	arithmetic.

This caused:

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=53752

-- 
H.J.
Re: [patch testsuite]: Fix two testcases for x86_64-*-mingw* target
On Jun 20, 2012, at 5:10 AM, Kai Tietz wrote:

> 2012-06-20  Kai Tietz
>
>	* gcc.target/i386/pr23943.c (size_t): Use compatible
>	type-definition for LLP64 targets.
>	* gcc.target/i386/pr38988.c: Likewise.
>
> Ok for apply?

Ok.
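For background on what "compatible type-definition for LLP64" usually
means in the testsuite, here is a sketch of mine (not the committed
patch; pr23943.c's actual fix may differ): `unsigned long` is only 32
bits on LLP64 targets such as x86_64-*-mingw*, so the portable idiom is
to let the compiler supply the type.

    /* Hypothetical sketch: defer to the compiler instead of hard-coding
       "unsigned long", which has the wrong width on LLP64 targets.  */
    typedef __SIZE_TYPE__ size_t;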
[PATCH 0/3] Fix target/53749
The problem being addressed is that expand_mult expects that any mode
that supports multiply also supports shift.  In practice that's a
fairly valid assumption -- I've never seen an ISA for which this didn't
hold.  However, it means that if you jump through hoops in the backend
to provide a (vector) multiply, you'd better jump through those same
hoops to provide a (vector) shift.

On the good side, with the costs adjusted properly we can get the
complicated multiply expansion reduced to a (sequence of) paddb insns.

Two patches to clean things up so that the final patch to support
vector shifts in V*QImode is more readable.

Tested all together on x86_64-linux.  Visual spot checks of -mavx and
-mavx2 code.

r~

Richard Henderson (3):
  i386: Extract the guts of mulv16qi3 to ix86_expand_vecop_qihi
  i386: Pass ix86_expand_sse_unpack operands by value
  PR target/53749

 gcc/ChangeLog                 |  25 +++
 gcc/config/i386/i386-protos.h |   4 +-
 gcc/config/i386/i386.c        | 164 +
 gcc/config/i386/i386.md       |   3 +
 gcc/config/i386/sse.md        | 147 +---
 5 files changed, 217 insertions(+), 126 deletions(-)

-- 
1.7.10.2
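To see what "reduced to paddb" means at the source level, a small
illustration of mine (not part of the series; the type and function
names are assumptions):

    /* Illustration only: with realistic V16QImode multiply costs, a
       byte-vector doubling should come out as a single paddb rather
       than the full interleave/multiply/permute sequence.  Compile
       with -O2 -msse2 and inspect the assembly.  */
    typedef unsigned char v16qi __attribute__ ((vector_size (16)));

    v16qi
    double_bytes (v16qi x)
    {
      return x + x;   /* what a well-costed "x * 2" should reduce to */
    }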
[PATCH 1/3] i386: Extract the guts of mulv16qi3 to ix86_expand_vecop_qihi
	* config/i386/sse.md (mul3): Change from insn_and_split to pure
	expander; move expansion code ...
	* config/i386/i386.c (ix86_expand_vecop_qihi): ... here.
	New function.
	* config/i386/i386-protos.h: Update.

diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 431db6c..4e7469d 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -192,6 +192,8 @@ extern void ix86_expand_rounddf_32 (rtx, rtx);
 extern void ix86_expand_trunc (rtx, rtx);
 extern void ix86_expand_truncdf_32 (rtx, rtx);
 
+extern void ix86_expand_vecop_qihi (enum rtx_code, rtx, rtx, rtx);
+
 #ifdef TREE_CODE
 extern void init_cumulative_args (CUMULATIVE_ARGS *, tree, rtx, tree, int);
 #endif /* TREE_CODE */
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 8167770..e23c418 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -38438,6 +38438,91 @@ ix86_expand_vec_extract_even_odd (rtx targ, rtx op0, rtx op1, unsigned odd)
   expand_vec_perm_even_odd_1 (&d, odd);
 }
 
+/* Expand a vector operation CODE for a V*QImode in terms of the
+   same operation on V*HImode.  */
+
+void
+ix86_expand_vecop_qihi (enum rtx_code code, rtx dest, rtx op1, rtx op2)
+{
+  enum machine_mode qimode = GET_MODE (dest);
+  enum machine_mode himode;
+  rtx (*gen_il) (rtx, rtx, rtx);
+  rtx (*gen_ih) (rtx, rtx, rtx);
+  rtx op1_l, op1_h, op2_l, op2_h, res_l, res_h;
+  struct expand_vec_perm_d d;
+  bool ok;
+  int i;
+
+  if (qimode == V16QImode)
+    {
+      himode = V8HImode;
+      gen_il = gen_vec_interleave_lowv16qi;
+      gen_ih = gen_vec_interleave_highv16qi;
+    }
+  else if (qimode == V32QImode)
+    {
+      himode = V16HImode;
+      gen_il = gen_avx2_interleave_lowv32qi;
+      gen_ih = gen_avx2_interleave_highv32qi;
+    }
+  else
+    gcc_unreachable ();
+
+  /* Unpack data such that we've got a source byte in each low byte of
+     each word.  We don't care what goes into the high byte of each word.
+     Rather than trying to get zero in there, most convenient is to let
+     it be a copy of the low byte.  */
+  op1_l = gen_reg_rtx (qimode);
+  op1_h = gen_reg_rtx (qimode);
+  emit_insn (gen_il (op1_l, op1, op1));
+  emit_insn (gen_ih (op1_h, op1, op1));
+
+  op2_l = gen_reg_rtx (qimode);
+  op2_h = gen_reg_rtx (qimode);
+  emit_insn (gen_il (op2_l, op2, op2));
+  emit_insn (gen_ih (op2_h, op2, op2));
+
+  /* Perform the operation.  */
+  res_l = expand_simple_binop (himode, code, gen_lowpart (himode, op1_l),
+			       gen_lowpart (himode, op2_l), NULL_RTX,
+			       1, OPTAB_DIRECT);
+  res_h = expand_simple_binop (himode, code, gen_lowpart (himode, op1_h),
+			       gen_lowpart (himode, op2_h), NULL_RTX,
+			       1, OPTAB_DIRECT);
+  gcc_assert (res_l && res_h);
+
+  /* Merge the data back into the right place.  */
+  d.target = dest;
+  d.op0 = gen_lowpart (qimode, res_l);
+  d.op1 = gen_lowpart (qimode, res_h);
+  d.vmode = qimode;
+  d.nelt = GET_MODE_NUNITS (qimode);
+  d.one_operand_p = false;
+  d.testing_p = false;
+
+  if (qimode == V16QImode)
+    {
+      /* For SSE2, we used an full interleave, so the desired
+	 results are in the even elements.  */
+      for (i = 0; i < 16; ++i)
+	d.perm[i] = i * 2;
+    }
+  else
+    {
+      /* For AVX, the interleave used above was not cross-lane.  So the
+	 extraction is evens but with the second and third quarter swapped.
+	 Happily, that is even one insn shorter than even extraction.  */
+      for (i = 0; i < 32; ++i)
+	d.perm[i] = i * 2 + ((i & 24) == 8 ? 16 : (i & 24) == 16 ? -16 : 0);
+    }
+
+  ok = ix86_expand_vec_perm_const_1 (&d);
+  gcc_assert (ok);
+
+  set_unique_reg_note (get_last_insn (), REG_EQUAL,
+		       gen_rtx_fmt_ee (code, qimode, op1, op2));
+}
+
 void
 ix86_expand_sse2_mulv4si3 (rtx op0, rtx op1, rtx op2)
 {
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 6c54d33..2f361a6 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -5213,70 +5213,13 @@
    (set_attr "prefix" "orig,vex")
    (set_attr "mode" "TI")])
 
-(define_insn_and_split "mul3"
+(define_expand "mul3"
   [(set (match_operand:VI1_AVX2 0 "register_operand")
	(mult:VI1_AVX2 (match_operand:VI1_AVX2 1 "register_operand")
		       (match_operand:VI1_AVX2 2 "register_operand")))]
-  "TARGET_SSE2
-   && can_create_pseudo_p ()"
-  "#"
-  "&& 1"
-  [(const_int 0)]
+  "TARGET_SSE2"
 {
-  rtx t[6];
-  int i;
-  enum machine_mode mulmode = mode;
-
-  for (i = 0; i < 6; ++i)
-    t[i] = gen_reg_rtx (mode);
-
-  /* Unpack data such that we've got a source byte in each low byte of
-     each word.  We don't care what goes into the high byte of each word.
-     Rather than trying to get zero in there, most convenient is to let
-     it be a copy of the low byte.  */
-  emit_insn (gen__interleave_high (t[0], operands[1],
-
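The unpack/multiply/merge scheme above is easier to follow at the user
level.  Here is a rough SSE2-intrinsics rendition of mine (not from the
patch; the expander's final step uses a constant permutation to pull
out the even bytes, where this sketch masks and packs instead):

    #include <emmintrin.h>

    /* Multiply 16 bytes pairwise the way the expander does: interleave
       each byte with a copy of itself into 16-bit lanes, multiply the
       lanes, then keep only the low byte of each 16-bit product.  The
       junk in the high byte of each lane is harmless, because the low
       byte of (a + 256a) * (b + 256b) is still a*b mod 256.  */
    static __m128i
    mulv16qi (__m128i a, __m128i b)
    {
      __m128i a_lo = _mm_unpacklo_epi8 (a, a);
      __m128i a_hi = _mm_unpackhi_epi8 (a, a);
      __m128i b_lo = _mm_unpacklo_epi8 (b, b);
      __m128i b_hi = _mm_unpackhi_epi8 (b, b);

      __m128i lo = _mm_mullo_epi16 (a_lo, b_lo);
      __m128i hi = _mm_mullo_epi16 (a_hi, b_hi);

      /* Keep the even byte of each lane and repack.  */
      __m128i mask = _mm_set1_epi16 (0x00ff);
      return _mm_packus_epi16 (_mm_and_si128 (lo, mask),
                               _mm_and_si128 (hi, mask));
    }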
[PATCH 2/3] i386: Pass ix86_expand_sse_unpack operands by value
	* config/i386/i386.c (ix86_expand_sse_unpack): Split operands[]
	parameter into src and dest.
	* config/i386/sse.md (vec_unpacku_hi_): Update call.
	(vec_unpacks_hi_): Likewise.
	(vec_unpacku_lo_): Likewise.
	(vec_unpacks_lo_): Likewise.
	* config/i386/i386-protos.h: Update.

diff --git a/gcc/config/i386/i386-protos.h b/gcc/config/i386/i386-protos.h
index 4e7469d..88de8ed 100644
--- a/gcc/config/i386/i386-protos.h
+++ b/gcc/config/i386/i386-protos.h
@@ -130,7 +130,7 @@ extern bool ix86_expand_fp_vcond (rtx[]);
 extern bool ix86_expand_int_vcond (rtx[]);
 extern void ix86_expand_vec_perm (rtx[]);
 extern bool ix86_expand_vec_perm_const (rtx[]);
-extern void ix86_expand_sse_unpack (rtx[], bool, bool);
+extern void ix86_expand_sse_unpack (rtx, rtx, bool, bool);
 extern bool ix86_expand_int_addcc (rtx[]);
 extern rtx ix86_expand_call (rtx, rtx, rtx, rtx, rtx, bool);
 extern void ix86_split_call_vzeroupper (rtx, rtx);
diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index e23c418..7ae2060 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -20187,10 +20187,10 @@ ix86_expand_vec_perm (rtx operands[])
    true if we want the N/2 high elements, else the low elements.  */
 
 void
-ix86_expand_sse_unpack (rtx operands[2], bool unsigned_p, bool high_p)
+ix86_expand_sse_unpack (rtx dest, rtx src, bool unsigned_p, bool high_p)
 {
-  enum machine_mode imode = GET_MODE (operands[1]);
-  rtx tmp, dest;
+  enum machine_mode imode = GET_MODE (src);
+  rtx tmp;
 
   if (TARGET_SSE4_1)
     {
@@ -20252,20 +20252,20 @@ ix86_expand_sse_unpack (rtx operands[2], bool unsigned_p, bool high_p)
       if (GET_MODE_SIZE (imode) == 32)
	{
	  tmp = gen_reg_rtx (halfmode);
-	  emit_insn (extract (tmp, operands[1]));
+	  emit_insn (extract (tmp, src));
	}
       else if (high_p)
	{
	  /* Shift higher 8 bytes to lower 8 bytes.  */
	  tmp = gen_reg_rtx (imode);
	  emit_insn (gen_sse2_lshrv1ti3 (gen_lowpart (V1TImode, tmp),
-					 gen_lowpart (V1TImode, operands[1]),
+					 gen_lowpart (V1TImode, src),
					 GEN_INT (64)));
	}
       else
-	tmp = operands[1];
+	tmp = src;
 
-      emit_insn (unpack (operands[0], tmp));
+      emit_insn (unpack (dest, tmp));
     }
   else
     {
@@ -20295,15 +20295,13 @@ ix86_expand_sse_unpack (rtx operands[2], bool unsigned_p, bool high_p)
	  gcc_unreachable ();
	}
 
-      dest = gen_lowpart (imode, operands[0]);
-
       if (unsigned_p)
	tmp = force_reg (imode, CONST0_RTX (imode));
       else
	tmp = ix86_expand_sse_cmp (gen_reg_rtx (imode), GT, CONST0_RTX (imode),
-				   operands[1], pc_rtx, pc_rtx);
+				   src, pc_rtx, pc_rtx);
 
-      emit_insn (unpack (dest, operands[1], tmp));
+      emit_insn (unpack (gen_lowpart (imode, dest), src, tmp));
     }
 }
 
diff --git a/gcc/config/i386/sse.md b/gcc/config/i386/sse.md
index 2f361a6..c7c6392 100644
--- a/gcc/config/i386/sse.md
+++ b/gcc/config/i386/sse.md
@@ -7818,25 +7818,25 @@
   [(match_operand: 0 "register_operand")
    (match_operand:VI124_AVX2 1 "register_operand")]
   "TARGET_SSE2"
-  "ix86_expand_sse_unpack (operands, false, false); DONE;")
+  "ix86_expand_sse_unpack (operands[0], operands[1], false, false); DONE;")
 
 (define_expand "vec_unpacks_hi_"
   [(match_operand: 0 "register_operand")
    (match_operand:VI124_AVX2 1 "register_operand")]
   "TARGET_SSE2"
-  "ix86_expand_sse_unpack (operands, false, true); DONE;")
+  "ix86_expand_sse_unpack (operands[0], operands[1], false, true); DONE;")
 
 (define_expand "vec_unpacku_lo_"
   [(match_operand: 0 "register_operand")
    (match_operand:VI124_AVX2 1 "register_operand")]
   "TARGET_SSE2"
-  "ix86_expand_sse_unpack (operands, true, false); DONE;")
+  "ix86_expand_sse_unpack (operands[0], operands[1], true, false); DONE;")
 
 (define_expand "vec_unpacku_hi_"
   [(match_operand: 0 "register_operand")
    (match_operand:VI124_AVX2 1 "register_operand")]
   "TARGET_SSE2"
-  "ix86_expand_sse_unpack (operands, true, true); DONE;")
+  "ix86_expand_sse_unpack (operands[0], operands[1], true, true); DONE;")

;
;;
-- 
1.7.10.2
[PATCH 3/3] PR target/53749
	* config/i386/i386.c (ix86_rtx_costs): Add reasonable costs for
	V*QImode shifts and multiply.
	(ix86_expand_vecop_qihi): Support shifts.
	* config/i386/i386.md (any_shift): New code iterator.
	* config/i386/sse.md (ashlv16qi3): Merge ...
	(v16qi3): ... into ...
	(3): ... here.  Use ix86_expand_vecop_qihi to support SSE and AVX.

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 7ae2060..fc30632 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -31938,9 +31938,10 @@ ix86_set_reg_reg_cost (enum machine_mode mode)
    scanned.  In either case, *TOTAL contains the cost result.  */
 
 static bool
-ix86_rtx_costs (rtx x, int code, int outer_code_i, int opno, int *total,
+ix86_rtx_costs (rtx x, int code_i, int outer_code_i, int opno, int *total,
		bool speed)
 {
+  enum rtx_code code = (enum rtx_code) code_i;
   enum rtx_code outer_code = (enum rtx_code) outer_code_i;
   enum machine_mode mode = GET_MODE (x);
   const struct processor_costs *cost = speed ? ix86_cost : &ix86_size_cost;
@@ -32045,7 +32046,31 @@
	  /* ??? Should be SSE vector operation cost.  */
	  /* At least for published AMD latencies, this really is the same
	     as the latency for a simple fpu operation like fabs.  */
-	  *total = cost->fabs;
+	  /* V*QImode is emulated with 1-11 insns.  */
+	  if (mode == V16QImode || mode == V32QImode)
+	    {
+	      int count;
+	      if (TARGET_XOP && mode == V16QImode)
+		{
+		  /* For XOP we use vpshab, which requires a broadcast of the
+		     value to the variable shift insn.  For constants this
+		     means a V16Q const in mem; even when we can perform the
+		     shift with one insn set the cost to prefer paddb.  */
+		  if (CONSTANT_P (XEXP (x, 1)))
+		    {
+		      *total = (cost->fabs
+				+ rtx_cost (XEXP (x, 0), code, 0, speed)
+				+ (speed ? 2 : COSTS_N_BYTES (16)));
+		      return true;
+		    }
+		  count = 3;
+		}
+	      else
+		count = TARGET_SSSE3 ? 7 : 11;
+	      *total = cost->fabs * count;
+	    }
+	  else
+	    *total = cost->fabs;
	  return false;
	}
       if (GET_MODE_SIZE (mode) < UNITS_PER_WORD)
@@ -32119,9 +32144,15 @@
	}
       else if (GET_MODE_CLASS (mode) == MODE_VECTOR_INT)
	{
+	  /* V*QImode is emulated with 7-13 insns.  */
+	  if (mode == V16QImode || mode == V32QImode)
+	    {
+	      int extra = TARGET_XOP ? 5 : TARGET_SSSE3 ? 6 : 11;
+	      *total = cost->fmul * 2 + cost->fabs * extra;
+	    }
	  /* Without sse4.1, we don't have PMULLD; it's emulated with 7
	     insns, including two PMULUDQ.  */
-	  if (mode == V4SImode && !(TARGET_SSE4_1 || TARGET_AVX))
+	  else if (mode == V4SImode && !(TARGET_SSE4_1 || TARGET_AVX))
	    *total = cost->fmul * 2 + cost->fabs * 5;
	  else
	    *total = cost->fmul;
@@ -38448,44 +38479,66 @@ ix86_expand_vecop_qihi (enum rtx_code code, rtx dest, rtx op1, rtx op2)
   rtx (*gen_ih) (rtx, rtx, rtx);
   rtx op1_l, op1_h, op2_l, op2_h, res_l, res_h;
   struct expand_vec_perm_d d;
-  bool ok;
+  bool ok, full_interleave;
+  bool uns_p = false;
   int i;
 
-  if (qimode == V16QImode)
+  switch (qimode)
     {
+    case V16QImode:
       himode = V8HImode;
       gen_il = gen_vec_interleave_lowv16qi;
       gen_ih = gen_vec_interleave_highv16qi;
-    }
-  else if (qimode == V32QImode)
-    {
+      break;
+    case V32QImode:
       himode = V16HImode;
       gen_il = gen_avx2_interleave_lowv32qi;
       gen_ih = gen_avx2_interleave_highv32qi;
+      break;
+    default:
+      gcc_unreachable ();
     }
-  else
-    gcc_unreachable ();
 
-  /* Unpack data such that we've got a source byte in each low byte of
-     each word.  We don't care what goes into the high byte of each word.
-     Rather than trying to get zero in there, most convenient is to let
-     it be a copy of the low byte.  */
-  op1_l = gen_reg_rtx (qimode);
-  op1_h = gen_reg_rtx (qimode);
-  emit_insn (gen_il (op1_l, op1, op1));
-  emit_insn (gen_ih (op1_h, op1, op1));
+  op2_l = op2_h = op2;
+  switch (code)
+    {
+    case MULT:
+      /* Unpack data such that we've got a source byte in each low byte of
+	 each word.  We don't care what goes into the high byte of each word.
+	 Rather than trying to get zero in there, most convenient is to let
+	 it be a copy of the low byte.  */
+      op2_l = gen_reg_rtx (qimode);
+      op2_h = gen_reg_rtx (qim
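As background for the cost numbers above: SSE has no byte shifts at
all, so every V16QImode shift is emulated.  One classic emulation for a
constant count, sketched here with intrinsics (my illustration, not the
expander's exact strategy -- the patch reuses the interleave scheme
from patch 1/3), shifts 16-bit lanes and masks off the bits that leak
in from the neighbouring byte:

    #include <emmintrin.h>

    /* x << 3 on each byte: shift words, then clear the low three bits
       of every byte -- exactly the bits contaminated by the lower
       neighbour within each 16-bit lane.  */
    static __m128i
    shl3_bytes (__m128i x)
    {
      __m128i t = _mm_slli_epi16 (x, 3);
      return _mm_and_si128 (t, _mm_set1_epi8 ((char) 0xf8));
    }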
Re: [Patch] PR 51938: extend ifcombine
Hello,

thanks for looking into the patch.  A couple more details now that I am
back from a conference:

On Wed, 20 Jun 2012, Marc Glisse wrote:
> On Wed, 20 Jun 2012, Richard Guenther wrote:
>> On Sun, Jun 10, 2012 at 4:16 PM, Marc Glisse wrote:
>>> Hello,
>>>
>>> currently, tree-ssa-ifcombine handles pairs of imbricated "if"s that
>>> share the same then branch, or the same else branch.  There is no
>>> particular reason why it couldn't also handle the case where the
>>> then branch of one is the else branch of the other, which is what I
>>> do here.  Any comments?
>>
>> The general idea looks good, but I think the patch is too invasive.
>> As far as I can see the only callers with a non-zero 'inv' argument
>> come from ifcombine_ifnotorif and ifcombine_ifnotandif (and both
>> with inv == 2).

The idea of also supporting inv==1 or inv==3 is for uniformity, and
because we might at some point want to get rid of the 'or' function and
implement everything in terms of the 'and' version, with suitably
inverted tests.

>> I would rather see a more localized patch that makes use of
>> invert_tree_comparison to perform the inversion on the call
>> arguments of maybe_fold_and/or_comparisons.

I would rather have done that as well, and as a matter of fact that is
what the first version of the patch did, until I realized that
-ftrapping-math was the default.

>> Is there any reason that would not work?
>
> invert_tree_comparison is useless for floating point (the case I am
> most interested in) unless we specify -fno-trapping-math (writing
> this patch taught me to add this flag to my default flags, but I
> can't expect everyone to do the same).  An issue is that gcc mixes
> the behaviors of qnan and snan (it is not really an issue, it just
> means that !(comparison) can't be represented as comparison2).

>> At least
>>
>> +  if (inv & 1)
>> +    lcompcode2 = COMPCODE_TRUE - lcompcode2;
>>
>> looks as if it were not semantically correct - you cannot simply
>> invert floating-point comparisons (see the restrictions
>> invert_tree_comparison has).

I don't remember all details, but I specifically thought of that, and
the trapping behavior is handled a few lines below.  More specifically,
it has (was already there, I didn't add it):

      if (!honor_nans)
        ...
      else if (flag_trapping_math)
        {
          /* Check that the original operation and the optimized ones
             will trap under the same condition.  */
          bool ltrap = (lcompcode & COMPCODE_UNORD) == 0
                       && (lcompcode != COMPCODE_EQ)
                       && (lcompcode != COMPCODE_ORD);
          ... same for rtrap and trap

          /* In a short-circuited boolean expression the LHS might be
             such that the RHS, if evaluated, will never trap.  For
             example, in ORD (x, y) && (x < y), we evaluate the RHS
             only if neither x nor y is NaN.  (This is a mixed
             blessing: for example, the expression above will never
             trap, hence optimizing it to x < y would be invalid).  */
          if ((code == TRUTH_ORIF_EXPR && (lcompcode & COMPCODE_UNORD))
              || (code == TRUTH_ANDIF_EXPR && !(lcompcode & COMPCODE_UNORD)))
            rtrap = false;

          /* If the comparison was short-circuited, and only the RHS
             trapped, we may now generate a spurious trap.  */
          if (rtrap && !ltrap
              && (code == TRUTH_ANDIF_EXPR || code == TRUTH_ORIF_EXPR))
            return NULL_TREE;

          /* If we changed the conditions that cause a trap, we lose.  */
          if ((ltrap || rtrap) != trap)
            ...

which presumably ensures that the trapping behavior is preserved (I'll
have to double-check that I didn't break that logic).  Do you have an
idea how this can be handled in a less intrusive way (and without
duplicating too much code)?

-- 
Marc Glisse
Coverage unlinking
This patch changes the coverage unlinking to be less aggressive.  As
David pointed out, when experimenting with random optimization options
alongside -fuse-coverage and -frandom-seed, it is inconvenient for gcc
to remove the coverage data file.  The reason it was doing so is that
-frandom-seed prevents a unique stamp being placed in the coverage data
to distinguish execution of the newly optimized program from execution
of the older incarnation of it.  libgcov would get confused and give
errors about mismatched function checksums.

This patch tweaks things.  With -fuse-coverage we read in the coverage
data file and its stamp.  We can feed that stamp into the stamp to be
used for the output coverage data, and hence effectively distinguish
generations of coverage data even when -frandom-seed prevents
randomness.  We also don't need to delete the data file if we're not
producing coverage data for the new compilation.

Manual testing shows this patch is preventing deletion of the data file
in the circumstances I understand David's using.  David, care to try
this patch out to see if it resolves your use case?

The changes to tree.c merely extend the crc32 code to allow it to
generate a crc from a 32-bit unsigned, in addition to the byte variant
already present.

Testing on i686-pc-linux-gnu shows no regressions.

nathan

2012-06-23  Nathan Sidwell

	* coverage.c (bbg_file_stamp): New.
	(read_counts_file): Merge incoming stamp with bbg_file_stamp.
	(build_info): Write bbg_file_stamp.
	(coverage_init): Initialize bbg_file_stamp.  Read counts file
	before writing graph header.
	(coverage_finish): Don't unlink the data file if we can generate
	a unique file stamp.
	* tree.h (crc32_unsigned): Declare.
	* tree.c (crc32_unsigned_bits): New, broken out of ...
	(crc32_byte): ... here.  Use it.
	(crc32_unsigned): New.

Index: coverage.c
===
--- coverage.c	(revision 188900)
+++ coverage.c	(working copy)
@@ -101,6 +101,9 @@ static GTY(()) tree gcov_fn_info_ptr_typ
    we're not writing to the notes file.  */
 static char *bbg_file_name;
 
+/* File stamp for graph file.  */
+static unsigned bbg_file_stamp;
+
 /* Name of the count data file.  */
 static char *da_file_name;
 
@@ -205,8 +208,9 @@ read_counts_file (void)
       return;
     }
 
-  /* Read and discard the stamp.  */
-  gcov_read_unsigned ();
+  /* Read the stamp, used for creating a generation count.  */
+  tag = gcov_read_unsigned ();
+  bbg_file_stamp = crc32_unsigned (bbg_file_stamp, tag);
 
   counts_hash = htab_create (10,
			     htab_counts_entry_hash, htab_counts_entry_eq,
@@ -905,7 +909,7 @@ build_info (tree info_type, tree fn_ary)
   /* stamp */
   CONSTRUCTOR_APPEND_ELT (v1, info_fields,
			  build_int_cstu (TREE_TYPE (info_fields),
-					  local_tick));
+					  bbg_file_stamp));
   info_fields = DECL_CHAIN (info_fields);
 
   /* Filename */
@@ -1101,6 +1105,11 @@ coverage_init (const char *filename)
   memcpy (da_file_name + prefix_len, filename, len);
   strcpy (da_file_name + prefix_len + len, GCOV_DATA_SUFFIX);
 
+  bbg_file_stamp = local_tick;
+
+  if (flag_branch_probabilities)
+    read_counts_file ();
+
   /* Name of bbg file.  */
   if (flag_test_coverage && !flag_compare_debug)
     {
@@ -1117,12 +1126,9 @@ coverage_init (const char *filename)
	{
	  gcov_write_unsigned (GCOV_NOTE_MAGIC);
	  gcov_write_unsigned (GCOV_VERSION);
-	  gcov_write_unsigned (local_tick);
+	  gcov_write_unsigned (bbg_file_stamp);
	}
     }
-
-  if (flag_branch_probabilities)
-    read_counts_file ();
 }
 
 /* Performs file-level cleanup.  Close graph file, generate coverage
@@ -1133,10 +1139,11 @@ coverage_finish (void)
 {
   if (bbg_file_name && gcov_close ())
     unlink (bbg_file_name);
-
-  if (!local_tick || local_tick == (unsigned)-1)
-    /* Only remove the da file, if we cannot stamp it.  If we can
-       stamp it, libgcov will DTRT.  */
+
+  if (!flag_branch_probabilities && flag_test_coverage
+      && (!local_tick || local_tick == (unsigned)-1))
+    /* Only remove the da file, if we're emitting coverage code and
+       cannot uniquely stamp it.  If we can stamp it, libgcov will DTRT.  */
     unlink (da_file_name);
 
   if (coverage_obj_init ())

Index: tree.c
===
--- tree.c	(revision 188900)
+++ tree.c	(working copy)
@@ -8734,23 +8734,37 @@ dump_tree_statistics (void)
 
 /* Generate a crc32 of a byte.  */
 
-unsigned
-crc32_byte (unsigned chksum, char byte)
+static unsigned
+crc32_unsigned_bits (unsigned chksum, unsigned value, unsigned bits)
 {
-  unsigned value = (unsigned) byte << 24;
-  unsigned ix;
+  unsigned ix;
 
-  for (ix = 8; ix--; value <<= 1)
-    {
-      unsigned feedback;
-
-      feedback = (value ^ chksum) & 0x80000000 ? 0x04c11db7 : 0;
-      chksum <<= 1;
-      chksum ^= feedback;
-    }
+  for (ix = bits; ix--; value <<= 1)
+    {
+      unsigned feedback;
+
+      f
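A small standalone sketch of mine (not part of the patch) of the
generation-counting idea: each compilation starts from local_tick and
folds in the stamp found in the existing data file with a CRC step, so
even a tick pinned by -frandom-seed yields a fresh stamp every
generation.  The polynomial matches the one in tree.c.

    #include <stdio.h>

    static unsigned
    crc32_unsigned (unsigned chksum, unsigned value)
    {
      unsigned ix;

      for (ix = 32; ix--; value <<= 1)
        {
          unsigned feedback = (value ^ chksum) & 0x80000000 ? 0x04c11db7 : 0;
          chksum <<= 1;
          chksum ^= feedback;
        }
      return chksum;
    }

    int
    main (void)
    {
      unsigned local_tick = 0xdeadbeef;  /* pinned by -frandom-seed */
      unsigned stamp = local_tick;       /* first compilation */
      int gen;

      for (gen = 1; gen < 4; gen++)
        {
          /* Each recompilation merges the old data file's stamp.  */
          stamp = crc32_unsigned (local_tick, stamp);
          printf ("generation %d stamp: %08x\n", gen, stamp);
        }
      return 0;
    }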
Re: [gimplefe] creating individual gimple_assign statements
On Thu, Jun 21, 2012 at 11:33 AM, Diego Novillo wrote:
> On 12-06-20 23:04 , Sandeep Soni wrote:
>>
>> Hi,
>>
>> This patch creates basic gimple_assign statements.  It is a little raw
>> not considering all types of gimple_assign statements for which I have
>> already started working.
>>
>> Here is the Changelog.
>>
>> 2012-06-09  Sandeep Soni
>>
>>	* parser.c (gimple_symtab_get): New.
>>	(gimple_symtab_get_token): New.
>>	(gp_parse_expect_lhs): Returns tree node.
>>	(gp_parse_expect_rhs_op): Returns the op as tree node.
>>	(gp_parse_assign_stmt) : Builds gimple_assign statement.
>
> Align the '(' with the '*'.
>
>> Index: gcc/gimple/parser.c
>> ===
>> --- gcc/gimple/parser.c (revision 188546)
>> +++ gcc/gimple/parser.c (working copy)
>> @@ -105,6 +105,7 @@
>>			       gimple_symtab_eq_hash, NULL);
>>  }
>>
>> +
>>  /* Registers DECL with the gimple symbol table as having identifier ID.  */
>>
>>  static void
>> @@ -123,6 +124,41 @@
>>    *slot = new_entry;
>>  }
>>
>> +
>> +/* Gets the tree node for the corresponding identifier ID */
>> +
>> +static tree
>> +gimple_symtab_get (tree id)
>> +{
>> +  struct gimple_symtab_entry_def temp;
>> +  gimple_symtab_entry_t entry;
>> +  void **slot;
>> +
>> +  gimple_symtab_maybe_init_hash_table();
>> +  temp.id = id;
>> +  slot = htab_find_slot (gimple_symtab, &temp, NO_INSERT);
>> +  if (slot)
>> +    {
>> +      entry = (gimple_symtab_entry_t) *slot;
>> +      return entry->decl;
>> +    }
>> +  else
>> +    return NULL;
>> +}
>> +
>> +
>> +/* Gets the tree node of token TOKEN from the global gimple symbol table.  */
>> +
>> +static tree
>> +gimple_symtab_get_token (const gimple_token *token)
>> +{
>> +  const char *name = gl_token_as_text(token);
>> +  tree id = get_identifier(name);
>
> Space before '('.
>
>> +  tree decl = gimple_symtab_get (id);
>> +  return decl;
>> +}
>> +
>> +
>>  /* Return the string representation of token TOKEN.  */
>>
>>  static const char *
>> @@ -360,10 +396,11 @@
>>  /* Helper for gp_parse_assign_stmt.  The token read from reader PARSER should
>>     be the lhs of the tuple.  */
>>
>> -static void
>> +static tree
>>  gp_parse_expect_lhs (gimple_parser *parser)
>>  {
>>    const gimple_token *next_token;
>> +  tree lhs;
>>
>>    /* Just before the name of the identifier we might get the symbol
>>       of dereference too.  If we do get it then consume that token, else
>> @@ -372,18 +409,22 @@
>>    if (next_token->type == CPP_MULT)
>>      next_token = gl_consume_token (parser->lexer);
>>
>> -  gl_consume_expected_token (parser->lexer, CPP_NAME);
>> +  next_token = gl_consume_token (parser->lexer);
>> +  lhs = gimple_symtab_get_token (next_token);
>>    gl_consume_expected_token (parser->lexer, CPP_COMMA);
>> +  return lhs;
>> +
>>  }
>>
>>
>>  /* Helper for gp_parse_assign_stmt.  The token read from reader PARSER should
>>     be the first operand in rhs of the tuple.  */
>>
>> -static void
>> +static tree
>>  gp_parse_expect_rhs_op (gimple_parser *parser)
>>  {
>>    const gimple_token *next_token;
>> +  tree rhs = NULL_TREE;
>>
>>    next_token = gl_peek_token (parser->lexer);
>>
>> @@ -402,11 +443,13 @@
>>      case CPP_NUMBER:
>>      case CPP_STRING:
>>        next_token = gl_consume_token (parser->lexer);
>> +      rhs = gimple_symtab_get_token (next_token);
>>        break;
>>
>>      default:
>>        break;
>>      }
>> +
>>  }
>
> No empty line here.
>
> The function returns nothing?
>
>> @@ -420,9 +463,10 @@
>>    gimple_token *optoken;
>>    enum tree_code opcode;
>>    enum gimple_rhs_class rhs_class;
>> +  tree op1 = NULL_TREE, op2 = NULL_TREE, op3 = NULL_TREE;
>>
>>    opcode = gp_parse_expect_subcode (parser, &optoken);
>> -  gp_parse_expect_lhs (parser);
>> +  tree lhs = gp_parse_expect_lhs (parser);
>>
>>    rhs_class = get_gimple_rhs_class (opcode);
>>    switch (rhs_class)
>> @@ -436,16 +480,16 @@
>>      case GIMPLE_UNARY_RHS:
>>      case GIMPLE_BINARY_RHS:
>>      case GIMPLE_TERNARY_RHS:
>> -      gp_parse_expect_rhs_op (parser);
>> +      op1 = gp_parse_expect_rhs_op (parser);
>>        if (rhs_class == GIMPLE_BINARY_RHS || rhs_class == GIMPLE_TERNARY_RHS)
>>	  {
>>	    gl_consume_expected_token (parser->lexer, CPP_COMMA);
>> -	    gp_parse_expect_rhs_op (parser);
>> +	    op2 = gp_parse_expect_rhs_op (parser);
>>	  }
>>        if (rhs_class == GIMPLE_TERNARY_RHS)
>>	  {
>>	    gl_consume_expected_token (parser->lexer, CPP_COMMA);
>> -	    gp_parse_expect_rhs_op (parser);
>> +	    op3 = gp_parse_expect_rhs_op (parser);
>>	  }
>>        break;
>>
>> @@ -454,6 +498,9 @@
>>      }
>>
>>    gl_consume_expected_token (parser->lexer, CPP_GREATER);
>> +
>> +  gimple stmt = gimple_build_assign_with_ops (code, lhs, op1, op2, op3);
>> +  gcc_assert(verify_gimple_stmt(stmt));
>
> Space
Re: [Patch] PR 51938: extend ifcombine
On Wed, 20 Jun 2012, Marc Glisse wrote:
> On Wed, 20 Jun 2012, Richard Guenther wrote:
>> On Sun, Jun 10, 2012 at 4:16 PM, Marc Glisse wrote:
>>> Hello,
>>>
>>> currently, tree-ssa-ifcombine handles pairs of imbricated "if"s that
>>> share the same then branch, or the same else branch.  There is no
>>> particular reason why it couldn't also handle the case where the
>>> then branch of one is the else branch of the other, which is what I
>>> do here.  Any comments?
>>
>> The general idea looks good, but I think the patch is too invasive.
>> As far as I can see the only callers with a non-zero 'inv' argument
>> come from ifcombine_ifnotorif and ifcombine_ifnotandif (and both
>> with inv == 2).  I would rather see a more localized patch that
>> makes use of invert_tree_comparison to perform the inversion on the
>> call arguments of maybe_fold_and/or_comparisons.  Is there any
>> reason that would not work?
>
> invert_tree_comparison is useless for floating point (the case I am
> most interested in) unless we specify -fno-trapping-math (writing
> this patch taught me to add this flag to my default flags, but I
> can't expect everyone to do the same).  An issue is that gcc mixes
> the behaviors of qnan and snan (it is not really an issue, it just
> means that !(comparison) can't be represented as comparison2).

Actually, what would you think of
s/flag_trapping_math/flag_signaling_nans/ in invert_tree_comparison?
Are there other possible ways of having a sNaN for a or b?

-- 
Marc Glisse
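To make the qNaN point concrete, a small standalone illustration (mine,
not from the thread) of why invert_tree_comparison must refuse to
invert floating-point comparisons when NaNs are honored:

    /* With an unordered operand, !(x < y) is true while the would-be
       inverse x >= y is false, so no single comparison code represents
       the negation.  Trapping is the other half of the story: <, <=,
       >, >= raise the invalid exception on NaN operands, while the
       unordered forms do not.  */
    #include <math.h>
    #include <stdio.h>

    int
    main (void)
    {
      double x = NAN, y = 1.0;
      printf ("!(x < y) = %d\n", !(x < y));   /* prints 1 */
      printf ("x >= y   = %d\n", x >= y);     /* prints 0 */
      return 0;
    }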
Re: Coverage unlinking
thanks for the fix.  It works fine for me.

David

On Sat, Jun 23, 2012 at 11:00 AM, Nathan Sidwell wrote:
> This patch changes the coverage unlinking to be less aggressive.  As
> David pointed out, when experimenting with random optimization options
> along side -fuse-coverage and -frandom-seed, it is inconvenient for
> gcc to remove the coverage data file.  The reason it was doing so, is
> because -frandom-seed prevents a unique stamp being placed in the
> coverage data to distinguish execution of the newly optimized program
> from execution of the older incarnation of it.  libgcov would get
> confused and give errors about mismatched function checksums.
>
> This patch tweaks things.  With -fuse-coverage we read in the coverage
> data file and its stamp.  We can feed that stamp into the stamp to be
> used for the output coverage data, and hence effectively distinguish
> generations of coverage data even when -frandom-seed prevents
> randomness.  We also don't need to delete the data file if we're not
> producing coverage data for the new compilation.
>
> Manual testing shows this patch is preventing deletion of the data
> file in the circumstances I understand David's using.  David, care to
> try this patch out to see if it resolves your use case?
>
> The changes to tree.c merely extend the crc32 code to allow it to
> generate a crc from a 32-bit unsigned, in addition to the byte variant
> already present.
>
> testing on i686-pc-linux-gnu shows no regressions.
>
> nathan
Re: libgo patch committed: Copy runtime_printf from other library
mich...@chamberlain.net.au writes:

> On 25/05/2012 6:45 AM, Ian Lance Taylor wrote:
>> This patch to libgo copies the implementation of runtime_printf ...
>
> The actual patch attached to the message was the one you had earlier
> posted for the Go 1 release compatibility with the off-by-1 error in
> runtime.Callers, rather than the described patch.

Sorry, here is the correct patch.

Ian

Index: libgo/runtime/msize.c
===
--- libgo/runtime/msize.c	(revision 187847)
+++ libgo/runtime/msize.c	(revision 187848)
@@ -103,7 +103,7 @@ runtime_InitSizes(void)
		sizeclass++;
	}
	if(sizeclass != NumSizeClasses) {
-		// runtime_printf("sizeclass=%d NumSizeClasses=%d\n", sizeclass, NumSizeClasses);
+		runtime_printf("sizeclass=%d NumSizeClasses=%d\n", sizeclass, NumSizeClasses);
		runtime_throw("InitSizes - bad NumSizeClasses");
	}
 
@@ -122,13 +122,13 @@ runtime_InitSizes(void)
	for(n=0; n < MaxSmallSize; n++) {
		sizeclass = runtime_SizeToClass(n);
		if(sizeclass < 1 || sizeclass >= NumSizeClasses || runtime_class_to_size[sizeclass] < n) {
-			// runtime_printf("size=%d sizeclass=%d runtime_class_to_size=%d\n", n, sizeclass, runtime_class_to_size[sizeclass]);
-			// runtime_printf("incorrect SizeToClass");
+			runtime_printf("size=%d sizeclass=%d runtime_class_to_size=%d\n", n, sizeclass, runtime_class_to_size[sizeclass]);
+			runtime_printf("incorrect SizeToClass");
			goto dump;
		}
		if(sizeclass > 1 && runtime_class_to_size[sizeclass-1] >= n) {
-			// runtime_printf("size=%d sizeclass=%d runtime_class_to_size=%d\n", n, sizeclass, runtime_class_to_size[sizeclass]);
-			// runtime_printf("SizeToClass too big");
+			runtime_printf("size=%d sizeclass=%d runtime_class_to_size=%d\n", n, sizeclass, runtime_class_to_size[sizeclass]);
+			runtime_printf("SizeToClass too big");
			goto dump;
		}
	}

Index: libgo/runtime/thread-linux.c
===
--- libgo/runtime/thread-linux.c	(revision 187847)
+++ libgo/runtime/thread-linux.c	(revision 187848)
@@ -68,7 +68,7 @@ runtime_futexwakeup(uint32 *addr, uint32
	// I don't know that futex wakeup can return
	// EAGAIN or EINTR, but if it does, it would be
	// safe to loop and call futex again.
-	runtime_printf("futexwakeup addr=%p returned %lld\n", addr, (long long)ret);
+	runtime_printf("futexwakeup addr=%p returned %D\n", addr, ret);
	*(int32*)0x1006 = 0x1006;
 }

Index: libgo/runtime/chan.c
===
--- libgo/runtime/chan.c	(revision 187847)
+++ libgo/runtime/chan.c	(revision 187848)
@@ -100,8 +100,8 @@ runtime_makechan_c(ChanType *t, int64 hi
	c->dataqsiz = hint;
 
	if(debug)
-		runtime_printf("makechan: chan=%p; elemsize=%lld; elemalign=%d; dataqsiz=%d\n",
-			c, (long long)elem->__size, elem->__align, c->dataqsiz);
+		runtime_printf("makechan: chan=%p; elemsize=%D; elemalign=%d; dataqsiz=%d\n",
+			c, (int64)elem->__size, elem->__align, c->dataqsiz);
 
	return c;
 }

Index: libgo/runtime/malloc.goc
===
--- libgo/runtime/malloc.goc	(revision 187847)
+++ libgo/runtime/malloc.goc	(revision 187848)
@@ -139,7 +139,7 @@ __go_free(void *v)
	m->mallocing = 1;
	if(!runtime_mlookup(v, nil, nil, &s)) {
-		// runtime_printf("free %p: not an allocated block\n", v);
+		runtime_printf("free %p: not an allocated block\n", v);
		runtime_throw("free runtime_mlookup");
	}
	prof = runtime_blockspecial(v);
@@ -367,7 +367,7 @@ runtime_mallocinit(void)
	if(p == nil)
		runtime_throw("runtime: cannot reserve arena virtual address space");
	if((uintptr)p & (((uintptr)1<__code != GO_PTR) {
-		// runtime_printf("runtime.SetFinalizer: first argument is %S, not pointer\n", *obj.type->string);
+		runtime_printf("runtime.SetFinalizer: first argument is %S, not pointer\n", *obj.__type_descriptor->__reflection);
		goto throw;
	}
	if(!runtime_mlookup(obj.__object, &base, &size, nil) || obj.__object != base) {
-		// runtime_printf("runtime.SetFinalizer: pointer not at beginning of allocated block\n");
+		runtime_printf("runtime.SetFinalizer: pointer not at beginning of allocated block\n");
		goto throw;
	}
	ft = nil;
@@ -494,7 +494,7 @@ func SetFinalizer(obj Eface, finalizer E
	return;
 
 badfunc:
-	// runtime_printf("runtime.SetFinalizer: second argument is %S, not func(%S)\n", *finalizer.type->string, *obj.type->string);
+	runtime_printf("runtime.SetFinalizer: second argument is %S, not func(%S)\n", *finalizer.__type_descriptor->__reflection, *obj.__type_descriptor->__reflection);
 throw:
	runtime_throw("runtime.SetFinalizer");
 }

Index: libgo/runtime/go-print.c
===
--- libgo/runtime/go-print.c	(revision 187847)
+++ libgo/runtime/go-print.c	(revision 187848)
@@ -8,6 +8,7 @@
 #include
 #include
+#include "runtime.h"
 #include "array.h"
 #include
ping x3 : latch mem to reg before multi-access in convert_move
Hello,

ping #3 for http://gcc.gnu.org/ml/gcc-patches/2012-04/msg00298.html

This is related to convert_move possibly emitting a sequence with
multiple accesses to one input, triggering multiple memory accesses
when that input happens to be a mem.

> The original problem we had with this was the introduction of an
> artificial race condition in addition to the potential performance
> impact.
>
> While our original testcases don't expose the problem with current
> versions of the compiler, the issue appears to remain latent and the
> change still looks sensible in any case.

...

>	* expr.c (convert_move): Latch mem integer inputs into a
>	register before expanding a multi-instruction sequence.

Thanks in advance,

Olivier
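A user-level analogue of the hazard (my sketch, not the patch's
testcase; the names are assumptions):

    #include <stdint.h>

    extern volatile uint32_t shared;   /* written concurrently elsewhere */

    /* If a widening conversion were expanded as a multi-insn sequence
       that touches its mem input more than once, the two reads could
       observe different values of `shared' -- a data race absent from
       the source.  Latching the value into a register first, as the
       patch does inside convert_move, guarantees a single access.  */
    int64_t
    widen_once (void)
    {
      uint32_t snapshot = shared;      /* exactly one memory access */
      return (int64_t) snapshot;
    }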
ping x2: fix mem+32760 ICE on powerpc - introduce predicates weaker than mode_dependent_address_p
Hello,

Ping #2 for the non-back-end parts of
http://gcc.gnu.org/ml/gcc-patches/2012-04/msg01668.html

Thanks much in advance,

Olivier

> David approved the rs6000 parts already (with adjustments to
> comments, http://gcc.gnu.org/ml/gcc-patches/2012-05/msg00368.html)
>
> Thanks much in advance for your feedback,
>
> Olivier
>
> On Apr 26, 2012, at 11:30 , Olivier Hainque wrote:
> ...
>> a number of places in the compiler use the mode_dependent_address_p
>> predicate to actually check for weaker necessary conditions.
>> Typically, a few places need to check that a MEM access remains valid
>> when just narrowing the mode, while mode_dependent_address_p tells if
>> any mode change is valid.
>>
>> While this is of course generally safe, this has been causing endless
>> troubles to the powerpc back-end, which has apparently unique
>> particularities related to altivec modes.
> ...
>> The attached patch is a proposal to fix this, slightly generalized
>> compared to the original one.  The general idea is to allow for
>> weaker predicates at the places where we need them.
>>
>> This is achieved by the introduction of a TARGET_MAY_NARROW_ACCESS_TO
>> target hook, which defaults to !mode_dependent_address_p and is
>> redefined for powerpc.  The patch uses this hook directly instead of
>> the former predicate in a couple of places where this was the intent
>> already, as well as a new "valid_access_mode_change_p" function to
>> direct to one or the other depending on the provided original and
>> destination modes.
>>
>> This provides a better match for actual internal needs, lets us get
>> rid of the powerpc back-end twists (no need to lie in
>> mode_dependent_address_p any more) and cures the observed internal
>> compiler error.
> ...
>> 2012-04-26  Olivier Hainque
>>
>>	* target.def (TARGET_MAY_NARROW_ACCESS_TO): New hook.
>>	* doc/tm.texi[.in] (TARGET_MAY_NARROW_ACCESS_TO): Document.
>>	* targhooks.c (default_may_narrow_access_to): Default
>>	implementation.
>>	* targhooks.h (default_may_narrow_access_to): Declare.
>>	* config/rs6000/rs6000.c (rs6000_may_narrow_access_to): Specific
>>	implementation.
>>	(rs6000_mode_dependent_address): Stop lying for + const_int.
>>	(rs6000_offsettable_memref_p): Adjust comments accordingly.
>>	* expr.c (convert_move): Use may_narrow_access_to instead of
>>	mode_dependent_address_p where appropriate.
>>	* recog.c (offsettable_address_addr_space_p): Likewise.
>>	(valid_access_mode_change_p): New function.
>>	* recog.h (valid_access_mode_change_p): Declare.
>>	* simplify-rtx.c (simplify_subreg): Use it instead of
>>	mode_dependent_address_p where appropriate.
>>
>> testsuite/
>>	* gcc.dg/offsetmem.c: New test.
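From the changelog alone, the default hook is presumably a thin
wrapper; a hypothetical sketch (the signature is guessed from the
mode_dependent_address_p of that era -- see the referenced posting for
the real implementation):

    /* Hypothetical default: narrowing the access mode is allowed
       wherever any mode change is, i.e. the conservative behavior
       targets get today.  rs6000 would override this with a weaker
       check that accepts the reg+const_int addresses that are only
       problematic for non-narrowing (altivec) mode changes.  */
    bool
    default_may_narrow_access_to (const_rtx addr)
    {
      return !mode_dependent_address_p (addr);
    }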