Re: target specific builtin expansion (middle end and back end definition inconsistency problem?).
Hi Ian,

2012/4/22 Ian Lance Taylor :
> Feng LI writes:
>
>> Yes, you are right. But how could I reference to a backend defined builtin
>> function in the middle end (I need to generate the builtin function in the
>> middle end and expand it in x86 backend)?
>
> If the function doesn't have a machine-independent definition, then use
> a target hook.

Then I removed the duplicate builtin definition in the x86 backend.
I defined the builtin function with built_in_class as BUILT_IN_MD in
builtins.def, so that in expand_builtin the target-specific hook is called:

  if (DECL_BUILT_IN_CLASS (fndecl) == BUILT_IN_MD)
    return targetm.expand_builtin (exp, target, subtarget, mode, ignore);

Then in the middle end I can refer to this builtin function with
tree tcreate_fn = built_in_decls[BUILT_IN_TCREATE], and it will call the
target-specific expansion in the backend.

The problem happens as the code below shows:

  expand_builtin ()
  {
    enum built_in_function fcode = DECL_FUNCTION_CODE (fndecl);
  }

  ix86_expand_builtin ()
  {
    enum ix86_builtins fcode = DECL_FUNCTION_CODE (fndecl);
  }

The builtin codes (enum built_in_function in the middle end and
enum ix86_builtins in the backend) are different sets, so I end up
trapping into this code block:

  if (ix86_builtins_isa[fcode].isa
      && !(ix86_builtins_isa[fcode].isa & ix86_isa_flags))
    {
      char *opts = ix86_target_string (ix86_builtins_isa[fcode].isa, 0, NULL,
                                       NULL, NULL, false);

      if (!opts)
        error ("%qE needs unknown isa option", fndecl);
      else
        {
          gcc_assert (opts != NULL);
          error ("%qE needs isa option %s", fndecl, opts);
          free (opts);
        }
      return const0_rtx;
    }

Not sure if I'm doing it in the right way, help needed...

Thanks,
Feng

> (That said I've thought for a while that we need better mechanisms for
> target-specific optimization passes.  If we had those I would tell you
> to write one.  E.g., reg-stack.c is a target-specific pass.)
>
> Ian
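For reference, the conventional way to keep such a builtin entirely in the
backend (instead of adding it to builtins.def) is to register it from the
target's init-builtins code; a minimal sketch in the style of i386.c, where
the names __builtin_tcreate and IX86_BUILTIN_TCREATE come from this thread
and are used purely for illustration:

  /* Sketch only: register a target-specific (BUILT_IN_MD) builtin from the
     backend.  IX86_BUILTIN_TCREATE is assumed to be an entry in the
     backend's own enum ix86_builtins, not an entry in builtins.def.  */

  static GTY(()) tree ix86_tcreate_decl;

  static void
  ix86_init_tcreate_builtin (void)
  {
    /* void *__builtin_tcreate (void *) -- the signature is an assumption.  */
    tree ftype = build_function_type_list (ptr_type_node, ptr_type_node,
                                           NULL_TREE);

    ix86_tcreate_decl
      = add_builtin_function ("__builtin_tcreate", ftype,
                              IX86_BUILTIN_TCREATE, BUILT_IN_MD,
                              NULL, NULL_TREE);
  }

With the decl registered this way, DECL_FUNCTION_CODE is only ever
interpreted by ix86_expand_builtin against the backend's own enum, which
avoids the code clash described above; how the middle end gets at the decl
is what Ian addresses later in the thread.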
Re: Attempting changes to the GIMPLifier
2012/4/22 :
> Dear all,
>
> I have a few questions regarding how to augment the information dumped in
> "004t" GIMPLE dumps (prior to any optimization).
>
> My main concerns are:
>
> 1. Printing global variables.

Look at the cgraph (.000i.cgraph) dump.

> 2. Preserving function arguments (what I call an "interface").

I think we do that now.

> Both 1 and 2 are not currently addressed, at least in the gcc-4.5.1 and
> gcc-4.6.0 gimplifiers that I work with.
>
> Is this information available in internal data structures so I can expose it
> via use of the GIMPLE API?
>
> I've also noticed inconsistencies among GIMPLE dumps produced following
> different optimizations, but this is another topic.
>
> Thanks in advance.
>
> Best regards,
> Nikolaos Kavvadias
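If it helps, the formal arguments are already recorded on the FUNCTION_DECL,
so a small pass or dump hook can print the "interface" directly; a rough
sketch, assuming it runs somewhere current_function_decl is valid (DECL_CHAIN
is spelled TREE_CHAIN in the 4.5 sources):

  /* Sketch: dump the formal parameters ("interface") of the current
     function using the existing pretty-printer helpers.  */
  static void
  dump_function_interface (FILE *file)
  {
    tree parm;

    for (parm = DECL_ARGUMENTS (current_function_decl);
         parm;
         parm = DECL_CHAIN (parm))
      {
        print_generic_expr (file, parm, TDF_SLIM);             /* name */
        fprintf (file, " : ");
        print_generic_expr (file, TREE_TYPE (parm), TDF_SLIM); /* type */
        fprintf (file, "\n");
      }
  }

Global variables are reachable in a similar way through the varpool/cgraph
data that the .000i.cgraph dump is produced from.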
Re: [RFC] Converting end of loop computations to MIN_EXPRs.
On Sun, Apr 22, 2012 at 8:50 AM, Ramana Radhakrishnan wrote:
> Hi,
>
> A colleague noticed that we were not vectorizing loops that had end of
> loop computations that were MIN type operations that weren't expressed
> in the form of a typical min operation. A transform from (i < x) &&
> (i < y) to (i < min (x, y)) is only something that we should do in
> these situations rather than as a general transformation where we
> might be able to end up generating slightly better code because the
> condition would end up being dependent on an invariant outside the
> loop. However, in the general case such a transformation is not
> advisable - it's up to the reader to work that out.

I don't exactly understand why the general transform is not advisable.
We already synthesize min/max operations. Can you elaborate on why you
think that better code might be generated when not doing this transform?

> #define min(x,y) ((x) <= (y) ? (x) : (y))
>
> void foo (int x, int y, int * a, int * b, int *c)
> {
>   int i;
>
>   for (i = 0;
>        i < x && i < y;
>        /* i < min (x, y); */
>        i++)
>     a[i] = b[i] * c[i];
>
> }
>
> The patch below deals with this case and I'm guessing that it could
> also handle more of the comparison cases and come up with more
> intelligent choices and should be made quite a lot more robust than
> what it is right now.

Yes. At least if you have i < 5 && i < y we canonicalize it to
i <= 4 && i < y, so your pattern matching would fail (a short example
follows after the quoted patch below).

Btw, the canonical case this happens in is probably

  for (i = 0; i < n; ++i)
    for (j = 0; j < m && j < i; ++j)
      a[i][j] = ...

thus iterating over the lower/upper triangular part of a non-square
matrix (including or not including the diagonal, thus also
j < m && j <= i).

Richard.

> regards,
> Ramana
>
>
> diff --git a/gcc/tree-ssa-loop-im.c b/gcc/tree-ssa-loop-im.c
> index ce5eb20..a529536 100644
> --- a/gcc/tree-ssa-loop-im.c
> +++ b/gcc/tree-ssa-loop-im.c
> @@ -563,6 +563,7 @@ stmt_cost (gimple stmt)
>
>    switch (gimple_assign_rhs_code (stmt))
>      {
> +    case MIN_EXPR:
>      case MULT_EXPR:
>      case WIDEN_MULT_EXPR:
>      case WIDEN_MULT_PLUS_EXPR:
> @@ -971,6 +972,124 @@ rewrite_reciprocal (gimple_stmt_iterator *bsi)
>    return stmt1;
>  }
>
> +/* We look for a sequence that is :
> +   def_stmt1 : x = a < b
> +   def_stmt2 : y = a < c
> +   stmt:       z = x & y
> +   use_stmt_cond: if ( z != 0)
> +
> +   where b, c are loop invariant .
> +
> +   In which case we might as well replace this by :
> +
> +   t = min (b, c)
> +   if ( a < t )
> +*/
> +
> +static gimple
> +rewrite_min_test (gimple_stmt_iterator *bsi)
> +{
> +  gimple stmt, def_stmt_x, def_stmt_y, use_stmt_cond, stmt1;
> +  tree x, y, z, a, b, c, var, t, name;
> +  use_operand_p use;
> +  bool is_lhs_of_comparison = false;
> +
> +  stmt = gsi_stmt (*bsi);
> +  z = gimple_assign_lhs (stmt);
> +
> +  /* We start by looking at whether x is used in the
> +     right set of conditions.  */
> +  if (TREE_CODE (z) != SSA_NAME
> +      || !single_imm_use (z, &use, &use_stmt_cond)
> +      || gimple_code (use_stmt_cond) != GIMPLE_COND)
> +    return stmt;
> +
> +  x = gimple_assign_rhs1 (stmt);
> +  y = gimple_assign_rhs2 (stmt);
> +
> +  if (TREE_CODE (x) != SSA_NAME
> +      || TREE_CODE (y) != SSA_NAME)
> +    return stmt;
> +
> +  def_stmt_x = SSA_NAME_DEF_STMT (x);
> +  def_stmt_y = SSA_NAME_DEF_STMT (y);
> +
> +  /* def_stmt_x and def_stmt_y should be of the
> +     form
> +
> +     x = a cmp b
> +     y = a cmp c
> +
> +     or
> +
> +     x = b cmp a
> +     y = c cmp a
> +  */
> +  if (!is_gimple_assign (def_stmt_x)
> +      || !is_gimple_assign (def_stmt_y)
> +      || (gimple_assign_rhs_code (def_stmt_x)
> +          != gimple_assign_rhs_code (def_stmt_y)))
> +    return stmt;
> +
> +  if (gimple_assign_rhs1 (def_stmt_x) == gimple_assign_rhs1 (def_stmt_y)
> +      && (gimple_assign_rhs_code (def_stmt_x) == LT_EXPR
> +          || gimple_assign_rhs_code (def_stmt_x) == LE_EXPR))
> +    {
> +      a = gimple_assign_rhs1 (def_stmt_x);
> +      b = gimple_assign_rhs2 (def_stmt_x);
> +      c = gimple_assign_rhs2 (def_stmt_y);
> +      is_lhs_of_comparison = true;
> +    }
> +  else
> +    {
> +      if (gimple_assign_rhs2 (def_stmt_x) == gimple_assign_rhs2 (def_stmt_y)
> +          && (gimple_assign_rhs_code (def_stmt_x) == GT_EXPR
> +              || gimple_assign_rhs_code (def_stmt_x) == GE_EXPR))
> +        {
> +          a = gimple_assign_rhs2 (def_stmt_x);
> +          b = gimple_assign_rhs1 (def_stmt_x);
> +          c = gimple_assign_rhs1 (def_stmt_y);
> +        }
> +      else
> +        return stmt;
> +    }
> +
> +  if (outermost_invariant_loop (b, loop_containing_stmt (def_stmt_x)) != NULL
> +      && outermost_invariant_loop (c, loop_containing_stmt (def_stmt_y)) != NULL)
> +    {
> +      if (dump_file)
> +        fprintf (dump_file, "Found a potential transformation to min\n");
> +
> +      /* mintmp = min (b , c). */
> +
> +      var = create_tmp_var (TRE
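The canonicalization Richard refers to can be seen with a loop bound that is
a constant; a small example, illustrative only and not taken from the
original thread:

  /* With a constant bound, i < 5 && i < y is canonicalized early to
     i <= 4 && i < y, so a matcher that requires both comparisons to use
     the same comparison code (two LT_EXPRs) no longer fires.  */
  void
  bar (int y, int *a, int *b, int *c)
  {
    int i;

    for (i = 0; i < 5 && i < y; i++)   /* becomes i <= 4 && i < y */
      a[i] = b[i] * c[i];
  }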
A case where PHI-OPT pessimizes the code
Hello, I ported the code to expand switch() statements with bit tests from stmt.c to GIMPLE, and looked at the resulting code to verify that the transformation applies correctly, when I noticed this strange PHI-OPT transformation that results in worse code for the test case of PR45830 which looks like this: int foo (int *a) { switch (*a) { case 0: case 1:case 2:case 3:case 4:case 5: case 19:case 20:case 21:case 22:case 23: case 26:case 27: return 1; default: return 0; } } After transforming the switch() to a series of bit tests, the code looks like this: ;; Function foo (foo, funcdef_no=0, decl_uid=1996, cgraph_uid=0) beginning to process the following SWITCH statement (pr45830.c:8) : --- switch (D.2013_3) , case 0 ... 5: , case 19 ... 23: , case 26 ... 27: > expanding as bit test is preferableSwitch converted foo (int * a) { _Bool D.2023; long unsigned int D.2022; long unsigned int D.2021; long unsigned int csui.1; _Bool D.2019; int D.2014; int D.2013; : D.2013_3 = *a_2(D); D.2019_5 = D.2013_3 > 27; if (D.2019_5 != 0) goto (); else goto ; : D.2021_7 = (long unsigned int) D.2013_3; csui.1_4 = 1 << D.2021_7; D.2022_8 = csui.1_4 & 217579583; D.2023_9 = D.2022_8 != 0; if (D.2023_9 != 0) goto (); else goto ; : : # D.2014_1 = PHI <1(5), 0(3)> : return D.2014_1; } This is the equivalent code of what the expander in stmt.c would generate. Unfortunately, the first PHI-OPT pass (phiopt1) changes the code as follows: ;; Function foo (foo, funcdef_no=0, decl_uid=1996, cgraph_uid=0) Removing basic block 4 foo (int * a) { int D.2026; _Bool D.2025; long unsigned int D.2022; long unsigned int D.2021; long unsigned int csui.1; int D.2014; int D.2013; : D.2013_3 = *a_2(D); if (D.2013_3 > 27) goto (); else goto ; : D.2021_7 = (long unsigned int) D.2013_3; csui.1_4 = 1 << D.2021_7; D.2022_8 = csui.1_4 & 217579583; D.2025_9 = D.2022_8 != 0; D.2026_5 = (int) D.2025_9; // new statement # D.2014_1 = PHI // modified PHI node : return D.2014_1; } This results in worse code on powerpc64: BEFORE AFTER foo:foo: lwz 9,0(3) lwz 9,0(3) cmplwi 7,9,27 | cmpwi 7,9,27 bgt 7,.L4 | bgt 7,.L3 li 8,1| li 3,1 lis 10,0xcf8lis 10,0xcf8 sld 9,8,9 | sld 9,3,9 ori 10,10,63ori 10,10,63 and. 8,9,10 and. 8,9,10 li 3,1| mfcr 10 bnelr 0 | rlwinm 10,10,3,1 .L4: | xori 3,10,1 > blr > .p2align 4,,15 > .L3: li 3,0 li 3,0 blr blr BEFORE is the code that results if stmt.c expands the switch to bit tests (i.e. PHI-OPT never gets to transform the code as shown), and AFTER is with my equivalent GIMPLE implementation. Apparently, the optimizers are unable to recover from the transformation PHI-OPT performs. I am not sure how to fix this problem. I am somewhat surprised by the code generated by the powerpc backend for "t=(int)(_Bool)some_bool", because I would have expected the type range for _Bool to be <0,1> so that the type conversion should be a single bit test. On the other hand, maybe PHI-OPT should recognize this pattern and reject the transformation??? Your thoughts/comments/suggestions, please? Ciao! Steven P.S. Unfortunately I haven't been able to produce a test case that shows the problem without my switch conversion pass.
Re: A case where PHI-OPT pessimizes the code
On Mon, Apr 23, 2012 at 2:15 PM, Steven Bosscher wrote: > Hello, > > I ported the code to expand switch() statements with bit tests from > stmt.c to GIMPLE, and looked at the resulting code to verify that the > transformation applies correctly, when I noticed this strange PHI-OPT > transformation that results in worse code for the test case of PR45830 > which looks like this: > > int > foo (int *a) > { > switch (*a) > { > case 0: case 1: case 2: case 3: case 4: case 5: > case 19: case 20: case 21: case 22: case 23: > case 26: case 27: > return 1; > default: > return 0; > } > } > > > After transforming the switch() to a series of bit tests, the code > looks like this: > > > ;; Function foo (foo, funcdef_no=0, decl_uid=1996, cgraph_uid=0) > > beginning to process the following SWITCH statement (pr45830.c:8) : --- > switch (D.2013_3) , case 0 ... 5: , case 19 ... > 23: , case 26 ... 27: > > > expanding as bit test is preferableSwitch converted > > foo (int * a) > { > _Bool D.2023; > long unsigned int D.2022; > long unsigned int D.2021; > long unsigned int csui.1; > _Bool D.2019; > int D.2014; > int D.2013; > > : > D.2013_3 = *a_2(D); > D.2019_5 = D.2013_3 > 27; > if (D.2019_5 != 0) > goto (); > else > goto ; > > : > D.2021_7 = (long unsigned int) D.2013_3; > csui.1_4 = 1 << D.2021_7; > D.2022_8 = csui.1_4 & 217579583; > D.2023_9 = D.2022_8 != 0; > if (D.2023_9 != 0) > goto (); > else > goto ; > > : > > : > > # D.2014_1 = PHI <1(5), 0(3)> > : > return D.2014_1; > > } > > > This is the equivalent code of what the expander in stmt.c would > generate. Unfortunately, the first PHI-OPT pass (phiopt1) changes the > code as follows: > > ;; Function foo (foo, funcdef_no=0, decl_uid=1996, cgraph_uid=0) > > Removing basic block 4 > foo (int * a) > { > int D.2026; > _Bool D.2025; > long unsigned int D.2022; > long unsigned int D.2021; > long unsigned int csui.1; > int D.2014; > int D.2013; > > : > D.2013_3 = *a_2(D); > if (D.2013_3 > 27) > goto (); > else > goto ; > > : > D.2021_7 = (long unsigned int) D.2013_3; > csui.1_4 = 1 << D.2021_7; > D.2022_8 = csui.1_4 & 217579583; > D.2025_9 = D.2022_8 != 0; > D.2026_5 = (int) D.2025_9; // new statement > > # D.2014_1 = PHI // modified PHI node > : > return D.2014_1; > > } > > > This results in worse code on powerpc64: > > BEFORE AFTER > foo: foo: > lwz 9,0(3) lwz 9,0(3) > cmplwi 7,9,27 | cmpwi 7,9,27 > bgt 7,.L4 | bgt 7,.L3 > li 8,1 | li 3,1 > lis 10,0xcf8 lis 10,0xcf8 > sld 9,8,9 | sld 9,3,9 > ori 10,10,63 ori 10,10,63 > and. 8,9,10 and. 8,9,10 > li 3,1 | mfcr 10 > bnelr 0 | rlwinm 10,10,3,1 > .L4: | xori 3,10,1 > > blr > > .p2align 4,,15 > > .L3: > li 3,0 li 3,0 > blr blr > > BEFORE is the code that results if stmt.c expands the switch to bit > tests (i.e. PHI-OPT never gets to transform the code as shown), and > AFTER is with my equivalent GIMPLE implementation. Apparently, the > optimizers are unable to recover from the transformation PHI-OPT > performs. > > I am not sure how to fix this problem. I am somewhat surprised by the > code generated by the powerpc backend for "t=(int)(_Bool)some_bool", > because I would have expected the type range for _Bool to be <0,1> so > that the type conversion should be a single bit test. On the other > hand, maybe PHI-OPT should recognize this pattern and reject the > transformation??? > > Your thoughts/comments/suggestions, please? > > Ciao! > Steven > > > P.S. Unfortunately I haven't been able to produce a test case that > shows the problem without my switch conversion pass. 
int foo (_Bool b)
{
  if (b)
    return 1;
  else
    return 0;
}

PHI-OPT tries to do conditional replacement, thus transform

  bb0:
    if (cond) goto bb2; else goto bb1;
  bb1:
  bb2:
    x = PHI <0 (bb1), 1 (bb0), ...>;

to

  bb0:
    x' = cond;
    goto bb2;
  bb2:
    x = PHI <x' (bb0), ...>;

trying to save a compare (assuming the target has a set-cc like
instruction).

I think the ppc backend should be fixed here (if possible), or the generic
expansion of this kind of pattern needs to improve.  On x86_64 we simply do

(insn 7 6 8 (set (reg:SI 63 [ D.1715 ])
        (zero_extend:SI (reg/v:QI 61 [ b ]))) t.c:4 -1
     (nil))

Richard.
Re: A case where PHI-OPT pessimizes the code
On Mon, Apr 23, 2012 at 2:27 PM, Richard Guenther wrote: > int foo (_Bool b) > { > if (b) > return 1; > else > return 0; > } Indeed PHI-OPT performs the transformation on this code, too. But the resulting code on powerpc64 is fine: [stevenb@gcc1-power7 gcc]$ cat t.c.149t.optimized ;; Function foo (foo, funcdef_no=0, decl_uid=1996, cgraph_uid=0) foo (_Bool b) { int D.2006; : D.2006_4 = (int) b_2(D); return D.2006_4; } [stevenb@gcc1-power7 gcc]$ cat t.s .file "t.c" .section".toc","aw" .section".text" .align 2 .p2align 4,,15 .globl foo .section".opd","aw" .align 3 foo: .quad .L.foo,.TOC.@tocbase,0 .previous .type foo, @function .L.foo: blr .long 0 .byte 0,0,0,0,0,0,0,0 .size foo,.-.L.foo .ident "GCC: (GNU) 4.8.0 20120418 (experimental) [trunk revision 186580]" [stevenb@gcc1-power7 gcc]$ However, this C test case shows the problem: [stevenb@gcc1-power7 gcc]$ head -n 24 t.c #define ONEUL (1UL) int foo (long unsigned int a) { _Bool b; long unsigned int cst, csui; if (a > 27) goto return_zero; /*cst = 217579583UL;*/ cst = (ONEUL << 0) | (ONEUL << 1) | (ONEUL << 2) | (ONEUL << 3) | (ONEUL << 4) | (ONEUL << 5) | (ONEUL << 19) | (ONEUL << 20) | (ONEUL << 21) | (ONEUL << 22) | (ONEUL << 23) | (ONEUL << 26) | (ONEUL << 27); csui = (ONEUL << a); b = ((csui & cst) != 0); if (b) return 1; else return 0; return_zero: return 0; } [stevenb@gcc1-power7 gcc]$ ./cc1 -quiet -O2 -fdump-tree-all t.c [stevenb@gcc1-power7 gcc]$ cat t.s .file "t.c" .section".toc","aw" .section".text" .align 2 .p2align 4,,15 .globl foo .section".opd","aw" .align 3 foo: .quad .L.foo,.TOC.@tocbase,0 .previous .type foo, @function .L.foo: cmpldi 7,3,27 bgt 7,.L3 li 10,1 lis 9,0xcf8 sld 3,10,3 ori 9,9,63 and. 10,3,9 mfcr 9 rlwinm 9,9,3,1 xori 3,9,1 blr .p2align 4,,15 .L3: .L2: li 3,0 blr .long 0 .byte 0,0,0,0,0,0,0,0 .size foo,.-.L.foo .ident "GCC: (GNU) 4.8.0 20120418 (experimental) [trunk revision 186580]" I will file a PR for this later today, maybe after trying on a few other targets to see if this is a middle-end problem or a target issue. Ciao! Steven
Failed access check
Hello,

clang gave an error on code that had compiled with gcc so far. The reduced
test case is:

8<8<8<8<---

class V;

struct E
{
  E(const V& v_);

  char* c;
  V*    v;
  int   i;
};

class V
{
private:
  union
  {
    char* c;
    struct
    {
      V*  v;
      int i;
    };
  };
};

E::E(const V& v_)
  : c(v_.c), // line 25
    v(v_.v),
    i(v_.i)
{
}

8<8<8<8<---

Tried with gcc 4.4, 4.5, 4.6, 4.7, 4.8; all gave the same error:

gcc-4.8 -c gccaccessbug.cpp -Wall -Wextra
gccaccessbug.cpp: In constructor ‘E::E(const V&)’:
gccaccessbug.cpp:16:9: error: ‘char* V::c’ is private
gccaccessbug.cpp:25:7: error: within this context

Line 25 is where E::c is initialized; V::c is private, so the error is
justified. However, V::v and V::i are also private, but no diagnostic is
given for them. If I comment out 'c(v_.c)', the source compiles w/o error.

Should I file a bug report? I checked BZ for 'access control', but found
nothing relevant, only bugs related to templated code.

Regards, Peter
Is it possible to get unpartitioned LTO with Fortran?
Hi,

I have a simple IPA pass that requires access to all function bodies in a
program. For C and small Fortran programs doing this at link time causes no
issues. However, when I attempt to compile a larger Fortran program the
pass is called multiple times at link time, each time with only a portion of
the function bodies available. Is there any way to force an unpartitioned
(i.e. single) execution of my pass on these Fortran programs, with access to
all the function bodies?

I am using GCC 4.7 with the gold linker.

Cheers,
Andrew
--
View this message in context:
http://old.nabble.com/Is-it-possible-to-get-unpartitioned-LTO-with-Fortran--tp33732132p33732132.html
Sent from the gcc - Dev mailing list archive at Nabble.com.
Re: Is it possible to get unpartitioned LTO with Fortran?
> Hi,
>
> I have a simple IPA pass that requires access to all function bodies in a
> program. For C and small Fortran programs doing this at link time causes no
> issues. However, when I attempt to compile a larger Fortran program the
> pass is called multiple times at link time, each time with only a portion of
> the function bodies available. Is there any way to force an unpartitioned
> (i.e. single) execution of my pass on these Fortran programs, with access to
> all the function bodies?
>
> I am using GCC 4.7 with the gold linker.

-flto-partition=none should work.

Honza

> Cheers,
> Andrew
> --
> View this message in context:
> http://old.nabble.com/Is-it-possible-to-get-unpartitioned-LTO-with-Fortran--tp33732132p33732132.html
> Sent from the gcc - Dev mailing list archive at Nabble.com.
Re: A case where PHI-OPT pessimizes the code
On Mon, Apr 23, 2012 at 02:50:13PM +0200, Steven Bosscher wrote:
> csui = (ONEUL << a);
> b = ((csui & cst) != 0);
> if (b)
>   return 1;
> else
>   return 0;

We (powerpc) would do much better if this were

  csui = (ONEUL << a);
  return (csui & cst) >> a;

Other targets would probably benefit too.

--
Alan Modra
Australia Development Lab, IBM
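Applied to the reduced test case from earlier in the thread, Alan's
suggestion would look roughly like this (a sketch only; the constant is the
same case bitmask, and whether the powerpc backend really produces the
hoped-for sequence for it was not verified here):

  /* Sketch: return the tested bit directly instead of comparing it against
     zero, so no condition-register round trip is needed.  */
  int
  foo (unsigned long a)
  {
    unsigned long cst = 217579583UL;   /* bits 0-5, 19-23, 26-27 */
    unsigned long csui;

    if (a > 27)
      return 0;

    csui = 1UL << a;
    return (csui & cst) >> a;          /* yields 0 or 1 without a compare */
  }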
Re: A case where PHI-OPT pessimizes the code
On 04/23/2012 06:27 AM, Richard Guenther wrote: On Mon, Apr 23, 2012 at 2:15 PM, Steven Bosscher wrote: Hello, I ported the code to expand switch() statements with bit tests from stmt.c to GIMPLE, and looked at the resulting code to verify that the transformation applies correctly, when I noticed this strange PHI-OPT transformation that results in worse code for the test case of PR45830 which looks like this: int foo (int *a) { switch (*a) { case 0: case 1:case 2:case 3:case 4:case 5: case 19:case 20:case 21:case 22:case 23: case 26:case 27: return 1; default: return 0; } } After transforming the switch() to a series of bit tests, the code looks like this: ;; Function foo (foo, funcdef_no=0, decl_uid=1996, cgraph_uid=0) beginning to process the following SWITCH statement (pr45830.c:8) : --- switch (D.2013_3), case 0 ... 5:, case 19 ... 23:, case 26 ... 27:> expanding as bit test is preferableSwitch converted foo (int * a) { _Bool D.2023; long unsigned int D.2022; long unsigned int D.2021; long unsigned int csui.1; _Bool D.2019; int D.2014; int D.2013; : D.2013_3 = *a_2(D); D.2019_5 = D.2013_3> 27; if (D.2019_5 != 0) goto (); else goto; : D.2021_7 = (long unsigned int) D.2013_3; csui.1_4 = 1<< D.2021_7; D.2022_8 = csui.1_4& 217579583; D.2023_9 = D.2022_8 != 0; if (D.2023_9 != 0) goto (); else goto; : : # D.2014_1 = PHI<1(5), 0(3)> : return D.2014_1; } This is the equivalent code of what the expander in stmt.c would generate. Unfortunately, the first PHI-OPT pass (phiopt1) changes the code as follows: ;; Function foo (foo, funcdef_no=0, decl_uid=1996, cgraph_uid=0) Removing basic block 4 foo (int * a) { int D.2026; _Bool D.2025; long unsigned int D.2022; long unsigned int D.2021; long unsigned int csui.1; int D.2014; int D.2013; : D.2013_3 = *a_2(D); if (D.2013_3> 27) goto (); else goto; : D.2021_7 = (long unsigned int) D.2013_3; csui.1_4 = 1<< D.2021_7; D.2022_8 = csui.1_4& 217579583; D.2025_9 = D.2022_8 != 0; D.2026_5 = (int) D.2025_9; // new statement # D.2014_1 = PHI // modified PHI node : return D.2014_1; } This results in worse code on powerpc64: BEFORE AFTER foo:foo: lwz 9,0(3) lwz 9,0(3) cmplwi 7,9,27 | cmpwi 7,9,27 bgt 7,.L4 | bgt 7,.L3 li 8,1| li 3,1 lis 10,0xcf8lis 10,0xcf8 sld 9,8,9 | sld 9,3,9 ori 10,10,63ori 10,10,63 and. 8,9,10 and. 8,9,10 li 3,1| mfcr 10 bnelr 0 | rlwinm 10,10,3,1 .L4: | xori 3,10,1 > blr > .p2align 4,,15 > .L3: li 3,0 li 3,0 blr blr BEFORE is the code that results if stmt.c expands the switch to bit tests (i.e. PHI-OPT never gets to transform the code as shown), and AFTER is with my equivalent GIMPLE implementation. Apparently, the optimizers are unable to recover from the transformation PHI-OPT performs. I am not sure how to fix this problem. I am somewhat surprised by the code generated by the powerpc backend for "t=(int)(_Bool)some_bool", because I would have expected the type range for _Bool to be<0,1> so that the type conversion should be a single bit test. On the other hand, maybe PHI-OPT should recognize this pattern and reject the transformation??? Your thoughts/comments/suggestions, please? Ciao! Steven P.S. Unfortunately I haven't been able to produce a test case that shows the problem without my switch conversion pass. int foo (_Bool b) { if (b) return 1; else return 0; } PHI-OPT tries to do conditional replacement, thus transform bb0: if (cond) goto bb2; else goto bb1; bb1: bb2: x = PHI<0 (bb1), 1 (bb0), ...>; to bb0: x' = cond; goto bb2; bb2: x = PHI; trying to save a compare (assuming the target has a set-cc like instruction). 
I think the ppc backend should be fixed here (if possible), or the generic
expansion of this kind of pattern needs to improve.  On x86_64 we simply do

(insn 7 6 8 (set (reg:SI 63 [ D.1715 ])
        (zero_extend:SI (reg/v:QI 61 [ b ]))) t.c:4 -1
     (nil))

FWIW, there's a patch buried in a BZ where phi-opt is extended to eliminate
PHIs using casts, arithmetic, etc.  I never followed up on it because my
tests showed that it wasn't a win.  It might be possible to retask those
bits to improve this code.

jeff

ps. It was related to missing a conditional move in a loop, so a search for
missing cmov or something like that might find the bug.  Alternately it was
probably attached to the 4
Re: old archives from 1998
On 04/22/2012 11:43 AM, Ian Lance Taylor wrote:
> When EGCS and GCC merged back together again, the changes made to the
> FSF version of GCC (that is, the non-EGCS version) were put into
> FSFChangeLog, which is where you found them.  There was no attempt to
> copy all the entries from FSFChangeLog to the regular ChangeLog files,
> so it is not surprising that the change is not there.

If I remember correctly, we just copied the ChangeLog from the old gcc2 tree
to FSFChangeLog at each import.  We made no attempt to weave the egcs & gcc2
ChangeLogs together.

As I recall, most changes to the FSF version of GCC were discussed on the
gcc2 mailing list.  But I might be misremembering.  And I can't find any
archives of the gcc2 mailing list anywhere.  Cygnus kept some archives of
the old gcc2 development list, but never released them as the list had
always been considered.  I doubt I could even find those old archives
anymore.

Jeff
MIPS: 2'nd pass of ira, causes weird register allocation for 2-op mult
The summery goes something like this: It is possible for the second pass of ira to get confused and decide that NO_REGS or a hard float register are better choices for the result of the 2 operand mult. First pass already optimally allocated in GR_AND_MD1_REGS. Two pass ira is enabled with "-fexpensive-optimizations". Below is the code that will provoke the problem (pre-processed fixed point function from libgcc) -8< typedef unsigned long size_t; typedef int HItype __attribute__ ((mode (HI))); typedef unsigned int UHItype __attribute__ ((mode (HI))); typedef _Fract HQtype __attribute__ ((mode (HQ))); typedef unsigned _Fract UHQtype __attribute__ ((mode (UHQ))); typedef int SItype __attribute__ ((mode (SI))); typedef unsigned int USItype __attribute__ ((mode (SI))); extern void *memcpy (void *, const void *, size_t); extern USItype __saturate1uhq (USItype); UHQtype __mulhelperuhq (UHQtype a, UHQtype b, int satp) { UHQtype c; UHItype x, y, z; USItype dx, dy, dz; memcpy (&x, &a, 2); memcpy (&y, &b, 2); dx = (USItype) x; dy = (USItype) y; dz = dx * dy; dz += ((USItype) 1 << (16 - 1)); dz = dz >> 16; if (satp) dz = __saturate1uhq (dz); z = (UHItype) dz; memcpy (&c, &z, 2); return c; } -8< Compiling with -O1 give pretty optimal code (Check that impressive optimi- zation of memcpy()): -8< .file 1 "u1.c" .section .mdebug.abi32 .previous .gnu_attribute 4, 3 # -G value = 8, Arch = mips1, ISA = 1 # GNU C version 4.7.0 (mips-sde-elf) # compiled by GNU C version 4.6.3 20120306 (Red Hat 4.6.3-2), GMP version 4.3.2, MPFR version 3.0.0, MPC version 0.9 # GGC heuristics: --param ggc-min-expand=100 --param ggc-min-heapsize=131072 # options passed: u1.c -mno-mips16 -O1 -march=mips1 -fdump-tree-all # -fdump-ipa-all -ftree-vectorizer-verbose=9 -fdump-rtl-all -fverbose-asm # -frandom-seed=0 -O1 -msoft-float -fno-expensive-optimizations ... __mulhelperuhq: .frame $sp,24,$31 # vars= 0, regs= 1/0, args= 16, gp= 0 .mask 0x8000,-4 .fmask 0x,0 .setnoreorder .setnomacro andi$5,$5,0x # b, b andi$4,$4,0x # a, a mult$5,$4# b, a mflo$2 # dz li $3,32768# 0x8000 # tmp209, addu$2,$2,$3 # dz, dz, tmp209 beq $6,$0,.L5#, satp,, srl $2,$2,16 # dz, dz, addiu $sp,$sp,-24 #,, sw $31,20($sp) #, jal __saturate1uhq # move$4,$2#, dz lw $31,20($sp) #, addiu $sp,$sp,24 #,, .L5: j $31 nop ... .ident "GCC: 4.7.0" -8< It looks as optimal as it gets... Unfortunately, when enabling -fexpensive-optimizations the code get really bad: -8< ... # options passed: u1.c -mno-mips16 -O1 -march=mips1 -fdump-tree-all # -fdump-ipa-all -ftree-vectorizer-verbose=9 -fdump-rtl-all -fverbose-asm # -frandom-seed=0 -O1 -msoft-float -fexpensive-optimizations ... __mulhelperuhq: .frame $sp,32,$31 # vars= 8, regs= 1/0, args= 16, gp= 0 ... addiu $sp,$sp,-32 # <<< set up stack frame sw $31,28($sp) # <<< save link reg andi$5,$5,0x # b, b andi$4,$4,0x # a, a mult$5,$4# b, a mflo$2 # <<< move from mdlo sw $2,16($sp) # <<< store mdlo on the stack li $2,32768 mflo$3 # <<< move from mdlo again! addu$2,$3,$2 # dz,, tmp209 beq $6,$0,.L2#, satp,, srl $2,$2,16 # dz, dz, jal __saturate1uhq move$4,$2#, dz .L2: lw $31,28($sp) nop j $31 addiu $sp,$sp,32 ... -8< Here two additional instructions, to get mdlo and store it on the stack, has been added. Notice how the valid mdlo value is overwritten and then immediately reloaded and how 16($sp) is never actually used: mflo$2 # <<< move from mdlo sw $2,16($sp) # <<< store mdlo on the stack li $2,32768 mflo$3 # <<< move from mdlo again! 
The problem seem to originate from the ira pass find_costs_and_classes() (ira-costs.c) when the second pass fails to find something better than pass one. One reason for this to happen could be because the way mflo is penaltizied: -8< static int mips_move_to_gpr_cost (enum machine_mode mode ATTRIBUTE_UNUSED, reg_class_t from) { switch (from) { case G
Re: Attempting changes to the GIMPLifier
Hi Richard,

>> 1. Printing global variables.
>
> Look at the cgraph (.000i.cgraph) dump.
>
>> 2. Preserving function arguments (what I call an "interface").
>
> I think we do that now.

Thank you very much! I'll grab the latest release and have a look.

Best regards,
Nikolaos Kavvadias

>> Both 1 and 2 are not currently addressed, at least in the gcc-4.5.1 and
>> gcc-4.6.0 gimplifiers that I work with.
>>
>> Is this information available in internal data structures so I can expose it
>> via use of the GIMPLE API?
>>
>> I've also noticed inconsistencies among GIMPLE dumps produced following
>> different optimizations, but this is another topic.
>>
>> Thanks in advance.
>>
>> Best regards,
>> Nikolaos Kavvadias
Re: A case where PHI-OPT pessimizes the code
On Mon, Apr 23, 2012 at 4:43 PM, Alan Modra wrote: > On Mon, Apr 23, 2012 at 02:50:13PM +0200, Steven Bosscher wrote: >> csui = (ONEUL << a); >> b = ((csui & cst) != 0); >> if (b) >> return 1; >> else >> return 0; > > We (powerpc) would be much better if this were > > csui = (ONEUL << a); > return (csui & cst) >> a; > > Other targets would probably benefit too. Yes, this has been discussed before. See here: http://gcc.gnu.org/ml/gcc-patches/2003-01/msg01791.html http://gcc.gnu.org/ml/gcc-patches/2003-01/msg01950.html However, like Roger, I would prefer to not implement this right now. I only want to port the code from stmt.c to GIMPLE, at least initially. Later on, we could look at different code generation approaches for this kind of switch() statement. Ciao! Steven
Re: target specific builtin expansion (middle end and back end definition inconsistency problem?).
Feng LI writes:

> Hi Ian,
>
> 2012/4/22 Ian Lance Taylor :
>> Feng LI writes:
>>
>>> Yes, you are right. But how could I reference to a backend defined builtin
>>> function in the middle end (I need to generate the builtin function in the
>>> middle end and expand it in x86 backend)?
>>
>> If the function doesn't have a machine-independent definition, then use
>> a target hook.
>
> Then I removed the duplicate builtin definition in the x86 backend.
> I defined the builtin function with built_in_class as BUILT_IN_MD in
> builtins.def.

Sorry, I meant use a target hook to actually generate the call
expression.  The target hook can refer to the target-specific builtin
function.

Ian
Re: A case where PHI-OPT pessimizes the code
On Mon, Apr 23, 2012 at 2:50 PM, Steven Bosscher wrote: > I will file a PR for this later today, maybe after trying on a few > other targets to see if this is a middle-end problem or a target > issue. This is now PR target/53087 (http://gcc.gnu.org/PR53087). Actually the poor code looks to be coming from the &-operation, not from the _Bool->int conversion. But I don't know enough powerpc-speak to be sure... Ciao! Steven
Re: target specific builtin expansion (middle end and back end definition inconsistency problem?).
Hi Ian,

2012/4/23 Ian Lance Taylor :
> Feng LI writes:
>
>> Hi Ian,
>>
>> 2012/4/22 Ian Lance Taylor :
>>> Feng LI writes:
>>>
>>>> Yes, you are right. But how could I reference to a backend defined builtin
>>>> function in the middle end (I need to generate the builtin function in the
>>>> middle end and expand it in x86 backend)?
>>>
>>> If the function doesn't have a machine-independent definition, then use
>>> a target hook.
>>
>> Then I removed the duplicate builtin definition in the x86 backend.
>> I defined the builtin function with built_in_class as BUILT_IN_MD in
>> builtins.def.
>
> Sorry, I meant use a target hook to actually generate the call
> expression.  The target hook can refer to the target-specific builtin
> function.

Just for confirmation: do you mean calling this hook,

  targetm.builtin_decl (unsigned code, bool initialized_p)

in the middle end to get the builtin definition from the backend?

Probably I'm asking a silly question, but when are the backend builtin
functions initialized?  I'm referring to the builtin in the GCC middle end,
near the OpenMP expansion pass (omp-low.c).

Thanks,
Feng

> Ian
Re: Failed access check
"Peter A. Felvegi" writes: > Should I file a bug report? Yes, please. Thanks. Ian
Re: target specific builtin expansion (middle end and back end definition inconsistency problem?).
Feng LI writes:

>>>>> Yes, you are right. But how could I reference to a backend defined builtin
>>>>> function in the middle end (I need to generate the builtin function in the
>>>>> middle end and expand it in x86 backend)?
>>>>
>>>> If the function doesn't have a machine-independent definition, then use
>>>> a target hook.
>>>
>>> Then I removed the duplicate builtin definition in the x86 backend.
>>> I defined the builtin function with built_in_class as BUILT_IN_MD in
>>> builtins.def.
>>
>> Sorry, I meant use a target hook to actually generate the call
>> expression.  The target hook can refer to the target-specific builtin
>> function.
>
> Just for confirmation: do you mean calling this hook,
>
>   targetm.builtin_decl (unsigned code, bool initialized_p)
>
> in the middle end to get the builtin definition from the backend?
>
> Probably I'm asking a silly question, but when are the backend builtin
> functions initialized?  I'm referring to the builtin in the GCC middle end,
> near the OpenMP expansion pass (omp-low.c).

No, I mean adding a new target hook build_my_magic_call and calling
that.  That target hook would build a call to the function.

You haven't really described the background, so I suppose I don't know
if this is appropriate.  It's not the right approach if you want to
contribute this back to GCC mainline, but then of course GCC mainline
also doesn't want a target-specific function in builtins.def.

Ian
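In case it is useful, such a hook might look roughly as follows; everything
here is a sketch, and the hook name build_tcreate_call (standing in for
Ian's placeholder build_my_magic_call) as well as IX86_BUILTIN_TCREATE are
illustrative, not existing GCC interfaces:

  /* In target.def (sketch): declare the hook with a NULL default, so the
     middle end can detect targets that do not provide the builtin.

     DEFHOOK
     (build_tcreate_call,
      "Build a CALL_EXPR to this target's tcreate builtin.",
      tree, (tree arg),
      NULL)  */

  /* In config/i386/i386.c (sketch): build the call using the decl the
     backend registered for its own IX86_BUILTIN_TCREATE code.  */
  static tree
  ix86_build_tcreate_call (tree arg)
  {
    tree fndecl = ix86_builtins[(int) IX86_BUILTIN_TCREATE];
    return build_call_expr (fndecl, 1, arg);
  }

  #undef TARGET_BUILD_TCREATE_CALL
  #define TARGET_BUILD_TCREATE_CALL ix86_build_tcreate_call

  /* In the middle end (e.g. near the OpenMP lowering in omp-low.c), the
     call is generated through the hook rather than through builtins.def:  */
  if (targetm.build_tcreate_call)
    {
      tree call = targetm.build_tcreate_call (arg);
      /* ...gimplify and insert the call as needed...  */
    }

This keeps the builtin's DECL_FUNCTION_CODE entirely within the backend's
enum ix86_builtins, so the clash with enum built_in_function described at
the start of the thread does not arise.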
avr-size -C --format=avr options
I've previously used WinAVR and their port of gcc has an 'enhancement' in the avr-size which is option '-C' or '--format=avr' which also needs you to pass the chip type in '--mmcu=atmega328p' for example. This shows the total amount of flash required (ie code + data), and RAM (ie data + bss) and compares these against the abilities of the chip you have specified. This is very useful in build scripts to fail code that is too big to upload to the processor. I'm trying to create a newer version of the AVR toolchain as WinAVR is now quite old (January 2010!) - but need to work around this issue. I'm not sure if it was functionality the WinAVR folk added themselves or whether it's a 'no longer supported' option in gcc or if it originates somewhere else! Anyone know?
Re: RFC: Add STB_GNU_SECONDARY
On Sat, Apr 21, 2012 at 12:01 PM, Joern Rennecke wrote:
> Quoting "H.J. Lu" :
>
>> Putting our own foo in a section with a special prefix in section name,
>> like .secondary_*, works with linker support. But it isn't very reliable.
>
> In what way is requiring linker support for STB_GNU_SECONDARY more reliable
> than requiring linker support for .secondary_* sections?

It may conflict with the section __attribute__ in C source and the section
directive in assembly code.

--
H.J.
Re: Failed access check
On 23 April 2012 18:48, Ian Lance Taylor wrote: > "Peter A. Felvegi" writes: > >> Should I file a bug report? > > Yes, please. Thanks. Please check it's not http://gcc.gnu.org/bugzilla/show_bug.cgi?id=24926 first
Re: target specific builtin expansion (middle end and back end definition inconsistency problem?).
On Mon, Apr 23, 2012 at 8:03 PM, Ian Lance Taylor wrote:
> Feng LI writes:
>
>>>>>> Yes, you are right. But how could I reference to a backend defined builtin
>>>>>> function in the middle end (I need to generate the builtin function in the
>>>>>> middle end and expand it in x86 backend)?
>>>>>
>>>>> If the function doesn't have a machine-independent definition, then use
>>>>> a target hook.
>>>>
>>>> Then I removed the duplicate builtin definition in the x86 backend.
>>>> I defined the builtin function with built_in_class as BUILT_IN_MD in
>>>> builtins.def.
>>>
>>> Sorry, I meant use a target hook to actually generate the call
>>> expression.  The target hook can refer to the target-specific builtin
>>> function.
>>
>> Just for confirmation: do you mean calling this hook,
>>
>>   targetm.builtin_decl (unsigned code, bool initialized_p)
>>
>> in the middle end to get the builtin definition from the backend?
>>
>> Probably I'm asking a silly question, but when are the backend builtin
>> functions initialized?  I'm referring to the builtin in the GCC middle end,
>> near the OpenMP expansion pass (omp-low.c).
>
> No, I mean adding a new target hook build_my_magic_call and calling
> that.  That target hook would build a call to the function.
>
> You haven't really described the background, so I suppose I don't know
> if this is appropriate.  It's not the right approach if you want to
> contribute this back to GCC mainline, but then of course GCC mainline
> also doesn't want a target-specific function in builtins.def.

Just to add my 2 cents - target-specific builtins either cover a generic
concept (and thus can be created by the middle end by involving a target
hook), or they are completely target specific, in which case they are
_not_ created by the middle end at all.

Richard.

> Ian
Re: avr-size -C --format=avr options
Clive Webster wrote:

> I've previously used WinAVR and their port of gcc has an 'enhancement' in
> the avr-size which is option '-C' or '--format=avr' which also needs you to

Please notice that this mailing list is about GCC development, not about
binutils.  size is not a part of GCC.  size is part of GNU binutils.

> pass the chip type in '--mmcu=atmega328p' for example. This shows the total
> amount of flash required (ie code + data), and RAM (ie data + bss) and
> compares these against the abilities of the chip you have specified. This
> is very useful in build scripts to fail code that is too big to upload to
> the processor.
>
> I'm trying to create a newer version of the AVR toolchain as WinAVR is now
> quite old (January 2010!) - but need to work around this issue. I'm not
> sure if it was functionality the WinAVR folk added themselves or whether
> it's a 'no longer supported' option in gcc or if it originates somewhere
> else!

For patches applied to the tools shipped with WinAVR, see the respective
patches in your WinAVR distribution in the folder ./source

For a list of files contained in your WinAVR distribution, read
./WinAVR-manifest.log

For links to the projects, see the files ./source/SOURCE and
WinAVR-user-manual.txt in your WinAVR distribution.

Johann
Re: A case where PHI-OPT pessimizes the code
On Mon, Apr 23, 2012 at 06:07:52PM +0200, Steven Bosscher wrote: > On Mon, Apr 23, 2012 at 4:43 PM, Alan Modra wrote: > > On Mon, Apr 23, 2012 at 02:50:13PM +0200, Steven Bosscher wrote: > >> csui = (ONEUL << a); > >> b = ((csui & cst) != 0); > >> if (b) > >> return 1; > >> else > >> return 0; > > > > We (powerpc) would be much better if this were > > > > csui = (ONEUL << a); > > return (csui & cst) >> a; > > > > Other targets would probably benefit too. > > Yes, this has been discussed before. See here: > > http://gcc.gnu.org/ml/gcc-patches/2003-01/msg01791.html > http://gcc.gnu.org/ml/gcc-patches/2003-01/msg01950.html I'm suggesting something slightly different to either of these. I realize it's probably not that easy to implement, and is really outside the scope of the switch statement code you're working on, but it would be nice if we could avoid the comparison. On high end powerpc machines, int -> cc -> int costs the equivalent of many operations just on int. (In the powerpc code you showed, the comparison is folded into the AND, emitted as "and.", the move from cc is "mfcr; rlwinm; xori". "and." isn't cheap and "mfcr" is relatively expensive.) -- Alan Modra Australia Development Lab, IBM