RFC: SMS problem with emit_copy_of_insn_after copying REG_NOTEs
Hello all,

I'm preparing and testing an SMS correction/improvements patch, and while testing it on the SPU with the vectorizer testcases I've got an ICE in the "gcc_assert ( MAX_RECOG_OPERANDS - i)" in function copy_insn_1 in emit-rtl.c. The call traces back to the loop versioning called in modulo-sched.c before the SMSing actually starts. The specific instruction it tries to copy when it fails is

(insn 32 31 33 4 (parallel [
            (set (reg:SI 162)
                (div:SI (reg:SI 164)
                    (reg:SI 156)))
            (set (reg:SI 163)
                (mod:SI (reg:SI 164)
                    (reg:SI 156)))
            (clobber (scratch:SI))
            (clobber (scratch:SI))
            (clobber (scratch:SI))
            (clobber (scratch:SI))
            (clobber (scratch:SI))
            (clobber (scratch:SI))
            (clobber (scratch:SI))
            (clobber (scratch:SI))
            (clobber (scratch:SI))
            (clobber (reg:SI 130 hbr))
        ]) 129 {divmodsi4} (insn_list:REG_DEP_TRUE 30 (insn_list:REG_DEP_TRUE 31 (nil)))
    (expr_list:REG_DEAD (reg:SI 164)
    (expr_list:REG_DEAD (reg:SI 156)
    (expr_list:REG_UNUSED (reg:SI 130 hbr)
    (expr_list:REG_UNUSED (scratch:SI)
    (expr_list:REG_UNUSED (scratch:SI)
    (expr_list:REG_UNUSED (scratch:SI)
    (expr_list:REG_UNUSED (scratch:SI)
    (expr_list:REG_UNUSED (scratch:SI)
    (expr_list:REG_UNUSED (scratch:SI)
    (expr_list:REG_UNUSED (scratch:SI)
    (expr_list:REG_UNUSED (scratch:SI)
    (expr_list:REG_UNUSED (scratch:SI)
    (expr_list:REG_UNUSED (reg:SI 163)
    (nil)))

The error happens in the first call to copy_insn_1 in the loop below (copied from emit_copy_of_insn_after in emit-rtl.c):

  for (link = REG_NOTES (insn); link; link = XEXP (link, 1))
    if (REG_NOTE_KIND (link) != REG_LABEL)
      {
        if (GET_CODE (link) == EXPR_LIST)
          REG_NOTES (new)
            = copy_insn_1 (gen_rtx_EXPR_LIST (REG_NOTE_KIND (link),
                                              XEXP (link, 0),
                                              REG_NOTES (new)));
        else
          REG_NOTES (new)
            = copy_insn_1 (gen_rtx_INSN_LIST (REG_NOTE_KIND (link),
                                              XEXP (link, 0),
                                              REG_NOTES (new)));
      }

Tracing the execution of copy_insn_1, it seems that it goes over the same REG_NOTES many times (it seems to be a quadratic time complexity algorithm). This causes "copy_insn_n_scratches++" to be executed more times than there are SCRATCH registers (and even REG_NOTES), leading to the failure in the assert. There are 9 SCRATCH registers used in the instruction and MAX_RECOG_OPERANDS is 30 for the SPU.

Since copy_insn_n_scratches is initialized in copy_insn and since we go over the REG_NOTES over and over again, I've replaced the two calls to copy_insn_1 in the loop above with calls to copy_insn. This caused the ICEs in the testsuite to disappear.

I wonder whether this constitutes a legitimate fix or whether I'm missing something?

Thanks in advance,
Vladimir
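The quadratic growth described above can be reproduced with a small stand-alone model. This is only a sketch under stated assumptions: `struct note`, `cons`, `deep_copy` and the two `scratch_copies_*` functions are hypothetical stand-ins, not GCC's rtx API, and the model assumes that every SCRATCH visited during a chain copy bumps the counter (as the trace in the message suggests). It compares copying the whole chain built so far on every iteration with copying only the payload of the current note.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_NOTES 64
#define POOL_SIZE (MAX_NOTES * MAX_NOTES)

/* Hypothetical stand-in for one EXPR_LIST note; not GCC's rtx.  */
struct note
{
  int is_scratch;       /* 1 if the payload models a SCRATCH.  */
  struct note *next;    /* Models XEXP (link, 1).  */
};

static struct note pool[POOL_SIZE];
static size_t pool_used;
static int n_scratches; /* Models copy_insn_n_scratches.  */

static struct note *
cons (int is_scratch, struct note *next)
{
  assert (pool_used < POOL_SIZE);
  struct note *n = &pool[pool_used++];
  n->is_scratch = is_scratch;
  n->next = next;
  return n;
}

/* Models a recursive copy of a whole chain: every node is copied and
   every SCRATCH payload bumps the counter (model assumption).  */
static struct note *
deep_copy (const struct note *chain)
{
  if (chain == NULL)
    return NULL;
  if (chain->is_scratch)
    n_scratches++;
  return cons (chain->is_scratch, deep_copy (chain->next));
}

static struct note *
build_scratch_notes (int n)
{
  struct note *head = NULL;
  for (int i = 0; i < n; i++)
    head = cons (1, head);
  return head;
}

/* Original loop shape: wrap the new note around the chain copied so
   far, then copy the whole thing again.  */
int
scratch_copies_old (int n)
{
  assert (n >= 0 && n <= MAX_NOTES);
  pool_used = 0;
  n_scratches = 0;
  struct note *copy = NULL;
  for (const struct note *link = build_scratch_notes (n); link;
       link = link->next)
    copy = deep_copy (cons (link->is_scratch, copy));
  return n_scratches;
}

/* Loop shape that copies only the payload of the current note.  */
int
scratch_copies_fixed (int n)
{
  assert (n >= 0 && n <= MAX_NOTES);
  pool_used = 0;
  n_scratches = 0;
  struct note *copy = NULL;
  for (const struct note *link = build_scratch_notes (n); link;
       link = link->next)
    {
      if (link->is_scratch)
        n_scratches++;  /* One payload copy per note.  */
      copy = cons (link->is_scratch, copy);
    }
  return n_scratches;
}
```

In this model, nine SCRATCH notes give 1 + 2 + ... + 9 = 45 counter increments with the original loop shape, which is already above the limit of 30 mentioned for the SPU, and 9 increments when only the payload is copied. The real instruction also carries non-SCRATCH notes, so the exact count in GCC may differ.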
Re: RFC: SMS problem with emit_copy_of_insn_after copying REG_NOTEs
Hi, Jan,

Thanks for the fast response!

I've tested the change you proposed, and we still fail the assert that checks whether the number of SCRATCHes is too large (>30) while copying the REG_NOTES of the instruction from my original message, which uses just 9 SCRATCH registers.

Thanks,
Vladimir

On 12/18/06, Jan Hubicka <[EMAIL PROTECTED]> wrote:
> The error happens in the first call to copy_insn_1 in the loop below
> (copied from emit_copy_of_insn_after from emit_rtl.c):
> [...]

Thanks for sending the updated patch, I will try to look across it tomorrow.

> Tracing the execution of copy_insn_1, it seems that it goes over the
> same REG_NOTES many times (it seems to be a quadratic time complexity
> algorithm). This causes "copy_insn_n_scratches++" to be executed more
> times than there are SCRATCH registers (and even REG_NOTES) leading to
> the failure in the assert. There are 9 SCRATCH registers used in the
> instruction and MAX_RECOG_OPERANDS is 30 for the SPU.
>
> Since copy_insn_n_scratches is initialized in copy_insn and since we
> go over regnotes over and over again, I've modified in the loop
> above the two calls to copy_insn_1 with the calls to copy_insn. This
> caused the ICEs in the testsuite to disappear.
>
> I wonder if this constitutes a legitimate fix or I'm missing something?

I believe you really want to avoid a quadratic amount of work. This is probably best done by

  REG_NOTES (new)
    = gen_rtx_EXPR_LIST (REG_NOTE_KIND (link),
                         copy_insn_1 (XEXP (link, 0)),
                         REG_NOTES (new));

so that copy_insn_1 doesn't recursively descend into the already copied chain.

Honza
Re: RFC: SMS problem with emit_copy_of_insn_after copying REG_NOTEs
Hi,

I've rebuilt everything from scratch again with the changes to emit_copy_of_insn_after that Jan suggested (see the patch below), and the ICE caused by the quadratic accumulation of the scratch register counter is gone!

Thanks,
Vladimir

Index: emit-rtl.c
===================================================================
--- emit-rtl.c  (revision 120004)
+++ emit-rtl.c  (working copy)
@@ -5296,16 +5306,16 @@
     if (REG_NOTE_KIND (link) != REG_LABEL)
       {
         if (GET_CODE (link) == EXPR_LIST)
-          REG_NOTES (new)
-            = copy_insn_1 (gen_rtx_EXPR_LIST (REG_NOTE_KIND (link),
-                                              XEXP (link, 0),
-                                              REG_NOTES (new)));
+
+          REG_NOTES (new)
+            = gen_rtx_EXPR_LIST (REG_NOTE_KIND (link),
+                        copy_insn_1 (XEXP (link, 0)),  REG_NOTES (new));
         else
-          REG_NOTES (new)
-            = copy_insn_1 (gen_rtx_INSN_LIST (REG_NOTE_KIND (link),
-                                              XEXP (link, 0),
-                                              REG_NOTES (new)));
-      }
+          REG_NOTES (new)
+            = gen_rtx_INSN_LIST (REG_NOTE_KIND (link),
+                        copy_insn_1 (XEXP (link, 0)),  REG_NOTES (new));
+
+      }

   /* Fix the libcall sequences.  */
   if ((note1 = find_reg_note (new, REG_RETVAL, NULL_RTX)) != NULL)

        if (GET_CODE (link) == EXPR_LIST)
#if 1
          REG_NOTES (new)
            = gen_rtx_EXPR_LIST (REG_NOTE_KIND (link),
                        copy_insn_1 (XEXP (link, 0)),  REG_NOTES (new));
#endif
#if 0
          REG_NOTES (new)
            = copy_insn_1 (gen_rtx_EXPR_LIST (REG_NOTE_KIND (link),
                                              XEXP (link, 0),
                                              REG_NOTES (new)));
#endif
        else
#if 0
          REG_NOTES (new)
            = copy_insn_1 (gen_rtx_INSN_LIST (REG_NOTE_KIND (link),
                                              XEXP (link, 0),
                                              REG_NOTES (new)));
#endif
#if 1
          REG_NOTES (new)
            = gen_rtx_INSN_LIST (REG_NOTE_KIND (link),
                        copy_insn_1 (XEXP (link, 0)),  REG_NOTES (new));
#endif
      }

On 12/19/06, Jan Hubicka <[EMAIL PROTECTED]> wrote:
> Hi, Jan,
> Thanks for fast response!
>
> I've tested the change you proposed and we still failed in the assert
> checking that the number of SCRATCHes being too large (>30) while
> copying the REG_NOTES of the instruction (see below) using just 9
> SCRATCH registers.

Hi,
apparently there seems to be another reason copy_insn_1 can do a quadratic amount of work besides this one; I don't seem to be able to see any, however. Just to be sure, did you update both cases of wrong recursion, the EXPR_LIST one I sent and the INSN_LIST hunk just below it? Otherwise, adding a breakpoint on copy_insn_1 and seeing how it manages to do so many recursions will surely help :)

Honza
Re: RFC: SMS problem with emit_copy_of_insn_after copying REG_NOTEs
Hi,

Sorry for possibly causing confusion. I had tested the patch on my ICE testcase and bootstrapped with --enable-languages=c, but didn't run the full bootstrap. I'm now bootstrapping Andrew's latest patch on ppc-linux and testing it on SPU.

Vladimir

On 12/30/06, Jan Hubicka <[EMAIL PROTECTED]> wrote:
Hi,
thanks for testing. I've bootstrapped/regtested this variant of the patch and committed it as obvious.

Honza

2006-12-30  Jan Hubicka  <[EMAIL PROTECTED]>
            Vladimir Yanovsky  <[EMAIL PROTECTED]>

        * emit-rtl.c (emit_copy_of_insn_after): Fix bug causing exponential
        amount of copies of INSN_NOTEs list.

Index: emit-rtl.c
===================================================================
--- emit-rtl.c  (revision 120274)
+++ emit-rtl.c  (working copy)
@@ -5297,14 +5297,12 @@ emit_copy_of_insn_after (rtx insn, rtx a
         {
           if (GET_CODE (link) == EXPR_LIST)
             REG_NOTES (new)
-                = copy_insn_1 (gen_rtx_EXPR_LIST (REG_NOTE_KIND (link),
-                                                  XEXP (link, 0),
-                                                  REG_NOTES (new)));
+                = gen_rtx_EXPR_LIST (REG_NOTE_KIND (link),
+                  copy_insn_1 (XEXP (link, 0)),  REG_NOTES (new));
           else
             REG_NOTES (new)
-                = copy_insn_1 (gen_rtx_INSN_LIST (REG_NOTE_KIND (link),
-                                                  XEXP (link, 0),
-                                                  REG_NOTES (new)));
+                = gen_rtx_INSN_LIST (REG_NOTE_KIND (link),
+                  copy_insn_1 (XEXP (link, 0)),  REG_NOTES (new));
         }

   /* Fix the libcall sequences.  */
Re: RFC: SMS problem with emit_copy_of_insn_after copying REG_NOTEs
I've bootstrapped C/C++/Fortran OK on PPC. make check-gcc is running now.

Thanks,
Vladimir

On 1/1/07, Jan Hubicka <[EMAIL PROTECTED]> wrote:
> Hi,
> Sorry for possibly causing confusion. I had tested the patch on my ICE
> testcase and bootstrapped for -enable-languages=C, but didn't run the
> full bootstrap. Bootstrapping the latest Andrew's patch on ppc-linux
> and testing it on SPU.

Vladimir,
I bootstrapped/regtested the patch myself on i686 before committing it, so the rule was met here. Unfortunately i686 doesn't seem to show the regression.

I've bootstrapped/regtested x86_64 and i686 with Andrew's patch and it all works fine.

Honza
Re: RFC: SMS problem with emit_copy_of_insn_after copying REG_NOTEs
The testing of the committed patch on PPC-linux has produced no regressions relative to the state before the bootstrap breakage. The same holds for Andrew's version of the patch. The 21 testsuite failures on PPC-linux that were introduced together with the bootstrap problem have disappeared with this commit.

Vladimir

On 1/1/07, Jan Hubicka <[EMAIL PROTECTED]> wrote:
Hi,
I've committed the following patch that fixes the obvious problem of calling copy_insn_1 for an INSN_LIST argument. It seems to solve the problems I can reproduce, and it bootstraps on x86_64-linux/i686-linux and Darwin (thanks to andreast). The patch was preapproved by Ian. This is meant as a fast fix for the bootstrap failure. Andrew's optimization still makes sense as a micro-optimization, and the nested libcall issue probably ought to be resolved, but both can be dealt with incrementally.

My apologies for the problems.

Honza

Index: ChangeLog
===================================================================
--- ChangeLog   (revision 120315)
+++ ChangeLog   (working copy)
@@ -1,3 +1,8 @@
+2007-01-01  Jan Hubicka  <[EMAIL PROTECTED]>
+
+       * emit-rtl.c (emit_copy_of_insn_after): Do not call copy_insn_1 for
+       INSN_LIST.
+
 2007-01-01  Mike Stump  <[EMAIL PROTECTED]>

        * configure.ac (HAVE_GAS_LITERAL16): Add autoconf check for
Index: emit-rtl.c
===================================================================
--- emit-rtl.c  (revision 120313)
+++ emit-rtl.c  (working copy)
@@ -5302,7 +5302,7 @@ emit_copy_of_insn_after (rtx insn, rtx a
           else
             REG_NOTES (new)
                 = gen_rtx_INSN_LIST (REG_NOTE_KIND (link),
-                  copy_insn_1 (XEXP (link, 0)),  REG_NOTES (new));
+                  XEXP (link, 0),  REG_NOTES (new));
         }

   /* Fix the libcall sequences.  */
A problem with the loop structure
Hi all,

I would greatly appreciate any suggestions regarding the following problem I have with the loop structure. I am working on Swing Modulo Scheduling with the Sony SDK for SPU (based on gcc 4.1.1). Below are three observations describing the problem.

Thanks a lot,
Vladimir

1. The problem was unveiled by compiling a testcase with the dump turned on. The compilation failed while calling function get_loop_body from flow_loop_dump on the following assert:

  else if (loop->latch != loop->header)
    {
      tv = dfs_enumerate_from (loop->latch, 1, glb_enum_p,
                               tovisit + 1, loop->num_nodes - 1,
                               loop->header) + 1;

      gcc_assert (tv == loop->num_nodes);

The compilation finishes successfully if the dump is not enabled.

2. The SMS pass contained a single call to loop_version on the loop to be SMSed. This happened before any SMS-related work was done. Calling verify_loop_structure(loops) just after the call to loop_version failed on the same assert in get_loop_body as in (1). The loop on which we fail is neither the versioned loop nor the new loop. Below are the results of verify_loop_structure called from different places in loop_version:

(gdb) n
1466      first_head = entry->dest;
(gdb) p verify_loop_structure(loops)
$12 = void
(gdb) n
1469      if (!cfg_hook_duplicate_loop_to_header_edge (loop, entry, loops, 1,
(gdb) p verify_loop_structure(loops)
$13 = void
(gdb) n
1475      second_head = entry->dest;
(gdb) p verify_loop_structure(loops)
bmark_lite.c: In function 't_run_test':
bmark_lite.c:1225: error: loop 7's header does not have exactly 2 entries

Breakpoint 1, fancy_abort (
    file=0x884008 "/Develop/sony/build/toolchain/gcc/gcc/cfgloop.c", line=1277,
    function=0x8841d0 "verify_loop_structure")
    at /Develop/sony/build/toolchain/gcc/gcc/diagnostic.c:602
602       internal_error ("in %s, at %s:%d", function, trim_filename (file), line);

3. At the very beginning of the SMS pass we build the loop structure using build_loops_structure defined in modulo-sched.c. Just after the call I tried to print in gdb the loop on which we failed in get_loop_body. This failed as well:

(gdb) p print_loop(dumpfile, 0xbabe20, 0)
No symbol "dumpfile" in current context.
(gdb) p print_loop(stdout, 0xbabe20, 0)
loop_0
{
}
$1 = void
(gdb) p print_loop(stdout, 0xd42e20, 0)
    loop_7
    {
      bb_21 (preds = {bb_256 }, succs = {bb_23 bb_22 })
      {
      :;
        matrixA.770 = matrixA;
        temp.801 = *(matrixA.770 + (varsize * *) ivtmp.701 * 4B);
        temp.874 = temp.801 + pretmp.130;
        sum1_lsm.411 = *temp.874;
        col1_lsm.839 = (int) ivtmp.701;
        col1_lsm.837 = 0;
        if (col1_lsm.839 > 0) goto ; else (void) 0;
      }
      bb_23 (preds = {bb_21 }, succs = {bb_262 })
      {
      :;
        ivtmp.694 = 0;
      }
      bb_262 (preds = {bb_23 }, succs = {bb_264 bb_263 })
      {

Breakpoint 1, fancy_abort (
    file=0x86e980 "/Develop/sony/build/toolchain/gcc/gcc/tree-flow-inline.h",
    line=722, function=0x86e9b9 "bsi_start")
    at /Develop/sony/build/toolchain/gcc/gcc/diagnostic.c:602
602       internal_error ("in %s, at %s:%d", function, trim_filename (file), line);

The failure was on the assert in line 722 (please find it below):

(gdb) up
#1  0x00469d80 in bsi_start (bb=0x2ebc0100)
    at /Develop/sony/build/toolchain/gcc/gcc/tree-flow-inline.h:722
722           gcc_assert (bb->index < 0);
(gdb) l
717       block_stmt_iterator bsi;
718       if (bb->stmt_list)
719         bsi.tsi = tsi_start (bb->stmt_list);
720       else
721         {
722           gcc_assert (bb->index < 0);
723           bsi.tsi.ptr = NULL;
724           bsi.tsi.container = NULL;
725         }
726       bsi.bb = bb;
(gdb)
Re: A problem with the loop structure
Hi,

Thanks a lot for your help and suggestions! Below I attach some more observations. I would be grateful for any more ideas on what can be wrong here.

Thanks a lot,
Vladimir

---

The problem happens because the num_nodes of the outer loop of the versioned loop is larger than what is reported by the DFS traversal.

i) The loops are:

1) Before versioning:

;; Loop 7:  // Outer loop
;;  header 256, latch 24
;;  depth 4, level 2, outer 6
;;  nodes: 256 24 23 21
;;
;; Loop 8:  // To be versioned
;;  header 24, latch 24
;;  depth 5, level 1, outer 7
;;  nodes: 24

(gdb) p loop->num_nodes
$221 = 2
(gdb) p loop->num
$222 = 8

2) After versioning (loop 7 printed with the assert (dfs_result == num_nodes) suppressed):

;; Loop 7:
;;  header 256, latch 266
;;  depth 4, level 2, outer 6
;;  nodes: 256 266 24 265 259 263 262 23 21
;;
;; Loop 8:
;;  header 24, latch 259
;;  depth 5, level 1, outer 7
;;  nodes: 24 259
;; Loop 54:
;;  header 260, latch 261
;;  depth 5, level 1, outer 7
;;  nodes: 260 261

ii) In loop_version there are two calls to loop_split_edge_with:

  1. loop_split_edge_with (loop_preheader_edge (loop), NULL);
  2. loop_split_edge_with (loop_preheader_edge (nloop), NULL);

nloop is the versioned loop, loop is the original.

loop_split_edge_with has the following:

  new_bb = split_edge (e);
  add_bb_to_loop (new_bb, loop_c);

1) When we get to the first call, nloop->outer->num_nodes = 8 while dfs returns 6. After the first call nloop->outer->num_nodes = 9 and dfs returns 7, so it seems that add_bb_to_loop performed OK in this case.

Here is the dump of new_bb in the first call,
loop_split_edge_with (edge e, rtx insns) -> add_bb_to_loop (new_bb, loop_c);

Correct result:

(gdb) p debug_bb_n(new_bb->index)
;; basic block 263, loop depth 0, count 0
;; prev block 262, next block 24
;; pred:       262
;; succ:       24 [100.0%]  (fallthru)
;; Registers live at start:  1 [$sp] 127 [$127] 128 [$vfp] 129 [$vap] 587 607 626 628 629 635 672 733 735 792 846 897 1432 1437 1438 1440 1450 1453 1456 1560 1561 1676
(code_label 5826 5824 5825 263 418 "" [1 uses])
(note 5825 5826 710 263 [bb 263] NOTE_INSN_BASIC_BLOCK)
;; Registers live at end:  1 [$sp] 127 [$127] 128 [$vfp] 129 [$vap] 587 607 626 628 629 635 672 733 735 792 846 897 1432 1437 1438 1440 1450 1453 1456 1560 1561 1676

2) Now, the second call to loop_split_edge_with (edge e, rtx insns) -> add_bb_to_loop (new_bb, loop_c) results in nloop->outer->num_nodes = 10 while dfs still returns 7. Printing new_bb, we see that both its pred and succ edges are fallthru.

(gdb) p debug_bb_n(new_bb->index)
;; basic block 264, loop depth 0, count 0
;; prev block 262, next block 263
;; pred:       262 [100.0%]  (fallthru)
;; succ:       260 [100.0%]  (fallthru)
;; Registers live at start:  1 [$sp] 127 [$127] 128 [$vfp] 129 [$vap] 587 607 626 628 629 635 672 733 735 792 846 897 1432 1437 1438 1440 1450 1453 1456 1560 1561 1676
(note 5827 5824 5826 264 [bb 264] NOTE_INSN_BASIC_BLOCK)
;; Registers live at end:  1 [$sp] 127 [$127] 128 [$vfp] 129 [$vap] 587 607 626 628 629 635 672 733 735 792 846 897 1432 1437 1438 1440 1450 1453 1456 1560 1561 1676
$215 = (struct basic_block_def *) 0x2ebc0500

On 4/29/07, Zdenek Dvorak <[EMAIL PROTECTED]> wrote:
Hello,

> (based on gcc 4.1.1).

now that is a problem; things have changed a lot since then, so I am not sure how much I will be able to help.

> 1. The problem was unveiled by compiling a testcase with dump turned
> on. The compilation failed while calling function get_loop_body from
> flow_loop_dump on the following assert :
> [...]
> The compilation exits successfully if compiled without enabling the dump.

this means that there is some problem in some loop transformation, forgetting to record membership of some blocks to their loops or something like that.

> 2. SMS pass contained a single call to loop_version on the loop to be
> SMSed. This happened before any SMS related stuff was done. Trying to
> call verify_loop_structure(loops) just after the call to loop_version
> failed on the same assert in get_loop_body as in (1). The loop on
> which we fail is neither the versioned loop nor the new loop.

Probably it is their superloop?

> Below
> there are dumps to verify_loop_structure called from different places
> in loop_version:

These dumps are not very useful, loop structures do not have to be consistent in the middle of the transformation.

> 3. At the very beginning of the SMS pass we build the loop structure
> using build_loops_structure defined in modulo-sched.c. Just after the
> call I tried to print in gdb the loop on which we failed in
> get_loop_body. This failed as well
> [...]
[RFC] propagating loop dependences from trees to RTL (for SMS)
As a follow-up to http://gcc.gnu.org/ml/gcc/2005-04/msg00461.html, I would like to improve SMS by passing data dependence information computed at the tree level to the RTL-level SMS. Currently the data-dependence graph built for use by SMS has an edge for every pair of data references (i.e. it's too conservative). I want to check for every loop, using functions defined in tree-data-ref.c, whether there are data dependences in the loop. The problem is how to pass this information to SMS (note that we're only trying to convey whether there are no dependences at all in the loop, i.e. one bit of information).

The alternatives being considered are:

1. Introduce a new BB bit flag and set it for the header BB of a loop that has no data dependences. This approach already works, but only if the old loop optimizer (pass_loop_optimize) is disabled (otherwise the bit doesn't survive). One potential problem is that the loop header BB may change between the tree level and SMS as a result of some optimization pass (can that really happen?).

2. Use a bitmap (as suggested in http://gcc.gnu.org/ml/gcc-patches/2005-03/msg01353.html) that is indexed by the BB index. In my case I need to define and use the property within different functions. I can define a static function "set_and_check_nodeps(bb_loop_header)" and define a bitmap there. As with the previous solution, the problem that can arise is that some intermediate optimization changes the header of the loop. By the way, is it guaranteed that a BB keeps the same index throughout the entire compilation?

3. Use insn notes: introduce a new note "NOTE_INSN_NO_DEPS_IN_LOOP" to be inserted after the "NOTE_INSN_LOOP_BEG" for relevant loops.

4. Other ideas?

Thanks,
Vladimir
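Alternative 2 could look roughly like the following stand-alone sketch: one bit per basic-block index, set for the header of a loop that has no data dependences. All names here (`nodeps_set`, `nodeps_check`, `MAX_BB_INDEX`) are hypothetical, and the code deliberately uses a plain array of words instead of GCC's own bitmap type. It also keeps the bits at file scope rather than inside a single static function, which is a design choice of this sketch, not something the proposal prescribes.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_BB_INDEX 1024   /* Hypothetical upper bound on BB indices.  */
#define WORD_BITS (sizeof (unsigned long) * CHAR_BIT)
#define N_WORDS ((MAX_BB_INDEX + WORD_BITS - 1) / WORD_BITS)

/* One bit per basic-block index.  A set bit means "the loop headed by
   this block has no data dependences".  Zero-initialized, so every
   loop is conservatively assumed to have dependences.  */
static unsigned long nodeps_bits[N_WORDS];

/* Record that the loop whose header has index BB_INDEX is free of
   data dependences.  */
void
nodeps_set (int bb_index)
{
  assert (bb_index >= 0 && bb_index < MAX_BB_INDEX);
  size_t i = (size_t) bb_index;
  nodeps_bits[i / WORD_BITS] |= 1UL << (i % WORD_BITS);
}

/* Return true if the loop whose header has index BB_INDEX was
   recorded as free of data dependences.  */
bool
nodeps_check (int bb_index)
{
  assert (bb_index >= 0 && bb_index < MAX_BB_INDEX);
  size_t i = (size_t) bb_index;
  return ((nodeps_bits[i / WORD_BITS] >> (i % WORD_BITS)) & 1UL) != 0;
}
```

The sketch shares the weakness raised in the message: the bit is keyed by the header's index, so it is only meaningful if neither the loop header nor its index changes between the tree-level analysis and SMS.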
Re: A problem with the loop structure
Hi,

The ICE after loop versioning in SMS occurred because the header of the versioned loop was at the same time the latch of the outer loop. After the versioning, the nodes of the newly created loop could not be reached by a DFS traversal of the outer loop starting from its latch (the header of the versioned loop), leading to the ICE on the assert that the number of nodes reported by the DFS equals nloop->outer->num_nodes.

Solution (for the case of the call in SMS): call canon_loop(loop->outer) before the call to versioning in sms_schedule, so that a new empty latch is created for the outer loop.

Thanks,
Vladimir

On 5/4/07, Zdenek Dvorak <[EMAIL PROTECTED]> wrote:
Hello,

> ii)
> In loop_version there are two calls to loop_split_edge_with
> 1. loop_split_edge_with (loop_preheader_edge (loop), NULL);
> 2. loop_split_edge_with (loop_preheader_edge (nloop), NULL);
> nloop is the versioned loop, loop is the original.
>
> loop_split_edge_with has the following:
> new_bb = split_edge (e);
> add_bb_to_loop (new_bb, loop_c);
>
> 1) When we get to the fist call, nloop->outer->num_nodes = 8 while dfs
> returns 6.

then the problem is before this call; you need to check which two blocks that are marked as belonging to nloop->outer in fact do not belong to this loop, and why.

Zdenek
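The consistency check behind the failing assert can be modelled in a few lines. The sketch below is a simplified stand-in, not GCC's dfs_enumerate_from: `struct edge`, `count_loop_body` and the two example CFGs are hypothetical. It walks predecessor edges backwards from the latch, never passing through the header, and counts the blocks it reaches plus the header. A block that is recorded in num_nodes but cannot be reached this way makes the count fall short, which is the situation described in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_BLOCKS 16   /* Hypothetical bound on block indices.  */

struct edge { int src, dest; };

/* Count the blocks reachable backwards from LATCH without walking
   through HEADER, plus HEADER itself.  */
int
count_loop_body (const struct edge *edges, size_t n_edges,
                 int header, int latch)
{
  bool visited[MAX_BLOCKS] = { false };
  int stack[MAX_BLOCKS];
  int sp = 0;
  int count = 1;                /* The header itself.  */

  assert (header >= 0 && header < MAX_BLOCKS);
  assert (latch >= 0 && latch < MAX_BLOCKS);
  if (latch == header)
    return count;

  visited[header] = true;       /* Never expand the header.  */
  visited[latch] = true;
  stack[sp++] = latch;
  count++;

  while (sp > 0)
    {
      int bb = stack[--sp];
      for (size_t i = 0; i < n_edges; i++)
        {
          int src = edges[i].src;
          assert (src >= 0 && src < MAX_BLOCKS);
          if (edges[i].dest == bb && !visited[src])
            {
              visited[src] = true;  /* Each block is pushed once.  */
              stack[sp++] = src;
              count++;
            }
        }
    }
  return count;
}

/* 0 -> 1 (header) -> 2 -> 3 (latch) -> 1, plus 2 -> 4 where block 4
   has no path back to the latch.  */
int
example_unreachable_block (void)
{
  static const struct edge cfg[]
    = { {0, 1}, {1, 2}, {2, 3}, {3, 1}, {2, 4} };
  return count_loop_body (cfg, sizeof cfg / sizeof cfg[0], 1, 3);
}

/* Same CFG with 4 -> 3 added, so block 4 now reaches the latch.  */
int
example_reachable_block (void)
{
  static const struct edge cfg[]
    = { {0, 1}, {1, 2}, {2, 3}, {3, 1}, {2, 4}, {4, 3} };
  return count_loop_body (cfg, sizeof cfg / sizeof cfg[0], 1, 3);
}
```

If the loop in the first example were recorded with num_nodes = 4 (counting block 4), the walk would find only 3 blocks and a check of the form `tv == loop->num_nodes` would fail. Once block 4 has an edge back to the latch, the walk finds all 4.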
Does unrolling prevent doloop optimizations?
Hello,

In file loop-doloop.c, function doloop_condition_get makes sure that the condition is GE or NE; otherwise it prevents doloop optimizations. This caused a problem for a loop which had an NE condition without unrolling and an EQ condition when unrolling was run. Can I make doloop work after the unroller?

Thanks,
Vladimir

Without unrolling:

(insn 135 80 136 4 (set (reg:SI 204 [ LastIndex ])
        (plus:SI (reg:SI 204 [ LastIndex ])
            (const_int -1 [0x]))) 51 {addsi3} (nil)
    (nil))

(jump_insn 136 135 84 4 (set (pc)
        (if_then_else (ne:SI (reg:SI 204 [ LastIndex ])
                (const_int 0 [0x0]))
            (label_ref:SI 69)
            (pc))) 368 {*spu.md:3288} (insn_list:REG_DEP_TRUE 135 (nil))
    (expr_list:REG_BR_PROB (const_int 9000 [0x2328])
        (nil)))

After unrolling:

(insn 445 421 446 21 (set (reg:SI 213)
        (plus:SI (reg:SI 213)
            (const_int -1 [0x]))) 51 {addsi3} (nil)
    (nil))

(jump_insn 446 445 667 21 (set (pc)
        (if_then_else (eq:SI (reg:SI 213)
                (const_int 0 [0x0]))
            (label_ref:SI 465)
            (pc))) 368 {*spu.md:3288} (insn_list:REG_DEP_TRUE 445 (nil))
    (expr_list:REG_BR_PROB (const_int 1000 [0x3e8])
        (nil)))
Re: Does unrolling prevent doloop optimizations?
Thanks,

To make sure I understood you correctly, does it mean that the change (shown below in /* */) in doloop_condition_get is safe?

  /* We expect a GE or NE comparison with 0 or 1.  */
  if (/*(GET_CODE (condition) != GE
       && GET_CODE (condition) != NE)
      ||*/ (XEXP (condition, 1) != const0_rtx
          && XEXP (condition, 1) != const1_rtx))
    return 0;

Thanks,
Vladimir

On 6/12/07, Zdenek Dvorak <[EMAIL PROTECTED]> wrote:
Hello,

> In file loop_doloop.c function doloop_condition_get makes sure that
> the condition is GE or NE
> otherwise it prevents doloop optimizations. This caused a problem for
> a loop which had NE condition without unrolling and EQ if unrolling
> was run.

actually, doloop_condition_get is not applied to the code of the program, so this change is irrelevant (doloop_condition_get is applied to the doloop pattern from the machine description). So there must be some other reason why the doloop transformation is not applied for your loop.

Zdenek

> Can I make doloop work after the unroller?
>
> Thanks,
> Vladimir
> [...]