RFC: SMS problem with emit_copy_of_insn_after copying REG_NOTEs

2006-12-18 Thread Vladimir Yanovsky

Hello all,

I'm preparing and testing an SMS correction/improvements patch, and while
testing it on the SPU with the vectorizer testcases I got an ICE in
the "gcc_assert ( MAX_RECOG_OPERANDS - i)" in function copy_insn_1 in
emit-rtl.c. The call traces back to the loop versioning called in
modulo-sched.c before the SMSing actually starts. The specific
instruction it tries to copy when it fails is

(insn 32 31 33 4 (parallel [
   (set (reg:SI 162)
   (div:SI (reg:SI 164)
   (reg:SI 156)))
   (set (reg:SI 163)
   (mod:SI (reg:SI 164)
   (reg:SI 156)))
   (clobber (scratch:SI))
   (clobber (scratch:SI))
   (clobber (scratch:SI))
   (clobber (scratch:SI))
   (clobber (scratch:SI))
   (clobber (scratch:SI))
   (clobber (scratch:SI))
   (clobber (scratch:SI))
   (clobber (scratch:SI))
   (clobber (reg:SI 130 hbr))
   ]) 129 {divmodsi4} (insn_list:REG_DEP_TRUE 30
(insn_list:REG_DEP_TRUE 31 (nil)))
   (expr_list:REG_DEAD (reg:SI 164)
   (expr_list:REG_DEAD (reg:SI 156)
   (expr_list:REG_UNUSED (reg:SI 130 hbr)
   (expr_list:REG_UNUSED (scratch:SI)
   (expr_list:REG_UNUSED (scratch:SI)
   (expr_list:REG_UNUSED (scratch:SI)
   (expr_list:REG_UNUSED (scratch:SI)
   (expr_list:REG_UNUSED (scratch:SI)
   (expr_list:REG_UNUSED (scratch:SI)
   (expr_list:REG_UNUSED (scratch:SI)
   (expr_list:REG_UNUSED (scratch:SI)
   (expr_list:REG_UNUSED (scratch:SI)
   (expr_list:REG_UNUSED (reg:SI 163)
   (nil)))

The error happens in the first call to copy_insn_1 in the loop below
(copied from emit_copy_of_insn_after in emit-rtl.c):


  for (link = REG_NOTES (insn); link; link = XEXP (link, 1))
    if (REG_NOTE_KIND (link) != REG_LABEL)
      {
        if (GET_CODE (link) == EXPR_LIST)
          REG_NOTES (new)
            = copy_insn_1 (gen_rtx_EXPR_LIST (REG_NOTE_KIND (link),
                                              XEXP (link, 0),
                                              REG_NOTES (new)));
        else
          REG_NOTES (new)
            = copy_insn_1 (gen_rtx_INSN_LIST (REG_NOTE_KIND (link),
                                              XEXP (link, 0),
                                              REG_NOTES (new)));
      }

Tracing the execution of copy_insn_1, it seems to go over the
same REG_NOTES many times (it looks like a quadratic-time
algorithm). This causes "copy_insn_n_scratches++" to be executed more
times than there are SCRATCH registers (or even REG_NOTES), leading to
the failure in the assert. There are 9 SCRATCH registers used in the
instruction and MAX_RECOG_OPERANDS is 30 for the SPU.

Since copy_insn_n_scratches is initialized in copy_insn, and since we
go over the REG_NOTES over and over again, I replaced the two calls to
copy_insn_1 in the loop above with calls to copy_insn. This
made the ICEs in the testsuite disappear.

I wonder whether this constitutes a legitimate fix or whether I'm missing something?
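
To illustrate the counting argument, here is a toy model in C. It is only
a sketch, not GCC code: struct note, cons, deep_copy, count_wrapped,
count_datum_only and insn32_notes are made-up names, and the model assumes
that every SCRATCH met during a copy bumps the counter once. count_wrapped
has the shape of the loop quoted from emit_copy_of_insn_after (the chain
built so far is copied again on every iteration); count_datum_only is an
alternative that copies only the datum of each note.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Toy stand-in for one REG_NOTES entry; not GCC's rtx.  */
struct note
{
  int is_scratch;       /* nonzero if the note's datum is a SCRATCH */
  struct note *next;
};

/* Models copy_insn_n_scratches.  */
static int n_scratches;

/* Allocate a list node in front of NEXT (nodes are never freed here).  */
static struct note *
cons (int is_scratch, struct note *next)
{
  struct note *n = malloc (sizeof *n);
  assert (n != NULL);
  n->is_scratch = is_scratch;
  n->next = next;
  return n;
}

/* Models a recursive copy of a whole chain: every node reachable from N
   is copied, and every SCRATCH met on the way bumps the counter.  */
static struct note *
deep_copy (const struct note *n)
{
  if (n == NULL)
    return NULL;
  if (n->is_scratch)
    n_scratches++;
  return cons (n->is_scratch, deep_copy (n->next));
}

/* Wrap the chain built so far in a new node, then copy the whole thing.  */
static int
count_wrapped (const struct note *notes)
{
  struct note *new_notes = NULL;
  n_scratches = 0;
  for (; notes != NULL; notes = notes->next)
    new_notes = deep_copy (cons (notes->is_scratch, new_notes));
  return n_scratches;
}

/* Alternative: copy only the datum of each note, never the chain.  */
static int
count_datum_only (const struct note *notes)
{
  struct note *new_notes = NULL;
  n_scratches = 0;
  for (; notes != NULL; notes = notes->next)
    {
      if (notes->is_scratch)
        n_scratches++;
      new_notes = cons (notes->is_scratch, new_notes);
    }
  (void) new_notes;
  return n_scratches;
}

/* Note list shaped like insn 32: 3 register notes, 9 SCRATCH notes,
   then 1 register note.  */
static struct note *
insn32_notes (void)
{
  struct note *list = cons (0, NULL);
  int i;
  for (i = 0; i < 9; i++)
    list = cons (1, list);
  for (i = 0; i < 3; i++)
    list = cons (0, list);
  return list;
}
```

In this model count_wrapped (insn32_notes ()) returns 54, already above
the limit of 30, while count_datum_only (insn32_notes ()) returns 9. The
real counter in GCC may grow differently; the point is only that recopying
the chain makes the count grow with the square of the list length.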

Thanks in advance,
Vladimir


Re: RFC: SMS problem with emit_copy_of_insn_after copying REG_NOTEs

2006-12-19 Thread Vladimir Yanovsky

Hi, Jan,
Thanks for the fast response!

I've tested the change you proposed, and we still fail in the assert
that checks the number of SCRATCHes: it grows too large (>30) while
copying the REG_NOTES of the instruction (see below), which uses just 9
SCRATCH registers.

Thanks,
Vladimir

On 12/18/06, Jan Hubicka <[EMAIL PROTECTED]> wrote:

> Hello all,
>
> I'm preparing and testing SMS correction/improvements patch and while
> testing it on the SPU with the vectorizer testcases I've got an ICE in
> the  "gcc_assert ( MAX_RECOG_OPERANDS - i)" in function copy_insn_1 in
> emit_rtl.c. The call traces back to the loop versionioning called in
> modulo-sched.c before the SMSing actually starts. The specific
> instruction it tries to copy when it fails is
>
> (insn 32 31 33 4 (parallel [
>(set (reg:SI 162)
>(div:SI (reg:SI 164)
>(reg:SI 156)))
>(set (reg:SI 163)
>(mod:SI (reg:SI 164)
>(reg:SI 156)))
>(clobber (scratch:SI))
>(clobber (scratch:SI))
>(clobber (scratch:SI))
>(clobber (scratch:SI))
>(clobber (scratch:SI))
>(clobber (scratch:SI))
>(clobber (scratch:SI))
>(clobber (scratch:SI))
>(clobber (scratch:SI))
>(clobber (reg:SI 130 hbr))
>]) 129 {divmodsi4} (insn_list:REG_DEP_TRUE 30
> (insn_list:REG_DEP_TRUE 31 (nil)))
>(expr_list:REG_DEAD (reg:SI 164)
>(expr_list:REG_DEAD (reg:SI 156)
>(expr_list:REG_UNUSED (reg:SI 130 hbr)
>(expr_list:REG_UNUSED (scratch:SI)
>(expr_list:REG_UNUSED (scratch:SI)
>(expr_list:REG_UNUSED (scratch:SI)
>(expr_list:REG_UNUSED (scratch:SI)
>(expr_list:REG_UNUSED (scratch:SI)
>(expr_list:REG_UNUSED (scratch:SI)
>(expr_list:REG_UNUSED (scratch:SI)
>(expr_list:REG_UNUSED
>(scratch:SI)
>(expr_list:REG_UNUSED
> (scratch:SI)
> (expr_list:REG_UNUSED (reg:SI 163)
>(nil)))
>
> The error happens in the first call to copy_insn_1 in the loop below
> (copied from emit_copy_of_insn_after from emit_rtl.c):
>
>
>  for (link = REG_NOTES (insn); link; link = XEXP (link, 1))
>if (REG_NOTE_KIND (link) != REG_LABEL)
>  {
>if (GET_CODE (link) == EXPR_LIST)
>  REG_NOTES (new)
>= copy_insn_1 (gen_rtx_EXPR_LIST (REG_NOTE_KIND (link),
>  XEXP (link, 0),
>  REG_NOTES (new)));
>else
>  REG_NOTES (new)
>= copy_insn_1 (gen_rtx_INSN_LIST (REG_NOTE_KIND (link),
>  XEXP (link, 0),
>  REG_NOTES (new)));
>  }
>
Thanks for sending the updated patch; I will try to look through it tomorrow.
> Tracing the execution of copy_insn_1, it seems that it goes over the
> same REG_NOTES many times (it seems to be a quadratic time complexity
> algorithm). This causes "copy_insn_n_scratches++" to be executed more
> times than there are SCRATCH registers (and even REG_NOTES) leading to
> the failure in the assert. There are 9 SCRATCH registers used in the
> instruction and MAX_RECOG_OPERANDS is 30 for the SPU.
>
> Since copy_insn_n_scratches is initialized in copy_insn and since we
> go over regnotes over and   over again, I've modified in the loop
> above the two calls to copy_insn_1 with the calls to copy_insn. This
> caused the ICEs in the testsuite to disappear.
>
> I wonder if this constitutes a legitimate fix or I'm missing something?

I believe you really want to avoid a quadratic amount of work.  This is
probably best done by
  REG_NOTES (new)
    = gen_rtx_EXPR_LIST (REG_NOTE_KIND (link),
                         copy_insn_1 (XEXP (link, 0)),
                         REG_NOTES (new));
so that copy_insn_1 doesn't recursively descend into the already copied chain.

Honza
>
> Thanks in advance,
> Vladimir



Re: RFC: SMS problem with emit_copy_of_insn_after copying REG_NOTEs

2006-12-28 Thread Vladimir Yanovsky

Hi,
I've rebuilt everything from scratch again with the changes to
emit_copy_of_insn_after that Jan suggested (see patch below), and the
ICE caused by quadratic accumulation of the scratch-register counter
is gone!

Thanks,
Vladimir

Index: emit-rtl.c
===
--- emit-rtl.c  (revision 120004)
+++ emit-rtl.c  (working copy)
@@ -5296,16 +5306,16 @@
if (REG_NOTE_KIND (link) != REG_LABEL)
  {
   if (GET_CODE (link) == EXPR_LIST)
- REG_NOTES (new)
-   = copy_insn_1 (gen_rtx_EXPR_LIST (REG_NOTE_KIND (link),
- XEXP (link, 0),
- REG_NOTES (new)));
+
+ REG_NOTES (new)
+   = gen_rtx_EXPR_LIST (REG_NOTE_KIND (link),
+ copy_insn_1 (XEXP (link, 0)),  REG_NOTES (new));
   else
- REG_NOTES (new)
-   = copy_insn_1 (gen_rtx_INSN_LIST (REG_NOTE_KIND (link),
- XEXP (link, 0),
- REG_NOTES (new)));
-  }
+  REG_NOTES (new)
+   = gen_rtx_INSN_LIST (REG_NOTE_KIND (link),
+ copy_insn_1 (XEXP (link, 0)),  REG_NOTES (new));
+
+ }

  /* Fix the libcall sequences.  */
  if ((note1 = find_reg_note (new, REG_RETVAL, NULL_RTX)) != NULL)

if (GET_CODE (link) == EXPR_LIST)
#if 1
  REG_NOTES (new)
    = gen_rtx_EXPR_LIST (REG_NOTE_KIND (link),
                         copy_insn_1 (XEXP (link, 0)), REG_NOTES (new));
#endif
#if 0
  REG_NOTES (new)
    = copy_insn_1 (gen_rtx_EXPR_LIST (REG_NOTE_KIND (link),
                                      XEXP (link, 0),
                                      REG_NOTES (new)));
#endif
else
#if 0
  REG_NOTES (new)
    = copy_insn_1 (gen_rtx_INSN_LIST (REG_NOTE_KIND (link),
                                      XEXP (link, 0),
                                      REG_NOTES (new)));
#endif
#if 1
  REG_NOTES (new)
    = gen_rtx_INSN_LIST (REG_NOTE_KIND (link),
                         copy_insn_1 (XEXP (link, 0)), REG_NOTES (new));
#endif

}


On 12/19/06, Jan Hubicka <[EMAIL PROTECTED]> wrote:

> Hi, Jan,
> Thanks for fast response!
>
> I've tested the change you proposed and we still failed in the assert
> checking that the number of SCRATCHes being too large (>30) while
> copying the REG_NOTES of the instruction (see below) using just 9
> SCRATCH registers.

Hi,
apparently there is another reason copy_insn_1 can do a quadratic
amount of work besides this one, though I don't see any.
Just to be sure, did you update both cases of wrong recursion,
the EXPR_LIST one I sent and the INSN_LIST hunk just below?
Otherwise, adding a breakpoint on copy_insn_1 and seeing how it
manages to do so many recursions should help :)

Honza



Re: RFC: SMS problem with emit_copy_of_insn_after copying REG_NOTEs

2006-12-31 Thread Vladimir Yanovsky

Hi,
Sorry for possibly causing confusion. I had tested the patch on my ICE
testcase and bootstrapped with --enable-languages=c, but didn't run the
full bootstrap. I'm now bootstrapping Andrew's latest patch on ppc-linux
and testing it on SPU.

Vladimir

On 12/30/06, Jan Hubicka <[EMAIL PROTECTED]> wrote:

Hi,
thanks for testing.  I've bootstrapped/regtested this variant of the patch
and committed it as obvious.

Honza

2006-12-30  Jan Hubicka  <[EMAIL PROTECTED]>
    Vladimir Yanovsky <[EMAIL PROTECTED]>

* emit-rtl.c (emit_copy_of_insn_after): Fix bug causing exponential
amount of copies of INSN_NOTEs list.

Index: emit-rtl.c
===
--- emit-rtl.c  (revision 120274)
+++ emit-rtl.c  (working copy)
@@ -5297,14 +5297,12 @@ emit_copy_of_insn_after (rtx insn, rtx a
   {
if (GET_CODE (link) == EXPR_LIST)
  REG_NOTES (new)
-   = copy_insn_1 (gen_rtx_EXPR_LIST (REG_NOTE_KIND (link),
- XEXP (link, 0),
- REG_NOTES (new)));
+   = gen_rtx_EXPR_LIST (REG_NOTE_KIND (link),
+ copy_insn_1 (XEXP (link, 0)),  REG_NOTES (new));
else
  REG_NOTES (new)
-   = copy_insn_1 (gen_rtx_INSN_LIST (REG_NOTE_KIND (link),
- XEXP (link, 0),
- REG_NOTES (new)));
+  = gen_rtx_INSN_LIST (REG_NOTE_KIND (link),
+copy_insn_1 (XEXP (link, 0)),  REG_NOTES (new));
   }

   /* Fix the libcall sequences.  */



Re: RFC: SMS problem with emit_copy_of_insn_after copying REG_NOTEs

2007-01-01 Thread Vladimir Yanovsky

I've bootstrapped OK C/C++/Fortran on PPC. make check-gcc is running now

Thanks,
Vladimir

On 1/1/07, Jan Hubicka <[EMAIL PROTECTED]> wrote:

> Hi,
> Sorry for possibly causing confusion. I had tested the patch on my ICE
> testcase and bootstrapped for -enable-languages=C, but didn't run the
> full bootstrap. Bootstrapping the latest Andrew's patch on ppc-linux
> and testing it on SPU.

Vladimir,
I bootstrapped/regtested the patch myself on i686 before committing it,
so the rule was met here.  Unfortunately i686 doesn't seem to show the
regression.  I've bootstrapped/regtested x86_64 and i686 with Andrew's
patch and it all works fine.

Honza



Re: RFC: SMS problem with emit_copy_of_insn_after copying REG_NOTEs

2007-01-02 Thread Vladimir Yanovsky

Testing of the committed patch on PPC-linux has produced no
regressions relative to the state before the bootstrap
breakage. The same holds for Andrew's version of the patch. The 21
testsuite failures on PPC-linux that were introduced together with the
bootstrap problem have disappeared with this commit.

Vladimir

On 1/1/07, Jan Hubicka <[EMAIL PROTECTED]> wrote:

Hi,
I've committed the following patch that fixes the obvious problem of
calling copy_insn_1 for an INSN_LIST argument.  It seems to solve the
problems I can reproduce and it bootstraps x86_64-linux/i686-linux and
Darwin (thanks to andreast).  The patch was preapproved by Ian.  This is
meant as a fast fix to avoid bootstrap failures.  Andrew's optimization still makes
sense as a micro-optimization and the nested libcall issue probably
ought to be resolved, but can be dealt with incrementally.

My apologies for the problems.
Honza

Index: ChangeLog
===
--- ChangeLog   (revision 120315)
+++ ChangeLog   (working copy)
@@ -1,3 +1,8 @@
+2007-01-01  Jan Hubicka  <[EMAIL PROTECTED]>
+
+   * emit-rtl.c (emit_copy_of_insn_after): Do not call copy_insn_1 for
+   INSN_LIST.
+
 2007-01-01  Mike Stump  <[EMAIL PROTECTED]>

* configure.ac (HAVE_GAS_LITERAL16): Add autoconf check for
Index: emit-rtl.c
===
--- emit-rtl.c  (revision 120313)
+++ emit-rtl.c  (working copy)
@@ -5302,7 +5302,7 @@ emit_copy_of_insn_after (rtx insn, rtx a
else
  REG_NOTES (new)
   = gen_rtx_INSN_LIST (REG_NOTE_KIND (link),
-copy_insn_1 (XEXP (link, 0)),  REG_NOTES (new));
+XEXP (link, 0),  REG_NOTES (new));
   }

   /* Fix the libcall sequences.  */



A problem with the loop structure

2007-04-28 Thread Vladimir Yanovsky

Hi all,

I would greatly appreciate any suggestions regarding the following
problem I have with the loop structure. I am working on Swing Modulo
Scheduling with the Sony SDK for SPU (based on gcc 4.1.1). Below are
3 observations describing the problem.

Thanks a lot,
Vladimir

1. The problem was unveiled by compiling a testcase with dumps turned
on. The compilation failed while calling function get_loop_body from
flow_loop_dump on the following assert:

  else if (loop->latch != loop->header)
    {
      tv = dfs_enumerate_from (loop->latch, 1, glb_enum_p,
                               tovisit + 1, loop->num_nodes - 1,
                               loop->header) + 1;


      gcc_assert (tv == loop->num_nodes);

The compilation exits successfully if compiled without enabling the dump.

2. The SMS pass contained a single call to loop_version on the loop to be
SMSed, made before any SMS-related work was done. Calling
verify_loop_structure(loops) just after the call to loop_version
failed on the same assert in get_loop_body as in (1). The loop on
which we fail is neither the versioned loop nor the new loop. Below
is the output of verify_loop_structure called from different places
in loop_version:

(gdb) n
1466  first_head = entry->dest;
(gdb) p verify_loop_structure(loops)
$12 = void
(gdb) n
1469  if (!cfg_hook_duplicate_loop_to_header_edge (loop, entry, loops, 1,
(gdb) p verify_loop_structure(loops)
$13 = void
(gdb) n
1475  second_head = entry->dest;
(gdb) p verify_loop_structure(loops)
bmark_lite.c: In function 't_run_test':
bmark_lite.c:1225: error: loop 7's header does not have exactly 2 entries

Breakpoint 1, fancy_abort (
   file=0x884008 "/Develop/sony/build/toolchain/gcc/gcc/cfgloop.c", line=1277,
   function=0x8841d0 "verify_loop_structure")
   at /Develop/sony/build/toolchain/gcc/gcc/diagnostic.c:602
602   internal_error ("in %s, at %s:%d", function, trim_filename
(file), line);



3.  At the very beginning of the SMS pass we build the loop structure
using build_loops_structure defined in modulo-sched.c. Just after the
call I tried to print in gdb the loop on which we failed in
get_loop_body. This failed as well:

(gdb)  p print_loop(dumpfile, 0xbabe20, 0)
No symbol "dumpfile" in current context.
(gdb)  p print_loop(stdout, 0xbabe20, 0)
loop_0
{
}
$1 = void
(gdb)  p print_loop(stdout, 0xd42e20, 0)
loop_7
{
 bb_21 (preds = {bb_256 }, succs = {bb_23 bb_22 })
 {
 :;
   matrixA.770 = matrixA;
   temp.801 = *(matrixA.770 + (varsize * *) ivtmp.701 * 4B);
   temp.874 = temp.801 + pretmp.130;
   sum1_lsm.411 = *temp.874;
   col1_lsm.839 = (int) ivtmp.701;
   col1_lsm.837 = 0;
   if (col1_lsm.839 > 0) goto ; else (void) 0;

 }
 bb_23 (preds = {bb_21 }, succs = {bb_262 })
 {
 :;
   ivtmp.694 = 0;

 }
 bb_262 (preds = {bb_23 }, succs = {bb_264 bb_263 })
 {

Breakpoint 1, fancy_abort (
   file=0x86e980 "/Develop/sony/build/toolchain/gcc/gcc/tree-flow-inline.h",
   line=722, function=0x86e9b9 "bsi_start")
   at /Develop/sony/build/toolchain/gcc/gcc/diagnostic.c:602
602   internal_error ("in %s, at %s:%d", function, trim_filename
(file), line);


The failure was on the assert in line 722 (please find below):

(gdb) up
#1  0x00469d80 in bsi_start (bb=0x2ebc0100)
   at /Develop/sony/build/toolchain/gcc/gcc/tree-flow-inline.h:722
722   gcc_assert (bb->index < 0);
(gdb) l
717   block_stmt_iterator bsi;
718   if (bb->stmt_list)
719 bsi.tsi = tsi_start (bb->stmt_list);
720   else
721 {
722   gcc_assert (bb->index < 0);
723   bsi.tsi.ptr = NULL;
724   bsi.tsi.container = NULL;
725 }
726   bsi.bb = bb;
(gdb)


Re: A problem with the loop structure

2007-05-01 Thread Vladimir Yanovsky

Hi,

Thanks a lot for your help and suggestions! Below I attach some more
observations. I would be grateful for any more ideas on what can be
wrong here.

Thanks a lot,
Vladimir
---

The problem happens because the num_nodes of the outer loop of the
versioned loop is larger than the count reported by the DFS traversal.
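
As a rough illustration of the check that fails, the sketch below models
the count in C. It is not GCC code: the tiny CFG, MAX_BB, struct cfg,
walk_preds, count_loop_body and build_example are hypothetical, and the
dominance test that glb_enum_p also applies is left out. The sketch walks
predecessor edges from the latch, stops at the header, and adds one for
the header; the result corresponds to the value compared with num_nodes.

```c
#include <assert.h>
#include <string.h>

#define MAX_BB 8

/* Toy CFG: edge[u][v] != 0 means there is an edge u -> v.  */
struct cfg
{
  int n_bb;
  unsigned char edge[MAX_BB][MAX_BB];
};

/* Backward DFS from BB over predecessor edges, never entering HEADER.  */
static void
walk_preds (const struct cfg *g, int bb, int header,
            unsigned char *seen, int *count)
{
  int p;
  if (bb == header || seen[bb])
    return;
  seen[bb] = 1;
  (*count)++;
  for (p = 0; p < g->n_bb; p++)
    if (g->edge[p][bb])
      walk_preds (g, p, header, seen, count);
}

/* Number of blocks found for the loop: those reached backwards from
   LATCH without passing HEADER, plus the header itself.  */
static int
count_loop_body (const struct cfg *g, int header, int latch)
{
  unsigned char seen[MAX_BB];
  int count = 0;
  memset (seen, 0, sizeof seen);
  walk_preds (g, latch, header, seen, &count);
  return count + 1;
}

/* Header 0 -> 1 -> 2 (latch) -> 0, plus block 3 hanging off block 1
   with no path back to the latch.  */
static void
build_example (struct cfg *g)
{
  memset (g, 0, sizeof *g);
  g->n_bb = 4;
  g->edge[0][1] = 1;
  g->edge[1][2] = 1;
  g->edge[2][0] = 1;
  g->edge[1][3] = 1;
}
```

For this example count_loop_body (&g, 0, 2) is 3, so a loop record that
claims num_nodes == 4 (block 3 counted as a member) would trip an assert
of the form tv == loop->num_nodes.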

i)
The loops are:
1) Before versioning:
;; Loop 7:  //Outer loop
;;  header 256, latch 24
;;  depth 4, level 2, outer 6
;;  nodes: 256 24 23 21
;;
;; Loop 8:  //To be versioned
;;  header 24, latch 24
;;  depth 5, level 1, outer 7
;;  nodes: 24


(gdb) p loop->num_nodes
$221 = 2
(gdb) p loop->num
$222 = 8


2) After versioning (loop 7 printed with the assert (dfs_result ==
num_nodes) suppressed):
;; Loop 7:
;;  header 256, latch 266
;;  depth 4, level 2, outer 6
;;  nodes: 256 266 24 265 259 263 262 23 21
;;
;; Loop 8:
;;  header 24, latch 259
;;  depth 5, level 1, outer 7
;;  nodes: 24 259
;; Loop 54:
;;  header 260, latch 261
;;  depth 5, level 1, outer 7
;;  nodes: 260 261

ii)
In loop_version there are two calls to loop_split_edge_with
1.  loop_split_edge_with (loop_preheader_edge (loop), NULL);
2.  loop_split_edge_with (loop_preheader_edge (nloop), NULL);
nloop is the versioned loop, loop is the original.

loop_split_edge_with has the following:
 new_bb = split_edge (e);
 add_bb_to_loop (new_bb, loop_c);

1) When we get to the first call, nloop->outer->num_nodes = 8 while dfs
returns 6.
After the first call
nloop->outer->num_nodes = 9 and dfs returns 7, so it seems that
add_bb_to_loop performed OK in this case.

Here is the dump of new_bb in the first call:

loop_split_edge_with (edge e, rtx insns) ->  add_bb_to_loop (new_bb, loop_c);
Correct result:
(gdb) p debug_bb_n(new_bb->index)
;; basic block 263, loop depth 0, count 0
;; prev block 262, next block 24
;; pred:   262
;; succ:   24 [100.0%]  (fallthru)
;; Registers live at start:  1 [$sp] 127 [$127] 128 [$vfp] 129 [$vap]
587 607 626 628 629 635 672 733 735 792 846 897 1432 1437 1438 1440
1450 1453 1456 1560 1561 1676
(code_label 5826 5824 5825 263 418 "" [1 uses])
(note 5825 5826 710 263 [bb 263] NOTE_INSN_BASIC_BLOCK)
;; Registers live at end:  1 [$sp] 127 [$127] 128 [$vfp] 129 [$vap]
587 607 626 628 629 635 672 733 735 792 846 897 1432 1437 1438 1440
1450 1453 1456 1560 1561 1676

2. Now, the second call to
loop_split_edge_with (edge e, rtx insns) ->  add_bb_to_loop (new_bb, loop_c)
results in nloop->outer->num_nodes = 10 while dfs still returns 7.
Printing new_bb we see it has both pred and succ fallthru.

(gdb) p debug_bb_n(new_bb->index)
;; basic block 264, loop depth 0, count 0
;; prev block 262, next block 263
;; pred:   262 [100.0%]  (fallthru)
;; succ:   260 [100.0%]  (fallthru)
;; Registers live at start:  1 [$sp] 127 [$127] 128 [$vfp] 129 [$vap]
587 607 626 628 629 635 672 733 735 792 846 897 1432 1437 1438 1440
1450 1453 1456 1560 1561 1676
(note 5827 5824 5826 264 [bb 264] NOTE_INSN_BASIC_BLOCK)
;; Registers live at end:  1 [$sp] 127 [$127] 128 [$vfp] 129 [$vap]
587 607 626 628 629 635 672 733 735 792 846 897 1432 1437 1438 1440
1450 1453 1456 1560 1561 1676
$215 = (struct basic_block_def *) 0x2ebc0500


On 4/29/07, Zdenek Dvorak <[EMAIL PROTECTED]> wrote:

Hello,

> (based on gcc 4.1.1).

now that is a problem; things have changed a lot since then, so I am not
sure how much I will be able to help.

> 1. The problem was unveiled by compiling a testcase with dump turned
> on. The compilation failed while calling function get_loop_body from
> flow_loop_dump on the following assert :
>
>  else if (loop->latch != loop->header)
>{
>  tv = dfs_enumerate_from (loop->latch, 1, glb_enum_p,
>   tovisit + 1, loop->num_nodes - 1,
>   loop->header) + 1;
>
>
>  gcc_assert (tv == loop->num_nodes);
>
> The compilation exits successfully if compiled without enabling the dump.

this means that there is some problem in some loop transformation,
forgetting to record membership of some blocks to their loops or something
like that.

> 2. SMS pass contained a single call to loop_version on the loop to be
> SMSed. This happened before any SMS related stuff was done. Trying to
> call verify_loop_structure(loops) just after the call to loop_version
> failed on the same assert in get_loop_body as in (1). The loop on
> which we fail is neither the versioned loop nor the new loop.

Probably it is their superloop?

> Below
> there are dumps to verify_loop_structure called from different places
> in loop_version:

These dumps are not very useful, loop structures do not have to be
consistent in the middle of the transformation.

> 3.  At the very beginning of the SMS pass we build the loop structure
> using build_loops_structure defined in modulo-sched.c. Just after the
>
> call I tried to print in gdb the loop on which we failed in
> get_loop_body. This failed as well
>
> (gdb)  p print_loop(dumpfile, 0xbabe20, 

[RFC] propagating loop dependences from trees to RTL (for SMS)

2005-09-21 Thread Vladimir Yanovsky




As a follow up to http://gcc.gnu.org/ml/gcc/2005-04/msg00461.html

I would like to improve SMS by passing data dependence information
computed at the tree level to the RTL-level SMS. Currently the data-dependency
graph built for use by SMS has an edge for every two data references (i.e. it's
too conservative). I want to check for every loop, using functions defined
in tree-data-ref.c, whether there are data dependencies in the loop. The problem
is how to pass this information to SMS (note: we're only trying to convey
whether there are no dependencies at all in the loop, i.e. one bit of
information). The alternatives being considered are:

1. Introduce a new BB bit flag and set it for the header BB of a loop that
has no data dependencies. This approach already works, but only if the old
loop optimizer (pass_loop_optimize) is disabled (otherwise the bit doesn't
survive). One potential problem is that the loop header BB may change
between the tree-level and SMS as result of some optimization pass (can
that really happen?)

2. Use a bitmap (as suggested in
http://gcc.gnu.org/ml/gcc-patches/2005-03/msg01353.html) that is indexed
using the BB index.
In my case I need to define and use the property within different
functions. I can define a static function
"set_and_check_nodeps(bb_loop_header)" and define a bitmap there.
As with the previous solution, the problem that can arise is that some
intermediate optimization can change the header of the loop. By the way,
is it guaranteed that a BB keeps the same index throughout the entire
compilation?

3. Use insn_notes - introduce a new note "NOTE_INSN_NO_DEPS_IN_LOOP" to be
inserted after the "NOTE_INSN_LOOP_BEG" for relevant loops.

4. Other ideas?
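
For concreteness, alternative 2 could look roughly like the C sketch
below. The name set_and_check_nodeps comes from the text above, but its
signature, the set flag, the fixed MAX_BB_INDEX bound and the use of a
plain int for the header block index are all assumptions made for
illustration; a real implementation would use GCC's own bitmap type
rather than a byte array.

```c
#include <assert.h>
#include <limits.h>

#define MAX_BB_INDEX 1024   /* hypothetical upper bound on BB indices */

/* Record (SET != 0) or query (SET == 0) whether the loop whose header
   block has index BB_INDEX was found to have no data dependences.
   Returns the value of the bit after the call.  */
static int
set_and_check_nodeps (int bb_index, int set)
{
  /* One bit per basic block index, private to this function.  */
  static unsigned char nodeps[(MAX_BB_INDEX + CHAR_BIT - 1) / CHAR_BIT];
  unsigned char mask;

  assert (bb_index >= 0 && bb_index < MAX_BB_INDEX);
  mask = (unsigned char) (1u << (bb_index % CHAR_BIT));
  if (set)
    nodeps[bb_index / CHAR_BIT] |= mask;
  return (nodeps[bb_index / CHAR_BIT] & mask) != 0;
}
```

The tree-level code would call set_and_check_nodeps (index, 1) for each
dependence-free loop header and SMS would later call it with 0. The lookup
is only meaningful if the header block and its index stay the same in
between, which is the open question raised in alternative 2.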

thanks,

Vladimir



Re: A problem with the loop structure

2007-06-07 Thread Vladimir Yanovsky

Hi,

The ICE after the loop versioning in SMS was caused by the header of
the versioned loop being at the same time the latch of the outer loop.
After the versioning, the nodes of the newly created loop could not be
reached by a DFS traversal of the outer loop starting from its latch
(the header of the versioned loop), leading to the ICE on the assert
that the number of nodes reported by the DFS equals
nloop->outer->num_nodes.

Solution (for the case of the call in SMS): call
canon_loop(loop->outer) before the call to versioning in
sms_schedule, so that a new empty latch is created for the outer loop.

Thanks,
Vladimir


On 5/4/07, Zdenek Dvorak <[EMAIL PROTECTED]> wrote:

Hello,

> ii)
> In loop_version there are two calls to loop_split_edge_with
> 1.  loop_split_edge_with (loop_preheader_edge (loop), NULL);
> 2.  loop_split_edge_with (loop_preheader_edge (nloop), NULL);
> nloop is the versioned loop, loop is the original.
>
> loop_split_edge_with has the following:
>  new_bb = split_edge (e);
>  add_bb_to_loop (new_bb, loop_c);
>
> 1) When we get to the fist call, nloop->outer->num_nodes = 8 while dfs
> returns 6.

then the problem is before this call; you need to check which two blocks
that are marked as belonging to nloop->outer in fact do not belong to
this loop, and why.

Zdenek



Does unrolling prevents doloop optimizations?

2007-06-12 Thread Vladimir Yanovsky

Hello,

In file loop-doloop.c, the function doloop_condition_get makes sure that
the condition is GE or NE;
otherwise it prevents doloop optimizations. This caused a problem for
a loop which had an NE condition without unrolling and an EQ condition
when unrolling was run. Can I make doloop work after the unroller?

Thanks,
Vladimir


Without unrolling:
(insn 135 80 136 4 (set (reg:SI 204 [ LastIndex ])
   (plus:SI (reg:SI 204 [ LastIndex ])
   (const_int -1 [0x]))) 51 {addsi3} (nil)
   (nil))

(jump_insn 136 135 84 4 (set (pc)
   (if_then_else (ne:SI (reg:SI 204 [ LastIndex ])
   (const_int 0 [0x0]))
   (label_ref:SI 69)
   (pc))) 368 {*spu.md:3288} (insn_list:REG_DEP_TRUE 135 (nil))
   (expr_list:REG_BR_PROB (const_int 9000 [0x2328])
   (nil)))


After unrolling:
(insn 445 421 446 21 (set (reg:SI 213)
   (plus:SI (reg:SI 213)
   (const_int -1 [0x]))) 51 {addsi3} (nil)
   (nil))

(jump_insn 446 445 667 21 (set (pc)
   (if_then_else (eq:SI (reg:SI 213)
   (const_int 0 [0x0]))
   (label_ref:SI 465)
   (pc))) 368 {*spu.md:3288} (insn_list:REG_DEP_TRUE 445 (nil))
   (expr_list:REG_BR_PROB (const_int 1000 [0x3e8])
   (nil)))


Re: Does unrolling prevents doloop optimizations?

2007-06-12 Thread Vladimir Yanovsky

Thanks,

To make sure I understood you correctly, does it mean that the change
(below in /* */) in doloop_condition_get is safe?

 /* We expect a GE or NE comparison with 0 or 1.  */
 if (/*(GET_CODE (condition) != GE
  && GET_CODE (condition) != NE)
 ||*/ (XEXP (condition, 1) != const0_rtx
 && XEXP (condition, 1) != const1_rtx))
   return 0;

Thanks,
Vladimir


On 6/12/07, Zdenek Dvorak <[EMAIL PROTECTED]> wrote:

Hello,

> In file loop_doloop.c function doloop_condition_get makes sure that
> the condition is GE or NE
> otherwise it prevents doloop optimizations. This caused a problem for
> a loop which had NE condition without unrolling and EQ if unrolling
> was run.

actually, doloop_condition_get is not applied to the code of the
program, so this change is irrelevant (doloop_condition_get is applied
to the doloop pattern from the machine description).  So there must be
some other reason why doloop transformation is not applied for your
loop.

Zdenek

> Can I make doloop work after the unroller?
>
> Thanks,
> Vladimir
>
> 

> Without unrolling:
> (insn 135 80 136 4 (set (reg:SI 204 [ LastIndex ])
>(plus:SI (reg:SI 204 [ LastIndex ])
>(const_int -1 [0x]))) 51 {addsi3} (nil)
>(nil))
>
> (jump_insn 136 135 84 4 (set (pc)
>(if_then_else (ne:SI (reg:SI 204 [ LastIndex ])
>(const_int 0 [0x0]))
>(label_ref:SI 69)
>(pc))) 368 {*spu.md:3288} (insn_list:REG_DEP_TRUE 135 (nil))
>(expr_list:REG_BR_PROB (const_int 9000 [0x2328])
>(nil)))
>
>
> After unrolling:
> (insn 445 421 446 21 (set (reg:SI 213)
>(plus:SI (reg:SI 213)
>(const_int -1 [0x]))) 51 {addsi3} (nil)
>(nil))
>
> (jump_insn 446 445 667 21 (set (pc)
>(if_then_else (eq:SI (reg:SI 213)
>(const_int 0 [0x0]))
>(label_ref:SI 465)
>(pc))) 368 {*spu.md:3288} (insn_list:REG_DEP_TRUE 445 (nil))
>(expr_list:REG_BR_PROB (const_int 1000 [0x3e8])
>(nil)))