Hi, I recently read the articles about the selective scheduling implementation and found them quite interesting. I especially liked how neatly software pipelining is integrated. The target I am working on is a VLIW DSP, so these things are very important for good code generation.
However, when compiling the following C function with -fselective-scheduling2 and -fsel-sched-pipelining, I face a few problems.

long dotproduct2(int *a, int *b)
{
  int i;
  long s=0;
  for (i = 0; i < 256; i++)
    s += (long)*a++ * *b++;
  return s;
}

The output I get from the sched2 pass is:

...
Scheduling region 0
Scheduling on fences: (uid:32;seqno:6;)
scanning new insn with uid = 80.
deleting insn with uid = 80.
Scheduled 0 bookkeeping copies, 0 insns needed bookkeeping, 0 insns renamed, 0 insns substituted
Scheduling region 1
Scheduling on fences: (uid:72;seqno:1;)
scanning new insn with uid = 81.
deleting insn with uid = 81.
Scheduled 0 bookkeeping copies, 0 insns needed bookkeeping, 0 insns renamed, 0 insns substituted
Scheduling region 2
Scheduling on fences: (uid:65;seqno:1;)
scanning new insn with uid = 82.
deleting insn with uid = 82.
Scheduled 0 bookkeeping copies, 0 insns needed bookkeeping, 0 insns renamed, 0 insns substituted

(note 26 27 65 2 NOTE_INSN_FUNCTION_BEG)

(insn:TI 65 26 30 2 dotprod2.c:2 (set (mem:QI (pre_dec (reg/f:QI 32 sp)) [0 S1 A16])
        (reg/f:QI 32 sp)) 12 {pushqi1} (nil))

(insn 30 65 62 2 dotprod2.c:2 (set (reg/v:HI 16 a0l [orig:62 s ] [62])
        (const_int 0 [0x0])) 6 {*zero_load_hi} (expr_list:REG_EQUAL (const_int 0 [0x0])
        (nil)))

(insn 62 30 66 2 dotprod2.c:2 (set (reg:QI 2 r2 [70])
        (const_int 256 [0x100])) 5 {*constant_load_qi} (expr_list:REG_EQUAL (const_int 256 [0x100])
        (nil)))

(insn:TI 66 62 67 2 dotprod2.c:2 (set (mem:QI (pre_dec (reg/f:QI 32 sp)) [0 S1 A16])
        (reg/f:QI 33 dp)) 12 {pushqi1} (nil))

(insn:TI 67 66 69 2 dotprod2.c:2 (set (reg/f:QI 33 dp)
        (reg/f:QI 32 sp)) 10 {*move_regs_qi} (nil))

(note 69 67 39 2 NOTE_INSN_PROLOGUE_END)

(code_label 39 69 31 3 2 "" [1 uses])

(note 31 39 34 3 [bb 3] NOTE_INSN_BASIC_BLOCK)

(note 34 31 32 3 NOTE_INSN_DELETED)

(insn:TI 32 34 33 3 dotprod2.c:10 (set (reg:QI 19 a1h [67])
        (mem:QI (post_inc:QI (reg/v/f:QI 1 r1 [orig:65 b ] [65])) [2 S1 A16])) 3 {*load_word_qi_with_post_inc} (expr_list:REG_INC (reg/v/f:QI 1 r1 [orig:65 b ] [65])
        (nil)))

(insn 33 32 35 3 dotprod2.c:10 (set (reg:QI 18 a1l [68])
        (mem:QI (post_inc:QI (reg/v/f:QI 0 r0 [orig:64 a ] [64])) [2 S1 A16])) 3 {*load_word_qi_with_post_inc} (expr_list:REG_INC (reg/v/f:QI 0 r0 [orig:64 a ] [64])
        (nil)))

(insn 35 33 61 3 dotprod2.c:10 (set (reg/v:HI 16 a0l [orig:62 s ] [62])
        (plus:HI (mult:HI (sign_extend:HI (reg:QI 19 a1h [67]))
                (sign_extend:HI (reg:QI 18 a1l [68])))
            (reg/v:HI 16 a0l [orig:62 s ] [62]))) 23 {multacc} (expr_list:REG_DEAD (reg:QI 19 a1h [67])
        (expr_list:REG_DEAD (reg:QI 18 a1l [68])
            (nil))))

(jump_insn:TI 61 35 75 3 dotprod2.c:8 (parallel [
            (set (pc)
                (if_then_else (ne (reg:QI 2 r2 [70])
                        (const_int 1 [0x1]))
                    (label_ref:QI 39)
                    (pc)))
            (set (reg:QI 2 r2 [70])
                (plus:QI (reg:QI 2 r2 [70])
                    (const_int -1 [0xffffffff])))
            (use (const_int 255 [0xff]))
            (use (const_int 255 [0xff]))
            (use (const_int 1 [0x1]))
        ]) 43 {doloop_end_internal} (expr_list:REG_BR_PROB (const_int 9899 [0x26ab])
        (nil)))

(note 75 61 70 4 [bb 4] NOTE_INSN_BASIC_BLOCK)

(note 70 75 72 4 NOTE_INSN_EPILOGUE_BEG)
...

The loop body is not scheduled correctly: only insn 32 and the jump insn 61 carry the TImode flag, which indicates that the entire loop body (uid 32, 33, 35 and 61) will be executed in a single cycle as one VLIW packet. This will not work, since the multiply-accumulate in uid 35 would then read the registers loaded by uid 32 and 33 in the same cycle, and no loop-prologue code has been emitted.

My (probably quite limited) understanding of what should happen is:

1. The fence is placed at (before) uid 32.
2. Instructions uid 32 and uid 33 are scheduled in this VLIW group.
3. The fence is advanced to uid 35.
4. Instruction uid 35 is scheduled, and instructions uid 32 and 33 are moved up
   and scheduled in this group as well. In the process of moving up uid 32 and 33,
   bookkeeping copies are created on the loop entry edge.

I've tried to debug this without much success and would very much appreciate any comments on what to look for or what I might be doing wrong.

The GCC version that I am using is 4.4.1.

BR
/Markus
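
P.S. To make the expectation in the four steps concrete, here is a hand-written C sketch of the loop shape I would expect after pipelining. It is only an illustration under my own assumptions, not scheduler output: the names dot_ref and dot_pipelined and the parameter n are made up for this example, and the peeled last iteration is something I added so that the sketch never reads past the end of the arrays (the scheduler itself may handle this differently).

```c
/* Reference loop: same computation as dotproduct2, with the trip
   count passed in as n (hypothetical parameter, for illustration). */
static long dot_ref(const int *a, const int *b, int n)
{
    long s = 0;
    int i;
    for (i = 0; i < n; i++)
        s += (long)*a++ * *b++;
    return s;
}

/* Hand-pipelined variant: the loads for iteration i+1 sit next to the
   multiply-accumulate for iteration i, so they could share a packet. */
static long dot_pipelined(const int *a, const int *b, int n)
{
    long s = 0;
    int x, y, i;

    if (n <= 0)
        return 0;

    /* Prologue: copies of the two loads on the loop entry edge
       (what I understand the bookkeeping copies to be). */
    x = *a++;
    y = *b++;

    for (i = 0; i < n - 1; i++) {
        s += (long)x * y;   /* multacc for iteration i          */
        x = *a++;           /* load for iteration i+1 (uid 33)  */
        y = *b++;           /* load for iteration i+1 (uid 32)  */
    }

    /* Peeled last iteration: consumes the final pair of loads
       (my assumption, to avoid loading beyond the arrays). */
    s += (long)x * y;
    return s;
}
```

Both functions should return the same sum for any n; the second one just makes the missing prologue visible.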