Hello Maxim, Thanks for your reply. I have in the meantime adjusted the list scheduler to handle the situation better and it's now working better than it was before on my port. However, I will give your suggestion of using sched-ebb a try given that it might outperform the current solution I have.
Regards, Paulo Matos > -----Original Message----- > From: Maxim Kuvyrkov [mailto:ma...@kugelworks.com] > Sent: 16 July 2013 05:02 > To: Paulo Matos > Cc: gcc@gcc.gnu.org > Subject: Re: Delay scheduling due to possible future multiple issue in VLIW > > Paulo, > > GCC schedule is not particularly designed for VLIW architectures, but it > handles them reasonably well. For the example of your code both schedules > take same time to execute: > > 38: 0: r1 = e[r0] > 40: 4: [r0] = r1 > 41: 5: r0 = r0+4 > 43: 5: p0 = r1!=0 > 44: 6: jump p0 > > and > > 38: 0: r1 = e[r0] > 41: 1: r0 = r0+4 > 40: 4: [r0] = r1 > 43: 5: p0 = r1!=0 > 44: 6: jump p0 > > [It is true that the first schedule takes less space due to fortunate VLIW > packing.] > > You are correct that GCC scheduler is greedy and that it tries to issue > instructions as soon as possible (i.e., it is better to issue something on > the cycle, than nothing at all), which is a sensible strategy. For small > basic block the greedy algorithm may cause artifacts like the one you > describe. > > You could try increasing size of regions on which scheduler operates by > switching your port to use scheb-ebb scheduler, which was originally > developed for ia64. > > Regards, > > -- > Maxim Kuvyrkov > KugelWorks > > > > On 27/06/2013, at 8:35 PM, Paulo Matos wrote: > > > Let me add to my own post saying that it seems that the problem is that the > list scheduler is greedy in the sense that it will take an instruction from > the ready list no matter what when waiting and trying to pair it with later > on with another instruction might be more beneficial. In a sense it seems > that the idea is that 'issuing instructions as soon as possible is better' > which might be true for a single issue chip but a VLIW with multiple issue > has to contend with other problems. > > > > Any thoughts on this? > > > > Paulo Matos > > > > > >> -----Original Message----- > >> From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of > Paulo > >> Matos > >> Sent: 26 June 2013 15:08 > >> To: gcc@gcc.gnu.org > >> Subject: Delay scheduling due to possible future multiple issue in VLIW > >> > >> Hello, > >> > >> We have a port for a VLIW machine using gcc head 4.8 with an maximum issue > of > >> 2 per clock cycle (sometimes only 1 due to machine constraints). > >> We are seeing the following situation in sched2: > >> > >> ;; --------------- forward dependences: ------------ > >> > >> ;; --- Region Dependences --- b 3 bb 0 > >> ;; insn code bb dep prio cost reservation > >> ;; ---- ---- -- --- ---- ---- ----------- > >> ;; 38 1395 3 0 6 4 > >> (p0+long_imm+ldst0+lock0),nothing*3 : 44m 43 41 40 > >> ;; 40 491 3 1 2 2 > (p0+long_imm+ldst0+lock0),nothing > >> : 44m 41 > >> ;; 41 536 3 2 1 1 (p0+no_stl2)|(p1+no_dual) : > 44 > >> ;; 43 1340 3 1 2 1 (p0+no_stl2)|(p1+no_dual) : > 44m > >> ;; 44 1440 3 4 1 1 (p0+long_imm) : > >> > >> ;; dependencies resolved: insn 38 > >> ;; tick updated: insn 38 into ready > >> ;; dependencies resolved: insn 41 > >> ;; tick updated: insn 41 into ready > >> ;; Advanced a state. > >> ;; Ready list after queue_to_ready: 41:4 38:2 > >> ;; Ready list after ready_sort: 41:4 38:2 > >> ;; Ready list (t = 0): 41:4 38:2 > >> ;; Chosen insn : 38 > >> ;; 0--> b 0: i 38r1=zxn([r0+`b']) > >> :(p0+long_imm+ldst0+lock0),nothing*3 > >> ;; dependencies resolved: insn 43 > >> ;; Ready-->Q: insn 43: queued for 4 cycles (change queue > index). > >> ;; tick updated: insn 43 into queue with cost=4 > >> ;; dependencies resolved: insn 40 > >> ;; Ready-->Q: insn 40: queued for 4 cycles (change queue > index). > >> ;; tick updated: insn 40 into queue with cost=4 > >> ;; Ready-->Q: insn 41: queued for 1 cycles (resource > conflict). > >> ;; Ready list (t = 0): > >> ;; Advanced a state. > >> ;; Q-->Ready: insn 41: moving to ready without stalls > >> ;; Ready list after queue_to_ready: 41:4 > >> ;; Ready list after ready_sort: 41:4 > >> ;; Ready list (t = 1): 41:4 > >> ;; Chosen insn : 41 > >> ;; 1--> b 0: i 41r0=r0+0x4 > >> :(p0+no_stl2)|(p1+no_dual) > >> > >> So, it is scheduling first insn 38 followed by 41. > >> The insn chain for bb3 before sched2 looks like: > >> (insn 38 36 40 3 (set (reg:DI 1 r1) > >> (zero_extend:DI (mem:SI (plus:SI (reg:SI 0 r0 [orig:119 ivtmp.13 ] > >> [119]) > >> (symbol_ref:SI ("b") [flags 0x80] <var_decl > >> 0x2b9c011f75a0 b>)) [2 MEM[symbol: b, index: ivtmp.13_7, offset: 0B]+0 S4 > >> A32]))) pr3115b.c:13 1395 {zero_extendsidi2} > >> (nil)) > >> (insn 40 38 41 3 (set (mem:SI (plus:SI (reg:SI 0 r0 [orig:119 ivtmp.13 ] > >> [119]) > >> (symbol_ref:SI ("a") [flags 0x80] <var_decl 0x2b9c011f7500 > >> a>)) [2 MEM[symbol: a, index: ivtmp.13_7, offset: 0B]+0 S4 A32]) > >> (reg:SI 1 r1 [orig:118 D.3048 ] [118])) pr3115b.c:13 491 {fp_movsi} > >> (nil)) > >> (insn 41 40 43 3 (set (reg:SI 0 r0 [orig:119 ivtmp.13 ] [119]) > >> (plus:SI (reg:SI 0 r0 [orig:119 ivtmp.13 ] [119]) > >> (const_int 4 [0x4]))) 536 {addsi3} > >> (nil)) > >> (insn 43 41 44 3 (set (reg:BI 64 p0 [122]) > >> (ne:BI (reg:SI 1 r1 [orig:118 D.3048 ] [118]) > >> (const_int 0 [0]))) pr3115b.c:13 1340 {cmp_simode} > >> (expr_list:REG_DEAD (reg:SI 1 r1 [orig:118 D.3048 ] [118]) > >> (nil))) > >> (jump_insn 44 43 55 3 (set (pc) > >> (if_then_else (ne (reg:BI 64 p0 [122]) > >> (const_int 0 [0])) > >> (label_ref:SI 35) > >> (pc))) pr3115b.c:13 1440 {cbranchbi4} > >> (expr_list:REG_DEAD (reg:BI 64 p0 [122]) > >> (expr_list:REG_BR_PROB (const_int 9844 [0x2674]) > >> (expr_list:REG_PRED_WIDTH (const_int 4 [0x4]) > >> (nil)))) > >> > >> > >> The problem with this is that GCC is scheduling insn 38, followed by 41, > (a > >> patched) 40, 43 and 44. > >> However, if it had delayed scheduling 41, waited a clock cycle, issued 40 > >> then it would be able to issue 38 paired with 43 in the same clock cycle > and > >> then 44. > >> So, instead of generating the following insn chain: > >> 38 > >> 41 > >> patched 40 > >> 43 > >> 44 > >> > >> it would generate > >> 38 > >> 40 > >> 41 : 43 > >> 44 > >> > >> Is there a way to instruct the scheduler to wait on an instruction on a > given > >> clock cycle (even if that instruction is the only one on the ready list) > >> because it's possible that it can be paired with a later instruction in > the > >> chain if issued simultaneously? > >> > >> Cheers, > >> > >> Paulo Matos > >> > > >