RE: Delay scheduling due to possible future multiple issue in VLIW

Paulo Matos Tue, 16 Jul 2013 06:27:48 -0700

Hello Maxim,

Thanks for your reply. In the meantime I have adjusted the list scheduler to
handle this situation, and it now works better on my port than it did before.
However, I will give your suggestion of using sched-ebb a try, given that it
might outperform my current solution.


Regards,
Paulo Matos


> -----Original Message-----
> From: Maxim Kuvyrkov [mailto:ma...@kugelworks.com]
> Sent: 16 July 2013 05:02
> To: Paulo Matos
> Cc: gcc@gcc.gnu.org
> Subject: Re: Delay scheduling due to possible future multiple issue in VLIW
> 
> Paulo,
> 
> The GCC scheduler is not specifically designed for VLIW architectures, but it
> handles them reasonably well.  For your example, both schedules take the
> same time to execute:
> 
> 38: 0: r1 = e[r0]
> 40: 4: [r0] = r1
> 41: 5: r0 = r0+4
> 43: 5: p0 = r1!=0
> 44: 6: jump p0
> 
> and
> 
> 38: 0: r1 = e[r0]
> 41: 1: r0 = r0+4
> 40: 4: [r0] = r1
> 43: 5: p0 = r1!=0
> 44: 6: jump p0
> 
> [It is true that the first schedule takes less space due to fortunate VLIW
> packing.]
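
As a quick check of the comparison above, here is a minimal Python sketch that tallies the two quoted schedules. The issue cycles are copied from the listings. The helper name `summarize` is hypothetical, and treating each distinct issue cycle as one VLIW bundle is an illustrative assumption, not something GCC reports.

```python
from collections import Counter

# Issue cycle of each insn, copied from the two schedules quoted above.
first_schedule = {38: 0, 40: 4, 41: 5, 43: 5, 44: 6}
second_schedule = {38: 0, 41: 1, 40: 4, 43: 5, 44: 6}


def summarize(schedule, max_issue=2):
    """Return the last issue cycle and the number of distinct issue cycles.

    Assumption: every distinct issue cycle corresponds to one VLIW bundle,
    and at most `max_issue` insns may share a cycle.
    """
    per_cycle = Counter(schedule.values())
    if max(per_cycle.values()) > max_issue:
        raise ValueError("schedule exceeds the issue width")
    return {"last_cycle": max(schedule.values()), "bundles": len(per_cycle)}


print(summarize(first_schedule))
print(summarize(second_schedule))
```

Under these assumptions both schedules issue the jump on cycle 6, while the first one needs four bundles and the second needs five.
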
> 
> You are correct that the GCC scheduler is greedy and that it tries to issue
> instructions as soon as possible (i.e., it is better to issue something on
> a given cycle than nothing at all), which is a sensible strategy.  For small
> basic blocks the greedy algorithm may cause artifacts like the one you
> describe.
> 
> You could try increasing the size of the regions on which the scheduler
> operates by switching your port to the sched-ebb scheduler, which was
> originally developed for ia64.
> 
> Regards,
> 
> --
> Maxim Kuvyrkov
> KugelWorks
> 
> 
> 
> On 27/06/2013, at 8:35 PM, Paulo Matos wrote:
> 
> > Let me add to my own post: the problem seems to be that the list
> > scheduler is greedy, in the sense that it will take an instruction from
> > the ready list no matter what, even when waiting and pairing it later
> > with another instruction might be more beneficial. The underlying idea
> > seems to be that 'issuing instructions as soon as possible is better',
> > which might be true for a single-issue chip, but a VLIW with multiple
> > issue has to contend with other problems.
> >
> > Any thoughts on this?
> >
> > Paulo Matos
> >
> >
> >> -----Original Message-----
> >> From: gcc-ow...@gcc.gnu.org [mailto:gcc-ow...@gcc.gnu.org] On Behalf Of Paulo Matos
> >> Sent: 26 June 2013 15:08
> >> To: gcc@gcc.gnu.org
> >> Subject: Delay scheduling due to possible future multiple issue in VLIW
> >>
> >> Hello,
> >>
> >> We have a port for a VLIW machine using gcc head 4.8 with a maximum issue
> >> of 2 per clock cycle (sometimes only 1 due to machine constraints).
> >> We are seeing the following situation in sched2:
> >>
> >> ;;   --------------- forward dependences: ------------
> >>
> >> ;;   --- Region Dependences --- b 3 bb 0
> >> ;;      insn  code    bb   dep  prio  cost   reservation
> >> ;;      ----  ----    --   ---  ----  ----   -----------
> >> ;;       38  1395     3     0     6     4   (p0+long_imm+ldst0+lock0),nothing*3 : 44m 43 41 40
> >> ;;       40   491     3     1     2     2   (p0+long_imm+ldst0+lock0),nothing   : 44m 41
> >> ;;       41   536     3     2     1     1   (p0+no_stl2)|(p1+no_dual)   : 44
> >> ;;       43  1340     3     1     2     1   (p0+no_stl2)|(p1+no_dual)   : 44m
> >> ;;       44  1440     3     4     1     1   (p0+long_imm)       :
> >>
> >> ;;              dependencies resolved: insn 38
> >> ;;              tick updated: insn 38 into ready
> >> ;;              dependencies resolved: insn 41
> >> ;;              tick updated: insn 41 into ready
> >> ;;      Advanced a state.
> >> ;;              Ready list after queue_to_ready:    41:4  38:2
> >> ;;              Ready list after ready_sort:    41:4  38:2
> >> ;;      Ready list (t =   0):    41:4  38:2
> >> ;;              Chosen insn : 38
> >> ;;        0--> b  0: i  38r1=zxn([r0+`b'])    :(p0+long_imm+ldst0+lock0),nothing*3
> >> ;;              dependencies resolved: insn 43
> >> ;;              Ready-->Q: insn 43: queued for 4 cycles (change queue index).
> >> ;;              tick updated: insn 43 into queue with cost=4
> >> ;;              dependencies resolved: insn 40
> >> ;;              Ready-->Q: insn 40: queued for 4 cycles (change queue index).
> >> ;;              tick updated: insn 40 into queue with cost=4
> >> ;;              Ready-->Q: insn 41: queued for 1 cycles (resource conflict).
> >> ;;      Ready list (t =   0):
> >> ;;      Advanced a state.
> >> ;;              Q-->Ready: insn 41: moving to ready without stalls
> >> ;;              Ready list after queue_to_ready:    41:4
> >> ;;              Ready list after ready_sort:    41:4
> >> ;;      Ready list (t =   1):    41:4
> >> ;;              Chosen insn : 41
> >> ;;        1--> b  0: i  41r0=r0+0x4    :(p0+no_stl2)|(p1+no_dual)
> >>
> >> So, it is scheduling first insn 38 followed by 41.
> >> The insn chain for bb3 before sched2 looks like:
> >> (insn 38 36 40 3 (set (reg:DI 1 r1)
> >>        (zero_extend:DI (mem:SI (plus:SI (reg:SI 0 r0 [orig:119 ivtmp.13 ] [119])
> >>                    (symbol_ref:SI ("b") [flags 0x80]  <var_decl 0x2b9c011f75a0 b>)) [2 MEM[symbol: b, index: ivtmp.13_7, offset: 0B]+0 S4 A32]))) pr3115b.c:13 1395 {zero_extendsidi2}
> >>     (nil))
> >> (insn 40 38 41 3 (set (mem:SI (plus:SI (reg:SI 0 r0 [orig:119 ivtmp.13 ] [119])
> >>                (symbol_ref:SI ("a") [flags 0x80]  <var_decl 0x2b9c011f7500 a>)) [2 MEM[symbol: a, index: ivtmp.13_7, offset: 0B]+0 S4 A32])
> >>        (reg:SI 1 r1 [orig:118 D.3048 ] [118])) pr3115b.c:13 491 {fp_movsi}
> >>     (nil))
> >> (insn 41 40 43 3 (set (reg:SI 0 r0 [orig:119 ivtmp.13 ] [119])
> >>        (plus:SI (reg:SI 0 r0 [orig:119 ivtmp.13 ] [119])
> >>            (const_int 4 [0x4]))) 536 {addsi3}
> >>     (nil))
> >> (insn 43 41 44 3 (set (reg:BI 64 p0 [122])
> >>        (ne:BI (reg:SI 1 r1 [orig:118 D.3048 ] [118])
> >>            (const_int 0 [0]))) pr3115b.c:13 1340 {cmp_simode}
> >>     (expr_list:REG_DEAD (reg:SI 1 r1 [orig:118 D.3048 ] [118])
> >>        (nil)))
> >> (jump_insn 44 43 55 3 (set (pc)
> >>        (if_then_else (ne (reg:BI 64 p0 [122])
> >>                (const_int 0 [0]))
> >>            (label_ref:SI 35)
> >>            (pc))) pr3115b.c:13 1440 {cbranchbi4}
> >>     (expr_list:REG_DEAD (reg:BI 64 p0 [122])
> >>        (expr_list:REG_BR_PROB (const_int 9844 [0x2674])
> >>            (expr_list:REG_PRED_WIDTH (const_int 4 [0x4])
> >>                (nil))))
> >>
> >>
> >> The problem with this is that GCC is scheduling insn 38, followed by 41,
> >> (a patched) 40, 43 and 44.
> >> However, if it had delayed scheduling 41, waited a clock cycle, and issued
> >> 40, then it would be able to issue 41 paired with 43 in the same clock
> >> cycle and then 44.
> >> So, instead of generating the following insn chain:
> >> 38
> >> 41
> >> patched 40
> >> 43
> >> 44
> >>
> >> it would generate
> >> 38
> >> 40
> >> 41 : 43
> >> 44
> >>
> >> Is there a way to instruct the scheduler to wait on an instruction on a
> >> given clock cycle (even if that instruction is the only one on the ready
> >> list) because it's possible that it can be paired with a later instruction
> >> in the chain if issued simultaneously?
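
The trade-off in the question above can be illustrated with a toy list scheduler. This is a rough sketch only: it is not GCC's scheduler, and it does not use any GCC hook. The `PRIO` and `COST` values are taken from the "prio" and "cost" columns of the dump. The simplified `PREDS` graph, the `SOLO` set (insns assumed unable to dual-issue) and the `delay_lonely_pairable` policy are all hypothetical assumptions made for illustration.

```python
PRIO = {38: 6, 40: 2, 41: 1, 43: 2, 44: 1}   # "prio" column of the dump
COST = {38: 4, 40: 2, 41: 1, 43: 1, 44: 1}   # "cost" column of the dump
# Assumed, simplified dependences: results must be available before issue.
PREDS = {38: [], 40: [38], 41: [], 43: [38], 44: [38, 40, 41, 43]}
SOLO = {38, 40, 44}                          # assumed: cannot be paired


def schedule(delay_lonely_pairable):
    """Return a list of (cycle, [insns]) bundles for the toy block."""
    issued = {}      # insn -> issue cycle
    bundles = []
    cycle = 0
    while len(issued) < len(PRIO):
        # Ready = not issued yet, and every predecessor's result is available.
        ready = sorted(
            (i for i in PRIO if i not in issued and all(
                p in issued and issued[p] + COST[p] <= cycle
                for p in PREDS[i])),
            key=lambda i: (-PRIO[i], i))
        bundle = []
        if ready:
            first = ready[0]
            # Pairable insns that are only waiting on latency, not on issue.
            pending_partners = [
                i for i in PRIO
                if i not in issued and i not in ready and i not in SOLO
                and all(p in issued for p in PREDS[i])]
            hold = (delay_lonely_pairable and len(ready) == 1
                    and first not in SOLO and bool(pending_partners))
            if not hold:
                bundle.append(first)
                if first not in SOLO:
                    # Try to fill the second issue slot with a pairable insn.
                    for other in ready[1:]:
                        if other not in SOLO:
                            bundle.append(other)
                            break
        for insn in bundle:
            issued[insn] = cycle
        if bundle:
            bundles.append((cycle, bundle))
        cycle += 1
    return bundles


print(schedule(delay_lonely_pairable=False))
print(schedule(delay_lonely_pairable=True))
```

In this toy model the greedy run issues 38, 41, 40, 43, 44 in five separate bundles, while the delaying run issues 38, 40, then 43 paired with 41, then 44. Both runs issue the jump on cycle 6, so the delay saves a bundle rather than a cycle here.
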
> >>
> >> Cheers,
> >>
> >> Paulo Matos
> >>
> >
>
