"Bingfeng Mei" <b...@broadcom.com> wrote on 30/01/2009 14:44:01:

> Hello,
> I try to make modulo scheduling work more efficiently for our VLIW
target. I
> found one serious issue that prevents current SMS algorithm from
achieving
> high IPC is so-called "transitive closure" problem, where scheduling
window is
> only calculated using direct predecessors and successors. Because SMS is
not
> an iterative algorithm, this may cause failures in finding a valid
schedule.

Agreed.

> Without splitting rows, some simple loops just cannot be scheduled not
matter
> how big the II is. With splitting rows, schedule can be found, but only
at
> bigger II.

It may happen that even splitting rows will not help, e.g. when we
repeatedly end up with a node having negative sched window.

> GCC wiki (http://gcc.gnu.org/wiki/SwingModuloScheduling) lists this
> as a TODO. Is there any work going on about this issue

No, not to my knowledge. We had some testcase where this showed up, hence
its appearance in the TODO, but afaicr some change caused it to disappear.

> (the last wiki update
> was one year ago)? If no one is working on it, I plan to do it. My idea
is to
> use the MinDist algorithm described in B. Rau's classic paper "iterative
> modulo
scheduling" (http://www.hpl.hp.com/techreports/94/HPL-94-115.html). The
> same algorithm can also be used to compute better RecMII. The biggest
concern
> is complexity of computing MinDist matrix, which is O(N^3). N is number
of
> nodes in the loop. I remember somewhere GCC coding guide says "never
write
> quadratic algorithm" :-) Is this an absolute requirement? If yes, I will
keep
> it as our target-specific code (we are less concerned about compilation
time).
> Otherwise, I will try to make it more generic to see if it can make into
> mainline in 4.5. Any comments?
>

The problem appears only when the DDG is cyclic, and for every cycle in the
DDG, the problem may arise when trying to schedule the last node of the
cycle, which has both predecessors and successors already scheduled. So you
might try to connect only each such predecessor to every such successor
with a transitive arc, to ensure that this last node will have a non-empty
scheduling window. But this might not suffice; you may eventually need to
wire (nearly) every pair of nodes in the strongly connected component. If
this happens, you'd be better off with a dense graph representation
(incidence matrix) than the current sparse one (adjaceny lists).

An example would help see things more clearly. If you have a (small :) DDG
demonstrating the need for transitive arcs, I'd be happy to have a look and
advise what should be done.

Ayal.

> Cheers,
> Bingfeng Mei
>
> Broadcom UK
>
>

Reply via email to