"Bingfeng Mei" <b...@broadcom.com> wrote on 30/01/2009 14:44:01:
> Hello, > I try to make modulo scheduling work more efficiently for our VLIW target. I > found one serious issue that prevents current SMS algorithm from achieving > high IPC is so-called "transitive closure" problem, where scheduling window is > only calculated using direct predecessors and successors. Because SMS is not > an iterative algorithm, this may cause failures in finding a valid schedule. Agreed. > Without splitting rows, some simple loops just cannot be scheduled not matter > how big the II is. With splitting rows, schedule can be found, but only at > bigger II. It may happen that even splitting rows will not help, e.g. when we repeatedly end up with a node having negative sched window. > GCC wiki (http://gcc.gnu.org/wiki/SwingModuloScheduling) lists this > as a TODO. Is there any work going on about this issue No, not to my knowledge. We had some testcase where this showed up, hence its appearance in the TODO, but afaicr some change caused it to disappear. > (the last wiki update > was one year ago)? If no one is working on it, I plan to do it. My idea is to > use the MinDist algorithm described in B. Rau's classic paper "iterative > modulo scheduling" (http://www.hpl.hp.com/techreports/94/HPL-94-115.html). The > same algorithm can also be used to compute better RecMII. The biggest concern > is complexity of computing MinDist matrix, which is O(N^3). N is number of > nodes in the loop. I remember somewhere GCC coding guide says "never write > quadratic algorithm" :-) Is this an absolute requirement? If yes, I will keep > it as our target-specific code (we are less concerned about compilation time). > Otherwise, I will try to make it more generic to see if it can make into > mainline in 4.5. Any comments? > The problem appears only when the DDG is cyclic, and for every cycle in the DDG, the problem may arise when trying to schedule the last node of the cycle, which has both predecessors and successors already scheduled. So you might try to connect only each such predecessor to every such successor with a transitive arc, to ensure that this last node will have a non-empty scheduling window. But this might not suffice; you may eventually need to wire (nearly) every pair of nodes in the strongly connected component. If this happens, you'd be better off with a dense graph representation (incidence matrix) than the current sparse one (adjaceny lists). An example would help see things more clearly. If you have a (small :) DDG demonstrating the need for transitive arcs, I'd be happy to have a look and advise what should be done. Ayal. > Cheers, > Bingfeng Mei > > Broadcom UK > >