Thank you very much for your detailed response!

> I suspect your machine description says that dependency between loads and
> multiply-add has zero latency, thus allowing the scheduler to place them into
> one instruction group.  Grep for various comments about tick_check_p function.
> In verbose scheduler dumps, there should be something like
>
> Expr 35 is not ready yet until cycle 2
> No best expr found!
> Finished a cycle.  Current cycle = 2

At a glance, when compiling without the -fsel-sched-pipelining flag
(but with -fselective-scheduling2), proper VLIW grouping is performed,
so I guess the dependency is not zero-latency, but I will investigate
the details. Increasing the dump verbosity and comparing the dumps
with ia64 will probably help.

> On the high level, yes.  In this particular example, pipelining of loads would
> not be possible for the following reasons:
> 1) speculative motion of loads with pre/post-increment is not implemented
> (ia64 backend disables auto-inc generation pass when sel-sched is enabled);

Is there a fundamental problem with pre/post-increment support in the
selective scheduling approach or is this something that might be
implemented in the future?

> 2) when pipelining loads, scheduler needs to transform them into
> control-speculative form (since loop epilogue is not generated, load on the
> very last iteration of the transformed loop may access unallocated memory).
> In other words, selective scheduler does not preserve number of instruction
> executions (pipelined instructions from original loop will be executed more
> times than number of loop iterations).
> Speculative loads are not supported by any mainline GCC target except ia64.

On my target it is always safe to perform loads, so I suppose I could
pretend to support speculative loads in order to get around this.
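To make point 2 concrete: in a pipelined loop with no epilogue, the load for
iteration i+1 is issued during iteration i, so the last iteration issues one
load beyond the end of the data. The C sketch below is only an illustration of
that loop shape, not what sel-sched actually emits; `load`, `load_count`,
`sum_original` and `sum_pipelined` are hypothetical names, and the
out-of-bounds access is counted rather than performed.

```c
/* Counts every "load instruction" executed by the two loop shapes. */
static int load_count;

/* Hypothetical stand-in for a load: records the execution and flags
   accesses past the end of the array instead of performing them. */
static int load(const int *a, int n, int i, int *out_of_bounds)
{
    load_count++;
    if (i >= n) {           /* would touch memory past the array */
        (*out_of_bounds)++;
        return 0;           /* value is never consumed by the loop */
    }
    return a[i];
}

/* Original loop: exactly n loads. */
static int sum_original(const int *a, int n, int *oob)
{
    int sum = 0;
    for (int i = 0; i < n; i++)
        sum += load(a, n, i, oob);
    return sum;
}

/* Pipelined shape without an epilogue: the load for iteration i+1 is
   hoisted into iteration i, so the final iteration loads a[n]. */
static int sum_pipelined(const int *a, int n, int *oob)
{
    int sum = 0;
    int next = load(a, n, 0, oob);       /* prologue: first load */
    for (int i = 0; i < n; i++) {
        int cur = next;
        next = load(a, n, i + 1, oob);   /* "speculative" load for next iteration */
        sum += cur;
    }
    return sum;
}
```

For a 4-element array both versions compute the same sum, but the pipelined one
executes 5 loads, one of them past the end. A target where that extra load can
never fault is exactly the case where treating every load as speculation-safe
would be harmless.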

/Markus
