Thank you very much for your detailed response!

> I suspect your machine description says that dependency between loads and
> multiply-add has zero latency, thus allowing the scheduler to place them into
> one instruction group. Grep for various comments about tick_check_p function.
> In verbose scheduler dumps, there should be something like
>
> Expr 35 is not ready yet until cycle 2
> No best expr found!
> Finished a cycle. Current cycle = 2
At a glance, when compiling without the -fsel-sched-pipelining flag (but with -fselective-scheduling2), proper VLIW grouping is performed, so I guess the dependency does not have zero latency, but I will try to investigate the details. Increasing the dump verbosity and comparing the dumps against ia64 will probably be helpful.

> On the high level, yes. In this particular example, pipelining of loads would
> not be possible for the following reasons:
> 1) speculative motion of loads with pre/post-increment is not implemented
> (ia64 backend disables auto-inc generation pass when sel-sched is enabled);

Is there a fundamental problem with pre/post-increment support in the selective scheduling approach, or is this something that might be implemented in the future?

> 2) when pipelining loads, scheduler needs to transform them into
> control-speculative form (since loop epilogue is not generated, load on the
> very last iteration of the transformed loop may access unallocated memory).
> In other words, selective scheduler does not preserve number of instruction
> executions (pipelined instructions from original loop will be executed more
> times than number of loop iterations).
> Speculative loads are not supported by any mainline GCC target except ia64.

On my target it is always safe to perform loads, so I suppose I could pretend to support speculative loads in order to get around this.

/Markus
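To make point 2 of the quoted reply concrete, here is a minimal Python model of a loop pipelined without an epilogue. It is not GCC code and not how sel-sched is implemented: the dot-product loop, the function names (`dot_original`, `dot_pipelined`, `load`, `load_speculative`), and the use of a Python list to stand in for memory are all illustrative assumptions. `load_speculative` only loosely mimics a control-speculative load by returning `None` instead of faulting.

```python
# Illustrative model only (hypothetical names, not GCC internals).
# Memory is a Python list; reading past its end raises IndexError,
# standing in for a load from unallocated memory.

def dot_original(a, b):
    """Original loop: each iteration loads a[i] and b[i], then multiply-adds."""
    acc = 0
    for i in range(len(a)):
        acc += a[i] * b[i]
    return acc

def load(mem, i):
    """Ordinary (non-speculative) load: faults on an out-of-range index."""
    return mem[i]

def load_speculative(mem, i):
    """Assumed control-speculative load: yields None instead of faulting."""
    return mem[i] if 0 <= i < len(mem) else None

def dot_pipelined(a, b, load_fn):
    """Loads for iteration i + 1 are hoisted into iteration i (no epilogue)."""
    n = len(a)
    if n == 0:
        return 0
    acc = 0
    # Prologue: issue the loads for iteration 0.
    x, y = load_fn(a, 0), load_fn(b, 0)
    for i in range(n):
        # Kernel: keep the values loaded for iteration i ...
        cur_x, cur_y = x, y
        # ... and issue the loads for iteration i + 1. Without an epilogue
        # this also runs when i == n - 1, reading index n (one load too many).
        x, y = load_fn(a, i + 1), load_fn(b, i + 1)
        acc += cur_x * cur_y
    return acc
```

With `load`, the last kernel iteration reads index n and raises `IndexError`; with `load_speculative`, the extra values are loaded but never consumed, so the result matches `dot_original`. This is the sense in which the pipelined loads execute once more than the original loop's iteration count.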