Is there any specific reason why none of the Graphite loop optimizations (loop-block, loop-interchange, loop-strip-mine, loop-jam) are enabled with -O3 or -Ofast?
I assume enabling them there would make them much more widely used. Perhaps this would be something to consider for 5.0? -Andi