------- Comment #37 from bonzini at gnu dot org 2006-08-07 06:19 -------
I don't see how the last fmul[sl] can be removed without increasing code size. The only way to fix it would be to change the machine description to say that "this processor does not like FP operations with a memory operand". With a peephole, this is as good as we can get it: the last fmul is not coupled with an "fld %st" because it consumes the stack entry. See comment #30, where there is still an "fmull b".
Can you please try re-running the tests? It takes skill^W^W seems quite weird to have a 100x slow-down, especially since my tests were run on a similar Prescott (P4e). It would also be interesting to re-run your code generator with a compiler built from svn trunk. If that gives higher performance, I guess you'd be satisfied even if it comes from a different kernel.
Also, I strongly believe that you should implement vectorization, or at least find out *why* GCC does not vectorize your code. It may simply be that GCC has no guarantee about the alignment of the data.

--
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=27827
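To illustrate the alignment point, here is a minimal sketch. It assumes the kernel works on fixed-size global `double` arrays. The names `N`, `a`, `b`, `c`, `scale_add`, and `is_16_byte_aligned` are hypothetical and do not come from the original code generator. Declaring the arrays with `__attribute__((aligned(16)))` makes explicit that they start on an SSE vector boundary. When the same data arrives only through plain pointer parameters, GCC generally cannot assume this, which is one plausible reason for a missed vectorization.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical problem size, chosen only for illustration. */
#define N 1024

/* Arrays explicitly aligned to 16 bytes (the SSE vector width), so the
   compiler knows every access a[i], b[i], c[i] with i % 2 == 0 starts a
   16-byte-aligned pair of doubles. */
static double a[N] __attribute__((aligned(16)));
static double b[N] __attribute__((aligned(16)));
static double c[N] __attribute__((aligned(16)));

/* Hypothetical helper: returns nonzero if p lies on a 16-byte boundary. */
int is_16_byte_aligned(const void *p)
{
    return ((uintptr_t)p % 16) == 0;
}

/* Element-wise multiply-add over the aligned arrays: a simple loop with no
   dependences between iterations, i.e. a candidate for the vectorizer. */
void scale_add(double k)
{
    for (size_t i = 0; i < N; i++)
        c[i] = a[i] * k + b[i];
}
```

On x86 the loop can only be vectorized for `double` when SSE2 is enabled, e.g. `gcc -O2 -ftree-vectorize -msse2`. On recent GCC releases, `-fopt-info-vec-missed` reports loops the vectorizer gave up on (older 4.x releases used `-ftree-vectorizer-verbose=N` instead), which is one way to find out *why* a given loop is not vectorized.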