Worse code generation for FPU on versions after 6

katsunori.kumatani at gmail dot com Tue, 21 Feb 2017 04:28:07 -0800

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79593


--- Comment #10 from Katsunori Kumatani <katsunori.kumatani at gmail dot com> ---
(In reply to Uroš Bizjak from comment #7)
> 
> This patch allows conversion from (reg->reg) move to (const->reg) when
> appropriate constant can be determined from REG_EQUIV/REG_EQUAL note. It
> also handles float_extend cases. Patched gcc now generates:
> 

Hi, thank you for the patch. The output does look better, but the sequence I
mentioned at the beginning is still there:

        fld     %st(0)
        flds    global_data+4
        fxch    %st(3)
        fcomip  %st(2), %st
        fstp    %st(1)

Shouldn't this instead be:

        flds    global_data+4
        fxch    %st(2)
        fcomip  %st(1), %st

GCC 5 generates the latter sequence, which is better code (optimal in this
small case). This seems to be related to floating point type conversions. On
x87 those conversions are implicit, because flds already widens the value to
the register's extended format when it loads it, so there is no point in
emitting extra instructions for them. (That is one of the few good things
about x87, apart from the automatic high precision.)
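
As a small illustration of why the conversion should cost nothing, here is a
minimal C sketch. It is not part of the original test case, and the function
names widen and round_trip_is_exact are made up for this example. It relies
only on the C rule that every float value is exactly representable as a long
double, so widening never changes the value; on x87 that widening happens as
part of the flds load itself.

```c
/* Illustrative only: names are hypothetical, not from the bug's test case. */

/* Widen a float to long double. The conversion is exact because every
   float value is representable as a long double. */
long double widen(float f)
{
    return f;   /* implicit float -> long double conversion */
}

/* Returns 1 if the float survives a round trip through long double
   unchanged (expected for every non-NaN float). */
int round_trip_is_exact(float f)
{
    long double wide = widen(f);
    return (float)wide == f;
}
```

Since no rounding is involved, a compiler targeting x87 has nothing to compute
for this conversion beyond the load it needs anyway.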

For example, if you change my code to use 'float' instead of 'long double' and
change the constants to float literals, GCC 6 generates the same code as GCC 5
in this respect. Can you look into it? I'm sorry that I'm not familiar with GCC
internals at all.
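
The experiment described above could be set up roughly as follows. This is a
sketch, not the original test case: the function exceeds_limit, the macro
FP_TYPE and the contents of global_data are hypothetical, chosen only so that
a float is loaded from global_data+4 and compared against a wider value. It
has not been checked that this exact snippet reproduces the extra
instructions.

```c
/* Hypothetical stand-in for the reporter's code. Build with
   -DFP_TYPE=float to try the 'float' variant. */
#ifndef FP_TYPE
#define FP_TYPE long double
#endif
typedef FP_TYPE real_t;

/* Assumed layout: a float array, so element 1 lives at global_data+4. */
float global_data[2] = { 0.5f, 2.0f };

/* Compare a scaled value against a float loaded from memory. */
int exceeds_limit(real_t x, real_t scale)
{
    real_t limit = global_data[1];  /* implicit float -> real_t conversion */
    return x * scale > limit;
}
```

Compiling it twice with something like gcc -Ofast -m32 -mfpmath=387 -S, once
as is and once with -DFP_TYPE=float, gives two assembly listings to compare.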

That's why I thought it was a regression in how floating point types are
"converted". It only happens with GCC 6 and later.

If I change the floating point types to 'float', GCC 6 generates this at the
beginning:

        flds    global_data+4
        fxch    %st(2)
        fcomip  %st(1), %st

This is exactly the same as GCC 5, but GCC 5 generates it even with 'long
double' and when loading floats. So the GCC 6 output for 'long double' is
worse for no apparent reason.


Yes, using floats is a workaround for this small example, because none of the
registers get spilled to the stack. However, in real code I want to use 'long
double', because values will likely have to be spilled to the stack. One of
the main reasons I'm using x87 in this particular code is that extra
precision.
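
The precision concern can be sketched in C as follows. This is an
illustration, not code from the bug report: the function names are made up,
and a volatile variable is used as a stand-in for a stack spill slot, on the
assumption that a spilled 'float' is stored in a 4-byte slot and a spilled
'long double' in a full-width one.

```c
/* Illustrative only: volatile stores simulate spills to the stack. */

/* A value that needs more mantissa bits than a float has: 1 + 2^-30. */
long double sample_value(void)
{
    return 1.0L + 1.0L / 1073741824.0L;
}

/* Simulated spill through a float-sized slot: rounds to single precision. */
long double spill_as_float(long double v)
{
    volatile float slot = (float)v;
    return slot;
}

/* Simulated spill through a long double slot: keeps the value intact. */
long double spill_as_long_double(long double v)
{
    volatile long double slot = v;
    return slot;
}
```

With the sample value, the float-sized slot returns exactly 1.0 while the long
double slot returns the original value, which is the kind of loss that makes
'float' unsuitable once values no longer stay in registers.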

It's not that "long double" generates bad code. I believe it is simply the
conversion from "float" to "long double" that does it. It's as if GCC 6 doesn't
know that float and long double are the same thing in an x87 register.



(In reply to Jakub Jelinek from comment #5)
> With -Ofast -m32 -mfpmath=i387 it changed with r244816.
> 
> Why do you care about 32-bit code performance (and especially x87)?  That is
> pure legacy, people who care about performance should be using -m64 or at
> least -msse2 -mfpmath=sse (unless you need long double).
> 
> What I said still applies, the reason why the combiner doesn't optimize it
> is that it has more than one use, which only goes away at regstack time and
> there is no peephole afterwards that could improve it.

Please check my reply above, which has more information. I hope it helps to
clarify the issue and why it is a regression (compared to GCC 5 code
generation). Anyway, I didn't mean it the way you implied, nor that it's a
huge performance penalty. It's just worse code than GCC 5 produces, so I
figured it's a regression in code generation.

I actually compile this as *both* -m32 and -m64 in the end (I use 32-bit code
here only because I find it easier to read), to interface with Wine (so
technically it is a specialized component for both Unix and Windows).
Obviously, I want to use long double for higher precision during internal
calculations here, so 387 code will be generated even in 64-bit mode.

But the "conversion" from a floating point type to long double seems to
generate extra instructions that serve no purpose. GCC 5 doesn't do this,
which is why I called it a regression rather than an enhancement request. I
hope that makes sense!

[Bug target/79593] [6/7 Regression] Poor/Worse code generation for FPU on versions after 6
