https://gcc.gnu.org/bugzilla/show_bug.cgi?id=79593
--- Comment #10 from Katsunori Kumatani <katsunori.kumatani at gmail dot com> ---
(In reply to Uroš Bizjak from comment #7)
>
> This patch allows conversion from (reg->reg) move to (const->reg) when
> appropriate constant can be determined from REG_EQUIV/REG_EQUAL note. It
> also handles float_extend cases. Patched gcc now generates:
>

Hi, thank you for the patch. The output does look better, but the problem I described at the beginning is still there:

fld %st(0)
flds global_data+4
fxch %st(3)
fcomip %st(2), %st
fstp %st(1)

Shouldn't this instead be:

flds global_data+4
fxch %st(2)
fcomip %st(1), %st

GCC 5 does generate the latter sequence, so GCC 5 produces better code here (optimal in this small case).

In fact, this seems to have to do with floating-point type conversions. On x87 the conversions are implicit (flds loads a float straight into an extended-precision register), so there is no point in generating extra instructions for them. That is one of the few good things about x87, apart from the automatic high precision.

For example, if you change my code to use 'float' instead of 'long double' and change the constants to float literals, GCC generates code like GCC 5 in this respect. Can you look into it? I'm sorry, I'm not familiar with GCC internals at all; that's why I suspected a regression somewhere in how floating-point types are "converted".

It only happens from GCC 6 onwards. If I change the floating-point types to 'float', GCC 6 generates this sequence at the beginning:

flds global_data+4
fxch %st(2)
fcomip %st(1), %st

This is exactly the same as GCC 5, but GCC 5 produces it even with 'long double' and when loading floats. GCC 6's 'long double' version is obviously worse code for some reason.

Yes, using floats is a workaround for this small example, because none of the registers get spilled to the stack. In real code, however, I want to use 'long double' because values will likely have to be spilled to the stack, and keeping that extra precision is one of the main reasons I'm using x87 for this particular code.

It's not that 'long double' generates bad code. It's simply the conversion from 'float' to 'long double' that causes it, I believe. It's as if GCC 6 doesn't know that float and long double are the same thing in an x87 register.

(In reply to Jakub Jelinek from comment #5)
> With -Ofast -m32 -mfpmath=i387 it changed with r244816.
>
> Why do you care about 32-bit code performance (and especially x87)? That is
> pure legacy, people who care about performance should be using -m64 or at
> least -msse2 -mfpmath=sse (unless you need long double).
>
> What I said still applies, the reason why the combiner doesn't optimize it
> is that it has more than one use, which only goes away at regstack time and
> there is no peephole afterwards that could improve it.

Please see my reply above for more information; I hope it helps clarify the issue and why it is a regression compared to GCC 5 codegen.

Anyway, I didn't mean it the way you implied, nor that it's a huge performance penalty. It's just that the code is worse than what GCC 5 produced, so I considered it a code generation regression.

In the end I actually compile this as *both* -m32 and -m64, to interface with Wine (so it is technically a specialized component that is part Unix and part Windows). I only look at the 32-bit output because I find it easier to read. Obviously, I want to use long double for higher precision during internal calculations here, so 387 code will be generated even in 64-bit mode. But the "conversion" from another floating-point type to long double seems to generate extra instructions that serve no purpose. GCC 5 doesn't do this, which is why I called it a regression rather than an enhancement. I hope that makes sense!
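A hypothetical C sketch of the pattern discussed above (a 'float' loaded from memory and compared against 'long double' values) may help make the conversion point concrete. It is not the test case from this bug: the global_data array, its contents and the helper names are assumptions, chosen only so that global_data[1] sits at byte offset 4 like the 'flds global_data+4' operand. The sketch makes no claim about which instruction sequence any GCC version emits for it.

```c
/* Hypothetical sketch, NOT the reporter's test case: it only shows a float
   being implicitly widened to long double before a comparison. */

/* Assumed layout: two floats, so global_data[1] is at byte offset 4. */
float global_data[2] = { 1.5f, 0.1f };

/* The usual arithmetic conversions widen global_data[1] to long double.
   On x87 targets the widening itself is free: flds loads a 32-bit float
   directly into an 80-bit register, so no separate conversion is needed. */
int exceeds(long double x)
{
    return x > global_data[1];
}

/* float -> long double is exact, so narrowing back recovers the original
   value (true for every non-NaN float). */
int widening_is_exact(float f)
{
    long double wide = f;
    return (float)wide == f;
}
```

Called from any main(), widening_is_exact() returns 1 for every non-NaN input, and exceeds(0.1L) returns 0 because 0.1f, once widened, is slightly larger than 0.1L.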