https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111522
Kewen Lin <linkw at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |INVALID
                 CC|                            |rguenth at gcc dot gnu.org
             Status|WAITING                     |RESOLVED

--- Comment #12 from Kewen Lin <linkw at gcc dot gnu.org> ---
(In reply to Mathieu Malaterre from comment #11)
> Here is a dead simple reduced version:
>
> ```
> % cat pr111522.cc
> #include <iostream>
> #include <cstring>
> #pragma GCC push_options
> #pragma GCC target "cpu=power10"
> float BitCast(int in) {
>   float out;
>   memcpy(&out, &in, sizeof(out));
>   return out;
> }
> float kNearOneF = BitCast(1065353215);
> #pragma GCC pop_options
> int main() { std::cout << kNearOneF << std::endl; }
> ```
>
> You can compare:
>
> g++ -o works -O2 pr111522.cc -Wall -Wextra -Werror -Wfatal-errors
>
> vs
>
> g++ -o fails -flto -O2 pr111522.cc -Wall -Wextra -Werror -Wfatal-errors
>
> For some reason, `-flto` rightfully generates a `xxspltidp` instruction:
>
> (gdb) display/i $pc
> 1: x/i $pc
> => 0x100000940 <_Z7BitCasti.constprop.0>:       xxspltidp vs1,1065353215
>
> I am not sure I understand the behavior of the non LTO case now...

I think this is a test issue. The given source code asks for the function BitCast to be compiled with -mcpu=power10, so it is valid to generate power10 insns for it and for its specialized versions.
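If a test wants to keep a power10-targeted function but may run on an older CPU, one possible approach is to guard the call at run time. The following is only a minimal sketch, assuming GCC on 64-bit PowerPC with `__builtin_cpu_supports("arch_3_1")` available. The names `BitCastBaseline`, `BitCastP10`, `BitCastDispatch` and the macro `SKETCH_HAS_P10_TARGET` are hypothetical and not part of the reported test case. On any other compiler or target the sketch simply falls back to the baseline function.

```cpp
#include <cstring>

// Hypothetical macro for this sketch: set only when building with GCC
// (not clang) for 64-bit PowerPC, where the target pragma below applies.
#if defined(__GNUC__) && !defined(__clang__) && defined(__powerpc64__)
#define SKETCH_HAS_P10_TARGET 1
#else
#define SKETCH_HAS_P10_TARGET 0
#endif

// Baseline version, compiled with the default -mcpu.
float BitCastBaseline(int in) {
  float out;
  std::memcpy(&out, &in, sizeof(out));
  return out;
}

#if SKETCH_HAS_P10_TARGET
#pragma GCC push_options
#pragma GCC target "cpu=power10"
// Same body, but the compiler may use power10 insns here and in any
// specialized versions it derives from this function.
float BitCastP10(int in) {
  float out;
  std::memcpy(&out, &in, sizeof(out));
  return out;
}
#pragma GCC pop_options
#endif

// Dispatcher compiled with the default target: the power10 version is
// only reached when the running CPU reports ISA 3.1 support.
float BitCastDispatch(int in) {
#if SKETCH_HAS_P10_TARGET
  if (__builtin_cpu_supports("arch_3_1")) {
    return BitCastP10(in);
  }
#endif
  return BitCastBaseline(in);
}
```

With this layout a global such as `float kNearOneF = BitCastDispatch(1065353215);` would be initialized through the dispatcher rather than by calling the power10 version directly.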
Without LTO, no power10 insn helps the general BitCast, so the generated insns look like:

0000000010000b10 <_Z7BitCasti>:
    10000b10:   c6 07 69 78     rldicr  r9,r3,32,31
    10000b14:   66 01 29 7c     mtfprd  f1,r9
    10000b18:   2c 0d 20 f0     xscvspdpn vs1,vs1
    10000b1c:   20 00 80 4e     blr

while with LTO, function versioning is able to create one specialized function with the argument fixed to 1065353215, and the newly created function is able to leverage a power10 insn, so we have:

// specialized with const argument propagate
0000000010000840 <_Z7BitCasti.constprop.0>:
    10000840:   7f 3f 00 05     xxspltidp vs1,1065353215
    10000844:   ff ff 24 80
    10000848:   20 00 80 4e     blr

while the global variable initialization still uses power8 insns:

0000000010000940 <_GLOBAL__sub_I__Z7BitCasti>:
    10000940:   02 10 40 3c     lis     r2,4098
    10000944:   00 7f 42 38     addi    r2,r2,32512
    10000948:   a6 02 08 7c     mflr    r0
    1000094c:   10 00 01 f8     std     r0,16(r1)
    10000950:   e1 ff 21 f8     stdu    r1,-32(r1)
    10000954:   dd fe ff 4b     bl      10000830 <00000184.long_branch.184:6>
    10000958:   18 00 41 e8     ld      r2,24(r1)
    1000095c:   20 00 21 38     addi    r1,r1,32
    10000960:   00 00 00 60     nop
    10000964:   10 00 01 e8     ld      r0,16(r1)
    10000968:   5c 81 22 d0     stfs    f1,-32420(r2)
    1000096c:   a6 03 08 7c     mtlr    r0
    10000970:   20 00 80 4e     blr

If we specify -mcpu=power10 -flto, we can see that _GLOBAL__sub_I__Z7BitCasti directly adopts p10 insns (this implicitly indicates that, with the default -mcpu=power8, inlining considers it unsafe to inline _Z7BitCasti.constprop.0):

0000000010000900 <_GLOBAL__sub_I__Z7BitCasti>:
    10000900:   7f 3f 00 05     xxspltidp vs0,1065353215
    10000904:   ff ff 04 80
    10000908:   01 00 10 06     pstfs   f0,128852       # 1002005c <kNearOneF>
    1000090c:   54 f7 00 d0
    10000910:   20 00 80 4e     blr
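For readers wondering about the immediate 1065353215 in the listings above: it is 0x3F7FFFFF, one less than the bit pattern of 1.0f, which matches the name kNearOneF in the test case. The following is a small sketch of that arithmetic, assuming a 32-bit IEEE-754 float. The helpers `BitsOf` and `IsJustBelowOne` are hypothetical names introduced only for illustration.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::int32_t),
              "sketch assumes a 32-bit float");
static_assert(1065353215 == 0x3F7FFFFF,
              "the immediate is 0x3F7FFFFF in hex");

// Same memcpy-based bit cast as in the reduced test case.
float BitCast(std::int32_t in) {
  float out;
  std::memcpy(&out, &in, sizeof(out));
  return out;
}

// Hypothetical helper: the inverse cast, float to its raw 32-bit pattern.
std::int32_t BitsOf(float in) {
  std::int32_t out;
  std::memcpy(&out, &in, sizeof(out));
  return out;
}

// Hypothetical helper: true if v is the largest float strictly below 1.0f.
bool IsJustBelowOne(float v) {
  return v == std::nextafter(1.0f, 0.0f);
}
```

Under these assumptions, `BitsOf(1.0f)` is 1065353216 and `IsJustBelowOne(BitCast(1065353215))` holds, so all three code sequences above are expected to store the same value into kNearOneF.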