https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111522

Kewen Lin <linkw at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
         Resolution|---                         |INVALID
                 CC|                            |rguenth at gcc dot gnu.org
             Status|WAITING                     |RESOLVED

--- Comment #12 from Kewen Lin <linkw at gcc dot gnu.org> ---
(In reply to Mathieu Malaterre from comment #11)
> Here is a dead simple reduced version:
> 
> ```
> % cat pr111522.cc
> #include <iostream>
> #include <cstring>
> #pragma GCC push_options
> #pragma GCC target "cpu=power10"
> float BitCast(int in) {
>   float out;
>   memcpy(&out, &in, sizeof(out));
>   return out;
> }
> float kNearOneF = BitCast(1065353215);
> #pragma GCC pop_options
> int main() { std::cout << kNearOneF << std::endl; }
> ```
> 
> You can compare:
> 
> g++ -o works -O2 pr111522.cc -Wall -Wextra -Werror -Wfatal-errors
> 
> vs
> 
> g++ -o fails -flto -O2 pr111522.cc -Wall -Wextra -Werror -Wfatal-errors
> 
> For some reason, `-flto` rightfully generates a `xxspltidp` instruction:
> 
> (gdb) display/i $pc
> 1: x/i $pc
> => 0x100000940 <_Z7BitCasti.constprop.0>:       xxspltidp vs1,1065353215
> 
> I am not sure I understand the behavior of the non LTO case now...

I think this is a test issue. The given source code asks for the function
BitCast to be compiled with -mcpu=power10, so it's valid to generate power10
insns for it and for any specialized clones of it.

Without LTO, no power10 insn helps the general BitCast, so the generated insns
look like:

0000000010000b10 <_Z7BitCasti>:
    10000b10:   c6 07 69 78     rldicr  r9,r3,32,31
    10000b14:   66 01 29 7c     mtfprd  f1,r9
    10000b18:   2c 0d 20 f0     xscvspdpn vs1,vs1
    10000b1c:   20 00 80 4e     blr

With LTO, however, function versioning (IPA-CP, hence the .constprop suffix)
is able to create a specialized clone with the argument fixed to 1065353215,
and that clone can leverage a power10 insn, so we have:

// specialized via constant argument propagation
0000000010000840 <_Z7BitCasti.constprop.0>:
    10000840:   7f 3f 00 05     xxspltidp vs1,1065353215
    10000844:   ff ff 24 80
    10000848:   20 00 80 4e     blr

whereas the global variable initialization still uses power8 insns and calls
the clone out of line:

0000000010000940 <_GLOBAL__sub_I__Z7BitCasti>:
    10000940:   02 10 40 3c     lis     r2,4098
    10000944:   00 7f 42 38     addi    r2,r2,32512
    10000948:   a6 02 08 7c     mflr    r0
    1000094c:   10 00 01 f8     std     r0,16(r1)
    10000950:   e1 ff 21 f8     stdu    r1,-32(r1)
    10000954:   dd fe ff 4b     bl      10000830 <00000184.long_branch.184:6>
    10000958:   18 00 41 e8     ld      r2,24(r1)
    1000095c:   20 00 21 38     addi    r1,r1,32
    10000960:   00 00 00 60     nop
    10000964:   10 00 01 e8     ld      r0,16(r1)
    10000968:   5c 81 22 d0     stfs    f1,-32420(r2)
    1000096c:   a6 03 08 7c     mtlr    r0
    10000970:   20 00 80 4e     blr

If we specify -mcpu=power10 -flto, we can see that _GLOBAL__sub_I__Z7BitCasti
directly adopts p10 insns (which implies that, with the default -mcpu=power8,
the inliner considers it unsafe to inline _Z7BitCasti.constprop.0 into it):

0000000010000900 <_GLOBAL__sub_I__Z7BitCasti>:
    10000900:   7f 3f 00 05     xxspltidp vs0,1065353215
    10000904:   ff ff 04 80
    10000908:   01 00 10 06     pstfs   f0,128852       # 1002005c <kNearOneF>
    1000090c:   54 f7 00 d0
    10000910:   20 00 80 4e     blr
