https://gcc.gnu.org/bugzilla/show_bug.cgi?id=102107
Bug ID: 102107
Summary: protocol register (r12) corrupted before a tail call
Product: gcc
Version: 11.2.1
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: pc at us dot ibm.com
Target Milestone: ---

Created attachment 51367: preprocessed source (large)
https://gcc.gnu.org/bugzilla/attachment.cgi?id=51367&action=edit

I've been working on an effort to improve Python performance, and hit an issue when running with a libpython.so that was built with "-mcpu=power10". The problem appears to be that register 12 is not correctly set up (and preserved) before calling into a dynamically loaded, non-PCrel Python module in the form of a shared object.

GDB shows the following instruction stream:

=> 0x7ffff7d25014 <do_mkvalue+1924>: ld r12,0(r9)
=> 0x7ffff7d25018 <do_mkvalue+1928>: addi r1,r1,112
r12 0x7fffe921af60 140737104686944
=> 0x7ffff7d2501c <do_mkvalue+1932>: std r10,0(r30)
=> 0x7ffff7d25020 <do_mkvalue+1936>: ld r3,8(r9)
=> 0x7ffff7d25024 <do_mkvalue+1940>: ld r9,0(r31)
=> 0x7ffff7d25028 <do_mkvalue+1944>: ld r29,-24(r1)
=> 0x7ffff7d2502c <do_mkvalue+1948>: ld r30,-16(r1)
=> 0x7ffff7d25030 <do_mkvalue+1952>: mtctr r12
=> 0x7ffff7d25034 <do_mkvalue+1956>: lwz r12,8(r1)
r12 0x4000 16384
=> 0x7ffff7d25038 <do_mkvalue+1960>: addi r9,r9,1
=> 0x7ffff7d2503c <do_mkvalue+1964>: std r9,0(r31)
=> 0x7ffff7d25040 <do_mkvalue+1968>: ld r31,-8(r1)
=> 0x7ffff7d25044 <do_mkvalue+1972>: mtocrf 8,r12
=> 0x7ffff7d25048 <do_mkvalue+1976>: bctr
=> 0x7fffe921af60 <return_none>: addis r2,r12,4
=> 0x7fffe921af64 <return_none+4>: addi r2,r2,-12384
=> 0x7fffe921af68 <return_none+8>: nop
=> 0x7fffe921af6c <return_none+12>: ld r3,-32728(r2)

Program received signal SIGSEGV, Segmentation fault.
0x00007fffe921af6c in _Py_INCREF (op=<optimized out>) at ../Python-3.9.6/Include/object.h:408
408         op->ob_refcnt++;

The load at 0x7ffff7d25014 sets r12 to the address of the callee, return_none (0x7fffe921af60). The lwz at 0x7ffff7d25034 then overwrites r12 with the saved CR value (0x4000) just before the tail call (bctr) at 0x7ffff7d25048. The branch itself still reaches return_none, because CTR was loaded from r12 before the clobber, but return_none's global entry point computes its TOC pointer from r12 (addis r2,r12,4; addi r2,r2,-12384). r2 is therefore garbage, and the first TOC-relative load (ld r3,-32728(r2)) faults. I suspect some sort of instruction scheduling issue.

I've attached a rather large preprocessed C file. It's complicated to reduce because functions call other functions. I gave creduce a shot, but it's challenging (for me, at least) to craft a script that knows what to look for. I'll also attach the best I could get from creduce, but shield your eyes before looking at it.
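One possible "what to look for" criterion for creduce, taken directly from the trace: after "mtctr 12" and before the next bctr, some instruction writes r12 again (in do_mkvalue this is the lwz that reloads the CR save word). The following is a minimal sketch in C of such a check over GCC's assembly output. The function name r12_clobbered_before_bctr and its helpers are hypothetical. It assumes GCC's default numeric register syntax ("12") or -mregnames syntax ("r12", "%r12"). It also uses a deliberately crude heuristic: the first operand of any instruction other than a store ("st*") or move-to ("mt*") is treated as that instruction's destination.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: does this operand name GPR 12?  Accepts GCC's
   default numeric form ("12") and the -mregnames forms ("r12", "%r12"). */
static bool is_r12(const char *op, size_t len)
{
    if (len > 0 && op[0] == '%') { op++; len--; }
    if (len > 0 && op[0] == 'r') { op++; len--; }
    return len == 2 && op[0] == '1' && op[1] == '2';
}

/* Split one assembly line (len bytes, no newline) into its mnemonic and
   first operand.  Returns false for blank lines, labels, directives and
   '#' comments. */
static bool parse_insn(const char *line, size_t len, char *mnem, size_t cap,
                       const char **op, size_t *op_len)
{
    size_t i = 0;
    while (i < len && isspace((unsigned char)line[i]))
        i++;
    if (i == len || line[i] == '.' || line[i] == '#')
        return false;

    size_t start = i;
    while (i < len && !isspace((unsigned char)line[i]))
        i++;
    size_t n = i - start;
    if (line[i - 1] == ':' || n >= cap)   /* label, or implausibly long */
        return false;
    memcpy(mnem, line + start, n);
    mnem[n] = '\0';

    while (i < len && (line[i] == ' ' || line[i] == '\t'))
        i++;
    start = i;
    while (i < len && line[i] != ',' && !isspace((unsigned char)line[i]))
        i++;
    *op = line + start;
    *op_len = i - start;
    return true;
}

/* Heuristic check: returns true if, after "mtctr 12" and before the next
   bctr/bctrl, some instruction appears to write r12.  The "first operand
   is the destination unless st* or mt*" rule holds for the loads and
   arithmetic in the report, but not for every POWER opcode. */
bool r12_clobbered_before_bctr(const char *text)
{
    char mnem[32];
    const char *op;
    size_t op_len;
    bool armed = false;      /* saw "mtctr 12", no branch yet */
    bool clobbered = false;  /* r12 written since that mtctr */

    assert(text != NULL);
    while (*text) {
        const char *eol = strchr(text, '\n');
        size_t len = eol ? (size_t)(eol - text) : strlen(text);

        if (parse_insn(text, len, mnem, sizeof mnem, &op, &op_len)) {
            if (strcmp(mnem, "mtctr") == 0) {
                armed = is_r12(op, op_len);
                clobbered = false;
            } else if (strcmp(mnem, "bctr") == 0 || strcmp(mnem, "bctrl") == 0) {
                if (armed && clobbered)
                    return true;
                armed = clobbered = false;
            } else if (armed && is_r12(op, op_len)
                       && strncmp(mnem, "st", 2) != 0
                       && strncmp(mnem, "mt", 2) != 0) {
                clobbered = true;
            }
        }
        text += len;
        if (*text == '\n')
            text++;
    }
    return false;
}
```

To use this as a creduce interestingness test, a small driver (not shown) would compile the candidate to assembly with -S, using the same flags as the libpython build (at least -mcpu=power10). It would read the resulting .s file into memory and exit with status 0 when the function returns true, since creduce treats exit status 0 as "still interesting". Because the check is purely textual, it can also flag correct code that happens to match the pattern, so it is a starting point to refine rather than a definitive test for this bug.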