https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100799
--- Comment #17 from Surya Kumari Jangala <jskumari at gcc dot gnu.org> ---
I analysed the reduced test case specified in comment 15. In the .s file, the callee decrements r1 by 224, i.e., the callee’s frame size is 224. But there is an instruction in the callee that accesses the caller’s frame at (r1+272). At first glance this looks odd, even incorrect, but after further analysis I am not sure that it is incorrect.

If we look at the RTL dumps, the offset 272 is introduced in ‘reload’. ‘Insn 4’ stores into (r1+272).

‘Insn 4’ after vregs:
(insn 4 3 5 2 (set (reg/v/f:DI 177 [ arrayD.2714 ])
        (reg:DI 5 5 [ arrayD.2714 ])) "bug.f":1:23 675 {*movdi_internal64}
     (expr_list:REG_EQUIV (mem/f/c:DI (plus:DI (reg/f:DI 99 ap)
                (const_int 48 [0x30])) [3 arrayD.2714+0 S8 A64])
        (nil)))

‘Insn 4’ after IRA:
(insn 4 214 237 2 (set (reg/v/f:DI 177 [ arrayD.2714 ])
        (reg:DI 262)) "bug.f":1:23 675 {*movdi_internal64}
     (expr_list:REG_DEAD (reg:DI 262)
        (expr_list:REG_EQUIV (mem/f/c:DI (plus:DI (reg/f:DI 99 ap)
                    (const_int 48 [0x30])) [3 arrayD.2714+0 S8 A64])
            (nil))))

‘Insn 4’ after reload:
(insn 4 214 19 2 (set (mem/f/c:DI (plus:DI (reg/f:DI 1 1)
                (const_int 272 [0x110])) [3 arrayD.2714+0 S8 A64])
        (reg:DI 5 5 [262])) "bug.f":1:23 675 {*movdi_internal64}
     (expr_list:REG_EQUIV (mem/f/c:DI (plus:DI (reg/f:DI 99 ap)
                (const_int 48 [0x30])) [3 arrayD.2714+0 S8 A64])
        (nil)))

As we can see, during the vregs phase r5 is moved to pseudo r177, and r177 is equivalent to (ap+48). ‘ap’ (r99) is the base register for access to the arguments of the function. In the gcc code:

#define ARG_POINTER_REGNUM 99

During the vregs phase, not just r5 but all registers r3-r10 are moved to pseudo registers, and these pseudos are equivalent to (ap+‘offset’), with ‘offset’ starting at 32 for r3 and going up to 88 for r10. Note that ap points to the beginning of the callee frame, hence to access the parameter save area of the caller’s frame, 32 needs to be added to ap.
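The ap-relative offsets quoted above (32 for r3 up to 88 for r10) follow a simple pattern, which can be sketched as below. This is only an illustration of the arithmetic; `param_slot_offset` and the two constants are hypothetical names for this sketch, not GCC identifiers, and the 8-byte slot size is taken from the `S8` annotation in the RTL dumps.

```python
# Illustrative arithmetic only; these names are hypothetical, not GCC code.
PARAM_SAVE_AREA_OFFSET = 32  # r3's slot is at ap+32, as stated in the comment
SLOT_SIZE = 8                # one doubleword per slot (S8 in the RTL dumps)


def param_slot_offset(regno: int) -> int:
    """Return the ap-relative offset of the save slot for GPR argument rN."""
    if not 3 <= regno <= 10:
        raise ValueError("only r3-r10 carry integer arguments in this sketch")
    return PARAM_SAVE_AREA_OFFSET + SLOT_SIZE * (regno - 3)


if __name__ == "__main__":
    for regno in range(3, 11):
        print(f"r{regno}: ap+{param_slot_offset(regno)}")
```

For r5 this gives ap+48 and for r6 ap+56, matching the REG_EQUIV notes on the insns for arrayD.2714 and ldaD.2715.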
During LRA, in curr_insn_transform(), we make the equivalence substitution and change r177 to r1+272. (272 because r177 is equivalent to ap+48, and ap equals r1+224, so ap+48 = r1+272.)

The argument registers r3-r10 are saved because they need to be reused to pass parameters to functions called from the callee. But not all parameter registers are spilled to the stack. For example, r6 is saved in r24. We can see this after the “final” phase:

(insn 5 289 19 (set (reg/v/f:DI 24 %r24 [orig:178 ldaD.2715 ] [178])
        (reg:DI 6 %r6 [263])) "bug.f":1:23 675 {*movdi_internal64}
     (expr_list:REG_EQUIV (mem/f/c:DI (plus:DI (reg/f:DI 99 ap)
                (const_int 56 [0x38])) [6 ldaD.2715+0 S8 A64])
        (nil)))

I guess r5 had to be spilled to the stack because there were no free registers. Also, note that there is a load from (r1+272) in the reduced test case. This shows that the value in r5 is needed, and hence it has to be saved somewhere.

I ran the test case with the options: -mcpu=power8 -O2 -fPIC

If the -fPIC option is removed, we do not see any access to the caller’s frame in the generated assembly, although it still has instructions that save the parameter registers into other registers. I suppose the parameter registers did not have to be saved on the stack (i.e., in the caller’s parameter save area) because enough registers were available. That is, perhaps there is less register pressure without -fPIC.

After vregs (without -fPIC):
(insn 4 3 5 2 (set (reg/v/f:DI 177 [ arrayD.2714 ])
        (reg:DI 5 %r5 [ arrayD.2714 ])) "bug.f":1:23 675 {*movdi_internal64}
     (expr_list:REG_EQUIV (mem/f/c:DI (plus:DI (reg/f:DI 99 ap)
                (const_int 48 [0x30])) [3 arrayD.2714+0 S8 A64])

After reload (without -fPIC):
(insn 4 214 19 2 (set (reg/v/f:DI 17 %r17 [orig:177 arrayD.2714 ] [177])
        (reg:DI 5 %r5 [262])) "bug.f":1:23 675 {*movdi_internal64}
     (expr_list:REG_EQUIV (mem/f/c:DI (plus:DI (reg/f:DI 99 ap)
                (const_int 48 [0x30])) [3 arrayD.2714+0 S8 A64])
        (nil)))

To summarise, the reduced testcase seems to be correctly compiled.
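The substitution of ap+48 by r1+272 in the -fPIC case is just a rebase from ap to r1. A minimal sketch of that arithmetic follows; `ap_to_r1_offset` is a hypothetical helper for illustration, not the LRA code, and it assumes (as the analysis above states) that ap equals r1 plus the callee frame size once the frame has been allocated.

```python
# Illustrative only: a hypothetical helper, not what LRA actually executes.
def ap_to_r1_offset(ap_offset: int, callee_frame_size: int) -> int:
    """Rebase an ap-relative offset onto r1, assuming ap == r1 + frame size."""
    return callee_frame_size + ap_offset


if __name__ == "__main__":
    # Reduced test case: frame size 224, arrayD.2714 is equivalent to ap+48.
    print(ap_to_r1_offset(48, 224))  # r1-relative offset of the spill slot
```

Any r1-relative offset at or above the callee frame size therefore refers to memory in the caller’s frame, which is why the store to (r1+272) looks suspicious at first.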
So I shifted my focus to the original Fortran file dgebal.f in the OpenBLAS library. In dgebal.f too we have some instructions accessing the caller’s parameter save area. These are the interesting snippets of instructions from the assembly code:

// The original contents of r23 are spilled.
std %r23,-192(%r1)
// r3 is saved in r23
mr %r23,%r3
// frame is allocated
stdu %r1,-400(%r1)
// restore r3 contents before making call to lsame_. There are several calls to lsame_ and
// each time, r3 is restored.
mr %r3,%r23
bl lsame_
// save r23 to the stack because we are running out of registers and we need a free reg.
// Note that we are saving to the caller’s frame, into the parameter save area. And we
// are saving to (400+32), which is the location where r3 would have been spilled.
// This is correct because r23 holds the contents of r3.
std %r23,432(%r1)
// Use r23
li %r23,1
cmpwi %cr0,%r23,2
// Load back r23 as we need to pass parameter to lsame_
ld %r23,432(%r1)
mr %r3,%r23
bl lsame_
// Epilogue: restore r1 and the original contents of r23.
addi %r1,%r1,400
ld %r23,-192(%r1)
blr

The snippets above are for r3 being saved in r23. Other parameter registers are saved too; for example, r10 is copied to r30, which is later spilled into the caller’s parameter save area at (r1+488). 488 = 400+32+56 = 400+32+8*7, and this is the location for r10.

In the RTL dump, after the vregs phase, we can see registers r3 to r10 being saved to pseudo registers.

After vregs phase (r3 saved to pseudo r303, which is equivalent to ap+32):
(insn 2 43 3 2 (set (reg/v/f:DI 303 [ jobD.2712 ])
        (reg:DI 3 %r3 [ jobD.2712 ])) "dgebal.f":2:23 675 {*movdi_internal64}
     (expr_list:REG_EQUIV (mem/f/c:DI (plus:DI (reg/f:DI 99 ap)
                (const_int 32 [0x20])) [4 jobD.2712+0 S8 A64])

After reload (r303 is assigned to r23 and it is spilled at r1+432):
(insn 1620 931 1627 22 (set (mem/f/c:DI (plus:DI (reg/f:DI 1 %r1)
                (const_int 432 [0x1b0])) [4 jobD.2712+0 S8 A64])
        (reg/v/f:DI 23 %r23 [orig:303 jobD.2712 ] [303])) 675 {*movdi_internal64}
     (nil))

From the description of the bug (https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100799#c0), the issue occurs when FlexiBLAS is used with OpenBLAS: when the Fortran routine DGEEV calls another Fortran routine DGEBAL, the second parameter (’N’) is corrupted by the time control returns to DGEEV. (DGEBAL and DGEEV are routines in OpenBLAS.)

When FlexiBLAS is used, any call from one OpenBLAS routine to another OpenBLAS routine goes through FlexiBLAS. FlexiBLAS is a wrapper library written in C, while OpenBLAS is a Fortran library. So the call to DGEBAL first goes to the FlexiBLAS wrapper for dgebal, which reroutes the call to DGEBAL in OpenBLAS.

My suspicion is that the wrapper routine written in C does not allocate the optional parameter save area. I tried compiling the wrapper routine for dgebal with -O2 -fPIC, and with these options the frame size is only 32; the parameter save area is not being allocated. I think this results in the contents of DGEEV’s stack being corrupted when the Fortran routine DGEBAL writes into its caller’s parameter save area.

I am not sure which options FlexiBLAS is built with, but I suspect we do not allocate the parameter save area irrespective of the options used.

I wonder if saving the parameter registers r3-r10 to the parameter save area of the caller’s frame is specific to Fortran. In C, it looks like these registers are being saved in the callee frame itself.
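The suspected failure mode can be modelled with the same offsets. The sketch below is a simplified model, not a description of what GCC or FlexiBLAS actually does: the function names are hypothetical, the 32-byte wrapper frame is the size observed above, and the 96-byte frame (a 32-byte header plus eight 8-byte slots) is an assumption about what a frame containing the parameter save area would need at minimum.

```python
# Simplified model with hypothetical names; offsets are relative to the
# caller's stack pointer, i.e. the callee's ap as described in this comment.
PARAM_SAVE_AREA_OFFSET = 32
SLOT_SIZE = 8


def slot_offset(regno: int) -> int:
    """Offset of the save slot for rN (r3-r10) from the caller's r1."""
    return PARAM_SAVE_AREA_OFFSET + SLOT_SIZE * (regno - 3)


def slot_inside_caller_frame(regno: int, caller_frame_size: int) -> bool:
    """True if the whole 8-byte slot lies within the caller's own frame."""
    return slot_offset(regno) + SLOT_SIZE <= caller_frame_size


if __name__ == "__main__":
    # DGEBAL's frame is 400 bytes, so its r1-relative spill addresses are:
    print(400 + slot_offset(3), 400 + slot_offset(10))
    for size in (32, 96):  # observed wrapper frame vs. assumed frame with save area
        inside = [slot_inside_caller_frame(r, size) for r in range(3, 11)]
        print(size, inside)
```

With a 32-byte caller frame none of the eight slots fall inside the wrapper’s frame, so in this model a store to r1+432 in DGEBAL would land in the frame above the wrapper, which is consistent with the suspicion that DGEEV’s stack is being overwritten.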