https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72804
--- Comment #1 from Peter Bergner <bergner at gcc dot gnu.org> ---
Using the following patch, I'm able to clean up the first simple test case:

Index: rs6000.c
===================================================================
--- rs6000.c    (revision 239144)
+++ rs6000.c    (working copy)
@@ -7747,7 +7747,6 @@ reg_offset_addressing_ok_p (machine_mode
     case V2DFmode:
     case V2DImode:
     case V1TImode:
-    case TImode:
     case TFmode:
     case KFmode:
       /* AltiVec/VSX vector modes.  Only reg+reg addressing was valid until the

... meaning we end up with just the two ld insns, similar to the
-mno-vsx-timode compile, but it doesn't help the bigger test case at all.

I have a somewhat smaller test case that still shows the bad code gen
(using the patch above):

bergner@genoa:~/gcc/BUGS/LRA$ cat t2.i
__int128_t
ptr4 (__int128_t *p)
{
  return ~p[1];
}

bergner@genoa:~/gcc/BUGS/LRA$ /home/bergner/gcc/build/gcc-fsf-mainline-pr72804-debug/gcc/xgcc -B/home/bergner/gcc/build/gcc-fsf-mainline-pr72804-debug/gcc -O2 -mcpu=power7 -mno-vsx-timode -S t2.i
bergner@genoa:~/gcc/BUGS/LRA$ cat t2.s
ptr4:
        ld 11,24(3)
        ld 10,16(3)
        not 4,11
        not 3,10
        blr

bergner@genoa:~/gcc/BUGS/LRA$ /home/bergner/gcc/build/gcc-fsf-mainline-pr72804-debug/gcc/xgcc -B/home/bergner/gcc/build/gcc-fsf-mainline-pr72804-debug/gcc -O2 -mcpu=power7 -mvsx-timode -S t2.i
bergner@genoa:~/gcc/BUGS/LRA$ cat t2.s
ptr4:
        stdu 1,-352(1)
        addi 9,3,16
        lxvd2x 0,0,9
        addi 9,1,32
        stxvd2x 0,0,9
        ori 2,2,0
        lxvd2x 0,0,9
        addi 9,1,48
        xxpermdi 0,0,0,2
        xxpermdi 0,0,0,2
        stxvd2x 0,0,9
        addi 9,1,64
        xxpermdi 0,0,0,2
        xxpermdi 0,0,0,2
        stxvd2x 0,0,9
        addi 9,1,80
        xxpermdi 0,0,0,2
        xxpermdi 0,0,0,2
        stxvd2x 0,0,9
        addi 9,1,96
        xxpermdi 0,0,0,2
        xxpermdi 0,0,0,2
        stxvd2x 0,0,9
        addi 9,1,112
        xxpermdi 0,0,0,2
        xxpermdi 0,0,0,2
        stxvd2x 0,0,9
        addi 9,1,128
        xxpermdi 0,0,0,2
        xxpermdi 0,0,0,2
        stxvd2x 0,0,9
        addi 9,1,144
        xxpermdi 0,0,0,2
        xxpermdi 0,0,0,2
        stxvd2x 0,0,9
        addi 9,1,160
        xxpermdi 0,0,0,2
        xxpermdi 0,0,0,2
        stxvd2x 0,0,9
        addi 9,1,176
        xxpermdi 0,0,0,2
        xxpermdi 0,0,0,2
        stxvd2x 0,0,9
        addi 9,1,192
        xxpermdi 0,0,0,2
        xxpermdi 0,0,0,2
        stxvd2x 0,0,9
        addi 9,1,208
        xxpermdi 0,0,0,2
        xxpermdi 0,0,0,2
        stxvd2x 0,0,9
        addi 9,1,224
        xxpermdi 0,0,0,2
        xxpermdi 0,0,0,2
        stxvd2x 0,0,9
        addi 9,1,240
        xxpermdi 0,0,0,2
        xxpermdi 0,0,0,2
        stxvd2x 0,0,9
        addi 9,1,256
        xxpermdi 0,0,0,2
        xxpermdi 0,0,0,2
        stxvd2x 0,0,9
        addi 9,1,272
        xxpermdi 0,0,0,2
        xxpermdi 0,0,0,2
        stxvd2x 0,0,9
        addi 9,1,288
        xxpermdi 0,0,0,2
        xxpermdi 0,0,0,2
        stxvd2x 0,0,9
        addi 9,1,304
        xxpermdi 0,0,0,2
        xxpermdi 0,0,0,2
        stxvd2x 0,0,9
        addi 9,1,320
        xxpermdi 0,0,0,2
        xxpermdi 0,0,0,2
        stxvd2x 0,0,9
        addi 9,1,336
        xxpermdi 0,0,0,2
        xxpermdi 0,0,0,2
        stxvd2x 0,0,9
        ori 2,2,0
        ld 10,0(9)
        ld 11,8(9)
        addi 1,1,352
        not 3,10
        not 4,11
        blr

Lots of useless code in there! :-(

If I compare the rtl between the two, I see the following for -mno-vsx-timode:

(insn 2 4 3 2 (set (reg/v/f:DI 157 [ pD.2334 ])
        (reg:DI 3 3 [ pD.2334 ])) t2.i:3 565 {*movdi_internal64}
     (nil))
(insn 6 3 7 2 (set (reg:DI 160)
        (plus:DI (reg/v/f:DI 157 [ pD.2334 ])
            (const_int 16 [0x10]))) t2.i:4 75 {*adddi3}
     (nil))
(insn 7 6 8 2 (set (reg:TI 159)
        (mem:TI (reg:DI 160) [1 MEM[(__int128D.6 *)p_2(D) + 16B]+0 S16 A128])) t2.i:4 568 {*movti_ppc64}
     (nil))
(insn 8 7 9 2 (set (reg:TI 158)
        (not:TI (reg:TI 159))) t2.i:4 446 {*one_cmplti3_internal}
     (nil))

versus for -mvsx-timode:

(insn 2 4 3 2 (set (reg/v/f:DI 157 [ pD.2334 ])
        (reg:DI 3 3 [ pD.2334 ])) t2.i:3 565 {*movdi_internal64}
     (nil))
(insn 6 3 7 2 (set (reg:TI 159)
        (mem:TI (plus:DI (reg/v/f:DI 157 [ pD.2334 ])
                (const_int 16 [0x10])) [1 MEM[(__int128D.6 *)p_2(D) + 16B]+0 S16 A128])) t2.i:4 955 {*vsx_le_perm_load_ti}
     (nil))
(insn 7 6 8 2 (set (reg:TI 158)
        (not:TI (reg:TI 159))) t2.i:4 446 {*one_cmplti3_internal}
     (nil))

Looking at the movti_ppc64 pattern, I see we're disabling it with the
VECTOR_MEM_NONE_P (<MODE>mode) test.
If I remove that, we get closer:

bergner@genoa:~/gcc/BUGS/LRA$ /home/bergner/gcc/build/gcc-fsf-mainline-pr72804-debug/gcc/xgcc -B/home/bergner/gcc/build/gcc-fsf-mainline-pr72804-debug/gcc -O2 -mcpu=power7 -mvsx-timode -S t2.i
bergner@genoa:~/gcc/BUGS/LRA$ cat t2.s
ptr4:
        addi 9,3,16
        lxvd2x 0,0,9
        addi 9,1,-16
        stxvd2x 0,0,9
        ld 10,-16(1)
        ld 11,-8(1)
        not 3,10
        not 4,11
        blr

Still an unnecessary copy to the stack and back.

(insn 2 4 3 2 (set (reg/v/f:DI 157 [ pD.2334 ])
        (reg:DI 3 3 [ pD.2334 ])) t2.i:3 565 {*movdi_internal64}
     (nil))
(insn 6 3 7 2 (set (reg:TI 159)
        (mem:TI (plus:DI (reg/v/f:DI 157 [ pD.2334 ])
                (const_int 16 [0x10])) [1 MEM[(__int128D.6 *)p_2(D) + 16B]+0 S16 A128])) t2.i:4 568 {*movti_ppc64}
     (nil))
(insn 7 6 8 2 (set (reg:TI 158)
        (not:TI (reg:TI 159))) t2.i:4 446 {*one_cmplti3_internal}
     (nil))

So it seems the -mno-vsx-timode code doesn't allow the load to contain an
address other than a REG, whereas the -mvsx-timode code is allowing the
REG+OFF addressing.
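
The "two ld plus two not" sequence works because complementing a 128-bit value is the same as complementing each 64-bit half independently. A minimal C sketch of that equivalence follows. The helper names (halves_t, split_ti, not_halves, ptr4_wide, check_equivalence) are hypothetical and not from GCC or the test case, and the code assumes a compiler with the unsigned __int128 extension (GCC or Clang on a 64-bit target). It says nothing about which addressing form the compiler picks.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type: a 128-bit value viewed as two 64-bit halves,
   the way the GPR code path holds it in a register pair. */
typedef struct { uint64_t lo, hi; } halves_t;

/* Same operation as ptr4 in t2.i, on the unsigned 128-bit type. */
static unsigned __int128 ptr4_wide(const unsigned __int128 *p)
{
    return ~p[1];
}

/* Split by shifting, so the result does not depend on endianness. */
static halves_t split_ti(unsigned __int128 v)
{
    halves_t h = { (uint64_t)v, (uint64_t)(v >> 64) };
    return h;
}

/* Complement each half separately: the analogue of the two "not" insns. */
static halves_t not_halves(halves_t h)
{
    halves_t r = { ~h.lo, ~h.hi };
    return r;
}

/* Assert that the wide complement and the per-half complement agree. */
static void check_equivalence(const unsigned __int128 *p)
{
    halves_t wide = split_ti(ptr4_wide(p));
    halves_t narrow = not_halves(split_ti(p[1]));
    assert(wide.lo == narrow.lo);
    assert(wide.hi == narrow.hi);
}
```

Calling check_equivalence on any two-element array of unsigned __int128 exercises element 1, matching the p[1] access in the test case.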