https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68381
Bug ID: 68381
Summary: [6 Regression] wrong code and quality regression with __builtin_mul_overflow() @ aarch64
Product: gcc
Version: 6.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: target
Assignee: unassigned at gcc dot gnu.org
Reporter: zsojka at seznam dot cz
Target Milestone: ---

Created attachment 36733
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=36733&action=edit
reduced testcase

Output:
$ aarch64-unknown-linux-gnu-gcc -O -fexpensive-optimizations -fno-tree-bit-ccp testcase.c
$ ./a.out
Aborted

The function foo() is miscompiled.

5-branch output:
foo:
        uxth    w0, w0
        uxth    w1, w1
        umull   x0, w0, w1
        tbz     w0, #31, .L7
        stp     x29, x30, [sp, -16]!
        add     x29, sp, 0
        bl      abort
.L7:
        ret

trunk output:
foo:
        tbnz    w0, #31, .L10
        ret
.L10:
        stp     x29, x30, [sp, -16]!
        add     x29, sp, 0
        bl      abort

Things seem to break in .combine if -fexpensive-optimizations is enabled.
Before .combine, there is:

(insn 2 5 3 2 (set (reg/v:SI 80 [ xD.2712 ])
        (zero_extend:SI (reg:HI 0 x0 [ xD.2712 ]))) testcase.c:3 82 {*zero_extendhisi2_aarch64}
     (expr_list:REG_DEAD (reg:HI 0 x0 [ xD.2712 ])
        (nil)))
(insn 3 2 4 2 (set (reg/v:SI 81 [ yD.2713 ])
        (zero_extend:SI (reg:HI 1 x1 [ yD.2713 ]))) testcase.c:3 82 {*zero_extendhisi2_aarch64}
     (expr_list:REG_DEAD (reg:HI 1 x1 [ yD.2713 ])
        (nil)))
(note 4 3 7 2 NOTE_INSN_FUNCTION_BEG)
(insn 7 4 8 2 (set (reg:SI 76 [ _5+4 ])
        (const_int 0 [0])) testcase.c:5 39 {*movsi_aarch64}
     (nil))
(insn 8 7 9 2 (set (reg:DI 82)
        (mult:DI (zero_extend:DI (reg/v:SI 80 [ xD.2712 ]))
            (zero_extend:DI (reg/v:SI 81 [ yD.2713 ])))) testcase.c:5 360 {umulsidi3}
     (expr_list:REG_DEAD (reg/v:SI 81 [ yD.2713 ])
        (expr_list:REG_DEAD (reg/v:SI 80 [ xD.2712 ])
            (nil))))
(insn 9 8 10 2 (set (reg:DI 83)
        (lshiftrt:DI (reg:DI 82)
            (const_int 32 [0x20]))) testcase.c:5 614 {*aarch64_lshr_sisd_or_int_di3}
     (nil))
(insn 10 9 39 2 (set (reg:CC 66 cc)
        (compare:CC (subreg:SI (reg:DI 83) 0)
            (const_int 0 [0]))) testcase.c:5 375 {*cmpsi}
     (expr_list:REG_UNUSED (reg:CC 66 cc)
        (nil)))
...
(insn 43 42 44 2 (set (reg:CC 66 cc)
        (compare:CC (subreg:SI (reg:DI 82) 0)
            (const_int 0 [0]))) testcase.c:5 375 {*cmpsi}
     (nil))

and .combine shows:

Trying 2, 8, 9 -> 10:
Successfully matched this instruction:
(set (reg:DI 83)
    (const_int 0 [0]))
(const_int 0 [0])

which seems to miss the parallel set of reg 82 (the 64-bit product), even though insn 43 still reads reg 82.

The performance regression is at -O3:

5-branch output:
foo:
        uxth    x0, w0          // xD.2664, xD.2664
        uxth    x1, w1          // yD.2665, yD.2665
        mul     x0, x0, x1      // tmp84, xD.2664, yD.2665
        cmp     x0, x0, sxtw    // tmp84, tmp84
        bne     .L9             //,
        ret
.L9:
        stp     x29, x30, [sp, -16]!    //,,,
        add     x29, sp, 0      //,,
        bl      abort           //

trunk output:
foo:
        uxth    w0, w0          // xD.2712, xD.2712
        uxth    w1, w1          // yD.2713, yD.2713
        umull   x0, w0, w1      // tmp81, xD.2712, yD.2713
        tbnz    w0, #31, .L6    // tmp81,
        mov     w2, 0           // _5,
        cbnz    w2, .L6         // _5,
        ret
.L6:
        stp     x29, x30, [sp, -16]!    //,,,
        add     x29, sp, 0      //,,
        bl      abort           //

The code:

        mov     w2, 0           // _5,
        cbnz    w2, .L6         // _5,

is completely unneeded: w2 is set to 0 and then immediately tested for non-zero, so the branch can never be taken.

I don't know whether the wrong-code bug and the missed optimization are related.

$ aarch64-unknown-linux-gnu-gcc -v
Using built-in specs.
COLLECT_GCC=/mnt/svn/gcc-trunk/binary-latest-aarch64/bin/aarch64-unknown-linux-gnu-gcc
COLLECT_LTO_WRAPPER=/mnt/svn/gcc-trunk/binary-230409-checking-yes-rtl-df-nographite-aarch64/libexec/gcc/aarch64-unknown-linux-gnu/6.0.0/lto-wrapper
Target: aarch64-unknown-linux-gnu
Configured with: /mnt/svn/gcc-trunk//configure --enable-checking=yes,rtl,df --enable-languages=c,c++ --prefix=/mnt/svn/gcc-trunk/binary-230409-checking-yes-rtl-df-nographite-aarch64/ --without-cloog --without-ppl --without-isl --host=x86_64-pc-linux-gnu --target=aarch64-unknown-linux-gnu --build=x86_64-pc-linux-gnu --with-sysroot=/home/aarch64-chroot --with-as=/usr/libexec/gcc/aarch64-unknown-linux-gnu/as --with-ld=/usr/libexec/gcc/aarch64-unknown-linux-gnu/ld
Thread model: posix
gcc version 6.0.0 20151116 (experimental) (GCC)

Tested revisions:
trunk r230409 - FAIL
5-branch r229483 - OK
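The reduced testcase itself is only available as attachment 36733. The C below is a hypothetical sketch of a function with the shape the assembly implies, not the attachment: foo() takes two unsigned short operands (hence the uxth instructions), stores their product in an int via __builtin_mul_overflow, and calls abort() on overflow. The helpers mul_overflows_int() and mul_overflows_int_sxtw() are also hypothetical; they spell out in portable C the two overflow checks visible in the 5-branch code, assuming a 32-bit int as on aarch64. The product of two 16-bit values always fits in 32 unsigned bits, so it overflows int exactly when bit 31 is set (the -O output, "umull" then "tbz w0, #31"), which is the same as the 64-bit product not fitting in a sign-extended 32-bit value (the -O3 output, "cmp x0, x0, sxtw").

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical reproducer shape (not the attached testcase).
   noinline keeps foo() a separate function, as in the report's listings. */
__attribute__ ((noinline))
int foo (unsigned short x, unsigned short y)
{
  int r;
  /* Returns nonzero if x * y does not fit in an int. */
  if (__builtin_mul_overflow (x, y, &r))
    abort ();
  return r;
}

/* Check used by the gcc-5 -O code: 32x32->64 unsigned multiply (umull),
   then test bit 31 of the low word.  Assumes 32-bit int. */
int mul_overflows_int (unsigned short x, unsigned short y)
{
  uint32_t p = (uint32_t) x * (uint32_t) y;  /* at most 0xFFFE0001 */
  return (int) ((p >> 31) & 1u);
}

/* Check used by the gcc-5 -O3 code: 64-bit multiply, then compare the
   product with the sign extension of its low 32 bits ("cmp x0, x0, sxtw").
   Written here as an explicit range test to avoid an out-of-range cast. */
int mul_overflows_int_sxtw (unsigned short x, unsigned short y)
{
  int64_t p = (int64_t) x * (int64_t) y;
  return p < INT32_MIN || p > INT32_MAX;
}
```

With a correct compiler, foo(300, 200) returns 60000 and foo(46341, 46341) aborts, since 46341 * 46341 = 2147488281 exceeds INT_MAX. Both helpers agree on every input pair; the report's trunk build instead tests bit 31 of the unmultiplied x0 at -O.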