https://gcc.gnu.org/bugzilla/show_bug.cgi?id=122266
Bug ID: 122266
Summary: miscompilation of int128 xor-and-shift under -O2
Product: gcc
Version: 16.0
Status: UNCONFIRMED
Severity: normal
Priority: P3
Component: rtl-optimization
Assignee: unassigned at gcc dot gnu.org
Reporter: mlugg at mlugg dot co.uk
Target Milestone: ---
Consider the following C source file:
signed __int128 repro(signed __int128 lhs) {
    signed __int128 sign_mask = -1;
    if (lhs >= 0) sign_mask = 0;
    return ((lhs ^ sign_mask) >> 1) ^ sign_mask;
}
Compile this file with -O2 for x86_64 Linux, and view the generated assembly
with `objdump`:
$ gcc -c repro.c -o repro.o -O2
$ objdump -S repro.o
repro.o: file format elf64-x86-64
Disassembly of section .text:
0000000000000000 <repro>:
0: 48 89 f0 mov %rsi,%rax
3: 48 c1 f8 3f sar $0x3f,%rax
7: 48 89 c2 mov %rax,%rdx
a: c3 ret
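This output is a miscompilation: the emitted code only computes the sign fill
of the high half of 'lhs' (equivalent to 'lhs >> 127'), so it returns 0 for
every non-negative input and -1 for every negative one. It reproduces with -O2
or -O3, but not -O1.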
Here is a simple C program which can demonstrate the miscompilation:
#include <stdio.h>
extern signed __int128 repro(signed __int128);
int main(void) {
    const signed __int128 res = repro(2);
    /*
     * '2 >= 0', so 'sign_mask = 0'
     * so '((lhs ^ 0) >> 1) ^ 0' is '2 >> 1'
     * so we expect 'res == 1'
     */
    printf("%d\n", (int)res); /* prints 0 */
    return 0;
}
However, note that if 'main' is in the same translation unit as 'repro', the
call gets optimized out entirely and the correct behavior is observed.
I first noticed this bug with my system's GCC build, which is version 14.2.1.
However, Compiler Explorer indicates that the bug reproduces on trunk, and back
to GCC 13.1:
https://godbolt.org/z/6ovve5jz8
The bug does *not* appear to reproduce on GCC 12.5 and earlier.
It appears that the bug is in the RTL instruction combination ("combine") pass.
The above Compiler Explorer link shows the following debug output from that
pass (I am able to reproduce similar output on my system GCC build by adding
the `-fdump-rtl-combine-all` flag to my command line):
Trying 14, 17, 18 -> 19:
14: {r102:TI=r105:TI>>0x7f;clobber flags:CC;}
REG_UNUSED flags:CC
17: {r110:TI=r105:TI^r102:TI;clobber flags:CC;}
REG_DEAD r105:TI
REG_UNUSED flags:CC
18: {r111:TI=r110:TI>>0x1;clobber flags:CC;}
REG_DEAD r110:TI
REG_UNUSED flags:CC
19: {r109:TI=r111:TI^r102:TI;clobber flags:CC;}
REG_DEAD r111:TI
REG_DEAD r102:TI
REG_UNUSED flags:CC
Successfully matched this instruction:
(parallel [
        (set (reg:TI 109 [ _5 ])
            (ashiftrt:TI (reg:TI 105 [ lhsD.2957 ])
                (const_int 127 [0x7f])))
        (clobber (reg:CC 17 flags))
    ])
allowing combination of insns 14, 17, 18 and 19
original costs 12 + 8 + 8 + 8 = 36
replacement cost 12
deferring deletion of insn with uid = 18.
deferring deletion of insn with uid = 17.
deferring rescan insn with uid = 15.
deferring deletion of insn with uid = 14.
modifying insn i3 19: {r109:TI=r105:TI>>0x7f;clobber flags:CC;}
REG_DEAD r105:TI
REG_UNUSED flags:CC
deferring rescan insn with uid = 19.
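That combination step is incorrect and introduces the miscompilation: it
replaces the whole xor/shift/xor sequence (insns 14, 17, 18 and 19) with just
'r105 >> 0x7f', dropping the '>> 0x1' entirely. Unfortunately, I wasn't able to
track this down any further, since I am unfamiliar with the GCC codebase.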