https://bugs.kde.org/show_bug.cgi?id=352364
--- Comment #2 from Julian Seward <jsew...@acm.org> --- Some more analysis: typedef struct { unsigned char c; int i; void *foo; } S; int main (int argc, char **argv) { S x; S *s = &x; s->c = 1; if (s->c == 0 && s->i == 205 && s->foo == s) return 1; return 0; } /* 00000000100003a0 <.main>: 100003a0: 39 20 00 01 li r9,1 100003a4: 38 60 00 00 li r3,0 100003a8: 99 21 ff f0 stb r9,-16(r1) 100003ac: 60 42 00 00 ori r2,r2,0 100003b0: e9 21 ff f0 ld r9,-16(r1) 100003b4: 79 29 46 00 rldicl r9,r9,8,24 100003b8: 79 29 c0 02 rotldi r9,r9,56 100003bc: 2b a9 00 cd cmpldi cr7,r9,205 100003c0: 4c 9e 00 20 bnelr cr7 <-------here 100003c4: e9 21 ff f8 ld r9,-8(r1) 100003c8: 38 61 ff f0 addi r3,r1,-16 100003cc: 7c 63 4a 78 xor r3,r3,r9 100003d0: 7c 63 00 74 cntlzd r3,r3 100003d4: 78 63 d1 82 rldicl r3,r3,58,6 100003d8: 7c 63 07 b4 extsw r3,r3 100003dc: 4e 80 00 20 blr ------ IMark(0x100003B0, 4, 0) ------ t33 = LDbe:I64(Add64(t29,0xFFFFFFFFFFFFFFF0:I64)) ------ IMark(0x100003B4, 4, 0) ------ t34 = And64(Or64(Shl64(t33,0x8:I8),Shr64(t33,0x38:I8)),0xFFFFFFFFFF:I64) ------ IMark(0x100003B8, 4, 0) ------ t48 = Or64(Shl64(t34,0x38:I8),Shr64(t34,0x8:I8)) PUT(88) = t48 ------ IMark(0x100003BC, 4, 0) ------ t54 = 64to8(CmpORD64U(t48,0xCD:I64)) Simplified: t33 = loaded value t34 = Rol64(t33, 8) & 0xFF_FFFF_FFFF t48 = Ror64(t34, 8) t54 = 64to8(CmpORD64U(t48,0xCD:I64)) */ The t33 -> t48 transformation, is (MSB on the left): t33 = A B C D E F G H Rol(t33, 8) = B C D E F G H A t34 = Rol(t33, 8) & 0xFF_FFFF_FFFF = 0 0 0 E F G H A t48 = Ror(t34, 8) = A 0 0 0 E F G H So the two rotates in opposite directions serve only to temporarily put the to-be-zeroed-out area at the top of the word, since rldicl can only make a bit field immediate that is "anchored" at one end of the word or the other. Clearly A is 'c' in the struct and 'E F G H' must be 'i'. After 100003a8, memory at -16(r1) is -16 -15 -14 -13 -12 -11 -10 -9 1 U U U U U U U when loaded (bigendian) we get a 64-bit word, with MSB on the left here: 1 U U U U U U U after masking per the above it is 1 0 0 0 U U U U and are comparing it with 0 0 0 0 0 0 0 205 This seems to me to be a case that would be correctly handled by expensiveCmpEQorNE() (mc_translate.c:983, see comment preceding it). But that applies only to Cmp{EQ,NE}{32,64} and the ppc front end doesn't produce those. Instead it produces CmpORD{U,S}{32,64}. In effect the amd64/x86 (and other, to some extent) front ends together with ir_opt.c succeed in recovering, from the original instruction stream, the actual condition (==, !=, >, >=, <=, <, signed or unsigned) that gcc/clang compiled. In this case 100003bc: 2b a9 00 cd cmpldi cr7,r9,205 100003c0: 4c 9e 00 20 bnelr cr7 <-------here we know that really this is a != comparison, and if the front end could have recovered that, creating a CmpNE64, then it would have been instrumented by expensiveCmpEQorNE and we wouldn't have had the false positive. I guess we could redo doCmpORD so that it performs the "expensive" interpretation for EQ# rather than the "standard interp", per the comments in it. But that would slow down all comparisons, including the 99.99% that don't need this level of accuracy. A different approach would be to redo how the ppc condition codes are translated into IR, and in particular to attempt to recover the actual compared condition, like with the amd64/x86/arm/arm64 front ends. And get rid of CmpORD{32,64}{U,S}. That would in a way be better, since it would mean that ppc code benefitted from the same improvements in definedness tracking in comparisons that all the other architectures do. This isn't straightforward, though -- might be a week of work. -- You are receiving this mail because: You are watching all bug changes.