[valgrind] [Bug 352364] ppc64: --expensive-definedness-checks=yes is not quite working here

Julian Seward via KDE Bugzilla Mon, 15 Aug 2016 23:41:50 -0700

https://bugs.kde.org/show_bug.cgi?id=352364


--- Comment #2 from Julian Seward <jsew...@acm.org> ---
Some more analysis:

typedef
   struct { unsigned char c; int i; void *foo;
} S;

int main (int argc, char **argv) {
   S x;
   S *s = &x;
   s->c = 1;
   if (s->c == 0 && s->i == 205 && s->foo == s) return 1;
   return 0;
}

/*
00000000100003a0 <.main>:
    100003a0:   39 20 00 01     li      r9,1
    100003a4:   38 60 00 00     li      r3,0
    100003a8:   99 21 ff f0     stb     r9,-16(r1)
    100003ac:   60 42 00 00     ori     r2,r2,0
    100003b0:   e9 21 ff f0     ld      r9,-16(r1)
    100003b4:   79 29 46 00     rldicl  r9,r9,8,24
    100003b8:   79 29 c0 02     rotldi  r9,r9,56
    100003bc:   2b a9 00 cd     cmpldi  cr7,r9,205
    100003c0:   4c 9e 00 20     bnelr   cr7   <-------here
    100003c4:   e9 21 ff f8     ld      r9,-8(r1)
    100003c8:   38 61 ff f0     addi    r3,r1,-16
    100003cc:   7c 63 4a 78     xor     r3,r3,r9
    100003d0:   7c 63 00 74     cntlzd  r3,r3
    100003d4:   78 63 d1 82     rldicl  r3,r3,58,6
    100003d8:   7c 63 07 b4     extsw   r3,r3
    100003dc:   4e 80 00 20     blr

   ------ IMark(0x100003B0, 4, 0) ------
   t33 = LDbe:I64(Add64(t29,0xFFFFFFFFFFFFFFF0:I64))
   ------ IMark(0x100003B4, 4, 0) ------
   t34 = And64(Or64(Shl64(t33,0x8:I8),Shr64(t33,0x38:I8)),0xFFFFFFFFFF:I64)
   ------ IMark(0x100003B8, 4, 0) ------
   t48 = Or64(Shl64(t34,0x38:I8),Shr64(t34,0x8:I8))
   PUT(88) = t48
   ------ IMark(0x100003BC, 4, 0) ------
   t54 = 64to8(CmpORD64U(t48,0xCD:I64))

   Simplified:
   t33 = loaded value
   t34 = Rol64(t33, 8) & 0xFF_FFFF_FFFF
   t48 = Ror64(t34, 8)
   t54 = 64to8(CmpORD64U(t48,0xCD:I64))
*/

The t33 -> t48 transformation, is (MSB on the left):

 t33                                 = A B C D E F G H
 Rol(t33, 8)                         = B C D E F G H A
 t34 = Rol(t33, 8) & 0xFF_FFFF_FFFF  = 0 0 0 E F G H A
 t48 = Ror(t34, 8)                   = A 0 0 0 E F G H

So the two rotates in opposite directions serve only to temporarily
put the to-be-zeroed-out area at the top of the word, since rldicl can
only make a bit field immediate that is "anchored" at one end of the
word or the other.

Clearly A is 'c' in the struct and 'E F G H' must be 'i'. 

After 100003a8, memory at -16(r1) is

    -16 -15 -14 -13 -12 -11 -10 -9
      1   U   U   U   U   U   U  U

when loaded (bigendian) we get a 64-bit word, with MSB on the left here:

   1 U U U U U U U

after masking per the above it is

   1 0 0 0 U U U U

and are comparing it with 0 0 0 0 0 0 0 205

This seems to me to be a case that would be correctly handled by
expensiveCmpEQorNE() (mc_translate.c:983, see comment preceding it).
But that applies only to Cmp{EQ,NE}{32,64} and the ppc front end
doesn't produce those.  Instead it produces CmpORD{U,S}{32,64}.

In effect the amd64/x86 (and other, to some extent) front ends together
with ir_opt.c succeed in recovering, from the original instruction stream,
the actual condition (==, !=, >, >=, <=, <, signed or unsigned) that gcc/clang
compiled.  In this case

    100003bc:   2b a9 00 cd     cmpldi  cr7,r9,205
    100003c0:   4c 9e 00 20     bnelr   cr7   <-------here

we know that really this is a != comparison, and if the front end could
have recovered that, creating a CmpNE64, then it would have been instrumented
by expensiveCmpEQorNE and we wouldn't have had the false positive.

I guess we could redo doCmpORD so that it performs the "expensive"
interpretation
for EQ# rather than the "standard interp", per the comments in it.  But that
would slow down all comparisons, including the 99.99% that don't need this
level of accuracy.

A different approach would be to redo how the ppc condition codes are
translated
into IR, and in particular to attempt to recover the actual compared condition,
like with the amd64/x86/arm/arm64 front ends.  And get rid of
CmpORD{32,64}{U,S}.
That would in a way be better, since it would mean that ppc code benefitted
from the same improvements in definedness tracking in comparisons that all the
other architectures do.  This isn't straightforward, though -- might be a week
of work.

-- 
You are receiving this mail because:
You are watching all bug changes.

[valgrind] [Bug 352364] ppc64: --expensive-definedness-checks=yes is not quite working here

Reply via email to