https://bugs.kde.org/show_bug.cgi?id=488169

            Bug ID: 488169
           Summary: VEX infinite loop in instrumenting some ret sequence
                    on ARM-64 (example in lackey tool when
                    --trace-superblocks=yes or --trace-mem=yes)
    Classification: Developer tools
           Product: valgrind
           Version: 3.23.0
          Platform: Other
                OS: Linux
            Status: REPORTED
          Severity: normal
          Priority: NOR
         Component: vex
          Assignee: jsew...@acm.org
          Reporter: newh...@cs.swarthmore.edu
  Target Milestone: ---

SUMMARY:

There is a VEX instrumentation error on ARM-64 when instrumenting some return
instruction sequences.  This error results in an infinite loop in the
instrumented code because VEX is not correctly getting the return address on a
ret instruction in these cases.  Details about one function with this error in
it, including its full disassembly before VEX instrumentation, are below.

The bug is present in the lackey tool when run with --trace-superblocks=yes
and/or --trace-mem=yes.

NOTE: your lackey test code does not test all of its command line options.
Otherwise, I'm sure you would see this too.

Please fix this error with VEX on ARM.  We are using superblock-level
instrumentation in our valgrind tool, which works on x86 architectures, and we
would really like to support ARM too, but this VEX error is preventing us from
doing so.


STEPS TO REPRODUCE
1.  gcc -g prog.c
2.  valgrind --tool=lackey --trace-superblocks=yes ./a.out

The observed result occurs for every C program we have tried.  Here is a
trivial example that triggers it:
  % cat prog.c
  int main(int argc, char *argv[]) {
    int x;
    x = 1;
    return 0;
  }

OBSERVED RESULT

The code gets into an infinite loop of tracing between two superblocks, each of
which jumps to the other forever (but should not).  Here is some example output
from running it (more details are below on which instructions these correspond
to and how we figured this out, which may help you fix this VEX bug):

  ...
  SB 04954ed8  
  SB 04954ecc
  SB 04954ed8
  SB 04954ecc
  SB 04954ed8
  SB 04954ecc
  SB 04954ed8
  SB 04954ecc
  SB 04954ed8
  SB 04954ecc
  SB 04954ed8
  ...
  repeats forever

EXPECTED RESULT

We would expect that when we run lackey with --trace-superblocks=yes on this
a.out, we would see a sequence of its SBs printed out until the program
terminates, followed by lackey's regular summary output at the end (i.e., we
would expect there to be no infinite loop of SBs printed out, because this
a.out does not have an infinite loop).

SOFTWARE/OS VERSIONS

This bug is present in valgrind version 3.23.0.  We have also found it in every
previous version of valgrind we tested, including 3.22.0, 3.21.0, 3.19.0, and
3.16.1.  It is present in versions we built from source as well as in the
default valgrind version packaged with Debian.  We have found it on both
ARM-64 boards we have.

Specifically, we can replicate it on these two ARM-64 systems:

(1) linux: Linux 5.10.123-meson64 #22.05.3 SMP PREEMPT Wed Jun 22 07:23:04 UTC 2022 aarch64 GNU/Linux
    gcc (Debian 10.2.1-6) 10.2.1 2021011
    valgrind version valgrind-3.16.1  (version packaged with Debian)
    and valgrind version valgrind-3.23.0  (built from source)

(2) linux: Linux 5.15.69-rockchip64 #22.08.2 SMP PREEMPT Wed Sep 21 19:28:26 UTC 2022 aarch64 GNU/Linux
    gcc (Debian 10.2.1-6) 10.2.1 2021011
    valgrind version valgrind-3.16.1  (version packaged with Debian)
    and valgrind version valgrind-3.23.0  (built from source)


ADDITIONAL INFORMATION

Here is some information to help you fix this problem.

We found where the infinite loop is occurring (or at least one example code
sequence for which it happens, corresponding to the specific case above).

The infinite loop occurs in VEX's instrumentation of the __aarch64_cas4_acq
function (in /usr/lib/aarch64-linux-gnu/libc-2.31.so), which is called as part
of _start before main is called.  Again, this occurs when instrumenting at the
superblock level (triggered by running lackey with --trace-superblocks=yes),
and also at the memory access level (triggered by running lackey with
--trace-mem=yes).

This is a dump from gdb of the original executable (not of your VEX
instrumented binary, but of the code sequence you are instrumenting in VEX).

The infinite loop happens when the ret instruction is executed (at <+48>).  The
VEX instrumented code incorrectly executes the b.ne (at <+36>) as the return
address instruction, which then branches to the ret instruction (at <+48>),
which uses the instruction at <+36> as the return address, which ... and so on
forever:

  (gdb) disass
  Dump of assembler code for function __aarch64_cas4_acq:
  => 0x0000fffff7f65eb0 <+0>:    bti     c
     0x0000fffff7f65eb4 <+4>:    adrp    x16, 0xfffff7fcb000 <resbuf.0+8>
     0x0000fffff7f65eb8 <+8>:    ldrb    w16, [x16, #4032]
     0x0000fffff7f65ebc <+12>:   cbz     w16, 0xfffff7f65ec8 <__aarch64_cas4_acq+24>
     0x0000fffff7f65ec0 <+16>:   casa    w0, w1, [x2]
     0x0000fffff7f65ec4 <+20>:   ret
     0x0000fffff7f65ec8 <+24>:   mov     w16, w0
     0x0000fffff7f65ecc <+28>:   ldaxr   w0, [x2]
     0x0000fffff7f65ed0 <+32>:   cmp     w0, w16
     0x0000fffff7f65ed4 <+36>:   b.ne    0xfffff7f65ee0 <__aarch64_cas4_acq+48>  // b.any
     0x0000fffff7f65ed8 <+40>:   stxr    w17, w1, [x2]
     0x0000fffff7f65edc <+44>:   cbnz    w17, 0xfffff7f65ecc <__aarch64_cas4_acq+28>
     0x0000fffff7f65ee0 <+48>:   ret
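
For readers less familiar with AArch64 atomics: judging from the listing, this
routine is a 32-bit compare-and-swap with acquire ordering.  It either uses the
single casa instruction (at <+16>) or falls back to the ldaxr/stxr retry loop
(at <+28> through <+44>), which branches back to <+28> whenever the exclusive
store reports failure.  The C11 sketch below shows roughly equivalent
semantics.  It is not the library's source; the name cas4_acq_sketch is
hypothetical, and the argument order simply mirrors the registers used in the
dump (w0 = expected, w1 = desired, x2 = pointer).

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical name, for illustration only.  Mirrors the register usage
   seen in the dump: w0 = expected, w1 = desired, x2 = pointer.  Returns
   the value observed in memory, as the assembly leaves it in w0. */
uint32_t cas4_acq_sketch(uint32_t expected, uint32_t desired,
                         _Atomic uint32_t *ptr)
{
    /* On failure, 'expected' is overwritten with the value found at *ptr.
       On success it still holds the old value.  Either way, returning it
       gives the value that was observed in memory. */
    atomic_compare_exchange_strong_explicit(ptr, &expected, desired,
                                            memory_order_acquire,
                                            memory_order_acquire);
    return expected;
}
```

A caller can tell whether the swap happened by comparing the return value with
the expected value it passed in.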

Here is a stack trace of where this function is called (part of the _start
sequence, before main):

  (gdb) bt
  #0  0x0000fffff7f65eb0 in __aarch64_cas4_acq () from /lib/aarch64-linux-gnu/libc.so.6
  #1  0x0000fffff7e8db70 in __internal_atexit (func=0xfffff7fdab20 <_dl_fini>, arg=arg@entry=0x0,
      d=d@entry=0x0, listp=listp@entry=0xfffff7fc76b8 <__exit_funcs>) at cxa_atexit.c:43
  #2  0x0000fffff7e8dc4c in __GI___cxa_atexit (func=<optimized out>, arg=arg@entry=0x0, d=d@entry=0x0)
      at cxa_atexit.c:70
  #3  0x0000fffff7e77d90 in __libc_start_main (main=0xaaaaaaaa0724 <main>, argc=1,
      argv=0xfffffffff128, init=0xaaaaaaaa0750 <__libc_csu_init>, fini=<optimized out>,
      rtld_fini=<optimized out>, stack_end=<optimized out>) at ../csu/libc-start.c:238
  #4  0x0000aaaaaaaa0644 in _start ()


Here is some output from our modification to the trace_superblock function in
lackey (valgrind version 3.23.0), where we added a call to VG_(pp_addrinfo):

  ...
  SB 04954ecc
  ==116621==  Address 0x4954ed8 is in the Text segment of /usr/lib/aarch64-linux-gnu/libc-2.31.so
  ==116621==    at 0x4954ED8: __aarch64_cas4_acq (in /usr/lib/aarch64-linux-gnu/libc-2.31.so)
  SB 04954ed8
  ==116621==  Address 0x4954ecc is in the Text segment of /usr/lib/aarch64-linux-gnu/libc-2.31.so
  ==116621==    at 0x4954ECC: __aarch64_cas4_acq (in /usr/lib/aarch64-linux-gnu/libc-2.31.so)
  SB 04954ecc
  ==116621==  Address 0x4954ed8 is in the Text segment of /usr/lib/aarch64-linux-gnu/libc-2.31.so
  ==116621==    at 0x4954ED8: __aarch64_cas4_acq (in /usr/lib/aarch64-linux-gnu/libc-2.31.so)
  SB 04954ed8
  ==116621==  Address 0x4954ecc is in the Text segment of /usr/lib/aarch64-linux-gnu/libc-2.31.so
  ==116621==    at 0x4954ECC: __aarch64_cas4_acq (in /usr/lib/aarch64-linux-gnu/libc-2.31.so)
  SB 04954ecc
  ==116621==  Address 0x4954ed8 is in the Text segment of /usr/lib/aarch64-linux-gnu/libc-2.31.so
  ==116621==    at 0x4954ED8: __aarch64_cas4_acq (in /usr/lib/aarch64-linux-gnu/libc-2.31.so)
  SB 04954ed8
  ==116621==  Address 0x4954ecc is in the Text segment of /usr/lib/aarch64-linux-gnu/libc-2.31.so
  ==116621==    at 0x4954ECC: __aarch64_cas4_acq (in /usr/lib/aarch64-linux-gnu/libc-2.31.so)
  SB 04954ecc
  ==116621==  Address 0x4954ed8 is in the Text segment of /usr/lib/aarch64-linux-gnu/libc-2.31.so
  ==116621==    at 0x4954ED8: __aarch64_cas4_acq (in /usr/lib/aarch64-linux-gnu/libc-2.31.so)
  SB 04954ed8
  ...
  repeats forever
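
To relate the SB addresses in this trace to the gdb listing of
__aarch64_cas4_acq, one option is to compare page offsets.  The sketch below
is only a rough aid: it assumes libc is mapped at a page-aligned base in both
the native gdb session and the Valgrind run (so the low 12 bits of each
address agree), and that the function does not cross a 4 KiB boundary.  The
helper name offset_in_function is hypothetical.  Under those assumptions,
0x4954ecc and 0x4954ed8 line up with <+28> and <+40> in the listing.

```c
#include <stdint.h>

/* Hypothetical helper, for illustration only.  Computes the offset of an
   address reported under Valgrind within a function whose start address
   was taken from a native gdb session.  Assumes both mappings are
   page-aligned and the function stays within one 4 KiB page. */
long offset_in_function(uint64_t sb_addr, uint64_t native_func_start)
{
    const uint64_t page_mask = 0xfff;  /* low 12 bits: offset within a page */
    return (long)(sb_addr & page_mask)
         - (long)(native_func_start & page_mask);
}
```

For example, offset_in_function(0x4954eccULL, 0xfffff7f65eb0ULL) gives 28 and
offset_in_function(0x4954ed8ULL, 0xfffff7f65eb0ULL) gives 40.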
