s390x: /bin/true segfaults with "grail" enabled

Julian Seward Fri, 13 Mar 2020 00:31:01 -0700

https://bugs.kde.org/show_bug.cgi?id=417281


--- Comment #5 from Julian Seward <jsew...@acm.org> ---
(In reply to Julian Seward from comment #4)
> As a next step I am inclined to add printf lines for all cases (rules)
> in the insn selector.  Then run the test case with and without &&-recovery
> enabled, so as to find the sets of rules used in both cases, and diff
> them, to see what is only used when &&-recovery is enabled.  Then look
> more closely at those cases.

I did that yesterday.  The only rules I found like that were for Iop_And1 
and Iop_Or1, as expected.  And these still look correct to me.  So I really
don't think this is an instruction selector problem, and for reasons explained
in comment #4, I don't think this is a front end problem either.  That leaves
as possibilities either the register allocator (core logic), the s390-specific
support for the register allocator (getRegUsage_S390, basically), or something
to do with the stack layout.

I experimented with register allocation a bit.  getRRegUniverse_S390() tells
the register allocator which registers are available.  If register allocation
is correct, removing some registers from the set of available ones should not
change the behaviour of the code, affecting only performance.  But it does
make a difference.  getRRegUniverse_S390() provides GPRs in two separate
groups: 1..5 and 6..11.  Removing one group or the other does not change the
place (in the guest code) where V crashes, but it does change the error
messages at the crash point.  This seems like a big red flag to me.

With the GPR group 6..11 removed from allocation, what I notice is that the
crash occurs soon after a call to a helper function.  It crashes due to
an invalid memory reference whose address is a value that is spilled
before the call and reloaded afterwards.  I'll put a longer disassembly in
an attached file, but here are the essentials:

   // r13 is the guest state pointer.  It points to an area which
   // is: 3 copies of struct VexGuestS390XState, followed by a spill area.
   0x0000001003c5be5a:  llilf   %r0,67149062 ; 0x4009d06
   0x0000001003c5be60:  stg     %r0,720(%r13) ; set guest PC to 0x4009d06

   // Do stuff (I don't know what; probably not important)
   0x0000001003c5be66:  lg      %r5,656(%r13)
   0x0000001003c5be6c:  sllg    %r5,%r5,3
   0x0000001003c5be72:  ag      %r5,624(%r13)
   0x0000001003c5be78:  lg      %r4,1472(%r13)
   0x0000001003c5be7e:  sllg    %r4,%r4,3
   0x0000001003c5be84:  og      %r4,1440(%r13)
   0x0000001003c5be8a:  ltgr    %r4,%r4
   0x0000001003c5be8e:  stg     %r5,2464(%r13) ; spill <---
   0x0000001003c5be94:  je      0x1003c5beae

   // helper call; sequence created by s390_insn_helper_call_emit()
   0x0000001003c5be98:  iihf    %r1,8
   0x0000001003c5be9e:  iilf    %r1,98288
   0x0000001003c5bea4:  stfpc   232(%r15)
   0x0000001003c5bea8:  basr    %r14,%r1
   0x0000001003c5beaa:  lfpc    232(%r15)

   // after the call
   0x0000001003c5beae:  lg      %r5,2464(%r13) ; reload <---
   0x0000001003c5beb4:  lgr     %r2,%r5

   // another helper call
   0x0000001003c5beb8:  iihf    %r1,8
   0x0000001003c5bebe:  iilf    %r1,95000
   0x0000001003c5bec4:  stfpc   232(%r15)
   0x0000001003c5bec8:  basr    %r14,%r1
   0x0000001003c5beca:  lfpc    232(%r15)

   // after the call
   0x0000001003c5bece:  lgr     %r5,%r2
   0x0000001003c5bed2:  lg      %r3,2464(%r13) ; reload <---
=> 0x0000001003c5bed8:  lg      %r4,0(%r3) ; crash (r3 is near zero) <---

So it might be that 2464(%r13) got corrupted across one of the helper calls.
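
To work out which helpers are being called, the target addresses can be read
off the iihf/iilf pairs in the listing above: iihf sets the high 32 bits of
%r1 and iilf sets the low 32 bits.  A minimal sketch of that arithmetic; the
function name is hypothetical and not part of VEX:

```c
#include <stdint.h>

/* Hypothetical helper, not a VEX function: combine the immediates of an
   iihf/iilf pair into the 64-bit value left in the target register.
   iihf replaces the high 32 bits, iilf replaces the low 32 bits. */
uint64_t iihf_iilf_value(uint32_t high, uint32_t low)
{
    return ((uint64_t)high << 32) | (uint64_t)low;
}
```

With the immediates from the listing, (8, 98288) gives 0x800017ff0 for the
first call and (8, 95000) gives 0x800017318 for the second.  Which helper
functions live at those addresses would still need to be looked up.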

From comments on VexGuestS390XState, it appears that this has size 816 bytes.
If that's correct, then the 3 copies of it starting at 0(%r13) have total size
2448 bytes, so offset 2464 is 16 bytes into the spill area.
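
The offset arithmetic can be written out as a small sketch.  The 816-byte
size and the 3 copies are the assumptions stated above, not values checked
against the headers, and the names are hypothetical:

```c
#include <assert.h>
#include <stddef.h>

/* Assumed layout behind r13, per the comment above: three copies of the
   guest state, 816 bytes each, followed by the spill area. */
enum { GUEST_STATE_SIZE = 816, GUEST_STATE_COPIES = 3 };

/* Hypothetical helper: r13-relative offset at which the spill area starts. */
size_t spill_area_start(void)
{
    return (size_t)GUEST_STATE_SIZE * GUEST_STATE_COPIES;
}

/* Hypothetical helper: convert an r13-relative displacement into an offset
   within the spill area.  The displacement must not fall inside the guest
   state copies. */
size_t offset_in_spill_area(size_t disp)
{
    assert(disp >= spill_area_start());
    return disp - spill_area_start();
}
```

Under those assumptions spill_area_start() is 2448 and
offset_in_spill_area(2464) is 16.  If the real structure size differs, the
spill slot moves accordingly.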

So that's my analysis so far.  I think it is suspicious that changing the
available register set changes the messages shown at the failure site.  But
the analysis might be wrong.  For example, it may be that the value
computed into %r5 and then spilled into 2464(%r13) is already wrong, and
that the spilling itself is OK.


[valgrind] [Bug 417281] s390x: /bin/true segfaults with "grail" enabled
