https://gcc.gnu.org/bugzilla/show_bug.cgi?id=119151

            Bug ID: 119151
           Summary: unwind-dw2-btree maintains separators wrong
           Product: gcc
           Version: 14.2.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: libgcc
          Assignee: unassigned at gcc dot gnu.org
          Reporter: michael at slashhome dot org
  Target Milestone: ---

Created attachment 60670
  --> https://gcc.gnu.org/bugzilla/attachment.cgi?id=60670&action=edit
a minimal failure case

I work on a system which JIT compiles objects in memory and loads and unloads
them with __register_object and __deregister_object. With older versions of
gcc, this results in very poor performance with concurrent exception handling.
With newer gcc, a btree is used, which improves the performance dramatically.
Unfortunately, there is a bug in this btree.

The symptom is that sometimes, randomly, an exception will not be caught. The
stack looks like this:
#0  0x00007f62f4620c37 in __GI_raise (sig=6) at
../nptl/sysdeps/unix/sysv/linux/raise.c:56
#1  0x00007f62f4624028 in __GI_abort () at abort.c:89
#2  0x00007f62f4c6cca6 in __gnu_cxx::__verbose_terminate_handler () at
../../../../libstdc++-v3/libsupc++/vterminate.cc:95
#3  0x00007f62f4c730d6 in __cxxabiv1::__terminate (handler=<optimized out>) at
../../../../libstdc++-v3/libsupc++/eh_terminate.cc:47
#4  0x00007f62f4c73111 in std::terminate () at
../../../../libstdc++-v3/libsupc++/eh_terminate.cc:57
#5  0x00007f62f4c7336c in __cxxabiv1::__cxa_throw (obj=<optimized out>,
tinfo=0x59dd4a0 <typeinfo for SQLException>, dest=0xe7d9380
<SQLException::~SQLException()>) at
../../../../libstdc++-v3/libsupc++/eh_throw.cc:95
#6  0x00007f62b59b9c83 in 100000:InListElementToHtElement_0_internal_throw ()
... etc etc ...

That is one of the JIT compiled functions at the bottom there. This happened in
about 1/50 runs of a 40 minute test. After some work, dumping cores, and
intercepting some calls, I determined that this was due to
_Unwind_RaiseException returning _URC_END_OF_STACK to __cxa_throw.

The reason was that it couldn't find a frame in the registered_frames btree. It
couldn't find this frame because it was looking in the wrong leaf - it needed
the leaf one prior. It was looking in the wrong leaf because the separator of
the inner node was wrong.

The separator of the inner node was wrong because an object (or several
objects) had been removed and another object had been inserted using much the
same address space, but with a different endpoint. Separators in the btree are
only maintained using the base pointer - the size of a region inserted into the
btree is never considered. size is considered for placement of an entry within
a leaf node, but is not used for finding a node to insert into or used to
adjust separators.

I have attached a minimal failure case, which can be built via `gcc -g -I
../build/x86_64-pc-linux-gnu/libgcc minimal-btree-failure.c -o minimal`. It
will assert because the final call to `btree_lookup` will return nullptr. When
it does so, the btree has three nodes. A root node, with two children, with
{separator = 599} and {separator = max_separator}. The left node contains the
entry we're looking for, {base = 595 size = 10}. You'll note that this extends
from 595 to 605. But the separator of the parent node is 599! So when we look
for 602, we try the right node, which is the wrong direction.

There is a fairly simple fix, which is to update the separators while inserting
a new item. We need to update the separator each time we go down a level in the
tree, whether we're splitting an inner node in the process or not. I will add a
patch in a followup comment since the bug filing form won't let me add two
files.

I have not run thorough tests on this patch yet, just some simple ones
involving creating large trees, deleting one node, inserting a node in the
resulting hole, and then trying to query a spot in the middle of the new node.
The test case is the shortest such one found my those tests. I think this would
benefit from both randomized testing and stress testing. I couldn't find any
tests for this library in the gcc tree, nor any tests exercising
__register_frame and __unregister_frame. Are there any?

Reply via email to