Summary: The old linear scan logic called free while searching the list
of frames. The atomic fast path finds the frame quickly, but forgot the
free call. This patches adds the missing free. Bugzilla #109685.
See:
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/619026.html
Best
Thomas
Summary: The radix sort did not handle the uppermost byte correctly,
which sometimes broke win32 exceptions. Bugzilla #109670. The reporter
confirmed that the patch fixes the bug.
See:
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/618000.html
Best
Thomas
Summary: __register_frame and the corresponding _Unwind_Find_FDE use a
global mutex for frame registration and unwinding. This can lead to very
poor performance on machines with high core counts. This patch organizes
the frames in a b-tree with read-optimized synchronization instead,
which allo
Original bug report: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=110956
Rainer Orth successfully tested the patch on Solaris with a full bootstrap.
Some uncommon unwinding table encodings need to access the base pointer
for address computations. We do not have that information in calls to
__de
The radix sort uses two buffers, a1 for input and a2 for output.
After every digit the role of the two buffers is swapped.
When terminating the sort early the code made sure the output
was in a2. However, when we run out of bits, as can happen on
32bit platforms, the sorted result was in a1, was
Summary: The old linear scan logic called free while searching the list
of frames. The atomic fast path finds the frame quickly, but forgot the
free call. This patches adds the missing free. Bugzilla #109685.
See:
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/617245.html
Best
Thomas
Dear Sören,
we ran into a regression introduced by these changes. The regression
manifests itself in a failing assertion in __deregister_frame_info_bases.
The assertion failure was observed while using Chromium's `flatc` build
system tool. The failing assertion is:
unwind-dw2-fde.c:281
even building the flatbuffers repo externally using cmake at the same revision
didn't reproduce it.
that said, i have attached a dockerfile that shows you what /does/ fail, under
the specific conditions. but there is no unpatched libgcc_s, so you'll have
to do that part yourself, and build a libg
Hello, this patch breaks the build on targets where range is not declared i.e.
where the #ifdef ATOMIC_FDE_FAST_PATH path is not taken.
argh, I did not realize I tested the patch only on atomic fast path
platforms. The patch below fixes that by moving the check inside the #ifdef.
I will chec
Am 19.05.23 um 19:26 schrieb Jeff Law:
See:
https://gcc.gnu.org/pipermail/gcc-patches/2023-May/617245.html
I think this needs an update given the other changes in this space.
jeff
I have included the updated the patch below.
The atomic fastpath bypasses the code that releases the sort
arra
The __register_frame/__deregister_frame functions are used to register
unwinding frames from JITed code in a sorted list. That list itself
is protected by object_mutex, which leads to terrible performance
in multi-threaded code and is somewhat expensive even if single-threaded.
There was already a
Hi Dimitar,
This patch broke avr and pru-elf cross builds:
gcc/libgcc/unwind-dw2-fde.c:680:28: error: unknown type name ‘uintptr_t’
680 |uintptr_t *range)
Should uintptr_t be replaced with __UINTPTR_TYPE__? Such change fixes the
above broken builds for me. But
Hi Dimitar,
This patch broke avr and pru-elf cross builds:
gcc/libgcc/unwind-dw2-fde.c:680:28: error: unknown type name ‘uintptr_t’
680 |uintptr_t *range)
Should uintptr_t be replaced with __UINTPTR_TYPE__? Such change fixes the
above broken builds for me. But
I haven't debugged this in any way, nor checked whether it only impacts
exactly my below scenario, but noticed the following:
At least when building LibreOffice with Clang (16 trunk) with ASan and
UBsan enabled against libstdc++ (with --gcc-toolchain and
LD_LIBRARY_PATH to pick up a libstdc++
At least when building LibreOffice with Clang (16 trunk) with ASan and
UBsan enabled against libstdc++ (with --gcc-toolchain and
LD_LIBRARY_PATH to pick up a libstdc++ trunk build including this change
at build and run-time), at least one of the LibreOffice tests executed
during the build st
In some scenarios (e.g., when mixing gcc and clang code), it can
happen that frames are deregistered after the lookup structure
has already been destroyed. That in itself would be fine, but
it triggers an assert in __deregister_frame_info_bases that
expects to find the frame.
To avoid that, we no
This patch broke bootstrap on AIX and probably other targets.
#ifdef ATOMIC_FDE_FAST_PATH
#include "unwind-dw2-btree.h"
static struct btree registered_frames;
static bool in_shutdown;
...
#else
in_shutdown only is defined for ATOMIC_FDE_FAST_PATH but used in code /
asserts not protected by tha
+static const bool in_shutdown = false;
I'll let Jason or others decide if this is the right solution. It seems
that in_shutdown also could be declared outside the #ifdef and
initialized as "false".
sure, either is fine. Moving it outside the #ifdef wastes one byte in
the executable (w
Hi Claudiu,
This change prohibits compiling of ARC backend:
+ gcc_assert (in_shutdown || ob);
in_shutdown is only defined when ATOMIC_FDE_FAST_PATH is defined,
while gcc_assert is outside of any ifdef. Please can you revisit this
line and change it accordingly.
I have a patch ready, I am
Hi Iain,
You might also want to include Rainer’s patch,
AFAIR patches to fix bootstrap are allowed to proceed as an exception to
the usual rules,
I was not aware of that. I have pushed the patch below now (including
Rainer's change), I will update the code if requested.
Best
Thomas
f
The __register_frame/__deregister_frame functions are used to register
unwinding frames from JITed code in a sorted list. That list itself
is protected by object_mutex, which leads to terrible performance
in multi-threaded code and is somewhat expensive even if single-threaded.
There was already a
+// Common logic for version locks
+struct version_lock
+{
+ // The lock itself
+ uintptr_t version_lock;
+};
version_lock must not overflow, right? This means we need a wider
counter on 32-bit, too. glibc contains a 62-bit counter that it uses
for its own data structure.
an overflow is
We may have to add a new interface. In some other cases, I've seen
errno being used for error reporting, but that requires changes in
applications to recognize the error. It's probably better to crash here
than to fail mysteriously later.
Out of curiosity, how many times do you call the registr
The __register_frame/__deregister_frame functions are used to register
unwinding frames from JITed code in a sorted list. That list itself
is protected by object_mutex, which leads to terrible performance
in multi-threaded code and is somewhat expensive even if single-threaded.
There was already a
Summary: __register_frame and the corresponding _Unwind_Find_FDE use a
global mutex for frame registration and unwinding. This can lead to very
poor performance on machines with high core counts. This patch organizes
the frames in a b-tree with read-optimized synchronization instead,
which allo
Thanks for your feedback, I will update the patch in the next few days,
addressing the comments and reorganizing classify_object_over_fdes.
Concerning your question:
+
+restart:
+ struct btree_node *iter;
+ uintptr_t lock;
+ {
+ // Accessing the root node requires defending against concu
Summary: __register_frame and the corresponding _Unwind_Find_FDE use a
global mutex for frame registration and unwinding. This can lead to very
poor performance on machines with high core counts. This patch organizes
the frames in a b-tree with read-optimized synchronization instead,
which allo
The atomic fastpath bypasses the code that releases the sort
array which was lazily allocated during unwinding. We now
check after deregistering if there is an array to free.
libgcc/ChangeLog:
* unwind-dw2-fde.c: Free sort array in atomic fast path.
---
libgcc/unwind-dw2-fde.c | 6 ++
NOTE: A stress test program and a detailed walkthrough that breaks this
patch into manageable parts can be found here:
https://databasearchitects.blogspot.com/2022/06/making-unwinding-through-jit-ed-code.html
The __register_frame/__deregister_frame functions are used to register
unwinding frames f
Hi,
When dynamically linking a fast enough machine hides the latency, but when
Statically linking or on slower devices this change caused a 5x increase in
Instruction count and 2x increase in cycle count before getting to main.
This has been quite noticeable on smaller devices. Is there a reas
It's easy to reproduce on x86 as well.
As a testcase:
#include
int main(int argc, char** argv) {
return 0;
}
And just compile with: g++ -O1 hello.cpp -static -o hello.exe.
thanks, I will take a look.
Best
Thomas
Hi,
When dynamically linking a fast enough machine hides the latency, but when
Statically linking or on slower devices this change caused a 5x increase in
Instruction count and 2x increase in cycle count before getting to main.
I have looked at ways to fix that. The problem is that with static
When registering a dynamic unwinding frame the fde list is sorted.
Previously, we split the list into a sorted and an unsorted part,
sorted the later using heap sort, and merged both. That can be
quite slow due to the large number of (expensive) comparisons.
This patch replaces that logic with a
Would it be possible to trigger lazy registration if the version is read
as a zero? This would not introduce any additional atomic instructions
on the fast path.
yes, that is possible. The main problem is the transition from lazy to
non-lazy mode when the first exception is thrown. We must som
When registering an unwind frame with __register_frame_info_bases
we currently initialize that fde object eagerly. This has the
advantage that it is immutable afterwards and we can safely
access it from multiple threads, but it has the disadvantage
that we pay the initialization cost even if the a
35 matches
Mail list logo