eddieran opened a new issue, #16194: URL: https://github.com/apache/dubbo/issues/16194
### Pre-check

- [X] I am sure that all the content I provide is in English.
- [X] I had searched in the [issues](https://github.com/apache/dubbo/issues?q=is%3Aissue) and found no similar issues.

### Apache Dubbo Component

Java SDK (apache/dubbo)

### Dubbo Version

Dubbo Java 3.3.6, OpenJDK 21.0.5+11-LTS, Ubuntu (ZGC Generational: `-XX:+UseZGC -XX:+ZGenerational`, Heap: `-Xms65g -Xmx65g`)

### Steps to reproduce this issue

When the Dubbo thread pool is exhausted, `AbortPolicyWithReport.rejectedExecution()` invokes `dumpJStack()` → `JVMUtil.jstack()` → `ThreadMXBean.dumpAllThreads(true, true)`. The second `true` argument (`lockedSynchronizers`) forces the JVM to scan the **entire Java heap** at a safepoint, via `HeapInspection::find_instances_at_safepoint()`, to find all `AbstractOwnableSynchronizer` instances. On ZGC with large heaps, every object reference touched during this scan passes through ZGC's load barrier (color-bit check → potential relocation → forwarding-table lookup → remap), making this operation **12–20x slower** than on G1/Parallel.

#### Environment

- **JDK**: OpenJDK 21.0.5+11-LTS
- **GC**: ZGC Generational (`-XX:+UseZGC -XX:+ZGenerational`)
- **Heap**: 65GB (`-Xms65g -Xmx65g`)
- **Threads**: ~1950
- **Dubbo version**: 3.3.6

#### Reproduction Steps

1. Deploy a Dubbo application with ZGC and a large heap (≥32GB).
2. Drive enough traffic to exhaust the Dubbo thread pool.
3. `AbortPolicyWithReport.rejectedExecution()` fires and calls `dumpJStack()`.
4. Observe a **~37-second full application freeze** in the GC logs (safepoint duration).

#### Relevant Code

[`JVMUtil.java` line 36](https://github.com/apache/dubbo/blob/3.3/dubbo-common/src/main/java/org/apache/dubbo/common/utils/JVMUtil.java#L36):

```java
public static void jstack(OutputStream stream) throws Exception {
    ThreadMXBean threadMxBean = ManagementFactory.getThreadMXBean();
    for (ThreadInfo threadInfo : threadMxBean.dumpAllThreads(true, true)) {
        // ^^^^ lockedSynchronizers=true triggers the heap scan
        stream.write(getThreadDumpString(threadInfo).getBytes(StandardCharsets.UTF_8));
    }
}
```

### What you expected to happen

`JVMUtil.jstack()` should complete in **milliseconds to low seconds** without causing a significant safepoint pause, regardless of GC algorithm or heap size.

**What actually happens:** a **36.83-second safepoint pause** freezes all ~1950 application threads. The freeze cascades: requests queued during the pause immediately exhaust the thread pool again on release, triggering another dump. We observed **4 consecutive thread dumps producing ~150 seconds of near-total service unavailability**.

#### GC Log Evidence (safepoint entries)

| Event | Timestamp | At-Safepoint Duration | Thread Count |
|-------|-----------|-----------------------|--------------|
| ThreadDump #1 | 10:41:04.466 | 38,589,643,169 ns (**38.59s**) | 1954 |
| ThreadDump #2 | 10:51:15.645 | 36,833,709,974 ns (**36.83s**) | 1947 |

For comparison, normal ZGC safepoint operations (Mark Start, Mark End, Relocate Start) complete in **0.1–0.8ms**.
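The cost difference between the two flag values can be measured directly on any JVM. The sketch below (class name `DumpCostProbe` is mine, not Dubbo's) times `dumpAllThreads` with and without `lockedSynchronizers`; on a small test heap the gap is modest, but the second call never pays for the safepoint heap scan:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;

public class DumpCostProbe {

    /** Times one full thread dump and returns the elapsed nanoseconds. */
    static long timeDump(boolean lockedSynchronizers) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long start = System.nanoTime();
        // lockedMonitors=true is cheap (stack-derived); only the second
        // flag triggers the AbstractOwnableSynchronizer heap scan.
        ThreadInfo[] infos = mx.dumpAllThreads(true, lockedSynchronizers);
        long elapsed = System.nanoTime() - start;
        System.out.printf("lockedSynchronizers=%b: %d threads, %.1f ms%n",
                lockedSynchronizers, infos.length, elapsed / 1e6);
        return elapsed;
    }

    public static void main(String[] args) {
        timeDump(true);   // with heap scan
        timeDump(false);  // without heap scan
    }
}
```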
#### async-profiler Wall-Mode Stack Trace (VM Thread during freeze)

```
VM_ThreadDump::doit()
  → ConcurrentLocksDump::dump_at_safepoint()
  → HeapInspection::find_instances_at_safepoint()
  → ZHeap::object_iterate()
  → ZHeapIterator::drain_and_steal<true>()
  → FindInstanceClosure::do_object()
  → ZBarrier::load_barrier_on_oop_field_preloaded
```

ZGC barrier frames account for **55%** of VM Thread samples during the freeze:

| Frame | % of VM Thread samples |
|-------|------------------------|
| `FindInstanceClosure::do_object()` | 25.0% |
| `ZBarrier::load_barrier_on_oop_field_preloaded` | 19.8% |
| `OopOopIterateDispatch::oop_oop_iterate` | 17.0% |
| `ZBarrierSet::AccessBarrier::oop_access_barrier` | 8.4% |

### Anything else

#### Root Cause Analysis

`dumpAllThreads(lockedMonitors, lockedSynchronizers)` with `lockedSynchronizers=true` requires the JVM to find every heap instance of `AbstractOwnableSynchronizer`. This is implemented via `HeapInspection::find_instances_at_safepoint()`, which iterates the **entire heap** at a safepoint.

- On traditional GCs (G1, Parallel): heap iteration uses direct pointer access — a 65GB heap scan takes ~1–3 seconds.
- On ZGC: every object reference passes through the load barrier (color-bit check, potential relocation, forwarding-table lookup), inflating the cost by **12–20x** to ~37 seconds.

This is a known JVM-level characteristic. The OpenJDK community has addressed it on the tooling side:

- [JDK-8262098](https://bugs.openjdk.org/browse/JDK-8262098): "jhsdb jstack can be very slow" — identified the heap scan for `AbstractOwnableSynchronizer` as expensive
- [JDK-8324066](https://bugs.openjdk.org/browse/JDK-8324066): "clhsdb jstack should not by default scan for j.u.c locks" — fixed by disabling lock scanning by default

However, the **programmatic API** (`ThreadMXBean.dumpAllThreads`) has no such protection — callers passing `lockedSynchronizers=true` still trigger the full heap scan.
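For illustration, the sketch below (hypothetical class `SynchronizerVisibility`) shows the one thing the heap scan buys: attribution of `java.util.concurrent` lock ownership in the resulting `ThreadInfo`. Without the flag, `getLockedSynchronizers()` is simply empty:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.locks.ReentrantLock;

public class SynchronizerVisibility {

    /** Counts j.u.c synchronizers attributed to any thread in a full dump. */
    static int ownedSynchronizers(boolean lockedSynchronizers) {
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        int total = 0;
        for (ThreadInfo info : mx.dumpAllThreads(false, lockedSynchronizers)) {
            total += info.getLockedSynchronizers().length;
        }
        return total;
    }

    public static void main(String[] args) {
        ReentrantLock lock = new ReentrantLock();
        lock.lock(); // hold a j.u.c lock while dumping
        try {
            // With the flag, the heap scan finds the ReentrantLock's
            // AbstractOwnableSynchronizer and attributes it to this thread.
            System.out.println("with scan:    " + ownedSynchronizers(true));
            // Without it, no scan runs and no attribution appears.
            System.out.println("without scan: " + ownedSynchronizers(false));
        } finally {
            lock.unlock();
        }
    }
}
```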
#### Suggested Fix

**Option A (minimal, recommended):** change `dumpAllThreads(true, true)` to `dumpAllThreads(true, false)`:

```java
// Before
ThreadInfo[] infos = threadMxBean.dumpAllThreads(true, true);

// After — avoids the heap scan for AbstractOwnableSynchronizer
ThreadInfo[] infos = threadMxBean.dumpAllThreads(true, false);
```

This retains locked-monitor information (cheap — derived from thread stacks) but skips `lockedSynchronizers` (which triggers the expensive heap scan). Lock contention on `synchronized` blocks remains fully visible; only `java.util.concurrent.locks` ownership is lost from the dump output.

**Option B (adaptive):** detect the GC type and heap size at runtime; only pass `lockedSynchronizers=true` when the cost is acceptable (e.g., G1/Parallel with a heap < 16GB).

**Option C (JDK 21+):** use a `jcmd <pid> Thread.dump_to_file` equivalent or `HotSpotDiagnosticMXBean` for thread dumps that do not require a safepoint heap scan.

#### Cascade Effect

The existing rate-limiting fix (#14468) prevents repeated dumps within 10 minutes but does **not** reduce the per-dump cost. When a single dump takes 37 seconds, the cascade is:

1. Thread pool exhausted → `dumpAllThreads(true, true)` → 37s freeze.
2. During the freeze, all incoming requests queue up.
3. On safepoint release, the queued requests immediately exhaust the pool again.
4. If the 10-minute window has passed (or on first occurrence), another 37s freeze.

In our production incident, this produced ~150 seconds of near-total unavailability.
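Option B could look roughly like the following. This is a sketch, not a tested policy: the class name `SafeDumpPolicy`, the 16GB threshold, and the reliance on GC MXBean names starting with "ZGC" are all assumptions:

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

public class SafeDumpPolicy {

    // Assumed threshold above which even a barrier-free heap scan is too slow.
    static final long HEAP_LIMIT_BYTES = 16L << 30; // 16 GB

    /**
     * Heuristic: request lockedSynchronizers only when the safepoint heap
     * scan is likely cheap — a smallish heap on a non-ZGC collector.
     */
    static boolean lockedSynchronizersAffordable() {
        long maxHeap = Runtime.getRuntime().maxMemory();
        boolean smallHeap = maxHeap > 0 && maxHeap < HEAP_LIMIT_BYTES;
        // ZGC registers collector beans named "ZGC Cycles"/"ZGC Pauses"
        // (or "ZGC Major/Minor ..." under -XX:+ZGenerational).
        boolean zgc = ManagementFactory.getGarbageCollectorMXBeans().stream()
                .map(GarbageCollectorMXBean::getName)
                .anyMatch(name -> name.startsWith("ZGC"));
        return smallHeap && !zgc;
    }

    public static void main(String[] args) {
        boolean full = lockedSynchronizersAffordable();
        System.out.println("would call dumpAllThreads(true, " + full + ")");
    }
}
```

The same check could be computed once at startup and cached, since neither the collector nor the maximum heap size changes at runtime.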
#### Related Issues

- #6696 — `AbortPolicyWithReport` causes JVM pause (same symptom, reported in 2020 for Dubbo 2.6.8, no resolution)
- #14467 / #14468 — fix repeated jstack when the thread pool is exhausted (addresses frequency, not per-call cost)
- [JDK-8262098](https://bugs.openjdk.org/browse/JDK-8262098) — jhsdb jstack can be very slow
- [JDK-8324066](https://bugs.openjdk.org/browse/JDK-8324066) — clhsdb jstack should not scan for j.u.c locks by default
- [JDK-8221153](https://bugs.openjdk.org/browse/JDK-8221153) — ZGC heap iteration and verification pollutes GC statistics

### Are you willing to submit a pull request to fix on your own?

- [X] Yes I am willing to submit a pull request on my own!

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.
