eddieran opened a new pull request, #16195:
URL: https://github.com/apache/dubbo/pull/16195
## What is the purpose of the change

Fixes #16194

`JVMUtil.jstack()` calls `ThreadMXBean.dumpAllThreads(true, true)`. The `lockedSynchronizers=true` argument forces the JVM to scan the **entire Java heap** at a safepoint to find all `AbstractOwnableSynchronizer` instances. On ZGC with large heaps, this causes catastrophic safepoint pauses (**36–39 seconds** measured on a 65 GB heap with ~1950 threads) that freeze the entire application.

This PR changes `lockedSynchronizers` from `true` to `false`, eliminating the heap scan.

### Root Cause

| GC | Heap scan mechanism | Scan time (65 GB heap) |
|----|---------------------|------------------------|
| G1 / Parallel | Direct pointer access | ~1–3 s |
| **ZGC** | Load barrier on every reference (color bit check → relocate → forwarding table → remap) | **~37 s** |

The OpenJDK community has already fixed this on the tooling side ([JDK-8324066](https://bugs.openjdk.org/browse/JDK-8324066): "clhsdb jstack should not scan for j.u.c locks by default"), but the programmatic API (`ThreadMXBean.dumpAllThreads`) has no such safe default: callers must opt out by passing `lockedSynchronizers=false`.

### Production Impact

When `AbortPolicyWithReport` fires on ZGC with a large heap:

1. `dumpAllThreads(true, true)` → ~37 s full application freeze
2. Requests queued during the freeze exhaust the thread pool again as soon as the application resumes → cascading freezes
3. Observed: **4 consecutive dumps → ~150 s of near-total service unavailability**

## Brief changelog

- `JVMUtil.jstack()`: change `dumpAllThreads(true, true)` to `dumpAllThreads(true, false)`

## What is lost

Only the "Locked synchronizers" section at the bottom of each thread's dump, i.e. ownership of `java.util.concurrent.locks` locks such as `ReentrantLock` / `ReentrantReadWriteLock`. All other diagnostic information is retained:

| Information | Retained? |
|---|---|
| Thread name, ID, state | Yes |
| Full stack traces | Yes |
| `synchronized` block contention (`BLOCKED on ...`) | Yes |
| `synchronized` monitor ownership (`- locked ...`) | Yes |
| Waiting/parking state | Yes |
| Lock owner for BLOCKED threads | Yes |
| **j.u.c.locks ownership (`ReentrantLock`, etc.)** | **No** |

## Verifying this change

Existing tests pass. All tests in `AbortPolicyWithReportTest` mock the `jstack()` method, so they are not affected by the parameter change.

The fix can be verified by:

1. Deploy a Dubbo app with ZGC and a large heap (≥32 GB)
2. Exhaust the thread pool to trigger `AbortPolicyWithReport`
3. Observe the safepoint duration in the GC logs; it should drop from ~37 s to <100 ms
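The trade-off listed under "What is lost" can be observed with a small standalone program. This is a sketch, not code from the PR: the class `LockedSynchronizersDemo` and its helper `lockCounts` are hypothetical names. A helper thread holds both an intrinsic `synchronized` monitor and a `ReentrantLock`, and the program compares the `ThreadInfo` reported for that thread by `dumpAllThreads(true, true)` and `dumpAllThreads(true, false)`. It assumes a JVM on which `ThreadMXBean.isObjectMonitorUsageSupported()` and `isSynchronizerUsageSupported()` return `true` (as on HotSpot); otherwise passing `true` for those arguments throws `UnsupportedOperationException`.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.locks.ReentrantLock;

public class LockedSynchronizersDemo {

    /**
     * Hypothetical helper: starts a thread that holds one intrinsic monitor and one
     * ReentrantLock, dumps all threads with the given lockedSynchronizers flag, and
     * returns {locked monitors, locked synchronizers} as reported for that thread.
     */
    static int[] lockCounts(boolean lockedSynchronizers) throws InterruptedException {
        Object monitor = new Object();
        ReentrantLock lock = new ReentrantLock();
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);

        Thread holder = new Thread(() -> {
            synchronized (monitor) {      // intrinsic monitor: shown as "- locked ..." in a dump
                lock.lock();              // AbstractOwnableSynchronizer owned by this thread
                try {
                    held.countDown();
                    release.await();      // park until the dump has been taken
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                } finally {
                    lock.unlock();
                }
            }
        }, "lock-holder");
        holder.start();
        held.await();                     // both locks are held from here on

        try {
            ThreadMXBean mx = ManagementFactory.getThreadMXBean();
            // lockedMonitors stays true; only lockedSynchronizers varies, as in the PR.
            for (ThreadInfo info : mx.dumpAllThreads(true, lockedSynchronizers)) {
                if (info != null && info.getThreadId() == holder.getId()) {
                    return new int[] {
                        info.getLockedMonitors().length,
                        info.getLockedSynchronizers().length
                    };
                }
            }
            throw new IllegalStateException("lock-holder thread missing from dump");
        } finally {
            release.countDown();          // let the holder unlock and exit
            holder.join();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        for (boolean syncs : new boolean[] {true, false}) {
            int[] counts = lockCounts(syncs);
            System.out.printf("dumpAllThreads(true, %b): locked monitors=%d, locked synchronizers=%d%n",
                    syncs, counts[0], counts[1]);
        }
    }
}
```

On HotSpot this should report one locked monitor in both modes, and one locked synchronizer only when `lockedSynchronizers` is `true`: the `synchronized` ownership survives the change, while the `ReentrantLock` ownership is the part that disappears.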

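As a rough local check alongside the deployment steps in "Verifying this change", the cost of the two call variants can be compared with a wall-clock sketch like the one below. The class `ThreadDumpTiming` and method `medianDumpMillis` are hypothetical and not part of the PR. Treat the numbers only as a relative signal: the timed call also includes stack walking and `ThreadInfo` construction, the heap scan only becomes expensive in a process with a large live heap (the figures above come from a 65 GB ZGC heap, which this sketch does not create), and the safepoint log (for example `-Xlog:safepoint` on JDK 9+) remains the authoritative measurement of the pause itself.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;
import java.util.Arrays;

public class ThreadDumpTiming {

    /**
     * Calls dumpAllThreads(true, lockedSynchronizers) `runs` times and returns the
     * median wall-clock duration in milliseconds (upper median for an even count).
     */
    static double medianDumpMillis(boolean lockedSynchronizers, int runs) {
        if (runs < 1) {
            throw new IllegalArgumentException("runs must be >= 1");
        }
        ThreadMXBean mx = ManagementFactory.getThreadMXBean();
        long[] nanos = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            mx.dumpAllThreads(true, lockedSynchronizers); // result discarded; only the cost matters
            nanos[i] = System.nanoTime() - start;
        }
        Arrays.sort(nanos);
        return nanos[runs / 2] / 1_000_000.0;
    }

    public static void main(String[] args) {
        int runs = args.length > 0 ? Integer.parseInt(args[0]) : 5;
        System.out.printf("dumpAllThreads(true, true):  median %.2f ms%n", medianDumpMillis(true, runs));
        System.out.printf("dumpAllThreads(true, false): median %.2f ms%n", medianDumpMillis(false, runs));
    }
}
```

Using the median rather than the mean keeps a single outlier (for example, a dump that happens to coincide with a GC cycle) from dominating the comparison.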