ankitsultana opened a new issue, #10140: URL: https://github.com/apache/pinot/issues/10140
**Context**

When a thread leaves the `Monitor` in `RoundRobinScheduler`, the monitor calls `Monitor#signalNextWaiter`, which iterates over all the guards for that monitor and checks whether they are satisfied. That check calls `RoundRobinScheduler#hasNext`, and `hasNext` recomputes the `_ready` queue every time it is called.

**What we are seeing internally**

We looked at one of our servers that was having memory issues and took thread dumps over a period of a few minutes. We found that all the QueryWorker threads are always waiting to acquire the Guard:

```
"query_worker_on_8421_port-1-thread-1" #2871 prio=5 os_prio=0 cpu=135674.67ms elapsed=69632.90s tid=0x00007ee398016000 nid=0xd24 waiting on condition [0x00007ef0a8758000]
   java.lang.Thread.State: WAITING (parking)
	at jdk.internal.misc.Unsafe.park(java.base@11.0.15/Native Method)
--
"query_worker_on_8421_port-1-thread-2" #2882 prio=5 os_prio=0 cpu=101003.62ms elapsed=69629.03s tid=0x00007ee398017000 nid=0xd30 waiting on condition [0x00007e9974377000]
   java.lang.Thread.State: WAITING (parking)
	at jdk.internal.misc.Unsafe.park(java.base@11.0.15/Native Method)
--
...
```

There is always one thread that is trying to compute the ready queue:
```
"grpc-default-executor-37238" #54551 daemon prio=5 os_prio=0 cpu=2507.80ms elapsed=14701.97s tid=0x00007edbb0234000 nid=0xd955 runnable [0x00007edcbddb1000]
   java.lang.Thread.State: RUNNABLE
	at java.util.HashMap.hash(java.base@11.0.15/HashMap.java:340)
	at java.util.HashMap.containsKey(java.base@11.0.15/HashMap.java:592)
	at java.util.HashSet.contains(java.base@11.0.15/HashSet.java:204)
	at java.util.Collections.disjoint(java.base@11.0.15/Collections.java:5465)
	at com.google.common.collect.Sets$2.isEmpty(Sets.java:871)
	at org.apache.pinot.query.runtime.executor.RoundRobinScheduler.computeReady(RoundRobinScheduler.java:147)
```

The number of threads can grow very large, since the callback is invoked in `MailboxContentStreamObserver`, which uses the gRPC default executor (which seems to be unbounded?).

```
❯❯❯ cat 2.thdump| grep "Monitor.enter" | wc -l
4227
...
❯❯❯ cat 6.thdump| grep "Monitor.enter" | wc -l
10658
...
❯❯❯ cat 7.thdump| grep "Monitor.enter" | wc -l
13802
...
❯❯❯ cat 7.thdump| grep "grpc-default-executor" | wc -l
13712 <<== number of grpc-default-executor threads
```

cc: @agavra @walterddr
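To make the cost described under **Context** concrete, here is a minimal, self-contained sketch. It is a hypothetical model, not Pinot's `RoundRobinScheduler`: the class names, fields, and the `chainsScanned` counter are all assumptions made for illustration. `RecomputingScheduler` rebuilds its ready queue on every `hasNext()` call, as the issue describes. `IncrementalScheduler` is shown only for contrast and is not a fix proposed in this issue.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.Queue;
import java.util.Set;

// Hypothetical model for illustration only; this is NOT Pinot's RoundRobinScheduler.
class ReadyQueueSketch {

  // Rebuilds the ready queue from scratch on every hasNext() call,
  // mirroring the behaviour described in this issue.
  static final class RecomputingScheduler {
    private final Set<String> available = new LinkedHashSet<>(); // registered chains
    private final Set<String> withData = new HashSet<>();        // chains with pending data
    long chainsScanned = 0;                                       // counts work done in hasNext()

    void register(String chain) { available.add(chain); }

    void onDataAvailable(String chain) { withData.add(chain); }

    boolean hasNext() {
      Queue<String> ready = new ArrayDeque<>();
      for (String chain : available) { // O(#chains) on every guard evaluation
        chainsScanned++;
        if (withData.contains(chain)) {
          ready.add(chain);
        }
      }
      return !ready.isEmpty();
    }
  }

  // Contrast: the ready set is updated when data arrives, so hasNext() is O(1).
  static final class IncrementalScheduler {
    private final Set<String> available = new HashSet<>();
    private final Set<String> ready = new LinkedHashSet<>();
    long chainsScanned = 0; // stays 0 because hasNext() never scans

    void register(String chain) { available.add(chain); }

    void onDataAvailable(String chain) {
      if (available.contains(chain)) {
        ready.add(chain);
      }
    }

    boolean hasNext() { return !ready.isEmpty(); }
  }

  public static void main(String[] args) {
    int chains = 1_000;
    int guardChecks = 100; // assumed: one guard check per Monitor leave
    RecomputingScheduler recomputing = new RecomputingScheduler();
    IncrementalScheduler incremental = new IncrementalScheduler();
    for (int i = 0; i < chains; i++) {
      recomputing.register("chain-" + i);
      incremental.register("chain-" + i);
    }
    recomputing.onDataAvailable("chain-7");
    incremental.onDataAvailable("chain-7");
    for (int i = 0; i < guardChecks; i++) {
      recomputing.hasNext();
      incremental.hasNext();
    }
    System.out.println("recomputing scanned: " + recomputing.chainsScanned);
    System.out.println("incremental scanned: " + incremental.chainsScanned);
  }
}
```

With 1,000 registered chains and 100 guard checks, the recomputing variant scans 100,000 entries while the incremental one scans none. In the scenario reported above, each of those scans would run while the monitor is held, which is consistent with the other threads piling up at `Monitor.enter`.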