ankitsultana opened a new issue, #10140:
URL: https://github.com/apache/pinot/issues/10140

   **Context**
   
   When a thread leaves the `Monitor` in `RoundRobinScheduler`, the monitor calls 
`Monitor#signalNextWaiter`, which iterates over the guards for that monitor 
and checks whether they are satisfied. That check in turn calls 
`RoundRobinScheduler#hasNext`.
   
   The `hasNext` method recomputes the `_ready` queue every time it is called, 
so each exit from the monitor can trigger that computation again.
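
To make the cost concrete, here is a minimal JDK-only model. It is a sketch and not Pinot's code: `ScanningScheduler`, `IncrementalScheduler`, and their fields are hypothetical names. The first class rescans all pending operators on every `hasNext()` call, which is roughly the behaviour described above. The second shows one possible alternative, assuming readiness can be updated at the moment data arrives.

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

// Hypothetical model for illustration; these are not Pinot's actual classes.
final class ReadyQueueSketch {

    /** Rescans every pending operator on each hasNext() call. */
    static final class ScanningScheduler {
        final Set<Integer> pending = new HashSet<>();   // operator ids waiting for input
        final Set<Integer> withData = new HashSet<>();  // operator ids whose mailbox has data
        long elementsVisited = 0;                       // total work done across all calls

        boolean hasNext() {
            for (int id : pending) {
                elementsVisited++;
                if (withData.contains(id)) {
                    return true;
                }
            }
            return false;
        }
    }

    /** Updates the ready queue when data arrives, so hasNext() is O(1). */
    static final class IncrementalScheduler {
        final Set<Integer> pending = new HashSet<>();
        final Queue<Integer> ready = new ArrayDeque<>();

        void register(int id) {
            pending.add(id);
        }

        void onData(int id) {
            if (pending.remove(id)) {   // move from pending to ready exactly once
                ready.add(id);
            }
        }

        boolean hasNext() {
            return !ready.isEmpty();
        }
    }

    public static void main(String[] args) {
        ScanningScheduler scanning = new ScanningScheduler();
        for (int id = 0; id < 1_000; id++) {
            scanning.pending.add(id);
        }
        // Simulate 100 guard checks while no operator has data.
        for (int i = 0; i < 100; i++) {
            scanning.hasNext();
        }
        System.out.println("elements visited: " + scanning.elementsVisited);
    }
}
```

In the scanning variant the work per guard check grows with the number of pending operators, and it is repeated on each check. Whether the incremental variant fits the real scheduler depends on details not covered here.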
   
   **What we are seeing internally**
   
   We looked at one of our servers that was having memory issues and took 
thread dumps over a period of a few minutes. In every dump, all the QueryWorker 
threads were parked, waiting on the Guard:
   
   ```
   "query_worker_on_8421_port-1-thread-1" #2871 prio=5 os_prio=0 cpu=135674.67ms elapsed=69632.90s tid=0x00007ee398016000 nid=0xd24 waiting on condition  [0x00007ef0a8758000]
      java.lang.Thread.State: WAITING (parking)
           at jdk.internal.misc.Unsafe.park(java.base@11.0.15/Native Method)
   --
   "query_worker_on_8421_port-1-thread-2" #2882 prio=5 os_prio=0 cpu=101003.62ms elapsed=69629.03s tid=0x00007ee398017000 nid=0xd30 waiting on condition  [0x00007e9974377000]
      java.lang.Thread.State: WAITING (parking)
           at jdk.internal.misc.Unsafe.park(java.base@11.0.15/Native Method)
   --
   ...
   ```
   
   There is always one thread that is busy computing the ready queue:
   
   ```
   "grpc-default-executor-37238" #54551 daemon prio=5 os_prio=0 cpu=2507.80ms elapsed=14701.97s tid=0x00007edbb0234000 nid=0xd955 runnable  [0x00007edcbddb1000]
      java.lang.Thread.State: RUNNABLE
           at java.util.HashMap.hash(java.base@11.0.15/HashMap.java:340)
           at java.util.HashMap.containsKey(java.base@11.0.15/HashMap.java:592)
           at java.util.HashSet.contains(java.base@11.0.15/HashSet.java:204)
           at java.util.Collections.disjoint(java.base@11.0.15/Collections.java:5465)
           at com.google.common.collect.Sets$2.isEmpty(Sets.java:871)
           at org.apache.pinot.query.runtime.executor.RoundRobinScheduler.computeReady(RoundRobinScheduler.java:147)
   ```
   
   The number of threads can grow very large, because the callback is invoked 
from `MailboxContentStreamObserver`, which runs on the gRPC default executor 
(which seems to be unbounded?). Counts from successive thread dumps:
   
   ```
   ❯❯❯ cat 2.thdump| grep "Monitor.enter" | wc -l
       4227
   ...
   ❯❯❯ cat 6.thdump| grep "Monitor.enter" | wc -l
      10658
   ...
   ❯❯❯ cat 7.thdump| grep "Monitor.enter" | wc -l
      13802
   ...
   ❯❯❯ cat 7.thdump| grep "grpc-default-executor" | wc -l
      13712  <<== number of grpc-default-executor threads
   ```
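
One possible mitigation for the thread growth is to run the callbacks on a bounded pool. The following is a minimal JDK-only sketch under that assumption; `BoundedCallbackExecutor`, the pool size, and the queue capacity are hypothetical and not taken from Pinot. grpc-java does allow supplying a custom executor through `ServerBuilder#executor`, but whether that matches how Pinot wires `MailboxContentStreamObserver` is not verified here.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical helper for illustration; not part of Pinot or gRPC.
final class BoundedCallbackExecutor {

    /** Builds a pool with at most maxThreads workers and a bounded backlog. */
    static ThreadPoolExecutor create(int maxThreads, int queueCapacity) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
            maxThreads,                                  // core size
            maxThreads,                                  // hard upper bound on threads
            60L, TimeUnit.SECONDS,                       // idle timeout
            new ArrayBlockingQueue<>(queueCapacity),     // bounded backlog
            new ThreadPoolExecutor.CallerRunsPolicy());  // when saturated, the submitter runs the task
        pool.allowCoreThreadTimeOut(true);               // let idle workers exit
        return pool;
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = create(4, 1_000);
        for (int i = 0; i < 100; i++) {
            pool.execute(() -> { /* callback body would go here */ });
        }
        pool.shutdown();
        System.out.println("largest pool size: " + pool.getLargestPoolSize());
    }
}
```

The rejection policy is a deliberate choice. `CallerRunsPolicy` pushes back on the submitter, which may be undesirable if the submitter is a transport thread. A bounded pool also only caps the number of blocked threads; it does not remove the contention on the monitor itself.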
   
   cc: @agavra @walterddr
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

