ankitsultana commented on issue #10185:
URL: https://github.com/apache/pinot/issues/10185#issuecomment-1406922018

   @Jackie-Jiang : Saw this issue again today, and the `allSegmentsLoaded` 
lock may be the problem. The server had been restarted about 10 hours before 
I started debugging. This time most tables have recovered, and only one partial 
upsert table is still affected.
   
   For that table, 6 segments that should be in CONSUMING state per the 
IdealState (IS) are in OFFLINE state. One more segment of the same table 
should be in ONLINE state but is in ERROR state.
   
   In the thread dump, I see 5 threads blocked, waiting to acquire the 
`allSegmentsLoaded` lock, and one thread holding it.
   
   ```
   ❯❯❯ cat 1.thdump | grep "0x00007f30e3b2ae68"
           - waiting to lock <0x00007f30e3b2ae68> (a java.util.concurrent.atomic.AtomicBoolean)
           - waiting to lock <0x00007f30e3b2ae68> (a java.util.concurrent.atomic.AtomicBoolean)
           - waiting to lock <0x00007f30e3b2ae68> (a java.util.concurrent.atomic.AtomicBoolean)
           - waiting to lock <0x00007f30e3b2ae68> (a java.util.concurrent.atomic.AtomicBoolean)
           - waiting to lock <0x00007f30e3b2ae68> (a java.util.concurrent.atomic.AtomicBoolean)
           - locked <0x00007f30e3b2ae68> (a java.util.concurrent.atomic.AtomicBoolean)
   ```
   
   ```
   "HelixTaskExecutor-message_handle_thread_36" #117 daemon prio=5 os_prio=0 cpu=4243721.44ms elapsed=37170.30s tid=0x00007f2d1804d800 nid=0xda waiting on condition  [0x00007f2c97dfb000]
      java.lang.Thread.State: TIMED_WAITING (sleeping)
           at java.lang.Thread.sleep(java.base@11.0.15/Native Method)
           at org.apache.pinot.segment.local.utils.tablestate.TableStateUtils.waitForAllSegmentsLoaded(TableStateUtils.java:133)
           at org.apache.pinot.core.data.manager.realtime.RealtimeTableDataManager.addSegment(RealtimeTableDataManager.java:416)
           - locked <0x00007f30e3b2ae68> (a java.util.concurrent.atomic.AtomicBoolean)
           at org.apache.pinot.server.starter.helix.HelixInstanceDataManager.addRealtimeSegment(HelixInstanceDataManager.java:189)
           at org.apache.pinot.server.starter.helix.SegmentOnlineOfflineStateModelFactory$SegmentOnlineOfflineStateModel.onBecomeOnlineFromOffline(SegmentOnlineOfflineStateModelFactory.java:168)
           at org.apache.pinot.server.starter.helix.SegmentOnlineOfflineStateModelFactory$SegmentOnlineOfflineStateModel.onBecomeConsumingFromOffline(SegmentOnlineOfflineStateModelFactory.java:83)
   ```
   
   I do see that a number of Helix message handler threads are sitting idle 
(we have 50+, and only 6 are involved with the lock above, all presumably for 
the same table).
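
   For illustration, here is a minimal sketch of the locking pattern the 
stack trace above suggests: one thread sleeps in a polling loop while holding 
the monitor of an `AtomicBoolean`, so every other thread that needs the same 
monitor shows up as BLOCKED. This is not the actual Pinot code; the class 
`LockHoldDemo`, the `segmentsReady` flag, and `countBlockedWaiters` are 
hypothetical names made up for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicBoolean;

class LockHoldDemo {
  // Hypothetical stand-in for a per-table flag that doubles as the monitor.
  private final AtomicBoolean allSegmentsLoaded = new AtomicBoolean(false);
  // Hypothetical readiness condition that the lock holder polls.
  private volatile boolean segmentsReady = false;

  void addSegment() throws InterruptedException {
    synchronized (allSegmentsLoaded) {
      if (!allSegmentsLoaded.get()) {
        while (!segmentsReady) {
          Thread.sleep(10); // sleeping does not release the monitor
        }
        allSegmentsLoaded.set(true);
      }
    }
  }

  private Thread startWorker(String name) {
    Thread t = new Thread(() -> {
      try {
        addSegment();
      } catch (InterruptedException e) {
        Thread.currentThread().interrupt();
      }
    }, name);
    t.setDaemon(true);
    t.start();
    return t;
  }

  /** Returns how many waiter threads were observed in BLOCKED state. */
  static int countBlockedWaiters(int waiters) {
    LockHoldDemo demo = new LockHoldDemo();
    try {
      Thread holder = demo.startWorker("holder");
      // Wait until the holder is asleep inside the synchronized block.
      while (holder.getState() != Thread.State.TIMED_WAITING) {
        Thread.sleep(1);
      }
      List<Thread> threads = new ArrayList<>();
      for (int i = 0; i < waiters; i++) {
        threads.add(demo.startWorker("waiter-" + i));
      }
      // Poll (up to 5 seconds) until every waiter is stuck on monitor entry.
      long deadline = System.nanoTime() + 5_000_000_000L;
      long blocked;
      do {
        Thread.sleep(1);
        blocked = threads.stream()
            .filter(t -> t.getState() == Thread.State.BLOCKED).count();
      } while (blocked < waiters && System.nanoTime() < deadline);
      // Release the holder; waiters then enter one at a time and return.
      demo.segmentsReady = true;
      holder.join();
      for (Thread t : threads) {
        t.join();
      }
      return (int) blocked;
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt();
      throw new IllegalStateException(e);
    }
  }

  public static void main(String[] args) {
    System.out.println("blocked waiters: " + countBlockedWaiters(5));
  }
}
```

   In this sketch the waiters stay blocked for as long as `segmentsReady` 
remains false, which would match the dump if the condition polled by 
`waitForAllSegmentsLoaded` never becomes true for that table.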

