ankitsultana opened a new issue, #10185:
URL: https://github.com/apache/pinot/issues/10185

   We recently started using partial upsert tables for a use case and began 
seeing this issue. We have a cluster with a few partial upsert tables with 
replication=1. If we restart a server in the cluster, all the tables (including 
offline and vanilla realtime tables) go into Bad state.
   
   The server logs contain entries like the following:
   
   ```
   2023/01/26 17:45:32.602 INFO [TableStateUtils] [HelixTaskExecutor-message_handle_thread_29] Find unloaded segment: my_great_table, table: my_great_table, expected: ONLINE, actual: OFFLINE
   ```
   
   A thread dump shows as many threads as there are partial upsert tables in 
the cluster, all stuck in this loop (corresponding 
[PR](https://github.com/apache/pinot/pull/8923/files)):
   
   
https://github.com/apache/pinot/blob/master/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/tablestate/TableStateUtils.java#L130
   
   Sample thread dump:
   
   ```
   "HelixTaskExecutor-message_handle_thread_35" #116 daemon prio=5 os_prio=0 cpu=74100.46ms elapsed=3248.88s tid=0x00007eb8202b8000 nid=0xe9 waiting on condition  [0x00007eb78b8f8000]
      java.lang.Thread.State: TIMED_WAITING (sleeping)
           at java.lang.Thread.sleep(java.base@11.0.15/Native Method)
           at org.apache.pinot.segment.local.utils.tablestate.TableStateUtils.waitForAllSegmentsLoaded(TableStateUtils.java:133)
           at org.apache.pinot.core.data.manager.realtime.RealtimeTableDataManager.addSegment(RealtimeTableDataManager.java:416)
           - locked <0x00007ebbe43613b8> (a java.util.concurrent.atomic.AtomicBoolean)
           at org.apache.pinot.server.starter.helix.HelixInstanceDataManager.addRealtimeSegment(HelixInstanceDataManager.java:189)
           at org.apache.pinot.server.starter.helix.SegmentOnlineOfflineStateModelFactory$SegmentOnlineOfflineStateModel.onBecomeOnlineFromOffline(SegmentOnlineOfflineStateModelFactory.java:168)
           at org.apache.pinot.server.starter.helix.SegmentOnlineOfflineStateModelFactory$SegmentOnlineOfflineStateModel.onBecomeConsumingFromOffline(SegmentOnlineOfflineStateModelFactory.java:83)
           at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(java.base@11.0.15/Native Method)
           at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(java.base@11.0.15/NativeMethodAccessorImpl.java:62)
           at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@11.0.15/DelegatingMethodAccessorImpl.java:43)
           at java.lang.reflect.Method.invoke(java.base@11.0.15/Method.java:566)
           at org.apache.helix.messaging.handling.HelixStateTransitionHandler.invoke(HelixStateTransitionHandler.java:350)
   ```
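
To make the reported behaviour easier to reason about, here is a minimal sketch of the kind of polling loop the stack trace points at. It is not the Pinot implementation: the class name, the plain `Map` arguments standing in for the expected and actual segment states, and the timeout parameter are all assumptions made for illustration. The timeout in particular is added only so that the example terminates; the thread dump above (`elapsed=3248.88s`, state `TIMED_WAITING (sleeping)`) suggests the real thread was still sleeping in the loop after roughly 54 minutes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch, NOT the Pinot source. All names here are illustrative.
class WaitForSegmentsSketch {

  // True only if every segment expected to be ONLINE is actually ONLINE.
  static boolean allSegmentsLoaded(Map<String, String> expected, Map<String, String> actual) {
    for (Map.Entry<String, String> e : expected.entrySet()) {
      if ("ONLINE".equals(e.getValue()) && !"ONLINE".equals(actual.get(e.getKey()))) {
        return false;
      }
    }
    return true;
  }

  // Polls until all segments are loaded or the (assumed) timeout expires.
  // Returns true if loaded, false on timeout or interruption.
  static boolean waitForAllSegmentsLoaded(Map<String, String> expected, Map<String, String> actual,
      long pollMs, long timeoutMs) {
    long deadline = System.nanoTime() + timeoutMs * 1_000_000L;
    while (!allSegmentsLoaded(expected, actual)) {
      if (System.nanoTime() - deadline >= 0) {
        return false;
      }
      try {
        // The calling thread is parked here, matching the sleep frame in the dump.
        Thread.sleep(pollMs);
      } catch (InterruptedException ie) {
        Thread.currentThread().interrupt();
        return false;
      }
    }
    return true;
  }

  public static void main(String[] args) {
    Map<String, String> expected = new ConcurrentHashMap<>();
    Map<String, String> actual = new ConcurrentHashMap<>();
    expected.put("seg_0", "ONLINE");
    actual.put("seg_0", "OFFLINE");
    // Segment never comes ONLINE within the timeout.
    System.out.println(waitForAllSegmentsLoaded(expected, actual, 10, 50)); // prints false
    actual.put("seg_0", "ONLINE");
    System.out.println(waitForAllSegmentsLoaded(expected, actual, 10, 50)); // prints true
  }
}
```

If a loop of this shape runs on a state-transition thread and the condition depends on other transitions that have not been processed yet, each waiting table occupies one thread for as long as the condition stays false, which would be consistent with seeing one stuck thread per partial upsert table.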


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

