ankitsultana opened a new issue, #10185:
URL: https://github.com/apache/pinot/issues/10185
We recently started using Partial Upsert tables for a use case and began seeing the following issue. We have a cluster with a few partial upsert tables with replication=1. If we restart a server in the cluster, all the tables (even offline and vanilla realtime tables) go into a Bad state. In the server logs we see entries like the following:

```
2023/01/26 17:45:32.602 INFO [TableStateUtils] [HelixTaskExecutor-message_handle_thread_29] Find unloaded segment: my_great_table, table: my_great_table, expected: ONLINE, actual: OFFLINE
```

On taking a thread dump, I see as many threads as there are partial upsert tables in the cluster, all stuck in this loop (corresponding [PR](https://github.com/apache/pinot/pull/8923/files)): https://github.com/apache/pinot/blob/master/pinot-segment-local/src/main/java/org/apache/pinot/segment/local/utils/tablestate/TableStateUtils.java#L130

Sample thread dump:

```
"HelixTaskExecutor-message_handle_thread_35" #116 daemon prio=5 os_prio=0 cpu=74100.46ms elapsed=3248.88s tid=0x00007eb8202b8000 nid=0xe9 waiting on condition [0x00007eb78b8f8000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
    at java.lang.Thread.sleep(java.base@11.0.15/Native Method)
    at org.apache.pinot.segment.local.utils.tablestate.TableStateUtils.waitForAllSegmentsLoaded(TableStateUtils.java:133)
    at org.apache.pinot.core.data.manager.realtime.RealtimeTableDataManager.addSegment(RealtimeTableDataManager.java:416)
    - locked <0x00007ebbe43613b8> (a java.util.concurrent.atomic.AtomicBoolean)
    at org.apache.pinot.server.starter.helix.HelixInstanceDataManager.addRealtimeSegment(HelixInstanceDataManager.java:189)
    at org.apache.pinot.server.starter.helix.SegmentOnlineOfflineStateModelFactory$SegmentOnlineOfflineStateModel.onBecomeOnlineFromOffline(SegmentOnlineOfflineStateModelFactory.java:168)
    at org.apache.pinot.server.starter.helix.SegmentOnlineOfflineStateModelFactory$SegmentOnlineOfflineStateModel.onBecomeConsumingFromOffline(SegmentOnlineOfflineStateModelFactory.java:83)
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(java.base@11.0.15/Native Method)
    at jdk.internal.reflect.NativeMethodAccessorImpl.invoke(java.base@11.0.15/NativeMethodAccessorImpl.java:62)
    at jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(java.base@11.0.15/DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(java.base@11.0.15/Method.java:566)
    at org.apache.helix.messaging.handling.HelixStateTransitionHandler.invoke(HelixStateTransitionHandler.java:350)
```
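The dump shows each stuck thread sleeping inside `TableStateUtils.waitForAllSegmentsLoaded` while it occupies a Helix message-handling thread. The snippet below is a minimal, hypothetical Java sketch, not Pinot's or Helix's code, of one way such a sleep-poll wait can stall. It assumes the handlers share a bounded thread pool, and that the transition which would satisfy the wait is queued on that same pool. The class `PollWaitStarvationSketch`, its methods, the poll interval, and the timeout are all invented for illustration. The timeout exists only so the sketch terminates, and the `AtomicBoolean` lock seen in the dump is not modeled.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

/**
 * Hypothetical model, NOT Pinot's code: state-transition handlers that sleep in a loop
 * until a "segment loaded" flag is set, all sharing one bounded handler pool.
 */
public class PollWaitStarvationSketch {

    /** Polls the condition every pollMs until it holds (true) or timeoutMs elapses (false). */
    static boolean waitUntil(BooleanSupplier condition, long pollMs, long timeoutMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!condition.getAsBoolean()) {
            if (System.currentTimeMillis() >= deadline) {
                return false; // timeout exists only so this sketch terminates
            }
            Thread.sleep(pollMs); // analogous to the TIMED_WAITING (sleeping) frame in the dump
        }
        return true;
    }

    /**
     * Submits `waiters` waiting handlers, then the single task that sets the flag, to a fixed
     * pool of `poolSize` threads. Returns true only if every waiter saw the flag before timing out.
     */
    static boolean runDemo(int poolSize, int waiters, long timeoutMs) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(poolSize);
        AtomicBoolean loaded = new AtomicBoolean(false);
        try {
            List<Future<Boolean>> results = new ArrayList<>();
            for (int i = 0; i < waiters; i++) {
                results.add(pool.submit(() -> waitUntil(loaded::get, 10, timeoutMs)));
            }
            // The transition that would "load" the segment is queued behind the waiters.
            pool.submit(() -> loaded.set(true));
            boolean allLoaded = true;
            for (Future<Boolean> f : results) {
                allLoaded &= f.get();
            }
            return allLoaded;
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        // Spare thread available: the flag-setting task runs immediately, so the waiters finish.
        System.out.println("poolSize=4, waiters=2 -> " + runDemo(4, 2, 500));
        // Every thread is busy waiting: the flag-setting task cannot start until a waiter gives up.
        System.out.println("poolSize=2, waiters=2 -> " + runDemo(2, 2, 500));
    }
}
```

Running `main` prints `true` for the first case and `false` for the second. The only difference is whether a free thread remains for the task the waiters depend on. Whether this is the actual cause here, for example with replication=1 and one waiting thread per partial upsert table, is an assumption the report does not confirm.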