fx19880617 opened a new pull request #5864: URL: https://github.com/apache/incubator-pinot/pull/5864
## Description Current Pinot server start ServerInstance before ControllerLeaderLocator creation. This is typically ok but in one extreme case that during realtime server bootstrap, one segment is consumed to threshold and trying to persist, it will use `ControllerLeaderLocator` in `ServerSegmentCompletionProtocolHandler.createSegmentCompletionUrl(...)` which may not be created yet. Below is a sample logs: ``` 2020/08/14 00:20:05.855 INFO [PluginManager] [main] Plugins root dir is [/opt/pinot/plugins] 2020/08/14 00:20:05.869 INFO [PluginManager] [main] Loading all plugins. Please use env variable 'plugins.include' to customize. 2020/08/14 00:20:05.869 INFO [PluginManager] [main] Trying to load plugin [pinot-batch-ingestion-hadoop] from location [/opt/pinot/plugins/pinot-batch-ingestion/pinot-batch-ingestion-hadoop] 2020/08/14 00:20:05.870 INFO [PluginManager] [main] Successfully loaded plugin [pinot-batch-ingestion-hadoop] from jar file [/opt/pinot/plugins/pinot-batch-ingestion/pinot-batch-ingestion-hadoop/pinot-batch-ingestion-hadoop-0.5.0-SNAPSHOT-shaded.jar] ... ... 2020/08/14 00:20:06.199 INFO [ServerInstance] [main] Initializing server instance 2020/08/14 00:20:06.199 INFO [ServerInstance] [main] Initializing server metrics 2020/08/14 00:20:06.220 INFO [ServerInstance] [main] Initializing instance data manager of class: org.apache.pinot.server.starter.helix.HelixInstanceDataManager 2020/08/14 00:20:06.223 INFO [HelixInstanceDataManager] [main] Initializing Helix instance data manager 2020/08/14 00:20:06.224 INFO [HelixInstanceDataManagerConfig] [main] InstanceDataManagerConfig, key: datadir , value: /var/pinot/server/data/index 2020/08/14 00:20:06.225 INFO [HelixInstanceDataManagerConfig] [main] InstanceDataManagerConfig, key: segmenttardir , value: /var/pinot/server/data/segment 2020/08/14 00:20:06.225 INFO [HelixInstanceDataManagerConfig] [main] InstanceDataManagerConfig, key: data.manager.class , value: org.apache.pinot.server.starter.helix.HelixInstanceDataManager ... 2020/08/14 00:20:08.114 INFO [HelixTaskExecutor] [main] Scheduling message 553a8091-14b7-48a9-91b6-61c9c722112d: testTable_REALTIME:testTable__0__4__20200813T2127Z, OFFLINE->CONSUMING 2020/08/14 00:20:08.115 INFO [HelixTaskExecutor] [main] Submit task: 553a8091-14b7-48a9-91b6-61c9c722112d to pool: java.util.concurrent.ThreadPoolExecutor@48b0e701[Running, pool size = 1, active threads = 1, queued tasks = 0, completed tasks = 0] 2020/08/14 00:20:08.203 INFO [HelixStateTransitionHandler] [HelixTaskExecutor-message_handle_thread] Instance Server_my-host-server-0.my-host-server-headless.my-host.svc.cluster.local_8098, partition testTable__0__4__20200813T2127Z received state transition from OFFLINE to CONSUMING on session 100e1f0f4d00022, message id: 553a8091-14b7-48a9-91b6-61c9c722112d 2020/08/14 00:20:08.203 INFO [SegmentOnlineOfflineStateModelFactory$SegmentOnlineOfflineStateModel] [HelixTaskExecutor-message_handle_thread] SegmentOnlineOfflineStateModel.onBecomeConsumingFromOffline() : ZnRecord=553a8091-14b7-48a9-91b6-61c9c722112d, {CREATE_TIMESTAMP=1597364407619, ClusterEventName=CurrentStateChange, EXECUTE_START_TIMESTAMP=1597364408128, EXE_SESSION_ID=100e1f0f4d00022, FROM_STATE=OFFLINE, MSG_ID=553a8091-14b7-48a9-91b6-61c9c722112d, MSG_STATE=read, MSG_TYPE=STATE_TRANSITION, PARTITION_NAME=testTable__0__4__20200813T2127Z, READ_TIMESTAMP=1597364407995, RESOURCE_NAME=testTable_REALTIME, RESOURCE_TAG=testTable_REALTIME, SRC_NAME=my-host-controller-0.my-host-controller-headless.my-host.svc.cluster.local_9000, SRC_SESSION_ID=100e1f0f4d0001e, STATE_MODEL_DEF=SegmentOnlineOfflineStateModel, STATE_MODEL_FACTORY_NAME=DEFAULT, TGT_NAME=Server_my-host-server-0.my-host-server-headless.my-host.svc.cluster.local_8098, TGT_SESSION_ID=100e1f0f4d00022, TO_STATE=CONSUMING}{}{} , Stat=Stat {_version=0, _creationTime=1597364407632, _modifiedTime=1597364407632, _ephemeralOwner=0} 2020/08/14 00:20:08.203 INFO [SegmentOnlineOfflineStateModelFactory$SegmentOnlineOfflineStateModel] [HelixTaskExecutor-message_handle_thread] SegmentOnlineOfflineStateModel.onBecomeOnlineFromOffline() : ZnRecord=553a8091-14b7-48a9-91b6-61c9c722112d, {CREATE_TIMESTAMP=1597364407619, ClusterEventName=CurrentStateChange, EXECUTE_START_TIMESTAMP=1597364408128, EXE_SESSION_ID=100e1f0f4d00022, FROM_STATE=OFFLINE, MSG_ID=553a8091-14b7-48a9-91b6-61c9c722112d, MSG_STATE=read, MSG_TYPE=STATE_TRANSITION, PARTITION_NAME=testTable__0__4__20200813T2127Z, READ_TIMESTAMP=1597364407995, RESOURCE_NAME=testTable_REALTIME, RESOURCE_TAG=testTable_REALTIME, SRC_NAME=my-host-controller-0.my-host-controller-headless.my-host.svc.cluster.local_9000, SRC_SESSION_ID=100e1f0f4d0001e, STATE_MODEL_DEF=SegmentOnlineOfflineStateModel, STATE_MODEL_FACTORY_NAME=DEFAULT, TGT_NAME=Server_my-host-server-0.my-host-server-headless.my-host.svc.cluster.local_8098, TGT_SESSION_ID=100e1f0f4d00022, TO_STATE=CONSUMING}{}{}, S tat=Stat {_version=0, _creationTime=1597364407632, _modifiedTime=1597364407632, _ephemeralOwner=0} 2020/08/14 00:20:08.203 INFO [HelixInstanceDataManager] [HelixTaskExecutor-message_handle_thread] Adding segment: testTable__0__4__20200813T2127Z to table: testTable_REALTIME 2020/08/14 00:20:09.093 INFO [LLRealtimeSegmentDataManager_testTable__0__4__20200813T2127Z] [HelixTaskExecutor-message_handle_thread] RealtimeDataResourceZKMetadata contains no information about sorted column for segment testTable__0__4__20200813T2127Z 2020/08/14 00:20:09.099 INFO [LLRealtimeSegmentDataManager_testTable__0__4__20200813T2127Z] [HelixTaskExecutor-message_handle_thread] Creating new stream consumer, reason: Starting 2020/08/14 00:20:09.129 INFO [LLRealtimeSegmentDataManager_testTable__0__4__20200813T2127Z] [HelixTaskExecutor-message_handle_thread] Creating new stream metadata provider, reason: Starting ... 2020/08/14 00:20:09.173 INFO [FixedByteSVMutableForwardIndex] [HelixTaskExecutor-message_handle_thread] Allocating 200000 bytes for: testTable__0__4__20200813T2127Z:ItemStatus.sv.unsorted.fwd ... 2020/08/14 00:20:09.228 INFO [MutableSegmentImpl_testTable__0__4__20200813T2127Z_testTables] [HelixTaskExecutor-message_handle_thread] Metrics aggregation is disabled. 2020/08/14 00:20:09.272 INFO [LLRealtimeSegmentDataManager_testTable__0__4__20200813T2127Z] [HelixTaskExecutor-message_handle_thread] Starting consumption on realtime consuming segment testTable__0__4__20200813T2127Z maxRowCount 50000 maxEndTime 2020-08-14T00:30:09.229Z 2020/08/14 00:20:09.292 INFO [LLRealtimeSegmentDataManager_testTable__0__4__20200813T2127Z] [HelixTaskExecutor-message_handle_thread] Created new consumer thread Thread[testTable__0__4__20200813T2127Z,5,main] for org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager@a12cbb0 2020/08/14 00:20:09.299 INFO [testTable_REALTIME-RealtimeTableDataManager] [HelixTaskExecutor-message_handle_thread] Initialize RealtimeSegmentDataManager - testTable__0__4__20200813T2127Z 2020/08/14 00:20:09.299 INFO [HelixInstanceDataManager] [HelixTaskExecutor-message_handle_thread] Added segment: testTable__0__4__20200813T2127Z to table: testTable_REALTIME 2020/08/14 00:20:09.300 INFO [LLRealtimeSegmentDataManager_testTable__0__4__20200813T2127Z] [testTable__0__4__20200813T2127Z] Starting consumption loop start offset 200000, finalOffset null 2020/08/14 00:20:09.383 INFO [HelixTask] [HelixTaskExecutor-message_handle_thread] Message: 553a8091-14b7-48a9-91b6-61c9c722112d (parent: null) handling task for testTable_REALTIME:testTable__0__4__20200813T2127Z completed at: 1597364409383, results: true. FrameworkTime: 53 ms; HandlerTime: 1211 ms. 2020/08/14 00:20:09.717 INFO [Metadata] [testTable__0__4__20200813T2127Z] Cluster ID: 21TWeeruS9eTKd-gAXF1qQ 2020/08/14 00:20:14.127 INFO [AbstractCoordinator] [testTable__0__4__20200813T2127Z] [Consumer clientId=consumer-2, groupId=] Discovered group coordinator b-3.gc-pinot-kafka.hgsrac.c3.kafka.us-west-2.amazonaws.com:9092 (id: 2147483644 rack: null) 2020/08/14 00:20:18.489 INFO [LLRealtimeSegmentDataManager_testTable__0__4__20200813T2127Z] [testTable__0__4__20200813T2127Z] Stopping consumption due to row limit nRows=50000 numRowsIndexed=50000, numRowsConsumed=50000 2020/08/14 00:20:18.489 INFO [LLRealtimeSegmentDataManager_testTable__0__4__20200813T2127Z] [testTable__0__4__20200813T2127Z] Stopping consumption due to row limit nRows=50000 numRowsIndexed=50000, numRowsConsumed=50000 2020/08/14 00:20:18.489 ERROR [LLRealtimeSegmentDataManager_testTable__0__4__20200813T2127Z] [testTable__0__4__20200813T2127Z] Exception while in work java.lang.RuntimeException: Not yet created at org.apache.pinot.server.realtime.ControllerLeaderLocator.getInstance(ControllerLeaderLocator.java:79) ~[pinot-all-0.5.0-SNAPSHOT-jar-with-dependencies.jar:0.5.0-SNAPSHOT-8ab032fa05b514ab6614396d632b9c9e52b9100d] at org.apache.pinot.server.realtime.ServerSegmentCompletionProtocolHandler.createSegmentCompletionUrl(ServerSegmentCompletionProtocolHandler.java:188) ~[pinot-all-0.5.0-SNAPSHOT-jar-with-dependencies.jar:0.5.0-SNAPSHOT-8ab032fa05b514ab6614396d632b9c9e52b9100d] at org.apache.pinot.server.realtime.ServerSegmentCompletionProtocolHandler.segmentConsumed(ServerSegmentCompletionProtocolHandler.java:170) ~[pinot-all-0.5.0-SNAPSHOT-jar-with-dependencies.jar:0.5.0-SNAPSHOT-8ab032fa05b514ab6614396d632b9c9e52b9100d] at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager.postSegmentConsumedMsg(LLRealtimeSegmentDataManager.java:926) ~[pinot-all-0.5.0-SNAPSHOT-jar-with-dependencies.jar:0.5.0-SNAPSHOT-8ab032fa05b514ab6614396d632b9c9e52b9100d] at org.apache.pinot.core.data.manager.realtime.LLRealtimeSegmentDataManager$PartitionConsumer.run(LLRealtimeSegmentDataManager.java:553) [pinot-all-0.5.0-SNAPSHOT-jar-with-dependencies.jar:0.5.0-SNAPSHOT-8ab032fa05b514ab6614396d632b9c9e52b9100d] at java.lang.Thread.run(Thread.java:748) [?:1.8.0_265] ``` ---------------------------------------------------------------- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org