lnbest0707-uber opened a new issue, #12732: URL: https://github.com/apache/pinot/issues/12732
After a server with an invalid instance config joins the cluster, for example:

`{ "id": "<some_id>", "simpleFields": { "HELIX_HOST": "<some_host>", "HELIX_PORT": "", }, "mapFields": {}, "listFields": {} }`

brokers (even those in other tenants) cannot build brokerResource correctly. The entire cluster can no longer serve any queries, and requests fail with "**410 BrokerResourceMissingError**". The following error may appear in the broker log:

`java.lang.NullPointerException: Cannot invoke "java.util.Set.contains(Object)" because "this._enabledInstances" is null
  at o.a.p.b.r.i.BaseInstanceSelector.getEnabledCandidatesAndAddToServingInstances(BaseInstanceSelector.java:338)
  at o.a.p.b.r.i.BaseInstanceSelector.refreshSegmentStates(BaseInstanceSelector.java:294)
  at o.a.p.b.r.i.BaseInstanceSelector.init(BaseInstanceSelector.java:117)
  at o.a.p.b.r.i.BalancedInstanceSelector.init(BalancedInstanceSelector.java:50)
  at o.a.p.b.r.BrokerRoutingManager.buildRouting(BrokerRoutingManager.java:450)
  at o.a.p.b.b.h.BrokerResourceOnlineOfflineStateModelFactory$BrokerResourceOnlineOfflineStateModel.onBecomeOnlineFromOffline(BrokerResourceOnlineOfflineStateModelFactory.java:80)
  at j.i.r.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
  at java.lang.reflect.Method.invoke(Method.java:580)
  at o.a.h.m.h.HelixStateTransitionHandler.invoke(HelixStateTransitionHandler.java:350)
  at o.a.h.m.h.HelixStateTransitionHandler.h...`

This indicates that `this._enabledInstances` was not initialized in BaseInstanceSelector.
This Set is assigned in the following method:

`@Override
public void init(Set<String> enabledInstances, IdealState idealState, ExternalView externalView, Set<String> onlineSegments) {
  _enabledInstances = enabledInstances;
  Map<String, Long> newSegmentCreationTimeMap = getNewSegmentCreationTimeMapFromZK(idealState, externalView, onlineSegments);
  updateSegmentMaps(idealState, externalView, onlineSegments, newSegmentCreationTimeMap);
  refreshSegmentStates();
}`

It is called indirectly from BrokerRoutingManager.processInstanceConfigChange(). If any exception is raised in that method before it reaches `_routableServers = enabledServers;`, the exception may be caught and the Set left as null. Taking the above server config as an example, the broker hits the following exception and returns early:

`java.lang.NumberFormatException: For input string: ""
  at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
  at java.base/java.lang.Integer.parseInt(Integer.java:662)
  at java.base/java.lang.Integer.parseInt(Integer.java:770)
  at org.apache.pinot.core.transport.ServerInstance.<init>(ServerInstance.java:63)
  at org.apache.pinot.broker.routing.BrokerRoutingManager.processInstanceConfigChange(BrokerRoutingManager.java:245)
  at org.apache.pinot.broker.routing.BrokerRoutingManager.processClusterChange(BrokerRoutingManager.java:133)
  at org.apache.pinot.broker.broker.helix.ClusterChangeMediator.processClusterChange(ClusterChangeMediator.java:134)
  at org.apache.pinot.broker.broker.helix.ClusterChangeMediator.lambda$new$0(ClusterChangeMediator.java:96)
  at java.base/java.lang.Thread.run(Thread.java:829)`

Such behavior is very dangerous for a distributed system: a single misconfigured participant instance can bring down the entire cluster. The following items might be required:

- When server configs are updated through the controller API, sanity checks and enforcement need to be in place.
- During broker initialization, safely create every object that must not be null in the constructor. When constructing the mapping across servers, isolate the bad configs and keep the good candidates functional.
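The two items above could be sketched roughly as follows. This is only an illustration, not Pinot's actual code: the class `SafeInstanceConfigSketch`, the helpers `parsePort` and `buildEnabledServers`, and the `Result` type are all hypothetical, and instance configs are modeled as plain maps of `simpleFields` rather than Helix `InstanceConfig` objects. The idea is to validate each config individually, skip the bad ones, and always end up with a non-null collection of usable servers.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.OptionalInt;

// Hypothetical sketch; none of these names exist in Pinot.
final class SafeInstanceConfigSketch {

  /** Returns the port if rawPort is a valid TCP port, otherwise empty. Never throws. */
  static OptionalInt parsePort(String rawPort) {
    if (rawPort == null || rawPort.trim().isEmpty()) {
      return OptionalInt.empty();
    }
    try {
      int port = Integer.parseInt(rawPort.trim());
      return (port > 0 && port <= 65535) ? OptionalInt.of(port) : OptionalInt.empty();
    } catch (NumberFormatException e) {
      return OptionalInt.empty();
    }
  }

  /** Usable servers (instance id -> "host:port") plus the ids that were skipped. */
  static final class Result {
    // Both collections are created eagerly, so callers never observe null.
    final Map<String, String> enabledServers = new LinkedHashMap<>();
    final List<String> skippedInstanceIds = new ArrayList<>();
  }

  /**
   * Builds the server map one instance at a time. A malformed config only
   * removes that instance from the result; it does not abort the whole pass.
   */
  static Result buildEnabledServers(Map<String, Map<String, String>> simpleFieldsById) {
    Result result = new Result();
    for (Map.Entry<String, Map<String, String>> entry : simpleFieldsById.entrySet()) {
      Map<String, String> fields = entry.getValue();
      String host = (fields == null) ? null : fields.get("HELIX_HOST");
      OptionalInt port = parsePort((fields == null) ? null : fields.get("HELIX_PORT"));
      if (host == null || host.isEmpty() || !port.isPresent()) {
        // Isolate the bad config; a real broker would also log or emit a metric here.
        result.skippedInstanceIds.add(entry.getKey());
        continue;
      }
      result.enabledServers.put(entry.getKey(), host + ":" + port.getAsInt());
    }
    return result;
  }

  public static void main(String[] args) {
    Map<String, Map<String, String>> configs = new LinkedHashMap<>();

    Map<String, String> good = new LinkedHashMap<>();
    good.put("HELIX_HOST", "host-a");
    good.put("HELIX_PORT", "8098");
    configs.put("Server_good", good);

    // Mirrors the invalid config from the report: empty HELIX_PORT.
    Map<String, String> bad = new LinkedHashMap<>();
    bad.put("HELIX_HOST", "host-b");
    bad.put("HELIX_PORT", "");
    configs.put("Server_bad", bad);

    Result result = buildEnabledServers(configs);
    System.out.println("enabled=" + result.enabledServers);
    System.out.println("skipped=" + result.skippedInstanceIds);
  }
}
```

The same `parsePort`-style check could also run on the controller side before an instance config is written, so that an empty `HELIX_PORT` is rejected at update time instead of being discovered by every broker later.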