lnbest0707-uber opened a new issue, #12732:
URL: https://github.com/apache/pinot/issues/12732

   After a server with an invalid configuration joins the cluster, for example
   `{
     "id": "<some_id>",
     "simpleFields": {
       "HELIX_HOST": "<some_host>",
       "HELIX_PORT": ""
     },
     "mapFields": {},
     "listFields": {}
   }`
   The brokers (even those in other tenants) cannot build brokerResource correctly. The entire cluster can no longer serve any queries and raises "**410 BrokerResourceMissingError**".
   
   The following error may appear in the broker log
   `java.lang.NullPointerException: Cannot invoke "java.util.Set.contains(Object)" because "this._enabledInstances" is null
        at o.a.p.b.r.i.BaseInstanceSelector.getEnabledCandidatesAndAddToServingInstances(BaseInstanceSelector.java:338)
        at o.a.p.b.r.i.BaseInstanceSelector.refreshSegmentStates(BaseInstanceSelector.java:294)
        at o.a.p.b.r.i.BaseInstanceSelector.init(BaseInstanceSelector.java:117)
        at o.a.p.b.r.i.BalancedInstanceSelector.init(BalancedInstanceSelector.java:50)
        at o.a.p.b.r.BrokerRoutingManager.buildRouting(BrokerRoutingManager.java:450)
        at o.a.p.b.b.h.BrokerResourceOnlineOfflineStateModelFactory$BrokerResourceOnlineOfflineStateModel.onBecomeOnlineFromOffline(BrokerResourceOnlineOfflineStateModelFactory.java:80)
        at j.i.r.DirectMethodHandleAccessor.invoke(DirectMethodHandleAccessor.java:103)
        at java.lang.reflect.Method.invoke(Method.java:580)
        at o.a.h.m.h.HelixStateTransitionHandler.invoke(HelixStateTransitionHandler.java:350)
        at o.a.h.m.h.HelixStateTransitionHandler.h...`
   
   which indicates that `this._enabledInstances` was never initialized in BaseInstanceSelector. This Set is assigned in the following method
   `
   @Override
   public void init(Set<String> enabledInstances, IdealState idealState, ExternalView externalView,
       Set<String> onlineSegments) {
     _enabledInstances = enabledInstances;
     Map<String, Long> newSegmentCreationTimeMap =
         getNewSegmentCreationTimeMapFromZK(idealState, externalView, onlineSegments);
     updateSegmentMaps(idealState, externalView, onlineSegments, newSegmentCreationTimeMap);
     refreshSegmentStates();
   }
   `
   This method is called indirectly by BrokerRoutingManager.processInstanceConfigChange().
   If any exception is thrown in that method before it reaches `_routableServers = enabledServers;`, the exception may be caught, leaving the Set null.
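
The failure mode described above can be reproduced in isolation. The sketch below is not Pinot code: `RoutingSketch`, `refreshQuietly`, and the host-to-port map are hypothetical stand-ins, and only the names `_routableServers`, `enabledServers`, and `processInstanceConfigChange` are borrowed from the issue. It only illustrates that when the assignment is the last statement of a method and an exception is thrown before it, the field keeps its initial null value.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the broker-side routing state; not the real BrokerRoutingManager.
class RoutingSketch {
  // Not assigned at construction, so it stays null until the first successful refresh.
  Set<String> _routableServers;

  // Mirrors the shape described in the issue: the field is assigned only after every config parses.
  void processInstanceConfigChange(Map<String, String> hostToPort) {
    Set<String> enabledServers = new HashSet<>();
    for (Map.Entry<String, String> entry : hostToPort.entrySet()) {
      // Throws NumberFormatException for an empty port string.
      int port = Integer.parseInt(entry.getValue());
      enabledServers.add(entry.getKey() + "_" + port);
    }
    // Never reached if any single entry fails to parse.
    _routableServers = enabledServers;
  }

  // Mimics a caller that catches the exception, logs it, and moves on.
  static RoutingSketch refreshQuietly(Map<String, String> hostToPort) {
    RoutingSketch sketch = new RoutingSketch();
    try {
      sketch.processInstanceConfigChange(hostToPort);
    } catch (RuntimeException e) {
      System.err.println("Caught exception while processing instance config change: " + e);
    }
    return sketch;
  }
}
```

With one good entry and one entry whose port is `""`, `refreshQuietly` returns an object whose `_routableServers` is still null, so any later `contains` call on it would throw the NullPointerException shown earlier.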
   
   Taking the above server config as an example, the broker throws the following exception
   `java.lang.NumberFormatException: For input string: ""
        at java.base/java.lang.NumberFormatException.forInputString(NumberFormatException.java:65)
        at java.base/java.lang.Integer.parseInt(Integer.java:662)
        at java.base/java.lang.Integer.parseInt(Integer.java:770)
        at org.apache.pinot.core.transport.ServerInstance.<init>(ServerInstance.java:63)
        at org.apache.pinot.broker.routing.BrokerRoutingManager.processInstanceConfigChange(BrokerRoutingManager.java:245)
        at org.apache.pinot.broker.routing.BrokerRoutingManager.processClusterChange(BrokerRoutingManager.java:133)
        at org.apache.pinot.broker.broker.helix.ClusterChangeMediator.processClusterChange(ClusterChangeMediator.java:134)
        at org.apache.pinot.broker.broker.helix.ClusterChangeMediator.lambda$new$0(ClusterChangeMediator.java:96)
        at java.base/java.lang.Thread.run(Thread.java:829)`
   and returns early, skipping the rest of the method.
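
The trace shows `Integer.parseInt` being handed the empty `HELIX_PORT` string. A minimal sketch of a defensive parse follows. The `PortParsing` class and `parsePort` helper are hypothetical and not part of Pinot, and the accepted range of 1 to 65535 is an assumption about what counts as a usable port.

```java
import java.util.OptionalInt;

// Hypothetical helper; not part of Pinot or Helix.
class PortParsing {
  // Returns the port if the raw value is a number in 1..65535 (assumed valid range), else empty.
  static OptionalInt parsePort(String raw) {
    if (raw == null || raw.trim().isEmpty()) {
      return OptionalInt.empty();
    }
    try {
      int port = Integer.parseInt(raw.trim());
      return (port >= 1 && port <= 65535) ? OptionalInt.of(port) : OptionalInt.empty();
    } catch (NumberFormatException e) {
      // Non-numeric input is reported as "no usable port" instead of propagating.
      return OptionalInt.empty();
    }
  }
}
```

A caller can then decide what to do with an empty result, for example reject the config or skip the instance, rather than having the exception abort the surrounding method.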
   
   Such behavior is dangerous for a distributed system: a single misconfigured participant instance can bring down the entire cluster.
   The following changes may be required:
   - When server configs are updated through the controller API, sanity checks and enforcement need to be in place.
   - During broker initialization, safely create every object that must not be null in the constructor. When building the mapping across servers, isolate the bad configs and keep the good candidates functional.
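
One possible shape for the second item is sketched below. It is not a patch against Pinot: `ResilientRoutingSketch`, its accessors, and the representation of instance configs as a map from instance id to `simpleFields` are all assumptions made for illustration. The two ideas it shows are initializing the set eagerly so it is never null, and handling each instance in its own try/catch so one bad config cannot abort the whole refresh.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch; not the real BrokerRoutingManager.
class ResilientRoutingSketch {
  // Assigned at construction, so readers never observe null.
  private volatile Set<String> _routableServers = Collections.emptySet();
  private volatile List<String> _skippedInstances = Collections.emptyList();

  // instanceConfigs maps an instance id to its simpleFields (assumed shape).
  void processInstanceConfigChange(Map<String, Map<String, String>> instanceConfigs) {
    Set<String> enabledServers = new HashSet<>();
    List<String> skipped = new ArrayList<>();
    for (Map.Entry<String, Map<String, String>> entry : instanceConfigs.entrySet()) {
      try {
        String host = entry.getValue().get("HELIX_HOST");
        if (host == null || host.isEmpty()) {
          throw new IllegalArgumentException("Missing HELIX_HOST");
        }
        // Throws NumberFormatException for null, empty, or non-numeric values.
        int port = Integer.parseInt(entry.getValue().get("HELIX_PORT"));
        enabledServers.add(host + ":" + port);
      } catch (RuntimeException e) {
        // Isolate the bad instance instead of aborting the whole refresh.
        skipped.add(entry.getKey());
      }
    }
    // Always reached, so the good candidates stay routable.
    _routableServers = enabledServers;
    _skippedInstances = skipped;
  }

  Set<String> getRoutableServers() {
    return _routableServers;
  }

  List<String> getSkippedInstances() {
    return _skippedInstances;
  }
}
```

Recording the skipped instance ids keeps the bad configs visible for logging or metrics, which a real implementation would likely want in addition to skipping them.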

