[ https://issues.apache.org/jira/browse/GEODE-9808?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Owen Nichols closed GEODE-9808.
-------------------------------

> Client ops fail with NoLocatorsAvailableException when all servers leave the DS
> --------------------------------------------------------------------------------
>
>                 Key: GEODE-9808
>                 URL: https://issues.apache.org/jira/browse/GEODE-9808
>             Project: Geode
>          Issue Type: Bug
>          Components: client/server
>   Affects Versions: 1.15.0
>            Reporter: Bill Burcham
>            Assignee: Donal Evans
>            Priority: Major
>              Labels: pull-request-available
>             Fix For: 1.15.0
>
>
> When there are no cache servers (only locators) in a cluster, client
> operations fail with a misleading exception:
> {noformat}
> org.apache.geode.cache.client.NoAvailableLocatorsException: Unable to connect to any locators in the list
> [gemfire-cluster-locator-0.gemfire-cluster-locator.namespace-1850250019.svc.cluster.local:10334,
> gemfire-cluster-locator-1.gemfire-cluster-locator.namespace-1850250019.svc.cluster.local:10334,
> gemfire-cluster-locator-2.gemfire-cluster-locator.namespace-1850250019.svc.cluster.local:10334]
>   at org.apache.geode.cache.client.internal.AutoConnectionSourceImpl.findServer(AutoConnectionSourceImpl.java:174)
>   at org.apache.geode.cache.client.internal.ConnectionFactoryImpl.createClientToServerConnection(ConnectionFactoryImpl.java:211)
>   at org.apache.geode.cache.client.internal.pooling.ConnectionManagerImpl.createPooledConnection(ConnectionManagerImpl.java:196)
>   at org.apache.geode.cache.client.internal.pooling.ConnectionManagerImpl.forceCreateConnection(ConnectionManagerImpl.java:227)
>   at org.apache.geode.cache.client.internal.pooling.ConnectionManagerImpl.exchangeConnection(ConnectionManagerImpl.java:365)
>   at org.apache.geode.cache.client.internal.OpExecutorImpl.execute(OpExecutorImpl.java:161)
>   at org.apache.geode.cache.client.internal.OpExecutorImpl.execute(OpExecutorImpl.java:120)
>   at org.apache.geode.cache.client.internal.PoolImpl.execute(PoolImpl.java:805)
>   at org.apache.geode.cache.client.internal.PutOp.execute(PutOp.java:91)
> {noformat}
> Even though the client is able to connect to a locator, we encounter a
> NoAvailableLocatorsException with the message "Unable to connect to any
> locators in the list".
>
> Investigating the product code we see:
> # If there are no cache servers in the cluster, ServerLocator.pickServer()
> always constructs a ClientConnectionResponse(null), whose hasResult()
> returns false in the loop-termination check in
> AutoConnectionSourceImpl.queryLocators().
> # The exception wording is misleading not only in
> AutoConnectionSourceImpl.findServer() but also in at least two other call
> sites in AutoConnectionSourceImpl: findReplacementServer() and
> findServersForQueue().
> # In each of those cases the calling method translates a null response from
> queryLocators() into a throw of a NoAvailableLocatorsException.
> # An appropriate exception, NoAvailableServersException, already exists for
> the case where we were able to contact a locator but the locator was not
> able to find any cache servers.
> # According to my Git spelunking, queryLocators() has been obfuscating the
> true cause of the failure since at least 2015.
>
> Without analyzing ServerLocator.pickServer()
> (LocatorLoadSnapshot.getServerForConnection()) to discern why two locators
> might disagree on how many cache servers are in the cluster, it seems to me
> that we should modify AutoConnectionSourceImpl.queryLocators() so that:
> * if it gets a ServerLocationResponse with hasResult() true, it immediately
> returns that response, as it does now
> * otherwise it keeps trying the remaining locators and keeps track of the
> last non-null ServerLocationResponse it has received
> * once it has tried every locator, it returns the last non-null
> ServerLocationResponse it received, or null if it received none
>
> With that in hand, we can change the three call sites in
> AutoConnectionSourceImpl (findServer(), findReplacementServer(), and
> findServersForQueue()) to each throw NoAvailableLocatorsException if no
> locator responded, or NoAvailableServersException if a locator responded
> with a ClientConnectionResponse for which hasResult() returns false.

--
This message was sent by Atlassian Jira
(v8.20.7#820007)
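
The queryLocators() change proposed in the issue description can be illustrated with a small standalone sketch. It is not the Geode implementation or the merged fix: Response, the two *SketchException classes, outcome(), and the modelling of each locator as a Supplier are hypothetical stand-ins for ServerLocationResponse, NoAvailableLocatorsException, NoAvailableServersException, and the real locator requests. A Supplier that returns null models a locator that could not be reached.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Supplier;

public class QueryLocatorsSketch {

    /** Hypothetical locator reply; a null server models ClientConnectionResponse(null). */
    static final class Response {
        final String server;
        Response(String server) { this.server = server; }
        boolean hasResult() { return server != null; }
    }

    /** Hypothetical stand-in for NoAvailableLocatorsException. */
    static final class NoLocatorsSketchException extends RuntimeException {
        NoLocatorsSketchException(String message) { super(message); }
    }

    /** Hypothetical stand-in for NoAvailableServersException. */
    static final class NoServersSketchException extends RuntimeException {
        NoServersSketchException(String message) { super(message); }
    }

    /** Returns the first response with a result, else the last non-null response, else null. */
    static Response queryLocators(List<Supplier<Response>> locators) {
        Response lastResponse = null;
        for (Supplier<Response> locator : locators) {
            Response response = locator.get(); // null models an unreachable locator
            if (response == null) {
                continue;
            }
            if (response.hasResult()) {
                return response; // same early return as the existing behavior
            }
            lastResponse = response; // a locator answered, but knows of no servers
        }
        return lastResponse;
    }

    /** Caller distinguishes "no locator answered" from "no cache servers found". */
    static String findServer(List<Supplier<Response>> locators) {
        Response response = queryLocators(locators);
        if (response == null) {
            throw new NoLocatorsSketchException("Unable to connect to any locators");
        }
        if (!response.hasResult()) {
            throw new NoServersSketchException("Locator responded but found no cache servers");
        }
        return response.server;
    }

    /** Demo helper: maps each outcome of findServer to a short label. */
    static String outcome(List<Supplier<Response>> locators) {
        try {
            return findServer(locators);
        } catch (NoLocatorsSketchException e) {
            return "no locators";
        } catch (NoServersSketchException e) {
            return "no servers";
        }
    }

    public static void main(String[] args) {
        Supplier<Response> unreachable = () -> null;
        Supplier<Response> empty = () -> new Response(null);
        Supplier<Response> healthy = () -> new Response("server-1:40404");

        System.out.println(outcome(Arrays.asList(unreachable, unreachable))); // no locators
        System.out.println(outcome(Arrays.asList(unreachable, empty)));       // no servers
        System.out.println(outcome(Arrays.asList(empty, healthy)));           // server-1:40404
    }
}
```

Keeping the last non-null response, rather than returning null whenever hasResult() is false, is what lets the caller tell the two failure modes apart.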