[ https://issues.apache.org/jira/browse/GEODE-9666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Owen Nichols closed GEODE-9666.
-------------------------------

Client throws NoAvailableLocatorsException after locators change IP addresses
-----------------------------------------------------------------------------

Key: GEODE-9666
URL: https://issues.apache.org/jira/browse/GEODE-9666
Project: Geode
Issue Type: Bug
Components: membership
Affects Versions: 1.15.0
Reporter: Aaron Lindsey
Assignee: Aaron Lindsey
Priority: Major
Labels: pull-request-available
Fix For: 1.15.0

We have a test for Geode on Kubernetes which:
* Deploys a Geode cluster consisting of 2 locator Pods and 3 server Pods
* Deploys 5 Spring Boot client Pods which continually do PUTs and GETs
* Triggers a rolling restart of the locator Pods
** The rolling restart operation restarts one locator at a time, waiting for each restarted locator to become fully online before restarting the next locator
* Stops the client operations and validates that no exceptions were thrown in the clients.
Occasionally, we see {{NoAvailableLocatorsException}} thrown on one of the clients:
{code:none}
org.apache.geode.cache.client.NoAvailableLocatorsException: Unable to connect to any locators in the list [system-test-gemfire-locator-0.system-test-gemfire-locator.gemfire-system-test-3f1ecc74-b1ea-4288-b4d1-594bbb8364ab.svc.cluster.local:10334, system-test-gemfire-locator-1.system-test-gemfire-locator.gemfire-system-test-3f1ecc74-b1ea-4288-b4d1-594bbb8364ab.svc.cluster.local:10334]
	at org.apache.geode.cache.client.internal.AutoConnectionSourceImpl.findServer(AutoConnectionSourceImpl.java:174)
	at org.apache.geode.cache.client.internal.ConnectionFactoryImpl.createClientToServerConnection(ConnectionFactoryImpl.java:198)
	at org.apache.geode.cache.client.internal.pooling.ConnectionManagerImpl.createPooledConnection(ConnectionManagerImpl.java:196)
	at org.apache.geode.cache.client.internal.pooling.ConnectionManagerImpl.createPooledConnection(ConnectionManagerImpl.java:190)
	at org.apache.geode.cache.client.internal.pooling.ConnectionManagerImpl.borrowConnection(ConnectionManagerImpl.java:276)
	at org.apache.geode.cache.client.internal.OpExecutorImpl.execute(OpExecutorImpl.java:136)
	at org.apache.geode.cache.client.internal.OpExecutorImpl.execute(OpExecutorImpl.java:119)
	at org.apache.geode.cache.client.internal.PoolImpl.execute(PoolImpl.java:801)
	at org.apache.geode.cache.client.internal.GetOp.execute(GetOp.java:92)
	at org.apache.geode.cache.client.internal.ServerRegionProxy.get(ServerRegionProxy.java:114)
	at org.apache.geode.internal.cache.LocalRegion.findObjectInSystem(LocalRegion.java:2802)
	at org.apache.geode.internal.cache.LocalRegion.getObject(LocalRegion.java:1469)
	at org.apache.geode.internal.cache.LocalRegion.nonTxnFindObject(LocalRegion.java:1442)
	at org.apache.geode.internal.cache.LocalRegionDataView.findObject(LocalRegionDataView.java:197)
	at org.apache.geode.internal.cache.LocalRegion.get(LocalRegion.java:1379)
	at org.apache.geode.internal.cache.LocalRegion.get(LocalRegion.java:1318)
	at org.apache.geode.internal.cache.LocalRegion.get(LocalRegion.java:1303)
	at org.apache.geode.internal.cache.AbstractRegion.get(AbstractRegion.java:439)
	at org.apache.geode.kubernetes.client.service.AsyncOperationService.evaluate(AsyncOperationService.java:282)
	at org.apache.geode.kubernetes.client.api.Controller.evaluateRegion(Controller.java:88)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
	at org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:197)
	at org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:141)
	at org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:106)
	at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:894)
	at org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:808)
	at org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:87)
	at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:1063)
	at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:963)
	at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:1006)
	at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:898)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:626)
	at org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:883)
	at javax.servlet.http.HttpServlet.service(HttpServlet.java:733)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:227)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:162)
	at org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:53)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:189)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:162)
	at org.springframework.web.filter.RequestContextFilter.doFilterInternal(RequestContextFilter.java:100)
	at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:119)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:189)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:162)
	at org.springframework.web.filter.FormContentFilter.doFilterInternal(FormContentFilter.java:93)
	at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:119)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:189)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:162)
	at org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:201)
	at org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:119)
	at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:189)
	at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:162)
	at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:202)
	at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:97)
	at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:542)
	at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:143)
	at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:92)
	at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:78)
	at org.apache.catalina.valves.RemoteIpValve.invoke(RemoteIpValve.java:764)
	at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:357)
	at org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:374)
	at org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:65)
	at org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:893)
	at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1707)
	at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
	at java.base/java.lang.Thread.run(Thread.java:829)
{code}

We do not expect any of the clients to throw {{NoAvailableLocatorsException}} because there is always at least one locator available during the test.

We did some investigation and found that:
* Locator Pods get different IP addresses on Kubernetes after they are restarted, but they keep the same hostname.
* After the {{NoAvailableLocatorsException}} is thrown from a client, the client keeps trying to contact the locators using stale IP addresses (i.e. the locators' original IP addresses from before they were restarted). We checked that the locators' DNS names resolve to the correct IP addresses from within the locator containers. We also ruled out the [JVM DNS cache settings|https://docs.oracle.com/javase/7/docs/technotes/guides/net/properties.html] as the cause of the stale IP addresses.
* The changes for GEODE-9139 changed the behavior of {{org.apache.geode.distributed.internal.tcpserver.HostAndPort}} so that it permanently caches the resolved address after the first resolution attempt. This undoes part of the fix introduced by GEODE-7808, in which HostAndPort was created as a way to hold an unresolved hostname.

To fix this issue, it seems that {{org.apache.geode.distributed.internal.tcpserver.HostAndPort}} should be changed so that, when it holds an unresolved address, it tries to resolve the address each time {{getSocketInetAddress}} is called. This was the behavior in Geode 1.13 and 1.14, so changing it back shouldn't have a negative impact on performance.

-- 
This message was sent by Atlassian Jira
(v8.20.7#820007)
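The fix proposed in this issue (re-resolve the hostname on every {{getSocketInetAddress}} call instead of caching the first result) can be sketched in Java as below. This is a hypothetical stand-in, not Geode's actual {{org.apache.geode.distributed.internal.tcpserver.HostAndPort}}: the names {{ReresolvingHostAndPort}} and {{HostResolver}} are assumptions made for illustration, and the resolver hook exists only so the behavior can be exercised without real DNS. The default constructor uses {{InetAddress.getByName}}, so the JVM's own address cache still applies.

```java
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.UnknownHostException;
import java.util.Objects;

/** Hypothetical lookup hook so a test can stand in for DNS. */
interface HostResolver {
  InetAddress resolve(String hostName) throws UnknownHostException;
}

/**
 * Hypothetical stand-in for Geode's HostAndPort (NOT the real class).
 * It stores the hostname unresolved and performs a fresh lookup on every
 * getSocketInetAddress() call instead of caching the first result.
 */
final class ReresolvingHostAndPort {
  private final String hostName;
  private final int port;
  private final HostResolver resolver;

  /** Uses the JVM resolver; the JVM InetAddress cache (networkaddress.cache.ttl) still applies. */
  ReresolvingHostAndPort(String hostName, int port) {
    this(hostName, port, InetAddress::getByName);
  }

  ReresolvingHostAndPort(String hostName, int port, HostResolver resolver) {
    this.hostName = Objects.requireNonNull(hostName, "hostName");
    if (port < 0 || port > 0xFFFF) {
      throw new IllegalArgumentException("port out of range: " + port);
    }
    this.port = port;
    this.resolver = Objects.requireNonNull(resolver, "resolver");
  }

  String getHostName() {
    return hostName;
  }

  int getPort() {
    return port;
  }

  /**
   * Looks the hostname up again on each call; no address is stored in this object.
   * A failed lookup yields an unresolved InetSocketAddress rather than an exception,
   * and the failure is not remembered, so the next call tries again.
   */
  InetSocketAddress getSocketInetAddress() {
    try {
      return new InetSocketAddress(resolver.resolve(hostName), port);
    } catch (UnknownHostException e) {
      return InetSocketAddress.createUnresolved(hostName, port);
    }
  }

  @Override
  public String toString() {
    return hostName + ":" + port;
  }
}
```

Because nothing is cached per instance, a client holding one of these objects for a locator would reach the restarted Pod's new IP as soon as DNS, and the JVM-level cache governed by {{networkaddress.cache.ttl}} and {{networkaddress.cache.negative.ttl}}, reflect the change. The trade-off is one lookup per call, which the issue notes matched the 1.13/1.14 behavior.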