[ 
https://issues.apache.org/jira/browse/GEODE-9025?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jakov Varenina updated GEODE-9025:
----------------------------------
    Description: 
When running Apache Geode in Kubernetes, a ClassCastException is in some cases
thrown during remote locator discovery. The exception occurs when the locator
tries to cast the received Object to RemoteLocatorJoinResponse. The locator
discovery thread then stops, so locator discovery is never resumed or completed.
The only way to re-trigger locator discovery is to restart the locator.
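
The following is a minimal, hypothetical model of why the failure is permanent. It assumes that the discovery task retries on IOException and ends on any unchecked exception, which matches the behavior described above. DiscoveryLoopSketch, Attempt and discover() are illustrative names, not Geode's LocatorDiscovery code.
{code:java}
import java.io.IOException;

/**
 * Hypothetical model of a retrying discovery task. The names are illustrative
 * and are not part of Geode's LocatorDiscovery API.
 */
public class DiscoveryLoopSketch {

  /** One discovery attempt; returns true once remote locators were exchanged. */
  interface Attempt {
    boolean run(int attemptNumber) throws IOException;
  }

  /** Runs attempts until success, an unexpected exception, or maxAttempts; returns attempts made. */
  static int discover(Attempt attempt, int maxAttempts) {
    for (int n = 1; n <= maxAttempts; n++) {
      try {
        if (attempt.run(n)) {
          return n; // success
        }
      } catch (IOException e) {
        // Expected, transient failure: try again (a real task would wait first).
      } catch (RuntimeException e) {
        // Unexpected failure such as ClassCastException: log it and end the task for good.
        System.out.println("discovery task ended: " + e);
        return n;
      }
    }
    return maxAttempts;
  }

  public static void main(String[] args) {
    // First attempt hits an IOException, second succeeds: the task recovers.
    int recovered = discover(n -> {
      if (n == 1) {
        throw new IOException("connection reset");
      }
      return true;
    }, 5);

    // First attempt hits a ClassCastException: the task stops and attempt 2 never runs.
    int stopped = discover(n -> {
      if (n == 1) {
        throw new ClassCastException("java.lang.Class");
      }
      return true;
    }, 5);

    System.out.println("recovered after " + recovered + " attempts, stopped after " + stopped);
  }
}
{code}
In this model, discover() returns 2 when the first attempt throws an IOException. It returns 1 when the first attempt throws a ClassCastException, even though the second attempt would have succeeded.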

*The root cause of this issue is the following:*
If the locator gets an EOFException when sending a VersionRequest message, it
automatically assumes that the remote locator is running an old version of
Geode that doesn't support VersionRequest. The locator then uses the oldest
known version, sends a RemoteLocatorJoinRequest to the remote locator, and reads
the response as follows:
{code:java}
  public Object requestToServer(HostAndPort addr, Object request, int timeout,
      boolean replyExpected) throws IOException, ClassNotFoundException {
    ...
    Object response = objectDeserializer.readObject(versionedDataInputStream);
    logger.debug("received response: {}", response);
    return response;
  }
{code}
requestToServer() reads and returns the deserialized Object without checking its
type. A ClassCastException is then thrown when the locator casts that Object to
RemoteLocatorJoinResponse:
{code:java}
  public void exchangeRemoteLocators() {
    ...
    RemoteLocatorJoinResponse response = (RemoteLocatorJoinResponse) locatorClient
        .requestToServer(locatorId.getHost(), request, WAN_LOCATOR_CONNECTION_TIMEOUT, true);
{code}
 
{code:java}
{"timestamp":"2021-03-03T15:38:56.444Z","severity":"critical","message":"Locator discovery task encountred unexpected exception","metadata":{"function":"KVDB","thread_name":"WAN Locator Discovery Thread2","thread_id":"60"},"version":"0.2.1","service_id":"eric-data-kvdb-ag","exception":"
java.lang.ClassCastException: class java.lang.Class cannot be cast to class org.apache.geode.cache.client.internal.locator.wan.RemoteLocatorJoinResponse (java.lang.Class is in module java.base of loader 'bootstrap'; org.apache.geode.cache.client.internal.locator.wan.RemoteLocatorJoinResponse is in unnamed module of loader 'app')
 at org.apache.geode.cache.client.internal.locator.wan.LocatorDiscovery.exchangeRemoteLocators(LocatorDiscovery.java:193)
 at org.apache.geode.cache.client.internal.locator.wan.LocatorDiscovery$RemoteLocatorDiscovery.run(LocatorDiscovery.java:131)
 at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
 at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
 at java.base/java.lang.Thread.run(Thread.java:834)
{code}
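
The fallback described in the root cause can be sketched as follows. This is a simplified, hypothetical model. VersionProbe, OLDEST_KNOWN_VERSION and negotiateVersion() are illustrative names, and the version ordinals are placeholders, not Geode's actual version values.
{code:java}
import java.io.EOFException;
import java.io.IOException;

/**
 * Hypothetical model of the current fallback; not Geode code. VersionProbe,
 * OLDEST_KNOWN_VERSION and the ordinals used below are placeholders.
 */
public class VersionFallbackSketch {

  /** Sends a VersionRequest to the remote locator and returns its version ordinal. */
  interface VersionProbe {
    short requestVersion() throws IOException;
  }

  /** Stand-in for "the oldest known version". */
  static final short OLDEST_KNOWN_VERSION = 0;

  /** Current behavior: every EOFException is read as "peer predates VersionRequest". */
  static short negotiateVersion(VersionProbe probe) throws IOException {
    try {
      return probe.requestVersion();
    } catch (EOFException e) {
      // The EOF may have a different cause, but the join request that follows
      // is still sent using the oldest known version.
      return OLDEST_KNOWN_VERSION;
    }
  }

  public static void main(String[] args) throws IOException {
    // A remote locator that answers the VersionRequest.
    short answered = negotiateVersion(() -> (short) 7);

    // A connection that is closed before any reply arrives.
    short assumed = negotiateVersion(() -> {
      throw new EOFException("connection closed");
    });

    System.out.println("answered=" + answered + ", assumed=" + assumed);
  }
}
{code}
In this model, every EOFException yields OLDEST_KNOWN_VERSION, whatever its cause, so the RemoteLocatorJoinRequest that follows is sent using the oldest known version. Other IOExceptions still propagate to the caller.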

*The solution:*

I think the locator must not assume the version in this case. Instead, it should
treat the attempt as unsuccessful and propagate the EOFException. The locator
discovery thread will then retry after 10 s.
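
Below is a minimal sketch of the proposed change, using hypothetical names. negotiateVersion() lets the EOFException propagate, and the caller treats it like any other IOException by waiting and retrying. The retry is simplified to the version request alone. In Geode, it would be the discovery thread that retries after 10 s. The delay is a parameter here so the example runs quickly.
{code:java}
import java.io.EOFException;
import java.io.IOException;

/** Hypothetical sketch of the proposed behavior; names are illustrative, not Geode's API. */
public class ProposedFixSketch {

  /** Sends a VersionRequest to the remote locator and returns its version ordinal. */
  interface VersionProbe {
    short requestVersion() throws IOException;
  }

  /** Proposed behavior: an EOFException is not turned into an assumed version. */
  static short negotiateVersion(VersionProbe probe) throws IOException {
    return probe.requestVersion(); // EOFException propagates to the caller
  }

  /**
   * Retries negotiation after a delay (10 s in the issue; a parameter here).
   * Returns the negotiated version, or -1 if every attempt failed.
   */
  static short negotiateWithRetry(VersionProbe probe, int maxAttempts, long retryDelayMillis)
      throws InterruptedException {
    for (int attempt = 1; attempt <= maxAttempts; attempt++) {
      try {
        return negotiateVersion(probe);
      } catch (IOException e) {
        // Treat the attempt as unsuccessful instead of assuming the oldest version.
        Thread.sleep(retryDelayMillis);
      }
    }
    return -1;
  }

  public static void main(String[] args) throws InterruptedException {
    int[] calls = {0};
    // The first VersionRequest hits an EOF; the second one is answered.
    short version = negotiateWithRetry(() -> {
      if (++calls[0] == 1) {
        throw new EOFException("connection closed");
      }
      return (short) 7;
    }, 3, 10);
    System.out.println(version + " after " + calls[0] + " attempts");
  }
}
{code}
Here the first VersionRequest fails with an EOFException and the second succeeds, so the program prints "7 after 2 attempts". No version is ever guessed.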

 


> ClassCastException occurs during remote locator discovery
> ---------------------------------------------------------
>
>                 Key: GEODE-9025
>                 URL: https://issues.apache.org/jira/browse/GEODE-9025
>             Project: Geode
>          Issue Type: Bug
>            Reporter: Jakov Varenina
>            Assignee: Jakov Varenina
>            Priority: Major



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
