[jira] [Commented] (GEODE-2238) Member may fail to receive cluster configuration from locator

Kirk Lund (JIRA) Wed, 21 Dec 2016 16:22:36 -0800

    [ https://issues.apache.org/jira/browse/GEODE-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768569#comment-15768569 ]


Kirk Lund commented on GEODE-2238:
----------------------------------

There are two request/response messages involving cluster config:

1) SharedConfigurationStatusRequest/SharedConfigurationStatusResponse -- used 
only to get the status of the cluster config service, such as for the "status 
locator" command.

This is implemented with a Future, so handling the 
SharedConfigurationStatusRequest correctly waits until cluster config has been 
initialized. If the wait exceeds 5 seconds, the status is reported as 
UNDETERMINED:
{code:java}
  public SharedConfigurationStatusResponse getSharedConfigurationStatus() {
    ExecutorService es =
        ((GemFireCacheImpl) myCache).getDistributionManager().getWaitingThreadPool();
    Future<SharedConfigurationStatusResponse> statusFuture =
        es.submit(new FetchSharedConfigStatus());
    SharedConfigurationStatusResponse response = null;

    try {
      response = statusFuture.get(5, TimeUnit.SECONDS);
    } catch (Exception e) {
      logger.info("Exception occured while fetching the status {}",
          CliUtil.stackTraceAsString(e));
      response = new SharedConfigurationStatusResponse();
      response.setStatus(SharedConfigurationStatus.UNDETERMINED);
    }
    return response;
  }
{code}

2) ConfigurationRequest/ConfigurationResponse -- used to request the actual 
cluster config and return it to a server that's starting up.

The handling of this request looks problematic. It has a retry loop and it 
acquires a distributed lock, but I don't see anything in this code -- 
SharedConfiguration.createConfigurationReponse(ConfigurationRequest) -- that 
awaits the initialization of cluster config. The retry loop and the locking 
only somewhat reduce the chance of replying before initialization completes; 
they do not prevent it. We could probably remove the retry and the use of the 
distributed lock service here and make it much more robust by using a Future 
in the way that #1 does.
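
A minimal sketch of what awaiting initialization with a Future could look 
like for #2. This is not Geode code: the class ClusterConfigGate and its 
methods markInitialized and awaitConfiguration are hypothetical names, and a 
plain String stands in for the real configuration payload. The only point is 
that the request handler blocks on a Future that the initialization thread 
completes, rather than retrying under a lock.

```java
import java.util.Objects;
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

/** Hypothetical stand-in for the locator-side cluster config service. */
class ClusterConfigGate {
  // Completed exactly once, when cluster config initialization finishes.
  private final CompletableFuture<String> initialized = new CompletableFuture<>();

  /** Called by the initialization thread once the configuration is loaded. */
  void markInitialized(String configuration) {
    initialized.complete(Objects.requireNonNull(configuration));
  }

  /**
   * Handles a configuration request by waiting for initialization instead of
   * retrying. Returns empty if initialization has not finished within the
   * timeout, so the caller can reply with an explicit "not ready" response.
   */
  Optional<String> awaitConfiguration(long timeout, TimeUnit unit) {
    try {
      return Optional.of(initialized.get(timeout, unit));
    } catch (TimeoutException e) {
      return Optional.empty();
    } catch (InterruptedException e) {
      // Preserve the interrupt status for the calling thread pool.
      Thread.currentThread().interrupt();
      return Optional.empty();
    } catch (ExecutionException e) {
      // Initialization itself failed; there is no configuration to return.
      return Optional.empty();
    }
  }
}
```

With this shape, a request that arrives before initialization either waits 
for the real configuration or times out visibly; it never replies with a 
configuration that has not been loaded yet.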

> Member may fail to receive cluster configuration from locator
> -------------------------------------------------------------
>
>                 Key: GEODE-2238
>                 URL: https://issues.apache.org/jira/browse/GEODE-2238
>             Project: Geode
>          Issue Type: Bug
>          Components: management
>    Affects Versions: 1.0.0-incubating
>            Reporter: Kirk Lund
>            Assignee: Dan Smith
>              Labels: Flaky
>
> LuceneClusterConfigurationDUnitTest.indexWithAnalyzerGetsCreatedUsingClusterConfiguration
>  is failing frequently in precheckin. I'm going to mark it as FlakyTest. 
> Below is the stack trace:
> {noformat}
> :geode-lucene:distributedTest
> org.apache.geode.cache.lucene.internal.configuration.LuceneClusterConfigurationDUnitTest > indexWithAnalyzerGetsCreatedUsingClusterConfiguration FAILED
>     org.apache.geode.test.dunit.RMIException: While invoking org.apache.geode.cache.lucene.internal.configuration.LuceneClusterConfigurationDUnitTest$$Lambda$29/613305101.run in VM 2 running on Host 3fb23bc375ef with 4 VMs
>         at org.apache.geode.test.dunit.VM.invoke(VM.java:344)
>         at org.apache.geode.test.dunit.VM.invoke(VM.java:314)
>         at org.apache.geode.test.dunit.VM.invoke(VM.java:259)
>         at org.apache.geode.test.dunit.rules.Member.invoke(Member.java:60)
>         at org.apache.geode.cache.lucene.internal.configuration.LuceneClusterConfigurationDUnitTest.indexWithAnalyzerGetsCreatedUsingClusterConfiguration(LuceneClusterConfigurationDUnitTest.java:102)
>         Caused by:
>         java.lang.AssertionError
>             at org.junit.Assert.fail(Assert.java:86)
>             at org.junit.Assert.assertTrue(Assert.java:41)
>             at org.junit.Assert.assertNotNull(Assert.java:712)
>             at org.junit.Assert.assertNotNull(Assert.java:722)
>             at org.apache.geode.cache.lucene.internal.configuration.LuceneClusterConfigurationDUnitTest.lambda$indexWithAnalyzerGetsCreatedUsingClusterConfiguration$bb17a952$1(LuceneClusterConfigurationDUnitTest.java:105)
> 94 tests completed, 1 failed
> {noformat}



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
