[ https://issues.apache.org/jira/browse/GEODE-2238?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15768569#comment-15768569 ]
Kirk Lund commented on GEODE-2238:
----------------------------------

There are two request/response messages involving cluster config:

1) SharedConfigurationStatusRequest/SharedConfigurationStatusResponse -- used only to get status of cluster config service such as for "status locator" command. This is implemented with a Future so that handling the SharedConfigurationStatusRequest will correctly wait on a Future until cluster config has been initialized:

{code:java}
public SharedConfigurationStatusResponse getSharedConfigurationStatus() {
  ExecutorService es = ((GemFireCacheImpl) myCache).getDistributionManager().getWaitingThreadPool();
  Future<SharedConfigurationStatusResponse> statusFuture = es.submit(new FetchSharedConfigStatus());
  SharedConfigurationStatusResponse response = null;

  try {
    response = statusFuture.get(5, TimeUnit.SECONDS);
  } catch (Exception e) {
    logger.info("Exception occured while fetching the status {}", CliUtil.stackTraceAsString(e));
    response = new SharedConfigurationStatusResponse();
    response.setStatus(SharedConfigurationStatus.UNDETERMINED);
  }
  return response;
}
{code}

2) ConfigurationRequest/ConfigurationResponse -- used to request the actual cluster config and return it to a server that's starting up. The handling of this request looks problematic. It has a retry-loop and it acquires a distributed lock. I don't see anything in this code -- SharedConfiguration.createConfigurationReponse(ConfigurationRequest) -- that is awaiting the initialization of cluster config. The retry loop and the locking would only somewhat reduce the chance of replying before initialization completes.

We could probably remove the retry and the use of distributed lock service here and make it much more robust by using a Future in the way that #1 does.
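A rough sketch of the Future-based approach suggested for #2. Everything here is hypothetical: ConfigGate, markInitialized and awaitConfig are illustrative names rather than anything in Geode, and a String stands in for the real configuration payload. It shows only the idea of a handler blocking on a future that the initializer completes once.

```java
import java.util.Optional;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical stand-in for the cluster config service; not Geode's SharedConfiguration.
final class ConfigGate {
  // Completed exactly once, when cluster config initialization has finished.
  private final CompletableFuture<String> initialized = new CompletableFuture<>();

  // Called by the initializing thread after the configuration has been loaded.
  void markInitialized(String config) {
    initialized.complete(config);
  }

  // Called by the request handler. Blocks until initialization completes or the
  // timeout elapses; an empty result means "not ready", never a partial config.
  Optional<String> awaitConfig(long timeoutMillis) {
    try {
      return Optional.ofNullable(initialized.get(timeoutMillis, TimeUnit.MILLISECONDS));
    } catch (InterruptedException e) {
      Thread.currentThread().interrupt(); // preserve the interrupt for the caller
      return Optional.empty();
    } catch (ExecutionException | TimeoutException e) {
      return Optional.empty();
    }
  }
}
```

With something along these lines, the ConfigurationRequest handler could call awaitConfig before building its response and reply with an explicit "not ready" status on an empty result, so a reply could not go out before initialization completes.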

> Member may fail to receive cluster configuration from locator
> -------------------------------------------------------------
>
>                 Key: GEODE-2238
>                 URL: https://issues.apache.org/jira/browse/GEODE-2238
>             Project: Geode
>          Issue Type: Bug
>          Components: management
>   Affects Versions: 1.0.0-incubating
>            Reporter: Kirk Lund
>            Assignee: Dan Smith
>              Labels: Flaky
>
> LuceneClusterConfigurationDUnitTest.indexWithAnalyzerGetsCreatedUsingClusterConfiguration is failing frequently in precheckin. I'm going to mark it as FlakyTest.
> Below is the stack trace:
> {noformat}
> :geode-lucene:distributedTest
> org.apache.geode.cache.lucene.internal.configuration.LuceneClusterConfigurationDUnitTest > indexWithAnalyzerGetsCreatedUsingClusterConfiguration FAILED
>     org.apache.geode.test.dunit.RMIException: While invoking org.apache.geode.cache.lucene.internal.configuration.LuceneClusterConfigurationDUnitTest$$Lambda$29/613305101.run in VM 2 running on Host 3fb23bc375ef with 4 VMs
>         at org.apache.geode.test.dunit.VM.invoke(VM.java:344)
>         at org.apache.geode.test.dunit.VM.invoke(VM.java:314)
>         at org.apache.geode.test.dunit.VM.invoke(VM.java:259)
>         at org.apache.geode.test.dunit.rules.Member.invoke(Member.java:60)
>         at org.apache.geode.cache.lucene.internal.configuration.LuceneClusterConfigurationDUnitTest.indexWithAnalyzerGetsCreatedUsingClusterConfiguration(LuceneClusterConfigurationDUnitTest.java:102)
>         Caused by:
>         java.lang.AssertionError
>             at org.junit.Assert.fail(Assert.java:86)
>             at org.junit.Assert.assertTrue(Assert.java:41)
>             at org.junit.Assert.assertNotNull(Assert.java:712)
>             at org.junit.Assert.assertNotNull(Assert.java:722)
>             at org.apache.geode.cache.lucene.internal.configuration.LuceneClusterConfigurationDUnitTest.lambda$indexWithAnalyzerGetsCreatedUsingClusterConfiguration$bb17a952$1(LuceneClusterConfigurationDUnitTest.java:105)
> 94 tests completed, 1 failed
> {noformat}

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)