[ https://issues.apache.org/jira/browse/SOLR-13943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Chris M. Hostetter updated SOLR-13943:
--------------------------------------
    Attachment: apache_Lucene-Solr-repro-Java11_618.log.txt
                apache_Lucene-Solr-BadApples-Tests-master_533.log.txt
                apache_Lucene-Solr-BadApples-Tests-master_531.log.txt
        Status: Open  (was: Open)

The root cause of the problem appears to be that the test assumes it can set a "watcher" on ZK to monitor for changes to {{/aliases.json}} and then {{await()}} on a latch that will be updated when that watcher fires. Once the {{await()}} returns, it tries to parse the {{ROUTER_START}} property of the (updated) alias.

The problem with this approach is that when ZK updates result in notifying watchers, there is no guarantee which order the watchers are called in or how quickly they will be called. The watcher registered (and {{await()}}ed) by the test thread can fire before the {{AliasesManager}} is updated in the {{ZkStateReader}} used by the {{ZkClientClusterStateProvider}} -- _which is what the test consults when asserting the value of the {{ROUTER_START}} property_.

The test either needs to ignore the {{clusterStateProvider}} and use the data provided to its own watcher to verify that the property was updated as expected, *OR* it needs to "hook in" to the {{ZkStateReader}} / {{AliasesManager}} and only proceed once they are aware of the latest {{aliases.json}} information.

----

FWIW: I notice that the {{AliasesManager}} is public on {{ZkStateReader}} and has a public {{update()}} method that forces a sync with ZK. While it's almost certainly not a best practice to force syncs with ZK in production code, doing so here *using the ZkStateReader of the underlying {{ClusterStateProvider}}* after the {{aliasUpdate.await()}} may be suitable for test purposes?
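To make the ordering problem concrete, here is a rough, self-contained sketch of the race and of both remedies described above. It deliberately uses no Solr or ZooKeeper classes: {{FakeZkNode}}, {{CachedAliases}} and {{run()}} are hypothetical stand-ins invented for illustration, with {{CachedAliases}} playing the role the {{AliasesManager}} plays in the real test. The stale value is the one from the failure message; the "resolved" instant is an illustrative value, not taken from the logs.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.function.Consumer;

class AliasRaceSketch {
    /** Toy ZK node (hypothetical): a write queues one notification per watcher,
     *  and notifications are delivered later, one at a time. */
    static final class FakeZkNode {
        private String data;
        private final List<Consumer<String>> watchers = new ArrayList<>();
        private final Deque<Runnable> pending = new ArrayDeque<>();

        FakeZkNode(String initial) { this.data = initial; }
        String read() { return data; }
        void addWatcher(Consumer<String> w) { watchers.add(w); }

        void write(String newData) {
            data = newData;
            for (Consumer<String> w : watchers) {
                pending.add(() -> w.accept(newData));
            }
        }

        /** Delivers a single queued notification; returns false if none are left. */
        boolean deliverNext() {
            Runnable r = pending.poll();
            if (r == null) return false;
            r.run();
            return true;
        }
    }

    /** Toy cached view of the node (hypothetical stand-in for the AliasesManager role). */
    static final class CachedAliases {
        private final FakeZkNode node;
        private String cached;

        CachedAliases(FakeZkNode node) {
            this.node = node;
            this.cached = node.read();
            node.addWatcher(v -> this.cached = v); // refreshed only when its own watcher fires
        }
        String routerStart() { return cached; }
        void update() { cached = node.read(); }    // forced sync with the node
    }

    /** Returns { value read from the stale cache, value handed to the test's watcher,
     *  value read from the cache after a forced sync }. */
    static String[] run() {
        FakeZkNode node = new FakeZkNode("2019-09-14T03:00:00Z/DAY");
        CountDownLatch aliasUpdate = new CountDownLatch(1);
        String[] seenByWatcher = new String[1];

        // The test's watcher happens to be notified first in this scenario.
        node.addWatcher(v -> { seenByWatcher[0] = v; aliasUpdate.countDown(); });
        CachedAliases cache = new CachedAliases(node);

        node.write("2019-09-14T00:00:00Z"); // illustrative resolved instant
        node.deliverNext();                 // only the test's watcher has fired so far

        try {
            aliasUpdate.await();            // returns at once: the latch is already released
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }

        String stale = cache.routerStart(); // the cache's watcher has not run yet
        String fromWatcher = seenByWatcher[0]; // remedy 1: trust the watcher's own data
        cache.update();                     // remedy 2: force a sync before reading
        String synced = cache.routerStart();
        return new String[] { stale, fromWatcher, synced };
    }

    public static void main(String[] args) {
        String[] r = run();
        System.out.println("cache right after await(): " + r[0]);
        System.out.println("data given to test watcher: " + r[1]);
        System.out.println("cache after forced update(): " + r[2]);
    }
}
```

In this toy model the delivery order is fixed so the failure is reproducible; in the real test the order depends on thread scheduling, which would explain why the failure is intermittent.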
I should also point out: if monitoring/waiting on {{/aliases.json}} updates is a common occurrence (even if just for tests), there should probably be public APIs for doing so, similar to the {{DocCollectionWatcher}}, {{CollectionPropsWatcher}}, and {{LiveNodesWatcher}} APIs.

> TimeRoutedAliasUpdateProcessorTest.testDateMathInStart: multi-threaded race condition due to ZK assumptions
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-13943
>                 URL: https://issues.apache.org/jira/browse/SOLR-13943
>             Project: Solr
>          Issue Type: Test
>      Security Level: Public(Default Security Level. Issues are Public)
>            Reporter: Chris M. Hostetter
>            Priority: Major
>         Attachments: apache_Lucene-Solr-BadApples-Tests-master_531.log.txt, apache_Lucene-Solr-BadApples-Tests-master_533.log.txt, apache_Lucene-Solr-repro-Java11_618.log.txt
>
>
> TimeRoutedAliasUpdateProcessorTest does not currently run in many jenkins builds due to being marked BadApple(SOLR-13059) -- however when it does run, the method {{testDateMathInStart}} frequently fails due to what appears to be a multi-threaded race condition in the test logic...
> {noformat}
>    [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TimeRoutedAliasUpdateProcessorTest -Dtests.method=testDateMathInStart -Dtests.seed=8879E35521A4B9EA -Dtests.multiplier=2 -Dtests.slow=true -Dtests.badapples=true -Dtests.locale=nl-BQ -Dtests.timezone=America/Porto_Acre -Dtests.asserts=true -Dtests.file.encoding=UTF-8
>    [junit4] FAILURE 6.96s J0 | TimeRoutedAliasUpdateProcessorTest.testDateMathInStart <<<
>    [junit4]    > Throwable #1: java.lang.AssertionError: router.start should not have any date math by this point and parse as an instant. Using class org.apache.solr.client.solrj.impl.ZkClientClusterStateProvider Found:2019-09-14T03:00:00Z/DAY
>    [junit4]    > 	at __randomizedtesting.SeedInfo.seed([8879E35521A4B9EA:64FE3DD88112B802]:0)
>    [junit4]    > 	at org.apache.solr.update.processor.TimeRoutedAliasUpdateProcessorTest.testDateMathInStart(TimeRoutedAliasUpdateProcessorTest.java:765)
>    [junit4]    > 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>    [junit4]    > 	at java.base/jdk.internal.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
>    [junit4]    > 	at java.base/jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>    [junit4]    > 	at java.base/java.lang.reflect.Method.invoke(Method.java:566)
>    [junit4]    > 	at java.base/java.lang.Thread.run(Thread.java:834)
> {noformat}
> I'll attach some logs from recent failures and my own quick analysis of the problems with how the test appears to be asserting ZK updates.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)