zacharymorn commented on a change in pull request #240: URL: https://github.com/apache/lucene/pull/240#discussion_r694404654
########## File path: lucene/facet/src/java/org/apache/lucene/facet/DrillSideways.java ########## @@ -520,10 +458,39 @@ private DrillDownQuery getDrillDownQuery( } @SuppressWarnings("unchecked") - private <R> ConcurrentDrillSidewaysResult<R> searchSequentially( - final DrillDownQuery query, final CollectorManager<?, R> hitCollectorManager) + private <C extends Collector, R> ConcurrentDrillSidewaysResult<R> searchSequentially( + final DrillDownQuery query, final CollectorManager<C, R> hitCollectorManager) throws IOException { + // This mirrors a similar hack from DrillSideways#search(query, collector). + // Without this cache, LRU cache will be used, causing acceptDocs to be null during collection Review comment: Thanks @gsmiller for the additional context & a new dedicated ticket. Just to provide some additional context below to wrap up the discussion in this PR before we cut over. This issue was discovered during my nightly test run with the following seed `./gradlew test --tests TestDrillSideways.testRandom -Dtests.seed=69A3EF02D8E3465E -Dtests.nightly=true` on commit https://github.com/apache/lucene/pull/240/commits/4af77405fcb914d497d82eb60f0011f93079ec7b . The failing stack trace looks something like this: ``` org.apache.lucene.facet.TestDrillSideways > test suite's output saved to /Users/xichen/IdeaProjects/lucene/lucene/facet/build/test-results/test/outputs/OUTPUT-org.apache.lucene.facet.TestDrillSideways.txt, copied below: > java.lang.AssertionError: expected:<56> but was:<59> > at __randomizedtesting.SeedInfo.seed([69A3EF02D8E3465E:1BEFCA0D6983F02D]:0) > at org.junit.Assert.fail(Assert.java:89) > at org.junit.Assert.failNotEquals(Assert.java:835) > at org.junit.Assert.assertEquals(Assert.java:647) > at org.junit.Assert.assertEquals(Assert.java:633) > at org.apache.lucene.facet.TestDrillSideways.verifyEquals(TestDrillSideways.java:1607) > at org.apache.lucene.facet.TestDrillSideways.testRandom(TestDrillSideways.java:1185) ``` and the following difference with verbose output ``` 1> dim0 topN=2 (vs 2 unique values) 1> actual 1> 0: [e1 9d 80 e1 9d 97 e1 9d 82 e1 9d 92]: 74 1> 1: [ea a5 aa ea a5 a2 ea a5 a0]: 59 1> expected (unsorted) 1> 0: [ea a5 aa ea a5 a2 ea a5 a0]: 56 1> 1: [e1 9d 80 e1 9d 97 e1 9d 82 e1 9d 92]: 73 > java.lang.AssertionError: expected:<56> but was:<59> ``` The failing test complained there are more docs collected than expected. As I have limited context on drillsideway / drilldown logic, I ended up spending quite a few hours on debugging this, and found the issue ultimately came from this caching issue. Specifically, here are the main key difference in the call chain that contributed to this difference: #### Expected / existing entry path from `DrillSideways#search(Query, Collector)` with the hack ``` DrillSideways#search(Query, Collector) -> IndexSearcher#createWeight -> NOT cached weight -> ... -> DrillSidewaysScorer#doUnionScoring with INSTANTIATED acceptDocs ``` With this path, some docs will not be collected as `acceptDocs.get(docID) == false`. The hit collector also uses the same `acceptDocs` so the number of matches between hit collector and drillsideway / drilldown collector are also the same. #### With this change, using the entry path from `DrillSideways#searchSequentially(Query, CollectorManager)` and without the hack: ``` DrillSideways#search(Query, Collector) -> IndexSearcher#createWeight -> cached weight -> ... -> DrillSidewaysScorer#doUnionScoring with NULL acceptDocs (NULL was passed in via the LRUQueryCache code path above) ``` With this path, some deleted docs will be collected as `acceptDocs == null`. As the hit collector uses an instantiated `acceptDocs`, here drillsideway / drilldown collector will also collect more docs compared with hit collector. Hope this clarifies. ---- I think for the purpose of this PR, we may still need to leave this hack in since otherwise this may break nightly tests once merged. As it sounds like you discovered another issue related to this, and that needs to be addressed as part of https://issues.apache.org/jira/browse/LUCENE-10060 as well, please feel free to take on that one independently from this, and I can resolve any merge conflict in `DrillSideways` had you merged that change earlier than this one. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org