Re: [PR] Ensure LeafCollector#finish is only called once on the main collector during drill-sideways [lucene]

via GitHub Thu, 12 Oct 2023 09:25:33 -0700


gsmiller commented on code in PR #12642:
URL: https://github.com/apache/lucene/pull/12642#discussion_r1357072892



##########
lucene/facet/src/test/org/apache/lucene/facet/TestDrillSideways.java:
##########
@@ -316,6 +316,58 @@ public void testBasic() throws Exception {
     IOUtils.close(searcher.getIndexReader(), taxoReader, taxoWriter, dir, 
taxoDir);
   }
 
+  public void testCollectionTerminated() throws Exception {
+    try (Directory dir = newDirectory();
+        Directory taxoDir = newDirectory();
+        RandomIndexWriter w = new RandomIndexWriter(random(), dir);
+        DirectoryTaxonomyWriter taxoW =
+            new DirectoryTaxonomyWriter(taxoDir, 
IndexWriterConfig.OpenMode.CREATE)) {
+      FacetsConfig facetsConfig = new FacetsConfig();
+
+      Document d = new Document();
+      d.add(new FacetField("foo", "bar"));
+      w.addDocument(facetsConfig.build(taxoW, d));
+
+      try (IndexReader r = w.getReader();
+          TaxonomyReader taxoR = new DirectoryTaxonomyReader(taxoW)) {
+        IndexSearcher searcher = new IndexSearcher(r);
+
+        Query baseQuery = new MatchAllDocsQuery();
+        Query dimQ = new TermQuery(new Term("foo", "bar"));
+
+        DrillDownQuery ddq = new DrillDownQuery(facetsConfig, baseQuery);
+        ddq.add("foo", dimQ);
+        DrillSideways drillSideways = new DrillSideways(searcher, 
facetsConfig, taxoR);
+
+        CollectorManager<?, ?> cm =
+            new CollectorManager<>() {
+              @Override
+              public Collector newCollector() throws IOException {
+                return new Collector() {

Review Comment:
   Yeah, I'm kind of conflicted here to be honest. I'm hesitant to expose 
something like `hasFinishedCollectingPreviousLeaf` for this one testing 
use-case. Another option would be to somehow make `AssertingBulkScorer` "aware" 
of when it's allowed to score ranges of documents. That's the fundamental issue 
here is that we have this drill-sideways bulk scorer `DrillSidewaysScorer` that 
_must_ score all docs at once, and cannot score ranges. This really just 
exposes some fundamental design frictions with drill-sideways, but I don't want 
to spiral this change into something much bigger than it needs to be (i.e., I 
acknowledge that we might benefit from a rethinking of how we expose 
drill-sideways functionality, but that's a much bigger task).
   
   So... things we can do here without turning this into much more work:
   1. Expose `hasFinishedCollectingPreviousLeaf` as you suggest
   2. Skip using this "asserting" family of classes and just implement our own 
wrapper classes specific to this test that are meant to just test the calling 
of `finish`
   3. Add a new method to the `BulkScorer` API that communicates whether-or-not 
doc range scoring is legal, then make `AssertingBulkScorer` aware of that new 
method and have it check whether-or-not it can try scoring a range of docs
   4. Have `AssertingBulkScorer` check the instance type of the bulk scorer 
it's wrapping, which would require exposing the definition of 
`DrillSidewaysScorer` beyond the facets package (i.e., make it public)
   
   I don't like #3; adding a new method to the `BulkScorer` definition feels 
like overkill to solve this. I also don't really like #1 or #4 since they both 
require visibility modifications just for this test, but I think #1 is better 
than #4 since it only impacts testing code and not production code. I also 
don't love #2 since we miss out on other nice checks.
   
   I think I'm convinced that your suggestion of exposing 
`hasFinishedCollectingPreviousLeaf` is the least of all evils here. I'll give 
that a try and let's see what we think.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Re: [PR] Ensure LeafCollector#finish is only called once on the main collector during drill-sideways [lucene]

Reply via email to