[GitHub] [lucene] msokolov commented on a diff in pull request #12380: Add a post-collection hook to LeafCollector.

via GitHub Wed, 28 Jun 2023 05:15:44 -0700


msokolov commented on code in PR #12380:
URL: https://github.com/apache/lucene/pull/12380#discussion_r1245115736



##########
lucene/suggest/src/java/org/apache/lucene/search/suggest/document/TopSuggestDocsCollector.java:
##########
@@ -100,12 +100,19 @@ public int getCountToCollect() {
   @Override
   protected void doSetNextReader(LeafReaderContext context) throws IOException 
{
     docBase = context.docBase;
+  }
+
+  @Override
+  public void finish() throws IOException {
     if (seenSurfaceForms != null) {
-      seenSurfaceForms.clear();
       // NOTE: this also clears the priorityQueue:
       for (SuggestScoreDoc hit : priorityQueue.getResults()) {
         pendingResults.add(hit);
       }
+
+      // Deduplicate all hits: we already dedup'd efficiently within each 
segment by

Review Comment:
   any particular reason to change the order of operations here?



##########
lucene/suggest/src/java/org/apache/lucene/search/suggest/document/TopSuggestDocsCollector.java:
##########
@@ -136,15 +143,7 @@ public TopSuggestDocs get() throws IOException {
     SuggestScoreDoc[] suggestScoreDocs;
 
     if (seenSurfaceForms != null) {
-      // NOTE: this also clears the priorityQueue:
-      for (SuggestScoreDoc hit : priorityQueue.getResults()) {
-        pendingResults.add(hit);
-      }
-
-      // Deduplicate all hits: we already dedup'd efficiently within each 
segment by
-      // truncating the FST top paths search, but across segments there may 
still be dups:
-      seenSurfaceForms.clear();

Review Comment:
   oh I see we did it both ways. Probably makes no difference? Still 
superstitiously I would always want to clear/delete things last just in case...



##########
lucene/test-framework/src/java/org/apache/lucene/tests/search/AssertingLeafCollector.java:
##########
@@ -57,4 +58,11 @@ public void collect(int doc) throws IOException {
   public DocIdSetIterator competitiveIterator() throws IOException {
     return in.competitiveIterator();
   }
+
+  @Override
+  public void finish() throws IOException {
+    assert finishCalled == false;

Review Comment:
   Did we previously disallow re-use of LeafCollectors? If not, this could 
break someone



##########
lucene/test-framework/src/java/org/apache/lucene/tests/search/AssertingCollector.java:
##########
@@ -49,7 +50,9 @@ public LeafCollector getLeafCollector(LeafReaderContext 
context) throws IOExcept
     assert context.docBase >= previousLeafMaxDoc;
     previousLeafMaxDoc = context.docBase + context.reader().maxDoc();
 
+    assert hasFinishedCollectingPreviousLeaf;

Review Comment:
   it's a pity we can't assert that we finished the final leaf



##########
lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java:
##########
@@ -749,6 +749,9 @@ protected void search(List<LeafReaderContext> leaves, 
Weight weight, Collector c
           partialResult = true;
         }
       }
+      // Note: this is called if collection ran successfully, including the 
above special cases of
+      // CollectionTerminatedException and TimeExceededException, but no other 
exception.
+      leafCollector.finish();

Review Comment:
   This could be an opportunity for capturing statistics about how often 
time-limitation is applied?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [lucene] msokolov commented on a diff in pull request #12380: Add a post-collection hook to LeafCollector.

Reply via email to