msokolov commented on code in PR #12380: URL: https://github.com/apache/lucene/pull/12380#discussion_r1245115736
########## lucene/suggest/src/java/org/apache/lucene/search/suggest/document/TopSuggestDocsCollector.java: ########## @@ -100,12 +100,19 @@ public int getCountToCollect() { @Override protected void doSetNextReader(LeafReaderContext context) throws IOException { docBase = context.docBase; + } + + @Override + public void finish() throws IOException { if (seenSurfaceForms != null) { - seenSurfaceForms.clear(); // NOTE: this also clears the priorityQueue: for (SuggestScoreDoc hit : priorityQueue.getResults()) { pendingResults.add(hit); } + + // Deduplicate all hits: we already dedup'd efficiently within each segment by Review Comment: any particular reason to change the order of operations here? ########## lucene/suggest/src/java/org/apache/lucene/search/suggest/document/TopSuggestDocsCollector.java: ########## @@ -136,15 +143,7 @@ public TopSuggestDocs get() throws IOException { SuggestScoreDoc[] suggestScoreDocs; if (seenSurfaceForms != null) { - // NOTE: this also clears the priorityQueue: - for (SuggestScoreDoc hit : priorityQueue.getResults()) { - pendingResults.add(hit); - } - - // Deduplicate all hits: we already dedup'd efficiently within each segment by - // truncating the FST top paths search, but across segments there may still be dups: - seenSurfaceForms.clear(); Review Comment: oh I see we did it both ways. Probably makes no difference? Still superstitiously I would always want to clear/delete things last just in case... ########## lucene/test-framework/src/java/org/apache/lucene/tests/search/AssertingLeafCollector.java: ########## @@ -57,4 +58,11 @@ public void collect(int doc) throws IOException { public DocIdSetIterator competitiveIterator() throws IOException { return in.competitiveIterator(); } + + @Override + public void finish() throws IOException { + assert finishCalled == false; Review Comment: Did we previously disallow re-use of LeafCollectors? If not, this could break someone ########## lucene/test-framework/src/java/org/apache/lucene/tests/search/AssertingCollector.java: ########## @@ -49,7 +50,9 @@ public LeafCollector getLeafCollector(LeafReaderContext context) throws IOExcept assert context.docBase >= previousLeafMaxDoc; previousLeafMaxDoc = context.docBase + context.reader().maxDoc(); + assert hasFinishedCollectingPreviousLeaf; Review Comment: it's a pity we can't assert that we finished the final leaf ########## lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java: ########## @@ -749,6 +749,9 @@ protected void search(List<LeafReaderContext> leaves, Weight weight, Collector c partialResult = true; } } + // Note: this is called if collection ran successfully, including the above special cases of + // CollectionTerminatedException and TimeExceededException, but no other exception. + leafCollector.finish(); Review Comment: This could be an opportunity for capturing statistics about how often time-limitation is applied? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org