[GitHub] [lucene] gsmiller commented on a change in pull request #240: LUCENE-10002: Deprecate IndexSearch#search(Query, Collector) in favor of IndexSearcher#search(Query, CollectorManager)

GitBox Sat, 21 Aug 2021 07:06:21 -0700


gsmiller commented on a change in pull request #240:
URL: https://github.com/apache/lucene/pull/240#discussion_r693358901




##########
File path: lucene/facet/src/java/org/apache/lucene/facet/DrillSideways.java
##########
@@ -520,10 +458,39 @@ private DrillDownQuery getDrillDownQuery(
   }
 
   @SuppressWarnings("unchecked")
-  private <R> ConcurrentDrillSidewaysResult<R> searchSequentially(
-      final DrillDownQuery query, final CollectorManager<?, R> 
hitCollectorManager)
+  private <C extends Collector, R> ConcurrentDrillSidewaysResult<R> 
searchSequentially(
+      final DrillDownQuery query, final CollectorManager<C, R> 
hitCollectorManager)
       throws IOException {
 
+    // This mirrors a similar hack from DrillSideways#search(query, collector).
+    // Without this cache, LRU cache will be used, causing acceptDocs to be 
null during collection

Review comment:
       Thanks for the details! I'm still a bit confused though. I'm not totally 
clear on where the problem with `acceptDocs` comes into play (but there are a 
lot of code layers to follow here and I'm still trying to wrap my head around 
it). The `DrillSidewaysScorer` (which implements `BulkScorer`) would get used 
to populate the cached bitset, and it checks against `acceptDocs` when scoring. 
So wouldn't the cached docs already take into account the deleted doc 
filtering? 
   
   I think there _is_ at least one other problem with caching though. It's 
important that the `DrillSidewaysScorer` actually runs during query evaluation 
every time since it's where the "sideways" `FacetsCollectors` get populated 
with the "near miss" docs. So I think there's at least another critical caching 
issue here where only the "true hit" docs would get cached, then on a cache hit 
would get reused without the `DrillSidewaysScorer` getting a chance to populate 
the near misses (so all those FCs would be empty).
   
   So yeah, I'm very much convinced this hack is needed (and I see where it's 
done already for the base `Collector` implementation), but I want to make sure 
I fully understand the caching concerns around `acceptDoc`. Apologies if I'm 
overlooking it!




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [lucene] gsmiller commented on a change in pull request #240: LUCENE-10002: Deprecate IndexSearch#search(Query, Collector) in favor of IndexSearcher#search(Query, CollectorManager)

Reply via email to