[GitHub] [lucene] msokolov commented on a change in pull request #142: LUCENE-9944: Allow DrillSideways users to pass a CollectorManager without requiring an ExecutorService (and concurrent DrillSideways approach).

GitBox Tue, 25 May 2021 19:02:48 -0700


msokolov commented on a change in pull request #142:
URL: https://github.com/apache/lucene/pull/142#discussion_r639346222




##########
File path: lucene/facet/src/java/org/apache/lucene/facet/DrillSidewaysQuery.java
##########
@@ -40,21 +41,65 @@
 // TODO change the way DrillSidewaysScorer is used, this query does not work
 // with filter caching
 class DrillSidewaysQuery extends Query {
+
   final Query baseQuery;
-  final Collector drillDownCollector;
-  final Collector[] drillSidewaysCollectors;
+
+  // The caller must either directly provide FacetsCollectors for the drill 
down and sideways
+  // collecting, or provide FacetsCollectorManagers used to create the 
FacetsCollectors. If
+  // this query will be executed concurrently, FacetsCollectorManagers should 
be used to ensure

Review comment:
       can we raise an exception if this happens?

##########
File path: lucene/facet/src/java/org/apache/lucene/facet/DrillSidewaysQuery.java
##########
@@ -131,8 +185,24 @@ public boolean isCacheable(LeafReaderContext ctx) {
       public BulkScorer bulkScorer(LeafReaderContext context) throws 
IOException {
         Scorer baseScorer = baseWeight.scorer(context);
 
+        int drillDownCount = drillDowns.length;
+
+        // TODO: If the caller provided a FacetsCollectorManager instead of 
directly providing
+        // FacetsCollectors, we assume this will be invoked during a 
concurrent search. Ideally
+        // we'd only create new FacetsCollectors for each "leaf slice" that 
will be concurrently
+        // searched, as opposed to each actual leaf, but we don't have that 
information at this
+        // level so we always provide a new FacetsCollector. There might be a 
better way to
+        // refactor this logic.

Review comment:
       I think in the case of collecting results, we create the collectors in 
`IndexSearcher.search(Query query, CollectorManager<C, T> collectorManager)`, 
which knows about the slices. If we want to improve this, we could add a new 
`IndexSearcher.search` accepting a DrilldownQuery, but maybe that's something 
for another day?

##########
File path: lucene/facet/src/java/org/apache/lucene/facet/DrillSidewaysQuery.java
##########
@@ -40,21 +41,65 @@
 // TODO change the way DrillSidewaysScorer is used, this query does not work
 // with filter caching
 class DrillSidewaysQuery extends Query {
+
   final Query baseQuery;
-  final Collector drillDownCollector;
-  final Collector[] drillSidewaysCollectors;
+
+  // The caller must either directly provide FacetsCollectors for the drill 
down and sideways
+  // collecting, or provide FacetsCollectorManagers used to create the 
FacetsCollectors. If
+  // this query will be executed concurrently, FacetsCollectorManagers should 
be used to ensure
+  // multiple threads aren't collecting into the same FacetsCollector.
+  final FacetsCollector drillDownCollector;
+  final FacetsCollector[] drillSidewaysCollectors;
+  final FacetsCollectorManager drillDownCollectorManager;
+  final FacetsCollectorManager[] drillSidewaysCollectorManagers;
+  final List<FacetsCollector> managedDrillDownCollectors;
+  final List<FacetsCollector[]> managedDrillSidewaysCollectors;
+
   final Query[] drillDownQueries;
+
   final boolean scoreSubDocsAtOnce;
 
+  /**
+   * Construct a new {@code DrillSidewaysQuery} that will directly use the 
provided {@link
+   * FacetsCollector}s. Use this if you're certain that the query will not be 
executed concurrently.

Review comment:
       Can we give users an idea of the tradeoff here? Naive user will like to 
execute concurrently if it goes faster, but would prefer single-threaded 
execution ... if they want to provide a custom collector? Some other reason?

##########
File path: lucene/facet/src/java/org/apache/lucene/facet/DrillSidewaysQuery.java
##########
@@ -142,7 +212,19 @@ public BulkScorer bulkScorer(LeafReaderContext context) 
throws IOException {
                 new ConstantScoreScorer(drillDowns[dim], 0f, scoreMode, 
DocIdSetIterator.empty());
           }
 
-          dims[dim] = new DrillSidewaysScorer.DocsAndCost(scorer, 
drillSidewaysCollectors[dim]);
+          // If the caller directly provided FacetsCollectors to use for the 
sideways dimensions,

Review comment:
       Could we simplify the API here by eliminating the two different calling 
methods and instead always requiring a CollectorManager?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] msokolov commented on a change in pull request #142: LUCENE-9944: Allow DrillSideways users to pass a CollectorManager without requiring an ExecutorService (and concurrent DrillSideways approach).

Reply via email to