[GitHub] [lucene] javanna commented on a diff in pull request #12348: Make SliceExecutor extensible and include method to computeSlices. This will allow different custom implementation to be plugged in with IndexSearcher to compute and execute slices on provided executor

via GitHub Thu, 08 Jun 2023 14:34:25 -0700


javanna commented on code in PR #12348:
URL: https://github.com/apache/lucene/pull/12348#discussion_r1223551727



##########
lucene/core/src/java/org/apache/lucene/search/SliceExecutor.java:
##########
@@ -55,4 +74,8 @@ boolean shouldExecuteOnCallerThread(int index, int numTasks) {
     // Execute last task on caller thread
     return index == numTasks - 1;
   }
+
+  public Executor getExecutor() {

Review Comment:
   looks like there would be no need for this method if we made 
`getSliceExecutionControlPlane` public



##########
lucene/core/src/java/org/apache/lucene/search/SliceExecutor.java:
##########
@@ -18,22 +18,41 @@
 package org.apache.lucene.search;
 
 import java.util.Collection;
+import java.util.List;
 import java.util.Objects;
 import java.util.concurrent.Executor;
 import java.util.concurrent.RejectedExecutionException;
+import org.apache.lucene.index.LeafReaderContext;
 
 /**
  * Executor which is responsible for execution of slices based on the current 
status of the system
  * and current system load
  */
-class SliceExecutor {
+public class SliceExecutor {

Review Comment:
   I am thinking this class should be marked experimental if it's made public



##########
lucene/core/src/java/org/apache/lucene/search/SliceExecutor.java:
##########
@@ -18,22 +18,41 @@
 package org.apache.lucene.search;
 
 import java.util.Collection;
+import java.util.List;
 import java.util.Objects;
 import java.util.concurrent.Executor;
 import java.util.concurrent.RejectedExecutionException;
+import org.apache.lucene.index.LeafReaderContext;
 
 /**
  * Executor which is responsible for execution of slices based on the current 
status of the system
  * and current system load
  */
-class SliceExecutor {
+public class SliceExecutor {
+
+  /** Thresholds for index slice allocation logic */
+  private static final int MAX_DOCS_PER_SLICE = 250_000;
+
+  private static final int MAX_SEGMENTS_PER_SLICE = 5;
+
   private final Executor executor;
 
-  SliceExecutor(Executor executor) {
+  public SliceExecutor(Executor executor) {
     this.executor = Objects.requireNonNull(executor, "Executor is null");
   }
 
-  final void invokeAll(Collection<? extends Runnable> tasks) {
+  /**
+   * method to segregate LeafReaderContexts amongst multiple slices using the 
default
+   * MAX_SEGMENTS_PER_SLICE and MAX_DOCUMENTS_PER_SLICE
+   *
+   * @param leaves LeafReaderContexts for this index
+   * @return computed slices
+   */
+  public LeafSlice[] computeSlices(List<LeafReaderContext> leaves) {
+    return IndexSearcher.slices(leaves, MAX_DOCS_PER_SLICE, 
MAX_SEGMENTS_PER_SLICE);

Review Comment:
   I wonder if it'd make sense to move the slices static method from 
`IndexSearcher` to `SliceExecutor`, so that we don't have to call 
IndexSearcher.slices from here.



##########
lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java:
##########
@@ -882,27 +885,6 @@ public IndexReaderContext getTopReaderContext() {
     return readerContext;
   }
 
-  /**
-   * A class holding a subset of the {@link IndexSearcher}s leaf contexts to 
be executed within a
-   * single thread.
-   *
-   * @lucene.experimental
-   */
-  public static class LeafSlice {
-
-    /**
-     * The leaves that make up this slice.
-     *
-     * @lucene.experimental
-     */
-    public final LeafReaderContext[] leaves;
-
-    public LeafSlice(List<LeafReaderContext> leavesList) {
-      Collections.sort(leavesList, Comparator.comparingInt(l -> l.docBase));
-      this.leaves = leavesList.toArray(new LeafReaderContext[0]);
-    }
-  }

Review Comment:
   I wonder if it's necessary to move this class to be an outer class. It is 
marked experimental, so we should be free to move it, yet I would tend to avoid 
breaking consumers if possible.



##########
lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java:
##########
@@ -225,12 +217,24 @@ public IndexSearcher(IndexReaderContext context, Executor 
executor) {
     this(context, executor, getSliceExecutionControlPlane(executor));
   }
 
-  // Package private for testing
-  IndexSearcher(IndexReaderContext context, Executor executor, SliceExecutor 
sliceExecutor) {
+  /**
+   * Creates a searcher searching the provided top-level {@link 
IndexReaderContext}.
+   *
+   * <p>Given a non-<code>null</code> {@link Executor} and {@link 
SliceExecutor} this method runs
+   * searches for each segment separately, using the provided Executor. The 
provided {@link
+   * Executor} instance should be same as the one used in {@link 
SliceExecutor} NOTE: if you are
+   * using {@link NIOFSDirectory}, do not use the shutdownNow method of 
ExecutorService as this uses
+   * Thread.interrupt under-the-hood which can silently close file descriptors 
(see <a href=
+   * "https://issues.apache.org/jira/browse/LUCENE-2239";>LUCENE-2239</a>).
+   *
+   * @lucene.experimental
+   */
+  public IndexSearcher(IndexReaderContext context, Executor executor, 
SliceExecutor sliceExecutor) {

Review Comment:
   I wonder if we should make this constructor public, or instead make the 
`getSliceExecutionControlPlane` method public. Would this second option work 
for what you need or would you still run into issues as it's called from the 
`IndexSearcher` constructor a bit like `slices`?
   
   Otherwise I'd think we should look at having a single way, either provide an 
`Executor` or a `SliceExecutor`.



##########
lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java:
##########
@@ -319,11 +323,10 @@ public QueryCachingPolicy getQueryCachingPolicy() {
 
   /**
    * Expert: Creates an array of leaf slices each holding a subset of the 
given leaves. Each {@link
-   * LeafSlice} is executed in a single thread. By default, segments with more 
than
-   * MAX_DOCS_PER_SLICE will get their own thread
+   * LeafSlice} is executed in a single thread.
    */
   protected LeafSlice[] slices(List<LeafReaderContext> leaves) {

Review Comment:
   I don't see the point of overriding this if we are making `SliceExecutor` 
public in order for it to be extended. I would remove this method from 
`IndexSearcher`. 



##########
lucene/core/src/java/org/apache/lucene/search/SliceExecutor.java:
##########
@@ -55,4 +74,8 @@ boolean shouldExecuteOnCallerThread(int index, int numTasks) {
     // Execute last task on caller thread

Review Comment:
   should we make `shouldExecuteOnCallerThread` also public?



##########
lucene/core/src/java/org/apache/lucene/search/SliceExecutor.java:
##########
@@ -18,22 +18,41 @@
 package org.apache.lucene.search;
 
 import java.util.Collection;
+import java.util.List;
 import java.util.Objects;
 import java.util.concurrent.Executor;
 import java.util.concurrent.RejectedExecutionException;
+import org.apache.lucene.index.LeafReaderContext;
 
 /**
  * Executor which is responsible for execution of slices based on the current 
status of the system
  * and current system load
  */
-class SliceExecutor {
+public class SliceExecutor {
+
+  /** Thresholds for index slice allocation logic */
+  private static final int MAX_DOCS_PER_SLICE = 250_000;
+
+  private static final int MAX_SEGMENTS_PER_SLICE = 5;
+
   private final Executor executor;
 
-  SliceExecutor(Executor executor) {
+  public SliceExecutor(Executor executor) {
     this.executor = Objects.requireNonNull(executor, "Executor is null");
   }
 
-  final void invokeAll(Collection<? extends Runnable> tasks) {
+  /**
+   * method to segregate LeafReaderContexts amongst multiple slices using the 
default
+   * MAX_SEGMENTS_PER_SLICE and MAX_DOCUMENTS_PER_SLICE
+   *
+   * @param leaves LeafReaderContexts for this index
+   * @return computed slices
+   */
+  public LeafSlice[] computeSlices(List<LeafReaderContext> leaves) {
+    return IndexSearcher.slices(leaves, MAX_DOCS_PER_SLICE, 
MAX_SEGMENTS_PER_SLICE);
+  }
+
+  public void invokeAll(Collection<? extends Runnable> tasks) {

Review Comment:
   Could you add javadocs now that the method is public?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] javanna commented on a diff in pull request #12348: Make SliceExecutor extensible and include method to computeSlices. This will allow different custom implementation to be plugged in with IndexSearcher to compute and execute slices on provided executor

Reply via email to