[GitHub] [lucene] zacharymorn commented on a change in pull request #240: LUCENE-10002: Deprecate IndexSearch#search(Query, Collector) in favor of IndexSearcher#search(Query, CollectorManager)

GitBox Fri, 20 Aug 2021 00:17:50 -0700


zacharymorn commented on a change in pull request #240:
URL: https://github.com/apache/lucene/pull/240#discussion_r692722544




##########
File path: lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java
##########
@@ -527,7 +503,10 @@ public TopDocs search(Query query, int n) throws 
IOException {
    *
    * @throws TooManyClauses If a query would exceed {@link 
IndexSearcher#getMaxClauseCount()}
    *     clauses.
+   * @deprecated This method is being deprecated in favor of {@link 
IndexSearcher#search(Query,

Review comment:
       > I think we like the @lucene.deprecated tag?
   
   Do you mean `@lucene.deprecated` should be created and used instead of using 
`@deprecated`? I can't seems to find that tag is being used in lucene 
actually...I think `@lucene.deprecated` most likely won't be recognized by IDE 
or build tool to signal / warn the use of deprecated methods though.
   
   > Is your reasoning behind keeping this in there for now that all of our 
uses of this (internal to Lucene) haven't yet migrated as part of this change? 
Would the plan me to migrate all usages off of this internally and then 
actually remove it on main/9.0, or are you thinking of keeping it around until 
10.0? I think our backwards compatibility policy is such that we could just 
directly remove this on main/9.0, but then leave it like you have it (marked 
deprecated) if you choose to backport this to 8x. Since this method is so 
fundamental though, I could easily see an argument to keep it around for an 
extra major release to give users more time to migrate. Then again, the 
migration path seems pretty straight-forward. What do you think?
   
   As explained in 
https://issues.apache.org/jira/browse/LUCENE-10002?focusedCommentId=17397827&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-17397827,
 I use deprecation instead of direct removal in this PR exactly for the reasons 
you mentioned:
   1. Not all internal uses have been migrated in this PR (which may require 
another few thousand lines of changes).
   1. I personally feel that we should give users more time for migration and 
testing out the changes, so 10.0 release might be a good timing for complete 
removal. 
   
   I would be interested in seeing how folks feel about the timing of removal, 
but after #1  is completed, complete removal of IndexSearcher#search(Query, 
Collector) in lucene codebase should require only straightforward deletion 
changes.

##########
File path: lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java
##########
@@ -659,9 +614,12 @@ public TopFieldDocs reduce(Collection<TopFieldCollector> 
collectors) throws IOEx
    */
   public <C extends Collector, T> T search(Query query, CollectorManager<C, T> 
collectorManager)
       throws IOException {
-    if (executor == null || leafSlices.length <= 1) {
+    if (executor == null || leafSlices.length == 0) {

Review comment:
       Technically speaking in this block, `collectorManager#newCollector` & 
`collectorManager#reduce` etc are still used. But I couldn't remember why I 
made this change exactly (maybe was just trying to still use the thread pool 
executor to process this case as well), so I reverted this change.

##########
File path: lucene/core/src/java/org/apache/lucene/search/TopFieldCollector.java
##########
@@ -383,9 +382,13 @@ protected void updateMinCompetitiveScore(Scorable scorer) 
throws IOException {
    *     count of the result will be accurate. {@link Integer#MAX_VALUE} may 
be used to make the hit
    *     count accurate, but this will also make query processing slower.
    * @return a {@link TopFieldCollector} instance which will sort the results 
by the sort criteria.
+   * @deprecated This method is being deprecated in favor of using the 
constructor of {@link
+   *     TopFieldCollectorManager} due to its support for concurrency in 
IndexSearcher
    */
+  @Deprecated

Review comment:
       Please see above.

##########
File path: lucene/core/src/java/org/apache/lucene/search/TopFieldCollector.java
##########
@@ -407,97 +410,14 @@ public static TopFieldCollector create(Sort sort, int 
numHits, int totalHitsThre
    *     field is indexed both with doc values and points. In this case, there 
is an assumption that
    *     the same data is stored in these points and doc values.
    * @return a {@link TopFieldCollector} instance which will sort the results 
by the sort criteria.
+   * @deprecated This method is being deprecated in favor of using the 
constructor of {@link
+   *     TopFieldCollectorManager} due to its support for concurrency in 
IndexSearcher
    */
+  @Deprecated

Review comment:
       Please see above.

##########
File path: 
lucene/test-framework/src/java/org/apache/lucene/util/FixedBitSetCollector.java
##########
@@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.util;
+
+import java.util.Collection;
+import org.apache.lucene.index.LeafReaderContext;
+import org.apache.lucene.search.CollectorManager;
+import org.apache.lucene.search.ScoreMode;
+import org.apache.lucene.search.SimpleCollector;
+
+/** Test utility collector that uses FixedBitSet to record hits. */
+public class FixedBitSetCollector extends SimpleCollector {
+  private FixedBitSet hits;

Review comment:
       Updated.

##########
File path: 
lucene/test-framework/src/java/org/apache/lucene/util/FixedBitSetCollector.java
##########
@@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.util;
+
+import java.util.Collection;
+import org.apache.lucene.index.LeafReaderContext;
+import org.apache.lucene.search.CollectorManager;
+import org.apache.lucene.search.ScoreMode;
+import org.apache.lucene.search.SimpleCollector;
+
+/** Test utility collector that uses FixedBitSet to record hits. */
+public class FixedBitSetCollector extends SimpleCollector {
+  private FixedBitSet hits;
+  private int docBase;
+
+  public FixedBitSetCollector(int maxDoc) {
+    hits = new FixedBitSet(maxDoc);
+  }
+
+  @Override
+  public ScoreMode scoreMode() {
+    return ScoreMode.COMPLETE_NO_SCORES;
+  }
+
+  @Override
+  protected void doSetNextReader(LeafReaderContext context) {
+    docBase = context.docBase;
+  }
+
+  @Override
+  public void collect(int doc) {
+    hits.set(docBase + doc);
+  }
+
+  public FixedBitSet getHits() {
+    return hits;
+  }
+
+  public static CollectorManager<FixedBitSetCollector, FixedBitSet> create(int 
maxDoc) {
+    return new CollectorManager<>() {
+      @Override
+      public FixedBitSetCollector newCollector() {
+        return new FixedBitSetCollector(maxDoc);
+      }
+
+      @Override
+      public FixedBitSet reduce(Collection<FixedBitSetCollector> collectors) {
+        FixedBitSet result = new FixedBitSet(maxDoc);
+        collectors.stream().forEach(c -> result.or(c.getHits()));

Review comment:
       You are right! Updated.

##########
File path: 
lucene/test-framework/src/java/org/apache/lucene/util/FixedBitSetCollector.java
##########
@@ -0,0 +1,68 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.util;
+
+import java.util.Collection;
+import org.apache.lucene.index.LeafReaderContext;
+import org.apache.lucene.search.CollectorManager;
+import org.apache.lucene.search.ScoreMode;
+import org.apache.lucene.search.SimpleCollector;
+
+/** Test utility collector that uses FixedBitSet to record hits. */
+public class FixedBitSetCollector extends SimpleCollector {
+  private FixedBitSet hits;
+  private int docBase;
+
+  public FixedBitSetCollector(int maxDoc) {
+    hits = new FixedBitSet(maxDoc);
+  }
+
+  @Override
+  public ScoreMode scoreMode() {
+    return ScoreMode.COMPLETE_NO_SCORES;
+  }
+
+  @Override
+  protected void doSetNextReader(LeafReaderContext context) {
+    docBase = context.docBase;
+  }
+
+  @Override
+  public void collect(int doc) {
+    hits.set(docBase + doc);
+  }
+
+  public FixedBitSet getHits() {
+    return hits;
+  }
+
+  public static CollectorManager<FixedBitSetCollector, FixedBitSet> create(int 
maxDoc) {

Review comment:
       Makes sense. Updated.

##########
File path: 
lucene/core/src/java/org/apache/lucene/search/TopScoreDocCollectorManager.java
##########
@@ -0,0 +1,142 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.search;
+
+import java.io.IOException;
+import java.util.Collection;
+
+/**
+ * Create a TopScoreDocCollectorManager which uses a shared hit counter to 
maintain number of hits
+ * and a shared {@link MaxScoreAccumulator} to propagate the minimum score 
across segments
+ *
+ * <p>Note that a new collectorManager should be created for each search due 
to its internal states.
+ */
+public class TopScoreDocCollectorManager
+    implements CollectorManager<TopScoreDocCollector, TopDocs> {
+  private final int numHits;
+  private final ScoreDoc after;
+  private final HitsThresholdChecker hitsThresholdChecker;
+  private final MaxScoreAccumulator minScoreAcc;
+
+  /**
+   * Creates a new {@link TopScoreDocCollectorManager} given the number of 
hits to collect and the
+   * number of hits to count accurately.
+   *
+   * <p><b>NOTE</b>: If the total hit count of the top docs is less than or 
exactly {@code
+   * totalHitsThreshold} then this value is accurate. On the other hand, if 
the {@link
+   * TopDocs#totalHits} value is greater than {@code totalHitsThreshold} then 
its value is a lower
+   * bound of the hit count. A value of {@link Integer#MAX_VALUE} will make 
the hit count accurate
+   * but will also likely make query processing slower.
+   *
+   * <p><b>NOTE</b>: The instances returned by this method pre-allocate a full 
array of length
+   * <code>numHits</code>, and fill the array with sentinel objects.
+   *
+   * @param numHits the number of results to collect.
+   * @param after the previous doc after which matching docs will be collected.
+   * @param totalHitsThreshold the number of docs to count accurately. If the 
query matches more
+   *     than {@code totalHitsThreshold} hits then its hit count will be a 
lower bound. On the other
+   *     hand if the query matches less than or exactly {@code 
totalHitsThreshold} hits then the hit
+   *     count of the result will be accurate. {@link Integer#MAX_VALUE} may 
be used to make the hit
+   *     count accurate, but this will also make query processing slower.
+   * @param supportsConcurrency to use thread-safe and slower internal states 
for count tracking.
+   */
+  public TopScoreDocCollectorManager(
+      int numHits, ScoreDoc after, int totalHitsThreshold, boolean 
supportsConcurrency) {

Review comment:
       Yes the concern is indeed valid. Technically speaking this is probably 
an existing issue, as you can accidentally pass in not thread-safe internals to 
collectorManager to be used in a concurrent index searcher (the example below 
is thread-safe and internal to lucene's method, but the user can implement 
similar logic and accidentally pass in not thread-safe `HitsThresholdChecker` 
and nothing would check for thread-safety afterward) :
   
   
https://github.com/apache/lucene/blob/efb7b2a5e8c1bdc19dfd65f7095f70a142343472/lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java#L482-L510
   
   On the other hand, not all collectorManager requires concurrency support, 
for example: 
   
   
https://github.com/apache/lucene/blob/efb7b2a5e8c1bdc19dfd65f7095f70a142343472/lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java#L430-L447
   
   However, given we are trying to promote IndexSearch#search(Query, 
CollectorManager) to be the main method for collection, and to guard against 
hard to debug concurrency related errors, I feel we should definitely do what 
you suggested. Maybe we can add a new method `supportsConcurrency` to 
CollectorManager's API like below to "force" the user to specify it, and to 
allow `IndexSearcher` to check for it:
   
   ```
   public interface CollectorManager<C extends Collector, T> {
   
     C newCollector() throws IOException;
   
     T reduce(Collection<C> collectors) throws IOException;
   
     boolean supportsConcurrency();
   }
   ```
   
   What do you think?

##########
File path: lucene/facet/src/java/org/apache/lucene/facet/DrillSideways.java
##########
@@ -520,10 +458,39 @@ private DrillDownQuery getDrillDownQuery(
   }
 
   @SuppressWarnings("unchecked")
-  private <R> ConcurrentDrillSidewaysResult<R> searchSequentially(
-      final DrillDownQuery query, final CollectorManager<?, R> 
hitCollectorManager)
+  private <C extends Collector, R> ConcurrentDrillSidewaysResult<R> 
searchSequentially(
+      final DrillDownQuery query, final CollectorManager<C, R> 
hitCollectorManager)
       throws IOException {
 
+    // This mirrors a similar hack from DrillSideways#search(query, collector).
+    // Without this cache, LRU cache will be used, causing acceptDocs to be 
null during collection

Review comment:
       Sorry about the typo there, I meant `Without this hack`. The caching 
decision happens here 
   
   
https://github.com/apache/lucene/blob/5896e5389a83f657781875a852120615ba4763dc/lucene/core/src/java/org/apache/lucene/search/IndexSearcher.java#L863-L866
   
   so the hack of returning `ScoreMode.COMPLETE` basically would cause 
`scoreMode.needsScores() == true` in this logic. 
   
   Without this hack, eventually this logic will be invoked, passing `null` for 
`acceptDocs`, causing some deleted docs to be collected as well.
   
   
https://github.com/apache/lucene/blob/5896e5389a83f657781875a852120615ba4763dc/lucene/core/src/java/org/apache/lucene/search/LRUQueryCache.java#L509-L521




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [lucene] zacharymorn commented on a change in pull request #240: LUCENE-10002: Deprecate IndexSearch#search(Query, Collector) in favor of IndexSearcher#search(Query, CollectorManager)

Reply via email to