[GitHub] [lucene] jpountz commented on a change in pull request #418: LUCENE-10061: Implements dynamic pruning support for CombinedFieldsQuery

GitBox Tue, 30 Nov 2021 07:02:12 -0800


jpountz commented on a change in pull request #418:
URL: https://github.com/apache/lucene/pull/418#discussion_r759341441




##########
File path: 
lucene/sandbox/src/java/org/apache/lucene/sandbox/search/CombinedFieldQuery.java
##########
@@ -441,6 +495,292 @@ public boolean isCacheable(LeafReaderContext ctx) {
     }
   }
 
+  /** Merge impacts for combined field. */
+  static ImpactsSource mergeImpacts(
+      Map<String, List<ImpactsEnum>> fieldsWithImpactsEnums,
+      Map<String, List<Impacts>> fieldsWithImpacts,
+      Map<String, List<Integer>> fieldTermDocFreq,
+      Map<String, Float> fieldWeights) {
+    return new ImpactsSource() {
+      Impacts leadingImpacts = null;
+
+      class SubIterator {
+        final Iterator<Impact> iterator;
+        int previousFreq;
+        Impact current;
+
+        SubIterator(Iterator<Impact> iterator) {
+          this.iterator = iterator;
+          this.current = iterator.next();
+        }
+
+        void next() {
+          previousFreq = current.freq;
+          if (iterator.hasNext() == false) {
+            current = null;
+          } else {
+            current = iterator.next();
+          }
+        }
+      }
+
+      @Override
+      public Impacts getImpacts() throws IOException {
+        // Use the impacts that have the lower next boundary (doc id in skip entry) as a lead for
+        // each field
+        // They collectively will decide on the number of levels and the block boundaries.
+
+        if (leadingImpacts == null) {
+          float maxWeight = Float.MIN_VALUE;
+          String maxWeightField = "";
+
+          for (Map.Entry<String, Float> fieldWeightEntry : fieldWeights.entrySet()) {
+            String field = fieldWeightEntry.getKey();
+            float weight = fieldWeightEntry.getValue();
+
+            if (maxWeight < weight) {
+              maxWeight = weight;
+              maxWeightField = field;
+            }
+          }

Review comment:
       Since field weights do not change over time, could we compute the field that has the highest weight up-front instead of doing it every time `getImpacts` is called?
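A rough plain-Java sketch of that suggestion follows. The lead field is resolved once, when the source is built, and every later `getImpacts()`-style lookup only reads a final field. `MergedImpactsSketch`, `leadField()` and `maxWeightField()` are hypothetical names for illustration. They are not the PR's code or Lucene API, and the real `ImpactsSource` plumbing is left out so the snippet compiles on its own.

```java
import java.util.Map;

/** Hypothetical stand-in for the anonymous ImpactsSource built in mergeImpacts. */
final class MergedImpactsSketch {
  // Computed once at construction: field weights never change after the query is built.
  private final String leadField;

  MergedImpactsSketch(Map<String, Float> fieldWeights) {
    this.leadField = maxWeightField(fieldWeights);
  }

  /** What a getImpacts()-style call would consult; no per-call scan of the weights map. */
  String leadField() {
    return leadField;
  }

  /** Returns the field with the largest weight, or null if the map is empty. */
  static String maxWeightField(Map<String, Float> fieldWeights) {
    String best = null;
    // NEGATIVE_INFINITY is below every finite float; Float.MIN_VALUE is the smallest *positive* float.
    float bestWeight = Float.NEGATIVE_INFINITY;
    for (Map.Entry<String, Float> entry : fieldWeights.entrySet()) {
      float weight = entry.getValue();
      if (weight > bestWeight) {
        bestWeight = weight;
        best = entry.getKey();
      }
    }
    return best;
  }
}
```

The sketch uses `Float.NEGATIVE_INFINITY` as the starting sentinel rather than `Float.MIN_VALUE`. The choice only matters if a weight could ever be zero or negative.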

##########
File path: 
lucene/sandbox/src/java/org/apache/lucene/sandbox/search/CombinedFieldQuery.java
##########
@@ -402,14 +423,30 @@ public Explanation explain(LeafReaderContext context, int doc) throws IOExceptio
     public Scorer scorer(LeafReaderContext context) throws IOException {
       List<PostingsEnum> iterators = new ArrayList<>();
       List<FieldAndWeight> fields = new ArrayList<>();
+      Map<String, List<ImpactsEnum>> fieldImpactsEnum = new HashMap<>(fieldAndWeights.size());
+      Map<String, List<Integer>> fieldTermDocFreq = new HashMap<>(fieldAndWeights.size());

Review comment:
       Do we actually need this list of doc freqs? Wouldn't they always be equal to `impactsEnum#cost`?
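`ImpactsEnum` inherits `cost()` from `DocIdSetIterator`, which documents it as an estimate of how many documents the iterator may match. For a single term's postings this typically equals the term's doc freq, and that is what the question relies on. Assuming that holds, the parallel `fieldTermDocFreq` map could be replaced by reading `cost()` on demand, roughly as in this plain-Java sketch. `CostedEnum` is a hypothetical stand-in for `ImpactsEnum`, and `docFreqs()` is an illustrative helper, not part of the PR.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Sketch: derive per-field doc freqs from the enums instead of keeping a parallel map. */
final class DocFreqFromCost {
  /** Hypothetical stand-in for Lucene's ImpactsEnum; only cost() is modeled here. */
  interface CostedEnum {
    long cost();
  }

  /**
   * Returns the doc freqs of one field's terms, read from each enum's cost(), in the same order
   * as the enums. Returns an empty list for an unknown field.
   */
  static List<Long> docFreqs(Map<String, List<CostedEnum>> fieldImpactsEnums, String field) {
    List<Long> docFreqs = new ArrayList<>();
    for (CostedEnum impactsEnum : fieldImpactsEnums.getOrDefault(field, List.of())) {
      docFreqs.add(impactsEnum.cost());
    }
    return docFreqs;
  }
}
```

One consequence of this design is that the doc freq becomes a `long`, because `cost()` returns `long`, whereas the PR's map stores `Integer`.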



