[GitHub] [lucene] jpountz commented on a change in pull request #658: LUCENE-10378 Implement Weight#count for PointRangeQuery

GitBox Wed, 09 Feb 2022 03:00:47 -0800


jpountz commented on a change in pull request #658:
URL: https://github.com/apache/lucene/pull/658#discussion_r802539169




##########
File path: lucene/core/src/java/org/apache/lucene/search/PointRangeQuery.java
##########
@@ -369,6 +376,45 @@ public Scorer scorer(LeafReaderContext context) throws IOException {
         return scorerSupplier.get(Long.MAX_VALUE);
       }
 
+      @Override
+      public int count(LeafReaderContext context) throws IOException {
+        LeafReader reader = context.reader();
+
+        PointValues values = reader.getPointValues(field);
+        if (checkValidPointValues(values) == false) {
+          return 0;
+        }
+
+        if (reader.hasDeletions() == false
+            && numDims == 1
+            && values.getDocCount() == values.size()) {
+          // if all documents have at-most one point
+          final int[] intersectingLeafNodeCount = {0};
+          // create a custom IntersectVisitor that records the number of leafNodes that matched
+          final IntersectVisitor visitor =
+              new IntersectVisitor() {
+                @Override
+                public void visit(int docID) {
+                  intersectingLeafNodeCount[0]++;

Review comment:
       Let's throw an UnsupportedOperationException here and move the increment
to `visit(int,byte[])`? It would be a bug if this method ever got called,
since the point is to skip nodes that are contained by the query.
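
A minimal sketch of the suggested shape, for illustration only. `SimpleVisitor` and `CrossingLeafCounter` are hypothetical stand-ins so the example compiles without Lucene; they are not the real `IntersectVisitor` API. The inclusive range check on the packed value is also an assumption about what the increment should be guarded by.

```java
import java.util.Arrays;

/** Hypothetical stand-in for the two visit methods of an intersect visitor. */
interface SimpleVisitor {
  void visit(int docID);

  void visit(int docID, byte[] packedValue);
}

/** Counts points seen on leaves that cross the query range (1D, unsigned byte order). */
final class CrossingLeafCounter implements SimpleVisitor {
  private final byte[] lower;
  private final byte[] upper;
  long count;

  CrossingLeafCounter(byte[] lower, byte[] upper) {
    this.lower = lower;
    this.upper = upper;
  }

  @Override
  public void visit(int docID) {
    // This overload carries no value, so it is only meaningful for cells that are
    // fully inside the query. Those cells are counted by size without being
    // visited, so reaching this method would indicate a bug.
    throw new UnsupportedOperationException("cells contained by the query must not be visited");
  }

  @Override
  public void visit(int docID, byte[] packedValue) {
    // Assumed check: count the point only if lower <= value <= upper.
    if (Arrays.compareUnsigned(packedValue, lower) >= 0
        && Arrays.compareUnsigned(packedValue, upper) <= 0) {
      count++;
    }
  }
}
```

With bounds `{10}` and `{20}`, visiting the values 5, 10, 15 and 25 leaves `count` at 2, and any call to `visit(int)` fails fast instead of silently inflating the count.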

##########
File path: lucene/core/src/java/org/apache/lucene/index/PointValues.java
##########
@@ -369,6 +369,52 @@ private void intersect(IntersectVisitor visitor, PointTree pointTree) throws IOE
     }
   }
 
+  /**
+   * Finds the number of points matching the provided range conditions. Using this method is faster
+   * than calling {@link #intersect(IntersectVisitor)} to get the count of intersecting points. This
+   * method does not enforce live documents, therefore it should only be used when there are no
+   * deleted documents.
+   */
+  public final long countPoints(IntersectVisitor visitor) throws IOException {
+    final PointTree pointTree = getPointTree();
+    long countPoints = countPoints(visitor, pointTree);
+    assert pointTree.moveToParent()
+        == false; // just checking to make sure we ended the tree search at the root node
+    return countPoints;
+  }
+
+  private long countPoints(IntersectVisitor visitor, PointTree pointTree) throws IOException {
+    Relation r = visitor.compare(pointTree.getMinPackedValue(), pointTree.getMaxPackedValue());
+    switch (r) {
+      case CELL_OUTSIDE_QUERY:
+        // This cell is fully outside the query shape: return 0 as the count of its nodes
+        return 0;
+      case CELL_INSIDE_QUERY:
+        // This cell is fully inside the query shape: return the size of the entire node as the
+        // count
+        return pointTree.size();
+      case CELL_CROSSES_QUERY:
+        /*
+        The cell crosses the shape boundary, or the cell fully contains the query, so we fall
+        through and do full counting.
+        */
+        if (pointTree.moveToChild()) {
+          int cellCount = 0;
+          do {
+            cellCount += countPoints(visitor, pointTree);
+          } while (pointTree.moveToSibling());
+          pointTree.moveToParent();
+          return cellCount;
+        } else {
+          // we have reached a leaf node here.
+          pointTree.visitDocValues(visitor);
+          return 0; // the visitor has safely recorded the number of leaf nodes that matched
+        }
+      default:
+        throw new IllegalArgumentException("Unreachable code");
+    }
+  }
+

Review comment:
       I think I'd keep these two methods as implementation details of 
PointRangeQuery? The contract is a bit weird as the `IntersectVisitor` only 
collects documents that are on leaves that cross the query.
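
To illustrate why the split contract is awkward to expose, here is a toy sketch in which a single helper returns the complete count, so callers never have to add a visitor-side tally to the return value. `Cell` and `RangeCounter` are hypothetical 1D stand-ins with `int` values; they are not Lucene's `PointTree` API, and the real code works on packed byte values.

```java
import java.util.List;

/** Hypothetical 1D stand-in for a tree cell: either a leaf with values or an inner node. */
final class Cell {
  final int min;
  final int max;
  final int[] leafValues; // empty for inner nodes
  final List<Cell> children; // empty for leaves

  private Cell(int min, int max, int[] leafValues, List<Cell> children) {
    this.min = min;
    this.max = max;
    this.leafValues = leafValues;
    this.children = children;
  }

  static Cell leaf(int... values) {
    int min = Integer.MAX_VALUE;
    int max = Integer.MIN_VALUE;
    for (int v : values) {
      min = Math.min(min, v);
      max = Math.max(max, v);
    }
    return new Cell(min, max, values, List.of());
  }

  static Cell inner(Cell... children) {
    int min = Integer.MAX_VALUE;
    int max = Integer.MIN_VALUE;
    for (Cell c : children) {
      min = Math.min(min, c.min);
      max = Math.max(max, c.max);
    }
    return new Cell(min, max, new int[0], List.of(children));
  }

  /** Total number of points stored in this cell and everything below it. */
  long size() {
    long total = leafValues.length;
    for (Cell c : children) {
      total += c.size();
    }
    return total;
  }
}

final class RangeCounter {
  /** Counts points with lower <= value <= upper, skipping cells that need no inspection. */
  static long count(Cell cell, int lower, int upper) {
    if (cell.max < lower || cell.min > upper) {
      return 0; // cell outside the query: nothing matches
    }
    if (cell.min >= lower && cell.max <= upper) {
      return cell.size(); // cell inside the query: every point matches, no visit needed
    }
    if (!cell.children.isEmpty()) {
      long total = 0; // cell crosses the query: recurse into children
      for (Cell child : cell.children) {
        total += count(child, lower, upper);
      }
      return total;
    }
    long matches = 0; // crossing leaf: check each value individually
    for (int v : cell.leafValues) {
      if (v >= lower && v <= upper) {
        matches++;
      }
    }
    return matches;
  }
}
```

For a tree built as `Cell.inner(Cell.leaf(1, 2, 3), Cell.leaf(4, 5, 6), Cell.leaf(7, 8, 9))`, `RangeCounter.count(tree, 3, 7)` returns 5: the middle leaf is counted by size, and the two outer leaves each contribute one value after a per-value check.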




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



