gsmiller commented on code in PR #974:
URL: https://github.com/apache/lucene/pull/974#discussion_r907916354


##########
lucene/facet/src/java/org/apache/lucene/facet/range/RangeFacetCounts.java:
##########
@@ -232,20 +233,43 @@ public FacetResult getAllChildren(String dim, String... 
path) throws IOException
     return new FacetResult(dim, path, totCount, labelValues, 
labelValues.length);
   }
 
-  // The current getTopChildren method is not returning "top" ranges. Instead, 
it returns all
-  // user-provided ranges in
-  // the order the user specified them when instantiating. This concept is 
being introduced and
-  // supported in the
-  // getAllChildren functionality in LUCENE-10550. getTopChildren is 
temporarily calling
-  // getAllChildren to maintain its
-  // current behavior, and the current implementation will be replaced by an 
actual "top children"
-  // implementation
-  // in LUCENE-10614
-  // TODO: fix getTopChildren in LUCENE-10614
   @Override
   public FacetResult getTopChildren(int topN, String dim, String... path) 
throws IOException {
     validateTopN(topN);
-    return getAllChildren(dim, path);
+    validateDimAndPathForGetChildren(dim, path);
+
+    int resultSize = Math.min(topN, counts.length);
+    PriorityQueue<LabelAndValue> pq =
+        new PriorityQueue<>(resultSize) {
+          @Override
+          protected boolean lessThan(LabelAndValue a, LabelAndValue b) {
+            int cmp = Integer.compare(a.value.intValue(), b.value.intValue());
+            if (cmp == 0) {
+              cmp = b.label.compareTo(a.label);
+            }
+            return cmp < 0;
+          }
+        };
+
+    for (int i = 0; i < counts.length; i++) {
+      if (pq.size() < resultSize) {
+        pq.add(new LabelAndValue(ranges[i].label, counts[i]));

Review Comment:
   I wonder if we should only add to the pq when the count is > 0 to be 
consistent with other Facet implementations. What do you think?



##########
lucene/demo/src/java/org/apache/lucene/demo/facet/DistanceFacetsExample.java:
##########
@@ -212,7 +212,26 @@ public static Query getBoundingBoxQuery(
   }
 
   /** User runs a query and counts facets. */
-  public FacetResult search() throws IOException {
+  public FacetResult searchAllChildren() throws IOException {
+
+    FacetsCollector fc = searcher.search(new MatchAllDocsQuery(), new 
FacetsCollectorManager());
+
+    Facets facets =
+        new DoubleRangeFacetCounts(
+            "field",
+            getDistanceValueSource(),
+            fc,
+            getBoundingBoxQuery(ORIGIN_LATITUDE, ORIGIN_LONGITUDE, 10.0),
+            ONE_KM,
+            TWO_KM,
+            FIVE_KM,
+            TEN_KM);
+
+    return facets.getAllChildren("field");
+  }
+
+  /** User runs a query and counts facets. */
+  public FacetResult searchTopChildren() throws IOException {

Review Comment:
   I'm not totally sold we need to demo the `getTopChildren` functionality. It 
feels like it will be a little obscure for range faceting to me. What do you 
think of just changing the existing example code in-place to use 
`getAllChildren` instead of `getTopChildren` since that's probably the more 
common use-case? Curious what you think though. Do you think we should demo 
`getTopChildren` as well?



##########
lucene/facet/src/java/org/apache/lucene/facet/range/RangeFacetCounts.java:
##########
@@ -232,20 +233,43 @@ public FacetResult getAllChildren(String dim, String... 
path) throws IOException
     return new FacetResult(dim, path, totCount, labelValues, 
labelValues.length);
   }
 
-  // The current getTopChildren method is not returning "top" ranges. Instead, 
it returns all
-  // user-provided ranges in
-  // the order the user specified them when instantiating. This concept is 
being introduced and
-  // supported in the
-  // getAllChildren functionality in LUCENE-10550. getTopChildren is 
temporarily calling
-  // getAllChildren to maintain its
-  // current behavior, and the current implementation will be replaced by an 
actual "top children"
-  // implementation
-  // in LUCENE-10614
-  // TODO: fix getTopChildren in LUCENE-10614
   @Override
   public FacetResult getTopChildren(int topN, String dim, String... path) 
throws IOException {
     validateTopN(topN);
-    return getAllChildren(dim, path);
+    validateDimAndPathForGetChildren(dim, path);
+
+    int resultSize = Math.min(topN, counts.length);
+    PriorityQueue<LabelAndValue> pq =
+        new PriorityQueue<>(resultSize) {
+          @Override
+          protected boolean lessThan(LabelAndValue a, LabelAndValue b) {
+            int cmp = Integer.compare(a.value.intValue(), b.value.intValue());
+            if (cmp == 0) {
+              cmp = b.label.compareTo(a.label);
+            }
+            return cmp < 0;
+          }
+        };
+
+    for (int i = 0; i < counts.length; i++) {
+      if (pq.size() < resultSize) {
+        pq.add(new LabelAndValue(ranges[i].label, counts[i]));
+      } else {
+        int topValue = pq.top().value.intValue();
+        if (counts[i] > topValue
+            || (counts[i] == topValue && 
ranges[i].label.compareTo(pq.top().label) < 0)) {
+          pq.updateTop(new LabelAndValue(ranges[i].label, counts[i]));
+        }
+      }

Review Comment:
   Ideally, you'd do something like this here to avoid creating unnecessary 
garbage:
   
   ```
   } else {
           LabelAndValue top = pq.top();
           assert top != null;
           int topValue = top.value.intValue();
           if (counts[i] > topValue || (counts[i] == topValue && 
ranges[i].label.compareTo(top.label) < 0)) {
             top.label = ranges[i].label;
             top.value = counts[i];
             pq.updateTop();
           }
         }
   ```
   
   ... but of course LabelAndValue is immutable. The way we do these elsewhere 
is to use some internal class to just store the pq entries, and then translate 
into `LabelAndValue` instances at the return. Have a look at 
`LongValueFacetCount#getTopChildren` as an example.
   
   I would suggest following this similar approach for consistency here. Of 
course, this might actually create more garbage if top-n and the total number 
of ranges are very similar. So there's probably an optimization opportunity 
here where we implement this differently depending on the ratio of top-n to the 
total number of ranges (but I would save that for a separate issue/exploration).



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to