gsmiller commented on a change in pull request #747:
URL: https://github.com/apache/lucene/pull/747#discussion_r829161071



##########
File path: 
lucene/facet/src/java/org/apache/lucene/facet/sortedset/SortedSetDocValuesFacetCounts.java
##########
@@ -143,9 +146,49 @@ private FacetResult getPathResult(
       String[] path,
       int pathOrd,
       PrimitiveIterator.OfInt childOrds,
-      int topN)
+      int topN,
+      HashMap<String, SortedSetDocValuesChildOrdsResult> cacheChildOrdsResult)

Review comment:
       If I'm reading the code correctly, this map is used in a "read only" way 
here, meaning you never put an entry in the cache; you only check whether 
something else previously stored results there. Also, this method is specific 
to a single dimension. Because of this, I would suggest just passing in a 
single `SortedSetDocValuesChildOrdsResult` instance instead of the map. That 
instance could be `null` if no results have been memoized. I think that would 
make the code a little more readable. My first thought was, "why isn't this 
cache getting populated on a 'miss' here?"
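To illustrate the suggestion above, here is a minimal sketch in which the caller owns the map and hands the callee a single, possibly `null`, result. All names here (`CachedResultSketch`, `ChildOrdsResult`, `compute`, `dimCountFor`) are hypothetical stand-ins, not the PR's actual code, and `compute` only imitates the real pass over child ordinals.

```java
import java.util.HashMap;
import java.util.Map;

class CachedResultSketch {
  /** Hypothetical stand-in for the PR's SortedSetDocValuesChildOrdsResult. */
  static final class ChildOrdsResult {
    final int dimCount;

    ChildOrdsResult(int dimCount) {
      this.dimCount = dimCount;
    }
  }

  /** Placeholder for the expensive iteration over child ordinals (assumption). */
  static ChildOrdsResult compute(String dim) {
    return new ChildOrdsResult(dim.length());
  }

  /** Receives the precomputed result directly; null means nothing was memoized. */
  static int dimCountFor(String dim, ChildOrdsResult cached) {
    ChildOrdsResult result = (cached != null) ? cached : compute(dim);
    return result.dimCount;
  }

  public static void main(String[] args) {
    Map<String, ChildOrdsResult> cache = new HashMap<>();
    cache.put("author", new ChildOrdsResult(42));
    // The caller does the lookup; the callee never sees the map.
    System.out.println(dimCountFor("author", cache.get("author")));
    System.out.println(dimCountFor("year", cache.get("year")));
  }
}
```

With this shape, a reader of the callee no longer has to wonder why the cache is never populated on a miss, because the callee has no cache at all.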

##########
File path: 
lucene/facet/src/java/org/apache/lucene/facet/sortedset/SortedSetDocValuesFacetCounts.java
##########
@@ -190,20 +234,45 @@ private FacetResult getPathResult(
       String[] parts = FacetsConfig.stringToPath(term.utf8ToString());
       labelValues[i] = new LabelAndValue(parts[parts.length - 1], ordAndValue.value);
     }
+    return labelValues;
+  }
 
-    if (dimConfig.hierarchical == false) {
+  /** Returns value/count of a dimension. */
+  private int getDimValue(
+      FacetsConfig.DimConfig dimConfig,
+      String dim,
+      int pathOrd,
+      PrimitiveIterator.OfInt childOrds,
+      int topN,
+      HashMap<String, SortedSetDocValuesChildOrdsResult> cacheChildOrdsResult) {
+
+    // if dimConfig.hierarchical == true, return dimCount directly
+    if (dimConfig.hierarchical == true && pathOrd >= 0) {
+      return counts[pathOrd];
+    }
+
+    // if dimConfig.hierarchical == false
+    if (dimConfig.multiValued) {
       // see if dimCount is actually reliable or needs to be reset
-      if (dimConfig.multiValued) {
-        if (dimConfig.requireDimCount) {
-          dimCount = counts[pathOrd];
-        } else {
-          dimCount = -1; // dimCount is in accurate at this point, so set it to -1
-        }
+      if (dimConfig.requireDimCount && pathOrd >= 0) {
+        return counts[pathOrd];
+      } else {
+        return -1; // dimCount is inaccurate at this point, so set it to -1
       }
-      return new FacetResult(dim, emptyPath, dimCount, labelValues, childCount);
-    } else {
-      return new FacetResult(dim, path, counts[pathOrd], labelValues, childCount);
     }
+
+    // if dimCount was not aggregated at indexing time, iterate over childOrds to get dimCount
+    SortedSetDocValuesChildOrdsResult childOrdsResult = getChildOrdsResult(childOrds, topN);
+    if (childOrdsResult.q == null) {
+      return 0;
+    }

Review comment:
       This can just be removed, right? The `return childOrdsResult.dimCount` 
path below will return `0` in this case, won't it?

##########
File path: 
lucene/facet/src/java/org/apache/lucene/facet/sortedset/SortedSetDocValuesFacetCounts.java
##########
@@ -414,4 +505,101 @@ public int compare(FacetResult a, FacetResult b) {
 
     return results;
   }
+
+  @Override
+  public List<FacetResult> getTopDims(int topNDims, int topNChildren) throws IOException {
+    // Creates priority queue to store top dimensions and sort by their aggregated values/hits and
+    // string values.
+    PriorityQueue<SortedSetDocValuesDimValueResult> pq =
+        new PriorityQueue<>(topNDims) {
+          @Override
+          protected boolean lessThan(
+              SortedSetDocValuesDimValueResult a, SortedSetDocValuesDimValueResult b) {
+            if (a.value.intValue() > b.value.intValue()) {
+              return false;
+            } else if (a.value.intValue() < b.value.intValue()) {
+              return true;
+            } else {
+              return a.dim.compareTo(b.dim) > 0;
+            }
+          }
+        };
+
+    HashMap<String, SortedSetDocValuesChildOrdsResult> cacheChildOrdsResult = new HashMap<>();

Review comment:
       I wonder if there's a way to use the dimension's ordinal as the key 
instead of the string. I haven't looked closely enough to see how difficult 
that would be, but if it's possible, it would be more efficient to use an hppc 
primitive-keyed map here (e.g. `IntObjectHashMap`, since the values are 
objects) instead of Java's HashMap.

##########
File path: 
lucene/facet/src/java/org/apache/lucene/facet/sortedset/SortedSetDocValuesFacetCounts.java
##########
@@ -414,4 +505,101 @@ public int compare(FacetResult a, FacetResult b) {
 
     return results;
   }
+
+  @Override
+  public List<FacetResult> getTopDims(int topNDims, int topNChildren) throws 
IOException {
+    // Creates priority queue to store top dimensions and sort by their 
aggregated values/hits and
+    // string values.
+    PriorityQueue<SortedSetDocValuesDimValueResult> pq =
+        new PriorityQueue<>(topNDims) {
+          @Override
+          protected boolean lessThan(
+              SortedSetDocValuesDimValueResult a, 
SortedSetDocValuesDimValueResult b) {
+            if (a.value.intValue() > b.value.intValue()) {
+              return false;
+            } else if (a.value.intValue() < b.value.intValue()) {
+              return true;
+            } else {
+              return a.dim.compareTo(b.dim) > 0;
+            }

Review comment:
       minor: Another option for doing this that some people prefer is below. 
I'm happy with either so don't feel you need to change this, but thought I'd 
suggest it in case you hadn't seen this approach:
   
   ```suggestion
               int cmp = Integer.compare(a.value.intValue(), b.value.intValue());
               if (cmp == 0) {
                 cmp = b.dim.compareTo(a.dim);
               }
               return cmp < 0;
   ```
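As a quick sanity check on the suggestion above, the sketch below compares the branch-based form from the diff with the `Integer.compare` form over a few sample pairs. `DimValue` is a hypothetical stand-in for `SortedSetDocValuesDimValueResult`, using a plain `int` where the PR calls `value.intValue()`.

```java
class LessThanSketch {
  /** Hypothetical stand-in for SortedSetDocValuesDimValueResult. */
  static final class DimValue {
    final String dim;
    final int value;

    DimValue(String dim, int value) {
      this.dim = dim;
      this.value = value;
    }
  }

  /** The branch-based form from the diff. */
  static boolean lessThanBranches(DimValue a, DimValue b) {
    if (a.value > b.value) {
      return false;
    } else if (a.value < b.value) {
      return true;
    } else {
      return a.dim.compareTo(b.dim) > 0;
    }
  }

  /** The compare-based form from the suggestion. */
  static boolean lessThanCompare(DimValue a, DimValue b) {
    int cmp = Integer.compare(a.value, b.value);
    if (cmp == 0) {
      cmp = b.dim.compareTo(a.dim);
    }
    return cmp < 0;
  }

  /** Returns true if both forms give the same answer for every ordered pair of samples. */
  static boolean formsAgree() {
    DimValue[] samples = {
      new DimValue("a", 1), new DimValue("b", 1), new DimValue("a", 2), new DimValue("c", 0)
    };
    for (DimValue a : samples) {
      for (DimValue b : samples) {
        if (lessThanBranches(a, b) != lessThanCompare(a, b)) {
          return false;
        }
      }
    }
    return true;
  }

  public static void main(String[] args) {
    System.out.println(formsAgree());
  }
}
```

Both forms treat the entry with the smaller count as "less", and on a tie treat the lexicographically larger dimension name as "less".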

##########
File path: 
lucene/facet/src/java/org/apache/lucene/facet/sortedset/SortedSetDocValuesFacetCounts.java
##########
@@ -414,4 +505,101 @@ public int compare(FacetResult a, FacetResult b) {
 
     return results;
   }
+
+  @Override
+  public List<FacetResult> getTopDims(int topNDims, int topNChildren) throws 
IOException {
+    // Creates priority queue to store top dimensions and sort by their 
aggregated values/hits and
+    // string values.
+    PriorityQueue<SortedSetDocValuesDimValueResult> pq =
+        new PriorityQueue<>(topNDims) {
+          @Override
+          protected boolean lessThan(
+              SortedSetDocValuesDimValueResult a, 
SortedSetDocValuesDimValueResult b) {
+            if (a.value.intValue() > b.value.intValue()) {
+              return false;
+            } else if (a.value.intValue() < b.value.intValue()) {
+              return true;
+            } else {
+              return a.dim.compareTo(b.dim) > 0;
+            }
+          }
+        };
+
+    HashMap<String, SortedSetDocValuesChildOrdsResult> cacheChildOrdsResult = 
new HashMap<>();
+
+    for (String dim : state.getDims()) {
+      FacetsConfig.DimConfig dimConfig = stateConfig.getDimConfig(dim);
+      if (dimConfig.hierarchical) {
+        DimTree dimTree = state.getDimTree(dim);
+        int dimOrd = dimTree.dimStartOrd;
+        // get dim value
+        int dimCount =
+            getDimValue(
+                dimConfig, dim, dimOrd, dimTree.iterator(), topNChildren, 
cacheChildOrdsResult);
+        if (dimCount != 0) {
+          // use priority queue to store SortedSetDocValuesDimValueResult for 
topNDims
+          pq.insertWithOverflow(new SortedSetDocValuesDimValueResult(dim, 
dimCount));
+        }
+      } else {
+        OrdRange ordRange = state.getOrdRange(dim);
+        int dimOrd = ordRange.start;
+        PrimitiveIterator.OfInt childIt = ordRange.iterator();
+        if (dimConfig.multiValued && dimConfig.requireDimCount) {
+          // If the dim is multi-valued and requires dim counts, we know we've 
explicitly indexed
+          // the dimension and we need to skip past it so the iterator is 
positioned on the first
+          // child:
+          childIt.next();
+        }
+        int dimCount =
+            getDimValue(dimConfig, dim, dimOrd, childIt, topNChildren, 
cacheChildOrdsResult);
+        if (dimCount != 0) {
+          pq.insertWithOverflow(new SortedSetDocValuesDimValueResult(dim, 
dimCount));
+        }
+      }
+    }
+
+    // get FacetResult for topNDims
+    List<FacetResult> results = new LinkedList<>();
+    while (pq.size() > 0) {
+      SortedSetDocValuesDimValueResult dimValueResult = pq.pop();
+      if (dimValueResult != null) {

Review comment:
       Do we ever put `null` into the queue? We shouldn't, right?

##########
File path: 
lucene/facet/src/java/org/apache/lucene/facet/sortedset/SortedSetDocValuesFacetCounts.java
##########
@@ -414,4 +505,101 @@ public int compare(FacetResult a, FacetResult b) {
 
     return results;
   }
+
+  @Override
+  public List<FacetResult> getTopDims(int topNDims, int topNChildren) throws 
IOException {
+    // Creates priority queue to store top dimensions and sort by their 
aggregated values/hits and
+    // string values.
+    PriorityQueue<SortedSetDocValuesDimValueResult> pq =
+        new PriorityQueue<>(topNDims) {
+          @Override
+          protected boolean lessThan(
+              SortedSetDocValuesDimValueResult a, 
SortedSetDocValuesDimValueResult b) {
+            if (a.value.intValue() > b.value.intValue()) {
+              return false;
+            } else if (a.value.intValue() < b.value.intValue()) {
+              return true;
+            } else {
+              return a.dim.compareTo(b.dim) > 0;
+            }
+          }
+        };
+
+    HashMap<String, SortedSetDocValuesChildOrdsResult> cacheChildOrdsResult = 
new HashMap<>();
+
+    for (String dim : state.getDims()) {
+      FacetsConfig.DimConfig dimConfig = stateConfig.getDimConfig(dim);
+      if (dimConfig.hierarchical) {
+        DimTree dimTree = state.getDimTree(dim);
+        int dimOrd = dimTree.dimStartOrd;
+        // get dim value
+        int dimCount =
+            getDimValue(
+                dimConfig, dim, dimOrd, dimTree.iterator(), topNChildren, 
cacheChildOrdsResult);
+        if (dimCount != 0) {
+          // use priority queue to store SortedSetDocValuesDimValueResult for 
topNDims
+          pq.insertWithOverflow(new SortedSetDocValuesDimValueResult(dim, 
dimCount));
+        }
+      } else {
+        OrdRange ordRange = state.getOrdRange(dim);
+        int dimOrd = ordRange.start;
+        PrimitiveIterator.OfInt childIt = ordRange.iterator();
+        if (dimConfig.multiValued && dimConfig.requireDimCount) {
+          // If the dim is multi-valued and requires dim counts, we know we've 
explicitly indexed
+          // the dimension and we need to skip past it so the iterator is 
positioned on the first
+          // child:
+          childIt.next();
+        }
+        int dimCount =
+            getDimValue(dimConfig, dim, dimOrd, childIt, topNChildren, 
cacheChildOrdsResult);
+        if (dimCount != 0) {
+          pq.insertWithOverflow(new SortedSetDocValuesDimValueResult(dim, 
dimCount));
+        }
+      }
+    }
+
+    // get FacetResult for topNDims
+    List<FacetResult> results = new LinkedList<>();
+    while (pq.size() > 0) {
+      SortedSetDocValuesDimValueResult dimValueResult = pq.pop();
+      if (dimValueResult != null) {
+        FacetResult factResult =
+            getFacetResultForDim(dimValueResult.dim, topNChildren, 
cacheChildOrdsResult);
+        if (factResult != null) {

Review comment:
       Similar question: Can these ever be `null`?

##########
File path: 
lucene/facet/src/java/org/apache/lucene/facet/sortedset/SortedSetDocValuesFacetCounts.java
##########
@@ -414,4 +505,101 @@ public int compare(FacetResult a, FacetResult b) {
 
     return results;
   }
+
+  @Override
+  public List<FacetResult> getTopDims(int topNDims, int topNChildren) throws 
IOException {
+    // Creates priority queue to store top dimensions and sort by their 
aggregated values/hits and
+    // string values.
+    PriorityQueue<SortedSetDocValuesDimValueResult> pq =
+        new PriorityQueue<>(topNDims) {
+          @Override
+          protected boolean lessThan(
+              SortedSetDocValuesDimValueResult a, 
SortedSetDocValuesDimValueResult b) {
+            if (a.value.intValue() > b.value.intValue()) {
+              return false;
+            } else if (a.value.intValue() < b.value.intValue()) {
+              return true;
+            } else {
+              return a.dim.compareTo(b.dim) > 0;
+            }
+          }
+        };
+
+    HashMap<String, SortedSetDocValuesChildOrdsResult> cacheChildOrdsResult = 
new HashMap<>();
+
+    for (String dim : state.getDims()) {
+      FacetsConfig.DimConfig dimConfig = stateConfig.getDimConfig(dim);
+      if (dimConfig.hierarchical) {
+        DimTree dimTree = state.getDimTree(dim);
+        int dimOrd = dimTree.dimStartOrd;
+        // get dim value
+        int dimCount =
+            getDimValue(
+                dimConfig, dim, dimOrd, dimTree.iterator(), topNChildren, 
cacheChildOrdsResult);
+        if (dimCount != 0) {
+          // use priority queue to store SortedSetDocValuesDimValueResult for 
topNDims
+          pq.insertWithOverflow(new SortedSetDocValuesDimValueResult(dim, 
dimCount));

Review comment:
       Could you have a look at how PQ insertion works elsewhere in this file 
and try to follow the same paradigm of reusing instances and inline updating 
the bottom value instead of always instantiating new instances of 
`SortedSetDocValuesDimValueResult`? We can save some object creation here.
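A rough sketch of the instance-reuse idea follows. It uses `java.util.PriorityQueue` as a stand-in, which is an assumption made only to keep the example self-contained; Lucene's own `PriorityQueue` has a different API (for example `insertWithOverflow` hands back the evicted element so it can be reused). `ReuseSketch`, `DimValue`, and the `allocations` counter are hypothetical names.

```java
import java.util.PriorityQueue;

class ReuseSketch {
  /** Hypothetical mutable entry, so an evicted instance can be overwritten and reused. */
  static final class DimValue {
    String dim;
    int value;
  }

  /** Counts how many entries were allocated, to make the saving visible. */
  static int allocations = 0;

  static PriorityQueue<DimValue> topN(String[] dims, int[] counts, int n) {
    // Min-heap: the head is the weakest entry (smallest count, then largest dim name).
    PriorityQueue<DimValue> pq =
        new PriorityQueue<>(
            (a, b) -> {
              int cmp = Integer.compare(a.value, b.value);
              return cmp != 0 ? cmp : b.dim.compareTo(a.dim);
            });
    if (n <= 0) {
      return pq;
    }
    for (int i = 0; i < dims.length; i++) {
      if (counts[i] == 0) {
        continue; // dimensions without hits are skipped, as in the diff
      }
      if (pq.size() < n) {
        DimValue entry = new DimValue();
        allocations++;
        entry.dim = dims[i];
        entry.value = counts[i];
        pq.add(entry);
      } else {
        DimValue bottom = pq.peek();
        boolean beatsBottom =
            counts[i] > bottom.value
                || (counts[i] == bottom.value && dims[i].compareTo(bottom.dim) < 0);
        if (beatsBottom) {
          pq.poll(); // remove before mutating so the heap order stays valid
          bottom.dim = dims[i];
          bottom.value = counts[i];
          pq.add(bottom); // reuse the evicted instance instead of allocating
        }
      }
    }
    return pq;
  }

  public static void main(String[] args) {
    PriorityQueue<DimValue> pq =
        topN(new String[] {"a", "b", "c", "d"}, new int[] {5, 1, 7, 3}, 2);
    System.out.println(pq.size() + " entries kept, " + allocations + " allocated");
  }
}
```

However many dimensions are scanned, at most `n` entries are ever allocated; candidates that do not beat the current bottom cost no allocation at all.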

##########
File path: 
lucene/facet/src/java/org/apache/lucene/facet/sortedset/SortedSetDocValuesFacetCounts.java
##########
@@ -366,33 +435,55 @@ public Number getSpecificValue(String dim, String... path) throws IOException {
     return counts[ord];
   }
 
+  /** Returns FacetResult for a dimension. */
+  private FacetResult getFacetResultForDim(
+      String dim,
+      int topNChildren,
+      HashMap<String, SortedSetDocValuesChildOrdsResult> cacheChildOrdsResult)
+      throws IOException {
+    FacetsConfig.DimConfig dimConfig = stateConfig.getDimConfig(dim);
+    if (dimConfig.hierarchical) {
+      DimTree dimTree = state.getDimTree(dim);
+      int dimOrd = dimTree.dimStartOrd;
+      FacetResult fr =
+          getPathResult(
+              dimConfig,
+              dim,
+              emptyPath,
+              dimOrd,
+              dimTree.iterator(),
+              topNChildren,
+              cacheChildOrdsResult);
+      if (fr != null) {
+        return fr;
+      }
+    } else {
+      OrdRange ordRange = state.getOrdRange(dim);
+      int dimOrd = ordRange.start;
+      PrimitiveIterator.OfInt childIt = ordRange.iterator();
+      if (dimConfig.multiValued && dimConfig.requireDimCount) {
+        // If the dim is multi-valued and requires dim counts, we know we've explicitly indexed
+        // the dimension and we need to skip past it so the iterator is positioned on the first
+        // child:
+        childIt.next();
+      }
+      FacetResult fr =
+          getPathResult(
+              dimConfig, dim, emptyPath, dimOrd, childIt, topNChildren, cacheChildOrdsResult);
+      if (fr != null) {
+        return fr;
+      }
+    }
+    return null;
+  }

Review comment:
       minor: You can simplify the return branches here a little bit:
   
   ```suggestion
       FacetsConfig.DimConfig dimConfig = stateConfig.getDimConfig(dim);
       if (dimConfig.hierarchical) {
         DimTree dimTree = state.getDimTree(dim);
         int dimOrd = dimTree.dimStartOrd;
         return getPathResult(
             dimConfig,
             dim,
             emptyPath,
             dimOrd,
             dimTree.iterator(),
             topNChildren,
             cacheChildOrdsResult);
       } else {
         OrdRange ordRange = state.getOrdRange(dim);
         int dimOrd = ordRange.start;
         PrimitiveIterator.OfInt childIt = ordRange.iterator();
         if (dimConfig.multiValued && dimConfig.requireDimCount) {
           // If the dim is multi-valued and requires dim counts, we know we've explicitly indexed
           // the dimension and we need to skip past it so the iterator is positioned on the first
           // child:
           childIt.next();
         }
         return getPathResult(
             dimConfig, dim, emptyPath, dimOrd, childIt, topNChildren, cacheChildOrdsResult);
       }
   ```

##########
File path: 
lucene/facet/src/java/org/apache/lucene/facet/sortedset/SortedSetDocValuesFacetCounts.java
##########
@@ -190,20 +234,45 @@ private FacetResult getPathResult(
       String[] parts = FacetsConfig.stringToPath(term.utf8ToString());
       labelValues[i] = new LabelAndValue(parts[parts.length - 1], ordAndValue.value);
     }
+    return labelValues;
+  }
 
-    if (dimConfig.hierarchical == false) {
+  /** Returns value/count of a dimension. */
+  private int getDimValue(
+      FacetsConfig.DimConfig dimConfig,
+      String dim,
+      int pathOrd,
+      PrimitiveIterator.OfInt childOrds,
+      int topN,
+      HashMap<String, SortedSetDocValuesChildOrdsResult> cacheChildOrdsResult) {
+
+    // if dimConfig.hierarchical == true, return dimCount directly
+    if (dimConfig.hierarchical == true && pathOrd >= 0) {

Review comment:
       Did you run into an issue where `pathOrd` was negative here? Curious how 
this check came about. I would assume lots of other things would break if we 
got in a situation where `pathOrd` was negative, but maybe you found some 
interesting case?

##########
File path: 
lucene/facet/src/java/org/apache/lucene/facet/sortedset/SortedSetDocValuesFacetCounts.java
##########
@@ -414,4 +505,101 @@ public int compare(FacetResult a, FacetResult b) {
 
     return results;
   }
+
+  @Override
+  public List<FacetResult> getTopDims(int topNDims, int topNChildren) throws 
IOException {
+    // Creates priority queue to store top dimensions and sort by their 
aggregated values/hits and
+    // string values.
+    PriorityQueue<SortedSetDocValuesDimValueResult> pq =
+        new PriorityQueue<>(topNDims) {
+          @Override
+          protected boolean lessThan(
+              SortedSetDocValuesDimValueResult a, 
SortedSetDocValuesDimValueResult b) {
+            if (a.value.intValue() > b.value.intValue()) {
+              return false;
+            } else if (a.value.intValue() < b.value.intValue()) {
+              return true;
+            } else {
+              return a.dim.compareTo(b.dim) > 0;
+            }
+          }
+        };
+
+    HashMap<String, SortedSetDocValuesChildOrdsResult> cacheChildOrdsResult = 
new HashMap<>();
+
+    for (String dim : state.getDims()) {
+      FacetsConfig.DimConfig dimConfig = stateConfig.getDimConfig(dim);
+      if (dimConfig.hierarchical) {
+        DimTree dimTree = state.getDimTree(dim);
+        int dimOrd = dimTree.dimStartOrd;
+        // get dim value
+        int dimCount =
+            getDimValue(
+                dimConfig, dim, dimOrd, dimTree.iterator(), topNChildren, 
cacheChildOrdsResult);
+        if (dimCount != 0) {
+          // use priority queue to store SortedSetDocValuesDimValueResult for 
topNDims
+          pq.insertWithOverflow(new SortedSetDocValuesDimValueResult(dim, 
dimCount));
+        }
+      } else {
+        OrdRange ordRange = state.getOrdRange(dim);
+        int dimOrd = ordRange.start;
+        PrimitiveIterator.OfInt childIt = ordRange.iterator();
+        if (dimConfig.multiValued && dimConfig.requireDimCount) {
+          // If the dim is multi-valued and requires dim counts, we know we've 
explicitly indexed
+          // the dimension and we need to skip past it so the iterator is 
positioned on the first
+          // child:
+          childIt.next();
+        }
+        int dimCount =
+            getDimValue(dimConfig, dim, dimOrd, childIt, topNChildren, 
cacheChildOrdsResult);
+        if (dimCount != 0) {
+          pq.insertWithOverflow(new SortedSetDocValuesDimValueResult(dim, 
dimCount));
+        }
+      }
+    }
+
+    // get FacetResult for topNDims
+    List<FacetResult> results = new LinkedList<>();

Review comment:
       Could we just use a `FacetResult[]` here instead since we know the exact 
size it should be? Then you can add to it in reverse like you need, and just 
return `Arrays.asList(results)` at the end.
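A small sketch of that idea, using strings and `java.util.PriorityQueue` as stand-ins for `FacetResult` and Lucene's queue (both substitutions are assumptions made for a self-contained example). It also assumes every popped entry produces a result, so the array size is known up front.

```java
import java.util.Arrays;
import java.util.List;
import java.util.PriorityQueue;

class ReverseFillSketch {
  /** Drains a min-heap into an array from the back, yielding best-first order. */
  static List<String> drainDescending(PriorityQueue<String> pq) {
    String[] results = new String[pq.size()];
    // The weakest entry comes out first, so it goes into the last slot.
    for (int i = results.length - 1; i >= 0; i--) {
      results[i] = pq.poll();
    }
    return Arrays.asList(results);
  }

  public static void main(String[] args) {
    PriorityQueue<String> pq = new PriorityQueue<>(Arrays.asList("b", "a", "c"));
    System.out.println(drainDescending(pq));
  }
}
```

Compared with `results.add(0, ...)` on a list, this avoids per-node allocation and repeated head insertion; the backing array is sized exactly once.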

##########
File path: 
lucene/facet/src/java/org/apache/lucene/facet/sortedset/SortedSetDocValuesFacetCounts.java
##########
@@ -414,4 +505,101 @@ public int compare(FacetResult a, FacetResult b) {
 
     return results;
   }
+
+  @Override
+  public List<FacetResult> getTopDims(int topNDims, int topNChildren) throws 
IOException {
+    // Creates priority queue to store top dimensions and sort by their 
aggregated values/hits and
+    // string values.
+    PriorityQueue<SortedSetDocValuesDimValueResult> pq =
+        new PriorityQueue<>(topNDims) {
+          @Override
+          protected boolean lessThan(
+              SortedSetDocValuesDimValueResult a, 
SortedSetDocValuesDimValueResult b) {
+            if (a.value.intValue() > b.value.intValue()) {
+              return false;
+            } else if (a.value.intValue() < b.value.intValue()) {
+              return true;
+            } else {
+              return a.dim.compareTo(b.dim) > 0;
+            }
+          }
+        };
+
+    HashMap<String, SortedSetDocValuesChildOrdsResult> cacheChildOrdsResult = 
new HashMap<>();
+
+    for (String dim : state.getDims()) {
+      FacetsConfig.DimConfig dimConfig = stateConfig.getDimConfig(dim);
+      if (dimConfig.hierarchical) {
+        DimTree dimTree = state.getDimTree(dim);
+        int dimOrd = dimTree.dimStartOrd;
+        // get dim value
+        int dimCount =
+            getDimValue(
+                dimConfig, dim, dimOrd, dimTree.iterator(), topNChildren, 
cacheChildOrdsResult);
+        if (dimCount != 0) {
+          // use priority queue to store SortedSetDocValuesDimValueResult for 
topNDims
+          pq.insertWithOverflow(new SortedSetDocValuesDimValueResult(dim, 
dimCount));
+        }
+      } else {
+        OrdRange ordRange = state.getOrdRange(dim);
+        int dimOrd = ordRange.start;
+        PrimitiveIterator.OfInt childIt = ordRange.iterator();
+        if (dimConfig.multiValued && dimConfig.requireDimCount) {
+          // If the dim is multi-valued and requires dim counts, we know we've 
explicitly indexed
+          // the dimension and we need to skip past it so the iterator is 
positioned on the first
+          // child:
+          childIt.next();
+        }
+        int dimCount =
+            getDimValue(dimConfig, dim, dimOrd, childIt, topNChildren, 
cacheChildOrdsResult);
+        if (dimCount != 0) {
+          pq.insertWithOverflow(new SortedSetDocValuesDimValueResult(dim, 
dimCount));
+        }
+      }
+    }
+
+    // get FacetResult for topNDims
+    List<FacetResult> results = new LinkedList<>();
+    while (pq.size() > 0) {
+      SortedSetDocValuesDimValueResult dimValueResult = pq.pop();
+      if (dimValueResult != null) {
+        FacetResult factResult =
+            getFacetResultForDim(dimValueResult.dim, topNChildren, 
cacheChildOrdsResult);
+        if (factResult != null) {
+          results.add(0, factResult);
+        }
+      }
+    }
+    return results;
+  }
+
+  /**
+   * Creates SortedSetDocValuesChildOrdsResult to store dimCount, childCount, 
and TopOrdAndIntQueue
+   * q for getPathResult.
+   */
+  private class SortedSetDocValuesChildOrdsResult {
+    final int dimCount;
+    final int childCount;
+    final TopOrdAndIntQueue q;
+
+    SortedSetDocValuesChildOrdsResult(int dimCount, int childCount, 
TopOrdAndIntQueue q) {
+      this.dimCount = dimCount;
+      this.childCount = childCount;
+      this.q = q;
+    }
+  }
+
+  /**
+   * Creates SortedSetDocValuesDimValueResult to store the label and count of 
dim in order to sort
+   * by these two fields.
+   */
+  private class SortedSetDocValuesDimValueResult {

Review comment:
       Please make this `static`. Also, similar minor comment about considering 
a less verbose name.

##########
File path: 
lucene/facet/src/java/org/apache/lucene/facet/sortedset/SortedSetDocValuesFacetCounts.java
##########
@@ -143,9 +146,49 @@ private FacetResult getPathResult(
       String[] path,
       int pathOrd,
       PrimitiveIterator.OfInt childOrds,
-      int topN)
+      int topN,
+      HashMap<String, SortedSetDocValuesChildOrdsResult> cacheChildOrdsResult)
       throws IOException {
 
+    SortedSetDocValuesChildOrdsResult childOrdsResult;
+
+    // if getTopDims is called, get results from cacheChildOrdsResult, otherwise call
+    // getChildOrdsResult to get dimCount, childCount and TopOrdAndIntQueue q
+    if (cacheChildOrdsResult != null && cacheChildOrdsResult.containsKey(dim)) {
+      childOrdsResult = cacheChildOrdsResult.get(dim);
+    } else {
+      childOrdsResult = getChildOrdsResult(childOrds, topN);
+    }
+
+    if (childOrdsResult.q == null) {
+      return null;
+    }
+
+    LabelAndValue[] labelValues = getLabelValuesFromTopOrdAndIntQueue(childOrdsResult.q);
+
+    int dimCount = childOrdsResult.dimCount;

Review comment:
       I wonder if this logic would be better placed in `getChildOrdsResult`? 
That way `SortedSetDocValuesChildOrdsResult` would always have an accurate 
`dimCount` (or `-1` if an accurate value can't be computed). What do you think?

##########
File path: lucene/facet/src/java/org/apache/lucene/facet/Facets.java
##########
@@ -48,4 +48,13 @@ public abstract FacetResult getTopChildren(int topN, String dim, String... path)
    * indexed, for example depending on the type of document.
    */
   public abstract List<FacetResult> getAllDims(int topN) throws IOException;
+
+  /**
+   * Returns labels for topN dimensions and their topNChildren sorted by the number of hits that
+   * dimension matched
+   */
+  public List<FacetResult> getTopDims(int topNDims, int topNChildren) throws IOException {

Review comment:
       Thanks for the update! It might be nice to also state that the results 
should be the same as calling `getAllDims` and then keeping only the first 
`topNDims` entries, but that this method may be more efficient.
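The contract described above can be sketched as a baseline that computes everything and then truncates. This is only an illustration with strings standing in for `FacetResult`; `TopDimsContractSketch` and its hard-coded dimension list are hypothetical, and the stand-in `getAllDims` is assumed to return dimensions already sorted best-first.

```java
import java.util.Arrays;
import java.util.List;

class TopDimsContractSketch {
  /** Stand-in for getAllDims: dimensions already sorted by descending count (assumption). */
  static List<String> getAllDims() {
    return Arrays.asList("author", "year", "publisher");
  }

  /** Baseline behaviour: compute all dimensions, then keep the first topNDims entries. */
  static List<String> getTopDims(int topNDims) {
    List<String> all = getAllDims();
    return all.subList(0, Math.min(topNDims, all.size()));
  }

  public static void main(String[] args) {
    System.out.println(getTopDims(2));
  }
}
```

An optimized override should return the same list as this baseline while avoiding the work of building results for dimensions that do not make the cut.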

##########
File path: 
lucene/facet/src/java/org/apache/lucene/facet/sortedset/SortedSetDocValuesFacetCounts.java
##########
@@ -414,4 +505,101 @@ public int compare(FacetResult a, FacetResult b) {
 
     return results;
   }
+
+  @Override
+  public List<FacetResult> getTopDims(int topNDims, int topNChildren) throws 
IOException {
+    // Creates priority queue to store top dimensions and sort by their 
aggregated values/hits and
+    // string values.
+    PriorityQueue<SortedSetDocValuesDimValueResult> pq =
+        new PriorityQueue<>(topNDims) {
+          @Override
+          protected boolean lessThan(
+              SortedSetDocValuesDimValueResult a, 
SortedSetDocValuesDimValueResult b) {
+            if (a.value.intValue() > b.value.intValue()) {
+              return false;
+            } else if (a.value.intValue() < b.value.intValue()) {
+              return true;
+            } else {
+              return a.dim.compareTo(b.dim) > 0;
+            }
+          }
+        };
+
+    HashMap<String, SortedSetDocValuesChildOrdsResult> cacheChildOrdsResult = 
new HashMap<>();
+
+    for (String dim : state.getDims()) {
+      FacetsConfig.DimConfig dimConfig = stateConfig.getDimConfig(dim);
+      if (dimConfig.hierarchical) {
+        DimTree dimTree = state.getDimTree(dim);
+        int dimOrd = dimTree.dimStartOrd;
+        // get dim value
+        int dimCount =
+            getDimValue(
+                dimConfig, dim, dimOrd, dimTree.iterator(), topNChildren, 
cacheChildOrdsResult);
+        if (dimCount != 0) {
+          // use priority queue to store SortedSetDocValuesDimValueResult for 
topNDims
+          pq.insertWithOverflow(new SortedSetDocValuesDimValueResult(dim, 
dimCount));
+        }
+      } else {
+        OrdRange ordRange = state.getOrdRange(dim);
+        int dimOrd = ordRange.start;
+        PrimitiveIterator.OfInt childIt = ordRange.iterator();
+        if (dimConfig.multiValued && dimConfig.requireDimCount) {
+          // If the dim is multi-valued and requires dim counts, we know we've 
explicitly indexed
+          // the dimension and we need to skip past it so the iterator is 
positioned on the first
+          // child:
+          childIt.next();
+        }
+        int dimCount =
+            getDimValue(dimConfig, dim, dimOrd, childIt, topNChildren, 
cacheChildOrdsResult);
+        if (dimCount != 0) {
+          pq.insertWithOverflow(new SortedSetDocValuesDimValueResult(dim, 
dimCount));
+        }
+      }
+    }
+
+    // get FacetResult for topNDims
+    List<FacetResult> results = new LinkedList<>();
+    while (pq.size() > 0) {
+      SortedSetDocValuesDimValueResult dimValueResult = pq.pop();
+      if (dimValueResult != null) {
+        FacetResult factResult =
+            getFacetResultForDim(dimValueResult.dim, topNChildren, 
cacheChildOrdsResult);
+        if (factResult != null) {
+          results.add(0, factResult);
+        }
+      }
+    }
+    return results;
+  }
+
+  /**
+   * Creates SortedSetDocValuesChildOrdsResult to store dimCount, childCount, 
and TopOrdAndIntQueue
+   * q for getPathResult.
+   */
+  private class SortedSetDocValuesChildOrdsResult {

Review comment:
       Please make this `static`. You could also make the name a little shorter 
since it's just privately scoped and it's a little redundant to say 
"SortedSetDocValues". Maybe just `ChildOrdsResult`?
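For reference, a minimal sketch of what the renamed holder might look like as a static nested class. The outer class name `OuterSketch` is hypothetical, and the `TopOrdAndIntQueue q` field from the PR is omitted here only to keep the example self-contained.

```java
class OuterSketch {
  /**
   * Static nested holder: it carries no hidden reference to an enclosing OuterSketch instance,
   * which a non-static inner class would.
   */
  private static final class ChildOrdsResult {
    final int dimCount;
    final int childCount;

    ChildOrdsResult(int dimCount, int childCount) {
      this.dimCount = dimCount;
      this.childCount = childCount;
    }
  }

  /** Callable from a static context, which would not compile if the holder were an inner class. */
  static int sumOfCounts(int dimCount, int childCount) {
    ChildOrdsResult result = new ChildOrdsResult(dimCount, childCount);
    return result.dimCount + result.childCount;
  }

  public static void main(String[] args) {
    System.out.println(sumOfCounts(3, 2));
  }
}
```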




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


