[GitHub] [lucene] gsmiller commented on a diff in pull request #914: LUCENE-10550: Add getAllChildren functionality to facets

GitBox Mon, 13 Jun 2022 17:56:42 -0700


gsmiller commented on code in PR #914:
URL: https://github.com/apache/lucene/pull/914#discussion_r896240300



##########
lucene/facet/src/java/org/apache/lucene/facet/LongValueFacetCounts.java:
##########
@@ -346,6 +346,43 @@ private void increment(long value) {
     }
   }
 
+  @Override
+  public FacetResult getAllChildren(String dim, String... path) throws 
IOException {
+    if (dim.equals(field) == false) {
+      throw new IllegalArgumentException(
+          "invalid dim \"" + dim + "\"; should be \"" + field + "\"");
+    }
+    if (path.length != 0) {
+      throw new IllegalArgumentException("path.length should be 0");
+    }

Review Comment:
   minor: There's some common validation logic between this and 
`getTopChildren` you could factor out into a common helper method if you 
thought it made sense.



##########
lucene/facet/src/java/org/apache/lucene/facet/LongValueFacetCounts.java:
##########
@@ -346,6 +346,43 @@ private void increment(long value) {
     }
   }
 
+  @Override
+  public FacetResult getAllChildren(String dim, String... path) throws 
IOException {
+    if (dim.equals(field) == false) {
+      throw new IllegalArgumentException(
+          "invalid dim \"" + dim + "\"; should be \"" + field + "\"");
+    }
+    if (path.length != 0) {
+      throw new IllegalArgumentException("path.length should be 0");
+    }
+
+    List<LabelAndValue> labelValues = new ArrayList<>();
+    boolean countsAdded = false;
+    if (hashCounts.size() != 0) {
+      for (LongIntCursor c : hashCounts) {
+        int count = c.value;
+        if (count != 0) {
+          if (countsAdded == false && c.key >= counts.length) {
+            countsAdded = true;
+            appendCounts(labelValues);
+          }
+          labelValues.add(new LabelAndValue(Long.toString(c.key), count));
+        }
+      }
+    }
+
+    if (countsAdded == false) {
+      appendCounts(labelValues);
+    }
+
+    return new FacetResult(
+        field,
+        new String[0],
+        totCount,
+        labelValues.toArray(new LabelAndValue[0]),
+        labelValues.size());

Review Comment:
   It looks like you're trying to maintain value-sort-order, sort of like what 
`getAllChildrenSortByValue` is doing, but since we don't make any ordering 
guarantees, I think we can simplify this a little bit. What do you think of 
doing something like this?
   
   ```suggestion
       List<LabelAndValue> labelValues = new ArrayList<>();
       for (int i = 0; i < counts.length; i++) {
         if (counts[i] != 0) {
           labelValues.add(new LabelAndValue(Long.toString(i), counts[i]));
         }
       }
   
       if (hashCounts.size() != 0) {
         for (LongIntCursor c : hashCounts) {
           int count = c.value;
           if (count != 0) {
             labelValues.add(new LabelAndValue(Long.toString(c.key), c.value));
           }
         }
       }
   
       return new FacetResult(
           field,
           new String[0],
           totCount,
           labelValues.toArray(new LabelAndValue[0]),
           labelValues.size());
   ```



##########
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/FloatTaxonomyFacets.java:
##########
@@ -101,6 +102,63 @@ public Number getSpecificValue(String dim, String... path) 
throws IOException {
     return values[ord];
   }
 
+  @Override
+  public FacetResult getAllChildren(String dim, String... path) throws 
IOException {

Review Comment:
   Please see my feedback on `IntTaxonomyFacets`. It should all apply here as 
well. Thanks!



##########
lucene/facet/src/java/org/apache/lucene/facet/sortedset/AbstractSortedSetDocValueFacetCounts.java:
##########
@@ -72,6 +72,40 @@ public FacetResult getTopChildren(int topN, String dim, 
String... path) throws I
     return createFacetResult(topChildrenForPath, dim, path);
   }
 
+  @Override
+  public FacetResult getAllChildren(String dim, String... path) throws 
IOException {
+    FacetsConfig.DimConfig dimConfig = stateConfig.getDimConfig(dim);
+
+    if (dimConfig.hierarchical) {
+      int pathOrd = (int) dv.lookupTerm(new 
BytesRef(FacetsConfig.pathToString(dim, path)));
+      if (pathOrd < 0) {
+        // path was never indexed
+        return null;
+      }
+      SortedSetDocValuesReaderState.DimTree dimTree = state.getDimTree(dim);

Review Comment:
   As a general note, I think we usually prefer to _not_ fully qualify internal 
class names unless necessary (and prefer to import them directly instead). For 
example, we're already doing `import 
org.apache.lucene.facet.sortedset.SortedSetDocValuesReaderState.DimTree;`, so 
you can just say `DimTree` here instead of 
`SortedSetDocValuesReaderState.DimTree`. I'm betting your IDE is setup with 
this style, but something to just keep an eye on.



##########
lucene/facet/src/java/org/apache/lucene/facet/sortedset/AbstractSortedSetDocValueFacetCounts.java:
##########
@@ -72,6 +72,40 @@ public FacetResult getTopChildren(int topN, String dim, 
String... path) throws I
     return createFacetResult(topChildrenForPath, dim, path);
   }
 
+  @Override
+  public FacetResult getAllChildren(String dim, String... path) throws 
IOException {
+    FacetsConfig.DimConfig dimConfig = stateConfig.getDimConfig(dim);
+
+    if (dimConfig.hierarchical) {
+      int pathOrd = (int) dv.lookupTerm(new 
BytesRef(FacetsConfig.pathToString(dim, path)));
+      if (pathOrd < 0) {
+        // path was never indexed
+        return null;
+      }
+      SortedSetDocValuesReaderState.DimTree dimTree = state.getDimTree(dim);
+      return getPathResult(dimConfig, dim, path, pathOrd, 
dimTree.iterator(pathOrd));
+    } else {
+      if (path.length > 0) {
+        throw new IllegalArgumentException(
+            "Field is not configured as hierarchical, path should be 0 
length");
+      }
+      OrdRange ordRange = state.getOrdRange(dim);
+      if (ordRange == null) {
+        // means dimension was never indexed
+        return null;
+      }
+      int dimOrd = ordRange.start;
+      PrimitiveIterator.OfInt childIt = ordRange.iterator();
+      if (dimConfig.multiValued && dimConfig.requireDimCount) {
+        // If the dim is multi-valued and requires dim counts, we know we've 
explicitly indexed
+        // the dimension and we need to skip past it so the iterator is 
positioned on the first
+        // child:
+        childIt.next();
+      }
+      return getPathResult(dimConfig, dim, null, dimOrd, childIt);
+    }
+  }

Review Comment:
   I think we can actually optimize this a little bit when looking up the dim 
ord and preparing the child iterator by borrowing some ideas from 
`getTopChildren`. Also, I don't think we need to break out the `getPathResult` 
method since we only call it from this method (and it's easy to collapse the 
invocation to a single place). So I'd suggest we inline that logic. Here's what 
I'm thinking. Hope you don't mind me taking a pass at this (and then remove 
`getPathResult` completely):
   
   ```suggestion
   @Override
     public FacetResult getAllChildren(String dim, String... path) throws 
IOException {
       FacetsConfig.DimConfig dimConfig = stateConfig.getDimConfig(dim);
   
       // Determine the path ord and resolve an iterator to its immediate 
children. The logic for this
       // depends on whether-or-not the dimension is configured as hierarchical:
       final int pathOrd;
       final PrimitiveIterator.OfInt childIterator;
       if (dimConfig.hierarchical) {
         DimTree dimTree = state.getDimTree(dim);
         if (path.length > 0) {
           pathOrd = (int) dv.lookupTerm(new 
BytesRef(FacetsConfig.pathToString(dim, path)));
         } else {
           // If there's no path, this is a little more efficient to just look 
up the dim:
           pathOrd = dimTree.dimStartOrd;
         }
         if (pathOrd < 0) {
           // path was never indexed
           return null;
         }
         childIterator = dimTree.iterator(pathOrd);
       } else {
         if (path.length > 0) {
           throw new IllegalArgumentException(
             "Field is not configured as hierarchical, path should be 0 
length");
         }
         OrdRange ordRange = state.getOrdRange(dim);
         if (ordRange == null) {
           // means dimension was never indexed
           return null;
         }
         pathOrd = ordRange.start;
         childIterator = ordRange.iterator();
         if (dimConfig.multiValued && dimConfig.requireDimCount) {
           // If the dim is multi-valued and requires dim counts, we know we've 
explicitly indexed
           // the dimension and we need to skip past it so the iterator is 
positioned on the first
           // child:
           childIterator.next();
         }
       }
   
       // Compute the actual results:
       int pathCount = 0;
   
       List<LabelAndValue> labelValues = new ArrayList<>();
       while (childIterator.hasNext()) {
         int ord = childIterator.next();
         int count = getCount(ord);
         if (count > 0) {
           pathCount += count;
           final BytesRef term = dv.lookupOrd(ord);
           String[] parts = FacetsConfig.stringToPath(term.utf8ToString());
           labelValues.add(new LabelAndValue(parts[parts.length - 1], count));
         }
       }
   
       if (dimConfig.hierarchical) {
         pathCount = getCount(pathOrd);
       } else {
         // see if pathCount is actually reliable or needs to be reset
         if (dimConfig.multiValued) {
           if (dimConfig.requireDimCount) {
             pathCount = getCount(pathOrd);
           } else {
             pathCount = -1; // pathCount is inaccurate at this point, so set 
it to -1
           }
         }
       }
   
       return new FacetResult(
         dim, path, pathCount, labelValues.toArray(new LabelAndValue[0]), 
labelValues.size());
     }
   ```



##########
lucene/facet/src/java/org/apache/lucene/facet/StringValueFacetCounts.java:
##########
@@ -130,6 +131,41 @@ public StringValueFacetCounts(StringDocValuesReaderState 
state, FacetsCollector
     }
   }
 
+  @Override
+  public FacetResult getAllChildren(String dim, String... path) throws 
IOException {
+    if (dim.equals(field) == false) {
+      throw new IllegalArgumentException(
+          "invalid dim \"" + dim + "\"; should be \"" + field + "\"");
+    }
+    if (path.length != 0) {
+      throw new IllegalArgumentException("path.length should be 0");
+    }
+
+    int childCount = 0; // total number of labels with non-zero count

Review Comment:
   minor: Instead of keeping track of `childCound`, I think you can just 
reference `labelValues.size()` when you go to create `FacetResult` right?



##########
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/IntTaxonomyFacets.java:
##########
@@ -163,6 +164,76 @@ public Number getSpecificValue(String dim, String... path) 
throws IOException {
     return getValue(ord);
   }
 
+  @Override
+  public FacetResult getAllChildren(String dim, String... path) throws 
IOException {
+    DimConfig dimConfig = verifyDim(dim);
+    FacetLabel cp = new FacetLabel(dim, path);
+    int dimOrd = taxoReader.getOrdinal(cp);
+    if (dimOrd == -1) {
+      return null;
+    }
+
+    int aggregatedValue = 0;
+    int childCount = 0;

Review Comment:
   I don't think we need this local variable right? Can you just use 
`ordinals.size()` when you go to create the response?



##########
lucene/facet/src/java/org/apache/lucene/facet/Facets.java:
##########
@@ -29,6 +29,12 @@ public abstract class Facets {
   /** Default constructor. */
   public Facets() {}
 
+  /**
+   * Returns all the children labels under the specified path. Returns null if 
the specified path

Review Comment:
   Maybe be specific that it returns all child labels *with non-zero counts*? 
Also, I wonder if we should mention something about the lack of a specific sort 
order? I could see some users assuming it would sort by count, but we're 
intentionally not doing that right? So maybe a note that users should generally 
make no assumptions about the child sort order?



##########
lucene/facet/src/java/org/apache/lucene/facet/sortedset/DefaultSortedSetDocValuesReaderState.java:
##########
@@ -381,7 +381,7 @@ public OrdRange getOrdRange(String dim) {
   public DimTree getDimTree(String dim) {
     if (config.getDimConfig(dim).hierarchical == false) {
       throw new UnsupportedOperationException(
-          "This opperation is only supported for hierarchical facets");
+          "This operation is only supported for hierarchical facets");

Review Comment:
   Nice catch :)



##########
lucene/facet/src/java/org/apache/lucene/facet/taxonomy/IntTaxonomyFacets.java:
##########
@@ -163,6 +164,76 @@ public Number getSpecificValue(String dim, String... path) 
throws IOException {
     return getValue(ord);
   }
 
+  @Override
+  public FacetResult getAllChildren(String dim, String... path) throws 
IOException {
+    DimConfig dimConfig = verifyDim(dim);
+    FacetLabel cp = new FacetLabel(dim, path);
+    int dimOrd = taxoReader.getOrdinal(cp);
+    if (dimOrd == -1) {
+      return null;
+    }
+
+    int aggregatedValue = 0;
+    int childCount = 0;
+
+    List<Integer> ordinals = new ArrayList<>();
+    List<Integer> ordValues = new ArrayList<>();
+
+    if (sparseValues != null) {
+      for (IntIntCursor c : sparseValues) {
+        int value = c.value;
+        int ord = c.key;
+        if (parents[ord] == dimOrd && value > 0) {
+          aggregatedValue = aggregationFunction.aggregate(aggregatedValue, 
value);
+          childCount++;
+          ordinals.add(ord);
+          ordValues.add(value);
+        }
+      }
+    } else {
+      int[] children = getChildren();
+      int[] siblings = getSiblings();
+      int ord = children[dimOrd];
+      while (ord != TaxonomyReader.INVALID_ORDINAL) {
+        int value = values[ord];
+        if (value > 0) {
+          aggregatedValue = aggregationFunction.aggregate(aggregatedValue, 
value);
+          childCount++;
+          ordinals.add(ord);
+          ordValues.add(value);
+        }
+        ord = siblings[ord];
+      }
+    }
+
+    if (aggregatedValue == 0) {
+      return null;
+    }
+
+    if (dimConfig.multiValued) {
+      if (dimConfig.requireDimCount) {
+        aggregatedValue = getValue(dimOrd);
+      } else {
+        // Our sum'd value is not correct, in general:
+        aggregatedValue = -1;
+      }
+    } else {
+      // Our sum'd dim value is accurate, so we keep it
+    }
+
+    int[] ordinalArray = new int[ordinals.size()];
+    for (int i = 0; i < ordinals.size(); i++) {
+      ordinalArray[i] = ordinals.get(i);
+    }

Review Comment:
   Thanks for using `getBulkPath` below, but there's no need for this 
`ordinalArray` is there? Can't you just use `ordinals` directly for 
`getBulkPath` (and remove this code completely)?



##########
lucene/facet/src/java/org/apache/lucene/facet/range/RangeFacetCounts.java:
##########
@@ -216,6 +216,24 @@ protected void count(String field, 
List<FacetsCollector.MatchingDocs> matchingDo
     totCount -= missingCount;
   }
 
+  // Return range labels in the order that specified by the user
+  @Override
+  public FacetResult getAllChildren(String dim, String... path) throws 
IOException {
+    if (dim.equals(field) == false) {
+      throw new IllegalArgumentException(
+          "invalid dim \"" + dim + "\"; should be \"" + field + "\"");
+    }
+    if (path.length != 0) {
+      throw new IllegalArgumentException("path.length should be 0");
+    }
+    LabelAndValue[] labelValues = new LabelAndValue[counts.length];
+    for (int i = 0; i < counts.length; i++) {
+      labelValues[i] = new LabelAndValue(ranges[i].label, counts[i]);
+    }
+    return new FacetResult(dim, path, totCount, labelValues, 
labelValues.length);
+  }
+
+  // TODO: fix getTopChildren in LUCENE-10538
   @Override
   public FacetResult getTopChildren(int topN, String dim, String... path) {

Review Comment:
   Could you change `getTopChildren` to just call `getAllChildren` for now 
instead of having the copy/paste code? And maybe add a nice comment explaining 
what's going on?



##########
lucene/facet/src/java/org/apache/lucene/facet/range/RangeFacetCounts.java:
##########
@@ -216,6 +216,24 @@ protected void count(String field, 
List<FacetsCollector.MatchingDocs> matchingDo
     totCount -= missingCount;
   }
 
+  // Return range labels in the order that specified by the user

Review Comment:
   Could you use javadoc please? Maybe something like:
   
   ```suggestion
     /**
      * {@inheritDoc}
      * <p>
      * NOTE: This implementation guarantees that ranges will be returned in 
the order specified by
      * the user when calling the constructor.
      */
   ```



##########
lucene/facet/src/java/org/apache/lucene/facet/StringValueFacetCounts.java:
##########
@@ -130,6 +131,41 @@ public StringValueFacetCounts(StringDocValuesReaderState 
state, FacetsCollector
     }
   }
 
+  @Override
+  public FacetResult getAllChildren(String dim, String... path) throws 
IOException {
+    if (dim.equals(field) == false) {

Review Comment:
   minor: There some common validation logic shared with getTopChildren you 
could factor out if you think it makes sense.



##########
lucene/facet/src/java/org/apache/lucene/facet/range/RangeFacetCounts.java:
##########
@@ -216,6 +216,24 @@ protected void count(String field, 
List<FacetsCollector.MatchingDocs> matchingDo
     totCount -= missingCount;
   }
 
+  // Return range labels in the order that specified by the user
+  @Override
+  public FacetResult getAllChildren(String dim, String... path) throws 
IOException {
+    if (dim.equals(field) == false) {
+      throw new IllegalArgumentException(
+          "invalid dim \"" + dim + "\"; should be \"" + field + "\"");
+    }
+    if (path.length != 0) {
+      throw new IllegalArgumentException("path.length should be 0");
+    }
+    LabelAndValue[] labelValues = new LabelAndValue[counts.length];
+    for (int i = 0; i < counts.length; i++) {
+      labelValues[i] = new LabelAndValue(ranges[i].label, counts[i]);
+    }
+    return new FacetResult(dim, path, totCount, labelValues, 
labelValues.length);
+  }
+
+  // TODO: fix getTopChildren in LUCENE-10538

Review Comment:
   I think LUCENE-10538 is a little overloaded, so I created a new issue to 
track the specific idea of creating a true `getTopChildren` implementation for 
range faceting. Could you reference LUCENE-10614 instead?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] gsmiller commented on a diff in pull request #914: LUCENE-10550: Add getAllChildren functionality to facets

Reply via email to