[GitHub] [lucene] gsmiller commented on a change in pull request #509: LUCENE-10250: Add support for arbitrary length hierarchical SSDV facets

GitBox Thu, 06 Jan 2022 17:12:09 -0800


gsmiller commented on a change in pull request #509:
URL: https://github.com/apache/lucene/pull/509#discussion_r779892023




##########
File path: 
lucene/facet/src/java/org/apache/lucene/facet/sortedset/SortedSetDocValuesFacetField.java
##########
@@ -41,19 +42,28 @@
   public final String dim;
 
   /** Label. */
-  public final String label;
+  public final String[] path;

Review comment:
       It's frustrating that these fields are public, but because they are, 
this isn't actually backwards compatible. It seems like these are public to 
make them visible to other code in the facets module that's in a different java 
package (specifically, `FacetsConfig`), but users could still be reading this 
field directly for some reason and have their code no longer compile because of 
this change. If we really want to keep this fully back-compat, I think we need 
to add a new field for `path` and maintain the `label` string (marked as 
deprecated). The label could be created in the ctor with 
`FacetsConfig#pathToString`.

##########
File path: 
lucene/facet/src/java/org/apache/lucene/facet/sortedset/SortedSetDocValuesFacetCounts.java
##########
@@ -95,17 +96,34 @@ public FacetResult getTopChildren(int topN, String dim, 
String... path) throws I
     if (topN <= 0) {
       throw new IllegalArgumentException("topN must be > 0 (got: " + topN + 
")");
     }
-    if (path.length > 0) {
-      throw new IllegalArgumentException("path should be 0 length");
-    }
-    OrdRange ordRange = state.getOrdRange(dim);
-    if (ordRange == null) {
-      return null; // means dimension was never indexed
+
+    if (state.isHierarchicalDim(dim)) {
+      int pathOrd = (int) dv.lookupTerm(new 
BytesRef(FacetsConfig.pathToString(dim, path)));
+      if (pathOrd == -1) {

Review comment:
       If you take a look at the javadoc (or implementation) for 
`SortedSetDocValues#lookupTerm`, I think you'll want to change this condition 
to check for a return value < 0 (instead of `-1` explicitly). Any negative 
return value indicates "not found".
   
   It would also be good to ensure there's a unit test covering this case. I 
suspect there may not be if you didn't find this as a bug in testing?

##########
File path: 
lucene/facet/src/java/org/apache/lucene/facet/sortedset/DefaultSortedSetDocValuesReaderState.java
##########
@@ -79,38 +104,160 @@ public DefaultSortedSetDocValuesReaderState(IndexReader 
reader, String field) th
     }
     valueCount = (int) dv.getValueCount();
 
-    // TODO: we can make this more efficient if eg we can be
-    // "involved" when OrdinalMap is being created?  Ie see
-    // each term/ord it's assigning as it goes...
-    String lastDim = null;
-    int startOrd = -1;
-
-    // TODO: this approach can work for full hierarchy?;
-    // TaxoReader can't do this since ords are not in
-    // "sorted order" ... but we should generalize this to
-    // support arbitrary hierarchy:
-    for (int ord = 0; ord < valueCount; ord++) {
-      final BytesRef term = dv.lookupOrd(ord);
+    int ord = 0;
+    while (ord != valueCount) {
+      BytesRef term = dv.lookupOrd(ord);
       String[] components = FacetsConfig.stringToPath(term.utf8ToString());
-      if (components.length != 2) {
-        throw new IllegalArgumentException(
-            "this class can only handle 2 level hierarchy (dim/value); got: "
-                + Arrays.toString(components)
-                + " "
-                + term.utf8ToString());
+      String dim = components[0];
+      if (config != null && config.getDimConfig(dim).hierarchical) {
+        ord = createOneHierarchicalFacetDimState(dv, ord) + 1;
+      } else {
+        ord = createOneFlatFacetDimState(dv, ord) + 1;
+      }
+    }
+  }
+
+  // returns last ord of dimension
+  private int createOneHierarchicalFacetDimState(SortedSetDocValues dv, int 
dimStartOrd)
+      throws IOException {
+    List<Boolean> hasChildren = new ArrayList<>();
+    List<Integer> siblings = new ArrayList<>();
+
+    String dim = null;
+
+    // stack of paths with unfulfilled siblings
+    Stack<OrdAndComponent> siblingStack = new Stack<>();
+
+    String[] nextComponents = null;
+    int dimEndOrd = dimStartOrd;
+    while (true) {
+      String[] components;
+
+      int ord = dimEndOrd - dimStartOrd;
+
+      if (nextComponents == null) {
+        BytesRef term = dv.lookupOrd(dimEndOrd);
+        components = FacetsConfig.stringToPath(term.utf8ToString());
+        dim = components[0];
+      } else {
+        components = nextComponents;
       }
-      if (!components[0].equals(lastDim)) {
-        if (lastDim != null) {
-          prefixToOrdRange.put(lastDim, new OrdRange(startOrd, ord - 1));
+
+      while (siblingStack.empty() == false
+          && siblingStack.peek().component.length >= components.length) {
+        OrdAndComponent possibleSibling = siblingStack.pop();
+        if (possibleSibling.component.length == components.length) {
+          // lengths are equal
+          boolean isSibling = true;
+          for (int i = 0; i < possibleSibling.component.length - 1; i++) {
+            if (possibleSibling.component[i].equals(components[i]) == false) {

Review comment:
       My reasoning might be incomplete, but I don't think you need this loop 
check. Isn't it enough that the lengths are equal? I think for some ancestry 
component to be different, we would have already popped this off right?

##########
File path: lucene/facet/src/java/org/apache/lucene/facet/FacetsConfig.java
##########
@@ -471,19 +471,35 @@ private void indexDrillDownTerms(
 
   private void processSSDVFacetFields(
       Map<String, List<SortedSetDocValuesFacetField>> byField, Document doc) {
+
     for (Map.Entry<String, List<SortedSetDocValuesFacetField>> ent : 
byField.entrySet()) {
 
       String indexFieldName = ent.getKey();
 
       for (SortedSetDocValuesFacetField facetField : ent.getValue()) {
-        FacetLabel facetLabel = new FacetLabel(facetField.dim, 
facetField.label);
-        String fullPath = pathToString(facetLabel.components, 
facetLabel.length);
+        FacetLabel facetLabel = new FacetLabel(facetField.dim, 
facetField.path);
+        if (getDimConfig(facetField.dim).hierarchical) {

Review comment:
       minor: Maybe pull out the DimConfig into a local variable since you 
reference it a couple times.

##########
File path: 
lucene/facet/src/java/org/apache/lucene/facet/sortedset/SortedSetDocValuesReaderState.java
##########
@@ -62,15 +159,27 @@ protected SortedSetDocValuesReaderState() {}
   /** Indexed field we are reading. */
   public abstract String getField();
 
+  /** Returns top-level index reader. */
+  public abstract IndexReader getReader();
+
+  /** Number of unique labels. */
+  public abstract int getSize();
+
+  /** Returns if dim is configured as hierarchical ** */
+  public abstract boolean isHierarchicalDim(String dim);
+
+  /*** Only used for flat facets (dim/value) ***/
+
   /** Returns the {@link OrdRange} for this dimension. */
   public abstract OrdRange getOrdRange(String dim);
 
   /** Returns mapping from prefix to {@link OrdRange}. */
   public abstract Map<String, OrdRange> getPrefixToOrdRange();
 
-  /** Returns top-level index reader. */
-  public abstract IndexReader getReader();
+  /*** Only used for hierarchical facets ***/
 
-  /** Number of unique labels. */
-  public abstract int getSize();
+  public abstract DimTree getDimTree(String dim);
+
+  /** Returns a list of all dimensions and their respective ordinals */

Review comment:
       This doesn't return the ordinals right? Just the dimension strings?

##########
File path: 
lucene/facet/src/java/org/apache/lucene/facet/sortedset/SortedSetDocValuesFacetCounts.java
##########
@@ -145,12 +164,19 @@ private FacetResult getDim(String dim, OrdRange ordRange, 
int topN) throws IOExc
     LabelAndValue[] labelValues = new LabelAndValue[q.size()];
     for (int i = labelValues.length - 1; i >= 0; i--) {
       TopOrdAndIntQueue.OrdAndValue ordAndValue = q.pop();
+      assert ordAndValue != null;
       final BytesRef term = dv.lookupOrd(ordAndValue.ord);
       String[] parts = FacetsConfig.stringToPath(term.utf8ToString());
-      labelValues[i] = new LabelAndValue(parts[1], ordAndValue.value);
+      labelValues[i] = new LabelAndValue(parts[parts.length - 1], 
ordAndValue.value);
     }
 
-    return new FacetResult(dim, new String[0], dimCount, labelValues, 
childCount);
+    if (pathOrd == -1) {
+      // not hierarchical facet
+      return new FacetResult(dim, new String[0], dimCount, labelValues, 
childCount);

Review comment:
       Maybe define a local constant as `new String[0]` that you reference in 
these spots instead?

##########
File path: 
lucene/facet/src/java/org/apache/lucene/facet/sortedset/SortedSetDocValuesReaderState.java
##########
@@ -51,8 +51,105 @@ public OrdRange(int start, int end) {
       this.start = start;
       this.end = end;
     }
+
+    /** Iterates from start to end ord (inclusive) * */

Review comment:
       minor: I see some one-line javadoc that has a funky extra '*' near the 
end. Could you do a pass and try to clean these up? I'm sure it's some combo of 
and IDE + spotless?

##########
File path: 
lucene/facet/src/java/org/apache/lucene/facet/sortedset/SortedSetDocValuesReaderState.java
##########
@@ -62,15 +159,27 @@ protected SortedSetDocValuesReaderState() {}
   /** Indexed field we are reading. */
   public abstract String getField();
 
+  /** Returns top-level index reader. */
+  public abstract IndexReader getReader();
+
+  /** Number of unique labels. */
+  public abstract int getSize();
+
+  /** Returns if dim is configured as hierarchical ** */
+  public abstract boolean isHierarchicalDim(String dim);

Review comment:
       What about actually exposing the `FacetsConfig` instead of just this 
one-off attribute of the config? That would also help with a separate issue 
where dim counts can be over-counted since SSDV faceting doesn't know 
whether-or-not it's multi-valued (LUCENE-9952).

##########
File path: lucene/facet/src/java/org/apache/lucene/facet/FacetsConfig.java
##########
@@ -471,19 +471,35 @@ private void indexDrillDownTerms(
 
   private void processSSDVFacetFields(
       Map<String, List<SortedSetDocValuesFacetField>> byField, Document doc) {
+
     for (Map.Entry<String, List<SortedSetDocValuesFacetField>> ent : 
byField.entrySet()) {
 
       String indexFieldName = ent.getKey();
 
       for (SortedSetDocValuesFacetField facetField : ent.getValue()) {
-        FacetLabel facetLabel = new FacetLabel(facetField.dim, 
facetField.label);
-        String fullPath = pathToString(facetLabel.components, 
facetLabel.length);
+        FacetLabel facetLabel = new FacetLabel(facetField.dim, 
facetField.path);
+        if (getDimConfig(facetField.dim).hierarchical) {
+          for (int i = 0; i < facetLabel.length; i++) {
+            String fullPath = pathToString(facetLabel.components, i + 1);
+            // For facet counts:
+            doc.add(new SortedSetDocValuesField(indexFieldName, new 
BytesRef(fullPath)));
+
+            // For drill-down:
+            indexDrillDownTerms(doc, indexFieldName, 
getDimConfig(facetField.dim), facetLabel);
+          }
+        } else {
+          if (facetLabel.length != 2) {
+            throw new IllegalArgumentException(

Review comment:
       minor: Maybe try to base the exception message on the error checking 
already done for taxo-facets to make the wording a little more consistent?
   
   ```
           if (facetField.path.length > 1 && dimConfig.hierarchical == false) {
             throw new IllegalArgumentException(
                 "dimension \""
                     + facetField.dim
                     + "\" is not hierarchical yet has "
                     + facetField.path.length
                     + " components");
           }
   ```

##########
File path: 
lucene/facet/src/java/org/apache/lucene/facet/sortedset/SortedSetDocValuesFacetCounts.java
##########
@@ -95,17 +96,34 @@ public FacetResult getTopChildren(int topN, String dim, 
String... path) throws I
     if (topN <= 0) {
       throw new IllegalArgumentException("topN must be > 0 (got: " + topN + 
")");
     }
-    if (path.length > 0) {
-      throw new IllegalArgumentException("path should be 0 length");
-    }
-    OrdRange ordRange = state.getOrdRange(dim);
-    if (ordRange == null) {
-      return null; // means dimension was never indexed
+
+    if (state.isHierarchicalDim(dim)) {
+      int pathOrd = (int) dv.lookupTerm(new 
BytesRef(FacetsConfig.pathToString(dim, path)));
+      if (pathOrd == -1) {
+        // path was never indexed
+        return null;
+      }
+      DimTree dimTree = state.getDimTree(dim);
+      if (dimTree == null) {
+        return null;
+      }
+      return getDim(dim, path, pathOrd, dimTree.iterator(pathOrd), topN);
+    } else {
+      if (path.length > 0) {
+        throw new IllegalArgumentException(
+            "Field is not configured as hierarchical, path should be 0 
length");
+      }
+      OrdRange ordRange = state.getOrdRange(dim);
+      if (ordRange == null) {
+        return null;

Review comment:
       minor: Could you please retain the comment on this line? `// means 
dimension was never indexed`

##########
File path: 
lucene/facet/src/java/org/apache/lucene/facet/sortedset/SortedSetDocValuesReaderState.java
##########
@@ -51,8 +51,105 @@ public OrdRange(int start, int end) {
       this.start = start;
       this.end = end;
     }
+
+    /** Iterates from start to end ord (inclusive) * */
+    public PrimitiveIterator.OfInt iterator() {
+      return new PrimitiveIterator.OfInt() {
+        int current = start;
+
+        @Override
+        public int nextInt() {
+          if (current > end) {
+            return INVALID_ORDINAL;
+          }
+          return current++;
+        }
+
+        @Override
+        public boolean hasNext() {
+          return current <= end;
+        }
+      };
+    }
+  }
+
+  /**
+   * Holds children and sibling information for a single dimension. Only used 
with hierarchical
+   * dimensions.
+   */
+  public static final class DimTree {
+    private final FixedBitSet hasChildren;
+    // TODO: This array can take up a lot of space. Change type based on input 
size maybe?
+    private final int[] siblings;
+
+    /** The first ord of the dimension * */
+    public final int dimStartOrd;
+
+    /** Sibling and children must be of same length * */
+    public DimTree(int dimStartOrd, List<Integer> sibling, List<Boolean> 
hasChildren) {
+      assert sibling.size() == hasChildren.size();

Review comment:
       Throwing an IllegalArumentException might be a better fit here since 
these are user-provided parameters.

##########
File path: 
lucene/facet/src/java/org/apache/lucene/facet/sortedset/SortedSetDocValuesFacetCounts.java
##########
@@ -95,17 +96,34 @@ public FacetResult getTopChildren(int topN, String dim, 
String... path) throws I
     if (topN <= 0) {
       throw new IllegalArgumentException("topN must be > 0 (got: " + topN + 
")");
     }
-    if (path.length > 0) {
-      throw new IllegalArgumentException("path should be 0 length");
-    }
-    OrdRange ordRange = state.getOrdRange(dim);
-    if (ordRange == null) {
-      return null; // means dimension was never indexed
+
+    if (state.isHierarchicalDim(dim)) {
+      int pathOrd = (int) dv.lookupTerm(new 
BytesRef(FacetsConfig.pathToString(dim, path)));
+      if (pathOrd == -1) {
+        // path was never indexed
+        return null;
+      }
+      DimTree dimTree = state.getDimTree(dim);
+      if (dimTree == null) {

Review comment:
       Seems like this should never happen right? If we reach this point, it 
means the path was indexed. Shouldn't there always be a dim tree then?

##########
File path: 
lucene/facet/src/java/org/apache/lucene/facet/sortedset/DefaultSortedSetDocValuesReaderState.java
##########
@@ -79,38 +104,160 @@ public DefaultSortedSetDocValuesReaderState(IndexReader 
reader, String field) th
     }
     valueCount = (int) dv.getValueCount();
 
-    // TODO: we can make this more efficient if eg we can be
-    // "involved" when OrdinalMap is being created?  Ie see
-    // each term/ord it's assigning as it goes...
-    String lastDim = null;
-    int startOrd = -1;
-
-    // TODO: this approach can work for full hierarchy?;
-    // TaxoReader can't do this since ords are not in
-    // "sorted order" ... but we should generalize this to
-    // support arbitrary hierarchy:
-    for (int ord = 0; ord < valueCount; ord++) {
-      final BytesRef term = dv.lookupOrd(ord);
+    int ord = 0;
+    while (ord != valueCount) {
+      BytesRef term = dv.lookupOrd(ord);
       String[] components = FacetsConfig.stringToPath(term.utf8ToString());
-      if (components.length != 2) {
-        throw new IllegalArgumentException(
-            "this class can only handle 2 level hierarchy (dim/value); got: "
-                + Arrays.toString(components)
-                + " "
-                + term.utf8ToString());
+      String dim = components[0];
+      if (config != null && config.getDimConfig(dim).hierarchical) {
+        ord = createOneHierarchicalFacetDimState(dv, ord) + 1;
+      } else {
+        ord = createOneFlatFacetDimState(dv, ord) + 1;
+      }
+    }
+  }
+
+  // returns last ord of dimension
+  private int createOneHierarchicalFacetDimState(SortedSetDocValues dv, int 
dimStartOrd)
+      throws IOException {
+    List<Boolean> hasChildren = new ArrayList<>();
+    List<Integer> siblings = new ArrayList<>();
+
+    String dim = null;
+
+    // stack of paths with unfulfilled siblings
+    Stack<OrdAndComponent> siblingStack = new Stack<>();
+
+    String[] nextComponents = null;
+    int dimEndOrd = dimStartOrd;
+    while (true) {
+      String[] components;
+
+      int ord = dimEndOrd - dimStartOrd;
+
+      if (nextComponents == null) {
+        BytesRef term = dv.lookupOrd(dimEndOrd);
+        components = FacetsConfig.stringToPath(term.utf8ToString());
+        dim = components[0];
+      } else {
+        components = nextComponents;
       }
-      if (!components[0].equals(lastDim)) {
-        if (lastDim != null) {
-          prefixToOrdRange.put(lastDim, new OrdRange(startOrd, ord - 1));
+
+      while (siblingStack.empty() == false
+          && siblingStack.peek().component.length >= components.length) {
+        OrdAndComponent possibleSibling = siblingStack.pop();
+        if (possibleSibling.component.length == components.length) {
+          // lengths are equal
+          boolean isSibling = true;
+          for (int i = 0; i < possibleSibling.component.length - 1; i++) {
+            if (possibleSibling.component[i].equals(components[i]) == false) {
+              isSibling = false;
+              break;
+            }
+          }
+          if (isSibling) {
+            siblings.set(possibleSibling.ord, ord);
+          }
         }
-        startOrd = ord;
-        lastDim = components[0];
       }
+
+      if (dimEndOrd + 1 == valueCount) {
+        // current ord needs to be added, can't have children or siblings
+        siblings.add(-1);
+        hasChildren.add(false);
+        break;
+      }
+
+      BytesRef nextTerm = dv.lookupOrd(dimEndOrd + 1);
+      nextComponents = FacetsConfig.stringToPath(nextTerm.utf8ToString());
+
+      if (nextComponents[0].equals(components[0]) == false) {
+        // current ord needs to be added, can't have children or siblings
+        siblings.add(-1);
+        hasChildren.add(false);
+        break;
+      }
+
+      if (components.length < nextComponents.length) {
+        // next ord must be a direct child of current ord, this is because we 
are indexing all
+        // ancestral paths
+        hasChildren.add(ord, true);
+        // we don't know if this ord has a sibling or where it's sibling could 
be yet
+        siblingStack.push(new OrdAndComponent(ord, components));
+        // we still add INVALID_ORDINAL, which will be replaced if a valid 
sibling is found
+        siblings.add(ord, INVALID_ORDINAL);
+      } else if (components.length == nextComponents.length) {
+        // next ord must be a sibling of current and there are no direct 
children or current, this
+        // is because we
+        // are indexing all ancestral paths
+        siblings.add(ord, ord + 1);
+        hasChildren.add(ord, false);
+      } else {
+        // components.length > nextComponents.length
+        // next ord is neither sibling nor child
+        siblings.add(ord, INVALID_ORDINAL);
+        hasChildren.add(ord, false);
+      }
+
+      dimEndOrd++;
     }
 
-    if (lastDim != null) {
-      prefixToOrdRange.put(lastDim, new OrdRange(startOrd, valueCount - 1));
+    prefixToDimTree.put(dim, new DimTree(dimStartOrd, siblings, hasChildren));
+
+    return dimEndOrd;
+  }
+
+  // returns last ord of dimension
+  private int createOneFlatFacetDimState(SortedSetDocValues dv, int 
dimStartOrd)
+      throws IOException {
+
+    String dim = null;
+
+    String[] nextComponents = null;
+    int dimEndOrd = dimStartOrd;
+
+    while (true) {
+      String[] components;
+
+      if (nextComponents == null) {

Review comment:
       This should only be true on the first iteration of the loop right? Maybe 
it would be a little more readable to pull this outside the loop? So you'd 
initialize nextComponents to be the first entry for the dim and then enter the 
loop?

##########
File path: 
lucene/facet/src/java/org/apache/lucene/facet/sortedset/SortedSetDocValuesFacetCounts.java
##########
@@ -317,10 +343,19 @@ public Number getSpecificValue(String dim, String... 
path) throws IOException {
   public List<FacetResult> getAllDims(int topN) throws IOException {
 
     List<FacetResult> results = new ArrayList<>();
-    for (Map.Entry<String, OrdRange> ent : 
state.getPrefixToOrdRange().entrySet()) {
-      FacetResult fr = getDim(ent.getKey(), ent.getValue(), topN);
-      if (fr != null) {
-        results.add(fr);
+    for (String dim : state.getDims()) {
+      if (state.isHierarchicalDim(dim)) {
+        DimTree dimTree = state.getDimTree(dim);
+        FacetResult fr = getDim(dim, new String[0], dimTree.dimStartOrd, 
dimTree.iterator(), topN);

Review comment:
       Can you just pass `-1` instead of `dimTree.dimStartOrd` here? If I'm 
understanding the logic in `getDim`, I think you can right? Then you can make 
`dimStartOrd` `private`, which would be nice.

##########
File path: 
lucene/facet/src/java/org/apache/lucene/facet/sortedset/DefaultSortedSetDocValuesReaderState.java
##########
@@ -51,20 +54,42 @@
 
   private final Map<String, OrdinalMap> cachedOrdMaps = new HashMap<>();
 
+  private final FacetsConfig config;
+
+  /*** Used for hierarchical dims ***/

Review comment:
       I commented elsewhere but there's something going on with extra '*' in 
some of these javadoc comments :)

##########
File path: 
lucene/facet/src/java/org/apache/lucene/facet/sortedset/DefaultSortedSetDocValuesReaderState.java
##########
@@ -79,38 +104,160 @@ public DefaultSortedSetDocValuesReaderState(IndexReader 
reader, String field) th
     }
     valueCount = (int) dv.getValueCount();
 
-    // TODO: we can make this more efficient if eg we can be
-    // "involved" when OrdinalMap is being created?  Ie see
-    // each term/ord it's assigning as it goes...
-    String lastDim = null;
-    int startOrd = -1;
-
-    // TODO: this approach can work for full hierarchy?;
-    // TaxoReader can't do this since ords are not in
-    // "sorted order" ... but we should generalize this to
-    // support arbitrary hierarchy:
-    for (int ord = 0; ord < valueCount; ord++) {
-      final BytesRef term = dv.lookupOrd(ord);
+    int ord = 0;
+    while (ord != valueCount) {
+      BytesRef term = dv.lookupOrd(ord);
       String[] components = FacetsConfig.stringToPath(term.utf8ToString());
-      if (components.length != 2) {
-        throw new IllegalArgumentException(
-            "this class can only handle 2 level hierarchy (dim/value); got: "
-                + Arrays.toString(components)
-                + " "
-                + term.utf8ToString());
+      String dim = components[0];
+      if (config != null && config.getDimConfig(dim).hierarchical) {
+        ord = createOneHierarchicalFacetDimState(dv, ord) + 1;
+      } else {
+        ord = createOneFlatFacetDimState(dv, ord) + 1;
+      }
+    }
+  }
+
+  // returns last ord of dimension
+  private int createOneHierarchicalFacetDimState(SortedSetDocValues dv, int 
dimStartOrd)
+      throws IOException {
+    List<Boolean> hasChildren = new ArrayList<>();
+    List<Integer> siblings = new ArrayList<>();
+
+    String dim = null;
+
+    // stack of paths with unfulfilled siblings
+    Stack<OrdAndComponent> siblingStack = new Stack<>();
+
+    String[] nextComponents = null;
+    int dimEndOrd = dimStartOrd;
+    while (true) {
+      String[] components;
+
+      int ord = dimEndOrd - dimStartOrd;
+
+      if (nextComponents == null) {

Review comment:
       I made a similar comment elsewhere, but it seems like nextComponents 
could be initialized outside of the loop right?

##########
File path: lucene/facet/src/java/org/apache/lucene/facet/FacetsConfig.java
##########
@@ -471,19 +471,35 @@ private void indexDrillDownTerms(
 
   private void processSSDVFacetFields(
       Map<String, List<SortedSetDocValuesFacetField>> byField, Document doc) {
+
     for (Map.Entry<String, List<SortedSetDocValuesFacetField>> ent : 
byField.entrySet()) {
 
       String indexFieldName = ent.getKey();
 
       for (SortedSetDocValuesFacetField facetField : ent.getValue()) {
-        FacetLabel facetLabel = new FacetLabel(facetField.dim, 
facetField.label);
-        String fullPath = pathToString(facetLabel.components, 
facetLabel.length);
+        FacetLabel facetLabel = new FacetLabel(facetField.dim, 
facetField.path);
+        if (getDimConfig(facetField.dim).hierarchical) {
+          for (int i = 0; i < facetLabel.length; i++) {
+            String fullPath = pathToString(facetLabel.components, i + 1);
+            // For facet counts:
+            doc.add(new SortedSetDocValuesField(indexFieldName, new 
BytesRef(fullPath)));
+
+            // For drill-down:
+            indexDrillDownTerms(doc, indexFieldName, 
getDimConfig(facetField.dim), facetLabel);

Review comment:
       I don't think you intended to put this in the loop right? In fact, I 
think it should be outside of the conditional altogether since the logic should 
be the same regardless of whether-or-not it's hierarchical right?




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[GitHub] [lucene] gsmiller commented on a change in pull request #509: LUCENE-10250: Add support for arbitrary length hierarchical SSDV facets

Reply via email to