gsmiller commented on code in PR #914: URL: https://github.com/apache/lucene/pull/914#discussion_r896240300
########## lucene/facet/src/java/org/apache/lucene/facet/LongValueFacetCounts.java: ########## @@ -346,6 +346,43 @@ private void increment(long value) { } } + @Override + public FacetResult getAllChildren(String dim, String... path) throws IOException { + if (dim.equals(field) == false) { + throw new IllegalArgumentException( + "invalid dim \"" + dim + "\"; should be \"" + field + "\""); + } + if (path.length != 0) { + throw new IllegalArgumentException("path.length should be 0"); + } Review Comment: minor: There's some common validation logic between this and `getTopChildren` you could factor out into a common helper method if you thought it made sense. ########## lucene/facet/src/java/org/apache/lucene/facet/LongValueFacetCounts.java: ########## @@ -346,6 +346,43 @@ private void increment(long value) { } } + @Override + public FacetResult getAllChildren(String dim, String... path) throws IOException { + if (dim.equals(field) == false) { + throw new IllegalArgumentException( + "invalid dim \"" + dim + "\"; should be \"" + field + "\""); + } + if (path.length != 0) { + throw new IllegalArgumentException("path.length should be 0"); + } + + List<LabelAndValue> labelValues = new ArrayList<>(); + boolean countsAdded = false; + if (hashCounts.size() != 0) { + for (LongIntCursor c : hashCounts) { + int count = c.value; + if (count != 0) { + if (countsAdded == false && c.key >= counts.length) { + countsAdded = true; + appendCounts(labelValues); + } + labelValues.add(new LabelAndValue(Long.toString(c.key), count)); + } + } + } + + if (countsAdded == false) { + appendCounts(labelValues); + } + + return new FacetResult( + field, + new String[0], + totCount, + labelValues.toArray(new LabelAndValue[0]), + labelValues.size()); Review Comment: It looks like you're trying to maintain value-sort-order, sort of like what `getAllChildrenSortByValue` is doing, but since we don't make any ordering guarantees, I think we can simplify this a little bit. What do you think of doing something like this? ```suggestion List<LabelAndValue> labelValues = new ArrayList<>(); for (int i = 0; i < counts.length; i++) { if (counts[i] != 0) { labelValues.add(new LabelAndValue(Long.toString(i), counts[i])); } } if (hashCounts.size() != 0) { for (LongIntCursor c : hashCounts) { int count = c.value; if (count != 0) { labelValues.add(new LabelAndValue(Long.toString(c.key), c.value)); } } } return new FacetResult( field, new String[0], totCount, labelValues.toArray(new LabelAndValue[0]), labelValues.size()); ``` ########## lucene/facet/src/java/org/apache/lucene/facet/taxonomy/FloatTaxonomyFacets.java: ########## @@ -101,6 +102,63 @@ public Number getSpecificValue(String dim, String... path) throws IOException { return values[ord]; } + @Override + public FacetResult getAllChildren(String dim, String... path) throws IOException { Review Comment: Please see my feedback on `IntTaxonomyFacets`. It should all apply here as well. Thanks! ########## lucene/facet/src/java/org/apache/lucene/facet/sortedset/AbstractSortedSetDocValueFacetCounts.java: ########## @@ -72,6 +72,40 @@ public FacetResult getTopChildren(int topN, String dim, String... path) throws I return createFacetResult(topChildrenForPath, dim, path); } + @Override + public FacetResult getAllChildren(String dim, String... path) throws IOException { + FacetsConfig.DimConfig dimConfig = stateConfig.getDimConfig(dim); + + if (dimConfig.hierarchical) { + int pathOrd = (int) dv.lookupTerm(new BytesRef(FacetsConfig.pathToString(dim, path))); + if (pathOrd < 0) { + // path was never indexed + return null; + } + SortedSetDocValuesReaderState.DimTree dimTree = state.getDimTree(dim); Review Comment: As a general note, I think we usually prefer to _not_ fully qualify internal class names unless necessary (and prefer to import them directly instead). For example, we're already doing `import org.apache.lucene.facet.sortedset.SortedSetDocValuesReaderState.DimTree;`, so you can just say `DimTree` here instead of `SortedSetDocValuesReaderState.DimTree`. I'm betting your IDE is setup with this style, but something to just keep an eye on. ########## lucene/facet/src/java/org/apache/lucene/facet/sortedset/AbstractSortedSetDocValueFacetCounts.java: ########## @@ -72,6 +72,40 @@ public FacetResult getTopChildren(int topN, String dim, String... path) throws I return createFacetResult(topChildrenForPath, dim, path); } + @Override + public FacetResult getAllChildren(String dim, String... path) throws IOException { + FacetsConfig.DimConfig dimConfig = stateConfig.getDimConfig(dim); + + if (dimConfig.hierarchical) { + int pathOrd = (int) dv.lookupTerm(new BytesRef(FacetsConfig.pathToString(dim, path))); + if (pathOrd < 0) { + // path was never indexed + return null; + } + SortedSetDocValuesReaderState.DimTree dimTree = state.getDimTree(dim); + return getPathResult(dimConfig, dim, path, pathOrd, dimTree.iterator(pathOrd)); + } else { + if (path.length > 0) { + throw new IllegalArgumentException( + "Field is not configured as hierarchical, path should be 0 length"); + } + OrdRange ordRange = state.getOrdRange(dim); + if (ordRange == null) { + // means dimension was never indexed + return null; + } + int dimOrd = ordRange.start; + PrimitiveIterator.OfInt childIt = ordRange.iterator(); + if (dimConfig.multiValued && dimConfig.requireDimCount) { + // If the dim is multi-valued and requires dim counts, we know we've explicitly indexed + // the dimension and we need to skip past it so the iterator is positioned on the first + // child: + childIt.next(); + } + return getPathResult(dimConfig, dim, null, dimOrd, childIt); + } + } Review Comment: I think we can actually optimize this a little bit when looking up the dim ord and preparing the child iterator by borrowing some ideas from `getTopChildren`. Also, I don't think we need to break out the `getPathResult` method since we only call it from this method (and it's easy to collapse the invocation to a single place). So I'd suggest we inline that logic. Here's what I'm thinking. Hope you don't mind me taking a pass at this (and then remove `getPathResult` completely): ```suggestion @Override public FacetResult getAllChildren(String dim, String... path) throws IOException { FacetsConfig.DimConfig dimConfig = stateConfig.getDimConfig(dim); // Determine the path ord and resolve an iterator to its immediate children. The logic for this // depends on whether-or-not the dimension is configured as hierarchical: final int pathOrd; final PrimitiveIterator.OfInt childIterator; if (dimConfig.hierarchical) { DimTree dimTree = state.getDimTree(dim); if (path.length > 0) { pathOrd = (int) dv.lookupTerm(new BytesRef(FacetsConfig.pathToString(dim, path))); } else { // If there's no path, this is a little more efficient to just look up the dim: pathOrd = dimTree.dimStartOrd; } if (pathOrd < 0) { // path was never indexed return null; } childIterator = dimTree.iterator(pathOrd); } else { if (path.length > 0) { throw new IllegalArgumentException( "Field is not configured as hierarchical, path should be 0 length"); } OrdRange ordRange = state.getOrdRange(dim); if (ordRange == null) { // means dimension was never indexed return null; } pathOrd = ordRange.start; childIterator = ordRange.iterator(); if (dimConfig.multiValued && dimConfig.requireDimCount) { // If the dim is multi-valued and requires dim counts, we know we've explicitly indexed // the dimension and we need to skip past it so the iterator is positioned on the first // child: childIterator.next(); } } // Compute the actual results: int pathCount = 0; List<LabelAndValue> labelValues = new ArrayList<>(); while (childIterator.hasNext()) { int ord = childIterator.next(); int count = getCount(ord); if (count > 0) { pathCount += count; final BytesRef term = dv.lookupOrd(ord); String[] parts = FacetsConfig.stringToPath(term.utf8ToString()); labelValues.add(new LabelAndValue(parts[parts.length - 1], count)); } } if (dimConfig.hierarchical) { pathCount = getCount(pathOrd); } else { // see if pathCount is actually reliable or needs to be reset if (dimConfig.multiValued) { if (dimConfig.requireDimCount) { pathCount = getCount(pathOrd); } else { pathCount = -1; // pathCount is inaccurate at this point, so set it to -1 } } } return new FacetResult( dim, path, pathCount, labelValues.toArray(new LabelAndValue[0]), labelValues.size()); } ``` ########## lucene/facet/src/java/org/apache/lucene/facet/StringValueFacetCounts.java: ########## @@ -130,6 +131,41 @@ public StringValueFacetCounts(StringDocValuesReaderState state, FacetsCollector } } + @Override + public FacetResult getAllChildren(String dim, String... path) throws IOException { + if (dim.equals(field) == false) { + throw new IllegalArgumentException( + "invalid dim \"" + dim + "\"; should be \"" + field + "\""); + } + if (path.length != 0) { + throw new IllegalArgumentException("path.length should be 0"); + } + + int childCount = 0; // total number of labels with non-zero count Review Comment: minor: Instead of keeping track of `childCound`, I think you can just reference `labelValues.size()` when you go to create `FacetResult` right? ########## lucene/facet/src/java/org/apache/lucene/facet/taxonomy/IntTaxonomyFacets.java: ########## @@ -163,6 +164,76 @@ public Number getSpecificValue(String dim, String... path) throws IOException { return getValue(ord); } + @Override + public FacetResult getAllChildren(String dim, String... path) throws IOException { + DimConfig dimConfig = verifyDim(dim); + FacetLabel cp = new FacetLabel(dim, path); + int dimOrd = taxoReader.getOrdinal(cp); + if (dimOrd == -1) { + return null; + } + + int aggregatedValue = 0; + int childCount = 0; Review Comment: I don't think we need this local variable right? Can you just use `ordinals.size()` when you go to create the response? ########## lucene/facet/src/java/org/apache/lucene/facet/Facets.java: ########## @@ -29,6 +29,12 @@ public abstract class Facets { /** Default constructor. */ public Facets() {} + /** + * Returns all the children labels under the specified path. Returns null if the specified path Review Comment: Maybe be specific that it returns all child labels *with non-zero counts*? Also, I wonder if we should mention something about the lack of a specific sort order? I could see some users assuming it would sort by count, but we're intentionally not doing that right? So maybe a note that users should generally make no assumptions about the child sort order? ########## lucene/facet/src/java/org/apache/lucene/facet/sortedset/DefaultSortedSetDocValuesReaderState.java: ########## @@ -381,7 +381,7 @@ public OrdRange getOrdRange(String dim) { public DimTree getDimTree(String dim) { if (config.getDimConfig(dim).hierarchical == false) { throw new UnsupportedOperationException( - "This opperation is only supported for hierarchical facets"); + "This operation is only supported for hierarchical facets"); Review Comment: Nice catch :) ########## lucene/facet/src/java/org/apache/lucene/facet/taxonomy/IntTaxonomyFacets.java: ########## @@ -163,6 +164,76 @@ public Number getSpecificValue(String dim, String... path) throws IOException { return getValue(ord); } + @Override + public FacetResult getAllChildren(String dim, String... path) throws IOException { + DimConfig dimConfig = verifyDim(dim); + FacetLabel cp = new FacetLabel(dim, path); + int dimOrd = taxoReader.getOrdinal(cp); + if (dimOrd == -1) { + return null; + } + + int aggregatedValue = 0; + int childCount = 0; + + List<Integer> ordinals = new ArrayList<>(); + List<Integer> ordValues = new ArrayList<>(); + + if (sparseValues != null) { + for (IntIntCursor c : sparseValues) { + int value = c.value; + int ord = c.key; + if (parents[ord] == dimOrd && value > 0) { + aggregatedValue = aggregationFunction.aggregate(aggregatedValue, value); + childCount++; + ordinals.add(ord); + ordValues.add(value); + } + } + } else { + int[] children = getChildren(); + int[] siblings = getSiblings(); + int ord = children[dimOrd]; + while (ord != TaxonomyReader.INVALID_ORDINAL) { + int value = values[ord]; + if (value > 0) { + aggregatedValue = aggregationFunction.aggregate(aggregatedValue, value); + childCount++; + ordinals.add(ord); + ordValues.add(value); + } + ord = siblings[ord]; + } + } + + if (aggregatedValue == 0) { + return null; + } + + if (dimConfig.multiValued) { + if (dimConfig.requireDimCount) { + aggregatedValue = getValue(dimOrd); + } else { + // Our sum'd value is not correct, in general: + aggregatedValue = -1; + } + } else { + // Our sum'd dim value is accurate, so we keep it + } + + int[] ordinalArray = new int[ordinals.size()]; + for (int i = 0; i < ordinals.size(); i++) { + ordinalArray[i] = ordinals.get(i); + } Review Comment: Thanks for using `getBulkPath` below, but there's no need for this `ordinalArray` is there? Can't you just use `ordinals` directly for `getBulkPath` (and remove this code completely)? ########## lucene/facet/src/java/org/apache/lucene/facet/range/RangeFacetCounts.java: ########## @@ -216,6 +216,24 @@ protected void count(String field, List<FacetsCollector.MatchingDocs> matchingDo totCount -= missingCount; } + // Return range labels in the order that specified by the user + @Override + public FacetResult getAllChildren(String dim, String... path) throws IOException { + if (dim.equals(field) == false) { + throw new IllegalArgumentException( + "invalid dim \"" + dim + "\"; should be \"" + field + "\""); + } + if (path.length != 0) { + throw new IllegalArgumentException("path.length should be 0"); + } + LabelAndValue[] labelValues = new LabelAndValue[counts.length]; + for (int i = 0; i < counts.length; i++) { + labelValues[i] = new LabelAndValue(ranges[i].label, counts[i]); + } + return new FacetResult(dim, path, totCount, labelValues, labelValues.length); + } + + // TODO: fix getTopChildren in LUCENE-10538 @Override public FacetResult getTopChildren(int topN, String dim, String... path) { Review Comment: Could you change `getTopChildren` to just call `getAllChildren` for now instead of having the copy/paste code? And maybe add a nice comment explaining what's going on? ########## lucene/facet/src/java/org/apache/lucene/facet/range/RangeFacetCounts.java: ########## @@ -216,6 +216,24 @@ protected void count(String field, List<FacetsCollector.MatchingDocs> matchingDo totCount -= missingCount; } + // Return range labels in the order that specified by the user Review Comment: Could you use javadoc please? Maybe something like: ```suggestion /** * {@inheritDoc} * <p> * NOTE: This implementation guarantees that ranges will be returned in the order specified by * the user when calling the constructor. */ ``` ########## lucene/facet/src/java/org/apache/lucene/facet/StringValueFacetCounts.java: ########## @@ -130,6 +131,41 @@ public StringValueFacetCounts(StringDocValuesReaderState state, FacetsCollector } } + @Override + public FacetResult getAllChildren(String dim, String... path) throws IOException { + if (dim.equals(field) == false) { Review Comment: minor: There some common validation logic shared with getTopChildren you could factor out if you think it makes sense. ########## lucene/facet/src/java/org/apache/lucene/facet/range/RangeFacetCounts.java: ########## @@ -216,6 +216,24 @@ protected void count(String field, List<FacetsCollector.MatchingDocs> matchingDo totCount -= missingCount; } + // Return range labels in the order that specified by the user + @Override + public FacetResult getAllChildren(String dim, String... path) throws IOException { + if (dim.equals(field) == false) { + throw new IllegalArgumentException( + "invalid dim \"" + dim + "\"; should be \"" + field + "\""); + } + if (path.length != 0) { + throw new IllegalArgumentException("path.length should be 0"); + } + LabelAndValue[] labelValues = new LabelAndValue[counts.length]; + for (int i = 0; i < counts.length; i++) { + labelValues[i] = new LabelAndValue(ranges[i].label, counts[i]); + } + return new FacetResult(dim, path, totCount, labelValues, labelValues.length); + } + + // TODO: fix getTopChildren in LUCENE-10538 Review Comment: I think LUCENE-10538 is a little overloaded, so I created a new issue to track the specific idea of creating a true `getTopChildren` implementation for range faceting. Could you reference LUCENE-10614 instead? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org