[ https://issues.apache.org/jira/browse/LUCENE-10250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17447710#comment-17447710 ]
Greg Miller commented on LUCENE-10250:
--------------------------------------

I took another look at the SSDV faceting code to try to jog my memory a bit more on where these "non hierarchical" assumptions are being applied and where they might get tricky, and I'd encourage you to have a look at what {{DefaultSortedSetDocValuesReaderState}} does at construction time. When that "state" object is constructed, it determines the ordinal range for every unique dimension, which is later used when getting counts for specific dimensions.

We would need a general way to determine the ordinal range for any given path prefix in order to generalize to hierarchical data (to Robert's point). That way, when the user requests, for example, the "top-n" child values for a given hierarchical path, we can efficiently get the counts for all "child" ordinals. Looking at the code in {{DefaultSortedSetDocValuesReaderState}}, it appears there is already a TODO asking whether that mapping logic could be generalized to hierarchical paths. I don't see any reason why it can't, so I think there's a path forward there without even needing to open up more SSDV capabilities (like finding all ordinals for a given prefix).

> Add hierarchical labels to SSDV facets
> --------------------------------------
>
>                 Key: LUCENE-10250
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10250
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Marc D'Mello
>            Priority: Major
>              Labels: discussion
>
> Hi all,
> I recently [added a new benchmarking task|https://github.com/mikemccand/luceneutil/issues/141]
> to {{luceneutil}} to count facets on a random word chosen from each document,
> which gives us a much higher-cardinality faceting benchmark than the faceting
> benchmarks we already had.
> After it was merged, [~mikemccand] pointed out some
> [interesting results|https://home.apache.org/~mikemccand/lucenebench/BrowseRandomLabelTaxoFacets.html]
> in the nightly benchmarks, where the {{BrowseRandomLabelSSDVFacets}} task was
> much faster than the {{BrowseRandomLabelTaxoFacets}} task.
> I was thinking that using SSDV facets instead of taxonomy facets for our use
> case at Amazon Product Search could potentially lead to some increases in QPS
> and decreases in index size, but the issue is that we use hierarchical labels,
> and as I understand it, SSDV faceting only supports a 2 level hierarchy as of
> today. This leads to my question: why is there a limitation like this on
> SSDV facets? Are hierarchical labels just a feature that hasn't been
> implemented in SSDV facets yet, or is there some more complex reason that we
> can't add hierarchical labels to SSDV facets?
> Thanks!

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
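To make the "ordinal range for any given path prefix" idea from the comment concrete, here is a minimal Java sketch. It is not Lucene code and does not use any Lucene API: the class {{PrefixOrdinalRange}}, its methods, and the sample labels are hypothetical. It assumes each full path is stored as a single label whose components are joined by a delimiter character ({{\u001F}} is assumed here), and that ordinals follow the sorted order of those labels. Under those assumptions, all descendants of a path prefix occupy one contiguous ordinal range, which two binary searches can find.

```java
import java.util.Arrays;

/** Hypothetical sketch (not Lucene API): ordinal range of all labels below a path prefix. */
final class PrefixOrdinalRange {
  // Assumed separator between path components in a stored label.
  static final char DELIM = '\u001F';

  /** Joins path components into one label using the assumed delimiter. */
  static String label(String... path) {
    return String.join(String.valueOf(DELIM), path);
  }

  /**
   * Returns {firstOrd, lastOrd} (inclusive) covering every label that lies strictly
   * below the given path, or null if there is none. sortedLabels must be sorted;
   * the array index plays the role of the ordinal.
   */
  static int[] ordRange(String[] sortedLabels, String... path) {
    String lower = label(path) + DELIM;              // smallest possible descendant
    String upper = label(path) + (char) (DELIM + 1); // just past the last descendant
    int start = lowerBound(sortedLabels, lower);
    int end = lowerBound(sortedLabels, upper) - 1;
    return start <= end ? new int[] {start, end} : null;
  }

  /** Index of the first element >= key (sorted.length if there is none). */
  static int lowerBound(String[] sorted, String key) {
    int lo = 0, hi = sorted.length;
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (sorted[mid].compareTo(key) < 0) lo = mid + 1; else hi = mid;
    }
    return lo;
  }

  public static void main(String[] args) {
    String[] labels = {
      label("Electronics"),
      label("Electronics", "Audio"),
      label("Electronics", "Audio", "Headphones"),
      label("Electronics", "Audio", "Speakers"),
      label("Electronics", "Video"),
      label("Home"),
      label("Home", "Kitchen"),
    };
    Arrays.sort(labels); // ordinals follow sorted label order
    System.out.println(Arrays.toString(ordRange(labels, "Electronics")));          // [1, 4]
    System.out.println(Arrays.toString(ordRange(labels, "Electronics", "Audio"))); // [2, 3]
  }
}
```

Note that the range covers all descendants, not only direct children, so a "top-n children" request would still need to roll counts up to, or filter down to, the immediate child level. The sketch also compares labels with {{String.compareTo}}, which matches byte order only for simple (for example ASCII) labels.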