[ https://issues.apache.org/jira/browse/LUCENE-10250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17447645#comment-17447645 ]
Greg Miller commented on LUCENE-10250:
--------------------------------------
I can't think of any reason off the top of my head why SSDV facet counting
couldn't support hierarchical dimensions, but here are a few places I'd suggest
digging into:
# SortedSetDocValuesFacetField, which is used to add these fields at indexing
time, appears to support only a single "flat" value, so that would need some
thought (along with the code in FacetsConfig that helps with indexing).
# I _think_ DefaultSortedSetDocValuesReaderState has some baked-in assumptions
about "flat" data. I would poke around in there first to see if anything
fundamentally prevents extending it to hierarchies.
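One way to think about whether a reader state could handle hierarchies: if each full path were indexed as a single sorted-set term, sorted term order would keep every subtree in one contiguous ordinal range, much as a flat dimension occupies a single range today. The sketch below is plain Java with no Lucene dependency and is not how Lucene currently works; `encode`, `prefixRanges`, the `DELIM` character and the sample paths are all hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch (not Lucene code): each hierarchical path is flattened into one
 * sorted-set term, and every path prefix is mapped to the ordinal range of its subtree.
 */
public class HierarchicalOrdRanges {
  // Assumed delimiter between path components (the ASCII unit separator).
  static final char DELIM = '\u001F';

  /** Flattens a path such as ("Category", "Books", "Fiction") into a single term. */
  static String encode(String... path) {
    return String.join(String.valueOf(DELIM), path);
  }

  /**
   * Maps every path prefix to its inclusive [firstOrd, lastOrd] range, given the terms
   * in sorted (ordinal) order. Assumes no label contains a character that sorts below
   * DELIM, so all terms under a prefix are contiguous.
   */
  static Map<String, int[]> prefixRanges(List<String> sortedTerms) {
    Map<String, int[]> ranges = new LinkedHashMap<>();
    for (int ord = 0; ord < sortedTerms.size(); ord++) {
      String[] parts = sortedTerms.get(ord).split(Pattern.quote(String.valueOf(DELIM)));
      StringBuilder prefix = new StringBuilder();
      for (int i = 0; i < parts.length; i++) {
        if (i > 0) {
          prefix.append(DELIM);
        }
        prefix.append(parts[i]);
        int[] range = ranges.get(prefix.toString());
        if (range == null) {
          // First ordinal seen for this prefix starts its range.
          ranges.put(prefix.toString(), new int[] {ord, ord});
        } else {
          // Ranges are contiguous, so later ordinals only extend the end.
          range[1] = ord;
        }
      }
    }
    return ranges;
  }

  public static void main(String[] args) {
    List<String> terms = new ArrayList<>(Arrays.asList(
        encode("Category", "Books", "Fiction"),
        encode("Category", "Books", "History"),
        encode("Category", "Music", "Jazz"),
        encode("Color", "Red")));
    terms.sort(null); // ordinals follow sorted term order
    for (Map.Entry<String, int[]> e : prefixRanges(terms).entrySet()) {
      System.out.println(e.getKey().replace(DELIM, '/') + " -> " + Arrays.toString(e.getValue()));
    }
  }
}
```

For the sample terms, `Category` covers ordinals 0 through 2 and `Category/Books` covers 0 through 1, so a parent's children can be located without a separate taxonomy. Note that computing a parent's count by summing its children's counts would double-count documents that have several values under the same parent, so a real implementation would need to count each document at most once per node.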
I poked around this code a fair amount a few months back, so I'll see if I can
refresh my memory and will add more info here if I come up with anything.
> Add hierarchical labels to SSDV facets
> --------------------------------------
>
> Key: LUCENE-10250
> URL: https://issues.apache.org/jira/browse/LUCENE-10250
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Marc D'Mello
> Priority: Major
> Labels: discussion
>
> Hi all,
> I recently [added a new benchmarking
> task|https://github.com/mikemccand/luceneutil/issues/141] to {{luceneutil}}
> that counts facets on a random word chosen from each document, giving us a
> very high-cardinality faceting benchmark compared to the faceting benchmarks
> we already had. After it was merged, [~mikemccand] pointed out some
> [interesting
> results|https://home.apache.org/~mikemccand/lucenebench/BrowseRandomLabelTaxoFacets.html]
> in the nightly benchmarks where the {{BrowseRandomLabelSSDVFacets}} task was
> much faster than the {{BrowseRandomLabelTaxoFacets}} task.
> I was thinking that using SSDV facets instead of taxonomy facets for our use
> case at Amazon Product Search could potentially increase QPS and decrease
> index size, but the issue is that we use hierarchical labels, and as I
> understand it, SSDV faceting only supports a two-level hierarchy today. This
> leads to my question: why does SSDV faceting have this limitation? Are
> hierarchical labels just a feature that hasn't been implemented for SSDV
> facets yet, or is there some more complex reason that we can't add
> hierarchical labels to SSDV facets?
> Thanks!
--
This message was sent by Atlassian Jira
(v8.20.1#820001)