[
https://issues.apache.org/jira/browse/LUCENE-10250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17448219#comment-17448219
]
Greg Miller commented on LUCENE-10250:
--------------------------------------
The more I've thought about this, the more I like it actually. I'd like to add
a "bigger plus one" (++1?) to exploring this idea. Taxonomy faceting is
super useful but seems a little "clunky" to me in its current form (for lack of
a better adjective). The whole idea of using a side-car Lucene index to store
taxonomy information is neat/creative, but feels slightly "off". We've recently
chased optimizations to how we encode information in this index (and in docs)
(see: LUCENE-9450, LUCENE-10062, LUCENE-10122). We also rely on specific merge
policies to ensure ordinal stability in the taxonomy index (see conversation
[here|https://github.com/apache/lucene/pull/442#discussion_r750653939]). While
I recognize that there are a couple fundamental differences between
taxonomy-based faceting and SSDV faceting, supporting hierarchical labels with
SSDV faceting would go a long way towards closing the gap between the two
implementations. If we could reach feature parity between these two approaches
by supporting hierarchies of arbitrary depth, I think the only real difference
becomes the "ordinal mapping" penalty: SSDV still needs to apply a global
ordinal mapping, while taxonomy-based faceting does some extra work when
merging segments to avoid the need for a global mapping at query time.
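To make the "ordinal mapping" cost concrete, here is a minimal sketch in plain Java of what a global mapping does conceptually: each segment numbers its own sorted labels, and counts are translated into a shared ordinal space when they are combined. The class and method names (GlobalOrdinalSketch, buildGlobalMapping, mergeCounts) are hypothetical and exist only for illustration; this is not Lucene's actual implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration only; not Lucene's ordinal-mapping code.
class GlobalOrdinalSketch {

    /**
     * For each segment, builds an array that maps segment-local ordinals to
     * global ordinals. Each segment's label list is assumed to be sorted and
     * free of duplicates, like a sorted-set term dictionary.
     */
    static int[][] buildGlobalMapping(List<List<String>> segmentLabels) {
        // Sorted union of all labels; a label's position is its global ordinal.
        TreeSet<String> union = new TreeSet<>();
        for (List<String> labels : segmentLabels) {
            union.addAll(labels);
        }
        List<String> global = new ArrayList<>(union);

        int[][] mapping = new int[segmentLabels.size()][];
        for (int seg = 0; seg < segmentLabels.size(); seg++) {
            List<String> labels = segmentLabels.get(seg);
            mapping[seg] = new int[labels.size()];
            for (int ord = 0; ord < labels.size(); ord++) {
                mapping[seg][ord] = Collections.binarySearch(global, labels.get(ord));
            }
        }
        return mapping;
    }

    /** Sums per-segment counts (indexed by local ordinal) into global ordinals. */
    static int[] mergeCounts(int[][] mapping, int[][] segmentCounts, int globalSize) {
        int[] counts = new int[globalSize];
        for (int seg = 0; seg < mapping.length; seg++) {
            for (int ord = 0; ord < mapping[seg].length; ord++) {
                counts[mapping[seg][ord]] += segmentCounts[seg][ord];
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<List<String>> segments = Arrays.asList(
            Arrays.asList("apple", "cherry"),    // segment 0: apple=0, cherry=1
            Arrays.asList("banana", "cherry"));  // segment 1: banana=0, cherry=1
        int[][] mapping = buildGlobalMapping(segments);
        int[] counts = mergeCounts(mapping, new int[][] {{3, 1}, {2, 4}}, 3);
        System.out.println(Arrays.toString(counts)); // prints [3, 2, 5]
    }
}
```

The extra indirection through the mapping array is the work that a single shared ordinal space avoids when counts are combined.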
This is all a (fairly rambling) way of saying that I'm all in favor of trying
to reach feature parity between taxonomy-based faceting and SSDV faceting in
hopes that the two ideas could converge at some point into one implementation.
Maybe that's overly optimistic, but I'll hold out hope. This would be a nice
step in that direction.
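As a rough picture of how hierarchies of arbitrary depth might sit on top of a sorted term dictionary, the sketch below stores every prefix of a path as its own term, joined with a low-sorting delimiter, so that all descendants of a node fall into one contiguous sorted range. Everything here is an assumption made for illustration (the HierarchicalLabelSketch class, the addPath and children helpers, the choice of delimiter, and the requirement that path components are non-empty and never contain the delimiter); it is not a description of how SSDV faceting is or will be implemented.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration only; not an existing Lucene API.
class HierarchicalLabelSketch {

    // Assumed delimiter: a control character that sorts below printable text.
    static final char DELIM = '\u001F';

    /** Adds a path and all of its ancestors as separate sorted terms. */
    static void addPath(TreeSet<String> terms, String... components) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < components.length; i++) {
            if (i > 0) {
                sb.append(DELIM);
            }
            sb.append(components[i]);
            terms.add(sb.toString());
        }
    }

    /** Returns the immediate children of the given parent path, in sorted order. */
    static List<String> children(TreeSet<String> terms, String... parent) {
        String joined = String.join(String.valueOf(DELIM), parent);
        String lower = joined + DELIM;               // first possible descendant
        String upper = joined + (char) (DELIM + 1);  // just past the last descendant
        List<String> result = new ArrayList<>();
        // All descendants share the prefix "parent + DELIM", so they form one
        // contiguous range in the sorted set.
        for (String term : terms.subSet(lower, upper)) {
            String rest = term.substring(lower.length());
            if (rest.indexOf(DELIM) < 0) {
                result.add(rest); // keep direct children, skip deeper levels
            }
        }
        return result;
    }

    public static void main(String[] args) {
        TreeSet<String> terms = new TreeSet<>();
        addPath(terms, "Electronics", "Phones", "Android");
        addPath(terms, "Electronics", "Laptops");
        addPath(terms, "Books", "Fiction");
        System.out.println(children(terms, "Electronics"));           // prints [Laptops, Phones]
        System.out.println(children(terms, "Electronics", "Phones")); // prints [Android]
    }
}
```

Because sorted position stands in for the ordinal, the children of a node can be found by scanning one ordinal range rather than by consulting a separate index.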
> Add hierarchical labels to SSDV facets
> --------------------------------------
>
> Key: LUCENE-10250
> URL: https://issues.apache.org/jira/browse/LUCENE-10250
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Marc D'Mello
> Priority: Major
> Labels: discussion
>
> Hi all,
> I recently [added a new benchmarking
> task|https://github.com/mikemccand/luceneutil/issues/141] to {{luceneutil}}
> to count facets on a random word chosen from each document, which gives
> us a much higher-cardinality faceting benchmark than the faceting
> benchmarks we already had. After it was merged, [~mikemccand] pointed out some
> [interesting
> results|https://home.apache.org/~mikemccand/lucenebench/BrowseRandomLabelTaxoFacets.html]
> in the nightly benchmarks where the {{BrowseRandomLabelSSDVFacets}} task was
> much faster than the {{BrowseRandomLabelTaxoFacets}} task.
> I was thinking that using SSDV facets instead of taxonomy facets for our use
> case at Amazon Product Search could potentially lead to some increases in QPS
> and decreases in index size, but the issue is we use hierarchical labels, and
> as I understand it, SSDV faceting only supports a 2-level hierarchy as of
> today. This leads to my question: why is there a limitation like this on
> SSDV facets? Are hierarchical labels just a feature that hasn't been
> implemented in SSDV facets yet, or is there some more complex reason that we
> can't add hierarchical labels to SSDV facets?
> Thanks!