[
https://issues.apache.org/jira/browse/LUCENE-10250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17447673#comment-17447673
]
Greg Miller commented on LUCENE-10250:
--------------------------------------
{quote}I think it would be good to turn the problem around, e.g. why
does/should SSDV facets do something special here, and if so, what is really
needed?
{quote}
+1
{quote}SSDV is structured behind the scenes to support a variety of stuff:
faceting, sorting, grouping, runtime functions, etc. I'm currently not
convinced we have to really modify it in a special way to do such stuff:
particularly if the problems can be solved by just indexing data differently.
We wouldn't want for a more esoteric use-case to add performance costs to
everyone?
{quote}
Also +1. I don't think there's anything in SSDV itself that would prevent us
from doing this, and I have no changes to propose there. I do believe, though,
that a few places in the _faceting-specific_ implementation on top of SSDV
assume "flat" data (i.e., they assume the string in the SSDV field has the
form "dim/value"). That's what I think we need to look at in a bit more
detail.
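
To make the "flat" assumption concrete, here is a minimal sketch in plain Java. It is not Lucene's code: the class name, the method names, and the use of '/' as the separator are all hypothetical, chosen only to mirror the "dim/value" form mentioned above (the real implementation may use a different separator). It contrasts a parser that insists on exactly two components with a helper that lists every ancestor path a hierarchy-aware counter would have to credit.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: hypothetical names, not part of Lucene.
final class FacetLabelSketch {
    // Assumed separator, matching the "dim/value" notation in the comment above.
    static final char DELIM = '/';

    // Split a stored label into its path components, keeping empty components.
    static String[] splitPath(String stored) {
        return stored.split(String.valueOf(DELIM), -1);
    }

    // "Flat" parsing: accepts only dim/value and rejects anything deeper.
    static String[] parseFlat(String stored) {
        String[] parts = splitPath(stored);
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected dim/value, got: " + stored);
        }
        return parts;
    }

    // Hierarchical view: every prefix of the path, from the dimension down to
    // the full label. A counter that supports hierarchies would need to credit
    // a matching document to each of these.
    static List<String> ancestorPaths(String stored) {
        String[] parts = splitPath(stored);
        List<String> out = new ArrayList<>();
        StringBuilder prefix = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                prefix.append(DELIM);
            }
            prefix.append(parts[i]);
            out.add(prefix.toString());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" | ", parseFlat("Author/Bob")));
        System.out.println(ancestorPaths("Category/Electronics/Phones"));
        try {
            parseFlat("Category/Electronics/Phones");
        } catch (IllegalArgumentException e) {
            System.out.println("flat parser rejected: " + e.getMessage());
        }
    }
}
```

In this sketch nothing about the stored string itself stops a deeper path from being written. Only the two-component check in the reading code does, which is the kind of faceting-side assumption that would need auditing.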
> Add hierarchical labels to SSDV facets
> --------------------------------------
>
> Key: LUCENE-10250
> URL: https://issues.apache.org/jira/browse/LUCENE-10250
> Project: Lucene - Core
> Issue Type: Improvement
> Reporter: Marc D'Mello
> Priority: Major
> Labels: discussion
>
> Hi all,
> I recently [added a new benchmarking
> task|https://github.com/mikemccand/luceneutil/issues/141] to {{luceneutil}}
> to count facets on a random word chosen from each document, which gives us a
> much higher-cardinality facet benchmark than the faceting benchmarks we
> already had. After it was merged, [~mikemccand] pointed out some
> [interesting
> results|https://home.apache.org/~mikemccand/lucenebench/BrowseRandomLabelTaxoFacets.html]
> in the nightly benchmarks where the {{BrowseRandomLabelSSDVFacets}} task was
> much faster than the {{BrowseRandomLabelTaxoFacets}} task.
> I was thinking that using SSDV facets instead of taxonomy facets for our use
> case at Amazon Product Search could potentially lead to some increases in QPS
> and decreases in index size, but the issue is we use hierarchical labels, and
> as I understand it, SSDV faceting only supports a two-level hierarchy as of
> today. This leads to my question: why is there a limitation like this on
> SSDV facets? Are hierarchical labels just a feature that hasn't been
> implemented in SSDV facets yet, or is there some more complex reason that we
> can't add hierarchical labels to SSDV facets?
> Thanks!