[jira] [Commented] (LUCENE-10250) Add hierarchical labels to SSDV facets

Greg Miller (Jira) Mon, 22 Nov 2021 15:13:05 -0800


    [ https://issues.apache.org/jira/browse/LUCENE-10250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17447673#comment-17447673 ]


Greg Miller commented on LUCENE-10250:
--------------------------------------

{quote}I think it would be good to turn the problem around, e.g. why 
does/should SSDV facets do something special here, and if so, what is really 
needed?
{quote}
+1
{quote}SSDV is structured behind the scenes to support a variety of stuff: 
faceting, sorting, grouping, runtime functions, etc. I'm currently not 
convinced we have to really modify it in a special way to do such stuff: 
particularly if the problems can be solved by just indexing data differently. 
We wouldn't want for a more esoteric use-case to add performance costs to 
everyone?
{quote}
Also +1. I don't think there's anything in SSDV itself that would prevent us 
from doing this (nor any changes we'd need to propose there). I do believe, 
though, that a few places in the _faceting-specific_ implementation built on 
top of SSDV assume "flat" data (i.e., they assume the string in the SSDV field 
has the form "dim/value"). That's what I think we need to look at in more 
detail.
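To make the "flat" assumption concrete, here is a minimal, self-contained Java sketch. It does not use Lucene's API or reproduce Lucene's actual label encoding: the class name, the helper methods, and the `/` delimiter are hypothetical, and escaping of values that contain the delimiter is ignored. It contrasts a reader that splits a label only on its first delimiter (the "dim/value" assumption) with a hierarchy-aware counter that credits a full path and each of its ancestors.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch only: not Lucene's API or its actual label encoding.
class FlatVsHierarchicalLabels {
    // Hypothetical delimiter; escaping of values that contain it is ignored.
    static final char DELIM = '/';

    // Joins a dimension and a path of any depth into one label string,
    // e.g. ("Category", "Electronics", "Laptops") -> "Category/Electronics/Laptops".
    static String encode(String dim, String... path) {
        StringBuilder sb = new StringBuilder(dim);
        for (String component : path) {
            sb.append(DELIM).append(component);
        }
        return sb.toString();
    }

    // "Flat" reader: splits only on the first delimiter, i.e. it assumes every
    // label is exactly "dim/value". Deeper paths collapse into one opaque value.
    static String[] parseFlat(String label) {
        int idx = label.indexOf(DELIM);
        if (idx < 0) {
            throw new IllegalArgumentException("label has no dimension: " + label);
        }
        return new String[] {label.substring(0, idx), label.substring(idx + 1)};
    }

    // Hierarchy-aware counter: treats each label as coming from a separate
    // document and adds one count to the full path and to every ancestor path
    // below the dimension, so counts exist at every level of the tree.
    static Map<String, Integer> countAllLevels(List<String> labels) {
        Map<String, Integer> counts = new TreeMap<>();
        for (String label : labels) {
            // Skip the bare dimension; start at the first "dim/value" prefix.
            int idx = label.indexOf(DELIM);
            while (idx >= 0) {
                int next = label.indexOf(DELIM, idx + 1);
                String prefix = next < 0 ? label : label.substring(0, next);
                counts.merge(prefix, 1, Integer::sum);
                idx = next;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> labels = List.of(
                encode("Category", "Electronics", "Laptops"),
                encode("Category", "Electronics", "Phones"),
                encode("Category", "Books"));

        String[] flat = parseFlat(labels.get(0));
        // Prints: Category | Electronics/Laptops
        System.out.println(flat[0] + " | " + flat[1]);

        // Prints: {Category/Books=1, Category/Electronics=2,
        //          Category/Electronics/Laptops=1, Category/Electronics/Phones=1}
        System.out.println(countAllLevels(labels));
    }
}
```

With these three sample labels, the flat reader returns "Electronics/Laptops" as a single opaque value, while the prefix counter reports a count of 2 for "Category/Electronics". Rolling counts up to ancestors at count time is only one possible design; indexing each ancestor path as its own label would be another.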

 

> Add hierarchical labels to SSDV facets
> --------------------------------------
>
>                 Key: LUCENE-10250
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10250
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Marc D'Mello
>            Priority: Major
>              Labels: discussion
>
> Hi all,
> I recently [added a new benchmarking 
> task|https://github.com/mikemccand/luceneutil/issues/141] to {{luceneutil}} 
> that counts facets on a random word chosen from each document, which gives 
> us a much higher-cardinality faceting benchmark than the ones we already 
> had. After it was merged, [~mikemccand] pointed out some 
> [interesting 
> results|https://home.apache.org/~mikemccand/lucenebench/BrowseRandomLabelTaxoFacets.html]
>  in the nightly benchmarks, where the {{BrowseRandomLabelSSDVFacets}} task was 
> much faster than the {{BrowseRandomLabelTaxoFacets}} task.
> I was thinking that using SSDV facets instead of taxonomy facets for our use 
> case at Amazon Product Search could potentially increase QPS and reduce 
> index size, but the issue is that we use hierarchical labels, and, as I 
> understand it, SSDV faceting only supports a two-level hierarchy today. 
> This leads to my question: why is there a limitation like this on SSDV 
> facets? Are hierarchical labels just a feature that hasn't been 
> implemented in SSDV facets yet, or is there a more complex reason that we 
> can't add hierarchical labels to SSDV facets?
> Thanks!



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
