[jira] [Commented] (LUCENE-10250) Add hierarchical labels to SSDV facets

Greg Miller (Jira) Mon, 22 Nov 2021 16:30:04 -0800


    [ https://issues.apache.org/jira/browse/LUCENE-10250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17447710#comment-17447710 ]


Greg Miller commented on LUCENE-10250:
--------------------------------------

I took another look at the SSDV faceting code to try to jog my memory a bit 
more on where these "non-hierarchical" assumptions are applied and where 
they might get tricky, and I'd encourage you to have a look at what 
{{DefaultSortedSetDocValuesReaderState}} does at construction time. When that 
"state" object is constructed, it determines the ordinal range for every unique 
dimension, which is later used when getting counts for specific dimensions. To 
generalize to hierarchical data (to Robert's point), we would need a general 
way to determine the ordinal range for any given path prefix. That way, when 
the user requests, for example, the "top-n" child values for a given 
hierarchical path, we can efficiently get the counts for all "child" ordinals.
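To make "the ordinal range for a path prefix" concrete, here is a minimal sketch under some assumptions: labels are stored as path components joined by a delimiter (U+001F is assumed here), and SSDV ordinals are modelled as indexes into a sorted {{String[]}} rather than read through the doc values API. {{PrefixOrdRange}} and its methods are hypothetical names, not Lucene API. Because ordinals follow sorted term order, every label below a prefix lands in one contiguous range, which two binary searches can find.

```java
/**
 * Hypothetical sketch (not Lucene API): find the contiguous ordinal range of
 * all labels below a path prefix. Ordinals are modelled as indexes into a
 * sorted String[]; strings are compared with String.compareTo.
 */
final class PrefixOrdRange {
  // Assumed delimiter between path components.
  static final char DELIM = '\u001F';

  /** Joins path components into one encoded label. */
  static String encode(String... path) {
    return String.join(String.valueOf(DELIM), path);
  }

  /** Returns the first index whose label is >= key (lower bound). */
  static int lowerBound(String[] sorted, String key) {
    int lo = 0, hi = sorted.length;
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (sorted[mid].compareTo(key) < 0) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return lo;
  }

  /**
   * Returns {start, end} (end exclusive) covering every label that begins
   * with path + DELIM, i.e. all descendants of the path at any depth.
   */
  static int[] descendantRange(String[] sortedLabels, String... path) {
    String base = encode(path);
    // Smallest possible descendant: the prefix followed by the delimiter.
    int start = lowerBound(sortedLabels, base + DELIM);
    // Bumping the delimiter by one sorts after every label with that prefix.
    int end = lowerBound(sortedLabels, base + (char) (DELIM + 1));
    return new int[] {start, end};
  }
}
```

Note that this returns descendants at every depth; computing "top-n" direct children would still require grouping the range by the next path component.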

But looking at the code in {{DefaultSortedSetDocValuesReaderState}}, there is 
already a TODO asking whether that mapping logic could be generalized to 
hierarchical paths. I don't see any reason why it can't, so I think there's a 
path forward there without even needing to open up more SSDV capabilities 
(like finding all ordinals for a given prefix).
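As a rough illustration of generalizing that construction-time mapping, the sketch below makes a single pass over sorted labels and records the ordinal range of descendants for every path prefix, not just every dimension. It uses the same assumptions as a plain-Java model: a U+001F delimiter and ordinals as indexes into a sorted {{String[]}}. {{PrefixRangeMap}} is a hypothetical name, and this is not how {{DefaultSortedSetDocValuesReaderState}} is actually implemented.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch: one pass over sorted labels records the ordinal range
 * [start, end) of the descendants of every path prefix, generalizing a
 * per-dimension range map to arbitrary depth.
 */
final class PrefixRangeMap {
  // Assumed delimiter between path components.
  static final char DELIM = '\u001F';

  static Map<String, int[]> build(String[] sortedLabels) {
    Map<String, int[]> ranges = new HashMap<>();
    String delimRegex = Pattern.quote(String.valueOf(DELIM));
    for (int ord = 0; ord < sortedLabels.length; ord++) {
      String[] parts = sortedLabels[ord].split(delimRegex);
      StringBuilder prefix = new StringBuilder();
      // Every proper prefix of this label is an ancestor path.
      for (int depth = 0; depth < parts.length - 1; depth++) {
        if (depth > 0) {
          prefix.append(DELIM);
        }
        prefix.append(parts[depth]);
        String key = prefix.toString();
        int[] range = ranges.get(key);
        if (range == null) {
          // First descendant seen for this prefix: open its range here.
          ranges.put(key, new int[] {ord, ord + 1});
        } else {
          // Sorted order keeps descendants contiguous, so extend the end.
          range[1] = ord + 1;
        }
      }
    }
    return ranges;
  }
}
```

The design relies only on the sort order of the labels, which is why no new prefix-lookup capability is needed: the ranges fall out of the same sequential scan that already finds per-dimension ranges.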

> Add hierarchical labels to SSDV facets
> --------------------------------------
>
>                 Key: LUCENE-10250
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10250
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Marc D'Mello
>            Priority: Major
>              Labels: discussion
>
> Hi all,
> I recently [added a new benchmarking 
> task|https://github.com/mikemccand/luceneutil/issues/141] to {{luceneutil}} 
> that counts facets on a random word chosen from each document, which gives 
> us a much higher-cardinality faceting benchmark than the ones we already 
> had. After it was merged, [~mikemccand] pointed out some [interesting 
> results|https://home.apache.org/~mikemccand/lucenebench/BrowseRandomLabelTaxoFacets.html]
>  in the nightly benchmarks where the {{BrowseRandomLabelSSDVFacets}} task was 
> much faster than the {{BrowseRandomLabelTaxoFacets}} task.
> I was thinking that using SSDV facets instead of taxonomy facets for our use 
> case at Amazon Product Search could potentially lead to some increases in QPS 
> and decreases in index size, but the issue is that we use hierarchical 
> labels, and as I understand it, SSDV faceting only supports a 2-level 
> hierarchy as of today. This leads to my question: why is there a limitation 
> like this on SSDV facets? Are hierarchical labels just a feature that hasn't 
> been implemented in SSDV facets yet, or is there some more complex reason 
> that we can't add hierarchical labels to SSDV facets?
> Thanks!



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
