[jira] [Commented] (LUCENE-10250) Add hierarchical labels to SSDV facets

Robert Muir (Jira) Mon, 22 Nov 2021 17:09:06 -0800


    [ 
https://issues.apache.org/jira/browse/LUCENE-10250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17447712#comment-17447712
 ]


Robert Muir commented on LUCENE-10250:
--------------------------------------

{quote}
We would need a general way to determine the ordinal range for any given path 
prefix in order to generalize to hierarchical data (to Robert's point). So when 
the user, for example, requests the "top-n" child values for a given 
hierarchical path, we can efficiently get the counts for all "child" ordinals.
{quote}

Right, because values are sorted, you can compute the ordinal range of some 
prefix (or startTerm/endTerm) using SDV/SSDV's {{lookupTerm(BytesRef)}}: 

https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/index/SortedSetDocValues.java#L73

You can look at the solr faceting code for exact details of how they do it for 
a prefix.
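
As a rough illustration of the prefix lookup (not the Solr code, and not using 
Lucene itself): the sketch below uses a sorted {{String[]}} as a stand-in for the 
term dictionary, and {{Arrays.binarySearch}} as a stand-in for {{lookupTerm}}, 
since both return the ordinal when the key exists and {{-insertionPoint-1}} 
otherwise. The class name, the {{ordRange}} helper, and the U+FFFF sentinel are 
assumptions made for the example. Real doc values compare UTF-8 bytes, so a real 
implementation would need a sentinel that sorts last in that byte order.

```java
import java.util.Arrays;

// Hypothetical sketch: a sorted String[] plays the role of the sorted term dictionary.
final class PrefixOrdRange {

    // Stand-in for lookupTerm: ordinal if the key exists, else -(insertionPoint) - 1.
    static int lookupTerm(String[] sortedTerms, String key) {
        return Arrays.binarySearch(sortedTerms, key);
    }

    // Returns {startOrd, endOrd}, endOrd exclusive, covering every term that starts
    // with the prefix. Assumes no indexed term contains the sentinel char U+FFFF.
    static int[] ordRange(String[] sortedTerms, String prefix) {
        int start = lookupTerm(sortedTerms, prefix);
        if (start < 0) {
            start = -start - 1; // first term that sorts at or after the prefix
        }
        int end = lookupTerm(sortedTerms, prefix + '\uffff');
        if (end < 0) {
            end = -end - 1; // first term that sorts after everything with the prefix
        }
        return new int[] {start, end};
    }

    public static void main(String[] args) {
        String[] terms = {"a/b", "a/b/c", "a/b/d", "a/e", "f/g"};
        System.out.println(Arrays.toString(ordRange(terms, "a/b/"))); // prints [1, 3]
    }
}
```

Two binary searches give the whole range, so the cost does not depend on how 
many terms share the prefix.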

But at a high level, yeah, the idea here is that if you index terms in a 
certain way, due to the sorted order, we really do effectively have a tree 
already, and we can drill down specific paths of it.
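
To make the "tree already" point concrete, here is a minimal sketch, assuming 
each term is a full path with {{/}} as the separator and that a per-ordinal count 
array has already been collected. The {{ChildCounts}} class, the {{childCounts}} 
helper, and the {{/}} delimiter are all hypothetical, and a plain sorted 
{{String[]}} stands in for the term dictionary. It walks the contiguous block of 
terms under a parent path and rolls the counts up to each immediate child. 
Summing like this over-counts if one document carries several values under the 
same child, so treat it as an illustration of the ordinal layout rather than a 
faceting implementation.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: sortedTerms[ord] is the term for that ordinal,
// counts[ord] is the number of hits already collected for it.
final class ChildCounts {

    static Map<String, Integer> childCounts(String[] sortedTerms, int[] counts, String parent) {
        String prefix = parent + "/";
        // Binary search for the first term at or after the prefix.
        int start = Arrays.binarySearch(sortedTerms, prefix);
        if (start < 0) {
            start = -start - 1;
        }
        Map<String, Integer> result = new LinkedHashMap<>();
        // Terms sharing the prefix are contiguous, so stop at the first mismatch.
        for (int ord = start; ord < sortedTerms.length; ord++) {
            String term = sortedTerms[ord];
            if (!term.startsWith(prefix)) {
                break;
            }
            // The immediate child is the path component right after the prefix.
            int next = term.indexOf('/', prefix.length());
            String child = next < 0
                ? term.substring(prefix.length())
                : term.substring(prefix.length(), next);
            result.merge(child, counts[ord], Integer::sum);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] terms = {"a/b", "a/b/c", "a/b/d", "a/e", "f/g"};
        int[] counts = {1, 2, 3, 4, 5};
        System.out.println(childCounts(terms, counts, "a")); // prints {b=6, e=4}
    }
}
```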

> Add hierarchical labels to SSDV facets
> --------------------------------------
>
>                 Key: LUCENE-10250
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10250
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Marc D'Mello
>            Priority: Major
>              Labels: discussion
>
> Hi all,
> I recently [added a new benchmarking 
> task|https://github.com/mikemccand/luceneutil/issues/141] to {{luceneutil}} 
> to count facets on a random word chosen from each document, which gives us a 
> much higher-cardinality facet benchmark than the faceting benchmarks we 
> already had. After it was merged, [~mikemccand] pointed out some 
> [interesting 
> results|https://home.apache.org/~mikemccand/lucenebench/BrowseRandomLabelTaxoFacets.html]
>  in the nightly benchmarks where the {{BrowseRandomLabelSSDVFacets}} task was 
> much faster than the {{BrowseRandomLabelTaxoFacets}} task.
> I was thinking that using SSDV facets instead of taxonomy facets for our use 
> case at Amazon Product Search could potentially lead to some increases in QPS 
> and decreases in index size, but the issue is we use hierarchical labels, and 
> as I understand it, SSDV faceting only supports a 2 level hierarchy as of 
> today. This leads to my question: why is there a limitation like this on 
> SSDV facets? Are hierarchical labels just a feature that hasn't been 
> implemented in SSDV facets yet, or is there some more complex reason that we 
> can't add hierarchical labels to SSDV facets?
> Thanks!



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
