[ 
https://issues.apache.org/jira/browse/LUCENE-10250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17447645#comment-17447645
 ] 

Greg Miller commented on LUCENE-10250:
--------------------------------------

I can't think of any reason off the top of my head that SSDV facet counting 
couldn't support hierarchical dimensions, but here are a few places I'd suggest 
digging into:
 # SortedSetDocValuesFacetField, which is used to add these fields at indexing 
time, appears to only support a single "flat" value, so that would need some 
thought (along with code in FacetsConfig that helps in the indexing).
 # I _think_ DefaultSortedSetDocValuesReaderState has some baked in assumptions 
around "flat" data. I would poke around in there as a first stop to see if 
there's anything fundamentally preventing the extension to hierarchies.
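
To make the "flat" versus hierarchical distinction concrete, here is a minimal, Lucene-free sketch of one way a multi-level path could be flattened into plain terms so that a flat counter still rolls counts up to parent levels. Everything in it is an assumption made for illustration: the class name, the {{prefixTerms}} and {{count}} helpers, and the choice of delimiter are hypothetical, and this is not how SortedSetDocValuesFacetField or DefaultSortedSetDocValuesReaderState work today.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class HierarchicalSsdvSketch {
    // Hypothetical separator; any character that cannot occur in a label would do.
    static final char DELIM = '\u001F';

    /** Expands a path such as ["Books", "Fiction"] into one flat term per prefix. */
    static List<String> prefixTerms(String dim, String... path) {
        List<String> terms = new ArrayList<>();
        StringBuilder sb = new StringBuilder(dim);
        for (String component : path) {
            sb.append(DELIM).append(component);
            terms.add(sb.toString());
        }
        return terms;
    }

    /** Counts each distinct flat term once per document (set semantics). */
    static Map<String, Integer> count(List<List<String>> docs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (List<String> docTerms : docs) {
            // De-duplicate so a prefix shared by two paths in one doc counts once.
            for (String term : new LinkedHashSet<>(docTerms)) {
                counts.merge(term, 1, Integer::sum);
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(
            prefixTerms("Category", "Books", "Fiction"),
            prefixTerms("Category", "Books", "History"),
            prefixTerms("Category", "Music"));
        // Print with '/' in place of the unprintable delimiter.
        count(docs).forEach(
            (term, n) -> System.out.println(term.replace(DELIM, '/') + " = " + n));
    }
}
```

Indexing every prefix trades extra terms per document for the ability to read a parent's count directly. Whether that trade-off suits SSDV faceting, and how child ordinals would be enumerated under a parent, is exactly the kind of thing the two classes above would need to be checked for.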

I'd poked around this code a fair amount a few months back so I'll see if I can 
refresh my memory a bit more and will add some additional info here if I come 
up with something.

> Add hierarchical labels to SSDV facets
> --------------------------------------
>
>                 Key: LUCENE-10250
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10250
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Marc D'Mello
>            Priority: Major
>              Labels: discussion
>
> Hi all,
> I recently [added a new benchmarking 
> task|https://github.com/mikemccand/luceneutil/issues/141] to {{luceneutil}} 
> to count facets on a random word chosen from each document, which gives us 
> a much higher-cardinality faceting benchmark than the ones we already had. 
> After it was merged, [~mikemccand] pointed out some 
> [interesting 
> results|https://home.apache.org/~mikemccand/lucenebench/BrowseRandomLabelTaxoFacets.html]
>  in the nightly benchmarks where the {{BrowseRandomLabelSSDVFacets}} task was 
> much faster than the {{BrowseRandomLabelTaxoFacets}} task.
> I was thinking that using SSDV facets instead of taxonomy facets for our use 
> case at Amazon Product Search could potentially lead to some increases in QPS 
> and decreases in index size, but the issue is we use hierarchical labels, and 
> as I understand it, SSDV faceting only supports a 2 level hierarchy as of 
> today. This leads to my question: why do SSDV facets have this limitation? 
> Are hierarchical labels just a feature that hasn't been implemented in SSDV 
> facets yet, or is there some more complex reason that we can't add 
> hierarchical labels to SSDV facets?
> Thanks!



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
