[ 
https://issues.apache.org/jira/browse/LUCENE-10250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17448244#comment-17448244
 ] 

Robert Muir commented on LUCENE-10250:
--------------------------------------

My suggestion: it seems to pay off to do this work in a benchmark-driven way. 
That has improved both of these faceting approaches a lot so far, and we should 
keep pressing on it.

For this particular case, it seems a bit crazy that we have to generate random 
data to make facet labels to benchmark.

We are indexing Wikipedia articles, no?

Wikipedia articles have "hierarchical categories" in the data already. For 
example, https://en.wikipedia.org/wiki/English_language has these categories:
{quote}
Categories: English language, Analytic languages, English languages, Fusional 
languages, Germanic languages, Stress-timed languages, Subject–verb–object 
languages, Cultural globalization
{quote}

And these have hierarchies: e.g. SVO-languages has sub-categories: 
https://en.wikipedia.org/wiki/Category:Subject%E2%80%93verb%E2%80%93object_languages
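
As a rough sketch of how such a hierarchy could be turned into hierarchical 
facet labels, the snippet below walks a child-to-parent map and returns the 
root-to-leaf path for a category. The map contents, the names {{TOY_PARENTS}} 
and {{category_path}}, and the single-parent assumption are all hypothetical: 
the data is a toy, not extracted from a Wikipedia dump, and real categories can 
have several parents.

```python
from typing import Dict, List, Optional

# Toy child -> parent map. Hypothetical illustration only; NOT taken from a
# real Wikipedia dump, and it assumes each category has at most one parent.
TOY_PARENTS: Dict[str, Optional[str]] = {
    "Languages": None,
    "Languages by word order": "Languages",
    "Subject–verb–object languages": "Languages by word order",
}


def category_path(category: str, parents: Dict[str, Optional[str]]) -> List[str]:
    """Return the root-to-leaf path for a category.

    A category missing from the map is treated as its own root.
    """
    path: List[str] = []
    seen = set()
    current: Optional[str] = category
    while current is not None:
        if current in seen:
            # Guard against loops in the parent map so the walk terminates.
            raise ValueError(f"cycle detected at {current!r}")
        seen.add(current)
        path.append(current)
        current = parents.get(current)
    path.reverse()  # collected leaf-to-root, so flip to root-to-leaf
    return path
```

Each returned list could then serve as the path components of one hierarchical 
label, for example the path arguments of a taxonomy {{FacetField}}.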

So I think, separately, it would be great to think about improving benchmarking 
with a more realistic use-case to drive decisions and tuning.

> Add hierarchical labels to SSDV facets
> --------------------------------------
>
>                 Key: LUCENE-10250
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10250
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Marc D'Mello
>            Priority: Major
>              Labels: discussion
>
> Hi all,
> I recently [added a new benchmarking 
> task|https://github.com/mikemccand/luceneutil/issues/141] to {{luceneutil}} 
> to count facets on a random word chosen from each document, which gives 
> us a much higher-cardinality facet benchmark than the faceting 
> benchmarks we already had. After it was merged, [~mikemccand] pointed out some 
> [interesting 
> results|https://home.apache.org/~mikemccand/lucenebench/BrowseRandomLabelTaxoFacets.html]
>  in the nightly benchmarks where the {{BrowseRandomLabelSSDVFacets}} task was 
> much faster than the {{BrowseRandomLabelTaxoFacets}} task.
> I was thinking that using SSDV facets instead of taxonomy facets for our use 
> case at Amazon Product Search could potentially lead to some increases in QPS 
> and decreases in index size, but the issue is we use hierarchical labels, and 
> as I understand it, SSDV faceting only supports a 2 level hierarchy as of 
> today. This leads to my question: why is there such a limitation on 
> SSDV facets? Are hierarchical labels just a feature that hasn't been 
> implemented in SSDV facets yet, or is there some more complex reason that we 
> can't add hierarchical labels to SSDV facets?
> Thanks!



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
