[jira] [Commented] (LUCENE-10250) Add hierarchical labels to SSDV facets

Robert Muir (Jira) Tue, 23 Nov 2021 15:51:26 -0800


    [ 
https://issues.apache.org/jira/browse/LUCENE-10250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17448306#comment-17448306
 ]


Robert Muir commented on LUCENE-10250:
--------------------------------------

yeah, i think you did the right thing to get some basic high-cardinality 
faceting tested with the main wikipedia.

perhaps another (easier) option for the future would be to use another dataset 
such as geonames, and have a simpler standalone benchmark, more along the 
lines of http://people.apache.org/~mikemccand/geobench.html ?

it could just do faceting, and really "target" faceting specifically: things 
like indexing speed, index size, faceting speed, RAM usage, or whatever. It 
wouldn't need to do other stuff like indexing title/text bodies, which only 
adds noise.

I just mention geonames because it has obvious simple hierarchical facets too: 
country, admin1 (province/state), admin2 (municipality/county), etc. And we are 
already using it in benchmarking. Dataset readme: 
https://download.geonames.org/export/dump/readme.txt
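
A rough sketch of how the country/admin1/admin2 hierarchy might be pulled out of the geonames dump and counted per level. It assumes the tab-separated column layout described in the readme (country code at index 8, admin1 code at index 10, admin2 code at index 11). The function names `facet_path` and `count_prefixes` are made up for illustration; this is plain Python, not luceneutil or Lucene code.

```python
from collections import Counter

# Column positions assumed from the geonames dump readme (tab-separated rows).
COUNTRY, ADMIN1, ADMIN2 = 8, 10, 11


def facet_path(line):
    """Return the hierarchical path [country, admin1, admin2] for one row,
    truncated at the first empty level."""
    cols = line.rstrip("\n").split("\t")
    path = []
    for idx in (COUNTRY, ADMIN1, ADMIN2):
        value = cols[idx] if idx < len(cols) else ""
        if not value:
            break  # stop at the first missing level so paths stay contiguous
        path.append(value)
    return path


def count_prefixes(paths):
    """Count every prefix of every path, so a parent's count includes
    all of its descendants."""
    counts = Counter()
    for path in paths:
        for depth in range(1, len(path) + 1):
            counts[tuple(path[:depth])] += 1
    return counts
```

Counting every prefix is what makes the facet hierarchical: a row under `("US", "CA", "037")` also contributes to `("US", "CA")` and `("US",)`, which is the kind of roll-up a standalone faceting benchmark would exercise.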


> Add hierarchical labels to SSDV facets
> --------------------------------------
>
>                 Key: LUCENE-10250
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10250
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Marc D'Mello
>            Priority: Major
>              Labels: discussion
>
> Hi all,
> I recently [added a new benchmarking 
> task|https://github.com/mikemccand/luceneutil/issues/141] to {{luceneutil}} 
> that counts facets on a random word chosen from each document, which gives 
> us a much higher-cardinality faceting benchmark than the ones we already 
> had. After it was merged, [~mikemccand] pointed out some 
> [interesting 
> results|https://home.apache.org/~mikemccand/lucenebench/BrowseRandomLabelTaxoFacets.html]
>  in the nightly benchmarks, where the {{BrowseRandomLabelSSDVFacets}} task was 
> much faster than the {{BrowseRandomLabelTaxoFacets}} task.
> I was thinking that using SSDV facets instead of taxonomy facets for our use 
> case at Amazon Product Search could potentially increase QPS and decrease 
> index size, but the issue is that we use hierarchical labels and, as I 
> understand it, SSDV faceting only supports a 2-level hierarchy as of today. 
> This leads to my question: why is there a limitation like this on SSDV 
> facets? Are hierarchical labels just a feature that hasn't been implemented 
> in SSDV facets yet, or is there some more complex reason that we can't add 
> hierarchical labels to SSDV facets?
> Thanks!



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

