[ https://issues.apache.org/jira/browse/LUCENE-10250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17448306#comment-17448306 ]
Robert Muir commented on LUCENE-10250:
--------------------------------------

yeah, i think you did the right thing to get some basic high-cardinality faceting tested with the main wikipedia.

perhaps another (easier) option for the future would be to use another dataset such as geonames, and have a simpler standalone benchmark, more along the lines of http://people.apache.org/~mikemccand/geobench.html ? it could just do faceting, and really "target" faceting specifically: things like index speed, index size, faceting speed, RAM usage, or whatever. It wouldn't need to do other stuff like indexing title/text bodies, which adds noise.

I just mention geonames because it has obvious simple hierarchical facets too: country, admin1 (province/state), admin2 (municipality/county), etc. And we are using it in benchmarking already:
https://download.geonames.org/export/dump/readme.txt

> Add hierarchical labels to SSDV facets
> --------------------------------------
>
>                 Key: LUCENE-10250
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10250
>             Project: Lucene - Core
>          Issue Type: Improvement
>            Reporter: Marc D'Mello
>            Priority: Major
>              Labels: discussion
>
> Hi all,
> I recently [added a new benchmarking
> task|https://github.com/mikemccand/luceneutil/issues/141] to {{luceneutil}}
> to count facets on a random word chosen from each document, which gives
> us a very high cardinality facet benchmark compared to the faceting
> benchmarks we already had. After it was merged, [~mikemccand] pointed out some
> [interesting
> results|https://home.apache.org/~mikemccand/lucenebench/BrowseRandomLabelTaxoFacets.html]
> in the nightly benchmarks where the {{BrowseRandomLabelSSDVFacets}} task was
> much faster than the {{BrowseRandomLabelTaxoFacets}} task.
> I was thinking that using SSDV facets instead of taxonomy facets for our use
> case at Amazon Product Search could potentially lead to some increases in QPS
> and decreases in index size, but the issue is we use hierarchical labels, and
> as I understand it, SSDV faceting only supports a 2 level hierarchy as of
> today. This leads to my question: why is there a limitation like this on
> SSDV facets? Are hierarchical labels just a feature that hasn't been
> implemented in SSDV facets yet, or is there some more complex reason that we
> can't add hierarchical labels to SSDV facets?
> Thanks!