[ https://issues.apache.org/jira/browse/LUCENE-9950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17355413#comment-17355413 ]
Greg Miller commented on LUCENE-9950:
-------------------------------------

Thanks for adding your additional insights and use-case details [~sqshq]!

As for specific use cases and performance comparison to SSDVFC, I'll describe what I was thinking (but unfortunately don't have any benchmark data for performance). In cases where a large majority of all possible dims need to be counted, it should be more efficient to pack them into a single field, allowing the matching docs to be iterated a single time (counting all dims/values along the way and probably getting some locality benefits as you mention). On the other hand, if only a small percentage of all available dims need to be counted, a lot of wasteful counting/computation takes place during the single counting iteration. In these cases, using a separate field-per-dimension means iterating the hits multiple times but not doing any unnecessary dim/value counting. Seems like there could be use cases for both approaches.

To be honest, I approached this more from the standpoint of being able to count doc value fields with any string data in them (unlike SSDVFC which assumes a specific format involving a dim). This is a nice functional complement to something like {{LongValueFacetCounts}}.

It would be really interesting to do some performance benchmarking though!
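To make the trade-off above concrete, here is a toy Java model of the two layouts. It is only a sketch and does not use any Lucene APIs: the class {{FacetCountingSketch}}, its methods, the sample data, and the two cost measures (passes over the hits, counter increments) are all hypothetical and chosen for illustration. Documents are modelled as in-memory maps from dim to values, and dim names are assumed not to contain '/'.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

// Toy model only: plain collections stand in for doc values; nothing here is Lucene code.
final class FacetCountingSketch {

    /** Counts plus two rough cost measures for one counting strategy. */
    static final class Result {
        final Map<String, Integer> counts = new TreeMap<>();
        int hitPasses;   // how many times the hit list was iterated
        int increments;  // how many counter updates were performed
    }

    /** Packed layout: one pass over the hits, counting every dim/value encountered. */
    static Result countPacked(List<Map<String, List<String>>> hits, Set<String> wantedDims) {
        Result r = new Result();
        Map<String, Integer> all = new TreeMap<>();
        r.hitPasses = 1;
        for (Map<String, List<String>> doc : hits) {
            for (Map.Entry<String, List<String>> e : doc.entrySet()) {
                for (String value : e.getValue()) {
                    all.merge(e.getKey() + "/" + value, 1, Integer::sum);
                    r.increments++;
                }
            }
        }
        // Only the requested dims are reported; counting the others was wasted work.
        for (Map.Entry<String, Integer> e : all.entrySet()) {
            String dim = e.getKey().substring(0, e.getKey().indexOf('/'));
            if (wantedDims.contains(dim)) {
                r.counts.put(e.getKey(), e.getValue());
            }
        }
        return r;
    }

    /** Field-per-dimension layout: one pass over the hits per requested dim, no extra counting. */
    static Result countPerField(List<Map<String, List<String>>> hits, Set<String> wantedDims) {
        Result r = new Result();
        for (String dim : wantedDims) {
            r.hitPasses++;
            for (Map<String, List<String>> doc : hits) {
                for (String value : doc.getOrDefault(dim, List.of())) {
                    r.counts.merge(dim + "/" + value, 1, Integer::sum);
                    r.increments++;
                }
            }
        }
        return r;
    }

    /** Three made-up hits with three dims; "size" is multi-valued on the first hit. */
    static List<Map<String, List<String>>> sampleHits() {
        return List.of(
            Map.of("color", List.of("red"), "size", List.of("S", "M"), "brand", List.of("acme")),
            Map.of("color", List.of("red"), "size", List.of("M"), "brand", List.of("zen")),
            Map.of("color", List.of("blue"), "brand", List.of("acme")));
    }
}
```

In this model both strategies return the same counts. When only one of the three dims is requested, the packed pass still touches every value while the per-field pass touches only the requested ones; when all dims are requested, the increments match but the per-field approach walks the hits once per dim. This says nothing about real Lucene performance, which would still need benchmarking.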
> Support both single- and multi-value string fields in facet counting (non-taxonomy based approaches)
> ----------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-9950
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9950
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>    Affects Versions: main (9.0)
>            Reporter: Greg Miller
>            Priority: Minor
>             Fix For: main (9.0), 8.9
>
>          Time Spent: 3h
>  Remaining Estimate: 0h
>
> Users wanting to facet count string-based fields using a non-taxonomy-based
> approach can use {{SortedSetDocValueFacetCounts}}, which accumulates facet
> counts based on a {{SortedSetDocValues}} field. This requires the stored doc
> values to be multi-valued (i.e., {{SORTED_SET}}), and doesn't work on
> single-valued fields (i.e., SORTED). In contrast, if a user wants to facet
> count on a stored numeric field, they can use {{LongValueFacetCounts}}, which
> supports both single- and multi-valued fields (and in LUCENE-9948, we now
> auto-detect instead of asking the user to specify).
>
> Let's update {{SortedSetDocValueFacetCounts}} to also support, and
> automatically detect single- and multi-value fields. Note that this is a
> spin-off issue from LUCENE-9946, where [~rcmuir] points out that this can
> essentially be a one-line change, but we may want to do some class renaming
> at the same time. Also note that we should do this in
> {{ConcurrentSortedSetDocValuesFacetCounts}} while we're at it.