[ 
https://issues.apache.org/jira/browse/LUCENE-9950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17355413#comment-17355413
 ] 

Greg Miller commented on LUCENE-9950:
-------------------------------------

Thanks for the additional insights and use-case details [~sqshq]! As for 
specific use cases and a performance comparison with SSDVFC 
({{SortedSetDocValuesFacetCounts}}), I'll describe what I was thinking 
(unfortunately I don't have any benchmark data). When a large majority of all 
possible dims need to be counted, it should be more efficient to pack them 
into a single field, so the matching docs are iterated only once (counting all 
dims/values along the way, and probably getting some locality benefits as you 
mention). On the other hand, when only a small percentage of the available 
dims need to be counted, that single iteration does a lot of wasted counting 
for dims nobody asked for. In those cases, using a separate field per 
dimension means iterating the hits multiple times, but without any unnecessary 
dim/value counting. It seems like there could be use cases for both approaches.
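
To make the trade-off concrete, here is a rough toy model in plain Java. It 
does not use any Lucene API and is not a benchmark: documents are just 
in-memory maps of dim to values, and the class, method, and counter names 
({{FacetCountingSketch}}, {{countPacked}}, {{countPerField}}, 
{{valuesVisited}}, {{hitPasses}}) are made up for illustration. It only counts 
how many dim/value pairs each strategy touches and how many passes it makes 
over the hits.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

public class FacetCountingSketch {

    /** Counts for the requested dims plus two simple cost counters. */
    static final class Result {
        final Map<String, Map<String, Integer>> counts = new TreeMap<>();
        long valuesVisited; // dim/value pairs touched while counting
        int hitPasses;      // how many times the hit list was iterated
    }

    /** Models one packed field: a single pass that counts every dim/value. */
    static Result countPacked(List<Map<String, List<String>>> hits,
                              Set<String> requestedDims) {
        Result r = new Result();
        r.hitPasses = 1;
        for (Map<String, List<String>> doc : hits) {
            for (Map.Entry<String, List<String>> e : doc.entrySet()) {
                for (String value : e.getValue()) {
                    r.valuesVisited++;
                    r.counts.computeIfAbsent(e.getKey(), k -> new TreeMap<>())
                            .merge(value, 1, Integer::sum);
                }
            }
        }
        // Counts for dims nobody requested were wasted work; drop them.
        r.counts.keySet().retainAll(requestedDims);
        return r;
    }

    /** Models field-per-dimension: one pass over the hits per requested dim. */
    static Result countPerField(List<Map<String, List<String>>> hits,
                                Set<String> requestedDims) {
        Result r = new Result();
        for (String dim : requestedDims) {
            r.hitPasses++;
            for (Map<String, List<String>> doc : hits) {
                for (String value : doc.getOrDefault(dim, List.of())) {
                    r.valuesVisited++;
                    r.counts.computeIfAbsent(dim, k -> new TreeMap<>())
                            .merge(value, 1, Integer::sum);
                }
            }
        }
        return r;
    }

    public static void main(String[] args) {
        // Three hypothetical hits with up to three dims each.
        List<Map<String, List<String>>> hits = List.of(
            Map.of("color", List.of("red"), "size", List.of("L"),
                   "brand", List.of("acme")),
            Map.of("color", List.of("red", "blue"), "size", List.of("M")),
            Map.of("color", List.of("blue"), "brand", List.of("acme")));

        for (Set<String> dims : List.of(Set.of("color"),
                                        Set.of("color", "size", "brand"))) {
            Result packed = countPacked(hits, dims);
            Result perField = countPerField(hits, dims);
            System.out.println("requested=" + dims.size()
                + " packed: visited=" + packed.valuesVisited
                + " passes=" + packed.hitPasses
                + " | per-field: visited=" + perField.valuesVisited
                + " passes=" + perField.hitPasses);
        }
    }
}
```

With only {{color}} requested, the packed model touches all 8 dim/value pairs 
while the per-field model touches 4, both in one pass. With all three dims 
requested, both touch 8 pairs, but the per-field model needs three passes over 
the hits. Real costs would depend on doc values decoding and locality, which 
this model ignores.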

To be honest, I approached this more from the standpoint of being able to count 
doc value fields with any string data in them (unlike SSDVFC, which assumes a 
specific format involving a dim). This is a nice functional complement to 
something like {{LongValueFacetCounts}}. It would be really interesting to do 
some performance benchmarking, though!

> Support both single- and multi-value string fields in facet counting 
> (non-taxonomy based approaches)
> ----------------------------------------------------------------------------------------------------
>
>                 Key: LUCENE-9950
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9950
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>    Affects Versions: main (9.0)
>            Reporter: Greg Miller
>            Priority: Minor
>             Fix For: main (9.0), 8.9
>
>          Time Spent: 3h
>  Remaining Estimate: 0h
>
> Users wanting to facet count string-based fields using a non-taxonomy-based 
> approach can use {{SortedSetDocValuesFacetCounts}}, which accumulates facet 
> counts based on a {{SortedSetDocValues}} field. This requires the stored doc 
> values to be multi-valued (i.e., {{SORTED_SET}}), and doesn't work on 
> single-valued fields (i.e., {{SORTED}}). In contrast, if a user wants to facet 
> count on a stored numeric field, they can use {{LongValueFacetCounts}}, which 
> supports both single- and multi-valued fields (and in LUCENE-9948, we now 
> auto-detect instead of asking the user to specify).
> Let's update {{SortedSetDocValuesFacetCounts}} to also support, and 
> automatically detect, single- and multi-valued fields. Note that this is a 
> spin-off issue from LUCENE-9946, where [~rcmuir] points out that this can 
> essentially be a one-line change, but we may want to do some class renaming 
> at the same time. Also note that we should do this in 
> {{ConcurrentSortedSetDocValuesFacetCounts}} while we're at it.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
