[ 
https://issues.apache.org/jira/browse/LUCENE-9952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17342068#comment-17342068
 ] 

Greg Miller commented on LUCENE-9952:
-------------------------------------

Hmm, this is actually a bit trickier for {{SortedSetDocValues}} facet counting 
than I'd hoped. The tricky bit here is that, while counting ordinals, it's easy 
to detect if a single doc has multiple values, but much harder to detect if 
those multiple values are within the same dimension. Taxonomy counting handles 
this by explicitly checking the facet configuration via {{FacetsConfig}} (and 
also allows the user to specify that they need dimension counts, in which case 
it may explicitly store those counts in the index). I think the best solution 
for SSDV cases is to actually make counting aware of the {{FacetsConfig}}, but 
if taking that approach, I'd rather not try to maintain backwards compatibility 
with the current API. I'll see if I can get a PR up with this approach soon, 
but if we like it, we'll maybe want to introduce this in 9.0 so we don't have 
to dance around backwards-compat challenges.
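
To make the distinction concrete, here is a minimal sketch in plain Java (not 
Lucene code) of the two ways a dimension-level count can come out. All names 
here ({{DimCountSketch}}, {{FacetValue}}, {{valueCounts}}, {{docCounts}}) are 
hypothetical and exist only for illustration. The sketch also assumes each 
value carries its dimension explicitly, which is exactly the information that 
is hard to get at while counting raw ordinals.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class DimCountSketch {
    // Hypothetical stand-in for one indexed facet value: a dimension plus a label.
    public static final class FacetValue {
        final String dim;
        final String label;

        public FacetValue(String dim, String label) {
            this.dim = dim;
            this.label = label;
        }
    }

    // "Field count": every value a doc has under a dimension adds one to that dimension.
    public static Map<String, Integer> valueCounts(List<List<FacetValue>> docs) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<FacetValue> doc : docs) {
            for (FacetValue v : doc) {
                counts.merge(v.dim, 1, Integer::sum);
            }
        }
        return counts;
    }

    // "Doc count": a doc adds at most one to each dimension, however many values it has there.
    public static Map<String, Integer> docCounts(List<List<FacetValue>> docs) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<FacetValue> doc : docs) {
            // Dimensions already credited for this doc; this is the step that
            // requires knowing which dimension each value belongs to.
            Set<String> seenDims = new HashSet<>();
            for (FacetValue v : doc) {
                if (seenDims.add(v.dim)) {
                    counts.merge(v.dim, 1, Integer::sum);
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        List<List<FacetValue>> docs = List.of(
            // Doc 0 has two values in the same dimension.
            List.of(new FacetValue("author", "Bob"), new FacetValue("author", "Lisa")),
            // Doc 1 has multiple values too, but in different dimensions.
            List.of(new FacetValue("author", "Bob"), new FacetValue("year", "2021")));
        System.out.println("value counts: " + valueCounts(docs));
        System.out.println("doc counts:   " + docCounts(docs));
    }
}
```

With these two docs, the "author" dimension gets a value count of 3 but a doc 
count of 2, while "year" is 1 either way. Both docs are multi-valued, but only 
the first has multiple values within one dimension, and telling those two 
cases apart is the part that needs per-dimension knowledge.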

> FacetResult#value should consistently report doc count, not field count
> -----------------------------------------------------------------------
>
>                 Key: LUCENE-9952
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9952
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/facet
>    Affects Versions: 8.9
>            Reporter: Greg Miller
>            Priority: Minor
>
> As described in a dev@ list 
> [thread|http://mail-archives.apache.org/mod_mbox/lucene-dev/202105.mbox/%3CCANJ0CDo-9zt0U_pxWNOBkfiJpaAXZGGwOEJPnENAP6JzWz_t9Q%40mail.gmail.com%3E],
>  the value of {{FacetResult#value}} is sometimes populated with the number of 
> _docs_ that contain a value, and other times is populated with the total 
> number of values. For single-valued cases, these two values are identical, 
> but they are not in multi-value cases. For example, if a multi-value doc has 
> two values for the same field being counted, it will count only once when 
> doing a "doc count," but twice if doing a "field count."
> We should implement consistent behavior across all of our {{Facets}} 
> implementations. I propose that this behavior should be the doc count. If the 
> doc count isn't possible to calculate, we should populate 
> {{FacetResult#value}} with {{-1}} (which is what the taxonomy-based 
> implementations currently do).



