[ 
https://issues.apache.org/jira/browse/LUCENE-9952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Miller updated LUCENE-9952:
--------------------------------
    Description: 
As described in a dev@ list 
[thread|http://mail-archives.apache.org/mod_mbox/lucene-dev/202105.mbox/%3CCANJ0CDo-9zt0U_pxWNOBkfiJpaAXZGGwOEJPnENAP6JzWz_t9Q%40mail.gmail.com%3E],
 the value of {{FacetResult#value}} can be incorrect in SSDV faceting when docs 
are multi-valued (affects both {{SortedSetDocValuesFacetCounts}} and 
{{ConcurrentSortedSetDocValuesFacetCounts}}). If a doc has multiple values in 
the same dimension, it is counted once per value, rather than once per doc, 
when populating {{FacetResult#value}}.
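
A minimal plain-Java sketch of the over-count described above. It does not use 
the Lucene API; the class {{SsdvCountSketch}}, its methods, and the sample docs 
are hypothetical and only model "sum of per-value counts" versus "number of 
docs with a value in the dimension".

```java
import java.util.List;
import java.util.Map;

// Hypothetical illustration only; this is not Lucene code.
final class SsdvCountSketch {

    // Sums one count per (doc, value) pair in the dimension.
    // A doc with two values in the dimension contributes 2.
    static int sumOfValueCounts(List<Map<String, List<String>>> docs, String dim) {
        int total = 0;
        for (Map<String, List<String>> doc : docs) {
            total += doc.getOrDefault(dim, List.of()).size();
        }
        return total;
    }

    // Counts each doc at most once if it has any value in the dimension.
    static int docCount(List<Map<String, List<String>>> docs, String dim) {
        int total = 0;
        for (Map<String, List<String>> doc : docs) {
            if (!doc.getOrDefault(dim, List.of()).isEmpty()) {
                total++;
            }
        }
        return total;
    }

    // Three sample docs: one multi-valued in "color", one single-valued,
    // and one with no "color" value at all.
    static List<Map<String, List<String>>> sampleDocs() {
        return List.of(
            Map.of("color", List.of("red", "blue")),
            Map.of("color", List.of("red")),
            Map.of("size", List.of("large")));
    }

    public static void main(String[] args) {
        List<Map<String, List<String>>> docs = sampleDocs();
        System.out.println("sum of value counts: " + sumOfValueCounts(docs, "color"));
        System.out.println("doc count:           " + docCount(docs, "color"));
    }
}
```

With these sample docs the summed value count for "color" is 3 while only 2 
docs carry a "color" value; the gap comes entirely from the multi-valued doc.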

We should either provide an accurate count, or provide {{-1}} if we don't have 
an accurate count (like we do in taxonomy faceting). I _think_ this change will 
be a bit involved, though, as SSDV facet counting likely needs to be made aware 
of {{FacetsConfig}}.
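
A rough sketch of the {{-1}} fallback, in plain Java rather than against the 
Lucene API. The class {{DimValueSketch}}, the {{UNKNOWN}} constant, and the 
{{multiValued}} parameter are assumptions made for illustration; the flag stands 
in for whatever per-dimension configuration the counting code would consult.

```java
// Hypothetical illustration only; this is not the proposed patch.
final class DimValueSketch {

    // Sentinel meaning "no accurate doc count available".
    static final int UNKNOWN = -1;

    // For a single-valued dimension every doc contributes to exactly one
    // per-value count, so the sum equals the doc count. For a multi-valued
    // dimension the sum may over-count, so report UNKNOWN instead.
    static int dimValue(int[] perValueCounts, boolean multiValued) {
        if (multiValued) {
            return UNKNOWN;
        }
        int sum = 0;
        for (int count : perValueCounts) {
            sum += count;
        }
        return sum;
    }

    public static void main(String[] args) {
        int[] counts = {2, 1};
        System.out.println(dimValue(counts, false));
        System.out.println(dimValue(counts, true));
    }
}
```

The trade-off in this sketch is that a multi-valued dimension always reports 
{{-1}}, even when no doc happens to hold more than one value; computing the 
exact doc count instead would need extra per-doc bookkeeping.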

NOTE: I've updated this description to describe only the SSDV case after 
spinning off LUCENE-9953 to track the LongValueFacetCounts case.

  was:
As described in a dev@ list 
[thread|http://mail-archives.apache.org/mod_mbox/lucene-dev/202105.mbox/%3CCANJ0CDo-9zt0U_pxWNOBkfiJpaAXZGGwOEJPnENAP6JzWz_t9Q%40mail.gmail.com%3E],
 the value of {{FacetResult#value}} is sometimes populated with the number of 
_docs_ that contain a value, and other times is populated with the total number 
of values. For single-valued cases, these two values are identical, but they 
are not in multi-value cases. For example, if a multi-value doc has two values 
for the same field being counted, it will count only once when doing a "doc 
count," but twice when doing a "field count."

We should implement consistent behavior across all of our {{Facets}} 
implementations. I propose that this behavior should be the doc count. If the 
doc count isn't possible to calculate, we should populate {{FacetResult#value}} 
with {{-1}} (which is what the taxonomy-based implementations currently do).


> FacetResult#value can be inaccurate in SortedSetDocValueFacetCounts
> -------------------------------------------------------------------
>
>                 Key: LUCENE-9952
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9952
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/facet
>    Affects Versions: 8.9
>            Reporter: Greg Miller
>            Priority: Minor
