[ https://issues.apache.org/jira/browse/LUCENE-9952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Greg Miller updated LUCENE-9952:
--------------------------------
    Description: 
As described in a dev@ list [thread|http://mail-archives.apache.org/mod_mbox/lucene-dev/202105.mbox/%3CCANJ0CDo-9zt0U_pxWNOBkfiJpaAXZGGwOEJPnENAP6JzWz_t9Q%40mail.gmail.com%3E], the value of {{FacetResult#value}} can be incorrect in SSDV faceting when docs are multi-valued (affects both {{SortedSetDocValueFacetCounts}} and {{ConcurrentSortedSetDocValueFacetCounts}}). If a doc has multiple values in the same dimension, it will be counted multiple times when populating the counts of {{FacetResult#value}}.

We should either provide an accurate count, or provide {{-1}} if we don't have an accurate count (like we do in taxonomy faceting). I _think_ this change will be a bit involved though, as SSDV facet counting likely needs to be made aware of {{FacetConfig}}.

NOTE: I've updated this description to describe only the SSDV case after spinning off LUCENE-9953 to track the LongValueFacetCounts case.

  was:
As described in a dev@ list [thread|http://mail-archives.apache.org/mod_mbox/lucene-dev/202105.mbox/%3CCANJ0CDo-9zt0U_pxWNOBkfiJpaAXZGGwOEJPnENAP6JzWz_t9Q%40mail.gmail.com%3E], the value of {{FacetResult#value}} is sometimes populated with the number of _docs_ that contain a value, and other times with the total number of values. For single-valued cases these two numbers are identical, but in multi-valued cases they are not. For example, if a multi-valued doc has two values for the field being counted, it is counted once under a "doc count" but twice under a "field count."

We should implement consistent behavior across all of our {{Facet}} implementations. I propose that this behavior should be the doc count. If the doc count isn't possible to calculate, we should populate {{FacetResult#value}} with {{-1}} (which is what the taxonomy-based implementations currently do).
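To make the doc count versus value count distinction concrete, here is a minimal, Lucene-free sketch. It is not the SSDV faceting code: the class name {{SsdvCountSketch}}, its methods, and the in-memory representation of docs as lists of values are all hypothetical and exist only for illustration. It shows how summing per-value counts over a dimension over-counts a doc that has more than one value in that dimension.

```java
import java.util.List;

// Hypothetical model of the counting problem; this is NOT Lucene's implementation.
// Each inner list holds the values one doc has within a single dimension.
final class SsdvCountSketch {

    /** Sums per-value counts: a doc with k values in the dimension contributes k. */
    static int sumOfValueCounts(List<List<String>> docs) {
        int total = 0;
        for (List<String> values : docs) {
            total += values.size();
        }
        return total;
    }

    /** Counts each doc at most once, however many values it has in the dimension. */
    static int docCount(List<List<String>> docs) {
        int total = 0;
        for (List<String> values : docs) {
            if (!values.isEmpty()) {
                total++;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(
            List.of("red"),          // single-valued doc
            List.of("red", "blue"),  // multi-valued doc
            List.<String>of());      // doc with no value in this dimension

        System.out.println("sum of value counts: " + sumOfValueCounts(docs)); // 3
        System.out.println("doc count: " + docCount(docs));                   // 2
    }
}
```

With only single-valued docs the two methods agree. They diverge as soon as one doc carries two values in the same dimension, which is the situation this issue describes.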
> FacetResult#value can be inaccurate in SortedSetDocValueFacetCounts
> -------------------------------------------------------------------
>
>                 Key: LUCENE-9952
>                 URL: https://issues.apache.org/jira/browse/LUCENE-9952
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/facet
>    Affects Versions: 8.9
>            Reporter: Greg Miller
>            Priority: Minor
>
> As described in a dev@ list
> [thread|http://mail-archives.apache.org/mod_mbox/lucene-dev/202105.mbox/%3CCANJ0CDo-9zt0U_pxWNOBkfiJpaAXZGGwOEJPnENAP6JzWz_t9Q%40mail.gmail.com%3E],
> the value of {{FacetResult#value}} can be incorrect in SSDV faceting when
> docs are multi-valued (affects both {{SortedSetDocValueFacetCounts}} and
> {{ConcurrentSortedSetDocValueFacetCounts}}). If a doc has multiple values in
> the same dimension, it will be counted multiple times when populating the
> counts of {{FacetResult#value}}.
> We should either provide an accurate count, or provide {{-1}} if we don't
> have an accurate count (like we do in taxonomy faceting). I _think_ this
> change will be a bit involved though as SSDV facet counting likely needs to
> be made aware of {{FacetConfig}}.
> NOTE: I've updated this description to describe only the SSDV case after
> spinning off LUCENE-9953 to track the LongValueFacetCounts case.