[
https://issues.apache.org/jira/browse/LUCENE-9952?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Greg Miller updated LUCENE-9952:
--------------------------------
Description:
As described in a dev@ list
[thread|http://mail-archives.apache.org/mod_mbox/lucene-dev/202105.mbox/%3CCANJ0CDo-9zt0U_pxWNOBkfiJpaAXZGGwOEJPnENAP6JzWz_t9Q%40mail.gmail.com%3E],
the value of {{FacetResult#value}} can be incorrect in SSDV faceting when docs
are multi-valued (affects both {{SortedSetDocValuesFacetCounts}} and
{{ConcurrentSortedSetDocValuesFacetCounts}}). If a doc has multiple values in
the same dimension, it is counted once per value, rather than once per doc,
when populating {{FacetResult#value}}.
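To make the discrepancy concrete, here is a minimal sketch in plain Java. It does not use the Lucene API: the class {{DimensionCountSketch}} and its methods are hypothetical names, and each doc is modeled simply as the set of values it holds in one dimension. It only contrasts summing per-value counts with counting each doc once.

```java
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; not Lucene code. Each Set<String> stands in
// for the values a single doc holds in one facet dimension.
final class DimensionCountSketch {

    // Sums per-value counts: a doc with k values in the dimension contributes k.
    static int sumOfValueCounts(List<Set<String>> docs) {
        int total = 0;
        for (Set<String> values : docs) {
            total += values.size();
        }
        return total;
    }

    // Counts each doc at most once, provided it has any value in the dimension.
    static int docCount(List<Set<String>> docs) {
        int total = 0;
        for (Set<String> values : docs) {
            if (!values.isEmpty()) {
                total++;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        List<Set<String>> docs = List.of(
            Set.of("red"),          // single-valued doc
            Set.of("red", "blue"),  // multi-valued doc
            Set.of());              // doc with no value in this dimension

        System.out.println(sumOfValueCounts(docs)); // prints 3
        System.out.println(docCount(docs));         // prints 2
    }
}
```

With only single-valued docs the two numbers agree; the multi-valued doc is what makes the summed figure exceed the number of matching docs.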
We should either provide an accurate count, or provide {{-1}} if we don't have
an accurate count (like we do in taxonomy faceting). I _think_ this change will
be a bit involved, though, as SSDV facet counting likely needs to be made aware
of {{FacetsConfig}}.
NOTE: I've updated this description to describe only the SSDV case after
spinning off LUCENE-9953 to track the LongValueFacetCounts case.
was:
As described in a dev@ list
[thread|http://mail-archives.apache.org/mod_mbox/lucene-dev/202105.mbox/%3CCANJ0CDo-9zt0U_pxWNOBkfiJpaAXZGGwOEJPnENAP6JzWz_t9Q%40mail.gmail.com%3E],
the value of {{FacetResult#value}} is sometimes populated with the number of
_docs_ that contain a value, and other times is populated with the total number
of values. For single-valued cases, these two values are identical, but they
are not in multi-value cases. For example, if a multi-value doc has two values
for the same field being counted, it will count only once when doing a "doc
count," but twice if doing a "field count."
We should implement consistent behavior across all of our {{Facets}}
implementations. I propose that this behavior should be the doc count. If the
doc count isn't possible to calculate, we should populate {{FacetResult#value}}
with {{-1}} (which is what the taxonomy-based implementations currently do).
> FacetResult#value can be inaccurate in SortedSetDocValueFacetCounts
> -------------------------------------------------------------------
>
> Key: LUCENE-9952
> URL: https://issues.apache.org/jira/browse/LUCENE-9952
> Project: Lucene - Core
> Issue Type: Bug
> Components: modules/facet
> Affects Versions: 8.9
> Reporter: Greg Miller
> Priority: Minor