Lu Xugang created LUCENE-10281:
----------------------------------

             Summary: Error condition used to judge whether hits are sparse in StringValueFacetCounts
                 Key: LUCENE-10281
                 URL: https://issues.apache.org/jira/browse/LUCENE-10281
             Project: Lucene - Core
          Issue Type: Bug
          Components: modules/facet
    Affects Versions: 8.11
            Reporter: Lu Xugang


Description:
In the constructor StringValueFacetCounts(StringDocValuesReaderState state,
FacetsCollector facetsCollector), when a facetsCollector is provided, the
condition *totalHits < totalDocs / 10* is used to judge whether the hits are
sparse. If so, an IntIntHashMap is used to store term ords and their counts.
However, a hit does not necessarily have a SortedSetDocValues (SSDV) value for
the field, and neither does every document counted in totalDocs. So the
correct ratio should be *(totalHits with SSDV) / (totalDocs with SSDV)*.
*totalDocs with SSDV* is easy to get via SortedSetDocValues#getValueCount(),
but *totalHits with SSDV* is hard to get. We can only read the index by the
docIds provided in FacetsCollector, so computing it that way would be slow and
redundant.
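
To make the misjudgment concrete, here is a minimal, hypothetical Java model
of the decision described above. It does not call the Lucene API. Plain int
arguments stand in for the counts the constructor derives from the
FacetsCollector. The class and method names are invented, and the document
counts are made up. The 1024 cardinality cutoff and the 10% test are the ones
named in this issue.

```java
/**
 * Hypothetical, self-contained model of the sparse/dense decision in
 * StringValueFacetCounts. Nothing here calls the Lucene API; plain ints stand
 * in for the hit and document counts the real constructor computes.
 */
final class SparseHeuristicDemo {

  // Below this cardinality the constructor always counts densely.
  static final int DENSE_CARDINALITY_LIMIT = 1024;

  /** Current rule: compares all hits against all docs, with or without SSDV. */
  static boolean useSparseCurrent(int cardinality, int totalHits, int totalDocs) {
    if (cardinality < DENSE_CARDINALITY_LIMIT) {
      return false;
    }
    return totalHits < totalDocs / 10;
  }

  /** Ratio this issue argues for: only documents that actually have SSDV. */
  static boolean useSparseProposedRatio(int cardinality, int hitsWithSsdv, int docsWithSsdv) {
    if (cardinality < DENSE_CARDINALITY_LIMIT) {
      return false;
    }
    return hitsWithSsdv < docsWithSsdv / 10;
  }

  public static void main(String[] args) {
    int cardinality = 5_000;
    int totalDocs = 1_000_000; // whole index (made-up number)
    int totalHits = 50_000;    // 5% of the index matched
    int docsWithSsdv = 60_000; // only these docs have the facet field
    int hitsWithSsdv = 50_000; // every hit happens to have the field

    System.out.println(useSparseCurrent(cardinality, totalHits, totalDocs));          // true
    System.out.println(useSparseProposedRatio(cardinality, hitsWithSsdv, docsWithSsdv)); // false
  }
}
```

With these made-up numbers the current rule picks the hash map because only 5%
of the index matched. The ratio this issue argues for (about 83% of the
documents that have the field) points to dense counting instead.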

Solution:
If we don't want to break the old logic, we could keep its three cases in a
different form. The old logic uses denseCounts when cardinality < 1024, uses
IntIntHashMap when hits fall below the 10% threshold, and uses denseCounts in
all other cases. Under the new approach, we would still use denseCounts when
cardinality < 1024. Otherwise, we would start with IntIntHashMap, and once 10%
of the unique terms have been collected, switch to denseCounts.
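
The proposed strategy can be sketched as follows. This is a minimal,
hypothetical model, not a patch against Lucene. AdaptiveOrdCounter and its
methods are invented names, and java.util.HashMap stands in for the
IntIntHashMap mentioned above. Cardinality here means the number of unique
term ordinals. "10% of the unique terms collected" is read as the number of
distinct ordinals seen so far reaching cardinality / 10.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Hypothetical sketch of the proposed strategy: count sparsely first and
 * switch to a dense int[] once 10% of the unique terms have been seen.
 * java.util.HashMap stands in for the IntIntHashMap used in Lucene.
 */
final class AdaptiveOrdCounter {
  private final int cardinality;
  private Map<Integer, Integer> sparseCounts; // null once we switch to dense
  private int[] denseCounts;                  // null while counting sparsely

  AdaptiveOrdCounter(int cardinality) {
    this.cardinality = cardinality;
    if (cardinality < 1024) {
      denseCounts = new int[cardinality]; // low cardinality: always dense
    } else {
      sparseCounts = new HashMap<>();
    }
  }

  /** Increments the count for one term ordinal in [0, cardinality). */
  void increment(int ord) {
    if (denseCounts != null) {
      denseCounts[ord]++;
      return;
    }
    sparseCounts.merge(ord, 1, Integer::sum);
    // Once 10% of the unique terms have been collected, migrate to dense.
    if (sparseCounts.size() >= cardinality / 10) {
      denseCounts = new int[cardinality];
      for (Map.Entry<Integer, Integer> e : sparseCounts.entrySet()) {
        denseCounts[e.getKey()] = e.getValue();
      }
      sparseCounts = null;
    }
  }

  int count(int ord) {
    return denseCounts != null ? denseCounts[ord] : sparseCounts.getOrDefault(ord, 0);
  }

  boolean isDense() {
    return denseCounts != null;
  }

  public static void main(String[] args) {
    AdaptiveOrdCounter counter = new AdaptiveOrdCounter(2_000);
    counter.increment(7);
    counter.increment(7);
    System.out.println(counter.isDense() + " " + counter.count(7)); // false 2
    for (int ord = 0; ord < 200; ord++) {
      counter.increment(ord);
    }
    System.out.println(counter.isDense() + " " + counter.count(7)); // true 3
  }
}
```

Because the threshold is checked after every new ordinal, the one-time copy
touches exactly cardinality / 10 entries at the moment of the switch. After
that, every increment is a plain array update.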

--
This message was sent by Atlassian Jira
(v8.20.1#820001)