Lu Xugang created LUCENE-10281:
----------------------------------

Summary: Error condition used to judge whether hits are sparse in StringValueFacetCounts
Key: LUCENE-10281
URL: https://issues.apache.org/jira/browse/LUCENE-10281
Project: Lucene - Core
Issue Type: Bug
Components: modules/facet
Affects Versions: 8.11
Reporter: Lu Xugang

Description:
In the constructor StringValueFacetCounts(StringDocValuesReaderState state, FacetsCollector facetsCollector), if a facetsCollector is provided, the condition *totalHits < totalDocs / 10* is used to decide whether the hits are sparse, in which case an IntIntHashMap is used to store term ords and counts. But a hit counted in totalHits does not necessarily contain an SSDV value, and neither does a document counted in totalDocs, so the correct ratio would be *(totalHits has SSDV) / (totalDocs has SSDV)*.

*totalDocs has SSDV* is easy to get via SortedSetDocValues#getValueCount(). *totalHits has SSDV* is hard to get, because we can only read the index by the docIds provided in FacetsCollector, and computing it that way is slow and redundant.

Solution:
The old logic uses denseCounts when cardinality < 1024, IntIntHashMap when the hits fall below the 10% threshold, and denseCounts in all remaining cases. If we don't want to break that logic, we can still use denseCounts when cardinality < 1024 and otherwise start with IntIntHashMap; once 10% of the unique terms have been collected, we switch to denseCounts.
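
The proposed strategy can be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual StringValueFacetCounts code: the class AdaptiveOrdCounts and its methods are hypothetical names, java.util.HashMap stands in for IntIntHashMap so the example is self-contained, and "10% of the unique terms collected" is read as "the number of distinct ords seen so far reaches cardinality / 10", where cardinality is the number of unique values that SortedSetDocValues#getValueCount() would return.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the proposed counting strategy; not Lucene code. */
final class AdaptiveOrdCounts {
  // Threshold named in the issue: below this cardinality, always count densely.
  static final int DENSE_CARDINALITY_LIMIT = 1024;

  private final int cardinality;
  private int[] denseCounts;                  // non-null once counting is dense
  private Map<Integer, Integer> sparseCounts; // stand-in for IntIntHashMap

  AdaptiveOrdCounts(int cardinality) {
    this.cardinality = cardinality;
    if (cardinality < DENSE_CARDINALITY_LIMIT) {
      denseCounts = new int[cardinality];
    } else {
      sparseCounts = new HashMap<>();
    }
  }

  /** Records one occurrence of the given term ordinal. */
  void increment(int ord) {
    if (denseCounts != null) {
      denseCounts[ord]++;
      return;
    }
    sparseCounts.merge(ord, 1, Integer::sum);
    // Assumed reading of the proposal: switch once the number of distinct
    // ords collected reaches 10% of the cardinality.
    if (sparseCounts.size() >= cardinality / 10) {
      migrateToDense();
    }
  }

  /** Copies the sparse counts into a dense array and drops the map. */
  private void migrateToDense() {
    int[] dense = new int[cardinality];
    for (Map.Entry<Integer, Integer> e : sparseCounts.entrySet()) {
      dense[e.getKey()] = e.getValue();
    }
    denseCounts = dense;
    sparseCounts = null;
  }

  int count(int ord) {
    return denseCounts != null ? denseCounts[ord] : sparseCounts.getOrDefault(ord, 0);
  }

  boolean isDense() {
    return denseCounts != null;
  }
}
```

With this shape the decision no longer depends on totalHits or totalDocs, only on the ords actually seen while collecting, at the cost of one copy from the map into the array when the threshold is crossed.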