LuXugang opened a new pull request #511: URL: https://github.com/apache/lucene/pull/511
Description: In the constructor StringValueFacetCounts(StringDocValuesReaderState state, FacetsCollector facetsCollector), when a facetsCollector is provided, the condition (totalHits < totalDocs / 10) is used to decide whether to store term ords and counts sparsely in an IntIntHashMap. But a hit does not necessarily contain an SSDV value, and neither does every document counted in totalDocs, so the right calculation would be (totalHits with SSDV) / (totalDocs with SSDV). (totalDocs with SSDV) is easy to get from SortedSetDocValues#getValueCount(). (totalHits with SSDV) is hard to get: we can only read the index by the docIds provided by FacetsCollector, and computing it that way is slow and redundant.

Solution: To avoid breaking the old logic (denseCounts when cardinality < 1024, IntIntHashMap below the 10% threshold, denseCounts in all other cases), we can still use denseCounts when cardinality < 1024 and otherwise start with IntIntHashMap. Once 10% of the unique terms have been collected, switch to denseCounts.
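The proposed strategy can be sketched roughly as follows. This is only an illustration, not the code in the pull request: the class HybridOrdCounts and its methods are hypothetical, a java.util.HashMap stands in for IntIntHashMap so the example has no dependencies, and the exact form of the 10% check is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the hybrid counting idea; not Lucene code. */
class HybridOrdCounts {
  // Below this cardinality a dense array is always used (value from the description).
  static final int DENSE_CARDINALITY_LIMIT = 1024;

  private final int cardinality;
  private int[] denseCounts; // non-null once counting is dense
  private Map<Integer, Integer> sparseCounts; // stand-in for IntIntHashMap

  HybridOrdCounts(int cardinality) {
    this.cardinality = cardinality;
    if (cardinality < DENSE_CARDINALITY_LIMIT) {
      denseCounts = new int[cardinality];
    } else {
      sparseCounts = new HashMap<>();
    }
  }

  /** Counts one occurrence of the given term ord. */
  void increment(int ord) {
    if (denseCounts != null) {
      denseCounts[ord]++;
      return;
    }
    sparseCounts.merge(ord, 1, Integer::sum);
    // Assumed threshold: switch once 10% of the unique terms have been seen.
    if ((long) sparseCounts.size() * 10 >= cardinality) {
      switchToDense();
    }
  }

  /** Copies the sparse counts into a dense array and drops the map. */
  private void switchToDense() {
    int[] dense = new int[cardinality];
    for (Map.Entry<Integer, Integer> e : sparseCounts.entrySet()) {
      dense[e.getKey()] = e.getValue();
    }
    denseCounts = dense;
    sparseCounts = null;
  }

  int count(int ord) {
    return denseCounts != null ? denseCounts[ord] : sparseCounts.getOrDefault(ord, 0);
  }

  boolean isDense() {
    return denseCounts != null;
  }
}
```

With this approach the choice between the two representations depends only on the field's cardinality and on the ords actually collected, so neither totalHits nor totalDocs is needed.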