[GitHub] [lucene] LuXugang opened a new pull request #511: LUCENE-10281: Error condition used to judge whether hits are sparse in StringValueFacetCounts

GitBox Fri, 03 Dec 2021 01:21:51 -0800


LuXugang opened a new pull request #511:
URL: https://github.com/apache/lucene/pull/511



   Description:
   In the constructor StringValueFacetCounts(StringDocValuesReaderState 
state, FacetsCollector facetsCollector), when a facetsCollector is provided, 
the condition (totalHits < totalDocs / 10) is used to decide whether the hits 
are sparse, in which case an IntIntHashMap stores the term ords and counts.
   However, a hit does not necessarily have an SSDV value, and neither does 
every document counted in totalDocs. The correct ratio would therefore be 
(totalHits with SSDV) / (totalDocs with SSDV). (totalDocs with SSDV) is easy 
to get from SortedSetDocValues#getValueCount(), but (totalHits with SSDV) is 
hard to get, because the index can only be read through the docIds provided 
by FacetsCollector, and computing it that way is slow and redundant.
   Solution:
   The old logic uses denseCounts when cardinality < 1024, IntIntHashMap when 
the hits fall under the 10% threshold, and denseCounts in all other cases. To 
avoid breaking that logic, we can still use denseCounts when cardinality < 
1024 and otherwise start with IntIntHashMap, then switch to denseCounts once 
10% of the unique terms have been collected.
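
   The proposed strategy can be sketched roughly as follows. This is a 
minimal illustration, not the code from the pull request: the class 
AdaptiveOrdCounter and its methods are hypothetical, java.util.HashMap 
stands in for IntIntHashMap, and treating "10% collected" as 
size >= cardinality / 10 is an assumption.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the proposed counting strategy; not Lucene code. */
class AdaptiveOrdCounter {
  // Below this cardinality a dense array is always used (value from the description).
  static final int DENSE_CARDINALITY_LIMIT = 1024;

  private final int cardinality;
  private int[] denseCounts;                  // non-null once counting is dense
  private Map<Integer, Integer> sparseCounts; // stand-in for IntIntHashMap

  AdaptiveOrdCounter(int cardinality) {
    this.cardinality = cardinality;
    if (cardinality < DENSE_CARDINALITY_LIMIT) {
      denseCounts = new int[cardinality];
    } else {
      sparseCounts = new HashMap<>();
    }
  }

  /** Records one occurrence of the given term ord. */
  void increment(int ord) {
    if (denseCounts != null) {
      denseCounts[ord]++;
      return;
    }
    sparseCounts.merge(ord, 1, Integer::sum);
    // Assumed reading of the 10% rule: switch once the number of distinct
    // ords seen reaches one tenth of the cardinality.
    if (sparseCounts.size() >= cardinality / 10) {
      switchToDense();
    }
  }

  /** Copies the sparse counts into a dense array and drops the map. */
  private void switchToDense() {
    int[] counts = new int[cardinality];
    for (Map.Entry<Integer, Integer> e : sparseCounts.entrySet()) {
      counts[e.getKey()] = e.getValue();
    }
    denseCounts = counts;
    sparseCounts = null;
  }

  boolean isDense() {
    return denseCounts != null;
  }

  int count(int ord) {
    return denseCounts != null ? denseCounts[ord] : sparseCounts.getOrDefault(ord, 0);
  }
}
```

   With this shape the decision depends only on the field's cardinality and 
on the ords actually seen, so neither totalHits nor totalDocs is needed.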


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



---------------------------------------------------------------------
For additional commands, e-mail: issues-h...@lucene.apache.org
