[ 
https://issues.apache.org/jira/browse/LUCENE-10281?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lu Xugang updated LUCENE-10281:
-------------------------------
    Description: 
Description:
In the constructor StringValueFacetCounts(StringDocValuesReaderState state, 
FacetsCollector facetsCollector), if a facetsCollector is provided, the 
condition *(totalHits < totalDocs / 10)* is used to decide whether to use 
IntIntHashMap, the sparse representation, to store term ords and counts.

But a document counted in totalHits does not necessarily contain the SSDV 
field, and the same is true of totalDocs. So the right calculation should be 
*(totalHits has SSDV) / (totalDocs has SSDV)*. *(totalDocs has SSDV)* is easy 
to get from SortedSetDocValues#getValueCount(), but *totalHits has SSDV* is 
hard to get: we can only read the index by the docIds provided by the 
FacetsCollector, and computing *totalHits has SSDV* that way is slow and 
redundant.
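
To make the mismatch concrete, here is a minimal sketch comparing the two 
conditions. The class and method names (SparsityCheck, sparseByTotals, 
sparseByField) are hypothetical and not part of Lucene; only the 
*(totalHits < totalDocs / 10)* expression comes from the description above.

```java
/** Hypothetical helper for illustration only; not Lucene code. */
final class SparsityCheck {

  /** Condition quoted in the description: all hits vs. all docs. */
  static boolean sparseByTotals(int totalHits, int totalDocs) {
    return totalHits < totalDocs / 10;
  }

  /**
   * Field-aware variant suggested in the description: both arguments count
   * only documents that actually have the SSDV field.
   */
  static boolean sparseByField(int hitsWithField, int docsWithField) {
    return hitsWithField < docsWithField / 10;
  }
}
```

With made-up numbers, say 1,000,000 docs of which 50,000 have the field, and 
60,000 hits of which 40,000 have the field, sparseByTotals(60000, 1000000) is 
true while sparseByField(40000, 50000) is false, so the two checks disagree.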

Solution:
If we don't want to break the old logic (use denseCounts when 
cardinality < 1024, use IntIntHashMap when hits are below the 10% threshold, 
and use denseCounts in all other cases), then we could still use denseCounts 
if cardinality < 1024 and otherwise start with IntIntHashMap. Once 10% of the 
unique terms have been collected, switch to denseCounts.
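
The proposal could be sketched roughly as follows. This is an assumption-laden 
illustration, not the actual patch: AdaptiveOrdCounts and its methods are 
hypothetical names, and java.util.HashMap stands in for IntIntHashMap so the 
example stays self-contained.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of the proposed strategy; not Lucene code. */
final class AdaptiveOrdCounts {
  private static final int DENSE_CARDINALITY_LIMIT = 1024;

  private final int cardinality;
  private int[] denseCounts;                  // non-null while counting densely
  private Map<Integer, Integer> sparseCounts; // non-null while counting sparsely

  AdaptiveOrdCounts(int cardinality) {
    this.cardinality = cardinality;
    if (cardinality < DENSE_CARDINALITY_LIMIT) {
      denseCounts = new int[cardinality];     // low cardinality: always dense
    } else {
      sparseCounts = new HashMap<>();         // otherwise start sparse
    }
  }

  void increment(int ord) {
    if (denseCounts != null) {
      denseCounts[ord]++;
      return;
    }
    sparseCounts.merge(ord, 1, Integer::sum);
    // Once 10% of the unique terms have been seen, move to the dense array.
    if (sparseCounts.size() >= cardinality / 10) {
      switchToDense();
    }
  }

  private void switchToDense() {
    int[] counts = new int[cardinality];
    for (Map.Entry<Integer, Integer> e : sparseCounts.entrySet()) {
      counts[e.getKey()] = e.getValue();      // carry over counts collected so far
    }
    denseCounts = counts;
    sparseCounts = null;
  }

  int count(int ord) {
    return denseCounts != null ? denseCounts[ord] : sparseCounts.getOrDefault(ord, 0);
  }

  boolean isDense() {
    return denseCounts != null;
  }
}
```

The switch depends only on the ords actually collected, so it needs neither 
totalHits nor totalDocs, at the cost of one copy from the map into the array.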

  was:
Description:
In the constructor StringValueFacetCounts(StringDocValuesReaderState state, 
FacetsCollector facetsCollector), if a facetsCollector is provided, the 
condition *totalHits < totalDocs / 10* is used to decide whether to use 
IntIntHashMap, the sparse representation, to store term ords and counts. But a 
document counted in totalHits does not necessarily contain the SSDV field, and 
the same is true of totalDocs. So the right calculation should be 
*(totalHits has SSDV) / (totalDocs has SSDV)*. *totalDocs has SSDV* is easy to 
get from SortedSetDocValues#getValueCount(), but *totalHits has SSDV* is hard 
to get: we can only read the index by the docIds provided in the 
FacetsCollector, and computing *totalHits has SSDV* that way is slow and 
redundant.

Solution:
If we don't want to break the old logic (use denseCounts when 
cardinality < 1024, use IntIntHashMap when hits are below the 10% threshold, 
and use denseCounts in all other cases), then we could still use denseCounts 
while cardinality < 1024 and otherwise we use IntIntHashMap. Once 10% of the 
unique terms have been collected, switch to denseCounts.

> Error condition used to judge whether hits are sparse in 
> StringValueFacetCounts
> -------------------------------------------------------------------------------
>
>                 Key: LUCENE-10281
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10281
>             Project: Lucene - Core
>          Issue Type: Bug
>          Components: modules/facet
>    Affects Versions: 8.11
>            Reporter: Lu Xugang
>            Priority: Major
>
> Description:
> In the constructor StringValueFacetCounts(StringDocValuesReaderState 
> state, FacetsCollector facetsCollector), if a facetsCollector is provided, 
> the condition *(totalHits < totalDocs / 10)* is used to decide whether to 
> use IntIntHashMap, the sparse representation, to store term ords and counts.
> But a document counted in totalHits does not necessarily contain the SSDV 
> field, and the same is true of totalDocs. So the right calculation should be 
> *(totalHits has SSDV) / (totalDocs has SSDV)*. *(totalDocs has SSDV)* is 
> easy to get from SortedSetDocValues#getValueCount(), but *totalHits has 
> SSDV* is hard to get: we can only read the index by the docIds provided by 
> the FacetsCollector, and computing *totalHits has SSDV* that way is slow and 
> redundant.
> Solution:
> If we don't want to break the old logic (use denseCounts when 
> cardinality < 1024, use IntIntHashMap when hits are below the 10% threshold, 
> and use denseCounts in all other cases), then we could still use denseCounts 
> if cardinality < 1024 and otherwise start with IntIntHashMap. Once 10% of 
> the unique terms have been collected, switch to denseCounts.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org
