[jira] [Comment Edited] (LUCENE-10281) Error condition used to judge whether hits are sparse in StringValueFacetCounts

Lu Xugang (Jira) Mon, 06 Dec 2021 07:40:08 -0800


    [ 
https://issues.apache.org/jira/browse/LUCENE-10281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17454083#comment-17454083
 ]


Lu Xugang edited comment on LUCENE-10281 at 12/6/21, 3:39 PM:
--------------------------------------------------------------

Hi, [~sokolov], I ran the test via *python src/python/localrun.py -source 
wikimedium1m*, and nineteen comparisons were performed. Which results should be 
listed? Sorry, I'm not familiar with how to use luceneutil, so I'm just showing 
the final comparison.



was (Author: chrislu):
Hi, [~sokolov], I ran the test via *python src/python/localrun.py -source 
wikimedium1m*, and nineteen comparisons were performed. Which results should be 
listed? Sorry, I'm not familiar with how to use luceneutil, so I'm just showing 
the final comparison.

 !1.png! 
 

> Error condition used to judge whether hits are sparse in StringValueFacetCounts
> -------------------------------------------------------------------------------
>
>                 Key: LUCENE-10281
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10281
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: modules/facet
>    Affects Versions: 8.11
>            Reporter: Lu Xugang
>            Priority: Minor
>         Attachments: 1.jpg
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> Description:
> In the constructor StringValueFacetCounts(StringDocValuesReaderState 
> state, FacetsCollector facetsCollector), if a facetsCollector is provided, the 
> condition *(totalHits < totalDocs / 10)* is used to decide whether to use an 
> IntIntHashMap (the sparse representation) to store term ords and counts.
> But totalHits does not necessarily mean that every hit has an SSDV value, and 
> the same is true of totalDocs. So the correct ratio should be *(totalHits with 
> SSDV) / (totalDocs with SSDV)*. *(totalDocs with SSDV)* is easy to get via 
> SortedSetDocValues#getValueCount(), but *totalHits with SSDV* is hard to get, 
> because we can only read the index by the doc IDs provided by the 
> FacetsCollector, and computing it that way is slow and redundant.
> Solution:
> If we don't want to break the old logic (use denseCounts when cardinality < 
> 1024, use an IntIntHashMap when hits are below the 10% threshold, and use 
> denseCounts in all other cases), we could still use denseCounts if 
> cardinality < 1024; otherwise, start with an IntIntHashMap, and once 10% of 
> the unique terms have been collected, switch to denseCounts.
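A minimal Java sketch of the existing decision described in the issue, assuming only what the description states: dense counts below a cardinality of 1024, an IntIntHashMap when totalHits < totalDocs / 10, and dense counts otherwise. The class name, the Strategy enum, and the document counts in main are hypothetical and are not Lucene's actual source. The example shows how the ratio can select the sparse path even when most of the documents that actually carry the SSDV field were hit.

```java
// Hypothetical sketch of the sparse-vs-dense decision described in LUCENE-10281.
// Names and structure are illustrative only; this is not Lucene's code.
final class CountingStrategySketch {

    enum Strategy { DENSE_ARRAY, SPARSE_MAP }

    // Cutoff taken from the issue text: below it, a dense int[] is always used.
    static final int DENSE_CARDINALITY_CUTOFF = 1024;

    // Mirrors the condition in the issue: dense for small cardinality,
    // sparse when totalHits < totalDocs / 10, dense otherwise.
    static Strategy chooseStrategy(int cardinality, int totalHits, int totalDocs) {
        if (cardinality < DENSE_CARDINALITY_CUTOFF) {
            return Strategy.DENSE_ARRAY;
        }
        if (totalHits < totalDocs / 10) {
            return Strategy.SPARSE_MAP;
        }
        return Strategy.DENSE_ARRAY;
    }

    public static void main(String[] args) {
        // Hypothetical index: 1,000,000 docs, but only 50,000 have the SSDV field.
        // The query hits 40,000 docs, all of which have the field (80% of them).
        int cardinality = 20_000;
        int totalDocs = 1_000_000;
        int totalHits = 40_000;
        // Prints SPARSE_MAP: 40,000 < 1,000,000 / 10, although 80% of the
        // docs that actually carry the field were hit.
        System.out.println(chooseStrategy(cardinality, totalHits, totalDocs));
    }
}
```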


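The adaptive approach proposed in the Solution can be sketched as follows. This assumes java.util.HashMap as a stand-in for HPPC's IntIntHashMap, that ords run from 0 to cardinality - 1, and that the cardinality comes from something like SortedSetDocValues#getValueCount(). AdaptiveOrdCounter and its methods are hypothetical names used only for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the proposal: count into a hash map first and switch
// to a dense int[] once 10% of the unique ords have been seen. HashMap stands
// in for HPPC's IntIntHashMap; none of these names come from Lucene.
final class AdaptiveOrdCounter {

    private static final int DENSE_CARDINALITY_CUTOFF = 1024;

    private final int cardinality;              // number of unique ords
    private final int switchThreshold;          // distinct ords seen before going dense
    private Map<Integer, Integer> sparseCounts; // null once switched to dense
    private int[] denseCounts;                  // null while still sparse

    AdaptiveOrdCounter(int cardinality) {
        this.cardinality = cardinality;
        this.switchThreshold = Math.max(1, cardinality / 10);
        if (cardinality < DENSE_CARDINALITY_CUTOFF) {
            denseCounts = new int[cardinality];  // small fields always count densely
        } else {
            sparseCounts = new HashMap<>();
        }
    }

    void increment(int ord) {
        if (denseCounts != null) {
            denseCounts[ord]++;
            return;
        }
        sparseCounts.merge(ord, 1, Integer::sum);
        if (sparseCounts.size() >= switchThreshold) {
            // Copy the accumulated counts into a dense array and drop the map.
            denseCounts = new int[cardinality];
            for (Map.Entry<Integer, Integer> e : sparseCounts.entrySet()) {
                denseCounts[e.getKey()] = e.getValue();
            }
            sparseCounts = null;
        }
    }

    int count(int ord) {
        return denseCounts != null ? denseCounts[ord] : sparseCounts.getOrDefault(ord, 0);
    }

    boolean isDense() {
        return denseCounts != null;
    }

    public static void main(String[] args) {
        AdaptiveOrdCounter counter = new AdaptiveOrdCounter(10_000); // switches at 1,000 distinct ords
        for (int ord = 0; ord < 999; ord++) {
            counter.increment(ord);
        }
        System.out.println(counter.isDense()); // false: only 999 distinct ords so far
        counter.increment(5);                  // repeated ord, map size unchanged
        counter.increment(999);                // 1,000th distinct ord triggers the switch
        System.out.println(counter.isDense()); // true
        System.out.println(counter.count(5));  // 2
    }
}
```

One consequence of this design is that the migration copies each map entry exactly once, so the cost of switching is bounded by the 10% threshold rather than by the number of hits.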

--
This message was sent by Atlassian Jira
(v8.20.1#820001)
