[
https://issues.apache.org/jira/browse/LUCENE-10281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17454083#comment-17454083
]
Lu Xugang edited comment on LUCENE-10281 at 12/6/21, 3:39 PM:
--------------------------------------------------------------
Hi [~sokolov], I ran the test via *python src/python/localrun.py -source
wikimedium1m*. Nineteen comparisons were performed; which result should be
listed? Sorry, I am not familiar with how to use luceneutil, so I am only
showing the final comparison.
> Error condition used to judge whether hits are sparse in
> StringValueFacetCounts
> -------------------------------------------------------------------------------
>
> Key: LUCENE-10281
> URL: https://issues.apache.org/jira/browse/LUCENE-10281
> Project: Lucene - Core
> Issue Type: Improvement
> Components: modules/facet
> Affects Versions: 8.11
> Reporter: Lu Xugang
> Priority: Minor
> Attachments: 1.jpg
>
> Time Spent: 10m
> Remaining Estimate: 0h
>
> Description:
> In the constructor StringValueFacetCounts(StringDocValuesReaderState
> state, FacetsCollector facetsCollector), when a facetsCollector is provided,
> the condition *(totalHits < totalDocs / 10)* is used to decide whether to use
> an IntIntHashMap (the sparse representation) to store term ords and counts.
> But a hit counted in totalHits does not necessarily contain the SSDV field,
> and neither does a doc counted in totalDocs, so the right calculation should
> be *(totalHits has SSDV) / (totalDocs has SSDV)*. *(totalDocs has SSDV)* is
> easy to get via SortedSetDocValues#getValueCount(). *(totalHits has SSDV)* is
> hard to get, because we can only read the index by the docIds provided by
> FacetsCollector, and computing it that way is slow and redundant.
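
To make the concern above concrete, here is a minimal Java sketch of the two ratio checks. It does not use Lucene; the class name, method names, and the example numbers are all hypothetical and chosen only for illustration.

```java
// Hypothetical sketch, not Lucene code: compares the sparse check described
// in the issue with a variant that only looks at docs containing the field.
final class SparseDecisionSketch {

  // The check as described in the issue: every hit and every doc is counted,
  // whether or not it has a value for the faceted field.
  static boolean sparseByAllDocs(int totalHits, int totalDocs) {
    return totalHits < totalDocs / 10;
  }

  // Assumed alternative: count only hits and docs that carry the field.
  static boolean sparseByFieldDocs(int hitsWithField, int docsWithField) {
    return hitsWithField < docsWithField / 10;
  }

  public static void main(String[] args) {
    // Made-up numbers: 1000 docs, 200 hits, but only 100 docs have the
    // field and only 5 of the hits do.
    System.out.println(sparseByAllDocs(200, 1000));
    System.out.println(sparseByFieldDocs(5, 100));
  }
}
```

With these made-up numbers the two checks disagree: the first treats the hits as dense, while the field-aware one treats them as sparse.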
> Solution:
> If we don't want to break the existing logic (use denseCounts when
> cardinality < 1024, use IntIntHashMap below the 10% threshold, and use
> denseCounts in all other cases), then we could still use denseCounts
> if cardinality < 1024 and otherwise start with IntIntHashMap. Once 10% of the
> unique terms have been collected, switch to denseCounts.
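
The proposal above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the actual StringValueFacetCounts implementation: the class AdaptiveOrdCounter and its methods are hypothetical, and java.util.HashMap stands in for IntIntHashMap so that the example runs without Lucene.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the proposed strategy; not Lucene code.
final class AdaptiveOrdCounter {
  // Cardinality below which dense counting is always used (from the issue).
  private static final int DENSE_CARDINALITY_LIMIT = 1024;

  private final int cardinality;
  private int[] denseCounts;                  // non-null once counting is dense
  private Map<Integer, Integer> sparseCounts; // stand-in for IntIntHashMap

  AdaptiveOrdCounter(int cardinality) {
    this.cardinality = cardinality;
    if (cardinality < DENSE_CARDINALITY_LIMIT) {
      denseCounts = new int[cardinality];
    } else {
      sparseCounts = new HashMap<>();
    }
  }

  void increment(int ord) {
    if (denseCounts != null) {
      denseCounts[ord]++;
      return;
    }
    sparseCounts.merge(ord, 1, Integer::sum);
    // Once 10% of the unique terms have been seen, move to the dense array.
    if (sparseCounts.size() >= cardinality / 10) {
      switchToDense();
    }
  }

  // Copies the counts collected so far into a dense array.
  private void switchToDense() {
    int[] counts = new int[cardinality];
    for (Map.Entry<Integer, Integer> e : sparseCounts.entrySet()) {
      counts[e.getKey()] = e.getValue();
    }
    denseCounts = counts;
    sparseCounts = null;
  }

  boolean isDense() {
    return denseCounts != null;
  }

  int count(int ord) {
    return denseCounts != null ? denseCounts[ord] : sparseCounts.getOrDefault(ord, 0);
  }
}
```

The point of this design is that the switch depends only on the number of distinct ords actually collected, so it needs neither totalHits nor the number of docs that contain the field. The cost is a one-time copy from the map into the array when the threshold is crossed.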