[ 
https://issues.apache.org/jira/browse/LUCENE-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17445792#comment-17445792
 ] 

Feng Guo commented on LUCENE-10233:
-----------------------------------

Hi [~jpountz], could you give me some suggestions on how to speed up 
SparseFixedBitSet?



Here are some of my current thoughts: 

This is the CPU profile of SparseFixedBitSet: [^SparseFixedBitSet.png]. Most of 
the CPU time is spent constructing SparseFixedBitSet; I think this is because 
we always allocate big arrays there. Constructing a single global 
SparseFixedBitSet and reusing it for each block may help, but a global 
SparseFixedBitSet needs the segment's maxDoc, and maxDoc is not available in 
the BKDReader. I'm not sure it is worth changing the BKDReader constructor 
signature for this, since BKDReader is a public class.
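
A rough sketch of the reuse idea, only to illustrate the trade-off. It is not 
Lucene code: java.util.BitSet stands in for SparseFixedBitSet, and the class 
name ReusableDocIdBits and its load method are hypothetical. The bitset is 
allocated once from maxDoc (the value BKDReader does not currently have), and 
only the range touched by the previous block is cleared before the next block 
is loaded.

```java
import java.util.BitSet;

// Hypothetical helper: one bitset per segment, reused for every leaf block.
// java.util.BitSet is an assumption standing in for SparseFixedBitSet.
final class ReusableDocIdBits {
    private final BitSet bits;                 // allocated once, sized by maxDoc
    private int minSet = Integer.MAX_VALUE;    // lowest doc id set by the last block
    private int maxSet = -1;                   // highest doc id set by the last block

    ReusableDocIdBits(int maxDoc) {
        this.bits = new BitSet(maxDoc);
    }

    // Clears only the range used by the previous block, then sets the new doc ids.
    BitSet load(int[] docIds) {
        if (maxSet >= 0) {
            bits.clear(minSet, maxSet + 1);
        }
        minSet = Integer.MAX_VALUE;
        maxSet = -1;
        for (int doc : docIds) {
            bits.set(doc);
            minSet = Math.min(minSet, doc);
            maxSet = Math.max(maxSet, doc);
        }
        return bits;
    }
}
```

Clearing only the touched range keeps the per-block cost proportional to the 
block's doc id span rather than to maxDoc, but the caller must finish with the 
returned bitset before the next load.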

> Store docIds as bitset when leafCardinality = 1 to speed up addAll
> ------------------------------------------------------------------
>
>                 Key: LUCENE-10233
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10233
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/codecs
>            Reporter: Feng Guo
>            Priority: Major
>         Attachments: SparseFixedBitSet.png
>
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> In low-cardinality point cases, id blocks will usually store doc ids that 
> all have the same point value, and {{intersect}} will go into the {{addAll}} 
> logic. If we store the ids as a bitset and give the IntersectVisitor a bulk 
> visiting ability, we can speed up addAll because we can just 'or' the block 
> ids into the result.
> The optimization is triggered when all of the following conditions are met:
>  # leafCardinality = 1
>  # max(docId) - min(docId) <= 16 * pointCount (to avoid expanding storage 
> too much)
>  # no duplicate doc ids
> I mocked a field that has 10,000,000 docs per value and searched it with a 
> 1-term PointInSetQuery; the build scorer time decreased from 71ms to 8ms.
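
The three trigger conditions and the 'or' step in the description above can be 
sketched as follows. This is a minimal illustration under stated assumptions, 
not the patch itself: the class and method names are hypothetical, plain 
long[] words stand in for Lucene's bitset classes, the doc ids of a block are 
assumed to be sorted, and the block's words are aligned to a 64-bit boundary 
so that the merge is a word-by-word OR.

```java
// Hypothetical sketch; none of these names come from Lucene.
final class BitSetLeafSketch {

    // Checks the three conditions from the issue description for one leaf block.
    // Assumes docIds is sorted in ascending order.
    static boolean canStoreAsBitSet(int leafCardinality, int[] docIds) {
        if (leafCardinality != 1 || docIds.length == 0) {
            return false;
        }
        for (int i = 1; i < docIds.length; i++) {
            if (docIds[i] <= docIds[i - 1]) {
                return false; // duplicate (or unsorted) doc id
            }
        }
        long range = (long) docIds[docIds.length - 1] - docIds[0];
        return range <= 16L * docIds.length;
    }

    // Encodes the block as 64-bit words, relative to min(docId) rounded down to 64.
    static long[] encode(int[] docIds) {
        int base = docIds[0] & ~63;
        int last = docIds[docIds.length - 1];
        long[] words = new long[((last - base) >>> 6) + 1];
        for (int doc : docIds) {
            int rel = doc - base;
            words[rel >>> 6] |= 1L << (rel & 63);
        }
        return words;
    }

    // Bulk 'or' of the block words into the result words.
    static void orInto(long[] result, long[] blockWords, int minDocId) {
        int wordOffset = minDocId >>> 6;
        for (int i = 0; i < blockWords.length; i++) {
            result[wordOffset + i] |= blockWords[i];
        }
    }

    public static void main(String[] args) {
        int[] docIds = {70, 71, 75, 130};
        long[] result = new long[4]; // room for doc ids 0..255
        if (canStoreAsBitSet(1, docIds)) {
            orInto(result, encode(docIds), docIds[0]);
        }
        System.out.println(Long.bitCount(result[1]) + Long.bitCount(result[2]));
    }
}
```

The merge touches one word per 64 doc ids of the block's span instead of one 
call per doc id, which is where the saving in addAll would come from.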



--
This message was sent by Atlassian Jira
(v8.20.1#820001)
