[jira] [Commented] (LUCENE-10233) Store docIds as bitset when leafCardinality = 1 to speed up addAll

Feng Guo (Jira) Wed, 17 Nov 2021 04:40:06 -0800


    [ 
https://issues.apache.org/jira/browse/LUCENE-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17445143#comment-17445143
 ]


Feng Guo commented on LUCENE-10233:
-----------------------------------

[~jpountz] I execute a low cardinality terms query on origin code/ initial 
implement / SparseFixedBitSet:

Origin: 158ms
Initial implement: 6ms
SparseFixedBitSet: 71ms

And this is the script: 
[https://github.com/apache/lucene/blob/2586014ca8575d55ae25fa4b35d8df6d2f8f40f2/lucene/core/src/java/org/apache/lucene/util/bkd/Run.java]

 

I hope there can be some more optimization for SparseFixedBitSet so that it can 
perform better.

> Store docIds as bitset when leafCardinality = 1 to speed up addAll
> ------------------------------------------------------------------
>
>                 Key: LUCENE-10233
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10233
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/codecs
>            Reporter: Feng Guo
>            Priority: Major
>          Time Spent: 20m
>  Remaining Estimate: 0h
>
> In low cardinality points cases, id blocks will usually store doc ids that 
> have the same point value, and {{intersect}} will get into {{addAll}} logic. 
> If we store ids as bitset, and give the IntersectVisitor bulk visiting 
> ability, we can speed up addAll because we can just execute the 'or' logic 
> between the result and the block ids.
> Optimization will be triggered when the following conditions are met at the 
> same time:
>  # leafCardinality = 1
>  # max(docId) - min(docId) <= 16 * pointCount (in order to avoid expanding 
> too much storage)
>  # no duplicate doc id
> I mocked a field that has 10,000,000 docs per value and search it with a 1 
> term PointInSetQuery, the build scorer time decreased from 71ms to 8ms.



--
This message was sent by Atlassian Jira
(v8.20.1#820001)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

[jira] [Commented] (LUCENE-10233) Store docIds as bitset when leafCardinality = 1 to speed up addAll

Reply via email to