[
https://issues.apache.org/jira/browse/LUCENE-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17445032#comment-17445032
]
Feng Guo edited comment on LUCENE-10233 at 11/17/21, 11:00 AM:
---------------------------------------------------------------
[~jpountz] Thanks for the guidance! I agree that a SparseFixedBitSet fits the
existing framework better and is less intrusive, so I implemented it in the
latest commit. This approach looks good to me too, so let's go ahead!
was (Author: gf2121):
[~jpountz] Thanks for the guide! I agree with you that a SparseFixedBitSet is
more suitable for the existing framework and less intrusive, so I implemented
it in the newest commit. This approach looks good to me too, so let's go the
way you prefer!
> Store docIds as bitset when leafCardinality = 1 to speed up addAll
> ------------------------------------------------------------------
>
> Key: LUCENE-10233
> URL: https://issues.apache.org/jira/browse/LUCENE-10233
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/codecs
> Reporter: Feng Guo
> Priority: Major
> Time Spent: 20m
> Remaining Estimate: 0h
>
> When a points field has low cardinality, a doc id block usually stores doc
> ids that all share the same point value, so {{intersect}} ends up in the
> {{addAll}} logic. If we store those ids as a bitset and give the
> IntersectVisitor a bulk-visit ability, we can speed up {{addAll}}: it only
> needs to OR the block's bitset into the result.
> The optimization is triggered only when all of the following conditions
> hold:
> # leafCardinality = 1
> # max(docId) - min(docId) <= 16 * pointCount (to avoid inflating storage
> too much)
> # the block contains no duplicate doc ids
> I mocked a field with 10,000,000 docs per value and searched it with a
> 1-term PointInSetQuery; the time to build the scorer decreased from 71ms to
> 8ms.
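
The three conditions and the bulk OR described in the issue can be sketched
roughly as follows. This is only an illustration and not the code from the
patch: the class LeafBitSetSketch and its methods canStoreAsBitSet, toBitSet
and addAll are hypothetical names, java.util.BitSet stands in for Lucene's own
bitset classes, and the leaf's doc ids are assumed to arrive sorted in
ascending order.

```java
import java.util.BitSet;

/** Hypothetical sketch of the idea in this issue; not the Lucene implementation. */
final class LeafBitSetSketch {

  /**
   * Returns true when a leaf qualifies for bitset storage under the three
   * conditions listed in the issue. Assumes sortedDocIds is sorted ascending.
   */
  public static boolean canStoreAsBitSet(int[] sortedDocIds, int leafCardinality) {
    // Condition 1: every point in the leaf has the same value.
    if (leafCardinality != 1 || sortedDocIds.length == 0) {
      return false;
    }
    // Condition 3: no duplicate doc ids (adjacent check is enough when sorted).
    for (int i = 1; i < sortedDocIds.length; i++) {
      if (sortedDocIds[i] == sortedDocIds[i - 1]) {
        return false;
      }
    }
    // Condition 2: max(docId) - min(docId) <= 16 * pointCount.
    long range = (long) sortedDocIds[sortedDocIds.length - 1] - sortedDocIds[0];
    return range <= 16L * sortedDocIds.length;
  }

  /**
   * Builds a bitset with one bit per doc id. For simplicity the bits are
   * indexed by absolute doc id; an on-disk encoding would more likely store
   * them relative to the minimum doc id (an assumption, not from the issue).
   */
  public static BitSet toBitSet(int[] sortedDocIds) {
    BitSet bits = new BitSet(sortedDocIds[sortedDocIds.length - 1] + 1);
    for (int docId : sortedDocIds) {
      bits.set(docId);
    }
    return bits;
  }

  /** Bulk visit: OR the whole leaf into the result instead of adding ids one by one. */
  public static void addAll(BitSet result, BitSet leafDocIds) {
    result.or(leafDocIds);
  }
}
```

With doc ids {100, 101, 103, 110} and leafCardinality = 1, the range is 10 and
the limit is 16 * 4 = 64, so the leaf qualifies; a leaf holding only {0, 1000}
does not, because its range of 1000 exceeds 16 * 2 = 32.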