[ https://issues.apache.org/jira/browse/LUCENE-10233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Feng Guo updated LUCENE-10233:
------------------------------
Description:
In low-cardinality points cases, doc id blocks will usually store doc ids that
all have the same point value, and intersect will go into the addAll path. If we
store the doc ids as a bitset when leafCardinality = 1, and give the
IntersectVisitor the ability to visit doc ids in bulk, we can speed up addAll,
because we can then just OR the block's doc ids into the result.
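When leafCardinality = 1 and the query matches that single value, every doc in the block matches, so the per-doc loop can in principle be replaced by one word-wise OR. A minimal sketch of the two paths follows. It assumes java.util.BitSet as a stand-in for the on-disk bitset block (Lucene has its own bitset classes, e.g. FixedBitSet), and the class and method names are hypothetical. They are not the BKD reader or IntersectVisitor API.

```java
import java.util.BitSet;

// Hypothetical sketch: names are illustrative, not Lucene's BKD/IntersectVisitor API.
// java.util.BitSet stands in for a bitset-encoded leaf block and for the result set.
class BitSetLeafSketch {

  /** Current path: the visitor receives the block's doc ids one at a time. */
  static void addAllOneByOne(int[] docIds, BitSet result) {
    for (int docId : docIds) {
      result.set(docId); // one call per matching document
    }
  }

  /** Proposed path: the whole block is OR-ed into the result in one bulk call. */
  static void addAllBulk(BitSet blockDocIds, BitSet result) {
    result.or(blockDocIds); // word-at-a-time OR, up to 64 docs per long
  }

  public static void main(String[] args) {
    int[] docIds = {3, 4, 5, 64, 65, 200};
    BitSet block = new BitSet();
    for (int d : docIds) {
      block.set(d); // what a bitset-encoded block would hold
    }

    BitSet viaLoop = new BitSet();
    addAllOneByOne(docIds, viaLoop);
    BitSet viaOr = new BitSet();
    addAllBulk(block, viaOr);

    System.out.println(viaLoop.equals(viaOr)); // true
    System.out.println(viaOr.cardinality());   // 6
  }
}
```

Both paths produce the same result set; the bulk path does work proportional to the number of 64-bit words in the block rather than to the number of docs.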
I mocked a field that has 10,000,000 docs per value and searched it with a
PointInSetQuery with 1 term; the build-scorer time decreased from 71ms to 8ms.
Concerns:
1. A bitset could occupy more disk space. (Maybe we can enable this
optimization only when the block's (max - min) <= n * count?)
2. MergeReader will become a bit slower because it needs to iterate the doc
ids one by one.
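Concern 1's proposed rule, (max - min) <= n * count, can be made concrete with a rough size comparison. The sketch below assumes the alternative encoding is a plain 32-bit int per doc id; the real doc id encoding may be more compact, which would lower the break-even n. Under that assumption, a bitset over [min, max] costs about (max - min + 1) bits versus 32 * count bits, so n = 32 roughly marks break-even. The names shouldUseBitSet, bitSetBytes and intArrayBytes are hypothetical.

```java
import java.util.Arrays;

// Hypothetical heuristic for concern 1: only use the bitset encoding when the
// block's doc id range is dense enough. Names and byte estimates are assumptions,
// not Lucene's actual doc id encoder.
class BitSetEncodingHeuristic {

  /** Proposed rule: (max - min) <= n * count, for a tunable factor n. */
  static boolean shouldUseBitSet(int[] sortedDocIds, int n) {
    int count = sortedDocIds.length;
    if (count == 0) {
      return false;
    }
    long range = (long) sortedDocIds[count - 1] - sortedDocIds[0];
    return range <= (long) n * count;
  }

  /** Bytes for a bitset covering [min, max], rounded up to whole 64-bit words. */
  static long bitSetBytes(int[] sortedDocIds) {
    long bits = (long) sortedDocIds[sortedDocIds.length - 1] - sortedDocIds[0] + 1;
    return ((bits + 63) / 64) * 8;
  }

  /** Bytes if each doc id were written as a plain 32-bit int (assumed baseline). */
  static long intArrayBytes(int[] sortedDocIds) {
    return 4L * sortedDocIds.length;
  }

  public static void main(String[] args) {
    int[] dense = new int[512];
    for (int i = 0; i < dense.length; i++) {
      dense[i] = 1000 + i; // contiguous doc ids
    }
    int[] sparse = new int[512];
    for (int i = 0; i < sparse.length; i++) {
      sparse[i] = 1000 + i * 100; // doc ids spread far apart
    }
    for (int[] block : Arrays.asList(dense, sparse)) {
      System.out.printf("useBitSet=%b bitset=%dB ints=%dB%n",
          shouldUseBitSet(block, 32), bitSetBytes(block), intArrayBytes(block));
    }
  }
}
```

For the dense block the rule picks the bitset (64 bytes versus 2048), and for the sparse block it falls back to ints (a bitset would need 6392 bytes versus 2048).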
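Concern 2 arises because a merge consumes doc ids individually, so a bitset block has to be decoded by walking its set bits. A minimal sketch, again assuming java.util.BitSet as a stand-in and a hypothetical visitEachDoc helper, is shown below. The walk touches every 64-bit word between min and max and makes one callback per doc, which is the extra cost compared with reading ids straight out of an int-encoded block.

```java
import java.util.BitSet;
import java.util.function.IntConsumer;

// Hypothetical sketch for concern 2: a merge that needs doc ids one at a time must
// iterate the set bits of a bitset block. visitEachDoc is an illustrative name only.
class BitSetBlockIteration {

  /** Calls the visitor for every doc id stored in the bitset block, in increasing order. */
  static void visitEachDoc(BitSet block, IntConsumer visitor) {
    for (int doc = block.nextSetBit(0); doc >= 0; doc = block.nextSetBit(doc + 1)) {
      visitor.accept(doc);
      if (doc == Integer.MAX_VALUE) {
        break; // avoid overflow of doc + 1
      }
    }
  }

  public static void main(String[] args) {
    BitSet block = new BitSet();
    block.set(10, 14); // docs 10, 11, 12, 13 (upper bound is exclusive)
    block.set(70);
    StringBuilder out = new StringBuilder();
    visitEachDoc(block, doc -> out.append(doc).append(' '));
    System.out.println(out.toString().trim()); // 10 11 12 13 70
  }
}
```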
was:
In low-cardinality points cases, doc id blocks will usually store doc ids that
all have the same point value, and intersect will go into the addAll path. If we
store the doc ids as a bitset when leafCardinality = 1, and give the
IntersectVisitor the ability to visit doc ids in bulk, we can speed up addAll,
because we can then just OR the block's doc ids into the result.
I mocked a field that has 10,000,000 docs per value and searched it with a
PointInSetQuery with 1 term; the build-scorer time decreased from 71ms to 8ms.
Concerns:
1. A bitset could occupy more disk space. (Maybe we can enable this
optimization only when the block's (max - min) <= n * count?)
2. MergeReader will become slower because it needs to iterate the doc ids one
by one.
> Store docIds as bitset when leafCardinality = 1 to speed up addAll
> ------------------------------------------------------------------
>
> Key: LUCENE-10233
> URL: https://issues.apache.org/jira/browse/LUCENE-10233
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/codecs
> Reporter: Feng Guo
> Priority: Major
> Time Spent: 10m
> Remaining Estimate: 0h
--
This message was sent by Atlassian Jira
(v8.20.1#820001)