[
https://issues.apache.org/jira/browse/LUCENE-10624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Weiming Wu updated LUCENE-10624:
--------------------------------
Description:
h3. Problem Statement
We noticed a DocValues read performance regression with the iterative API when
upgrading from Lucene 5 to Lucene 9. Our latency increased by 50%. The
degradation is similar to what's described in
https://issues.apache.org/jira/browse/SOLR-9599
By analyzing profiling data, we found that the methods "advanceWithinBlock" and
"advanceExactWithinBlock" for Sparse IndexedDISI are slow in Lucene 9 due to
their O(N) doc lookup algorithm.
h3. Changes
Replaced the current O(N) lookup in Sparse IndexedDISI "advanceWithinBlock" and
"advanceExactWithinBlock" with an O(log N) binary search, which is possible
because docs are stored in ascending order.
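To illustrate the idea, here is a minimal sketch that contrasts a linear scan with a binary search over a sorted block of doc IDs. It is not the patch itself: the class name, the method names, and the in-memory int[] are assumptions made for illustration, and the real IndexedDISI code reads doc IDs from the index file rather than from an array.

```java
/** Hypothetical sketch; none of these names come from the Lucene code base. */
final class SparseBlockSearch {
  private SparseBlockSearch() {}

  /** O(N) scan: first index i in [from, to) with docs[i] >= target, or to if none. */
  public static int advanceLinear(int[] docs, int from, int to, int target) {
    for (int i = from; i < to; i++) {
      if (docs[i] >= target) {
        return i;
      }
    }
    return to;
  }

  /** O(log N) lower-bound search over the same range; requires docs to be ascending. */
  public static int advanceBinary(int[] docs, int from, int to, int target) {
    int lo = from;
    int hi = to;
    while (lo < hi) {
      int mid = (lo + hi) >>> 1; // unsigned shift avoids overflow of lo + hi
      if (docs[mid] < target) {
        lo = mid + 1; // everything up to mid is smaller than target
      } else {
        hi = mid; // docs[mid] is a candidate, keep it in range
      }
    }
    return lo;
  }

  /** Exact variant: true only if target itself is present in [from, to). */
  public static boolean advanceExact(int[] docs, int from, int to, int target) {
    int i = advanceBinary(docs, from, to, target);
    return i < to && docs[i] == target;
  }
}
```

Both methods return the same index for any ascending input; only the number of comparisons differs. A return value equal to "to" means the target lies beyond the last doc in the searched range.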
h3. Test
{code:bash}
./gradlew tidy
./gradlew check
{code}
h3. Benchmark
Ran sparseTaxis from luceneutil. The baseline and candidate reports are
attached.
1. Most cases show a ~15% latency reduction.
2. Some highlights:
* T0 green_pickup_latitude:[40.75 TO 40.9]
yellow_pickup_latitude:[40.75 TO 40.9] sort=null
** Baseline: 10973978+ hits hits in 726.81967 msec
** Candidate: 10973978+ hits hits in 484.544594 msec (about 33% lower)
was:
h3. Problem Statement
We noticed a DocValues read performance regression with the iterative API when
upgrading from Lucene 5 to Lucene 9. Our latency increased by 50%. The
degradation is similar to what's described in
https://issues.apache.org/jira/browse/SOLR-9599
By analyzing profiling data, we found that the methods "advanceWithinBlock" and
"advanceExactWithinBlock" for Sparse IndexedDISI are slow in Lucene 9 due to
their O(N) doc lookup algorithm.
h3. Changes
Replaced the current O(N) lookup in Sparse IndexedDISI "advanceWithinBlock" and
"advanceExactWithinBlock" with an O(log N) binary search, which is possible
because docs are stored in ascending order.
h3. Test
{code:bash}
./gradlew tidy
./gradlew check
{code}
> Binary Search for Sparse IndexedDISI advanceWithinBlock &
> advanceExactWithinBlock
> ---------------------------------------------------------------------------------
>
> Key: LUCENE-10624
> URL: https://issues.apache.org/jira/browse/LUCENE-10624
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/codecs
> Affects Versions: 9.0, 9.1, 9.2
> Reporter: Weiming Wu
> Priority: Major
> Attachments: baseline_sparseTaxis_searchsparse-sorted.0.log,
> candidate_sparseTaxis_searchsparse-sorted.0.log
>
> Time Spent: 10m
> Remaining Estimate: 0h
>