[
https://issues.apache.org/jira/browse/LUCENE-10624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Weiming Wu updated LUCENE-10624:
--------------------------------
Attachment: candiate-exponential-searchsparse-sorted.0.log
> Binary Search for Sparse IndexedDISI advanceWithinBlock &
> advanceExactWithinBlock
> ---------------------------------------------------------------------------------
>
> Key: LUCENE-10624
> URL: https://issues.apache.org/jira/browse/LUCENE-10624
> Project: Lucene - Core
> Issue Type: Improvement
> Components: core/codecs
> Affects Versions: 9.0, 9.1, 9.2
> Reporter: Weiming Wu
> Priority: Major
> Attachments: baseline_sparseTaxis_searchsparse-sorted.0.log,
> candiate-exponential-searchsparse-sorted.0.log,
> candidate_sparseTaxis_searchsparse-sorted.0.log
>
> Time Spent: 0.5h
> Remaining Estimate: 0h
>
> h3. Problem Statement
> We noticed DocValue read performance regression with the iterative API when
> upgrading from Lucene 5 to Lucene 9. Our latency is increased by 50%. The
> degradation is similar to what's described in
> https://issues.apache.org/jira/browse/SOLR-9599
> By analyzing profiling data, we found method "advanceWithinBlock" and
> "advanceExactWithinBlock" for Sparse IndexedDISI is slow in Lucene 9 due to
> their O(N) doc lookup algorithm.
> h3. Changes
> Used binary search algorithm to replace current O(N) lookup algorithm in
> Sparse IndexedDISI "advanceWithinBlock" and "advanceExactWithinBlock" because
> docs are in ascending order.
> h3. Test
> {code:java}
> ./gradlew tidy
> ./gradlew check {code}
> h3. Benchmark
> Ran sparseTaxis test cases from {color:#1d1c1d}luceneutil. Attached the
> reports of baseline and candidates in attachments section.{color}
> {color:#1d1c1d}1. Most cases have 5-10% search latency reduction.{color}
> {color:#1d1c1d}2. Some highlights (>20%):{color}
> * *{color:#1d1c1d}T0 green_pickup_latitude:[40.75 TO 40.9]
> yellow_pickup_latitude:[40.75 TO 40.9] sort=null{color}*
> ** {color:#1d1c1d}*Baseline:* 10973978+ hits hits in *726.81967 msec*{color}
> ** {color:#1d1c1d}*Candidate:* 10973978+ hits hits in *484.544594
> msec*{color}
> * *{color:#1d1c1d}T0 cab_color:y cab_color:g sort=null{color}*
> ** {color:#1d1c1d}*Baseline:* 2300174+ hits hits in *95.698324 msec*{color}
> ** {color:#1d1c1d}*Candidate:* 2300174+ hits hits in *78.336193 msec*{color}
> * {color:#1d1c1d}*T1 cab_color:y cab_color:g sort=null*{color}
> ** {color:#1d1c1d}*Baseline:* 2300174+ hits hits in *391.565239 msec*{color}
> ** {color:#1d1c1d}*Candidate:* 300174+ hits hits in *227.592885
> msec*{color}{*}{*}
> * {color:#1d1c1d}*...*{color}
--
This message was sent by Atlassian Jira
(v8.20.7#820007)
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]