[ https://issues.apache.org/jira/browse/LUCENE-10624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17556662#comment-17556662 ]
Weiming Wu commented on LUCENE-10624:
-------------------------------------

Added benchmark data to the issue description.

> Binary Search for Sparse IndexedDISI advanceWithinBlock & advanceExactWithinBlock
> ---------------------------------------------------------------------------------
>
>                 Key: LUCENE-10624
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10624
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/codecs
>    Affects Versions: 9.0, 9.1, 9.2
>            Reporter: Weiming Wu
>            Priority: Major
>         Attachments: baseline_sparseTaxis_searchsparse-sorted.0.log, candidate_sparseTaxis_searchsparse-sorted.0.log
>
>          Time Spent: 10m
>  Remaining Estimate: 0h
>
> h3. Problem Statement
> We noticed a DocValues read performance regression with the iterative API when
> upgrading from Lucene 5 to Lucene 9. Our latency increased by 50%. The
> degradation is similar to what is described in
> https://issues.apache.org/jira/browse/SOLR-9599
> By analyzing profiling data, we found that the methods "advanceWithinBlock" and
> "advanceExactWithinBlock" for Sparse IndexedDISI are slow in Lucene 9 because
> they use an O(N) doc lookup algorithm.
> h3. Changes
> Replaced the current O(N) lookup in Sparse IndexedDISI "advanceWithinBlock" and
> "advanceExactWithinBlock" with binary search, which is possible because the
> docs are stored in ascending order.
> h3. Test
> {code:java}
> ./gradlew tidy
> ./gradlew check
> {code}
> h3. Benchmark
> Ran the sparseTaxis test cases from luceneutil. The baseline and candidate
> reports are attached to this issue.
> 1. Most cases show a ~10% reduction in search latency.
> 2. Some highlights (>20%):
> * *T0 green_pickup_latitude:[40.75 TO 40.9] yellow_pickup_latitude:[40.75 TO 40.9] sort=null*
> ** *Baseline:* 10973978+ hits hits in *726.81967 msec*
> ** *Candidate:* 10973978+ hits hits in *484.544594 msec*
> * *T0 cab_color:y cab_color:g sort=null*
> ** *Baseline:* 2300174+ hits hits in *95.698324 msec*
> ** *Candidate:* 2300174+ hits hits in *78.336193 msec*
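
For readers unfamiliar with the change described under "Changes", here is a minimal sketch of the idea, not the actual patch. It assumes the doc IDs of a sparse block are already available in memory as an ascending array of unsigned 16-bit values; the real IndexedDISI reads them from an index input instead. The class and method names (SparseBlockSearch, lowerBound, advanceExact) are hypothetical and exist only for illustration.

```java
// Illustrative only: hypothetical names, in-memory array instead of Lucene's index input.
final class SparseBlockSearch {

    // Returns the index of the first entry in docs[from, to) whose unsigned
    // value is >= target, or `to` if every entry is smaller. Runs in O(log N)
    // instead of scanning the block entry by entry.
    static int lowerBound(short[] docs, int from, int to, int target) {
        int lo = from;
        int hi = to;
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            // Assumption: doc IDs within a block are stored as unsigned 16-bit values.
            int doc = Short.toUnsignedInt(docs[mid]);
            if (doc < target) {
                lo = mid + 1; // target can only be to the right of mid
            } else {
                hi = mid;     // mid may be the answer; keep it in range
            }
        }
        return lo;
    }

    // Analogue of an "advance exact" lookup: true if target is present in the block.
    static boolean advanceExact(short[] docs, int from, int to, int target) {
        int i = lowerBound(docs, from, to, target);
        return i < to && Short.toUnsignedInt(docs[i]) == target;
    }

    public static void main(String[] args) {
        short[] docs = {3, 7, 20, (short) 40000, (short) 65535};
        System.out.println(lowerBound(docs, 0, docs.length, 21));
        System.out.println(advanceExact(docs, 0, docs.length, 7));
    }
}
```

The lower-bound form is used because an "advance" lookup needs the first doc at or after the target, not only an exact match; the exact variant is then a single comparison on top of it.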