[ 
https://issues.apache.org/jira/browse/LUCENE-10624?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Weiming Wu updated LUCENE-10624:
--------------------------------
    Description: 
h3. Problem Statement

We noticed DocValue read performance regression with the iterative API when 
upgrading from Lucene 5 to Lucene 9. Our latency is increased by 50%. The 
degradation is similar to what's described in 
https://issues.apache.org/jira/browse/SOLR-9599 

By analyzing profiling data, we found method "advanceWithinBlock" and 
"advanceExactWithinBlock" for Sparse IndexedDISI is slow in Lucene 9 due to 
their O(N) doc lookup algorithm.
h3. Changes

Used binary search algorithm to replace current O(N) lookup algorithm in Sparse 
IndexedDISI "advanceWithinBlock" and "advanceExactWithinBlock" because docs are 
in ascending order.
h3. Test
{code:java}
./gradlew tidy
./gradlew check {code}
h3. Benchmark

06/30/2022 Update: The below benchmark data points are invalid. I started a new 
AWS EC2 instance and run the test. The performance of candidate and baseline 
are very close.

 

-Ran sparseTaxis test cases from {color:#1d1c1d}luceneutil. Attached the 
reports of baseline and candidates in attachments section.{color}-

-{color:#1d1c1d}1. Most cases have 5-10% search latency reduction.{color}-

-{color:#1d1c1d}2. Some highlights (>20%):{color}-
 * -*{color:#1d1c1d}T0 green_pickup_latitude:[40.75 TO 40.9] 
yellow_pickup_latitude:[40.75 TO 40.9] sort=null{color}*-
 ** -{color:#1d1c1d}*Baseline:*  10973978+ hits hits in *726.81967 msec*{color}-
 ** -{color:#1d1c1d}*Candidate:* 10973978+ hits hits in *484.544594 
msec*{color}-
 * -*{color:#1d1c1d}T0 cab_color:y cab_color:g sort=null{color}*-
 ** -{color:#1d1c1d}*Baseline:* 2300174+ hits hits in *95.698324 msec*{color}-
 ** -{color:#1d1c1d}*Candidate:* 2300174+ hits hits in *78.336193 msec*{color}-
 * -{color:#1d1c1d}*T1 cab_color:y cab_color:g sort=null*{color}-
 ** -{color:#1d1c1d}*Baseline:* 2300174+ hits hits in *391.565239 msec*{color}-
 ** -{color:#1d1c1d}*Candidate:* 300174+ hits hits in *227.592885 
msec*{color}{*}{{*}}-
 * -{color:#1d1c1d}*...*{color}-

  was:
h3. Problem Statement

We noticed DocValue read performance regression with the iterative API when 
upgrading from Lucene 5 to Lucene 9. Our latency is increased by 50%. The 
degradation is similar to what's described in 
https://issues.apache.org/jira/browse/SOLR-9599 

By analyzing profiling data, we found method "advanceWithinBlock" and 
"advanceExactWithinBlock" for Sparse IndexedDISI is slow in Lucene 9 due to 
their O(N) doc lookup algorithm.
h3. Changes

Used binary search algorithm to replace current O(N) lookup algorithm in Sparse 
IndexedDISI "advanceWithinBlock" and "advanceExactWithinBlock" because docs are 
in ascending order.
h3. Test
{code:java}
./gradlew tidy
./gradlew check {code}
h3. Benchmark

Ran sparseTaxis test cases from {color:#1d1c1d}luceneutil. Attached the reports 
of baseline and candidates in attachments section.{color}

{color:#1d1c1d}1. Most cases have 5-10% search latency reduction.{color}

{color:#1d1c1d}2. Some highlights (>20%):{color}
 * *{color:#1d1c1d}T0 green_pickup_latitude:[40.75 TO 40.9] 
yellow_pickup_latitude:[40.75 TO 40.9] sort=null{color}*
 ** {color:#1d1c1d}*Baseline:*  10973978+ hits hits in *726.81967 msec*{color}
 ** {color:#1d1c1d}*Candidate:* 10973978+ hits hits in *484.544594 msec*{color}
 * *{color:#1d1c1d}T0 cab_color:y cab_color:g sort=null{color}*
 ** {color:#1d1c1d}*Baseline:* 2300174+ hits hits in *95.698324 msec*{color}
 ** {color:#1d1c1d}*Candidate:* 2300174+ hits hits in *78.336193 msec*{color}
 * {color:#1d1c1d}*T1 cab_color:y cab_color:g sort=null*{color}
 ** {color:#1d1c1d}*Baseline:* 2300174+ hits hits in *391.565239 msec*{color}
 ** {color:#1d1c1d}*Candidate:* 300174+ hits hits in *227.592885 
msec*{color}{*}{*}
 * {color:#1d1c1d}*...*{color}


> Binary Search for Sparse IndexedDISI advanceWithinBlock & 
> advanceExactWithinBlock
> ---------------------------------------------------------------------------------
>
>                 Key: LUCENE-10624
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10624
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/codecs
>    Affects Versions: 9.0, 9.1, 9.2
>            Reporter: Weiming Wu
>            Priority: Major
>         Attachments: baseline_sparseTaxis_searchsparse-sorted.0.log, 
> candiate-exponential-searchsparse-sorted.0.log, 
> candidate_sparseTaxis_searchsparse-sorted.0.log
>
>          Time Spent: 50m
>  Remaining Estimate: 0h
>
> h3. Problem Statement
> We noticed DocValue read performance regression with the iterative API when 
> upgrading from Lucene 5 to Lucene 9. Our latency is increased by 50%. The 
> degradation is similar to what's described in 
> https://issues.apache.org/jira/browse/SOLR-9599 
> By analyzing profiling data, we found method "advanceWithinBlock" and 
> "advanceExactWithinBlock" for Sparse IndexedDISI is slow in Lucene 9 due to 
> their O(N) doc lookup algorithm.
> h3. Changes
> Used binary search algorithm to replace current O(N) lookup algorithm in 
> Sparse IndexedDISI "advanceWithinBlock" and "advanceExactWithinBlock" because 
> docs are in ascending order.
> h3. Test
> {code:java}
> ./gradlew tidy
> ./gradlew check {code}
> h3. Benchmark
> 06/30/2022 Update: The below benchmark data points are invalid. I started a 
> new AWS EC2 instance and run the test. The performance of candidate and 
> baseline are very close.
>  
> -Ran sparseTaxis test cases from {color:#1d1c1d}luceneutil. Attached the 
> reports of baseline and candidates in attachments section.{color}-
> -{color:#1d1c1d}1. Most cases have 5-10% search latency reduction.{color}-
> -{color:#1d1c1d}2. Some highlights (>20%):{color}-
>  * -*{color:#1d1c1d}T0 green_pickup_latitude:[40.75 TO 40.9] 
> yellow_pickup_latitude:[40.75 TO 40.9] sort=null{color}*-
>  ** -{color:#1d1c1d}*Baseline:*  10973978+ hits hits in *726.81967 
> msec*{color}-
>  ** -{color:#1d1c1d}*Candidate:* 10973978+ hits hits in *484.544594 
> msec*{color}-
>  * -*{color:#1d1c1d}T0 cab_color:y cab_color:g sort=null{color}*-
>  ** -{color:#1d1c1d}*Baseline:* 2300174+ hits hits in *95.698324 msec*{color}-
>  ** -{color:#1d1c1d}*Candidate:* 2300174+ hits hits in *78.336193 
> msec*{color}-
>  * -{color:#1d1c1d}*T1 cab_color:y cab_color:g sort=null*{color}-
>  ** -{color:#1d1c1d}*Baseline:* 2300174+ hits hits in *391.565239 
> msec*{color}-
>  ** -{color:#1d1c1d}*Candidate:* 300174+ hits hits in *227.592885 
> msec*{color}{*}{{*}}-
>  * -{color:#1d1c1d}*...*{color}-



--
This message was sent by Atlassian Jira
(v8.20.10#820010)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to