[ 
https://issues.apache.org/jira/browse/LUCENE-10624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17558247#comment-17558247
 ] 

Weiming Wu edited comment on LUCENE-10624 at 6/23/22 10:46 PM:
---------------------------------------------------------------

Hi Adrien. Thanks for your comments!

 

As for the reason for the speedup, I investigated this sparse doc test case. It 
retrieves all field values of the hit docs at the end of the test. The change 
speeds up this operation. 
[https://github.com/mikemccand/luceneutil/blob/master/src/main/perf/SearchTaxis.java#L152-L155.]
{code:java}
for (ScoreDoc hit : hits.scoreDocs) {
  Document doc = searcher.doc(hit.doc);
  results.add("  " + hit.doc + " " + hit.score + ": " + doc.getFields().size() + " fields");
}
{code}
Also, I found the blog post about this performance test. It seems the test is 
designed to measure sparse doc value retrieval (I'm not an expert, so feel free to 
correct me). 
[https://www.elastic.co/blog/sparse-versus-dense-document-values-with-apache-lucene]

 

For exponential search, I ran the performance test again. Compared to pure 
binary search, some cases speed up and some slow down. I also ran a test in our 
search system, and latency increased slightly because our docs are very sparse. 
Therefore, I feel I need to investigate more. I plan to open a new issue for 
exponential search. Does that make sense? [~jpountz] 
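
To make the comparison concrete, here is a minimal sketch of the two lookups over a sorted array of doc IDs. It is an illustration only, not the patch and not the IndexedDISI code: the class and method names are hypothetical, and a plain int[] stands in for the doc IDs of one sparse block.

```java
// Hypothetical helper for illustration; not part of Lucene.
final class SparseBlockSearch {

  private SparseBlockSearch() {}

  /** Returns the first index in [from, to) whose doc ID is >= target, or `to` if none. */
  static int lowerBound(int[] docs, int from, int to, int target) {
    int lo = from;
    int hi = to;
    while (lo < hi) {
      int mid = (lo + hi) >>> 1;
      if (docs[mid] < target) {
        lo = mid + 1;
      } else {
        hi = mid;
      }
    }
    return lo;
  }

  /** Pure binary search over the rest of the block, starting at `from`. */
  static int advanceBinary(int[] docs, int from, int target) {
    return lowerBound(docs, from, docs.length, target);
  }

  /**
   * Exponential (galloping) search: double the step from the current position
   * until the target is overshot, then binary search inside the last window.
   */
  static int advanceExponential(int[] docs, int from, int target) {
    if (from >= docs.length) {
      return docs.length;
    }
    if (docs[from] >= target) {
      return from;
    }
    int prev = from; // invariant: docs[prev] < target
    int step = 1;
    while (prev + step < docs.length && docs[prev + step] < target) {
      prev += step;
      step <<= 1;
    }
    // Either prev + step is past the end, or docs[prev + step] >= target.
    int hi = Math.min(prev + step, docs.length);
    return lowerBound(docs, prev + 1, hi, target);
  }
}
```

Both methods return the index of the first doc ID that is greater than or equal to the target. The exponential variant needs a number of comparisons that grows with the distance from the current position, so it should help when the target is close; when the target is far away, the doubling phase adds comparisons on top of the binary search, which may explain why some cases slow down.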


> Binary Search for Sparse IndexedDISI advanceWithinBlock & 
> advanceExactWithinBlock
> ---------------------------------------------------------------------------------
>
>                 Key: LUCENE-10624
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10624
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/codecs
>    Affects Versions: 9.0, 9.1, 9.2
>            Reporter: Weiming Wu
>            Priority: Major
>         Attachments: baseline_sparseTaxis_searchsparse-sorted.0.log, 
> candidate_sparseTaxis_searchsparse-sorted.0.log
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> h3. Problem Statement
> We noticed a DocValue read performance regression with the iterative API when 
> upgrading from Lucene 5 to Lucene 9. Our latency increased by 50%. The 
> degradation is similar to what's described in 
> https://issues.apache.org/jira/browse/SOLR-9599 
> By analyzing profiling data, we found that the methods "advanceWithinBlock" and 
> "advanceExactWithinBlock" for Sparse IndexedDISI are slow in Lucene 9 due to 
> their O(N) doc lookup algorithm.
> h3. Changes
> Replaced the current O(N) lookup algorithm in Sparse IndexedDISI 
> "advanceWithinBlock" and "advanceExactWithinBlock" with binary search, which 
> works because the docs are in ascending order.
> h3. Test
> {code:java}
> ./gradlew tidy
> ./gradlew check {code}
> h3. Benchmark
> Ran sparseTaxis test cases from luceneutil. Attached the 
> reports of baseline and candidates in attachments section.
> 1. Most cases have 5-10% search latency reduction.
> 2. Some highlights (>20%):
>  * *T0 green_pickup_latitude:[40.75 TO 40.9] yellow_pickup_latitude:[40.75 TO 40.9] sort=null*
>  ** *Baseline:* 10973978+ hits hits in *726.81967 msec*
>  ** *Candidate:* 10973978+ hits hits in *484.544594 msec*
>  * *T0 cab_color:y cab_color:g sort=null*
>  ** *Baseline:* 2300174+ hits hits in *95.698324 msec*
>  ** *Candidate:* 2300174+ hits hits in *78.336193 msec*
>  * *T1 cab_color:y cab_color:g sort=null*
>  ** *Baseline:* 2300174+ hits hits in *391.565239 msec*
>  ** *Candidate:* 300174+ hits hits in *227.592885 msec*
>  * *...*



--
This message was sent by Atlassian Jira
(v8.20.7#820007)

