[ https://issues.apache.org/jira/browse/LUCENE-10624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17558247#comment-17558247 ]
Weiming Wu edited comment on LUCENE-10624 at 6/23/22 10:46 PM:
---------------------------------------------------------------

Hi Adrien. Thanks for your comments!

Regarding the reason for the speedup, I investigated this sparse doc test case. It retrieves all field values of the hit docs at the end of the test, and the change speeds up this operation.

[https://github.com/mikemccand/luceneutil/blob/master/src/main/perf/SearchTaxis.java#L152-L155]
{code:java}
for (ScoreDoc hit : hits.scoreDocs) {
  Document doc = searcher.doc(hit.doc);
  results.add(" " + hit.doc + " " + hit.score + ": " + doc.getFields().size() + " fields");
}
{code}
Also, I found the blog post about this performance test. It seems the test is designed to measure sparse doc value retrieval (I'm not an expert, so feel free to correct me).

[https://www.elastic.co/blog/sparse-versus-dense-document-values-with-apache-lucene]

For exponential search, I ran the performance test again. Compared with pure binary search, some cases speed up and some slow down. I also ran a test in our search system; the latency increased slightly because our docs are very sparse. Therefore, I feel I need to investigate more. I plan to open a new issue for exponential search. Does that make sense? [~jpountz]

> Binary Search for Sparse IndexedDISI advanceWithinBlock & advanceExactWithinBlock
> ---------------------------------------------------------------------------------
>
>                 Key: LUCENE-10624
>                 URL: https://issues.apache.org/jira/browse/LUCENE-10624
>             Project: Lucene - Core
>          Issue Type: Improvement
>          Components: core/codecs
>    Affects Versions: 9.0, 9.1, 9.2
>            Reporter: Weiming Wu
>            Priority: Major
>         Attachments: baseline_sparseTaxis_searchsparse-sorted.0.log, candidate_sparseTaxis_searchsparse-sorted.0.log
>
>          Time Spent: 0.5h
>  Remaining Estimate: 0h
>
> h3. Problem Statement
> We noticed a DocValue read performance regression with the iterative API when upgrading from Lucene 5 to Lucene 9. Our latency increased by 50%. The degradation is similar to what's described in https://issues.apache.org/jira/browse/SOLR-9599
> By analyzing profiling data, we found that the "advanceWithinBlock" and "advanceExactWithinBlock" methods for Sparse IndexedDISI are slow in Lucene 9 due to their O(N) doc lookup algorithm.
> h3. Changes
> Used a binary search to replace the current O(N) lookup algorithm in Sparse IndexedDISI "advanceWithinBlock" and "advanceExactWithinBlock". This works because the docs are in ascending order.
> h3. Test
> {code:java}
> ./gradlew tidy
> ./gradlew check
> {code}
> h3. Benchmark
> Ran the sparseTaxis test cases from {color:#1d1c1d}luceneutil. The baseline and candidate reports are attached in the attachments section.{color}
> {color:#1d1c1d}1. Most cases have a 5-10% search latency reduction.{color}
> {color:#1d1c1d}2. 
Some highlights (>20%):{color}
> * *{color:#1d1c1d}T0 green_pickup_latitude:[40.75 TO 40.9] yellow_pickup_latitude:[40.75 TO 40.9] sort=null{color}*
> ** {color:#1d1c1d}*Baseline:* 10973978+ hits hits in *726.81967 msec*{color}
> ** {color:#1d1c1d}*Candidate:* 10973978+ hits hits in *484.544594 msec*{color}
> * *{color:#1d1c1d}T0 cab_color:y cab_color:g sort=null{color}*
> ** {color:#1d1c1d}*Baseline:* 2300174+ hits hits in *95.698324 msec*{color}
> ** {color:#1d1c1d}*Candidate:* 2300174+ hits hits in *78.336193 msec*{color}
> * {color:#1d1c1d}*T1 cab_color:y cab_color:g sort=null*{color}
> ** {color:#1d1c1d}*Baseline:* 2300174+ hits hits in *391.565239 msec*{color}
> ** {color:#1d1c1d}*Candidate:* 300174+ hits hits in *227.592885 msec*{color}
> * {color:#1d1c1d}*...*{color}
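
For readers who want to see the idea from the "Changes" section in code, here is a minimal standalone sketch. It is not the actual Lucene patch: it assumes the doc IDs of one sparse block are already available as an ascending int array, whereas the real IndexedDISI reads them from the index. The class name SparseBlockSearch and the method names advance and advanceExact are hypothetical and only loosely mirror "advanceWithinBlock" and "advanceExactWithinBlock".

```java
/** Hypothetical illustration only; not Lucene's IndexedDISI code. */
public class SparseBlockSearch {

  /**
   * Returns the index of the first doc ID that is >= target,
   * or docs.length if every doc ID in the block is smaller.
   * Assumes docs is sorted in ascending order.
   */
  static int advance(int[] docs, int target) {
    int lo = 0;
    int hi = docs.length; // exclusive upper bound
    while (lo < hi) {
      int mid = (lo + hi) >>> 1; // unsigned shift avoids int overflow
      if (docs[mid] < target) {
        lo = mid + 1; // answer lies strictly to the right of mid
      } else {
        hi = mid; // docs[mid] is a candidate; keep searching left
      }
    }
    return lo;
  }

  /** Returns true if target is one of the doc IDs in the block. */
  static boolean advanceExact(int[] docs, int target) {
    int idx = advance(docs, target);
    return idx < docs.length && docs[idx] == target;
  }

  public static void main(String[] args) {
    int[] docs = {3, 17, 42, 100, 2048}; // made-up doc IDs
    System.out.println(advance(docs, 18));
    System.out.println(advanceExact(docs, 42));
    System.out.println(advanceExact(docs, 43));
  }
}
```

Each lookup in this sketch halves the remaining range, so a block of N docs needs O(log N) comparisons instead of the O(N) of a linear scan. Whether that wins in practice depends on how far each advance typically jumps, which is what the benchmark figures above are meant to measure.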