Jackyrie2 opened a new pull request, #12480:
URL: https://github.com/apache/lucene/pull/12480

   ### Description
   This is an update to the previous PR. While benchmarking potential 
improvements to `HNSWGraphBuilder.initializeFromGraph`, two issues were 
found:
   * the ordinal of `newNode` was used as the index into the `scoringContext` 
map instead of the size
   * the entire `scoringContext` was evaluated during the call to 
`sortInternal`, instead of only checking whether the index being sorted has a 
pre-computed score
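
The two fixes can be illustrated with a small stand-alone sketch. This is not Lucene's code: `LazyScoreSketch`, `addUnscored`, `scoreAt` and `scoreCalls` are hypothetical names, and the sketch assumes that `scoringContext` only needs to record which positions still lack a score. Pending entries are keyed by the position at insertion time (the current size) rather than by node ordinal, and a score is computed only for the position being looked at.

```java
import java.util.BitSet;
import java.util.function.IntToDoubleFunction;

// Hypothetical toy class for illustration; it is NOT Lucene's NeighborArray.
final class LazyScoreSketch {
    private final int[] nodes;      // node ordinals, in insertion order
    private final float[] scores;   // scores[slot] is valid only once computed
    private final BitSet unscored = new BitSet(); // keyed by slot, NOT by node ordinal
    private final IntToDoubleFunction scorer;     // stand-in for the expensive similarity call
    private int size = 0;
    int scoreCalls = 0;             // counts how many scores were actually computed

    LazyScoreSketch(int capacity, IntToDoubleFunction scorer) {
        this.nodes = new int[capacity];
        this.scores = new float[capacity];
        this.scorer = scorer;
    }

    /** Adds a node without scoring it; the pending flag is stored at the slot (current size). */
    void addUnscored(int nodeOrd) {
        nodes[size] = nodeOrd;
        unscored.set(size); // using nodeOrd here would flag the wrong slot
        size++;
    }

    /** Returns the score for one slot, computing it only if that slot is still pending. */
    float scoreAt(int slot) {
        if (unscored.get(slot)) {
            scores[slot] = (float) scorer.applyAsDouble(nodes[slot]);
            unscored.clear(slot);
            scoreCalls++;
        }
        return scores[slot];
    }
}
```

With three nodes added, asking for the score at one slot triggers a single scorer call, and repeated lookups of the same slot trigger none.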
   
   Two new unit tests were added to demonstrate the bugs; this PR fixes both 
issues.
   
   ### Benchmarking Result
   To measure any meaningful latency improvement, we first have to create one 
big index and some smaller indexes; once we force an index merge, the index 
writer invokes `HNSWGraphBuilder.initializeFromGraph`. 
[KnnGraphTester](https://github.com/mikemccand/luceneutil/blob/master/src/main/KnnGraphTester.java#L690)
 was modified as follows:
   1. first add 90% of documents
   2. iw.commit()
   3. forceMerge into 1 segment
   4. set merge policy to NoMergePolicy and add the rest of the documents
   5. set merge policy to LogDocMergePolicy
   6. forceMerge into 1 segment again (this is the step captured in the 
benchmark; I verified in the logs that `initializeFromGraph` is called exactly 
once during it)
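
The order of operations and the timed region can be sketched as below. This is a stand-alone outline, not the actual `KnnGraphTester` change: the `Index` interface and its methods are hypothetical placeholders for the corresponding index writer calls, and the merge policies are passed as plain strings purely for illustration.

```java
// Hypothetical outline of the benchmark steps; it does not call Lucene.
final class MergeBenchmarkSketch {
    /** Placeholder for the index operations; a real run would use the index writer here. */
    interface Index {
        void addDocs(int fromInclusive, int toExclusive);
        void commit();
        void forceMergeToOneSegment();
        void setMergePolicy(String policyName);
    }

    /** Runs steps 1 to 6 and returns the elapsed nanoseconds of step 6 only. */
    static long run(Index index, int totalDocs) {
        int firstBatch = (int) ((long) totalDocs * 9 / 10); // 90% of the documents
        index.addDocs(0, firstBatch);              // step 1
        index.commit();                            // step 2
        index.forceMergeToOneSegment();            // step 3
        index.setMergePolicy("NoMergePolicy");     // step 4: no merging while adding the rest
        index.addDocs(firstBatch, totalDocs);
        index.setMergePolicy("LogDocMergePolicy"); // step 5
        long start = System.nanoTime();            // step 6: only this merge is timed
        index.forceMergeToOneSegment();
        return System.nanoTime() - start;
    }
}
```

Timing only the final merge keeps the cost of building the initial large segment out of the measurement.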
   
   From the benchmark results, we are calculating significantly fewer scores 
using the lazy eval enhancement. However, the indexMergeTime did not decrease 
as expected. 
   <img width="1162" alt="benchmark" 
src="https://github.com/apache/lucene/assets/45954779/c8201249-85d0-4192-90b3-543ae72623b5">
   
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

