jmazanec15 commented on PR #12050: URL: https://github.com/apache/lucene/pull/12050#issuecomment-1397643952
Per [this discussion](https://github.com/apache/lucene/pull/12050#discussion_r1061034056), I refactored OnHeapHnswGraph to use a TreeMap to represent the graph structure for levels greater than 0. I ran performance tests with the same setup as https://github.com/apache/lucene/issues/11354#issuecomment-1239961308. The results did not show a significant difference in indexing time between my previous implementation (Test-array), the implementation using the map (Test-map), and the current implementation with no merge optimization (Control). Additionally, the results did not show a difference in merge time between my previous implementation and the implementation using the map. Here are the results:

### Segment Size 10K

Exper. | Total indexing time (s) | Total time to merge numeric vectors (ms) | Recall
-- | -- | -- | --
Control-1 | 189s | 697280 | 0.979
Control-2 | 190s | 722042 | 0.979
Control-3 | 191s | 713402 | 0.979
Test-array 1 | 190s | 683966 | 0.98
Test-array 2 | 187s | 683584 | 0.98
Test-array 3 | 190s | 702458 | 0.98
Test-map 1 | 189s | 723582 | 0.98
Test-map 2 | 187s | 658196 | 0.98
Test-map 3 | 190s | 667777 | 0.98

### Segment Size 100K

Exper. | Total indexing time (s) | Total time to merge numeric vectors (ms) | Recall
-- | -- | -- | --
Control-1 | 366s | 675361 | 0.981
Control-2 | 370s | 695974 | 0.981
Control-3 | 367s | 684418 | 0.981
Test-array 1 | 368s | 651814 | 0.981
Test-array 2 | 368s | 654862 | 0.981
Test-array 3 | 368s | 656062 | 0.981
Test-map 1 | 364s | 637257 | 0.981
Test-map 2 | 370s | 628755 | 0.981
Test-map 3 | 366s | 647569 | 0.981

### Segment Size 500K

Exper. | Total indexing time (s) | Total time to merge numeric vectors (ms) | Recall
-- | -- | -- | --
Control-1 | 633s | 655538 | 0.98
Control-2 | 631s | 664622 | 0.98
Control-3 | 627s | 635919 | 0.98
Test-array 1 | 639s | 376139 | 0.98
Test-array 2 | 636s | 378071 | 0.98
Test-array 3 | 638s | 352633 | 0.98
Test-map 1 | 645s | 373572 | 0.98
Test-map 2 | 635s | 374309 | 0.98
Test-map 3 | 633s | 381212 | 0.98

Given that the results do not show a significant difference between the array and map implementations, I switched to the TreeMap to avoid multiple large array copies.
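
To illustrate the layout described above, here is a minimal sketch of a graph that keeps level 0 in a dense list and each higher level in a `TreeMap` keyed by node id. It is not the actual `OnHeapHnswGraph` code: the class name `LevelGraphSketch` and all of its methods are hypothetical, and the motivation in the comments (out-of-order inserts forcing a sorted array to shift its contents) is an assumption about what the "large array copies" refers to.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

/** Hypothetical sketch only; not Lucene's OnHeapHnswGraph. */
public class LevelGraphSketch {
  // Level 0 contains every node, so a dense list indexed by node id is enough.
  private final List<List<Integer>> level0 = new ArrayList<>();

  // Levels above 0 contain a sparse subset of nodes. Index i holds level i + 1.
  // A TreeMap keeps node ids sorted without shifting an array when a node
  // arrives out of order (assumed to be the copy cost the map avoids).
  private final List<TreeMap<Integer, List<Integer>>> upperLevels = new ArrayList<>();

  /** Registers a node on the given level, creating the level if needed. */
  public void addNode(int level, int node) {
    if (level == 0) {
      while (level0.size() <= node) {
        level0.add(new ArrayList<>());
      }
      return;
    }
    while (upperLevels.size() < level) {
      upperLevels.add(new TreeMap<>());
    }
    upperLevels.get(level - 1).putIfAbsent(node, new ArrayList<>());
  }

  /** Returns the mutable neighbor list of a node on a level. */
  public List<Integer> neighbors(int level, int node) {
    List<Integer> result = null;
    if (level == 0) {
      if (node >= 0 && node < level0.size()) {
        result = level0.get(node);
      }
    } else if (level <= upperLevels.size()) {
      result = upperLevels.get(level - 1).get(node);
    }
    if (result == null) {
      throw new IllegalArgumentException("node " + node + " is not on level " + level);
    }
    return result;
  }

  /** Adds a directed edge node -> neighbor on the given level. */
  public void addNeighbor(int level, int node, int neighbor) {
    neighbors(level, node).add(neighbor);
  }

  /** Returns the node ids present on a level, in ascending order. */
  public List<Integer> nodesOnLevel(int level) {
    List<Integer> nodes = new ArrayList<>();
    if (level == 0) {
      for (int i = 0; i < level0.size(); i++) {
        nodes.add(i);
      }
    } else if (level <= upperLevels.size()) {
      // TreeMap iterates its keys in sorted order.
      nodes.addAll(upperLevels.get(level - 1).keySet());
    }
    return nodes;
  }
}
```

In this sketch, adding node 7 and then node 3 to level 1 still yields the nodes in ascending order from `nodesOnLevel(1)`, at the cost of a logarithmic lookup per access instead of direct array indexing.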