Re: [PR] Speedup concurrent multi-segment HNWS graph search [lucene]

2024-02-06 Thread via GitHub
mayya-sharipova closed pull request #12794: Speedup concurrent multi-segment HNWS graph search URL: https://github.com/apache/lucene/pull/12794 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the sp

Re: [PR] Speedup concurrent multi-segment HNWS graph search [lucene]

2024-02-06 Thread via GitHub
mayya-sharipova commented on PR #12794: URL: https://github.com/apache/lucene/pull/12794#issuecomment-1929779853 Closed in favour of https://github.com/apache/lucene/pull/12962

Re: [PR] Speedup concurrent multi-segment HNWS graph search [lucene]

2023-11-23 Thread via GitHub
mayya-sharipova commented on PR #12794: URL: https://github.com/apache/lucene/pull/12794#issuecomment-1824736252 @vigyasharma Answering other questions: > We seem to consistently see an improvement in recall between single segment, and multi-segment runs (both seq and conc.) on baseli

Re: [PR] Speedup concurrent multi-segment HNWS graph search [lucene]

2023-11-23 Thread via GitHub
mayya-sharipova commented on code in PR #12794: URL: https://github.com/apache/lucene/pull/12794#discussion_r1403584908 ## lucene/core/src/java/org/apache/lucene/search/TopKnnCollector.java: ## @@ -26,26 +26,71 @@ * @lucene.experimental */ public final class TopKnnCollector

Re: [PR] Speedup concurrent multi-segment HNWS graph search [lucene]

2023-11-23 Thread via GitHub
mayya-sharipova commented on code in PR #12794: URL: https://github.com/apache/lucene/pull/12794#discussion_r1403563168 ## lucene/core/src/java/org/apache/lucene/search/TopKnnCollector.java: ## @@ -26,26 +26,71 @@ * @lucene.experimental */ public final class TopKnnCollector

Re: [PR] Speedup concurrent multi-segment HNWS graph search [lucene]

2023-11-23 Thread via GitHub
mayya-sharipova commented on code in PR #12794: URL: https://github.com/apache/lucene/pull/12794#discussion_r1403560016 ## lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java: ## @@ -79,24 +81,30 @@ public Query rewrite(IndexSearcher indexSearcher) throws

Re: [PR] Speedup concurrent multi-segment HNWS graph search [lucene]

2023-11-23 Thread via GitHub
mayya-sharipova commented on code in PR #12794: URL: https://github.com/apache/lucene/pull/12794#discussion_r1403551939 ## lucene/core/src/java/org/apache/lucene/search/TopKnnCollector.java: ## @@ -26,26 +26,71 @@ * @lucene.experimental */ public final class TopKnnCollector

Re: [PR] Speedup concurrent multi-segment HNWS graph search [lucene]

2023-11-20 Thread via GitHub
vigyasharma commented on code in PR #12794: URL: https://github.com/apache/lucene/pull/12794#discussion_r1399616466 ## lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java: ## @@ -79,24 +81,30 @@ public Query rewrite(IndexSearcher indexSearcher) throws IOEx

Re: [PR] Speedup concurrent multi-segment HNWS graph search [lucene]

2023-11-17 Thread via GitHub
vigyasharma commented on PR #12794: URL: https://github.com/apache/lucene/pull/12794#issuecomment-1817282807 Do you have a mental model on what kind of graphs would see minimal loss of recall between baseline and candidate? Is this change better with denser (higher fanout) graphs? Would it

Re: [PR] Speedup concurrent multi-segment HNWS graph search [lucene]

2023-11-17 Thread via GitHub
vigyasharma commented on PR #12794: URL: https://github.com/apache/lucene/pull/12794#issuecomment-1817274998 We seem to consistently see an improvement in recall between single segment, and multi-segment runs (both seq and conc.) on baseline. Is this because with multiple segments, we get m

Re: [PR] Speedup concurrent multi-segment HNWS graph search [lucene]

2023-11-17 Thread via GitHub
vigyasharma commented on code in PR #12794: URL: https://github.com/apache/lucene/pull/12794#discussion_r1397994430 ## lucene/core/src/java/org/apache/lucene/search/TopKnnCollector.java: ## @@ -26,26 +26,71 @@ * @lucene.experimental */ public final class TopKnnCollector ext

Re: [PR] Speedup concurrent multi-segment HNWS graph search [lucene]

2023-11-16 Thread via GitHub
mayya-sharipova commented on PR #12794: URL: https://github.com/apache/lucene/pull/12794#issuecomment-1815203589 ## Experiments - Available processors: 10; thread pool size: 16 - luceneutil tool Search: - **baseline**: Lucene main branch - **candidate1**: only global queue

Re: [PR] Speedup concurrent multi-segment HNWS graph search [lucene]

2023-11-13 Thread via GitHub
benwtrent commented on PR #12794: URL: https://github.com/apache/lucene/pull/12794#issuecomment-1808282034 @mayya-sharipova two important measurements we need to check here: - When comparing baseline & candidate, can the `candidate` get to higher recall than baseline with lower laten

Re: [PR] Speedup concurrent multi-segment HNWS graph search [lucene]

2023-11-12 Thread via GitHub
mayya-sharipova commented on PR #12794: URL: https://github.com/apache/lucene/pull/12794#issuecomment-1807150252 **10M vectors of 100 dims** : k=100, 27 segments || Avg visited nodes |QPS| Recall| | :--- | ---: | ---: | ---: |

Re: [PR] Speedup concurrent multi-segment HNWS graph search [lucene]

2023-11-10 Thread via GitHub
benwtrent commented on PR #12794: URL: https://github.com/apache/lucene/pull/12794#issuecomment-1806359735 @mayya-sharipova with those experiments, I am guessing these are over multiple segments, could you include that information in the table? It would also be awesome to see what the

Re: [PR] Speedup concurrent multi-segment HNWS graph search [lucene]

2023-11-10 Thread via GitHub
mayya-sharipova commented on PR #12794: URL: https://github.com/apache/lucene/pull/12794#issuecomment-1806267939 ### Experiments - [luceneutil](https://github.com/mikemccand/luceneutil) tool - Apple M1 Max (10 CPU cores) - **baseline**: Lucene main branch - **c

[PR] Speedup concurrent multi-segment HNWS graph search [lucene]

2023-11-10 Thread via GitHub
mayya-sharipova opened a new pull request, #12794: URL: https://github.com/apache/lucene/pull/12794 Speed up concurrent multi-segment HNSW graph search by exchanging the global minimum similarity collected so far across segments. As the global similarity is used as a minimum threshold t
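
The idea in the PR description above (segments exchanging the minimum similarity collected so far) can be illustrated with a minimal Java sketch. This is not the code from the PR and not Lucene's API: `SharedMinSimilaritySketch` and `SegmentCollector` are hypothetical names, and the sketch assumes that higher similarity is better and that each segment keeps its own top-k heap. Once any one segment holds k results, its k-th best similarity is a valid lower bound for the global top-k, so other segments can skip candidates that fall below it.

```java
import java.util.PriorityQueue;
import java.util.concurrent.atomic.DoubleAccumulator;

// Hypothetical sketch, not a Lucene class: per-segment collectors that share
// a lower bound on the similarity a candidate needs to reach the global top-k.
final class SharedMinSimilaritySketch {

    static final class SegmentCollector {
        private final int k;
        // Min-heap of the best k similarities seen in this segment.
        private final PriorityQueue<Double> topK = new PriorityQueue<>();
        // Shared across segments; holds the largest k-th-best value published so far.
        private final DoubleAccumulator globalMin;

        SegmentCollector(int k, DoubleAccumulator globalMin) {
            this.k = k;
            this.globalMin = globalMin;
        }

        /** Returns true if the candidate was kept in this segment's top-k. */
        boolean collect(double similarity) {
            // Some segment already holds k results at or above the shared bound,
            // so anything strictly below it cannot be in the global top-k.
            if (similarity < globalMin.get()) {
                return false;
            }
            if (topK.size() < k) {
                topK.add(similarity);
            } else if (similarity > topK.peek()) {
                topK.poll();
                topK.add(similarity);
            } else {
                return false;
            }
            // Publish this segment's k-th best once its heap is full.
            if (topK.size() == k) {
                globalMin.accumulate(topK.peek());
            }
            return true;
        }
    }

    /** Creates the shared bound; it starts at negative infinity and only increases. */
    static DoubleAccumulator newSharedBound() {
        return new DoubleAccumulator(Math::max, Double.NEGATIVE_INFINITY);
    }
}
```

In this sketch each search thread would own one `SegmentCollector`, and all of them would be built with the same `DoubleAccumulator` from `newSharedBound()`. The bound is only a pruning threshold here; merging the per-segment results into a final top-k is left out.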