mayya-sharipova closed pull request #12794: Speedup concurrent multi-segment
HNSW graph search
URL: https://github.com/apache/lucene/pull/12794
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
mayya-sharipova commented on PR #12794:
URL: https://github.com/apache/lucene/pull/12794#issuecomment-1929779853
Closed in favour of https://github.com/apache/lucene/pull/12962
mayya-sharipova commented on PR #12794:
URL: https://github.com/apache/lucene/pull/12794#issuecomment-1824736252
@vigyasharma Answering other questions:
> We seem to consistently see an improvement in recall between single
segment, and multi-segment runs (both seq and conc.) on baseline
mayya-sharipova commented on code in PR #12794:
URL: https://github.com/apache/lucene/pull/12794#discussion_r1403584908
##
lucene/core/src/java/org/apache/lucene/search/TopKnnCollector.java:
##
@@ -26,26 +26,71 @@
* @lucene.experimental
*/
public final class TopKnnCollector
mayya-sharipova commented on code in PR #12794:
URL: https://github.com/apache/lucene/pull/12794#discussion_r1403563168
##
lucene/core/src/java/org/apache/lucene/search/TopKnnCollector.java:
##
@@ -26,26 +26,71 @@
* @lucene.experimental
*/
public final class TopKnnCollector
mayya-sharipova commented on code in PR #12794:
URL: https://github.com/apache/lucene/pull/12794#discussion_r1403560016
##
lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java:
##
@@ -79,24 +81,30 @@ public Query rewrite(IndexSearcher indexSearcher) throws IOException
mayya-sharipova commented on code in PR #12794:
URL: https://github.com/apache/lucene/pull/12794#discussion_r1403551939
##
lucene/core/src/java/org/apache/lucene/search/TopKnnCollector.java:
##
@@ -26,26 +26,71 @@
* @lucene.experimental
*/
public final class TopKnnCollector
vigyasharma commented on code in PR #12794:
URL: https://github.com/apache/lucene/pull/12794#discussion_r1399616466
##
lucene/core/src/java/org/apache/lucene/search/AbstractKnnVectorQuery.java:
##
@@ -79,24 +81,30 @@ public Query rewrite(IndexSearcher indexSearcher) throws IOException
vigyasharma commented on PR #12794:
URL: https://github.com/apache/lucene/pull/12794#issuecomment-1817282807
Do you have a mental model on what kind of graphs would see minimal loss of
recall between baseline and candidate? Is this change better with denser
(higher fanout) graphs? Would it
vigyasharma commented on PR #12794:
URL: https://github.com/apache/lucene/pull/12794#issuecomment-1817274998
We seem to consistently see an improvement in recall between single segment,
and multi-segment runs (both seq and conc.) on baseline. Is this because with
multiple segments, we get m
vigyasharma commented on code in PR #12794:
URL: https://github.com/apache/lucene/pull/12794#discussion_r1397994430
##
lucene/core/src/java/org/apache/lucene/search/TopKnnCollector.java:
##
@@ -26,26 +26,71 @@
* @lucene.experimental
*/
public final class TopKnnCollector ext
mayya-sharipova commented on PR #12794:
URL: https://github.com/apache/lucene/pull/12794#issuecomment-1815203589
## Experiments
- Available processors: 10; thread pool size: 16
- luceneutil tool
Search:
- **baseline**: Lucene main branch
- **candidate1**: only global queue
benwtrent commented on PR #12794:
URL: https://github.com/apache/lucene/pull/12794#issuecomment-1808282034
@mayya-sharipova two important measurements we need to check here:
- When comparing baseline & candidate, can the `candidate` get to higher
recall than baseline with lower laten
mayya-sharipova commented on PR #12794:
URL: https://github.com/apache/lucene/pull/12794#issuecomment-1807150252
**10M vectors of 100 dims**: k=100, 27 segments
| | Avg visited nodes | QPS | Recall |
| :--- | ---: | ---: | ---: |
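Recall in experiments like these is presumably measured against the exact (brute-force) top-k neighbours. The following is a minimal sketch of recall@k under that assumption; `RecallAtK` and `recall` are hypothetical names and not luceneutil's code.

```java
import java.util.HashSet;
import java.util.Set;

class RecallAtK {
  /**
   * Fraction of the exact top-k doc ids that also appear in the approximate top-k.
   * Assumes both arrays hold k distinct doc ids.
   */
  static double recall(int[] exactTopK, int[] approxTopK) {
    Set<Integer> truth = new HashSet<>();
    for (int doc : exactTopK) {
      truth.add(doc);
    }
    int hits = 0;
    for (int doc : approxTopK) {
      if (truth.contains(doc)) {
        hits++;
      }
    }
    return (double) hits / exactTopK.length;
  }

  public static void main(String[] args) {
    int[] exact = {3, 7, 11, 42};
    int[] approx = {7, 3, 99, 42};
    System.out.println(recall(exact, approx)); // prints 0.75
  }
}
```

Order within the top-k does not matter for this metric; only set overlap does.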
benwtrent commented on PR #12794:
URL: https://github.com/apache/lucene/pull/12794#issuecomment-1806359735
@mayya-sharipova with those experiments, I am guessing these are over
multiple segments, could you include that information in the table?
It would also be awesome to see what the
mayya-sharipova commented on PR #12794:
URL: https://github.com/apache/lucene/pull/12794#issuecomment-1806267939
### Experiments
- [luceneutil](https://github.com/mikemccand/luceneutil) tool
- Apple M1 Max (10 CPU cores)
- **baseline**: Lucene main branch
- **c
mayya-sharipova opened a new pull request, #12794:
URL: https://github.com/apache/lucene/pull/12794
Speed up concurrent multi-segment HNSW graph search by exchanging
the global minimum similarity collected so far across segments. As the global
similarity is used as a minimum threshold t
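A minimal sketch of the threshold-sharing idea, assuming similarities are floats where higher is better: each segment keeps a local top-k min-heap and, once that heap is full, publishes its k-th best similarity to a shared threshold. Other segments then reject any candidate that does not beat max(local k-th best, shared threshold). `GlobalThresholdDemo`, `GlobalMinSimilarity`, and `SegmentCollector` are hypothetical; only the method name `minCompetitiveSimilarity()` is borrowed from Lucene's `KnnCollector`. This is a simplification for illustration, not the PR's implementation.

```java
import java.util.PriorityQueue;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo: segment A fills its local queue first, and segment B is pruned by A's threshold.
public class GlobalThresholdDemo {
  public static void main(String[] args) {
    GlobalMinSimilarity global = new GlobalMinSimilarity();
    SegmentCollector segA = new SegmentCollector(2, global);
    SegmentCollector segB = new SegmentCollector(2, global);
    segA.collect(0.9f);
    segA.collect(0.8f); // A is full, so the shared threshold becomes 0.8
    System.out.println(segB.collect(0.5f)); // prints false: pruned by A's threshold
    System.out.println(segB.collect(0.85f)); // prints true
  }
}

/** Hypothetical shared threshold: the highest k-th best similarity any segment has reported. */
final class GlobalMinSimilarity {
  private final AtomicInteger bits =
      new AtomicInteger(Float.floatToIntBits(Float.NEGATIVE_INFINITY));

  float get() {
    return Float.intBitsToFloat(bits.get());
  }

  /** Raises the shared threshold to {@code candidate} if it is higher; never lowers it. */
  void offer(float candidate) {
    while (true) {
      int current = bits.get();
      if (Float.compare(candidate, Float.intBitsToFloat(current)) <= 0) {
        return; // another segment already published an equal or higher threshold
      }
      if (bits.compareAndSet(current, Float.floatToIntBits(candidate))) {
        return;
      }
      // CAS lost a race with a concurrent update; re-read and retry
    }
  }
}

/** Hypothetical per-segment collector that prunes with max(local k-th best, shared threshold). */
final class SegmentCollector {
  private final int k; // assumed >= 1
  private final PriorityQueue<Float> localTopK = new PriorityQueue<>(); // min-heap of similarities
  private final GlobalMinSimilarity global;

  SegmentCollector(int k, GlobalMinSimilarity global) {
    this.k = k;
    this.global = global;
  }

  /** Similarity a candidate must exceed to be collected. */
  float minCompetitiveSimilarity() {
    float local = localTopK.size() < k ? Float.NEGATIVE_INFINITY : localTopK.peek();
    return Math.max(local, global.get());
  }

  /** Returns true if the candidate was kept in this segment's local top-k. */
  boolean collect(float similarity) {
    if (similarity <= minCompetitiveSimilarity()) {
      return false;
    }
    localTopK.add(similarity);
    if (localTopK.size() > k) {
      localTopK.poll(); // evict the current local minimum
    }
    if (localTopK.size() == k) {
      global.offer(localTopK.peek()); // share this segment's k-th best with other segments
    }
    return true;
  }
}
```

Sharing the maximum of per-segment k-th best similarities is a conservative bound: it never exceeds the k-th best similarity over all segments combined, so (ties aside) it cannot reject a true global top-k result at collection time. In an HNSW search, however, a higher threshold also ends graph exploration earlier, which is where the speed versus recall trade-off discussed in this thread comes from.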