Re: [PR] Add timeout support to AbstractVectorSimilarityQuery [lucene]

2024-05-13 Thread via GitHub
kaivalnp commented on code in PR #13285: URL: https://github.com/apache/lucene/pull/13285#discussion_r1599150976 ## lucene/core/src/java/org/apache/lucene/search/AbstractVectorSimilarityQuery.java: ## @@ -105,13 +116,16 @@ public Scorer scorer(LeafReaderContext context) throws

Re: [PR] Add timeout support to AbstractVectorSimilarityQuery [lucene]

2024-05-13 Thread via GitHub
kaivalnp commented on code in PR #13285: URL: https://github.com/apache/lucene/pull/13285#discussion_r1599151876 ## lucene/core/src/java/org/apache/lucene/search/AbstractVectorSimilarityQuery.java: ## @@ -144,22 +158,22 @@ protected boolean match(int doc) { }

Re: [PR] Add timeout support to AbstractVectorSimilarityQuery [lucene]

2024-05-13 Thread via GitHub
msokolov commented on code in PR #13285: URL: https://github.com/apache/lucene/pull/13285#discussion_r1599136370 ## lucene/core/src/java/org/apache/lucene/search/AbstractVectorSimilarityQuery.java: ## @@ -105,13 +116,16 @@ public Scorer scorer(LeafReaderContext context) throws

Re: [PR] Prefetch postings data. [lucene]

2024-05-13 Thread via GitHub
mikemccand commented on PR #13364: URL: https://github.com/apache/lucene/pull/13364#issuecomment-2108796478 This is cool! In the hot case, do we expect `prefetch` to be a no-op? So we are hoping for "first do no harm" in that case? (I haven't looked at `MMapDirectory`'s impl yet). But i

Re: [PR] Prefetch postings data. [lucene]

2024-05-13 Thread via GitHub
mikemccand commented on code in PR #13364: URL: https://github.com/apache/lucene/pull/13364#discussion_r1599078360 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99PostingsReader.java: ## @@ -1097,7 +1118,9 @@ public BlockImpactsDocsEnum(FieldInfo fieldInfo, In

Re: [PR] Prefetch postings data. [lucene]

2024-05-13 Thread via GitHub
mikemccand commented on code in PR #13364: URL: https://github.com/apache/lucene/pull/13364#discussion_r1599076213 ## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99PostingsReader.java: ## @@ -902,6 +917,12 @@ public int advance(int target) throws IOException {

Re: [PR] Prefetch postings data. [lucene]

2024-05-13 Thread via GitHub
mikemccand commented on PR #13364: URL: https://github.com/apache/lucene/pull/13364#issuecomment-2108781387 We might (eventually, later) consider an API change when pulling postings that expresses that the caller intends to advance? Or, maybe simpler would be expressing that caller will no

Re: [PR] Fix vector scorer interface consistency [lucene]

2024-05-13 Thread via GitHub
benwtrent merged PR #13365: URL: https://github.com/apache/lucene/pull/13365 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.a

Re: [I] Multi-value Support for KnnVectorField [lucene]

2024-05-13 Thread via GitHub
benwtrent commented on issue #12313: URL: https://github.com/apache/lucene/issues/12313#issuecomment-2108168850 > Not sure I understand option 3. Are you thinking that graph has different types of edges b/w documents based on diff. similarity functions? So if you were using max similarity y

Re: [I] Multi-value Support for KnnVectorField [lucene]

2024-05-13 Thread via GitHub
vigyasharma commented on issue #12313: URL: https://github.com/apache/lucene/issues/12313#issuecomment-2108154488 @benwtrent I like the idea of having documents be vertices in the graph, with an API that lets you iterate/access the different vectors per doc. It would have an indexing time

Re: [I] Exploring GPU based kNN vector search [lucene]

2024-05-13 Thread via GitHub
yupeng9 commented on issue #13003: URL: https://github.com/apache/lucene/issues/13003#issuecomment-2108153649 This is very interesting work. We saw that Milvus published an article on how GPUs accelerate vector search, which looks like a game changer. ``` For a batch size of 1, the T4 is 6.4x

Re: [PR] Make `IndexInput#prefetch` take an offset. [lucene]

2024-05-13 Thread via GitHub
rmuir commented on PR #13363: URL: https://github.com/apache/lucene/pull/13363#issuecomment-2108089293 I like this much better from an API perspective: it maps more closely to `madvise()` and to me is more straightforward. Especially as the current PRs out there to use it (terms and postings) are us

Re: [PR] Fix vector scorer interface consistency [lucene]

2024-05-13 Thread via GitHub
benwtrent commented on code in PR #13365: URL: https://github.com/apache/lucene/pull/13365#discussion_r1598645109 ## lucene/core/src/java/org/apache/lucene/search/FloatVectorSimilarityValuesSource.java: ## @@ -36,29 +36,6 @@ public FloatVectorSimilarityValuesSource(float[] vecto

Re: [PR] Fix vector scorer interface consistency [lucene]

2024-05-13 Thread via GitHub
benwtrent commented on code in PR #13365: URL: https://github.com/apache/lucene/pull/13365#discussion_r1598625772 ## lucene/core/src/java/org/apache/lucene/util/quantization/QuantizedByteVectorValues.java: ## @@ -53,5 +53,5 @@ public final long cost() { * @param query the qu

Re: [PR] Ensure negative scores are not returned from scalar quantization scorer [lucene]

2024-05-13 Thread via GitHub
benwtrent merged PR #13356: URL: https://github.com/apache/lucene/pull/13356

Re: [PR] Fix vector scorer interface consistency [lucene]

2024-05-13 Thread via GitHub
benwtrent commented on code in PR #13365: URL: https://github.com/apache/lucene/pull/13365#discussion_r1598534710 ## lucene/core/src/java/org/apache/lucene/util/quantization/QuantizedByteVectorValues.java: ## @@ -53,5 +53,5 @@ public final long cost() { * @param query the qu

Re: [I] Multi-value Support for KnnVectorField [lucene]

2024-05-13 Thread via GitHub
benwtrent commented on issue #12313: URL: https://github.com/apache/lucene/issues/12313#issuecomment-2107660631 @vigyasharma @krickert There are a couple of ways to implement this natively in Lucene. 1. Have each individual vector be a connection in the graph with some resolution ba

Re: [PR] Ensure negative scores are not returned from scalar quantization scorer [lucene]

2024-05-13 Thread via GitHub
benwtrent commented on code in PR #13356: URL: https://github.com/apache/lucene/pull/13356#discussion_r1598523684 ## lucene/CHANGES.txt: ## @@ -359,6 +359,8 @@ Bug Fixes * GITHUB#12966: Aggregation facets no longer assume that aggregation values are positive. (Stefan Vodita)

Re: [PR] Reduce memory usage of field maps in FieldInfos and BlockTree TermsReader. [lucene]

2024-05-13 Thread via GitHub
bruno-roustant merged PR #13327: URL: https://github.com/apache/lucene/pull/13327

Re: [PR] Prefetch postings data. [lucene]

2024-05-13 Thread via GitHub
rmuir commented on PR #13364: URL: https://github.com/apache/lucene/pull/13364#issuecomment-2107576749 Can we do better than blindly prefetching skipdata? Currently, skipdata is not used until we advance() past doc in the first block: https://github.com/apache/lucene/blob/83db8dfba3

Re: [I] Multi-value Support for KnnVectorField [lucene]

2024-05-13 Thread via GitHub
vigyasharma commented on issue #12313: URL: https://github.com/apache/lucene/issues/12313#issuecomment-2107568883 > In another scenario, the results would just return the top doc and not repeat it. I believe this is what the parent-block join implementation for vector values does cur

Re: [PR] Fix vector scorer interface consistency [lucene]

2024-05-13 Thread via GitHub
ChrisHegarty commented on code in PR #13365: URL: https://github.com/apache/lucene/pull/13365#discussion_r1598460737 ## lucene/core/src/java/org/apache/lucene/util/quantization/QuantizedByteVectorValues.java: ## @@ -53,5 +53,5 @@ public final long cost() { * @param query the

[PR] Fix vector scorer interface consistency [lucene]

2024-05-13 Thread via GitHub
benwtrent opened a new pull request, #13365: URL: https://github.com/apache/lucene/pull/13365 Follow up to: https://github.com/apache/lucene/pull/13181 I noticed the quantized interface had a slightly different name. Additionally, testing showed we are inconsistent when there ar

[PR] Prefetch postings data. [lucene]

2024-05-13 Thread via GitHub
jpountz opened a new pull request, #13364: URL: https://github.com/apache/lucene/pull/13364 This uses the `IndexInput#prefetch` API for postings. This relies on heuristics, as we don't know ahead of time what data we will need from a postings list: - Postings lists are prefetched entire

Re: [PR] Use `IndexInput#prefetch` for terms dictionary lookups. [lucene]

2024-05-13 Thread via GitHub
jpountz commented on code in PR #13359: URL: https://github.com/apache/lucene/pull/13359#discussion_r1598128358 ## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnum.java: ## @@ -307,6 +309,31 @@ private boolean setEOF() { return true; }

Re: [PR] Add timeout support to AbstractVectorSimilarityQuery [lucene]

2024-05-13 Thread via GitHub
kaivalnp commented on PR #13285: URL: https://github.com/apache/lucene/pull/13285#issuecomment-2107012946 Hi @benwtrent @vigyasharma could you help review this? Thanks!