[GitHub] [lucene] jpountz commented on pull request #12334: Fix searchafter query high latency when after value is out of range for segment

2023-06-14 Thread via GitHub
jpountz commented on PR #12334: URL: https://github.com/apache/lucene/pull/12334#issuecomment-1590631585 We just upgraded Elasticsearch to a Lucene snapshot that has this change, and this triggered major speedups on some queries. In my opinion, the PR title and description don't do justice

[GitHub] [lucene] gashutos commented on pull request #12334: Fix searchafter query high latency when after value is out of range for segment

2023-06-14 Thread via GitHub
gashutos commented on PR #12334: URL: https://github.com/apache/lucene/pull/12334#issuecomment-1590638213 > We just upgraded Elasticsearch to a Lucene snapshot that has this change, and this triggered major speedups on some queries. In my opinion, the PR title and description don't do justi

[GitHub] [lucene] jpountz commented on pull request #12334: Fix searchafter query high latency when after value is out of range for segment

2023-06-14 Thread via GitHub
jpountz commented on PR #12334: URL: https://github.com/apache/lucene/pull/12334#issuecomment-1590663201 @gashutos I think we should make users aware of this optimization, would you be up for opening another PR that adds a CHANGES entry? -- This is an automated message from the Apache Git

[GitHub] [lucene] jpountz commented on pull request #12334: Fix searchafter query high latency when after value is out of range for segment

2023-06-14 Thread via GitHub
jpountz commented on PR #12334: URL: https://github.com/apache/lucene/pull/12334#issuecomment-1590666069 Let's also update the title/description of this PR?

[GitHub] [lucene] gashutos opened a new pull request, #12367: Add CHANGES.txt for #12334 Honor after value for skipping documents even if queue is not full for PagingFieldCollector

2023-06-14 Thread via GitHub
gashutos opened a new pull request, #12367: URL: https://github.com/apache/lucene/pull/12367 ### Description Adding CHANGES.txt in improvements sections.

[GitHub] [lucene] gashutos closed pull request #12367: Add CHANGES.txt for #12334 Honor after value for skipping documents even if queue is not full for PagingFieldCollector

2023-06-14 Thread via GitHub
gashutos closed pull request #12367: Add CHANGES.txt for #12334 Honor after value for skipping documents even if queue is not full for PagingFieldCollector URL: https://github.com/apache/lucene/pull/12367

[GitHub] [lucene] gashutos opened a new pull request, #12368: Add CHANGES.txt for #12334 Honor after value for skipping documents e…

2023-06-14 Thread via GitHub
gashutos opened a new pull request, #12368: URL: https://github.com/apache/lucene/pull/12368 Adding CHANGES.txt for #12334

[GitHub] [lucene] gashutos commented on pull request #12334: Honor after value for skipping documents even if queue is not full for PagingFieldCollector

2023-06-14 Thread via GitHub
gashutos commented on PR #12334: URL: https://github.com/apache/lucene/pull/12334#issuecomment-1590701060 Sure, I changed the title/description, LMK if it looks good. CHANGES.txt PR: https://github.com/apache/lucene/pull/12368

[GitHub] [lucene] javanna commented on issue #12347: Allow extensions of IndexSearcher to provide custom SliceExecutor and slices computation

2023-06-14 Thread via GitHub
javanna commented on issue #12347: URL: https://github.com/apache/lucene/issues/12347#issuecomment-1590702834 heya @sohami thanks a lot for sharing more context. > With custom slice computation to control the max slices per request/index the limiting factor in SliceExecutor will not

[GitHub] [lucene] jpountz commented on pull request #12334: Honor after value for skipping documents even if queue is not full for PagingFieldCollector

2023-06-14 Thread via GitHub
jpountz commented on PR #12334: URL: https://github.com/apache/lucene/pull/12334#issuecomment-1590703139 Looks great, thanks!

[GitHub] [lucene] jpountz merged pull request #12368: Add CHANGES.txt for #12334 Honor after value for skipping documents e…

2023-06-14 Thread via GitHub
jpountz merged PR #12368: URL: https://github.com/apache/lucene/pull/12368

[GitHub] [lucene] LuXugang commented on pull request #12349: CompetitiveIterator should be null if sort field does not exist in TermOrdValComparator

2023-06-14 Thread via GitHub
LuXugang commented on PR #12349: URL: https://github.com/apache/lucene/pull/12349#issuecomment-1590716969 ```java public void test111() throws IOException{ Directory dir = newDirectory(); IndexWriterConfig iwc = new IndexWriterConfig(new MockAnalyzer(random()));

[GitHub] [lucene] alessandrobenedetti commented on a diff in pull request #12253: GITHUB-12252: Add function queries for computing similarity scores between knn vectors

2023-06-14 Thread via GitHub
alessandrobenedetti commented on code in PR #12253: URL: https://github.com/apache/lucene/pull/12253#discussion_r1229310619 ## lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/ConstKnnFloatValueSource.java: ## @@ -0,0 +1,74 @@ +/* + * Licensed to the Apache

[GitHub] [lucene] uschindler commented on a diff in pull request #12253: GITHUB-12252: Add function queries for computing similarity scores between knn vectors

2023-06-14 Thread via GitHub
uschindler commented on code in PR #12253: URL: https://github.com/apache/lucene/pull/12253#discussion_r1229323827 ## lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/ConstKnnFloatValueSource.java: ## @@ -0,0 +1,74 @@ +/* + * Licensed to the Apache Software

[GitHub] [lucene] uschindler commented on a diff in pull request #12253: GITHUB-12252: Add function queries for computing similarity scores between knn vectors

2023-06-14 Thread via GitHub
uschindler commented on code in PR #12253: URL: https://github.com/apache/lucene/pull/12253#discussion_r1229329251 ## lucene/queries/src/java/org/apache/lucene/queries/function/valuesource/ConstKnnFloatValueSource.java: ## @@ -0,0 +1,74 @@ +/* + * Licensed to the Apache Software

[GitHub] [lucene] jpountz commented on pull request #12349: CompetitiveIterator should be null if sort field does not exist in TermOrdValComparator

2023-06-14 Thread via GitHub
jpountz commented on PR #12349: URL: https://github.com/apache/lucene/pull/12349#issuecomment-1590865303 I agree that we should fix this comparator so that the last call to `IndexSearcher.search` in your test only collects 2000 hits. This doesn't seem to be what your PR does though?

[GitHub] [lucene] jpountz merged pull request #12366: Move TermAndBoost back to its original location.

2023-06-14 Thread via GitHub
jpountz merged PR #12366: URL: https://github.com/apache/lucene/pull/12366

[GitHub] [lucene] javanna opened a new pull request, #12369: Increased the likelihood of leveraging inter-segment concurrency in tests

2023-06-14 Thread via GitHub
javanna opened a new pull request, #12369: URL: https://github.com/apache/lucene/pull/12369 We have recently increased the likelihood of leveraging inter-segment search concurrency in tests when newSearcher is used to create the index searcher (see #959). When parallel execution is enabled

[GitHub] [lucene] javanna commented on a diff in pull request #12369: Increased the likelihood of leveraging inter-segment concurrency in tests

2023-06-14 Thread via GitHub
javanna commented on code in PR #12369: URL: https://github.com/apache/lucene/pull/12369#discussion_r1229356704 ## lucene/test-framework/src/java/org/apache/lucene/tests/util/LuceneTestCase.java: ## @@ -1965,9 +1966,9 @@ public static IndexSearcher newSearcher( .add

[GitHub] [lucene] javanna commented on a diff in pull request #12369: Increased the likelihood of leveraging inter-segment concurrency in tests

2023-06-14 Thread via GitHub
javanna commented on code in PR #12369: URL: https://github.com/apache/lucene/pull/12369#discussion_r1229359014 ## lucene/test-framework/src/java/org/apache/lucene/tests/util/LuceneTestCase.java: ## @@ -1965,9 +1966,9 @@ public static IndexSearcher newSearcher( .add

[GitHub] [lucene] jpountz commented on a diff in pull request #12369: Increased the likelihood of leveraging inter-segment concurrency in tests

2023-06-14 Thread via GitHub
jpountz commented on code in PR #12369: URL: https://github.com/apache/lucene/pull/12369#discussion_r1229481989 ## lucene/test-framework/src/java/org/apache/lucene/tests/util/LuceneTestCase.java: ## @@ -1941,7 +1940,7 @@ public static IndexSearcher newSearcher( } else {

[GitHub] [lucene] javanna commented on a diff in pull request #12369: Increased the likelihood of leveraging inter-segment concurrency in tests

2023-06-14 Thread via GitHub
javanna commented on code in PR #12369: URL: https://github.com/apache/lucene/pull/12369#discussion_r1229508836 ## lucene/test-framework/src/java/org/apache/lucene/tests/util/LuceneTestCase.java: ## @@ -1941,7 +1940,7 @@ public static IndexSearcher newSearcher( } else {

[GitHub] [lucene] LuXugang commented on pull request #12349: CompetitiveIterator should be null if sort field does not exist in TermOrdValComparator

2023-06-14 Thread via GitHub
LuXugang commented on PR #12349: URL: https://github.com/apache/lucene/pull/12349#issuecomment-1591110183 > This doesn't seem to be what your PR does though? It indeed has no relation to this PR. > If search sort field does not exist, should we early terminate collection after

[GitHub] [lucene] jpountz commented on pull request #12349: CompetitiveIterator should be null if sort field does not exist in TermOrdValComparator

2023-06-14 Thread via GitHub
jpountz commented on PR #12349: URL: https://github.com/apache/lucene/pull/12349#issuecomment-159992 +1

[GitHub] [lucene] uschindler commented on pull request #12281: Add checks in KNNVectorField / KNNVectorQuery to only allow non-null, non-empty and finite vectors

2023-06-14 Thread via GitHub
uschindler commented on PR #12281: URL: https://github.com/apache/lucene/pull/12281#issuecomment-1591112679 I did not see any slowdowns in last night's @mikemccand benchmark caused by the check during indexing and on building the query.

[GitHub] [lucene] uschindler commented on issue #12358: Optimize `count()` for BooleanQuery disjunction

2023-06-14 Thread via GitHub
uschindler commented on issue #12358: URL: https://github.com/apache/lucene/issues/12358#issuecomment-1591117808 Hi, thanks for cross-checking. A 1-hour warmup therefore does not change anything. Anyway, I'd use a newer JDK like 20.

[GitHub] [lucene] nreimers commented on issue #12342: Prevent VectorSimilarity.DOT_PRODUCT from returning negative scores

2023-06-14 Thread via GitHub
nreimers commented on issue #12342: URL: https://github.com/apache/lucene/issues/12342#issuecomment-1591318871 @msokolov The index / vector DB should return the dot product score as is. No scaling, no truncation. Using dot product is tremendously useful for embedding models, they perf

[GitHub] [lucene] uschindler commented on issue #12342: Prevent VectorSimilarity.DOT_PRODUCT from returning negative scores

2023-06-14 Thread via GitHub
uschindler commented on issue #12342: URL: https://github.com/apache/lucene/issues/12342#issuecomment-1591331493 > @msokolov The index / vector DB should return the dot product score as is. No scaling, no truncation. > > Using dot product is tremendously useful for embedding models, t

[GitHub] [lucene] benwtrent commented on issue #12342: Prevent VectorSimilarity.DOT_PRODUCT from returning negative scores

2023-06-14 Thread via GitHub
benwtrent commented on issue #12342: URL: https://github.com/apache/lucene/issues/12342#issuecomment-1591347442 I would think as long as more negative values are scored lower, we will retrieve documents in a sane manner. Scaling negatives to restrict them and then not scaling positiv
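
A minimal sketch of the idea in the comment above, assuming one possible order-preserving mapping: negative dot products are compressed into (0, 1) and non-negative ones are shifted to 1 and above. The class and method names are hypothetical, and this is not presented as the mapping the thread settled on. It only illustrates that scores can stay non-negative while more negative values still rank lower.

```java
public final class DotProductScoreSketch {
  private DotProductScoreSketch() {}

  // Hypothetical helper: maps a raw dot product to a non-negative score
  // while preserving ordering (a larger dot product gives a larger score).
  static float toNonNegativeScore(float dotProduct) {
    if (dotProduct < 0) {
      // Negative inputs land in (0, 1); more negative means closer to 0.
      return 1f / (1f - dotProduct);
    }
    // Non-negative inputs land in [1, +inf); 0 maps to 1, matching the
    // limit of the negative branch, so the mapping is continuous at 0.
    return dotProduct + 1f;
  }

  public static void main(String[] args) {
    float[] samples = {-3f, -1f, 0f, 2f};
    for (float dp : samples) {
      System.out.println(dp + " -> " + toNonNegativeScore(dp));
    }
  }
}
```

Both branches are strictly increasing and meet at 1, so ranking by the mapped score gives the same order as ranking by the raw dot product.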

[GitHub] [lucene] msokolov commented on issue #12342: Prevent VectorSimilarity.DOT_PRODUCT from returning negative scores

2023-06-14 Thread via GitHub
msokolov commented on issue #12342: URL: https://github.com/apache/lucene/issues/12342#issuecomment-1591355715 Yeah, after consideration, I think we could maybe argue for changing the scaling of negative values given that they were documented as unsupported, even though it would be breaking

[GitHub] [lucene] msokolov commented on issue #12342: Prevent VectorSimilarity.DOT_PRODUCT from returning negative scores

2023-06-14 Thread via GitHub
msokolov commented on issue #12342: URL: https://github.com/apache/lucene/issues/12342#issuecomment-1591379022 Yeah. Another thing we could consider is doing this scaling in KnnVectorQuery and/or its Scorer. These have the ultimate responsibility of complying with the Scorer contract. If we

[GitHub] [lucene] alessandrobenedetti merged pull request #12253: GITHUB-12252: Add function queries for computing similarity scores between knn vectors

2023-06-14 Thread via GitHub
alessandrobenedetti merged PR #12253: URL: https://github.com/apache/lucene/pull/12253

[GitHub] [lucene] alessandrobenedetti closed issue #12252: Add function queries for computing vector similarity between knn vectors

2023-06-14 Thread via GitHub
alessandrobenedetti closed issue #12252: Add function queries for computing vector similarity between knn vectors URL: https://github.com/apache/lucene/issues/12252

[GitHub] [lucene] sohami commented on issue #12347: Allow extensions of IndexSearcher to provide custom SliceExecutor and slices computation

2023-06-14 Thread via GitHub
sohami commented on issue #12347: URL: https://github.com/apache/lucene/issues/12347#issuecomment-1591452446 @javanna Thanks for your input. > Another thought on my end: executing sometimes on the caller thread, and sometimes on the executor makes things hard to reason about: how do y

[GitHub] [lucene] Jackyrie2 opened a new pull request, #12371: [Draft] #12236 Lazily compute similarity score

2023-06-14 Thread via GitHub
Jackyrie2 opened a new pull request, #12371: URL: https://github.com/apache/lucene/pull/12371 ### Description Per @zhaih's suggestion in #12236, this PR moves the computation of the similarity score from `initalizedFromGraph` to a later time, when the `NeighborArray` needs to be sorted and

[GitHub] [lucene] benwtrent commented on pull request #12371: [Draft] #12236 Lazily compute similarity score

2023-06-14 Thread via GitHub
benwtrent commented on PR #12371: URL: https://github.com/apache/lucene/pull/12371#issuecomment-1591770109 Hey @Jackyrie2 this does add some extra memory overhead, 4 new object references. It would be good if it was justified with a benchmark. Could you share some benchmarking on ind

[GitHub] [lucene] javanna commented on issue #12347: Allow extensions of IndexSearcher to provide custom SliceExecutor and slices computation

2023-06-14 Thread via GitHub
javanna commented on issue #12347: URL: https://github.com/apache/lucene/issues/12347#issuecomment-1591822866 > Take the LeafSlice[] in constructor to allow for custom slice computation. Sounds good, I'll happily review that change. > Discuss different options to customize Slice

[GitHub] [lucene] atris commented on issue #12347: Allow extensions of IndexSearcher to provide custom SliceExecutor and slices computation

2023-06-14 Thread via GitHub
atris commented on issue #12347: URL: https://github.com/apache/lucene/issues/12347#issuecomment-1591837122 > > Take the LeafSlice[] in constructor to allow for custom slice computation. > > Sounds good, I'll happily review that change. > > > Discuss different options to custom

[GitHub] [lucene] jbellis opened a new pull request, #12372: Reuse neighborqueue during hnsw index build (attempt 2)

2023-06-14 Thread via GitHub
jbellis opened a new pull request, #12372: URL: https://github.com/apache/lucene/pull/12372 This changes HnswGraphBuilder to re-use the same candidates queues for adding nodes by allocating them in the Builder instance. This saves about 2.5% of build time and takes memory allocations

[GitHub] [lucene] jbellis commented on pull request #12372: Reuse neighborqueue during hnsw index build (attempt 2)

2023-06-14 Thread via GitHub
jbellis commented on PR #12372: URL: https://github.com/apache/lucene/pull/12372#issuecomment-1591859337 Additionally, the original change only re-used the candidates queues within a single addNode call, so this is improved in that respect as well.

[GitHub] [lucene] sohami commented on issue #12347: Allow extensions of IndexSearcher to provide custom SliceExecutor and slices computation

2023-06-14 Thread via GitHub
sohami commented on issue #12347: URL: https://github.com/apache/lucene/issues/12347#issuecomment-1591876384 @atris To summarize, there are 2 separate pieces of functionality I am looking to add: 1) Custom slice computation which the extension can provide. For this we can provide a constructor

[GitHub] [lucene] jbellis opened a new pull request, #12373: require that float vector components are smaller than 1E17 to prevent overflowing to Infinity

2023-06-14 Thread via GitHub
jbellis opened a new pull request, #12373: URL: https://github.com/apache/lucene/pull/12373 Following up to PR #12281
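
The check named in the PR title can be sketched roughly as follows. This is an illustration only: the class, constant, and method names are hypothetical rather than Lucene's, and the motivation shown (products of very large components overflowing `float` to Infinity) is an assumed reading of the title, not something stated in the message.

```java
public final class VectorComponentCheckSketch {
  private VectorComponentCheckSketch() {}

  // Bound taken from the PR title; the constant name is hypothetical.
  static final float MAX_ABS_COMPONENT = 1e17f;

  // Returns true only if every component is finite and strictly smaller
  // in magnitude than the bound.
  static boolean isAcceptable(float[] vector) {
    for (float v : vector) {
      if (!Float.isFinite(v) || Math.abs(v) >= MAX_ABS_COMPONENT) {
        return false;
      }
    }
    return true;
  }

  // Plain float dot product, accumulating in float so overflow is visible.
  static float dotProduct(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  public static void main(String[] args) {
    float[] ok = {9e16f, 9e16f};
    float[] tooLarge = {1e20f, 1e20f};
    System.out.println(isAcceptable(ok) + " " + Float.isFinite(dotProduct(ok, ok)));
    System.out.println(isAcceptable(tooLarge) + " " + Float.isFinite(dotProduct(tooLarge, tooLarge)));
  }
}
```

Components near 1e20 are individually finite, yet their product exceeds `Float.MAX_VALUE` (about 3.4e38), which is why a bound on each component is stricter than a finiteness check alone.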

[GitHub] [lucene] jbellis commented on pull request #12373: require that float vector components are smaller than 1E17 to prevent overflowing to Infinity

2023-06-14 Thread via GitHub
jbellis commented on PR #12373: URL: https://github.com/apache/lucene/pull/12373#issuecomment-1591954514 cc @uschindler

[GitHub] [lucene] sohami opened a new pull request, #12374: Provide constructor to accept the LeafSlice computed by extensions

2023-06-14 Thread via GitHub
sohami opened a new pull request, #12374: URL: https://github.com/apache/lucene/pull/12374 ### Description Add a constructor which takes in the computed slices from extensions and uses that for running the search concurrently on provided executor. This is based on the discussion on the i

[GitHub] [lucene] sohami commented on issue #12347: Allow extensions of IndexSearcher to provide custom SliceExecutor and slices computation

2023-06-14 Thread via GitHub
sohami commented on issue #12347: URL: https://github.com/apache/lucene/issues/12347#issuecomment-1591995488 @javanna @atris I have created a PR (#12374) for item 1 above for now.