[PR] bpv24 [lucene]

2025-01-27 Thread via GitHub
gf2121 opened a new pull request, #14176: URL: https://github.com/apache/lucene/pull/14176 **Background** * https://github.com/apache/lucene/pull/541 tried to introduce the vectorized decoding for BKD leaf blocks * https://github.com/apache/lucene/pull/706 reverted the PR since we see

Re: [I] Optimizing use the QueryCache [lucene]

2025-01-27 Thread via GitHub
sgup432 commented on issue #14028: URL: https://github.com/apache/lucene/issues/14028#issuecomment-2617663032 @kkewwei Great to see the improvement by changing the skip factor. I see there have been many discussions around finding the right value for `skip_factor` ([here](https://issu

Re: [PR] SortedSet DV Multi Range query [lucene]

2025-01-27 Thread via GitHub
gsmiller commented on code in PR #13974: URL: https://github.com/apache/lucene/pull/13974#discussion_r1931291804 ## lucene/sandbox/src/java/org/apache/lucene/sandbox/search/SortedSetMultiRangeQuery.java: ## @@ -0,0 +1,300 @@ +/* + * Licensed to the Apache Software Foundation (AS

Re: [PR] Implement ACORN-1 search for HNSW [lucene]

2025-01-27 Thread via GitHub
benchaplin closed pull request #14085: Implement ACORN-1 search for HNSW URL: https://github.com/apache/lucene/pull/14085 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To uns

[I] org.apache.lucene.search.TestKnnFloatVectorQuery.testFindFewer ComparisonFailure: expected: but was: [lucene]

2025-01-27 Thread via GitHub
ChrisHegarty opened a new issue, #14175: URL: https://github.com/apache/lucene/issues/14175 Reproduces on _main_. ``` Reproduce with: gradlew :lucene:core:test --tests "org.apache.lucene.search.TestKnnFloatVectorQuery.testFindFewer" -Ptests.jvms=6 "-Ptests.jvmargs=-XX:TieredStopAt

Re: [PR] Add knn result consistency test [lucene]

2025-01-27 Thread via GitHub
benwtrent commented on PR #14167: URL: https://github.com/apache/lucene/pull/14167#issuecomment-2616760637 > This made me wonder if it would be a better trade-off to let just one slice run on its own first, and then let all other N-1 slices run in parallel with one another, I really

Re: [PR] supports force merge based on specified segments. [lucene]

2025-01-27 Thread via GitHub
navneet1v commented on PR #14163: URL: https://github.com/apache/lucene/pull/14163#issuecomment-2616674699 @cheng66551 I have also seen similar behavior, not with ES but with OpenSearch.

Re: [PR] Add nullability annotations to IndexSearcher APIs [lucene]

2025-01-27 Thread via GitHub
rmuir commented on PR #14132: URL: https://github.com/apache/lucene/pull/14132#issuecomment-2616629072 I will look in on the ECJ side, last time I used it, their null analysis had issues on lucene's enormous codebase. It was many years ago though, maybe it has solidified. error-prone

Re: [PR] Add nullability annotations to IndexSearcher APIs [lucene]

2025-01-27 Thread via GitHub
rmuir commented on PR #14132: URL: https://github.com/apache/lucene/pull/14132#issuecomment-2616604765 One option we could do for correctness is to turn on ecj's null analysis. ecj is pretty fast and runs as part of gradle checks already, and it is the compiler often using this feature in t

Re: [I] testMergeStability failing for Knn formats [lucene]

2025-01-27 Thread via GitHub
benwtrent commented on issue #13640: URL: https://github.com/apache/lucene/issues/13640#issuecomment-2616596850 I discovered two other weird behaviors digging into this test failure. But, neither seemed to fix this inconsistency: https://github.com/apache/lucene/pull/14174

[PR] Make knn graph conn writing more consistent [lucene]

2025-01-27 Thread via GitHub
benwtrent opened a new pull request, #14174: URL: https://github.com/apache/lucene/pull/14174 This fixes two minor inconsistencies. - Makes sure that connected components isn't called twice with concurrent hnsw merger - make sure that duplicate connections are also handled when a

[PR] [Draft] Support Multi-Vector HNSW Search via Flat Vector Storage [lucene]

2025-01-27 Thread via GitHub
vigyasharma opened a new pull request, #14173: URL: https://github.com/apache/lucene/pull/14173 Another take at #12313 The following PR adds support for _independent_ multi-vectors, i.e. scenarios where a single document is represented by multiple independent vector values. The most

[PR] Adjust knn merge stability testing [lucene]

2025-01-27 Thread via GitHub
benwtrent opened a new pull request, #14172: URL: https://github.com/apache/lucene/pull/14172 With the new connected components work, merging is unstable. It generally seems that between indexing, there might be new connections as `vex` index size increases. I am not 100% sure the ca

Re: [PR] Specialize DisiPriorityQueue for the 2-clauses case. [lucene]

2025-01-27 Thread via GitHub
jpountz commented on PR #14070: URL: https://github.com/apache/lucene/pull/14070#issuecomment-2616495910 Updated benchmark results, there is still a speedup. I'll merge soon. ``` TaskQPS baseline StdDevQPS my_modified_version StdDev

[I] testMergeStability failing for Knn formats [lucene]

2025-01-27 Thread via GitHub
benwtrent opened a new issue, #13640: URL: https://github.com/apache/lucene/issues/13640 ### Description All KNN formats are periodically failing `testMergeStability`. I have verified it's due to https://github.com/apache/lucene/pull/13566 The stability failure is due to

Re: [I] testMergeStability failing for Knn formats [lucene]

2025-01-27 Thread via GitHub
benwtrent commented on issue #13640: URL: https://github.com/apache/lucene/issues/13640#issuecomment-2616417275 @msokolov it's reared its head again. ``` ./gradlew test --tests TestPerFieldKnnVectorsFormat.testMergeStability -Dtests.seed=FF1182F3FC600FF -Dtests.locale=mni-Beng-IN

Re: [PR] Add knn result consistency test [lucene]

2025-01-27 Thread via GitHub
jpountz commented on PR #14167: URL: https://github.com/apache/lucene/pull/14167#issuecomment-2616408185 Somewhat related, thinking out loud: I have been wondering about what is the best way to parallelize top-k query processing. Lexical search has a similar issue as knn search in that it i

Re: [PR] Add knn result consistency test [lucene]

2025-01-27 Thread via GitHub
jpountz commented on PR #14167: URL: https://github.com/apache/lucene/pull/14167#issuecomment-2616390109 > I don't know of another query where multiple passes over a static dataset can return different docs. Currently, this does not happen because Lucene only enables so-called "rank-

Re: [PR] Add knn result consistency test [lucene]

2025-01-27 Thread via GitHub
benwtrent commented on PR #14167: URL: https://github.com/apache/lucene/pull/14167#issuecomment-2616250521 @mayya-sharipova maybe a search time flag is possible, but it would stink to have an "inconsistent but fast" flag that users then have to worry about. I don't know of another quer

Re: [PR] Add nullability annotations to IndexSearcher APIs [lucene]

2025-01-27 Thread via GitHub
msokolov commented on PR #14132: URL: https://github.com/apache/lucene/pull/14132#issuecomment-2616198970 > Specifying that something is nullable doesn't provide any value: all types are nullable by default already. I guess these null-checking systems impose their own assumptions and

Re: [PR] Add nullability annotations to IndexSearcher APIs [lucene]

2025-01-27 Thread via GitHub
rmuir commented on PR #14132: URL: https://github.com/apache/lucene/pull/14132#issuecomment-2616174314 For the java language: Specifying that something is nullable doesn't provide any value: all types are nullable by default already. Specifying that something is NOT-nullable wo

Re: [PR] Add a HNSW collector that exits early when nearest neighbor queue saturates [lucene]

2025-01-27 Thread via GitHub
benwtrent commented on code in PR #14094: URL: https://github.com/apache/lucene/pull/14094#discussion_r1930637609 ## lucene/core/src/java/org/apache/lucene/search/HnswKnnCollector.java: ## @@ -0,0 +1,24 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or mor

Re: [PR] Integrating GPU based Vector Search using cuVS [lucene]

2025-01-27 Thread via GitHub
chatman commented on PR #14131: URL: https://github.com/apache/lucene/pull/14131#issuecomment-2615867677 > I want to start cleaning some of the outstanding items in this PR, but I do not have push access to SearchScale:cuvs-integration-main. Can I get access, or is there a better way to pro

Re: [PR] Integrating GPU based Vector Search using cuVS [lucene]

2025-01-27 Thread via GitHub
chatman commented on PR #14131: URL: https://github.com/apache/lucene/pull/14131#issuecomment-2615875009 > restructure cuvs-java so that it compiles to a minimum JDK 21, with an mr jar/version specific loading. Maybe it can even strip use class file version 65 and strip the preview

Re: [PR] Integrating GPU based Vector Search using cuVS [lucene]

2025-01-27 Thread via GitHub
uschindler commented on PR #14131: URL: https://github.com/apache/lucene/pull/14131#issuecomment-2615941784 One additional important thing: no public API in this new Lucene module/codec must export any preview API, so it must all be private/pkg-private.

Re: [PR] Integrating GPU based Vector Search using cuVS [lucene]

2025-01-27 Thread via GitHub
ChrisHegarty commented on PR #14131: URL: https://github.com/apache/lucene/pull/14131#issuecomment-2615930453 > P.P.S. Elasticsearch has a Gradle plugin to strip preview flags. Basically it patches one byte in all class files that are created by Javac. Yes, we can do this. Along with
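
For readers unfamiliar with the "preview flag" discussed in this thread, the following is a minimal sketch and not the Elasticsearch plugin itself. It assumes only what the JVM class file format specifies: a class compiled with preview features enabled carries `minor_version` 65535 (0xFFFF) in its header, directly after the 4-byte magic number. The class name `PreviewFlag` and both method names are hypothetical, chosen for illustration.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, for illustration only; not part of Lucene or Elasticsearch.
final class PreviewFlag {
  // minor_version value that marks a class file as depending on preview features.
  static final int PREVIEW_MINOR = 0xFFFF;

  // Returns true if the class file header carries the preview marker.
  static boolean dependsOnPreview(byte[] classBytes) {
    ByteBuffer buf = ByteBuffer.wrap(classBytes); // big-endian by default
    if (buf.remaining() < 8 || buf.getInt() != 0xCAFEBABE) {
      throw new IllegalArgumentException("not a class file");
    }
    int minor = buf.getShort() & 0xFFFF; // minor_version follows the magic number
    return minor == PREVIEW_MINOR;
  }

  // Returns a copy with minor_version set to 0; the input array is left untouched.
  // Assumption: clearing the marker is what "stripping" means in this thread.
  static byte[] clearPreview(byte[] classBytes) {
    if (!dependsOnPreview(classBytes)) {
      return classBytes.clone();
    }
    byte[] copy = classBytes.clone();
    copy[4] = 0; // high byte of minor_version
    copy[5] = 0; // low byte of minor_version
    return copy;
  }
}
```

Clearing the marker only changes what the JVM checks at load time; whether the APIs the class calls behave the same on a given runtime is a separate question, which is what the comments in this thread are debating.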

Re: [PR] Integrating GPU based Vector Search using cuVS [lucene]

2025-01-27 Thread via GitHub
ChrisHegarty commented on PR #14131: URL: https://github.com/apache/lucene/pull/14131#issuecomment-2615927388 > The APIs are available in Java 21, too (with minimal changes regarding some specific parts like string handling). If you omit those, you can compile against java 21 and later stri

Re: [PR] Integrating GPU based Vector Search using cuVS [lucene]

2025-01-27 Thread via GitHub
uschindler commented on PR #14131: URL: https://github.com/apache/lucene/pull/14131#issuecomment-2615924479 P.P.S. Elasticsearch has a Gradle plugin to strip preview flags.

Re: [PR] Integrating GPU based Vector Search using cuVS [lucene]

2025-01-27 Thread via GitHub
benwtrent commented on PR #14131: URL: https://github.com/apache/lucene/pull/14131#issuecomment-2615923918 @dweiss While I think that might work for unused/snapshot Lucene releases, I think @chatman et al. are aiming for usage in the current Lucene Main so that Lucene focused search engines

Re: [PR] Integrating GPU based Vector Search using cuVS [lucene]

2025-01-27 Thread via GitHub
uschindler commented on PR #14131: URL: https://github.com/apache/lucene/pull/14131#issuecomment-2615909166 > > restructure cuvs-java so that it compiles to a minimum JDK 21, with an mr jar/version specific loading. Maybe it can even strip use class file version 65 and strip the preview bit

Re: [PR] Integrating GPU based Vector Search using cuVS [lucene]

2025-01-27 Thread via GitHub
dweiss commented on PR #14131: URL: https://github.com/apache/lucene/pull/14131#issuecomment-2615902695 The third option is to bump the minimum Java requirement to Java 22 on main? I know it's an interim release but maybe we should just do it, anticipating the next major LTS (due to be rele

Re: [PR] Add a HNSW collector that exits early when nearest neighbor queue saturates [lucene]

2025-01-27 Thread via GitHub
benwtrent commented on code in PR #14094: URL: https://github.com/apache/lucene/pull/14094#discussion_r1930581288 ## lucene/core/src/java/org/apache/lucene/search/HnswKnnCollector.java: ## @@ -0,0 +1,24 @@ +/* + * Licensed to the Apache Software Foundation (ASF) under one or mor

Re: [I] Query parser support for wildcards in phrase queries [lucene]

2025-01-27 Thread via GitHub
jpountz commented on issue #14168: URL: https://github.com/apache/lucene/issues/14168#issuecomment-2615747790 `PhraseWildcardSearch` is appealing, but its implementation makes trade-offs to work around the fact that it doesn't work efficiently if any of the wildcards expands to many terms.

Re: [PR] Add nullability annotations to IndexSearcher APIs [lucene]

2025-01-27 Thread via GitHub
msokolov commented on PR #14132: URL: https://github.com/apache/lucene/pull/14132#issuecomment-2615754639 This seems reasonable to me, but it implies a future promise to maintain it, and I don't know how we would ever know if we added some new usage that isn't properly annotated

Re: [I] TestBPReorderingMergePolicy fails CheckIndex.testHnswGraph [lucene]

2025-01-27 Thread via GitHub
msokolov commented on issue #14127: URL: https://github.com/apache/lucene/issues/14127#issuecomment-2615749814 The tests seem to have stopped failing, as expected; closing.

Re: [PR] Upgrade commons-codec from 1.13.0 to 1.17.2 [lucene]

2025-01-27 Thread via GitHub
msokolov commented on PR #14129: URL: https://github.com/apache/lucene/pull/14129#issuecomment-2615746064 Thanks, @msfroh

Re: [I] TestBPReorderingMergePolicy fails CheckIndex.testHnswGraph [lucene]

2025-01-27 Thread via GitHub
msokolov closed issue #14127: TestBPReorderingMergePolicy fails CheckIndex.testHnswGraph URL: https://github.com/apache/lucene/issues/14127

Re: [PR] Upgrade commons-codec from 1.13.0 to 1.17.2 [lucene]

2025-01-27 Thread via GitHub
msokolov merged PR #14129: URL: https://github.com/apache/lucene/pull/14129

Re: [PR] Optimize ContextQuery with big number of contexts [lucene]

2025-01-27 Thread via GitHub
mayya-sharipova merged PR #14169: URL: https://github.com/apache/lucene/pull/14169

Re: [PR] Integrating GPU based Vector Search using cuVS [lucene]

2025-01-27 Thread via GitHub
ChrisHegarty commented on PR #14131: URL: https://github.com/apache/lucene/pull/14131#issuecomment-2615430123 I want to start cleaning some of the outstanding items in this PR, but I do not have push access to SearchScale:cuvs-integration-main. Can I get access, or is there a better way to

Re: [PR] Integrating GPU based Vector Search using cuVS [lucene]

2025-01-27 Thread via GitHub
ChrisHegarty commented on PR #14131: URL: https://github.com/apache/lucene/pull/14131#issuecomment-2615395910 Hi, `cuvs-java-25.02` is currently compiled with JDK 22, so it has a minimum class file version of 66. Lucene compiles with a minimum of JDK 21, class file version 65. The reason why
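
The version numbers in the message above (JDK 21 is class file version 65, JDK 22 is 66) can be checked against any compiled class. Below is a minimal sketch, assuming only the standard class file header layout (4-byte magic, 2-byte minor version, 2-byte major version) and the usual offset of 44 between the major version and the Java release number. The class name `ClassFileVersion` and its methods are hypothetical and not part of Lucene or cuvs-java.

```java
import java.nio.ByteBuffer;

// Hypothetical helper, for illustration only.
final class ClassFileVersion {
  // Reads major_version from the first 8 bytes of a class file.
  static int majorVersion(byte[] classBytes) {
    ByteBuffer buf = ByteBuffer.wrap(classBytes); // big-endian by default
    if (buf.remaining() < 8 || buf.getInt() != 0xCAFEBABE) {
      throw new IllegalArgumentException("not a class file");
    }
    buf.getShort(); // skip minor_version
    return buf.getShort() & 0xFFFF; // major_version
  }

  // Maps a class file major version to the Java release that introduced it (65 -> 21).
  static int javaRelease(int majorVersion) {
    return majorVersion - 44;
  }

  // A runtime for release N accepts class files whose major version is at most N + 44.
  static boolean loadableOn(int runtimeRelease, byte[] classBytes) {
    return majorVersion(classBytes) <= runtimeRelease + 44;
  }
}
```

Under these assumptions, a class compiled for version 66 would be rejected by a Java 21 runtime, which is the mismatch this thread is trying to resolve.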