[I] org.apache.lucene.search.TestPatienceFloatVectorQuery.testFindAll failed [lucene]

2025-05-20 Thread via GitHub
gf2121 opened a new issue, #14694: URL: https://github.com/apache/lucene/issues/14694 ### Description ``` org.apache.lucene.search.TestPatienceFloatVectorQuery > test suite's output saved to /home/runner/work/lucene/lucene/lucene/core/build/test-results/test/outputs/OUTPUT-org.apa

Re: [PR] DocIdRunEnd implementation missed in Lucene103PostingsReader [lucene]

2025-05-20 Thread via GitHub
github-actions[bot] commented on PR #14693: URL: https://github.com/apache/lucene/pull/14693#issuecomment-2896701459 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you wil

Re: [PR] Speed up exhaustive evaluation. [lucene]

2025-05-20 Thread via GitHub
rmuir commented on PR #14679: URL: https://github.com/apache/lucene/pull/14679#issuecomment-2896403654 Thank you! "bulkpostings 2.0" is looking really clean and non-invasive :) > I suspect it may be tempting in the future, because it enables further optimizations as @gf2121 showed in

Re: [PR] Improve BytesRef creation from String [lucene]

2025-05-20 Thread via GitHub
rmuir commented on PR #14678: URL: https://github.com/apache/lucene/pull/14678#issuecomment-2896282594 yeah it is tricky, since the 'strings' indexed in search are usually small: words. for a lot of natural languages average word length is already small (e.g. english: ~5), and often in sear

Re: [PR] Update created version major [lucene]

2025-05-20 Thread via GitHub
github-actions[bot] commented on PR #14607: URL: https://github.com/apache/lucene/pull/14607#issuecomment-2896124380 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Fix FuzzySet#createSetBasedOnMaxMemory to honor bytes not bits [lucene]

2025-05-20 Thread via GitHub
github-actions[bot] commented on PR #14616: URL: https://github.com/apache/lucene/pull/14616#issuecomment-2896124348 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Add Query for reranking KnnFloatVectorQuery with full-precision vectors [lucene]

2025-05-20 Thread via GitHub
dungba88 commented on PR #14009: URL: https://github.com/apache/lucene/pull/14009#issuecomment-2896110176 Thanks @vigyasharma for the comment! I updated the comment to make it less confusing. I'll think about generalization, but the idea is that as long as the field can expose the fl

Re: [PR] deps(java): bump org.eclipse.jgit:org.eclipse.jgit from 7.2.0.202503040940-r to 7.2.1.202505142326-r [lucene]

2025-05-20 Thread via GitHub
github-actions[bot] commented on PR #14692: URL: https://github.com/apache/lucene/pull/14692#issuecomment-2896069370 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you wil

Re: [PR] deps(java): bump org.gradle.toolchains.foojay-resolver-convention from 0.10.0 to 1.0.0 [lucene]

2025-05-20 Thread via GitHub
github-actions[bot] commented on PR #14691: URL: https://github.com/apache/lucene/pull/14691#issuecomment-2896069185 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you wil

[PR] deps(java): bump org.eclipse.jgit:org.eclipse.jgit from 7.2.0.202503040940-r to 7.2.1.202505142326-r [lucene]

2025-05-20 Thread via GitHub
dependabot[bot] opened a new pull request, #14692: URL: https://github.com/apache/lucene/pull/14692 Bumps [org.eclipse.jgit:org.eclipse.jgit](https://github.com/eclipse-jgit/jgit) from 7.2.0.202503040940-r to 7.2.1.202505142326-r. Commits https://github.com/eclipse-jgit/jgit/c

[PR] deps(java): bump org.gradle.toolchains.foojay-resolver-convention from 0.10.0 to 1.0.0 [lucene]

2025-05-20 Thread via GitHub
dependabot[bot] opened a new pull request, #14691: URL: https://github.com/apache/lucene/pull/14691 Bumps org.gradle.toolchains.foojay-resolver-convention from 0.10.0 to 1.0.0. [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?de

Re: [PR] Add Query for reranking KnnFloatVectorQuery with full-precision vectors [lucene]

2025-05-20 Thread via GitHub
vigyasharma commented on code in PR #14009: URL: https://github.com/apache/lucene/pull/14009#discussion_r2098815198 ## lucene/core/src/java/org/apache/lucene/search/RerankKnnFloatVectorQuery.java: ## @@ -0,0 +1,117 @@ +/* + * Licensed to the Apache Software Foundation (ASF) unde

Re: [PR] Improve BytesRef creation from String [lucene]

2025-05-20 Thread via GitHub
schlosna closed pull request #14678: Improve BytesRef creation from String URL: https://github.com/apache/lucene/pull/14678 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To u

Re: [PR] Improve BytesRef creation from String [lucene]

2025-05-20 Thread via GitHub
schlosna commented on PR #14678: URL: https://github.com/apache/lucene/pull/14678#issuecomment-2896009970 > > In #12071 there is mention [#12071 (comment)](https://github.com/apache/lucene/issues/12071#issuecomment-1379313710) of using the vector APIs to speed up UnicodeUtil conversions. Ha

Re: [PR] Clean up how the test framework creates asserting scorables. [lucene]

2025-05-20 Thread via GitHub
jpountz merged PR #14452: URL: https://github.com/apache/lucene/pull/14452

Re: [PR] Make competitive iterators more robust. [lucene]

2025-05-20 Thread via GitHub
jpountz merged PR #14532: URL: https://github.com/apache/lucene/pull/14532

Re: [PR] Remove DISIDocIdStream. [lucene]

2025-05-20 Thread via GitHub
jpountz merged PR #14550: URL: https://github.com/apache/lucene/pull/14550

Re: [PR] Implement AssertingPostingsEnum#intoBitSet. [lucene]

2025-05-20 Thread via GitHub
jpountz merged PR #14675: URL: https://github.com/apache/lucene/pull/14675

[PR] Speed up conjunctive queries that need scores. [lucene]

2025-05-20 Thread via GitHub
jpountz opened a new pull request, #14690: URL: https://github.com/apache/lucene/pull/14690 Calls to `DocIdSetIterator#nextDoc`, `DocIdSetIterator#advance` and `SimScorer#score` are currently interleaved and include lots of conditionals. This builds up on #14679 and refactors the code a

Re: [I] Try GroupVInt for writing HNSW neighbor node arrays? [lucene]

2025-05-20 Thread via GitHub
msokolov closed issue #14689: Try GroupVInt for writing HNSW neighbor node arrays? URL: https://github.com/apache/lucene/issues/14689

Re: [I] Try GroupVInt for writing HNSW neighbor node arrays? [lucene]

2025-05-20 Thread via GitHub
msokolov commented on issue #14689: URL: https://github.com/apache/lucene/issues/14689#issuecomment-2895734509 yup, looks like a duplicate - thanks for finding @benwtrent

Re: [PR] Speed up exhaustive evaluation. [lucene]

2025-05-20 Thread via GitHub
jpountz commented on PR #14679: URL: https://github.com/apache/lucene/pull/14679#issuecomment-2895700581 You are correct, no need for additional APIs on Similarity at this point, I removed it. I suspect it may be tempting in the future, because it enables further optimizations as @gf2121 sh

Re: [PR] Speed up exhaustive evaluation. [lucene]

2025-05-20 Thread via GitHub
jpountz commented on code in PR #14679: URL: https://github.com/apache/lucene/pull/14679#discussion_r2098761902 ## lucene/core/src/java/org/apache/lucene/index/PostingsEnum.java: ## @@ -97,4 +98,44 @@ protected PostingsEnum() {} * anything (neither members of the returned By

Re: [I] Try GroupVInt for writing HNSW neighbor node arrays? [lucene]

2025-05-20 Thread via GitHub
benwtrent commented on issue #14689: URL: https://github.com/apache/lucene/issues/14689#issuecomment-2895637071 I think this might be a duplicate? https://github.com/apache/lucene/issues/12871 I agree it's a good idea :)

Re: [I] Support for DocIdSetBuilder with (min,max) docId [lucene]

2025-05-20 Thread via GitHub
jainankitk commented on issue #14485: URL: https://github.com/apache/lucene/issues/14485#issuecomment-2895538068 Thanks @javanna for getting back with the current status. Will wait for @prudhvigodithi to make progress on the proposal and PRs. Will loop you in for reviews given your intra-se

Re: [PR] Add a Faiss codec for KNN searches [lucene]

2025-05-20 Thread via GitHub
kaivalnp commented on PR #14178: URL: https://github.com/apache/lucene/pull/14178#issuecomment-2895246147 Rebased the PR to incorporate recent changes (including the optimistic collection based on pro-rating) --- Single-segment search has no impact as expected: Lucene:

Re: [PR] Add a Faiss codec for KNN searches [lucene]

2025-05-20 Thread via GitHub
github-actions[bot] commented on PR #14178: URL: https://github.com/apache/lucene/pull/14178#issuecomment-2895178019 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you wil

Re: [PR] Fix patience knn queries to work with seeded knn queries [lucene]

2025-05-20 Thread via GitHub
tteofili merged PR #14688: URL: https://github.com/apache/lucene/pull/14688

[I] Try GroupVInt for writing HNSW neighbor node arrays? [lucene]

2025-05-20 Thread via GitHub
mikemccand opened a new issue, #14689: URL: https://github.com/apache/lucene/issues/14689 ### Description @msokolov relayed this idea from @jpountz: today, the default `KnnVectorsFormat` uses delta vInt (I think?) to write the neighbor nodes array ... maybe `GroupVInt` would be small

Re: [I] Support for Pluggable Custom Vector Similarity Functions [lucene]

2025-05-20 Thread via GitHub
msokolov commented on issue #14520: URL: https://github.com/apache/lucene/issues/14520#issuecomment-2894524859 I think it's a duplicate of #14025

Re: [I] HyphenationCompoundWordTokenFilter fixed token position and preserves original token [lucene]

2025-05-20 Thread via GitHub
mikemccand commented on issue #14624: URL: https://github.com/apache/lucene/issues/14624#issuecomment-2894460764 To address your 2nd idea (increment the position for each sub-word in the compound word), I think we'd need to create a graph-aware `CompoundWordTokenFilter`. It would also emit

Re: [I] Multi-threaded vector search over multiple segments can lead to inconsistent results [lucene]

2025-05-20 Thread via GitHub
mikemccand commented on issue #14180: URL: https://github.com/apache/lucene/issues/14180#issuecomment-2894399826 It sounds like this is fixed, I will close this now.

Re: [I] Multi-threaded vector search over multiple segments can lead to inconsistent results [lucene]

2025-05-20 Thread via GitHub
mikemccand closed issue #14180: Multi-threaded vector search over multiple segments can lead to inconsistent results URL: https://github.com/apache/lucene/issues/14180

Re: [I] Segment count (merging) can impact recall on KNN ParentJoin queries [lucene]

2025-05-20 Thread via GitHub
mikemccand commented on issue #14643: URL: https://github.com/apache/lucene/issues/14643#issuecomment-2894378185 > This doesn't look like a problem with regular KNN vector queries, only appears with parent-join query benchmarks. Hmm it's odd for the 500K docs case that recall is so mu

Re: [I] Integrate a JVector codec for KNN searches [lucene]

2025-05-20 Thread via GitHub
mikemccand commented on issue #14681: URL: https://github.com/apache/lucene/issues/14681#issuecomment-2894290455 +1, it'd be awesome to refactor OpenSearch's jvector integration down to Lucene as an alternative Codec (`KnnVectorsFormat`) component in sandbox. https://github.com/apache

Re: [PR] Speed up exhaustive evaluation. [lucene]

2025-05-20 Thread via GitHub
rmuir commented on PR #14679: URL: https://github.com/apache/lucene/pull/14679#issuecomment-2894197496 Do we really need the method on Similarity? I guess I feel, most users are probably using BM25Similarity, so I don't understand the explanation in the comments. If we have "bogus" i

Re: [PR] A specialized Trie for Block Tree Index [lucene]

2025-05-20 Thread via GitHub
Coqueue commented on PR #14333: URL: https://github.com/apache/lucene/pull/14333#issuecomment-2893516032 Thanks for the fantastic change! Want to share that we adopted and backported this codec to Lucene 912, and ran it against an Amazon Search internal benchmark, from which we observ