Re: [PR] Add Query for reranking KnnFloatVectorQuery with full-precision vectors [lucene]

2025-05-22 Thread via GitHub
msokolov commented on PR #14009: URL: https://github.com/apache/lucene/pull/14009#issuecomment-2902557094 Is it this? https://github.com/apache/lucene/blob/main/lucene/queries/src/java/org/apache/lucene/queries/function/FunctionScoreQuery.java -- This is an automated message from the

Re: [PR] Add Query for reranking KnnFloatVectorQuery with full-precision vectors [lucene]

2025-05-22 Thread via GitHub
vigyasharma commented on PR #14009: URL: https://github.com/apache/lucene/pull/14009#issuecomment-2902502742 > Lucene typically rewrites the Query first, and then createWeight on the simplest form. That's a good callout. I'll override the `rewrite` method to rewrite the inner query.

Re: [PR] Add Query for reranking KnnFloatVectorQuery with full-precision vectors [lucene]

2025-05-22 Thread via GitHub
msokolov commented on PR #14009: URL: https://github.com/apache/lucene/pull/14009#issuecomment-2902339944 I haven't dug in deep yet, but along the lines of further generality, would it make sense to accept a DoubleValuesSource (that can compute dot products against a vector field) rather t

Re: [PR] [WIP] Multi-Vector support for HNSW search [lucene]

2025-05-22 Thread via GitHub
heemin32 commented on PR #13525: URL: https://github.com/apache/lucene/pull/13525#issuecomment-2902300025 On second thought, if the user just disables indexing, implementing multi-vector in the same field might be the same as having a separate field without indexing capability.

Re: [I] Segment count (merging) can impact recall on KNN ParentJoin queries [lucene]

2025-05-22 Thread via GitHub
msokolov commented on issue #14643: URL: https://github.com/apache/lucene/issues/14643#issuecomment-2902293560 > Hmm it's odd for the 500K docs case that recall is so much better with FEWER segments: .969 with 8 segments, .705 with 89 segments.

Re: [I] Integrate a JVector codec for KNN searches [lucene]

2025-05-22 Thread via GitHub
msokolov commented on issue #14681: URL: https://github.com/apache/lucene/issues/14681#issuecomment-2902282294 This seems worth exploring. One question I have, after reading the linked discussion thread about integrating JVector as an OpenSearch plugin, is about this claim ("JVector is a th

Re: [PR] Use a temporary repository location to download certain ecj versions ("drops") [lucene]

2025-05-22 Thread via GitHub
dweiss merged PR #14703: URL: https://github.com/apache/lucene/pull/14703

Re: [PR] [WIP] Multi-Vector support for HNSW search [lucene]

2025-05-22 Thread via GitHub
heemin32 commented on PR #13525: URL: https://github.com/apache/lucene/pull/13525#issuecomment-2902248141 I believe this proposal to add a new field for multi-vector support is facing significant challenges primarily because we aim to support HNSW-based search on it. However, if our goal we

Re: [PR] [WIP] Multi-Vector support for HNSW search [lucene]

2025-05-22 Thread via GitHub
heemin32 commented on PR #13525: URL: https://github.com/apache/lucene/pull/13525#issuecomment-2902247646 I believe this proposal to add a new field for multi-vector support is facing significant chal

Re: [PR] Allow reading binary doc values as a RandomAccessInput [lucene]

2025-05-22 Thread via GitHub
github-actions[bot] commented on PR #13948: URL: https://github.com/apache/lucene/pull/13948#issuecomment-2902069576 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you wil

Re: [I] Remove Telugu normalization of vu వు to ma మ from IndicNormalizer [lucene]

2025-05-22 Thread via GitHub
Trey314159 commented on issue #14659: URL: https://github.com/apache/lucene/issues/14659#issuecomment-2902054358 @praveen-d291: Thanks for the pull request! I was unsure how best to modify the tests since I don't read Telugu. I couldn't tell what would make natural-looking examples and I di

Re: [PR] Update the IOContext on IndexInput rather than the ReadAdvice [lucene]

2025-05-22 Thread via GitHub
jpountz commented on code in PR #14702: URL: https://github.com/apache/lucene/pull/14702#discussion_r2102989752 ## lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsReader.java: ## @@ -124,7 +124,7 @@ public abstract void search( * * The default implementation retu

Re: [PR] Add assumption to ignore test failures due to disconnected graphs [lucene]

2025-05-22 Thread via GitHub
msokolov merged PR #14696: URL: https://github.com/apache/lucene/pull/14696

Re: [PR] Return MatchNoDocsQuery when IndexOrDocValuesQuery::rewrite does not match [lucene]

2025-05-22 Thread via GitHub
ChrisHegarty merged PR #14700: URL: https://github.com/apache/lucene/pull/14700

Re: [PR] Fix AbstractRangeQueryNode#toQueryString [lucene]

2025-05-22 Thread via GitHub
phb-ig commented on code in PR #14697: URL: https://github.com/apache/lucene/pull/14697#discussion_r2102606331 ## lucene/queryparser/src/test/org/apache/lucene/queryparser/flexible/standard/nodes/TestAbstractRangeQueryNode.java: ## @@ -0,0 +1,69 @@ +/* + * Licensed to the Apache

Re: [PR] Fix AbstractRangeQueryNode#toQueryString [lucene]

2025-05-22 Thread via GitHub
phb-ig commented on code in PR #14697: URL: https://github.com/apache/lucene/pull/14697#discussion_r2102594352 ## lucene/CHANGES.txt: ## @@ -854,7 +862,7 @@ Improvements * GITHUB#13285: Early terminate graph searches of AbstractVectorSimilarityQuery to follow timeout set fro

Re: [PR] Fix AbstractRangeQueryNode#toQueryString [lucene]

2025-05-22 Thread via GitHub
phb-ig commented on PR #14697: URL: https://github.com/apache/lucene/pull/14697#issuecomment-2901267587 @stefanvodita I am the one who originally submitted the patches all that time ago because the work I was doing at the time required inspecting the query string and altering, then re-parsi

Re: [PR] Add Query for reranking KnnFloatVectorQuery with full-precision vectors [lucene]

2025-05-22 Thread via GitHub
dungba88 commented on PR #14009: URL: https://github.com/apache/lucene/pull/14009#issuecomment-2901240220 I changed the query to be generic but kept everything else the same. (Didn't have JDK 24 yet, so the build would fail.)

Re: [PR] Add Query for reranking KnnFloatVectorQuery with full-precision vectors [lucene]

2025-05-22 Thread via GitHub
github-actions[bot] commented on PR #14009: URL: https://github.com/apache/lucene/pull/14009#issuecomment-2901186441 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you wil

Re: [PR] Use a temporary repository location to download certain ecj versions ("drops") [lucene]

2025-05-22 Thread via GitHub
github-actions[bot] commented on PR #14703: URL: https://github.com/apache/lucene/pull/14703#issuecomment-2901110046 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you wil

Re: [PR] Add Query for reranking KnnFloatVectorQuery with full-precision vectors [lucene]

2025-05-22 Thread via GitHub
dungba88 commented on PR #14009: URL: https://github.com/apache/lucene/pull/14009#issuecomment-2901114417 No worry, I think generalization makes sense to me too. And I like the idea of moving this forward. The use of `KnnFloatVectorQuery` in this implementation only adds a small bene

[PR] Use a temporary repository location to download certain ecj versions ("drops") [lucene]

2025-05-22 Thread via GitHub
dweiss opened a new pull request, #14703: URL: https://github.com/apache/lucene/pull/14703 (no comment)

Re: [PR] Update the IOContext on IndexInput rather than the ReadAdvice [lucene]

2025-05-22 Thread via GitHub
github-actions[bot] commented on PR #14702: URL: https://github.com/apache/lucene/pull/14702#issuecomment-2900966501 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you wil

[PR] Update the IOContext on IndexInput rather than the ReadAdvice [lucene]

2025-05-22 Thread via GitHub
thecoop opened a new pull request, #14702: URL: https://github.com/apache/lucene/pull/14702 Update the `IOContext` on `IndexInput` implementations rather than the `ReadAdvice`. This change means that `ReadAdvice` is now only used by `MMapDirectory` and friends. Note that this is a refactori

Re: [PR] Speed up conjunctive queries that need scores. [lucene]

2025-05-22 Thread via GitHub
jpountz commented on PR #14690: URL: https://github.com/apache/lucene/pull/14690#issuecomment-2900934920 I'm superseding this change with a more general one for now, which doesn't introduce new public APIs: #14701. We can look into taking ideas from this PR as follow-ups.

Re: [PR] Speed up conjunctive queries that need scores. [lucene]

2025-05-22 Thread via GitHub
jpountz closed pull request #14690: Speed up conjunctive queries that need scores. URL: https://github.com/apache/lucene/pull/14690

Re: [PR] Refactor main top-n bulk scorers to evaluate hits in a more term-at-a-time fashion. [lucene]

2025-05-22 Thread via GitHub
jpountz commented on PR #14701: URL: https://github.com/apache/lucene/pull/14701#issuecomment-2900931305 Filtered queries get a slowdown, but some important queries get a big speedup: ``` TaskQPS baseline StdDevQPS my_modified_version StdDev

Re: [PR] Refactor main top-n bulk scorers to evaluate hits in a more term-at-a-time fashion. [lucene]

2025-05-22 Thread via GitHub
github-actions[bot] commented on PR #14701: URL: https://github.com/apache/lucene/pull/14701#issuecomment-2900925674 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you wil

[PR] Refactor main top-n bulk scorers to evaluate hits in a more term-at-a-time fashion. [lucene]

2025-05-22 Thread via GitHub
jpountz opened a new pull request, #14701: URL: https://github.com/apache/lucene/pull/14701 `MaxScoreBulkScorer` and `BlockMaxConjunctionBulkScorer` currently evaluate hits in a doc-at-a-time (DAAT) fashion, meaning that they look at all their clauses to find the next doc and so forth
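
For readers unfamiliar with the two evaluation orders named in this PR description, here is a minimal, self-contained sketch. It is not Lucene code: the class `DaatVsTaatSketch`, its methods, and the idea of one constant weight per clause are assumptions made purely for illustration. It scores a toy disjunction once doc-at-a-time and once in a more term-at-a-time fashion over fixed-size windows of doc IDs, and both orders produce the same scores.

```java
/** Toy disjunction scorer, NOT Lucene code. Each clause is a sorted doc-ID array plus one weight. */
final class DaatVsTaatSketch {

  /** Doc-at-a-time: find the next doc across all clauses, score it fully, then move on. */
  static float[] scoreDaat(int[][] postings, float[] weights, int maxDoc) {
    float[] scores = new float[maxDoc];
    int[] pos = new int[postings.length]; // cursor into each clause's postings
    while (true) {
      // Look at every clause to find the smallest doc ID not yet consumed.
      int doc = Integer.MAX_VALUE;
      for (int c = 0; c < postings.length; c++) {
        if (pos[c] < postings[c].length) {
          doc = Math.min(doc, postings[c][pos[c]]);
        }
      }
      if (doc == Integer.MAX_VALUE) {
        return scores; // all clauses exhausted
      }
      // Add the contribution of every clause positioned on that doc.
      for (int c = 0; c < postings.length; c++) {
        if (pos[c] < postings[c].length && postings[c][pos[c]] == doc) {
          scores[doc] += weights[c];
          pos[c]++;
        }
      }
    }
  }

  /** Term-at-a-time within windows: drain one clause over a doc-ID window before the next clause. */
  static float[] scoreTaat(int[][] postings, float[] weights, int maxDoc, int windowSize) {
    float[] scores = new float[maxDoc];
    int[] pos = new int[postings.length];
    for (int base = 0; base < maxDoc; base += windowSize) {
      int end = Math.min(base + windowSize, maxDoc); // exclusive upper bound of this window
      for (int c = 0; c < postings.length; c++) {
        // Tight loop over a single clause: no per-doc coordination between clauses.
        while (pos[c] < postings[c].length && postings[c][pos[c]] < end) {
          scores[postings[c][pos[c]]] += weights[c];
          pos[c]++;
        }
      }
    }
    return scores;
  }
}
```

The windowed variant trades the per-document "which clause is next" bookkeeping for simple per-clause loops, which is the general motivation for term-at-a-time evaluation; whether and how the actual PR uses windows is not stated in the snippet above.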

Re: [I] Remove Telugu normalization of vu వు to ma మ from IndicNormalizer [lucene]

2025-05-22 Thread via GitHub
rmuir commented on issue #14659: URL: https://github.com/apache/lucene/issues/14659#issuecomment-2900766946 The filter here is working as documented: problem is, user didn't read the documentation. Just don't use the filter if you don't want the transformations that it does.

[PR] Fixed incorrect Telugu normalization of vu వు to ma మ ( [lucene]

2025-05-22 Thread via GitHub
praveen-d291 opened a new pull request, #14699: URL: https://github.com/apache/lucene/pull/14699 Fixes: #14659 Remove incorrect Telugu వు/మ conflation in Indic Normalization. They look similar, but they are distinct with different meanings. Currently "వు" is mapped to "మ" in

Re: [I] Remove Telugu normalization of vu వు to ma మ from IndicNormalizer [lucene]

2025-05-22 Thread via GitHub
rmuir commented on issue #14659: URL: https://github.com/apache/lucene/issues/14659#issuecomment-2900761266 > It's like conflating "rn" and "m" to merge burn/bum and corn/com. It could happen when reading quickly or with poor handwriting, but it is not something that should happen for searc

Re: [I] Nightly benchmark regression on 2025.05.01 [lucene]

2025-05-22 Thread via GitHub
mikemccand commented on issue #14630: URL: https://github.com/apache/lucene/issues/14630#issuecomment-2900807348 Hmm @yugushihuang (on my team, Amazon product search) found another way to query the kernel (our Amazon Linux 2 boxes seem not to have `/proc/config.gz`): ``

[PR] Return MatchNoDocsQuery when IndexOrDocValuesQuery::rewrite does not match [lucene]

2025-05-22 Thread via GitHub
ChrisHegarty opened a new pull request, #14700: URL: https://github.com/apache/lucene/pull/14700 In a similar way to how `IndexOrDocValuesQuery` may rewrite to a `MatchAllDocsQuery`, if either of the queries rewrites to match no docs, then return a `MatchNoDocsQuery`.
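
The rewrite rule described in this PR can be sketched in plain Java as follows. This is an illustration only, not the Lucene implementation: `RewriteSketch` and its nested types are hypothetical stand-ins for Lucene's query classes, and the exact condition under which the real query rewrites to match-all (assumed here to be "either side rewrites to match-all") is an assumption.

```java
/** Hypothetical stand-ins for query types; NOT the Lucene classes of similar names. */
final class RewriteSketch {
  interface Q {}
  static final class MatchAll implements Q {}
  static final class MatchNone implements Q {}
  static final class Leaf implements Q {}

  /** Pair of equivalent queries: one backed by the index, one by doc values. */
  static final class IndexOrDocValues implements Q {
    final Q indexQuery;
    final Q dvQuery;

    IndexOrDocValues(Q indexQuery, Q dvQuery) {
      this.indexQuery = indexQuery;
      this.dvQuery = dvQuery;
    }
  }

  /** Combines the already-rewritten sub-queries into the simplest equivalent form. */
  static Q rewrite(Q indexRewritten, Q dvRewritten) {
    // Both sides are expected to match the same documents, so if either
    // one has simplified to a trivial query, the pair can simplify too.
    if (indexRewritten instanceof MatchAll || dvRewritten instanceof MatchAll) {
      return new MatchAll(); // assumed condition for the existing match-all behavior
    }
    if (indexRewritten instanceof MatchNone || dvRewritten instanceof MatchNone) {
      return new MatchNone(); // the behavior this PR describes
    }
    return new IndexOrDocValues(indexRewritten, dvRewritten);
  }
}
```

Returning the trivial query early lets callers higher up in the query tree simplify further, for example by dropping a required clause that can never match.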

Re: [PR] Return MatchNoDocsQuery when IndexOrDocValuesQuery::rewrite does not match [lucene]

2025-05-22 Thread via GitHub
github-actions[bot] commented on PR #14700: URL: https://github.com/apache/lucene/pull/14700#issuecomment-2900736002 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you wil

Re: [PR] Minor access modifier adjustment to a couple of lucene90 backward compat types [lucene]

2025-05-22 Thread via GitHub
ChrisHegarty merged PR #14695: URL: https://github.com/apache/lucene/pull/14695

Re: [PR] Speed up exhaustive evaluation. [lucene]

2025-05-22 Thread via GitHub
jpountz merged PR #14679: URL: https://github.com/apache/lucene/pull/14679

Re: [PR] Fixed incorrect Telugu normalization of vu వు to ma మ ( [lucene]

2025-05-22 Thread via GitHub
github-actions[bot] commented on PR #14699: URL: https://github.com/apache/lucene/pull/14699#issuecomment-2900515377 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you wil

Re: [PR] Add Query for reranking KnnFloatVectorQuery with full-precision vectors [lucene]

2025-05-22 Thread via GitHub
vigyasharma commented on PR #14009: URL: https://github.com/apache/lucene/pull/14009#issuecomment-2900200570 @dungba88 : I shared an [alternate implementation](https://github.com/apache/lucene/pull/14698) for RerankVectorQuery that I feel can generalize more broadly. **Main Differen

[PR] An alternate implementation for RerankVectorQuery [lucene]

2025-05-22 Thread via GitHub
vigyasharma opened a new pull request, #14698: URL: https://github.com/apache/lucene/pull/14698 This PR presents an alternate to #14009 for `RerankVectorQuery`. It takes an input query, a target vector and a field to use for vector values, and rescores the output of wrapped query with full
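
The general rescoring idea (take the hits of a wrapped query, then re-rank them against a target vector using the stored vectors) can be sketched without any Lucene dependency. Everything below is a hypothetical illustration: `RerankSketch`, the in-memory `float[][]` standing in for a vector field, and the choice of dot product as the similarity are assumptions, not the API of either PR.

```java
import java.util.Arrays;

/** Toy re-ranker, NOT the RerankVectorQuery from the PR. */
final class RerankSketch {

  /** Plain dot product between two vectors of equal length. */
  static float dot(float[] a, float[] b) {
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  /**
   * Reorders candidate doc IDs by descending similarity to the target,
   * using the full-precision vectors; ties are broken by ascending doc ID.
   */
  static int[] rerank(int[] candidates, float[][] fullPrecision, float[] target) {
    // Score each candidate once, up front.
    float[] scores = new float[candidates.length];
    Integer[] order = new Integer[candidates.length];
    for (int i = 0; i < candidates.length; i++) {
      scores[i] = dot(fullPrecision[candidates[i]], target);
      order[i] = i;
    }
    Arrays.sort(order, (x, y) -> {
      int byScore = Float.compare(scores[y], scores[x]); // higher score first
      return byScore != 0 ? byScore : Integer.compare(candidates[x], candidates[y]);
    });
    int[] result = new int[candidates.length];
    for (int i = 0; i < order.length; i++) {
      result[i] = candidates[order[i]];
    }
    return result;
  }
}
```

In this sketch the first-pass query only decides which documents are candidates; the final order depends solely on the full-precision scores, which is why the wrapped query can in principle be any query rather than a KNN query specifically.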

Re: [I] Could/should KNN queries use per-segment query caching? [lucene]

2025-05-21 Thread via GitHub
rmuir commented on issue #14669: URL: https://github.com/apache/lucene/issues/14669#issuecomment-2899564405 for an http-based service, you can accomplish this by setting cache headers correctly as well. then the caching is much more flexible: can happen on user's device/client, load balance

Re: [PR] Introduce a mapping to map sparse labels to a continuous range [lucene]

2025-05-21 Thread via GitHub
github-actions[bot] commented on PR #14494: URL: https://github.com/apache/lucene/pull/14494#issuecomment-2899568006 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Fix AbstractRangeQueryNode#toQueryString [lucene]

2025-05-21 Thread via GitHub
stefanvodita commented on code in PR #14697: URL: https://github.com/apache/lucene/pull/14697#discussion_r2101276705 ## lucene/queryparser/src/test/org/apache/lucene/queryparser/flexible/standard/nodes/TestAbstractRangeQueryNode.java: ## @@ -0,0 +1,69 @@ +/* + * Licensed to the

[PR] Fix AbstractRangeQueryNode#toQueryString [lucene]

2025-05-21 Thread via GitHub
phb-ig opened a new pull request, #14697: URL: https://github.com/apache/lucene/pull/14697 Re: #7865 It now returns a string which is valid Lucene range query syntax and can be parsed back into the original node. Added public method `getTermEscaped(EscapeQuerySyntax)` to `Valu

Re: [PR] Add assumption to ignore test failures due to disconnected graphs [lucene]

2025-05-21 Thread via GitHub
github-actions[bot] commented on PR #14696: URL: https://github.com/apache/lucene/pull/14696#issuecomment-2898976362 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you wil

[PR] Add assumption to ignore test failures due to disconnected graphs [lucene]

2025-05-21 Thread via GitHub
msokolov opened a new pull request, #14696: URL: https://github.com/apache/lucene/pull/14696 We've seen occasional test failures like this one: gradlew test --tests TestFloatVectorSimilarityQuery.testTimeout -Dtests.seed=B1F95AA82C52ACA8 -Dtests.multiplier=3 -Dtests.locale=nl-BE

Re: [PR] Minor access modifier adjustment to a couple of lucene90 backward compat types [lucene]

2025-05-21 Thread via GitHub
github-actions[bot] commented on PR #14695: URL: https://github.com/apache/lucene/pull/14695#issuecomment-2898413053 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you wil

[PR] Minor access modifier adjustment to a couple of lucene90 backward compat types [lucene]

2025-05-21 Thread via GitHub
ChrisHegarty opened a new pull request, #14695: URL: https://github.com/apache/lucene/pull/14695 This commit makes a minor adjustment to a couple of lucene90 backward compat types to avoid duplicating them for older code.

Re: [PR] Use a hint to specify READONCE IOContext [lucene]

2025-05-21 Thread via GitHub
thecoop commented on code in PR #14509: URL: https://github.com/apache/lucene/pull/14509#discussion_r2100453646 ## lucene/core/src/java/org/apache/lucene/store/IOContext.java: ## @@ -56,7 +56,7 @@ interface FileOpenHint {} * This context should only be used when the read ope

Re: [I] Could/should KNN queries use per-segment query caching? [lucene]

2025-05-21 Thread via GitHub
jpountz commented on issue #14669: URL: https://github.com/apache/lucene/issues/14669#issuecomment-2897936261 I wonder if this use-case would be better served by something like Elasticsearch's shard request cache. The cache key is the whole request (query, number of hits retrieved, etc.), p

Re: [I] Integrate a JVector codec for KNN searches [lucene]

2025-05-21 Thread via GitHub
jpountz commented on issue #14681: URL: https://github.com/apache/lucene/issues/14681#issuecomment-2897879834 > sandbox Nit: it's fine for any codec to live in `lucene/codecs` in my opinion, the bar isn't much higher than sandbox, and this allows us to put them into codec randomizati

Re: [PR] Speed up exhaustive evaluation. [lucene]

2025-05-21 Thread via GitHub
jpountz commented on PR #14679: URL: https://github.com/apache/lucene/pull/14679#issuecomment-2897806175 CheckIndex integration is pushed, I hooked into a place where we were already exhaustively consuming the `PostingsEnum` anyway, so it shouldn't cause a major slowdown.

Re: [PR] A specialized Trie for Block Tree Index [lucene]

2025-05-21 Thread via GitHub
jpountz commented on PR #14333: URL: https://github.com/apache/lucene/pull/14333#issuecomment-2897729953 This is great info, thanks for sharing!

Re: [PR] A specialized Trie for Block Tree Index [lucene]

2025-05-21 Thread via GitHub
mikemccand commented on PR #14333: URL: https://github.com/apache/lucene/pull/14333#issuecomment-2897725511 Thank you @Coqueue. > ran it against an Amazon Search internal benchmark, from which we observed an increase of 2.7% in Searcher throughput : D Small correction: Amazon P

Re: [PR] Speed up exhaustive evaluation. [lucene]

2025-05-21 Thread via GitHub
gf2121 commented on code in PR #14679: URL: https://github.com/apache/lucene/pull/14679#discussion_r2099773865 ## lucene/core/src/java/org/apache/lucene/search/Scorer.java: ## @@ -76,4 +77,57 @@ public int advanceShallow(int target) throws IOException { * {@link #advanceShal

Re: [PR] Speed up exhaustive evaluation. [lucene]

2025-05-21 Thread via GitHub
gf2121 commented on code in PR #14679: URL: https://github.com/apache/lucene/pull/14679#discussion_r2099767800 ## lucene/core/src/java/org/apache/lucene/search/TermScorer.java: ## @@ -120,4 +126,54 @@ public void setMinCompetitiveScore(float minScore) { impactsDisi.setMin

Re: [PR] Speed up exhaustive evaluation. [lucene]

2025-05-21 Thread via GitHub
jpountz commented on PR #14679: URL: https://github.com/apache/lucene/pull/14679#issuecomment-2897149675 Thanks for the feedback, both. I added coverage to `BasePostingsFormatTestCase`. `TestDuelingCodecs` is a bit tricky since implementations are free to return buffers of arbitrary sizes.

Re: [PR] Speed up exhaustive evaluation. [lucene]

2025-05-21 Thread via GitHub
jpountz commented on code in PR #14679: URL: https://github.com/apache/lucene/pull/14679#discussion_r2099741906 ## lucene/core/src/java/org/apache/lucene/search/Scorer.java: ## @@ -76,4 +77,57 @@ public int advanceShallow(int target) throws IOException { * {@link #advanceSha

Re: [PR] Specify and test that IOContext is immutable [lucene]

2025-05-21 Thread via GitHub
ChrisHegarty merged PR #14686: URL: https://github.com/apache/lucene/pull/14686

Re: [PR] Speed up exhaustive evaluation. [lucene]

2025-05-21 Thread via GitHub
gf2121 commented on code in PR #14679: URL: https://github.com/apache/lucene/pull/14679#discussion_r2099660480 ## lucene/core/src/java/org/apache/lucene/index/PostingsEnum.java: ## @@ -97,4 +98,44 @@ protected PostingsEnum() {} * anything (neither members of the returned Byt

Re: [PR] DocIdRunEnd implementation missed in Lucene103PostingsReader [lucene]

2025-05-21 Thread via GitHub
gf2121 commented on PR #14693: URL: https://github.com/apache/lucene/pull/14693#issuecomment-2896897637 Test failure is unrelated, I raised https://github.com/apache/lucene/issues/14694.

Re: [PR] Speed up exhaustive evaluation. [lucene]

2025-05-21 Thread via GitHub
gf2121 commented on code in PR #14679: URL: https://github.com/apache/lucene/pull/14679#discussion_r2099522207 ## lucene/core/src/java/org/apache/lucene/search/Scorer.java: ## @@ -76,4 +77,57 @@ public int advanceShallow(int target) throws IOException { * {@link #advanceShal

Re: [PR] deps(java): bump org.eclipse.jgit:org.eclipse.jgit from 7.2.0.202503040940-r to 7.2.1.202505142326-r [lucene]

2025-05-21 Thread via GitHub
dweiss merged PR #14692: URL: https://github.com/apache/lucene/pull/14692

Re: [PR] deps(java): bump org.gradle.toolchains.foojay-resolver-convention from 0.10.0 to 1.0.0 [lucene]

2025-05-21 Thread via GitHub
dweiss merged PR #14691: URL: https://github.com/apache/lucene/pull/14691

[PR] DocIdRunEnd implementation missed in Lucene103PostingsReader [lucene]

2025-05-21 Thread via GitHub
gf2121 opened a new pull request, #14693: URL: https://github.com/apache/lucene/pull/14693 The `docIdRunEnd` implementation of postings (introduced in #14390) is missing from `Lucene103PostingsReader`. Thanks @bugmakerr for finding this and reminding me!

[I] org.apache.lucene.search.TestPatienceFloatVectorQuery.testFindAll failed [lucene]

2025-05-20 Thread via GitHub
gf2121 opened a new issue, #14694: URL: https://github.com/apache/lucene/issues/14694 ### Description ``` org.apache.lucene.search.TestPatienceFloatVectorQuery > test suite's output saved to /home/runner/work/lucene/lucene/lucene/core/build/test-results/test/outputs/OUTPUT-org.apa

Re: [PR] DocIdRunEnd implementation missed in Lucene103PostingsReader [lucene]

2025-05-20 Thread via GitHub
github-actions[bot] commented on PR #14693: URL: https://github.com/apache/lucene/pull/14693#issuecomment-2896701459 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you wil

Re: [PR] Speed up exhaustive evaluation. [lucene]

2025-05-20 Thread via GitHub
rmuir commented on PR #14679: URL: https://github.com/apache/lucene/pull/14679#issuecomment-2896403654 Thank you! "bulkpostings 2.0" is looking really clean and non-invasive :) > I suspect it may be tempting in the future, because it enables further optimizations as @gf2121 showed in

Re: [PR] Improve BytesRef creation from String [lucene]

2025-05-20 Thread via GitHub
rmuir commented on PR #14678: URL: https://github.com/apache/lucene/pull/14678#issuecomment-2896282594 yeah it is tricky, since the 'strings' indexed in search are usually small: words. for a lot of natural languages average word length is already small (e.g. english: ~5), and often in sear

Re: [PR] Update created version major [lucene]

2025-05-20 Thread via GitHub
github-actions[bot] commented on PR #14607: URL: https://github.com/apache/lucene/pull/14607#issuecomment-2896124380 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Fix FuzzySet#createSetBasedOnMaxMemory to honor bytes not bits [lucene]

2025-05-20 Thread via GitHub
github-actions[bot] commented on PR #14616: URL: https://github.com/apache/lucene/pull/14616#issuecomment-2896124348 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contributi

Re: [PR] Add Query for reranking KnnFloatVectorQuery with full-precision vectors [lucene]

2025-05-20 Thread via GitHub
dungba88 commented on PR #14009: URL: https://github.com/apache/lucene/pull/14009#issuecomment-2896110176 Thanks @vigyasharma for the comment! I updated the comment to make it less confusing. I'll think about generalization, but the idea is that as long as the field can expose the fl

Re: [PR] deps(java): bump org.eclipse.jgit:org.eclipse.jgit from 7.2.0.202503040940-r to 7.2.1.202505142326-r [lucene]

2025-05-20 Thread via GitHub
github-actions[bot] commented on PR #14692: URL: https://github.com/apache/lucene/pull/14692#issuecomment-2896069370 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you wil

Re: [PR] deps(java): bump org.gradle.toolchains.foojay-resolver-convention from 0.10.0 to 1.0.0 [lucene]

2025-05-20 Thread via GitHub
github-actions[bot] commented on PR #14691: URL: https://github.com/apache/lucene/pull/14691#issuecomment-2896069185 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you wil

[PR] deps(java): bump org.eclipse.jgit:org.eclipse.jgit from 7.2.0.202503040940-r to 7.2.1.202505142326-r [lucene]

2025-05-20 Thread via GitHub
dependabot[bot] opened a new pull request, #14692: URL: https://github.com/apache/lucene/pull/14692 Bumps [org.eclipse.jgit:org.eclipse.jgit](https://github.com/eclipse-jgit/jgit) from 7.2.0.202503040940-r to 7.2.1.202505142326-r. Commits https://github.com/eclipse-jgit/jgit/c

[PR] deps(java): bump org.gradle.toolchains.foojay-resolver-convention from 0.10.0 to 1.0.0 [lucene]

2025-05-20 Thread via GitHub
dependabot[bot] opened a new pull request, #14691: URL: https://github.com/apache/lucene/pull/14691 Bumps org.gradle.toolchains.foojay-resolver-convention from 0.10.0 to 1.0.0. [![Dependabot compatibility score](https://dependabot-badges.githubapp.com/badges/compatibility_score?de

Re: [PR] Add Query for reranking KnnFloatVectorQuery with full-precision vectors [lucene]

2025-05-20 Thread via GitHub
vigyasharma commented on code in PR #14009: URL: https://github.com/apache/lucene/pull/14009#discussion_r2098815198 ## lucene/core/src/java/org/apache/lucene/search/RerankKnnFloatVectorQuery.java: ## @@ -0,0 +1,117 @@ +/* + * Licensed to the Apache Software Foundation (ASF) unde

Re: [PR] Improve BytesRef creation from String [lucene]

2025-05-20 Thread via GitHub
schlosna closed pull request #14678: Improve BytesRef creation from String URL: https://github.com/apache/lucene/pull/14678

Re: [PR] Improve BytesRef creation from String [lucene]

2025-05-20 Thread via GitHub
schlosna commented on PR #14678: URL: https://github.com/apache/lucene/pull/14678#issuecomment-2896009970 > > In #12071 there is mention [#12071 (comment)](https://github.com/apache/lucene/issues/12071#issuecomment-1379313710) of using the vector APIs to speed up UnicodeUtil conversions. Ha

Re: [PR] Clean up how the test framework creates asserting scorables. [lucene]

2025-05-20 Thread via GitHub
jpountz merged PR #14452: URL: https://github.com/apache/lucene/pull/14452

Re: [PR] Make competitive iterators more robust. [lucene]

2025-05-20 Thread via GitHub
jpountz merged PR #14532: URL: https://github.com/apache/lucene/pull/14532

Re: [PR] Remove DISIDocIdStream. [lucene]

2025-05-20 Thread via GitHub
jpountz merged PR #14550: URL: https://github.com/apache/lucene/pull/14550

Re: [PR] Implement AssertingPostingsEnum#intoBitSet. [lucene]

2025-05-20 Thread via GitHub
jpountz merged PR #14675: URL: https://github.com/apache/lucene/pull/14675

[PR] Speed up conjunctive queries that need scores. [lucene]

2025-05-20 Thread via GitHub
jpountz opened a new pull request, #14690: URL: https://github.com/apache/lucene/pull/14690 Calls to `DocIdSetIterator#nextDoc`, `DocIdSetIterator#advance` and `SimScorer#score` are currently interleaved and include lots of conditionals. This builds up on #14679 and refactors the code a

Re: [I] Try GroupVInt for writing HNSW neighbor node arrays? [lucene]

2025-05-20 Thread via GitHub
msokolov closed issue #14689: Try GroupVInt for writing HNSW neighbor node arrays? URL: https://github.com/apache/lucene/issues/14689

Re: [I] Try GroupVInt for writing HNSW neighbor node arrays? [lucene]

2025-05-20 Thread via GitHub
msokolov commented on issue #14689: URL: https://github.com/apache/lucene/issues/14689#issuecomment-2895734509 yup, looks like a duplicate - thanks for finding @benwtrent

Re: [PR] Speed up exhaustive evaluation. [lucene]

2025-05-20 Thread via GitHub
jpountz commented on PR #14679: URL: https://github.com/apache/lucene/pull/14679#issuecomment-2895700581 You are correct, no need for additional APIs on Similarity at this point, I removed it. I suspect it may be tempting in the future, because it enables further optimizations as @gf2121 sh

Re: [PR] Speed up exhaustive evaluation. [lucene]

2025-05-20 Thread via GitHub
jpountz commented on code in PR #14679: URL: https://github.com/apache/lucene/pull/14679#discussion_r2098761902 ## lucene/core/src/java/org/apache/lucene/index/PostingsEnum.java: ## @@ -97,4 +98,44 @@ protected PostingsEnum() {} * anything (neither members of the returned By

Re: [I] Try GroupVInt for writing HNSW neighbor node arrays? [lucene]

2025-05-20 Thread via GitHub
benwtrent commented on issue #14689: URL: https://github.com/apache/lucene/issues/14689#issuecomment-2895637071 I think this might be a duplicate? https://github.com/apache/lucene/issues/12871 I agree it's a good idea :)

Re: [I] Support for DocIdSetBuilder with (min,max) docId [lucene]

2025-05-20 Thread via GitHub
jainankitk commented on issue #14485: URL: https://github.com/apache/lucene/issues/14485#issuecomment-2895538068 Thanks @javanna for getting back with the current status. Will wait for @prudhvigodithi to make progress on the proposal and PRs. Will loop you in for reviews given your intra-se

Re: [PR] Add a Faiss codec for KNN searches [lucene]

2025-05-20 Thread via GitHub
kaivalnp commented on PR #14178: URL: https://github.com/apache/lucene/pull/14178#issuecomment-2895246147 Rebased the PR to incorporate recent changes (including the optimistic collection based on pro-rating) --- Single-segment search has no impact as expected: Lucene:

Re: [PR] Add a Faiss codec for KNN searches [lucene]

2025-05-20 Thread via GitHub
github-actions[bot] commented on PR #14178: URL: https://github.com/apache/lucene/pull/14178#issuecomment-2895178019 This PR does not have an entry in lucene/CHANGES.txt. Consider adding one. If the PR doesn't need a changelog entry, then add the skip-changelog-check label to it and you wil

Re: [PR] Fix patience knn queries to work with seeded knn queries [lucene]

2025-05-20 Thread via GitHub
tteofili merged PR #14688: URL: https://github.com/apache/lucene/pull/14688

[I] Try GroupVInt for writing HNSW neighbor node arrays? [lucene]

2025-05-20 Thread via GitHub
mikemccand opened a new issue, #14689: URL: https://github.com/apache/lucene/issues/14689 ### Description @msokolov relayed this idea from @jpountz: today, the default `KnnVectorsFormat` uses delta vInt (I think?) to write the neighbor nodes array ... maybe `GroupVInt` would be small

Re: [I] Support for Pluggable Custom Vector Similarity Functions [lucene]

2025-05-20 Thread via GitHub
msokolov commented on issue #14520: URL: https://github.com/apache/lucene/issues/14520#issuecomment-2894524859 I think it's a duplicate of #14025

Re: [I] HyphenationCompoundWordTokenFilter fixed token position and preserves original token [lucene]

2025-05-20 Thread via GitHub
mikemccand commented on issue #14624: URL: https://github.com/apache/lucene/issues/14624#issuecomment-2894460764 To address your 2nd idea (increment the position for each sub-word in the compound word), I think we'd need to create a graph-aware `CompoundWordTokenFilter`. It would also emit

Re: [I] Multi-threaded vector search over multiple segments can lead to inconsistent results [lucene]

2025-05-20 Thread via GitHub
mikemccand commented on issue #14180: URL: https://github.com/apache/lucene/issues/14180#issuecomment-2894399826 It sounds like this is fixed, I will close this now.

Re: [I] Multi-threaded vector search over multiple segments can lead to inconsistent results [lucene]

2025-05-20 Thread via GitHub
mikemccand closed issue #14180: Multi-threaded vector search over multiple segments can lead to inconsistent results URL: https://github.com/apache/lucene/issues/14180

Re: [I] Segment count (merging) can impact recall on KNN ParentJoin queries [lucene]

2025-05-20 Thread via GitHub
mikemccand commented on issue #14643: URL: https://github.com/apache/lucene/issues/14643#issuecomment-2894378185 > This doesn't look like a problem with regular KNN vector queries, only appears with parent-join query benchmarks. Hmm it's odd for the 500K docs case that recall is so mu

Re: [I] Integrate a JVector codec for KNN searches [lucene]

2025-05-20 Thread via GitHub
mikemccand commented on issue #14681: URL: https://github.com/apache/lucene/issues/14681#issuecomment-2894290455 +1, it'd be awesome to refactor OpenSearch's jvector integration down to Lucene as an alternative Codec (`KnnVectorsFormat`) component in sandbox. https://github.com/apache
