gf2121 opened a new issue, #14694:
URL: https://github.com/apache/lucene/issues/14694
### Description
```
org.apache.lucene.search.TestPatienceFloatVectorQuery > test suite's output
saved to
/home/runner/work/lucene/lucene/lucene/core/build/test-results/test/outputs/OUTPUT-org.apa
github-actions[bot] commented on PR #14693:
URL: https://github.com/apache/lucene/pull/14693#issuecomment-2896701459
This PR does not have an entry in lucene/CHANGES.txt. Consider adding one.
If the PR doesn't need a changelog entry, then add the skip-changelog-check
label to it and you wil
rmuir commented on PR #14679:
URL: https://github.com/apache/lucene/pull/14679#issuecomment-2896403654
Thank you! "bulkpostings 2.0" is looking really clean and non-invasive :)
> I suspect it may be tempting in the future, because it enables further
optimizations as @gf2121 showed in
rmuir commented on PR #14678:
URL: https://github.com/apache/lucene/pull/14678#issuecomment-2896282594
yeah it is tricky, since the 'strings' indexed in search are usually small:
words. for a lot of natural languages average word length is already small
(e.g. english: ~5), and often in sear
github-actions[bot] commented on PR #14607:
URL: https://github.com/apache/lucene/pull/14607#issuecomment-2896124380
This PR has not had activity in the past 2 weeks, labeling it as stale. If
the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you
for your contributi
github-actions[bot] commented on PR #14616:
URL: https://github.com/apache/lucene/pull/14616#issuecomment-2896124348
This PR has not had activity in the past 2 weeks, labeling it as stale. If
the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you
for your contributi
dungba88 commented on PR #14009:
URL: https://github.com/apache/lucene/pull/14009#issuecomment-2896110176
Thanks @vigyasharma for the comment! I updated the comment to make it less
confusing.
I'll think about generalization, but the idea is that as long as the field
can expose the fl
github-actions[bot] commented on PR #14692:
URL: https://github.com/apache/lucene/pull/14692#issuecomment-2896069370
This PR does not have an entry in lucene/CHANGES.txt. Consider adding one.
If the PR doesn't need a changelog entry, then add the skip-changelog-check
label to it and you wil
github-actions[bot] commented on PR #14691:
URL: https://github.com/apache/lucene/pull/14691#issuecomment-2896069185
This PR does not have an entry in lucene/CHANGES.txt. Consider adding one.
If the PR doesn't need a changelog entry, then add the skip-changelog-check
label to it and you wil
dependabot[bot] opened a new pull request, #14692:
URL: https://github.com/apache/lucene/pull/14692
Bumps
[org.eclipse.jgit:org.eclipse.jgit](https://github.com/eclipse-jgit/jgit) from
7.2.0.202503040940-r to 7.2.1.202505142326-r.
Commits
https://github.com/eclipse-jgit/jgit/c
dependabot[bot] opened a new pull request, #14691:
URL: https://github.com/apache/lucene/pull/14691
Bumps org.gradle.toolchains.foojay-resolver-convention from 0.10.0 to 1.0.0.
[ unde
schlosna closed pull request #14678: Improve BytesRef creation from String
URL: https://github.com/apache/lucene/pull/14678
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To u
schlosna commented on PR #14678:
URL: https://github.com/apache/lucene/pull/14678#issuecomment-2896009970
> > In #12071 there is mention [#12071
(comment)](https://github.com/apache/lucene/issues/12071#issuecomment-1379313710)
of using the vector APIs to speed up UnicodeUtil conversions. Ha
jpountz merged PR #14452:
URL: https://github.com/apache/lucene/pull/14452
jpountz merged PR #14532:
URL: https://github.com/apache/lucene/pull/14532
jpountz merged PR #14550:
URL: https://github.com/apache/lucene/pull/14550
jpountz merged PR #14675:
URL: https://github.com/apache/lucene/pull/14675
jpountz opened a new pull request, #14690:
URL: https://github.com/apache/lucene/pull/14690
Calls to `DocIdSetIterator#nextDoc`, `DocIdSetIterator#advance` and
`SimScorer#score` are currently interleaved and include lots of conditionals.
This builds up on #14679 and refactors the code a
msokolov closed issue #14689: Try GroupVInt for writing HNSW neighbor node
arrays?
URL: https://github.com/apache/lucene/issues/14689
msokolov commented on issue #14689:
URL: https://github.com/apache/lucene/issues/14689#issuecomment-2895734509
yup, looks like a duplicate - thanks for finding @benwtrent
jpountz commented on PR #14679:
URL: https://github.com/apache/lucene/pull/14679#issuecomment-2895700581
You are correct, no need for additional APIs on Similarity at this point, I
removed it. I suspect it may be tempting in the future, because it enables
further optimizations as @gf2121 sh
jpountz commented on code in PR #14679:
URL: https://github.com/apache/lucene/pull/14679#discussion_r2098761902
##
lucene/core/src/java/org/apache/lucene/index/PostingsEnum.java:
##
@@ -97,4 +98,44 @@ protected PostingsEnum() {}
* anything (neither members of the returned By
benwtrent commented on issue #14689:
URL: https://github.com/apache/lucene/issues/14689#issuecomment-2895637071
I think this might be a duplicate?
https://github.com/apache/lucene/issues/12871
I agree it's a good idea :)
jainankitk commented on issue #14485:
URL: https://github.com/apache/lucene/issues/14485#issuecomment-2895538068
Thanks @javanna for getting back with the current status. Will wait for
@prudhvigodithi to make progress on the proposal and PRs. Will loop you in for
reviews given your intra-se
kaivalnp commented on PR #14178:
URL: https://github.com/apache/lucene/pull/14178#issuecomment-2895246147
Rebased the PR to incorporate recent changes (including the optimistic
collection based on pro-rating)
---
Single-segment search has no impact as expected:
Lucene:
github-actions[bot] commented on PR #14178:
URL: https://github.com/apache/lucene/pull/14178#issuecomment-2895178019
This PR does not have an entry in lucene/CHANGES.txt. Consider adding one.
If the PR doesn't need a changelog entry, then add the skip-changelog-check
label to it and you wil
tteofili merged PR #14688:
URL: https://github.com/apache/lucene/pull/14688
mikemccand opened a new issue, #14689:
URL: https://github.com/apache/lucene/issues/14689
### Description
@msokolov relayed this idea from @jpountz: today, the default
`KnnVectorsFormat` uses delta vInt (I think?) to write the neighbor nodes array
... maybe `GroupVInt` would be small
msokolov commented on issue #14520:
URL: https://github.com/apache/lucene/issues/14520#issuecomment-2894524859
I think it's a duplicate of #14025
mikemccand commented on issue #14624:
URL: https://github.com/apache/lucene/issues/14624#issuecomment-2894460764
To address your 2nd idea (increment the position for each sub-word in the
compound word), I think we'd need to create a graph-aware
`CompoundWordTokenFilter`. It would also emit
mikemccand commented on issue #14180:
URL: https://github.com/apache/lucene/issues/14180#issuecomment-2894399826
It sounds like this is fixed, I will close this now.
mikemccand closed issue #14180: Multi-threaded vector search over multiple
segments can lead to inconsistent results
URL: https://github.com/apache/lucene/issues/14180
mikemccand commented on issue #14643:
URL: https://github.com/apache/lucene/issues/14643#issuecomment-2894378185
> This doesn't look like a problem with regular KNN vector queries, only
appears with parent-join query benchmarks.
Hmm it's odd for the 500K docs case that recall is so mu
mikemccand commented on issue #14681:
URL: https://github.com/apache/lucene/issues/14681#issuecomment-2894290455
+1, it'd be awesome to refactor OpenSearch's jvector integration down to
Lucene as an alternative Codec (`KnnVectorsFormat`) component in sandbox.
https://github.com/apache
rmuir commented on PR #14679:
URL: https://github.com/apache/lucene/pull/14679#issuecomment-2894197496
Do we really need the method on Similarity? I guess I feel, most users are
probably using BM25Similarity, so I don't understand the explanation in the
comments.
If we have "bogus" i
Coqueue commented on PR #14333:
URL: https://github.com/apache/lucene/pull/14333#issuecomment-2893516032
Thanks for the fantastic change!
Want to share that we adopted and backported this codec to Lucene 912, and
ran it against an Amazon Search internal benchmark, from which we observ