tteofili commented on PR #13984:
URL: https://github.com/apache/lucene/pull/13984#issuecomment-2469922060
this might also check for graph connectivity, see #12627
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
UR
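The connectivity check suggested above could look roughly like the following. This is only a minimal sketch under stated assumptions: `GraphConnectivitySketch`, `countReachable`, and `isConnected` are hypothetical names, the graph level is modeled as a plain `int[][]` adjacency array rather than Lucene's HNSW graph classes, and it is not the code from PR #13984 or #12627.

```java
import java.util.ArrayDeque;
import java.util.BitSet;
import java.util.Deque;

// Hypothetical helper, not part of Lucene: checks whether every node of one
// graph level can be reached from an entry point by following neighbor links.
final class GraphConnectivitySketch {

  /** Counts the nodes reachable from entryPoint; neighbors[i] lists the nodes that node i links to. */
  static int countReachable(int[][] neighbors, int entryPoint) {
    BitSet visited = new BitSet(neighbors.length);
    Deque<Integer> queue = new ArrayDeque<>();
    visited.set(entryPoint);
    queue.add(entryPoint);
    int reached = 0;
    // Plain breadth-first traversal over the (directed) neighbor links.
    while (!queue.isEmpty()) {
      int node = queue.poll();
      reached++;
      for (int neighbor : neighbors[node]) {
        if (!visited.get(neighbor)) {
          visited.set(neighbor);
          queue.add(neighbor);
        }
      }
    }
    return reached;
  }

  /** An empty level counts as connected; otherwise every node must be reachable. */
  static boolean isConnected(int[][] neighbors, int entryPoint) {
    return neighbors.length == 0 || countReachable(neighbors, entryPoint) == neighbors.length;
  }
}
```

A checker built this way could report the number of unreachable nodes per level instead of failing outright, since a disconnected graph hurts recall but is not necessarily a corrupt index.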
jfboeuf opened a new pull request, #13989:
URL: https://github.com/apache/lucene/pull/13989
Take advantage of the existing buffer in `BufferedChecksum` to speed up
reads for Longs, Ints, Shorts, and Long arrays by avoiding byte-by-byte reads.
Use the faster `readLongs()` method to decode
benchaplin commented on code in PR #13984:
URL: https://github.com/apache/lucene/pull/13984#discussion_r1838595160
##
lucene/core/src/java/org/apache/lucene/index/CheckIndex.java:
##
@@ -2746,6 +2769,84 @@ public static Status.VectorValuesStatus testVectors(
return status;
msfroh commented on PR #13987:
URL: https://github.com/apache/lucene/pull/13987#issuecomment-2471331326
Luckily, this is a Lucene 10-only bug (from when `docId()` was removed from
`Scorable`).
I came across it when updating OpenSearch to support Lucene 10 and needed to
refactor some
vigyasharma opened a new pull request, #13991:
URL: https://github.com/apache/lucene/pull/13991
This is a small first change towards adding support for multi-vectors. We
start with adding a `MultiVectorSimilarityFunction` that can handle (late)
interaction across multiple vector values.
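As an illustration of what late interaction across multiple vector values can mean, here is a minimal sketch. It assumes a ColBERT-style sum-of-max aggregation over dot products; `SumMaxSimilaritySketch` and its methods are hypothetical names, not the `MultiVectorSimilarityFunction` API from this PR.

```java
// Hypothetical sketch, not Lucene code: sum-of-max late interaction.
final class SumMaxSimilaritySketch {

  /** Dot product of two vectors of equal dimension. */
  static float dot(float[] a, float[] b) {
    if (a.length != b.length) {
      throw new IllegalArgumentException("vector dimensions differ");
    }
    float sum = 0f;
    for (int i = 0; i < a.length; i++) {
      sum += a[i] * b[i];
    }
    return sum;
  }

  /** For each query vector, takes its best match among the document vectors, then sums those maxima. */
  static float sumMax(float[][] query, float[][] doc) {
    if (doc.length == 0) {
      throw new IllegalArgumentException("document multi-vector is empty");
    }
    float total = 0f;
    for (float[] q : query) {
      float best = Float.NEGATIVE_INFINITY;
      for (float[] d : doc) {
        best = Math.max(best, dot(q, d));
      }
      total += best;
    }
    return total;
  }
}
```

The cost is proportional to the number of query vectors times the number of document vectors, which is one reason such a function is usually applied to a small candidate set.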
vigyasharma commented on PR #13991:
URL: https://github.com/apache/lucene/pull/13991#issuecomment-2472019578
I am thinking we can leverage the `NONE` aggregation (in #13525) for
non-ColBERT passage vector use-cases, by making each graph node correspond to a
single value in the multi-vector
msfroh commented on PR #13987:
URL: https://github.com/apache/lucene/pull/13987#issuecomment-2472044035
I think the caching worked fine, albeit in a funny way.
When you call `collect()` with any valid doc ID, it invalidates the cache,
causing scores to be computed. After the score wa
benchaplin commented on code in PR #13984:
URL: https://github.com/apache/lucene/pull/13984#discussion_r1838568093
##
lucene/core/src/java/org/apache/lucene/index/CheckIndex.java:
##
@@ -2746,6 +2769,84 @@ public static Status.VectorValuesStatus testVectors(
return status;
uschindler commented on PR #13985:
URL: https://github.com/apache/lucene/pull/13985#issuecomment-2471068437
Hi,
I am currently on travel, so I can't review this. Will look into it possibly
later this week. Greetings from Costa Rica!
viswanathk commented on code in PR #13990:
URL: https://github.com/apache/lucene/pull/13990#discussion_r1838356398
##
lucene/join/src/java/org/apache/lucene/search/join/DiversifyingChildrenByteKnnVectorQuery.java:
##
@@ -154,7 +154,14 @@ protected TopDocs approximateSearch(
vsop-479 commented on PR #13782:
URL: https://github.com/apache/lucene/pull/13782#issuecomment-2472163944
Hello @jpountz, please take a look when you get a chance.
dsmiley commented on code in PR #13384:
URL: https://github.com/apache/lucene/pull/13384#discussion_r1839282216
##
lucene/CHANGES.txt:
##
@@ -280,6 +280,8 @@ Optimizations
Bug Fixes
-
+* GITHUB#13384: Fix highlighter to use longer passages instead of shor
ChrisHegarty commented on PR #13986:
URL: https://github.com/apache/lucene/pull/13986#issuecomment-2471543719
> Personally I would prefer a less if/else/default handling using Optional
like done in the previous sysprops.
I'll make that change before merging.
> Greetings from Co
jpountz commented on code in PR #13989:
URL: https://github.com/apache/lucene/pull/13989#discussion_r1839685981
##
lucene/core/src/java/org/apache/lucene/store/BufferedChecksum.java:
##
@@ -60,6 +64,37 @@ public void update(byte[] b, int off, int len) {
}
}
+ void upd
gsmiller commented on code in PR #13886:
URL: https://github.com/apache/lucene/pull/13886#discussion_r1838528845
##
lucene/core/src/java/org/apache/lucene/search/IndexSortSortedNumericDocValuesRangeQuery.java:
##
@@ -397,106 +413,80 @@ private boolean matchAll(PointValues points
benchaplin commented on code in PR #13984:
URL: https://github.com/apache/lucene/pull/13984#discussion_r1838509594
##
lucene/core/src/java/org/apache/lucene/index/CheckIndex.java:
##
@@ -406,6 +411,21 @@ public static final class VectorValuesStatus {
public Throwable erro
jfboeuf commented on PR #13989:
URL: https://github.com/apache/lucene/pull/13989#issuecomment-2471157022
@jpountz
[I modified the benchmark to make it more realistic by adding a header to
the `IndexOutput`
](https://github.com/apache/lucene/commit/8dc6eac23b3a1158ef4c82860d8574c779bad04
rmuir commented on PR #13989:
URL: https://github.com/apache/lucene/pull/13989#issuecomment-2471135059
OK, I see @jfboeuf, thank you for the explanation. My only concern
with the optimization is testing. If there is a bug here, the user will get
CorruptIndexException.
Could we
viswanathk commented on PR #13990:
URL: https://github.com/apache/lucene/pull/13990#issuecomment-2470836662
I hope I got all of them now.
benwtrent commented on code in PR #13990:
URL: https://github.com/apache/lucene/pull/13990#discussion_r1838315863
##
lucene/join/src/java/org/apache/lucene/search/join/DiversifyingChildrenByteKnnVectorQuery.java:
##
@@ -154,7 +154,14 @@ protected TopDocs approximateSearch(
mikemccand commented on PR #13651:
URL: https://github.com/apache/lucene/pull/13651#issuecomment-2470394946
> @ShashwatShivam I don't think there is a "memory column" provided
anywhere. I simply looked at the individual file sizes (veb, vex) and summed
their sizes together.
Once this
benwtrent commented on PR #13651:
URL: https://github.com/apache/lucene/pull/13651#issuecomment-2470503074
Quick update, we have been bothered with some of the numbers (for example,
models like "gist" perform poorly) and we have some improvements to get done
first before flipping back to "r
jpountz commented on PR #13990:
URL: https://github.com/apache/lucene/pull/13990#issuecomment-2470768672
I believe that @benwtrent meant `KnnByteVectorQuery`.
mikemccand commented on code in PR #13984:
URL: https://github.com/apache/lucene/pull/13984#discussion_r1837944575
##
lucene/core/src/java/org/apache/lucene/index/CheckIndex.java:
##
@@ -406,6 +411,21 @@ public static final class VectorValuesStatus {
public Throwable erro
jpountz merged PR #13970:
URL: https://github.com/apache/lucene/pull/13970
rmuir commented on PR #13989:
URL: https://github.com/apache/lucene/pull/13989#issuecomment-2470068010
This is actually slower, we only want to call `updateBytes(byte[])` or the
checksum calculation is very slow (not vectorized).
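The trade-off discussed in this thread can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from PR #13989: `BufferedChecksumSketch` is a hypothetical class, the 1024-byte buffer size and the little-endian byte order are assumptions, and `java.util.zip.CRC32` stands in for whichever `Checksum` is wrapped. Multi-byte values are copied into the buffer in one step, and the wrapped checksum is only ever updated through the bulk `update(byte[], int, int)` call.

```java
import java.lang.invoke.MethodHandles;
import java.lang.invoke.VarHandle;
import java.nio.ByteOrder;
import java.util.zip.Checksum;

// Hypothetical sketch, not Lucene's BufferedChecksum.
final class BufferedChecksumSketch {
  // Views a byte[] as little-endian longs, so a long is stored with one call.
  private static final VarHandle LONG_LE =
      MethodHandles.byteArrayViewVarHandle(long[].class, ByteOrder.LITTLE_ENDIAN);

  private final Checksum in;
  private final byte[] buffer = new byte[1024]; // assumed size
  private int upto;

  BufferedChecksumSketch(Checksum in) {
    this.in = in;
  }

  /** Buffers one long without going through single-byte updates. */
  void updateLong(long value) {
    if (upto + Long.BYTES > buffer.length) {
      flush();
    }
    LONG_LE.set(buffer, upto, value);
    upto += Long.BYTES;
  }

  /** Buffers a range of longs; the buffer is flushed whenever it fills up. */
  void updateLongs(long[] values, int offset, int length) {
    for (int i = offset; i < offset + length; i++) {
      updateLong(values[i]);
    }
  }

  /** Flushes pending bytes and returns the checksum of everything seen so far. */
  long getValue() {
    flush();
    return in.getValue();
  }

  // The wrapped checksum only ever sees bulk byte[] updates.
  private void flush() {
    if (upto > 0) {
      in.update(buffer, 0, upto);
    }
    upto = 0;
  }
}
```

Because any bug in such a path would surface to users as a checksum mismatch, a test that compares the result against a byte-by-byte reference checksum over random data seems like a natural companion.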
jpountz commented on PR #13989:
URL: https://github.com/apache/lucene/pull/13989#issuecomment-2470814754
The change makes sense to me and looks like it could speed up loading live
docs.
> The benchmark shows the single-long approach performs better on small
arrays. [...] It can be im
viswanathk commented on code in PR #13990:
URL: https://github.com/apache/lucene/pull/13990#discussion_r1838322211
##
lucene/join/src/java/org/apache/lucene/search/join/DiversifyingChildrenByteKnnVectorQuery.java:
##
@@ -154,7 +154,14 @@ protected TopDocs approximateSearch(
mikemccand commented on code in PR #13987:
URL: https://github.com/apache/lucene/pull/13987#discussion_r1837971840
##
lucene/core/src/test/org/apache/lucene/search/TestTopFieldCollector.java:
##
@@ -359,7 +359,7 @@ public void testTotalHitsWithScore() throws Exception {
l
RS146BIJAY commented on issue #13387:
URL: https://github.com/apache/lucene/issues/13387#issuecomment-2470112408
After some more analysis, I figured out an approach that addresses all the above
comments and obtains the same improvement with a different IndexWriter for each
group as we got with usi
ShashwatShivam commented on PR #13651:
URL: https://github.com/apache/lucene/pull/13651#issuecomment-2470937060
I conducted a benchmark using Cohere's 768-dimensional data. Here are the
steps I followed for reproducibility:
1. **Set up** the [luceneutil
repository](https://github.com
ChrisHegarty commented on issue #13551:
URL: https://github.com/apache/lucene/issues/13551#issuecomment-2470361869
Late to the party!!!
I want to dig a little further on the distinction between the implementation
of preload and prefetch, at least with the mmap implementation. The for
jfboeuf commented on PR #13989:
URL: https://github.com/apache/lucene/pull/13989#issuecomment-2470657083
Thank you for your feedback. Perhaps I misunderstood your point, but the
implementation I propose only calls `Checksum.update(byte[])`. The change
resides in how the buffer is fed to avo
viswanathk commented on PR #13990:
URL: https://github.com/apache/lucene/pull/13990#issuecomment-2470717378
> Could you update the byte knn query & DiversifyingChildren* knn queries as
well?
I made the changes for DiversifyingChildren*, but by byteknn do you mean
`ByteVectorSimilarity
benwtrent commented on code in PR #13990:
URL: https://github.com/apache/lucene/pull/13990#discussion_r1838141983
##
lucene/core/src/test/org/apache/lucene/search/TestKnnFloatVectorQuery.java:
##
@@ -29,13 +29,7 @@
import org.apache.lucene.document.Field;
import org.apache.luc
mikemccand commented on PR #13984:
URL: https://github.com/apache/lucene/pull/13984#issuecomment-2470306515
> this might also check for graph connectivity, see #12627
+1 -- this is a tricky thing about these HNSW graphs (that they are not
necessarily born connected but rather must be
mikemccand commented on PR #13984:
URL: https://github.com/apache/lucene/pull/13984#issuecomment-2470310487
Actually, `CheckIndex` does have some coverage for vectors and KNN graph (it
confirms it can enumerate all vectors, and also runs some searches on it if
it's not just flat vectors (`c
mikemccand commented on PR #13987:
URL: https://github.com/apache/lucene/pull/13987#issuecomment-2470331708
Wow, good catch @msfroh. Could we maybe add a new test case that explicitly
confirms that the wrapped `Scorable`'s `score` method is indeed only called
once even if the outer user ca
ChrisHegarty commented on PR #13985:
URL: https://github.com/apache/lucene/pull/13985#issuecomment-2470311965
@shatejas I'm curious how much this actually helps, and I know that you said
that benchmark results would be posted.
I do like that we can update the ReadAdvice on an index in
viswanathk opened a new pull request, #13990:
URL: https://github.com/apache/lucene/pull/13990
(no comment)