date:20241112

Re: [PR] Add some basic HNSW graph checks to CheckIndex [lucene]

2024-11-12 Thread via GitHub

tteofili commented on PR #13984: URL: https://github.com/apache/lucene/pull/13984#issuecomment-2469922060 this might also check for graph connectivity, see #12627 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the UR

[PR] Improve checksum calculations [lucene]

2024-11-12 Thread via GitHub

jfboeuf opened a new pull request, #13989: URL: https://github.com/apache/lucene/pull/13989 Take advantage of the existing buffer in `BufferedChecksum` to speed up reads for Longs, Ints, Shorts, and Long arrays by avoiding byte-by-byte reads. Use the faster `readLongs()` method to decode

Re: [PR] Add some basic HNSW graph checks to CheckIndex [lucene]

2024-11-12 Thread via GitHub

benchaplin commented on code in PR #13984: URL: https://github.com/apache/lucene/pull/13984#discussion_r1838595160 ## lucene/core/src/java/org/apache/lucene/index/CheckIndex.java: ## @@ -2746,6 +2769,84 @@ public static Status.VectorValuesStatus testVectors( return status;

Re: [PR] Update lastDoc in ScoreCachingWrappingScorer [lucene]

2024-11-12 Thread via GitHub

msfroh commented on PR #13987: URL: https://github.com/apache/lucene/pull/13987#issuecomment-2471331326 Luckily, this is a Lucene 10-only bug (from when `docId()` was removed from `Scorable`). I came across it when updating OpenSearch to support Lucene 10 and needed to refactor some

[PR] Add a Multi-Vector Similarity Function [lucene]

2024-11-12 Thread via GitHub

vigyasharma opened a new pull request, #13991: URL: https://github.com/apache/lucene/pull/13991 This is a small first change towards adding support for multi-vectors. We start with adding a `MultiVectorSimilarityFunction` that can handle (late) interaction across multiple vector values.

Re: [PR] Add a Multi-Vector Similarity Function [lucene]

2024-11-12 Thread via GitHub

vigyasharma commented on PR #13991: URL: https://github.com/apache/lucene/pull/13991#issuecomment-2472019578 I am thinking we can leverage the `NONE` aggregation (in #13525) for non-ColBERT passage vector use-cases, by making each graph node correspond to a single value in the multi-vector

Re: [PR] Update lastDoc in ScoreCachingWrappingScorer [lucene]

2024-11-12 Thread via GitHub

msfroh commented on PR #13987: URL: https://github.com/apache/lucene/pull/13987#issuecomment-2472044035 I think the caching worked fine, albeit in a funny way. When you call `collect()` with any valid doc ID, it invalidates the cache, causing scores to be computed. After the score wa

Re: [PR] Add some basic HNSW graph checks to CheckIndex [lucene]

2024-11-12 Thread via GitHub

benchaplin commented on code in PR #13984: URL: https://github.com/apache/lucene/pull/13984#discussion_r1838568093 ## lucene/core/src/java/org/apache/lucene/index/CheckIndex.java: ## @@ -2746,6 +2769,84 @@ public static Status.VectorValuesStatus testVectors( return status;

Re: [PR] Introduces IndexInput#updateReadAdvice to change the readadvice while [lucene]

2024-11-12 Thread via GitHub

uschindler commented on PR #13985: URL: https://github.com/apache/lucene/pull/13985#issuecomment-2471068437 Hi, I am currently on travel, so I can't review this. Will look into it posisbly later this week. Greetings from Costa Rica! -- This is an automated message from the Apache Git S

Re: [PR] Adding filter to the toString() method of KnnFloatVectorQuery [lucene]

2024-11-12 Thread via GitHub

viswanathk commented on code in PR #13990: URL: https://github.com/apache/lucene/pull/13990#discussion_r1838356398 ## lucene/join/src/java/org/apache/lucene/search/join/DiversifyingChildrenByteKnnVectorQuery.java: ## @@ -154,7 +154,14 @@ protected TopDocs approximateSearch(

Re: [PR] Use Arrays.compareUnsigned in IDVersionSegmentTermsEnum and OrdsSegmentTermsEnum. [lucene]

2024-11-12 Thread via GitHub

vsop-479 commented on PR #13782: URL: https://github.com/apache/lucene/pull/13782#issuecomment-2472163944 Hello @jpountz , Please take a look when you get a chance. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the U

Re: [PR] Prevent DefaultPassageFormatter from taking shorter overlapping passages [lucene]

2024-11-12 Thread via GitHub

dsmiley commented on code in PR #13384: URL: https://github.com/apache/lucene/pull/13384#discussion_r1839282216 ## lucene/CHANGES.txt: ## @@ -280,6 +280,8 @@ Optimizations Bug Fixes - +* GITHUB#13384: Fix highlighter to use longer passages instead of shor

Re: [PR] Allow easier verification of the Panama Vectorization provider with newer Java versions [lucene]

2024-11-12 Thread via GitHub

ChrisHegarty commented on PR #13986: URL: https://github.com/apache/lucene/pull/13986#issuecomment-2471543719 > Personally I would prefer a less if/else/default handling using Optional like done in the previous sysprops. I'll make that change before merging. > Greetings from Co

Re: [PR] Improve checksum calculations [lucene]

2024-11-12 Thread via GitHub

jpountz commented on code in PR #13989: URL: https://github.com/apache/lucene/pull/13989#discussion_r1839685981 ## lucene/core/src/java/org/apache/lucene/store/BufferedChecksum.java: ## @@ -60,6 +64,37 @@ public void update(byte[] b, int off, int len) { } } + void upd

Re: [PR] DocValuesSkipper implementation in IndexSortSorted [lucene]

2024-11-12 Thread via GitHub

gsmiller commented on code in PR #13886: URL: https://github.com/apache/lucene/pull/13886#discussion_r1838528845 ## lucene/core/src/java/org/apache/lucene/search/IndexSortSortedNumericDocValuesRangeQuery.java: ## @@ -397,106 +413,80 @@ private boolean matchAll(PointValues points

Re: [PR] Add some basic HNSW graph checks to CheckIndex [lucene]

2024-11-12 Thread via GitHub

benchaplin commented on code in PR #13984: URL: https://github.com/apache/lucene/pull/13984#discussion_r1838509594 ## lucene/core/src/java/org/apache/lucene/index/CheckIndex.java: ## @@ -406,6 +411,21 @@ public static final class VectorValuesStatus { public Throwable erro

Re: [PR] Improve checksum calculations [lucene]

2024-11-12 Thread via GitHub

jfboeuf commented on PR #13989: URL: https://github.com/apache/lucene/pull/13989#issuecomment-2471157022 @jpountz [I modified the benchmark to make it more realistic by adding a header to the `IndexOutput` ](https://github.com/apache/lucene/commit/8dc6eac23b3a1158ef4c82860d8574c779bad04

Re: [PR] Add some basic HNSW graph checks to CheckIndex [lucene]

2024-11-12 Thread via GitHub

benchaplin commented on code in PR #13984: URL: https://github.com/apache/lucene/pull/13984#discussion_r1838595160 ## lucene/core/src/java/org/apache/lucene/index/CheckIndex.java: ## @@ -2746,6 +2769,84 @@ public static Status.VectorValuesStatus testVectors( return status;

Re: [PR] Improve checksum calculations [lucene]

2024-11-12 Thread via GitHub

rmuir commented on PR #13989: URL: https://github.com/apache/lucene/pull/13989#issuecomment-2471135059 OK, I see @jfboeuf, thank you for the explanation. My only concern with with the optimization is testing. If there is a bug here, the user will get CorruptIndexException. Could we

Re: [PR] Adding filter to the toString() method of KnnFloatVectorQuery [lucene]

2024-11-12 Thread via GitHub

viswanathk commented on PR #13990: URL: https://github.com/apache/lucene/pull/13990#issuecomment-2470836662 I hope I got all of them now. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the speci

Re: [PR] Adding filter to the toString() method of KnnFloatVectorQuery [lucene]

2024-11-12 Thread via GitHub

benwtrent commented on code in PR #13990: URL: https://github.com/apache/lucene/pull/13990#discussion_r1838315863 ## lucene/join/src/java/org/apache/lucene/search/join/DiversifyingChildrenByteKnnVectorQuery.java: ## @@ -154,7 +154,14 @@ protected TopDocs approximateSearch(

Re: [PR] Add a Better Binary Quantizer format for dense vectors [lucene]

2024-11-12 Thread via GitHub

mikemccand commented on PR #13651: URL: https://github.com/apache/lucene/pull/13651#issuecomment-2470394946 > @ShashwatShivam I don't think there is a "memory column" provided anywhere. I simply looked at the individual file sizes (veb, vex) and summed their sizes together. Once this

Re: [PR] Add a Better Binary Quantizer format for dense vectors [lucene]

2024-11-12 Thread via GitHub

benwtrent commented on PR #13651: URL: https://github.com/apache/lucene/pull/13651#issuecomment-2470503074 Quick update, we have been bothered with some of the numbers (for example, models like "gist" perform poorly) and we have some improvements to get done first before flipping back to "r

Re: [PR] Adding filter to the toString() method of KnnFloatVectorQuery [lucene]

2024-11-12 Thread via GitHub

jpountz commented on PR #13990: URL: https://github.com/apache/lucene/pull/13990#issuecomment-2470768672 I believe that @benwtrent meant `KnnByteVectorQuery`. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL abo

Re: [PR] Add some basic HNSW graph checks to CheckIndex [lucene]

2024-11-12 Thread via GitHub

mikemccand commented on code in PR #13984: URL: https://github.com/apache/lucene/pull/13984#discussion_r1837944575 ## lucene/core/src/java/org/apache/lucene/index/CheckIndex.java: ## @@ -406,6 +411,21 @@ public static final class VectorValuesStatus { public Throwable erro

Re: [PR] Simplify codec setup in vector-related tests. [lucene]

2024-11-12 Thread via GitHub

jpountz merged PR #13970: URL: https://github.com/apache/lucene/pull/13970 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apa

Re: [PR] Improve checksum calculations [lucene]

2024-11-12 Thread via GitHub

rmuir commented on PR #13989: URL: https://github.com/apache/lucene/pull/13989#issuecomment-2470068010 This is actually slower, we only want to call `updateBytes(byte[])` or the checksum calculation is very slow (not vectorized). -- This is an automated message from the Apache Git Service

Re: [PR] Improve checksum calculations [lucene]

2024-11-12 Thread via GitHub

jpountz commented on PR #13989: URL: https://github.com/apache/lucene/pull/13989#issuecomment-2470814754 The change makes sense to me and looks like it could speed up loading live docs. > The benchmark shows the single-long approach performs better on small arrays. [...] It can be im

Re: [PR] Adding filter to the toString() method of KnnFloatVectorQuery [lucene]

2024-11-12 Thread via GitHub

viswanathk commented on code in PR #13990: URL: https://github.com/apache/lucene/pull/13990#discussion_r1838322211 ## lucene/join/src/java/org/apache/lucene/search/join/DiversifyingChildrenByteKnnVectorQuery.java: ## @@ -154,7 +154,14 @@ protected TopDocs approximateSearch(

Re: [PR] Adding filter to the toString() method of KnnFloatVectorQuery [lucene]

2024-11-12 Thread via GitHub

viswanathk commented on code in PR #13990: URL: https://github.com/apache/lucene/pull/13990#discussion_r1838356398 ## lucene/join/src/java/org/apache/lucene/search/join/DiversifyingChildrenByteKnnVectorQuery.java: ## @@ -154,7 +154,14 @@ protected TopDocs approximateSearch(

Re: [PR] Update lastDoc in ScoreCachingWrappingScorer [lucene]

2024-11-12 Thread via GitHub

mikemccand commented on code in PR #13987: URL: https://github.com/apache/lucene/pull/13987#discussion_r1837971840 ## lucene/core/src/test/org/apache/lucene/search/TestTopFieldCollector.java: ## @@ -359,7 +359,7 @@ public void testTotalHitsWithScore() throws Exception { l

Re: [I] Support for criteria based DWPT selection inside DocumentWriter [lucene]

2024-11-12 Thread via GitHub

RS146BIJAY commented on issue #13387: URL: https://github.com/apache/lucene/issues/13387#issuecomment-2470112408 On some more analysis figured out an approach which addresses all the above comments and obtain same improvement with different IndexWriter for different group as we got with usi

Re: [PR] Add a Better Binary Quantizer format for dense vectors [lucene]

2024-11-12 Thread via GitHub

ShashwatShivam commented on PR #13651: URL: https://github.com/apache/lucene/pull/13651#issuecomment-2470937060 I conducted a benchmark using Cohere's 768-dimensional data. Here are the steps I followed for reproducibility: 1. **Set up** the [luceneutil repository](https://github.com

Re: [I] Could Lucene's default Directory (`FSDirectory.open`) somehow preload `.vec` files? [lucene]

2024-11-12 Thread via GitHub

ChrisHegarty commented on issue #13551: URL: https://github.com/apache/lucene/issues/13551#issuecomment-2470361869 Late to the party!!! I want to dig a little further on the distinction between the implementation of preload and prefetch, at least with the mmap implementation. The for

Re: [PR] Improve checksum calculations [lucene]

2024-11-12 Thread via GitHub

jfboeuf commented on PR #13989: URL: https://github.com/apache/lucene/pull/13989#issuecomment-2470657083 Thank you for your feedback. Perhaps I misunderstood your point, but the implementation I propose only calls `Checksum.update(byte[])`. The change resides in how the buffer is fed to avo

Re: [PR] Adding filter to the toString() method of KnnFloatVectorQuery [lucene]

2024-11-12 Thread via GitHub

viswanathk commented on PR #13990: URL: https://github.com/apache/lucene/pull/13990#issuecomment-2470717378 > Could you update the byte knn query & DiversifyingChildern* knn queries as well? I made the changes fo DiversifyingChildren*, but by byteknn do you mean `ByteVectorSimilarity

Re: [PR] Adding filter to the toString() method of KnnFloatVectorQuery [lucene]

2024-11-12 Thread via GitHub

benwtrent commented on code in PR #13990: URL: https://github.com/apache/lucene/pull/13990#discussion_r1838141983 ## lucene/core/src/test/org/apache/lucene/search/TestKnnFloatVectorQuery.java: ## @@ -29,13 +29,7 @@ import org.apache.lucene.document.Field; import org.apache.luc

Re: [PR] Add some basic HNSW graph checks to CheckIndex [lucene]

2024-11-12 Thread via GitHub

mikemccand commented on PR #13984: URL: https://github.com/apache/lucene/pull/13984#issuecomment-2470306515 > this might also check for graph connectivity, see #12627 +1 -- this is a tricky thing about these HNSW graphs (that they are not necessarily born connected but rather must be

Re: [PR] Add some basic HNSW graph checks to CheckIndex [lucene]

2024-11-12 Thread via GitHub

mikemccand commented on PR #13984: URL: https://github.com/apache/lucene/pull/13984#issuecomment-2470310487 Actually, `CheckIndex` does have some coverage for vectors and KNN graph (it confirms it can enumerate all vectors, and also runs some searches on it if it's not just flat vectors (`c

Re: [PR] Update lastDoc in ScoreCachingWrappingScorer [lucene]

2024-11-12 Thread via GitHub

mikemccand commented on PR #13987: URL: https://github.com/apache/lucene/pull/13987#issuecomment-2470331708 Wow, good catch @msfroh. Could we maybe add a new test case that explicitly confirms that the wrapped `Scorable`'s `score` method is indeed only called once even if the outer user ca

Re: [PR] Introduces IndexInput#updateReadAdvice to change the readadvice while [lucene]

2024-11-12 Thread via GitHub

ChrisHegarty commented on PR #13985: URL: https://github.com/apache/lucene/pull/13985#issuecomment-2470311965 @shatejas I'm curious how much this actually helps, and I know that you said that benchmark results would be posted. I do like that we can update the ReadAdvice on an index in

[PR] Addresses the issue #13983 by adding filter to the toString() method of KnnFloatVectorQuery [lucene]

2024-11-12 Thread via GitHub

viswanathk opened a new pull request, #13990: URL: https://github.com/apache/lucene/pull/13990 (no comment) -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe,

42 matches

Mail list logo