Re: [PR] Introduces IndexInput#updateReadAdvice to change the readadvice while [lucene]
shatejas commented on code in PR #13985:
URL: https://github.com/apache/lucene/pull/13985#discussion_r1835999756

## lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsReader.java: ##
@@ -123,4 +123,11 @@ public abstract void search(
   public KnnVectorsReader getMergeInstance() {
     return this;
   }
+
+  /**
+   * Optional: reset or close merge resources used in the reader
+   *
+   * The default implementation is empty
+   */
+  public void finishMerge() throws IOException {}

Review Comment:
> I wonder if we should reuse the close() method of merge instances for this, what do you think @uschindler ?

I went with that solution at first, but as I understand it, `close` is called way too late to benefit the search requests that are in progress. I was looking for a way to switch back to random access earlier, and that's how I landed on this approach. Happy to move it to `close()` if there is a preference.

> Separately, it would be nice to improve the asserting framework (AssertingKnnVectorsReader in this case) to validate that finishMerge() is always called on merge instances.

I wasn't aware of AssertingKnnVectorsReader. I can take a look.

## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsReader.java: ##
@@ -113,6 +114,25 @@ public Lucene99HnswVectorsReader(SegmentReadState state, FlatVectorsReader flatV
     }
   }

+  private Lucene99HnswVectorsReader(
+      Lucene99HnswVectorsReader reader, KnnVectorsReader flatVectorsReader) {
+    assert flatVectorsReader instanceof FlatVectorsReader;

Review Comment:
> I would rather make this ctor take a FlatVectorsReader and push the responsibility to callers to make the cast

This makes sense to me; I will change it.

> maybe we don't even need a cast if we make getMergeInstance() return a FlatVectorsReader

This seems difficult as of now: [MergeState](https://github.com/shatejas/lucene/blob/12ca4779b962c96367f3e6a8b06523837e5e6434/lucene/core/src/java/org/apache/lucene/index/MergeState.java#L157) expects a KnnVectorsReader, and the FlatVectorsReader is wrapped inside of a KnnVectorsReader.

-- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org - To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org
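The trade-off between a dedicated `finishMerge()` hook and `close()` can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyReader`, `Advice`, and `merge` are hypothetical names, not Lucene classes, and the sketch assumes the merge code would call the hook in a `finally` block so the advice is restored as soon as the merge ends, while the reader stays open for searches.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.List;

/** Toy model of the lifecycle under discussion; none of these types are Lucene classes. */
final class MergeLifecycleSketch {

  /** Hypothetical stand-in for a read-advice hint. */
  enum Advice { RANDOM, SEQUENTIAL }

  /** Hypothetical reader whose merge instance shares its underlying input with searches. */
  static final class ToyReader implements Closeable {
    Advice advice = Advice.RANDOM;
    boolean closed = false;
    final List<String> events = new ArrayList<>();

    /** Switches to sequential advice for the duration of the merge. */
    ToyReader getMergeInstance() {
      advice = Advice.SEQUENTIAL;
      events.add("merge-start:" + advice);
      return this;
    }

    /** Restores random advice without closing anything. */
    void finishMerge() {
      advice = Advice.RANDOM;
      events.add("merge-finish:" + advice);
    }

    @Override
    public void close() {
      closed = true;
      events.add("close");
    }
  }

  /** Runs a pretend merge and always restores the advice, even if the merge throws. */
  static void merge(ToyReader reader) {
    ToyReader mergeInstance = reader.getMergeInstance();
    try {
      // A real merge would copy vectors sequentially here.
    } finally {
      mergeInstance.finishMerge();
    }
  }
}
```

After `merge` returns, the toy reader is back on `RANDOM` advice and is still open, which is the property the reply is after: searches benefit before `close()` is ever reached.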
Re: [PR] Introduces IndexInput#updateReadAdvice to change the readadvice while [lucene]
shatejas commented on code in PR #13985:
URL: https://github.com/apache/lucene/pull/13985#discussion_r1835999858

## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsReader.java: ##
@@ -113,6 +114,25 @@ public Lucene99HnswVectorsReader(SegmentReadState state, FlatVectorsReader flatV
     }
   }

+  private Lucene99HnswVectorsReader(
+      Lucene99HnswVectorsReader reader, KnnVectorsReader flatVectorsReader) {
+    assert flatVectorsReader instanceof FlatVectorsReader;

Review Comment:
> I would rather make this ctor take a FlatVectorsReader and push the responsibility to callers to make the cast

This makes sense to me; I will change it.

> maybe we don't even need a cast if we make getMergeInstance() return a FlatVectorsReader

This seems difficult as of now: [MergeState](https://github.com/shatejas/lucene/blob/12ca4779b962c96367f3e6a8b06523837e5e6434/lucene/core/src/java/org/apache/lucene/index/MergeState.java#L157) expects a KnnVectorsReader, and the FlatVectorsReader is wrapped inside of a KnnVectorsReader.
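The first suggestion in this comment (a constructor that takes the narrow type, with the cast pushed to the caller) can be sketched in isolation. All names here are hypothetical stand-ins: `BaseReader` plays the role of KnnVectorsReader, `FlatReader` the role of FlatVectorsReader, and `HnswReader` the role of the HNSW reader. The sketch assumes the merge instance of a flat reader is itself a flat reader and fails loudly if it is not.

```java
/** Sketch of "typed constructor, cast at the call site"; not Lucene code. */
final class TypedCtorSketch {

  /** Hypothetical broad type, standing in for KnnVectorsReader. */
  static class BaseReader {
    BaseReader getMergeInstance() {
      return this;
    }
  }

  /** Hypothetical narrow type, standing in for FlatVectorsReader. */
  static class FlatReader extends BaseReader {}

  /** Hypothetical composite reader that wraps a flat reader. */
  static final class HnswReader extends BaseReader {
    private final FlatReader flat;

    /** The constructor only accepts the narrow type, so no assert is needed inside it. */
    HnswReader(FlatReader flat) {
      this.flat = flat;
    }

    @Override
    HnswReader getMergeInstance() {
      // The caller owns the cast and checks it explicitly instead of relying on an assert.
      BaseReader merged = flat.getMergeInstance();
      if (!(merged instanceof FlatReader)) {
        throw new IllegalStateException("expected a FlatReader but got " + merged.getClass());
      }
      return new HnswReader((FlatReader) merged);
    }

    FlatReader flat() {
      return flat;
    }
  }
}
```

Compared with an `assert` inside the constructor, the explicit check also runs when assertions are disabled. Whether that is preferable in the actual PR is a judgment for the reviewers.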
Re: [PR] Introduces IndexInput#updateReadAdvice to change the readadvice while [lucene]
shatejas commented on code in PR #13985:
URL: https://github.com/apache/lucene/pull/13985#discussion_r1836000686

## lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99FlatVectorsReader.java: ##
@@ -327,4 +336,61 @@ static FieldEntry create(IndexInput input, FieldInfo info) throws IOException {
           info);
     }
   }
+
+  private static final class MergeLucene99FlatVectorsReader extends FlatVectorsReader {
+
+    private final Lucene99FlatVectorsReader delegate;
+
+    MergeLucene99FlatVectorsReader(final Lucene99FlatVectorsReader flatVectorsReader)

Review Comment:
This seemed a bit cleaner. I'm open to the other approach if there is a strong preference.
Re: [PR] Introduces IndexInput#updateReadAdvice to change the readadvice while [lucene]
shatejas commented on code in PR #13985:
URL: https://github.com/apache/lucene/pull/13985#discussion_r1836001654

## lucene/test-framework/src/java/org/apache/lucene/tests/store/BaseDirectoryTestCase.java: ##
@@ -1554,6 +1555,42 @@ public void testPrefetchOnSlice() throws IOException {
     doTestPrefetch(TestUtil.nextInt(random(), 1, 1024));
   }

+  public void testUpdateReadAdvice() throws IOException {
+    try (Directory dir = getDirectory(createTempDir("testUpdateReadAdvice"))) {
+      final int totalLength = TestUtil.nextInt(random(), 16384, 65536);
+      byte[] arr = new byte[totalLength];
+      random().nextBytes(arr);
+      try (IndexOutput out = dir.createOutput("temp.bin", IOContext.DEFAULT)) {
+        out.writeBytes(arr, arr.length);
+      }
+
+      try (IndexInput orig = dir.openInput("temp.bin", IOContext.DEFAULT)) {
+        IndexInput in = random().nextBoolean() ? orig.clone() : orig;
+        // Read advice updated at start
+        orig.updateReadAdvice(randomReadAdvice());
+        for (int i = 0; i < totalLength; i++) {
+          int offset = TestUtil.nextInt(random(), 0, (int) in.length() - 1);
+          in.seek(offset);
+          assertEquals(arr[offset], in.readByte());
+        }
+
+        // Updating readAdvice in the middle
+        for (int i = 0; i < 10_000; ++i) {
+          int offset = TestUtil.nextInt(random(), 0, (int) in.length() - 1);
+          in.seek(offset);
+          assertEquals(arr[offset], in.readByte());
+          if (random().nextBoolean()) {
+            orig.updateReadAdvice(randomReadAdvice());
+          }
+        }
+      }
+    }
+  }
+
+  private ReadAdvice randomReadAdvice() {
+    return ReadAdvice.values()[TestUtil.nextInt(random(), 0, ReadAdvice.values().length - 1)];

Review Comment:
Will switch
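The thread does not show which alternative the reviewer proposed for `randomReadAdvice()`, so the following is only a generic sketch of picking a uniformly random enum constant in plain Java. `Advice` is a hypothetical stand-in for the `ReadAdvice` enum with illustrative constants, and `randomConstant` is a hypothetical helper, not part of Lucene's test framework.

```java
import java.util.Random;

/** Generic "random enum constant" helper; not Lucene test-framework code. */
final class RandomEnumSketch {

  /** Hypothetical stand-in for ReadAdvice; the constants are illustrative only. */
  enum Advice { NORMAL, RANDOM, SEQUENTIAL }

  /**
   * Returns a uniformly chosen constant of the given enum type. Random#nextInt(bound) uses an
   * exclusive upper bound, so the "length - 1" adjustment needed with an inclusive-bound helper
   * is not required here.
   */
  static <E extends Enum<E>> E randomConstant(Random random, Class<E> type) {
    E[] constants = type.getEnumConstants();
    return constants[random.nextInt(constants.length)];
  }
}
```

A call such as `RandomEnumSketch.randomConstant(new Random(), RandomEnumSketch.Advice.class)` covers every constant, including any added to the enum later, without touching the helper.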
Re: [PR] Add a Better Binary Quantizer format for dense vectors [lucene]
ShashwatShivam commented on PR #13651: URL: https://github.com/apache/lucene/pull/13651#issuecomment-2466635037 Hey @benwtrent, Thank you for all your help so far! I have a question about the oversampling used to increase recall. From what I understand, it scales up the top-k and fanout values by the oversampling factor. In the final match set, do we return only the best top-k documents (not scaled up, but the original value)? I couldn't locate the code where the reranking or selection of the best k results from the expanded match set happens. Could you please help me find that part? Thanks again!
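For readers unfamiliar with the pattern the question describes, here is a generic sketch of oversample-then-rerank. It does not answer where (or whether) the PR performs the final selection; that remains the open question above. `Candidate`, `search`, and the scores are hypothetical, and the sketch assumes a cheap approximate score (for example from quantized vectors) and a more expensive exact score per document.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Generic oversample-then-rerank sketch; not taken from the Lucene PR. */
final class OversampleRerankSketch {

  /** A document with a cheap approximate score and an expensive exact score. */
  static final class Candidate {
    final int doc;
    final float approxScore;
    final float exactScore;

    Candidate(int doc, float approxScore, float exactScore) {
      this.doc = doc;
      this.approxScore = approxScore;
      this.exactScore = exactScore;
    }
  }

  /** Returns the ids of the best k documents after reranking an expanded candidate set. */
  static List<Integer> search(List<Candidate> all, int k, float oversample) {
    // Scale k by the oversampling factor, never going below the requested k.
    int expandedK = Math.max(k, (int) Math.ceil(k * oversample));

    // Phase 1: keep the expandedK best candidates by approximate score.
    List<Candidate> byApprox = new ArrayList<>(all);
    byApprox.sort(Comparator.comparingDouble((Candidate c) -> c.approxScore).reversed());
    List<Candidate> expanded =
        new ArrayList<>(byApprox.subList(0, Math.min(expandedK, byApprox.size())));

    // Phase 2: rerank the expanded set by exact score and return only the original k.
    expanded.sort(Comparator.comparingDouble((Candidate c) -> c.exactScore).reversed());
    List<Integer> result = new ArrayList<>();
    for (Candidate c : expanded.subList(0, Math.min(k, expanded.size()))) {
      result.add(c.doc);
    }
    return result;
  }
}
```

The design point is that the expanded set only widens the net: the caller still receives k results, and a document whose approximate score is too low to enter the expanded set cannot be recovered by reranking.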
Re: [PR] Allow reading binary doc values as a RandomAccessInput [lucene]
github-actions[bot] commented on PR #13948: URL: https://github.com/apache/lucene/pull/13948#issuecomment-2467011723 This PR has not had activity in the past 2 weeks, labeling it as stale. If the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you for your contribution!