Re: [PR] Introduces IndexInput#updateReadAdvice to change the readadvice while [lucene]

2024-11-10 Thread via GitHub


shatejas commented on code in PR #13985:
URL: https://github.com/apache/lucene/pull/13985#discussion_r1835999756


##
lucene/core/src/java/org/apache/lucene/codecs/KnnVectorsReader.java:
##
@@ -123,4 +123,11 @@ public abstract void search(
   public KnnVectorsReader getMergeInstance() {
     return this;
   }
+
+  /**
+   * Optional: reset or close merge resources used in the reader
+   *
+   * The default implementation is empty
+   */
+  public void finishMerge() throws IOException {}

Review Comment:
   > I wonder if we should reuse the close() method of merge instances for 
this, what do you think @uschindler ?
   
   I went with that solution at first, but as I understand it `close` is 
called way too late to benefit the search requests that are in flight. I was 
looking for a way to switch back to random access earlier, and that's how I 
landed on this approach. Happy to move it to `close()` if there is a 
preference.
   
   > Separately, it would be nice to improve the asserting framework 
(AssertingKnnVectorsReader in this case) to validate that finishMerge() is 
always called on merge instances.
   
   I wasn't aware of `AssertingKnnVectorsReader`. I can take a look.
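
To make the timing argument concrete, here is a minimal, self-contained sketch of the lifecycle being discussed. It does not use Lucene's classes: `Advice`, `FakeInput`, `SketchReader`, and `MergeLifecycleDemo` are hypothetical stand-ins, and only the method names `getMergeInstance()`, `finishMerge()`, and `updateReadAdvice()` come from the PR. It assumes the merge instance shares its underlying input with the reader that serves searches, so the access hint stays sequential until something resets it.

```java
import java.io.Closeable;
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a read-advice hint; not Lucene's ReadAdvice.
enum Advice { RANDOM, SEQUENTIAL }

// Hypothetical stand-in for an input whose access hint can change while open.
final class FakeInput {
  private Advice advice = Advice.RANDOM;

  void updateReadAdvice(Advice newAdvice) { advice = newAdvice; }

  Advice advice() { return advice; }
}

// Hypothetical reader: searches and merges share the same underlying input.
class SketchReader implements Closeable {
  protected final FakeInput input;

  SketchReader(FakeInput input) { this.input = input; }

  // Merging scans the file front to back, so hint sequential access.
  SketchReader getMergeInstance() {
    input.updateReadAdvice(Advice.SEQUENTIAL);
    return this;
  }

  // Meant to be called as soon as the merge has consumed this reader.
  void finishMerge() { input.updateReadAdvice(Advice.RANDOM); }

  @Override
  public void close() {}
}

final class MergeLifecycleDemo {
  // Records the advice in effect at each step of the lifecycle.
  static List<Advice> run() {
    FakeInput input = new FakeInput();
    List<Advice> seen = new ArrayList<>();
    try (SketchReader reader = new SketchReader(input)) {
      SketchReader merge = reader.getMergeInstance();
      seen.add(input.advice()); // during the merge
      merge.finishMerge();
      seen.add(input.advice()); // after the merge, before close()
    }
    seen.add(input.advice()); // after close()
    return seen;
  }

  public static void main(String[] args) {
    System.out.println(run());
  }
}
```

In this sketch the hint is back to `RANDOM` before `close()` runs, which is the property the `finishMerge()` hook is after. Relying on `close()` alone would leave it `SEQUENTIAL` until the reader is released.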



##
lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsReader.java:
##
@@ -113,6 +114,25 @@ public Lucene99HnswVectorsReader(SegmentReadState state, FlatVectorsReader flatV
     }
   }
 
+  private Lucene99HnswVectorsReader(
+      Lucene99HnswVectorsReader reader, KnnVectorsReader flatVectorsReader) {
+    assert flatVectorsReader instanceof FlatVectorsReader;

Review Comment:
   > I would rather make this ctor take a FlatVectorsReader and push the 
responsibility to callers to make the cast
   
   This makes sense to me; I will change it.
   
   > maybe we don't even need a cast if we make getMergeInstance() return a 
FlatVectorsReader
   
   This seems difficult as of now: 
[MergeState](https://github.com/shatejas/lucene/blob/12ca4779b962c96367f3e6a8b06523837e5e6434/lucene/core/src/java/org/apache/lucene/index/MergeState.java#L157)
 expects a `KnnVectorsReader`, and the `FlatVectorsReader` is wrapped inside 
that `KnnVectorsReader`.
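
For reference, Java does allow an override to narrow its return type, which is what the second suggestion relies on. The sketch below uses hypothetical stand-ins (`BaseReader`, `FlatReader`, `HnswReader`, `CovariantDemo`) rather than the real Lucene classes, and it says nothing about whether `MergeState` could accommodate the change.

```java
// Hypothetical stand-in for KnnVectorsReader.
abstract class BaseReader {
  BaseReader getMergeInstance() { return this; }
}

// Hypothetical stand-in for FlatVectorsReader.
class FlatReader extends BaseReader {
  // Covariant return type: the override narrows BaseReader to FlatReader.
  @Override
  FlatReader getMergeInstance() { return this; }

  String describe() { return "flat"; }
}

// Hypothetical stand-in for an HNSW reader that wraps a flat reader.
final class HnswReader extends BaseReader {
  private final FlatReader flat;

  // The constructor asks for the specific type, so it needs no assert or cast.
  HnswReader(FlatReader flat) { this.flat = flat; }

  @Override
  HnswReader getMergeInstance() {
    // No cast: FlatReader#getMergeInstance() already returns a FlatReader.
    return new HnswReader(flat.getMergeInstance());
  }

  String describe() { return "hnsw over " + flat.describe(); }
}

final class CovariantDemo {
  public static void main(String[] args) {
    HnswReader merged = new HnswReader(new FlatReader()).getMergeInstance();
    System.out.println(merged.describe());
  }
}
```

Callers that only hold a `BaseReader` reference still see a `BaseReader` return type, so existing call sites would keep compiling in this simplified model.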



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org



Re: [PR] Introduces IndexInput#updateReadAdvice to change the readadvice while [lucene]

2024-11-10 Thread via GitHub


shatejas commented on code in PR #13985:
URL: https://github.com/apache/lucene/pull/13985#discussion_r1835999858


##
lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99HnswVectorsReader.java:
##
@@ -113,6 +114,25 @@ public Lucene99HnswVectorsReader(SegmentReadState state, FlatVectorsReader flatV
     }
   }
 
+  private Lucene99HnswVectorsReader(
+      Lucene99HnswVectorsReader reader, KnnVectorsReader flatVectorsReader) {
+    assert flatVectorsReader instanceof FlatVectorsReader;

Review Comment:
   > I would rather make this ctor take a FlatVectorsReader and push the 
responsibility to callers to make the cast
   
   This makes sense to me; I will change it.
   
   > maybe we don't even need a cast if we make getMergeInstance() return a 
FlatVectorsReader
   
   This seems difficult as of now: 
[MergeState](https://github.com/shatejas/lucene/blob/12ca4779b962c96367f3e6a8b06523837e5e6434/lucene/core/src/java/org/apache/lucene/index/MergeState.java#L157)
 expects a `KnnVectorsReader`, and the `FlatVectorsReader` is wrapped inside 
that `KnnVectorsReader`.



Re: [PR] Introduces IndexInput#updateReadAdvice to change the readadvice while [lucene]

2024-11-10 Thread via GitHub


shatejas commented on code in PR #13985:
URL: https://github.com/apache/lucene/pull/13985#discussion_r1836000686


##
lucene/core/src/java/org/apache/lucene/codecs/lucene99/Lucene99FlatVectorsReader.java:
##
@@ -327,4 +336,61 @@ static FieldEntry create(IndexInput input, FieldInfo info) throws IOException {
           info);
     }
   }
+
+  private static final class MergeLucene99FlatVectorsReader extends FlatVectorsReader {
+
+    private final Lucene99FlatVectorsReader delegate;
+
+    MergeLucene99FlatVectorsReader(final Lucene99FlatVectorsReader flatVectorsReader)

Review Comment:
   This seemed a bit cleaner. I'm open to the suggested approach if there is a 
strong preference.



Re: [PR] Introduces IndexInput#updateReadAdvice to change the readadvice while [lucene]

2024-11-10 Thread via GitHub


shatejas commented on code in PR #13985:
URL: https://github.com/apache/lucene/pull/13985#discussion_r1836001654


##
lucene/test-framework/src/java/org/apache/lucene/tests/store/BaseDirectoryTestCase.java:
##
@@ -1554,6 +1555,42 @@ public void testPrefetchOnSlice() throws IOException {
     doTestPrefetch(TestUtil.nextInt(random(), 1, 1024));
   }
 
+  public void testUpdateReadAdvice() throws IOException {
+    try (Directory dir = getDirectory(createTempDir("testUpdateReadAdvice"))) {
+      final int totalLength = TestUtil.nextInt(random(), 16384, 65536);
+      byte[] arr = new byte[totalLength];
+      random().nextBytes(arr);
+      try (IndexOutput out = dir.createOutput("temp.bin", IOContext.DEFAULT)) {
+        out.writeBytes(arr, arr.length);
+      }
+
+      try (IndexInput orig = dir.openInput("temp.bin", IOContext.DEFAULT)) {
+        IndexInput in = random().nextBoolean() ? orig.clone() : orig;
+        // Read advice updated at start
+        orig.updateReadAdvice(randomReadAdvice());
+        for (int i = 0; i < totalLength; i++) {
+          int offset = TestUtil.nextInt(random(), 0, (int) in.length() - 1);
+          in.seek(offset);
+          assertEquals(arr[offset], in.readByte());
+        }
+
+        // Updating readAdvice in the middle
+        for (int i = 0; i < 10_000; ++i) {
+          int offset = TestUtil.nextInt(random(), 0, (int) in.length() - 1);
+          in.seek(offset);
+          assertEquals(arr[offset], in.readByte());
+          if (random().nextBoolean()) {
+            orig.updateReadAdvice(randomReadAdvice());
+          }
+        }
+      }
+    }
+  }
+
+  private ReadAdvice randomReadAdvice() {
+    return ReadAdvice.values()[TestUtil.nextInt(random(), 0, ReadAdvice.values().length - 1)];

Review Comment:
   Will switch



Re: [PR] Add a Better Binary Quantizer format for dense vectors [lucene]

2024-11-10 Thread via GitHub


ShashwatShivam commented on PR #13651:
URL: https://github.com/apache/lucene/pull/13651#issuecomment-2466635037

   Hey @benwtrent,
   Thank you for all your help so far! I have a question about the oversampling 
used to increase recall. From what I understand, it scales up the top-k and 
fanout values by the oversampling factor. In the final match set, do we return 
only the best top-k documents (not scaled up, but the original value)? I 
couldn't locate the code where the reranking or selection of the best k results 
from the expanded match set happens. Could you please help me find that part?
   Thanks again!
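
The pattern being asked about can be sketched in isolation. The code below is not taken from the PR and makes no claim about where (or whether) Lucene performs this step; `Candidate`, `OversampleDemo`, and `topKWithOversampling` are hypothetical names. It assumes each candidate carries a cheap approximate score and a more expensive exact score, gathers `ceil(k * oversample)` candidates by approximate score, rescores them exactly, and returns only the original `k`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical candidate: a doc id with an approximate (quantized) and an exact score.
final class Candidate {
  final int doc;
  final float approxScore;
  final float exactScore;

  Candidate(int doc, float approxScore, float exactScore) {
    this.doc = doc;
    this.approxScore = approxScore;
    this.exactScore = exactScore;
  }
}

final class OversampleDemo {
  // Returns the k best doc ids by exact score among the top
  // ceil(k * oversample) candidates by approximate score.
  static List<Integer> topKWithOversampling(List<Candidate> all, int k, float oversample) {
    int expandedK = Math.min(all.size(), (int) Math.ceil(k * oversample));

    // Step 1: approximate search with the scaled-up k.
    List<Candidate> byApprox = new ArrayList<>(all);
    byApprox.sort((a, b) -> Float.compare(b.approxScore, a.approxScore));
    List<Candidate> expanded = new ArrayList<>(byApprox.subList(0, expandedK));

    // Step 2: rerank the expanded set by exact score.
    expanded.sort((a, b) -> Float.compare(b.exactScore, a.exactScore));

    // Step 3: return only the original k.
    List<Integer> result = new ArrayList<>();
    for (Candidate c : expanded.subList(0, Math.min(k, expanded.size()))) {
      result.add(c.doc);
    }
    return result;
  }

  public static void main(String[] args) {
    List<Candidate> all = Arrays.asList(
        new Candidate(1, 0.90f, 0.70f),
        new Candidate(2, 0.85f, 0.95f),
        new Candidate(3, 0.80f, 0.90f),
        new Candidate(4, 0.10f, 0.99f));
    System.out.println(topKWithOversampling(all, 2, 1.5f));
  }
}
```

With `k = 2` and an oversampling factor of `1.5`, three candidates are gathered and two are returned. Doc 4 has the best exact score but is never rescored, because its approximate score keeps it out of the expanded set, which is the recall gap that oversampling only narrows.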


Re: [PR] Allow reading binary doc values as a RandomAccessInput [lucene]

2024-11-10 Thread via GitHub


github-actions[bot] commented on PR #13948:
URL: https://github.com/apache/lucene/pull/13948#issuecomment-2467011723

   This PR has not had activity in the past 2 weeks, labeling it as stale. If 
the PR is waiting for review, notify the d...@lucene.apache.org list. Thank you 
for your contribution!

