jpountz commented on code in PR #12357:
URL: https://github.com/apache/lucene/pull/12357#discussion_r1224098583


##########
lucene/CHANGES.txt:
##########
@@ -177,6 +177,9 @@ Optimizations
 
 * GITHUB#12328: Optimize ConjunctionDISI.createConjunction (Armin Braun)
 
+* GITHUB#12357: Better paging when doing backwards random reads.  This speeds 
up
+  queries relying on terms in NIOFSDirectory. (Alan Woodward)

Review Comment:
   ```suggestion
     queries relying on terms in NIOFSDirectory and SimpleFSDirectory. (Alan 
Woodward)
   ```



##########
lucene/core/src/java/org/apache/lucene/store/BufferedIndexInput.java:
##########
@@ -159,55 +159,53 @@ public final long readLong() throws IOException {
     }
   }
 
-  @Override
-  public final byte readByte(long pos) throws IOException {
+  // Computes an offset into the current buffer from an absolute position to 
read
+  // `width` bytes from.  If the buffer does not contain the position, then we
+  // readjust the bufferStart and refill.
+  private long resolvePositionInBuffer(long pos, int width) throws IOException 
{
     long index = pos - bufferStart;
-    if (index < 0 || index >= buffer.limit()) {
+    if (index >= 0 && index < buffer.limit() - width + 1) {
+      return index;
+    }
+    if (index < 0) {
+      // if we're moving backwards, then try and fill up the previous page 
rather than
+      // starting again at the current pos, to avoid successive backwards 
reads reloading
+      // the same data over and over again
+      bufferStart = Math.min(pos, Math.max(0, bufferStart - bufferSize));
+      if (bufferStart + bufferSize - pos < width) {
+        bufferStart += width - 1; // make sure we don't read over the end of 
the buffer

Review Comment:
   Nit: should we update bufferStart in that case so that `pos + width` ends 
right at the end of the buffer instead of adding `width-1` to `bufferStart` 
regardless of the number of bytes that we were missing in the buffer?
   
   ```suggestion
           bufferStart = pos + width - bufferSize; // make sure we don't read 
over the end of the buffer
   ```
   
   Maybe this could be folded directly into the min/max computations? E.g. 
something like (not tested)
   
   ```java
   bufferStart = bufferStart - bufferSize;
   bufferStart = Math.max(bufferStart, pos + width - bufferSize); // make sure 
we don't read over the end of the buffer
   bufferStart = Math.max(bufferStart, 0);
   bufferStart = Math.min(bufferStart, pos);
   ```



##########
lucene/core/src/java/org/apache/lucene/store/BufferedIndexInput.java:
##########
@@ -159,55 +159,53 @@ public final long readLong() throws IOException {
     }
   }
 
-  @Override
-  public final byte readByte(long pos) throws IOException {
+  // Computes an offset into the current buffer from an absolute position to 
read
+  // `width` bytes from.  If the buffer does not contain the position, then we
+  // readjust the bufferStart and refill.
+  private long resolvePositionInBuffer(long pos, int width) throws IOException 
{
     long index = pos - bufferStart;
-    if (index < 0 || index >= buffer.limit()) {
+    if (index >= 0 && index < buffer.limit() - width + 1) {

Review Comment:
   nit: let's save an addition?
   
   ```suggestion
       if (index >= 0 && index <= buffer.limit() - width) {
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to