jpountz commented on code in PR #12357:
URL: https://github.com/apache/lucene/pull/12357#discussion_r1224098583
##########
lucene/CHANGES.txt:
##########
@@ -177,6 +177,9 @@ Optimizations
* GITHUB#12328: Optimize ConjunctionDISI.createConjunction (Armin Braun)
+* GITHUB#12357: Better paging when doing backwards random reads. This speeds
up
+ queries relying on terms in NIOFSDirectory. (Alan Woodward)
Review Comment:
```suggestion
queries relying on terms in NIOFSDirectory and SimpleFSDirectory. (Alan
Woodward)
```
##########
lucene/core/src/java/org/apache/lucene/store/BufferedIndexInput.java:
##########
@@ -159,55 +159,53 @@ public final long readLong() throws IOException {
}
}
- @Override
- public final byte readByte(long pos) throws IOException {
+ // Computes an offset into the current buffer from an absolute position to
read
+ // `width` bytes from. If the buffer does not contain the position, then we
+ // readjust the bufferStart and refill.
+ private long resolvePositionInBuffer(long pos, int width) throws IOException
{
long index = pos - bufferStart;
- if (index < 0 || index >= buffer.limit()) {
+ if (index >= 0 && index < buffer.limit() - width + 1) {
+ return index;
+ }
+ if (index < 0) {
+ // if we're moving backwards, then try and fill up the previous page
rather than
+ // starting again at the current pos, to avoid successive backwards
reads reloading
+ // the same data over and over again
+ bufferStart = Math.min(pos, Math.max(0, bufferStart - bufferSize));
+ if (bufferStart + bufferSize - pos < width) {
+ bufferStart += width - 1; // make sure we don't read over the end of
the buffer
Review Comment:
Nit: should we update bufferStart in that case so that `pos + width` ends
right at the end of the buffer instead of adding `width-1` to `bufferStart`
regardless of the number of bytes that we were missing in the buffer?
```suggestion
bufferStart = pos + width - bufferSize; // make sure we don't read
over the end of the buffer
```
Maybe this could be folded directly into the min/max computations? E.g.
something like (not tested)
```java
bufferStart = bufferStart - bufferSize;
bufferStart = Math.max(bufferStart, pos + width - bufferSize); // make sure
we don't read over the end of the buffer
bufferStart = Math.max(bufferStart, 0);
bufferStart = Math.min(bufferStart, pos);
```
##########
lucene/core/src/java/org/apache/lucene/store/BufferedIndexInput.java:
##########
@@ -159,55 +159,53 @@ public final long readLong() throws IOException {
}
}
- @Override
- public final byte readByte(long pos) throws IOException {
+ // Computes an offset into the current buffer from an absolute position to
read
+ // `width` bytes from. If the buffer does not contain the position, then we
+ // readjust the bufferStart and refill.
+ private long resolvePositionInBuffer(long pos, int width) throws IOException
{
long index = pos - bufferStart;
- if (index < 0 || index >= buffer.limit()) {
+ if (index >= 0 && index < buffer.limit() - width + 1) {
Review Comment:
nit: let's save an addition?
```suggestion
if (index >= 0 && index <= buffer.limit() - width) {
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]