Copilot commented on code in PR #13253:
URL: https://github.com/apache/lucene/pull/13253#discussion_r3122724722
##########
lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnumFrame.java:
##########
@@ -302,6 +302,68 @@ void rewind() {
*/
}
+ // Only rewind, don't force reload block.
+ // Reset readers' position, don't read, decompress.
+ // Current term greater than target, reduce endCount.
+ void rewindWithoutReload() {
+ // Set nextEnt to 0, to prevent force load.
+ nextEnt = 0;
+ suffixesReader.setPosition(0);
+ suffixLengthsReader.setPosition(0);
+ statsReader.setPosition(0);
+ bytesReader.setPosition(0);
+
+ // TODO: Since we only rewind without reload for first
floor(currentFrame.fp ==
+ // currentFrame.fpOrig)
+ // So no need to set floorDataReader again?
+ // if (isFloor) {
+ // floorDataReader.setPosition(rewindPos);
+ // numFollowFloorBlocks = floorDataReader.readVInt();
+ // assert numFollowFloorBlocks > 0;
+ // nextFloorLabel = floorDataReader.readByte() & 0xff;
+ // }
+
+ metaDataUpto = 0;
+
+ statsSingletonRunLength = 0;
+ state.termBlockOrd = 0;
+ lastSubFP = -1;
+ // state.termBlockOrd = 0;
+ /*
+ //System.out.println("rewind");
+ // Keeps the block loaded, but rewinds its state:
+ if (nextEnt > 0 || fp != fpOrig) {
+ if (DEBUG) {
+ System.out.println(" rewind frame ord=" + ord + " fpOrig=" + fpOrig +
" fp=" + fp + " hasTerms?=" + hasTerms + " isFloor?=" + isFloor + " nextEnt=" +
nextEnt + " prefixLen=" + prefix);
+ }
+ if (fp != fpOrig) {
+ fp = fpOrig;
+ nextEnt = -1;
+ } else {
+ nextEnt = 0;
+ }
+ hasTerms = hasTermsOrig;
+ if (isFloor) {
+ floorDataReader.rewind();
+ numFollowFloorBlocks = floorDataReader.readVInt();
+ nextFloorLabel = floorDataReader.readByte() & 0xff;
+ }
+ assert suffixBytes != null;
+ suffixesReader.rewind();
+ assert statBytes != null;
+ statsReader.rewind();
+ metaDataUpto = 0;
+ state.termBlockOrd = 0;
+ // TODO: skip this if !hasTerms? Then postings
+ // impl wouldn't have to write useless 0 byte
+ postingsReader.resetTermsBlock(fieldInfo, state);
+ lastSubFP = -1;
+ } else if (DEBUG) {
+ System.out.println(" skip rewind fp=" + fp + " fpOrig=" + fpOrig + "
nextEnt=" + nextEnt + " ord=" + ord);
+ }
+ */
Review Comment:
`rewindWithoutReload()` contains a large commented-out copy of the old
`rewind()` implementation plus TODO notes. This makes the method harder to
audit/maintain and risks diverging from the real logic. Prefer deleting the
dead commented block and keeping only the minimal explanation/TODOs that are
still actionable.
```suggestion
postingsReader.resetTermsBlock(fieldInfo, state);
lastSubFP = -1;
```
##########
lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnumFrame.java:
##########
@@ -302,6 +302,68 @@ void rewind() {
*/
}
+ // Only rewind, don't force reload block.
+ // Reset readers' position, don't read, decompress.
+ // Current term greater than target, reduce endCount.
Review Comment:
The comment says "reduce endCount" but the field is `entCount`. Please fix
the typo to avoid confusion (and consider clarifying under what invariants this
method is safe to call).
```suggestion
// Only rewind; do not force reloading the block.
// Reset readers' positions without reading or decompressing again.
// Safe to call only when the current loaded block remains valid for this
frame
// (for example, rewinding within the same block, such as the first floor
block
// where fp == fpOrig), so we only need to reduce entCount.
```
##########
lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnum.java:
##########
@@ -568,10 +602,25 @@ private IOBooleanSupplier prepareSeekExact(BytesRef
target, boolean prefetch) th
currentFrame.prefetchBlock();
}
+ long finalWithOutReloadFp1 = withOutReloadFp;
+ int finalOrigNextEnt1 = origNextEnt;
return () -> {
currentFrame.loadBlock();
+ // If we still stay on withOutReload frame, reduce entCount to nextEnt.
+ int origEntCount = -1;
+ if (currentFrame.fp == finalWithOutReloadFp1 && finalOrigNextEnt1 != 0) {
+ origEntCount = currentFrame.entCount;
+ currentFrame.entCount = finalOrigNextEnt1;
+ }
+
final SeekStatus result = currentFrame.scanToTerm(target, true);
+
+ // Restore entCount to origEntCount.
+ if (origEntCount != -1) {
+ currentFrame.entCount = origEntCount;
Review Comment:
Same issue as above: `entCount` is temporarily narrowed and then restored,
but the restore is not in a finally block. If `scanToTerm(...)` throws,
`entCount` can remain truncated and break later seeks/next(). Use try/finally
to guarantee restoration.
```suggestion
final SeekStatus result;
try {
result = currentFrame.scanToTerm(target, true);
} finally {
// Restore entCount to origEntCount.
if (origEntCount != -1) {
currentFrame.entCount = origEntCount;
}
```
##########
lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnum.java:
##########
@@ -779,8 +846,21 @@ public SeekStatus seekCeil(BytesRef target) throws
IOException {
currentFrame.loadBlock();
+ // If we still stay on withOutReload frame, reduce entCount to nextEnt.
+ int origEntCount = -1;
+ if (currentFrame.fp == withOutReloadFp && origNextEnt != 0) {
+ origEntCount = currentFrame.entCount;
+ currentFrame.entCount = origNextEnt;
+ }
+
// if (DEBUG) System.out.println(" now scanToTerm");
final SeekStatus result = currentFrame.scanToTerm(target, false);
+
+ // Restore entCount to origEntCount.
+ if (origEntCount != -1) {
+ currentFrame.entCount = origEntCount;
Review Comment:
In `seekCeil`, `entCount` is temporarily changed and restored only after
`scanToTerm(...)` returns. Because `scanToTerm(...)` can throw `IOException`,
the restore needs to be in a finally block to avoid leaving `entCount` in an
inconsistent state (also applies to the later identical block in this method).
```suggestion
final SeekStatus result;
try {
// if (DEBUG) System.out.println(" now scanToTerm");
result = currentFrame.scanToTerm(target, false);
} finally {
// Restore entCount to origEntCount.
if (origEntCount != -1) {
currentFrame.entCount = origEntCount;
}
```
##########
lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnum.java:
##########
@@ -836,8 +916,20 @@ public SeekStatus seekCeil(BytesRef target) throws
IOException {
currentFrame.loadBlock();
+ // If we still stay on withOutReload frame, reduce entCount to nextEnt.
+ int origEntCount = -1;
+ if (currentFrame.fp == withOutReloadFp && origNextEnt != 0) {
+ origEntCount = currentFrame.entCount;
+ currentFrame.entCount = origNextEnt;
+ }
+
final SeekStatus result = currentFrame.scanToTerm(target, false);
+ // Restore entCount to origEntCount.
+ if (origEntCount != -1) {
+ currentFrame.entCount = origEntCount;
Review Comment:
Here too, `entCount` is temporarily reduced and then restored, but the
restoration is not protected by a finally block. If `scanToTerm(...)` throws
`IOException`, `entCount` can remain modified and corrupt subsequent
operations. Use try/finally around the `scanToTerm` call to guarantee
restoration.
```suggestion
final SeekStatus result;
try {
result = currentFrame.scanToTerm(target, false);
} finally {
// Restore entCount to origEntCount.
if (origEntCount != -1) {
currentFrame.entCount = origEntCount;
}
```
##########
lucene/backward-codecs/src/test/org/apache/lucene/backward_codecs/lucene912/TestLucene912PostingsFormat.java:
##########
@@ -154,4 +160,113 @@ private void doTestImpactSerialization(List<Impact>
impacts) throws IOException
}
}
}
+
+ public void testBackwardSeek() throws Exception {
+ Directory dir = newDirectory();
+ // Set minTermBlockSize to 2, maxTermBlockSize to 3, to generate deep
subBlock.
+ PostingsFormat postingsFormat = getDefaultPostingsFormat(2, 3);
+
+ IndexWriter writer =
+ new IndexWriter(dir,
newIndexWriterConfig().setCodec(alwaysPostingsFormat(postingsFormat)));
+ String[] categories =
+ new String[] {
+ "regular", "request1", "request2", "request3", "request4", "rest",
"teacher", "team"
+ };
+
+ for (String category : categories) {
+ Document doc = new Document();
+ doc.add(newStringField("category", category, Field.Store.YES));
+ writer.addDocument(doc);
+ }
+
+ IndexReader reader = DirectoryReader.open(writer);
+
+ TermsEnum termsEnum =
getOnlyLeafReader(reader).terms("category").iterator();
+
+ // Test seekExact.
+ BytesRef target = new BytesRef("request2");
+ assertTrue(termsEnum.seekExact(target));
+ assertEquals(termsEnum.term(), target);
+
+ target = new BytesRef("request1");
+ assertTrue(termsEnum.seekExact(target));
+ assertEquals(termsEnum.term(), target);
+
+ target = new BytesRef("request4");
+ assertTrue(termsEnum.seekExact(target));
+ assertEquals(termsEnum.term(), target);
+
+ target = new BytesRef("request3");
+ assertTrue(termsEnum.seekExact(target));
+ assertEquals(termsEnum.term(), target);
+
+ target = new BytesRef("request4");
+ assertTrue(termsEnum.seekExact(target));
+ assertEquals(termsEnum.term(), target);
+
+ target = new BytesRef("request1");
+ assertTrue(termsEnum.seekExact(target));
+ assertEquals(termsEnum.term(), target);
+
+ target = new BytesRef("request4");
+ assertTrue(termsEnum.seekExact(target));
+ assertEquals(termsEnum.term(), target);
+
+ target = new BytesRef("regular");
+ assertTrue(termsEnum.seekExact(target));
+ assertEquals(termsEnum.term(), target);
+
+ target = new BytesRef("rest");
+ assertTrue(termsEnum.seekExact(target));
+ assertEquals(termsEnum.term(), target);
+
+ target = new BytesRef("regular");
+ assertTrue(termsEnum.seekExact(target));
+ assertEquals(termsEnum.term(), target);
+
+ // Test seekCeil.
+ target = new BytesRef("request2");
+ assertEquals(TermsEnum.SeekStatus.FOUND, termsEnum.seekCeil(target));
+ assertEquals(termsEnum.term(), target);
+
+ target = new BytesRef("request1");
+ assertEquals(TermsEnum.SeekStatus.FOUND, termsEnum.seekCeil(target));
+ assertEquals(termsEnum.term(), target);
+
+ target = new BytesRef("request4");
+ assertEquals(TermsEnum.SeekStatus.FOUND, termsEnum.seekCeil(target));
+ assertEquals(termsEnum.term(), target);
+
+ target = new BytesRef("request3");
+ assertEquals(TermsEnum.SeekStatus.FOUND, termsEnum.seekCeil(target));
+ assertEquals(termsEnum.term(), target);
+
+ target = new BytesRef("request4");
+ assertEquals(TermsEnum.SeekStatus.FOUND, termsEnum.seekCeil(target));
+ assertEquals(termsEnum.term(), target);
+
+ target = new BytesRef("request1");
+ assertEquals(TermsEnum.SeekStatus.FOUND, termsEnum.seekCeil(target));
+ assertEquals(termsEnum.term(), target);
+
+ target = new BytesRef("request4");
+ assertEquals(TermsEnum.SeekStatus.FOUND, termsEnum.seekCeil(target));
+ assertEquals(termsEnum.term(), target);
+
+ target = new BytesRef("regular");
+ assertEquals(TermsEnum.SeekStatus.FOUND, termsEnum.seekCeil(target));
+ assertEquals(termsEnum.term(), target);
+
+ target = new BytesRef("rest");
+ assertEquals(TermsEnum.SeekStatus.FOUND, termsEnum.seekCeil(target));
+ assertEquals(termsEnum.term(), target);
+
+ target = new BytesRef("regular");
+ assertEquals(TermsEnum.SeekStatus.FOUND, termsEnum.seekCeil(target));
+ assertEquals(termsEnum.term(), target);
+
+ writer.close();
+ reader.close();
Review Comment:
Resource closing order: this test opens an NRT reader from the `IndexWriter`
(`DirectoryReader.open(writer)`) but closes the writer before closing the
reader. Other tests in this module close the reader first; closing the writer
early can cause issues in some NRT implementations. Close `reader` before
`writer` (or use try-with-resources for both).
```suggestion
reader.close();
writer.close();
```
##########
lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnum.java:
##########
@@ -309,6 +309,10 @@ private boolean setEOF() {
}
private IOBooleanSupplier prepareSeekExact(BytesRef target, boolean
prefetch) throws IOException {
+ // Record rewind without reload block's state.
+ long withOutReloadFp = -1;
+ int origNextEnt = -1;
Review Comment:
Variable naming: `withOutReloadFp` reads like a typo/inconsistent camelCase
(and the same pattern appears as `finalWithOutReloadFp*`). Consider renaming to
`withoutReloadFp` / `withoutReloadFrameFp` (and similar for `origNextEnt`) to
improve readability.
```suggestion
long withoutReloadFp = -1;
int originalNextEnt = -1;
```
##########
lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnum.java:
##########
@@ -507,10 +526,25 @@ private IOBooleanSupplier prepareSeekExact(BytesRef
target, boolean prefetch) th
currentFrame.prefetchBlock();
}
+ long finalWithOutReloadFp = withOutReloadFp;
+ int finalOrigNextEnt = origNextEnt;
return () -> {
currentFrame.loadBlock();
+ // If we still stay on withOutReload frame, reduce entCount to
nextEnt.
+ int origEntCount = -1;
+ if (currentFrame.fp == finalWithOutReloadFp && finalOrigNextEnt !=
0) {
+ origEntCount = currentFrame.entCount;
+ currentFrame.entCount = finalOrigNextEnt;
+ }
+
final SeekStatus result = currentFrame.scanToTerm(target, true);
+
+ // Restore entCount to origEntCount.
+ if (origEntCount != -1) {
+ currentFrame.entCount = origEntCount;
Review Comment:
The temporary modification of `currentFrame.entCount` is restored only on
the normal path. Since `scanToTerm(...)` declares `throws IOException`, an
exception would skip the restore and leave `entCount` corrupted for subsequent
operations. Wrap the `scanToTerm` call in a try/finally (or refactor into a
helper) to guarantee restoration.
```suggestion
final SeekStatus result;
try {
result = currentFrame.scanToTerm(target, true);
} finally {
// Restore entCount to origEntCount.
if (origEntCount != -1) {
currentFrame.entCount = origEntCount;
}
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]