mikemccand commented on code in PR #12699: URL: https://github.com/apache/lucene/pull/12699#discussion_r1391017311
########## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnum.java: ########## @@ -484,15 +487,15 @@ public boolean seekExact(BytesRef target) throws IOException { // System.out.println(" no seek state; push root frame"); // } - output = arc.output(); + accumulator.push(arc.output()); currentFrame = staticFrame; // term.length = 0; targetUpto = 0; - currentFrame = - pushFrame( - arc, Lucene90BlockTreeTermsReader.FST_OUTPUTS.add(output, arc.nextFinalOutput()), 0); + accumulator.push(arc.nextFinalOutput()); Review Comment: Could we rename these to `outputAccumulator`? ########## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnum.java: ########## @@ -1190,4 +1176,65 @@ public void seekExact(long ord) { public long ord() { throw new UnsupportedOperationException(); } + + static class OutputAccumulator extends DataInput { + + /** We could have at most 10 no-empty arcs: 9 for vLong and 1 for floor data. */ + private static final int MAX_ARC = 10; Review Comment: Hmm -- makes me nervous. What if we change our output format and FST can do more sharing than it does today? Can we make it grow on demand if needed? ########## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnum.java: ########## @@ -680,14 +676,8 @@ public SeekStatus seekCeil(BytesRef target) throws IOException { + (char) arc.label() + " targetLabel=" + (char) (target.bytes[target.offset + targetUpto] & 0xFF); - // TODO: we could save the outputs in local - // byte[][] instead of making new objs ever - // seek; but, often the FST doesn't have any - // shared bytes (but this could change if we - // reverse vLong byte order) Review Comment: Aha! Another ancient yet important `TODO`! Nice ;) ########## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnum.java: ########## @@ -1190,4 +1176,65 @@ public void seekExact(long ord) { public long ord() { throw new UnsupportedOperationException(); } + + static class OutputAccumulator extends DataInput { + + /** We could have at most 10 no-empty arcs: 9 for vLong and 1 for floor data. */ + private static final int MAX_ARC = 10; + + BytesRef[] outputs = new BytesRef[MAX_ARC]; + BytesRef current; + int num = 0; + int outputIndex = 0; + int index = 0; Review Comment: You don't need the `= 0` initializers -- it is already java's default. ########## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/FieldReader.java: ########## @@ -118,13 +118,11 @@ long readVLongOutput(DataInput in) throws IOException { * <p>Package private for testing. */ static long readMSBVLong(DataInput in) throws IOException { - long l = 0L; - while (true) { - byte b = in.readByte(); + byte b = in.readByte(); + long l = b & 0x7FL; + while (b < 0) { Review Comment: Why this change? Is it a bit of specialization so CPU can predict branch a bit better? ########## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/IntersectTermsEnumFrame.java: ########## @@ -142,12 +138,20 @@ public void setState(int state) { } void load(BytesRef frameIndexData) throws IOException { - if (frameIndexData != null) { Review Comment: Hmm was `frameIndexData` never `null`? ########## lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnum.java: ########## @@ -1190,4 +1176,65 @@ public void seekExact(long ord) { public long ord() { throw new UnsupportedOperationException(); } + + static class OutputAccumulator extends DataInput { Review Comment: So this class (efficiently) accumulates `BytesRef` outputs, and then after all appending is done, becomes a `DataInput` allow the caller to read byte bytes sequentially from the logically concatenated `byte[]`? Cool. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org