mikemccand commented on code in PR #12699:
URL: https://github.com/apache/lucene/pull/12699#discussion_r1391017311


##########
lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnum.java:
##########
@@ -484,15 +487,15 @@ public boolean seekExact(BytesRef target) throws 
IOException {
       //   System.out.println("    no seek state; push root frame");
       // }
 
-      output = arc.output();
+      accumulator.push(arc.output());
 
       currentFrame = staticFrame;
 
       // term.length = 0;
       targetUpto = 0;
-      currentFrame =
-          pushFrame(
-              arc, Lucene90BlockTreeTermsReader.FST_OUTPUTS.add(output, 
arc.nextFinalOutput()), 0);
+      accumulator.push(arc.nextFinalOutput());

Review Comment:
   Could we rename these to `outputAccumulator`?



##########
lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnum.java:
##########
@@ -1190,4 +1176,65 @@ public void seekExact(long ord) {
   public long ord() {
     throw new UnsupportedOperationException();
   }
+
+  static class OutputAccumulator extends DataInput {
+
+    /** We could have at most 10 no-empty arcs: 9 for vLong and 1 for floor 
data. */
+    private static final int MAX_ARC = 10;

Review Comment:
   Hmm -- makes me nervous.  What if we change our output format and FST can do 
more sharing than it does today?  Can we make it grow on demand if needed?



##########
lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnum.java:
##########
@@ -680,14 +676,8 @@ public SeekStatus seekCeil(BytesRef target) throws 
IOException {
                 + (char) arc.label()
                 + " targetLabel="
                 + (char) (target.bytes[target.offset + targetUpto] & 0xFF);
-        // TODO: we could save the outputs in local
-        // byte[][] instead of making new objs ever
-        // seek; but, often the FST doesn't have any
-        // shared bytes (but this could change if we
-        // reverse vLong byte order)

Review Comment:
   Aha!  Another ancient yet important `TODO`!  Nice ;)



##########
lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnum.java:
##########
@@ -1190,4 +1176,65 @@ public void seekExact(long ord) {
   public long ord() {
     throw new UnsupportedOperationException();
   }
+
+  static class OutputAccumulator extends DataInput {
+
+    /** We could have at most 10 no-empty arcs: 9 for vLong and 1 for floor 
data. */
+    private static final int MAX_ARC = 10;
+
+    BytesRef[] outputs = new BytesRef[MAX_ARC];
+    BytesRef current;
+    int num = 0;
+    int outputIndex = 0;
+    int index = 0;

Review Comment:
   You don't need the `= 0` initializers -- it is already java's default.



##########
lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/FieldReader.java:
##########
@@ -118,13 +118,11 @@ long readVLongOutput(DataInput in) throws IOException {
    * <p>Package private for testing.
    */
   static long readMSBVLong(DataInput in) throws IOException {
-    long l = 0L;
-    while (true) {
-      byte b = in.readByte();
+    byte b = in.readByte();
+    long l = b & 0x7FL;
+    while (b < 0) {

Review Comment:
   Why this change?  Is it a bit of specialization so CPU can predict branch a 
bit better?



##########
lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/IntersectTermsEnumFrame.java:
##########
@@ -142,12 +138,20 @@ public void setState(int state) {
   }
 
   void load(BytesRef frameIndexData) throws IOException {
-    if (frameIndexData != null) {

Review Comment:
   Hmm was `frameIndexData` never `null`?



##########
lucene/core/src/java/org/apache/lucene/codecs/lucene90/blocktree/SegmentTermsEnum.java:
##########
@@ -1190,4 +1176,65 @@ public void seekExact(long ord) {
   public long ord() {
     throw new UnsupportedOperationException();
   }
+
+  static class OutputAccumulator extends DataInput {

Review Comment:
   So this class (efficiently) accumulates `BytesRef` outputs, and then after 
all appending is done, becomes a `DataInput` allow the caller to read byte 
bytes sequentially from the logically concatenated `byte[]`?  Cool.



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org

Reply via email to