mikemccand commented on code in PR #12738:
URL: https://github.com/apache/lucene/pull/12738#discussion_r1379769524


##########
lucene/core/src/java/org/apache/lucene/util/fst/FSTCompiler.java:
##########
@@ -145,7 +145,7 @@ private FSTCompiler(
     if (suffixRAMLimitMB < 0) {
       throw new IllegalArgumentException("ramLimitMB must be >= 0; got: " + suffixRAMLimitMB);
     } else if (suffixRAMLimitMB > 0) {
-      dedupHash = new NodeHash<>(this, suffixRAMLimitMB, bytes.getReverseReader(false));

Review Comment:
   Ahh -- just here (used the `allowSingle`), so we can now always `allowSingle`, OK.



##########
lucene/core/src/java/org/apache/lucene/util/fst/BytesStore.java:
##########
@@ -444,11 +444,7 @@ public boolean reversed() {
 
   @Override
   public FST.BytesReader getReverseBytesReader() {
-    return getReverseReader(true);
-  }
-
-  FST.BytesReader getReverseReader(boolean allowSingle) {

Review Comment:
   Hmm, who needed/used this `allowSingle`?



##########
lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java:
##########
@@ -186,138 +206,88 @@ private long hash(FSTCompiler.UnCompiledNode<T> node) {
     return h;
   }
 
-  // hash code for a frozen node.  this must precisely match the hash computation of an unfrozen
-  // node!
-  private long hash(long node) throws IOException {
-    final int PRIME = 31;
-
-    long h = 0;
-    fstCompiler.fst.readFirstRealTargetArc(node, scratchArc, in);
-    while (true) {
-      h = PRIME * h + scratchArc.label();
-      h = PRIME * h + (int) (scratchArc.target() ^ (scratchArc.target() >> 32));
-      h = PRIME * h + scratchArc.output().hashCode();
-      h = PRIME * h + scratchArc.nextFinalOutput().hashCode();
-      if (scratchArc.isFinal()) {
-        h += 17;
-      }
-      if (scratchArc.isLast()) {
-        break;
-      }
-      fstCompiler.fst.readNextRealArc(scratchArc, in);
-    }
-
-    return h;
-  }
-
-  /**
-   * Compares an unfrozen node (UnCompiledNode) with a frozen node at byte location address (long),
-   * returning true if they are equal.
-   */
-  private boolean nodesEqual(FSTCompiler.UnCompiledNode<T> node, long address) throws IOException {
-    fstCompiler.fst.readFirstRealTargetArc(address, scratchArc, in);
-
-    // fail fast for a node with fixed length arcs
-    if (scratchArc.bytesPerArc() != 0) {
-      assert node.numArcs > 0;
-      // the frozen node uses fixed-with arc encoding (same number of bytes per arc), but may be
-      // sparse or dense
-      switch (scratchArc.nodeFlags()) {
-        case FST.ARCS_FOR_BINARY_SEARCH:
-          // sparse
-          if (node.numArcs != scratchArc.numArcs()) {
-            return false;
-          }
-          break;
-        case FST.ARCS_FOR_DIRECT_ADDRESSING:
-          // dense -- compare both the number of labels allocated in the array (some of which may
-          // not actually be arcs), and the number of arcs
-          if ((node.arcs[node.numArcs - 1].label - node.arcs[0].label + 1) != scratchArc.numArcs()
-              || node.numArcs != FST.Arc.BitTable.countBits(scratchArc, in)) {
-            return false;
-          }
-          break;
-        default:
-          throw new AssertionError("unhandled scratchArc.nodeFlag() " + scratchArc.nodeFlags());
-      }
-    }
-
-    // compare arc by arc to see if there is a difference
-    for (int arcUpto = 0; arcUpto < node.numArcs; arcUpto++) {
-      final FSTCompiler.Arc<T> arc = node.arcs[arcUpto];
-      if (arc.label != scratchArc.label()
-          || arc.output.equals(scratchArc.output()) == false
-          || ((FSTCompiler.CompiledNode) arc.target).node != scratchArc.target()
-          || arc.nextFinalOutput.equals(scratchArc.nextFinalOutput()) == false
-          || arc.isFinal != scratchArc.isFinal()) {
-        return false;
-      }
-
-      if (scratchArc.isLast()) {
-        if (arcUpto == node.numArcs - 1) {
-          return true;
-        } else {
-          return false;
-        }
-      }
-
-      fstCompiler.fst.readNextRealArc(scratchArc, in);
-    }
-
-    // unfrozen node has fewer arcs than frozen node
-
-    return false;
-  }
-
   /** Inner class because it needs access to hash function and FST bytes. */
   private class PagedGrowableHash {
-    private PagedGrowableWriter entries;
+    public long copiedBytes;
+    // storing the FST node address where the position is the masked hash of the node arcs
+    private PagedGrowableWriter fstNodeAddress;
+    // storing the local copiedNodes address in the same position as fstNodeAddress
+    // here we are effectively storing a Map<Long, Long> from the FST node address to copiedNodes
+    // address
+    private PagedGrowableWriter copiedNodeAddress;
     private long count;
     private long mask;
+    // storing the byte slice from the FST for nodes we added to the hash so that we don't need to
+    // look up from the FST itself. each node will be written subsequently

Review Comment:
   Maybe after `itself` add `, so the FST bytes can stream directly to disk as append-only writes`?



##########
lucene/core/src/java/org/apache/lucene/util/fst/ByteBlockPoolReverseBytesReader.java:
##########
@@ -0,0 +1,65 @@
+/*
+ * Licensed to the Apache Software Foundation (ASF) under one or more
+ * contributor license agreements.  See the NOTICE file distributed with
+ * this work for additional information regarding copyright ownership.
+ * The ASF licenses this file to You under the Apache License, Version 2.0
+ * (the "License"); you may not use this file except in compliance with
+ * the License.  You may obtain a copy of the License at
+ *
+ *     http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+package org.apache.lucene.util.fst;
+
+import java.io.IOException;
+import org.apache.lucene.util.ByteBlockPool;
+
+/** Reads in reverse from a ByteBlockPool. */
+final class ByteBlockPoolReverseBytesReader extends FST.BytesReader {
+
+  private final ByteBlockPool buf;
+  private final long relativePos;

Review Comment:
   Maybe `posOffset` or `posDelta`, making it clear it is a delta off of `pos`?



##########
lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java:
##########
@@ -186,138 +206,88 @@ private long hash(FSTCompiler.UnCompiledNode<T> node) {
     return h;
   }
 
-  // hash code for a frozen node.  this must precisely match the hash computation of an unfrozen
-  // node!
-  private long hash(long node) throws IOException {
-    final int PRIME = 31;
-
-    long h = 0;
-    fstCompiler.fst.readFirstRealTargetArc(node, scratchArc, in);
-    while (true) {
-      h = PRIME * h + scratchArc.label();
-      h = PRIME * h + (int) (scratchArc.target() ^ (scratchArc.target() >> 32));
-      h = PRIME * h + scratchArc.output().hashCode();
-      h = PRIME * h + scratchArc.nextFinalOutput().hashCode();
-      if (scratchArc.isFinal()) {
-        h += 17;
-      }
-      if (scratchArc.isLast()) {
-        break;
-      }
-      fstCompiler.fst.readNextRealArc(scratchArc, in);
-    }
-
-    return h;
-  }
-
-  /**
-   * Compares an unfrozen node (UnCompiledNode) with a frozen node at byte location address (long),
-   * returning true if they are equal.
-   */
-  private boolean nodesEqual(FSTCompiler.UnCompiledNode<T> node, long address) throws IOException {
-    fstCompiler.fst.readFirstRealTargetArc(address, scratchArc, in);
-
-    // fail fast for a node with fixed length arcs
-    if (scratchArc.bytesPerArc() != 0) {
-      assert node.numArcs > 0;
-      // the frozen node uses fixed-with arc encoding (same number of bytes per arc), but may be
-      // sparse or dense
-      switch (scratchArc.nodeFlags()) {
-        case FST.ARCS_FOR_BINARY_SEARCH:
-          // sparse
-          if (node.numArcs != scratchArc.numArcs()) {
-            return false;
-          }
-          break;
-        case FST.ARCS_FOR_DIRECT_ADDRESSING:
-          // dense -- compare both the number of labels allocated in the array (some of which may
-          // not actually be arcs), and the number of arcs
-          if ((node.arcs[node.numArcs - 1].label - node.arcs[0].label + 1) != scratchArc.numArcs()
-              || node.numArcs != FST.Arc.BitTable.countBits(scratchArc, in)) {
-            return false;
-          }
-          break;
-        default:
-          throw new AssertionError("unhandled scratchArc.nodeFlag() " + scratchArc.nodeFlags());
-      }
-    }
-
-    // compare arc by arc to see if there is a difference
-    for (int arcUpto = 0; arcUpto < node.numArcs; arcUpto++) {
-      final FSTCompiler.Arc<T> arc = node.arcs[arcUpto];
-      if (arc.label != scratchArc.label()
-          || arc.output.equals(scratchArc.output()) == false
-          || ((FSTCompiler.CompiledNode) arc.target).node != scratchArc.target()
-          || arc.nextFinalOutput.equals(scratchArc.nextFinalOutput()) == false
-          || arc.isFinal != scratchArc.isFinal()) {
-        return false;
-      }
-
-      if (scratchArc.isLast()) {
-        if (arcUpto == node.numArcs - 1) {
-          return true;
-        } else {
-          return false;
-        }
-      }
-
-      fstCompiler.fst.readNextRealArc(scratchArc, in);
-    }
-
-    // unfrozen node has fewer arcs than frozen node
-
-    return false;
-  }
-
   /** Inner class because it needs access to hash function and FST bytes. */
   private class PagedGrowableHash {
-    private PagedGrowableWriter entries;
+    public long copiedBytes;
+    // storing the FST node address where the position is the masked hash of the node arcs
+    private PagedGrowableWriter fstNodeAddress;
+    // storing the local copiedNodes address in the same position as fstNodeAddress
+    // here we are effectively storing a Map<Long, Long> from the FST node address to copiedNodes
+    // address
+    private PagedGrowableWriter copiedNodeAddress;
     private long count;
     private long mask;
+    // storing the byte slice from the FST for nodes we added to the hash so that we don't need to
+    // look up from the FST itself. each node will be written subsequently
+    private final ByteBlockPool copiedNodes;
 
     // 256K blocks, but note that the final block is sized only as needed so it won't use the full
     // block size when just a few elements were written to it
     private static final int BLOCK_SIZE_BYTES = 1 << 18;
 
     public PagedGrowableHash() {
-      entries = new PagedGrowableWriter(16, BLOCK_SIZE_BYTES, 8, PackedInts.COMPACT);
+      fstNodeAddress = new PagedGrowableWriter(16, BLOCK_SIZE_BYTES, 8, PackedInts.COMPACT);
+      copiedNodeAddress = new PagedGrowableWriter(16, BLOCK_SIZE_BYTES, 8, PackedInts.COMPACT);
       mask = 15;
+      copiedNodes = new ByteBlockPool(new ByteBlockPool.DirectAllocator());
     }
 
     public PagedGrowableHash(long lastNodeAddress, long size) {
-      entries =
+      fstNodeAddress =
           new PagedGrowableWriter(
               size, BLOCK_SIZE_BYTES, PackedInts.bitsRequired(lastNodeAddress), PackedInts.COMPACT);
+      copiedNodeAddress = new PagedGrowableWriter(size, BLOCK_SIZE_BYTES, 8, PackedInts.COMPACT);
       mask = size - 1;
       assert (mask & size) == 0 : "size must be a power-of-2; got size=" + size + " mask=" + mask;
+      copiedNodes = new ByteBlockPool(new ByteBlockPool.DirectAllocator());
+    }
+
+    public byte[] getBytes(long pos, int length) {
+      long address = copiedNodeAddress.get(pos);
+      byte[] buf = new byte[length];
+      copiedNodes.readBytes(address - length + 1, buf, 0, length);
+      return buf;
     }
 
     public long get(long index) {
-      return entries.get(index);
+      return fstNodeAddress.get(index);
     }
 
-    public void set(long index, long pointer) throws IOException {
-      entries.set(index, pointer);
+    public void set(long index, long pointer, byte[] bytes) {
+      fstNodeAddress.set(index, pointer);
       count++;
+      copiedNodes.append(new BytesRef(bytes));
+      copiedBytes += bytes.length;
+      // write the offset, which is the last offset of the node
+      copiedNodeAddress.set(index, copiedBytes - 1);
     }
 
     private void rehash(long lastNodeAddress) throws IOException {
+      // TODO: https://github.com/apache/lucene/issues/12744
+      // should we always use a small startBitsPerValue here (e.g 8) instead base off of
+      // lastNodeAddress?
+
       // double hash table size on each rehash
+      long newSize = 2 * fstNodeAddress.size();
+      PagedGrowableWriter newCopiedOffsets =

Review Comment:
   Can we rename to `newCopiedNodeAddress`?



##########
lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java:
##########
@@ -328,7 +298,100 @@ private void rehash(long lastNodeAddress) throws IOException {
       }
 
       mask = newMask;
-      entries = newEntries;
+      fstNodeAddress = newEntries;
+      copiedNodeAddress = newCopiedOffsets;
+    }
+
+    // hash code for a frozen node.  this must precisely match the hash computation of an unfrozen
+    // node!
+    private long hash(long node, long pos) throws IOException {
+      FST.BytesReader in = getBytesReader(node, pos);
+
+      final int PRIME = 31;
+
+      long h = 0;
+      fstCompiler.fst.readFirstRealTargetArc(node, scratchArc, in);
+      while (true) {
+        h = PRIME * h + scratchArc.label();
+        h = PRIME * h + (int) (scratchArc.target() ^ (scratchArc.target() >> 32));
+        h = PRIME * h + scratchArc.output().hashCode();
+        h = PRIME * h + scratchArc.nextFinalOutput().hashCode();
+        if (scratchArc.isFinal()) {
+          h += 17;
+        }
+        if (scratchArc.isLast()) {
+          break;
+        }
+        fstCompiler.fst.readNextRealArc(scratchArc, in);
+      }
+
+      return h;
+    }
+
+    /**
+     * Compares an unfrozen node (UnCompiledNode) with a frozen node at byte location address
+     * (long), returning the node length if the two nodes are matched, or -1 otherwise
+     */
+    private int getMatchedNodeLength(FSTCompiler.UnCompiledNode<T> node, long address, long pos)
+        throws IOException {
+      FST.BytesReader in = getBytesReader(address, pos);

Review Comment:
   Can we also fix this so caller passes in the `BytesReader`, to minimize who must track the `pos` (`hashSlot`)?



##########
lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java:
##########
@@ -186,138 +206,88 @@ private long hash(FSTCompiler.UnCompiledNode<T> node) {
     return h;
   }
 
-  // hash code for a frozen node.  this must precisely match the hash computation of an unfrozen
-  // node!
-  private long hash(long node) throws IOException {
-    final int PRIME = 31;
-
-    long h = 0;
-    fstCompiler.fst.readFirstRealTargetArc(node, scratchArc, in);
-    while (true) {
-      h = PRIME * h + scratchArc.label();
-      h = PRIME * h + (int) (scratchArc.target() ^ (scratchArc.target() >> 32));
-      h = PRIME * h + scratchArc.output().hashCode();
-      h = PRIME * h + scratchArc.nextFinalOutput().hashCode();
-      if (scratchArc.isFinal()) {
-        h += 17;
-      }
-      if (scratchArc.isLast()) {
-        break;
-      }
-      fstCompiler.fst.readNextRealArc(scratchArc, in);
-    }
-
-    return h;
-  }
-
-  /**
-   * Compares an unfrozen node (UnCompiledNode) with a frozen node at byte location address (long),
-   * returning true if they are equal.
-   */
-  private boolean nodesEqual(FSTCompiler.UnCompiledNode<T> node, long address) throws IOException {
-    fstCompiler.fst.readFirstRealTargetArc(address, scratchArc, in);
-
-    // fail fast for a node with fixed length arcs
-    if (scratchArc.bytesPerArc() != 0) {
-      assert node.numArcs > 0;
-      // the frozen node uses fixed-with arc encoding (same number of bytes per arc), but may be
-      // sparse or dense
-      switch (scratchArc.nodeFlags()) {
-        case FST.ARCS_FOR_BINARY_SEARCH:
-          // sparse
-          if (node.numArcs != scratchArc.numArcs()) {
-            return false;
-          }
-          break;
-        case FST.ARCS_FOR_DIRECT_ADDRESSING:
-          // dense -- compare both the number of labels allocated in the array (some of which may
-          // not actually be arcs), and the number of arcs
-          if ((node.arcs[node.numArcs - 1].label - node.arcs[0].label + 1) != scratchArc.numArcs()
-              || node.numArcs != FST.Arc.BitTable.countBits(scratchArc, in)) {
-            return false;
-          }
-          break;
-        default:
-          throw new AssertionError("unhandled scratchArc.nodeFlag() " + scratchArc.nodeFlags());
-      }
-    }
-
-    // compare arc by arc to see if there is a difference
-    for (int arcUpto = 0; arcUpto < node.numArcs; arcUpto++) {
-      final FSTCompiler.Arc<T> arc = node.arcs[arcUpto];
-      if (arc.label != scratchArc.label()
-          || arc.output.equals(scratchArc.output()) == false
-          || ((FSTCompiler.CompiledNode) arc.target).node != scratchArc.target()
-          || arc.nextFinalOutput.equals(scratchArc.nextFinalOutput()) == false
-          || arc.isFinal != scratchArc.isFinal()) {
-        return false;
-      }
-
-      if (scratchArc.isLast()) {
-        if (arcUpto == node.numArcs - 1) {
-          return true;
-        } else {
-          return false;
-        }
-      }
-
-      fstCompiler.fst.readNextRealArc(scratchArc, in);
-    }
-
-    // unfrozen node has fewer arcs than frozen node
-
-    return false;
-  }
-
   /** Inner class because it needs access to hash function and FST bytes. */
   private class PagedGrowableHash {
-    private PagedGrowableWriter entries;
+    public long copiedBytes;

Review Comment:
   Add a comment explaining this is total bytes copied out of the FST `byte[]` into our `ByteBlockPool` for hashed suffix nodes?



##########
lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java:
##########
@@ -186,132 +214,99 @@ private long hash(FSTCompiler.UnCompiledNode<T> node) {
     return h;
   }
 
-  // hash code for a frozen node.  this must precisely match the hash computation of an unfrozen
-  // node!
-  private long hash(long node) throws IOException {
-    final int PRIME = 31;
-
-    long h = 0;
-    fstCompiler.fst.readFirstRealTargetArc(node, scratchArc, in);
-    while (true) {
-      h = PRIME * h + scratchArc.label();
-      h = PRIME * h + (int) (scratchArc.target() ^ (scratchArc.target() >> 32));
-      h = PRIME * h + scratchArc.output().hashCode();
-      h = PRIME * h + scratchArc.nextFinalOutput().hashCode();
-      if (scratchArc.isFinal()) {
-        h += 17;
-      }
-      if (scratchArc.isLast()) {
-        break;
-      }
-      fstCompiler.fst.readNextRealArc(scratchArc, in);
-    }
-
-    return h;
-  }
-
-  /**
-   * Compares an unfrozen node (UnCompiledNode) with a frozen node at byte location address (long),
-   * returning true if they are equal.
-   */
-  private boolean nodesEqual(FSTCompiler.UnCompiledNode<T> node, long address) throws IOException {
-    fstCompiler.fst.readFirstRealTargetArc(address, scratchArc, in);
-
-    // fail fast for a node with fixed length arcs
-    if (scratchArc.bytesPerArc() != 0) {
-      assert node.numArcs > 0;
-      // the frozen node uses fixed-with arc encoding (same number of bytes per arc), but may be
-      // sparse or dense
-      switch (scratchArc.nodeFlags()) {
-        case FST.ARCS_FOR_BINARY_SEARCH:
-          // sparse
-          if (node.numArcs != scratchArc.numArcs()) {
-            return false;
-          }
-          break;
-        case FST.ARCS_FOR_DIRECT_ADDRESSING:
-          // dense -- compare both the number of labels allocated in the array (some of which may
-          // not actually be arcs), and the number of arcs
-          if ((node.arcs[node.numArcs - 1].label - node.arcs[0].label + 1) != scratchArc.numArcs()
-              || node.numArcs != FST.Arc.BitTable.countBits(scratchArc, in)) {
-            return false;
-          }
-          break;
-        default:
-          throw new AssertionError("unhandled scratchArc.nodeFlag() " + scratchArc.nodeFlags());
-      }
-    }
-
-    // compare arc by arc to see if there is a difference
-    for (int arcUpto = 0; arcUpto < node.numArcs; arcUpto++) {
-      final FSTCompiler.Arc<T> arc = node.arcs[arcUpto];
-      if (arc.label != scratchArc.label()
-          || arc.output.equals(scratchArc.output()) == false
-          || ((FSTCompiler.CompiledNode) arc.target).node != scratchArc.target()
-          || arc.nextFinalOutput.equals(scratchArc.nextFinalOutput()) == false
-          || arc.isFinal != scratchArc.isFinal()) {
-        return false;
-      }
-
-      if (scratchArc.isLast()) {
-        if (arcUpto == node.numArcs - 1) {
-          return true;
-        } else {
-          return false;
-        }
-      }
-
-      fstCompiler.fst.readNextRealArc(scratchArc, in);
-    }
-
-    // unfrozen node has fewer arcs than frozen node
-
-    return false;
-  }
-
   /** Inner class because it needs access to hash function and FST bytes. */
   private class PagedGrowableHash {
-    private PagedGrowableWriter entries;
+    public long copiedBytes;
+    // storing the FST node address where the position is the masked hash of the node arcs
+    private PagedGrowableWriter fstHashAddress;
+    // storing the local copiedNodes address
+    private PagedGrowableWriter copiedNodeAddress;
+    // storing the global FST nodes address in the same position as copiedNodeAddress
+    private PagedGrowableWriter fstNodeAddress;
     private long count;
     private long mask;
+    private final ByteBlockPool copiedNodes;
 
     // 256K blocks, but note that the final block is sized only as needed so it won't use the full
     // block size when just a few elements were written to it
     private static final int BLOCK_SIZE_BYTES = 1 << 18;
 
     public PagedGrowableHash() {
-      entries = new PagedGrowableWriter(16, BLOCK_SIZE_BYTES, 8, PackedInts.COMPACT);
+      fstHashAddress = new PagedGrowableWriter(16, BLOCK_SIZE_BYTES, 8, PackedInts.COMPACT);
+      fstNodeAddress = new PagedGrowableWriter(16, BLOCK_SIZE_BYTES, 8, PackedInts.COMPACT);
+      copiedNodeAddress = new PagedGrowableWriter(16, BLOCK_SIZE_BYTES, 8, PackedInts.COMPACT);
       mask = 15;
+      copiedNodes = new ByteBlockPool(new ByteBlockPool.DirectAllocator());
     }
 
     public PagedGrowableHash(long lastNodeAddress, long size) {
-      entries =
+      fstHashAddress =
           new PagedGrowableWriter(
               size, BLOCK_SIZE_BYTES, PackedInts.bitsRequired(lastNodeAddress), PackedInts.COMPACT);
+      fstNodeAddress =
+          new PagedGrowableWriter(
+              size, BLOCK_SIZE_BYTES, PackedInts.bitsRequired(lastNodeAddress), PackedInts.COMPACT);
+      copiedNodeAddress = new PagedGrowableWriter(size, BLOCK_SIZE_BYTES, 8, PackedInts.COMPACT);
       mask = size - 1;
       assert (mask & size) == 0 : "size must be a power-of-2; got size=" + size + " mask=" + mask;
+      copiedNodes = new ByteBlockPool(new ByteBlockPool.DirectAllocator());
+    }
+
+    public long getCopiedNodeAddress(long node) {
+      long pos = Long.hashCode(node) & mask;
+      while (true) {
+        long address = fstNodeAddress.get(pos);
+        assert address != 0;
+        if (address == node) {
+          return copiedNodeAddress.get(pos);
+        }
+        pos = (pos + 1) & mask;
+      }
+    }
+
+    public byte[] getBytes(long node, int length) {
+      long copiedNodeAddress = getCopiedNodeAddress(node);
+      byte[] buf = new byte[length];
+      copiedNodes.readBytes(copiedNodeAddress - length + 1, buf, 0, length);
+      return buf;
     }
 
     public long get(long index) {
-      return entries.get(index);
+      return fstHashAddress.get(index);
     }
 
-    public void set(long index, long pointer) throws IOException {
-      entries.set(index, pointer);
+    public void set(long index, long pointer, byte[] bytes) {
+      fstHashAddress.set(index, pointer);
       count++;
+      setOffset(pointer, bytes);
+    }
+
+    private void setOffset(long pointer, byte[] bytes) {
+      // TODO: Write the bytes directly from BytesStore
+      copiedNodes.append(new BytesRef(bytes));

Review Comment:
   Can we add a method to `ByteBlockPool` that directly takes a `byte[]`?
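   For illustration, such an overload might look like the sketch below. `MiniBlockPool` is a hypothetical, simplified stand-in for `ByteBlockPool` (tiny fixed blocks, no allocator or buffer reuse), not the actual Lucene API; the point is only that a raw `byte[]` can be appended across block boundaries without wrapping it in a `BytesRef` first:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified stand-in for ByteBlockPool, sketching an append(byte[], offset, length)
// overload. MiniBlockPool and its members are hypothetical, not Lucene code.
class MiniBlockPool {
  static final int BLOCK_SIZE = 8; // tiny for demonstration; the real pool uses large blocks

  private final List<byte[]> blocks = new ArrayList<>();
  private byte[] current = null;
  private int upto = BLOCK_SIZE; // write position within the current block

  // Appends raw bytes, splitting the copy across block boundaries as needed.
  void append(byte[] bytes, int offset, int length) {
    while (length > 0) {
      if (upto == BLOCK_SIZE) { // current block is full; allocate the next one
        current = new byte[BLOCK_SIZE];
        blocks.add(current);
        upto = 0;
      }
      int chunk = Math.min(length, BLOCK_SIZE - upto);
      System.arraycopy(bytes, offset, current, upto, chunk);
      upto += chunk;
      offset += chunk;
      length -= chunk;
    }
  }

  // Reads length bytes starting at absolute position pos, spanning blocks.
  void readBytes(long pos, byte[] out, int outOffset, int length) {
    while (length > 0) {
      byte[] block = blocks.get((int) (pos / BLOCK_SIZE));
      int inBlock = (int) (pos % BLOCK_SIZE);
      int chunk = Math.min(length, BLOCK_SIZE - inBlock);
      System.arraycopy(block, inBlock, out, outOffset, chunk);
      pos += chunk;
      outOffset += chunk;
      length -= chunk;
    }
  }
}
```

   That would let `setOffset` skip the intermediate `new BytesRef(bytes)` allocation entirely.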



##########
lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java:
##########
@@ -186,138 +206,88 @@ private long hash(FSTCompiler.UnCompiledNode<T> node) {
     return h;
   }
 
-  // hash code for a frozen node.  this must precisely match the hash computation of an unfrozen
-  // node!
-  private long hash(long node) throws IOException {
-    final int PRIME = 31;
-
-    long h = 0;
-    fstCompiler.fst.readFirstRealTargetArc(node, scratchArc, in);
-    while (true) {
-      h = PRIME * h + scratchArc.label();
-      h = PRIME * h + (int) (scratchArc.target() ^ (scratchArc.target() >> 32));
-      h = PRIME * h + scratchArc.output().hashCode();
-      h = PRIME * h + scratchArc.nextFinalOutput().hashCode();
-      if (scratchArc.isFinal()) {
-        h += 17;
-      }
-      if (scratchArc.isLast()) {
-        break;
-      }
-      fstCompiler.fst.readNextRealArc(scratchArc, in);
-    }
-
-    return h;
-  }
-
-  /**
-   * Compares an unfrozen node (UnCompiledNode) with a frozen node at byte location address (long),
-   * returning true if they are equal.
-   */
-  private boolean nodesEqual(FSTCompiler.UnCompiledNode<T> node, long address) throws IOException {
-    fstCompiler.fst.readFirstRealTargetArc(address, scratchArc, in);
-
-    // fail fast for a node with fixed length arcs
-    if (scratchArc.bytesPerArc() != 0) {
-      assert node.numArcs > 0;
-      // the frozen node uses fixed-with arc encoding (same number of bytes per arc), but may be
-      // sparse or dense
-      switch (scratchArc.nodeFlags()) {
-        case FST.ARCS_FOR_BINARY_SEARCH:
-          // sparse
-          if (node.numArcs != scratchArc.numArcs()) {
-            return false;
-          }
-          break;
-        case FST.ARCS_FOR_DIRECT_ADDRESSING:
-          // dense -- compare both the number of labels allocated in the array (some of which may
-          // not actually be arcs), and the number of arcs
-          if ((node.arcs[node.numArcs - 1].label - node.arcs[0].label + 1) != scratchArc.numArcs()
-              || node.numArcs != FST.Arc.BitTable.countBits(scratchArc, in)) {
-            return false;
-          }
-          break;
-        default:
-          throw new AssertionError("unhandled scratchArc.nodeFlag() " + scratchArc.nodeFlags());
-      }
-    }
-
-    // compare arc by arc to see if there is a difference
-    for (int arcUpto = 0; arcUpto < node.numArcs; arcUpto++) {
-      final FSTCompiler.Arc<T> arc = node.arcs[arcUpto];
-      if (arc.label != scratchArc.label()
-          || arc.output.equals(scratchArc.output()) == false
-          || ((FSTCompiler.CompiledNode) arc.target).node != scratchArc.target()
-          || arc.nextFinalOutput.equals(scratchArc.nextFinalOutput()) == false
-          || arc.isFinal != scratchArc.isFinal()) {
-        return false;
-      }
-
-      if (scratchArc.isLast()) {
-        if (arcUpto == node.numArcs - 1) {
-          return true;
-        } else {
-          return false;
-        }
-      }
-
-      fstCompiler.fst.readNextRealArc(scratchArc, in);
-    }
-
-    // unfrozen node has fewer arcs than frozen node
-
-    return false;
-  }
-
   /** Inner class because it needs access to hash function and FST bytes. */
   private class PagedGrowableHash {
-    private PagedGrowableWriter entries;
+    public long copiedBytes;
+    // storing the FST node address where the position is the masked hash of the node arcs
+    private PagedGrowableWriter fstNodeAddress;
+    // storing the local copiedNodes address in the same position as fstNodeAddress
+    // here we are effectively storing a Map<Long, Long> from the FST node address to copiedNodes
+    // address
+    private PagedGrowableWriter copiedNodeAddress;
     private long count;
     private long mask;
+    // storing the byte slice from the FST for nodes we added to the hash so that we don't need to
+    // look up from the FST itself. each node will be written subsequently
+    private final ByteBlockPool copiedNodes;
 
     // 256K blocks, but note that the final block is sized only as needed so it won't use the full
     // block size when just a few elements were written to it
     private static final int BLOCK_SIZE_BYTES = 1 << 18;
 
     public PagedGrowableHash() {
-      entries = new PagedGrowableWriter(16, BLOCK_SIZE_BYTES, 8, PackedInts.COMPACT);
+      fstNodeAddress = new PagedGrowableWriter(16, BLOCK_SIZE_BYTES, 8, PackedInts.COMPACT);
+      copiedNodeAddress = new PagedGrowableWriter(16, BLOCK_SIZE_BYTES, 8, PackedInts.COMPACT);
       mask = 15;
+      copiedNodes = new ByteBlockPool(new ByteBlockPool.DirectAllocator());
     }
 
     public PagedGrowableHash(long lastNodeAddress, long size) {
-      entries =
+      fstNodeAddress =
           new PagedGrowableWriter(
               size, BLOCK_SIZE_BYTES, PackedInts.bitsRequired(lastNodeAddress), PackedInts.COMPACT);
+      copiedNodeAddress = new PagedGrowableWriter(size, BLOCK_SIZE_BYTES, 8, PackedInts.COMPACT);
       mask = size - 1;
       assert (mask & size) == 0 : "size must be a power-of-2; got size=" + size + " mask=" + mask;
+      copiedNodes = new ByteBlockPool(new ByteBlockPool.DirectAllocator());
+    }
+
+    public byte[] getBytes(long pos, int length) {
+      long address = copiedNodeAddress.get(pos);
+      byte[] buf = new byte[length];
+      copiedNodes.readBytes(address - length + 1, buf, 0, length);
+      return buf;
     }
 
     public long get(long index) {
-      return entries.get(index);
+      return fstNodeAddress.get(index);
     }
 
-    public void set(long index, long pointer) throws IOException {
-      entries.set(index, pointer);
+    public void set(long index, long pointer, byte[] bytes) {
+      fstNodeAddress.set(index, pointer);
       count++;
+      copiedNodes.append(new BytesRef(bytes));
+      copiedBytes += bytes.length;
+      // write the offset, which is the last offset of the node

Review Comment:
   Maybe reword to `// write the offset, which points to the last byte we copied since we later read this node in reverse`?
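   To illustrate that invariant standalone: the stored offset is the address of the last copied byte, so `getBytes` can recover the slice as `[address - length + 1, address]`. A hedged sketch follows; `CopiedNodesDemo` and its members are hypothetical stand-ins (a plain byte stream and map instead of `ByteBlockPool` and `PagedGrowableWriter`), not PR code:

```java
import java.io.ByteArrayOutputStream;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the end-offset bookkeeping: each copied node records the
// offset of its LAST byte, because frozen FST nodes are later read in reverse.
class CopiedNodesDemo {
  private final ByteArrayOutputStream copiedNodes = new ByteArrayOutputStream();
  private final Map<Long, Long> copiedNodeAddress = new HashMap<>();
  private long copiedBytes; // total bytes copied so far

  void set(long fstNodeAddress, byte[] nodeBytes) {
    copiedNodes.write(nodeBytes, 0, nodeBytes.length);
    copiedBytes += nodeBytes.length;
    // store the offset of the last byte we copied, since we later read this node in reverse
    copiedNodeAddress.put(fstNodeAddress, copiedBytes - 1);
  }

  byte[] getBytes(long fstNodeAddress, int length) {
    long end = copiedNodeAddress.get(fstNodeAddress);
    byte[] buf = new byte[length];
    // the node occupies [end - length + 1, end], mirroring the PR's getBytes
    System.arraycopy(copiedNodes.toByteArray(), (int) (end - length + 1), buf, 0, length);
    return buf;
  }
}
```

   Storing the end offset (rather than the start) saves the reader from re-deriving where a reverse scan of the node must begin.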



##########
lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java:
##########
@@ -186,138 +206,88 @@ private long hash(FSTCompiler.UnCompiledNode<T> node) {
     return h;
   }
 
-  // hash code for a frozen node.  this must precisely match the hash 
computation of an unfrozen
-  // node!
-  private long hash(long node) throws IOException {
-    final int PRIME = 31;
-
-    long h = 0;
-    fstCompiler.fst.readFirstRealTargetArc(node, scratchArc, in);
-    while (true) {
-      h = PRIME * h + scratchArc.label();
-      h = PRIME * h + (int) (scratchArc.target() ^ (scratchArc.target() >> 
32));
-      h = PRIME * h + scratchArc.output().hashCode();
-      h = PRIME * h + scratchArc.nextFinalOutput().hashCode();
-      if (scratchArc.isFinal()) {
-        h += 17;
-      }
-      if (scratchArc.isLast()) {
-        break;
-      }
-      fstCompiler.fst.readNextRealArc(scratchArc, in);
-    }
-
-    return h;
-  }
-
-  /**
-   * Compares an unfrozen node (UnCompiledNode) with a frozen node at byte 
location address (long),
-   * returning true if they are equal.
-   */
-  private boolean nodesEqual(FSTCompiler.UnCompiledNode<T> node, long address) 
throws IOException {
-    fstCompiler.fst.readFirstRealTargetArc(address, scratchArc, in);
-
-    // fail fast for a node with fixed length arcs
-    if (scratchArc.bytesPerArc() != 0) {
-      assert node.numArcs > 0;
-      // the frozen node uses fixed-width arc encoding (same number of bytes per arc), but may be
-      // sparse or dense
-      switch (scratchArc.nodeFlags()) {
-        case FST.ARCS_FOR_BINARY_SEARCH:
-          // sparse
-          if (node.numArcs != scratchArc.numArcs()) {
-            return false;
-          }
-          break;
-        case FST.ARCS_FOR_DIRECT_ADDRESSING:
-          // dense -- compare both the number of labels allocated in the array (some of which may
-          // not actually be arcs), and the number of arcs
-          if ((node.arcs[node.numArcs - 1].label - node.arcs[0].label + 1) != scratchArc.numArcs()
-              || node.numArcs != FST.Arc.BitTable.countBits(scratchArc, in)) {
-            return false;
-          }
-          break;
-        default:
-          throw new AssertionError("unhandled scratchArc.nodeFlag() " + scratchArc.nodeFlags());
-      }
-    }
-
-    // compare arc by arc to see if there is a difference
-    for (int arcUpto = 0; arcUpto < node.numArcs; arcUpto++) {
-      final FSTCompiler.Arc<T> arc = node.arcs[arcUpto];
-      if (arc.label != scratchArc.label()
-          || arc.output.equals(scratchArc.output()) == false
-          || ((FSTCompiler.CompiledNode) arc.target).node != scratchArc.target()
-          || arc.nextFinalOutput.equals(scratchArc.nextFinalOutput()) == false
-          || arc.isFinal != scratchArc.isFinal()) {
-        return false;
-      }
-
-      if (scratchArc.isLast()) {
-        if (arcUpto == node.numArcs - 1) {
-          return true;
-        } else {
-          return false;
-        }
-      }
-
-      fstCompiler.fst.readNextRealArc(scratchArc, in);
-    }
-
-    // unfrozen node has fewer arcs than frozen node
-
-    return false;
-  }
-
   /** Inner class because it needs access to hash function and FST bytes. */
   private class PagedGrowableHash {
-    private PagedGrowableWriter entries;
+    public long copiedBytes;
+    // storing the FST node address where the position is the masked hash of the node arcs
+    private PagedGrowableWriter fstNodeAddress;
+    // storing the local copiedNodes address in the same position as fstNodeAddress
+    // here we are effectively storing a Map<Long, Long> from the FST node address to copiedNodes address
+    private PagedGrowableWriter copiedNodeAddress;
     private long count;
     private long mask;
+    // storing the byte slice from the FST for nodes we added to the hash so that we don't need to
+    // look up from the FST itself. each node will be written subsequently
+    private final ByteBlockPool copiedNodes;
 
     // 256K blocks, but note that the final block is sized only as needed so it won't use the full
     // block size when just a few elements were written to it
     private static final int BLOCK_SIZE_BYTES = 1 << 18;
 
     public PagedGrowableHash() {
-      entries = new PagedGrowableWriter(16, BLOCK_SIZE_BYTES, 8, PackedInts.COMPACT);
+      fstNodeAddress = new PagedGrowableWriter(16, BLOCK_SIZE_BYTES, 8, PackedInts.COMPACT);
+      copiedNodeAddress = new PagedGrowableWriter(16, BLOCK_SIZE_BYTES, 8, PackedInts.COMPACT);
       mask = 15;
+      copiedNodes = new ByteBlockPool(new ByteBlockPool.DirectAllocator());
     }
 
     public PagedGrowableHash(long lastNodeAddress, long size) {
-      entries =
+      fstNodeAddress =
           new PagedGrowableWriter(
               size, BLOCK_SIZE_BYTES, PackedInts.bitsRequired(lastNodeAddress), PackedInts.COMPACT);
+      copiedNodeAddress = new PagedGrowableWriter(size, BLOCK_SIZE_BYTES, 8, PackedInts.COMPACT);
       mask = size - 1;
       assert (mask & size) == 0 : "size must be a power-of-2; got size=" + size + " mask=" + mask;
+      copiedNodes = new ByteBlockPool(new ByteBlockPool.DirectAllocator());
+    }
+
+    public byte[] getBytes(long pos, int length) {
+      long address = copiedNodeAddress.get(pos);
+      byte[] buf = new byte[length];
+      copiedNodes.readBytes(address - length + 1, buf, 0, length);
+      return buf;
     }
 
     public long get(long index) {
-      return entries.get(index);
+      return fstNodeAddress.get(index);
     }
 
-    public void set(long index, long pointer) throws IOException {
-      entries.set(index, pointer);
+    public void set(long index, long pointer, byte[] bytes) {
+      fstNodeAddress.set(index, pointer);
       count++;
+      copiedNodes.append(new BytesRef(bytes));
+      copiedBytes += bytes.length;
+      // write the offset, which is the last offset of the node
+      copiedNodeAddress.set(index, copiedBytes - 1);
     }
 
     private void rehash(long lastNodeAddress) throws IOException {
+      // TODO: https://github.com/apache/lucene/issues/12744
+      // should we always use a small startBitsPerValue here (e.g. 8) instead of basing it off lastNodeAddress?
+
       // double hash table size on each rehash
+      long newSize = 2 * fstNodeAddress.size();
+      PagedGrowableWriter newCopiedOffsets =
+          new PagedGrowableWriter(
+              newSize, BLOCK_SIZE_BYTES, PackedInts.bitsRequired(copiedBytes), PackedInts.COMPACT);
       PagedGrowableWriter newEntries =

Review Comment:
   Can we rename to `newFSTNodeAddress`?
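For context, the rehash pattern under discussion (double the table, recompute the power-of-two mask, reinsert every entry) can be sketched with a plain open-addressing table. This is a hypothetical standalone illustration, not Lucene's `PagedGrowableWriter`-backed implementation:

```java
// Hypothetical sketch of the rehash pattern in NodeHash: double the table on
// each rehash, recompute mask = newSize - 1 (a power of two), and reinsert
// every non-zero entry by linear probing from its hash masked by the new mask.
// Zero is the empty-slot marker, mirroring NodeHash's convention.
class LongOpenHash {
    long[] table = new long[16];
    long mask = 15;

    void add(long value) {
        long slot = Long.hashCode(value) & mask;
        while (table[(int) slot] != 0) {
            slot = (slot + 1) & mask; // linear probe
        }
        table[(int) slot] = value;
    }

    void rehash() {
        long[] old = table;
        table = new long[old.length * 2]; // double hash table size on each rehash
        mask = table.length - 1;          // still a power of two
        for (long v : old) {
            if (v != 0) { // 0 marks an empty slot
                long slot = Long.hashCode(v) & mask;
                while (table[(int) slot] != 0) {
                    slot = (slot + 1) & mask;
                }
                table[(int) slot] = v;
            }
        }
    }

    boolean contains(long value) {
        long slot = Long.hashCode(value) & mask;
        while (table[(int) slot] != 0) {
            if (table[(int) slot] == value) return true;
            slot = (slot + 1) & mask;
        }
        return false;
    }
}
```

The PR does the same dance twice in parallel (`fstNodeAddress` and `copiedNodeAddress`), which is why the reviewer cares that the two new tables get unambiguous names.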



##########
lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java:
##########
@@ -328,7 +298,100 @@ private void rehash(long lastNodeAddress) throws IOException {
       }
 
       mask = newMask;
-      entries = newEntries;
+      fstNodeAddress = newEntries;
+      copiedNodeAddress = newCopiedOffsets;
+    }
+
+    // hash code for a frozen node.  this must precisely match the hash computation of an unfrozen
+    // node!
+    private long hash(long node, long pos) throws IOException {
+      FST.BytesReader in = getBytesReader(node, pos);
+
+      final int PRIME = 31;
+
+      long h = 0;
+      fstCompiler.fst.readFirstRealTargetArc(node, scratchArc, in);
+      while (true) {
+        h = PRIME * h + scratchArc.label();
+        h = PRIME * h + (int) (scratchArc.target() ^ (scratchArc.target() >> 32));
+        h = PRIME * h + scratchArc.output().hashCode();
+        h = PRIME * h + scratchArc.nextFinalOutput().hashCode();
+        if (scratchArc.isFinal()) {
+          h += 17;
+        }
+        if (scratchArc.isLast()) {
+          break;
+        }
+        fstCompiler.fst.readNextRealArc(scratchArc, in);
+      }
+
+      return h;
+    }
+
+    /**
+     * Compares an unfrozen node (UnCompiledNode) with a frozen node at byte location address
+     * (long), returning the node length if the two nodes are matched, or -1 otherwise
+     */
+    private int getMatchedNodeLength(FSTCompiler.UnCompiledNode<T> node, long address, long pos)

Review Comment:
   Can we change the name back to `nodesEqual`, and just let the javadoc / comment explain the returned `long` semantics?  I think this original name better reflects the method's purpose :)
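The invariant stated in the comment above the hash function (a frozen node's hash must precisely match the unfrozen node's) holds because both code paths fold the same arc tuples with the same constants. A hypothetical standalone sketch of that folding (`ArcHash` and its `Arc` record are invented names; outputs are simplified to ints):

```java
// Hypothetical sketch of NodeHash's hash invariant: frozen and unfrozen nodes
// are both reduced to the same sequence of arc tuples, folded with PRIME = 31
// and an extra +17 for final arcs, so the two hash paths agree term by term.
class ArcHash {
    record Arc(int label, long target, int output, int nextFinalOutput, boolean isFinal) {}

    static long hash(Arc[] arcs) {
        final int PRIME = 31;
        long h = 0;
        for (Arc arc : arcs) {
            h = PRIME * h + arc.label();
            h = PRIME * h + (int) (arc.target() ^ (arc.target() >> 32));
            h = PRIME * h + Integer.hashCode(arc.output());
            h = PRIME * h + Integer.hashCode(arc.nextFinalOutput());
            if (arc.isFinal()) {
                h += 17; // final arcs perturb the hash without a multiply
            }
        }
        return h;
    }
}
```

Because the fold is a polynomial in 31 (an odd base), changing any single addend changes the result by a nonzero multiple of an odd power of 31, so distinct single-field edits cannot cancel out.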



##########
lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java:
##########
@@ -328,7 +298,100 @@ private void rehash(long lastNodeAddress) throws IOException {
       }
 
       mask = newMask;
-      entries = newEntries;
+      fstNodeAddress = newEntries;
+      copiedNodeAddress = newCopiedOffsets;
+    }
+
+    // hash code for a frozen node.  this must precisely match the hash computation of an unfrozen
+    // node!
+    private long hash(long node, long pos) throws IOException {
+      FST.BytesReader in = getBytesReader(node, pos);
+
+      final int PRIME = 31;
+
+      long h = 0;
+      fstCompiler.fst.readFirstRealTargetArc(node, scratchArc, in);
+      while (true) {
+        h = PRIME * h + scratchArc.label();
+        h = PRIME * h + (int) (scratchArc.target() ^ (scratchArc.target() >> 32));
+        h = PRIME * h + scratchArc.output().hashCode();
+        h = PRIME * h + scratchArc.nextFinalOutput().hashCode();
+        if (scratchArc.isFinal()) {
+          h += 17;
+        }
+        if (scratchArc.isLast()) {
+          break;
+        }
+        fstCompiler.fst.readNextRealArc(scratchArc, in);
+      }
+
+      return h;
+    }
+
+    /**
+     * Compares an unfrozen node (UnCompiledNode) with a frozen node at byte location address
+     * (long), returning the node length if the two nodes are matched, or -1 otherwise
+     */
+    private int getMatchedNodeLength(FSTCompiler.UnCompiledNode<T> node, long address, long pos)
+        throws IOException {
+      FST.BytesReader in = getBytesReader(address, pos);
+      fstCompiler.fst.readFirstRealTargetArc(address, scratchArc, in);
+
+      // fail fast for a node with fixed length arcs
+      if (scratchArc.bytesPerArc() != 0) {
+        assert node.numArcs > 0;
+        // the frozen node uses fixed-width arc encoding (same number of bytes per arc), but may be
+        // sparse or dense
+        switch (scratchArc.nodeFlags()) {
+          case FST.ARCS_FOR_BINARY_SEARCH:
+            // sparse
+            if (node.numArcs != scratchArc.numArcs()) {
+              return -1;
+            }
+            break;
+          case FST.ARCS_FOR_DIRECT_ADDRESSING:
+            // dense -- compare both the number of labels allocated in the array (some of which may
+            // not actually be arcs), and the number of arcs
+            if ((node.arcs[node.numArcs - 1].label - node.arcs[0].label + 1) != scratchArc.numArcs()
+                || node.numArcs != FST.Arc.BitTable.countBits(scratchArc, in)) {
+              return -1;
+            }
+            break;
+          default:
+            throw new AssertionError("unhandled scratchArc.nodeFlag() " + scratchArc.nodeFlags());
+        }
+      }
+
+      // compare arc by arc to see if there is a difference
+      for (int arcUpto = 0; arcUpto < node.numArcs; arcUpto++) {
+        final FSTCompiler.Arc<T> arc = node.arcs[arcUpto];
+        if (arc.label != scratchArc.label()
+            || arc.output.equals(scratchArc.output()) == false
+            || ((FSTCompiler.CompiledNode) arc.target).node != scratchArc.target()
+            || arc.nextFinalOutput.equals(scratchArc.nextFinalOutput()) == false
+            || arc.isFinal != scratchArc.isFinal()) {
+          return -1;
+        }
+
+        if (scratchArc.isLast()) {
+          if (arcUpto == node.numArcs - 1) {
+            return Math.toIntExact(address - in.getPosition() + 1);
+          } else {
+            return -1;
+          }
+        }
+
+        fstCompiler.fst.readNextRealArc(scratchArc, in);
+      }
+
+      // unfrozen node has fewer arcs than frozen node
+
+      return -1;
+    }
+
+    private FST.BytesReader getBytesReader(long address, long pos) {
+      long localAddress = copiedNodeAddress.get(pos);
+      return new ByteBlockPoolReverseBytesReader(copiedNodes, address - localAddress);

Review Comment:
   Must we create a new one each time?  Or could we have a reused instance for the lifetime of the `NodeHash`?



##########
lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java:
##########
@@ -49,14 +51,15 @@ final class NodeHash<T> {
 
   private final FSTCompiler<T> fstCompiler;
   private final FST.Arc<T> scratchArc = new FST.Arc<>();
-  private final FST.BytesReader in;
+  // store the last fallback table node length in getFallback()
+  private int lastFallbackNodeLength;
 
   /**
   * ramLimitMB is the max RAM we can use for recording suffixes. If we hit this limit, the least
   * recently used suffixes are discarded, and the FST is no longer minimal. Still, larger
   * ramLimitMB will make the FST smaller (closer to minimal).
    */
-  public NodeHash(FSTCompiler<T> fstCompiler, double ramLimitMB, FST.BytesReader in) {
+  public NodeHash(FSTCompiler<T> fstCompiler, double ramLimitMB) {

Review Comment:
   So cool not to pass in an `FST.BytesReader` anymore!  It shows the decoupling well :)



##########
lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java:
##########
@@ -328,7 +298,100 @@ private void rehash(long lastNodeAddress) throws IOException {
       }
 
       mask = newMask;
-      entries = newEntries;
+      fstNodeAddress = newEntries;
+      copiedNodeAddress = newCopiedOffsets;
+    }
+
+    // hash code for a frozen node.  this must precisely match the hash computation of an unfrozen
+    // node!
+    private long hash(long node, long pos) throws IOException {
+      FST.BytesReader in = getBytesReader(node, pos);

Review Comment:
   Instead of having the `hash` function pull this reader, can we fix the caller to pass it in?  This way the hash function only gets a `BytesReader`, and the hairiness of this `hashSlot` is more contained (to the caller).



##########
lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java:
##########
@@ -85,9 +87,14 @@ private long getFallback(FSTCompiler.UnCompiledNode<T> nodeIn, long hash) throws
       if (node == 0) {
         // not found
         return 0;
-      } else if (nodesEqual(nodeIn, node)) {
-        // frozen version of this node is already here
-        return node;
+      } else {
+        int length = fallbackTable.getMatchedNodeLength(nodeIn, node, pos);
+        if (length != -1) {
+          // store the node length for further use
+          this.lastFallbackNodeLength = length;

Review Comment:
   Could we set `this.lastFallbackNodeLength = -1` at the start of this method?  Let's not risk the prior lookup's length somehow being seen for the wrong node?
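The defensive reset being suggested might look like this hypothetical, heavily simplified sketch (the real `getFallback` takes a node and its hash and probes the fallback table; here the match result is just passed in to keep the illustration self-contained):

```java
// Hypothetical sketch of the reviewer's suggestion: clear the cached length at
// the top of the lookup so a stale value from a previous call can never be
// observed for the wrong node.
class FallbackLookup {
    int lastFallbackNodeLength = -1;

    long getFallback(long node, int matchedLength) {
        lastFallbackNodeLength = -1; // reset up front; never reuse a stale length
        if (node == 0) {
            return 0; // not found
        }
        if (matchedLength != -1) {
            lastFallbackNodeLength = matchedLength; // store the node length for further use
            return node;
        }
        return 0;
    }
}
```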



##########
lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java:
##########
@@ -328,7 +298,100 @@ private void rehash(long lastNodeAddress) throws IOException {
       }
 
       mask = newMask;
-      entries = newEntries;
+      fstNodeAddress = newEntries;
+      copiedNodeAddress = newCopiedOffsets;
+    }
+
+    // hash code for a frozen node.  this must precisely match the hash computation of an unfrozen
+    // node!
+    private long hash(long node, long pos) throws IOException {

Review Comment:
   I think the `long pos` here is the "hash mod position" / slot in the parallel hash tables?  Could we rename everywhere to `long hashSlot` maybe?  `pos` and `node` and `address` are all confusing here :)



##########
lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java:
##########
@@ -110,25 +117,39 @@ public long add(FSTCompiler.UnCompiledNode<T> nodeIn) throws IOException {
         node = getFallback(nodeIn, hash);
         if (node != 0) {
           // it was already in fallback -- promote to primary
-          primaryTable.set(pos, node);
+          // TODO: Copy directly between 2 ByteBlockPool to avoid double-copy
+          primaryTable.set(pos, node, fallbackTable.getBytes(pos, lastFallbackNodeLength));
         } else {
           // not in fallback either -- freeze & add the incoming node
 
+          long startAddress = fstCompiler.bytes.getPosition();
           // freeze & add
           node = fstCompiler.addNode(nodeIn);
 
+          // TODO: Write the bytes directly from BytesStore
+          // we use 0 as empty marker in hash table, so it better be impossible to get a frozen node
           // at 0:
-          assert node != 0;
+          assert node != FST.FINAL_END_NODE && node != FST.NON_FINAL_END_NODE;
+          byte[] buf = new byte[Math.toIntExact(node - startAddress + 1)];

Review Comment:
   Hmm why the `+1` here?
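One plausible reading of the `+1` (worth confirming against the PR): `addNode` returns the address of the last byte written for the frozen node, while `startAddress` is the address of its first byte, so both bounds are inclusive and the span length needs the `+1`. A trivial illustration with a hypothetical helper:

```java
// Hypothetical illustration of the `+1`: if startAddress is the (inclusive)
// first byte of the frozen node and lastAddress the (inclusive) last byte,
// the number of bytes to copy is last - first + 1.
class InclusiveSpan {
    static int length(long startAddress, long lastAddress) {
        return Math.toIntExact(lastAddress - startAddress + 1);
    }
}
```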



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

