mikemccand commented on code in PR #12738:
URL: https://github.com/apache/lucene/pull/12738#discussion_r1377721562


##########
lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java:
##########
@@ -269,36 +283,58 @@ private boolean nodesEqual(FSTCompiler.UnCompiledNode<T> node, long address) thr
     return false;
   }
 
+  record OffsetAndLength(long offset, int length) {}
+
   /** Inner class because it needs access to hash function and FST bytes. */
-  private class PagedGrowableHash {
+  class PagedGrowableHash {
     private PagedGrowableWriter entries;
-    private long count;
+    // nocommit: use PagedGrowableWriter? there was some size overflow issue with
+    // PagedGrowableWriter
+    // mapping from FST real address to copiedNodes offsets & length
+    private Map<Long, OffsetAndLength> copiedOffsets;
+    long count;
+    long currentOffsets;
     private long mask;
+    private final ByteBlockPool copiedNodes;
 
     // 256K blocks, but note that the final block is sized only as needed so it won't use the full
     // block size when just a few elements were written to it
     private static final int BLOCK_SIZE_BYTES = 1 << 18;
 
     public PagedGrowableHash() {
       entries = new PagedGrowableWriter(16, BLOCK_SIZE_BYTES, 8, PackedInts.COMPACT);
+      copiedOffsets = new HashMap<>();
       mask = 15;
+      copiedNodes = new ByteBlockPool(new ByteBlockPool.DirectAllocator());
     }
 
     public PagedGrowableHash(long lastNodeAddress, long size) {
       entries =
           new PagedGrowableWriter(
               size, BLOCK_SIZE_BYTES, PackedInts.bitsRequired(lastNodeAddress), PackedInts.COMPACT);
+      copiedOffsets = new HashMap<>();
       mask = size - 1;
       assert (mask & size) == 0 : "size must be a power-of-2; got size=" + size + " mask=" + mask;
+      copiedNodes = new ByteBlockPool(new ByteBlockPool.DirectAllocator());
+    }
+
+    public byte[] getBytes(long index) {
+      OffsetAndLength offsetAndLength = copiedOffsets.get(index);

Review Comment:
   > But when we want to promote from the fallback table to the primary table, having a length is easier I think
   
   Actually, there is a way to recover the length: when promoting from the fallback table, you first have to locate the entry in the fallback, which entails reading the self-delimiting `byte[]` during `nodesEqual`. So if you record, as a side effect, how far that read went, you know its length?
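
A minimal sketch of that idea, under stated assumptions: the class `SelfDelimitingCompare`, the field `lastMatchedLength`, the helper `encode`, and the two-bytes-per-arc layout with a `BIT_LAST_ARC` flag are all hypothetical and only stand in for the real node encoding. The point is just that the comparison already walks the stored bytes, so the number of bytes consumed can be captured for free.

```java
/** Hypothetical illustration only; this is not the NodeHash code in the PR. */
class SelfDelimitingCompare {
  /** Assumed flag bit that marks the final arc of a stored node. */
  static final int BIT_LAST_ARC = 1;

  /** Bytes consumed by the most recent successful match, or -1 after a mismatch. */
  int lastMatchedLength = -1;

  /** Encodes each label as a (flags, label) pair; the last pair carries BIT_LAST_ARC. */
  static byte[] encode(byte[] labels) {
    byte[] out = new byte[labels.length * 2];
    for (int i = 0; i < labels.length; i++) {
      out[2 * i] = (byte) (i == labels.length - 1 ? BIT_LAST_ARC : 0);
      out[2 * i + 1] = labels[i];
    }
    return out;
  }

  /**
   * Compares the candidate labels against the node stored at {@code offset} in {@code pool}.
   * On a match, records how many bytes were read so the caller knows the entry's length.
   */
  boolean nodesEqual(byte[] labels, byte[] pool, int offset) {
    int pos = offset;
    for (int i = 0; i < labels.length; i++) {
      byte flags = pool[pos++];
      byte label = pool[pos++];
      boolean storedIsLast = (flags & BIT_LAST_ARC) != 0;
      boolean candidateIsLast = i == labels.length - 1;
      if (label != labels[i] || storedIsLast != candidateIsLast) {
        lastMatchedLength = -1;
        return false;
      }
    }
    // Side effect: how far the read went is exactly the stored entry's length.
    lastMatchedLength = pos - offset;
    return true;
  }
}
```

With something like this, promotion could copy `lastMatchedLength` bytes starting at the matched offset instead of looking up a separately stored length.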



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

