dungba88 commented on code in PR #12738:
URL: https://github.com/apache/lucene/pull/12738#discussion_r1377360860


##########
lucene/core/src/java/org/apache/lucene/util/fst/NodeHash.java:
##########
@@ -269,36 +283,58 @@ private boolean nodesEqual(FSTCompiler.UnCompiledNode<T> node, long address) thr
     return false;
   }
 
+  record OffsetAndLength(long offset, int length) {}
+
   /** Inner class because it needs access to hash function and FST bytes. */
-  private class PagedGrowableHash {
+  class PagedGrowableHash {
     private PagedGrowableWriter entries;
-    private long count;
+    // nocommit: use PagedGrowableWriter? there was some size overflow issue with
+    // PagedGrowableWriter
+    // mapping from FST real address to copiedNodes offsets & length
+    private Map<Long, OffsetAndLength> copiedOffsets;
+    long count;
+    long currentOffsets;
     private long mask;
+    private final ByteBlockPool copiedNodes;
 
     // 256K blocks, but note that the final block is sized only as needed so it won't use the full
     // block size when just a few elements were written to it
     private static final int BLOCK_SIZE_BYTES = 1 << 18;
 
     public PagedGrowableHash() {
       entries = new PagedGrowableWriter(16, BLOCK_SIZE_BYTES, 8, PackedInts.COMPACT);
+      copiedOffsets = new HashMap<>();
       mask = 15;
+      copiedNodes = new ByteBlockPool(new ByteBlockPool.DirectAllocator());
     }
 
     public PagedGrowableHash(long lastNodeAddress, long size) {
       entries =
           new PagedGrowableWriter(
               size, BLOCK_SIZE_BYTES, PackedInts.bitsRequired(lastNodeAddress), PackedInts.COMPACT);
+      copiedOffsets = new HashMap<>();
       mask = size - 1;
       assert (mask & size) == 0 : "size must be a power-of-2; got size=" + size + " mask=" + mask;
+      copiedNodes = new ByteBlockPool(new ByteBlockPool.DirectAllocator());
+    }
+
+    public byte[] getBytes(long index) {
+      OffsetAndLength offsetAndLength = copiedOffsets.get(index);

Review Comment:
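   The node is self-delimiting, so it's fine when reading it with the FST
operations. But when we promote a node from the fallback table to the primary
table, I think having an explicit length is easier, because the bytes can be
copied directly without decoding the node. The alternative would be a special
marker byte that I can check against to find where the node ends.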
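
   To illustrate the point about promotion, here is a minimal, self-contained
sketch. It is not the code in this PR: `CopiedNodeTable`, `put`, and `promote`
are hypothetical names, and a plain `byte[]` stands in for the `ByteBlockPool`.
It only shows that with a stored (offset, length) pair, moving a node between
tables is a plain byte copy with no decoding.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a table that keeps copies of node bytes. */
final class CopiedNodeTable {
  /** Where a copied node starts in the store and how many bytes it occupies. */
  record OffsetAndLength(int offset, int length) {}

  // Simplified byte store; the PR uses a ByteBlockPool instead (assumption: a
  // growable array is enough to show the idea).
  private byte[] store = new byte[16];
  private int used = 0;
  private final Map<Long, OffsetAndLength> offsets = new HashMap<>();

  /** Appends the node bytes and records their offset and length under the node address. */
  void put(long nodeAddress, byte[] nodeBytes) {
    int needed = used + nodeBytes.length;
    if (needed > store.length) {
      store = Arrays.copyOf(store, Math.max(store.length * 2, needed));
    }
    System.arraycopy(nodeBytes, 0, store, used, nodeBytes.length);
    offsets.put(nodeAddress, new OffsetAndLength(used, nodeBytes.length));
    used = needed;
  }

  /** Returns a copy of the node bytes, or null if the address is unknown. */
  byte[] getBytes(long nodeAddress) {
    OffsetAndLength ol = offsets.get(nodeAddress);
    if (ol == null) {
      return null;
    }
    return Arrays.copyOfRange(store, ol.offset(), ol.offset() + ol.length());
  }

  /**
   * Copies a node from the fallback table into the primary table. Because the
   * length is stored, the node never has to be parsed to find where it ends.
   */
  static boolean promote(long nodeAddress, CopiedNodeTable fallback, CopiedNodeTable primary) {
    byte[] bytes = fallback.getBytes(nodeAddress);
    if (bytes == null) {
      return false;
    }
    primary.put(nodeAddress, bytes);
    return true;
  }
}
```

   Without the stored length, `promote` would have to walk the node's arcs (or
look for an end marker) just to know how many bytes to copy.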



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

