gf2121 commented on code in PR #14494: URL: https://github.com/apache/lucene/pull/14494#discussion_r2074987659
##########
lucene/core/src/java/org/apache/lucene/codecs/lucene103/blocktree/TrieReader.java:
##########
@@ -74,14 +77,39 @@ IndexInput floorData(TrieReader r) throws IOException {
   final RandomAccessInput access;
   final IndexInput input;
   final Node root;
+  final int[] labelMap;
 
-  TrieReader(IndexInput input, long rootFP) throws IOException {
+  static IOSupplier<TrieReader> readerSupplier(DataInput metaIn, IndexInput indexIn)
+      throws IOException {
+    int[] labelMap = TrieReader.labelMap(metaIn);
+    long start = metaIn.readVLong();
+    long rootFP = metaIn.readVLong();
+    long end = metaIn.readVLong();
+    return () -> new TrieReader(indexIn.slice("outputs", start, end - start), rootFP, labelMap);
+  }
+
+  private TrieReader(IndexInput input, long rootFP, int[] labelMap) throws IOException {
     this.access = input.randomAccessSlice(0, input.length());
+    this.labelMap = labelMap;
     this.input = input;
     this.root = new Node();
     load(root, rootFP);
   }
 
+  private static int[] labelMap(DataInput in) throws IOException {
+    int cnt = in.readVInt();
+    if (cnt == 0) {
+      return null;
+    } else {
+      int[] labelMap = new int[TrieBuilder.BYTE_RANGE];

Review Comment:
   For now, we need a sentinel value such as `-1` to represent 'this label does not exist in this trie', so the array cannot simply be replaced by a `byte[]`. I personally think 256 * 4 = 1KB of heap per field is OK. We could reduce the heap usage at the cost of some lookup overhead, for example with a `bitset` recording whether each value exists plus a `byte[]` to map the values. I can make the change if you think it is worth it :)

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
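
For reference, the bitset-plus-`byte[]` alternative mentioned in the review comment might look roughly like the sketch below. This is not code from the PR: the class name `CompactLabelMap` and its methods are hypothetical, and it assumes every mapped value fits in 0..255 so that a `byte` can hold it. The payload shrinks from 1KB to about 288 bytes (32 bytes of bits plus 256 bytes of values, ignoring object headers), in exchange for an extra bit test on each lookup.

```java
/** Hypothetical sketch of a compact label map; not the implementation in the PR. */
final class CompactLabelMap {
  // 256 bits, one per possible label byte: set means "label exists in this trie".
  private final long[] present = new long[4];
  // Mapped value for each label; only meaningful where the matching bit is set.
  private final byte[] mapped = new byte[256];

  /**
   * Builds the compact form from an int[256] map in which -1 marks an absent label.
   * Assumption: all non-negative entries are in the range 0..255.
   */
  CompactLabelMap(int[] labelMap) {
    for (int label = 0; label < 256; label++) {
      int value = labelMap[label];
      if (value >= 0) {
        present[label >>> 6] |= 1L << (label & 63);
        mapped[label] = (byte) value;
      }
    }
  }

  /** Returns the mapped value in 0..255, or -1 if the label does not exist. */
  int get(int label) {
    if ((present[label >>> 6] & (1L << (label & 63))) == 0) {
      return -1;
    }
    return mapped[label] & 0xFF; // undo sign extension of the stored byte
  }
}
```

The bit test keeps the `-1` contract of the original `int[]` lookup, so callers would not need to change; whether the saved heap justifies the extra branch is the trade-off raised in the comment.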
To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
For additional commands, e-mail: issues-h...@lucene.apache.org