keikino commented on code in PR #14494: URL: https://github.com/apache/lucene/pull/14494#discussion_r2076665581
##########
lucene/core/src/java/org/apache/lucene/codecs/lucene103/blocktree/TrieReader.java:
##########
@@ -74,14 +77,39 @@ IndexInput floorData(TrieReader r) throws IOException {
   final RandomAccessInput access;
   final IndexInput input;
   final Node root;
+  final int[] labelMap;
-  TrieReader(IndexInput input, long rootFP) throws IOException {
+  static IOSupplier<TrieReader> readerSupplier(DataInput metaIn, IndexInput indexIn)
+      throws IOException {
+    int[] labelMap = TrieReader.labelMap(metaIn);
+    long start = metaIn.readVLong();
+    long rootFP = metaIn.readVLong();
+    long end = metaIn.readVLong();
+    return () -> new TrieReader(indexIn.slice("outputs", start, end - start), rootFP, labelMap);
+  }
+
+  private TrieReader(IndexInput input, long rootFP, int[] labelMap) throws IOException {
     this.access = input.randomAccessSlice(0, input.length());
+    this.labelMap = labelMap;
     this.input = input;
     this.root = new Node();
     load(root, rootFP);
   }
+  private static int[] labelMap(DataInput in) throws IOException {
+    int cnt = in.readVInt();
+    if (cnt == 0) {
+      return null;
+    } else {
+      int[] labelMap = new int[TrieBuilder.BYTE_RANGE];

Review Comment:
If the main reason for using an `int[]` is that we want to check for existence, and a `byte[]` would otherwise give us the same functionality, then I agree that adding something like a `bitset` for the existence check would be better:

256-bit `bitset`: 256/8 = 32 bytes + overhead = ~48 bytes
256-entry `byte[]`: 256 bytes + overhead = ~272 bytes
sum: ~320 bytes

vs

256-entry `int[]`: 256*4 = 1024 bytes + overhead = ~1040 bytes

We'd get a ~70% reduction in memory footprint by using the `bitset` + `byte[]`, and since this compounds across all of the indexes, I think it's definitely a worthwhile memory optimization!
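To make the suggestion concrete, here is a minimal sketch of the `bitset` + `byte[]` layout. It is not code from the PR: the class name `CompactLabelMap` and its methods are hypothetical, and it assumes the map only needs to store one value in the range 0..255 per label and to tell present labels from absent ones. A plain `long[4]` stands in for the bitset so the sketch stays self-contained.

```java
/** Hypothetical sketch: a 256-entry label map backed by a bitset plus a byte[]. */
final class CompactLabelMap {
  // Assumed to match TrieBuilder.BYTE_RANGE (one slot per possible byte value).
  private static final int BYTE_RANGE = 256;

  // One bit per possible label: 256 bits = 4 longs = 32 bytes of payload.
  private final long[] present = new long[BYTE_RANGE / Long.SIZE];
  // Mapped value per label, stored as an unsigned byte: 256 bytes of payload.
  private final byte[] mapped = new byte[BYTE_RANGE];

  /** Records that {@code label} exists and maps to {@code value} (both in [0, 255]). */
  void put(int label, int value) {
    checkByte(label);
    checkByte(value);
    present[label >>> 6] |= 1L << (label & 63);
    mapped[label] = (byte) value;
  }

  /** Existence check: this is the job the sentinel in the int[] would otherwise do. */
  boolean contains(int label) {
    checkByte(label);
    return (present[label >>> 6] & (1L << (label & 63))) != 0;
  }

  /** Returns the mapped value in [0, 255], or -1 if the label is absent. */
  int get(int label) {
    return contains(label) ? (mapped[label] & 0xFF) : -1;
  }

  private static void checkByte(int v) {
    if (v < 0 || v >= BYTE_RANGE) {
      throw new IllegalArgumentException("out of byte range: " + v);
    }
  }
}
```

The trade-off is one extra bit test per lookup in exchange for roughly a third of the heap footprint of the `int[]`. Whether that is worth it on the hot path would need a benchmark.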