Re: [PR] Introduce a mapping to map sparse labels to a continuous range [lucene]

via GitHub Tue, 06 May 2025 19:34:56 -0700


keikino commented on code in PR #14494:
URL: https://github.com/apache/lucene/pull/14494#discussion_r2076665581



##########
lucene/core/src/java/org/apache/lucene/codecs/lucene103/blocktree/TrieReader.java:
##########
@@ -74,14 +77,39 @@ IndexInput floorData(TrieReader r) throws IOException {
   final RandomAccessInput access;
   final IndexInput input;
   final Node root;
+  final int[] labelMap;
 
-  TrieReader(IndexInput input, long rootFP) throws IOException {
+  static IOSupplier<TrieReader> readerSupplier(DataInput metaIn, IndexInput indexIn)
+      throws IOException {
+    int[] labelMap = TrieReader.labelMap(metaIn);
+    long start = metaIn.readVLong();
+    long rootFP = metaIn.readVLong();
+    long end = metaIn.readVLong();
+    return () -> new TrieReader(indexIn.slice("outputs", start, end - start), rootFP, labelMap);
+  }
+
+  private TrieReader(IndexInput input, long rootFP, int[] labelMap) throws IOException {
     this.access = input.randomAccessSlice(0, input.length());
+    this.labelMap = labelMap;
     this.input = input;
     this.root = new Node();
     load(root, rootFP);
   }
 
+  private static int[] labelMap(DataInput in) throws IOException {
+    int cnt = in.readVInt();
+    if (cnt == 0) {
+      return null;
+    } else {
+      int[] labelMap = new int[TrieBuilder.BYTE_RANGE];

Review Comment:
   If the main reason for using an `int[]` is to check for existence, and we'd
otherwise get the same functionality from a `byte[]`, I agree that a `bitset` for
the existence check alongside the `byte[]` would be better:
   
   256-bit `bitset`: 256/64 = 4 longs * 8 bytes + overhead = ~48 bytes
   256-entry `byte[]`: 256 bytes + overhead = ~272 bytes
   total: ~320 bytes
   
   vs
   
   256-entry `int[]`: 256 * 4 bytes + overhead = ~1040 bytes
   
   We'd get roughly a 70% reduction in memory footprint by using the
`bitset`+`byte[]`, and since the saving is repeated across all of the indexes, I
think it's definitely a worthwhile memory optimization!
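   A minimal sketch of the `bitset`+`byte[]` layout suggested above, in plain Java. Everything here is hypothetical and not from the PR: the class name `CompactLabelMap`, its methods, and the use of a raw `long[4]` instead of Lucene's `FixedBitSet`. It assumes labels are byte values 0..255 supplied in strictly ascending order. The presence bit answers the existence check that the `int[]` presumably handles with a sentinel value, and the `byte[]` holds the dense index, read back as unsigned.

```java
/**
 * Hypothetical sketch: maps sparse byte labels (0..255) onto a dense range
 * 0..n-1 using a 256-bit presence bitset plus a byte[] of dense indices.
 */
final class CompactLabelMap {
  private static final int BYTE_RANGE = 256;

  // 256 presence bits packed into 4 longs (32 bytes of payload).
  private final long[] present = new long[BYTE_RANGE / Long.SIZE];
  // Dense index per label; only meaningful when the label's presence bit is set.
  private final byte[] denseIndex = new byte[BYTE_RANGE];

  /** Assumes labels are distinct, in [0, 255], and sorted ascending. */
  CompactLabelMap(int[] sortedLabels) {
    int prev = -1;
    for (int i = 0; i < sortedLabels.length; i++) {
      int label = sortedLabels[i];
      if (label <= prev || label >= BYTE_RANGE) {
        throw new IllegalArgumentException(
            "labels must be strictly increasing and in [0, 255]: " + label);
      }
      prev = label;
      // label >>> 6 picks the word; Java masks long shift counts to label & 63.
      present[label >>> 6] |= 1L << label;
      // i is at most 255, so it fits in a byte when read back as unsigned.
      denseIndex[label] = (byte) i;
    }
  }

  /** Existence check: is this label present in the trie? */
  boolean contains(int label) {
    return (present[label >>> 6] & (1L << label)) != 0;
  }

  /** Returns the dense index of the label, or -1 if it is absent. */
  int map(int label) {
    return contains(label) ? Byte.toUnsignedInt(denseIndex[label]) : -1;
  }
}
```

   Because there can be up to 256 labels, all 256 byte values may be needed as dense indices, so no byte value is left over to act as an "absent" sentinel. That is why the existence check lives in the separate bitset rather than in the `byte[]` itself.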



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
For additional commands, e-mail: issues-h...@lucene.apache.org
