[GitHub] [lucene-solr] mikemccand commented on a change in pull request #1234: Add compression for Binary doc value fields

GitBox Tue, 04 Feb 2020 07:46:01 -0800

mikemccand commented on a change in pull request #1234: Add compression for Binary doc value fields
URL: https://github.com/apache/lucene-solr/pull/1234#discussion_r374747203


 ##########
 File path: lucene/core/src/java/org/apache/lucene/codecs/lucene80/Lucene80DocValuesProducer.java
 ##########
 @@ -742,6 +755,107 @@ public BytesRef binaryValue() throws IOException {
         };
       }
     }
+  }  
+  
+  // Decompresses blocks of binary values to retrieve content
+  class BinaryDecoder {
+    
+    private final LongValues addresses;
+    private final IndexInput compressedData;
+    // Cache of last uncompressed block 
+    private long lastBlockId = -1;
+    private int []uncompressedDocEnds = new int[Lucene80DocValuesFormat.BINARY_DOCS_PER_COMPRESSED_BLOCK];
+    private int uncompressedBlockLength = 0;        
+    private int numDocsInBlock = 0;
+    private final byte[] uncompressedBlock;
+    private BytesRef uncompressedBytesRef;
+    
+    public BinaryDecoder(LongValues addresses, IndexInput compressedData, int biggestUncompressedBlockSize) {
+      super();
+      this.addresses = addresses;
+      this.compressedData = compressedData;
+      // pre-allocate a byte array large enough for the biggest uncompressed block needed.
+      this.uncompressedBlock = new byte[biggestUncompressedBlockSize];
+      
+    }
+
+    BytesRef decode(int docNumber) throws IOException {
+      int blockId = docNumber >> Lucene80DocValuesFormat.BINARY_BLOCK_SHIFT; 
+      int docInBlockId = docNumber % Lucene80DocValuesFormat.BINARY_DOCS_PER_COMPRESSED_BLOCK;
+      assert docInBlockId < Lucene80DocValuesFormat.BINARY_DOCS_PER_COMPRESSED_BLOCK;
+      
+      
+      // already read and uncompressed?
+      if ( blockId != lastBlockId) {
+        lastBlockId = blockId;
+        long blockStartOffset = addresses.get(blockId);
+        compressedData.seek(blockStartOffset);
+        
+        numDocsInBlock = compressedData.readVInt();
+        assert numDocsInBlock <= Lucene80DocValuesFormat.BINARY_DOCS_PER_COMPRESSED_BLOCK;
+        uncompressedDocEnds = new int[numDocsInBlock];
+        uncompressedBlockLength = 0;        
+        for (int i = 0; i < numDocsInBlock; i++) {
+          uncompressedBlockLength += compressedData.readVInt();
+          uncompressedDocEnds[i] = uncompressedBlockLength;
+        }
+        
+        if (uncompressedBlockLength == 0) {
+          uncompressedBytesRef = new BytesRef(BytesRef.EMPTY_BYTES);
+        } else {
+          assert uncompressedBlockLength <= uncompressedBlock.length;
+          LZ4.decompress(compressedData, uncompressedBlockLength, uncompressedBlock, 0);
+          uncompressedBytesRef = new BytesRef(uncompressedBlock);
+        }
+      }
+      
+      // Position the Bytes ref to the relevant part of the uncompressed block
 
 Review comment:
   s/`Bytes ref`/`BytesRef`?
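
For readers following the hunk, here is a minimal sketch of what "positioning" into the uncompressed block could look like, given the cumulative end offsets built in the loop above. The class `BlockSliceSketch` and the helper `slice` are hypothetical names introduced only for illustration; they are not part of the PR, and the actual code after this comment line may differ.

```java
import java.util.Arrays;

public class BlockSliceSketch {

    // Hypothetical helper (not from the PR): given cumulative end offsets for
    // each doc in a block, return {offset, length} of one doc's value.
    static int[] slice(int[] docEnds, int docInBlock) {
        // A doc's value starts where the previous doc's value ended.
        int start = docInBlock == 0 ? 0 : docEnds[docInBlock - 1];
        int end = docEnds[docInBlock];
        return new int[] {start, end - start};
    }

    public static void main(String[] args) {
        // Per-doc value lengths, turned into cumulative ends the same way the
        // loop in the diff accumulates uncompressedBlockLength.
        int[] lengths = {3, 0, 5};
        int[] docEnds = new int[lengths.length];
        int total = 0;
        for (int i = 0; i < lengths.length; i++) {
            total += lengths[i];
            docEnds[i] = total;
        }
        System.out.println(Arrays.toString(slice(docEnds, 2)));
    }
}
```

With a real `BytesRef`, the two numbers would presumably be assigned to its `offset` and `length` fields so that no bytes are copied out of the shared block.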

