jpountz commented on code in PR #14133:
URL: https://github.com/apache/lucene/pull/14133#discussion_r1915083479


##########
lucene/core/src/java/org/apache/lucene/codecs/lucene101/Lucene101PostingsReader.java:
##########
@@ -572,7 +597,36 @@ public int freq() throws IOException {
     }

     private void refillFullBlock() throws IOException {
-      forDeltaUtil.decodeAndPrefixSum(docInUtil, prevDocID, docBuffer);
+      int bitsPerValue = docIn.readByte();
+      if (bitsPerValue > 0) {
+        forDeltaUtil.decodeAndPrefixSum(bitsPerValue, docInUtil, prevDocID, docBuffer);
+        encoding = DeltaEncoding.PACKED;
+      } else if (bitsPerValue == 0) {
+        // dense block: 128 one bits
+        docBitSet.set(0, BLOCK_SIZE);
+        docBitSetBase = prevDocID + 1;
+        docCumulativeWordPopCounts[0] = Long.SIZE;
+        docCumulativeWordPopCounts[1] = 2 * Long.SIZE;
+        encoding = DeltaEncoding.UNARY;
+      } else {
+        assert level0LastDocID != NO_MORE_DOCS;
+        // block is encoded as a bit set
+        docBitSetBase = prevDocID + 1;
+        int numLongs = -bitsPerValue;
+        docIn.readLongs(docBitSet.getBits(), 0, numLongs);
+        // Note: this for loop auto-vectorizes
+        for (int i = 0; i < numLongs - 1; ++i) {
+          docCumulativeWordPopCounts[i] = Long.bitCount(docBitSet.getBits()[i]);
+        }
+        for (int i = 1; i < numLongs - 1; ++i) {
+          docCumulativeWordPopCounts[i] += docCumulativeWordPopCounts[i - 1];
+        }
+        docCumulativeWordPopCounts[numLongs - 1] = BLOCK_SIZE;
+        assert docCumulativeWordPopCounts[numLongs - 2]

Review Comment:
   We only use the bit set encoding for "full" blocks. Tail blocks, which may have fewer than 128 doc IDs to record, keep using the current encoding that stores deltas using group-varint; they never use a bit set.
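
   To illustrate the bit set case discussed above, here is a minimal standalone sketch, not the PR's actual code. The class `BitSetBlockSketch` and its helpers `cumulativePopCounts`, `decodeDocIds` and `rank` are hypothetical names introduced for this example. It assumes bit `b` of word `i` stands for doc ID `base + 64 * i + b`, with `base` playing the role of `docBitSetBase` (`prevDocID + 1`).

```java
import java.util.Arrays;

// Illustrative sketch only: hypothetical helper names, not Lucene APIs.
final class BitSetBlockSketch {
  /** Number of doc IDs in a full block, matching BLOCK_SIZE in the diff. */
  static final int BLOCK_SIZE = 128;

  /** Returns an array where entry i is the number of set bits in words[0..i]. */
  static int[] cumulativePopCounts(long[] words) {
    int[] cumulative = new int[words.length];
    // First pass: per-word pop counts, with no dependency between iterations.
    for (int i = 0; i < words.length; ++i) {
      cumulative[i] = Long.bitCount(words[i]);
    }
    // Second pass: prefix sum turns per-word counts into cumulative counts.
    for (int i = 1; i < words.length; ++i) {
      cumulative[i] += cumulative[i - 1];
    }
    return cumulative;
  }

  /** Expands the bit set into absolute doc IDs; base is the doc ID of bit 0. */
  static int[] decodeDocIds(long[] words, int base) {
    int count = 0;
    for (long word : words) {
      count += Long.bitCount(word);
    }
    int[] docs = new int[count];
    int upto = 0;
    for (int i = 0; i < words.length; ++i) {
      long word = words[i];
      while (word != 0) {
        int bit = Long.numberOfTrailingZeros(word);
        docs[upto++] = base + (i << 6) + bit;
        word &= word - 1; // clear the lowest set bit
      }
    }
    return docs;
  }

  /** Number of doc IDs in the block that are strictly less than docId. */
  static int rank(long[] words, int[] cumulative, int base, int docId) {
    int offset = docId - base;
    int wordIndex = offset >>> 6;
    int bit = offset & 63;
    int before = wordIndex == 0 ? 0 : cumulative[wordIndex - 1];
    return before + Long.bitCount(words[wordIndex] & ((1L << bit) - 1));
  }

  public static void main(String[] args) {
    // Every other bit set: 32 docs per word, so 4 words hold BLOCK_SIZE docs.
    long[] words = new long[4];
    Arrays.fill(words, 0x5555555555555555L);
    int[] cumulative = cumulativePopCounts(words);
    System.out.println(Arrays.toString(cumulative)); // [32, 64, 96, 128]
    int[] docs = decodeDocIds(words, 1000);
    System.out.println(docs.length == BLOCK_SIZE); // true
    System.out.println(rank(words, cumulative, 1000, docs[127])); // 127
  }
}
```

   The cumulative counts are what make the index of a doc ID within the block cheap to compute: one array lookup plus one `Long.bitCount` on a masked word, instead of scanning the words that precede it.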


##########
lucene/core/src/java/org/apache/lucene/codecs/lucene101/Lucene101PostingsReader.java:
##########
@@ -572,7 +597,36 @@ public int freq() throws IOException {
     }

     private void refillFullBlock() throws IOException {
-      forDeltaUtil.decodeAndPrefixSum(docInUtil, prevDocID, docBuffer);
+      int bitsPerValue = docIn.readByte();
+      if (bitsPerValue > 0) {
+        forDeltaUtil.decodeAndPrefixSum(bitsPerValue, docInUtil, prevDocID, docBuffer);
+        encoding = DeltaEncoding.PACKED;
+      } else if (bitsPerValue == 0) {
+        // dense block: 128 one bits
+        docBitSet.set(0, BLOCK_SIZE);
+        docBitSetBase = prevDocID + 1;
+        docCumulativeWordPopCounts[0] = Long.SIZE;
+        docCumulativeWordPopCounts[1] = 2 * Long.SIZE;
+        encoding = DeltaEncoding.UNARY;
+      } else {
+        assert level0LastDocID != NO_MORE_DOCS;
+        // block is encoded as a bit set
+        docBitSetBase = prevDocID + 1;
+        int numLongs = -bitsPerValue;
+        docIn.readLongs(docBitSet.getBits(), 0, numLongs);
+        // Note: this for loop auto-vectorizes
+        for (int i = 0; i < numLongs - 1; ++i) {
+          docCumulativeWordPopCounts[i] = Long.bitCount(docBitSet.getBits()[i]);
+        }
+        for (int i = 1; i < numLongs - 1; ++i) {

Review Comment:
   Indeed. :) I added a comment to make it clearer.



##########
lucene/core/src/java/org/apache/lucene/codecs/lucene101/Lucene101PostingsReader.java:
##########
@@ -572,7 +597,36 @@ public int freq() throws IOException {
     }

     private void refillFullBlock() throws IOException {
-      forDeltaUtil.decodeAndPrefixSum(docInUtil, prevDocID, docBuffer);
+      int bitsPerValue = docIn.readByte();
+      if (bitsPerValue > 0) {
+        forDeltaUtil.decodeAndPrefixSum(bitsPerValue, docInUtil, prevDocID, docBuffer);
+        encoding = DeltaEncoding.PACKED;
+      } else if (bitsPerValue == 0) {
+        // dense block: 128 one bits

Review Comment:
   I'm not sure what is confusing, `docBitSet.set(0, BLOCK_SIZE)` sets BLOCK_SIZE bits to `true`? I refactored a bit, hopefully it is clearer.



--
This is an automated message from the Apache Git Service.