gsmiller commented on code in PR #13658: URL: https://github.com/apache/lucene/pull/13658#discussion_r1718827005
########## lucene/core/src/java/org/apache/lucene/codecs/lucene912/ForDeltaUtil.java: ########## @@ -62,22 +256,374 @@ void encodeDeltas(long[] longs, DataOutput out) throws IOException { assert or != 0; final int bitsPerValue = PackedInts.bitsRequired(or); out.writeByte((byte) bitsPerValue); - forUtil.encode(longs, bitsPerValue, out); + encodeDeltas(longs, bitsPerValue, out); + } + } + + private void encodeDeltas(long[] longs, int bitsPerValue, DataOutput out) throws IOException { + final int nextPrimitive; + final int numLongs; + if (bitsPerValue <= 4) { + nextPrimitive = 8; + numLongs = BLOCK_SIZE / 8; + collapse8(longs); + } else if (bitsPerValue <= 11) { + nextPrimitive = 16; + numLongs = BLOCK_SIZE / 4; + collapse16(longs); + } else { + nextPrimitive = 32; + numLongs = BLOCK_SIZE / 2; + collapse32(longs); + } + + final int numLongsPerShift = bitsPerValue * 2; + int idx = 0; + int shift = nextPrimitive - bitsPerValue; + for (int i = 0; i < numLongsPerShift; ++i) { + tmp[i] = longs[idx++] << shift; + } + for (shift = shift - bitsPerValue; shift >= 0; shift -= bitsPerValue) { + for (int i = 0; i < numLongsPerShift; ++i) { + tmp[i] |= longs[idx++] << shift; + } + } + + final int remainingBitsPerLong = shift + bitsPerValue; + final long maskRemainingBitsPerLong; + if (nextPrimitive == 8) { + maskRemainingBitsPerLong = MASKS8[remainingBitsPerLong]; + } else if (nextPrimitive == 16) { + maskRemainingBitsPerLong = MASKS16[remainingBitsPerLong]; + } else { + maskRemainingBitsPerLong = MASKS32[remainingBitsPerLong]; + } + + int tmpIdx = 0; + int remainingBitsPerValue = bitsPerValue; + while (idx < numLongs) { + if (remainingBitsPerValue >= remainingBitsPerLong) { + remainingBitsPerValue -= remainingBitsPerLong; + tmp[tmpIdx++] |= (longs[idx] >>> remainingBitsPerValue) & maskRemainingBitsPerLong; + if (remainingBitsPerValue == 0) { + idx++; + remainingBitsPerValue = bitsPerValue; + } + } else { + final long mask1, mask2; + if (nextPrimitive == 8) { + mask1 = MASKS8[remainingBitsPerValue]; + mask2 = MASKS8[remainingBitsPerLong - remainingBitsPerValue]; + } else if (nextPrimitive == 16) { + mask1 = MASKS16[remainingBitsPerValue]; + mask2 = MASKS16[remainingBitsPerLong - remainingBitsPerValue]; + } else { + mask1 = MASKS32[remainingBitsPerValue]; + mask2 = MASKS32[remainingBitsPerLong - remainingBitsPerValue]; + } + tmp[tmpIdx] |= (longs[idx++] & mask1) << (remainingBitsPerLong - remainingBitsPerValue); + remainingBitsPerValue = bitsPerValue - remainingBitsPerLong + remainingBitsPerValue; + tmp[tmpIdx++] |= (longs[idx] >>> remainingBitsPerValue) & mask2; + } + } + + for (int i = 0; i < numLongsPerShift; ++i) { + out.writeLong(tmp[i]); + } + } + + /** Number of bytes required to encode 128 integers of {@code bitsPerValue} bits per value. */ + int numBytes(int bitsPerValue) { + return bitsPerValue << (BLOCK_SIZE_LOG2 - 3); + } + + private static void decodeSlow( Review Comment: And one more method that I think is duplicated from `ForUtil`? ########## lucene/core/src/java/org/apache/lucene/codecs/lucene912/ForDeltaUtil.java: ########## @@ -62,22 +256,374 @@ void encodeDeltas(long[] longs, DataOutput out) throws IOException { assert or != 0; final int bitsPerValue = PackedInts.bitsRequired(or); out.writeByte((byte) bitsPerValue); - forUtil.encode(longs, bitsPerValue, out); + encodeDeltas(longs, bitsPerValue, out); + } + } + + private void encodeDeltas(long[] longs, int bitsPerValue, DataOutput out) throws IOException { Review Comment: I think this is identical to `ForDelta#encode` right? Can we leverage that instead of duplicating it? I guess the issue is that `ForUtil` relies on its `tmp` buffer so the method isn't static, but we could split out a helper static method that takes the buffer as input that could be common I think. ########## lucene/core/src/java/org/apache/lucene/internal/vectorization/DefaultPostingDecodingUtil.java: ########## @@ -29,13 +29,14 @@ public DefaultPostingDecodingUtil(IndexInput in) { @Override public void splitLongs( Review Comment: What if we add the "slow" implementation as a static method in `PostingDecodingUtil` (as sketched out below) and then delegate to it here and in the "slow path" in the vectorized implementation? That we can avoid the code duplication. ``` protected static void splitLongsSlow(IndexInput in, int count, long[] b, int bShift, int dec, long bMask, long[] c, int cIndex, long cMask) throws IOException { in.readLongs(c, cIndex, count); // The below loop is auto-vectorized for (int i = 0; i < count; ++i) { for (int j = 0, end = (bShift - 1) / dec; j <= end; ++j) { b[count * j + i] = (c[cIndex + i] >>> (bShift - j * dec)) & bMask; } // can we determine cIndex as `count * ((bShift - 1) / dec + 1)`? c[cIndex + i] &= cMask; } } ``` ########## lucene/core/src/java/org/apache/lucene/codecs/lucene912/ForUtil.java: ########## @@ -712,14 +465,14 @@ private static void decode7(IndexInput in, PostingDecodingUtil pdu, long[] tmp, } } - private static void decode8(IndexInput in, PostingDecodingUtil pdu, long[] tmp, long[] longs) + static void decode8(IndexInput in, PostingDecodingUtil pdu, long[] tmp, long[] longs) Review Comment: If I remember correctly, there was a version of the change where you always read the longs through `PostingDecodingUtil` and didn't need `IndexInput` for these edge-cases. It seems a little easier to understand that way (to me at least). Since `PostingDecodingUtil` is always getting created with an `IndexInput`, would it make sense to have a method in `PostingDecodingUtil` that just does the read delegation? Or an alternative would be to expose `IndexInput` (e.g., it could just be a public field in the abstract class)? It's a minor detail but it seems like it would make the code wiring slightly cleaner to not have to provide both `IndexInput` and `PostingDecodingUtil` when initializing `ForUtil` / `ForDeltaUtil`. ########## lucene/core/src/java/org/apache/lucene/codecs/lucene912/ForDeltaUtil.java: ########## @@ -62,22 +256,374 @@ void encodeDeltas(long[] longs, DataOutput out) throws IOException { assert or != 0; final int bitsPerValue = PackedInts.bitsRequired(or); out.writeByte((byte) bitsPerValue); - forUtil.encode(longs, bitsPerValue, out); + encodeDeltas(longs, bitsPerValue, out); + } + } + + private void encodeDeltas(long[] longs, int bitsPerValue, DataOutput out) throws IOException { + final int nextPrimitive; + final int numLongs; + if (bitsPerValue <= 4) { + nextPrimitive = 8; + numLongs = BLOCK_SIZE / 8; + collapse8(longs); + } else if (bitsPerValue <= 11) { + nextPrimitive = 16; + numLongs = BLOCK_SIZE / 4; + collapse16(longs); + } else { + nextPrimitive = 32; + numLongs = BLOCK_SIZE / 2; + collapse32(longs); + } + + final int numLongsPerShift = bitsPerValue * 2; + int idx = 0; + int shift = nextPrimitive - bitsPerValue; + for (int i = 0; i < numLongsPerShift; ++i) { + tmp[i] = longs[idx++] << shift; + } + for (shift = shift - bitsPerValue; shift >= 0; shift -= bitsPerValue) { + for (int i = 0; i < numLongsPerShift; ++i) { + tmp[i] |= longs[idx++] << shift; + } + } + + final int remainingBitsPerLong = shift + bitsPerValue; + final long maskRemainingBitsPerLong; + if (nextPrimitive == 8) { + maskRemainingBitsPerLong = MASKS8[remainingBitsPerLong]; + } else if (nextPrimitive == 16) { + maskRemainingBitsPerLong = MASKS16[remainingBitsPerLong]; + } else { + maskRemainingBitsPerLong = MASKS32[remainingBitsPerLong]; + } + + int tmpIdx = 0; + int remainingBitsPerValue = bitsPerValue; + while (idx < numLongs) { + if (remainingBitsPerValue >= remainingBitsPerLong) { + remainingBitsPerValue -= remainingBitsPerLong; + tmp[tmpIdx++] |= (longs[idx] >>> remainingBitsPerValue) & maskRemainingBitsPerLong; + if (remainingBitsPerValue == 0) { + idx++; + remainingBitsPerValue = bitsPerValue; + } + } else { + final long mask1, mask2; + if (nextPrimitive == 8) { + mask1 = MASKS8[remainingBitsPerValue]; + mask2 = MASKS8[remainingBitsPerLong - remainingBitsPerValue]; + } else if (nextPrimitive == 16) { + mask1 = MASKS16[remainingBitsPerValue]; + mask2 = MASKS16[remainingBitsPerLong - remainingBitsPerValue]; + } else { + mask1 = MASKS32[remainingBitsPerValue]; + mask2 = MASKS32[remainingBitsPerLong - remainingBitsPerValue]; + } + tmp[tmpIdx] |= (longs[idx++] & mask1) << (remainingBitsPerLong - remainingBitsPerValue); + remainingBitsPerValue = bitsPerValue - remainingBitsPerLong + remainingBitsPerValue; + tmp[tmpIdx++] |= (longs[idx] >>> remainingBitsPerValue) & mask2; + } + } + + for (int i = 0; i < numLongsPerShift; ++i) { + out.writeLong(tmp[i]); + } + } + + /** Number of bytes required to encode 128 integers of {@code bitsPerValue} bits per value. */ + int numBytes(int bitsPerValue) { Review Comment: I think this is also identical to what's in `ForUtil`? Could we reference it and avoid the duplication? ########## lucene/core/src/java/org/apache/lucene/codecs/lucene912/PostingIndexInput.java: ########## @@ -50,6 +53,6 @@ public void decode(int bitsPerValue, long[] longs) throws IOException { * and store results into {@code longs}. */ public void decodeAndPrefixSum(int bitsPerValue, long base, long[] longs) throws IOException { Review Comment: Actually, I guess this whole class is now only used by benchmarks? Do we actually need it or could benchmarks go directly against `ForUtil` / `ForDeltaUtil`? ########## lucene/core/src/java/org/apache/lucene/codecs/lucene912/PostingIndexInput.java: ########## @@ -50,6 +53,6 @@ public void decode(int bitsPerValue, long[] longs) throws IOException { * and store results into {@code longs}. */ public void decodeAndPrefixSum(int bitsPerValue, long base, long[] longs) throws IOException { Review Comment: It looks like this method is only around for benchmarking? Could benchmarks directly call `ForDeltaUtil#decodeAndPrefixSum` so we could not need this? -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org For additional commands, e-mail: issues-h...@lucene.apache.org