msfroh commented on PR #13521:
URL: https://github.com/apache/lucene/pull/13521#issuecomment-2313906920
I tried modifying the loop to process 4 longs per iteration and noticed no
difference on my Xeon host, which is unsurprising since there was no difference
between 1 and 3.
I also tried the following SIMD implementation of `decode`:
```
@Override
public void decode(IndexInput in, int start, int count, int[] docIDs) throws IOException {
  int i = 0;
  long[] inputScratch = new long[LONG_SPECIES.length()];
  long[] outputScratch = new long[LONG_SPECIES.length() * 3];
  // Each long packs three 21-bit doc IDs, so whole vectors cover this many doc IDs.
  int bound = LONG_SPECIES.loopBound(count / 3) * 3;
  for (; i < bound; i += outputScratch.length) {
    for (int j = 0; j < LONG_SPECIES.length(); j++) {
      inputScratch[j] = in.readLong();
    }
    LongVector longVector = LongVector.fromArray(LONG_SPECIES, inputScratch, 0);
    // Top 21 bits of every lane.
    longVector.lanewise(VectorOperators.LSHR, 42)
        .intoArray(outputScratch, 0);
    // Middle 21 bits of every lane.
    longVector.lanewise(VectorOperators.AND, 0x000003FFFFE00000L)
        .lanewise(VectorOperators.LSHR, 21)
        .intoArray(outputScratch, LONG_SPECIES.length());
    // Bottom 21 bits of every lane.
    longVector.lanewise(VectorOperators.AND, 0x001FFFFFL)
        .intoArray(outputScratch, LONG_SPECIES.length() * 2);
    // Interleave: lane j holds docIDs i + 3j, i + 3j + 1 and i + 3j + 2.
    // (Indexing with i + j here would overlap writes and leave gaps.)
    for (int j = 0; j < LONG_SPECIES.length(); j++) {
      docIDs[i + 3 * j] = (int) outputScratch[j];
      docIDs[i + 3 * j + 1] = (int) outputScratch[j + LONG_SPECIES.length()];
      docIDs[i + 3 * j + 2] = (int) outputScratch[j + LONG_SPECIES.length() * 2];
    }
  }
  // Scalar tail: remaining whole longs, three doc IDs each.
  for (; i < count - 2; i += 3) {
    long packedLong = in.readLong();
    docIDs[i] = (int) (packedLong >>> 42);
    docIDs[i + 1] = (int) ((packedLong & 0x000003FFFFE00000L) >>> 21);
    docIDs[i + 2] = (int) (packedLong & 0x001FFFFFL);
  }
  // Leftover doc IDs (at most two) are stored as plain ints.
  for (; i < count; i++) {
    docIDs[i] = in.readInt();
  }
}
```
Unfortunately, it performs noticeably worse than the other implementations:
```
Benchmark                      (encoderName)           Mode  Cnt     Score    Error  Units
DocIdEncodingBenchmark.decode  Bit21WithSimdEncoder    avgt    5  2191.040 ± 14.913  ms/op
DocIdEncodingBenchmark.decode  Bit21With3StepsEncoder  avgt    5   850.331 ±  4.576  ms/op
DocIdEncodingBenchmark.decode  Bit21With2StepsEncoder  avgt    5   859.980 ±  4.567  ms/op
DocIdEncodingBenchmark.decode  Bit24Encoder            avgt    5   912.914 ±  5.488  ms/op
```
Maybe I'm doing it wrong 🤷
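To separate correctness from speed, a scalar round-trip check of the 21-bit layout may help before any vectorized variant is benchmarked. The following is only a minimal sketch: the class `Bit21RoundTrip` and its `pack`, `unpack`, and `roundTrip` helpers are hypothetical names for illustration, not part of the PR. It uses the same shifts and masks as the `decode` above and assumes the first doc ID of each triple sits in the highest bits.

```java
// Hypothetical helper class for illustration; not part of Lucene or the PR.
final class Bit21RoundTrip {
    private static final long MASK_21 = 0x1FFFFFL;

    // Pack three 21-bit values into one long, first value in the highest bits.
    static long pack(int a, int b, int c) {
        return ((a & MASK_21) << 42) | ((b & MASK_21) << 21) | (c & MASK_21);
    }

    // Unpack every long into three consecutive ints: long j fills
    // docIDs[3j], docIDs[3j + 1] and docIDs[3j + 2].
    static void unpack(long[] packed, int[] docIDs) {
        for (int j = 0; j < packed.length; j++) {
            long p = packed[j];
            docIDs[3 * j] = (int) (p >>> 42);
            docIDs[3 * j + 1] = (int) ((p & 0x000003FFFFE00000L) >>> 21);
            docIDs[3 * j + 2] = (int) (p & 0x001FFFFFL);
        }
    }

    // Pack then unpack; assumes ids.length is a multiple of 3.
    static int[] roundTrip(int[] ids) {
        long[] packed = new long[ids.length / 3];
        for (int j = 0; j < packed.length; j++) {
            packed[j] = pack(ids[3 * j], ids[3 * j + 1], ids[3 * j + 2]);
        }
        int[] out = new int[packed.length * 3];
        unpack(packed, out);
        return out;
    }
}
```

Comparing the output of a vectorized `decode` against `unpack` on the same packed longs would show whether the lanes are being interleaved in the expected order.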