msfroh commented on PR #13521: URL: https://github.com/apache/lucene/pull/13521#issuecomment-2313906920
I tried modifying the loop to process 4 longs per iteration and noticed no difference on my Xeon host, which is unsurprising since there was no difference between 1 and 3.

I also tried the following SIMD implementation of `decode`:

```
@Override
public void decode(IndexInput in, int start, int count, int[] docIDs) throws IOException {
  int i = 0;
  long[] inputScratch = new long[LONG_SPECIES.length()];
  long[] outputScratch = new long[LONG_SPECIES.length() * 3];
  int bound = LONG_SPECIES.loopBound(count / 3) * 3;
  for (; i < bound; i += outputScratch.length) {
    for (int j = 0; j < LONG_SPECIES.length(); j++) {
      inputScratch[j] = in.readLong();
    }
    LongVector longVector = LongVector.fromArray(LONG_SPECIES, inputScratch, 0);
    // Top 21 bits of each long.
    longVector.lanewise(VectorOperators.LSHR, 42)
        .intoArray(outputScratch, 0);
    // Middle 21 bits of each long.
    longVector.lanewise(VectorOperators.AND, 0x000003FFFFE00000L)
        .lanewise(VectorOperators.LSHR, 21)
        .intoArray(outputScratch, LONG_SPECIES.length());
    // Low 21 bits of each long.
    longVector.lanewise(VectorOperators.AND, 0x001FFFFFL)
        .intoArray(outputScratch, LONG_SPECIES.length() * 2);
    for (int j = 0; j < LONG_SPECIES.length(); j++) {
      // Lane j holds three consecutive doc IDs, so it writes at offset 3 * j
      // (the version as posted used i + j, which overlaps neighbouring lanes).
      docIDs[i + 3 * j] = (int) outputScratch[j];
      docIDs[i + 3 * j + 1] = (int) outputScratch[j + LONG_SPECIES.length()];
      docIDs[i + 3 * j + 2] = (int) outputScratch[j + LONG_SPECIES.length() * 2];
    }
  }
  // Scalar tail: one long (three doc IDs) at a time.
  for (; i < count - 2; i += 3) {
    long packedLong = in.readLong();
    docIDs[i] = (int) (packedLong >>> 42);
    docIDs[i + 1] = (int) ((packedLong & 0x000003FFFFE00000L) >>> 21);
    docIDs[i + 2] = (int) (packedLong & 0x001FFFFFL);
  }
  // Any remaining doc IDs are stored as plain ints.
  for (; i < count; i++) {
    docIDs[i] = in.readInt();
  }
}
```

Unfortunately, it performs noticeably worse than the other implementations (roughly 2.5x slower):

```
Benchmark                      (encoderName)            Mode  Cnt     Score    Error  Units
DocIdEncodingBenchmark.decode  Bit21WithSimdEncoder     avgt    5  2191.040 ± 14.913  ms/op
DocIdEncodingBenchmark.decode  Bit21With3StepsEncoder   avgt    5   850.331 ±  4.576  ms/op
DocIdEncodingBenchmark.decode  Bit21With2StepsEncoder   avgt    5   859.980 ±  4.567  ms/op
DocIdEncodingBenchmark.decode  Bit24Encoder             avgt    5   912.914 ±  5.488  ms/op
```

Maybe I'm doing it wrong 🤷
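
For reference, the shifts and masks in `decode` imply that each long carries three 21-bit doc IDs, with the first ID in bits 42-62, the second in bits 21-41, and the third in bits 0-20. The sketch below is only an illustration of that layout: `Bit21LayoutSketch`, `pack`, and `unpack` are hypothetical names that do not appear in the PR, and the `pack` side is an assumption inferred from the decode logic rather than the PR's actual encoder.

```java
import java.util.Arrays;

// Hypothetical helper, not part of the PR: illustrates the 3 x 21-bit layout only.
class Bit21LayoutSketch {
  private static final long MASK_21 = 0x1FFFFFL;

  // Packs each group of three doc IDs (each assumed < 2^21) into one long.
  // Assumed layout, mirroring the decode shifts: [id0 << 42 | id1 << 21 | id2].
  static long[] pack(int[] docIDs) {
    if (docIDs.length % 3 != 0) {
      throw new IllegalArgumentException("length must be a multiple of 3");
    }
    long[] packed = new long[docIDs.length / 3];
    for (int k = 0; k < packed.length; k++) {
      packed[k] = ((docIDs[3 * k] & MASK_21) << 42)
          | ((docIDs[3 * k + 1] & MASK_21) << 21)
          | (docIDs[3 * k + 2] & MASK_21);
    }
    return packed;
  }

  // Unpacks with the same shifts and masks as the scalar loop in decode.
  // Long k produces docIDs[3k], docIDs[3k + 1], docIDs[3k + 2].
  static int[] unpack(long[] packed) {
    int[] docIDs = new int[packed.length * 3];
    for (int k = 0; k < packed.length; k++) {
      long p = packed[k];
      docIDs[3 * k] = (int) (p >>> 42);
      docIDs[3 * k + 1] = (int) ((p & 0x000003FFFFE00000L) >>> 21);
      docIDs[3 * k + 2] = (int) (p & 0x001FFFFFL);
    }
    return docIDs;
  }

  public static void main(String[] args) {
    int[] ids = {0, 1, 2, 2097151, 1048576, 42};
    int[] roundTripped = unpack(pack(ids));
    System.out.println(Arrays.equals(ids, roundTripped));
  }
}
```

The point of the sketch is the output indexing: because every long expands to three consecutive doc IDs, a batched or vectorized decoder has to scatter lane `k` to positions `3k`, `3k + 1`, and `3k + 2`, which means the three result vectors cannot be stored contiguously without an interleaving step.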