Jackie-Jiang commented on a change in pull request #7708: URL: https://github.com/apache/pinot/pull/7708#discussion_r744006615
########## File path: pinot-segment-local/src/main/java/org/apache/pinot/segment/local/io/util/FixedByteValueReaderWriter.java ########## @@ -56,14 +57,28 @@ public String getUnpaddedString(int index, int numBytesPerValue, byte paddingByt assert buffer.length >= numBytesPerValue; long startOffset = (long) index * numBytesPerValue; - for (int i = 0; i < numBytesPerValue; i++) { - byte currentByte = _dataBuffer.getByte(startOffset + i); - if (currentByte == paddingByte) { - return new String(buffer, 0, i, UTF_8); + int written = 0; + long pattern = (paddingByte & 0xFFL) * 0x101010101010101L; + ByteBuffer wrapper = ByteBuffer.wrap(buffer); + for (int i = 0; i < ((numBytesPerValue >>> 3) << 3); i += 8) { + long word = _dataBuffer.getLong(startOffset + i); + wrapper.putLong(i, word); + long zeroed = word ^ pattern; + long tmp = (zeroed & 0x7F7F7F7F7F7F7F7FL) + 0x7F7F7F7F7F7F7F7FL; + tmp = ~(tmp | zeroed | 0x7F7F7F7F7F7F7F7FL); + int end = Long.numberOfLeadingZeros(tmp) >>> 3; Review comment: In both case, we need one if check, so that is not adding overhead. The longer the string is, the more operations we can save. I tried the suggested change using the benchmark in this PR, and do see some improvement: Code tested: ``` for (; written < ((numBytesPerValue >>> 3) << 3); written += 8) { long word = _dataBuffer.getLong(startOffset + written); wrapper.putLong(written, word); long zeroed = word ^ pattern; long tmp = (zeroed & 0x7F7F7F7F7F7F7F7FL) + 0x7F7F7F7F7F7F7F7FL; tmp = ~(tmp | zeroed | 0x7F7F7F7F7F7F7F7FL); if (tmp != 0) { int end = le ? Long.numberOfTrailingZeros(tmp) >>> 3 : Long.numberOfLeadingZeros(tmp) >>> 3; return new String(buffer, 0, written + end, UTF_8); } } ``` Before: ``` Benchmark (_length) (_nativeOrder) (_paddingByte) (_values) Mode Cnt Score Error Units BenchmarkFixedByteValueReaderWriter.readStrings 8 true 42 4096 avgt 3 137.822 ± 32.385 us/op BenchmarkFixedByteValueReaderWriter.readStrings 8 false 42 4096 avgt 3 123.427 ± 18.967 us/op BenchmarkFixedByteValueReaderWriter.readStrings 32 true 42 4096 avgt 3 195.033 ± 17.679 us/op BenchmarkFixedByteValueReaderWriter.readStrings 32 false 42 4096 avgt 3 198.772 ± 33.305 us/op BenchmarkFixedByteValueReaderWriter.readStrings 1024 true 42 4096 avgt 3 1094.745 ± 245.923 us/op BenchmarkFixedByteValueReaderWriter.readStrings 1024 false 42 4096 avgt 3 1095.573 ± 143.495 us/op BenchmarkFixedByteValueReaderWriter.readStrings 65536 true 42 4096 avgt 3 79073.673 ± 1514.600 us/op BenchmarkFixedByteValueReaderWriter.readStrings 65536 false 42 4096 avgt 3 67112.430 ± 2974.358 us/op ``` After: ``` Benchmark (_length) (_nativeOrder) (_paddingByte) (_values) Mode Cnt Score Error Units BenchmarkFixedByteValueReaderWriter.readStrings 8 true 42 4096 avgt 3 129.318 ± 126.111 us/op BenchmarkFixedByteValueReaderWriter.readStrings 8 false 42 4096 avgt 3 125.779 ± 30.194 us/op BenchmarkFixedByteValueReaderWriter.readStrings 32 true 42 4096 avgt 3 178.220 ± 38.041 us/op BenchmarkFixedByteValueReaderWriter.readStrings 32 false 42 4096 avgt 3 179.780 ± 20.051 us/op BenchmarkFixedByteValueReaderWriter.readStrings 1024 true 42 4096 avgt 3 956.858 ± 15.926 us/op BenchmarkFixedByteValueReaderWriter.readStrings 1024 false 42 4096 avgt 3 1035.626 ± 530.377 us/op BenchmarkFixedByteValueReaderWriter.readStrings 65536 true 42 4096 avgt 3 71804.195 ± 6806.822 us/op BenchmarkFixedByteValueReaderWriter.readStrings 65536 false 42 4096 avgt 3 63845.820 ± 734.957 us/op ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org --------------------------------------------------------------------- To unsubscribe, e-mail: commits-unsubscr...@pinot.apache.org For additional commands, e-mail: commits-h...@pinot.apache.org