richardstartin opened a new pull request #8013: URL: https://github.com/apache/pinot/pull/8013
This is a fix for some low-hanging fruit found by a user with a heavy RealtimeToOfflineTask, where a good amount of time is spent sanitising strings. This change makes the sanitisation method more efficient by making better use of the `String` API: it is ~2x faster, allocates less than half as much when truncation is required, and allocates nothing when it isn't. The JMH benchmark below compares the existing method (`sanitize`) with the new one (`sanitizeNew`):

```java
import java.nio.charset.StandardCharsets;
import java.util.concurrent.ThreadLocalRandom;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.Level;
import org.openjdk.jmh.annotations.Param;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;

// StringUtil is Pinot's utility class; its import is omitted here.
@State(Scope.Benchmark)
public class BenchmarkSanitizeStringValue {

  // Length of the input string; only 1000 exceeds _maxLength and forces truncation.
  @Param({"10", "100", "1000"})
  int _length;

  @Param("512")
  int _maxLength;

  String _string;

  @Setup(Level.Trial)
  public void setup() {
    // Build a random lowercase ASCII string of the requested length.
    byte[] bytes = new byte[_length];
    for (int i = 0; i < _length; i++) {
      bytes[i] = (byte) ('a' + ThreadLocalRandom.current().nextInt(26));
    }
    _string = new String(bytes, StandardCharsets.UTF_8);
  }

  @Benchmark
  public String sanitize() {
    return StringUtil.sanitizeStringValue(_string, _maxLength);
  }

  @Benchmark
  public String sanitizeNew() {
    return StringUtil.sanitizeStringValueNew(_string, _maxLength);
  }
}
```

Results (the `gc.alloc.rate.norm` rows report bytes allocated per operation):

```
Benchmark                                                     (_length)  (_maxLength)  Mode  Cnt     Score    Error  Units
BenchmarkSanitizeStringValue.sanitize                                10           512  avgt    5    10.444 ±  0.213  ns/op
BenchmarkSanitizeStringValue.sanitize:·gc.alloc.rate.norm            10           512  avgt    5    40.000 ±  0.001   B/op
BenchmarkSanitizeStringValue.sanitize                               100           512  avgt    5    29.909 ±  1.637  ns/op
BenchmarkSanitizeStringValue.sanitize:·gc.alloc.rate.norm           100           512  avgt    5   216.000 ±  0.001   B/op
BenchmarkSanitizeStringValue.sanitize                              1000           512  avgt    5   203.566 ± 13.111  ns/op
BenchmarkSanitizeStringValue.sanitize:·gc.alloc.rate.norm          1000           512  avgt    5  2568.000 ±  0.001   B/op
BenchmarkSanitizeStringValue.sanitizeNew                             10           512  avgt    5     6.107 ±  0.369  ns/op
BenchmarkSanitizeStringValue.sanitizeNew:·gc.alloc.rate.norm         10           512  avgt    5    ≈ 10⁻⁶            B/op
BenchmarkSanitizeStringValue.sanitizeNew                            100           512  avgt    5    14.349 ±  0.261  ns/op
BenchmarkSanitizeStringValue.sanitizeNew:·gc.alloc.rate.norm        100           512  avgt    5    ≈ 10⁻⁵            B/op
BenchmarkSanitizeStringValue.sanitizeNew                           1000           512  avgt    5   102.201 ±  0.703  ns/op
BenchmarkSanitizeStringValue.sanitizeNew:·gc.alloc.rate.norm       1000           512  avgt    5   552.000 ±  0.001   B/op
```
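The description does not include the new method body, so the following is only a rough sketch of how "better use of the `String` API" can give the allocation profile reported above. It assumes that sanitising means cutting the value at the first null character (if any) and capping it at `maxLength`; the class name `SanitizeSketch` and the method `sanitize` are hypothetical and are not Pinot's actual code.

```java
// Hypothetical illustration; not the implementation from this pull request.
final class SanitizeSketch {

  private SanitizeSketch() {
  }

  // Assumption: a sanitised value ends before the first '\0' and is at most maxLength chars.
  static String sanitize(String value, int maxLength) {
    // indexOf scans the string in place, without copying it into a char[].
    int nullIndex = value.indexOf('\0');
    int end = nullIndex < 0 ? value.length() : nullIndex;
    if (end > maxLength) {
      end = maxLength;
    }
    // Nothing to cut: return the same instance, so no allocation happens.
    if (end == value.length()) {
      return value;
    }
    // Cutting needed: substring creates only the (shorter) result string.
    return value.substring(0, end);
  }

  public static void main(String[] args) {
    String shortValue = "hello";
    System.out.println(sanitize(shortValue, 512) == shortValue); // prints "true"
    System.out.println(sanitize("hello", 3));                    // prints "hel"
    System.out.println(sanitize("ab\0cd", 512));                 // prints "ab"
  }
}
```

Under these assumptions, an input that needs no change is returned as-is, and an input that must be cut costs a single allocation sized to the result rather than to the input, which is in line with the shape of the numbers above.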