richardstartin opened a new pull request #8013:
URL: https://github.com/apache/pinot/pull/8013


   This fixes some low-hanging fruit found by a user running a heavy 
`RealtimeToOfflineTask`, where a good amount of time is spent sanitising strings.
   
   
![image](https://user-images.githubusercontent.com/16439049/149320377-9f190cf3-a950-449a-834b-ec426af2c78c.png)
   
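   This change makes `StringUtil.sanitizeStringValue` more efficient by making 
better use of the `String` API: it is roughly 2x faster, allocates nothing when 
no truncation is required, and allocates less than half as much when it is. The 
JMH benchmark and results below compare the existing method (`sanitize`) with 
the new one (`sanitizeNew`):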
   
   ```java
   @State(Scope.Benchmark)
   public class BenchmarkSanitizeStringValue {
   
     @Param({"10", "100", "1000"})
     int _length;
   
     @Param("512")
     int _maxLength;
   
     String _string;
   
     @Setup(Level.Trial)
     public void setup() {
       byte[] bytes = new byte[_length];
       for (int i = 0; i < _length; i++) {
         bytes[i] = (byte) ('a' + ThreadLocalRandom.current().nextInt(26));
       }
       _string = new String(bytes, StandardCharsets.UTF_8);
     }
   
     @Benchmark
     public String sanitize() {
       return StringUtil.sanitizeStringValue(_string, _maxLength);
     }
   
     @Benchmark
     public String sanitizeNew() {
       return StringUtil.sanitizeStringValueNew(_string, _maxLength);
     }
   }
   ```
   
   
   ```
   Benchmark                                                     (_length)  (_maxLength)  Mode  Cnt     Score     Error   Units
   BenchmarkSanitizeStringValue.sanitize                                10           512  avgt    5    10.444 ±   0.213   ns/op
   BenchmarkSanitizeStringValue.sanitize:·gc.alloc.rate.norm            10           512  avgt    5    40.000 ±   0.001    B/op
   BenchmarkSanitizeStringValue.sanitize                               100           512  avgt    5    29.909 ±   1.637   ns/op
   BenchmarkSanitizeStringValue.sanitize:·gc.alloc.rate.norm           100           512  avgt    5   216.000 ±   0.001    B/op
   BenchmarkSanitizeStringValue.sanitize                              1000           512  avgt    5   203.566 ±  13.111   ns/op
   BenchmarkSanitizeStringValue.sanitize:·gc.alloc.rate.norm          1000           512  avgt    5  2568.000 ±   0.001    B/op
   BenchmarkSanitizeStringValue.sanitizeNew                             10           512  avgt    5     6.107 ±   0.369   ns/op
   BenchmarkSanitizeStringValue.sanitizeNew:·gc.alloc.rate.norm         10           512  avgt    5    ≈ 10⁻⁶              B/op
   BenchmarkSanitizeStringValue.sanitizeNew                            100           512  avgt    5    14.349 ±   0.261   ns/op
   BenchmarkSanitizeStringValue.sanitizeNew:·gc.alloc.rate.norm        100           512  avgt    5    ≈ 10⁻⁵              B/op
   BenchmarkSanitizeStringValue.sanitizeNew                           1000           512  avgt    5   102.201 ±   0.703   ns/op
   BenchmarkSanitizeStringValue.sanitizeNew:·gc.alloc.rate.norm       1000           512  avgt    5   552.000 ±   0.001    B/op
   ```
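
As a rough illustration of what "making better use of the `String` API" can look like, the sketch below assumes that sanitising means cutting the value at the first null character or at `maxLength`, whichever comes first. `SanitizeSketch` and `sanitize` are hypothetical names, and this is not the code from this PR; it only shows how `indexOf` and `substring` can return the input untouched when nothing needs to be cut, and allocate only the result when something does.

```java
public final class SanitizeSketch {

  private SanitizeSketch() {
  }

  /**
   * Hypothetical stand-in, not the method changed in this PR. Assumes that
   * sanitising means cutting the value at the first null character or at
   * maxLength (non-negative), whichever comes first.
   */
  public static String sanitize(String value, int maxLength) {
    // indexOf scans the string in place, so no char[] copy is created
    int nullIndex = value.indexOf('\0');
    int end = nullIndex < 0 ? value.length() : nullIndex;
    end = Math.min(end, maxLength);
    // Nothing to cut: hand back the same instance, allocating nothing
    if (end == value.length()) {
      return value;
    }
    // Something to cut: only the shortened result is allocated
    return value.substring(0, end);
  }

  public static void main(String[] args) {
    String shortValue = "pinot";
    System.out.println(sanitize(shortValue, 512) == shortValue); // prints true
    System.out.println(sanitize("abcdef", 3));                   // prints abc
    System.out.println(sanitize("ab\0cd", 512));                 // prints ab
  }
}
```

Returning the original instance on the common path is what would account for a near-zero `gc.alloc.rate.norm` at lengths below `_maxLength`, under the assumptions stated above.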
   

