[GitHub] [lucene] jainankitk opened a new issue, #12527: Optimize readInts24 performance for DocIdsWriter

via GitHub Tue, 29 Aug 2023 17:34:16 -0700


jainankitk opened a new issue, #12527:
URL: https://github.com/apache/lucene/issues/12527


   ### Description
   
   While recently [working on numeric range queries](https://github.com/opensearch-project/OpenSearch/issues/9541), I noticed that `readInts24` was consuming significant CPU cycles. When I looked into the code, I noticed [multiple consecutive invocations of readLong](https://github.com/apache/lucene/blob/main/lucene/core/src/java/org/apache/lucene/util/bkd/DocIdsWriter.java#L330).
   
   At first, it seemed that the overhead from multiple syscalls should not be that large, but I tried a quick patch that reads all the longs together, and it seemed to help. Sharing the patch and the numbers below (nyc_taxis range query):
   
   ```
   diff --git a/lucene/core/src/java/org/apache/lucene/util/bkd/DocIdsWriter.java b/lucene/core/src/java/org/apache/lucene/util/bkd/DocIdsWriter.java
   index 40db4c0069d..40ee7a1c968 100644
   --- a/lucene/core/src/java/org/apache/lucene/util/bkd/DocIdsWriter.java
   +++ b/lucene/core/src/java/org/apache/lucene/util/bkd/DocIdsWriter.java
   @@ -325,11 +325,14 @@ final class DocIdsWriter {
    
      private static void readInts24(IndexInput in, int count, IntersectVisitor visitor)
          throws IOException {
   +    long[] scratchLong = new long[(count/8) * 3];
   +    in.readLongs(scratchLong, 0, (count/8) * 3);
        int i;
        for (i = 0; i < count - 7; i += 8) {
   -      long l1 = in.readLong();
   -      long l2 = in.readLong();
   -      long l3 = in.readLong();
   +      int li = (i/8) * 3;
   +      long l1 = scratchLong[li];
   +      long l2 = scratchLong[li+1];
   +      long l3 = scratchLong[li+2];
          visitor.visit((int) (l1 >>> 40));
          visitor.visit((int) (l1 >>> 16) & 0xffffff);
          visitor.visit((int) (((l1 & 0xffff) << 8) | (l2 >>> 56)));
   ```
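
   To make the bit layout in the patch easier to follow, here is a minimal stand-alone sketch of the same idea: eight 24-bit values packed into three longs, decoded from an already-filled `long[]` rather than from three reads per group. `Ints24Sketch`, `encode24` and `decode24` are hypothetical names, a plain array stands in for the buffer filled by `in.readLongs`, and the five decode steps not visible in the diff above are an assumption that follows the same pattern. It only handles full groups of eight.

   ```java
   import java.util.function.IntConsumer;

   /** Hypothetical stand-alone sketch; it does not use or reproduce Lucene's classes. */
   final class Ints24Sketch {

     /** Packs groups of eight 24-bit values into three longs (8 * 24 bits = 3 * 64 bits). */
     static long[] encode24(int[] values) {
       if (values.length % 8 != 0) {
         throw new IllegalArgumentException("sketch only handles full groups of 8");
       }
       long[] packed = new long[(values.length / 8) * 3];
       for (int i = 0, li = 0; i < values.length; i += 8, li += 3) {
         long[] v = new long[8];
         for (int j = 0; j < 8; j++) {
           v[j] = values[i + j] & 0xffffffL; // keep the low 24 bits only
         }
         packed[li] = (v[0] << 40) | (v[1] << 16) | (v[2] >>> 8);
         packed[li + 1] = ((v[2] & 0xff) << 56) | (v[3] << 32) | (v[4] << 8) | (v[5] >>> 16);
         packed[li + 2] = ((v[5] & 0xffff) << 48) | (v[6] << 24) | v[7];
       }
       return packed;
     }

     /** Decodes from a pre-filled buffer, the way the patched loop indexes scratchLong. */
     static void decode24(long[] packed, int count, IntConsumer visitor) {
       for (int i = 0; i < count - 7; i += 8) {
         int li = (i / 8) * 3;
         long l1 = packed[li];
         long l2 = packed[li + 1];
         long l3 = packed[li + 2];
         visitor.accept((int) (l1 >>> 40));
         visitor.accept((int) ((l1 >>> 16) & 0xffffff));
         visitor.accept((int) (((l1 & 0xffff) << 8) | (l2 >>> 56)));
         visitor.accept((int) ((l2 >>> 32) & 0xffffff));
         visitor.accept((int) ((l2 >>> 8) & 0xffffff));
         visitor.accept((int) (((l2 & 0xff) << 16) | (l3 >>> 48)));
         visitor.accept((int) ((l3 >>> 24) & 0xffffff));
         visitor.accept((int) (l3 & 0xffffff));
       }
     }
   }
   ```

   In the real method, any `count % 8` trailing values still have to be read separately after the bulk read; the sketch leaves that part out.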
   
   Without this change:
   
   ```
   |                Max Throughput | range |    0.71 | ops/s |
   |       50th percentile latency | range | 245.533 |    ms |
   |       90th percentile latency | range | 248.005 |    ms |
   |       99th percentile latency | range | 254.824 |    ms |
   |      100th percentile latency | range | 256.902 |    ms |
   |  50th percentile service time | range | 243.585 |    ms |
   |  90th percentile service time | range | 246.178 |    ms |
   |  99th percentile service time | range | 252.672 |    ms |
   | 100th percentile service time | range | 255.072 |    ms |
   |                    error rate | range |       0 |     % |
   ```
   
   With this change:
   
   ```
   |             Median Throughput | range |     0.7 | ops/s |
   |                Max Throughput | range |    0.71 | ops/s |
   |       50th percentile latency | range | 207.554 |    ms |
   |       90th percentile latency | range | 209.392 |    ms |
   |       99th percentile latency | range | 213.157 |    ms |
   |      100th percentile latency | range | 219.398 |    ms |
   |  50th percentile service time | range | 205.421 |    ms |
   |  90th percentile service time | range | 207.361 |    ms |
   |  99th percentile service time | range | 211.164 |    ms |
   | 100th percentile service time | range | 217.787 |    ms |
   |                    error rate | range |       0 |     % |
   ```
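
   For a quick side-by-side of the two runs, the latencies above work out to a reduction of roughly 15% (for example, p50 latency drops from 245.533 ms to 207.554 ms), while max throughput stays at 0.71 ops/s. The small helper below just reproduces that arithmetic; `LatencyDelta` and `percentReduction` are hypothetical names, and the inputs are copied from the tables, so this is a convenience sketch rather than a new measurement.

   ```java
   /** Hypothetical helper for comparing the two benchmark runs quoted in this issue. */
   final class LatencyDelta {

     /** Returns the percentage by which {@code after} is lower than {@code before}. */
     static double percentReduction(double before, double after) {
       return (before - after) / before * 100.0;
     }

     public static void main(String[] args) {
       // Values copied from the "without" and "with" tables, in milliseconds.
       System.out.printf("p50 latency:       %.1f%%%n", percentReduction(245.533, 207.554));
       System.out.printf("p90 latency:       %.1f%%%n", percentReduction(248.005, 209.392));
       System.out.printf("p50 service time:  %.1f%%%n", percentReduction(243.585, 205.421));
       System.out.printf("p100 service time: %.1f%%%n", percentReduction(255.072, 217.787));
     }
   }
   ```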


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

