[PR] PostingsDecodingUtil: interchange loops to enable better memory access and SIMD vectorisation [lucene]

via GitHub Thu, 21 Aug 2025 22:22:15 -0700


RamakrishnaChilaka opened a new pull request, #15110:
URL: https://github.com/apache/lucene/pull/15110


   PostingsDecodingUtil: interchange loops to enable better memory access and SIMD vectorisation.
   
   ### Description
   
   Rewrite `splitInts()` so that the innermost loop walks the array contiguously instead of in a strided manner. This gives HotSpot's C2 compiler a chance to vectorise the shift-and-mask operations, and it also improves cache locality.
   
   No functional change: the temporary buffer `b` is only accessed inside this method, and its layout order is irrelevant to callers.
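   A minimal sketch of the loop interchange, assuming the pre-change code nests a per-element loop over a per-shift loop as shown in `splitIntsStrided`. The class and method names (`SplitIntsSketch`, `splitIntsStrided`, `splitIntsInterchanged`) are hypothetical, and the parameters are only loosely modelled on `splitInts()`: the sketch reads from an in-memory array `c` rather than an input stream. It illustrates the technique; it is not the PR's diff.

   ```java
   import java.util.Arrays;
   import java.util.Random;

   // Hypothetical illustration of interchanging the loops in a splitInts()-style method.
   class SplitIntsSketch {

     // Before: outer loop over elements i, inner loop over shifts j.
     // Each inner iteration writes b at a stride of `count` ints.
     static void splitIntsStrided(int count, int[] b, int bShift, int dec, int bMask,
                                  int[] c, int cIndex, int cMask) {
       int maxIter = (bShift - 1) / dec;
       for (int i = 0; i < count; ++i) {
         for (int j = 0; j <= maxIter; ++j) {
           b[count * j + i] = (c[cIndex + i] >>> (bShift - j * dec)) & bMask;
         }
         c[cIndex + i] &= cMask;
       }
     }

     // After: outer loop over shifts j, inner loop over elements i.
     // For a fixed j the shift is loop-invariant and both c and b are walked
     // contiguously, the loop shape C2's auto-vectoriser handles well.
     static void splitIntsInterchanged(int count, int[] b, int bShift, int dec, int bMask,
                                       int[] c, int cIndex, int cMask) {
       int maxIter = (bShift - 1) / dec;
       for (int j = 0; j <= maxIter; ++j) {
         int shift = bShift - j * dec;
         int base = count * j;
         for (int i = 0; i < count; ++i) {
           b[base + i] = (c[cIndex + i] >>> shift) & bMask;
         }
       }
       // Mask c only after every shift above has read the unmasked values.
       for (int i = 0; i < count; ++i) {
         c[cIndex + i] &= cMask;
       }
     }

     public static void main(String[] args) {
       int count = 32, bShift = 24, dec = 8, bMask = 0xFF, cMask = 0xFF;
       Random rnd = new Random(42);
       int[] c1 = new int[count];
       for (int i = 0; i < count; i++) {
         c1[i] = rnd.nextInt();
       }
       int[] c2 = c1.clone();
       int maxIter = (bShift - 1) / dec;
       int[] b1 = new int[count * (maxIter + 1)];
       int[] b2 = new int[b1.length];
       splitIntsStrided(count, b1, bShift, dec, bMask, c1, 0, cMask);
       splitIntsInterchanged(count, b2, bShift, dec, bMask, c2, 0, cMask);
       // Both orders must leave identical contents in b and c.
       System.out.println(Arrays.equals(b1, b2) && Arrays.equals(c1, c2));
     }
   }
   ```

   Only the iteration order changes; every `b[count * j + i]` still depends solely on the original `c[cIndex + i]`, which is why the masking of `c` can be hoisted into its own pass after the shift loops.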
   
   ### Performance comparison on Intel i5-13600K
   Full JMH results are in the [benchmark repo](https://github.com/RamakrishnaChilaka/postings-decoding-util).
   
   Raw JMH throughput (higher is better):
   ```
   Benchmark                                   (count)  (dec)   Mode  Cnt      Score     Error   Units
   BitUnpackerBenchmark.baselineScalar              32      1  thrpt   25   1667.831 ±   5.508  ops/ms
   BitUnpackerBenchmark.baselineScalar              32      2  thrpt   25   3007.441 ±  54.965  ops/ms
   BitUnpackerBenchmark.baselineScalar              32      4  thrpt   25   5837.677 ±  50.676  ops/ms
   BitUnpackerBenchmark.baselineScalar              32      8  thrpt   25  10565.109 ± 133.238  ops/ms
   BitUnpackerBenchmark.baselineScalar            1024      1  thrpt   25     13.448 ±   0.036  ops/ms
   BitUnpackerBenchmark.baselineScalar            1024      2  thrpt   25     25.123 ±   0.024  ops/ms
   BitUnpackerBenchmark.baselineScalar            1024      4  thrpt   25    210.144 ±   2.174  ops/ms
   BitUnpackerBenchmark.baselineScalar            1024      8  thrpt   25    379.248 ±   9.067  ops/ms
   BitUnpackerBenchmark.optimizedWithFastPath       32      1  thrpt   25   3265.182 ±  10.690  ops/ms
   BitUnpackerBenchmark.optimizedWithFastPath       32      2  thrpt   25   6507.689 ±  19.246  ops/ms
   BitUnpackerBenchmark.optimizedWithFastPath       32      4  thrpt   25  12516.050 ±  72.828  ops/ms
   BitUnpackerBenchmark.optimizedWithFastPath       32      8  thrpt   25  25706.735 ± 105.216  ops/ms
   BitUnpackerBenchmark.optimizedWithFastPath     1024      1  thrpt   25    132.753 ±   0.223  ops/ms
   BitUnpackerBenchmark.optimizedWithFastPath     1024      2  thrpt   25    293.396 ±   0.527  ops/ms
   BitUnpackerBenchmark.optimizedWithFastPath     1024      4  thrpt   25    597.617 ±  32.484  ops/ms
   BitUnpackerBenchmark.optimizedWithFastPath     1024      8  thrpt   25   1202.047 ±  38.059  ops/ms
   ```
   
   Summarising (throughput in ops/ms; factor = optimized / baseline):
   | count | dec | baseline | optimized | **factor** |
   | ----: | --: | -------: | --------: | ---------: |
   |    32 |   1 |    1 668 |     3 265 |  **1.96×** |
   |    32 |   2 |    3 007 |     6 508 |  **2.16×** |
   |    32 |   4 |    5 838 |    12 516 |  **2.14×** |
   |    32 |   8 |   10 565 |    25 707 |  **2.43×** |
   |  1024 |   1 |     13.4 |     132.8 |   **9.9×** |
   |  1024 |   2 |     25.1 |     293.4 |  **11.7×** |
   |  1024 |   4 |      210 |     597.6 |  **2.84×** |
   |  1024 |   8 |      379 |     1 202 |  **3.17×** |
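   Each **factor** is the optimized score divided by the baseline score for the same `(count, dec)` pair. A small sketch that recomputes the column from the unrounded JMH scores listed above; the class name `SpeedupFactors` is hypothetical.

   ```java
   // Recomputes speed-up factors (optimized / baseline) from the JMH throughput scores.
   class SpeedupFactors {
     // Rows in the same order as the summary table; scores in ops/ms.
     static final int[] COUNT = {32, 32, 32, 32, 1024, 1024, 1024, 1024};
     static final int[] DEC = {1, 2, 4, 8, 1, 2, 4, 8};
     static final double[] BASELINE =
         {1667.831, 3007.441, 5837.677, 10565.109, 13.448, 25.123, 210.144, 379.248};
     static final double[] OPTIMIZED =
         {3265.182, 6507.689, 12516.050, 25706.735, 132.753, 293.396, 597.617, 1202.047};

     // Ratio of mean throughputs for one row.
     static double factor(int row) {
       return OPTIMIZED[row] / BASELINE[row];
     }

     public static void main(String[] args) {
       for (int r = 0; r < BASELINE.length; r++) {
         System.out.printf("count=%4d dec=%d factor=%.2fx%n", COUNT[r], DEC[r], factor(r));
       }
     }
   }
   ```

   The factor is a ratio of mean scores only; the JMH error bounds are not propagated into it.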
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@lucene.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

