This is an automated email from the ASF dual-hosted git repository.

viirya pushed a commit to branch branch-4.x
in repository https://gitbox.apache.org/repos/asf/spark.git


The following commit(s) were added to refs/heads/branch-4.x by this push:
     new 9f7ab8058665 [SPARK-57024][SQL] Use bulk fill APIs to materialize RLE runs in Parquet vectorized reader
9f7ab8058665 is described below

commit 9f7ab80586655ba99340db973671544355361382
Author: Liang-Chi Hsieh <[email protected]>
AuthorDate: Wed May 27 00:49:48 2026 -0700

    [SPARK-57024][SQL] Use bulk fill APIs to materialize RLE runs in Parquet vectorized reader
    
    ### What changes were proposed in this pull request?
    
    `VectorizedRleValuesReader` materializes RLE runs of nulls and
    definition levels with degenerate per-element loops:
    
    ```java
    // VectorizedRleValuesReader.java
    for (int k = 0; k < runLen; k++) {
      nulls.putNull(valueOff + k);
    }
    for (int k = 0; k < runLen; k++) {
      defLevels.putInt(levelIdx + k, runValue);
    }
    ```
    
    `WritableColumnVector` already exposes the bulk equivalents
    `putNulls(rowId, count)` and `putInts(rowId, count, value)`. This PR
    switches the three caller sites to the bulk APIs, and reimplements the
    bulk APIs themselves (which were also degenerate loops) using JIT
    intrinsics:
    
    - `OnHeapColumnVector.putNulls` -> `Arrays.fill(byte[], ..., (byte) 1)`
    - `OnHeapColumnVector.putInts(rowId, count, value)` ->
      `Arrays.fill(int[], ..., value)`
    - `OffHeapColumnVector.putNulls` -> `Platform.setMemory(addr, (byte) 1, count)`
      with a small-count fallback to an inline byte loop
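    
    The on-heap rewrites listed above can be sketched as follows. This is a
    minimal illustration, not the code in this commit: `OnHeapFillSketch` and
    its `nulls` / `intData` fields are hypothetical stand-ins for
    `OnHeapColumnVector`, and the `rowId` / `rowId + count` bounds are an
    assumption about how the elided `Arrays.fill` arguments would be filled in.
    
    ```java
    import java.util.Arrays;

    // Hypothetical stand-in for an on-heap column vector; not Spark's class.
    final class OnHeapFillSketch {
      final byte[] nulls;   // 1 marks a null slot, 0 a non-null slot
      final int[] intData;  // e.g. materialized definition levels

      OnHeapFillSketch(int capacity) {
        nulls = new byte[capacity];
        intData = new int[capacity];
      }

      // Per-element form, shown only for comparison with the bulk form.
      void putNullsLoop(int rowId, int count) {
        for (int i = 0; i < count; i++) {
          nulls[rowId + i] = (byte) 1;
        }
      }

      // Bulk form: Arrays.fill covers the half-open range [fromIndex, toIndex).
      void putNulls(int rowId, int count) {
        Arrays.fill(nulls, rowId, rowId + count, (byte) 1);
      }

      // Bulk form for repeating one int value across a run.
      void putInts(int rowId, int count, int value) {
        Arrays.fill(intData, rowId, rowId + count, value);
      }
    }
    ```
    
    Both `putNulls` variants write the same bytes; only the way the range is
    filled differs.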
    
    `Arrays.fill` is backed by HotSpot's `_jbyte_fill` / `_jint_fill`
    intrinsic stubs, and `Platform.setMemory` delegates to
    `Unsafe.setMemory`, which lowers to a native memset. Both are faster
    than the byte/int loops they replace once `runLen` grows beyond a
    handful of elements.
    
    For `OffHeapColumnVector.putNulls`, `Unsafe.setMemory` has a
    non-trivial JNI fixed cost, so it loses to the inline byte loop for
    very short fills (which are common in random null patterns). A
    threshold of 128 elements selects between the two paths.
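    
    The threshold dispatch can be sketched as below. This is an illustration
    under stated assumptions, not the patched method: a plain `byte[]` stands
    in for off-heap memory, `Arrays.fill` stands in for the bulk
    `Platform.setMemory` call, and the names `ThresholdFillSketch` and
    `BULK_FILL_THRESHOLD` are hypothetical. Whether the real comparison is
    strict or inclusive at exactly 128 is not stated here, so the `<` below is
    an assumption.
    
    ```java
    import java.util.Arrays;

    // Hypothetical sketch of choosing between an inline loop and a bulk fill.
    final class ThresholdFillSketch {
      // 128 is the threshold quoted in the commit message.
      static final int BULK_FILL_THRESHOLD = 128;

      static void putNulls(byte[] nulls, int rowId, int count) {
        if (count < BULK_FILL_THRESHOLD) {
          // Short fill: an inline loop avoids the fixed cost of the bulk call.
          for (int i = 0; i < count; i++) {
            nulls[rowId + i] = (byte) 1;
          }
        } else {
          // Long fill: the fixed cost is amortized over many elements.
          // Arrays.fill is a stand-in for the off-heap memset here.
          Arrays.fill(nulls, rowId, rowId + count, (byte) 1);
        }
      }
    }
    ```
    
    Both branches produce the same result; the threshold only decides which
    one is cheaper for a given `count`.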
    
    ### Why are the changes needed?
    
    The bulk-fill APIs on `WritableColumnVector` are the natural calls for
    `VectorizedRleValuesReader` to make, but until now neither side was
    actually bulk: the reader used per-element loops, and the bulk APIs
    were themselves implemented as per-element loops.
    
    #### Caller-side (Parquet RLE materialization)
    
    Measured on Apple M4 Max + OpenJDK 21.0.8 using
    `VectorizedRleValuesReaderBenchmark` (Group C, "Nullable batch decode
    with def-level materialization", 1M rows, BATCH_SIZE=4096), ns/row:
    
    | nullRatio | shape     | baseline | patched | delta  |
    | --------- | --------- | -------: | ------: | -----: |
    | 0.1       | random    | 4.0      | 4.2     | noise  |
    | 0.1       | clustered | 2.8      | 2.7     | +4%    |
    | 0.3       | random    | 6.2      | 6.3     | noise  |
    | 0.3       | clustered | 2.8      | 2.7     | +4%    |
    | 0.5       | random    | 7.1      | 7.1     | 0%     |
    | 0.5       | clustered | 2.8      | 2.6     | +7%    |
    | 0.9       | random    | 3.9      | 3.5     | +10%   |
    | 0.9       | clustered | 2.6      | 2.3     | +12%   |
    
    Gains concentrate on clustered null patterns (long RLE runs), which are
    common in real workloads: sparse dimension columns, ETL-staged nulls,
    and time-bucketed missing metrics. Random null patterns produce short
    runs where the bulk-API call costs about as much as the original loop,
    so most of those rows stay within noise.
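    
    To illustrate why the null pattern matters, the sketch below counts runs
    of equal adjacent values in a null mask. It is an idealized view that
    assumes each such run becomes one bulk-fill call; the actual Parquet
    RLE/bit-packed hybrid encoding may group values differently, and
    `RunLengthSketch` is a hypothetical helper, not part of the benchmark.
    
    ```java
    // Hypothetical helper: counts maximal runs of equal adjacent values.
    final class RunLengthSketch {
      static int countRuns(boolean[] isNull) {
        if (isNull.length == 0) {
          return 0;
        }
        int runs = 1;
        for (int i = 1; i < isNull.length; i++) {
          // A new run starts wherever the value changes.
          if (isNull[i] != isNull[i - 1]) {
            runs++;
          }
        }
        return runs;
      }
    }
    ```
    
    For the same null ratio of 0.5 over eight rows, a clustered mask
    (four nulls, then four non-nulls) yields 2 runs of length 4, while a
    strictly alternating mask yields 8 runs of length 1, so the per-call
    overhead is spread over far fewer elements in the second case.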
    
    #### Implementation-side (OffHeap putNulls)
    
    A separate micro-benchmark of `OffHeapColumnVector.putNulls` (run via
    `WritableColumnVectorBulkFillBenchmark`, not included in this PR) shows
    the threshold matters: a naive unconditional `Platform.setMemory`
    regresses small-count fills (`count <= 64`) by up to 7x against the
    original byte loop due to JNI fixed cost, while the count=4096+ path
    gains ~10x. The 128-element threshold picks the right path for both
    regimes; the crossover on the benchmarked hardware sits between 64 and
    512, so 128 is conservative.
    
    ### Does this PR introduce _any_ user-facing change?
    
    No.
    
    ### How was this patch tested?
    
    Existing tests; no behavior change. Ran locally:
    
    - `VectorizedRleValuesReaderSuite` (covers the modified caller paths)
    - `ColumnVectorSuite` and `ColumnarBatchSuite` (cover the modified
      `OnHeap/OffHeapColumnVector.putNulls` / `putInts` bulk APIs)
    - `ParquetIOSuite` (end-to-end vectorized reader coverage)
    
    237 tests, all pass.
    
    Benchmark numbers above produced by the existing
    `VectorizedRleValuesReaderBenchmark` (no benchmark changes in this PR).
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    Generated-by: Claude Code (Claude Opus 4.7)
    
    Closes #56072 from viirya/SPARK-57024.
    
    Authored-by: Liang-Chi Hsieh <[email protected]>
    Signed-off-by: Liang-Chi Hsieh <[email protected]>
    (cherry picked from commit febc98790605d7353d8fc254b54f622d85053a63)
    Signed-off-by: Liang-Chi Hsieh <[email protected]>
---
 ...rizedRleValuesReaderBenchmark-jdk21-results.txt | 180 ++++++++++-----------
 ...rizedRleValuesReaderBenchmark-jdk25-results.txt | 180 ++++++++++-----------
 .../VectorizedRleValuesReaderBenchmark-results.txt | 150 ++++++++---------
 .../parquet/VectorizedRleValuesReader.java         |  12 +-
 .../execution/vectorized/OffHeapColumnVector.java  |  21 ++-
 .../execution/vectorized/OnHeapColumnVector.java   |   8 +-
 6 files changed, 273 insertions(+), 278 deletions(-)

diff --git a/sql/core/benchmarks/VectorizedRleValuesReaderBenchmark-jdk21-results.txt b/sql/core/benchmarks/VectorizedRleValuesReaderBenchmark-jdk21-results.txt
index cb53e9dd5b2a..b5f6d62d2b6b 100644
--- a/sql/core/benchmarks/VectorizedRleValuesReaderBenchmark-jdk21-results.txt
+++ b/sql/core/benchmarks/VectorizedRleValuesReaderBenchmark-jdk21-results.txt
@@ -2,153 +2,153 @@
 Boolean decode
 
================================================================================================
 
-OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1010-azure
-AMD EPYC 7763 64-Core Processor
+OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1013-azure
+AMD EPYC 9V74 80-Core Processor
 RLE readBooleans decode:                  Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 
------------------------------------------------------------------------------------------------------------------------
-cold reader, trueRatio=0.0                            0              0         
  0      59466.7           0.0       1.0X
-reused reader, trueRatio=0.0                          0              0         
  0      82871.7           0.0       1.4X
-cold reader, trueRatio=0.1                            1              1         
  0        744.9           1.3       0.0X
-reused reader, trueRatio=0.1                          1              1         
  0        746.1           1.3       0.0X
-cold reader, trueRatio=0.5                            1              1         
  0        826.2           1.2       0.0X
-reused reader, trueRatio=0.5                          1              1         
  0        828.1           1.2       0.0X
-cold reader, trueRatio=0.9                            1              1         
  0        743.5           1.3       0.0X
-reused reader, trueRatio=0.9                          1              1         
  0        738.4           1.4       0.0X
-cold reader, trueRatio=1.0                            0              0         
  0      82409.3           0.0       1.4X
-reused reader, trueRatio=1.0                          0              0         
  0      82871.7           0.0       1.4X
+cold reader, trueRatio=0.0                            0              0         
  0       4886.6           0.2       1.0X
+reused reader, trueRatio=0.0                          0              0         
  0       4857.7           0.2       1.0X
+cold reader, trueRatio=0.1                            1              1         
  0       1186.3           0.8       0.2X
+reused reader, trueRatio=0.1                          1              1         
  0        789.7           1.3       0.2X
+cold reader, trueRatio=0.5                            1              1         
  0       1335.3           0.7       0.3X
+reused reader, trueRatio=0.5                          1              1         
  0        855.2           1.2       0.2X
+cold reader, trueRatio=0.9                            1              1         
  0       1186.3           0.8       0.2X
+reused reader, trueRatio=0.9                          1              1         
  0        787.6           1.3       0.2X
+cold reader, trueRatio=1.0                            0              0         
  0       4064.6           0.2       0.8X
+reused reader, trueRatio=1.0                          0              0         
  0       4855.5           0.2       1.0X
 
 
 
================================================================================================
 Integer decode
 
================================================================================================
 
-OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1010-azure
-AMD EPYC 7763 64-Core Processor
+OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1013-azure
+AMD EPYC 9V74 80-Core Processor
 RLE readIntegers dictionary-id decode:    Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 
------------------------------------------------------------------------------------------------------------------------
-PACKED cold, bitWidth=4                               2              2         
  0        489.7           2.0       1.0X
-PACKED reused, bitWidth=4                             2              2         
  0        487.7           2.1       1.0X
-RLE, bitWidth=4                                       0              0         
  0       4506.8           0.2       9.2X
-PACKED cold, bitWidth=8                               2              2         
  0        524.2           1.9       1.1X
-PACKED reused, bitWidth=8                             2              2         
  0        524.6           1.9       1.1X
-RLE, bitWidth=8                                       0              0         
  0       4507.0           0.2       9.2X
-PACKED cold, bitWidth=12                              3              3         
  0        417.6           2.4       0.9X
-PACKED reused, bitWidth=12                            3              3         
  0        415.4           2.4       0.8X
-RLE, bitWidth=12                                      0              0         
  0       4507.2           0.2       9.2X
-PACKED cold, bitWidth=20                              3              3         
  0        351.9           2.8       0.7X
-PACKED reused, bitWidth=20                            3              3         
  0        349.4           2.9       0.7X
-RLE, bitWidth=20                                      0              0         
  0       4499.6           0.2       9.2X
+PACKED cold, bitWidth=4                               2              2         
  0        542.4           1.8       1.0X
+PACKED reused, bitWidth=4                             2              2         
  0        541.2           1.8       1.0X
+RLE, bitWidth=4                                       0              0         
  0      23543.9           0.0      43.4X
+PACKED cold, bitWidth=8                               2              2         
  0        621.3           1.6       1.1X
+PACKED reused, bitWidth=8                             2              2         
  0        618.1           1.6       1.1X
+RLE, bitWidth=8                                       0              0         
  0      23511.7           0.0      43.3X
+PACKED cold, bitWidth=12                              2              2         
  0        482.5           2.1       0.9X
+PACKED reused, bitWidth=12                            2              2         
  0        480.5           2.1       0.9X
+RLE, bitWidth=12                                      0              0         
  0      23507.0           0.0      43.3X
+PACKED cold, bitWidth=20                              3              3         
  0        401.9           2.5       0.7X
+PACKED reused, bitWidth=20                            3              3         
  0        400.6           2.5       0.7X
+RLE, bitWidth=20                                      0              0         
  0      23570.4           0.0      43.5X
 
 
 
================================================================================================
 Nullable batch decode with def-level materialization
 
================================================================================================
 
-OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1010-azure
-AMD EPYC 7763 64-Core Processor
+OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1013-azure
+AMD EPYC 9V74 80-Core Processor
 Nullable batch with def-levels:           Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 
------------------------------------------------------------------------------------------------------------------------
-nullRatio=0.0, n/a                                    0              0         
  0       6695.3           0.1       1.0X
-nullRatio=0.1, random                                 9              9         
  0        123.2           8.1       0.0X
-nullRatio=0.1, clustered                              6              6         
  1        174.1           5.7       0.0X
-nullRatio=0.3, random                                12             12         
  0         85.3          11.7       0.0X
-nullRatio=0.3, clustered                              6              6         
  0        172.7           5.8       0.0X
-nullRatio=0.5, random                                14             14         
  0         76.5          13.1       0.0X
-nullRatio=0.5, clustered                              6              6         
  0        173.6           5.8       0.0X
-nullRatio=0.9, random                                 8              8         
  0        132.0           7.6       0.0X
-nullRatio=0.9, clustered                              6              6         
  0        182.4           5.5       0.0X
-nullRatio=1.0, random                                 0              0         
  0       5048.8           0.2       0.8X
+nullRatio=0.0, n/a                                    0              0         
  0       8154.1           0.1       1.0X
+nullRatio=0.1, random                                 7              8         
  0        140.5           7.1       0.0X
+nullRatio=0.1, clustered                              5              5         
  0        204.1           4.9       0.0X
+nullRatio=0.3, random                                11             11         
  0         96.0          10.4       0.0X
+nullRatio=0.3, clustered                              5              5         
  0        207.8           4.8       0.0X
+nullRatio=0.5, random                                12             12         
  0         87.4          11.4       0.0X
+nullRatio=0.5, clustered                              5              5         
  0        213.6           4.7       0.0X
+nullRatio=0.9, random                                 6              7         
  0        162.1           6.2       0.0X
+nullRatio=0.9, clustered                              4              5         
  0        235.1           4.3       0.0X
+nullRatio=1.0, random                                 0              0         
  0      22458.3           0.0       2.8X
 
 
 
================================================================================================
 Nullable batch decode without def-level materialization
 
================================================================================================
 
-OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1010-azure
-AMD EPYC 7763 64-Core Processor
+OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1013-azure
+AMD EPYC 9V74 80-Core Processor
 Nullable batch without def-levels:        Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 
------------------------------------------------------------------------------------------------------------------------
-nullRatio=0.0, n/a                                    0              0         
  0      12199.7           0.1       1.0X
-nullRatio=0.1, random                                 7              7         
  0        147.8           6.8       0.0X
-nullRatio=0.1, clustered                              5              5         
  0        204.6           4.9       0.0X
-nullRatio=0.3, random                                10             10         
  0        100.8           9.9       0.0X
-nullRatio=0.3, clustered                              5              5         
  0        200.6           5.0       0.0X
-nullRatio=0.5, random                                12             12         
  0         89.4          11.2       0.0X
-nullRatio=0.5, clustered                              5              5         
  0        199.3           5.0       0.0X
-nullRatio=0.9, random                                 7              7         
  0        153.3           6.5       0.0X
-nullRatio=0.9, clustered                              5              5         
  0        202.2           4.9       0.0X
-nullRatio=1.0, random                                 0              0         
  0      11887.9           0.1       1.0X
+nullRatio=0.0, n/a                                    0              0         
  0      12686.2           0.1       1.0X
+nullRatio=0.1, random                                 6              6         
  0        172.3           5.8       0.0X
+nullRatio=0.1, clustered                              4              4         
  0        251.2           4.0       0.0X
+nullRatio=0.3, random                                 9              9         
  0        115.5           8.7       0.0X
+nullRatio=0.3, clustered                              4              4         
  0        253.9           3.9       0.0X
+nullRatio=0.5, random                                10             10         
  0        105.1           9.5       0.0X
+nullRatio=0.5, clustered                              4              4         
  0        259.7           3.9       0.0X
+nullRatio=0.9, random                                 5              5         
  0        198.9           5.0       0.0X
+nullRatio=0.9, clustered                              4              4         
  0        282.6           3.5       0.0X
+nullRatio=1.0, random                                 0              0         
  0      96058.6           0.0       7.6X
 
 
 
================================================================================================
 Nullable batch decode with row-index filtering (with def-levels)
 
================================================================================================
 
-OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1010-azure
-AMD EPYC 7763 64-Core Processor
+OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1013-azure
+AMD EPYC 9V74 80-Core Processor
 Nullable batch with def-levels, row-index filtered:  Best Time(ms)   Avg 
Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 
----------------------------------------------------------------------------------------------------------------------------------
-nullRatio=0.0, contiguous 50%                                   1              
1           0        757.3           1.3       1.0X
-nullRatio=0.3, contiguous 50%                                   9              
9           0        119.6           8.4       0.2X
-nullRatio=0.9, contiguous 50%                                   7              
7           0        158.9           6.3       0.2X
-nullRatio=0.0, alt 1000-row windows                             3              
3           0        377.7           2.6       0.5X
-nullRatio=0.3, alt 1000-row windows                            10             
10           0        102.3           9.8       0.1X
-nullRatio=0.9, alt 1000-row windows                             8              
8           1        130.9           7.6       0.2X
+nullRatio=0.0, contiguous 50%                                   1              
1           0       1247.7           0.8       1.0X
+nullRatio=0.3, contiguous 50%                                   8              
8           0        135.1           7.4       0.1X
+nullRatio=0.9, contiguous 50%                                   5              
6           0        191.0           5.2       0.2X
+nullRatio=0.0, alt 1000-row windows                             2              
2           0        433.2           2.3       0.3X
+nullRatio=0.3, alt 1000-row windows                             9              
9           0        113.6           8.8       0.1X
+nullRatio=0.9, alt 1000-row windows                             7              
7           0        150.4           6.6       0.1X
 
 
 
================================================================================================
 Nullable batch decode with row-index filtering (without def-levels)
 
================================================================================================
 
-OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1010-azure
-AMD EPYC 7763 64-Core Processor
+OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1013-azure
+AMD EPYC 9V74 80-Core Processor
 Nullable batch without def-levels, row-index filtered:  Best Time(ms)   Avg 
Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 
-------------------------------------------------------------------------------------------------------------------------------------
-nullRatio=0.0, contiguous 50%                                      1           
   2           0        767.0           1.3       1.0X
-nullRatio=0.3, contiguous 50%                                      8           
   8           0        129.1           7.7       0.2X
-nullRatio=0.9, contiguous 50%                                      6           
   7           0        166.0           6.0       0.2X
-nullRatio=0.0, alt 1000-row windows                                3           
   3           0        377.2           2.7       0.5X
-nullRatio=0.3, alt 1000-row windows                               10           
  10           0        109.0           9.2       0.1X
-nullRatio=0.9, alt 1000-row windows                                8           
   8           0        137.5           7.3       0.2X
+nullRatio=0.0, contiguous 50%                                      1           
   1           0       1004.6           1.0       1.0X
+nullRatio=0.3, contiguous 50%                                      7           
   7           1        154.7           6.5       0.2X
+nullRatio=0.9, contiguous 50%                                      5           
   5           0        214.0           4.7       0.2X
+nullRatio=0.0, alt 1000-row windows                                2           
   2           0        464.0           2.2       0.5X
+nullRatio=0.3, alt 1000-row windows                                8           
   8           0        130.3           7.7       0.1X
+nullRatio=0.9, alt 1000-row windows                                6           
   6           0        169.6           5.9       0.2X
 
 
 
================================================================================================
 Single-value reads
 
================================================================================================
 
-OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1010-azure
-AMD EPYC 7763 64-Core Processor
+OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1013-azure
+AMD EPYC 9V74 80-Core Processor
 Single-value reads:                       Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 
------------------------------------------------------------------------------------------------------------------------
-readBoolean                                           3              3         
  0        311.5           3.2       1.0X
-readInteger, bitWidth=4                               4              4         
  0        275.7           3.6       0.9X
-readValueDictionaryId, bitWidth=4                     4              4         
  0        276.2           3.6       0.9X
-readInteger, bitWidth=8                               4              4         
  0        289.2           3.5       0.9X
-readValueDictionaryId, bitWidth=8                     4              4         
  0        289.8           3.5       0.9X
-readInteger, bitWidth=12                              4              4         
  0        252.3           4.0       0.8X
-readValueDictionaryId, bitWidth=12                    4              4         
  0        252.1           4.0       0.8X
-readInteger, bitWidth=20                              5              5         
  0        227.7           4.4       0.7X
-readValueDictionaryId, bitWidth=20                    5              5         
  0        227.2           4.4       0.7X
+readBoolean                                           3              3         
  0        361.8           2.8       1.0X
+readInteger, bitWidth=4                               3              3         
  0        314.3           3.2       0.9X
+readValueDictionaryId, bitWidth=4                     3              3         
  0        315.3           3.2       0.9X
+readInteger, bitWidth=8                               3              3         
  0        339.9           2.9       0.9X
+readValueDictionaryId, bitWidth=8                     3              3         
  0        339.8           2.9       0.9X
+readInteger, bitWidth=12                              4              4         
  0        293.8           3.4       0.8X
+readValueDictionaryId, bitWidth=12                    4              4         
  0        293.9           3.4       0.8X
+readInteger, bitWidth=20                              4              4         
  0        262.1           3.8       0.7X
+readValueDictionaryId, bitWidth=20                    4              4         
  0        261.9           3.8       0.7X
 
 
 
================================================================================================
 Skip
 
================================================================================================
 
-OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1010-azure
-AMD EPYC 7763 64-Core Processor
+OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1013-azure
+AMD EPYC 9V74 80-Core Processor
 Skip:                                     Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 
------------------------------------------------------------------------------------------------------------------------
-skipBooleans, trueRatio=0.0                           0              0         
  0   26214400.0           0.0       1.0X
-skipBooleans, trueRatio=0.5                           2              2         
  0        559.1           1.8       0.0X
-skipBooleans, trueRatio=1.0                           0              0         
  0   26214400.0           0.0       1.0X
-skipIntegers PACKED, bitWidth=4                       2              2         
  0        502.4           2.0       0.0X
-skipIntegers RLE, bitWidth=4                          0              0         
  0   21399510.2           0.0       0.8X
-skipIntegers PACKED, bitWidth=8                       2              2         
  0        551.4           1.8       0.0X
-skipIntegers RLE, bitWidth=8                          0              0         
  0   21399510.2           0.0       0.8X
-skipIntegers PACKED, bitWidth=12                      2              2         
  0        431.5           2.3       0.0X
-skipIntegers RLE, bitWidth=12                         0              0         
  0   21399510.2           0.0       0.8X
-skipIntegers PACKED, bitWidth=20                      3              3         
  0        364.1           2.7       0.0X
-skipIntegers RLE, bitWidth=20                         0              0         
  0   21399510.2           0.0       0.8X
+skipBooleans, trueRatio=0.0                           0              0         
  0   34952533.3           0.0       1.0X
+skipBooleans, trueRatio=0.5                           2              2         
  0        662.2           1.5       0.0X
+skipBooleans, trueRatio=1.0                           0              0         
  0   34952533.3           0.0       1.0X
+skipIntegers PACKED, bitWidth=4                       2              2         
  0        553.8           1.8       0.0X
+skipIntegers RLE, bitWidth=4                          0              0         
  0   34952533.3           0.0       1.0X
+skipIntegers PACKED, bitWidth=8                       2              2         
  0        637.9           1.6       0.0X
+skipIntegers RLE, bitWidth=8                          0              0         
  0   34952533.3           0.0       1.0X
+skipIntegers PACKED, bitWidth=12                      2              2         
  0        493.4           2.0       0.0X
+skipIntegers RLE, bitWidth=12                         0              0         
  0   26214400.0           0.0       0.8X
+skipIntegers PACKED, bitWidth=20                      3              3         
  0        415.6           2.4       0.0X
+skipIntegers RLE, bitWidth=20                         0              0         
  0   26214400.0           0.0       0.8X
 
 
diff --git a/sql/core/benchmarks/VectorizedRleValuesReaderBenchmark-jdk25-results.txt b/sql/core/benchmarks/VectorizedRleValuesReaderBenchmark-jdk25-results.txt
index 3029a4b3268b..860fafe62b1e 100644
--- a/sql/core/benchmarks/VectorizedRleValuesReaderBenchmark-jdk25-results.txt
+++ b/sql/core/benchmarks/VectorizedRleValuesReaderBenchmark-jdk25-results.txt
@@ -2,153 +2,153 @@
 Boolean decode
 
================================================================================================
 
-OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1011-azure
-AMD EPYC 7763 64-Core Processor
+OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1013-azure
+Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
 RLE readBooleans decode:                  Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 
------------------------------------------------------------------------------------------------------------------------
-cold reader, trueRatio=0.0                            0              0         
  0       5323.3           0.2       1.0X
-reused reader, trueRatio=0.0                          0              0         
  0       4464.5           0.2       0.8X
-cold reader, trueRatio=0.1                            2              2         
  0        669.3           1.5       0.1X
-reused reader, trueRatio=0.1                          2              2         
  0        672.6           1.5       0.1X
-cold reader, trueRatio=0.5                            1              1         
  0        722.5           1.4       0.1X
-reused reader, trueRatio=0.5                          1              1         
  0        724.3           1.4       0.1X
-cold reader, trueRatio=0.9                            2              2         
  0        669.6           1.5       0.1X
-reused reader, trueRatio=0.9                          2              2         
  0        674.0           1.5       0.1X
-cold reader, trueRatio=1.0                            0              0         
  0       4574.1           0.2       0.9X
-reused reader, trueRatio=1.0                          0              0         
  0       4460.9           0.2       0.8X
+cold reader, trueRatio=0.0                            0              0         
  0      64903.2           0.0       1.0X
+reused reader, trueRatio=0.0                          0              0         
  0      63220.5           0.0       1.0X
+cold reader, trueRatio=0.1                            1              1         
  0        830.6           1.2       0.0X
+reused reader, trueRatio=0.1                          1              1         
  0        789.9           1.3       0.0X
+cold reader, trueRatio=0.5                            1              1         
  0        927.6           1.1       0.0X
+reused reader, trueRatio=0.5                          1              1         
  0        905.4           1.1       0.0X
+cold reader, trueRatio=0.9                            1              1         
  0        831.8           1.2       0.0X
+reused reader, trueRatio=0.9                          1              1         
  0        833.2           1.2       0.0X
+cold reader, trueRatio=1.0                            0              0         
  0      64176.3           0.0       1.0X
+reused reader, trueRatio=1.0                          0              0         
  0      62706.4           0.0       1.0X
 
 
 
================================================================================================
 Integer decode
 
================================================================================================
 
-OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1011-azure
-AMD EPYC 7763 64-Core Processor
+OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1013-azure
+Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
 RLE readIntegers dictionary-id decode:    Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 
------------------------------------------------------------------------------------------------------------------------
-PACKED cold, bitWidth=4                               2              2         
  0        516.7           1.9       1.0X
-PACKED reused, bitWidth=4                             2              2         
  0        516.1           1.9       1.0X
-RLE, bitWidth=4                                       0              0         
  0      18696.2           0.1      36.2X
-PACKED cold, bitWidth=8                               2              2         
  0        570.0           1.8       1.1X
-PACKED reused, bitWidth=8                             2              2         
  0        567.0           1.8       1.1X
-RLE, bitWidth=8                                       0              0         
  0      18583.5           0.1      36.0X
-PACKED cold, bitWidth=12                              2              2         
  0        454.6           2.2       0.9X
-PACKED reused, bitWidth=12                            2              2         
  0        452.6           2.2       0.9X
-RLE, bitWidth=12                                      0              0         
  0      18696.2           0.1      36.2X
-PACKED cold, bitWidth=20                              3              3         
  0        373.2           2.7       0.7X
-PACKED reused, bitWidth=20                            3              3         
  0        369.4           2.7       0.7X
-RLE, bitWidth=20                                      0              0         
  0      15516.8           0.1      30.0X
+PACKED cold, bitWidth=4                               2              2         
  0        534.3           1.9       1.0X
+PACKED reused, bitWidth=4                             2              2         
  0        522.1           1.9       1.0X
+RLE, bitWidth=4                                       0              0         
  0       7632.3           0.1      14.3X
+PACKED cold, bitWidth=8                               2              2         
  0        584.0           1.7       1.1X
+PACKED reused, bitWidth=8                             2              2         
  0        579.4           1.7       1.1X
+RLE, bitWidth=8                                       0              0         
  0       7615.7           0.1      14.3X
+PACKED cold, bitWidth=12                              2              2         
  0        443.8           2.3       0.8X
+PACKED reused, bitWidth=12                            2              3         
  0        436.4           2.3       0.8X
+RLE, bitWidth=12                                      0              0         
  0       7589.5           0.1      14.2X
+PACKED cold, bitWidth=20                              3              3         
  0        378.5           2.6       0.7X
+PACKED reused, bitWidth=20                            3              3         
  0        382.8           2.6       0.7X
+RLE, bitWidth=20                                      0              0         
  0       7590.5           0.1      14.2X
 
 
 
================================================================================================
 Nullable batch decode with def-level materialization
 
================================================================================================
 
-OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1011-azure
-AMD EPYC 7763 64-Core Processor
+OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1013-azure
+Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
 Nullable batch with def-levels:           Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 
------------------------------------------------------------------------------------------------------------------------
-nullRatio=0.0, n/a                                    0              0         
  0       6608.2           0.2       1.0X
-nullRatio=0.1, random                                 9              9         
  0        119.1           8.4       0.0X
-nullRatio=0.1, clustered                              6              6         
  0        166.2           6.0       0.0X
-nullRatio=0.3, random                                13             13         
  0         81.2          12.3       0.0X
-nullRatio=0.3, clustered                              6              6         
  0        166.0           6.0       0.0X
-nullRatio=0.5, random                                15             15         
  1         71.3          14.0       0.0X
-nullRatio=0.5, clustered                              6              6         
  0        166.6           6.0       0.0X
-nullRatio=0.9, random                                 8              8         
  0        127.6           7.8       0.0X
-nullRatio=0.9, clustered                              6              6         
  0        175.5           5.7       0.0X
-nullRatio=1.0, random                                 0              0         
  0       8275.6           0.1       1.3X
+nullRatio=0.0, n/a                                    0              0         
  0       5285.1           0.2       1.0X
+nullRatio=0.1, random                                 9              9         
  0        113.1           8.8       0.0X
+nullRatio=0.1, clustered                              6              6         
  0        173.0           5.8       0.0X
+nullRatio=0.3, random                                14             14         
  0         75.5          13.2       0.0X
+nullRatio=0.3, clustered                              6              6         
  0        173.1           5.8       0.0X
+nullRatio=0.5, random                                15             16         
  0         67.7          14.8       0.0X
+nullRatio=0.5, clustered                              6              6         
  0        176.2           5.7       0.0X
+nullRatio=0.9, random                                 8              8         
  0        128.6           7.8       0.0X
+nullRatio=0.9, clustered                              5              6         
  0        196.2           5.1       0.0X
+nullRatio=1.0, random                                 0              0         
  0      35936.0           0.0       6.8X
 
 
 
================================================================================================
 Nullable batch decode without def-level materialization
 
================================================================================================
 
-OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1011-azure
-AMD EPYC 7763 64-Core Processor
+OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1013-azure
+Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
 Nullable batch without def-levels:        Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 
------------------------------------------------------------------------------------------------------------------------
-nullRatio=0.0, n/a                                    0              0         
  0      11464.6           0.1       1.0X
-nullRatio=0.1, random                                 7              7         
  0        148.7           6.7       0.0X
-nullRatio=0.1, clustered                              5              5         
  0        207.2           4.8       0.0X
-nullRatio=0.3, random                                10             10         
  0        100.6           9.9       0.0X
-nullRatio=0.3, clustered                              5              5         
  0        204.8           4.9       0.0X
-nullRatio=0.5, random                                12             12         
  0         90.6          11.0       0.0X
-nullRatio=0.5, clustered                              5              5         
  0        205.1           4.9       0.0X
-nullRatio=0.9, random                                 7              7         
  0        158.9           6.3       0.0X
-nullRatio=0.9, clustered                              5              5         
  0        212.4           4.7       0.0X
-nullRatio=1.0, random                                 0              0         
  0      11983.2           0.1       1.0X
+nullRatio=0.0, n/a                                    0              0         
  0       8150.4           0.1       1.0X
+nullRatio=0.1, random                                 7              7         
  0        147.0           6.8       0.0X
+nullRatio=0.1, clustered                              5              5         
  0        218.2           4.6       0.0X
+nullRatio=0.3, random                                11             11         
  0         98.0          10.2       0.0X
+nullRatio=0.3, clustered                              5              5         
  1        218.0           4.6       0.0X
+nullRatio=0.5, random                                12             12         
  0         90.0          11.1       0.0X
+nullRatio=0.5, clustered                              5              5         
  0        221.1           4.5       0.0X
+nullRatio=0.9, random                                 6              6         
  0        167.0           6.0       0.0X
+nullRatio=0.9, clustered                              4              5         
  0        237.7           4.2       0.0X
+nullRatio=1.0, random                                 0              0         
  0     115647.5           0.0      14.2X
 
 
 
================================================================================================
 Nullable batch decode with row-index filtering (with def-levels)
 
================================================================================================
 
-OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1011-azure
-AMD EPYC 7763 64-Core Processor
+OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1013-azure
+Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
 Nullable batch with def-levels, row-index filtered:  Best Time(ms)   Avg 
Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 
----------------------------------------------------------------------------------------------------------------------------------
-nullRatio=0.0, contiguous 50%                                   1              
1           0        762.9           1.3       1.0X
-nullRatio=0.3, contiguous 50%                                   9              
9           0        116.5           8.6       0.2X
-nullRatio=0.9, contiguous 50%                                   7              
7           0        156.2           6.4       0.2X
-nullRatio=0.0, alt 1000-row windows                             2              
2           0        433.5           2.3       0.6X
-nullRatio=0.3, alt 1000-row windows                            10             
10           0        103.5           9.7       0.1X
-nullRatio=0.9, alt 1000-row windows                             8              
8           1        136.1           7.3       0.2X
+nullRatio=0.0, contiguous 50%                                   1              
1           0        817.3           1.2       1.0X
+nullRatio=0.3, contiguous 50%                                   9             
10           0        111.1           9.0       0.1X
+nullRatio=0.9, contiguous 50%                                   7              
7           0        159.5           6.3       0.2X
+nullRatio=0.0, alt 1000-row windows                             3              
3           0        327.9           3.1       0.4X
+nullRatio=0.3, alt 1000-row windows                            12             
12           1         90.9          11.0       0.1X
+nullRatio=0.9, alt 1000-row windows                             9              
9           0        119.7           8.4       0.1X
 
 
 
================================================================================================
 Nullable batch decode with row-index filtering (without def-levels)
 
================================================================================================
 
-OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1011-azure
-AMD EPYC 7763 64-Core Processor
+OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1013-azure
+Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
 Nullable batch without def-levels, row-index filtered:  Best Time(ms)   Avg 
Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 
-------------------------------------------------------------------------------------------------------------------------------------
-nullRatio=0.0, contiguous 50%                                      1           
   2           0        732.5           1.4       1.0X
-nullRatio=0.3, contiguous 50%                                      8           
   8           0        131.7           7.6       0.2X
-nullRatio=0.9, contiguous 50%                                      6           
   6           0        173.9           5.8       0.2X
-nullRatio=0.0, alt 1000-row windows                                2           
   2           0        423.8           2.4       0.6X
-nullRatio=0.3, alt 1000-row windows                                9           
   9           0        115.8           8.6       0.2X
-nullRatio=0.9, alt 1000-row windows                                7           
   7           0        147.6           6.8       0.2X
+nullRatio=0.0, contiguous 50%                                      1           
   1           0        842.7           1.2       1.0X
+nullRatio=0.3, contiguous 50%                                      8           
   8           0        128.2           7.8       0.2X
+nullRatio=0.9, contiguous 50%                                      6           
   6           0        181.8           5.5       0.2X
+nullRatio=0.0, alt 1000-row windows                                3           
   3           0        331.8           3.0       0.4X
+nullRatio=0.3, alt 1000-row windows                               10           
  11           0        102.5           9.8       0.1X
+nullRatio=0.9, alt 1000-row windows                                8           
   8           0        134.2           7.4       0.2X
 
 
 
================================================================================================
 Single-value reads
 
================================================================================================
 
-OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1011-azure
-AMD EPYC 7763 64-Core Processor
+OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1013-azure
+Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
 Single-value reads:                       Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 
------------------------------------------------------------------------------------------------------------------------
-readBoolean                                           3              3         
  0        321.6           3.1       1.0X
-readInteger, bitWidth=4                               3              4         
  0        305.3           3.3       0.9X
-readValueDictionaryId, bitWidth=4                     3              4         
  0        305.0           3.3       0.9X
-readInteger, bitWidth=8                               3              3         
  0        322.7           3.1       1.0X
-readValueDictionaryId, bitWidth=8                     3              3         
  0        322.7           3.1       1.0X
-readInteger, bitWidth=12                              4              4         
  0        282.4           3.5       0.9X
-readValueDictionaryId, bitWidth=12                    4              4         
  0        282.4           3.5       0.9X
-readInteger, bitWidth=20                              4              4         
  0        248.5           4.0       0.8X
-readValueDictionaryId, bitWidth=20                    4              4         
  0        248.4           4.0       0.8X
+readBoolean                                           4              4         
  0        263.6           3.8       1.0X
+readInteger, bitWidth=4                               3              4         
  0        301.6           3.3       1.1X
+readValueDictionaryId, bitWidth=4                     3              4         
  0        300.4           3.3       1.1X
+readInteger, bitWidth=8                               3              4         
  0        314.9           3.2       1.2X
+readValueDictionaryId, bitWidth=8                     3              3         
  0        315.7           3.2       1.2X
+readInteger, bitWidth=12                              4              4         
  0        276.9           3.6       1.1X
+readValueDictionaryId, bitWidth=12                    4              4         
  0        275.9           3.6       1.0X
+readInteger, bitWidth=20                              4              4         
  0        249.3           4.0       0.9X
+readValueDictionaryId, bitWidth=20                    4              4         
  0        247.9           4.0       0.9X
 
 
 
================================================================================================
 Skip
 
================================================================================================
 
-OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1011-azure
-AMD EPYC 7763 64-Core Processor
+OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1013-azure
+Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
 Skip:                                     Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 
------------------------------------------------------------------------------------------------------------------------
-skipBooleans, trueRatio=0.0                           0              0         
  0   26214400.0           0.0       1.0X
-skipBooleans, trueRatio=0.5                           2              2         
  0        621.6           1.6       0.0X
-skipBooleans, trueRatio=1.0                           0              0         
  0   26214400.0           0.0       1.0X
-skipIntegers PACKED, bitWidth=4                       2              2         
  0        537.5           1.9       0.0X
-skipIntegers RLE, bitWidth=4                          0              0         
  0   26214400.0           0.0       1.0X
-skipIntegers PACKED, bitWidth=8                       2              2         
  0        599.4           1.7       0.0X
-skipIntegers RLE, bitWidth=8                          0              0         
  0   26214400.0           0.0       1.0X
-skipIntegers PACKED, bitWidth=12                      2              2         
  0        471.2           2.1       0.0X
-skipIntegers RLE, bitWidth=12                         0              0         
  0   21399510.2           0.0       0.8X
-skipIntegers PACKED, bitWidth=20                      3              3         
  0        384.7           2.6       0.0X
-skipIntegers RLE, bitWidth=20                         0              0         
  0   21399510.2           0.0       0.8X
+skipBooleans, trueRatio=0.0                           0              0         
  0   29959314.3           0.0       1.0X
+skipBooleans, trueRatio=0.5                           2              2         
  0        611.9           1.6       0.0X
+skipBooleans, trueRatio=1.0                           0              0         
  0   29959314.3           0.0       1.0X
+skipIntegers PACKED, bitWidth=4                       2              2         
  0        542.6           1.8       0.0X
+skipIntegers RLE, bitWidth=4                          0              0         
  0   29959314.3           0.0       1.0X
+skipIntegers PACKED, bitWidth=8                       2              2         
  0        590.8           1.7       0.0X
+skipIntegers RLE, bitWidth=8                          0              0         
  0   29959314.3           0.0       1.0X
+skipIntegers PACKED, bitWidth=12                      2              2         
  0        467.9           2.1       0.0X
+skipIntegers RLE, bitWidth=12                         0              0         
  0   26214400.0           0.0       0.9X
+skipIntegers PACKED, bitWidth=20                      3              3         
  0        404.2           2.5       0.0X
+skipIntegers RLE, bitWidth=20                         0              0         
  0   23831272.7           0.0       0.8X
 
 
diff --git a/sql/core/benchmarks/VectorizedRleValuesReaderBenchmark-results.txt b/sql/core/benchmarks/VectorizedRleValuesReaderBenchmark-results.txt
index 749296283bd3..2ec7742a9d13 100644
--- a/sql/core/benchmarks/VectorizedRleValuesReaderBenchmark-results.txt
+++ b/sql/core/benchmarks/VectorizedRleValuesReaderBenchmark-results.txt
@@ -2,113 +2,113 @@
 Boolean decode
 
================================================================================================
 
-OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1010-azure
+OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1013-azure
 AMD EPYC 7763 64-Core Processor
 RLE readBooleans decode:                  Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 
------------------------------------------------------------------------------------------------------------------------
-cold reader, trueRatio=0.0                            0              0         
  0      66239.8           0.0       1.0X
-reused reader, trueRatio=0.0                          0              0         
  0      57887.6           0.0       0.9X
-cold reader, trueRatio=0.1                            1              1         
  0        893.5           1.1       0.0X
-reused reader, trueRatio=0.1                          1              1         
  0        895.6           1.1       0.0X
-cold reader, trueRatio=0.5                            1              1         
  0       1018.7           1.0       0.0X
-reused reader, trueRatio=0.5                          1              1         
  0       1029.4           1.0       0.0X
-cold reader, trueRatio=0.9                            1              1         
  0        891.9           1.1       0.0X
-reused reader, trueRatio=0.9                          1              1         
  0        894.8           1.1       0.0X
-cold reader, trueRatio=1.0                            0              0         
  0      67001.7           0.0       1.0X
-reused reader, trueRatio=1.0                          0              0         
  0      72380.5           0.0       1.1X
+cold reader, trueRatio=0.0                            0              0         
  0      25842.3           0.0       1.0X
+reused reader, trueRatio=0.0                          0              0         
  0      25810.5           0.0       1.0X
+cold reader, trueRatio=0.1                            2              2         
  0        485.6           2.1       0.0X
+reused reader, trueRatio=0.1                          2              2         
  0        485.6           2.1       0.0X
+cold reader, trueRatio=0.5                            2              2         
  0        500.3           2.0       0.0X
+reused reader, trueRatio=0.5                          2              2         
  0        483.7           2.1       0.0X
+cold reader, trueRatio=0.9                            2              2         
  0        484.3           2.1       0.0X
+reused reader, trueRatio=0.9                          2              2         
  0        484.7           2.1       0.0X
+cold reader, trueRatio=1.0                            0              0         
  0      25804.1           0.0       1.0X
+reused reader, trueRatio=1.0                          0              0         
  0      25791.4           0.0       1.0X
 
 
 
================================================================================================
 Integer decode
 
================================================================================================
 
-OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1010-azure
+OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1013-azure
 AMD EPYC 7763 64-Core Processor
 RLE readIntegers dictionary-id decode:    Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 
------------------------------------------------------------------------------------------------------------------------
-PACKED cold, bitWidth=4                               2              2         
  0        505.6           2.0       1.0X
-PACKED reused, bitWidth=4                             2              2         
  0        504.6           2.0       1.0X
-RLE, bitWidth=4                                       0              0         
  0      18249.1           0.1      36.1X
-PACKED cold, bitWidth=8                               2              2         
  0        497.6           2.0       1.0X
-PACKED reused, bitWidth=8                             2              2         
  0        496.2           2.0       1.0X
-RLE, bitWidth=8                                       0              0         
  0      18123.0           0.1      35.8X
-PACKED cold, bitWidth=12                              3              3         
  0        370.2           2.7       0.7X
-PACKED reused, bitWidth=12                            3              3         
  0        369.6           2.7       0.7X
-RLE, bitWidth=12                                      0              0         
  0      18573.3           0.1      36.7X
-PACKED cold, bitWidth=20                              3              3         
  0        315.1           3.2       0.6X
-PACKED reused, bitWidth=20                            3              3         
  0        316.2           3.2       0.6X
-RLE, bitWidth=20                                      0              0         
  0      18570.0           0.1      36.7X
+PACKED cold, bitWidth=4                               2              2         
  0        485.7           2.1       1.0X
+PACKED reused, bitWidth=4                             2              2         
  0        485.2           2.1       1.0X
+RLE, bitWidth=4                                       0              0         
  0      20688.1           0.0      42.6X
+PACKED cold, bitWidth=8                               2              2         
  0        478.8           2.1       1.0X
+PACKED reused, bitWidth=8                             2              2         
  0        476.5           2.1       1.0X
+RLE, bitWidth=8                                       0              0         
  0      15710.2           0.1      32.3X
+PACKED cold, bitWidth=12                              3              3         
  0        357.4           2.8       0.7X
+PACKED reused, bitWidth=12                            3              3         
  0        357.3           2.8       0.7X
+RLE, bitWidth=12                                      0              0         
  0      20684.0           0.0      42.6X
+PACKED cold, bitWidth=20                              3              4         
  0        303.0           3.3       0.6X
+PACKED reused, bitWidth=20                            3              4         
  0        302.1           3.3       0.6X
+RLE, bitWidth=20                                      0              0         
  0      20675.9           0.0      42.6X
 
 
 
================================================================================================
 Nullable batch decode with def-level materialization
 
================================================================================================
 
-OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1010-azure
+OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1013-azure
 AMD EPYC 7763 64-Core Processor
 Nullable batch with def-levels:           Best Time(ms)   Avg Time(ms)   
Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 
------------------------------------------------------------------------------------------------------------------------
-nullRatio=0.0, n/a                                    0              0         
  0       6431.5           0.2       1.0X
-nullRatio=0.1, random                                 9              9         
  0        114.6           8.7       0.0X
-nullRatio=0.1, clustered                              7              7         
  0        159.7           6.3       0.0X
-nullRatio=0.3, random                                13             13         
  0         80.1          12.5       0.0X
-nullRatio=0.3, clustered                              7              7         
  1        157.8           6.3       0.0X
-nullRatio=0.5, random                                14             15         
  0         72.7          13.7       0.0X
-nullRatio=0.5, clustered                              6              7         
  0        162.0           6.2       0.0X
-nullRatio=0.9, random                                 8              8         
  0        126.4           7.9       0.0X
-nullRatio=0.9, clustered                              6              6         
  0        174.0           5.7       0.0X
-nullRatio=1.0, random                                 0              0         
  0       8062.5           0.1       1.3X
+nullRatio=0.0, n/a                                    0              0           0       6435.2           0.2       1.0X
+nullRatio=0.1, random                                10             10           0        106.0           9.4       0.0X
+nullRatio=0.1, clustered                              7              7           0        144.9           6.9       0.0X
+nullRatio=0.3, random                                14             14           0         75.0          13.3       0.0X
+nullRatio=0.3, clustered                              7              7           0        147.9           6.8       0.0X
+nullRatio=0.5, random                                16             16           0         67.5          14.8       0.0X
+nullRatio=0.5, clustered                              7              7           0        151.9           6.6       0.0X
+nullRatio=0.9, random                                 9              9           0        121.9           8.2       0.0X
+nullRatio=0.9, clustered                              6              6           0        166.4           6.0       0.0X
+nullRatio=1.0, random                                 0              0           0       4126.2           0.2       0.6X
 
 
 ================================================================================================
 Nullable batch decode without def-level materialization
 ================================================================================================
 
-OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1010-azure
+OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1013-azure
 AMD EPYC 7763 64-Core Processor
 Nullable batch without def-levels:        Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-nullRatio=0.0, n/a                                    0              0           0      11054.0           0.1       1.0X
-nullRatio=0.1, random                                 7              8           0        140.6           7.1       0.0X
-nullRatio=0.1, clustered                              5              5           0        193.2           5.2       0.0X
-nullRatio=0.3, random                                11             11           0         97.4          10.3       0.0X
-nullRatio=0.3, clustered                              6              6           0        184.4           5.4       0.0X
-nullRatio=0.5, random                                12             12           0         87.7          11.4       0.0X
-nullRatio=0.5, clustered                              5              6           0        191.6           5.2       0.0X
-nullRatio=0.9, random                                 7              7           0        151.7           6.6       0.0X
-nullRatio=0.9, clustered                              5              5           0        200.8           5.0       0.0X
-nullRatio=1.0, random                                 0              0           0      11662.5           0.1       1.1X
+nullRatio=0.0, n/a                                    0              0           0      10982.3           0.1       1.0X
+nullRatio=0.1, random                                 8              8           0        139.7           7.2       0.0X
+nullRatio=0.1, clustered                              5              6           0        191.6           5.2       0.0X
+nullRatio=0.3, random                                11             11           0         98.9          10.1       0.0X
+nullRatio=0.3, clustered                              5              5           0        194.1           5.2       0.0X
+nullRatio=0.5, random                                12             12           0         89.4          11.2       0.0X
+nullRatio=0.5, clustered                              5              5           0        199.1           5.0       0.0X
+nullRatio=0.9, random                                 7              7           0        160.5           6.2       0.0X
+nullRatio=0.9, clustered                              5              5           0        218.1           4.6       0.0X
+nullRatio=1.0, random                                 0              0           0       4916.7           0.2       0.4X
 
 
 ================================================================================================
 Nullable batch decode with row-index filtering (with def-levels)
 ================================================================================================
 
-OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1010-azure
+OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1013-azure
 AMD EPYC 7763 64-Core Processor
 Nullable batch with def-levels, row-index filtered:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ----------------------------------------------------------------------------------------------------------------------------------
-nullRatio=0.0, contiguous 50%                                   1              2           0        763.7           1.3       1.0X
-nullRatio=0.3, contiguous 50%                                   9              9           0        117.3           8.5       0.2X
-nullRatio=0.9, contiguous 50%                                   7              7           0        157.5           6.3       0.2X
-nullRatio=0.0, alt 1000-row windows                             3              3           0        418.8           2.4       0.5X
-nullRatio=0.3, alt 1000-row windows                            10             10           0        103.8           9.6       0.1X
-nullRatio=0.9, alt 1000-row windows                             8              8           0        134.7           7.4       0.2X
+nullRatio=0.0, contiguous 50%                                   1              2           0        759.7           1.3       1.0X
+nullRatio=0.3, contiguous 50%                                  10             10           0        108.5           9.2       0.1X
+nullRatio=0.9, contiguous 50%                                   7              7           0        149.5           6.7       0.2X
+nullRatio=0.0, alt 1000-row windows                             2              3           0        419.6           2.4       0.6X
+nullRatio=0.3, alt 1000-row windows                            11             11           0         96.9          10.3       0.1X
+nullRatio=0.9, alt 1000-row windows                             8              8           0        128.3           7.8       0.2X
 
 
 ================================================================================================
 Nullable batch decode with row-index filtering (without def-levels)
 ================================================================================================
 
-OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1010-azure
+OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1013-azure
 AMD EPYC 7763 64-Core Processor
 Nullable batch without def-levels, row-index filtered:  Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 -------------------------------------------------------------------------------------------------------------------------------------
-nullRatio=0.0, contiguous 50%                                      1              1           0        865.4           1.2       1.0X
-nullRatio=0.3, contiguous 50%                                      8              8           0        128.8           7.8       0.1X
-nullRatio=0.9, contiguous 50%                                      6              6           0        173.4           5.8       0.2X
-nullRatio=0.0, alt 1000-row windows                                2              2           0        425.9           2.3       0.5X
-nullRatio=0.3, alt 1000-row windows                                9              9           0        111.3           9.0       0.1X
+nullRatio=0.0, contiguous 50%                                      1              1           0        865.3           1.2       1.0X
+nullRatio=0.3, contiguous 50%                                      8              8           0        127.6           7.8       0.1X
+nullRatio=0.9, contiguous 50%                                      6              6           0        174.1           5.7       0.2X
+nullRatio=0.0, alt 1000-row windows                                2              2           0        425.3           2.4       0.5X
+nullRatio=0.3, alt 1000-row windows                                9             10           0        110.5           9.1       0.1X
 nullRatio=0.9, alt 1000-row windows                                7              7           0        143.3           7.0       0.2X
 
 
@@ -116,39 +116,39 @@ nullRatio=0.9, alt 1000-row windows                                7
 Single-value reads
 ================================================================================================
 
-OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1010-azure
+OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1013-azure
 AMD EPYC 7763 64-Core Processor
 Single-value reads:                       Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-readBoolean                                           4              4           0        287.0           3.5       1.0X
-readInteger, bitWidth=4                               4              4           0        275.7           3.6       1.0X
-readValueDictionaryId, bitWidth=4                     4              4           0        275.5           3.6       1.0X
-readInteger, bitWidth=8                               4              4           0        273.7           3.7       1.0X
+readBoolean                                           4              4           1        272.0           3.7       1.0X
+readInteger, bitWidth=4                               4              4           0        274.0           3.6       1.0X
+readValueDictionaryId, bitWidth=4                     4              4           0        275.2           3.6       1.0X
+readInteger, bitWidth=8                               4              4           0        272.3           3.7       1.0X
 readValueDictionaryId, bitWidth=8                     4              4           0        273.4           3.7       1.0X
-readInteger, bitWidth=12                              5              5           1        230.1           4.3       0.8X
-readValueDictionaryId, bitWidth=12                    5              5           0        229.3           4.4       0.8X
-readInteger, bitWidth=20                              5              5           0        207.9           4.8       0.7X
-readValueDictionaryId, bitWidth=20                    5              5           0        207.3           4.8       0.7X
+readInteger, bitWidth=12                              5              5           0        228.2           4.4       0.8X
+readValueDictionaryId, bitWidth=12                    5              5           0        228.9           4.4       0.8X
+readInteger, bitWidth=20                              5              5           0        204.3           4.9       0.8X
+readValueDictionaryId, bitWidth=20                    5              5           0        205.5           4.9       0.8X
 
 
 ================================================================================================
 Skip
 ================================================================================================
 
-OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1010-azure
+OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1013-azure
 AMD EPYC 7763 64-Core Processor
 Skip:                                     Best Time(ms)   Avg Time(ms)   Stdev(ms)    Rate(M/s)   Per Row(ns)   Relative
 ------------------------------------------------------------------------------------------------------------------------
-skipBooleans, trueRatio=0.0                           0              0           0   20971520.0           0.0       1.0X
-skipBooleans, trueRatio=0.5                           2              2           0        569.4           1.8       0.0X
+skipBooleans, trueRatio=0.0                           0              0           0   21399510.2           0.0       1.0X
+skipBooleans, trueRatio=0.5                           2              2           0        536.1           1.9       0.0X
 skipBooleans, trueRatio=1.0                           0              0           0   21399510.2           0.0       1.0X
-skipIntegers PACKED, bitWidth=4                       2              2           0        522.7           1.9       0.0X
+skipIntegers PACKED, bitWidth=4                       2              2           0        502.5           2.0       0.0X
 skipIntegers RLE, bitWidth=4                          0              0           0   20971520.0           0.0       1.0X
-skipIntegers PACKED, bitWidth=8                       2              2           0        516.6           1.9       0.0X
+skipIntegers PACKED, bitWidth=8                       2              2           0        497.5           2.0       0.0X
 skipIntegers RLE, bitWidth=8                          0              0           0   21399510.2           0.0       1.0X
-skipIntegers PACKED, bitWidth=12                      3              3           0        382.4           2.6       0.0X
+skipIntegers PACKED, bitWidth=12                      3              3           0        367.9           2.7       0.0X
 skipIntegers RLE, bitWidth=12                         0              0           0   17476266.7           0.0       0.8X
-skipIntegers PACKED, bitWidth=20                      3              3           0        323.0           3.1       0.0X
+skipIntegers PACKED, bitWidth=20                      3              3           0        310.9           3.2       0.0X
 skipIntegers RLE, bitWidth=20                         0              0           0   17476266.7           0.0       0.8X
 
 
diff --git a/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedRleValuesReader.java b/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedRleValuesReader.java
index 5dac20209ef3..7c5742b65ada 100644
--- a/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedRleValuesReader.java
+++ b/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedRleValuesReader.java
@@ -298,9 +298,7 @@ public final class VectorizedRleValuesReader extends ValuesReader
         } while (currentBufferIdx < bufEnd
             && currentBuffer[currentBufferIdx] != maxDefLevel);
         int runLen = currentBufferIdx - runStart;
-        for (int k = 0; k < runLen; k++) {
-          nulls.putNull(valueOff + k);
-        }
+        nulls.putNulls(valueOff, runLen);
         valueOff += runLen;
       }
     }
@@ -714,14 +712,10 @@ public final class VectorizedRleValuesReader extends ValuesReader
           updater.readValues(runLen, valueOff, values, valueReader);
         }
       } else {
-        for (int k = 0; k < runLen; k++) {
-          nulls.putNull(valueOff + k);
-        }
+        nulls.putNulls(valueOff, runLen);
       }
       valueOff += runLen;
-      for (int k = 0; k < runLen; k++) {
-        defLevels.putInt(levelIdx + k, runValue);
-      }
+      defLevels.putInts(levelIdx, runLen, runValue);
       levelIdx += runLen;
     }
     state.valueOffset = valueOff;
diff --git a/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OffHeapColumnVector.java b/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OffHeapColumnVector.java
index 03a645acedf5..e9d4ac8aa8d6 100644
--- a/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OffHeapColumnVector.java
+++ b/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OffHeapColumnVector.java
@@ -33,12 +33,13 @@ public final class OffHeapColumnVector extends WritableColumnVector {
   private static final boolean bigEndianPlatform =
     ByteOrder.nativeOrder().equals(ByteOrder.BIG_ENDIAN);
 
-  // Below this count, byte-fill methods (putBytes / putBooleans) write bytes in an inline
-  // loop. At or above this count, they call Platform.setMemory which lowers to a native
-  // memset. The JNI fixed cost of setMemory dominates for very short fills; on the
+  // Below this count, byte-fill methods (putBytes / putBooleans / putNulls) write bytes in
+  // an inline loop. At or above this count, they call Platform.setMemory which lowers to a
+  // native memset. The JNI fixed cost of setMemory dominates for very short fills; on the
   // benchmarked hardware (Apple M4 Max + OpenJDK 21) the crossover sits between 64 and
-  // 512, so 128 is a conservative choice that avoids regression at small counts while
-  // retaining the bulk of the asymptotic gain.
+  // 512, so 128 is a conservative choice that avoids regression at small counts (common
+  // for random null patterns where RLE runs are short) while retaining the bulk of the
+  // asymptotic gain.
   private static final int SET_MEMORY_THRESHOLD = 128;
 
   /**
@@ -127,9 +128,13 @@ public final class OffHeapColumnVector extends WritableColumnVector {
   @Override
   public void putNulls(int rowId, int count) {
     if (isAllNull()) return; // Skip writing nulls to all-null vector.
-    long offset = nulls + rowId;
-    for (int i = 0; i < count; ++i, ++offset) {
-      Platform.putByte(null, offset, (byte) 1);
+    if (count < SET_MEMORY_THRESHOLD) {
+      long offset = nulls + rowId;
+      for (int i = 0; i < count; ++i, ++offset) {
+        Platform.putByte(null, offset, (byte) 1);
+      }
+    } else {
+      Platform.setMemory(nulls + rowId, (byte) 1, count);
     }
     numNulls += count;
   }
diff --git a/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java b/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java
index 784c9053ca81..146c4b1bf3eb 100644
--- a/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java
+++ b/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java
@@ -116,9 +116,7 @@ public final class OnHeapColumnVector extends WritableColumnVector {
   @Override
   public void putNulls(int rowId, int count) {
     if (isAllNull()) return; // Skip writing nulls to all-null vector.
-    for (int i = 0; i < count; ++i) {
-      nulls[rowId + i] = (byte)1;
-    }
+    Arrays.fill(nulls, rowId, rowId + count, (byte) 1);
     numNulls += count;
   }
 
@@ -313,9 +311,7 @@ public final class OnHeapColumnVector extends WritableColumnVector {
 
   @Override
   public void putInts(int rowId, int count, int value) {
-    for (int i = 0; i < count; ++i) {
-      intData[i + rowId] = value;
-    }
+    Arrays.fill(intData, rowId, rowId + count, value);
   }
 
   @Override
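
The patch above replaces per-element loops with bulk fills and, for the off-heap vector, dispatches on a run-length threshold. The standalone sketch below illustrates both ideas outside Spark. It is not Spark code: the class `BulkFillSketch` and its method names are hypothetical, a plain `byte[]` stands in for the off-heap null buffer, and `Arrays.fill` stands in for `Platform.setMemory`. Only the threshold value 128 and the `Arrays.fill` calls mirror the diff.

```java
import java.util.Arrays;

// Hypothetical illustration only; none of these names exist in Spark.
final class BulkFillSketch {
  // Same value as SET_MEMORY_THRESHOLD in the patch.
  static final int SET_MEMORY_THRESHOLD = 128;

  // Per-element loop, the shape the patch removes from the on-heap vector.
  static void putNullsLoop(byte[] nulls, int rowId, int count) {
    for (int i = 0; i < count; ++i) {
      nulls[rowId + i] = (byte) 1;
    }
  }

  // Bulk fill over [rowId, rowId + count), the shape the patch introduces.
  static void putNullsBulk(byte[] nulls, int rowId, int count) {
    Arrays.fill(nulls, rowId, rowId + count, (byte) 1);
  }

  // Threshold dispatch in the style of the off-heap putNulls: short runs use
  // the inline loop, longer runs use the bulk call. Here both branches write
  // to a heap array, so this shows only the control flow, not the speedup.
  static void putNullsThreshold(byte[] nulls, int rowId, int count) {
    if (count < SET_MEMORY_THRESHOLD) {
      putNullsLoop(nulls, rowId, count);
    } else {
      putNullsBulk(nulls, rowId, count);
    }
  }

  // Bulk fill of a constant definition level for one RLE run.
  static void putIntsBulk(int[] data, int rowId, int count, int value) {
    Arrays.fill(data, rowId, rowId + count, value);
  }

  public static void main(String[] args) {
    byte[] a = new byte[16];
    byte[] b = new byte[16];
    putNullsLoop(a, 3, 5);
    putNullsBulk(b, 3, 5);
    System.out.println(Arrays.equals(a, b)); // prints "true"

    int[] levels = new int[8];
    putIntsBulk(levels, 2, 4, 1);
    System.out.println(Arrays.toString(levels)); // prints "[0, 0, 1, 1, 1, 1, 0, 0]"
  }
}
```

All three null-marking variants write the same bytes for any valid `rowId` and `count`, including `count == 0`, so the change affects only how the run is written, not what is written.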

