This is an automated email from the ASF dual-hosted git repository.
viirya pushed a commit to branch branch-4.x
in repository https://gitbox.apache.org/repos/asf/spark.git
The following commit(s) were added to refs/heads/branch-4.x by this push:
new 9f7ab8058665 [SPARK-57024][SQL] Use bulk fill APIs to materialize RLE
runs in Parquet vectorized reader
9f7ab8058665 is described below
commit 9f7ab80586655ba99340db973671544355361382
Author: Liang-Chi Hsieh <[email protected]>
AuthorDate: Wed May 27 00:49:48 2026 -0700
[SPARK-57024][SQL] Use bulk fill APIs to materialize RLE runs in Parquet
vectorized reader
### What changes were proposed in this pull request?
`VectorizedRleValuesReader` materializes RLE runs of nulls and
definition levels with degenerate per-element loops:
```java
// VectorizedRleValuesReader.java
for (int k = 0; k < runLen; k++) {
nulls.putNull(valueOff + k);
}
for (int k = 0; k < runLen; k++) {
defLevels.putInt(levelIdx + k, runValue);
}
```
`WritableColumnVector` already exposes the bulk equivalents
`putNulls(rowId, count)` and `putInts(rowId, count, value)`. This PR
switches the three call sites to the bulk APIs and reimplements the
bulk APIs themselves (which were also degenerate loops) using JIT
intrinsics:
- `OnHeapColumnVector.putNulls` -> `Arrays.fill(byte[], ..., (byte) 1)`
- `OnHeapColumnVector.putInts(rowId, count, value)` ->
  `Arrays.fill(int[], ..., value)`
- `OffHeapColumnVector.putNulls` ->
  `Platform.setMemory(addr, (byte) 1, count)`, with a small-count fallback
  to an inline byte loop
`Arrays.fill` is backed by HotSpot's `_jbyte_fill` / `_jint_fill`
intrinsic stubs and `Unsafe.setMemory` lowers to a native memset; both
are faster than the byte/int loops they replace once `runLen` grows
beyond a handful of elements.
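As a rough illustration of the on-heap change, the sketch below contrasts a per-element loop with the equivalent `Arrays.fill` call. It is not the Spark code: the class `BulkFillSketch` and its method names are hypothetical, and plain arrays stand in for the column vector's backing storage.

```java
import java.util.Arrays;

// Illustrative sketch only: plain arrays stand in for the backing storage
// of an on-heap column vector. Class and method names are hypothetical.
final class BulkFillSketch {
  // Per-element form, analogous to the loops being replaced.
  static void putNullsLoop(byte[] nulls, int rowId, int count) {
    for (int i = 0; i < count; i++) {
      nulls[rowId + i] = (byte) 1;
    }
  }

  // Bulk form: Arrays.fill writes the half-open range [rowId, rowId + count).
  static void putNullsBulk(byte[] nulls, int rowId, int count) {
    Arrays.fill(nulls, rowId, rowId + count, (byte) 1);
  }

  // Bulk form for int data, writing the same value to every slot in the range.
  static void putIntsBulk(int[] data, int rowId, int count, int value) {
    Arrays.fill(data, rowId, rowId + count, value);
  }
}
```

Both forms write the same bytes; the difference is only in how the fill is executed.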
For `OffHeap.putNulls`, `Unsafe.setMemory` has a non-trivial JNI fixed
cost, so it loses to the inline byte loop for very short fills (which
are common in random null patterns). A threshold of 128 is used to pick
between the two paths.
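The dispatch described above might look roughly like the following sketch. It shows only the shape of the threshold check: a `byte[]` stands in for off-heap memory and `Arrays.fill` stands in for `Platform.setMemory`, so it does not reproduce the JNI cost. The constant name `BULK_FILL_THRESHOLD` and the strict `<` comparison are assumptions; only the value 128 comes from the description.

```java
import java.util.Arrays;

// Illustrative sketch only. A byte[] stands in for off-heap memory and
// Arrays.fill stands in for Platform.setMemory; all names are hypothetical.
final class ThresholdFillSketch {
  // Hypothetical constant name; the value 128 is taken from the description.
  static final int BULK_FILL_THRESHOLD = 128;

  // Returns true when the bulk path was taken, so the choice is observable.
  static boolean putNulls(byte[] nulls, int rowId, int count) {
    if (count < BULK_FILL_THRESHOLD) {
      // Short run: an inline loop avoids the fixed cost of the bulk call.
      for (int i = 0; i < count; i++) {
        nulls[rowId + i] = (byte) 1;
      }
      return false;
    }
    // Long run: a single bulk call amortizes its fixed cost over many bytes.
    Arrays.fill(nulls, rowId, rowId + count, (byte) 1);
    return true;
  }
}
```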
### Why are the changes needed?
The bulk-fill APIs on `WritableColumnVector` were the obviously-correct
calls to make in `VectorizedRleValuesReader`, but their implementations
were not actually bulk: both the callers and the implementations were
small per-element loops.
#### Caller-side (Parquet RLE materialization)
Measured on Apple M4 Max + OpenJDK 21.0.8 using
`VectorizedRleValuesReaderBenchmark` (Group C, "Nullable batch decode
with def-level materialization", 1M rows, BATCH_SIZE=4096), ns/row:
| nullRatio | shape     | baseline (ns/row) | patched (ns/row) | improvement |
| --------- | --------- | ----------------: | ---------------: | ----------: |
| 0.1       | random    |               4.0 |              4.2 |       noise |
| 0.1       | clustered |               2.8 |              2.7 |         +4% |
| 0.3       | random    |               6.2 |              6.3 |       noise |
| 0.3       | clustered |               2.8 |              2.7 |         +4% |
| 0.5       | random    |               7.1 |              7.1 |          0% |
| 0.5       | clustered |               2.8 |              2.6 |         +7% |
| 0.9       | random    |               3.9 |              3.5 |        +10% |
| 0.9       | clustered |               2.6 |              2.3 |        +12% |
Gains concentrate on clustered null patterns (long RLE runs), which are
common in real workloads: sparse dimension columns, ETL-staged nulls, and
time-bucketed missing metrics. Random null patterns produce short runs
where the bulk-API call cost roughly matches the original loop, so the
results there mostly fall within noise.
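To make the run-length difference concrete, the sketch below computes the mean run length of a null mask for a random and a clustered pattern. It is an illustration under stated assumptions, not the benchmark's generator: the class `NullRunLengthSketch`, its methods, and the block-based clustering scheme are all hypothetical.

```java
import java.util.Random;

// Illustrative sketch only; the mask generators are hypothetical and are
// not the ones used by VectorizedRleValuesReaderBenchmark.
final class NullRunLengthSketch {
  // Mean length of the maximal runs of equal values in the mask.
  static double meanRunLength(boolean[] isNull) {
    if (isNull.length == 0) {
      return 0.0;
    }
    int runs = 1;
    for (int i = 1; i < isNull.length; i++) {
      if (isNull[i] != isNull[i - 1]) {
        runs++;
      }
    }
    return (double) isNull.length / runs;
  }

  // Each row is null independently with probability nullRatio.
  static boolean[] randomMask(int n, double nullRatio, long seed) {
    Random rnd = new Random(seed);
    boolean[] mask = new boolean[n];
    for (int i = 0; i < n; i++) {
      mask[i] = rnd.nextDouble() < nullRatio;
    }
    return mask;
  }

  // Nulls are grouped at the start of each fixed-size block.
  static boolean[] clusteredMask(int n, double nullRatio, int blockSize) {
    int nullsPerBlock = (int) Math.round(blockSize * nullRatio);
    boolean[] mask = new boolean[n];
    for (int i = 0; i < n; i++) {
      mask[i] = (i % blockSize) < nullsPerBlock;
    }
    return mask;
  }
}
```

With the same null ratio, the clustered mask yields runs hundreds of rows long while the random mask yields runs of only a few rows, which is the regime where a bulk fill has little to amortize.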
#### Implementation-side (OffHeap putNulls)
A separate micro-benchmark of `OffHeapColumnVector.putNulls` (run via
`WritableColumnVectorBulkFillBenchmark`, not included in this PR) shows
the threshold matters: a naive unconditional `Platform.setMemory`
regresses small-count fills (`count <= 64`) by up to 7x against the
original byte loop due to JNI fixed cost, while the count=4096+ path
gains ~10x. The 128-element threshold picks the right path for both
regimes; the crossover on the benchmarked hardware sits between 64 and
512, so 128 is conservative.
### Does this PR introduce _any_ user-facing change?
No.
### How was this patch tested?
Existing tests; no behavior change. Ran locally:
- `VectorizedRleValuesReaderSuite` (covers the modified caller paths)
- `ColumnVectorSuite` and `ColumnarBatchSuite` (cover the modified
`OnHeap/OffHeapColumnVector.putNulls` / `putInts` bulk APIs)
- `ParquetIOSuite` (end-to-end vectorized reader coverage)
237 tests, all pass.
Benchmark numbers above produced by the existing
`VectorizedRleValuesReaderBenchmark` (no benchmark changes in this PR).
### Was this patch authored or co-authored using generative AI tooling?
Generated-by: Claude Code (Claude Opus 4.7)
Closes #56072 from viirya/SPARK-57024.
Authored-by: Liang-Chi Hsieh <[email protected]>
Signed-off-by: Liang-Chi Hsieh <[email protected]>
(cherry picked from commit febc98790605d7353d8fc254b54f622d85053a63)
Signed-off-by: Liang-Chi Hsieh <[email protected]>
---
...rizedRleValuesReaderBenchmark-jdk21-results.txt | 180 ++++++++++-----------
...rizedRleValuesReaderBenchmark-jdk25-results.txt | 180 ++++++++++-----------
.../VectorizedRleValuesReaderBenchmark-results.txt | 150 ++++++++---------
.../parquet/VectorizedRleValuesReader.java | 12 +-
.../execution/vectorized/OffHeapColumnVector.java | 21 ++-
.../execution/vectorized/OnHeapColumnVector.java | 8 +-
6 files changed, 273 insertions(+), 278 deletions(-)
diff --git
a/sql/core/benchmarks/VectorizedRleValuesReaderBenchmark-jdk21-results.txt
b/sql/core/benchmarks/VectorizedRleValuesReaderBenchmark-jdk21-results.txt
index cb53e9dd5b2a..b5f6d62d2b6b 100644
--- a/sql/core/benchmarks/VectorizedRleValuesReaderBenchmark-jdk21-results.txt
+++ b/sql/core/benchmarks/VectorizedRleValuesReaderBenchmark-jdk21-results.txt
@@ -2,153 +2,153 @@
Boolean decode
================================================================================================
-OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1010-azure
-AMD EPYC 7763 64-Core Processor
+OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1013-azure
+AMD EPYC 9V74 80-Core Processor
RLE readBooleans decode: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-cold reader, trueRatio=0.0 0 0
0 59466.7 0.0 1.0X
-reused reader, trueRatio=0.0 0 0
0 82871.7 0.0 1.4X
-cold reader, trueRatio=0.1 1 1
0 744.9 1.3 0.0X
-reused reader, trueRatio=0.1 1 1
0 746.1 1.3 0.0X
-cold reader, trueRatio=0.5 1 1
0 826.2 1.2 0.0X
-reused reader, trueRatio=0.5 1 1
0 828.1 1.2 0.0X
-cold reader, trueRatio=0.9 1 1
0 743.5 1.3 0.0X
-reused reader, trueRatio=0.9 1 1
0 738.4 1.4 0.0X
-cold reader, trueRatio=1.0 0 0
0 82409.3 0.0 1.4X
-reused reader, trueRatio=1.0 0 0
0 82871.7 0.0 1.4X
+cold reader, trueRatio=0.0 0 0
0 4886.6 0.2 1.0X
+reused reader, trueRatio=0.0 0 0
0 4857.7 0.2 1.0X
+cold reader, trueRatio=0.1 1 1
0 1186.3 0.8 0.2X
+reused reader, trueRatio=0.1 1 1
0 789.7 1.3 0.2X
+cold reader, trueRatio=0.5 1 1
0 1335.3 0.7 0.3X
+reused reader, trueRatio=0.5 1 1
0 855.2 1.2 0.2X
+cold reader, trueRatio=0.9 1 1
0 1186.3 0.8 0.2X
+reused reader, trueRatio=0.9 1 1
0 787.6 1.3 0.2X
+cold reader, trueRatio=1.0 0 0
0 4064.6 0.2 0.8X
+reused reader, trueRatio=1.0 0 0
0 4855.5 0.2 1.0X
================================================================================================
Integer decode
================================================================================================
-OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1010-azure
-AMD EPYC 7763 64-Core Processor
+OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1013-azure
+AMD EPYC 9V74 80-Core Processor
RLE readIntegers dictionary-id decode: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-PACKED cold, bitWidth=4 2 2
0 489.7 2.0 1.0X
-PACKED reused, bitWidth=4 2 2
0 487.7 2.1 1.0X
-RLE, bitWidth=4 0 0
0 4506.8 0.2 9.2X
-PACKED cold, bitWidth=8 2 2
0 524.2 1.9 1.1X
-PACKED reused, bitWidth=8 2 2
0 524.6 1.9 1.1X
-RLE, bitWidth=8 0 0
0 4507.0 0.2 9.2X
-PACKED cold, bitWidth=12 3 3
0 417.6 2.4 0.9X
-PACKED reused, bitWidth=12 3 3
0 415.4 2.4 0.8X
-RLE, bitWidth=12 0 0
0 4507.2 0.2 9.2X
-PACKED cold, bitWidth=20 3 3
0 351.9 2.8 0.7X
-PACKED reused, bitWidth=20 3 3
0 349.4 2.9 0.7X
-RLE, bitWidth=20 0 0
0 4499.6 0.2 9.2X
+PACKED cold, bitWidth=4 2 2
0 542.4 1.8 1.0X
+PACKED reused, bitWidth=4 2 2
0 541.2 1.8 1.0X
+RLE, bitWidth=4 0 0
0 23543.9 0.0 43.4X
+PACKED cold, bitWidth=8 2 2
0 621.3 1.6 1.1X
+PACKED reused, bitWidth=8 2 2
0 618.1 1.6 1.1X
+RLE, bitWidth=8 0 0
0 23511.7 0.0 43.3X
+PACKED cold, bitWidth=12 2 2
0 482.5 2.1 0.9X
+PACKED reused, bitWidth=12 2 2
0 480.5 2.1 0.9X
+RLE, bitWidth=12 0 0
0 23507.0 0.0 43.3X
+PACKED cold, bitWidth=20 3 3
0 401.9 2.5 0.7X
+PACKED reused, bitWidth=20 3 3
0 400.6 2.5 0.7X
+RLE, bitWidth=20 0 0
0 23570.4 0.0 43.5X
================================================================================================
Nullable batch decode with def-level materialization
================================================================================================
-OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1010-azure
-AMD EPYC 7763 64-Core Processor
+OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1013-azure
+AMD EPYC 9V74 80-Core Processor
Nullable batch with def-levels: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-nullRatio=0.0, n/a 0 0
0 6695.3 0.1 1.0X
-nullRatio=0.1, random 9 9
0 123.2 8.1 0.0X
-nullRatio=0.1, clustered 6 6
1 174.1 5.7 0.0X
-nullRatio=0.3, random 12 12
0 85.3 11.7 0.0X
-nullRatio=0.3, clustered 6 6
0 172.7 5.8 0.0X
-nullRatio=0.5, random 14 14
0 76.5 13.1 0.0X
-nullRatio=0.5, clustered 6 6
0 173.6 5.8 0.0X
-nullRatio=0.9, random 8 8
0 132.0 7.6 0.0X
-nullRatio=0.9, clustered 6 6
0 182.4 5.5 0.0X
-nullRatio=1.0, random 0 0
0 5048.8 0.2 0.8X
+nullRatio=0.0, n/a 0 0
0 8154.1 0.1 1.0X
+nullRatio=0.1, random 7 8
0 140.5 7.1 0.0X
+nullRatio=0.1, clustered 5 5
0 204.1 4.9 0.0X
+nullRatio=0.3, random 11 11
0 96.0 10.4 0.0X
+nullRatio=0.3, clustered 5 5
0 207.8 4.8 0.0X
+nullRatio=0.5, random 12 12
0 87.4 11.4 0.0X
+nullRatio=0.5, clustered 5 5
0 213.6 4.7 0.0X
+nullRatio=0.9, random 6 7
0 162.1 6.2 0.0X
+nullRatio=0.9, clustered 4 5
0 235.1 4.3 0.0X
+nullRatio=1.0, random 0 0
0 22458.3 0.0 2.8X
================================================================================================
Nullable batch decode without def-level materialization
================================================================================================
-OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1010-azure
-AMD EPYC 7763 64-Core Processor
+OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1013-azure
+AMD EPYC 9V74 80-Core Processor
Nullable batch without def-levels: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-nullRatio=0.0, n/a 0 0
0 12199.7 0.1 1.0X
-nullRatio=0.1, random 7 7
0 147.8 6.8 0.0X
-nullRatio=0.1, clustered 5 5
0 204.6 4.9 0.0X
-nullRatio=0.3, random 10 10
0 100.8 9.9 0.0X
-nullRatio=0.3, clustered 5 5
0 200.6 5.0 0.0X
-nullRatio=0.5, random 12 12
0 89.4 11.2 0.0X
-nullRatio=0.5, clustered 5 5
0 199.3 5.0 0.0X
-nullRatio=0.9, random 7 7
0 153.3 6.5 0.0X
-nullRatio=0.9, clustered 5 5
0 202.2 4.9 0.0X
-nullRatio=1.0, random 0 0
0 11887.9 0.1 1.0X
+nullRatio=0.0, n/a 0 0
0 12686.2 0.1 1.0X
+nullRatio=0.1, random 6 6
0 172.3 5.8 0.0X
+nullRatio=0.1, clustered 4 4
0 251.2 4.0 0.0X
+nullRatio=0.3, random 9 9
0 115.5 8.7 0.0X
+nullRatio=0.3, clustered 4 4
0 253.9 3.9 0.0X
+nullRatio=0.5, random 10 10
0 105.1 9.5 0.0X
+nullRatio=0.5, clustered 4 4
0 259.7 3.9 0.0X
+nullRatio=0.9, random 5 5
0 198.9 5.0 0.0X
+nullRatio=0.9, clustered 4 4
0 282.6 3.5 0.0X
+nullRatio=1.0, random 0 0
0 96058.6 0.0 7.6X
================================================================================================
Nullable batch decode with row-index filtering (with def-levels)
================================================================================================
-OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1010-azure
-AMD EPYC 7763 64-Core Processor
+OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1013-azure
+AMD EPYC 9V74 80-Core Processor
Nullable batch with def-levels, row-index filtered: Best Time(ms) Avg
Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
----------------------------------------------------------------------------------------------------------------------------------
-nullRatio=0.0, contiguous 50% 1
1 0 757.3 1.3 1.0X
-nullRatio=0.3, contiguous 50% 9
9 0 119.6 8.4 0.2X
-nullRatio=0.9, contiguous 50% 7
7 0 158.9 6.3 0.2X
-nullRatio=0.0, alt 1000-row windows 3
3 0 377.7 2.6 0.5X
-nullRatio=0.3, alt 1000-row windows 10
10 0 102.3 9.8 0.1X
-nullRatio=0.9, alt 1000-row windows 8
8 1 130.9 7.6 0.2X
+nullRatio=0.0, contiguous 50% 1
1 0 1247.7 0.8 1.0X
+nullRatio=0.3, contiguous 50% 8
8 0 135.1 7.4 0.1X
+nullRatio=0.9, contiguous 50% 5
6 0 191.0 5.2 0.2X
+nullRatio=0.0, alt 1000-row windows 2
2 0 433.2 2.3 0.3X
+nullRatio=0.3, alt 1000-row windows 9
9 0 113.6 8.8 0.1X
+nullRatio=0.9, alt 1000-row windows 7
7 0 150.4 6.6 0.1X
================================================================================================
Nullable batch decode with row-index filtering (without def-levels)
================================================================================================
-OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1010-azure
-AMD EPYC 7763 64-Core Processor
+OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1013-azure
+AMD EPYC 9V74 80-Core Processor
Nullable batch without def-levels, row-index filtered: Best Time(ms) Avg
Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
-------------------------------------------------------------------------------------------------------------------------------------
-nullRatio=0.0, contiguous 50% 1
2 0 767.0 1.3 1.0X
-nullRatio=0.3, contiguous 50% 8
8 0 129.1 7.7 0.2X
-nullRatio=0.9, contiguous 50% 6
7 0 166.0 6.0 0.2X
-nullRatio=0.0, alt 1000-row windows 3
3 0 377.2 2.7 0.5X
-nullRatio=0.3, alt 1000-row windows 10
10 0 109.0 9.2 0.1X
-nullRatio=0.9, alt 1000-row windows 8
8 0 137.5 7.3 0.2X
+nullRatio=0.0, contiguous 50% 1
1 0 1004.6 1.0 1.0X
+nullRatio=0.3, contiguous 50% 7
7 1 154.7 6.5 0.2X
+nullRatio=0.9, contiguous 50% 5
5 0 214.0 4.7 0.2X
+nullRatio=0.0, alt 1000-row windows 2
2 0 464.0 2.2 0.5X
+nullRatio=0.3, alt 1000-row windows 8
8 0 130.3 7.7 0.1X
+nullRatio=0.9, alt 1000-row windows 6
6 0 169.6 5.9 0.2X
================================================================================================
Single-value reads
================================================================================================
-OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1010-azure
-AMD EPYC 7763 64-Core Processor
+OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1013-azure
+AMD EPYC 9V74 80-Core Processor
Single-value reads: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-readBoolean 3 3
0 311.5 3.2 1.0X
-readInteger, bitWidth=4 4 4
0 275.7 3.6 0.9X
-readValueDictionaryId, bitWidth=4 4 4
0 276.2 3.6 0.9X
-readInteger, bitWidth=8 4 4
0 289.2 3.5 0.9X
-readValueDictionaryId, bitWidth=8 4 4
0 289.8 3.5 0.9X
-readInteger, bitWidth=12 4 4
0 252.3 4.0 0.8X
-readValueDictionaryId, bitWidth=12 4 4
0 252.1 4.0 0.8X
-readInteger, bitWidth=20 5 5
0 227.7 4.4 0.7X
-readValueDictionaryId, bitWidth=20 5 5
0 227.2 4.4 0.7X
+readBoolean 3 3
0 361.8 2.8 1.0X
+readInteger, bitWidth=4 3 3
0 314.3 3.2 0.9X
+readValueDictionaryId, bitWidth=4 3 3
0 315.3 3.2 0.9X
+readInteger, bitWidth=8 3 3
0 339.9 2.9 0.9X
+readValueDictionaryId, bitWidth=8 3 3
0 339.8 2.9 0.9X
+readInteger, bitWidth=12 4 4
0 293.8 3.4 0.8X
+readValueDictionaryId, bitWidth=12 4 4
0 293.9 3.4 0.8X
+readInteger, bitWidth=20 4 4
0 262.1 3.8 0.7X
+readValueDictionaryId, bitWidth=20 4 4
0 261.9 3.8 0.7X
================================================================================================
Skip
================================================================================================
-OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1010-azure
-AMD EPYC 7763 64-Core Processor
+OpenJDK 64-Bit Server VM 21.0.11+10-LTS on Linux 6.17.0-1013-azure
+AMD EPYC 9V74 80-Core Processor
Skip: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-skipBooleans, trueRatio=0.0 0 0
0 26214400.0 0.0 1.0X
-skipBooleans, trueRatio=0.5 2 2
0 559.1 1.8 0.0X
-skipBooleans, trueRatio=1.0 0 0
0 26214400.0 0.0 1.0X
-skipIntegers PACKED, bitWidth=4 2 2
0 502.4 2.0 0.0X
-skipIntegers RLE, bitWidth=4 0 0
0 21399510.2 0.0 0.8X
-skipIntegers PACKED, bitWidth=8 2 2
0 551.4 1.8 0.0X
-skipIntegers RLE, bitWidth=8 0 0
0 21399510.2 0.0 0.8X
-skipIntegers PACKED, bitWidth=12 2 2
0 431.5 2.3 0.0X
-skipIntegers RLE, bitWidth=12 0 0
0 21399510.2 0.0 0.8X
-skipIntegers PACKED, bitWidth=20 3 3
0 364.1 2.7 0.0X
-skipIntegers RLE, bitWidth=20 0 0
0 21399510.2 0.0 0.8X
+skipBooleans, trueRatio=0.0 0 0
0 34952533.3 0.0 1.0X
+skipBooleans, trueRatio=0.5 2 2
0 662.2 1.5 0.0X
+skipBooleans, trueRatio=1.0 0 0
0 34952533.3 0.0 1.0X
+skipIntegers PACKED, bitWidth=4 2 2
0 553.8 1.8 0.0X
+skipIntegers RLE, bitWidth=4 0 0
0 34952533.3 0.0 1.0X
+skipIntegers PACKED, bitWidth=8 2 2
0 637.9 1.6 0.0X
+skipIntegers RLE, bitWidth=8 0 0
0 34952533.3 0.0 1.0X
+skipIntegers PACKED, bitWidth=12 2 2
0 493.4 2.0 0.0X
+skipIntegers RLE, bitWidth=12 0 0
0 26214400.0 0.0 0.8X
+skipIntegers PACKED, bitWidth=20 3 3
0 415.6 2.4 0.0X
+skipIntegers RLE, bitWidth=20 0 0
0 26214400.0 0.0 0.8X
diff --git
a/sql/core/benchmarks/VectorizedRleValuesReaderBenchmark-jdk25-results.txt
b/sql/core/benchmarks/VectorizedRleValuesReaderBenchmark-jdk25-results.txt
index 3029a4b3268b..860fafe62b1e 100644
--- a/sql/core/benchmarks/VectorizedRleValuesReaderBenchmark-jdk25-results.txt
+++ b/sql/core/benchmarks/VectorizedRleValuesReaderBenchmark-jdk25-results.txt
@@ -2,153 +2,153 @@
Boolean decode
================================================================================================
-OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1011-azure
-AMD EPYC 7763 64-Core Processor
+OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1013-azure
+Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
RLE readBooleans decode: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-cold reader, trueRatio=0.0 0 0
0 5323.3 0.2 1.0X
-reused reader, trueRatio=0.0 0 0
0 4464.5 0.2 0.8X
-cold reader, trueRatio=0.1 2 2
0 669.3 1.5 0.1X
-reused reader, trueRatio=0.1 2 2
0 672.6 1.5 0.1X
-cold reader, trueRatio=0.5 1 1
0 722.5 1.4 0.1X
-reused reader, trueRatio=0.5 1 1
0 724.3 1.4 0.1X
-cold reader, trueRatio=0.9 2 2
0 669.6 1.5 0.1X
-reused reader, trueRatio=0.9 2 2
0 674.0 1.5 0.1X
-cold reader, trueRatio=1.0 0 0
0 4574.1 0.2 0.9X
-reused reader, trueRatio=1.0 0 0
0 4460.9 0.2 0.8X
+cold reader, trueRatio=0.0 0 0
0 64903.2 0.0 1.0X
+reused reader, trueRatio=0.0 0 0
0 63220.5 0.0 1.0X
+cold reader, trueRatio=0.1 1 1
0 830.6 1.2 0.0X
+reused reader, trueRatio=0.1 1 1
0 789.9 1.3 0.0X
+cold reader, trueRatio=0.5 1 1
0 927.6 1.1 0.0X
+reused reader, trueRatio=0.5 1 1
0 905.4 1.1 0.0X
+cold reader, trueRatio=0.9 1 1
0 831.8 1.2 0.0X
+reused reader, trueRatio=0.9 1 1
0 833.2 1.2 0.0X
+cold reader, trueRatio=1.0 0 0
0 64176.3 0.0 1.0X
+reused reader, trueRatio=1.0 0 0
0 62706.4 0.0 1.0X
================================================================================================
Integer decode
================================================================================================
-OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1011-azure
-AMD EPYC 7763 64-Core Processor
+OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1013-azure
+Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
RLE readIntegers dictionary-id decode: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-PACKED cold, bitWidth=4 2 2
0 516.7 1.9 1.0X
-PACKED reused, bitWidth=4 2 2
0 516.1 1.9 1.0X
-RLE, bitWidth=4 0 0
0 18696.2 0.1 36.2X
-PACKED cold, bitWidth=8 2 2
0 570.0 1.8 1.1X
-PACKED reused, bitWidth=8 2 2
0 567.0 1.8 1.1X
-RLE, bitWidth=8 0 0
0 18583.5 0.1 36.0X
-PACKED cold, bitWidth=12 2 2
0 454.6 2.2 0.9X
-PACKED reused, bitWidth=12 2 2
0 452.6 2.2 0.9X
-RLE, bitWidth=12 0 0
0 18696.2 0.1 36.2X
-PACKED cold, bitWidth=20 3 3
0 373.2 2.7 0.7X
-PACKED reused, bitWidth=20 3 3
0 369.4 2.7 0.7X
-RLE, bitWidth=20 0 0
0 15516.8 0.1 30.0X
+PACKED cold, bitWidth=4 2 2
0 534.3 1.9 1.0X
+PACKED reused, bitWidth=4 2 2
0 522.1 1.9 1.0X
+RLE, bitWidth=4 0 0
0 7632.3 0.1 14.3X
+PACKED cold, bitWidth=8 2 2
0 584.0 1.7 1.1X
+PACKED reused, bitWidth=8 2 2
0 579.4 1.7 1.1X
+RLE, bitWidth=8 0 0
0 7615.7 0.1 14.3X
+PACKED cold, bitWidth=12 2 2
0 443.8 2.3 0.8X
+PACKED reused, bitWidth=12 2 3
0 436.4 2.3 0.8X
+RLE, bitWidth=12 0 0
0 7589.5 0.1 14.2X
+PACKED cold, bitWidth=20 3 3
0 378.5 2.6 0.7X
+PACKED reused, bitWidth=20 3 3
0 382.8 2.6 0.7X
+RLE, bitWidth=20 0 0
0 7590.5 0.1 14.2X
================================================================================================
Nullable batch decode with def-level materialization
================================================================================================
-OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1011-azure
-AMD EPYC 7763 64-Core Processor
+OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1013-azure
+Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
Nullable batch with def-levels: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-nullRatio=0.0, n/a 0 0
0 6608.2 0.2 1.0X
-nullRatio=0.1, random 9 9
0 119.1 8.4 0.0X
-nullRatio=0.1, clustered 6 6
0 166.2 6.0 0.0X
-nullRatio=0.3, random 13 13
0 81.2 12.3 0.0X
-nullRatio=0.3, clustered 6 6
0 166.0 6.0 0.0X
-nullRatio=0.5, random 15 15
1 71.3 14.0 0.0X
-nullRatio=0.5, clustered 6 6
0 166.6 6.0 0.0X
-nullRatio=0.9, random 8 8
0 127.6 7.8 0.0X
-nullRatio=0.9, clustered 6 6
0 175.5 5.7 0.0X
-nullRatio=1.0, random 0 0
0 8275.6 0.1 1.3X
+nullRatio=0.0, n/a 0 0
0 5285.1 0.2 1.0X
+nullRatio=0.1, random 9 9
0 113.1 8.8 0.0X
+nullRatio=0.1, clustered 6 6
0 173.0 5.8 0.0X
+nullRatio=0.3, random 14 14
0 75.5 13.2 0.0X
+nullRatio=0.3, clustered 6 6
0 173.1 5.8 0.0X
+nullRatio=0.5, random 15 16
0 67.7 14.8 0.0X
+nullRatio=0.5, clustered 6 6
0 176.2 5.7 0.0X
+nullRatio=0.9, random 8 8
0 128.6 7.8 0.0X
+nullRatio=0.9, clustered 5 6
0 196.2 5.1 0.0X
+nullRatio=1.0, random 0 0
0 35936.0 0.0 6.8X
================================================================================================
Nullable batch decode without def-level materialization
================================================================================================
-OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1011-azure
-AMD EPYC 7763 64-Core Processor
+OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1013-azure
+Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
Nullable batch without def-levels: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-nullRatio=0.0, n/a 0 0
0 11464.6 0.1 1.0X
-nullRatio=0.1, random 7 7
0 148.7 6.7 0.0X
-nullRatio=0.1, clustered 5 5
0 207.2 4.8 0.0X
-nullRatio=0.3, random 10 10
0 100.6 9.9 0.0X
-nullRatio=0.3, clustered 5 5
0 204.8 4.9 0.0X
-nullRatio=0.5, random 12 12
0 90.6 11.0 0.0X
-nullRatio=0.5, clustered 5 5
0 205.1 4.9 0.0X
-nullRatio=0.9, random 7 7
0 158.9 6.3 0.0X
-nullRatio=0.9, clustered 5 5
0 212.4 4.7 0.0X
-nullRatio=1.0, random 0 0
0 11983.2 0.1 1.0X
+nullRatio=0.0, n/a 0 0
0 8150.4 0.1 1.0X
+nullRatio=0.1, random 7 7
0 147.0 6.8 0.0X
+nullRatio=0.1, clustered 5 5
0 218.2 4.6 0.0X
+nullRatio=0.3, random 11 11
0 98.0 10.2 0.0X
+nullRatio=0.3, clustered 5 5
1 218.0 4.6 0.0X
+nullRatio=0.5, random 12 12
0 90.0 11.1 0.0X
+nullRatio=0.5, clustered 5 5
0 221.1 4.5 0.0X
+nullRatio=0.9, random 6 6
0 167.0 6.0 0.0X
+nullRatio=0.9, clustered 4 5
0 237.7 4.2 0.0X
+nullRatio=1.0, random 0 0
0 115647.5 0.0 14.2X
================================================================================================
Nullable batch decode with row-index filtering (with def-levels)
================================================================================================
-OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1011-azure
-AMD EPYC 7763 64-Core Processor
+OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1013-azure
+Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
Nullable batch with def-levels, row-index filtered: Best Time(ms) Avg
Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
----------------------------------------------------------------------------------------------------------------------------------
-nullRatio=0.0, contiguous 50% 1
1 0 762.9 1.3 1.0X
-nullRatio=0.3, contiguous 50% 9
9 0 116.5 8.6 0.2X
-nullRatio=0.9, contiguous 50% 7
7 0 156.2 6.4 0.2X
-nullRatio=0.0, alt 1000-row windows 2
2 0 433.5 2.3 0.6X
-nullRatio=0.3, alt 1000-row windows 10
10 0 103.5 9.7 0.1X
-nullRatio=0.9, alt 1000-row windows 8
8 1 136.1 7.3 0.2X
+nullRatio=0.0, contiguous 50% 1
1 0 817.3 1.2 1.0X
+nullRatio=0.3, contiguous 50% 9
10 0 111.1 9.0 0.1X
+nullRatio=0.9, contiguous 50% 7
7 0 159.5 6.3 0.2X
+nullRatio=0.0, alt 1000-row windows 3
3 0 327.9 3.1 0.4X
+nullRatio=0.3, alt 1000-row windows 12
12 1 90.9 11.0 0.1X
+nullRatio=0.9, alt 1000-row windows 9
9 0 119.7 8.4 0.1X
================================================================================================
Nullable batch decode with row-index filtering (without def-levels)
================================================================================================
-OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1011-azure
-AMD EPYC 7763 64-Core Processor
+OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1013-azure
+Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
Nullable batch without def-levels, row-index filtered: Best Time(ms) Avg
Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
-------------------------------------------------------------------------------------------------------------------------------------
-nullRatio=0.0, contiguous 50% 1
2 0 732.5 1.4 1.0X
-nullRatio=0.3, contiguous 50% 8
8 0 131.7 7.6 0.2X
-nullRatio=0.9, contiguous 50% 6
6 0 173.9 5.8 0.2X
-nullRatio=0.0, alt 1000-row windows 2
2 0 423.8 2.4 0.6X
-nullRatio=0.3, alt 1000-row windows 9
9 0 115.8 8.6 0.2X
-nullRatio=0.9, alt 1000-row windows 7
7 0 147.6 6.8 0.2X
+nullRatio=0.0, contiguous 50% 1
1 0 842.7 1.2 1.0X
+nullRatio=0.3, contiguous 50% 8
8 0 128.2 7.8 0.2X
+nullRatio=0.9, contiguous 50% 6
6 0 181.8 5.5 0.2X
+nullRatio=0.0, alt 1000-row windows 3
3 0 331.8 3.0 0.4X
+nullRatio=0.3, alt 1000-row windows 10
11 0 102.5 9.8 0.1X
+nullRatio=0.9, alt 1000-row windows 8
8 0 134.2 7.4 0.2X
================================================================================================
Single-value reads
================================================================================================
-OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1011-azure
-AMD EPYC 7763 64-Core Processor
+OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1013-azure
+Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
Single-value reads: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-readBoolean 3 3
0 321.6 3.1 1.0X
-readInteger, bitWidth=4 3 4
0 305.3 3.3 0.9X
-readValueDictionaryId, bitWidth=4 3 4
0 305.0 3.3 0.9X
-readInteger, bitWidth=8 3 3
0 322.7 3.1 1.0X
-readValueDictionaryId, bitWidth=8 3 3
0 322.7 3.1 1.0X
-readInteger, bitWidth=12 4 4
0 282.4 3.5 0.9X
-readValueDictionaryId, bitWidth=12 4 4
0 282.4 3.5 0.9X
-readInteger, bitWidth=20 4 4
0 248.5 4.0 0.8X
-readValueDictionaryId, bitWidth=20 4 4
0 248.4 4.0 0.8X
+readBoolean 4 4
0 263.6 3.8 1.0X
+readInteger, bitWidth=4 3 4
0 301.6 3.3 1.1X
+readValueDictionaryId, bitWidth=4 3 4
0 300.4 3.3 1.1X
+readInteger, bitWidth=8 3 4
0 314.9 3.2 1.2X
+readValueDictionaryId, bitWidth=8 3 3
0 315.7 3.2 1.2X
+readInteger, bitWidth=12 4 4
0 276.9 3.6 1.1X
+readValueDictionaryId, bitWidth=12 4 4
0 275.9 3.6 1.0X
+readInteger, bitWidth=20 4 4
0 249.3 4.0 0.9X
+readValueDictionaryId, bitWidth=20 4 4
0 247.9 4.0 0.9X
================================================================================================
Skip
================================================================================================
-OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1011-azure
-AMD EPYC 7763 64-Core Processor
+OpenJDK 64-Bit Server VM 25.0.3+9-LTS on Linux 6.17.0-1013-azure
+Intel(R) Xeon(R) Platinum 8370C CPU @ 2.80GHz
Skip: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-skipBooleans, trueRatio=0.0 0 0
0 26214400.0 0.0 1.0X
-skipBooleans, trueRatio=0.5 2 2
0 621.6 1.6 0.0X
-skipBooleans, trueRatio=1.0 0 0
0 26214400.0 0.0 1.0X
-skipIntegers PACKED, bitWidth=4 2 2
0 537.5 1.9 0.0X
-skipIntegers RLE, bitWidth=4 0 0
0 26214400.0 0.0 1.0X
-skipIntegers PACKED, bitWidth=8 2 2
0 599.4 1.7 0.0X
-skipIntegers RLE, bitWidth=8 0 0
0 26214400.0 0.0 1.0X
-skipIntegers PACKED, bitWidth=12 2 2
0 471.2 2.1 0.0X
-skipIntegers RLE, bitWidth=12 0 0
0 21399510.2 0.0 0.8X
-skipIntegers PACKED, bitWidth=20 3 3
0 384.7 2.6 0.0X
-skipIntegers RLE, bitWidth=20 0 0
0 21399510.2 0.0 0.8X
+skipBooleans, trueRatio=0.0 0 0
0 29959314.3 0.0 1.0X
+skipBooleans, trueRatio=0.5 2 2
0 611.9 1.6 0.0X
+skipBooleans, trueRatio=1.0 0 0
0 29959314.3 0.0 1.0X
+skipIntegers PACKED, bitWidth=4 2 2
0 542.6 1.8 0.0X
+skipIntegers RLE, bitWidth=4 0 0
0 29959314.3 0.0 1.0X
+skipIntegers PACKED, bitWidth=8 2 2
0 590.8 1.7 0.0X
+skipIntegers RLE, bitWidth=8 0 0
0 29959314.3 0.0 1.0X
+skipIntegers PACKED, bitWidth=12 2 2
0 467.9 2.1 0.0X
+skipIntegers RLE, bitWidth=12 0 0
0 26214400.0 0.0 0.9X
+skipIntegers PACKED, bitWidth=20 3 3
0 404.2 2.5 0.0X
+skipIntegers RLE, bitWidth=20 0 0
0 23831272.7 0.0 0.8X
diff --git a/sql/core/benchmarks/VectorizedRleValuesReaderBenchmark-results.txt
b/sql/core/benchmarks/VectorizedRleValuesReaderBenchmark-results.txt
index 749296283bd3..2ec7742a9d13 100644
--- a/sql/core/benchmarks/VectorizedRleValuesReaderBenchmark-results.txt
+++ b/sql/core/benchmarks/VectorizedRleValuesReaderBenchmark-results.txt
@@ -2,113 +2,113 @@
Boolean decode
================================================================================================
-OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1010-azure
+OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1013-azure
AMD EPYC 7763 64-Core Processor
RLE readBooleans decode: Best Time(ms) Avg Time(ms)
Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-cold reader, trueRatio=0.0 0 0 0 66239.8 0.0 1.0X
-reused reader, trueRatio=0.0 0 0 0 57887.6 0.0 0.9X
-cold reader, trueRatio=0.1 1 1 0 893.5 1.1 0.0X
-reused reader, trueRatio=0.1 1 1 0 895.6 1.1 0.0X
-cold reader, trueRatio=0.5 1 1 0 1018.7 1.0 0.0X
-reused reader, trueRatio=0.5 1 1 0 1029.4 1.0 0.0X
-cold reader, trueRatio=0.9 1 1 0 891.9 1.1 0.0X
-reused reader, trueRatio=0.9 1 1 0 894.8 1.1 0.0X
-cold reader, trueRatio=1.0 0 0 0 67001.7 0.0 1.0X
-reused reader, trueRatio=1.0 0 0 0 72380.5 0.0 1.1X
+cold reader, trueRatio=0.0 0 0 0 25842.3 0.0 1.0X
+reused reader, trueRatio=0.0 0 0 0 25810.5 0.0 1.0X
+cold reader, trueRatio=0.1 2 2 0 485.6 2.1 0.0X
+reused reader, trueRatio=0.1 2 2 0 485.6 2.1 0.0X
+cold reader, trueRatio=0.5 2 2 0 500.3 2.0 0.0X
+reused reader, trueRatio=0.5 2 2 0 483.7 2.1 0.0X
+cold reader, trueRatio=0.9 2 2 0 484.3 2.1 0.0X
+reused reader, trueRatio=0.9 2 2 0 484.7 2.1 0.0X
+cold reader, trueRatio=1.0 0 0 0 25804.1 0.0 1.0X
+reused reader, trueRatio=1.0 0 0 0 25791.4 0.0 1.0X
================================================================================================
Integer decode
================================================================================================
-OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1010-azure
+OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1013-azure
AMD EPYC 7763 64-Core Processor
RLE readIntegers dictionary-id decode: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-PACKED cold, bitWidth=4 2 2 0 505.6 2.0 1.0X
-PACKED reused, bitWidth=4 2 2 0 504.6 2.0 1.0X
-RLE, bitWidth=4 0 0 0 18249.1 0.1 36.1X
-PACKED cold, bitWidth=8 2 2 0 497.6 2.0 1.0X
-PACKED reused, bitWidth=8 2 2 0 496.2 2.0 1.0X
-RLE, bitWidth=8 0 0 0 18123.0 0.1 35.8X
-PACKED cold, bitWidth=12 3 3 0 370.2 2.7 0.7X
-PACKED reused, bitWidth=12 3 3 0 369.6 2.7 0.7X
-RLE, bitWidth=12 0 0 0 18573.3 0.1 36.7X
-PACKED cold, bitWidth=20 3 3 0 315.1 3.2 0.6X
-PACKED reused, bitWidth=20 3 3 0 316.2 3.2 0.6X
-RLE, bitWidth=20 0 0 0 18570.0 0.1 36.7X
+PACKED cold, bitWidth=4 2 2 0 485.7 2.1 1.0X
+PACKED reused, bitWidth=4 2 2 0 485.2 2.1 1.0X
+RLE, bitWidth=4 0 0 0 20688.1 0.0 42.6X
+PACKED cold, bitWidth=8 2 2 0 478.8 2.1 1.0X
+PACKED reused, bitWidth=8 2 2 0 476.5 2.1 1.0X
+RLE, bitWidth=8 0 0 0 15710.2 0.1 32.3X
+PACKED cold, bitWidth=12 3 3 0 357.4 2.8 0.7X
+PACKED reused, bitWidth=12 3 3 0 357.3 2.8 0.7X
+RLE, bitWidth=12 0 0 0 20684.0 0.0 42.6X
+PACKED cold, bitWidth=20 3 4 0 303.0 3.3 0.6X
+PACKED reused, bitWidth=20 3 4 0 302.1 3.3 0.6X
+RLE, bitWidth=20 0 0 0 20675.9 0.0 42.6X
================================================================================================
Nullable batch decode with def-level materialization
================================================================================================
-OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1010-azure
+OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1013-azure
AMD EPYC 7763 64-Core Processor
Nullable batch with def-levels: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-nullRatio=0.0, n/a 0 0 0 6431.5 0.2 1.0X
-nullRatio=0.1, random 9 9 0 114.6 8.7 0.0X
-nullRatio=0.1, clustered 7 7 0 159.7 6.3 0.0X
-nullRatio=0.3, random 13 13 0 80.1 12.5 0.0X
-nullRatio=0.3, clustered 7 7 1 157.8 6.3 0.0X
-nullRatio=0.5, random 14 15 0 72.7 13.7 0.0X
-nullRatio=0.5, clustered 6 7 0 162.0 6.2 0.0X
-nullRatio=0.9, random 8 8 0 126.4 7.9 0.0X
-nullRatio=0.9, clustered 6 6 0 174.0 5.7 0.0X
-nullRatio=1.0, random 0 0 0 8062.5 0.1 1.3X
+nullRatio=0.0, n/a 0 0 0 6435.2 0.2 1.0X
+nullRatio=0.1, random 10 10 0 106.0 9.4 0.0X
+nullRatio=0.1, clustered 7 7 0 144.9 6.9 0.0X
+nullRatio=0.3, random 14 14 0 75.0 13.3 0.0X
+nullRatio=0.3, clustered 7 7 0 147.9 6.8 0.0X
+nullRatio=0.5, random 16 16 0 67.5 14.8 0.0X
+nullRatio=0.5, clustered 7 7 0 151.9 6.6 0.0X
+nullRatio=0.9, random 9 9 0 121.9 8.2 0.0X
+nullRatio=0.9, clustered 6 6 0 166.4 6.0 0.0X
+nullRatio=1.0, random 0 0 0 4126.2 0.2 0.6X
================================================================================================
Nullable batch decode without def-level materialization
================================================================================================
-OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1010-azure
+OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1013-azure
AMD EPYC 7763 64-Core Processor
Nullable batch without def-levels: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-nullRatio=0.0, n/a 0 0 0 11054.0 0.1 1.0X
-nullRatio=0.1, random 7 8 0 140.6 7.1 0.0X
-nullRatio=0.1, clustered 5 5 0 193.2 5.2 0.0X
-nullRatio=0.3, random 11 11 0 97.4 10.3 0.0X
-nullRatio=0.3, clustered 6 6 0 184.4 5.4 0.0X
-nullRatio=0.5, random 12 12 0 87.7 11.4 0.0X
-nullRatio=0.5, clustered 5 6 0 191.6 5.2 0.0X
-nullRatio=0.9, random 7 7 0 151.7 6.6 0.0X
-nullRatio=0.9, clustered 5 5 0 200.8 5.0 0.0X
-nullRatio=1.0, random 0 0 0 11662.5 0.1 1.1X
+nullRatio=0.0, n/a 0 0 0 10982.3 0.1 1.0X
+nullRatio=0.1, random 8 8 0 139.7 7.2 0.0X
+nullRatio=0.1, clustered 5 6 0 191.6 5.2 0.0X
+nullRatio=0.3, random 11 11 0 98.9 10.1 0.0X
+nullRatio=0.3, clustered 5 5 0 194.1 5.2 0.0X
+nullRatio=0.5, random 12 12 0 89.4 11.2 0.0X
+nullRatio=0.5, clustered 5 5 0 199.1 5.0 0.0X
+nullRatio=0.9, random 7 7 0 160.5 6.2 0.0X
+nullRatio=0.9, clustered 5 5 0 218.1 4.6 0.0X
+nullRatio=1.0, random 0 0 0 4916.7 0.2 0.4X
================================================================================================
Nullable batch decode with row-index filtering (with def-levels)
================================================================================================
-OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1010-azure
+OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1013-azure
AMD EPYC 7763 64-Core Processor
Nullable batch with def-levels, row-index filtered: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
----------------------------------------------------------------------------------------------------------------------------------
-nullRatio=0.0, contiguous 50% 1 2 0 763.7 1.3 1.0X
-nullRatio=0.3, contiguous 50% 9 9 0 117.3 8.5 0.2X
-nullRatio=0.9, contiguous 50% 7 7 0 157.5 6.3 0.2X
-nullRatio=0.0, alt 1000-row windows 3 3 0 418.8 2.4 0.5X
-nullRatio=0.3, alt 1000-row windows 10 10 0 103.8 9.6 0.1X
-nullRatio=0.9, alt 1000-row windows 8 8 0 134.7 7.4 0.2X
+nullRatio=0.0, contiguous 50% 1 2 0 759.7 1.3 1.0X
+nullRatio=0.3, contiguous 50% 10 10 0 108.5 9.2 0.1X
+nullRatio=0.9, contiguous 50% 7 7 0 149.5 6.7 0.2X
+nullRatio=0.0, alt 1000-row windows 2 3 0 419.6 2.4 0.6X
+nullRatio=0.3, alt 1000-row windows 11 11 0 96.9 10.3 0.1X
+nullRatio=0.9, alt 1000-row windows 8 8 0 128.3 7.8 0.2X
================================================================================================
Nullable batch decode with row-index filtering (without def-levels)
================================================================================================
-OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1010-azure
+OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1013-azure
AMD EPYC 7763 64-Core Processor
Nullable batch without def-levels, row-index filtered: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
-------------------------------------------------------------------------------------------------------------------------------------
-nullRatio=0.0, contiguous 50% 1 1 0 865.4 1.2 1.0X
-nullRatio=0.3, contiguous 50% 8 8 0 128.8 7.8 0.1X
-nullRatio=0.9, contiguous 50% 6 6 0 173.4 5.8 0.2X
-nullRatio=0.0, alt 1000-row windows 2 2 0 425.9 2.3 0.5X
-nullRatio=0.3, alt 1000-row windows 9 9 0 111.3 9.0 0.1X
+nullRatio=0.0, contiguous 50% 1 1 0 865.3 1.2 1.0X
+nullRatio=0.3, contiguous 50% 8 8 0 127.6 7.8 0.1X
+nullRatio=0.9, contiguous 50% 6 6 0 174.1 5.7 0.2X
+nullRatio=0.0, alt 1000-row windows 2 2 0 425.3 2.4 0.5X
+nullRatio=0.3, alt 1000-row windows 9 10 0 110.5 9.1 0.1X
nullRatio=0.9, alt 1000-row windows 7 7 0 143.3 7.0 0.2X
@@ -116,39 +116,39 @@ nullRatio=0.9, alt 1000-row windows 7
Single-value reads
================================================================================================
-OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1010-azure
+OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1013-azure
AMD EPYC 7763 64-Core Processor
Single-value reads: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-readBoolean 4 4 0 287.0 3.5 1.0X
-readInteger, bitWidth=4 4 4 0 275.7 3.6 1.0X
-readValueDictionaryId, bitWidth=4 4 4 0 275.5 3.6 1.0X
-readInteger, bitWidth=8 4 4 0 273.7 3.7 1.0X
+readBoolean 4 4 1 272.0 3.7 1.0X
+readInteger, bitWidth=4 4 4 0 274.0 3.6 1.0X
+readValueDictionaryId, bitWidth=4 4 4 0 275.2 3.6 1.0X
+readInteger, bitWidth=8 4 4 0 272.3 3.7 1.0X
readValueDictionaryId, bitWidth=8 4 4 0 273.4 3.7 1.0X
-readInteger, bitWidth=12 5 5 1 230.1 4.3 0.8X
-readValueDictionaryId, bitWidth=12 5 5 0 229.3 4.4 0.8X
-readInteger, bitWidth=20 5 5 0 207.9 4.8 0.7X
-readValueDictionaryId, bitWidth=20 5 5 0 207.3 4.8 0.7X
+readInteger, bitWidth=12 5 5 0 228.2 4.4 0.8X
+readValueDictionaryId, bitWidth=12 5 5 0 228.9 4.4 0.8X
+readInteger, bitWidth=20 5 5 0 204.3 4.9 0.8X
+readValueDictionaryId, bitWidth=20 5 5 0 205.5 4.9 0.8X
================================================================================================
Skip
================================================================================================
-OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1010-azure
+OpenJDK 64-Bit Server VM 17.0.19+10-LTS on Linux 6.17.0-1013-azure
AMD EPYC 7763 64-Core Processor
Skip: Best Time(ms) Avg Time(ms) Stdev(ms) Rate(M/s) Per Row(ns) Relative
------------------------------------------------------------------------------------------------------------------------
-skipBooleans, trueRatio=0.0 0 0 0 20971520.0 0.0 1.0X
-skipBooleans, trueRatio=0.5 2 2 0 569.4 1.8 0.0X
+skipBooleans, trueRatio=0.0 0 0 0 21399510.2 0.0 1.0X
+skipBooleans, trueRatio=0.5 2 2 0 536.1 1.9 0.0X
skipBooleans, trueRatio=1.0 0 0 0 21399510.2 0.0 1.0X
-skipIntegers PACKED, bitWidth=4 2 2 0 522.7 1.9 0.0X
+skipIntegers PACKED, bitWidth=4 2 2 0 502.5 2.0 0.0X
skipIntegers RLE, bitWidth=4 0 0 0 20971520.0 0.0 1.0X
-skipIntegers PACKED, bitWidth=8 2 2 0 516.6 1.9 0.0X
+skipIntegers PACKED, bitWidth=8 2 2 0 497.5 2.0 0.0X
skipIntegers RLE, bitWidth=8 0 0 0 21399510.2 0.0 1.0X
-skipIntegers PACKED, bitWidth=12 3 3 0 382.4 2.6 0.0X
+skipIntegers PACKED, bitWidth=12 3 3 0 367.9 2.7 0.0X
skipIntegers RLE, bitWidth=12 0 0 0 17476266.7 0.0 0.8X
-skipIntegers PACKED, bitWidth=20 3 3 0 323.0 3.1 0.0X
+skipIntegers PACKED, bitWidth=20 3 3 0 310.9 3.2 0.0X
skipIntegers RLE, bitWidth=20 0 0 0 17476266.7 0.0 0.8X
diff --git a/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedRleValuesReader.java b/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedRleValuesReader.java
index 5dac20209ef3..7c5742b65ada 100644
--- a/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedRleValuesReader.java
+++ b/sql/core/src/main/java/org/apache/spark/sql/execution/datasources/parquet/VectorizedRleValuesReader.java
@@ -298,9 +298,7 @@ public final class VectorizedRleValuesReader extends ValuesReader
} while (currentBufferIdx < bufEnd
&& currentBuffer[currentBufferIdx] != maxDefLevel);
int runLen = currentBufferIdx - runStart;
- for (int k = 0; k < runLen; k++) {
- nulls.putNull(valueOff + k);
- }
+ nulls.putNulls(valueOff, runLen);
valueOff += runLen;
}
}
@@ -714,14 +712,10 @@ public final class VectorizedRleValuesReader extends ValuesReader
updater.readValues(runLen, valueOff, values, valueReader);
}
} else {
- for (int k = 0; k < runLen; k++) {
- nulls.putNull(valueOff + k);
- }
+ nulls.putNulls(valueOff, runLen);
}
valueOff += runLen;
- for (int k = 0; k < runLen; k++) {
- defLevels.putInt(levelIdx + k, runValue);
- }
+ defLevels.putInts(levelIdx, runLen, runValue);
levelIdx += runLen;
}
state.valueOffset = valueOff;
diff --git a/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OffHeapColumnVector.java b/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OffHeapColumnVector.java
index 03a645acedf5..e9d4ac8aa8d6 100644
--- a/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OffHeapColumnVector.java
+++ b/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OffHeapColumnVector.java
@@ -33,12 +33,13 @@ public final class OffHeapColumnVector extends WritableColumnVector {
private static final boolean bigEndianPlatform =
ByteOrder.nativeOrder().equals(ByteOrder.BIG_ENDIAN);
- // Below this count, byte-fill methods (putBytes / putBooleans) write bytes in an inline
- // loop. At or above this count, they call Platform.setMemory which lowers to a native
- // memset. The JNI fixed cost of setMemory dominates for very short fills; on the
+ // Below this count, byte-fill methods (putBytes / putBooleans / putNulls) write bytes in
+ // an inline loop. At or above this count, they call Platform.setMemory which lowers to a
+ // native memset. The JNI fixed cost of setMemory dominates for very short fills; on the
// benchmarked hardware (Apple M4 Max + OpenJDK 21) the crossover sits between 64 and
- // 512, so 128 is a conservative choice that avoids regression at small counts while
- // retaining the bulk of the asymptotic gain.
+ // 512, so 128 is a conservative choice that avoids regression at small counts (common
+ // for random null patterns where RLE runs are short) while retaining the bulk of the
+ // asymptotic gain.
private static final int SET_MEMORY_THRESHOLD = 128;
/**
@@ -127,9 +128,13 @@ public final class OffHeapColumnVector extends WritableColumnVector {
@Override
public void putNulls(int rowId, int count) {
if (isAllNull()) return; // Skip writing nulls to all-null vector.
- long offset = nulls + rowId;
- for (int i = 0; i < count; ++i, ++offset) {
- Platform.putByte(null, offset, (byte) 1);
+ if (count < SET_MEMORY_THRESHOLD) {
+ long offset = nulls + rowId;
+ for (int i = 0; i < count; ++i, ++offset) {
+ Platform.putByte(null, offset, (byte) 1);
+ }
+ } else {
+ Platform.setMemory(nulls + rowId, (byte) 1, count);
}
numNulls += count;
}
diff --git a/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java b/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java
index 784c9053ca81..146c4b1bf3eb 100644
--- a/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java
+++ b/sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/OnHeapColumnVector.java
@@ -116,9 +116,7 @@ public final class OnHeapColumnVector extends WritableColumnVector {
@Override
public void putNulls(int rowId, int count) {
if (isAllNull()) return; // Skip writing nulls to all-null vector.
- for (int i = 0; i < count; ++i) {
- nulls[rowId + i] = (byte)1;
- }
+ Arrays.fill(nulls, rowId, rowId + count, (byte) 1);
numNulls += count;
}
@@ -313,9 +311,7 @@ public final class OnHeapColumnVector extends WritableColumnVector {
@Override
public void putInts(int rowId, int count, int value) {
- for (int i = 0; i < count; ++i) {
- intData[i + rowId] = value;
- }
+ Arrays.fill(intData, rowId, rowId + count, value);
}
@Override
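
For readers skimming the hunks above, here is a minimal standalone sketch of the two fill strategies and the threshold dispatch between them. It is not code from the patch: `BulkFillSketch`, `fillLoop`, `fillBulk`, `fillWithThreshold`, and `THRESHOLD` are hypothetical names, and a plain on-heap `byte[]` with `Arrays.fill` stands in for the off-heap memory that the patch fills through `Platform.setMemory`. The value 128 mirrors `SET_MEMORY_THRESHOLD` from the diff.

```java
import java.util.Arrays;

public class BulkFillSketch {
  // Hypothetical constant; mirrors SET_MEMORY_THRESHOLD = 128 in the diff.
  static final int THRESHOLD = 128;

  // Per-element loop: the shape of the code the commit replaces.
  static void fillLoop(byte[] nulls, int rowId, int count) {
    for (int i = 0; i < count; ++i) {
      nulls[rowId + i] = (byte) 1;
    }
  }

  // Bulk fill: same result as fillLoop, expressed as one Arrays.fill call.
  static void fillBulk(byte[] nulls, int rowId, int count) {
    Arrays.fill(nulls, rowId, rowId + count, (byte) 1);
  }

  // Threshold dispatch: short runs take the inline loop, long runs the bulk call.
  // In this on-heap stand-in both branches are cheap; the split only matters
  // when the bulk path has a fixed per-call cost, as described for setMemory.
  static void fillWithThreshold(byte[] nulls, int rowId, int count) {
    if (count < THRESHOLD) {
      fillLoop(nulls, rowId, count);
    } else {
      fillBulk(nulls, rowId, count);
    }
  }

  public static void main(String[] args) {
    byte[] viaLoop = new byte[1024];
    byte[] viaDispatch = new byte[1024];
    fillLoop(viaLoop, 10, 300);
    fillWithThreshold(viaDispatch, 10, 300);
    // Both strategies should mark exactly rows [10, 310) as null.
    System.out.println(Arrays.equals(viaLoop, viaDispatch));
  }
}
```

The sketch only demonstrates that the loop and bulk forms write the same bytes; it makes no claim about where the crossover sits on any particular JVM or hardware.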