This is an automated email from the ASF dual-hosted git repository.
csy pushed a commit to branch branch-2.3
in repository https://gitbox.apache.org/repos/asf/orc.git
The following commit(s) were added to refs/heads/branch-2.3 by this push:
new 499e107f9 ORC-2131: Set default of orc.stripe.size.check.ratio and
orc.dictionary.max.size.bytes to 0
499e107f9 is described below
commit 499e107f9b47dddc3ab19b90459dc82c53a44c2b
Author: yongqian <[email protected]>
AuthorDate: Wed Mar 25 14:43:21 2026 +0800
ORC-2131: Set default of orc.stripe.size.check.ratio and
orc.dictionary.max.size.bytes to 0
### What changes were proposed in this pull request?
Set default of `orc.stripe.size.check.ratio` and
`orc.dictionary.max.size.bytes` to 0
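
As a rough illustration of what a default of 0 means, the sketch below models the two checks the way the option descriptions in `OrcConf` word them ("Use 0 to disable this check"). It is not the ORC writer code: the class name `OrcCheckSketch`, both method names, and the 64 MiB stripe size used in `main` are hypothetical.

```java
// Hypothetical model of the two size checks, based only on the option
// descriptions in OrcConf. This is NOT the actual ORC writer implementation.
public class OrcCheckSketch {

    // orc.dictionary.max.size.bytes: turn off dictionary encoding when the
    // dictionary is larger than the limit; a limit of 0 disables the check.
    static boolean dictionaryCheckTrips(long dictionaryBytes, long maxSizeBytes) {
        return maxSizeBytes > 0 && dictionaryBytes > maxSizeBytes;
    }

    // orc.stripe.size.check.ratio: flush the stripe when the tree writer size
    // exceeds (ratio * orc.stripe.size); a ratio of 0 disables the check.
    static boolean stripeCheckTrips(long treeWriterBytes, long stripeSize, double checkRatio) {
        return checkRatio > 0 && treeWriterBytes > checkRatio * stripeSize;
    }

    public static void main(String[] args) {
        long stripeSize = 64L * 1024 * 1024;   // assumed stripe size for the example
        long dictionary = 32L * 1024 * 1024;   // a 32 MiB dictionary
        long treeWriter = 200L * 1024 * 1024;  // a 200 MiB tree writer

        // Previous defaults (16 MiB and 2.0): both checks trip.
        System.out.println(dictionaryCheckTrips(dictionary, 16L * 1024 * 1024));
        System.out.println(stripeCheckTrips(treeWriter, stripeSize, 2.0));

        // New defaults (0 and 0.0): both checks are disabled.
        System.out.println(dictionaryCheckTrips(dictionary, 0L));
        System.out.println(stripeCheckTrips(treeWriter, stripeSize, 0.0));
    }
}
```

Under this model, the same inputs trip both checks with the previous defaults and neither check with the new ones.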
### Why are the changes needed?
After enabling the optimizations related to orc.stripe.size.check.ratio and
orc.dictionary.max.size.bytes, we observed that ORC files written with the
current defaults are about 10%–20% larger than before. For example, datasets
that were previously ~1.0–1.1 TB grow to ~1.2 TB with the current defaults,
causing a noticeable increase in storage cost.
### How was this patch tested?
Local test
With orc.dictionary.max.size.bytes=16777216 or
orc.stripe.size.check.ratio=2.0, the written ORC data grows to 1.2 TB (data
inflation).
```shell
1 6665 1300347279057 hdfs://ns/user/hive/warehouse/tmp_sandbox_xxx.db/tmp_test_123_2/d=2026-03-15
```
With orc.dictionary.max.size.bytes=0 and orc.stripe.size.check.ratio=0.0,
the data size remains at the expected 1.0 TB.
```shell
1 6665 1143347882367 hdfs://ns/user/hive/warehouse/tmp_sandbox_xxx.db/tmp_test_123_1/d=2026-03-15
```
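
The relative growth between the two measurements can be checked with a few lines of arithmetic. The sketch below uses the two byte counts reported above; the class name `InflationSketch` and the method name are hypothetical.

```java
import java.util.Locale;

// Hypothetical helper that computes the relative size increase between
// the two byte counts reported in the test output.
public class InflationSketch {

    // Percentage by which inflatedBytes exceeds baselineBytes.
    static double inflationPercent(long baselineBytes, long inflatedBytes) {
        return 100.0 * (inflatedBytes - baselineBytes) / baselineBytes;
    }

    public static void main(String[] args) {
        long baseline = 1143347882367L; // checks disabled (0 and 0.0)
        long inflated = 1300347279057L; // previous defaults (16777216 or 2.0)
        System.out.println(String.format(Locale.ROOT,
            "inflation: %.1f%%", inflationPercent(baseline, inflated)));
    }
}
```

For these two values the increase works out to roughly 13.7%, which falls inside the 10%–20% range described above.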
### Was this patch authored or co-authored using generative AI tooling?
No
Closes #2580 from QianyongY/features/ORC-2131.
Authored-by: yongqian <[email protected]>
Signed-off-by: Shaoyun Chen <[email protected]>
(cherry picked from commit 016b076998cbe08fdbbb6841d971accd863a9894)
Signed-off-by: Shaoyun Chen <[email protected]>
---
java/core/src/java/org/apache/orc/OrcConf.java | 4 ++--
site/_docs/core-java-config.md | 4 ++--
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/java/core/src/java/org/apache/orc/OrcConf.java b/java/core/src/java/org/apache/orc/OrcConf.java
index cf8345e4a..04a2af641 100644
--- a/java/core/src/java/org/apache/orc/OrcConf.java
+++ b/java/core/src/java/org/apache/orc/OrcConf.java
@@ -121,7 +121,7 @@ public enum OrcConf {
"dictionary encoding. Use 1 to always use dictionary encoding."),
DICTIONARY_MAX_SIZE_IN_BYTES("orc.dictionary.max.size.bytes",
"orc.dictionary.max.size.bytes",
- 16 * 1024 * 1024,
+ 0,
"If the total size of the dictionary is greater than this\n" +
", turn off dictionary encoding. Use 0 to disable this check."),
ROW_INDEX_STRIDE_DICTIONARY_CHECK("orc.dictionary.early.check",
@@ -190,7 +190,7 @@ public enum OrcConf {
+ " Use orc.stripe.row.count instead if the value larger than orc.stripe.row.count."),
STRIPE_SIZE_CHECKRATIO("orc.stripe.size.check.ratio",
"orc.stripe.size.check.ratio",
- 2.0,
+ 0.0,
"Flush stripe if the tree writer size in bytes is larger than (this * orc.stripe.size). " +
"Use 0 to disable this check."),
OVERWRITE_OUTPUT_FILE("orc.overwrite.output.file",
"orc.overwrite.output.file", false,
diff --git a/site/_docs/core-java-config.md b/site/_docs/core-java-config.md
index 42bbbd17f..e9ad48512 100644
--- a/site/_docs/core-java-config.md
+++ b/site/_docs/core-java-config.md
@@ -167,7 +167,7 @@ permalink: /docs/core-java-config.html
</tr>
<tr>
<td><code>orc.dictionary.max.size.bytes</code></td>
- <td>16777216</td>
+ <td>0</td>
<td>
If the total size of the dictionary is greater than this, turn off
dictionary encoding. Use 0 to disable this check.
</td>
@@ -293,7 +293,7 @@ permalink: /docs/core-java-config.html
</tr>
<tr>
<td><code>orc.stripe.size.check.ratio</code></td>
- <td>2.0</td>
+ <td>0.0</td>
<td>
Flush stripe if the tree writer size in bytes is larger than (this *
orc.stripe.size). Use 0 to disable this check.
</td>