This is an automated email from the ASF dual-hosted git repository.

csy pushed a commit to branch branch-2.3
in repository https://gitbox.apache.org/repos/asf/orc.git


The following commit(s) were added to refs/heads/branch-2.3 by this push:
     new 499e107f9 ORC-2131: Set default of orc.stripe.size.check.ratio and orc.dictionary.max.size.bytes to 0
499e107f9 is described below

commit 499e107f9b47dddc3ab19b90459dc82c53a44c2b
Author: yongqian <[email protected]>
AuthorDate: Wed Mar 25 14:43:21 2026 +0800

    ORC-2131: Set default of orc.stripe.size.check.ratio and orc.dictionary.max.size.bytes to 0
    
    ### What changes were proposed in this pull request?
    
    Set default of `orc.stripe.size.check.ratio` and `orc.dictionary.max.size.bytes` to 0
    
    ### Why are the changes needed?
    
    After the optimizations related to orc.stripe.size.check.ratio and orc.dictionary.max.size.bytes were enabled, we observed that ORC files written with the current defaults are about 10%–20% larger than before. For example, datasets that were previously ~1.0–1.1 TB grow to ~1.2 TB with the current defaults, causing a noticeable increase in storage cost.
    
    ### How was this patch tested?
    Local test
    
    With orc.dictionary.max.size.bytes=16777216 or orc.stripe.size.check.ratio=2.0, the written ORC data grows to 1.2 TB (data inflation).
    
    ```shell
               1         6665      1300347279057 hdfs://ns/user/hive/warehouse/tmp_sandbox_xxx.db/tmp_test_123_2/d=2026-03-15
    ```
    
    With orc.dictionary.max.size.bytes=0 and orc.stripe.size.check.ratio=0.0, the data size remains at the expected 1.0 TB.
    ```shell
               1         6665      1143347882367 hdfs://ns/user/hive/warehouse/tmp_sandbox_xxx.db/tmp_test_123_1/d=2026-03-15
    ```
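    
    As a rough cross-check of the two listings above, the relative growth for this partition can be computed from the reported sizes. This is only an illustrative sketch: it assumes the third column of each listing is the content size in bytes, and the class and method names are hypothetical, not part of ORC.
    
    ```java
    import java.util.Locale;
    
    /** Hypothetical helper, not part of ORC: compares the two reported sizes. */
    public class InflationCheck {
    
      /** Returns the growth from baselineBytes to inflatedBytes as a percentage. */
      public static double inflationPercent(long baselineBytes, long inflatedBytes) {
        return 100.0 * (inflatedBytes - baselineBytes) / baselineBytes;
      }
    
      public static void main(String[] args) {
        // Sizes copied from the two listings above (assumed to be bytes).
        long baselineBytes = 1_143_347_882_367L;  // both settings at 0
        long inflatedBytes = 1_300_347_279_057L;  // 16777216 / 2.0 settings
        System.out.println(String.format(Locale.ROOT, "growth: %.1f%%",
            inflationPercent(baselineBytes, inflatedBytes)));
      }
    }
    ```
    
    For this partition the growth works out to about 13.7%, which is within the 10%–20% range described in the motivation.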
    
    ### Was this patch authored or co-authored using generative AI tooling?
    
    No
    
    Closes #2580 from QianyongY/features/ORC-2131.
    
    Authored-by: yongqian <[email protected]>
    Signed-off-by: Shaoyun Chen <[email protected]>
    (cherry picked from commit 016b076998cbe08fdbbb6841d971accd863a9894)
    Signed-off-by: Shaoyun Chen <[email protected]>
---
 java/core/src/java/org/apache/orc/OrcConf.java | 4 ++--
 site/_docs/core-java-config.md                 | 4 ++--
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/java/core/src/java/org/apache/orc/OrcConf.java b/java/core/src/java/org/apache/orc/OrcConf.java
index cf8345e4a..04a2af641 100644
--- a/java/core/src/java/org/apache/orc/OrcConf.java
+++ b/java/core/src/java/org/apache/orc/OrcConf.java
@@ -121,7 +121,7 @@ public enum OrcConf {
           "dictionary encoding.  Use 1 to always use dictionary encoding."),
   DICTIONARY_MAX_SIZE_IN_BYTES("orc.dictionary.max.size.bytes",
       "orc.dictionary.max.size.bytes",
-      16 * 1024 * 1024,
+      0,
       "If the total size of the dictionary is greater than this\n" +
           ", turn off dictionary encoding. Use 0 to disable this check."),
   ROW_INDEX_STRIDE_DICTIONARY_CHECK("orc.dictionary.early.check",
@@ -190,7 +190,7 @@ public enum OrcConf {
         + " Use orc.stripe.row.count instead if the value larger than orc.stripe.row.count."),
   STRIPE_SIZE_CHECKRATIO("orc.stripe.size.check.ratio",
       "orc.stripe.size.check.ratio",
-      2.0,
+      0.0,
       "Flush stripe if the tree writer size in bytes is larger than (this * orc.stripe.size). " +
           "Use 0 to disable this check."),
   OVERWRITE_OUTPUT_FILE("orc.overwrite.output.file", "orc.overwrite.output.file", false,
diff --git a/site/_docs/core-java-config.md b/site/_docs/core-java-config.md
index 42bbbd17f..e9ad48512 100644
--- a/site/_docs/core-java-config.md
+++ b/site/_docs/core-java-config.md
@@ -167,7 +167,7 @@ permalink: /docs/core-java-config.html
 </tr>
 <tr>
   <td><code>orc.dictionary.max.size.bytes</code></td>
-  <td>16777216</td>
+  <td>0</td>
   <td>
     If the total size of the dictionary is greater than this, turn off dictionary encoding. Use 0 to disable this check.
   </td>
@@ -293,7 +293,7 @@ permalink: /docs/core-java-config.html
 </tr>
 <tr>
   <td><code>orc.stripe.size.check.ratio</code></td>
-  <td>2.0</td>
+  <td>0.0</td>
   <td>
     Flush stripe if the tree writer size in bytes is larger than (this * orc.stripe.size). Use 0 to disable this check.
   </td>
