[PR] Core, Docs: Update write.metadata.metrics.max-inferred-column-defaults documentation and add benchmark [iceberg]

via GitHub Mon, 11 Aug 2025 11:58:11 -0700


dramaticlly opened a new pull request, #13785:
URL: https://github.com/apache/iceberg/pull/13785


   1. Updated the table properties documentation. With #13039, the behavior 
changed from collecting default-mode metrics for 100 top-level fields to 
collecting them for 100 fields in total (i.e., nested fields within a 
struct/map/list now count toward the 100 limit).
   
   2. Added 
`core/src/jmh/java/org/apache/iceberg/metrics/LimitFieldIdsBenchmark.java` to 
measure how the field limit performs on very wide schemas:
   ```
   Benchmark                           (limitFields)  (numFields)  Mode  Cnt   Score    Error  Units
   LimitFieldIdsBenchmark.limitFields            100           50    ss    5  ≈ 10⁻⁴            s/op
   LimitFieldIdsBenchmark.limitFields            100        10000    ss    5  ≈ 10⁻³            s/op
   LimitFieldIdsBenchmark.limitFields            100       100000    ss    5   0.003 ±  0.009   s/op
   LimitFieldIdsBenchmark.limitFields            100      1000000    ss    5   0.032 ±  0.162   s/op
   LimitFieldIdsBenchmark.limitFields          10000           50    ss    5  ≈ 10⁻⁴            s/op
   LimitFieldIdsBenchmark.limitFields          10000        10000    ss    5   0.001 ±  0.004   s/op
   LimitFieldIdsBenchmark.limitFields          10000       100000    ss    5   0.002 ±  0.001   s/op
   LimitFieldIdsBenchmark.limitFields          10000      1000000    ss    5   0.013 ±  0.002   s/op
   ```
   
   3. Changed the visibility of the static method `limitFieldIds(Schema 
schema, int limit)` to public, as I find it helpful for obtaining the list of 
field IDs to which the default metrics mode should apply. Exposing this method 
lets engines honor the Iceberg default by keeping only the required 
column-level metrics.
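
To illustrate the counting change described in item 1, here is a minimal, self-contained sketch. It does not use Iceberg classes: `Field`, `exampleSchema`, and `limitLeafFieldIds` are hypothetical names invented for this example, and the choices to count only leaf fields and to visit them depth-first in declaration order are assumptions, not a description of how `limitFieldIds(Schema schema, int limit)` is actually implemented.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class LimitFieldIdsSketch {

  /** Hypothetical stand-in for a schema field; this is not an Iceberg class. */
  public static final class Field {
    public final int id;
    public final String name;
    public final List<Field> children;

    public Field(int id, String name, Field... children) {
      this.id = id;
      this.name = name;
      this.children = List.of(children);
    }
  }

  /** Three top-level columns, one of which is a struct with two nested fields. */
  public static List<Field> exampleSchema() {
    return List.of(
        new Field(1, "id"),
        new Field(2, "location", new Field(3, "lat"), new Field(4, "lon")),
        new Field(5, "name"));
  }

  /**
   * Selects up to {@code limit} leaf field IDs, visiting fields depth-first in
   * declaration order (assumed order). Nested leaves count toward the limit.
   */
  public static Set<Integer> limitLeafFieldIds(List<Field> topLevel, int limit) {
    Set<Integer> selected = new LinkedHashSet<>();
    Deque<Field> stack = new ArrayDeque<>();
    // Push in reverse so the first declared field is popped first.
    for (int i = topLevel.size() - 1; i >= 0; i--) {
      stack.push(topLevel.get(i));
    }
    while (!stack.isEmpty() && selected.size() < limit) {
      Field field = stack.pop();
      if (field.children.isEmpty()) {
        selected.add(field.id);
      } else {
        for (int i = field.children.size() - 1; i >= 0; i--) {
          stack.push(field.children.get(i));
        }
      }
    }
    return selected;
  }

  public static void main(String[] args) {
    // With a limit of 3, the nested lat/lon fields use up the budget,
    // so the top-level "name" column (id 5) is left out.
    System.out.println(limitLeafFieldIds(exampleSchema(), 3)); // prints [1, 3, 4]
  }
}
```

In this sketch the schema has only three top-level columns, yet a limit of 3 no longer covers all of them, because the two fields nested in `location` are counted as well. An engine could use a set like this to decide which columns keep the default metrics mode.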
    


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

