dramaticlly opened a new pull request, #13785:
URL: https://github.com/apache/iceberg/pull/13785
1. Updated the table properties documentation. With #13039, the default
metrics mode changed from collecting metrics for 100 top-level fields to
collecting metrics for 100 fields overall (i.e., nested fields within a
struct/map/list now count toward the 100 limit).
2. Added
`core/src/jmh/java/org/apache/iceberg/metrics/LimitFieldIdsBenchmark.java` to
measure its performance on schemas with many fields:
```
Benchmark                           (limitFields)  (numFields)  Mode  Cnt    Score    Error  Units
LimitFieldIdsBenchmark.limitFields            100           50    ss    5   ≈ 10⁻⁴            s/op
LimitFieldIdsBenchmark.limitFields            100        10000    ss    5   ≈ 10⁻³            s/op
LimitFieldIdsBenchmark.limitFields            100       100000    ss    5    0.003  ± 0.009   s/op
LimitFieldIdsBenchmark.limitFields            100      1000000    ss    5    0.032  ± 0.162   s/op
LimitFieldIdsBenchmark.limitFields          10000           50    ss    5   ≈ 10⁻⁴            s/op
LimitFieldIdsBenchmark.limitFields          10000        10000    ss    5    0.001  ± 0.004   s/op
LimitFieldIdsBenchmark.limitFields          10000       100000    ss    5    0.002  ± 0.001   s/op
LimitFieldIdsBenchmark.limitFields          10000      1000000    ss    5    0.013  ± 0.002   s/op
```
3. Changed the visibility of the static method
`limitFieldIds(Schema schema, int limit)` to public, as I find it helpful for
providing the list of field IDs to which the default metrics mode applies.
Exposing this method allows engines to honor the Iceberg default by keeping
only the required column-level metrics.
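
To illustrate the behavior change described in item 1, here is a minimal, self-contained sketch of limiting metrics to a fixed number of fields when nested fields count toward the limit. It does not use Iceberg classes and is not the implementation of `limitFieldIds`: `SketchField`, `FieldLimitSketch`, and `limitLeafFieldIds` are hypothetical names, and both the breadth-first visiting order and the choice to count only leaf fields are assumptions made for the example.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Collections;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for a schema field; this is NOT Iceberg's Types.NestedField.
final class SketchField {
  final int id;
  final String name;
  final List<SketchField> children; // empty for primitive (leaf) fields

  SketchField(int id, String name, SketchField... children) {
    this.id = id;
    this.name = name;
    this.children = Collections.unmodifiableList(Arrays.asList(children));
  }

  boolean isLeaf() {
    return children.isEmpty();
  }
}

final class FieldLimitSketch {
  private FieldLimitSketch() {}

  // Collects at most `limit` leaf field IDs. Fields are visited breadth-first
  // (an assumption of this sketch), so shallower fields are picked before deeper
  // ones, and leaves nested inside a struct count toward the same limit.
  static Set<Integer> limitLeafFieldIds(List<SketchField> topLevel, int limit) {
    Set<Integer> ids = new LinkedHashSet<>();
    Deque<SketchField> queue = new ArrayDeque<>(topLevel);
    while (!queue.isEmpty() && ids.size() < limit) {
      SketchField field = queue.poll();
      if (field.isLeaf()) {
        ids.add(field.id);
      } else {
        queue.addAll(field.children);
      }
    }
    return ids;
  }

  // Example schema: id (1), location struct (2) with lat (4) and lon (5), ts (3).
  static List<SketchField> exampleSchema() {
    return Arrays.asList(
        new SketchField(1, "id"),
        new SketchField(2, "location", new SketchField(4, "lat"), new SketchField(5, "lon")),
        new SketchField(3, "ts"));
  }

  public static void main(String[] args) {
    System.out.println(limitLeafFieldIds(exampleSchema(), 3));  // prints [1, 3, 4]
    System.out.println(limitLeafFieldIds(exampleSchema(), 10)); // prints [1, 3, 4, 5]
  }
}
```

With a limit of 3, the nested `lat` field takes the last slot and `lon` is left out, which is the kind of effect an engine would need to account for when deciding which column-level metrics to keep.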