wombatu-kun opened a new pull request, #16362:
URL: https://github.com/apache/iceberg/pull/16362
## What
Resolves the long-standing `TODO: translate truncate(col) == value to
startsWith(value)` in `UnboundPredicate.bindLiteralOperation`. When the term of
an `EQ`/`NOT_EQ` predicate is a **string** `truncate[W]` transform, binding now
produces an exactly-equivalent predicate on the untransformed source column.
The equivalence depends on the literal length vs. the truncate width `W`:
| Condition | `truncate[W](col) == v` | `truncate[W](col) != v` |
|---|---|---|
| `len(v) > W` | `alwaysFalse()` | `alwaysTrue()` |
| `len(v) == W` | `col STARTS_WITH v` | `col NOT_STARTS_WITH v` |
| `len(v) < W` | `col == v` | `col != v` |
Integer/long/decimal/binary truncate and all other operators are
intentionally left unchanged — they have no exact source-column equivalence.
## Why
This rewrite is already assumed by the rest of the engine.
`InclusiveMetricsEvaluator.startsWith()` returns `ROWS_MIGHT_MATCH` for
non-identity transform terms with the explicit comment *"truncate must be
rewritten in binding"*. Until now the binder never performed that rewrite for
equality, so `truncate(col) == v` kept an opaque `BoundTransform` term and
metrics/dictionary/partition pruning could not use the column. After this
change such predicates prune correctly (e.g. `equal(truncate("str",3),"xyz")`
against bounds `["abc","abe"]` now skips the file instead of reading it).
## Implementation notes
- The `< / == / > width` decision is centralized in a single
`Truncate.lengthRewrite` helper, shared by predicate binding and by
`TruncateString.project` / `projectStrict`, so the two paths cannot diverge.
- Removing the `BoundTransform` term would otherwise defeat
`ProjectionUtil.projectTransformPredicate` (which matches partition transforms
by `toString()`) and collapse strict projection to `False`. To preserve the
previous precision, `TruncateString` now projects EXACT-length (`len(v) < W`)
`EQ`/`NOT_EQ` predicates directly onto the partition value — provably the same
result the old transform-term path produced, since `truncate[W](x) == v ⟺ x ==
v` when `len(v) < W`.
- New public API is additive only (`Transforms.StringTruncateRewrite` enum +
`Transforms.stringTruncateRewrite`); `revapi` passes with no accepted-breaks
entry.
## Testing
- `TestPredicateBinding`: all three length classes for `EQ`/`NOT_EQ`,
empty-string literal, plus negatives (non-string truncate, non-truncate
transforms, other operators unchanged).
- `TestStartsWith`: runtime-equivalence check via `Evaluator`, including the
EXACT-vs-prefix distinction.
- `TestInclusiveMetricsEvaluatorWithTransforms`: a pruning case that now
prunes where it previously could not.
- Full `:iceberg-api:test` and `:iceberg-core:test`, all
transform/projection/residual regression suites, `spotlessCheck`, and
`:iceberg-api:revapi` pass.
🤖 Generated with [Claude Code](https://claude.com/claude-code)
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]