Kimura-S2001 commented on issue #15103:
URL: https://github.com/apache/iceberg/issues/15103#issuecomment-3857991471
Thanks for the detailed question.
I tried to reproduce this behavior locally and was able to confirm it with a
minimal, reproducible example.
---
## Environment
- Spark: 4.0.1
- Iceberg: 1.10.0
- Catalog: Iceberg REST Catalog
- Storage: S3-compatible storage (MinIO)
---
## Table definition
```sql
CREATE TABLE default.ts_15103 (
serverTime TIMESTAMP,
id BIGINT
)
USING iceberg
PARTITIONED BY (days(serverTime));
```
---
## Test data
```sql
INSERT INTO default.ts_15103 VALUES
('2026-01-15 00:00:01', 1),
('2026-01-15 12:00:00', 2),
('2026-01-15 23:59:59', 3);
```
---
## File-level statistics (Iceberg metadata)
```sql
SELECT
file_path,
record_count,
lower_bounds,
upper_bounds
FROM default.ts_15103.files;
```
This shows that, for the Parquet file in partition
`serverTime_day=2026-01-15`,
the `lower_bounds` value for `serverTime` is strictly greater than
`2026-01-15 00:00:00`.
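
When queried through Spark, `lower_bounds` and `upper_bounds` come back as maps from field ID to raw bytes, so the comparison above is easier to check after decoding. The sketch below assumes Iceberg's single-value serialization for timestamps (an 8-byte little-endian count of microseconds since the Unix epoch) and UTC-normalized values. `decode_timestamp_bound` and the sample bytes are hypothetical, built here only so the example is self-contained.

```python
import struct
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def decode_timestamp_bound(raw: bytes) -> datetime:
    """Decode a timestamp bound stored as 8-byte little-endian microseconds since epoch."""
    (micros,) = struct.unpack("<q", raw)
    return EPOCH + timedelta(microseconds=micros)


# Hypothetical raw bound: encode the earliest inserted row instead of
# copying bytes from a real table.
first_row = datetime(2026, 1, 15, 0, 0, 1, tzinfo=timezone.utc)
raw_lower = struct.pack("<q", (first_row - EPOCH) // timedelta(microseconds=1))

decoded_lower = decode_timestamp_bound(raw_lower)
print(decoded_lower.isoformat())
```

If the session time zone is not UTC, the decoded value would need to be shifted before comparing it with the literal used in the queries.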
---
## Query with `>=` (partition pruning works)
```sql
SELECT count(*)
FROM default.ts_15103
WHERE serverTime >= TIMESTAMP '2026-01-15 00:00:00';
```
- Result: `3`
- Physical plan uses metadata-based pruning
- No Parquet file scan is required
---
## Query with `>` (unexpected full scan)
```sql
SELECT count(*)
FROM default.ts_15103
WHERE serverTime > TIMESTAMP '2026-01-15 00:00:00';
```
- Result: `3`
- Physical plan shows a `BatchScan`
- Parquet files are scanned even though the
file-level `lower_bounds` show that every row satisfies the predicate
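
One possible reading of the difference, stated as an assumption and not verified against the Iceberg source: the partition value alone only says that every row lies in the half-open range `[2026-01-15 00:00:00, 2026-01-16 00:00:00)`. That range is enough to prove `>=` for every row, but not `>`, because a row at exactly midnight would be allowed by the partition and rejected by the predicate. The helper name `day_range_proves_all_match` is hypothetical.

```python
from datetime import datetime


def day_range_proves_all_match(day_start: datetime, op: str, literal: datetime) -> bool:
    """True if every timestamp in [day_start, day_start + 1 day) satisfies `ts <op> literal`."""
    # The smallest value the partition may contain is day_start itself,
    # so the proof only needs to hold for that value.
    if op == ">=":
        return day_start >= literal
    if op == ">":
        return day_start > literal
    raise ValueError(f"unsupported operator: {op}")


day_start = datetime(2026, 1, 15)          # partition serverTime_day=2026-01-15
literal = datetime(2026, 1, 15, 0, 0, 0)   # literal used in both queries

ge_proven = day_range_proves_all_match(day_start, ">=", literal)
gt_proven = day_range_proves_all_match(day_start, ">", literal)
print(ge_proven, gt_proven)
```

Under this reading, the `>` case can only be settled by file-level statistics, not by the partition value.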
---
## Observation
Although the file-level Iceberg statistics indicate that:
```
lower_bounds(serverTime) > '2026-01-15 00:00:00'
```
Spark + Iceberg applies metadata pruning for `>=`
but not for `>`.
Logically, every row in the file satisfies both predicates,
so in both cases the file could be counted in full without scanning its contents.
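
The reasoning above can be sketched as a check on the file's lower bound. This illustrates the argument only and is not Iceberg's actual evaluator. `bounds_prove_all_match` and its parameters are hypothetical names, and the null-count guard is an added assumption (a null never satisfies a comparison).

```python
from datetime import datetime


def bounds_prove_all_match(lower_bound: datetime, op: str, literal: datetime,
                           null_count: int = 0) -> bool:
    """True if the file's lower bound proves that every row satisfies `ts <op> literal`."""
    # Any null row would fail the comparison, so the proof cannot hold.
    if null_count > 0:
        return False
    if op == ">=":
        return lower_bound >= literal
    if op == ">":
        return lower_bound > literal
    raise ValueError(f"unsupported operator: {op}")


lower = datetime(2026, 1, 15, 0, 0, 1)     # earliest row in the test data
literal = datetime(2026, 1, 15, 0, 0, 0)   # literal used in both queries

ge_all = bounds_prove_all_match(lower, ">=", literal)
gt_all = bounds_prove_all_match(lower, ">", literal)
print(ge_all, gt_all)
```

With the test data in this comment, both checks succeed. A file whose lower bound were exactly `2026-01-15 00:00:00` would pass only the `>=` check.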
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]