dejangvozdenac opened a new issue, #13328: URL: https://github.com/apache/iceberg/issues/13328
### Apache Iceberg version

1.9.1 (latest release)

### Query engine

Trino

### Please describe the bug 🐞

In Spark, we create a table with a nested struct field `address.street`. The outer field `address` is optional, but the inner field `street` is required. When querying with Trino using the condition `address.street is null` with projection pushdown disabled, Trino reads the entire file and returns the rows where `address` is null (and thus `address.street` is null). However, with projection pushdown enabled, Trino delegates the planning decision to Iceberg, which seems to find no eligible files to read, so no rows are returned.

I can't find anything in the docs that says what the right behavior is here (that is, does `address.street is null` mean that `address` exists and `address.street` is null, or that `address.street` is not set in that row in any way), but agreement between Iceberg and Trino is essential.

Here are the spark-sql commands that I used to create the table:

```
spark-sql> CREATE TABLE default.dejan_test (
    id INT NOT NULL,
    name STRING NOT NULL,
    age INT NOT NULL,
    address STRUCT<
        street: STRING NOT NULL,
        address_info: STRUCT<
            city: STRING NOT NULL,
            county: STRING NOT NULL,
            state: STRING NOT NULL>>)
USING iceberg;

spark-sql> INSERT INTO default.dejan_test (id, name, age, address)
VALUES ( 0, 'Jane Doe', 27, NULL );

spark-sql> INSERT INTO default.dejan_test (id, name, age, address)
VALUES (
    1, 'John Doe', 30,
    STRUCT(
        '123 Main St',
        STRUCT('San Francisco', 'San Francisco County', 'California')
    )
);
```

Here are the two different results we get from Trino:

```
trino> set session iceberg.projection_pushdown_enabled=false;
SET SESSION
trino> select id from iceberg.default.dejan_test where address.street is null;
 id
----
  0
(1 row)

Query 20250613_033713_00001_xn59q, FINISHED, 1 node
Splits: 2 total, 2 done (100.00%)
2.85 [2 rows, 4.43KiB] [0 rows/s, 1.56KiB/s]

trino> set session iceberg.projection_pushdown_enabled=true;
SET SESSION
trino> select id from iceberg.default.dejan_test where address.street is null;
 id
----
(0 rows)

Query 20250613_034027_00008_xn59q, FINISHED, 1 node
Splits: 1 total, 1 done (100.00%)
0.36 [0 rows, 0B] [0 rows/s, 0B/s]
```

Full issue and reproduction steps can be found here: https://github.com/trinodb/trino/issues/20511#issuecomment-2968932230

### Willingness to contribute

- [ ] I can contribute a fix for this bug independently
- [x] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- [ ] I cannot contribute a fix for this bug at this time
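
To make the two possible readings of `address.street is null` concrete, here is a minimal Python sketch. It is a toy model, not Iceberg's or Trino's actual code: the `Field` class, the `plan_then_scan` helper, and both nullability rules are hypothetical names introduced only for illustration. It assumes a planner that skips a file whenever it decides the referenced column can never be null, and compares a rule that looks only at the leaf field with a rule that also considers optional ancestors.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class Field:
    name: str
    required: bool


# Path to the leaf, mirroring the table in this report:
# `address` is optional, `street` is required.
ADDRESS_STREET = (Field("address", required=False), Field("street", required=True))

# The two rows inserted in the reproduction (only the relevant columns).
ROWS = [
    {"id": 0, "address": None},
    {"id": 1, "address": {"street": "123 Main St"}},
]


def resolve(row: dict, path: Sequence[Field]) -> Optional[object]:
    """Walk the path; a null ancestor makes the whole path evaluate to null."""
    value = row
    for field in path:
        if value is None:
            return None
        value = value.get(field.name)
    return value


def leaf_only_can_be_null(path: Sequence[Field]) -> bool:
    """Hypothetical rule: only the leaf's own required flag is consulted."""
    return not path[-1].required


def ancestor_aware_can_be_null(path: Sequence[Field]) -> bool:
    """Hypothetical rule: any optional field on the path can produce a null."""
    return any(not field.required for field in path)


def plan_then_scan(rows, path, can_be_null: Callable[[Sequence[Field]], bool]):
    """Toy planner: prune the file if IS NULL is judged impossible, else scan it."""
    if not can_be_null(path):
        return []  # file pruned at planning time, no rows are read
    return [row["id"] for row in rows if resolve(row, path) is None]


if __name__ == "__main__":
    print(plan_then_scan(ROWS, ADDRESS_STREET, leaf_only_can_be_null))       # []
    print(plan_then_scan(ROWS, ADDRESS_STREET, ancestor_aware_can_be_null))  # [0]
```

Under these assumptions, the leaf-only rule reproduces the empty result seen with projection pushdown enabled, while the ancestor-aware rule matches the result seen with pushdown disabled. Whether Iceberg's planner behaves this way internally is not established by this report.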