ZachDischner commented on issue #9018: URL: https://github.com/apache/iceberg/issues/9018#issuecomment-1885807415
I am also seeing this issue. I have existing Iceberg tables for which a large number of Spark SQL queries fail once I use newer libraries. The tables were created and updated using Spark on EMR over the past year. I can reproduce the failure only on recent EMR/Iceberg environments; the same queries run fine on earlier ones.

https://docs.aws.amazon.com/emr/latest/ReleaseGuide/Iceberg-release-history.html

emr-6.10.0 - Reads and Writes work
emr-6.10.1 - Reads and Writes work
...
emr-6.14.0 - Reads and Writes work
emr-6.15.0 - Some reads _don't_ work
emr-7.0.0 - Some reads _don't_ work

The cutoff appears to be Iceberg version `1.4.0`+.

I'm not sure how much this helps, since I'm working with an obfuscated example. The failing query includes many CTEs, and the error only appears with one particular CTE.

```
spark.sql("""
WITH a as (SELECT * FROM table WHERE <predicate>),
b as (SELECT * FROM table2 WHERE <predicate>)
... joins, filters, etc
z as (SELECT * FROM a union b union c join d...)
SELECT * FROM z"""
)
```

The error manifests at an intermediate CTE. Nothing about it looks immediately suspicious:

```
m AS (SELECT col1, col2, col3, col4, ... FROM l)
```

`SELECT * FROM l` succeeds, but `SELECT * FROM m` fails.

--
This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at: us...@infra.apache.org