maximethebault opened a new issue, #6224: URL: https://github.com/apache/iceberg/issues/6224
### Apache Iceberg version

1.0.0 (latest release)

### Query engine

Spark

### Please describe the bug 🐞

After upgrading to Iceberg 1.0.0 & Spark 3.3.1 (from 0.13.x & 3.2.x), some of our SQL queries stopped working. We suspect it may be an Iceberg-related issue, as we couldn't reproduce it with Hive tables.

### Stripped-down reproducer

Set up tables & views:

```
val table1 = Seq(("204")).toDF("id")
table1.createOrReplaceTempView("table1")

val table2_1 = Seq(("204")).toDF("id")
table2_1.writeTo("dev.table2_1").using("iceberg").createOrReplace()

val table2_2 = Seq(("204")).toDF("id")
table2_2.createOrReplaceTempView("table2_2")

val table2 = spark.table("dev.table2_1").union(spark.table("table2_2"))
table2.createOrReplaceTempView("table2")
```

Run query:

```
SELECT u.*
FROM table1
LEFT JOIN (
  SELECT id
  FROM table1
  LEFT JOIN table2 USING(id)
) u USING(id)
```

Results in an exception:

```
java.lang.IllegalArgumentException: requirement failed
  at scala.Predef$.require(Predef.scala:268)
  at org.apache.spark.sql.catalyst.plans.logical.View.<init>(basicLogicalOperators.scala:569)
  at org.apache.spark.sql.catalyst.plans.logical.View.copy(basicLogicalOperators.scala:568)
  at org.apache.spark.sql.catalyst.plans.logical.View.withNewChildInternal(basicLogicalOperators.scala:604)
  at org.apache.spark.sql.catalyst.plans.logical.View.withNewChildInternal(basicLogicalOperators.scala:565)
  at org.apache.spark.sql.catalyst.trees.UnaryLike.withNewChildrenInternal(TreeNode.scala:1242)
  at org.apache.spark.sql.catalyst.trees.UnaryLike.withNewChildrenInternal$(TreeNode.scala:1240)
  at org.apache.spark.sql.catalyst.plans.logical.View.withNewChildrenInternal(basicLogicalOperators.scala:565)
  at org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$withNewChildren$2(TreeNode.scala:462)
  at org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:176)
  at org.apache.spark.sql.catalyst.trees.TreeNode.withNewChildren(TreeNode.scala:461)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$AddMetadataColumns$.org$apache$spark$sql$catalyst$analysis$Analyzer$AddMetadataColumns$$addMetadataCol(Analyzer.scala:975)
  at org.apache.spark.sql.catalyst.analysis.Analyzer$AddMetadataColumns$.$anonfun$addMetadataCol$1(Analyzer.scala:975)
```

### Further investigation

If I replace the `USING` clauses with plain `ON` clauses, the exception is not thrown.

I think this issue is caused by mixing Iceberg and non-Iceberg tables in the UNION. If I inline table2 in the query, I get a different exception:

```
SELECT u.*
FROM table1
LEFT JOIN (
  SELECT id
  FROM table1
  LEFT JOIN ((SELECT id id FROM dev.table2_1 limit 1) UNION (SELECT id FROM table2_2)) USING(id)
) u USING(id)
```

results in:

```
org.apache.spark.sql.AnalysisException: Union can only be performed on tables with the same number of columns, but the first table has 6 columns and the second table has 1 columns;
'Project [id#1302]
+- 'Project [id#1302, id#1302]
   +- 'Project [id#1302, id#998]
      +- 'Join LeftOuter, (id#998 = id#1302)
         :- SubqueryAlias table1
         :  +- View (`table1`, [id#998])
         :     +- Project [value#995 AS id#998]
         :        +- LocalRelation [value#995]
         +- 'SubqueryAlias u
            +- 'Project [id#1294, id#1302]
               +- 'Project [id#1294, id#1302]
                  +- 'Join LeftOuter, (id#1302 = id#1294)
                     :- SubqueryAlias table1
                     :  +- View (`table1`, [id#1302])
                     :     +- Project [value#1296 AS id#1302]
                     :        +- LocalRelation [value#1296]
                     +- 'SubqueryAlias __auto_generated_subquery_name
                        +- 'Distinct
                           +- 'Union false, false
                              :- GlobalLimit 1
                              :  +- LocalLimit 1
                              :     +- Project [_spec_id#1297, _partition#1298, _file#1299, _pos#1300L, _deleted#1301, id#1295 AS id#1294]
                              :        +- SubqueryAlias spark_catalog.dev.table2_1
                              :           +- RelationV2[id#1295, _spec_id#1297, _partition#1298, _file#1299, _pos#1300L, _deleted#1301] spark_catalog.dev.table2_1
                              +- Project [id#1011]
                                 +- SubqueryAlias table2_2
                                    +- View (`table2_2`, [id#1011])
                                       +- Project [value#1008 AS id#1011]
                                          +- LocalRelation [value#1008]
```

It looks like some Iceberg metadata columns (`_spec_id`, `_partition`, `_file`, `_pos`, `_deleted`) are visible to Spark during query analysis, and I'm not sure they are supposed to be.
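
For reference, the `ON`-based variant mentioned under "Further investigation" is not spelled out in the report. A rough sketch of what such a rewrite of the reproducer could look like is below. The explicit `table1.id` qualification in the inner `SELECT` is an assumption: unlike `USING`, an `ON` join leaves both `id` columns visible, so an unqualified `id` would be ambiguous. The sketch reuses the `table1` and `table2` views from the set-up step and has not been checked against the exact query that was originally run.

```sql
-- Same shape as the failing query, with each USING(id) replaced by an explicit ON condition.
SELECT u.*
FROM table1
LEFT JOIN (
  -- table1.id is qualified because ON keeps both join columns in scope (assumption, see above)
  SELECT table1.id
  FROM table1
  LEFT JOIN table2 ON table1.id = table2.id
) u ON table1.id = u.id
```

Since `AddMetadataColumns` appears in the stack trace of the `USING` version, comparing the analyzed plans of the two variants (for example with `EXPLAIN EXTENDED`) may help narrow down where the metadata columns get pulled in.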