nastra commented on code in PR #9510:
URL: https://github.com/apache/iceberg/pull/9510#discussion_r1469684751
########## spark/v3.5/spark-extensions/src/main/scala/org/apache/spark/sql/catalyst/analysis/CheckViews.scala: ##########
@@ -36,6 +38,9 @@ object CheckViews extends (LogicalPlan => Unit) {
           verifyColumnCount(ident, columnAliases, query)
           SchemaUtils.checkColumnNameDuplication(query.schema.fieldNames, SQLConf.get.resolver)
+        case AlterViewAs(ResolvedV2View(_, _, _), _, query) =>
+          SchemaUtils.checkColumnNameDuplication(query.schema.fieldNames, SQLConf.get.resolver)

Review Comment:
I agree that this is a weird case, and I also had a different expectation when I first ran into it. That's why I added the `alterViewWithUpdatedQueryColumns()` test to make this behavior explicit.

I also looked into how V1 views behave here, and they do the same thing (i.e., the column aliases and comments from the original CREATE are lost):

```
spark-sql (default)> create view iceberg1.v10 (x COMMENT 'comment 1', y COMMENT 'comment 2') AS SELECT count(id), zip from iceberg1.foo group by zip;
spark-sql (default)> show create table iceberg1.v10;
CREATE VIEW iceberg1.v10 (
  x COMMENT 'comment 1',
  y COMMENT 'comment 2')
TBLPROPERTIES (
  'transient_lastDdlTime' = '1706538575')
AS SELECT count(id) AS cnt, zip from iceberg1.foo group by zip
Time taken: 0.055 seconds, Fetched 1 row(s)
spark-sql (default)> describe extended iceberg1.v10;
x                           bigint      comment 1
y                           int         comment 2

# Detailed Table Information
Catalog                     spark_catalog
Database                    iceberg1
Table                       v10
Owner                       nastra
Created Time                Mon Jan 29 15:29:35 CET 2024
Last Access                 UNKNOWN
Created By                  Spark 3.5.0
Type                        VIEW
View Text                   SELECT count(id) AS cnt, zip from iceberg1.foo group by zip
View Original Text          SELECT count(id) AS cnt, zip from iceberg1.foo group by zip
View Catalog and Namespace  spark_catalog.default
View Query Output Columns   [cnt, zip]
Table Properties            [transient_lastDdlTime=1706538575]
Serde Library               org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat                 org.apache.hadoop.mapred.SequenceFileInputFormat
OutputFormat                org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
Storage Properties          [serialization.format=1]
Time taken: 0.063 seconds, Fetched 21 row(s)
```

After `ALTER VIEW ... AS`, the columns are taken from the new query (`cnt`, `zip`) and no comments remain:

```
spark-sql (default)> alter view iceberg1.v10 AS SELECT count(zip) AS cnt, zip from iceberg1.foo group by zip;
Time taken: 0.213 seconds
spark-sql (default)> show create table iceberg1.v10;
CREATE VIEW iceberg1.v10 (
  cnt,
  zip)
TBLPROPERTIES (
  'transient_lastDdlTime' = '1706538634')
AS SELECT count(zip) AS cnt, zip from iceberg1.foo group by zip
Time taken: 0.029 seconds, Fetched 1 row(s)
spark-sql (default)> describe extended iceberg1.v10;
cnt                         bigint
zip                         int

# Detailed Table Information
Catalog                     spark_catalog
Database                    iceberg1
Table                       v10
Owner                       nastra
Created Time                Mon Jan 29 15:29:35 CET 2024
Last Access                 UNKNOWN
Created By                  Spark 3.5.0
Type                        VIEW
View Text                   SELECT count(zip) AS cnt, zip from iceberg1.foo group by zip
View Original Text          SELECT count(zip) AS cnt, zip from iceberg1.foo group by zip
View Catalog and Namespace  spark_catalog.default
View Query Output Columns   [cnt, zip]
Table Properties            [transient_lastDdlTime=1706538634]
Serde Library               org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat                 org.apache.hadoop.mapred.SequenceFileInputFormat
OutputFormat                org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat
Storage Properties          [serialization.format=1]
Time taken: 0.06 seconds, Fetched 21 row(s)
```
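To make the semantics in the spark-sql sessions concrete, here is a minimal Scala sketch under a simplified, hypothetical view model: `ViewColumn`, `ViewMetadata`, `ViewSketch`, `createView`, `alterViewAs` and `checkNoDuplicateColumns` are illustrative names, not Iceberg or Spark APIs. It mirrors the observed behavior: `CREATE VIEW` with a column list applies the aliases and comments to the query output, while `ALTER VIEW ... AS` has no column list, so the columns are rebuilt from the new query and the earlier aliases and comments are dropped. The duplicate check only approximates what `SchemaUtils.checkColumnNameDuplication` guards against under Spark's default case-insensitive resolution; it is not Spark's implementation.

```scala
import java.util.Locale

// Hypothetical, simplified view model (not the Iceberg/Spark classes).
final case class ViewColumn(name: String, dataType: String, comment: Option[String] = None)
final case class ViewMetadata(sql: String, columns: Seq[ViewColumn])

object ViewSketch {
  // Simplified stand-in for a duplicate-name check with a case-insensitive
  // resolver (Spark's default, spark.sql.caseSensitive=false).
  def checkNoDuplicateColumns(names: Seq[String]): Unit = {
    val duplicates = names
      .groupBy(_.toLowerCase(Locale.ROOT))
      .collect { case (name, occurrences) if occurrences.size > 1 => name }
    require(duplicates.isEmpty, s"Found duplicate column(s): ${duplicates.mkString(", ")}")
  }

  // CREATE VIEW v (alias COMMENT '...', ...) AS <query>:
  // aliases and comments from the column list are applied positionally to the query output.
  def createView(
      sql: String,
      queryOutput: Seq[(String, String)],
      columnList: Seq[(String, Option[String])]): ViewMetadata = {
    require(columnList.isEmpty || columnList.size == queryOutput.size, "column count mismatch")
    val columns =
      if (columnList.isEmpty) queryOutput.map { case (name, tpe) => ViewColumn(name, tpe) }
      else queryOutput.zip(columnList).map { case ((_, tpe), (alias, comment)) => ViewColumn(alias, tpe, comment) }
    checkNoDuplicateColumns(columns.map(_.name))
    ViewMetadata(sql, columns)
  }

  // ALTER VIEW v AS <query>: there is no column list, so the schema is rebuilt
  // from the new query output and the previous aliases/comments are not carried over.
  def alterViewAs(view: ViewMetadata, newSql: String, newQueryOutput: Seq[(String, String)]): ViewMetadata = {
    checkNoDuplicateColumns(newQueryOutput.map(_._1))
    view.copy(sql = newSql, columns = newQueryOutput.map { case (name, tpe) => ViewColumn(name, tpe) })
  }
}

import ViewSketch._

val v1 = createView(
  "SELECT count(id) AS cnt, zip FROM iceberg1.foo GROUP BY zip",
  Seq("cnt" -> "bigint", "zip" -> "int"),
  Seq("x" -> Some("comment 1"), "y" -> Some("comment 2")))
// v1.columns: x (comment 1), y (comment 2)

val v2 = alterViewAs(
  v1,
  "SELECT count(zip) AS cnt, zip FROM iceberg1.foo GROUP BY zip",
  Seq("cnt" -> "bigint", "zip" -> "int"))
// v2.columns: cnt, zip, with no comments
```

`alterViewAs` never reads the previous `columns`; that is how the sketch encodes the behavior seen in the V1 sessions and pinned down by the `alterViewWithUpdatedQueryColumns()` test.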