anandnalya commented on issue #9960:
URL: https://github.com/apache/iceberg/issues/9960#issuecomment-4657859861
This still reproduces on the latest Iceberg release, and I can isolate it to
the `write.spark.accept-any-schema` table property (confirming @voducdan's and
@siddiquebagwan's observations above).
**Environment**
- Spark 3.5.7
- Hadoop 3.3.6
- Iceberg `iceberg-spark-runtime-3.5_2.13:1.10.0`
- Catalog: `org.apache.iceberg.spark.SparkSessionCatalog`, `type=hive`
- `spark.sql.extensions=org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions` (confirmed set)
- No `spark.sql.optimizer.excludedRules` / `analyzer.excludedRules`
**Minimal repro — the property is the only variable**
Fails:
```sql
CREATE TABLE scratch.t_accept (id BIGINT, val STRING) USING iceberg
TBLPROPERTIES ('format-version'='2',
'write.spark.accept-any-schema'='true');
EXPLAIN UPDATE scratch.t_accept SET val = NULL WHERE id = 1;
-- Error occurred during query planning:
-- UPDATE TABLE is not supported temporarily.
```
Works (identical table, property absent):
```sql
CREATE TABLE scratch.t_plain (id BIGINT, val STRING) USING iceberg
TBLPROPERTIES ('format-version'='2');
EXPLAIN UPDATE scratch.t_plain SET val = NULL WHERE id = 1;
-- == Physical Plan ==
-- ReplaceData IcebergWrite(table=spark_catalog.scratch.t_plain, format=PARQUET)
```
**Only UPDATE is affected — the copy-on-write rewrite path itself is fine.**
A DELETE that forces the same copy-on-write rewrite (`ReplaceData`) on the
*same* `accept-any-schema=true` table succeeds:
```sql
EXPLAIN DELETE FROM scratch.t_accept WHERE id IN (SELECT id FROM
scratch.t_plain);
-- == Physical Plan ==
-- ReplaceData IcebergWrite(table=spark_catalog.scratch.t_accept, format=PARQUET) ✅
```
This lines up with @nastra's note that
[SPARK-43324](https://github.com/apache/spark/pull/41028) moved UPDATE handling
out of the Iceberg extensions into Spark 3.5's native `RewriteUpdateTable`
rule. That rule appears not to handle tables advertising the
`ACCEPT_ANY_SCHEMA` capability (set via `write.spark.accept-any-schema=true`),
so the `UpdateTable` node is never rewritten and falls through to the V1
`BasicOperators` path (`SparkStrategies.scala`), which throws
`ddlUnsupportedTemporarilyError("UPDATE TABLE")`. The DELETE copy-on-write
rewrite above doesn't have to align `SET` assignments against the table schema,
which is consistent with only UPDATE breaking.
**Workaround:** clear the property, run the UPDATE, then restore it:
```sql
ALTER TABLE <t> UNSET TBLPROPERTIES ('write.spark.accept-any-schema');
UPDATE <t> SET ... WHERE ...;
ALTER TABLE <t> SET TBLPROPERTIES ('write.spark.accept-any-schema'='true');
```
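If the UPDATE in the middle fails, the table is left without the property. A minimal sketch of the same three statements with a guaranteed restore is below. The helper name `accept_any_schema_disabled` is hypothetical, and `spark` is assumed to be a `SparkSession` (anything with a `.sql(str)` method works). It is also an assumption that no concurrent writer depends on the property during the window in which it is unset.

```python
from contextlib import contextmanager

PROP = "write.spark.accept-any-schema"


@contextmanager
def accept_any_schema_disabled(spark, table, restore_value="true"):
    """Hypothetical helper: unset PROP on `table`, then always restore it.

    `spark` is assumed to expose `.sql(str)` (e.g. a SparkSession).
    `table` is interpolated as-is, so pass only trusted identifiers.
    """
    spark.sql(f"ALTER TABLE {table} UNSET TBLPROPERTIES ('{PROP}')")
    try:
        # The caller runs the UPDATE while the property is absent.
        yield
    finally:
        # Restore even if the UPDATE raised.
        spark.sql(
            f"ALTER TABLE {table} SET TBLPROPERTIES ('{PROP}'='{restore_value}')"
        )


# Usage (assumes an active session bound to `spark`):
#
#   with accept_any_schema_disabled(spark, "scratch.t_accept"):
#       spark.sql("UPDATE scratch.t_accept SET val = NULL WHERE id = 1")
```

This only automates the workaround; it does not change how Spark plans the UPDATE.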
Reproducible on Iceberg 1.10.0 / Spark 3.5.7. Since the root cause now looks
like it lives in Spark's `RewriteUpdateTable` (post SPARK-43324) rather than
in the Iceberg extensions, could this issue be reopened? Or is there an
existing Spark JIRA tracking the `ACCEPT_ANY_SCHEMA` + row-level UPDATE
interaction that this report should be redirected to?
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]