jpugliesi opened a new issue, #7502: URL: https://github.com/apache/iceberg/issues/7502
### Apache Iceberg version
1.2.1 (latest release)
### Query engine
Spark
### Please describe the bug 🐞
Since upgrading from Spark 3.3.1 + Iceberg 1.0 to Spark 3.3.2 + Iceberg 1.1
(and 1.2), we get a new exception when running a `MERGE INTO` against a table.
It occurs with Iceberg >= 1.1:
```
java.lang.RuntimeException: Max iterations (100) reached for batch
Resolution, please set 'spark.sql.analyzer.maxIterations' to a larger value.
```
The query is essentially just updating a column on a target table:
```
# Register the source rows as a temp view.
(
    spark.table(failure_assoc_table)
    .filter(f"profile_id = '{profile_id}'")
    .join(failures, on="failure_id", how="inner")
    .select("event_id")
    .distinct()
    .createOrReplaceTempView("view_failure_events")
)
# Update a single column on the target table for the matched rows.
spark.sql(
    f"""
    MERGE INTO {events_table} AS t
    USING view_failure_events AS s
    ON t.profile_id = '{profile_id}' AND t.event_id = s.event_id
    WHEN MATCHED THEN UPDATE SET t.is_open = true
    """
)
```
We’ve tried increasing `spark.sql.analyzer.maxIterations` to a larger value
(e.g. 10000), but we get the same error.
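That outcome is what one would expect if some rule changes the plan on every pass, so the batch never reaches a fixed point. The following is only a toy model in plain Python, not Spark's actual analyzer code: `run_batch`, `add_column_once` and `add_column_always` are hypothetical names made up for illustration, and a "plan" here is just a list of column names.

```python
from typing import Callable, List

Plan = List[str]  # toy stand-in for a logical plan: just its output columns


def run_batch(plan: Plan, rules: List[Callable[[Plan], Plan]], max_iterations: int) -> Plan:
    """Apply all rules repeatedly until the plan stops changing (a fixed point)."""
    for _ in range(max_iterations):
        new_plan = plan
        for rule in rules:
            new_plan = rule(new_plan)
        if new_plan == plan:
            return plan  # converged: a full pass changed nothing
        plan = new_plan
    raise RuntimeError(f"Max iterations ({max_iterations}) reached for batch Resolution")


def add_column_once(plan: Plan) -> Plan:
    """Idempotent rule: adds the column only if it is missing."""
    return plan if "failure_id" in plan else plan + ["failure_id"]


def add_column_always(plan: Plan) -> Plan:
    """Non-idempotent rule: appends the column on every pass, so it never converges."""
    return plan + ["failure_id"]


def hits_limit(rule: Callable[[Plan], Plan], max_iterations: int) -> bool:
    """Return True if the batch gives up before reaching a fixed point."""
    try:
        run_batch(["event_id"], [rule], max_iterations)
        return False
    except RuntimeError:
        return True
```

In this model the idempotent rule converges after two passes, while the non-idempotent one hits the limit whether it is 100 or 10000, which matches what we see when raising the setting.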
Setting `spark.sql.planChangeLog.level: "warn"` shows that the analyzer
starts iteratively applying the rule
`org.apache.spark.sql.catalyst.analysis.Analyzer$AddMetadataColumns` until
`spark.sql.analyzer.maxIterations` is exceeded. To elaborate, here are portions
of the plan changelog for the `MERGE INTO` query:
```
...
23/05/02 20:07:41 WARN PlanChangeLogger:
=== Applying Rule
org.apache.spark.sql.catalyst.analysis.RewriteMergeIntoTable ===
```
The analyzer then loops on the `AddMetadataColumns` rule until it hits the
limit of `100` iterations. Notice that the column `failure_id#538` is repeated
many times in the `View` and `Project` nodes, which seems odd.
```
23/05/02 20:07:42 WARN PlanChangeLogger:
=== Applying Rule
org.apache.spark.sql.catalyst.analysis.Analyzer$AddMetadataColumns ===
MergeIntoIcebergTable ((profile_id#574 =
57e18aee-6038-4557-b42d-042739a2ccad) AND (event_id#576 = event_id#554)),
[updateaction(None, assignment(profile_id#574, profile_id#574),
assignment(asset_id#575, asset_id#575), assignment(event_id#576, event_id#576),
assignment(ano_id#577, ano_id#577), assignment(ano_date#578, ano_date#578),
assignment(ano_count#579, ano_count#579), assignment(is_open#580, true),
assignment(r2_group_label#581, r2_group_label#581))]
MergeIntoIcebergTable
((profile_id#574 = 57e18aee-6038-4557-b42d-042739a2ccad) AND (event_id#576 =
event_id#554)), [updateaction(None, assignment(profile_id#574, profile_id#574),
assignment(asset_id#575, asset_id#575), assignment(event_id#576, event_id#576),
assignment(ano_id#577, ano_id#577), assignment(ano_date#578, ano_date#578),
assignment(ano_count#579, ano_count#579), assignment(is_open#580, true),
assignment(r2_group_label#581, r2_group_label#581))]
:- SubqueryAlias t
:-
SubqueryAlias t
: +- SubqueryAlias iceberg.demo.core__fct_ano_events
: +- SubqueryAlias
iceberg.demo.core__fct_ano_events
: +- RelationV2[profile_id#574, asset_id#575, event_id#576, ano_id#577,
ano_date#578, ano_count#579, is_open#580, r2_group_label#581]
iceberg.demo.core__fct_ano_events
: +-
RelationV2[profile_id#574, asset_id#575, event_id#576, ano_id#577,
ano_date#578, ano_count#579, is_open#580, r2_group_label#581]
iceberg.demo.core__fct_ano_events
:- SubqueryAlias s
:-
SubqueryAlias s
: +- SubqueryAlias view_failure_events
: +-
SubqueryAlias view_failure_events
: +- View (`view_failure_events`, [event_id#554])
: +- View
(`view_failure_events`, [event_id#554])
: +- Deduplicate [event_id#554]
: +-
Deduplicate [event_id#554]
: +- Project [event_id#554]
: +-
Project [event_id#554]
: +- Project [failure_id#555, profile_id#553, event_id#554,
weight#556, time_to_fail#557]
:
+- Project [failure_id#555, profile_id#553, event_id#554, weight#556,
time_to_fail#557]
: +- Join Inner, (failure_id#555 = failure_id#538)
:
+- Join Inner, (failure_id#555 = failure_id#538)
: :- Filter (profile_id#553 =
57e18aee-6038-4557-b42d-042739a2ccad)
:
:- Filter (profile_id#553 = 57e18aee-6038-4557-b42d-042739a2ccad)
: : +- SubqueryAlias
iceberg.demo.__bdg_ano_event_failure
: : +-
SubqueryAlias iceberg.demo.__bdg_ano_event_failure
: : +- RelationV2[profile_id#553, event_id#554,
failure_id#555, weight#556, time_to_fail#557]
iceberg.demo.__bdg_ano_event_failure
: :
+- RelationV2[profile_id#553, event_id#554, failure_id#555, weight#556,
time_to_fail#557] iceberg.demo.__bdg_ano_event_failure
: +- Project [failure_id#538]
:
+- Project [failure_id#538]
: +- Filter ((profile_id#536 =
57e18aee-6038-4557-b42d-042739a2ccad) AND (failure_date#539 >= cast(2023-02-06
as date)))
:
+- Filter ((profile_id#536 = 57e18aee-6038-4557-b42d-042739a2ccad) AND
(failure_date#539 >= cast(2023-02-06 as date)))
: +- SubqueryAlias iceberg.demo.core__fct_failures
:
+- SubqueryAlias iceberg.demo.core__fct_failures
: +- RelationV2[profile_id#536, asset_id#537,
failure_id#538, failure_date#539, failure_label#540, failure_desc#541,
failure_type#542, cost#543, parts_cost#544, labor_cost#545, currency_code#546]
iceberg.demo.core__fct_failures
:
+- RelationV2[profile_id#536, asset_id#537, failure_id#538,
failure_date#539, failure_label#540, failure_desc#541, failure_type#542,
cost#543, parts_cost#544, labor_cost#545, currency_code#546]
iceberg.demo.core__fct_failures
!+- ReplaceIcebergData
+-
ReplaceIcebergData RelationV2[profile_id#574, asset_id#575, event_id#576,
ano_id#577, ano_date#578, ano_count#579, is_open#580, r2_group_label#581]
iceberg.demo.core__fct_ano_events
+- MergeRows[profile_id#590, asset_id#591, event_id#592, ano_id#593,
ano_date#594, ano_count#595, is_open#596, r2_group_label#597, _file#598]
+-
MergeRows[profile_id#590, asset_id#591, event_id#592, ano_id#593, ano_date#594,
ano_count#595, is_open#596, r2_group_label#597, _file#598]
+- Join LeftOuter, ((profile_id#574 =
57e18aee-6038-4557-b42d-042739a2ccad) AND (event_id#576 = event_id#554)),
leftHint=(strategy=no_broadcast_hash)
+- Join
LeftOuter, ((profile_id#574 = 57e18aee-6038-4557-b42d-042739a2ccad) AND
(event_id#576 = event_id#554)), leftHint=(strategy=no_broadcast_hash)
:- NoStatsUnaryNode
:-
NoStatsUnaryNode
: +- Project [profile_id#574, asset_id#575, event_id#576,
ano_id#577, ano_date#578, ano_count#579, is_open#580, r2_group_label#581,
_file#584, true AS __row_from_target#587, monotonically_increasing_id() AS
__row_id#588L]
: +-
Project [profile_id#574, asset_id#575, event_id#576, ano_id#577, ano_date#578,
ano_count#579, is_open#580, r2_group_label#581, _file#584, true AS
__row_from_target#587, monotonically_increasing_id() AS __row_id#588L]
: +- RelationV2[profile_id#574, asset_id#575, event_id#576,
ano_id#577, ano_date#578, ano_count#579, is_open#580, r2_group_label#581,
_file#584] iceberg.demo.core__fct_ano_events
: +-
RelationV2[profile_id#574, asset_id#575, event_id#576, ano_id#577,
ano_date#578, ano_count#579, is_open#580, r2_group_label#581, _file#584]
iceberg.demo.core__fct_ano_events
+- Project [event_id#554, true AS __row_from_source#589]
+-
Project [event_id#554, true AS __row_from_source#589]
+- SubqueryAlias s
+-
SubqueryAlias s
+- SubqueryAlias view_failure_events
+- SubqueryAlias view_failure_events
! +- View (`view_failure_events`,
[event_id#554,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,
failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538])
+- View (`view_failure_events`,
[event_id#554,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538
,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538,failure_id#538...,failure_id#538])
+- Deduplicate [event_id#554]
+- Deduplicate [event_id#554]
! +- Project [event_id#554, failure_id#538,
failure_id#538, failure_id#538, failure_id#538, failure_id#538, failure_id#538,
failure_id#538, failure_id#538, failure_id#538, failure_id#538, failure_id#538,
failure_id#538, failure_id#538, failure_id#538, failure_id#538, failure_id#538,
failure_id#538, failure_id#538, failure_id#538, failure_id#538, failure_id#538,
failure_id#538, failure_id#538, ... 75 more fields]
+- Project [event_id#554, failure_id#538, failure_id#538,
failure_id#538, failure_id#538, failure_id#538, failure_id#538, failure_id#538,
failure_id#538, failure_id#538, failure_id#538, failure_id#538, failure_id#538,
failure_id#538, failure_id#538, failure_id#538, failure_id#538, failure_id#538,
failure_id#538, failure_id#538, failure_id#538, failure_id#538, failure_id#538,
failure_id#538, ... 76 more fields]
! +- Project [failure_id#555, profile_id#553,
event_id#554, weight#556, time_to_fail#557, failure_id#538, failure_id#538,
failure_id#538, failure_id#538, failure_id#538, failure_id#538, failure_id#538,
failure_id#538, failure_id#538, failure_id#538, failure_id#538, failure_id#538,
failure_id#538, failure_id#538, failure_id#538, failure_id#538, failure_id#538,
failure_id#538, failure_id#538, ... 79 more fields]
+- Project [failure_id#555, profile_id#553, event_id#554,
weight#556, time_to_fail#557, failure_id#538, failure_id#538, failure_id#538,
failure_id#538, failure_id#538, failure_id#538, failure_id#538, failure_id#538,
failure_id#538, failure_id#538, failure_id#538, failure_id#538, failure_id#538,
failure_id#538, failure_id#538, failure_id#538, failure_id#538, failure_id#538,
failure_id#538, ... 80 more fields]
+- Join Inner, (failure_id#555 =
failure_id#538)
+- Join Inner, (failure_id#555 = failure_id#538)
:- Filter (profile_id#553 =
57e18aee-6038-4557-b42d-042739a2ccad)
:- Filter (profile_id#553 =
57e18aee-6038-4557-b42d-042739a2ccad)
: +- SubqueryAlias
iceberg.demo.__bdg_ano_event_failure
: +- SubqueryAlias iceberg.demo.__bdg_ano_event_failure
: +- RelationV2[profile_id#553,
event_id#554, failure_id#555, weight#556, time_to_fail#557, _spec_id#563,
_partition#564, _file#565, _pos#566L, _deleted#567]
iceberg.demo.__bdg_ano_event_failure
: +- RelationV2[profile_id#553, event_id#554, failure_id#555,
weight#556, time_to_fail#557, _spec_id#563, _partition#564, _file#565,
_pos#566L, _deleted#567] iceberg.demo.__bdg_ano_event_failure
+- Project [failure_id#538, _spec_id#547,
_partition#548, _file#549, _pos#550L, _deleted#551]
+- Project [failure_id#538, _spec_id#547, _partition#548,
_file#549, _pos#550L, _deleted#551]
...
...
```
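To confirm which rule dominates without reading the whole changelog, the rule headers can be counted. This is a minimal sketch using only the Python standard library; `count_rule_applications` is a hypothetical helper, and `sample` is a shortened, hand-made excerpt in the shape of the log above (plan bodies omitted), not the full driver log.

```python
import re
from collections import Counter

# Matches the "=== Applying Rule <name> ===" header. \s+ also spans a line
# break, because the rule name may be wrapped onto the following line.
RULE_HEADER = re.compile(r"=== Applying Rule\s+(\S+)\s+===")


def count_rule_applications(log_text: str) -> Counter:
    """Count how many times each rule is logged as applied."""
    return Counter(RULE_HEADER.findall(log_text))


# Shortened sample in the same shape as the changelog; plans are omitted.
sample = """\
23/05/02 20:07:41 WARN PlanChangeLogger:
=== Applying Rule
org.apache.spark.sql.catalyst.analysis.RewriteMergeIntoTable ===
23/05/02 20:07:42 WARN PlanChangeLogger:
=== Applying Rule
org.apache.spark.sql.catalyst.analysis.Analyzer$AddMetadataColumns ===
23/05/02 20:07:42 WARN PlanChangeLogger:
=== Applying Rule org.apache.spark.sql.catalyst.analysis.Analyzer$AddMetadataColumns ===
"""

counts = count_rule_applications(sample)
for rule, n in counts.most_common():
    print(n, rule)  # most frequently applied rule first
```

Run against the real driver log (read from wherever it is stored), a rule whose count is close to `spark.sql.analyzer.maxIterations` is the likely culprit.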
[A recent Spark PR mentions a somewhat similar issue, but it's not clear to
me whether this analyzer exception is rooted in Iceberg or Spark's
behavior](https://github.com/apache/spark/pull/40321/files#r1131931758)
