nastra commented on code in PR #9050: URL: https://github.com/apache/iceberg/pull/9050#discussion_r1394354418
##########
spark/v3.4/spark/src/main/java/org/apache/iceberg/spark/source/SparkWrite.java:
##########
@@ -390,7 +390,7 @@ private Expression conflictDetectionFilter() {
       filter = Expressions.and(filter, expr);
     }
-    return filter;
+    return SERIALIZABLE == isolationLevel ? Expressions.alwaysTrue() : filter;

Review Comment:
   > The same rule applies in Spark 3.5 but the planning works differently. In Spark 3.4, we push the join condition that was already replaced. Therefore, `ref(name="id") == 1005` and `ref(name="id") == 1006` become the conflict detection conditions. According to those conditions, the operations are serializable. Therefore, both are committed.

   I actually think that using `ref(name="id") == 1005` as the conflict detection condition in one statement and `ref(name="id") == 1006` in the other isn't enough with the way we detect conflicts, because it can't prevent **write skew**.

   **Write skew** occurs when a transaction reads some data, examines the results, and takes some action based on the results it saw, so there is a causal dependency between the transaction's queries and its writes. To avoid it, serializable isolation needs to detect the case where a TX may have acted on an outdated premise (i.e. the data it read may have been updated in the meantime by another TX).

   What I saw with Spark 3.5 is that the conflict detection filter pushed down to Iceberg was empty, so we ended up using `Expressions.alwaysTrue()` and correctly detected that there's a conflict. Using `Expressions.alwaysTrue()` isn't ideal for conflict detection, but it's better than violating serializable isolation.
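
   To make the write-skew argument above concrete, here is a toy model in plain Java. It is not Iceberg's validation code: `WriteSkewSketch`, `Table`, `tryCommit`, and `committedCount` are hypothetical names, and the model assumes that a commit is rejected only when a row committed since the transaction's snapshot matches that transaction's conflict detection filter. Under that assumption, two writers whose filters are `id == 1005` and `id == 1006` never see each other, whereas an always-true filter rejects the second commit.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongPredicate;

/** Toy model of optimistic conflict detection. Hypothetical; not Iceberg's implementation. */
final class WriteSkewSketch {

  /** Hypothetical table that tracks row ids committed since both transactions took their snapshot. */
  static final class Table {
    private final List<Long> committedSinceSnapshot = new ArrayList<>();

    /** Commits the written ids unless a concurrently committed row matches the conflict filter. */
    boolean tryCommit(LongPredicate conflictFilter, List<Long> writtenIds) {
      for (long id : committedSinceSnapshot) {
        if (conflictFilter.test(id)) {
          return false; // data this transaction may have read has changed underneath it
        }
      }
      committedSinceSnapshot.addAll(writtenIds);
      return true;
    }
  }

  /** Runs two transactions that started from the same snapshot; returns how many committed. */
  static int committedCount(LongPredicate filterA, LongPredicate filterB) {
    Table table = new Table();
    int committed = 0;
    if (table.tryCommit(filterA, List.of(1005L))) {
      committed++;
    }
    if (table.tryCommit(filterB, List.of(1006L))) {
      committed++;
    }
    return committed;
  }

  public static void main(String[] args) {
    // Narrow filters: each transaction only watches the id it writes, so both commit.
    System.out.println(committedCount(id -> id == 1005L, id -> id == 1006L)); // prints 2
    // Always-true filters: any concurrent change is a conflict, so only the first commits.
    System.out.println(committedCount(id -> true, id -> true)); // prints 1
  }
}
```

   In this model the narrow filters describe what each transaction wrote, not everything it read to decide on that write, which is why the stale premise goes unnoticed. The always-true filter is the coarsest possible stand-in for "everything the transaction read".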
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org