Ambarish-29 opened a new issue, #13974: URL: https://github.com/apache/iceberg/issues/13974
### Query engine

Spark

### Question

# Issue: Concurrent Writes Behavior in Apache Iceberg (Merge Into Operations)

## Goal

My aim is to verify whether concurrent writes are possible on Apache Iceberg tables.

## Background

From my research, I found that Iceberg supports two isolation levels:

- **Serializable**
- **Snapshot**

I tested this using a Spark engine, running operations from two different terminals against the same Iceberg table.

---

## Test 1: Serializable Isolation

**Steps:**

- Run a `MERGE INTO` on almost all rows inside a partition, updating column values.
- Concurrently, insert a new record into the same table and same partition, where the inserted record matches the condition specified in the merge.

**Result:**

- The insert succeeds.
- The merge fails with a data conflict, as expected (per the official docs).

Reference: [RowDelta Javadoc](https://iceberg.apache.org/javadoc/1.4.2/org/apache/iceberg/RowDelta.html)

I believe this happens because `validateNoConflictingDataFiles()` is executed, which correctly prevents conflicts.

---

## Test 2: Snapshot Isolation

**Steps:** Same as above.

**Result:**

- Both operations succeed.
- However, the newly inserted row is **not updated** by the merge.

This also seems expected, since snapshot isolation provides weaker guarantees than serializable.

---

## Test 3: Two Concurrent Merge Operations

Next, I tried running two different `MERGE INTO` statements concurrently:

```sql
-- Session 1: update colA on every matching row
MERGE INTO table_a a
USING (SELECT * FROM another_table) b
ON a.unique_id = b.unique_id
WHEN MATCHED THEN UPDATE SET a.colA = 'dummy1';

-- Session 2: update colB on the same matching rows
MERGE INTO table_a a
USING (SELECT * FROM another_table) b
ON a.unique_id = b.unique_id
WHEN MATCHED THEN UPDATE SET a.colB = 'dummy1';
```

## Result

One merge succeeds. The second merge fails with an error: “Found conflicting delete files that may match the condition provided.”

## Question

Is this the expected behavior?
If yes:

- Does this mean two concurrent `MERGE INTO` operations on the same rows/partitions are not possible, even if they update different columns?

From the docs, I noticed that `validateNoConflictingDeleteFiles()` is always executed for MERGE and UPDATE, regardless of isolation level. This seems to explain the conflict, but I want to confirm if this is indeed the designed behavior.

## Request

Could someone from the community/team clarify:

- Whether this limitation is intentional in Iceberg’s concurrency model.
- If so, does this mean Iceberg officially does not support concurrent updates/merges on the same rows/partitions?

## Notes

Apologies in advance for any grammar mistakes or if I used the wrong technical terms.

Thanks a lot for your time
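
For anyone reproducing Tests 1 and 2: the post does not say how the isolation level was switched between runs, so the following is only a sketch. It assumes the switch was made through Iceberg's `write.merge.isolation-level` table property on `table_a`.

```sql
-- Assumption: table_a is the Iceberg table used in the tests.
-- Test 1: serializable isolation for MERGE (Iceberg's default).
ALTER TABLE table_a SET TBLPROPERTIES (
  'write.merge.isolation-level' = 'serializable'
);

-- Test 2: snapshot isolation for MERGE.
ALTER TABLE table_a SET TBLPROPERTIES (
  'write.merge.isolation-level' = 'snapshot'
);
```

This property only affects `MERGE INTO`. `UPDATE` and `DELETE` are controlled separately by `write.update.isolation-level` and `write.delete.isolation-level`.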
