Ambarish-29 opened a new issue, #13974:
URL: https://github.com/apache/iceberg/issues/13974

   ### Query engine
   
   Spark
   
   ### Question
   
   # Issue: Concurrent Writes Behavior in Apache Iceberg (`MERGE INTO` Operations)
   
   ## Goal
   My aim is to verify whether concurrent writes are possible on Apache Iceberg tables.
   
   ## Background
   From my research, I found that Iceberg supports two isolation levels:
   
   - **Serializable**
   - **Snapshot**
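   For reference, here is a sketch of how the isolation level can be chosen per operation type in Spark SQL. It assumes Iceberg's `write.merge.isolation-level` and `write.update.isolation-level` table properties, whose valid values are `serializable` (the default) and `snapshot`. `table_a` is the table used in Test 3.

   ```sql
   -- Assumption: the isolation level is configured per operation type via table properties.
   -- Serializable is the default; this switches MERGE and UPDATE to snapshot isolation.
   ALTER TABLE table_a SET TBLPROPERTIES (
     'write.merge.isolation-level' = 'snapshot',
     'write.update.isolation-level' = 'snapshot'
   );

   -- Remove the override so MERGE falls back to the default (serializable).
   ALTER TABLE table_a UNSET TBLPROPERTIES ('write.merge.isolation-level');
   ```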
   
   I tested this with Spark, running operations from two different terminals against the same Iceberg table.
   
   ---
   
   ## Test 1: Serializable Isolation
   
   **Steps:**
   - Run a `MERGE INTO` that updates column values on almost all rows in a partition.
   - Concurrently, insert a new record into the same table and partition, where the inserted record matches the merge condition.
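   The two steps can be sketched as follows, with one statement per terminal. The schema `table_a(unique_id, part_col, colA, colB)`, the partition column `part_col`, and the literal values are hypothetical placeholders. `table_a`, `another_table`, `unique_id`, and `colA` are borrowed from Test 3. Under the default serializable isolation, the `MERGE` is the statement expected to fail at commit.

   ```sql
   -- Terminal 1: MERGE that updates almost every row in one partition.
   -- Hypothetical schema: table_a(unique_id, part_col, colA, colB), partitioned by part_col.
   MERGE INTO table_a a
   USING (SELECT * FROM another_table) b
   ON a.unique_id = b.unique_id AND a.part_col = 'p1'
   WHEN MATCHED THEN UPDATE SET colA = 'dummy1';

   -- Terminal 2, started while the MERGE above is still running:
   -- insert a row into the same partition. Assumes another_table already holds
   -- a row with unique_id = 'new-id', so the new row matches the MERGE condition.
   INSERT INTO table_a VALUES ('new-id', 'p1', 'original', 'original');
   ```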
   
   **Result:**
   - The insert succeeds.  
   - The merge fails with a data conflict, which is expected according to the official docs.
   
   Reference: [RowDelta Javadoc](https://iceberg.apache.org/javadoc/1.4.2/org/apache/iceberg/RowDelta.html)
   
   I believe this happens because `validateNoConflictingDataFiles()` runs during commit validation and correctly detects the conflict.
   
   ---
   
   ## Test 2: Snapshot Isolation
   
   **Steps:**  
   Same as Test 1, but with snapshot isolation.
   
   **Result:**
   - Both operations succeed.  
   - However, the newly inserted row is **not updated** by the merge.
   
   This also seems expected, since snapshot isolation provides weaker guarantees than serializable isolation.
   
   ---
   
   ## Test 3: Two Concurrent Merge Operations
   
   Next, I ran two different `MERGE INTO` statements concurrently, each updating a different column:
   
   ```sql
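   -- Terminal 1: update colA on every matched row
   MERGE INTO table_a a
   USING (SELECT * FROM another_table) b
   ON a.unique_id = b.unique_id
   WHEN MATCHED THEN UPDATE SET colA = 'dummy1';
   
   -- Terminal 2: update colB on the same matched rows
   MERGE INTO table_a a
   USING (SELECT * FROM another_table) b
   ON a.unique_id = b.unique_id
   WHEN MATCHED THEN UPDATE SET colB = 'dummy1';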
   ```
   
   **Result:**
   - One merge succeeds.
   - The second merge fails with the error:
     “Found conflicting delete files that may match the condition provided.”
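   To see which of the two merges committed, the table history can be inspected through Iceberg's `snapshots` metadata table in Spark SQL. Here `catalog.db` is a placeholder for the actual catalog and namespace of `table_a`, and the contents of the `summary` column depend on the write mode.

   ```sql
   -- List the table's snapshots, newest first. Only the merge that committed
   -- successfully appears as a new snapshot; the operation column shows the commit type.
   SELECT committed_at, snapshot_id, parent_id, operation, summary
   FROM catalog.db.table_a.snapshots
   ORDER BY committed_at DESC;
   ```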
   
   ## Question
   
   Is this the expected behavior?  
   
   If yes:  
   
   - Does this mean that two concurrent `MERGE INTO` operations on the same rows/partitions are not possible, even if they update different columns?
   
   From the docs, I noticed that `validateNoConflictingDeleteFiles()` is always executed for `MERGE` and `UPDATE`, regardless of the isolation level.
   This seems to explain the conflict, but I want to confirm whether this is the intended behavior.
   
   ## Request
   
   Could someone from the community/team clarify:  
   
   - Whether this limitation is intentional in Iceberg’s concurrency model.  
   - If so, whether this means Iceberg officially does not support concurrent updates/merges on the same rows/partitions.
   
   ## Notes
   
   Apologies in advance for any grammar mistakes or incorrect technical terms.
   
   Thanks a lot for your time.
   
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
