601madman opened a new issue, #11709:
URL: https://github.com/apache/iceberg/issues/11709

   ### Apache Iceberg version
   
   1.6.1
   
   ### Query engine
   
   Spark
   
   ### Please describe the bug 🐞
   
   ### Issue
   When an Iceberg table meets the following two conditions:
   
   1. **IDENTIFIER FIELDS** is set.
   2. The property `write.merge.mode`/`write.delete.mode`/`write.update.mode` is set to `merge-on-read`.
   
   Running the corresponding operation (**MERGE INTO**, **DELETE**, or **UPDATE**) fails with the following error:
   ```
   java.lang.IllegalArgumentException: Cannot add fieldId 1 as an identifier field: field does not exist.
   ```

   ### Analysis
   After reviewing Iceberg's source code (versions 1.6.1 and 1.7.0), I believe this is a bug. My reasoning is as follows:

   1. When `MERGE INTO`, `DELETE`, or `UPDATE` runs and the corresponding property is set to `merge-on-read`, the code path goes through `SparkPositionDeltaOperation`.
   2. `SparkPositionDeltaOperation` invokes `buildMergeOnReadScan()`, and the problem arises in `schemaWithMetadataColumns()`.
   3. When the schema of the Iceberg table is built there, it is split into two parts:
      - the fields explicitly defined in the table;
      - the metadata fields (`metadataSchema`).
   4. A separate `Schema` object is created for each part. The `Schema` constructor runs a validation (`validateIdentifierField`) that iterates over all `identifierFieldIds` and checks that each one exists in that schema's field list.
   5. This is where the problem lies:
      - The IDENTIFIER FIELDS belong to the explicitly defined table fields, so that part passes validation.
      - The metadata fields can never contain those `identifierFieldIds`, so validating the metadata schema against them fails with the error above.

   This is my understanding of the issue.
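
   To make step 5 concrete, here is a minimal, self-contained Java sketch. It does not use Iceberg itself. `MiniSchema` and `MetadataSchemaRepro` are hypothetical classes that only model field IDs and the identifier-field check described above. The metadata field IDs are illustrative. Building the metadata-only schema with the table's identifier IDs reproduces the same error message.

   ```java
   import java.util.LinkedHashMap;
   import java.util.Map;
   import java.util.Set;

   // Hypothetical, simplified stand-in for an Iceberg-style schema (NOT org.apache.iceberg.Schema):
   // it keeps only field IDs/names and mimics the identifier-field validation.
   final class MiniSchema {
     private final Map<Integer, String> fieldsById = new LinkedHashMap<>();
     private final Set<Integer> identifierFieldIds;

     MiniSchema(Map<Integer, String> fields, Set<Integer> identifierFieldIds) {
       this.fieldsById.putAll(fields);
       this.identifierFieldIds = Set.copyOf(identifierFieldIds);
       // Every identifier field ID must refer to a field of *this* schema.
       for (int id : this.identifierFieldIds) {
         if (!fieldsById.containsKey(id)) {
           throw new IllegalArgumentException(
               "Cannot add fieldId " + id + " as an identifier field: field does not exist.");
         }
       }
     }

     Set<Integer> identifierFieldIds() {
       return identifierFieldIds;
     }
   }

   final class MetadataSchemaRepro {
     // Returns the validation error message, or null if the metadata schema was built.
     static String buildMetadataSchema(Set<Integer> identifierIds) {
       // Illustrative metadata column IDs, far away from the table's own field IDs.
       Map<Integer, String> metadataFields = new LinkedHashMap<>();
       metadataFields.put(Integer.MAX_VALUE - 1, "_file");
       metadataFields.put(Integer.MAX_VALUE - 2, "_pos");
       try {
         new MiniSchema(metadataFields, identifierIds);
         return null;
       } catch (IllegalArgumentException e) {
         return e.getMessage();
       }
     }

     public static void main(String[] args) {
       // Table with fields 1 ("id") and 2 ("data"); "id" is the identifier field.
       MiniSchema table = new MiniSchema(Map.of(1, "id", 2, "data"), Set.of(1));
       // Reusing the table's identifier IDs for the metadata-only schema fails.
       // prints "Cannot add fieldId 1 as an identifier field: field does not exist."
       System.out.println(buildMetadataSchema(table.identifierFieldIds()));
     }
   }
   ```

   In this model, the table schema itself validates fine. Only the metadata-only schema rejects field ID 1, because that ID is not one of its fields.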
   
   ### Proposed Fix
   Based on my understanding, I tested a small modification to the source code:
   - In the `SparkScanBuilder` class, I modified the `calculateMetadataSchema()` method. When it creates the `Schema` at the end of the method, I changed the second argument to an empty list, because the metadata fields do not need to go through `validateIdentifierField`.
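
   The idea behind the proposed fix can be modeled with the same kind of simplified, hypothetical classes. This sketch does not reproduce Iceberg's actual `calculateMetadataSchema()` or its schema-joining code. It only assumes that the metadata-only schema is built with no identifier fields and is then combined with the table schema. In this sketch, the combined schema takes its identifier IDs from the table schema, so they are still validated against real table columns. The field IDs are illustrative.

   ```java
   import java.util.LinkedHashMap;
   import java.util.Map;
   import java.util.Set;

   // Hypothetical, simplified schema model (NOT the org.apache.iceberg.Schema API).
   final class MiniSchema {
     final Map<Integer, String> fields;
     final Set<Integer> identifierFieldIds;

     MiniSchema(Map<Integer, String> fields, Set<Integer> identifierFieldIds) {
       this.fields = new LinkedHashMap<>(fields);
       this.identifierFieldIds = Set.copyOf(identifierFieldIds);
       // Each identifier field ID must exist among this schema's fields.
       for (int id : this.identifierFieldIds) {
         if (!this.fields.containsKey(id)) {
           throw new IllegalArgumentException(
               "Cannot add fieldId " + id + " as an identifier field: field does not exist.");
         }
       }
     }

     // Hypothetical join: table fields plus metadata fields, keeping the table's
     // identifier IDs so they are validated against the combined field list.
     static MiniSchema join(MiniSchema table, MiniSchema metadata) {
       Map<Integer, String> all = new LinkedHashMap<>(table.fields);
       all.putAll(metadata.fields);
       return new MiniSchema(all, table.identifierFieldIds);
     }
   }

   final class ProposedFixSketch {
     static MiniSchema schemaWithMetadataColumns() {
       Map<Integer, String> tableFields = new LinkedHashMap<>();
       tableFields.put(1, "id");
       tableFields.put(2, "data");
       MiniSchema table = new MiniSchema(tableFields, Set.of(1));

       Map<Integer, String> metadataFields = new LinkedHashMap<>();
       metadataFields.put(Integer.MAX_VALUE - 1, "_file"); // illustrative IDs
       metadataFields.put(Integer.MAX_VALUE - 2, "_pos");
       // The proposed change: the metadata-only schema gets no identifier fields.
       MiniSchema metadata = new MiniSchema(metadataFields, Set.of());
       return MiniSchema.join(table, metadata);
     }

     public static void main(String[] args) {
       MiniSchema joined = schemaWithMetadataColumns();
       System.out.println(joined.fields.values() + " identifiers=" + joined.identifierFieldIds);
     }
   }
   ```

   With an empty identifier set, the metadata-only schema has nothing to validate. The identifier field check still applies wherever the table's own columns are present.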
   
   After making this modification and running my tests, the issue was resolved.
   
   ### Willingness to contribute
   
   - [ ] I can contribute a fix for this bug independently
   - [X] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
   - [ ] I cannot contribute a fix for this bug at this time


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

