jordepic opened a new pull request, #16243:
URL: https://github.com/apache/iceberg/pull/16243

   DynamicSinkUtil.getEqualityFieldIds and DynamicWriter.getEqualityFields both 
fall back to the schema's identifierFieldIds when the user-supplied equality 
fields are empty, but two routing decisions in the dynamic sink ignored that 
fallback:
   
   1. HashKeyGenerator distributed records that rely only on the schema's 
identifier fields round-robin, so two rows sharing an identifier-derived key 
could land on different writer subtasks while the writer still emitted equality 
deletes keyed by those identifier fields, breaking equality-delete correctness.
   2. DynamicRecordProcessor forwarded any record with a null distributionMode 
straight to the writer, even when the record resolved to a non-empty 
equality-field set. Forward-mode records sharing an equality key could likewise 
split across writers and leave duplicates behind.
   
   Centralize the resolution in DynamicSinkUtil.resolveEqualityFieldNames and 
use it in both call sites so distribution and write-side equality-field 
inference stay aligned. Document the carve-out in flink-writes.md and add unit 
tests covering both paths across the Flink v2.1, v2.0 and v1.20 modules.
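
   For readers unfamiliar with the issue, the following is a minimal, 
self-contained sketch of the idea, not the code in this PR. The class 
EqualityFieldRouting, the helpers canForward and selectSubtask, and all 
signatures are hypothetical stand-ins that use plain Java collections instead 
of the Iceberg and Flink types. It only illustrates why routing has to use the 
same resolved equality fields as the writer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical illustration only; these are not the Iceberg classes or signatures. */
public class EqualityFieldRouting {

  /**
   * User-supplied equality fields win; when they are empty, fall back to the
   * schema's identifier fields. Assumes identifierFieldNames is non-null.
   */
  public static Set<String> resolveEqualityFieldNames(
      Set<String> userEqualityFields, Set<String> identifierFieldNames) {
    if (userEqualityFields != null && !userEqualityFields.isEmpty()) {
      return userEqualityFields;
    }
    return identifierFieldNames;
  }

  /**
   * Forwarding straight to the writer is only safe when no distribution mode
   * is set AND the record resolves to an empty equality-field set.
   */
  public static boolean canForward(String distributionMode, Set<String> resolvedFields) {
    return distributionMode == null && resolvedFields.isEmpty();
  }

  /**
   * Rows with the same equality key always map to the same subtask;
   * rows without any equality key are spread round-robin.
   */
  public static int selectSubtask(
      Map<String, Object> row, Set<String> resolvedFields, int parallelism, int roundRobin) {
    if (resolvedFields.isEmpty()) {
      return Math.floorMod(roundRobin, parallelism);
    }
    // Sort the field names so the key does not depend on set iteration order.
    List<Object> key = new ArrayList<>();
    for (String field : new TreeSet<>(resolvedFields)) {
      key.add(row.get(field));
    }
    return Math.floorMod(key.hashCode(), parallelism);
  }
}
```

   In this sketch, two rows with the same "id" value resolve to the same 
subtask even when the user supplied no equality fields, because the identifier 
fallback is applied before the routing decision rather than only in the writer.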


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

