herbherbherb opened a new pull request, #15784:
URL: https://github.com/apache/iceberg/pull/15784

   Closes #15783
   
   ### Summary
   
   Adds a `WriteObserver` plugin interface to `IcebergSink` (Sink V2) that 
observes each record written and produces per-checkpoint metadata that flows 
through the entire sink pipeline (writer → serializer → aggregator → committer) 
to the Iceberg snapshot summary.
   
   ### Motivation
   
   Users need to extract per-record metadata (e.g., watermark timestamps, data 
quality scores) at the writer level and attach it to the committed Iceberg 
snapshot. Currently there is no way to propagate custom metadata from the 
writer through the aggregator to the committer without subclassing multiple 
internal classes.
   
   ### Changes
   
   - New: `WriteObserver.java` -- interface with `observe(RowData, 
SinkWriter.Context)` and `default Map<String, String> snapshotMetadata()`
   - New: `WriteObserverMetadataHolder.java` -- ThreadLocal holder for passing 
metadata through serialization boundaries
   - Modified: `IcebergSinkWriter` -- calls observer per-record, collects 
metadata at checkpoint
   - Modified: `WriteResultSerializer` -- v2 format carries observer metadata 
alongside WriteResult
   - Modified: `IcebergWriteAggregator` -- merges metadata from writer subtasks
   - Modified: `IcebergCommittable` + `IcebergCommittableSerializer` -- v2 
format with observer metadata (backward-compatible with v1)
   - Modified: `IcebergCommitter` -- applies observer metadata as snapshot 
properties
   - Modified: `IcebergSink.Builder` -- new `writeObserver()` method
   
   ### Compatibility
   
   - No behavioral change when the observer is not set (null default)
   - Serializers v2 can deserialize v1 payloads (backward compatible)
   - No changes to public API signatures of existing methods


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to