herbherbherb opened a new pull request, #15784: URL: https://github.com/apache/iceberg/pull/15784
Closes #15783 ### Summary Adds a `WriteObserver` plugin interface to `IcebergSink` (Sink V2) that observes each record written and produces per-checkpoint metadata that flows through the entire sink pipeline (writer → serializer → aggregator → committer) to the Iceberg snapshot summary. ### Motivation Users need to extract per-record metadata (e.g., watermark timestamps, data quality scores) at the writer level and attach it to the committed Iceberg snapshot. Currently there is no way to propagate custom metadata from the writer through the aggregator to the committer without subclassing multiple internal classes. ### Changes - New: `WriteObserver.java` -- interface with `observe(RowData, SinkWriter.Context)` and `default Map<String, String> snapshotMetadata()` - New: `WriteObserverMetadataHolder.java` -- ThreadLocal holder for passing metadata through serialization boundaries - Modified: `IcebergSinkWriter` -- calls observer per-record, collects metadata at checkpoint - Modified: `WriteResultSerializer` -- v2 format carries observer metadata alongside WriteResult - Modified: `IcebergWriteAggregator` -- merges metadata from writer subtasks - Modified: `IcebergCommittable` + `IcebergCommittableSerializer` -- v2 format with observer metadata (backward-compatible with v1) - Modified: `IcebergCommitter` -- applies observer metadata as snapshot properties - Modified: `IcebergSink.Builder` -- new `writeObserver()` method ### Compatibility - No behavioral change when the observer is not set (null default) - Serializers v2 can deserialize v1 payloads (backward compatible) - No changes to public API signatures of existing methods -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected] --------------------------------------------------------------------- To unsubscribe, e-mail: [email protected] For additional commands, e-mail: [email protected]
