sqd opened a new pull request, #16339:
URL: https://github.com/apache/iceberg/pull/16339
Introduce DynamicTaskWriterFactoryProvider so callers can supply a custom
TaskWriterFactory<RowData> in place of the default RowDataTaskWriterFactory,
while reusing the surrounding table, schema, partition spec, and write-property
resolution already done in DynamicWriter.
The primary motivation is throughput. Our pipelines have a data pattern,
closely tied to our business logic, that a hand-rolled TaskWriter can exploit
to produce files far faster than the writers created by the generic
RowDataTaskWriterFactory.
Making the factory pluggable also enables other use cases without forking
the sink:
- Row-level or file-level audit and metrics: sampling, lineage stamps,
metric counters layered around the writer.
- Custom file naming and layout: custom prefixes, alternative partition
paths, custom filesystem properties such as storage class and permissions.
The default provider preserves existing behavior, so callers that do not
supply one are unaffected.
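
To illustrate the audit and metrics use case above, here is a minimal,
self-contained sketch of a factory that layers a row counter around the writers
of another factory. It does not use the Iceberg API: SimpleTaskWriter and
SimpleTaskWriterFactory are simplified stand-ins for TaskWriter<T> and
TaskWriterFactory<T>, and CountingFactory is a hypothetical name. How such a
factory is handed to DynamicTaskWriterFactoryProvider is defined by the PR
itself and is not shown here.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.atomic.AtomicLong;

class CountingWriterSketch {

  // Simplified stand-in for Iceberg's TaskWriter<T> (assumption: the real
  // interface has more methods than this single write call).
  interface SimpleTaskWriter<T> {
    void write(T row);
  }

  // Simplified stand-in for TaskWriterFactory<T>.
  interface SimpleTaskWriterFactory<T> {
    SimpleTaskWriter<T> create();
  }

  // Hypothetical decorator: every writer it creates bumps a shared counter
  // and then delegates the row to the wrapped writer unchanged.
  static final class CountingFactory<T> implements SimpleTaskWriterFactory<T> {
    private final SimpleTaskWriterFactory<T> delegate;
    private final AtomicLong rowsWritten = new AtomicLong();

    CountingFactory(SimpleTaskWriterFactory<T> delegate) {
      this.delegate = delegate;
    }

    @Override
    public SimpleTaskWriter<T> create() {
      SimpleTaskWriter<T> inner = delegate.create();
      return row -> {
        rowsWritten.incrementAndGet();
        inner.write(row);
      };
    }

    long rowsWritten() {
      return rowsWritten.get();
    }
  }

  public static void main(String[] args) {
    // An in-memory list stands in for the data files a real writer produces.
    List<String> sink = new ArrayList<>();
    SimpleTaskWriterFactory<String> base = () -> row -> sink.add(row);
    CountingFactory<String> factory = new CountingFactory<String>(base);

    SimpleTaskWriter<String> writer = factory.create();
    writer.write("a");
    writer.write("b");
    writer.write("c");

    System.out.println("rows written: " + factory.rowsWritten());
  }
}
```

The counter lives on the factory rather than on each writer, so rows written
through several writers created by the same factory are aggregated in one place.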
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]