[GitHub] [iceberg] peay commented on pull request #6470: Spark: Allow specifying file format in RewriteDataFiles

GitBox Tue, 10 Jan 2023 00:36:34 -0800


peay commented on PR #6470:
URL: https://github.com/apache/iceberg/pull/6470#issuecomment-1376904416


   The motivation in https://github.com/apache/iceberg/issues/6464 is to allow 
writing Avro from a streaming pipeline, where a row-based format can make sense 
for small but frequent micro-batches, and then compacting to Parquet for 
longer-term batch analytics. This can be done today by configuring the table as 
Parquet and explicitly setting the streaming writer to write Avro, but that 
is a bit less flexible, and I think it would make sense to keep such details in 
the compaction settings instead.
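
   For reference, the workaround available today could look roughly like the 
sketch below. It assumes the Iceberg table property `write.format.default` and 
the Spark write option `write-format`; the table name, checkpoint path, and 
helper functions are hypothetical and only for illustration.

```python
# Table-level default: writers that do not override the format
# (compaction included, as described above) produce Parquet.
TABLE_PROPERTIES = {"write.format.default": "parquet"}

# Per-write override for the streaming writer only: micro-batches land as Avro.
STREAMING_WRITE_OPTIONS = {"write-format": "avro"}


def alter_table_sql(table, props):
    """Build the Spark SQL statement that sets the table properties (hypothetical helper)."""
    assignments = ", ".join(f"'{key}'='{value}'" for key, value in props.items())
    return f"ALTER TABLE {table} SET TBLPROPERTIES ({assignments})"


def start_streaming_write(stream_df, table, checkpoint_location):
    """Start a streaming append that writes Avro files (hypothetical helper).

    `stream_df` is assumed to be a streaming Spark DataFrame, and the Iceberg
    runtime is assumed to be on the classpath.
    """
    return (
        stream_df.writeStream
        .format("iceberg")
        .outputMode("append")
        .options(**STREAMING_WRITE_OPTIONS)
        .option("checkpointLocation", checkpoint_location)
        .toTable(table)
    )


# Statement to run once against the table, e.g. via spark.sql(...).
print(alter_table_sql("db.events", TABLE_PROPERTIES))
```

   With this setup the choice of compaction output format is implied by the 
table default rather than stated in the compaction settings, which is the 
flexibility gap described above.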


