nagraham opened a new pull request, #1468: URL: https://github.com/apache/iceberg-rust/pull/1468
## Which issue does this PR close?

This attempts to address https://github.com/apache/iceberg-rust/issues/1406

## What changes are included in this PR?

### Problem

Writing large files to Cloudflare R2 via `iceberg-rust` fails with the following error:

```
S3Error { code: "InvalidPart", message: "All non-trailing parts must have the same length.", resource: "", request_id: "" }
```

### Info

Multipart uploads to Cloudflare R2 have a strict requirement that all parts (except the final part) must have the same size ([link to docs](https://developers.cloudflare.com/r2/objects/multipart-objects/)).

`iceberg-rust` uses OpenDAL for writing to object storage. OpenDAL appears to have logic to adaptively set chunk sizes during multipart uploads, but that doesn't work with R2. OpenDAL used to have a configuration setting to enforce consistent chunk sizes, but that setting was removed and replaced by the `chunk()` feature. See [this OpenDAL issue for context](https://github.com/apache/opendal/issues/6252), where the maintainer suggested setting that value in `iceberg-rust`.

### Solution

This commit adds a generic, optional configuration property called `io.write.chunk-size`, which sets the chunk size on the writer. If the property is not present, writes work as they do now; otherwise, the writer applies the configured chunk size consistently.

Here's an example of setting up a `RestCatalog` with this property to write 32MB chunks.

```
props.insert(
    "io.write.chunk-size".to_string(),
    (32 * 1024 * 1024).to_string(),
);

let cat = RestCatalog::new(
    RestCatalogConfig::builder()
        .uri(catalog_uri)
        .warehouse(warehouse)
        .props(props)
        .build(),
);
```

## Are these changes tested?

- A unit test validates that setting the `io.write.chunk-size` property sets the chunk size.
- I manually tested the change by writing large files into R2 Data Catalog (which would otherwise have failed with the "InvalidPart" error).
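As a rough illustration of the behavior described above (not the code in this PR), the sketch below shows one way an optional `io.write.chunk-size` property could be read from a property map, and how a fixed chunk size yields parts that satisfy R2's "all non-trailing parts must have the same length" rule. The helper names `chunk_size_from_props` and `part_lengths` are hypothetical and exist only for this example. The real change passes the value to the OpenDAL writer rather than computing part lengths itself.

```rust
use std::collections::HashMap;
use std::num::ParseIntError;

/// Property key introduced by this PR.
const IO_WRITE_CHUNK_SIZE: &str = "io.write.chunk-size";

/// Hypothetical helper: read the optional chunk size (in bytes) from the props.
/// Returns `Ok(None)` when the property is absent, so the caller can keep the
/// writer's default behavior, and an error when the value is not a valid integer.
fn chunk_size_from_props(
    props: &HashMap<String, String>,
) -> Result<Option<usize>, ParseIntError> {
    props
        .get(IO_WRITE_CHUNK_SIZE)
        .map(|v| v.trim().parse::<usize>())
        .transpose()
}

/// Hypothetical helper: lengths of the parts produced when `total_len` bytes
/// are split with a fixed `chunk_size`. Every part except possibly the last
/// has exactly `chunk_size` bytes.
fn part_lengths(total_len: usize, chunk_size: usize) -> Vec<usize> {
    assert!(chunk_size > 0, "chunk size must be non-zero");
    let full_parts = total_len / chunk_size;
    let remainder = total_len % chunk_size;
    let mut parts = vec![chunk_size; full_parts];
    if remainder > 0 {
        // Only the trailing part may be shorter than the others.
        parts.push(remainder);
    }
    parts
}
```

For instance, with the 32MB setting from the example above, a 100MB object would be split into three 32MB parts followed by a 4MB trailing part.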