nagraham opened a new pull request, #1468:
URL: https://github.com/apache/iceberg-rust/pull/1468

   ## Which issue does this PR close?
   
   This attempts to address https://github.com/apache/iceberg-rust/issues/1406
   
   ## What changes are included in this PR?
   
   ### Problem
   
   Writing large files to Cloudflare R2 via `iceberg-rust` fails due to the 
following error:
   ```
   S3Error { code: "InvalidPart", message: "All non-trailing parts must have 
the same length.", resource: "", request_id: "" }
   ```
   
   ### Info
   
   Multipart uploads to Cloudflare R2 have a strict requirement that all parts 
(except the final part) must have the same size ([link to 
docs](https://developers.cloudflare.com/r2/objects/multipart-objects/)).
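
   To make the constraint concrete, here is a minimal sketch (not code from this PR) of how a fixed chunk size yields part sizes that satisfy the rule. The function names `fixed_part_sizes` and `non_trailing_parts_equal` are hypothetical and exist only for illustration.

```rust
/// Split `total_len` bytes into part sizes using a fixed `chunk_size`.
/// Every part except possibly the last has exactly `chunk_size` bytes.
fn fixed_part_sizes(total_len: usize, chunk_size: usize) -> Vec<usize> {
    assert!(chunk_size > 0, "chunk_size must be non-zero");
    // Full-size parts first.
    let mut sizes = vec![chunk_size; total_len / chunk_size];
    // A shorter trailing part holds whatever is left over.
    let remainder = total_len % chunk_size;
    if remainder > 0 {
        sizes.push(remainder);
    }
    sizes
}

/// Check the rule quoted in the error message: all non-trailing parts
/// must have the same length. The final part is not compared.
fn non_trailing_parts_equal(sizes: &[usize]) -> bool {
    match sizes.split_last() {
        Some((_last, head)) => head.windows(2).all(|pair| pair[0] == pair[1]),
        None => true,
    }
}
```

   For example, a 100-byte payload with 32-byte chunks gives parts of 32, 32, 32, and 4 bytes, which passes the check, whereas adaptively growing parts such as 8, 16, 32, 4 would not.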
   
   `iceberg-rust` uses OpenDAL for writing to object storage. OpenDAL appears to 
set chunk sizes adaptively during multipart uploads, which does not work with 
R2. OpenDAL used to have a configuration setting that enforced consistent chunk 
sizes, but that setting was removed in favor of the `chunk()` feature. See 
[this OpenDAL issue for 
context](https://github.com/apache/opendal/issues/6252), where the maintainer 
suggested setting that value in `iceberg-rust`.
   
   ### Solution
   
   This commit adds a generic, optional configuration property, 
`io.write.chunk-size`, which sets the chunk size on the writer. If the property 
is not present, writes work as they do now; otherwise, the writer applies the 
configured chunk size consistently.
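
   As a rough illustration of the "optional property" behavior, the sketch below reads the value from a string property map. It is not the code in this PR: the helper `parse_chunk_size`, the constant name `IO_WRITE_CHUNK_SIZE`, and the choice to return an error on an unparsable value are all assumptions made for the example.

```rust
use std::collections::HashMap;

/// Property key described in this PR (the constant name is hypothetical).
const IO_WRITE_CHUNK_SIZE: &str = "io.write.chunk-size";

/// Hypothetical helper: read the optional chunk size, in bytes.
/// Returns `Ok(None)` when the property is absent, so the caller can
/// leave the writer's default behavior untouched.
fn parse_chunk_size(props: &HashMap<String, String>) -> Result<Option<usize>, String> {
    match props.get(IO_WRITE_CHUNK_SIZE) {
        None => Ok(None),
        Some(raw) => raw
            .trim()
            .parse::<usize>()
            .map(Some)
            .map_err(|e| format!("invalid {} value {:?}: {}", IO_WRITE_CHUNK_SIZE, raw, e)),
    }
}
```

   A caller would apply the chunk size to the writer only when the result is `Ok(Some(size))`, which keeps existing configurations unaffected.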
   
   Here's an example of setting up a `RestCatalog` with this property to write 
32 MiB chunks.
   
   ```
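       // Assumes `use std::collections::HashMap;` is in scope.
       let mut props = HashMap::new();
       // Chunk size in bytes: 32 * 1024 * 1024 = 32 MiB.
       props.insert(
           "io.write.chunk-size".to_string(),
           (32 * 1024 * 1024).to_string(),
       );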
   
       let cat = RestCatalog::new(
           RestCatalogConfig::builder()
               .uri(catalog_uri)
               .warehouse(warehouse)
               .props(props)
               .build(),
       );
   ```
   
   ## Are these changes tested?
   
   - A unit test validates that setting the `io.write.chunk-size` property will 
set the chunk size.
   - I manually tested the change by writing large files into R2 Data Catalog 
(which otherwise would have failed with the "InvalidPart" error).


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

