cassio-paesleme opened a new pull request, #1078: URL: https://github.com/apache/iceberg-go/pull/1078
## Summary

The blob `FileIO` implementation assumes all file operations target a single S3 bucket (the warehouse bucket opened at init time). When Iceberg's `write.metadata.path` table property points to a different bucket (e.g. a dedicated versioned metadata bucket), `defaultKeyExtractor` strips the wrong bucket prefix and the file lands in the warehouse bucket under a mangled key. Readers following the absolute S3 URI in `metadata.json` get a 404.

**Concrete failure mode**: Setting `write.metadata.path = s3://metadata-bucket/db/table/` causes manifest lists (`snap-*.avro`) and manifest files (`*-m*.avro`) to be written to `s3://warehouse-bucket/metadata-bucket/db/table/snap-*.avro` instead of `s3://metadata-bucket/db/table/snap-*.avro`. The `metadata.json` records the correct URI, but the bytes are in the wrong place.

## Changes

- Add `resolveBucket()` to `blobFileIO` which parses the full S3 URI and routes to the correct bucket
- Primary bucket (warehouse) uses the fast path with no map lookup
- Secondary buckets are opened lazily via a `BucketOpener` callback and cached with `sync.RWMutex`
- Update `Open`, `NewWriter`, `WriteFile`, `Remove`, and `DeleteFiles` to use `resolveBucket`
- Wire S3 scheme registration to pass a `BucketOpener` that reuses the same AWS config
- Backward compatible: callers without a `BucketOpener` get the same legacy behavior

## Test plan

- [x] `TestMultiBucketWriteAndRead` - writes to two memblob buckets via different S3 URIs, verifies files land in the correct bucket and can be read back
- [x] `TestMultiBucketDelete` - verifies `Remove` and `DeleteFiles` route to the correct bucket
- [x] `TestMultiBucketOpenerCaching` - verifies the opener is called once per bucket name
- [x] `TestMultiBucketFallbackWithoutOpener` - verifies backward compatibility when no opener is set
- [x] All existing tests pass unchanged

🤖 Generated with [Claude Code](https://claude.com/claude-code)
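For readers skimming the description, the routing in the Changes list can be sketched roughly as below. This is a minimal, standard-library-only illustration and not the code in this PR: `memBucket`, `bucketOpener`, `multiBucketIO`, and `newMultiBucketIO` are hypothetical stand-ins, and the real implementation presumably works against actual blob bucket handles. It assumes the primary bucket is matched by name without touching the cache, that secondary buckets are opened once and cached behind a `sync.RWMutex`, and that a nil opener falls back to the legacy single-bucket behavior (including the mangled key described in the Summary).

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
	"sync"
)

// memBucket is a hypothetical stand-in for a real bucket handle.
type memBucket struct{ name string }

func newMemBucket(name string) *memBucket { return &memBucket{name: name} }

// bucketOpener is an assumed shape for the BucketOpener callback:
// it opens a bucket given only its name.
type bucketOpener func(name string) (*memBucket, error)

// multiBucketIO is a hypothetical, simplified stand-in for blobFileIO.
type multiBucketIO struct {
	primaryName string
	primary     *memBucket
	opener      bucketOpener // nil means legacy single-bucket behavior

	mu        sync.RWMutex
	secondary map[string]*memBucket // lazily opened buckets, keyed by name
}

func newMultiBucketIO(name string, primary *memBucket, opener bucketOpener) *multiBucketIO {
	return &multiBucketIO{
		primaryName: name,
		primary:     primary,
		opener:      opener,
		secondary:   make(map[string]*memBucket),
	}
}

// resolveBucket maps a full URI such as s3://bucket/key to the bucket
// that should serve it and the object key inside that bucket.
func (f *multiBucketIO) resolveBucket(location string) (*memBucket, string, error) {
	u, err := url.Parse(location)
	if err != nil {
		return nil, "", fmt.Errorf("parse %q: %w", location, err)
	}
	key := strings.TrimPrefix(u.Path, "/")

	// Fast path: the primary (warehouse) bucket needs no map lookup.
	if u.Host == "" || u.Host == f.primaryName {
		return f.primary, key, nil
	}
	// Legacy fallback: with no opener, everything goes to the primary
	// bucket and the foreign bucket name ends up inside the key.
	if f.opener == nil {
		return f.primary, u.Host + "/" + key, nil
	}

	// Read-locked cache lookup for buckets that are already open.
	f.mu.RLock()
	b, ok := f.secondary[u.Host]
	f.mu.RUnlock()
	if ok {
		return b, key, nil
	}

	// Slow path: take the write lock and re-check before opening, so the
	// opener runs at most once per bucket name.
	f.mu.Lock()
	defer f.mu.Unlock()
	if b, ok := f.secondary[u.Host]; ok {
		return b, key, nil
	}
	b, err = f.opener(u.Host)
	if err != nil {
		return nil, "", fmt.Errorf("open bucket %q: %w", u.Host, err)
	}
	f.secondary[u.Host] = b
	return b, key, nil
}

func main() {
	opens := 0
	fio := newMultiBucketIO("warehouse-bucket", newMemBucket("warehouse-bucket"),
		func(name string) (*memBucket, error) {
			opens++
			return newMemBucket(name), nil
		})
	for _, loc := range []string{
		"s3://warehouse-bucket/db/table/data/part-0.parquet",
		"s3://metadata-bucket/db/table/snap-1.avro",
		"s3://metadata-bucket/db/table/abc-m0.avro",
	} {
		b, key, err := fio.resolveBucket(loc)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Printf("%s -> bucket=%s key=%s\n", loc, b.name, key)
	}
	fmt.Println("secondary bucket opens:", opens)
}
```

Calling the opener while holding the write lock keeps the sketch simple and guarantees a single open per bucket name, at the cost of serializing first-time opens across different buckets. Whether the PR makes the same trade-off is not stated in the description.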
