karuppayya commented on PR #4335:
URL:
https://github.com/apache/datafusion-comet/pull/4335#issuecomment-4463758481
@mbutrovich Appreciate the feedback — agree this is worth pinning down
before going further.
The main scenario is REST-catalog deployments where vended credentials and
refresh config originate in the driver's catalog context.
**1.** Iceberg already does exactly init+broadcast for `S3FileIO`.
[`SparkBatch.planInputPartitions`](https://github.com/apache/iceberg/blob/main/spark/v3.5/spark/src/main/java/org/apache/iceberg/spark/source/SparkBatch.java#L86-L92)
broadcasts the configured FileIO and
[`S3FileIO.initialize(Map<String,String>)`](https://github.com/apache/iceberg/blob/main/aws/src/main/java/org/apache/iceberg/aws/s3/S3FileIO.java#L502)
runs on the driver. This change follows the same pattern.
**2.** I think REST makes this broadcast load-bearing( and not optional i
think).
[`LoadTableResponse.credentials`](https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/rest/responses/LoadTableResponse.java#L47)
are obtained driver-side and reach executors because
[`RESTSessionCatalog.newFileIO` calls
`setCredentials(...)`](https://github.com/apache/iceberg/blob/main/core/src/main/java/org/apache/iceberg/rest/RESTSessionCatalog.java#L1187-L1208)
and the FileIO is then broadcast. Refresh on the executor uses
[`VendedCredentialsProvider` built from broadcast
properties](https://github.com/apache/iceberg/blob/main/aws/src/main/java/org/apache/iceberg/aws/s3/S3FileIO.java#L459-L478).
**3.** Multi-catalog. This change keys broadcasts by catalog name, so
per-catalog isolation works.
[#4309](https://github.com/apache/datafusion-comet/pull/4309)'s SPI has no
catalog identifier, so two catalogs sharing a vendor class silently collide on
one cached instance — fixable though.
**4.** [#4309](https://github.com/apache/datafusion-comet/pull/4309)
requires every vendor to ship Comet-Spark plumbing. A REST-catalog vendor would
need to redo much of what this change does i guess
Curious what you think.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]