XJDKC commented on code in PR #1506:
URL: https://github.com/apache/polaris/pull/1506#discussion_r2087269395
##########
spec/polaris-management-service.yml:
##########
@@ -938,6 +940,38 @@ components:
format: password
description: Bearer token (input-only)
+ SigV4AuthenticationParameters:
+ type: object
+ description: AWS Signature Version 4 authentication
+ allOf:
+ - $ref: '#/components/schemas/AuthenticationParameters'
+ properties:
+ roleArn:
+ type: string
+ description: The aws IAM role arn assumed by polaris userArn when
signing requests
+ example:
"arn:aws:iam::123456789001:role/role-that-has-remote-catalog-access"
+ roleSessionName:
+ type: string
+ description: The role session name to be used by the SigV4 protocol
for signing requests
+ example: "polaris-remote-catalog-access"
+ externalId:
+ type: string
+ description: An optional external id used to establish a trust
relationship with AWS in the trust policy
+ example: "external-id-1234"
+ signingRegion:
+ type: string
+ description: Region to be used by the SigV4 protocol for signing
requests
+ example: "us-west-2"
+ signingName:
+ type: string
+ description: The service name to be used by the SigV4 protocol for
signing requests, the default signing name is "execute-api" is if not provided
+ example: "glue"
+ serviceIdentity:
+ $ref: '#/components/schemas/ServiceIdentityInfo'
Review Comment:
Hey Dmitri, I discussed the proposal of adding a separate set of management
APIs for handling service identity info in my design doc (topic 2, proposal 3):
[Apache Polaris Creds Management
Proposal](https://docs.google.com/document/d/1MAW87DtyHWPPNIEkUCRVUKBGjhh5bPn0GbtV7fifm30/edit?usp=sharing)
Here are the pros and cons:
* Pros
* **Clean separation of concerns**: Identity info isn’t mixed into catalog
or storage config anymore.
* **Simplifies catalog schema**: Keeps it focused strictly on user input.
* **More extensible**: We can evolve the service identity model without
touching the catalog schema or breaking clients.
* **Flexible**: Works well for both self-managed and SaaS-style
deployments.
* Cons
* **Requires extra coordination**: Users must make an additional API call
to fetch identity info and coordinate across two APIs for full setup.
* **Less cohesive**: Identity context is no longer colocated with the
catalog entity. This assumes **Polaris uses the same identity across all
catalogs in a given realm**.
* **Not aligned with current behavior**: Today, identity fields are
visible in the catalog response; this approach would change that.
* **Dynamic fields challenge**: Some service-managed fields like
`consentUrl` depend on user-provided input (e.g., Azure tenant ID) and can’t be
precomputed globally; they need to be generated after Polaris receives the
config.
Also, some vendors may want to use different service identities to
access different external services.
**For example, using SigV4 auth to access Glue or Amazon API Gateway (with Polaris
hosted behind the API Gateway) while the table is stored in Azure Blob Storage (or
S3-compatible storage)**.
In that case, would it be better to include serviceIdentityInfo as a
top-level field in both the storage config and connection config? That way, the
scope is clearly limited to the relevant config, reducing the blast radius and
keeping things more modular.
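
To make that last option concrete, here is a rough sketch of what it might look like in the spec. It is illustrative only: the schema names `ConnectionConfigInfo` and `StorageConfigInfo`, the field name `serviceIdentity` on them, and the trimmed property lists are assumptions for the sake of the example, not the actual contents of `polaris-management-service.yml`. Only `ServiceIdentityInfo` and `AuthenticationParameters` are taken from the diff above.

```yaml
# Hypothetical sketch, not the real Polaris spec.
# Each config carries its own service identity, so a connection that uses
# SigV4 and a storage config that targets a different cloud can reference
# different identities.
components:
  schemas:
    ConnectionConfigInfo:          # assumed schema name
      type: object
      properties:
        authenticationParameters:
          $ref: '#/components/schemas/AuthenticationParameters'
        serviceIdentity:           # top-level, scoped to this connection only
          $ref: '#/components/schemas/ServiceIdentityInfo'
    StorageConfigInfo:             # assumed schema name
      type: object
      properties:
        serviceIdentity:           # top-level, scoped to this storage config only
          $ref: '#/components/schemas/ServiceIdentityInfo'
```

With this shape, `SigV4AuthenticationParameters` would hold only user-supplied input, and the service-managed identity would sit beside it rather than inside it.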