Thanks for starting this discussion! Some refactoring does seem helpful in
this area. However, I think there are some elements of the currently
entrenched SPIs to discuss. Specifically, the coordination with the
persistence layer is an intentional feature, even though we should probably
encapsulate it to simplify the cases where it's not needed by the
persistence layer. Reposting some of my comments from the PR:
Though we should indeed be able to *decouple* the various interfaces pulled
in/aggregated by `PolarisMetaStoreManager`:
PolarisSecretsManager,
PolarisGrantManager,
PolarisCredentialVendor,
PolarisPolicyMappingManager,
PolarisEventManager,
PolarisMetricsManager
those interfaces were specifically written as part of SPIs, and at least
the same *functionality* provided by those extensibility points needs to be
preserved more strictly than syntax evolution of those interfaces, unless a
much deeper attempt is made amongst Polaris users to agree on a migration
path off of such functionality.
There's some discussion about the list of core SPIs in here:
https://lists.apache.org/thread/0nj24zro7kyctqfnlml08ppo7zs9xcqs
The way that applies to this PR is that the interaction with the
persistence layer is intentional, especially for the
TransactionalMetaStoreManager path.
The basic use case is if a custom Polaris deployment uses subclasses that
leverage those SPIs in order to perform additional book-keeping around
storage-integration functionality and needs that to be transactional with
entity creation/updates. This is best highlighted by thinking about the
interaction between the subset of `PolarisCredentialVendor` method
implementations with `IntegrationPersistence` which has these methods:
createStorageIntegration
persistStorageIntegrationIfNeeded
loadPolarisStorageIntegration
These are in the interface specifically to define the SPI by which users
customize a "lease, commit, use" model for secondary metadata necessary for
StorageIntegrations. In this current PR, it's not clear what happens to
those methods now -- will
`IntegrationPersistence::loadPolarisStorageIntegration` just be ignored?
Overall, I think the refactoring on the credential caching layer should
still be possible, while preserving the `PolarisCredentialVendor` interface
so that implementations of `PolarisCredentialVendor` still work correctly.
The use case to preserve is as follows:
1. Suppose the running Polaris service doesn't just use a static/fixed
source IAM User from environment variables, but instead maintains a *pool*
of different source-IAM-user credentials in an offloaded sidecar
2. Each new Catalog/StorageConfig created is intended to lease a new
identity from the *pool* and keep track of that lease atomically with the
CatalogEntity
3. The CreateCatalog flow as traced through
`TransactionalMetaStoreManagerImpl.createCatalog` is as follows:
a. Prepare catalog entity info in-memory
b. Call IntegrationPersistence.createStorageIntegrationInCurrentTxn --
this will call the sidecar credential-pool layer to *lease* a new IAM user
by reference
c. Start *transaction* for Polaris persistence layer
d. Lookups to validate current transaction state
e. In the same transaction that persists the new CatalogEntity,
call persistStorageIntegrationIfNeededInCurrentTxn - this may be custom to
different Polaris service owners, where a book-keeping table is updated
cooperatively with the sidecar credential-pool layer to consume the leased
IAM user and store the fact that the new CatalogEntity's entityId is
tightly coupled to that leased IAM user
f. Commit transaction all in one piece
4. The getSubscopedCreds flow would then be as follows:
a. Load entity again based on id (this is indeed probably redundant and
we could probably refactor this)
b. Call
IntegrationPersistence.loadPolarisStorageIntegrationInCurrentTxn with the
retrieved linked PolarisEntity
c. The custom service instance can have an impl of
IntegrationPersistence here that uses the *stateful* book-keeping of the
leased credential-pool object to return a handle into that sidecar
credential-pool
d. The StorageIntegration returned may not be the same thing as the
original object from "createStorageIntegrationInCurrentTxn" - it might be a
handle to a temporary authenticated connection into the sidecar
credential-vendor, for example
e. This new stateful and entity-coupled PolarisStorageIntegration is
ultimately used to getSubscopedCreds
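To make the two flows above concrete, here is a rough, self-contained sketch of the "lease, commit, use" model. Every type and method name in it (`CredentialPool`, `Store`, `createCatalogWithLease`, and so on) is hypothetical and stands in for a deployment-specific sidecar and book-keeping table; none of it is actual Polaris code. Releasing the lease on a failed commit is also my own assumption about how a pool might behave.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only: these are NOT Polaris types or signatures.
public class LeaseCommitUseSketch {

  /** Stand-in for the sidecar pool of source IAM users. */
  static final class CredentialPool {
    private final Deque<String> free = new ArrayDeque<>();

    CredentialPool(String... iamUsers) {
      for (String user : iamUsers) {
        free.add(user);
      }
    }

    /** Lease: reserve an identity by reference (step 3b). */
    String lease() {
      String user = free.poll();
      if (user == null) {
        throw new IllegalStateException("credential pool exhausted");
      }
      return user;
    }

    /** Assumed behavior: give back a lease that was never committed. */
    void release(String iamUser) {
      free.addFirst(iamUser);
    }

    int available() {
      return free.size();
    }
  }

  /** Stand-in for the persistence layer plus a custom book-keeping table. */
  static final class Store {
    private final Map<Long, String> catalogs = new HashMap<>();
    private final Map<Long, String> leaseByEntityId = new HashMap<>();

    /** Commit: entity and lease record are written together (steps 3c-3f). */
    void createCatalogWithLease(long entityId, String name, String iamUser) {
      if (catalogs.containsKey(entityId)) {
        throw new IllegalStateException("catalog already exists: " + entityId);
      }
      catalogs.put(entityId, name);
      leaseByEntityId.put(entityId, iamUser);
    }

    /** Use: resolve the identity coupled to an existing entity (steps 4b-4c). */
    String loadLease(long entityId) {
      String user = leaseByEntityId.get(entityId);
      if (user == null) {
        throw new IllegalStateException("no lease recorded for entity " + entityId);
      }
      return user;
    }
  }

  static void createCatalog(CredentialPool pool, Store store, long entityId, String name) {
    String leased = pool.lease();
    try {
      store.createCatalogWithLease(entityId, name, leased);
    } catch (RuntimeException e) {
      pool.release(leased); // the lease was never consumed
      throw e;
    }
  }

  static String getSubscopedCreds(Store store, long entityId) {
    // Placeholder for vending a real scoped credential from the leased identity (step 4e).
    return "scoped-token-for:" + store.loadLease(entityId);
  }

  public static void main(String[] args) {
    CredentialPool pool = new CredentialPool("iam-a", "iam-b");
    Store store = new Store();
    createCatalog(pool, store, 1L, "sales");
    System.out.println(getSubscopedCreds(store, 1L));
  }
}
```

The point of the sketch is that `createCatalog` and `getSubscopedCreds` touch the pool in different ways: the first leases and commits, the second only reads the committed book-keeping. That distinction is what the current placeholder implementations blur.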
Today, I think the main cause of confusion is that we only have
"placeholder" implementations in the main repo of
loadPolarisStorageIntegrationInCurrentTxn that conflate "lease/construct
StorageIntegration" with "fetch a vending-ready StorageIntegration". This
was because credential-pooling semantics tend to be specific to each
individual Polaris service-runner, and it'd be a lot of complexity to have
a "toy" implementation of credential-pooling for single-tenant use cases
that don't need it.
I think the overall flow could still be preserved while performing most of
the proposed refactors if we:
1. Solidify some variation of PolarisCredentialVendor as the interface
definition for *using* a PolarisStorageIntegration to get a credential.
This doesn't *need* to be implemented by monolithic MetaStoreManager impls,
but it still needs to exist as an optionally persistence-aware interface
2. Solidify some variation of PolarisStorageIntegrationProvider as the
interface definition for *constructing/leasing* a new
PolarisStorageIntegration strictly for CreateCatalog (or Create for any
entity that might hold a StorageConfig)
3. Maybe reconcile those two "using" and "constructing" methods under a
single unified interface that is still injectable
4. If we want some persistence types like the NoSql stack to always stay
away from being in the business of facilitating credential vending, the
callsite should obtain a PolarisCredentialVendor from some injection or
factory method -- the default PolarisCredentialVendor can indeed be the
current dumb in-memory "Construct from application-level configs in-memory
and vend immediately" implementation.
5. Persistence impls that need stateful cooperation with a credential pool
can still configure to provide their existing "CustomMetaStoreManagerImpl
implements PolarisCredentialVendor" as the thing that's returned to
callsites wanting to get credentials
6. We can evolve the PolarisCredentialVendor method interface to take a
whole PolarisEntity instead of entityId, type, etc., to potentially avoid a
re-lookup. Such a syntactic change is still *semantically* compatible with
the SPI because any impls that used to need entityId can still call
entity.getId() if we pass in the whole entity
7. Alternatively, if we really want to get callsites to call
getSubscopedCreds directly on a PolarisStorageIntegration object, maybe
it's possible to model the persistence cooperation in subclasses of
PolarisStorageIntegration itself, but we still need a way to distinguish
between whether a caller is *constructing* a new StorageIntegration or
*using* an entity-linked StorageIntegration
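As a very rough illustration of points 1, 2, 4, 5 and 6, the split could look something like the sketch below. All names and signatures here (`IntegrationProvider`, `CredentialVendor.vend`, `InMemoryVendor`, `PooledVendor`, `vendorFor`) are made up for the example and are not the existing Polaris interfaces; credentials are plain strings only to keep it short.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch only: names and signatures are illustrative, not Polaris APIs.
public class VendorSplitSketch {

  /** Minimal stand-in for an entity that holds a storage config. */
  static final class Entity {
    final long id;
    final String storageConfig;

    Entity(long id, String storageConfig) {
      this.id = id;
      this.storageConfig = storageConfig;
    }

    long getId() {
      return id;
    }
  }

  interface StorageIntegration {
    String getSubscopedCreds(String location);
  }

  /** Point 2: constructing/leasing, invoked only on create. */
  interface IntegrationProvider {
    StorageIntegration construct(Entity entity);
  }

  /** Points 1 and 6: using, and it receives the whole entity rather than an id. */
  interface CredentialVendor {
    String vend(Entity entity, String location);
  }

  /** Point 4: stateless default built purely from in-memory config. */
  static final class InMemoryVendor implements CredentialVendor {
    @Override
    public String vend(Entity entity, String location) {
      StorageIntegration integration = loc -> "static:" + entity.storageConfig + ":" + loc;
      return integration.getSubscopedCreds(location);
    }
  }

  /** Point 5: a persistence-aware impl that remembers which identity it leased. */
  static final class PooledVendor implements IntegrationProvider, CredentialVendor {
    private final Map<Long, String> leasedIdentityByEntityId = new HashMap<>();
    private int nextIdentity = 0;

    @Override
    public StorageIntegration construct(Entity entity) {
      String identity = "pooled-iam-" + nextIdentity++;
      leasedIdentityByEntityId.put(entity.getId(), identity);
      return loc -> identity + ":" + loc;
    }

    @Override
    public String vend(Entity entity, String location) {
      // Impls that only need the id can still call entity.getId().
      String identity = leasedIdentityByEntityId.get(entity.getId());
      if (identity == null) {
        throw new IllegalStateException("no lease recorded for entity " + entity.getId());
      }
      StorageIntegration integration = loc -> identity + ":" + loc;
      return integration.getSubscopedCreds(location);
    }
  }

  /** Point 4: callsites get a vendor from a factory instead of the MetaStoreManager. */
  static CredentialVendor vendorFor(boolean pooled, PooledVendor pooledVendor) {
    return pooled ? pooledVendor : new InMemoryVendor();
  }

  public static void main(String[] args) {
    Entity catalog = new Entity(7L, "s3://bucket");
    PooledVendor pooledVendor = new PooledVendor();
    pooledVendor.construct(catalog); // CreateCatalog path
    System.out.println(vendorFor(true, pooledVendor).vend(catalog, "s3://bucket/t1"));
    System.out.println(vendorFor(false, pooledVendor).vend(catalog, "s3://bucket/t1"));
  }
}
```

With a shape like this, the callsite never needs to know whether the vendor is backed by persistence, and backends that want nothing to do with credential vending simply fall through to the default.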
On Wed, Apr 1, 2026 at 1:34 PM Tornike Gurgenidze <[email protected]>
wrote:
> Hi all,
>
> I'd like to bring up a refactoring effort around credential vending that
> I've been working on in PR #3699
> <https://github.com/apache/polaris/pull/3699>. Dmitri has been providing
> feedback and helping a lot along the way, but I wanted to open this up for
> broader discussion before iterating further.
>
> Motivation
>
> The current credential vending flow is deeply entangled with the
> persistence layer. When a client requests scoped credentials (e.g. for S3,
> GCS, or Azure), the request goes through: StorageCredentialsVendor ->
> PolarisCredentialVendor -> MetaStoreManager -> persistence layer -> back
> out through PolarisStorageIntegrationProvider. This means credential
> vending re-loads entities from persistence even though the caller already
> has them, and MetaStoreManager implementations are burdened with credential
> vending logic that doesn't belong in persistence.
>
> Overall, the sheer amount of complexity and the amount of layers that
> credential vending flow goes through makes further changes particularly
> challenging as evidenced by some recent efforts around cache key
> generation, storage info resolution, additional storage backends and so on.
>
> What the PR does
>
> 1. Removes credential vending from MetaStoreManager. The
> PolarisCredentialVendor interface, StorageCredentialsVendor, and
> getSubscopedCredsForEntity() implementations are removed from
> MetaStoreManager. This cleans up both the transactional and NoSQL backends.
>
> 2. Moves orchestration into StorageAccessConfigProvider. This
> application-scoped bean now directly resolves the storage integration and
> delegates to it, cutting out the persistence round-trip.
>
> 3. Moves caching into storage integrations. Each PolarisStorageIntegration
> subclass (AWS, GCP, Azure) now owns its StorageCredentialCache interaction
> and builds cloud-specific cache keys, rather than using a one-size-fits-all
> key.
>
> I'd appreciate any feedback on the overall direction, concerns about API
> compatibility in polaris-core, or suggestions for how to best land these
> changes.
>
> Thanks,
> Tornike
>