dennishuo commented on code in PR #2290:
URL: https://github.com/apache/polaris/pull/2290#discussion_r2280150574
##########
runtime/service/src/main/java/org/apache/polaris/service/admin/PolarisAdminService.java:
##########
@@ -984,8 +946,14 @@ public List<Catalog> listCatalogs() {
/** List all catalogs without checking for permission. */
private Stream<CatalogEntity> listCatalogsUnsafe() {
- return loadEntities(
- PolarisEntityType.CATALOG, PolarisEntitySubType.ANY_SUBTYPE, null,
CatalogEntity::of);
+ return metaStoreManager
+ .loadEntitiesAll(
+ getCurrentPolarisContext(),
+ null,
+ PolarisEntityType.CATALOG,
+ PolarisEntitySubType.ANY_SUBTYPE,
+ CatalogEntity::of)
Review Comment:
Are these type-wrappers from
https://github.com/apache/polaris/pull/2261/files the only usage of
`transformer` that aren't `Identity` transformations now? Alongside the
`PolicyEntity.of` in `PolicyCatalog`?
I guess I can see how it's a bit cleaner being able to pass these into the
`metaStoreManager` and if callers don't end up doing yet another
`.stream().map(...).toList()` maybe there's avoidance of a bit of memory
copying between lists, but as we're also starting to clarify better in the
mailing list discussion about Polaris SPIs, the `PolarisMetaStoreManager` is
one critical interface boundary, and having the open-ended transformer
introduces some subtle pitfalls.
Previously, it was already a bit dicey having the Transformer in the
`BasePersistence::listEntities` method but at least it was contained because
`BasePersistence` has been de-facto package-private in its usage (i.e.
IcebergCatalog/PolarisAdminService interacts with MetaStoreManager, not
BasePersistence), so the blast radius has luckily remained somewhat contained.
One example pitfall is when considering how this pushdown benefits from now
having the ability to achieve SNAPSHOT_READ isolation for list contents, and
how that translates into running inside the `runInTransaction` for the
TransactionalMetaStoreManager branch, then if we're willing to run arbitrary
transformer functions inside the transactional critical block we need callers
to "promise" they'll only provide "trivial" transformers.
Otherwise, I could imagine someday a "convenient" heavyweight
transformation, such as for example, a remote-catalog entity resolver that uses
a PolarisBaseEntity as just a passthrough facade, leaking in and causing
problems by making slow external remote-catalog network requests within the
transactional/snapshot read section.
Also, right now the interfaces permit the MetaStoreManager implementation
and/or BasePersistence implementation to potentially perform concurrent
underlying operations if desired to parallelize i.e. sharded list results;
depending on whether the filter/transformer are applied inline or as
post-processing, we'd also need to know if the provided functions are
thread-safe.
If we *really* want to expose it, we should at least document some initial
basic contraints/guidelines in the method jaavadocs:
1. Must be "lightweight" without network/IO dependencies
2. Must not cause re-entrant transactions in the database layer
3. Must be threadsafe
These probably need to apply to both the Transformer and the Filter.
Otherwise, my preferred approach is to change all of these `FooEntity::of`
transformer use cases to simply pop them out to the caller like:
return metaStoreManager
.loadEntitiesAll(
getCurrentPolarisContext(),
null,
PolarisEntityType.CATALOG,
PolarisEntitySubType.ANY_SUBTYPE) // No 'transformer' arg
.stream()
.map(CatalogEntity::of);
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]