Hi Dmitri,

Thanks for adding this to the Community Sync agenda and for keeping me in the loop.

Since the meeting time is around midnight in my time zone, I won't be able to join live. Could you please confirm where I can find the meeting outcomes? Should I check the community Google Doc for the notes, or will there be a recording available?

I look forward to the community's feedback from the sync. I'll follow up on the mailing list or the PR once I’ve had a chance to process the meeting's outcomes.

Best regards,
I-Ting

On Wed, Mar 18, 2026 at 8:12 AM yun zou <[email protected]> wrote:

> Hi ITing,
>
> Thanks for bringing this up!
>
> *>>> Should Polaris determine the provider first (via metadata) and
> delegate to a single matching catalog, or should it attempt multiple
> sub-catalogs in a defined order?*
>
> *>>> If multiple sub-catalogs are supported, should there be a documented,
> deterministic.*
>
> As Dmitri pointed out, Polaris Catalog today is designed to support mixed
> table types. In other words, a single catalog (and namespace) can contain
> Iceberg, Delta, and Hudi tables, and table identifiers must be unique
> across all of them.
>
> Currently:
>
>    - Iceberg tables are only visible through Iceberg endpoints
>    - Generic tables are only visible through generic table endpoints
>    - These two views are disjoint
>
> Because of this, to get a complete view of all tables in a catalog, we need
> to call listTables on both the Iceberg and generic endpoints.
>
> For loadTable, since we only have the table identifier and don’t know the
> table type upfront, we may need to try both endpoints in the worst case.
> Client-side table format caching could help optimize this in the near
> future.
>
> Regarding ordering, there isn’t a strict or required sequence when checking
> different table types. For example, checking Generic first and then Iceberg
> (or vice versa) won’t change the outcome. The current approach of
> attempting Iceberg first is simply a convention, not a requirement.
>
> *>>> Should the per-format routing logic be centralised behind an
> abstraction (e.g. a SubCatalogRouter interface or a provider registry), so
> that adding a new format is a single registration rather than edits across
> loadTable, alterTable, and dropTable?*
>
> I think the current if/else logic mainly exists because we didn’t have a
> clear understanding of how different formats would behave on the client
> side at the time. Now that Delta, Hudi, and Lance appear to follow a
> similar pattern, it makes sense to extract a common routing abstraction.
> That would definitely simplify the code and make adding new formats a
> matter of registration rather than touching multiple code paths.
>
> *>>> Consistency: Should all table operations (loadTable, createTable,
> alterTable, dropTable, renameTable) follow the same routing strategy, or
> are per-operation differences acceptable? Currently createTable has a
> different branching structure from loadTable.*
>
> In general, it would be good for most table operations (loadTable,
> alterTable, dropTable, renameTable) to follow a consistent routing
> strategy. However, createTable is a bit different: since we already know
> the table format at creation time, we can directly route to the correct
> endpoint. So I think it’s reasonable for createTable to have a different
> branching structure.
>
> *>>> Is it in scope for Polaris to act as a routing layer for multiple
> table providers, or should users who need both Polaris and Paimon configure
> them as separate catalogs in their Spark session and route at the session
> level themselves?*
>
> Polaris Server itself doesn’t perform routing. This responsibility lies
> with the Polaris Spark Client, which should determine the correct endpoint
> to call for each operation.
>
> *>>> Paimon does not support a delegating catalog mode (unlike Delta/Hudi),
> it cannot automatically notify Polaris of its changes.*
>
> I may have missed this detail in the PR and will double-check.
> My understanding is that Paimon’s SparkCatalog does not call into a REST
> catalog as part of its table operations. In that case, it becomes the
> client’s responsibility to ensure operations are executed correctly. If
> needed, we could invoke operations twice, but we’d also need to ensure
> proper failure handling, i.e., if any step fails, the operation should be
> marked as failed and the transaction rolled back correctly.
>
> Best Regards,
> Yun
>
> On Tue, Mar 17, 2026 at 7:45 AM Dmitri Bourlatchkov <[email protected]>
> wrote:
>
> > Hi I-Ting,
> >
> > Unfortunately, I do not have an answer to your double registration
> > question off the top of my head, but I added an item for this
> > discussion to the Community Sync [1] agenda for March 19.
> >
> > [1] https://polaris.apache.org/community/meetings/
> >
> > Cheers,
> > Dmitri.
> >
> > On Tue, Mar 17, 2026 at 10:19 AM ITing Lee <[email protected]> wrote:
> >
> > > Hi Dmitri,
> > >
> > > Thank you for your clear guidance!
> > >
> > > I completely agree with the unified namespace tree principle.
> > >
> > > To ensure Polaris acts as the single source of truth and avoids
> > > resolution ambiguity, I will refactor the implementation to follow a
> > > lookup-then-dispatch pattern.
> > >
> > > Instead of speculative probing, the sparkCatalog will first resolve
> > > the table entity via Polaris metadata to identify the provider, then
> > > deterministically route the call or throw a Table format mismatch
> > > error if the API mode is incompatible.
> > >
> > > I have another question regarding table registration for
> > > non-delegating formats.
> > >
> > > Since Paimon does not support a delegating catalog mode (unlike
> > > Delta/Hudi), it cannot automatically notify Polaris of its changes.
> > >
> > > In my PR, I've implemented an explicit dual-registration during
> > > createTable (Physical creation in Paimon warehouse followed by
> > > logical registration in Polaris).
> > >
> > > This ensures Paimon tables are visible via SHOW TABLES.
> > >
> > > I would like to ask if the community has better ideas for handling
> > > such standalone formats? (From my perspective, the dual-registration
> > > is not an atomic operation across both systems. There's still a
> > > chance that one of the services succeeds while the other fails,
> > > which would cause inconsistency. However, it _seems_ this is the
> > > only way to achieve it for non-delegating formats.)
> > >
> > > The alternative of having Polaris actively scan external warehouses
> > > seems to introduce significant performance overhead.
> > >
> > > Is there a more elegant way to ensure catalog visibility without
> > > sacrificing the goal of a single source of truth, or is this
> > > explicit registration the preferred pattern for now?
> > >
> > > Best regards,
> > > I-Ting
> > >
> > > On Mon, Mar 16, 2026 at 9:42 PM Dmitri Bourlatchkov <[email protected]>
> > > wrote:
> > >
> > > > Hi I-Ting,
> > > >
> > > > Thanks for starting this discussion. You bring up important
> > > > points.
> > > >
> > > > From my point of view, the catalog data controlled by Polaris
> > > > should form a unified namespace tree. In other words, each full
> > > > table name owned by Polaris must be unique and resolve to the same
> > > > table entity regardless of the API used by the client.
> > > >
> > > > If a name is accessed via the Iceberg REST Catalog API and happens
> > > > to point to a Paimon table, I think Polaris ought to report an
> > > > error to the client (something like HTTP 422 "Table format
> > > > mismatch").
> > > >
> > > > If a name is accessed via the Generic Tables API, the response
> > > > must indicate the actual table format.
> > > >
> > > > I do not think the client should make multiple "lookup" calls for
> > > > the same table name.
> > > > That creates ambiguity in the name resolution logic and could lead
> > > > to different lookup results in different clients.
> > > >
> > > > I believe the client should select the API it wants to use (IRC or
> > > > Generic Tables) at setup time and then rely on that API for all
> > > > primary lookup calls.
> > > >
> > > > WDYT?
> > > >
> > > > Thanks,
> > > > Dmitri.
> > > >
> > > > On Sat, Mar 14, 2026 at 3:34 AM 李宜頲 <[email protected]> wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > We are adding support for Paimon inside Polaris's SparkCatalog.
> > > > > Before we add more formats, we would like to get community input
> > > > > on the intended architecture.
> > > > >
> > > > > This discussion originated from a code review conversation in
> > > > > PR #3820
> > > > > <https://github.com/apache/polaris/pull/3820#discussion_r2865885791>
> > > > >
> > > > > *Current design*
> > > > >
> > > > > When SparkCatalog.loadTable is called, the routing works in
> > > > > three phases:
> > > > >
> > > > > 1. Try the Iceberg catalog (icebergSparkCatalog.loadTable). If
> > > > > it succeeds, return immediately.
> > > > >
> > > > > 2. Call getTableFormat(ident), which makes a single HTTP GET to
> > > > > the Polaris server to read the provider property stored in the
> > > > > generic table metadata, without triggering any Spark DataSource
> > > > > resolution.
> > > > >
> > > > > 3. Route based on the provider string:
> > > > >
> > > > >    - "paimon" : delegate to Paimon's SparkCatalog
> > > > >
> > > > >    - unknown/other : fall back to polarisSparkCatalog.loadTable,
> > > > >      which performs full DataSource resolution
> > > > >
> > > > > The same three-phase pattern is repeated independently in
> > > > > loadTable, alterTable, and dropTable *(But createTable is not
> > > > > following this pattern)*.
> > > > > It might raise the concern that this makes the routing logic
> > > > > intrusive: every new format requires parallel changes across all
> > > > > three methods, and there is no single place that describes the
> > > > > full routing policy.
> > > > >
> > > > > *Questions for discussion*
> > > > >
> > > > > 1. Should Polaris determine the provider first (via metadata)
> > > > > and delegate to a single matching catalog, or should it attempt
> > > > > multiple sub-catalogs in a defined order?
> > > > >
> > > > > 2. If multiple sub-catalogs are supported, should there be a
> > > > > documented, deterministic resolution order (Iceberg -> Paimon ->
> > > > > Delta -> Hudi -> Polaris fallback)? Who owns that order, and
> > > > > should it be configurable by operators?
> > > > >
> > > > > 3. Should the per-format routing logic be centralised behind an
> > > > > abstraction (e.g. a SubCatalogRouter interface or a provider
> > > > > registry), so that adding a new format is a single registration
> > > > > rather than edits across loadTable, alterTable, and dropTable?
> > > > >
> > > > > 4. Consistency: Should all table operations (loadTable,
> > > > > createTable, alterTable, dropTable, renameTable) follow the same
> > > > > routing strategy, or are per-operation differences acceptable?
> > > > > Currently createTable has a different branching structure from
> > > > > loadTable.
> > > > >
> > > > > 5. Is it in scope for Polaris to act as a routing layer for
> > > > > multiple table providers, or should users who need both Polaris
> > > > > and Paimon configure them as separate catalogs in their Spark
> > > > > session and route at the session level themselves?
> > > > >
> > > > > We have a working Paimon implementation today and would like to
> > > > > avoid locking in a pattern that becomes hard to extend.
> > > > > Any input on the design direction, or pointers to prior
> > > > > discussion on this topic, would be much appreciated.
> > > > >
> > > > > Best regards,
> > > > > I-Ting
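
For reference, the "lookup then dispatch" idea and the provider registry raised in this thread could look roughly like the sketch below. It is only an illustration under stated assumptions, not code from the PR or from Polaris: the class and method names (SubCatalogRouter as a concrete class, register, the String-returning loaders) are hypothetical, the provider lookup is a plain function standing in for the getTableFormat(ident) metadata call, and it assumes a single lookup can report the provider for every table, Iceberg included, which the thread does not establish.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.function.Function;

/**
 * Hypothetical sketch of a lookup-then-dispatch router. Nothing here is an
 * existing Polaris type; table identifiers and loaded tables are modelled
 * as plain Strings to keep the example self-contained.
 */
final class SubCatalogRouter {
  // Assumption: stands in for one metadata call that returns the provider
  // string for an identifier, or null when the table is unknown.
  private final Function<String, String> providerLookup;
  // Registry: provider name (lower-cased) -> loader for that format.
  private final Map<String, Function<String, String>> loaders = new HashMap<>();
  // Used when the provider is known to the metadata but has no registration.
  private final Function<String, String> fallback;

  SubCatalogRouter(Function<String, String> providerLookup,
                   Function<String, String> fallback) {
    this.providerLookup = providerLookup;
    this.fallback = fallback;
  }

  /** Adding a format is one registration, not an edit per operation. */
  SubCatalogRouter register(String provider, Function<String, String> loader) {
    loaders.put(provider.toLowerCase(Locale.ROOT), loader);
    return this;
  }

  /** Resolve the provider once, then dispatch to exactly one sub-catalog. */
  String loadTable(String ident) {
    String provider = providerLookup.apply(ident);
    if (provider == null) {
      throw new IllegalArgumentException("Table not found: " + ident);
    }
    return loaders
        .getOrDefault(provider.toLowerCase(Locale.ROOT), fallback)
        .apply(ident);
  }
}
```

In this shape, alterTable and dropTable would reuse the same lookup and registry, so the routing policy lives in one place. Whether the fallback branch should exist at all, or should instead raise a format-mismatch error, is exactly the open design question above.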
