Hi I-Ting,

Unfortunately, I do not have an answer to your double registration question
off the top of my head, but I added an item for this discussion to the
Community Sync [1] agenda for March 19.
[1] https://polaris.apache.org/community/meetings/

Cheers,
Dmitri.

On Tue, Mar 17, 2026 at 10:19 AM ITing Lee <[email protected]> wrote:

> Hi Dmitri,
>
> Thank you for your clear guidance!
>
> I completely agree with the unified namespace tree principle.
> To ensure Polaris acts as the single source of truth and avoids resolution
> ambiguity, I will refactor the implementation to follow a lookup-then-dispatch
> pattern.
> Instead of speculative probing, the SparkCatalog will first resolve the
> table entity via Polaris metadata to identify the provider, then
> deterministically route the call, or throw a "Table format mismatch" error
> if the API mode is incompatible.
>
> I have another question regarding table registration for non-delegating
> formats.
> Since Paimon does not support a delegating catalog mode (unlike
> Delta/Hudi), it cannot automatically notify Polaris of its changes.
> In my PR, I've implemented an explicit dual registration during createTable
> (physical creation in the Paimon warehouse followed by logical registration
> in Polaris).
> This ensures Paimon tables are visible via SHOW TABLES.
>
> I would like to ask whether the community has better ideas for handling
> such standalone formats. (From my perspective, the dual registration is not
> an atomic operation across both systems. There is still a chance that one
> of the services succeeds while the other fails, causing inconsistency.
> However, it _seems_ this is the only way to achieve it for non-delegating
> formats.)
>
> The alternative of having Polaris actively scan external warehouses seems
> to introduce significant performance overhead.
> Is there a more elegant way to ensure catalog visibility without
> sacrificing the goal of a single source of truth, or is this explicit
> registration the preferred pattern for now?
>
> Best regards,
> I-Ting
>
> Dmitri Bourlatchkov <[email protected]> wrote on Mon, Mar 16, 2026 at 9:42 PM:
>
> > Hi I-Ting,
> >
> > Thanks for starting this discussion. You bring up important points.
> >
> > From my point of view, the catalog data controlled by Polaris should form
> > a unified namespace tree. In other words, each full table name owned by
> > Polaris must be unique and resolve to the same table entity regardless of
> > the API used by the client.
> >
> > If a name is accessed via the Iceberg REST Catalog API and happens to
> > point to a Paimon table, I think Polaris ought to report an error to the
> > client (something like HTTP 422 "Table format mismatch").
> >
> > If a name is accessed via the Generic Tables API, the response must
> > indicate the actual table format.
> >
> > I do not think the client should make multiple "lookup" calls for the
> > same table name. That creates ambiguity in the name resolution logic and
> > could lead to different lookup results in different clients.
> >
> > I believe the client should select the API it wants to use (IRC or
> > Generic Tables) at setup time and then rely on that API for all primary
> > lookup calls.
> >
> > WDYT?
> >
> > Thanks,
> > Dmitri.
> >
> > On Sat, Mar 14, 2026 at 3:34 AM 李宜頲 <[email protected]> wrote:
> >
> > > Hi all,
> > >
> > > We are adding support for Paimon inside Polaris's SparkCatalog. Before
> > > we add more formats, we would like to get community input on the
> > > intended architecture.
> > >
> > > This discussion originated from a code review conversation in PR #3820
> > > <https://github.com/apache/polaris/pull/3820#discussion_r2865885791>
> > >
> > > *Current design*
> > >
> > > When SparkCatalog.loadTable is called, the routing works in three
> > > phases:
> > >
> > > 1. Try the Iceberg catalog (icebergSparkCatalog.loadTable). If it
> > > succeeds, return immediately.
> > >
> > > 2. Call getTableFormat(ident), which makes a single HTTP GET to the
> > > Polaris server to read the provider property stored in the generic
> > > table metadata, without triggering any Spark DataSource resolution.
> > >
> > > 3. Route based on the provider string:
> > >
> > > - "paimon": delegate to Paimon's SparkCatalog
> > >
> > > - unknown/other: fall back to polarisSparkCatalog.loadTable, which
> > > performs full DataSource resolution
> > >
> > > The same three-phase pattern is repeated independently in loadTable,
> > > alterTable, and dropTable *(but createTable does not follow this
> > > pattern)*. This raises the concern that the routing logic is intrusive:
> > > every new format requires parallel changes across all three methods,
> > > and there is no single place that describes the full routing policy.
> > >
> > > *Questions for discussion*
> > >
> > > 1. Should Polaris determine the provider first (via metadata) and
> > > delegate to a single matching catalog, or should it attempt multiple
> > > sub-catalogs in a defined order?
> > >
> > > 2. If multiple sub-catalogs are supported, should there be a
> > > documented, deterministic resolution order (Iceberg -> Paimon -> Delta
> > > -> Hudi -> Polaris fallback)? Who owns that order, and should it be
> > > configurable by operators?
> > >
> > > 3. Should the per-format routing logic be centralised behind an
> > > abstraction (e.g. a SubCatalogRouter interface or a provider registry),
> > > so that adding a new format is a single registration rather than edits
> > > across loadTable, alterTable, and dropTable?
> > >
> > > 4. Consistency: Should all table operations (loadTable, createTable,
> > > alterTable, dropTable, renameTable) follow the same routing strategy,
> > > or are per-operation differences acceptable? Currently createTable has
> > > a different branching structure from loadTable.
> > >
> > > 5. Is it in scope for Polaris to act as a routing layer for multiple
> > > table providers, or should users who need both Polaris and Paimon
> > > configure them as separate catalogs in their Spark session and route at
> > > the session level themselves?
> > >
> > > We have a working Paimon implementation today and would like to avoid
> > > locking in a pattern that becomes hard to extend. Any input on the
> > > design direction, or pointers to prior discussion on this topic, would
> > > be much appreciated.
> > >
> > > Best regards,
> > >
> > > I-Ting
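To make the atomicity concern in I-Ting's dual-registration question concrete, here is a minimal sketch of the compensating-action approach. The `PaimonWarehouse` and `PolarisGenericTables` interfaces are hypothetical stand-ins (the real Paimon and Polaris client APIs differ); the point is only the ordering and rollback: create physically first, register logically second, and drop the physical table if registration fails.

```java
// Hypothetical stand-ins for the two systems; the real Paimon and Polaris
// client APIs differ -- this only illustrates ordering and compensation.
interface PaimonWarehouse {
    void createPhysicalTable(String name);
    void dropPhysicalTable(String name);
}

interface PolarisGenericTables {
    void register(String name, String provider);
}

class DualRegistration {
    private final PaimonWarehouse paimon;
    private final PolarisGenericTables polaris;

    DualRegistration(PaimonWarehouse paimon, PolarisGenericTables polaris) {
        this.paimon = paimon;
        this.polaris = polaris;
    }

    // Physical creation first, then logical registration in Polaris.
    // If registration fails, compensate by dropping the physical table
    // so neither system is left with a visible half-created table.
    void createTable(String name) {
        paimon.createPhysicalTable(name);
        try {
            polaris.register(name, "paimon");
        } catch (RuntimeException registrationFailure) {
            try {
                paimon.dropPhysicalTable(name); // best-effort compensation
            } catch (RuntimeException cleanupFailure) {
                registrationFailure.addSuppressed(cleanupFailure);
            }
            throw registrationFailure;
        }
    }
}
```

Even with compensation this is not atomic: a crash between the two calls, or a failed compensation, can still leave an orphaned physical table, which is why such a pattern is usually paired with some out-of-band reconciliation or cleanup.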
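Question 3's provider-registry idea could look roughly like the sketch below. All names here (`SubCatalogRouter`, `providerLookup`, `route`) are illustrative assumptions, not the actual Polaris SparkCatalog API, and the delegate type is left generic (in real use it would be Spark's TableCatalog). A single metadata lookup resolves the provider string, the registry maps it to one delegate, and unknown providers fall through to a fallback, so loadTable/alterTable/dropTable can all share one routing policy and a new format is one `register` call.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Illustrative sketch only: names are assumptions, not Polaris code.
// One metadata lookup resolves the provider; the registry then
// dispatches deterministically instead of probing sub-catalogs.
class SubCatalogRouter<C> {
    private final Function<String, Optional<String>> providerLookup; // e.g. one HTTP GET to Polaris
    private final Map<String, C> delegates = new HashMap<>();
    private final C fallback; // e.g. polarisSparkCatalog for unknown providers

    SubCatalogRouter(Function<String, Optional<String>> providerLookup, C fallback) {
        this.providerLookup = providerLookup;
        this.fallback = fallback;
    }

    // Adding a new format is a single registration, not parallel edits
    // across loadTable, alterTable, and dropTable.
    SubCatalogRouter<C> register(String provider, C delegate) {
        delegates.put(provider.toLowerCase(), delegate);
        return this;
    }

    // Lookup-then-dispatch: resolve the provider once, then route.
    C route(String identifier) {
        return providerLookup.apply(identifier)
                .map(p -> delegates.getOrDefault(p.toLowerCase(), fallback))
                .orElse(fallback);
    }
}
```

One design consequence worth noting: because the routing table is data rather than control flow, the deterministic resolution order from question 2 becomes a property of the registry's contents, which operators could in principle configure.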
