Hi I-Ting,

Thanks for starting this discussion. You bring up important points.

From my point of view, the catalog data controlled by Polaris should form a
unified namespace tree. In other words, each full table name owned by
Polaris must be unique and resolve to the same table entity regardless of
the API used by the client.

If a name is accessed via the Iceberg REST Catalog API and happens to point
to a Paimon table, I think Polaris ought to report an error to the client
(something like HTTP 422 "Table format mismatch").

If a name is accessed via the Generic Tables API, the response must
indicate the actual table format.
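
To make the intent concrete, here is a toy sketch of the behavior I have in
mind. All class, method, and table names below are made up for illustration;
none of them come from the Polaris codebase:

```java
// Hypothetical sketch of the proposed lookup semantics. The names here
// (FormatCheckSketch, IrcResult, the sample tables) are illustrative only.
import java.util.Map;

public class FormatCheckSketch {

    // Simulated unified namespace: full table name -> table format.
    static final Map<String, String> CATALOG = Map.of(
            "ns.orders", "iceberg",
            "ns.events", "paimon");

    /** Minimal stand-in for an HTTP response: status plus reported format. */
    record IrcResult(int httpStatus, String format) {}

    // Iceberg REST Catalog lookup: only Iceberg tables may be returned.
    // A name that resolves to another format is a mismatch (sketched as 422).
    static IrcResult loadViaIrc(String name) {
        String format = CATALOG.get(name);
        if (format == null) return new IrcResult(404, null);
        if (!"iceberg".equals(format)) return new IrcResult(422, null);
        return new IrcResult(200, format);
    }

    // Generic Tables lookup: any format resolves, and the response reports
    // the actual format so the client can pick the right reader.
    static IrcResult loadViaGenericTables(String name) {
        String format = CATALOG.get(name);
        if (format == null) return new IrcResult(404, null);
        return new IrcResult(200, format);
    }

    public static void main(String[] args) {
        System.out.println(loadViaIrc("ns.events").httpStatus());       // 422
        System.out.println(loadViaGenericTables("ns.events").format()); // paimon
    }
}
```

The point of the sketch is that both APIs resolve names against the same
namespace tree; they differ only in what they are allowed to return.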

I do not think the client should make multiple "lookup" calls for the same
table name. That creates ambiguity in the name resolution logic and could
lead to different lookup results in different clients.

I believe the client should select the API it wants to use (IRC or Generic
Tables) at setup time and then rely on that API for all primary lookup
calls.
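
For illustration, selecting the API at setup time could look like the
following Spark session config. The catalog names are arbitrary, and the
class and property values are my assumptions about the client packages
rather than verified settings:

```shell
# Sketch only: catalog names are arbitrary; class names and URIs are
# assumptions for illustration, not verified configuration.

# A catalog that always talks to the Iceberg REST Catalog API:
spark-sql \
  --conf spark.sql.catalog.irc=org.apache.iceberg.spark.SparkCatalog \
  --conf spark.sql.catalog.irc.type=rest \
  --conf spark.sql.catalog.irc.uri=https://polaris.example.com/api/catalog

# A separate catalog that goes through the Polaris Spark client
# (Generic Tables aware), configured independently:
#   --conf spark.sql.catalog.polaris=org.apache.polaris.spark.SparkCatalog \
#   --conf spark.sql.catalog.polaris.uri=https://polaris.example.com/api/catalog
```

With a setup like this, each Spark catalog has exactly one resolution path,
and there is never a second "fallback" lookup for the same name.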

WDYT?

Thanks,
Dmitri.

On Sat, Mar 14, 2026 at 3:34 AM 李宜頲 <[email protected]> wrote:

> Hi all,
>
> We are adding support for Paimon inside Polaris's SparkCatalog. Before we
> add more formats, we would like to get community input on the intended
> architecture.
>
> This discussion originated from a code review conversation in PR #3820
> <https://github.com/apache/polaris/pull/3820#discussion_r2865885791>
>
>
>
> *Current design*
>
> When SparkCatalog.loadTable is called, the routing works in three phases:
>
>
> 1. Try the Iceberg catalog (icebergSparkCatalog.loadTable). If it succeeds,
> return immediately.
>
> 2. Call getTableFormat(ident), which makes a single HTTP GET to the Polaris
> server to read the provider property stored in the generic table metadata,
> without triggering any Spark DataSource resolution.
>
> 3. Route based on the provider string:
>
>     - "paimon"  : delegate to Paimon's SparkCatalog
>
>     - unknown/other : fall back to polarisSparkCatalog.loadTable, which
> performs full DataSource resolution
>
>
> The same three-phase pattern is repeated independently in loadTable,
> alterTable, and dropTable (createTable, however, does not follow this
> pattern). This raises the concern that the routing logic is intrusive:
> every new format requires parallel changes across all three methods, and
> there is no single place that describes the full routing policy.
>
>
> *Questions for discussion*
>
>
> 1. Should Polaris determine the provider first (via metadata) and delegate
> to a single matching catalog, or should it attempt multiple sub-catalogs in
> a defined order?
>
> 2. If multiple sub-catalogs are supported, should there be a documented,
> deterministic resolution order (Iceberg -> Paimon -> Delta -> Hudi ->
> Polaris fallback)? Who owns that order, and should it be configurable by
> operators?
>
> 3. Should the per-format routing logic be centralised behind an abstraction
> (e.g. a SubCatalogRouter interface or a provider registry), so that adding
> a new format is a single registration rather than edits across loadTable,
> alterTable, and dropTable?
>
> 4. Consistency: Should all table operations (loadTable, createTable,
> alterTable, dropTable, renameTable) follow the same routing strategy, or
> are per-operation differences acceptable? Currently createTable has a
> different branching structure from loadTable.
>
> 5. Is it in scope for Polaris to act as a routing layer for multiple table
> providers, or should users who need both Polaris and Paimon configure them
> as separate catalogs in their Spark session and route at the session level
> themselves?
>
>
> We have a working Paimon implementation today and would like to avoid
> locking in a pattern that becomes hard to extend. Any input on the design
> direction, or pointers to prior discussion on this topic, would be much
> appreciated.
>
>
> Best regards,
>
> I-Ting
>
