Thanks Dmitri! I agree the SPI itself is not the main concern here.

IIUC, the question is less whether some realm-awareness exists somewhere in
the implementation, and more what Polaris wants to make part of the initial
built-in contract. My concern is that once the default story starts looking
realm-driven, we may be implicitly hardening a broader model than we
actually want to support yet.

I’m still biased toward keeping the v1 built-in path narrower:
- config-driven,
- a small fixed set of purpose/store buckets,
- and routing behavior that stays stable at the transaction / unit-of-work
boundary.
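
To make that concrete, here is a rough sketch of what I mean by a config-driven, fixed-bucket routing table. Everything in it is an assumption for illustration only: the StorePurpose enum, the "sketch.datasource." key prefix, and the PurposeRouting class are hypothetical names, not the code in PR 3960. The buckets just echo the workloads named in the PR description.

```java
import java.util.Collections;
import java.util.EnumMap;
import java.util.Locale;
import java.util.Map;
import java.util.Properties;

// Hypothetical purpose buckets; they echo the workloads named in the PR
// description but are not the actual Polaris enum.
enum StorePurpose { METASTORE, METRICS, EVENTS }

// Sketch of a routing table that is read from config once and never mutated,
// so the answer cannot change in the middle of a unit of work.
final class PurposeRouting {
  // Hypothetical key prefix, e.g. "sketch.datasource.events=events-ds".
  static final String KEY_PREFIX = "sketch.datasource.";
  // Hypothetical name of the fallback data source.
  static final String DEFAULT_DATA_SOURCE = "default";

  private final Map<StorePurpose, String> dataSourceByPurpose;

  private PurposeRouting(Map<StorePurpose, String> routes) {
    this.dataSourceByPurpose = Collections.unmodifiableMap(routes);
  }

  // Builds the table from configuration; purposes without an explicit entry
  // fall back to the default data source.
  static PurposeRouting fromConfig(Properties config) {
    Map<StorePurpose, String> routes = new EnumMap<>(StorePurpose.class);
    for (StorePurpose purpose : StorePurpose.values()) {
      String key = KEY_PREFIX + purpose.name().toLowerCase(Locale.ROOT);
      routes.put(purpose, config.getProperty(key, DEFAULT_DATA_SOURCE));
    }
    return new PurposeRouting(routes);
  }

  // Pure lookup: no realm input and no per-request state.
  String dataSourceFor(StorePurpose purpose) {
    return dataSourceByPurpose.get(purpose);
  }
}
```

With only an events entry configured, EVENTS would resolve to that entry and the other purposes would fall back to the default. Because the table is immutable after startup, routing is trivially stable for the lifetime of any transaction.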

That still leaves room for a pluggable resolver SPI and downstream
customization, but keeps OSS Polaris itself from committing too early to a
more dynamic routing model before the config / migration / consistency
story is fully settled. So I think the main thing I’d want to make more
explicit is the boundary between:
1. extensibility the SPI permits, and
2. the built-in routing model Polaris is actually recommending and prepared
to support in v1.

If we can make that boundary crisp, I’m much less worried about the
abstraction itself.
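
As a rough illustration of that boundary (a sketch only): the interface below is a simplified, hypothetical stand-in for the resolver SPI, which per the PR description routes on RealmContext and StoreType. The type names, the String realm id, and the data source names are all my assumptions, not the actual signatures in PR 3960.

```java
// Hypothetical stand-in for the store-type enum; not the actual Polaris type.
enum SketchStoreType { METASTORE, METRICS, EVENTS }

// Hypothetical, simplified SPI. The realm parameter is what *permits*
// realm-aware routing in custom implementations.
interface SketchDataSourceResolver {
  String resolveDataSourceName(String realmId, SketchStoreType storeType);
}

// What the built-in v1 path could commit to: routing by purpose only.
// The realm is accepted (the SPI requires it) but deliberately ignored.
final class PurposeOnlyResolver implements SketchDataSourceResolver {
  @Override
  public String resolveDataSourceName(String realmId, SketchStoreType storeType) {
    switch (storeType) {
      case METRICS:
        return "metrics-ds"; // hypothetical data source name
      case EVENTS:
        return "events-ds"; // hypothetical data source name
      default:
        return "default-ds"; // hypothetical data source name
    }
  }
}

// What a downstream project could do on its own: realm-aware routing.
// Allowed by the SPI, but outside what OSS would document and support in v1.
final class RealmPrefixedResolver implements SketchDataSourceResolver {
  private final SketchDataSourceResolver delegate = new PurposeOnlyResolver();

  @Override
  public String resolveDataSourceName(String realmId, SketchStoreType storeType) {
    return realmId + "-" + delegate.resolveDataSourceName(realmId, storeType);
  }
}
```

Both classes satisfy the same interface. The difference is only in which one Polaris ships, documents, and promises to keep working.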

-ej

On Thu, Mar 12, 2026 at 11:04 AM Dmitri Bourlatchkov <[email protected]>
wrote:

> Hi EJ,
>
> In the current state of PR [3960], the DataSourceResolver interface enables
> customizing per-realm DataSource resolution.
>
> However, Apache Polaris code is not complicated with any new per-realm
> logic related to DataSources. The OSS side keeps working as before.
>
> I'm not sure whether Subham intended to delegate this logic to custom
> implementations or not, but as the PR stands the code looks pretty
> reasonable to me.
>
> In the current state of the codebase, I do not think we can completely
> avoid dealing with realms as they relate to DataSources. The decision about
> how to cross-link them has to be made somewhere. IMHO, the
> proposed DefaultDataSourceResolver looks like a good place to make that
> decision. It is certainly subject to further evolution if we need to make
> adjustments later.
>
> Before [3960] all realms implicitly received the default DataSource. This
> is now explicit in the code.
>
> [3960] https://github.com/apache/polaris/pull/3960
>
> Cheers,
> Dmitri.
>
> On Thu, Mar 12, 2026 at 2:29 AM EJ Wang <[email protected]>
> wrote:
>
> > Hi Subham, Dmitri, Yufei, JB,
> >
> > I’m generally aligned with the direction here, and I left a few more
> > detailed comments on the PR.
> > At a high level, my main concern is that the current proposal may still be
> > a bit ahead of the current story/scope. I’d lean toward keeping the first
> > step narrower, preferring purpose-based routing over per-realm routing for
> > v1, and making the supported model/config+migration story more explicit
> > before the broader contract hardens.
> >
> > -ej
> >
> > On Wed, Mar 11, 2026 at 12:37 PM Jean-Baptiste Onofré <[email protected]>
> > wrote:
> >
> > > Hi Subham,
> > >
> > > Thanks for this contribution. It's an interesting feature.
> > >
> > > As mentioned in the GitHub issues, I am fine with moving forward with a PR
> > > as long as it remains a Draft PR to help drive the discussion. I suggest
> > > linking a GitHub Discussion or a Design Doc within the PR to help build
> > > consensus.
> > >
> > > That said, I have a few initial comments:
> > >
> > > 1. I like the SPI approach used in the PR. This should become a standard in
> > > Polaris to facilitate custom implementations.
> > > 2. I agree that having a data source "per purpose" is a good idea. The main
> > > question is how we should handle the split:
> > > - Very granular (per entity)
> > > - By table "meaning"
> > > - By realm (this may not be granular enough)
> > >
> > > From a user standpoint, I believe we should keep it simple. It would be a
> > > great first step forward. For example, in the configuration (e.g.,
> > > application.properties), it could look like:
> > > - polaris.datasource.entities=
> > > - polaris.datasource.events=
> > > - polaris.datasource.grants=
> > >
> > > Regards,
> > > JB
> > >
> > > On Tue, Mar 10, 2026 at 11:19 PM Yufei Gu <[email protected]>
> > > wrote:
> > >
> > > > Hi Subham,
> > > >
> > > > Thanks for working on this. Given the complexity and long-term implications
> > > > discussed in https://github.com/apache/polaris/issues/3890, I think a short
> > > > design doc could still be helpful to capture the intended architecture and
> > > > future evolution. Here are a few questions listed in the issue. I believe
> > > > these should be answered before jumping to an implementation.
> > > >
> > > >
> > > >    1. Should we split each potentially noisy table into its own dedicated
> > > >    data source? For example, one data source for events, one for metrics,
> > > >    and one for idempotency.
> > > >    2. Should we allow flexible grouping? For example, events and
> > > >    idempotency tables sharing one data source, while metrics uses another.
> > > >    3. Should we consider a different data source per realm instead of
> > > >    table-level splitting?
> > > >    4. How should schema version information be managed? If tables live in
> > > >    different data sources, how do we track and coordinate schema evolution?
> > > >    5. Should different data sources be allowed to point to different
> > > >    schemas or databases? This likely aligns with the isolation goal, but it
> > > >    implies that cross-table joins become difficult or impossible at the
> > > >    database level, leaving only in-memory joins as an option.
> > > >    6. Should different data sources be allowed to point to the same schema?
> > > >    If not, we need validation logic to detect and prevent misconfiguration.
> > > >
> > > >
> > > > Yufei
> > > >
> > > >
> > > > On Tue, Mar 10, 2026 at 7:33 AM Dmitri Bourlatchkov <[email protected]>
> > > > wrote:
> > > >
> > > > > Hi Subham,
> > > > >
> > > > > Thanks again for your contribution!
> > > > >
> > > > > I believe PR 3960 moves in the right direction by establishing an SPI to
> > > > > delegate DataSource resolution logic to the runtime environment.
> > > > >
> > > > > It immediately allows custom implementations in downstream projects (if
> > > > > people wish to do that) and opens a way for supporting multiple DataSources
> > > > > in Apache Polaris (in follow-up PRs).
> > > > >
> > > > > I think the PR is pretty clear in itself and does not require any extra
> > > > > design docs. Let's review it in GH and merge when we have consensus.
> > > > >
> > > > > Cheers,
> > > > > Dmitri.
> > > > >
> > > > > On Tue, Mar 10, 2026 at 8:27 AM Subham Sangwan <[email protected]>
> > > > > wrote:
> > > > >
> > > > > > Hi Polaris Dev Team,
> > > > > >
> > > > > > I have opened PR #3960 [1] to introduce the foundational groundwork for
> > > > > > multi-datasource support in JDBC persistence, addressing Issue #3890 [2].
> > > > > > The goal is to enable physical isolation of different persistence
> > > > > > workloads (METASTORE, METRICS, EVENTS) into dedicated connection pools or
> > > > > > databases. This will allow Polaris to better handle high-traffic
> > > > > > environments by preventing "noisy neighbor" effects on the core entity
> > > > > > tables.
> > > > > >
> > > > > > Key Highlights:
> > > > > >
> > > > > >    - DataSourceResolver: A new pluggable interface for routing JDBC
> > > > > >    connections based on RealmContext and StoreType.
> > > > > >    - Modular Design: Decoupled the resolution implementation into the
> > > > > >    runtime-common module.
> > > > > >    - Consistency: Utilizes a type-safe StoreType enum and aligns with
> > > > > >    existing RealmContext patterns.
> > > > > >
> > > > > > The PR has been refined with feedback from @dimas-b and is now ready for
> > > > > > community review. I'd appreciate any feedback on the overall approach.
> > > > > >
> > > > > > Best regards,
> > > > > >
> > > > > > Subham Sangwan
> > > > > > GitHub: Subham-KRLX
> > > > > >
> > > > > > [1] https://github.com/apache/polaris/pull/3960
> > > > > > [2] https://github.com/apache/polaris/issues/3890
> > > > > >
> > > > >
> > > >
> > >
> >
>
