Re: [PR] [KYUUBI #7379][1/4] Data Agent Engine: module skeleton, configuration, and engine core [kyuubi]

via GitHub Fri, 10 Apr 2026 01:42:43 -0700


wangzhigang1999 commented on code in PR #7385:
URL: https://github.com/apache/kyuubi/pull/7385#discussion_r3063133983



##########
kyuubi-server/src/main/scala/org/apache/kyuubi/engine/EngineRef.scala:
##########
@@ -113,6 +116,17 @@ private[kyuubi] class EngineRef(
 
   @VisibleForTesting
   private[kyuubi] val subdomain: String = 
conf.get(ENGINE_SHARE_LEVEL_SUBDOMAIN) match {
+    // DATA_AGENT engines must be isolated by datasource (JDBC URL). This 
takes precedence over

Review Comment:
   Thanks for pushing on this — "must" really was under-motivated, let me walk 
through the thinking.
   
   Starting from what's actually in this PR, the provider SPI is 
single-datasource by contract:
   
   - `DataAgentProvider.load(KyuubiConf)` is called once per engine from 
engine-level conf.
   - `ENGINE_DATA_AGENT_JDBC_URL` is a single `OptionalConfigEntry[String]`.
   - `ProviderRunRequest` carries `question` / `modelName` / `approvalMode` — 
no datasource field.
   
   So a provider instance is bound to exactly one JDBC URL when it's loaded, 
and a caller has no way to dispatch a request against a different URL. The 
`EngineRef.subdomain` branch is just the routing side of that same contract, 
keeping the two halves in sync.
   
   As for why the SPI is shaped that way: I tried the multi-datasource version 
in a previous project and ran into real pain — SQL dialect divergence (tools 
built around one dialect don't transfer cleanly to another), connection-pool 
and credential isolation, schema introspection cost, and the general 
awkwardness of one process quietly talking to several production databases. The 
takeaway for me was that binding a process to a single datasource makes tool 
design, prompt construction, and credential handling a lot simpler, so I 
carried that lesson into the SPI here.
   
   Multi-datasource isn't ruled out forever — it would just mean rethinking the 
SPI (per-request datasource, or a provider keyed by datasource), the config 
surface, and the routing together, which feels like its own discussion. Happy 
to revisit if a concrete use case comes up.
   
   On the comment itself, I'll tighten it to say the *why* directly:
   
   ```scala
   // A DATA_AGENT engine is bound 1:1 to a JDBC datasource because the 
provider SPI
   // is constructed once per engine from a single 
kyuubi.engine.data.agent.jdbc.url,
   // and ProviderRunRequest carries no datasource field — there is no way to 
dispatch
   // a request against a different JDBC URL than the one the provider was 
loaded
   // with. Sessions targeting different datasources must therefore route to 
distinct
   // engines.
   ```
   
   WDYT?



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Re: [PR] [KYUUBI #7379][1/4] Data Agent Engine: module skeleton, configuration, and engine core [kyuubi]

Reply via email to