jerryshao commented on PR #10720:
URL: https://github.com/apache/gravitino/pull/10720#issuecomment-4311724869

   ### Design Doc Review
   
   Found 12 issues:
   
   **1. `listSchemas` default behavior is a breaking REST API change**
   Changing `GET /schemas` to return only top-level schemas (instead of all 
schemas) silently breaks existing clients with no error. Needs a migration 
strategy or an explicit opt-in flag (e.g. `?view=hierarchical`).
   
   **2. No design for Java/Python client API hierarchy representation**
   The doc only covers REST. How the Java/Python client APIs express nested 
schema paths — opaque strings with `:`, a new `SchemaPath` type, or varargs in 
`NameIdentifier` — is unaddressed. This is a public API surface decision that 
should be part of the design.
   
   **3. Existing schemas with `:` in names silently break on upgrade**
   Any schema currently named `team:sales` will be reinterpreted as a nested 
namespace `team → sales` after this feature is enabled. No migration scan or 
validation strategy is mentioned.
   
   **4. No atomicity for parent-chain creation**
   Creating `A:B:C` creates three schemas (`A`, `A.B`, `A.B.C`) in sequence. If 
the second step fails, `A` is left as an orphan with no rollback or cleanup 
defined.
   
   **5. Privilege escalation during auto-created parent schemas**
   If `A` does not yet exist when creating `A:B:C`, the requester becomes owner 
of `A`. A user with permission only to create leaf namespaces should not 
implicitly become owner of a top-level namespace.
   
   **6. Physical storage `.` separator conflicts with `NameIdentifier` 
semantics**
   The doc stores `A:B:C` as physical schema name `A.B.C`. Gravitino's 
`NameIdentifier` uses `.` as its level separator throughout the codebase. Any 
code path that reconstructs a `NameIdentifier` by splitting a full dotted path 
string will misparse `A.B.C` as three separate levels instead of one schema 
name. `NameIdentifier.java` and `NameIdentifierUtil.java` are listed as 
affected classes but the doc does not explain how this conflict is resolved.
   
   **7. Configurable separator has no migration path**
   If an operator changes the configured separator after data is already 
stored, all existing hierarchical schema names become unresolvable. The 
separator should be documented as immutable after first use, or a migration 
procedure should be defined.
   
   **8. Authorization parent-chain walking adds unaddressed per-request 
overhead**
   Every `USE_SCHEMA` check now walks the parent chain (`A:B:C` → `A:B` → `A`). 
The performance section covers listing latency but does not address the 
authorization check overhead for high-QPS workloads.
   
   **9. "Drop nested namespace" section is incomplete**
   The sentence "Will drop the schemas of the Gravitino" is cut off. The 
cascade behavior also does not specify whether tables inside dropped schemas 
are deleted or whether the operation is rejected if tables exist.
   
   **10. `toPhysicalSchemaName` breaks if a namespace segment contains `.`**
   The snippet uses `hierarchicalPath.replace(":", ".")`. If any namespace 
segment already contains `.` (e.g. a Hive schema named `my.schema`), the output 
is ambiguous and cannot be parsed back correctly.
   
   **11. Cache invalidation not addressed for distributed deployments**
   The performance mitigation mentions caching per catalog with TTL 
invalidation on mutations, but does not address how multiple Gravitino server 
instances stay consistent — servers could serve stale hierarchy views after 
schema mutations.
   
   **12. Connector code change scope is understated**
   The compatibility section implies this is a server-side change. In practice, 
Spark, Flink, and Trino connectors all require explicit code changes (the 
"Connector NestedNameIdentifier Conversion" section describes a non-trivial 
translation layer). The connector change scope should be explicitly called out.
   
   ---
   
   🤖 Generated with [Claude Code](https://claude.ai/code)


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to