jerryshao commented on PR #10720:
URL: https://github.com/apache/gravitino/pull/10720#issuecomment-4311724869
### Design Doc Review
Found 12 issues:
**1. `listSchemas` default behavior is a breaking REST API change**
Changing `GET /schemas` to return only top-level schemas (instead of all
schemas) silently breaks existing clients, which simply receive a shorter list
with no error. This needs a migration strategy or an explicit opt-in flag
(e.g. `?view=hierarchical`).
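A sketch of what the opt-in gate could look like, keeping the legacy flat listing as the default (class name, parameter name, and the `:` separator here are illustrative, not from the doc):

```java
import java.util.List;
import java.util.stream.Collectors;

public class ListSchemasCompat {
  // Hypothetical compatibility gate: legacy clients (no view param) keep
  // getting every schema; only clients that pass view=hierarchical get
  // the new top-level-only listing.
  static List<String> listSchemas(List<String> allSchemaPaths, String viewParam) {
    if ("hierarchical".equals(viewParam)) {
      return allSchemaPaths.stream()
          .filter(p -> !p.contains(":"))   // top-level schemas only
          .collect(Collectors.toList());
    }
    return allSchemaPaths;                 // legacy behavior: flat, complete list
  }
}
```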
**2. No design for Java/Python client API hierarchy representation**
The doc only covers REST. How the Java/Python client APIs express nested
schema paths — whether as opaque strings with `:`, a new `SchemaPath` type, or
varargs in `NameIdentifier` — is unaddressed. This is a public API surface
decision that should be part of the design.
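For illustration, one possible shape for such a type (the `SchemaPath` class below is hypothetical, not an existing Gravitino API): an explicit value type keeps the nesting structural instead of packing levels into a separator-joined string.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical client-side value type for a nested schema path.
public final class SchemaPath {
  private final List<String> levels;

  private SchemaPath(List<String> levels) {
    this.levels = List.copyOf(levels);
  }

  public static SchemaPath of(String... levels) {
    if (levels.length == 0) {
      throw new IllegalArgumentException("empty schema path");
    }
    return new SchemaPath(Arrays.asList(levels));
  }

  // Parent path, or null for a top-level schema.
  public SchemaPath parent() {
    if (levels.size() == 1) {
      return null;
    }
    return new SchemaPath(levels.subList(0, levels.size() - 1));
  }

  public List<String> levels() {
    return levels;
  }

  @Override
  public String toString() {
    return String.join(":", levels);  // display form only; levels stay structural
  }
}
```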
**3. Existing schemas with `:` in names silently break on upgrade**
Any schema currently named `team:sales` will be reinterpreted as a nested
namespace `team → sales` after this feature is enabled. No migration scan or
validation strategy is mentioned.
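A pre-upgrade scan could surface these conflicts before the feature is enabled so operators can rename or escape them. A minimal sketch (hypothetical helper, not something the doc proposes):

```java
import java.util.List;
import java.util.stream.Collectors;

public class UpgradeScan {
  // Hypothetical pre-upgrade check: flag any existing schema whose name
  // contains the hierarchy separator, since such names would be
  // reinterpreted as nested namespaces after the upgrade.
  static List<String> findConflictingSchemas(List<String> existingNames, String separator) {
    return existingNames.stream()
        .filter(name -> name.contains(separator))
        .collect(Collectors.toList());
  }
}
```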
**4. No atomicity for parent-chain creation**
Creating `A:B:C` creates three schemas (`A`, `A.B`, `A.B.C`) in sequence. If
the second step fails, `A` is left as an orphan with no rollback or cleanup
defined.
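A best-effort rollback would at least narrow the orphan window. Sketched below with an in-memory list standing in for the real store and a `failAt` parameter simulating a mid-chain failure; true atomicity would need a transactional backend, which the doc does not discuss.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

public class ParentChainCreate {
  // Sketch: create the parent chain in order, recording what this request
  // created; on failure, drop those schemas in reverse order so no orphan
  // parents survive. Returns the schemas left in the (simulated) store.
  static List<String> createChain(List<String> chain, String failAt) {
    List<String> store = new ArrayList<>();      // stands in for the real backend
    Deque<String> createdByUs = new ArrayDeque<>();
    try {
      for (String schema : chain) {
        if (schema.equals(failAt)) {             // simulate a create failure
          throw new RuntimeException("create failed at " + schema);
        }
        store.add(schema);
        createdByUs.push(schema);
      }
    } catch (RuntimeException e) {
      while (!createdByUs.isEmpty()) {
        store.remove(createdByUs.pop());         // best-effort cleanup, newest first
      }
    }
    return store;
  }
}
```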
**5. Privilege escalation during auto-created parent schemas**
If `A` does not yet exist when creating `A:B:C`, the requester becomes owner
of `A`. A user with permission only to create leaf namespaces should not
implicitly become owner of a top-level namespace.
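One possible guard, sketched with hypothetical names: verify the requester's create privilege at every missing ancestor level before auto-creating anything, and reject the request otherwise instead of silently granting top-level ownership.

```java
import java.util.List;
import java.util.function.Predicate;

public class AutoCreateGuard {
  // Hypothetical policy check: auto-creation of missing ancestors is
  // allowed only if the requester holds the create privilege at every
  // missing level; otherwise the whole request should be rejected.
  static boolean canAutoCreate(
      List<String> missingAncestors, Predicate<String> hasCreatePrivilege) {
    return missingAncestors.stream().allMatch(hasCreatePrivilege);
  }
}
```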
**6. Physical storage `.` separator conflicts with `NameIdentifier`
semantics**
The doc stores `A:B:C` as physical schema name `A.B.C`. Gravitino's
`NameIdentifier` uses `.` as its level separator throughout the codebase. Any
code path that reconstructs a `NameIdentifier` by splitting a full dotted path
string will misparse `A.B.C` as three separate levels instead of one schema
name. `NameIdentifier.java` and `NameIdentifierUtil.java` are listed as
affected classes but the doc does not explain how this conflict is resolved.
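The collision is easy to reproduce: any code that recovers levels from a dotted full name cannot distinguish the stored `A.B.C` (one schema) from three real levels. A minimal illustration in plain Java, not the actual Gravitino classes:

```java
public class DotCollision {
  // Mimics any NameIdentifier-style parser that treats '.' as the level
  // separator when reconstructing an identifier from a full dotted path.
  static String[] parseLevels(String fullName) {
    return fullName.split("\\.");
  }

  public static void main(String[] args) {
    // The doc stores hierarchical schema "A:B:C" under physical name "A.B.C".
    String physical = "A.B.C";
    // A dot-splitting parser sees three levels, but this is ONE schema name.
    String[] levels = parseLevels(physical);
    System.out.println(levels.length); // 3
  }
}
```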
**7. Configurable separator has no migration path**
If an operator changes the configured separator after data is already
stored, all existing hierarchical schema names become unresolvable. The
separator should be documented as immutable after first use, or a migration
procedure should be defined.
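A fail-fast compatibility check at startup is one way to enforce immutability; a sketch with hypothetical names (the real fix could equally be a documented constraint plus a migration tool):

```java
public class SeparatorGuard {
  // Hypothetical startup check: compare the separator persisted on first
  // use against the currently configured one. A mismatch means stored
  // hierarchical names would become unresolvable, so the server should
  // refuse to start rather than serve broken lookups.
  static boolean isCompatible(String storedSeparator, String configuredSeparator) {
    return storedSeparator == null                     // first use: nothing stored yet
        || storedSeparator.equals(configuredSeparator);
  }
}
```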
**8. Authorization parent-chain walking adds unaddressed per-request
overhead**
Every `USE_SCHEMA` check now walks the parent chain (`A:B:C` → `A:B` → `A`).
The performance section covers listing latency but does not address the
authorization check overhead for high-QPS workloads.
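The cost is linear in nesting depth: each `USE_SCHEMA` check expands into one privilege lookup per ancestor. A sketch of the chain expansion (hypothetical helper, assuming `:` as the separator):

```java
import java.util.ArrayList;
import java.util.List;

public class ParentChainAuth {
  // Expands "A:B:C" into the chain the authorizer must check:
  // ["A:B:C", "A:B", "A"]. Each entry means one privilege lookup,
  // on every USE_SCHEMA check.
  static List<String> parentChain(String schemaPath) {
    List<String> chain = new ArrayList<>();
    String current = schemaPath;
    while (true) {
      chain.add(current);
      int idx = current.lastIndexOf(':');
      if (idx < 0) {
        break;               // reached the top-level schema
      }
      current = current.substring(0, idx);
    }
    return chain;
  }
}
```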
**9. "Drop nested namespace" section is incomplete**
The sentence "Will drop the schemas of the Gravitino" is cut off. The
section also leaves cascade behavior unspecified: it does not say whether
tables inside dropped schemas are deleted or whether the drop is rejected
while tables exist.
**10. `toPhysicalSchemaName` breaks if a namespace segment contains `.`**
The snippet uses `hierarchicalPath.replace(":", ".")`. If any namespace
segment already contains `.` (e.g. a Hive schema named `my.schema`), the output
is ambiguous and cannot be parsed back correctly.
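To make the ambiguity concrete: two different inputs collapse to the same physical name. Escaping literal separators before substitution is one possible fix (hypothetical helper names, not from the doc) that keeps the mapping reversible:

```java
public class SeparatorEscape {
  // The doc's approach: lossy when a segment already contains '.'.
  static String toPhysicalSchemaNameLossy(String hierarchicalPath) {
    return hierarchicalPath.replace(":", ".");
  }

  // One possible fix: escape literal '.' in segments before substituting
  // the separator, so the physical name parses back unambiguously.
  // (A full solution would also escape the escape character itself.)
  static String toPhysicalSchemaNameEscaped(String hierarchicalPath) {
    return hierarchicalPath.replace(".", "\\.").replace(":", ".");
  }

  public static void main(String[] args) {
    // Distinct inputs, identical lossy output:
    System.out.println(toPhysicalSchemaNameLossy("team:my.schema")); // team.my.schema
    System.out.println(toPhysicalSchemaNameLossy("team.my:schema")); // team.my.schema
    // The escaped form keeps them distinct:
    System.out.println(toPhysicalSchemaNameEscaped("team:my.schema")); // team.my\.schema
  }
}
```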
**11. Cache invalidation not addressed for distributed deployments**
The performance mitigation mentions caching per catalog with TTL
invalidation on mutations, but does not address how multiple Gravitino server
instances stay consistent — servers could serve stale hierarchy views after
schema mutations.
**12. Connector code change scope is understated**
The compatibility section implies this is a server-side change. In practice,
Spark, Flink, and Trino connectors all require explicit code changes (the
"Connector NestedNameIdentifier Conversion" section describes a non-trivial
translation layer). The connector change scope should be explicitly called out.