roryqi commented on PR #10720:
URL: https://github.com/apache/gravitino/pull/10720#issuecomment-4311788210
> ### esign Doc Review
> Found 12 issues:
>
> **1. `listSchemas` default behavior is a breaking REST API change**
Changing `GET /schemas` to return only top-level schemas (instead of all
schemas) silently breaks existing clients with no error. Needs a migration
strategy or an explicit opt-in flag (e.g. `?view=hierarchical`).
>
> **2. No design for Java/Python client API hierarchy representation** The
doc only covers REST. How the Java/Python client APIs express nested schema
paths — opaque strings with `:`, a new `SchemaPath` type, or varargs in
`NameIdentifier` — is unaddressed. This is a public API surface decision that
should be part of the design.
>
> **3. Existing schemas with `:` in names silently break on upgrade** Any
schema currently named `team:sales` will be reinterpreted as a nested namespace
`team → sales` after this feature is enabled. No migration scan or validation
strategy is mentioned.
>
> **4. No atomicity for parent-chain creation** Creating `A:B:C` creates
three schemas (`A`, `A.B`, `A.B.C`) in sequence. If the second step fails, `A`
is left as an orphan with no rollback or cleanup defined.
>
> **5. Privilege escalation during auto-created parent schemas** If `A` does
not yet exist when creating `A:B:C`, the requester becomes owner of `A`. A user
with permission only to create leaf namespaces should not implicitly become
owner of a top-level namespace.
>
> **6. Physical storage `.` separator conflicts with `NameIdentifier`
semantics** The doc stores `A:B:C` as physical schema name `A.B.C`. Gravitino's
`NameIdentifier` uses `.` as its level separator throughout the codebase. Any
code path that reconstructs a `NameIdentifier` by splitting a full dotted path
string will misparse `A.B.C` as three separate levels instead of one schema
name. `NameIdentifier.java` and `NameIdentifierUtil.java` are listed as
affected classes but the doc does not explain how this conflict is resolved.
>
> **7. Configurable separator has no migration path** If an operator changes
the configured separator after data is already stored, all existing
hierarchical schema names become unresolvable. The separator should be
documented as immutable after first use, or a migration procedure should be
defined.
>
> **8. Authorization parent-chain walking adds unaddressed per-request
overhead** Every `USE_SCHEMA` check now walks the parent chain (`A:B:C` → `A:B`
→ `A`). The performance section covers listing latency but does not address the
authorization check overhead for high-QPS workloads.
>
> **9. "Drop nested namespace" section is incomplete** The sentence "Will
drop the schemas of the Gravitino" is cut off. The cascade behavior also does
not specify whether tables inside dropped schemas are deleted or whether the
operation is rejected if tables exist.
>
> **10. `toPhysicalSchemaName` breaks if a namespace segment contains `.`**
The snippet uses `hierarchicalPath.replace(":", ".")`. If any namespace segment
already contains `.` (e.g. a Hive schema named `my.schema`), the output is
ambiguous and cannot be parsed back correctly.
>
> **11. Cache invalidation not addressed for distributed deployments** The
performance mitigation mentions caching per catalog with TTL invalidation on
mutations, but does not address how multiple Gravitino server instances stay
consistent — servers could serve stale hierarchy views after schema mutations.
>
> **12. Connector code change scope is understated** The compatibility
section implies this is a server-side change. In practice, Spark, Flink, and
Trino connectors all require explicit code changes (the "Connector
NestedNameIdentifier Conversion" section describes a non-trivial translation
layer). The connector change scope should be explicitly called out.
I will follow to solve the issues.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]