Re: [PR] [Design Docs] Iceberg REST supports nested namespace [gravitino]

via GitHub Fri, 24 Apr 2026 01:35:55 -0700


roryqi commented on PR #10720:
URL: https://github.com/apache/gravitino/pull/10720#issuecomment-4311788210


   > ### esign Doc Review
   > Found 12 issues:
   > 
   > **1. `listSchemas` default behavior is a breaking REST API change** 
Changing `GET /schemas` to return only top-level schemas (instead of all 
schemas) silently breaks existing clients with no error. Needs a migration 
strategy or an explicit opt-in flag (e.g. `?view=hierarchical`).
   > 
   > **2. No design for Java/Python client API hierarchy representation** The 
doc only covers REST. How the Java/Python client APIs express nested schema 
paths — opaque strings with `:`, a new `SchemaPath` type, or varargs in 
`NameIdentifier` — is unaddressed. This is a public API surface decision that 
should be part of the design.
   > 
   > **3. Existing schemas with `:` in names silently break on upgrade** Any 
schema currently named `team:sales` will be reinterpreted as a nested namespace 
`team → sales` after this feature is enabled. No migration scan or validation 
strategy is mentioned.
   > 
   > **4. No atomicity for parent-chain creation** Creating `A:B:C` creates 
three schemas (`A`, `A.B`, `A.B.C`) in sequence. If the second step fails, `A` 
is left as an orphan with no rollback or cleanup defined.
   > 
   > **5. Privilege escalation during auto-created parent schemas** If `A` does 
not yet exist when creating `A:B:C`, the requester becomes owner of `A`. A user 
with permission only to create leaf namespaces should not implicitly become 
owner of a top-level namespace.
   > 
   > **6. Physical storage `.` separator conflicts with `NameIdentifier` 
semantics** The doc stores `A:B:C` as physical schema name `A.B.C`. Gravitino's 
`NameIdentifier` uses `.` as its level separator throughout the codebase. Any 
code path that reconstructs a `NameIdentifier` by splitting a full dotted path 
string will misparse `A.B.C` as three separate levels instead of one schema 
name. `NameIdentifier.java` and `NameIdentifierUtil.java` are listed as 
affected classes but the doc does not explain how this conflict is resolved.
   > 
   > **7. Configurable separator has no migration path** If an operator changes 
the configured separator after data is already stored, all existing 
hierarchical schema names become unresolvable. The separator should be 
documented as immutable after first use, or a migration procedure should be 
defined.
   > 
   > **8. Authorization parent-chain walking adds unaddressed per-request 
overhead** Every `USE_SCHEMA` check now walks the parent chain (`A:B:C` → `A:B` 
→ `A`). The performance section covers listing latency but does not address the 
authorization check overhead for high-QPS workloads.
   > 
   > **9. "Drop nested namespace" section is incomplete** The sentence "Will 
drop the schemas of the Gravitino" is cut off. The cascade behavior also does 
not specify whether tables inside dropped schemas are deleted or whether the 
operation is rejected if tables exist.
   > 
   > **10. `toPhysicalSchemaName` breaks if a namespace segment contains `.`** 
The snippet uses `hierarchicalPath.replace(":", ".")`. If any namespace segment 
already contains `.` (e.g. a Hive schema named `my.schema`), the output is 
ambiguous and cannot be parsed back correctly.
   > 
   > **11. Cache invalidation not addressed for distributed deployments** The 
performance mitigation mentions caching per catalog with TTL invalidation on 
mutations, but does not address how multiple Gravitino server instances stay 
consistent — servers could serve stale hierarchy views after schema mutations.
   > 
   > **12. Connector code change scope is understated** The compatibility 
section implies this is a server-side change. In practice, Spark, Flink, and 
Trino connectors all require explicit code changes (the "Connector 
NestedNameIdentifier Conversion" section describes a non-trivial translation 
layer). The connector change scope should be explicitly called out.
   
   I will follow to solve the issues.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Re: [PR] [Design Docs] Iceberg REST supports nested namespace [gravitino]

Reply via email to