roryqi commented on code in PR #10720:
URL: https://github.com/apache/gravitino/pull/10720#discussion_r3099306396


##########
design-docs/iceberg-supported-nested-namespace.md:
##########
@@ -0,0 +1,398 @@
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+# [Iceberg REST] Supported Nested Namespace Design
+
+## Background
+
+This document describes one practical solution to support Iceberg nested namespaces in Gravitino.
+The scope is not only UI privilege granting, but also namespace mapping, identifier handling,
+authorization scope, and compatibility behavior across Iceberg REST and Gravitino.
+
+References:
+
+- https://github.com/apache/gravitino/blob/main/docs/security/access-control.md
+- https://github.com/apache/gravitino/blob/main/docs/iceberg-rest-service.md
+- https://github.com/apache/gravitino/blob/main/docs/manage-relational-metadata-using-gravitino.md
+- https://github.com/apache/gravitino/discussions/7296
+
+## Goal
+
+- Support nested namespace operations from Iceberg REST to Gravitino through schema mapping.
+- Support privilege granting for different nested namespace scopes (including UI workflow).
+- Keep metadata model stable and avoid heavy refactor.
+
+
+## Solution Options
+
+### Option A: Add a new metadata object `NestedNamespace`
+
+Use a new metadata object `NestedNamespace` to represent nested namespace explicitly.
+`NestedNamespace` has a one-to-one mapping with Iceberg `Namespace` to avoid ambiguity
+with existing Gravitino `Namespace` concepts.
+
+Catalog -> NestedNamespace a -> NestedNamespace a.b -> Table a.b.c
+                              -> NestedNamespace a.c -> NestedNamespace a.c.d -> Table a.c.d.e
+
+Pros:
+
+- Clearer concept modeling.
+
+Cons:
+
+- Large refactor across metadata model, API, authorization, and UI.
+
+### Option B (Recommended): Reuse `Schema` entity and enhance schema expression capability
+
+Keep physical metadata unchanged (still persisted as `Schema`) and introduce
+`HierarchicalSchema` as a logical expression layer in Iceberg REST adaptation,
+identifier rendering, and authorization scope matching.
+
+Pros:
+
+- Low-impact evolution path without introducing a new metadata entity.
+- Decouples nested namespace semantics from `.` and reduces parser ambiguity.
+- Reuses existing metadata and authorization model to reduce implementation risk.
+
+Cons:
+
+- Requires explicit conversion rules between logical path and physical schema name.
+- Authorization matching and identifier serialization become more complex.
+#### Option B Separator (Fixed): Use `:` as logical separator
+
+Examples:
+
+- `A:B:C` as logical `HierarchicalSchema` path.
+- Physical schema name remains mapped through conversion layer.
+
+Pros:
+
+- Better readability than escaping `.` in many clients and UI forms.
+- Lower routing conflict risk than `/`.
+- Easier to keep backward compatibility with existing non-nested schema handling.
+
+Cons:
+
+- Needs clear validation rule to avoid ambiguity with existing schema names containing `:`.
+
+## Design
+
+### Identifier Rules
+
+- Introduce logical identifier concept: `HierarchicalSchema`.
+- `HierarchicalSchema` uses fixed separator `:` in logic layer.
+- `:` is mandatory in this design, not configurable.
+- For Gravitino REST create/update schema APIs, `request.getName()` keeps the logical schema name
+  and may contain `:` (for example `A:B` or `A:B:C`).
+- This phase does not pre-convert `request.getName()` from `:` to `.` before capability name
+  validation; capability naming rules must explicitly allow `:` for schema scope.
+- Schema name uses `:`-separated hierarchical representation in this design; `.` is not used as
+  schema separator.
+- `.` remains an Iceberg namespace hierarchy notation in external semantics only.
+- When `:` is used as separator, segment values containing `:` must be percent-encoded.
+- `%` must also be encoded as `%25` to avoid decode ambiguity.
+- Encoding/decoding order:
+  - Serialize: encode each segment first, then join with `:`.
+  - Parse: split by `:`, then decode each segment.
+- Decode is applied exactly once to avoid double-decoding issues.
+- Keep flat storage model and convert `HierarchicalSchema` path to physical schema name by mapping rules.
+- Identifier rendering rule:
+  - Use encoded `HierarchicalSchema` path directly in schema position.
+  - Do not rely on single-quote wrapping for schema disambiguation in this phase.
+
+Examples:
+
+- Nested namespace `A:B` maps to logical `HierarchicalSchema` path `A:B`.
+- Nested namespace `A:B:C` maps to logical `HierarchicalSchema` path `A:B:C`.
+- Logical `HierarchicalSchema` path is then converted to physical schema name through mapping rules.
+- Namespace levels `["team:core", "sales"]` are serialized as `team%3Acore:sales`.
+- Parsing `team%3Acore:sales` returns `["team:core", "sales"]`.
+- Identifier rendering example:
+  - `metalake.catalog.A:B.table1`
+  - `metalake.catalog.team%3Acore:sales.table2`
+- In UI display, show logical path (for example `A:B:C`) for readability.
+- HTTP transport rule:
+  - Namespace values in URL/query/body must follow RFC 3986 percent-encoding when needed.
+  - Example: namespace path `team%3Acore:sales` should be URL-encoded as
+    `team%253Acore%3Asales` when put into a URL query component.
+
+
+### Iceberg REST Side Behavior
+
+- **Create nested namespace**:
+  - Creating `A:B:C` will create (or ensure existence of) three schemas in Gravitino: `A`, `A.B`, and `A.B.C`.
+  - Set the created namespace owner as current user.
+- **Update nested namespace**:
+  - Support updating namespace properties through mapped schema operations.
+  - Property update is applied to the mapped target namespace scope.
+- **Drop nested namespace**: drop corresponding schema in Gravitino.
+- **Rename nested namespace**: not needed because Iceberg REST does not support namespace rename.
+
+### Gravitino Side Behavior
+
+- `list schema` should express nested hierarchy semantics for users.
+- `list schema` REST API (GET `/metalakes/{metalake}/catalogs/{catalog}/schemas`) should support an
+  optional query parameter `parentHierarchicalSchema`.
+  - When `parentHierarchicalSchema` is not provided, return only top-level schemas (first layer).
+  - When `parentHierarchicalSchema` is provided, return only the direct child schemas under the
+    given parent (next layer), instead of the full subtree.
+  - `parentHierarchicalSchema` value follows the same logical `HierarchicalSchema` encoding
+    rules as described in `Identifier Rules` (segment percent-encoding, and then RFC 3986
+    percent-encoding for transport in a query component).
+- Gravitino does not provide a dedicated `list sub-schema` API; hierarchy is expressed via
+  `list schema`/`list namespaces` results.
+- Example: for schemas `A`, `B`, `A:B`, `A:B:C`, hierarchy view is `A -> A:B -> A:B:C` and `B`;
+  root listing returns `A` and `B`, and querying parent `A` returns `A:B`.
+- To make nested semantics explicit, `list namespaces` should express parent-child relationships
+  (hierarchical view) even when underlying storage is flat.
+- Example hierarchical view from flat schemas: `A` -> `A:B` -> `A:B:C`, and `B` as another root.
+- This list-level hierarchical expression is the primary semantic model for users, reducing
+  ambiguity caused by one request creating multiple physical schema objects.
+- Gravitino server REST supports namespace create/update/drop operations for nested namespace
+  workflows, aligned with Iceberg REST behavior.
+- Existing schema/table APIs remain compatible with non-nested cases.
+
+### Performance Considerations
+
+- Current `EntityStore` does not support prefix matching on schema names.
+- As a result, `list schema` with `parentHierarchicalSchema` cannot be implemented as an
+  efficient prefix query at storage layer. The server must list all schemas in the catalog and
+  then compute the top-level / direct-children view in memory.
+- This may introduce higher-than-expected latency and load for catalogs with a large number of
+  schemas.
+
+Mitigations:
+
+- Cache computed hierarchy results per catalog with a short TTL and invalidate on schema
+  create/update/drop.
+- Enforce reasonable limits (pagination / maximum returned items) for list operations.
+- Add performance/regression tests with large schema counts to validate behavior.
+- Consider enhancing `EntityStore` to support prefix/index queries as a follow-up optimization.

Review Comment:
   I don't want to introduce a cache either. I would rather have `EntityStore` support prefix
   queries.
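
   To illustrate what a prefix query would buy us, here is a rough sketch. It is not
   `EntityStore` code: a sorted in-memory `NavigableSet` stands in for a storage index, and
   `DirectChildLister` and `directChildren` are hypothetical names. It assumes schema names
   use the `:` separator with segments already percent-encoded as in `Identifier Rules`, so
   a literal `:` is always a level boundary.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical helper; a sorted set stands in for a storage-level index on schema names.
class DirectChildLister {
  private static final char SEP = ':';

  /**
   * Returns the direct children of {@code parent} (null or empty means the root level).
   * Only names in the range [prefix, prefix + '\uffff') are visited, which is the
   * in-memory analogue of a prefix query. Names whose next character is '\uffff'
   * itself fall outside this range; the sketch ignores that edge case.
   */
  static List<String> directChildren(NavigableSet<String> sortedNames, String parent) {
    String prefix = (parent == null || parent.isEmpty()) ? "" : parent + SEP;
    NavigableSet<String> range =
        prefix.isEmpty()
            ? sortedNames
            : sortedNames.subSet(prefix, true, prefix + Character.MAX_VALUE, false);
    List<String> children = new ArrayList<>();
    for (String name : range) {
      // Skip an empty trailing segment and anything deeper than the next level.
      boolean hasSegment = name.length() > prefix.length();
      boolean isNextLevel = name.indexOf(SEP, prefix.length()) < 0;
      if (hasSegment && isNextLevel) {
        children.add(name);
      }
    }
    return children;
  }

  public static void main(String[] args) {
    NavigableSet<String> names =
        new TreeSet<>(Arrays.asList("A", "B", "A:B", "A:B:C", "AB"));
    System.out.println(directChildren(names, null));  // [A, AB, B]
    System.out.println(directChildren(names, "A"));   // [A:B]
    System.out.println(directChildren(names, "A:B")); // [A:B:C]
  }
}
```

   With a parent given, only that subtree is scanned rather than every schema in the catalog.
   The root listing still walks all names here, so it would need its own handling.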


