jerryshao commented on code in PR #10720:
URL: https://github.com/apache/gravitino/pull/10720#discussion_r3146795568
##########
design-docs/iceberg-supported-nested-namespace.md:
##########
@@ -0,0 +1,573 @@
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements. See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership. The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License. You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied. See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+# [Iceberg REST] Supported Nested Namespace Design
+
+## Background
+
+This document describes one practical solution to support Iceberg nested namespaces in Gravitino.
+The scope is not only UI privilege granting, but also namespace mapping, identifier handling,
+authorization scope, and compatibility behavior across Iceberg REST and Gravitino.
+
+References:
+
+- https://github.com/apache/gravitino/blob/main/docs/security/access-control.md
+- https://github.com/apache/gravitino/blob/main/docs/iceberg-rest-service.md
+- https://github.com/apache/gravitino/blob/main/docs/manage-relational-metadata-using-gravitino.md
+- https://github.com/apache/gravitino/discussions/7296
+
+## Goal
+
+- Support nested namespace operations from Iceberg REST to Gravitino through schema mapping.
+- Support privilege granting for different nested namespace scopes (including UI workflow).
+- Keep metadata model stable and avoid heavy refactor.
+
+
+## Solution Options
+
+### Option A: Add a new metadata object `NestedNamespace`
+
+Use a new metadata object `NestedNamespace` to represent nested namespace explicitly.
+`NestedNamespace` has a one-to-one mapping with Iceberg `Namespace` to avoid ambiguity
+with existing Gravitino `Namespace` concepts.
+
+Catalog -> NestedNamespace a -> NestedNamespace a.b -> Table a.b.c
+ -> NestedNamespace a.c -> NestedNamespace a.c.d -> Table a.c.d.e
+
+Pros:
+
+- Clearer concept modeling.
+
+Cons:
+
+- Large refactor across metadata model, API, authorization, and UI.
+
+### Option B (Recommended): Reuse `Schema` entity and enhance schema expression capability
+
+Keep physical metadata unchanged (still persisted as `Schema`) and introduce
+`HierarchicalSchema` as a logical expression layer in Iceberg REST adaptation,
+identifier rendering, and authorization scope matching.
+
+Pros:
+
+- Low-impact evolution path without introducing a new metadata entity.
+- Decouples nested namespace semantics from `.` and reduces parser ambiguity.
+- Reuses existing metadata and authorization model to reduce implementation risk.
+
+Cons:
+
+- Requires explicit conversion rules between logical path and physical schema name.
+- Authorization matching and identifier serialization become more complex.
+#### Option B Separator: Configurable logical separator (default `:`)
+
+Examples:
+
+- `A:B:C` as logical `HierarchicalSchema` path when separator is `:`.
+- Physical schema name remains mapped through conversion layer.
+
+Pros:
+
+- Better readability than escaping `.` in many clients and UI forms.
+- Lower routing conflict risk than `/`.
+- Easier to keep backward compatibility with existing non-nested schema handling.
+
+Cons:
+
+- Needs clear validation rule to avoid ambiguity with existing schema names containing configured
+  separator.
+
+## Design
+
+### Identifier Rules
+
+- Introduce logical identifier concept: `HierarchicalSchema`.
+- `HierarchicalSchema` uses a configurable logical separator (default `:`) in API/logic layer.
+- For Gravitino REST create/update schema APIs, `request.getName()` keeps the logical schema name
+  and may contain `:` (for example `A:B` or `A:B:C`).
+- Before persisting to `EntityStore`, schema path is normalized to `.`-separated physical schema
+  name.
+- Escaping strategy: each path segment is encoded before physical flattening to avoid ambiguity.
+- Configured logical separator is reserved as hierarchy separator and is not allowed inside one
+  namespace segment.
+- `.` inside one segment is allowed and must not be treated as hierarchy separator.
+- Parsing is direct split/join by configured logical separator at API boundary.
+- Keep flat storage model and convert `HierarchicalSchema` path to physical schema name by mapping rules.
+- Identifier rendering rule:
+  - Use encoded `HierarchicalSchema` path directly in schema position.
+  - Do not rely on single-quote wrapping for schema disambiguation in this phase.
+
+Examples:
+
+- Nested namespace `A:B` maps to logical `HierarchicalSchema` path `A:B` (assuming configured
+  separator is `:`).
+- Nested namespace `A:B:C` maps to logical `HierarchicalSchema` path `A:B:C` (assuming configured
+  separator is `:`).
+- Logical `HierarchicalSchema` path is then converted to physical schema name through mapping rules.
+- Namespace levels `["team", "sales"]` are serialized using configured separator, e.g.
+  `team:sales`.
+- Parsing `team:sales` returns `["team", "sales"]` when separator is `:`.
+- Identifier rendering example:
+  - `metalake.catalog.A:B.table1`
+  - `metalake.catalog.team:sales.table2`
+- In UI display and API transport, use logical path directly (for example `A:B:C`).
+
+### Physical Name Mapping and Reversibility
+
+- **Persisted schema name in `EntityStore` always uses `.` as the internal storage separator** for
+  stable storage semantics.
+- External request/response handling uses configured logical separator and converts at API boundary.
+- Connector-facing behavior remains Iceberg-compatible and does not require users to configure or
+  input internal storage representation.
+- Mapping must be reversible:
+  - `logical path segments` -> `encode each segment` -> `join by '.'` for physical storage.
+  - physical schema name -> `split by '.'` -> `decode each segment` -> logical path segments.
+  - This avoids ambiguity when one segment contains `.` (for example `my.schema`).
+
+### Existing Name Compatibility and Migration Guard
+
+- Before enabling nested namespace mode for a catalog, run a pre-check scan on existing schema names
+  against configured logical separator.
+- If existing schema names conflict with selected separator, enabling is rejected with actionable
+  error and user can choose another separator or rename conflicting schema names.
+- Once nested mode is enabled for a catalog, creating new schema names containing configured logical
+  separator as plain text is rejected.
+
+### Delimiter Configuration Policy
+
+- Logical separator is configurable (for example `:`, `;`, `$`) and can be chosen to avoid conflict
+  with existing names.
+- Physical separator in storage remains fixed as `.`.
+- Recommended: keep logical separator stable after nested namespace is enabled for a catalog.
+- Delimiter is configured at server level for nested-namespace parsing behavior.
+
+Two delimiter-governance options are under evaluation:
+
+- **Option 1 (restricted delimiter set)**:
+  - Server only accepts delimiters from a predefined allowed set.
+  - Delimiter is treated as a reserved hierarchy marker in nested-aware flows.
+  - Behavior difference for new object creation:
+    - Iceberg nested schema creation that uses delimiter as hierarchy path is allowed.
+    - Hive creation of a non-nested schema name that contains the configured delimiter is rejected
+      with validation error.
+- **Option 2 (unrestricted delimiter)**:
+  - Server allows any delimiter value configured by users.
+  - Compatibility is prioritized for engines that treat schema name as plain string.
+  - Behavior difference for new object creation:
+    - Hive can create a non-nested schema successfully even when the schema name contains the
+      configured delimiter.
+    - Nested interpretation is applied only in nested-aware request paths (for example Iceberg
+      namespace APIs), not as a blanket rule for all engines.
+
+### Delimiter Validation and Rejection Rationale
+
+Delimiter validity should be explicit and observable, not implicit.
+
+- Validation checkpoints:
+  - Validate delimiter when server starts or when delimiter configuration is updated.
+  - Re-validate against existing catalog schema names before enabling nested mode for a catalog.
+  - Reject invalid delimiter configuration early before request-time namespace operations.

Review Comment:
   If the delimiter configuration is invalid, will the server fail to start? What is the intended behavior?
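
For reference, the reversible mapping described in the quoted design (encode each segment, join by `.`; split by `.`, decode each segment) could be sketched roughly as below. This is only a sketch under stated assumptions: the design does not name an encoding scheme, so percent-encoding of `%` and `.` is an assumption, and the class `HierarchicalSchemaNames` and its methods are hypothetical names, not Gravitino APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration only; not part of Gravitino. */
final class HierarchicalSchemaNames {
  // The quoted design fixes '.' as the physical storage separator.
  private static final String PHYSICAL_SEPARATOR = ".";

  private HierarchicalSchemaNames() {}

  /** Splits a logical path such as "A:B:C" by the configured logical separator. */
  static List<String> parse(String logicalPath, String separator) {
    List<String> segments = new ArrayList<>();
    // Limit -1 keeps trailing empty strings so that "A:" is rejected below.
    for (String segment : logicalPath.split(Pattern.quote(separator), -1)) {
      if (segment.isEmpty()) {
        throw new IllegalArgumentException("Empty segment in: " + logicalPath);
      }
      segments.add(segment);
    }
    return segments;
  }

  /** Encodes each segment (assumed scheme: percent-encoding), then joins by '.'. */
  static String toPhysical(List<String> segments) {
    List<String> encoded = new ArrayList<>();
    for (String segment : segments) {
      // Escape '%' first so an encoded '.' can never be confused with literal text.
      encoded.add(segment.replace("%", "%25").replace(".", "%2E"));
    }
    return String.join(PHYSICAL_SEPARATOR, encoded);
  }

  /** Splits a physical name by '.', then decodes each segment. */
  static List<String> toLogical(String physicalName) {
    List<String> segments = new ArrayList<>();
    for (String segment : physicalName.split(Pattern.quote(PHYSICAL_SEPARATOR), -1)) {
      // Decode in the reverse order of encoding.
      segments.add(segment.replace("%2E", ".").replace("%25", "%"));
    }
    return segments;
  }
}
```

Under these assumptions, `toPhysical(parse("my.schema:sales", ":"))` yields `my%2Eschema.sales`, and `toLogical("my%2Eschema.sales")` returns `["my.schema", "sales"]`, so a `.` inside one segment survives the round trip.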
