roryqi commented on code in PR #10735:
URL: https://github.com/apache/gravitino/pull/10735#discussion_r3078542848

##########
design-docs/gravitino-function-privilege.md:
##########
@@ -0,0 +1,582 @@
+# Design of Function Privilege Control in Gravitino
+
+## Background
+
+Apache Gravitino provides a unified function registry that allows users to define custom functions (UDFs) once and share them across multiple compute engines (Trino, Spark, Flink). Gravitino currently supports full function metadata management — register, list, get, alter, drop — but **does not yet have privilege control for functions**.
+
+The existing Gravitino access control framework covers catalogs, schemas, tables, views, topics, filesets, models, tags, policies, and jobs. Each object type has well-defined privilege types (e.g., `CREATE_TABLE`, `SELECT_TABLE`, `MODIFY_TABLE`) with privilege inheritance through the metalake → catalog → schema hierarchy. Functions are the notable gap in this model:
+
+- **No function privilege types** are defined in `Privilege.Name` — any authenticated user can register, execute, alter, or drop any function.
+- **No function visibility control** — `listFunctions` returns all functions regardless of the user's permissions.
+- **No ownership tracking** — `FunctionHookDispatcher` is missing, so function ownership is not set on creation.
+
+---
+
+## Goals
+
+1. **Integrate with Existing Access Control Framework**: Define function privilege types that follow established Gravitino naming conventions and privilege inheritance patterns.
+
+2. **Function Visibility Control**: Users should only see functions they have privileges on. `listFunctions` and `getFunction` should filter results based on user permissions, following the "can't see what you can't execute" pattern found across all surveyed systems.
+
+3. **Ownership Tracking**: Functions should have owners, set automatically on registration and manageable through Gravitino's existing ownership mechanism.
+
+4. **Backward Compatibility**: Existing function management APIs remain unchanged. Privilege enforcement is additive — when authorization is disabled, behavior is identical to current functionality.
+
+---
+
+## Non-Goals
+
+1. **DEFINER/INVOKER Security Mode**: Gravitino's function metadata model does not currently include a `securityType` field. Adding security mode support to functions is a separate metadata model concern and is outside the scope of this privilege design.
+
+2. **New Function Management Capabilities**: This design adds privilege control on top of the existing function management API. No new CRUD operations or metadata model changes are introduced.
+
+3. **Per-Definition Privilege Control**: Privileges are defined at the function level, not at the individual function definition (overload) level. A user who can execute a function can execute any of its definitions.
+
+4. **Built-in Function Privilege Control**: Built-in functions provided by compute engines are not managed by Gravitino and are outside the scope of this design.
+
+---
+
+## Proposal
+
+### Privilege Types
+
+Three privilege types are defined for functions, following established Gravitino privilege naming conventions:
+
+| Privilege | Securable Object Levels | Description |
+|-----------|------------------------|-------------|
+| `REGISTER_FUNCTION` | Metalake, Catalog, Schema | Permission to register new functions |
+| `EXECUTE_FUNCTION` | Metalake, Catalog, Schema, Function | Permission to execute/invoke a function and view its metadata |
+| `MODIFY_FUNCTION` | Metalake, Catalog, Schema, Function | Permission to alter a function's metadata |
+
+**Naming rationale:**
+
+- `REGISTER_FUNCTION` — Consistent with `REGISTER_MODEL` and the `registerFunction` API method. Gravitino uses "register" for managed metadata objects (functions, models) to distinguish from "create" used for delegated objects (tables, views, schemas).
+- `EXECUTE_FUNCTION` — The universally adopted privilege name across all surveyed systems (Databricks UC, Trino, MySQL, PostgreSQL, OceanBase, Hologres). Analogous to `SELECT_TABLE` / `SELECT_VIEW` in semantics (the fundamental "use" privilege), but `EXECUTE` is the standard term for functions.
+- `MODIFY_FUNCTION` — Consistent with `MODIFY_TABLE`. Covers alter operations on function metadata (comment, definitions, properties). Drop operations require function ownership, following the same pattern as table and fileset drop (see `TableOperations.java` and `FilesetOperations.java` where DROP uses owner-only expressions).
+
+**Privilege inheritance:** Privileges granted at metalake, catalog, or schema level cascade to all functions within that scope, consistent with existing behavior for tables and filesets.
+
+**Deny privileges:** Each privilege has a corresponding deny form (`DENY_REGISTER_FUNCTION`, `DENY_EXECUTE_FUNCTION`, `DENY_MODIFY_FUNCTION`) following the existing deny privilege pattern.
+
+---
+
+### Securable Object Hierarchy
+
+Functions follow the existing three-level namespace hierarchy:
+
+```
+metalake
+  └── catalog
+        └── schema
+              └── function
+```
+
+This is consistent with tables, filesets, and other schema-scoped objects. Functions can be registered under any catalog type that supports schemas (relational, Hive, Iceberg, etc.). A new `MetadataObject.Type.FUNCTION` type and `SecurableObjects.ofFunction()` factory method are added.
+
+**Privilege applicability by level:**
+
+| Securable Object | REGISTER_FUNCTION | EXECUTE_FUNCTION | MODIFY_FUNCTION |
+|-----------------|:---:|:---:|:---:|
+| Metalake | ✅ | ✅ | ✅ |
+| Catalog | ✅ | ✅ | ✅ |
+| Schema | ✅ | ✅ | ✅ |
+| Function | — | ✅ | ✅ |
+
+> `REGISTER_FUNCTION` is not applicable at the function level because creation happens at the schema level (a function must be created within a schema).
+
+---
+
+### Visibility Control
+
+Function visibility follows the "can't see what you can't execute" pattern observed across all surveyed systems:
+
+1. **`listFunctions`** — Returns only functions the user has at least one privilege on (`EXECUTE_FUNCTION`, `MODIFY_FUNCTION`, or ownership). `REGISTER_FUNCTION` alone does not grant visibility. This is implemented via a filter expression applied to the result set, consistent with table and fileset listing.
+
+2. **`getFunction`** — Requires `EXECUTE_FUNCTION`, `MODIFY_FUNCTION`, or ownership. If the user lacks privileges, the authorization framework denies access.
+
+3. **Function definition protection** — Function definitions (implementations, source code) are part of the function metadata returned by `getFunction`. Since `getFunction` requires privilege, function definitions are protected by default.
+
+---
+
+### Authorization Pushdown — Not Applicable
+
+Unlike tables, which are delegated to underlying data sources (Hive, MySQL, Iceberg, etc.) that have their own privilege systems, **function management is fully managed by Gravitino** — all function metadata is stored in Gravitino's own database via `ManagedFunctionOperations`. There is no delegation to external catalogs.
+
+Therefore, **authorization pushdown is not needed for functions**. Gravitino's own authorization layer is the single enforcement point for all function privilege checks. This is simpler than the table model and eliminates the complexity of privilege mapping to heterogeneous data source privilege systems.
+
+---
+
+### Authorization Expressions
+
+Authorization expressions define the privilege requirements for each function operation. These follow the established pattern from tables and filesets (see `TableOperations.java` and `FilesetOperations.java` for reference implementations).
+
+#### Register Function
+
+```
+ANY(OWNER, METALAKE, CATALOG) ||
+SCHEMA_OWNER_WITH_USE_CATALOG ||
+ANY_USE_CATALOG && ANY_USE_SCHEMA && ANY_REGISTER_FUNCTION
+```
+
+- Metalake/catalog/schema owners can always register functions.
+- Non-owners need `USE_CATALOG` + `USE_SCHEMA` + `REGISTER_FUNCTION`.
+- The `accessMetadataType` is `SCHEMA` (the parent container).
+
+#### Get Function
+
+```
+ANY(OWNER, METALAKE, CATALOG) ||
+SCHEMA_OWNER_WITH_USE_CATALOG ||
+ANY_USE_CATALOG && ANY_USE_SCHEMA && (FUNCTION::OWNER || ANY_EXECUTE_FUNCTION || ANY_MODIFY_FUNCTION)
+```
+
+- Metalake/catalog/schema owners can always view function metadata.
+- Non-owners need `USE_CATALOG` + `USE_SCHEMA` + (`EXECUTE_FUNCTION` or `MODIFY_FUNCTION` or function ownership).
+- The `accessMetadataType` is `FUNCTION`.
+
+#### List Functions
+
+Listing uses **filter-based authorization** rather than deny-based, consistent with how table listing works:
+
+```
+ANY(OWNER, METALAKE, CATALOG, SCHEMA, FUNCTION) ||
+ANY_EXECUTE_FUNCTION ||
+ANY_MODIFY_FUNCTION
+```
+
+Functions are filtered from the list if the user has no matching privilege. This implements the "can't see what you can't execute" pattern. Note that `REGISTER_FUNCTION` alone does not grant visibility (consistent with `CREATE_TABLE` not granting table visibility).
+
+#### Alter Function
+
+```
+ANY(OWNER, METALAKE, CATALOG) ||
+SCHEMA_OWNER_WITH_USE_CATALOG ||
+ANY_USE_CATALOG && ANY_USE_SCHEMA && (FUNCTION::OWNER || ANY_MODIFY_FUNCTION)
+```
+
+- Metalake/catalog/schema owners and function owners can alter functions.
+- Non-owners need `USE_CATALOG` + `USE_SCHEMA` + `MODIFY_FUNCTION`.
+
+#### Drop Function
+
+```
+ANY(OWNER, METALAKE, CATALOG) ||
+SCHEMA_OWNER_WITH_USE_CATALOG ||
+ANY_USE_CATALOG && ANY_USE_SCHEMA && FUNCTION::OWNER
+```
+
+- Only function owners (and metalake/catalog/schema owners) can drop functions.
+- This follows the same pattern as `DROP_TABLE` and `DROP_FILESET`, where only the object owner can perform the drop — `MODIFY_FUNCTION` alone is not sufficient for drop.
+
+---
+
+### Implementation Changes

Review Comment:
   Could you remove this segment?
