lasdf1234 commented on code in PR #10883:
URL: https://github.com/apache/gravitino/pull/10883#discussion_r3158325081


##########
design-docs/gravitino-local-authentication.md:
##########
@@ -0,0 +1,679 @@
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one
+  or more contributor license agreements.  See the NOTICE file
+  distributed with this work for additional information
+  regarding copyright ownership.  The ASF licenses this file
+  to you under the Apache License, Version 2.0 (the
+  "License"); you may not use this file except in compliance
+  with the License.  You may obtain a copy of the License at
+
+   http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing,
+  software distributed under the License is distributed on an
+  "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+  KIND, either express or implied.  See the License for the
+  specific language governing permissions and limitations
+  under the License.
+-->
+
+# Design of Local Authentication Support in Gravitino
+
+## 1. Background
+
+Apache Gravitino already has mature support for OAuth 2.0 authentication. 
Today, Gravitino acts as
+an OAuth 2.0 client and delegates authentication to an external identity 
provider (IdP), typically
+using the Client Credentials flow together with Bearer JWT.
+
+This model works well in enterprise deployments where an external IdP such as 
Okta, Azure AD, or
+Keycloak already exists. However, it introduces friction in several important 
scenarios:
+
+- **POC and demo environments**: users want to start Gravitino in minutes, 
without first deploying
+  and configuring a dedicated IdP.
+- **Offline or isolated environments**: air-gapped, edge, or embedded 
deployments may not have
+  access to an external identity service.
+- **Data sovereignty requirements**: some organizations do not allow identity 
information to be
+  managed by an external service.
+- **Operational simplicity**: small deployments may not want the cost and 
maintenance burden of a
+  separate OAuth server.
+
+To address these cases, Gravitino should provide an optional built-in local 
IdP mode with a simple
+username/password authentication flow.
+
+---
+
+## 2. Goals
+
+1. **Lower the barrier to entry**: allow users to evaluate and use Gravitino 
without deploying an
+   external IdP.
+
+2. **Support self-contained deployments**: provide a fully local 
authentication mechanism for
+   offline, air-gapped, and privacy-sensitive environments.
+
+3. **Keep the design intentionally simple**: optimize for POC and small 
deployment scenarios rather
+   than building a full-featured general-purpose identity platform.
+
+4. **Avoid vendor lock-in**: let users run Gravitino in environments where 
third-party IdPs are
+   impractical, undesirable, or cost-prohibitive.
+
+---
+
+## 3. Proposal
+
+### 3.1 Authentication Model
+
+The local IdP is introduced as a new Gravitino authenticator mode: **basic**.
+
+When enabled, Gravitino authenticates incoming requests through HTTP Basic 
authentication:
+
+```text
+Authorization: Basic <base64(username:password)>
+```
+
+This mode is intended for quick-start deployments and isolated environments. 
It should work out of
+the box with a minimal configuration and without any dependency on an external 
identity system.
+
+### 3.2 Why Basic Authentication
+
+The surveyed systems show that local authentication support typically 
converges on
+username/password-based flows. For Gravitino, simplicity matters more than 
protocol richness:
+
+- it shortens time-to-first-use,
+- it is easy to explain and operate,
+- it fits POC and offline scenarios well,
+- and it avoids introducing token lifecycle complexity into the server.
+
+For these reasons, the initial local IdP implementation uses:
+
+| Item | Decision |
+|---|---|
+| Credential type | Username / password |
+| Password storage | Database |
+| Local token support | No |
+| Recommended deployment scope | POC, offline, and isolated scenarios |
+
+### 3.3 Why Database Storage
+
+Passwords and user/group metadata should be stored in the Gravitino relational 
store rather than in
+files:
+
+- **File-based storage** requires a server restart to add users or rotate 
passwords.
+- **Database storage** supports normal metadata-style CRUD operations and 
matches Gravitino's
+  existing persistence model.
+
+Database-backed storage is the most practical choice for a built-in local IdP.
+
+### 3.4 Module Layout
+
+The local IdP feature should be implemented as an independent Gravitino module 
rather than being
+scattered directly across the existing `server-common`, `server`, and `core` 
modules.
+
+The recommended module name is:
+
+- `authenticators:authenticator-local-idp`
+
+This naming keeps the capability grouping explicit and avoids tying the module 
name to the HTTP
+Basic transport syntax alone. Although the initial login flow uses Basic 
authentication, the module
+itself is responsible for the broader local IdP capability set, including:
+
+- local user and local group management,
+- password hashing and verification,
+- bootstrap credential handling,
+- and the local IdP management API wiring.
+
+At a high level, the implementation can still integrate with existing 
Gravitino modules as needed:
+
+- `core` for relational storage integration,
+- `server-common` for authenticator/filter extension points,
+- and `server` for REST resource exposure.
+
+However, the Local IdP-specific logic should be owned primarily by
+`authenticators:authenticator-local-idp` so that the feature has a clear 
packaging boundary and can
+evolve independently from the generic server authentication framework.
+
+---
+
+## 4. Password Hashing
+
+User credentials must never be stored in plaintext. Passwords are stored as 
password hashes in the
+database.
+
+Among the common password hashing algorithms, **Argon2id** is the recommended 
choice for
+Gravitino.
+
+| Algorithm | Status |
+|---|---|
+| Argon2id | Recommended default |
+
+The initial design uses **Argon2id** as the only supported algorithm, which 
keeps the
+implementation simple while aligning with modern password storage 
recommendations.
+
+To make this implementable, the password hashing design should also define the 
storage and
+dependency model explicitly:
+
+- introduce one dedicated server-side password-hashing dependency that 
supports Argon2id
+- store the full Argon2id hash string in `password_hash`, including algorithm 
marker, parameters,
+  salt, and hash output
+- use a self-describing format so future parameter tuning does not require 
schema changes
+
+For example, `password_hash` should store a PHC-style string such as:
+
+```text
+$argon2id$v=19$m=65536,t=3,p=1$<salt>$<hash>
+```
+
+This keeps verification logic simple and allows future upgrades of Argon2id 
cost parameters without
+introducing additional columns.
+
+---
+
+## 5. Data Model
+
+The local IdP requires three new tables:
+
+1. `local_user_meta` — local user records
+2. `local_group_meta` — local group records
+3. `local_group_user_rel` — user/group membership mapping
+
+These tables follow Gravitino's existing metadata table conventions:
+
+- numeric primary keys,
+- `audit_info`,
+- optimistic version fields,
+- and `deleted_at` for soft deletion.
+
+Soft-deleted rows in `local_user_meta` and `local_group_meta` should be 
cleaned asynchronously by
+Gravitino's GC thread, following the same lifecycle management pattern used by 
other metadata
+tables. When a local user or local group is physically removed by the GC 
thread, the implementation
+should also clean the corresponding soft-deleted rows in 
`local_group_user_rel` to avoid leaving
+orphaned membership records.
+
+Unlike Gravitino's existing `user_meta` and `group_meta` tables, 
`local_user_meta` and
+`local_group_meta` are intentionally designed as **global identity tables** 
and therefore **do not
+contain `metalake_id`**. The purpose of these tables is to store local 
authentication identities
+and credentials once at the server level, instead of duplicating the same 
login identity in every
+metalake.
+
+### 5.1 `local_user_meta`
+
+```sql
+CREATE TABLE IF NOT EXISTS `local_user_meta` (
+    `user_id` BIGINT(20) UNSIGNED NOT NULL COMMENT 'user id',
+    `user_name` VARCHAR(128) NOT NULL COMMENT 'username',
+    `password_hash` VARCHAR(1024) NOT NULL COMMENT 'hashed password',
+    `audit_info` MEDIUMTEXT NOT NULL COMMENT 'user audit info',
+    `current_version` INT UNSIGNED NOT NULL DEFAULT 1 COMMENT 'user current version',
+    `last_version` INT UNSIGNED NOT NULL DEFAULT 1 COMMENT 'user last version',
+    `deleted_at` BIGINT(20) UNSIGNED NOT NULL DEFAULT 0 COMMENT 'user deleted at',
+    PRIMARY KEY (`user_id`),
+    UNIQUE KEY `uk_un_del` (`user_name`, `deleted_at`)
+) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin COMMENT 'local user metadata';
+```
+
+### 5.2 `local_group_meta`
+
+```sql
+CREATE TABLE IF NOT EXISTS `local_group_meta` (
+    `group_id` BIGINT(20) UNSIGNED NOT NULL COMMENT 'group id',
+    `group_name` VARCHAR(128) NOT NULL COMMENT 'group name',
+    `audit_info` MEDIUMTEXT NOT NULL COMMENT 'group audit info',
+    `current_version` INT UNSIGNED NOT NULL DEFAULT 1 COMMENT 'group current version',
+    `last_version` INT UNSIGNED NOT NULL DEFAULT 1 COMMENT 'group last version',
+    `deleted_at` BIGINT(20) UNSIGNED NOT NULL DEFAULT 0 COMMENT 'group deleted at',
+    PRIMARY KEY (`group_id`),
+    UNIQUE KEY `uk_gn_del` (`group_name`, `deleted_at`)
+) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin COMMENT 'local group metadata';
+```
+
+### 5.3 `local_group_user_rel`
+
+```sql
+CREATE TABLE IF NOT EXISTS `local_group_user_rel` (
+    `id` BIGINT(20) UNSIGNED NOT NULL AUTO_INCREMENT COMMENT 'auto increment id',
+    `group_id` BIGINT(20) UNSIGNED NOT NULL COMMENT 'local group id',
+    `user_id` BIGINT(20) UNSIGNED NOT NULL COMMENT 'local user id',
+    `audit_info` MEDIUMTEXT NOT NULL COMMENT 'relation audit info',
+    `current_version` INT UNSIGNED NOT NULL DEFAULT 1 COMMENT 'relation current version',
+    `last_version` INT UNSIGNED NOT NULL DEFAULT 1 COMMENT 'relation last version',
+    `deleted_at` BIGINT(20) UNSIGNED NOT NULL DEFAULT 0 COMMENT 'relation deleted at',
+    PRIMARY KEY (`id`),
+    UNIQUE KEY `uk_gi_ui_del` (`group_id`, `user_id`, `deleted_at`),
+    KEY `idx_uid` (`user_id`)
+) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_bin COMMENT 'local group user relation';
+```
+
+### 5.4 Relationship Model
+
+The logical entity relationship is straightforward:
+
+```text
+local_user_meta
+    └──< local_group_user_rel >── local_group_meta
+```
+
+For integration with Gravitino's existing access control model, the local IdP 
tables are also
+logically associated with the existing metadata tables:
+
+- `local_user_meta` is associated with `user_meta` through `user_name`
+- `local_group_meta` is associated with `group_meta` through `group_name`
+
+In other words, `local_user_meta` stores local authentication credentials, 
while `user_meta`
+continues to represent the Gravitino user object used by the current 
authorization model.
+Similarly, `local_group_meta` stores local group identities, while 
`group_meta` remains the
+authorization-side group metadata.
+
+Because `user_meta` and `group_meta` are metalake-scoped while 
`local_user_meta` and
+`local_group_meta` are global, this association should be treated as a 
**name-based logical
+mapping**, not as a database-level one-to-one foreign key constraint. A single 
local user or local
+group may correspond to multiple `user_meta` or `group_meta` entries with the 
same name across
+different metalakes.
+
+The combined relationship can be viewed as:
+
+```text
+local_user_meta --(user_name, logical mapping)--> user_meta[*]
+local_group_meta --(group_name, logical mapping)--> group_meta[*]
+
+local_user_meta
+    └──< local_group_user_rel >── local_group_meta
+```
+
+This supports direct username lookup for authentication, group resolution for 
authorization, and a
+clear mapping from local IdP identities to Gravitino's existing user/group 
metadata model while
+preserving the requirement that local identity tables remain global and 
metalake-agnostic.
+
+---
+
+## 6. Bootstrap Experience
+
+To keep local IdP mode usable immediately after installation, Gravitino should 
provision a default
+administrator account on first startup.
+
+### 6.1 Initial Administrator
+
+- only one bootstrap account is created by default: the configured **service 
admin** account (for
+  example, **adminUser**)
+- the default password is **123456**
+- the default password **123456** is intended only for the initial bootstrap 
login and the immediate
+  password reset flow
+- other management operations must not rely on the bootstrap password
+- after the first login, the service admin is expected to reset the bootstrap 
password immediately
+
+This design favors usability for POC scenarios. The default password is 
intentionally simple so that
+the service admin can complete the bootstrap flow immediately, but it must be 
treated as a
+bootstrap-only credential rather than as a secure long-term password or a 
general-purpose access
+credential for other APIs.
+
+### 6.2 Bootstrap Process
+
+The bootstrap process should be:
+
+1. When the `basic` authenticator is enabled for the first time, check whether 
the configured
+   bootstrap **service admin** account already exists in `local_user_meta`.
+2. If the bootstrap account does not exist, the web filter allows the 
configured bootstrap service
+   admin account (for example, **adminUser**) with password **123456** to pass 
Basic verification
+   only for the bootstrap login and immediate password reset flow.
+3. Reject other management operations until the bootstrap password has been 
reset successfully.
+4. During the first successful password reset, create the bootstrap service 
admin record and store
+   the new password hash in `password_hash`.
+5. After the password reset succeeds, treat the account as a normal service 
admin account for later
+   authentication and authorization.
+
+### 6.3 Example Basic Authentication Bootstrap Flow
+
+The following end-to-end flow shows how a user can start from a fresh 
Gravitino deployment, enable
+Basic authentication, configure the bootstrap service admin identity, and 
immediately rotate the
+bootstrap password.
+
+1. Deploy Gravitino with the `basic` authenticator enabled:
+
+   ```properties
+   gravitino.authenticators=basic
+   gravitino.authenticator.basic.passwordHasher=Argon2id
+   gravitino.authorization.serviceAdmins=adminUser
+   ```
+
+2. Start Gravitino.
+
+3. On first startup, no active `adminUser` record exists yet in `local_user_meta`, so Gravitino
+   accepts the bootstrap credential `adminUser:123456` only for the bootstrap login and immediate

Review Comment:
   Basic authentication is stateless, so this check can live in the filter: 
look up the service admin in the `local_user_meta` table. If the table has no 
row for the service admin, the account has not yet been registered in the 
built-in IdP. In that case, the filter should intercept every request except 
"change password" and prompt the service admin to change the password.
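
A minimal sketch of the filter check suggested above, written in Python only to keep it short and language-neutral; the real filter would presumably be a Java servlet filter in Gravitino. Everything named here is an assumption for illustration: `filter_decision`, `parse_basic_header`, the `change_password` action label, the returned decision strings, and the `registered_users` set that stands in for a lookup against `local_user_meta`. Normal Argon2id verification for registered users is left out.

```python
import base64
import hmac
from typing import Optional, Set, Tuple

BOOTSTRAP_PASSWORD = "123456"               # default bootstrap password from the design doc
CHANGE_PASSWORD_ACTION = "change_password"  # hypothetical label for the password-reset request


def parse_basic_header(header: str) -> Optional[Tuple[str, str]]:
    """Decode 'Basic <base64(username:password)>' into (username, password)."""
    scheme, _, encoded = header.partition(" ")
    if scheme.lower() != "basic" or not encoded:
        return None
    try:
        decoded = base64.b64decode(encoded.strip(), validate=True).decode("utf-8")
    except ValueError:  # covers binascii.Error and UnicodeDecodeError
        return None
    username, sep, password = decoded.partition(":")
    if not sep:
        return None
    return username, password


def filter_decision(header: str, action: str, service_admin: str,
                    registered_users: Set[str]) -> str:
    """Return 'allow', 'force_password_change', 'reject', or 'verify_password'."""
    creds = parse_basic_header(header)
    if creds is None:
        return "reject"
    username, password = creds
    if username == service_admin and username not in registered_users:
        # No row in local_user_meta: the service admin is not registered yet,
        # so only the bootstrap credential is acceptable.
        if not hmac.compare_digest(password.encode(), BOOTSTRAP_PASSWORD.encode()):
            return "reject"
        if action == CHANGE_PASSWORD_ACTION:
            return "allow"
        return "force_password_change"
    # Any other user goes through normal password-hash verification (not shown).
    return "verify_password"
```

Because the decision depends only on the request and the current table contents, no per-user session state is needed: once the password reset creates the row, the same request path falls through to normal verification.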



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
