FANNG1 commented on code in PR #10203:
URL: https://github.com/apache/gravitino/pull/10203#discussion_r2905166810
##########
docs/table-maintenance-service/optimizer-extension-guide.md:
##########
@@ -0,0 +1,128 @@
+---
+title: "Optimizer Extension Guide"
+slug: /table-maintenance-service/extension-guide
+keyword: table maintenance, optimizer, extension, provider, ServiceLoader
+license: This software is licensed under the Apache License version 2.
+---
+
+Use this guide when built-in optimizer components do not match your environment and you need custom implementations.
+
+## Extension model
+
+Optimizer supports three loading patterns:
+
+1. `Provider` SPI (`name()` + `initialize()`): loaded by `ServiceLoader` and selected by config value (see the resolution sketch after this list).
+2. Class-name mapping for strategy handlers and job adapters.
+3. Typed SPI for `StatisticsCalculator` and `MetricsEvaluator`.
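+
+Pattern 1 can be pictured as a `ServiceLoader` scan followed by a name match. The sketch below is illustrative only; `ProviderLookup` is a hypothetical helper, not part of the optimizer API, and it assumes the `Provider` interface shown later in this guide:
+
+```java
+import java.util.ServiceLoader;
+
+// Hypothetical helper: discover Provider implementations on the classpath and
+// select the one whose name() matches the configured value (case-insensitive).
+public final class ProviderLookup {
+  public static <T extends Provider> T byName(Class<T> type, String configuredName) {
+    for (T candidate : ServiceLoader.load(type)) {
+      if (candidate.name().equalsIgnoreCase(configuredName)) {
+        return candidate;
+      }
+    }
+    throw new IllegalArgumentException("No provider named: " + configuredName);
+  }
+}
+```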
+
+## Extension points and config keys
+
+| Area | Interface / type | Config key | Loading mode |
+| --- | --- | --- | --- |
+| Recommender statistics | `StatisticsProvider` | `gravitino.optimizer.recommender.statisticsProvider` | `Provider` SPI by `name()` |
+| Recommender strategy source | `StrategyProvider` | `gravitino.optimizer.recommender.strategyProvider` | `Provider` SPI by `name()` |
+| Recommender table metadata | `TableMetadataProvider` | `gravitino.optimizer.recommender.tableMetaProvider` | `Provider` SPI by `name()` |
+| Recommender job submission | `JobSubmitter` | `gravitino.optimizer.recommender.jobSubmitter` | `Provider` SPI by `name()` |
+| Strategy evaluation logic | `StrategyHandler` | `gravitino.optimizer.strategyHandler.<strategyType>.className` | Reflection by class name |
+| Job template adaptation | `GravitinoJobAdapter` | `gravitino.optimizer.jobAdapter.<jobTemplate>.className` | Reflection by class name |
+| Update statistics sink | `StatisticsUpdater` | `gravitino.optimizer.updater.statisticsUpdater` | `Provider` SPI by `name()` |
+| Update metrics sink | `MetricsUpdater` | `gravitino.optimizer.updater.metricsUpdater` | `Provider` SPI by `name()` |
+| Monitor metrics source | `MetricsProvider` | `gravitino.optimizer.monitor.metricsProvider` | `Provider` SPI by `name()` |
+| Monitor table-job relation | `TableJobRelationProvider` | `gravitino.optimizer.monitor.tableJobRelationProvider` | `Provider` SPI by `name()` |
+| Monitor evaluator | `MetricsEvaluator` | `gravitino.optimizer.monitor.metricsEvaluator` | Typed SPI (`ServiceLoader<MetricsEvaluator>`) |
+| Monitor callbacks | `MonitorCallback` | `gravitino.optimizer.monitor.callbacks` | `Provider` SPI by `name()` (comma-separated) |
+| CLI calculator | `StatisticsCalculator` | CLI `--calculator-name` | Typed SPI (`ServiceLoader<StatisticsCalculator>`) |
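+
+For the reflection-based rows, the config key embeds the strategy type or job template name and maps it to a fully qualified class name. An example taken from the quick start:
+
+```
+gravitino.optimizer.strategyHandler.iceberg-data-compaction.className = org.apache.gravitino.maintenance.optimizer.recommender.handler.compaction.CompactionStrategyHandler
+```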
+
+## Implement a custom provider
+
+Most extension points use `Provider`:
+
+```java
+public class MyStatisticsProvider implements StatisticsProvider {
+ @Override
+ public String name() {
+ return "my-statistics-provider";
+ }
+
+ @Override
+ public void initialize(OptimizerEnv optimizerEnv) {
+ // Initialize clients/resources from optimizer config.
+ }
+
+ @Override
+ public void close() throws Exception {}
+}
+```
+
+Requirements:
+
+- Keep a stable `name()` value; config resolves by this name (case-insensitive).
+- Provide a public no-arg constructor.
+- Implement the `initialize(OptimizerEnv)` and `close()` lifecycle correctly (see the sketch below).
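+
+A minimal lifecycle sketch. The harness method is hypothetical; in production the optimizer runtime drives `initialize()`/`close()`, not user code:
+
+```java
+static void runLifecycle(OptimizerEnv optimizerEnv) throws Exception {
+  MyStatisticsProvider provider = new MyStatisticsProvider();
+  try {
+    provider.initialize(optimizerEnv); // acquire clients/resources
+    // ... the optimizer uses the provider here ...
+  } finally {
+    provider.close(); // release everything initialize() acquired
+  }
+}
+```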
+
+## Register with ServiceLoader
+
+### For `Provider` implementations
+
+Create file:
+
+`META-INF/services/org.apache.gravitino.maintenance.optimizer.api.common.Provider`
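+
+The file content is the fully qualified class name of each implementation, one per line. A sketch, assuming the example class above lives in a hypothetical package `com.example.optimizer`:
+
+```
+com.example.optimizer.MyStatisticsProvider
+```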
Review Comment:
Verified against source: these three FQNs exactly match the interfaces in the source tree (, , ).
##########
docs/table-maintenance-service/optimizer-quick-start.md:
##########
@@ -0,0 +1,237 @@
+---
+title: "Optimizer Quick Start and Verification"
+slug: /table-maintenance-service/quick-start
+keyword: table maintenance, optimizer, quick start, compaction, update stats
+license: This software is licensed under the Apache License version 2.
+---
+
+## Before running quick start
+
+- Prepare a running Gravitino server.
+- Ensure target metalake exists (examples use `test`).
+- Configure `SPARK_HOME` or `gravitino.jobExecutor.local.sparkHome` for Spark templates.
+- If your Iceberg REST backend is in-memory, metadata is reset after restart.
+
+For full config details, see [Optimizer Configuration](./optimizer-configuration.md).
+
+## Success criteria
+
+- Update-stats job finishes and statistics include `custom-data-file-mse` and `custom-delete-file-number`.
+- `submit-strategy-jobs` prints `SUBMIT` with a rewrite job ID.
+- Rewrite job log shows `Rewritten data files: <N>` where `N > 0` for non-empty tables.
+
+## Quick start A: built-in table maintenance workflow
+
+This workflow uses:
+
+- Built-in policy type: `system_iceberg_compaction`
+- Built-in update stats job template: `builtin-iceberg-update-stats`
+- Built-in rewrite data files job template: `builtin-iceberg-rewrite-data-files`
+
+### 1. Preflight checks
+
+```bash
+# Check metalake
+curl -sS "http://localhost:8090/api/metalakes/test" | jq
+
+# Check built-in templates
+curl -sS "http://localhost:8090/api/metalakes/test/jobs/templates?details=true" | jq '.jobTemplates[].name'
Review Comment:
Confirmed: it is defined in the OpenAPI spec for the job templates endpoint (). I kept the usage in the command example and removed the extra explanatory sentence to keep the section concise.
##########
docs/table-maintenance-service/optimizer-quick-start.md:
##########
@@ -0,0 +1,237 @@
+---
+title: "Optimizer Quick Start and Verification"
+slug: /table-maintenance-service/quick-start
+keyword: table maintenance, optimizer, quick start, compaction, update stats
+license: This software is licensed under the Apache License version 2.
+---
+
+## Before running quick start
+
+- Prepare a running Gravitino server.
+- Ensure target metalake exists (examples use `test`).
+- Configure `SPARK_HOME` or `gravitino.jobExecutor.local.sparkHome` for Spark templates.
+- If your Iceberg REST backend is in-memory, metadata is reset after restart.
+
+For full config details, see [Optimizer Configuration](./optimizer-configuration.md).
+
+## Success criteria
+
+- Update-stats job finishes and statistics include `custom-data-file-mse` and `custom-delete-file-number`.
+- `submit-strategy-jobs` prints `SUBMIT` with a rewrite job ID.
+- Rewrite job log shows `Rewritten data files: <N>` where `N > 0` for non-empty tables.
+
+## Quick start A: built-in table maintenance workflow
+
+This workflow uses:
+
+- Built-in policy type: `system_iceberg_compaction`
+- Built-in update stats job template: `builtin-iceberg-update-stats`
+- Built-in rewrite data files job template: `builtin-iceberg-rewrite-data-files`
+
+### 1. Preflight checks
+
+```bash
+# Check metalake
+curl -sS "http://localhost:8090/api/metalakes/test" | jq
+
+# Check built-in templates
+curl -sS "http://localhost:8090/api/metalakes/test/jobs/templates?details=true" | jq '.jobTemplates[].name'
+```
+
+Expected names include:
+
+- `builtin-iceberg-update-stats`
+- `builtin-iceberg-rewrite-data-files`
+
+If missing, verify `gravitino-jobs` JAR in `auxlib`, then restart Gravitino.
+
+### 2. Prepare demo metadata objects
+
+Create a REST Iceberg catalog, schema, and table:
+
+```bash
+# Create catalog (ignore "already exists" errors)
+curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
+ -H "Content-Type: application/json" \
+ -d '{
+ "name": "rest_catalog",
+ "type": "RELATIONAL",
+ "comment": "Iceberg REST catalog",
+ "provider": "lakehouse-iceberg",
+ "properties": {
+ "catalog-backend": "rest",
+ "uri": "http://localhost:9001/iceberg"
+ }
+ }' \
+ http://localhost:8090/api/metalakes/test/catalogs
+
+# Create schema
+curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
+ -H "Content-Type: application/json" \
+ -d '{
+ "name": "db",
+ "comment": "optimizer demo schema",
+ "properties": {}
+ }' \
+ http://localhost:8090/api/metalakes/test/catalogs/rest_catalog/schemas
+
+# Create table
+curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
+ -H "Content-Type: application/json" \
+ -d '{
+ "name": "t1",
+ "comment": "optimizer demo table",
+ "columns": [
+ {"name": "id", "type": "integer", "nullable": true},
+ {"name": "name", "type": "string", "nullable": true}
+ ],
+ "properties": {}
+ }' \
+  http://localhost:8090/api/metalakes/test/catalogs/rest_catalog/schemas/db/tables
+```
+
+### 3. Seed demo data (recommended)
+
+Use Spark SQL to create enough small files so that compaction has a visible effect:
+
+```bash
+${SPARK_HOME}/bin/spark-sql \
+ --conf spark.hadoop.fs.defaultFS=file:/// \
+ --conf spark.sql.catalog.rest_demo=org.apache.iceberg.spark.SparkCatalog \
+ --conf spark.sql.catalog.rest_demo.type=rest \
+ --conf spark.sql.catalog.rest_demo.uri=http://localhost:9001/iceberg \
+ -e "CREATE NAMESPACE IF NOT EXISTS rest_demo.db; \
+ SET spark.sql.files.maxRecordsPerFile=1000; \
+ INSERT INTO rest_demo.db.t1 \
+ SELECT id, concat('name_', CAST(id AS STRING)) FROM range(0, 100000);"
+```
+
+### 4. Create and attach built-in compaction policy
+
+```bash
+# Create policy
+curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
+ -H "Content-Type: application/json" \
+ -d '{
+ "name": "iceberg_compaction_default",
+ "comment": "Built-in iceberg compaction policy",
+ "policyType": "system_iceberg_compaction",
+ "enabled": true,
+ "content": {}
+ }' \
+ http://localhost:8090/api/metalakes/test/policies
+
+# Attach policy to table
+curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
+ -H "Content-Type: application/json" \
+ -d '{
+ "policiesToAdd": ["iceberg_compaction_default"]
+ }' \
+  http://localhost:8090/api/metalakes/test/objects/table/rest_catalog.db.t1/policies
+```
+
+Verify association:
+
+```bash
+curl -sS "http://localhost:8090/api/metalakes/test/objects/table/rest_catalog.db.t1/policies?details=true" | jq
+```
+
+### 5. Submit built-in update stats job
+
+```bash
+update_stats_job_id=$(curl -sS -X POST -H "Accept: application/vnd.gravitino.v1+json" \
+ -H "Content-Type: application/json" \
+ -d '{
+ "jobTemplateName": "builtin-iceberg-update-stats",
+ "jobConf": {
+ "catalog_name": "rest_catalog",
+ "table_identifier": "db.t1",
+ "update_mode": "all",
+ "updater_options":
"{\"gravitino_uri\":\"http://localhost:8090\",\"metalake\":\"test\",\"statistics_updater\":\"gravitino-statistics-updater\",\"metrics_updater\":\"gravitino-metrics-updater\"}",
+ "spark_conf":
"{\"spark.master\":\"local[2]\",\"spark.hadoop.fs.defaultFS\":\"file:///\"}",
+ "spark_master": "local[2]",
+ "spark_executor_instances": "1",
+ "spark_executor_cores": "1",
+ "spark_executor_memory": "1g",
+ "spark_driver_memory": "1g",
+ "catalog_type": "rest",
+ "catalog_uri": "http://localhost:9001/iceberg",
+ "warehouse_location": ""
+ }
+ }' \
+ http://localhost:8090/api/metalakes/test/jobs/runs | jq -r '.job.jobId')
+
+echo "update-stats job id: ${update_stats_job_id}"
+```
+
+### 6. Trigger rewrite submission with `submit-strategy-jobs`
+
+```bash
+# Required optimizer CLI config for strategy submission.
+# Note: --strategy-name is the policy name, not strategy.type.
+cat > /tmp/gravitino-optimizer-submit.conf <<'EOF_CONF'
+gravitino.optimizer.gravitinoUri = http://localhost:8090
+gravitino.optimizer.gravitinoMetalake = test
+gravitino.optimizer.gravitinoDefaultCatalog = rest_catalog
+gravitino.optimizer.recommender.statisticsProvider = gravitino-statistics-provider
+gravitino.optimizer.recommender.strategyProvider = gravitino-strategy-provider
+gravitino.optimizer.recommender.tableMetaProvider = gravitino-table-metadata-provider
+gravitino.optimizer.recommender.jobSubmitter = gravitino-job-submitter
+gravitino.optimizer.strategyHandler.iceberg-data-compaction.className = org.apache.gravitino.maintenance.optimizer.recommender.handler.compaction.CompactionStrategyHandler
+gravitino.optimizer.jobSubmitterConfig.catalog_name = rest_catalog
+gravitino.optimizer.jobSubmitterConfig.spark_master = local[2]
+gravitino.optimizer.jobSubmitterConfig.spark_executor_instances = 1
+gravitino.optimizer.jobSubmitterConfig.spark_executor_cores = 1
+gravitino.optimizer.jobSubmitterConfig.spark_executor_memory = 1g
+gravitino.optimizer.jobSubmitterConfig.spark_driver_memory = 1g
+gravitino.optimizer.jobSubmitterConfig.catalog_type = rest
+gravitino.optimizer.jobSubmitterConfig.catalog_uri = http://localhost:9001/iceberg
+gravitino.optimizer.jobSubmitterConfig.warehouse_location =
+gravitino.optimizer.jobSubmitterConfig.spark_conf = {"spark.master":"local[2]","spark.hadoop.fs.defaultFS":"file:///"}
+EOF_CONF
+
+# Optional: preview recommendations without submitting jobs.
+./bin/gravitino-optimizer.sh \
+ --type submit-strategy-jobs \
+ --identifiers rest_catalog.db.t1 \
+ --strategy-name iceberg_compaction_default \
+ --dry-run \
+ --limit 10 \
+ --conf-path /tmp/gravitino-optimizer-submit.conf
+
+# Submit rewrite job through strategy evaluation.
+submit_output=$(./bin/gravitino-optimizer.sh \
+ --type submit-strategy-jobs \
+ --identifiers rest_catalog.db.t1 \
+ --strategy-name iceberg_compaction_default \
+ --limit 10 \
+ --conf-path /tmp/gravitino-optimizer-submit.conf)
+echo "${submit_output}"
+
+strategy_job_id=$(echo "${submit_output}" | sed -n 's/.*jobId=\([^[:space:]]*\).*/\1/p')
Review Comment:
Fixed. Added a guard right after extraction that fails fast with:
ERROR: failed to extract strategy job ID
##########
docs/table-maintenance-service/optimizer-quick-start.md:
##########
@@ -0,0 +1,237 @@
+---
+title: "Optimizer Quick Start and Verification"
+slug: /table-maintenance-service/quick-start
+keyword: table maintenance, optimizer, quick start, compaction, update stats
+license: This software is licensed under the Apache License version 2.
+---
+
+## Before running quick start
+
+- Prepare a running Gravitino server.
+- Ensure target metalake exists (examples use `test`).
+- Configure `SPARK_HOME` or `gravitino.jobExecutor.local.sparkHome` for Spark templates.
+- If your Iceberg REST backend is in-memory, metadata is reset after restart.
+
+For full config details, see [Optimizer Configuration](./optimizer-configuration.md).
+
+## Success criteria
+
+- Update-stats job finishes and statistics include `custom-data-file-mse` and `custom-delete-file-number`.
+- `submit-strategy-jobs` prints `SUBMIT` with a rewrite job ID.
+- Rewrite job log shows `Rewritten data files: <N>` where `N > 0` for non-empty tables.
+
+## Quick start A: built-in table maintenance workflow
+
+This workflow uses:
+
+- Built-in policy type: `system_iceberg_compaction`
+- Built-in update stats job template: `builtin-iceberg-update-stats`
+- Built-in rewrite data files job template: `builtin-iceberg-rewrite-data-files`
+
+### 1. Preflight checks
+
+```bash
+# Check metalake
+curl -sS "http://localhost:8090/api/metalakes/test" | jq
+
+# Check built-in templates
+curl -sS "http://localhost:8090/api/metalakes/test/jobs/templates?details=true" | jq '.jobTemplates[].name'
+```
+
+Expected names include:
+
+- `builtin-iceberg-update-stats`
+- `builtin-iceberg-rewrite-data-files`
+
+If missing, verify `gravitino-jobs` JAR in `auxlib`, then restart Gravitino.
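+
+A quick check, assuming a standard layout where `auxlib` sits under the Gravitino installation directory (adjust `GRAVITINO_HOME` for your deployment):
+
+```bash
+ls "${GRAVITINO_HOME}/auxlib" | grep -i gravitino-jobs
+```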
+
+### 2. Prepare demo metadata objects
+
+Create a REST Iceberg catalog, schema, and table:
+
+```bash
+# Create catalog (ignore "already exists" errors)
+curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
+ -H "Content-Type: application/json" \
+ -d '{
+ "name": "rest_catalog",
+ "type": "RELATIONAL",
+ "comment": "Iceberg REST catalog",
+ "provider": "lakehouse-iceberg",
+ "properties": {
+ "catalog-backend": "rest",
+ "uri": "http://localhost:9001/iceberg"
+ }
+ }' \
+ http://localhost:8090/api/metalakes/test/catalogs
+
+# Create schema
+curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
+ -H "Content-Type: application/json" \
+ -d '{
+ "name": "db",
+ "comment": "optimizer demo schema",
+ "properties": {}
+ }' \
+ http://localhost:8090/api/metalakes/test/catalogs/rest_catalog/schemas
+
+# Create table
+curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
+ -H "Content-Type: application/json" \
+ -d '{
+ "name": "t1",
+ "comment": "optimizer demo table",
+ "columns": [
+ {"name": "id", "type": "integer", "nullable": true},
+ {"name": "name", "type": "string", "nullable": true}
+ ],
+ "properties": {}
+ }' \
+  http://localhost:8090/api/metalakes/test/catalogs/rest_catalog/schemas/db/tables
+```
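+
+To confirm the objects exist, read the same REST paths back with GET (an assumption based on the POST endpoints above; adjust if your deployment differs):
+
+```bash
+curl -sS "http://localhost:8090/api/metalakes/test/catalogs/rest_catalog/schemas/db/tables" | jq
+```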
+
+### 3. Seed demo data (recommended)
+
+Use Spark SQL to create enough small files so that compaction has a visible effect:
+
+```bash
+${SPARK_HOME}/bin/spark-sql \
+ --conf spark.hadoop.fs.defaultFS=file:/// \
+ --conf spark.sql.catalog.rest_demo=org.apache.iceberg.spark.SparkCatalog \
+ --conf spark.sql.catalog.rest_demo.type=rest \
+ --conf spark.sql.catalog.rest_demo.uri=http://localhost:9001/iceberg \
+ -e "CREATE NAMESPACE IF NOT EXISTS rest_demo.db; \
+ SET spark.sql.files.maxRecordsPerFile=1000; \
+ INSERT INTO rest_demo.db.t1 \
+ SELECT id, concat('name_', CAST(id AS STRING)) FROM range(0, 100000);"
+```
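+
+Optionally confirm that many small data files were written by querying the Iceberg `files` metadata table (a standard Iceberg feature, not optimizer-specific):
+
+```bash
+${SPARK_HOME}/bin/spark-sql \
+  --conf spark.sql.catalog.rest_demo=org.apache.iceberg.spark.SparkCatalog \
+  --conf spark.sql.catalog.rest_demo.type=rest \
+  --conf spark.sql.catalog.rest_demo.uri=http://localhost:9001/iceberg \
+  -e "SELECT count(*) AS data_files FROM rest_demo.db.t1.files;"
+```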
+
+### 4. Create and attach built-in compaction policy
+
+```bash
+# Create policy
+curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
+ -H "Content-Type: application/json" \
+ -d '{
+ "name": "iceberg_compaction_default",
+ "comment": "Built-in iceberg compaction policy",
+ "policyType": "system_iceberg_compaction",
+ "enabled": true,
+ "content": {}
+ }' \
+ http://localhost:8090/api/metalakes/test/policies
+
+# Attach policy to table
+curl -X POST -H "Accept: application/vnd.gravitino.v1+json" \
+ -H "Content-Type: application/json" \
+ -d '{
+ "policiesToAdd": ["iceberg_compaction_default"]
+ }' \
+  http://localhost:8090/api/metalakes/test/objects/table/rest_catalog.db.t1/policies
+```
+
+Verify association:
+
+```bash
+curl -sS "http://localhost:8090/api/metalakes/test/objects/table/rest_catalog.db.t1/policies?details=true" | jq
+```
+
+### 5. Submit built-in update stats job
+
+```bash
+update_stats_job_id=$(curl -sS -X POST -H "Accept: application/vnd.gravitino.v1+json" \
+ -H "Content-Type: application/json" \
+ -d '{
+ "jobTemplateName": "builtin-iceberg-update-stats",
+ "jobConf": {
+ "catalog_name": "rest_catalog",
+ "table_identifier": "db.t1",
+ "update_mode": "all",
+ "updater_options":
"{\"gravitino_uri\":\"http://localhost:8090\",\"metalake\":\"test\",\"statistics_updater\":\"gravitino-statistics-updater\",\"metrics_updater\":\"gravitino-metrics-updater\"}",
+ "spark_conf":
"{\"spark.master\":\"local[2]\",\"spark.hadoop.fs.defaultFS\":\"file:///\"}",
+ "spark_master": "local[2]",
+ "spark_executor_instances": "1",
+ "spark_executor_cores": "1",
+ "spark_executor_memory": "1g",
+ "spark_driver_memory": "1g",
+ "catalog_type": "rest",
+ "catalog_uri": "http://localhost:9001/iceberg",
+ "warehouse_location": ""
+ }
+ }' \
+ http://localhost:8090/api/metalakes/test/jobs/runs | jq -r '.job.jobId')
+
+echo "update-stats job id: ${update_stats_job_id}"
+```
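+
+A simple polling loop for the run status. The `.job.status` field path is an assumption inferred from the `.job.jobId` path used above; inspect the real response with `jq .` first:
+
+```bash
+while :; do
+  status=$(curl -sS "http://localhost:8090/api/metalakes/test/jobs/runs/${update_stats_job_id}" | jq -r '.job.status')
+  echo "update-stats status: ${status}"
+  # Terminal states below are illustrative; match them to your server's values.
+  case "${status}" in succeeded|failed|cancelled) break ;; esac
+  sleep 5
+done
+```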
+
+### 6. Trigger rewrite submission with `submit-strategy-jobs`
+
+```bash
+# Required optimizer CLI config for strategy submission.
+# Note: --strategy-name is the policy name, not strategy.type.
+cat > /tmp/gravitino-optimizer-submit.conf <<'EOF_CONF'
+gravitino.optimizer.gravitinoUri = http://localhost:8090
+gravitino.optimizer.gravitinoMetalake = test
+gravitino.optimizer.gravitinoDefaultCatalog = rest_catalog
+gravitino.optimizer.recommender.statisticsProvider = gravitino-statistics-provider
+gravitino.optimizer.recommender.strategyProvider = gravitino-strategy-provider
+gravitino.optimizer.recommender.tableMetaProvider = gravitino-table-metadata-provider
+gravitino.optimizer.recommender.jobSubmitter = gravitino-job-submitter
+gravitino.optimizer.strategyHandler.iceberg-data-compaction.className = org.apache.gravitino.maintenance.optimizer.recommender.handler.compaction.CompactionStrategyHandler
+gravitino.optimizer.jobSubmitterConfig.catalog_name = rest_catalog
+gravitino.optimizer.jobSubmitterConfig.spark_master = local[2]
+gravitino.optimizer.jobSubmitterConfig.spark_executor_instances = 1
+gravitino.optimizer.jobSubmitterConfig.spark_executor_cores = 1
+gravitino.optimizer.jobSubmitterConfig.spark_executor_memory = 1g
+gravitino.optimizer.jobSubmitterConfig.spark_driver_memory = 1g
+gravitino.optimizer.jobSubmitterConfig.catalog_type = rest
+gravitino.optimizer.jobSubmitterConfig.catalog_uri = http://localhost:9001/iceberg
+gravitino.optimizer.jobSubmitterConfig.warehouse_location =
+gravitino.optimizer.jobSubmitterConfig.spark_conf = {"spark.master":"local[2]","spark.hadoop.fs.defaultFS":"file:///"}
+EOF_CONF
+
+# Optional: preview recommendations without submitting jobs.
+./bin/gravitino-optimizer.sh \
+ --type submit-strategy-jobs \
+ --identifiers rest_catalog.db.t1 \
+ --strategy-name iceberg_compaction_default \
+ --dry-run \
+ --limit 10 \
+ --conf-path /tmp/gravitino-optimizer-submit.conf
+
+# Submit rewrite job through strategy evaluation.
+submit_output=$(./bin/gravitino-optimizer.sh \
+ --type submit-strategy-jobs \
+ --identifiers rest_catalog.db.t1 \
+ --strategy-name iceberg_compaction_default \
+ --limit 10 \
+ --conf-path /tmp/gravitino-optimizer-submit.conf)
+echo "${submit_output}"
+
+strategy_job_id=$(echo "${submit_output}" | sed -n 's/.*jobId=\([^[:space:]]*\).*/\1/p')
+echo "strategy rewrite job id: ${strategy_job_id}"
+```
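+
+Guard against an empty extraction result before using `strategy_job_id`; a minimal sketch:
+
+```bash
+if [ -z "${strategy_job_id}" ]; then
+  echo "ERROR: failed to extract strategy job ID" >&2
+  exit 1
+fi
+```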
+
+### 7. Track status and verify results
+
+```bash
+# Check job status by id
+curl -sS "http://localhost:8090/api/metalakes/test/jobs/runs/${update_stats_job_id}" | jq
+curl -sS "http://localhost:8090/api/metalakes/test/jobs/runs/${strategy_job_id}" | jq
+
+# Verify table statistics after update-stats
+curl -sS "http://localhost:8090/api/metalakes/test/objects/table/rest_catalog.db.t1/statistics" | jq
+
+# Verify rewrite actually rewrote files (N should be > 0 for non-empty table)
+grep -E "Rewritten data files|Added data files|completed successfully" \
+  "/tmp/gravitino/jobs/staging/test/builtin-iceberg-rewrite-data-files/${strategy_job_id}/error.log"
Review Comment:
Updated. I documented which config key controls the staging path and noted its default value () in the verification section.
##########
maintenance/optimizer/build.gradle.kts:
##########
@@ -117,6 +117,7 @@ tasks {
register("copyConfigs", Copy::class) {
from("src/main/resources")
+ include("**/*.conf", "**/*.template")
Review Comment:
Addressed. The `copyConfigs` task now keeps the conf/template filter and also explicitly copies the template, so the packaged optimizer conf is generated from the actual template source.
##########
docs/index.md:
##########
@@ -92,6 +92,8 @@ Gravitino currently supports the following catalogs:
If you want to operate table and partition statistics, you can refer to the [document](./manage-statistics-in-gravitino.md).
+If you want an operations guide for automated maintenance workflows (statistics, metrics, monitoring, and strategy jobs), see [Table Maintenance Service (Optimizer)](./table-maintenance-service/optimizer.md). Start with Gravitino's built-in policies and job templates, and use the custom extension interfaces when the built-ins do not meet your requirements.
Review Comment:
Fixed earlier. The optimizer entry is no longer inserted between catalog-type sections; it is placed after the catalog list.
##########
docs/table-maintenance-service/optimizer.md:
##########
@@ -0,0 +1,115 @@
+---
+title: "Table Maintenance Service (Optimizer)"
+slug: /table-maintenance-service
+keyword: table maintenance, optimizer, statistics, metrics, monitor
+license: This software is licensed under the Apache License version 2.
+---
+
+## What is this service
+
+The Table Maintenance Service (Optimizer) automates table maintenance by connecting:
+
+- Statistics and metrics collection
+- Rule evaluation and strategy recommendation
+- Job template based execution
+
+The CLI command and configuration keys keep using the `optimizer` name for compatibility.
+
+## Architecture overview
+
+The optimizer workflow consists of six parts:
+
+1. Metadata objects: catalog/schema/table in a metalake.
+2. Statistics and metrics: table/partition signals used for decision making.
+3. Policies: strategy intent, for example `system_iceberg_compaction`.
+4. Job templates: executable contracts, for example built-in Spark templates.
+5. Job executor: local or custom backend that runs submitted jobs.
+6. Status and logs: REST job state plus local staging logs.
+
+Typical data flow:
+
+1. Collect statistics and metrics for target tables.
+2. Evaluate rules and produce candidate actions.
+3. Submit jobs using a concrete template and `jobConf`.
+4. Track status and verify results on table metadata and logs.
+
+## Execution modes
+
+| Mode | Main entry | Best for | Output |
+| --- | --- | --- | --- |
+| Built-in maintenance workflow | Gravitino REST + built-in templates | Server-side operational runs | Submitted Spark jobs and updated metadata |
+| Optimizer CLI local calculator | `gravitino-optimizer.sh` | Local file-driven testing and batch scripts | Statistics/metrics updates and optional submissions |
+
+Use the built-in maintenance workflow when you want policy-driven server-side execution.
+Use the CLI local calculator when you want to feed JSONL input directly.
+
+## Start here
+
+- Configuration first: read [Optimizer Configuration](./optimizer-configuration.md).
+- Need custom integrations: read [Optimizer Extension Guide](./optimizer-extension-guide.md).
+- First-time enablement: run [Optimizer Quick Start and Verification](./optimizer-quick-start.md).
+- CLI-only usage: read [Optimizer CLI Reference](./optimizer-cli-reference.md).
+- Runtime failures or mismatched results: check [Optimizer Troubleshooting](./optimizer-troubleshooting.md).
+
+## Lifecycle
+
+### 1. Collect
+
+Generate or ingest table and partition statistics/metrics.
+
+### 2. Evaluate
+
+Apply policies and rules to decide whether maintenance should run.
+
+### 3. Submit
+
+Pick a job template and submit a job with a concrete `jobConf`.
+
+### 4. Observe
+
+Check REST job status and validate resulting statistics, metrics, or rewritten data files.
+
+## Configuration model
+
+| Layer | Scope | Typical keys |
+| --- | --- | --- |
+| Gravitino server config | Runtime for job manager and executor | `gravitino.job.executor`, `gravitino.job.statusPullIntervalInMs`, `gravitino.jobExecutor.local.sparkHome` |
+| Job submission `jobConf` | Per job run | `catalog_name`, `table_identifier`, `spark_*`, template-specific args |
+| Optimizer CLI config | CLI commands | `gravitino.optimizer.*` in `conf/gravitino-optimizer.conf` |
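+
+A minimal sketch of the CLI config layer, reusing keys that appear in the quick start (values are examples, not defaults):
+
+```
+gravitino.optimizer.gravitinoUri = http://localhost:8090
+gravitino.optimizer.gravitinoMetalake = test
+gravitino.optimizer.gravitinoDefaultCatalog = rest_catalog
+```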
+
+## Terminology mapping
+
+| Term | Example value | Used in |
+| --- | --- | --- |
+| Policy name | `iceberg_compaction_default` | Policy identity and CLI `--strategy-name` |
+| Policy type | `system_iceberg_compaction` | REST policy creation field `policyType` |
+| Strategy type | `iceberg-data-compaction` | Policy content field `strategy.type` and strategy handler config key |
+
+For strategy submission, `--strategy-name` must use the policy name, not the policy type or strategy type.
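+
+For example, with the values from the terminology table:
+
+```bash
+# Correct: --strategy-name takes the policy NAME.
+./bin/gravitino-optimizer.sh --type submit-strategy-jobs \
+  --identifiers rest_catalog.db.t1 \
+  --strategy-name iceberg_compaction_default \
+  --conf-path conf/gravitino-optimizer.conf
+# Wrong: policy type (system_iceberg_compaction) or strategy type
+# (iceberg-data-compaction) will not match a policy by name.
+```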
+
+## Before you start
+
+- Prepare a running Gravitino server.
+- Ensure target metalake exists (examples use `test`).
+- Configure `SPARK_HOME` or `gravitino.jobExecutor.local.sparkHome` for Spark templates.
+- For CLI mode, prepare `conf/gravitino-optimizer.conf` from template.
+- Use fully qualified identifiers where possible, for example
`catalog.schema.table`.
+- If your Iceberg REST backend is in-memory, metadata is reset after restart.
+
+## Success criteria
+
+- Update-stats job finishes and statistics include `custom-data-file-mse` and `custom-delete-file-number`.
+- `submit-strategy-jobs` prints `SUBMIT` with a rewrite job ID.
+- Rewrite job log shows `Rewritten data files: <N>` where `N > 0` for non-empty tables.
+
+## Related docs
+
+- [Optimizer Configuration](./optimizer-configuration.md)
+- [Optimizer Extension Guide](./optimizer-extension-guide.md)
+- [Optimizer Quick Start and Verification](./optimizer-quick-start.md)
+- [Optimizer CLI Reference](./optimizer-cli-reference.md)
+- [Optimizer Troubleshooting](./optimizer-troubleshooting.md)
+- [Manage policies in Gravitino](./manage-policies-in-gravitino.md)
+- [Iceberg compaction policy](./iceberg-compaction-policy.md)
+- [Manage jobs in Gravitino](./manage-jobs-in-gravitino.md)
+- [Manage statistics in Gravitino](./manage-statistics-in-gravitino.md)
Review Comment:
Fixed earlier. The related-doc links in `optimizer.md` now correctly point to the parent docs with the proper path prefixes.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]