laserninja opened a new issue, #10594: URL: https://github.com/apache/gravitino/issues/10594
### Describe the feature

The attached document proposes adding **ML Feature Store** support to Gravitino, enabling Gravitino to serve as a unified metadata management layer for ML features and bridging the gap between data engineering and machine learning workflows. Feature store metadata is the natural complement to model metadata: models consume features, and features are derived from underlying data assets that Gravitino already manages.

[ML feature store proposal document](https://docs.google.com/document/d/11NYd3KWxryCFxuG887kJ3d0p87leNyqK/edit?usp=sharing&ouid=114123410449326225963&rtpof=true&sd=true)

The doc contains details on:

- **Motivation**: Why Gravitino is uniquely positioned (federated metadata, existing model catalog, governance infra)
- **Metadata Model**: `FeatureEntity`, `FeatureGroup`, `Feature`, `FeatureGroupVersion`, `FeatureView` with full field definitions
- **API Design**: `FeatureStoreCatalog` Java interface with CRUD for all entities, following Gravitino's dispatcher/change-object patterns
- **REST API**: Full endpoint specification with example request/response JSON
- **Java & Python SDKs**: Code examples showing end-to-end workflows
- **Storage**: Database table schemas for MySQL/PostgreSQL/H2 persistence
- **Federated Connectors**: Architecture for Feast, Tecton, Hopsworks, SageMaker connectors (Feast as priority)
- **Model Catalog Integration**: Feature-model lineage and impact analysis
- **Engine Connectors**: Spark, Trino, and Python framework integration
- **Governance**: RBAC privileges, tag-based governance, audit events
- **CLI & Web UI**: Command examples and UI wireframe descriptions
- **Implementation Plan**: 6-phase rollout with task breakdowns
- **Testing Strategy**: Unit and integration test plan following Gravitino conventions

### Motivation

Machine learning teams face a critical challenge: managing the lifecycle of **features**, the processed, structured data inputs that ML models consume. In production environments, feature engineering represents 60-80% of the effort in building ML systems, yet features remain one of the least managed assets in the data ecosystem.

### Describe the solution

Add a new FEATURE catalog type to Gravitino that manages feature metadata (not data), acting as a unified registry and governance layer.

### Additional context

_No response_

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at: [email protected]
