laserninja opened a new issue, #10594:
URL: https://github.com/apache/gravitino/issues/10594

   ### Describe the feature
   
   The attached document proposes adding **ML Feature Store** support to 
Gravitino, so that Gravitino can serve as a unified metadata management layer 
for ML features and bridge the gap between data engineering and machine learning 
workflows. Feature store metadata is the natural complement to model metadata: 
models consume features, and features are derived from underlying data assets 
that Gravitino already manages.
   
   [ML feature store proposal 
document](https://docs.google.com/document/d/11NYd3KWxryCFxuG887kJ3d0p87leNyqK/edit?usp=sharing&ouid=114123410449326225963&rtpof=true&sd=true)
   
   The doc contains details on:

   - **Motivation**: Why Gravitino is uniquely positioned (federated metadata, 
existing model catalog, governance infra)
   - **Metadata Model**: `FeatureEntity`, `FeatureGroup`, `Feature`, 
`FeatureGroupVersion`, `FeatureView` with full field definitions
   - **API Design**: `FeatureStoreCatalog` Java interface with CRUD for all 
entities, following Gravitino's dispatcher/change-object patterns
   - **REST API**: Full endpoint specification with example request/response JSON
   - **Java & Python SDKs**: Code examples showing end-to-end workflows
   - **Storage**: Database table schemas for MySQL/PostgreSQL/H2 persistence
   - **Federated Connectors**: Architecture for Feast, Tecton, Hopsworks, and 
SageMaker connectors (Feast as priority)
   - **Model Catalog Integration**: Feature-model lineage and impact analysis
   - **Engine Connectors**: Spark, Trino, and Python framework integration
   - **Governance**: RBAC privileges, tag-based governance, audit events
   - **CLI & Web UI**: Command examples and UI wireframe descriptions
   - **Implementation Plan**: 6-phase rollout with task breakdowns
   - **Testing Strategy**: Unit and integration test plan following Gravitino 
conventions
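
To make the feature-model lineage idea concrete, here is a minimal sketch of impact analysis as a reverse lookup over two kinds of edges: features derived from tables, and models consuming features. It is not taken from the proposal document. Every table, feature, and model name below is hypothetical, and the in-memory dictionaries merely stand in for whatever lineage storage the design actually specifies.

```python
from collections import defaultdict

# Hypothetical lineage edges; all names are illustrative assumptions.
# feature -> tables it is derived from
FEATURE_SOURCES = {
    "user_7d_order_count": ["hive.sales.orders"],
    "user_avg_basket_value": ["hive.sales.orders", "hive.sales.order_items"],
    "user_account_age_days": ["mysql.crm.users"],
}
# model -> features it consumes
MODEL_FEATURES = {
    "churn_model_v3": ["user_7d_order_count", "user_account_age_days"],
    "ltv_model_v1": ["user_avg_basket_value"],
}


def invert(edges):
    """Turn {child: [parents]} into {parent: {children}}."""
    inverted = defaultdict(set)
    for child, parents in edges.items():
        for parent in parents:
            inverted[parent].add(child)
    return inverted


def impact_of(table, feature_sources, model_features):
    """Return (features, models) affected by a change to `table`."""
    features_by_table = invert(feature_sources)
    models_by_feature = invert(model_features)
    features = features_by_table.get(table, set())
    models = set()
    for feature in features:
        models |= models_by_feature.get(feature, set())
    return sorted(features), sorted(models)


features, models = impact_of("hive.sales.orders", FEATURE_SOURCES, MODEL_FEATURES)
print("affected features:", features)
print("affected models:", models)
```

Because Gravitino already manages the underlying tables, a lookup of this shape could start from a schema change on a table and walk forward to every dependent feature and model.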
   
   ### Motivation
   
   Machine learning teams face a critical challenge: managing the lifecycle of 
**features** (the processed, structured data inputs that ML models consume). In 
production environments, feature engineering represents 60-80% of the effort in 
building ML systems, yet features remain one of the least managed assets in the 
data ecosystem.
   
   ### Describe the solution
   
   Add a new `FEATURE` catalog type to Gravitino that manages feature metadata, 
not the feature data itself, and acts as a unified registry and governance layer.
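
As a rough illustration of "metadata, not data", the sketch below models a registry that records what a feature group is and where its values live, without ever storing the values. The class names `Feature` and `FeatureGroup` come from the proposal's entity list, but every field, the `InMemoryFeatureCatalog` class, and its method names are assumptions made for this example. They are not the proposed `FeatureStoreCatalog` interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    name: str
    dtype: str  # logical type only, e.g. "int64" (assumed field)


@dataclass(frozen=True)
class FeatureGroup:
    name: str
    entity: str            # key the features are joined on (assumed field)
    source: str            # pointer to where values live (assumed field)
    features: tuple = ()   # Feature definitions, never feature values


class InMemoryFeatureCatalog:
    """Hypothetical registry holding feature-group metadata only."""

    def __init__(self):
        self._groups = {}

    def create_feature_group(self, group):
        if group.name in self._groups:
            raise ValueError(f"feature group already exists: {group.name}")
        self._groups[group.name] = group
        return group

    def load_feature_group(self, name):
        return self._groups[name]

    def list_feature_groups(self):
        return sorted(self._groups)

    def drop_feature_group(self, name):
        # Returns True if something was removed, False otherwise.
        return self._groups.pop(name, None) is not None


catalog = InMemoryFeatureCatalog()
catalog.create_feature_group(
    FeatureGroup(
        name="user_activity",
        entity="user_id",
        source="hive.sales.orders",
        features=(Feature("user_7d_order_count", "int64"),),
    )
)
print(catalog.list_feature_groups())
print(catalog.load_feature_group("user_activity").source)
```

The registry holds only names, types, and a pointer to the source, so the feature values stay in the underlying store and the catalog remains a governance and discovery layer.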
   
   
   


