roryqi commented on code in PR #10732:
URL: https://github.com/apache/gravitino/pull/10732#discussion_r3077561704
##########
design-docs/external_secret_management_design.md:
##########
@@ -0,0 +1,1402 @@
+# External Secret Management Integration for Apache Gravitino
+
+---
+
+
+## Background
+
+Apache Gravitino currently supports credential vending for cloud storage systems (S3, GCS, ADLS, OSS) through various credential providers. However, sensitive credentials (access keys, secret keys, passwords, API tokens) are currently configured directly in configuration files or catalog properties as plain text, which poses significant security risks in production environments.
+
+Modern cloud-native applications follow the principle of **never storing secrets in database as plain-text or in configuration files** and instead rely on external secret management systems that provide:
+- Centralized secret storage with encryption at rest
+- Fine-grained access control and audit logging
+- Automatic secret rotation
+- Secret versioning and rollback capabilities
+- Integration with cloud IAM systems
+
+### Security Risks Today
+
+1. **Plain Text Catalog Credentials in Database:**
+   When users create catalogs via UI/API, passwords are stored as plain text:
+   ```properties
+   # PostgreSQL Catalog
+   jdbc-password = MyDatabasePassword123
+
+   # MySQL Catalog
+   jdbc-password = SuperSecret456
+
+   # Hive Catalog
+   authentication.kerberos.keytab-uri = /path/to/keytab
+   ```
+   These credentials are stored in Gravitino's backend database in plain text and accessible to anyone with database access.
+
+2. **Lack of Secret Rotation:** When catalog passwords are rotated, users must manually update the catalog, which exposes the new password again as plain text.
+
+3. **Compliance Requirements:** Many industries (finance, healthcare, government) mandate external secret management for SOC2, HIPAA, PCI-DSS compliance.
+
+
+### Industry Best Practices
+
+Many organizations running applications in production already use an external secret management solution, such as:
+- AWS Secrets Manager
+- HashiCorp Vault
+- Azure Key Vault
+- Google Secret Manager
+- Kubernetes Secrets with external secret operators
+
+Gravitino should integrate seamlessly with these existing secret management infrastructures and provide an extensible framework to support any other vault providers.
+
+## Current State
+
+### Understanding CredentialProvider vs SecretProvider
+
+It's important to clarify the relationship between two similar-sounding concepts:
+
+| Component | Purpose | Layer | Changes Needed |
+|-----------|---------|-------|----------------|

Review Comment:
Could you align the table columns?

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
