slfan1989 opened a new issue, #13311: URL: https://github.com/apache/iceberg/issues/13311
### Query engine

Spark, Flink

### Question

# Question

Is there any tool or recommended practice for collecting and managing metadata for all Iceberg tables in a centralized way?

# Iceberg brings us several key advantages:

- Partition management outside of HMS: This significantly reduces the load on the Hive Metastore and helps avoid frequent Full GC issues.
- Comprehensive predicate pushdown: Iceberg supports pushdown on nearly all fields, reducing the amount of data scanned and greatly improving query performance.
- Efficient storage: Using Parquet with ZSTD compression helps us reduce overall storage costs.

# Business background:

Our platform generates a large volume of data on a daily basis, but storage resources are limited. Therefore, users often define TTL (time-to-live) rules to automatically clean up expired data.

## Previously with Hive tables:

- Metadata was stored in HMS;
- We regularly synced the `DBS` and `PARTITIONS` tables from MySQL into a Hive table;
- Based on the partition creation time, we determined whether data had reached TTL;
- If expired:
  - For managed tables: we executed `DROP PARTITION`;
  - For external tables: we first deleted the data from HDFS, then called `DROP PARTITION` to clean up metadata.

## Now with Iceberg tables:

- Each table’s metadata is stored in HDFS (with plans to migrate to S3 in the future);
- We have written a custom program to manage this, which:
  - Lists all Iceberg tables by querying `DBS` from HMS;
  - Locates each table’s `metadata.json` file;
  - Uses Iceberg APIs to read `PartitionData` and extract partition information.
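The Hive-era TTL check described in this issue (compare each partition's creation time against the TTL; for managed tables run `DROP PARTITION`, for external tables delete the HDFS data first and then drop the partition) can be sketched in Java as follows. This is an illustrative sketch under stated assumptions, not the program described in the issue: `TtlPlanner`, `PartitionInfo`, `TableType`, and the assumption that each row already carries the table type, partition spec, and location are all hypothetical. The method only returns the planned commands and executes nothing. Records require Java 16 or later.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch: TtlPlanner, PartitionInfo and TableType are hypothetical names.
final class TtlPlanner {

    // Managed vs. external decides how an expired partition is cleaned up.
    enum TableType { MANAGED, EXTERNAL }

    // One partition row, assumed to be assembled from the synced metastore tables.
    // partSpec is the DROP PARTITION spec, e.g. "dt='2024-01-01'" (assumed format).
    record PartitionInfo(String db, String table, String partSpec,
                         String location, TableType type, Instant createTime) {}

    // Returns the cleanup commands, in execution order, for every partition
    // whose creation time is strictly before (now - ttl). Nothing is executed here.
    static List<String> planCleanup(List<PartitionInfo> partitions, Duration ttl, Instant now) {
        Instant cutoff = now.minus(ttl);
        List<String> actions = new ArrayList<>();
        for (PartitionInfo p : partitions) {
            if (!p.createTime().isBefore(cutoff)) {
                continue; // still within its TTL
            }
            String qualifiedTable = p.db() + "." + p.table();
            if (p.type() == TableType.EXTERNAL) {
                // For external tables DROP PARTITION removes only metadata by default,
                // so the data files are deleted first.
                actions.add("hdfs dfs -rm -r " + p.location());
            }
            actions.add("ALTER TABLE " + qualifiedTable + " DROP PARTITION (" + p.partSpec() + ")");
        }
        return actions;
    }
}
```

Returning the commands instead of running them keeps the expiry decision easy to test and allows a dry run to be reviewed before anything is deleted. The ordering for external tables follows the workflow above: data is removed first, then `DROP PARTITION` cleans up the metadata.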