GonkCoelho opened a new issue, #16238: URL: https://github.com/apache/iceberg/issues/16238
### Apache Iceberg version

1.10.0

### Query engine

Spark

### Please describe the bug 🐞

Hi,

In our project, the ETL process that writes into Iceberg detects whether a partition column has been added or removed. When we want to add or drop a partition column, we run `ALTER TABLE ADD/DROP PARTITION FIELD`.

However, the name of a hidden partition field with a bucket transform depends on how the field was created. If the table is created for the first time via PySpark `createOrReplace`, the field is named "**column_name_bucket**" in the metadata JSON. If the partition did not exist before and we add it with `ALTER TABLE`, the field is named "**column_name_bucket_n**", where n is the n from **bucket(column_name, n)**.

The same difference shows up when querying from Athena:

`SELECT * FROM "database"."table$partitions" limit 10;`

This makes it hard to reliably identify whether the partition has already been added, because there are two different names to check for. Checking for both seems risky from my perspective.

If possible, the column_name_bucket_n naming is perfect for us, and it would be great if it could be standardized.

I also don't know whether this is an issue in Iceberg or in how Spark handles the write. If it is on the Spark side, I can open the issue there.

### Willingness to contribute

- [ ] I can contribute a fix for this bug independently
- [ ] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- [x] I cannot contribute a fix for this bug at this time
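One possible way to avoid depending on the field name at all is to match the partition field by its `source-id` and `transform` in the metadata JSON. The following is only a sketch, assuming the metadata file has already been loaded into a Python dict and follows the format version 2 layout (`schemas`, `partition-specs`, `default-spec-id`). The helper name `bucket_count` and the sample metadata are hypothetical and not part of any Iceberg API.

```python
import re
from typing import Optional


def bucket_count(metadata: dict, column: str) -> Optional[int]:
    """Return N if the default partition spec has a bucket[N] field on `column`, else None.

    The partition field's own name is ignored, so both the "column_name_bucket"
    and "column_name_bucket_n" naming styles are handled the same way.
    """
    # Resolve the column name to its field id using the current schema.
    schema = next(
        s for s in metadata["schemas"]
        if s["schema-id"] == metadata["current-schema-id"]
    )
    source_id = next(
        (f["id"] for f in schema["fields"] if f["name"] == column), None
    )
    if source_id is None:
        return None

    # Look for a bucket transform on that source column in the default spec.
    spec = next(
        s for s in metadata["partition-specs"]
        if s["spec-id"] == metadata["default-spec-id"]
    )
    for field in spec["fields"]:
        match = re.fullmatch(r"bucket\[(\d+)\]", field["transform"])
        if field["source-id"] == source_id and match:
            return int(match.group(1))
    return None


def sample_metadata(partition_field_name: str) -> dict:
    """Build a tiny, hypothetical metadata dict for illustration only."""
    return {
        "current-schema-id": 0,
        "schemas": [{
            "schema-id": 0,
            "fields": [{"id": 1, "name": "column_name",
                        "required": False, "type": "long"}],
        }],
        "default-spec-id": 0,
        "partition-specs": [{
            "spec-id": 0,
            "fields": [{"name": partition_field_name, "transform": "bucket[16]",
                        "source-id": 1, "field-id": 1000}],
        }],
    }


if __name__ == "__main__":
    for name in ("column_name_bucket", "column_name_bucket_16"):
        print(name, bucket_count(sample_metadata(name), "column_name"))
```

Because the check keys on the source column and the transform, it gives the same answer regardless of which code path named the field. It does not address the naming inconsistency itself.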
