maomaodev commented on issue #12790: URL: https://github.com/apache/iceberg/issues/12790#issuecomment-2812014228
> Would that be logically different? The Iceberg.dropTable should only succeed if it is an iceberg table?

I believe that Iceberg depends on the engine (Spark, Trino, Hive, etc.) and, in principle, should not disrupt the engine's behavior.

1. Without Iceberg, Spark deletes the data files of a Hive table even without the `PURGE` keyword; `PURGE` only controls whether the files skip the trash when the table is dropped.

   <img width="1027" alt="Image" src="https://github.com/user-attachments/assets/251e3f15-2e58-47f3-9524-99b9d53b8326" />

2. After introducing Iceberg, dropping a Hive table without the `PURGE` keyword only deletes its metadata, **and the HDFS data files still exist**. If a Hive table with the same name is created again, it throws a "path already exists" exception, which is confusing for users. All existing `DROP TABLE` statements must now be modified to include `PURGE` in order to actually delete a Hive table; with a large number of such business SQLs, modifying them all is a huge workload.

Given this background, I think that when dropping a Hive table we should follow Spark's original deletion behavior: `getSessionCatalog().dropTable(ident)` will call `org.apache.spark.sql.execution.datasources.v2.V2SessionCatalog#dropTable`.
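For illustration, here is a minimal sketch of that delegation, assuming a wrapper catalog holding separate handles on the Iceberg catalog and Spark's session catalog (the class name `DelegatingDropCatalog` and its fields are hypothetical, not the actual `SparkSessionCatalog` code):

```java
import org.apache.spark.sql.connector.catalog.Identifier;
import org.apache.spark.sql.connector.catalog.TableCatalog;

// Sketch only: route Iceberg tables to the Iceberg catalog and let all other
// tables fall through to Spark's V2SessionCatalog, so Hive tables keep
// Spark's original DROP TABLE semantics (data files removed even without PURGE).
class DelegatingDropCatalog {
  private final TableCatalog icebergCatalog;  // handles Iceberg tables
  private final TableCatalog sessionCatalog;  // Spark's V2SessionCatalog

  DelegatingDropCatalog(TableCatalog icebergCatalog, TableCatalog sessionCatalog) {
    this.icebergCatalog = icebergCatalog;
    this.sessionCatalog = sessionCatalog;
  }

  public boolean dropTable(Identifier ident) {
    if (icebergCatalog.tableExists(ident)) {
      // Iceberg tables keep Iceberg's drop semantics.
      return icebergCatalog.dropTable(ident);
    }
    // Non-Iceberg (Hive) tables delegate to
    // org.apache.spark.sql.execution.datasources.v2.V2SessionCatalog#dropTable,
    // preserving Spark's original deletion behavior described above.
    return sessionCatalog.dropTable(ident);
  }
}
```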