maomaodev commented on issue #12790:
URL: https://github.com/apache/iceberg/issues/12790#issuecomment-2812014228

   > Would that be logically different? The Iceberg.dropTable should only 
succeed if it is an iceberg table?
   
   I believe that Iceberg sits on top of an engine (Spark, Trino, Hive, etc.), and in principle it should not change the engine's existing behavior.
   1. Without Iceberg, Spark deletes the data files of a Hive table even without the 'purge' keyword; 'purge' only controls whether the files skip the trash when the table is dropped.
   
   <img width="1027" alt="Image" src="https://github.com/user-attachments/assets/251e3f15-2e58-47f3-9524-99b9d53b8326" />
   
   2. Now, after introducing Iceberg, dropping a Hive table without the 'purge' keyword only deletes the metadata, **and the HDFS data files still exist**. If a Hive table with the same name is created again, Spark throws a "path already exists" exception, which can be quite confusing for users. After introducing Iceberg, all existing `drop table` SQLs must be modified to include 'purge' in order to fully delete a Hive table; when a business has a large number of such SQLs, modifying them is undoubtedly a huge workload (see the SQL sketch after this list).
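
   To make the divergence concrete, here is a minimal SQL sketch of the two behaviors described above (the table name and HDFS location are hypothetical):

   ```sql
   -- Hypothetical table, for illustration only.
   CREATE TABLE db.demo (id INT) STORED AS PARQUET
   LOCATION 'hdfs:///warehouse/db.db/demo';

   -- Without Iceberg: drops the metadata AND moves the data files to trash.
   DROP TABLE db.demo;

   -- With Iceberg's SparkSessionCatalog: drops only the metadata; the HDFS
   -- files remain, so recreating the table at the same location fails with
   -- a "path already exists" error.
   DROP TABLE db.demo;

   -- Current workaround: 'purge' must be added to also delete the data files.
   DROP TABLE db.demo PURGE;
   ```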
   
   Based on the above, I think that when dropping a Hive table we should follow Spark's original deletion behavior: `getSessionCatalog().dropTable(ident)` will call `org.apache.spark.sql.execution.datasources.v2.V2SessionCatalog#dropTable`, as sketched below.
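
   A rough sketch of what that delegation could look like, assuming a simplified wrapper around the two catalogs (the class name `DelegatingDropCatalog` and its fields are hypothetical; the real `SparkSessionCatalog` does much more than this):

   ```java
   import org.apache.spark.sql.connector.catalog.Identifier;
   import org.apache.spark.sql.connector.catalog.TableCatalog;

   // Hypothetical, simplified illustration of the proposed delegation; the
   // real SparkSessionCatalog has many more responsibilities.
   class DelegatingDropCatalog {
     private final TableCatalog icebergCatalog; // Iceberg's own catalog
     private final TableCatalog sessionCatalog; // Spark's V2SessionCatalog

     DelegatingDropCatalog(TableCatalog icebergCatalog, TableCatalog sessionCatalog) {
       this.icebergCatalog = icebergCatalog;
       this.sessionCatalog = sessionCatalog;
     }

     public boolean dropTable(Identifier ident) {
       // Iceberg tables keep Iceberg's drop semantics.
       if (icebergCatalog.tableExists(ident)) {
         return icebergCatalog.dropTable(ident);
       }
       // Plain Hive tables fall through to V2SessionCatalog#dropTable, so
       // Spark's original behavior is preserved: data files are deleted
       // (moved to trash) even without 'purge'.
       return sessionCatalog.dropTable(ident);
     }
   }
   ```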

