maomaodev opened a new issue, #12790:
URL: https://github.com/apache/iceberg/issues/12790

   ### Query engine
   
   Spark 3.4.2
   Iceberg 1.4.3
   
   ### Question
   
   1. After [SPARK-43203](https://issues.apache.org/jira/browse/SPARK-43203), Spark moved all DROP TABLE cases to DataSource V2.
   2. We use Iceberg's SparkSessionCatalog to replace Spark's built-in catalog. With this setup, Spark no longer deletes the data files of a Hive table when it is dropped without PURGE. **This behavior is inconsistent with Spark's original behavior** (see the reproduction sketch after the config below).
   ```
   spark.sql.catalog.spark_catalog         org.apache.iceberg.spark.SparkSessionCatalog
   spark.sql.catalog.spark_catalog.type    hive
   spark.sql.extensions                    org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
   ```
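   For concreteness, here is a minimal reproduction sketch (my own illustration, not code from Spark or Iceberg), assuming a plain non-Iceberg managed Hive table named `default.t` and a reachable Hive Metastore:
   ```
   import org.apache.spark.sql.SparkSession;

   public class DropTableRepro {
     public static void main(String[] args) {
       SparkSession spark = SparkSession.builder()
           .appName("drop-table-repro")
           .config("spark.sql.catalog.spark_catalog", "org.apache.iceberg.spark.SparkSessionCatalog")
           .config("spark.sql.catalog.spark_catalog.type", "hive")
           .config("spark.sql.extensions", "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
           .enableHiveSupport()
           .getOrCreate();

       // A plain (non-Iceberg) managed Hive table with one row.
       spark.sql("CREATE TABLE default.t (id INT) USING hive");
       spark.sql("INSERT INTO default.t VALUES (1)");

       // With Spark's built-in catalog this also removes the table's data files;
       // with SparkSessionCatalog only the metadata is dropped and the files
       // remain unless DROP TABLE ... PURGE is used.
       spark.sql("DROP TABLE default.t");
     }
   }
   ```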
   3. I reviewed the relevant code and found that the `SparkSessionCatalog#dropTable` method calls `icebergCatalog.dropTable(ident)` to perform the operation, which eventually invokes the Hive Metastore interface with `deleteData=false`. In Spark, however, this parameter is set to true, which is why the data files of the Hive table are not deleted here.
   ```
   public boolean dropTable(Identifier ident) {
       // The Iceberg catalog is tried first; only if it does not own the table
       // does the drop fall through to the Spark session catalog.
       return icebergCatalog.dropTable(ident) || getSessionCatalog().dropTable(ident);
   }
   ```
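   For reference, the underlying Hive Metastore calls differ roughly as follows (a paraphrase of the two code paths, not verbatim from either code base; `client` is an `IMetaStoreClient`):
   ```
   // Iceberg's HiveCatalog path: only the metadata is dropped. The third
   // argument is the deleteData flag of IMetaStoreClient#dropTable.
   client.dropTable(databaseName, tableName,
       false /* deleteData: keep the data files */,
       false /* ignoreUnknownTab */);

   // Spark's built-in Hive catalog path for a managed table: deleteData = true,
   // so the same DROP TABLE removes the data files as well.
   client.dropTable(databaseName, tableName,
       true /* deleteData: remove the data files */,
       false /* ignoreUnknownTab */);
   ```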
   I would like to ask: why isn't the deletion of non-Iceberg tables performed through `getSessionCatalog().dropTable(ident)`? Are there any design considerations behind this? A sketch of the alternative I have in mind follows below.
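   To make the question concrete, here is a hypothetical variant of `dropTable` (my own sketch, not a proposed patch; it relies on Spark's `TableCatalog#loadTable` throwing `NoSuchTableException` for tables the Iceberg catalog does not manage) that would let non-Iceberg tables take Hive's `deleteData=true` path:
   ```
   // Hypothetical sketch only: delegate the drop of non-Iceberg tables to the
   // Spark session catalog, preserving Spark's original behavior.
   public boolean dropTable(Identifier ident) {
     try {
       icebergCatalog.loadTable(ident); // throws NoSuchTableException for non-Iceberg tables
       return icebergCatalog.dropTable(ident);
     } catch (NoSuchTableException e) {
       return getSessionCatalog().dropTable(ident);
     }
   }
   ```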

