maomaodev opened a new issue, #12790: URL: https://github.com/apache/iceberg/issues/12790
### Query engine

Spark 3.4.2, Iceberg 1.4.3

### Question

1. After [SPARK-43203](https://issues.apache.org/jira/browse/SPARK-43203), Spark moved all DROP TABLE cases to DataSource V2.

2. We use Iceberg's SparkSessionCatalog to replace Spark's built-in catalog. With this configuration, dropping a Hive table without PURGE no longer deletes its data files. **This behavior is inconsistent with Spark's original behavior.**

   ```
   spark.sql.catalog.spark_catalog        org.apache.iceberg.spark.SparkSessionCatalog
   spark.sql.catalog.spark_catalog.type   hive
   spark.sql.extensions                   org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions
   ```

3. I reviewed the relevant code and found that `SparkSessionCatalog#dropTable` calls `icebergCatalog.dropTable(ident)`, which ultimately invokes the Hive Metastore drop-table interface with `deleteData=false`. In Spark, this parameter is set to `true`, which is why the Hive table's data files are not deleted. (See the sketch after this list for an illustration of the alternative being asked about.)

   ```
   public boolean dropTable(Identifier ident) {
     return icebergCatalog.dropTable(ident) || getSessionCatalog().dropTable(ident);
   }
   ```

I would like to ask: why isn't the deletion performed via `getSessionCatalog().dropTable(ident)`? Are there design considerations behind this?
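For concreteness, below is a minimal sketch (not Iceberg's actual implementation) of the alternative the question implies: determine whether the identifier resolves to an Iceberg table and, only if it does not, delegate the drop to Spark's session catalog, whose Hive path passes `deleteData=true`. The `loadTable`-based dispatch and the exception handling here are assumptions for illustration only.

```
// Hypothetical variant of SparkSessionCatalog#dropTable, for discussion only.
// Assumes icebergCatalog and getSessionCatalog() exist as in the real class,
// and that loadTable throws NoSuchTableException for non-Iceberg tables.
@Override
public boolean dropTable(Identifier ident) {
  try {
    // Identifier resolves to an Iceberg table: let the Iceberg catalog drop
    // it (the path that ends up calling Hive with deleteData=false).
    icebergCatalog.loadTable(ident);
    return icebergCatalog.dropTable(ident);
  } catch (org.apache.spark.sql.catalyst.analysis.NoSuchTableException e) {
    // Not an Iceberg table: fall back to Spark's built-in session catalog,
    // which drops the Hive table with deleteData=true and removes its files.
    return getSessionCatalog().dropTable(ident);
  }
}
```

Whether something like this matches the maintainers' intent is exactly the question; the current `||` chain may be deliberate, for example to avoid an extra metastore round trip on every drop.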