chinnaraolalam opened a new issue, #10157: URL: https://github.com/apache/iceberg/issues/10157
### Apache Iceberg version

1.4.3

### Query engine

Spark

### Please describe the bug 🐞

Drop table purge issue for parquet tables with **SparkSessionCatalog**. This was identified in our environment with Iceberg 1.4.3 + Spark 3.4.1 + SPARK-43203 (applied as a patch).

**CASE 1**: Launch a spark-sql session with the default **SessionCatalog**. Dropping a non-Iceberg table such as a parquet table purges the data and leaves nothing on disk.

**CASE 2**: Launch a spark-sql session with **SparkSessionCatalog**. Dropping a non-Iceberg table such as a parquet table does not purge the data, and the data is left behind until it is cleaned up manually. Creating a table with the same name then fails.

The behaviour of parquet tables differs between CASE 1 and CASE 2; launching spark-sql with **SparkSessionCatalog** changes the behaviour of non-Iceberg tables, which is the issue.

**Tested in cluster**

CASE 1: Spark session launched with the default Spark catalog

```sql
CREATE TABLE parquettable (id bigint, data string) USING parquet;
INSERT INTO parquettable VALUES (1, 'A'), (2, 'B'), (3, 'C');
SELECT id, data FROM parquettable WHERE length(data) = 1;
DROP TABLE parquettable;
CREATE TABLE parquettable (id bigint, data string) USING parquet; -- this query SUCCEEDS
```

CASE 2: Spark session launched with the Iceberg SparkSessionCatalog (`--conf spark.sql.catalog.spark_catalog=org.apache.iceberg.spark.SparkSessionCatalog`)

```sql
CREATE TABLE parquettable (id bigint, data string) USING parquet;
INSERT INTO parquettable VALUES (1, 'A'), (2, 'B'), (3, 'C');
SELECT id, data FROM parquettable WHERE length(data) = 1;
DROP TABLE parquettable;
CREATE TABLE parquettable (id bigint, data string) USING parquet; -- this query fails with [LOCATION_ALREADY_EXISTS] because the drop did not purge the data
```

The same scenario behaves differently in the two sessions. A session launched with the Iceberg SparkSessionCatalog effectively forces purge-off semantics when dropping non-Iceberg tables. IIUC, SparkSessionCatalog is meant to work for both Iceberg and non-Iceberg tables, so for non-Iceberg tables it should fall back to the default Spark catalog and behave the same as Spark. In the case above, however, creating a parquet table with the same name fails because the purge did not happen (this is not Spark's behaviour). Only Iceberg tables are supposed to have purge off by default.

On further analysis, my suspicion is that `SparkSessionCatalog.dropTable(Identifier ident)` drops the table via `icebergCatalog.dropTable(ident)`, where purge is off, and returns from there, so purge-off is what gets passed on to Spark (I guess this is related to https://issues.apache.org/jira/browse/SPARK-43203, which was done in 3.4.2).

To fix this, update `SparkSessionCatalog.dropTable(Identifier ident)` as below:

```java
public boolean dropTable(Identifier ident) {
  if (icebergCatalog.tableExists(ident)) {
    return icebergCatalog.dropTable(ident);
  } else {
    return getSessionCatalog().dropTable(ident);
  }
}
```

To reproduce this issue on the main branch (Spark 3.5 is the default), I reverted the purge in the tests added as part of https://github.com/apache/iceberg/pull/9187. Multiple tests fail, and after updating `dropTable` as above all tests pass. I created a patch for the same (this is to demonstrate the issue, not the final fix).

Still need to check:
1. What is the behaviour of Iceberg tables now.
2. What about the other APIs of SparkSessionCatalog (see the sketch below).
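On point 2, here is a minimal sketch of how the same existence-check fallback could be applied to `purgeTable`, which `TableCatalog` also exposes. It reuses the `icebergCatalog`, `getSessionCatalog()`, and `tableExists` names from the `dropTable` snippet above and is only an illustration of the idea, not a verified change to SparkSessionCatalog:

```java
// Hypothetical sketch only: mirror the proposed dropTable fallback for purgeTable,
// so that non-Iceberg tables keep the built-in session catalog's purge behaviour.
@Override
public boolean purgeTable(Identifier ident) {
  if (icebergCatalog.tableExists(ident)) {
    // Iceberg tables keep Iceberg's own drop/purge semantics
    return icebergCatalog.purgeTable(ident);
  } else {
    // non-Iceberg tables fall back to the Spark session catalog
    return getSessionCatalog().purgeTable(ident);
  }
}
```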