Muskan-m opened a new issue, #10779: URL: https://github.com/apache/iceberg/issues/10779
### Apache Iceberg version

1.2.0

### Query engine

Spark

### Please describe the bug 🐞

I created two tables that share the same location, then dropped one of them. The drop removed the data from that location, causing data loss: I can no longer query the other table, and I am unable to recover the dropped table's data.

Question: once a table is dropped and its data is removed from its location by the DROP TABLE action, where can I find that data? I expected it to be stored temporarily for some time, the way Hive keeps dropped data in its trash. We are using CDP 7.1.9.

We have one existing Iceberg table:

```scala
scala> spark.sql("show create table proc_mes_qdata.mesdp_archive").show(false)
```

which returns the following `createtab_stmt`:

```sql
CREATE TABLE spark_catalog.proc_mes_qdata.mesdp_archive (
  source_type STRING COMMENT 'type of source data: partdetails or componenttrace or groups, used for partitioning',
  insert_date STRING COMMENT 'Date when data is inserted in Hive table, used for partitioning',
  kafka_offset BIGINT COMMENT 'offset from source Kafka topic',
  kafka_topic STRING COMMENT 'source Kafka topic name',
  key STRING COMMENT 'key from source Kafka topic',
  value STRING COMMENT 'message from source Kafka topic',
  kafka_partition BIGINT COMMENT 'partition from source Kafka topic',
  kafka_timestamp TIMESTAMP COMMENT 'timestamp from source Kafka topic',
  kafka_timestampType BIGINT COMMENT 'timestampType from source Kafka topic')
USING iceberg
PARTITIONED BY (source_type, insert_date)
COMMENT 'MES Data Publisher - Storing raw messages of partdetails,componenttrace and groups, partitioned by column source_type and insert_date'
LOCATION '/proc/mes_qdata/db/mesdp_archive'
TBLPROPERTIES (
  'current-snapshot-id' = '8590217566145417146',
  'format' = 'iceberg/parquet',
  'format-version' = '1')
```

I then created another temp table on the same path, with this `createtab_stmt`:
```sql
CREATE TABLE spark_catalog.proc_mes_qdata.mesdp_archive_test (
  source_type STRING COMMENT 'type of source data: partdetails or componenttrace or groups, used for partitioning',
  insert_date STRING COMMENT 'Date when data is inserted in Hive table, used for partitioning',
  kafka_offset BIGINT COMMENT 'offset from source Kafka topic',
  kafka_topic STRING COMMENT 'source Kafka topic name',
  key STRING COMMENT 'key from source Kafka topic',
  value STRING COMMENT 'message from source Kafka topic',
  kafka_partition BIGINT COMMENT 'partition from source Kafka topic',
  kafka_timestamp TIMESTAMP COMMENT 'timestamp from source Kafka topic',
  kafka_timestampType BIGINT COMMENT 'timestampType from source Kafka topic')
USING iceberg
PARTITIONED BY (source_type, insert_date)
COMMENT 'MES Data Publisher - Storing raw messages of partdetails,componenttrace and groups, partitioned by column source_type and insert_date'
LOCATION '/proc/mes_qdata/db/mesdp_archive'
TBLPROPERTIES (
  'current-snapshot-id' = '8590217566145417146',
  'format' = 'iceberg/parquet',
  'format-version' = '1')
```

I then ran the commands below:

```scala
scala> spark.sql("""CALL aeanpprod.system.add_files(table => 'proc_mes_qdata.mesdp_archive_test', source_table => '`parquet`.`/proc/mes_qdata/db/mesdp_archive/data/source_type=componenttrace/`')""")

scala> spark.sql("select max(insert_date) from proc_mes_qdata.mesdp_archive_test").show(false)
+----------------+
|max(insert_date)|
+----------------+
|2024-07-24      |
+----------------+

scala> spark.sql("select min(insert_date) from proc_mes_qdata.mesdp_archive_test").show(false)
+----------------+
|min(insert_date)|
+----------------+
|2023-10-20      |
+----------------+
```

Before executing the DROP TABLE command shown further below, I had data in my path:

```
[t_mes_qdata_proc@an0vm004 ~]$ hdfs dfs -ls /proc/mes_qdata/db/mesdp_archive/data/source_type=componenttrace
Found 269 items
```
```
/proc/mes_qdata/db/mesdp_archive/data/source_type=componenttrace/insert_date=2024-02-19
drwxrwx---+ - t_mes_qdata_proc hive 0 2024-02-21 00:50 /proc/mes_qdata/db/mesdp_archive/data/source_type=componenttrace/insert_date=2024-02-20
drwxrwx---+ - t_mes_qdata_proc hive 0 2024-02-22 00:50 /proc/mes_qdata/db/mesdp_archive/data/source_type=componenttrace/insert_date=2024-02-21
drwxrwx---+ - t_mes_qdata_proc hive 0 2024-02-23 00:50 /proc/mes_qdata/db/mesdp_archive/data/source_type=componenttrace/insert_date=2024-02-22
drwxrwx---+ - t_mes_qdata_proc hive 0 2024-02-24 00:50 /proc/mes_qdata/db/mesdp_archive/data/source_type=componenttrace/insert_date=2024-02-23
drwxrwx---+ - t_mes_qdata_proc hive 0 2024-02-25 00:50 /proc/mes_qdata/db/mesdp_archive/data/source_type=componenttrace/insert_date=2024-02-24
drwxrwx---+ - t_mes_qdata_proc hive 0 2024-02-26 05:07 /proc/mes_qdata/db/mesdp_archive/data/source_type=componenttrace/insert_date=2024-02-25
drwxrwx---+ - t_mes_qdata_proc hive 0 2024-02-27 00:50 /proc/mes_qdata/db/mesdp_archive/data/source_type=componenttrace/insert_date=2024-02-26
drwxrwx---+ - t_mes_qdata_proc hive 0 2024-02-28 00:50 /proc/mes_qdata/db/mesdp_archive/data/source_type=componenttrace/insert_date=2024-02-27
..............
```
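For context on why the drop was destructive: after the `add_files` call, the temp table's metadata references the same physical Parquet files as the main table. If it helps with diagnosis, the overlap can be inspected through Iceberg's `files` metadata table in Spark (a sketch using the table names from this report; the expected result is described in the comment, not captured output):

```sql
-- Both queries should list file_path values under the same shared location,
-- /proc/mes_qdata/db/mesdp_archive/data/..., showing that the two tables
-- point at the same physical data files.
SELECT file_path FROM spark_catalog.proc_mes_qdata.mesdp_archive.files LIMIT 10;
SELECT file_path FROM spark_catalog.proc_mes_qdata.mesdp_archive_test.files LIMIT 10;
```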
I then executed the drop command for the temp table:

```scala
scala> spark.sql("drop table proc_mes_qdata.mesdp_archive_test").show(false)
24/07/24 10:54:16 WARN conf.HiveConf: [main]: HiveConf of name hive.cluster.delegation.token.renew-interval does not exist
24/07/24 10:54:16 WARN conf.HiveConf: [main]: HiveConf of name hive.metastore.runworker.in does not exist
24/07/24 10:54:16 WARN conf.HiveConf: [main]: HiveConf of name hive.cluster.delegation.key.update-interval does not exist
24/07/24 10:54:16 WARN conf.HiveConf: [main]: HiveConf of name hive.masking.algo does not exist
24/07/24 10:54:16 WARN conf.HiveConf: [main]: HiveConf of name hive.cluster.delegation.token.max-lifetime does not exist
24/07/24 10:54:16 WARN conf.HiveConf: [main]: HiveConf of name hive.cluster.delegation.token.gc-interval does not exist
++
||
++
++
```

My data was deleted from the HDFS paths below. The partition directories were kept, but no Parquet files remain inside them:

```
[t_mes_qdata_proc@an0vm004 ~]$ hdfs dfs -ls /proc/mes_qdata/db/mesdp_archive/data/source_type=componenttrace
Found 269 items
/db/mesdp_archive/data/source_type=componenttrace/insert_date=2024-07-14
drwxrwx---+ - t_mes_qdata_proc hive 0 2024-07-24 10:54 /proc/mes_qdata/db/mesdp_archive/data/source_type=componenttrace/insert_date=2024-07-15
drwxrwx---+ - t_mes_qdata_proc hive 0 2024-07-24 10:54 /proc/mes_qdata/db/mesdp_archive/data/source_type=componenttrace/insert_date=2024-07-16
drwxrwx---+ - t_mes_qdata_proc hive 0 2024-07-24 10:54 /proc/mes_qdata/db/mesdp_archive/data/source_type=componenttrace/insert_date=2024-07-17
drwxrwx---+ - t_mes_qdata_proc hive 0 2024-07-24 10:54 /proc/mes_qdata/db/mesdp_archive/data/source_type=componenttrace/insert_date=2024-07-18
drwxrwx---+ - t_mes_qdata_proc hive 0 2024-07-24 10:54 /proc/mes_qdata/db/mesdp_archive/data/source_type=componenttrace/insert_date=2024-07-19
drwxrwx---+ - t_mes_qdata_proc hive 0 2024-07-24 10:54
```
```
/proc/mes_qdata/db/mesdp_archive/data/source_type=componenttrace/insert_date=2024-07-22
drwxrwx---+ - t_mes_qdata_proc hive 0 2024-07-24 10:54 /proc/mes_qdata/db/mesdp_archive/data/source_type=componenttrace/insert_date=2024-07-23
..................
```

This should not happen according to the Apache Iceberg documentation: https://iceberg.apache.org/docs/latest/spark-ddl/#drop-table

We have not dropped the main table; only the temp table created in this spark-shell was dropped. I also checked the Hive trash path and could not find any trace of my table:

```
hdfs dfs -ls /user/hive/.Trash
```

There were also no logs or traces of my temp table in HiveServer (screenshot attached).

Expectations: I need to know why this data was deleted, and whether there is any trash location for Iceberg tables where dropped data can reside temporarily for some time.

### Willingness to contribute

- [ ] I can contribute a fix for this bug independently
- [X] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- [ ] I cannot contribute a fix for this bug at this time

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org
For queries about this service, please contact Infrastructure at: us...@infra.apache.org
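A possible safeguard worth noting for anyone in a similar situation (a sketch, not verified on this CDP setup): Iceberg tables expose a `gc.enabled` table property, and setting it to `false` is intended to block operations that physically delete the table's files, such as a purging drop. Applying it to a scratch table that points at a shared location before that table is ever dropped could prevent this kind of loss. A hypothetical command using the table names from this report:

```sql
-- Assumption: 'gc.enabled' is the standard Iceberg table property that,
-- when set to 'false', prevents file-deleting operations (e.g. a purging
-- DROP TABLE) from removing the table's data and metadata files.
ALTER TABLE spark_catalog.proc_mes_qdata.mesdp_archive_test
SET TBLPROPERTIES ('gc.enabled' = 'false');
```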