paultipper opened a new issue, #11953: URL: https://github.com/apache/iceberg/issues/11953
### Apache Iceberg version

1.6.1

### Query engine

Spark

### Please describe the bug 🐞

I'm trying to use the Apache Spark MERGE INTO command to add/update some data from a source data frame into an Apache Iceberg table within an AWS Glue table, using an AWS Glue job running Spark 3.5. If the source data frame is empty, then all of the existing data in the target table is deleted.

Here is a sample of the Python code I'm using to do this:

```
# df is a data frame of the source data, and is passed into this code block
df.createOrReplaceTempView("source_data")

# Get start year, month and day from start_date, which is a datetime object passed into this code block
year = start_date.year
month = start_date.month
day = start_date.day
print(f"start_date: {start_date}, year: {year}, month: {month}, day: {day}")

# Generate the WHERE part of the statement
where_clause = f"WHERE year >= {year} AND (year > {year} OR month >= {month}) AND (year > {year} OR month > {month} OR day >= {day})"
selected_df = spark.sql(f"SELECT * FROM source_data {where_clause}")
logger.info(f"New CSV rows selected for merging: {selected_df.count()}")
selected_df.createOrReplaceTempView("new_data")

# Merge the selected rows into the target table
# (the spark.sql wrapper is assumed here; the sample showed only the bare statement)
spark.sql("""
    MERGE INTO iceberg_catalog.db.target_table t
    USING new_data AS s
    ON (t.surrogate_key = s.surrogate_key)
    WHEN MATCHED THEN UPDATE SET *
    WHEN NOT MATCHED THEN INSERT *
""")
```

Before the MERGE INTO operation, the target table contains 8246 rows, and I've established that the number of rows in the `selected_df` data frame was 0. My expectation is that merging `selected_df` into the target table should leave the target table with the same data as before, but I found that, after the MERGE INTO operation, the target table was empty.
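Until the cause is understood, one possible stopgap is to skip the merge entirely when the source has no rows. This is only a minimal sketch and does not address the underlying behaviour. `merge_if_source_has_rows` is a hypothetical helper name, and the sketch assumes `spark` is the active Spark session and that the MERGE INTO statement is available as a string.

```python
def merge_if_source_has_rows(spark, source_df, merge_sql):
    """Submit merge_sql only when source_df contains at least one row.

    Hypothetical helper (not part of Spark or Iceberg). Returns True if the
    statement was submitted and False if it was skipped.
    """
    # take(1) fetches at most one row, which is cheaper than a full count().
    if len(source_df.take(1)) == 0:
        return False
    spark.sql(merge_sql)
    return True
```

With the sample above, it would be called as `merge_if_source_has_rows(spark, selected_df, merge_statement)`, where `merge_statement` is an assumed variable holding the MERGE INTO text.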
As I say, my assumption is that the MERGE INTO command will add any rows in `selected_df` that do not already exist in the target table, update any rows that do exist, and leave in place any rows in the target table that are not in `selected_df`. Is my assumption incorrect?

### Willingness to contribute

- [ ] I can contribute a fix for this bug independently
- [ ] I would be willing to contribute a fix for this bug with guidance from the Iceberg community
- [X] I cannot contribute a fix for this bug at this time
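To make the expectation about MERGE INTO concrete, here is a plain-Python model of the semantics assumed in the question (update on match, insert otherwise, leave everything else alone). It is an illustration only, not how Spark or Iceberg implement the command; `merge_into` and the sample rows are hypothetical.

```python
def merge_into(target, source, key="surrogate_key"):
    """Model the expected result of
    WHEN MATCHED THEN UPDATE SET * / WHEN NOT MATCHED THEN INSERT *.

    Rows are plain dicts. Simplification: if several source rows share a key,
    the last one wins here, whereas a real MERGE may reject that input.
    """
    # Start from a copy of every target row, indexed by the join key.
    merged = {row[key]: dict(row) for row in target}
    for row in source:
        # Matched key: the row is replaced. Unmatched key: the row is added.
        merged[row[key]] = dict(row)
    return list(merged.values())


target = [
    {"surrogate_key": 1, "value": "a"},
    {"surrogate_key": 2, "value": "b"},
]
source = [
    {"surrogate_key": 2, "value": "B"},  # updates the existing row
    {"surrogate_key": 3, "value": "c"},  # inserted as a new row
]

print(merge_into(target, source))
# Under this model, an empty source leaves the target rows unchanged.
print(merge_into(target, []))
```

Under this model, the second call returns the original two rows, which is the outcome expected from the real MERGE INTO when `selected_df` is empty.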