dalton-hall-cw opened a new issue, #7492: URL: https://github.com/apache/iceberg/issues/7492
### Query engine

PySpark 3.3.1

### Question

We need to optimize the code that loads data into our database. Each week we receive the full history of the data from our vendor, but we only need to upload the difference between the weekly file and what is already in the database. Currently, computing that difference takes increasingly longer with each file loaded, and we suspect this is due to the way we apply the differences with Spark. We would appreciate any advice or guidance on the most efficient way to "upsert" data into an Iceberg table.
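One common answer to this question is to push the diffing into the engine with Iceberg's `MERGE INTO` support in Spark SQL, rather than computing the delta in application code. Below is a minimal sketch; the table name `db.target_table`, the view name `weekly_updates`, and the columns `id`/`payload` are hypothetical placeholders, not names from the original question.

```python
# Sketch: upsert a weekly full-history file into an Iceberg table with
# MERGE INTO, letting Spark/Iceberg compute the row-level differences.
# All identifiers below (db.target_table, weekly_updates, id, payload)
# are illustrative placeholders.

merge_sql = """
MERGE INTO db.target_table AS t
USING weekly_updates AS s
ON t.id = s.id
WHEN MATCHED AND t.payload <> s.payload THEN UPDATE SET *
WHEN NOT MATCHED THEN INSERT *
"""

# With a SparkSession configured for Iceberg (catalog + SQL extensions),
# the weekly file would be registered as a temp view and merged, e.g.:
#
#   weekly_df = spark.read.parquet("s3://bucket/weekly_file.parquet")
#   weekly_df.createOrReplaceTempView("weekly_updates")
#   spark.sql(merge_sql)
```

The `WHEN MATCHED AND t.payload <> s.payload` guard means unchanged rows are neither rewritten nor produce new data files, which is usually the main cost driver when re-loading a full snapshot every week. Regularly compacting small files and expiring old snapshots also helps keep merge planning time from growing with each load.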
