dalton-hall-cw opened a new issue, #7492:
URL: https://github.com/apache/iceberg/issues/7492

   ### Query engine
   
   PySpark 3.3.1
   
   ### Question
   
   We need to optimize the code that loads data into our database. 
Specifically, we receive the full history of the data from our vendor every 
week, but we only need to upload the difference between the weekly file and 
what is already in the database.
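   For context, the diff being computed here is essentially a set difference on full rows: keep only the weekly rows that are new or changed relative to what is stored. A minimal pure-Python sketch of that logic (the column names `id`/`value` are hypothetical; in PySpark the same comparison is typically expressed as `weekly_df.join(stored_df, on=all_columns, how="left_anti")`):

   ```python
   def weekly_diff(weekly_rows, stored_rows):
       """Return rows from the weekly file that are new or changed
       relative to the stored rows (a left anti-join on all columns)."""
       stored = {tuple(sorted(r.items())) for r in stored_rows}
       return [r for r in weekly_rows if tuple(sorted(r.items())) not in stored]

   weekly = [{"id": 1, "value": "a"}, {"id": 2, "value": "b"}]
   stored = [{"id": 1, "value": "a"}]
   weekly_diff(weekly, stored)  # -> [{"id": 2, "value": "b"}]
   ```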
   
   Currently, checking the differences between the file and the database takes 
increasingly longer with each file uploaded. We suspect that this is due to the 
way we are adding these differences using Spark.
   
   We would appreciate any advice or guidance on the most efficient way to 
"upsert" data into an Iceberg table.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to