deepika094 commented on issue #1759: URL: https://github.com/apache/iceberg-python/issues/1759#issuecomment-2847371843
Hey guys, I still have a similar issue. I have around 5 million rows for a given day. I ran the process once and it inserted the data. But let's say I want to rerun the process using upsert: the process just takes forever. I tried batching it with a batch size of 1000, and it progressed, but that's too time-consuming. Do we have any recommendations for that? I have 8 columns, 4 of which are primary keys. Even though the data hasn't changed, upsert takes too much time to run.
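A minimal sketch of the batching approach described above. It assumes PyIceberg's `Table.upsert(df, join_cols=...)` (available in recent PyIceberg releases) and a `pyarrow.Table` input, which provides `num_rows` and `slice(offset, length)`. The helper `upsert_in_batches` and the column names in the usage line are hypothetical. It only illustrates the per-batch loop and is not a fix for the slowness: every batch is still a separate `upsert` call against the table.

```python
from typing import Any, Protocol, Sequence


class SupportsSlice(Protocol):
    """Anything shaped like a pyarrow.Table: has num_rows and slice(offset, length)."""

    num_rows: int

    def slice(self, offset: int, length: int) -> Any: ...


def upsert_in_batches(
    table: Any,  # assumed: a PyIceberg Table exposing upsert(df, join_cols=...)
    df: SupportsSlice,
    join_cols: Sequence[str],
    batch_size: int = 1000,
) -> int:
    """Call table.upsert() on consecutive slices of df; return how many calls were made."""
    calls = 0
    for offset in range(0, df.num_rows, batch_size):
        # pyarrow's Table.slice truncates at the end, so the last batch may be shorter
        batch = df.slice(offset, batch_size)
        table.upsert(batch, join_cols=list(join_cols))
        calls += 1
    return calls
```

Hypothetical usage, with four placeholder names standing in for the primary-key columns: `upsert_in_batches(tbl, df, join_cols=["key_a", "key_b", "key_c", "key_d"])`. At 5 million rows and a batch size of 1000, this makes 5,000 separate upsert calls, which helps explain why the batched run is still slow.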