Re: [I] Upserting large table extremely slow [iceberg-python]

2025-07-03 Thread via GitHub
jayceslesar commented on issue #2159: URL: https://github.com/apache/iceberg-python/issues/2159#issuecomment-3033347997 @Fokko @kevinjqliu do you think it's worth setting up a roadmap for what should be candidates for rolling wheels from rust? Would really help focus efforts on lacking part

Re: [I] Upserting large table extremely slow [iceberg-python]

2025-07-03 Thread via GitHub
koenvo commented on issue #2159: URL: https://github.com/apache/iceberg-python/issues/2159#issuecomment-309213 Totally agree. Let's start exploring the iceberg-rust codebase.

Re: [I] Upserting large table extremely slow [iceberg-python]

2025-07-03 Thread via GitHub
jayceslesar commented on issue #2159: URL: https://github.com/apache/iceberg-python/issues/2159#issuecomment-3033171165 > Honestly, I think it would be a better use of community resources to invest more in the iceberg-rust/datafusion path so that the bulk of this logic can be moved out

Re: [I] Upserting large table extremely slow [iceberg-python]

2025-07-03 Thread via GitHub
corleyma commented on issue #2159: URL: https://github.com/apache/iceberg-python/issues/2159#issuecomment-3032961475 I think @Anton-Tarazi's original point -- creating a bunch of (Python object) filter expressions for every row in a large dataframe is going to be slow, and we do that befor

Re: [I] Upserting large table extremely slow [iceberg-python]

2025-07-03 Thread via GitHub
Fokko commented on issue #2159: URL: https://github.com/apache/iceberg-python/issues/2159#issuecomment-3031132431 Hey @koenvo thanks for raising this discussion. Nothing is set in stone, so there are always possibilities to optimize, and I agree, we started with rough building blocks.

Re: [I] Upserting large table extremely slow [iceberg-python]

2025-07-02 Thread via GitHub
koenvo commented on issue #2159: URL: https://github.com/apache/iceberg-python/issues/2159#issuecomment-3026872310 This aligns well with the discussion here: https://github.com/apache/iceberg-python/issues/2138#issuecomment-2997190853 While there have been improvements to `upsert` -

Re: [I] Upserting large table extremely slow [iceberg-python]

2025-06-28 Thread via GitHub
Anton-Tarazi commented on issue #2159: URL: https://github.com/apache/iceberg-python/issues/2159#issuecomment-3016154418 Being able to provide a "hint" seems like a decent workaround, but then we have to rely on the user providing the correct filter, otherwise the upsert won't work properl

Re: [I] Upserting large table extremely slow [iceberg-python]

2025-06-28 Thread via GitHub
jayceslesar commented on issue #2159: URL: https://github.com/apache/iceberg-python/issues/2159#issuecomment-3016142200 Haha I was just looking at this last week -- I wonder if it would make sense for a user to be able to supply their own filter into the util? That was what enabled me to s

[I] Upserting large table extremely slow [iceberg-python]

2025-06-28 Thread via GitHub
Anton-Tarazi opened a new issue, #2159: URL: https://github.com/apache/iceberg-python/issues/2159 ### Feature Request / Improvement Upserting large dataframes (tens of millions of rows) is unusably slow due to creating a massive `BooleanExpress
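
For readers skimming the thread, here is a rough sketch of why a per-row filter gets expensive. It is an illustration only, not pyiceberg's actual code: `EqualTo`, `In`, `And`, and `Or` below are hypothetical namedtuple stand-ins, and `per_row_filter`, `per_column_filter`, and `count_nodes` are made-up helper names. It assumes the slow path builds one AND-of-equalities per source row and ORs them all together, and compares that with one `In` predicate per key column.

```python
from collections import namedtuple

# Hypothetical stand-ins for expression nodes; these are NOT pyiceberg classes.
EqualTo = namedtuple("EqualTo", ["column", "value"])
In = namedtuple("In", ["column", "values"])
And = namedtuple("And", ["left", "right"])
Or = namedtuple("Or", ["left", "right"])


def count_nodes(expr):
    """Count the nodes of an expression tree iteratively (no recursion limit)."""
    stack, total = [expr], 0
    while stack:
        node = stack.pop()
        total += 1
        if isinstance(node, (And, Or)):
            stack.extend([node.left, node.right])
    return total


def per_row_filter(rows, key_cols):
    """One AND-of-equalities per row, OR-ed together: size grows with row count."""
    expr = None
    for row in rows:
        match = None
        for col in key_cols:
            term = EqualTo(col, row[col])
            match = term if match is None else And(match, term)
        expr = match if expr is None else Or(expr, match)
    return expr


def per_column_filter(rows, key_cols):
    """One IN predicate per key column: size depends only on the number of key columns.

    For composite keys this is a coarser (superset) filter, so exact key
    matching would still have to happen afterwards.
    """
    expr = None
    for col in key_cols:
        term = In(col, frozenset(row[col] for row in rows))
        expr = term if expr is None else And(expr, term)
    return expr


rows = [{"id": i, "region": i % 3} for i in range(1000)]
print(count_nodes(per_row_filter(rows, ["id", "region"])))     # 3999
print(count_nodes(per_column_filter(rows, ["id", "region"])))  # 3
```

With two key columns, the per-row tree has about four nodes per source row, so tens of millions of rows would mean tens of millions of Python objects before any data is read. The per-column form stays at a handful of nodes, at the cost of over-selecting on composite keys.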