mattmartin14 commented on PR #1534:
URL: https://github.com/apache/iceberg-python/pull/1534#issuecomment-2635079402

   Alright @Fokko @tscottcoombes1, some good news. I just pushed an update 
that removes the datafusion dependency from the main pyiceberg merge_rows 
function. My test file still uses datafusion to generate the test datasets, 
which I'm told is ok. The two functions I'd ask you to scrutinize carefully 
are in the merge_rows_util.py file. They are called:
   - get_rows_to_update
   - get_rows_to_insert
   
   Given this is my first go at pyarrow filters, there are probably some 
optimizations or functional changes that could make it better. I'm open to 
suggestions. In summary, we have gotten rid of datafusion at this point, and 
although we have to loop to compare rows and apply filters (I don't really 
know of another way), we are not using pyarrow joins.
   
   Thanks for all the help getting here. I'm curious what updates to the 
merge_rows function still remain.
   
   Thanks,
   Matt


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


