Fokko commented on PR #1534:
URL: https://github.com/apache/iceberg-python/pull/1534#issuecomment-2634476591

   Hey @mattmartin14 Yes, I think we can merge this soon, but I would love to 
drop DataFusion if possible. I've been doing some incremental reviews to make 
sure we can do it step by step. Regarding the `rows_to_insert`, I don't think 
we need a join for that. Joins are often pretty expensive since they involve 
sorting the data.
   
   Some pseudocode to explain what I was thinking of:
   ```python
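   from pyiceberg.expressions import BooleanExpression
   from pyiceberg.io.pyarrow import expression_to_pyarrow

   # From before
   pred: BooleanExpression = ...
   arrow_expr = expression_to_pyarrow(pred)
   
   # What we already have
   df = tbl.scan(row_filter=pred).to_arrow()
   
   # Split on the predicate and its negation instead of joining
   rows_to_overwrite = df.filter(arrow_expr)
   rows_to_append = df.filter(~arrow_expr)  # ~ negates the expression in Arrow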
   ```


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
