koenvo commented on PR #1878:
URL: https://github.com/apache/iceberg-python/pull/1878#issuecomment-2823647398

   > I was already experimenting with creating a mask. So instead of creating an expression (like the one you referenced), it will be a mask.
   > 
   > We probably need to use the result of the join as indices for both tables. This requires index columns for both source and target, and then something like take(source_index_column) and take(target_index_column).
   
   This doesn't work as expected: the comparison fails for struct columns.
   
   I tried something like this:
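   ```python
   import functools

   import pyarrow.compute as pc


   def get_rows_to_update(....):
       .....

       # Align both tables row by row using the index columns from the join result.
       filtered_source = source_table.take(joined[SOURCE_INDEX_COLUMN_NAME])
       filtered_target = target_table.take(joined[TARGET_INDEX_COLUMN_NAME])

       # A row needs an update if any non-key column differs, or if the
       # comparison itself is null (at least one side is null).
       diff_expr = functools.reduce(
           pc.or_,
           [
               pc.or_kleene(
                   pc.not_equal(filtered_source[col], filtered_target[col]),
                   pc.is_null(pc.not_equal(filtered_source[col], filtered_target[col])),
               )
               for col in non_key_cols
           ],
       )

       filtered_source = filtered_source.filter(diff_expr)
   ```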
   
   `E   pyarrow.lib.ArrowNotImplementedError: Function 'not_equal' has no kernel matching input types (struct<sub1: large_string not null, sub2: large_string not null>, struct<sub1: large_string not null, sub2: large_string not null>)`


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

