koenvo commented on PR #1878: URL: https://github.com/apache/iceberg-python/pull/1878#issuecomment-2823647398
> I was already experimenting with creating a mask. So instead of creating an expression (like the one you referenced) it will be a mask.
>
> We probably need to use the result of the join as indices for both tables. This requires index columns for both source and target, and then something like take(source_index_column) and take(target_index_column).

This doesn't work as expected. I tried something like this:

```python
def get_rows_to_update(....):
    .....
    filtered_source = source_table.take(joined[SOURCE_INDEX_COLUMN_NAME])
    filtered_target = target_table.take(joined[TARGET_INDEX_COLUMN_NAME])

    diff_expr = functools.reduce(
        pc.or_,
        [
            pc.or_kleene(
                pc.not_equal(filtered_source[col], filtered_target[col]),
                pc.is_null(pc.not_equal(filtered_source[col], filtered_target[col])),
            )
            for col in non_key_cols
        ],
    )

    filtered_source = filtered_source.filter(diff_expr)
```

`E pyarrow.lib.ArrowNotImplementedError: Function 'not_equal' has no kernel matching input types (struct<sub1: large_string not null, sub2: large_string not null>, struct<sub1: large_string not null, sub2: large_string not null>)`