Re: [PR] Feature: MERGE/Upsert Support [iceberg-python]

via GitHub Thu, 30 Jan 2025 04:04:50 -0800


Fokko commented on code in PR #1534:
URL: https://github.com/apache/iceberg-python/pull/1534#discussion_r1935497033



##########
pyiceberg/table/__init__.py:
##########
@@ -1064,6 +1064,125 @@ def name_mapping(self) -> Optional[NameMapping]:
         """Return the table's field-id NameMapping."""
         return self.metadata.name_mapping()
 
+    def merge_rows(self, df: pa.Table, join_cols: list
+                    ,merge_options: dict = {'when_matched_update_all': True, 'when_not_matched_insert_all': True}
+                ) -> Dict:
+        """
+        Shorthand API for performing an upsert/merge to an iceberg table.
+        
+        Args:
+            df: The input dataframe to merge with the table's data.
+            join_cols: The columns to join on.

Review Comment:
   Iceberg's equivalent of a primary key is the table's identifier fields, so
   when `join_cols` is not set we could derive it from the schema like this:
   
   ```python
   if join_cols is None:
       identifier_field_ids = self.schema().identifier_field_ids
       if len(identifier_field_ids) > 0:
           join_cols = [
               self.schema().find_column_name(identifier_field_id)
               for identifier_field_id in identifier_field_ids
           ]
       else:
           raise ValueError("The table doesn't have identifier fields, please set join_cols.")
   ```
   
   We can also do this in a follow-up PR.
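
A minimal standalone sketch of the same fallback, written so it can be tried without a catalog. `resolve_join_cols`, `SchemaLike`, and `FakeSchema` are hypothetical names used only for illustration; the only schema members assumed are `identifier_field_ids` and `find_column_name`, as used in the snippet above.

```python
from typing import Dict, List, Optional, Protocol


class SchemaLike(Protocol):
    """Hypothetical stand-in for the two schema members the fallback relies on."""

    identifier_field_ids: List[int]

    def find_column_name(self, column_id: int) -> Optional[str]: ...


def resolve_join_cols(schema: SchemaLike, join_cols: Optional[List[str]] = None) -> List[str]:
    """Return the explicit join columns, or fall back to the identifier fields."""
    if join_cols is not None:
        # The caller chose the columns explicitly; use them unchanged.
        return list(join_cols)

    # Map each identifier field id to its column name, skipping ids that do not resolve.
    names = [schema.find_column_name(field_id) for field_id in schema.identifier_field_ids]
    resolved = [name for name in names if name is not None]
    if not resolved:
        raise ValueError("The table doesn't have identifier fields, please set join_cols.")
    return resolved


class FakeSchema:
    """Tiny in-memory schema used only to exercise the helper (not part of pyiceberg)."""

    def __init__(self, columns: Dict[int, str], identifier_field_ids: List[int]) -> None:
        self._columns = columns
        self.identifier_field_ids = identifier_field_ids

    def find_column_name(self, column_id: int) -> Optional[str]:
        return self._columns.get(column_id)
```

Pulling the lookup into a helper like this would let `merge_rows` accept `join_cols: Optional[List[str]] = None` and resolve it once up front; whether that belongs in this PR or a follow-up is left open, as noted above.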





