kevinjqliu commented on code in PR #1036:
URL: https://github.com/apache/iceberg-python/pull/1036#discussion_r1713025181


##########
pyiceberg/table/__init__.py:
##########
@@ -630,7 +648,15 @@ def add_files(self, file_paths: List[str], 
snapshot_properties: Dict[str, str] =
 
         Raises:
             FileNotFoundError: If the file does not exist.
+            ValueError: Raises a ValueError in case file_paths is not unique

Review Comment:
   nit: given file_paths contains duplicate files 



##########
pyiceberg/table/__init__.py:
##########
@@ -621,6 +621,24 @@ def delete(self, delete_filter: Union[str, 
BooleanExpression], snapshot_properti
         if not delete_snapshot.files_affected and not 
delete_snapshot.rewrites_needed:
             warnings.warn("Delete operation did not match any records")
 
+    def has_duplicates(self, file_paths: List[str]) -> bool:
+        unique_files = set(file_paths)
+        return len(unique_files) != len(file_paths)
+
+    def find_referenced_files(self, file_paths: List[str]) -> list[str]:

Review Comment:
   I guess we only need to worry about the current snapshot. 
   If a data file existed in previous snapshots, but not in the current, we can 
still add the file.



##########
pyiceberg/table/__init__.py:
##########
@@ -621,6 +621,24 @@ def delete(self, delete_filter: Union[str, 
BooleanExpression], snapshot_properti
         if not delete_snapshot.files_affected and not 
delete_snapshot.rewrites_needed:
             warnings.warn("Delete operation did not match any records")
 
+    def has_duplicates(self, file_paths: List[str]) -> bool:

Review Comment:
   nit: static method. may be better if inlined



##########
pyiceberg/table/__init__.py:
##########
@@ -621,6 +621,24 @@ def delete(self, delete_filter: Union[str, 
BooleanExpression], snapshot_properti
         if not delete_snapshot.files_affected and not 
delete_snapshot.rewrites_needed:
             warnings.warn("Delete operation did not match any records")
 
+    def has_duplicates(self, file_paths: List[str]) -> bool:
+        unique_files = set(file_paths)
+        return len(unique_files) != len(file_paths)
+
+    def find_referenced_files(self, file_paths: List[str]) -> list[str]:

Review Comment:
   FYI there's also the `table.inspect.files()` API 
   https://py.iceberg.apache.org/api/#files
   which returns all data files in the current snapshot



##########
pyiceberg/table/__init__.py:
##########
@@ -630,7 +648,15 @@ def add_files(self, file_paths: List[str], 
snapshot_properties: Dict[str, str] =
 
         Raises:
             FileNotFoundError: If the file does not exist.
+            ValueError: Raises a ValueError in case file_paths is not unique
+            ValueError: Raises a ValueError in case file already referenced in 
table

Review Comment:
   nit: given file_paths already referenced by table



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org
For additional commands, e-mail: issues-h...@iceberg.apache.org

Reply via email to