sungwy commented on code in PR #1036:
URL: https://github.com/apache/iceberg-python/pull/1036#discussion_r1714392026


##########
pyiceberg/table/__init__.py:
##########
@@ -630,7 +630,20 @@ def add_files(self, file_paths: List[str], snapshot_properties: Dict[str, str] =
 
         Raises:
             FileNotFoundError: If the file does not exist.
+            ValueError: Raises a ValueError given file_paths contains duplicate files
+            ValueError: Raises a ValueError given file_paths already referenced by table
         """
+        if len(file_paths) != len(set(file_paths)):
+            raise ValueError("File paths must be unique")
+
+        import pyarrow.compute as pc
+
+        expr = pc.field("file_path").isin(file_paths)
+        referenced_files = [file["file_path"] for file in self._table.inspect.files().filter(expr).to_pylist()]
+
+        if referenced_files:
+            raise ValueError(f"Cannot add files that are already referenced by table, files: {', '.join(referenced_files)}")

Review Comment:
   So my suggestion is to add a flag to the signature, something like:
   
   `def add_files(self, file_paths: List[str], snapshot_properties: Dict[str, str] = EMPTY_DICT, check_duplicate_files: bool = True) -> None:`
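
To make the suggestion concrete, here is a rough, standalone sketch of how such a flag could gate the check. `validate_file_paths` and `referenced_paths` are hypothetical names used only for illustration (they are not part of PyIceberg), and the sketch assumes the flag skips only the lookup against files already in the table, while duplicates inside `file_paths` itself are still rejected. Whether the flag should cover both checks is open.

```python
from typing import Iterable, List


def validate_file_paths(
    file_paths: List[str],
    referenced_paths: Iterable[str],
    check_duplicate_files: bool = True,
) -> None:
    """Hypothetical helper mirroring the two checks in the diff above."""
    # Duplicates within the argument are cheap to detect, so this sketch
    # always rejects them regardless of the flag (an assumption).
    if len(file_paths) != len(set(file_paths)):
        raise ValueError("File paths must be unique")

    # Comparing against files already in the table requires reading table
    # metadata, so callers can opt out of it with the flag.
    if not check_duplicate_files:
        return

    referenced = set(referenced_paths)
    already_referenced = [path for path in file_paths if path in referenced]
    if already_referenced:
        raise ValueError(
            "Cannot add files that are already referenced by table, "
            f"files: {', '.join(already_referenced)}"
        )
```

In the real method, `referenced_paths` would presumably come from the `inspect.files()` lookup shown in the diff, and that lookup would only run when `check_duplicate_files` is true.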



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

