amitgilad3 commented on code in PR #1036:
URL: https://github.com/apache/iceberg-python/pull/1036#discussion_r1714418419
##########
pyiceberg/table/__init__.py:
##########
@@ -630,7 +630,20 @@ def add_files(self, file_paths: List[str], snapshot_properties: Dict[str, str] =
         Raises:
             FileNotFoundError: If the file does not exist.
+            ValueError: Raises a ValueError given file_paths contains duplicate files
+            ValueError: Raises a ValueError given file_paths already referenced by table
         """
+        if len(file_paths) != len(set(file_paths)):
+            raise ValueError("File paths must be unique")
+
+        import pyarrow.compute as pc
+
+        expr = pc.field("file_path").isin(file_paths)
+        referenced_files = [file["file_path"] for file in self._table.inspect.files().filter(expr).to_pylist()]
+
+        if referenced_files:
+            raise ValueError(f"Cannot add files that are already referenced by table, files: {', '.join(referenced_files)}")

Review Comment:
   I also added the flag to the Table API and added tests to make sure the flag works when it is set to False.

--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
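
For readers following the thread, the two checks in the hunk above can be sketched in isolation, without a table or pyarrow. This is a minimal illustration, not the code in the PR: the helper name `validate_new_file_paths`, the `referenced_paths` argument (standing in for the paths returned by `inspect.files()`), and the flag name `check_referenced` are all hypothetical, and the sketch assumes the flag mentioned in the comment only gates the "already referenced" check.

```python
from typing import Iterable, List


def validate_new_file_paths(
    file_paths: List[str],
    referenced_paths: Iterable[str],
    check_referenced: bool = True,  # hypothetical flag name, not taken from the PR
) -> None:
    """Raise ValueError if file_paths repeats a path or names an already referenced file."""
    # Check 1: the input list itself must not contain the same path twice.
    if len(file_paths) != len(set(file_paths)):
        raise ValueError("File paths must be unique")

    # Assumption: setting the flag to False skips only the lookup against the table.
    if not check_referenced:
        return

    # Check 2: none of the new paths may already be referenced.
    # referenced_paths stands in for the "file_path" column of the table's files metadata.
    already_referenced = set(referenced_paths)
    referenced_files = [path for path in file_paths if path in already_referenced]
    if referenced_files:
        raise ValueError(
            f"Cannot add files that are already referenced by table, files: {', '.join(referenced_files)}"
        )
```

Under these assumptions, calling the helper with `check_referenced=False` accepts a path that is already referenced, while a repeated path in the input list is still rejected.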