Fokko commented on code in PR #301:
URL: https://github.com/apache/iceberg-python/pull/301#discussion_r1465458297
########## pyiceberg/io/pyarrow.py: ##########
@@ -288,6 +288,8 @@ def create(self, overwrite: bool = False) -> OutputStream:
         try:
             if not overwrite and self.exists() is True:
                 raise FileExistsError(f"Cannot create file, already exists: {self.location}")
+            # Parent directories must be created first in certain file systems, such as the LocalFileSystem.
+            self._filesystem.create_dir(os.path.dirname(self._path), recursive=True)

Review Comment:
This is typically something we try to avoid. Iceberg is designed to work with object stores, and those don't have a notion of directories. One recommendation is even to disallow moving and listing directories, and creating a directory falls into the same category. I'm not sure how the Arrow S3 implementation behaves here, since some implementations still make one or more calls, for example:
- a list operation to check whether the directory is already there
- touching a small file under the prefix to indicate that the path should be created.

Another concern is that if we do this in the Arrow implementation, we would also need to do it in the other implementations to avoid discrepancies. The idea of the FileIO is that you can easily swap implementations.
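To make the swappability concern concrete, here is a minimal sketch assuming a hypothetical two-method interface. `SketchFileIO`, `InMemoryObjectStoreIO`, `LocalFileIO`, and `write_metadata` are illustrative names, not pyiceberg's FileIO API, and the in-memory store does not model what Arrow's S3 filesystem actually does. The object-store variant treats a location as a flat key and never touches directories, while the local variant creates parent directories inside its own `write`, so the calling code stays the same whichever implementation is plugged in.

```python
import os
import tempfile
from abc import ABC, abstractmethod
from typing import Dict


class SketchFileIO(ABC):
    """Hypothetical, minimal FileIO-like interface (not the pyiceberg API)."""

    @abstractmethod
    def write(self, location: str, data: bytes, overwrite: bool = False) -> None: ...

    @abstractmethod
    def read(self, location: str) -> bytes: ...


class InMemoryObjectStoreIO(SketchFileIO):
    """Models an object store as a flat key -> bytes mapping with no directories."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def write(self, location: str, data: bytes, overwrite: bool = False) -> None:
        if not overwrite and location in self._objects:
            raise FileExistsError(f"Cannot create file, already exists: {location}")
        # No directory step: the "path" is only a key, so no extra calls are made.
        self._objects[location] = data

    def read(self, location: str) -> bytes:
        return self._objects[location]


class LocalFileIO(SketchFileIO):
    """Local filesystem: parent directories must exist before a file can be created."""

    def write(self, location: str, data: bytes, overwrite: bool = False) -> None:
        # Directory creation is confined to this implementation only.
        parent = os.path.dirname(location)
        if parent:
            os.makedirs(parent, exist_ok=True)
        # "xb" raises FileExistsError if the file already exists (exclusive create).
        mode = "wb" if overwrite else "xb"
        with open(location, mode) as f:
            f.write(data)

    def read(self, location: str) -> bytes:
        with open(location, "rb") as f:
            return f.read()


def write_metadata(io: SketchFileIO, location: str) -> None:
    # Caller code is identical for every implementation, so the FileIO stays swappable.
    io.write(location, b'{"format-version": 2}')
```

The design choice in this sketch is that the shared code path never asks for a directory; only the implementation whose storage actually requires one pays that cost.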