kevinjqliu commented on code in PR #301:
URL: https://github.com/apache/iceberg-python/pull/301#discussion_r1465718521

##########
pyiceberg/io/pyarrow.py:
##########
@@ -288,6 +288,8 @@ def create(self, overwrite: bool = False) -> OutputStream:
         try:
             if not overwrite and self.exists() is True:
                 raise FileExistsError(f"Cannot create file, already exists: {self.location}")
+            # Parent directories must be created first in certain file systems, such as the LocalFileSystem.
+            self._filesystem.create_dir(os.path.dirname(self._path), recursive=True)

Review Comment:
   @Fokko thanks for the review. I agree with the above. The Arrow FileIO implementation might not be the best place to implement this behavior.

   So far both of the supported FS implementations (`ARROW_FILE_IO` and `FSSPEC_FILE_IO`) are failing to write to the local file system. I want to make writes work for the local file system.

   Looking at the Java side, there is a [`LocalOutputFile` implementation](https://github.com/apache/iceberg/blob/fd1cf49280bde07d67c6bc1a6ec60238e1e38f7f/api/src/main/java/org/apache/iceberg/Files.java#L59) which implements the behavior for creating parent directories.

   Maybe we can implement a new FileIO implementation and make that the preferred implementation for the `file://` scheme.
   https://github.com/apache/iceberg-python/blob/4cf1f35dfd3e7cfb2996887e861d740239746306/pyiceberg/io/__init__.py#L278
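
   To make the idea of a dedicated local output file more concrete, here is a minimal sketch using only the Python standard library. The class name `LocalOutputFile` is borrowed from the Java side for illustration; the class, its constructor, and the simple `file://` prefix handling are all hypothetical and not part of PyIceberg. It only shows the behavior discussed above: check for an existing file, create the parent directories, then open the stream.

```python
import os
import tempfile
from typing import BinaryIO


class LocalOutputFile:
    """Hypothetical local-only output file; not part of PyIceberg."""

    def __init__(self, location: str) -> None:
        self.location = location
        # Assumption: a plain prefix strip is enough for POSIX-style file:// URIs.
        prefix = "file://"
        self._path = location[len(prefix):] if location.startswith(prefix) else location

    def exists(self) -> bool:
        return os.path.exists(self._path)

    def create(self, overwrite: bool = False) -> BinaryIO:
        if not overwrite and self.exists():
            raise FileExistsError(f"Cannot create file, already exists: {self.location}")
        # Create missing parent directories before opening the file for writing.
        parent = os.path.dirname(self._path)
        if parent:
            os.makedirs(parent, exist_ok=True)
        return open(self._path, "wb")


# Usage: write into a nested directory that does not exist yet.
with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "warehouse", "db", "table", "data.bin")
    with LocalOutputFile(f"file://{target}").create() as stream:
        stream.write(b"hello")
    print(os.path.exists(target))
```

   Keeping this in a separate local implementation, rather than in the Arrow FileIO, would mean object stores that have no real directories never see a `create_dir` call. Whether that is the right split for PyIceberg is still open.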