Fokko commented on code in PR #301:
URL: https://github.com/apache/iceberg-python/pull/301#discussion_r1465458297


##########
pyiceberg/io/pyarrow.py:
##########
@@ -288,6 +288,8 @@ def create(self, overwrite: bool = False) -> OutputStream:
         try:
             if not overwrite and self.exists() is True:
                 raise FileExistsError(f"Cannot create file, already exists: {self.location}")
+            # Parent directories must be created first in certain file systems, such as the LocalFileSystem.
+            self._filesystem.create_dir(os.path.dirname(self._path), recursive=True)

Review Comment:
   This is something we typically try to avoid. Iceberg is designed to work with object stores, which have no notion of directories. One recommendation is even to disallow moving and listing directories, and creating a directory falls into the same category. I'm not sure how the Arrow S3 implementation behaves here, since some implementations still make one or more extra calls:
   
   - A list operation to check whether the directory is there.
   - Touching a small file under the prefix to indicate that the path should be created.
   
   Another concern is that if we do this in the Arrow implementation, we would also need to do it in the other implementations to avoid discrepancies. The idea behind FileIO is that implementations can easily be swapped out.
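
To make the concern concrete, here is a minimal sketch of one possible alternative: only create parent directories when the location is local, and skip the call entirely for object stores. The helper names `is_local_location` and `ensure_parent_dir` are hypothetical and not part of pyiceberg. The sketch uses only the standard library rather than the PyArrow filesystem, and it assumes POSIX-style paths and that an empty or `file` URI scheme means a local file system.

```python
import os
import tempfile
from urllib.parse import urlparse

# Assumption: only these schemes refer to a local file system.
LOCAL_SCHEMES = {"", "file"}


def is_local_location(location: str) -> bool:
    """Return True when the location is a local path rather than an object store URI."""
    return urlparse(location).scheme in LOCAL_SCHEMES


def ensure_parent_dir(location: str) -> bool:
    """Create the parent directory for local locations only (hypothetical helper).

    Returns True if the location was treated as local, False if the call was skipped.
    """
    if not is_local_location(location):
        # Object stores (s3://, gs://, ...) have no directories, so make no extra call.
        return False
    parent = os.path.dirname(urlparse(location).path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    return True


# Usage: the local path gets its parents created, the S3 URI is left alone.
with tempfile.TemporaryDirectory() as tmp:
    local_target = os.path.join(tmp, "warehouse", "db", "metadata.json")
    created_local = ensure_parent_dir(local_target)
    parent_exists = os.path.isdir(os.path.dirname(local_target))

skipped_remote = ensure_parent_dir("s3://bucket/warehouse/db/metadata.json")
```

A check like this would still need an equivalent in every FileIO implementation to keep them interchangeable, which is the discrepancy concern raised above.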



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@iceberg.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org

