kevinjqliu opened a new issue, #310:
URL: https://github.com/apache/iceberg-python/issues/310

   ### Feature Request / Improvement
   
   **Can we consolidate and standardize FileIO to the PyArrow implementation?**
   
   There are currently two different FileIO implementations, `ARROW_FILE_IO` 
and `FSSPEC_FILE_IO`. `ARROW_FILE_IO` uses [Apache Arrow's Filesystem 
Interface](https://arrow.apache.org/docs/python/filesystems.html#) while 
`FSSPEC_FILE_IO` uses the [`fsspec` 
library](https://filesystem-spec.readthedocs.io/en/latest/). 
   
   Here are a few reasons for consolidating:
   
   1. PyArrow is already preferred over FsSpec for various FS implementations. 
   
https://github.com/apache/iceberg-python/blob/cd7fb502900a717d6b902a398b267eb10e4faa9b/pyiceberg/io/__init__.py#L273-L282
   
   2. PyIceberg is becoming more tightly coupled with PyArrow: `to_arrow()` and 
`pa.Table` are widely used for reading and writing, including in the new 
feature #305.
   
   3. Consolidating removes the need to keep the two FileIO implementations' 
behavior in sync. For example, FsSpec defaults a path with no scheme 
(`/tmp/warehouse`) to the `file` scheme, but PyArrow does not. See 
[#301](https://github.com/apache/iceberg-python/pull/301/files#diff-24c3aa912b523fdb2afba6a0ea2dfe69fdcd05d9268e1e13ac1023ac26b54cccR176).
   
   4. The two FileIO implementations are not that different from one another. 
[FsSpec can use its underlying FS 
implementations](https://github.com/apache/iceberg-python/blob/cd7fb502900a717d6b902a398b267eb10e4faa9b/pyiceberg/io/fsspec.py#L175-L184),
 including `LocalFileSystem`, `S3FileSystem`, `GCSFileSystem`, and 
`AzureBlobFileSystem`, 
   while [PyArrow uses its own FS 
implementations](https://github.com/apache/iceberg-python/blob/main/pyiceberg/io/pyarrow.py#L329-L386),
 including `LocalFileSystem`, `S3FileSystem`, `HadoopFileSystem`, and 
`GcsFileSystem`.
   PyArrow is currently missing the `HadoopFileSystem` implementation but it 
has [support for 
HDFS](https://arrow.apache.org/docs/python/filesystems.html#hadoop-distributed-file-system-hdfs).
   
   5. FsSpec and PyArrow can be used bidirectionally.
   PyArrow can [use an fsspec-based 
filesystem](https://arrow.apache.org/docs/python/filesystems.html#using-fsspec-compatible-filesystems-with-arrow).
   FsSpec can [wrap a PyArrow 
filesystem](https://arrow.apache.org/docs/python/filesystems.html#using-arrow-filesystems-with-fsspec).
   
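
The scheme mismatch in point 3 can be sketched with the standard library alone. The helper below, `scheme_or_default`, is a hypothetical name made up for this illustration and is not taken from either FileIO implementation; it only shows what "defaulting a scheme-less path to `file`" means.

```python
from urllib.parse import urlparse


def scheme_or_default(location: str, default: str = "file") -> str:
    """Return the URI scheme of `location`, or `default` when it has none."""
    # urlparse gives an empty scheme for a bare path such as /tmp/warehouse.
    return urlparse(location).scheme or default


print(scheme_or_default("/tmp/warehouse"))         # file
print(scheme_or_default("s3://bucket/warehouse"))  # s3
```

With a single FileIO implementation, a rule like this would only need to live in one place.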
   
   
   
   
   
   


