kevinjqliu opened a new issue, #310: URL: https://github.com/apache/iceberg-python/issues/310
### Feature Request / Improvement

**Can we consolidate and standardize FileIO on the PyArrow implementation?**

There are currently two different FileIO implementations, `ARROW_FILE_IO` and `FSSPEC_FILE_IO`. `ARROW_FILE_IO` uses [Apache Arrow's Filesystem Interface](https://arrow.apache.org/docs/python/filesystems.html#), while `FSSPEC_FILE_IO` uses the [`fsspec` library](https://filesystem-spec.readthedocs.io/en/latest/).

Here are a few reasons for consolidating:

1. PyArrow is already preferred over FsSpec for various FS implementations. https://github.com/apache/iceberg-python/blob/cd7fb502900a717d6b902a398b267eb10e4faa9b/pyiceberg/io/__init__.py#L273-L282
2. PyIceberg is becoming more coupled with PyArrow; `to_arrow()` and `pa.Table` are widely used for reading and writing, including the new feature #305.
3. There would no longer be two FileIO implementations whose behavior has to be kept in sync. For example, FsSpec defaults a path with no scheme (`/tmp/warehouse`) to the `file` scheme, but PyArrow does not. See [#301](https://github.com/apache/iceberg-python/pull/301/files#diff-24c3aa912b523fdb2afba6a0ea2dfe69fdcd05d9268e1e13ac1023ac26b54cccR176).
4. The two FileIO implementations are not that different from one another. [FsSpec can use its underlying FS implementations](https://github.com/apache/iceberg-python/blob/cd7fb502900a717d6b902a398b267eb10e4faa9b/pyiceberg/io/fsspec.py#L175-L184), including `LocalFileSystem`, `S3FileSystem`, `GCSFileSystem`, and `AzureBlobFileSystem`, while [PyArrow uses its FS implementations](https://github.com/apache/iceberg-python/blob/main/pyiceberg/io/pyarrow.py#L329-L386), including `LocalFileSystem`, `S3FileSystem`, `HadoopFileSystem`, and `GcsFileSystem`. PyArrow is currently missing an `AzureBlobFileSystem` implementation, but it has [support for HDFS](https://arrow.apache.org/docs/python/filesystems.html#hadoop-distributed-file-system-hdfs).
5. FsSpec and PyArrow can be used interchangeably. PyArrow can [use an fsspec-based filesystem](https://arrow.apache.org/docs/python/filesystems.html#using-fsspec-compatible-filesystems-with-arrow), and FsSpec can [wrap a PyArrow filesystem](https://arrow.apache.org/docs/python/filesystems.html#using-arrow-filesystems-with-fsspec).
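
To make the scheme mismatch from point 3 concrete, here is a minimal sketch of defaulting a scheme-less location to `file` before handing it to a filesystem. The helper name `split_location` and its `default_scheme` parameter are hypothetical and only for illustration; this is not PyIceberg's actual code, just one way a single FileIO could make the behavior consistent.

```python
from typing import Tuple
from urllib.parse import urlparse


def split_location(location: str, default_scheme: str = "file") -> Tuple[str, str, str]:
    """Split a location into (scheme, netloc, path).

    Hypothetical helper: a location without a scheme, such as
    "/tmp/warehouse", is treated as belonging to `default_scheme`.
    """
    uri = urlparse(location)
    if not uri.scheme:
        # No scheme given: fall back to the default (local files).
        return default_scheme, uri.netloc, uri.path
    return uri.scheme, uri.netloc, uri.path


# Usage: scheme-less and explicit local paths resolve the same way.
local = split_location("/tmp/warehouse")
explicit = split_location("file:///tmp/warehouse")
remote = split_location("s3://bucket/warehouse/db")
```

With a single code path like this, a plain local path and its `file://` form resolve identically, which is the kind of behavior that is harder to guarantee across two separate FileIO implementations.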