jaidisido opened a new issue, #41365:
URL: https://github.com/apache/arrow/issues/41365

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   PyArrow's `fs` module incorrectly resolves valid S3 URIs that contain whitespace as local paths:
   ```python
   from pyarrow.fs import _resolve_filesystem_and_path, FileSystem
   
   uri = "s3://bucket/prefix with space/a=a"
   
   # No filesystem is passed, so it has to be inferred from the URI.
   resolved_filesystem, resolved_path = _resolve_filesystem_and_path(uri, None)
   
   resolved_filesystem
   # <pyarrow._fs.LocalFileSystem at 0x10316ff30>
   ```
   
   As a result, subsequent calls, such as fetching the file info, fail:
   ```python
   path_info = resolved_filesystem.get_file_info(resolved_path)
   # Raises:
   # pyarrow.lib.ArrowInvalid: Expected a local filesystem path, got a URI...
   ```
   
   A quick look at the [method](https://github.com/apache/arrow/blob/main/python/pyarrow/fs.py#L165) indicates that a `LocalFileSystem` is chosen by default and returned whenever no alternative filesystem is detected, which seems like a dubious strategy.
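
To illustrate the concern, here is a minimal sketch of a stricter alternative: treat any string with an explicit `scheme://` prefix as a URI and never fall back to the local filesystem for it. The helper names `looks_like_uri` and `choose_backend` are hypothetical and are not part of pyarrow; this is not how `_resolve_filesystem_and_path` is implemented, only an illustration of the idea.

```python
import re

# Hypothetical helper, not part of pyarrow: an RFC 3986 style scheme
# followed by "://" marks the string as a URI rather than a local path.
_SCHEME_RE = re.compile(r"^[A-Za-z][A-Za-z0-9+.\-]*://")


def looks_like_uri(path: str) -> bool:
    """Return True if the string starts with an explicit 'scheme://' prefix."""
    return _SCHEME_RE.match(path) is not None


def choose_backend(path: str) -> str:
    """Return the lowercased scheme for URIs, or 'local' for plain paths.

    Whitespace later in the string does not affect the decision, so an
    S3 URI with a space is never mistaken for a local path.
    """
    if looks_like_uri(path):
        return path.split("://", 1)[0].lower()
    return "local"


print(choose_backend("s3://bucket/prefix with space/a=a"))
print(choose_backend("/tmp/data with space/a=a"))
```

With a check like this, a URI that fails to parse could surface the parsing error directly instead of being handed to the local filesystem.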
   
   I assume [this](https://github.com/apache/arrow/blob/main/python/pyarrow/fs.py#L179) is where the S3 filesystem should be detected, but a URI containing whitespace seems to raise an exception even though it is valid:
   ```python
   filesystem, path = FileSystem.from_uri(uri)
   # Raises:
   # Cannot parse URI: 's3://bucket/prefix with space/a=a/'
   ```
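
A possible workaround, sketched here with the standard library only, is to percent-encode the path component before handing the URI to a parser. The helper name `encode_uri_path` is hypothetical, and whether `FileSystem.from_uri` then resolves the encoded form to the intended S3 key is an assumption that has not been verified here.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_uri_path(uri: str) -> str:
    """Percent-encode unsafe characters (such as spaces) in a URI's path.

    Hypothetical helper, not part of pyarrow. It assumes the input path is
    not already percent-encoded; an existing "%" would be encoded again.
    """
    parts = urlsplit(uri)
    # Keep "/" and "=" literal so partition-style segments like "a=a" survive.
    encoded_path = quote(parts.path, safe="/=")
    return urlunsplit(parts._replace(path=encoded_path))


uri = "s3://bucket/prefix with space/a=a"
print(encode_uri_path(uri))
# s3://bucket/prefix%20with%20space/a=a
```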
   
   ### Component(s)
   
   Python

