yuqi1129 opened a new issue, #44438: URL: https://github.com/apache/arrow/issues/44438
### Describe the bug, including details regarding any error messages, version, and platform.

Requirement.txt

```text
requests==2.32.2
dataclasses-json==0.6.6
readerwriterlock==1.0.9
fsspec==2024.9.0
pyarrow==16.1.0
cachetools==5.3.3
google-auth==2.35.0
```

```python
from pyarrow.fs import GcsFileSystem
from fsspec.implementations.arrow import ArrowFSWrapper
import os
import pandas
import pyarrow.dataset as dt

fileset_storage_location = "gs://xxxx/catalog/schema/fileset3"
os.environ["GOOGLE_APPLICATION_CREDENTIALS"] = "xxxxx.json"
selffs = ArrowFSWrapper(GcsFileSystem())

data = pandas.DataFrame({"Name": ["A", "B", "C", "D"], "ID": [20, 21, 19, 18]})
parquet_file = fileset_storage_location + "/test.parquet"
data.to_parquet(parquet_file, filesystem=selffs)

arrow_dataset = dt.dataset(parquet_file, filesystem=selffs)
```

We will run into the following message:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ec2-user/gravitino/clients/client-python/venv/lib64/python3.9/site-packages/pyarrow/dataset.py", line 794, in dataset
    return _filesystem_dataset(source, **kwargs)
  File "/home/ec2-user/gravitino/clients/client-python/venv/lib64/python3.9/site-packages/pyarrow/dataset.py", line 486, in _filesystem_dataset
    return factory.finish(schema)
  File "pyarrow/_dataset.pyx", line 3089, in pyarrow._dataset.DatasetFactory.finish
  File "pyarrow/error.pxi", line 154, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 88, in pyarrow.lib.check_status
  File "pyarrow/io.pxi", line 341, in pyarrow.lib.NativeFile.seek
  File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: google::cloud::Status(OUT_OF_RANGE: Permanent error, with a last message of Request range not satisfiable error_info={reason=, domain=, metadata={gcloud-cpp.retry.function=ReadObjectNotWrapped, gcloud-cpp.retry.reason=permanent-error, gcloud-cpp.retry.original-message=Request range not satisfiable}})

If we switch the pyarrow
version to:

```
fsspec==2024.3.1
pyarrow==15.0.2
```

then the error message will be:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ec2-user/gravitino/clients/client-python/venv/lib64/python3.9/site-packages/pyarrow/dataset.py", line 782, in dataset
    return _filesystem_dataset(source, **kwargs)
  File "/home/ec2-user/gravitino/clients/client-python/venv/lib64/python3.9/site-packages/pyarrow/dataset.py", line 475, in _filesystem_dataset
    return factory.finish(schema)
  File "pyarrow/_dataset.pyx", line 3025, in pyarrow._dataset.DatasetFactory.finish
  File "pyarrow/error.pxi", line 154, in pyarrow.lib.pyarrow_internal_check_status
  File "pyarrow/error.pxi", line 88, in pyarrow.lib.check_status
  File "pyarrow/io.pxi", line 328, in pyarrow.lib.NativeFile.seek
  File "pyarrow/error.pxi", line 91, in pyarrow.lib.check_status
pyarrow.lib.ArrowInvalid: google::cloud::Status(OUT_OF_RANGE: Permanent error ReadObjectNotWrapped: Request range not satisfiable)

OS & python

(venv) [ec2-user@ip-111- client-python]$ python --version
Python 3.9.16
(venv) [ec2-user@ip-111-client-python]$ uname -a
Linux ip-xxxxx.ap-northeast-1.compute.internal 6.1.102-111.182.amzn2023.x86_64 #1 SMP PREEMPT_DYNAMIC Tue Aug 13 22:23:09 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
(venv) [ec2-user@ip-172-31-10-123 client-python

### Component(s)

Python
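
Both tracebacks fail in `NativeFile.seek` while the dataset factory inspects the file. A Parquet reader has to seek to the end of the file to read its footer, so one possible cause (an assumption, not a confirmed diagnosis) is that the object written by `to_parquet` is empty or truncated. The sketch below is a standard-library check of the Parquet framing: a leading `PAR1` magic, then at the end a 4-byte little-endian footer length followed by a trailing `PAR1`. The helper name `check_parquet_footer` is hypothetical, and whether a file object opened through `ArrowFSWrapper` supports these seeks against GCS is also an assumption.

```python
import io
import struct

PARQUET_MAGIC = b"PAR1"
# Smallest possible framing: leading magic + footer length + trailing magic.
MIN_PARQUET_SIZE = 12


def check_parquet_footer(f):
    """Return (ok, reason) for a seekable binary file object.

    Only the Parquet framing is checked; the footer metadata is not parsed.
    """
    f.seek(0, io.SEEK_END)
    size = f.tell()
    if size < MIN_PARQUET_SIZE:
        return False, f"file is only {size} bytes, too small for a Parquet footer"

    f.seek(0)
    if f.read(4) != PARQUET_MAGIC:
        return False, "missing leading PAR1 magic"

    # The last 8 bytes are the footer length (uint32, little-endian) and the magic.
    f.seek(size - 8)
    tail = f.read(8)
    if tail[4:] != PARQUET_MAGIC:
        return False, "missing trailing PAR1 magic"

    footer_len = struct.unpack("<I", tail[:4])[0]
    if footer_len + MIN_PARQUET_SIZE > size:
        return False, f"footer length {footer_len} does not fit in {size} bytes"

    return True, f"{size} bytes, footer metadata is {footer_len} bytes"
```

With the names from the reproduction script, the call would look something like `check_parquet_footer(selffs.open(parquet_file, "rb"))`. A result of `(False, "file is only 0 bytes, ...")` would point at the write step rather than at `dt.dataset`.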