dberenbaum opened a new issue, #43497: URL: https://github.com/apache/arrow/issues/43497
### Describe the bug, including details regarding any error messages, version, and platform.

Take the following example using a publicly available dataset:

```python
import gcsfs
from pyarrow.dataset import dataset

# without fsspec filesystem, get segmentation fault
fs = None
# with fsspec filesystem, hangs and never finishes
# fs = gcsfs.GCSFileSystem()

uri = "gs://datachain-demo/laion-aesthetics-csv/laion_aesthetics_1024_33M_1.csv"
ds = dataset(uri, format="csv", filesystem=fs)
print(ds.head(5))
```

As noted in the comments, depending on which filesystem is passed, it will either hang indefinitely or hit a segmentation fault. Strangely, s3 paths work (don't hang or fail) with the pyarrow filesystem but hang with the fsspec s3fs filesystem.

Other findings:
- Similar operations like `ds.take()` and `next(ds.to_batches())` have the same behavior as `ds.head()`
- `ds.head(use_threads=False)` completes successfully with any filesystem but takes much longer

### Component(s)

Python
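
To tell the two failure modes (hang vs. segmentation fault) apart without watching a terminal, one option is to run the reproduction in a child process with a timeout. The following is a minimal sketch using only the standard library. `classify_run` and the file name `repro.py` are hypothetical names chosen for illustration, and the signal-based crash detection assumes a POSIX platform.

```python
import subprocess
import sys


def classify_run(cmd, timeout_s=60.0):
    """Run cmd in a child process and report how it ended."""
    try:
        proc = subprocess.run(cmd, capture_output=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child before re-raising on timeout.
        return "hang"
    if proc.returncode < 0:
        # On POSIX, a negative return code means the child was terminated
        # by a signal (a segmentation fault is SIGSEGV).
        return f"crash (signal {-proc.returncode})"
    if proc.returncode != 0:
        return f"error (exit code {proc.returncode})"
    return "ok"


if __name__ == "__main__":
    # "repro.py" is a hypothetical file holding the reproduction script.
    print(classify_run([sys.executable, "repro.py"], timeout_s=120.0))
```

Running the same harness once per filesystem option (and once with `use_threads=False`) would give a compact summary of which combinations hang, crash, or complete. The timeout value is arbitrary and should be longer than the slow single-threaded run.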