r2evans opened a new issue, #46149:
URL: https://github.com/apache/arrow/issues/46149
### Describe the bug, including details regarding any error messages, version, and platform.

I'm using `sshfs` to read Parquet files remotely. With `sshfs-2.10` (and FUSE 2.9.9), everything works fine. After updating to macfuse-4.10.2 and sshfs-3.7.3 (from https://macfuse.github.io/), `read_parquet()` still works, but `open_dataset() |> collect()` fails with:

```
### path truncated for privacy reasons; this works completely
str(arrow::read_parquet("/sshfs/path/to/file.pq")[0,])
# Classes ‘data.table’ and 'data.frame': 0 obs. of 10 variables:
#  $ Name                : chr 
#  $ Coordinates         : chr 
#  $ Type                : int 
#  $ Enabled             : logi 
#  $ TrackAbbrev         : chr 
#  $ TrackName           : chr 
#  $ TrackId             : chr 
#  $ SessionId           : chr 
#  $ TrackConfigurationId: chr 
#  $ ModifiedTime        : 'POSIXct' num(0) 
#   - attr(*, "tzone")= chr "UTC"

### open_dataset works, collect does not
arr <- arrow::open_dataset("/sshfs/path/to/file.pq")
arr
# FileSystemDataset with 1 Parquet file
# 10 columns
# Name: string
# Coordinates: string
# Type: int32
# Enabled: bool
# TrackAbbrev: string
# TrackName: string
# TrackId: string
# SessionId: string
# TrackConfigurationId: string
# ModifiedTime: timestamp[us, tz=UTC]
# See $metadata for additional Schema metadata

collect(arr)
# Error in `compute.Dataset()`:
# ! IOError: fcntl(fd, F_RDADVISE, ...) failed. Detail: [errno 22] Invalid argument
# Run `rlang::last_trace()` to see where the error occurred.

rlang::last_trace(drop=FALSE)
# <error/rlang_error>
# Error in `compute.Dataset()`:
# ! IOError: fcntl(fd, F_RDADVISE, ...) failed. Detail: [errno 22] Invalid argument
# ---
# Backtrace:
#     ▆
#  1. ├─dplyr::collect(arrow::open_dataset("/sshfs/path/to/file.pq"))
#  2. └─arrow:::collect.Dataset(arrow::open_dataset("/sshfs/path/to/file.pq"))
#  3.   ├─arrow:::collect.ArrowTabular(compute.Dataset(x), as_data_frame)
#  4.   └─arrow:::compute.Dataset(x)
#  5.     └─base::tryCatch(...)
#  6.       └─base (local) tryCatchList(expr, classes, parentenv, handlers)
#  7.         └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
#  8.           └─value[[3L]](cond)
#  9.             └─arrow:::augment_io_error_msg(e, call, schema = schema())
# 10.               └─rlang::abort(msg, call = call)
```

Versions:

```r
packageVersion("arrow")
# [1] ‘19.0.1.1’
packageVersion("dplyr")
# [1] ‘1.1.4’
R.version
#                _
# platform       aarch64-apple-darwin20
# arch           aarch64
# os             darwin20
# system         aarch64, darwin20
# status
# major          4
# minor          4.3
# year           2025
# month          02
# day            28
# svn rev        87843
# language       R
# version.string R version 4.4.3 (2025-02-28)
# nickname       Trophy Case
```

I haven't found any other file access that fails, across various file formats. I don't know that it's _not_ a bug in `sshfs`. Perhaps `collect()` makes a system call that other reads do not; the error points to `fcntl(fd, F_RDADVISE, ...)`. I'm not familiar enough with the underlying code to tell. Can somebody help me understand and narrow down that aspect of the problem?

### Component(s)

R

-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org