r2evans opened a new issue, #46149:
URL: https://github.com/apache/arrow/issues/46149

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   I'm using `sshfs` to read parquet files remotely. Using `sshfs-2.10` (and 
FUSE 2.9.9), everything works fine.
   
   After updating to macfuse-4.10.2 and sshfs-3.7.3 (from https://macfuse.github.io/),
   `read_parquet()` still works, but `open_dataset() |> collect()` fails with:
   
   ```r
   ### truncated for privacy reasons, but this works completely
   str(arrow::read_parquet("/sshfs/path/to/file.pq")[0,])
   # Classes ‘data.table’ and 'data.frame':     0 obs. of  10 variables:
   #  $ Name                : chr 
   #  $ Coordinates         : chr 
   #  $ Type                : int 
   #  $ Enabled             : logi 
   #  $ TrackAbbrev         : chr 
   #  $ TrackName           : chr 
   #  $ TrackId             : chr 
   #  $ SessionId           : chr 
   #  $ TrackConfigurationId: chr 
   #  $ ModifiedTime        : 'POSIXct' num(0) 
   #  - attr(*, "tzone")= chr "UTC"
   
   ### open_dataset works, collect does not
   arr <- arrow::open_dataset("/sshfs/path/to/file.pq")
   arr
   # FileSystemDataset with 1 Parquet file
   # 10 columns
   # Name: string
   # Coordinates: string
   # Type: int32
   # Enabled: bool
   # TrackAbbrev: string
   # TrackName: string
   # TrackId: string
   # SessionId: string
   # TrackConfigurationId: string
   # ModifiedTime: timestamp[us, tz=UTC]
   # See $metadata for additional Schema metadata
   
   collect(arr)
   # Error in `compute.Dataset()`:
   # ! IOError: fcntl(fd, F_RDADVISE, ...) failed. Detail: [errno 22] Invalid argument
   # Run `rlang::last_trace()` to see where the error occurred.
   
   rlang::last_trace(drop=FALSE)
   # <error/rlang_error>
   # Error in `compute.Dataset()`:
   # ! IOError: fcntl(fd, F_RDADVISE, ...) failed. Detail: [errno 22] Invalid argument
   # ---
   # Backtrace:
   #      ▆
   #   1. ├─dplyr::collect(arrow::open_dataset("/sshfs/path/to/file.pq"))
   #   2. └─arrow:::collect.Dataset(arrow::open_dataset("/sshfs/path/to/file.pq"))
   #   3.   ├─arrow:::collect.ArrowTabular(compute.Dataset(x), as_data_frame)
   #   4.   └─arrow:::compute.Dataset(x)
   #   5.     └─base::tryCatch(...)
   #   6.       └─base (local) tryCatchList(expr, classes, parentenv, handlers)
   #   7.         └─base (local) tryCatchOne(expr, names, parentenv, handlers[[1L]])
   #   8.           └─value[[3L]](cond)
   #   9.             └─arrow:::augment_io_error_msg(e, call, schema = schema())
   #  10.               └─rlang::abort(msg, call = call)
   ```
   
   Versions:
   
   ```r
   packageVersion("arrow")
   # [1] ‘19.0.1.1’
   packageVersion("dplyr")
   # [1] ‘1.1.4’
   R.version
   #                _                           
   # platform       aarch64-apple-darwin20      
   # arch           aarch64                     
   # os             darwin20                    
   # system         aarch64, darwin20           
   # status                                     
   # major          4                           
   # minor          4.3                         
   # year           2025                        
   # month          02                          
   # day            28                          
   # svn rev        87843                       
   # language       R                           
   # version.string R version 4.4.3 (2025-02-28)
   # nickname       Trophy Case                 
   ```
   
   I have not found any other file access that fails, across various file formats.
   
   I can't rule out a bug in `sshfs` itself; perhaps `collect()` uses a specific
   system call that other read paths do not. I'm not familiar enough with the
   underlying code to work that out. Can somebody help me understand and qualify
   that aspect of the problem?
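   
   One way to narrow this down, as a rough sketch only: copy the file off the mount to local disk and run the same reader on the copy. If the local copy collects cleanly, the failure is tied to the mount rather than to the file contents. `read_via_local_copy()` is a hypothetical helper written for this illustration, not part of arrow, and the commented usage assumes the same truncated path as above.
   
   ```r
   # Hypothetical helper (not part of arrow): run `reader` on a local copy of
   # `path`, so the result can be compared with reading through the mount.
   read_via_local_copy <- function(path, reader) {
     ext <- tools::file_ext(path)
     local_path <- tempfile(fileext = if (nzchar(ext)) paste0(".", ext) else "")
     # Remove the temporary copy once the reader has returned.
     on.exit(unlink(local_path), add = TRUE)
     if (!file.copy(path, local_path)) {
       stop("could not copy ", path, " to ", local_path)
     }
     reader(local_path)
   }
   
   # Assumed usage with the path from this report (not run here):
   # read_via_local_copy(
   #   "/sshfs/path/to/file.pq",
   #   function(p) dplyr::collect(arrow::open_dataset(p))
   # )
   ```
   
   The reader must fully materialize its result before returning (as `collect()` does), because the temporary copy is deleted when the helper exits.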
   
   ### Component(s)
   
   R

