sonicseamus opened a new issue, #44992: URL: https://github.com/apache/arrow/issues/44992
### Describe the usage question you have. Please include as many useful details as possible.

Hello,

I am hoping to use R to open the [MBTA LAMP Subway Performance Data](performancedata.mbta.com), which is stored in daily Parquet files, as an Arrow `Table` object that I can query.

The files are stored, separated by day, in the following format:

> URL Construction: Replace [YYYY-MM-DD] with the YEAR, MONTH and DAY of the requested service date.
>
> https://performancedata.mbta.com/lamp/subway-on-time-performance-v1/YYYY-MM-DD-subway-on-time-performance-v1.parquet

The problem is that while the individual files are publicly accessible, the root directory, https://performancedata.mbta.com/lamp/subway-on-time-performance-v1/, is not. As a result, I cannot use the typical `open_dataset("https://performancedata.mbta.com/lamp/subway-on-time-performance-v1/", format = "parquet")` command to open the table.

Of course, I can access individual files (e.g. yesterday's) with `read_parquet("https://performancedata.mbta.com/lamp/subway-on-time-performance-v1/2024-12-09-subway-on-time-performance-v1.parquet")`, and I could iterate through a range of dates, but I would prefer to use Arrow's more efficient `open_dataset()`.

My question is: is there a way, for example by specifying the `schema` or `partitioning`, to open the dataset even though the root directory is inaccessible, given that all of the individual files are accessible?

### Component(s)

Parquet, R
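For reference, the iterate-over-dates fallback mentioned above might look roughly like the sketch below. It only builds the per-day URLs from the documented pattern and reads each file separately; it does not make `open_dataset()` work against the inaccessible root. `lamp_urls()` and `read_lamp_days()` are hypothetical helper names, not part of arrow, and the sketch assumes that every requested date has a file and that the daily schemas are compatible.

```r
# Base URL and file suffix taken from the URL pattern quoted above.
base_url <- "https://performancedata.mbta.com/lamp/subway-on-time-performance-v1"
suffix <- "subway-on-time-performance-v1.parquet"

# Hypothetical helper: build one URL per service date in [from, to].
lamp_urls <- function(from, to) {
  dates <- seq(as.Date(from), as.Date(to), by = "day")
  sprintf("%s/%s-%s", base_url, format(dates, "%Y-%m-%d"), suffix)
}

# Hypothetical helper: read each daily file as an Arrow Table and combine them.
# Assumes every file exists and that the daily schemas are compatible.
read_lamp_days <- function(from, to) {
  tables <- lapply(lamp_urls(from, to), function(u) {
    arrow::read_parquet(u, as_data_frame = FALSE)
  })
  do.call(arrow::concat_tables, tables)
}

# Build the URLs for two service dates (no network access needed for this step).
urls <- lamp_urls("2024-12-08", "2024-12-09")
```

Calling `read_lamp_days("2024-12-08", "2024-12-09")` would then download and combine those two days, but each file is fetched in full, so this does not give the lazy scanning that `open_dataset()` provides.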