[I] open_dataset with root directory inaccessible? [arrow]

via GitHub Tue, 10 Dec 2024 08:08:32 -0800


sonicseamus opened a new issue, #44992:
URL: https://github.com/apache/arrow/issues/44992

### Describe the usage question you have. Please include as many useful details as possible.

Hello, I am hoping to use R to open the [MBTA LAMP Subway Performance
Data](https://performancedata.mbta.com) as an Arrow `Table` object that I can
query. The data are stored as daily Parquet files, one file per service day,
in the following format:

> URL Construction: Replace [YYYY-MM-DD] with the YEAR, MONTH and DAY of the
> requested service date.
>
> https://performancedata.mbta.com/lamp/subway-on-time-performance-v1/YYYY-MM-DD-subway-on-time-performance-v1.parquet
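The URL pattern above is enough to generate the full list of file URLs for a range of service dates. A minimal base-R sketch, assuming a file is published for every date in the range; `lamp_urls()` is a hypothetical helper name, not part of any package:

```r
# Hypothetical helper: build one URL per service date, following the
# documented pattern YYYY-MM-DD-subway-on-time-performance-v1.parquet.
# Assumption: a file exists for every date between `start` and `end`.
lamp_urls <- function(start, end) {
  base <- "https://performancedata.mbta.com/lamp/subway-on-time-performance-v1/"
  dates <- seq(as.Date(start), as.Date(end), by = "day")
  paste0(base, format(dates, "%Y-%m-%d"), "-subway-on-time-performance-v1.parquet")
}

# Nine URLs, one per day from 2024-12-01 through 2024-12-09 inclusive.
urls <- lamp_urls("2024-12-01", "2024-12-09")
```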

The problem is that while the individual files are publicly accessible, the
root directory,
[https://performancedata.mbta.com/lamp/subway-on-time-performance-v1/](https://performancedata.mbta.com/lamp/subway-on-time-performance-v1/),
is not. As a result, I cannot use the usual
`open_dataset("https://performancedata.mbta.com/lamp/subway-on-time-performance-v1/", format = "parquet")`
command to open the table.

Of course, I can read individual files (e.g. yesterday's) with
`read_parquet("https://performancedata.mbta.com/lamp/subway-on-time-performance-v1/2024-12-09-subway-on-time-performance-v1.parquet")`,
and I could iterate over a range of dates, but I would prefer to use
Arrow's more efficient `open_dataset()`.

My question is: is there a way, for example by specifying `schema` or
`partitioning`, to open the dataset even though the root directory is
inaccessible, given that all the individual files are accessible?
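One possible workaround, sketched under stated assumptions rather than offered as the answer: as far as I know, Arrow's filesystem layer has no HTTP(S) filesystem, so `open_dataset()` cannot be pointed at `https://` locations at all, whether a directory or individual files. `open_dataset()` does accept a character vector of individual file paths, so downloading the daily files into a local cache and opening those copies avoids the need for a directory listing. `cache_lamp_files()`, `open_lamp_dataset()`, and the `lamp-cache` directory name are hypothetical, not part of arrow:

```r
library(arrow)

# Hypothetical helper: download each URL into `dest_dir` and return the local
# paths. Assumption: open_dataset() cannot read https:// URIs directly, so
# local copies are needed.
cache_lamp_files <- function(urls, dest_dir = "lamp-cache") {
  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
  paths <- file.path(dest_dir, basename(urls))
  for (i in seq_along(urls)) {
    # Skip files already downloaded by an earlier call.
    if (!file.exists(paths[i])) {
      download.file(urls[i], paths[i], mode = "wb", quiet = TRUE)
    }
  }
  paths
}

# Hypothetical helper: open_dataset() accepts a character vector of
# individual file paths, so no directory listing is required.
open_lamp_dataset <- function(urls, ...) {
  open_dataset(cache_lamp_files(urls, ...), format = "parquet")
}
```

Given a vector `urls` of daily file URLs, `ds <- open_lamp_dataset(urls)` should return a `Dataset` that can be queried with dplyr verbs and `collect()`. `download.file()` will typically stop with an error if a date has no published file, so a date range with gaps would need a `tryCatch()` around the download.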

### Component(s)

Parquet, R

