pdmetcalfe opened a new issue, #50163:
URL: https://github.com/apache/arrow/issues/50163
### Describe the bug, including details regarding any error messages, version, and platform.
## Description
Three methods in the arrow R package access R metadata using `x$metadata$r`.
Because `$` on a named list uses partial matching, a schema-level metadata
key that starts with `"r"` but is not `"r"` (e.g. `"rachel"`, `"row_count"`,
`"result"`) is erroneously matched whenever there is no exact `"r"` key and it
is the only key with that prefix. Its value is then passed to
`apply_arrow_r_metadata()` or used as group var metadata. This causes spurious
`"Invalid metadata$r"` warnings or hard errors depending on the matched value.
The fix in all three locations is to replace `x$metadata$r` with
`x$metadata[["r"]]`.
## Affected code
- `collect.ArrowTabular`: `apply_arrow_r_metadata(df, x$metadata$r)`
- `as.data.frame.ArrowTabular`: `apply_arrow_r_metadata(df, x$metadata$r)`
- `group_vars.ArrowTabular`: `x$metadata$r$attributes$.group_vars`
## Reprex
```r
library(arrow)
library(dplyr)
# Build a table with a schema metadata key that starts with "r" but isn't "r".
# This can happen when integrating with systems that attach their own metadata
# (e.g., a key called "rachel", "row_count", "result", etc.).
tbl <- arrow_table(x = 1:3)
tbl_rachel <- tbl$cast(
  tbl$schema$WithMetadata(list(rachel = "some_value"))
)
# Confirm that $r partial-matches to $rachel, while [["r"]] correctly returns NULL
meta <- tbl_rachel$metadata
meta$r # "some_value" <-- partial match: WRONG
meta[["r"]] # NULL <-- exact match: correct
# as.data.frame() spuriously warns "Invalid metadata$r"
as.data.frame(tbl_rachel)
#> Warning message: Invalid metadata$r
# collect() same spurious warning
collect(tbl_rachel)
#> Warning message: Invalid metadata$r
# group_vars() hard errors because it does x$metadata$r$attributes$.group_vars
# and "$" is invalid on an atomic vector
group_vars(tbl_rachel)
#> Error in x$metadata$r$attributes : $ operator is invalid for atomic vectors
```
## Expected behaviour
- `as.data.frame()` and `collect()` should return the data without any
warning — there is no `"r"` metadata key, so no R metadata should be applied.
- `group_vars()` should return `character(0)` — there are no group vars
encoded.
## Actual behaviour
- `as.data.frame()` and `collect()` emit a spurious `"Invalid metadata$r"`
warning.
- `group_vars()` throws `"$ operator is invalid for atomic vectors"`.
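
The hard error can be reproduced without arrow at all. The following is a minimal base R sketch in which a plain named list stands in for the schema metadata. The variable names and the `character(0)` fallback are illustrative assumptions, not the package's actual implementation.

```r
# A plain list standing in for x$metadata (assumption: one key, no "r").
metadata <- list(rachel = "some_value")

# Current access pattern: `$r` partial-matches "rachel", yielding a character
# vector, and the next `$` then fails on that atomic vector.
current <- tryCatch(
  metadata$r$attributes$.group_vars,
  error = function(e) e
)
inherits(current, "error")

# Exact lookup: there is no "r" key, so this is NULL, and `$` on NULL
# simply returns NULL instead of erroring.
r_meta <- metadata[["r"]]
fixed <- r_meta$attributes$.group_vars

# Hypothetical fallback matching the expected behaviour described above.
group_vars_result <- if (is.null(fixed)) character(0) else fixed
group_vars_result
```

With the exact lookup the chain degrades to `NULL` rather than reaching `$` on a string, which is why the same one-token change addresses both the warnings and the error.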
## Root cause
`schema$metadata` returns a plain R list. R's `$` operator performs partial
matching on lists, so `meta$r` resolves to `meta$rachel` when no exact `"r"`
key exists and `"rachel"` is the only key beginning with `"r"`. The fix is to
use `[[` (which matches names exactly by default) everywhere `$metadata$r`
appears:
```r
# Before (all three methods)
x$metadata$r
# After
x$metadata[["r"]]
```
## Session Info
```
R version 4.6.0 (2026-04-24)
Platform: aarch64-apple-darwin25.4.0
Running under: macOS Tahoe 26.5.1
Matrix products: default
BLAS: /opt/homebrew/Cellar/openblas/0.3.33/lib/libopenblasp-r0.3.33.dylib
LAPACK: /opt/homebrew/Cellar/r/4.6.0/lib/R/lib/libRlapack.dylib; LAPACK version 3.12.1
locale:
[1] en_GB.UTF-8/en_GB.UTF-8/en_GB.UTF-8/C/en_GB.UTF-8/en_GB.UTF-8
time zone: Europe/London
tzcode source: internal
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] dplyr_1.2.1 arrow_24.0.0
loaded via a namespace (and not attached):
[1] assertthat_0.2.1 R6_2.6.1 bit_4.6.0 tidyselect_1.2.1
[5] magrittr_2.0.5 glue_1.8.1 tibble_3.3.1 pkgconfig_2.0.3
[9] bit64_4.8.2 generics_0.1.4 lifecycle_1.0.5 cli_3.6.6
[13] vctrs_0.7.3 compiler_4.6.0 purrr_1.2.2 pillar_1.11.1
[17] rlang_1.2.0
```
### Component(s)
R