r2evans opened a new issue, #48712:
URL: https://github.com/apache/arrow/issues/48712

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   I started adding this solely as a comment on 
https://github.com/apache/arrow/issues/40423, but since the warning just 
started for me when I updated to `arrow_22.0.0` (and R-4.5) and that other 
issue is from 2024, I thought it might be a different cause. It may also be 
related to https://github.com/apache/arrow/issues/32729. A common theme is the 
presence of attributes in the columns (as shown in the second link).
   
   I can trigger the issue with a named vector as one of the columns:
   
   ```r
   library(arrow)
   library(dplyr)

   mutate(mtcars, cyl = setNames(nm = cyl)) |>
     arrow_table() |>
     rename_with(.fn = toupper)
   # Warning: Invalid metadata$r
   # Warning: Invalid metadata$r
   # Table (query)
   # MPG: double
   # CYL: double
   # DISP: double
   # HP: double
   # DRAT: double
   # WT: double
   # QSEC: double
   # VS: double
   # AM: double
   # GEAR: double
   # CARB: double
   # See $.data for the source Arrow object
   ```
   
   This can be worked around with `open_dataset()` by removing the `"names"` 
component of the attributes, but the same workaround does not work with a table 
created with `arrow_table()`.
   
   This breaks at 
https://github.com/apache/arrow/blob/29586f4d28c50a4344f14a78dc7e091ab635fa72/r/R/metadata.R#L211
   
   ```r
   attributes(x)[names(r_metadata$attributes)] <- r_metadata$attributes
   # Error in attributes(x)[names(r_metadata$attributes)] <- r_metadata$attributes : 
   #   'names' attribute [32] must be the same length as the vector [0]
   
   ### for context
   x
   # numeric(0)
   attributes(x)
   # NULL
   r_metadata$attributes
   # $names
   #  [1] "6" "6" "4" "6" "8" "6" "8" "4" "4" "6" "6" "8" "8" "8" "8" "8" "8" "4" "4" "4" "4" "8" "8" "8" "8" "4" "4" "4" "8" "6" "8" "4"
   ```
   
   The error occurs because base R requires the `"names"` attribute to have the 
same length as the vector, and at this point in the call `x` has length 0 
(`numeric(0)`).
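   
   To make the base R constraint concrete, here is a minimal sketch that uses no arrow code at all. The names `placeholder`, `attrs`, and `realized` are made up for illustration, and the three-element names vector is a shortened stand-in for the 32 names shown above.
   
   ```r
   # Illustration only: base R, no arrow involved.
   placeholder <- numeric(0)                 # zero-length stand-in for the lazy column
   attrs <- list(names = c("6", "6", "4"))   # names saved from a longer vector

   # Same assignment pattern as the failing line; the error is caught and its
   # message kept so that the script continues.
   msg <- tryCatch(
     {
       attributes(placeholder)[names(attrs)] <- attrs
       NA_character_
     },
     error = function(e) conditionMessage(e)
   )
   print(msg)

   # With a vector of matching length the same assignment succeeds.
   realized <- c(6, 6, 4)
   attributes(realized)[names(attrs)] <- attrs
   print(names(realized))
   ```
   
   The assignment fails only for the zero-length vector, which suggests the problem is the length mismatch, not the names themselves.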
   
   The underlying issue is that `x` is still lazy here, represented by the 
placeholder `numeric(0)`. Is it possible to have the data realized before this 
point for this specific code path?
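   
   As a rough illustration of another direction (not something arrow does today, as far as I know), the restore step could skip `"names"` when its length cannot apply to the vector. `restore_attributes_safely()` is a hypothetical helper written in base R only; it is not part of arrow, and the `note` attribute is invented for the example.
   
   ```r
   # Hypothetical helper, base R only; not an arrow function.
   restore_attributes_safely <- function(x, attrs) {
     nms <- attrs[["names"]]
     # Drop "names" when its length does not match the vector.
     if (!is.null(nms) && length(nms) != length(x)) {
       attrs[["names"]] <- NULL
     }
     # Apply whatever attributes remain, if any.
     if (length(attrs) > 0) {
       attributes(x)[names(attrs)] <- attrs
     }
     x
   }

   attrs <- list(names = c("6", "6", "4"), note = "kept")

   # Zero-length placeholder: "names" is skipped, other attributes survive.
   out_lazy <- restore_attributes_safely(numeric(0), attrs)

   # Matching length: all attributes, including "names", are restored.
   out_full <- restore_attributes_safely(c(6, 6, 4), attrs)
   ```
   
   The obvious cost of such a guard is that the names are silently lost on the lazy path, so realizing the data first may still be the better fix.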
   
   
   ### Component(s)
   
   R

