r2evans opened a new issue, #45373: URL: https://github.com/apache/arrow/issues/45373
### Describe the bug, including details regarding any error messages, version, and platform.

I think there's a bug: if there's an `arrange(.)` in the lazy pipeline that is followed by some aggregation with `summarize`, the collection still looks for the sorting column:

```r
library(arrow)
library(dplyr)

arrow_table(mtcars) |>
  summarize(across(mpg, list(Min = min, Max = max))) |>
  collect()
# # A tibble: 1 × 2
#   mpg_Min mpg_Max
#     <dbl>   <dbl>
# 1    10.4    33.9

arrow_table(mtcars) |>
  arrange(mpg) |>
  summarize(across(mpg, list(Min = min, Max = max))) |>
  collect()
# Error in compute.arrow_dplyr_query(x) :
#   Invalid: Invalid sort key column: No match for FieldRef.Name(mpg) in mpg_Min: double
# mpg_Max: double
# ----
# mpg_Min:
#   [
#     [
#       10.4
#     ]
#   ]
# mpg_Max:
#   [
#     [
#       33.9
#     ]
#   ]
```

This example is somewhat contrived _here_, in that this summarization does not need ordered data. The underlying issue remains: why does it not sort the data _at that point_ and then summarize? I'm not certain whether this is a problem with lazy sorting or whether it is too aggressive in preserving the sort field(s).

This behavior is in contrast to a `select`ion removing the sorting column:

```r
arrow_table(mtcars) |>
  arrange(mpg) |>
  select(-mpg) |>
  collect()
# # A tibble: 32 × 10
#      cyl  disp    hp  drat    wt  qsec    vs    am  gear  carb
#    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#  1     8  472    205  2.93  5.25  18.0     0     0     3     4
#  2     8  460    215  3     5.42  17.8     0     0     3     4
#  3     8  350    245  3.73  3.84  15.4     0     0     3     4
#  4     8  360    245  3.21  3.57  15.8     0     0     3     4
#  5     8  440    230  3.23  5.34  17.4     0     0     3     4
#  6     8  301    335  3.54  3.57  14.6     0     1     5     8
#  7     8  276.   180  3.07  3.78  18       0     0     3     3
#  8     8  304    150  3.15  3.44  17.3     0     0     3     2
#  9     8  318    150  2.76  3.52  16.9     0     0     3     2
# 10     8  351    264  4.22  3.17  14.5     0     1     5     4
# # ℹ 22 more rows
# # ℹ Use `print(n = ...)` to see more rows
```

<details>
<summary>
<code>> sessionInfo()</code>
</summary>

```r
R version 4.4.2 (2024-10-31)
Platform: aarch64-apple-darwin20
Running under: macOS Sequoia 15.2

Matrix products: default
BLAS:   /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.0

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

time zone: America/New_York
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

other attached packages:
[1] arrow_18.1.0.1 dplyr_1.1.4

loaded via a namespace (and not attached):
 [1] assertthat_0.2.1 utf8_1.2.4       R6_2.5.1         bit_4.5.0.1      tidyselect_1.2.1 magrittr_2.0.3   glue_1.8.0       tibble_3.2.1     pkgconfig_2.0.3  bit64_4.5.2
[11] generics_0.1.3   lifecycle_1.0.4  cli_3.6.3        fansi_1.0.6      vctrs_0.6.5      withr_3.0.2      compiler_4.4.2   purrr_1.0.2      pillar_1.9.0     rlang_1.1.4
```

</details>

### Component(s)

R