blongworth opened a new issue, #45139: URL: https://github.com/apache/arrow/issues/45139
### Describe the bug, including details regarding any error messages, version, and platform. slice_sample() in Arrow 18.1.0 does not randomly sample rows from an arrow dataset. Reprex follows. I'd expect a random sample of rows, so x distributed in 1:5000. ``` r library(dplyr) library(arrow) df <- data.frame(x = 1:5000) ds <- arrow_table(df) slice_sample(ds, n = 10) |> collect() #> # A tibble: 10 × 1 #> x #> <int> #> 1 3 #> 2 4 #> 3 16 #> 4 17 #> 5 65 #> 6 66 #> 7 74 #> 8 93 #> 9 123 #> 10 129 ``` <sup>Created on 2024-12-31 with [reprex v2.1.1](https://reprex.tidyverse.org)</sup> <details style="margin-bottom:10px;"> <summary> Session info </summary> ``` r sessioninfo::session_info() #> ─ Session info ─────────────────────────────────────────────────────────────── #> setting value #> version R version 4.4.2 (2024-10-31) #> os macOS Sonoma 14.7 #> system aarch64, darwin20 #> ui X11 #> language (EN) #> collate en_US.UTF-8 #> ctype en_US.UTF-8 #> tz America/New_York #> date 2024-12-31 #> pandoc 3.2 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/aarch64/ (via rmarkdown) #> #> ─ Packages ─────────────────────────────────────────────────────────────────── #> package * version date (UTC) lib source #> arrow * 18.1.0 2024-12-05 [1] CRAN (R 4.4.1) #> assertthat 0.2.1 2019-03-21 [1] CRAN (R 4.4.0) #> bit 4.0.5 2022-11-15 [1] CRAN (R 4.4.0) #> bit64 4.0.5 2020-08-30 [1] CRAN (R 4.4.0) #> cli 3.6.2 2023-12-11 [1] CRAN (R 4.4.0) #> digest 0.6.35 2024-03-11 [1] CRAN (R 4.4.0) #> dplyr * 1.1.4 2023-11-17 [1] CRAN (R 4.4.0) #> evaluate 0.23 2023-11-01 [1] CRAN (R 4.4.0) #> fansi 1.0.6 2023-12-08 [1] CRAN (R 4.4.0) #> fastmap 1.1.1 2023-02-24 [1] CRAN (R 4.4.0) #> fs 1.6.4 2024-04-25 [1] CRAN (R 4.4.0) #> generics 0.1.3 2022-07-05 [1] CRAN (R 4.4.0) #> glue 1.7.0 2024-01-09 [1] CRAN (R 4.4.0) #> htmltools 0.5.8.1 2024-04-04 [1] CRAN (R 4.4.0) #> knitr 1.46 2024-04-06 [1] CRAN (R 4.4.0) #> lifecycle 1.0.4 2023-11-07 [1] CRAN (R 4.4.0) #> magrittr 2.0.3 2022-03-30 [1] CRAN (R 4.4.0) #> pillar 1.9.0 2023-03-22 [1] CRAN (R 4.4.0) #> pkgconfig 2.0.3 2019-09-22 [1] CRAN (R 4.4.0) #> purrr 1.0.2 2023-08-10 [1] CRAN (R 4.4.0) #> R6 2.5.1 2021-08-19 [1] CRAN (R 4.4.0) #> reprex 2.1.1 2024-07-06 [1] CRAN (R 4.4.0) #> rlang 1.1.3 2024-01-10 [1] CRAN (R 4.4.0) #> rmarkdown 2.26 2024-03-05 [1] CRAN (R 4.4.0) #> rstudioapi 0.16.0 2024-03-24 [1] CRAN (R 4.4.0) #> sessioninfo 1.2.2 2021-12-06 [1] CRAN (R 4.4.0) #> tibble 3.2.1 2023-03-20 [1] CRAN (R 4.4.0) #> tidyselect 1.2.1 2024-03-11 [1] CRAN (R 4.4.0) #> utf8 1.2.4 2023-10-22 [1] CRAN (R 4.4.0) #> vctrs 0.6.5 2023-12-01 [1] CRAN (R 4.4.0) #> withr 3.0.0 2024-01-16 [1] CRAN (R 4.4.0) #> xfun 0.49 2024-10-31 [1] CRAN (R 4.4.1) #> yaml 2.3.8 2023-12-11 [1] CRAN (R 4.4.0) #> #> [1] /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library #> #> ────────────────────────────────────────────────────────────────────────────── ``` </details> As this issue could be dangerous for someone assuming a random sample, should there be a note in the docs or slice_sample() removed until it's fixed? ### Component(s) R -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: issues-unsubscr...@arrow.apache.org.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org