blongworth opened a new issue, #45139:
URL: https://github.com/apache/arrow/issues/45139

   ### Describe the bug, including details regarding any error messages, version, and platform.
   
   slice_sample() in Arrow 18.1.0 does not appear to sample rows uniformly at random from an Arrow Table. A reprex follows. I'd expect a random sample of rows, with x spread across 1:5000, but all ten values returned fall within the first 129 rows.
   
   ``` r
   library(dplyr)
   library(arrow)
   
   df <- data.frame(x = 1:5000)
   
   ds <- arrow_table(df)
   
   slice_sample(ds, n = 10) |> 
     collect()
   #> # A tibble: 10 × 1
   #>        x
   #>    <int>
   #>  1     3
   #>  2     4
   #>  3    16
   #>  4    17
   #>  5    65
   #>  6    66
   #>  7    74
   #>  8    93
   #>  9   123
   #> 10   129
   ```
   
   <sup>Created on 2024-12-31 with [reprex v2.1.1](https://reprex.tidyverse.org)</sup>
   
   <details style="margin-bottom:10px;">
   <summary>
   Session info
   </summary>
   
   ``` r
   sessioninfo::session_info()
   #> ─ Session info ───────────────────────────────────────────────────────────────
   #>  setting  value
   #>  version  R version 4.4.2 (2024-10-31)
   #>  os       macOS Sonoma 14.7
   #>  system   aarch64, darwin20
   #>  ui       X11
   #>  language (EN)
   #>  collate  en_US.UTF-8
   #>  ctype    en_US.UTF-8
   #>  tz       America/New_York
   #>  date     2024-12-31
   #>  pandoc   3.2 @ /Applications/RStudio.app/Contents/Resources/app/quarto/bin/tools/aarch64/ (via rmarkdown)
   #> 
   #> ─ Packages ───────────────────────────────────────────────────────────────────
   #>  package     * version date (UTC) lib source
   #>  arrow       * 18.1.0  2024-12-05 [1] CRAN (R 4.4.1)
   #>  assertthat    0.2.1   2019-03-21 [1] CRAN (R 4.4.0)
   #>  bit           4.0.5   2022-11-15 [1] CRAN (R 4.4.0)
   #>  bit64         4.0.5   2020-08-30 [1] CRAN (R 4.4.0)
   #>  cli           3.6.2   2023-12-11 [1] CRAN (R 4.4.0)
   #>  digest        0.6.35  2024-03-11 [1] CRAN (R 4.4.0)
   #>  dplyr       * 1.1.4   2023-11-17 [1] CRAN (R 4.4.0)
   #>  evaluate      0.23    2023-11-01 [1] CRAN (R 4.4.0)
   #>  fansi         1.0.6   2023-12-08 [1] CRAN (R 4.4.0)
   #>  fastmap       1.1.1   2023-02-24 [1] CRAN (R 4.4.0)
   #>  fs            1.6.4   2024-04-25 [1] CRAN (R 4.4.0)
   #>  generics      0.1.3   2022-07-05 [1] CRAN (R 4.4.0)
   #>  glue          1.7.0   2024-01-09 [1] CRAN (R 4.4.0)
   #>  htmltools     0.5.8.1 2024-04-04 [1] CRAN (R 4.4.0)
   #>  knitr         1.46    2024-04-06 [1] CRAN (R 4.4.0)
   #>  lifecycle     1.0.4   2023-11-07 [1] CRAN (R 4.4.0)
   #>  magrittr      2.0.3   2022-03-30 [1] CRAN (R 4.4.0)
   #>  pillar        1.9.0   2023-03-22 [1] CRAN (R 4.4.0)
   #>  pkgconfig     2.0.3   2019-09-22 [1] CRAN (R 4.4.0)
   #>  purrr         1.0.2   2023-08-10 [1] CRAN (R 4.4.0)
   #>  R6            2.5.1   2021-08-19 [1] CRAN (R 4.4.0)
   #>  reprex        2.1.1   2024-07-06 [1] CRAN (R 4.4.0)
   #>  rlang         1.1.3   2024-01-10 [1] CRAN (R 4.4.0)
   #>  rmarkdown     2.26    2024-03-05 [1] CRAN (R 4.4.0)
   #>  rstudioapi    0.16.0  2024-03-24 [1] CRAN (R 4.4.0)
   #>  sessioninfo   1.2.2   2021-12-06 [1] CRAN (R 4.4.0)
   #>  tibble        3.2.1   2023-03-20 [1] CRAN (R 4.4.0)
   #>  tidyselect    1.2.1   2024-03-11 [1] CRAN (R 4.4.0)
   #>  utf8          1.2.4   2023-10-22 [1] CRAN (R 4.4.0)
   #>  vctrs         0.6.5   2023-12-01 [1] CRAN (R 4.4.0)
   #>  withr         3.0.0   2024-01-16 [1] CRAN (R 4.4.0)
   #>  xfun          0.49    2024-10-31 [1] CRAN (R 4.4.1)
   #>  yaml          2.3.8   2023-12-11 [1] CRAN (R 4.4.0)
   #> 
   #>  [1] /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/library
   #> 
   #> ──────────────────────────────────────────────────────────────────────────────
   ```
   
   </details>
   
   As this issue could be dangerous for someone assuming a random sample, should there be a note in the docs, or should slice_sample() be removed until it's fixed?
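   
   In the meantime, one possible workaround for an in-memory Table like the one in the reprex is to draw the row positions in R and take exactly those rows. This is only a sketch, not a fix for slice_sample() itself: it assumes the data is an Arrow Table whose row count is known up front, and the names `tbl`, `idx`, and `sampled` are made up for illustration.
   
   ``` r
   library(arrow)

   set.seed(42)  # only so the illustration is repeatable
   tbl <- arrow_table(data.frame(x = 1:5000))

   # Draw 10 distinct row positions uniformly at random in R (1-based),
   # then take exactly those rows from the Table.
   idx <- sample.int(nrow(tbl), size = 10)
   sampled <- as.data.frame(tbl[idx, ])
   sampled
   ```
   
   Because the indices come from base R's sample.int(), every row has the same chance of being selected, which is the behaviour I expected from slice_sample().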
   
   ### Component(s)
   
   R

