Hello,
I forgot to say that I re-created the data set, setting the RNG seed first.
set.seed(2020)
n <- 50
x <- 1:n
y <- sample(1:3, n, replace = TRUE)
z <- rnorm(n)
tib <- tibble(x,y,z)
Also, don't do
as_tibble(cbind(...))
as.data.frame(cbind(...))
cbind() builds a matrix first, so if one of the variables is of a
different class (for example, "character"), all variables are coerced
to a single common type. It's much better to call tibble() or
data.frame() directly.
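A small base R sketch of the coercion (the vector s and the names m, bad
and good are made up for illustration; stringsAsFactors = FALSE is set
explicitly so the result does not depend on the R version):

```r
x <- 1:3
s <- c("a", "b", "c")   # hypothetical character column

# cbind() returns a matrix, and a matrix holds a single type,
# so the integers in x are converted to character strings
m <- cbind(x, s)
is.character(m)

# Converting that matrix afterwards does not bring the numbers back
bad <- as.data.frame(m, stringsAsFactors = FALSE)
str(bad)

# Building the data frame directly keeps each column's own class
good <- data.frame(x, s, stringsAsFactors = FALSE)
str(good)
```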
Hope this helps,
Rui Barradas
At 12:04 on 05/07/2020, Rui Barradas wrote:
Hello,
You can pass a grouped tibble to a function with group_modify but the
function must return a data.frame (or similar).
## this will also do it
#sillyFun <- function(tib){
# tibble(nrow = nrow(tib), ncol = ncol(tib))
#}
sillyFun <- function(tib){
  data.frame(nrow = nrow(tib), ncol = ncol(tib))
}
tib %>%
group_by(y) %>%
group_modify(~ sillyFun(.))
## A tibble: 3 x 3
## Groups: y [3]
# y nrow ncol
# <dbl> <int> <int>
#1 1 17 2
#2 2 21 2
#3 3 12 2
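If the function cannot be changed and keeps returning a list, as in the
original question, one option is to convert its result inside the
formula. This is only a sketch, assuming dplyr (a version that provides
group_modify()) and tibble are installed; the name res is made up here.

```r
library(dplyr)
library(tibble)

set.seed(2020)
n <- 50
tib <- tibble(x = 1:n,
              y = sample(1:3, n, replace = TRUE),
              z = rnorm(n))

# The list-returning function from the original question, unchanged
sillyFun <- function(tib){
  list(nrow = nrow(tib), ncol = ncol(tib))
}

# .x is the subset of rows for one group, without the grouping column;
# as_tibble() turns the named list into a one-row tibble
res <- tib %>%
  group_by(y) %>%
  group_modify(~ as_tibble(sillyFun(.x)))
res
```

Here ncol is 2 rather than 3 because group_modify() drops the grouping
variable y from the subset it hands to the function.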
Hope this helps,
Rui Barradas
At 09:43 on 05/07/2020, Chris Evans wrote:
Apologies if this is a stupid question but searching keeps getting
things I know and don't need.
What I want to do is to use the group_by() power of dplyr to run
functions that expect a dataframe/tibble per group but I can't see how
to do it. Here is a reproducible example.
### create trivial tibble
n <- 50
x <- 1:n
y <- sample(1:3, n, replace = TRUE)
z <- rnorm(n)
tib <- as_tibble(cbind(x,y,z))
### create trivial function that expects a tibble/data frame
sillyFun <- function(tib){
return(list(nrow = nrow(tib),
ncol = ncol(tib)))
}
### works fine on the whole tibble
tib %>%
summarise(dim = list(sillyFun(.))) %>%
unnest_wider(dim)
That gives me:
# A tibble: 1 x 2
nrow ncol
<int> <int>
1 50 3
### So I try the following hoping to apply the function to the grouped tibble
tib %>%
group_by(y) %>%
summarise(dim = list(sillyFun(.))) %>%
unnest_wider(dim)
### But that gives me:
# A tibble: 3 x 3
y nrow ncol
<dbl> <int> <int>
1 1 50 3
2 2 50 3
3 3 50 3
Clearly "." is still passing the whole tibble, not the grouped
subsets. What I can't find is whether there is an alternative to "."
that would pass just the grouped subset of the tibble.
I have bodged my way around this by writing a function that takes
individual columns and reassembles them into a data frame that the
actual functions I need to use require but that takes me back to a lot
of clumsiness both selecting the variables to pass in the dplyr call
to the function and putting the reassemble-to-data-frame bit in the
function I call. (The functions I really need are reliability
explorations and can be called on whole dataframes.)
I know I can do this using base R split and lapply but I feel sure it
must be possible to do this within dplyr/tidyverse. I'm slowly
transferring most of my code to the tidyverse and hitting frustrations
but also finding that it does really help me program more sensibly,
handle relational data structures more easily, and write code that I
seem better at reading when I come back to it after months on other
things so I am slowly trying to move all my coding to tidyverse. If I
could see how to do this, it would help.
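For comparison, the base R split and lapply route mentioned above can be
sketched like this. It is an illustration only, not the code actually
used; the names df, pieces and res are made up, and a plain data.frame
stands in for the tibble so that no packages are needed.

```r
set.seed(2020)
n <- 50
df <- data.frame(x = 1:n,
                 y = sample(1:3, n, replace = TRUE),
                 z = rnorm(n))

sillyFun <- function(d){
  list(nrow = nrow(d), ncol = ncol(d))
}

# One data frame per value of y; each piece still contains the y column
pieces <- split(df, df$y)

# Apply the function to each piece and stack the one-row results
res <- do.call(rbind, lapply(pieces, function(d) as.data.frame(sillyFun(d))))
res$y <- names(pieces)
res
```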
Very sorry if the answer should be blindingly obvious to me. I'd also
love to have pointers to guidance to the tidyverse written for people
who aren't professional coders or statisticians and that go a bit
beyond the obvious basics of tidyverse into issues like this.
TIA,
Chris