Hello All,
I would like to summarize across columns in dplyr using variable names and a
condition. Below are an example "have" data set and an example "need" data set,
followed by a vector of numeric variable names and the very humble beginnings
of a dplyr-based solution.
What I think I need is to pass my variable names to dplyr and then apply a
conditional function: if a variable is in my list of names, calculate the mean
and the standard deviation; if not, calculate the mean but label it as a
proportion. The question is how to do that. Using variable names might involve
!!, enquo, or quo, but I haven't had much success with these. I may have been
close without quite getting it. The conditional part seems less difficult, but
I'm not sure how to do that either.
Help with this would be greatly appreciated.
Thanks,
Paul
have <- structure(list(
ptno = c("A", "B", "C", "D", "E", "F", "G", "H", "I", "J", "K", "L", "M",
"N", "O", "P", "Q", "R", "S", "T", "U", "V", "W", "X", "Y", "Z"),
age1 = c(74, 70, 78, 79, 72, 81, 76, 58, 53, 74, 72, 74, 75,
73, 80, 62, 67, 65, 83, 67, 72, 90, 73, 84, 90, 51),
age2 = c(71, 67, 72, 74, 65, 79, 70, 49, 45, 68, 70, 71, 74,
71, 69, 58, 65, 59, 80, 60, 68, 87, 71, 82, 80, 49),
gender_male = c(1L, 1L, 0L, 1L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 1L, 0L,
1L, 0L, 0L, 1L, 1L, 1L, 0L, 0L, 1L, 0L, 1L, 1L, 0L),
gender_female = c(0L, 0L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 1L, 0L, 1L,
0L, 1L, 1L, 0L, 0L, 0L, 1L, 1L, 0L, 1L, 0L, 0L, 1L),
race_white = c(0L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 0L, 1L, 1L, 1L, 0L,
1L, 1L, 0L, 1L, 1L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
race_black = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L),
race_other = c(1L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 0L, 0L, 1L,
0L, 0L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L)),
row.names = c(NA, -26L), class = c("tbl_df", "tbl", "data.frame"))
need <- structure(list(
  age1_mean = 72.8076923076923, age1_std = 9.72838827666425,
  age2_mean = 68.2307692307692, age2_std = 10.2227498934785,
  gender_male_prop = 0.576923076923077,
  gender_female_prop = 0.423076923076923,
  race_white_prop = 0.769230769230769,
  race_black_prop = 0.0384615384615385,
  race_other_prop = 0.192307692307692),
  row.names = c(NA, -1L), class = c("tbl_df", "tbl", "data.frame"))
vars_num <- c("age1", "age2")
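
A character vector like vars_num can usually be handed to dplyr through the
tidyselect helper all_of(), which may make !!, quo, and enquo unnecessary for
this case. The sketch below is only an illustration: `toy` is a hypothetical
two-row stand-in for `have`, and `num_cols` / `other_cols` are names made up
for the example.

```r
library(dplyr)

# Hypothetical stand-in for `have` (two rows, a few of the same columns)
toy <- tibble(
  ptno        = c("A", "B"),
  age1        = c(74, 70),
  age2        = c(71, 67),
  gender_male = c(1L, 1L)
)
vars_num <- c("age1", "age2")

# all_of() selects the columns named in a character vector
num_cols <- toy %>% select(all_of(vars_num))

# Negating all_of() selects everything except the named columns
other_cols <- toy %>% select(!all_of(c("ptno", vars_num)))

names(num_cols)
names(other_cols)
```

The same all_of() expressions can be used as the .cols argument of across().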
library(magrittr)
library(dplyr)
have %>%
  summarise(across(
    .cols = !contains("ptno"),
    .fns = list(mean = mean, std = sd),
    .names = "{col}_{fn}"
  ))
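
One possible way to get the "need" layout, offered as a sketch rather than the
definitive answer, is to use two across() calls inside a single summarise():
one for the columns listed in vars_num (mean and sd) and one for the remaining
columns (mean, labelled prop). Here `toy` is an assumed four-row subset of
`have`, and `vars_bin` is a hypothetical helper vector holding the non-numeric
(0/1) column names. Both across() calls select by explicit name, so the second
call cannot pick up summary columns created by the first.

```r
library(dplyr)

# Assumed subset of `have`: first four rows, a few columns
toy <- tibble(
  ptno        = c("A", "B", "C", "D"),
  age1        = c(74, 70, 78, 79),
  age2        = c(71, 67, 72, 74),
  gender_male = c(1L, 1L, 0L, 1L),
  race_white  = c(0L, 1L, 0L, 1L)
)
vars_num <- c("age1", "age2")

# Hypothetical helper: every column that is neither the id nor numeric
vars_bin <- setdiff(names(toy), c("ptno", vars_num))

res <- toy %>%
  summarise(
    # Numeric variables: mean and standard deviation
    across(all_of(vars_num), list(mean = mean, std = sd),
           .names = "{.col}_{.fn}"),
    # 0/1 indicator variables: the mean is the proportion of 1s
    across(all_of(vars_bin), list(prop = mean),
           .names = "{.col}_{.fn}")
  )

names(res)
```

Replacing `toy` with `have` should give one row with the same column naming
pattern as `need`. Splitting the work into two across() calls sidesteps the
need for a conditional inside a single function.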
______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.