I use parts of the tidyverse frequently, but this post is the best argument I can imagine for learning base R techniques.
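
For comparison, here is a minimal base R sketch of the same task, using the sample columns from the quoted message below. The helper name add_concat is hypothetical and not from the thread; the approach (do.call() handing each selected column to paste() as its own argument) is one common base R idiom, not necessarily what other posters suggested.

```r
# add_concat() is a hypothetical name used only for this illustration.
add_concat <- function(df, columns, colnew = "concatenated", sep = "") {
  # Pull out each requested column by name; works for a character vector
  # or a list of names, and a name may appear more than once.
  pieces <- lapply(columns, function(nm) df[[nm]])
  # do.call() passes every element of the list to paste() as a separate
  # argument, so the columns are concatenated row by row.
  df[[colnew]] <- do.call(paste, c(pieces, sep = sep))
  df
}

mydf <- data.frame(alpha = 1:3, beta = letters[1:3],
                   gamma = 3:1, delta = month.abb[1:3])

res <- add_concat(mydf, c("beta", "delta", "alpha", "delta"), "me2", ":")
print(res$me2)
# "a:Jan:1:Jan" "b:Feb:2:Feb" "c:Mar:3:Mar"
```

Like the quoted function, this sketch does no argument checking: a misspelled column name is silently dropped and an existing column called colnew is overwritten.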
On July 1, 2021 8:41:06 PM PDT, Avi Gross via R-help <r-help@r-project.org> wrote:
>Micha,
>
>Others have provided ways in standard R so I will contribute a somewhat
>odd solution using the dplyr and related packages in the tidyverse
>including a sample data.frame/tibble I made. It requires newer versions
>of R and other packages as it uses some fairly esoteric features
>including "the big bang" and the new ":=" operator and more.
>
>You can use your own data with whatever columns you need, of course.
>
>The goal is to have umpteen columns in the data and to add an
>additional column to an existing tibble that is the result of
>concatenating the rowwise contents of a dynamically supplied vector of
>column names in quotes. First we need something to work with so here is
>a sample:
>
>#--start
># load required packages, or a bunch at once!
>library(tidyverse)
>
># Pick how many rows you want. For a demo, 3 is plenty
>N <- 3
>
># Make a sample tibble with N rows and the following 4 columns
>mydf <- tibble(alpha = 1:N,
>               beta = letters[1:N],
>               gamma = N:1,
>               delta = month.abb[1:N])
>
># show the original tibble
>print(mydf)
>#--end
>
>In flat text mode, here is the output:
>
>> print(mydf)
># A tibble: 3 x 4
>  alpha beta  gamma delta
>  <int> <chr> <int> <chr>
>1     1 a         3 Jan
>2     2 b         2 Feb
>3     3 c         1 Mar
>
>Now I want to make a function that is used instead of the mutate verb.
>I made a weird one-liner that is a tad hard to explain so first let me
>mention the requirements.
>
>It will take a first argument that is a tibble and in a pipeline this
>would be passed invisibly.
>The second required argument is a vector or list containing the names
>of the columns as strings. A column can be re-used multiple times.
>The third optional argument is what to name the new column with a
>default if omitted.
>The fourth optional argument allows you to choose a different separator
>than "" if you wish.
>
>The function should be usable in a pipeline on both sides so it should
>also return the input tibble with an extra column to the output.
>
>Here is the function:
>
>my_mutate <- function(df, columns, colnew="concatenated", sep=""){
>  df %>%
>    mutate( "{colnew}" := paste(!!!rlang::syms(columns), sep = sep ))
>}
>
>Yes, the above can be done inline as a long one-liner:
>
>my_mutate <- function(df, columns, colnew="concatenated", sep="")
>  mutate(df, "{colnew}" := paste(!!!rlang::syms(columns), sep = sep ))
>
>Here are examples of it running:
>
>
>> choices <- c("beta", "delta", "alpha", "delta")
>> mydf %>% my_mutate(choices, "me2")
># A tibble: 3 x 5
>  alpha beta  gamma delta me2
>  <int> <chr> <int> <chr> <chr>
>1     1 a         3 Jan   aJan1Jan
>2     2 b         2 Feb   bFeb2Feb
>3     3 c         1 Mar   cMar3Mar
>> mydf %>% my_mutate(choices, "me2",":")
># A tibble: 3 x 5
>  alpha beta  gamma delta me2
>  <int> <chr> <int> <chr> <chr>
>1     1 a         3 Jan   a:Jan:1:Jan
>2     2 b         2 Feb   b:Feb:2:Feb
>3     3 c         1 Mar   c:Mar:3:Mar
>> mydf %>% my_mutate(c("beta", "beta", "gamma", "gamma", "delta", "alpha"))
># A tibble: 3 x 5
>  alpha beta  gamma delta concatenated
>  <int> <chr> <int> <chr> <chr>
>1     1 a         3 Jan   aa33Jan1
>2     2 b         2 Feb   bb22Feb2
>3     3 c         1 Mar   cc11Mar3
>> mydf %>% my_mutate(list("beta", "beta", "gamma", "gamma", "delta", "alpha"))
># A tibble: 3 x 5
>  alpha beta  gamma delta concatenated
>  <int> <chr> <int> <chr> <chr>
>1     1 a         3 Jan   aa33Jan1
>2     2 b         2 Feb   bb22Feb2
>3     3 c         1 Mar   cc11Mar3
>> mydf %>% my_mutate(columns=list("alpha", "beta", "gamma", "delta", "gamma", "beta", "alpha"),
>+                    sep="/*/",
>+                    colnew="NewRandomNAME"
>+ )
># A tibble: 3 x 5
>  alpha beta  gamma delta NewRandomNAME
>  <int> <chr> <int> <chr> <chr>
>1     1 a         3 Jan   1/*/a/*/3/*/Jan/*/3/*/a/*/1
>2     2 b         2 Feb   2/*/b/*/2/*/Feb/*/2/*/b/*/2
>3     3 c         1 Mar   3/*/c/*/1/*/Mar/*/1/*/c/*/3
>
>Does this meet your normal need?
>Just to show it works in a pipeline,
>here is a variant:
>
>mydf %>%
>  tail(2) %>%
>  my_mutate(c("beta", "beta"), "betabeta") %>%
>  print() %>%
>  my_mutate(list("alpha", "betabeta", "gamma"),
>            "buildson",
>            "&")
>
>The above only keeps the last two rows of the tibble, makes a double
>copy of "beta" under a new name, prints the intermediate result,
>continues to make another concatenation using the variable created
>earlier then prints the result:
>
>Here is the run:
>
>> mydf %>%
>+   tail(2) %>%
>+   my_mutate(c("beta", "beta"), "betabeta") %>%
>+   print() %>%
>+   my_mutate(list("alpha", "betabeta", "gamma"),
>+             "buildson",
>+             "&")
># A tibble: 2 x 5
>  alpha beta  gamma delta betabeta
>  <int> <chr> <int> <chr> <chr>
>1     2 b         2 Feb   bb
>2     3 c         1 Mar   cc
># A tibble: 2 x 6
>  alpha beta  gamma delta betabeta buildson
>  <int> <chr> <int> <chr> <chr>    <chr>
>1     2 b         2 Feb   bb       2&bb&2
>2     3 c         1 Mar   cc       3&cc&1
>
>As to how the darn function works, that was a learning experience for
>me to build using features I have not had occasion to use. If anyone
>remains interested, read on.
>
>The following needs newish features:
>
>  "{colnew}" := SOMETHING
>
>The colon-equals operator in newer R/dplyr can be sort of used in an
>odd way that allows the name of the variable to be in quotes and in
>curly braces akin to the way glue() does it. The variable colnew is
>evaluated and substituted so the name used for the column is now
>dynamic.
>
>The function does a paste using this:
>
>  !!!rlang::syms(columns)
>
>The problem is paste() wants multiple arguments and we have a single
>argument that is either a vector or another kind of vector called a
>list. The trick is to convert the vector into symbols then use "!!!" to
>convert something like 'c("alpha", "beta", "gamma")' into something
>more like ' "alpha", "beta", "gamma" ' so that paste sees them as
>multiple arguments to concatenate in vector fashion.
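
To make the splicing step above concrete, here is a small sketch that builds the paste() call outside of mutate() so the result of "!!!" can be inspected directly. It assumes the rlang package is installed; the variable names spliced and out and the three-column data frame are made up for the illustration.

```r
library(rlang)

columns <- c("alpha", "beta", "gamma")

# syms() turns each string into a symbol; !!! splices the resulting list
# into the call, one symbol per argument.
spliced <- expr(paste(!!!syms(columns), sep = ""))
print(spliced)
# paste(alpha, beta, gamma, sep = "")

# Evaluating the spliced call with a data frame as the data mask looks the
# symbols up as columns, which is roughly what happens inside mutate().
mydf <- data.frame(alpha = 1:3, beta = letters[1:3], gamma = 3:1)
out <- eval_tidy(spliced, data = mydf)
print(out)
# "1a3" "2b2" "3c1"
```

Note that the spliced arguments are bare symbols (alpha, beta, gamma) rather than quoted strings, which is why paste() ends up seeing the column contents and not the column names.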
>
>And, the function is not polished but I am sure you can all see some of
>what is needed like checking the arguments for validity, including not
>having a name for the new column that clashes with existing column
>names, doing something sane if no columns to concatenate are offered
>and so on.
>
>Just showing a different approach. The base R methods are fine.
>
>- Avi
>
>-----Original Message-----
>From: R-help <r-help-boun...@r-project.org> On Behalf Of Micha Silver
>Sent: Thursday, July 1, 2021 10:36 AM
>To: R-help@r-project.org
>Subject: [R] concatenating columns in data.frame
>
>I need to create a new data.frame column as a concatenation of existing
>character columns. But the number and name of the columns to
>concatenate needs to be passed in dynamically. The code below does what
>I want, but seems very clumsy. Any suggestions how to improve?
>
>
>df = data.frame("A"=sample(letters, 10), "B"=sample(letters, 10),
>"C"=sample(letters,10), "D"=sample(letters, 10))
>
># Which columns to concat:
>
>use_columns = c("D", "B")
>
>
>UpdateCombo = function(df, use_columns) {
>  use_df = df[, use_columns]
>  combo_list = lapply(1:nrow(use_df), function(r) {
>    r_combo = paste(use_df[r,], collapse="_")
>    return(data.frame("Combo" = r_combo))
>  })
>  combo = do.call(rbind, combo_list)
>
>  names(combo) = "Combo"
>
>  return(combo)
>
>}
>
>
>combo_col = UpdateCombo(df, use_columns)
>
>df_combo = do.call(cbind, list(df, combo_col))
>
>
>Thanks
>
>
>--
>Micha Silver
>Ben Gurion Univ.
>Sde Boker, Remote Sensing Lab
>cell: +972-523-665918
>
>______________________________________________
>R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>https://stat.ethz.ch/mailman/listinfo/r-help
>PLEASE do read the posting guide
>http://www.R-project.org/posting-guide.html
>and provide commented, minimal, self-contained, reproducible code.

--
Sent from my phone. Please excuse my brevity.