Micha,

Others have provided ways in standard R, so I will contribute a somewhat odd
solution using dplyr and related packages in the tidyverse, including a
sample data.frame/tibble I made. It requires newer versions of R and other
packages as it uses some fairly esoteric features including "the big bang"
(the "!!!" splice operator), the ":=" operator with glue-style names, and more.

You can use your own data with whatever columns you need, of course.

The goal is to take an existing tibble with umpteen columns and add one
additional column that is the result of concatenating, row by row, the
contents of the columns named in a dynamically supplied vector of quoted
column names. First we need something to work with, so here is a sample:

#--start
# load required packages, or a bunch at once!
library(tidyverse)

# Pick how many rows you want. For a demo, 3 is plenty
N <- 3

# Make a sample tibble with N rows and the following 4 columns
mydf <- tibble(alpha = 1:N,
               beta = letters[1:N],
               gamma = N:1,
               delta = month.abb[1:N])

# show the original tibble
print(mydf)
#--end

In flat text mode, here is the output:

> print(mydf)
# A tibble: 3 x 4
  alpha beta  gamma delta
  <int> <chr> <int> <chr>
1     1 a         3 Jan
2     2 b         2 Feb
3     3 c         1 Mar

Now I want to make a function that is used instead of the mutate verb. I made a 
weird one-liner that is a tad hard to explain so first let me mention the 
requirements.

It will take a first argument that is a tibble and in a pipeline this would be 
passed invisibly.
The second required argument is a vector or list containing the names of the 
columns as strings. A column can be re-used multiple times.
The third optional argument is what to name the new column with a default if 
omitted.
The fourth optional argument allows you to choose a different separator than "" 
if you wish.

The function should be usable on either side of a pipe, so it must return the
input tibble with the extra column added.

Here is the function:

my_mutate <- function(df, columns, colnew="concatenated", sep=""){
  df %>%
    mutate( "{colnew}" := paste(!!!rlang::syms(columns), sep = sep ))
}

Yes, the above can be done inline as a long one-liner:

my_mutate <- function(df, columns, colnew="concatenated", sep="") mutate(df, 
"{colnew}" := paste(!!!rlang::syms(columns), sep = sep ))

Here are examples of it running:

> choices <- c("beta", "delta", "alpha", "delta")
> mydf %>% my_mutate(choices, "me2")
# A tibble: 3 x 5
  alpha beta  gamma delta me2
  <int> <chr> <int> <chr> <chr>
1     1 a         3 Jan   aJan1Jan
2     2 b         2 Feb   bFeb2Feb
3     3 c         1 Mar   cMar3Mar
> mydf %>% my_mutate(choices, "me2",":")
# A tibble: 3 x 5
  alpha beta  gamma delta me2
  <int> <chr> <int> <chr> <chr>
1     1 a         3 Jan   a:Jan:1:Jan
2     2 b         2 Feb   b:Feb:2:Feb
3     3 c         1 Mar   c:Mar:3:Mar
> mydf %>% my_mutate(c("beta", "beta", "gamma", "gamma", "delta",
+                      "alpha"))
# A tibble: 3 x 5
  alpha beta  gamma delta concatenated
  <int> <chr> <int> <chr> <chr>
1     1 a         3 Jan   aa33Jan1
2     2 b         2 Feb   bb22Feb2
3     3 c         1 Mar   cc11Mar3
> mydf %>% my_mutate(list("beta", "beta", "gamma", "gamma", "delta",
+                         "alpha"))
# A tibble: 3 x 5
  alpha beta  gamma delta concatenated
  <int> <chr> <int> <chr> <chr>
1     1 a         3 Jan   aa33Jan1
2     2 b         2 Feb   bb22Feb2
3     3 c         1 Mar   cc11Mar3
> mydf %>% my_mutate(columns=list("alpha", "beta", "gamma", "delta",
+                                 "gamma", "beta", "alpha"),
+                    sep="/*/",
+                    colnew="NewRandomNAME"
+                    )
# A tibble: 3 x 5
  alpha beta  gamma delta NewRandomNAME
  <int> <chr> <int> <chr> <chr>
1     1 a         3 Jan   1/*/a/*/3/*/Jan/*/3/*/a/*/1
2     2 b         2 Feb   2/*/b/*/2/*/Feb/*/2/*/b/*/2
3     3 c         1 Mar   3/*/c/*/1/*/Mar/*/1/*/c/*/3

Does this meet your normal need? Just to show it works in a pipeline, here is a 
variant:

mydf %>%
  tail(2) %>%
  my_mutate(c("beta", "beta"), "betabeta") %>%
  print() %>%
  my_mutate(list("alpha", "betabeta", "gamma"),
            "buildson", 
            "&")

The above keeps only the last two rows of the tibble, makes a doubled copy of
"beta" under a new name, prints the intermediate result, then builds another
concatenation using the column just created and prints the final result.

Here is the run:

> mydf %>%
+   tail(2) %>%
+   my_mutate(c("beta", "beta"), "betabeta") %>%
+   print() %>%
+   my_mutate(list("alpha", "betabeta", "gamma"),
+             "buildson",
+             "&")
# A tibble: 2 x 5
  alpha beta  gamma delta betabeta
  <int> <chr> <int> <chr> <chr>
1     2 b         2 Feb   bb
2     3 c         1 Mar   cc
# A tibble: 2 x 6
  alpha beta  gamma delta betabeta buildson
  <int> <chr> <int> <chr> <chr>    <chr>
1     2 b         2 Feb   bb       2&bb&2
2     3 c         1 Mar   cc       3&cc&1

As to how the darn function works, that was a learning experience for me to 
build using features I have not had occasion to use. If anyone remains 
interested, read on. 

The following needs newish features:

        "{colnew}" := SOMETHING

The colon-equals operator in newer dplyr/rlang can be used in an odd way that
allows the name of the new column to be written in quotes and in curly braces,
akin to the way glue() does it. The variable colnew is evaluated and its value
substituted, so the name used for the column is now dynamic.
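
To see the dynamic naming in isolation, here is a minimal sketch. The tibble
"toy" and the name "doubled" are made up for illustration, and it assumes a
dplyr/rlang recent enough to support the glue-style syntax:

```r
library(dplyr)

# Hypothetical toy data, just for this illustration
toy <- tibble(x = 1:3)

# The name of the new column is held in an ordinary character variable
colnew <- "doubled"

# "{colnew}" on the left of := is filled in glue-style, so the new
# column is named by the value of colnew, not by the word "colnew"
toy2 <- toy %>% mutate("{colnew}" := x * 2)

names(toy2)
```

The names should come back as "x" and "doubled".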

The function does a paste using this:

        !!!rlang::syms(columns)

The problem is paste() wants multiple arguments and we have a single argument
that is either a character vector or a list. The trick is to convert the
strings into symbols with syms(), then use "!!!" to splice them in, turning
something like 'c("alpha", "beta", "gamma")' into something more like
'alpha, beta, gamma' (bare column names, no quotes) so that paste() sees them
as multiple arguments to concatenate in vector fashion.
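
If you want to see what the splice produces without running it against any
data, a small sketch like this may help. It uses rlang::expr() only to capture
the call for inspection, which is my addition and not part of the function
above:

```r
library(rlang)

columns <- c("alpha", "beta", "gamma")

# syms() turns each string into a symbol (a bare, unquoted name)
symbols <- syms(columns)

# expr() captures the call without evaluating it, so we can look at
# what paste() is handed once !!! has spliced the symbols in
spliced <- expr(paste(!!!symbols, sep = ""))
print(spliced)
```

The captured call prints as paste(alpha, beta, gamma, sep = ""), which is what
mutate() then evaluates against the columns of the tibble.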

And, the function is not polished but I am sure you can all see some of what is 
needed like checking the arguments for validity, including not having a name 
for the new column that clashes with existing column names, doing something 
sane if no columns to concatenate are offered and so on.
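
As a rough sketch of what such checks might look like (the name
my_mutate_checked and the decision to stop with an error in each case are my
assumptions; you may prefer warnings or other defaults):

```r
library(dplyr)

# my_mutate_checked is a hypothetical name; the checks below are one
# possible policy, not the only sensible one
my_mutate_checked <- function(df, columns, colnew = "concatenated", sep = "") {
  columns <- unlist(columns)   # accept a list or a character vector
  if (length(columns) == 0) {
    stop("no columns to concatenate were supplied")
  }
  stopifnot(is.data.frame(df),
            is.character(columns),
            is.character(colnew), length(colnew) == 1,
            is.character(sep), length(sep) == 1)
  missing_cols <- setdiff(columns, names(df))
  if (length(missing_cols) > 0) {
    stop("columns not found: ", paste(missing_cols, collapse = ", "))
  }
  if (colnew %in% names(df)) {
    stop("column '", colnew, "' already exists")
  }
  # same core expression as my_mutate above
  mutate(df, "{colnew}" := paste(!!!rlang::syms(columns), sep = sep))
}
```

It is called exactly like my_mutate, but bad input now fails early with a
readable message instead of an obscure error from inside mutate().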

Just showing a different approach. The base R methods are fine.
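
For comparison, here is one way it might look in base R. This is my own
sketch, not necessarily what others posted, and the data frame mydf_base is
made up to mirror the tibble above:

```r
# Hypothetical sample data mirroring the tibble used above
mydf_base <- data.frame(alpha = 1:3,
                        beta  = letters[1:3],
                        gamma = 3:1,
                        delta = month.abb[1:3])
columns <- c("beta", "delta", "alpha", "delta")

# mydf_base[columns] picks the columns (repeats allowed); do.call()
# hands each one to paste() as a separate argument, with sep appended
# as a named argument. unname() keeps column names from being passed
# along as argument names.
args <- c(unname(as.list(mydf_base[columns])), sep = "")
mydf_base$concatenated <- do.call(paste, args)

mydf_base$concatenated
```

This should give the same strings as the "me2" column in the first example.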

- Avi

-----Original Message-----
From: R-help <r-help-boun...@r-project.org> On Behalf Of Micha Silver
Sent: Thursday, July 1, 2021 10:36 AM
To: R-help@r-project.org
Subject: [R] concatenating columns in data.frame

I need to create a new data.frame column as a concatenation of existing 
character columns. But the number and name of the columns to concatenate needs 
to be passed in dynamically. The code below does what I want, but seems very 
clumsy. Any suggestions how to improve?


df = data.frame("A"=sample(letters, 10), "B"=sample(letters, 10), 
"C"=sample(letters,10), "D"=sample(letters, 10))

# Which columns to concat:

use_columns = c("D", "B")


UpdateCombo = function(df, use_columns) {
     use_df = df[, use_columns]
     combo_list = lapply(1:nrow(use_df), function(r) {
     r_combo = paste(use_df[r,], collapse="_")
     return(data.frame("Combo" = r_combo))
     })
     combo = do.call(rbind, combo_list)

     names(combo) = "Combo"

     return(combo)

}


combo_col = UpdateCombo(df, use_columns)

df_combo = do.call(cbind, list(df, combo_col))


Thanks


--
Micha Silver
Ben Gurion Univ.
Sde Boker, Remote Sensing Lab
cell: +972-523-665918

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see 
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
