Thank you for the reprex. However, your specification was too vague for me to know exactly what your data look like, so I assumed the most general case, which means I may be answering the wrong question. Hopefully you can adjust as needed to get what you want.
I should also warn you that I am nearly certain there are more elegant, cleverer, faster ways to do this. I just used simple tools. So you may wish to wait a bit to see whether others can improve on my attempt.

First of all, I assumed the "a2/a3" in S5 in d1 is a typo and it should be "a2|a3". If it is not a typo, then substitute "\\||\\/" for "\\|" in the strsplit function in the code that follows.

Secondly, I assumed that your identifiers, "a1" for example, could occur more than once within a column of your data. If the only possibilities are 0 or 1 times, then the code I provided (in particular the last sapply) is more complicated than it needs to be. A faster approach in that case might be to use R's outer() function; I leave that as an exercise for you or someone else to help you with if so.

Here is my code for your reprex:

getall <- function(x){
   ul <- unlist(strsplit(x, "\\|"))
   ul[ul != "w"]
}
allvals <- lapply(d1, getall)
uneeks <- sort(unique(unlist(allvals)))
sapply(allvals, function(x)table(factor(x, levels = uneeks)))

## which gives

> sapply(allvals, function(x)table(factor(x, levels = uneeks)))
   S1 S2 S3 S4 S5
a1  1  0  0  0  0
a2  1  0  1  0  1
a3  0  0  0  0  1
b1  1  1  1  0  0
b3  1  0  1  0  0
b4  0  0  1  1  0
c1  0  0  1  0  0
c2  0  1  0  0  0
c4  0  0  1  1  0

Bert

Bert Gunter

"The trouble with having an open mind is that people keep coming along and sticking things into it."
-- Opus (aka Berkeley Breathed in his "Bloom County" comic strip )

On Wed, May 26, 2021 at 2:18 PM Adrian Johnson <oriolebaltim...@gmail.com> wrote:

> Hello,
>
> I am trying to convert a df (given below as d1) into df2 (given below as
> res).
>
> I tried using loops for each row. I cannot get it right. Moreover the df
> is 250000 x 500 in dimension and I cannot get it to work.
>
> Could anyone help me here please.
>
> Thanks.
> Adrian.
>
> d1 <-
> structure(list(S1 = c("a1|a2", "b1|b3", "w"), S2 = c("w", "b1",
> "c2"), S3 = c("a2", "b3|b4|b1", "c1|c4"), S4 = c("w", "b4", "c4"
> ), S5 = c("a2/a3", "w", "w")), class = "data.frame", row.names = c("A",
> "B", "C"))
>
> res <-
> structure(list(S1 = c(1L, 1L, 0L, 1L, 0L, 1L, 0L, 0L, 0L, 0L),
> S2 = c(0L, 0L, 0L, 1L, 0L, 0L, 0L, 0L, 1L, 0L), S3 = c(0L,
> 1L, 0L, 1L, 0L, 1L, 1L, 1L, 0L, 1L), S4 = c(0L, 0L, 0L, 0L,
> 0L, 0L, 1L, 0L, 0L, 1L), S5 = c(0L, 1L, 1L, 0L, 0L, 0L, 0L,
> 0L, 0L, 0L)), class = "data.frame", row.names = c("a1", "a2",
> "a3", "b1", "b2", "b3", "b4", "c1", "c2", "c4"))
>
> [[alternative HTML version deleted]]
>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
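
P.S. For the simpler case mentioned in my reply, where an identifier occurs at most once per column (or where only presence/absence matters), here is a minimal sketch. It is not the outer() approach; it uses %in% instead, which seemed the plainest way to get a 0/1 result. The names pa and res2 are my own for illustration, and the split pattern "[|/]" is an assumption that treats both "|" and "/" as separators, so it covers the "a2/a3" entry whether or not that is a typo.

```r
# Example data copied from the original post
d1 <- data.frame(
  S1 = c("a1|a2", "b1|b3", "w"),
  S2 = c("w", "b1", "c2"),
  S3 = c("a2", "b3|b4|b1", "c1|c4"),
  S4 = c("w", "b4", "c4"),
  S5 = c("a2/a3", "w", "w"),
  row.names = c("A", "B", "C"),
  stringsAsFactors = FALSE
)

# Split each entry on "|" or "/" (assumption: both are separators)
# and drop the "w" placeholder
getall <- function(x) {
  ul <- unlist(strsplit(x, "[|/]"))
  ul[ul != "w"]
}

allvals <- lapply(d1, getall)
uneeks <- sort(unique(unlist(allvals)))

# Presence/absence: 1L if the identifier occurs anywhere in the column, else 0L.
# Unlike table(), this never counts above 1.
pa <- vapply(allvals,
             function(x) as.integer(uneeks %in% x),
             integer(length(uneeks)))
rownames(pa) <- uneeks

# Convert to a data frame shaped like the requested 'res'
res2 <- as.data.frame(pa)
res2
```

Note that an identifier that never appears in d1 (such as "b2" in your res) gets no row here; if you need such rows, supply the full set of identifiers yourself in place of uneeks.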