On May 25, 7:23 am, "analys...@hotmail.com" <analys...@hotmail.com> wrote:
> On May 25, 4:46 am, Stefan <ste...@inizio.se> wrote:
>
> > analyst41 <at> hotmail.com <analyst41 <at> hotmail.com> writes:
>
> > > I have a data set that has some comma separated strings in each row.
> > > I'd like to create a vector consisting of all distinct strings that
> > > occur. The number of strings in each row may vary.
>
> > > Thanks for any help.
>
> > #
> > #
> > # Some data:
> > d <- data.frame(id = 1:5,
> >                 text = c('one,two',
> >                          'two,three,three,four',
> >                          'one,three,three,five',
> >                          'five,five,five,five',
> >                          'one,two,three'),
> >                 stringsAsFactors = FALSE
> >                 )
> > #
> > #
> > # A function. I'm not a black belt at this, so there
> > # is probably a more efficient way of writing this.
> > fcn <- function(x){
> >   a <- strsplit(x, ',') # Split the string by comma
> >   unique(a[[1]])        # Uniquify the vector
> > }
>
> > #
> > #
> > # Use the function with sapply.
> > sapply(d[,2], fcn)
>
> Thanks - but this solves a slightly different problem - it outputs the
> unique values in each row. I want a list of the unique values in the
> whole data frame.
>
> In this case the output should be a single vector =
> c("one","two","three","four","five").
>

Actually I figured it out after I posted this:

> levels(as.factor(unlist(strsplit(d$text,','))))
[1] "five"  "four"  "one"   "three" "two"

Thanks for pointing me the right way.
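
For anyone reading this later: levels() returns the distinct strings in sorted order, not in the order they first occur. If the order of first appearance matters, as in the c("one","two","three","four","five") vector asked for earlier in the thread, a minimal sketch along these lines may do. It reuses the example data frame d from the quoted message; the names all_strings and distinct_in_order are made up for illustration, and it assumes there are no stray spaces around the commas.

```r
# Example data, copied from the quoted message.
d <- data.frame(id = 1:5,
                text = c('one,two',
                         'two,three,three,four',
                         'one,three,three,five',
                         'five,five,five,five',
                         'one,two,three'),
                stringsAsFactors = FALSE)

# Split every row on commas and flatten the resulting list into one
# character vector (hypothetical name).
all_strings <- unlist(strsplit(d$text, ',', fixed = TRUE))

# unique() drops duplicates and keeps the order of first appearance
# (hypothetical name).
distinct_in_order <- unique(all_strings)
print(distinct_in_order)

# sort() on the unique values gives the same set as the factor-based
# one-liner, in sorted order.
print(sort(distinct_in_order))
```

Skipping the factor conversion also keeps the result a plain character vector, which is usually what is wanted for later matching or subsetting.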