> From: r-help-boun...@r-project.org
> [mailto:r-help-boun...@r-project.org] On Behalf Of Sarah Goslee
> Sent: Thursday, May 07, 2009 3:00 PM
> To: r-help
> Subject: [R] paste with apply, spaces and NA
>
> Hello everyone,
>
> I've come up with a problem with using paste() inside apply() that I
> can't seem to solve.
> Briefly, if I'm using paste to collapse the rows of a data frame, AND
> the data frame contains strings with spaces, AND there are NA values
> in subsequent columns, then paste() introduces spaces. This only
> happens with that particular combination of data values and commands.
> I have a workaround - replacing NA with "NA" - but this seems odd.
>
> Thanks for any thoughts,
> Sarah

Do you get similar results if you use 10 instead of NA in your
examples, with more spaces if you use 10000?

I think this has to do with apply's call to as.matrix(X) when X is a
data.frame with mixed numeric and character or factor columns. It calls
format() on each numeric column to convert its elements to strings with
the same number of characters in each string.

apply() rarely gives you what you want on such mixed data.frames.
Pasting the columns without apply is faster and will give the correct
results. I find it convenient to use do.call here:

   > do.call(`paste`, c(unname(test3),list(sep=",")))
   [1] "1,a,a b,2"  "1,a,a b,2"  "1,a,a b,2"  "1,a,a b,NA" "1,a,a b,2"

(unname(as.list(test3)) would be a bit more legal. The unname would be
required if one of the column names was 'sep' or 'collapse'.)

Bill Dunlap
TIBCO Software Inc - Spotfire Division
wdunlap tibco.com

>
> > R --vanilla
> # R version 2.9.0 (2009-04-17)
> # Fedora Core 10
>
> > test1 <- data.frame(A = rep(1, 5), B = rep("a", 5), C =
> rep("a b", 5), D = rep(2, 5), stringsAsFactors=FALSE)
> >
> > # has an NA value in a column before the column containing strings with spaces
> > test2 <- test1
> > test2$B[4] <- NA
> >
> > # has an NA value in a column after the column containing strings with spaces
> > test3 <- test1
> > test3$D[4] <- NA
>
> > str(test1)
> 'data.frame': 5 obs. of 4 variables:
> $ A: num 1 1 1 1 1
> $ B: chr "a" "a" "a" "a" ...
> $ C: chr "a b" "a b" "a b" "a b" ...
> $ D: num 2 2 2 2 2
> > str(test2)
> 'data.frame': 5 obs. of 4 variables:
> $ A: num 1 1 1 1 1
> $ B: chr "a" "a" "a" NA ...
> $ C: chr "a b" "a b" "a b" "a b" ...
> $ D: num 2 2 2 2 2
> > str(test3)
> 'data.frame': 5 obs. of 4 variables:
> $ A: num 1 1 1 1 1
> $ B: chr "a" "a" "a" "a" ...
> $ C: chr "a b" "a b" "a b" "a b" ...
> $ D: num 2 2 2 NA 2
>
> > # works as expected
> > apply(test1, 1, paste, collapse=",")
> [1] "1,a,a b,2" "1,a,a b,2" "1,a,a b,2" "1,a,a b,2" "1,a,a b,2"
>
> > # works as expected
> > # does NOT add spaces to the column with the NA value
> > apply(test2, 1, paste, collapse=",")
> [1] "1,a,a b,2" "1,a,a b,2" "1,a,a b,2" "1,NA,a b,2" "1,a,a b,2"
>
> > # introduces spaces in the column with the NA value
> > # only if that column is after a column that contains strings with spaces
> > apply(test3, 1, paste, collapse=",")
> [1] "1,a,a b, 2" "1,a,a b, 2" "1,a,a b, 2" "1,a,a b,NA" "1,a,a b, 2"
>
> > # pasting the columns together manually works as expected
> > paste(test3$A, test3$B, test3$C, test3$D, sep=",")
> [1] "1,a,a b,2" "1,a,a b,2" "1,a,a b,2" "1,a,a b,NA" "1,a,a b,2"
>
> > # pasting a single row works as expected
> > paste(test3[3,], collapse=",")
> [1] "1,a,a b,2"
>
> ## workaround
> > test3[is.na(test3)] <- "NA"
> > apply(test3, 1, paste, sep="", collapse=",")
> [1] "1,a,a b,2" "1,a,a b,2" "1,a,a b,2" "1,a,a b,NA" "1,a,a b,2"
>
>
> --
> Sarah Goslee
> http://www.functionaldiversity.org
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
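
The explanation in the reply (apply() goes through as.matrix(), which
runs format() on each numeric column) can be checked with a short
sketch. It rebuilds test3 from the original post; the variable names
padded, wider, via_apply and direct are only illustrative, and the
behaviour shown is what I would expect from as.matrix() on a mixed
data.frame rather than something confirmed on every R version.

```r
# Rebuild test3 as defined in the original post.
test3 <- data.frame(A = rep(1, 5), B = rep("a", 5), C = rep("a b", 5),
                    D = rep(2, 5), stringsAsFactors = FALSE)
test3$D[4] <- NA

# format() pads all elements of a vector to a common width. "NA" is two
# characters wide, so each 2 is padded to " 2".
padded <- format(test3$D)

# A wider value (the 10000 suggested in the reply) should pad even more.
wider <- format(c(2, 2, 2, 10000, 2))

# apply() converts the data.frame with as.matrix() first, so paste()
# receives the padded strings from column D.
via_apply <- apply(test3, 1, paste, collapse = ",")

# Pasting the columns directly never calls as.matrix(), so no padding.
# unname() guards against a column that happens to be called 'sep' or
# 'collapse'.
direct <- do.call(paste, c(unname(as.list(test3)), list(sep = ",")))

print(padded)
print(via_apply)
print(direct)
```

If the reasoning holds, the space comes from column D being numeric
with a two-character "NA" in it, not from the "a b" strings in column C.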