Re: [R] applying strsplit to a whole column

David Winsemius Wed, 04 Aug 2010 12:47:29 -0700


On Aug 4, 2010, at 3:40 PM, Dimitri Liakhovitski wrote:

Thanks a lot, David.
It works perfectly. Of course, lapply is also a loop!

So, your method is:

z<-data.frame(nam1=c("bbb..aba","ccc..abb","ddd..abc","eee..abd"),stringsAsFactors=FALSE)

z$nam2<-unlist(lapply( strsplit(z[[1]],split="\\.."), "[", 1))
z$nam3<-unlist(lapply( strsplit(z[[1]],split="\\.."), "[", 2))
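A note on the split pattern: in "\\.." only the first dot is escaped, so the regular expression matches a literal dot followed by any one character. That happens to give the intended result for this data. A minimal sketch that instead splits on the literal two-character string "..", assuming that is the separator meant, could use fixed = TRUE (the helper name parts is only for illustration):

```r
z <- data.frame(nam1 = c("bbb..aba", "ccc..abb", "ddd..abc", "eee..abd"),
                stringsAsFactors = FALSE)

# fixed = TRUE treats ".." as a literal string rather than a regular expression
parts <- strsplit(z$nam1, split = "..", fixed = TRUE)

# vapply() works like sapply() but checks that each result is a single string
z$nam2 <- vapply(parts, `[`, character(1), 1)
z$nam3 <- vapply(parts, `[`, character(1), 2)
z
```

Splitting once and reusing parts also avoids calling strsplit twice on the same column.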


Unless you want to use the gsub method I later offered.


And using the new package "stringr" (thank you for sharing!):
y<-data.frame(nam1=c("aaa..aba","bbb..abb","ccc..abc","ddd..abd"),
stringsAsFactors=FALSE)
library(stringr)
y$nam2<-as.data.frame(str_split_fixed(y$nam1, "\\..", 2))[[1]]
y$nam3<-as.data.frame(str_split_fixed(y$nam1, "\\..", 2))[[2]]
(y)
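str_split_fixed() already returns a character matrix, so the columns can probably be taken from it directly without going through as.data.frame() (which, in R versions before 4.0.0, would by default turn the strings into factors). A minimal sketch, assuming the stringr package is installed and that the separator is the literal string ".." (the matrix name m is only for illustration):

```r
library(stringr)

y <- data.frame(nam1 = c("aaa..aba", "bbb..abb", "ccc..abc", "ddd..abd"),
                stringsAsFactors = FALSE)

# fixed("..") asks stringr to match the two dots literally, not as a regex
m <- str_split_fixed(y$nam1, fixed(".."), 2)

y$nam2 <- m[, 1]   # first piece of every string
y$nam3 <- m[, 2]   # second piece of every string
y
```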

One question - what exactly does the square bracket in your lapply
code mean? Looks like a shortcut - I've not seen it before.
lapply( strsplit(z[[1]],split="\\.."), "[", 1)

It is just the Extract function applied with an argument of 1 to each successive member of the list, so it is simply the series:

> strsplit(x[[1]],split="\\..")[[1]][1]
[1] "bbb"
> strsplit(x[[1]],split="\\..")[[2]][1]
[1] "ccc"
> strsplit(x[[1]],split="\\..")[[3]][1]
[1] "ddd"
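To make the point about "[" concrete: in R the square bracket is itself a function, so it can be called by name or handed to lapply like any other function. A small sketch (the vector v and the list parts are made up for illustration, and the split here uses fixed = TRUE rather than the "\\.." pattern above):

```r
v <- c("bbb", "aba")

# Three equivalent ways of taking the first element of v
v[1]
`[`(v, 1)
"["(v, 1)

# lapply() passes each list element as the first argument of "[" and 1 as the second
parts <- strsplit(c("bbb..aba", "ccc..abb"), split = "..", fixed = TRUE)
lapply(parts, "[", 1)
lapply(parts, function(p) p[1])  # same result, written out explicitly
```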


Thank you!
Dimitri

On Wed, Aug 4, 2010 at 3:31 PM, David Winsemius <dwinsem...@comcast.net> wrote:


On Aug 4, 2010, at 3:03 PM, Dimitri Liakhovitski wrote:

I am sorry, someone said that strsplit automatically works on a
column. How exactly does it work?

For example, if I want to grab just the first (or the second) part of
the string in nam1 that should be split based on ".."
x<-data.frame(nam1=c("bbb..aba","ccc..abb","ddd..abc","eee..abd"),
stringsAsFactors=FALSE)
str(x)
strsplit(x[[1]],split="\\..")
str(strsplit(x[[1]],split="\\.."))

I am getting a list - hence, it looks like I have to go in a loop...?


lapply( strsplit(x[[1]],split="\\.."), "[", 1)

[[1]]
[1] "bbb"

[[2]]
[1] "ccc"

[[3]]
[1] "ddd"

[[4]]
[1] "eee"

lapply( strsplit(x[[1]],split="\\.."), "[", 2)

[[1]]
[1] "aba"

[[2]]
[1] "abb"

[[3]]
[1] "abc"

[[4]]
[1] "abd"

unlist(lapply( strsplit(x[[1]],split="\\.."), "[", 2) )

[1] "aba" "abb" "abc" "abd"

unlist(lapply( strsplit(x[[1]],split="\\.."), "[", 1) )

[1] "bbb" "ccc" "ddd" "eee"
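If both pieces are wanted at once, one possible variation is to bind the split results into a matrix and take its columns. This is only a sketch: it assumes every string splits into exactly two pieces, it uses fixed = TRUE instead of the "\\.." pattern, and the names parts and m are illustrative.

```r
x <- data.frame(nam1 = c("bbb..aba", "ccc..abb", "ddd..abc", "eee..abd"),
                stringsAsFactors = FALSE)

parts <- strsplit(x[[1]], split = "..", fixed = TRUE)

# Stack the length-2 vectors into a 4 x 2 character matrix, one row per string
m <- do.call(rbind, parts)

x$nam2 <- m[, 1]
x$nam3 <- m[, 2]
x
```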

Thank you!
Dimitri


On Wed, Aug 4, 2010 at 2:39 PM, Dimitri Liakhovitski
<dimitri.liakhovit...@gmail.com> wrote:
Thank you very much, everyone!
Dimitri
On Wed, Aug 4, 2010 at 2:10 PM, David Winsemius <dwinsem...@comcast.net>
wrote:
On Aug 4, 2010, at 1:42 PM, Dimitri Liakhovitski wrote:
I am sorry, I'd like to split my column ("names") such that the beginning of each string ("X..") is gone and only the rest of the text is
left.
I could not tell whether it was the string "X.." or the pattern "X.."
that was your goal for matching and removal.
x<-data.frame(names=c("X..aba","X..abb","X..abc","X..abd"))
x$names<-as.character(x$names)
a) Instead of "names", which is a heavily used function name, use something
more specific. Otherwise you get:
names(x)
"names"  # and thereby avoid list comments about canines.
b) Instead of coercing a character vector back to a character vector,
use stringsAsFactors = FALSE.
x<-data.frame(nam1=c("X..aba","X..abb","X..abc","X..abd"),
stringsAsFactors=FALSE)
# This is the pattern version:
x$nam1 <- gsub("X..",'', x$nam1)
x
 nam1
1   aba
2   abb
3   abc
4   abd

This is the string version:
x<-data.frame(nam1=c("X......aba","X.y.abb","X..abc","X..abd"),
stringsAsFactors=FALSE)
 x$nam1 <- gsub("X\\.+",'', x$nam1)
x
 nam1
1   aba
2 y.abb
3   abc
4   abd
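The difference between the two versions shows up on strings where "X" is followed by something other than dots, or does not sit at the start. A small sketch with made-up values (the vector nam1 below is hypothetical, not from the original data), comparing the unescaped pattern with an anchored, fully escaped one:

```r
nam1 <- c("X..aba", "X.y.abb", "Xab.abc", "aX..abd")

# "X.." as a regex: X followed by any two characters, anywhere in the string
gsub("X..", "", nam1)
# [1] "aba"  ".abb" ".abc" "aabd"

# Anchored and escaped: only a leading literal "X.." is removed
sub("^X\\.\\.", "", nam1)
# [1] "aba"     "X.y.abb" "Xab.abc" "aX..abd"
```

Which behaviour is right depends on whether "X.." was meant as a literal prefix or as a pattern.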
(x)
str(x)
Can't figure out how to apply strsplit in this situation - without using a loop. I hope it's possible to do it without a loop - is it?
--


David Winsemius, MD
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
