Re: [R] splitting strings efficiently

Martin Morgan Sun, 08 Jan 2012 14:27:47 -0800

On 01/08/2012 11:37 AM, jim holtman wrote:

Just a quick followup to the previous post using 4M entries:  (20
seconds would seem like a reasonable time for the operation)

  ip<- "123.456.789.321"  ## example data
  df<- data.frame(ip = rep(ip, 4e6), stringsAsFactors=FALSE)
  system.time(x<- strsplit(df$ip, '\\.'))


or if the IP addresses really are repeated multiple times

df <- data.frame(ip=rep(ip, 4e6), stringsAsFactors=TRUE)  ## df$ip is a factor

> system.time(x <- local({
+     ip0 <- strsplit(levels(df$ip), "\\.")
+     ip0[match(df$ip, levels(df$ip))]
+ }))
   user  system elapsed
  0.352   0.000   0.352

although the speed-up in this example is best-case: every row holds the
same address, so there is only one level to split.
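
The same idea carries over to a plain character column: split only the distinct values and expand the result with match(). This is a minimal sketch, not part of the original timing; the addresses and the names ips, u and parts_u are made up for illustration.

```r
## Hypothetical data: three distinct addresses, each repeated many times
ips <- rep(c("10.0.0.1", "10.0.0.2", "192.168.1.7"), times = 1000)

u <- unique(ips)                           # distinct addresses only
parts_u <- strsplit(u, ".", fixed = TRUE)  # split each distinct value once
x <- parts_u[match(ips, u)]                # expand back to one entry per row

## Check against splitting every element directly
identical(x, strsplit(ips, ".", fixed = TRUE))
```

How much this saves depends on how many distinct addresses there are relative to the number of rows.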

Martin

    user  system elapsed
   19.47    0.12   20.86

  str(x)

List of 4000000
  $ : chr [1:4] "123" "456" "789" "321"
  $ : chr [1:4] "123" "456" "789" "321"
  $ : chr [1:4] "123" "456" "789" "321"
  $ : chr [1:4] "123" "456" "789" "321"
  $ : chr [1:4] "123" "456" "789" "321"
  $ : chr [1:4] "123" "456" "789" "321"
  $ : chr [1:4] "123" "456" "789" "321"
  $ : chr [1:4] "123" "456" "789" "321"
  $ : chr [1:4] "123" "456" "789" "321"




On Sun, Jan 8, 2012 at 8:11 AM, Enrico Schumann<enricoschum...@yahoo.de>  wrote:


Hi Andrew,

You can call strsplit on the whole character vector; you do not have to call it for
every element data$ComputerName[i].

If I understand correctly, something like this may help:

ip<- "123.456.789.321"  ## example data
df<- data.frame(ip = rep(ip, 9), stringsAsFactors=FALSE)
df

               ip
1 123.456.789.321
2 123.456.789.321
3 123.456.789.321
4 123.456.789.321
5 123.456.789.321
6 123.456.789.321
7 123.456.789.321
8 123.456.789.321
9 123.456.789.321


res<- unlist(strsplit(df[["ip"]], "\\."))
ii<- seq(1, nrow(df)*4, by = 4)
res[ii]   ## A

[1] "123" "123" "123" "123" "123" "123" "123"
[8] "123" "123"

res[ii+1] ## B

[1] "456" "456" "456" "456" "456" "456" "456"
[8] "456" "456"

res[ii+2] ## C

[1] "789" "789" "789" "789" "789" "789" "789"
[8] "789" "789"

res[ii+3] ## D

[1] "321" "321" "321" "321" "321" "321" "321"
[8] "321" "321"
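
A variation on the indexing above, as a sketch: reshape the unlisted result into a four-column matrix and assign the columns in one go. It assumes every address has exactly four dot-separated parts; the column names IPA to IPD are taken from the original question.

```r
ip <- "123.456.789.321"  ## example data, as above
df <- data.frame(ip = rep(ip, 9), stringsAsFactors = FALSE)

## Assumption: every address splits into exactly four parts
parts <- matrix(unlist(strsplit(df[["ip"]], "\\.")), ncol = 4, byrow = TRUE)

df$IPA <- parts[, 1]
df$IPB <- parts[, 2]
df$IPC <- parts[, 3]
df$IPD <- parts[, 4]
head(df, 3)
```

If some rows might be malformed, checking lengths(strsplit(...)) first would show which ones break the four-part assumption.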


Regards,
Enrico


On 08.01.2012 11:06, Andrew Roberts wrote:

Folks,

I have a data frame with 4861469 rows that contains an ip address
xxx.xxx.xxx.xxx as one of the columns. I want to assign a site to each
row based on IP ranges. To do this I have a function to split the ip
address as character into class A,B,C and D components. It works but is
horribly inefficient in terms of speed. I can't quite see how one of the
l/s/m/t/apply functions could be brought to bear on the problem. Does
anyone have any thoughts?

for(i in 1:4861469)
    {
    lst<-unlist(strsplit(data$ComputerName[i], "\\."))
    data$IPA[i]<-lst[[1]]
    data$IPB[i]<-lst[[2]]
    data$IPC[i]<-lst[[3]]
    data$IPD[i]<-lst[[4]]
    rm(lst)
    }
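
For the stated goal of assigning a site per IP range, one possible vectorized route is to turn each address into a single number and look it up with findInterval(). This is only a sketch under assumptions: the sites table, its column names, the helper ip_to_num and the sample addresses are all hypothetical, and the ranges are assumed to be non-overlapping and sorted by their first address.

```r
## Hypothetical helper: dotted-quad strings to one number per address
ip_to_num <- function(ip) {
  parts <- matrix(as.numeric(unlist(strsplit(ip, ".", fixed = TRUE))),
                  ncol = 4, byrow = TRUE)
  drop(parts %*% 256^(3:0))
}

## Hypothetical site table: non-overlapping ranges, sorted by first address
sites <- data.frame(
  site  = c("SiteA", "SiteB"),
  first = c("10.0.0.0", "192.168.1.0"),
  last  = c("10.0.255.255", "192.168.1.255"),
  stringsAsFactors = FALSE
)
lo <- ip_to_num(sites$first)
hi <- ip_to_num(sites$last)

## Hypothetical sample rows; ComputerName is the column from the question
data <- data.frame(ComputerName = c("10.0.3.4", "192.168.1.20", "172.16.0.1"),
                   stringsAsFactors = FALSE)

num <- ip_to_num(data$ComputerName)
idx <- findInterval(num, lo)   # index of the last range start <= num, 0 if none
idx[idx == 0] <- NA
## Keep the site only when the address is also below the range end
data$site <- ifelse(!is.na(idx) & num <= hi[idx], sites$site[idx], NA)
data
```

Addresses that fall outside every range end up as NA rather than being assigned to the nearest site.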

Andrew

Andrew Roberts
Children's Orthopaedic Surgeon
RJAH, Oswestry, UK


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


--
Enrico Schumann
Lucerne, Switzerland
http://nmof.net/





--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109

Location: M1-B861
Telephone: 206 667-2793
