On 01/08/2012 11:37 AM, jim holtman wrote:
Just a quick followup to the previous post using 4M entries: (20
seconds would seem like a reasonable time for the operation)
ip<- "123.456.789.321" ## example data
df<- data.frame(ip = rep(ip, 4e6), stringsAsFactors=FALSE)
system.time(x<- strsplit(df$ip, '\\.'))
or, if the IP addresses really are repeated multiple times, split only the factor levels:
df <- data.frame(ip=rep(ip, 4e6), stringsAsFactors=TRUE) ## df$ip is a factor
> system.time(x <- local({
+ ip0 <- strsplit(levels(df$ip), "\\.")
+ ip0[match(df$ip, levels(df$ip))]
+ }))
user system elapsed
0.352 0.000 0.352
although the speed-up in this example is the best case, because all 4e6 addresses are identical.
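The same idea can be tried when the column is a plain character vector rather than a factor. The following is only a sketch under that assumption: the addresses in `ips` are made up for illustration, and `unique()` plays the role that `levels()` plays above.

```r
## Hypothetical data: a character vector with many repeated addresses
ips <- rep(c("10.0.0.1", "192.168.1.20", "10.0.0.1"), times = 1000)

## Split each distinct address only once
u <- unique(ips)
parts <- strsplit(u, ".", fixed = TRUE)

## Map every element back to its already-split address
x <- parts[match(ips, u)]
```

How much this saves depends on how many distinct addresses there are; with mostly unique addresses it is unlikely to beat a single strsplit() call on the whole vector.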
Martin
user system elapsed
19.47 0.12 20.86
str(x)
List of 4000000
$ : chr [1:4] "123" "456" "789" "321"
$ : chr [1:4] "123" "456" "789" "321"
$ : chr [1:4] "123" "456" "789" "321"
$ : chr [1:4] "123" "456" "789" "321"
$ : chr [1:4] "123" "456" "789" "321"
$ : chr [1:4] "123" "456" "789" "321"
$ : chr [1:4] "123" "456" "789" "321"
$ : chr [1:4] "123" "456" "789" "321"
$ : chr [1:4] "123" "456" "789" "321"
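Individual parts can then be pulled out of such a list without an explicit loop. A minimal sketch, using a smaller vector than the 4e6 above and assuming every element has four parts (the names `a` and `d` are just for illustration):

```r
ip <- "123.456.789.321"  ## example data, as above
x <- strsplit(rep(ip, 1000), "\\.")  ## 1000 entries only, for illustration

## Extract the first and the fourth part of every list element
a <- vapply(x, `[`, character(1), 1L)
d <- vapply(x, `[`, character(1), 4L)
```

vapply() stops with an error if an element does not yield exactly one string, which gives a cheap sanity check on the input.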
On Sun, Jan 8, 2012 at 8:11 AM, Enrico Schumann<enricoschum...@yahoo.de> wrote:
Hi Andrew,
you can use strsplit for a character vector; you do not have to call it for
every element data$ComputerName[i].
If I understand correctly, something like this may help:
ip<- "123.456.789.321" ## example data
df<- data.frame(ip = rep(ip, 9), stringsAsFactors=FALSE)
df
ip
1 123.456.789.321
2 123.456.789.321
3 123.456.789.321
4 123.456.789.321
5 123.456.789.321
6 123.456.789.321
7 123.456.789.321
8 123.456.789.321
9 123.456.789.321
res<- unlist(strsplit(df[["ip"]], "\\."))
ii<- seq(1, nrow(df)*4, by = 4)
res[ii] ## A
[1] "123" "123" "123" "123" "123" "123" "123"
[8] "123" "123"
res[ii+1] ## B
[1] "456" "456" "456" "456" "456" "456" "456"
[8] "456" "456"
res[ii+2] ## C
[1] "789" "789" "789" "789" "789" "789" "789"
[8] "789" "789"
res[ii+3] ## D
[1] "321" "321" "321" "321" "321" "321" "321"
[8] "321" "321"
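To get the four parts back into the data frame in one step, the unlisted result can be reshaped into a four-column matrix. This is a sketch that assumes every address has exactly four dot-separated parts; the column names IPA to IPD are taken from the original question.

```r
ip <- "123.456.789.321"  ## example data, as above
df <- data.frame(ip = rep(ip, 9), stringsAsFactors = FALSE)

parts <- strsplit(df[["ip"]], "\\.")

## The reshaping below is only valid if every address has four parts
stopifnot(all(lengths(parts) == 4L))

## One row per address, one column per part
m <- matrix(unlist(parts), ncol = 4, byrow = TRUE)
df$IPA <- m[, 1]
df$IPB <- m[, 2]
df$IPC <- m[, 3]
df$IPD <- m[, 4]
```

The new columns are character; as.integer() can be applied to them if numeric range comparisons are needed.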
Regards,
Enrico
On 08.01.2012 11:06, Andrew Roberts wrote:
Folks,
I have a data frame with 4861469 rows that contains an ip address
xxx.xxx.xxx.xxx as one of the columns. I want to assign a site to each
row based on IP ranges. To do this I have a function to split the ip
address as character into class A,B,C and D components. It works but is
horribly inefficient in terms of speed. I can't quite see how one of the
l/s/m/t/apply functions could be brought to bear on the problem. Does
anyone have any thoughts?
for(i in 1:4861469)
{
lst<-unlist(strsplit(data$ComputerName[i], "\\."))
data$IPA[i]<-lst[[1]]
data$IPB[i]<-lst[[2]]
data$IPC[i]<-lst[[3]]
data$IPD[i]<-lst[[4]]
rm(lst)
}
Andrew
Andrew Roberts
Children's Orthopaedic Surgeon
RJAH, Oswestry, UK
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
--
Enrico Schumann
Lucerne, Switzerland
http://nmof.net/
--
Computational Biology
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N. PO Box 19024 Seattle, WA 98109
Location: M1-B861
Telephone: 206 667-2793