Just a quick follow-up to the previous post, using 4M entries (20 seconds seems like a reasonable time for the operation):
> ip <- "123.456.789.321" ## example data
> df <- data.frame(ip = rep(ip, 4e6), stringsAsFactors=FALSE)
> system.time(x <- strsplit(df$ip, '\\.'))
   user  system elapsed
  19.47    0.12   20.86
> str(x)
List of 4000000
 $ : chr [1:4] "123" "456" "789" "321"
 $ : chr [1:4] "123" "456" "789" "321"
 $ : chr [1:4] "123" "456" "789" "321"
 $ : chr [1:4] "123" "456" "789" "321"
 $ : chr [1:4] "123" "456" "789" "321"
 $ : chr [1:4] "123" "456" "789" "321"
 $ : chr [1:4] "123" "456" "789" "321"
 $ : chr [1:4] "123" "456" "789" "321"
 $ : chr [1:4] "123" "456" "789" "321"

On Sun, Jan 8, 2012 at 8:11 AM, Enrico Schumann <enricoschum...@yahoo.de> wrote:
>
> Hi Andrew,
>
> you can use strsplit for a character vector; you do not have to call it for
> every element data$ComputerName[i].
>
> If I understand correctly, maybe something like this helps
>
>> ip <- "123.456.789.321" ## example data
>> df <- data.frame(ip = rep(ip, 9), stringsAsFactors=FALSE)
>> df
>                ip
> 1 123.456.789.321
> 2 123.456.789.321
> 3 123.456.789.321
> 4 123.456.789.321
> 5 123.456.789.321
> 6 123.456.789.321
> 7 123.456.789.321
> 8 123.456.789.321
> 9 123.456.789.321
>
>>
>> res <- unlist(strsplit(df[["ip"]], "\\."))
>> ii <- seq(1, nrow(df)*4, by = 4)
>> res[ii]   ## A
> [1] "123" "123" "123" "123" "123" "123" "123"
> [8] "123" "123"
>> res[ii+1] ## B
> [1] "456" "456" "456" "456" "456" "456" "456"
> [8] "456" "456"
>> res[ii+2] ## C
> [1] "789" "789" "789" "789" "789" "789" "789"
> [8] "789" "789"
>> res[ii+3] ## D
> [1] "321" "321" "321" "321" "321" "321" "321"
> [8] "321" "321"
>
>
> Regards,
> Enrico
>
>
> On 08.01.2012 11:06, Andrew Roberts wrote:
>
>> Folks,
>>
>> I have a data frame with 4861469 rows that contains an ip address
>> xxx.xxx.xxx.xxx as one of the columns. I want to assign a site to each
>> row based on IP ranges. To do this I have a function to split the ip
>> address as character into class A,B,C and D components. It works but is
>> horribly inefficient in terms of speed.
>> I can't quite see how one of the
>> l/s/m/t/apply functions could be brought to bear on the problem. Does
>> anyone have any thoughts?
>>
>> for(i in 1:4861469)
>> {
>>    lst<-unlist(strsplit(data$ComputerName[i], "\\."))
>>    data$IPA[i]<-lst[[1]]
>>    data$IPB[i]<-lst[[2]]
>>    data$IPC[i]<-lst[[3]]
>>    data$IPD[i]<-lst[[4]]
>>    rm(lst)
>> }
>>
>> Andrew
>>
>> Andrew Roberts
>> Children's Orthopaedic Surgeon
>> RJAH, Oswestry, UK
>>
>> ______________________________________________
>> R-help@r-project.org mailing list
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide
>> http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>>
>
> --
> Enrico Schumann
> Lucerne, Switzerland
> http://nmof.net/

--
Jim Holtman
Data Munger Guru

What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
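For reference, here is a minimal sketch of a fully vectorised replacement for the row-by-row loop in Andrew's original post. It builds on Enrico's point that strsplit can be called once on the whole column. It assumes every entry in data$ComputerName is a dotted quad with exactly four numeric parts; split_ip() and ip_key are illustrative names, not from the thread. It passes fixed = TRUE so the dot is matched literally instead of as the regular expression "\\.", and it stores the components as integers rather than characters so they can be compared numerically. Whether this is noticeably faster on ~4.8M rows is worth checking with system.time() on the real data.

```r
# Hypothetical helper: split dotted-quad strings into four integer columns
# with a single vectorised strsplit() call instead of a per-row loop.
# Assumes every entry has exactly four dot-separated numeric parts.
split_ip <- function(ip) {
  parts <- strsplit(ip, ".", fixed = TRUE)  # literal ".", no regex
  if (any(lengths(parts) != 4L)) {
    stop("some addresses do not have exactly four components")
  }
  # unlist() concatenates the parts address by address, so fill by row
  m <- matrix(as.integer(unlist(parts, use.names = FALSE)),
              ncol = 4L, byrow = TRUE)
  colnames(m) <- c("IPA", "IPB", "IPC", "IPD")
  m
}

# Small illustrative data set (the real one has ~4.8M rows)
data <- data.frame(ComputerName = c("10.1.2.3", "192.168.0.17", "172.16.5.200"),
                   stringsAsFactors = FALSE)

octets <- split_ip(data$ComputerName)
data <- cbind(data, octets)   # adds columns IPA, IPB, IPC, IPD

# Optional single numeric key per address, useful for range comparisons.
# %*% returns doubles, so keys for first octets >= 128 (which exceed
# R's 32-bit integer range) are still represented exactly.
ip_key <- drop(octets %*% c(256^3, 256^2, 256, 1))
print(data)
print(ip_key)
```

Because the key preserves address order, each IP range becomes a pair of numeric bounds. One possible next step (an assumption, not something proposed in the thread) is to match keys against sorted range start values with findInterval().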