When compute time is important, it often helps to loop over columns instead of over rows (assuming there are fewer columns than rows, the usual case). For example, I put your code into a function f0 and the column-looping version into f1:
f0 <- function(v) {
   n <- length(v)
   d <- data.frame(A=vector(length=n,mode="integer"),
                   B=vector(length=n,mode="integer"),
                   C=vector(length=n,mode="integer"))
   l <- strsplit(gsub("(.{2})","\\1,",v),",")
   for (i in seq_along(l)) {
      l1 <- l[[i]]
      for (j in seq_along(l1)) {
         d[[substring(l1[j],1,1)]][i] <- as.integer(substring(l1[j],2,2))
      }
   }
   d
}

f1 <- function(v) {
   n <- length(v)
   letters <- c("A", "B", "C")
   names(letters) <- letters
   data.frame(lapply(letters, function(letter) {
      retval <- integer(n)
      hasLetter <- grepl(letter, v)
      retval[hasLetter] <- as.integer(
         gsub(sprintf("^.*%s([[:digit:]]+).*$", letter), "\\1", v[hasLetter]))
      retval
   }))
}

I get the following times for a 10,000 long v like yours (and the results are the same):

  > vv <- rep(v, len=10000)
  > system.time(r1 <- f1(vv))
     user  system elapsed
     0.13    0.00    0.14
  > system.time(r0 <- f0(vv))
     user  system elapsed
    10.75    0.19   10.99
  > all.equal(r0, r1)
  [1] TRUE

If I double the length of v, your code takes 53 seconds (5x slower, quadratic behavior?) while mine takes 0.17 (less than linear, suggesting that its time is still dominated by the function call overhead for such small input vectors).

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com

> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
> Behalf Of Sam Steingold
> Sent: Wednesday, February 08, 2012 12:56 PM
> To: r-help@r-project.org
> Subject: Re: [R] "unsparse" a vector
>
> To be clear, I can do that with nested for loops:
>
> v <- c("A1B2","A3C4","B5","C6A7B8")
> l <- strsplit(gsub("(.{2})","\\1,",v),",")
> d <- data.frame(A=vector(length=4,mode="integer"),
>                 B=vector(length=4,mode="integer"),
>                 C=vector(length=4,mode="integer"))
>
> for (i in 1:length(l)) {
>   l1 <- l[[i]]
>   for (j in 1:length(l1)) {
>     d[[substring(l1[j],1,1)]][i] <- as.numeric(substring(l1[j],2,2))
>   }
> }
>
>
> but I am afraid that handling 1,000,000 (=length(unlist(l))) strings in
> a loop will kill me.
>
>
> > * Sam Steingold <f...@tah.bet> [2012-02-08 15:34:38 -0500]:
> >
> > Suppose I have a vector of strings:
> > c("A1B2","A3C4","B5","C6A7B8")
> > [1] "A1B2" "A3C4" "B5" "C6A7B8"
> > where each string is a sequence of <column><value> pairs
> > (fixed width, in this example both value and name are 1 character, in
> > reality the column name is 6 chars and value is 2 digits).
> > I need to convert it to a data frame:
> > data.frame(A=c(1,3,0,7),B=c(2,0,5,8),C=c(0,4,0,6))
> >   A B C
> > 1 1 2 0
> > 2 3 0 4
> > 3 0 5 0
> > 4 7 8 6
> >
> > how do I do that?
> > thanks.
>
> --
> Sam Steingold (http://sds.podval.org/) on Ubuntu 11.10 (oneiric) X
> 11.0.11004000
> http://palestinefacts.org http://iris.org.il http://camera.org
> http://ffii.org http://www.PetitionOnline.com/tap12009/
> An elephant is a mouse with an operating system.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
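P.S. The quoted question says the real data are fixed width (6-character column names, 2-digit values). For that layout, one possible variant avoids both the row loop and the per-column regular expressions by cutting every pair out with substring() and filling a matrix through index pairs. This is only a rough sketch under that fixed-width assumption: the names unsparse, name_width and value_width are made up for illustration and appear nowhere in the thread, the column set is simply taken from the names found in the data, and it has not been timed against f0 or f1.

```r
# Hypothetical helper, not from the thread: expand fixed-width
# <name><value> pairs into one integer column per distinct name.
unsparse <- function(v, name_width = 1L, value_width = 1L) {
  pair_width <- name_width + value_width
  n_pairs <- nchar(v) %/% pair_width       # number of pairs in each string
  row <- rep(seq_along(v), n_pairs)        # row index of every pair
  # 1-based start position of each pair inside its own string
  start <- (sequence(n_pairs) - 1L) * pair_width + 1L
  s <- rep(v, n_pairs)                     # one copy of the string per pair
  name <- substring(s, start, start + name_width - 1L)
  value <- as.integer(substring(s, start + name_width,
                                start + pair_width - 1L))
  cols <- sort(unique(name))
  # Absent pairs stay 0, as in the data frame requested in the question
  m <- matrix(0L, nrow = length(v), ncol = length(cols),
              dimnames = list(NULL, cols))
  m[cbind(row, match(name, cols))] <- value  # fill by (row, column) pairs
  as.data.frame(m)
}

v <- c("A1B2", "A3C4", "B5", "C6A7B8")
unsparse(v)
```

For the example v this gives the same A, B, C columns as the data frame asked for in the question (as integers). For the real layout the call would presumably be unsparse(v, name_width = 6L, value_width = 2L).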