Sam:

On Wed, Feb 8, 2012 at 12:56 PM, Sam Steingold <s...@gnu.org> wrote:
> To be clear, I can do that with nested for loops:
>
> v <- c("A1B2","A3C4","B5","C6A7B8")
> l <- strsplit(gsub("(.{2})","\\1,",v),",")
> d <- data.frame(A=vector(length=4,mode="integer"),
>                 B=vector(length=4,mode="integer"),
>                 C=vector(length=4,mode="integer"))
>
> for (i in 1:length(l)) {
>   l1 <- l[[i]]
>   for (j in 1:length(l1)) {
>     d[[substring(l1[j],1,1)]][i] <- as.numeric(substring(l1[j],2,2))
>   }
> }
>
> but I am afraid that handling 1,000,000 (=length(unlist(l))) strings in
> a loop will kill me.
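One loop-free alternative to the nested loops above (a sketch under stated assumptions, not something proposed in this thread) is to split every string into its fixed-width pairs at once and then fill a zero matrix with a single matrix-indexed assignment. The helper name `pairs_to_df` and its `name_width`/`value_width` arguments are hypothetical; the sketch assumes every string is an exact multiple of `name_width + value_width` characters and that all values are numeric.

```r
## Hypothetical helper (not from the thread): convert fixed-width
## <column><value> strings into a data frame without per-element R loops.
## Assumes nchar(v) is a multiple of name_width + value_width.
pairs_to_df <- function(v, name_width = 1, value_width = 1) {
  pair_width <- name_width + value_width
  # Split each string into consecutive fixed-width pairs,
  # e.g. "C6A7B8" -> "C6" "A7" "B8"
  l <- regmatches(v, gregexpr(sprintf(".{%d}", pair_width), v))
  pairs <- unlist(l)
  # Which input string (i.e. output row) each pair came from
  row <- rep(seq_along(l), lengths(l))
  col <- substring(pairs, 1, name_width)
  val <- as.numeric(substring(pairs, name_width + 1, pair_width))
  cols <- sort(unique(col))
  # Start from zeros, then write every value in one vectorized assignment,
  # indexing the matrix by (row, column) pairs
  m <- matrix(0, nrow = length(v), ncol = length(cols),
              dimnames = list(NULL, cols))
  m[cbind(row, match(col, cols))] <- val
  as.data.frame(m)
}

v <- c("A1B2", "A3C4", "B5", "C6A7B8")
pairs_to_df(v)
```

On the toy vector this returns the same A/B/C data frame as the one requested in the original question. For the real data (6-character column names, 2-digit values) the call would be `pairs_to_df(v, name_width = 6, value_width = 2)`.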
Well, that depends on how "dead" you can stand being. Try it with a
1000-entry subvector and see how bad it gets. A few extra minutes of
computing time to save many more minutes of programming time seems a
reasonable tradeoff.

Alternatively, see ?compile (in the compiler package) to compile your
solution into bytecode, which might give a few-fold reduction in time
(or not). The calculation could also be parallelized using the parallel
package, I'm sure.

-- Bert

>
>> * Sam Steingold <f...@tah.bet> [2012-02-08 15:34:38 -0500]:
>>
>> Suppose I have a vector of strings:
>> c("A1B2","A3C4","B5","C6A7B8")
>> [1] "A1B2"   "A3C4"   "B5"     "C6A7B8"
>> where each string is a sequence of <column><value> pairs
>> (fixed width, in this example both value and name are 1 character, in
>> reality the column name is 6 chars and value is 2 digits).
>> I need to convert it to a data frame:
>> data.frame(A=c(1,3,0,7),B=c(2,0,5,8),C=c(0,4,0,6))
>>   A B C
>> 1 1 2 0
>> 2 3 0 4
>> 3 0 5 0
>> 4 7 8 6
>>
>> how do I do that?
>> thanks.
>
> --
> Sam Steingold (http://sds.podval.org/) on Ubuntu 11.10 (oneiric) X 11.0.11004000
> http://palestinefacts.org http://iris.org.il http://camera.org
> http://ffii.org http://www.PetitionOnline.com/tap12009/
> An elephant is a mouse with an operating system.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
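The first two suggestions (time the loop on a 1000-entry subvector, then try byte compilation) can be sketched as below. The wrapper `loop_to_df`, the generator `make_string`, and the random test data are hypothetical; the loop body follows Sam's code but builds its columns from the data instead of hard-coding A, B, C and four rows. `cmpfun()` is the function-level counterpart of `compile()` in the compiler package. Since R 3.4.0 the JIT compiler byte-compiles closures by default, so on a current R the explicit `cmpfun()` step may change little.

```r
library(compiler)  # ships with R; provides cmpfun()

## Sam's nested loop wrapped in a function (hypothetical name) so it can be
## timed and byte-compiled. Columns come from the data rather than being
## hard-coded as A, B, C with 4 rows.
loop_to_df <- function(v) {
  l <- strsplit(gsub("(.{2})", "\\1,", v), ",")
  cols <- sort(unique(substring(unlist(l), 1, 1)))
  d <- as.data.frame(matrix(0, nrow = length(v), ncol = length(cols),
                            dimnames = list(NULL, cols)))
  for (i in seq_along(l)) {
    l1 <- l[[i]]
    for (j in seq_along(l1)) {
      d[[substring(l1[j], 1, 1)]][i] <- as.numeric(substring(l1[j], 2, 2))
    }
  }
  d
}

## Hypothetical test data: 1000 strings of 1-3 distinct columns from A, B, C,
## each followed by a 1-digit value (the real data are wider).
set.seed(1)
make_string <- function() {
  k <- sample(1:3, 1)
  paste0(sample(c("A", "B", "C"), k), sample(0:9, k, replace = TRUE),
         collapse = "")
}
v_small <- replicate(1000, make_string())

## Time the plain and the byte-compiled versions on the same subvector
loop_to_df_c <- cmpfun(loop_to_df)
print(system.time(r1 <- loop_to_df(v_small)))
print(system.time(r2 <- loop_to_df_c(v_small)))
stopifnot(identical(r1, r2))  # compilation must not change the result
```

Scaling the 1000-string timing up to the full data gives only a rough estimate, since the cost of each data frame assignment may grow with the data; timing a second, larger subvector is a cheap way to check.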
--
Bert Gunter
Genentech Nonclinical Biostatistics

Internal Contact Info:
Phone: 467-7374
Website:
http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm