Assuming only START lines match pat:

> ## this one has more fields: how do I generalize the regular expression?
> st2 = c("START text1 1 text2 2.3 text3 5", "whatever intermediate text",
+ "START text1 23.4 text2 3.1415 text3 6")
>
> pat <- "[[:alnum:]]+ +([0-9.]+)"
> s <- strapply(st2, pat, c, simplify = rbind)
>
> pat2 <- "([[:alnum:]]+) +[0-9.]+"
> colnames(s) <- strapply(st2[1], pat2, c, simplify = rbind)
> s
     text1  text2    text3
[1,] "1"    "2.3"    "5"
[2,] "23.4" "3.1415" "6"
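The reply stops at a character matrix, while the question asks for a data.frame. Below is a minimal sketch of the same idea in base R, swapping gsubfn's strapply() for regmatches() and gregexpr() (lengths() needs R >= 3.2.0). It keeps the same assumption that only START lines contain name/value pairs; the object names pair_pat, pairs, nms, vals and d are illustrative only.

```r
# Input from the reply above; only the START lines contain name/value pairs.
st2 <- c("START text1 1 text2 2.3 text3 5", "whatever intermediate text",
         "START text1 23.4 text2 3.1415 text3 6")

# One "name value" pair: a word, one or more spaces, a number.
pair_pat <- "[[:alnum:]]+ +[0-9.]+"

# gregexpr() finds every pair on each line; regmatches() extracts them.
# Lines without any pair come back as character(0) and are dropped.
pairs <- regmatches(st2, gregexpr(pair_pat, st2))
pairs <- pairs[lengths(pairs) > 0]

# Column names come from the first matching line (the question states that
# all START lines share the same fields); each line becomes one numeric row.
nms  <- sub(" +[0-9.]+$", "", pairs[[1]])
vals <- do.call(rbind, lapply(pairs, function(p)
  as.numeric(sub("^[[:alnum:]]+ +", "", p))))

d <- as.data.frame(vals)
names(d) <- nms
d
```

Converting with as.numeric() before building the data.frame gives numeric columns; passing the character matrix straight to data.frame() would keep the values as strings (or factors, under the stringsAsFactors = TRUE default of R before 4.0).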
If there are non-START lines that also match pat, then grep out the START lines first.

On Mon, Oct 26, 2009 at 9:30 AM, baptiste auguie <baptiste.aug...@googlemail.com> wrote:
> Dear list,
>
> I have the following text to parse (originating from readLines, as some
> lines have unequal size),
>
> st = c("START text1 1 text2 2.3", "whatever intermediate text",
>        "START text1 23.4 text2 3.1415")
>
> from which I'd like to extract the lines starting with "START" and
> group the subsequent fields in a data.frame in this format:
>
> text1  text2
> 1      2.3
> 23.4   3.1415
>
> All the lines containing "START" have the same number of fields, but
> this number may vary from file to file.
>
> I have managed to get this minimal example to work, but I am at a loss as
> to how to handle an arbitrary number of (text, value) pairs:
>
> library(gsubfn)
>
> ( parsed =
> strapply(st, "^START +([[:alnum:]]+) +([0-9.]+) +([[:alnum:]]+) +([0-9.]+)",
>          c, simplify=rbind, combine=c) )
>
> d = data.frame(parsed[ ,c(2,4)])
> names(d) <- apply(parsed[ ,c(1,3)], 2, unique)
> d
>
> ## this one has more fields: how do I generalize the regular expression?
> st2 = c("START text1 1 text2 2.3 text3 5", "whatever intermediate text",
>         "START text1 23.4 text2 3.1415 text3 6")
>
> Best regards,
>
> Baptiste
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
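The advice to grep out the START lines first matters as soon as an intermediate line happens to contain a word followed by a number. The sketch below uses a made-up input (st3 and its middle line are hypothetical) to show the difference, with base R's grep(value = TRUE) doing the filtering and regmatches()/gregexpr() standing in for gsubfn's strapply(); pair_pat, all_pairs, start_lines and vals are illustrative names.

```r
# Hypothetical input: the middle line is not a START line, but it still
# contains "word number" pairs ("step 2", "of 3") that the pair pattern matches.
st3 <- c("START text1 1 text2 2.3 text3 5",
         "intermediate step 2 of 3",
         "START text1 23.4 text2 3.1415 text3 6")
pair_pat <- "[[:alnum:]]+ +[0-9.]+"

# Without filtering, the middle line contributes two spurious pairs.
all_pairs <- regmatches(st3, gregexpr(pair_pat, st3))
lengths(all_pairs)  # 3 2 3

# Keep only the START lines first, then extract the numeric values.
start_lines <- grep("^START", st3, value = TRUE)
vals <- do.call(rbind, lapply(
  regmatches(start_lines, gregexpr(pair_pat, start_lines)),
  function(p) as.numeric(sub("^[[:alnum:]]+ +", "", p))))
vals
```

Anchoring the filter on "^START" rather than tightening the pair pattern keeps the extraction regex independent of how many fields each file has.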