Here is a start:

> # read the input file
> input <- readLines('/tempxx.txt')
> # process the file starting at each "Book"
> result <- lapply(which(grepl("^Book", input)), function(.line){
+     contents <- NULL  # initialize
+     name <- strsplit(input[.line], '\t')[[1]][2]  # book name
+     # process succeeding lines as long as they are "CD"
+     while (grepl("^CD", input[.line + 1L])){
+         contents <- c(contents, strsplit(input[.line + 1L], '\t')[[1]][3])
+         .line <- .line + 1L
+     }
+     c(bookname = name, contents = paste(contents, collapse = ','))
+ })
>
> do.call(rbind, result)
     bookname             contents
[1,] " bioR "             " chapter5"
[2,] " bioc++ "           " workexamples, experiments"
[3,] " management tools " ""
>
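The result above is a character matrix with untrimmed fields and an empty string where a book has no CD rows, whereas the original request was for a data frame with NA in that case. One possible way to get there is sketched below. It is only a sketch under assumptions: the sample vector `input` is a hypothetical stand-in for the real file, fields are assumed to be separated by single tabs, every CD row is assumed to follow some Book row, and `trimws()` needs R 3.2.0 or later. It groups lines with `cumsum()` rather than a `while` loop, which is a different technique from the one used above.

```r
# Hypothetical sample input mirroring the file in the original post;
# fields are assumed to be separated by single tab characters.
input <- c(
  "Book\tbioR\txxx\tabc publishers\t230",
  "CD\tbiorexamples\tchapter5",
  "Book\tbioc++\tmmm\ttata publishers\t400",
  "CD\tsamples\tworkexamples",
  "CD\tdata\texperiments",
  "Book\tmanagement tools\taaa\tsome publishers\t200"
)

# Split every line into its tab-separated fields
fields  <- strsplit(input, "\t", fixed = TRUE)
is_book <- grepl("^Book", input)
is_cd   <- grepl("^CD", input)

# Each line belongs to the most recent "Book" line at or above it
block <- cumsum(is_book)

# Book name is the 2nd field of a Book line; content is the 3rd field of a CD line
bookname   <- trimws(vapply(fields[is_book], `[`, character(1), 2))
cd_content <- trimws(vapply(fields[is_cd], `[`, character(1), 3))

# Collapse the contents per block; blocks without CD lines stay NA
content   <- rep(NA_character_, length(bookname))
per_block <- tapply(cd_content, block[is_cd], paste, collapse = ", ")
content[as.integer(names(per_block))] <- per_block

books <- data.frame(bookname = bookname, content = content,
                    stringsAsFactors = FALSE)
print(books)
```

With the real file, `input` would instead come from `readLines()` as in the code above; separator and comment lines match neither pattern and are ignored.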
On Wed, Nov 10, 2010 at 5:30 AM, Santosh Srinivas <santosh.srini...@gmail.com> wrote:
> You could use the following to achieve your objective. To start with
>
> ?readLines
> ?strsplit
> ?"for"
> ?ifelse
>
> As you try, you may receive more specific answers for the issues you come up
> with.
>
> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
> Behalf Of karthicklakshman
> Sent: 10 November 2010 15:06
> To: r-help@r-project.org
> Subject: [R] Parsing txt file
>
>
> Hello,
>
> I have a tab-delimited text document with multiple lines, as shown below,
>
>
>
> #FILE FORMAT
> #Book bookname author publisher pages
> #CD name content
> ####################################################################################################
> ----------------------------------------------------------------------
> Book bioR xxx abc publishers 230
> CD biorexamples chapter5
> ----------------------------------------------------------------------
> Book bioc++ mmm tata publishers 400
> CD samples workexamples
> CD data experiments
> ----------------------------------------------------------------------
> Book management tools aaa some publishers 200
> ----------------------------------------------------------------------
>
>
> Here the texts "Book" and "CD" are present in each block.
>
> Now, I am interested in creating a data frame with two columns, column
> names = "bookname" and "content". Using "grep" it is possible to pick specific
> rows (grep("^Book", filename)), but my programming expertise is too limited to
> create the data.frame described.
>
> Note: the row name "Book" is present in all blocks, but "CD" is variable (i.e.,
> some blocks have two CD rows and some have none, as shown above).
>
> Please help me in creating something like this,
>
>
>     bookname          content
> [1] bioR              chapter5
> [2] bioc++            workexamples, experiments
> [3] management tools  NA
>
>
> Thanks in advance,
> karthick
>
> --
> View this message in context:
> http://r.789695.n4.nabble.com/Parsing-txt-file-tp3035749p3035749.html
> Sent from the R help mailing list archive at Nabble.com.
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?