Hello,

I have a tab-delimited text document with multiple lines, as shown below:



#FILE FORMAT
#Book   bookname        author  publisher       pages
#CD     name    content
####################################################################################################
----------------------------------------------------------------------
Book    bioR    xxx     abc publishers  230
CD      biorexamples    chapter5
----------------------------------------------------------------------
Book    bioc++  mmm     tata publishers 400
CD      samples workexamples
CD      data    experiments
----------------------------------------------------------------------
Book    management tools        aaa     some publishers 200
----------------------------------------------------------------------


Here the row labels "Book" and "CD" mark the rows within each block.

Now, I am interested in creating a data frame with two columns, named
"bookname" and "content". Using "grep" it is possible to pick specific
rows (e.g. grep("^Book", readLines(filename))), but my programming
expertise is too limited to create the data frame described above.

Note: the "Book" row is present in every block, but the number of "CD" rows
varies (i.e., some blocks have two and some have none, as shown above).

Please help me create something like this:


    bookname          content
[1] bioR              chapter5
[2] bioc++            workexamples, experiments
[3] management tools  NA
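
For reference, one possible way to build such a data frame can be sketched in base R as below. This is only a sketch under some assumptions: the fields are separated by real tab characters, every block starts with a "Book" row, and the wanted "content" is the third field of each "CD" row. The sample lines are inlined so that the example runs on its own, and the names `rows`, `fields`, `block`, and `result` are made up for illustration.

```r
# Sample input, inlined so the sketch is self-contained. With a real file,
# something like lines <- readLines("books.txt") would be used instead
# (the file name is hypothetical).
lines <- c(
  "#FILE FORMAT",
  "#Book\tbookname\tauthor\tpublisher\tpages",
  "#CD\tname\tcontent",
  "----------------------------------------------------------------------",
  "Book\tbioR\txxx\tabc publishers\t230",
  "CD\tbiorexamples\tchapter5",
  "----------------------------------------------------------------------",
  "Book\tbioc++\tmmm\ttata publishers\t400",
  "CD\tsamples\tworkexamples",
  "CD\tdata\texperiments",
  "----------------------------------------------------------------------",
  "Book\tmanagement tools\taaa\tsome publishers\t200",
  "----------------------------------------------------------------------"
)

# Keep only the data rows; comment lines, separators, and blanks are dropped.
rows <- grep("^(Book|CD)\t", lines, value = TRUE)
fields <- strsplit(rows, "\t", fixed = TRUE)
type <- vapply(fields, `[`, character(1), 1)
is_book <- type == "Book"

# Each "Book" row opens a new block, so cumsum() gives every row a block id.
block <- cumsum(is_book)

# Second field of each Book row is the book name.
bookname <- vapply(fields[is_book], `[`, character(1), 2)

# Third field of each CD row, grouped by block. Using a factor with one
# level per book keeps blocks that have no CD row (as empty groups).
cd_content <- vapply(fields[!is_book], `[`, character(1), 3)
cd_by_block <- split(cd_content,
                     factor(block[!is_book], levels = seq_along(bookname)))

# Join the CD contents of each block; empty groups become NA.
content <- vapply(
  cd_by_block,
  function(x) if (length(x) > 0) paste(x, collapse = ", ") else NA_character_,
  character(1)
)

result <- data.frame(
  bookname = bookname,
  content = unname(content),
  stringsAsFactors = FALSE
)
print(result)
```

On the sample lines above, `result` should have three rows, with "chapter5" for bioR, "workexamples, experiments" for bioc++, and NA for management tools. The `cumsum()` step is what ties each CD row to the Book row that precedes it, so the sketch relies on the row order in the file rather than on the separator lines.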


Thanks in advance,
karthick
 
