Hi,
You may try:
?list.files   # help page for list.files(), including the 'pattern' argument
nm1 <- list.files(pattern = "\\.txt$")   # anchor the pattern so only .txt files match

res <- lapply(nm1, function(x) {
    ln1 <- readLines(x)
    indx1 <- grep("DATE PROCESSED", ln1)   # line holding the DATE PROCESSED header
    indx2 <- grep("[A-Z]", ln1)            # lines with upper-case letters, i.e. the headers
    # keep everything up to (and including) the value under DATE PROCESSED
    ln2 <- if (max(indx2) == indx1) ln1
           else ln1[1:(indx2[match(indx1, indx2) + 1] - 1)]
    ln2 <- ln2[ln2 != ""]                  # drop blank lines
    indx3 <- grepl("[A-Z]", ln2)           # TRUE on header lines
    # number each run of consecutive value lines so each run groups under one header
    indx4 <- cumsum(c(TRUE, diff(which(!indx3)) > 1))
    # one column per header; cbind recycles any shorter column to the longest length
    mat1 <- do.call(cbind, split(ln2[!indx3], indx4))
    colnames(mat1) <- ln2[indx3][-1]       # all headers except TITLE name the columns
    write.table(mat1, paste0(ln2[indx3][1], ".txt"),
                row.names = FALSE, quote = FALSE, sep = "\t")
})
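
For the sample file quoted below, this writes a file TITLE.txt with the columns EXAMPLE, RELATED TITLE and DATE PROCESSED. If you want actual .csv output as asked, a minimal variant of the last step, assuming the same mat1, would be:

write.csv(mat1, paste0(ln2[indx3][1], ".csv"), row.names = FALSE, quote = FALSE)

Writing .csv also keeps the output files from matching list.files(pattern = "\\.txt$") if the script is run a second time in the same directory.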



A.K.


I have a number of .txt files (1,200) from which I need to parse a 
number of pieces of information.  The files are read into R as follows: 

TITLE 
EXAMPLE 
example 1 
example 2 
RELATED TITLE 
related title 1 
DATE PROCESSED 
06/12/2011 

Some of the files have examples 1-4, others 1-12 and beyond.   

How can I create a script that will grab the information from 
the different .txt files, put it in a matrix, and spit it out in a .csv 
file with appropriately named columns? (The column titles are in CAPS 
above, and the information that will go in each column is in lower case.)

Thanks in advance.

