Hi,
You may try:
?list.files  # see the help page for options when listing files

nm1 <- list.files(pattern = "\\.txt$")  # anchor the pattern so only .txt files match
res <- lapply(nm1, function(x) {
  ln1 <- readLines(x)
  # positions of the DATE PROCESSED header and of all header lines
  indx1 <- grep("DATE PROCESSED", ln1)
  indx2 <- grep("[A-Z]", ln1)
  # keep everything up to the header that follows DATE PROCESSED, if any
  ln2 <- if (max(indx2) == indx1) ln1 else
    ln1[1:(indx2[match(indx1, indx2) + 1] - 1)]
  ln2 <- ln2[ln2 != ""]  # drop blank lines
  # lines containing a capital letter are treated as headers
  # (this assumes the data lines are entirely lower case, as in your example)
  indx3 <- grepl("[A-Z]", ln2)
  # group consecutive non-header lines under the header above them
  indx4 <- cumsum(c(TRUE, diff(which(!indx3)) > 1))
  mat1 <- do.call(cbind, split(ln2[!indx3], indx4))
  colnames(mat1) <- ln2[indx3][-1]  # headers after the first become column names
  # the first header line (the TITLE) names the output file; you asked for .csv
  write.csv(mat1, paste0(ln2[indx3][1], ".csv"), row.names = FALSE, quote = FALSE)
})
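One caveat: cbind() recycles shorter groups, so a file with two EXAMPLE lines but only one RELATED TITLE line will repeat the shorter column's value. If you would rather pad with blanks, a minimal variant of the matrix-building step (same variable names as above, untested sketch) is:

grp <- split(ln2[!indx3], indx4)
n <- max(sapply(grp, length))
# pad each group to the longest length with NA instead of recycling
mat1 <- do.call(cbind, lapply(grp, function(g) c(g, rep(NA, n - length(g)))))
colnames(mat1) <- ln2[indx3][-1]
write.csv(mat1, paste0(ln2[indx3][1], ".csv"), row.names = FALSE,
  quote = FALSE, na = "")

Passing na = "" to write.csv() keeps the padded cells empty in the output.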
A.K.
I have a number of .txt files (1,200) from which I need to parse
several pieces of information. Each file, when read into R, looks like this:
TITLE
EXAMPLE
example 1
example 2
RELATED TITLE
related title 1
DATE PROCESSED
06/12/2011
Some of the files have examples 1-4, others 1-12 and beyond.
How can I create a script that will grab the information from
the different .txt files, put it in a matrix, and spit it out in a .csv
file with appropriately named columns? (The column titles are in CAPS
above; the information that will go in each column is in lower case.)
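For the sample file above, the .csv I'm after would look something like
this (assuming shorter columns are simply left blank):

EXAMPLE,RELATED TITLE,DATE PROCESSED
example 1,related title 1,06/12/2011
example 2,,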
Thanks in advance.