Here is a way to process the file. You will have to add the loop, error checking, piecing multiple files together, and determination of the end of the data:
> x <- "I give below a sample of the kind of the information in the text file :
+ ########
+ #(a lot of preceding text)
+ 2008-10-01 06:30:12 2 of 3
+ page
+
+ #(some lines of text - varies from file to file)
+ sekvens 890
+ # lines of text
+ sNo start stop direction value
+ 1 70 85 up 60.2
+ 3 60 90 down 71.5
+ #########
+
+ In each of the files that I choose, I want to first go to the appropriate page number. This is the first line in the above text and the page number is 2 (from 2 of 3). The date and time preceding the page number vary from file to file, but the next line always has the word, page.
+ After that, I am interested in the number following the word, sekvens. Also, the table underneath."
> input <- readLines(textConnection(x))
> closeAllConnections()
> # find 'page'
> pageNo <- grep("^page", input)
> # back up one line and look for "2 of"
> page2 <- grep("2 of ", input[pageNo - 1])
> # compute the start of the data and delete the preceding data
> startData <- pageNo[page2]
> input <- tail(input, -startData)
> # find 'sekvens'
> sek.indx <- grep("^sekvens", input)
> # extract the number after it
> sek.value <- sub(".*?(\\d+).*", "\\1", input[sek.indx], perl=TRUE)
> # find start of table
> sNo.indx <- grep("sNo", input)
> # read the data (you did not say how to determine the end, so I will read the
> # three lines)
> values <- read.table(textConnection(input[sNo.indx + (0:2)]), header=TRUE)
> closeAllConnections()
> sek.value
[1] "890"
> values
  sNo start stop direction value
1   1    70   85        up  60.2
2   3    60   90      down  71.5

On Thu, Nov 20, 2008 at 5:18 AM, ravi <[EMAIL PROTECTED]> wrote:
> Hi,
> I want to extract information from a number of text files in a folder. The
> files are named as : 82534.txt, 82555.txt, 8282787.txt etc.
>
> I give below a sample of the kind of the information in the text file :
> ########
> #(a lot of preceding text)
> 2008-10-01 06:30:12 2 of 3
> page
>
> #(some lines of text - varies from file to file)
> sekvens 890
> # lines of text
> sNo start stop direction value
> 1 70 85 up 60.2
> 3 60 90 down 71.5
> #########
>
> In each of the files that I choose, I want to first go to the appropriate
> page number. This is the first line in the above text and the page number is
> 2 (from 2 of 3). The date and time preceding the page number vary from file
> to file, but the next line always has the word, page.
> After that, I am interested in the number following the word, sekvens. Also,
> the table underneath.
>
> Finally, I want to collect all the data in a data frame with the following
> structure :
>
> fileno  sekvens  sNo  start  stop  direction  value
> 82534   890      1    70     85    up         60.2
> 82534   890      3    60     90    down       71.5
> 82555   ..       ..   ..     ..    ..         ..
>
> There are a number of topics involved here where I have almost no
> familiarity. First, the use of regular expressions to specify the files that
> I want from a folder. Next, how do I locate a particular section (or page) in
> the text file from the description that I am interested in? Should these
> files be read in their entirety first, or is it possible to directly go to the
> section with the relevant text? Next, how do I extract the data in the form
> that I want?
>
> I have identified the following commands that would be useful for me here :
> list.files(), readLines(), strsplit().
> I would appreciate some help in getting started here. I would certainly
> benefit from a few hints. I would also appreciate it if I could get some
> links to references with examples showing how similar problems are tackled.
> Thanking you,
> Ravi
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

-- 
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?
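
The pieces the reply leaves open (the loop over files, basic error checking, finding the end of the table, and combining the results) could be sketched roughly as below. This is only a sketch under stated assumptions, not the method from the thread: parse_page() and read_folder() are hypothetical helper names, the files are assumed to be named with digits only followed by ".txt", and the table is assumed to end at the first line that does not begin with a digit.

```r
# Hypothetical helper: pull 'sekvens' and the table from one page of one file.
# Returns NULL when the page, the 'sekvens' line, or the table cannot be found.
parse_page <- function(lines, page = 2) {
  # lines starting with 'page' whose preceding line contains "<page> of "
  page.indx <- grep("^page", lines)
  page.indx <- page.indx[page.indx > 1]
  wanted <- paste0("(^|[^0-9])", page, " of ")
  hit <- page.indx[grepl(wanted, lines[page.indx - 1])]
  if (length(hit) == 0) return(NULL)
  lines <- tail(lines, -hit[1])          # drop everything up to the 'page' line

  sek.indx <- grep("^sekvens", lines)
  sNo.indx <- grep("^sNo", lines)
  if (length(sek.indx) == 0 || length(sNo.indx) == 0) return(NULL)
  sek.indx <- sek.indx[1]
  sNo.indx <- sNo.indx[1]
  sek.value <- as.numeric(sub(".*?(\\d+).*", "\\1", lines[sek.indx], perl = TRUE))

  # ASSUMPTION: table rows start with a digit; the first other line ends the table
  after <- lines[-seq_len(sNo.indx)]
  is.row <- grepl("^[[:space:]]*[0-9]", after)
  n.rows <- if (all(is.row)) length(after) else which(!is.row)[1] - 1
  if (n.rows == 0) return(NULL)

  values <- read.table(text = lines[sNo.indx + 0:n.rows], header = TRUE)
  data.frame(sekvens = sek.value, values)
}

# Hypothetical helper: apply parse_page() to every matching file in a folder
# and stack the results, adding the file number as the first column.
read_folder <- function(dir = ".", page = 2) {
  # ASSUMPTION: file names are digits only, followed by ".txt"
  files <- list.files(dir, pattern = "^[0-9]+\\.txt$", full.names = TRUE)
  pieces <- lapply(files, function(f) {
    res <- parse_page(readLines(f), page = page)
    if (is.null(res)) return(NULL)       # skip files without the wanted page
    data.frame(fileno = sub("\\.txt$", "", basename(f)), res)
  })
  pieces <- Filter(Negate(is.null), pieces)
  do.call(rbind, pieces)
}
```

Calling read_folder("path/to/folder") should then give one data frame with the columns fileno, sekvens, sNo, start, stop, direction and value. Files in which the page or the table is not found are silently skipped here; you may prefer a warning() instead. Each file is read in full with readLines(), which is usually fine for files of modest size.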