Is this what you are looking for?

con <- textConnection(" 2010 10 01 00
*82599 -35.25 -5.91 52 1*
1008.0 -9999 115 3.1 298.6 294.6 64
2010 10 01 00
*83649 -40.28 -20.26 4 7
*1011.0 -9999 0 0.0 298.4 296.1 64
1000.0 96 40 5.7 297.9 295.1 32
925.0 782 325 3.1 295.4 294.1 32
850.0 1520 270 4.1 293.8 289.4 32
700.0 3171 240 8.7 284.1 279.1 32
500.0 5890 275 8.2 266.2 262.9 32
400.0 7600 335 9.8 255.4 242.4 32")
input <- readLines(con)
close(con)

# remove the "*" since they seem to be inconsistent,
# then strip leading blanks so strsplit() does not return an empty first field
input <- gsub("\\*", "", input)
input <- sub("^[[:space:]]+", "", input)

cur.date <- NULL     # date/time of the current sounding (yyyymmddhh)
cur.station <- NULL  # ID of the current station
# now parse each line according to its number of fields:
#   4 fields => date header
#   5 fields => station header
#   7 fields => one level of data
result <- lapply(input, function(.line) {
  x <- as.numeric(strsplit(.line, "[[:space:]]+")[[1]])
  if (length(x) == 4) {
    cur.date <<- x[1] * 1000000 + x[2] * 10000 + x[3] * 100 + x[4]
  } else if (length(x) == 5) {
    cur.station <<- x[1]
  } else if (length(x) == 7) {
    return(data.frame(date = cur.date, station = cur.station,
                      x[1], x[2], x[3], x[4], x[5], x[6], x[7]))
  } else {
    cat("invalid line:", .line, "\n")
  }
  NULL
})

# combine into a single data frame (rbind() skips the NULL entries)
do.call(rbind, result)

        date station x.1.  x.2. x.3. x.4.  x.5.  x.6. x.7.
1 2010100100   82599 1008 -9999  115  3.1 298.6 294.6   64
2 2010100100   83649 1011 -9999    0  0.0 298.4 296.1   64
3 2010100100   83649 1000    96   40  5.7 297.9 295.1   32
4 2010100100   83649  925   782  325  3.1 295.4 294.1   32
5 2010100100   83649  850  1520  270  4.1 293.8 289.4   32
6 2010100100   83649  700  3171  240  8.7 284.1 279.1   32
7 2010100100   83649  500  5890  275  8.2 266.2 262.9   32
8 2010100100   83649  400  7600  335  9.8 255.4 242.4   32

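The original question also asked how to keep each station's data separately and how to turn the rows into SQL INSERT statements. Here is a minimal sketch of both steps. It rebuilds a small stand-in for the data frame returned by do.call(rbind, result) above; in practice you would use that result directly. The variable names parsed, by.station, brazil and sql are hypothetical. The INSERT template copies the TEMP/DATA/STATION/VAR1/VAR2 names from Nilza's example, and mapping VAR1 and VAR2 to the first two data columns (x.1. and x.2.) is an assumption.

```r
# Hypothetical stand-in for the data frame returned by do.call(rbind, result)
# (only date, station and the first two data columns are kept here)
parsed <- data.frame(
  date    = c(2010100100, 2010100100, 2010100100),
  station = c(82599, 83649, 83649),
  x.1.    = c(1008, 1011, 1000),
  x.2.    = c(-9999, -9999, 96)
)

# split() returns a named list with one data frame per station ID,
# which plays the role of a separate Fortran array for each station
by.station <- split(parsed, parsed$station)
print(names(by.station))            # [1] "82599" "83649"
print(nrow(by.station[["83649"]]))  # [1] 2

# keep only the Brazilian stations (IDs beginning with "83")
brazil <- parsed[grepl("^83", parsed$station), ]

# one INSERT statement per row; the table/column names come from the
# question, and the VAR1/VAR2 mapping to x.1./x.2. is an assumption
sql <- sprintf("INSERT INTO TEMP (DATA,STATION,VAR1,VAR2) VALUES (%.0f,%.0f,%g,%g)",
               brazil$date, brazil$station, brazil$x.1., brazil$x.2.)
writeLines(sql)
```

Building the statements with sprintf() keeps everything in base R. With a live database connection, a package such as DBI (dbWriteTable()) avoids writing SQL by hand. Note that -9999 appears to be a missing-value code in this file, so you may want to write NULL instead; that is an inference from the data, not something stated in the thread.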
On Mon, Oct 4, 2010 at 9:52 PM, Nilza BARROS <nilzabar...@gmail.com> wrote:
> Sorry, guys,
> I couldn't explain what I really wanted.
> I have a file with many stations and a lot of information for each one.
> I need to identify the line where each station's information starts. After that
> I'd like to store the data related to that station so that it can be
> worked on separately.
>
> If I were using another language such as Fortran, I would save the data in a
> vector.
> But in R I don't know how to do this :(
>
> ====David's Questions===========
>
> my.data<-file("d2010100100.txt",open="rt")
> indata <- readLines(my.data, n=20000)
> i<-grep("^[837]",indata) #station number
>
> That would give you the line numbers for any line that had an 8, _or_ a 3,
> _or_ a 7 as its first digit. Was that your intent? My guess is that you did
> not really want to use the square brackets and should have been using "^837".
> ?regex # Paragraph starting "A character class .... "
> ## In fact I am trying to find the stations in the file. As the
> Brazilian stations start with `83`, I intend to pick them out.
>
> my.data2<-read.table("d2010100100.txt",fill=TRUE,nrows=20000)
> stn<- my.data2$V1[i]
>
> - That would give you the first column values for the lines you earlier
> selected.
> ## It gave me all the stations that started with `873`. I did it just because
> I needed to know how many stations there were in the file. But it is not
> helping me to solve the problem.
> Thanks in advance
> Nilza Barros
> On Sun, Oct 3, 2010 at 11:05 PM, David Winsemius
> <dwinsem...@comcast.net> wrote:
>
>>
>> On Oct 3, 2010, at 9:40 PM, Nilza BARROS wrote:
>>
>> Hi, Michael
>>> Thank you for your help. I have already done what you said.
>>> But I am still having problems dealing with my data.
>>>
>>> I need to split the data according to station.
>>>
>>> I was able to identify where the station information starts using:
>>>
>>> my.data<-file("d2010100100.txt",open="rt")
>>> indata <- readLines(my.data, n=20000)
>>> i<-grep("^[837]",indata) #station number
>>>
>>
>> That would give you the line numbers for any line that had an 8, _or_ a 3,
>> _or_ a 7 as its first digit. Was that your intent? My guess is that you did
>> not really want to use the square brackets and should have been using "^837".
>>
>> ?regex # Paragraph starting "A character class .... "
>>
>>
>>> my.data2<-read.table("d2010100100.txt",fill=TRUE,nrows=20000)
>>> stn<- my.data2$V1[i]
>>>
>>
>> That would give you the first column values for the lines you earlier
>> selected.
>>
>>
>> ====
>>>
>>
>> This does not look like what I would expect as a value for stn. Is that
>> what you wanted us to think this was?
>>
>> --
>> David.
>>
>>
>>
>>> 2010 10 01 00
>>> *82599 -35.25 -5.91 52 1
>>> * 1008.0 -9999 115 3.1 298.6 294.6 64
>>> 2010 10 01 00
>>> *83649 -40.28 -20.26 4 7*
>>> 1011.0 -9999 0 0.0 298.4 296.1 64
>>> 1000.0 96 40 5.7 297.9 295.1 32
>>> 925.0 782 325 3.1 295.4 294.1 32
>>> 850.0 1520 270 4.1 293.8 289.4 32
>>> 700.0 3171 240 8.7 284.1 279.1 32
>>> 500.0 5890 275 8.2 266.2 262.9 32
>>> 400.0 7600 335 9.8 255.4 242.4 32
>>> ===========
>>> As you can see in the data above, the station line shows the number of
>>> levels (or lines) for each station.
>>> I need to catch these lines so that I can feed my database.
>>> By the way, I didn't understand the regular expression you've used. I've
>>> tried to run it but it did not work.
>>>
>>> Hope you can help me!
>>> Best Regards,
>>> Nilza
>>>
>>> On Sun, Oct 3, 2010 at 2:18 AM, Michael Bedward
>>> <michael.bedw...@gmail.com> wrote:
>>>
>>> Hello Nilza,
>>>>
>>>> If your file is small you can read it into a character vector like this:
>>>>
>>>> indata <- readLines("foo.dat")
>>>>
>>>> If your file is very big you can read it in batches like this...
>>>>
>>>> MAXRECS <- 1000 # for example
>>>> fcon <- file("foo.dat", open="r")
>>>> indata <- readLines(fcon, n=MAXRECS)
>>>>
>>>> The number of lines read will be given by length(indata).
>>>>
>>>> You have reached the end of the file when readLines() returns fewer
>>>> than MAXRECS lines (zero once the file is exhausted), i.e. when
>>>> length(indata) < MAXRECS.
>>>>
>>>> If a leading "*" character is a flag for the start of a station data
>>>> block, you can find it in the indata vector with grepl...
>>>>
>>>> start.pos <- which(grepl("^\\s*\\*", indata))
>>>>
>>>> When you're finished reading the file...
>>>> close(fcon)
>>>>
>>>> Hope this helps,
>>>>
>>>> Michael
>>>>
>>>>
>>>> On 3 October 2010 13:31, Nilza BARROS <nilzabar...@gmail.com> wrote:
>>>>
>>>>> Dear R-users,
>>>>>
>>>>> I would like to know how I could read a file with lines of different lengths.
>>>>> I need to read this file and create an output to feed my database.
>>>>> So after reading it I'll need to create an output like this:
>>>>>
>>>>> "INSERT INTO TEMP (DATA,STATION,VAR1,VAR2) VALUES (20100910,837460,39,390)"
>>>>>
>>>>> I mean, each line should be read. But I don't know how to do this when these
>>>>> lines have different lengths.
>>>>>
>>>>> I really appreciate any help.
>>>>>
>>>>> Thanks.
>>>>>
>>>>> ====Below is the file that should be read===========
>>>>>
>>>>> *2010 10 01 00
>>>>> 83746 -43.25 -22.81 6 51*
>>>>> 1012.0 -9999 320 1.5 299.1 294.4 64
>>>>> 1000.0 114 250 4.1 298.4 294.8 32
>>>>> 925.0 797 0 0.0 293.6 292.9 32
>>>>> 850.0 1524 195 3.1 289.6 288.9 32
>>>>> 700.0 3156 290 11.3 280.1 280.1 32
>>>>> 500.0 5870 280 20.1 266.1 260.1 32
>>>>> 400.0 7570 265 23.7 256.6 222.7 32
>>>>> 300.0 9670 265 28.8 240.2 218.2 32
>>>>> 250.0 10920 280 27.3 230.2 220.2 32
>>>>> 200.0 12390 260 32.4 218.7 206.7 32
>>>>> 176.0 -9999 255 37.6 -9999.0 -9999.0 8
>>>>> 150.0 14180 245 35.5 205.1 196.1 32
>>>>> 100.0 16560 300 17.0 195.2 186.2 32
>>>>> *2010 10 01 00
>>>>> 83768 -51.13 -23.33 569 41
>>>>> * 1000.0 79 -9999 -9999.0 -9999.0 -9999.0 32
>>>>> 946.0 -9999 270 1.0 295.8 292.1 64
>>>>> 925.0 763 15 2.1 296.4 290.4 32
>>>>> 850.0 1497 175 3.6 290.8 288.4 32
>>>>> 700.0 3140 295 9.8 282.9 278.6 32
>>>>> 500.0 5840 285 23.7 267.1 232.1 32
>>>>> 400.0 7550 255 35.5 255.4 231.4 32
>>>>> 300.0 9640 265 37.0 242.2 216.2 32
>>>>>
>>>>> Best Regards,
>>>>>
>>>>> --
>>>>> Cheers,
>>>>> Nilza Barros
>>>>>
>>>>
>>>
>>
>> David Winsemius, MD
>> West Hartford, CT
>>
>
> --
> Cheers,
> Nilza Barros
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.

--
Jim Holtman
Cincinnati, OH
+1 513 646 9390

What is the problem that you are trying to solve?