On Oct 3, 2010, at 9:40 PM, Nilza BARROS wrote:
Hi, Michael
Thank you for your help. I have already done what you suggested.
But I am still having trouble dealing with my data.
I need to split the data by station.
I was able to identify where the station information starts using:
my.data<-file("d2010100100.txt",open="rt")
indata <- readLines(my.data, n=20000)
i<-grep("^[837]",indata) #station number
That would give you the line numbers for any line that had an 8, _or_
a 3, _or_ a 7 as its first digit. Was that your intent? My guess is
that you did not really want to use the square brackets and should have
been using "^837".
?regex # Paragraph starting "A character class .... "
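A small check of that point, using a few lines copied from the data in this thread (the vector `x` is only illustrative input). Inside square brackets, `[837]` is a character class that matches exactly one character, so `^[837]` also catches pressure-level lines such as "850.0 ..." and "700.0 ...". The literal prefix `^837` does not, though it would also miss stations such as 83649 whose numbers do not begin with 837.

```r
# A few lines copied from the data in this thread (illustrative input)
x <- c("83649 -40.28 -20.26 4 7",
       "850.0 1520 270 4.1 293.8 289.4 32",
       "700.0 3171 240 8.7 284.1 279.1 32",
       "83746 -43.25 -22.81 6 51")

# "^[837]": the first character is any ONE of 8, 3 or 7,
# so the pressure-level lines 850.0 and 700.0 match as well
grep("^[837]", x)   # 1 2 3 4

# "^837": the line starts with the literal text "837"
grep("^837", x)     # 4
```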
my.data2<-read.table("d2010100100.txt",fill=TRUE,nrows=20000)
stn<- my.data2$V1[i]
That would give you the first column values for the lines you earlier
selected.
====
This does not look like what I would expect as a value for stn. Is
that what you wanted us to think this was?
--
David.
2010 10 01 00
*82599 -35.25 -5.91 52 1
* 1008.0 -9999 115 3.1 298.6 294.6 64
2010 10 01 00
*83649 -40.28 -20.26 4 7*
1011.0 -9999 0 0.0 298.4 296.1 64
1000.0 96 40 5.7 297.9 295.1 32
925.0 782 325 3.1 295.4 294.1 32
850.0 1520 270 4.1 293.8 289.4 32
700.0 3171 240 8.7 284.1 279.1 32
500.0 5890 275 8.2 266.2 262.9 32
400.0 7600 335 9.8 255.4 242.4 32
===========
As you can see in the data above, the last value on each station line gives
the number of levels (or lines) for that station.
I need to capture these lines so that I can feed my database.
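One way to use that level count, sketched under stated assumptions: each block is taken to be a date line, then a station line whose first field is the station number and whose last field is the number of levels, then exactly that many level lines with seven fields each. `parse_soundings()` and `demo_lines` are hypothetical names, and the demo input is the two blocks shown above with the email's bold asterisks removed. If the last field really is the level count, the 83746 block further down (last field 51, only 13 level lines shown) is incomplete, so the function stops with an error on a short block instead of guessing.

```r
# Sketch: split lines into station blocks using the level count.
# Assumed layout of each block (an assumption, not a documented format):
#   1. date line     "YYYY MM DD HH"
#   2. station line  first field = station number, last field = number of levels
#   3. that many level lines, 7 numeric fields each
parse_soundings <- function(lines) {
  lines <- trimws(lines)
  lines <- lines[nzchar(lines)]                 # drop blank lines
  out <- list()
  pos <- 1
  while (pos <= length(lines)) {
    date <- as.integer(strsplit(lines[pos], "\\s+")[[1]])
    stn  <- strsplit(lines[pos + 1], "\\s+")[[1]]
    nlev <- as.integer(stn[length(stn)])
    last <- pos + 1 + nlev                      # index of the block's last level line
    if (is.na(nlev) || nlev < 1 || last > length(lines))
      stop("incomplete block for station ", stn[1])
    lev <- read.table(text = lines[(pos + 2):last])   # columns V1..V7
    lev$date    <- sprintf("%04d%02d%02d%02d", date[1], date[2], date[3], date[4])
    lev$station <- stn[1]
    out[[length(out) + 1]] <- lev
    pos <- last + 1                             # next date line
  }
  do.call(rbind, out)
}

demo_lines <- c("2010 10 01 00",
                "82599 -35.25 -5.91 52 1",
                "1008.0 -9999 115 3.1 298.6 294.6 64",
                "2010 10 01 00",
                "83649 -40.28 -20.26 4 7",
                "1011.0 -9999 0 0.0 298.4 296.1 64",
                "1000.0 96 40 5.7 297.9 295.1 32",
                "925.0 782 325 3.1 295.4 294.1 32",
                "850.0 1520 270 4.1 293.8 289.4 32",
                "700.0 3171 240 8.7 284.1 279.1 32",
                "500.0 5890 275 8.2 266.2 262.9 32",
                "400.0 7600 335 9.8 255.4 242.4 32")
res <- parse_soundings(demo_lines)
nrow(res)   # 8
```

On the real file this would be called as `parse_soundings(readLines("d2010100100.txt"))`, giving one data frame row per level with the date and station number attached.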
By the way, I didn't understand the regular expression you used. I
tried to run it, but it did not work.
Hope you can help me!
Best Regards,
Nilza
On Sun, Oct 3, 2010 at 2:18 AM, Michael Bedward
<michael.bedw...@gmail.com> wrote:
Hello Nilza,
If your file is small, you can read it into a character vector like
this:
indata <- readLines("foo.dat")
If your file is very big, you can read it in batches like this:
MAXRECS <- 1000 # for example
fcon <- file("foo.dat", open="r")
indata <- readLines(fcon, n=MAXRECS)
The number of lines read will be given by length(indata).
You can check to see if the end of the file has been read yet with:
isIncomplete( fcon )
If a leading "*" character is a flag for the start of a station data
block, you can find those lines in the indata vector with grepl:
start.pos <- which(grepl("^\\s*\\*", indata))
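A short illustration of how that pattern and `which()` behave, on a made-up `indata` vector. `which()` takes a single logical vector and returns the positions that are TRUE, so `which(grepl(pat, x))` gives the same result as `grep(pat, x)`, which returns positions directly. Read piece by piece, the pattern anchors at the start of the line, allows any leading whitespace, and then requires a literal asterisk; the backslashes are doubled because the pattern sits inside an R string.

```r
indata <- c("*2010 10 01 00",
            "83746 -43.25 -22.81 6 51",
            "  * 1000.0 79 -9999")
# ^    start of the line
# \\s* zero or more whitespace characters
# \\*  a literal "*" (a bare * would mean "repeat the previous item")
pat <- "^\\s*\\*"
which(grepl(pat, indata))  # 1 3
grep(pat, indata)          # 1 3, the same positions
```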
When you're finished reading the file...
close(fcon)
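A sketch of the full batch loop, assuming the file is read through a blocking file connection as above. Instead of calling isIncomplete(), this version treats an empty result from readLines() as the end of the file. A temporary file stands in for "foo.dat" so the example runs on its own, and MAXRECS = 1000 is the example value from the message.

```r
# Temporary stand-in for "foo.dat" so the sketch is self-contained
fname <- tempfile(fileext = ".dat")
writeLines(sprintf("line %d", 1:2500), fname)

MAXRECS <- 1000
fcon <- file(fname, open = "r")
total <- 0
repeat {
  indata <- readLines(fcon, n = MAXRECS)   # continues where the last call stopped
  if (length(indata) == 0) break           # nothing left: end of file
  total <- total + length(indata)          # process this chunk here
}
close(fcon)
total   # 2500
```

With station blocks, one block can straddle two chunks, so in practice you would keep the unfinished tail of each chunk and prepend it to the next chunk before parsing.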
Hope this helps,
Michael
On 3 October 2010 13:31, Nilza BARROS <nilzabar...@gmail.com> wrote:
Dear R-users,
I would like to know how I could read a file whose lines have
different lengths.
I need to read this file and create output to feed my database.
So after reading it, I'll need to create output like this:
"INSERT INTO TEMP (DATA,STATION,VAR1,VAR2) VALUES (20100910,837460,39,390)"
I mean, each line should be read, but I don't know how to do this when
the lines have different lengths.
I would really appreciate any help.
Thanks.
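The INSERT statements themselves can be built with sprintf(), which is vectorised over its arguments, so a single call gives one statement per row. The question does not say which file columns become VAR1 and VAR2, so the data frame `lev` below is hypothetical; the sketch also assumes -9999 is a missing-value code and writes it as NULL.

```r
# Hypothetical parsed rows: the mapping of file columns to VAR1/VAR2 is
# an assumption, as is treating -9999 as a missing-value code.
lev <- data.frame(date    = "20101001",
                  station = "83746",
                  var1    = c(1012.0, 1000.0),
                  var2    = c(-9999, 114))

# Write -9999 as SQL NULL, everything else as plain text
to_sql <- function(v) ifelse(v == -9999, "NULL", as.character(v))

# sprintf() is vectorised: one INSERT statement per row of lev
sql <- sprintf("INSERT INTO TEMP (DATA,STATION,VAR1,VAR2) VALUES (%s,%s,%s,%s)",
               lev$date, lev$station, to_sql(lev$var1), to_sql(lev$var2))
writeLines(sql)
```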
==== Below is the file that should be read ===========
*2010 10 01 00
83746 -43.25 -22.81 6 51*
1012.0 -9999 320 1.5 299.1 294.4 64
1000.0 114 250 4.1 298.4 294.8 32
925.0 797 0 0.0 293.6 292.9 32
850.0 1524 195 3.1 289.6 288.9 32
700.0 3156 290 11.3 280.1 280.1 32
500.0 5870 280 20.1 266.1 260.1 32
400.0 7570 265 23.7 256.6 222.7 32
300.0 9670 265 28.8 240.2 218.2 32
250.0 10920 280 27.3 230.2 220.2 32
200.0 12390 260 32.4 218.7 206.7 32
176.0 -9999 255 37.6 -9999.0 -9999.0 8
150.0 14180 245 35.5 205.1 196.1 32
100.0 16560 300 17.0 195.2 186.2 32
*2010 10 01 00
83768 -51.13 -23.33 569 41
* 1000.0 79 -9999 -9999.0 -9999.0 -9999.0 32
946.0 -9999 270 1.0 295.8 292.1 64
925.0 763 15 2.1 296.4 290.4 32
850.0 1497 175 3.6 290.8 288.4 32
700.0 3140 295 9.8 282.9 278.6 32
500.0 5840 285 23.7 267.1 232.1 32
400.0 7550 255 35.5 255.4 231.4 32
300.0 9640 265 37.0 242.2 216.2 32
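Another way to tell the line types apart in the file above, without regular expressions, is to count whitespace-separated fields. In this excerpt the date lines have 4 fields, the station lines 5 and the level lines 7; the sketch assumes that holds for the whole file. `cumsum()` over the station lines then gives a block number that `split()` can use to group the level lines by station. The vector `x` is a few lines copied from above with the bold asterisks removed.

```r
x <- c("2010 10 01 00",
       "83746 -43.25 -22.81 6 51",
       "1012.0 -9999 320 1.5 299.1 294.4 64",
       "1000.0 114 250 4.1 298.4 294.8 32",
       "2010 10 01 00",
       "83768 -51.13 -23.33 569 41",
       "1000.0 79 -9999 -9999.0 -9999.0 -9999.0 32")

# Whitespace-separated fields on each line, and how many there are
fields <- strsplit(trimws(x), "\\s+")
nf <- lengths(fields)   # 4 5 7 7 4 5 7

# Map field counts to line types (assumed layout; other counts become NA)
type <- unname(c("4" = "date", "5" = "station", "7" = "level")[as.character(nf)])

# Block number: increases by one at every station line
block <- cumsum(type == "station")
is_level <- type == "level"

# Station number = first field of each station line
stations <- sapply(fields[type == "station"], `[`, 1)

# Level lines grouped by station number
by_station <- split(x[is_level],
                    factor(stations[block[is_level]], levels = unique(stations)))
```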
Best Regards,
--
Best wishes,
Nilza Barros
David Winsemius, MD
West Hartford, CT
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.