On Thu, 12 Aug 2010, Tim Gruene wrote:
I don't know if it's elegant enough for you, but you could split the file into
two files with 'grep "^3" file > file_3' and 'grep "^4" file > file_4'
and then read them in separately.
Along the same lines, but all in R (untested):
original.lines <- readLines( filename )
# lines starting with "3": filter, then read through a text connection
tcon.3 <- textConnection( grep( "^3", original.lines, value=TRUE ))
res.3 <- read.fwf( tcon.3, <etc> )
close(tcon.3)
# lines starting with "4": same procedure, different layout
tcon.4 <- textConnection( grep( "^4", original.lines, value=TRUE ))
res.4 <- read.fwf( tcon.4, <etc> )
close(tcon.4)
rm( original.lines )
Or skip the readLines() step and use
tcon.3 <- pipe(paste("grep '^3'",filename))
...
I think you can use 'findstr.exe' on Windows in lieu of grep.
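A slightly fuller sketch of the textConnection() idea, for anyone who wants something that runs as-is. The sample lines, the field widths, the column names and the helper read.fwf.subset() are all made up for illustration; substitute the real layout of each record type.

```r
# Hypothetical sample: two record types mixed in one fixed-width file.
sample.lines <- c("3A00101207",
                  "4B002345",
                  "3A00300409",
                  "4B004678")
filename <- tempfile(fileext = ".txt")
writeLines(sample.lines, filename)

# Hypothetical helper: run read.fwf() on only the lines matching `pattern`.
read.fwf.subset <- function(lines, pattern, widths, col.names) {
  tcon <- textConnection(grep(pattern, lines, value = TRUE))
  on.exit(close(tcon))  # close the connection even if read.fwf() fails
  read.fwf(tcon, widths = widths, col.names = col.names,
           stringsAsFactors = FALSE)
}

original.lines <- readLines(filename)

# Assumed layouts: the widths and names below are placeholders.
res.3 <- read.fwf.subset(original.lines, "^3", c(1, 4, 3, 2),
                         c("type", "id", "count", "code"))
res.4 <- read.fwf.subset(original.lines, "^4", c(1, 4, 3),
                         c("type", "id", "value"))
```

The helper closes its connection through on.exit(), so an error inside read.fwf() should not leave a text connection open.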
HTH,
Chuck
Tim
On Thu, Aug 12, 2010 at 01:57:19PM -0400, Denis Chabot wrote:
Hi,
I know how to read fixed width format data with read.fwf, but suddenly I need to read in a large
number of old fwf files with 2 types of lines. Lines that begin with "3" in the first column
carry one set of variables, and lines that begin with "4" carry another set, like this:
3A00206546L070049016090045 99 1015002 001001008010004002004007003 001
3A00206546L070049006090030 99 1029001002001001006014002
3A00206546L070049002290004 99 1015 001001
3A00206546L070049001692559049033 1015 018036024
3A00206546L070049002290004 99 1001 002
4A00176546L068047090010111000606516400150010000001501063 065914
4A00176546L06804709001011100040761600000000 1092 095614
4A00196546L098000100010111001706214400005010000000051062 065914
4A00176546L06804709001011100050591300000000 1062 065914
4A00196546L098000100010111002604721400020010000000201042 046114
4A00196546L098000100010111002504221400005012000000051042 046114
4A00196546L098000100010111002903721400050012200000501032 036214
I have searched for tricks to do this but I must not have used the right
keywords; I found nothing.
I suppose I could read the entire file as a single character variable for each
line, then subset for lines that begin with 3 and save this in an ascii file
that will then be reopened with a read.fwf call, and do the same with lines
that begin with 4. But this does not seem very elegant or efficient to me.
Is there a better method?
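If the intermediate ASCII files are the objection, the same idea can stay entirely in memory. The sketch below does not use read.fwf() at all: it groups the lines by their first character with split() and cuts the fields with substring(), which may be worth trying when there are many files to process. The layouts list, its widths and column names, the sample lines and the function parse.fixed() are hypothetical.

```r
# Hypothetical layouts: named field widths per record type (first character).
layouts <- list(
  "3" = c(type = 1, id = 4, count = 3, code = 2),
  "4" = c(type = 1, id = 4, value = 3)
)

# Cut each line into fields with substring(), one column per width.
parse.fixed <- function(lines, widths) {
  ends   <- cumsum(widths)
  starts <- ends - widths + 1
  fields <- lapply(seq_along(widths), function(i)
    substring(lines, starts[i], ends[i]))
  names(fields) <- names(widths)
  # type.convert() turns all-numeric columns into numbers, leaves text alone
  as.data.frame(lapply(fields, type.convert, as.is = TRUE),
                stringsAsFactors = FALSE)
}

# Made-up lines standing in for readLines(filename)
original.lines <- c("3A00101207", "4B002345", "3A00300409", "4B004678")

# Group the lines by record type, then parse each group with its own layout
by.type <- split(original.lines, substr(original.lines, 1, 1))
res <- lapply(names(by.type), function(k)
  parse.fixed(by.type[[k]], layouts[[k]]))
names(res) <- names(by.type)
```

Here res[["3"]] and res[["4"]] are the two data frames. A record type with no entry in layouts would need its own handling, which this sketch leaves out.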
Thanks in advance,
Denis Chabot
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
--
--
Tim Gruene
Institut fuer anorganische Chemie
Tammannstr. 4
D-37077 Goettingen
GPG Key ID = A46BEE1A
Charles C. Berry (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:cbe...@tajo.ucsd.edu UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/ La Jolla, San Diego 92093-0901