Ingo,
The awk solution may well be your preferred route, but here's an R way to do it.
It's based on adding a copy of the longest row (in terms of number of fields) at
the top of the file, so that R knows that you need that many fields. (read.table
and friends look at the first 5 lines of input to determine how many columns are
needed.)
## check how many fields in each row
cf <- count.fields("test.dat")
## which row has most fields?
id <- which.max(cf)
## read file as character string rows
dL <- readLines("test.dat")
## put copy of 'longest' row on top and write back
dL <- c(dL[id], dL)
writeLines(dL, "test1.dat")
## read as dataframe
d <- read.delim("test1.dat", header=FALSE)
## remove top row
d <- d[-1,]
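An alternative that avoids the temporary file is to tell read.delim up front how
many columns to expect via col.names. A rough sketch along the same lines (not
tested on your actual file):
## count fields per row, then declare that many column names
cf <- count.fields("test.dat", sep = "\t")
d <- read.delim("test.dat", header = FALSE,
                col.names = paste("V", seq_len(max(cf)), sep = ""))
read.delim's default fill = TRUE then pads the shorter rows.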
Peter Ehlers
On 2011-02-18 00:16, Ingo Reinhold wrote:
Hi John,
Seems there is no easy way. I'll just precondition it with AWK as described here:
http://www.mail-archive.com/r-help@stat.math.ethz.ch/msg53401.html
There are some remarks in that thread that R is not supposed to read overly large files for
"political" reasons. Maybe that's it.
Many thanks again for the effort.
Ingo
________________________________________
From: John Kane [jrkrid...@yahoo.ca]
Sent: Thursday, February 17, 2011 11:54 AM
To: Ingo Reinhold
Subject: RE: [R] Variable length datafile import problem
Generally most of the gurus are on this list. Hopefully someone will take an
interest in the problem.
I suspect that there may be some kind of weird value in the file that is
upsetting the import. Given the results I got when I removed the data past BD
and then at AL, it seems that the problem might be within this range.
You could try removing half the data between those columns and see what
happens, then repeat if something turns up. It's tedious, but unless someone
with a better grasp of variable-length data import can help, it's the best I can
suggest.
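A quicker way to narrow it down than deleting blocks of columns by hand might be
to look at the field count of each line directly. Just a sketch, assuming the
tab-separated test.dat from the thread:
## how many fields does each line of the file contain?
cf <- count.fields("test.dat", sep = "\t")
table(cf)              ## distribution of field counts per line
which(cf == max(cf))   ## the longest line(s)
which(is.na(cf))       ## NA flags a line with an unterminated quote
Lines whose counts stand out (or come back NA) would be the first place I'd look
for the weird value.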
BTW, you only replied to me. You should make sure to cc the list; otherwise
readers won't realise that I am being of no help.
If you still have the problem by Saturday, e-mail me or post to the list and
I'll try to spend some more time messing about with the problem.
Sorry to be of so little help.
--- On Thu, 2/17/11, Ingo Reinhold<in...@kth.se> wrote:
From: Ingo Reinhold<in...@kth.se>
Subject: RE: [R] Variable length datafile import problem
To: "John Kane"<jrkrid...@yahoo.ca>
Received: Thursday, February 17, 2011, 5:36 AM
Hi John,
as it seems we're hitting the wall here, can you maybe
recommend another mailing list with "gurus" (as you put it)
that may be able to help?
Regards,
Ingo
________________________________________
From: John Kane [jrkrid...@yahoo.ca]
Sent: Thursday, February 17, 2011 11:25 AM
To: Ingo Reinhold
Subject: RE: [R] Variable length datafile import problem
Hi Ingo,
I've had a bit of time to examine the file and I must say
that, at the moment, I have no idea what is going on.
I tried the old cut-the-file-into-pieces trick and just came up with even more
anomalous results.
My first attempt removed all the data past column AL in an OOo Calc spreadsheet.
This created a rectangular dataset that imported into R with no problem, with 38
columns as expected.
Then I deleted all the data from the original data file (test.dat), removing all
the data past column BD in an OOo Calc spreadsheet. This imported a file with
only 38 columns.
Something very funny is happening, but at the moment I have no idea.
--- On Wed, 2/16/11, Ingo Reinhold<in...@kth.se> wrote:
From: Ingo Reinhold<in...@kth.se>
Subject: RE: [R] Variable length datafile import problem
To: "John Kane"<jrkrid...@yahoo.ca>
Received: Wednesday, February 16, 2011, 1:59 AM
Hi John,
V1 should be just a character. However, I figured something out myself: the
import looks OK in terms of columns when adding the flush=TRUE option.
I am still very confused about the dimensions that the imported data shows.
Loading my data file into something like an OOo spreadsheet shows me a maximum
of about 245, which does not correspond to the 146 generated by R. Any idea
where this saturation comes from?
Thanks,
Ingo
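Regarding the flush = TRUE observation and the 146-vs-245 puzzle above: if I
read ?scan correctly, flush = TRUE makes read.table skip any fields beyond the
column count it inferred from the first five lines, so long rows are quietly
truncated rather than wrapped onto extra rows. A quick check, sketched with the
file name used in the thread:
raw1 <- read.table("Test.dat", sep = "\t", header = FALSE,
                   fill = TRUE, flush = TRUE)
ncol(raw1)                                  ## the column count R settled on
max(count.fields("Test.dat", sep = "\t"))   ## the widest row actually in the file
If the second number is larger, the trick of putting the longest row first
(above) recovers the missing columns.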
________________________________________
From: John Kane [jrkrid...@yahoo.ca]
Sent: Wednesday, February 16, 2011 1:57 AM
To: Ingo Reinhold
Subject: RE: [R] Variable length datafile import problem
Is rawData$V1 intended to be factor or character?
str(rawData) gives
$ V1 : Factor w/ 54 levels "-232.0","-234.0",..: 41 41 41 41 41 41 41 41 41 41 ...
If you were not expecting a factor you might try options(stringsAsFactors = FALSE)
before importing the data.
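For example, you can either set the global option as above or pass
stringsAsFactors directly in the call; a sketch using the read.table call quoted
further down the thread:
## keep character data as character instead of converting to factor
rawData <- read.table("Test.dat", fill = TRUE, sep = "\t", header = FALSE,
                      stringsAsFactors = FALSE)
str(rawData$V1)   ## should now show chr rather than Factor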
--- On Tue, 2/15/11, Ingo Reinhold<in...@kth.se> wrote:
From: Ingo Reinhold<in...@kth.se>
Subject: RE: [R] Variable length datafile import problem
To: "John Kane"<jrkrid...@yahoo.ca>
Received: Tuesday, February 15, 2011, 3:35 PM
Dear all,
I have changed the file ending, with no change in the result. I don't think that
this should matter.
http://dl.dropbox.com/u/2414056/Test.dat is a test file which represents the
structure I am trying to read. So far I have used
rawData=read.table("Test.txt", fill=TRUE, sep="\t", header=FALSE);
When I then look at rawData$V1, it gives me a distorted view of my original
first column.
Thanks,
Ingo
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.