Rolf -
   I'd suggest using

    junk <- read.csv("junk.csv",header=TRUE,fill=FALSE)

if you don't want the behaviour you're seeing.
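
As a minimal sketch of the difference (the file written below is a made-up
stand-in for junk.csv, not the actual attachment), something like this should
show read.csv() wrapping the extra entries by default and stopping with an
error once fill=FALSE is given:

```r
# Made-up stand-in for junk.csv: three columns, with an over-long
# record on the sixth line of data.
path <- tempfile(fileext = ".csv")
writeLines(c("a,b,c", "1,2,3", "4,5,6", "7,8,9", "10,11,12",
             "13,14,15", "16,17,18,19,20", "21,22,23"), path)

# Default (fill = TRUE): the file is read without complaint and the
# extra entries end up on an additional row.
junk <- read.csv(path, header = TRUE)
print(junk)

# fill = FALSE: the over-long line is reported as an error, which is
# caught here so that the message can be inspected.
msg <- tryCatch({
    read.csv(path, header = TRUE, fill = FALSE)
    "no error"
}, error = function(e) conditionMessage(e))
print(msg)
```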

                                        - Phil Spector
                                         Statistical Computing Facility
                                         Department of Statistics
                                         UC Berkeley
                                         spec...@stat.berkeley.edu


On Fri, 3 Dec 2010, Rolf Turner wrote:


I have recently been bitten by an aspect of the behaviour of
the read.csv() function.

Some lines in a (fairly large) *.csv file that I read in had
too many entries.  I would have hoped that this would cause
read.csv() to throw an error, or at least issue a warning,
but it read the file without complaint, putting the extra
entries into an additional line.

This behaviour is illustrated by the toy example in the
attached file ``junk.csv''.  Just do

        junk <- read.csv("junk.csv",header=TRUE)
        junk

to see the problem.

If the offending over-long line were the fourth line of data
or earlier, an error would be thrown, but if it is the fifth line
of data or later no error is given.

This is in a way compatible with what the help on read.csv()
says:

        The number of data columns is determined by looking at
        the first five lines of input (or the whole file if it
        has less than five lines), or from the length of col.names
        if it is specified and is longer.
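
One way to check a file against that rule before reading it is
count.fields(), which returns the number of fields found on each line.
A sketch only, using a made-up file rather than the real junk.csv:

```r
# Made-up stand-in for junk.csv: the header and first four data lines
# have three fields each; a later line has five.
path <- tempfile(fileext = ".csv")
writeLines(c("a,b,c", "1,2,3", "4,5,6", "7,8,9", "10,11,12",
             "13,14,15", "16,17,18,19,20", "21,22,23"), path)

# Number of comma-separated fields on each line of the file
# (the header counts as line 1).
n <- count.fields(path, sep = ",")
print(n)

# Lines whose field count differs from that of the header line.
bad <- which(n != n[1])
print(bad)
```

Any line numbers reported in bad are worth inspecting by hand before
trusting the result of read.csv().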

However, the help for read.table() says the same thing.  And yet if
one does

        gorp <- read.table("junk.csv",sep=",",header=TRUE)

one gets an error, whereas read.csv() gives none.
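
The two functions differ in the default value of their fill argument, which
may account for the difference. A quick way to inspect the defaults, offered
as a sketch rather than a full explanation:

```r
# Default value of the 'fill' argument in each function.
csv_fill   <- formals(read.csv)$fill
table_fill <- formals(read.table)$fill

print(csv_fill)              # a plain logical default
print(deparse(table_fill))   # an expression depending on blank.lines.skip

# Default of blank.lines.skip, on which read.table()'s fill depends.
print(formals(read.table)$blank.lines.skip)
```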

Am I correct in saying that this is inappropriate behaviour on
the part of read.csv(), or am I missing something?

                cheers,

                        Rolf Turner



