[R] read.table only reads part of file

Peter Langfelder Fri, 29 Jul 2011 17:54:41 -0700

Hi all,

I encountered a problem when trying to read in an Illumina chip
annotation file. The offending file is large, so I compressed it with
bzip2 and posted it at


http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/tmp/ProbeInfo_Expression.txt.bz2

Executing this:

annot = read.table(bzfile("ProbeInfo_Expression.txt.bz2"),
                comment.char="",  sep = "\t", fill = TRUE, header = TRUE);

leads to

> dim(annot)
[1] 25952    28

i.e. 25952 rows were read, but the file is some 48000 rows long.

The file contains long text entries (up to several thousand
characters) which appear to be the problem, since stripping out those
columns (outside of R) and re-reading gives the full 48k+ rows.

My question is: why is read.table stopping the read (without any
warning or error)? Am I missing something in the documentation? (I read
it but didn't find anything.) Are there any arguments I'm not setting
right? I tried to google the problem but came up empty-handed.
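
For anyone seeing the same symptom, one check that may help narrow it
down is to compare the number of lines in the file with the number of
rows read.table returns, once with the default quote argument and once
with quoting switched off. The sketch below uses a small made-up file
(the ids and descriptions are hypothetical, not taken from the posted
annotation file) in which some text fields contain an apostrophe.
Whether quote characters are what truncates the real file is an
assumption, not something established in this post.

```r
# Toy tab-delimited file (hypothetical data, not the posted annotation file).
# Two of the description fields contain an apostrophe.
tmp <- tempfile(fileext = ".txt")
writeLines(c("id\tdescription",
             "1\tprobe for gene A",
             "2\t5' untranslated region",
             "3\tprobe for gene C",
             "4\t3' end of gene D"), tmp)

# Number of data lines actually present in the file (header excluded).
n_lines <- length(readLines(tmp)) - 1

# Fields per line with quoting disabled; each line here has 2 fields.
fields <- count.fields(tmp, sep = "\t", quote = "", comment.char = "")

# Same arguments as in the post, so the default quote = "\"'" is in effect.
default_read <- read.table(tmp, sep = "\t", header = TRUE, fill = TRUE,
                           comment.char = "")

# Identical call with quoting switched off.
noquote_read <- read.table(tmp, sep = "\t", header = TRUE, fill = TRUE,
                           comment.char = "", quote = "")

cat("data lines in file:  ", n_lines, "\n")
cat("rows, default quote: ", nrow(default_read), "\n")
cat("rows, quote = \"\":    ", nrow(noquote_read), "\n")
print(table(fields))
```

If the row count with quote = "" matches the line count while the
default call returns fewer rows, quote characters inside the long text
fields would be a plausible culprit, and the same comparison could
then be run on the real file through bzfile().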

Session info:

> sessionInfo()
R version 2.11.1 Patched (2010-06-06 r52218)
i686-pc-linux-gnu

locale:
 [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C
 [3] LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8
 [5] LC_MONETARY=C             LC_MESSAGES=en_US.utf8
 [7] LC_PAPER=en_US.utf8       LC_NAME=C
 [9] LC_ADDRESS=C              LC_TELEPHONE=C
[11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base


Thanks,

Peter

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
