Hi Peter,

I'm not going to look at your large file on what for me is Friday evening, but the usual cause of that kind of problem is a single or double quote in the text.
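
To illustrate the quote issue, here is a minimal sketch on made-up data (the toy rows and the temporary file are hypothetical, not taken from your annotation file). It assumes the text fields contain apostrophes, as in "5' end": with the default quote setting of read.table, the text between two apostrophes is treated as one quoted string, so the rows in between are merged without any warning. Setting quote = "" switches quote handling off.

```r
# Hypothetical toy data: tab-separated, with an apostrophe in two of the rows.
lines <- c("id\tdescription",
           "1\tplain text",
           "2\tprobe at the 5' end",
           "3\tmore text",
           "4\tprobe at the 3' end",
           "5\tlast row")
tmp <- tempfile(fileext = ".txt")
writeLines(lines, tmp)

# Default quote handling (quote = "\"'"): everything between the two
# apostrophes is read as a single quoted string, so rows get merged.
with_quotes <- read.table(tmp, sep = "\t", header = TRUE,
                          comment.char = "", fill = TRUE)

# quote = "" disables quote handling, so every line becomes its own row.
no_quotes <- read.table(tmp, sep = "\t", header = TRUE,
                        comment.char = "", fill = TRUE, quote = "")

nrow(with_quotes)  # fewer rows than the file has data lines
nrow(no_quotes)    # one row per data line
```

If that turns out to be the cause, adding quote = "" to your original read.table call may be enough, provided none of the fields rely on quoting to protect embedded tabs.
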
One way to diagnose the problem is to look at the rows in the text file itself right around 25952 - there's always something there causing the problem. I'd also look in R at the last row that was imported. Often you can see the problem there as well.

Sarah

On Fri, Jul 29, 2011 at 8:54 PM, Peter Langfelder <peter.langfel...@gmail.com> wrote:
> Hi all,
>
> I encountered a problem when trying to read in an Illumina chip
> annotation file. The offending file is large, so I zipped it up and
> posted it at
>
> http://www.genetics.ucla.edu/labs/horvath/CoexpressionNetwork/tmp/ProbeInfo_Expression.txt.bz2
>
> Executing this:
>
> annot = read.table(bzfile("ProbeInfo_Expression.txt.bz2"),
>                    comment.char="", sep = "\t", fill = TRUE, header = TRUE);
>
> leads to
>
>> dim(annot)
> [1] 25952    28
>
> i.e. 25952 rows were read, but the file is some 48000 rows long.
>
> The file contains long text entries (up to several thousand
> characters) which appear to be the problem since stripping out those
> columns (outside of R) and re-reading gives the full 48k+ rows.
>
> My question is why is read.table stopping the read (without any
> warning or error)? Am I missing something in the documentation (read
> it but didn't find anything)? Any arguments I'm not setting right? I
> tried to google the problem but came up empty-handed.
>
> Session info:
>
>> sessionInfo()
> R version 2.11.1 Patched (2010-06-06 r52218)
> i686-pc-linux-gnu
>
> locale:
>  [1] LC_CTYPE=en_US.utf8       LC_NUMERIC=C
>  [3] LC_TIME=en_US.utf8        LC_COLLATE=en_US.utf8
>  [5] LC_MONETARY=C             LC_MESSAGES=en_US.utf8
>  [7] LC_PAPER=en_US.utf8       LC_NAME=C
>  [9] LC_ADDRESS=C              LC_TELEPHONE=C
> [11] LC_MEASUREMENT=en_US.utf8 LC_IDENTIFICATION=C
>
> attached base packages:
> [1] stats     graphics  grDevices utils     datasets  methods   base
>
>
> Thanks,
>
> Peter

-- 
Sarah Goslee
http://www.functionaldiversity.org