On Thu, 27 Sep 2007, Jun Ding wrote:

> Hi Everyone,
>
> Recently I got puzzled by the function read.table,
> even though I have used it for a long time.
>
> I have such a file (tmp.txt, 2 rows and 3 columns,
> with a space among columns):
>
> 1 2'-PDE 4
> 2 3'-PDE 5
>
> if I do:
> a = read.table("tmp.txt", header = F, quote = "")
> a
>   V1     V2 V3
> 1  1 2'-PDE  4
> 2  2 3'-PDE  5
>
> Everything is fine.
>
> However, if I do:
> a = read.table("tmp.txt", header = F)
> a
>   V1     V2 V3
> 1  2 3'-PDE  5
> 2  1 2'-PDE  4
> 3  2 3'-PDE  5
>
> I know it is related to the "quote" as the default
> includes '. But how can it get one more row in the
> file? Thank you very much for your help in advance!
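For anyone who wants to try this without creating tmp.txt by hand, here is a minimal sketch that rebuilds the two-line file from the message above. The temporary path and the names `a_ok` and `n_fields` are illustrative and not from the original post. The call with the default quote set is wrapped in try() and only printed, because what it returns may well differ between R versions.

```r
# Rebuild the two-line file from the post in a temporary location
# (the path is illustrative; the post used "tmp.txt").
tmp <- tempfile(fileext = ".txt")
writeLines(c("1 2'-PDE 4", "2 3'-PDE 5"), tmp)

# Quoting disabled: the apostrophes are treated as ordinary characters.
a_ok <- read.table(tmp, header = FALSE, quote = "",
                   stringsAsFactors = FALSE)
print(a_ok)

# Number of fields per line, again with quoting disabled.
n_fields <- count.fields(tmp, quote = "")
print(n_fields)

# Default quote set ("\"'"): the apostrophe can open a quoted string.
# The outcome is version dependent, so it is only printed here.
try(print(read.table(tmp, header = FALSE)))

unlink(tmp)
```

If the data never contain quoted fields, passing quote = "" explicitly, as in the first call, sidesteps the problem altogether.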
read.table does a lot of work to figure out what kind of data it will see, and it runs preliminary checks on the data before swallowing the whole file.

It reads the first 5 lines of data through a file() connection (or fewer, if the file is shorter) and then tries to pushBack() two copies of those lines. Then it rereads half of these, skipping the extra header row if there is one. At that point the connection should be positioned to read all of the data that was in the original file.

Declaring a quote character that is not really acting as a quote messes this up. I think this happens because the internal function readTableHead ignores newlines that fall between quotes. In your example, the quote on the first line makes readTableHead read all of the data as one line, and the downstream consequence is that the connection is not repositioned at the right place. That in turn leads to reading two copies of the second line in your example.

If you want more details, use debug(read.table) and then run your examples. Print 'lines', 'nlines', and 'pushBackLength(file)' at various points in the execution of read.table and you can see what is happening.

HTH,

Chuck

>
> Jun
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>

Charles C. Berry                            (858) 534-2098
Dept of Family/Preventive Medicine
E mailto:[EMAIL PROTECTED]                  UC San Diego
http://famprevmed.ucsd.edu/faculty/cberry/  La Jolla, San Diego 92093-0901
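P.S. The pushBack() bookkeeping described above can be tried in isolation. The following is only a sketch of the mechanism on a plain file() connection, not read.table's actual code; the temporary file and the names `first`, `again`, `n_pushed` and `n_after` are made up for illustration.

```r
# Illustrative two-line file, same content as in the post.
tmp <- tempfile(fileext = ".txt")
writeLines(c("1 2'-PDE 4", "2 3'-PDE 5"), tmp)

con <- file(tmp, open = "r")

# Peek at up to 5 lines; only 2 exist here.
first <- readLines(con, n = 5)

# Put the lines back so a later read sees them again.
pushBack(first, con)
n_pushed <- pushBackLength(con)   # lines currently pushed back
print(n_pushed)

# Reading now consumes the pushed-back lines first.
again <- readLines(con)
n_after <- pushBackLength(con)    # nothing left pushed back
print(identical(first, again))

close(con)
unlink(tmp)
```

pushBackLength() is the same call suggested for printing inside debug(read.table); when the number of lines pushed back and the number reread get out of step, the connection ends up at the wrong position.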