It is, after all, an R-related mailing list, and Professor Ripley set a certain standard ages ago ;)

On Mon, Sep 30, 2013 at 5:19 PM, Milan Bouchet-Valat <nalimi...@club.fr> wrote:

> On Monday, 30 September 2013 at 10:07 -0500, Joshua Ulrich wrote:
> > On Mon, Sep 30, 2013 at 9:45 AM, Milan Bouchet-Valat <nalimi...@club.fr> wrote:
> > > On Monday, 30 September 2013 at 08:38 -0500, Joshua Ulrich wrote:
> > >> On Mon, Sep 30, 2013 at 7:33 AM, Milan Bouchet-Valat <nalimi...@club.fr> wrote:
> > >> > Hi!
> > >> >
> > >> > It seems that read.table() in R 3.0.1 (Linux 64-bit) does not consider
> > >> > quoted integers as an acceptable value for columns for which
> > >> > colClasses="integer". But when colClasses is omitted, these columns are
> > >> > read as integer anyway.
> > >> >
> > >> > For example, let's consider a file named file.dat, containing:
> > >> > "1"
> > >> > "2"
> > >> >
> > >> >> read.table("file.dat", colClasses="integer")
> > >> > Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
> > >> >   scan() expected 'an integer' and got '"1"'
> > >> >
> > >> > But:
> > >> >> str(read.table("file.dat"))
> > >> > 'data.frame': 2 obs. of 1 variable:
> > >> >  $ V1: int 1 2
> > >> >
> > >> > The latter result is indeed documented in ?read.table:
> > >> >     Unless colClasses is specified, all columns are read as
> > >> >     character columns and then converted using type.convert to
> > >> >     logical, integer, numeric, complex or (depending on as.is)
> > >> >     factor as appropriate. Quotes are (by default) interpreted in all
> > >> >     fields, so a column of values like "42" will result in an
> > >> >     integer column.
> > >> >
> > >> > Should the former behavior be considered a bug?
> > >> >
> > >> No. If you tell read.table the column is integer and it's actually
> > >> character on disk, it should be an error.
> > > All values in a CSV file are stored as characters on disk, disregarding
> > > the fact that they are surrounded by quotes or not. 1 is saved as
> > > 00110001 (ASCII character #49), not 00000001, nor 00000000 00000000
> > > 00000000 00000001 (as would for example imply a 32 bit storage of
> > > integers).
> > >
> > Yes, I'm aware that write.table creates a character representation of
> > the data on disk. That's its purpose. writeBin is for writing actual
> > binary representations. I thought you would understand that by
> > "actually character on disk" I meant "actually a quoted value". I
> > assumed you would understand my intent.
> >
> > read.table uses scan to read the file. ?scan says:
> >
> >     The allowed input for a numeric field is optional whitespace
> >     followed either NA or an optional sign followed by a decimal or
> >     hexadecimal constant (see NumericConstants), or NaN, Inf or
> >     infinity (ignoring case). Out-of-range values are recorded as
> >     Inf, -Inf or 0.
> >
> >     For an integer field the allowed input is optional whitespace,
> >     followed by either NA or an optional sign and one or more digits
> >     (0-9): all out-of-range values are converted to NA_integer_.
> >
> > There's no mention of quotes being allowed.
> >
> > > So, with all due respect, please refrain from formulating such blatantly
> > > erroneous statements.
> > >
> > So, with all due respect, please refrain from formulating such
> > blatantly pedantic responses to someone trying to help you.
> Sorry, your reply came across as quite abrupt for somebody trying to
> help. ;-)
>
> And I'm not really looking for help, honestly, as I found a workaround
> some time ago already. I'd just like to know how we could make
> read.csv.ffdf() work better in this case, and possibly improve R too.
>
> Regards
>
> > > Regards
> > >
> > >> > This creates problems when combined with read.table.ffdf from package
> > >> > ff, since this function tries to guess the column classes by reading the
> > >> > first rows of the file, and then passes colClasses to read.table to read
> > >> > the remaining rows by chunks. A column of quoted integers is correctly
> > >> > detected as integer in the first read, but read.table() fails in
> > >> > subsequent reads.
> > >> >
> > >> This sounds like an issue with read.table.ffdf. The column of quoted
> > >> integers is *incorrectly* detected as integer because they're actually
> > >> character on disk. read.table.ffdf should rely on how the data are
> > >> actually stored on disk (via as.is=TRUE), not how read.table might
> > >> convert them once they're read into R.
> > >>
> > >> > Regards
> > >> >
> > >> > ______________________________________________
> > >> > R-devel@r-project.org mailing list
> > >> > https://stat.ethz.ch/mailman/listinfo/r-devel
> > >>
> > >> --
> > >> Joshua Ulrich | about.me/joshuaulrich
> > >> FOSS Trading | www.fosstrading.com
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

--
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Mathematical Modelling, Statistics and Bio-Informatics

tel : +32 9 264 59 87
joris.m...@ugent.be
-------------------------------
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php
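
For anyone who wants to try the behaviour discussed in this thread, here is a minimal sketch in base R. It recreates the two-line file from the original report, reads it with and without colClasses, and then shows one possible workaround. The thread does not say which workaround Milan actually used, so the read-as-character-then-convert step is only an assumption, and the variable names (path, guessed, strict, as_text) are illustrative. The error was reported for R 3.0.1; other versions may behave differently, so the strict read is wrapped in tryCatch() rather than assumed to fail.

```r
# Recreate the file from the original report in a temporary location.
path <- tempfile(fileext = ".dat")
writeLines(c('"1"', '"2"'), path)

# Without colClasses: the column is read as character (quotes stripped)
# and then converted by type.convert(), giving an integer column.
guessed <- read.table(path)
str(guessed)

# With colClasses = "integer": reported in the thread to fail in R 3.0.1.
# tryCatch() records either outcome instead of stopping the script.
strict <- tryCatch(
  read.table(path, colClasses = "integer"),
  error = function(e) conditionMessage(e)
)
if (is.character(strict)) {
  message("strict read failed: ", strict)
} else {
  message("strict read succeeded on this R version")
}

# Assumed workaround: read the column as character, then convert explicitly.
as_text <- read.table(path, colClasses = "character")
as_text$V1 <- as.integer(as_text$V1)
str(as_text)

unlink(path)
```

The same idea would apply to a chunked reader: if every chunk is read as character and converted afterwards, the result no longer depends on whether scan() accepts quoted fields for integer columns, at the cost of one extra conversion step per chunk.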