Yes, I'm also strongly in favor of having an option for this. If
there were an option in base R for controlling this, we would just use
that and get rid of the separate RProtoBuf.int64AsString option we use
in the RProtoBuf package on CRAN to control whether 64-bit int types
from C++ are returned to R as numerics or character vectors.
I agree that reasonable people can disagree about the default, but I
found my original bug report about this, so I will counter Robert's
example with my favorite example of what was wrong with the previous
behavior:
tmp <- data.frame(n=c("72057594037927936", "72057594037927937"),
                  name=c("foo", "bar"))
length(unique(tmp$n))
# 2
write.csv(tmp, "/tmp/foo.csv", quote=FALSE, row.names=FALSE)
data <- read.csv("/tmp/foo.csv")
length(unique(data$n))
# 1
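
The collapse above comes from double precision rather than from the CSV
round trip itself: 2^56 is past 2^53, the limit up to which every integer
has an exact double representation. A minimal base R sketch of that, with
variable names of my own choosing; the colClasses workaround at the end is
only one way to keep such IDs exact, not a proposal for the default:

```r
# The two IDs from the example above, kept as character strings.
ids <- c("72057594037927936", "72057594037927937")

# Converting to double loses the last digit: both become 2^56.
as_dbl <- as.numeric(ids)
same_double <- as_dbl[1] == as_dbl[2]
is_2_to_56 <- as_dbl[1] == 2^56

# As strings the IDs stay distinct; as doubles they do not.
n_unique_chr <- length(unique(ids))
n_unique_dbl <- length(unique(as_dbl))

# Workaround when reading a file: force the column to character.
path <- tempfile(fileext = ".csv")
writeLines(c("n,name", paste(ids, c("foo", "bar"), sep = ",")), path)
exact <- read.csv(path, colClasses = c(n = "character"))
n_unique_exact <- length(unique(exact$n))
unlink(path)
```

Forcing the class per column only helps when you know in advance which
columns hold large integers, which is why an option on type.convert()
itself would still be useful.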
- Murray
On Sat, Apr 19, 2014 at 10:06 AM, Simon Urbanek
<[email protected]> wrote:
> On Apr 19, 2014, at 9:00 AM, Martin Maechler <[email protected]>
> wrote:
>
>>>>>>> McGehee, Robert <[email protected]>
>>>>>>> on Thu, 17 Apr 2014 19:15:47 -0400 writes:
>>
>>>> This is all application specific and
>>>> sort of beyond the scope of type.convert(), which now behaves as it
>>>> has been documented to behave.
>>
>>> That's only a true statement because the documentation was changed to
>>> reflect the new behavior! The new feature in type.convert certainly does
>>> not behave according to the documentation as of R 3.0.3. Here's a snippet:
>>
>>> The first type that can accept all the
>>> non-missing values is chosen (numeric and complex return values
>>> will represented approximately, of course).
>>
>>> The key phrase is in parentheses, which reminds the user to expect a
>>> possible loss of precision. That important parenthetical was removed from
>>> the documentation in R 3.1.0 (among other changes).
>>
>>> Putting aside the fact that this introduces a large amount of unnecessary
>>> work rewriting SQL / data import code, SQL packages, my biggest conceptual
>>> problem is that I can no longer rely on a particular function call
>>> returning a particular class. In my example querying stock prices, about 5%
>>> of prices came back as factors and the remaining 95% as numeric, so we had
>>> random errors popping in throughout the morning.
>>
>>> Here's a short example showing us how the new behavior can be unreliable. I
>>> pass a character representation of a uniformly distributed random variable
>>> to type.convert. 90% of the time it is converted to "numeric" and 10% it is
>>> a "factor" (in R 3.1.0). In the 10% of cases in which type.convert converts
>>> to a factor the leading non-zero digit is always a 9. So if you were
>>> expecting a numeric value, then 1 in 10 times you may have a bug in your
>>> code that didn't exist before.
>>
>>>> options(digits=16)
>>>> cl <- NULL; for (i in 1:10000) cl[i] <-
>>>> class(type.convert(format(runif(1))))
>>>> table(cl)
>>> cl
>>> factor numeric
>>> 990 9010
>>
>> Yes.
>>
>> Murray's point is valid, too.
>>
>> But in my view, with the reasoning we have seen here,
>> *and* with the well known software design principle of
>> "least surprise" in mind,
>> I also do think that the default for type.convert() should be what
>> it has been for > 10 years now.
>>
>
> I think there should be two separate discussions:
>
> a) have an option (argument to type.convert and possibly read.table) to
> enable/disable this behavior. I'm strongly in favor of this.
>
> b) decide what the default for a) will be. I have no strong opinion, I can
> see arguments in both directions
>
> But most importantly I think a) is better than the status quo - even if the
> discussion about b) drags out.
>
> Cheers,
> Simon
>
>
>
______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel