On Apr 19, 2014, at 9:00 AM, Martin Maechler <maech...@stat.math.ethz.ch> wrote:
>>>>>> McGehee, Robert <robert.mcge...@geodecapital.com>
>>>>>> on Thu, 17 Apr 2014 19:15:47 -0400 writes:
>
>>> This is all application specific and
>>> sort of beyond the scope of type.convert(), which now behaves as it
>>> has been documented to behave.
>
>> That's only a true statement because the documentation was changed to
>> reflect the new behavior! The new feature in type.convert certainly does not
>> behave according to the documentation as of R 3.0.3. Here's a snippit:
>
>> The first type that can accept all the
>> non-missing values is chosen (numeric and complex return values
>> will represented approximately, of course).
>
>> The key phrase is in parentheses, which reminds the user to expect a
>> possible loss of precision. That important parenthetical was removed from
>> the documentation in R 3.1.0 (among other changes).
>
>> Putting aside the fact that this introduces a large amount of unnecessary
>> work rewriting SQL / data import code, SQL packages, my biggest conceptual
>> problem is that I can no longer rely on a particular function call returning
>> a particular class. In my example querying stock prices, about 5% of prices
>> came back as factors and the remaining 95% as numeric, so we had random
>> errors popping in throughout the morning.
>
>> Here's a short example showing us how the new behavior can be unreliable. I
>> pass a character representation of a uniformly distributed random variable
>> to type.convert. 90% of the time it is converted to "numeric" and 10% it is
>> a "factor" (in R 3.1.0). In the 10% of cases in which type.convert converts
>> to a factor the leading non-zero digit is always a 9. So if you were
>> expecting a numeric value, then 1 in 10 times you may have a bug in your
>> code that didn't exist before.
>
>>> options(digits=16)
>>> cl <- NULL; for (i in 1:10000) cl[i] <-
>>> class(type.convert(format(runif(1))))
>>> table(cl)
>> cl
>>  factor numeric
>>     990    9010
>
> Yes.
>
> Murray's point is valid, too.
>
> But in my view, with the reasoning we have seen here,
> *and* with the well known software design principle of
> "least surprise" in mind,
> I also do think that the default for type.convert() should be what
> it has been for > 10 years now.
>

I think there should be two separate discussions:

a) have an option (argument to type.convert and possibly read.table) to enable/disable this behavior. I'm strongly in favor of this.

b) decide what the default for a) will be. I have no strong opinion, I can see arguments in both directions

But most importantly I think a) is better than the status quo - even if the discussion about b) drags out.

Cheers,
Simon

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel