On Apr 22, 2009, at 4:49 AM, Martin Maechler wrote:

"MS" == Marc Schwartz <marc_schwa...@me.com>
   on Tue, 21 Apr 2009 08:06:46 -0500 writes:


   MS> It does look like R's behavior has changed since then. Using:

   MS> R version 2.9.0 Patched (2009-04-18 r48348)

   MS> on OSX:

   MS> # This first example has changed.
   MS> # Prior result was 414.99999999999994
print(4.145 * 100 + 0.5, digits = 20)
   MS> [1] 415

formatC(4.145 * 100 + 0.5, format = "E", digits = 20)
   MS> [1] "4.14999999999999943157E+02"

print(0.5 - 0.4 - 0.1, digits = 20)
   MS> [1] -2.77555756156289e-17

formatC(0.5 - 0.4 - 0.1, format = "E", digits = 20)
   MS> [1] "-2.77555756156289135106E-17"


   MS> What is interesting is that:

4.145 * 100 + 0.5 == 415
   MS> [1] FALSE

(4.145 * 100 + 0.5) - 415
   MS> [1] -5.684342e-14

all.equal(4.145 * 100 + 0.5, 415, 0)
   MS> [1] "Mean relative difference: 1.369721e-16"


   MS> So it would appear that in the first R example above, the print()
   MS> function has changed in a material fashion.
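
To put the size of that discrepancy in context, here is a minimal sketch, assuming IEEE 754 double arithmetic without extended-precision intermediates (the variable names are illustrative only). It expresses the gap to 415 in units of the spacing between adjacent doubles in [256, 512), which is 2^-44, and prints the stored value with 17 significant digits via sprintf().

```r
# Value from the example above
x <- 4.145 * 100 + 0.5

# Distance from the "expected" integer result
gap <- x - 415

# Spacing between adjacent doubles in [256, 512): 2^8 * 2^-52
ulp <- 2^-44

# Gap measured in units of that spacing (platform dependent, as noted in the thread)
gap / ulp

# 17 significant digits are enough to identify the stored double
sprintf("%.17g", x)
```

On platforms that behave as in the output quoted above, the gap should come out as a single unit of that spacing, which is the smallest nonzero error possible at this magnitude.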

Yes  ((though not with *my* vote...)).
However, be aware that such calculations *are* platform
dependent, and IIUC, you are now using OS X whereas you've used
another platform previously, so some of the differences you see
may not be from changes in R, but from changes in the platform
you use.

Back to the topic of print():
Actually, also format(<numeric>) has changed similarly to my chagrin.
In older versions of R, you could ask it to give "too many" digits,
but now it gives "too few" even for maximal 'digits'.
{There is a good reason - which I don't recall - for the new behavior}

With as.character() it was worse (in older R versions): it gave
sometimes too few digits, sometimes too many, whereas now it
is at least consistently giving "too few".
But the effect is that in  ch <- as.character(x) ,
ch may contain duplicated entries even for unique x,
e.g., for x <- c(1, 1 + 4e-16)

BTW, one alternative to {"my"}  formatC() is  sprintf(),
and if you are really interested: The latest changes (in 2.10.0 R-devel),
ensuring unique factor levels actually now make use of
         sprintf("%.17g", .)
instead of as.character(.) exactly in order to ensure that
different numbers map to different strings necessarily.
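
The point about duplicated strings can be sketched as follows. This is only an illustration, assuming IEEE 754 doubles; it uses sprintf("%.15g", .) as a stand-in for a 15-significant-digit conversion rather than calling as.character() itself, whose exact output depends on the R version.

```r
# Two distinct doubles: 1 + 4e-16 rounds to the second double above 1
x <- c(1, 1 + 4e-16)
x[1] == x[2]

# 15 significant digits (stand-in for the as.character() behaviour described above)
ch15 <- sprintf("%.15g", x)

# 17 significant digits, as in the sprintf("%.17g", .) call mentioned above
ch17 <- sprintf("%.17g", x)

# Are there duplicated strings for unique numbers?
anyDuplicated(ch15) > 0
anyDuplicated(ch17) > 0

# The 17-digit strings convert back to the original doubles
identical(as.numeric(ch17), x)
```

With 15 digits both values collapse to the same string, while 17 digits keep them apart and allow a round trip back to the same numbers.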

BTW, we are way off topic for R-help, being in R-devel realm,
but as this thread has started here, we may keep it...

Martin Maechler, ETH Zurich



Thanks for replying Martin.

While I appreciate your comment above, I am moving to r-devel given the content. I agree that we are getting into low level subject matter.

FWIW, I grabbed my dusty old Dell laptop running Fedora 10 out of the closet and booted it up.

I get the same behavior as above there with R 2.8.1 patched.

So this would suggest that it is not an OS issue, but indeed a change in R.

I did try to build R 1.7.1 (the version used in the prior examples almost 6 years ago) on OSX, but it would appear that things have changed sufficiently in the intervening time frame as to preclude a successful build. I suspect much of the issue may be that Apple moved to Intel CPUs only about 4 years ago, so perhaps the configuration of older versions of R on OSX for Intel would require much work which is not worth it here. I would of course defer to others with more in-depth knowledge on that point.

I did not see anything in any of the *NEWS files, but the help for print() does reference:

Warning
Using too large a value of digits may lead to representation errors in the calculation of the number of significant digits and the decimal representation: these are likely for digits >= 16, and these possible errors are taken into account in assessing the number of significant digits to be printed in that case.

Whereas earlier versions of R might have printed further digits for digits >= 16 on some platforms, they were not necessarily reliable.



While I don't want to re-visit what from your comments appears to be a sensitive subject, I do want to point out that this new behavior arguably masks aspects of the original subject matter of the thread from users. It also results in inconsistent behavior when compared to the output of the other floating point comparisons I used, which suggest that the result of the operation is not an integer, which will serve to further confuse folks.

Is there some reasonable compromise to be had here such that consistent and predictable behavior is possible in this realm, especially given how frequently this fundamental subject comes up?

We of course don't need examples as complicated as the one above and can use the more common:

> print(0.5 - 0.4, 20)
[1] 0.1

> 0.5 - 0.4 == 0.1
[1] FALSE

> all.equal(0.5 - 0.4, 0.1, 0)
[1] "Mean relative difference: 2.775558e-16"

So arguably, we are talking about boundary situations here.
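
For readers who hit this boundary case, a minimal sketch of how the hidden difference can be made visible and how a tolerance-based comparison behaves (assuming IEEE 754 doubles; the variable names are illustrative only):

```r
a <- 0.5 - 0.4
b <- 0.1

# Exact comparison of the stored doubles
a == b

# 17 significant digits expose the difference that print() may hide
shown <- sprintf("%.17g", c(a, b))
shown

# all.equal() with its default tolerance treats the two values as equal
isTRUE(all.equal(a, b))

# With tolerance 0 it reports the relative difference instead
all.equal(a, b, tolerance = 0)
```

Wrapping all.equal() in isTRUE() matters because it returns a character description rather than FALSE when the values differ.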

Thanks Martin!


Marc

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
