Re: [Rd] [R] bug when subtracting decimals?

Marc Schwartz Wed, 22 Apr 2009 06:58:10 -0700

On Apr 22, 2009, at 4:49 AM, Martin Maechler wrote:

"MS" == Marc Schwartz <marc_schwa...@me.com>
   on Tue, 21 Apr 2009 08:06:46 -0500 writes:



   MS> It does look like R's behavior has changed since then. Using:

   MS> R version 2.9.0 Patched (2009-04-18 r48348)

   MS> on OSX:

   MS> # This first example has changed.
   MS> # Prior result was 414.99999999999994

print(4.145 * 100 + 0.5, digits = 20)

   MS> [1] 415

formatC(4.145 * 100 + 0.5, format = "E", digits = 20)

   MS> [1] "4.14999999999999943157E+02"

print(0.5 - 0.4 - 0.1, digits = 20)

   MS> [1] -2.77555756156289e-17

formatC(0.5 - 0.4 - 0.1, format = "E", digits = 20)

   MS> [1] "-2.77555756156289135106E-17"


   MS> What is interesting is that:

4.145 * 100 + 0.5 == 415

   MS> [1] FALSE

(4.145 * 100 + 0.5) - 415

   MS> [1] -5.684342e-14

all.equal(4.145 * 100 + 0.5, 415, 0)

   MS> [1] "Mean relative difference: 1.369721e-16"

   MS> So it would appear that in the first R example above, the print()

   MS> function has changed in a material fashion.

Yes  ((though not with *my* vote...)).
However, be aware that such calculations *are* platform
dependent, and IIUC, you are now using OS X whereas you've used
another platform previously, so some of the differences you see
may not be from changes in R, but from changes in the platform
you use.

Back to the topic of print():

Actually, format(<numeric>) has also changed similarly, to my chagrin.

In older versions of R, you could ask it to give "too many" digits,
but now it gives "too few" even for maximal 'digits'.
{There is a good reason - which I don't recall - for the new behavior}

With as.character() it was worse (in older R versions): it gave
sometimes too few digits, sometimes too many, whereas now it
is at least consistently giving "too few".
But the effect is that in  ch <- as.character(x) ,
ch may contain duplicated entries even for unique x,
e.g., for x <- c(1, 1 + 4e-16)

BTW, one alternative to {"my"}  formatC() is  sprintf(),

and if you are really interested: The latest changes (in 2.10.0 R-devel),

ensuring unique factor levels actually now make use of
         sprintf("%.17g", .)
instead of as.character(.) exactly in order to ensure that
different numbers map to different strings necessarily.
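
A minimal sketch of that point, using the example vector mentioned above. The variable names are only illustrative, and what as.character() returns depends on the R version, so no particular result is claimed for it here:

```r
# Two distinct doubles: 1 and the representable value nearest to 1 + 4e-16.
x <- c(1, 1 + 4e-16)

# The stored values really are different.
x[1] == x[2]                     # FALSE

# as.character() may map both to the same string (version dependent).
ch <- as.character(x)
anyDuplicated(ch) > 0

# 17 significant digits give these two doubles different strings.
s17 <- sprintf("%.17g", x)
s17
anyDuplicated(s17) == 0          # TRUE
```

The "%.17g" format is the one quoted above; 17 significant decimal digits are enough to tell apart any two distinct IEEE double values, assuming the platform's sprintf() rounds correctly.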

BTW, we are way off topic for R-help, being in R-devel realm,
but as this thread has started here, we may keep it...

Martin Maechler, ETH Zurich



Thanks for replying Martin.

While I appreciate your comment above, I am moving to r-devel given the content. I agree that we are getting into low-level subject matter.

FWIW, I grabbed my dusty old Dell laptop running Fedora 10 out of the closet and booted it up.


I get the same behavior as above there with R 2.8.1 patched.

So this would suggest that it is not an OS issue, but indeed a change in R.

I did try to build R 1.7.1 (the version used in the prior examples almost 6 years ago) on OSX, but it would appear that things have changed sufficiently in the intervening time frame as to preclude a successful build. I suspect much of the issue may be that Apple moved to Intel CPUs only about 4 years ago, so perhaps the configuration of older versions of R on OSX for Intel would require much work which is not worth it here. I would of course defer to others with more in-depth knowledge on that point.

I did not see anything in any of the *NEWS files, but the help for print() does reference:


Warning

Using too large a value of digits may lead to representation errors in the calculation of the number of significant digits and the decimal representation: these are likely for digits >= 16, and these possible errors are taken into account in assessing the number of significant digits to be printed in that case.

Whereas earlier versions of R might have printed further digits for digits >= 16 on some platforms, they were not necessarily reliable.

While I don't want to re-visit what from your comments appears to be a sensitive subject, I do want to point out that this new behavior arguably masks aspects of the original subject matter of the thread from users. It also results in inconsistent behavior when compared to the output of the other floating point comparisons I used, which suggest that the result of the operation is not an integer, which will serve to further confuse folks.

Is there some reasonable compromise to be had here such that consistent and predictable behavior is possible in this realm, especially given how frequently this fundamental subject comes up?

We of course don't need examples as complicated as the one above and can use the more common:


> print(0.5 - 0.4, 20)

[1] 0.1



> 0.5 - 0.4 == 0.1

[1] FALSE





> all.equal(0.5 - 0.4, 0.1, 0)

[1] "Mean relative difference: 2.775558e-16"



So arguably, we are talking about boundary situations here.
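
For readers hitting this for the first time, a small sketch of the usual workaround: compare with a tolerance rather than with ==. The helper near_equal() is hypothetical (it is not part of base R) and its default tolerance is only an assumption chosen to mirror the all.equal() default:

```r
a <- 0.5 - 0.4
b <- 0.1

a == b                           # FALSE: exact comparison of doubles
isTRUE(all.equal(a, b))          # TRUE: default tolerance is about 1.5e-8

# Hypothetical helper with an explicit absolute tolerance.
near_equal <- function(x, y, tol = sqrt(.Machine$double.eps)) {
  abs(x - y) < tol
}
near_equal(a, b)                 # TRUE
```

Wrapping all.equal() in isTRUE() matters because, as in the output above, it returns a character string describing the difference instead of FALSE when the values differ.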

Thanks Martin!


Marc

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
