On 03/26/2014 06:46 PM, Paul Gilbert wrote:


On 03/26/2014 04:58 AM, Kirill Müller wrote:
Dear list


It is possible to store expected output for tests and examples. From the
manual: "If tests has a subdirectory Examples containing a file
pkg-Ex.Rout.save, this is compared to the output file for running the
examples when the latter are checked." And, earlier (written in the
context of test output, but apparently applies here as well): "...,
these two are compared, with differences being reported but not causing
an error."

I think a NOTE would be appropriate here, in order to be able to detect
this by only looking at the summary. Is there a reason for not flagging
differences here?

The problem is that differences occur too often, because this is a character-by-character comparison of the output files (a diff). Any output that is affected by locale, host or node name, Internet downloads, time, or OS is likely to cause a difference. Also, if you print results to high precision you will get differences across systems, depending on the OS, 32- vs 64-bit builds, numerical libraries, etc.

When it is numerical results that you want to compare, a better test strategy is to do a numerical comparison and throw an error if the result is off, something like

  r <- myFun(x)                           # result from your function (placeholder name)
  rGood <- c(1.2345678901, 2.3456789012)  # known good values, hard-coded once
  fuzz <- 1e-12                           # tolerance

  if (fuzz < max(abs(r - rGood))) stop('Test xxx failed.')

It is more work to set up, but it requires less maintenance, especially when you consider that your tests need to run on different OSes on CRAN.

You can also use try() and check the resulting error condition if you want to test errors as well.
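
For example, something like this (myFun() and the bad input are just placeholders):

  ## check that a bad input raises an error
  res <- try(myFun(-1), silent = TRUE)
  if (!inherits(res, "try-error"))
      stop('Test yyy failed: expected an error for negative input.')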


Thanks for your input.

To me, this is a different kind of test, for which I'd rather use the facilities provided by the testthat package. Imagine a function that operates on, say, strings, vectors, or data frames, and that is expected to produce completely identical results on all platforms -- here, a character-by-character comparison of the output is appropriate, and I'd rather see a WARNING or ERROR if something fails.
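
For example, a minimal testthat sketch of the kind of test I mean (toupper() merely stands in for the package function under test):

  library(testthat)

  test_that("output is identical on every platform", {
    ## toupper() stands in for the package function under test
    expect_identical(toupper(c("a", "b")), c("A", "B"))
  })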

Perhaps this functionality can be provided by external packages like roxygen and testthat: roxygen could create the "good" output (if asked to) and set up a testthat test that compares the example run with this "good" output (see the sketch after the list below). This would duplicate part of the work already done by base R; the duplication could be avoided if there were a way to specify the severity of a character-level difference between actual and expected output, perhaps by means of an .Rout.cfg file in DCF format:

OnDifference: mute|note|warning|error
Normalize: [R expression]
Fuzziness: [number of different lines that are tolerated]
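
For illustration, a rough sketch of the kind of test roxygen could generate (the file names and layout are assumptions, not an existing API):

  library(testthat)

  test_that("example output matches the stored .Rout.save", {
    current  <- readLines("pkg-Ex.Rout")       # output of the current run
    expected <- readLines("pkg-Ex.Rout.save")  # stored "good" output
    expect_identical(current, expected)
  })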

On that note: Is there a convenient way to create the .Rout.save files in base R? By "convenient" I mean a single function call, not checking and manually copying as suggested here: https://stat.ethz.ch/pipermail/r-help/2004-November/060310.html .
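
For concreteness, the manual copy step could be wrapped roughly like this (the directory layout produced by R CMD check and the package name "pkg" are assumptions):

  save_rout <- function(pkg = "pkg") {
    ## copy the .Rout files produced by R CMD check into tests/
    ## as the corresponding .Rout.save files
    check_dir <- paste0(pkg, ".Rcheck")
    rout <- list.files(file.path(check_dir, "tests"),
                       pattern = "\\.Rout$", full.names = TRUE)
    file.copy(rout,
              file.path(pkg, "tests", paste0(basename(rout), ".save")),
              overwrite = TRUE)
  }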


Cheers

Kirill

______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
