Re: [Rd] R vs. C

Dominick Samperi Mon, 17 Jan 2011 16:14:04 -0800

On Mon, Jan 17, 2011 at 7:00 PM, Spencer Graves <
spencer.gra...@structuremonitoring.com> wrote:


> Hi, Dominick, et al.:
>
>
>      Demanding complete unit test suites with all software contributed to
> CRAN would likely cut contributions by a factor of 10 or 100.  For me, the R
> package creation process is close to perfection in providing a standard
> process for documentation with places for examples and test suites of
> various kinds.  I mention "perfection", because it makes developing
> "trustworthy software" (Chambers's "prime directive") relatively easy without
> forcing people to do things they don't feel comfortable doing.
>

I don't think I made myself clear, sorry. I was not suggesting that package
developers include a complete unit test suite. I was suggesting that unit
testing should be done outside of the CRAN release process. Packages should
be submitted for "release" to CRAN after they have been tested (the
responsibility of the package developers). I understand that the main problem
here is that package developers do not have access to all supported
platforms, so the current process is not likely to change.

Dominick


>
>      If you need more confidence in the software you use, you can build
> your own test suites -- maybe in packages you write yourself -- or pay
> someone else to develop test suites to your specifications.  For example,
> Revolution Analytics offers "Package validation, development and support".
>
>
>       Spencer
>
>
>
> On 1/17/2011 3:27 PM, Dominick Samperi wrote:
>
>> On Mon, Jan 17, 2011 at 5:15 PM, Spencer Graves<
>> spencer.gra...@structuremonitoring.com>  wrote:
>>
>>  Hi, Paul:
>>>
>>>
>>>      The "Writing R Extensions" manual says that *.R code in a "tests"
>>> directory is run during "R CMD check".  I suspect that many R programmers
>>> do this routinely.  I probably should do that also.  However, for me, it's
>>> simpler to have everything in the "examples" section of *.Rd files.  I
>>> think the examples with independently developed answers provide useful
>>> documentation.
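To make the tests/ route concrete, here is a minimal sketch of what such a
file might contain. The file name tests/test-myfunc.R and the function myfunc
are hypothetical stand-ins; a real test file would load the package with
library() instead of defining the function inline. An R error raised while
the file runs is what "R CMD check" reports as a failure.

```r
# Sketch of a file that could live in a package's tests/ directory,
# e.g. tests/test-myfunc.R (hypothetical name).
# 'myfunc' is a hypothetical stand-in; a real test file would instead
# start with library(<your package>) and call the packaged function.
myfunc <- function(x) cumsum(x) / seq_along(x)   # running mean

# Expected answer worked out independently of myfunc().
A0 <- c(1, 1.5, 2, 2.5)
A1 <- myfunc(c(1, 2, 3, 4))

# Any error here stops the script, which the check then flags.
stopifnot(isTRUE(all.equal(A1, A0)))

# Edge case: an empty input should give an empty numeric result.
stopifnot(identical(myfunc(numeric(0)), numeric(0)))
```

Keeping the expected values hand-derived, rather than copied from the
function's own output, is what makes the comparison a real test.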
>>>
>> This is a unit test function, and I think it would be better if there was a
>> way to unit test packages *before* they are released to CRAN. Otherwise,
>> this is not really a "release," it is a test or "beta" version. This is
>> currently possible under Windows using http://win-builder.r-project.org/,
>> for example.
>>
>> My earlier remark about the release process was more about documentation
>> than about unit testing, more about the gentle "nudging" that the R release
>> process does to help ensure consistent documentation and organization, and
>> about how this nudging might be extended to the C/C++ part of a package.
>>
>> Dominick
>>
>>
>>>       Spencer
>>>
>>>
>>>
>>> On 1/17/2011 1:52 PM, Paul Gilbert wrote:
>>>
>>>> Spencer
>>>>
>>>> Would it not be easier to include this kind of test in a small file in
>>>> the tests/ directory?
>>>>
>>>> Paul
>>>>
>>>> -----Original Message-----
>>>> From: r-devel-boun...@r-project.org [mailto:
>>>> r-devel-boun...@r-project.org]
>>>> On Behalf Of Spencer Graves
>>>> Sent: January 17, 2011 3:58 PM
>>>> To: Dominick Samperi
>>>> Cc: Patrick Leyshock; r-devel@r-project.org; Dirk Eddelbuettel
>>>> Subject: Re: [Rd] R vs. C
>>>>
>>>>
>>>>        For me, a major strength of R is the package development
>>>> process.  I've found this so valuable that I created a Wikipedia entry
>>>> by that name and made additions to a Wikipedia entry on "software
>>>> repository", noting that this process encourages good software
>>>> development practices that I have not seen standardized for other
>>>> languages.  I encourage people to review this material and make
>>>> additions or corrections as they like (or send me suggestions so that I
>>>> can make appropriate changes).
>>>>
>>>>
>>>>        While R has other capabilities for unit and regression testing, I
>>>> often include unit tests in the "examples" section of documentation
>>>> files.  To keep from cluttering the examples with unnecessary material,
>>>> I often include something like the following:
>>>>
>>>>
>>>> A1 <- myfunc() # to test myfunc
>>>>
>>>> A0 <- ("manual generation of the correct answer for A1")
>>>>
>>>> \dontshow{stopifnot(} # so the user doesn't see "stopifnot("
>>>> all.equal(A1, A0) # compare myfunc output with the correct answer
>>>> \dontshow{)} # close paren on "stopifnot(".
>>>>
>>>>
>>>>        This may not be as good in some ways as a full suite of unit
>>>> tests, which could be provided separately.  However, this has the
>>>> distinct advantage of including unit tests with the documentation in a
>>>> way that should help users understand "myfunc".  (Unit tests too
>>>> detailed to show users could be completely enclosed in "\dontshow".)
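As described above, once the \dontshow pieces are joined the code that runs
is a single stopifnot(all.equal(A1, A0)) call. The sketch below shows that in
plain R, outside any Rd file. Here myfunc is a hypothetical stand-in, and the
isTRUE() wrapper at the end is a variation added for illustration, not part
of the idiom above.

```r
# Hypothetical function under test (stand-in for the 'myfunc' above).
myfunc <- function() sqrt(2)^2

A1 <- myfunc()   # computed answer; differs from 2 by floating-point rounding
A0 <- 2          # manually derived correct answer

# What the example effectively runs once the \dontshow pieces are joined:
stopifnot(all.equal(A1, A0))

# all.equal() tolerates tiny numeric differences; identical() does not.
exact <- identical(A1, A0)

# On a real mismatch all.equal() returns a character description, not FALSE,
# so isTRUE() is needed to turn the result into a plain logical.
mismatch <- all.equal(1, 1.1)
is_same  <- isTRUE(mismatch)
```

Because all.equal() compares within a tolerance, it suits hand-computed
answers that are only known to a few decimal places.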
>>>>
>>>>
>>>>        Spencer
>>>>
>>>>
>>>> On 1/17/2011 11:38 AM, Dominick Samperi wrote:
>>>>
>>>>> On Mon, Jan 17, 2011 at 2:08 PM, Spencer Graves<
>>>>> spencer.gra...@structuremonitoring.com>    wrote:
>>>>>
>>>>>>       Another point I have not yet seen mentioned:  If your code is
>>>>>> painfully slow, that can often be fixed without leaving R by
>>>>>> experimenting with different ways of doing the same thing -- often
>>>>>> after profiling your code to find the slowest part as described in
>>>>>> chapter 3 of "Writing R Extensions".
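A rough sketch of that workflow, using R's built-in sampling profiler
(Rprof and summaryRprof). The two functions are hypothetical examples made up
for illustration, and the sketch assumes R was built with profiling support,
which is the usual case.

```r
# Two hypothetical ways of building the same vector of squares.
squares_grow <- function(n) {
  out <- c()
  for (i in seq_len(n)) out <- c(out, i^2)   # reallocates on every step
  out
}
squares_vec <- function(n) seq_len(n)^2      # one vectorized operation

prof_file <- tempfile(fileext = ".out")
Rprof(prof_file)                 # start the sampling profiler
slow <- squares_grow(20000)
Rprof(NULL)                      # stop profiling

# summaryRprof() fails if the run was too short to collect samples,
# so guard it; NULL here just means "nothing was sampled".
prof <- tryCatch(summaryRprof(prof_file)$by.self, error = function(e) NULL)
print(head(prof))

# The faster variant must give the same answer before it replaces the slow one.
stopifnot(isTRUE(all.equal(slow, squares_vec(20000))))
```

The profile output depends on the machine, so no particular numbers are shown
here; the point is only to find where the time goes before rewriting anything.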
>>>>>>
>>>>>>
>>>>>>       If I'm given code already written in C (or some other language),
>>>>>> unless it's really simple, I may link to it rather than recode it in
>>>>>> R.  However, the problems with portability, maintainability,
>>>>>> transparency to others who may not be very facile with C, etc., all
>>>>>> suggest that it's well worth some effort experimenting with alternate
>>>>>> ways of doing the same thing in R before jumping to C or something
>>>>>> else.
>>>>>>
>>>>>>       Hope this helps.
>>>>>>       Spencer
>>>>>>
>>>>>>
>>>>>>
>>>>>> On 1/17/2011 10:57 AM, David Henderson wrote:
>>>>>>
>>>>>>> I think we're also forgetting something, namely testing.  If you write
>>>>>>> your routine in C, you have placed additional burden upon yourself to
>>>>>>> test your C code through unit tests, etc.  If you write your code in R,
>>>>>>> you still need the unit tests, but you can rely on the well tested
>>>>>>> nature of R to allow you to reduce the number of tests of your
>>>>>>> algorithm.  I routinely tell people at Sage Bionetworks where I am
>>>>>>> working now that your new C code needs to experience at least one order
>>>>>>> of magnitude increase in performance to warrant the effort of moving
>>>>>>> from R to C.
>>>>>>>
>>>>>>> But, then again, I am working with scientists who are not primarily,
>>>>>>> or even secondarily, coders...
>>>>>>>
>>>>>>> Dave H
>>>>>>>
>>>>>>>
>>>>> This makes sense, but I have seen some very transparent algorithms
>>>>> turned into vectorized R code that is difficult to read (and thus to
>>>>> maintain or to change). These chunks of optimized R code are like
>>>>> embedded assembly, in the sense that nobody is likely to want to mess
>>>>> with them. This could be addressed by including pseudo code for the
>>>>> original (more transparent) algorithm as a comment, but I have never
>>>>> seen this done in practice (perhaps it could be enforced by R CMD
>>>>> check?!).
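A small hypothetical sketch of that suggestion: the vectorized code is kept,
and the loop it replaced is recorded above it as a pseudo code comment. The
function names and the "count direction changes" task are made up for
illustration.

```r
# Count how many times a series changes direction (sign changes in its steps).
#
# Pseudo code for the original, more transparent algorithm:
#   count <- 0
#   for each consecutive pair of steps (d[i], d[i+1]):
#       if d[i] and d[i+1] have opposite signs: count <- count + 1
#   return count
#
# Vectorized equivalent:
count_turns <- function(x) {
  d <- diff(x)                       # steps between neighbours
  sum(d[-1] * d[-length(d)] < 0)     # adjacent steps with opposite signs
}

# Reference loop version, kept here only to check the vectorized code.
count_turns_loop <- function(x) {
  d <- diff(x)
  count <- 0
  for (i in seq_len(max(length(d) - 1, 0))) {
    if (d[i] * d[i + 1] < 0) count <- count + 1
  }
  count
}

x <- c(1, 3, 2, 5, 4, 6)
stopifnot(count_turns(x) == count_turns_loop(x))
```

Keeping the loop version around as a test reference also guards against the
comment and the vectorized code drifting apart.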
>>>>>
>>>>> On the other hand, in principle a well-documented piece of C/C++ code
>>>>> could be much easier to understand, without paying a performance
>>>>> penalty...but "coders" are not likely to place this high on their list
>>>>> of priorities.
>>>>>
>>>>> The bottom line is that R is an adaptor ("glue") language like Lisp
>>>>> that makes it easy to mix and match functions (using classes and
>>>>> generic functions), many of which are written in C (or C++ or Fortran)
>>>>> for performance reasons. Like any object-based system there can be a
>>>>> lot of object copying, and like any functional programming system,
>>>>> there can be a lot of function calls, resulting in poor performance for
>>>>> some applications.
>>>>>
>>>>> If you can vectorize your R code then you have effectively found a way
>>>>> to benefit from somebody else's C code, thus saving yourself some time.
>>>>> For operations other than pure vector calculations you will have to do
>>>>> the C/C++ programming yourself (or call a library that somebody else
>>>>> has written).
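As an illustration of the vectorization point, the sketch below computes the
same quantity with an interpreted loop and with vectorized arithmetic, whose
inner work runs in R's compiled code. The function names are hypothetical and
the timings will vary by machine and R version.

```r
# Sum of squared deviations from the mean, written two ways.
ssd_loop <- function(x) {
  m <- mean(x)
  total <- 0
  for (xi in x) total <- total + (xi - m)^2   # interpreted loop, one step per element
  total
}
ssd_vec <- function(x) sum((x - mean(x))^2)   # arithmetic and sum() run in compiled code

set.seed(1)
x <- rnorm(1e5)

# Same answer up to floating-point rounding...
stopifnot(isTRUE(all.equal(ssd_loop(x), ssd_vec(x))))

# ...but the timings usually differ; exact numbers depend on the machine.
print(system.time(for (k in 1:20) ssd_loop(x)))
print(system.time(for (k in 1:20) ssd_vec(x)))
```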
>>>>>
>>>>> Dominick
>>>>>
>>>>>
>>>>>
>>>>>>> ----- Original Message ----
>>>>>>> From: Dirk Eddelbuettel<e...@debian.org>
>>>>>>> To: Patrick Leyshock<ngkbr...@gmail.com>
>>>>>>> Cc: r-devel@r-project.org
>>>>>>> Sent: Mon, January 17, 2011 10:13:36 AM
>>>>>>> Subject: Re: [Rd] R vs. C
>>>>>>>
>>>>>>>
>>>>>>> On 17 January 2011 at 09:13, Patrick Leyshock wrote:
>>>>>>> | A question, please about development of R packages:
>>>>>>> |
>>>>>>> | Are there any guidelines or best practices for deciding when and
>>>>>>> | why to implement an operation in R, vs. implementing it in C?  The
>>>>>>> | "Writing R Extensions" recommends "working in interpreted R code
>>>>>>> | . . . this is normally the best option."  But we do write
>>>>>>> | C-functions and access them in R - the question is, when/why is
>>>>>>> | this justified, and when/why is it NOT justified?
>>>>>>> |
>>>>>>> | While I have identified helpful documents on R coding standards, I
>>>>>>> | have not seen notes/discussions on when/why to implement in R, vs.
>>>>>>> | when to implement in C.
>>>>>>>
>>>>>>> The (still fairly recent) book 'Software for Data Analysis:
>>>>>>> Programming with R' by John Chambers (Springer, 2008) has a lot to say
>>>>>>> about this.  John also gave a talk in November which stressed
>>>>>>> 'multilanguage' approaches; see e.g.
>>>>>>>
>>>>>>> http://blog.revolutionanalytics.com/2010/11/john-chambers-on-r-and-multilingualism.html
>>>>>>>
>>>>>>>
>>>>>>> In short, it all depends, and it is unlikely that you will get a
>>>>>>> coherent answer that is valid for all circumstances.  We all love R for
>>>>>>> how expressive and powerful it is, yet there are times when something
>>>>>>> else is called for.  Exactly when that time is depends on a great many
>>>>>>> things and you have not mentioned a single metric in your question.  So
>>>>>>> I'd start with John's book.
>>>>>>>
>>>>>>> Hope this helps, Dirk
>>>>>>>
>>>>>>>  ______________________________________________
>>>>>>>
>>>>>> R-devel@r-project.org mailing list
>>>>>> https://stat.ethz.ch/mailman/listinfo/r-devel
>>>>>>
>>>>
>>>>
>>>


______________________________________________
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
