Re: [Rd] A bug in princomp(), perhaps?

2014-05-30 Thread peter dalgaard
It's only documented to work for princomp.formula; other methods do not know 
about na.action.
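
A minimal sketch of the difference, assuming a small simulated data set (the names `df`, `pc_formula` and `res` are illustrative and not from the thread). The formula method goes through model.frame(), so it honours na.action; the default (matrix) method has no such argument and stops on the NA.

```r
set.seed(1)                        # simulated data, for illustration only
x <- matrix(rnorm(75 * 5), ncol = 5)
x[1, 1] <- NA
df <- as.data.frame(x)

# The formula method handles na.action through model.frame()
pc_formula <- princomp(~ ., data = df, na.action = na.exclude)
pc_formula$n.obs                   # number of rows actually used in the fit
nrow(pc_formula$scores)            # scores are padded back to the original rows

# The default method does not know about na.action and fails on the NA
res <- tryCatch(princomp(x, na.action = na.exclude),
                error = function(e) conditionMessage(e))
res
```

With na.exclude the row containing the NA is left out of the fit, and its scores come back as NA in the original position.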

-pd

On 29 May 2014, at 22:10, Ravi Varadhan wrote:

> Hi,
> It may be my misunderstanding, but it seems that the "na.action" in the 
> princomp() function for principal components analysis does not work.  Please 
> see this simple example:
> 
> u <- matrix(rnorm(75), ncol=1)
> v <- matrix(rnorm(20), ncol=1)
> x <- u%*%t(v) + matrix(rnorm(20*75),ncol=20)
> x[1,1] <- NA
> pc.out <- princomp(x, na.action=na.exclude)
> Error in cov.wt(z) : 'x' must contain finite values only
>> 
> 
> Note, I have:
>> options("na.action")
> $na.action
> [1] "na.omit"
> 
> Thanks,
> Ravi
> 
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

-- 
Peter Dalgaard, Professor,
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com



Re: [Rd] A bug in princomp(), perhaps?

2014-05-30 Thread Ravi Varadhan
Thank you, Peter.  Now I see that.  

I still think the documentation of `na.action' can be made more explicit to 
state that this option is only used for princomp.formula.

Best regards,
Ravi



Re: [Rd] R CMD check for the R code from vignettes

2014-05-30 Thread Kevin Coombes

Hi,

Unless someone is planning to change Stangle to include inline 
expressions (which I am *not* advocating), I think that relying on 
side-effects within an \Sexpr construction is a bad idea. So, my own 
coding style is to restrict my use of \Sexpr to calls of the form 
\Sexpr{show.the.value.of.this.variable}. As a result, I more-or-less 
believe that having R CMD check use Stangle and report an error is 
probably a good thing.


There is a completely separate question about the relationship between 
Sweave/Stangle or knit/purl and literate programming that is linked to 
your question about whether to use Stangle on vignettes. The underlying 
model(s) in R have drifted away from Knuth's original conception, for 
some good reasons.


The original goal of literate programming was to be able to explain the 
algorithms and data structures in the code to humans.  For that purpose, 
it was important to have named code chunks that you could move around, 
which would allow you to describe the algorithm starting from a high 
level overview and then drilling down into the details. From this 
perspective, "tangle" was critical to being able to reconstruct a 
program that would compile and run correctly.


The vast majority of applications of Sweave/Stangle or knit/purl in 
modern R have a completely different goal: to produce some sort of 
document that describes the results of an analysis to a non-programmer 
or non-statistician.  For this goal, "weave" is much more important than 
"tangle", because the most important aspect is the ability to integrate 
the results (figures, tables, etc) of running the code into the document 
that gets passed off to the person for whom the analysis was prepared. As 
a result, the number of times in my daily work that I need to explicitly 
invoke Stangle (or purl) is many orders of magnitude smaller 
than the number of times that I invoke Sweave (or knit).


  -- Kevin


On 5/30/2014 1:04 AM, Yihui Xie wrote:

Hi,

Recently I saw a couple of cases in which the package vignettes were
somewhat complicated so that Stangle() (or knitr::purl() or other
tangling functions) can fail to produce the exact R code that is
executed by the weaving function Sweave() (or knitr::knit(), ...). For
example, this is a valid document that can pass the weaving process
but cannot generate a valid R script to be source()d:

\documentclass{article}
\begin{document}
Assign 1 to x: \Sexpr{x <- 1}
<<>>=
x + 1
@
\end{document}

That is because the inline R code is not written to the R script
during the tangling process. When an R package vignette contains
inline R code expressions that have significant side effects, R CMD
check can fail because the tangled output is not correct. What I
showed here is only a trivial example, and I have seen two packages
that have more complicated scenarios than this. Anyway, the key thing
that I want to discuss here is, since the R code in the vignette has
been executed once during the weaving process, does it make much sense
to execute the code generated from the tangle function? In other
words, if the weaving process has succeeded, is it necessary to
source() the R script again?
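
The behaviour described above can be reproduced with a short script. This is only a sketch: the temporary file names `rnw` and `out` are illustrative, and the last step assumes a session in which no object called `x` already exists.

```r
rnw <- tempfile(fileext = ".Rnw")
out <- tempfile(fileext = ".R")
writeLines(c(
  "\\documentclass{article}",
  "\\begin{document}",
  "Assign 1 to x: \\Sexpr{x <- 1}",
  "<<>>=",
  "x + 1",
  "@",
  "\\end{document}"
), rnw)

# Tangle the document into an R script
utils::Stangle(rnw, output = out, quiet = TRUE)
tangled <- readLines(out)

any(grepl("x + 1", tangled, fixed = TRUE))    # chunk code is written out
any(grepl("x <- 1", tangled, fixed = TRUE))   # inline \Sexpr code is not

# Sourcing the tangled script should then fail because x was never assigned
err <- tryCatch(source(out, local = new.env()),
                error = function(e) conditionMessage(e))
err
```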

The two options here are:

1. Do not check the R code from vignettes;
2. Or fix the tangle function so that it produces exactly what was
executed in the weaving process. If this is done, I'm back to my
previous question: does it make sense to run the code twice?

To push this a little further, personally I do not quite appreciate
literate programming in R as two separate steps, namely weave and
tangle. In particular, I do not see the value of tangle, considering
Sweave() (or knitr::knit()) as the new "source()". Therefore
eventually I tend to just drop tangle, but perhaps I missed something
here, and I'd like to hear what other people think about it.

Regards,
Yihui
--
Yihui Xie 
Web: http://yihui.name



Re: [Rd] Style question

2014-05-30 Thread Hadley Wickham
> Even more important than choosing between whatever(...)
> or foo::whatever(...), you should import that function
> from the foo package by putting
>
>   importFrom(foo, whatever)
>
> or
>
>   import(foo)
>
> in your NAMESPACE file.
>
> The 1st form also kind of documents which function comes from which
> package.
>
> Note that you'll also need to have foo in the Depends or Imports field
> of your DESCRIPTION file. Which field is appropriate depends on whether
> or not you want foo to show up in the user's search path when s/he loads
> your package with 'library(yourpackage)'.

Except that if you do foo::whatever() you don't need to explicitly
import the function.

Hadley


-- 
http://had.co.nz/



Re: [Rd] A bug in princomp(), perhaps?

2014-05-30 Thread Gavin Simpson
Ravi,

You mean something /more/ explicit than the Usage section, wherein
`na.action` only exists in the formula method?

I doubt we'd want RCore to go down the road of documenting all the
arguments that do/don't work with particular methods included in an Rd
file, beyond the Usage section.

G


-- 

Gavin Simpson, PhD



Re: [Rd] A bug in princomp(), perhaps?

2014-05-30 Thread Ravi Varadhan
Gavin,
I agree w.r.t. documenting all arguments.  However, it is quite natural to 
expect that something as basic as `na.action' would work more generally 
than with only one particular type of usage.  Why should the behavior of 
princomp(x, ...), where x is a data matrix or data frame, be any different than 
when a formula is provided, with regard to the NA action?  It should be easy 
enough to remove the rows of `x' with NAs.  This is my main point.

Ravi



Re: [Rd] A bug in princomp(), perhaps?

2014-05-30 Thread Gavin Simpson
Really? I can't recall a single instance where such things worked outside
of formula methods for functions and I tend to equate its use with the
standard non-standard evaluation idiom.

It could easily remove the NAs, but you wanted na.exclude, which means
removing them whilst computing but putting them back in the correct place in
the results. Doing this adds complexity to a code base. We have this
effectively for free in the standard non-standard evaluation methods
employed by most functions with formula methods. And of course:

princomp(na.omit(x))

gives one simple way to remove them without hardcoding checks for NAs in
each and every function that one might reasonably expect to remove NAs.
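
As a rough sketch of what that workaround looks like in practice, and of the extra bookkeeping that na.exclude-style padding needs when done by hand (simulated data; `keep` and `scores_full` are illustrative names, not part of princomp()):

```r
set.seed(1)                          # simulated data, for illustration only
x <- matrix(rnorm(75 * 5), ncol = 5)
x[1, 1] <- NA

pc <- princomp(na.omit(x))           # fit on the complete rows only

# Hand-rolled equivalent of na.exclude: pad the scores back to nrow(x)
keep <- complete.cases(x)
scores_full <- matrix(NA_real_, nrow = nrow(x), ncol = ncol(pc$scores),
                      dimnames = list(NULL, colnames(pc$scores)))
scores_full[keep, ] <- pc$scores
dim(scores_full)
```

The padding step is exactly the part that the formula methods get from model.frame() and napredict() without extra code in each function.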

G




-- 

Gavin Simpson, PhD



Re: [Rd] R CMD check for the R code from vignettes

2014-05-30 Thread Carl Boettiger
Hi Yihui,

I agree with you (and your comments in [knitr issue 784]) that it seems
wrong for R CMD check to be using tangle (purl, etc) as a way to check R
code in a vignette, when the standard and expected way to check the
vignette is already to knit / Sweave the vignette.

I also agree with the perspective that the tangle function no longer plays
the crucial role it did when we were using noweb and C programs that
couldn't be compiled without tangle.

However, I would be hesitant to see tangle removed entirely, as it is
occasionally a convenient way to create an R script from a dynamic
document.  Pure R scripts are still much more widely recognized than
dynamic documents, and I sometimes will just tangle out the R code because
a collaborator would have no idea what to do with a .Rmd file (though
RStudio is certainly improving this situation).  Tangle-like functions also
provide a nice complement to "stitch" and friends, which make dynamic
documents from the ubiquitous R scripts.

[knitr issue 784]: https://github.com/yihui/knitr/issues/784


- Carl




Re: [Rd] Style question

2014-05-30 Thread Hervé Pagès

Hi Hadley,

On 05/30/2014 07:06 AM, Hadley Wickham wrote:

Except that if you do foo::whatever() you don't need to explicitly
import the function.


There is at least one subtle consequence to keep in mind when doing
this. Of course, whatever choice you make, if the whatever() function
moves to a different package, this breaks your package.
However, if you explicitly import the function, your package will
break at load-time (which is good) and you'll only have to modify
1 line in the NAMESPACE file to fix it. But if you do foo::whatever(),
your package won't break at load-time, only at run-time. Also you'll
have to edit all the calls to foo::whatever() to fix the package.
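
A small sketch of the run-time side of this (the package name `nosuchpkg` and the function names are made up for illustration): a function that uses `::` can be defined without complaint, and the lookup only fails when the call is evaluated.

```r
# Defining a function that uses pkg::fun() does not look anything up yet
f <- function(x) nosuchpkg::whatever(x)   # 'nosuchpkg' is a made-up name

# The failure only surfaces when the call is actually evaluated
res <- tryCatch(f(1), error = function(e) conditionMessage(e))
res

# A function that is no longer exported by an existing package fails the same way
g <- function() stats::no_such_function_here()
res2 <- tryCatch(g(), error = function(e) conditionMessage(e))
res2
```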

Probably not a big deal, but in an environment like Bioconductor where
infrastructure classes and functions can be shared by hundreds of
packages, having people use foo::whatever() in a systematic way would
probably make maintenance a little bit more painful than it needs to
be when the need arises to reorganize/refactor parts of the
infrastructure. Also, the ability to quickly grep the NAMESPACE
files of all BioC packages to see who imports what is very convenient
in this situation.

Cheers,
H.




--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:(206) 667-1319



Re: [Rd] Style question

2014-05-30 Thread Hadley Wickham
OTOH, I think there's a big benefit to being able to read package code
and instantly know where a function comes from.

Personally, I found this outweighs the benefits that you outline:

* functions rarely move between packages, and gsubbing pkga::foo to
pkgb::foo isn't hard
* it's not that much harder to grep for pkg::foo in R/* than it is to
grep NAMESPACE
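
For what it's worth, the search over R/* can be done from R itself. A minimal sketch, assuming a hypothetical source tree created in a temporary directory (the file contents and the regular expression are illustrative only):

```r
# Hypothetical package source tree, created in a temporary directory
r_dir <- file.path(tempfile("pkgsrc"), "R")
dir.create(r_dir, recursive = TRUE)
writeLines(c("f <- function(x) foo::whatever(x)",
             "g <- function(x) bar::other(foo::whatever(x))"),
           file.path(r_dir, "funs.R"))

# Collect every pkg::fun reference found in the R files
files <- list.files(r_dir, pattern = "\\.[Rr]$", full.names = TRUE)
code  <- unlist(lapply(files, readLines))
refs  <- unlist(regmatches(code,
                           gregexpr("[[:alnum:].]+::[[:alnum:]._]+", code)))
table(refs)
```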

Hadley

-- 
http://had.co.nz/



Re: [Rd] R CMD check for the R code from vignettes

2014-05-30 Thread Henrik Bengtsson
I think there are several aspects to Yihue's post and some simple
workarounds/long solutions to the issues:

1. For the reasons argued, I would agree that 'R CMD check'
incorrectly assumes that tangled code script should be able to run
without errors.  Instead I think it should only check the syntax, i.e.
that it can be parsed without errors.  If not, then Sweave may have to
be redefined to clarify that \Sexpr{}/"inline" expressions must not
have "side effects".

2. For other (=non-Sweave) vignette builder packages, you can already
today define engines that do not tangle, think
%\VignetteEngine{knitr::knitr_no_tangle}.

3. Extending on this, I'd like to propose %\VignetteTangle{no} (and/or
false, FALSE, ...), which would tell the engine to not generate the
"tangle" script file.  Then it is up to the vignette engine to
acknowledge this or not, but at least we will have a standard across
engines rather than each of us coming up with our own markup for this.
You can also imagine supporting other types of settings, e.g.
%\VignetteTangle{all} to also include \Sexpr{} in the tangled output.
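
To illustrate point 1, a syntax-only check could look roughly like this. It is a sketch, not what 'R CMD check' does today; the script contents are made up and stand in for a tangled vignette.

```r
script <- tempfile(fileext = ".R")
writeLines(c("x + 1", "y <- sqrt(x)"), script)

# parse() checks syntax only; nothing is evaluated, so the undefined 'x'
# does not matter here
exprs <- parse(file = script, keep.source = FALSE)
length(exprs)

# A genuine syntax error is still caught
bad <- tryCatch(parse(text = "x +* 1"), error = function(e) e)
inherits(bad, "error")
```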

/Henrik


Re: [Rd] R CMD check for the R code from vignettes

2014-05-30 Thread Henrik Bengtsson
Sorry, it should be Yihui and nothing else. /Henrik

On Fri, May 30, 2014 at 10:15 AM, Henrik Bengtsson  
wrote:
> I think there are several aspects to Yihue's post and some simple
> workarounds/long solutions to the issues:
>
> 1. For the reasons argued, I would agree that 'R CMD check'
> incorrectly assumes that tangled code script should be able to run
> without errors.  Instead I think it should only check the syntax, i.e.
> that it can be parsed without errors.  If not, then Sweave may have to
> be redfined to clarify that \Sexpr{}/"inline" expressions must not
> have "side effects".
>
> 2. For other (=non-Sweave) vignette builder packages, you can already
> today define engines that do not tangle, think
> %\VignetteEngine{knitr::knitr_no_tangle}.
>
> 3. Extending on this, I'd like to propose %\VignetteTangle{no} (and/or
> false, FALSE, ...), which would tell the engine to not generate the
> "tangle" script file.  Then it is up to the vignette engine to
> acknowledge this or not, but at least we will have a standard across
> engines rather that each of us come up with their own markup for this.
>  You can also imagine that one support other types of settings, e.g.
> %\VignetteTangle{all} to include also \Sexpr{} in the tangled output.
>
> /Henrik
>
> On Fri, May 30, 2014 at 9:29 AM, Carl Boettiger  wrote:
>> Hi Yihui,
>>
>> I agree with you (and your comments in [knitr issue 784]) that it seems
>> wrong for R CMD check to be using tangle (purl, etc) as a way to check R
>> code in a vignette, when the standard and expected way to check the
>> vignette is already to knit / Sweave the vignette.
>>
>> I also agree with the perspective that the tangle function no longer plays
>> the crucial role it did when we were using noweb and C programs that
>> couldn't be compiled without tangle.
>>
>> However, I would be hesitant to see tangle removed entirely, as it is
>> occasionally a convenient way to create an R script from a dynamic
>> document.  Pure R scripts are still much more widely recognized than
>> dynamic documents, and I sometimes will just tangle out the R code because
>> a collaborator would have no idea what to do with a .Rmd file (Though
>> RStudio is certainly improving this situation).  Tangle-like functions also
>> provide a nice complement to "stitch" and friends, which make dynamic
>> documents from the ubiquitous R scripts.
>>
>> [knitr issue 784]: https://github.com/yihui/knitr/issues/784
>>
>>
>> - Carl
>>
>>
>>
>> On Fri, May 30, 2014 at 6:21 AM, Kevin Coombes 
>> wrote:
>>
>>> Hi,
>>>
>>> Unless someone is planning to change Stangle to include inline expressions
>>> (which I am *not* advocating), I think that relying on side-effects within
>>> an \Sexpr construction is a bad idea. So, my own coding style is to
>>> restrict my use of \Sexpr to calls of the form
>>> \Sexpr{show.the.value.of.this.variable}. As a result, I more-or-less
>>> believe that having R CMD check use Stangle and report an error is probably
>>> a good thing.
>>>
>>> There is a completely separate question about the relationship between
>>> Sweave/Stangle or knit/purl and literate programming that is linked to your
>>> question about whether to use Stangle on vignettes. The underlying model(s)
>>> in R have drifted away from Knuth's original conception, for some good
>>> reasons.
>>>
>>> The original goal of literate programming was to be able to explain the
>>> algorithms and data structures in the code to humans.  For that purpose, it
>>> was important to have named code chunks that you could move around, which
>>> would allow you to describe the algorithm starting from a high level
>>> overview and then drilling down into the details. From this perspective,
>>> "tangle" was critical to being able to reconstruct a program that would
>>> compile and run correctly.
>>>
>>> The vast majority of applications of Sweave/Stangle or knit/purl in modern
>>> R have a completely different goal: to produce some sort of document that
>>> describes the results of an analysis to a non-programmer or
>>> non-statistician.  For this goal, "weave" is much more important than
>>> "tangle", because the most important aspect is the ability to integrate the
>>> results (figures, tables, etc) of running the code into the document that
>>> gets passed off to the person for whom the analysis was prepared. As a
>>> result, the number of times in my daily work that I need to explicitly
>>> invoke Stangle (or purl) is many orders of magnitude smaller
>>> than the number of times that I invoke Sweave (or knitr).
>>>
>>>   -- Kevin
>>>
>>>
>>>
>>> On 5/30/2014 1:04 AM, Yihui Xie wrote:
>>>
>>>> Hi,
>>>>
>>>> Recently I saw a couple of cases in which the package vignettes were
>>>> somewhat complicated so that Stangle() (or knitr::purl() or other
>>>> tangling functions) can fail to produce the exact R code that is
>>>> executed by the weaving function Sweave() (or knitr::knit(), ...). For
>>>> example, this is a valid document that

Re: [Rd] Style question

2014-05-30 Thread Gabriel Becker
This isn't likely to make much difference in most cases, but calling a
function via :: can incur up to about twice the overhead on average
compared to calling an imported function

> fun1
function ()
file_ext("text.txt")

> fun2
function ()
tools::file_ext("text.txt")

> microbenchmark(fun1(), times=1)
Unit: microseconds
   expr    min     lq median      uq     max neval
 fun1() 24.506 25.654 26.324 27.8795 154.001     1
> microbenchmark(fun2(), times=1)
Unit: microseconds
   expr    min      lq  median      uq     max neval
 fun2() 42.723 46.6945 48.8685 52.0595 2021.91     1

Also, if one uses roxygen2 (or even if one doesn't) ##'@importFrom above
the function doing the calling documents this.

And of course, if you need to know where a function lives, environment()
will tell you.
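
A minimal sketch of that lookup, assuming only base R and the tools and
stats packages that ship with it. The helper name where_from() is
hypothetical, introduced here for illustration only:

```r
# Hypothetical helper: report the name of the environment (usually a
# package namespace) in which a closure was defined.
where_from <- function(f) environmentName(environment(f))

from_tools <- where_from(tools::file_ext)
from_stats <- where_from(stats::median)
from_tools
from_stats
```

Primitive functions have no enclosing environment, so the helper only
gives a useful answer for closures.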

~G


On Fri, May 30, 2014 at 10:00 AM, Hadley Wickham 
wrote:

> > There is at least one subtle consequence to keep in mind when doing
> > this. Of course, whatever choice you make, if the whatever() function
> > moves to a different package, this breaks your package.
> > However, if you explicitly import the function, your package will
> > break at load-time (which is good) and you'll only have to modify
> > 1 line in the NAMESPACE file to fix it. But if you do foo::whatever(),
> > your package won't break at load-time, only at run-time. Also you'll
> > have to edit all the calls to foo::whatever() to fix the package.
> >
> > Probably not a big deal, but in an environment like Bioconductor where
> > infrastructure classes and functions can be shared by hundreds of
> > packages, having people use foo::whatever() in a systematic way would
> > probably make maintenance a little bit more painful than it needs to
> > be when the need arises to reorganize/refactor parts of the
> > infrastructure. Also, the ability to quickly grep the NAMESPACE
> > files of all BioC packages to see who imports what is very convenient
> > in this situation.
>
> OTOH, I think there's a big benefit to being able to read package code
> and instantly know where a function comes from.
>
> Personally, I found this outweighs the benefits that you outline:
>
> * functions rarely move between packages, and gsubbing pkga::foo to
> pkgb::foo isn't hard
> * it's not that much harder to grep for pkg::foo in R/* than it is to
> grep NAMESPACE
>
> Hadley
>
> --
> http://had.co.nz/
>



-- 
Gabriel Becker
Graduate Student
Statistics Department
University of California, Davis



Re: [Rd] Style question

2014-05-30 Thread Gábor Csárdi
On Fri, May 30, 2014 at 6:55 PM, Hervé Pagès  wrote:
[...]

> There is at least one subtle consequence to keep in mind when doing
> this. Of course, whatever choice you make, if the whatever() function
> moves to a different package, this breaks your package.
> However, if you explicitly import the function, your package will
> break at load-time (which is good) and you'll only have to modify
> 1 line in the NAMESPACE file to fix it. But if you do foo::whatever(),
> your package won't break at load-time, only at run-time. Also you'll
> have to edit all the calls to foo::whatever() to fix the package.
>

It'll break at run-time, yes, but if you use pkg::fun and fun is not in pkg
any more, then AFAIK you'll get a warning from R CMD check.

Gabor

[...]



Re: [Rd] Style question

2014-05-30 Thread Hervé Pagès

On 05/30/2014 10:00 AM, Hadley Wickham wrote:

>> There is at least one subtle consequence to keep in mind when doing
>> this. Of course, whatever choice you make, if the whatever() function
>> moves to a different package, this breaks your package.
>> However, if you explicitly import the function, your package will
>> break at load-time (which is good) and you'll only have to modify
>> 1 line in the NAMESPACE file to fix it. But if you do foo::whatever(),
>> your package won't break at load-time, only at run-time. Also you'll
>> have to edit all the calls to foo::whatever() to fix the package.
>>
>> Probably not a big deal, but in an environment like Bioconductor where
>> infrastructure classes and functions can be shared by hundreds of
>> packages, having people use foo::whatever() in a systematic way would
>> probably make maintenance a little bit more painful than it needs to
>> be when the need arises to reorganize/refactor parts of the
>> infrastructure. Also, the ability to quickly grep the NAMESPACE
>> files of all BioC packages to see who imports what is very convenient
>> in this situation.
>
> OTOH, I think there's a big benefit to being able to read package code
> and instantly know where a function comes from.


To me this is way more readable:

  setClass("A", representation(...))
  setMethod("head", "A", function(x, ...) {...})

than this:

  methods::setClass("A", methods::representation(...))
  methods::setMethod(utils::head, "A", function(x, ...) {...})

All the :: clutter adds very little value and hurts readability.
Just a matter of taste I guess.

Also it almost never matters to me *where* a function comes from.
The only thing I find relevant when I read code is *what* a function
does and I can find out by doing ?whatever (I generally don't need
to do ?foo::whatever). If I need to try it (interactively), I do
whatever(...), not foo::whatever(...). Sometimes, ?whatever will
fail because foo's NAMESPACE is loaded but foo is not attached to my
search path. In that case, and in that case only, I need to know
*where* the function comes from so I can library() the package where
it's defined and documented, and then I can do ?whatever. But this is
a rare situation and doesn't justify systematic use of foo::whatever().

So I only reserve the use of foo::whatever() to disambiguate in case
of name collision or to call a function defined in a *suggested*
package.
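
A sketch of the suggested-package case, under the assumption that the
call is guarded with requireNamespace(). The package 'tools' merely stands
in for a package listed under Suggests so that the example runs anywhere,
and ext_or_na() is a hypothetical name:

```r
# Guard a call into a package that may not be installed.
# 'tools' is a stand-in for a suggested package; ext_or_na() is hypothetical.
ext_or_na <- function(path) {
  if (requireNamespace("tools", quietly = TRUE)) {
    tools::file_ext(path)      # explicit :: because nothing is imported
  } else {
    NA_character_              # fallback when the package is missing
  }
}

ext <- ext_or_na("text.txt")
ext
```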

Finally, now that the use of a NAMESPACE became mandatory (well, this
happened a few years ago), advocating systematic use of foo::whatever()
without explicitly importing the function sounds a little bit like an
heroic act of resistance ;-)

H.



> Personally, I found this outweighs the benefits that you outline:
>
> * functions rarely move between packages, and gsubbing pkga::foo to
> pkgb::foo isn't hard
> * it's not that much harder to grep for pkg::foo in R/* than it is to
> grep NAMESPACE
>
> Hadley



--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:(206) 667-1319



Re: [Rd] Style question

2014-05-30 Thread Hadley Wickham
> Finally, now that the use of a NAMESPACE became mandatory (well, this
> happened a few years ago), advocating systematic use of foo::whatever()
> without explicitly importing the function sounds a little bit like an
> heroic act of resistance ;-)

I don't think that's at all true - for most other programming
languages, the preferred style is to explicitly refer to functions,
including their namespace/package etc.

Hadley

-- 
http://had.co.nz/



Re: [Rd] Style question

2014-05-30 Thread Gábor Csárdi
On Fri, May 30, 2014 at 9:08 PM, Hadley Wickham  wrote:

> > Finally, now that the use of a NAMESPACE became mandatory (well, this
> > happened a few years ago), advocating systematic use of foo::whatever()
> > without explicitly importing the function sounds a little bit like an
> > heroic act of resistance ;-)
>
> I don't think that's at all true - for most other programming
> languages, the preferred style is to explicitly refer to functions,
> including their namespace/package etc.
>

I think with R the issue of having functions with the same name (but
different semantics) imported from different packages does not come up too
often. IMHO the reason for this is (partly) historical. In the past there
were no namespaces, at least they were not mandatory, and packages were
loaded and attached as a whole, so people were defensive and used very
specific function names to avoid name clashes.

I chose graph.density() over density() and chose graph.adjlist() over
adjlist(), etc. Last week I chose diff() over git_diff(), and I guess I
am not the only one with this tendency. It is just a matter of time to have
a bunch of packages with a diff() function, and then it will matter where
diff() is coming from.
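
A small illustration of that clash, using a user-level definition as a
stand-in for a package that exports its own diff(). The replacement
function is made up for the example:

```r
# A made-up diff() that masks base::diff() on the search path.
diff <- function(x, ...) "not the base diff"

masked   <- diff(c(1, 4, 9))        # finds the masking definition first
explicit <- base::diff(c(1, 4, 9))  # explicit namespace: the base version

rm(diff)                            # remove the mask again
masked
explicit
```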

Gabor


>
> Hadley
>
> --
> http://had.co.nz/
>



Re: [Rd] Style question

2014-05-30 Thread Gábor Csárdi
On Fri, May 30, 2014 at 9:17 PM, Gábor Csárdi 
wrote:

> On Fri, May 30, 2014 at 9:08 PM, Hadley Wickham 
> wrote:
>
>> > Finally, now that the use of a NAMESPACE became mandatory (well, this
>> > happened a few years ago), advocating systematic use of foo::whatever()
>> > without explicitly importing the function sounds a little bit like an
>> > heroic act of resistance ;-)
>>
>> I don't think that's at all true - for most other programming
>> languages, the preferred style is to explicitly refer to functions,
>> including their namespace/package etc.
>>
>
> I think with R the issue of having functions with the same name (but
> different semantics) imported from different packages does not come up too
> often. IMHO the reason for this is (partly) historical. In the past there
> were no namespaces, at least they were not mandatory, and packages were
> loaded and attached as a whole, so people were defensive and used very
> specific function names to avoid name clashes.
>
> I chose graph.density() over density() and chose graph.adjlist() over
> adjlist(), etc. Last week I chose diff() over git_diff(), and I guess I
> am not the only one with this tendency. It is just a matter of time to have
> a bunch of packages with a diff() function, and then it will matter where
> diff() is coming from.
>

Btw. this said, personally I still prefer importFrom(pkg, diff) and then
diff() to pkg::diff(), most of the time.

Gabor

[...]



Re: [Rd] Style question

2014-05-30 Thread Hervé Pagès

Hi Gabe,

On 05/30/2014 11:34 AM, Gabriel Becker wrote:

> This isn't likely to make much difference in most cases, but calling a
> function via :: can incur up to about twice the overhead on average
> compared to calling an imported function
>
>  > fun1
> function ()
> file_ext("text.txt")
>
>  > fun2
> function ()
> tools::file_ext("text.txt")
>
>  > microbenchmark(fun1(), times=1)
> Unit: microseconds
>    expr    min     lq median      uq     max neval
>  fun1() 24.506 25.654 26.324 27.8795 154.001     1
>  > microbenchmark(fun2(), times=1)
> Unit: microseconds
>    expr    min      lq  median      uq     max neval
>  fun2() 42.723 46.6945 48.8685 52.0595 2021.91     1


Interesting. Or with a void function so the timing more closely
reflects the time it takes to look up the symbol:

  > void
  function ()
  NULL
  

  > fun1
  function ()
  void()
  

  > fun2
  function ()
  S4Vectors::void()
  

  > microbenchmark(fun1(), times=1)
  Unit: nanoseconds
     expr min  lq median  uq   max neval
   fun1() 261 268    270 301 11960     1
  > microbenchmark(fun2(), times=1)
  Unit: microseconds
     expr    min     lq median     uq      max neval
   fun2() 13.486 14.918 15.782 16.753 60542.19     1

S4Vectors::void() is about 60x slower than void()!

Cheers,
H.



> Also, if one uses roxygen2 (or even if one doesn't) ##'@importFrom above
> the function doing the calling documents this.
>
> And of course, if you need to know where a function lives, environment()
> will tell you.
>
> ~G
>
>
> On Fri, May 30, 2014 at 10:00 AM, Hadley Wickham wrote:
>
>>> There is at least one subtle consequence to keep in mind when doing
>>> this. Of course, whatever choice you make, if the whatever() function
>>> moves to a different package, this breaks your package.
>>> However, if you explicitly import the function, your package will
>>> break at load-time (which is good) and you'll only have to modify
>>> 1 line in the NAMESPACE file to fix it. But if you do foo::whatever(),
>>> your package won't break at load-time, only at run-time. Also you'll
>>> have to edit all the calls to foo::whatever() to fix the package.
>>>
>>> Probably not a big deal, but in an environment like Bioconductor where
>>> infrastructure classes and functions can be shared by hundreds of
>>> packages, having people use foo::whatever() in a systematic way would
>>> probably make maintenance a little bit more painful than it needs to
>>> be when the need arises to reorganize/refactor parts of the
>>> infrastructure. Also, the ability to quickly grep the NAMESPACE
>>> files of all BioC packages to see who imports what is very convenient
>>> in this situation.
>>
>> OTOH, I think there's a big benefit to being able to read package code
>> and instantly know where a function comes from.
>>
>> Personally, I found this outweighs the benefits that you outline:
>>
>> * functions rarely move between packages, and gsubbing pkga::foo to
>> pkgb::foo isn't hard
>> * it's not that much harder to grep for pkg::foo in R/* than it is to
>> grep NAMESPACE
>>
>> Hadley
>>
>> --
>> http://had.co.nz/
>
> --
> Gabriel Becker
> Graduate Student
> Statistics Department
> University of California, Davis


--
Hervé Pagès

Program in Computational Biology
Division of Public Health Sciences
Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N, M1-B514
P.O. Box 19024
Seattle, WA 98109-1024

E-mail: hpa...@fhcrc.org
Phone:  (206) 667-5791
Fax:(206) 667-1319



Re: [Rd] Style question

2014-05-30 Thread Winston Chang
Using `::` does add some overhead - on the order of 5-10 microseconds
on my computer. Still, it would take 100,000 calls to add 0.5-1 second
of delay.

microbenchmark(
  base::identity(1),
  identity(1),
  unit = "us"
)
# Unit: microseconds
#               expr   min     lq median     uq    max neval
#  base::identity(1) 5.677 6.2180 6.6695 7.3655 60.104   100
#        identity(1) 0.262 0.2965 0.3210 0.4035  1.034   100

This test isn't exactly like putting identity in imports, since in
this case, the number of environments to search is greater -- but it's
reasonably close.

If you're in a situation where you want to be explicit about where a
function came from, but the slowness of `::` is an issue, you could
create a variable that points to the environment and access the
function using $:

base <- as.environment('package:base')
microbenchmark(
  base::identity(1),
  base$identity(1),
  identity(1),
  unit = "us"
)
# Unit: microseconds
#               expr   min     lq median     uq    max neval
#  base::identity(1) 5.520 6.0795 6.4485 7.0020 32.232   100
#   base$identity(1) 0.504 0.5940 0.6635 0.8105  7.701   100
#        identity(1) 0.248 0.2815 0.3100 0.3885  7.925   100
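
A related option, which is a different technique from the environment
lookup above, is to resolve the function once and reuse a local binding.
The sketch below uses only base R; the name base_identity and the loop
count are arbitrary choices for illustration, and the timings will vary by
machine and R version:

```r
# Resolve the namespaced function once, then call the local binding.
base_identity <- base::identity

n <- 1e5  # arbitrary number of calls for a rough comparison
t_colon <- system.time(for (i in seq_len(n)) base::identity(1))[["elapsed"]]
t_local <- system.time(for (i in seq_len(n)) base_identity(1))[["elapsed"]]

# Elapsed seconds for n calls each; no particular values are implied.
c(colon = t_colon, local = t_local)
```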


-Winston

On Fri, May 30, 2014 at 2:53 PM, Hervé Pagès  wrote:
> Hi Gabe,
>
>
> On 05/30/2014 11:34 AM, Gabriel Becker wrote:
>>
>> This isn't likely to make much difference in most cases, but calling a
>> function via :: can incur up to about twice the overhead on average
>> compared to calling an imported function
>>
>>  > fun1
>> function ()
>> file_ext("text.txt")
>> 
>>  > fun2
>> function ()
>> tools::file_ext("text.txt")
>> 
>>  > microbenchmark(fun1(), times=1)
>> Unit: microseconds
>>    expr    min     lq median      uq     max neval
>>  fun1() 24.506 25.654 26.324 27.8795 154.001     1
>>  > microbenchmark(fun2(), times=1)
>> Unit: microseconds
>>    expr    min      lq  median      uq     max neval
>>  fun2() 42.723 46.6945 48.8685 52.0595 2021.91     1
>
>
> Interesting. Or with a void function so the timing more closely
> reflects the time it takes to look up the symbol:
>
>   > void
>   function ()
>   NULL
>   
>
>   > fun1
>   function ()
>   void()
>   
>
>   > fun2
>   function ()
>   S4Vectors::void()
>   
>
>   > microbenchmark(fun1(), times=1)
>   Unit: nanoseconds
>      expr min  lq median  uq   max neval
>    fun1() 261 268    270 301 11960     1
>   > microbenchmark(fun2(), times=1)
>   Unit: microseconds
>      expr    min     lq median     uq      max neval
>    fun2() 13.486 14.918 15.782 16.753 60542.19     1
>
> S4Vectors::void() is about 60x slower than void()!
>
> Cheers,
> H.
>
>>
>> Also, if one uses roxygen2 (or even if one doesn't) ##'@importFrom above
>> the function doing the calling documents this.
>>
>> And of course if you need to know where a function lives environment
>> will tell you.
>>
>> ~G
>>
>>
>> On Fri, May 30, 2014 at 10:00 AM, Hadley Wickham wrote:
>>
>>  > There is at least one subtle consequence to keep in mind when doing
>>  > this. Of course, whatever choice you make, if the whatever() function
>>  > moves to a different package, this breaks your package.
>>  > However, if you explicitly import the function, your package will
>>  > break at load-time (which is good) and you'll only have to modify
>>  > 1 line in the NAMESPACE file to fix it. But if you do foo::whatever(),
>>  > your package won't break at load-time, only at run-time. Also you'll
>>  > have to edit all the calls to foo::whatever() to fix the package.
>>  >
>>  > Probably not a big deal, but in an environment like Bioconductor where
>>  > infrastructure classes and functions can be shared by hundreds of
>>  > packages, having people use foo::whatever() in a systematic way would
>>  > probably make maintenance a little bit more painful than it needs to
>>  > be when the need arises to reorganize/refactor parts of the
>>  > infrastructure. Also, the ability to quickly grep the NAMESPACE
>>  > files of all BioC packages to see who imports what is very convenient
>>  > in this situation.
>>
>> OTOH, I think there's a big benefit to being able to read package code
>> and instantly know where a function comes from.
>>
>> Personally, I found this outweighs the benefits that you outline:
>>
>> * functions rarely move between packages, and gsubbing pkga::foo to
>> pkgb::foo isn't hard
>> * it's not that much harder to grep for pkg::foo in R/* than it is to
>> grep NAMESPACE
>>
>> Hadley
>>
>> --
>> http://had.co.nz/
>>
>>
>>
>>
>> --
>> Gabriel Becker
>> Graduate Student
>> Statistics Department
>> University of California, Davis
>
>
> --
> Hervé Pagès

Re: [Rd] R CMD check for the R code from vignettes

2014-05-30 Thread Yihui Xie
Hi Kevin,

Personally I also avoid code that has side effects in the inline
expressions, but I think there are legitimate use cases in which
inline expressions have side effects. This discussion was motivated by
Carl's knitcitations package, as well as another question on
StackOverflow (http://stackoverflow.com/q/23927325/559676).
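
To make the failure concrete, here is a sketch that tangles the small
document from my first post and then tries to source the result. It
assumes only the utils package shipped with R; the temporary file names
and the choice to evaluate in a fresh environment whose parent is
baseenv() (so that no stray x is visible) are illustrative choices:

```r
# Write the small Sweave document from this thread to a temporary file.
rnw <- tempfile(fileext = ".Rnw")
writeLines(c(
  "\\documentclass{article}",
  "\\begin{document}",
  "Assign 1 to x: \\Sexpr{x <- 1}",
  "<<>>=",
  "x + 1",
  "@",
  "\\end{document}"
), rnw)

# Tangle it; the inline \Sexpr{} code is not written to the script.
script <- tempfile(fileext = ".R")
utils::Stangle(rnw, output = script, quiet = TRUE)
tangled <- readLines(script)

# Source the script where no 'x' is defined and capture any error message.
result <- tryCatch(
  {
    source(script, local = new.env(parent = baseenv()))
    "ok"
  },
  error = function(e) conditionMessage(e)
)
result
```

The tangled script contains the chunk x + 1 but not the assignment, so
sourcing it on its own is expected to stop with an error about x.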

I'm aware of the distinction between the original literate programming
paradigm and the one in R (that is why I said "literate programming in
R" instead of "literate programming in general"). In R, weave actually
does what both weave and tangle do in the original paradigm -- there
is no need to tangle the document to get the computer code so that we
can execute it.

To Carl: I agree that it is a little extreme to drop tangle entirely,
so I think at least knitr::purl() will stay there in the foreseeable
future. I tend to adopt Henrik's idea, i.e., to provide vignette
engines that just ignore tangle. At the moment, it seems R CMD check
is comfortable with vignettes that do not have corresponding R
scripts, and I hope these R scripts will not become mandatory in the
future.

Thanks everyone for your comments!

Regards,
Yihui
--
Yihui Xie 
Web: http://yihui.name


On Fri, May 30, 2014 at 8:21 AM, Kevin Coombes
 wrote:
> Hi,
>
> Unless someone is planning to change Stangle to include inline expressions
> (which I am *not* advocating), I think that relying on side-effects within
> an \Sexpr construction is a bad idea. So, my own coding style is to restrict
> my use of \Sexpr to calls of the form
> \Sexpr{show.the.value.of.this.variable}. As a result, I more-or-less believe
> that having R CMD check use Stangle and report an error is probably a good
> thing.
>
> There is a completely separate question about the relationship between
> Sweave/Stangle or knit/purl and literate programming that is linked to your
> question about whether to use Stangle on vignettes. The underlying model(s)
> in R have drifted away from Knuth's original conception, for some good
> reasons.
>
> The original goal of literate programming was to be able to explain the
> algorithms and data structures in the code to humans.  For that purpose, it
> was important to have named code chunks that you could move around, which
> would allow you to describe the algorithm starting from a high level
> overview and then drilling down into the details. From this perspective,
> "tangle" was critical to being able to reconstruct a program that would
> compile and run correctly.
>
> The vast majority of applications of Sweave/Stangle or knit/purl in modern R
> have a completely different goal: to produce some sort of document that
> describes the results of an analysis to a non-programmer or
> non-statistician.  For this goal, "weave" is much more important than
> "tangle", because the most important aspect is the ability to integrate the
> results (figures, tables, etc) of running the code into the document that
> gets passed off to the person for whom the analysis was prepared. As a
> result, the number of times in my daily work that I need to explicitly
> invoke Stangle (or purl) is many orders of magnitude smaller than
> the number of times that I invoke Sweave (or knitr).
>
>   -- Kevin
>
>
>
> On 5/30/2014 1:04 AM, Yihui Xie wrote:
>>
>> Hi,
>>
>> Recently I saw a couple of cases in which the package vignettes were
>> somewhat complicated so that Stangle() (or knitr::purl() or other
>> tangling functions) can fail to produce the exact R code that is
>> executed by the weaving function Sweave() (or knitr::knit(), ...). For
>> example, this is a valid document that can pass the weaving process
>> but cannot generate a valid R script to be source()d:
>>
>> \documentclass{article}
>> \begin{document}
>> Assign 1 to x: \Sexpr{x <- 1}
>> <<>>=
>> x + 1
>> @
>> \end{document}
>>
>> That is because the inline R code is not written to the R script
>> during the tangling process. When an R package vignette contains
>> inline R code expressions that have significant side effects, R CMD
>> check can fail because the tangled output is not correct. What I
>> showed here is only a trivial example, and I have seen two packages
>> that have more complicated scenarios than this. Anyway, the key thing
>> that I want to discuss here is, since the R code in the vignette has
>> been executed once during the weaving process, does it make much sense
>> to execute the code generated from the tangle function? In other
>> words, if the weaving process has succeeded, is it necessary to
>> source() the R script again?
>>
>> The two options here are:
>>
>> 1. Do not check the R code from vignettes;
>> 2. Or fix the tangle function so that it produces exactly what was
>> executed in the weaving process. If this is done, I'm back to my
>> previous question: does it make sense to run the code twice?
>>
>> To push this a little further, personally I do not quite appreciate
>> literate programming in R as two separate steps, namely weave and
>>