[Rd] Detecting bad lexical scoping
I've recently hunted down a troublesome bug in my own code, and am looking for an easy mechanism to detect this kind of error in other R code. The problem was an undefined variable inside a function. Unfortunately, R looked for that variable in the global environment and found it, since there was a variable with that name in my testing scripts (note to self: do not name things "x").

Is there an easy way to report all the non-local objects accessed in a function?

Is there any easy way around the usual scoping rules? I want to be able to get to base functions, and am willing to namespace:: or ::: access all of my own functions (it's in a package) if necessary to block the standard scoping rules. The language def section on environments made me hurt.

Thanks,
C Ryan King
Dept Health Studies, University of Chicago
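A minimal reproduction of the kind of bug described here (the names are illustrative):

  x <- 10                  # leftover from a testing script
  f <- function(y) y + x   # 'x' was meant to be defined inside f
  f(1)                     # silently returns 11 instead of signalling an error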
Re: [Rd] Detecting bad lexical scoping
On 16/03/2011 4:14 PM, Ryan King wrote:

> Is there an easy way to report all the non-local objects accessed in a
> function?

The problem is that all base functions are non-local objects, so you probably don't want to do exactly that, but the codetools package can do it (using codetools::findGlobals) and also provides help in identifying likely errors. You say you keep your code in a package (which is a good idea), so you should run "R CMD check" on the package. If your code is not in a package, use codetools::checkUsage or related functions.

> Is there any easy way around the usual scoping rules? I want to be
> able to get to base functions, and am willing to namespace:: or :::
> access all of my own functions (it's in a package) if necessary to
> block the standard scoping rules. The language def section on
> environments made me hurt.

Quoting Uwe Ligges: "No".

Duncan Murdoch
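A minimal sketch of the codetools route mentioned above (codetools ships with R; the function 'f' is illustrative):

  library(codetools)
  f <- function(y) y + x   # 'x' is not defined locally
  findGlobals(f)           # reports all non-locals: "+" and "x"
  checkUsage(f)            # warns: no visible binding for global variable 'x'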
Re: [Rd] Feature request: display file name in R CMD check warning
On 16/03/2011 7:55 PM, Dan Tenenbaum wrote:

> Hi,
>
> I came across the following warning in R CMD check (it only occurred
> on Windows):
>
>   The \usage entries for S3 methods should use the \method markup and
>   not their full name.
>   See the chapter 'Writing R documentation files' in manual 'Writing R
>   Extensions'.
>
> The package I'm looking at is one that I did not write, and it has 34
> .Rd files. This warning does not tell me which file to look in. It
> would be very helpful if it did. The same goes for other
> warnings/errors produced by R CMD check.
>
> I also tried to narrow it down with tools::checkRd(), but as I
> mentioned earlier, this function does not produce the same output as
> R CMD check, and the same thing happens in the case of the warning
> above: I ran checkRd() on all 34 .Rd files and did not see it.
>
> I'd worry that people might give up and not try to find the source of
> warnings if it proves too difficult.
>
> Thanks for considering this idea...

That sounds like a good idea. It would be easier to experiment with if you provided the offending file. (Yes, the lack of a name makes it hard to find that file, but it's easier for you than for almost anyone else.)

Duncan Murdoch
Re: [Rd] Feature request: display file name in R CMD check warning
On 16/03/2011 7:55 PM, Dan Tenenbaum wrote:

> I came across the following warning in R CMD check (it only occurred
> on Windows):
>
>   The \usage entries for S3 methods should use the \method markup and
>   not their full name.
>
> The package I'm looking at is one that I did not write, and it has 34
> .Rd files. This warning does not tell me which file to look in.

I was unable to duplicate this. When I tried it by messing up one of the man pages in the ellipse package, I got this:

  S3 methods shown with full name in documentation object 'ellipse.glm':
    ellipse.glm

  The \usage entries for S3 methods should use the \method markup and
  not their full name.
  See the chapter 'Writing R documentation files' in manual 'Writing R
  Extensions'.

"Documentation object 'ellipse.glm'" tells me the \name{} inside the .Rd file, which is enough to uniquely identify the file. Are you not seeing this part of the message?

Duncan Murdoch
Re: [Rd] Detecting bad lexical scoping
On Wed, Mar 16, 2011 at 4:14 PM, Ryan King wrote:

> Is there any easy way around the usual scoping rules? I want to be
> able to get to base functions, and am willing to namespace:: or :::
> access all of my own functions (it's in a package) if necessary to
> block the standard scoping rules. The language def section on
> environments made me hurt.

One way is discussed in the Feb 27, 2010 news item on the proto home page: http://r-proto.googlecode.com

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com
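One general way to block lookup in the global environment -- a hedged sketch, not necessarily the technique from the proto news item -- is to re-parent the function's enclosing environment so that only base is visible:

  f <- function(y) y + x
  environment(f) <- new.env(parent = baseenv())
  f(1)   # Error: object 'x' not found, even if 'x' exists in the global env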
Re: [Rd] Standardized Pearson residuals (and score tests)
On Mar 16, 2011, at 23:29 , Gordon K Smyth wrote:

> Hi Peter and others,
>
> If it helps, I wrote a small function glm.scoretest() for the statmod
> package on CRAN to compute score tests from glm fits. The score test
> for adding a covariate, or any set of covariates, can be extracted
> very neatly from the standard glm output, although you probably
> already know that.

Thanks Gordon, I'll have a look. It's the kind of thing where you _strongly suspect_ that a neat solution exists, but where you can't just write it down immediately. It looks like your code needs some elaboration to handle factor terms and more general model reductions, though.

-pd

> Regards
> Gordon
>
> Professor Gordon K Smyth,
> NHMRC Senior Research Fellow,
> Bioinformatics Division,
> Walter and Eliza Hall Institute of Medical Research,
> 1G Royal Parade, Parkville, Vic 3052, Australia.
> sm...@wehi.edu.au
> http://www.wehi.edu.au
> http://www.statsci.org/smyth
>
>> Date: Tue, 15 Mar 2011 12:17:46 +0100
>> From: peter dalgaard
>> To: Brett Presnell
>> Cc: r-devel@r-project.org
>> Subject: Re: [Rd] Standardized Pearson residuals
>>
>> On Mar 15, 2011, at 04:40 , Brett Presnell wrote:
>>
>>> Background: I'm currently teaching an undergrad/grad-service course
>>> from Agresti's "Introduction to Categorical Data Analysis (2nd edn)"
>>> and deviance residuals are not used in the text. For now I'll just
>>> provide the students with a simple function to use, but I prefer to
>>> use R's native capabilities whenever possible.
>>>
>>>> Incidentally, chisq.test will have a stdres component in 2.13.0 for
>>>> much the same reason.
>>>
>>> Thank you. That's one more thing I won't have to provide code for
>>> anymore. Coincidentally, Agresti mentioned this to me a week or two
>>> ago as something that he felt was missing, so that's at least two
>>> people who will be happy to see this added.
>>
>> And of course, I was teaching a course based on Agresti & Franklin,
>> "Statistics, The Art and Science of Learning from Data", when I
>> realized that R was missing standardized residuals.
>>
>>> It would also be nice for teaching purposes if glm or summary.glm
>>> had a "pearsonchisq" component and a corresponding extractor
>>> function, but I can imagine that there might be arguments against it
>>> that haven't occurred to me. Plus, I doubt that anyone wants to
>>> touch glm unless it's to repair a bug. If I'm wrong about all that
>>> though, ...
>>
>> Hmm, how would that work? If there was one, I'd worry that people
>> would start subtracting them, which is usually not the right thing to
>> do. I do miss having a test on the residual deviance occasionally
>> (even though it is only sometimes meaningful); having to fit a
>> saturated model explicitly can be a bit silly. E.g. in this case
>> (homogeneity of birth rates):
>>
>>   > anova(glm(births~month,poisson,data=bb), test="Chisq")
>>   ...
>>         Df Deviance Resid. Df Resid. Dev  P(>|Chi|)
>>   NULL                  11     225.98
>>   month 11   225.98      0       0.00  < 2.2e-16 ***
>>
>>   > anova(glm(births~1,poisson,data=bb), test="Chisq")
>>   ...
>>         Df Deviance Resid. Df Resid. Dev  P(>|Chi|)
>>   NULL                  11     225.98
>>
>> Notice that the latter version gives me the correct deviance but no
>> p-value.
>>
>> Better support for generic score tests could be desirable too. I
>> suspect that this would actually be the Pearson chi-square in the
>> interesting cases.

--
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com
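For completeness, the missing p-value in the second call above can be computed by hand; a minimal sketch, assuming the same (unshown) 'bb' data:

  fit <- glm(births ~ 1, poisson, data = bb)
  pchisq(deviance(fit), df.residual(fit), lower.tail = FALSE)  # test of residual deviance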
[Rd] New feature for download.packages(): optional resolution of package dependencies
Hi r-devels,

may I ask for an enhancement to download.packages() so that it optionally resolves package dependencies, similarly to the respective functionality in install.packages()?

This would be a major help in compiling a large number of packages (e.g. by means of download.view() from pkg ctv) for later offline installations.

Last November, I raised this issue with Duncan Murdoch offline, and Duncan seconded me, so the idea might not be that silly. He pointed me to available.packages(), which already provides a dependency list, although that list would have to be parsed. AFAICS in the svn, as of rev54842, he has not found the time to look deeper into this so far. Surely, like most of you, he has had more urgent issues to work on, but maybe someone (else) among you has an idea for an easy but still sustainable solution.

Any suggestions appreciated.

Cheers,
Peter
Re: [Rd] Standardized Pearson residuals
On Mar 16, 2011, at 23:34 , John Maindonald wrote:

> One can easily test for the binary case and not give the statistic in
> that case. Warning if expected cell counts < 5 would be another
> possibility.
>
> A general point is that if one gave no output that was not open to
> abuse, there'd be nothing given at all! One would not be giving any
> output at all from poisson or binomial models, given that data that
> really calls for quasi links (or a glmm with observation-level random
> effects) is in my experience the rule rather than the exception!

Hmmm. Not sure I agree with that entirely, but that's a different discussion.

> At the very least, why not a function dispersion() or
> pearsonchisquare() that gives this information.

Lots of options here. Offhand, my preference would go to something like anova(..., test="score") and/or an extra line in summary(). It's not a computationally intensive item as far as I can see; it's more about "output real estate" -- how "SAS-like" do we want to become?

> Apologies that I misattributed this.

Never mind...

Back to the original question: the current rstandard() code reads

  ## FIXME ! -- make sure we are following "the literature":
  rstandard.glm <- function(model,
                            infl = lm.influence(model, do.coef = FALSE), ...)
  {
      res <- infl$wt.res # = "dev.res" really
      res <- res / sqrt(summary(model)$dispersion * (1 - infl$hat))
      res[is.infinite(res)] <- NaN
      res
  }

which "svn blame" attributes to ripley, but that is due to the 2003 code reorganization (except for the infinity check from 2005). So apparently we have had that FIXME since forever... and finding its author appears to be awkward (Maechler, perhaps?).

I did try Brett's code in lieu of the above (with a mod to handle $dispersion) and even switched the default to use the Pearson residuals. make check-devel sailed straight through apart from the obvious code/doc mismatch, so we don't have any checks in place nor any examples using rstandard(). I rather strongly suspect that there aren't many user codes using it either.

It is quite tempting simply to commit the change (after updating the docs). One thing holding me back, though: I don't know what "the literature" refers to.

--
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com
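For concreteness, the Pearson variant under discussion would presumably look something like the following sketch (not committed code; it simply mirrors the deviance version above):

  rstandard_pearson <- function(model,
                                infl = lm.influence(model, do.coef = FALSE)) {
      res <- residuals(model, type = "pearson")
      res / sqrt(summary(model)$dispersion * (1 - infl$hat))
  }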
Re: [Rd] Standardized Pearson residuals
> peter dalgaard, on Thu, 17 Mar 2011 15:45:01 +0100, writes:
>
> Back to the original question: the current rstandard() code reads
>
>   ## FIXME ! -- make sure we are following "the literature":
>   rstandard.glm <- function(model,
>                             infl = lm.influence(model, do.coef = FALSE), ...)
>   {
>       res <- infl$wt.res # = "dev.res" really
>       res <- res / sqrt(summary(model)$dispersion * (1 - infl$hat))
>       res[is.infinite(res)] <- NaN
>       res
>   }
>
> which "svn blame" attributes to ripley, but that is due to the 2003
> code reorganization (except for the infinity check from 2005). So
> apparently we have had that FIXME since forever... and finding its
> author appears to be awkward (Maechler, perhaps?).

yes, almost surely

> I did try Brett's code in lieu of the above (with a mod to handle
> $dispersion) and even switched the default to use the Pearson
> residuals. make check-devel sailed straight through apart from the
> obvious code/doc mismatch, so we don't have any checks in place nor
> any examples using rstandard(). I rather strongly suspect that there
> aren't many user codes using it either.
>
> It is quite tempting simply to commit the change (after updating the
> docs). One thing holding me back, though: I don't know what "the
> literature" refers to.

Well, "the relevant publications on the topic"... and now define that (e.g. using the three 'References' on the help page). Really, that's what I think I meant when I (think I) wrote that FIXME.

The point then, I think, was that we had code "donations", and they partly were clearly providing functionality that was (tested) "correct" (according to e.g. McCullagh & Nelder and probably another one or two textbooks I would have consulted... no large Wikipedia back then), but they also provided things for which there was nothing in "the literature"; as the author provided them with other good code, we would have put those in as well.

== my vague recollection from the past

Martin
Re: [Rd] New feature for download.packages(): optional resolution of package dependencies
On Mar 17, 2011, at 10:26 AM, Dr. Peter Ruckdeschel wrote:

> may I ask for an enhancement to download.packages() so that it
> optionally resolves package dependencies, similarly to the respective
> functionality in install.packages()?
>
> He pointed me to available.packages(), which already provides a
> dependency list, although that list would have to be parsed.

But you don't have to do it yourself - the code is already there, try

  utils:::getDependencies("foo",,available.packages())

That said, just adding something along the lines of

  if (!missing(dependencies))
      pkg <- getDependencies(pkg, dependencies, available)

might be simple enough and do the trick...

Cheers,
Simon
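Putting the two pieces together, a sketch of the combined workflow (utils:::getDependencies is unexported, so its interface is not guaranteed; "foo" and the destination directory are placeholders):

  av   <- available.packages()
  pkgs <- utils:::getDependencies("foo", available = av)  # "foo" plus its dependencies
  download.packages(pkgs, destdir = "offline-repo", available = av)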
Re: [Rd] Feature request: display file name in R CMD check warning
On Thu, Mar 17, 2011 at 2:36 AM, Duncan Murdoch wrote:

> I was unable to duplicate this. When I tried it by messing up one of
> the man pages in the ellipse package, I got this:
>
>   S3 methods shown with full name in documentation object 'ellipse.glm':
>     ellipse.glm
>
> "Documentation object 'ellipse.glm'" tells me the \name{} inside the
> .Rd file, which is enough to uniquely identify the file. Are you not
> seeing this part of the message?

No, I'm not.

I still can't identify the offending file, but you can download the whole package:

  svn --username readonly --password readonly export \
    https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/affyILM/

Then run R CMD check on this package. You may need to install dependencies.

Thanks
Dan
Re: [Rd] Feature request: display file name in R CMD check warning
On Thu, Mar 17, 2011 at 9:33 AM, Dan Tenenbaum wrote:

> I still can't identify the offending file, but you can download the
> whole package:
>
>   svn --username readonly --password readonly export \
>     https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/affyILM/
>
> Then run R CMD check on this package. You may need to install
> dependencies.

Oops, sorry, this is the wrong package. The one that produces this warning is:

  svn --username readonly --password readonly export \
    https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/bgafun

This warning only occurs under Windows.

Dan
Re: [Rd] Standardized Pearson residuals
On Mar 17, 2011, at 16:14 , Martin Maechler wrote:

>> It is quite tempting simply to commit the change (after updating the
>> docs). One thing holding me back, though: I don't know what "the
>> literature" refers to.
>
> Well, "the relevant publications on the topic"... and now define that
> (e.g. using the three 'References' on the help page).

I count 5 actually... IIRC, the first two do not deal with glm diagnostics. The last two are by Fox, and, presumably, he is around to chime in if he wants. The middle one, by Williams, does define both standardized Pearson and standardized deviance residuals.

Or did you mean the three on ?glm.summaries? I would assume Davison and Snell to be the operative one, but I don't have it to hand.

Anyway, given that the default for residuals.glm is deviance residuals, I suppose that rstandard.glm should have the same default for consistency, and that is also the least disruptive variant. I see no reason not to make standardized Pearson residuals an option.

--
Peter Dalgaard
Center for Statistics, Copenhagen Business School
Solbjerg Plads 3, 2000 Frederiksberg, Denmark
Phone: (+45)38153501
Email: pd@cbs.dk  Priv: pda...@gmail.com
Re: [Rd] Standardized Pearson residuals
Dear Peter and Martin,

On Thu, 17 Mar 2011 18:08:18 +0100, peter dalgaard wrote:

> I count 5 actually... IIRC, the first two do not deal with glm
> diagnostics. The last two are by Fox, and, presumably, he is around to
> chime in if he wants. The middle one, by Williams, does define both
> standardized Pearson and standardized deviance residuals.

Though I don't have it in front of me, I recall that my Applied Regression text follows Williams and defines both standardized deviance and standardized Pearson residuals. As well, there are new editions of both these sources:

Fox, J. (2008). Applied Regression Analysis and Generalized Linear Models, Second Edition. Sage.

Fox, J. and S. Weisberg (2011). An R Companion to Applied Regression, Second Edition. Sage.

I'd take Williams as the definitive reference. I'll send a follow-up message if my memory proves faulty.

Best,
John

--
John Fox
Sen. William McMaster Prof. of Social Statistics
Department of Sociology
McMaster University
Hamilton, Ontario, Canada
http://socserv.mcmaster.ca/jfox/
Re: [Rd] Standardized Pearson residuals
John Maindonald writes:

> One can easily test for the binary case and not give the statistic in
> that case.
>
> A general point is that if one gave no output that was not open to
> abuse, there'd be nothing given at all!

Thanks John. I've been reluctant to push too hard for this on r-devel, but this is more or less the point I made to Peter in a private email. My example was the deviance, which is just as open to abuse as Pearson's chi-square (except that differences of deviances for nested, non-saturated models are usually ok), but which print.glm() and summary.glm() proudly display at every opportunity. If we were overly concerned about preventing abuse, we might just provide the (log-)likelihood, so that users would at least have to take some (trivial) action to compute the "dangerous" deviance statistic.

Obviously the question is one of balancing utility against the likelihood of abuse, and apparently you and I would agree that the utility of having Pearson's chi-square available as a matter of course outweighs the dangers. I don't think it would be too difficult to get Peter to agree, but I suspect that there would likely be a strong general reluctance to change glm() and/or summary.glm(). This could be avoided by just providing a convenience function, as you suggest, without necessarily changing any existing function. This would put Pearson's chi-square on a more nearly equal R-footing with the deviance as a goodness-of-fit statistic, which I don't think anyone would argue with, while still leaving the user with a bit to do to get a p-value, say.

One of Peter's other posts suggests, perhaps unintentionally, a functionality that would actually return the results of a goodness-of-fit test, though possibly using only the deviance. This might indeed be useful, but I would still like to have an easy, standardized way to just get Pearson's chi-square statistic.

Assuming a situation in which either is appropriate, I tend to prefer Pearson's chi-square over the deviance for testing goodness-of-fit, because I think that in marginal cases its null distribution is usually better approximated by the chi-square (if you think I'm wrong about this, please let me know). Pearson's chi-square also has the advantage of being much easier to explain as a goodness-of-fit statistic than the deviance, leaving everything else about the deviance to the realm of likelihood-ratio tests, where it mostly belongs. My original feature request was, after all, mostly about having a simple, standardized way in R for "naive users" (my students) to do some fairly standard things.

> One would not be giving any output at all from poisson or binomial
> models, given that data that really calls for quasi links (or a glmm
> with observation-level random effects) is in my experience the rule
> rather than the exception!
>
> At the very least, why not a function dispersion() or
> pearsonchisquare() that gives this information.
>
> Apologies that I misattributed this.
>
> John Maindonald          email: john.maindon...@anu.edu.au
> phone: +61 2 (6125)3473  fax: +61 2 (6125)5549
> Centre for Mathematics & Its Applications, Room 1194,
> John Dedman Mathematical Sciences Building (Building 27)
> Australian National University, Canberra ACT 0200.
> http://www.maths.anu.edu.au/~johnm
>
> On 16/03/2011, at 12:41 AM, peter dalgaard wrote:
>
>> On Mar 15, 2011, at 13:42 , John Maindonald wrote:
>>
>>> Peter Dalgaard: It would also be nice for teaching purposes if glm
>>> or summary.glm had a "pearsonchisq" component and a corresponding
>>> extractor function, but I can imagine that there might be arguments
>>> against it that haven't occurred to me. Plus, I doubt that anyone
>>> wants to touch glm unless it's to repair a bug. If I'm wrong about
>>> all that though, ...
>>
>> Umm, that was Brett, actually.
>>
>>> This would remedy what I have long judged a deficiency in
>>> summary.glm(). The information is important for diagnostic purposes.
>>> One should not have to fit a model with a quasi error, or suss out
>>> how to calculate the Pearson chi square from the glm model object,
>>> to discover that the information in the model object is inconsistent
>>> with simple binomial or poisson assumptions.
>>
>> It could be somewhere between useless and misleading in cases like
>> binary logistic regression though. (Same thing goes for the test
>> against the saturated model: sometimes it makes sense and sometimes
>> not.)
>
> From: Brett Presnell
> Date: 15 March 2011 2:40:29 PM AEDT
> To: pete
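As a concrete illustration of the "bit to do": Pearson's chi-square and a goodness-of-fit p-value can be computed from any glm fit in a couple of lines (using the Dobson example data from ?glm):

  counts    <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
  outcome   <- gl(3, 1, 9)
  treatment <- gl(3, 3)
  fit <- glm(counts ~ outcome + treatment, family = poisson())
  X2  <- sum(residuals(fit, type = "pearson")^2)    # Pearson chi-square
  pchisq(X2, df.residual(fit), lower.tail = FALSE)  # goodness-of-fit p-value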
Re: [Rd] Standardized Pearson residuals (and score tests)
On Thu, 17 Mar 2011, peter dalgaard wrote:

> On Mar 16, 2011, at 23:29 , Gordon K Smyth wrote:
>
>> If it helps, I wrote a small function glm.scoretest() for the statmod
>> package on CRAN to compute score tests from glm fits. The score test
>> for adding a covariate, or any set of covariates, can be extracted
>> very neatly from the standard glm output, although you probably
>> already know that.
>
> Thanks Gordon, I'll have a look. It's the kind of thing where you
> _strongly suspect_ that a neat solution exists, but where you can't
> just write it down immediately. It looks like your code needs some
> elaboration to handle factor terms and more general model reductions,
> though.

Yes, the glm.scoretest() function is very basic. At the moment it tests for adding a single covariate at a time, i.e., a 1-df test, or several 1-df tests. If you like, I could add a multiple-column version that would work for factors etc; it would only be a line or so more. I figure you'd want to pull code out of glm.scoretest() rather than call it explicitly anyway.

Gordon

--
Professor Gordon K Smyth,
NHMRC Senior Research Fellow,
Bioinformatics Division,
Walter and Eliza Hall Institute of Medical Research,
1G Royal Parade, Parkville, Vic 3052, Australia.
sm...@wehi.edu.au
http://www.wehi.edu.au
http://www.statsci.org/smyth
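A hedged usage sketch of glm.scoretest() as it currently stands (statmod assumed installed; the data are simulated purely for illustration):

  library(statmod)
  set.seed(1)
  x1 <- rnorm(50); x2 <- rnorm(50)
  y  <- rpois(50, exp(0.5 + x1))
  fit <- glm(y ~ x1, family = poisson())
  glm.scoretest(fit, x2)   # z-statistic for the 1-df score test of adding x2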
Re: [Rd] Feature request: display file name in R CMD check warning
On 11-03-17 12:33 PM, Dan Tenenbaum wrote:

> I still can't identify the offending file, but you can download the
> whole package:
>
>   svn --username readonly --password readonly export \
>     https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/affyILM/
>
> Then run R CMD check on this package. You may need to install
> dependencies.

I installed a lot of dependencies, but couldn't trust the check results because of this warning:

  Found the following significant warnings:
    Warning: running command 'C:\WINDOWS\system32\cmd.exe /c ftype perl' had status 2

I'm not sure where this comes from; I don't think bgafun uses perl, so it's from one of the dependencies. But I think the error message you are seeing is spurious: something else is going wrong when check tries to check \usage sections, and so it reports that there was an error.

As a sort of confirmation of this, I removed the dependencies from the DESCRIPTION file and tried to run check; I got lots of errors because of the missing dependencies now, but the one about S3 methods went away.

I'm not sure what to suggest to diagnose this; I'm not familiar with most of those packages I just installed as dependencies. But I think it's safe to say that you shouldn't worry about the \usage sections.

If you do figure out what's going wrong, please let us know, because it would probably be a good idea to fix the usage checks so they give the right message.

Duncan Murdoch
Re: [Rd] avoid copying big object passed into optimize()
Thanks! I found that, say, exp(x) causes 2 duplications whereas sum(x) causes none. Is there any document to learn from about this?

(First-time list user; sorry if anything about the posting procedure is wrong.)

On Wed, Mar 9, 2011 at 6:16 PM, Matt Shotwell wrote:

> On Wed, 2011-03-09 at 17:15 -0900, Zepu Zhang wrote:
>
>> Hello list,
>>
>> I have the following scenario:
>>
>>   f1 <- function(a)
>>   {
>>       # doing things; may need 'a', but does not change 'a'.
>>
>>       g <- function(x)
>>       {
>>           sum(x + a)    # Say. Use 'a'; does not change 'a'.
>
> The expression 'x + a' causes 'a' to be duplicated; 'x' is added to
> each element of the duplicated vector, then returned. The sum occurs
> afterward. To avoid this, use an expression like
> 'length(a) * x + sum(a)'. Also, please see this recent thread
> regarding the pass-by-value / pass-by-reference issue:
> http://tolstoy.newcastle.edu.au/R/e13/help/11/03/6632.html
>
>>       }
>>
>>       optimize(f = g, lower = 0, upper = 1)
>>   }
>>
>>   f2 <- function()
>>   {
>>       b <- runif(1000)  # Create big object.
>>
>>       f1(a = b)
>>   }
>>
>> My main concern is to reduce copying of the big object 'a'.
>> Questions:
>>
>> (1) In f1, 'a' never appears on the LHS of an assignment. Is it
>> passed by value or by reference? Say the situation is simpler and
>> more general: no optimization call in f1.
>
> 'a' is passed by value, but not necessarily copied in memory.
>
>> (2) Is there any difference, as far as copying of the big 'a' is
>> concerned, if 'g' is changed to
>>
>>   g <- function(x, b) { sum(x + b) }
>>
>> and called by
>>
>>   optimize(f = g, lower = 0, upper = 1, b = a)
>
> No.
>
>> (3) Is 'a' passed into the C optimization function one-off, or again
>> and again across the C-R interface?
>
> I don't think either is completely correct. But more to your point,
> 'a' is not necessarily copied repeatedly. If you make the substitution
> I suggested above for 'g', then 'a' is not repeatedly copied.
>
>> (4) Does it help if I remove the argument 'a' of 'f1', and let 'g'
>> look for it (of course it should be referred to as 'b' now) directly
>> in the environment of 'f2'?
>
> No. 'g' would then search and find 'a' farther down the environment
> tree.
>
>> (5) Any suggestions?
>
> Avoid operations that necessitate a copy. Compile R with
> --enable-memory-profiling and use the tracemem function to help with
> this.
>
>> Many thanks for your help!
>>
>> Zepu
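On the documentation question: ?tracemem and the "Profiling R code for memory use" section of "Writing R Extensions" cover this. A minimal sketch of spotting a copy (tracemem requires an R build with memory profiling enabled):

  a <- runif(1e6)
  tracemem(a)    # start tracing copies of 'a'
  b <- a         # no copy yet: R shares the vector between 'a' and 'b'
  b[1] <- 0      # copy-on-modify: tracemem reports the duplication here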