[Rd] Fortune?

2013-02-28 Thread Patrick Burns
I think the rule is that you can do anything as long as you don't 
complain. If you want to complain, you must follow the instructions.


-- Jari Oksanen  in

Re: [Rd] Keeping up to date with R-devel


--
Patrick Burns
pbu...@pburns.seanet.com
twitter: @burnsstat @portfolioprobe
http://www.portfolioprobe.com/blog
http://www.burns-stat.com
(home of:
 'Impatient R'
 'The R Inferno'
 'Tao Te Programming')

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Fortune?

2013-02-28 Thread Joris Meys
Has my vote!

On Thu, Feb 28, 2013 at 10:57 AM, Patrick Burns wrote:

> I think the rule is that you can do anything as long as you don't
> complain. If you want to complain, you must follow the instructions.
>
> -- Jari Oksanen  in
>
> Re: [Rd] Keeping up to date with R-devel
>



-- 
Joris Meys
Statistical consultant

Ghent University
Faculty of Bioscience Engineering
Department of Mathematical Modelling, Statistics and Bio-Informatics

tel : +32 9 264 59 87
joris.m...@ugent.be


[Rd] Bordered legend icons and Text in plots

2013-02-28 Thread James Hawley
Hello,

My colleagues and I use lattice for a variety of purposes, and we came
across a couple of issues involving legend.R and update.trellis.R:
1. When using xyplot, the shapes in the plot can have borders and fill
colours, but the shapes in the legend cannot. I created a short script to
isolate the problem (see legend-icons.R and Nofill.png).
   After some tracing through the source code, it turned out that the "fill"
argument in "key" was being dropped by the line
   pars <- pars[!sapply(pars, is.null)] # remove NULL components
(lines 302 and 322), because key$fill was NULL when passed. The issue seems
to come from the function process.key(): its output list does not include
"fill" as one of its components, so "fill" was dropped whenever "key" was
processed.
2. Giving multiple grobs to the 'inside' part of 'legend' only uses the first
grob if you use update.trellis to provide them. This is caused by the way
modifyList (which is called by update.trellis) handles list elements with
the same name, and can be worked around (see update.trellis.R).
   The issue without modification to your code can be seen in Notext.png
(created by legend-text.R). Multiple grobs can still be given to 'inside' via
the other methods (e.g. xyplot and the like).
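
The duplicate-name behaviour described in item 2 can be reproduced outside
lattice. This is only a minimal sketch: the component values ("keyGrob",
"textGrobA", "textGrobB") are made-up stand-ins for real grobs, and the lists
are not the actual structures that update.trellis builds.

```r
# Hypothetical stand-ins for two legend components that share the name 'inside'
old <- list(right = list(fun = "keyGrob"))
new <- list(inside = list(fun = "textGrobA"),
            inside = list(fun = "textGrobB"))

res <- modifyList(old, new)

# modifyList() looks components up by name with [[ ]], which always returns
# the first match, so the second 'inside' entry never reaches the result
names(res)
res$inside$fun

# Appending with c() keeps both entries
both <- c(old, new)
names(both)
```

Any workaround therefore has to avoid the lookup by name for repeated
components, for instance by appending them instead of merging them.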

I've made a few modifications to the source code and uploaded them here as well 
(legend.R and update.trellis.R).
For Issue 1, I added the line "fill = fill," to the output list of 
process.key(), and this seems to have fixed the issue (see legend.R line 216 
and Fill.png for results).
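
The NULL-dropping step behind Issue 1 can be seen in isolation. This is a
sketch under assumptions: the parameter list below is invented for
illustration and is not the list that process.key() actually returns.

```r
# Hypothetical key parameters; 'fill' arrives as NULL, as described for key$fill
pars <- list(col = "black", fill = NULL, pch = 21)
length(pars)   # the NULL component is still present at this point

# The filtering idiom quoted from legend.R
kept <- pars[!sapply(pars, is.null)]
names(kept)

# If 'fill' carries a value, it survives the same filter
pars2 <- list(col = "black", fill = "grey", pch = 21)
names(pars2[!sapply(pars2, is.null)])
```

So the filter itself behaves as intended; the fill colour is lost only
because it is already NULL by the time the filter runs.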
For Issue 2 there is a workaround in update.trellis.R lines 267-275 (see 
Text.png for the results).

James Hawley
Student

Ontario Institute for Cancer Research
MaRS Centre, South Tower
101 College Street, Suite 800
Toronto, Ontario, Canada M5G 0A3

Toll-free: 1-866-678-6427
Twitter: @OICR_news
www.oicr.on.ca


Re: [Rd] nobs() with glm(family="poisson")

2013-02-28 Thread Ravi Varadhan
This is getting further away from typical R-devel issues, but let me add 
another perspective:  the `n' in BIC reflects the rate at which the information 
in the log-likelihood grows.  

Ravi

-Original Message-
From: r-devel-boun...@r-project.org [mailto:r-devel-boun...@r-project.org] On 
Behalf Of Steven McKinney
Sent: Wednesday, February 27, 2013 5:26 PM
To: 'Milan Bouchet-Valat'
Cc: r-devel
Subject: Re: [Rd] nobs() with glm(family="poisson")



> -Original Message-
> From: r-devel-boun...@r-project.org 
> [mailto:r-devel-boun...@r-project.org]
> On Behalf Of Milan Bouchet-Valat
> Sent: February-27-13 12:56 PM
> To: peter dalgaard
> Cc: r-devel
> Subject: Re: [Rd] nobs() with glm(family="poisson")
> 
> Thanks for the (critical, indeed) answer!
> 
> Le mercredi 27 février 2013 à 20:48 +0100, peter dalgaard a écrit :
> > On Feb 27, 2013, at 19:46 , Milan Bouchet-Valat wrote:
> >
> > > > I cannot believe nobody cares about this -- or I'm completely wrong
> > > > and in that case everybody should rush to put the shame on me... :-p
> >
> > Well, nobs() is the number of observations. If you have 5 Poisson 
> > distributed counts, you have 5 observations.
> Well, say that to the statistical offices that spend millions to 
> survey thousands of people with correct (but complex) sampling 
> designs, they'll be happy to know that the collected data only 
> provides an information equivalent to 5 independent outcomes. ;-)

Milan:

It seems to me you are mixing up Binomial and Poisson situations, and not 
assessing independence appropriately.

The above example discusses Bernoulli outcomes which are sometimes aggregated 
into Binomial "cases" depending on the study design.
Now if the survey samples people in the same household or even neighbourhood, 
those Bernoulli outcomes will not be independent (hence clustered survey 
techniques) and summing the Binomial denominators would not be appropriate, for 
the survey analysis or for BIC calculations.  The "n" in the BIC calculation 
should reflect independent observations.  If you knock on the same door 1000 
times and ask the person who they will vote for, you do not have 1000 
independent observations, even though your Binomial denominator is 1000.

The example you show from ?glm is a Poisson example showing
9 independent Poisson counts.  If I count the number of cars passing through an 
intersection during non-overlapping one minute intervals (say 9 such 
intervals), then the number of observations I have is the number of 
non-overlapping one minute interval car count totals (e.g. the nine counts 
c(18, 17, 15, 20, 10, 20, 25, 13, 12)), not the number of cars I saw in total.

A piece of software that adds things up can not know the context from which the 
numbers were derived, so you have to figure out the level of independence 
appropriate to your study design and work out the BIC count accordingly.
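
The two candidate values of n can be compared directly on the first example
from ?glm. This is a minimal sketch: the hand computation with the sum of
counts only illustrates the alternative under discussion and is not something
stats::BIC does, and the names n_cells, n_total, bic_cells and bic_total are
made up for this sketch.

```r
# Data from the first example in ?glm
counts    <- c(18, 17, 15, 20, 10, 20, 25, 13, 12)
outcome   <- gl(3, 1, 9)
treatment <- gl(3, 3)
fit <- glm(counts ~ outcome + treatment, family = poisson())

n_cells <- nobs(fit)      # number of Poisson counts
n_total <- sum(counts)    # sum of the counts

# BIC computed by hand for either choice of n
ll <- logLik(fit)
k  <- attr(ll, "df")      # number of estimated parameters
bic_cells <- -2 * as.numeric(ll) + k * log(n_cells)
bic_total <- -2 * as.numeric(ll) + k * log(n_total)

c(n_cells = n_cells, n_total = n_total)
c(BIC = BIC(fit), bic_cells = bic_cells, bic_total = bic_total)
```

BIC(fit) agrees with the version based on the number of counts; which n is
appropriate still depends on the study design, as argued above.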

Raftery alludes to this in a preceding section:

"When the data have been collected using a complex survey design with resulting 
weights, it is not yet clear what n should be, and this issue awaits further 
study.  However, it seems reasonable that if the model is based on an 
assumption of simple random sampling but the sampling design is less efficient, 
then n should be reduced to reflect the efficiency of the sampling design 
relative to simple random sampling."



Steven McKinney
Statistician
Molecular Oncology and Breast Cancer Program British Columbia Cancer Research 
Centre


> 
> > If the number of observations is not the right thing to use in some 
> > context, use the right thing instead. Changing the definition of
> > nobs() surely leads to madness.
> It is common usage in the literature using log-linear models to report 
> the sum of counts as the number of observations. I think this indeed 
> makes sense, but I'm not particularly attached to the choice of words 
> -- let's call it as you please.
> 
> The root issue is that nobs() was precisely introduced to be the basis 
> for the BIC() function, as ?nobs states explicitly:
> >  Extract the number of ‘observations’ from a model fit.  This is
> >  principally intended to be used in computing BIC (see ‘AIC’)
> 
> So it's OK to say that the number of observations is the number of 
> cells (even if I think this is not very user-friendly), but then the 
> documentation is misleading, and the BIC() function returns incorrect 
> values for the very first example provided in ?glm.
> 
> > (I suppose that the fact that n is so obviously the wrong thing for 
> > one particularly well-digested family of distribution functions 
> > could be taken to indicate a generic weakness with the BIC.)
> I'm sure we can agree on the fact that BIC has its weaknesses (and I'm 
> not the best person able to judge), but the point at stake is IMHO not 
> one of them. After all, usual statistics for the Poisson family, such 
> as deviance or residuals, are based on the sum of counts, not on the 
> number

Re: [Rd] Fortune?

2013-02-28 Thread Achim Zeileis

On Thu, 28 Feb 2013, Joris Meys wrote:

> Has my vote!

Mine as well :-)
Added to the devel-version on R-Forge now.

thx,
Z


[Rd] S3 generics in packages: default method

2013-02-28 Thread Thomas Lumley
For a package providing 'virtual' data.frames backed by MonetDB database
tables, we want to make some functions generic, to implement versions that
work on single or multiple database columns.

My approach was

sd <- function(x, na.rm=TRUE,...) UseMethod("sd")

sd.default <- stats::sd

I've done that before to make dotchart() generic in the 'survey' package.

With dotchart() (and even sd()) there still doesn't seem to be a problem
(based on the CRAN check results), but with var() we get a note

Found .Internal call in the following function:
  ‘var.default’
with calls to .Internal functions
  ‘cov’

because stats::var contains calls to .Internal.  This seems harmless to me
-- the package is only calling .Internal() through base functions in the
usual way.

We could use
   var.default <- function(x, y=NULL, na.rm=TRUE, ...)
       stats::var(x, y, na.rm, ...)

but it seems that could have different lazy-evaluation behaviour.
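
A minimal sketch of the wrapper variant, under stated assumptions: the class
"dbcol" and the helpers label_of, label_alias and label_wrap are hypothetical
names invented for illustration, and the dbcol method computes in memory
instead of talking to a database. The last part shows one observable way in
which a wrapper differs from a plain alias.

```r
# Make sd() generic, dispatching on the class of x
sd <- function(x, na.rm = TRUE, ...) UseMethod("sd")

# Default method as an explicit wrapper around the stats implementation
sd.default <- function(x, na.rm = TRUE, ...) stats::sd(x, na.rm = na.rm)

# Method for a hypothetical class standing in for a database-backed column
sd.dbcol <- function(x, na.rm = TRUE, ...) {
  # a real method would push the computation to the database;
  # here the values are simply stored in the object
  stats::sd(unclass(x)$values, na.rm = na.rm)
}

col <- structure(list(values = c(2, 4, 6, NA)), class = "dbcol")
res_default <- sd(c(1, 2, 3, NA))   # dispatches to sd.default
res_dbcol   <- sd(col)              # dispatches to sd.dbcol

# substitute() sees the caller's expression through an alias, but only the
# wrapper's formal argument through a wrapper
label_of    <- function(x) deparse(substitute(x))
label_alias <- label_of
label_wrap  <- function(x) label_of(x)
label_alias(a + b)
label_wrap(a + b)
```

Functions that inspect their arguments with substitute() or missing() are the
ones where the choice between alias and wrapper can matter.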

-thomas

-- 
Thomas Lumley
Professor of Biostatistics
University of Auckland



Re: [Rd] Return value of S4 validity function

2013-02-28 Thread Gabor Grothendieck
On Thu, Feb 28, 2013 at 3:44 PM, Gabor Grothendieck
 wrote:
> This issues a message about 'a' needing to be non-negative, as expected:
>
> setClass("A",
>   representation = list(a = "numeric"),
>   prototype = list(a = 0),
>   validity = function(object) {
>     out <- if (object@a < 0) "a must be non-negative"
>     if (is.null(out)) TRUE else out ##
>   })
> new("A", a = -1)
>
> but it also works if the ## line is omitted so it appears that one can
> use NULL in place of TRUE.  I wonder if the use of NULL as an
> alternative to TRUE could be officially supported as it would allow
> one to write validity methods in a more concise manner.  It appears
> that this would only require a change to the documentation.
>

Pressed return too quickly.  This should have continued on to give this example:

new("A")

Both versions, with and without the ## line, work, since it appears that
NULL can be used in place of TRUE.
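
A minimal sketch of the concise form, assuming the observed behaviour
described above; the documented contract remains that a validity function
returns TRUE or a character vector. "NonNeg" is a hypothetical class name,
chosen only to avoid clashing with "A".

```r
library(methods)

setClass("NonNeg",
  representation(a = "numeric"),
  prototype(a = 0),
  validity = function(object) {
    # returns the message when invalid and NULL otherwise
    if (object@a < 0) "a must be non-negative"
  })

# A valid object is created without complaint
ok <- new("NonNeg", a = 1)

# An invalid object triggers an error whose message includes the string
bad <- tryCatch(new("NonNeg", a = -1),
                error = function(e) conditionMessage(e))
ok@a
bad
```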

-- 
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


[Rd] Return value of S4 validity function

2013-02-28 Thread Gabor Grothendieck
This issues a message about 'a' needing to be non-negative, as expected:

setClass("A",
  representation = list(a = "numeric"),
  prototype = list(a = 0),
  validity = function(object) {
    out <- if (object@a < 0) "a must be non-negative"
    if (is.null(out)) TRUE else out ##
  })
new("A", a = -1)

but it also works if the ## line is omitted so it appears that one can
use NULL in place of TRUE.  I wonder if the use of NULL as an
alternative to TRUE could be officially supported as it would allow
one to write validity methods in a more concise manner.  It appears
that this would only require a change to the documentation.

--
Statistics & Software Consulting
GKX Group, GKX Associates Inc.
tel: 1-877-GKX-GROUP
email: ggrothendieck at gmail.com


Re: [Rd] nobs() with glm(family="poisson")

2013-02-28 Thread Thomas Lumley
On Thu, Feb 28, 2013 at 11:56 AM, Ravi Varadhan wrote:

> This is getting further away from typical R-devel issues, but let me add
> another perspective:  the `n' in BIC reflects the rate at which the
> information in the log-likelihood grows.
>

But in the derivation of  BIC, the log(n) term is kept and various terms of
order 1 are discarded.  What we're arguing about is one of the O(1) terms.
 If it makes an important difference then presumably we should also worry
about the other O(1) terms that got discarded.
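
The point can be written out as a sketch, assuming the usual
Laplace-approximation form of the derivation. The symbols k (number of
parameters) and c (a fixed ratio between two candidate sample sizes) are
introduced here for illustration and do not appear in the messages above.

```latex
\log p(y \mid M) = \log L(\hat\theta) - \frac{k}{2}\log n + O(1),
\qquad
\frac{k}{2}\log(c\,n) = \frac{k}{2}\log n + \frac{k}{2}\log c .
```

If the two candidate values of n differ only by a fixed factor c, the extra
term (k/2) log c does not grow with n, so it is of the same order as the
terms already discarded.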

   -thomas

-- 
Thomas Lumley
Professor of Biostatistics
University of Auckland


Re: [Rd] nobs() with glm(family="poisson")

2013-02-28 Thread Milan Bouchet-Valat
Le mercredi 27 février 2013 à 18:59 -0500, John Fox a écrit :
> Dear Milan and Steven,
> 
> At the risk of muddying the water further, I think that the potential
> confusion here is that Poisson GLMs are applied in two formally
> equivalent but substantively different situations: (1) where the
> counts are cells in a contingency table, in which case the Poisson GLM
> is used to fit an equivalent loglinear association model to the table;
> and (2) where the counts are observations on a non-negative integer
> response -- what's often called "Poisson regression." In the first
> case, but not the second, it makes sense to think of the sum of the
> counts as the natural sample size. I don't think that one can expect
> GLM software to distinguish these cases.
Thanks, that sounds like a good summary of the situation. There are two
legitimate use cases and definitions of number of observations, yet only
one nobs() and one BIC() function.

Indeed software can hardly find out which situation applies, except
maybe if it is possible to consider (as I suggested above) that
log-linear models are always fitted to tabular data (several
observations per cell), while poisson regressions are fitted to data
frames (one observation per row). If this is right, then a possible
solution would be to define nobs.glm() like this:

nobs.glm <- function(object, ...) {
    w <- object$prior.weights

    if(is.matrix(object$data)) {
        if (!is.null(w)) sum(object$data[w != 0], na.rm=TRUE)
        else sum(object$data, na.rm=TRUE)
    }
    else {
        if (!is.null(w)) sum(w != 0) else length(object$residuals)
    }
}

This would just require glm() to call as.data.frame(data) when passed a
table.
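
To make the tabular case concrete, here is a minimal sketch using the
HairEyeColor table that ships with R. It shows what the existing nobs()
reports for a log-linear model fitted through glm(), not the behaviour of
the proposal above; the independence model is chosen only for illustration.

```r
# HairEyeColor is a 4 x 4 x 2 contingency table shipped with R
tab <- HairEyeColor
df  <- as.data.frame(tab)   # one row per cell, counts in column 'Freq'

# Independence log-linear model fitted as a Poisson GLM
fit <- glm(Freq ~ Hair + Eye + Sex, family = poisson(), data = df)

n_cells <- nobs(fit)        # number of cells in the table
n_total <- sum(df$Freq)     # number of individuals in the table
c(n_cells = n_cells, n_total = n_total)
```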


(loglin() could be considered the most natural way of fitting log-linear
models, but glm() is very useful too since it supports the quasipoisson
family, and the negative binomial via glm.nb(); finally, glm() handles
structural zeros better than loglin().)


Regards

> Best,
>  John
> 
> > -Original Message-
> > From: r-devel-boun...@r-project.org [mailto:r-devel-bounces@r-
> > project.org] On Behalf Of Milan Bouchet-Valat
> > Sent: Wednesday, February 27, 2013 5:59 PM
> > To: Steven McKinney
> > Cc: r-devel
> > Subject: Re: [Rd] nobs() with glm(family="poisson")
> > 
> > Le mercredi 27 février 2013 à 14:26 -0800, Steven McKinney a écrit :
> > >
> > > > -Original Message-
> > > > From: r-devel-boun...@r-project.org [mailto:r-devel-bounces@r-
> > project.org]
> > > > On Behalf Of Milan Bouchet-Valat
> > > > Sent: February-27-13 12:56 PM
> > > > To: peter dalgaard
> > > > Cc: r-devel
> > > > Subject: Re: [Rd] nobs() with glm(family="poisson")
> > > >
> > > > Thanks for the (critical, indeed) answer!
> > > >
> > > > Le mercredi 27 février 2013 à 20:48 +0100, peter dalgaard a écrit :
> > > > > On Feb 27, 2013, at 19:46 , Milan Bouchet-Valat wrote:
> > > > >
> > > > > > I cannot believe nobody cares about this -- or I'm completely
> > > > > > wrong and in that case everybody should rush to put the shame
> > > > > > on me... :-p
> > > > >
> > > > > Well, nobs() is the number of observations. If you have 5 Poisson
> > > > > distributed counts, you have 5 observations.
> > > > Well, say that to the statistical offices that spend millions to
> > > > survey thousands of people with correct (but complex) sampling
> > > > designs, they'll be happy to know that the collected data only
> > > > provides an information equivalent to 5 independent outcomes. ;-)
> > >
> > > Milan:
> > >
> > > It seems to me you are mixing up Binomial and Poisson situations,
> > > and not assessing independence appropriately.
> > >
> > > The above example discusses Bernoulli outcomes which are sometimes
> > > aggregated into Binomial "cases" depending on the study design.
> > > Now if the survey samples people in the same household or even
> > > neighbourhood, those Bernoulli outcomes will not be independent
> > > (hence clustered survey techniques) and summing the Binomial
> > > denominators would not be appropriate, for the survey analysis or
> > > for BIC calculations.  The "n" in the BIC calculation should
> > > reflect independent observations.  If you knock on the same
> > > door 1000 times and ask the person who they will vote for,
> > > you do not have 1000 independent observations, even though
> > > your Binomial denominator is 1000.
> > My intention was not to introduce the issue of survey designs into the
> > discussion, but merely to make the point that in surveys, counts are
> > usually *to some extent at least* independent observations, even when
> > clustering is present, and that the fact that different people are
> > asked and that each answer costs money is the best indication of that.
> > Anyway, BIC does not apply if we are not assuming that the data comes
> > from a simple random sample, so let's leave this complication aside.
> > 
> > > The example you show from ?glm is a Poisson example showing
> > > 9 indep

[Rd] conflict between rJava and data.table

2013-02-28 Thread Bunny
Dear devel-listers,

I found a conflict between rJava and data.table. Actually, my question is
where to report it: should I send it directly to the package maintainers or
post it on some bug tracker?
The problem is that data.table has a function called "J" and rJava uses a
function of the same name quite intensively.
I used the xlsx R package, which depends on rJava, to write .xls files and
ran into an error.

write.xls from this package uses these functions and returns an error
depending on the order in which the packages were loaded.


Error in .jnew("org/apache/poi/xssf/usermodel/XSSFWorkbook") :
  java.lang.AbstractMethodError:
java.lang.ClassLoader.loadClass(Ljava/lang/String;)Ljava/lang/Class;


data.table::J
rJava::J

I can work around this by loading and unloading packages, but I feel this
should be addressed, because loading these two packages, which both deal
with tables of data, does not seem that unlikely to me.

best

matt


Re: [Rd] conflict between rJava and data.table

2013-02-28 Thread Simon Urbanek
On Feb 28, 2013, at 5:09 PM, Bunny wrote:

> Dear devel-listers, 
> 
> I found a conflct between rJava and data.table. Actually me questions is 
> where to report it? 
> Should I rather send it directly to the package maintainers or post it on 
> some bug tracker. 
> The problem is that data.table has a function called "J" and rJava uses the 
> same quite intensively. 
> I used the  xlsx R package which depends on rJava to write .xls files and ran 
> into an error. 
> 
> write.xls from this package uses the functions and returns an error depending 
> on the sequence the packages
> were loaded. 
> 
> 
> Error in .jnew("org/apache/poi/xssf/usermodel/XSSFWorkbook") : 
>  java.lang.AbstractMethodError: 
> java.lang.ClassLoader.loadClass(Ljava/lang/String;)Ljava/lang/Class;
> 
> 
> data.table::J
> rJava::J
> 
> I can work around this by loading and unloading packages, but I feel this 
> should be addressed because 
> loading these two packages that both deal with tables of data does not seem 
> that unlikely to me. 
> 

Can you elaborate on where exactly this is a problem? Packages should not be
affected, since they should be importing the namespaces of the packages they
use, so the only problem would be in a package that uses both data.table and
rJava -- and this is easily resolved in the namespace of such a package. So
there is no technical reason why you can't have multiple definitions of J;
that's what namespaces are for.
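
As a minimal sketch of how a qualified call sidesteps masking, here is a
stand-in that needs neither rJava nor data.table: a user-level median() masks
stats::median on the search path. The masking function is invented for
illustration only.

```r
# A user-level definition that masks stats::median on the search path
median <- function(x, ...) "masked"

unqualified <- median(c(1, 5, 9))          # finds the global definition first
qualified   <- stats::median(c(1, 5, 9))   # unaffected by the masking
unqualified
qualified

# The same idea applies to the two J functions, e.g. rJava::J versus
# data.table::J; in a package, a NAMESPACE directive such as
# importFrom(rJava, J) pins down which one the package code sees.
```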

The error you report is entirely unrelated to J -- at least in isolation. If you
have a reproducible example, please share it.

Cheers,
Simon




> best
> 
> matt
> 


Re: [Rd] conflict between rJava and data.table

2013-02-28 Thread Steve Lianoglou
Hi,

On Thu, Feb 28, 2013 at 5:09 PM, Bunny  wrote:
> Dear devel-listers,
>
> I found a conflct between rJava and data.table. Actually me questions is 
> where to report it?
> Should I rather send it directly to the package maintainers or post it on 
> some bug tracker.
> The problem is that data.table has a function called "J" and rJava uses the 
> same quite intensively.
[snip]

The development version of data.table no longer exports J from, but
once could still use J inside data.tabe[ ... ] calls.

I reckon using that would solve your problem. I'm not sure what
version of data.table you can get by installing from R-forge, but you
can either check out from subversion or download the latest source
tar-ball from R-forge and install from source ...

HTH,
-steve

--
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact


Re: [Rd] conflict between rJava and data.table

2013-02-28 Thread Steve Lianoglou
Ugh, sorry, I meant to say:

On Thu, Feb 28, 2013 at 8:49 PM, Steve Lianoglou
 wrote:
[snip]
> The development version of data.table no longer exports J from, but
> once could still use J inside data.tabe[ ... ] calls.

The development version of data.table no longer exports J, so this
shouldn't happen. One could still use J inside a data.table[ ... ]
expression, though.

Hope that's a bit more clear.

-steve

> I reckon using that would solve your problem. I'm not sure what
> version of data.table you can get by installing from R-forge, but you
> can either check out from subversion or download the latest source
> tar-ball from R-forge and install from source ...
>
> HTH,
> -steve



-- 
Steve Lianoglou
Graduate Student: Computational Systems Biology
 | Memorial Sloan-Kettering Cancer Center
 | Weill Medical College of Cornell University
Contact Info: http://cbio.mskcc.org/~lianos/contact
