date:20050818

Re: [Rd] problem using model.frame()

2005-08-18 Thread Gavin Simpson

On Thu, 2005-08-18 at 07:57 +0300, Jari Oksanen wrote:
> On 18 Aug 2005, at 1:49, Gavin Simpson wrote:
> 
> > On Wed, 2005-08-17 at 20:24 +0200, Martin Maechler wrote:
> >>> "GS" == Gavin Simpson <[EMAIL PROTECTED]>
> >>> on Tue, 16 Aug 2005 18:44:23 +0100 writes:
> >>
> >> GS> On Tue, 2005-08-16 at 12:35 -0400, Gabor Grothendieck
> >> GS> wrote:
>  On 8/16/05, Gavin Simpson <[EMAIL PROTECTED]>
>  wrote: > On Tue, 2005-08-16 at 11:25 -0400, Gabor
>  Grothendieck wrote: > > It can handle data frames like
>  this:
> >>
> >> model.frame(y1) > > or > > model.frame(~., y1)
> >
> > Thanks Gabor,
> >
> > Yes, I know that works, but I want the function
>  coca.formula to accept a > formula like this y2 ~ y1,
>  with both y1 and y2 being data frames. It is
> 
>  The expressions I gave work generally (i.e. lm, glm,
>  ...), not just in model.matrix, so would it be ok if the
>  user just does this?
> 
>  yourfunction(y2 ~., y1)
> >>
> >> GS> Thanks again Gabor for your comments,
> >>
> >> GS> I'd prefer the y1 ~ y2 as data frames - as this is the
> >> GS> most natural way of doing things. I'd like to have (y2
> >> GS> ~., y1) as well, and (y2 ~ spp1 + spp2 + spp3, y1) also
> >> GS> work - silently without any trouble.
> >>
> >> I'm sorry, Gavin, I tend to disagree quite a bit.
> >>
> >> The formula notation has quite a history in the S language, and
> >> AFAIK never was the idea to use data.frames as formula
> >> components, but rather as "environments" in which formula
> >> components are looked up --- exactly as Gabor has explained.
> >
> > Hi Martin, thanks for your comments,
> >
> > But then one could have a matrix of variables on the rhs of the formula
> > and it would work - whether this is a documented feature or un-intended
> > side-effect of matrices being stored as vectors with dims, I don't 
> > know.
> >
> > And whilst the formula may have a long history, a number of packages
> > have extended the interface to implement a specific feature, which 
> > don't
> > work with standard functions like lm, glm and friends. I don't see how
> > what I wanted to achieve is greatly different to that or using a 
> > matrix.
> >
> >> To break with such a deeply rooted principle,
> >> you should have very very good reasons, because you're breaking
> >> the concepts on which all other uses of formulae are based.
> >> And this would potentially lead to much confusion of your users,
> >> at least in the way they should learn to think about what
> >> formulae mean.
> >
> > In the end I managed to treat y1 ~ y2 (both data frames) as a special
> > case, which allows the existing formula notation to work as well, so I
> > can use y1 ~ y2, y1 ~ ., data = y2, or y1 ~ var + var2, data = y2. This
> > is what I wanted all along, to extend my interface (not do anything to
> > R's formulae), but to also work in the traditional sense.
> >
> > The model I am writing code for really is modelling the relationship
> > between two matrices of data. In one version of the method, there is
> > real equivalence between both sides of the formula so it would seem odd
> > to treat the two sides of the formula differently. At least to me ;-)
> 
> It seems that I may be responsible for one of these extensions (lhs as 
> a data.frame in cca and rda in vegan package). There the response (lhs) 
> is multivariate or a multispecies community, and you must take that as 
> a whole without manipulation (and if you tried using VGAM you see there 
> really is painful to define lhs with, say, 127 elements). 

Hi Jari,

Thanks for reminding me about this - I'd forgotten about not normally
being able to have a data.frame on the lhs of the formula either - I'm
surprised no-one pulled me up on that one before, either ;-) 

I guess what I'm proposing is really pushing the formula representation
too far for some people. I'm coming round to the y1 ~ ., data = y2 way
of doing things - still prefer y1 ~ y2 though ;-)

Also, both y1 and y2 are community matrices (i.e. both have many, many
species, aka variables for the non-community ecologists reading this).
I'm not sure that it makes sense to treat the two sides differently. In
the predictive co-correspondence mode (the default), multivariate pls is
used to regress one matrix on another, with the number of pls components
being chosen by cross-validation or a permutation test.

> However, in 
> general you shouldn't use models where you use all the 'explanatory' 
> variables (rhs) that yo happen to have by accident. So much bad science 
> has been created with that approach even in your field, Gav. 

Well, I agree with you there...

> The whole 
> idea of formula is the ability to choose from candidate variables. That 
> is: to build a model. Therefore you have one-sided formulae in prcomp() 
> and princomp(): you can say prcomp(~ x1 + log(x2) +x4, data) or 
> prcomp(~ . - x3, data). I think you should try to

[Rd] kendall tau correlation test for ties: Potential error (PR#8076)

2005-08-18 Thread dkoschuetzki

Full_Name: Dirk Koschuetzki
Version: 2.1.1
OS: source code
Submission from: (NULL) (194.94.136.34)


Hello,

>From the source code (R-2.1.1, file: .../R-2.1.1/src/library/stats/R/)

**
cor.test.default <-
function(x, y, alternative = c("two.sided", "less", "greater"),
 method = c("pearson", "kendall", "spearman"), exact = NULL,
 conf.level = 0.95, ...)
{
alternative <- match.arg(alternative)
method <- match.arg(method)
DNAME <- paste(deparse(substitute(x)), "and", deparse(substitute(y)))

if(length(x) != length(y))
stop("'x' and 'y' must have the same length")
OK <- complete.cases(x, y)
x <- x[OK]
y <- y[OK]
n <- length(x)

PVAL <- NULL
NVAL <- 0
conf.int <- FALSE

if(method == "pearson") {
// Omitted
}
else {
if(n < 2)
stop("not enough finite observations")
PARAMETER <- NULL
TIES <- (min(length(unique(x)), length(unique(y))) < n)
if(method == "kendall") {
method <- "Kendall's rank correlation tau"
names(NVAL) <- "tau"
r <- cor(x,y, method = "kendall")
ESTIMATE <- c(tau = r)

if(!is.finite(ESTIMATE)) {  # all x or all y the same
ESTIMATE[] <- NA
STATISTIC <- c(T = NA)
PVAL <- NA
}
else {
if(is.null(exact))
exact <- (n < 50)
if(exact && !TIES) {
q <- round((r + 1) * n * (n - 1) / 4)
pkendall <- function(q, n) {
.C("pkendall",
   length(q),
   p = as.double(q),
   as.integer(n),
   PACKAGE = "stats")$p
}
PVAL <-
switch(alternative,
   "two.sided" = {
   if(q > n * (n - 1) / 4)
   p <- 1 - pkendall(q - 1, n)
   else
   p <- pkendall(q, n)
   min(2 * p, 1)
   },
   "greater" = 1 - pkendall(q - 1, n),
   "less" = pkendall(q, n))
STATISTIC <- c(T = q)
} else {
STATISTIC <- c(z = r / sqrt((4 * n + 10) / (9 * n*(n-1
p <- pnorm(STATISTIC)
if(exact && TIES)
warning("Cannot compute exact p-value with ties")
}
}
} else {
// OMITTED
}
}

if(is.null(PVAL)) # for "pearson" only, currently
PVAL <- switch(alternative,
   "less" = p,
   "greater" = 1 - p,
   "two.sided" = 2 * min(p, 1 - p))

RVAL <- list(statistic = STATISTIC,
 parameter = PARAMETER,
 p.value = as.numeric(PVAL),
 estimate = ESTIMATE,
 null.value = NVAL,
 alternative = alternative,
 method = method,
 data.name = DNAME)
if(conf.int)
RVAL <- c(RVAL, list(conf.int = cint))
class(RVAL) <- "htest"
RVAL
}
*

Please look at the computation of the p-value for Kendalls tau. There is an
assignment to "p" right above the warning. In the bottom of the function there
is a comment that for the pearson case we have to use the modification and set
PVAL. 

The problem is: 
* Either the comment is wrong because the modification should be done with
kendall too, or
* The variable PVAL has to be assigned in the kendall block.

I hope this is clear so far.

Please send me some comments, because I'm not sure if my observation is ok. And
currently I try to figure out the significance in the biserial case which of
course makes heavy use of the tied case.

Cheers,
Dirk

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] kendall tau correlation test for ties: Potential error (PR#8076)

2005-08-18 Thread Peter Dalgaard

[EMAIL PROTECTED] writes:

> } else {
> STATISTIC <- c(z = r / sqrt((4 * n + 10) / (9 * n*(n-1
> p <- pnorm(STATISTIC)
> if(exact && TIES)
> warning("Cannot compute exact p-value with ties")
> }
> }
>   } else {
>   // OMITTED
> }
> }
> 
> if(is.null(PVAL)) # for "pearson" only, currently
>   PVAL <- switch(alternative,
>  "less" = p,
>  "greater" = 1 - p,
>  "two.sided" = 2 * min(p, 1 - p))
...
> 
> Please look at the computation of the p-value for Kendalls tau. There is an
> assignment to "p" right above the warning. In the bottom of the function there
> is a comment that for the pearson case we have to use the modification and set
> PVAL. 
> 
> The problem is: 
> * Either the comment is wrong because the modification should be done with
> kendall too, or
> * The variable PVAL has to be assigned in the kendall block.
> 
> I hope this is clear so far.

I think it is the comment that is wrong. The calculation of
opposite-side one-sided and two-sided alternatives make OK sense when
the normal approximation of the test statistic is being used. It's
when you use a discrete distribution that you need to be careful.

(As brought up recently, the normal approximation itself is not too
hot in the tied case, but that's another matter.)

-- 
   O__   Peter Dalgaard Øster Farimagsgade 5, Entr.B
  c/ /'_ --- Dept. of Biostatistics PO Box 2099, 1014 Cph. K
 (*) \(*) -- University of Copenhagen   Denmark  Ph:  (+45) 35327918
~~ - ([EMAIL PROTECTED])  FAX: (+45) 35327907

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] Registering S3 class from external package

2005-08-18 Thread John Chambers

Your description is a bit too vague to be sure, but it sounds as if 
you're trying to do something that breaks the basic namespace idea.

Paul Roebuck wrote:

> The package I'm working on can extract data from external
> packages but would otherwise have no dependency on them.
> However, I desire to be able to dispatch based on an
> external S3 class if its package is attached (.First.lib).

Dispatch from where?  If you mean from a generic function inside your 
package's namespace, then the corresponding class definitions need to be 
defined in or imported into that namespace when it's constructed.  One 
of the main virtues of the namespace mechanism is that the known classes 
(& therefore the dispatch behavior) are fixed by the package source + 
its imports.  This means that the behavior of the package is insulated 
against effects from attaching other packages.

OTOH, if you mean dispatch from a generic function in the global 
environment or another (non-NAMESPACE) package, then you need to ensure 
that the setOldClass() takes place in the appropriate environment; e.g. 
if the generic is in the global environment,
   setOldClass(c("foo", "bar"), where = .GlobalEnv)
If you're dispatching from a generic in "somepkg", the setOldClass call 
should just be a part of the source for that package.
But, again, these will not help if you're dispatching from the mypkg 
namespace; that's just what the mechanism is trying to avoid.

In case it's not clear, the setOldClass() call defines a special kind of 
  S4 class for each of the S3 classes.  Therefore it's subject to the 
same locking rules as setClass() or any other call that needs to do an 
assignment as a side effect.

> My code is S4-based and its package has NAMESPACE.
> 
> Registering the external class prior to the other
> package being attached doesn't seem to work so I am
> attempting to perform the registration once the other
> package has done so. But my namespace is locked by the
> time this occurs. Can someone either tell me how to do
> this, suggest a better alternative, or point me to
> another package that does something similar?
> 
> Current attempt is something like the following:
> 
>   setHook(packageEvent("somepkg", "attach"),
>   function(...) {
>   cat("* Register", sQuote("oldstyle"),
>   "as S3 class", "\n")
>   setOldClass(c("oldstyle", "data.frame"),
>   where = asNamespace("mypkg"))
>   })
> 
> 
> -
> 
>>require(mypkg)
> 
> Loading required package: mypkg
> ...
> 
>>require(somepkg)
> 
> Loading required package: somepkg
> * Register 'oldstyle' as S3 class
> Error in assign(classMetaName(Class), def, where) :
> cannot add bindings to a locked environment
> 
>>R.version.string
> 
> [1] "R version 2.1.1, 2005-06-20"
> 
> --
> SIGSIG -- signature too long (core dumped)
> 
> __
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
> 
> 
> !DSPAM:42fb68c222117714262142!
>

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel

Re: [Rd] problem using model.frame()

[Rd] kendall tau correlation test for ties: Potential error (PR#8076)

Re: [Rd] kendall tau correlation test for ties: Potential error (PR#8076)

Re: [Rd] Registering S3 class from external package

4 matches

Site Navigation

Mail list logo

Footer information