Re: [Rd] setClassUnion with numeric; extending class union

2009-02-12 Thread Sklyar, Oleg (London)
Hi John,

sorry for not posting more info. Strangely I get warnings about
setClassUnion with numeric in a very special case: if I define it in a
clean R session then there are no warnings, however if I load a number
of my packages where there are other classes derived from numeric and
exported then I get the following warnings:

> setClassUnion("numericOrNULL", c("numeric","NULL"))
[1] "numericOrNULL"
Warning messages:
1: In .checkSubclasses(class1, classDef, class2, classDef2, where1,  :
  Subclass "TimeDateBase" of class "numeric" is not local and cannot be
updated for new inheritance information; consider setClassUnion()
2: In .checkSubclasses(class1, classDef, class2, classDef2, where1,  :
  Subclass "TimeDate" of class "numeric" is not local and cannot be
updated for new inheritance information; consider setClassUnion()
3: In .checkSubclasses(class1, classDef, class2, classDef2, where1,  :
  Subclass "Time" of class "numeric" is not local and cannot be updated
for new inheritance information; consider setClassUnion()

The class is operational even with those warnings though. Now, the above
classes are defined as follows:

## - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
setClass("TimeDateBase",
    representation("numeric", mode="character"),
    prototype(mode="posix")
)

## - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
setClass("TimeDate",
    representation("TimeDateBase", tzone="character"),
    prototype(tzone="Europe/London")
)

## - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
setClass("Time",
    representation("TimeDateBase")
)

These classes work perfectly fine on their own and are used throughout
our code for all possible time and date operations extending the
existing functionality of R and available third party packages by an
order of magnitude. I do not see a relation between the above class
definitions and the newly defined class union though apart from the fact
that they are in a package namespace and therefore locked. Sorry I
cannot provide more source code as the code is not yet public.

It would definitely be nice to somehow have a .Data slot in NULL or even
a data.frame, although I do understand that this is quite a substantial
piece of work to make it all robust and backward compatible.

> sessionInfo() ## of a clean session

R version 2.9.0 Under development (unstable) (2009-02-02 r47821) 
x86_64-unknown-linux-gnu 

locale:
C

attached base packages:
[1] stats graphics  utils datasets  grDevices methods   base


Any thoughts are greatly appreciated.

Kind regards,
Oleg

Dr Oleg Sklyar
Research Technologist
AHL / Man Investments Ltd
+44 (0)20 7144 3107
oskl...@maninvestments.com 

> -Original Message-
> From: John Chambers [mailto:j...@r-project.org] 
> Sent: 11 February 2009 20:40
> To: Sklyar, Oleg (London)
> Cc: r-devel@r-project.org
> Subject: Re: [Rd] setClassUnion with numeric; extending class union
> 
> So, I was intrigued and played around a bit more.  Still can't get any
> warnings, but the following may be the issue.
>
> One thing NOT currently possible is to have a class that has NULL as its
> data part, because type NULL is abnormal and can't have attributes.
>
> So if you want a class that contains a union including NULL, you're in
> trouble generating a value from the class that is NULL.  It's not really
> a consequence of the setClassUnion() per se.
>
>  > setClass("bar", contains = "numericOrNULL")
> [1] "bar"
>  > zz = new("bar", NULL)
> Error in validObject(.Object) :
>   invalid class "bar" object: invalid object for slot ".Data" in class
> "bar": got class "list", should be or extend class "numericOrNULL"
>
> (How one got from the error to the message is a question, but in any
> case this can't currently work.)
>
> As in my example and in your example with a slot called "data", no
> problem in having a slot value that is NULL.
>
> Looking ahead, I'm working on some extensions that would allow classes
> to contain "abnormal" data types (externalptr, environment, ...) by
> using a reserved slot name, since one can not make the actual data type
> one of those types.
> 
> John Chambers wrote:
> > What warnings? Which part of the following is not what you're looking
> > for? (The usual information is needed, like version of R, reproducible
> > example, etc.)
> >
> >
> > > setClassUnion("numericOrNULL", c("numeric","NULL"))
> > [1] "numericOrNULL"
> > > setClass("foo", representation(x="numericOrNULL"))
> > [1] "foo"
> > > ff = new("foo", x= 1:10)
> > > fg = new("foo", x = NULL)
> > >
> > > ff
> > An object of class "foo"
> > Slot "x":
> > [1] 1 2 3 4 5 6 7 8 9 10
> >
> > > fg
> > An object of class "foo"
> > Slot "x":
> > NULL
> > > fk = new("foo")
> > > fk
> > An object of class "foo"
> > Slot "x":
> > NULL
> >
> > John
> >
> > Sklyar, Oleg (London) wrote:
> >> Dear list:
> >>
> >> I am looking for a good way t

[Rd] Patch for src/main/character.c, systematizing recent fix to do_grep

2009-02-12 Thread Wacek Kusnierczyk
The attached patch provides a modification to the recent fix/improvement
to do_grep already included in the most recent development version.

The original fix added new functionality to the grep function by adding
a new parameter, 'invert'.  In the source code for the underlying
do_grep, the value of the parameter is used to invert the logical
match-no match flag vector ind.  The modification is distributed across
several lines of code.

The patch systematizes the solution by inverting the logical match flag
vector in place, once for each element in the character vector passed to
grep as the argument 'x'.  In the patched version, the inversion appears
just once in the code.
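
As an illustration of the user-visible behaviour that the patch is meant to
leave unchanged, here is a minimal sketch. It assumes an R version whose
grep() already has the 'invert' argument and in which grepl() is available;
the vector x is made up for the example and is not taken from the patch.

```r
## Hypothetical input vector, not taken from the patch or its tests.
x <- c("apple", "berry", "cat")

hits     <- grep("a", x)                 # indices of elements matching "a"
inverted <- grep("a", x, invert = TRUE)  # indices of elements NOT matching "a"

## Inverting the logical match vector by hand selects the same elements.
by_hand <- which(!grepl("a", x))
```

Here 'inverted' and 'by_hand' should agree, which is the property any
refactoring of the inversion inside do_grep has to preserve.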

The patch does not modify the functionality of grep in any way.  If the
respective documentation was updated to cover the new functionality
introduced by the original modification, it still applies to the patched
version.

The patch does not solve any immediate problem.  However, due to
replacing the redundantly distributed original modification with a
one-line modification, the patch is intended to make it easier to
understand, maintain, and further modify the source code.

The patch also renames the variable 'invert' introduced in the original
modification to 'invert_opt', for consistency with how (almost) all
other logical flag parameters in do_grep are named.  This modification
is again functionally transparent and requires no modifications to the
documentation.


The patch was prepared as follows:

svn co https://svn.R-project.org/R/trunk/
cd trunk
tools/rsync-recommended
# modifications made to src/main/character.c
svn diff > do_grep.diff

The patched sources were successfully compiled and tested as follows:

svn revert -R .
patch -p0 < do_grep.diff
./configure
make
make check

Assuming that appropriate tests were prepared for the extended version
of grep as of the original modification, the patched version was
successfully tested.

The patched grep was also tested as follows:

bin/R --no-save -q 

[Rd] Why is srcref of length 6 and not 4 ?

2009-02-12 Thread Romain Francois

Hello,

Consider this file (/tmp/test.R) :


f <- function( x, y = 2 ){
  z <- x + y
  print( z )
}


I get this in R 2.7.2 :

> p <- parse( "/tmp/test.R" )
> str( attr( p, "srcref" ) )
List of 1
$ :Class 'srcref'  atomic [1:4] 1 1 4 1
.. ..- attr(*, "srcfile")=Class 'srcfile' length 4 

and this in R-devel :

> p <- parse( "/tmp/test.R" )
> str( attr(p, "srcref") )
List of 1
$ :Class 'srcref'  atomic [1:6] 1 1 4 1 1 1
.. ..- attr(*, "srcfile")=Class 'srcfile' 

What are the two last numbers ?

Romain

--
Romain Francois
Independent R Consultant
+33(0) 6 28 91 30 30
http://romainfrancois.blog.free.fr


__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


[Rd] Spearman's rank correlation test

2009-02-12 Thread Petr Savicky
Hi All:

help(cor.test) claims
  For Spearman's test, p-values are computed using algorithm AS 89.

Algorithm AS 89 was introduced by the paper
  D. J. Best & D. E. Roberts (1975), Algorithm AS 89: The Upper Tail
  Probabilities of Spearman's rho. Applied Statistics, Vol. 24, No. 3, 377-379.
Table 1(a) in this paper presents maximum absolute error |\Delta_m|, of the
approximation for all possible values of the statistic S for sample sizes
n = 7, 9, 11, 13. The presented errors are

   n  |\Delta_m|

   7  0.0046
   9  0.0011
  11  0.0006
  13  0.0005

Due to the problem explained in detail including a patch at
  https://stat.ethz.ch/pipermail/r-devel/2009-January/051936.html
the error of R implementation of Spearman's rank correlation test is larger
than the above bounds for the sample size n = 11 and some of the values of S,
which correspond to positive correlation.

For example, for n = 11 and S = 90, we have
  x <- 1:11
  y <- c(6:1, 7, 11:8)
  out <- cor.test(x, y, method="spearman", alternative="greater")
  out$statistic # 90
  out$p.value   # 0.02921104
while the correct p-value is 0.03044548, so the absolute difference
is 0.00123444. This is larger than the absolute error 0.0006 guaranteed
for AS 89. In my opinion, this means that the claim from help(cor.test)
cited above is not correct.

To see the error of AS 89 in the example above, one can use
  cor.test(x, -y, method="spearman", alternative="less")$p.value # 0.03036413
since on the side of negative correlation, R calls AS 89 correctly.
So, for the x, y above, correctly called AS 89 has absolute error 0.00008135.
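
The arithmetic behind these comparisons can be checked with a short sketch.
The three p-values are copied from this message rather than recomputed,
because a current cor.test() may no longer reproduce the 2009 result; the
variable names are chosen only for this illustration.

```r
x <- 1:11
y <- c(6:1, 7, 11:8)

## Spearman's statistic S: sum of squared rank differences (no ties here).
S <- sum((rank(x) - rank(y))^2)

## p-values quoted in the message above (not recomputed here).
p_exact <- 0.03044548   # stated correct p-value
p_r     <- 0.02921104   # reported by cor.test(x, y, ...) at the time
p_as89  <- 0.03036413   # AS 89 called on the negative-correlation side

err_r    <- abs(p_exact - p_r)      # error of the R result
err_as89 <- abs(p_exact - p_as89)   # error of correctly called AS 89

bound_n11 <- 0.0006                 # Table 1(a) bound for n = 11
c(S = S, err_r = err_r, err_as89 = err_as89)
```

Only err_r exceeds the bound of 0.0006 quoted for n = 11; err_as89 is well
below it.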

There is a package pspearman currently available on CRAN, which provides a
correction of the problem without the need to modify R base.

Petr.



Re: [Rd] Why is srcref of length 6 and not 4 ?

2009-02-12 Thread Duncan Murdoch

On 12/02/2009 7:01 AM, Romain Francois wrote:

Hello,

Consider this file (/tmp/test.R) :


f <- function( x, y = 2 ){
   z <- x + y
   print( z )
}


I get this in R 2.7.2 :

 > p <- parse( "/tmp/test.R" )
 > str( attr( p, "srcref" ) )
List of 1
$ :Class 'srcref'  atomic [1:4] 1 1 4 1
 .. ..- attr(*, "srcfile")=Class 'srcfile' length 4 

and this in R-devel :

 > p <- parse( "/tmp/test.R" )
 > str( attr(p, "srcref") )
List of 1
$ :Class 'srcref'  atomic [1:6] 1 1 4 1 1 1
 .. ..- attr(*, "srcfile")=Class 'srcfile' 

What are the two last numbers ?


The original design for srcref gave 4 entries: start line, start byte, 
stop line, stop byte. However, in multibyte strings, bytes don't 
correspond to columns, so error messages could often report the wrong 
location according to what a user sees in an editor.  To support the 
more useful error messages in R-devel, I added two more values: start 
column and stop column.  With pure ASCII text these will be the same as 
start byte and stop byte; with UTF-8 text and non-ASCII characters they 
will be different.  Other multibyte encodings are only supported if 
the platform can convert them to UTF-8 (and are not well tested; error 
reports would be welcome, if there's a way to improve the performance.)


If you are using these for error reports, I recommend using the two new 
values.  If you are trying to retrieve the text from the source file, 
use the originals.
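
A minimal sketch of how the six values described above can be picked apart,
assuming a recent R in which parse() accepts keep.source = TRUE. The names in
'pos' are labels chosen for this illustration, not part of any API, and later
R versions may append further entries after the first six.

```r
## Same function as in /tmp/test.R, supplied as text so no file is needed.
src <- c("f <- function( x, y = 2 ){",
         "  z <- x + y",
         "  print( z )",
         "}")

p   <- parse(text = src, keep.source = TRUE)
ref <- as.integer(attr(p, "srcref")[[1]])  # drop class and attributes

## Illustrative labels for the first six entries.
pos <- list(
  start_line = ref[1], start_byte = ref[2],
  stop_line  = ref[3], stop_byte  = ref[4],
  start_col  = ref[5], stop_col   = ref[6]
)
```

For this pure ASCII input the byte and column entries coincide, matching the
"1 1 4 1 1 1" shown in the question.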


Duncan Murdoch



Re: [Rd] Why is srcref of length 6 and not 4 ?

2009-02-12 Thread Romain Francois

Duncan Murdoch wrote:

On 12/02/2009 7:01 AM, Romain Francois wrote:

Hello,

Consider this file (/tmp/test.R) :


f <- function( x, y = 2 ){
   z <- x + y
   print( z )
}


I get this in R 2.7.2 :

 > p <- parse( "/tmp/test.R" )
 > str( attr( p, "srcref" ) )
List of 1
$ :Class 'srcref'  atomic [1:4] 1 1 4 1
 .. ..- attr(*, "srcfile")=Class 'srcfile' length 4 

and this in R-devel :

 > p <- parse( "/tmp/test.R" )
 > str( attr(p, "srcref") )
List of 1
$ :Class 'srcref'  atomic [1:6] 1 1 4 1 1 1
 .. ..- attr(*, "srcfile")=Class 'srcfile' 

What are the two last numbers ?


The original design for srcref gave 4 entries: start line, start byte, 
stop line, stop byte. However, in multibyte strings, bytes don't 
correspond to columns, so error messages could often report the wrong 
location according to what a user sees in an editor.  To support the 
more useful error messages in R-devel, I added two more values: start 
column and stop column.  With pure ASCII text these will be the same 
as start byte and stop byte; with UTF-8 text and non-ASCII characters 
they will be different.  Other multibyte encodings are only 
supported if the platform can convert them to UTF-8 (and are not well 
tested; error reports would be welcome, if there's a way to improve 
the performance.)


If you are using these for error reports, I recommend using the two 
new values.  If you are trying to retrieve the text from the source 
file, use the originals.


Duncan Murdoch



Thank you Duncan,

I am using this to massage the output of "parse" into a data frame to 
represent it as a tree

(see http://addictedtor.free.fr/misc/sidekick.png)

> cat( readLines( "/tmp/test.R" ), sep = "\n" )
f <- function( x, y = 2 ){
   z <- x + y
   g <- function( x ){
 print( x )
 xx <- x + 1
   }
   g( x )
}
>
> sidekick( "/tmp/test.R", encoding = "utf-8" )
  id parent     mode srcref1 srcref2 srcref3 srcref4 description
1  1      0 function       1       1       8       1 f <- function(x, y = 2) {
2  2      1     name       1      26       1      26 {
3  3      1     call       2       2       2      11 z <- x + y
4  4      1 function       3       2       6       2 g <- function(x) {
5  5      1     call       7       2       7       7 g(x)
6  6      4     name       3      20       3      20 {
7  7      4     call       4       4       4      13 print(x)
8  8      4     call       5       4       5      14 xx <- x + 1




--
Romain Francois
Independent R Consultant
+33(0) 6 28 91 30 30
http://romainfrancois.blog.free.fr



Re: [Rd] Why is srcref of length 6 and not 4 ?

2009-02-12 Thread hadley wickham
> I am using this to massage the output of "parse" into a data frame to
> represent it as a tree
> (see http://addictedtor.free.fr/misc/sidekick.png)

You might also want to take a look at
http://github.com/hadley/eval.with.details/blob/master/R/parse.r

where I'm trying to do something similar for a different purpose.

Hadley

-- 
http://had.co.nz/



[Rd] proposed simulate.glm method

2009-02-12 Thread Ben Bolker

  I have found the "simulate" method (incorporated
in some packages) very handy. As far as I can tell the
only class for which simulate is actually implemented
in base R is lm ... this is actually a little dangerous
for a naive user who might be tempted to try
simulate(X) where X is a glm fit instead, because
it defaults to simulate.lm (since glm inherits from
the lm class), and the answers make no sense ...

Here is my simulate.glm(), which is modeled on
simulate.lm .  It implements simulation for poisson
and binomial (binary or non-binary) models, should
be easy to implement others if that seems necessary.

  I hereby request comments and suggest that it wouldn't
hurt to incorporate it into base R ...  (I will write
docs for it if necessary, perhaps by modifying ?simulate --
there is no specific documentation for simulate.lm)

  cheers
Ben Bolker


simulate.glm <- function (object, nsim = 1, seed = NULL, ...)
{
  ## RNG stuff copied from simulate.lm
  if (!exists(".Random.seed", envir = .GlobalEnv, inherits = FALSE))
    runif(1)   # initialize the RNG if it has not been used yet
  if (is.null(seed))
    RNGstate <- get(".Random.seed", envir = .GlobalEnv)
  else {
    R.seed <- get(".Random.seed", envir = .GlobalEnv)
    set.seed(seed)
    RNGstate <- structure(seed, kind = as.list(RNGkind()))
    on.exit(assign(".Random.seed", R.seed, envir = .GlobalEnv))
  }
  ## get probabilities/intensities
  pred <- matrix(rep(predict(object,type="response"),nsim),ncol=nsim)
  ntot <- length(pred)
  if (object$family$family=="binomial") {
    resp <- object$model[[1]]
    ## number of trials: row totals of a two-column response, else 1 (binary)
    size <- if (is.matrix(resp)) rowSums(resp) else 1
  }
  val <- switch(object$family$family,
                poisson=rpois(ntot,pred),
                binomial=rbinom(ntot,prob=pred,size=size),
                stop("family ",object$family$family," not implemented"))
  ans <- as.data.frame(matrix(val,ncol=nsim))
  attr(ans, "seed") <- RNGstate
  ans
}

if (FALSE) {
  ## examples: modified from ?simulate
  x <- 1:10
  n <- 10
  y <- rbinom(length(x),prob=plogis((x-5)/2),size=n)
  y2 <- c("a","b")[1+rbinom(length(x),prob=plogis((x-5)/2),size=1)]
  mod1 <- glm(cbind(y,n-y) ~ x,family=binomial)
  mod2 <- glm(factor(y2) ~ x,family=binomial)
  S1 <- simulate(mod1, nsim = 4)
  S1B <- simulate(mod2, nsim = 4)
  ## repeat the simulation:
  .Random.seed <- attr(S1, "seed")
  identical(S1, simulate(mod1, nsim = 4))

  S2 <- simulate(mod1, nsim = 200, seed = 101)
  rowMeans(S2)/10 # after correcting for binomial sample size, should be about
  fitted(mod1)

  plot(rowMeans(S2)/10)
  lines(fitted(mod1))

  ## repeat identically:
  (sseed <- attr(S2, "seed")) # seed; RNGkind as attribute
  stopifnot(identical(S2, simulate(mod1, nsim = 200, seed = sseed)))
}


-- 
Ben Bolker
Associate professor, Biology Dep't, Univ. of Florida
bol...@ufl.edu / www.zoology.ufl.edu/bolker
GPG key: www.zoology.ufl.edu/bolker/benbolker-publickey.asc





Re: [Rd] proposed simulate.glm method

2009-02-12 Thread Alex D'Amour
There is functionality similar to this included in the Zelig package
with its "sim" method. The "sim" method goes a step further and
replicates the fitted model's analysis on the generated datasets as
well. I would suggest taking a look -- Zelig supports most (if not
all) glm models and a wide range of others.

The Zelig maintainers' site can be found at: http://gking.harvard.edu/zelig/.

Full disclosure: I am an employee of the Institute for Quantitative
Social Science, which performs most of the development and support for
the Zelig package.

Best,
Alex D'Amour
Statistical Programmer
Harvard Institute for Quantitative Social Science


2009/2/12 Ben Bolker :
>
>  I have found the "simulate" method (incorporated
> in some packages) very handy. As far as I can tell the
> only class for which simulate is actually implemented
> in base R is lm ... this is actually a little dangerous
> for a naive user who might be tempted to try
> simulate(X) where X is a glm fit instead, because
> it defaults to simulate.lm (since glm inherits from
> the lm class), and the answers make no sense ...
>
> Here is my simulate.glm(), which is modeled on
> simulate.lm .  It implements simulation for poisson
> and binomial (binary or non-binary) models, should
> be easy to implement others if that seems necessary.
>
>  I hereby request comments and suggest that it wouldn't
> hurt to incorporate it into base R ...  (I will write
> docs for it if necessary, perhaps by modifying ?simulate --
> there is no specific documentation for simulate.lm)
>
>  cheers
>Ben Bolker
> --
> Ben Bolker
> Associate professor, Biology Dep't, Univ. of Florida
> bol...@ufl.edu / www.zoology.ufl.edu/bolker
> GPG key: www.zoology.ufl.edu/bolker/benbolker-publickey.asc
>


Re: [Rd] proposed simulate.glm method

2009-02-12 Thread Ben Bolker
  Elsewhere (at least in lme4), refit(sim(model)) does the
same thing [and so one would need something like
apply(sim(model,1000),2,refit)].

  sim() is quite interesting, as is Zelig, but I'm not
sure I am ready to leap to it yet -- this was basically
a suggestion that simulate.glm could be included in
"vanilla" R ...

  Also (for better or worse), it looks like sim() also
does parametric bootstrapping on the parameter values,
whereas simulate.[g]lm() just uses "plug-in" estimates.
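
  The "plug-in" idea can be sketched as follows for a Poisson fit: the
fitted means are treated as known and new responses are drawn from them,
with no resampling of the coefficients. This is a minimal illustration
under made-up data (x, y and the slope 0.2 are assumptions for the
example), not the proposed simulate.glm() itself.

```r
set.seed(1)

## Made-up data for illustration only.
x <- 1:10
y <- rpois(length(x), lambda = exp(0.2 * x))
fit <- glm(y ~ x, family = poisson)

## Plug-in simulation: fitted means are used as if they were the truth.
mu   <- fitted(fit)
nsim <- 4
## lambda is recycled down each column, so column j is one simulated
## response vector aligned with the original observations.
sims <- matrix(rpois(length(mu) * nsim, lambda = mu), ncol = nsim)
```

  A parametric bootstrap on the parameter values would instead draw new
coefficients before computing the means.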

  cheers
Ben Bolker


Alex D'Amour wrote:
> There is functionality similar to this included in the Zelig package
> with its "sim" method. The "sim" method goes a step further and
> replicates the fitted model's analysis on the generated datasets as
> well. I would suggest taking a look -- Zelig supports most (if not
> all) glm models and a wide range of others.
> 
> The Zelig maintainers' site can be found at: http://gking.harvard.edu/zelig/.
> 
> Full disclosure: I am an employee of the Institute for Quantitative
> Social Science, which performs most of the development and support for
> the Zelig package.
> 
> Best,
> Alex D'Amour
> Statistical Programmer
> Harvard Institute for Quantitative Social Science
> 
> 
> 2009/2/12 Ben Bolker :
>>  I have found the "simulate" method (incorporated
>> in some packages) very handy. As far as I can tell the
>> only class for which simulate is actually implemented
>> in base R is lm ... this is actually a little dangerous
>> for a naive user who might be tempted to try
>> simulate(X) where X is a glm fit instead, because
>> it defaults to simulate.lm (since glm inherits from
>> the lm class), and the answers make no sense ...
>>
>> Here is my simulate.glm(), which is modeled on
>> simulate.lm .  It implements simulation for poisson
>> and binomial (binary or non-binary) models, should
>> be easy to implement others if that seems necessary.
>>
>>  I hereby request comments and suggest that it wouldn't
>> hurt to incorporate it into base R ...  (I will write
>> docs for it if necessary, perhaps by modifying ?simulate --
>> there is no specific documentation for simulate.lm)
>>
>>  cheers
>>Ben Bolker
>> --
>> Ben Bolker
>> Associate professor, Biology Dep't, Univ. of Florida
>> bol...@ufl.edu / www.zoology.ufl.edu/bolker
>> GPG key: www.zoology.ufl.edu/bolker/benbolker-publickey.asc
>>


-- 
Ben Bolker
Associate professor, Biology Dep't, Univ. of Florida
bol...@ufl.edu / www.zoology.ufl.edu/bolker
GPG key: www.zoology.ufl.edu/bolker/benbolker-publickey.asc



[Rd] typo in example(dbEscapeStrings) (PR#13521)

2009-02-12 Thread mayeul . kauffmann
Full_Name: Mayeul Kauffmann
Version: 2.8.1
OS: x86_64-pc-linux-gnu (kubuntu)
Submission from: (NULL) (86.200.212.40)


The file /library/RMySQL/html/dbEscapeStrings.html documents dbEscapeStrings()
In the example, an 's' is missing in line 3:

## Not run: 
tmp <- sprintf("select * from emp where lname = %s", "O'Reilly")
sql <- dbEscapeString(con, tmp)
dbGetQuery(con, sql)
## End(Not run)

sql <- dbEscapeString(con, tmp)
should be:
sql <- dbEscapeStrings(con, tmp)
