Re: [Rd] setClassUnion with numeric; extending class union
Hi John,

sorry for not posting more info. Strangely, I get warnings about setClassUnion with numeric only in a very special case: if I define it in a clean R session there are no warnings; however, if I load a number of my packages in which other classes derived from numeric are defined and exported, then I get the following warnings:

> setClassUnion("numericOrNULL", c("numeric","NULL"))
[1] "numericOrNULL"
Warning messages:
1: In .checkSubclasses(class1, classDef, class2, classDef2, where1, :
  Subclass "TimeDateBase" of class "numeric" is not local and cannot be updated for new inheritance information; consider setClassUnion()
2: In .checkSubclasses(class1, classDef, class2, classDef2, where1, :
  Subclass "TimeDate" of class "numeric" is not local and cannot be updated for new inheritance information; consider setClassUnion()
3: In .checkSubclasses(class1, classDef, class2, classDef2, where1, :
  Subclass "Time" of class "numeric" is not local and cannot be updated for new inheritance information; consider setClassUnion()

The class is operational even with those warnings though. Now, the above classes are defined as follows:

## - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
setClass("TimeDateBase",
    representation("numeric", mode="character"),
    prototype(mode="posix")
)

## - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
setClass("TimeDate",
    representation("TimeDateBase", tzone="character"),
    prototype(tzone="Europe/London")
)

## - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
setClass("Time",
    representation("TimeDateBase")
)

These classes work perfectly fine on their own and are used throughout our code for all possible time and date operations, extending the existing functionality of R and available third party packages by an order of magnitude.
I do not see a relation between the above class definitions and the newly defined class union though, apart from the fact that they are in a package namespace and therefore locked. Sorry, I cannot provide more source code as the code is not yet public.

It would definitely be nice to somehow have a .Data slot in NULL or even a data.frame, although I do understand that this is quite a substantial piece of work to make it all robust and backward compatible.

> sessionInfo() ## of a clean session
R version 2.9.0 Under development (unstable) (2009-02-02 r47821)
x86_64-unknown-linux-gnu

locale:
C

attached base packages:
[1] stats graphics utils datasets grDevices methods base

Any thoughts are greatly appreciated.

Kind regards,
Oleg

Dr Oleg Sklyar
Research Technologist
AHL / Man Investments Ltd
+44 (0)20 7144 3107
oskl...@maninvestments.com

> -----Original Message-----
> From: John Chambers [mailto:j...@r-project.org]
> Sent: 11 February 2009 20:40
> To: Sklyar, Oleg (London)
> Cc: r-devel@r-project.org
> Subject: Re: [Rd] setClassUnion with numeric; extending class union
>
> So, I was intrigued and played around a bit more. Still can't get any
> warnings, but the following may be the issue.
>
> One thing NOT currently possible is to have a class that has NULL as its
> data part, because type NULL is abnormal and can't have attributes.
>
> So if you want a class that contains a union including NULL, you're in
> trouble generating a value from the class that is NULL. It's not really
> a consequence of the setUnion() per se.
>
> > setClass("bar", contains = "numericOrNULL")
> [1] "bar"
> > zz = new("bar", NULL)
> Error in validObject(.Object) :
>   invalid class "bar" object: invalid object for slot ".Data" in class
>   "bar": got class "list", should be or extend class "numericOrNULL"
>
> (How one got from the error to the message is a question, but in any
> case this can't currently work.)
>
> As in my example and in your example with a slot called "data", no
> problem in having a slot value that is NULL.
>
> Looking ahead, I'm working on some extensions that would allow classes
> to contain "abnormal" data types (externalptr, environment, ...) by
> using a reserved slot name, since one can not make the actual data type
> one of those types.
>
> John Chambers wrote:
> > What warnings? Which part of the following is not what you're looking
> > for? (The usual information is needed, like version of R, reproducible
> > example, etc.)
> >
> > > setClassUnion("numericOrNULL", c("numeric","NULL"))
> > [1] "numericOrNULL"
> > > setClass("foo", representation(x="numericOrNULL"))
> > [1] "foo"
> > > ff = new("foo", x= 1:10)
> > > fg = new("foo", x = NULL)
> > >
> > > ff
> > An object of class "foo"
> > Slot "x":
> > [1] 1 2 3 4 5 6 7 8 9 10
> >
> > > fg
> > An object of class "foo"
> > Slot "x":
> > NULL
> > > fk = new("foo")
> > > fk
> > An object of class "foo"
> > Slot "x":
> > NULL
> >
> > John
> >
> > Sklyar, Oleg (London) wrote:
> >> Dear list:
> >>
> >> I am looking for a good way t
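The distinction drawn in this thread (a class union that includes NULL is fine as the type of a slot, even though NULL cannot be the data part of a class) can be sketched as follows. This is only a minimal illustration that reuses the class names from the messages above; the `is()` checks at the end are an addition that does not appear in the original exchange.

```r
library(methods)

# Class union from the thread: a value is either numeric or NULL.
setClassUnion("numericOrNULL", c("numeric", "NULL"))

# Using the union as the type of a slot works for both members.
setClass("foo", representation(x = "numericOrNULL"))
ff <- new("foo", x = 1:10)
fg <- new("foo", x = NULL)

is.null(fg@x)              # the slot really holds NULL
is(1.5, "numericOrNULL")   # numeric values belong to the union
is(NULL, "numericOrNULL")  # and so does NULL
```

The failing case reported by John, `new("bar", NULL)` for a class that *contains* the union, is deliberately left out of the sketch because it stops with an error.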
[Rd] Patch for src/main/character.c, systematizing recent fix to do_grep
The attached patch provides a modification to the recent fix/improvement to do_grep already included in the most recent development version.

The original fix added new functionality to the grep function by adding a new parameter, 'invert'. In the source code for the underlying do_grep, the value of the parameter is used to invert the logical match/no-match flag vector ind. The modification is distributed across several lines of code.

The patch systematizes the solution by inverting the logical match flag vector in place, once for each element in the character vector passed to grep as the argument 'x'. In the patched version, the inversion appears just once in the code.

The patch does not modify the functionality of grep in any way. If the respective documentation was updated to cover the new functionality introduced by the original modification, it still applies to the patched version.

The patch does not solve any immediate problem. However, by replacing the redundantly distributed original modification with a one-line modification, the patch is intended to make the source code easier to understand, maintain, and further modify.

The patch also renames the variable 'invert' introduced in the original modification to 'invert_opt', for consistency with how (almost) all other logical flag parameters in do_grep are named. This modification is again functionally transparent and requires no modifications to the documentation.

The patch was prepared as follows:

svn co https://svn.R-project.org/R/trunk/
cd trunk
tools/rsync-recommended
# modifications made to src/main/character.c
svn diff > do_grep.diff

The patched sources were successfully compiled and tested as follows:

svn revert -R .
patch -p0 < do_grep.diff
./configure
make
make check

Assuming that appropriate tests were prepared for the extended version of grep as of the original modification, the patched version was successfully tested.

The patched grep was also tested as follows:

bin/R --no-save -q
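For readers who have not used the new parameter, the user-visible behaviour that the patch is meant to preserve can be sketched at the R level. This is not the C code from src/main/character.c; the vector `x` and the pattern are made-up illustrations.

```r
x <- c("apple", "berry", "banana", "cherry")

hits   <- grep("an", x)                 # indices of matching elements
misses <- grep("an", x, invert = TRUE)  # indices of non-matching elements

# Inverting the logical match vector once gives the same answer,
# which is the idea the patch applies inside do_grep.
flags <- grepl("an", x)
which(!flags)
```

Every index of `x` lands in exactly one of `hits` and `misses`, so the two calls together partition the input.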
[Rd] Why is srcref of length 6 and not 4 ?
Hello,

Consider this file (/tmp/test.R) :

f <- function( x, y = 2 ){
  z <- x + y
  print( z )
}

I get this in R 2.7.2 :

> p <- parse( "/tmp/test.R" )
> str( attr( p, "srcref" ) )
List of 1
 $ :Class 'srcref' atomic [1:4] 1 1 4 1
  .. ..- attr(*, "srcfile")=Class 'srcfile' length 4

and this in R-devel :

> p <- parse( "/tmp/test.R" )
> str( attr(p, "srcref") )
List of 1
 $ :Class 'srcref' atomic [1:6] 1 1 4 1 1 1
  .. ..- attr(*, "srcfile")=Class 'srcfile'

What are the two last numbers ?

Romain

--
Romain Francois
Independent R Consultant
+33(0) 6 28 91 30 30
http://romainfrancois.blog.free.fr

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel
[Rd] Spearman's rank correlation test
Hi All:

help(cor.test) claims

  For Spearman's test, p-values are computed using algorithm AS 89.

Algorithm AS 89 was introduced by the paper

  D. J. Best & D. E. Roberts (1975), Algorithm AS 89: The Upper Tail Probabilities of Spearman's rho. Applied Statistics, Vol. 24, No. 3, 377-379.

Table 1(a) in this paper presents the maximum absolute error |\Delta_m| of the approximation over all possible values of the statistic S for sample sizes n = 7, 9, 11, 13. The presented errors are

   n   |\Delta_m|
   7   0.0046
   9   0.0011
  11   0.0006
  13   0.0005

Due to the problem explained in detail, including a patch, at

  https://stat.ethz.ch/pipermail/r-devel/2009-January/051936.html

the error of the R implementation of Spearman's rank correlation test is larger than the above bounds for the sample size n = 11 and some of the values of S that correspond to positive correlation. For example, for n = 11 and S = 90, we have

  x <- 1:11
  y <- c(6:1, 7, 11:8)
  out <- cor.test(x, y, method="spearman", alternative="greater")
  out$statistic # 90
  out$p.value # 0.02921104

while the correct p-value is 0.03044548, so the absolute difference is 0.00123444. This is larger than the absolute error 0.0006 guaranteed for AS 89. In my opinion, this means that the claim from help(cor.test) cited above is not correct.

To see the error of AS 89 in the example above, one can use

  cor.test(x, -y, method="spearman", alternative="less")$p.value # 0.03036413

since on the side of negative correlation, R calls AS 89 correctly. So, for the x, y above, correctly called AS 89 has absolute error |0.03044548 - 0.03036413| = 0.00008135.

There is a package pspearman, currently available on CRAN, which provides a correction of the problem without the need to modify R base.

Petr.
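The arithmetic behind the comparison above can be checked with a few lines of R. The sketch recomputes only the statistic S from the data; the three p-values are copied from the message as constants rather than recomputed, since the output of cor.test may differ between R versions. The variable names (`S_pos`, `p_exact`, and so on) are illustrative.

```r
x <- 1:11
y <- c(6:1, 7, 11:8)
n <- length(x)

# Spearman's statistic S is the sum of squared rank differences.
S_pos <- sum((rank(x) - rank(y))^2)    # positive-correlation side
S_neg <- sum((rank(x) - rank(-y))^2)   # mirrored, negative-correlation side
S_pos + S_neg == n * (n^2 - 1) / 3     # the two sides are mirror images

# p-values quoted in the message above (constants, not recomputed here).
p_exact <- 0.03044548   # stated correct value
p_r     <- 0.02921104   # cor.test(x, y, alternative = "greater")
p_as89  <- 0.03036413   # cor.test(x, -y, alternative = "less")

abs(p_exact - p_r)      # discrepancy of the reported R result
abs(p_exact - p_as89)   # discrepancy of AS 89 when called correctly
```

Only the first discrepancy exceeds the bound of 0.0006 tabulated for n = 11.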
Re: [Rd] Why is srcref of length 6 and not 4 ?
On 12/02/2009 7:01 AM, Romain Francois wrote:
> Hello,
>
> Consider this file (/tmp/test.R) :
>
> f <- function( x, y = 2 ){
>   z <- x + y
>   print( z )
> }
>
> I get this in R 2.7.2 :
>
> > p <- parse( "/tmp/test.R" )
> > str( attr( p, "srcref" ) )
> List of 1
>  $ :Class 'srcref' atomic [1:4] 1 1 4 1
>   .. ..- attr(*, "srcfile")=Class 'srcfile' length 4
>
> and this in R-devel :
>
> > p <- parse( "/tmp/test.R" )
> > str( attr(p, "srcref") )
> List of 1
>  $ :Class 'srcref' atomic [1:6] 1 1 4 1 1 1
>   .. ..- attr(*, "srcfile")=Class 'srcfile'
>
> What are the two last numbers ?

The original design for srcref gave 4 entries: start line, start byte, stop line, stop byte. However, in multibyte strings, bytes don't correspond to columns, so error messages could often report the wrong location according to what a user sees in an editor.

To support the more useful error messages in R-devel, I added two more values: start column and stop column. With pure ASCII text these will be the same as start byte and stop byte; with UTF-8 text and non-ASCII characters they will be different. Other multibyte encodings are only supported if the platform can convert them to UTF-8 (and are not well tested; error reports would be welcome, if there's a way to improve the performance.)

If you are using these for error reports, I recommend using the two new values. If you are trying to retrieve the text from the source file, use the originals.

Duncan Murdoch
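The layout described above can be inspected directly. The sketch below parses a small ASCII-only function from a string instead of /tmp/test.R (an assumption made so that it runs anywhere) and looks at the first six entries; a given R version may carry further entries after these, which the sketch ignores.

```r
# keep.source = TRUE asks parse() to attach srcref information.
p <- parse(text = "f <- function(x, y = 2) {\n  x + y\n}", keep.source = TRUE)
ref <- as.integer(attr(p, "srcref")[[1]])

ref[1:4]   # start line, start byte, stop line, stop byte
ref[5:6]   # start column, stop column

# For ASCII-only source, bytes and columns coincide.
ref[2] == ref[5] && ref[4] == ref[6]
```

With non-ASCII UTF-8 source the byte and column entries would be expected to differ, which is the case the two extra values were added for.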
Re: [Rd] Why is srcref of length 6 and not 4 ?
Duncan Murdoch wrote:
> On 12/02/2009 7:01 AM, Romain Francois wrote:
>> Hello,
>>
>> Consider this file (/tmp/test.R) :
>>
>> f <- function( x, y = 2 ){
>>   z <- x + y
>>   print( z )
>> }
>>
>> I get this in R 2.7.2 :
>>
>> > p <- parse( "/tmp/test.R" )
>> > str( attr( p, "srcref" ) )
>> List of 1
>>  $ :Class 'srcref' atomic [1:4] 1 1 4 1
>>   .. ..- attr(*, "srcfile")=Class 'srcfile' length 4
>>
>> and this in R-devel :
>>
>> > p <- parse( "/tmp/test.R" )
>> > str( attr(p, "srcref") )
>> List of 1
>>  $ :Class 'srcref' atomic [1:6] 1 1 4 1 1 1
>>   .. ..- attr(*, "srcfile")=Class 'srcfile'
>>
>> What are the two last numbers ?
>
> The original design for srcref gave 4 entries: start line, start byte,
> stop line, stop byte. However, in multibyte strings, bytes don't
> correspond to columns, so error messages could often report the wrong
> location according to what a user sees in an editor.
>
> To support the more useful error messages in R-devel, I added two more
> values: start column and stop column. With pure ASCII text these will
> be the same as start byte and stop byte; with UTF-8 text and non-ASCII
> characters they will be different. Other multibyte encodings are only
> supported if the platform can convert them to UTF-8 (and are not well
> tested; error reports would be welcome, if there's a way to improve the
> performance.)
>
> If you are using these for error reports, I recommend using the two new
> values. If you are trying to retrieve the text from the source file,
> use the originals.
>
> Duncan Murdoch

Thank you Duncan,

I am using this to massage the output of "parse" into a data frame to represent it as a tree (see http://addictedtor.free.fr/misc/sidekick.png)

> cat( readLines( "/tmp/test.R" ), sep = "\n" )
f <- function( x, y = 2 ){
 z <- x + y
 g <- function( x ){
   print( x )
   xx <- x + 1
 }
 g( x )
}
>
> sidekick( "/tmp/test.R", encoding = "utf-8" )
  id parent mode     srcref1 srcref2 srcref3 srcref4 description
1  1      0 function       1       1       8       1 f <- function(x, y = 2) {
2  2      1 name           1      26       1      26 {
3  3      1 call           2       2       2      11 z <- x + y
4  4      1 function       3       2       6       2 g <- function(x) {
5  5      1 call           7       2       7       7 g(x)
6  6      4 name           3      20       3      20 {
7  7      4 call           4       4       4      13 print(x)
8  8      4 call           5       4       5      14 xx <- x + 1

--
Romain Francois
Independent R Consultant
+33(0) 6 28 91 30 30
http://romainfrancois.blog.free.fr
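A much reduced version of this idea, one row per top-level expression built from the srcref attribute, might look like the sketch below. It is not Romain's sidekick() function, whose source is not shown in the thread; the names `tree` and `pos` and the column labels are hypothetical, and the sketch does not descend into nested expressions.

```r
src <- c("f <- function(x) {", "  x + 1", "}", "g <- 2")
p <- parse(text = src, keep.source = TRUE)
refs <- attr(p, "srcref")

# One row per top-level expression; first four srcref entries as columns.
pos <- t(vapply(refs, function(r) as.integer(r)[1:4], integer(4)))
colnames(pos) <- c("first_line", "first_byte", "last_line", "last_byte")

tree <- data.frame(id = seq_along(refs),
                   mode = vapply(p, mode, character(1)),
                   pos,
                   stringsAsFactors = FALSE)
tree
```

Extending this to a full tree would mean recursing into each call and recording the parent id, as in the output shown above.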
Re: [Rd] Why is srcref of length 6 and not 4 ?
> I am using this to massage the output of "parse" into a data frame to
> represent it as a tree
> (see http://addictedtor.free.fr/misc/sidekick.png)

You might also want to take a look at http://github.com/hadley/eval.with.details/blob/master/R/parse.r where I'm trying to do something similar for a different purpose.

Hadley

--
http://had.co.nz/
[Rd] proposed simulate.glm method
I have found the "simulate" method (incorporated in some packages) very handy. As far as I can tell the only class for which simulate is actually implemented in base R is lm ... this is actually a little dangerous for a naive user who might be tempted to try simulate(X) where X is a glm fit instead, because it defaults to simulate.lm (since glm inherits from the lm class), and the answers make no sense ...

Here is my simulate.glm(), which is modeled on simulate.lm . It implements simulation for poisson and binomial (binary or non-binary) models, should be easy to implement others if that seems necessary.

I hereby request comments and suggest that it wouldn't hurt to incorporate it into base R ... (I will write docs for it if necessary, perhaps by modifying ?simulate -- there is no specific documentation for simulate.lm)

cheers
  Ben Bolker

simulate.glm <- function (object, nsim = 1, seed = NULL, ...)
{
    ## RNG stuff copied from simulate.lm
    if (!exists(".Random.seed", envir = .GlobalEnv, inherits = FALSE))
        runif(1)
    if (is.null(seed))
        RNGstate <- get(".Random.seed", envir = .GlobalEnv)
    else {
        R.seed <- get(".Random.seed", envir = .GlobalEnv)
        set.seed(seed)
        RNGstate <- structure(seed, kind = as.list(RNGkind()))
        on.exit(assign(".Random.seed", R.seed, envir = .GlobalEnv))
    }
    ## get probabilities/intensities
    pred <- matrix(rep(predict(object,type="response"),nsim),ncol=nsim)
    ntot <- length(pred)
    if (object$family$family=="binomial") {
        resp <- object$model[[1]]
        size <- if (is.matrix(resp)) rowSums(resp) else 1
    }
    val <- switch(object$family$family,
                  poisson=rpois(ntot,pred),
                  binomial=rbinom(ntot,prob=pred,size=size),
                  stop("family ",object$family$family," not implemented"))
    ans <- as.data.frame(matrix(val,ncol=nsim))
    attr(ans, "seed") <- RNGstate
    ans
}

if (FALSE) {
    ## examples: modified from ?simulate
    x <- 1:10
    n <- 10
    y <- rbinom(length(x),prob=plogis((x-5)/2),size=n)
    y2 <- c("a","b")[1+rbinom(length(x),prob=plogis((x-5)/2),size=1)]
    mod1 <- glm(cbind(y,n-y) ~ x,family=binomial)
    mod2 <- glm(factor(y2) ~ x,family=binomial)
    S1 <- simulate(mod1, nsim = 4)
    S1B <- simulate(mod2, nsim = 4)
    ## repeat the simulation:
    .Random.seed <- attr(S1, "seed")
    identical(S1, simulate(mod1, nsim = 4))

    S2 <- simulate(mod1, nsim = 200, seed = 101)
    rowMeans(S2)/10 # after correcting for binomial sample size, should be about fitted(mod1)
    plot(rowMeans(S2)/10)
    lines(fitted(mod1))

    ## repeat identically:
    (sseed <- attr(S2, "seed")) # seed; RNGkind as attribute
    stopifnot(identical(S2, simulate(mod1, nsim = 200, seed = sseed)))
}

--
Ben Bolker
Associate professor, Biology Dep't, Univ. of Florida
bol...@ufl.edu / www.zoology.ufl.edu/bolker
GPG key: www.zoology.ufl.edu/bolker/benbolker-publickey.asc
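The core of the proposal, drawing new responses from the fitted means, can be seen in isolation with a short Poisson example. This is a simplified sketch rather than the function above: it skips the seed handling and the binomial branch, and the data, the coefficient 0.2, and the names `mu` and `sims` are made up for illustration.

```r
set.seed(101)
x <- 1:10
y <- rpois(length(x), lambda = exp(0.2 * x))
mod <- glm(y ~ x, family = poisson)

nsim <- 200
mu <- predict(mod, type = "response")   # fitted intensities ("plug-in" estimates)

# One column per simulated data set; rpois() recycles mu column by column.
sims <- matrix(rpois(length(mu) * nsim, lambda = mu), ncol = nsim)

# Row means of the simulations should be close to the fitted values.
cbind(fitted = mu, simulated = rowMeans(sims))
```

Because the fitted means are used as if they were the true intensities, the simulated data sets reflect only sampling variation in the response, not uncertainty in the estimated coefficients.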
Re: [Rd] proposed simulate.glm method
There is functionality similar to this included in the Zelig package with its "sim" method. The "sim" method goes a step further and replicates the fitted model's analysis on the generated datasets as well. I would suggest taking a look -- Zelig supports most (if not all) glm models and a wide range of others.

The Zelig maintainers' site can be found at: http://gking.harvard.edu/zelig/.

Full disclosure: I am an employee of the Institute for Quantitative Social Science, which performs most of the development and support for the Zelig package.

Best,
Alex D'Amour
Statistical Programmer
Harvard Institute for Quantitative Social Science

2009/2/12 Ben Bolker :
> I have found the "simulate" method (incorporated
> in some packages) very handy. As far as I can tell the
> only class for which simulate is actually implemented
> in base R is lm ... this is actually a little dangerous
> for a naive user who might be tempted to try
> simulate(X) where X is a glm fit instead, because
> it defaults to simulate.lm (since glm inherits from
> the lm class), and the answers make no sense ...
>
> Here is my simulate.glm(), which is modeled on
> simulate.lm . It implements simulation for poisson
> and binomial (binary or non-binary) models, should
> be easy to implement others if that seems necessary.
>
> I hereby request comments and suggest that it wouldn't
> hurt to incorporate it into base R ... (I will write
> docs for it if necessary, perhaps by modifying ?simulate --
> there is no specific documentation for simulate.lm)
>
> cheers
>   Ben Bolker
> --
> Ben Bolker
> Associate professor, Biology Dep't, Univ. of Florida
> bol...@ufl.edu / www.zoology.ufl.edu/bolker
> GPG key: www.zoology.ufl.edu/bolker/benbolker-publickey.asc
Re: [Rd] proposed simulate.glm method
Elsewhere (at least in lme4), refit(sim(model)) does the same thing [and so one would need something like apply(sim(model,1000),2,refit)].

sim() is quite interesting, as is Zelig, but I'm not sure I am ready to leap to it yet -- this was basically a suggestion that simulate.glm could be included in "vanilla" R ...

Also (for better or worse), it looks like sim() also does parametric bootstrapping on the parameter values, whereas simulate.[g]lm() just uses "plug-in" estimates.

cheers
  Ben Bolker

Alex D'Amour wrote:
> There is functionality similar to this included in the Zelig package
> with its "sim" method. The "sim" method goes a step further and
> replicates the fitted model's analysis on the generated datasets as
> well. I would suggest taking a look -- Zelig supports most (if not
> all) glm models and a wide range of others.
>
> The Zelig maintainers' site can be found at: http://gking.harvard.edu/zelig/.
>
> Full disclosure: I am an employee of the Institute for Quantitative
> Social Science, which performs most of the development and support for
> the Zelig package.
>
> Best,
> Alex D'Amour
> Statistical Programmer
> Harvard Institute for Quantitative Social Science
>
> 2009/2/12 Ben Bolker :
>> I have found the "simulate" method (incorporated
>> in some packages) very handy. As far as I can tell the
>> only class for which simulate is actually implemented
>> in base R is lm ... this is actually a little dangerous
>> for a naive user who might be tempted to try
>> simulate(X) where X is a glm fit instead, because
>> it defaults to simulate.lm (since glm inherits from
>> the lm class), and the answers make no sense ...
>>
>> Here is my simulate.glm(), which is modeled on
>> simulate.lm . It implements simulation for poisson
>> and binomial (binary or non-binary) models, should
>> be easy to implement others if that seems necessary.
>>
>> I hereby request comments and suggest that it wouldn't
>> hurt to incorporate it into base R ... (I will write
>> docs for it if necessary, perhaps by modifying ?simulate --
>> there is no specific documentation for simulate.lm)
>>
>> cheers
>>   Ben Bolker
>> --
>> Ben Bolker
>> Associate professor, Biology Dep't, Univ. of Florida
>> bol...@ufl.edu / www.zoology.ufl.edu/bolker
>> GPG key: www.zoology.ufl.edu/bolker/benbolker-publickey.asc

--
Ben Bolker
Associate professor, Biology Dep't, Univ. of Florida
bol...@ufl.edu / www.zoology.ufl.edu/bolker
GPG key: www.zoology.ufl.edu/bolker/benbolker-publickey.asc
[Rd] typo in example(dbEscapeStrings) (PR#13521)
Full_Name: Mayeul Kauffmann
Version: 2.8.1
OS: x86_64-pc-linux-gnu (kubuntu)
Submission from: (NULL) (86.200.212.40)

The file /library/RMySQL/html/dbEscapeStrings.html documents dbEscapeStrings(). In the example, an 's' is missing in line 3:

## Not run:
tmp <- sprintf("select * from emp where lname = %s", "O'Reilly")
sql <- dbEscapeString(con, tmp)
dbGetQuery(con, sql)
## End(Not run)

The line

sql <- dbEscapeString(con, tmp)

should be:

sql <- dbEscapeStrings(con, tmp)