[Rd] [RfC] Family dispersion
Hi, I'd like to hear your opinion about the following proposal to make the computation of dispersion in GLMs more flexible. Dispersion is used in summary.glm; the relevant code chunk with the dispersion calculation is listed below (from glm.R): summary.glm <- function(object, dispersion = NULL, correlation = FALSE, symbolic.cor = FALSE, ...) { est.disp <- FALSE df.r <- object$df.residual if(is.null(dispersion)) # calculate dispersion if needed dispersion <- if(object$family$family %in% c("poisson", "binomial")) 1 else if(df.r > 0) { est.disp <- TRUE if(any(object$weights==0)) warning("observations with zero weight not used for calculating dispersion") sum((object$weights*object$residuals^2)[object$weights > 0])/ df.r } else { est.disp <- TRUE NaN } # ... } Many exponential families have unit dispersion, or can be cast to have unit dispersion, e.g. hypergeometric, negative binomial, and so on. However, summary.glm only assigns unit dispersion to Poisson and binomial families, as the code above indicates. My suggestion is to make this check more general by having a 'dispersion' slot in the family class; for instance, we would have poisson(...)$dispersion = 1 and binomial(...)$dispersion = 1. The updated summary.glm would be: default.dispersion <- function (object, ...) { df.r <- object$df.residual if (df.r > 0) { if (any(object$weights == 0)) warning("observations with zero weight not used for calculating dispersion") sum((object$weights * object$residuals ^ 2)[object$weights > 0]) / df.r } else NaN } summary.glm <- function(object, dispersion = default.dispersion, correlation = FALSE, symbolic.cor = FALSE, ...) { if (!is.null(object$family$dispersion)) # use family dispersion? dispersion <- object$family$dispersion est.disp <- is.function(dispersion) dispersion <- if (est.disp) dispersion(object, ...) else dispersion df.r <- object$df.residual # ... (unchanged code below) } Note that 'dispersion' can be a function taking a glm object or a number (e.g. 1). Here are some examples: R> library(MASS) R> gm <- glm(formula, family=Gamma()) R> summary(gm, dispersion = gamma.dispersion) # ML estimate of dispersion R> set.dispersion <- function (fam, disp) # update family dispersion R> structure(within(unclass(fam), dispersion <- disp), class = "family") R> gm <- glm(formula, family=set.dispersion(Gamma(), gamma.dispersion)) R> summary(gm) # use family dispersion R> Exp <- function (...) set.dispersion(Gamma(...), 1) Thanks in advance for the feedback. Cheers, Luis -- Computers are useless. They can only give you answers. -- Pablo Picasso -- Luis Carvalho Associate Professor Dept. of Mathematics and Statistics Boston University http://math.bu.edu/people/lecarval __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Matrix memory layout R vs. C
Hi Larissa, > I'm trying to pass a matrix from R to C, where some computation is > done for performance reasons, and back to R for evaluation. But I've > run into the problem that R and C seem to have different ways of > representing the matrix in main memory. The C representation of a 2D > matrix in linear memory is concatenation of the rows whereas in R, > it's a concatenation of the columns. That leads to the problem. R uses column-major order [1] because that's the order used by FORTRAN and R uses many libraries with a FORTRAN interface (most important: BLAS and LAPACK for numerical linear algebra.) That's the situation with many other languages / libraries that use similar interfaces, such as MATLAB, Octave, Julia, and Scilab [1]. So, there's no way around it and you just have to get used to referencing matrix entries in col-major order. [1] http://en.wikipedia.org/wiki/Row-major_order > Here's an example of C code that simply prints the matrix it gets from R: Try this instead: #include #include "R.h" void printMatrix(int *mPtr, int m, int n) { int j,k; for(j = 0; j < m; j++){ for(k = 0; k < n; k++) { printf("%d\t", mPtr[j + m * k]); } printf("\n"); } } > No matter if you create the matrix with byrow=TRUE or FALSE, C > always interprets it the other way round. Is there a way to avoid > this? I've read previous posts on passing a matrix from R to C, but > the essence of the answers was that "a matrix in R is just a vector > with attributes", but I don't see how this helps. Maybe someone can > clarify. Specifying byrow=TRUE only changes how the matrix is *read*, not how it's stored. A matrix -- and, more generally, an array -- is in fact just a vector with (dimension) attributes, but that just specifies the memory layout of the matrix, and not its representation (that is, it doesn't help.) Unfortunately, there's no way to avoid this, but it shouldn't be too bad to get used to it. :) Cheers, Luis -- Computers are useless. They can only give you answers. -- Pablo Picasso -- Luis Carvalho (Kozure) lua -e 'print((("lexcarva...@no.gmail.spam.com"):gsub("(%u+%.)","")))' __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel
Re: [Rd] Matrix memory layout R vs. C
> And to be safer on a 64-bit platform > > #define INDEX(i,j) ((i) + rows*(R_xlen_t)(j)) > > since rows*j might overflow there. Shouldn't 'rows' be also a parameter? #define INDEX(rows,i,j) ((i) + (rows)*((R_xlen_t)(j))) Cheers, Luis -- Computers are useless. They can only give you answers. -- Pablo Picasso -- Luis Carvalho (Kozure) lua -e 'print((("lexcarva...@no.gmail.spam.com"):gsub("(%u+%.)","")))' __ R-devel@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-devel