[Rd] [RfC] Family dispersion

2016-06-01 Thread Luis Carvalho
Hi,

I'd like to hear your opinion about the following proposal to make the
computation of dispersion in GLMs more flexible. Dispersion is used in
summary.glm; the relevant code chunk with the dispersion calculation is listed
below (from glm.R):


summary.glm <- function(object, dispersion = NULL,
correlation = FALSE, symbolic.cor = FALSE, ...)
{
  est.disp <- FALSE
  df.r <- object$df.residual
  if(is.null(dispersion))   # calculate dispersion if needed
dispersion <-
  if(object$family$family %in% c("poisson", "binomial"))  1
  else if(df.r > 0) {
est.disp <- TRUE
if(any(object$weights==0))
  warning("observations with zero weight not used for calculating 
dispersion")
sum((object$weights*object$residuals^2)[object$weights > 0])/ df.r
  } else {
est.disp <- TRUE
NaN
  }
  # ...
}


Many exponential families have unit dispersion, or can be cast to have unit
dispersion, e.g. hypergeometric, negative binomial, and so on. However,
summary.glm only assigns unit dispersion to Poisson and binomial families, as
the code above indicates. My suggestion is to make this check more general by
having a 'dispersion' slot in the family class; for instance, we would have
poisson(...)$dispersion = 1 and binomial(...)$dispersion = 1. The updated
summary.glm would be:


default.dispersion <- function (object, ...) {
  df.r <- object$df.residual
  if (df.r > 0) {
if (any(object$weights == 0))
  warning("observations with zero weight not used for calculating 
dispersion")
sum((object$weights * object$residuals ^ 2)[object$weights > 0]) / df.r
  }
  else NaN
}

summary.glm <- function(object, dispersion = default.dispersion,
correlation = FALSE, symbolic.cor = FALSE, ...)
{
  if (!is.null(object$family$dispersion)) # use family dispersion?
dispersion <- object$family$dispersion
  est.disp <- is.function(dispersion)
  dispersion <- if (est.disp) dispersion(object, ...) else dispersion

  df.r <- object$df.residual
  # ... (unchanged code below)
}


Note that 'dispersion' can be a function taking a glm object or a number
(e.g. 1). Here are some examples:

R> library(MASS)
R> gm <- glm(formula, family=Gamma())
R> summary(gm, dispersion = gamma.dispersion) # ML estimate of dispersion

R> set.dispersion <- function (fam, disp) # update family dispersion
R>   structure(within(unclass(fam), dispersion <- disp), class = "family")
R> gm <- glm(formula, family=set.dispersion(Gamma(), gamma.dispersion))
R> summary(gm) # use family dispersion
R> Exp <- function (...) set.dispersion(Gamma(...), 1)

Thanks in advance for the feedback.

Cheers,
Luis


-- 
Computers are useless. They can only give you answers.
-- Pablo Picasso

-- 
Luis Carvalho
Associate Professor
Dept. of Mathematics and Statistics
Boston University
http://math.bu.edu/people/lecarval

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Matrix memory layout R vs. C

2013-12-06 Thread Luis Carvalho
Hi Larissa,

> I'm trying to pass a matrix from R to C, where some computation is
> done for performance reasons, and back to R for evaluation. But I've
> run into the problem that R and C seem to have different ways of
> representing the matrix in main memory. The C representation of a 2D
> matrix in linear memory is concatenation of the rows whereas in R,
> it's a concatenation of the columns.  That leads to the problem.



R uses column-major order [1] because that's the order used by FORTRAN and R
uses many libraries with a FORTRAN interface (most important: BLAS and LAPACK
for numerical linear algebra.) That's the situation with many other languages
/ libraries that use similar interfaces, such as MATLAB, Octave, Julia, and
Scilab [1]. So, there's no way around it and you just have to get used to
referencing matrix entries in col-major order.

[1] http://en.wikipedia.org/wiki/Row-major_order


> Here's an example of C code that simply prints the matrix it gets from R:



Try this instead:

#include 
#include "R.h"

void printMatrix(int *mPtr, int m, int n) {
  int j,k;

  for(j = 0; j < m; j++){
for(k = 0; k < n; k++) {
  printf("%d\t", mPtr[j + m * k]);
}
printf("\n");
  }
}

> No matter if you create the matrix with byrow=TRUE or FALSE, C
> always interprets it the other way round. Is there a way to avoid
> this? I've read previous posts on passing a matrix from R to C, but
> the essence of the answers was that "a matrix in R is just a vector
> with attributes", but I don't see how this helps. Maybe someone can
> clarify.

Specifying byrow=TRUE only changes how the matrix is *read*, not how it's
stored. A matrix -- and, more generally, an array -- is in fact just a vector
with (dimension) attributes, but that just specifies the memory layout of the
matrix, and not its representation (that is, it doesn't help.) Unfortunately,
there's no way to avoid this, but it shouldn't be too bad to get used to it.
:)

Cheers,
Luis

-- 
Computers are useless. They can only give you answers.
-- Pablo Picasso

-- 
Luis Carvalho (Kozure)
lua -e 'print((("lexcarva...@no.gmail.spam.com"):gsub("(%u+%.)","")))'

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel


Re: [Rd] Matrix memory layout R vs. C

2013-12-06 Thread Luis Carvalho
> And to be safer on a 64-bit platform
> 
> #define INDEX(i,j) ((i) + rows*(R_xlen_t)(j))
> 
> since rows*j might overflow there.

Shouldn't 'rows' be also a parameter?

#define INDEX(rows,i,j) ((i) + (rows)*((R_xlen_t)(j)))

Cheers,
Luis

-- 
Computers are useless. They can only give you answers.
    -- Pablo Picasso

-- 
Luis Carvalho (Kozure)
lua -e 'print((("lexcarva...@no.gmail.spam.com"):gsub("(%u+%.)","")))'

__
R-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-devel