Re: [R] Use apply only on non-missing values

Doran, Harold Thu, 03 Jun 2010 10:40:36 -0700

Actually, this is clever. I modified your advice and vectorized this as:

1/ (1 +   exp(b_vector - t(matrix(theta, nrow= nrow(dat) , ncol= ncol(dat)))))

Instead of using the apply() function as I did before. In terms of speed, this 
new solution is immensely faster, as you also noted. Now, whether it works on 
my real world problem is TBD. It is running now, slowly.

From: Joris Meys [mailto:jorism...@gmail.com]
Sent: Wednesday, June 02, 2010 7:35 PM
To: Doran, Harold
Cc: r-help@r-project.org
Subject: Re: [R] Use apply only on non-missing values

Not really a direct answer on your question, but:
> system.time(replicate(10000,apply(as.matrix(theta), 1, rasch, b_vector)))
   user  system elapsed
   4.51    0.03    4.55

> system.time(replicate(10000,theta%*%t(b_vector)))
   user  system elapsed
   0.25    0.00    0.25

It does make a difference on large datasets...
Cheers
Joris
On Wed, Jun 2, 2010 at 4:44 PM, Doran, Harold 
<hdo...@air.org<mailto:hdo...@air.org>> wrote:
I have a function that I am currently using very inefficiently. The following 
are needed to illustrate the problem:

set.seed(12345)
dat <- matrix(sample(c(0,1), 110, replace = TRUE), nrow = 11, ncol=10)
mis <- sample(1:110, 5)
dat[mis] <- NA
theta <- rnorm(11)
b_vector <- runif(10, -4,4)
empty <- which(is.na<http://is.na>(t(dat)))

So, I have a matrix (dat) with some values within the matrix missing. In my 
real world problem, the matrix is huge, and most values are missing. The 
function in question is called derivs() and is below. But, let me step through 
the inefficient portions.

First, I create a matrix of some predicted probabilities as:

rasch <- function(theta,b) 1/ (1 + exp(b-theta))
mat <- apply(as.matrix(theta), 1, rasch, b_vector)

However, I only need those predicted probabilities in places where the data are 
not missing. So, the next step in the function is

mat[empty] <- NA

which manually places NAs in places where the data are missing (notice the 
matrix 'mat' is the transpose of the data matrix and so I get the empty 
positions from the transpose of dat).

Afterwards, the function computes the gradient and hessians needed to complete 
the MLE estimation.

All of this works in the sense that it yields the correct answers for my 
problem. But, the glaring problem is that I create predicted probabilities for 
every cell in 'mat' when in many cases they are not needed. I end up replacing 
those values with NAs. In my real world problem, this is horribly inefficient 
and slow.

My question is then is there a way to use apply such that is computes the 
necessary predicted probabilities only when the data are not missing to yield 
the matrix 'mat'. My desired end result is the matrix 'mat' created after the 
manually placing the NAs in the appropriate cells.

Thanks
Harold

derivs <- function(dat, b_vector, theta){
                               mat <- apply(as.matrix(theta), 1, rasch, 
b_vector)
                               mat[empty] <- NA
                               gradient <- -(colSums(dat, na.rm = TRUE) - 
rowSums(mat, na.rm = TRUE))
                               hessian <-  -(rowSums(mat * (1-mat), na.rm = 
TRUE))
                               list('gradient' = gradient, 'hessian' = hessian)
               }

> sessionInfo()
R version 2.10.1 (2009-12-14)
i386-pc-mingw32

locale:
[1] LC_COLLATE=English_United States.1252  LC_CTYPE=English_United States.1252  
  LC_MONETARY=English_United States.1252
[4] LC_NUMERIC=C                           LC_TIME=English_United States.1252

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base

loaded via a namespace (and not attached):
[1] tools_2.10.1
>

       [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org<mailto:R-help@r-project.org> mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

--
Joris Meys
Statistical Consultant

Ghent University
Faculty of Bioscience Engineering
Department of Applied mathematics, biometrics and process control

Coupure Links 653
B-9000 Gent

tel : +32 9 264 59 87
joris.m...@ugent.be
-------------------------------
Disclaimer : http://helpdesk.ugent.be/e-maildisclaimer.php

        [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

Re: [R] Use apply only on non-missing values

Reply via email to