Hi All,

I have a 1000x1000000 matrix. 
The calculation I would like to do is actually very simple: for each row, 
calculate the frequency of a given pattern. For example, a toy dataset is as 
follows.

Col1    Col2    Col3    Col4
01      02      02      00              => Freq of "02" is 0.5
02      02      02      01              => Freq of "02" is 0.75
00      02      01      01              …
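For concreteness, here is a small sketch (my own construction, not from the data) that reproduces the toy example above and computes the per-row frequency directly:

```r
# Toy data: 3 rows of two-character codes, matching the table above
m <- matrix(c("01", "02", "02", "00",
              "02", "02", "02", "01",
              "00", "02", "01", "01"),
            nrow = 3, byrow = TRUE)
# Compare every entry to "02", then average the logicals row-wise
freq <- rowMeans(m == "02")
freq   # 0.50 0.75 0.25
```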

My code to find the pattern "02" is as follows.

OccurrenceRate_Fun <- function(dataMatrix)
{
  # apply() over rows transposes: each column of tmpMatrix holds the
  # match() result (1 where the entry is "02", NA otherwise) for one
  # row of dataMatrix
  tmpMatrix <- apply(dataMatrix, 1, match, "02")
  tmp <- numeric(ncol(tmpMatrix))   # preallocate instead of growing with c()
  for (i in 1:ncol(tmpMatrix))
  {
    # count the non-NA matches; table(...)[[ 1]] would fail on rows
    # containing no "02" at all
    tmp[i] <- sum(!is.na(tmpMatrix[, i])) / nrow(tmpMatrix)
  }
  rm(tmpMatrix)
  gc()          # must run before return(); after return() it was unreachable
  return(tmp)
}

The problem is that memory usage grows very quickly, which makes the 
computation hard to run on machines with less RAM.
Could anyone please give me some suggestions on how to reduce the space 
complexity of this calculation?
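For reference, here is one possible sketch of a lower-memory approach: a vectorized comparison processed in row blocks, so only a block-sized logical matrix exists at any time instead of the full intermediate match matrix. The `blockSize` parameter and the character-matrix assumption are mine, not from the original post:

```r
# Per-row frequency of `pattern`, computed in row blocks to cap peak memory.
# Assumes dataMatrix holds character codes such as "02".
OccurrenceRate_Fun2 <- function(dataMatrix, pattern = "02",
                                blockSize = 10000L) {
  n <- nrow(dataMatrix)
  rates <- numeric(n)
  for (s in seq(1L, n, by = blockSize)) {
    e <- min(s + blockSize - 1L, n)
    # the comparison allocates at most blockSize x ncol logicals at once
    rates[s:e] <- rowMeans(dataMatrix[s:e, , drop = FALSE] == pattern)
  }
  rates
}
```

With a block size tuned to available RAM, this avoids both the `apply()` transpose copy and the growing result vector.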

Thanks,
Mike
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
