Hi All,
I have a 1000x1000000 matrix.
The calculation I would like to do is actually very simple: for each row,
calculate the frequency of a given pattern. For example, a toy dataset is as
follows.
Col1 Col2 Col3 Col4
01 02 02 00 => Freq of “02” is 0.5
02 02 02 01 => Freq of “02” is 0.75
00 02 01 01 …
My code to find the pattern "02" is as follows; it is quite simple.
OccurrenceRate_Fun <- function(dataMatrix)
{
    # match() marks each "02" entry with 1 and everything else with NA,
    # so each column of tmpMatrix corresponds to one row of dataMatrix.
    tmpMatrix <- apply(dataMatrix, 1, match, "02")
    tmp <- numeric(ncol(tmpMatrix))   # preallocate instead of growing with c()
    for (i in 1:ncol(tmpMatrix))
    {
        # counting non-NA entries also works for rows with zero matches,
        # where table(...)[[1]] would fail
        tmp[i] <- sum(!is.na(tmpMatrix[, i])) / nrow(tmpMatrix)
    }
    rm(tmpMatrix)
    return(tmp)
}
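For reference, the same per-row frequency can be computed without building the intermediate match matrix at all; a minimal sketch, assuming the matrix holds character entries like "02" (the function name `OccurrenceRate_vec` is illustrative):

```r
# Row-wise frequency of a pattern: the comparison yields a logical matrix,
# and rowMeans() averages TRUE/FALSE (treated as 1/0) across each row.
OccurrenceRate_vec <- function(dataMatrix, pattern = "02") {
  rowMeans(dataMatrix == pattern)
}

# Toy dataset from above
m <- matrix(c("01", "02", "02", "00",
              "02", "02", "02", "01",
              "00", "02", "01", "01"),
            nrow = 3, byrow = TRUE)
OccurrenceRate_vec(m)  # 0.50 0.75 0.25
```

The comparison still allocates one logical matrix the size of the input, but it avoids the transposed copy produced by `apply()` and the per-row `table()` calls.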
The problem is that the memory usage grows very quickly, which is hard to handle on machines with less RAM.
Could anyone please give me some comments on how to reduce the space complexity
of this calculation?
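One way to keep the working set small is to process the matrix in blocks of rows, so that only the logical comparison for the current block is in memory at any time. A sketch under that assumption (the function name and `block_size` parameter are illustrative):

```r
# Process rows in blocks: the temporary logical matrix never exceeds
# block_size * ncol(dataMatrix) entries, regardless of nrow(dataMatrix).
OccurrenceRate_blocked <- function(dataMatrix, pattern = "02", block_size = 1000) {
  n <- nrow(dataMatrix)
  out <- numeric(n)
  for (start in seq(1, n, by = block_size)) {
    end <- min(start + block_size - 1, n)
    # drop = FALSE keeps the subset a matrix even when the block has one row
    out[start:end] <- rowMeans(dataMatrix[start:end, , drop = FALSE] == pattern)
  }
  out
}
```

Tuning `block_size` trades peak memory against loop overhead; with 1,000 rows of 1,000,000 columns, even a block of a few rows keeps each temporary to a few million logical values.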
Thanks,
Mike
______________________________________________
[email protected] mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.