On Jan 5, 2006, at 7:33 PM, <[EMAIL PROTECTED]> wrote:
> The empirically derived limit on my machine (under R 1.9.1) was
> approximately 7500 data points. I have been able to successfully run
> the script that uses package MCLUST on several hundred smaller data
> sets.
>
> I had even written a work-around for the case of more than 9600 data
> points. My work-around first orders the points by their value and then
> takes a sample (e.g. every other point, or 1 point every n points) in
> order to bring the number under 9600. No problems with the computations
> were observed, but you are correct that a deconvolution on a data set
> of 9600 points takes almost 30 minutes. However, for our purposes we do
> not have many data sets over 9600 points, so the time is not a major
> constraint.
>
> Unfortunately, my management does not like using a work-around and
> really wants to operate on the larger data sets. I was told to find a
> way to make it work on the larger data sets, or to avoid using R and
> find another solution.

Well, sure, if your only concern is memory, then moving to unix will give
you several hundred more data points to work with. I would recommend
unix, preferably 64-bit, because then there is practically no software
limit on the size of virtual memory. Nevertheless, there is still a limit
of ca. 4GB for a single vector, so that should give you around 32500 rows
that mclust can handle as-is (I don't want to see the runtime, though
;)). For anything else you'll really have to think about another
approach...

Cheers,
Simon
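For reference, the subsampling work-around described in the quoted
message might look roughly like the following in R. This is only a
sketch: the function name thin_and_fit, the max.n argument, and the call
to Mclust() (which assumes a reasonably recent version of the mclust
package) are illustrative choices, not code from the original poster.

library(mclust)

# Thin a univariate data set to at most max.n points (keep 1 point every
# 'step' points after ordering by value), then fit a mixture model.
thin_and_fit <- function(x, max.n = 9600) {
  x <- sort(x)                              # order the points by value
  if (length(x) > max.n) {
    step <- ceiling(length(x) / max.n)
    x <- x[seq(1, length(x), by = step)]    # keep 1 point every 'step' points
  }
  Mclust(x)                                 # deconvolution on the reduced data
}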
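Simon's 32500-row figure can also be sanity-checked with a little
arithmetic. The calculation below assumes the dominant allocation grows
like one 8-byte double per pair of points, i.e. n*(n-1)/2 doubles; that
is a guess at the reasoning behind the figure, not a statement about
mclust's internals.

bytes.per.double <- 8
vector.limit     <- 4 * 1024^3                   # ca. 4GB for a single vector
max.pairs        <- vector.limit / bytes.per.double
# solve n * (n - 1) / 2 <= max.pairs for n
max.n <- floor((1 + sqrt(1 + 8 * max.pairs)) / 2)
max.n             # about 32768, in the same ballpark as the 32500 quoted above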