Re: Additions to support Large Linear Regression problems

2011-06-25 Thread Ted Dunning
10 GB will definitely be fine (and will fit into memory on many modern machines!). Somewhere north of a 1 TB design matrix, you will be breaking new ground, mostly in terms of how well your basic algorithm works. LSMR should still work, but size does matter, often in surprising ways. On Fri

Re: Additions to support Large Linear Regression problems

2011-06-24 Thread Greg Sterijevski
Hi Ted, I will look at the Mahout library; I was not aware of it. I will see whether it is amenable to my problems. A large problem would be one where it does not make sense to pull all the data into core, whether it's 10 GB or 100 TB. While some of these design matrices might be sparse, there is no

Re: Additions to support Large Linear Regression problems

2011-06-24 Thread Ted Dunning
Mahout has this. We have an LSMR implementation that can accept a generic linear operator. You can implement this linear operator as an out-of-core multiplication or as a cluster operation. You don't say how large you want the system to be, or whether you have sparse data. That might change the
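The "generic linear operator" idea above can be made concrete with a short sketch. The key point is that an iterative least-squares solver such as LSMR never needs the design matrix A itself, only the products Ax and A^T y, so A can live on disk or across a cluster. The interface and class names below are hypothetical illustrations, not Mahout's actual API:

```java
// Minimal sketch of a linear-operator abstraction (hypothetical names,
// not Mahout's API). An iterative solver like LSMR needs only two
// operations from the design matrix A: y = A x and x = A^T y, so A
// never has to be materialized in memory.
interface LinearOperator {
    int numRows();
    int numCols();
    double[] times(double[] x);          // y = A x
    double[] transposeTimes(double[] y); // x = A^T y
}

// Dense in-memory implementation for illustration; an out-of-core
// version would stream rows of A from disk inside times() and
// transposeTimes() instead of holding the array.
class DenseOperator implements LinearOperator {
    private final double[][] a;
    DenseOperator(double[][] a) { this.a = a; }
    public int numRows() { return a.length; }
    public int numCols() { return a[0].length; }
    public double[] times(double[] x) {
        double[] y = new double[numRows()];
        for (int i = 0; i < numRows(); i++)
            for (int j = 0; j < numCols(); j++)
                y[i] += a[i][j] * x[j];
        return y;
    }
    public double[] transposeTimes(double[] y) {
        double[] x = new double[numCols()];
        for (int i = 0; i < numRows(); i++)
            for (int j = 0; j < numCols(); j++)
                x[j] += a[i][j] * y[i];
        return x;
    }
}
```

Because a row-streaming or cluster-backed implementation satisfies the same two-method contract, the solver code is unchanged whether the matrix is 10 GB or 100 TB.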

[math] Re: Additions to support Large Linear Regression problems

2011-06-24 Thread Phil Steitz
On 6/24/11 11:44 AM, Greg Sterijevski wrote: > Hello All, > > I have been a user of the math commons jar for a little over a year and am > very impressed with it. I was wondering whether anyone is actively working > on implementing functionality to do regressions on very, very large data > sets. The

Additions to support Large Linear Regression problems

2011-06-24 Thread Greg Sterijevski
Hello All, I have been a user of the math commons jar for a little over a year and am very impressed with it. I was wondering whether anyone is actively working on implementing functionality to do regressions on very, very large data sets. The current implementation of the OLS routine is an in-core
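One standard way to take the in-core OLS routine mentioned above out of core is to stream over the observations once, accumulating the small k-by-k matrix X^T X and the k-vector X^T y, and then solve the normal equations (X^T X) b = X^T y in memory; storage is O(k^2) regardless of the number of rows. The class below is an illustrative sketch of that idea, not Commons Math's actual API (which the thread does not describe):

```java
// Hedged sketch of streaming OLS via the normal equations. Rows of the
// design matrix are folded in one at a time, so the full data set never
// needs to fit in memory. Class and method names are illustrative.
class StreamingOls {
    private final int k;
    private final double[][] xtx; // accumulates X^T X (k x k)
    private final double[] xty;   // accumulates X^T y (length k)

    StreamingOls(int k) {
        this.k = k;
        this.xtx = new double[k][k];
        this.xty = new double[k];
    }

    // Fold in one observation: a row x of the design matrix and its response y.
    void addObservation(double[] x, double y) {
        for (int i = 0; i < k; i++) {
            xty[i] += x[i] * y;
            for (int j = 0; j < k; j++) xtx[i][j] += x[i] * x[j];
        }
    }

    // Solve (X^T X) b = X^T y by Gaussian elimination with partial pivoting.
    double[] solve() {
        double[][] a = new double[k][k + 1]; // augmented system [X^T X | X^T y]
        for (int i = 0; i < k; i++) {
            System.arraycopy(xtx[i], 0, a[i], 0, k);
            a[i][k] = xty[i];
        }
        for (int p = 0; p < k; p++) {
            int max = p; // partial pivoting for numerical stability
            for (int r = p + 1; r < k; r++)
                if (Math.abs(a[r][p]) > Math.abs(a[max][p])) max = r;
            double[] tmp = a[p]; a[p] = a[max]; a[max] = tmp;
            for (int r = p + 1; r < k; r++) {
                double f = a[r][p] / a[p][p];
                for (int c = p; c <= k; c++) a[r][c] -= f * a[p][c];
            }
        }
        double[] b = new double[k]; // back-substitution
        for (int i = k - 1; i >= 0; i--) {
            double s = a[i][k];
            for (int j = i + 1; j < k; j++) s -= a[i][j] * b[j];
            b[i] = s / a[i][i];
        }
        return b;
    }
}
```

Note the caveat that makes this thread's LSMR discussion relevant: forming X^T X squares the condition number of the problem, so for badly conditioned design matrices an iterative method working directly on X is preferable.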