10GB will definitely be fine (and will fit into memory on many modern
machines!).
Somewhere north of a 1TB design matrix, you will be breaking new ground,
mostly in terms of how well your basic algorithm holds up. LSMR should
still work, but size does matter, often in surprising ways.
On Fri, Greg Sterijevski wrote:
Hi Ted,
I will look at the Mahout library. I was not aware of it. I will see
whether it is applicable to my problems.
A large problem would be one where it does not make sense to pull all the
data into core, whether it's 10GB or 100TB. While some of these design
matrices might be sparse, there is no guarantee of sparsity in general.
Mahout has this.
We have an LSMR implementation that can accept a generic linear operator.
You can implement this linear operator as an out-of-core multiplication or
as a cluster operation.
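To make that concrete, here is a minimal sketch in Java of what such a generic
linear operator can look like: an LSMR-style solver only ever asks for the
products A*v and A'*u, so the operator can stream the design matrix from disk
one row at a time. The LinearOperator and OutOfCoreOperator names and the flat
row-major binary file layout are assumptions for illustration, not Mahout's
actual API.

    import java.io.BufferedInputStream;
    import java.io.DataInputStream;
    import java.io.FileInputStream;
    import java.io.IOException;

    // Hypothetical abstraction: the solver never sees matrix entries,
    // only the results of the two matrix-vector products it needs.
    interface LinearOperator {
        double[] times(double[] v);          // A * v, result has length m
        double[] transposeTimes(double[] u); // A' * u, result has length n
    }

    // Out-of-core operator for an m x n design matrix stored as a flat
    // row-major binary file of doubles; only one row (n doubles) is ever
    // held in memory.
    class OutOfCoreOperator implements LinearOperator {
        private final String path;
        private final int m;
        private final int n;

        OutOfCoreOperator(String path, int m, int n) {
            this.path = path;
            this.m = m;
            this.n = n;
        }

        @Override
        public double[] times(double[] v) {
            final double[] result = new double[m];
            forEachRow((i, row) -> {
                double dot = 0.0;
                for (int j = 0; j < n; j++) {
                    dot += row[j] * v[j]; // result[i] = row_i . v
                }
                result[i] = dot;
            });
            return result;
        }

        @Override
        public double[] transposeTimes(double[] u) {
            final double[] result = new double[n];
            forEachRow((i, row) -> {
                for (int j = 0; j < n; j++) {
                    result[j] += row[j] * u[i]; // accumulate u[i] * row_i
                }
            });
            return result;
        }

        private interface RowVisitor {
            void visit(int i, double[] row);
        }

        // Streams the matrix row by row from disk.
        private void forEachRow(RowVisitor visitor) {
            try (DataInputStream in = new DataInputStream(
                    new BufferedInputStream(new FileInputStream(path)))) {
                double[] row = new double[n];
                for (int i = 0; i < m; i++) {
                    for (int j = 0; j < n; j++) {
                        row[j] = in.readDouble();
                    }
                    visitor.visit(i, row);
                }
            } catch (IOException e) {
                throw new RuntimeException("failed to read design matrix at " + path, e);
            }
        }
    }

With this layout, each solver iteration costs two sequential passes over the
file (one per product), so I/O per iteration rather than total memory becomes
the binding constraint.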
You don't say how large you want the system to be or whether you have sparse
data. That might change the answer.
On 6/24/11 11:44 AM, Greg Sterijevski wrote:
> [...]
Hello All,
I have been a user of the math commons jar for a little over a year and am
very impressed with it. I was wondering whether anyone is actively working
on implementing functionality to do regressions on very very large data
sets. The current implementation of the OLS routine is an in-core
algorithm, so the entire data set has to fit in memory.
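One way to sidestep that in-core limitation is an updating regression that
folds observations in one at a time. Below is a minimal sketch using
MillerUpdatingRegression from org.apache.commons.math3.stat.regression
(assuming the commons-math 3.x line); the class is real, but the CSV layout
and file path are placeholders for illustration.

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;

    import org.apache.commons.math3.stat.regression.MillerUpdatingRegression;
    import org.apache.commons.math3.stat.regression.RegressionResults;

    public class StreamingOls {
        public static void main(String[] args) throws IOException {
            final int k = 3; // number of regressors, excluding the intercept

            // Miller's updating algorithm keeps only O(k^2) state, so the
            // observations never have to fit in core.
            MillerUpdatingRegression regression =
                    new MillerUpdatingRegression(k, true); // true = fit an intercept

            // Stream observations from a CSV file laid out as x1,x2,x3,y per
            // line; "data.csv" is a placeholder path.
            try (BufferedReader reader = new BufferedReader(new FileReader("data.csv"))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    String[] fields = line.split(",");
                    double[] x = new double[k];
                    for (int j = 0; j < k; j++) {
                        x[j] = Double.parseDouble(fields[j]);
                    }
                    double y = Double.parseDouble(fields[k]);
                    regression.addObservation(x, y); // O(k^2) update; the row is then discarded
                }
            }

            RegressionResults results = regression.regress();
            double[] beta = results.getParameterEstimates(); // beta[0] is the intercept
            for (int j = 0; j < beta.length; j++) {
                System.out.printf("beta[%d] = %.6f%n", j, beta[j]);
            }
        }
    }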