Adrienne,
I have a crew just starting work on this problem this week, but I
think in your case the best solution is more memory. R stores the
distance matrix as a vector with (n^2-n)/2 entries. It's perfectly
dense and immune to sparse matrix compaction. It would be quite
possible to multiply your values by 1000 and store as integers (given
your range of values) and reduce the space required significantly, but
the regression function you ultimately pass this to expects doubles and
you would have to cast the values back to double on the fly, resulting
in wasted time and a matrix that is as big as it would have been anyway.
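To put rough numbers on that tradeoff, here's a quick sketch (sizes are illustrative; it assumes the 0-3.5 range Adrienne mentioned):

```r
# Sketch: scaling distances (range ~0-3.5) by 1000 and storing as integers
# halves the in-memory size, but coercing back to double for the regression
# recreates a full-size double vector anyway.
x  <- runif(1e6, 0, 3.5)             # doubles: 8 bytes per entry
xi <- as.integer(round(x * 1000))    # integers: 4 bytes per entry
print(object.size(x))                # roughly 8 MB
print(object.size(xi))               # roughly 4 MB
xd <- xi / 1000                      # cast back to double: roughly 8 MB again
print(object.size(xd))
```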
Our view is that while it's possible to store the distance matrix on
disk rather than in memory, all the functions that accept this matrix as
an argument would also have to be rewritten to work with the distances on
disk.
We're looking into doing this for PCO and NMDS, but someone would have
to do the same for the geographically weighted regression. I'm sure
that's doable, but certainly not trivial.
Dave
On 06/05/2014 08:56 AM, Adrienne Wootten wrote:
Jim,
There are not going to be additional copies of the distance matrix. The
distance matrix is what is needed for a geographically weighted regression,
and I estimate the result of that to be a SpatialPointsDataFrame of
roughly 30000 rows by 7 columns. That's much smaller than the distance
matrix, though I'm not sure of its exact size given the class of that object.
Potentially lots of room taken up. At the moment the machine in question
has 32 GB to work with, but we may be able to shift this onto machines with
more RAM.
A
On Thu, Jun 5, 2014 at 10:00 AM, jim holtman <jholt...@gmail.com> wrote:
The real question is how much memory the machine that you are working
on has. The 32000 x 32000 matrix will take up ~8 GB of physical memory, so
how much memory will the rest of your objects take up? Are any of them
going to be copies of the distance matrix, or is it always going to be
unchanged? Normally my rule of thumb is that I should have 3-4 times the
largest object I am working with, since I may be making copies of it. So it
is important to understand how the rest of your program will be using this
large matrix and what type of operations you will be doing on it.
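Jim's ~8 GB figure falls straight out of the element count; here's the back-of-envelope check, using the dimensions from this thread:

```r
# Back-of-envelope: a 32000 x 32000 double matrix at 8 bytes per entry.
n <- 32000
full_gb <- n^2 * 8 / 2^30            # full square matrix: ~7.6 GiB
tri_gb  <- (n^2 - n) / 2 * 8 / 2^30  # "dist"-style lower triangle: ~3.8 GiB
print(c(full = full_gb, triangle = tri_gb))
# with the 3-4x rule of thumb, plan on roughly 23-30 GiB of headroom
```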
Jim Holtman
Data Munger Guru
What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
On Thu, Jun 5, 2014 at 9:48 AM, Adrienne Wootten <amwoo...@ncsu.edu>
wrote:
Jim,
At the moment I'm using write.table. I tried using write.matrix from the
MASS package, but that failed. Integers are not appropriate here because
we are working with fractions of miles for some locations, and that needs
to be retained. The range is from 0 to about 3.5 (it's a little less than
that, including the decimals).
I haven't tried the save function yet, but I wasn't aware of that one
previously. Thanks for pointing that out.
The bigger concern is reading and working with that dataset in the other
calculation, though.
Adrienne
On Thu, Jun 5, 2014 at 9:37 AM, jim holtman <jholt...@gmail.com> wrote:
How are you writing it out now? Are you using 'save', which will
compress the file? What is the range of numbers in the matrix? Can you
scale them to integers, which might save some space? You did not provide
enough information for a definitive answer.
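Along the lines of Jim's 'save' suggestion, here's a sketch of the compressed-binary route (saveRDS is the single-object cousin of save; a small matrix stands in for the real one):

```r
# Sketch: saveRDS writes a compressed binary image of the object. It is
# typically far smaller than write.table's text output and much faster to
# reload with readRDS, and values round-trip exactly.
m <- matrix(runif(100, 0, 3.5), 10, 10)
f <- tempfile(fileext = ".rds")
saveRDS(m, f, compress = "xz")       # "xz" trades CPU time for smaller files
m2 <- readRDS(f)
print(identical(m, m2))              # TRUE: no precision lost
```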
Jim Holtman
Data Munger Guru
What is the problem that you are trying to solve?
Tell me what you want to do, not how you want to do it.
On Thu, Jun 5, 2014 at 9:26 AM, Adrienne Wootten <amwoo...@ncsu.edu>
wrote:
All,
Got a tricky situation and unfortunately because it's a big file I can't
exactly provide an example, so I'll describe this as best I can for
everyone.
I have a distance matrix that we are using for a modeling calculation in
space for multiple days. Since the matrix is never going to change for
different dates, I want to keep the matrix in a file and refer to that so I
don't have to repeat the calculation over and over again. The problem is
it's a 32000 x 32000 matrix, which works out to roughly 15 GB of storage.
This makes it a trick to read the file back into R, and it leaves me with
two questions for the group.

Is there any way to have R write this out so that it takes up less space?
I know R primarily treats numbers as doubles, but I'm trying to find a way
to get R to write the values as floats or singles.

Given how big it is, it may not be wise to save it as an object in R when
read in, so I'm wondering: is there any way to have R do the calculation it
needs to do without saving the matrix as an object in R? Basically, can I
have it run the calculation off the file itself?
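On the float question: base R can't hold single-precision values in memory, but writeBin can write them to disk at 4 bytes each. A sketch with a small vector (the precision loss is tiny at this range of values):

```r
# Sketch: writeBin(..., size = 4) converts doubles to single-precision
# floats on disk, halving the file; readBin(..., size = 4) reads them back
# as doubles. Rounding error is ~1e-7 for values in the 0-3.5 range.
x <- runif(1e5, 0, 3.5)
f <- tempfile()
con <- file(f, "wb"); writeBin(x, con, size = 4); close(con)
con <- file(f, "rb"); y <- readBin(con, "double", n = 1e5, size = 4); close(con)
print(file.size(f))                  # 400000 bytes instead of 800000
print(max(abs(x - y)))               # well below a thousandth of a mile
```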
Thanks!
Adrienne
--
Adrienne Wootten
Graduate Research Assistant
State Climate Office of North Carolina
Department of Marine, Earth and Atmospheric Sciences
North Carolina State University
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide
http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
--
Adrienne Wootten
Graduate Research Assistant
State Climate Office of North Carolina
Department of Marine, Earth and Atmospheric Sciences
North Carolina State University
--
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
David W. Roberts office 406-994-4548
Professor and Head FAX 406-994-3190
Department of Ecology email drobe...@montana.edu
Montana State University
Bozeman, MT 59717-3460