Isn't de-dupe conceptually just another flavor of a journaling file 
system, in the sense that in many systems only a small part of a file 
actually changes each time, so saving "diffs" lets you reconstruct any 
arbitrary version in much less space?
I guess de-dupe is a bit more aggressive than that, in that it can 
theoretically look for common "stuff" between unrelated files, so maybe a 
better model is a "data compression" algorithm run on the fly.  And for 
that, it's all about trading off the cost of storage space, retrieval 
time, and the computational effort to run the algorithm.  (Reliability 
factors into it a bit too: compression removes redundancy, after all, but 
the de facto redundancy provided by keeping previous versions around 
isn't a good "system" solution, even if it's the one people actually use.)
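
To make the "compression on the fly" idea concrete, here's a rough sketch 
of hash-based block dedupe in Python.  Purely illustrative, under my own 
assumptions -- real systems use content-defined chunking, persistent 
indexes, and worry about hash collisions:

import hashlib

def dedupe_store(data: bytes, block_size: int = 512):
    """Split data into fixed-size blocks, storing each unique block once.

    Returns (block_store, recipe): the recipe is the ordered list of
    block hashes needed to reconstruct the original, so a near-duplicate
    file costs only its recipe plus its few novel blocks.
    """
    block_store = {}  # hash -> block bytes (the pool of unique blocks)
    recipe = []
    for i in range(0, len(data), block_size):
        block = data[i:i + block_size]
        digest = hashlib.sha256(block).digest()
        block_store.setdefault(digest, block)
        recipe.append(digest)
    return block_store, recipe

def reconstruct(block_store, recipe) -> bytes:
    return b"".join(block_store[h] for h in recipe)

The three costs show up directly: hashing is the computational effort, 
the shared block pool is the storage saving, and chasing the recipe is 
the retrieval cost.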

I think one can make the argument that computation keeps getting cheaper 
at a faster rate than storage density or speed (because of the physics 
limits on the storage), so the "span" over which you can do compression 
can be arbitrarily increased over time.  TIFF and fax do compression over 
a few bits.  Zip and its ilk do compression over kilobits or megabits 
(depending on whether they build a custom symbol table).  Dedupe is 
presumably doing compression over gigabits and terabits, although I 
assume there's a granularity floor at some point: a dedupe system looks 
at symbols that are, say, 512 bytes long, as opposed to ZIP looking at 
8-bit symbols, or Group 4 fax looking at 1-bit symbols.
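
A quick synthetic way to see that granularity knob: make a lightly edited 
copy of a "file" and count how many fixed-size blocks you'd actually have 
to store at different symbol sizes.  The data and numbers here are made 
up by the script, not from any real dedupe product:

import hashlib, random

def unique_bytes(data: bytes, block_size: int) -> int:
    """Bytes actually stored under fixed-size block dedupe."""
    seen = {hashlib.sha256(data[i:i + block_size]).digest()
            for i in range(0, len(data), block_size)}
    return len(seen) * block_size

random.seed(0)
base = bytes(random.getrandbits(8) for _ in range(256 * 1024))
copy = bytearray(base)
for _ in range(100):                  # 100 scattered one-byte edits
    copy[random.randrange(len(copy))] ^= 0xFF
data = base + bytes(copy)             # original + near-duplicate

for bs in (64, 512, 4096):
    stored = unique_bytes(data, bs)
    print(f"block={bs:5d}  saved={1 - stored / len(data):.1%}")

Smaller symbols isolate each edit, so more of the near-duplicate dedupes 
away; the price is a proportionally bigger hash index to search, which is 
exactly the compute-versus-space trade.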

Hierarchical storage is really optimizing along a different axis than 
compression.  It's more like a cache than compression: make the "average 
time to get to the next bit you need" smaller, rather than "make the 
number of bits smaller."

Granted, for a lot of systems, "time to get a bit" is proportional to 
"number of bits".
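
The cache view has a standard back-of-the-envelope formula; the tier 
timings below are invented just to show the shape of it:

# Average access time for a two-tier store: plain cache arithmetic.
hit_rate  = 0.95      # fraction of requests served by the fast tier
t_fast_ms = 0.1       # hypothetical online disk
t_slow_ms = 5000.0    # hypothetical tape/offline recall

avg_ms = hit_rate * t_fast_ms + (1 - hit_rate) * t_slow_ms
print(f"average access time: {avg_ms:.2f} ms")   # ~250 ms

# Dedupe, by contrast, shrinks the number of bits; that only improves
# access time insofar as transfer time scales with size.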

On 6/5/09 8:00 AM, "Joe Landman" <land...@scalableinformatics.com> wrote:

> John Hearns wrote:
>> 2009/6/5 Mark Hahn <h...@mcmaster.ca>:
>>> I'm not sure - is there some clear indication that one level of storage is
>>> not good enough?
>
> I hope I pointed this out before, but Dedup is all about reducing the
> need for the less expensive 'tier'.  Tiered storage has some merits,
> especially in the 'infinite size' storage realm.  Take some things
> offline, leave things you need online until they go dormant.  Define
> dormant on your own terms.

