Re: [Beowulf] dedupe filesystem

2009-06-05 Thread Mark Hahn
IMO, this is a dubious assertion. I bought a couple incredibly cheap desktop disks for home use a couple weeks ago: just seagate 7200.12's. Are you happy with the 7200.12 so far? I must admit the awful 7200.11 (the 1 and 1.5 TByte variety) has quite soured me on Seagate. I haven't had any probl

Re: [Beowulf] dedupe filesystem

2009-06-05 Thread Lawrence Stewart
On Jun 5, 2009, at 1:04 PM, Lux, James P wrote: --- Many years ago I read an interesting paper talking about how modern user interfaces are hobbled by assumptions incorporated decades ago. When disks are slow and space is precious, having users decide to explicitly save their file while edi

Re: [Beowulf] dedupe filesystem

2009-06-05 Thread Lawrence Stewart
On Jun 5, 2009, at 1:12 PM, Joe Landman wrote: Lux, James P wrote: It only looks at raw blocks. If they have the same hash signatures (think like MD5 or SHA ... hopefully with fewer collisions), then they are duplicates. maybe a better model is a “data compression” algorithm on the fl
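The block-hash model described in the snippet above can be sketched in a few lines of Python. This is an illustrative sketch only, not any particular product's design: the 4 KiB block size, function names, and the choice of SHA-256 (picked over MD5 for its lower collision risk, as the message hints) are all assumptions.

```python
import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size, purely for illustration

def dedupe_store(data: bytes):
    """Split data into fixed-size blocks; keep one copy per unique hash."""
    store = {}   # hash -> block contents, stored only once
    recipe = []  # ordered hash list needed to reconstruct the original data
    for i in range(0, len(data), BLOCK_SIZE):
        block = data[i:i + BLOCK_SIZE]
        h = hashlib.sha256(block).hexdigest()
        store.setdefault(h, block)  # duplicate blocks are not stored again
        recipe.append(h)
    return store, recipe

def rebuild(store, recipe):
    """Reassemble the original data from the recipe of block hashes."""
    return b"".join(store[h] for h in recipe)

# Four blocks, only two of them unique: the store holds two entries.
data = b"A" * 8192 + b"B" * 4096 + b"A" * 4096
store, recipe = dedupe_store(data)
assert rebuild(store, recipe) == data
assert len(store) == 2
```

As the thread notes, a real deduplicating store must also worry about hash collisions and the I/O cost of scattering logically adjacent blocks across the disk.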

Re: [Beowulf] dedupe filesystem

2009-06-05 Thread Joe Landman
Lux, James P wrote: Isn’t de-dupe just another flavor, conceptually, of a journaling file system..in the sense that in many systems, only a small part of the file actually changes each time, so saving “diffs” allows one to reconstruct any arbitrary version with much smaller file space. Its re

Re: [Beowulf] dedupe filesystem

2009-06-05 Thread Lux, James P
On 6/5 So what we really want is a storage system that will swallow up drives as they get bigger and bigger - so as your researchers create more and more data, or stream in more and more satellite/accelerator data/logs of phone calls (a la GCHQ) then your storage system is expanding at a faster

Re: [Beowulf] dedupe filesystem

2009-06-05 Thread Lux, James P
On 6/5/09 10:18 AM, "John Hearns" wrote: > 2009/6/5 Lux, James P : >>> >> In theory, then, with sufficient computational power (and that's what this >> list is all about) with the data on a small thumb drive I should be able to >> reconstruct everything, in every version, I've ever created

Re: [Beowulf] dedupe filesystem

2009-06-05 Thread Joe Landman
John Hearns wrote: In 2009, twenty years later, I think he might have a different take on this. I put all my bits onto floppies when I left there, and moved the important ones to spinning rust. I can still read the floppies. I doubt he can still read the tapes. This is I think referred to as

Re: [Beowulf] dedupe filesystem

2009-06-05 Thread John Hearns
2009/6/5 Lux, James P : >> > In theory, then, with sufficient computational power (and that’s what this > list is all about)  with the data on a small thumb drive I should be able to > reconstruct everything,  in every version, I’ve ever created or will create. >  All it takes is a sufficiently pow

Re: [Beowulf] dedupe filesystem

2009-06-05 Thread John Hearns
2009/6/5 Joe Landman : > > A brief anecdote.  In 1989, a fellow graduate student was leaving for > another school.  He was taking his data with him.  He spooled up a Vax 8650 > unit with a tape.  I asked him why this over other media.  His response was, > you can read a Vax tape anywhere. > > In 20

Re: [Beowulf] dedupe filesystem

2009-06-05 Thread Lux, James P
Isn't de-dupe just another flavor, conceptually, of a journaling file system..in the sense that in many systems, only a small part of the file actually changes each time, so saving "diffs" allows one to reconstruct any arbitrary version with much smaller file space. I guess the de-dupe is a bit
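The diff-versus-dedupe idea above can be sketched with a toy delta scheme: keep a full base version, then for each later version store only the blocks that changed. This is a hypothetical illustration assuming fixed-size, whole-block granularity; real journaling filesystems and delta encoders are far more sophisticated.

```python
BLOCK = 4096  # assumed fixed block size for the sketch

def blocks(data: bytes):
    """Split data into fixed-size blocks."""
    return [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]

def make_delta(old: bytes, new: bytes):
    """Record (index, block) pairs only where the new version differs."""
    ob, nb = blocks(old), blocks(new)
    delta = [(i, b) for i, b in enumerate(nb) if i >= len(ob) or ob[i] != b]
    return delta, len(nb)  # also keep the new version's block count

def apply_delta(old: bytes, delta, nblocks: int) -> bytes:
    """Reconstruct the new version from the old one plus the delta."""
    out = (blocks(old) + [b""] * nblocks)[:nblocks]  # pad or truncate
    for i, b in delta:
        out[i] = b
    return b"".join(out)

v1 = b"x" * 8192                    # two blocks
v2 = b"x" * 4096 + b"y" * 4096      # only the second block changed
delta, n = make_delta(v1, v2)
assert len(delta) == 1              # one changed block stored, not two
assert apply_delta(v1, delta, n) == v2
```

Reconstructing an arbitrary version means replaying the chain of deltas from the base, which is exactly the "much smaller file space at the cost of more computation" trade the message describes.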

Re: [Beowulf] dedupe filesystem

2009-06-05 Thread Joe Landman
John Hearns wrote: There is a science fiction novel which describes how women will live forever. As women become older, their life expectancy will increase as new and expensive treatments become available to medical science to extend their lifetime. At the point where the rate of increase become

Re: [Beowulf] dedupe filesystem

2009-06-05 Thread John Hearns
2009/6/5 Joe Landman : > > > Look at it this way ... what is the cost/benefit to the movie-company to > buy/build expensive storage and build tiers, as compared to much less > expensive replicated/HSMed storage?  I think the writing is clearly on the > wall on this.  Lots of the folks in this indust

Re: [Beowulf] dedupe filesystem

2009-06-05 Thread Joe Landman
John Hearns wrote: 2009/6/5 Mark Hahn : I'm not sure - is there some clear indication that one level of storage is not good enough? I hope I pointed this out before, but Dedup is all about reducing the need for the less expensive 'tier'. Tiered storage has some merits, especially in the 'in

Re: [Beowulf] dedupe filesystem

2009-06-05 Thread Loic Tortay
Mark Hahn wrote: [...] this seems like a bad design to me. I would think (and I'm reasonably familiar with Lustre, though not an internals expert) that if you're going to touch Lustre interfaces at all, you should simply add cheaper, higher-density OSTs, and make more intelligent placement/mi

Re: [Beowulf] dedupe filesystem

2009-06-05 Thread Jeff Layton
John Hearns wrote: ps. The robinhood file scanning utility which the Lustre DSM project intends to use looked good to me. I downloaded it and tried to compile it up - it claims to compile without having all the Lustre libraries on the system, but did not. Grr... AFAIK it's Lustre specific

Re: [Beowulf] dedupe filesystem

2009-06-05 Thread Eugen Leitl
On Fri, Jun 05, 2009 at 09:52:55AM -0400, Mark Hahn wrote: > IMO, this is a dubious assertion. I bought a couple incredibly cheap > desktop disks for home use a couple weeks ago: just seagate 7200.12's. Are you happy with the 7200.12 so far? I must admit the awful 7200.11 (the 1 and 1.5 TByte va

Re: [Beowulf] dedupe filesystem

2009-06-05 Thread John Hearns
ps. The robinhood file scanning utility which the Lustre DSM project intends to use looked good to me. I downloaded it and tried to compile it up - it claims to compile without having all the Lustre libraries on the system, but did not. Grr ___ Beo

Re: [Beowulf] dedupe filesystem

2009-06-05 Thread Mark Hahn
The best of both worlds would certainly be a central, fast storage filesystem, coupled with a hierarchical storage management system. I'm not sure - is there some clear indication that one level of storage is not good enough? I guess it strongly depends on your workload and applications. If yo

Re: [Beowulf] dedupe filesystem

2009-06-05 Thread John Hearns
2009/6/5 Mark Hahn : > > I'm not sure - is there some clear indication that one level of storage is > not good enough? That is well worthy of a debate. As the list knows, I am a fan of HSMs - for the very good reason of having good experience with them. There are still arguments made that 'front

Re: [Beowulf] dedupe filesystem

2009-06-05 Thread Kilian CAVALOTTI
On Friday 05 June 2009 15:52:55 Mark Hahn wrote: > > The best of both worlds would certainly be a central, fast storage > > filesystem, coupled with a hierarchical storage management system. > > I'm not sure - is there some clear indication that one level of storage is > not good enough? I guess i

Re: [Beowulf] dedupe filesystem

2009-06-05 Thread Mark Hahn
have tiered storage today, but in the future i can see a need to have a storage pool with SATA and a storage pool with SAS or faster drives in it. IMO, this is a dubious assertion. I bought a couple incredibly cheap desktop disks for home use a couple weeks ago: just seagate 7200.12's. these ar

Re: [Beowulf] dedupe filesystem

2009-06-05 Thread John Hearns
2009/6/5 Kilian CAVALOTTI : >> > Oh wait, it might exist already... Well, at least it's in the works: Sun and > CEA are working on implementing such an HSM for Lustre 2.0. See > http://wiki.lustre.org/images/8/8b/AurelienDegremont.pdf for details. > That looks interesting, thank you. The Robinhood

Re: [Beowulf] dedupe filesystem

2009-06-05 Thread Kilian CAVALOTTI
On Wednesday 03 June 2009 14:55:52 Michael Di Domenico wrote: > Do you find such a policy hard to enforce with researchers? I don't > have tiered storage today, but in the future i can see a need to have > a storage pool with SATA and a storage pool with SAS or faster drives > in it. Some of the

Re: [Beowulf] Re: dedupe Filesystem

2009-06-05 Thread Kilian CAVALOTTI
On Wednesday 03 June 2009 14:24:19 Lawrence Stewart wrote: > I probably don't need to remind anyone here that deduplication on a > live filesystem (as opposed to backups) can have really bad > performance effects. Imagine if you have to move the disk arms around > for every file for every block of