> On May 25, 2017, at 4:10 PM, Kilian Cavalotti <kilian.cavalotti.w...@gmail.com> wrote:
>
> On Thu, May 25, 2017 at 8:58 AM, Ryan Novosielski <novos...@rutgers.edu> wrote:
>> I’d be interested to hear what people are doing, generally, about backing up
>> very large volumes of data (that probably seem smaller to more established
>> centers), like 500TB to 1PB. It sounds to me like a combination of
>> replication and filesystem snapshots (those replicated or not) do protect
>> against hardware failure and user failure, depending on the frequency and
>> whether or not you have any other hidden weaknesses.
>
> At Stanford, we (Research Computing) have developed a PoC using Lustre
> HSM and a Google Drive backend to back up our /scratch filesystem,
> mostly because Google Drive is free and unlimited for .edu accounts
> (^_^). We didn't announce anything to our users, so they wouldn't start
> relying on it, and we use it more as insurance against user
> "creativity" than as a real disaster-recovery mechanism.
>
> We found that this works quite well for backing up large files, but
> not so well for smaller ones, because Google enforces secret
> file-operation rate limits (I say secret because they're not the ones
> that are documented, and support doesn't want to talk about them),
> which I guess is fair for a free and unlimited service. But it means
> that for a filesystem with hundreds of millions of files, this is not
> really appropriate.
>
> We did some tests of restoring data from the Google Drive backend, and
> another limitation of the current Lustre HSM implementation is that
> the HSM coordinator doesn't prioritize restore operations. That means
> that if you have thousands of "archive" operations in the queue, the
> coordinator needs to work through all of them before processing your
> "restore" ops, which, again, in real life might be a deal-breaker for
> disaster recovery.
>
> Anyway, we had quite some fun doing it, including some nice chats with
> the Networking people on campus (which actually led to a new 100G
> data link being deployed). We've released the open-source Lustre HSM
> to Google Drive copytool that we developed on GitHub
> (https://github.com/stanford-rc/ct_gdrive). And we're now the proud
> users of about 3.3 PB on Google Drive (screenshot attached, because
> it happened).
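For anyone on the list wanting to poke at this themselves: the archive and
restore requests Kilian describes are issued through the standard lfs hsm_*
commands on a Lustre client. Below is a minimal sketch of scripting them from
Python; the file path and archive index are hypothetical, and it assumes a
mounted Lustre filesystem with a copytool (such as ct_gdrive) registered:

#!/usr/bin/env python3
# Rough sketch of driving Lustre HSM operations, assuming a mounted
# Lustre client and a registered copytool (e.g. ct_gdrive).
# The file path and archive index below are hypothetical examples.
import subprocess

SCRATCH_FILE = "/scratch/users/jdoe/results.tar"  # hypothetical path
ARCHIVE_ID = "1"                                  # hypothetical archive index

def lfs(*args):
    """Run an 'lfs' subcommand and return its stdout."""
    result = subprocess.run(
        ["lfs", *args], check=True, capture_output=True, text=True
    )
    return result.stdout

# Queue an archive request; the HSM coordinator dispatches it to the
# copytool, which copies the file out to the backend (Google Drive here).
lfs("hsm_archive", "--archive", ARCHIVE_ID, SCRATCH_FILE)

# Optionally release the local data blocks, leaving a "released" stub
# whose contents are fetched back from the backend on access.
lfs("hsm_release", SCRATCH_FILE)

# Queue a restore. As described above, this request sits in the same
# queue as pending archive operations -- the coordinator does not move
# it to the front, which is the prioritization limitation in question.
lfs("hsm_restore", SCRATCH_FILE)

# Inspect the file's HSM flags (exists, archived, released, ...).
print(lfs("hsm_state", SCRATCH_FILE))

The hsm_restore step is exactly where the queue behavior Kilian mentions
would bite during a large recovery.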
Boy, that’s great, Kilian, thanks! I’m already glad I asked.

--
 ____
|| \\UTGERS,     |---------------------------*O*---------------------------
||_// the State  | Ryan Novosielski - novos...@rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
||  \\    of NJ  | Office of Advanced Research Computing - MSB C630, Newark
     `'