> On May 25, 2017, at 11:48, Tim Cutts <t...@sanger.ac.uk> wrote:
>
> Neither is replication a backup, and for the same reason. However, at large
> data scales formal backups become prohibitively expensive, and therefore
> people use replication or erasure coding instead, and have to accept that
> while they're protected against hardware failure, they're not very well
> protected against user failure.
>
> This is a really thorny issue. On our archival storage platform for our raw
> sequencing data, where we use iRODS to manage the data, the data is
> replicated, and there are tight controls on who is allowed to modify the data
> (essentially, no-one - even the data owners are not allowed to modify or
> delete their own data on that platform; they have to make a specific request
> to a core team responsible for the archive)
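For concreteness, here's a minimal sketch of the sort of lockdown Tim describes, assuming the stock iRODS icommands (irepl/ichmod) driven from Python with an authenticated iinit session already in place; the collection, resource, and group names are all invented for illustration:

#!/usr/bin/env python3
# Rough sketch only: replicate an archive collection and strip the
# owners' write access, using the stock iRODS icommands via subprocess.
# Assumes an authenticated iinit session; all names below are invented.
import subprocess

ARCHIVE_COLL = "/seqZone/archive/run1234"  # hypothetical collection
SECOND_RESC = "replResc"                   # hypothetical second resource

def icmd(*args):
    """Run one icommand, raising CalledProcessError on failure."""
    subprocess.run(list(args), check=True)

# 1. Create a second replica of everything in the collection.
icmd("irepl", "-r", "-R", SECOND_RESC, ARCHIVE_COLL)

# 2. Owners drop to read-only; only the archive team keeps ownership,
#    so any deletion has to go through them.
icmd("ichmod", "-r", "read", "dataOwners", ARCHIVE_COLL)
icmd("ichmod", "-r", "own", "archiveTeam", ARCHIVE_COLL)

The point being that once the ACLs are set that way, even the owners can't delete anything without a specific request to whoever holds the archiveTeam account, which is exactly the protection against user failure that plain replication lacks.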
I'd be interested to hear what people are doing, generally, about backing up very large volumes of data (volumes that probably seem small to more established centers), say 500 TB to 1 PB. It sounds to me like a combination of replication and filesystem snapshots (themselves replicated or not) does protect against both hardware failure and user failure, depending on the snapshot frequency and whether or not you have any other hidden weaknesses. A rough sketch of the sort of snapshot rotation I have in mind follows below the signature.

--
 ____
|| \\UTGERS,     |---------------------------*O*---------------------------
||_// the State  |         Ryan Novosielski - novos...@rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
||  \\    of NJ  | Office of Advanced Research Computing - MSB C630, Newark
     `'
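As promised above the signature, a minimal sketch of the rotation I mean, assuming ZFS (any snapshot-capable filesystem works the same way); the dataset name and retention count are made-up examples:

#!/usr/bin/env python3
# Minimal snapshot-rotation sketch, assuming ZFS and run from cron.
# Takes a timestamped snapshot of one dataset and prunes everything
# older than the newest KEEP snapshots. Names/values are examples.
import subprocess
from datetime import datetime, timezone

DATASET = "tank/projects"  # hypothetical dataset
PREFIX = "auto-"
KEEP = 24                  # retention: newest 24 snapshots

def zfs(*args):
    return subprocess.run(["zfs", *args], check=True,
                          capture_output=True, text=True).stdout

# 1. Take a new timestamped snapshot.
stamp = datetime.now(timezone.utc).strftime("%Y%m%d-%H%M%S")
zfs("snapshot", f"{DATASET}@{PREFIX}{stamp}")

# 2. List this dataset's snapshots, oldest first.
names = zfs("list", "-t", "snapshot", "-H", "-o", "name",
            "-s", "creation", "-d", "1", DATASET).splitlines()
ours = [n for n in names if f"@{PREFIX}" in n]

# 3. Destroy all but the newest KEEP.
for snap in ours[:-KEEP]:
    zfs("destroy", snap)

Frequency is the knob that matters for user failure: hourly snapshots with a day or two of retention catch most fat-finger deletions, and replicating the snapshots to a second box covers the hardware side.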