> On May 25, 2017, at 11:48, Tim Cutts <t...@sanger.ac.uk> wrote:
> 
> Neither is replication a backup, and for the same reason.  However, at large 
> data scales formal backups become prohibitively expensive, and therefore 
> people use replication or erasure coding instead, and have to accept that 
> while they're protected against hardware failure, they're not very well 
> protected against user failure.
> 
> This is a really thorny issue.  On our archival storage platform for our raw 
> sequencing data, where we use iRODS to manage the data, the data is 
> replicated, and there are tight controls on who is allowed to modify the data 
> (essentially, no-one - even the data owners are not allowed to modify or 
> delete their own data on that platform; they have to make a specific request 
> to a core team responsible for the archive)

I’d be interested to hear what people are doing, generally, to back up very 
large volumes of data (volumes that probably seem small to more established 
centers), in the 500 TB to 1 PB range. It sounds to me like a combination of 
replication and filesystem snapshots (whether or not the snapshots themselves 
are replicated) does protect against both hardware failure and user failure, 
depending on the snapshot frequency and whether or not you have any other 
hidden weaknesses.
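To make the frequency point concrete: with snapshots, the window in which a 
user error is recoverable is exactly your retention policy. A minimal sketch 
of one such policy (hypothetical names and cutoffs, not any particular 
filesystem's built-in mechanism) — keep the newest snapshot per day for a 
week, plus the newest per ISO week for a month:

```python
from datetime import datetime, timedelta

def snapshots_to_keep(snapshots, now, keep_daily=7, keep_weekly=4):
    """Return the subset of snapshot timestamps to retain:
    the newest snapshot of each of the last `keep_daily` days,
    plus the newest snapshot of each of the last `keep_weekly`
    ISO weeks.  Everything else is a pruning candidate."""
    by_day, by_week = {}, {}
    for ts in sorted(snapshots, reverse=True):      # newest first
        by_day.setdefault(ts.date(), ts)            # newest per calendar day
        by_week.setdefault(ts.isocalendar()[:2], ts)  # newest per (year, week)
    keep = {ts for day, ts in by_day.items()
            if (now.date() - day).days < keep_daily}
    for week in sorted(by_week, reverse=True)[:keep_weekly]:
        keep.add(by_week[week])
    return keep
```

Pruned snapshots free space but shrink the recovery window — which is the 
trade-off the whole question is really about.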

--
____
|| \\UTGERS,     |---------------------------*O*---------------------------
||_// the State  |         Ryan Novosielski - novos...@rutgers.edu
|| \\ University | Sr. Technologist - 973/972.0922 (2x0922) ~*~ RBHS Campus
||  \\    of NJ  | Office of Advanced Research Computing - MSB C630, Newark
     `'

_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf
