There are no dumb questions in this snafu; I have already covered the dumb aspects adequately. :)
Replication was not enabled; this was scratch space set up to be as large
and fast as possible. The fact that I can say "it was scratch" doesn't make
it sting less, thus the grasping at straws.

jbh

On Sat, Apr 29, 2017, 7:05 PM Evan Burness <evan.burn...@cyclecomputing.com> wrote:

> Hi John,
>
> I'm not a GPFS expert, but I did manage some staff that ran GPFS
> filesystems while I was at NCSA. Those folks reeeaaalllly knew what they
> were doing.
>
> Perhaps a dumb question, but should we infer from your note that metadata
> replication is not enabled across those 4 NSDs handling it?
>
> Best,
>
> Evan
>
> -------------------------
> Evan Burness
> Director, HPC
> Cycle Computing
> evan.burn...@cyclecomputing.com
> (919) 724-9338
>
> On Sat, Apr 29, 2017 at 9:36 AM, Peter St. John <peter.st.j...@gmail.com> wrote:
>
>> Just a friendly reminder that while the probability of a particular
>> coincidence might be very low, the probability that there will be *some*
>> coincidence is very high.
>>
>> Peter (pedant)
>>
>> On Sat, Apr 29, 2017 at 3:00 AM, John Hanks <griz...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I'm not getting much useful vendor information, so I thought I'd ask
>>> here in the hope that a GPFS expert can offer some advice. We have a
>>> GPFS system with the following disk config:
>>>
>>> [root@grsnas01 ~]# mmlsdisk grsnas_data
>>> disk         driver   sector     failure holds    holds                            storage
>>> name         type       size       group metadata data  status        availability pool
>>> ------------ -------- ------ ----------- -------- ----- ------------- ------------ ------------
>>> SAS_NSD_00   nsd         512         100 No       Yes   ready         up           system
>>> SAS_NSD_01   nsd         512         100 No       Yes   ready         up           system
>>> SAS_NSD_02   nsd         512         100 No       Yes   ready         up           system
>>> SAS_NSD_03   nsd         512         100 No       Yes   ready         up           system
>>> SAS_NSD_04   nsd         512         100 No       Yes   ready         up           system
>>> SAS_NSD_05   nsd         512         100 No       Yes   ready         up           system
>>> SAS_NSD_06   nsd         512         100 No       Yes   ready         up           system
>>> SAS_NSD_07   nsd         512         100 No       Yes   ready         up           system
>>> SAS_NSD_08   nsd         512         100 No       Yes   ready         up           system
>>> SAS_NSD_09   nsd         512         100 No       Yes   ready         up           system
>>> SAS_NSD_10   nsd         512         100 No       Yes   ready         up           system
>>> SAS_NSD_11   nsd         512         100 No       Yes   ready         up           system
>>> SAS_NSD_12   nsd         512         100 No       Yes   ready         up           system
>>> SAS_NSD_13   nsd         512         100 No       Yes   ready         up           system
>>> SAS_NSD_14   nsd         512         100 No       Yes   ready         up           system
>>> SAS_NSD_15   nsd         512         100 No       Yes   ready         up           system
>>> SAS_NSD_16   nsd         512         100 No       Yes   ready         up           system
>>> SAS_NSD_17   nsd         512         100 No       Yes   ready         up           system
>>> SAS_NSD_18   nsd         512         100 No       Yes   ready         up           system
>>> SAS_NSD_19   nsd         512         100 No       Yes   ready         up           system
>>> SAS_NSD_20   nsd         512         100 No       Yes   ready         up           system
>>> SAS_NSD_21   nsd         512         100 No       Yes   ready         up           system
>>> SSD_NSD_23   nsd         512         200 Yes      No    ready         up           system
>>> SSD_NSD_24   nsd         512         200 Yes      No    ready         up           system
>>> SSD_NSD_25   nsd         512         200 Yes      No    to be emptied down         system
>>> SSD_NSD_26   nsd         512         200 Yes      No    ready         up           system
>>>
>>> SSD_NSD_25 is a mirror in which both drives have failed due to a series
>>> of unfortunate events and will not be coming back. From the GPFS
>>> troubleshooting guide it appears that my only alternative is to run
>>>
>>> mmdeldisk grsnas_data SSD_NSD_25 -p
>>>
>>> which the documentation warns is irreversible: the sky is likely to
>>> fall, dogs and cats sleeping together, etc. But at this point I'm
>>> already in an irreversible situation.
>>> Of course this is a scratch filesystem, of course people were warned
>>> repeatedly about the risk of using a scratch filesystem that is not
>>> backed up, and of course many ignored that. I'd like to recover as much
>>> as possible here. Can anyone confirm or reject that deleting this disk
>>> is the best way forward, or suggest other alternatives for recovering
>>> data from GPFS in this situation?
>>>
>>> Any input is appreciated. Adding salt to the wound is that until a few
>>> months ago I had a complete copy of this filesystem, made onto some new
>>> storage as a burn-in test, but then removed as that storage was
>>> consumed... As they say, sometimes you eat the bear, and sometimes,
>>> well, the bear eats you.
>>>
>>> Thanks,
>>>
>>> jbh
>>>
>>> (Naively calculated probability of these two disks failing close
>>> together in this array: 0.00001758. I never get this lucky when buying
>>> lottery tickets.)
>>> --
>>> ‘[A] talent for following the ways of yesterday, is not sufficient to
>>> improve the world of today.’
>>> - King Wu-Ling, ruler of the Zhao state in northern China, 307 BC
>
> --
> Evan Burness
> Director, HPC Solutions
> Cycle Computing
> evan.burn...@cyclecomputing.com
> (919) 724-9338

--
‘[A] talent for following the ways of yesterday, is not sufficient to
improve the world of today.’
- King Wu-Ling, ruler of the Zhao state in northern China, 307 BC
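For anyone landing on this thread later, here is a minimal sketch of the
sequence being discussed. It assumes only the file system and NSD names
quoted above (grsnas_data, SSD_NSD_25) and standard GPFS administration
commands; check the exact flags against the documentation for the GPFS
release actually in use before running any of it.

  # Confirm which disks are not in a ready/up state.
  mmlsdisk grsnas_data -e

  # Optionally assess the damage first with a report-only check
  # (a full offline mmfsck requires the file system to be unmounted).
  mmfsck grsnas_data -n

  # Remove the dead NSD. The -p flag declares the disk permanently
  # damaged, so data or metadata that lived only on it is given up
  # rather than migrated off.
  mmdeldisk grsnas_data SSD_NSD_25 -p

  # Re-check afterwards to identify files that lost metadata.
  mmfsck grsnas_data -n

Because metadata replication was not enabled, whatever metadata lived only
on SSD_NSD_25 is already gone either way; the follow-up check is what shows
which files can no longer be fully recovered.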
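On Evan's replication question, a hedged sketch of how one might inspect
what the file system was created with, and (where the maximums allow it)
raise the default going forward, using standard mmlsfs/mmchfs/mmrestripefs
options. Note that the maximum replication factors are fixed when the file
system is created, so a file system built with a maximum of 1 cannot be
raised after the fact.

  # Show default and maximum metadata/data replication factors.
  mmlsfs grsnas_data -m -M -r -R

  # If the maximum metadata replicas value is 2 or more, raise the
  # default for newly created files...
  mmchfs grsnas_data -m 2

  # ...and rewrite existing files and metadata to match the new default.
  mmrestripefs grsnas_data -R

For metadata-only SSD NSDs like the four above, the cost of -m 2 is roughly
double the metadata space, which is usually cheap insurance against exactly
this kind of failure.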
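On Peter's coincidence point, a rough illustration: take the 0.00001758
quoted above as the chance of one specific pair of disks failing close
together, and (purely for illustration, since the number of drives behind
these NSDs isn't given) treat the 26 NSDs listed as the units at risk. That
gives 26 * 25 / 2 = 325 possible pairs, so the chance of *some* pair failing
close together is roughly 325 times larger:

  $ echo '325 * 0.00001758' | bc -l
  .00571350

Call it half a percent: still unlucky, but a few hundred times more likely
than the particular coincidence, which is the pedant's point.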
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf