Hi John,
I would recommend joining the spectrumscale.org mailing list, where
you will find very good GPFS experts from the HPC industry, including
vendors, users, and integrators. More specifically, you'll find GPFS
developers on there. Maybe someone on that list can help out.
A more direct link to the mailing list is here:
https://www.spectrumscale.org:10000/virtualmin-mailman/unauthenticated/listinfo.cgi/gpfsug-discuss/
On 29/04/2017 08:00, John Hanks wrote:
Hi,
I'm not getting much useful vendor information so I thought I'd ask
here in the hopes that a GPFS expert can offer some advice. We have a
GPFS system which has the following disk config:
[root@grsnas01 ~]# mmlsdisk grsnas_data
disk         driver   sector     failure holds    holds                            storage
name         type       size       group metadata data  status        availability pool
------------ -------- ------ ----------- -------- ----- ------------- ------------ ------------
SAS_NSD_00   nsd         512         100 No       Yes   ready         up           system
SAS_NSD_01   nsd         512         100 No       Yes   ready         up           system
SAS_NSD_02   nsd         512         100 No       Yes   ready         up           system
SAS_NSD_03   nsd         512         100 No       Yes   ready         up           system
SAS_NSD_04   nsd         512         100 No       Yes   ready         up           system
SAS_NSD_05   nsd         512         100 No       Yes   ready         up           system
SAS_NSD_06   nsd         512         100 No       Yes   ready         up           system
SAS_NSD_07   nsd         512         100 No       Yes   ready         up           system
SAS_NSD_08   nsd         512         100 No       Yes   ready         up           system
SAS_NSD_09   nsd         512         100 No       Yes   ready         up           system
SAS_NSD_10   nsd         512         100 No       Yes   ready         up           system
SAS_NSD_11   nsd         512         100 No       Yes   ready         up           system
SAS_NSD_12   nsd         512         100 No       Yes   ready         up           system
SAS_NSD_13   nsd         512         100 No       Yes   ready         up           system
SAS_NSD_14   nsd         512         100 No       Yes   ready         up           system
SAS_NSD_15   nsd         512         100 No       Yes   ready         up           system
SAS_NSD_16   nsd         512         100 No       Yes   ready         up           system
SAS_NSD_17   nsd         512         100 No       Yes   ready         up           system
SAS_NSD_18   nsd         512         100 No       Yes   ready         up           system
SAS_NSD_19   nsd         512         100 No       Yes   ready         up           system
SAS_NSD_20   nsd         512         100 No       Yes   ready         up           system
SAS_NSD_21   nsd         512         100 No       Yes   ready         up           system
SSD_NSD_23   nsd         512         200 Yes      No    ready         up           system
SSD_NSD_24   nsd         512         200 Yes      No    ready         up           system
SSD_NSD_25   nsd         512         200 Yes      No    to be emptied down         system
SSD_NSD_26   nsd         512         200 Yes      No    ready         up           system
SSD_NSD_25 is a mirror in which both drives have failed due to a
series of unfortunate events, and it will not be coming back. From the
GPFS troubleshooting guide it appears that my only option is to run

mmdeldisk grsnas_data SSD_NSD_25 -p

which the documentation warns is irreversible, the sky is likely to
fall, dogs and cats sleeping together, etc. But at this point I'm
already in an irreversible situation. Of course this is a scratch
filesystem, of course people were warned repeatedly about the risk of
using a scratch filesystem that is not backed up, and of course many
ignored that. I'd like to recover as much as possible here. Can anyone
confirm or reject that deleting this disk is the best way forward, or
suggest other ways to recover data from GPFS in this situation?
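For clarity, the sequence I'm contemplating looks roughly like this.
Only the mmdeldisk line comes from the troubleshooting guide; the
mmfsck and mmrestripefs follow-ups are my own assumptions and would
need checking against the man pages before running anything:

# Confirm the current disk states before doing anything destructive
mmlsdisk grsnas_data

# Permanently remove the dead NSD; -p marks it as permanently damaged,
# so whatever lived only on that disk is given up (irreversible)
mmdeldisk grsnas_data SSD_NSD_25 -p

# Assumed follow-up: repair metadata that referenced the lost disk
# (the filesystem would have to be unmounted for a repairing mmfsck)
mmfsck grsnas_data -y

# Assumed follow-up: restore replication and rebalance across the
# remaining disks
mmrestripefs grsnas_data -r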
Any input is appreciated. Adding salt to the wound: until a few months
ago I had a complete copy of this filesystem that I had made onto some
new storage as a burn-in test, but I removed it as that storage was
consumed... As they say, sometimes you eat the bear, and sometimes,
well, the bear eats you.
Thanks,
jbh
(Naively calculated probability of these two disks failing close
together in this array: 0.00001758. I never get this lucky when buying
lottery tickets.)
--
‘[A] talent for following the ways of yesterday, is not sufficient to
improve the world of today.’
- King Wu-Ling, ruler of the Zhao state in northern China, 307 BC
--
regards,
Arif Ali
Mob: +447970148122
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit
http://www.beowulf.org/mailman/listinfo/beowulf