Re: [Beowulf] Lustre Upgrades

2018-07-23 Thread Paul Edmon
The main issue we see is that OSTs get hung up occasionally, which causes writes to hang as the OST flaps, connecting and disconnecting from the MDS.  Rebooting the OSSes fixes the issue, as it forces the remount.  It seems to only happen when the system is full (i.e. above 95% usage) and under …
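Since the hangs correlate with OSTs running above 95% full, a quick client-side check can flag targets approaching that level before writes start stalling. A minimal sketch in Python, assuming the filesystem is mounted at /mnt/lustre (the mount point and the 95% threshold are assumptions, not values confirmed in the thread); it parses the standard columns of `lfs df`:

#!/usr/bin/env python3
# Flag Lustre OSTs above a usage threshold by parsing `lfs df` output.
import subprocess

MOUNTPOINT = "/mnt/lustre"   # assumption: where the filesystem is mounted
THRESHOLD = 95               # assumption: percent-full level tied to the hangs

def full_osts():
    out = subprocess.run(["lfs", "df", MOUNTPOINT],
                         capture_output=True, text=True, check=True).stdout
    hot = []
    for line in out.splitlines():
        if "[OST:" not in line:
            continue                           # skip MDT rows and the summary
        fields = line.split()
        use_pct = int(fields[4].rstrip("%"))   # the "Use%" column
        if use_pct >= THRESHOLD:
            hot.append((fields[0], use_pct))   # (OST UUID, percent used)
    return hot

if __name__ == "__main__":
    for uuid, pct in full_osts():
        print(f"{uuid} is {pct}% full")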

Re: [Beowulf] Lustre Upgrades

2018-07-23 Thread Paul Edmon
Yeah we've pinged Intel/Whamcloud to find out upgrade paths, as we wanted to know what the recommended procedure is. Sure. So we have 3 systems that we want to upgrade: one that is 1 PB and two that are 5 PB each.  I will just give you a description of one and assume that everything would scale linearly …

Re: [Beowulf] Lustre Upgrades

2018-07-23 Thread Michael Di Domenico
On Mon, Jul 23, 2018 at 1:34 PM, Paul Edmon wrote: > Yeah we've found out firsthand that it's problematic as we have been seeing > issues :). Hence the urge to upgrade. What issues are you seeing? I have 2.10.4 clients pointing at 2.5.1 servers, haven't seen any obvious issues, and it's been running …
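One way to keep track of a mixed deployment like 2.10.4 clients on 2.5.x servers is to collect the version each node reports via `lctl get_param -n version`. A minimal sketch, assuming passwordless ssh and hypothetical hostnames; the format of the version parameter's output varies between releases, so it is just printed raw:

#!/usr/bin/env python3
# Print the Lustre version reported by each node.
import subprocess

NODES = ["client01", "oss01", "mds01"]   # hypothetical hostnames

for node in NODES:
    result = subprocess.run(["ssh", node, "lctl", "get_param", "-n", "version"],
                            capture_output=True, text=True)
    lines = result.stdout.strip().splitlines()
    print(f"{node}: {lines[0] if lines else result.stderr.strip()}")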

Re: [Beowulf] Lustre Upgrades

2018-07-23 Thread Jeff Johnson
Paul, How big are your ldiskfs volumes? What type of underlying hardware are they? Running e2fsck (ldiskfs aware) is wise and can be done in parallel. It could take a couple of days; the time all depends on the size and the underlying hardware. Going from 2.5.34 to 2.10.4 is a significant jump.
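The parallel e2fsck pass Jeff suggests can be sketched as below, assuming the Lustre-patched e2fsprogs is installed (so e2fsck understands ldiskfs) and using hypothetical device paths; the -n flag keeps every check read-only, so nothing on disk is modified:

#!/usr/bin/env python3
# Read-only ldiskfs checks on several OST devices at once.
from concurrent.futures import ThreadPoolExecutor
import subprocess

OST_DEVICES = ["/dev/mapper/ost0", "/dev/mapper/ost1"]   # hypothetical paths

def check(dev):
    # -f forces a full check; -n opens read-only and answers "no" to all fixes.
    # The targets should be unmounted (or snapshots) while this runs.
    proc = subprocess.run(["e2fsck", "-f", "-n", dev],
                          capture_output=True, text=True)
    return dev, proc.returncode, proc.stdout[-500:]

with ThreadPoolExecutor(max_workers=len(OST_DEVICES)) as pool:
    for dev, rc, tail in pool.map(check, OST_DEVICES):
        print(f"{dev}: e2fsck exit {rc}\n{tail}")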

Re: [Beowulf] Lustre Upgrades

2018-07-23 Thread Paul Edmon
Yeah we've found out firsthand that it's problematic, as we have been seeing issues :).  Hence the urge to upgrade. We've begun exploring this, but we wanted to reach out to other people who may have gone through the same thing to get their thoughts.  We also need to figure out how significant an …

Re: [Beowulf] Lustre Upgrades

2018-07-23 Thread Jeff Johnson
You're running 2.10.4 clients against 2.5.34 servers? I believe there are notable lnet attrs that don't exist in 2.5.34. Maybe a Whamcloud wiz will chime in, but I think that version mismatch might be problematic. You can do a testbed upgrade to test taking an ldiskfs volume from 2.5.34 to 2.10.4, …
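For that testbed run, one low-risk first step is to read back the on-disk Lustre configuration of the copied target before attempting a 2.10 mount. A minimal sketch with a hypothetical device path; tunefs.lustre's --dryrun option only reports what it would do and leaves the disk untouched:

#!/usr/bin/env python3
# Print the on-disk Lustre config of a test target without modifying it.
import subprocess

DEV = "/dev/mapper/test_ost0"   # hypothetical copy of a production OST

subprocess.run(["tunefs.lustre", "--dryrun", DEV], check=True)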

Re: [Beowulf] Lustre Upgrades

2018-07-23 Thread Paul Edmon
My apologies, I meant 2.5.34, not 2.6.34.  We'd like to get up to 2.10.4, which is what our clients are running.  Recently we upgraded our cluster to CentOS 7, which necessitated the client upgrade.  Our storage servers, though, stayed behind on 2.5.34. -Paul Edmon- On 07/23/2018 01:00 PM, Jeff Johnson wrote: …

Re: [Beowulf] Lustre Upgrades

2018-07-23 Thread Jeff Johnson
Paul, 2.6.34 is a kernel version. What version of Lustre are you at now? Some updates are easier than others. --Jeff On Mon, Jul 23, 2018 at 8:59 AM, Paul Edmon wrote: > We have some old large-scale Lustre installs that are running 2.6.34 and > we want to get these up to the latest version of Lustre. …

Re: [Beowulf] Lustre Upgrades

2018-07-23 Thread Michael Di Domenico
On Mon, Jul 23, 2018 at 11:59 AM, Paul Edmon wrote: > Should we just write off upgrading and stand up new servers > that are on the correct version (in which case we need to transfer the > several PBs' worth of data over to the new system)? If you can afford the hardware and the time for the copy, …
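If the copy route is chosen, the transfer is usually sharded so many streams run at once rather than one giant rsync. A minimal sketch that fans out one rsync per top-level directory, with hypothetical source and destination paths and a guessed worker count; a real multi-PB migration would layer checksumming, restartability, and Lustre striping policy on top of this:

#!/usr/bin/env python3
# Fan out one rsync per top-level directory to parallelize a large copy.
from concurrent.futures import ThreadPoolExecutor
import os
import subprocess

SRC = "/mnt/old_lustre"              # hypothetical source mount
DST = "newfs01:/mnt/new_lustre"      # hypothetical destination host:path
WORKERS = 8                          # assumption: streams the network sustains

def copy_tree(name):
    # -a preserves metadata; --whole-file skips rsync's delta algorithm,
    # which is usually faster over a fast network onto empty targets.
    rc = subprocess.run(["rsync", "-a", "--whole-file",
                         os.path.join(SRC, name), DST]).returncode
    return name, rc

tops = sorted(os.listdir(SRC))
with ThreadPoolExecutor(max_workers=WORKERS) as pool:
    for name, rc in pool.map(copy_tree, tops):
        print(f"{name}: rc={rc}")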

[Beowulf] Lustre Upgrades

2018-07-23 Thread Paul Edmon
We have some old large-scale Lustre installs that are running 2.6.34, and we want to get these up to the latest version of Lustre.  I was curious whether people in this group have any experience with doing this and whether they could share it.  How do you handle upgrades like this?  How much time does it …