Re: [Beowulf] Lustre Upgrades

2018-07-24 Thread Stu Midgley
Let's be clear... you can do this with Lustre as well (we do it all the time). We also rebalance the OSTs all the time... On Tue, Jul 24, 2018 at 10:31 PM John Hearns via Beowulf <beowulf@beowulf.org> wrote: > Forgive me for saying this, but the philosophy for software defined > storage such a
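
As a rough illustration of the OST rebalancing mentioned above: a minimal sketch, assuming the stock Lustre client tools (lfs and lfs_migrate), a hypothetical mount point /lustre and a hypothetical OST index. The exact drain procedure varies between Lustre releases, so treat this as a sketch rather than the procedure actually used.

    #!/usr/bin/env python3
    # Sketch: migrate files off one OST so it can be rebalanced or retired.
    # Assumes Lustre client tools (lfs, lfs_migrate) and a mounted filesystem;
    # the mount point and OST index below are hypothetical.
    import subprocess

    MOUNT = "/lustre"   # hypothetical mount point
    OST_INDEX = "12"    # hypothetical OST to drain

    # List regular files with objects on the target OST...
    find = subprocess.Popen(
        ["lfs", "find", MOUNT, "--ost", OST_INDEX, "--type", "f"],
        stdout=subprocess.PIPE,
    )
    # ...and hand them to lfs_migrate, which re-stripes them onto other OSTs.
    migrate = subprocess.run(["lfs_migrate", "-y"], stdin=find.stdout)
    find.stdout.close()
    find.wait()
    print("lfs find exit:", find.returncode, "lfs_migrate exit:", migrate.returncode)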

Re: [Beowulf] Lustre Upgrades

2018-07-24 Thread James Burton
Does anyone have any experience with how BeeGFS compares to Lustre? We're looking at both of those for our next generation HPC storage system. Is CephFS a valid option for HPC now? Last time I played with CephFS it wasn't ready for prime time, but that was a few years ago. On Tue, Jul 24, 2018 at

Re: [Beowulf] emergent behavior - correlation of job end times

2018-07-24 Thread Christopher Samuel
On 25/07/18 04:52, David Mathog wrote: One possibility is that at the "leading" edge the first job that reads a section of data will do so slowly, while later jobs will take the same data out of cache. That will lead to a "peloton" sort of effect, where the leader is slowed and the followers ac
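
A toy model of that cache effect (all numbers invented) shows why staggered starts can collapse into near-identical end times: only the first job to reach a block pays the slow uncached read, while followers hit the page cache and catch up until they join the leading pack.

    # Toy "peloton" model: jobs start staggered, but at each data block only
    # the first arrival pays the slow uncached read; the rest read from cache
    # and catch up. All constants are invented for illustration.
    NUM_JOBS = 8
    NUM_BLOCKS = 1000
    COLD_READ = 0.10      # seconds per block from disk (made up)
    WARM_READ = 0.01      # seconds per block from page cache (made up)
    START_STAGGER = 5.0   # seconds between job launches (made up)

    # clocks[j] = wall-clock time at which job j has finished its reads so far
    clocks = [j * START_STAGGER for j in range(NUM_JOBS)]

    for block in range(NUM_BLOCKS):
        order = sorted(range(NUM_JOBS), key=lambda j: clocks[j])
        for rank, j in enumerate(order):
            # First arrival reads cold, everyone else finds the block cached.
            clocks[j] += COLD_READ if rank == 0 else WARM_READ

    for j, t in enumerate(clocks):
        print(f"job {j}: started {j * START_STAGGER:5.1f}s  finished {t:6.1f}s")

With these made-up numbers the start times span 35 seconds, yet most of the end times land within a fraction of a second of each other.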

Re: [Beowulf] ServerlessHPC

2018-07-24 Thread Kilian Cavalotti
On Tue, Jul 24, 2018 at 3:13 PM, Greg Lindahl wrote: > We should all remember Don Becker's definition of "zero copy" -- it's > when you make someone else do the copy and then pretend it was free. That will definitely go on my wall. :D Cheers, -- Kilian

Re: [Beowulf] ServerlessHPC

2018-07-24 Thread Stu Midgley
you can pay a lot of money for pretend... On Wed, Jul 25, 2018 at 6:14 AM Greg Lindahl wrote: > We should all remember Don Becker's definition of "zero copy" -- it's > when you make someone else do the copy and then pretend it was free. > > That was totally a foreshadowing of "serverless"! > > -

Re: [Beowulf] ServerlessHPC

2018-07-24 Thread Greg Lindahl
We should all remember Don Becker's definition of "zero copy" -- it's when you make someone else do the copy and then pretend it was free. That was totally a foreshadowing of "serverless"! -- greg

[Beowulf] ServerlessHPC

2018-07-24 Thread John Hearns via Beowulf
All credit goes to Pim Schravendijk for coining a new term on Twitter today https://twitter.com/rdwrt https://twitter.com/rdwrt/status/1021761796498182144?s=03 We will all be doing it in six months' time.

[Beowulf] emergent behavior - correlation of job end times

2018-07-24 Thread David Mathog
Hi all, Thought some of you might find this interesting. Using the WGS (aka CA aka Celera) genome assembler there is a step which runs a large number (in this instance, 47634) of overlap comparisons. There are N sequences (many millions, of three different types) and it makes many sequence r
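
One quick way to see whether the end times really are bunching is to histogram them; a tiny sketch with invented timestamps (in practice they would come from the batch system's accounting records):

    # Histogram job end times (seconds) to spot suspicious clustering.
    # The timestamps here are invented; real ones would come from the
    # scheduler's accounting data for the overlap jobs.
    from collections import Counter

    end_times = [1001, 1003, 1004, 1004, 1005, 1212, 1214, 1215, 1215, 1499]
    BIN = 10  # seconds per bin

    hist = Counter(t // BIN for t in end_times)
    for b in sorted(hist):
        print(f"{b * BIN:5d}-{b * BIN + BIN - 1:5d} s  {'#' * hist[b]}")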

Re: [Beowulf] Lustre Upgrades

2018-07-24 Thread Fred Youhanaie
Nah, that ain't large scale ;-) If you want large scale have a look at snowmobile: https://aws.amazon.com/snowmobile/ They drive a 45-foot truck to your data centre, fill it up with your data bits, then drive it back to their data centre :-() Cheers, Fred On 24/07/18 19:04, Jonathan

Re: [Beowulf] Lustre Upgrades

2018-07-24 Thread Jonathan Engwall
Snowball is the very large scale AWS data service. On July 24, 2018, at 8:35 AM, Joe Landman wrote: On 07/24/2018 11:06 AM, John Hearns via Beowulf wrote: > Joe, sorry to split the thread here. I like BeeGFS and have set it up. > I have worked for two companies now who have sites around the w

Re: [Beowulf] Lustre Upgrades

2018-07-24 Thread John Hearns via Beowulf
Thank you for a comprehensive reply. On Tue, 24 Jul 2018 at 17:56, Paul Edmon wrote: > This was several years back so the current version of Gluster may be in > better shape. We tried to use it for our primary storage but ran into > scalability problems. It especially was the case when it came

Re: [Beowulf] Lustre Upgrades

2018-07-24 Thread Paul Edmon
This was several years back so the current version of Gluster may be in better shape.  We tried to use it for our primary storage but ran into scalability problems.  It especially was the case when it came to healing bricks and doing replication.  It just didn't scale well.  Eventually we aband
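
For what it's worth, the self-heal backlog can at least be watched while it grinds along; a rough sketch assuming a hypothetical volume name, and note that the heal-info output format differs between Gluster releases, so the parsing is only illustrative.

    # Rough sketch: poll the self-heal backlog on a Gluster volume.
    # The volume name is hypothetical and heal-info output varies between
    # Gluster releases, so the parsing below is only illustrative.
    import subprocess
    import time

    VOLUME = "scratch"  # hypothetical volume name

    while True:  # Ctrl-C to stop
        out = subprocess.run(
            ["gluster", "volume", "heal", VOLUME, "info"],
            capture_output=True, text=True,
        ).stdout
        pending = 0
        for line in out.splitlines():
            if line.startswith("Number of entries:"):
                try:
                    pending += int(line.split(":", 1)[1])
                except ValueError:
                    pass  # some releases print placeholders here
        print(time.strftime("%H:%M:%S"), "entries awaiting heal:", pending)
        time.sleep(60)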

Re: [Beowulf] Lustre Upgrades

2018-07-24 Thread Joe Landman
On 07/24/2018 11:06 AM, John Hearns via Beowulf wrote: Joe, sorry to split the thread here. I like BeeGFS and have set it up. I have worked for two companies now who have sites around the world, those sites being independent research units. But HPC facilities are in headquarters. The sites wa


Re: [Beowulf] Lustre Upgrades

2018-07-24 Thread John Hearns via Beowulf
Joe, sorry to split the thread here. I like BeeGFS and have set it up. I have worked for two companies now who have sites around the world, those sites being independent research units. But HPC facilities are in headquarters. The sites want to be able to drop files onto local storage yet have it ma

Re: [Beowulf] Lustre Upgrades

2018-07-24 Thread John Hearns via Beowulf
Paul, thanks for the reply. I would like to ask, if I may. I rather like Gluster, but have not deployed it in HPC. I have heard a few people comment about Gluster not working well in HPC. Would you be willing to be more specific? One research site I talked to did the classic 'converged infrastruct

Re: [Beowulf] Lustre Upgrades

2018-07-24 Thread Joe Landman
On 07/24/2018 10:31 AM, John Hearns via Beowulf wrote: Forgive me for saying this, but the philosophy for software defined storage such as CEPH and Gluster is that forklift style upgrades should not be necessary. When a storage server is to be retired the data is copied onto the new server th

Re: [Beowulf] Lustre Upgrades

2018-07-24 Thread Paul Edmon
While I agree with you in principle, one also has to deal with the reality you find yourself in.  In our case we have more experience with Lustre than Ceph in an HPC setting and we got burned pretty badly by Gluster.  While I like Ceph in principle I haven't seen it do what Lustre can do in a HPC se

Re: [Beowulf] Lustre Upgrades

2018-07-24 Thread John Hearns via Beowulf
Forgive me for saying this, but the philosophy for software defined storage such as CEPH and Gluster is that forklift style upgrades should not be necessary. When a storage server is to be retired the data is copied onto the new server then the old one taken out of service. Well, copied is not the
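
To make that concrete for Ceph, retiring an OSD is, in outline, a matter of marking it out and letting the cluster rebalance; a simplified sketch with a hypothetical OSD id, using command names from roughly the Luminous era, so check the documentation for your release.

    # Simplified sketch of the "no forklift" idea for Ceph: mark one OSD out,
    # wait for its data to backfill elsewhere, then retire the hardware.
    # The OSD id is hypothetical; commands and status output vary between
    # Ceph releases, so treat this as an outline only.
    import subprocess
    import time

    OSD_ID = "12"  # hypothetical OSD to retire

    subprocess.run(["ceph", "osd", "out", OSD_ID], check=True)
    time.sleep(30)  # give the cluster a moment to start remapping PGs

    while True:
        stat = subprocess.run(["ceph", "pg", "stat"],
                              capture_output=True, text=True).stdout
        print(stat.strip())
        # crude check: wait until no PGs report backfill or recovery activity
        if "backfill" not in stat and "recover" not in stat:
            break
        time.sleep(60)

    print(f"Data drained; osd.{OSD_ID} can now be stopped and removed.")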

Re: [Beowulf] Lustre Upgrades

2018-07-24 Thread Paul Edmon
Yeah, that's my preferred solution as the hardware we have is nearing end of life.  In that case though we would then have to coordinate the cut-over of the data to the new storage and forklift all those PBs over to the new system, which brings its own unique challenges.  Plus then you also ha

Re: [Beowulf] Lustre Upgrades

2018-07-24 Thread Jörg Saßmannshausen
Hi Paul, with a file system being 93% full, in my humble opinion it would make sense to increase the underlying hardware capacity as well. The reasoning behind it is that usually over time there will be more data on any given file system and thus if there is already a downtime, I would increase