Re: [Beowulf] Small files

2014-06-12 Thread Christopher Samuel
On 12/06/14 05:03, Tom Harvill wrote: > We've found that a large majority of our files (~40MM of ~50MM) > are less than 10KB. We believe our filesystem (lustre) is > bottlenecked with IOPs and locking related to jobs running against > these files. We

Re: [Beowulf] Big Data and HPC

2014-06-12 Thread Jason Riedy
And Mark Hahn writes: > I think of BD as a kind of analysis of "repurposed" data. or maybe > "meta-analytics". Big data: Tackling the same analysis problems sociologists, psychologists, astrophysicists, and other non-experimental scientists have had since the beginning of time, or at least IRBs...

Re: [Beowulf] Small files

2014-06-12 Thread Jeffrey Layton
Tom, Without digging into the details too much, can you describe the Lustre setup? As Bernd alluded to, it might be something in the configuration or version that is hampering better performance. But then again, you may not want to upgrade to a newer version because of the disruption. But maybe there are some

Re: [Beowulf] Small files

2014-06-12 Thread Kilian Cavalotti
Hi Tom, On Wed, Jun 11, 2014 at 12:03 PM, Tom Harvill wrote: > I want to ask this general question: how does your shop deal with the > general problem of > small files in filesystems on (beowulf) compute clusters? Specifically, > files that users expect > to actively use for read and write operat

Re: [Beowulf] Small files

2014-06-12 Thread Bernd Schubert
On 06/12/2014 03:09 PM, Jeffrey Layton wrote: Tom, Without digging into the details too much, can you describe the Lustre setup? As Bernd alluded to, it might be something in the configuration or version that is hampering better performance. But then again, you may not want to upgrade to a newer version because of the disruption. But maybe there are some

Re: [Beowulf] Small files

2014-06-12 Thread Bernd Schubert
Hello Tom, this is rather self-advertising, but you might want to try out FhGFS/BeeGFS. We have spent a lot of development effort on handling zillions of small files well. On 06/11/2014 09:03 PM, Tom Harvill wrote: Hello, This is my first time posting to this list, thanks in advance f

Re: [Beowulf] Small files

2014-06-12 Thread John Hearns
Tom, as Reuti says, let's have a look at the nature of these files. What are they, and are analysis jobs really revisiting them again and again? This is a marvellous tool for analysing filesystem usage: http://www.chiark.greenend.org.uk/~sgtatham/agedu/ I have used it a lot in the past on the scra
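John's question (are jobs really revisiting these files?) can be approached in the spirit of agedu by grouping files by time since last access. What follows is not agedu itself; it is a minimal Python approximation that assumes access times are meaningful on the filesystem in question, which mount options such as `noatime` or `relatime` can undermine. The bucket boundaries and the names `age_bucket` and `access_age_profile` are hypothetical.

```python
import os
import stat
import sys
import time
from collections import Counter

SECONDS_PER_DAY = 86400
# Hypothetical age boundaries, in days since last access.
AGE_LIMITS_DAYS = (1, 7, 30, 90, 365)


def age_bucket(age_days, limits=AGE_LIMITS_DAYS):
    """Label an age with the first limit it does not exceed."""
    for limit in limits:
        if age_days <= limit:
            return f"<= {limit}d"
    return f"> {limits[-1]}d"


def access_age_profile(root, now=None):
    """Return (file_counts, byte_counts) per access-age bucket under `root`.

    Relies on st_atime, so results are only as good as the filesystem's
    atime updates (noatime/relatime mounts can leave them stale).
    Only regular files are counted; symlinks are not followed.
    """
    now = time.time() if now is None else now
    file_counts = Counter()
    byte_counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root, onerror=lambda err: None):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))  # one metadata op per file
            except OSError:
                continue
            if not stat.S_ISREG(st.st_mode):
                continue
            age_days = max(now - st.st_atime, 0.0) / SECONDS_PER_DAY
            bucket = age_bucket(age_days)
            file_counts[bucket] += 1
            byte_counts[bucket] += st.st_size
    return file_counts, byte_counts


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    files, sizes = access_age_profile(root)
    labels = [f"<= {d}d" for d in AGE_LIMITS_DAYS] + [f"> {AGE_LIMITS_DAYS[-1]}d"]
    for label in labels:
        print(f"{label:>8}: {files[label]} files, {sizes[label]} bytes")
```

A large count in the oldest buckets would suggest the small files are mostly not being revisited, which points toward archiving rather than tuning the filesystem for them.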

Re: [Beowulf] Small files

2014-06-12 Thread John Hearns
Tom, I agree with you regarding small files. In my case, I manage a DMF (SGI Data Migration Facility) setup. I was concerned at the number of small files we were storing, both in terms of the size of the database files and of storing small files to tape. SGI engineers reassured me that the system w

Re: [Beowulf] Small files

2014-06-12 Thread Reuti
Hi, On 11.06.2014 at 21:03, Tom Harvill wrote: > This is my first time posting to this list, thanks in advance for any time > you spend > replying. > > We've found that a large majority of our files (~40MM of ~50MM) are less than > 10KB. > We believe our filesystem (lustre) is bottlenecked wi

[Beowulf] Small files

2014-06-12 Thread Tom Harvill
Hello, This is my first time posting to this list; thanks in advance for any time you spend replying. We've found that a large majority of our files (~40MM of ~50MM) are less than 10KB. We believe our filesystem (Lustre) is bottlenecked with IOPs and locking related to jobs running against
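A figure like Tom's (roughly 40 million of 50 million files under 10KB) can be approximated with a simple size census. The sketch below is a minimal Python version, assuming the filesystem is mounted on the machine running it; the names `size_bucket`, `size_profile` and `SMALL_FILE_BYTES` are hypothetical, and "10KB" is read as 10 * 1024 bytes. It issues one `lstat` per file, so on a Lustre filesystem with tens of millions of entries the walk itself adds metadata load and is best run off-peak.

```python
import os
import stat
import sys
from collections import Counter

# Hypothetical threshold: "10KB" from the post, read as 10 * 1024 bytes.
SMALL_FILE_BYTES = 10 * 1024


def size_bucket(size):
    """Return the smallest power of two >= size (0- and 1-byte files share bucket 1)."""
    return 1 << max(size - 1, 0).bit_length()


def size_profile(root, threshold=SMALL_FILE_BYTES):
    """Walk `root` and return (total_files, small_files, histogram).

    Only regular files are counted; symlinks are not followed and entries
    that cannot be stat'ed are skipped. The histogram maps each
    power-of-two upper bound (in bytes) to a file count.
    """
    total = 0
    small = 0
    histogram = Counter()
    for dirpath, _dirnames, filenames in os.walk(root, onerror=lambda err: None):
        for name in filenames:
            try:
                st = os.lstat(os.path.join(dirpath, name))  # one metadata op per file
            except OSError:
                continue
            if not stat.S_ISREG(st.st_mode):
                continue
            total += 1
            if st.st_size < threshold:
                small += 1
            histogram[size_bucket(st.st_size)] += 1
    return total, small, histogram


if __name__ == "__main__":
    root = sys.argv[1] if len(sys.argv) > 1 else "."
    total, small, histogram = size_profile(root)
    share = 100.0 * small / total if total else 0.0
    print(f"{small} of {total} regular files ({share:.1f}%) are under {SMALL_FILE_BYTES} bytes")
    for upper in sorted(histogram):
        print(f"<= {upper:>12} bytes: {histogram[upper]}")
```

Power-of-two buckets keep the histogram short even across many orders of magnitude, and show at a glance whether the small files cluster at a few KB or are spread down to near-empty files.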