On 06/13/2014 09:31 AM, Joe Landman wrote:
> On 06/13/2014 09:17 AM, Skylar Thompson wrote:
>> We've recently implemented a quota of 1 million files per 1TB of
>> filesystem space. And yes, we had to clean up a number of groups' and
>> individuals' spaces before implementing that. There seems to be a trend
>> in the bioinformatics community for using the filesystem as a database.
>
> I wasn't going to say anything about this, but, yes, there are some
> significant abuses of file systems going on in this community.  But this
> is nothing new, sadly ...  I've seen this since the late 90's.

I think we're all probably too close to the tool in question (HPC storage). Ultimately this is just a hammer for scientists and other non-CS/IT types, so of course they are going to scoff when we tell them they are holding the hammer such that it hits sideways. "Who's to tell me how to hold the hammer?! This side has more metallic surface area anyhow, making it easier to hit the nail this way!"

So you can either:
a) Fix it transparently with automatic policies/filesystems in the back-end. (I know of at least one FS that transparently packs small files alongside metadata on SSDs to speed up small-file IOPS, but message me off-list for that, as I start work for that shop soon and don't want to advertise so blatantly.) There are limits to how much these policies/FS's can fix, though; bad I/O will still be bad I/O past a point.

b) Enact any number of the "rules" mentioned previously and tell the users, "No really, we know a thing or two about these systems; learn how to hold the hammer." You may need to demonstrate on their skull a few times for the proper orientation to sink in.

> I did teach a graduate course on HPC programming at my alma mater about
> a decade ago.  Covered parallelism, optimization, and gave rough rubrics
> for how to write code that made effective use of the machine resources.
> I had face-palm moments when one of the kids told me he didn't know C,
> but could work in C++.  Now-a-days we'd be lucky to find anyone whose
> minds were not polluted by Java + other bad-for-hpc things.

What? How dare you! I pollute my mind with Java every morning, maybe two or three cups full before any real work gets done ;).

Joking aside, making good use of CPU and memory resources in an HPC context still requires solid knowledge of C or Fortran and their associated parallelism libraries. I'm not convinced, however, that making optimal or near-optimal use of remote storage cares at all whether you write C, C++, or Java. You can do horrible things I/O-wise in any of those languages, and byte-sized I/O is perhaps even more likely in a byte-oriented language like C.

Best,

ellis

--
Ph.D. Candidate
Department of Computer Science and Engineering
The Pennsylvania State University
www.ellisv3.com
_______________________________________________
Beowulf mailing list, Beowulf@beowulf.org sponsored by Penguin Computing
To change your subscription (digest mode or unsubscribe) visit 
http://www.beowulf.org/mailman/listinfo/beowulf