On Wed, 9 Jan 2013 at 08:27 -0000, Vincent Diepeveen wrote:
> What would be a rather interesting thought for building a single box
> dirt cheap with huge 'RAM' is the idea of having 1 fast RAID array
> of SSD's function as the 'RAM'.
We recently had a chance to look at something like this at a smaller scale. Most of our nodes are diskless. People have expressed interest in having local disk and/or SSD, so we have one test node with a local hard disk (no RAID) and one test node with a local SSD. I have generally left them configured as swap, with a manual ability to mount as local disk. Until recently, the only place these nodes actually provided any benefit was with jobs which mapped large files into virtual memory (a rough sketch of that access pattern is at the end of this message). Having the large swap space allowed the scheduler to schedule these jobs on these nodes, even though the swap was not used at all. This is a scheduler issue, not really a hardware issue. When the swap space was actually used by jobs, the performance was terrible.

More recently we had a user job which was not working on either of these test nodes (actually for more basic reasons) until I got involved. I had access to a new test node (intended for cloud-ish stuff), so I was able to run and watch the (slightly fixed) application with 200G+ hard-disk-based swap, 200G+ SSD-based swap, and 200G of real RAM (and 48 cores). For this application, all three ran fine for the first day or so, until the application crossed the first memory boundary. After that, both swap solutions slowed down significantly while the RAM system kept chugging along. We saw another memory plateau (which had been seen in previous runs). However, after another day or so the application took another large jump in memory and ran another couple of days until completing successfully. The application only succeeded on the large-memory node. It might have eventually completed on the SSD-based swap node, but it would have taken significantly more time (I didn't even bother to estimate).

Take-aways:

- When you want RAM, you really want RAM, not something else (swap, even to SSD, is still swap). This actually reinforces my belief in diskless (and thus swapless) nodes.

- Having a couple of test nodes with different/larger configurations may allow for application completion (and the associated monitoring; see the second sketch at the end of this message). We now know this specific application is mostly single threaded (there was one short period where it used all the available cores). We know how much memory the application actually uses (between 72G and 96G). Prior testing (which did not run to completion) had only shown the second plateau and was indicating a smaller memory need.

- We have better information about what possible applications of our next-generation nodes might require (the ones just delivered will also be short on memory for this specific application). We can feed this information into future expansion/upgrade procurement.

- After doing this one run, the user can answer some basic questions: Is the application even useful? What is the value of the application versus the cost to acquire/upgrade hardware to allow for additional runs of it?

All the other discussion in these threads is useful, but sometimes a basic brute force approach is sufficient (either it works or it doesn't work). The practical ideas about what is required to build a 500GB node are useful (e.g. you may need multiple CPUs just to be able to add the memory at all). I wish I had more time/need to understand/use some of these lower-level performance improvement issues. However, brute force is all most people are actually interested in, and sometimes the answer is to wait for time/other uses to drive up the capability of commodity systems.
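For the curious, here is roughly what those memory-mapped jobs look like. This is just a minimal Python sketch of the access pattern, not any user's actual code; the file path is made up. The point is that the mapping inflates virtual size immediately, while physical memory (or swap) is only consumed as pages are touched:

    import mmap
    import os

    PATH = "/scratch/big_input.dat"  # made-up path; stands in for the user's data file

    with open(PATH, "rb") as f:
        size = os.fstat(f.fileno()).st_size
        # Map the whole file into the address space; nothing is read yet.
        # The kernel pages data in lazily as bytes are touched, which is
        # why the scheduler sees a huge virtual size long before (if ever)
        # the node needs that much physical memory or swap.
        with mmap.mmap(f.fileno(), size, prot=mmap.PROT_READ) as mm:
            total = 0
            # Touch one byte per page; each access may fault a page in.
            for offset in range(0, size, mmap.PAGESIZE):
                total += mm[offset]
            print("touched", size // mmap.PAGESIZE, "pages, sum", total)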
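And this is more or less how I watched the job cross its memory plateaus: sample the process RSS and system swap usage every so often and eyeball the trend. Again only a rough Python sketch; it assumes the third-party psutil package, and the PID and sampling interval are placeholders:

    import time
    import psutil  # third-party package; an assumption on my part

    PID = 12345    # placeholder for the application's PID
    INTERVAL = 60  # seconds between samples

    proc = psutil.Process(PID)
    try:
        while True:
            info = proc.memory_info()
            swap = psutil.swap_memory()
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            # RSS shows the plateaus on the RAM node; swap_used shows
            # where the swap-backed nodes started to fall behind.
            print("%s rss=%.1fG vsz=%.1fG swap_used=%.1fG"
                  % (stamp, info.rss / 2**30, info.vms / 2**30,
                     swap.used / 2**30))
            time.sleep(INTERVAL)
    except psutil.NoSuchProcess:
        print("process exited")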
I also continue to believe that pushing back on application programmers to build knowledge about scaling issues is necessary. I'm not fully fanatical about it, but I don't like wasting environmental resources when more intelligent programming would reduce needs. I like the idea of large HPC systems helping to supply this back pressure by getting more work out of existing systems. Unfortunately, this often seems to work the other way, in that people now think (smaller) HPC systems are becoming commodity items.

Stuart Barkley

--
I've never been lost; I was once bewildered for three days, but never lost!
-- Daniel Boone