"Jeffrey B. Layton" <[EMAIL PROTECTED]> writes: > Here comes the $64 question - how do you benchmark the IO portion of > your code so you can understand whether you need a parallel file > system, what kind of connection do you need from a client to the > storage, etc. This is a difficult problem and one in which I have an > interest.
This is straightforward, though not easy to explain compactly. The key is to know how to run tools like top, vmstat, etc. and read them. If you run your code on a real machine, you can swiftly see whether you are using 100% of your CPU or not. The goal, naturally, is to have the CPU busy at all times. (A minimal /proc/stat sketch below shows the check in about twenty lines.)

If you are CPU bound, congratulations: you can then turn to tools like cache performance evaluators to determine whether you can tune your CPU utilization somehow (which you almost certainly can). If, however, your CPU is not at 100% utilization, you are somehow I/O bound. There are several reasons this could be happening.

First, you could be using lots of virtual memory -- the tools will tell you in a moment -- in which case the single best thing to do is not to increase the speed of the file system at all but to increase the amount of memory you have available, so your working set fits very comfortably in RAM.

Second, you could be doing lots of i/o paging in program text segments, which is another flavor of the first problem. Again, more memory will help, but so will proper tuning of the page cache parameters.

Third, you could be doing lots of file i/o to legitimate data files. Here again, if the files are small enough and your access patterns are repetitive enough, increasing your RAM could be enough to make everything fit in the buffer cache and radically lower the i/o bandwidth you need. On the other hand, if you're dealing with files that are tens or hundreds of gigabytes rather than tens of megabytes in size, and your access patterns are very scattered, that clearly isn't going to help, and at that point you need to improve your I/O bandwidth substantially.

> The best way I've found is to look at the IO pattern of your
> code(s). The best way I've found to do this is to run an strace
> against the code. I've written an strace analyzer that gives you a
> higher-level view of what's going on with the IO.

That will certainly give you some idea of access patterns for case 3 (above) -- a toy version of such an analyzer is sketched below -- but on the other hand, I've gotten pretty far just glancing at the code in question and looking at the size of my files.

I have to say, though, that really dumb measures (like increasing the amount of RAM available for buffer cache -- gigs of memory are often a lot cheaper than a Fibre Channel card -- or just having a fast and cheap local drive for intermediate data i/o) can in many cases make the problem go away a lot better than complicated storage back end hardware can. If you really are hitting disk and can't help it, a drive on every node means a lot of spindles and independent heads, versus a fairly small number of those at a central storage point. 200 spindles always beat 20.

In any case, let me note the most important rule: if your CPUs aren't doing work most of the time, you're not allocating resources properly. If the task is really I/O bound, there is no point in having more CPU than the I/O can possibly keep busy. You're better off having half the number of nodes with gargantuan amounts of cache memory than having CPUs that spend 80% of their time twiddling their thumbs. The goal is to have the CPUs crunching 100% of the time, and if they're not doing that, you're not doing things as well as you can. Of course, if your CPU is crunching 100% of the time, there is no point in spending money on faster i/o, since by definition it is pretty much going to go to waste.
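To make the "watch the CPU" step concrete, here is a minimal sketch of that check, assuming a Linux machine with the standard /proc/stat field layout (the 5-second interval is arbitrary):

    #!/usr/bin/env python
    # Sample the aggregate "cpu" line of /proc/stat twice and report how
    # the interval split between real work, idle time, and iowait.
    # Field order after the "cpu" tag: user nice system idle iowait ...
    import time

    def cpu_times():
        with open("/proc/stat") as f:
            return [int(x) for x in f.readline().split()[1:]]

    before = cpu_times()
    time.sleep(5)                        # measurement interval, arbitrary
    after = cpu_times()

    delta = [b - a for a, b in zip(before, after)]
    total = float(sum(delta))

    busy = sum(delta[0:3]) / total       # user + nice + system
    idle = delta[3] / total
    iowait = delta[4] / total if len(delta) > 4 else 0.0

    print("busy %5.1f%%  idle %5.1f%%  iowait %5.1f%%"
          % (100 * busy, 100 * idle, 100 * iowait))

If iowait dominates the non-busy time, you really are waiting on the disks; if the CPUs are mostly idle with little iowait, look elsewhere first (network, serialization in the code itself).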
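And for what it's worth, a first cut at the kind of strace analyzer Jeff describes is not much code. The sketch below is mine, not his; it assumes output from something like "strace -f -e trace=read,write -o trace.out ./your_app" (the program name is a placeholder), and it ignores strace's unfinished/resumed split lines, which a real analyzer has to stitch back together:

    #!/usr/bin/env python
    # Toy strace analyzer: totals calls, bytes, and average request
    # size per (syscall, fd) from an strace output file.
    import re, sys
    from collections import defaultdict

    # Matches lines like:  1234  read(3, "..."..., 65536) = 65536
    CALL = re.compile(r'(?:^\d+\s+)?(read|write)\((\d+),.*=\s*(\d+)\s*$')

    totals = defaultdict(lambda: [0, 0])     # (call, fd) -> [calls, bytes]

    for line in open(sys.argv[1]):
        m = CALL.search(line)
        if not m:
            continue
        key = (m.group(1), int(m.group(2)))
        totals[key][0] += 1
        totals[key][1] += int(m.group(3))

    for (call, fd), (calls, nbytes) in sorted(totals.items()):
        print("%-5s fd %-3d %9d calls %13d bytes  avg %9.0f"
              % (call, fd, calls, nbytes, nbytes / float(calls)))

If the averages come out as thousands of tiny scattered requests, you're seek bound and more spindles or more RAM win; if they're large sequential transfers, a fatter pipe to the storage actually helps.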
> I'm also working on a tool that can take the strace output and
> create a "simulator" that will run in a similar manner to the
> original code but actually perform the IO of the original code
> using dummy data. This allows you to "give" away a simple dummy
> code to various HPC storage vendors and test your application.
> This code is taking a little longer than I'd hoped to develop :(

It sounds cool, but I suspect that with even simpler tools you can probably deduce most of what is going on and get around it.
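Just to give a sense of scale, the core of such a replayer is small; it's the bookkeeping (timing, multiple processes, opens and seeks) that eats the development time. A toy version, assuming the strace output has already been boiled down to one "read NBYTES" or "write NBYTES" pair per line -- an assumed intermediate format, not Jeff's actual tool -- might look like:

    #!/usr/bin/env python
    # Toy I/O replayer: performs the reads and writes recorded in a
    # reduced trace against a scratch file, using dummy data, so the
    # I/O pattern can be reproduced without the real application.
    import os, sys

    DUMMY = "dummy.dat"          # hypothetical scratch file name

    def replay(tracefile):
        fd = os.open(DUMMY, os.O_RDWR | os.O_CREAT, 0o644)
        for line in open(tracefile):
            op, n = line.split()
            n = int(n)
            if op == "write":
                os.write(fd, b"x" * n)
            elif op == "read":
                os.lseek(fd, 0, os.SEEK_SET)   # crude: always reread from 0
                os.read(fd, n)
        os.close(fd)

    if __name__ == "__main__":
        replay(sys.argv[1])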
-- 
Perry E. Metzger                [EMAIL PROTECTED]