On Sun, Feb 26, 2012 at 1:49 PM, Nathaniel Smith <n...@pobox.com> wrote:
> On Sun, Feb 26, 2012 at 7:16 PM, Warren Weckesser
> <warren.weckes...@enthought.com> wrote:
> > On Sun, Feb 26, 2012 at 1:00 PM, Nathaniel Smith <n...@pobox.com> wrote:
> >> For this kind of benchmarking, you'd really rather be measuring the
> >> CPU time, or reading byte streams that are already in memory. If you
> >> can process more MB/s than the drive can provide, then your code is
> >> effectively perfectly fast. Looking at this number has a few
> >> advantages:
> >> - You get more repeatable measurements (no disk buffers and stuff
> >> messing with you)
> >> - If your code can go faster than your drive, then the drive won't
> >> make your benchmark look bad
> >> - There are probably users out there that have faster drives than you
> >> (e.g., I just measured ~340 megabytes/s off our lab's main RAID
> >> array), so it's nice to be able to measure optimizations even after
> >> they stop mattering on your equipment.
> >
> >
> > For anyone benchmarking software like this, be sure to clear the disk cache
> > before each run. In linux:
>
> Err, my argument was that you should do exactly the opposite, and just
> worry about hot-cache times (or time reading a big in-memory buffer,
> to avoid having to think about the OS's caching strategies).
>

Right, I got that. Sorry if the placement of the notes about how to clear
the cache seemed to imply otherwise.

> Clearing the disk cache is very important for getting meaningful,
> repeatable benchmarks in code where you know that the cache will
> usually be cold and where hitting the disk will have unpredictable
> effects (i.e., pretty much anything doing random access, like
> databases, which have complicated locality patterns, you may or may
> not trigger readahead, etc.). But here we're talking about pure
> sequential reads, where the disk just goes however fast it goes, and
> your code can either keep up or not.
>
> One minor point where the OS interface could matter: it's good to set
> up your code so it can use mmap() instead of read(), since this can
> reduce overhead. read() has to copy the data from the disk into OS
> memory, and then from OS memory into your process's memory; mmap()
> skips the second step.
>

Thanks for the tip. Do you happen to have any sample code that
demonstrates this? I'd like to explore this more.

Warren
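
One way to explore the read() versus mmap() difference described above is
the minimal sketch below. It assumes Python's standard mmap module. The
function names (count_newlines_read, count_newlines_mmap, cpu_seconds) and
the newline-counting workload are illustrative only and do not come from
this thread. CPU time is measured with time.process_time(), in line with
the suggestion to look at hot-cache CPU cost rather than disk speed.

```python
import mmap
import os
import tempfile
import time


def count_newlines_read(path, chunk_size=1 << 20):
    """Count newline bytes with read(); each chunk is copied into a new bytes object."""
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            total += chunk.count(b"\n")
    return total


def count_newlines_mmap(path):
    """Count newline bytes by searching the memory-mapped file in place."""
    total = 0
    with open(path, "rb") as f:
        # Length 0 maps the whole file. Mapping an empty file raises ValueError.
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            pos = mm.find(b"\n")
            while pos != -1:
                total += 1
                pos = mm.find(b"\n", pos + 1)
    return total


def cpu_seconds(func, *args):
    """Return (result, CPU seconds used by this process while func ran)."""
    start = time.process_time()
    result = func(*args)
    return result, time.process_time() - start


if __name__ == "__main__":
    # Hypothetical test data: a temporary file of short lines.
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(b"1.0,2.0,3.0\n" * 200000)
        path = tmp.name
    try:
        for func in (count_newlines_read, count_newlines_mmap):
            count, secs = cpu_seconds(func, path)
            print(func.__name__, count, "lines,", round(secs, 4), "CPU s")
    finally:
        os.remove(path)
```

The timings here depend heavily on the OS, the file size, and Python-level
overhead (the per-newline find() loop can easily cost more than the copy
that mmap() avoids), so treat the numbers as a starting point for
experiments rather than as a verdict on either approach.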
_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion