On Feb 26, 2012, at 1:49 PM, Nathaniel Smith wrote:

> On Sun, Feb 26, 2012 at 7:16 PM, Warren Weckesser
> <warren.weckes...@enthought.com> wrote:
>> On Sun, Feb 26, 2012 at 1:00 PM, Nathaniel Smith <n...@pobox.com> wrote:
>>> For this kind of benchmarking, you'd really rather be measuring the
>>> CPU time, or reading byte streams that are already in memory. If you
>>> can process more MB/s than the drive can provide, then your code is
>>> effectively perfectly fast. Looking at this number has a few
>>> advantages:
>>> - You get more repeatable measurements (no disk buffers and stuff
>>> messing with you)
>>> - If your code can go faster than your drive, then the drive won't
>>> make your benchmark look bad
>>> - There are probably users out there that have faster drives than you
>>> (e.g., I just measured ~340 megabytes/s off our lab's main RAID
>>> array), so it's nice to be able to measure optimizations even after
>>> they stop mattering on your equipment.
>>
>>
>> For anyone benchmarking software like this, be sure to clear the disk cache
>> before each run. In linux:
>
> Err, my argument was that you should do exactly the opposite, and just
> worry about hot-cache times (or time reading a big in-memory buffer,
> to avoid having to think about the OS's caching strategies).
>
> Clearing the disk cache is very important for getting meaningful,
> repeatable benchmarks in code where you know that the cache will
> usually be cold and where hitting the disk will have unpredictable
> effects (i.e., pretty much anything doing random access, like
> databases, which have complicated locality patterns, you may or may
> not trigger readahead, etc.). But here we're talking about pure
> sequential reads, where the disk just goes however fast it goes, and
> your code can either keep up or not.

Exactly.

> One minor point where the OS interface could matter: it's good to set
> up your code so it can use mmap() instead of read(), since this can
> reduce overhead. read() has to copy the data from the disk into OS
> memory, and then from OS memory into your process's memory; mmap()
> skips the second step.

Cool. Nice trick!

--
Francesc Alted

_______________________________________________
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
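
For anyone who wants to try the hot-cache measurement and the mmap() suggestion discussed above, here is a rough sketch. It is not code from the thread. Assumptions: Python 3 standard library only; zlib.crc32 stands in for whatever parsing work the real reader does; the helper names (crc_with_read, crc_with_mmap, cpu_seconds, run_demo) and the sample payload are made up for illustration; time.process_time() is used so that CPU time is reported rather than wall-clock time. Note that in Python the read() path also allocates a bytes object per chunk, so any difference you see reflects more than the single kernel-to-user copy.

```python
import mmap
import os
import tempfile
import time
import zlib


def crc_with_read(path, chunk_size=1 << 20):
    """Checksum a file via read(); each chunk is copied into a new bytes object."""
    crc = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            crc = zlib.crc32(chunk, crc)
    return crc


def crc_with_mmap(path):
    """Checksum a file through a read-only mapping, with no intermediate bytes copy.

    Note: mapping an empty file raises ValueError, so the file must be non-empty.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            return zlib.crc32(mm)


def cpu_seconds(func, path, repeats=3):
    """Return (best CPU time in seconds, last result) over a few repeats."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.process_time()
        result = func(path)
        best = min(best, time.process_time() - start)
    return best, result


def run_demo(size_mb=8):
    """Write a throwaway text file, warm the cache, then time both approaches."""
    line = b"1.5,2.5,3.5\n"  # hypothetical sample row, 12 bytes
    payload = line * (size_mb * 1024 * 1024 // len(line))
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(payload)
        path = tmp.name
    try:
        crc_with_read(path)  # one untimed pass so the timed runs see a hot cache
        t_read, crc_read = cpu_seconds(crc_with_read, path)
        t_mmap, crc_mmap = cpu_seconds(crc_with_mmap, path)
    finally:
        os.unlink(path)
    return {
        "bytes": len(payload),
        "read": (t_read, crc_read),
        "mmap": (t_mmap, crc_mmap),
    }


if __name__ == "__main__":
    out = run_demo()
    for name in ("read", "mmap"):
        seconds, crc = out[name]
        mb_per_s = out["bytes"] / 1e6 / max(seconds, 1e-9)
        print(f"{name}: {seconds:.4f} s CPU, ~{mb_per_s:.0f} MB/s, crc={crc:#010x}")
```

The numbers will vary a lot between machines and Python versions, so treat them as a way to compare the two code paths on your own box, not as a general result. Comparing the reported MB/s with what your drive can deliver tells you whether the code or the disk is the bottleneck.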