On Sun, Feb 26, 2012 at 5:23 PM, Warren Weckesser
<warren.weckes...@enthought.com> wrote:
> I haven't pushed it to the extreme, but the "big" example (in the
> examples/ directory) is a 1 gig text file with 2 million rows and 50
> fields in each row. This is read in less than 30 seconds (but that's
> with a solid state drive).
Obviously this was just a quick test, but FYI, a solid state drive
shouldn't really make any difference here -- this is a pure sequential
read, and for those, SSDs are if anything actually slower than
traditional spinning-platter drives.

For this kind of benchmarking, you'd really rather be measuring the CPU
time, or reading byte streams that are already in memory. If you can
process more MB/s than the drive can provide, then your code is
effectively as fast as it needs to be. Looking at this number has a few
advantages:

- You get more repeatable measurements (no disk buffers and caches
  messing with you).
- If your code can go faster than your drive, then the drive won't make
  your benchmark look bad.
- There are probably users out there with faster drives than yours
  (e.g., I just measured ~340 MB/s off our lab's main RAID array), so
  it's nice to be able to measure optimizations even after they stop
  mattering on your own hardware.

Cheers,

-- Nathaniel
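P.S. Here's a rough sketch of the kind of in-memory measurement I mean
(np.loadtxt is just a stand-in for whatever reader is actually being
benchmarked, and the data sizes are made up for illustration):

    import io
    import time

    import numpy as np

    # Build a CSV-like text block entirely in memory, so neither the
    # drive nor the OS page cache enters the picture.
    nrows, nfields = 20000, 50
    row = ",".join(["1.2345"] * nfields) + "\n"
    text = row * nrows
    nbytes = len(text)  # ASCII, so characters == bytes

    buf = io.StringIO(text)
    t0 = time.process_time()   # CPU time, not wall-clock time
    arr = np.loadtxt(buf, delimiter=",")
    elapsed = time.process_time() - t0

    print("parsed %.1f MB in %.2f s of CPU time -> %.1f MB/s"
          % (nbytes / 1e6, elapsed, nbytes / 1e6 / elapsed))

If the MB/s figure that comes out is higher than what the drive can
deliver, the parser isn't the bottleneck, and further speedups won't be
visible in a disk-based benchmark anyway.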