On Sun, Feb 26, 2012 at 5:23 PM, Warren Weckesser
<warren.weckes...@enthought.com> wrote:
> I haven't pushed it to the extreme, but the "big" example (in the
> examples/ directory) is a 1 gig text file with 2 million rows and 50
> fields in each row. This is read in less than 30 seconds (but that's
> with a solid state drive).
Obviously this was just a quick test, but FYI, a solid state drive
shouldn't really make any difference here -- this is a pure sequential
read, and for those, SSDs are if anything actually slower than
traditional spinning-platter drives.

For this kind of benchmarking, you'd really rather be measuring the CPU
time, or reading byte streams that are already in memory. If you can
process more MB/s than the drive can provide, then your code is
effectively as fast as it needs to be. Looking at this number has a few
advantages:

- You get more repeatable measurements (no disk buffers and caches
  messing with you).
- If your code can go faster than your drive, then the drive won't make
  your benchmark look bad.
- There are probably users out there with faster drives than yours
  (e.g., I just measured ~340 MB/s off our lab's main RAID array), so
  it's nice to be able to measure optimizations even after they stop
  mattering on your own hardware.

Cheers,

-- Nathaniel
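P.S. Here's a rough sketch of the kind of in-memory measurement I mean
(np.loadtxt is just a stand-in for whatever reader is actually being
benchmarked, and the data sizes are made up for illustration):

    import io
    import time

    import numpy as np

    # Build a CSV-like text block entirely in memory, so neither the
    # drive nor the OS page cache enters the picture.
    nrows, nfields = 20000, 50
    row = ",".join(["1.2345"] * nfields) + "\n"
    text = row * nrows
    nbytes = len(text)  # ASCII, so characters == bytes

    buf = io.StringIO(text)
    t0 = time.process_time()   # CPU time, not wall-clock time
    arr = np.loadtxt(buf, delimiter=",")
    elapsed = time.process_time() - t0

    print("parsed %.1f MB in %.2f s of CPU time -> %.1f MB/s"
          % (nbytes / 1e6, elapsed, nbytes / 1e6 / elapsed))

If the MB/s figure that comes out is higher than what the drive can
deliver, the parser isn't the bottleneck, and further speedups won't be
visible in a disk-based benchmark anyway.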