I really shouldn't, but I'll bite just this once.

On Fri, Jun 11, 2010 at 7:28 PM, Steven D'Aprano <st...@pearwood.info> wrote:
> Your idea of elegant and simple is not the same as mine. To me, I see
> unnecessary work and unneeded complexity. A generator expression for a
> beginner having trouble with the basics? Calling mylist.count(item)
> *twice* for every item?! How is that possibly elegant?
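The one-liner itself isn't quoted in this part of the thread, but from Steven's description (a comprehension calling mylist.count(item) twice per item) it was presumably something like the first expression below; the names and sample data here are illustrative, and a collections.Counter version is shown alongside as the usual linear-time alternative:

```python
from collections import Counter

mylist = ["a", "b", "a", "c", "b", "a"]

# The readable-but-quadratic style under discussion: count() scans the
# whole list, and it is called twice for every distinct item.
dupes = [(item, mylist.count(item))
         for item in set(mylist)
         if mylist.count(item) > 1]

# A linear-time equivalent: one pass over the list builds all counts.
counts = Counter(mylist)
dupes_fast = [(item, n) for item, n in counts.items() if n > 1]

print(sorted(dupes))       # [('a', 3), ('b', 2)]
print(sorted(dupes_fast))  # [('a', 3), ('b', 2)]
```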
What I mean by simple is that that line of code is the most obvious and
direct translation of "a list of every item and how often it appears, for
every item that appears more than once." I call it elegant because the
code is even shorter than that English description while remaining
readable, possibly even to someone with little programming experience.
I'm making a judgement based on readability, not performance. Code that
goes for performance at the expense of readability I'd call "clever," and
that's not a compliment. Yes, it's an opinion. You disagree. That's fine.

> [pedantic]
> Not everything. Some things are inherently slow, for example:
>
> http://en.wikipedia.org/wiki/Busy_beaver
>
> A five-state Busy Beaver machine takes 47,176,870 steps to return, and a
> six-state Busy Beaver takes 10**21132 steps to return.
>
> A slightly less extreme example is Ackermann's Function:
>
> http://en.wikipedia.org/wiki/Ackermann's_function
>
> where (for example) A(4,2) has over 19,000 digits. Calculating the
> number of recursive steps needed to calculate that value is left as an
> exercise.
> [/pedantic]
>
> These are extreme examples, but the point is that there are tasks which
> are hard even for small N. "Find N needles in a haystack" remains hard
> even for N=1.

Now you're just arguing for the sake of being right (points for marking
it [pedantic], I suppose). The busy beaver and Ackermann function have no
bearing on this discussion whatsoever. If we're going to argue
pedantically, the 1-state busy beaver (n=1) finishes in one step, and
A(1,0) = A(0,1) = 2. Even those are fast for small n; they just grow
ridiculously quickly.

Also (pedantically), the problem "find needles in a haystack" grows with
the size of the haystack, not the number of needles. So the more
appropriate n=1 version would be a haystack of size 1 with a needle in
it, which is quite easy.

> (And before you suggest using a magnet, it's a *plastic* needle.)
That solution is not in the spirit of the problem, and since I know that,
I wouldn't bring it up.

> The problem is that situations that won't be encountered often *are*
> encountered, long after the coder who introduced the poorly-performing
> algorithm has moved on.
>
> Here's a real-life example, from Python itself: last September, on the
> Python-Dev mailing list, Chris Withers reported a problem downloading a
> file using Python's httplib module. For a file that took wget or
> Internet Explorer approximately 2 seconds to download, it took Python
> up to thirty MINUTES -- that's nearly a thousand times slower.
>
> Eventually Chris, with the help of others on the list, debugged the
> problem down to a Shlemiel algorithm in the httplib code. Somebody
> found a comment in the _read_chunked method that said:
>
> # XXX This accumulates chunks by repeated string concatenation,
> # which is not efficient as the number or size of chunks gets big.
>
> So somebody used an algorithm which they KNEW was inefficient and slow,
> it had been there for years, affecting who knows how many people, until
> eventually somebody was annoyed sufficiently to do something about it.
> And the sad thing is that avoiding repeated string concatenation is
> easy and there was no need for it in the first place.

Estimating the size of your input is sometimes hard. That's a valid
point. OTOH, that line of code takes maybe 10 seconds to write, and
after profiling reveals it to be slow (you *are* profiling your code,
right?) you can easily replace it with something more appropriate. Or
your successor can, since he'll have no problem figuring out what it
does.

> You've missed the point. It's not the language, you can write poorly
> performing code in any language, and an O(N) algorithm in a slow
> language will probably out-perform an O(N**2) or O(2**N) algorithm in a
> fast language.

That wasn't the point I was trying to make.
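(An aside on the httplib example above: the standard fix for that Shlemiel pattern is to accumulate the pieces in a list and join once at the end. A toy sketch, not the actual httplib code:)

```python
chunks = ["spam"] * 1000  # stand-in for chunks read off a socket

# Shlemiel style: each += builds a new string, copying everything
# accumulated so far, so total work can grow quadratically in the
# number of chunks. (CPython sometimes optimizes this in place, but
# the httplib comment shows it was known to be slow in practice.)
body = ""
for chunk in chunks:
    body += chunk

# The easy fix: collect, then join once -- linear time.
body_fast = "".join(chunks)

print(body == body_fast)  # True
```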
My point was that speed is not always the primary concern, and if it is,
you shouldn't be writing code in Python anyway, since it is generally
possible to write a program in C that performs better. Alan's code makes
a trade-off between performance and readability; I'd call it the
easiest-to-read solution so far. That's a trade-off that may or may not
be appropriate. My point is that you shouldn't dismiss the code just
because the algorithm sucks. You should only dismiss it because the
algorithm sucks *and* your code will have to handle large inputs.

Hugo

_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor